gRPC Streaming Patterns
Meridian exposes long-running model inference and tool calls over bidirectional gRPC streams. This recipe walks through server-streaming, client-streaming, and full-duplex flows, covering backpressure, cancellation, and reconnect semantics for production use of the Meridian SDK.
1. Server-streaming token deltas
This is the most common Meridian pattern: the client sends one request and the server emits token deltas until the model halts. Use it for chat completions, embedding batches, and any flow where the client has nothing to add mid-stream. Keep frames small (under 16 KB) and flush on every delta so the proxy never buffers.
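On the client side, consuming such a stream amounts to reading deltas in order until one is marked final. The following is a minimal sketch rather than SDK code: `TokenDelta` here is a local dataclass mirroring the message fields used in this section, `fake_complete_stream` is a hypothetical stand-in for the iterator a `Complete` call would return, and the assumption that `index` starts at 0 and increases by 1 is ours, not something the recipe states.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

MAX_FRAME_BYTES = 16 * 1024  # frame budget suggested in the text above


@dataclass
class TokenDelta:
    """Local stand-in for the TokenDelta message (text, index, final)."""
    text: str
    index: int
    final: bool = False


def fake_complete_stream() -> Iterator[TokenDelta]:
    """Hypothetical stand-in for the iterator returned by a Complete() call."""
    yield TokenDelta("Hello", 0)
    yield TokenDelta(", ", 1)
    yield TokenDelta("world", 2, final=True)


def collect_completion(stream: Iterable[TokenDelta]) -> str:
    """Concatenate deltas in order and stop at the delta marked final."""
    parts = []
    expected_index = 0  # assumption: indices start at 0 and increase by 1
    for delta in stream:
        if delta.index != expected_index:
            raise ValueError(f"expected delta {expected_index}, got {delta.index}")
        if len(delta.text.encode("utf-8")) >= MAX_FRAME_BYTES:
            raise ValueError("delta exceeds the 16 KB frame budget")
        parts.append(delta.text)
        expected_index += 1
        if delta.final:
            return "".join(parts)
    # The stream closed without a final delta: treat it as truncated.
    raise RuntimeError("stream ended before a final delta arrived")


result = collect_completion(fake_complete_stream())
print(result)  # Hello, world
```

Raising on a missing final delta lets the caller tell a truncated stream apart from a completed one, which matters when deciding whether to retry.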
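    syntax = "proto3";

    service Meridian {
      // One request in, a stream of token deltas out.
      rpc Complete(CompleteRequest) returns (stream TokenDelta);
    }

    message TokenDelta {
      string text = 1;   // text carried by this delta
      uint32 index = 2;  // position of the delta in the stream
      bool final = 3;    // set on the last delta
    }

2. Client-streaming uploads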
For embedding ingest or long-context priming, the client streams chunks and the server returns a single summary. Cap each chunk at 1 MB to stay well under gRPC's default 4 MB limit on received messages, and finish with a sentinel chunk that sets finalize=true so the server knows to commit. Meridian charges per ingested token, not per chunk, so chunk granularity is purely a transport concern.
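The chunking rule above can be sketched as a generator that a client-streaming call would consume. This is an illustration under stated assumptions: `UploadChunk` and its `data` field are hypothetical names, and the sketch sets `finalize` on the last data chunk instead of sending a separate empty sentinel, which the recipe does not specify either way.

```python
from dataclasses import dataclass
from typing import Iterator

MAX_CHUNK_BYTES = 1024 * 1024  # 1 MB cap from the text above


@dataclass
class UploadChunk:
    """Hypothetical chunk message; field names are illustrative only."""
    data: bytes
    finalize: bool = False


def chunk_payload(payload: bytes,
                  max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[UploadChunk]:
    """Yield chunks of at most max_bytes; the last one carries finalize=True."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    if not payload:
        # Still emit the sentinel so the server can commit an empty upload.
        yield UploadChunk(b"", finalize=True)
        return
    for start in range(0, len(payload), max_bytes):
        end = start + max_bytes
        yield UploadChunk(payload[start:end], finalize=end >= len(payload))


chunks = list(chunk_payload(b"x" * (2 * MAX_CHUNK_BYTES + 10)))
print([len(c.data) for c in chunks])  # [1048576, 1048576, 10]
print([c.finalize for c in chunks])   # [False, False, True]
```

Because billing is per token, a smaller `max_bytes` changes only the number of messages on the wire, not the cost.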
3. Bidirectional duplex for agents
Agent loops use full duplex: the server emits tool-call requests, and the client executes them and streams the results back over the same channel. Always attach a per-call deadline of 60 s and propagate cancellation down to every tool call the client has started; orphaned tool calls are the #1 cost leak in agent workloads. The Meridian SDK ships a DuplexSession helper that handles reconnect with exponential backoff up to 30 s.
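The deadline and backoff rules can be illustrated without the SDK. The sketch below does not use or reproduce DuplexSession, whose API is not shown in this recipe; `backoff_delays`, `run_tool_call`, the 0.5 s base delay, and the doubling factor are all assumptions chosen for illustration. It relies on `asyncio.wait_for`, which cancels the awaited call when the timeout expires, so a timed-out tool call is not left running.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Iterator

CALL_DEADLINE_S = 60.0  # per-call deadline from the text above
MAX_BACKOFF_S = 30.0    # reconnect backoff ceiling from the text above


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = MAX_BACKOFF_S) -> Iterator[float]:
    """Yield exponentially growing reconnect delays, never exceeding cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


async def run_tool_call(tool: Callable[[], Awaitable[str]],
                        deadline_s: float = CALL_DEADLINE_S) -> str:
    """Run one tool call; on timeout the tool coroutine is cancelled."""
    try:
        return await asyncio.wait_for(tool(), timeout=deadline_s)
    except asyncio.TimeoutError:
        return "deadline_exceeded"


cancelled = []


async def slow_tool() -> str:
    try:
        await asyncio.sleep(10)
        return "done"
    except asyncio.CancelledError:
        cancelled.append(True)  # cleanup point: release resources here
        raise


async def fast_tool() -> str:
    return "ok"


first_delays = list(itertools.islice(backoff_delays(), 8))
fast_result = asyncio.run(run_tool_call(fast_tool))
slow_result = asyncio.run(run_tool_call(slow_tool, deadline_s=0.05))

print(first_delays)  # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(fast_result, slow_result, cancelled)  # ok deadline_exceeded [True]
```

Re-raising `CancelledError` inside the tool after cleanup is what lets cancellation keep propagating to anything the tool itself awaits.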