RECIPE

gRPC Streaming Patterns

Meridian exposes long-running model inference and tool calls over gRPC streams. This recipe walks through server-streaming, client-streaming, and full-duplex flows with the Meridian SDK, covering backpressure, cancellation, and reconnect semantics.

1. Server-streaming token deltas

The most common Meridian pattern: the client sends one request and the server emits token deltas until the model halts. Use this for chat completions, embedding batches, and any flow where the client has nothing to add mid-stream. Keep frames small (under 16 KB) and flush on every delta so that an intermediate proxy never buffers the stream.

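syntax = "proto3";

service Meridian {
  // One request in, a stream of token deltas out.
  rpc Complete(CompleteRequest)
    returns (stream TokenDelta);
}

// CompleteRequest is not shown in this recipe.

message TokenDelta {
  string text  = 1;  // text fragment carried by this delta
  uint32 index = 2;  // index of this delta
  bool   final = 3;  // set on the last delta of the stream
}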
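
On the client side, the deltas have to be stitched back into a single completion. The sketch below is one way to do that, not the SDK's own logic: TokenDelta here is a plain-Python stand-in for the message above, collect_completion is a hypothetical helper, and it assumes that index starts at 0 and increases by one per delta, which the schema itself does not guarantee.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TokenDelta:
    """Plain-Python stand-in for the TokenDelta message (not generated code)."""
    text: str
    index: int
    final: bool = False


def collect_completion(deltas: Iterable[TokenDelta]) -> str:
    """Join delta text in arrival order and stop at the delta marked final.

    Assumption: index starts at 0 and grows by one per delta. A gap or
    reordering raises ValueError instead of silently producing bad text.
    """
    parts: List[str] = []
    expected = 0
    for delta in deltas:
        if delta.index != expected:
            raise ValueError(f"expected index {expected}, got {delta.index}")
        parts.append(delta.text)
        expected += 1
        if delta.final:
            break  # ignore anything that arrives after the final delta
    return "".join(parts)
```

In a real client, the iterable would be the response stream returned by the generated stub; any iterator of objects with text, index, and final attributes works with this helper.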

2. Client-streaming uploads

For embeddings ingest or long-context priming, the client streams chunks and the server returns a single summary. Cap each chunk at 1 MB, well under gRPC's default 4 MB limit on received messages, and finish with a sentinel chunk that sets finalize=true so the server knows to commit. Meridian charges per ingested token, not per chunk, so chunk granularity is purely a transport concern.
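
The chunking rule can be sketched as a small generator. This is an illustration only: UploadChunk and its field names are assumptions (the recipe does not show the upload message), and the sketch reads "sentinel" as a trailing empty chunk that carries only the finalize flag.

```python
from dataclasses import dataclass
from typing import Iterator

MAX_CHUNK_BYTES = 1024 * 1024  # 1 MB cap described in the text


@dataclass
class UploadChunk:
    """Hypothetical upload message; field names are assumptions."""
    data: bytes
    finalize: bool = False


def iter_chunks(payload: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[UploadChunk]:
    """Yield data chunks of at most max_bytes, then one empty sentinel chunk."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    for start in range(0, len(payload), max_bytes):
        yield UploadChunk(data=payload[start:start + max_bytes])
    # The sentinel carries no data; it only tells the server to commit.
    yield UploadChunk(data=b"", finalize=True)
```

A generator like this can be handed directly to a client-streaming call, which consumes the chunks lazily, so the whole payload never has to be copied into a list of messages first.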

3. Bidirectional duplex for agents

Agent loops use full duplex: the server emits tool-call requests, and the client executes them and streams the results back on the same stream. Always attach a per-call deadline of 60s and propagate cancellation to every downstream tool call; orphaned tool calls are the #1 cost leak in agent workloads. The Meridian SDK ships a DuplexSession helper that reconnects with exponential backoff capped at 30s.
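
Two pieces of that advice are easy to get wrong by hand: sharing one deadline across nested tool calls, and capping the reconnect delay. The sketch below illustrates both under stated assumptions. It is not how DuplexSession is implemented; Deadline and backoff_delays are hypothetical helpers, and the 0.5s base delay and doubling factor are illustrative choices. Only the 60s deadline and the 30s cap come from the text.

```python
import time
from typing import Iterator, Optional

CALL_DEADLINE_S = 60.0  # per-call deadline from the text
BACKOFF_CAP_S = 30.0    # reconnect backoff ceiling from the text


class Deadline:
    """Absolute deadline shared by a call and the tool calls it spawns."""

    def __init__(self, timeout_s: float = CALL_DEADLINE_S, now: Optional[float] = None):
        start = time.monotonic() if now is None else now
        self.expires_at = start + timeout_s

    def remaining(self, now: Optional[float] = None) -> float:
        """Seconds left, never negative; use as the timeout for child calls."""
        current = time.monotonic() if now is None else now
        return max(0.0, self.expires_at - current)

    def expired(self, now: Optional[float] = None) -> bool:
        return self.remaining(now) == 0.0


def backoff_delays(base_s: float = 0.5, factor: float = 2.0,
                   cap_s: float = BACKOFF_CAP_S, attempts: int = 8) -> Iterator[float]:
    """Yield exponentially growing reconnect delays, clamped to cap_s.

    base_s and factor are assumed values, not SDK defaults.
    """
    delay = base_s
    for _ in range(attempts):
        yield min(delay, cap_s)
        delay *= factor
```

Passing deadline.remaining() as the timeout of each downstream call means a tool call started 45s into the session gets 15s, not a fresh 60s, so nothing outlives the parent call. A production reconnect loop would usually also add random jitter to each delay so that many clients do not reconnect in lockstep.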