Partial response patterns
Stream tokens to the client without waiting for the full completion. Partial response patterns let you flush UI updates incrementally, reduce perceived latency (the first words appear while the model is still generating), and recover gracefully when the upstream model drops the connection mid-flight. This recipe walks through three production-tested strategies used inside Meridian.
1. Token-level streaming
The simplest pattern: forward each delta from the model straight to your render layer. It works well for chat interfaces, where text should appear as soon as it arrives. Pair it with a small buffer (about 40 characters) so React does not re-render on every individual delta, and flush early on sentence boundaries when you can.
2. Chunked JSON assembly
When the model emits structured JSON, a naive JSON.parse call throws on every partial chunk, because an incomplete prefix is not valid JSON. Use a streaming parser that yields keys as soon as their values are complete, and project them into your UI slot by slot. The user sees fields populate top-down instead of staring at one big spinner.
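A minimal sketch of the idea in plain JavaScript, assuming the stream carries a single JSON object and that top-level keys are the unit you render. Instead of a full streaming parser, it swaps in a simpler technique: a scanner that tracks string and nesting state, and hands each top-level member to JSON.parse once that member is complete. The class name TopLevelKeyAssembler and the onField callback are hypothetical, not part of the Meridian SDK.

```javascript
// Hypothetical sketch: emits each top-level key of a streamed JSON object as soon
// as its value is complete. Assumes the stream contains exactly one JSON object.
class TopLevelKeyAssembler {
  constructor(onField) {
    this.onField = onField; // called as onField(key, value) for each finished member
    this.depth = 0;         // nesting depth of {} / [] seen so far
    this.inString = false;  // currently inside a JSON string literal
    this.escaped = false;   // previous character was a backslash inside a string
    this.member = '';       // raw text of the top-level member being assembled
  }

  // Feed one streamed chunk; chunk boundaries may fall anywhere, even mid-string.
  push(chunk) {
    for (const ch of chunk) {
      if (this.inString) {
        this.member += ch;
        if (this.escaped) this.escaped = false;
        else if (ch === '\\') this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }
      if (ch === '"') {
        this.inString = true;
      } else if (ch === '{' || ch === '[') {
        this.depth += 1;
        if (this.depth === 1) continue; // root opening brace: not part of a member
      } else if (ch === '}' || ch === ']') {
        this.depth -= 1;
        if (this.depth === 0) { this.emit(); continue; } // root object closed
      } else if (ch === ',' && this.depth === 1) {
        this.emit(); // separator between top-level members
        continue;
      }
      if (this.depth >= 1) this.member += ch;
    }
  }

  // Parse the finished member on its own and report its key/value pair.
  emit() {
    const text = this.member.trim();
    this.member = '';
    if (text === '') return; // e.g. an empty object "{}"
    const parsed = JSON.parse(`{${text}}`); // throws if the member is malformed
    for (const [key, value] of Object.entries(parsed)) this.onField(key, value);
  }
}
```

Nested values are emitted whole once their top-level key closes, which keeps the scanner short; if you need nested fields to appear one by one, a dedicated incremental JSON parser is the more robust choice.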
3. Resume on disconnect
Long completions sometimes drop mid-stream. Persist the partial buffer together with the last accepted sequence id, and on reconnect ask the model to continue from that anchor. Meridian's gateway preserves the routing decision for 30 seconds, so a resumed request sent within that window lands on the same model deployment.
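The resume loop can be sketched as follows, assuming each streamed chunk carries a monotonically increasing seq id and that your client can reopen a completion starting after a given id. Both openStream and streamWithResume are hypothetical names, not Meridian SDK calls; the buffer lives in memory here, whereas a production version would persist it to durable storage so it survives a process restart.

```javascript
/**
 * Hypothetical sketch of resume-on-disconnect.
 * openStream(afterSeq) must return an async iterable of { seq, text } chunks
 * that continues after `afterSeq` (-1 means "from the beginning").
 */
async function streamWithResume(openStream, { maxRetries = 3 } = {}) {
  let buffer = '';  // partial output accepted so far (persist this in production)
  let lastSeq = -1; // sequence id of the last accepted chunk: the resume anchor
  let retries = 0;

  for (;;) {
    try {
      for await (const chunk of openStream(lastSeq)) {
        // Ignore anything at or before the anchor, e.g. chunks replayed after a reconnect
        if (chunk.seq <= lastSeq) continue;
        buffer += chunk.text;
        lastSeq = chunk.seq;
      }
      return buffer; // the stream ended normally
    } catch (err) {
      // Give up after maxRetries failed attempts; otherwise reopen from lastSeq
      if (++retries > maxRetries) throw err;
    }
  }
}
```

Dropping chunks with seq at or below the anchor keeps the loop correct even if the server replays part of the stream after a reconnect. Retrying immediately, rather than after a long backoff, also keeps the resumed request inside the 30-second routing window described above.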
Example
// Stream tokens incrementally with partial response handling
import Meridian from '@meridian/sdk';
const client = new Meridian({ apiKey: process.env.MERIDIAN_KEY });
const stream = await client.chat.stream({
  model: 'azure/model-router',
  messages: [{ role: 'user', content: 'Summarize the doc' }],
});
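
// Placeholder render hooks (hypothetical): replace with your own UI update logic
async function renderPartial(text) {
  process.stdout.write(text); // append the flushed text to the visible output
}
async function renderFinal() {
  process.stdout.write('\n'); // mark the completion as finished
}

let buffer = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content ?? '';
  buffer += delta;
  // Flush partial UI every ~40 chars or on a sentence boundary; \s* lets a
  // trailing "." count even before the following space has arrived
  if (buffer.length > 40 || /[.!?]\s*$/.test(buffer)) {
    await renderPartial(buffer);
    buffer = '';
  }
}
// Flush whatever is left once the stream ends, then finalize
if (buffer) await renderPartial(buffer);
await renderFinal();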