RECIPE

PII Redaction with LLMs

Strip personally identifiable information from free-form text before it ever reaches a downstream model, a log sink, or a vector store. This recipe shows the production pattern Meridian customers use to keep emails, phone numbers, SSNs, and addresses out of training pipelines while preserving semantic structure for analytics.

1. Pick a redaction strategy

Two strategies dominate: deterministic regex pre-filtering and LLM-guided entity extraction. They complement each other rather than compete. Regex catches structured patterns (emails, credit cards, phone numbers) cheaply and is the right first pass. The LLM then sweeps for soft entities such as names, locations, and free-text addresses that regex misses. Use a small, fast model for this sweep: speed matters more here than raw capability.
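
To make the regex first pass concrete, here is a minimal sketch of a deterministic detector. The helper name findStructuredPii, the Detection type, and the patterns themselves are assumptions for illustration, not part of the Meridian SDK. The patterns are deliberately simple and US-centric, so treat them as a starting point to tune against your own data.

```typescript
// Hypothetical shape for a regex hit: entity type plus character offsets.
type Detection = { type: string; start: number; end: number; value: string };

// Illustrative patterns only; real deployments usually need stricter ones
// (for example, a Luhn check on card numbers).
const PATTERNS: Array<[string, RegExp]> = [
  ['EMAIL', /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/],
  ['SSN', /\b\d{3}-\d{2}-\d{4}\b/],
  ['PHONE', /\b\d{3}[-.]\d{3}[-.]\d{4}\b/],
  ['CREDIT_CARD', /\b(?:\d{4}[- ]?){3}\d{4}\b/],
];

function findStructuredPii(text: string): Detection[] {
  const found: Detection[] = [];
  for (const [type, pattern] of PATTERNS) {
    // Build a fresh global regex per call so lastIndex state never leaks.
    const re = new RegExp(pattern.source, 'g');
    let match: RegExpExecArray | null;
    while ((match = re.exec(text)) !== null) {
      found.push({
        type,
        start: match.index,
        end: match.index + match[0].length,
        value: match[0],
      });
    }
  }
  // Sort by position so later steps can walk the text left to right.
  return found.sort((a, b) => a.start - b.start);
}
```

Whatever this pass leaves behind (names, locations, free-text addresses) is what the LLM sweep is for.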

2. Replace, do not delete

Swap detected spans with stable placeholders (e.g. [EMAIL_1], [NAME_2]) instead of stripping them. This preserves sentence structure for the downstream model, keeps token counts stable, and lets you re-hydrate the original text after processing if you need to return a user-facing answer.
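
The replacement step can be sketched as follows. This is an illustration under stated assumptions, not Meridian's implementation: applyPlaceholders and the Detection type are hypothetical, counters are assumed to run per entity type, and "stable" is taken to mean that a repeated value reuses the placeholder it was first given.

```typescript
// Hypothetical input: detections with character offsets into the text.
type Detection = { type: string; start: number; end: number; value: string };
// Span map entry: which placeholder stands in for which original value.
type Span = { type: string; original: string; placeholder: string };

function applyPlaceholders(
  text: string,
  detections: Detection[],
): { redacted: string; spans: Span[] } {
  const spans: Span[] = [];
  const counters = new Map<string, number>(); // per-type counter
  const seen = new Map<string, string>(); // "TYPE:value" -> placeholder
  const ordered = detections.slice().sort((a, b) => a.start - b.start);

  let redacted = '';
  let cursor = 0;
  for (const d of ordered) {
    if (d.start < cursor) continue; // skip detections that overlap an earlier one
    const key = d.type + ':' + d.value;
    let placeholder = seen.get(key);
    if (placeholder === undefined) {
      // First time we see this value: mint the next placeholder for its type.
      const next = (counters.get(d.type) || 0) + 1;
      counters.set(d.type, next);
      placeholder = '[' + d.type + '_' + next + ']';
      seen.set(key, placeholder);
      spans.push({ type: d.type, original: d.value, placeholder });
    }
    // Copy the untouched text before the detection, then the placeholder.
    redacted += text.slice(cursor, d.start) + placeholder;
    cursor = d.end;
  }
  redacted += text.slice(cursor);
  return { redacted, spans };
}
```

Reusing one placeholder per distinct value lets the downstream model still tell that two mentions refer to the same entity without seeing the value itself.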

3. Call the Meridian endpoint

Meridian exposes a single redaction endpoint that runs the regex pre-pass and the LLM sweep in one round trip. Pass in the raw text and get back the redacted text plus a span map, which you can store separately under stricter access controls.

import { Meridian } from '@meridian/sdk';

const client = new Meridian({ apiKey: process.env.MERIDIAN_KEY });

const result = await client.redact({
  text: 'Email me at jane@acme.com about order 4242-4242-4242-4242',
  model: 'azure/model-router',
  entities: ['EMAIL', 'CREDIT_CARD', 'NAME', 'PHONE'],
});

console.log(result.redacted);
// "Email me at [EMAIL_1] about order [CREDIT_CARD_1]"

console.log(result.spans);
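// [
//   { type: 'EMAIL', original: 'jane@acme.com', placeholder: '[EMAIL_1]' },
//   { type: 'CREDIT_CARD', original: '4242-4242-4242-4242', placeholder: '[CREDIT_CARD_1]' },
// ]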
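
If you need to return a user-facing answer, the span map can be used to restore the original values in the downstream model's output. The sketch below assumes the span shape shown in the sample above (type, original, placeholder). The rehydrate helper is hypothetical and is not an SDK method.

```typescript
// Span shape as printed by the sample above.
type Span = { type: string; original: string; placeholder: string };

function rehydrate(text: string, spans: Span[]): string {
  const lookup = new Map<string, string>();
  for (const span of spans) lookup.set(span.placeholder, span.original);

  // Single pass over placeholder-shaped tokens such as [EMAIL_1].
  // Restored values are never re-scanned, so an original that happens to
  // contain bracketed text cannot trigger a second substitution.
  return text.replace(/\[[A-Z_]+_\d+\]/g, (token) => {
    const original = lookup.get(token);
    // Leave unknown placeholders untouched rather than guessing.
    return original === undefined ? token : original;
  });
}
```

Run this only at the trust boundary where the original values are allowed to reappear, and load the span map from wherever you stored it under the stricter access controls.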