Benchmark · run of June 18, 2026

Measured, not promised.

210 synthetic French legal documents (rulings, invoices, HR letters) run through Anoni and three public detectors. Same documents, same scoring rule, fixed seed. The test replays identically.

Strict recall

Anoni (models + rules): 94.8%
openai/privacy-filter (raw): 67.8%
Microsoft Presidio (fr): 58.6%
AInonymizer: 64.4%
Anoni (rules only): 59.3%

Share of data found with the correct category · official run of June 18, 2026, 210 synthetic French legal documents.

System	Strict recall	Leaks avoided	False positives
Anoni (models + rules)	94.8%	99.0%	73
openai/privacy-filter (raw)	67.8%	91.9%	217
Microsoft Presidio (fr)	58.6%	77.1%	356
AInonymizer	64.4%	67.9%	143
Anoni (rules only)	59.3%	65.2%	0

Official run of June 18, 2026 · Apple M5 Pro, 24 GB, MPS, on 210 documents. “Strict recall”: the data is found with the right category (overlap ≥ 66%). “Leaks avoided”: share of sensitive characters covered by a detection, all categories combined. “False positives”: detections on text that isn’t personal data. Among the model-based detectors, Anoni has the fewest (73), so it over-redacts the least; the rules-only configuration has none, but its strict recall stops at 59.3%. That is what matters before sending text to an AI: fewer false positives mean less useful text redacted for nothing.
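The three metrics can be sketched in a few lines. This is a minimal illustration, not Anoni's scoring code: the `Span` type, the `score` function and the reading of "overlap ≥ 66%" as "one same-label detection covers at least 66% of the annotated span's characters" are all assumptions made for the example. A false positive is counted here as a detection that touches no annotated span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def overlap(a: Span, b: Span) -> int:
    """Number of characters shared by two spans."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def score(gold: list[Span], pred: list[Span], threshold: float = 0.66) -> dict:
    # Strict recall: a gold span is found if one prediction with the SAME label
    # covers at least `threshold` of its characters (assumed overlap rule).
    found = sum(
        any(p.label == g.label and overlap(g, p) >= threshold * (g.end - g.start)
            for p in pred)
        for g in gold
    )
    # Leaks avoided: share of sensitive characters covered by any prediction,
    # whatever its label.
    sensitive = {i for g in gold for i in range(g.start, g.end)}
    detected = {i for p in pred for i in range(p.start, p.end)}
    # False positives: predictions that touch no gold span at all.
    false_positives = sum(all(overlap(g, p) == 0 for g in gold) for p in pred)
    return {
        "strict_recall": found / len(gold) if gold else 1.0,
        "leaks_avoided": len(sensitive & detected) / len(sensitive) if sensitive else 1.0,
        "false_positives": false_positives,
    }


# Hypothetical document: a person and an organization are annotated.
gold = [Span(0, 10, "PERSON"), Span(20, 30, "ORG")]
# The detector finds the person, labels the organization as a person,
# and flags a span that is not personal data.
pred = [Span(0, 8, "PERSON"), Span(20, 30, "PERSON"), Span(40, 45, "DATE")]
result = score(gold, pred)
```

In this toy case strict recall is 0.5 while leaks avoided is 0.9: the mislabeled organization fails the strict test but its characters are still covered.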

Dive into the detail

Recall per category, every system

The bench, category by category. The solid bar is Anoni; the thin bar is the best competitor on that row. The gap opens up where context has to be understood: places most of all (94.3% against 51.1%), then organizations, phone numbers and people. On checksum-validated identifiers, almost everyone maxes out.

Category	Anoni	Best competitor	Competitor recall
People	99.6%	Presidio	94.5%
Addresses	86.7%	AInonymizer	81.9%
Places	94.3%	privacy-filter	51.1%
Emails	100.0%	Presidio	100.0%
Phone numbers	100.0%	Presidio	88.6%
Dates	100.0%	AInonymizer	100.0%
NIR	98.6%	privacy-filter	100.0%
SIRET / SIREN	100.0%	privacy-filter	90.7%
IBAN	100.0%	AInonymizer	100.0%
Organizations	75.7%	AInonymizer	53.7%

Anoni reaches perfect detection (100%) on five categories: Emails · Phone numbers · Dates · SIRET / SIREN · IBAN. French identifiers are validated, checksum included, not guessed from a pattern.
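To make "validated, checksum included" concrete, here is a minimal sketch of the three public checksum rules involved: Luhn for SIRET, the mod-97 key for the NIR, and the ISO mod-97 check for IBAN. The function names are hypothetical and this is not Anoni's code; the sketch also ignores special cases and only accepts the most common NIR shape.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_siret(value: str) -> bool:
    digits = re.sub(r"\s", "", value)
    return bool(re.fullmatch(r"\d{14}", digits)) and luhn_ok(digits)


def is_valid_nir(value: str) -> bool:
    compact = re.sub(r"\s", "", value).upper()
    # sex, year, month, department (incl. Corsica 2A/2B), commune + order, key
    if not re.fullmatch(r"[12]\d{4}(\d{2}|2[AB])\d{6}\d{2}", compact):
        return False
    body, key = compact[:13], int(compact[13:])
    # Corsican departments are mapped to digits before computing the key.
    body = body.replace("2A", "19").replace("2B", "18")
    return 97 - int(body) % 97 == key


def is_valid_iban(value: str) -> bool:
    compact = re.sub(r"\s", "", value).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move country code and check digits to the end, map A..Z to 10..35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


# Sample values: a commonly used example SIRET and IBAN, and a made-up NIR.
print(is_valid_siret("732 829 320 00074"))
print(is_valid_nir("1 84 12 75 116 001 26"))
print(is_valid_iban("FR76 3000 6000 0112 3456 7890 189"))
```

A pattern alone would accept any 14 digits as a SIRET; the checksum rejects most random digit runs, which is why validated identifiers produce few false positives.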

The “organizations” recall is the lowest (75.7%), but it’s a labeling matter, not a leak: company names are sometimes redacted as people. The data still gets removed: whatever the label, organizations are covered at 96.4%.

The method, honestly

The documents are synthetic. Generated from six realistic legal templates, with valid French identifiers (NIR, SIRET, IBAN with correct checksums) and ground truth recorded at generation. Each system is measured as shipped, default settings, pinned versions. No real document was used in the test. So no personal data.
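The idea of recording ground truth at generation time, with a fixed seed, can be sketched as follows. Everything here is illustrative: the template, the name and place lists, and the `make_nir`, `fill` and `generate` helpers are hypothetical, not the six templates or the generator used for the bench.

```python
import random
import re

# Hypothetical template and value pools, for illustration only.
TEMPLATE = "M. {person}, né à {place}, numéro de sécurité sociale {nir}."
NAMES = ["Jean Dupont", "Claire Martin", "Luc Bernard"]
PLACES = ["Lyon", "Nantes", "Bordeaux"]


def make_nir(rng: random.Random) -> str:
    """Build a 15-digit NIR whose 2-digit key satisfies the mod-97 rule."""
    body = (
        rng.choice("12")                      # sex
        + f"{rng.randrange(100):02d}"         # year of birth
        + f"{rng.randrange(1, 13):02d}"       # month of birth
        + rng.choice(["13", "33", "69", "75"])  # department
        + f"{rng.randrange(1, 1000):03d}"     # commune
        + f"{rng.randrange(1, 1000):03d}"     # order number
    )
    return f"{body}{97 - int(body) % 97:02d}"


def fill(template: str, values: dict[str, str]) -> tuple[str, list[tuple[int, int, str]]]:
    """Fill {placeholders} and record (start, end, LABEL) for each inserted value."""
    text, spans, cursor = "", [], 0
    for match in re.finditer(r"\{(\w+)\}", template):
        text += template[cursor:match.start()]
        value = values[match.group(1)]
        spans.append((len(text), len(text) + len(value), match.group(1).upper()))
        text += value
        cursor = match.end()
    return text + template[cursor:], spans


def generate(seed: int) -> tuple[str, list[tuple[int, int, str]]]:
    rng = random.Random(seed)  # fixed seed: the same document every time
    values = {
        "person": rng.choice(NAMES),
        "place": rng.choice(PLACES),
        "nir": make_nir(rng),
    }
    return fill(TEMPLATE, values)


text, spans = generate(2026)
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

Because the spans are written down while the text is assembled, no annotation pass is needed afterwards, and two runs with the same seed yield the same document and the same ground truth.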

A synthetic bench has its limits. It measures detection on known document structures, not the infinite variety of the real world. That is also what makes it reproducible. The test set and the measurement details will be published. Until then, the numbers can be verified on request: contact@anoni.dev.

One more piece of honesty. The first run of the bench revealed a flaw in our own engine: a SIRET could mask a NIR under the wrong label. Fixed, re-measured, published. That is exactly what a benchmark is for.
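One way such a collision can arise, and one possible way to resolve it, is sketched below. This is a guess at the general mechanism, not a description of the actual bug or of Anoni's fix: it assumes a 14-digit SIRET candidate firing inside a 15-digit NIR, and resolves overlaps by keeping the longest candidate. `Candidate` and `resolve` are hypothetical names.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def resolve(candidates: list[Candidate]) -> list[Candidate]:
    """Among overlapping candidates keep the longest one (ties: earliest start)."""
    kept: list[Candidate] = []
    by_length = sorted(candidates, key=lambda c: (-(c.end - c.start), c.start))
    for c in by_length:
        # Accept a candidate only if it overlaps nothing already kept.
        if all(c.end <= k.start or c.start >= k.end for k in kept):
            kept.append(c)
    return sorted(kept)  # back to document order


candidates = [
    Candidate(10, 24, "SIRET"),  # 14 digits matched inside the NIR
    Candidate(10, 25, "NIR"),    # the full 15-digit identifier
    Candidate(40, 50, "DATE"),
]
print(resolve(candidates))
```

With this rule the 15-character NIR wins over the 14-character SIRET candidate, so the identifier is both fully covered and correctly labeled.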

And on data we didn’t write

A synthetic bench is still our bench. So Anoni is also measured on an independent corpus: ai4privacy OpenPII. Real French personal data, hand-annotated, under an open license (CC-BY). 500 documents we didn’t write.

On Anoni’s French scope (people, addresses, places, emails, phone numbers, dates): 92.7% strict recall and 97.1% of sensitive characters covered. An independent measurement. The honest answer to “sure, but it’s your own test set”.

The corpus doesn’t label organizations and mixes in non-French identifiers (cards, passports) outside Anoni’s scope. The figure therefore covers the categories Anoni targets. Reproducible, like the rest.

Medical domain

And on healthcare documents

Discharge summaries, prescriptions, referral letters. Real medical records are never public, and that is exactly why they must stay on the machine. With no public corpus, the medical bench is synthetic: 105 documents, same scoring rule.

System	Strict recall
Anoni (models + rules)	91.4%
openai/privacy-filter (raw)	73.6%
Microsoft Presidio (fr)	55.1%
AInonymizer	55.9%
Anoni (rules only)	71.4%

Strict recall on 105 synthetic French medical documents · run of June 18, 2026.

The social-security number (carte Vitale), patient and doctor names, dates: all detected. And above all the patient record number (IPP), the hospital’s own identifier, which Anoni covers at 100.0% where most tools leave it in the clear.

Public medical de-identification models are English-only and gated behind access agreements. So Anoni’s medical bench is synthetic and reproducible, like the legal one.
