Anoni

Benchmark

Measured, not promised.

60 synthetic French legal documents (rulings, invoices, HR letters) were run through Anoni and three public detectors. Same documents, same scoring rule, fixed seed: every run of the test replays identically.

Strict recall by system: share of data found with the correct category · official run of June 12, 2026, 60 synthetic French legal documents.
System                      | Strict recall | Leaks avoided | False positives
Anoni (models + rules)      | 89.0%         | 99.0%         | 15
openai/privacy-filter (raw) | 69.7%         | 93.8%         | 64
Microsoft Presidio (fr)     | 65.7%         | 83.1%         | 104
AInonymizer                 | 64.7%         | 68.4%         | 40
Anoni (rules only)          | 59.2%         | 65.1%         | 0

Official run of June 12, 2026. “Strict recall”: a piece of data counts as found only if a detection with the right category overlaps it by at least 66%. “Leaks avoided”: the share of sensitive characters covered by a detection, whatever its category. This second figure is what matters before sending text to an AI: a masked value does not leak, even under the wrong label.
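The two metrics can be sketched in Python. This is a minimal sketch, not Anoni's scoring code: it assumes spans are (start, end, label) character offsets, and it reads “overlap ≥ 66%” as the share of the gold span's characters covered by a single same-category detection. The published measurement details may define overlap differently (for example as intersection over union); `Span`, `strict_recall` and `leak_coverage` are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # first character offset (inclusive)
    end: int    # last character offset (exclusive)
    label: str  # category, e.g. "NIR" or "SIRET"


def overlap(a: Span, b: Span) -> int:
    """Number of characters shared by two spans."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def strict_recall(gold: list[Span], pred: list[Span], threshold: float = 0.66) -> float:
    """Share of gold spans found with the right category.

    Assumption: a gold span counts as found when one same-label prediction
    covers at least `threshold` of its characters. Gold spans must be non-empty.
    """
    if not gold:
        return 1.0
    found = sum(
        1
        for g in gold
        if any(p.label == g.label and overlap(g, p) / (g.end - g.start) >= threshold for p in pred)
    )
    return found / len(gold)


def leak_coverage(gold: list[Span], pred: list[Span]) -> float:
    """Share of sensitive characters covered by any prediction, labels ignored."""
    sensitive = {i for g in gold for i in range(g.start, g.end)}
    if not sensitive:
        return 1.0
    covered = {i for p in pred for i in range(p.start, p.end)}
    return len(sensitive & covered) / len(sensitive)


# Toy example: the NIR is fully masked, but under the SIRET label.
gold = [Span(0, 15, "NIR"), Span(20, 34, "SIRET")]
pred = [Span(0, 15, "SIRET"), Span(20, 34, "SIRET")]
print(strict_recall(gold, pred))  # 0.5
print(leak_coverage(gold, pred))  # 1.0
```

In this toy case the mislabeled NIR is a miss for strict recall yet contributes nothing to leaks, which is the same gap that separates the two columns of the table.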

Perfect detection (100%) on this bench

NIR (social security number) · SIRET / SIREN · IBAN · People · Emails · Dates. Structured French identifiers are validated, checksum included, rather than guessed from a pattern.
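The checksums behind these identifiers are public, and a sketch of how they can be checked follows. It is not Anoni's engine: the function names are hypothetical, and the code only checks format and checksum, not whether a number is actually assigned. The rules used are the NIR key (97 minus the 13-digit body modulo 97, with the Corsican department codes 2A and 2B replaced by 19 and 18), the Luhn algorithm for SIREN and SIRET, and the ISO 13616 mod-97 rule for IBAN.

```python
import re


def luhn_is_valid(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def siren_is_valid(value: str) -> bool:
    s = value.replace(" ", "")
    return re.fullmatch(r"[0-9]{9}", s) is not None and luhn_is_valid(s)


def siret_is_valid(value: str) -> bool:
    # SIRET = SIREN (9 digits) + NIC (5 digits); Luhn applies to all 14 digits.
    # Known exception not handled here: some La Poste establishments.
    s = value.replace(" ", "")
    return re.fullmatch(r"[0-9]{14}", s) is not None and luhn_is_valid(s) and luhn_is_valid(s[:9])


def nir_is_valid(value: str) -> bool:
    # 13-character body + 2-digit key; characters 6-7 hold the department.
    s = value.replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{5}(?:[0-9]{2}|2A|2B)[0-9]{8}", s) is None:
        return False
    body, key = s[:13], int(s[13:])
    # Corsica: 2A and 2B are replaced by 19 and 18 before the modulo.
    body = body[:5] + {"2A": "19", "2B": "18"}.get(body[5:7], body[5:7]) + body[7:]
    return key == 97 - int(body) % 97


def iban_is_valid(value: str) -> bool:
    # ISO 13616: move the first 4 characters to the end, map A..Z to 10..35,
    # and the resulting integer must equal 1 modulo 97. Country lengths are not checked.
    s = value.replace(" ", "").upper()
    if re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s) is None:
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(nir_is_valid("1 85 05 78 006 084 91"))               # True
print(iban_is_valid("FR76 3000 6000 0112 3456 7890 189"))  # True
```

A random 15-digit string passes the NIR check only about once in 97 tries, which is why a validated match is far stronger evidence than a pattern match.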

The method, honestly

The documents are synthetic, generated from six realistic legal templates with valid French identifiers (NIR, SIRET, IBAN with correct checksums), and the ground truth is recorded at generation time. Each system is measured as shipped, with default settings and pinned versions. No real document was used in the test, so no personal data was involved.
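Seeded generation with ground truth recorded at generation time can be sketched as follows. Everything here is a hypothetical illustration: the template, the name pools and the function names are invented for the example, since the six templates are not yet public. Identifiers are valid by construction (NIR key computed, SIRET completed with a Luhn digit), and each inserted value's offsets are stored at the moment it is written, so the annotation cannot drift from the text.

```python
import random

# Hypothetical name pools and template, for illustration only.
FIRST_NAMES = ["Claire", "Julien", "Sophie", "Marc"]
LAST_NAMES = ["Martin", "Bernard", "Dubois", "Moreau"]
# Literal text alternates with placeholder names that are keys of `values` below.
TEMPLATE = ["Le salarié ", "PERSON", ", NIR ", "NIR",
            ", est employé par l'établissement SIRET ", "SIRET", "."]


def luhn_ok(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch) * (2 if i % 2 else 1)
        total += d - 9 if d > 9 else d
    return total % 10 == 0


def with_luhn_digit(payload: str) -> str:
    """Append the only digit that makes payload + digit pass the Luhn check."""
    return next(payload + d for d in "0123456789" if luhn_ok(payload + d))


def random_nir(rng: random.Random) -> str:
    # Sex, year, month, department, commune, order number; key = 97 - body mod 97.
    # Department codes are not checked against the real list.
    body = (f"{rng.randint(1, 2)}{rng.randint(0, 99):02d}{rng.randint(1, 12):02d}"
            f"{rng.randint(1, 95):02d}{rng.randint(1, 999):03d}{rng.randint(1, 999):03d}")
    return body + f"{97 - int(body) % 97:02d}"


def random_siret(rng: random.Random) -> str:
    siren = with_luhn_digit(f"{rng.randint(0, 10**8 - 1):08d}")
    # Last NIC digit chosen so that the full 14-digit SIRET also passes Luhn.
    return with_luhn_digit(siren + f"{rng.randint(0, 9999):04d}")


def generate_document(seed: int) -> tuple[str, list[tuple[int, int, str]]]:
    rng = random.Random(seed)  # fixed seed: same document on every run
    values = {
        "PERSON": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "NIR": random_nir(rng),
        "SIRET": random_siret(rng),
    }
    text, truth = "", []
    for segment in TEMPLATE:
        if segment in values:
            start = len(text)
            text += values[segment]
            truth.append((start, len(text), segment))  # ground truth recorded as written
        else:
            text += segment
    return text, truth


text, truth = generate_document(42)
for start, end, label in truth:
    print(label, text[start:end])
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps each document independent of anything else that consumes random numbers.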

A synthetic bench has its limits. It measures detection on known document structures, not the infinite variety of the real world. That is also what makes it reproducible. The test set and the measurement details will be published. Until then, the numbers can be verified on request: contact@anoni.dev.

One more piece of honesty. The first run of the bench revealed a flaw in our own engine: a SIRET detection could swallow a NIR, masking it under the wrong label. We fixed it, re-measured, and published the result. That is exactly what a benchmark is for.

And on data we didn’t write

A synthetic bench is still our bench. So Anoni is also measured on an independent corpus: ai4privacy OpenPII. Real French personal data, hand-annotated, under an open license (CC-BY). 500 documents we didn’t write.

On Anoni’s French scope (people, addresses, places, emails, phone numbers, dates): 92.7% strict recall and 97.1% of sensitive characters covered. An independent measurement, on real data. The honest answer to “sure, but it’s your own test set”.

The corpus doesn’t label organizations, and it mixes in non-French identifiers (card numbers, passports) that fall outside Anoni’s scope. The figures therefore cover only the categories Anoni targets. Reproducible, like the rest.
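Restricting a score to a scope amounts to filtering the ground truth before computing recall. In this minimal sketch, the corpus label names and the category names are hypothetical placeholders, not the actual OpenPII label set. Labels that map to a targeted category are kept and renamed; everything else (cards, passports, unknown labels) is dropped.

```python
from typing import Optional

# Hypothetical corpus labels mapped to hypothetical in-scope categories.
# None marks a label outside the targeted scope.
LABEL_MAP: dict[str, Optional[str]] = {
    "GIVENNAME": "PERSON",
    "SURNAME": "PERSON",
    "STREET": "ADDRESS",
    "CITY": "PLACE",
    "EMAIL": "EMAIL",
    "TELEPHONENUM": "PHONE",
    "DATE": "DATE",
    "CREDITCARDNUMBER": None,
    "PASSPORTNUM": None,
}


def in_scope_gold(gold: list[tuple[int, int, str]]) -> list[tuple[int, int, str]]:
    """Keep gold spans whose label maps to a targeted category, renamed to it.

    Out-of-scope and unknown labels are dropped, so they leave the recall denominator.
    """
    return [
        (start, end, LABEL_MAP[label])
        for start, end, label in gold
        if LABEL_MAP.get(label) is not None
    ]


gold = [(0, 6, "GIVENNAME"), (7, 26, "CREDITCARDNUMBER"), (27, 44, "EMAIL")]
print(in_scope_gold(gold))  # [(0, 6, 'PERSON'), (27, 44, 'EMAIL')]
```

Because a dropped span leaves the ground truth entirely, a missed passport number counts neither for nor against the score, which is why the figure speaks only to the listed categories.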
