Anoni

Comparison

Local or cloud: the same test.

Four independent detectors, measured on the same 210 synthetic French legal documents (run of June 18, 2026). Same scoring rule, fixed seed: rerun the benchmark and the numbers come out identical.

Strict recall on French documents

System                        Strict recall   Leaks avoided   False positives
Anoni (models + rules)        94.8%           99.0%           73
openai/privacy-filter (raw)   67.8%           91.9%           217
Microsoft Presidio (fr)       58.6%           77.1%           356
AInonymizer                   64.4%           67.9%           143
Anoni (rules only)            59.3%           65.2%           0

Official run of June 18, 2026, on 210 synthetic French legal documents. “Strict recall”: share of sensitive items detected with the right category (overlap ≥ 66%). “Leaks avoided”: share of sensitive characters covered by a detection, all categories combined. Same documents, same scoring rule. Each system is measured as shipped, on its default settings, with pinned versions. The full method is on the benchmark page.
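The two metrics defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's scoring code: the `Span` type and the function names are hypothetical, and "overlap" is assumed to mean the share of a reference span's characters covered by a single detection (the page does not say whether another measure, such as intersection over union, is used).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # category, e.g. "PERSON" or "IBAN"


def overlap_ratio(gold: Span, pred: Span) -> float:
    """Share of the gold span's characters covered by pred (assumed definition)."""
    shared = max(0, min(gold.end, pred.end) - max(gold.start, pred.start))
    return shared / (gold.end - gold.start)


def strict_recall(gold: list[Span], pred: list[Span], threshold: float = 0.66) -> float:
    """Share of gold spans matched by a detection with the same label and enough overlap."""
    hits = sum(
        any(p.label == g.label and overlap_ratio(g, p) >= threshold for p in pred)
        for g in gold
    )
    return hits / len(gold)


def leaks_avoided(gold: list[Span], pred: list[Span]) -> float:
    """Share of sensitive characters covered by any detection, whatever its label."""
    sensitive = {i for g in gold for i in range(g.start, g.end)}
    covered = {i for p in pred for i in range(p.start, p.end)}
    return len(sensitive & covered) / len(sensitive)


# Toy example: the name is found (80% overlap, right label);
# the IBAN is fully covered but under the wrong category.
gold = [Span(0, 10, "PERSON"), Span(20, 30, "IBAN")]
pred = [Span(0, 8, "PERSON"), Span(20, 30, "ADDRESS")]

print(strict_recall(gold, pred))  # 0.5
print(leaks_avoided(gold, pred))  # 0.9
```

The toy example shows why the two columns can diverge: a detection with the wrong category does not count toward strict recall, but it still hides the characters, so it counts toward leaks avoided.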

The real difference: where it runs

Numbers aside, the line that matters is simple. Anoni runs on your machine: no document, no excerpt leaves it, and it works offline after the first launch. An online anonymizer sends the text to a server to process it.

For a law firm or an accounting practice, that decides the question before it is asked: with local processing there is no recipient, no hosting provider and no transfer to record. With a cloud service, the text leaves the machine, so you have to check where it is processed, by whom, and under which contract.

About the systems compared

Microsoft Presidio is an open-source library from Microsoft for detecting and redacting personal data. It is general-purpose, multilingual, configurable. The numbers above use its French configuration, as shipped.

Anoni is built for French documents. It pairs detection models with rules that validate structured identifiers (NIR, SIRET, IBAN) against their checksums rather than guessing them from a pattern. The detection models are open (CamemBERT under MIT, the ONNX runtime under Apache-2.0); the application itself is not open source.
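To make "validated against their checksums" concrete, here is a minimal sketch of the three public checksum rules: Luhn for SIRET, ISO 13616 mod 97 for IBAN, and the mod 97 key for NIR. It is not Anoni's code. The function names are hypothetical, the IBAN check does not enforce per-country lengths, and the SIRET check ignores the special rule that applies to La Poste establishments.

```python
import re


def _compact(value: str) -> str:
    """Drop whitespace and uppercase letters before validating."""
    return re.sub(r"\s", "", value).upper()


def luhn_valid(digits: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def siret_valid(value: str) -> bool:
    """SIRET: 14 digits passing the Luhn check (La Poste exception not handled)."""
    s = _compact(value)
    return re.fullmatch(r"[0-9]{14}", s) is not None and luhn_valid(s)


def iban_valid(value: str) -> bool:
    """IBAN: move the first 4 characters to the end, map A-Z to 10-35, test mod 97 == 1."""
    s = _compact(value)
    if re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s) is None:
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def nir_valid(value: str) -> bool:
    """NIR: 13 characters plus a 2-digit key equal to 97 - (number mod 97)."""
    s = _compact(value)
    if re.fullmatch(r"[12][0-9]{4}([0-9]{2}|2[AB])[0-9]{8}", s) is None:
        return False
    # Corsican department codes are replaced by 19 (2A) and 18 (2B) for the key.
    body = s[:13].replace("2A", "19").replace("2B", "18")
    return 97 - int(body) % 97 == int(s[13:])


# Commonly published sample values, each with a one-digit corruption.
print(siret_valid("732 829 320 00074"))                  # True
print(siret_valid("732 829 320 00075"))                  # False
print(iban_valid("FR76 3000 6000 0112 3456 7890 189"))   # True
print(iban_valid("FR76 3000 6000 0112 3456 7890 188"))   # False
print(nir_valid("2 69 05 49 588 157 80"))                # True
print(nir_valid("2 69 05 49 588 157 81"))                # False
```

A pattern alone would accept all six strings above, since each has the right shape. The checksum rejects the three corrupted ones, which is how a rule-based detector can keep its false positives low on structured identifiers.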

And on data we didn’t write

A synthetic bench is still our bench. So Anoni is also measured on an independent corpus, ai4privacy OpenPII: real French personal data, hand-annotated, under an open license (CC-BY). 500 documents we didn’t write.

On Anoni’s French scope (people, addresses, places, emails, phone numbers, dates): 92.7% strict recall and 97.1% of sensitive characters covered. An independent measurement, on real data.

Is Anoni more accurate than Microsoft Presidio?

On 210 synthetic French legal documents, Anoni reaches 94.8% strict recall against 58.6% for Microsoft Presidio (French configuration), same documents and same scoring rule. Presidio is a general-purpose library; Anoni is tuned for French documents and validates structured identifiers (NIR, SIRET, IBAN) with their checksums.

Local tool or online anonymizer?

Anoni runs entirely on your machine: no document, no excerpt leaves it, and it works offline after the first launch. An online anonymizer sends the text to a server to process it. That is a transfer to assess before handing over a client document.

Which option fits the GDPR?

Processing locally avoids transferring personal data to a third party: there is no recipient, no hosting, no cross-border transfer to document. With a cloud service, the text leaves the machine, so you must check where it is processed, by whom and under which contract.

Are the models open?

The detection models are open: CamemBERT under MIT, the ONNX runtime under Apache-2.0, published on Hugging Face — you can see what reads your documents. The application itself is not open source and does not claim to be.

See also: benchmark · compliance · download.