Question 1

Is Anoni more accurate than Microsoft Presidio?

Accepted Answer

On a benchmark of 210 synthetic French legal documents, Anoni reaches 94.8% strict recall, against 58.6% for Microsoft Presidio in its French configuration, measured on the same documents with the same scoring rule. Presidio is a general-purpose library; Anoni is tuned for French documents and validates structured identifiers (NIR, SIRET, IBAN) against their checksums.
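To make the checksum point concrete, here is a minimal sketch of how such identifiers can be validated. It uses the publicly documented rules (Luhn for SIRET, mod 97 for IBAN, a mod-97 key for NIR); the function names are hypothetical and this is not Anoni's or Presidio's actual code. It is deliberately simplified: it ignores known special cases such as temporary NIRs or country-specific IBAN lengths.

```python
import re


def _clean(value: str) -> str:
    """Drop whitespace and upper-case, so spaced and compact forms compare equal."""
    return re.sub(r"\s+", "", value).upper()


def siret_ok(value: str) -> bool:
    """SIRET: 14 digits that pass the Luhn check."""
    digits = _clean(value)
    if not re.fullmatch(r"\d{14}", digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def iban_ok(value: str) -> bool:
    """IBAN: move the first 4 characters to the end, map A-Z to 10-35, check mod 97 == 1."""
    iban = _clean(value)
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    number = int("".join(str(int(ch, 36)) for ch in rearranged))
    return number % 97 == 1


def nir_ok(value: str) -> bool:
    """NIR: 13-character body plus a 2-digit key equal to 97 - (body mod 97)."""
    nir = _clean(value)
    if not re.fullmatch(r"[12]\d{4}(\d{2}|2[AB])\d{6}\d{2}", nir):
        return False
    body, key = nir[:13], int(nir[13:])
    # Corsican departments: 2A is computed as 19 and 2B as 18.
    body = body.replace("2A", "19").replace("2B", "18")
    return 97 - int(body) % 97 == key


if __name__ == "__main__":
    print(siret_ok("732 829 320 00074"))
    print(iban_ok("FR14 2004 1010 0505 0001 3M02 606"))
    print(nir_ok("2 69 05 49 588 157 80"))
```

A detector that applies checks like these can reject a digit string that merely looks like an identifier, which is one plausible way a rule layer keeps false positives low.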

Question 2

Local tool or online anonymizer?

Accepted Answer

Anoni runs entirely on your machine: no document or excerpt ever leaves it, and it works offline after the first launch. An online anonymizer sends the text to a server for processing, and that transfer is something to assess before you hand over a client document.

Question 3

Which option fits the GDPR?

Accepted Answer

Processing locally avoids transferring personal data to a third party: there is no recipient, no hosting, no cross-border transfer to document. With a cloud service, the text leaves the machine, so you must check where it is processed, by whom and under which contract.

Question 4

Are the models open?

Accepted Answer

The detection models are open: CamemBERT under MIT, the ONNX runtime under Apache-2.0, published on Hugging Face, so you can see what reads your documents. The application itself is not open source and does not claim to be.

System	Strict recall	Leaks avoided	False positives (count)
Anoni (models + rules)	94.8%	99.0%	73
openai/privacy-filter (raw)	67.8%	91.9%	217
Microsoft Presidio (fr)	58.6%	77.1%	356
AInonymizer	64.4%	67.9%	143
Anoni (rules only)	59.3%	65.2%	0
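The scoring rule behind "strict recall" is not spelled out here. As an illustration only, the sketch below assumes the common definition: a gold entity counts as found only when a predicted span has exactly the same start, end, and label. The `Span` type and `strict_recall` function are hypothetical names, not the benchmark's actual code.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    label: str  # entity type, e.g. "PERSON" or "IBAN"


def strict_recall(gold: set[Span], predicted: set[Span]) -> float:
    """Share of gold entities matched exactly (same offsets and same label)."""
    if not gold:
        return 1.0  # assumption: nothing to find counts as perfect recall
    return len(gold & predicted) / len(gold)


gold = {Span(0, 11, "PERSON"), Span(20, 34, "SIRET"), Span(40, 67, "IBAN")}
predicted = {
    Span(0, 11, "PERSON"),  # exact match
    Span(20, 34, "SIRET"),  # exact match
    Span(41, 67, "IBAN"),   # off by one character: not counted under strict matching
}
score = strict_recall(gold, predicted)
```

Under this assumed rule, a partially masked entity scores as a miss, which makes strict recall a more demanding figure than an overlap-based one.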

Local or cloud: the same test.

