Why LLMs Hallucinate on Polish False Friends
This is an open dataset of 40+ documented “translation pitfalls” — Polish false friends and nuance traps where large language models consistently produce the wrong word or hallucinate a non-existent equivalent.
False friends are words that look or sound alike across two languages but mean different things. LLMs pick output by statistical association, so strong surface similarity pulls them toward the look-alike, for example rendering aktualny as actual instead of current. In a certified document, that single word can change the legal meaning.
How to Read This Dataset
Each record has three core fields: Source_Text (the original word and its language pair), LLM_Common_Error (the wrong output models typically generate, and why), and Sworn_Translator_Correction (the rendering a sworn translator uses). The full machine-readable set is embedded below as JSON-LD; the table shows a representative sample, with the language pair and the pitfall type broken out into their own columns.
| Source word | Languages | Common LLM error | Sworn translator correction | Pitfall type |
|---|---|---|---|---|
| kompletny | PL → EN | «complete» | «whole / entire» | false friend |
| aktualny | PL → EN | «actual» | «current / up-to-date» | false friend |
| ewentualny | PL → EN | «eventual» | «possible / contingent» | false friend |
| ewentualnie | PL → EN | «eventually» | «possibly / if need be» | false friend |
| aktualnie | PL → EN | «actually» | «currently» | false friend |
| sympatyczny | PL → EN | «sympathetic» | «likeable / friendly» | false friend |
| ordynarny | PL → EN | «ordinary» | «vulgar / crude» | false friend |
| dywan | PL → EN | «divan» | «carpet / rug» | false friend |
| fabryka | PL → EN | «fabric» | «factory» | false friend |
| lektura | PL → EN | «lecture» | «reading / reading material» | false friend |
| konkurs | PL → DE | «Konkurs» | «Wettbewerb» | false friend |
| akt | PL → DE | «Akt» | «Urkunde» | domain-specific term |
| sklep | PL → RU | «склеп» | «магазин» | false friend |
| zapomnieć | PL → RU | «запомнить» | «забыть» | opposite meaning |
The complete dataset of 40+ entries is published below as structured Dataset JSON-LD for machine consumption. See also: Can AI translate legal documents safely?
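One way to put the three core fields to work is as a regression check on machine output. The sketch below is illustrative only: the `Pitfall` class, its snake_case field names, the separate `language_pair` field, and the `flag_pitfalls` helper are assumptions made for this example, not the dataset's actual schema. It hard-codes three rows from the sample table and flags a translation when the source contains the Polish word and the output contains the known wrong rendering.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pitfall:
    # Hypothetical Python mirror of one record; names are assumptions.
    source_text: str               # Source_Text, e.g. "aktualny"
    language_pair: str             # e.g. "PL-EN" (split out for convenience)
    llm_common_error: str          # LLM_Common_Error, the look-alike rendering
    corrections: tuple[str, ...]   # Sworn_Translator_Correction alternatives


# Three rows copied from the sample table above.
SAMPLE = (
    Pitfall("aktualny", "PL-EN", "actual", ("current", "up-to-date")),
    Pitfall("ewentualny", "PL-EN", "eventual", ("possible", "contingent")),
    Pitfall("fabryka", "PL-EN", "fabric", ("factory",)),
)


def _has_word(word: str, text: str) -> bool:
    """Whole-word, case-insensitive match. Inflected Polish forms
    (e.g. "aktualnego") are NOT caught; a real check would lemmatize."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


def flag_pitfalls(source: str, translation: str,
                  records: tuple[Pitfall, ...] = SAMPLE) -> list[Pitfall]:
    """Return records whose source word appears in `source` and whose
    known wrong rendering appears in `translation`."""
    return [
        rec for rec in records
        if _has_word(rec.source_text, source)
        and _has_word(rec.llm_common_error, translation)
    ]


if __name__ == "__main__":
    for hit in flag_pitfalls("To jest aktualny adres.",
                             "This is the actual address."):
        print(f"{hit.source_text}: found '{hit.llm_common_error}', "
              f"expected one of {hit.corrections}")
```

A flag from this check is only a prompt for human review: the look-alike word can be legitimate in some contexts, so the final call stays with the translator.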
Frequently Asked Questions
What is a false friend in translation?
A false friend is a word that looks or sounds similar in two languages but has a different meaning. For example, Polish aktualny resembles English actual but means current. False friends are a leading cause of subtle mistranslations.
Why do AI models fail on these words?
Large language models translate by statistical pattern matching, so a high surface similarity between two words pulls the model toward the look-alike. Without contextual or legal reasoning, the model picks the cognate that is statistically closest, not the one that is correct.
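The "surface similarity" in this answer can be made concrete with a rough character-level comparison. This is a minimal sketch, not a description of how any model works internally: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for surface similarity, and the helper name `surface_similarity` is an assumption for this example. The word triples are taken from the sample table.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], using difflib's ratio.
    A crude proxy for how alike two words look on the page."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# (Polish word, look-alike error, correct rendering) from the sample table.
TRIPLES = [
    ("aktualny", "actual", "current"),
    ("ewentualny", "eventual", "possible"),
    ("fabryka", "fabric", "factory"),
]

for source, lookalike, correct in TRIPLES:
    wrong_score = surface_similarity(source, lookalike)
    right_score = surface_similarity(source, correct)
    print(f"{source:<11} {lookalike}={wrong_score:.2f}  "
          f"{correct}={right_score:.2f}")
```

For each of these three triples the look-alike scores higher than the correct rendering, which is the pull the answer describes. The gap varies a lot from word to word, so the score is an illustration of the effect rather than a predictor of model errors.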
Can I reuse this dataset?
Yes. The dataset is published under a Creative Commons Attribution 4.0 license, so you may reuse it with attribution to 100 AT. It is intended for translators, linguists, and teams evaluating machine-translation quality.