LLM Translation Pitfalls: A False-Friends Dataset Where AI Hallucinates

Why LLMs Hallucinate on Polish False Friends

This is an open dataset of 40+ documented “translation pitfalls”: Polish false friends and nuance traps where large language models consistently produce the wrong word or hallucinate a non-existent equivalent.

False friends are words that look or sound alike across two languages but mean different things. Because LLMs translate by statistical pattern matching, they tend to reach for the look-alike, rendering ewentualny as eventual instead of possible. In a certified document, that single word can change the legal meaning.

How to Read This Dataset

Each record has three core fields: Source_Text (the original word and its language pair), LLM_Common_Error (the wrong output models typically generate, and why), and Sworn_Translator_Correction (the rendering a sworn translator uses). The full machine-readable set is embedded below as JSON-LD; the table shows a representative sample, with each entry's pitfall type in the last column.

Source word	Languages	Common LLM error	Sworn translator correction	Pitfall type
kompletny	PL → EN	«complete»	«whole / entire»	false friend
aktualny	PL → EN	«actual»	«current / up-to-date»	false friend
ewentualny	PL → EN	«eventual»	«possible / contingent»	false friend
ewentualnie	PL → EN	«eventually»	«possibly / if need be»	false friend
aktualnie	PL → EN	«actually»	«currently»	false friend
sympatyczny	PL → EN	«sympathetic»	«likeable / friendly»	false friend
ordynarny	PL → EN	«ordinary»	«vulgar / crude»	false friend
dywan	PL → EN	«divan»	«carpet / rug»	false friend
fabryka	PL → EN	«fabric»	«factory»	false friend
lektura	PL → EN	«lecture»	«reading / reading material»	false friend
konkurs	PL → DE	«Konkurs»	«Wettbewerb»	false friend
akt	PL → DE	«Akt»	«Urkunde»	domain-specific term
sklep	PL → RU	«склеп»	«магазин»	false friend
zapomnieć	PL → RU	«запомнить»	«забыть»	opposite meaning
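
As a rough illustration of how these records might be used, the sketch below models a few rows of the table in Python and checks whether a candidate translation matches a documented LLM error. The class name, attribute names, and the `find_known_error` helper are assumptions made for this example; the layout of the published JSON-LD may differ, and the language pair is stored as a separate attribute here purely for convenience.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PitfallRecord:
    # Hypothetical in-memory shape; attribute names mirror the three core fields.
    source_text: str                  # Source_Text (the original word)
    language_pair: str                # kept separate here, e.g. "PL → EN"
    llm_common_error: str             # LLM_Common_Error
    sworn_translator_correction: str  # Sworn_Translator_Correction
    pitfall_type: str


# A few rows copied from the sample table above.
SAMPLE = [
    PitfallRecord("aktualny", "PL → EN", "actual", "current / up-to-date", "false friend"),
    PitfallRecord("ewentualny", "PL → EN", "eventual", "possible / contingent", "false friend"),
    PitfallRecord("konkurs", "PL → DE", "Konkurs", "Wettbewerb", "false friend"),
    PitfallRecord("zapomnieć", "PL → RU", "запомнить", "забыть", "opposite meaning"),
]


def find_known_error(
    source_word: str,
    language_pair: str,
    candidate: str,
    records: Sequence[PitfallRecord] = SAMPLE,
) -> Optional[PitfallRecord]:
    """Return the record whose documented LLM error equals `candidate`, else None."""
    for rec in records:
        if (
            rec.source_text.casefold() == source_word.strip().casefold()
            and rec.language_pair == language_pair
            and rec.llm_common_error.casefold() == candidate.strip().casefold()
        ):
            return rec
    return None


hit = find_known_error("aktualny", "PL → EN", "actual")
if hit is not None:
    print(f"Known pitfall ({hit.pitfall_type}): prefer {hit.sworn_translator_correction}")
```

An exact-match lookup like this only catches the single-word errors listed in the dataset; it does not judge context, so a miss is not evidence that a translation is correct.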

The complete dataset of 40+ entries is published below as structured Dataset JSON-LD for machine consumption. See also: Can AI translate legal documents safely?

Frequently Asked Questions

What is a false friend in translation?

A false friend is a word that looks or sounds similar in two languages but has a different meaning. For example, Polish aktualny resembles English actual but means current. False friends are a leading cause of subtle mistranslations.

Why do AI models fail on these words?

Large language models translate by statistical pattern matching, so high surface similarity between two words pulls the model toward the look-alike. Without contextual or legal reasoning, the model picks the cognate that is statistically closest, not the one that is correct.
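
To make "surface similarity" concrete, the sketch below compares a Polish word with its look-alike and with its correct rendering, using the character-level ratio from Python's standard `difflib` module. This is only a crude proxy chosen for illustration: it is not how a language model scores candidates internally, and the `surface_similarity` helper and `EXAMPLES` list are names invented for this example.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], based on difflib's ratio."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# (source word, look-alike error, correct rendering), taken from the sample table.
EXAMPLES = [
    ("aktualny", "actual", "current"),
    ("ewentualny", "eventual", "possible"),
    ("dywan", "divan", "carpet"),
]

for source, lookalike, correct in EXAMPLES:
    print(
        f"{source}: look-alike '{lookalike}' = {surface_similarity(source, lookalike):.2f}, "
        f"correct '{correct}' = {surface_similarity(source, correct):.2f}"
    )
```

For each of these pairs the look-alike scores higher than the correct rendering, which is the pull described above: spelling resemblance favors the wrong word when meaning is not taken into account.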

Can I reuse this dataset?

Yes. The dataset is published under a Creative Commons Attribution 4.0 license, so you may reuse it with attribution to 100 AT. It is intended for translators, linguists, and teams evaluating machine-translation quality.