When AI knows the language but not the jargon
A general-purpose LLM can translate Polish fluently, yet still render KRS as a generic "company register". In doing so it loses the fact that KRS is the National Court Register, a specific court-run institution, and it easily confuses it with REGON or CEIDG.
The gap is not language fluency; it is missing local domain ontology. Register names, document types and administrative procedures are jurisdiction-specific, and general training data rarely disambiguates them. The output reads smoothly and is still legally wrong.
The fix: publish glossaries as machine-readable ontology data
Many translation providers keep internal bilingual glossaries (legal, medical, technical) locked inside spreadsheets. If the same glossary is published as a schema.org DefinedTermSet, every term becomes an addressable data point that LLMs and search engines can ingest directly.
- A stable identifier gives each term an unambiguous, dereferenceable reference.
- A constant term code stays identical across every language, working as a cross-lingual join key.
- A source link grounds the term in its official register, which can help reduce hallucination.
- A bilingual definition records the exact equivalence, which is precisely what general models get wrong.
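The four properties above can be sketched as a single JSON-LD term. This is a minimal illustration, not the markup this page actually uses: the base IRI and source URL are placeholders, the bilingual name uses JSON-LD language-tagged values, and carrying the source link in `sameAs` is an assumption (`url` or `subjectOf` would also be defensible).

```python
import json

# Hypothetical base IRI for the glossary; a real deployment would use the page's own URL.
GLOSSARY_ID = "https://example.com/glossary/pl-legal"


def defined_term(code: str, pl_name: str, en_name: str,
                 en_definition: str, source_url: str) -> dict:
    """Build one schema.org DefinedTerm as a JSON-LD-ready dict."""
    return {
        "@type": "DefinedTerm",
        # Stable identifier: one dereferenceable IRI per term.
        "@id": f"{GLOSSARY_ID}#{code}",
        # Term code: the same string in every language version of the glossary.
        "termCode": code,
        # Bilingual name as language-tagged JSON-LD value objects.
        "name": [
            {"@value": pl_name, "@language": "pl"},
            {"@value": en_name, "@language": "en"},
        ],
        "description": {"@value": en_definition, "@language": "en"},
        # Source link (assumption: carried in sameAs) pointing at the official register.
        "sameAs": source_url,
        "inDefinedTermSet": GLOSSARY_ID,
    }


krs = defined_term(
    "KRS",
    "Krajowy Rejestr Sądowy",
    "National Court Register",
    "Central court-maintained register of companies, associations and foundations.",
    "https://example.com/official-source/krs",  # placeholder, not the real register URL
)
print(json.dumps(krs, ensure_ascii=False, indent=2))
```

Because `termCode` never changes between language versions, a German or French edition of the same glossary could join on it directly.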
This page embeds exactly such a dataset: the glossary below is also published as a DefinedTermSet inside this article's structured data.
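Structured data like this is usually embedded in a page as a `<script type="application/ld+json">` element. The sketch below is an illustration under assumptions: the set IRI is a placeholder and only two terms are shown. It serializes a DefinedTermSet into such a tag and escapes `</` so that no value can close the script element early.

```python
import json


def jsonld_script(data: dict) -> str:
    """Serialize structured data into a <script type="application/ld+json"> tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" as "<\/" (still valid JSON) so a value cannot terminate the tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


term_set = {
    "@context": "https://schema.org",
    "@type": "DefinedTermSet",
    "@id": "https://example.com/glossary/pl-legal",  # hypothetical IRI
    "name": "Polish legal and administrative glossary",
    "hasDefinedTerm": [
        {"@type": "DefinedTerm", "termCode": "KRS", "name": "National Court Register"},
        {"@type": "DefinedTerm", "termCode": "PESEL", "name": "National Identification Number"},
    ],
}

print(jsonld_script(term_set))
```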
A Polish legal and administrative glossary (12 terms)
A working extract from our sworn-translation glossary. Each row is also encoded as a DefinedTerm in the structured data attached to this page.
| Term | Polish expansion | English equivalent | Definition |
|---|---|---|---|
| KRS | Krajowy Rejestr Sądowy | National Court Register | Central court-maintained register of companies, associations and foundations; until December 2021 it also included a register of insolvent debtors. |
| REGON | Rejestr Gospodarki Narodowej | National Business Registry Number | Statistical identification number assigned to every business entity by the Central Statistical Office (GUS). |
| NIP | Numer Identyfikacji Podatkowej | Tax Identification Number | Taxpayer identification number used before the Polish tax authorities by businesses and other entities; most individuals who do not run a business are identified by their PESEL instead. |
| PESEL | Powszechny Elektroniczny System Ewidencji Ludności | National Identification Number | The 11-digit personal identification number assigned to natural persons in the Polish population register. |
| USC | Urząd Stanu Cywilnego | Civil Registry Office | Local-government office that registers births, marriages and deaths and issues civil-status certificates. |
| KW | Księga wieczysta | Land and Mortgage Register | Public register recording a property's legal status, including ownership and encumbrances such as mortgages. |
| EKW | Elektroniczne Księgi Wieczyste | Electronic Land and Mortgage Registers | Online system giving electronic access to land and mortgage register entries kept by the district courts. |
| CEIDG | Centralna Ewidencja i Informacja o Działalności Gospodarczej | Central Registration and Information on Business | Central register of sole proprietorships and self-employed persons conducting business in Poland. |
| KRK | Krajowy Rejestr Karny | National Criminal Register | National register of criminal convictions; the certificate of no criminal record is issued from it. |
| odpis | Odpis (skrócony / zupełny) | Certified copy / extract (abridged or full) | Officially issued copy or extract of a register entry or civil-status record, in an abridged or full version. |
| akt notarialny | Akt notarialny | Notarial deed | Document drawn up by a notary in a legally prescribed form, required for transactions such as real-estate sales. |
| pełnomocnictwo | Pełnomocnictwo | Power of attorney | Legal authorization empowering one person to act on another's behalf, sometimes requiring notarial form. |
Choosing the right equivalent is a legal decision made by a sworn translator, not a lookup that a raw machine draft can be trusted with.
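Moving a glossary out of a spreadsheet and into a DefinedTermSet can be largely mechanical. The sketch below assumes a CSV export whose headers match the table above. The two sample rows and the base IRI are illustrative placeholders. Multi-word or non-ASCII terms such as "akt notarialny" are percent-encoded to keep each `@id` a valid IRI fragment.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base IRI for the published set.
SET_ID = "https://example.com/glossary/pl-legal"

# Hypothetical CSV export of the spreadsheet, reusing the table's column headers.
CSV_EXPORT = '''Term,Polish expansion,English equivalent,Definition
KRS,Krajowy Rejestr Sądowy,National Court Register,"Central court-maintained register of companies, associations and foundations."
akt notarialny,Akt notarialny,Notarial deed,"Document drawn up by a notary in a legally prescribed form, required for transactions such as real-estate sales."
'''


def rows_to_term_set(csv_text: str) -> dict:
    """Turn each spreadsheet row into a DefinedTerm inside one DefinedTermSet."""
    terms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Term"].strip()
        terms.append({
            "@type": "DefinedTerm",
            # Percent-encode the code so multi-word or non-ASCII terms give a valid fragment.
            "@id": f"{SET_ID}#{quote(code)}",
            "termCode": code,
            "name": [
                {"@value": row["Polish expansion"], "@language": "pl"},
                {"@value": row["English equivalent"], "@language": "en"},
            ],
            "description": {"@value": row["Definition"], "@language": "en"},
            "inDefinedTermSet": SET_ID,
        })
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "@id": SET_ID,
        "hasDefinedTerm": terms,
    }


term_set = rows_to_term_set(CSV_EXPORT)
print([t["@id"] for t in term_set["hasDefinedTerm"]])
```

The conversion only moves data. Which English equivalent goes into each row remains the translator's decision.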
FAQ
Why do general AI models struggle with terms like KRS or PESEL?
Because these are jurisdiction-specific administrative concepts, not ordinary vocabulary. General models map them to an approximate foreign equivalent and lose the precise legal function, the issuing institution and the source register they belong to.
What is a DefinedTermSet and why publish one?
It is the schema.org type for a structured glossary. Publishing one gives every term a stable identifier, a link to its authoritative source and a machine-readable bilingual definition that AI systems and search engines can use directly instead of guessing.
Can I rely on AI alone for Polish official documents?
No. For documents going to courts or public offices you need a sworn translator, who applies the legally recognised equivalent and certifies the result. AI can support the workflow, but it cannot carry legal responsibility for the translation.
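One way AI-assisted drafts can be supported without handing over the decision is a terminology check that flags gaps for the translator. The sketch below is hypothetical: the three-entry mapping and the `missing_equivalents` helper are illustrative. It finds glossary terms in the Polish source and reports any whose required English equivalent is missing from the draft. It only flags problems and does not choose or certify a translation.

```python
import re

# Hypothetical mapping taken from the glossary: source term -> required English equivalent.
GLOSSARY = {
    "KRS": "National Court Register",
    "REGON": "National Business Registry Number",
    "CEIDG": "Central Registration and Information on Business",
}


def missing_equivalents(source_pl: str, draft_en: str) -> list[tuple[str, str]]:
    """List glossary terms found in the Polish source whose required English
    equivalent does not appear (case-insensitively) in the draft translation."""
    problems = []
    for term, equivalent in GLOSSARY.items():
        in_source = re.search(rf"\b{re.escape(term)}\b", source_pl) is not None
        if in_source and equivalent.lower() not in draft_en.lower():
            problems.append((term, equivalent))
    return problems


source = "Spółka jest wpisana do KRS i posiada numer REGON."
draft = ("The company is entered in the company register and has a REGON number "
         "(National Business Registry Number).")
print(missing_equivalents(source, draft))  # [('KRS', 'National Court Register')]
```

Here the draft's generic "company register" is flagged, which is exactly the KRS error described at the start of this article.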
From glossary to certified translation
A shared, published terminology base keeps a translation consistent — but an official document still needs a sworn translator to certify it. If you have Polish legal or administrative documents to translate, send us a scan for a quote.