ISO 639 Language Code Validator

Validate ISO 639-1 / ISO 639-2 language codes and resolve the language name.

ISO 639: the language code family

ISO 639 is the international standard that catalogs identifiers for human languages. It is the foundation of every internationalization pipeline: HTML lang attributes, browser Accept-Language headers, app store localized listings, gettext catalogs, Unicode CLDR locale data, machine-translation routing and search engine hreflang tags all start from an ISO 639 subtag. Unlike ISO 3166 (countries) or ISO 4217 (currencies), ISO 639 is split into several parts because the linguistic landscape is far more granular than the political one.

The three main variants

  • ISO 639-1: two lowercase letters (pt, en, es, de, fr, ja, zh). Covers about 184 of the world's most widely spoken languages. This is what you almost always want in web contexts.
  • ISO 639-2: three letters (por, eng, spa, deu/ger, fra/fre, jpn, zho/chi). Comes in two flavors: bibliographic (used by libraries, MARC records) and terminologic (used by linguists). Most languages have a single code; about twenty have both forms, written here as terminologic/bibliographic.
  • ISO 639-3: three letters (por, eng, cmn for Mandarin, yue for Cantonese, bzs for Brazilian Sign Language). Designed to cover every known human language, living or extinct (currently around 7,800 entries). Its macrolanguage concept lets zho act as an umbrella for cmn, yue, wuu and dozens of other Chinese languages.
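Because the same language can appear as a two-letter code, a terminologic three-letter code or a bibliographic three-letter code, a validator usually normalizes all of them to one form. The sketch below is a minimal illustration that uses only the codes quoted in the list above; the table, the `toPart1` helper and its field names are hypothetical, and a real tool would load the full registry instead.

```javascript
// Illustrative sample only: the handful of codes quoted in the list above.
// Keys are ISO 639-2 terminologic codes; values carry the ISO 639-1 code
// and, where one exists, the bibliographic alternative.
const SAMPLE_CODES = {
  por: { part1: 'pt' },
  eng: { part1: 'en' },
  spa: { part1: 'es' },
  deu: { part1: 'de', bibliographic: 'ger' },
  fra: { part1: 'fr', bibliographic: 'fre' },
  jpn: { part1: 'ja' },
  zho: { part1: 'zh', bibliographic: 'chi' },
};

// Reverse index so that 'de', 'deu' and 'ger' all resolve to the same record.
const BY_ANY_CODE = new Map();
for (const [terminologic, entry] of Object.entries(SAMPLE_CODES)) {
  const record = { terminologic, ...entry };
  BY_ANY_CODE.set(terminologic, record);
  BY_ANY_CODE.set(entry.part1, record);
  if (entry.bibliographic) BY_ANY_CODE.set(entry.bibliographic, record);
}

// Hypothetical helper: returns the two-letter code for any known form,
// or null when the code is not in the sample table.
function toPart1(code) {
  const record = BY_ANY_CODE.get(String(code).trim().toLowerCase());
  return record ? record.part1 : null;
}

console.log(toPart1('ger')); // de
console.log(toPart1('FRA')); // fr
console.log(toPart1('cmn')); // null (Mandarin has no entry in this sample)
```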

There is also ISO 639-5 for language families (e.g. sla Slavic, roa Romance) and ISO 639-4 documenting the principles, but they rarely show up in application code.

BCP 47: the practical web format

On the web you almost never see a bare ISO 639 code. You see a BCP 47 language tag (RFC 5646), which composes ISO 639 with ISO 15924 (scripts), ISO 3166-1 (regions) and variant / extension subtags. Examples:

  • pt-BR: Brazilian Portuguese (language + region).
  • pt-PT: European Portuguese.
  • zh-Hans-CN: Simplified Chinese as written in China (language + script + region).
  • zh-Hant-TW: Traditional Chinese as written in Taiwan.
  • sr-Cyrl / sr-Latn: Serbian in Cyrillic or Latin script.
  • en-US-x-private: private-use subtag (everything after x-).

Subtag order: language - script - region - variant - extension - private use. Case is a convention only (lowercase language, Titlecase script, UPPERCASE region); tags are compared case-insensitively.
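The subtag order and case conventions can be checked with the Intl.Locale class built into modern browsers and Node.js. The sketch below is one possible approach; the `describeTag` helper and the shape of its return value are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: split a BCP 47 tag into its main subtags using the
// built-in Intl.Locale parser. Returns null for malformed tags.
function describeTag(tag) {
  try {
    const locale = new Intl.Locale(tag);
    return {
      canonical: locale.baseName, // case-normalized tag without extensions
      language: locale.language,
      script: locale.script,      // undefined when the tag has no script
      region: locale.region,      // undefined when the tag has no region
    };
  } catch (err) {
    // Intl.Locale signals a structurally invalid tag with a RangeError.
    if (err instanceof RangeError) return null;
    throw err;
  }
}

console.log(describeTag('ZH-hans-cn').canonical); // zh-Hans-CN
console.log(describeTag('sr-Latn').script);       // Latn
console.log(describeTag('pt_BR'));                // null (underscore is not a valid separator)
```

Input case does not matter to the parser; the canonical form simply restores the conventional casing.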

Where ISO 639 shows up in the stack

  • HTML: <html lang="pt-BR"> declares document language for screen readers, browsers and crawlers.
  • HTTP: Accept-Language: pt-BR,pt;q=0.9,en;q=0.8 negotiates content language with quality values.
  • SEO: <link rel="alternate" hreflang="pt-BR" href="..."> tells Google which version to serve.
  • i18n libraries: react-intl, FormatJS, i18next, vue-i18n all key on BCP 47 tags.
  • Unicode CLDR: the locale data behind Intl is keyed by Unicode locale identifiers, which are compatible with BCP 47.
  • App stores: localized descriptions are uploaded per BCP 47 locale.
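As an illustration of the HTTP case, the header value shown above can be turned into an ordered preference list. This is a simplified sketch, not a full implementation of HTTP content negotiation: the `parseAcceptLanguage` name is hypothetical, and treating an unparseable q value as 0 is an assumption made here for brevity.

```javascript
// Hypothetical helper: parse an Accept-Language header value into a list of
// { tag, q } entries, highest quality first. A missing q defaults to 1.
function parseAcceptLanguage(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      // Assumption: an unparseable q value is treated as 0 (not acceptable).
      return { tag: tag.trim(), q: Number.isNaN(q) ? 0 : q, index };
    })
    // Drop empty entries and anything marked q=0.
    .filter((entry) => entry.tag !== '' && entry.q > 0)
    // Sort by quality, keeping header order for equal weights.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map(({ tag, q }) => ({ tag, q }));
}

const preferences = parseAcceptLanguage('pt-BR,pt;q=0.9,en;q=0.8');
console.log(preferences.map((p) => p.tag).join(' > ')); // pt-BR > pt > en
```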

Sign and constructed languages

ISO 639-3 covers sign languages and even constructed languages. A few you can validate:

  • bzs: Língua Brasileira de Sinais (Libras).
  • ase: American Sign Language (ASL).
  • bfi: British Sign Language (BSL).
  • epo: Esperanto (eo in ISO 639-1).
  • tlh: Klingon, registered in ISO 639-3 (yes, the Star Trek language).
  • sjn: Sindarin, Tolkien's Elvish language.
  • qaa to qtz: range reserved for local / private use.
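The local-use range is contiguous, so a validator can flag it with a plain string comparison before consulting any registry. A minimal sketch, with `isLocalUseCode` as a hypothetical helper name:

```javascript
// Hypothetical helper: true when the code falls in the qaa..qtz block that
// ISO 639 reserves for local use. Lowercase three-letter strings sort
// alphabetically, so an ordinary string comparison covers the whole range.
function isLocalUseCode(code) {
  const normalized = String(code).toLowerCase();
  return /^[a-z]{3}$/.test(normalized) && normalized >= 'qaa' && normalized <= 'qtz';
}

console.log(isLocalUseCode('qaa')); // true
console.log(isLocalUseCode('qtz')); // true
console.log(isLocalUseCode('qya')); // false (outside the range)
console.log(isLocalUseCode('bzs')); // false
```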

Libraries and language detection

  • JavaScript: iso-639-1, langs, bcp-47 on npm.
  • Python: pycountry, babel, langcodes.
  • Detection: franc (npm), Google CLD3 (cld3 Python bindings), Microsoft Recognizers, fasttext language models.
  • Glibc / ICU: system locales like pt_BR.UTF-8 blend ISO 639-1 + ISO 3166-1 + encoding.
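System locale strings such as pt_BR.UTF-8 often need converting to BCP 47 before they can be passed to web APIs. The sketch below shows one simplified way to do that with the built-in Intl.getCanonicalLocales; the `posixToBcp47` name is hypothetical, and discarding the modifier (the part after @) is an assumption that loses information for locales where it encodes a script.

```javascript
// Hypothetical helper: convert a POSIX-style locale name to a BCP 47 tag.
// Returns null for the "C"/"POSIX" locale and for anything that does not
// form a valid tag.
function posixToBcp47(posixLocale) {
  // Drop the codeset (".UTF-8") and any modifier ("@euro").
  const base = posixLocale.split('.')[0].split('@')[0];
  if (base === '' || base === 'C' || base === 'POSIX') return null;
  try {
    // Underscores become hyphens; getCanonicalLocales fixes the casing
    // and throws a RangeError on structurally invalid input.
    return Intl.getCanonicalLocales(base.replace(/_/g, '-'))[0];
  } catch (err) {
    if (err instanceof RangeError) return null;
    throw err;
  }
}

console.log(posixToBcp47('pt_BR.UTF-8')); // pt-BR
console.log(posixToBcp47('de_DE@euro'));  // de-DE
console.log(posixToBcp47('C.UTF-8'));     // null
```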

Example with iso-639-1:

// Requires the iso-639-1 package from npm (npm install iso-639-1).
const ISO6391 = require('iso-639-1')
ISO6391.getName('pt')          // "Portuguese"
ISO6391.getNativeName('pt')    // "Português"
ISO6391.validate('xx')         // false
ISO6391.getAllCodes().length   // 184

Brazilian Portuguese vs European Portuguese

Both share the ISO 639-1 code pt, but BCP 47 distinguishes them by region: pt-BR and pt-PT. The 1990 Orthographic Agreement (Acordo Ortográfico) reduced spelling differences, but vocabulary, grammar and pronunciation still diverge. For localization, always use the region-tagged form: pt alone is ambiguous, and a search engine such as Google may serve the wrong variant.

FAQ

Should I use pt or pt-BR?

For a Brazilian site, use pt-BR. The bare pt is generic and search engines or screen readers may apply European Portuguese defaults. For an hreflang matrix, declare each regional variant.
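A small generator keeps an hreflang matrix consistent across pages. The sketch below is illustrative only: `buildHreflangLinks` is a hypothetical helper, the example.com URLs are placeholders, and the hrefs are assumed to be trusted strings that need no HTML escaping.

```javascript
// Hypothetical helper: build one <link rel="alternate"> tag per regional
// variant. `variants` maps a BCP 47 tag to the URL of that version.
function buildHreflangLinks(variants) {
  return Object.entries(variants).map(([tag, href]) => {
    // Normalize casing (pt-br becomes pt-BR); throws on malformed tags.
    const [canonical] = Intl.getCanonicalLocales(tag);
    return `<link rel="alternate" hreflang="${canonical}" href="${href}">`;
  });
}

const links = buildHreflangLinks({
  'pt-br': 'https://example.com/pt-br/',
  'pt-pt': 'https://example.com/pt-pt/',
});
console.log(links.join('\n'));
```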

ISO 639-1, -2, or -3: which do I pick?

ISO 639-1 for everyday web work: BCP 47 uses the two-letter code whenever one exists and falls back to a three-letter code only when it does not. ISO 639-2 for libraries, MARC and government archives. ISO 639-3 when you need to identify rare, regional or sign languages that 639-1 does not cover.

Do sign languages have ISO codes?

Yes. ISO 639-3 covers them: bzs for Libras, ase for ASL, bfi for BSL and many others. ISO 639-1 does not, since it lists only about 184 of the most widespread spoken languages.

Is Klingon really an ISO 639-3 code?

Yes: tlh. The ISO 639-3 registry includes constructed languages with documented vocabularies. Sindarin (sjn) and Quenya (qya), both from Tolkien, are also listed alongside Esperanto (eo / epo) and Volapük (vo / vol).
