PT Letter Frequency Zipf
Shows the typical letter frequencies of Brazilian Portuguese and how the top-ranked letters compare with Zipf's law.
Letter Frequency in Portuguese (Brazilian) and Zipf’s Law
In a natural language, the frequency of a token tends to fall off in inverse proportion to its rank. That is Zipf’s law: f(k) ≈ C / k^s, where k is the rank, s the exponent (close to 1 for words and lower for letters) and C a normalization constant. George Kingsley Zipf described it for words in “Human Behavior and the Principle of Least Effort” (1949), but the same inverse-rank shape gives a decent approximation of how letters are distributed too.
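As a minimal sketch of the formula above, the following Python computes f(k) = C / k^s for ranks 1 to n, choosing C so the frequencies sum to 1. The function name `zipf_frequencies` and the choice of n = 26 are illustrative assumptions, not part of the original tool.

```python
def zipf_frequencies(n: int, s: float = 1.0) -> list[float]:
    """Return normalized Zipf frequencies f(k) = C / k**s for ranks 1..n."""
    weights = [1.0 / k**s for k in range(1, n + 1)]
    c = 1.0 / sum(weights)  # C is chosen so the frequencies sum to 1
    return [c * w for w in weights]


if __name__ == "__main__":
    # Print the first five ranks of an idealized 26-symbol alphabet with s = 1
    for rank, f in enumerate(zipf_frequencies(26, s=1.0)[:5], start=1):
        print(rank, f"{f:.3f}")
```

Lowering `s` flattens the curve, which is the direction you would expect to move in when modelling letters rather than words.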
In Brazilian Portuguese the most common letters come out roughly as A (14%), E (12%), O (10%), S (8%), R (6%), I (6%), N (5%), D (5%), M (5%), T (4%). Vowels dominate, which says a lot about Portuguese phonotactics and sets it apart from consonant-heavy languages like Czech or Polish. The exact counts shift depending on whether you sample news, fiction or technical text, but the ranking barely moves.
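To check figures like these against your own corpus, a sketch along the following lines may help. It assumes accented letters should be folded into their base letters (á counts as a, ç as c), which is one common convention but not the only one; the function name `letter_frequencies` is hypothetical.

```python
import unicodedata
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each letter a-z, with accents folded (á -> a, ç -> c)."""
    # NFD splits an accented letter into its base letter plus a combining mark
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Keep only the base letters a-z; combining marks, digits and spaces are dropped
    letters = [ch for ch in decomposed if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() yields letters in descending order of count
    return {letter: n / total for letter, n in counts.most_common()}
```

Calling `letter_frequencies(open("corpus.txt", encoding="utf-8").read())` on a reasonably large text (the file name here is a placeholder) should give a ranking comparable to the one above, with the exact percentages depending on the genre sampled.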
This empirical distribution is what makes classical substitution-cipher cryptanalysis work: counting letters in the ciphertext and matching the most frequent symbols to the most frequent plaintext letters lets you recover the substitution alphabet. It also informs guessing strategy in word games like Hangman (Forca in Portuguese), and it sits behind compression schemes from information theory such as Huffman coding, where common letters get shorter binary codes and rare ones get longer codes so the total bit count stays small.
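The Huffman idea can be sketched in a few lines of Python. The example below uses only the ten rounded percentages quoted above as a toy alphabet (the remaining letters are left out, so the code lengths are illustrative rather than what a real compressor would assign), and the names `TOP_TEN` and `huffman_code_lengths` are assumptions made for this sketch.

```python
import heapq

# Rounded top-ten percentages quoted in the text; the other letters are omitted,
# so this is a toy alphabet rather than full Portuguese.
TOP_TEN = {"A": 14, "E": 12, "O": 10, "S": 8, "R": 6,
           "I": 6, "N": 5, "D": 5, "M": 5, "T": 4}


def huffman_code_lengths(weights: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length in bits for each symbol."""
    # Heap entries: (total weight, unique tie-breaker, symbols in this subtree)
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    tie = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths


if __name__ == "__main__":
    for sym, bits in huffman_code_lengths(TOP_TEN).items():
        print(sym, bits)
```

With these weights, A ends up with a shorter code than T, which is the behavior the paragraph describes.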
Applications
You see it in classical cryptanalysis (Caesar, Vigenère, monoalphabetic substitution); in entropy coding for text compression, such as arithmetic coding and the Huffman stage used by gzip and bzip2; in OCR error correction and language identification; in keyboard layout work (the BR-Nativo layout was designed around PT-BR letter frequencies); in solvers for word games (Wordle/Termo, Scrabble); and in computational linguistics generally.
FAQ
Why is “A” the most common Portuguese letter? Portuguese leans hard on the vowel /a/: feminine endings (-a), -ar verb conjugations, the -ava imperfect, and articles like a and as all pile it up. English peaks at “E” for much the same kind of morphological reason.
Does Zipf’s law fit letters perfectly? Not as well as it fits words. Because the alphabet is small and tightly constrained, letters spread out in a flatter curve. The inverse-rank intuition still holds, but in practice an exponential or shifted-Zipf model matches the data more closely.
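One rough way to see the flatter curve is to estimate the exponent s by least squares on a log-log scale, since f(k) = C / k^s becomes a straight line with slope -s. The sketch below does this for the ten rounded percentages quoted earlier; the function name `fit_zipf_exponent` is illustrative, and ten rounded values are far too few for a serious fit.

```python
import math

# Rounded percentages for the ten most common letters, as quoted in the text
TOP_TEN_PERCENT = [14, 12, 10, 8, 6, 6, 5, 5, 5, 4]


def fit_zipf_exponent(freqs: list[float]) -> float:
    """Estimate s in f(k) = C / k**s by a least-squares line through (log k, log f)."""
    xs = [math.log(k) for k in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is -s, so flip the sign
    return -sxy / sxx


if __name__ == "__main__":
    print(f"estimated s = {fit_zipf_exponent(TOP_TEN_PERCENT):.2f}")
```

On these values the estimate comes out well below 1, in line with the flatter curve described above.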
How does PT-BR differ from PT-PT in frequencies? Hardly at all: usually under a percentage point per letter, and what gap there is comes down to spelling reforms and vocabulary preferences. The five vowels rank in roughly the same order in both varieties.