SSML Builder (Speech)
Build SSML (Speech Synthesis Markup Language) documents compatible with Alexa, Google and Polly, using break, prosody, emphasis, phoneme and voice tags.
SSML — the markup language that makes synthetic voices sound human
SSML (Speech Synthesis Markup Language) is a W3C standard (version 1.1 was published in 2010) for telling a text-to-speech engine how to read a string. Plain text gives the engine a single signal (the words); SSML adds pauses, emphasis, pitch, rate, phonemes, character spelling, dates, currencies, and substitutions. It is the difference between a contact-centre bot that stumbles over "Dr. R. Silva, account A123" and one that reads it as "Doctor Reginaldo Silva, account A-one-two-three".
Anatomy of an SSML document
Every document is wrapped in a <speak> root element. Inside, you mix plain text with tags that modify the surrounding speech:
<speak>
Welcome to <emphasis level="strong">Amazon</emphasis>.
Please wait <break time="500ms"/> while I connect you.
<prosody rate="slow" pitch="+2st">Slowly and high.</prosody>
Your account number is <say-as interpret-as="characters">A123</say-as>.
</speak>
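The sample document above can also be assembled programmatically, which avoids unbalanced tags. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_welcome_ssml is hypothetical and the text is collapsed onto one line for simplicity.

```python
import xml.etree.ElementTree as ET


def build_welcome_ssml() -> str:
    """Rebuild the sample document element by element (illustrative helper)."""
    speak = ET.Element("speak")
    speak.text = "Welcome to "

    emphasis = ET.SubElement(speak, "emphasis", level="strong")
    emphasis.text = "Amazon"
    emphasis.tail = ". Please wait "  # .tail is the text that follows the closing tag

    pause = ET.SubElement(speak, "break", time="500ms")
    pause.tail = " while I connect you. "

    prosody = ET.SubElement(speak, "prosody", rate="slow", pitch="+2st")
    prosody.text = "Slowly and high."
    prosody.tail = " Your account number is "

    # Attribute names containing a hyphen must be passed as a dict
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "characters"})
    say_as.text = "A123"
    say_as.tail = "."

    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(speak, encoding="unicode")


ssml = build_welcome_ssml()
# Parsing the result again raises ParseError if the markup is not well-formed
ET.fromstring(ssml)
print(ssml)
```

Building a tree rather than concatenating strings means special characters such as & and < in the spoken text are escaped automatically.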
The tags you will actually use
- <break time="500ms"/>: insert a pause; accepts ms, s, or strength weak|medium|strong|x-strong.
- <emphasis level="strong">: stress a word. Most engines support strong, moderate, reduced.
- <prosody rate="slow" pitch="+2st" volume="loud">: fine-grained speed, pitch (semitones or %), and volume control.
- <say-as interpret-as="characters|digits|date|time|currency|telephone">: force interpretation. "A123" reads as "A-one-two-three" with characters; "2025-12-31" reads as a date with date.
- <phoneme alphabet="ipa" ph="ˈnaɪki">Nike</phoneme>: custom pronunciation in IPA or X-SAMPA.
- <sub alias="Doctor">Dr.</sub>: substitute spoken form. Universal across providers.
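If you generate these tags from application data, small helper functions keep the escaping in one place. The sketch below is one possible shape, not a published API: the names pause, say_as, sub and speak are hypothetical, and only the standard-library escape and quoteattr functions are assumed.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(time: str = "500ms") -> str:
    """Self-closing break tag; quoteattr adds the surrounding quotes."""
    return f"<break time={quoteattr(time)}/>"


def say_as(text: str, interpret_as: str) -> str:
    """Force an interpretation such as characters, digits or date."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def sub(text: str, alias: str) -> str:
    """Speak `alias` wherever `text` is written."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def speak(*parts: str) -> str:
    """Wrap already-escaped fragments in the <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


doc = speak(
    sub("Dr.", "Doctor"),
    escape(" Silva, your account number is "),
    say_as("A123", "characters"),
    ".",
    pause("300ms"),
    escape("Your R&D budget is ready."),  # & becomes &amp;
)
print(doc)
```

Plain text passed to speak must be escaped by the caller, as shown with escape above; the helpers escape only their own arguments.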
Engines and providers
SSML support is not uniform. Amazon Polly implements the most complete subset plus extensions (<amazon:effect>, newscaster style, breathing sounds). Google Cloud Text-to-Speech (WaveNet, Neural2, Studio voices) is stricter and rejects invalid markup outright. Microsoft Azure Speech uses a slightly different namespace and adds <mstts:express-as style="cheerful"> for emotional styles. IBM Watson and Amazon Connect (the contact-centre product) both consume SSML. For Brazilian Portuguese, the best neural voices are Polly's Camila and Vitória, Google's pt-BR-Wavenet-C, and Azure's Francisca. Apple VoiceOver ships Felipe and Luciana on macOS/iOS.
Where SSML is worth the effort
- Contact-centre IVR: account numbers, currency amounts, and dates need say-as tags to avoid embarrassing reads.
- Audiobooks and podcasts: long-form synthesis with deliberate pacing and emphasis.
- Voice assistants: Alexa Skills SDK requires SSML for any non-trivial response; Google Actions accepts it.
- Accessibility: screen readers honour some SSML hints embedded via ARIA attributes such as aria-label.
FAQ
Is SSML portable between providers? Partly. The W3C core (break, emphasis, prosody, say-as, sub, phoneme) works almost everywhere. Provider-specific extensions (amazon:effect, mstts:express-as) do not. Test in the target engine before you ship.
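A quick pre-flight check can flag markup that falls outside the core set named in this answer. The sketch below is a rough heuristic under stated assumptions: non_portable_tags and CORE_TAGS are hypothetical names, and a regular expression is used instead of an XML parser because provider prefixes such as amazon: are often left undeclared, which a strict parser would reject.

```python
import re

# Core elements named in the answer above; anything else is flagged for review.
CORE_TAGS = {"speak", "break", "emphasis", "prosody", "say-as", "sub", "phoneme"}

# Matches the name of an opening tag; closing tags start with "</" and are skipped.
TAG_NAME = re.compile(r"<\s*([A-Za-z_][\w.:-]*)")


def non_portable_tags(ssml: str) -> list[str]:
    """Return the sorted names of opening tags that are not in CORE_TAGS."""
    names = set(TAG_NAME.findall(ssml))
    return sorted(names - CORE_TAGS)


sample = (
    '<speak><amazon:effect name="whispered">Psst</amazon:effect>'
    '<break time="1s"/>'
    '<mstts:express-as style="cheerful">Hi</mstts:express-as></speak>'
)
print(non_portable_tags(sample))  # → ['amazon:effect', 'mstts:express-as']
```

A flagged tag is not necessarily wrong; it only means the document needs testing in each target engine.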
Are there neural voices for Brazilian Portuguese? Yes. Polly's Camila (neural) is a common pick for natural-sounding Brazilian Portuguese. Google has pt-BR-Neural2-A through pt-BR-Neural2-C. Azure offers Francisca and Antônio. Neural voices from all three cost around US$ 16 per 1 million characters; standard voices cost about US$ 4.
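To turn those per-million rates into a budget figure, the arithmetic is a single multiplication. This sketch uses the approximate rates quoted above; estimate_cost and RATE_PER_MILLION are hypothetical names, and whether markup characters count toward billing differs by provider, so check the pricing page for your engine.

```python
# Approximate rates quoted above, in US$ per 1 million characters
RATE_PER_MILLION = {"neural": 16.0, "standard": 4.0}


def estimate_cost(characters: int, tier: str = "neural") -> float:
    """Estimated synthesis cost in US$ for a given character count."""
    return characters / 1_000_000 * RATE_PER_MILLION[tier]


# A 250,000-character audiobook chapter
print(f"US$ {estimate_cost(250_000):.2f}")              # → US$ 4.00
print(f"US$ {estimate_cost(250_000, 'standard'):.2f}")  # → US$ 1.00
```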
How many tags should I use? Fewer than you think. Engines are good at default prosody; over-tagging produces robotic results. Use say-as where ambiguity exists, break for deliberate beats, and stop there. Resist the urge to micro-manage every word with prosody.
What is a PLS lexicon? PLS stands for Pronunciation Lexicon Specification, a W3C format for a separate XML file that maps written forms to phonetic spellings. Useful for brand names and jargon you reuse across many SSML documents (define "Nike" once, reference everywhere).
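A lexicon file is small enough to generate from a dictionary of terms. The sketch below writes a PLS 1.0 document using the W3C lexicon namespace; build_lexicon is a hypothetical helper, and how the resulting file is uploaded or referenced differs by engine.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(entries: dict[str, str], lang: str = "en-US") -> str:
    """Map written forms (graphemes) to IPA pronunciations (phonemes)."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="ipa" xml:lang={quoteattr(lang)}>',
    ]
    for written, ipa in entries.items():
        lines.append("  <lexeme>")
        lines.append(f"    <grapheme>{escape(written)}</grapheme>")
        lines.append(f"    <phoneme>{escape(ipa)}</phoneme>")
        lines.append("  </lexeme>")
    lines.append("</lexicon>")
    return "\n".join(lines)


lexicon = build_lexicon({"Nike": "ˈnaɪki"})
print(lexicon)
```

Save the output as a UTF-8 file so the IPA characters survive intact.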