1001Ferramentas
πŸ”£Validators

Base64 Validator

Check whether a string is valid Base64 (with or without padding). Shows decoded size and whether content looks like UTF-8 or binary.

β€”

Base64: the binary-to-text encoding behind data URIs, JWTs and email attachments

Base64 is the binary-to-text encoding scheme that turns arbitrary bytes into a printable ASCII string using a 64-character alphabet. Standardized by RFC 4648 (2006, replacing RFC 3548) and originally introduced for the MIME email standard in RFC 2045 (1996), it remains the workhorse for shipping binary payloads through systems that only understand text: SMTP headers, JSON, XML, HTTP cookies, JWT signatures and HTML attributes.

The trade-off is size: every 3 input bytes become 4 output characters, giving a 33% overhead. The benefit is universality: a base64 string survives any 7-bit ASCII pipeline intact, no escaping, no corruption.

Standard alphabet vs base64url

The standard alphabet defined by RFC 4648 uses A-Z, a-z, 0-9, + and / β€” 64 characters in total. The = sign is reserved for padding at the end (1 or 2 chars depending on the input length).

The base64url variant (also RFC 4648, section 5) swaps the two unsafe characters: + becomes - and / becomes _, making the string safe to drop into URLs, filenames and HTTP headers without percent-encoding. Padding is also optional in base64url, which is why JWTs never carry trailing =.

Validation regex and the modulo-4 rule

A well-formed standard base64 string matches ^[A-Za-z0-9+/]*={0,2}$ and its length must be a multiple of 4. For base64url the regex is ^[A-Za-z0-9_-]*={0,2}$ and the length constraint relaxes when padding is omitted (length mod 4 = 2 or 3).

Encoded output size from n input bytes is ceil(n / 3) * 4 characters. Decoding the reverse: a 100-character standard string carries between 73 and 75 bytes depending on padding.

Common pitfalls when validating Base64

  • Mixed alphabets: a JWT segment contains - or _ and will fail the standard regex. Try both alphabets before declaring it invalid.
  • Missing padding: legal in base64url, illegal in strict standard Base64. Some parsers tolerate it, others don't.
  • MIME line wrapping: RFC 2045 inserts a CRLF every 76 characters. Strip whitespace before validating.
  • Unicode source: btoa() in browsers only handles Latin-1. For UTF-8 text you must first encode with TextEncoder, then base64 the bytes.
  • Truncated payload: a valid syntax doesn't guarantee a valid decode. Attempt the decode to confirm semantic integrity.

Real-world use cases

  • Data URIs: data:image/png;base64,iVBORw0KGgo... embeds images directly in HTML/CSS.
  • JWT (JSON Web Tokens): header, payload and signature are base64url segments separated by dots.
  • HTTP Basic Auth: Authorization: Basic dXNlcjpwYXNz is base64(user:pass).
  • Email attachments (MIME): base64 wraps PDFs, images and binaries inside text-only SMTP.
  • Embedded SVG, fonts and Web Workers: any binary asset inlined into a textual file.
  • Cookies and localStorage: small binary blobs (sometimes gzip-then-base64) stored client-side.

Performance and libraries

Browsers ship native btoa() and atob(), but they only operate on ASCII strings β€” Unicode requires TextEncoder/TextDecoder. Node.js uses Buffer.from(str, 'base64') and buf.toString('base64'), which handle bytes directly and are highly optimized. Popular libraries include base64-js and js-base64 for cross-environment Unicode-aware encoding.

FAQ

Is validating Base64 easy?

Syntax validation is straightforward β€” alphabet plus length-mod-4 plus padding check. Semantic validation (does it decode to something meaningful?) requires actually attempting the decode and inspecting the bytes.

What is the URL-safe variant?

It is base64url, defined in RFC 4648 section 5. It swaps + for - and / for _, so the string can be used in URLs, filenames and HTTP headers without percent-encoding.

Can a Base64 string be valid without padding?

In strict RFC 4648 Base64, no β€” padding is mandatory. In base64url (and many practical implementations), yes β€” padding is optional. JWTs are the canonical example of unpadded base64url.

Why is the overhead 33%?

Because each base64 character carries 6 bits of information (2^6 = 64), while each byte carries 8 bits. To encode 3 bytes (24 bits) you need 4 characters (24 bits), so 4/3 = 1.33x the original size.

Should I gzip before Base64?

If the payload is large and compressible (JSON, XML, text), DEFLATE-then-base64 can dramatically shrink the transported string β€” common in cookies and HTTP cache strategies. For already-compressed data (PNG, JPEG, ZIP) it just adds CPU cost.

Related Tools