Why is Base64-encoded data about 33% larger than the original?

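Three bytes (24 bits) of binary data are represented as four Base64 characters (24 bits ÷ 6 bits per character = 4 characters), so each original byte becomes approximately 1.33 characters. The exact overhead depends on the input length: if it is a multiple of 3, the overhead is exactly one third (33.3%). Otherwise padding rounds the output up to the next multiple of 4 characters, so n input bytes always produce 4 × ⌈n/3⌉ characters. The overhead is then slightly above 33.3% and only becomes large for very short inputs (a single byte encodes to 4 characters).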
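
The length arithmetic is easy to check directly. The following is a minimal Python sketch; the helper name base64_length is made up for this illustration, and the result is compared against the standard library's base64.b64encode.

```python
import base64


def base64_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


for n in (1, 2, 3, 100, 3000):
    out = base64_length(n)
    # Cross-check the formula against the real encoder.
    assert out == len(base64.b64encode(bytes(n)))
    overhead = (out - n) / n * 100
    print(f"{n:>5} bytes -> {out:>5} chars ({overhead:.1f}% overhead)")
```

For 3 and 3000 bytes the overhead is 33.3%; for 100 bytes it is 36%, and for a single byte it is 300%.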

What is the difference between Base64 and Base64URL?

Base64URL replaces the + character with - and the / character with _, so the encoded string can be embedded in URLs and HTTP headers without percent-encoding. It also typically omits the = padding. JWTs use Base64URL, and so should any API token or other value that will appear in a URL.

Can Base64 encode any type of file?

Yes. Base64 operates on raw bytes, so it can encode any file: images, PDFs, audio, video, executables, and compressed archives. This is how email attachments work. For web use, keep in mind the 33% size overhead and the fact that the browser must parse the entire Base64 string before it can use the data.

Is Base64 the same as hex encoding?

Both are binary-to-text encodings, but hex (Base16) uses 2 characters per byte (4 bits per character), making it 100% larger than the original. Base64 uses 4 characters per 3 bytes, making it 33% larger. Base64 is therefore more space-efficient. Hex is more human-readable for inspecting raw bytes (e.g. hash values).
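
To make the size difference concrete, here is a small Python sketch that encodes the same 30 bytes both ways. The sample data is arbitrary and chosen only for illustration.

```python
import base64

data = bytes(range(30))  # 30 arbitrary bytes

hex_text = data.hex()                              # 2 characters per byte
b64_text = base64.b64encode(data).decode("ascii")  # 4 characters per 3 bytes

print(len(data), len(hex_text), len(b64_text))     # 30 60 40
```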

Why does btoa() throw "InvalidCharacterError" on my string?

btoa() accepts only Latin-1 characters (code points 0–255). Characters outside this range (emoji, CJK, and any letters or diacritics beyond Latin-1) will throw. Encode your string to UTF-8 bytes first: either use the encodeURIComponent + String.fromCharCode pattern, or call TextEncoder.encode() and pass the resulting Uint8Array to a Base64 function that accepts binary data.

Base64 Encoding & Decoding: The Complete Guide (RFC 4648)

What Is Base64 and Why Does It Exist?

Base64 is a binary-to-text encoding scheme defined in RFC 4648 (October 2006). It was designed to solve a fundamental problem: many text-based protocols (SMTP email, HTTP headers, XML, JSON) can only safely carry a subset of ASCII characters. Raw binary data contains bytes in the range 0x00–0xFF, many of which are control characters that will corrupt or truncate a text message.

Base64 solves this by converting arbitrary binary data into a string using only 64 safe characters: A–Z (26), a–z (26), 0–9 (10), + and / (2), plus = for padding. Every byte value 0x00–0xFF can be represented unambiguously using these characters.

The trade-off is size: Base64-encoded data is approximately 33% larger than the original binary. Three bytes (24 bits) are encoded as four characters (24 bits / 6 bits per character = 4 characters).

How Base64 Encoding Works — Step by Step

Base64 encodes input in 3-byte groups. Each group of 3 bytes (24 bits) is split into four 6-bit values, each of which maps to one character in the Base64 alphabet.

Input:    M        a        n
ASCII:    77       97       110
Binary:   01001101 01100001 01101110
         ↓ Split into 6-bit groups ↓
         010011  010110  000101  101110
         19      22      5       46
         ↓ Map to Base64 alphabet ↓
         T       W       F       u

Result: "TWFu"

When the input length is not a multiple of 3, padding is applied:

1 leftover byte → two Base64 chars + ==
2 leftover bytes → three Base64 chars + =
0 leftover bytes → no padding

Example: "Ma" encodes to "TWE=" (one = padding because 2 bytes were left over).
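
The grouping and padding rules above can be sketched as a small hand-written encoder. This Python version is for illustration only (the function name encode_base64 is made up here); in real code, use the standard library's base64 module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_base64(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling missing bytes.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split into four 6-bit indices, most significant first.
        indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A chunk of k bytes yields k + 1 significant characters; pad the rest.
        keep = len(chunk) + 1
        out.append("".join(ALPHABET[j] for j in indices[:keep]) + "=" * (4 - keep))
    return "".join(out)


print(encode_base64(b"Man"))  # TWFu
print(encode_base64(b"Ma"))   # TWE=
print(encode_base64(b"M"))    # TQ==
```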

Standard Base64 vs Base64URL — When to Use Each

RFC 4648 defines two distinct alphabets:

Variant	Characters 62–63	Padding
Standard (§4)	`+` and `/`	Required (`=`)
URL-safe (§5)	`-` and `_`	Optional (often omitted)

Standard Base64 uses + and /, which have special meanings in URLs and HTTP query strings. Always use Base64URL when embedding Base64 in a URL, query parameter, cookie, or JWT header/payload.

JWTs, for example, use Base64URL encoding without padding, which is why you won't see =, +, or / characters in a JWT.

The conversion between the two is trivial:

// Standard → URL-safe
standard.replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '')

// URL-safe → Standard (restore padding first)
const padded = urlSafe + '='.repeat((4 - urlSafe.length % 4) % 4)
padded.replace(/-/g, '+').replace(/_/g, '/')
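
The same round trip can be sketched in Python. The helper names to_url_safe and to_standard are made up for this illustration; the standard library's base64.urlsafe_b64encode and base64.urlsafe_b64decode handle the alphabet swap directly when you start from bytes.

```python
import base64


def to_url_safe(standard: str) -> str:
    """Swap + and / for - and _, then drop the trailing padding."""
    return standard.translate(str.maketrans("+/", "-_")).rstrip("=")


def to_standard(url_safe: str) -> str:
    """Restore padding to a multiple of 4, then swap the alphabet back."""
    padded = url_safe + "=" * (-len(url_safe) % 4)
    return padded.translate(str.maketrans("-_", "+/"))


standard = base64.b64encode(b"\xfb\xff").decode("ascii")  # '+/8='
url_safe = to_url_safe(standard)                          # '-_8'
assert to_standard(url_safe) == standard
```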

Where Base64 Appears in Real Systems

HTTP Basic Authentication — credentials are sent as Authorization: Basic base64(username:password). This is encoding, not encryption — always use HTTPS.
JSON Web Tokens — the header and payload are Base64URL-encoded; the signature is Base64URL-encoded binary.
Data URIs — embed images and fonts directly in HTML/CSS: data:image/png;base64,iVBORw0KGgo...
Email attachments (MIME) — defined in RFC 2045. Email was originally designed for 7-bit ASCII text; Base64 allows binary attachments to traverse legacy mail servers.
API keys and secrets — many services distribute random bytes as Base64 strings because they are easier to copy/paste than raw hex.
TLS/SSH certificates and keys — PEM format is a Base64-encoded DER certificate wrapped in -----BEGIN CERTIFICATE----- headers.
Content Security Policy hashes — CSP script hashes use Base64-encoded SHA digests: sha256-base64hash.
WebCrypto / crypto APIs — exchanging keys and signatures between browser WebCrypto and server-side libraries typically uses Base64 encoding.
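
Two of the uses listed above, Basic Authentication headers and data URIs, can be sketched in a few lines of Python. The function names and the sample credentials are made up for this illustration; the payload is the 8-byte PNG file signature, which is why real PNG data URIs start with iVBORw0KGgo.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def data_uri(payload: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URI."""
    return f"data:{mime_type};base64," + base64.b64encode(payload).decode("ascii")


header = basic_auth_header("alice", "s3cret")  # hypothetical credentials
# Anyone who sees the header can reverse it: this is encoding, not encryption.
print(base64.b64decode(header.split(" ", 1)[1]))  # b'alice:s3cret'

print(data_uri(b"\x89PNG\r\n\x1a\n", "image/png"))
# data:image/png;base64,iVBORw0KGgo=
```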

Base64 in 6 Programming Languages

JavaScript / Browser

// Browser-native (Latin-1 only: throws on any character above U+00FF)
const encoded = btoa('Hello, World!');         // 'SGVsbG8sIFdvcmxkIQ=='
const decoded = atob('SGVsbG8sIFdvcmxkIQ=='); // 'Hello, World!'

// Safe UTF-8 encoding (handles all Unicode)
function encodeUTF8(str) {
  return btoa(encodeURIComponent(str).replace(
    /%([0-9A-F]{2})/g,
    (_, p1) => String.fromCharCode(parseInt(p1, 16))
  ));
}

// Node.js (also works in browsers via Buffer polyfill)
const nodeEncoded = Buffer.from('Hello').toString('base64');        // 'SGVsbG8='
const nodeDecoded = Buffer.from('SGVsbG8=', 'base64').toString();  // 'Hello'

// URL-safe Base64 in Node.js (padding is omitted)
const urlSafe = Buffer.from('Hello').toString('base64url');         // 'SGVsbG8'

Python

import base64

# Standard Base64
encoded = base64.b64encode(b'Hello, World!')
# b'SGVsbG8sIFdvcmxkIQ=='

decoded = base64.b64decode(b'SGVsbG8sIFdvcmxkIQ==')
# b'Hello, World!'

# URL-safe Base64 (uses - and _ instead of + and /; padding is kept)
url_encoded = base64.urlsafe_b64encode(b'Hello, World!')
# b'SGVsbG8sIFdvcmxkIQ=='

# Encode a string (not bytes)
text = 'Héllo'
encoded_str = base64.b64encode(text.encode('utf-8')).decode('ascii')

# Decode back to string
decoded_str = base64.b64decode(encoded_str).decode('utf-8')

Go

package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    input := []byte("Hello, World!")

    // Standard Base64
    encoded := base64.StdEncoding.EncodeToString(input)
    fmt.Println(encoded) // SGVsbG8sIFdvcmxkIQ==

    decoded, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil { panic(err) }
    fmt.Println(string(decoded)) // Hello, World!

    // URL-safe Base64 (no padding)
    urlEncoded := base64.RawURLEncoding.EncodeToString(input)
    fmt.Println(urlEncoded) // SGVsbG8sIFdvcmxkIQ
}

Rust (base64 crate)

use base64::{engine::general_purpose, Engine as _};

fn main() {
    let input = b"Hello, World!";

    // Standard Base64
    let encoded = general_purpose::STANDARD.encode(input);
    println!("{}", encoded); // SGVsbG8sIFdvcmxkIQ==

    let decoded = general_purpose::STANDARD.decode(&encoded).unwrap();
    println!("{}", String::from_utf8(decoded).unwrap());

    // URL-safe Base64 (no padding)
    let url_encoded = general_purpose::URL_SAFE_NO_PAD.encode(input);
    println!("{}", url_encoded); // SGVsbG8sIFdvcmxkIQ
}

Command Line

# Encode (Linux)
echo -n "Hello, World!" | base64
# SGVsbG8sIFdvcmxkIQ==

# Decode
echo "SGVsbG8sIFdvcmxkIQ==" | base64 --decode
# Hello, World!

# Encode a file
base64 image.png > image.b64

# macOS: the short decode flag is -D; line wrapping is set with -b, where GNU base64 uses -w
echo "SGVsbG8sIFdvcmxkIQ==" | base64 -D  # macOS

# URL-safe Base64 with tr
echo -n "Hello" | base64 | tr '+/' '-_' | tr -d '='

Java

import java.util.Base64;
import java.nio.charset.StandardCharsets;

byte[] input = "Hello, World!".getBytes(StandardCharsets.UTF_8);

// Standard Base64
String encoded = Base64.getEncoder().encodeToString(input);
// SGVsbG8sIFdvcmxkIQ==

byte[] decoded = Base64.getDecoder().decode(encoded);
String back = new String(decoded, StandardCharsets.UTF_8);

// URL-safe Base64 (no padding)
String urlEncoded = Base64.getUrlEncoder()
    .withoutPadding()
    .encodeToString(input);

Edge Cases and Common Mistakes

1. btoa() breaks on non-Latin-1 characters

The browser-native btoa() function only accepts Latin-1 characters (code points U+0000–U+00FF). Passing a string with emoji or CJK characters throws InvalidCharacterError. Use the UTF-8 encode pattern shown in the JavaScript example, or convert the string to UTF-8 bytes with TextEncoder and Base64-encode those bytes (and use TextDecoder on the way back).

2. Padding mismatch

Some systems strip the trailing = padding; others require it. If you receive a "malformed base64" error, try adding = signs until the length is a multiple of 4:

const padded = token + '='.repeat((4 - token.length % 4) % 4);

3. Line wrapping (MIME)

RFC 2045 (MIME) specifies that Base64-encoded email body parts must be broken into lines of no more than 76 characters. Base64 as used for JWTs, APIs, and similar contexts does not add line breaks. If you copy a Base64 string from an email client, strip all whitespace before decoding.
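
As a rough illustration in Python: the standard library's base64.encodebytes produces MIME-style wrapped output, and stripping whitespace makes it acceptable to a strict decoder. The sample data is arbitrary.

```python
import base64

data = bytes(range(100))

# encodebytes inserts a newline after every 76 output characters (MIME style).
wrapped = base64.encodebytes(data)
lines = wrapped.splitlines()
print([len(line) for line in lines])  # [76, 60]

# Remove all ASCII whitespace, then decode strictly.
compact = b"".join(wrapped.split())
decoded = base64.b64decode(compact, validate=True)
assert decoded == data
```

With validate=True, b64decode rejects any character outside the Base64 alphabet, including newlines, which is why the whitespace is removed first. The default (validate=False) silently discards such characters.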

4. Base64 is not encryption

Base64 is trivially reversible by anyone. Never use it as a security mechanism. HTTP Basic Auth, for example, sends credentials as plain Base64, so it is only safe over HTTPS. If you need to protect data, use proper encryption (AES-GCM, ChaCha20-Poly1305), not Base64.
