What Is a Diff? The History and Theory
A diff (from "difference") is an output that describes the changes between two versions of a file or text. The concept originates with the Unix diff utility, written by Douglas McIlroy and James Hunt and first shipped in 1974 with the Fifth Edition of Unix. The idea behind it, and behind most modern diff tools, is the Longest Common Subsequence (LCS) problem: find the longest sequence of lines that appears in both texts in the same order. Everything outside that common subsequence is what must be deleted or inserted to transform one text into the other.
The key insight of LCS-based diffing is that maximising the common subsequence minimises the number of insertions and deletions needed to convert the original text to the modified text. This usually produces a compact, readable diff: one that shows only what actually changed, not arbitrary replacements.
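To make the LCS idea concrete, here is a minimal Python sketch of a line diff built on the textbook dynamic-programming LCS table. It illustrates the principle only and is not the code of any particular diff tool; the function name lcs_diff is a hypothetical name chosen for this example.

```python
def lcs_diff(old, new):
    """Return a list of (prefix, line) pairs: ' ' kept, '-' removed, '+' added."""
    n, m = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left, emitting one operation per step
    ops = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append((" ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", old[i]))
            i += 1
        else:
            ops.append(("+", new[j]))
            j += 1
    # Whatever is left on either side is a pure deletion or insertion
    ops.extend(("-", line) for line in old[i:])
    ops.extend(("+", line) for line in new[j:])
    return ops


for prefix, line in lcs_diff(["a", "b", "c"], ["a", "x", "c"]):
    print(prefix + line)
```

The table costs O(N·M) time and memory for inputs of N and M lines, which is why production tools prefer faster algorithms.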
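Modern diff tools, including this one, often use Eugene Myers' diff algorithm (1986), which computes a shortest edit script in O(N·D) time, where N is the combined length of both inputs and D is the size of that script (the number of inserted plus deleted elements). Myers' paper also describes a refinement that needs only linear space. Because D is small when the two inputs are similar, this is very fast in practice.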
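The sketch below shows the greedy core of Myers' approach in Python. It computes only D, the length of the shortest edit script, and does not recover the script itself. It is a simplified illustration under that assumption, not the implementation used by any specific tool, and shortest_edit_length is a hypothetical name.

```python
def shortest_edit_length(a, b):
    """Return the minimum number of insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] is the furthest x reached so far on diagonal k, where k = x - y
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (an insertion) or right (a deletion),
            # starting from whichever neighbouring diagonal reaches further
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]
            else:
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the "snake": matching elements cost nothing
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


print(shortest_edit_length("ABCABBA", "CBABAC"))
```

The outer loop runs D + 1 times and each pass touches at most 2D + 1 diagonals, which is where the dependence on D rather than on the full N × N grid comes from.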
The Unified Diff Format — Reading Every Line
Git, most code review tools, and the POSIX diff -u command all use the unified diff format. If you've done any code review, you've seen it. Here's a complete annotated example:
--- a/src/auth.js ← original file (a/ prefix = old version)
+++ b/src/auth.js ← modified file (b/ prefix = new version)
@@ -12,5 +12,7 @@ ← hunk header (explained below)
 function login(user) { ← context line (unchanged), prefixed with one space
-  const token = makeJWT(user.id); ← removed line, - prefix
-  res.cookie('token', token); ← removed line
+  const token = makeJWT( ← added line, + prefix
+    user.id, { expiresIn: '15m' } ← added line
+  ); ← added line
+  res.cookie('token', token, { httpOnly: true, secure: true }); ← added line
   return token; ← context line (unchanged)
 } ← context line (unchanged)
The hunk header @@ -12,5 +12,7 @@ decoded:
- Old range -12,5: in the original file, this hunk starts at line 12 and covers 5 lines (3 context lines plus 2 removed lines)
- New range +12,7: in the modified file, this hunk starts at line 12 and covers 7 lines (3 context lines plus 4 added lines, a net gain of two lines)
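Context lines are the unchanged lines shown around each changed section. By default, diff -u and git diff include three on each side of a change; hunks near the start or end of a file, and trimmed excerpts like the one above, show fewer. They are critical for understanding what changed and why. Without context, a diff showing only - token / + jwt is meaningless; with context, you can see the function and variable being modified.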
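If you need to read hunk headers programmatically, a small regular expression is usually enough. The following Python sketch assumes standard unified diff headers; the names HUNK_HEADER and parse_hunk_header are hypothetical. In the unified format a count may be omitted, in which case it defaults to 1.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@" at the beginning of a line
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header line."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a unified diff hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,  # omitted count means 1
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -3,4 +3,6 @@ def main():"))
```

Git appends the enclosing function name after the closing @@, as in the example call, so the pattern deliberately ignores anything after the second marker.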
Split View vs Unified View — Which to Use When
Most diff tools, including this one, offer two presentation modes.
Unified (single-column)
Unified view shows original and modified lines interleaved in a single stream with + and − prefixes. It is:
- More compact — both versions fit on screen simultaneously
- The standard format used by git diff, GitHub, GitLab, and code review tools
- Better for wide-ranging changes across many parts of the file
- Easier to read when many lines are added or removed in sequence
Split (side-by-side)
Split view shows original and modified side by side in two columns. It is:
- Better for modified lines — you can directly compare old text and new text
- Easier to read small changes within long lines
- Better when the structure of the document is important to understand
- Preferred by many developers for code review of complex changes
The CodeLint.Dev Diff Checker supports both modes with character-level (inline) highlighting in both views — meaning changed words or characters within a line are highlighted independently of the line-level diff.
Character-Level Diffing — Seeing Exactly What Changed
Line-level diffing (which standard unified diff provides) marks an entire line as changed even if only one word changed. Character-level diffing goes further and highlights the exact characters that differ within a modified line.
// Original
const url = 'http://api.example.com/v1/users';
// Modified
const url = 'https://api.example.com/v2/users';
// Line-level diff: the entire line is marked as changed (−/+)
// Character-level diff highlights only: http → https and v1 → v2
Character-level diffs are especially valuable when reviewing:
- Configuration file changes where a single value changes in a long line
- Prose edits where a word is replaced in a long paragraph
- Renaming a variable used throughout a function
- URL or endpoint changes in test files
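A character-level comparison like the URL example can be approximated with Python's standard difflib.SequenceMatcher. This is a sketch of the idea rather than how this tool computes its highlighting, and char_changes is a hypothetical helper name.

```python
import difflib


def char_changes(old, new):
    """Return (tag, old_text, new_text) for every non-matching character span."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_url = "http://api.example.com/v1/users"
new_url = "https://api.example.com/v2/users"
for tag, removed, added in char_changes(old_url, new_url):
    print(f"{tag}: {removed!r} -> {added!r}")
```

For these two URLs the matcher reports an inserted s and a replacement of 1 by 2, which are exactly the spans a character-level view would highlight.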
Using Diff for AI-Generated Code Review
The diff checker has become an essential tool in the age of AI-assisted development. When an LLM rewrites a function, refactors a module, or "improves" your code, you need to see exactly what changed before accepting it into your codebase.
Common AI diff workflows:
- Before/after comparison — paste your original code in the left panel, the AI's output in the right panel, and see every change highlighted.
- Prompt A vs Prompt B — compare outputs from two different prompts or models to choose the better one.
- Iterative refinement — track how a piece of code evolves across multiple AI interactions.
- Hallucination detection — when an AI claims it "only changed X", the diff will reveal if it silently removed error handling, changed method signatures, or altered business logic.
The key insight: never accept AI-generated code without reviewing a diff against the original. AI models frequently make correct-looking changes in the stated area while silently modifying adjacent code. A diff makes these silent changes visible in seconds.
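One way to automate part of this check is to flag every edit that touches lines outside the region the AI said it changed. The Python sketch below uses the standard difflib module; the function changes_outside and the sample pay function are hypothetical, invented for illustration.

```python
import difflib


def changes_outside(original, revised, allowed):
    """Return (tag, first_line, last_line) for edits touching original lines
    outside `allowed`, a range of 1-based line numbers."""
    old_lines = original.splitlines()
    new_lines = revised.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    flagged = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A pure insertion has i1 == i2; attribute it to the line it precedes
        first, last = i1 + 1, max(i2, i1 + 1)
        if first not in allowed or last not in allowed:
            flagged.append((tag, first, last))
    return flagged


original = """def pay(amount):
    if amount <= 0:
        raise ValueError("bad amount")
    fee = amount * 0.02
    return amount + fee
"""
# Hypothetical AI output that claims to have changed only the fee on line 4
revised = """def pay(amount):
    fee = amount * 0.03
    return amount + fee
"""
print(changes_outside(original, revised, range(4, 5)))
```

Adjacent edits are merged into a single block, so the report is coarse: here it flags lines 2 to 4 as one replacement, which is still enough to show that the validation check disappeared along with the fee change.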
Generating Diffs from the Command Line and in Code
For CI pipelines, pre-commit hooks, and programmatic diffing, you need to generate diffs in code:
# Unified diff between two files (3 lines of context)
diff -u original.txt modified.txt
# More context
diff -U 10 original.txt modified.txt
# Ignore whitespace-only changes
diff -u -b original.txt modified.txt
# Git diff (working tree vs index, i.e. unstaged changes; use "git diff HEAD" to compare against the last commit)
git diff
# Git diff between two commits
git diff abc123..def456 -- src/auth.js
# Git diff with word-level highlighting
git diff --word-diff
# Generate a patch file to apply later
git diff > changes.patch
# Apply a patch
git apply changes.patch

// JavaScript (Node.js), using the jsdiff package: npm install diff
import { createPatch, diffLines, diffWords } from 'diff';
const original = `function greet(name) {
return 'Hello, ' + name;
}`;
const modified = `function greet(name, greeting = 'Hello') {
return \`\${greeting}, \${name}!\`;
}`;
// Line-level diff
const lineDiff = diffLines(original, modified);
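for (const part of lineDiff) {
  const prefix = part.added ? '+' : part.removed ? '-' : ' ';
  // part.value can span several lines, so prefix each line separately
  for (const line of part.value.replace(/\n$/, '').split('\n')) {
    console.log(prefix + line);
  }
}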
// Unified patch string (ready to apply with patch command)
const patch = createPatch('greet.js', original, modified);
console.log(patch);

# Python, using the standard-library difflib module
import difflib
original = """function greet(name) {
return 'Hello, ' + name;
}""".splitlines(keepends=True)
modified = """function greet(name, greeting = 'Hello') {
return `${greeting}, ${name}!`;
}""".splitlines(keepends=True)
# Unified diff
diff = difflib.unified_diff(
    original, modified,
    fromfile='original', tofile='modified',
    n=3,  # context lines
)
print(''.join(diff))
# HTML side-by-side diff
html_diff = difflib.HtmlDiff()
html = html_diff.make_file(original, modified, 'original', 'modified')
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(html)