Context
I need syntax highlighting for code blocks in blog posts and documentation. Code snippets should be readable, visually appealing, and support both light and dark themes automatically.
The major syntax highlighting libraries have different tradeoffs:
Client-Side Highlighters:
- Prism.js (via refractor/rehype-prism): Fastest performance, supports 280+ languages, highly modular and customizable, but uses simpler regex-based highlighting that can miss edge cases (e.g., Python f-strings, SQL nested queries, YAML complex anchors)
- Highlight.js (via rehype-highlight): Supports 192 languages, fast and lightweight, but uses heuristic-based detection that can misidentify code blocks and produce less granular token highlighting
- Sugar High: Extremely fast (~2KB), but only supports JavaScript/TypeScript and lacks broad language coverage
Build-Time Highlighters:
- Shiki (via rehype-pretty-code): Uses VS Code's syntax engine with TextMate grammars for token-level accuracy and supports 100+ languages (fewer than Prism/Highlight.js, but covering all common ones), at the cost of a larger bundle (~250KB) and roughly 7x slower highlighting than Prism
- Starry Night: An open-source highlighter modeled on GitHub's, using TextMate grammars; similar accuracy to Shiki but a smaller ecosystem and less LLM training data
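To make the regex-versus-grammar tradeoff concrete, here is a minimal sketch of a deliberately naive tokenizer. It is not Prism's or Highlight.js's actual grammar; `naiveTokenize` and `STRING_RULE` are hypothetical names invented for this illustration. It shows how a single "anything between quotes is a string" rule swallows the interpolated expression inside a Python f-string, which is the kind of edge case a grammar-based engine is better placed to handle.

```typescript
type Token = { type: "string" | "other"; text: string };

// Deliberately naive rule: everything between two double quotes is one string token.
const STRING_RULE = /"(?:[^"\\]|\\.)*"/g;

function naiveTokenize(source: string): Token[] {
  const tokens: Token[] = [];
  const rule = new RegExp(STRING_RULE.source, "g");
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = rule.exec(source)) !== null) {
    // Text before the match is emitted as an undifferentiated "other" token.
    if (match.index > last) {
      tokens.push({ type: "other", text: source.slice(last, match.index) });
    }
    tokens.push({ type: "string", text: match[0] });
    last = match.index + match[0].length;
  }
  if (last < source.length) {
    tokens.push({ type: "other", text: source.slice(last) });
  }
  return tokens;
}

// A Python f-string: the part in braces is code, not string content.
const line = 'print(f"Total: {price * qty}")';
const tokens = naiveTokenize(line);
// tokens[1] is a single "string" token that includes {price * qty},
// so the embedded expression would be colored as plain string text.
```

Real regex-based highlighters use far richer rule sets than this, so the sketch only illustrates the category of mistake, not the behavior of any specific library.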
I want a solution that prioritizes:
- Accuracy: Precise syntax highlighting across diverse languages (Python, SQL, YAML, JSON, TypeScript, Bash, etc.) without heuristic guessing
- Theme Ecosystem: Access to popular, production-quality themes without custom CSS
- Dual Theme Support: Automatic light/dark mode switching without rebuilding
- Build-Time Rendering: No client-side highlighting overhead—pre-render at build time for SSG
- LLM-Friendly: Well-documented themes and configuration that AI agents understand
Decision
I decided to use Shiki via the rehype-pretty-code plugin.
This aligns with The Goldilocks Zone and LLM-Optimized. Shiki uses VS Code's battle-tested syntax engine and theme ecosystem, both of which have massive LLM training data. The build-time approach eliminates client-side performance concerns.
The implementation uses:
- Dual themes: github-dark and github-light for automatic theme switching
- Build-time rendering: Code blocks are highlighted during Next.js SSG build, not in the browser
- Custom styling: keepBackground: false allows Tailwind-based backgrounds instead of theme defaults
- rehype integration: Runs as a unified pipeline step alongside other MDX transformations
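The options described above can be sketched as a plain object. This is a minimal sketch rather than the site's actual configuration: `PrettyCodeOptionsSketch` is a hypothetical local type standing in for the plugin's own options type so the snippet stays self-contained, and the wiring shown in the trailing comment is an assumption about how the MDX pipeline is set up.

```typescript
// Hypothetical local stand-in for the subset of rehype-pretty-code options used here.
// A real project would rely on the plugin's own option types instead.
interface PrettyCodeOptionsSketch {
  // One Shiki theme name per color mode.
  theme: { dark: string; light: string };
  // When false, the theme's background color is not applied to code blocks.
  keepBackground: boolean;
}

const prettyCodeOptions: PrettyCodeOptionsSketch = {
  theme: { dark: "github-dark", light: "github-light" },
  keepBackground: false,
};

// Assumed wiring (not shown in the post): the plugin and its options are passed
// as a tuple in the MDX pipeline's rehype plugin list, for example
//   rehypePlugins: [[rehypePrettyCode, prettyCodeOptions]]
```

Keeping the options in one object makes it easy to see that only two settings differ from the plugin defaults: the theme pair and the background handling.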
Example
Below are live examples demonstrating Shiki's VS Code-level syntax highlighting. These code blocks are automatically highlighted at build time with precise token-level accuracy.
Python with f-strings and decorators
Notice how decorators, type hints, and nested f-string expressions are all accurately highlighted:
from functools import lru_cache
from typing import List, Optional


@lru_cache(maxsize=128)
def fibonacci(n: int) -> int:
    """Calculate the nth Fibonacci number with memoization."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def format_results(numbers: List[int]) -> str:
    return f"Fibonacci sequence: {', '.join(f'F({i})={fibonacci(i)}' for i in numbers)}"


# Complex f-string with nested expressions
print(f"Results: {format_results([1, 2, 3, 4, 5])}")

SQL with window functions
Observe the precise highlighting of window functions, partitions, and frame clauses:
-- Calculate running total and rank by department
SELECT
    employee_id,
    department,
    salary,
    SUM(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank,
    LAG(salary, 1) OVER (PARTITION BY department ORDER BY hire_date) AS previous_salary
FROM employees
WHERE hire_date >= '2023-01-01'
ORDER BY department, salary_rank;

TypeScript with generics and type inference
Generic constraints, async/await, and nullish coalescing are all highlighted correctly:
interface Repository<T> {
  findById(id: string): Promise<T | null>;
  findAll(): Promise<T[]>;
  save(entity: T): Promise<T>;
}

class InMemoryRepository<T extends { id: string }> implements Repository<T> {
  private items = new Map<string, T>();

  async findById(id: string): Promise<T | null> {
    return this.items.get(id) ?? null;
  }

  async findAll(): Promise<T[]> {
    return Array.from(this.items.values());
  }

  async save(entity: T): Promise<T> {
    this.items.set(entity.id, entity);
    return entity;
  }
}

// Type inference in action
const userRepo = new InMemoryRepository<{ id: string; name: string }>();
const user = await userRepo.findById('123'); // Type: { id: string; name: string } | null

YAML with anchors and merge keys
Complex YAML features like anchors, aliases, and merge keys are precisely tokenized:
defaults: &defaults
  adapter: postgres
  host: localhost
  pool: 5

development:
  <<: *defaults
  database: myapp_dev

production:
  <<: *defaults
  host: ${DATABASE_HOST}
  database: myapp_prod
  pool: 25

Bash with parameter expansion
Shell scripts with parameter expansion, conditionals, and special variables are highlighted correctly:
#!/bin/bash
set -euo pipefail

readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly LOG_FILE="${SCRIPT_DIR}/deploy.log"

deploy() {
    local env="${1:-staging}"
    local version="${2:-$(git describe --tags --always)}"

    echo "Deploying ${version} to ${env}..." | tee -a "$LOG_FILE"

    if [[ "$env" == "production" ]]; then
        read -rp "Are you sure? [y/N] " confirm
        [[ "$confirm" =~ ^[Yy]$ ]] || exit 1
    fi
}

deploy "$@"

These live examples showcase:
- VS Code-level accuracy: Uses the same TextMate grammars that power VS Code
- Complex syntax support: Python decorators and f-strings, SQL window functions, TypeScript generics, YAML anchors, Bash parameter expansion
- Theme-aware: Automatically switches between github-light and github-dark based on your theme
- Build-time rendering: Zero client-side JavaScript overhead, because highlighting happens during the build
- No client-side libraries: Unlike Prism.js or Highlight.js running in the browser, Shiki is never shipped to the client in this setup
Consequences
Pros
- VS Code-Level Accuracy: Uses the same TextMate grammars that power VS Code. SQL queries, Python decorators, YAML anchors, JSON schema comments, and Bash parameter expansion are all highlighted with precise token-level granularity. Prism/Highlight.js support these languages too, but use simpler regex patterns that can miss edge cases (e.g., Python f-string expressions, SQL window functions, complex YAML merges).
- Language Support: Supports 100+ languages (same as VS Code). While Prism (280+) and Highlight.js (192) support more total languages, Shiki covers all common languages needed for technical blog posts—Python, SQL, YAML, JSON, Bash, TypeScript, Go, Rust, Terraform, Docker, etc.
- Theme Ecosystem: Access to 100+ VS Code themes out of the box. No need to write custom CSS for syntax colors—just reference a theme name. Themes are maintained by the community and battle-tested in millions of editors.
- Dual Theme Support: Built-in light/dark mode switching. The same code block renders with github-light in light mode and github-dark in dark mode, automatically synchronized with next-themes.
- Build-Time Rendering: Highlighting happens during next build, not in the browser. Zero client-side JavaScript overhead for syntax highlighting. Pages load instantly with pre-rendered HTML.
- LLM-Native: Shiki and VS Code themes are extensively documented in LLM training data. AI agents can easily generate rehype-pretty-code configurations and understand theme references.
- Maintainability: Upstream theme and grammar updates reach the site through Shiki upgrades. No need to manually maintain custom syntax CSS as languages evolve.
- Unstyled by Default: rehype-pretty-code provides logical attributes (data-language, data-theme) without opinionated CSS, allowing full control over styling with Tailwind classes.
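The dual-theme and build-time points can be illustrated with a small conceptual sketch. This is not Shiki's or rehype-pretty-code's implementation: the token shape, the palettes, their placeholder colors, and the `--tok-light` / `--tok-dark` property names are all hypothetical. The idea it demonstrates is that each token is rendered once at build time with both colors embedded, so switching themes later is a pure CSS concern.

```typescript
// Hypothetical token and palette shapes, for illustration only.
type Token = { text: string; scope: "keyword" | "string" | "plain" };
type Palette = Record<Token["scope"], string>;

// Placeholder colors, not taken from any real theme.
const lightPalette: Palette = { keyword: "#d73a49", string: "#032f62", plain: "#24292e" };
const darkPalette: Palette = { keyword: "#f97583", string: "#9ecbff", plain: "#e1e4e8" };

// Escape the characters that are significant in HTML text nodes.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render a token once, embedding both theme colors as CSS custom properties.
function renderToken(token: Token): string {
  const light = lightPalette[token.scope];
  const dark = darkPalette[token.scope];
  return `<span style="--tok-light:${light};--tok-dark:${dark}">${escapeHtml(token.text)}</span>`;
}

function renderLine(tokens: Token[]): string {
  return `<code>${tokens.map(renderToken).join("")}</code>`;
}

const html = renderLine([
  { text: "const", scope: "keyword" },
  { text: " x = ", scope: "plain" },
  { text: '"<b>"', scope: "string" },
]);
```

With markup like this, a stylesheet could apply `color: var(--tok-light)` by default and `color: var(--tok-dark)` under a dark-mode selector, so the browser never needs to re-run the highlighter when the theme changes.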
Cons
- Large Bundle: ~250KB for Shiki + WASM dependencies, significantly larger than Prism (~5KB). However, this is acceptable because highlighting runs at build-time—the bundle never ships to the client.
- Build Performance: ~7x slower than Prism for highlighting. For large codebases with hundreds of code blocks, build times could increase. Currently acceptable for a personal blog with moderate code block counts.
- No Client-Side Highlighting: Cannot dynamically highlight user-generated code (e.g., live code editors). Highlighting is static, baked into the HTML at build time. Not a concern for static blog posts.
- Theme Lock-In: Changing the configured theme pair requires rebuilding the site. Offering readers a wider theme selection would mean pre-generating every theme variant at build time.