The Coverage Problem That GPU Font Rendering Spent Twenty Years Solving

Source: hackernews

Font rendering has been a GPU problem for twenty years, and the solutions to it trace a specific path through the question of coverage.

When a GPU renders a fragment, it evaluates the shader at a single point and assigns that point a color. Text rendering needs something different: a measure of how much of each pixel’s area falls inside a glyph outline. That value is what produces clean edges and legible small text. Without it, you get aliasing. The mismatch between point evaluation and area integration is the central challenge of GPU font rendering, and every major technique in the space represents a different answer to it.

Eric Lengyel’s decade retrospective on Slug covers ten years of library maintenance, but the approach it documents sits at the end of a twenty-year trajectory. Understanding where it fits requires tracing the two techniques that preceded it.

Loop-Blinn: Shape Without Coverage

Charles Loop and Jim Blinn’s 2005 SIGGRAPH paper, “Resolution Independent Curve Rendering using Programmable Graphics Hardware,” was the first work to render Bézier curves analytically on the GPU without precomputed textures. The core idea was to triangulate each glyph so that triangles cover the filled interior, and for boundary triangles crossing the curve, assign vertex coordinates that let the fragment shader evaluate the curve equation exactly.

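For a quadratic Bézier, the paper assigns the two-component texture coordinates (0, 0), (1/2, 0), and (1, 1) to the three control points of each boundary triangle, so that the sign of the interpolated u² - v distinguishes inside from outside. Cubics use three-component coordinates (k, l, m) and test the sign of k³ - lm. The shader evaluates the expression and discards fragments on the wrong side:

// Loop-Blinn quadratic curve test
vec2 uv = v_uv; // interpolated from vertex attributes
if (uv.x * uv.x - uv.y > 0.0) discard; // which sign counts as outside depends on curve orientation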

This is genuinely elegant. No texture atlas, no offline processing, and the curve is exact at any resolution because the shader evaluates the actual polynomial. The technique handles both quadratic Béziers (TrueType) and cubic Béziers (OpenType CFF) through different texture-coordinate mappings.
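To make the sign test concrete, here is a minimal CPU sketch in Python rather than shader code. It assumes the canonical quadratic assignment from the paper, texture coordinates (0, 0), (1/2, 0), (1, 1) at the three control points, under which the test reduces to u² - v (the k² - lm form with m fixed at 1). The function names are illustrative, and the barycentric interpolation stands in for what the rasterizer does in hardware.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def curve_test(p, p0, p1, p2):
    """Interpolate (u, v) across the control triangle and return u*u - v.

    Zero on the curve, negative between the curve and the chord p0-p2,
    positive between the curve and the control point p1.
    """
    w0, w1, w2 = barycentric(p, p0, p1, p2)
    u = 0.0 * w0 + 0.5 * w1 + 1.0 * w2  # u is 0, 1/2, 1 at p0, p1, p2
    v = w2                              # v is 0, 0, 1 at p0, p1, p2
    return u * u - v


# Hypothetical control points for one quadratic segment.
P0, P1, P2 = (0.0, 0.0), (1.0, 2.0), (2.0, 0.0)
on_curve = curve_test((1.0, 1.0), P0, P1, P2)     # the point at t = 0.5
under_curve = curve_test((1.0, 0.5), P0, P1, P2)  # between chord and curve
over_curve = curve_test((1.0, 1.5), P0, P1, P2)   # between curve and P1
```

The result is a sign, not a fraction: nothing in it says how much of a pixel lies on either side of the curve.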

What it does not compute is coverage. The discard either keeps or drops a fragment with no notion of how much of that fragment’s area lies inside the curve, so anti-aliasing has to come from multi-sample rendering. Without MSAA, the result is sharp but jagged at boundaries. With MSAA, the hardware resolves several samples per pixel into an average, producing acceptable results at the cost of MSAA overhead.
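The gap between a handful of samples and true area coverage is easy to see numerically. The sketch below is an illustration only: it assumes a regular n-by-n sample grid (real MSAA hardware uses other sample patterns) and a hypothetical straight edge at x = 0.3 standing in for the curve test.

```python
def sampled_coverage(inside, px, py, n):
    """Estimate coverage of the unit pixel at (px, py) from an n x n sample grid."""
    hits = 0
    for j in range(n):
        for i in range(n):
            sx = px + (i + 0.5) / n
            sy = py + (j + 0.5) / n
            if inside(sx, sy):
                hits += 1
    return hits / (n * n)


def left_of_edge(x, y):
    """Hypothetical shape: everything left of the vertical line x = 0.3."""
    return x <= 0.3


# The exact area coverage of pixel (0, 0) is 0.3.
one_sample = sampled_coverage(left_of_edge, 0, 0, 1)
four_samples = sampled_coverage(left_of_edge, 0, 0, 2)
sixteen_samples = sampled_coverage(left_of_edge, 0, 0, 4)
```

With k samples the estimate can only take k + 1 distinct values, so the error shrinks slowly as the sample count grows.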

For 2005 GPU hardware, this was the right tradeoff. Fragment shader arithmetic was expensive relative to MSAA, which had dedicated hardware resolve pipelines. Loop-Blinn occupied a real niche and influenced subsequent work on GPU vector rendering substantially.

SDF: Coverage Without Shape

Two years later, Chris Green at Valve published the signed distance field technique at SIGGRAPH 2007. Instead of computing geometry per fragment, the technique precomputes a texture in which each texel stores the signed distance to the nearest glyph outline edge. At runtime, the shader samples this texture and applies a smoothstep around the edge value (0.5 when distances are remapped to [0, 1]):

float d = texture(sdf_atlas, uv).r;
float alpha = smoothstep(0.5 - smoothing, 0.5 + smoothing, d);

SDF computes coverage, approximately, from a cheap texture sample. MSAA is not required. The smoothstep produces a continuous anti-aliased boundary. The technique spread through games throughout the 2010s because it was fast, simple to implement, and produced acceptable results across a range of sizes.
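For readers who want to see what the smoothstep does to a sampled distance, here is a small Python sketch of the same mapping. It assumes distances stored in [0, 1] with 0.5 on the outline, as in the shader snippet, and the `smoothing` half-width is a free parameter chosen by the caller.

```python
def smoothstep(edge0, edge1, x):
    """Hermite interpolation with clamping, matching GLSL's smoothstep."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def sdf_alpha(distance, smoothing):
    """Map a stored distance value to an alpha around the 0.5 outline level."""
    return smoothstep(0.5 - smoothing, 0.5 + smoothing, distance)


# Alpha ramps from 0 to 1 across a band of width 2 * smoothing centered on 0.5.
ramp = [sdf_alpha(d / 10.0, 0.1) for d in range(11)]
```

The ramp is a function of distance alone, which is why it approximates coverage well along straight edges and less well wherever the outline bends inside the ramp's width.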

The limitation is structural. Distance to the nearest edge encodes proximity, not shape. At a sharp corner, two edges meet, and the distance field in that neighborhood is governed by whichever edge is nearer. The corner itself is just a point, and the smooth distance function rounds it. Fine strokes at sizes far from the precomputed resolution blur or disappear.
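The corner rounding can be reproduced with four texels. The sketch below is a toy setup under stated assumptions: the shape is a single right-angle corner (the quadrant x <= 0, y <= 0), distances are in texel units and positive inside, and texel centers sit at half-integer coordinates so the corner falls exactly between four of them.

```python
import math


def corner_distance(x, y):
    """Exact signed distance to the quadrant x <= 0, y <= 0 (positive inside)."""
    if x <= 0.0 and y <= 0.0:
        return min(-x, -y)
    return -math.hypot(max(x, 0.0), max(y, 0.0))


def bilinear(sample, x, y):
    """Bilinearly interpolate `sample` from texel centers at half-integer coordinates."""
    gx, gy = x - 0.5, y - 0.5
    ix, iy = math.floor(gx), math.floor(gy)
    fx, fy = gx - ix, gy - iy

    def texel(i, j):
        return sample(i + 0.5, j + 0.5)

    row0 = (1 - fx) * texel(ix, iy) + fx * texel(ix + 1, iy)
    row1 = (1 - fx) * texel(ix, iy + 1) + fx * texel(ix + 1, iy + 1)
    return (1 - fy) * row0 + fy * row1


at_corner = bilinear(corner_distance, 0.0, 0.0)  # true distance here is 0
on_edge = bilinear(corner_distance, -5.0, 0.0)   # straight edge, far from the corner
```

On the straight edge the interpolated value is still zero, so the outline stays put. At the corner it comes out clearly negative, so the reconstructed outline pulls inward and the corner is cut off.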

Viktor Chlumský’s multi-channel SDF extended the technique by encoding directional information across RGB channels to recover sharp corners, and it improved results meaningfully. MSDF handles corners noticeably better than single-channel SDF and is now common for game UI text. It remains an approximation with a preprocessing step, and it still has failure modes at extreme scales or with complex glyph geometry.

SDF and Loop-Blinn represent complementary failures. Loop-Blinn had the right shape, wrong coverage method. SDF had coverage, wrong shape representation. Neither combined both.

Slug: Exact Coverage

Eric Lengyel’s Slug library launched in 2015 and combines what the two prior approaches separated. Like Loop-Blinn, it evaluates actual glyph geometry in the shader. Unlike Loop-Blinn, it computes exact coverage rather than relying on MSAA.

For each fragment, the shader identifies curve segments whose y-intervals overlap the current fragment row, evaluates winding contributions using the nonzero fill rule, and integrates over the fragment area to produce a coverage value. For TrueType quadratic curves this requires solving a quadratic per segment; for OpenType cubic curves, a cubic. Lengyel published the formal algorithm in the Journal of Computer Graphics Techniques in 2017 as “GPU-Centered Font Rendering Directly from Glyph Outlines.”

// Winding contribution for a quadratic Bézier segment
// y(t) - frag_y = a * t * t + b * t + c
float a = P0.y - 2.0 * P1.y + P2.y;
float b = 2.0 * (P1.y - P0.y);
float c = P0.y - frag_y;
float disc = b * b - 4.0 * a * c;
// Real roots with t in [0, 1] give crossings; the sign of the derivative 2.0 * a * t + b gives winding direction
// Coverage integrates winding over the fragment area, not just the center

Sharp corners are sharp because the actual control points define where corners occur. Two curve segments meeting at a point have a discontinuous tangent in the data; the winding integral over the fragment area handles this correctly without approximation. Fine strokes remain fine for the same reason.
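The root-finding step above can be sketched on the CPU. The Python below is not Slug's implementation: it computes only the winding number at a single point by casting a ray in the +x direction, which is the point-sampled ingredient that a coverage computation builds on. The contour format, function names, and the half-open parameter range used to avoid double-counting shared endpoints are all assumptions of this sketch, and rays that graze a curve tangentially would need more care.

```python
import math


def quadratic_roots(a, b, c, eps=1e-12):
    """Real roots of a*t^2 + b*t + c = 0, falling back to linear when a is ~0."""
    if abs(a) < eps:
        return [] if abs(b) < eps else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]


def winding_number(point, segments):
    """Winding number at `point` for closed contours of quadratic Bezier segments."""
    px, py = point
    winding = 0
    for (x0, y0), (x1, y1), (x2, y2) in segments:
        a = y0 - 2.0 * y1 + y2
        b = 2.0 * (y1 - y0)
        c = y0 - py
        for t in quadratic_roots(a, b, c):
            if not 0.0 <= t < 1.0:
                continue  # crossing lies outside this segment
            x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * x1 + t * t * x2
            if x <= px:
                continue  # crossing is behind the ray origin
            dy = 2.0 * a * t + b
            if dy > 0.0:
                winding += 1
            elif dy < 0.0:
                winding -= 1
    return winding


# Hypothetical closed shape: a flat base plus one quadratic arch peaking at y = 2.
arch = [
    ((0.0, 0.0), (2.0, 0.0), (4.0, 0.0)),  # straight base as a degenerate quadratic
    ((4.0, 0.0), (2.0, 4.0), (0.0, 0.0)),  # arch back to the start
]
inside = winding_number((2.0, 1.0), arch)
above = winding_number((2.0, 3.0), arch)
```

Under the nonzero rule a point is inside when the winding number is not zero. Turning that into coverage means accounting for where the crossings fall within the pixel, not just whether they exist.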

Why did this not happen earlier? The answer is hardware. Iterating over a variable-length list of curve segments in a fragment shader requires reading from an arbitrary GPU buffer. Direct3D 11 structured buffers arrived in 2009, and Shader Storage Buffer Objects arrived with OpenGL 4.3 (GLSL 4.30) in 2012. Before these, packing glyph data into textures was possible but cumbersome, and per-fragment iteration over variable-length data had poor performance on 2007-era hardware. The algorithm was waiting for hardware to mature; Slug launched after it did.

A spatial acceleration structure makes the per-fragment iteration cost practical. Slug culls curve segments to only those whose y-intervals overlap the current fragment row, so the shader evaluates the small subset of segments that could influence coverage rather than the entire glyph outline. The per-fragment cost grows with local glyph complexity, not total outline length.
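A rough sketch of the culling idea, in Python: bucket each segment into every horizontal band its y-extent touches, so that a fragment only needs the list for its own band. The band count, data layout, and function name here are hypothetical and say nothing about how Slug actually stores its data. The sketch relies on the fact that a Bézier curve lies inside the convex hull of its control points, so the control points' y-range is a conservative bound.

```python
import math


def build_bands(segments, y_min, y_max, band_count):
    """Return, for each horizontal band, the indices of segments that may touch it."""
    bands = [[] for _ in range(band_count)]
    band_height = (y_max - y_min) / band_count
    for index, segment in enumerate(segments):
        ys = [point[1] for point in segment]
        first = math.floor((min(ys) - y_min) / band_height)
        last = math.floor((max(ys) - y_min) / band_height)
        # Clamp so that y == y_max lands in the top band instead of overflowing.
        first = max(0, min(band_count - 1, first))
        last = max(0, min(band_count - 1, last))
        for band in range(first, last + 1):
            bands[band].append(index)
    return bands


# Hypothetical glyph: a 4 x 4 square whose edges are degenerate quadratics.
square = [
    ((0.0, 0.0), (2.0, 0.0), (4.0, 0.0)),  # bottom
    ((4.0, 0.0), (4.0, 2.0), (4.0, 4.0)),  # right
    ((4.0, 4.0), (2.0, 4.0), (0.0, 4.0)),  # top
    ((0.0, 4.0), (0.0, 2.0), (0.0, 0.0)),  # left
]
bands = build_bands(square, 0.0, 4.0, 4)
```

A fragment in the middle of the square then looks at two segments instead of four. For a real glyph with dozens of curves, the saving is proportionally larger.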

Vello: Exact Coverage, Different Architecture

Raph Levien’s Vello, developed under the Linebender project, achieves the same exactness goal through a different GPU architecture. Rather than per-fragment winding integration, Vello uses a tiling approach: divide the screen into tiles, assign curve segments to tiles in GPU compute passes, then fill each tile using an exact area integral in compute shaders.

The two approaches are exact in the same mathematical sense and different in where the work happens. In Slug, coverage integration runs in the fragment shader; each fragment independently evaluates the curves that could influence it. In Vello, compute passes bin segments and accumulate data per tile, and a final fine-rasterization pass turns that data into pixel values. The tiling approach avoids redundant per-fragment work for pixels fully inside a glyph, which matters when text covers large screen areas. The per-fragment approach integrates more naturally into existing fragment-shader pipelines.
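What "exact area" means for a single pixel can be shown with a small CPU sketch. This is neither Vello's nor Slug's algorithm. It is a plain illustration that assumes the outline has already been flattened to a simple polygon, clips that polygon to the pixel square with Sutherland-Hodgman clipping, and measures what is left with the shoelace formula.

```python
def clip_axis(polygon, axis, bound, keep_greater):
    """Clip a polygon against one axis-aligned boundary (Sutherland-Hodgman step)."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def intersect(p, q):
        t = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    result = []
    for i, current in enumerate(polygon):
        previous = polygon[i - 1]
        if inside(current):
            if not inside(previous):
                result.append(intersect(previous, current))
            result.append(current)
        elif inside(previous):
            result.append(intersect(previous, current))
    return result


def polygon_area(polygon):
    """Unsigned area from the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x0, y0 = polygon[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def pixel_coverage(polygon, px, py):
    """Fraction of the unit pixel with lower-left corner (px, py) inside the polygon."""
    clipped = polygon
    for axis, bound, keep_greater in (
        (0, px, True), (0, px + 1, False), (1, py, True), (1, py + 1, False)
    ):
        clipped = clip_axis(clipped, axis, bound, keep_greater)
        if not clipped:
            return 0.0
    return polygon_area(clipped)


# Hypothetical shape: a right triangle whose hypotenuse crosses pixel (1, 0).
triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
full = pixel_coverage(triangle, 0, 0)
half = pixel_coverage(triangle, 1, 0)
empty = pixel_coverage(triangle, 1, 1)
```

A point sample at the center of pixel (1, 0) lands exactly on the hypotenuse and has to pick a side, while the area computation returns one half without ambiguity.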

Vello is open source and Rust-native, targeting desktop application development in Rust. Slug is a commercial C++ SDK aimed at game engines and graphics pipelines with existing C++ codebases. The two do not compete directly; they serve different engineering contexts and reflect different positions on how GPU coverage should be computed.

Variable Fonts and the Precomputation Problem

OpenType 1.8 in 2016 introduced variable fonts, where a single font file encodes a continuous design space across axes for weight, width, slant, and arbitrary parameters. For precomputed approaches, this creates a combinatorial problem. An SDF atlas generated at one weight is wrong at another. Generating atlases across a continuous design space is either impractical or requires live regeneration with real latency whenever a design axis value changes.

For Slug, variable font support is mechanical. Interpolating between control point positions for two masters is a linear operation on the same curve data the shader evaluates:

P_interp = (1 - w) * P_master0 + w * P_master1

The interpolated curves are valid Bézier curves; the shader evaluates them with the same algorithm as any static glyph. Committing to actual curve geometry at render time means format extensions that modify geometry compose cleanly. Committing to precomputed approximations means format extensions that require re-approximating become pipeline engineering problems.
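A minimal Python sketch of that interpolation, using the two-master simplification from the formula above (real variable fonts encode deltas against a default outline, which this sketch does not model). The master outlines and function names are made up for illustration.

```python
def lerp_points(master0, master1, w):
    """Blend two compatible lists of control points: (1 - w) * P0 + w * P1."""
    if len(master0) != len(master1):
        raise ValueError("masters must have the same number of control points")
    return [
        ((1 - w) * x0 + w * x1, (1 - w) * y0 + w * y1)
        for (x0, y0), (x1, y1) in zip(master0, master1)
    ]


def eval_quadratic(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t."""
    s = 1 - t
    return (
        s * s * p0[0] + 2 * s * t * p1[0] + t * t * p2[0],
        s * s * p0[1] + 2 * s * t * p1[1] + t * t * p2[1],
    )


# Hypothetical masters for one curve segment at two weights.
light = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
bold = [(0.0, 0.0), (1.0, 4.0), (4.0, 0.0)]

medium = lerp_points(light, bold, 0.5)
point_on_medium = eval_quadratic(*medium, 0.5)
```

Because Bézier evaluation is linear in the control points, blending the control points and then evaluating gives the same point as evaluating both masters and blending the results. The renderer never needs to know the outline came from an interpolation.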

What Ten Years of Maintenance Adds

The algorithm in Slug has not changed since 2015, because mathematics does not deprecate. What changed is everything around it. The library now supports Vulkan, Metal, Direct3D 12, and OpenGL 4 across Windows, macOS, Linux, iOS, and Android, through the API transitions of an entire GPU generation. Three distinct shader languages are in active use. Mobile GPU architectures with tiling renderers and bandwidth constraints shaped shader design in ways that were not fully visible in 2015.

Maintaining stable external interfaces through this churn, so that commercial integrators can upgrade without being burned by breaking changes, is the real engineering work the retrospective describes. It is less visible than the algorithmic contribution but represents an equal or larger investment of time.

The JCGT paper formalization serves a specific purpose in this context. A customer integrating Slug into a production pipeline can read the mathematics, reproduce the reasoning, and audit what the library claims to compute. Peer-reviewed algorithmic documentation in commercial graphics middleware is uncommon. It separates the correctness claim from the implementation and gives integrators something to verify against beyond benchmarks and screenshots.

The Trajectory

The path from Loop-Blinn through SDF to Slug represents a twenty-year search for exact coverage. Loop-Blinn had exact shape but relied on MSAA for coverage. SDF had cheap coverage but approximated shape. Slug produces exact coverage from exact shape, at a per-fragment cost that hardware made affordable in 2015 and that has only decreased since.

Lengyel’s decade retrospective is most useful read with this trajectory in mind. Slug did not arrive in isolation; it arrived after twenty years of GPU font rendering had clarified what the right answer looked like. The library’s contribution was building and sustaining the implementation that delivered it.
