Ten Years of Exact GPU Font Rendering: What Slug Got Right From the Start
Source: hackernews
The Slug font rendering library shipped commercially around 2015, and Eric Lengyel’s retrospective on its first decade is worth reading carefully. Not because anniversary posts are usually interesting, but because Slug made an architectural bet at launch that most developers in 2015 would have called impractical, and the bet paid off. Ten years later, with Vulkan, Metal, DirectX 12, and WebGPU all in its support matrix, that original decision looks better than ever.
The bet was simple to state: do not approximate the coverage computation, compute it exactly.
What Coverage Actually Means
When a fragment shader runs, it evaluates at a single point. Text rendering needs something different: the fraction of a pixel’s area that lies inside a glyph outline. That fraction is coverage. Get it right and text is crisp at any size. Get it wrong and you’re managing the artifacts that result.
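Here is a small C sketch that makes the definition concrete. It compares one point sample at a pixel center with an area estimate taken from a dense grid of samples. The half-plane x + y < 1 is a hypothetical stand-in for a glyph interior and covers exactly half of the unit pixel at the origin. The function names are invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a glyph interior: the half-plane x + y < 1. */
int inside(double x, double y) {
    return x + y < 1.0;
}

/* One sample at the center of the unit pixel whose lower-left corner is (px, py):
   all a plain point-evaluating fragment shader sees. Returns 0 or 1. */
double point_sample(double px, double py) {
    return inside(px + 0.5, py + 0.5) ? 1.0 : 0.0;
}

/* Estimate the covered fraction of that pixel with an n-by-n grid of samples. */
double supersampled_coverage(double px, double py, int n) {
    assert(n > 0);
    int hits = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            hits += inside(px + (i + 0.5) / n, py + (j + 0.5) / n);
    return (double)hits / ((double)n * n);
}
```

For that pixel, the point sample reports 0. The 64-by-64 grid gives 2016/4096, about 0.49, which approaches the exact 0.5. Supersampling is still an approximation. It only illustrates the quantity that coverage is meant to measure.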
Every major approach to GPU font rendering has had a distinct answer to this problem, and the differences between them explain a lot of history.
Loop and Blinn’s 2005 SIGGRAPH paper was the first practical analytical approach. It triangulates glyph outlines and encodes the Bezier curve implicit form into per-vertex texture coordinates, so the fragment shader can determine inside versus outside with a single sign test. Elegant and cheap. But the core sign test produces no coverage value, just a binary result, so anti-aliasing typically came from MSAA, which was hardware-accelerated by 2005 and cheap relative to extra fragment arithmetic. Given the hardware of that era, it was the right tradeoff.
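The core of the Loop-Blinn test is small. This C sketch assigns the canonical coordinates (0, 0), (1/2, 0) and (1, 1) to a quadratic's three control points. It interpolates them with barycentric weights, as the rasterizer would, and evaluates u^2 - v. Which sign means "inside" depends on the curve's orientation and on which side of the curve the triangle lies. The sketch does not model that choice, and its names are hypothetical.

```c
#include <assert.h>
#include <math.h>

typedef struct { double u, v; } TexCoord;

/* Canonical Loop-Blinn texture coordinates for a quadratic Bezier's control triangle. */
const TexCoord LB_P0 = { 0.0, 0.0 };
const TexCoord LB_P1 = { 0.5, 0.0 };
const TexCoord LB_P2 = { 1.0, 1.0 };

/* Interpolate (u, v) across the triangle with barycentric weights w0, w1, w2. */
TexCoord interpolate(double w0, double w1, double w2) {
    assert(fabs(w0 + w1 + w2 - 1.0) < 1e-9);  /* barycentric weights sum to 1 */
    TexCoord t = { w0 * LB_P0.u + w1 * LB_P1.u + w2 * LB_P2.u,
                   w0 * LB_P0.v + w1 * LB_P1.v + w2 * LB_P2.v };
    return t;
}

/* Implicit form of the curve in (u, v) space: zero on the curve, sign picks the side. */
double implicit_value(TexCoord t) {
    return t.u * t.u - t.v;
}
```

Take a point on the curve and use its Bernstein weights, ((1-s)^2, 2s(1-s), s^2), as the barycentric weights. The result is zero up to rounding, because the interpolated coordinates are (s, s^2). The middle control point and the triangle's centroid get opposite signs.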
Chris Green’s 2007 Valve paper introduced what most developers now call SDF rendering. The approach precomputes a texture where each texel stores the signed distance to the nearest glyph outline edge. At runtime, the fragment shader samples that texture and runs a smoothstep around the threshold to produce approximate coverage. One texture lookup and a smoothstep, done. The performance profile was perfect for 2007 hardware: no loops, no structured buffer access, no variable-length data iteration in shaders.
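The whole runtime side of SDF text fits in a few lines. This C sketch assumes a common convention: the stored distance is remapped to [0, 1], with 0.5 on the outline. It blends over a band of half-width half_width around that threshold. Real shaders usually derive the band width from screen-space derivatives. Here it is a plain parameter, and the function names are hypothetical.

```c
#include <assert.h>

/* GLSL-style smoothstep: clamped Hermite interpolation between edge0 and edge1. */
double smoothstep(double edge0, double edge1, double x) {
    double t = (x - edge0) / (edge1 - edge0);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return t * t * (3.0 - 2.0 * t);
}

/* Approximate coverage from one distance sample d in [0, 1], where 0.5 marks the outline. */
double sdf_coverage(double d, double half_width) {
    assert(half_width > 0.0);
    return smoothstep(0.5 - half_width, 0.5 + half_width, d);
}
```

A sample exactly on the outline gets coverage 0.5. Samples more than half_width away saturate to 0 or 1.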
The structural problem with SDF is fundamental. Distance encodes proximity, not shape. Near a sharp corner, a finite-resolution distance texture cannot represent the angular discontinuity: bilinear filtering blends samples from both sides of the corner, so the reconstructed corner softens. Fine strokes thin incorrectly at sizes far from the baked resolution. SDF is an approximation with predictable failure modes.
MSDF, developed by Viktor Chlumský around 2015, improved on single-channel SDF by storing three distance fields in the RGB channels of a texture, each measured against a different subset of the outline’s edges; the shader reconstructs the shape from the median of the three. It recovers corner sharpness better than single-channel SDF and became standard in engines like Godot and libGDX. But it remains an approximation, baked offline, and the channels encode distances rather than sub-pixel coverage, which matters for LCD rendering.
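The reconstruction step is a median of the three channels followed by a ramp in screen pixels. This C sketch is modeled on the kind of fragment shader shown in msdfgen's documentation. screen_px_range stands for the distance range mapped to screen pixels and is an assumption of the sketch, not a fixed API.

```c
#include <assert.h>

/* Median of three channel distances: the value MSDF shaders threshold. */
double median3(double r, double g, double b) {
    double lo = r < g ? r : g;
    double hi = r < g ? g : r;
    return b < lo ? lo : (b > hi ? hi : b);  /* clamp b into [lo, hi] */
}

/* Coverage from one MSDF texel: signed screen-pixel distance, shifted and clamped to [0, 1]. */
double msdf_coverage(double r, double g, double b, double screen_px_range) {
    assert(screen_px_range > 0.0);
    double px_dist = screen_px_range * (median3(r, g, b) - 0.5);
    double a = px_dist + 0.5;
    return a < 0.0 ? 0.0 : (a > 1.0 ? 1.0 : a);
}
```

The median discards the one channel that disagrees near a corner. That is how MSDF keeps corners sharper than a single distance channel can. The output is still a distance-derived estimate rather than an area.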
How Slug Computes Coverage
Slug stores actual Bezier curve data for each glyph in GPU-accessible buffers. For each fragment, the shader identifies which curve segments have y-intervals that overlap the current fragment row, then integrates the winding number function over the fragment area using those segments.
For TrueType fonts with quadratic Beziers, each segment requires solving a quadratic in t to find where it crosses the sample row:
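    float a = P0.y - 2.0 * P1.y + P2.y;   // y(t) - sample_y = a t^2 + b t + c
    float b = 2.0 * (P1.y - P0.y);
    float c = P0.y - sample_y;
    if (abs(a) < 1e-6) {
        // Degenerate (straight-line) segment: solve b * t + c = 0 instead of dividing by 2a
        if (b != 0.0) {
            float t = -c / b;
            // evaluate x crossing if 0 <= t < 1, accumulate signed winding
        }
    } else {
        float disc = b * b - 4.0 * a * c;
        if (disc >= 0.0) {
            float s = sqrt(disc);
            float t1 = (-b - s) / (2.0 * a);
            float t2 = (-b + s) / (2.0 * a);
            // evaluate x crossing at each root in [0, 1), accumulate signed winding
        }
    }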
OpenType CFF fonts use cubic Beziers, which require a cubic polynomial solve per segment. More expensive, still analytical. Lengyel published the full algorithm in the Journal of Computer Graphics Techniques in 2017 as “GPU-Centered Font Rendering Directly from Glyph Outlines,” providing a peer-reviewed mathematical foundation.
The spatial acceleration structure is what keeps per-fragment cost tractable. Only segments whose y-intervals overlap the current row contribute, so iteration count is proportional to local glyph complexity rather than total outline length. For typical Latin glyphs, that’s three to six contributing segments per fragment on average.
The payoff is exact sub-pixel coverage. Because the integration computes actual area intersection, Slug can run three separate integrals per fragment for LCD sub-pixel antialiasing, one for each sub-pixel rectangle:
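    // Assumed convention: (x, y) is the pixel center; integrateGlyphArea(cx, cy, w, h)
    // integrates over the w-by-h rectangle centered at (cx, cy)
    float subpixel_w = pixel_w / 3.0;
    float coverage_r = integrateGlyphArea(x - subpixel_w, y, subpixel_w, pixel_h);
    float coverage_g = integrateGlyphArea(x, y, subpixel_w, pixel_h);
    float coverage_b = integrateGlyphArea(x + subpixel_w, y, subpixel_w, pixel_h);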
SDF and MSDF cannot deliver this. A distance field can be sampled at three horizontal offsets, but each sample still yields coverage estimated from the distance to the nearest edge, not an area integral over the sub-pixel rectangle. The data structure does not contain what the computation requires.
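A straight-edged stand-in makes the three-integral idea concrete. This C sketch computes the exact area coverage of one simple polygon over each sub-pixel rectangle. It clips the polygon to the rectangle (Sutherland-Hodgman) and takes the shoelace area. Real glyphs have curved segments and multiple contours, which would need flattening or an analytic curve integral. The function names and the center-of-rectangle convention are assumptions, not Slug's API.

```c
#include <assert.h>
#include <math.h>

#define MAXV 64  /* hypothetical vertex capacity for clipped polygons */

typedef struct { double x, y; } Vec2;

/* One Sutherland-Hodgman pass: keep the part where sign * (coord - bound) >= 0.
   axis 0 clips on x, axis 1 on y. */
int clip_half_plane(const Vec2 *in, int n, Vec2 *out, int axis, double bound, double sign) {
    int m = 0;
    for (int i = 0; i < n; ++i) {
        Vec2 a = in[i], b = in[(i + 1) % n];
        double da = sign * ((axis == 0 ? a.x : a.y) - bound);
        double db = sign * ((axis == 0 ? b.x : b.y) - bound);
        if (da >= 0.0) { assert(m < MAXV); out[m++] = a; }
        if ((da >= 0.0) != (db >= 0.0)) {  /* edge crosses the boundary */
            double t = da / (da - db);
            Vec2 p = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y) };
            assert(m < MAXV);
            out[m++] = p;
        }
    }
    return m;
}

/* Shoelace formula: signed area, positive for counterclockwise vertex order. */
double signed_area(const Vec2 *p, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) {
        Vec2 a = p[i], b = p[(i + 1) % n];
        s += a.x * b.y - b.x * a.y;
    }
    return 0.5 * s;
}

/* Exact covered fraction of the w-by-h rectangle centered at (cx, cy). */
double rect_coverage(const Vec2 *poly, int n, double cx, double cy, double w, double h) {
    assert(n <= MAXV && w > 0.0 && h > 0.0);
    Vec2 a[MAXV], b[MAXV];
    for (int i = 0; i < n; ++i) a[i] = poly[i];
    int m = clip_half_plane(a, n, b, 0, cx - 0.5 * w, 1.0);   /* x >= left   */
    m = clip_half_plane(b, m, a, 0, cx + 0.5 * w, -1.0);       /* x <= right  */
    m = clip_half_plane(a, m, b, 1, cy - 0.5 * h, 1.0);        /* y >= bottom */
    m = clip_half_plane(b, m, a, 1, cy + 0.5 * h, -1.0);       /* y <= top    */
    return m < 3 ? 0.0 : fabs(signed_area(a, m)) / (w * h);
}

/* R, G, B sub-pixel rectangles of the pixel centered at (x, y), left to right. */
void subpixel_coverage(const Vec2 *poly, int n, double x, double y,
                       double pixel_w, double pixel_h, double rgb[3]) {
    double sw = pixel_w / 3.0;
    rgb[0] = rect_coverage(poly, n, x - sw, y, sw, pixel_h);
    rgb[1] = rect_coverage(poly, n, x, y, sw, pixel_h);
    rgb[2] = rect_coverage(poly, n, x + sw, y, sw, pixel_h);
}
```

Consider a pixel centered on a vertical glyph edge. Its red sub-pixel lies fully inside, its green sub-pixel is split exactly in half, and its blue sub-pixel is empty. The result is (1, 0.5, 0), where a single pixel-wide integral would give 0.5 for all three.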
Why 2015, Not 2010
Slug’s approach requires iterating over variable-length structured data in a fragment shader. Fragment shaders could not do that efficiently in OpenGL before version 4.3 introduced Shader Storage Buffer Objects (SSBOs) in 2012. SSBOs allow arbitrary read/write access to GPU memory from shaders; before them, per-fragment iteration over curve data had no clean implementation path.
D3D11 structured buffers appeared in 2009, but the combination of SSBO access, adequate fragment throughput, and a broadly supported OpenGL 4.x baseline only came together around 2012 to 2014. Slug launched when hardware had crossed the threshold where exact coverage was practical, not before.
This is a pattern worth noting. The dominant approaches in any GPU rendering domain tend to reflect the hardware constraints of the era in which they became established, not the constraints of the hardware developers are actually targeting. SDF spread through games throughout the 2010s even as the hardware that made Slug viable became standard. Good-enough approximations accumulate inertia.
Variable Fonts and the Atlas Problem
OpenType variable fonts, introduced in OpenType 1.8 in 2016, expose a combinatorial problem for precomputed approaches. A variable font encodes a continuous design space, with axes like weight, width, and optical size. A precomputed atlas captures a single point in that space: it must be regenerated whenever the design coordinates change, or the implementation must keep a separate atlas for every design instance it uses.
For Slug, variable fonts are handled by computing the Bezier control points for the requested design coordinates and running the same fragment shader. The algorithm does not change; only the input data does. There is no baking step to redo and no atlas to invalidate. Variable font support is effectively free on the GPU side; the remaining cost is recomputing and uploading control points when the design coordinates change.
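The control-point step is cheap arithmetic on the CPU. In OpenType variations, an instance's points are the default outline plus deltas, each scaled by a per-region scalar. This C sketch models one axis and one non-intermediate region. It is loosely patterned on the OpenType rule, under which the scalar is 1 at the peak and falls to 0 at the default. Real fonts combine many regions across several axes; for TrueType outlines the deltas live in the gvar table. All names here are hypothetical.

```c
#include <assert.h>

typedef struct { double x, y; } Vec2;

/* Single-axis scalar for a non-intermediate region with the given peak (normalized coords).
   0 at or across the default, rising linearly to 1 at the peak, 0 beyond the peak. */
double region_scalar(double coord, double peak) {
    if (peak == 0.0) return 1.0;
    if (coord == 0.0 || (coord < 0.0) != (peak < 0.0)) return 0.0;
    double r = coord / peak;
    return r > 1.0 ? 0.0 : r;
}

/* Control points for one design instance: default outline plus scaled deltas. */
void instance_points(const Vec2 *base, const Vec2 *delta, int n,
                     double coord, double peak, Vec2 *out) {
    assert(n >= 0);
    double s = region_scalar(coord, peak);
    for (int i = 0; i < n; ++i) {
        out[i].x = base[i].x + s * delta[i].x;
        out[i].y = base[i].y + s * delta[i].y;
    }
}
```

The resulting points are ordinary Bezier control points. An exact-coverage renderer would upload them and draw with an unchanged shader.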
The same applies to variable color fonts, whose layered outlines can also vary continuously, and to arbitrary zoom levels in document viewers or map labels, where the effective display size of a given glyph may span an order of magnitude.
Comparing Against Vello
The closest contemporary alternative with the same commitment to exact coverage is Vello, Raph Levien’s compute-shader based 2D renderer developed under the Linebender project. Vello divides the screen into tiles, uses GPU compute shaders to assign scene elements to tiles, then accumulates exact area coverage for each pixel within a tile.
Vello and Slug arrive at the same mathematical result through different GPU architectures. Vello’s tiling approach amortizes work across fully-covered tiles and fits cleanly into document and UI rendering pipelines where the entire scene is submitted as a batch. Slug’s per-fragment approach integrates into existing 3D game rendering pipelines where you want to drop text rendering into an existing fragment-shader–based pass without restructuring the pipeline around a separate tiling compute stage.
Neither is strictly better. They solve the same mathematical problem for different integration contexts.
NVIDIA’s NV_path_rendering also produces accurate fills, via a stencil-then-cover approach: a first pass renders glyph outlines into the stencil buffer (the classic formulation draws triangle fans that count winding up and down or toggle stencil bits), and a second pass draws a bounding quad that reads the stencil for final color, with anti-aliasing coming from multisampling. The quality is high, but the two-pass dependency, stencil buffer consumption, and hard NVIDIA extension requirement limit its practical use in engines with existing multi-pass pipelines.
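The two passes can be simulated on the CPU. This C sketch uses the counting variant of stencil-then-cover on a tiny grid with one sample per pixel. Each fan triangle from an anchor point adds +1 to the samples it covers if it is counterclockwise and -1 if it is clockwise. The cover pass then shades samples with a nonzero value. It illustrates the technique, not the NV_path_rendering API. The anchor, grid size and names are hypothetical, and the sketch assumes no sample lies exactly on a fan edge.

```c
#include <assert.h>

typedef struct { double x, y; } Vec2;

#define GRID_W 4  /* hypothetical framebuffer: one sample at each pixel center */
#define GRID_H 4

/* Twice the signed area of triangle (a, b, c): positive if counterclockwise. */
double cross3(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Strict point-in-triangle test that works for either vertex order. */
int in_triangle(Vec2 p, Vec2 a, Vec2 b, Vec2 c) {
    double d0 = cross3(a, b, p), d1 = cross3(b, c, p), d2 = cross3(c, a, p);
    return (d0 > 0 && d1 > 0 && d2 > 0) || (d0 < 0 && d1 < 0 && d2 < 0);
}

/* Pass 1: fan every edge from the anchor; the per-sample sum is the winding number. */
void stencil_pass(const Vec2 *poly, int n, Vec2 anchor, int stencil[GRID_H][GRID_W]) {
    assert(n >= 3);
    for (int y = 0; y < GRID_H; ++y)
        for (int x = 0; x < GRID_W; ++x) {
            Vec2 p = { x + 0.5, y + 0.5 };
            int s = 0;
            for (int i = 0; i < n; ++i) {
                Vec2 a = poly[i], b = poly[(i + 1) % n];
                double area = cross3(anchor, a, b);
                if (area != 0.0 && in_triangle(p, anchor, a, b))
                    s += area > 0.0 ? 1 : -1;
            }
            stencil[y][x] = s;
        }
}

/* Pass 2: "cover" by shading samples whose stencil value is nonzero; returns the count. */
int cover_pass(int stencil[GRID_H][GRID_W]) {
    int covered = 0;
    for (int y = 0; y < GRID_H; ++y)
        for (int x = 0; x < GRID_W; ++x)
            covered += stencil[y][x] != 0;
    return covered;
}
```

Samples outside the shape can fall inside fan triangles, but their +1 and -1 contributions cancel. Concave outlines such as an L shape come out right for the same reason. Each sample still only decides inside or outside, so smooth edges require multiple samples per pixel.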
What a Decade of Maintenance Looks Like
The retrospective is also honest about what sustained commercial development of a GPU library actually involves. Slug launched against OpenGL 4 and D3D11. The GPU API landscape since then includes Vulkan, Metal, DirectX 12, SPIR-V shader compilation, and early WebGPU support. Keeping the library functional and well-supported across three major GPU API generations while preserving stable application-facing interfaces is substantial engineering work, and it is the kind of work that commercial licensing makes possible where open source projects can stall.
The JCGT paper provides a fixed mathematical specification. The API surface around it has had to evolve considerably.
When the Cost Is Worth It
Per fragment, Slug costs more than SDF: real Bezier arithmetic versus a texture sample and a smoothstep. For most 3D scenes, text rendering is a small fraction of total fragment work, so this rarely matters in practice. The cost-benefit calculation shifts for mobile and bandwidth-constrained hardware, and for 2D applications rendering large amounts of small text at fixed sizes, where MSDF may be visually indistinguishable at lower implementation cost.
The use cases where exact coverage earns its cost are specific: 3D billboard text at arbitrary orientations and distances, zoomable document or map viewers where display scale is continuous, variable font animation, LCD panels where sub-pixel rendering matters, and VR, where effective pixel density is low enough that every coverage error is visible. These are the domains where approximation fails visibly and correctness is a product differentiator.
Ten years of Slug is partly a story about one library’s technical choices holding up. It is mostly a story about what happens when a rendering algorithm is designed around what the math requires rather than what the hardware of a given year can cheaply deliver.