
The Architectural Bet Inside Slug's Font Renderer

Source: hackernews

Eric Lengyel’s Slug library turned ten years old recently, and his retrospective has made the rounds. Most of the discussion lands on the obvious comparison: SDF versus analytic rendering, approximation versus exactness. That framing is correct but incomplete. Slug is not the only way to render fonts exactly on the GPU, and the more interesting question is why Lengyel structured it the way he did, specifically as a pure fragment shader approach, when at least three other architecturally distinct exact approaches existed or were emerging at the same time.

The answer reveals something about what the library was actually designed to do, and why a decade of platform maintenance has kept its fundamental approach unchanged while the surrounding graphics API landscape has transformed almost entirely.

Four Architectures for Exact GPU Text

By 2015, when Slug launched commercially, the GPU font rendering design space already had multiple branches. They shared a goal, rendering glyphs without pre-baking into texture atlases, but differed substantially in where the work happened inside the GPU pipeline.

Loop-Blinn (2005). Charles Loop and Jim Blinn’s paper from SIGGRAPH 2005 was the first practically oriented analytical GPU curve renderer. Their approach assigns each glyph a set of triangles on the CPU, encodes the Bezier curve implicit form into per-vertex texture coordinates, and then resolves inside/outside per fragment using a simple sign test on those coordinates. For a quadratic Bezier, the test amounts to evaluating whether u^2 - v < 0 at the interpolated texture coordinate pair (u, v), which the hardware evaluates trivially per fragment.

This is elegant and was ahead of its time. The shader cost is minimal: one multiply, one compare. The problem is in the geometry stage. Each glyph outline must be tessellated on the CPU into a specific triangle configuration where each triangle corresponds to one curve region. Compound glyphs with overlapping contours need careful handling to preserve the correct fill rule. Generating this geometry correctly for arbitrary TrueType outlines requires significant preprocessing logic, and the geometry must be regenerated for each distinct glyph shape. Antialiasing correct to a fraction of a pixel requires additional machinery that the original formulation did not provide cleanly.

Loop-Blinn trades fragment shader cost for geometry preprocessing cost and CPU-side tessellation complexity. For the GPU hardware of 2005, that trade was favorable. By 2015, fragment shader throughput had grown substantially relative to geometry throughput, and the tessellation complexity had become the practical obstacle.
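The per-fragment sign test can be checked on the CPU. The following is a minimal Python sketch, not shader code, and assumes the canonical texture coordinates from the Loop-Blinn paper, (0, 0), (1/2, 0), and (1, 1), assigned to the three control points of a quadratic. The function names are illustrative, and the barycentric weights stand in for the interpolation the rasterizer would perform.

```python
# Canonical Loop-Blinn texture coordinates for the control points P0, P1, P2
# of a quadratic Bezier. Names here are illustrative, not from any library.
CANONICAL_UV = ((0.0, 0.0), (0.5, 0.0), (1.0, 1.0))


def interpolate_uv(w0, w1, w2):
    """Barycentric interpolation of (u, v), as the rasterizer would do per fragment."""
    weights = (w0, w1, w2)
    u = sum(w * uv[0] for w, uv in zip(weights, CANONICAL_UV))
    v = sum(w * uv[1] for w, uv in zip(weights, CANONICAL_UV))
    return u, v


def implicit_value(w0, w1, w2):
    """Value of u^2 - v at the interpolated coordinates; zero exactly on the curve."""
    u, v = interpolate_uv(w0, w1, w2)
    return u * u - v


def loop_blinn_inside(w0, w1, w2):
    """The sign test: negative means the chord side of the curve."""
    return implicit_value(w0, w1, w2) < 0.0
```

A point on the curve at parameter t has weights ((1-t)^2, 2t(1-t), t^2), which interpolate to (u, v) = (t, t^2), so u^2 - v is zero there. The midpoint of the chord P0-P2 tests as inside and the control point P1 tests as outside.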

Stencil-Then-Cover (NV_path_rendering, 2011). NVIDIA’s NV_path_rendering extension implemented a two-pass approach called stencil-then-cover. In the first pass, the glyph outline is rendered into the stencil buffer using a fan of triangles from a single origin point; each triangle that covers a pixel flips that pixel’s stencil bit, which implements the even-odd fill rule (incrementing and decrementing the stencil value instead gives the nonzero winding rule). In the second pass, a bounding quadrilateral reads the stencil to determine which pixels lie inside the glyph and writes the final color. Antialiasing is handled by the GPU’s multisample hardware.

This approach produces very high quality results and handles complex glyph geometry correctly. Skia used a related approach in its GPU backend, and the stencil-then-cover idea underlies several other GPU path rendering systems. The limitations are structural: it requires two rendering passes, it uses the stencil buffer in a way that can conflict with other rendering operations in a complex scene, and the GL extension version depends on vendor-specific functionality. Deploying NV_path_rendering in a game engine meant taking a hard dependency on NVIDIA hardware, or maintaining a fallback path for everything else.

For a standalone document rendering application, these constraints are manageable. For a game engine shipping across desktop, mobile, and console with a Vulkan-first pipeline and no stencil buffer reserved for font use, they are not.
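To make the two passes concrete, here is a small CPU simulation in Python of the even-odd variant for a single pixel. It is a sketch under simplifying assumptions: the outline is a plain polygon (no curves), the fan origin is fixed at (0, 0), and the function names are hypothetical rather than part of any real API.

```python
def _edge(a, b, p):
    """Signed area test: positive if p is to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def point_in_triangle(p, a, b, c):
    """True if p lies inside triangle abc, regardless of its winding order."""
    d0, d1, d2 = _edge(a, b, p), _edge(b, c, p), _edge(c, a, p)
    has_neg = d0 < 0 or d1 < 0 or d2 < 0
    has_pos = d0 > 0 or d1 > 0 or d2 > 0
    return not (has_neg and has_pos)


def stencil_pass(polygon, pixel, origin=(0.0, 0.0)):
    """Pass 1: flip the pixel's stencil bit once per fan triangle that covers it."""
    stencil = 0
    for i in range(len(polygon)):
        a, b = polygon[i], polygon[(i + 1) % len(polygon)]
        if point_in_triangle(pixel, origin, a, b):
            stencil ^= 1
    return stencil


def cover_pass(polygon, pixel, color, background):
    """Pass 2: the covering quad writes color only where the stencil bit is set."""
    return color if stencil_pass(polygon, pixel) else background


SQUARE = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
```

For the square above, a pixel at (2, 2.5) is covered by exactly one fan triangle and ends with the bit set, while a pixel at (0.5, 0.6), between the origin and the square, is covered by two triangles that cancel. Pixels that land exactly on a triangle edge are not handled carefully here; real rasterizers resolve those with fill conventions.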

Per-Fragment Ray Casting (Slug, ~2015). Slug moves the entire problem to the fragment shader. No geometry preprocessing per glyph at runtime. No stencil buffer involvement. The glyph outline data lives in a GPU buffer, organized per glyph into horizontal bands that allow the shader to quickly identify which curve segments are relevant to the current fragment row. The shader loads those segments, evaluates Bezier intersections analytically using the winding number rule, and integrates coverage over the fragment area. For a quadratic TrueType Bezier, this means solving a quadratic per segment:

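// Find t where the Bezier y-component equals the sample y.
// y(t) = a*t^2 + b*t + P0.y, so solve a*t^2 + b*t + c = 0.
float a = P0.y - 2.0 * P1.y + P2.y;
float b = 2.0 * (P1.y - P0.y);
float c = P0.y - sample_y;
float t1 = -1.0, t2 = -1.0;  // -1 marks "no crossing"
if (abs(a) < 1e-6) {
    // Nearly linear in y: at most one root, none if the segment is horizontal
    if (abs(b) > 1e-6) t1 = -c / b;
} else {
    float disc = b * b - 4.0 * a * c;
    if (disc >= 0.0) {
        // Real roots
        float s = sqrt(disc);
        t1 = (-b - s) / (2.0 * a);
        t2 = (-b + s) / (2.0 * a);
    }
}
// For each root in [0, 1]: evaluate x(t), accumulate signed winding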

For OpenType CFF fonts using cubic curves, the same approach extends in principle to solving a cubic polynomial per segment (or the cubics can be converted to quadratics ahead of time), more expensive but still analytical. The band acceleration structure is what makes this practical: rather than iterating over every curve in a glyph for every fragment, the shader only visits curves whose y-extent overlaps the current fragment row. For typical Latin glyphs this bounds the iteration to three to six segments per fragment on average.
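The band idea can be sketched in a few lines of Python. This is an illustrative CPU-side version, not Slug's actual data layout: the function names, the equal-height bands, and the list-of-indices representation are all assumptions. Each curve is binned by the y-range of its control points, which is a conservative bound because a Bezier curve lies inside the convex hull of its control points.

```python
def _band_index(y, y_min, y_max, band_count):
    """Map a y coordinate to a band index, clamped to the valid range."""
    height = (y_max - y_min) / band_count
    return max(0, min(band_count - 1, int((y - y_min) / height)))


def build_bands(curves, y_min, y_max, band_count):
    """Preprocessing: list, per horizontal band, the curves that may cross it."""
    bands = [[] for _ in range(band_count)]
    for index, curve in enumerate(curves):
        ys = [point[1] for point in curve]
        first = _band_index(min(ys), y_min, y_max, band_count)
        last = _band_index(max(ys), y_min, y_max, band_count)
        for band in range(first, last + 1):
            bands[band].append(index)
    return bands


def curves_for_sample(bands, y_min, y_max, sample_y):
    """Per-fragment lookup: only these curve indices need the root solve."""
    return bands[_band_index(sample_y, y_min, y_max, len(bands))]


# Three quadratic segments as (P0, P1, P2) control points.
CURVES = [
    ((0.0, 0.0), (1.0, 1.0), (2.0, 0.0)),  # stays within y in [0, 1]
    ((0.0, 3.0), (1.0, 4.0), (2.0, 3.0)),  # stays within y in [3, 4]
    ((3.0, 0.0), (3.0, 2.0), (3.0, 4.0)),  # spans the full height
]
BANDS = build_bands(CURVES, 0.0, 4.0, 4)
```

With four bands over a glyph of height 4, a sample at y = 2.5 only has to consider the third curve, while a sample at y = 0.5 considers the first and third.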

The formal algorithm is in Lengyel’s 2017 JCGT paper, which provides a peer-reviewed description of both the winding number integration and the coverage computation. The peer-reviewed publication matters for commercial licensing: it gives integrators an auditable mathematical foundation rather than just a black box.
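As a rough illustration of the winding-number part only, the Python sketch below casts a ray toward +x from a sample point and sums signed crossings over quadratic segments. It is a point-sample test written for this article's discussion, not the coverage computation from the paper; the function names and the half-open parameter interval [0, 1) used to avoid double counting at shared endpoints are assumptions of this sketch.

```python
import math


def solve_quadratic(a, b, c, eps=1e-12):
    """Real roots of a*t^2 + b*t + c = 0, falling back to the linear case."""
    if abs(a) < eps:
        return [] if abs(b) < eps else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]


def winding_number(curves, sx, sy):
    """Signed count of curve crossings along the ray from (sx, sy) toward +x."""
    winding = 0
    for (x0, y0), (x1, y1), (x2, y2) in curves:
        a = y0 - 2.0 * y1 + y2
        b = 2.0 * (y1 - y0)
        c = y0 - sy
        for t in solve_quadratic(a, b, c):
            if not (0.0 <= t < 1.0):
                continue
            x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * x1 + t * t * x2
            if x <= sx:
                continue  # crossing is behind the ray origin
            dy = 2.0 * a * t + b  # direction of travel in y at the crossing
            if dy > 0.0:
                winding += 1
            elif dy < 0.0:
                winding -= 1
    return winding


# A closed shape: an arch from (0, 0) over (1, 2) to (2, 0), closed by a flat base.
ARCH = [
    ((0.0, 0.0), (1.0, 2.0), (2.0, 0.0)),
    ((2.0, 0.0), (1.0, 0.0), (0.0, 0.0)),
]
```

A nonzero result means the sample is inside. For the arch, the point (1, 0.5) sees one downward crossing and gets winding -1. Points to the right of the shape, above its apex, or to its left (where the two crossings cancel) all get zero.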

Compute-Shader Tiling (Vello, ~2019-present). Raph Levien and contributors at the Linebender project have developed Vello, which takes a fundamentally different architecture again. Rather than evaluating coverage per fragment in a fragment shader, Vello uses a pipeline of compute shaders that first assign scene elements to screen tiles, then process each tile to produce a coverage buffer, then composite the result. This is closer to software rasterization on the GPU than to either Slug or Loop-Blinn.

The tile-based approach is highly parallel and scales well to complex scenes with many vector elements. It processes entire scenes as batches rather than drawing individual glyphs one at a time. The trade-off is that it requires compute shader support and operates best in a 2D scene rendering context where the entire scene is submitted together.
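The first step of such a pipeline, assigning elements to screen tiles, can be sketched as follows. This is a deliberately simplified serial Python version using bounding boxes; it is not Vello's code, and the function name, the 16-pixel tile size, and the dictionary layout are assumptions made for illustration. A real implementation runs this binning in parallel in compute shaders and works with path segments rather than whole bounding boxes.

```python
import math


def bin_to_tiles(bboxes, width, height, tile_size=16):
    """Map each tile (tx, ty) to the indices of elements whose bbox overlaps it."""
    tiles_x = math.ceil(width / tile_size)
    tiles_y = math.ceil(height / tile_size)
    tiles = {}
    for index, (x0, y0, x1, y1) in enumerate(bboxes):
        # Clamp the covered tile range to the screen.
        tx0 = max(0, math.floor(x0 / tile_size))
        ty0 = max(0, math.floor(y0 / tile_size))
        tx1 = min(tiles_x - 1, math.floor(x1 / tile_size))
        ty1 = min(tiles_y - 1, math.floor(y1 / tile_size))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles.setdefault((tx, ty), []).append(index)
    return tiles


# Two elements on a 64x32 target: one inside a single tile, one straddling four.
TILES = bin_to_tiles([(0, 0, 10, 10), (10, 10, 20, 20)], 64, 32)
```

Each tile can then be processed independently with only its own element list, which is where the batch-oriented parallelism comes from.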

Why the Fragment Shader Makes Sense for Games

The comparative picture makes Slug’s choice clearer. If your application is a 2D document renderer or a desktop GUI framework, Vello’s compute approach or stencil-then-cover are architecturally appropriate: you control the entire rendering pipeline, you can reserve the stencil buffer, and you benefit from batch processing an entire document’s worth of glyphs together.

If your application is a 3D game engine, the constraints are different. Text appears on surfaces, on billboards, on HUD elements rendered at unpredictable scales and viewing angles. The rendering pipeline is not yours to redesign around font rendering: it belongs to the frame graph, with depth buffers, shadow passes, and particle systems all competing for the same GPU resources. The stencil buffer is occupied. Geometry preprocessing must be minimal because glyphs may appear on objects with arbitrary transforms.

A pure fragment shader approach fits cleanly into this context. Bind the glyph buffer, draw a quad for each glyph, and let the shader resolve the outline. No extra passes, no stencil involvement, no CPU tessellation per frame. The shader cost is higher than SDF, but text is typically a small fraction of total fragment work in a 3D scene, and the coverage stays exact at whatever scale the scene requires.

This is also why Slug’s support for sub-pixel antialiasing matters here. An LCD display’s sub-pixel structure means the R, G, and B samples are horizontally offset from each other by one-third of a pixel. A fragment shader evaluating Bezier coverage analytically can compute separate coverage values for each sub-pixel position, producing LCD-quality antialiasing that SDF cannot replicate without significant additional processing. The difference is most visible at small text sizes, where sub-pixel rendering is noticeably sharper than standard antialiasing.
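A toy example shows what separate per-channel coverage means. The Python sketch below assumes an RGB-striped panel, a shape that fills everything to the left of a single vertical edge, and a simple box filter one-third of a pixel wide per channel. The function names are hypothetical, and production renderers typically apply wider filters to limit color fringing.

```python
def box_coverage(edge_x, left, right):
    """Fraction of the interval [left, right] that lies left of a vertical edge."""
    covered = min(max(edge_x - left, 0.0), right - left)
    return covered / (right - left)


def subpixel_coverage(edge_x, pixel_x):
    """Coverage for the R, G, B stripes of the pixel starting at pixel_x."""
    third = 1.0 / 3.0
    return tuple(
        box_coverage(edge_x, pixel_x + i * third, pixel_x + (i + 1) * third)
        for i in range(3)
    )


def grayscale_coverage(edge_x, pixel_x):
    """Whole-pixel coverage, for comparison with the per-channel values."""
    return box_coverage(edge_x, pixel_x, pixel_x + 1.0)
```

For an edge halfway through a pixel, whole-pixel antialiasing reports a single coverage of 0.5, while the per-channel version reports red fully covered, green half covered, and blue uncovered, which places the edge at three times the horizontal resolution.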

The Decade Confirms the Bet

The graphics API landscape between 2015 and 2026 changed dramatically: OpenGL to Vulkan, DirectX 11 to DirectX 12, Metal, SPIR-V, WebGPU. Lengyel’s retrospective describes the sustained engineering effort of porting across all of these while keeping the API stable enough for commercial integrators. The core algorithm required no changes through all of this. The winding number is not an API-specific concept.

This is the structural advantage of putting complexity into mathematics rather than into the rendering pipeline. Pipeline-dependent approaches, those that require stencil buffers, geometry stages, or specific extension support, must adapt when pipelines change. A fragment shader evaluating quadratic intersection arithmetic adapts to any platform that runs shaders, which by 2026 means every platform worth shipping on.

The OpenType specification also kept moving. Variable fonts in 2016 added continuous design-space interpolation between outline masters; color fonts added layered and SVG-based glyphs. For Slug, variable font support involves interpolating Bezier control points before uploading to the GPU, which composes naturally with the existing algorithm. Interpolated Bezier curves are still Bezier curves:

P_interpolated = (1 - t) * P_master_a + t * P_master_b

For an atlas-based approach, each position in the variable font design space potentially requires a separately precomputed atlas entry. The atlas-based pipeline does not compose with continuous design spaces.
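The interpolation formula translates directly into code. The Python sketch below blends two compatible masters along a single axis, matching the formula as written; it is a simplification, since real variable fonts store per-point deltas over possibly several axes, and the function name and outline representation (a list of quadratic segments, each a tuple of three control points) are assumptions.

```python
def interpolate_outline(master_a, master_b, t):
    """Blend two outlines with identical structure; t=0 gives master_a, t=1 master_b."""
    if len(master_a) != len(master_b):
        raise ValueError("masters must contain the same number of curves")
    return [
        tuple(
            ((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
            for (ax, ay), (bx, by) in zip(curve_a, curve_b)
        )
        for curve_a, curve_b in zip(master_a, master_b)
    ]


# Two one-curve "masters" that differ only in the height of the control point.
LIGHT = [((0.0, 0.0), (1.0, 2.0), (2.0, 0.0))]
BOLD = [((0.0, 0.0), (1.0, 4.0), (2.0, 0.0))]
MEDIUM = interpolate_outline(LIGHT, BOLD, 0.5)
```

The output has the same shape as the input, a list of quadratic control points, so it can be uploaded and rendered by the same per-fragment solver without any change to the shader.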

Where This Leaves the Design Space

None of these four approaches is universally correct. Loop-Blinn made sense in 2005 on the hardware of 2005. Stencil-then-cover makes sense today for applications that can structure their pipeline around it. Vello makes sense for 2D document and UI rendering in the Rust ecosystem. Slug makes sense for 3D rendering contexts that cannot reserve the stencil buffer, require text at unpredictable scales, and need to integrate into an existing multi-pass game rendering pipeline with minimal friction.

The decade Slug has spent in production is evidence of two things: that the approach is technically sound, and that the market segment it addresses, games and graphics-heavy applications needing high-quality text without preprocessing, is real and not well-served by the alternatives. The algorithm was formally published, the platform support is broad, and the maintenance is active in a way that open source alternatives without commercial backing are not always guaranteed to be.

Ten years is a long time in GPU programming. That the fundamental approach has not changed, while everything around it has, is the kind of validation that only time can provide.
