61% Fewer Allocations: How Shopify Tuned the Liquid Template Engine

Back in March 2026, Simon Willison flagged a notable performance commit landing in Shopify’s Liquid gem: 53% faster parse+render, 61% fewer allocations. If you ship Ruby and you have never sat with a memory profiler watching objects pile up during template rendering, those two numbers might look like routine maintenance. They are not. The allocation figure is the more significant one, and understanding why requires a short detour into how Ruby’s runtime actually spends its time.

The Allocation Tax in Ruby

Ruby is a garbage-collected language with a generational mark-and-sweep GC. Every object you allocate lives on a managed heap. The GC has to mark reachable objects, sweep dead ones, and, when compaction is enabled or requested, move survivors to defragment the heap. The more objects you create per request, the more work the GC does, and the more it interferes with your actual application code.

For a web application rendering templates on every request, this matters a lot. Object allocation is not just about memory. It is about throughput, latency variance, and GC pause frequency. A template engine that creates 61% fewer objects per render is not just faster in a benchmark; it produces less GC pressure across the lifetime of a process, which means steadier latency under load.

Ruby has progressively improved its GC over the years. Ruby 2.1 introduced generational GC. Ruby 2.2 added incremental GC. Ruby 3.2 brought object shapes and enabled Variable Width Allocation by default, which reduces the cost of small objects. But none of these improvements make allocation free. The most reliable way to reduce GC pressure is to allocate fewer objects in the first place.
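To make the allocation tax concrete, here is a minimal sketch that counts objects with the interpreter's own cumulative counter, GC.stat(:total_allocated_objects). The helper name allocations_during and the sample data are made up for illustration; nothing here comes from the Liquid commit.

```ruby
# Hypothetical helper: report how many objects a block allocates.
# GC.stat(:total_allocated_objects) is a cumulative counter, so the
# difference before/after is the number of objects created in between.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

names = %w[shirt hat scarf]

# Interpolation builds a brand-new String on every iteration.
wasteful = allocations_during do
  1_000.times { |i| "product-#{names[i % 3]}" }
end

# Handing back an existing String allocates nothing per iteration.
frugal = allocations_during do
  1_000.times { |i| names[i % 3] }
end

puts "interpolating: #{wasteful} objects"
puts "reusing:       #{frugal} objects"
```

Exact counts vary by Ruby version, but the first figure should be at least one object per iteration and the second should be close to zero.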

How Liquid Works

Liquid is a two-phase engine. In the parse phase, raw template text is tokenized and assembled into an abstract syntax tree composed of Tag and Variable nodes. In the render phase, the engine walks that tree against a Context object that holds the current variable scope, and outputs a string.

The parse phase is typically called once and the result cached. The render phase runs on every request. This distinction matters for understanding where optimization effort pays off: parse-time savings compound only when you are not caching, while render-time savings apply to every request.
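The parse-once, render-many split can be sketched with a toy engine. This is a deliberately tiny, hypothetical stand-in (ToyTemplate and its cache are invented for illustration and only handle {{ name }} lookups); it is not how Liquid itself is implemented.

```ruby
# Toy two-phase engine (hypothetical; not Liquid's implementation).
class ToyTemplate
  # The capture group makes String#split keep the {{ ... }} delimiters.
  TOKEN = /(\{\{.*?\}\})/

  # Parse phase: turn raw text into a flat list of nodes. Done once.
  def self.parse(source)
    nodes = source.split(TOKEN).reject(&:empty?).map do |token|
      if token.start_with?("{{")
        [:variable, token[2..-3].strip] # drop the braces, keep the name
      else
        [:text, token]
      end
    end
    new(nodes)
  end

  # Cache parsed templates by source so repeat requests skip parsing.
  def self.cached(source)
    @cache ||= {}
    @cache[source] ||= parse(source)
  end

  def initialize(nodes)
    @nodes = nodes
  end

  # Render phase: walk the nodes against a scope. Done on every request.
  def render(scope)
    @nodes.each_with_object(+"") do |(kind, value), out|
      out << (kind == :variable ? scope.fetch(value, "").to_s : value)
    end
  end
end

template = ToyTemplate.cached("Hello {{ name }}!")
puts template.render("name" => "Ada")   # Hello Ada!
puts template.render("name" => "Grace") # Hello Grace!
```

The second render reuses the node list built by the first parse, which is why render-time costs dominate once templates are cached.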

Liquid’s architecture is deliberately conservative. Unlike ERB, which compiles templates to Ruby code strings and then evals them (producing native Ruby methods that the JIT can optimize), Liquid interprets its AST on every render. This is a security and sandboxing decision. Shopify’s storefronts run third-party Liquid templates, and eval-based rendering would make sandboxing far harder. The cost of that safety is that Liquid cannot amortize template logic into native code the way ERB can.

This means Liquid’s rendering performance is entirely determined by how efficiently it walks its AST and constructs output. Object allocation is the main variable under the engine’s control.

Where the Allocations Come From

A naive Ruby template engine creates a staggering number of objects during a single render. Consider what happens when Liquid evaluates a simple variable like {{ product.title }}:

The tokenizer produces a string token for the variable expression.
The parser creates a Variable node containing the parsed expression.
The renderer resolves product against the context, then traverses to title.
String operations on the output buffer may create intermediate strings.

Multiply this by hundreds of variable references, loop iterations over product arrays, and conditional blocks, and you start to see how a single template render can produce thousands of short-lived strings and objects that do nothing except get collected.

Common sources of unnecessary allocation in mature Ruby codebases include:

String literals that should be frozen. In Ruby, every time a "some_string" literal is evaluated in a file without the # frozen_string_literal: true magic comment, you get a new String object. With frozen string literals enabled, each literal is frozen and identical literals are deduplicated into the same object. For a template engine, this applies to tag names, operator strings, whitespace, and punctuation that appear in every template.
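A small sketch of the difference. The magic comment only takes effect at the top of a file, so this example uses .freeze directly on a literal instead, which gives the same shared frozen object for that one literal. The method names are hypothetical.

```ruby
# A plain literal is evaluated afresh on each call (assuming the file has
# no frozen_string_literal magic comment), so each call builds a new String.
def plain_tag_name
  "endfor"
end

# Calling .freeze directly on a literal lets Ruby hand back one shared,
# frozen String instead of allocating on every call.
def frozen_tag_name
  "endfor".freeze
end

puts plain_tag_name.equal?(plain_tag_name)   # false when the magic comment is absent
puts frozen_tag_name.equal?(frozen_tag_name) # true
puts frozen_tag_name.frozen?                 # true
```

equal? compares object identity, not content, so a true result means no second object was created.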

Intermediate strings during output construction. Using String#+ to build output creates a new string on each concatenation. The idiomatic alternative is String#<< (in-place concat) or using an explicit output buffer that accumulates chunks. Liquid’s rendering has historically used an output buffer approach, but the specifics of how tags append to it matter significantly.
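The gap between the two styles is easy to measure. The following is an illustrative sketch with made-up data and a hypothetical count_allocations helper; it does not reproduce Liquid's actual output buffer code.

```ruby
chunks = Array.new(500) { |i| "<li>item #{i}</li>" }

# Hypothetical helper built on the interpreter's cumulative allocation counter.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

with_plus = nil
plus_allocs = count_allocations do
  out = ""
  chunks.each { |chunk| out = out + chunk } # String#+ returns a new String each time
  with_plus = out
end

with_append = nil
append_allocs = count_allocations do
  buffer = +""                              # unary plus gives a mutable String
  chunks.each { |chunk| buffer << chunk }   # String#<< grows the same object in place
  with_append = buffer
end

puts with_plus == with_append # true: identical output either way
puts "String#+  allocated #{plus_allocs} objects"
puts "String#<< allocated #{append_allocs} objects"
```

Both loops produce the same output, but the first creates one throwaway String per chunk while the second keeps appending to a single buffer.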

Hash and Array construction during lookup. Resolving variable paths like product.variants.first.price requires traversal. If that traversal creates intermediate arrays or hash objects to hold partial results, those accumulate fast in loops.
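One way to picture this is a lookup that re-splits the path on every render versus one that walks segments split ahead of time. Both helpers below are hypothetical and only handle nested hashes; they are a sketch of the idea, not Liquid's lookup code.

```ruby
# Hypothetical lookup helpers for nested hashes; not Liquid's API.

# Splits the path on every call, allocating an Array plus one String
# per segment each time the variable is rendered.
def lookup_splitting(scope, path)
  path.split(".").reduce(scope) { |obj, key| obj.is_a?(Hash) ? obj[key] : nil }
end

# Walks segments that were split once, ahead of time, so the traversal
# itself creates no intermediate collections.
def lookup_presplit(scope, segments)
  obj = scope
  segments.each do |key|
    return nil unless obj.is_a?(Hash)
    obj = obj[key]
  end
  obj
end

scope = { "product" => { "variant" => { "price" => 1999 } } }

# In a real engine this split would happen during the parse phase.
segments = "product.variant.price".split(".").map(&:freeze).freeze

puts lookup_splitting(scope, "product.variant.price") # 1999
puts lookup_presplit(scope, segments)                  # 1999
```

Inside a loop over hundreds of products, the first version repeats the split work on every iteration while the second reuses the same frozen segment list.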

Redundant wrapping objects. Some engines wrap primitive values (integers, booleans, nil) in decorator objects to give them uniform interface methods. If those wrappers are created on every access rather than cached or avoided for simple types, the cost is proportional to template complexity.

The liquid-c Context

It is worth noting that Shopify runs liquid-c in production, a C extension that accelerates the lexer and tokenizer. The pure Ruby gem improvements described here are distinct from that work.

The pure Ruby path still matters. It is the default for anyone installing the liquid gem without the C extension. It runs in test suites. It runs on platforms where native extensions are unavailable or impractical. And because liquid-c only covers the lexer/scanner portion of the parse phase, improvements to the pure Ruby parser and renderer benefit all users, including those running with the C extension.

The fact that 53% gains were available in the pure Ruby path of a gem that Shopify has been running in production for over fifteen years says something about the nature of performance work in mature codebases. These are not “obvious” wins that got overlooked. They require profiling under realistic workloads, understanding which allocations are load-bearing and which are incidental, and systematic work to eliminate the incidental ones.

Comparison: ERB and the Compilation Trade-off

Ruby’s standard library ships with ERB, which takes a fundamentally different approach. ERB compiles templates to Ruby source code strings, then evals them into methods. A template like:

<h1><%= product.title %></h1>

becomes something like:

_buf = ''
_buf << '<h1>'
_buf << (product.title).to_s
_buf << '</h1>'
_buf

Once compiled and evaluated, this is ordinary Ruby bytecode. The YJIT compiler (introduced in Ruby 3.1 and substantially improved in 3.2 and 3.3) can JIT-compile it further. The allocation profile is minimal because you are essentially calling native methods on a buffer.
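You can inspect the generated code yourself: ERB#src returns the Ruby source that a template compiles to. The template and the title variable below are made up for illustration, and the exact generated source differs between ERB versions.

```ruby
require "erb"

template = ERB.new("<h1><%= title %></h1>")

# Print the Ruby source ERB generated for this template.
puts template.src

# Evaluate the compiled template with `title` supplied as a local variable.
puts template.result_with_hash(title: "Blue Hat") # <h1>Blue Hat</h1>
```

The printed source appends literal chunks and evaluated expressions to a single buffer variable, which is the shape shown in the simplified listing above.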

Liquid cannot do this because its sandboxing requirements forbid arbitrary Ruby execution. The template language is intentionally limited: no method calls beyond a whitelist of Drops, no eval, no require. This means every performance gain has to come from optimizing the interpreter loop itself, not from leaning on the Ruby runtime’s compilation machinery.

Haml and Slim take the compilation approach for HTML templates and are similarly fast for the same reasons. The lesson is that interpreted template engines have a steeper optimization hill to climb, but Liquid has legitimate reasons for staying on that hill.

What 61% Fewer Allocations Looks Like in Practice

At Shopify’s scale, this difference is meaningful. Their storefronts serve millions of requests per hour. Each request renders a Liquid template. A 61% reduction in allocations per render translates directly to reduced GC frequency, shorter GC pauses, and more consistent tail latencies.

For smaller users of the gem, including anyone using Jekyll, which uses Liquid for its templates, the gains show up in build times. A Jekyll site with hundreds of pages, each rendering multiple Liquid templates with loops and partials, can see meaningful reductions in generation time from allocation improvements. Jekyll’s build performance has been a persistent complaint in the static site community; anything that reduces allocation pressure during rendering compounds across the entire build.

The Liquid gem has millions of downloads per month. It underpins Shopify themes, Jekyll, GitHub Pages (which runs Jekyll), and a range of other Ruby projects that chose it for its sandboxing guarantees. Performance work at this layer propagates widely.

The Broader Pattern

Shopify’s Liquid optimization follows a pattern that shows up repeatedly in mature, high-traffic Ruby applications. The first wave of optimization is usually architectural: move work out of the hot path, cache aggressively, avoid redundant computation. The second wave is often compiler/extension work: liquid-c is exactly this. The third wave, which is what this commit represents, is systematic allocation reduction: audit every object the hot path creates, eliminate the unnecessary ones, and measure the result.

This third wave requires tooling. Ruby’s built-in ObjectSpace and gems like memory_profiler and allocation_tracer make it possible to see exactly which lines of code are responsible for which allocations. Running a template render through memory_profiler and reading the output is one of the most instructive exercises in Ruby performance engineering because it makes the allocation cost of ordinary-looking code concrete.
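As a taste of what that tooling shows, here is a minimal sketch using only the standard library's objspace extension to ask where an object was allocated. The render_row method is a hypothetical stand-in for a rendering hot path; memory_profiler produces a far more complete report along the same lines.

```ruby
require "objspace"

# Hypothetical hot-path method: the first + creates an intermediate String,
# the second + creates the String that is returned.
def render_row(name)
  "<td>" + name + "</td>"
end

report = []

ObjectSpace.trace_object_allocations do
  rows = %w[shirt hat].map { |name| render_row(name) }

  # While tracing is active, Ruby records the file and line of each allocation.
  rows.each do |row|
    report << [row,
               ObjectSpace.allocation_sourcefile(row),
               ObjectSpace.allocation_sourceline(row)]
  end
end

report.each do |row, file, line|
  puts "#{row} was allocated at #{File.basename(file.to_s)}:#{line}"
end
```

Every row points back to the line inside render_row, which is exactly the kind of evidence that tells you which ordinary-looking expression to rewrite.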

The result here, 53% faster parse+render from 61% fewer allocations, is a reasonable ratio. It suggests the allocations being eliminated were genuinely on the critical path, not just background noise. Short-lived objects that die young are reclaimed by cheap minor GCs and never reach an expensive major GC, so they cost less than objects that survive, but they still cost the allocation itself and bring the next minor GC closer. The ones being eliminated here were evidently costing real time.

For anyone maintaining a Ruby library or application with performance-sensitive rendering paths, this commit is a case study worth examining. Not for the specific techniques, but for the methodology: profile allocations under realistic load, find the unnecessary ones, eliminate them systematically, and measure the end-to-end impact rather than just the micro-benchmark.