Big-O as a Test Assertion: What bigoish Brings to Rust's Testing Story
Source: lobsters
There’s a class of performance regression that unit tests miss entirely and benchmarks catch only if you’re paying attention. An algorithm that was O(n log n) becomes O(n²) after a refactor. The tests still pass. The benchmark still runs. The numbers look similar at n=1000. But production is n=10,000,000, and now the service is crawling.
The bigoish crate on crates.io addresses this by treating complexity class as something you can measure and assert on, not just claim in a comment.
What the Crate Actually Does
The core idea is simple: run a function at several input sizes, record how long it takes at each, and fit the resulting (n, time) curve against a set of candidate complexity models: O(1), O(log n), O(n), O(n log n), O(n²), O(n³). The model that best fits the data wins.
The API centers on providing a closure that takes a usize input size and does the work. bigoish runs that closure across the size range you specify and reports back the inferred complexity class. The usage looks roughly like this:
use bigoish::*;
#[test]
fn test_sort_complexity() {
let complexity = measure(|n| {
let mut v: Vec<i32> = (0..n as i32).rev().collect();
v.sort();
})
.with_sizes([100, 1_000, 10_000, 100_000])
.run();
assert_eq!(complexity, Complexity::NLogN);
}
The important part is that last line. You’re asserting a complexity class the same way you assert a return value. If someone later replaces .sort() with a naïve O(n²) implementation, this test fails.
The Statistics Under the Hood
Getting from a list of (n, milliseconds) pairs to a complexity class involves two standard techniques, often used together.
The first is log-log regression. If runtime scales as T(n) = c * n^p, then log(T) = log(c) + p * log(n), which is a line in log-log space. Fit a line via ordinary least squares and the slope p is the exponent: slope ≈ 1.0 means O(n), slope ≈ 2.0 means O(n²), and so on. The goodness of fit (R²) tells you how well the candidate model explains the data.
The second technique is the doubling ratio: compute T(2n) / T(n) for successive pairs of doubled sizes. If that ratio converges to ~2, the algorithm is O(n). If it converges to ~4, it’s O(n²). O(n log n) produces a ratio slightly above 2, converging toward 2 as n grows.
Neither technique is foolproof. Both require that you sample enough orders of magnitude to see the asymptotic behavior. At small n, constant factors and lower-order terms dominate. A low-constant O(n²) insertion sort can look like O(n log n) in the 100–1000 range. Geometric spacing of your test sizes — 10, 100, 1000, 10000 rather than 10, 20, 30, 40 — is essential to span enough range for the regression to stabilize.
This is well-understood methodology. The academic groundwork for empirical algorithm analysis was laid in the 1990s and formalized in Catherine McGeoch’s A Guide to Experimental Algorithmics, which remains the standard reference for this kind of work. bigoish is a practical distillation of that methodology into a form that fits inside a Rust test suite.
This Isn’t New, But It’s Underused
The most mature prior implementation of this idea in a compiled language is Google Benchmark’s complexity estimation feature, which has been available since around 2015. You annotate a benchmark with candidate complexity classes and it prints the R² values alongside inferred scaling:
static void BM_StringCompare(benchmark::State& state) {
std::string s1(state.range(0), '-');
std::string s2(state.range(0), '-');
for (auto _ : state) {
benchmark::DoNotOptimize(s1.compare(s2));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
The Python ecosystem has had the big-O library since 2013, which takes a callable and a data generator and returns the best-fit complexity class with a clean one-liner:
import big_o
best, others = big_o.big_o(sorted, big_o.datagen.integers, min_n=100, max_n=100_000)
print(best) # O(n log n)
What’s notable is how rarely this kind of assertion shows up in typical CI pipelines, for Rust projects or otherwise. Most Rust codebases use Criterion.rs for benchmarking, which produces excellent statistical timing data but leaves the interpretation to the developer. Criterion tells you a function got 15% slower; it doesn’t tell you whether that function is still O(n) or has become O(n²). bigoish occupies a different, narrower role: it asserts on the shape of the curve rather than specific timing numbers.
Why Rust Specifically Benefits
One practical complication with empirical complexity testing in any language is that the compiler might optimize away your measured computation, producing artificially good results. Rust is aggressive about dead code elimination. If you measure a function and never use its return value, the optimizer can potentially elide the entire call.
The standard solution is std::hint::black_box, which prevents the compiler from treating a value as dead. Criterion.rs has used this for years. Any serious complexity testing crate needs to wrap its timed calls appropriately, or measured functions need to return something that the framework consumes through black_box.
On the positive side, Rust’s Fn/FnMut trait bounds make the framework API clean. A complexity testing function can accept impl FnMut(usize) and the type system handles the rest without virtual dispatch overhead. std::time::Instant gives nanosecond-precision monotonic timing without pulling in external dependencies.
The ownership model also creates a useful constraint: if your measured function captures state from the outer scope, Rust makes those ownership semantics explicit. There’s no accidental sharing between iterations in a way that would corrupt the timing signal, unlike Python where a mutable default argument or global could subtly bleed between runs.
The Real Limitations
Empirical complexity testing is a practical tool, not a proof system. It can tell you that your function behaves like O(n log n) for the input sizes and distributions you tested. It cannot tell you that it remains O(n log n) for all n or all inputs.
The input distribution dependency is worth taking seriously. A sort that’s O(n log n) on uniformly random integers might be O(n²) on adversarial inputs. Empirical testing only covers what you put in. If you measure complexity on random data and ship, you may still have a quadratic worst case waiting in production.
Distinguishing similar complexity classes is also genuinely hard. O(n log n) and O(n^1.1) are nearly indistinguishable in the range n = 10³ to n = 10⁵. You need several orders of magnitude, and even then the confidence is limited. A regression from O(n log n) to O(n^1.2) might not trigger a failure if your test sizes are too narrow.
None of this makes the tool useless. Catching a regression from O(n) to O(n²) in CI is valuable and achievable. The technique is well-suited for detecting gross complexity class changes: the kind that cause real production incidents. It is poorly suited for subtle superlinear regressions or for giving formal guarantees.
Where This Fits in a Rust Testing Stack
The practical complement to bigoish in a production Rust project looks like this: unit tests for correctness, Criterion for tracking absolute timing regressions over time via stored baselines, and bigoish for asserting that specific functions stay within their expected complexity class. They answer different questions.
Criterion’s stored baselines catch “this function got 20% slower.” bigoish catches “this function is now quadratic.” Both are worth having; they don’t overlap much.
For teams that ship data-structure-heavy or algorithm-heavy code, empirical complexity assertions in CI can prevent the category of incident where a data structure change looks innocuous at test scale but is catastrophic at production scale. The techniques behind bigoish have been known for decades. Having them available as a standard Rust test assertion, rather than a custom analysis script someone writes once and never runs again, is the concrete contribution.