
What a Million-Line C++ Codebase Teaches You About Agentic Coding

Source: lobsters

The ClickHouse engineering team recently documented their experience with agentic coding, and it is one of the more useful data points available on how AI coding tools behave inside a large, genuinely complex systems project.

Most writing about agentic coding comes from web or mobile contexts, where the feedback loop is fast, the codebase is typically under a hundred thousand lines, and type errors surface in milliseconds. ClickHouse is a different category of problem. The project is written almost entirely in C++, spans well over a million lines of code in its GitHub repository, and implements a columnar database engine that includes its own storage subsystem, a vectorized query executor, JIT compilation via LLVM, and a distributed query planner. A clean build can take 30 to 60 minutes on modern hardware. What an agent runs into in that environment is therefore worth understanding in detail.

The Feedback Loop Problem

The hardest thing about applying agentic coding to large C++ projects isn’t context; it’s latency. Agents that edit code need some way to verify their changes. In JavaScript or Python, that means running a test or starting a dev server in a few seconds. In C++, it often means waiting for the compiler.

ClickHouse uses clang as its primary compiler and ships tooling around clang-tidy and clang-format for code quality. The project supports incremental builds, which helps enormously with agent workflows; you don’t need to rebuild the entire database engine to verify a change to one query function. But incremental C++ builds still routinely take several minutes per iteration, and when an agent makes a change that touches a widely-included header, the incremental build can cascade into something much longer.

The practical consequence is that an agentic loop that works well in a fast-feedback environment, where each cycle of writing, testing, and revising completes in seconds, needs to be rethought entirely. Teams working in large C++ projects often end up favoring agents for tasks where the verification step can be fast, such as running a specific unit test or checking compiler output for a small translation unit, rather than tasks that require full system builds to validate.

One strategy that helps is scoping the agent’s working area tightly. If you give an agent a clear boundary (a single function, file, or module) and build only that scope, you can bring the feedback loop down to something workable. The challenge is that C++ dependencies are often non-local: a change to a template in a header file can affect dozens of translation units, and an agent working on a single file may have no visibility into those downstream effects.
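To make that non-locality concrete, here is a minimal sketch that computes which translation units an incremental build would have to recompile after a header changes, by walking the include graph backwards from the edited file. The file names and the hand-built graph are hypothetical; a real build setup such as CMake with Ninja derives these dependencies from compiler-generated depfiles rather than from a table like this.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: each file maps to the headers it #includes directly.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Returns every .cpp translation unit that transitively includes `changed_header`,
// i.e. the TUs an incremental build would need to recompile.
std::set<std::string> affected_translation_units(const IncludeGraph& graph,
                                                 const std::string& changed_header) {
    // Invert the graph: header -> files that include it directly.
    std::map<std::string, std::vector<std::string>> included_by;
    for (const auto& [file, headers] : graph)
        for (const auto& h : headers)
            included_by[h].push_back(file);

    // Breadth-first walk from the changed header up through everything that includes it.
    std::set<std::string> visited{changed_header};
    std::queue<std::string> pending;
    pending.push(changed_header);
    std::set<std::string> tus;
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        auto it = included_by.find(current);
        if (it == included_by.end()) continue;
        for (const auto& includer : it->second) {
            if (!visited.insert(includer).second) continue;  // already reached via another path
            pending.push(includer);
            // Treat files ending in ".cpp" as translation units.
            if (includer.size() >= 4 &&
                includer.compare(includer.size() - 4, 4, ".cpp") == 0)
                tus.insert(includer);
        }
    }
    return tus;
}
```

Given a graph in which a low-level header is included by a widely used interface header, every `.cpp` file that reaches it through any path ends up in the result, even though none of those files were edited. An agent scoped to one of them sees none of that fan-out.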

Context at Scale

The second structural problem is context management. Current large language models have context windows measured in hundreds of thousands of tokens, which sounds like a lot until you consider that a single subsystem in ClickHouse can span tens of thousands of lines across dozens of files. The MergeTree family of table engines (MergeTree itself plus variants such as ReplicatedMergeTree and SummingMergeTree), which forms ClickHouse’s primary storage layer, is not something you can load into a context window and reason about holistically.
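A rough back-of-the-envelope check shows why. The sketch below estimates tokens with a crude 4-characters-per-token ratio and greedily admits files until a fixed budget is exhausted. Both the ratio and the file sizes are assumptions for illustration, since real tokenizers and real subsystems vary. With 40 hypothetical files of about 1,000 lines at roughly 40 characters per line, the subsystem comes to about 400,000 tokens, so a 200,000-token window holds only half of it before any conversation or tool output is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SourceFile {
    std::string path;
    std::size_t chars;  // file size in characters
};

// Crude estimate: assume roughly 4 characters per token (an assumption, not a real tokenizer).
constexpr std::size_t kCharsPerToken = 4;

std::size_t estimate_tokens(std::size_t chars) {
    return (chars + kCharsPerToken - 1) / kCharsPerToken;  // round up
}

// Greedily admits files in the given order until the next one would overflow the budget.
// Returns how many files fit; everything after that is invisible to the agent.
std::size_t files_that_fit(const std::vector<SourceFile>& files, std::size_t token_budget) {
    std::size_t used = 0;
    std::size_t count = 0;
    for (const auto& f : files) {
        std::size_t cost = estimate_tokens(f.chars);
        if (used + cost > token_budget) break;
        used += cost;
        ++count;
    }
    return count;
}
```

Ordering matters in a scheme like this: whichever files are admitted first crowd out the rest, which is one reason curated context (rather than "read everything") tends to work better at this scale.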

An agent working on a bug in query execution needs to understand the interface contracts between the query planner, the query pipeline, and the storage engine. None of those relationships are visible from any single file. The template-heavy C++ style common in performance-sensitive code doesn’t make things easier: the type system does a lot of work, but it spreads that work across many layers of abstraction that an agent has to trace. Template error messages in C++ are famously verbose and hard to parse even for experienced engineers; for an agent operating within a limited context budget, they’re a significant source of noise.
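A small, hypothetical example of how that layering looks: with the curiously recurring template pattern (CRTP), a common idiom in performance-sensitive C++, the code that actually runs for a call is split between a generic base template and a derived class, so reading either file alone does not tell you the full behavior. The class names here are invented for illustration and are not ClickHouse’s actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic layer: provides the batch loop but delegates per-row work to Derived.
// Reading only this template does not reveal what addRow actually does.
template <typename Derived>
class BatchHelper {
public:
    void addBatch(const std::vector<int64_t>& rows) {
        for (std::size_t i = 0; i < rows.size(); ++i)
            static_cast<Derived&>(*this).addRow(rows[i]);  // resolved at compile time, no virtual call
    }
};

// Concrete layer: defines the per-row behavior. Reading only this class
// does not reveal that addRow is driven by a loop defined elsewhere.
class SumOfPositives : public BatchHelper<SumOfPositives> {
public:
    void addRow(int64_t value) {
        if (value > 0) total_ += value;
    }
    int64_t total() const { return total_; }

private:
    int64_t total_ = 0;
};
```

If `SumOfPositives` misspelled `addRow`, the compiler would report the error inside `BatchHelper::addBatch`, at the point where the template is instantiated, one layer away from the actual mistake. Multiply that by several layers and the diagnostics become the kind of noise described above.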

Well-designed context injection helps here. Tools like Claude Code support repository-level context files (CLAUDE.md) that let teams encode architectural knowledge, naming conventions, and subsystem boundaries directly into the agent’s working context. For a project like ClickHouse, a well-maintained context document can do a lot to orient an agent before it starts reading individual files, telling it that storage engines implement a certain interface, that all query functions follow a particular threading model, or that certain header files are widely included and should be touched carefully.

A pattern shows up across large projects: codebases that were already well documented, with clear module boundaries and strong naming conventions, tend to work better with AI agents than codebases where that knowledge lived only in the heads of long-tenured engineers. ClickHouse has invested significantly in its developer documentation, which provides a foundation for this kind of context injection.

Where Agents Deliver

Given those constraints, the question becomes which tasks agents can handle reliably in a project like this. Teams working in large C++ codebases tend to describe a common task hierarchy.

At the top of the useful tier: writing tests. Test code tends to be self-contained and to follow repeatable patterns, and it rarely requires deep understanding of cross-subsystem invariants. An agent that understands the test framework and is given a concrete function to test can often produce useful coverage with minimal iteration. ClickHouse uses its own testing infrastructure with functional tests, unit tests, and performance tests; each category has patterns that an agent can learn and apply consistently.
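To illustrate why tests suit agents, here is the shape of a self-contained unit test an agent might produce for a small, pure function. Both `clampToRange` and `testClampToRange` are hypothetical, and plain `assert` stands in for whatever framework a real project uses. The point is that every case is derived from the function’s contract alone, with no knowledge of the rest of the system.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical function under test: clamps `value` into [lo, hi].
// Precondition: lo <= hi.
int64_t clampToRange(int64_t value, int64_t lo, int64_t hi) {
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

// Cases follow directly from the contract: inside, below, above,
// exactly on each boundary, a degenerate range, and the extremes of the type.
void testClampToRange() {
    assert(clampToRange(5, 0, 10) == 5);
    assert(clampToRange(-3, 0, 10) == 0);
    assert(clampToRange(42, 0, 10) == 10);
    assert(clampToRange(0, 0, 10) == 0);
    assert(clampToRange(10, 0, 10) == 10);
    assert(clampToRange(7, 7, 7) == 7);
    constexpr auto kMin = std::numeric_limits<int64_t>::min();
    constexpr auto kMax = std::numeric_limits<int64_t>::max();
    assert(clampToRange(kMin, -1, 1) == -1);
    assert(clampToRange(kMax, -1, 1) == 1);
}
```

A test like this compiles as a single small translation unit, so the verification step stays in the fast-feedback range even in a project whose full build takes the better part of an hour.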

One tier down: documentation, comments, and code explanation. Agents are strong at taking a function and producing a clear description of what it does, especially when they can see the full function body. For a project with the scope of ClickHouse, this represents a real productivity multiplier; there is always more internal code that deserves better documentation than anyone has time to write.
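As an illustration of the kind of output meant here, the sketch below shows a small utility with the sort of explanatory comment an agent can write from the function body alone. The function `roundUpToPowerOfTwo` is hypothetical; everything the comment states (edge cases, precondition, cost) is recoverable by reading the code itself.

```cpp
#include <cassert>
#include <cstdint>

/// Returns the smallest power of two that is greater than or equal to `x`.
///
/// - For x == 0 the result is 1.
/// - Precondition: x <= 2^63; larger inputs have no 64-bit power of two to round up to.
/// - Uses a simple doubling loop, so it runs in O(log x) iterations. Hot paths might
///   prefer a bit-manipulation version, or std::bit_ceil from <bit> in C++20.
uint64_t roundUpToPowerOfTwo(uint64_t x) {
    assert(x <= (uint64_t{1} << 63));
    uint64_t result = 1;
    while (result < x)
        result <<= 1;
    return result;
}
```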

Below that: small, well-scoped feature additions or refactors where the interface contract is clear and the compilation scope is narrow, such as adding a new configuration option, implementing a new aggregate function that follows an existing pattern, or refactoring a self-contained utility. These tasks work well because they’re constrained in both context requirements and compilation surface area.
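Here is a minimal sketch of what “following an existing pattern” can mean. It assumes a hypothetical, simplified aggregate-function interface with `add`, `merge`, and `result` steps (a stand-in, not ClickHouse’s real aggregate-function API) and an existing `Sum` implementation. A new `Max` function is added by copying the shape of `Sum`, without touching any other code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified interface: each function keeps a partial state that
// is updated row by row and can be merged with partial states built elsewhere
// (for example, by other threads).
class AggregateFunction {
public:
    virtual ~AggregateFunction() = default;
    virtual void add(int64_t value) = 0;
    virtual void merge(const AggregateFunction& other) = 0;
    virtual int64_t result() const = 0;
};

// Existing implementation that serves as the pattern.
class Sum : public AggregateFunction {
public:
    void add(int64_t value) override { sum_ += value; }
    void merge(const AggregateFunction& other) override {
        sum_ += static_cast<const Sum&>(other).sum_;  // assumes `other` is also a Sum
    }
    int64_t result() const override { return sum_; }

private:
    int64_t sum_ = 0;
};

// New function written by mirroring Sum: same three methods,
// different state and update rule.
class Max : public AggregateFunction {
public:
    void add(int64_t value) override { max_ = std::max(max_, value); }
    void merge(const AggregateFunction& other) override {
        max_ = std::max(max_, static_cast<const Max&>(other).max_);  // assumes `other` is also a Max
    }
    // With no rows added, returns the int64_t minimum (an arbitrary choice for this sketch).
    int64_t result() const override { return max_; }

private:
    int64_t max_ = std::numeric_limits<int64_t>::min();
};
```

The new class needs only the interface and one sibling implementation as context, and it compiles as a small, isolated unit. That combination is what makes this kind of task a good fit for an agent.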

Where agents consistently struggle: core engine changes, storage format modifications, distributed system behavior, and anything that requires deep understanding of the threading model or memory ownership semantics. These are tasks where the agent’s context budget, build feedback loop, and dependency awareness all hit their limits at the same time. The kind of change that a senior ClickHouse engineer carries in their head over a week of careful work, understanding how a modification propagates through the system, isn’t something current agents can replicate by reading files.

Designing for Agents

There’s a longer-term implication worth sitting with. If agentic coding tools become a standard part of how engineers work on large codebases, the properties that make a codebase agent-friendly, such as clear module boundaries, fast incremental builds, well-documented subsystem contracts, and low-surprise dependency graphs, start to look less like nice-to-haves and more like architectural requirements.

ClickHouse’s experience is instructive here because the codebase was not designed with AI agents in mind; it was designed for performance, for correctness, and for the humans who knew it deeply. Adapting it to agent-assisted workflows requires layering on the kind of contextual scaffolding that good software teams have always benefited from and now have a much stronger, more immediate reason to maintain. The incentive structure has shifted: a well-written architecture document or a carefully scoped module boundary pays dividends not just when onboarding new engineers, but on every agentic coding session going forward.

The ClickHouse blog post is worth reading as a ground-level account of what this looks like in practice at a project that operates near the limits of what current tools can handle. The broader lesson is that the limiting factors for agentic coding in large systems projects are mostly the same things that made those projects hard for humans: slow feedback, opaque dependencies, and knowledge that never made it into the documentation. The agents just make those shortcomings more immediate.
