The Documentation Gap: Why Godogen Needed a Quirks Database to Generate Godot Games

Retrieval-augmented generation over API documentation has become the standard response to LLM code generation failures in specialized domains: feed the model the current docs, reduce hallucinations, get better output. Godogen, a pipeline that generates complete playable Godot 4 projects from text prompts, applies this approach. Its reference system also includes something documentation retrieval cannot provide: a quirks database.

That database, encoding engine behaviors that do not appear in official documentation, turns out to be load-bearing. Understanding why reveals something worth generalizing beyond Godot.

What Documentation Actually Contains

Godot 4’s API reference is comprehensive. The engine’s XML source files document approximately 850 classes, their methods, properties, signals, and return types. For most of those classes, the documentation accurately describes what the API does.

What documentation does not contain is how the API behaves under conditions its authors did not think needed explaining. These conditions are often implicit invariants. Experienced Godot developers know them, and new developers discover them by debugging. An LLM whose knowledge of the API comes from documentation has no path to this second category of knowledge.

Godogen’s reference system has three layers to address this: a hand-written GDScript language specification, the full Godot 4 API documentation converted from the engine’s XML source, and the quirks database. The first two layers correspond to what documentation provides. The third layer corresponds to what only running Godot and observing its behavior teaches you.

The Owner Property as a Case Study

The clearest example of knowledge that documentation does not capture is the owner property requirement during headless scene construction.

Godogen builds Godot scenes by writing GDScript programs that construct the node graph in memory, pack it into a PackedScene, and serialize it with the engine’s ResourceSaver. This delegates format correctness to the engine. Generating raw .tscn text instead would be fragile, because the format has internal consistency requirements that generated text easily breaks.

During headless construction, every node added programmatically must have its owner property set explicitly to the scene root:

var node = Node2D.new()
scene_root.add_child(node)
node.owner = scene_root  # This line is required

Godot uses owner to determine which nodes belong to the serialized scene graph versus which are transient scene-tree members that should not persist. The distinction matters because scenes can contain nodes that are instantiated from sub-scenes, and their ownership relationships encode where they came from.

Omitting the owner assignment produces a failure mode that is easy to misinterpret. The node appears correct in memory during construction, and the save completes without any error message. When the saved .tscn file is reloaded, however, the node is missing, because the engine never serialized it. The program that built the scene reported success. The engine accepted the save operation. The output file exists and is syntactically valid. The node is simply not there when the file is opened.

The API reference does touch on ownership, but not in terms of the failure it causes. Node.owner and PackedScene.pack() describe ownership as what decides which nodes get packed. A note on Node.add_child() frames setting owner as something typically relevant to tool scripts and editor plugins. Nothing in the reference connects a headless builder script to the symptom of nodes vanishing on reload. That connection surfaces only through building headless scenes, noticing that nodes disappear, and working backwards to the cause. Experienced Godot developers who have done headless scene work know it. Developers working from documentation alone, including LLMs trained on documentation, typically do not.

This is exactly the category the quirks database is designed to capture.
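The three-line snippet earlier in this section shows only the assignment itself. The following is a minimal sketch of a complete builder, assuming a Godot 4 project directory and the engine's --headless --script command-line entry point. The script name, node names, and output path are hypothetical, and this is not Godogen's actual builder code. The sketch sets owner on every added node, packs and saves the scene, and then reloads it from disk while bypassing the resource cache, to confirm that the nodes persisted.

```gdscript
# build_level.gd: hypothetical builder script, run from a project directory with
#   godot --headless --script res://build_level.gd
extends SceneTree

const OUT_PATH := "res://level.tscn"  # hypothetical output path


static func build_level() -> Node2D:
	var root := Node2D.new()
	root.name = "Level"

	var player := CharacterBody2D.new()
	player.name = "Player"
	root.add_child(player)
	player.owner = root  # without this, PackedScene.pack() skips the node

	var shape := CollisionShape2D.new()
	shape.name = "Shape"
	shape.shape = RectangleShape2D.new()
	player.add_child(shape)
	shape.owner = root  # owner is the scene root, not the immediate parent

	return root


func _init() -> void:
	var root := build_level()
	var packed := PackedScene.new()
	var err := packed.pack(root)
	if err == OK:
		err = ResourceSaver.save(packed, OUT_PATH)
	print("save: ", error_string(err))

	# Reload from disk, ignoring the resource cache, to check what was serialized.
	var reloaded := ResourceLoader.load(OUT_PATH, "", ResourceLoader.CACHE_MODE_IGNORE) as PackedScene
	if reloaded:
		var inst := reloaded.instantiate()
		print("Player/Shape persisted: ", inst.has_node("Player/Shape"))
		inst.free()

	root.free()  # nodes outside the tree must be freed manually; frees children too
	quit()
```

Commenting out both owner assignments reproduces the failure described above. The save still reports OK, but the reload check prints false, because only the root node was packed.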

The Difference Between Documentation Knowledge and Operational Knowledge

API knowledge exists in two distinct layers. The first is documented behavior: what a function accepts, what it returns, what side effects it has under normal use. For most APIs, documentation covers this well.

The second layer is operational behavior: what happens under conditions the documentation authors did not think to specify, at phase boundaries in object lifecycles, when combining two features whose interaction is not explicitly documented, or when using the API in a context it was designed for but that has implicit prerequisites.

LLMs trained on code and documentation are strong on the first layer. The training corpus contains the documentation itself plus code examples that demonstrate typical use. The second layer requires an additional source: observations from running the code and examining the results.

The @onready annotation has the same two-layer character. The documentation accurately describes what it does: it defers a variable’s initialization until the node is ready, just before _ready() runs after the node enters the scene tree. What the documentation does not spell out is that the annotation does nothing during headless construction. The nodes being built are never added to a running scene tree, so they never become ready. The annotation is syntactically valid in a headless builder script and produces no compile error. The assignment simply never runs and the variable stays null. The generated game then fails at runtime in ways that are hard to trace back to this cause.

Teaching an LLM this constraint requires either showing it many examples of headless construction code that correctly avoids @onready, or explicitly documenting the constraint in a format the model can retrieve and reason from. Godogen chose the latter.
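The @onready behavior can be made concrete with a minimal sketch that runs headless in Godot 4. Probe is a hypothetical inner class standing in for a generated node script. It is built the way a headless builder builds nodes: children are attached with add_child(), but the node itself is never added to the running tree. The sketch also shows one headless-safe alternative, keeping the reference created during construction instead of resolving it later through the tree.

```gdscript
# onready_probe.gd: hypothetical demo, run with
#   godot --headless --script res://onready_probe.gd
extends SceneTree

# Stand-in for a generated node script that relies on @onready.
class Probe extends Node:
	@onready var child_ref: Node = get_node_or_null("Child")


func _init() -> void:
	var probe := Probe.new()
	var child := Node.new()
	child.name = "Child"
	probe.add_child(child)

	# probe never enters the running scene tree, so it never becomes ready
	# and the @onready initializer is never evaluated.
	print(probe.is_inside_tree())   # false
	print(probe.child_ref == null)  # true

	# Headless-safe pattern: use the reference created during construction.
	var child_ref := child
	print(child_ref.name)  # Child

	probe.free()  # also frees its child
	quit()
```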

Why Context Budget Shapes the Architecture

The full Godot 4 API reference runs to several megabytes of documentation across 850+ classes. Injecting this entire reference into the context window on every generation call is not feasible; the documentation alone would exhaust the budget before any game specification entered the prompt.

Godogen’s response is lazy-loading: the agent first determines which classes are relevant to the requested game type, then retrieves documentation only for those classes. A 2D platformer resolves to a predictable working set: CharacterBody2D, Sprite2D, CollisionShape2D, Area2D, Camera2D, AnimationPlayer, Timer, and the Input singleton. The documentation for this set is a fraction of the full reference.

The quirks database entries associated with those specific classes are loaded alongside their documentation. This means quirks are not a separate lookup; they are bundled with the relevant API documentation, delivered at the same point in the context where the model encounters the API they annotate.

This is structurally different from general-purpose retrieval from a vector store. Generic RAG over documentation chunks works for question answering, where the relevant information sits in a few passages. Code generation against a stateful runtime needs more: the model must know not just what an API does, but what it does in each phase of the runtime lifecycle. That requires the quirks annotations to sit next to the base documentation they qualify. If they were retrieved separately on the strength of query similarity, a prompt like “make a platformer” might never surface a note about node ownership.
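The lazy-loading and co-location scheme can be sketched in GDScript as well. Everything here is an assumption for illustration: the game-type table, the quirk wording, the docs directory and file naming, and the choice to inherit quirks down the class hierarchy via ClassDB.get_parent_class(). The source does not describe Godogen’s implementation at this level or say what language its retrieval code is written in.

```gdscript
# context_bundle.gd: hypothetical sketch, run with
#   godot --headless --script res://context_bundle.gd
extends SceneTree

# Hypothetical mapping from game type to the classes it needs.
const WORKING_SETS := {
	"platformer_2d": [
		"CharacterBody2D", "Sprite2D", "CollisionShape2D", "Area2D",
		"Camera2D", "AnimationPlayer", "Timer", "Input",
	],
}

# Hypothetical quirks entries, keyed by the class they annotate.
# Both notes paraphrase the two quirks discussed in this article.
const QUIRKS := {
	"Node": [
		"Headless construction: set owner = scene root after add_child(); PackedScene.pack() skips unowned nodes.",
		"Headless construction: @onready initializers never run because the node never enters the scene tree.",
	],
}


# Collects quirks for a class and all of its ancestors, so a note filed
# under Node also reaches CharacterBody2D (an assumption of this sketch).
static func quirks_for(cls: String) -> PackedStringArray:
	var notes := PackedStringArray()
	var current := cls
	while not current.is_empty():
		for note in QUIRKS.get(current, []):
			notes.append(note)
		current = String(ClassDB.get_parent_class(current))
	return notes


# Loads docs only for the working set and appends each class's quirks
# directly after its documentation, instead of retrieving them separately.
static func build_context(game_type: String, docs_dir: String) -> String:
	var parts := PackedStringArray()
	for cls in WORKING_SETS.get(game_type, []):
		# Hypothetical layout: one converted doc file per class; "" if missing.
		var doc := FileAccess.get_file_as_string(docs_dir.path_join(cls + ".md"))
		parts.append("## %s\n%s" % [cls, doc])
		for note in quirks_for(cls):
			parts.append("QUIRK (%s): %s" % [cls, note])
	return "\n\n".join(parts)


func _init() -> void:
	print(build_context("platformer_2d", "res://docs/classes"))
	quit()
```

Because each quirk is appended directly after its class’s documentation, the ownership note reaches the model whenever any Node subclass is in the working set, however the game prompt was worded.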

The Generalization Beyond Godot

Godot is not unusual in having a significant layer of operational knowledge not captured in its documentation. The property is common to any complex runtime with lifecycle phases, implicit invariants, or behaviors that emerge from the interaction of multiple subsystems.

Audio plugin APIs have it. VST3 and AU plugin formats have initialization sequences in which certain host callbacks are valid and others are not. Plugin authors learn these boundaries through debugging and forum posts rather than specification documents. GPU shader systems have it. The interaction between certain HLSL or GLSL features, driver implementations, and hardware architectures produces behaviors that are rarely documented but well known to graphics programmers. Embedded RTOS environments have it. FreeRTOS and Zephyr both restrict which API calls are legal from interrupt context. These restrictions are specified in prose documentation but not enforced at compile time, which makes them easy to violate and hard to debug.

Any LLM pipeline targeting these domains faces the same structural gap that Godogen ran into with Godot. Documentation-based retrieval closes the first layer of the gap. Closing the second layer requires a curated database of operational knowledge built from debugging, experimentation, and practitioner experience.

Building that database is engineering work distinct from writing documentation. It requires running the API under the conditions that generate failures, recognizing the pattern in the failure, and encoding the constraint in a form the model can retrieve and apply. Godogen’s author spent a year on this pipeline across four major rewrites. A significant portion of that investment went into learning which behaviors needed to be in the quirks database, by encountering the failures they produced and working back to their causes.

The result is not transferable in the way that documentation is. Godot’s XML API docs can be converted and reused by any project. The quirks database is specific to this pipeline, built through accumulated experience of headless Godot development. That specificity is not a limitation of the approach; it is the point. Operational knowledge is domain-specific by nature, and capturing it requires doing the work in the domain.