
GDScript Is a Harder LLM Target Than C# or C++, and the Gap Is Structural

Source: hackernews

Godogen is a pipeline that takes a text prompt and produces a complete, playable Godot 4 project: architecture, 2D/3D assets, GDScript source, and a working scene graph. A year of development and four major rewrites went into making that output reliable. The engineering investments it required are diagnostic, and they become more informative when you compare them to what a similar pipeline targeting Unity or Unreal would actually need.

The short version is that GDScript is a harder LLM generation target than C# or C++, and that difficulty is structural rather than incidental. Understanding why clarifies what Godogen had to build and suggests what any serious LLM pipeline for an engine-specific language needs to address.

The Training Data Problem Is Not Uniform Across Engines

The dominant factor in LLM code generation quality for any language is representation in the pretraining corpus. This is not a subtle effect; it is the primary variable. Languages that appear in billions of lines of public code are well-understood by current models. Languages that appear in millions are much less reliably generated. Languages that appear in thousands, or that are locked behind engine-specific repositories with shallow GitHub representation, require explicit compensation.

Unity’s scripting language is C#. C# has decades of training data from enterprise software, web backends, open source tooling, and game development that have nothing to do with Unity. When a model generates a MonoBehaviour subclass, it draws on that entire general-purpose C# corpus. The Unity-specific layer (Transform, Rigidbody, Time.deltaTime) is thinner, but the language substrate is well-understood. A model generating Unity code might hallucinate the exact signature of an engine API, but it will produce C# that is structurally correct and idiomatic.

Unreal Engine uses C++ as its primary programming language, with the visual Blueprint system for logic that designers typically author. C++ has an even larger training corpus than C#. Unreal’s conventions are idiosyncratic: the macro system (UCLASS, UPROPERTY, UFUNCTION), the smart pointer types (TWeakObjectPtr, TSharedPtr), and the module build system all have sparse representation relative to general C++. But the language itself is solid ground. A model generating Unreal C++ can lean on decades of C++ knowledge and produce structurally valid code even when it gets engine-specific details wrong.

GDScript has no analogous general-purpose foundation to lean on. It is purpose-built for Godot and exists essentially nowhere else. The only meaningful training data is GDScript itself, and that corpus is both recent and split by a significant API migration. Godot 4 changed enough from Godot 3 that training data spanning both versions is actively harmful: a model interpolating between versions produces code that satisfies neither.

The specific problem is that the Godot 3 to Godot 4 transition was a comprehensive API revision. KinematicBody2D became CharacterBody2D. The signal connection API changed from a string-based system to first-class signal objects. move_and_slide() changed from accepting a velocity vector argument to reading velocity as a property on CharacterBody2D. These are not subtle deprecations; they are complete behavioral changes that produce runtime errors when mixed.

# Godot 3 signal connection
$Timer.connect("timeout", self, "_on_timer_timeout")

# Godot 4 signal connection
$Timer.timeout.connect(_on_timer_timeout)

A model that has seen both forms in its training data will blend them. The result is valid in neither version.
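The move_and_slide() change shows the same pattern. A sketch of the two forms, assuming a minimal platformer movement script:

```gdscript
# Godot 3: velocity is a script-local variable passed in and returned
velocity = move_and_slide(velocity, Vector2.UP)

# Godot 4: velocity is a built-in property on CharacterBody2D;
# move_and_slide() takes no arguments and reads the property directly
velocity.y += gravity * delta
move_and_slide()
```

A model interpolating between the two will happily pass a vector to the Godot 4 form, which fails at runtime with an argument-count error.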

The Python Trap Is Specific to GDScript

The syntax similarity between GDScript and Python creates a failure mode that has no equivalent in C# or C++ generation. When a model doesn’t know a C# method name, it tends to produce something obviously wrong: a method call that doesn’t exist, a type mismatch that the compiler catches immediately. The failure is legible.

When a model doesn’t know a GDScript method name, it reaches for the Python equivalent. The result is syntactically plausible. It passes a visual inspection. It fails at runtime in ways that require understanding the distinction between GDScript and Python to diagnose.

Godogen describes this as the central training data problem: approximately 850 classes in the GDScript API, with Python-like syntax that encourages confident wrong answers. Confident wrong answers are harder to debug than obviously wrong answers, because the failure signal is more ambiguous. A null reference from a missing node is easier to attribute to the right cause than a runtime type error from a Python idiom that GDScript evaluates differently.
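One illustrative case (my example, not one from the Godogen write-up): a model that reaches for Python’s dict.items() produces code that looks right but dies at runtime, because a GDScript Dictionary is iterated by key:

```gdscript
var scores = {"alice": 3, "bob": 5}

func print_scores():
	# Python habit: for key, value in scores.items(): ...
	# Dictionary has no items() method, so that fails at runtime
	# with a "Nonexistent function 'items'" error.
	# In GDScript, iterating a Dictionary yields its keys:
	for key in scores:
		print(key, ": ", scores[key])
```

The broken form is syntactically plausible, which is exactly what makes it expensive to debug.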

The response Godogen built is a structured reference system: a hand-written language specification, the full Godot 4 API documentation converted from Godot’s XML source files, and a quirks database for behaviors that documentation doesn’t cover at all. The lazy-loading architecture, which pulls in only the specific API classes relevant to a given task rather than injecting the full 850-class reference at once, is a practical necessity: the full reference would exhaust the context budget before generation begins.
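The post does not show the loader itself; as a sketch of the lazy-loading idea only (the res://api_docs/ file layout here is an assumption, not Godogen’s), a per-class cache that pulls in one reference file on demand might look like:

```gdscript
# Illustrative sketch: load one API reference file per class, on demand,
# instead of injecting all ~850 class docs into the context at once.
var _doc_cache := {}

func get_class_reference(cls: String) -> String:
	if not _doc_cache.has(cls):
		# Hypothetical file layout: one text file per Godot class.
		var path := "res://api_docs/%s.txt" % cls
		_doc_cache[cls] = FileAccess.get_file_as_string(path)
	return _doc_cache[cls]
```

The point is the access pattern, not the storage: only the classes a generation task actually touches consume context budget.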

This is retrieval-augmented generation applied to a structured API catalogue, and it is something a Unity or Unreal pipeline would need in a less severe form. The C# and C++ foundations are stable enough that model knowledge is a reasonable starting point. The GDScript foundation requires explicit correction before generation can produce reliable output.

Scene Serialization as an Engine-Specific Constraint

Godot’s .tscn scene format is a text serialization designed to be written by the Godot editor, not by humans or generative systems. It uses internal cross-references between sections, integer resource identifiers that must remain consistent, and external resource path references that break silently when the project structure changes. A single malformed line prevents the scene from loading. Generating this format directly as text is fragile in the way that generating XML by string concatenation is fragile: the format’s invariants are easy to violate and the errors they produce are not always informative.
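For a sense of those invariants, a minimal Godot 4 scene file looks roughly like this (ids and uid are placeholders; every ExtResource(...) reference must match an ext_resource id exactly):

```
[gd_scene load_steps=2 format=3 uid="uid://placeholder"]

[ext_resource type="Script" path="res://player.gd" id="1_script"]

[node name="Player" type="CharacterBody2D"]
script = ExtResource("1_script")

[node name="Sprite" type="Sprite2D" parent="."]
```

Break any one of the cross-references and the scene refuses to load, usually with an error that points at the symptom rather than the cause.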

Unity stores scenes as YAML serialized text, which has similar fragility, but Unity’s programmatic scene construction via Editor scripting has a different character. You can build and modify scenes using Editor APIs without touching the serialized format at all, and the Unity Editor’s play mode gives you a live scene tree to work against.

Godogen builds scenes using Godot’s headless runtime API instead of generating .tscn text directly. A GDScript program constructs the node hierarchy in memory and lets the engine serialize it, delegating format correctness to the engine itself. This is the right approach, but it introduces the build-time versus runtime distinction that Unity’s equivalent workflow does not impose as severely.
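A minimal sketch of that flow, assuming a trivial two-node scene built by a script run under godot --headless:

```gdscript
# Build the hierarchy in memory, then let the engine serialize it.
var scene_root = Node2D.new()
scene_root.name = "Main"

var player = CharacterBody2D.new()
player.name = "Player"
scene_root.add_child(player)
player.owner = scene_root  # without this, the node is not serialized

var packed = PackedScene.new()
packed.pack(scene_root)  # the engine validates and captures the tree
ResourceSaver.save(packed, "res://main.tscn")
```

PackedScene.pack() and ResourceSaver.save() are what delegate format correctness to the engine: the .tscn text is never assembled by hand.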

In a headless Godot context, the @onready annotation does not apply: it defers variable initialization until _ready() is called after the node enters an active scene tree, which does not happen during headless construction. Signal connections that use node paths behave differently without an instantiated scene tree to traverse. The node owner property must be set explicitly on every programmatically created node or the node will serialize to disk correctly but disappear silently on reload. The ownership requirement is not documented in the API reference for any individual method; it is the kind of knowledge that surfaces through building scenes headlessly and debugging why nodes are missing after a save-load cycle.

# During headless scene construction, owner must be set explicitly
var node = Node2D.new()
scene_root.add_child(node)
node.owner = scene_root  # Omitting this causes the node to vanish on reload

Unreal has its own category of implicit contracts, particularly around object construction order and the macro system, but the specific pattern of nodes silently vanishing because of a missing property assignment has no direct equivalent in the Unity or Unreal development model. It is a Godot-specific footgun that requires explicit knowledge encoding.

What the Comparison Clarifies

The difficulty distribution across these engines is not about language quality. GDScript is a well-designed language for its purpose. Godot’s scene system is coherent and powerful. The difficulty is a function of training data availability, version stability in the training corpus, and the specificity of the implicit contracts in the runtime model.

For C# and Unreal C++, a pipeline can lean on general language knowledge and supplement with engine-specific context for the thinner parts. For GDScript, the general language knowledge is actively misleading, and the engine-specific context must cover the full surface before generation produces reliable output.

The four rewrites Godogen went through over a year are an honest measure of the scale of that engineering investment. Building the reference system, designing the lazy-loading architecture, encoding the phase-specific API availability, and constructing a visual evaluation loop to catch failures that don’t surface as compilation errors: none of this is accidental complexity. It is the minimum required to compensate for GDScript’s position on the training data spectrum and the implicit contracts in Godot’s runtime model.

For developers looking at the LLM code generation landscape for game development, this comparison has a practical implication. Unity and Unreal pipelines can get surprisingly far with general-purpose models and thin supplementary context. GDScript pipelines need the kind of infrastructure Godogen built, or something equivalent, to produce output that consistently runs. The gap is real, it is structural, and it is unlikely to close quickly given how recent the Godot 4 corpus is relative to C# and C++.

Godogen is open source on GitHub. The reference system and the phase-specific API documentation it built are the parts worth studying for anyone attempting LLM code generation in a similarly underrepresented domain.
