The Flat File as Personal Knowledge Base: Karpathy's Idea File and Why Structure Is Overrated
Source: hackernews
Andrej Karpathy recently shared a gist he calls his “LLM Wiki” as an example of what he describes as an “idea file.” The format is almost aggressively simple: a running document, loosely structured at best, where he accumulates observations, links, half-formed thoughts, and notes about LLMs and AI. The accompanying tweet framed it not as a polished resource but as a working artifact, something that exists primarily to be fed into an LLM context window when needed.
This landed with 200+ points on Hacker News, which tells you the idea is resonating with developers who have spent years fighting their note-taking systems.
The Overhead Problem With Traditional PKM
The personal knowledge management space has been doing laps around the same core problem for decades. Niklas Luhmann’s Zettelkasten is the canonical reference: a slip-box of atomic notes, each linked to related notes, allowing emergent connections to surface over time. Luhmann used it to write dozens of books across multiple disciplines and swore by it. The catch is that the system requires constant, disciplined curation. Every new note has to be atomic, linked, titled, and filed. The overhead is the point, in the original formulation: the act of structuring forces you to think.
Modern tools like Obsidian and Roam Research have tried to lower that friction while preserving the graph-linking model. They’ve built large communities. They’ve also produced an entire genre of blog posts about people abandoning their carefully constructed second brains because maintenance became a second job.
The “idea file” or “swipe file” is the older, less prestigious alternative. Writers have kept them for generations: a document or folder where you throw anything that seems interesting, with no particular organizational scheme, on the theory that having the material is more valuable than having it organized. You retrieve things by memory or by scanning, not by navigating a graph.
The problem with the swipe file approach has always been retrieval. Once the file grows past a few hundred entries, finding something specific requires either a good memory for context or a keyword search that may not surface what you want. The structure-first approach, for all its overhead, at least gives you reliable retrieval.
What Changes When the Reader Is an LLM
Karpathy’s insight, implicit in the LLM Wiki format, is that retrieval is no longer the bottleneck. When you’re not trying to navigate the file yourself but instead pasting it into an LLM’s context window, the organizational requirements collapse. The LLM can synthesize across dozens of loosely written paragraphs, find connections between entries that don’t share keywords, and answer questions about the material in natural language.
This changes the cost-benefit calculation for personal knowledge management completely. The main cost of Zettelkasten-style systems is the structuring work done at capture time. If you can defer that work entirely, and let the LLM handle synthesis at query time, the rational approach is to optimize for capture friction instead. Write things down in whatever form they come to you. Paste links with a sentence of context. Drop half-finished thoughts. The file doesn’t need to be coherent on its own because you’re never going to read it linearly.
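A minimal sketch of what optimizing for capture friction could look like in practice: one function that appends a dated line to a single file and does nothing else. The `capture` name, the bullet-and-date entry format, and the optional `source` argument are assumptions made for illustration; nothing here is taken from Karpathy's gist.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def capture(note: str, path: Path, source: Optional[str] = None) -> str:
    """Append one loosely formatted entry to the idea file and return it.

    No titles, tags, or links are required: the only structure is a date
    stamp, so that later readers (human or LLM) can see ordering.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    entry = f"- [{stamp}] {note.strip()}"
    if source:
        # A link plus a sentence of context is enough; no filing step.
        entry += f" ({source})"
    # Append mode creates the file on first use and never rewrites history.
    with path.open("a", encoding="utf-8") as f:
        f.write(entry + "\n")
    return entry
```

Calling `capture("half-formed thought about eval design", Path("llm-wiki.md"))` is the whole workflow; the file name is a placeholder.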
This is related to but distinct from RAG (Retrieval-Augmented Generation), where a retrieval step fetches relevant chunks before the LLM sees them. An idea file used as a context dump is simpler: small enough to fit in the window, fed in whole, no retrieval step needed. The simplicity is load-bearing. There’s no indexing infrastructure, no embedding pipeline, no vector database to maintain. It’s a text file.
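To make the contrast with RAG concrete, here is a rough sketch of the context-dump pattern: the entire file goes into the prompt, followed by the question, with no chunking or ranking in between. The `build_prompt` function, the `<notes>` delimiters, and the instruction wording are illustrative assumptions; how the resulting string reaches a model depends on whichever client you use and is deliberately left out.

```python
def build_prompt(idea_file_text: str, question: str) -> str:
    """Combine the whole idea file and a question into one prompt string.

    There is no retrieval step: every entry is included, and the model is
    left to find the relevant ones itself.
    """
    return (
        "Below are my running notes on this topic. They are loosely "
        "organized and may be incomplete.\n"
        "<notes>\n"
        f"{idea_file_text.strip()}\n"
        "</notes>\n\n"
        f"Question: {question.strip()}"
    )
```

In use, you would read the file with something like `Path("llm-wiki.md").read_text(encoding="utf-8")` (a placeholder file name) and pass the result as `idea_file_text`.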
What Goes In the File
The specific framing as an “LLM Wiki” rather than a general idea file is worth paying attention to. Karpathy is maintaining a document specifically about the domain he’s working in most intensively, not a general-purpose note collection. This is practical for a few reasons.
First, domain-specific files stay small enough to fit in a context window for longer. A general life notes file accumulates indefinitely; a file scoped to one technical area has a natural size limit determined by how much there is to know.
Second, when you’re working on problems in a domain, the file earns its keep immediately. You paste it at the start of a conversation, and the LLM has your accumulated context: your terminology preferences, your prior reasoning on related problems, the sources you found useful, the approaches you already ruled out. You don’t have to re-explain yourself.
Third, maintaining a domain-specific file is itself a learning practice. The act of adding to it, even loosely, requires you to at least form a sentence about why something was worth noting. That’s a lower bar than writing a proper Zettelkasten card, but it’s not zero.
The Comparison to Software Development Context Files
Developers have been converging on something similar from the other direction. The CLAUDE.md and AGENTS.md files that AI coding tools use for project context are structured idea files: documents that accumulate conventions, decisions, known issues, and architectural context so that an LLM working on the codebase doesn’t have to rediscover them from scratch each session.
The LLM Wiki is a personal version of the same thing. Where CLAUDE.md captures project context, the idea file captures personal domain context: what you know, what you’ve tried, how you think about problems in this area.
This pattern, sometimes called context anchoring, is gaining traction as developers figure out how to work effectively with AI tools across sessions. The LLM’s memory doesn’t persist. Yours does, but only if you write it down. The idea file is the bridge.
The Limits of the Approach
The flat-file approach has real constraints. Context windows are large now but not unlimited, and a file that’s grown for years may exceed what you can practically paste. At that point you need periodic pruning, a split into sub-files by topic, or a retrieval layer, and the last of these puts you back in RAG territory.
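One way to notice when the file is outgrowing the window, and to split it by topic when it does, is sketched below. Everything here is an assumption for illustration: the figure of roughly four characters per token is only a loose rule of thumb for English text (a real tokenizer will give different counts), the `reserve_tokens` default is arbitrary, and the splitter assumes topics are marked with `## ` headings, which a given idea file may not use.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; swap in a real tokenizer for accuracy."""
    return int(len(text) / chars_per_token) + 1


def fits_in_window(text: str, window_tokens: int, reserve_tokens: int = 4000) -> bool:
    """Check the file against a window size, leaving room for the
    question and the model's answer (reserve_tokens)."""
    return estimate_tokens(text) <= window_tokens - reserve_tokens


def split_by_heading(text: str, marker: str = "## ") -> dict[str, str]:
    """Split the file into topic sections keyed by heading text.

    Lines before the first heading go under "preamble"; sections that
    share a heading are merged; empty sections are dropped.
    """
    sections: dict[str, list[str]] = {"preamble": []}
    current = "preamble"
    for line in text.splitlines():
        if line.startswith(marker):
            current = line[len(marker):].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    joined = {name: "\n".join(body).strip() for name, body in sections.items()}
    return {name: body for name, body in joined.items() if body}
```

Each section returned by `split_by_heading` could then be written to its own sub-file and checked with `fits_in_window` independently.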
There’s also a quality issue. A well-maintained Zettelkasten produces links that you can browse and learn from directly. A loosely written idea file is largely opaque to a human reader; its value depends entirely on having an LLM available to query it. If you want something that’s simultaneously useful as an LLM context document and as a human-readable reference, some structure helps.
And the approach rewards a certain kind of working style. If you’re disciplined enough to actually add to the file when you encounter something worth saving, it works well. If you’re the kind of person who intends to maintain a knowledge system but never opens it, the format won’t save you.
Why This Feels Like the Right Default
For most working developers, the idea file model is probably the right default precisely because it’s the lowest viable investment. You can start with a single markdown file today, add to it whenever something seems worth capturing, and get value out of it the next time you’re working through a related problem with an LLM.
Karpathy sharing his version publicly is useful not because the specific contents are universal but because it normalizes the format. You don’t need a graph database or a Zettelkasten methodology or a plugin ecosystem. You need a file and the habit of adding to it.
The Hacker News thread surfaced the predictable comparisons to TiddlyWiki, Org-mode, and Notion. They’re not wrong, exactly, but they’re also not the point. The point is that the retrieval model has changed, and that changes what the optimal storage model looks like. A flat file optimized for LLM consumption is a different artifact from a hyperlinked wiki optimized for human navigation, even if they contain similar information.
That distinction is worth sitting with. Most of our knowledge management instincts were formed in an era when we were the only reader. They may need updating.