Every Language Supports Unicode Identifiers. Almost None Let You Write Keywords in Your Own Script.

Every mainstream programming language will tell you it supports Unicode identifiers. Python has done it since 3.0, following PEP 3131, which was written in 2007. Rust stabilized non-ASCII identifiers in 1.53 in 2021. JavaScript has had it in the ECMAScript spec for years. You can name your variable 나이 or 年齢 or возраст and the compiler will not blink.

But the keywords are still if, while, return, struct. Every one of them. In every mainstream language. The implicit assumption baked into every popular programming language since FORTRAN is that the person reading the control flow already reads English.

Han, a new language posted to Hacker News by its author, is a deliberate attempt to close that gap for Korean. It is a statically typed language written in Rust where every keyword is Hangul. 만약 for if. 아니면 for else. 동안 for while. 함수 for function. 반환 for return. The full pipeline is there: lexer, parser, AST, a tree-walking interpreter, and LLVM IR codegen. There is also a REPL and a basic LSP server.

The author is transparent about the motivation. They saw a post about someone rewriting a C++ codebase to Rust using AI assistance in under two weeks, thought that if AI could do that it could help build a language from scratch, and combined it with curiosity about what Korean programming would look like. The result is a side project, not a production pitch. But it sits inside a longer history worth tracing.

This Has Been Tried, Several Times

The impulse to write programs in something other than English-derived ASCII has surfaced repeatedly over sixty years. The trajectories are informative.

易语言 (E-Language), developed in China in 1999, is the most commercially successful example. It is a full Windows RAD environment with Chinese keywords, a Chinese standard library, and a Chinese IDE. Actual commercial software has been built with it. It never escaped its niche, but within that niche it has been genuinely useful for twenty-five years.

なでしこ (Nadeshiko), a Japanese-keyword language that has been maintained since 2004, has an active user base in Japan and appears in some educational curricula. Its predecessor, ひまわり (Himawari), is older still. These are not abandoned experiments.

On the esoteric side, 아희 (Aheui) is a Korean esoteric language where each Hangul syllable character encodes an opcode through its constituent jamo. Programs are written on a two-dimensional grid of Hangul characters. It is Turing-complete, has multiple serious implementations including a Rust-based LLVM compiler, and has an active community. Aheui is arguably the most sophisticated piece of Korean programming language infrastructure that exists, and most Western programmers have never heard of it.

Wenyan-lang, a Turing-complete language whose syntax mimics classical Chinese literary prose, went viral on GitHub in late 2019. It hit tens of thousands of stars. Programs look like:

吾有一數。曰三。名之曰「甲」。

Which translates roughly to: “I have one number. It is 3. I name it 甲.” The viral moment sparked the usual debate about whether localized programming languages are useful bridges or elaborate novelties. It did not produce a long-term ecosystem, but it proved there was appetite.

MIT’s Scratch is the largest-scale deployment of the idea. The block labels localize completely into 70+ languages including Korean, Arabic, Hebrew, and Chinese. When a child in Seoul drags a block that says 계속 반복하기, they are writing a forever loop in Korean. Scratch’s localization is not an afterthought; it is a design pillar.

What the Rust Choice Means

Writing a language implementation in Rust has become extremely common since about 2018. Gleam reached v1.0 in March 2024 with a Rust-based compiler. Roc uses Rust throughout. SWC, the TypeScript and JavaScript compiler that Next.js relies on, is written in Rust. The pattern is mature.

The attraction is concrete. Rust’s enums with data are a natural fit for AST node types:

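// Value (literal values) and Stmt (statements) are assumed to be defined elsewhere in the AST module.
enum Expr {
    // Conditional: a condition, a then-branch, and an optional else-branch
    If { condition: Box<Expr>, then: Box<Stmt>, else_: Option<Box<Stmt>> },
    // Function call: callee name plus argument expressions
    Call { callee: String, args: Vec<Expr> },
    // Constant value
    Literal(Value),
    // Variable reference by name
    Ident(String),
}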

Pattern matching over these nodes is how you write an interpreter or code generator: each match arm handles one node type, arms never fall through into each other, and the compiler checks exhaustiveness at compile time rather than leaving a missed case to runtime. The ? operator makes error propagation in parsers read almost like prose. And Rust’s ownership model means you can be precise about whether AST nodes live in an arena, on the heap with Box<T>, or shared with Rc<T>, which matters when you start doing things like closure capture analysis.
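
As a minimal sketch of exhaustive matching and ? propagation, here is a tiny evaluator over a cut-down AST. The Expr, Value, and EvalError types and the eval function are hypothetical names chosen for illustration; they are not taken from Han's source, and expressions stand in for statements to keep the example short.

```rust
use std::collections::HashMap;

// Hypothetical, cut-down AST: illustration only, not Han's real node types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
}

#[derive(Debug)]
enum Expr {
    Literal(Value),
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
    If { condition: Box<Expr>, then: Box<Expr>, else_: Box<Expr> },
}

#[derive(Debug, PartialEq)]
enum EvalError {
    UnboundIdent(String),
    TypeMismatch(&'static str),
}

type Env = HashMap<String, Value>;

fn eval(expr: &Expr, env: &Env) -> Result<Value, EvalError> {
    // No wildcard arm here: adding a variant to Expr makes this match
    // fail to compile until the new case is handled.
    match expr {
        Expr::Literal(v) => Ok(v.clone()),
        Expr::Ident(name) => env
            .get(name)
            .cloned()
            .ok_or_else(|| EvalError::UnboundIdent(name.clone())),
        Expr::Add(lhs, rhs) => {
            // `?` returns early with the error from either operand.
            match (eval(lhs, env)?, eval(rhs, env)?) {
                (Value::Int(a), Value::Int(b)) => Ok(Value::Int(a + b)),
                _ => Err(EvalError::TypeMismatch("+ expects two integers")),
            }
        }
        Expr::If { condition, then, else_ } => match eval(condition, env)? {
            Value::Bool(true) => eval(then, env),
            Value::Bool(false) => eval(else_, env),
            Value::Int(_) => Err(EvalError::TypeMismatch("condition must be a boolean")),
        },
    }
}
```

With 나이 bound to 20 in the environment, evaluating an If whose condition is true and whose then-branch is 나이 + 1 returns Ok(Value::Int(21)), while looking up an unbound name returns Err(EvalError::UnboundIdent(..)) without any explicit error-handling code in the Add or If arms.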

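For LLVM specifically, the inkwell crate provides safe, ergonomic Rust bindings. Emitting a function that adds two 64-bit integers looks roughly like this:

use inkwell::context::Context;

// Setup: one context, one module ("han" is an arbitrary name), one instruction builder.
let context = Context::create();
let module = context.create_module("han");
let builder = context.create_builder();
let i64_type = context.i64_type();
let fn_type = i64_type.fn_type(&[i64_type.into(), i64_type.into()], false);
let function = module.add_function("더하기", fn_type, None);
let entry = context.append_basic_block(function, "entry");
builder.position_at_end(entry);
let a = function.get_nth_param(0).unwrap().into_int_value();
let b = function.get_nth_param(1).unwrap().into_int_value();
// Recent inkwell releases return a Result from builder methods; older ones return the value directly.
let result = builder.build_int_add(a, b, "result").unwrap();
builder.build_return(Some(&result)).unwrap();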

The function name here is 더하기, the Korean word for “addition”. Nothing in LLVM IR cares about that string. LLVM IR is already language-agnostic at the symbol level; it just traffics in names and types. In this sense, the LLVM backend is the part of the compiler that is most indifferent to the language being Korean.

Han going beyond a tree-walking interpreter to include LLVM IR codegen is a meaningful architectural choice. A tree-walking interpreter is fine for exploration, but its performance is bounded by the cost of walking the AST in the host language, and it can make features like closures over mutable state and tail call optimization harder to implement correctly. LLVM IR codegen means Han can, in principle, generate native binaries with real optimization passes. The official LLVM Kaleidoscope tutorial walks exactly this pipeline for a toy language and is the most common starting point for this work.

What Localized Languages Are Actually For

The argument against languages like Han is usually: you still need to read English documentation, use English package names, and communicate with a global developer community. Teaching someone to program in Korean keywords just defers the problem.

This argument is correct but misses the target. The strongest case for localized programming languages is not adult professional developers. It is the early-stage learner who is building a mental model of what a conditional branch or a function call is. When the control flow token is a word you already know, 만약 meaning “if” and 아니면 meaning “else”, the syntactic noise is lower. You can think about the logic rather than the vocabulary. This is why Scratch’s localization is taken seriously by educators: the cognitive load reduction at the beginning of learning is real.

AI-assisted coding raises a counterpoint worth addressing here too. GitHub Copilot and similar tools mean that a Korean developer can describe what they want in Korean and receive working Python. In one framing, this makes localized keyword languages less necessary: the natural language interface is just the prompt. In another framing, it validates the underlying goal completely, because the entire premise is that programming does not need to be mediated through English.

Han’s author mentions noticing growing global interest in Korean language and culture. That is a real phenomenon. Korean media consumption has reached substantial scale in countries where Korean is not spoken natively. The number of people studying Korean as a foreign language has grown sharply through the 2020s. Curiosity about what Korean programming would look like is not an unreasonable response to that context.

The Harder Problems Ahead

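Building a lexer that tokenizes Hangul is the easy part. Hangul is well behaved in Unicode: each precomposed syllable block is a single code point in the U+AC00 to U+D7A3 range, and Rust’s standard UTF-8 strings iterate over it cleanly with chars(). The one wrinkle is normalization: the same syllable can also arrive as a decomposed sequence of conjoining jamo (NFD), so a lexer should either normalize its input to NFC or accept both forms.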
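
To make that concrete, here is a minimal sketch of a lexer for the five keywords mentioned earlier. The Token enum and the lex, keyword, and is_word_char functions are hypothetical names for illustration, not Han's actual lexer, and the sketch assumes NFC input and does not handle integer overflow.

```rust
// Hypothetical token set for illustration; Han's real lexer may differ.
#[derive(Debug, PartialEq)]
enum Token {
    If,       // 만약
    Else,     // 아니면
    While,    // 동안
    Function, // 함수
    Return,   // 반환
    Ident(String),
    Int(i64),
    Symbol(char),
}

// Precomposed Hangul syllables occupy U+AC00..=U+D7A3, one code point each.
fn is_hangul_syllable(c: char) -> bool {
    matches!(c, '\u{AC00}'..='\u{D7A3}')
}

fn is_word_char(c: char) -> bool {
    is_hangul_syllable(c) || c.is_ascii_alphanumeric() || c == '_'
}

// Map a complete word to a keyword token, if it is one.
fn keyword(word: &str) -> Option<Token> {
    match word {
        "만약" => Some(Token::If),
        "아니면" => Some(Token::Else),
        "동안" => Some(Token::While),
        "함수" => Some(Token::Function),
        "반환" => Some(Token::Return),
        _ => None,
    }
}

fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            // Accumulate a decimal integer literal.
            let mut n: i64 = 0;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = n * 10 + d as i64;
                chars.next();
            }
            tokens.push(Token::Int(n));
        } else if is_word_char(c) {
            // Read the whole word first, then classify it, so a keyword
            // is never matched as a prefix of a longer identifier.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if !is_word_char(w) {
                    break;
                }
                word.push(w);
                chars.next();
            }
            let token = keyword(&word).unwrap_or_else(|| Token::Ident(word));
            tokens.push(token);
        } else {
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

With this sketch, lex("만약 만약값 > 19") yields If, Ident("만약값"), Symbol('>'), Int(19). Note that "만약" is two chars but six bytes in UTF-8, which is why the lexer iterates over chars() rather than indexing bytes.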

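The harder problems are tooling and the identifier-keyword boundary. Because identifiers can be Hangul too, a keyword such as 만약 has to be recognized only when it forms a complete token, not when it is the prefix of a longer identifier, and every tool that touches the source has to agree on that rule. Syntax highlighting in editors depends on the editor’s grammar files knowing what counts as a keyword. Language servers need to communicate keyword information to IDE tooling. Han already has a basic LSP server, which is a serious commitment: the Language Server Protocol is a substantial surface area to implement correctly.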

There is also the question of error messages. A compiler that tells you “만약 뒤에 조건이 없습니다” rather than “expected condition after if” is a materially better experience for a Korean speaker. Error message localization is often the last thing language implementers think about and the thing users notice most.
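
One way to keep localization from becoming an afterthought is to separate what went wrong from how it is worded. The sketch below assumes a hypothetical Diagnostic enum and Locale enum, neither taken from Han, and the Korean wording of the second message is an assumed translation rather than anything Han emits.

```rust
// Hypothetical diagnostic and locale types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Locale {
    Korean,
    English,
}

#[derive(Debug)]
enum Diagnostic {
    MissingIfCondition,
    UndefinedName { name: String },
}

impl Diagnostic {
    // Render the same diagnostic in the requested language.
    fn message(&self, locale: Locale) -> String {
        match (self, locale) {
            (Diagnostic::MissingIfCondition, Locale::Korean) => {
                "만약 뒤에 조건이 없습니다".to_string()
            }
            (Diagnostic::MissingIfCondition, Locale::English) => {
                "expected condition after if".to_string()
            }
            // Assumed Korean wording for "undefined name".
            (Diagnostic::UndefinedName { name }, Locale::Korean) => {
                format!("정의되지 않은 이름: {name}")
            }
            (Diagnostic::UndefinedName { name }, Locale::English) => {
                format!("undefined name: {name}")
            }
        }
    }
}
```

Because the match covers every pairing of diagnostic and locale without a wildcard arm, adding a new diagnostic or a new locale fails to compile until every message has been written, which is one way to stop translations from silently lagging behind.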

Han is a side project posted on a Sunday. The repo has a REPL and an LSP and LLVM codegen. Whether it develops into something with a real user community depends on choices the author has not made yet: documentation language, standard library depth, whether to pursue the educational angle explicitly.

But the thing it has already done is demonstrate the gap clearly. Unicode identifier support in Rust, Python, and JavaScript means you can name your variables in Hangul. Han means you can read the control flow in Hangul too. Those are not the same thing, and most programming language designers have treated them as though they were.