From Proof of Concept to Active Campaign: How Glassworm Weaponized Unicode Against the Supply Chain
Source: hackernews
The University of Cambridge published Trojan Source in October 2021, demonstrating that Unicode bidirectional control characters could make malicious code look like a comment to a human reviewer while the compiler read something entirely different. CVE identifiers were assigned (CVE-2021-42574 for the Bidi attack). Rust patched its compiler within weeks. Python added a SyntaxWarning. GitHub put a yellow banner on files containing bidirectional characters. The ecosystem responded, and then largely moved on.
Four years later, Aikido Security is reporting that Glassworm, a campaign using invisible Unicode characters for supply chain attacks, is back with a coordinated wave hitting GitHub repositories, npm packages, and VSCode extensions simultaneously. The research deserves more than a patch-and-move-on response, because the shift from proof-of-concept to active operational campaign is the story here, not the Unicode technique itself.
The Mechanics Behind the Attack
To understand why Glassworm is dangerous specifically in 2026, you need to understand that the attack surface has three distinct layers, and each one has a different trust model.
The foundational trick is the gap between what Unicode’s rendering rules display and what a parser reads. The Unicode bidirectional algorithm (Bidi) exists to correctly render mixed left-to-right and right-to-left text, like a sentence in English that contains an Arabic phrase. Bidi control characters embedded in source code manipulate this rendering without affecting the byte-level content the compiler processes.
The most dangerous of these is U+202E, the RIGHT-TO-LEFT OVERRIDE (RLO). Insert it inside a comment, and everything rendered after it runs visually right-to-left. Pair it with U+202C (POP DIRECTIONAL FORMATTING, PDF) to close the scope. The bytes in the file are unchanged; only the visual presentation shifts. A code reviewer sees a comment. The compiler reads executable code.
A simplified illustration:
// Check permissions: access denied for regular users \u202e }
if (isAdmin) {
grantAccess();
With the RLO active, the visual rendering of that first line looks like a comment ending in a closing brace. The bytes tell the compiler a different story entirely.
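The gap is easy to verify mechanically, because the control characters belong to Unicode's "format" (Cf) category. A quick Python sketch (the line below is a hypothetical example, not Glassworm's actual payload) lists the invisible code points a parser would see in such a line:

```python
import unicodedata

# Hypothetical line using the Trojan Source trick: the comment hides an
# RLO (U+202E) that reorders the visual rendering only, closed by a PDF.
line = "// Check permissions: access denied \u202e } \u202c"

# Walk the code points the compiler actually sees and name the invisible ones.
for ch in line:
    if unicodedata.category(ch) == "Cf":  # Format class: Bidi controls, ZWSP, BOM
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# Prints:
# U+202E RIGHT-TO-LEFT OVERRIDE
# U+202C POP DIRECTIONAL FORMATTING
```

The same category check is the basis for most of the detection tooling discussed later: it does not need to understand the language being parsed, only the raw code points.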
Separate from the Bidi class are the zero-width characters: U+200B (ZERO WIDTH SPACE), U+200C (ZERO WIDTH NON-JOINER), U+200D (ZERO WIDTH JOINER), and U+FEFF (the byte-order mark, which mid-string behaves as a zero-width no-break space). These do not affect rendering in any visible way. They are genuinely invisible. Their damage model differs from Bidi's: instead of hiding code, they break string identity.
Consider a security check like:
if (userRole === "admin") {
// grant access
}
If the string literal "admin" in the source file contains a U+200B between the a and the d, the visual display is identical to "admin", but the runtime comparison will fail against any externally sourced string that lacks the same invisible character. The reverse is also exploitable: embed a ZWSP in a deny-list string, and the comparison passes for inputs that look identical to the prohibited value.
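The failure is straightforward to demonstrate. A minimal Python sketch (the strip_invisible helper is illustrative, not a library function):

```python
import unicodedata

# "admin" with a ZERO WIDTH SPACE (U+200B) inserted renders identically
# but is a different string at the byte level.
clean = "admin"
poisoned = "a\u200bdmin"

assert clean != poisoned   # equality fails despite identical rendering
assert len(poisoned) == 6  # the invisible character still counts

# Stripping format-class (Cf) characters before comparison closes this
# particular hole; Cf covers ZWSP, ZWNJ, ZWJ, the BOM, and Bidi controls.
def strip_invisible(s: str) -> str:
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")

assert strip_invisible(poisoned) == clean
```

Note that stripping on input is a mitigation for the comparison failure, not for hidden code: a poisoned source file still needs to be caught before it ships.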
Three Vectors, Three Trust Models
What makes Glassworm notable is the targeting of three ecosystems that have very different security postures.
GitHub is where code review happens. Pull request review is fundamentally a visual operation, and GitHub’s diff view renders Unicode through the browser’s text layout engine, which applies the Bidi algorithm. GitHub did add a warning banner for files containing Bidi control characters after Trojan Source, but that banner applies to the file view, not the diff view. A PR that inserts a single U+202E into an existing comment, or places a U+200B inside a string-literal comparison, can survive review by a careful human reader. The banner is a speed bump, not a gate.
Zero-width characters specifically receive no warning at all on GitHub as of this writing. They are invisible in file views, invisible in diffs, and invisible in blame views. A commit that inserts U+200B into a JavaScript string that handles authentication leaves no visual trace in the repository interface.
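Because the repository interface shows nothing, finding zero-width characters that are already committed requires a local sweep of the working tree. A minimal Python sketch (scan_tree and its default extension list are illustrative choices, not an existing tool):

```python
import re
from pathlib import Path

# Character class mirroring the ranges discussed in this article:
# zero-width characters, Bidi embeddings/overrides, isolates, and the BOM.
INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2066-\u2069\ufeff]")

def scan_tree(root: str, exts=(".js", ".ts", ".py")) -> list[tuple[str, int]]:
    """Return (path, line_number) pairs for lines containing invisible characters."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix not in exts or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if INVISIBLE.search(line):
                hits.append((str(path), lineno))
    return hits
```

Scoping the sweep to source-file extensions keeps the false-positive rate down in repositories that legitimately carry internationalized text in data files.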
npm is where the attack becomes a supply chain problem. JavaScript package contents, including preinstall and postinstall scripts, can contain arbitrary Unicode. The npm audit command does not scan source files for invisible characters; it queries a vulnerability database keyed on package versions. The integrity field in package-lock.json protects against a package being modified after publishing but does nothing if the invisible characters were present at publish time.
Socket.dev has been doing the harder work of behavioral and static analysis on npm packages, and their tooling does flag Unicode anomalies as a risk signal. But Socket is an opt-in layer on top of npm, not a default. The baseline npm registry has no Unicode auditing.
VSCode extensions are the highest-privilege vector. Extensions run as the user, with full filesystem and network access unless you are using a restricted profile. The VSCode Marketplace does review extensions, but that review is not byte-level source analysis. A VSIX file is a ZIP archive containing JavaScript; the source inside is rarely subjected to the kind of character-level inspection that would catch zero-width insertions.
After Trojan Source, VSCode added "editor.renderControlCharacters": true as an option that renders Bidi and other control characters as visible colored boxes in the editor. It is not the default. Most developers have never toggled it.
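For reference, a settings.json fragment enabling that option alongside VSCode's built-in Unicode highlighting (setting names as they appear in recent VSCode releases; verify against your version):

```json
{
  "editor.renderControlCharacters": true,
  "editor.unicodeHighlight.invisibleCharacters": true,
  "editor.unicodeHighlight.ambiguousCharacters": true
}
```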
Why the 2021 Patches Were Incomplete
The post-Trojan Source patches addressed compilers and some tooling, but they addressed the Bidi attack specifically. The zero-width character class received far less attention because the original paper focused on the visually dramatic Bidi reversal attack, which is easier to demonstrate.
Rust 1.56.1 (November 2021) added deny-by-default lints that reject Bidi control characters in string literals and comments, effectively a compile error unless explicitly allowed. Python 3.12 added a SyntaxWarning for Bidi characters. GCC 12 added -Wbidi-chars, enabled by default in its =unpaired form. These are meaningful mitigations for compiled code, but they do not cover the JavaScript/TypeScript ecosystem where Glassworm’s npm vector lives. Node.js and V8 do not warn on Bidi or zero-width characters in source files. The browser runtime does not warn on them in script tags. The entire JavaScript supply chain processes these files silently.
ESLint’s no-irregular-whitespace rule catches some zero-width characters, but its coverage is incomplete: the rule’s skipStrings option defaults to true, so string literals are exempt unless explicitly configured, and U+200C/U+200D are joiners rather than whitespace, so the rule never flags them at all.
What Defenses Actually Work
A pre-commit hook targeting the relevant Unicode ranges is the most reliable line of defense for a repository, because it runs before code enters the shared history:
#!/usr/bin/env bash
# Detect Bidi control characters and common zero-width characters in staged files.
# Requires GNU grep for -P (Perl-compatible) character classes.
# -z/-0 handle filenames with spaces; -r skips grep when nothing is staged;
# --diff-filter=ACM ignores deletions.
if git diff --cached --name-only --diff-filter=ACM -z |
    xargs -0 -r grep -lP '[\x{202A}-\x{202E}\x{2066}-\x{2069}\x{200B}-\x{200D}\x{FEFF}]' 2>/dev/null; then
    echo "Suspicious Unicode characters detected in staged files." >&2
    exit 1
fi
The grep pattern covers U+202A through U+202E (LRE, RLE, PDF, LRO, RLO), U+2066 through U+2069 (the newer isolate characters), U+200B through U+200D (zero-width space, non-joiner, joiner), and U+FEFF (BOM). This will generate false positives in repositories that legitimately handle internationalized content in source files, so scope it to the file types that matter: .js, .ts, .py, .go, .rs, .rb.
For VSCode specifically, enabling "editor.renderControlCharacters": true in your settings makes Bidi characters visible as [RLO] and similar markers. The Highlight Bad Chars extension goes further, rendering zero-width characters with a red background highlight.
For npm packages, running the following against extracted package contents before committing a dependency update is a reasonable manual audit step:
npm pack some-package && tar -xzf some-package-*.tgz && \
grep -rP '[\x{200B}-\x{200F}\x{202A}-\x{202E}]' package/
For new projects, Socket.dev as a GitHub app provides automated scanning of new and updated dependencies for Unicode anomalies among other behavioral signals.
The LLM Angle Nobody Is Talking About Enough
There is a newer attack surface that the original Trojan Source paper could not anticipate: LLM-generated code. If a language model is trained on a corpus that includes repositories with invisible Unicode characters, or if a model is manipulated via prompt injection in its context window to emit zero-width characters in its output, the generated code inherits the attack without any human having manually inserted it.
This is not hypothetical as a threat class. Prompt injection via poisoned web content retrieved by an agent, or via malicious text in a document fed to a code-generating workflow, is an active research area. The invisible Unicode attack is a natural payload for this vector: the model emits code that looks correct to the human reviewing the suggestion, and the harmful character is never visible in the diff or in the chat interface.
The defense here is the same pre-commit hook, but the attack path is different enough that teams using AI code generation tools should treat Unicode auditing as mandatory rather than optional.
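One concrete shape for that audit is a gate on the model's output itself, before a suggestion ever reaches a diff. A Python sketch (reject_if_invisible is a hypothetical helper, not part of any tool mentioned above):

```python
import unicodedata

def reject_if_invisible(suggestion: str) -> str:
    """Raise if a code suggestion contains any format-class (Cf) character,
    which covers Bidi controls, zero-width characters, and the BOM."""
    bad = {ch for ch in suggestion if unicodedata.category(ch) == "Cf"}
    if bad:
        names = ", ".join(
            f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in sorted(bad)
        )
        raise ValueError(f"Suggestion contains invisible characters: {names}")
    return suggestion
```

A hard reject is deliberately blunt: legitimate generated code has no reason to contain Cf characters, so there is no normalization decision to get wrong.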
The Broader Pattern
Trojan Source established that the attack was theoretically possible and that compilers were vulnerable. Glassworm establishes that adversaries have moved beyond proof-of-concept into coordinated deployment across the supply chain’s most-trafficked surfaces. The gap between those two states is exactly the gap that security tooling failed to close in the intervening four years.
The Hacker News discussion of the Aikido research surfaces the familiar argument that pinning dependencies with hash integrity protects against this. It does not, because the invisible characters can be present in the package at the moment of first publication. Hash pinning protects against post-publication modification; it does nothing against a poisoned initial publish.
The more honest assessment is that the JavaScript ecosystem, and the code review tooling built on top of GitHub, were not designed with byte-level adversarial content in mind. The abstractions that make code readable to humans (rendered Unicode, diff views, syntax highlighting) are the same abstractions the attack exploits. Closing that gap requires tooling that operates below the rendering layer, which means hooks, scanners, and compiler-level enforcement that most projects still do not run by default.