
Why Node.js Module Hooks Exist but `fs` Hooks Don't

Source: hackernews

The argument made in Matteo Collina’s case for a Node.js virtual filesystem is sharp at its core: Node.js is the only major modern runtime without a first-class VFS abstraction, and the ecosystem has been paying for that gap in redundant workarounds for years. The argument is easier to follow if you start with the specific architectural reason the gap exists, rather than the list of symptoms.

Two Separate Pipelines

Node.js has two module systems: CommonJS (require) and ES modules (import). Both have extension points. CommonJS exposes Module._resolveFilename, Module._extensions, and Module._load, which are not officially stable but have been present and functional for over a decade. The ES module loader has a documented customization API: register() from node:module installs resolve and load hooks. You can intercept an import statement, inspect the specifier, and return whatever source code you like, including code that never touches the filesystem at all.
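To make the module-side extension point concrete, here is a minimal sketch that serves a module from memory by wrapping Module._load. The virtual:config specifier and the virtualModules map are hypothetical names chosen for illustration, and Module._load is an internal hook, so this shows the idea rather than a recommended pattern. The ESM resolve and load hooks play the same role for import.

```javascript
const Module = require('node:module');

// Hypothetical virtual modules, keyed by the specifier passed to require().
const virtualModules = new Map([
  ['virtual:config', { key: 'value' }],
]);

// Module._load is internal and not officially stable.
const originalLoad = Module._load;
Module._load = function patchedLoad(request, parent, isMain) {
  if (virtualModules.has(request)) {
    // Served from memory; the filesystem is never consulted.
    return virtualModules.get(request);
  }
  return originalLoad.call(this, request, parent, isMain);
};

const config = require('virtual:config');

// Restore the original loader so the patch does not leak into other code.
Module._load = originalLoad;
```

Nothing comparable exists for fs: no documented function sits in front of every file read the way Module._load sits in front of every require().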

The fs module has none of this. Its architecture is a four-layer stack with no public interception points:

user code
  → lib/fs.js  (JavaScript wrapper)
    → lib/internal/fs/utils.js, lib/internal/fs/promises.js
      → node_file.cc  (C++ binding)
        → libuv  (cross-platform async I/O)
          → OS syscalls

There is no hook between the JavaScript layer and the C++ binding that applies uniformly to all fs operations, remains accessible from JavaScript, and stays stable across Node.js releases. The design was deliberate. Node.js was built to expose OS primitives, not abstract them. The fs module is meant to be thin.

The asymmetry has a concrete consequence. You can intercept import './config.json' and route it anywhere. You cannot intercept fs.readFileSync('./config.json') from any stable, public surface. Any library that reads files internally with fs is opaque to the interception layer that exists for modules.

Three Approaches, Three Failure Modes

The ecosystem has converged on three categories of workarounds, and none of them generalize cleanly.

Monkey-patching. Libraries like memfs implement the node:fs API as an in-memory volume, which is then swapped in for the real module. mock-fs takes a lower-level approach, patching the internal binding object that lib/fs.js delegates to. For single-threaded tests against code you control, this works. The problems emerge at the edges: native addons that call uv_fs_open directly bypass any JavaScript-layer patch; Worker threads do not inherit patches applied to the main thread’s module objects; code that caches a reference to fs.readFile before the patch runs will hold the unpatched version. These tools are useful for unit tests and fragile everywhere else.
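The cached-reference failure mode is easy to reproduce. The sketch below is a hand-rolled patch, not the memfs or mock-fs implementation, and it assumes /virtual/config.json does not exist on the real disk. One reference to readFileSync is captured before the patch is applied, the way a library that destructures fs at import time would capture it.

```javascript
const fs = require('node:fs');

// A library that destructures at import time keeps this reference forever.
const { readFileSync: cachedRead } = fs;

// Hypothetical in-memory files, for illustration only.
const virtualFiles = new Map([
  ['/virtual/config.json', '{"key":"value"}'],
]);

// Patch the property on the shared fs module object.
const originalRead = fs.readFileSync;
fs.readFileSync = function patchedRead(filePath, options) {
  if (virtualFiles.has(filePath)) return virtualFiles.get(filePath);
  return originalRead.call(fs, filePath, options);
};

// Callers that look up fs.readFileSync at call time see the patch.
const viaPatched = fs.readFileSync('/virtual/config.json', 'utf8');

// The cached reference still goes to the real filesystem and fails.
let cachedErrorCode = null;
try {
  cachedRead('/virtual/config.json', 'utf8');
} catch (err) {
  cachedErrorCode = err.code;
}

// Undo the patch.
fs.readFileSync = originalRead;
```

Both calls name the same path, yet only one of them sees the virtual file. Which one a given library uses depends on how it happened to import fs.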

C++ binding patches. This was pkg’s approach, and it solved the transparency problem properly. Vercel’s single-binary packaging tool maintained patched Node.js builds where node_file.cc was modified to check an embedded virtual filesystem before falling through to the OS. The embedded FS used a special path prefix — /snapshot/ on Linux, C:\snapshot\ on Windows — and stored a full directory tree in the binary. Every fs operation, including readdir, stat, and createReadStream, resolved transparently against it. If your app read template files from disk using fs.createReadStream, they worked inside the packaged binary without any code changes, without any special API, without any awareness that the app was running from a binary.

The maintenance burden was unsustainable. Every Node.js release required fresh patched binaries across every supported platform, because node_file.cc and its neighbors changed often enough to break the patches. Vercel stopped maintaining pkg in 2023 for exactly this reason. A community fork, @yao-pkg/pkg, continues the work, but the structural cost has not changed.

A separate asset API. This is the approach Node.js’s built-in single executable application (SEA) feature took. Assets embedded via the sea-config.json format are accessible through node:sea:

const { getAsset, getAssetAsBlob } = require('node:sea');

// Returns an ArrayBuffer — a copy of the embedded data
const buf = getAsset('config.json');
const text = Buffer.from(buf).toString('utf8');

// Returns the same kind of asset wrapped in a Blob with the given MIME type
const blob = getAssetAsBlob('template.html', { type: 'text/html' });

The sea-config.json configuration:

{
  "main": "bundle.js",
  "output": "sea-prep.blob",
  "assets": {
    "config.json": "./config/defaults.json",
    "template.html": "./templates/main.html"
  }
}

This API is clean and well-specified. It is also completely isolated from fs. A library that reads a default configuration with JSON.parse(fs.readFileSync(path.join(__dirname, 'defaults.json'), 'utf8')) will throw ENOENT inside a SEA binary unless the file also exists on disk. You have to pre-bundle the entire application into a single JS file using esbuild or ncc before embedding, and the bundler must inline every library that reads from the filesystem at runtime. There is no streaming interface for SEA assets, no way to enumerate what is embedded, and no path for require() to resolve modules from the embedded store without a custom loader.
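In practice, code that has to run both inside and outside a SEA binary ends up branching on the environment. The helper below is one possible shape for that branch. readAsset and fallbackDir are hypothetical names, and the sketch assumes the asset keys in sea-config.json match the file names on disk.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: read from the SEA blob when running as a single
// executable, otherwise fall back to a directory on disk.
function readAsset(key, fallbackDir) {
  let sea = null;
  try {
    sea = require('node:sea');
  } catch {
    // node:sea is unavailable on older Node.js releases.
  }
  if (sea && sea.isSea()) {
    // With an encoding argument, getAsset returns a string.
    return sea.getAsset(key, 'utf8');
  }
  return fs.readFileSync(path.join(fallbackDir, key), 'utf8');
}
```

A helper like this only covers call sites you own. A third-party dependency that calls fs.readFileSync internally never reaches it, which is the isolation problem described above.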

Compared to what pkg delivered through its C++ patches, SEA is a step backward in transparency, even though its architecture is much cleaner to maintain.

What Go Did

Go 1.16 shipped io/fs, a minimal interface-based VFS in the standard library:

type FS interface {
    Open(name string) (File, error)
}

// Optional extended interfaces, checked via type assertion:
type ReadDirFS interface {
    FS
    ReadDir(name string) ([]DirEntry, error)
}

type ReadFileFS interface {
    FS
    ReadFile(name string) ([]byte, error)
}

Three implementations ship with the standard library. embed.FS holds files embedded at compile time via //go:embed directives. os.DirFS wraps a real directory. fstest.MapFS provides an in-memory implementation for tests. All three implement fs.FS, so they are interchangeable at any call site that accepts the interface.

Serving embedded static files over HTTP:

//go:embed static/*
var staticFiles embed.FS

http.Handle("/static/", http.FileServer(http.FS(staticFiles)))

Testing file-reading code with an in-memory fixture:

fsys := fstest.MapFS{
    "config.json": &fstest.MapFile{Data: []byte(`{"key":"value"}`)},
}
result, err := mypackage.ReadConfig(fsys) // function accepts fs.FS

The composability works across the entire Go ecosystem because Go’s standard library adopted fs.FS uniformly, from the HTTP server to the template engine to the archive packages. Because the interface lives in the standard library, there is only one target for library authors to program against. Libraries written years after io/fs shipped speak the same interface as stdlib itself.

Node.js never established this convention, and no npm package can establish it retroactively. Any VFS standard defined outside of Node core is optional. The vast majority of packages on npm were written before such a standard existed and call require('fs') directly. There is no realistic path to retrofitting them through community coordination alone.
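For comparison, this is roughly what the Go pattern looks like when transplanted into JavaScript: the library function receives its filesystem as a parameter. createMapFs and readConfig are hypothetical names, and the in-memory provider implements only the single method the example needs. It is a sketch of the convention, not an existing npm API.

```javascript
const nodeFs = require('node:fs');

// Hypothetical in-memory provider exposing only what readConfig needs.
function createMapFs(files) {
  const entries = new Map(Object.entries(files));
  return {
    readFileSync(filePath, encoding) {
      if (!entries.has(filePath)) {
        const err = new Error(`ENOENT: no such file or directory, open '${filePath}'`);
        err.code = 'ENOENT';
        throw err;
      }
      const data = Buffer.from(entries.get(filePath));
      return encoding ? data.toString(encoding) : data;
    },
  };
}

// The library takes its filesystem as an argument instead of importing node:fs.
function readConfig(fsLike, filePath) {
  return JSON.parse(fsLike.readFileSync(filePath, 'utf8'));
}

// Tests pass an in-memory provider.
const fromMemory = readConfig(
  createMapFs({ '/config.json': '{"key":"value"}' }),
  '/config.json',
);

// Production callers pass the real module.
const readFromDisk = (filePath) => readConfig(nodeFs, filePath);
```

The pattern works only when every library in the call chain accepts the parameter. Go could rely on that because the interface shipped in the standard library. An npm package cannot impose it on code that already calls require('fs').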

What a Fix Requires

The Platformatic proposal describes a registration mechanism, roughly a node:vfs module where code can register a provider for a path prefix, intercepting all fs operations that match:

// Sketch of the proposed API; module and option names are illustrative
import { register } from 'node:vfs';
import fs from 'node:fs';
import { Volume } from 'memfs';

const vol = new Volume();
vol.fromJSON({ '/embedded/config.json': JSON.stringify({ key: 'value' }) });
register(vol, { prefix: '/embedded' });

// Any code in the process, including third-party libraries calling node:fs:
fs.readFileSync('/embedded/config.json'); // routes to vol

For this to work at the right level, the registry lookup needs to live in node_file.cc, not just in lib/fs.js, so that every JavaScript entry point into the binding sees it. Native addons that call libuv directly bypass lib/fs.js and node_file.cc alike, so they remain outside any such registry unless they cooperate. Worker threads need to inherit or be explicitly given registry state. The require loader and the ESM loader both need to consult the same registry so that module resolution and raw file reads remain consistent with each other.

The performance concern is real. A registry lookup on every fs call adds overhead to a code path that runs constantly in most server applications. A sentinel check that short-circuits the lookup when no provider is registered would be necessary for this to be acceptable in production, and proving that the fast path has negligible cost would require careful benchmarking across the range of workloads Node.js serves.
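The prefix registry and its sentinel fast path can be sketched in plain JavaScript. This is an illustration of the lookup logic only, under the assumption of POSIX-style paths. It is not the proposal's implementation, which would have to sit below lib/fs.js, and the register and lookup signatures here are hypothetical.

```javascript
const path = require('node:path');

// Registered providers, kept sorted so the longest prefix wins.
const providers = [];

function register(provider, { prefix }) {
  // Normalize and drop trailing slashes: '/embedded/' becomes '/embedded'.
  const root = path.posix.normalize(prefix).replace(/\/+$/, '');
  providers.push({ root, provider });
  providers.sort((a, b) => b.root.length - a.root.length);
}

function lookup(filePath) {
  // Sentinel fast path: with nothing registered, the cost is one length check.
  if (providers.length === 0) return null;
  // Normalize first so '..' segments cannot escape or fake a prefix.
  const normalized = path.posix.normalize(filePath);
  for (const { root, provider } of providers) {
    if (normalized === root || normalized.startsWith(root + '/')) {
      return provider;
    }
  }
  return null; // no provider matched: fall through to the real filesystem
}

// Usage with a placeholder provider object.
const before = lookup('/embedded/config.json');
const vol = { name: 'demo-volume' };
register(vol, { prefix: '/embedded' });
const hit = lookup('/embedded/config.json');
const sibling = lookup('/embedded-other/config.json');
const escaped = lookup('/embedded/../etc/passwd');
```

Matching on root plus a slash, not on the bare string prefix, keeps /embedded-other from being routed to the /embedded provider. The benchmarking question is whether the providers.length check stays negligible on hot paths.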

The case for building it rests on where the cost actually sits today. The work pkg did to keep patched binaries current on every release, the work every test framework does to keep mock-fs working across Node.js internal changes, the work every bundler does to inline asset-reading code at build time because it cannot intercept at runtime: all of this is redundant effort, paid on every release cycle, and invisible in aggregate because it is distributed across thousands of unrelated projects. A single well-designed interception layer in core would eliminate most of it, and it would improve the testing story, the single-binary story, and the plugin sandboxing story simultaneously, without requiring any library to change its call sites.
