Subsecond VM Coldstarts and the Portability Problem smolvm Is Trying to Solve
Source: hackernews
The smol-machines/smolvm project landed on Hacker News recently with a straightforward pitch: subsecond coldstarts, portable virtual machines. It got 276 points and a lively discussion thread, which tells you something about how much this particular problem still bothers people. Fast VM startup is not a new problem, but it remains genuinely unsolved in the general case, and portability makes it harder.
Let me trace the landscape this project is entering.
Why VM Coldstart Latency Matters
The classic VM startup flow is expensive. QEMU, the general-purpose emulator most developers have encountered, typically takes 500ms to several seconds to boot even a stripped-down Linux guest. That’s fine for long-running workloads. It’s completely unsuitable for serverless functions, sandbox execution environments, or anything that needs to spin up in response to a user request.
The cost comes from several layers: initializing virtual hardware, loading the kernel, running init, mounting filesystems, starting services. Most of that work is redundant when you’re booting the same image for the thousandth time that hour.
Cloud providers have been attacking this problem for years. AWS Lambda’s original architecture used container isolation, not full VMs, specifically because container startup was faster. When Firecracker shipped in 2018, it represented a substantial shift: a purpose-built microVM monitor written in Rust, using Linux KVM for hardware-assisted virtualization, with a minimal device model that deliberately omits anything unnecessary. Firecracker achieves VM boot in roughly 125ms on warm hardware. Fly.io’s Machine API, which powers their edge compute platform, is built on top of Firecracker.
125ms is fast, and it is comfortably subsecond. It’s fast enough for Fly.io and Lambda. But that figure covers booting the guest, not the application: once you add runtime and application startup on top of the kernel boot, a cold path can still take long enough for a user to notice.
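One way to keep the distinction between boot time and end-to-end coldstart honest is to measure from launch until the workload reports that it is ready. The following is a minimal sketch of such a harness, not anything from smolvm or Firecracker. The function name measure_time_to_ready and the "ready" marker are assumptions made for illustration, and the child command here is a plain Python process standing in for a VM launcher.

```python
import subprocess
import sys
import time


def measure_time_to_ready(cmd, ready_marker="ready"):
    """Launch cmd and return the seconds elapsed until a stdout line
    contains ready_marker, or None if the process exits without it.

    Hypothetical helper: the marker convention is an assumption.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    elapsed = None
    try:
        for line in proc.stdout:
            if ready_marker in line:
                elapsed = time.perf_counter() - start
                break
    finally:
        # Stop the child once we have our measurement (or it has exited).
        proc.kill()
        proc.wait()
        proc.stdout.close()
    return elapsed


if __name__ == "__main__":
    # Stand-in workload: a child interpreter that announces readiness.
    child = [sys.executable, "-c", "print('ready', flush=True)"]
    seconds = measure_time_to_ready(child)
    print(f"time to ready: {seconds * 1000:.1f} ms")
```

Pointing the same harness at a VM launcher that prints a marker from inside the guest would capture monitor startup, kernel boot, and application init in one number.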
The Snapshot Approach
The most effective technique for reducing coldstart latency below the kernel boot floor is snapshotting. Instead of booting from scratch, you snapshot a VM after it has initialized, then restore from that snapshot on demand. Firecracker supports this: you can create a memory snapshot and restore it in under 10ms. AWS Lambda SnapStart uses a similar approach for Java functions, snapshotting after the JVM and application have warmed up.
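For a sense of what the Firecracker flow looks like, here is a rough sketch of the request bodies involved, as I understand them from Firecracker’s snapshot documentation. The endpoint and field names should be checked against the Firecracker version in use, since the snapshot API has changed across releases. The file paths are placeholders, and the sketch only builds the requests; it does not send them to the API socket.

```python
import json


def snapshot_create_requests(snapshot_path, mem_path):
    """Requests that pause a running microVM and write a full snapshot.

    Field names follow Firecracker's snapshot docs as best I recall;
    treat them as assumptions and verify against your version.
    """
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def snapshot_load_requests(snapshot_path, mem_path):
    """Request that restores the snapshot in a fresh Firecracker process."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            "resume_vm": True,
        }),
    ]


if __name__ == "__main__":
    # Placeholder paths, chosen only for this example.
    steps = (snapshot_create_requests("/tmp/vm.snap", "/tmp/vm.mem")
             + snapshot_load_requests("/tmp/vm.snap", "/tmp/vm.mem"))
    for method, path, body in steps:
        print(method, path, json.dumps(body))
```

The expensive work (kernel boot, runtime warmup) happens once before the create call; every later coldstart pays only for the load.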
CRIU (Checkpoint/Restore in Userspace) brings a related capability to containers and processes: freeze a running process tree, serialize its memory and state to disk, restore it later. CRIU has been used to achieve sub-100ms restore times for complex applications.
The tradeoff with snapshots is state. A restored VM contains whatever state was present at snapshot time, including open file descriptors, in-memory randomness, clock state, and any other time-sensitive data. Applications need to be written or wrapped to reinitialize this state after restore. AWS documents this for SnapStart; it requires lifecycle hooks. It’s manageable but not transparent.
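The randomness problem is easy to reproduce in miniature. In the sketch below, copy.deepcopy stands in for snapshot and restore: two copies of the same object share identical random number generator state until a post-restore hook reseeds them. The Worker class and the after_restore hook name are hypothetical, chosen for illustration, and are not the SnapStart API.

```python
import copy
import os
import random


class Worker:
    """Holds state that must not be shared between restored clones."""

    def __init__(self):
        self.rng = random.Random(os.urandom(16))
        self.instance_id = os.urandom(8).hex()

    def token(self):
        return self.rng.getrandbits(64)

    def after_restore(self):
        # Hypothetical lifecycle hook: refresh anything that was frozen
        # into the snapshot and must be unique per running instance.
        self.rng.seed(os.urandom(16))
        self.instance_id = os.urandom(8).hex()


if __name__ == "__main__":
    snapshot = Worker()            # state captured after warmup
    a = copy.deepcopy(snapshot)    # "restore" number one
    b = copy.deepcopy(snapshot)    # "restore" number two
    print("same token without hook:", a.token() == b.token())

    a.after_restore()
    b.after_restore()
    print("same token after hook:", a.token() == b.token())
```

Without the hook, both clones emit the same "random" token, which is exactly the failure mode that makes snapshots non-transparent.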
What Makes smolvm Different
smolvm’s angle is both coldstart and portability, and the combination is what makes it interesting. Firecracker requires KVM, which means Linux on x86-64 or ARM. You cannot run Firecracker on macOS or Windows without a Linux VM underneath it. For development workflows, this means Docker Desktop, a heavyweight Linux VM, or similar workarounds.
Portable microVMs need to abstract over the native hypervisor interfaces: KVM on Linux, Hypervisor.framework (HVF) on macOS, and Windows Hypervisor Platform (WHP) on Windows. Each of these exposes hardware-assisted virtualization to ordinary user-space processes, subject to platform-specific permissions (access to /dev/kvm on Linux, a hypervisor entitlement on macOS, an enabled optional feature on Windows), but they have different APIs and different performance characteristics.
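At its simplest, the first step of that abstraction is choosing a backend for the host. The sketch below shows one way to do it, assuming nothing about how smolvm itself is structured. The backend keys ("kvm", "hvf", "whp") and the function names are illustrative. Only the Linux availability check is implemented, because it needs nothing beyond a file permission test.

```python
import os
import platform

# Host OS name (as reported by platform.system()) -> native hypervisor API.
BACKENDS = {"Linux": "kvm", "Darwin": "hvf", "Windows": "whp"}


def pick_backend(system=None):
    """Return the backend key for the given (or current) host OS."""
    system = system or platform.system()
    if system not in BACKENDS:
        raise RuntimeError(f"no native hypervisor backend known for {system!r}")
    return BACKENDS[system]


def kvm_usable(device="/dev/kvm"):
    """True if this process may open the KVM device for read and write."""
    return os.access(device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    host = platform.system()
    print("host:", host, "backend:", BACKENDS.get(host, "unsupported"))
    if host == "Linux":
        print("/dev/kvm usable:", kvm_usable())
```

A real monitor would put a common interface behind each key; the selection logic itself stays this small.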
Cloud Hypervisor, the Intel-backed project, targets KVM and MSHV (Microsoft Hypervisor). libkrun from Red Hat provides a library interface over KVM and HVF. crosvm from Google powers ChromeOS VMs and has grown more portable over time. Each project has made different tradeoffs between portability, performance, and feature completeness.
smolvm appears to be pursuing a genuinely portable path: the same VM image running with native hypervisor acceleration on Linux, macOS, and Windows, with startup times that stay subsecond across all three. The “smol” framing echoes the Rust ecosystem’s preference for minimal, focused libraries over kitchen-sink frameworks, which suggests the implementation is probably small and auditable rather than feature-packed.
The Kernel Side of Coldstart
Portability alone does not give you subsecond boot. You also need to minimize what the kernel does during startup. Standard Linux kernels do substantial work initializing device drivers, setting up subsystems, and running early userspace. A 125ms Firecracker boot uses a stripped kernel config that disables hundreds of options that are irrelevant in a VM with a minimal device model.
Unikernels take this further. Projects like Unikraft and MirageOS compile an application directly with only the OS components it needs, producing a single-purpose binary with no general-purpose kernel overhead. Unikraft has reported guest boot times of around 1ms, not counting the VM monitor’s own startup. The tradeoff is that building a unikernel requires application-specific work; you cannot take an arbitrary Linux binary and run it as a unikernel.
For a general-purpose VM that can run unmodified Linux workloads and still boot subsecond, the practical approach combines a minimal kernel, minimal init, and possibly pre-allocation of memory and CPU state. If the VM monitor avoids expensive device emulation and the kernel boots to a minimal userspace quickly, subsecond is achievable without snapshots.
Why Portability Is Harder Than It Looks
Even with hardware-assisted virtualization available on all three major platforms, building a portable VM monitor requires handling significant divergence in behavior.
KVM’s API is built around ioctl calls on /dev/kvm. Hypervisor.framework is a C framework with a fundamentally different mental model. WHP is a Win32 API. Memory mapping, VCPU creation, interrupt handling, and MMIO emulation all work differently. Writing a clean abstraction layer over all three without sacrificing performance is real systems programming work.
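To make the KVM side concrete, the sketch below performs the first call a KVM-based monitor typically makes: opening /dev/kvm and asking for the API version with an ioctl. The request number is the value of KVM_GET_API_VERSION from the Linux headers (_IO(0xAE, 0x00)), and the KVM API documentation specifies a stable version of 12. The helper name kvm_api_version is my own, and it returns None on hosts where the device is missing or inaccessible.

```python
import os

# _IO(KVMIO, 0x00) with KVMIO = 0xAE, as defined in <linux/kvm.h>.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(device="/dev/kvm"):
    """Return the KVM API version, or None if KVM cannot be queried."""
    try:
        import fcntl  # POSIX only; absent on Windows
    except ImportError:
        return None
    try:
        fd = os.open(device, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        # No KVM device, or no permission to open it.
        return None
    try:
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    version = kvm_api_version()
    if version is None:
        print("KVM not available on this host")
    else:
        print("KVM API version:", version)
```

Everything else in KVM (creating a VM, adding vCPUs, registering guest memory) follows the same pattern of ioctls on file descriptors, which is the model the HVF and WHP backends have to be mapped onto.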
Then there is the guest image question. x86-64 guests traditionally boot through firmware (typically a stripped BIOS or UEFI), although microVM monitors such as Firecracker skip it and load the kernel directly. ARM uses different boot protocols. If the goal is one image format that runs everywhere, you need to either target a single architecture and rely on Rosetta/emulation on Apple Silicon, or maintain architecture-specific images with a common toolchain.
Most portable VM projects have landed on shipping separate images per architecture but a unified API. smolvm’s pitch of “portable” likely means the host-side tooling and image format are portable, not that the guest binary is cross-architecture.
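The "separate images, unified API" pattern usually comes down to normalizing the host architecture and picking the matching guest image, since hardware-accelerated guests run on the host’s own architecture. Here is a small sketch of that lookup. The image naming scheme is invented for this example and is not smolvm’s format.

```python
import platform

# Common spellings reported by platform.machine() on different hosts.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def guest_arch(machine=None):
    """Normalize a machine string to the guest architecture name."""
    machine = (machine or platform.machine()).lower()
    if machine not in ARCH_ALIASES:
        raise RuntimeError(f"unsupported host architecture: {machine!r}")
    return ARCH_ALIASES[machine]


def image_name(base, machine=None):
    """Hypothetical naming scheme: <base>-<arch>.img."""
    return f"{base}-{guest_arch(machine)}.img"


if __name__ == "__main__":
    for host in ("x86_64", "AMD64", "arm64", "aarch64"):
        print(host, "->", image_name("sandbox", host))
```

The caller asks for "sandbox" on every platform; only the lookup knows that an Apple Silicon laptop and an x86 Linux server need different files.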
WebAssembly as an Alternative Frame
It’s worth putting smolvm next to the WebAssembly runtimes, because they are often mentioned in the same conversations about fast, portable, sandboxed execution.
Wasmtime and Wasmer achieve near-instant startup, sub-millisecond in many cases, because a WebAssembly module is portable bytecode instantiated inside a host process, with no guest kernel to boot. You pay for it in two ways: applications must be compiled to WASM, which constrains the choice of language and toolchain, and the execution environment is fundamentally different from a Linux process, so POSIX-style compatibility depends on the WASI interface, which is still evolving.
For sandboxing untrusted code in a known language stack, WASM is often the right answer. For running existing Linux binaries, arbitrary language runtimes, or workloads that need a real OS environment, you still need a VM or at least a container with kernel-enforced isolation.
smolvm targets the VM side of this trade. It gives you stronger isolation than a container and better portability than Firecracker, at the cost of higher complexity than a WASM runtime.
Reading the Ecosystem Moment
The timing of this project makes sense. Serverless and edge compute have made fast VM startup a first-class concern rather than an academic one. The hypervisor APIs on macOS and Windows have matured to the point where native acceleration is reliable. Rust has become the default language for systems work in this space (Firecracker, Cloud Hypervisor, crosvm, and now apparently smolvm are all Rust), which provides memory safety without GC overhead.
The gap this project is targeting is real. Developers building on macOS want to test VM-based sandboxing locally without spinning up a Linux environment first. CI systems want to run workloads in proper VM isolation without the overhead of full QEMU. Edge deployments may target Linux on ARM or Windows on x86.
Whether smolvm fills that gap depends on details that are still emerging: what guest formats it supports, how it handles networking and storage, what the embedding API looks like for programmatic VM management, and how the snapshot/restore story develops. Subsecond coldstart is achievable at this point, and multiple projects have demonstrated it, so the real differentiator will be the developer experience and the breadth of workloads it can run.
For anyone interested in this space, the Firecracker design document remains one of the clearest explanations of what decisions go into a minimal VM monitor. Cloud Hypervisor’s architecture document covers some of the same ground with different tradeoffs. And the Unikraft papers are worth reading if you want to understand how far boot times can go when you are willing to specialize the kernel for a specific workload.
smolvm is early. A Show HN post with 276 points and 91 comments means a prototype is getting signal, not that a production-ready platform exists. But the problem it is working on is genuinely worth solving, and the combination of subsecond coldstart and cross-platform portability is a more interesting target than either property alone.