The cheapest MacBook in early 2026 is a MacBook Air starting around $1,099, with 8GB of unified RAM. Claiming you can run genuine big data analytics on that machine would have seemed like marketing language five years ago. DuckDB’s recent benchmark post makes a credible case that the claim holds, and the mechanics behind it are worth examining.
The story is not simply that DuckDB is fast. It is about how a well-designed execution model, a good file format, and hardware that has quietly improved all converged to shift the threshold for what constitutes a “big data problem.”
Why the Standard Python Stack Hits a Wall
The conventional Python data stack for years meant loading data into a Pandas DataFrame. Pandas is eager by default: it materializes the full dataset in RAM, as NumPy-backed blocks, before any computation begins. A 50GB CSV on an 8GB machine produces an out-of-memory error before the query starts.
Spark was the historical answer: distribute the data across multiple machines, pool their memory, and run a distributed execution engine. This works, but the operational weight is real. Every Spark job requires cluster bootstrap time (typically 3-5 minutes on AWS EMR), tuning of executor memory settings, and tolerance for shuffle failures that appear only on large inputs. For exploratory analysis, the iteration cycle is painful.
DuckDB’s design premise is different. Rather than assuming data fits in RAM and failing when it does not, it treats memory as a configurable constraint from the start.
The Execution Model
DuckDB uses a vectorized, columnar execution engine with roots in the MonetDB/X100 research lineage. Queries execute in a pipeline where data moves through operators in batches called vectors, defaulting to 2,048 rows per batch. This is SIMD-friendly, cache-efficient, and critically, does not require the full dataset to be resident in memory at once.
The memory ceiling is explicit and configurable:
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';
When an operator requires global intermediate state, such as a hash join, a GROUP BY aggregation, or an ORDER BY sort, DuckDB builds that state incrementally. If the state would exceed the memory limit, DuckDB partitions it by hash and spills those partitions to the temp directory, then processes each partition in passes. For joins this is a variant of grace hashing; for sorts it is an external merge sort.
A query like this works on a machine with 4GB available, processing 100GB of Parquet data:
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';
SELECT
date_trunc('month', event_ts) AS month,
event_type,
COUNT(*) AS events,
SUM(revenue) AS total_revenue
FROM read_parquet('/data/events/**/*.parquet')
WHERE event_ts >= '2025-01-01'
AND region = 'us-east'
GROUP BY 1, 2
ORDER BY 1, 2;
DuckDB scans the Parquet files in a streaming fashion, accumulates hash table state for the GROUP BY, and spills partitions when they would exceed the memory limit. The full dataset never lives in RAM simultaneously.
Parquet Is Doing Half the Work
Any “big data on a laptop” claim requires scrutiny of the data format. A 100GB CSV and a 100GB Parquet file are not equivalent problems.
Parquet is a columnar storage format that stores per-column statistics in its metadata: minimum value, maximum value, and null counts for each row group (typically 128MB of uncompressed data). DuckDB’s Parquet reader uses these statistics for two optimizations before touching the file content.
Row group skipping: if a row group’s recorded maximum value for event_ts is before '2025-01-01', DuckDB skips the entire row group. There is no I/O, no decompression, and no processing for that row group.
Column pruning: if the query references 3 columns out of 30, DuckDB reads only those 3 column chunks per row group. The other 27 columns are never touched.
A query nominally over 100GB of Parquet might read 6-8GB from disk after these optimizations apply. The problem has already shrunk by an order of magnitude before the vectorized engine processes a single row. CSV offers neither property; it requires reading every byte in order to parse the structure.
DuckDB extends this to remote data with the same selective I/O behavior:
import duckdb
conn = duckdb.connect()
conn.execute("SET memory_limit = '4GB'")
conn.execute("INSTALL httpfs; LOAD httpfs;")
result = conn.execute("""
SELECT region, SUM(revenue) AS total
FROM read_parquet(
's3://my-bucket/sales/year=2025/**/*.parquet',
hive_partitioning = True
)
GROUP BY region
ORDER BY total DESC
""").df()
The S3 read uses HTTP range requests to fetch only the row groups and columns the query requires; the full objects are never downloaded.
Why Apple Silicon Changes the Math
Two hardware properties make M-series MacBooks particularly suited to this workload.
Memory bandwidth: Apple Silicon’s unified memory architecture delivers substantially higher bandwidth than comparable x86 laptop hardware. The M3’s memory subsystem sustains roughly 100 GB/s; a comparable Intel or AMD laptop typically delivers 50-70 GB/s. DuckDB’s vectorized operators are often memory-bandwidth-bound during large column scans and hash operations. Higher bandwidth means those operators run closer to their theoretical throughput ceiling.
NVMe speed: Apple Silicon Macs use NVMe storage tightly integrated with the SoC, with sequential read speeds above 7 GB/s on recent models. When DuckDB spills to disk, the round-trip cost of writing and reading back a partition is meaningfully lower than on a PC laptop with a mid-range SATA SSD at 500 MB/s. The penalty for exceeding the memory limit is smaller, which narrows the gap between “fits in RAM” and “requires spill” considerably.
Unified memory also simplifies resource management. There is no PCIe boundary between CPU and GPU memory; the OS and DuckDB draw from the same physical pool. Setting memory_limit = '6GB' on an 8GB machine leaves 2GB for the operating system without the conflict over separate address spaces that a discrete GPU setup introduces.
How This Compares to Polars
Polars is the other serious contender for single-machine analytical work at scale. Written in Rust with an Arrow-native columnar model, its lazy execution API can process data larger than RAM through streaming evaluation.
The distinction is practical. Polars excels at DataFrame-centric workflows: selecting, filtering, joining, and chaining transformations in a Python-native style. Its lazy API compiles a query plan and evaluates it with streaming where the plan allows. DuckDB excels at SQL-centric analysis, particularly for complex multi-table joins, correlated subqueries, window functions, and ad-hoc queries where the optimizer’s join reordering and cardinality estimation matter.
DuckDB’s out-of-core support is also more uniformly complete across operator types. Polars’s streaming mode handles many cases well but has historically had gaps for certain join shapes and aggregation patterns. For exploratory analytics on a large Parquet dataset with complex SQL, DuckDB’s story is more consistent.
The two tools are composable rather than competing. DuckDB can query Polars DataFrames directly via the Arrow interface, allowing each to handle the work it does best.
What Five Years of Progress Replaced
To appreciate what this means in practice, consider the equivalent workflow from 2020:
- Provision an EMR cluster on AWS with 3-5 nodes at $0.50-2.00 per hour
- Write PySpark code, submit the job, and wait for cluster bootstrap
- Debug executor memory errors, tune spark.executor.memory and spark.driver.memoryOverhead
- Iterate slowly because each test run requires re-submitting the job
- Pay for idle cluster time between debugging rounds
The DuckDB equivalent is pip install duckdb and writing a SQL query. Iteration takes seconds. The dataset has to grow substantially before the cost of distributed infrastructure pays for itself against that simplicity.
The Remaining Limits
Honest accounting requires stating where this approach falls short.
Multi-terabyte datasets exceed local SSD capacity, not just RAM. A 10TB dataset is not a DuckDB-on-a-MacBook problem regardless of how well memory management works. Concurrent multi-user workloads are outside DuckDB’s design: a database file accepts either a single read-write process or multiple read-only processes, not both at once. Real-time streaming ingestion, fault-tolerant ETL with orchestration and retry logic, and workloads that need to scale horizontally under load all still require distributed infrastructure.
The threshold has moved substantially, however. In the Hacker News discussion around this article, practitioners share benchmarks from their own datasets, and the consistent observation is that the crossover point where you genuinely need a cluster sits much higher than intuition built on 2019-era tooling would suggest.
Format and Configuration as Leverage
The broader observation is that “big data” has always been partly a statement about tooling, not data volume alone. Pandas with CSV is genuinely constrained at 8GB. DuckDB with Parquet, a tuned memory limit, and an M3’s fast NVMe is not. The machine did not change; the software and format choices did.
This is the substantive claim in the original post, and the benchmarks back it up. If your current workflow reaches for cloud infrastructure when data exceeds available RAM, it is worth testing a local DuckDB query first. For a large class of analytical problems, the answer comes back from the laptop before the cluster finishes bootstrapping.