Elasticsearch and Meilisearch Have Different Theories of What Search Should Do
Source: lobsters
When someone writes about ditching Elasticsearch for Meilisearch, the story usually centers on developer experience: the single binary, the instant setup, the typo tolerance that just works. Those things are real. But the migration decision runs deeper than ergonomics. Elasticsearch and Meilisearch were built with fundamentally different architectural theories, and understanding those theories tells you more about whether to switch than any benchmarking spreadsheet will.
What Lucene Was Built to Do
Elasticsearch sits on top of Apache Lucene, a Java library with a 25-year lineage. Lucene’s architecture reflects its origins in document retrieval research: it uses an inverted index organized into immutable segments, a term dictionary compressed with finite state transducers, and BM25 as the default relevance model since Elasticsearch 5.0 in 2016.
BM25 is a probabilistic relevance model. It scores documents by combining term frequency (how often the query term appears in the document) with inverse document frequency (how rare the term is across the corpus), and normalizes by field length. The tunable parameters k1 (term frequency saturation, default 1.2) and b (length normalization, default 0.75) let you adjust how much each factor contributes. This is a well-studied model with decades of information retrieval research behind it, and it works well for corpus-scale retrieval where you want to rank broad sets of potentially relevant documents.
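The per-term arithmetic is compact enough to sketch. This mirrors Lucene's BM25 variant of the formula, hedged: the real scorer layers query and field boosts and precomputed norms on top of this.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """One query term's contribution to a document's BM25 score.

    tf: term frequency in the document; df: documents containing the term;
    n_docs: corpus size; doc_len / avg_doc_len drive length normalization.
    """
    # Inverse document frequency: rarer terms contribute more
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates via k1 and is normalized by relative length via b
    tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part
```

A rare term in a short document outscores a common term in a long one, and repeated occurrences add less and less as tf grows: exactly the levers k1 and b tune.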
Lucene’s segment architecture also enables something Meilisearch cannot match: columnar storage via doc values. Doc values store field data in a column-oriented format on disk, which means Elasticsearch can compute aggregations (date histograms, term counts, percentiles, cardinality estimates) without loading full documents into memory. This is why Elasticsearch works as an analytics engine, not just a search engine. The log aggregation pipelines, Kibana dashboards, and SIEM use cases are not accidents of Elasticsearch’s popularity. They are the natural output of an architecture designed for analytics at scale.
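The columnar idea is easy to show in miniature. This is toy data, nothing Elasticsearch-specific, but it captures why aggregations over doc values avoid touching whole documents:

```python
from collections import Counter

# Doc values lay each field out as its own contiguous on-disk column,
# so an aggregation reads one column instead of decoding every document.
columns = {
    "status": [200, 404, 200, 500, 200],
    "bytes":  [512, 128, 2048, 64, 256],
}

# A terms aggregation over "status" touches only that column...
status_counts = Counter(columns["status"])
# ...and a sum aggregation over "bytes" touches only this one.
total_bytes = sum(columns["bytes"])
```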
The cost of that architecture is operational weight. Elasticsearch 8.x requires JVM 17+, bundles its own JDK, and recommends setting the heap to 50% of available RAM up to a ceiling of 31 GB (to stay within the JVM’s compressed ordinary object pointers range). On Linux, you must set vm.max_map_count=262144 via sysctl before Elasticsearch will even start. Security is enabled by default in 8.x, which means TLS certificates and authentication configuration even for a local development instance. Shard count must be planned upfront; getting it wrong means performance problems that cannot be easily corrected without a full reindex.
What milli Was Built to Do
Meilisearch is written in Rust and uses a custom search engine called milli (merged into the main repository in v1.x) on top of LMDB via the heed crate. LMDB is a B-tree key-value store that uses memory-mapped I/O, which means reads go through the OS page cache with zero-copy semantics. The read path has almost no overhead once data is warm.
milli does not use BM25. It uses a bucket-sort relevance pipeline. Documents are sorted into ranked buckets by sequential rules, each acting as a tiebreaker for the previous:
words -> typo -> proximity -> attribute -> sort -> exactness
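The mechanism is ordinary lexicographic sorting over per-rule outcomes. A minimal sketch with hypothetical rule scores (milli's actual engine evaluates the rules over roaring bitmaps in LMDB, not Python dicts):

```python
# Hypothetical per-document outcomes for three of the ranking rules
docs = [
    {"id": 1, "words": 2, "typo": 1, "proximity": 4},
    {"id": 2, "words": 2, "typo": 0, "proximity": 9},
    {"id": 3, "words": 1, "typo": 0, "proximity": 1},
]

def bucket_key(doc):
    # Each rule only breaks ties left by the one before it: more matched
    # query words first, then fewer typos, then tighter proximity.
    return (-doc["words"], doc["typo"], doc["proximity"])

ranked = [d["id"] for d in sorted(docs, key=bucket_key)]
```

Document 3 has the best proximity but still loses: it matched fewer query words, and a later rule never overrules an earlier one.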
The proximity rule is the notable one. milli maintains a word-pair proximity database: for each pair of consecutive query words, it stores the distance between those words across all documents. A document where “machine learning” appears as adjacent words outranks one where those words appear twenty tokens apart. Most BM25 implementations do not capture this at all; phrase proximity in Elasticsearch requires explicit phrase queries or span_near queries, whereas in Meilisearch it is part of the default ranking pipeline.
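The quantity being stored is roughly the minimal token distance between occurrences of a word pair. A sketch of that computation, with the caveat that milli precomputes (and caps) these distances at indexing time and this distance formula is an approximation of its behavior:

```python
def pair_proximity(positions_a, positions_b, cap=7):
    """Smallest token distance between occurrences of two query words."""
    best = cap
    for pa in positions_a:
        for pb in positions_b:
            # distance 1 means word B immediately follows word A;
            # out-of-order pairs pay one extra step
            d = pb - pa if pb > pa else pa - pb + 1
            best = min(best, d)
    return best
```

With token positions [4] and [5], "machine learning" scores a proximity of 1; positions twenty tokens apart hit the cap and rank last on this rule.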
Typo tolerance is also built in and on by default. Meilisearch applies Damerau-Levenshtein distance automatically: no tolerance for words under 5 characters, one permitted edit for 5-8 character words, two edits for longer words. In Elasticsearch, fuzzy matching requires you to pass "fuzziness": "AUTO" on each query, and it does not interleave with primary relevance ranking in the same way.
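The length thresholds amount to a three-way branch. This mirrors Meilisearch's documented defaults (the minWordSizeForTypos settings, oneTypo=5 and twoTypos=9, both configurable per index):

```python
def allowed_typos(word, one_typo_min=5, two_typos_min=9):
    """Typo budget by word length, mirroring Meilisearch's defaults."""
    if len(word) < one_typo_min:
        return 0  # short words must match exactly
    if len(word) < two_typos_min:
        return 1  # one Damerau-Levenshtein edit allowed
    return 2      # two edits for long words
```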
The combined result is a search experience that works well out of the box for user-facing search boxes, particularly prefix search and search-as-you-type. Meilisearch maintains a separate prefix inverted index alongside the main inverted index, enabling fast completions without a separate autocomplete pipeline.
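A toy version of a prefix inverted index shows why completions need no extra pipeline: every prefix maps straight to documents. This is illustrative only; milli is selective about which prefixes it materializes, where this sketch just caps prefix length.

```python
from collections import defaultdict

def build_prefix_index(docs, max_prefix=4):
    """Map each word prefix (up to max_prefix chars) to its documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            for i in range(1, min(len(word), max_prefix) + 1):
                index[word[:i]].add(doc_id)
    return index

idx = build_prefix_index({1: "search engine", 2: "sea shell"})
```

A user typing "sear" resolves with one lookup instead of a scan over the term dictionary.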
The Operational Gap
Starting Meilisearch:
curl -L https://install.meilisearch.com | sh
./meilisearch
# REST API at localhost:7700, web UI included
Starting Elasticsearch in development:
docker run -p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.17.0
That Docker command works for local experiments. In production, the xpack.security.enabled=false flag is not acceptable, which means TLS configuration, keystore management, and certificate rotation enter the operational picture. The JVM heap must be explicitly bounded. Index templates should be defined before data ingestion if you want reliable mappings, because Elasticsearch’s dynamic mapping will happily create thousands of fields from loosely structured JSON, a problem known as mapping explosion.
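One mitigation is a composable index template with strict dynamic mapping, defined before ingestion. A sketch, with illustrative field names: "dynamic": "strict" rejects documents carrying unmapped fields instead of silently growing the mapping.

```python
import json

# Body for an Elasticsearch composable index template; field names
# are illustrative, not prescriptive.
template = {
    "index_patterns": ["app-logs-*"],
    "template": {
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                "level": {"type": "keyword"},
            },
        }
    },
}

# This body would go to PUT _index_template/app-logs before any data lands
body = json.dumps(template)
```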
For teams running Kubernetes, Elasticsearch requires the ECK operator or careful manual StatefulSet configuration with persistent volumes, init containers to set the sysctl, and liveness probes tuned to handle slow startup. Meilisearch is a single Deployment with a single PersistentVolumeClaim.
The operational gap is not incidental. Elasticsearch was designed for teams with dedicated infrastructure engineers, and that design intent is visible at every layer of the stack.
What You Give Up
The migration is not free. Meilisearch in the open-source single-binary form has no clustering. There is no native sharding across nodes, no replica for high availability, no automatic failover. Meilisearch Cloud handles this with a managed multi-tenant architecture, but the open-source deployment is a single point of failure. Elasticsearch’s distributed architecture, built on a Raft-style cluster coordination layer since 7.x, handles tens of billions of documents across sharded clusters with replication and automatic shard rebalancing.
The aggregation framework disappears entirely. Meilisearch supports facet counts: you can ask for the distribution of a categorical field across search results. But date histograms, numeric percentiles, cardinality estimates, and pipeline aggregations are not available. If you are using Elasticsearch for anything beyond search, such as powering a dashboard, aggregating log data, or computing analytics over your document corpus, Meilisearch cannot substitute for those workloads.
Relevance tuning is also more constrained. Elasticsearch’s function_score query lets you combine BM25 with arbitrary scoring functions: field value factors, Gaussian decay for recency or geo-distance, custom Painless scripts. Since Elasticsearch 8.8, Reciprocal Rank Fusion lets you blend keyword and vector search scores in a principled way. Meilisearch gives you the ranking rule pipeline and the ability to add custom sort criteria based on numeric fields. That covers most product search requirements, but it does not cover the cases where Elasticsearch’s programmable relevance is genuinely necessary.
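The Gaussian decay curve is worth seeing concretely. This follows the shape function_score's gauss function documents: the multiplier falls to `decay` once the distance from `origin` reaches `offset + scale`.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay multiplier, as in function_score's gauss function."""
    # sigma^2 is chosen so the curve passes through (scale, decay)
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    dist = max(0.0, abs(value - origin) - offset)
    return math.exp(-(dist ** 2) / (2 * sigma_sq))
```

With origin at "now" and a scale of 10 days, a ten-day-old document keeps half its boost and a month-old one almost none, which is the usual recency-weighting setup.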
For vector search, both tools now support hybrid keyword-plus-vector retrieval. Meilisearch added vector search in v1.6 with support for OpenAI, HuggingFace, and Ollama embedders. Elasticsearch has offered HNSW-based kNN since 8.x and its own sparse vector model, ELSER, since 8.8. Both are viable for RAG workloads, though Elasticsearch’s ML node infrastructure makes it easier to co-locate the embedder with the search cluster.
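The Reciprocal Rank Fusion mentioned above is simple enough to sketch, and it shows why hybrid blending works without score calibration: only ranks matter, so the keyword and vector retrievers never need comparable raw scores. Here k=60 matches Elasticsearch's default rank_constant.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # each list contributes 1/(k + rank) for every doc it returns
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# a keyword ranking and a vector ranking, fused into one
fused = rrf([["a", "b", "c"], ["b", "c", "a"]])
```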
When the Trade Makes Sense
The migration makes sense when your dataset is under roughly 10 million documents, your primary use case is a user-facing search box, and you do not need analytics running against your search index. E-commerce product catalogs, documentation search, internal knowledge bases, and SaaS application search are all good fits. Meilisearch’s federated search, introduced in v1.10, lets you blend results from multiple indexes in a single request, which handles the common pattern of searching across different content types simultaneously.
If you are ingesting logs, running analytics, building security tooling, or storing more than a few tens of millions of documents, Elasticsearch’s architectural investments pay off in ways that Meilisearch cannot replicate. OpenSearch, the AWS-maintained fork of Elasticsearch 7.10, is worth considering in that space if the Elastic licensing model is a concern.
The more useful framing is not which tool is better, but what each tool is optimizing for. Lucene and the JVM give Elasticsearch a powerful, battle-tested analytics substrate with horizontal scalability built in. LMDB and the milli engine give Meilisearch sub-10ms latency, automatic typo tolerance, and proximity-aware ranking, all with a startup time measured in milliseconds and a memory footprint measured in hundreds of megabytes rather than gigabytes. These are not interchangeable tools, and the migration holds up when the workload matches the destination tool’s design center.