
When Meilisearch Became a Hybrid Search Engine: The Embedder Model and What It Costs

Source: lobsters

When Ani Safifi describes ditching Elasticsearch for Meilisearch, the story centers on the familiar pain points: JVM heap sizing, shard planning, and cluster state management applied to a simple search use case. That mismatch is real and well-documented. What the migration story misses is a more recent dimension of this comparison: both systems now offer vector and hybrid search, but their integration models differ in ways that affect more than setup effort.

Meilisearch added vector search in v1.3, along with hybrid search that combines keyword and vector results in a single query. Elasticsearch has offered kNN search via the dense_vector field type since version 8.0, currently implemented with HNSW (Hierarchical Navigable Small World) graphs from Lucene. The capability gap between the two systems is largely closed: both can combine lexical matching with approximate nearest-neighbor search against dense vectors. The difference is in who generates the vectors, and that distinction has real consequences for how you build and maintain the integration.

How Elasticsearch Handles Vectors

In Elasticsearch, the dense_vector field type stores fixed-length float arrays alongside your document fields. You define the dimensionality in your index mapping:

PUT /products
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
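The similarity setting determines how candidate vectors are scored during the kNN phase. As a point of reference, here is a minimal pure-Python sketch of cosine similarity (no Elasticsearch involved); the engine rescales this value into a positive score, but the candidate ordering is the same:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|) -- higher means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions
query = [1.0, 0.0]
docs = {"close": [0.9, 0.1], "far": [0.0, 1.0]}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
```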

When you index a document, you are responsible for computing the embedding and including it in the document body. Elasticsearch does not call an external model on your behalf. A hybrid query combines a knn clause with a standard query:

POST /products/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.024, -0.109, "...", 0.031],
    "k": 20,
    "num_candidates": 100
  },
  "query": {
    "match": { "title": "running shoes" }
  }
}

You are responsible for generating both the indexed embeddings and the query vector. The advantage is control: you can use any embedding model, update models independently of the index, and apply compression. Later Elasticsearch releases added quantization options for dense vectors, trading some precision for a 4x to 32x reduction in storage: scalar quantization converts float32 components to int8, and binary quantization reduces each dimension to a single bit while retaining most of the retrieval precision.
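The idea behind scalar quantization is simple enough to sketch. The toy Python version below is an illustration of the float32-to-byte mapping only, not Lucene's actual implementation (which also handles score correction), but it shows where the 4x storage saving comes from:

```python
import struct

def quantize(vec: list[float]) -> tuple[bytes, float, float]:
    # Map each float32 component onto one of 256 byte-sized buckets spanning
    # the vector's own min..max range; keep (lo, scale) for dequantization.
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0
    q = bytes(round((x - lo) / scale) for x in vec)
    return q, lo, scale

def dequantize(q: bytes, lo: float, scale: float) -> list[float]:
    return [lo + b * scale for b in q]

vec = [0.024, -0.109, 0.55, 0.031]
q, lo, scale = quantize(vec)
float32_size = len(struct.pack(f"{len(vec)}f", *vec))  # 4 bytes per dimension
# len(q) == len(vec): 1 byte per dimension, a 4x reduction
```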

How Meilisearch Handles Vectors

Meilisearch takes a different position. Rather than treating the embedding as a field you provide, it treats embedding generation as a service the search engine performs on your behalf. You configure an “embedder” in the index settings, pointing Meilisearch at a model provider:

PATCH /indexes/products/settings
Content-Type: application/json

{
  "embedders": {
    "default": {
      "source": "ollama",
      "url": "http://localhost:11434",
      "model": "nomic-embed-text",
      "documentTemplate": "{{doc.title}} {{doc.description}}"
    }
  }
}
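The documentTemplate is a Liquid-style template that Meilisearch renders against each document to produce the text sent to the embedder. A toy renderer covering only the flat {{doc.field}} subset used above (the real engine supports full Liquid templates) makes the mechanics concrete:

```python
import re

def render_template(template: str, doc: dict) -> str:
    # Replace each {{doc.field}} placeholder with the document's value;
    # this toy version handles only flat fields, unlike real Liquid.
    def sub(match: re.Match) -> str:
        return str(doc.get(match.group(1), ""))
    return re.sub(r"\{\{\s*doc\.(\w+)\s*\}\}", sub, template)

doc = {"title": "Trail runner", "description": "lightweight running shoe"}
text = render_template("{{doc.title}} {{doc.description}}", doc)
# `text` is what gets sent to the embedder for this document
```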

From that point, Meilisearch generates embeddings automatically when documents are added or updated. The embedder is called for each document’s rendered documentTemplate, and the resulting vector is stored in the LMDB index alongside the document. A hybrid search query then looks like this:

POST /indexes/products/search
Content-Type: application/json

{
  "q": "running shoes",
  "hybrid": {
    "semanticRatio": 0.5,
    "embedder": "default"
  }
}

The semanticRatio parameter blends keyword and vector results using reciprocal rank fusion. A value of 0.0 is pure keyword search, 1.0 is pure semantic search, and values in between interpolate. At search time, the Meilisearch server calls the embedder to generate a query vector, runs approximate nearest-neighbor search against the stored vectors, and merges the results with the keyword candidates.
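Meilisearch's exact fusion logic is internal, but a weighted reciprocal rank fusion over the two ranked lists (a sketch, not the engine's implementation) shows how a semanticRatio-style knob can interpolate between them:

```python
def weighted_rrf(keyword: list[str], semantic: list[str],
                 semantic_ratio: float, k: int = 60) -> list[str]:
    # Each list contributes 1 / (k + rank) per document, weighted by the
    # blend ratio; 0.0 reproduces keyword order, 1.0 semantic order.
    scores: dict[str, float] = {}
    for rank, doc in enumerate(keyword):
        scores[doc] = scores.get(doc, 0.0) + (1 - semantic_ratio) / (k + rank)
    for rank, doc in enumerate(semantic):
        scores[doc] = scores.get(doc, 0.0) + semantic_ratio / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["a", "b", "c"]   # hypothetical keyword-ranked document ids
semantic = ["c", "b", "a"]  # hypothetical vector-ranked document ids
```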

Supported embedder sources include OpenAI, HuggingFace Inference API, Ollama, and generic REST endpoints that accept text input and return a float array. The Ollama integration in particular makes it feasible to run a complete hybrid search stack locally without external API calls, which is useful for prototyping, internal tools, or products that cannot send data to third-party embedding APIs.

What the Design Difference Actually Means

The Meilisearch approach removes the embedding pipeline from your application code. There is no preprocessing step, no external job that runs before indexing, no synchronization problem between your documents and their embeddings. You configure the embedder once, index plain documents, and Meilisearch handles the rest.

The cost of this is opacity in two dimensions.

First, the embedder is called as part of the indexing task. If your embedder is slow (a large model on modest hardware, or a rate-limited API), indexing slows proportionally: a large bulk import becomes a large number of sequential embedding calls. Because of the single-writer LMDB transaction model that Meilisearch inherits, these calls cannot be parallelized across transactions. An import that would take minutes without embeddings can take hours when embedding generation is bottlenecked on an API rate limit or an underpowered local inference server.
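The scale of that slowdown is easy to estimate with back-of-envelope arithmetic; the throughput figure below is illustrative, not a benchmark:

```python
def bulk_import_hours(n_docs: int, docs_per_second: float) -> float:
    # Sequential embedding calls: wall time is simply count / throughput.
    return n_docs / docs_per_second / 3600

# 1M documents through an embedder serving a hypothetical 50 docs/s
# (plausible for a small local model): ~5.6 hours of embedding work alone
hours = bulk_import_hours(1_000_000, 50)
```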

Second, the embedding model is coupled to the index configuration. Changing the model requires re-indexing every document, because the stored vectors occupy a specific dimensional space tied to that model’s output. The vectors from nomic-embed-text (768 dimensions) are not reusable with text-embedding-3-large (3072 dimensions by default). Elasticsearch faces the same fundamental constraint, but the decoupling is at a different layer: you manage the embedding pipeline yourself, which means you can build a new index with the new model’s vectors in parallel while the old index continues serving traffic, then swap with an alias. Meilisearch has the indexSwap API for zero-downtime re-indexing, but the re-generation of embeddings through the configured embedder adds a time cost that grows with dataset size and model throughput.

Storage Overhead

Vector storage adds meaningful overhead that compounds Meilisearch’s existing LMDB index amplification. A float32 vector for a 768-dimensional model like nomic-embed-text occupies 3,072 bytes per document. A 1536-dimensional OpenAI text-embedding-3-small output occupies 6,144 bytes per document. For a one-million-document index, that is 3 to 6 gigabytes of vector data before accounting for the inverted index, facet database, and raw document storage that LMDB already holds.
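These figures follow directly from dimensions x 4 bytes x document count:

```python
def vector_storage_bytes(dims: int, n_docs: int, bytes_per_dim: int = 4) -> int:
    # float32 costs 4 bytes per dimension; int8 quantization would cost 1
    return dims * bytes_per_dim * n_docs

per_doc_nomic = vector_storage_bytes(768, 1)          # 3,072 bytes/document
nomic_total = vector_storage_bytes(768, 1_000_000)    # ~3.1 GB for 1M docs
openai_total = vector_storage_bytes(1536, 1_000_000)  # ~6.1 GB for 1M docs
```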

Meilisearch v1.x stores vectors in full float32 precision with no quantization option exposed to users. Elasticsearch’s scalar quantization reduces the 4-byte-per-dimension cost to 1 byte with minimal recall degradation on most retrieval benchmarks. For a dataset where storage or memory pressure is a consideration, that difference is significant. The Elasticsearch quantization documentation notes that int8 scalar quantization typically reduces recall by less than 1% on common benchmarks while cutting vector storage by 75%.

Which Architecture Fits

The Meilisearch embedder model fits projects where the search feature is owned by a small team, where the integration’s simplicity matters more than optimal embedding throughput, and where the dataset size keeps storage overhead manageable. Running Ollama locally with nomic-embed-text and Meilisearch is a complete hybrid search stack with no external dependencies, which has genuine value for developers who want semantic search without managing a separate embedding service.

Elasticsearch’s explicit vector model fits when you have a dedicated ML pipeline, need fine-grained control over batching and retry behavior during indexing, want quantization for storage efficiency at scale, or need to decouple model updates from index rebuilds. It also fits when search is one component of an Elasticsearch deployment that already exists for analytics or log ingestion.

The migration question the source article raises was clearer when the two systems had different feature sets. With both now offering hybrid search, the comparison shifts from whether each system can do it to how each system’s integration model fits your team’s workflow. Meilisearch’s answer is: we generate the vectors, you stay out of the pipeline. Elasticsearch’s answer is: you generate the vectors, we store and retrieve them efficiently. Both are coherent positions, and the right choice follows from whether you want to own that pipeline or delegate it.
