Finding Rust Functions by Type: Inside the Search Engine That Thinks Like a Compiler
Source: lobsters
There is a class of problem every Rust programmer encounters: you want a function that converts an Option<T> into a T using a fallback value, but you cannot remember what it is called. You know the shape. You need Option<T> -> T -> T. Text search is useless here because you have no name to search for.
Roogle is a Rust API search engine that solves exactly this. You give it a type signature and it returns functions that match. It is inspired directly by Hoogle, the Haskell tool that has been doing this since 2004, and it works through the same fundamental mechanism: structural type unification over a pre-built index.
The Algorithm at Its Core
Roogle ingests rustdoc’s JSON output, produced with rustdoc -Z unstable-options --output-format json (currently a nightly-only flag). That output is a machine-readable representation of every public item in a crate, complete with resolved type signatures. Roogle builds an index over these items and, at query time, runs approximate unification between your query and each indexed signature.
The query language builds on Rust type syntax, with arrows separating argument types from the return type:
(i32, i32) -> i32
Option<T> -> T -> T
&str -> String
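To make the query language concrete, here is a minimal parser sketch for the three example queries. It is not Roogle’s parser. It assumes that a single uppercase letter (T, A) denotes a type variable, that a parenthesized list in front of an arrow is an argument list, and that lifetimes are absent; the names Ty, Query, and parse_query are illustrative.

```rust
/// Toy type terms for queries. Lifetimes are not part of this grammar.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Var(String),          // type variable: a single uppercase letter such as T (assumption)
    Con(String, Vec<Ty>), // named type with optional generic arguments: i32, Option<T>
    Ref(Box<Ty>),         // shared reference: &str
    Tuple(Vec<Ty>),       // parenthesized list: (i32, i32)
}

/// A query split into argument types and a return type.
#[derive(Debug, PartialEq)]
pub struct Query {
    pub args: Vec<Ty>,
    pub ret: Ty,
}

/// Splits a query into identifiers (paths like std::string::String included) and punctuation.
fn tokenize(src: &str) -> Vec<String> {
    let cs: Vec<char> = src.chars().collect();
    let is_ident = |c: char| c.is_alphanumeric() || c == '_' || c == ':';
    let (mut toks, mut i) = (Vec::new(), 0);
    while i < cs.len() {
        if cs[i] == '-' && cs.get(i + 1) == Some(&'>') {
            toks.push("->".to_string());
            i += 2;
        } else if "<>(),&".contains(cs[i]) {
            toks.push(cs[i].to_string());
            i += 1;
        } else if is_ident(cs[i]) {
            let start = i;
            while i < cs.len() && is_ident(cs[i]) {
                i += 1;
            }
            toks.push(cs[start..i].iter().collect());
        } else {
            i += 1; // whitespace and anything else this toy grammar ignores
        }
    }
    toks
}

struct Parser {
    toks: Vec<String>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&str> {
        self.toks.get(self.pos).map(String::as_str)
    }

    fn eat(&mut self, tok: &str) -> bool {
        let hit = self.peek() == Some(tok);
        if hit {
            self.pos += 1;
        }
        hit
    }

    /// Parses one type: &T, (A, B), Name, or Name<A, B>.
    fn ty(&mut self) -> Result<Ty, String> {
        if self.eat("&") {
            return Ok(Ty::Ref(Box::new(self.ty()?)));
        }
        if self.eat("(") {
            return Ok(Ty::Tuple(self.list(")")?));
        }
        let name = match self.peek() {
            Some(t) if !["<", ">", "(", ")", ",", "->"].contains(&t) => t.to_string(),
            other => return Err(format!("expected a type, found {other:?}")),
        };
        self.pos += 1;
        let args = if self.eat("<") { self.list(">")? } else { Vec::new() };
        let is_var = name.len() == 1 && name.chars().all(|c| c.is_ascii_uppercase());
        Ok(if is_var && args.is_empty() { Ty::Var(name) } else { Ty::Con(name, args) })
    }

    /// Parses a comma-separated list of types up to and including `close`.
    fn list(&mut self, close: &str) -> Result<Vec<Ty>, String> {
        let mut items = Vec::new();
        while !self.eat(close) {
            items.push(self.ty()?);
            if !self.eat(",") && self.peek() != Some(close) {
                return Err(format!("expected ',' or '{close}'"));
            }
        }
        Ok(items)
    }
}

/// Parses `A -> B -> R` or `(A, B) -> R`; the last arrow segment is the return type.
pub fn parse_query(src: &str) -> Result<Query, String> {
    let mut p = Parser { toks: tokenize(src), pos: 0 };
    let mut segments = vec![p.ty()?];
    while p.eat("->") {
        segments.push(p.ty()?);
    }
    if p.pos != p.toks.len() {
        return Err("unexpected trailing input".to_string());
    }
    let ret = segments.pop().expect("at least one segment was parsed");
    let mut args = Vec::new();
    for seg in segments {
        match seg {
            Ty::Tuple(items) => args.extend(items), // "(i32, i32) -> i32": two arguments
            other => args.push(other),              // "Option<T> -> T -> T": curried form
        }
    }
    Ok(Query { args, ret })
}
```

Treating the curried form and the parenthesized form as the same flat argument list is what lets the later matching stages compare a query like Option<T> -> T -> T with a Rust signature such as (Option<A>, A) -> A.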
Unification is the same algorithm at the heart of type inference. Given two type terms, find a substitution mapping type variables to types such that both terms become identical under that substitution. For the query Option<T> -> T -> T, a candidate like Option::unwrap_or with signature (Option<A>, A) -> A unifies successfully: reading both as functions of two arguments, bind T = A, and the structures match.
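Textbook Robinson-style unification over a toy term language fits in a few dozen lines. This is a generic sketch, not Roogle’s code. To keep the term language to two cases it encodes a function type as a constructor named "fn" whose last argument is the return type (an illustrative assumption), and it assumes query and candidate variables have already been renamed apart so they cannot collide.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Var(String),          // type variable
    Con(String, Vec<Ty>), // constructor applied to arguments; "fn" encodes functions
}

pub type Subst = HashMap<String, Ty>;

/// Builds fn(args...) -> ret as Con("fn", [args..., ret]).
pub fn func(mut args: Vec<Ty>, ret: Ty) -> Ty {
    args.push(ret);
    Ty::Con("fn".to_string(), args)
}

/// Applies the substitution, following chains of bound variables.
pub fn resolve(t: &Ty, s: &Subst) -> Ty {
    match t {
        Ty::Var(v) => match s.get(v) {
            Some(bound) => resolve(bound, s),
            None => t.clone(),
        },
        Ty::Con(n, args) => Ty::Con(n.clone(), args.iter().map(|a| resolve(a, s)).collect()),
    }
}

/// Occurs check: refuses bindings like T = Option<T> that would be infinite.
fn occurs(v: &str, t: &Ty) -> bool {
    match t {
        Ty::Var(w) => v == w,
        Ty::Con(_, args) => args.iter().any(|a| occurs(v, a)),
    }
}

/// Extends `s` so that `a` and `b` become identical, or returns false.
/// On failure `s` may hold partial bindings and should be discarded.
pub fn unify(a: &Ty, b: &Ty, s: &mut Subst) -> bool {
    let (a, b) = (resolve(a, s), resolve(b, s));
    match (&a, &b) {
        (Ty::Var(x), Ty::Var(y)) if x == y => true,
        (Ty::Var(x), t) | (t, Ty::Var(x)) => {
            if occurs(x, t) {
                return false;
            }
            s.insert(x.clone(), t.clone());
            true
        }
        (Ty::Con(n1, a1), Ty::Con(n2, a2)) => {
            n1 == n2 && a1.len() == a2.len() && a1.iter().zip(a2).all(|(x, y)| unify(x, y, s))
        }
    }
}
```

Unifying the query fn(Option<T>, T) -> T with fn(Option<A>, A) -> A leaves exactly one binding, T = A, while a candidate with a different arity, such as Option::is_some taking one argument, fails immediately on the length check.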
The classical version of this is Robinson’s unification algorithm from 1965, and it is binary: either unification succeeds or it fails. A search engine needs ranking, so Roogle uses approximate unification with scoring. Matching a concrete type where the query had a generic variable succeeds but with a penalty. Extra arguments in the indexed function that the query did not mention incur a penalty. Exact structural matches score highest. This produces an ordered list of results rather than a flat set.
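The scoring idea can be illustrated with a simplified matcher. It is not full unification and not Roogle’s scorer: query variables and candidate variables are bound in separate maps, and the penalty weights below (2, 1, and 3) are invented for illustration. Candidate arguments the query does not mention may be skipped at a cost, while the query’s own arguments must appear in the same order.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Var(String),
    Con(String, Vec<Ty>),
}

// Illustrative weights; these numbers are assumptions, not Roogle's.
const QUERY_VAR_TO_CONCRETE: u32 = 2; // query wrote T, candidate has a concrete type
const CANDIDATE_VAR: u32 = 1; // candidate is generic where the query is concrete
const EXTRA_ARG: u32 = 3; // candidate takes an argument the query never mentioned

#[derive(Clone, Default)]
struct Bindings {
    query: HashMap<String, Ty>, // query variable -> candidate type
    cand: HashMap<String, Ty>,  // candidate variable -> query type
}

/// Cost of matching one query type against one candidate type, or None on mismatch.
fn match_ty(q: &Ty, c: &Ty, b: &mut Bindings) -> Option<u32> {
    match (q, c) {
        (Ty::Var(x), _) => {
            if let Some(prev) = b.query.get(x) {
                return if prev == c { Some(0) } else { None }; // bindings must be consistent
            }
            b.query.insert(x.clone(), c.clone());
            Some(if matches!(c, Ty::Var(_)) { 0 } else { QUERY_VAR_TO_CONCRETE })
        }
        (_, Ty::Var(y)) => {
            if let Some(prev) = b.cand.get(y) {
                return if prev == q { Some(0) } else { None };
            }
            b.cand.insert(y.clone(), q.clone());
            Some(CANDIDATE_VAR)
        }
        (Ty::Con(n1, a1), Ty::Con(n2, a2)) => {
            if n1 != n2 || a1.len() != a2.len() {
                return None;
            }
            let mut total = 0;
            for (x, y) in a1.iter().zip(a2) {
                total += match_ty(x, y, b)?;
            }
            Some(total)
        }
    }
}

/// Cheapest way to match the query arguments, in order, against a subsequence
/// of the candidate arguments; each skipped candidate argument costs EXTRA_ARG.
fn match_args(q: &[Ty], c: &[Ty], b: &Bindings) -> Option<(u32, Bindings)> {
    match (q.split_first(), c.split_first()) {
        (None, _) => Some((EXTRA_ARG * c.len() as u32, b.clone())),
        (Some(_), None) => None, // the query needs an argument the candidate lacks
        (Some((qh, qt)), Some((ch, ct))) => {
            let mut b1 = b.clone();
            let take = match_ty(qh, ch, &mut b1)
                .and_then(|cost| match_args(qt, ct, &b1).map(|(rest, b2)| (cost + rest, b2)));
            let skip = match_args(q, ct, b).map(|(rest, b2)| (rest + EXTRA_ARG, b2));
            match (take, skip) {
                (Some(t), Some(s)) => Some(if t.0 <= s.0 { t } else { s }),
                (t, s) => t.or(s),
            }
        }
    }
}

/// Scores candidate `c_args -> c_ret` against query `q_args -> q_ret`; lower is better.
pub fn score(q_args: &[Ty], q_ret: &Ty, c_args: &[Ty], c_ret: &Ty) -> Option<u32> {
    let mut b = Bindings::default();
    let ret_cost = match_ty(q_ret, c_ret, &mut b)?;
    match_args(q_args, c_args, &b).map(|(cost, _)| ret_cost + cost)
}
```

Sorting candidates by the returned cost and dropping the None results yields an ordered result list. Matching the return type first and only then aligning arguments is a shortcut; an engine could instead search both jointly.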
Before any matching happens, both queries and indexed types go through normalization. Type variables are alpha-renamed to a canonical sequence so that (x -> y -> x) and (a -> b -> a) are treated identically. Lifetime annotations are stripped entirely. Nobody searching for a function wants to reason about lifetimes during discovery; they can read the signature after they find the function.
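Both normalization steps fit in one tree walk: rename each variable in order of first appearance and drop any lifetime on a reference. The T0, T1, ... naming scheme and the shape of Ty here are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Var(String),
    Con(String, Vec<Ty>),
    Ref(Option<String>, Box<Ty>), // &'a T keeps its lifetime until normalization
    Fn(Vec<Ty>, Box<Ty>),         // argument types and return type
}

/// Alpha-renames variables to T0, T1, ... (first appearance, left to right)
/// and strips lifetimes, so equivalent signatures compare equal.
pub fn normalize(t: &Ty) -> Ty {
    fn go(t: &Ty, names: &mut HashMap<String, String>) -> Ty {
        match t {
            Ty::Var(v) => {
                let next = format!("T{}", names.len());
                Ty::Var(names.entry(v.clone()).or_insert(next).clone())
            }
            Ty::Con(n, args) => Ty::Con(n.clone(), args.iter().map(|a| go(a, names)).collect()),
            Ty::Ref(_, inner) => Ty::Ref(None, Box::new(go(inner, names))),
            Ty::Fn(args, ret) => {
                let args = args.iter().map(|a| go(a, names)).collect();
                Ty::Fn(args, Box::new(go(ret, names)))
            }
        }
    }
    go(t, &mut HashMap::new())
}
```

With this, (x -> y -> x) and (a -> b -> a) both become (T0 -> T1 -> T0), while (a -> b -> b) stays distinct, and &'a str and &str normalize to the same term.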
For large corpora, running full unification on every indexed item is too slow. The practical approach is a two-phase filter: first compute a cheap fingerprint for both query and candidate (the set of type constructor names appearing in the signature), and prune any candidate whose fingerprint has no overlap with the query fingerprint. Only surviving candidates go through full unification. If the index also maps each constructor name to the items that mention it (an inverted index), the pruning step itself no longer scans the corpus, so search time scales roughly with the number of candidates sharing a type constructor with the query rather than with the size of the corpus.
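A sketch of the fingerprint filter backed by an inverted index, under assumed simplifications: a signature is a flat list of argument types followed by the return type, references are ignored, and the three indexed signatures are simplified stand-ins for the real std ones. Index and candidates are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Var(String),
    Con(String, Vec<Ty>),
}

/// Collects every type constructor name in `t`; variables contribute nothing.
fn fingerprint(t: &Ty, out: &mut BTreeSet<String>) {
    if let Ty::Con(name, args) = t {
        out.insert(name.clone());
        for a in args {
            fingerprint(a, out);
        }
    }
}

fn fingerprint_sig(sig: &[Ty]) -> BTreeSet<String> {
    let mut fp = BTreeSet::new();
    for t in sig {
        fingerprint(t, &mut fp);
    }
    fp
}

pub struct Index {
    items: Vec<(String, Vec<Ty>)>,         // (path, argument types then return type)
    postings: HashMap<String, Vec<usize>>, // constructor name -> ids of items mentioning it
}

impl Index {
    pub fn new(items: Vec<(String, Vec<Ty>)>) -> Self {
        let mut postings: HashMap<String, Vec<usize>> = HashMap::new();
        for (id, (_, sig)) in items.iter().enumerate() {
            for name in fingerprint_sig(sig) {
                postings.entry(name).or_default().push(id);
            }
        }
        Index { items, postings }
    }

    /// Paths of items sharing at least one constructor with the query.
    /// Only these would go on to full unification.
    pub fn candidates(&self, query: &[Ty]) -> Vec<&str> {
        let fp = fingerprint_sig(query);
        let ids: BTreeSet<usize> = if fp.is_empty() {
            (0..self.items.len()).collect() // all-variable query: nothing to prune on
        } else {
            fp.iter().filter_map(|n| self.postings.get(n)).flatten().copied().collect()
        };
        ids.into_iter().map(|i| self.items[i].0.as_str()).collect()
    }
}
```

Two trade-offs show up even in this sketch. A query made only of type variables has an empty fingerprint, so nothing is pruned. Conversely, a fully generic item such as std::convert::identity (T -> T) has an empty fingerprint and is never returned for a query that names a concrete type, so a real index would need a separate path for such items.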
What Hoogle Solved First
Hoogle has indexed Hackage, which holds well over ten thousand Haskell packages, for many years. Neil Mitchell’s original work on the tool includes one feature that makes it particularly useful: argument permutation. When searching, Hoogle tries all orderings of the query arguments against the candidate’s parameter list. Haskell functions are curried, so argument order matters in the type, but callers often misremember it. Trying all permutations recovers results that would otherwise be missed. Since practical function arity is almost always five or fewer, the factorial cost is negligible: five arguments give at most 5! = 120 orderings.
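The permutation trick can be retrofitted onto any matcher. The sketch below is hypothetical, not Hoogle’s implementation: plain string equality stands in for unification, and find_ordering tries every ordering of the query arguments until one lines up with the candidate’s parameters. The usage example uses Haskell’s take :: Int -> [a] -> [a] with its arguments remembered in the wrong order.

```rust
/// All orderings of `items` (n! of them; fine for the small arities seen in practice).
fn permutations<T: Clone>(items: &[T]) -> Vec<Vec<T>> {
    if items.len() <= 1 {
        return vec![items.to_vec()];
    }
    let mut out = Vec::new();
    for i in 0..items.len() {
        let mut rest = items.to_vec();
        let head = rest.remove(i);
        for mut tail in permutations(&rest) {
            tail.insert(0, head.clone());
            out.push(tail);
        }
    }
    out
}

/// Returns the first ordering of the query arguments that matches the candidate's
/// parameters position by position. `same` stands in for real unification.
fn find_ordering<'a>(
    query: &[&'a str],
    params: &[&str],
    same: impl Fn(&str, &str) -> bool,
) -> Option<Vec<&'a str>> {
    if query.len() != params.len() {
        return None;
    }
    permutations(query)
        .into_iter()
        .find(|perm| perm.iter().zip(params).all(|(q, p)| same(*q, *p)))
}
```

For the query [a] -> Int -> [a] against take’s parameters (Int, [a]), the identity ordering fails and the swapped one succeeds, so the function is still found.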
Hoogle also has two decades of refinement in its ranking. It weights results from base (the standard library) higher than those from obscure packages, factors in package popularity, and handles type class contexts gracefully. This kind of ranking work is unglamorous but critical for the tool to feel useful rather than overwhelming.
Where Roogle Sits in Rust’s Tooling Landscape
Rust already has several documentation and search tools, and understanding where Roogle fits requires knowing what those tools do and do not do.
docs.rs runs rustdoc on every published crate and serves the HTML output. Since around Rust 1.74, rustdoc’s embedded JavaScript search engine gained basic type-directed search capability. You can open any crate’s documentation and type a query like -> Option<String> to find functions returning that type. This is real type search, implemented in JavaScript against the search-index.js file that rustdoc emits alongside the HTML. It works well for exploring the crate you are already reading, but on docs.rs the index covers only that crate (a local cargo doc build searches the crate together with its documented dependencies), and there is no hosted index spanning all of crates.io.
rust-analyzer is the Rust language server. Its workspace symbol search (for example, the # prefix in VS Code’s Quick Open) is text-based over symbol names. rust-analyzer also provides type inference, inlay hints, and go-to-definition, all of which are enormously useful for code you are already writing. What it does not do is help you discover unknown functions by the shape of their type. That is a different problem: API discovery rather than code navigation.
Roogle targets the gap between these two. It is not a language server; it is an offline search tool that you run against a pre-built index. The current implementation indexes the Rust standard library. There is no publicly hosted instance over all of crates.io, which is the main thing standing between Roogle and being as broadly useful as Hoogle.
Other Languages That Solved This
The Haskell community showed the approach works. The PureScript community built Pursuit, which is a hosted Hoogle-inspired search over all PureScript packages. It uses the same core algorithm, and because PureScript’s type system is close to Haskell’s, the implementation transferred with minimal adaptation.
The OCaml community has Sherlodoc, presented at the OCaml Workshop 2023. It builds on odoc (OCaml’s standard documentation tool) and supports type-based search using a trie-indexed fingerprint structure for fast candidate pruning. OCaml presents harder matching problems than Haskell or Rust because of polymorphic variants and structural subtyping; Sherlodoc handles these by specializing functor-polymorphic types to their concrete instantiations where possible. A hosted instance exists.
Dependently typed languages like Idris and Agda have type-directed search as a first-class feature of their tooling: the search and auto tactics perform proof search, that is, they look for a term inhabiting the requested type. When the type system is expressive enough to state what a function must do, that is the same thing as type-directed function search. In those languages, the compiler is the search engine.
Notably absent from this list: Go, TypeScript, and Swift. Go’s type system is deliberately simple and structural, and the discoverability problem is less acute because the ecosystem is smaller and more uniform. TypeScript’s structural type system with union types and conditional types makes precise type unification theoretically much harder. Swift lacks a hosted type-search tool despite having a rich type system.
The Infrastructure That Makes It Possible
Roogle would not be practical without rustdoc’s JSON output format. Parsing Rust source code to extract type signatures is genuinely hard: macros, re-exports, type aliases, trait implementations, and conditional compilation all make a naive parser inadequate. Rustdoc’s JSON output sidesteps most of this, because rustdoc runs after macro expansion, cfg evaluation, and name resolution. It gives you every public item with its resolved type signature, with cross-crate paths pointing at canonical item IDs, so std::option::Option and core::option::Option (the former is a re-export of the latter) appear as the same thing.
The JSON format has been unstable since its introduction and still sits behind -Z unstable-options on nightly, which makes building reliable downstream tools on it risky. Stabilization work is ongoing, and the format is now explicitly versioned: each output carries a format_version number that changes whenever the schema does, so a tool can at least detect output it does not understand. That matters for projects like Roogle: if the format changes between Rust releases, the tool’s index becomes stale or the parser breaks.
What Still Needs to Happen
Roogle as it stands is an excellent proof of concept and a correct implementation of the core algorithm. The gap between it and being a daily-use tool is primarily infrastructure, not algorithms.
Building and hosting an index over all published crates on crates.io requires running rustdoc on each crate (or each version), storing the JSON output, indexing the extracted signatures, and serving queries at scale. This is the equivalent of what Hackage provides for Hoogle. It is an engineering and hosting problem, not a research problem.
Trait bounds present another refinement opportunity. When a function is generic with a bound like T: Display, that constraint is semantically meaningful for search: a user looking for T -> String probably wants something like ToString::to_string, which every T: Display gets through a blanket impl, and knowing that T must implement Display helps distinguish between otherwise similar signatures. Full handling of where clauses and trait bounds in unification would improve precision.
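One simple way to fold bounds into ranking, sketched here as an assumption rather than anything Roogle does today: when a query variable is bound to a candidate’s generic parameter, charge a penalty for every bound the candidate requires that the query did not promise. In the usage example, debug_string is a hypothetical candidate bounded by Debug, and to_string stands for ToString::to_string, available for every T: Display through the standard library’s blanket impl.

```rust
use std::collections::BTreeSet;

/// Trait bounds on one generic parameter, e.g. {"Display"} for `T: Display`.
pub type Bounds<'a> = BTreeSet<&'a str>;

/// Penalty for binding a query variable to a candidate's generic parameter.
/// Each bound the candidate requires but the query did not promise is a bound
/// the user's type may not satisfy. Supertraits (Ord implies PartialOrd, ...)
/// are deliberately ignored in this sketch.
pub fn bound_penalty<'a>(query: &Bounds<'a>, cand: &Bounds<'a>, per_missing: u32) -> u32 {
    cand.difference(query).count() as u32 * per_missing
}

/// Orders candidates (name, bounds on their generic parameter) by bound penalty,
/// breaking ties by name.
pub fn rank<'a>(query: &Bounds<'a>, cands: &[(&'a str, Bounds<'a>)]) -> Vec<(&'a str, u32)> {
    let mut scored: Vec<(&'a str, u32)> = cands
        .iter()
        .map(|(name, bounds)| (*name, bound_penalty(query, bounds, 1)))
        .collect();
    scored.sort_by_key(|&(name, cost)| (cost, name));
    scored
}
```

For a query T: Display, T -> String, to_string costs nothing while debug_string pays for its unmet Debug bound, so the two otherwise identical signatures T -> String are separated.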
The combination of Roogle-style cross-crate type search with the in-page type search rustdoc already provides would cover most of the discoverability problem. Rustdoc can handle intra-crate queries at documentation-read time; a hosted Roogle-style service would handle the cross-crate discovery problem at development time.
The underlying idea is two decades old, proven in Haskell, replicated in PureScript and OCaml, and now arriving in Rust. The algorithm is well-understood. The remaining work is building the index and keeping it current. That is a solvable problem, and the Rust ecosystem is better positioned to solve it now than it was a few years ago, because the rustdoc JSON format that makes it tractable is now versioned and mature enough for downstream tooling to build on, even ahead of full stabilization.