Speed

RedHop runs in your process over a Rust core (no network round-trip, no service to call), so the numbers below are dominated by real work (parsing, indexing, scoring), not overhead. All measurements are CPU-only on a single machine with a warm index. Absolute milliseconds drift ~10–15% run-to-run, so read the shape, not the last digit.
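Because run-to-run drift is in the 10–15% range, a single timing is not very informative. The sketch below shows one generic way to collect warm per-query latencies and summarize them with a median and a 95th percentile. It is not the harness behind the numbers on this page: `run_query` is a hypothetical stand-in for whatever callable issues one query, and the warm-up and repeat counts are arbitrary assumptions.

```python
import statistics
import time


def time_warm_queries(run_query, queries, warmup=3, repeats=20):
    """Summarize warm per-query latency in milliseconds.

    `run_query` is a hypothetical callable that executes one query;
    it is an assumption for illustration, not a RedHop API.
    """
    # Discard a few warm-up calls so caches and lazy init don't skew results.
    for q in queries[:warmup]:
        run_query(q)

    samples_ms = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": p95,
        "samples": len(samples_ms),
    }
```

Reporting the median rather than the mean keeps one slow outlier from moving the headline number, which matters when the quantity being measured is only a few milliseconds.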

Lexical (the default): instant

The default lexical tier (BM25) needs no model and no embedding step, so a document is queryable almost immediately:

`contract.pdf` path, ~189k tokens	RedHop (BM25)
time to first answer	0.02s
warm per-query	~1ms

Each query also prunes to budget and emits a Decision Report, so it does more than a bare retriever and still answers in about a millisecond.

Reproduce: cargo run -p redhop-examples --example eval_cuad_documents --release
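To illustrate why the lexical tier needs no model, here is a minimal sketch of textbook Okapi BM25 in plain Python. It is not RedHop's Rust implementation: the whitespace tokenizer, the `k1` and `b` values, and the helper names `build_index` and `bm25_scores` are all assumptions chosen for illustration. The point is that indexing is only counting, so there is nothing to embed before the first query.

```python
import math
from collections import Counter


def build_index(chunks):
    """Collect the counts BM25 needs. Assumes a non-empty list of chunks."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [chunk.lower().split() for chunk in chunks]
    term_freqs = [Counter(tokens) for tokens in tokenized]
    doc_freq = Counter()
    for tf in term_freqs:
        doc_freq.update(tf.keys())  # number of chunks containing each term
    lengths = [len(tokens) for tokens in tokenized]
    return {
        "tf": term_freqs,
        "df": doc_freq,
        "lens": lengths,
        "avg_len": sum(lengths) / len(lengths),
    }


def bm25_scores(index, query, k1=1.5, b=0.75):
    """Score every chunk against the query with standard Okapi BM25."""
    n_docs = len(index["tf"])
    scores = [0.0] * n_docs
    for term in query.lower().split():
        df = index["df"].get(term, 0)
        if df == 0:
            continue  # term appears nowhere, contributes nothing
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        for i, tf in enumerate(index["tf"]):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: long chunks are penalized relative to average.
            norm = 1.0 - b + b * index["lens"][i] / index["avg_len"]
            scores[i] += idf * f * (k1 + 1.0) / (f + k1 * norm)
    return scores


chunks = [
    "the term of this agreement is five years",
    "either party may terminate with notice",
    "governing law is the state of delaware",
]
index = build_index(chunks)
scores = bm25_scores(index, "terminate notice")
best = max(range(len(chunks)), key=scores.__getitem__)  # index of top chunk
```

This sketch scans every chunk per query term for readability. A production index would typically use an inverted index so that only chunks containing a query term are touched.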

Semantic: a one-time cost, then fast forever

The opt-in semantic / hybrid tiers embed your chunks once (cached), then score every query by exact cosine over those cached vectors. So the cost is paid once at setup, and every later query is fast:

corpus	embed-all (one-time setup)	warm per-query
~13k tokens (1 contract)	~2s	~6ms
~38k tokens (5 contracts)	~7s	~6ms
~189k tokens (15 contracts)	~17s	~6ms

Warm queries land at ~6ms: the query embedding dominates, and exact cosine over the cached vectors is cheap. The only real cost is embedding everything up front, and you pay it only if you opt into a dense tier. The lexical default skips it entirely. With from_folder(persist=True) the embeddings are written to disk, so the embed-all is paid once and the cached embeddings are reloaded on every later run.

Reproduce: bench/.venv/bin/python bench/speed_compare.py
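"Exact cosine over cached vectors" means a brute-force comparison with no approximate index: one dot product per cached chunk. The sketch below shows the idea in plain Python under stated assumptions. The function names `normalize` and `top_k_cosine` are hypothetical, the toy 2-dimensional vectors stand in for real embeddings, and RedHop's actual scoring code may differ.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_k_cosine(query_vec, cached_unit_vecs, k=3):
    """Return (chunk_index, cosine) for the k most similar cached vectors.

    Assumes the cached vectors were normalized once at embed time, so the
    cosine similarity reduces to a plain dot product per chunk.
    """
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), i)
        for i, vec in enumerate(cached_unit_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy stand-ins for chunk embeddings, normalized once and then reused.
cached = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
hits = top_k_cosine([2.0, 0.0], cached, k=2)
```

Normalizing at embed time is a design choice that moves work into the one-time setup, which matches the cost profile described above: the per-query work is the query embedding plus one pass of dot products.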

Latency stays flat as documents grow

The most important property for interactive use: per-query time barely moves as the document gets bigger. In the runs below, warm BM25 lookups stay at ~2ms from 1,000 to 4,000 pages, so a 4,000-page PDF answers about as fast as a 1-page one once it’s loaded. Time-to-first-answer is dominated by parsing the PDF (roughly 2.3–2.9ms/page in the table below, close to linear), with chunking, indexing, and the query negligible on top:

Pages	Chunks	Time to first answer	Warm query
1,000	1,000	2.3s	~2ms
2,000	2,000	5.0s	~2ms
4,000	4,000	11.5s	~2ms

A document thousands of pages long is fully interactive after its one-time load. (Adding the semantic tier adds the embed-all, ~11s per 1,000 chunks, which persist=True makes a one-time cost.) These numbers were measured on synthetic PDFs via from_file on the lexical default. They are a latency measurement (parse + index + query), not an answer-quality one.

Reproduce: bench/.venv/bin/python bench/large_pdf.py · bench/.venv/bin/python bench/large_pdf.py --semantic
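The per-page load cost can be read directly off the table above. This small calculation uses only the three published rows. The helper name `ms_per_page` is introduced here for illustration.

```python
# (pages, time to first answer in seconds), copied from the table above.
rows = [(1000, 2.3), (2000, 5.0), (4000, 11.5)]


def ms_per_page(pages, seconds):
    """Average load cost per page, in milliseconds."""
    return seconds * 1000.0 / pages


# One value per table row, rounded to one decimal place.
per_page = [round(ms_per_page(pages, seconds), 1) for pages, seconds in rows]
print(per_page)  # roughly 2.3, 2.5, and 2.9 ms/page
```

The per-page cost creeps up slightly with document size, so load time is close to linear rather than exactly linear across this range, while the warm query column does not move.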

Speed is one axis. Answer quality and evidence retention are the other. Those, with the head-to-head against LangChain and LlamaIndex, live on the Benchmarks page.