Overview
RedHop is a reasoning-preserving context runtime. It sits between your documents and an LLM: you hand it text and a question, and it returns the context the model should actually see — chunking, retrieving, and allocating internally, and explaining what it did.
The one idea
Retrieval quality is not the same thing as reasoning quality. Transformers tolerate irrelevant context better than they tolerate missing reasoning links.
So RedHop optimizes for keeping the evidence a question actually needs, and it only intervenes when intervention is measured to help — large, diluted contexts get pruned; small ones are left alone.
What it owns
- Loading + parsing — text, PDF/DOCX/PPTX/XLSX, and whole folders (with citations)
- Chunking
- Internal retrieval (lexical by default; optional dense retrieval — see below)
- Context allocation under a token budget
- Reasoning-safe, conditional optimization
- Observability and token economics
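Context allocation under a token budget can be pictured as packing ranked chunks until the budget runs out. The sketch below is a minimal greedy version that assumes a whitespace token count and uses hypothetical `Chunk`/`allocate` names. Its "already fits, leave it alone" check is only a crude stand-in for RedHop's measured decision about when to intervene. RedHop's own allocator and tokenizer are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # retrieval relevance; higher is better


def count_tokens(text: str) -> int:
    # Crude stand-in tokenizer: one token per whitespace-separated word.
    return len(text.split())


def allocate(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Pick chunks to send to the model without exceeding `budget` tokens."""
    if sum(count_tokens(c.text) for c in chunks) <= budget:
        # Small context that already fits: leave it untouched, in original order.
        return list(chunks)
    selected: list[Chunk] = []
    used = 0
    # Otherwise greedily keep the highest-scoring chunks that still fit.
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected
```

When a chunk does not fit, the loop skips it and keeps going rather than stopping. That lets shorter, lower-ranked chunks use the leftover space.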
Retrieval is a ladder — start cheap, climb only when you must
Begin at the cheapest rung that works, and step up only when your queries demand it.
| Rung (`retrieval=`) | Dependency | Reach for it when |
|---|---|---|
| `"lexical"`: BM25 (default) | None: no model, fully offline | the answer shares words with the query (most document QA) |
| `"hybrid"`: BM25 prune → dense rerank | an embedding model name (`model="bge-small"` auto-downloads) | semantic search over many files or a folder |
| `"semantic"`: global dense | same as `"hybrid"` | highest recall: scores every chunk by meaning |
1 · Lexical handles most document QA. The documents you reason over are often keyword-dense — contracts, API references, specs, manuals, logs, legal docs — where the words in the question are the words in the answer. BM25 handles those with zero model, zero infra, indexing in milliseconds. Most document QA starts and ends here.
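For readers who have not seen BM25 written out, here is a minimal in-memory version over a tiny inverted index. This is a textbook Okapi BM25 sketch, not RedHop's engine, whose analyzers and parameters are not documented here. It assumes k1 = 1.2, b = 0.75, lowercase whitespace tokenization, and a common non-negative IDF variant.

```python
import math
from collections import Counter, defaultdict


class BM25Index:
    """Textbook Okapi BM25 over an in-memory inverted index (illustrative only)."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc.lower().split()) for doc in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lengths) / len(docs)
        # Inverted index: term -> ids of the documents containing it.
        self.postings: dict[str, set[int]] = defaultdict(set)
        for doc_id, tf in enumerate(self.tfs):
            for term in tf:
                self.postings[term].add(doc_id)

    def idf(self, term: str) -> float:
        n = len(self.tfs)
        df = len(self.postings.get(term, ()))
        # Non-negative IDF: rare terms weigh more, common terms never go below 0.
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        scores: dict[int, float] = defaultdict(float)
        for term in set(query.lower().split()):
            idf = self.idf(term)
            # Only documents in this term's posting list can score for it.
            for doc_id in self.postings.get(term, ()):
                tf = self.tfs[doc_id][term]
                norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avg_len)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

A query that shares no words with any document returns nothing at all. That is the gap the dense tiers exist to close.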
2 · Semantic, when words stop matching meaning. There are two tiers:
- "hybrid" re-ranks a BM25 candidate pool by meaning. It embeds only that pool per query, so it scales to a whole folder.
- "semantic" ranks every chunk by meaning, for the highest recall when the question and the answer share no words.
You only name an embedding model. How each tier actually ranks (cosine, fusion, and optional reranking) is covered in How the search works below.
→ Retrieval options — the tiers, how to enable them, and what each asks of you.
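The hybrid tier's two-stage shape fits in a few lines. A cheap lexical score picks a candidate pool, and only that pool (plus the query) is embedded and reordered by cosine similarity. This is an illustration under stated assumptions. Plain term overlap stands in for BM25. `toy_embed` is a hypothetical hashed bag-of-words embedder that stands in for a real model such as `bge-small` and carries no semantic knowledge. The `pool_size=50` default mirrors the documented 50-chunk pool.

```python
import math
import zlib


def lexical_score(query: str, text: str) -> int:
    # Stand-in for BM25: number of query terms that appear in the text.
    return len(set(query.lower().split()) & set(text.lower().split()))


def toy_embed(text: str, dim: int = 64) -> list[float]:
    # Hypothetical embedder: hashed bag of words (crc32 keeps it deterministic).
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, chunks, pool_size=50, k=5, embed=toy_embed):
    # Stage 1: lexical prune of the whole corpus down to a candidate pool.
    pool = sorted(chunks, key=lambda c: lexical_score(query, c), reverse=True)[:pool_size]
    # Stage 2: embed only the query and the pool, then reorder by cosine.
    q_vec = embed(query)
    reranked = sorted(pool, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return reranked[:k]
```

The embedder runs `pool_size + 1` times per query no matter how large the corpus is. That is the property that lets this tier scale to a whole folder.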
How the search works
- Lexical (`lexical`) ranks by BM25: classic term-frequency scoring over an in-process inverted index. No model, no embeddings.
- Hybrid (`hybrid`) runs two stages. BM25 narrows the corpus to a candidate pool (default 50 chunks), then a local embedding model encodes that pool and the query and reorders the pool by cosine similarity. Only the pool is embedded per query.
- Semantic (`semantic`) encodes every chunk once, caches the vectors, and ranks them all by exact cosine against the query. This is a brute-force scan (no approximate-nearest-neighbour index), so at typical corpus sizes the per-query cost is dominated by embedding the query rather than by the scan itself.
- Mixed corpora (code + prose) under `hybrid` are merged with reciprocal rank fusion. Code is ranked lexically (exact identifiers matter, and general embedders are weak on code), prose by cosine, and the two ranked lists are fused.
- Optional cross-encoder rerank (`rerank="cross-encoder"`) adds a precise second stage on any tier. It jointly encodes each `(query, passage)` pair and reorders the pool. This is more accurate than cosine, at the cost of one model call per candidate.
You supply an embedding model only for the dense tiers — named once and auto-downloaded on first use; the lexical default needs none.
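Reciprocal rank fusion, used above to merge the lexical code ranking with the cosine prose ranking, needs nothing but each list's ranks. A minimal sketch follows. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here, not a documented RedHop setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


code_ranked = ["parse_config", "load_file", "README"]   # lexical order
prose_ranked = ["README", "parse_config", "guide"]      # cosine order
fused = reciprocal_rank_fusion([code_ranked, prose_ranked])
```

Only ranks enter the formula, so a BM25 score and a cosine similarity never have to be put on the same scale.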
The mental model
```python
import redhop

doc = redhop.Document.from_text(text)  # documents
ctx = doc.context(query)               # + queries → context
```

```js
const doc = Document.fromText(text);  // documents
const ctx = doc.context(query);       // + queries → context
```

```rust
let mut doc = redhop::Document::from_text("doc", text)?;  // documents
let ctx = doc.context(query)?;                            // + queries → context
```

You think in documents and queries. Retrieval is an implementation detail.
Under the hood
Under the Python surface, RedHop is a Rust library for retrieval infrastructure: chunking, retrieval, reranking, and diagnostics. It does not generate text or bundle an embedding model — embedding plugs in through a trait boundary. RedHop’s contribution is the orchestration between these stages and the diagnostics engine that makes retrieval quality observable from text alone.
Layering
RedHop ships as a single redhop crate (one Python wheel, one npm package, one
Cargo crate). Internally it is organized as modules; each layer above the core
depends only on the trait surface below it, not on sibling implementations.
```text
redhop                single published crate
├── document          high-level façade (Document, read_file, …)
├── context           budget-aware assembly + Decision Report
├── chunking
├── retrieval
├── reranking         (under feature "semantic")
├── embeddings        (under feature "semantic")
├── files             (under feature "files")
└── core              traits + types
```
The trait surface
redhop::core (re-exported as redhop::traits) defines the pluggable
abstractions — the entire contract a caller has to understand:
| Trait | Owns |
|---|---|
| `TokenizerBackend` | Token counting, sentence segmentation, truncation. |
| `Chunker` | `Document` → `Vec<Chunk>`. |
| `EmbeddingProvider` | `&[String]` → `Vec<Embedding>`. |
| `Retriever` | `Query` → `Vec<RetrievalResult>`, plus ingest. |
| `Reranker` | Reorder candidate results. |
| `DiagnosticsEngine` | `(Query, &[RetrievalResult])` → `DiagnosticsReport`. |
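To make the contract concrete for Python readers, here is a rough analogue of two of these traits written as `typing.Protocol` classes, plus toy implementations wired together. The Rust traits above are the real interface. The method names and signatures below, and the `WordChunker`/`OverlapRetriever` classes, are hypothetical and only mirror the shapes in the table.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Chunk:
    id: int
    text: str


@dataclass
class RetrievalResult:
    chunk: Chunk
    score: float


class Chunker(Protocol):
    def chunk(self, document: str) -> list[Chunk]: ...


class Retriever(Protocol):
    def index(self, chunks: list[Chunk]) -> None: ...
    def retrieve(self, query: str, k: int) -> list[RetrievalResult]: ...


class WordChunker:
    """Toy chunker (hypothetical): fixed windows of `size` words."""

    def __init__(self, size: int = 4):
        self.size = size

    def chunk(self, document: str) -> list[Chunk]:
        words = document.split()
        return [
            Chunk(i // self.size, " ".join(words[i : i + self.size]))
            for i in range(0, len(words), self.size)
        ]


class OverlapRetriever:
    """Toy retriever (hypothetical): scores chunks by words shared with the query."""

    def __init__(self):
        self.chunks: list[Chunk] = []

    def index(self, chunks: list[Chunk]) -> None:
        self.chunks = list(chunks)

    def retrieve(self, query: str, k: int) -> list[RetrievalResult]:
        terms = set(query.lower().split())
        results = [
            RetrievalResult(c, float(len(terms & set(c.text.lower().split()))))
            for c in self.chunks
        ]
        return sorted(results, key=lambda r: r.score, reverse=True)[:k]


def pipeline(document: str, query: str, chunker: Chunker, retriever: Retriever, k: int = 2):
    # Each stage is typed against a protocol, never against a concrete sibling class.
    retriever.index(chunker.chunk(document))
    return retriever.retrieve(query, k)
```

Swapping `WordChunker` for any other object with a matching `chunk` method requires no change to `pipeline`. That is the layering rule stated above, in miniature.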
Data flow
```text
Document(s)
  → chunker.chunk_batch          → Vec<Chunk> (optionally + Embedding)
  → retriever.index              → [state]
  → retriever.retrieve(q, k)     → Vec<RetrievalResult> (score + ScoreBreakdown)
  → reranker.rerank (optional)   → reordered top_k
  → diagnostics.diagnose         → DiagnosticsReport
```
Hybrid retrieval fans the query out to several sub-retrievers in parallel and fuses them with Reciprocal Rank Fusion by default. RRF is rank-based and scale-free, which makes it the right pick for heterogeneous score distributions. Weighted-sum fusion with min-max normalization is available when scores are commensurable.
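The weighted-sum alternative mentioned above rescales each retriever's scores to [0, 1] before mixing them. A minimal sketch follows. The function names and the 0.5/0.5 weights are illustrative assumptions, not RedHop defaults.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores to [0, 1]; a constant run maps to all ones."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_sum_fusion(runs: list[dict[str, float]], weights: list[float]) -> list[tuple[str, float]]:
    fused: dict[str, float] = {}
    for run, weight in zip(runs, weights):
        for doc, s in min_max(run).items():
            # A document missing from a run simply gets nothing from that run.
            fused[doc] = fused.get(doc, 0.0) + weight * s
    # Highest fused score first; ties broken by id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}
dense = {"a": 0.70, "b": 0.90, "d": 0.50}
fused = weighted_sum_fusion([bm25, dense], [0.5, 0.5])
```

Min-max depends on each run's extremes, so one outlier score can squash the rest of that run toward 0. That sensitivity is one reason the rank-based default is safer when score distributions differ.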
Why these choices
Embeddings aren’t bundled. Forcing one model into the library ties users to a
single quality/latency/cost point and pulls in heavy runtime dependencies. The
EmbeddingProvider trait is async and batch-friendly, so any backend plugs in
cleanly — and the default Document path needs none.
Diagnostics are first-class. Retrieval failure modes are observable from text
alone — you don’t need the LLM to know you served it a context full of
distractors. The engine computes its metrics on every query with no model
dependence, and emits machine-readable warning codes (low_lexical_grounding,
high_distractor_ratio, retrieval_saturated) for monitoring and adaptive
routing.
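To give a feel for what "observable from text alone" can mean, here is a hypothetical sketch of two model-free signals and the warning codes they might raise. Only the codes `low_lexical_grounding` and `high_distractor_ratio` come from this page. The metric definitions and the 0.3/0.5 thresholds are assumptions for illustration, not RedHop's formulas, and `retrieval_saturated` is not modelled.

```python
def terms(text: str) -> set[str]:
    return set(text.lower().split())


def diagnose(query: str, chunks: list[str],
             min_grounding: float = 0.3, max_distractors: float = 0.5) -> dict:
    """Hypothetical model-free diagnostics over the retrieved chunks."""
    q = terms(query)
    context_terms: set[str] = set()
    for c in chunks:
        context_terms |= terms(c)
    # Grounding: share of query terms that appear anywhere in the context.
    grounding = len(q & context_terms) / len(q) if q else 0.0
    # Distractor: a retrieved chunk that shares no term with the query.
    distractors = sum(1 for c in chunks if not terms(c) & q)
    distractor_ratio = distractors / len(chunks) if chunks else 0.0
    warnings = []
    if grounding < min_grounding:
        warnings.append("low_lexical_grounding")
    if distractor_ratio > max_distractors:
        warnings.append("high_distractor_ratio")
    return {"grounding": grounding, "distractor_ratio": distractor_ratio, "warnings": warnings}
```

Nothing here calls a model, so checks of this kind can run on every query and feed monitoring or routing directly.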
Chunking is core. Chunk boundaries determine evidence density and topical
purity, which dominate the metrics that matter — and chunk granularity is the
measured lever (see benchmarks). AdaptiveChunker is the
long-term home for evidence-aware chunking; today it pairs sentence segmentation
with a Jaccard cohesion gate.
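Sentence segmentation plus a Jaccard cohesion gate can be sketched as follows. Split the text into sentences, then open a new chunk whenever the next sentence's word set overlaps too little with the current chunk's vocabulary. The regex sentence splitter, the 0.1 threshold, and the `max_sentences` cap are assumptions. `AdaptiveChunker`'s real segmentation and parameters may differ.

```python
import re


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def cohesive_chunks(text: str, threshold: float = 0.1, max_sentences: int = 5) -> list[str]:
    # Naive segmentation: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[list[str]] = []
    current_words: set[str] = set()
    for sentence in sentences:
        sw = words(sentence)
        cohesive = bool(chunks) and jaccard(current_words, sw) >= threshold
        if cohesive and len(chunks[-1]) < max_sentences:
            # Enough overlap with the running chunk: extend it.
            chunks[-1].append(sentence)
            current_words |= sw
        else:
            # Low overlap (or a full chunk): start a new chunk at this boundary.
            chunks.append([sentence])
            current_words = set(sw)
    return [" ".join(c) for c in chunks]
```

This version compares each sentence against the whole running chunk rather than only the previous sentence. That is one reasonable choice among several, and it makes the gate stricter as a chunk grows.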
A proven BM25 engine. Lexical retrieval is a solved engineering problem: production analyzers, fast
scoring, an in-memory index for embeddable use. We build on a mature engine behind
the Retriever trait rather than reinventing it.
Exact cosine, in process. Dense retrieval scores the query against locally computed embeddings with exact cosine — no ANN, no external index. Correct by construction, and it keeps the whole pipeline embeddable with nothing to operate.
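Exact in-process cosine is simple enough to show directly. Normalize every chunk vector once and cache the result, then score each query against all of them with plain dot products. The sketch takes precomputed vectors, leaves the embedding model out of scope, and uses made-up toy vectors.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class ExactCosineIndex:
    """Brute-force exact cosine search: no ANN structure, every chunk is scored."""

    def __init__(self, vectors: list[list[float]]):
        # Normalize once and cache, so each query needs only dot products.
        self.unit = [normalize(v) for v in vectors]

    def search(self, query_vec: list[float], k: int = 3) -> list[tuple[int, float]]:
        q = normalize(query_vec)
        scores = [(i, sum(a * b for a, b in zip(q, u))) for i, u in enumerate(self.unit)]
        return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


index = ExactCosineIndex([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ranked = index.search([1.0, 0.1])
```

Because every vector is scored, the ranking is exactly the full cosine sort with no recall loss. The trade is scan cost that grows linearly with the number of chunks.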
What we explicitly avoided
- Fake-AI boundary detection in chunking — a conservative lexical-cohesion gate ships today; the rest is roadmapped, not faked.
- Speculative topology / knowledge-graph retrieval / semantic-continuity heuristics — research, not infrastructure.
- LLM integrations — once retrieval returns, RedHop is done; what comes after is the caller’s problem.
Where to go next
To recap: a from_text → context surface, a lexical default that needs no model,
and a diagnostics engine on every call. Only dense retrieval asks for a dependency;
it covers the semantic and paraphrase queries BM25 misses. It’s a real
trade-off (a one-time embed cost and a model download), so it gets its own page:
→ Retrieval options — when to reach for dense retrieval, how to enable it, and exactly what it asks of you.
Next: Retrieval options · retrieval & context tips · how RedHop compares to LangChain and LlamaIndex.