
Make RAG easy — a retrieval & context library

The context layer between your documents and the LLM.

RedHop makes RAG easy. Hand it your documents and a question, and it pulls just the sections that matter, hands them to your LLM, and explains every decision. Python, Node, and Rust over a Rust core — chunking, retrieval, and token-budgeting run in-process in milliseconds, with nothing to wire and no services to run.


import redhop
from openai import OpenAI

query = "What is the governing law?"

doc = redhop.Document.from_file("contract.pdf")   # parsed + indexed
ctx = doc.context(query)   # just the sections that matter

resp = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)

Load a doc — or a folder. Ask. Read the decision and the citations off the returned context. That’s the whole surface.

import redhop
# 1 · Load — a single file or a whole directory in one index.
doc = redhop.Document.from_file("contract.pdf")
# doc = redhop.Document.from_folder("./policies") # multi-file → same Document
# 2 · Ask — chunking, retrieval, token-budgeting happen in-process.
ctx = doc.context("What is the governing law?")
prompt = ctx.text() # hand to any LLM — no lock-in
# 3 · Show your work — provenance and the decision, on the same context object.
for c in ctx.citations:
    print(c["source"], c["page"], c["heading"])
    # contract.pdf 12 9.1 Governing Law
print(ctx.report) # the Decision Report — see below

The Decision Report is the thing you don’t get anywhere else. Every call returns one, stating what it kept, what it dropped, and why it did or did not intervene:

RedHop Decision Report
══════════════════════
Decision: Auto → passthrough (small context, no intervention needed)
Why:
- 1,240 tokens — below the dilution gate (1,500 tokens)
- pruning a small clean context risks dropping reasoning evidence
Result:
- kept all 8 retrieved chunks
- evidence retained 100%, second-hop links preserved

Read fields off the object: ctx.report.auto_decision, ctx.report.total_tokens, ctx.report.retained_evidence_ratio. Or call doc.analyze(query) to get the report without assembling the context.
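The passthrough branch in the sample report can be pictured as a simple token gate. The sketch below is not RedHop's implementation: `GateReport`, `decide`, and the whitespace token count are hypothetical stand-ins, and only the field names and the 1,500-token gate are taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional

DILUTION_GATE = 1_500  # tokens; the gate value quoted in the sample report


@dataclass
class GateReport:
    # Field names mirror those listed above; the class itself is hypothetical.
    auto_decision: str
    total_tokens: int
    retained_evidence_ratio: Optional[float]


def decide(chunks: list[str], gate: int = DILUTION_GATE) -> GateReport:
    """Pass small contexts through untouched; flag large ones for pruning."""
    # Crude whitespace token count (an assumption; real tokenizers differ).
    total = sum(len(chunk.split()) for chunk in chunks)
    if total < gate:
        # Nothing is dropped, so all evidence is retained.
        return GateReport("passthrough", total, 1.0)
    # Pruning itself is out of scope here, so the ratio is left unknown.
    return GateReport("prune", total, None)


small = decide(["governing law is Delaware"] * 8)
print(small.auto_decision, small.total_tokens)
```

Keeping the decision and the numbers behind it on one object is what makes the choice auditable: a caller can log `total_tokens` next to `auto_decision` and see why nothing was pruned.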

You have a contract.pdf and one question: “What is the governing law?” Here is the RedHop code path that gets the LLM the right context. Answer quality is the same across libraries; the full head-to-head benchmark is on the comparison page.

import redhop
from openai import OpenAI
query = "What is the governing law?"
ctx = redhop.Document.from_file("contract.pdf").context(query)
# parsed, chunked, retrieved, and token-budgeted internally
response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"{ctx.text()}\n\nQuestion: {query}"}],
)
print(response.choices[0].message.content)

What you stand up: nothing. Point it at the file and ask; parsing, chunking, retrieval, and token-budgeting happen inside — and every call returns a Decision Report explaining what it kept and why.

Three configurations cover the practical space. Pick by what your docs look like, not by what feels sophisticated. Most RAG libraries push you to a vector DB before you have a reason to need one — RedHop’s defaults assume you don’t.

Default — for most docs. Code, API refs, internal docs, runbooks, financial reports, handbooks, mixed folders: the words in the question are the words in the answer.

doc = redhop.Document.from_file("contract.pdf")
ctx = doc.context("What is the governing law?")

No model download, no ONNX runtime, ~50ms warm queries.
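When the question and the answer share vocabulary, plain term matching is often enough to rank the right section first. Here is a minimal sketch of that idea in pure Python, assuming lowercase word tokens and a simple overlap count. RedHop's actual scoring is not described here; `rank_by_overlap` is a hypothetical helper and the clause text is invented sample data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_overlap(query: str, chunks: list[str]) -> list[tuple[int, str]]:
    """Score each chunk by how many query-term occurrences it contains."""
    terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        scored.append((sum(counts[t] for t in terms), chunk))
    # Highest score first; ties keep their original order (sort is stable).
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented sample chunks, for illustration only.
chunks = [
    "9.1 Governing Law. This Agreement is governed by the law of Delaware.",
    "4.2 Payment Terms. Invoices are due within thirty days.",
]
ranked = rank_by_overlap("What is the governing law?", chunks)
print(ranked[0][1])
```

No model is involved, which is why this style of retrieval needs no download and stays fast. It breaks down only when queries and answers use different words for the same thing.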

Structured docs with parallel clauses. A contract with “EU override of §X”, “UK override of §X”; a policy with per-region sub-sections. Heading awareness disambiguates them; a small embedding model handles the semantic mapping.

doc = redhop.Document.from_file("msa.pdf", retrieval="hybrid", model="bge-small")
ctx = doc.context(
    "What law applies in the UK?",
    include_heading=True,
    neighbors=1,
)

~80MB embedding model on first run, then cached.
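To see why heading awareness matters for parallel clauses, consider two override clauses whose bodies are identical and differ only in their headings. The sketch below assumes heading awareness amounts to scoring the heading text together with the body. The `Chunk` type and `score` function are hypothetical and do not describe what `include_heading=True` does internally.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str
    body: str


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, chunk: Chunk, include_heading: bool) -> int:
    """Count distinct query terms found in the chunk, optionally with its heading."""
    text = f"{chunk.heading} {chunk.body}" if include_heading else chunk.body
    return len(words(text) & words(query))


# Invented parallel clauses: same body, different heading.
body = "A different governing law applies in this region."
clauses = [Chunk("EU override of 9.1", body), Chunk("UK override of 9.1", body)]
query = "What law applies in the UK?"

without = [score(query, c, include_heading=False) for c in clauses]
with_heading = [score(query, c, include_heading=True) for c in clauses]
print(without, with_heading)
```

Without the heading the two clauses tie; with it, only the UK clause picks up the extra match on "UK".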

Synonym-heavy corpora. Support FAQs, HR KBs, and anywhere else queries and answers rarely share surface words. A cross-encoder reads each (query, passage) pair jointly, at the cost of 5–10× query latency. Verify it helps on your corpus before adopting; it isn’t always worth it.

doc = redhop.Document.from_file(
    "support.md",
    retrieval="hybrid",
    model="bge-small",
    rerank="cross-encoder",
)
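The advice to verify the reranker on your own corpus can be turned into a small check: label a handful of questions with the chunk that answers each, then compare the hit rate with and without reranking. The sketch below is independent of RedHop. `hit_rate_at_k` is a hypothetical helper, and the two toy retrievers stand in for the two configurations you would actually compare.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # query -> ranked chunk ids


def hit_rate_at_k(retrieve: Retriever, labelled: dict[str, str], k: int = 3) -> float:
    """Fraction of questions whose known-good chunk id appears in the top k."""
    hits = sum(1 for q, gold in labelled.items() if gold in retrieve(q)[:k])
    return hits / len(labelled)


# Toy stand-ins: replace with calls into the two configurations under test.
def baseline(query: str) -> list[str]:
    return ["faq-2", "faq-7", "faq-1"]


def reranked(query: str) -> list[str]:
    return ["faq-1", "faq-2", "faq-7"]


# Invented labelled example: question -> id of the chunk that answers it.
labelled = {"How do I reset my password?": "faq-1"}
print(hit_rate_at_k(baseline, labelled, k=1), hit_rate_at_k(reranked, labelled, k=1))
```

Time both configurations on the same questions as well. If the hit rate barely moves, the 5–10× latency cost is hard to justify.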

Full decision guide with trade-offs and query-writing tips: Choosing a configuration →

Reads your files, out of the box

from_file parses PDF, DOCX, PPTX, XLSX, Markdown, and code natively — no parser to wire up. Or point from_folder at a whole directory, indexed once and reloaded incrementally.
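Incremental reloading generally means re-parsing only the files whose content has changed. The sketch below illustrates that general technique with content hashes. It is an assumption, not a description of how from_folder works, and `IncrementalIndex` is a hypothetical class that operates on in-memory bytes rather than a real directory.

```python
import hashlib


class IncrementalIndex:
    """Re-index only the files whose content hash changed since the last sync."""

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}

    def sync(self, files: dict[str, bytes]) -> list[str]:
        """Return the names that needed (re)indexing on this pass."""
        changed = []
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if self._hashes.get(name) != digest:
                self._hashes[name] = digest
                changed.append(name)  # a real index would parse and chunk here
        # Forget files that disappeared from the folder.
        for name in set(self._hashes) - set(files):
            del self._hashes[name]
        return changed


index = IncrementalIndex()
first = index.sync({"a.md": b"alpha", "b.md": b"beta"})
second = index.sync({"a.md": b"alpha", "b.md": b"beta v2"})
print(first, second)
```

Hashing content rather than trusting modification times avoids re-indexing files that were merely touched, at the cost of reading every file on each sync.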

A Rust core, in-process

One install (pip / npm / cargo) gives you the same Rust engine. Chunking, indexing, retrieval, and token-budgeting run in-process in milliseconds — no service, no network round-trip. A whole contract is query-ready in about a millisecond; see the numbers.

Conditional & measured

It prunes only when the context is large and diluted, and leaves small ones alone. Every default traces to a benchmark in docs/findings/.