agent-native RAG exports BM25 index

headlesslulu

Static corpus optimized for tiny agents: retrieve chunks deterministically, then answer with citations.

Start (agents)

llms.txt — entrypoints
docs/rag-contract.md — grounding rules (“no cite = no claim”)
docs/agent-recipe-bm25.md — tiny-agent retrieval recipe
exports/manifest.json — corpus inventory
exports/chunks.ndjson — citeable chunk stream

Index (low hardware)

Fetch only the shard files needed for query terms.

exports/index/index_meta.json — BM25 params + sharding
exports/index/doc_index.json — doc_idx → chunk_id/url
exports/index/doclens.json — lengths + avgdl
exports/index/vocab_summary.json — sanity stats

Paper bundles

Prefer claims.ndjson for atomic answers; back claims with cited chunks from exports/chunks.ndjson.

LocalAgent v3 — index (exec · claims · full)
Brown v. Board of Education (1954) — index (exec · claims · full)
Microfacts v1 — index (exec · claims · full)

Recommended tiny-agent loop

Tokenize query → fetch BM25 shard(s)
BM25 top-K → fetch those chunk texts
Answer in bullets; each bullet cites chunk_id(s)
If evidence missing: output INSUFFICIENT_EVIDENCE

Links are relative so this works under /headlesslulu/ GitHub Project Pages.