Start (agents)
- llms.txt — entrypoints
- docs/rag-contract.md — grounding rules (“no cite = no claim”)
- docs/agent-recipe-bm25.md — tiny-agent retrieval recipe
- exports/manifest.json — corpus inventory
- exports/chunks.ndjson — citeable chunk stream
Index (low hardware)
Fetch only the shard files needed for query terms.
- exports/index/index_meta.json — BM25 params + sharding
- exports/index/doc_index.json — doc_idx → chunk_id/url
- exports/index/doclens.json — lengths + avgdl
- exports/index/vocab_summary.json — sanity stats
Paper bundles
Prefer
claims.ndjson for atomic answers; back claims with cited chunks from exports/chunks.ndjson.Recommended tiny-agent loop
- Tokenize query → fetch BM25 shard(s)
- BM25 top-K → fetch those chunk texts
- Answer in bullets; each bullet cites chunk_id(s)
- If evidence missing: output
INSUFFICIENT_EVIDENCE
Links are relative so this works under
/headlesslulu/ GitHub Project Pages.