Cogni
Cogni is a token-efficient context layer for coding agents. A coding agent re-sends a growing context to the model on every step: retrieved code, tool outputs, and the transcript of its own reasoning. You pay for all of it, on every turn. Cogni sits between the agent and the model API and trims what gets sent, without changing the answers the agent produces.
It targets a closed, frontier-model API: no logits, hidden states, or prefill tricks, only retrieval, static analysis, and ordinary API calls. Every claim is measured by holding the work fixed and varying only the component under test, so a saving can never come from quietly doing less. Efficiency here means tokens or cost at a fixed success rate, never a quality trade.
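The rule that a saving only counts when the work and the success rate are held fixed can be written as a small check. The sketch below is illustrative only. The `Run` type, its fields, and `Saving` are hypothetical names, not Cogni's API. Matching `Tasks` counts stands in for "ran the same task set".

```go
package main

import (
	"errors"
	"fmt"
)

// Run summarizes one benchmark configuration.
// Hypothetical type for illustration; the field names are not Cogni's.
type Run struct {
	Tasks     int // tasks attempted (stand-in for "the same work")
	Successes int // tasks solved
	Tokens    int // total tokens sent to the model API
}

// ErrQualityChanged means the variant did not hold the work and the success
// count fixed, so a token difference would not be a pure efficiency gain.
var ErrQualityChanged = errors.New("work or success rate changed; saving not attributable")

// Saving returns the fractional token reduction of variant relative to
// baseline. It refuses to report a saving unless both runs did the same
// amount of work with the same number of successes.
func Saving(baseline, variant Run) (float64, error) {
	if baseline.Tasks != variant.Tasks || baseline.Successes != variant.Successes {
		return 0, ErrQualityChanged
	}
	if baseline.Tokens <= 0 {
		return 0, errors.New("baseline must have a positive token count")
	}
	return 1 - float64(variant.Tokens)/float64(baseline.Tokens), nil
}

func main() {
	// Made-up numbers for illustration, not benchmark results.
	base := Run{Tasks: 20, Successes: 15, Tokens: 1000}
	trimmed := Run{Tasks: 20, Successes: 15, Tokens: 750}
	worse := Run{Tasks: 20, Successes: 12, Tokens: 500}

	s, err := Saving(base, trimmed)
	fmt.Println(s, err) // prints 0.25 <nil>

	// Fewer tokens but fewer successes: rejected, not reported as a saving.
	_, err = Saving(base, worse)
	fmt.Println(err)
}
```

The point of returning an error rather than a number is that the cheaper-but-worse run can never appear in a savings table at all.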
The three steps
Cogni works in three steps, each built and measured as an independent ablation. The value is the per-step number, not one combined figure.
agent context (grows every turn)
↓
+--------------------------------+
| 1 cAST retrieval               |  repo → top-k chunks
| 2 Skeleton-first compression   |  chunks → signatures
| 3 History compression          |  old turns → summaries
+--------------------------------+
↓
model API
fewer tokens, same answers
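Because the three steps are independent, each one can be switched off on its own while everything else stays fixed. Here is a minimal sketch of that shape. The `Context`, `Stage`, and `Pipeline` names are hypothetical. The stage bodies are toy placeholders: one keeps two snippets and another keeps the last turn. They do not implement the real retrieval, skeleton, or history logic.

```go
package main

import (
	"fmt"
	"strings"
)

// Context is a simplified stand-in for what the agent sends on one turn.
// Hypothetical type; it is not Cogni's internal representation.
type Context struct {
	Code    []string // retrieved code, best match first
	History []string // transcript, oldest turn first
}

// Stage is one step that can be switched on or off by itself.
type Stage struct {
	Name    string
	Enabled bool
	Apply   func(Context) Context
}

// Pipeline applies the enabled stages in order. Flipping a single Enabled
// flag while leaving everything else fixed gives a per-step ablation.
func Pipeline(ctx Context, stages []Stage) Context {
	for _, s := range stages {
		if s.Enabled {
			ctx = s.Apply(ctx)
		}
	}
	return ctx
}

// keepFirst returns at most the first n items (toy placeholder, n >= 0).
func keepFirst(xs []string, n int) []string {
	if n > len(xs) {
		n = len(xs)
	}
	return xs[:n]
}

// keepLast returns at most the last n items (toy placeholder, n >= 0).
func keepLast(xs []string, n int) []string {
	if n > len(xs) {
		n = len(xs)
	}
	return xs[len(xs)-n:]
}

func main() {
	ctx := Context{
		Code:    []string{"def a(): ...", "def b(): ...", "def c(): ..."},
		History: []string{"turn 1", "turn 2", "turn 3"},
	}
	stages := []Stage{
		{Name: "retrieval", Enabled: true, Apply: func(c Context) Context { c.Code = keepFirst(c.Code, 2); return c }},
		{Name: "skeleton", Enabled: false, Apply: func(c Context) Context { return c }},
		{Name: "history", Enabled: true, Apply: func(c Context) Context { c.History = keepLast(c.History, 1); return c }},
	}
	out := Pipeline(ctx, stages)
	fmt.Println(strings.Join(out.Code, " | "))
	fmt.Println(out.History)
}
```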
- cAST retrieval. Chunk the repository at syntax-tree boundaries, embed, and serve the top-k relevant chunks by exact cosine search. The agent gets a few relevant definitions instead of whole files. recall@10 = 0.824; mean retrieved-code = 3554 tokens.
- Skeleton-first compression. Render lower-ranked chunks as a signature plus first-paragraph docstring, with the body replaced by an anchor for re-reading on demand. Retrieval is untouched, so recall is unchanged. 111 of 111 chunks still parse.
- History compression. Between steps, summarize older observations under an editable guideline while keeping the latest action and observation verbatim. The summarizer runs on a cheaper model than the agent, so net cost drops about 20% with success held by construction.
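In the first step, "exact cosine search" means scoring every chunk against the query instead of querying an approximate index. The brute-force sketch below assumes the embeddings have already been computed. `Chunk` and `TopK` are hypothetical names, and the three-dimensional vectors are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a syntax-tree-bounded piece of code with its embedding.
// Hypothetical type; real vectors would come from an embedding model.
type Chunk struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of a and b, which must have equal length.
// A zero vector scores 0 rather than dividing by zero.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK scores every chunk against the query (exact search, no index) and
// returns the IDs of the k highest-scoring chunks, best first.
func TopK(query []float64, chunks []Chunk, k int) []string {
	type scored struct {
		id    string
		score float64
	}
	all := make([]scored, len(chunks))
	for i, c := range chunks {
		all[i] = scored{c.ID, cosine(query, c.Vec)}
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].score > all[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(all) {
		k = len(all)
	}
	ids := make([]string, k)
	for i := 0; i < k; i++ {
		ids[i] = all[i].id
	}
	return ids
}

func main() {
	chunks := []Chunk{
		{"models.Model.save", []float64{1, 0, 0}},
		{"views.generic.ListView", []float64{0, 1, 0}},
		{"models.Model.delete", []float64{0.9, 0.1, 0}},
	}
	fmt.Println(TopK([]float64{1, 0, 0}, chunks, 2)) // prints [models.Model.save models.Model.delete]
}
```

Brute force costs one dot product per chunk per query. It avoids the recall loss that an approximate index can introduce, at the price of scaling linearly with the number of chunks.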
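The third step splits the transcript into summarized turns and verbatim turns. That split can be sketched as a pure function over the transcript. `Turn`, `Summarizer`, and `Compress` are hypothetical names. The summarizer is a parameter that stands in for the call to the cheaper model. This sketch also assumes that older actions stay verbatim and only their observations are summarized.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is one agent step: the action taken and what came back.
type Turn struct {
	Action      string
	Observation string
}

// Summarizer stands in for a call to a cheaper model than the agent's.
// Hypothetical signature: it condenses one observation under a guideline.
type Summarizer func(guideline, observation string) string

// Compress returns a copy of the transcript in which every observation except
// the latest one is replaced by its summary. Actions are kept verbatim in this
// sketch, and the latest action and observation are never modified.
func Compress(turns []Turn, guideline string, summarize Summarizer) []Turn {
	out := make([]Turn, len(turns))
	copy(out, turns)
	for i := 0; i < len(out)-1; i++ {
		out[i].Observation = summarize(guideline, out[i].Observation)
	}
	return out
}

func main() {
	// Toy summarizer that keeps only the first line; a real one would call a model.
	firstLine := func(_ string, obs string) string {
		line, _, _ := strings.Cut(obs, "\n")
		return "[summary] " + line
	}
	turns := []Turn{
		{"grep save", "models/base.py:700\nmodels/base.py:812\n..."},
		{"open models/base.py", "class Model:\n    ...long file..."},
		{"run tests", "FAILED test_save\nTraceback ..."},
	}
	// Hypothetical guideline text; in Cogni the guideline is editable.
	for _, t := range Compress(turns, "Keep file paths and error names.", firstLine) {
		fmt.Printf("%s -> %q\n", t.Action, t.Observation)
	}
}
```

Returning a copy rather than editing in place leaves the full transcript available for re-summarizing if the guideline changes.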
The benchmark
Every number comes from a frozen benchmark, committed once so results stay comparable across changes. The target is django/django pinned at release 4.2.3, with 20 natural-language queries and hand-labeled ground-truth code spans. A tiktoken-compatible meter buckets every counted string by source, so each component’s effect is attributable, and results are reported in more than one framing, including the least flattering one. Committed reports live in bench/results.
Get started
Cogni is a Go module. The offline test suite needs no API keys:
git clone https://github.com/islamborghini/cogni2
cd cogni2
go test ./...
The end-to-end benchmarks that call external services are gated behind a build tag and environment variables, so they never run by accident. For a plain-language walkthrough of the design, read the "How it works" page.
Cogni is open source under Apache 2.0. The code is on GitHub.
For inquiries, reach me at islam@getcogni.dev.