Cogni

Cogni is a token-efficient context layer for coding agents. A coding agent re-sends a growing context to the model on every step: retrieved code, tool outputs, and the transcript of its own reasoning. You pay for all of it, on every turn. Cogni sits between the agent and the model API and trims what gets sent, without changing the answers the agent produces.

It targets a closed, frontier-model API: no logits, hidden states, or prefill tricks, only retrieval, static analysis, and ordinary API calls. Every claim is measured by holding the work fixed and varying only the component under test, so a saving can never come from quietly doing less. Efficiency here means tokens or cost at a fixed success rate, never a quality trade.

The three steps

Cogni works in three steps, each built and measured as an independent ablation. The value is the per-step number, not one combined figure.

  agent context (grows every turn)
                 ↓
  +-------------------------------+
  |  1  cAST retrieval            |   repo      → top-k chunks
  |  2  Skeleton-first compression|   chunks    → signatures
  |  3  History compression       |   old turns → summaries
  +-------------------------------+
                 ↓
             model API
       fewer tokens, same answers

cAST retrieval. Chunk the repository at syntax-tree boundaries, embed the chunks, and serve the top-k most relevant ones by exact cosine search. The agent gets a few relevant definitions instead of whole files. Measured: recall@10 = 0.824, mean retrieved code = 3554 tokens.
Skeleton-first compression. Render lower-ranked chunks as a signature plus first-paragraph docstring, with the body replaced by an anchor for re-reading on demand. Retrieval is untouched, so recall is unchanged, and 111 of 111 compressed chunks still parse.
History compression. Between steps, summarize older observations under an editable guideline while keeping the latest action and observation verbatim. The summarizer runs on a cheaper model than the agent, so net cost drops by about 20% while the success rate is held fixed by construction.
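
As a rough illustration of steps 1 and 3, the sketch below shows an exact (brute-force) cosine top-k search and a history pass that keeps only the latest turn verbatim. It is not Cogni's code: the Chunk and Turn types, the topK and compactHistory functions, and the toy two-dimensional embeddings are all assumptions made for the example, and the summarizer is a stub standing in for a call to a cheaper model.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical retrieved unit: an ID plus its embedding vector.
type Chunk struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every chunk against the query (exact search, no approximate
// index) and returns the IDs of the k best, highest similarity first.
func topK(query []float64, chunks []Chunk, k int) []string {
	type scored struct {
		id    string
		score float64
	}
	all := make([]scored, 0, len(chunks))
	for _, c := range chunks {
		all = append(all, scored{c.ID, cosine(query, c.Vec)})
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].score > all[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(all) {
		k = len(all)
	}
	ids := make([]string, 0, k)
	for _, s := range all[:k] {
		ids = append(ids, s.id)
	}
	return ids
}

// Turn is a hypothetical record of one agent step.
type Turn struct {
	Action      string
	Observation string
}

// compactHistory returns a copy of turns in which every observation except
// the latest is replaced by summarize(observation). Actions and the latest
// turn are left verbatim.
func compactHistory(turns []Turn, summarize func(string) string) []Turn {
	out := make([]Turn, len(turns))
	copy(out, turns)
	for i := 0; i < len(out)-1; i++ {
		out[i].Observation = summarize(out[i].Observation)
	}
	return out
}

func main() {
	chunks := []Chunk{
		{"chunk-a", []float64{1, 0}},
		{"chunk-b", []float64{0, 1}},
		{"chunk-c", []float64{0.8, 0.6}},
	}
	fmt.Println(topK([]float64{1, 0}, chunks, 2))

	// Stub summarizer (assumption): a real one would call a cheaper model.
	stub := func(obs string) string { return fmt.Sprintf("[summary of %d bytes]", len(obs)) }
	history := []Turn{
		{"run tests", "a long test log that no longer needs to be sent in full"},
		{"open file", "the latest file contents, kept verbatim"},
	}
	for _, t := range compactHistory(history, stub) {
		fmt.Printf("%s -> %s\n", t.Action, t.Observation)
	}
}
```

Passing the summarizer in as a function keeps the history pass independent of whichever model produces the summaries, which is convenient when each step is measured on its own.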

The benchmark

Every number comes from a frozen benchmark, committed once so results stay comparable across changes. The target is django/django pinned at release 4.2.3, with 20 natural-language queries and hand-labeled ground-truth code spans. A tiktoken-compatible meter buckets every counted string by source, so each component’s effect is attributable. Results are reported in more than one framing, including the least flattering one. Committed reports live in bench/results.
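
To make the per-source bucketing concrete, here is a minimal sketch of such a meter. It is an illustration under stated assumptions, not the meter in the repository: the Meter type, its methods, and the bucket names are hypothetical, and the whitespace counter is only a stand-in for a tiktoken-compatible tokenizer, so its counts will not match real token counts.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Meter accumulates token counts per source bucket. The tokenizer is
// injected so a tiktoken-compatible counter could be swapped in.
type Meter struct {
	count   func(string) int
	buckets map[string]int
}

// NewMeter builds a Meter around the given counting function.
func NewMeter(count func(string) int) *Meter {
	return &Meter{count: count, buckets: make(map[string]int)}
}

// Add counts text, attributes the tokens to source, and returns the count.
func (m *Meter) Add(source, text string) int {
	n := m.count(text)
	m.buckets[source] += n
	return n
}

// Total returns the sum over all buckets.
func (m *Meter) Total() int {
	total := 0
	for _, n := range m.buckets {
		total += n
	}
	return total
}

// Report returns one "source count" line per bucket, sorted by source name.
func (m *Meter) Report() []string {
	names := make([]string, 0, len(m.buckets))
	for name := range m.buckets {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, fmt.Sprintf("%s %d", name, m.buckets[name]))
	}
	return lines
}

// whitespaceTokens is a stand-in counter (assumption): it counts
// whitespace-separated fields, not real model tokens.
func whitespaceTokens(s string) int { return len(strings.Fields(s)) }

func main() {
	m := NewMeter(whitespaceTokens)
	// Bucket names below are made up for the example.
	m.Add("retrieved_code", "def save(self): pass")
	m.Add("tool_output", "3 passed in 0.12s")
	m.Add("history", "ran the test suite")
	m.Add("retrieved_code", "class Model: ...")
	for _, line := range m.Report() {
		fmt.Println(line)
	}
	fmt.Println("total", m.Total())
}
```

Because every counted string carries a source label, a change in the total can be traced to the bucket that moved, which is what makes a per-component ablation attributable.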

Get started

Cogni is a Go module. The offline test suite needs no API keys:

git clone https://github.com/islamborghini/cogni2
cd cogni2
go test ./...

The end-to-end benchmarks that call external services are gated behind a build tag and environment variables, so they never run by accident. For a plain-language walkthrough of the design, read how it works.
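
One common way to get this kind of double gate in Go is a build constraint at the top of the test file plus an environment check that skips when anything is missing. The sketch below shows only the environment half as a runnable program. The variable names COGNI_E2E and COGNI_API_KEY and the missingEnv helper are assumptions for illustration, not names taken from the repository; in a real test file a build constraint line would sit above the package clause and the skip would go through t.Skipf.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names in required that lookup reports as unset
// or empty. The lookup function is injected so the check can be tested
// without touching the real environment.
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Hypothetical variable names; the real ones may differ.
	required := []string{"COGNI_E2E", "COGNI_API_KEY"}
	if missing := missingEnv(required, os.LookupEnv); len(missing) > 0 {
		fmt.Println("skipping end-to-end benchmark; unset:", missing)
		return
	}
	fmt.Println("environment complete; the end-to-end benchmark would run here")
}
```

With a build tag in place (say a tag named e2e, again an assumption), a plain go test ./... never compiles the gated files, and go test -tags e2e ./... compiles them but still skips unless the environment is complete.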

Cogni is open source under Apache 2.0. The code is on GitHub.

For inquiries, reach me at islam@getcogni.dev.