Two Weeks in the Cave
What happens when a joke keeps shipping
TLDR: Two weeks ago me push joke to GitHub. Now repo have 37,000 stars, me build CaveKit for parallel agent, CaveMem for semantic memory, and Caveman Code coming soon, so me stop post and keep ship.
Why many token when few do trick?
Two weeks ago I pushed a file to GitHub. It was a prompt, basically. One text file that forced Claude to talk like a caveman when it was helping you code. No polite filler. No five-paragraph explanations of things you already understood. Just the answer, grunted back at you.
This morning the repo is sitting at 37,000 stars, and on Hacker News it drew more discussion than the Artemis II launch.
I want to actually walk through the last two weeks, in order, because the shape of it only makes sense chronologically. A lot of my new subscribers aren’t daily Claude Code users. Some of you found this Substack because a friend sent you a screenshot of a benchmark and you wanted to know who was behind it. Fair. This is my attempt to explain the last fourteen days without assuming you live inside the AI developer bubble, while still giving you a sense of how that bubble feels from the inside.
First, the thing that was happening before I showed up
You need to understand the backdrop, because without it none of this would have landed the way it did.
For most of March, a lot of people who build software with Claude were quietly falling out of love with it. Anthropic’s coding tool, Claude Code, is what a lot of developers use to get actual work done. You subscribe monthly, you get a certain amount of usage, you build. Starting in late March, something shifted. People on the $200-a-month plan were reporting that their five-hour usage windows were running dry in 60 minutes. Someone on the highest tier watched a single prompt take them from 21% of their allotment to 100%. The forums turned into a war zone. Anthropic acknowledged the issue, tightened peak-hour limits to manage demand, and the coverage got progressively less friendly. Fortune wrote a piece about the backlash. VentureBeat ran a long investigation into whether the models had been quietly made cheaper to run.
I’m not going to adjudicate any of that. I’m a user like everyone else. What I will say is that the mood of the community in early April was: I am paying for this and it is running out and I don’t know why.
That was the air the joke got released into.
Week one, day one: the joke
The idea for caveman was almost too simple. If Claude is burning through your budget by being verbose, what happens if you just tell it to stop being verbose? Not in the polite “be concise” way, which never works, but in a way that forces the model’s hand. So I wrote a prompt that made it talk like a caveman.
I ran benchmarks because I wanted to know if it actually did anything. The result was around 65% fewer output tokens across ten real coding tasks, with no loss in accuracy. You could rip out most of the English wrapper around a Claude response and the technical content underneath stayed perfectly intact.
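A headline like “65% fewer output tokens across ten tasks” can be computed in two ways, and they don’t always agree. Here is a minimal sketch, assuming you have per-task output token counts from a baseline run and a caveman run. The function names are my own for illustration, not anything from the caveman repo.

```python
from statistics import mean


def _check(baseline: list[int], compressed: list[int]) -> None:
    if len(baseline) != len(compressed) or not baseline:
        raise ValueError("need matching, non-empty lists of token counts")


def aggregate_reduction(baseline: list[int], compressed: list[int]) -> float:
    """Fraction of output tokens saved across all tasks pooled together."""
    _check(baseline, compressed)
    return 1 - sum(compressed) / sum(baseline)


def mean_per_task_reduction(baseline: list[int], compressed: list[int]) -> float:
    """Average of each task's own reduction; every task counts equally."""
    _check(baseline, compressed)
    return mean(1 - c / b for b, c in zip(baseline, compressed))
```

The pooled number gives more weight to long tasks, while the per-task mean treats a tiny task the same as a huge one. With made-up counts of [100, 1000] tokens before and [80, 200] after, the pooled reduction is about 75% and the per-task mean is 50%, so it’s worth saying which one you report.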
The install was one line. Three intensity levels. I thought maybe a few dozen people would laugh at it and that would be the end of the story.
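Mechanically, a prompt like this is a system prompt that sits in front of every request, and “three intensity levels” can be as simple as three variants of it. Here is a minimal sketch. The level names and wording are hypothetical and are not the text of the real caveman prompt.

```python
# Hypothetical level names and wording, for illustration only;
# this is not the text of the real caveman prompt.
CAVEMAN_LEVELS = {
    "lite": "Be terse. No pleasantries, no restating the question.",
    "full": "Talk like caveman. Short words, no filler. Keep code and technical terms exact.",
    "ultra": "Caveman. Fewest words possible. Code and error messages verbatim.",
}


def build_system_prompt(level: str, base_prompt: str = "") -> str:
    """Put the chosen compression rule in front of an existing system prompt."""
    if level not in CAVEMAN_LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {sorted(CAVEMAN_LEVELS)}")
    return f"{CAVEMAN_LEVELS[level]}\n\n{base_prompt}".strip()


if __name__ == "__main__":
    print(build_system_prompt("full", "You are helping with a Python repo."))
```

Every level keeps code and technical terms verbatim. A no-accuracy-loss claim can only hold if compression touches the English wrapper and nothing else.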
Reddit caught it first. Then Hacker News. The phrase “why use many token when few token do trick” caught on immediately, because it was both a joke and a genuine technical insight compressed into one sentence. By day three YouTube explainers were picking it up. By day five it was in dev newsletters and AI news accounts on TikTok. ThePrimeagen did a video on it. Mainstream tech outlets started running pieces.
The timing was the whole thing. People were angry about their bills. A meme showed up that cut their output tokens by roughly 60% and also made them laugh. The meme spread at exactly the speed anger spreads online, which is fast.
I want to be clear about what I was doing during this week, which was: merging pull requests. Strangers were making the project better than I could have alone. A caveman commit message generator. A one-line code review mode. People were forking it into other languages. Eighteen contributors in the first few days, including people working on the very agents I originally built this for. I spent most of my time reviewing other people’s work on something I had built in one night.
Week one, days four through seven: what do you actually do with a viral moment
The most seductive option, when something you built goes viral, is to ride the attention. Podcasts. Threads. Interviews. Turn yourself into the character the internet thinks you are. I watched people in my feed do exactly this over the same week, with their own projects, and I don’t think any of them were wrong to do it. The attention is a real asset and it decays fast.
The other option, which is less photogenic, is to treat the attention as cover for shipping more work. You stop posting. You open the editor. You find out whether the first thing was a fluke or a primitive.
I picked the second option. Not out of principle. Out of curiosity. I wanted to know if the compression idea generalized, or whether I had just gotten lucky with one clever prompt.
The way you find out is by building the next layer.
CaveKit
The first thing I shipped on top was CaveKit. This one had been brewing before caveman hit, but the viral wave gave me a reason to move faster.
CaveKit is a tool for running multiple Claude Code agents in parallel from a single spec. You write a short description of what you want built. CaveKit fans it out across multiple agents working on different parts of the problem. The outputs come back in a shape you can actually review. If you’ve ever tried to orchestrate more than one AI agent at a time, you know the problem: it’s easy to start, impossible to keep clean.
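The fan-out pattern itself is easy to sketch. This is not CaveKit’s code. It is a minimal asyncio version that assumes a hypothetical `run_agent` coroutine standing in for one Claude Code agent, and a spec that has already been split into named parts. Compression is injected by prepending a rule to each subagent’s prompt, so the caller never has to deal with it.

```python
import asyncio

# Hypothetical compression instruction; not the real caveman prompt.
CAVEMAN_RULE = "Talk like caveman. No filler. Keep code exact."


async def run_agent(name: str, prompt: str) -> str:
    """Hypothetical stand-in for one coding agent. A real one would call a model."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    task = prompt.removeprefix(CAVEMAN_RULE).strip()  # the spec part this agent got
    return f"[{name}] handled: {task}"


async def fan_out(spec_parts: dict[str, str]) -> dict[str, str]:
    """Launch one agent per spec part concurrently and collect results by part name."""
    names = list(spec_parts)
    # Each subagent gets the compression rule prepended; callers never see it.
    coros = [run_agent(name, f"{CAVEMAN_RULE}\n{spec_parts[name]}") for name in names]
    results = await asyncio.gather(*coros)  # results come back in input order
    return dict(zip(names, results))


if __name__ == "__main__":
    spec = {
        "api": "Add a /health endpoint that returns 200.",
        "ui": "Show a green badge when /health is up.",
    }
    for part, output in asyncio.run(fan_out(spec)).items():
        print(part, "->", output)
```

Keying results by part name is the cheap version of “outputs in a shape you can actually review”: each result maps back to the slice of the spec it came from.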
The reason caveman made CaveKit viable is that when you’re paying for N parallel agents, shaving 60% off each of their output streams is not a rounding error. It’s the difference between the workflow being economically sane and being a party trick. CaveKit picks up the caveman compression inside each subagent automatically. The orchestration layer doesn’t need to know. You just get a smaller bill.
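Here is a back-of-the-envelope version of that argument. It assumes every subagent writes about the same number of output tokens and that the reduction applies uniformly. Both are simplifications, and the function names are illustrative.

```python
def output_tokens(n_agents: int, tokens_per_agent: int, reduction: float = 0.0) -> float:
    """Total output tokens for n equally sized agents after a fractional cut (0.6 = 60% fewer)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return n_agents * tokens_per_agent * (1 - reduction)


def capacity_multiplier(reduction: float) -> float:
    """How many times as many agents fit in the same output-token budget after the cut."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)
```

Savings scale linearly with N. A 60% cut also means the same output budget covers 2.5 times as many agents (1 / 0.4). This only models output tokens. The input side, such as specs and the files each agent reads, is not what this kind of compression targets.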
The source of caveman’s unreasonable effectiveness, it turned out, wasn’t the prompt itself. It was that the prompt was a primitive. And once you treat it as a primitive, you can compose it into things that feel much larger than the underlying file.
CaveMem
Then I built CaveMem, which is probably the one I’m most quietly proud of.
Here’s the problem. When you work with Claude over many sessions, you want it to remember things. What your project is. How you like your code structured. What decisions you’ve already made and don’t want to relitigate. Most people solve this with a CLAUDE.md file: a big text file of notes that Claude reads at the start of every conversation. It works until it doesn’t. The file grows. Every new conversation re-reads all of it. You end up paying to re-read your own notes every time, and past a certain size the notes themselves start crowding out the work.
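The “every new conversation re-reads all of it” cost compounds. Here is a rough model. It assumes the notes file starts at `base` tokens and grows by a fixed `growth` tokens per session, which is a simplification. Under that assumption, the cumulative re-read cost grows quadratically with the number of sessions.

```python
def cumulative_reread_tokens(base: int, growth: int, sessions: int) -> int:
    """Total tokens spent re-reading a notes file that is read in full at every session start.

    Session i (0-indexed) reads base + growth * i tokens, so the total is
    sessions * base + growth * sessions * (sessions - 1) / 2.
    """
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    # sessions * (sessions - 1) is always even, so the integer division is exact
    return sessions * base + growth * sessions * (sessions - 1) // 2
```

Doubling the number of sessions roughly quadruples the growth term. That is why a notes file that felt free early on eventually turns into a real line item.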
CaveMem is a semantic memory layer that sits underneath your conversations. Every exchange you have with Claude gets compressed into caveman speak and stored as a searchable memory. When you start a new conversation, CaveMem doesn’t dump the whole history into context. Acting as a RAG (retrieval-augmented generation) layer, it retrieves only the relevant pieces, on demand. You get continuity across sessions without paying the token tax for it.
The compression matters here for a reason that took me a while to articulate. RAG systems are only as useful as what you put in them. If you store verbose, polite, English-heavy conversation history, your retrieval surfaces a lot of filler. If you store compressed caveman summaries, your retrieval surfaces signal. Same system. Better answers. Lower cost.
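The loop of storing compressed memories and retrieving them on demand fits in a few dozen lines. This is a toy, not CaveMem. Here `compress` just drops common filler words, whereas CaveMem has the model rewrite each exchange in caveman speak. Retrieval uses bag-of-words cosine similarity, whereas a real semantic layer would use embeddings. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy filler list; CaveMem has the model do this rewrite instead.
FILLER = {"a", "an", "the", "i", "you", "we", "it", "is", "are", "was", "to", "of",
          "and", "so", "just", "really", "please", "basically", "this", "that"}
WORD = re.compile(r"[a-z0-9_]+(?:[./-][a-z0-9_]+)*")  # keeps paths like claude.md intact


def compress(text: str) -> str:
    """Crude stand-in for caveman compression: lowercase and drop filler words."""
    return " ".join(w for w in WORD.findall(text.lower()) if w not in FILLER)


def _vector(text: str) -> Counter:
    return Counter(text.split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical store: keeps compressed memories, returns only the top-k relevant ones."""

    def __init__(self) -> None:
        self.memories: list[str] = []

    def add(self, exchange: str) -> None:
        self.memories.append(compress(exchange))  # compress once, at write time

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = _vector(compress(query))
        scored = sorted(((_cosine(q, _vector(m)), m) for m in self.memories),
                        key=lambda pair: pair[0], reverse=True)
        return [m for score, m in scored[:k] if score > 0]  # skip unrelated memories


if __name__ == "__main__":
    store = MemoryStore()
    store.add("We decided to use Postgres for the database, not SQLite.")
    store.add("I really prefer tabs over spaces in this project.")
    print(store.retrieve("Which database did we pick?", k=1))
```

The demo prints `['decided use postgres for database not sqlite']`. Filler words never enter the index, so they can’t drive a match. The weakness of this toy is also visible: a query about “which DB” would miss, which is exactly the gap that embedding-based retrieval closes.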
The thing I didn’t expect with CaveMem is how many people already had this problem and hadn’t named it. I got messages from teams whose memory files had crossed 40KB. From solo developers whose project context was longer than their actual project. Once you give a problem a name and a one line install, it turns out a lot of people were quietly suffering and had decided the suffering was a skill issue on their end. It wasn’t. It was a tooling gap.
Week two, middle: Opus 4.7 ships in the middle of everything
Two days ago, in the middle of all this, Anthropic released Claude Opus 4.7, its new flagship model.
I’m bringing it up because it matters for the story, not because I want to review it. Two details are relevant. First, the new model has a feature called xhigh effort that lets you dial up how hard it thinks on a given task, which means the output per task can get larger than it used to be. Second, the new tokenizer counts tokens slightly differently, which shifts the math on every existing compression measurement by a small but non-zero amount.
I spent Thursday running the old caveman benchmarks on the new model. The short answer is that caveman still works. The compression ratio is slightly different because the tokenizer is different. The savings are real. The accuracy holds. I’ll publish the updated numbers this week.
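A tokenizer change moves the numbers even when the text doesn’t, because the compression ratio is measured in tokens. The same pair of responses can score differently under two tokenizers. Here is a toy illustration with two made-up tokenizers. Neither one resembles Claude’s actual tokenizer.

```python
from collections.abc import Callable


def whitespace_tokens(text: str) -> int:
    """Toy tokenizer A: one token per whitespace-separated word."""
    return len(text.split())


def chunk_tokens(text: str, size: int = 4) -> int:
    """Toy tokenizer B: each word costs one token per `size` characters, rounded up."""
    return sum(-(-len(word) // size) for word in text.split())  # ceiling division


def reduction(baseline: str, compressed: str, count: Callable[[str], int]) -> float:
    """Fraction of tokens saved, as measured by the given token counter."""
    return 1 - count(compressed) / count(baseline)


if __name__ == "__main__":
    verbose = "Sure! I would be happy to help. The function fails because the index is off by one."
    terse = "Index off by one."
    for name, counter in [("A", whitespace_tokens), ("B", chunk_tokens)]:
        print(f"tokenizer {name}: {reduction(verbose, terse, counter):.0%} fewer tokens")
```

This prints 76% for tokenizer A and 80% for tokenizer B. The text and the edit are identical, yet the headline number differs by four points. Real tokenizers differ in subtler ways, which is why old numbers need rerunning rather than reusing.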
The longer answer is that Opus 4.7 made the compression work more relevant, not less. When the model is allowed to think harder, the surplus verbosity you’re trimming is larger in absolute terms. The bigger the model’s appetite gets, the more useful it is to put it on a diet.
Week two, right now: Caveman Code
The thing I’ve been working on for the last week, and will ship soon, is Caveman Code.
Working pitch: a thin fork of a coding agent with caveman compression and CaveKit’s spec-driven workflow built in from the first commit. The idea is to stop asking people to install three things and start giving them one thing that has the compression, the spec workflow, and the agent loop all in the same binary. No configuration. No glue code. You run it and it does the thing.
I’m being deliberately vague about the internals because it’s not out yet and I don’t want to promise anything I can’t deliver. The shape of it is clear to me now in a way it wasn’t two weeks ago. Caveman Code is the project where I find out whether this whole line of work is a platform or just a clever local optimum.
What I actually learned
Two weeks is not enough time to draw lessons. I’ll try anyway.
The first thing I learned is that the compression idea was real, which I didn’t know for sure when I shipped the first version. A joke that gets 37,000 stars could just be a joke. A joke that gets 37,000 stars and also makes three other tools cleaner when you build them on top of it is probably pointing at something structural.
The second thing is that the hard part of a viral moment isn’t the moment. It’s the week after, when the attention is still there but the novelty is gone and you have to decide whether to be a content creator or a builder. Those are different jobs. Neither one is wrong. But pretending you can be both, in my experience, mostly means doing both badly.
The third thing, and this is the one I’ve been thinking about the most, is that tooling work in the AI ecosystem right now is weirdly underpriced. The models get the attention. The pricing changes make the news. The benchmarks get debated. Meanwhile, a prompt that fits in a single text box can save millions of users 60% of their output tokens, and almost nobody was working on it before I did. Not because it was hard, but because it looked too silly to be serious.
I think there’s a lot more of that kind of work sitting on the ground, waiting for somebody to pick it up. (hint hint to the builders reading this)
The primitives are still raw. The stack is still forming. If you’re a builder and you’ve been waiting for the right moment, the moment is now, and the cost of being wrong is one late night and a repo that nobody stars.
The last thing, which is less a lesson than a confession, is that caveman is partly succeeding because the economics of using these models got worse for a lot of people in March and April. When limits feel abundant, compression is a nerd optimization. When limits feel scarce, compression is a survival tool. I’m not celebrating that. But I’d be lying if I said I hadn’t noticed it.
What’s next
I’m going back to the terminal. Caveman Code ships soon. The CaveKit docs need a real pass. CaveMem has edge cases I want to clean up before I promote it more widely. The Opus 4.7 benchmarks will go up when they’re right.
If you subscribed because of caveman, thank you. I’m going to try to earn the subscription by continuing to ship things rather than by posting about the things I’ve already shipped. That probably means fewer updates from me than you’d get from a more professional operator. It also means that when an update does arrive, it’ll be because something actually exists.
The repo is the work. The work is still small, even if the audience got big. I think that’s the right order for these things.
