Written by Max Rumpf on April 5th, 2024.

I pin human completions at 1 token per second in all calculations. That is realistic for high-quality human tokens, although some people are faster or slower.

The productivity speedup AI apps can provide is limited by how much human-in-the-loop work is required. Humans produce ~1-3 tokens per second. They can’t really be sped up – unless you’re Neuralink.

BCG puts this speedup at 1.4x. I'm fine disagreeing. The 1.8x is at current GPT-4 speeds.

So if your application requires a human completion for every LLM completion (e.g. ChatGPT or AI Copilots), then your maximum speedup is ~2x – even when LLMs become 10x faster.

Devin doesn't fully realize its potential yet.

Cognition Labs’ Devin is better, because it needs a human completion only every ~10 iterations. At current speeds, this feels about 2.9x better than raw ChatGPT, which is nice, but not mind-blowing. But because they’re frugal with human tokens, they can go to ~10x productivity speedup just by waiting for models to get faster!
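The numbers above can be reproduced with a very simple model. This is a sketch under assumptions that are mine, not stated in the post: every completion has the same length, one completion in every k is written by the human, and GPT-4 currently runs at roughly 10 tokens per second. The function name `speedup` and its parameters are hypothetical. With those assumptions the model gives the 1.8x, ~2x, 2.9x and ~10x figures quoted in the text.

```python
def speedup(iterations_per_human_turn: int,
            llm_tokens_per_sec: float,
            human_tokens_per_sec: float = 1.0) -> float:
    """Speedup versus a human writing every completion alone.

    Assumptions (illustrative, not from the original post):
    - all completions are the same length (one "unit" of tokens),
    - out of every `iterations_per_human_turn` completions, exactly one
      is written by the human and the rest by the LLM.
    """
    k = iterations_per_human_turn
    # Baseline: the human writes all k completions.
    human_only_time = k / human_tokens_per_sec
    # Assisted: the human writes one, the LLM writes the other k - 1.
    assisted_time = 1 / human_tokens_per_sec + (k - 1) / llm_tokens_per_sec
    return human_only_time / assisted_time


GPT4_TOKENS_PER_SEC = 10.0  # assumed current speed, chosen to match the 1.8x figure

chat = speedup(2, GPT4_TOKENS_PER_SEC)    # human turn every other completion
devin = speedup(10, GPT4_TOKENS_PER_SEC)  # human turn every ~10 iterations

print(f"ChatGPT-style today: {chat:.2f}x")                  # about 1.82x
print(f"Devin-style today:   {devin:.2f}x")                 # about 5.26x
print(f"Devin vs ChatGPT:    {devin / chat:.1f}x")          # about 2.9x
print(f"ChatGPT, 10x faster LLM: {speedup(2, 100):.2f}x")   # about 1.98x
print(f"Devin, 10x faster LLM:   {speedup(10, 100):.2f}x")  # about 9.17x
```

As the LLM gets faster, the speedup approaches k, the number of iterations per human completion. That is why the ceiling is ~2x for chat-style apps and ~10x for an agent that checks in every 10 iterations.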

Let's free that y-axis! A future agent only needs a human completion every 100 iterations.

The fun stuff starts when AI agents get to the 100-1000x range, i.e. only require human input every 100-1000 iterations. There’s a long way to go – but I’m excited every time I see something that will get us closer, like code execution from E2B.dev, browsing from Browserbase and a context engine from SID.ai.

Many copilots & current ChatGPTs will seem silly in hindsight, like doing a 1-on-1 with your intern every 15 minutes – when you could be managing a team that makes a month’s worth of progress between meetings.

Today, developers are frugal with LLM tokens (I know: they’re expensive) – so we’ve built tools to use them wisely: Parea, Humanloop, Langfuse, LangChain. But the most important thing to be frugal with is human tokens (both input and output) – they will define the overall productivity speedup your application can provide. Humans are insanely slow.

AI agents don’t yet work well – but it won’t be a competition once they do.

Naturally, there are many caveats here. Iterations are gameable. Reducing human tokens has been an important trend outside of agents, too: Google let you find information with fewer keystrokes and less reading than anyone else, and the same holds for Perplexity today. Button presses can count as tokens too, depending on the action they trigger.


Back-of-the-napkin calculation on human token cost: at $50/h, with 400 tokens per minute of input (reading/listening at 2x) and 100 tokens per minute of output (slow typing/speaking with breaks), a human costs $8,333/1M output tokens or $2,083/1M input tokens, which is ~100x more expensive than GPT-4.
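For anyone who wants to plug in their own hourly rate, here is the same arithmetic as a short sketch. The function name is hypothetical, and the GPT-4 prices ($30/1M input, $60/1M output) are my assumption of the list prices at the time of writing, so check current pricing before relying on the ratios.

```python
def human_cost_per_million_tokens(hourly_rate_usd: float,
                                  tokens_per_minute: float) -> float:
    """Dollar cost of one million human tokens at a given pace."""
    tokens_per_hour = tokens_per_minute * 60
    return hourly_rate_usd / tokens_per_hour * 1_000_000


HOURLY_RATE_USD = 50.0
human_input = human_cost_per_million_tokens(HOURLY_RATE_USD, 400)   # reading/listening
human_output = human_cost_per_million_tokens(HOURLY_RATE_USD, 100)  # typing/speaking

# Assumed GPT-4 list prices in USD per 1M tokens (not from the original post).
GPT4_INPUT_USD = 30.0
GPT4_OUTPUT_USD = 60.0

print(f"Human input:  ${human_input:,.0f} per 1M tokens")   # $2,083
print(f"Human output: ${human_output:,.0f} per 1M tokens")  # $8,333
print(f"Input ratio vs GPT-4:  {human_input / GPT4_INPUT_USD:.0f}x")
print(f"Output ratio vs GPT-4: {human_output / GPT4_OUTPUT_USD:.0f}x")
```

Under these assumed prices the two ratios land on either side of 100x, which is where the rough "~100x" comes from.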