Reflections on AI, May 2026
Foreword: Why I am writing this
As people who know me are well aware, I’ve been spending a lot of time thinking about this AI thing lately. There is a lot happening all the time and I am constantly updating my beliefs, learning new things, and just catching up with the latest news. As I experiment, read, and discuss with friends, I try my best to predict the future a little bit. What I’ve noticed is that I take things for granted now that I was very uncertain about only 6 months ago. That is perhaps not a great sign for the strength of my predictions, but then I like to think that predicting the future is unusually difficult right now. Change is the only constant. At the same time, I notice more and more uncertainty and concern from my friends outside the tech bubble about what impact AI will have. If I can help reduce that uncertainty by sharing my perspective – I must.
This made me feel like I really should write these thoughts and predictions down so that I can look back at them and review, and so others can get a peek into my perspective. This is the first such text. Posting it publicly serves a double purpose: First, it opens my thinking up to external scrutiny, which gives quick feedback, and it can serve as a foundation for further discussion. Second, I currently believe there is a massive information and experience asymmetry around frontier AI. Most people do not have access to the latest models, and even if they do, they do not have the time or the environment around them to explore the latest and greatest ways of utilizing the tools. It is a social thing as much as a resource thing.
Through my work at a leading AI startup and regular discussions with friends at frontier AI labs I believe I get an unusually early and complete picture of the bleeding edge. If this writing can help disseminate that knowledge more broadly, that seems very good. I believe more people having accurate context about the state of this technology is a good way to ensure the technology is used for good. A small number of people will always fall victim to internal bias and struggle to accurately predict the desires of the many. I alone could not define “good” for all people. (This is why democracy is so great!) I expect the format to be fairly personal, not fully thought through and moderately scattered. I hope you, and my future self, enjoy it!
[Skip to bullet list of my view of the state of AI in May 2026]
Early takeoff
In 1956, when Dutch computer scientist Edsger W. Dijkstra invented his famous algorithm for finding the shortest path between cities on a map, computers (the machines) were widely regarded as useless. There were few everyday problems that computers could help with, to such a point that “programmer” was not considered a real job (while notably “Computer” was still a job title). Yet, to a small number of people it was apparent that this technology had massive value and would revolutionize the world – clearly finding the shortest path between any two points on a map is useful.
In May 2026, as I begin writing this text, I see certain parallels with AI. To me and my colleagues, the utility is extremely noticeable: all the things I did when I started my job one year ago are today done by agents, and humans merely ask the questions. In many ways, the job I started does not exist anymore. I used to be an experimental physicist; it is not normal to go from a physicist with a bit of coding experience to one of the most productive software engineers at a software company in just a year. I am at least twice as productive as I was a year ago, and I have agents that work for me day and night doing as much work as I was manually doing before. What I am experiencing is continued acceleration in that direction. Some call this early takeoff: the start of an inevitable societal transformation.
Yet, when I talk to people outside of tech, their belief in AI capabilities is… limited. Many still question the usability of the technology and some even speak of an impending bubble bursting. This discrepancy in perception is stunning! To be clear: the utility achievable today is not limited to writing code. I already do my taxes, business admin and bookkeeping using Claude. It just works. In fact, it does a better job than the consultants I used to hire for thousands of dollars per year.
And somehow it is true that my friends speak of three-year timelines until AI can even start doing any kind of bookkeeping reliably. Why? Diffusion, the time it takes from when something becomes possible to when it becomes widely adopted, is much slower than I had expected. In a sense this should come as no surprise: if you do not use frontier AI at least every few weeks, you will be far behind in your mental image of what is possible. Humans are bad at internalizing exponential growth (as highlighted by e.g. the Covid-19 pandemic). The fact is still that most people have never paid for AI, have never used frontier models, and have not spent 20 hours trying to solve their problems with AI. And why would they, if the tools seem mostly useless? It is hard to even come up with what tasks to give an AI if you have no intuition for its capabilities. These things take time to learn.
Sometimes tech people like to say things like “It will just take 5 minutes to do with AI”. This is a bit like saying “Solving this complex integral will take just 5 minutes”, neglecting to mention the need for hours of studying calculus, including all the little tricks to reach the answer faster. Learning to use capable tools takes time because we need to build intuition for the capabilities, and all the small things a seasoned user takes for granted.
I notice the gap in perceived AI capabilities even internally at my company (which by all standards is at the very forefront of AI adoption). Even just a slight difference in capability perception leads to a massive difference in possible output when it comes to certain tasks. For example, we recently beta-tested an “agentic coding interview” where we wanted to allow all types of AI tools. I was given 45 minutes. I just copy-pasted the interview task into my local AI setup and went to grab some snacks. After 17 minutes it had completed the task flawlessly and even built out an extensive demo and an explainer of exactly what it had done. When others tried it, it took them anywhere from 14 to 65 minutes to solve the task – while having access to the exact same tools!
This is just to illustrate the variance of output even within a very AI-forward organization. We had to rethink the interview.
Being “AGI-pilled”
There is this fascinating concept of being AGI-pilled. It basically describes someone who believes that Artificial Intelligence is going to be human-level at most things, and that it will rapidly reshape society, the economy and the way we work and live.
It is a bit presumptuous to implicitly claim that you are one of few who see the world as it truly is. To top it off, the term “AGI” is extremely vague. Yet, it is kind of useful as a shorthand.
Am I AGI-pilled? I’m starting to believe that it is probably wishful denial to believe that humans will be better at making any decisions about code at all in the very near future. The question is rather how far beyond code that will be true, and how soon. I believe this because I have seen it happen bit-by-bit and with a clear trajectory in my own work. For example, I recently built a cost-monitoring agent for Lovable. For my first version, I built out a clever (and complex) anomaly-detection framework using things I’d learnt from signal processing. I then told an agent to just run that program every few hours and report anomalies. It was terrible. It triggered on random noise and daily fluctuations. It failed to notice slow but sustained changes, akin to someone making a small but bad change (the stuff I wanted to catch). What did I do instead? I just told the agent: “Fetch the recent data and look at it. If anything looks off, page me and investigate possible causes.” Yes, seriously – that worked flawlessly! By the time I get to my computer or phone, it has usually already found the root cause. My ego is still recovering (though it feasts on the knowledge that the agent is saving us millions of dollars, and it realizes this will continue to happen).
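The failure mode described above is easy to reproduce with a generic detector. Below is a minimal sketch, assuming a trailing-window z-score rule and synthetic cost data. It is not the framework I actually built; the function name, the window size, the threshold and all the numbers are illustrative assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_alerts(series, window=24, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            alerts.append(i)
    return alerts


# Hypothetical hourly cost: a baseline of 100 with a small repeating wiggle.
WIGGLE = [0, 2, 0, -2]
baseline = [100 + WIGGLE[i % 4] for i in range(288)]

# Case 1: a harmless one-off blip at hour 30.
blip = list(baseline)
blip[30] += 6

# Case 2: a slow, sustained drift of +0.1 per hour starting at hour 48.
drift = [value + 0.1 * max(0, i - 48) for i, value in enumerate(baseline)]

blip_alerts = rolling_zscore_alerts(blip)
drift_alerts = rolling_zscore_alerts(drift)

print("blip alerts:", blip_alerts)
print("drift alerts:", drift_alerts)
print("cost at the end of the drift:", round(drift[-1], 1))
```

In this toy setup the detector pages on the one-off blip and stays silent while the cost creeps up by more than 20%, because the trailing window keeps absorbing the drift into its own baseline. That is the same shape of problem I ran into: noisy where it should be quiet, and quiet where it matters.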
Where we used to have to prompt models hard to use best practices for coding (refactoring code, building small extensible parts, making sensible abstractions) we might soon find ourselves prompting the opposite: “Please make changes in chunks humans can understand”, “Don’t do cleanup for simple tasks”, “Don’t overthink future extensibility”. To use the manager analogy: I think we are moving from “Give clear instructions and inspire” to “Get out of the way and trust” – if our goal is maximum output. What this means for humans is profound and hard to predict. I strongly believe we are severely underestimating the impact this will have on work. We will need to spend much more time writing specifications and deciding what to do, instead of how to do it.
I have started giving some of my close family and friends Claude subscriptions to help bridge this gap. What I notice is that it is still hard to get going; the time investment needed to see value is still high. I have hundreds of colleagues who teach me all the latest workflows, showing me the things that work now that did not work 4 months ago. I constantly have to update my beliefs.
Closing the loop
I must briefly mention a recent article published by The Anthropic Institute (a part of Anthropic): “When AI builds itself”. It talks about recursive self-improvement: AI accelerating AI development. In the article they present evidence of this feedback loop. I recommend reading the full article. In my opinion it is well written and realistic.
One thing I want to highlight is their mention of Amdahl’s Law: the authors basically say that as one part of a system becomes very efficient, something else becomes the bottleneck. Writing code is now almost free, but reviewing it is still expensive. This does to some extent explain the slow diffusion we are observing: human adoption is bottlenecked by human timelines – how often we talk to each other, try new things, and update our beliefs. Something that struck me is that AI systems appear remarkably good at identifying bottlenecks and automatically addressing them. I do believe an argument like this is the strongest argument against AGI in the near future: that we may hit some bottlenecks that are simply slow to get past (compute? energy? metal atoms?). However, I also think this risks being wishful thinking – at least I would not rely on it.
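To make the bottleneck argument concrete, here is a small sketch of Amdahl’s Law in its standard form: if a fraction p of the work is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). The 50/50 split between writing and reviewing code is a made-up assumption for illustration, not a measurement, and the function name is my own.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of the total work
    becomes `factor` times faster and the rest stays unchanged."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Assumption: half of the time goes into writing code, half into reviewing it.
WRITING_SHARE = 0.5

for factor in (2, 10, 100, 1_000_000):
    speedup = amdahl_speedup(WRITING_SHARE, factor)
    print(f"writing {factor}x faster -> overall {speedup:.3f}x")
```

Under that assumed split, even infinitely fast code writing can never make the whole process more than twice as fast. The part that was not accelerated (here, review) sets the ceiling, which is why attention shifts to the next bottleneck.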
When I first learnt of the model that has come to be known as “Claude Mythos”, it became clear that this level of capability would become generally accessible within 6 months (that is, within at most a few months now). Because even if Anthropic does not release it, someone else will have caught up by then. If nothing dramatic changes, this will continue to happen: progressively more powerful models will keep being released.
The question most people seem to still be asking is “Can it ever?”. I instead urge everyone to internalize the current state of AI and try to get a sense of how fast things are still developing. Test things! Experiment! Talk about it! This could very well be 100 times more important than “business as usual”. If you think about these issues today, you will have an incredible opportunity to shape the future.
The question we collectively need to urgently answer is: What do we want the world to look like when AI development continues to accelerate and becomes widely adopted? Do we actually want this continued acceleration? What does a world with full AGI actually look like?
Benjamin Verbeek
Stockholm, Sweden
May 2026
Below is a concrete list of observations I have made in May 2026 about the state of AI. It may help give some more substance to why I observe continued acceleration.
The state of AI in May 2026, as perceived by me
Frontier models & capabilities
- One of the biggest recent changes in my workflow (over the past ~2 months) is long-running tasks. I make one thoughtful request and leave agents (with access to most systems I have access to) running autonomously for several hours. I come back to an incredible amount of work done and several bumps along the way handled excellently. I like to just get a finished HTML report which I read in the browser, and maybe a test report with some evidence. This capability is still very poorly productized. Example: “Build a publish tool for the Lovable agent. When you are done, test it end-to-end by creating some complex Lovable projects and testing edge cases. Show me the resulting conclusions as an HTML report opened in my browser.”
- OpenAI’s GPT-5.5 (effort xhigh) is currently considered the strongest coding model. Engineers are generally willing to pay for fast-mode.
- Anthropic’s Claude Opus 4.7/4.8 is considered a bit faster and more pleasant for general-purpose tasks. Many use both Opus 4.8 and GPT-5.5.
- Cursor’s Composer 2.5 has received general praise for being exceptionally fast given its capability tier
- Claude Code is still the main driver for general-purpose tasks, in part due to integrations lock-in (it has access to basically everything I have access to)
- Most developers use both Codex and Claude Code in tandem, many just in the terminal or with some semi-custom tmux/ghostty setup
- An increasing amount of work is being done by Cloud Agents and automations
- GPT Image 2 is incredibly good at image generation, even with text.
What has been solved
- >95% of all code at Lovable is written by AI
- Compaction just kind of works now, especially in Codex. Conversations can now have almost arbitrary length without context-rot or severe quality degradation.
- One million token context windows also work well (but they are expensive to operate)
- Plan mode is no longer needed. Just use --yolo, --dangerously-skip-permissions or auto-mode
- Navigating and progressively finding skills on its own just works
- Giving tasks that run reliably and productively for several hours works well given a good setup with healthy guardrails
- No need to prompt using things like “You are an expert analyst…”.
- Reciting long strings of random characters (e.g. links) reliably works
- Hallucination is low. Trust in agents is growing by the day.
- The need for specialized subagents is being replaced by general-purpose subagents
What is still bad
- Writing text that is nice to read. It is still very verbose and distinctly AI.
- Estimating the time it will take for AI to do tasks (it still seems to estimate human-equivalent times). Often proposes work that will take “a day” and then proceeds to complete it in 5 minutes.
- Memory and learning from mistakes between sessions is still clumsy.
- Self-review of code. Using external agents to review code is still useful and often surfaces issues the implementer fails to notice.
Other observations and predictions
- Code generated is generally on par with a good engineer now. For specific tasks, it will almost always outperform an engineer. Prompting things like “Fix this” generally works better than “Fix this in this way”.
- For broader architecture and product decisions, humans generally still outshine models, though it is getting close.
- I predict that in 6 months, code review will be mostly solved and less than 5% of code produced will be read by humans.
- I also predict that in 6 months there will be parts of our codebase that are autonomously written and iterated on by agents without any human intervention (a first step towards self-improvement)
- We will spend more time defining outcomes than methods to achieve them
- The Pope spoke about AI (quite critically). Students booed AI mentions in commencement speeches. Public perception of AI is not very positive. Even if (when) AI turns out to be broadly useful, it is not clear that this will lead to a better world for humans. Here I believe governments and companies alike have a responsibility to make sure it does benefit all.
- As some are starting to see the power of AI, some governments have started taking it very seriously. There is even a risk that we will start seeing nationalization of compute and travel restrictions on top AI researchers by e.g. China and the United States.
- A general-purpose OpenAI model disproved a long-standing mathematical conjecture. This is a genuinely impressive result. It combined insights from two separate areas of mathematics – a combination of knowledge very few mathematicians possessed. Thus, some consider it less of a “Eureka-moment for AI” and more of an impressive combination of existing pieces of knowledge. Yet I can’t help but think “How much of scientific advance is just combining existing pieces of knowledge in new ways?” I suspect quite a lot, if not all.
Remember: just two years ago, many programmers were sceptical of AI ever producing a significant portion of production code. Now that is a fait accompli. It seems naive to expect less from the future. We are close to another step-change in AI capability for writing code, and it is just a matter of time before we see widespread adoption of AI for doing taxes and bookkeeping.