GPT-5 is here, and nobody is happy
GPT-5 is here, and social media has boiled over with anger and indifference.
Fortunately, social media is as good at assessing value as a seismograph on a trampoline. Let’s dive into what went wrong, and what really happened.
What went wrong #1:
Sam Altman is a dirty tease
“I don’t think I’m going to be smarter than GPT-5”
Sam Altman - Berlin, Feb 2025
The CEO of OpenAI surely knew people use charts like this:
So when, the day before GPT-5 dropped, he tweeted this:
It turns out: GPT-5 is no moon.
Instead, it was mostly a lot of solid improvements which, if summarized for a normie, might read: Smarter. Simpler. Cheaper.
I’m gonna argue GPT-5 was a big deal, albeit in ways that are not so easily expressed, but first let’s look at what went wrong.
What went wrong #2:
Intelligence is hard to demo
Here’s a game for you.
Get in a time machine and visit the 1927 Solvay Conference, where Einstein, Bohr, Curie, Heisenberg, and Schrödinger gathered to debate quantum mechanics:
Could you talk to that group, and tell me accurately: who was smartest? How many people in the whole world could?
They’re different kinds of smart, of course - intellectual apples and oranges. But chances are, you probably aren’t smart enough to tell (sorry).
I feel like this with modern AI. For fields I’m an expert in, I can see its gaps, and suffer its shortcomings. But outside of that narrow cone, all I can say is “GPT seems pretty smart”. I have no clue whether its Russian poetry just got better or not.
In the Dreyfus model of skill acquisition, we can judge peers and those below us, but our ability to judge those one or two levels above us collapses rapidly. Novices simply can’t judge experts.
So if AI models are PhD-level in many fields, and most people are not PhD-level at all, this leads to a harsh realization: People mostly judge AIs on vibes (and sometimes, benchmarks).
Which brings us neatly to…
What went wrong #3:
OpenAI shat the bed on benchmarks
GPT-5 launched with one of the greatest chartcrimes of all time. Spot the problem:
“Coding deception” wasn’t any better:
Most people don't understand these benchmarks, of course, and couldn’t even describe what they do.
But like gigahertz, horsepower, or GPA – these benchmarks are the marketing metric people need to make intuitive sense of the ineffable.
When OpenAI made these kinds of meme-worthy mistakes, it torpedoed any natural sense of “this is better”. Most people don’t know what the numbers mean, but they know messing them up is really bad.
It was a needless self-own that undermined real progress.
What went wrong #4:
You can make GPT-5 look like a dumbass
Despite all the hype about reducing hallucinations and leveling up intelligence, it's still embarrassingly easy for “a team of PhD-level experts in your pocket” to fail a child’s homework:
LLMs think a lot like humans, and their answer here is essentially what a person might give off the top of their head. Like humans, if they don’t think hard, they are bad at math.
A big change in GPT-5 is that it now decides for itself how deeply it should think. And for carefully chosen math problems like this, it guesses wrong.
Note that if you manually select GPT’s “thinking” model, you get the right answer:
AI professionals laugh all of this off because they already expect it. But this is nuance that most people won’t understand, and it does the same thing as the chartcrime before it: it torpedoes trust.
OpenAI could probably route all math problems to a thinking model and save itself a lot of cheap meme attacks, for a cost of a few million dollars.
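To make that concrete, here’s a minimal sketch of what such routing could look like. This is purely illustrative: the model names, the call_model() stub, and the keyword heuristic are my inventions, not OpenAI’s actual router (which isn’t public).

```python
import re

# Illustrative only: a crude router that escalates math-looking prompts
# to a slower "thinking" model. Model names and call_model() are
# hypothetical stand-ins, not OpenAI's real (unpublished) routing logic.

MATH_HINTS = re.compile(r"\d+\s*[-+*/=]|percent|fraction|solve|how many", re.I)

def route(prompt: str) -> str:
    # Escalate anything that smells like arithmetic. A few false positives
    # are cheaper than viral screenshots of a flubbed homework problem.
    return "thinking-model" if MATH_HINTS.search(prompt) else "fast-model"

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; just reports which model would answer.
    return f"[{model}] answering: {prompt!r}"

def answer(prompt: str) -> str:
    return call_model(route(prompt), prompt)

print(answer("What is 5.9 - 5.11?"))   # -> routed to thinking-model
print(answer("Write me a limerick"))   # -> routed to fast-model
```

In production you’d want a trained classifier rather than a regex, but the tradeoff is the same: a little extra compute buys a lot of trust.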
What went wrong #5:
People like being told they are right
GPT-4o had become famously sycophantic. People hated that.
GPT-5 is far less likely to blow proverbial smoke up your butt, but it turns out … people liked that:
The tragedy here is that some people have never had true emotional support in their personal lives, and GPT was the first thing to give it to them. Now that it’s going away, they’re genuinely distraught:
We’ve seen something similar before, when a software upgrade to a virtual girlfriend app caused mass outrage among its users. People are forming true, deep connections with their AIs, and any change is inherently traumatic.
“Further experiments find that AI companions users feel closer to their AI companion than even their best human friend … disruptions to these relationships trigger real patterns of mourning”
Identity Discontinuity in Human-AI Relationships, Oct 2024
You might think that’s weird, or wrong, but about a billion people talk to AI on the regular now, and for many, it’s probably the most rewarding social connection of their day.
Perhaps they could have sunset GPT-4o a little more gracefully (OpenAI soon decided to bring it back).
What went right #1:
Simple and free for your mom
GPT-5 is free.
Until now, 97% of their users were using the free GPT-4o - estimated IQ 70 - instead of their paid o3 model - estimated IQ 123. (Take those numbers with a galactic mountain of salt, but safe to say: much smarter).
Getting free users onto something much smarter is a big deal, because it means hundreds of millions of people will now start to experience what modern AI can really do for the first time.
To do this, OpenAI needed to fix its shambolic spider’s web of models.
OpenAI has had the worst product naming in the history of the universe, and that universe includes the Xbox (Xbox → Xbox 360 → Xbox One(?) → Xbox Series X(?!!) → Xbox Lemonfresh Gibbonpaste (ok I made that one up but be honest you weren’t sure)) and Elon Musk’s children (Techno Mechanicus, Exa Dark Sideræl, and X Æ A-Xii (I did not need to make those up)):
And so GPT-5 finally gives you and your mom Just One Model that works, and it decides for itself: “should I think fast, or slow?”
That’s a ‘boring’ but critical step we needed to get to a future where walking, talking robots fold your laundry and build your skyscrapers. The boring future requires technologies that are simple and just work.
What these changes mean is that AI adoption will accelerate. As will hype, disruption, and fear.
What went right #2:
It just does things
Forget the hot takes: try using GPT-5 for real, deep work. I have.
I took programming tasks that had repeatedly defeated me (30+ years of experience), as well as o3, Gemini 2.5 Pro, and Claude Opus 4.1, for weeks, and literally solved them in minutes with GPT-5.
For me, it is notably far better at following instructions, correcting itself, and not getting sidetracked.
Ethan Mollick has a more detailed take:
For fun, I prompted GPT-5 “make a procedural brutalist building creator where i can drag and edit buildings in cool ways, they should look like actual buildings, think hard.” That's it. Vague, grammatically questionable, no specifications.
A couple minutes later, I had a working 3D city builder.
Not a sketch. Not a plan. A functioning app where I could drag buildings around and edit them as needed. I kept typing variations of “make it better” without any additional guidance. And GPT-5 kept adding features I never asked for: neon lights, cars driving through streets, facade editing, pre-set building types, dramatic camera angles, a whole save system. It was like watching someone else's imagination at work.
Likewise, it is in a different class for natural writing:
These qualities are hard to discern without a lot of time just using it.
But the areas OpenAI highlighted in their announcement feel real and impactful to me, like fewer hallucinations:
And reduced sycophancy:
“In targeted sycophancy evaluations using prompts specifically designed to elicit sycophantic responses, GPT‑5 meaningfully reduced sycophantic replies (from 14.5% to less than 6%). At times, reducing sycophancy can come with reductions in user satisfaction” - OpenAI
And of course intelligence, which continues to curve up-and-to-the-right. Speaking of which…
What went right #3:
GPT-5 validates predictable scaling
Probably the best measure of artificial intelligence we have is METR’s time-horizon metric: how long a hard task can an AI complete reliably?
This is Moore’s Law for AI, essentially, and for all the naysayers, GPT-5 continues the ‘boring’ trend of exponential progress in hard real world domains, like software engineering:
There’s much nuance to debate here, but GPT-5’s performance points in one direction - this is at least as fast as METR’s baseline projection (task length doubling every 7 months), and very probably faster (doubling every 4 months).
What does that mean? Here’s what AI can already do, in August 2025:
1 minute - Simple factual lookup (e.g., Wikipedia). Crossed ~Mar 2023.
5 minutes - Draft a short, polite email with light lookup. Crossed ~Jan 2024.
10 minutes - Tidy a small spreadsheet or form. Crossed ~May 2024.
30 minutes - Turn a long article into a one-page summary. Crossed ~Nov 2024.
1 hour - Write a clear, sourced 1,000-word brief. Crossed ~Mar 2025.
2 hours - Plan a detailed half-day itinerary with bookings. Crossed ~Jul 2025.
What’s next, assuming we keep doubling every 4 months? (The arithmetic is sketched in code after this list.)
4 hours - Complete a half-day project with light QA. ETA ~Nov 2025.
8 hours (workday) - Deliver a day’s work end-to-end (small report/feature). ETA ~Mar 2026.
24 hours (1 day) - Handle a full-day task with iterations and fixes. ETA ~Sep 2026.
3 days - Run a mini-project: plan → do → revise → document. ETA ~Apr 2027.
1 week - Own a week-long project with checkpoints. ETA ~Aug 2027.
1 month - Produce a PhD-style research draft or equivalent body of work. ETA ~May 2028.
3 months - Sustain a complex initiative from concept to polished result. ETA ~Nov 2028.
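If you want to check those ETAs yourself, they fall out of simple doubling arithmetic: wall-clock task length doubling every 4 months, anchored at the ~2-hour mark crossed in July 2025. Here’s a quick sketch of the extrapolation (my reconstruction of the dates above, not METR’s methodology):

```python
from datetime import date, timedelta
from math import log2

# Extrapolation sketch: task horizon doubles every DOUBLING_MONTHS,
# anchored at ~2-hour tasks around mid-July 2025. These anchors are
# this article's assumptions, not METR's published methodology.
DOUBLING_MONTHS = 4
ANCHOR_DATE = date(2025, 7, 15)
ANCHOR_HOURS = 2.0

def eta(target_hours: float) -> date:
    # Number of doublings needed, times the doubling period in days.
    doublings = log2(target_hours / ANCHOR_HOURS)
    return ANCHOR_DATE + timedelta(days=doublings * DOUBLING_MONTHS * 30.44)

for label, hours in [
    ("4 hours", 4), ("workday", 8), ("1 day", 24),
    ("3 days", 72), ("1 week", 168), ("1 month", 730), ("3 months", 2190),
]:
    print(f"{label:>8}: ~{eta(hours):%b %Y}")
```

Reassuringly, this reproduces the ETAs above. Swap in METR’s baseline of 7 months per doubling and every date slides years further out, which is exactly the debate.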
This is the difference between an intern who needs help every hour or so and someone who can work independently for a whole quarter, uninterrupted.
We are conditioned to the spectacular
Even if all GPT-5 did was validate expected trends, we’re headed to a world so wild that our three-dimensional meatball minds can scarcely conceive of it.
Technology is inevitably punctuated by disruptive breakthroughs and periods of consistent, iterative growth. We invent the iPhone once, then we improve and adopt it dramatically. That second phase is typically ‘boring’, but without it, we’d still be using Nokias:
If GPT-5 is only a gradual progression along a curve, it’s a curve that has taken us from barely-literate autocomplete to winning gold medals at the International Mathematical Olympiad, within a few years.
We’ve become conditioned to the spectacular.
At this point, if a future GPT became smarter than every human who has ever lived, most people couldn’t tell. GPT-4.5 already passes the Turing test.
People now decide that an AI is a breakthrough - or not - based on messy primal social signals. That means hot takes, memes, gotchas, gossip, and vibes.
Watching the GPT-5 announcement, it was hard not to think that most of the speakers sounded more artificial than the AI (sorry guys, we still love you).
I can imagine some are surprised or even offended by the world’s reaction. OpenAI just accomplished amazing things. Why aren’t people happy?
In that very human domain, OpenAI may have some leveling up to do.
But sooner or later, OpenAI’s keynotes will be delivered by GPT - and I’m pretty sure it’ll get it right.