My guess is that with the new UI they've either mistakenly or deliberately reduced the thinking effort or the thinking budget, even though the app still says "extended thinking". It also seems that they're (mistakenly or deliberately) now routing some queries to the Flash model despite Pro/Thinking being selected.
It was super bad right after launch, yet AI Studio and API calls were on par with the earlier experience. Despite "extended thinking" being selected, for the first time in a very long time the LLM output completely incorrect, non-working code for a super simple problem (schema matching), which it can solve easily (and did, through the API). So it's definitely not a coincidence.
I'm hoping that it's temporary, because otherwise we're back to GPT-5.0 levels of bad, and that will kill it for coding.
Hallucinations galore.
It is supposed to personalize responses since it has access to my main Gmail account. It thought I was a woman working in healthcare. Pretty wildly off.
Massively so, yes. It feels snappier though, and that arguably makes the situation worse: imagine they have a big dashboard showing that people are interacting more with the AI, so they conclude they should keep improving speed even though the results themselves are terrible.
This is very common for AI companies: they release the full beast to reach the top of the benchmarks, then quantize it once everyone has called them the best and bought a subscription.
"I'm In This Photo, And I Don't Like It".