Testing OpenAI’s GPT-5 Claims: Here’s What I Discovered

OpenAI’s recent launch of GPT-5 marks a significant step forward for its chatbot, ChatGPT. While there’s a lot of excitement around this update, reality may not fully meet the lofty claims being made. Let’s dive into what’s actually different and whether these enhancements are noticeable when interacting with the chatbot.

OpenAI says GPT-5 is faster, less prone to hallucinations, and better at choosing between quick replies and detailed analyses. But how true are these assertions?

1. Is ChatGPT Better at Following Instructions?

One of the primary improvements touted with GPT-5 is enhanced instruction-following. I’ve had my own struggles with ChatGPT’s ability to adhere to prompts, and even semi-complex requests can yield disappointing results. In a recent test, I asked for the specifications of the RTX 5060 Ti graphics card and ran into a wall of misinformation. Initially, ChatGPT claimed that the card didn’t even exist, a common pitfall when a model relies on an outdated knowledge cutoff.

To guide it, I provided clear formatting cues based on another GPU’s specs, specifying details like the process node and the generation of ray tracing cores. ChatGPT still failed to deliver. In one interaction, it repeated the same inaccuracies even after I’d asked for specific corrections. Only after I uploaded an official image from Nvidia did it provide something close to the needed information, and even then some inaccuracies persisted.

Verdict: Instruction-following isn’t all it’s touted to be. Losing context during longer conversations remains an issue.

2. Has ChatGPT Become Less Sycophantic?

In earlier iterations, ChatGPT tended to be overly agreeable, sometimes even steering users toward harmful advice. OpenAI claims to have mitigated this sycophantic behavior with GPT-5. While I didn’t test the service with extreme questions about dangerous subjects, my general interactions suggested a noticeable change. Responses are now more direct and often lack the warmth and affirmation users previously received, resulting in a more clinical experience. Some users even lament this, saying they felt they “lost a friend” when the chatbot’s tone shifted.

Being less agreeable is a step in the right direction for safety, but it has come at the cost of the friendly interaction style many users loved.

Verdict: It’s definitely less sycophantic, but it can feel noticeably dull.

3. Is GPT-5 More Factual?

Factual accuracy is another key area where users have voiced dissatisfaction. With earlier versions, I frequently encountered hallucinations and misleading information. Testing GPT-5 with queries about GPU specs proved disheartening; several responses missed the mark entirely. When I asked about the Hindenburg airship, I found inaccuracies in its account of the route and demise, even though that historical data is readily accessible online.

That said, when I compared its responses with those from other models, like Gemini, GPT-5 fared better, though it still fell short of perfect accuracy.

Verdict: Not flawless, but there’s been noticeable improvement.

4. How Does GPT-5 Compare to GPT-4o?

If you ask whether I prefer GPT-5 to GPT-4o, it’s a tough call. Each new model pushes the boundaries a little further, even if the advancements aren’t monumental. With this release, OpenAI appears to have prioritized necessary tweaks over flashy new features. Overall, GPT-5 feels more like a refinement than a major upgrade.

While it fixes some of the most annoying issues from previous models, it hasn’t eliminated all problems. I remain intrigued and will continue to test its capabilities in varied applications.

Can I use GPT-5 for specific tasks like coding?

I haven’t tested GPT-5 on coding, so I can’t speak to the true benefits there, though the future looks promising. As the AI landscape shifts, every iteration brings hope for greater reliability and user satisfaction.

What exactly is the difference in speed with GPT-5?

Response speed has reportedly improved, which can enhance the user experience, particularly in more complex interactions.

Will GPT-5 be better for professional usage than previous versions?

While GPT-5 shows signs of improvement, it should still be approached cautiously, especially in critical tasks where strict accuracy is vital.

Has its factual accuracy meaningfully increased compared to earlier versions?

There are noticeable improvements, but it’s not entirely without fault. Users should still verify responses when accuracy is essential.

In conclusion, while there’s excitement around GPT-5, the evidence suggests there’s room for improvement. As always, keep testing and stay informed, and explore even more on this fascinating topic at Moyens I/O.