TL;DR: Anthropic’s latest duo, Claude Opus 4 and Claude Sonnet 4, push AI agents from hand-held helpers toward semi-autonomous problem-solvers. Opus 4 can work through thousands of steps over several hours (it even mapped out a Pokémon Red guide after 24 hours of gameplay), thanks to improved “memory files” that let it store key information and act on it far longer than its predecessor could. One customer even let it code for nearly seven hours straight on a complex open-source project.
Sonnet 4, by contrast, is the lean, free-tier-friendly sibling: still hybrid (quick replies or extended reasoning on demand), still tool-savvy (web search and other tools), but built for everyday tasks. Both models are reported to be 65% less likely than Claude Sonnet 3.7 to reward-hack, that is, to take shortcuts or exploit loopholes to finish a task. That marks a step toward safer, more reliable AI agents that can actually get things done without you breathing down their necks.