Tweet
sonnet 4.5 dropped 2 weeks back – “the best coding model yet” we ran it through our internal agentic coding benchmark – real-world engineering tasks and codebases. results: indeed, claude 4.5 wins! (although gpt-5-codex is close and costs less than half) but something much more interesting surprised me: > about half the tasks each model failed were passed by the other. in other words: > they’re different types of coders. we deep dived one example. claude 4.5: the craftsman perfectionist. slow to err, obsessed with correctness, maybe a little neurotic about spacing but ultimately reliable. gpt-5-codex: the hacker-engineer. exploratory, error-prone, and a bit too eager to improvise. full benchmark + example deep dive -> https://t.co/JNA6fvBjFA