https://x.com/echen/status/1978239403337130044
echen @echen
sonnet 4.5 dropped 2 weeks back – “the best coding model yet”
we ran it through our internal agentic coding benchmark – real-world engineering tasks and codebases.
results: indeed, claude 4.5 wins! (although gpt-5-codex is close and costs less than half)
but something much more interesting surprised me:
> about half the tasks each model failed were passed by the other.
in other words:
> they’re different types of coders.
we deep dived one example.
claude 4.5: the craftsman perfectionist. slow to err, obsessed with correctness, maybe a little neurotic about spacing but ultimately reliable.
gpt-5-codex: the hacker-engineer. exploratory, error-prone, and a bit too eager to improvise.
full benchmark + example deep dive ->
Tuesday, October 14, 2025
