The New AI Model, Qwen 3.6-27B, Is a Giant White Pill

Thursday, April 23, 2026 AI

Cutting to the punchline: Alibaba / Qwen have delivered an insanely powerful, affordable model (3.6) which is likely to change the narrative on closed vs open source. It is bullish for Alibaba stock ($BABA; disclosure: I am long). Combined with a large number of other impressive releases (GLM 5.1, disclosure: I am long 2513 HK; Minimax), we are also entering an important period where Chinese open source becomes a driver of:

- A story around affordable agentic workflows
- Demand for crypto AI products which are powered by open source models, ranging from GPU financing, to outright open source demand bids (Venice, Ambient, Akash), to AI-based governance and reward protocols (my own protocol, Post Fiat, or Bittensor)
- Just very pragmatically, a way to protect your IP without dumping it into closed source model providers and hoping they don't rob you blind

Previous story: tool use on agentic coding tools means infinite token consumption, everyone is rate limited, and you can't even use Claude.

New story: Chinese open source models on an RTX Pro 6000 (a $10k investment, or $2 an hour on Runpod) can run Codex / Claude Code and do most of the work you need.

Putting this into context, using Claude Code with full tool use through the API alone can run $60-80 per open task / terminal. Assuming an engineer is using 2 terminals (waiting for 1 to load), this could be a $300k a year line item per engineer. This, in part, explains Jensen's somewhat outrageous statement that a top engineer should be spending $500k a year on compute. This would add up if you were using Opus 4.7 going full blast, without a Max plan, using Zero Data Retention (ZDR).

I saw the Qwen 3.6 benchmarks that suggested it performs similarly to Opus 4.5 and did a double take. There have been a number of big model drops in the past week. Moonshotai/kimi-k2.6 is particularly powerful as a chat model, delivering insight without the painful verbosity of Opus 4.7.

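As a rough sanity check on the API cost estimate above ($60-80 per task per terminal, 2 terminals, about $300k a year), here is a minimal back-of-envelope sketch. The article does not say how many tasks an engineer runs per day, so the code solves for the task rate that the $300k figure implies. The 250 working days and 8 rented GPU hours per day are my assumptions, not numbers from the article.

```python
# Back-of-envelope check of the per-engineer API cost quoted above.
# From the article: $60-80 per task per terminal, 2 terminals,
# roughly $2/hour to rent an RTX Pro 6000, ~$300k/year line item.
# ASSUMPTIONS (not from the article): 250 working days per year,
# 8 rented GPU hours per working day.

COST_PER_TASK_USD = (60 + 80) / 2   # midpoint of the quoted range
TERMINALS = 2
WORKING_DAYS = 250                  # assumption
TARGET_ANNUAL_USD = 300_000         # the article's line-item estimate
GPU_USD_PER_HOUR = 2.0
GPU_HOURS_PER_DAY = 8               # assumption


def annual_api_cost(tasks_per_terminal_per_day: float) -> float:
    """Yearly API spend for one engineer at a given task rate."""
    return COST_PER_TASK_USD * TERMINALS * WORKING_DAYS * tasks_per_terminal_per_day


def implied_tasks_per_terminal_per_day(target_usd: float) -> float:
    """Task rate needed for the yearly spend to reach target_usd."""
    return target_usd / (COST_PER_TASK_USD * TERMINALS * WORKING_DAYS)


def annual_gpu_rental_cost() -> float:
    """Yearly cost of renting one GPU for the assumed working hours."""
    return GPU_USD_PER_HOUR * GPU_HOURS_PER_DAY * WORKING_DAYS


if __name__ == "__main__":
    rate = implied_tasks_per_terminal_per_day(TARGET_ANNUAL_USD)
    print(f"Tasks per terminal per day implied by $300k/year: {rate:.1f}")
    print(f"Rented GPU for the same working year: ${annual_gpu_rental_cost():,.0f}")
```

Under these assumptions, the $300k figure corresponds to roughly 8 to 9 API-billed tasks per terminal per working day, while renting the GPU for the same working hours comes to about $4,000 a year. Change the constants to match your own usage.
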
But chat, while cool, doesn't move the economic needle. You need a model that can design and code if you want to replace costly engineers or build useful internal software.

To get set up, I rented an RTX Pro for $1.89 an hour (with some extra storage costs) on Runpod. I wrote an instruction guide here if you are interested in reproducing this.

I don't generally trust the Chinese labs

Performance

Using Ollama, this thing delivers 54.7 t/s (faster than you can read, most likely). This is quite high (many models on public cloud services deliver half that rate). The problem is that Qwen 3.6 is a very verbose reasoning model, so it consumes quite a lot of tokens talking through things, which slows it down. If you turn reasoning mode off, performance screams (decent for chat), but logic drops.

Sample Benchmark 1

As much of an Anthropic / Claude Code hater as I can be, it genuinely is much better at design than other models.

I fed the following prompt to each model:

"Make the most beautiful possible HTML landing page you can for a cryptocurrency called Post Fiat, which is an AI hive mind where users pseudonymously contribute information to a collective. The call to action is to Join the Task Node. A secondary call to action is to Run a Validator. This is an HTML mock and I'll render locally to test it out. Follow all relevant design best practices and those of world class brands. The goal is for this to be a piece of art, not something that is obviously AI generated with generic emojis. Someone should see this, and want to be part of something. please execute and return full HTML thanks"

The following is Opus 4.7 on a one-shot. Note that Opus in the actual Claude loads a 'front end skills' markdown that substantially improves its performance.

GPT 5.4's UX (as usual) is fairly bleak. This is obviously well known.

And here is Qwen 3.6.

Benchmark 2: Codex

Next I wanted to see if Qwen 3.6 27b actually performed inside of Codex.
(Codex is OpenAI's vibe coding tool, which can run other models if you set them in its config.)

I ran an entire end-to-end session to improve various things on the website. You can also use open source tools like Aider, or Claude Code. The point here is that you can drag and drop any model into these 'harnesses'.

The end result wasn't anything to write home about, but it mostly did what I asked, including adding complex hover-overs, dynamic background animations and working links. I never had to spend more than 1 turn correcting things. It didn't follow instructions in a 'god tier' engineer way, but it got the job done, as instructed.

When you compare it to the design savant that is Opus 4.7, it's definitely not as cool, and I do not think it really approaches the level that a fully skill-instructed UX does.

Conclusion

The point of this article isn't really to say "oh wow these websites are sick". It's more that we've finally reached a model that runs on consumer grade hardware, with high enough speed to run multiple codex sessions o