Twitter AI Evaluation (legacy)

Sunday, April 5, 2026

@gkisokay Save Insight

Anthropic just banned Claude subscriptions from powering OpenClaw. Here's why my stack was already built for this.

I never ran Opus 4.6 through a subscription for OpenClaw or Hermes. It runs in Claude Code for complex external dev only. Same with GPT-5.4 in Codex.

The internal agent runtime is a completely different stack:

1. Qwen3.5 9B runs locally. $0. Always on. Feeds the subconscious ideation loop 24/7. Beats GPT-OSS-120B by 13x. Awesome.
2. MiniMax M2.7 is the agent's backbone. 97% skill adherence, built for agents, $0.30/M tokens. The $10 plan allows for 1500 calls every 5 hours. Amazing.
3. GPT-5.4 mini is the Hermes brain. debates ideas with the subconscious, builds output, ~$0.075 avg per run. It's smart enough to orchestrate your entire system, and you can actually use your subscription plan here via OAuth. Incredible!

Over the last 24 hours, the subconscious ran 15 times, for a total of $1.58. Not too shabby for an always-improving agentic system.

The lesson is to build your agent stack on a multiple LLM stack. Local models handle volume. Generous subscription models handle execution and judgment. You own the cost structure.

Full-stack breakdown in the table. (see image)

Quick Insight

This is about building cost-effective AI agent systems with a multi-model approach instead of relying on a single expensive API. The author runs cheaper local models for high-volume tasks and reserves hosted models for execution and judgment, claiming to keep costs under $2/day for a continuously running agent system.
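The tweet's figures allow a rough sanity check on the cost claim. The sketch below only restates numbers quoted in the tweet and derives averages from them. The function names are illustrative, and the 30-day projection assumes every day looks like the quoted 24 hours, which the tweet does not claim.

```python
# Figures quoted in the tweet; everything derived from them is plain arithmetic.
RUNS_PER_DAY = 15          # "the subconscious ran 15 times" in 24 hours
TOTAL_COST_USD = 1.58      # "for a total of $1.58"
CALLS_PER_WINDOW = 1500    # "$10 plan allows for 1500 calls every 5 hours"
WINDOW_HOURS = 5


def cost_per_run(total_usd: float, runs: int) -> float:
    """Average cost of a single run."""
    return total_usd / runs


def max_calls_per_day(calls_per_window: int, window_hours: int) -> float:
    """Upper bound on calls in 24 hours if every window is fully used."""
    return calls_per_window * 24 / window_hours


if __name__ == "__main__":
    avg = cost_per_run(TOTAL_COST_USD, RUNS_PER_DAY)
    print(f"avg cost per run: ${avg:.3f}")
    # Assumption: every day matches the quoted 24 hours.
    print(f"projected 30-day cost: ${TOTAL_COST_USD * 30:.2f}")
    ceiling = max_calls_per_day(CALLS_PER_WINDOW, WINDOW_HOURS)
    print(f"max plan calls per day: {ceiling:.0f}")
```

On the quoted numbers the average comes to roughly $0.105 per run, which is consistent with the "under $2/day" reading but says nothing about quality or about costs on busier days.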

Actionable Takeaway

Brian could experiment with this multi-tier approach in his AI dev workflows - use a local model like Qwen for routine code analysis/suggestions and reserve GPT/Claude for complex architecture decisions or customer-facing features.

Related to Your Work

For Brian's webhook integrations and analytics dashboards, this pattern could work well: local models handle log parsing and basic anomaly detection while premium models tackle complex business logic or customer offer optimization where accuracy is critical.
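A minimal sketch of what such tiered routing could look like. The tier names, task categories, and the `customer_facing` flag are all hypothetical placeholders chosen for illustration; none of them come from the tweet, and no model client is called here.

```python
from dataclasses import dataclass

# Hypothetical tier labels; swap in whatever local and hosted models are in use.
LOCAL = "local-small-model"
PREMIUM = "hosted-premium-model"

# Assumed task categories, loosely based on the examples in this section.
ROUTINE_TASKS = {"log_parsing", "basic_anomaly_detection", "code_suggestion"}


@dataclass
class Task:
    kind: str
    customer_facing: bool = False


def choose_model(task: Task) -> str:
    """Send routine, internal work to the local tier; everything else to premium."""
    if task.kind in ROUTINE_TASKS and not task.customer_facing:
        return LOCAL
    return PREMIUM
```

Defaulting unknown task kinds to the premium tier is a deliberate choice in this sketch: a misrouted routine task costs a little money, while a misrouted critical task costs accuracy.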

Thread/Source Worth Reading

The tweet mentions a table breakdown that would show the actual cost/performance metrics, but without the image it is not possible to evaluate whether the specific numbers and model choices are worth diving into.

@gkisokay Explore Further

Quick Insight

This is a detailed technical guide for building self-improving AI agent systems, specifically a "subconscious agent" that continuously refines workflows by running ideation→critique→synthesis loops and writing improvements back to persistent state. It is practical architecture advice rather than theory, with specific implementation requirements such as persistent JSON/JSONL state, model routing, and approval gates.

Actionable Takeaway

Build a simple version of this improvement loop for one of your existing agents. Start with basic JSON state persistence and a cron-triggered cycle that reviews recent runs, generates improvement ideas, and writes back refined prompts or workflow configurations.
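One way the smallest version of that cycle might look, as a sketch only. The state layout (`cycle`, `prompt`, `pending`), the shape of a run record, and the `propose_improvements` stub are assumptions made for illustration; the linked guide's actual file structure is not reproduced here, and the ideation step would be a model call in practice.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read persisted state, or start fresh if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"cycle": 0, "prompt": "Summarize the input.", "pending": []}


def propose_improvements(recent_runs: list[dict]) -> list[str]:
    """Placeholder for the ideation and critique steps (a model call in practice)."""
    failures = [r for r in recent_runs if not r.get("ok", False)]
    return [f"Handle failure case: {r['error']}" for r in failures if "error" in r]


def run_cycle(state_path: Path, recent_runs: list[dict]) -> dict:
    """One cycle: review recent runs, queue ideas for approval, persist state."""
    state = load_state(state_path)
    state["cycle"] += 1
    # Ideas are queued rather than applied, leaving room for an approval gate.
    state["pending"].extend(propose_improvements(recent_runs))
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state
```

A cron entry or any other scheduler could call `run_cycle` on a fixed interval. Keeping proposals in a `pending` list instead of rewriting the prompt directly means a human or a stronger model can approve changes before they take effect.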

Related to Your Work

This directly applies to your AI-powered dev workflows and automation tools. Instead of manually tweaking your print-on-demand automation or webhook processing logic when edge cases emerge, a subconscious agent could continuously analyze failures and suggest configuration improvements or new error handling patterns.

Thread/Source Worth Reading

Yes - the linked article is a comprehensive implementation guide with specific technical requirements (file structures, model routing, state persistence). It includes concrete examples of the artifact files the system generates and explains the architecture decisions behind each component, so it is worth reading for the implementation details.

@NickSpisak_ Explore Further

Quick Insight

This is Karpathy's approach to building a personal knowledge base using AI to organize scattered notes into a searchable wiki. Instead of complex tools, it's just three folders (raw/, wiki/, outputs/) where you dump everything and let AI create structured, interconnected documentation.

Actionable Takeaway

Set up the three-folder structure and try it with your existing bookmarks, side project notes, and technical documentation. Use the agent-browser CLI tool to automatically scrape articles into your raw/ folder instead of manual copy-pasting.
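A small sketch of the folder setup, assuming a plain directory on disk. The folder names come from the summary; the `unprocessed_notes` helper and its rule (a raw note counts as processed once a same-named `.md` page exists in wiki/) are a hypothetical convention, not something the source describes. The agent-browser CLI is not invoked here.

```python
from pathlib import Path

FOLDERS = ("raw", "wiki", "outputs")  # folder names from the summary


def init_knowledge_base(root: Path) -> list[Path]:
    """Create the three folders under root; safe to call repeatedly."""
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def unprocessed_notes(root: Path) -> list[Path]:
    """List raw notes with no same-named page in wiki/ (an assumed convention)."""
    wiki_stems = {p.stem for p in (root / "wiki").glob("*.md")}
    return sorted(
        p for p in (root / "raw").iterdir()
        if p.is_file() and p.stem not in wiki_stems
    )
```

The list returned by `unprocessed_notes` is the kind of input an agent could be handed when asked to turn new raw material into wiki pages.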

Related to Your Work

Perfect for organizing your side project research, fintech industry articles, and technical learnings across TypeScript/AWS/AI integrations. Could replace scattered bookmarks and notes with a searchable knowledge base that connects patterns across your different projects.

Thread/Source Worth Reading

Yes - the linked article provides the complete implementation including folder structure, CLAUDE.md schema template, and specific prompts. The agent-browser tool (26K GitHub stars) for automated web scraping is particularly valuable for building the raw content pipeline.

@karpathy Explore Further

Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.

Quick Insight

Karpathy is proposing a shift from sharing actual code/apps to sharing high-level "idea files" that AI agents can then customize and build for specific use cases. He's using an "LLM wiki" concept as an example, stored in a GitHub gist that agents can interpret and implement.

Actionable Takeaway

Try creating an "idea file" template for one of your recurring side project patterns (like Chrome extension boilerplates or webhook integration setups) and test whether Claude/GPT can actually build working implementations from just the abstract description.

Related to Your Work

This could streamline how you document and share internal tools at the fintech startup - instead of maintaining specific code templates for webhook integrations or analytics dashboard components, you could maintain idea files that agents customize for different partner requirements or use cases.

Thread/Source Worth Reading

The linked gist is worth checking - it should show Karpathy's actual "idea file" format for the LLM wiki concept, which would be the concrete example of how to structure these agent-readable specifications.

@cyrilXBT Skip

INSTEAD OF WATCHING NETFLIX TONIGHT. Spend 1 hour with this. Claude AI FULL COURSE that teaches you how to BUILD and AUTOMATE anything. The people who watch this tonight will wake up tomorrow with a skill that most people will not have in 2 years. The people who skip it will still be watching Netflix next year wondering why nothing in their life has changed. Your call.

Quick Insight

This is a typical Twitter hook promoting a Claude AI course/tutorial. The extreme language ("skill that most people will not have in 2 years") is pure hype, but if there's actual content behind it, learning advanced Claude prompting and automation workflows could be valuable for Brian's AI integration work and automation-heavy side projects.

Actionable Takeaway

If the course can be located, check whether it covers Claude's API integrations, function calling, or workflow automation patterns that could improve Brian's current AI-powered dev workflows or be integrated into his fintech platform's features.

Related to Your Work

Brian's already doing AI integrations at work and building AI-powered dev workflows as side projects. Advanced Claude automation techniques could enhance his webhook processing, improve his print-on-demand automation, or create better AI agents for his web agency tools.

Thread/Source Worth Reading

No link is provided in the tweet text. Without the actual course content, this is just marketing copy, and the resource would need to be found before judging whether it is worth the time investment.