rajan agarwal @_rajanagarwal
Friday, October 24, 2025


i've spent the past ~1 year working on coding, browser-use, and video editing agents/post-training. here are some thoughts & predictions i have:

1. Browser-Use Agents.

It's been super cool to see how much browser-use agents have evolved in the past year, and to notice how much progress we still have left to make. So many labs, big and small, are working on browser/computer-use models for varying use cases. A few notable interfaces are @skybysoftware's beautiful agent interaction, @yutori_ai scouts, @GeneralAgentsCo, and of course @comet and @ChatGPTapp Atlas.

I currently do RL post-training & research as an intern (the first!) at @AmazonScience AGI Lab, with the team that was previously at @AdeptAILabs. We work on Nova Act, a really good and reliable browser-use model! From what I've seen, I believe we are currently in a state where we are learning to interact with the existing messiness of the browser/OS, adjacent to self-driving cars learning to handle randomness on the road.

I expect a lot of new interfaces and abstractions to come out that help agents interact with websites in more intuitive ways, browsers being one of them. I also expect APIs to become a lot more modular, like the approach of ChatGPT Apps, where we eventually reach one chat interface to interact with everything. I still find computer-use actuation a bit gimmicky since its tasks are short-term, while browser use is more often long-horizon. But I do think having this always-on understanding on desktops is super important, kind of like @AviSchiffmann's Friend but in your menu bar. We are still so far away from highly capable web agents, and it's a super exciting RL and research problem.

2. Coding Agents.

This one speaks for itself; if you can make 90% of new startups 10x faster, then you are providing insane value to small teams. I have a feeling we will see a lot more emphasis on background coding agents over the next bit.
@cognition is taking the right bets with both @windsurf and @devinAI, building smaller foundational models to help the core agent become stronger/faster. deepwiki alone has provided so much value to my understanding of codebases too. @cursor_ai's experiments with online RL are super cool, and i hope this inspires more foundational models for coding too.

This summer, I worked on Shadow, a background coding agent (think an oss, worse version of Devin), and spent a lot of time reverse engineering and thinking about agents for long-horizon tasks. It's interesting seeing @claudeai and @openai codex explore codebases slowly, vs @cursor_ai and @cognition indexing and building on top of retrieval. i think retrieval wins in most cases, since it's hard to grep your way out of everything, but there have to be some processing speed-ups here. that's why i love swe-grep so much.

Also, >1 year ago I wrote a blog about a coding-subagents experiment that I ran, where we had 3 extremely specialized subagents work together, outperforming some other single-threaded agents that we built. i think subagent interoperability is a great way to reduce context overflow in individual agents and to manage information and understanding of code, but we have some work to do on context transfer and the latency that comes with it. We have too many tools and not enough parallelism of domain expertise; that's not how software engineering works.

Coding agents are becoming increasingly capable, but we have a LOT of work to do on instruction following, setting invariants for code-agent safety, etc. I think we also have to temporarily give these agents a little less ownership. If people are spending too much time fixing code, I would rather agents heavily direct us in the right direction instead of making changes confidently. I really like the ability to continuously add messages without interrupting in Claude Code and Cursor, but I think using Tab and Ask mode should be more mainstream than Agent mode, tbh.
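The subagent idea above can be sketched as a toy: role-specialized agents that each keep their own context and pass only a compact summary downstream, so no single context overflows. This is a minimal sketch under my own assumptions; the roles, the `run`/`summarize` interface, and the summary-based handoff are illustrative, not details from the original experiment.

```python
from dataclasses import dataclass, field

@dataclass
class Subagent:
    # Hypothetical role names, e.g. "locator", "editor", "verifier".
    role: str
    context: list = field(default_factory=list)

    def run(self, task: str, handoff: str) -> str:
        # A real agent would call a model here; this toy just records the turn
        # in the agent's *private* context.
        self.context.append(f"[{self.role}] task={task} handoff={handoff}")
        return f"{self.role} result for: {task}"

    def summarize(self) -> str:
        # Context transfer: compress local history into a short handoff
        # instead of forwarding the full transcript to the next agent.
        return f"{self.role} did {len(self.context)} step(s)"

def run_pipeline(task: str, agents: list) -> str:
    handoff = ""
    for agent in agents:
        agent.run(task, handoff)
        handoff = agent.summarize()  # only the summary crosses agent boundaries
    return handoff

agents = [Subagent("locator"), Subagent("editor"), Subagent("verifier")]
print(run_pipeline("fix failing auth test", agents))
# each agent saw only a one-line handoff, never the others' full context
```

The point of the sketch is the handoff boundary: the latency and lossiness of `summarize()` is exactly the context-transfer problem mentioned above.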
3. Video Editing Agents.

This is a bit of a curveball in this list, but I think this is a big & important industry with two sectors.

First, video editing for tiktoks, tv shows, and movies is extremely important. In the winter, I worked at @trykino on creating the best experience for hollywood film producers to take countless hours of footage and instantly index/search/edit it. We didn't have to produce incredibly high-quality edits ourselves, but rather make already-great video editors much better. There are SOO many video editing products for tiktoks/short-form content on the market, but I'm not too much of a fan of the "optimize-for-attention-span" products. All of the "cursor for video editing" products are getting a bit out of hand; I think helping producers and editors create 2-hour masterpieces for theaters is a super important problem that Kino & a few others are solving. This is a huge long-context video problem that I spent months working on, and it will continue to evolve as we try to understand hours and hours of footage even better.

Second, there's video generation. The concept of Sora is confusing at first, but I think it's just proof that realistic video generation is here, and now it's time to make content. i love the @gabriel1 generations because they're just funny, and it's a lot more justifiable to make a really appealing 10-second generation that we can laugh at, but this is just a step towards long video gen that's useful or visually stunning. @runwayml and @LumaLabsAI never fail to impress me with their new launches, and i'm so excited about world models like @theworldlabs or @GoogleDeepMind genie 3 coming out (please be real!). Video generation means increasing control over what humans see, so I hope to see more edit-existing-videos products that help filmmakers make CGI, for example.

I think the common theme across these three is that these agents and intelligent post-training are helping creative minds bring more work into the world.
This is what AGI is all about, isn't it?