I also use it this way and I'm overall pretty happy with it, but it feels like they really want us to use it in "autopilot" mode. It's like they have two conflicting priorities of "make people use more tokens so we can bill them more" and "people are using more tokens than expected, our pricing structure is no longer sustainable"
(but I guess they're not really conflicting, if the "solution" involves upgrading to a higher plan)
I feel like they are making it harder to use it this way. Encouraging autonomous use is one thing, but it really feels more like they are handicapping engaged use. I suspect it reflects their own development practices and needs.
This is something I've thought of as well. The way the caps are implemented really disincentivizes engaged use. The 5-hour window especially is very awkward and disruptive. The net result is that I have to somewhat plan my day around the 5-hour window. That by itself is a powerful disincentive from using Claude. It has also caused me to use different tools for things I previously would have used Claude for. For example, I now use codex rather than Claude for detailed plans, because I hit the limit way too fast when doing documentation work. It certainly doesn't hurt that codex seems to be better at it, but I wouldn't even have a codex subscription if it wasn't for claude's usage limits.
Wow, weird to see someone mirror my experience so closely. At the $100 plan my day was being warped around how to maximise multiple 5 hour sessions so that it felt worth it. Dropped down to the $20 plan and stopped playing the game as I know I'll just consume the weekly usage in the few days I have free. Meanwhile codex gave me a free month. Their 5HourUsageWindow:WeeklyUsageWindow ratio feels way better balanced and I get way more work done from it. Similar to you, any task involving reading/reviewing docs [or code reviews] now insta-nukes claude's usage. My record is 12 minutes so far...
Another big one for me is that they dropped the cache TTLs. It is normal for me to come back to a session an hour later, but someone "autopilot"-ing won't have such gaps.
not just the cache though. every time you stop and come back, it basically reloads the whole session. if you just let it keep going, it counts like one smooth run. you hit the wall faster for actually checking its work.
It was probably the bug about cache getting purged after 5min rather than 1hour. You can review things pretty well within an hour. 5min is a real crunch. 5min doesn't mix with multitasking or getting interrupted.
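To put rough numbers on the cache point, here is a minimal sketch of why a pause longer than the TTL hurts engaged use. Everything in it is an assumption for illustration: the function name, the fixed context size, and the multipliers (a cache read billed at 0.1x and a cache write at 1.25x the base input price). It is a toy model, not how any provider actually meters usage.

```python
def relative_input_cost(context_tokens, gaps_minutes, ttl_minutes,
                        read_mult=0.1, write_mult=1.25):
    """Relative input cost of one session, in base-input-token units.

    Simplifications (assumed, not from the thread): the context stays the
    same size for the whole session, and any pause longer than the TTL
    forces the whole context to be written to the cache again.
    """
    cost = context_tokens * write_mult  # first request writes the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            cost += context_tokens * read_mult   # cache still warm: cheap read
        else:
            cost += context_tokens * write_mult  # cache expired: pay to rewrite
    return cost


# Hypothetical session: quick follow-ups plus one 45-minute review break.
gaps = [2, 2, 45, 2]
short_ttl = relative_input_cost(100_000, gaps, ttl_minutes=5)
long_ttl = relative_input_cost(100_000, gaps, ttl_minutes=60)
print(f"5-minute TTL: {short_ttl:,.0f}  60-minute TTL: {long_ttl:,.0f}")
```

In this toy model the single review break is what separates the two totals. An "autopilot" run with no long gaps would cost the same under either TTL.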
I think the culty element of AI development is really blinding a lot of these companies to what their tools are actually useful for. They’re genuinely great productivity enhancers, but the boosters are constantly going on about how it’s going to replace all your employees and it’s just... not good for that! And I don’t mean “not yet”; I mean I don’t see it ever getting there barring some major breakthrough on the order of inventing a room-temp superconductor.
I agree with you, the "replacing people" narrative is not only wrong, it's inflammatory and brand suicide for these AI companies who don't seem to realize (or just don't care) the kind of buzz saw of public opinion they're walking straight towards.
That said, looking at the way things work in big companies, AI has definitely made it so one senior engineer with decent opinions can outperform a mediocre PM plus four engineers who just do what they're told.
Do you have any good resources on how to work like that? I made the move from "auto complete on steroids" to "agents write most of my code". But I can't imagine running agents unchecked (and in parallel!) for any significant amount of time.
Right now, I'm finding a decent rhythm in running 10-20 prompts and then kind of checking the results a few different ways. I'll ask the agent to review the code, I'll go through myself, I'll do some usability and gut checks.
This seems to be a good window where I can implement a pretty large feature, and then go through and address structural issues. Goofy things like the agent adding an extra database, weird fallback logic where it ends up building multiple systems in parallel, etc.
Currently, I find multiple agents in parallel on the same project to be not super functional. There's just a lot of weird stuff: agents get confused about worktrees, git conflicts abound, and I found the administrative overhead to be too heavy. I think plenty of people are working on streamlining the orchestration issue.
In the meantime, I combat the ADD by working on a few projects in parallel. This seems to work pretty well for now.
It's still cat herding, but the thing is that refactors are now pretty quick. You just have to have awareness of them
I was thinking it'd be cool to have an IDE that did coloring of, say, the last 10 git commits to a project so you could see what has changed. I think robust static analysis and code as data tools built into an IDE would be powerful as well.
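As a rough sketch of the data such an IDE feature would need, the snippet below counts how many of the last N commits touched each file, using `git log --name-only --pretty=format:`. The function names are hypothetical, and mapping the counts to colors is left out entirely. It is only a starting point under those assumptions.

```python
import subprocess
from collections import Counter


def count_touches(log_output: str) -> Counter:
    """Count how many commits list each path in `git log --name-only` output.

    With an empty --pretty format, the output is just file paths, one per
    line, with blank lines between commits.
    """
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def recent_touches(repo_dir: str = ".", commits: int = 10) -> Counter:
    """Run git in repo_dir and count file touches over the last `commits` commits."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--max-count={commits}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return count_touches(result.stdout)
```

Calling `recent_touches(".", 10).most_common(5)` inside a repository would list the five most frequently changed files, which an editor could then shade by count. Paths with unusual characters may come back quoted by git, which this sketch does not handle.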
The agents basically see your codebase fresh every time you prompt. And with code changes happening much more regularly, I think devs have to build tools with the same perspective.
It will come naturally! I started with autocomplete as well. I kept stumbling on different problems and fixed them by adopting best practices. My current stack is:
1/ Claude Code with yolo mode
2/ superpowers plugin
3/ red/green tdd
4/ a lot of planning and requirements before writing any code
It feels like you are always working at the edge of what the models and your current workflow can handle. Delegate a more complex task and the system fails. Delegate a simpler one and the system works great. Improve your workflow and you move that complexity to a higher level.
But... I've been an LLM power user for more than a year and a half now. I can't delegate, exactly because I've reviewed a lot of LLM code, and it is never good enough for me to step down from reviewing everything manually. I can understand how you can vibe code a dashboard or tests, but vibe code your entire backend without checking it through carefully? Madness.
For me, you open a markdown editor and draft a high-level code plan with details of what you'd do as a coder. Then you bust into whatever tool in planning mode (I usually fire this into the opus 4.5 model), have it break the plan down into concise steps, and hand it off to a simpler model (gpt spark, sonnet, composer or whatever) to execute. When I feel frisky I'll just have opus one-shot it, and it can be done in a few minutes.
I use Claude “on the web” or Google Jules. Essentially everything happens in a sandbox - so yolo isn’t a huge risk. You can even box its network access. You review the PR at the end or steer it if it’s veering off course.