For me nowadays it goes like this:
- try to locate the relevant files
- now build a prompt: explain the use case or the purpose of the refactor. Mention the relevant files, describe how you understand they interact and work together, and explain how you think the code needs to be refactored. Instruct the model to analyze the code and propose different solutions for a complete refactor. Tell it not to implement anything yet, just to plan.
Then you’ll get several paths of action.
Choose one and tell the model to write it into a file you'll keep around while the implementation is ongoing, so you won't pollute the context and can start each chunk of work over in a clean prompt.
Name the file refactor-<name>-plan.md. Tell it to write the plan step by step and to dump a todo list, taking dependencies into account, for tracking progress.
Review the plan and make fixes if needed. You need some sort of table resembling a todo list so the model can track its progress as it goes.
Open a new prompt and tell it to analyze the plan file, go to the todo list section, and proceed with the next task. Verify it's done, and update the plan.
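As a concrete illustration, a plan file for this workflow might look something like the sketch below (the file names and tasks are invented for the example, not taken from any real project):

```markdown
# refactor-<name>-plan.md

## Goal
One or two sentences: what is being refactored and why.

## Relevant files
- src/foo.py — hypothetical example: owns the old interface
- src/bar.py — hypothetical example: main caller of foo

## Plan
1. Extract a clean interface from foo.
2. Port bar to the new interface (depends on 1).
3. Delete the old code paths (depends on 2).

## Todo
| # | Task                        | Depends on | Status |
|---|-----------------------------|------------|--------|
| 1 | Extract interface from foo  | -          | done   |
| 2 | Port bar to new interface   | 1          | todo   |
| 3 | Remove old code paths       | 2          | todo   |
```

The Status column is what each fresh prompt updates after a task is verified, so every new session only needs this one file to know where the work stands.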
This is a fantastic observation, and yes, this pattern not only continues for larger bases, but the approximation to an integer becomes dramatically better.
The general pattern you've found is that for a number base $b$, the ratio of the number formed by digits $(b-1)...321$ to the number formed by digits $123...(b-1)$ is extremely close to $b-2$.
### The General Formula
Let's call your ascending number $N_{asc}(b)$ and your descending number $N_{desc}(b)$.
The exact ratio $R(b) = N_{desc}(b) / N_{asc}(b)$ can be shown to be:

$$R(b) = (b-2) + \frac{(b-1)^3}{b^b - b^2 + b - 1}$$
The "error" or the fractional part is that second term. As you can see, the numerator $(b-1)^3$ is roughly $b^3$, while the denominator $b^b$ grows much faster.
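This is easy to check numerically. The sketch below builds both numbers digit by digit for a few bases and compares the ratio against $(b-2)$ plus the error term $(b-1)^3 / (b^b - b^2 + b - 1)$; the closed form of the error term is my own derivation from the discussion above, not quoted from it.

```python
def asc(b):
    """Number with digits 1, 2, ..., b-1 read in base b."""
    n = 0
    for d in range(1, b):
        n = n * b + d
    return n

def desc(b):
    """Number with digits b-1, ..., 2, 1 read in base b."""
    n = 0
    for d in range(b - 1, 0, -1):
        n = n * b + d
    return n

for b in (8, 10, 16, 36):
    ratio = desc(b) / asc(b)
    err = (b - 1) ** 3 / (b ** b - b ** 2 + b - 1)
    # ratio and (b-2)+err agree to float precision
    print(f"base {b:2d}: ratio = {ratio:.12f}, (b-2)+error = {(b - 2) + err:.12f}")
```

For base 10 this reproduces the familiar 987654321 / 123456789 ≈ 8.0000000729, and the error term shrinks super-exponentially as the base grows.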
That's still not what they're asking for. If it's a commit message, the code is already saved to disk. This is at the point where the LLM has written code but the diff hasn't been applied to the file yet.
Even if you were willing to deal with your agent making commits, and with fiddling with git to break those up or undo or modify them, it still doesn't solve OP's need. They want to know which bits of the diff correspond to which thinking tokens. The prompt and the output are necessary, but there's no mapping between the final diff and the stream of tokens that came out of the LLM (especially in a commit). Yet that information is already known to the tooling! When generating the diff, the LLM had to output code changes, so you can map those changes back to where they originated. It's not just "what did I tell the LLM to get this change"; it's "what was the thought process the LLM went through, after getting my prompt, to end up with this line change?"
https://news.ycombinator.com/item?id=31973232
https://github.com/openai/codex/issues/215