On the Future of Pull Requests in the Age of Agentic Coding
Several people have been arguing publicly that pull request review breaks down when an agent wrote the code. Thomas Dohmke makes the case when pitching Entire. Peter Steinberger has been pushing a “prompt request” framing. CodeRabbit has weighed in on the prompt-vs-PR debate. Most of this discussion either argues that a new primitive is needed (without specifying the UX) or proposes reviewing the prompt upstream of code generation. This post is about a third option: what the downstream review surface should look like once the agent transcript is part of it.
Imagine the following setup:
- You have a decent LLM.
- You have a good agentic coding harness on top of it.
- Every pull request that gets opened automatically triggers a review pass by AI agents. Those reviewers feed their comments back into the agentic harness, which iterates on the code — say, 30 rounds of back-and-forth — fixing clear mistakes and tightening obvious rough edges.
- You have a solid test suite, type checks, linters, and the usual guardrails in place.
This is not hypothetical. The pieces exist today, and many repositories are already moving in this direction.
One caveat to be honest about up front: whether the iterative loop actually closes reliably — whether the harness in fact catches the small stuff — is itself a frontier problem. Agents today still miss obvious bugs ten iterations in. The argument that follows is conditional on that gap closing. If you do not believe it ever will, none of what follows will apply.
Even if that gap closes, someone still needs to review the change before merging; the point is to make that review start from better context than a raw wall of changed lines.
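To make the setup above concrete, here is a minimal sketch of the review loop in Python. Every name in it (`run_review_loop`, `review`, `revise`, `run_checks`, `LoopResult`) is hypothetical and stands in for whatever harness, reviewer agents, and CI checks a repository actually uses; it is not the API of any real tool. The only thing it tries to show is the shape of the loop: reviewers comment, the harness revises, and the loop stops when the reviewers go quiet or the round budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    code: str
    rounds_used: int       # rounds in which reviewers left comments
    converged: bool        # True only if reviewers ran out of comments
    checks_passed: bool    # tests, type checks, linters on the final code
    history: List[List[str]] = field(default_factory=list)


def run_review_loop(
    code: str,
    review: Callable[[str], List[str]],        # hypothetical AI reviewer pass
    revise: Callable[[str, List[str]], str],   # hypothetical harness revision
    run_checks: Callable[[str], bool],         # hypothetical CI guardrails
    max_rounds: int = 30,
) -> LoopResult:
    """Feed reviewer comments back to the harness until quiet or out of rounds."""
    history: List[List[str]] = []
    converged = False
    for _ in range(max_rounds):
        comments = review(code)
        if not comments:
            converged = True
            break
        history.append(comments)
        code = revise(code, comments)
    # If the budget ran out, converged stays False: nobody re-reviewed
    # the final revision, which is exactly what a human should know.
    return LoopResult(
        code=code,
        rounds_used=len(history),
        converged=converged,
        checks_passed=run_checks(code),
        history=history,
    )
```

Note that the loop returns its comment history and a `converged` flag rather than just the final code. That is the design choice this post argues for: the process is kept as an artifact, so the human who reviews before merge can see whether the loop actually closed.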
The current PR view is for a workflow we no longer have
GitHub’s PR view is a diff with inline comments. That UI was built for a world where a human wrote those lines and another human inspected the patch before merge, and where scrutinizing each line for off-by-ones and missed edge cases was the right job. In 2026 it is unfair to expect human reviewers to be the first line-by-line reader of agent-generated code, and it does not scale. The diff is the output. The interesting object is the process that produced it.
Attach the conversation, not just the diff
Steinberger’s “prompt request”
Peter Steinberger, the founder of OpenClaw, has been arguing publicly for a while that the right unit of review in the age of agentic coding is no longer the diff but the prompt that produced it — what he calls a “prompt request” rather than a pull request (Lex Fridman interview).
The value of the prompt and the agent transcript is that they explain intent and process: what the author tried to make the agent do, which constraints were explicit, which alternatives were considered, and which checks the agent claims to have run. That is context a diff alone does not provide.
The proposal
Attach the full conversation with the agentic coding harness to each commit. Every prompt, every tool call, every plan the agent considered and rejected, every test run and how its output changed the next step: all of it is an artifact you keep alongside the code.
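One way to picture that artifact is as a small record keyed by commit. The sketch below is an assumption for illustration only: the names `SessionEvent` and `CommitSession`, the event kinds, and the JSON layout are all hypothetical, and this is not the format of Entire or of any particular harness. Where the serialized record lives (a sidecar file, a git note, something else) is left open.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical event kinds, taken from the list in the paragraph above.
EVENT_KINDS = ("prompt", "tool_call", "plan", "test_run")


@dataclass
class SessionEvent:
    kind: str               # one of EVENT_KINDS
    content: str            # prompt text, tool invocation, plan, or test output
    rejected: bool = False  # marks a plan the agent considered and dropped

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {self.kind!r}")


@dataclass
class CommitSession:
    commit_sha: str
    agent: str              # which harness produced the session
    events: List[SessionEvent] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize so the record can be stored alongside the commit."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "CommitSession":
        data = json.loads(raw)
        return cls(
            commit_sha=data["commit_sha"],
            agent=data["agent"],
            events=[SessionEvent(**e) for e in data["events"]],
        )
```

A record like this keeps rejected plans as first-class events instead of discarding them, since the alternatives the agent dropped are part of what a reviewer wants to see later.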
When the pull request is opened, a separate LLM agent reads the diff and those conversations across all the commits in the PR. It then produces a summary of the important decisions: which architectural choices were made, which alternatives were considered, which assumptions were taken on trust, which corners of the problem the agent never explored, and which parts of the diff probably deserve human attention.
The reviewer starts from that summary, then uses it to decide where to inspect the actual code, tests, and behavior carefully.
You still approve or reject based on the change that gets merged. The reasoning trace is there to make the review easier to orient, not to make the decision for you.
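As a rough illustration of what the reviewer would start from, the summary can be modeled as a structured brief with one field per question listed above. The names `ReviewBrief` and `render_brief` are hypothetical, and the sketch deliberately leaves out how the summarizing agent fills the brief in, since that depends on the model and harness in use. It only shows the output shape and one possible plain-text rendering for the top of a PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewBrief:
    architectural_choices: List[str] = field(default_factory=list)
    alternatives_considered: List[str] = field(default_factory=list)
    assumptions_on_trust: List[str] = field(default_factory=list)
    unexplored_areas: List[str] = field(default_factory=list)
    needs_human_attention: List[str] = field(default_factory=list)


def render_brief(brief: ReviewBrief) -> str:
    """Render the brief as plain text, leading with what needs a human."""
    sections = [
        ("Needs human attention", brief.needs_human_attention),
        ("Architectural choices", brief.architectural_choices),
        ("Alternatives considered", brief.alternatives_considered),
        ("Assumptions taken on trust", brief.assumptions_on_trust),
        ("Unexplored areas", brief.unexplored_areas),
    ]
    lines: List[str] = []
    for title, items in sections:
        lines.append(f"{title}:")
        if items:
            lines.extend(f"  - {item}" for item in items)
        else:
            # An empty section is shown explicitly so silence is visible.
            lines.append("  (none reported)")
    return "\n".join(lines)
```

The brief is only an index into the change. The approve-or-reject decision still rests on the code, tests, and behavior it points at.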
Already being built
The concrete primitive behind this idea is already being built by Entire.
Entire is building a CLI that hooks into your git workflow, captures the agent session on every push, and indexes those sessions alongside commits in the repo itself rather than in some external service. They support Claude Code, Cursor, Gemini CLI, OpenCode, GitHub Copilot CLI, and others. Entire frames this today as a way to read the history of why code was written a certain way. But the same plumbing is exactly what a PR review that starts from the agent transcript would sit on top of.
The diff is not going away. Sometimes you really do want to look at the code. But it should stop being the thing the review starts from. The conversation should be.