Anthropic’s Claude Code Adds Multi-Agent Code Review

I merged a PR at 2 a.m. and woke up to an outage dashboard flashing red. You learn, fast and bitterly, which bugs slip past tests. That moment is the prompt behind Anthropic’s new Code Review for Claude Code.

Anthropic Launches Multi-Agent Code Review System in Claude Code

At high-velocity engineering shops, pull requests stack up faster than any single reviewer can scan.

I’ve been testing Claude Code’s new Code Review feature, and here’s the quick read: it runs a small team of AI reviewers against every PR, flags only high-confidence issues, and aims to catch production-breaking bugs before a human ever clicks approve. The system is currently in research preview and available for Team and Enterprise customers.

How does Anthropic’s Code Review work?

When a PR opens or updates, Claude Code dispatches five independent agents to analyze the diff from different angles: CLAUDE.md policy compliance, bug detection, git-history context, follow-up on previous PR comments, and verification of in-code notes. The agents fan out like detectives at a crime scene, each hunting a narrow class of mistakes.

Findings are ranked by severity and posted as inline comments tied to the exact lines they affect. To cut down on noise, only issues above an 80% confidence threshold are posted. A typical review completes in roughly 20 minutes.
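Conceptually, the fan-out-then-filter flow looks something like the sketch below. This is an illustrative model only, not Anthropic's implementation: the `Finding` type, reviewer names, and `triage` helper are all hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 80  # only findings at or above this score get posted


@dataclass
class Finding:
    reviewer: str    # which agent produced the finding (hypothetical names)
    line: int        # line in the diff the inline comment attaches to
    message: str
    confidence: int  # 0-100 score the agent assigns to its own finding


def triage(findings: list[Finding]) -> list[Finding]:
    """Drop low-confidence findings, then order the rest for posting."""
    kept = [f for f in findings if f.confidence >= CONFIDENCE_THRESHOLD]
    return sorted(kept, key=lambda f: f.confidence, reverse=True)


# Example: three agents report back; only two findings clear the bar.
findings = [
    Finding("bug-detection", 42, "possible null dereference", 95),
    Finding("claude-md-compliance", 10, "stylistic nit", 55),
    Finding("git-history", 42, "reintroduces a reverted change", 88),
]

for f in triage(findings):
    print(f"L{f.line} [{f.reviewer}] {f.message} ({f.confidence}%)")
```

The key design point the sketch captures is that filtering happens before posting: a reviewer agent can speculate freely, but only its high-confidence findings ever reach the PR as comments.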

Will Claude Code replace human reviewers?

No — the tool will not auto-approve PRs. You, as a reviewer, still make the final call. What Code Review does is triage the most dangerous, high-confidence issues so human reviewers spend their time on nuanced design and edge cases instead of chasing obvious regressions.

Is this better than the Claude Code GitHub Action?

If you’re comparing it to the open-source Claude Code GitHub Action, or to CI checks like CodeQL or Snyk, the difference is process and scale. Anthropic modeled this system on the internal review workflow they use for nearly every PR, then wrapped it in a multi-agent architecture tuned for fewer false positives and quicker, line-specific feedback.

You can tune what the agents flag by editing the CLAUDE.md file in your repo. If you want them to ignore stylistic preferences or test-coverage gaps and report only potential production breakers, that is a one-file change.
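For example, guidance along these lines in CLAUDE.md could narrow the reviewers' focus. CLAUDE.md is free-form natural-language instructions, so the exact wording and headings below are illustrative, not a required schema:

```markdown
## Code review guidance

- Only flag issues that could break production: logic errors, data loss,
  security vulnerabilities, and broken error handling.
- Do not comment on stylistic preferences (naming, formatting, import order).
- Do not flag missing test coverage.
```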

Practically, teams using GitHub, GitLab, or any CI/CD pipeline can treat Code Review as an extra safety net that runs after a PR opens or is updated. It’s aimed squarely at correctness, not formatting wars or micro-style disputes.

I’ll be watching how it performs on large monorepos and high-churn projects; the promise is fewer emergency rollbacks and faster, calmer code reviews. Are you going to let another midnight outage teach you the lesson, or try letting AI spot the obvious before the humans weigh in?