AI Engines
diffray is built on a powerful foundation of AI engines that work together to deliver comprehensive code reviews. This architecture enables a true multi-agent system where specialized agents collaborate effectively.
Why Agents, Not Just LLMs?
A common approach to AI code review is sending code to an LLM with a prompt like "review this code" and getting back a response. While simple, this approach has fundamental limitations:
The Problem with Single-Pass LLM Reviews
| Single LLM Call | Agent-Based System |
|---|---|
| Sees only what you send | Explores the codebase autonomously |
| One-shot generation | Iterative analysis with verification |
| Can't follow imports or dependencies | Navigates project structure intelligently |
| Hallucinations go unchecked | Validates findings with real tools |
| Fixed context window | Focuses attention where it matters |
What Makes an Agent Different?
An agent is an AI system that can:
- Use tools — read files, search code, run analyzers
- Make decisions — choose what to investigate based on findings
- Iterate — follow leads, verify hypotheses, dig deeper
- Self-correct — validate its own reasoning against real data
When diffray reviews your PR, agents don't just "look at the diff" — they:
- Trace dependencies — follow imports to understand how changed code affects the system
- Check related files — examine tests, configs, and documentation
- Verify assumptions — run static analysis to confirm suspected issues
- Cross-reference — look up type definitions, API contracts, and conventions
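In code terms, this investigate-and-verify behavior is a loop: the model either requests another tool call or decides it has gathered enough evidence to report. Here is a minimal sketch of such a loop; all names, types, and stubs are illustrative, not diffray's internals:

```typescript
// Minimal agent loop: the model repeatedly chooses a tool to run, or
// decides it has enough evidence and returns findings. Everything here
// is an illustrative stub, not diffray's actual implementation.
type ToolCall = { tool: "read_file" | "search_code" | "run_analyzer"; args: string };
type AgentStep =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; findings: string[] };

// Stand-in for a real model invocation over the transcript so far.
async function askModel(transcript: string[]): Promise<AgentStep> {
  return { kind: "final", findings: ["(stub) no issues found"] };
}

// Stand-in for real tool execution (file reads, code search, analyzers).
async function executeTool(call: ToolCall): Promise<string> {
  return `(stub) output of ${call.tool}(${call.args})`;
}

async function reviewLoop(diff: string, maxSteps = 20): Promise<string[]> {
  const transcript = [`Review this diff:\n${diff}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = await askModel(transcript);
    if (next.kind === "final") return next.findings; // agent decided it is done
    // Feed the tool's observation back so the next decision builds on it.
    transcript.push(await executeTool(next.call));
  }
  return ["Review incomplete: step budget exhausted"];
}
```

The loop is what lets the agent follow leads: each tool observation changes what it asks for next.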
Granular Context, Focused Attention
A single LLM reviewing all aspects of code simultaneously faces a fundamental problem: context dilution. As it tries to check security, performance, bugs, and style all at once, its attention spreads thin. The more concerns it juggles, the more likely it is to miss issues.
diffray solves this with specialized agents, each with its own narrow focus, and intelligent context curation that ensures every agent receives precisely the information it needs. Each agent:
- Starts fresh — clean context window, no accumulated fatigue
- Stays focused — one job, done thoroughly
- Goes deep — can spend full context on its specialty
- Doesn't drift — no risk of forgetting its purpose mid-review
This is similar to having a team of specialists vs. one generalist trying to do everything. A security expert who only looks for vulnerabilities will catch more than someone splitting attention across 10 different concerns.
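As a sketch of what this separation can look like in practice (the agent roster and the `reviewFile` signature are invented for illustration, not diffray's actual agents):

```typescript
// Each specialist gets its own narrow instructions and a fresh context.
// Names and signatures are illustrative, not diffray's actual agents.
interface Finding { agent: string; message: string; file: string; line: number }

const specialists = [
  { name: "security",    focus: "Report only injection, auth, and secrets issues." },
  { name: "performance", focus: "Report only algorithmic and I/O inefficiencies." },
  { name: "correctness", focus: "Report only logic bugs and unhandled edge cases." },
];

// Stand-in for a model call; every invocation starts from a clean context.
async function reviewFile(agent: string, focus: string, file: string): Promise<Finding[]> {
  return []; // a real implementation would prompt the model here
}

async function reviewWithSpecialists(file: string): Promise<Finding[]> {
  // Each agent sees only its own instructions plus the file, so attention
  // never dilutes across unrelated concerns.
  const results = await Promise.all(
    specialists.map((s) => reviewFile(s.name, s.focus, file))
  );
  return results.flat();
}
```

Because each `reviewFile` call starts from a clean context, no specialist pays an attention cost for the others' concerns.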
Real Example
Consider a function signature change in a PR:
Single LLM approach: "This changes the return type, make sure callers are updated" (generic advice)
Agent approach:
- Searches for all usages of this function across the codebase
- Identifies 3 call sites that now have type mismatches
- Checks if tests cover these scenarios
- Reports specific files and line numbers with concrete impact analysis
The difference is between speculation and investigation.
Core Engine
The core engine provides the foundation for the entire multi-agent system:
Advanced Language Models
We use the latest frontier models from Anthropic — Haiku 4.5, Sonnet 4.5, and Opus 4.5 — currently the most capable AI models for understanding and analyzing code. Each task within the review pipeline is matched with the optimal model:
- Complex analysis tasks — largest models for deep reasoning and understanding
- Pattern matching — faster models for quick checks
- Validation passes — specialized models for verification
This model selection approach ensures both high-quality analysis and efficient processing.
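As a rough illustration of task-to-model routing (the mapping is invented, and the abbreviated model names should be checked against Anthropic's documentation for exact identifiers):

```typescript
// Illustrative task-to-model routing table; the actual assignments
// diffray uses are not shown here, and model IDs are abbreviated.
type ReviewTaskKind = "deep_analysis" | "pattern_check" | "validation_pass";

const MODEL_FOR_TASK: Record<ReviewTaskKind, string> = {
  deep_analysis:   "claude-opus-4-5",   // largest model: deep reasoning
  pattern_check:   "claude-haiku-4-5",  // fastest model: quick checks
  validation_pass: "claude-sonnet-4-5", // balanced model: verification
};

function modelFor(task: ReviewTaskKind): string {
  return MODEL_FOR_TASK[task];
}
```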
Intelligent File Search
The core includes an advanced file search system that quickly navigates codebases of any size:
- Smart pattern matching — finds relevant files instantly across thousands of files
- Context-aware search — understands code structure, not just text
- Efficient exploration — minimizes API calls while maximizing coverage
- Parallel search — multiple search strategies run simultaneously
This means agents can quickly locate dependencies, related code, and project context to validate their findings.
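A simplified sketch of the parallel-strategy idea, with stubbed strategies and an invented scoring scheme:

```typescript
// Run several search strategies at once and merge the results.
// Strategies here are stubs; scoring and names are illustrative.
type SearchHit = { file: string; score: number };
type Strategy = (query: string) => Promise<SearchHit[]>;

const byFilename: Strategy = async () => []; // e.g. glob/filename matches
const byFullText: Strategy = async () => []; // e.g. literal text matches
const bySymbol: Strategy   = async () => []; // e.g. symbol-table lookups

async function findRelevantFiles(query: string): Promise<string[]> {
  // All strategies run simultaneously; a slow one never blocks the rest.
  const hits = (
    await Promise.all([byFilename, byFullText, bySymbol].map((s) => s(query)))
  ).flat();

  // Deduplicate by file, keeping the best score, then rank.
  const best = new Map<string, number>();
  for (const h of hits) best.set(h.file, Math.max(best.get(h.file) ?? 0, h.score));
  return [...best.entries()].sort((a, b) => b[1] - a[1]).map(([file]) => file);
}
```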
Task Management System
A built-in task tracking system ensures thorough, consistent reviews:
- Structured checklists — every review follows a comprehensive process
- No missed steps — the system tracks what's been analyzed and what remains
- Rule adherence — custom rules are never forgotten during analysis
- Progress visibility — clear tracking of what each agent has completed
This prevents agents from overlooking issues or skipping important checks.
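In spirit, the tracker is a small state machine over checklist items; the shape below is assumed rather than diffray's actual schema:

```typescript
// Small checklist tracker: a review can only finish once every tracked
// step is done. The shape is assumed, not diffray's actual schema.
type TaskState = "pending" | "in_progress" | "done";
interface ReviewTask { id: string; description: string; state: TaskState }

class ReviewChecklist {
  private tasks = new Map<string, ReviewTask>();

  add(id: string, description: string): void {
    this.tasks.set(id, { id, description, state: "pending" });
  }
  start(id: string): void { this.get(id).state = "in_progress"; }
  complete(id: string): void { this.get(id).state = "done"; }

  // Anything still pending or in progress blocks review completion,
  // which is what prevents skipped steps or forgotten custom rules.
  remaining(): ReviewTask[] {
    return [...this.tasks.values()].filter((t) => t.state !== "done");
  }

  private get(id: string): ReviewTask {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`Unknown task: ${id}`);
    return task;
  }
}
```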
Tooling Engine
The tooling engine provides agents with the ability to verify their hypotheses using real code analysis tools. Rather than relying solely on AI pattern matching, agents can invoke specialized tools to confirm issues exist.
Integrated Tools
| Tool | Purpose | Detects |
|---|---|---|
| TruffleHog | Secrets detection | API keys, credentials, tokens, private keys |
| Semgrep | Static analysis | Injection, auth issues, code quality |
| TypeScript Compiler | Type checking | Type mismatches, missing properties |
| ESLint/Biome | Linting | Code style, potential bugs |
| Dependency scanners | Vulnerability detection | Known CVEs in dependencies |
How AI + Tools Work Together
| Approach | Strengths | Limitations |
|---|---|---|
| AI-only | Context awareness, reasoning | May miss edge cases |
| Tools-only | Precise detection | High false positives, no context |
| AI + Tools | Best of both: accurate detection + intelligent filtering | Minimal |
Hypothesis Verification
When an agent suspects a problem, it can run targeted analysis:
```
Agent: "This looks like SQL injection..."
  → Runs Semgrep with sql-injection rules
  → Tool confirms: "Unparameterized query at line 45"
  → Agent: Reports with concrete evidence
```
This dramatically reduces false positives by grounding AI analysis in concrete tool output.
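A sketch of what such a targeted check can look like using Semgrep's CLI. The ruleset name `p/sql-injection` is an assumption (consult the Semgrep registry for available rule packs), and the parsed fields follow Semgrep's documented JSON output:

```typescript
// Invoke Semgrep on one file and turn its JSON report into citable
// evidence. The ruleset name is assumed; verify it against the Semgrep
// registry before relying on it.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function verifySqlInjection(file: string): Promise<string[]> {
  // --json produces machine-readable output the agent can parse.
  const { stdout } = await run("semgrep", [
    "scan", "--config", "p/sql-injection", "--json", file,
  ]);
  const report = JSON.parse(stdout) as {
    results: { check_id: string; path: string; start: { line: number } }[];
  };
  // Each result names a rule and an exact location: concrete evidence,
  // not a hunch.
  return report.results.map((r) => `${r.check_id} at ${r.path}:${r.start.line}`);
}
```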
Tool Output Enhancement
When tools find issues, AI:
- Validates findings — confirms the issue is real in context
- Filters noise — removes false positives based on actual usage
- Explains impact — describes why the issue matters in your codebase
- Suggests fixes — provides specific, actionable remediation
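Sketched as a pipeline stage, with `validateFinding` standing in for the model call and all shapes invented for illustration:

```typescript
// Pass raw tool findings through an AI validation step; keep only the
// ones that survive, enriched with impact and a suggested fix.
interface ToolFinding { ruleId: string; file: string; line: number; snippet: string }
interface Verdict { real: boolean; impact: string; suggestedFix: string }

// Stand-in for a model call that judges the finding in its code context.
async function validateFinding(f: ToolFinding, context: string): Promise<Verdict> {
  return { real: true, impact: "(stub)", suggestedFix: "(stub)" };
}

async function enhanceFindings(
  findings: ToolFinding[],
  readContext: (f: ToolFinding) => Promise<string>,
) {
  const kept: (ToolFinding & { impact: string; fix: string })[] = [];
  for (const f of findings) {
    const verdict = await validateFinding(f, await readContext(f));
    if (!verdict.real) continue; // filter noise the tool flagged out of context
    kept.push({ ...f, impact: verdict.impact, fix: verdict.suggestedFix });
  }
  return kept;
}
```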
For detailed information about security-specific tools, see Security Tools.
Multi-Agent Architecture
These engines enable a sophisticated multi-agent system:
Agent Collaboration
- Parallel execution — multiple specialized agents work simultaneously
- Shared context — agents access the same codebase understanding
- Finding deduplication — overlapping discoveries are merged intelligently
- Cross-validation — agents can verify each other's findings
Phased Review Pipeline
Reviews run through a multi-phase pipeline, each phase optimized for its purpose:
Clone → Data Prep → Summarize → Triage → Rules → Review → Deduplication → Validation → Report
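Conceptually, the pipeline is a chain of async phases, each consuming the previous phase's output. The stubs below only mirror the diagram; the real phases are described next:

```typescript
// Phase chain sketch. Every function here is a stub whose name mirrors
// the diagram above; real signatures and payloads are illustrative.
type Ctx = Record<string, unknown>;

const clone       = async (prUrl: string): Promise<Ctx> => ({ prUrl });
const prepareData = async (c: Ctx): Promise<Ctx> => c; // dependency graph, call chains
const summarize   = async (c: Ctx): Promise<Ctx> => c; // high-level change summary
const triage      = async (c: Ctx): Promise<Ctx> => c; // route files to agents
const loadRules   = async (c: Ctx): Promise<Ctx> => c; // project-specific rules
const review      = async (c: Ctx): Promise<Ctx> => c; // agents run in parallel here
const deduplicate = async (c: Ctx): Promise<Ctx> => c; // merge overlapping findings
const validate    = async (c: Ctx): Promise<Ctx> => c; // rescore and filter
const report      = async (c: Ctx): Promise<string> => JSON.stringify(c);

// Later phases always operate on the previous phase's consolidated
// output, never on raw model chatter.
async function runReview(prUrl: string): Promise<string> {
  let ctx = await clone(prUrl);
  const phases = [prepareData, summarize, triage, loadRules, review, deduplicate, validate];
  for (const phase of phases) ctx = await phase(ctx);
  return report(ctx);
}
```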
Phase 1: Clone
Fetches the repository and checks out the PR branch. This creates a clean working environment for analysis.
Phase 2: Data Preparation
Builds a comprehensive understanding of your codebase:
- Dependency graph — maps how files connect through imports, exports, and type definitions
- Call chains — traces function calls across the codebase
- File classification — identifies file types, frameworks, and patterns in use
- Change impact analysis — determines which parts of the codebase are affected by the PR
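For intuition, here is a toy version of change-impact analysis over an import graph. A real implementation would parse the AST and resolve modules properly; this regex scan is only a sketch, and it assumes import specifiers resolve to the same paths used as graph keys:

```typescript
// Toy import-graph builder and change-impact query.
import { readFileSync } from "node:fs";

function buildImportGraph(files: string[]): Map<string, string[]> {
  const graph = new Map<string, string[]>();
  const importRe = /import\s+[^'"]*['"]([^'"]+)['"]/g;
  for (const file of files) {
    const source = readFileSync(file, "utf8");
    // Assumes specifiers resolve to the same paths used as graph keys.
    graph.set(file, [...source.matchAll(importRe)].map((m) => m[1]));
  }
  return graph;
}

// Everything that directly or transitively imports a changed file.
function impactedBy(changed: string, graph: Map<string, string[]>): Set<string> {
  const impacted = new Set<string>([changed]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [file, imports] of graph) {
      if (!impacted.has(file) && imports.some((i) => impacted.has(i))) {
        impacted.add(file);
        grew = true;
      }
    }
  }
  impacted.delete(changed); // report only the affected dependents
  return impacted;
}
```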
Phase 3: Summarize
An LLM generates a high-level summary of the changes:
- Change categorization — groups changes by type (new feature, bug fix, refactor, etc.)
- Scope assessment — identifies the breadth and depth of modifications
- Risk signals — flags potentially high-impact areas for closer review
Phase 4: Triage
Routes files to the appropriate specialized agents:
- Agent matching — determines which agents are relevant for each file
- Priority assignment — orders files by potential impact and complexity
- Context bundling — prepares the right context for each agent's review
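A toy router over simple file signals (the heuristics are invented purely for illustration):

```typescript
// Route changed files to relevant agents using simple signals.
// The heuristics below are invented for illustration only.
type AgentName = "security" | "performance" | "correctness" | "docs";

function agentsFor(file: string, diff: string): AgentName[] {
  const agents = new Set<AgentName>(["correctness"]); // always worth a pass
  if (/\b(query|password|token|auth)\b/i.test(diff)) agents.add("security");
  if (/\bfor\s*\(|\.map\(|await\b/.test(diff)) agents.add("performance");
  if (/\.(md|mdx)$/.test(file)) agents.add("docs");
  return [...agents];
}
```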
Phase 5: Rules
Loads and filters project-specific rules:
- Rule discovery — finds rules from `.diffray/rules/` and default rule sets
- Relevance filtering — agents filter rules to only those applicable to current changes
- Priority weighting — orders rules by importance for the specific review context
Phase 6: Review
Specialized agents analyze different aspects in parallel:
- Parallel execution — multiple agents work simultaneously for speed
- Focused analysis — each agent applies its specialty (security, performance, bugs, etc.)
- Tool integration — agents can invoke static analyzers to verify findings
- Evidence gathering — agents collect specific file locations and code references
Phase 7: Deduplication
Merges and rescores overlapping findings:
- Similarity detection — identifies when multiple agents found the same issue
- Consensus scoring — issues found by multiple agents get higher confidence
- Finding consolidation — merges duplicate reports into single, comprehensive findings
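One way to picture the merge step; the similarity key and the scoring rule here are illustrative stand-ins for diffray's actual logic:

```typescript
// Merge near-duplicate findings and boost confidence when several
// agents independently agree. Keying and scoring are illustrative.
interface AgentFinding {
  agent: string; file: string; line: number; message: string; confidence: number;
}

function deduplicateFindings(findings: AgentFinding[]): AgentFinding[] {
  const groups = new Map<string, AgentFinding[]>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}`; // a real system compares semantically
    const group = groups.get(key) ?? [];
    group.push(f);
    groups.set(key, group);
  }
  return [...groups.values()].map((group) => {
    const best = group.reduce((a, b) => (a.confidence >= b.confidence ? a : b));
    const agents = new Set(group.map((f) => f.agent));
    // Consensus: each extra agent that found the same issue raises confidence.
    const boosted = Math.min(1, best.confidence + 0.1 * (agents.size - 1));
    return { ...best, confidence: boosted };
  });
}
```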
Phase 8: Validation
Verifies issues and rescores confidence:
- False positive filtering — removes issues that don't hold up to scrutiny
- Confidence recalculation — adjusts scores based on validation results
- Evidence verification — confirms that reported issues actually exist in the code
- Severity assessment — finalizes issue severity based on validated impact
Phase 9: Report
Generates the final PR comments:
- Comment formatting — structures findings for readability
- Code references — links directly to relevant lines in the PR
- Actionable suggestions — provides specific recommendations for fixes
- Summary generation — creates an overview of all findings
Continuous Evolution
The engines evolve with the latest advances:
- New models — latest AI capabilities integrated as they become available
- Tool updates — static analyzers kept current with language evolution
- Rule refinement — review rules continuously improved based on feedback
- Performance optimization — faster reviews without sacrificing quality
The result? A multi-agent system that combines AI reasoning with concrete code analysis — delivering accurate, verified findings instead of speculation.