AI Engines
diffray is built on a powerful foundation of AI engines that work together to deliver comprehensive code reviews. This architecture enables a true multi-agent system where specialized agents collaborate effectively.
Why Agents, Not Just LLMs?
A common approach to AI code review is sending code to an LLM with a prompt like "review this code" and getting back a response. While simple, this approach has fundamental limitations:
The Problem with Single-Pass LLM Reviews
| Single LLM Call | Agent-Based System |
|---|---|
| Sees only what you send | Explores the codebase autonomously |
| One-shot generation | Iterative analysis with verification |
| Can't follow imports or dependencies | Navigates project structure intelligently |
| Hallucinations go unchecked | Validates findings with real tools |
| Fixed context window | Focuses attention where it matters |
What Makes an Agent Different?
An agent is an AI system that can:
- Use tools — read files, search code, run analyzers
- Make decisions — choose what to investigate based on findings
- Iterate — follow leads, verify hypotheses, dig deeper
- Self-correct — validate its own reasoning against real data
When diffray reviews your PR, agents don't just "look at the diff" — they:
- Trace dependencies — follow imports to understand how changed code affects the system
- Check related files — examine tests, configs, and documentation
- Verify assumptions — run static analysis to confirm suspected issues
- Cross-reference — look up type definitions, API contracts, and conventions
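In code terms, this investigate-and-verify behavior is a loop: the model either requests another tool call or decides it has gathered enough evidence to report. Here is a minimal sketch of such a loop; all names, types, and stubs are illustrative, not diffray's internals:

```typescript
// Minimal agent loop: the model repeatedly chooses a tool to run, or
// decides it has enough evidence and returns findings. Everything here
// is an illustrative stub, not diffray's actual implementation.
type ToolCall = { tool: "read_file" | "search_code" | "run_analyzer"; args: string };
type AgentStep =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; findings: string[] };

// Stand-in for a real model invocation over the transcript so far.
async function askModel(transcript: string[]): Promise<AgentStep> {
  return { kind: "final", findings: ["(stub) no issues found"] };
}

// Stand-in for real tool execution (file reads, code search, analyzers).
async function executeTool(call: ToolCall): Promise<string> {
  return `(stub) output of ${call.tool}(${call.args})`;
}

async function reviewLoop(diff: string, maxSteps = 20): Promise<string[]> {
  const transcript = [`Review this diff:\n${diff}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = await askModel(transcript);
    if (next.kind === "final") return next.findings; // agent decided it is done
    // Feed the tool's observation back so the next decision builds on it.
    transcript.push(await executeTool(next.call));
  }
  return ["Review incomplete: step budget exhausted"];
}
```

The loop is what lets the agent follow leads: each tool observation changes what it asks for next.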
Granular Context, Focused Attention
A single LLM reviewing all aspects of code simultaneously faces a fundamental problem: context dilution. As it tries to check security, performance, bugs, and style all at once, its attention spreads thin. The more concerns it juggles, the more likely it is to miss issues.
diffray solves this with specialized agents, each with its own narrow focus, and intelligent context curation that ensures every agent receives precisely the information it needs. Each agent:
- Starts fresh — clean context window, no accumulated fatigue
- Stays focused — one job, done thoroughly
- Goes deep — can spend full context on its specialty
- Doesn't drift — no risk of forgetting its purpose mid-review
This is similar to having a team of specialists vs. one generalist trying to do everything. A security expert who only looks for vulnerabilities will catch more than someone splitting attention across 10 different concerns.
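As a sketch of what this separation can look like in practice (the agent roster and the `reviewFile` signature are invented for illustration, not diffray's actual agents):

```typescript
// Each specialist gets its own narrow instructions and a fresh context.
// Names and signatures are illustrative, not diffray's actual agents.
interface Finding { agent: string; message: string; file: string; line: number }

const specialists = [
  { name: "security",    focus: "Report only injection, auth, and secrets issues." },
  { name: "performance", focus: "Report only algorithmic and I/O inefficiencies." },
  { name: "correctness", focus: "Report only logic bugs and unhandled edge cases." },
];

// Stand-in for a model call; every invocation starts from a clean context.
async function reviewFile(agent: string, focus: string, file: string): Promise<Finding[]> {
  return []; // a real implementation would prompt the model here
}

async function reviewWithSpecialists(file: string): Promise<Finding[]> {
  // Each agent sees only its own instructions plus the file, so attention
  // never dilutes across unrelated concerns.
  const results = await Promise.all(
    specialists.map((s) => reviewFile(s.name, s.focus, file))
  );
  return results.flat();
}
```

Because each `reviewFile` call starts from a clean context, no specialist pays an attention cost for the others' concerns.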
Real Example
Consider a function signature change in a PR:
Single LLM approach: "This changes the return type, make sure callers are updated" (generic advice)
Agent approach:
- Searches for all usages of this function across the codebase
- Identifies 3 call sites that now have type mismatches
- Checks if tests cover these scenarios
- Reports specific files and line numbers with concrete impact analysis
The difference is between speculation and investigation.
Core Engine
The core engine provides the foundation for the entire multi-agent system:
Advanced Language Models
We use the latest frontier models from Anthropic — Haiku 4.5, Sonnet 4.5, and Opus 4.5 — currently the most capable AI models for understanding and analyzing code. Each task within the review pipeline is matched with the optimal model:
- Complex analysis tasks — largest models for deep reasoning and understanding
- Pattern matching — faster models for quick checks
- Validation passes — specialized models for verification
This model selection approach ensures both high-quality analysis and efficient processing.
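As a rough illustration of task-to-model routing (the mapping is invented, and the abbreviated model names should be checked against Anthropic's documentation for exact identifiers):

```typescript
// Illustrative task-to-model routing table; the actual assignments
// diffray uses are not shown here, and model IDs are abbreviated.
type ReviewTaskKind = "deep_analysis" | "pattern_check" | "validation_pass";

const MODEL_FOR_TASK: Record<ReviewTaskKind, string> = {
  deep_analysis:   "claude-opus-4-5",   // largest model: deep reasoning
  pattern_check:   "claude-haiku-4-5",  // fastest model: quick checks
  validation_pass: "claude-sonnet-4-5", // balanced model: verification
};

function modelFor(task: ReviewTaskKind): string {
  return MODEL_FOR_TASK[task];
}
```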
Intelligent File Search
The core includes an advanced file search system that quickly navigates codebases of any size:
- Smart pattern matching — finds relevant files instantly across thousands of files
- Context-aware search — understands code structure, not just text
- Efficient exploration — minimizes API calls while maximizing coverage
- Parallel search — multiple search strategies run simultaneously
This means agents can quickly locate dependencies, related code, and project context to validate their findings.
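A simplified sketch of the parallel-strategy idea, with stubbed strategies and an invented scoring scheme:

```typescript
// Run several search strategies at once and merge the results.
// Strategies here are stubs; scoring and names are illustrative.
type SearchHit = { file: string; score: number };
type Strategy = (query: string) => Promise<SearchHit[]>;

const byFilename: Strategy = async () => []; // e.g. glob/filename matches
const byFullText: Strategy = async () => []; // e.g. literal text matches
const bySymbol: Strategy   = async () => []; // e.g. symbol-table lookups

async function findRelevantFiles(query: string): Promise<string[]> {
  // All strategies run simultaneously; a slow one never blocks the rest.
  const hits = (
    await Promise.all([byFilename, byFullText, bySymbol].map((s) => s(query)))
  ).flat();

  // Deduplicate by file, keeping the best score, then rank.
  const best = new Map<string, number>();
  for (const h of hits) best.set(h.file, Math.max(best.get(h.file) ?? 0, h.score));
  return [...best.entries()].sort((a, b) => b[1] - a[1]).map(([file]) => file);
}
```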
Task Management System
A built-in task tracking system ensures thorough, consistent reviews:
- Structured checklists — every review follows a comprehensive process
- No missed steps — the system tracks what's been analyzed and what remains
- Rule adherence — custom rules are never forgotten during analysis
- Progress visibility — clear tracking of what each agent has completed
This prevents agents from overlooking issues or skipping important checks.
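In spirit, the tracker is a small state machine over checklist items; the shape below is assumed rather than diffray's actual schema:

```typescript
// Small checklist tracker: a review can only finish once every tracked
// step is done. The shape is assumed, not diffray's actual schema.
type TaskState = "pending" | "in_progress" | "done";
interface ReviewTask { id: string; description: string; state: TaskState }

class ReviewChecklist {
  private tasks = new Map<string, ReviewTask>();

  add(id: string, description: string): void {
    this.tasks.set(id, { id, description, state: "pending" });
  }
  start(id: string): void { this.get(id).state = "in_progress"; }
  complete(id: string): void { this.get(id).state = "done"; }

  // Anything still pending or in progress blocks review completion,
  // which is what prevents skipped steps or forgotten custom rules.
  remaining(): ReviewTask[] {
    return [...this.tasks.values()].filter((t) => t.state !== "done");
  }

  private get(id: string): ReviewTask {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`Unknown task: ${id}`);
    return task;
  }
}
```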
Tooling Engine
The tooling engine provides agents with the ability to verify their hypotheses using real code analysis tools. Rather than relying solely on AI pattern matching, agents can invoke specialized tools to confirm issues exist.
Integrated Tools
| Tool | Purpose | Detects |
|---|---|---|
| TruffleHog | Secrets detection | API keys, credentials, tokens, private keys |
| Semgrep | Static analysis | Injection, auth issues, code quality |
| TypeScript Compiler | Type checking | Type mismatches, missing properties |
| ESLint/Biome | Linting | Code style, potential bugs |
| Dependency scanners | Vulnerability detection | Known CVEs in dependencies |
How AI + Tools Work Together
| Approach | Strengths | Limitations |
|---|---|---|
| AI-only | Context awareness, reasoning | May miss edge cases |
| Tools-only | Precise detection | High false positives, no context |
| AI + Tools | Best of both: accurate detection + intelligent filtering | Minimal |
Hypothesis Verification
When an agent suspects a problem, it can run targeted analysis:
```
Agent: "This looks like SQL injection..."
  → Runs Semgrep with sql-injection rules
  → Tool confirms: "Unparameterized query at line 45"
  → Agent: Reports with concrete evidence
```
This dramatically reduces false positives by grounding AI analysis in concrete tool output.
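A sketch of what such a targeted check can look like using Semgrep's CLI. The ruleset name `p/sql-injection` is an assumption (consult the Semgrep registry for available rule packs), and the parsed fields follow Semgrep's documented JSON output:

```typescript
// Invoke Semgrep on one file and turn its JSON report into citable
// evidence. The ruleset name is assumed; verify it against the Semgrep
// registry before relying on it.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function verifySqlInjection(file: string): Promise<string[]> {
  // --json produces machine-readable output the agent can parse.
  const { stdout } = await run("semgrep", [
    "scan", "--config", "p/sql-injection", "--json", file,
  ]);
  const report = JSON.parse(stdout) as {
    results: { check_id: string; path: string; start: { line: number } }[];
  };
  // Each result names a rule and an exact location: concrete evidence,
  // not a hunch.
  return report.results.map((r) => `${r.check_id} at ${r.path}:${r.start.line}`);
}
```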
Tool Output Enhancement
When tools find issues, AI:
- Validates findings — confirms the issue is real in context
- Filters noise — removes false positives based on actual usage
- Explains impact — describes why the issue matters in your codebase
- Suggests fixes — provides specific, actionable remediation
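Sketched as a pipeline stage, with `validateFinding` standing in for the model call and all shapes invented for illustration:

```typescript
// Pass raw tool findings through an AI validation step; keep only the
// ones that survive, enriched with impact and a suggested fix.
interface ToolFinding { ruleId: string; file: string; line: number; snippet: string }
interface Verdict { real: boolean; impact: string; suggestedFix: string }

// Stand-in for a model call that judges the finding in its code context.
async function validateFinding(f: ToolFinding, context: string): Promise<Verdict> {
  return { real: true, impact: "(stub)", suggestedFix: "(stub)" };
}

async function enhanceFindings(
  findings: ToolFinding[],
  readContext: (f: ToolFinding) => Promise<string>,
) {
  const kept: (ToolFinding & { impact: string; fix: string })[] = [];
  for (const f of findings) {
    const verdict = await validateFinding(f, await readContext(f));
    if (!verdict.real) continue; // filter noise the tool flagged out of context
    kept.push({ ...f, impact: verdict.impact, fix: verdict.suggestedFix });
  }
  return kept;
}
```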
For detailed information about security-specific tools, see Security Tools.
Multi-Agent Architecture
These engines enable a sophisticated multi-agent system:
Agent Collaboration
- Parallel execution — multiple specialized agents work simultaneously
- Shared context — agents access the same codebase understanding
- Finding deduplication — overlapping discoveries are merged intelligently
- Cross-validation — agents can verify each other's findings
Phased Review Pipeline
Reviews run through a multi-phase pipeline, each phase optimized for its purpose:
Clone → Data Prep → Summarize → Triage → Rules → Review → Deduplication → Validation → Report
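Conceptually, the pipeline is a chain of async phases, each consuming the previous phase's output. The stubs below only mirror the diagram; the real phases are described next:

```typescript
// Phase chain sketch. Every function here is a stub whose name mirrors
// the diagram above; real signatures and payloads are illustrative.
type Ctx = Record<string, unknown>;

const clone       = async (prUrl: string): Promise<Ctx> => ({ prUrl });
const prepareData = async (c: Ctx): Promise<Ctx> => c; // dependency graph, call chains
const summarize   = async (c: Ctx): Promise<Ctx> => c; // high-level change summary
const triage      = async (c: Ctx): Promise<Ctx> => c; // route files to agents
const loadRules   = async (c: Ctx): Promise<Ctx> => c; // project-specific rules
const review      = async (c: Ctx): Promise<Ctx> => c; // agents run in parallel here
const deduplicate = async (c: Ctx): Promise<Ctx> => c; // merge overlapping findings
const validate    = async (c: Ctx): Promise<Ctx> => c; // rescore and filter
const report      = async (c: Ctx): Promise<string> => JSON.stringify(c);

// Later phases always operate on the previous phase's consolidated
// output, never on raw model chatter.
async function runReview(prUrl: string): Promise<string> {
  let ctx = await clone(prUrl);
  const phases = [prepareData, summarize, triage, loadRules, review, deduplicate, validate];
  for (const phase of phases) ctx = await phase(ctx);
  return report(ctx);
}
```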
Phase 1: Clone
Fetches the repository and checks out the PR branch. This creates a clean working environment for analysis.
Phase 2: Data Preparation
Builds a comprehensive understanding of your codebase:
- Dependency graph — maps how files connect through imports, exports, and type definitions
- Call chains — traces function calls across the codebase
- File classification — identifies file types, frameworks, and patterns in use
- Change impact analysis — determines which parts of the codebase are affected by the PR
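For intuition, here is a toy version of change-impact analysis over an import graph. A real implementation would parse the AST and resolve modules properly; this regex scan is only a sketch, and it assumes import specifiers resolve to the same paths used as graph keys:

```typescript
// Toy import-graph builder and change-impact query.
import { readFileSync } from "node:fs";

function buildImportGraph(files: string[]): Map<string, string[]> {
  const graph = new Map<string, string[]>();
  const importRe = /import\s+[^'"]*['"]([^'"]+)['"]/g;
  for (const file of files) {
    const source = readFileSync(file, "utf8");
    // Assumes specifiers resolve to the same paths used as graph keys.
    graph.set(file, [...source.matchAll(importRe)].map((m) => m[1]));
  }
  return graph;
}

// Everything that directly or transitively imports a changed file.
function impactedBy(changed: string, graph: Map<string, string[]>): Set<string> {
  const impacted = new Set<string>([changed]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [file, imports] of graph) {
      if (!impacted.has(file) && imports.some((i) => impacted.has(i))) {
        impacted.add(file);
        grew = true;
      }
    }
  }
  impacted.delete(changed); // report only the affected dependents
  return impacted;
}
```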
Phase 3: Summarize
An LLM generates a high-level summary of the changes:
- Change categorization — groups changes by type (new feature, bug fix, refactor, etc.)
- Scope assessment — identifies the breadth and depth of modifications
- Risk signals — flags potentially high-impact areas for closer review
Phase 4: Triage
Routes files to the appropriate specialized agents:
- Agent matching — determines which agents are relevant for each file
- Priority assignment — orders files by potential impact and complexity
- Context bundling — prepares the right context for each agent's review
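A toy router over simple file signals (the heuristics are invented purely for illustration):

```typescript
// Route changed files to relevant agents using simple signals.
// The heuristics below are invented for illustration only.
type AgentName = "security" | "performance" | "correctness" | "docs";

function agentsFor(file: string, diff: string): AgentName[] {
  const agents = new Set<AgentName>(["correctness"]); // always worth a pass
  if (/\b(query|password|token|auth)\b/i.test(diff)) agents.add("security");
  if (/\bfor\s*\(|\.map\(|await\b/.test(diff)) agents.add("performance");
  if (/\.(md|mdx)$/.test(file)) agents.add("docs");
  return [...agents];
}
```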
Phase 5: Rules
Loads and filters project-specific rules:
- Rule discovery — finds rules from `.diffray/rules/` and default rule sets
- Relevance filtering — agents filter rules to only those applicable to current changes
- Priority weighting — orders rules by importance for the specific review context
Phase 6: Review
Specialized agents analyze different aspects in parallel:
- Parallel execution — multiple agents work simultaneously for speed
- Focused analysis — each agent applies its specialty (security, performance, bugs, etc.)
- Tool integration — agents can invoke static analyzers to verify findings
- Evidence gathering — agents collect specific file locations and code references
Phase 7: Deduplication
Merges and rescores overlapping findings:
- Similarity detection — identifies when multiple agents found the same issue
- Consensus scoring — issues found by multiple agents get higher confidence
- Finding consolidation — merges duplicate reports into single, comprehensive findings
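One way to picture the merge step; the similarity key and the scoring rule here are illustrative stand-ins for diffray's actual logic:

```typescript
// Merge near-duplicate findings and boost confidence when several
// agents independently agree. Keying and scoring are illustrative.
interface AgentFinding {
  agent: string; file: string; line: number; message: string; confidence: number;
}

function deduplicateFindings(findings: AgentFinding[]): AgentFinding[] {
  const groups = new Map<string, AgentFinding[]>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}`; // a real system compares semantically
    const group = groups.get(key) ?? [];
    group.push(f);
    groups.set(key, group);
  }
  return [...groups.values()].map((group) => {
    const best = group.reduce((a, b) => (a.confidence >= b.confidence ? a : b));
    const agents = new Set(group.map((f) => f.agent));
    // Consensus: each extra agent that found the same issue raises confidence.
    const boosted = Math.min(1, best.confidence + 0.1 * (agents.size - 1));
    return { ...best, confidence: boosted };
  });
}
```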
Phase 8: Validation
Verifies issues and rescores confidence:
- False positive filtering — removes issues that don't hold up to scrutiny
- Confidence recalculation — adjusts scores based on validation results
- Evidence verification — confirms that reported issues actually exist in the code
- Severity assessment — finalizes issue severity based on validated impact
Phase 9: Report
Generates the final PR comments:
- Comment formatting — structures findings for readability
- Code references — links directly to relevant lines in the PR
- Actionable suggestions — provides specific recommendations for fixes
- Summary generation — creates an overview of all findings
Continuous Evolution
The engines evolve with the latest advances:
- New models — latest AI capabilities integrated as they become available
- Tool updates — static analyzers kept current with language evolution
- Rule refinement — review rules continuously improved based on feedback
- Performance optimization — faster reviews without sacrificing quality
The result? A multi-agent system that combines AI reasoning with concrete code analysis — delivering accurate, verified findings instead of speculation.