
AI Engines

diffray is built on a powerful foundation of AI engines that work together to deliver comprehensive code reviews. This architecture enables a true multi-agent system where specialized agents collaborate effectively.

Why Agents, Not Just LLMs?

A common approach to AI code review is sending code to an LLM with a prompt like "review this code" and getting back a response. While simple, this approach has fundamental limitations:

The Problem with Single-Pass LLM Reviews

| Single LLM Call | Agent-Based System |
| --- | --- |
| Sees only what you send | Explores the codebase autonomously |
| One-shot generation | Iterative analysis with verification |
| Can't follow imports or dependencies | Navigates project structure intelligently |
| Hallucinations go unchecked | Validates findings with real tools |
| Fixed context window | Focuses attention where it matters |

What Makes an Agent Different?

An agent is an AI system that can:

  1. Use tools — read files, search code, run analyzers
  2. Make decisions — choose what to investigate based on findings
  3. Iterate — follow leads, verify hypotheses, dig deeper
  4. Self-correct — validate its own reasoning against real data
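
In code terms, that behavior boils down to a reason-act loop. The sketch below is a deliberately minimal illustration of such a loop, not diffray's actual implementation; the Tool interface, ModelStep type, and runModel function are assumed placeholders.

```typescript
// Minimal agent loop sketch: the model reasons, optionally requests a tool,
// and the tool result is fed back in for the next iteration.
// All names here (Tool, ModelStep, runModel) are illustrative placeholders.

interface Tool {
  name: string;                                   // e.g. "read_file", "search_code"
  run(args: Record<string, string>): Promise<string>;
}

type ModelStep =
  | { kind: "tool_call"; tool: string; args: Record<string, string> }
  | { kind: "final"; findings: string };

declare function runModel(history: string[]): Promise<ModelStep>; // LLM call, abstracted away

async function reviewWithAgent(task: string, tools: Tool[], maxSteps = 20): Promise<string> {
  const history: string[] = [task];

  for (let step = 0; step < maxSteps; step++) {
    const decision = await runModel(history);     // decide: investigate further or conclude
    if (decision.kind === "final") return decision.findings;

    const tool = tools.find((t) => t.name === decision.tool);
    const observation = tool
      ? await tool.run(decision.args)             // verify against real data
      : `Unknown tool: ${decision.tool}`;
    history.push(`Tool ${decision.tool} returned:\n${observation}`);
  }
  return "Review incomplete: step budget exhausted.";
}
```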

When diffray reviews your PR, agents don't just "look at the diff" — they:

  • Trace dependencies — follow imports to understand how changed code affects the system
  • Check related files — examine tests, configs, and documentation
  • Verify assumptions — run static analysis to confirm suspected issues
  • Cross-reference — look up type definitions, API contracts, and conventions

Granular Context, Focused Attention

A single LLM reviewing all aspects of code simultaneously faces a fundamental problem: context dilution. As it tries to check security, performance, bugs, and style all at once, its attention spreads thin. The more concerns it juggles, the more likely it is to miss issues.

diffray solves this with specialized agents, each with its own narrow focus, and intelligent context curation that ensures every agent receives precisely the information it needs. Each agent:

  • Starts fresh — clean context window, no accumulated fatigue
  • Stays focused — one job, done thoroughly
  • Goes deep — can spend full context on its specialty
  • Doesn't drift — no risk of forgetting its purpose mid-review

This is similar to having a team of specialists vs. one generalist trying to do everything. A security expert who only looks for vulnerabilities will catch more than someone splitting attention across 10 different concerns.
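
As a loose sketch of that division of labor, each specialist can be described by a narrow focus and handed a freshly curated context. Everything below (the Specialist shape, the filtering heuristics, runSpecialist) is illustrative, not diffray's real interfaces.

```typescript
// Illustrative only: each specialist gets a fresh, narrowly curated context and
// runs independently, instead of one model juggling every concern at once.
interface Specialist {
  name: string;                        // e.g. "security", "performance", "bugs"
  concerns: string[];                  // keywords describing its focus
}

interface CuratedContext {
  agent: string;
  files: string[];                     // only files relevant to this specialty
  rules: string[];                     // only rules relevant to this specialty
}

declare function runSpecialist(ctx: CuratedContext): Promise<string[]>; // findings, abstracted

function curate(agent: Specialist, changedFiles: string[], rules: string[]): CuratedContext {
  return {
    agent: agent.name,
    files: changedFiles,               // a real system would filter by relevance here
    rules: rules.filter((r) => agent.concerns.some((c) => r.toLowerCase().includes(c))),
  };
}

async function reviewInParallel(specialists: Specialist[], files: string[], rules: string[]) {
  // Fresh context per agent, all running simultaneously.
  const results = await Promise.all(
    specialists.map((s) => runSpecialist(curate(s, files, rules))),
  );
  return results.flat();
}
```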

Real Example

Consider a function signature change in a PR:

Single LLM approach: "This changes the return type, make sure callers are updated" (generic advice)

Agent approach:

  1. Searches for all usages of this function across the codebase
  2. Identifies 3 call sites that now have type mismatches
  3. Checks if tests cover these scenarios
  4. Reports specific files and line numbers with concrete impact analysis

The difference is between speculation and investigation.
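
In tool terms, that investigation could look roughly like the sketch below. It assumes ripgrep (rg) and the TypeScript compiler are available on PATH, and the function name parseConfig is a made-up example.

```typescript
// Illustrative sketch: turn "callers may break" into concrete file/line evidence.
import { execFileSync } from "node:child_process";

function findCallSites(symbol: string): string[] {
  try {
    // rg -n -F prints "path:line:match" for each literal occurrence
    const out = execFileSync("rg", ["-n", "-F", `${symbol}(`, "--glob", "*.ts"], { encoding: "utf8" });
    return out.split("\n").filter(Boolean);
  } catch {
    return []; // rg exits non-zero when there are no matches
  }
}

function typeErrors(): string[] {
  try {
    execFileSync("tsc", ["--noEmit"], { encoding: "utf8" });
    return [];
  } catch (err: any) {
    // tsc exits non-zero when type errors exist; stdout lists them per file
    return String(err.stdout ?? "").split("\n").filter((l) => l.includes("error TS"));
  }
}

const callers = findCallSites("parseConfig");
const broken = typeErrors().filter((e) => callers.some((c) => e.startsWith(c.split(":")[0])));
console.log(`Found ${callers.length} call sites, ${broken.length} with new type errors.`);
```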

Core Engine

The core engine provides the foundation for the entire multi-agent system:

Advanced Language Models

We use the latest frontier models from Anthropic — Haiku 4.5, Sonnet 4.5, and Opus 4.5 — currently the most capable AI models for understanding and analyzing code. Each task within the review pipeline is matched with the optimal model:

  • Complex analysis tasks — largest models for deep reasoning and understanding
  • Pattern matching — faster models for quick checks
  • Validation passes — specialized models for verification

This model selection approach ensures both high-quality analysis and efficient processing.
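
A rough sketch of what such routing can look like; the task categories and model identifiers below are placeholders, not diffray's actual configuration.

```typescript
// Illustrative task-to-model routing: heavier reasoning goes to larger models,
// quick checks to smaller ones. Model IDs are placeholders.
type ReviewTask = "deep-analysis" | "pattern-match" | "validation" | "summarize";

const MODEL_FOR_TASK: Record<ReviewTask, string> = {
  "deep-analysis": "opus-class-model",     // complex reasoning over intertwined changes
  "pattern-match": "haiku-class-model",    // fast, inexpensive checks
  validation: "sonnet-class-model",        // verification and rescoring passes
  summarize: "sonnet-class-model",         // change summaries
};

function pickModel(task: ReviewTask): string {
  return MODEL_FOR_TASK[task];
}
```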

File Search System

The core includes an advanced file search system that quickly navigates codebases of any size:

  • Smart pattern matching — finds relevant files instantly across thousands of files
  • Context-aware search — understands code structure, not just text
  • Efficient exploration — minimizes API calls while maximizing coverage
  • Parallel search — multiple search strategies run simultaneously

This means agents can quickly locate dependencies, related code, and project context to validate their findings.
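
The sketch below illustrates the parallel-strategy idea in simplified form, fanning out several ripgrep queries at once and merging the hits; the strategies themselves are hypothetical.

```typescript
// Illustrative: run several search strategies concurrently and merge the results.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function rgFiles(args: string[]): Promise<string[]> {
  try {
    const { stdout } = await run("rg", args);
    return stdout.split("\n").filter(Boolean);
  } catch {
    return []; // rg exits non-zero when nothing matches
  }
}

async function locateRelevantFiles(symbol: string): Promise<string[]> {
  // Three hypothetical strategies, all in flight at the same time.
  const [definitions, usages, tests] = await Promise.all([
    rgFiles(["-l", "-F", `function ${symbol}`]),          // where it is defined
    rgFiles(["-l", "-F", `${symbol}(`]),                  // where it is called
    rgFiles(["-l", "-F", symbol, "--glob", "*test*"]),    // tests that mention it
  ]);
  return [...new Set([...definitions, ...usages, ...tests])];
}
```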

Task Management System

A built-in task tracking system ensures thorough, consistent reviews:

  • Structured checklists — every review follows a comprehensive process
  • No missed steps — the system tracks what's been analyzed and what remains
  • Rule adherence — custom rules are never forgotten during analysis
  • Progress visibility — clear tracking of what each agent has completed

This prevents agents from overlooking issues or skipping important checks.
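
A simplified sketch of that kind of tracking; the step names in the usage example are hypothetical.

```typescript
// Illustrative checklist tracker: every review step is recorded so nothing is skipped.
type StepStatus = "pending" | "in_progress" | "done";

class ReviewChecklist {
  private steps = new Map<string, StepStatus>();

  constructor(stepNames: string[]) {
    for (const name of stepNames) this.steps.set(name, "pending");
  }

  start(name: string) { this.steps.set(name, "in_progress"); }
  complete(name: string) { this.steps.set(name, "done"); }

  remaining(): string[] {
    return [...this.steps].filter(([, s]) => s !== "done").map(([n]) => n);
  }
}

// Hypothetical usage:
const checklist = new ReviewChecklist(["apply-custom-rules", "security-scan", "check-tests"]);
checklist.start("security-scan");
checklist.complete("security-scan");
console.log(checklist.remaining()); // ["apply-custom-rules", "check-tests"]
```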

Tooling Engine

The tooling engine provides agents with the ability to verify their hypotheses using real code analysis tools. Rather than relying solely on AI pattern matching, agents can invoke specialized tools to confirm issues exist.

Integrated Tools

| Tool | Purpose | Detects |
| --- | --- | --- |
| TruffleHog | Secrets detection | API keys, credentials, tokens, private keys |
| Semgrep | Static analysis | Injection, auth issues, code quality |
| TypeScript Compiler | Type checking | Type mismatches, missing properties |
| ESLint/Biome | Linting | Code style, potential bugs |
| Dependency scanners | Vulnerability detection | Known CVEs in dependencies |

How AI + Tools Work Together

| Approach | Strengths | Limitations |
| --- | --- | --- |
| AI-only | Context awareness, reasoning | May miss edge cases |
| Tools-only | Precise detection | High false positives, no context |
| AI + Tools | Accurate detection plus intelligent filtering | None of the above; the best of both |

Hypothesis Verification

When an agent suspects a problem, it can run targeted analysis:

Agent: "This looks like SQL injection..."
→ Runs Semgrep with sql-injection rules
→ Tool confirms: "Unparameterized query at line 45"
→ Agent: Reports with concrete evidence

This dramatically reduces false positives by grounding AI analysis in concrete tool output.
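
As a concrete sketch of this pattern, the verification step could shell out to Semgrep and parse its JSON report. The ruleset name and file path below are illustrative, not a prescribed configuration.

```typescript
// Sketch of grounding a suspicion in tool output. Assumes semgrep is installed;
// the ruleset name "p/sql-injection" is illustrative and may differ.
import { execFileSync } from "node:child_process";

interface SemgrepResult {
  check_id: string;
  path: string;
  start: { line: number };
  extra: { message: string };
}

function verifySqlInjection(file: string): SemgrepResult[] {
  const out = execFileSync(
    "semgrep",
    ["--config", "p/sql-injection", "--json", file],
    { encoding: "utf8" },
  );
  const report = JSON.parse(out) as { results: SemgrepResult[] };
  return report.results; // empty array means the hypothesis was not confirmed
}

// Hypothetical usage: only report if the tool backs up the agent's suspicion.
const hits = verifySqlInjection("src/db/queries.ts");
for (const hit of hits) {
  console.log(`${hit.path}:${hit.start.line} ${hit.check_id}: ${hit.extra.message}`);
}
```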

Tool Output Enhancement

When tools find issues, the AI:

  • Validates findings — confirms the issue is real in context
  • Filters noise — removes false positives based on actual usage
  • Explains impact — describes why the issue matters in your codebase
  • Suggests fixes — provides specific, actionable remediation

For detailed information about security-specific tools, see Security Tools.

Multi-Agent Architecture

These engines enable a sophisticated multi-agent system:

Agent Collaboration

  • Parallel execution — multiple specialized agents work simultaneously
  • Shared context — agents access the same codebase understanding
  • Finding deduplication — overlapping discoveries are merged intelligently
  • Cross-validation — agents can verify each other's findings

Phased Review Pipeline

Reviews run through a multi-phase pipeline, each phase optimized for its purpose:

Clone → Data Prep → Summarize → Triage → Rules → Review → Deduplication → Validation → Report
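
At its simplest, the pipeline can be thought of as an ordered list of phases threading shared state, as in this rough sketch (the state shape is illustrative):

```typescript
// Illustrative: the review as an ordered sequence of phases over shared state.
interface ReviewState {
  repoPath?: string;
  findings: unknown[];
  // ...plus intermediate artifacts (dependency graph, summaries, rules)
}

interface Phase {
  name: string;                         // "clone", "data-prep", "triage", ...
  run(state: ReviewState): Promise<ReviewState>;
}

async function runPipeline(phases: Phase[], initial: ReviewState): Promise<ReviewState> {
  let state = initial;
  for (const phase of phases) {
    state = await phase.run(state);     // each phase enriches or prunes the state
  }
  return state;
}
```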

Phase 1: Clone

Fetches the repository and checks out the PR branch. This creates a clean working environment for analysis.

Phase 2: Data Preparation

Builds a comprehensive understanding of your codebase:

  • Dependency graph — maps how files connect through imports, exports, and type definitions
  • Call chains — traces function calls across the codebase
  • File classification — identifies file types, frameworks, and patterns in use
  • Change impact analysis — determines which parts of the codebase are affected by the PR
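
A hedged sketch of the import-graph piece of this phase; the regex-based extraction is a simplification, and a real implementation would use the TypeScript compiler API and resolved paths.

```typescript
// Simplified illustration: build a file-level dependency graph from import statements.
import { readFileSync } from "node:fs";

type DependencyGraph = Map<string, string[]>; // file -> imported specifiers

function buildImportGraph(files: string[]): DependencyGraph {
  const graph: DependencyGraph = new Map();
  const importRe = /import\s+(?:[\s\S]*?\s+from\s+)?["']([^"']+)["']/g;

  for (const file of files) {
    const source = readFileSync(file, "utf8");
    const imports = [...source.matchAll(importRe)].map((m) => m[1]);
    graph.set(file, imports);
  }
  return graph;
}

// Change impact (illustrative): files whose imports point at a changed module.
function affectedBy(graph: DependencyGraph, changedModule: string): string[] {
  return [...graph].filter(([, deps]) => deps.includes(changedModule)).map(([f]) => f);
}
```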

Phase 3: Summarize

The LLM generates a high-level summary of the changes:

  • Change categorization — groups changes by type (new feature, bug fix, refactor, etc.)
  • Scope assessment — identifies the breadth and depth of modifications
  • Risk signals — flags potentially high-impact areas for closer review

Phase 4: Triage

Routes files to the appropriate specialized agents:

  • Agent matching — determines which agents are relevant for each file
  • Priority assignment — orders files by potential impact and complexity
  • Context bundling — prepares the right context for each agent's review
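
Roughly, that routing might look like the sketch below; the agent names, path heuristics, and priority formula are all hypothetical.

```typescript
// Illustrative triage: decide which specialist agents should look at each changed file.
interface ChangedFile {
  path: string;
  additions: number;
  deletions: number;
}

interface ReviewAssignment {
  file: string;
  agents: string[];
  priority: number;
}

function triage(files: ChangedFile[]): ReviewAssignment[] {
  return files
    .map((f) => {
      const agents = ["bugs"];                           // everyone gets a bug pass
      if (/auth|crypto|login/i.test(f.path)) agents.push("security");
      if (/query|loop|cache/i.test(f.path)) agents.push("performance");
      return {
        file: f.path,
        agents,
        priority: f.additions + f.deletions,             // larger changes reviewed first
      };
    })
    .sort((a, b) => b.priority - a.priority);
}
```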

Phase 5: Rules

Loads and filters project-specific rules:

  • Rule discovery — finds rules from .diffray/rules/ and default rule sets
  • Relevance filtering — agents filter rules to only those applicable to current changes
  • Priority weighting — orders rules by importance for the specific review context
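
A simplified sketch of this phase; the assumption that rules are Markdown files under .diffray/rules/ with a scope pattern is illustrative, and the real rule format may differ.

```typescript
// Illustrative: load custom rules from .diffray/rules/ and keep only those that
// apply to the files changed in this PR. The "appliesTo" scope field is assumed.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface Rule {
  name: string;
  text: string;
  appliesTo: RegExp; // assumed scope pattern; real rules may express scope differently
}

function loadRules(dir = ".diffray/rules"): Rule[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".md"))
    .map((f) => ({
      name: f,
      text: readFileSync(join(dir, f), "utf8"),
      appliesTo: /\.(ts|tsx)$/,      // placeholder scope: TypeScript files only
    }));
}

function relevantRules(rules: Rule[], changedFiles: string[]): Rule[] {
  return rules.filter((r) => changedFiles.some((f) => r.appliesTo.test(f)));
}
```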

Phase 6: Review

Specialized agents analyze different aspects in parallel:

  • Parallel execution — multiple agents work simultaneously for speed
  • Focused analysis — each agent applies its specialty (security, performance, bugs, etc.)
  • Tool integration — agents can invoke static analyzers to verify findings
  • Evidence gathering — agents collect specific file locations and code references

Phase 7: Deduplication

Merges and rescores overlapping findings:

  • Similarity detection — identifies when multiple agents found the same issue
  • Consensus scoring — issues found by multiple agents get higher confidence
  • Finding consolidation — merges duplicate reports into single, comprehensive findings
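
A rough sketch of merge-and-rescore; bucketing by file and nearby line number is a stand-in for whatever similarity measure the real system uses.

```typescript
// Illustrative deduplication: findings from different agents that point at the
// same file and nearby lines are merged, and agreement raises confidence.
interface Finding {
  agent: string;
  file: string;
  line: number;
  message: string;
  confidence: number; // 0..1
}

function deduplicate(findings: Finding[]): Finding[] {
  const merged = new Map<string, Finding>();

  for (const f of findings) {
    const key = `${f.file}:${Math.round(f.line / 5)}`;  // bucket nearby lines together
    const existing = merged.get(key);
    if (!existing) {
      merged.set(key, { ...f });
    } else {
      // Consensus: another agent found the same issue, so boost confidence.
      existing.confidence = Math.min(1, existing.confidence + 0.2);
      existing.agent = `${existing.agent}+${f.agent}`;
    }
  }
  return [...merged.values()];
}
```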

Phase 8: Validation

Verifies issues and rescores confidence:

  • False positive filtering — removes issues that don't hold up to scrutiny
  • Confidence recalculation — adjusts scores based on validation results
  • Evidence verification — confirms that reported issues actually exist in the code
  • Severity assessment — finalizes issue severity based on validated impact

Phase 9: Report

Generates the final PR comments:

  • Comment formatting — structures findings for readability
  • Code references — links directly to relevant lines in the PR
  • Actionable suggestions — provides specific recommendations for fixes
  • Summary generation — creates an overview of all findings
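
A brief sketch of rendering validated findings into a PR comment; the finding shape and Markdown layout are illustrative only.

```typescript
// Illustrative: render validated findings as a Markdown PR comment.
interface ReportedFinding {
  file: string;
  line: number;
  severity: "low" | "medium" | "high";
  message: string;
  suggestion?: string;
}

function formatComment(findings: ReportedFinding[]): string {
  const lines = ["## Review summary", "", `${findings.length} finding(s) after validation.`, ""];
  for (const f of findings) {
    lines.push(`- **[${f.severity}]** \`${f.file}:${f.line}\` ${f.message}`);
    if (f.suggestion) lines.push(`  - Suggested fix: ${f.suggestion}`);
  }
  return lines.join("\n");
}
```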

Continuous Evolution

The engines evolve with the latest advances:

  • New models — latest AI capabilities integrated as they become available
  • Tool updates — static analyzers kept current with language evolution
  • Rule refinement — review rules continuously improved based on feedback
  • Performance optimization — faster reviews without sacrificing quality

The result? A multi-agent system that combines AI reasoning with concrete code analysis — delivering accurate, verified findings instead of speculation.