Shadow Testing Guide

Test proposed fixes against real traffic without affecting production.

What is Shadow Testing?

Shadow testing runs a proposed fix alongside your production agent:

Real queries are sent to both versions
Responses are compared
Users are not affected: they only see the production agent's responses
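
The pattern can be sketched in a few lines. This is an illustration of the general idea, not the platform's implementation: `shadowCall`, `productionAgent`, `candidateAgent`, and `log` are hypothetical names, and each agent is assumed to be an async function that maps a query to a response. The candidate call is never awaited on the user's path, so a slow or failing fix cannot change what the user receives.

```javascript
// Minimal sketch of the shadow pattern (all names are hypothetical).
// Both agents are assumed to be async functions: (query) => response.
async function shadowCall(query, productionAgent, candidateAgent, log) {
  // Start the candidate without awaiting it; capture failures instead of throwing.
  const shadowDone = candidateAgent(query)
    .then((candidate) => ({ candidate }))
    .catch((err) => ({ candidateError: String(err) }));

  // The user-facing path depends on the production agent only.
  const production = await productionAgent(query);

  // Record the pair for later comparison once the candidate settles.
  const recorded = shadowDone.then((outcome) => {
    log.push({ query, production, ...outcome });
  });

  // `recorded` is returned only so callers (or tests) can wait for the log entry.
  return { response: production, recorded };
}
```

Returning the production response before the candidate settles is the design choice that keeps the test invisible to users; the comparison happens afterwards, on the logged pairs.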

Workflow

Identify a failure cluster

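// `api` is assumed to be a preconfigured HTTP client (axios-style: responses expose `.data`)
const clusters = await api.get('/api/v1/cases', {
  params: { agentId: 'agent_123', status: 'open' }
});

// Use the first open case as the cluster to fix
const targetCluster = clusters.data[0];
// { id: 'case_001', title: 'Auth guidance errors', traceCount: 47 }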

Generate a fix

const fix = await api.post('/api/v1/fixes', {
  caseId: targetCluster.id, // 'case_001' from the previous step
  type: 'prompt_update',
  description: 'Add 2FA verification step',
  changes: {
    prompt: 'Updated system prompt with 2FA instructions'
  }
});

Run shadow test

const test = await api.post('/api/v1/shadow-tests', {
  fixId: fix.data.id,
  config: {
    sampleSize: 50,
    comparisonMode: 'side_by_side'
  }
});

Review results

const results = await api.get(`/api/v1/shadow-tests/${test.data.id}/results`);

console.log(results.data);
// {
//   improved: 42,
//   unchanged: 5,
//   regressed: 3,
//   improvementRate: 0.84,
//   recommendation: 'Safe to deploy'
// }
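
The sample figures are consistent with `improvementRate` being improved / total: 42 / (42 + 5 + 3) = 0.84. The guide does not state the formula, so treat that as an assumption. Under it, a small helper can also derive a regression rate, which the sample response does not report. `summarizeResults` and `regressionRate` are hypothetical names.

```javascript
// Assumes each rate = count / (improved + unchanged + regressed).
// This formula is inferred from the sample output, not confirmed by the API.
function summarizeResults({ improved, unchanged, regressed }) {
  const total = improved + unchanged + regressed;
  if (total === 0) {
    // Avoid dividing by zero when no traces were compared.
    return { total: 0, improvementRate: 0, regressionRate: 0 };
  }
  return {
    total,
    improvementRate: improved / total,
    regressionRate: regressed / total,
  };
}

const summary = summarizeResults({ improved: 42, unchanged: 5, regressed: 3 });
console.log(summary);
// { total: 50, improvementRate: 0.84, regressionRate: 0.06 }
```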

Apply fix

// Apply only if more than 80% of the sampled traces improved (0.84 in the sample results)
if (results.data.improvementRate > 0.8) {
  await api.post(`/api/v1/fixes/${fix.data.id}/apply`);
}
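
The check above looks only at the improvement rate. Because the results also count regressions, a stricter gate might cap those as well. The following is a sketch that assumes the result shape shown under Review results; `shouldApplyFix` and both default thresholds are illustrative choices, not platform defaults.

```javascript
// Hypothetical deployment gate: require enough improvement AND few enough regressions.
function shouldApplyFix(
  results,
  { minImprovementRate = 0.8, maxRegressionRate = 0.1 } = {}
) {
  const total = results.improved + results.unchanged + results.regressed;
  if (total === 0) return false; // no compared traces means no evidence either way

  const regressionRate = results.regressed / total;
  return (
    results.improvementRate > minImprovementRate &&
    regressionRate <= maxRegressionRate
  );
}

const sample = { improved: 42, unchanged: 5, regressed: 3, improvementRate: 0.84 };
console.log(shouldApplyFix(sample)); // true: 0.84 > 0.8 and 3/50 = 0.06 <= 0.1
console.log(shouldApplyFix(sample, { maxRegressionRate: 0.05 })); // false: 0.06 > 0.05
```

With a gate like this, the apply call would be guarded by `shouldApplyFix(results.data)` instead of the bare rate comparison.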

Configuration Options

Option	Description
`sampleSize`	Number of traces to test
`comparisonMode`	`side_by_side` or `sequential`
`metrics`	Metrics to compare
`threshold`	Minimum improvement required
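
For illustration, a config that sets all four options might be built as follows. The guide's own example shows only `sampleSize` and `comparisonMode`; the value shapes used here for `metrics` (an array of metric names) and `threshold` (a fraction between 0 and 1) are assumptions, as are the metric names and the `buildShadowTestConfig` helper itself.

```javascript
// The two comparison modes listed in the options table.
const COMPARISON_MODES = ['side_by_side', 'sequential'];

// Hypothetical helper: validates the documented options and omits unset ones.
function buildShadowTestConfig({
  sampleSize = 50,
  comparisonMode = 'side_by_side',
  metrics,
  threshold,
} = {}) {
  if (!Number.isInteger(sampleSize) || sampleSize <= 0) {
    throw new RangeError('sampleSize must be a positive integer');
  }
  if (!COMPARISON_MODES.includes(comparisonMode)) {
    throw new RangeError(
      `comparisonMode must be one of: ${COMPARISON_MODES.join(', ')}`
    );
  }
  const config = { sampleSize, comparisonMode };
  if (metrics !== undefined) config.metrics = metrics;
  if (threshold !== undefined) config.threshold = threshold;
  return config;
}

const config = buildShadowTestConfig({
  sampleSize: 100,
  comparisonMode: 'sequential',
  metrics: ['accuracy', 'latency'], // hypothetical metric names
  threshold: 0.8, // assumed to be a 0-1 improvement rate
});
// Would be sent as: api.post('/api/v1/shadow-tests', { fixId: fix.data.id, config })
```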

Best Practices

Test with representative samples
Include edge cases
Set appropriate thresholds
Monitor regressions closely

Next Steps

Deployment - Deploy to production
API Reference - Issues and fixes API
