
Shadow Testing Guide

Test proposed fixes against real traffic without affecting production.

What is Shadow Testing?

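Shadow testing runs a proposed fix alongside your production agent:

  • Real queries are sent to both the production agent and the version with the fix applied
  • The two responses to each query are compared
  • Users are unaffected: production keeps serving them while the test runs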
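The idea can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: `shadowQuery`, `productionAgent`, and `candidateAgent` are hypothetical names, and both agents are assumed to be async functions that take a query string and resolve to a response.

```javascript
// Hypothetical sketch of a shadow run for a single query.
// `productionAgent` and `candidateAgent` are assumed to be async functions
// of the form (query) => Promise<string>; neither name comes from the API.
function shadowQuery(query, productionAgent, candidateAgent) {
  // The user-facing response depends only on the production agent.
  const userResponse = productionAgent(query);

  // The candidate runs on the same query. Wrapping the call in a promise
  // chain turns a synchronous throw into a rejection as well.
  const candidateResponse = Promise.resolve().then(() => candidateAgent(query));

  // allSettled never rejects, so a failing candidate cannot break the
  // request; its error is recorded in the comparison instead.
  const comparison = Promise.allSettled([userResponse, candidateResponse]).then(
    ([prod, cand]) => ({
      query,
      production: prod.status === 'fulfilled' ? prod.value : null,
      candidate: cand.status === 'fulfilled' ? cand.value : null,
      candidateError: cand.status === 'rejected' ? String(cand.reason) : null,
    })
  );

  // The caller awaits `userResponse` for the user and `comparison` for review.
  return { userResponse, comparison };
}
```

Because `userResponse` is just the production promise, the user never waits on the candidate, and a candidate failure shows up only in the recorded comparison.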

Workflow

Identify a failure cluster

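// `api` is assumed to be a preconfigured axios-style HTTP client.
// List the open cases (failure clusters) for one agent.
const clusters = await api.get('/api/v1/cases', {
  params: { agentId: 'agent_123', status: 'open' }
});

// Use the first cluster as the target.
const targetCluster = clusters.data[0];
// { id: 'case_001', title: 'Auth guidance errors', traceCount: 47 }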

Generate a fix

const fix = await api.post('/api/v1/fixes', {
  caseId: targetCluster.id, // 'case_001' from the previous step
  type: 'prompt_update',
  description: 'Add 2FA verification step',
  changes: {
    prompt: 'Updated system prompt with 2FA instructions'
  }
});

Run shadow test

const test = await api.post('/api/v1/shadow-tests', {
  fixId: fix.data.id,
  config: {
    sampleSize: 50,
    comparisonMode: 'side_by_side'
  }
});

Review results

const results = await api.get(`/api/v1/shadow-tests/${test.data.id}/results`);
 
console.log(results.data);
// {
//   improved: 42,
//   unchanged: 5,
//   regressed: 3,
//   improvementRate: 0.84,
//   recommendation: 'Safe to deploy'
// }
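The sample numbers are consistent with `improvementRate` being the share of improved traces (42 / 50 = 0.84). The sketch below recomputes the summary under that assumption; the service's actual formula is not documented here, and `summarizeOutcomes` is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: tallies per-trace outcomes into a summary shaped like
// the results above. Assumes improvementRate = improved / total.
function summarizeOutcomes(outcomes) {
  const counts = { improved: 0, unchanged: 0, regressed: 0 };

  for (const outcome of outcomes) {
    // Reject anything that is not one of the three known outcome labels.
    if (!Object.prototype.hasOwnProperty.call(counts, outcome)) {
      throw new Error(`Unknown outcome: ${outcome}`);
    }
    counts[outcome] += 1;
  }

  const total = outcomes.length;
  // Avoid dividing by zero when there are no outcomes.
  const improvementRate = total === 0 ? 0 : counts.improved / total;

  return { ...counts, improvementRate };
}
```

Recomputing the rate from the raw counts is a cheap sanity check before acting on it.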

Apply fix

if (results.data.improvementRate > 0.8) {
  await api.post(`/api/v1/fixes/${fix.data.id}/apply`);
}
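A high improvement rate can still hide regressions, so a stricter gate may be worth considering. The sketch below is one possible check, not a documented feature: `shouldApply`, `minImprovementRate`, and `maxRegressionRate` are hypothetical names, and the default limits are illustrative only.

```javascript
// Hypothetical deploy gate: requires a high improvement rate AND a low
// regression rate. The default limits are illustrative, not recommended values.
function shouldApply(results, { minImprovementRate = 0.8, maxRegressionRate = 0.1 } = {}) {
  const total = results.improved + results.unchanged + results.regressed;

  // With no compared traces there is no evidence either way.
  if (total === 0) return false;

  const regressionRate = results.regressed / total;

  return (
    results.improvementRate > minImprovementRate &&
    regressionRate <= maxRegressionRate
  );
}
```

With the sample results above (3 regressions out of 50), the default limits pass, while a tighter `maxRegressionRate` of 0.05 blocks the deploy.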

Configuration Options

Option          Description
sampleSize      Number of traces to test
comparisonMode  side_by_side or sequential
metrics         Metrics to compare
threshold       Minimum improvement required
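A config object can be checked on the client before a test is submitted. The sketch below validates only what the table states: `sampleSize` is a count and `comparisonMode` is one of two values. The expected types of `metrics` and `threshold` are not documented here, so they are left unchecked. `validateShadowConfig` is a hypothetical helper, not part of the API.

```javascript
// The two comparison modes listed in the options table.
const COMPARISON_MODES = ['side_by_side', 'sequential'];

// Hypothetical client-side check; returns a list of problems (empty if none).
function validateShadowConfig(config) {
  const errors = [];

  // sampleSize is a number of traces, so it should be a positive integer.
  if (!Number.isInteger(config.sampleSize) || config.sampleSize <= 0) {
    errors.push('sampleSize must be a positive integer');
  }

  if (!COMPARISON_MODES.includes(config.comparisonMode)) {
    errors.push(`comparisonMode must be one of: ${COMPARISON_MODES.join(', ')}`);
  }

  // metrics and threshold are intentionally not validated: their expected
  // types are not documented in this guide.
  return errors;
}
```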

Best Practices

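  1. Test with samples that are representative of production traffic
  2. Include edge cases in the sample
  3. Set thresholds appropriate to the risk of the change
  4. Monitor regressions closely, even when the overall improvement rate is high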
