Event Loop Lag
Severity: High | Alert Threshold: Event loop lag > 100ms for 2+ minutes
Overview
This alert triggers when the Node.js event loop is blocked for extended periods, causing all requests to queue and response times to degrade.
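One way to read the "lag > 100ms for 2+ minutes" condition is as a sustained breach rather than a single spike. The sketch below assumes lag is sampled periodically into objects of the form `{ timestampMs, lagMs }`. That shape and the function name `lagAlertFiring` are hypothetical, and the real alerting policy may evaluate the window differently.

```javascript
// Hypothetical helper: returns true when the most recent unbroken run of
// samples above thresholdMs spans at least windowMs.
// Assumes samples are ordered oldest to newest.
function lagAlertFiring(samples, thresholdMs = 100, windowMs = 2 * 60 * 1000) {
  if (samples.length === 0) return false;

  let runStart = null; // timestamp where the current breach run began
  for (const { timestampMs, lagMs } of samples) {
    if (lagMs > thresholdMs) {
      if (runStart === null) runStart = timestampMs;
    } else {
      runStart = null; // a healthy sample ends the run
    }
  }

  if (runStart === null) return false;
  const latest = samples[samples.length - 1].timestampMs;
  return latest - runStart >= windowMs;
}
```

Under this reading, a brief spike that recovers within the window does not fire the alert, while lag that stays above the threshold across consecutive samples does.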
Understanding Event Loop Lag
The event loop is Node.js’s mechanism for handling asynchronous operations. When it’s blocked:
- All incoming requests queue up
- Response times increase dramatically
- Health checks may fail
- Service appears unresponsive
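The effect can be reproduced in isolation. The following is a minimal sketch, not code from the service: it schedules a zero-delay timer, then blocks the thread with a synchronous busy loop and reports how late the timer actually fired. The helper names `busyWait` and `measureTimerDelay` are made up for illustration.

```javascript
// Spins synchronously for roughly `ms` milliseconds, standing in for any
// blocking operation (large JSON.parse, sync file read, heavy regex).
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally empty: this loop holds the event loop hostage
  }
}

// Resolves with the number of milliseconds a zero-delay timer had to wait
// while `blockMs` of synchronous work ran on the main thread.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = process.hrtime.bigint();
    setTimeout(() => {
      const elapsedMs = Number(process.hrtime.bigint() - scheduledAt) / 1e6;
      resolve(elapsedMs);
    }, 0);
    // The timer callback cannot run until this call returns.
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((ms) => {
  console.log(`0 ms timer fired after ${ms.toFixed(1)} ms`);
});
```

Any request handler, health check, or timer that becomes ready during the blocked period waits in the same way, which is why a single slow synchronous call affects every in-flight request on the instance.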
Diagnostic Steps
Check for Blocking Operations
# Look for synchronous operation warnings
gcloud logging read 'textPayload=~"sync" OR textPayload=~"blocking"' --limit 20
Identify Heavy Computations
Common blockers in ThinkHive:
- Large JSON parsing (traces with many spans)
- Synchronous file operations
- Complex regex evaluation
- Large array operations
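For large array operations, one common mitigation is to split the work into slices and yield to the event loop between them. The sketch below is illustrative only: `processInChunks`, its parameters, and the default chunk size of 1000 are assumptions, not existing ThinkHive code.

```javascript
// Applies handleItem to every element, processing chunkSize items per turn
// of the event loop so other callbacks can run between slices.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          results[index] = handleItem(items[index], index);
        }
      } catch (err) {
        reject(err);
        return;
      }

      if (index < items.length) {
        // setImmediate defers the next slice until pending I/O callbacks
        // have had a chance to run.
        setImmediate(runChunk);
      } else {
        resolve(results);
      }
    }

    runChunk();
  });
}
```

Total CPU time is unchanged, but the longest uninterrupted block shrinks to the cost of one chunk. A smaller `chunkSize` lowers worst-case lag at the price of more scheduling overhead.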
Check CPU Usage
High CPU correlates with event loop blocking:
# View CPU metrics
gcloud monitoring metrics list --filter="metric.type=run.googleapis.com/container/cpu/utilizations"
Review Recent Changes
- New evaluation criteria with complex logic?
- Changes to trace processing?
- New synchronous operations?
Common Causes & Remediation
Symptoms: Lag when processing large traces
Fix: Stream JSON parsing
// Instead of
const data = JSON.parse(hugeString);
// Use streaming
const { parser } = require('stream-json');
// inputStream is any Readable that yields the raw JSON (for example, the request body)
const pipeline = inputStream.pipe(parser());
Quick Mitigations
Increase CPU Allocation
gcloud run services update thinkhive-demo \
--region us-central1 \
--cpu 2
Reduce Concurrency
Fewer concurrent requests = less event loop contention:
gcloud run services update thinkhive-demo \
--region us-central1 \
--concurrency 20
Scale Out
gcloud run services update thinkhive-demo \
--region us-central1 \
--min-instances 3 \
--max-instances 20
Monitoring Event Loop
Add monitoring to the application:
// Uses Node's built-in perf_hooks; export the value via prom-client or similar if needed
const { monitorEventLoopDelay } = require('perf_hooks');
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();
setInterval(() => {
console.log(`Event loop P99: ${histogram.percentile(99) / 1e6}ms`);
histogram.reset();
}, 60000);
Prevention
- Profile code for blocking operations
- Use worker threads for CPU-intensive tasks
- Implement request size limits
- Add timeouts to all operations
- Regular performance testing
- Monitor event loop metrics
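As an illustration of the worker-thread item above, the sketch below parses a trace payload off the main thread and sends back only a small summary. It assumes a hypothetical trace shape with a top-level `spans` array, and the function name `countSpansInWorker` is made up. The worker source is inlined with `eval: true` only to keep the example in one file. A real service would more likely load a separate worker module and reuse a pool of workers.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source (illustration only). The expensive JSON.parse runs in
// the worker, so the main event loop stays free to serve other requests.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const trace = JSON.parse(workerData);
  // Hypothetical trace shape: { spans: [...] }
  parentPort.postMessage(Array.isArray(trace.spans) ? trace.spans.length : 0);
`;

// Resolves with the span count, or rejects if the worker throws
// (for example, on invalid JSON) or exits abnormally.
function countSpansInWorker(jsonText) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: jsonText });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Returning a summary rather than the full parsed object matters here: messages between threads are copied, so sending a very large result back would move part of the cost onto the main thread again.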