Memory Critical
Severity: Critical | Alert Threshold: Memory usage > 95% OR OOM events
Overview
This alert fires when memory usage exceeds 95% or an out-of-memory (OOM) event occurs. Act immediately to prevent service disruption.
Immediate Actions
⚠️ Time-Critical: Service may crash within minutes. Act immediately.
Step 1: Increase Memory (Do This First)
gcloud run services update thinkhive-demo \
--region us-central1 \
--memory 2Gi
Step 2: Reduce Concurrency
gcloud run services update thinkhive-demo \
--region us-central1 \
--concurrency 20
Step 3: Scale Out
gcloud run services update thinkhive-demo \
--region us-central1 \
--min-instances 2 \
--max-instances 20
Diagnostic Steps
Check for OOM Events
# Search for OOM kills
gcloud logging read 'textPayload=~"OOM" OR textPayload=~"out of memory" OR textPayload=~"heap"' \
--limit 20 \
--freshness=1h
Identify Memory Consumer
# Check what's using memory
gcloud logging read 'textPayload=~"memory" AND severity>=WARNING' --limit 30
Check Recent Traffic Spike
Traffic spikes can cause memory exhaustion:
- Large batch uploads
- Many concurrent analysis requests
- Evaluation runs on large datasets
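A rough estimate shows why a spike in concurrent requests turns into memory pressure, and why lowering concurrency helps. The sketch below assumes each in-flight request holds its buffered payload multiplied by an overhead factor. The function name `estimatePeakMemoryMiB` and every input value (baseline, payload size, overhead factor) are illustrative assumptions, not measurements from this service.

```javascript
// Rough per-instance memory estimate. All inputs are illustrative assumptions.
function estimatePeakMemoryMiB({ baselineMiB, concurrency, payloadMiB, overheadFactor }) {
  // Each in-flight request is assumed to hold its payload times an overhead
  // factor, since parsed objects are usually larger than the raw bytes.
  return baselineMiB + concurrency * payloadMiB * overheadFactor;
}

// Compare two concurrency settings under the same assumed workload.
for (const concurrency of [80, 20]) {
  const peak = estimatePeakMemoryMiB({
    baselineMiB: 300,
    concurrency,
    payloadMiB: 10,
    overheadFactor: 2,
  });
  console.log(`concurrency ${concurrency}: ~${peak} MiB peak`);
}
```

Under these assumed numbers, 80 concurrent requests come to about 1900 MiB while 20 come to about 700 MiB, which is the reasoning behind reducing concurrency during an incident.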
Review Recent Deployments
gcloud run revisions list --service thinkhive-demo --region us-central1 --limit 5
Root Cause Analysis
| Symptom | Likely Cause | Fix |
|---|---|---|
| Gradual increase | Memory leak | Find and fix leak, restart |
| Sudden spike | Large request/batch | Add request size limits |
| After deployment | New code issue | Roll back |
| Traffic correlated | Under-provisioned | Increase memory |
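The first two rows of the table can be told apart from a series of memory readings. The following is a minimal sketch, assuming the readings are percentages of the memory limit sampled at regular intervals, oldest first. The function name `classifyMemoryTrend` and the thresholds `spikeJump` and `leakRise` are hypothetical and would need tuning against real data.

```javascript
// Classify a series of memory readings (percent of limit, oldest first).
// The thresholds are hypothetical defaults; tune them for the service.
function classifyMemoryTrend(samples, { spikeJump = 20, leakRise = 10 } = {}) {
  if (samples.length < 3) return 'insufficient-data';

  let maxJump = 0;     // largest increase between two consecutive samples
  let risingSteps = 0; // number of steps where usage went up
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i] - samples[i - 1];
    if (delta > maxJump) maxJump = delta;
    if (delta > 0) risingSteps++;
  }
  const totalRise = samples[samples.length - 1] - samples[0];

  // One large jump points at a single big request or batch.
  if (maxJump >= spikeJump) return 'sudden-spike';

  // Mostly rising with a meaningful overall increase suggests a leak.
  const mostlyRising = risingSteps >= 0.8 * (samples.length - 1);
  if (mostlyRising && totalRise >= leakRise) return 'gradual-increase';

  return 'stable';
}

console.log(classifyMemoryTrend([50, 53, 55, 58, 61, 64, 67])); // gradual-increase
console.log(classifyMemoryTrend([50, 51, 50, 52, 90, 91]));     // sudden-spike
console.log(classifyMemoryTrend([60, 61, 60, 59, 60, 61]));     // stable
```

The other two rows (after deployment, traffic correlated) cannot be decided from memory readings alone. They need the deployment time and request volume alongside the memory series.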
Emergency Rollback
If a recent deployment is the suspected cause:
# List revisions
gcloud run revisions list --service thinkhive-demo --region us-central1
# Roll back to the previous stable revision (replace PREVIOUS_REVISION with its name)
gcloud run services update-traffic thinkhive-demo \
--region us-central1 \
--to-revisions PREVIOUS_REVISION=100
Recovery Checklist
- Memory increased to safe level
- Concurrency reduced if needed
- Service is stable (check health endpoints)
- Root cause identified
- Incident documented
- Prevention measures planned
Prevention
Request Size Limits
// Add to Express middleware
app.use(express.json({ limit: '10mb' }));
app.use(express.urlencoded({ limit: '10mb', extended: true }));
Memory Monitoring
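// Add memory monitoring
const v8 = require('node:v8');

setInterval(() => {
  const { heapUsed } = process.memoryUsage();
  // Compare against V8's heap size limit. heapTotal only reflects the heap
  // allocated so far, so heapUsed / heapTotal can be high in a healthy process.
  const { heap_size_limit: heapLimit } = v8.getHeapStatistics();
  const heapPercent = (heapUsed / heapLimit) * 100;
  if (heapPercent > 90) {
    console.warn(`High heap usage: ${heapPercent.toFixed(1)}%`);
  }
}, 30000);
Graceful Degradation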
- Implement circuit breakers for memory-heavy operations
- Queue large batch operations
- Stream responses instead of buffering
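The first two items above can be combined into one small gate in front of memory-heavy work. This is a sketch under stated assumptions, not the service's actual implementation. The class name `MemoryAwareQueue` and the defaults for `maxConcurrent`, `maxQueued`, and `maxHeapRatio` are hypothetical. It uses only Node's built-in `v8.getHeapStatistics()` and measures the V8 heap, not total container memory.

```javascript
const v8 = require('node:v8');

// Fraction of V8's heap size limit currently in use (between 0 and 1).
function heapUsageRatio() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  return used_heap_size / heap_size_limit;
}

// Queues memory-heavy jobs with limited concurrency and sheds load under
// memory pressure. All defaults are hypothetical and need tuning.
class MemoryAwareQueue {
  constructor({ maxConcurrent = 2, maxQueued = 50, maxHeapRatio = 0.85, readRatio = heapUsageRatio } = {}) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueued = maxQueued;
    this.maxHeapRatio = maxHeapRatio;
    this.readRatio = readRatio; // injectable so the gate can be tested
    this.running = 0;
    this.pending = [];
  }

  // Resolves with the job's result, or rejects right away when memory is
  // high or the queue is full (a caller could map this to HTTP 503).
  submit(job) {
    if (this.readRatio() > this.maxHeapRatio) {
      return Promise.reject(new Error('shed: memory pressure'));
    }
    if (this.pending.length >= this.maxQueued) {
      return Promise.reject(new Error('shed: queue full'));
    }
    return new Promise((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this.drain();
    });
  }

  // Starts queued jobs until the concurrency limit is reached.
  drain() {
    while (this.running < this.maxConcurrent && this.pending.length > 0) {
      const { job, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this.drain();
        });
    }
  }
}

// Usage: wrap a batch operation so it is queued or shed instead of piling up.
const queue = new MemoryAwareQueue();
queue
  .submit(async () => 'batch done')
  .then((result) => console.log(result), (err) => console.warn(err.message));
```

Rejecting early keeps the instance responsive for lighter requests. Streaming responses, the third item, is a separate change in the handlers that build large payloads and is not covered by this sketch.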