High Latency
Severity: Medium-High | Alert Threshold: P95 latency > 2 seconds for 5+ minutes
Overview
This alert triggers when the 95th percentile (P95) response time stays above 2 seconds for at least 5 minutes.
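To make the P95 figure concrete, here is a minimal sketch of a nearest-rank percentile over raw latency samples. The `percentile` helper is hypothetical and not part of any tooling named in this runbook. The monitoring backend that evaluates the alert may interpolate differently, so treat the result as an approximation for ad hoc checks.

```shell
# Nearest-rank percentile of numeric samples read from stdin, one per line.
# $1 is an integer percentile between 1 and 100.
percentile() {
  LC_ALL=C sort -n | awk -v p="$1" '
    { v[NR] = $1 }                        # samples arrive sorted ascending
    END {
      if (NR == 0) exit 1                 # no samples: nothing to report
      idx = int((p * NR + 99) / 100)      # ceil(p * NR / 100) in integer math
      if (idx < 1) idx = 1
      print v[idx]
    }'
}

# Example usage (latencies.txt is a placeholder file of seconds, one per line):
# percentile 95 < latencies.txt
```

With small sample counts a single slow request can dominate P95, which is one reason the alert also requires the condition to hold for 5 minutes.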
Impact Assessment
| P95 Latency | User Impact | Priority |
|---|---|---|
| 2-5s | Noticeable slowness | Medium |
| 5-10s | Poor user experience | High |
| > 10s | Unusable, potential timeouts | Critical |
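The table above can be expressed as a small lookup for use in triage scripts. This is a sketch under assumptions: `priority_for_p95` is a hypothetical helper, the `OK` label for values at or below the 2 second threshold is not from the table, and boundary values (exactly 5s or 10s) are assigned to the lower band because the table leaves them ambiguous.

```shell
# Map a P95 latency in seconds to the priority from the impact table.
priority_for_p95() {
  awk -v s="$1" 'BEGIN {
    if (s <= 2)       print "OK"          # at or below the alert threshold (assumed label)
    else if (s <= 5)  print "Medium"      # 2-5s: noticeable slowness
    else if (s <= 10) print "High"        # 5-10s: poor user experience
    else              print "Critical"    # > 10s: unusable, potential timeouts
  }'
}

# Example usage:
# priority_for_p95 7.5
```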
Diagnostic Steps
Identify Slow Endpoints
# List recent Cloud Run requests that took longer than 2 seconds, with their URLs
gcloud logging read 'resource.type="cloud_run_revision" AND httpRequest.latency>"2s"' \
  --limit 50 \
  --format "table(httpRequest.requestUrl, httpRequest.latency)"
Check Database Performance
# Look for slow queries in logs
gcloud logging read 'textPayload=~"slow query"' --limit 20
# Check active connections
# Via Neon dashboard: https://console.neon.tech
Check External API Latency
Common external dependencies:
- OpenAI API calls (LLM operations)
- Auth0 (authentication)
- Pinecone (vector search)
# Look for timeout/slow external calls
gcloud logging read 'textPayload=~"timeout" OR textPayload=~"ETIMEDOUT"' --limit 20
Review Resource Utilization
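# Inspect the service status (conditions, latest ready revision, traffic split).
# CPU and memory utilization are not part of this output; view them in the
# Cloud Run Metrics tab or in Cloud Monitoring.
gcloud run services describe thinkhive-demo \
  --region us-central1 \
  --format "value(status)"
Common Causes & Remediation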
Slow Database Queries
Symptoms: Queries taking > 1 second
Diagnostic:
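-- List existing indexes in the thinkhive schema (run in Neon console).
-- Compare the result against the columns used in slow queries to spot missing indexes.
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname = 'thinkhive';
Fix: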
- Add missing indexes
- Optimize N+1 queries
- Enable query caching
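One way to find candidates for a missing index is to look for tables that are read mostly by sequential scans. The sketch below prints a query against PostgreSQL's standard `pg_stat_user_tables` view. The `seq_scan_query` helper, the use of `psql`, and the `DATABASE_URL` variable are assumptions for illustration. The counters are cumulative since the last statistics reset, so a high `seq_scan` value is a hint rather than proof.

```shell
# Print a query listing the tables in the thinkhive schema with the most
# sequential scans, alongside their index scan counts.
seq_scan_query() {
  cat <<'SQL'
SELECT relname,
       seq_scan,
       COALESCE(idx_scan, 0) AS idx_scan
FROM pg_stat_user_tables
WHERE schemaname = 'thinkhive'
ORDER BY seq_scan DESC
LIMIT 10;
SQL
}

# Example usage (assumes psql is installed and DATABASE_URL points at the database):
# seq_scan_query | psql "$DATABASE_URL"
```

The printed query can also be pasted into the Neon console directly.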
Quick Mitigations
Increase Resources
gcloud run services update thinkhive-demo \
  --region us-central1 \
  --memory 1Gi \
  --cpu 2 \
  --concurrency 80
Enable Minimum Instances
gcloud run services update thinkhive-demo \
  --region us-central1 \
  --min-instances 2
Scale Out
gcloud run services update thinkhive-demo \
  --region us-central1 \
  --max-instances 20
Prevention
- Implement request timeouts
- Add caching for expensive operations
- Use connection pooling for database
- Set up performance budgets
- Regular load testing
- Monitor P50/P95/P99 latencies
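A performance budget can be enforced with a simple pass/fail check in a CI job or smoke test. The sketch below is one possible shape, not an established part of this service's tooling: `within_budget` is a hypothetical helper, `SERVICE_URL` is a placeholder, and the 2 second budget simply mirrors the alert threshold.

```shell
# Succeed (exit 0) when a measured latency is within the budget.
# Both arguments are in seconds and may be fractional.
within_budget() {
  awk -v measured="$1" -v budget="$2" 'BEGIN { exit !(measured <= budget) }'
}

# Example usage (SERVICE_URL is a placeholder for the endpoint under test):
# t=$(curl -o /dev/null -s -w '%{time_total}' "$SERVICE_URL")
# within_budget "$t" 2 || echo "latency budget exceeded: ${t}s"
```

A single request is a noisy measurement, so a real check would likely sample several requests and compare a percentile against the budget.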