RunbooksDatabase Slow

Database Slow

⚠️

Severity: High | Alert Threshold: Query P95 > 1 second for 5+ minutes

Overview

This alert triggers when database query latency exceeds acceptable thresholds, indicating potential database performance issues.

Quick Checks

CheckCommand/Location
Neon Statushttps://neonstatus.com
Connection CountNeon Dashboard → Connections
Query PerformanceNeon Dashboard → Query Stats

Diagnostic Steps

Check Neon Dashboard

  1. Log into Neon Console
  2. Select the ThinkHive project
  3. Check “Monitoring” tab for:
    • Active connections
    • Query latency
    • CPU/Memory usage

Identify Slow Queries

-- Run in Neon SQL Editor
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Check Connection Pool

# Look for connection errors in logs
gcloud logging read 'textPayload=~"connection" AND severity>=WARNING' --limit 20

Check Table Statistics

-- Check for tables needing VACUUM
SELECT
  schemaname,
  relname,
  n_dead_tup,
  last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

Common Causes & Remediation

Symptoms: Full table scans, slow WHERE clauses

Diagnostic:

-- Find missing indexes
SELECT
  relname,
  seq_scan,
  idx_scan,
  seq_scan - idx_scan AS diff
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
ORDER BY diff DESC;

Fix: Add appropriate indexes

CREATE INDEX CONCURRENTLY idx_traces_agent_id
ON traces(agent_id);

Emergency Actions

Kill Long-Running Queries

-- Find long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
 
-- Terminate a specific query (use with caution)
SELECT pg_terminate_backend(PID);

Restart Connection Pool

# Force service restart to reset connections
gcloud run services update thinkhive-demo \
  --region us-central1 \
  --update-env-vars DB_RESTART=$(date +%s)

Prevention

  • Regular ANALYZE on frequently updated tables
  • Monitor query patterns and add indexes proactively
  • Implement query timeouts
  • Use read replicas for heavy read workloads
  • Archive historical data regularly
  • Set up slow query logging