Core Concepts
ThinkHive uses a run-centric architecture to model AI agent behavior. Understanding these concepts will help you get the most out of the platform.
The Run-Centric Model
Core Entities
Run
A Run is a complete agent execution from input to final output. It represents one unit of work your AI agent performs.
const run = {
  id: "run_abc123",
  name: "customer-support-chat",
  input: "How do I reset my password?",
  output: "To reset your password, go to Settings > Security...",
  outcome: "success",
  startTime: "2024-01-15T10:30:00Z",
  endTime: "2024-01-15T10:30:02Z",
  metadata: {
    customerId: "cust_xyz",
    channel: "chat"
  }
};
Trace
A Trace is the collection of operations (spans) that occurred during a run. It provides the detailed timeline of what happened.
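As a rough illustration of a trace as a timeline, here is a minimal sketch that orders spans by start time and computes each span's duration. The `runId` and `spans` field names and the `buildTimeline` helper are assumptions made for this example, not a documented ThinkHive schema.

```javascript
// Hypothetical trace shape: the runId and spans fields are assumptions for illustration.
const trace = {
  runId: "run_abc123",
  spans: [
    { id: "span_2", type: "llm", name: "gpt-4-response",
      startTime: "2024-01-15T10:30:00.500Z", endTime: "2024-01-15T10:30:01.800Z" },
    { id: "span_1", type: "retrieval", name: "kb-search",
      startTime: "2024-01-15T10:30:00.100Z", endTime: "2024-01-15T10:30:00.450Z" },
  ],
};

// Order spans by start time and compute each span's duration in milliseconds.
function buildTimeline(trace) {
  return [...trace.spans]
    .sort((a, b) => Date.parse(a.startTime) - Date.parse(b.startTime))
    .map((span) => ({
      name: span.name,
      type: span.type,
      durationMs: Date.parse(span.endTime) - Date.parse(span.startTime),
    }));
}

const timeline = buildTimeline(trace);
console.log(timeline);
```

Sorting a copy of the array leaves the stored span order untouched, which matters if spans arrive out of order.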
Span
A Span represents a single operation within a trace:
| Span Type | Description | Example |
|---|---|---|
| LLM | A call to a language model | GPT-4 completion |
| Retrieval | A search or retrieval operation | Vector database query |
| Tool | An external tool invocation | Web search, calculator |
| Chain | A workflow or orchestration step | LangChain chain execution |
const span = {
  id: "span_def456",
  type: "llm",
  name: "gpt-4-response",
  startTime: "2024-01-15T10:30:00.500Z",
  endTime: "2024-01-15T10:30:01.800Z",
  attributes: {
    "llm.model": "gpt-4",
    "llm.provider": "openai",
    "llm.input_tokens": 150,
    "llm.output_tokens": 200,
  }
};
Policies
A Policy defines behavioral rules for your AI agent without requiring code changes. Policies let you configure guardrails, routing logic, and response constraints through the ThinkHive dashboard.
const policy = {
  id: "policy_001",
  name: "No Financial Advice",
  type: "guardrail",
  status: "active",
  rules: [
    {
      condition: "topic_contains",
      value: ["investment", "stock", "financial advice"],
      action: "block",
      fallbackResponse: "I'm not able to provide financial advice. Please consult a licensed advisor."
    }
  ]
};
Policy Types
| Type | Description | Example |
|---|---|---|
| Guardrail | Block or modify responses matching criteria | Prevent financial/medical advice |
| Routing | Direct requests to specific agent configurations | Route VIP customers to premium model |
| Constraint | Enforce response format or content rules | Require citations, limit length |
| Fallback | Define behavior when primary agent fails | Graceful degradation responses |
Policies are evaluated at runtime and can be updated without redeploying your agent. Changes take effect immediately.
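To make runtime evaluation concrete, here is a minimal sketch of how a guardrail rule could be checked against an incoming message. It assumes `topic_contains` behaves like a case-insensitive substring match, which this page does not specify, and `evaluatePolicy` is a hypothetical helper rather than a ThinkHive API.

```javascript
// Assumption: "topic_contains" is treated as a case-insensitive substring match.
// ThinkHive's real matching logic is not specified on this page.
const guardrail = {
  id: "policy_001",
  type: "guardrail",
  status: "active",
  rules: [
    {
      condition: "topic_contains",
      value: ["investment", "stock", "financial advice"],
      action: "block",
      fallbackResponse: "I'm not able to provide financial advice. Please consult a licensed advisor.",
    },
  ],
};

// Hypothetical helper: returns the first matching rule's action, or "allow".
function evaluatePolicy(policy, input) {
  if (policy.status !== "active") return { action: "allow" };
  const text = input.toLowerCase();
  for (const rule of policy.rules) {
    const matched =
      rule.condition === "topic_contains" &&
      rule.value.some((term) => text.includes(term.toLowerCase()));
    if (matched) return { action: rule.action, response: rule.fallbackResponse };
  }
  return { action: "allow" };
}

const blocked = evaluatePolicy(guardrail, "Which stock should I buy?");
const allowed = evaluatePolicy(guardrail, "How do I reset my password?");
console.log(blocked.action, allowed.action);
```

Because the policy is plain data, swapping in an updated policy object changes behavior without touching the agent code, which is the property the paragraph above describes.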
Quality Concepts
Claims
Claims are assertions extracted from agent responses. They are categorized as:
Facts are statements that can be verified against source material or known truths.
Inferences are conclusions drawn from available information.
const claim = {
  id: "claim_789",
  content: "The password reset link expires after 24 hours",
  type: "fact", // or "inference"
  confidence: 0.95,
  source: "retrieved_doc_1",
  supported: true,
};
Cases
A Case is a cluster of similar failures or issues. ThinkHive automatically groups related problems to help you prioritize improvements.
const case_ = {
  id: "case_001",
  title: "Password reset instructions incomplete",
  severity: "medium",
  status: "open",
  traceCount: 47,
  pattern: "Missing 2FA reset step when user has MFA enabled",
  suggestedFix: "Add conditional step for MFA users"
};
Fixes
A Fix is a proposed improvement for a case. ThinkHive can generate fix suggestions automatically.
Shadow Tests
A Shadow Test validates a fix by comparing new behavior against the baseline without affecting production.
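The idea can be sketched as replaying recorded inputs through both the baseline and the candidate, then comparing pass rates offline. Everything here is hypothetical: agents are modeled as plain functions, and `shadowTest` and the `passes` check are illustrative names, not part of the ThinkHive SDK.

```javascript
// Hypothetical sketch: agents are plain functions and "passes" is a caller-supplied check.
function shadowTest(recordedInputs, baselineAgent, candidateAgent, passes) {
  let baselinePassed = 0;
  let candidatePassed = 0;
  const regressions = [];
  for (const input of recordedInputs) {
    const baselineOk = passes(input, baselineAgent(input));
    const candidateOk = passes(input, candidateAgent(input));
    if (baselineOk) baselinePassed += 1;
    if (candidateOk) candidatePassed += 1;
    // A regression is an input the baseline handled but the candidate did not.
    if (baselineOk && !candidateOk) regressions.push(input);
  }
  return {
    baselinePassRate: baselinePassed / recordedInputs.length,
    candidatePassRate: candidatePassed / recordedInputs.length,
    regressions,
  };
}

// Toy data modeled on the MFA password-reset case on this page.
const inputs = ["reset password", "reset password with MFA"];
const baseline = () => "Go to Settings > Security.";
const candidate = (q) =>
  q.includes("MFA")
    ? "Go to Settings > Security, then reset your 2FA device."
    : "Go to Settings > Security.";
const passes = (q, answer) => !q.includes("MFA") || answer.includes("2FA");

const report = shadowTest(inputs, baseline, candidate, passes);
console.log(report);
```

Tracking regressions separately from pass rates guards against a fix that raises the average while breaking inputs that previously worked.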
Evaluation Metrics
RAG Quality Metrics
| Metric | Description | Range |
|---|---|---|
| Groundedness | Response supported by retrieved context | 0-1 |
| Faithfulness | Response consistent with context | 0-1 |
| Citation Accuracy | Proper attribution to sources | 0-1 |
| Context Relevance | Retrieved docs match query | 0-1 |
Hallucination Types
ThinkHive detects 9 types of hallucinations:
- Unsupported Claims - Statements not in source material
- Factual Errors - Incorrect facts
- Contradictions - Conflicting statements
- Out-of-Scope - Answering beyond available info
- Fabricated References - Made-up citations
- Missing Context - Omitting critical information
- Logical Fallacies - Invalid reasoning
- Semantic Drift - Subtle meaning changes
- Attribution Errors - Incorrect source attribution
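When working with detection results, it can help to tally findings by type. The sketch below assumes each finding carries a `type` label; the snake_case identifiers and the `tallyByType` helper are invented for illustration and may not match the labels ThinkHive actually returns.

```javascript
// The nine types listed above; the snake_case identifiers are assumptions.
const HALLUCINATION_TYPES = [
  "unsupported_claim", "factual_error", "contradiction", "out_of_scope",
  "fabricated_reference", "missing_context", "logical_fallacy",
  "semantic_drift", "attribution_error",
];

// Count findings per type, rejecting labels outside the known set.
function tallyByType(findings) {
  const counts = Object.fromEntries(HALLUCINATION_TYPES.map((t) => [t, 0]));
  for (const finding of findings) {
    if (!HALLUCINATION_TYPES.includes(finding.type)) {
      throw new Error(`Unknown hallucination type: ${finding.type}`);
    }
    counts[finding.type] += 1;
  }
  return counts;
}

const counts = tallyByType([
  { type: "factual_error" },
  { type: "factual_error" },
  { type: "semantic_drift" },
]);
console.log(counts.factual_error, counts.semantic_drift);
```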
Business Concepts
Customer Context
Customer Context captures the state of a customer at a point in time, enabling time-series analysis.
const context = {
  customerId: "cust_xyz",
  timestamp: "2024-01-15T10:30:00Z",
  metrics: {
    subscription_tier: "premium",
    arr: 5000,
    health_score: 0.85,
    last_support_contact: "2024-01-10"
  }
};
Calibration
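Calibration measures how well confidence scores predict actual outcomes. A well-calibrated model's stated confidence matches how often it is right: of the predictions made with 0.8 confidence, about 80% should turn out correct.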
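Two standard ways to quantify calibration are the Brier score and expected calibration error (ECE). The sketch below uses their textbook definitions on made-up predictions. The ten equal-width bins and the function names are choices made for this example, and ThinkHive may compute these values differently.

```javascript
// Each prediction pairs a confidence in [0, 1] with whether it turned out correct.
const predictions = [
  { confidence: 0.9, correct: true },
  { confidence: 0.8, correct: true },
  { confidence: 0.7, correct: false },
  { confidence: 0.3, correct: false },
];

// Brier score: mean squared difference between confidence and outcome (1 or 0).
function brierScore(preds) {
  const total = preds.reduce(
    (sum, p) => sum + (p.confidence - (p.correct ? 1 : 0)) ** 2, 0);
  return total / preds.length;
}

// ECE: per-bin gap between accuracy and mean confidence, weighted by bin size.
function expectedCalibrationError(preds, numBins = 10) {
  const bins = Array.from({ length: numBins }, () => ({ n: 0, conf: 0, hits: 0 }));
  for (const p of preds) {
    const i = Math.min(numBins - 1, Math.floor(p.confidence * numBins));
    bins[i].n += 1;
    bins[i].conf += p.confidence;
    bins[i].hits += p.correct ? 1 : 0;
  }
  return bins.reduce(
    (ece, b) =>
      b.n === 0 ? ece : ece + (b.n / preds.length) * Math.abs(b.hits / b.n - b.conf / b.n),
    0);
}

console.log(brierScore(predictions)); // about 0.1575
console.log(expectedCalibrationError(predictions)); // about 0.325
```

Both numbers are 0 for a perfectly calibrated, perfectly confident model, and lower is better in each case.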
const calibration = {
  brierScore: 0.12, // Lower is better (0-1)
  expectedCalibrationError: 0.08,
  status: "well_calibrated" // or "under_confident", "over_confident"
};
Ticket Linking
ThinkHive supports 7 methods for deterministically linking runs to support tickets:
- Explicit run ID in ticket
- Zendesk marker embedding
- Timestamp correlation
- Conversation thread ID
- Customer session ID
- Semantic similarity
- Manual linking
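One way to combine several methods is to try them in order and stop at the first match. The sketch below covers only three of the seven methods and assumes the listed order is the priority order. The ticket and run field names (`body`, `threadId`, `sessionId`) and the `linkTicket` helper are hypothetical.

```javascript
// Assumption: methods are tried in the listed order; field names are hypothetical.
const linkers = [
  {
    method: "explicit_run_id",
    link: (ticket, runs) => {
      const match = /run_[A-Za-z0-9]+/.exec(ticket.body ?? "");
      return match ? runs.find((r) => r.id === match[0]) : undefined;
    },
  },
  {
    method: "conversation_thread_id",
    link: (ticket, runs) =>
      ticket.threadId ? runs.find((r) => r.threadId === ticket.threadId) : undefined,
  },
  {
    method: "customer_session_id",
    link: (ticket, runs) =>
      ticket.sessionId ? runs.find((r) => r.sessionId === ticket.sessionId) : undefined,
  },
];

// Return the first run any linker finds, along with the method that found it.
function linkTicket(ticket, runs) {
  for (const { method, link } of linkers) {
    const run = link(ticket, runs);
    if (run) return { runId: run.id, method };
  }
  // Nothing matched automatically, so the ticket is left for manual linking.
  return { runId: null, method: "manual_linking" };
}

const runs = [
  { id: "run_abc123", threadId: "thread_1", sessionId: "sess_9" },
  { id: "run_def456", threadId: "thread_2", sessionId: "sess_9" },
];
console.log(linkTicket({ body: "Issue with run_def456" }, runs));
console.log(linkTicket({ body: "No ID here", threadId: "thread_1" }, runs));
console.log(linkTicket({ body: "Nothing to match" }, runs));
```

Recording which method produced the link makes it possible to weigh a match from an explicit ID differently from a weaker signal.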
API Tiers
| Tier | Rate Limit | Features |
|---|---|---|
| Free | 10/min | Basic tracing, explainability |
| Starter | 60/min | + Search, patterns, quality metrics |
| Professional | 300/min | + RAG eval, hallucinations, ROI |
| Enterprise | 1,000/min | All features + dedicated support |
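To stay under a tier's limit, a client can throttle itself before sending requests. The following is a minimal client-side sketch using a fixed one-minute window. It assumes the limits in the table are requests per minute, and `TIER_LIMITS` and `createLimiter` are illustrative names; it does not describe how ThinkHive enforces limits on the server.

```javascript
// Limits copied from the table above; treating them as requests per minute is an assumption.
const TIER_LIMITS = { free: 10, starter: 60, professional: 300, enterprise: 1000 };

// Fixed-window limiter: allows up to `limit` calls per 60-second window.
function createLimiter(tier, now = () => Date.now()) {
  const limit = TIER_LIMITS[tier];
  if (limit === undefined) throw new Error(`Unknown tier: ${tier}`);
  let windowStart = now();
  let used = 0;
  return function tryAcquire() {
    const t = now();
    if (t - windowStart >= 60_000) {
      // A new window has begun, so reset the counter.
      windowStart = t;
      used = 0;
    }
    if (used >= limit) return false;
    used += 1;
    return true;
  };
}

// Usage with a fake clock so the example is deterministic.
let fakeTime = 0;
const tryAcquire = createLimiter("free", () => fakeTime);
const results = Array.from({ length: 11 }, () => tryAcquire());
console.log(results.filter(Boolean).length); // 10
fakeTime = 60_000;
console.log(tryAcquire()); // true
```

A fixed window is the simplest scheme to reason about. A sliding window or token bucket would smooth out bursts at window boundaries.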
Next Steps
- Your First Trace - Build a complete traced application
- JavaScript SDK - Detailed SDK documentation
- API Reference - REST API endpoints