The Challenge: Write-Only Runbooks
Every engineering organization writes runbooks. They live in Confluence, Google Docs, GitHub wikis, Notion pages, and Slack bookmarks. Teams spend weeks writing detailed procedures for every failure scenario.
Then an incident happens at 3 AM, and nobody reads them.
The problem isn’t that runbooks don’t exist. It’s that finding the right runbook under pressure is harder than the incident itself:
- Scattered knowledge. Runbooks live in 4+ different systems. The Kubernetes rollback procedure is in GitHub. The database failover guide is in Confluence. The PagerDuty escalation matrix is in a Google Sheet.
- Stale content. The deployment guide was last updated 8 months ago. Since then, the team migrated from Helm to ArgoCD. The runbook is worse than useless — it’s actively misleading.
- Search doesn’t work. You remember a runbook exists about “that thing where the queue backs up.” Good luck finding it with keyword search when you don’t remember the exact title.
- Context switching. During an incident, switching from terminal → browser → wiki → search → scroll → read adds 5-10 minutes of friction. Every time.
“Our MTTR was 45 minutes. Of that, 30 minutes was finding and reading the right runbook. Only 15 minutes was actually fixing the problem.”
The Solution: DevOps RAG
DevOps RAG is a retrieval-augmented generation system purpose-built for operational knowledge. It ingests all your runbooks — from Git repos, wikis, docs, wherever they live — chunks them, embeds them with OpenAI, and makes them queryable in natural language.
The key difference from generic RAG: DevOps RAG has no UI. Its only interface is MCP (Model Context Protocol), which means any AI-powered coding environment — Claude Code, Cursor, Codex, OpenClaw — can query your operational knowledge without leaving the terminal.
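The "chunks them" step can be pictured with a minimal sketch. It assumes runbooks are Markdown and that one chunk per heading section is an acceptable approximation of semantic chunking; the `Chunk` type and `chunk_markdown` function are hypothetical names for illustration, not DevOps RAG's actual implementation.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


@dataclass
class Chunk:
    source: str   # path of the runbook the chunk came from
    heading: str  # nearest heading above the chunk body
    text: str     # chunk body, heading line excluded


def chunk_markdown(source: str, markdown: str) -> list[Chunk]:
    """Split a Markdown runbook into one chunk per heading section."""
    chunks: list[Chunk] = []
    heading = "(preamble)"
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:  # skip sections that contain no body text
            chunks.append(Chunk(source, heading, text))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence (e.g. shell comments) are never headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(1), []
        else:
            body.append(line)
    flush()
    return chunks
```

Tracking code fences matters for runbooks in particular: a shell comment such as `# Ingest your runbooks` inside a fenced block would otherwise be mistaken for a heading and split a procedure in half.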
Architecture: Git → Chunks → Embeddings → Answers
┌──────────────────────────────────────────────────────────┐
│                    Ingestion Pipeline                    │
│                                                          │
│  Git Repos ──┐                                           │
│  Markdown ───┤                                           │
│  Confluence ─┤──▶ Chunking ──▶ OpenAI ──▶ Vector Store   │
│  Google Docs ┤    (semantic)   Embeddings (Pinecone)     │
│  Slack ──────┘                                           │
│                                                          │
│  Webhook: PR merged → re-index affected runbooks         │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│                      Query Pipeline                      │
│                                                          │
│  User Query ──▶ Embed ──▶ Similarity Search ──▶ Top-K    │
│                           (cosine, k=5)         Chunks   │
│                                                 │        │
│                                            LLM Context   │
│                                                 │        │
│                                        Contextual Answer │
│                                           + Citations    │
└──────────────────────────────────────────────────────────┘
Two design decisions make this system work in practice:
- Git-native ingestion. Runbooks live in Git, not a wiki. Every PR merge triggers re-indexing. DevOps RAG always has the latest version. No more “this runbook is from 2023” surprises.
- MCP-first interface. The primary consumer of operational knowledge in 2026 isn’t a human with a browser — it’s an AI agent with a task. MCP exposes three tools that any compatible agent can call.
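For readers unfamiliar with MCP, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds that envelope by hand to show what an agent sends on the wire. The tool name `ask_devops` and its `question` and `top_k` arguments come from the tool listing later in this article; the `tool_call` helper is a hypothetical name, and a real client would normally use an MCP SDK rather than serialize messages itself.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


print(tool_call("ask_devops", {
    "question": "How do I rollback a failed Kubernetes deployment?",
    "top_k": 5,
}))
```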
MCP Integration: Three Tools, Zero UI
DevOps RAG exposes exactly three MCP tools. This is intentional — fewer tools means agents use them correctly more often:
1. ask_devops — Ask a Question, Get an Answer
// MCP tool: ask_devops
{
  "question": "How do I rollback a failed Kubernetes deployment?",
  "top_k": 5,
  "verbose": true
}
// Response:
{
  "answer": "To rollback a failed Kubernetes deployment:\n\n1. Check current rollout status:\n kubectl rollout status deployment/<name>\n\n2. Rollback to previous version:\n kubectl rollout undo deployment/<name>\n\n3. Verify rollback succeeded:\n kubectl rollout status deployment/<name>\n\n4. If specific revision needed:\n kubectl rollout undo deployment/<name> --to-revision=<n>",
  "citations": [
    {
      "source": "runbooks/kubernetes-rollback.md",
      "chunk": "Section 3: Emergency Rollback Procedure",
      "relevance": 0.94
    },
    {
      "source": "runbooks/deployment-guide.md",
      "chunk": "Section 7: Rollback Strategies",
      "relevance": 0.87
    }
  ]
}
2. search_runbooks — Find Relevant Documents
// MCP tool: search_runbooks
{
  "topic": "database failover"
}
// Response:
{
  "runbooks": [
    {
      "source": "runbooks/postgres-failover.md",
      "title": "PostgreSQL Failover Procedure",
      "relevance": 0.92
    },
    {
      "source": "runbooks/rds-disaster-recovery.md",
      "title": "RDS Multi-AZ Failover",
      "relevance": 0.85
    }
  ]
}
3. list_sources — Inventory Your Knowledge Base
// MCP tool: list_sources
{}
// Response:
{
  "total_chunks": 45,
  "total_sources": 18,
  "sbom_components": 240,
  "sources": [
    "runbooks/kubernetes-rollback.md",
    "runbooks/postgres-failover.md",
    "runbooks/incident-escalation.md",
    "runbooks/deployment-guide.md",
    ...
  ]
}
Setup: 5 Minutes to Queryable Runbooks
Adding DevOps RAG to your coding environment takes one configuration block:
// Claude Code / Cursor / OpenClaw MCP config
{
  "mcpServers": {
    "devops-rag": {
      "command": "node",
      "args": ["/path/to/devops-rag-mcp/index.js"],
      "env": {
        "DEVOPS_RAG_URL": "https://devops-rag.avyay.ai",
        "DEVOPS_RAG_API_KEY": "your-api-key"
      }
    }
  }
}
Or deploy with Docker for self-hosted environments:
# Docker deployment
# (absolute host path for the bind mount; older Docker CLIs reject relative paths)
docker run -d \
  --name devops-rag \
  -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e PINECONE_API_KEY=... \
  -v "$(pwd)/runbooks:/app/runbooks" \
  ghcr.io/gaurav21/devops-rag:latest
# Ingest your runbooks
curl -X POST http://localhost:8080/api/ingest \
  -H "Content-Type: application/json" \
  -d '{"source_dir": "/app/runbooks"}'
# Query
curl http://localhost:8080/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I scale the worker pool?"}'
Real Data: 18 Sources, 45 Chunks, Sub-Second Retrieval
Here’s what our production DevOps RAG instance looks like:
| Metric | Value |
|---|---|
| Total knowledge chunks | 45 |
| Source runbooks | 18 |
| SBOM components tracked | 240 |
| Average query latency | <800ms |
| Embedding model | OpenAI text-embedding-3-small |
| Vector dimensions | 1536 |
| Similarity metric | Cosine |
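To make the "Cosine" row concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is a local, brute-force stand-in for what the vector store does server-side, not the Pinecone API; the `cosine` and `top_k` functions and the 3-dimensional toy vectors are assumptions for illustration (real embeddings in this setup have 1536 dimensions).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 5):
    """Score every chunk against the query and keep the k best matches."""
    scored = [(cosine(query, vec), chunk_id) for chunk_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Toy index: chunk id -> embedding (hypothetical values).
index = {
    "kubernetes-rollback": [1.0, 0.0, 0.0],
    "postgres-failover": [0.0, 1.0, 0.0],
    "deployment-guide": [0.8, 0.6, 0.0],
}
for score, chunk_id in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{score:.2f}  {chunk_id}")
```

With only 45 chunks, even an exhaustive scan like this is a few dozen dot products per query, so at this scale the end-to-end latency is likely dominated by the embedding and LLM calls rather than by the similarity search.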
Example: Incident at 2 AM
Here’s how an incident plays out with DevOps RAG vs without:
| Step | Without DevOps RAG | With DevOps RAG |
|---|---|---|
| Alert fires | Open PagerDuty (2 min) | Open PagerDuty (2 min) |
| Find runbook | Search Confluence, Slack, GitHub (15 min) | ask_devops "queue backing up" (10 sec) |
| Read & understand | Scroll through 20-page doc (10 min) | Get specific steps with context (30 sec) |
| Execute fix | Follow (possibly outdated) steps (15 min) | Follow current, cited steps (12 min) |
| Total MTTR | ~42 min | ~15 min |
The biggest time savings aren’t in the fix itself; they come from eliminating the search-and-read overhead. Instead of spending 25 minutes finding and parsing a runbook, the on-call engineer asks one question, and the query returns actionable steps with citations in under a second.
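As an illustration of what "actionable steps with citations" could look like in a terminal, the sketch below renders a response shaped like the `ask_devops` example earlier in this article (`answer` plus a `citations` list with `source`, `chunk`, and `relevance`). The `render_answer` helper and its `min_relevance` cutoff are hypothetical, not part of DevOps RAG.

```python
def render_answer(response: dict, min_relevance: float = 0.0) -> str:
    """Format an answer followed by its sources, most relevant first."""
    lines = [response["answer"].rstrip(), "", "Sources:"]
    citations = sorted(
        response.get("citations", []),
        key=lambda c: c["relevance"],
        reverse=True,
    )
    for c in citations:
        if c["relevance"] >= min_relevance:  # hide weak matches
            lines.append(f"  [{c['relevance']:.2f}] {c['source']} ({c['chunk']})")
    return "\n".join(lines)


sample = {
    "answer": "Rollback to previous version:\n kubectl rollout undo deployment/<name>",
    "citations": [
        {"source": "runbooks/deployment-guide.md",
         "chunk": "Section 7: Rollback Strategies", "relevance": 0.87},
        {"source": "runbooks/kubernetes-rollback.md",
         "chunk": "Section 3: Emergency Rollback Procedure", "relevance": 0.94},
    ],
}
print(render_answer(sample))
```

Showing the source path next to each step lets the engineer open the full runbook when an answer looks incomplete.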
Always Up to Date: Git-Native Ingestion
The #1 failure mode of internal knowledge systems is stale content. DevOps RAG solves this by treating runbooks as code:
# GitHub Actions workflow: re-index when runbook changes are merged to main
# .github/workflows/reindex-runbooks.yml
name: Re-index Runbooks
on:
  push:
    branches: [main]
    paths:
      - 'runbooks/**'
jobs:
  reindex:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Trigger re-indexing
        run: |
          curl -X POST https://devops-rag.avyay.ai/api/ingest \
            -H "Authorization: Bearer ${{ secrets.DEVOPS_RAG_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{"source_dir": "runbooks/", "force": true}'
When an engineer updates a runbook (via PR, reviewed and merged like any other code change), this workflow triggers re-indexing automatically. The vector embeddings are refreshed within minutes. No manual sync. No “remember to update the wiki.”
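The architecture diagram mentions re-indexing only the affected runbooks. One way to decide which files are affected is to compare content hashes between runs, sketched below. This is an assumption about how such a step could work, not a description of DevOps RAG's ingest endpoint; `fingerprint` and `plan_reindex` are hypothetical names.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each Markdown file under `root` to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.md"))
    }


def plan_reindex(previous: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (files to re-embed, files whose chunks should be deleted)."""
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed
```

A caller would persist the previous fingerprint (for example as a JSON file next to the index), compute the current one after each merge, and re-embed only the `changed` list. Unchanged runbooks then cost no embedding calls.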
The Results: 67% MTTR Reduction
| Metric | Before | After | Change |
|---|---|---|---|
| Mean time to resolution | 45 min | 15 min | -67% |
| Time finding runbooks | 25 min | <1 min | -96% |
| Runbook coverage | ~60% (unknown gaps) | 100% (auditable) | +40 pp |
| Runbook freshness | Months (manual updates) | Minutes (auto-reindex) | Months → minutes |
| Knowledge accessibility | Browser + search | Terminal + natural language | — |
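The percentages in the table follow directly from the before and after values. A quick check, treating "<1 min" as an upper bound of 1 minute (an assumption, so the true reduction is at least the figure shown):

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


mttr = round(pct_change(45, 15))        # mean time to resolution
finding = round(pct_change(25, 1))      # time finding runbooks
coverage_points = 100 - 60              # coverage gain in percentage points

print(f"MTTR: {mttr}%")                       # MTTR: -67%
print(f"Finding runbooks: {finding}%")        # Finding runbooks: -96%
print(f"Coverage: +{coverage_points} pp")     # Coverage: +40 pp
```

Coverage is expressed in percentage points because it is a difference between two percentages. As a relative change, going from 60% to 100% would be roughly +67%.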
The Compound Effect
The MTTR improvement is the headline number, but the real value compounds over time:
- New engineers onboard faster. Instead of “go read the wiki,” they ask questions and get answers. The learning curve for operational knowledge drops from weeks to days.
- Runbooks actually get written. When runbooks are consumed by AI (not humans scrolling), there’s less pressure for perfect formatting and more emphasis on accurate content. Engineers write more because the friction is lower.
- Agents handle routine incidents. When MĀRGA detects a provider outage, the on-call agent queries DevOps RAG for the failover procedure and executes it autonomously. Human intervention needed only for novel incidents.
- SBOM tracking comes free. With 240 components tracked, DevOps RAG also serves as an inventory of your software supply chain — queryable with the same natural language interface.
Get Started with DevOps RAG
# Option 1: Docker (self-hosted)
docker pull ghcr.io/gaurav21/devops-rag:latest
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -v "$(pwd)/runbooks:/app/runbooks" \
  ghcr.io/gaurav21/devops-rag:latest
# Option 2: MCP server (for Claude Code / Cursor)
npm install -g @avyay/devops-rag-mcp
# Option 3: REST API
curl https://devops-rag.avyay.ai/api/ask \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I rollback a deployment?"}'
- MCP Server: Available for Claude Code, Cursor, and OpenClaw
- Documentation: docs.avyay.ai/devops-rag
- REST API: Docker deployment or managed endpoint
Gaurav Sharma is the founder of Avyay (अव्यय). DevOps RAG is part of the Avyay platform’s operational intelligence layer. Read the full architecture at avyay.ai/blog/avyay-architecture.