Build an AI Slack Bot That Answers From Your Runbooks (RAG + Ollama Guide)
The Problem Every DevOps Team Has
Someone asks in Slack: “How do we failover the database to the standby region?” The answer exists — buried in a runbook written six months ago, in a Confluence page nobody bookmarked, or in the head of the engineer who is currently on vacation.
This happens multiple times per week on every team. The knowledge exists. Finding it under pressure is the problem.
A RAG-powered Slack bot fixes this. It reads your runbooks, wiki pages, and incident postmortems, then answers questions from that knowledge base — in Slack, where your team already works. Running on Ollama means the entire system stays private. No company data touches external APIs.
An AI Slack bot that answers from your actual runbooks — not generic internet knowledge
How RAG Works (30-Second Explanation)
RAG (Retrieval Augmented Generation) is a simple concept:
- Index your documents — runbooks, wiki pages, incident reports — into a vector database
- When someone asks a question, search the vector database for the most relevant document chunks
- Feed those chunks to the LLM along with the question
- The LLM generates an answer based on your actual documentation, not its training data
The result: an AI that gives answers grounded in your specific infrastructure, processes, and tooling — not generic DevOps advice from the internet.
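The four steps can be sketched with a toy retriever that scores chunks by word overlap instead of real embeddings — an illustration only; the actual pipeline below uses ChromaDB and an LLM:

```python
# Toy RAG loop: score chunks by word overlap instead of real embeddings.
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    context = "\n---\n".join(context_chunks)
    return f"Answer ONLY from this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "To failover the database, run: aws rds failover-db-cluster --db-cluster-identifier prod",
    "The payment service is restarted with: kubectl rollout restart deploy/payments",
]
top = retrieve("How do we failover the database?", chunks)
print(build_prompt("How do we failover the database?", top))
```

Real embeddings replace the word-overlap score with vector similarity, but the shape of the loop — retrieve, assemble context, prompt — is exactly this.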
Architecture Overview
Slack (user asks question)
|
v
Bot Server (Python/FastAPI)
|
v
Vector Search (ChromaDB)
|-- finds relevant doc chunks
v
Ollama (local LLM)
|-- generates answer from chunks
v
Slack (bot replies with answer + source links)
All components run on your infrastructure. Nothing leaves your network.
Step 1: Set Up the Document Pipeline
First, collect your documentation into a single directory:
mkdir -p ~/rag-bot/docs
# Copy runbooks
cp /path/to/runbooks/*.md ~/rag-bot/docs/
# Export Confluence pages (use Confluence API or export)
# Export Notion pages (use Notion API or export as markdown)
The bot works best with markdown files. Convert other formats:
# PDF to markdown (marker-pdf's single-file CLI; flags vary by version)
pip install marker-pdf
marker_single ~/rag-bot/docs/incident-report.pdf --output_dir ~/rag-bot/docs/
# HTML to markdown
pip install html2text
html2text page.html > ~/rag-bot/docs/page.md
Step 2: Build the Vector Index
Install dependencies:
pip install chromadb sentence-transformers langchain langchain-community
Create the indexing script:
# index_docs.py
import os
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
DOCS_DIR = "./docs"
DB_DIR = "./vectordb"
# Load all markdown files
loader = DirectoryLoader(DOCS_DIR, glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
print(f"Loaded {len(documents)} documents")
# Split into chunks (500 characters with overlap for context)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n## ", "\n### ", "\n\n", "\n", " "]
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
# Create embeddings and store
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=DB_DIR
)
print(f"Vector database created at {DB_DIR}")
Run it:
python index_docs.py
This creates a local vector database from your documentation. The all-MiniLM-L6-v2 embedding model runs on CPU and is fast enough for this use case.
Step 3: Build the RAG Query Engine
# rag_engine.py
import requests
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
DB_DIR = "./vectordb"
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma(persist_directory=DB_DIR, embedding_function=embeddings)
def ask(question: str) -> dict:
    # Search for relevant chunks
    results = vectordb.similarity_search(question, k=4)

    # Build context from chunks
    context = "\n\n---\n\n".join([doc.page_content for doc in results])
    sources = list(set([doc.metadata.get("source", "unknown") for doc in results]))

    # Build prompt
    prompt = f"""You are a DevOps assistant that answers questions based on
internal documentation. Use ONLY the context provided below to answer.
If the answer is not in the context, say "I could not find this in our
documentation."

Context:
{context}

Question: {question}

Answer concisely and include specific commands or steps when relevant."""

    # Query Ollama (local generation can take a while, so allow a generous timeout)
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False
    }, timeout=120)
    response.raise_for_status()
    answer = response.json()["response"]

    return {
        "answer": answer,
        "sources": sources
    }
Test it locally:
python -c "from rag_engine import ask; print(ask('How do we failover the database?'))"
The RAG pipeline indexes your documentation and retrieves relevant chunks for each question
Step 4: Create the Slack Bot
Create a Slack App at api.slack.com/apps:
- Click Create New App and select From scratch
- Name it “RunbookBot” and select your workspace
- Under OAuth & Permissions, add these Bot Token Scopes:
app_mentions:read, chat:write, channels:history
- Under Event Subscriptions, enable events and subscribe to the app_mention event
- Install the app to your workspace and copy the Bot User OAuth Token
Build the bot server:
# bot.py
import os

from fastapi import BackgroundTasks, FastAPI, Request
from slack_sdk import WebClient

from rag_engine import ask

app = FastAPI()
slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def answer_mention(event: dict):
    question = event["text"].split(">", 1)[-1].strip()
    channel = event["channel"]
    thread_ts = event.get("thread_ts", event["ts"])

    # Query the RAG engine
    result = ask(question)

    # Format response
    source_list = "\n".join([f"- `{s}`" for s in result["sources"]])
    message = f"{result['answer']}\n\n*Sources:*\n{source_list}"

    slack.chat_postMessage(
        channel=channel,
        thread_ts=thread_ts,
        text=message
    )

@app.post("/slack/events")
async def handle_event(request: Request, background_tasks: BackgroundTasks):
    body = await request.json()

    # Slack URL verification
    if body.get("type") == "url_verification":
        return {"challenge": body["challenge"]}

    # Slack retries any event not acknowledged within 3 seconds, and the
    # RAG answer takes longer than that. Ack immediately, generate the
    # answer in the background, and ignore retry deliveries so the bot
    # does not answer the same question twice.
    if request.headers.get("X-Slack-Retry-Num"):
        return {"ok": True}

    event = body.get("event", {})
    if event.get("type") == "app_mention":
        background_tasks.add_task(answer_mention, event)

    return {"ok": True}
Run it:
pip install fastapi uvicorn slack-sdk
export SLACK_BOT_TOKEN="xoxb-your-token-here"
uvicorn bot:app --host 0.0.0.0 --port 8000
Expose it with ngrok for development or deploy behind your reverse proxy for production.
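One rough edge worth knowing about: the handler strips the bot mention by splitting on `>`, which misfires if the question itself contains a `>`. A more robust strip removes the mention tokens explicitly (this sketch assumes the standard `<@USERID>` mention format):

```python
import re

def extract_question(text: str) -> str:
    """Remove all <@USERID> mention tokens from an app_mention event's text."""
    return re.sub(r"<@[A-Z0-9]+>", "", text).strip()

print(extract_question("<@U0123ABCD> how do we failover the database?"))
# how do we failover the database?
```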
Step 5: Deploy with Docker Compose
For production, run the entire stack with Docker:
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: rag-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          memory: 8G

  bot:
    build: .
    container_name: rag-slack-bot
    ports:
      - "8000:8000"
    environment:
      - SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN}
      - OLLAMA_URL=http://ollama:11434/api/generate
    volumes:
      - ./vectordb:/app/vectordb
      - ./docs:/app/docs
    depends_on:
      - ollama

volumes:
  ollama-data:
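The `build: .` line expects a Dockerfile for the bot image. A minimal sketch — the file names match the scripts above, and you would pin dependency versions for a real deployment:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install heavy dependencies first so Docker layer caching
# survives changes to the application code
RUN pip install --no-cache-dir fastapi uvicorn slack-sdk requests \
    chromadb sentence-transformers langchain langchain-community

COPY rag_engine.py bot.py index_docs.py ./

EXPOSE 8000
CMD ["uvicorn", "bot:app", "--host", "0.0.0.0", "--port", "8000"]
```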
Step 6: Keep the Index Updated
Set up a cron job to re-index when documentation changes:
# Re-index every night at 2 AM
0 2 * * * cd /opt/rag-bot && python index_docs.py >> /var/log/rag-index.log 2>&1
For Confluence or Notion, write a sync script that pulls updated pages before re-indexing:
# sync_docs.py
import subprocess
# Pull latest runbooks from git
subprocess.run(["git", "-C", "./docs/runbooks", "pull"])
# Re-index
subprocess.run(["python", "index_docs.py"])
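Re-indexing every night is wasteful if nothing changed. One option is to fingerprint the docs directory and rebuild only on a difference — a sketch, with the state-file path and script name as assumptions:

```python
import hashlib
import subprocess
from pathlib import Path

STATE_FILE = Path("./.docs-hash")  # hypothetical location for the last-seen digest

def docs_fingerprint(docs_dir: str = "./docs") -> str:
    """Hash every markdown file's path and contents into a single digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(docs_dir).rglob("*.md")):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

def reindex_if_changed():
    current = docs_fingerprint()
    previous = STATE_FILE.read_text() if STATE_FILE.exists() else ""
    if current != previous:
        subprocess.run(["python", "index_docs.py"], check=True)
        STATE_FILE.write_text(current)

if __name__ == "__main__":
    reindex_if_changed()
```

Point the cron job at this wrapper instead of index_docs.py and unchanged nights become a no-op.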
Real-World Usage Examples
After deploying this for a team of 8 engineers, these were the most common queries:
| Query | Response Time | Accuracy |
|---|---|---|
| "How do we restart the payment service?" | 3 sec | Correct, included exact kubectl commands |
| "What is the RDS failover procedure?" | 4 sec | Correct, referenced the DR runbook steps |
| "Who is on-call this week?" | 2 sec | Correct (pulled from on-call schedule doc) |
| "What caused the outage on March 15?" | 5 sec | Correct, summarized the postmortem |
| "How do I access the staging database?" | 3 sec | Correct, included connection string format |
The bot answers 80-85% of routine questions correctly. For the remaining 15-20%, it says “I could not find this in our documentation” — which is the correct behavior. No hallucination.
Performance Tips
Chunk size matters. 500 characters per chunk works well for runbooks. For longer documents like architecture docs, increase to 1000 characters with a 100-character overlap.
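The effect of chunk size and overlap is easy to see with a plain sliding-window sketch (RecursiveCharacterTextSplitter is smarter — it prefers to break on the separators listed in Step 2 — but the window mechanics are the same):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive sliding-window chunking: each chunk repeats the last
    `overlap` characters of the previous one so context is not cut mid-thought."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1200
chunks = chunk_text(doc)
print(len(chunks), [len(c) for c in chunks])
# 3 [500, 500, 300]
```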
Model selection: Llama 3.1 8B is the best balance for this use case. DeepSeek R1 is better for complex reasoning but slower — use it only if your runbooks require multi-step analysis.
Embedding model: all-MiniLM-L6-v2 is fast and accurate for technical documentation. If accuracy is critical, upgrade to all-mpnet-base-v2 (slower but better retrieval).
Response caching: For repeated questions, add a simple cache layer to avoid re-querying Ollama.
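A minimal version of that cache layer: normalize the question, hash it, and expire entries after a TTL. Everything here is an illustrative assumption, not part of the code above:

```python
import hashlib
import time

CACHE_TTL = 3600  # seconds before a cached answer goes stale
_cache: dict[str, tuple[float, dict]] = {}

def cached_ask(question: str, ask_fn) -> dict:
    """Wrap the RAG ask function with an in-memory TTL cache."""
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]
    result = ask_fn(question)
    _cache[key] = (time.time(), result)
    return result
```

Repeated questions ("how do we restart the payment service?") then skip retrieval and generation entirely until the TTL expires.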
Key Takeaways
- RAG lets your AI bot answer from your actual documentation, not generic training data
- The entire stack (Ollama + ChromaDB + Slack bot) runs on your infrastructure with complete privacy
- 500-character chunks with overlap work best for technical runbooks
- The bot correctly answers 80-85% of routine DevOps questions
- Set up automated re-indexing to keep answers current
- Llama 3.1 8B provides the best speed/quality balance for this use case
- Total setup time: 2-3 hours for a working prototype
FAQ
How much RAM does the full stack need?
Ollama with Llama 3.1 8B needs 8GB. ChromaDB with 1000 documents uses under 1GB. The bot server uses negligible resources. Total: 10-12GB RAM for a comfortable setup. A machine with 16GB handles everything with room to spare.
Can it answer questions about things not in the documentation?
No — by design. The prompt instructs the model to only answer from the provided context. If the answer is not in your indexed documents, the bot responds with “I could not find this in our documentation.” This prevents hallucination, which is critical for infrastructure operations.
How do I add new documents?
Drop markdown files into the docs/ directory and re-run python index_docs.py. The vector database rebuilds in seconds for typical documentation sets (under 1000 files). For continuous updates, use the cron-based sync described in Step 6.
Can multiple Slack channels use the same bot?
Yes. Install the bot to any channel and mention it with @RunbookBot. Each channel can ask different questions — the bot searches the same knowledge base. You can also create separate knowledge bases per team by running multiple bot instances with different document directories.
How does this compare to Slack’s built-in AI?
Slack AI searches message history. This bot searches your structured documentation — runbooks, postmortems, architecture docs. The answers are grounded in verified documentation, not ad-hoc Slack conversations that may be outdated or incorrect.
Conclusion
A RAG-powered Slack bot transforms how your team accesses operational knowledge. Instead of digging through Confluence, asking around in channels, or pinging the on-call engineer, anyone can get accurate answers from your actual documentation in seconds.
The setup takes an afternoon. The knowledge base improves as you add more documentation. And since everything runs on your own infrastructure with Ollama, no company data ever leaves your network.
Need help building an AI-powered operations bot for your team? View our consulting services
Read next: AIOps Explained: How AI Is Transforming Incident Response