Build an AI Slack Bot That Answers From Your Runbooks (RAG + Ollama Guide)
The Problem Every DevOps Team Has
Someone asks in Slack: “How do we failover the database to the standby region?” The answer exists — buried in a runbook written six months ago, in a Confluence page nobody bookmarked, or in the head of the engineer who is currently on vacation.
This happens multiple times per week on every team. The knowledge exists. Finding it under pressure is the problem.
A RAG-powered Slack bot fixes this. It reads your runbooks, wiki pages, and incident postmortems, then answers questions from that knowledge base — in Slack, where your team already works. Running on Ollama means the entire system stays private. No company data touches external APIs.
An AI Slack bot that answers from your actual runbooks — not generic internet knowledge
How RAG Works (30-Second Explanation)
RAG (Retrieval Augmented Generation) is a simple concept:
- Index your documents — runbooks, wiki pages, incident reports — into a vector database
- When someone asks a question, search the vector database for the most relevant document chunks
- Feed those chunks to the LLM along with the question
- The LLM generates an answer based on your actual documentation, not its training data
The result: an AI that gives answers grounded in your specific infrastructure, processes, and tooling — not generic DevOps advice from the internet.
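The four steps can be sketched with a toy retriever that scores chunks by word overlap instead of real embeddings — an illustration only; the actual pipeline below uses ChromaDB and an LLM:

```python
# Toy RAG loop: score chunks by word overlap instead of real embeddings.
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    context = "\n---\n".join(context_chunks)
    return f"Answer ONLY from this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "To failover the database, run: aws rds failover-db-cluster --db-cluster-identifier prod",
    "The payment service is restarted with: kubectl rollout restart deploy/payments",
]
top = retrieve("How do we failover the database?", chunks)
print(build_prompt("How do we failover the database?", top))
```

Real embeddings replace the word-overlap score with vector similarity, but the shape of the loop — retrieve, assemble context, prompt — is exactly this.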
Architecture Overview
Slack (user asks question)
|
v
Bot Server (Python/FastAPI)
|
v
Vector Search (ChromaDB)
|-- finds relevant doc chunks
v
Ollama (local LLM)
|-- generates answer from chunks
v
Slack (bot replies with answer + source links)
All components run on your infrastructure. Nothing leaves your network.
Step 1: Set Up the Document Pipeline
First, collect your documentation into a single directory:
mkdir -p ~/rag-bot/docs
# Copy runbooks
cp /path/to/runbooks/*.md ~/rag-bot/docs/
# Export Confluence pages (use Confluence API or export)
# Export Notion pages (use Notion API or export as markdown)
The bot works best with markdown files. Convert other formats:
# PDF to markdown (marker-pdf's single-file CLI; flags vary by version)
pip install marker-pdf
marker_single ~/rag-bot/docs/incident-report.pdf --output_dir ~/rag-bot/docs/
# HTML to markdown
pip install html2text
html2text page.html > ~/rag-bot/docs/page.md
Step 2: Build the Vector Index
Install dependencies:
pip install chromadb sentence-transformers langchain langchain-community
Create the indexing script:
# index_docs.py
import os
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
DOCS_DIR = "./docs"
DB_DIR = "./vectordb"
# Load all markdown files
loader = DirectoryLoader(DOCS_DIR, glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
print(f"Loaded {len(documents)} documents")
# Split into chunks (500 characters with overlap for context)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n## ", "\n### ", "\n\n", "\n", " "]
)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
# Create embeddings and store
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=DB_DIR
)
print(f"Vector database created at {DB_DIR}")
Run it:
python index_docs.py
This creates a local vector database from your documentation. The all-MiniLM-L6-v2 embedding model runs on CPU and is fast enough for this use case.
Step 3: Build the RAG Query Engine
# rag_engine.py
import requests
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
DB_DIR = "./vectordb"
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.1:8b"
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma(persist_directory=DB_DIR, embedding_function=embeddings)
def ask(question: str) -> dict:
    # Search for relevant chunks
    results = vectordb.similarity_search(question, k=4)

    # Build context from chunks
    context = "\n\n---\n\n".join([doc.page_content for doc in results])
    sources = list(set([doc.metadata.get("source", "unknown") for doc in results]))

    # Build prompt
    prompt = f"""You are a DevOps assistant that answers questions based on
internal documentation. Use ONLY the context provided below to answer.
If the answer is not in the context, say "I could not find this in our
documentation."

Context:
{context}

Question: {question}

Answer concisely and include specific commands or steps when relevant."""

    # Query Ollama (local generation can take a while, so allow a generous timeout)
    response = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": prompt,
        "stream": False
    }, timeout=120)
    response.raise_for_status()
    answer = response.json()["response"]

    return {
        "answer": answer,
        "sources": sources
    }
Test it locally:
python -c "from rag_engine import ask; print(ask('How do we failover the database?'))"
The RAG pipeline indexes your documentation and retrieves relevant chunks for each question
Step 4: Create the Slack Bot
Create a Slack App at api.slack.com/apps:
- Click Create New App and select From scratch
- Name it “RunbookBot” and select your workspace
- Under OAuth & Permissions, add these Bot Token Scopes:
app_mentions:read, chat:write, channels:history
- Under Event Subscriptions, enable events and subscribe to the app_mention event
- Install the app to your workspace and copy the Bot User OAuth Token
Build the bot server:
# bot.py
import os

from fastapi import BackgroundTasks, FastAPI, Request
from slack_sdk import WebClient

from rag_engine import ask

app = FastAPI()
slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def answer_mention(event: dict):
    question = event["text"].split(">", 1)[-1].strip()
    channel = event["channel"]
    thread_ts = event.get("thread_ts", event["ts"])

    # Query the RAG engine
    result = ask(question)

    # Format response
    source_list = "\n".join([f"- `{s}`" for s in result["sources"]])
    message = f"{result['answer']}\n\n*Sources:*\n{source_list}"

    slack.chat_postMessage(
        channel=channel,
        thread_ts=thread_ts,
        text=message
    )

@app.post("/slack/events")
async def handle_event(request: Request, background_tasks: BackgroundTasks):
    body = await request.json()

    # Slack URL verification
    if body.get("type") == "url_verification":
        return {"challenge": body["challenge"]}

    # Slack retries any event not acknowledged within 3 seconds, and the
    # RAG answer takes longer than that. Ack immediately, generate the
    # answer in the background, and ignore retry deliveries so the bot
    # does not answer the same question twice.
    if request.headers.get("X-Slack-Retry-Num"):
        return {"ok": True}

    event = body.get("event", {})
    if event.get("type") == "app_mention":
        background_tasks.add_task(answer_mention, event)

    return {"ok": True}
Run it:
pip install fastapi uvicorn slack-sdk
export SLACK_BOT_TOKEN="xoxb-your-token-here"
uvicorn bot:app --host 0.0.0.0 --port 8000
Expose it with ngrok for development or deploy behind your reverse proxy for production.
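One rough edge worth knowing about: the handler strips the bot mention by splitting on `>`, which misfires if the question itself contains a `>`. A more robust strip removes the mention tokens explicitly (this sketch assumes the standard `<@USERID>` mention format):

```python
import re

def extract_question(text: str) -> str:
    """Remove all <@USERID> mention tokens from an app_mention event's text."""
    return re.sub(r"<@[A-Z0-9]+>", "", text).strip()

print(extract_question("<@U0123ABCD> how do we failover the database?"))
# how do we failover the database?
```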
Step 5: Deploy with Docker Compose
For production, run the entire stack with Docker:
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: rag-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          memory: 8G

  bot:
    build: .
    container_name: rag-slack-bot
    ports:
      - "8000:8000"
    environment:
      - SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN}
      - OLLAMA_URL=http://ollama:11434/api/generate
    volumes:
      - ./vectordb:/app/vectordb
      - ./docs:/app/docs
    depends_on:
      - ollama

volumes:
  ollama-data:
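The `build: .` line expects a Dockerfile for the bot image. A minimal sketch — the file names match the scripts above, and you would pin dependency versions for a real deployment:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install heavy dependencies first so Docker layer caching
# survives changes to the application code
RUN pip install --no-cache-dir fastapi uvicorn slack-sdk requests \
    chromadb sentence-transformers langchain langchain-community

COPY rag_engine.py bot.py index_docs.py ./

EXPOSE 8000
CMD ["uvicorn", "bot:app", "--host", "0.0.0.0", "--port", "8000"]
```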
Step 6: Keep the Index Updated
Set up a cron job to re-index when documentation changes:
# Re-index every night at 2 AM
0 2 * * * cd /opt/rag-bot && python index_docs.py >> /var/log/rag-index.log 2>&1
For Confluence or Notion, write a sync script that pulls updated pages before re-indexing:
# sync_docs.py
import subprocess
# Pull latest runbooks from git
subprocess.run(["git", "-C", "./docs/runbooks", "pull"])
# Re-index
subprocess.run(["python", "index_docs.py"])
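Re-indexing every night is wasteful if nothing changed. One option is to fingerprint the docs directory and rebuild only on a difference — a sketch, with the state-file path and script name as assumptions:

```python
import hashlib
import subprocess
from pathlib import Path

STATE_FILE = Path("./.docs-hash")  # hypothetical location for the last-seen digest

def docs_fingerprint(docs_dir: str = "./docs") -> str:
    """Hash every markdown file's path and contents into a single digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(docs_dir).rglob("*.md")):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

def reindex_if_changed():
    current = docs_fingerprint()
    previous = STATE_FILE.read_text() if STATE_FILE.exists() else ""
    if current != previous:
        subprocess.run(["python", "index_docs.py"], check=True)
        STATE_FILE.write_text(current)

if __name__ == "__main__":
    reindex_if_changed()
```

Point the cron job at this wrapper instead of index_docs.py and unchanged nights become a no-op.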
Real-World Usage Examples
After deploying this for a team of 8 engineers, these were the most common queries:
| Query | Response Time | Accuracy |
|---|---|---|
| "How do we restart the payment service?" | 3 sec | Correct, included exact kubectl commands |
| "What is the RDS failover procedure?" | 4 sec | Correct, referenced the DR runbook steps |
| "Who is on-call this week?" | 2 sec | Correct (pulled from on-call schedule doc) |
| "What caused the outage on March 15?" | 5 sec | Correct, summarized the postmortem |
| "How do I access the staging database?" | 3 sec | Correct, included connection string format |
The bot answers 80-85% of routine questions correctly. For the remaining 15-20%, it says “I could not find this in our documentation” — which is the correct behavior. No hallucination.
Performance Tips
Chunk size matters. 500 characters per chunk works well for runbooks. For longer documents like architecture docs, increase to 1000 characters with a 100-character overlap.
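The effect of chunk size and overlap is easy to see with a plain sliding-window sketch (RecursiveCharacterTextSplitter is smarter — it prefers to break on the separators listed in Step 2 — but the window mechanics are the same):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Naive sliding-window chunking: each chunk repeats the last
    `overlap` characters of the previous one so context is not cut mid-thought."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1200
chunks = chunk_text(doc)
print(len(chunks), [len(c) for c in chunks])
# 3 [500, 500, 300]
```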
Model selection: Llama 3.1 8B is the best balance for this use case. DeepSeek R1 is better for complex reasoning but slower — use it only if your runbooks require multi-step analysis.
Embedding model: all-MiniLM-L6-v2 is fast and accurate for technical documentation. If accuracy is critical, upgrade to all-mpnet-base-v2 (slower but better retrieval).
Response caching: For repeated questions, add a simple cache layer to avoid re-querying Ollama.
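A minimal version of that cache layer: normalize the question, hash it, and expire entries after a TTL. Everything here is an illustrative assumption, not part of the code above:

```python
import hashlib
import time

CACHE_TTL = 3600  # seconds before a cached answer goes stale
_cache: dict[str, tuple[float, dict]] = {}

def cached_ask(question: str, ask_fn) -> dict:
    """Wrap the RAG ask function with an in-memory TTL cache."""
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]
    result = ask_fn(question)
    _cache[key] = (time.time(), result)
    return result
```

Repeated questions ("how do we restart the payment service?") then skip retrieval and generation entirely until the TTL expires.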
Key Takeaways
- RAG lets your AI bot answer from your actual documentation, not generic training data
- The entire stack (Ollama + ChromaDB + Slack bot) runs on your infrastructure with complete privacy
- 500-character chunks with overlap work best for technical runbooks
- The bot correctly answers 80-85% of routine DevOps questions
- Set up automated re-indexing to keep answers current
- Llama 3.1 8B provides the best speed/quality balance for this use case
- Total setup time: 2-3 hours for a working prototype
FAQ
How much RAM does the full stack need?
Ollama with Llama 3.1 8B needs 8GB. ChromaDB with 1000 documents uses under 1GB. The bot server uses negligible resources. Total: 10-12GB RAM for a comfortable setup. A machine with 16GB handles everything with room to spare.
Can it answer questions about things not in the documentation?
No — by design. The prompt instructs the model to only answer from the provided context. If the answer is not in your indexed documents, the bot responds with “I could not find this in our documentation.” This prevents hallucination, which is critical for infrastructure operations.
How do I add new documents?
Drop markdown files into the docs/ directory and re-run python index_docs.py. The vector database rebuilds in seconds for typical documentation sets (under 1000 files). For continuous updates, use the cron-based sync described in Step 6.
Can multiple Slack channels use the same bot?
Yes. Install the bot to any channel and mention it with @RunbookBot. Each channel can ask different questions — the bot searches the same knowledge base. You can also create separate knowledge bases per team by running multiple bot instances with different document directories.
How does this compare to Slack’s built-in AI?
Slack AI searches message history. This bot searches your structured documentation — runbooks, postmortems, architecture docs. The answers are grounded in verified documentation, not ad-hoc Slack conversations that may be outdated or incorrect.
Conclusion
A RAG-powered Slack bot transforms how your team accesses operational knowledge. Instead of digging through Confluence, asking around in channels, or pinging the on-call engineer, anyone can get accurate answers from your actual documentation in seconds.
The setup takes an afternoon. The knowledge base improves as you add more documentation. And since everything runs on your own infrastructure with Ollama, no company data ever leaves your network.
Need help building an AI-powered operations bot for your team? View our consulting services
Read next: AIOps Explained: How AI Is Transforming Incident Response