
Build Your Own Private ChatGPT with Open WebUI and Ollama (2026 Guide)

Why You Need a Private ChatGPT

Every message sent to ChatGPT, Claude, or Gemini passes through external servers. For personal use, that is fine. For internal company knowledge, client data, proprietary code, or anything covered by compliance requirements — it is a risk.

A private ChatGPT gives you the same conversational AI experience with one critical difference: nothing leaves your server. No API costs scaling with usage. No vendor lock-in. No terms of service changes breaking your workflow overnight.

After more than a year of running this setup on a Proxmox homelab, I can say the combination of Open WebUI and Ollama is the most production-ready self-hosted AI stack available today.

Private AI means your data stays on your hardware — no third-party APIs required

What You Are Building

The stack has two components:

  • Ollama — runs large language models locally and exposes an API on port 11434
  • Open WebUI — a browser-based chat interface that connects to Ollama, supports multiple users, conversation history, file uploads, and model switching

The result is a ChatGPT-like interface accessible from any browser on your network, powered entirely by models running on your own hardware.

Hardware Requirements

You do not need a GPU for smaller models. CPU inference on quantized models is completely viable for personal and small team use.

Setup         RAM     Storage   Models You Can Run
Minimum       8GB     20GB      Phi-3 mini, Gemma 2B, TinyLlama
Recommended   16GB    50GB      Llama 3.1 8B, Mistral 7B, DeepSeek R1 8B
Power user    32GB+   100GB+    Llama 3.1 70B (Q4), Mixtral, CodeLlama 34B

For teams of 2-5 people, a machine with 32GB RAM and an older NVIDIA GPU (RTX 3060 12GB or better) handles concurrent requests well.
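As a rough sizing check (a rule of thumb, not an official Ollama figure): Q4-quantized weights take about half a gigabyte per billion parameters, plus 1-2 GB of runtime overhead. That estimate can be scripted:

```shell
# Rough RAM estimate for a Q4-quantized model (rule of thumb:
# ~0.5 GB per billion parameters for weights, plus ~1.5 GB overhead)
estimate_ram() {
  awk -v p="$1" 'BEGIN { printf "%.1f GB\n", p * 0.5 + 1.5 }'
}

estimate_ram 8    # Llama 3.1 8B  -> 5.5 GB
estimate_ram 70   # Llama 3.1 70B -> 36.5 GB
```

The numbers line up with practice: the 4.7GB llama3.1:8b download fits comfortably in 8GB of RAM, while 70B models need the 32GB+ tier even at Q4.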

Step 1: Install Ollama

On Linux or macOS:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version

Pull your first model:

ollama pull llama3.1:8b

This downloads the Llama 3.1 8B model (4.7GB). Test it:

ollama run llama3.1:8b "Explain Kubernetes pods in one paragraph"

If you get a response, Ollama is working.

Step 2: Deploy Open WebUI with Docker

Open WebUI runs as a single Docker container. Create a docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui-data:

Start it:

docker compose up -d

Open http://your-server-ip:3000 in your browser. Create an admin account on first visit — this becomes the system administrator.
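Optionally, add a healthcheck so docker ps reports whether the UI is actually serving, not just whether the container started. A sketch using the container's own Python interpreter (it assumes current Open WebUI images expose a /health endpoint — verify against your image version):

```yaml
services:
  open-webui:
    # ...existing keys from the compose file above...
    healthcheck:
      test: ["CMD-SHELL", "python3 -c 'import urllib.request; urllib.request.urlopen(\"http://localhost:8080/health\")'"]
      interval: 30s
      timeout: 5s
      retries: 3
```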

Open WebUI deploys as a single Docker container — production-ready in minutes

Step 3: Configure Models

Open WebUI automatically detects all models available in Ollama. Pull additional models based on your use case:

# General purpose conversation
ollama pull llama3.1:8b

# Coding assistance
ollama pull codellama:13b

# Fast responses for simple tasks
ollama pull phi3:mini

# Reasoning and analysis
ollama pull deepseek-r1:8b

In the Open WebUI interface, click the model selector at the top of any conversation to switch between models.
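Pulled models can also be customized: an Ollama Modelfile bakes a system prompt and sampling parameters into a new named model that shows up in the selector like any other. A sketch (the support-bot name and prompt are hypothetical examples):

```shell
# Write a Modelfile that layers a system prompt on top of a base model
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER temperature 0.3
SYSTEM "You are an internal IT support assistant. Answer concisely."
EOF

# Register and use it (requires a running Ollama):
# ollama create support-bot -f Modelfile
# ollama run support-bot "My VPN will not connect"
```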

Step 4: Multi-User Setup

Open WebUI supports multiple users out of the box. As admin, go to Admin Panel and configure:

  • Default Sign Up Role: Set to “User” or “Pending” (requires admin approval)
  • Default Model: Choose the model new users see first
  • Enable Signup: Toggle on for team access

Each user gets their own conversation history, saved prompts, and preferences. No data crosses between accounts.
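These admin settings can also be pinned as environment variables in the compose file, so they survive container recreation. A sketch using Open WebUI's documented variable names (check your release notes — names occasionally change between versions):

```yaml
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - ENABLE_SIGNUP=true
      - DEFAULT_USER_ROLE=pending   # new accounts wait for admin approval
```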

Step 5: Secure the Deployment

For production use beyond localhost, add these configurations:

Reverse Proxy with Nginx

server {
    listen 443 ssl;
    server_name ai.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Restrict Ollama to Localhost

By default, Ollama listens only on 127.0.0.1:11434. Do not expose this port to the internet. One nuance with the Docker setup above: the container reaches the host through the Docker gateway (typically 172.17.0.1), not loopback, so if Open WebUI cannot connect you may need to set OLLAMA_HOST to 0.0.0.0 — and if you do, make sure your firewall blocks port 11434 from everything except the Docker bridge.
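Where Ollama binds is controlled by the OLLAMA_HOST environment variable. On a systemd-managed install (the Linux install script sets one up), a drop-in is the standard place to pin it — a sketch:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# created with: sudo systemctl edit ollama
[Service]
# Loopback only (the default); widen to 0.0.0.0:11434 only if containers
# or other hosts must reach Ollama directly, and firewall the port if so
Environment="OLLAMA_HOST=127.0.0.1:11434"
```

Apply with sudo systemctl daemon-reload && sudo systemctl restart ollama.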

Firewall Rules

# Allow only HTTPS
sudo ufw allow 443/tcp

# Block direct access to Open WebUI port
sudo ufw deny 3000/tcp

# Block direct access to Ollama
sudo ufw deny 11434/tcp

# Caveat: Docker publishes ports through iptables directly and can bypass
# ufw rules. For a reliable block, bind the published port to localhost in
# docker-compose ("127.0.0.1:3000:8080") once the reverse proxy is in place.

Step 6: API Integration

Ollama exposes a REST API compatible with the OpenAI format. Integrate it into your existing tools:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "user", "content": "Write a Terraform VPC module"}
    ]
  }'

This means any tool that works with the OpenAI API can point to your local Ollama instance instead — scripts, AI DevOps tools, IDE extensions, and automation pipelines.
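When wrapping the endpoint in scripts, building the JSON with jq is safer than hand-writing it inside nested quotes. A sketch (assumes jq is installed; build_payload is a hypothetical helper, not an Ollama command):

```shell
# Build a chat-completions payload with jq to avoid shell-quoting pitfalls
build_payload() {
  jq -cn --arg model "$1" --arg prompt "$2" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}'
}

payload=$(build_payload "llama3.1:8b" "Summarize the attached log")
echo "$payload"

# Send it to the local endpoint:
# curl -s http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$payload"
```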

Performance Tips

Model selection matters more than hardware. A 7B model on 16GB RAM responds in 2-3 seconds. A 70B model on the same hardware takes 30+ seconds per response.

Use quantized models. Q4 quantization cuts model size by 75% with minimal quality loss. Ollama uses Q4 by default.

Keep hot models loaded. Ollama keeps recently used models in memory. Switching models constantly forces reload from disk. For team use, stick to 1-2 primary models.
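By default, Ollama unloads a model roughly five minutes after its last request. On a systemd install, the keep-alive window and the number of resident models can be raised with a drop-in — a sketch, assuming the standard Linux install script was used:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# created with: sudo systemctl edit ollama
[Service]
# Keep models in memory for 24h instead of the ~5-minute default
Environment="OLLAMA_KEEP_ALIVE=24h"
# Cap how many models stay loaded at once
Environment="OLLAMA_MAX_LOADED_MODELS=2"
```

Apply with sudo systemctl daemon-reload && sudo systemctl restart ollama.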

Monitor memory usage:

ollama ps

This shows which models are loaded and their memory footprint.

Self-hosted AI runs on your own infrastructure — from a homelab to a dedicated server

Cost Comparison: Self-Hosted vs Cloud AI

                            Self-Hosted (Ollama)     OpenAI API       Claude API
Monthly cost (light use)    $0 (existing hardware)   $20-50           $20-50
Monthly cost (team of 5)    $0                       $100-500         $100-500
Data privacy                Complete                 Third-party      Third-party
Internet required           No                       Yes              Yes
Model choice                Any open model           OpenAI models    Anthropic models
Uptime dependency           Your hardware            Vendor status    Vendor status

For teams running 50+ queries per day, self-hosting typically pays for itself within a few months — sooner if you already own the hardware.

Key Takeaways

  • Open WebUI + Ollama gives you a fully private ChatGPT alternative with zero API costs
  • Runs on existing hardware — no GPU required for smaller models
  • Multi-user support with separate conversation histories out of the box
  • OpenAI-compatible API means existing tools and scripts work once pointed at your local endpoint
  • Secure the deployment with a reverse proxy and firewall rules for team access
  • Start with Llama 3.1 8B — best balance of speed and quality for most use cases

FAQ

Can I run Open WebUI without a GPU?

Yes. Ollama runs models on CPU with quantization. A 7B parameter model on CPU with 16GB RAM gives 2-3 second response times. Smaller models like Phi-3 mini run on 8GB RAM. GPU accelerates inference but is not required.

How does Open WebUI compare to ChatGPT?

The interface is very similar — conversations, chat history, file uploads, and model switching. The difference is that Open WebUI runs on your hardware, supports any open-source model, and costs nothing beyond electricity. The tradeoff is that open-source models are generally less capable than GPT-4 or Claude for complex reasoning tasks.

Can multiple people use it at the same time?

Yes. Open WebUI supports concurrent users with separate accounts and conversation histories. Each user can select different models and maintain their own settings. For 5+ concurrent users, 32GB RAM and a GPU are recommended.

Is it secure enough for company use?

With proper setup — HTTPS, reverse proxy, firewall rules, and authentication — yes. The critical advantage is that no data leaves your network. For compliance-sensitive environments (healthcare, finance, government), this is often a requirement.

Which model should I start with?

Llama 3.1 8B is the best starting point. It handles general conversation, summarization, and code generation well on 16GB RAM. Add DeepSeek R1 8B for reasoning tasks and CodeLlama for programming-specific work.

Conclusion

Running a private ChatGPT is no longer experimental. Open WebUI and Ollama make it production-ready with a single Docker container and a few commands. Your data stays on your hardware, your costs stay at zero, and your team gets the AI tools they need without vendor dependencies.

The setup takes under 30 minutes. The models keep improving every month. There has never been a better time to self-host AI.

Need help deploying a private AI stack for your team? View our Local AI Deployment service

Read next: How to Run DeepSeek R1 Locally with Ollama

Written by
SysOpX
Battle-tested DevOps & AWS engineering guides
Need DevOps help? →