Why You Need a Private ChatGPT
Every message sent to ChatGPT, Claude, or Gemini passes through external servers. For personal use, that is fine. For internal company knowledge, client data, proprietary code, or anything covered by compliance requirements — it is a risk.
A private ChatGPT gives you the same conversational AI experience with one critical difference: nothing leaves your server. No API costs scaling with usage. No vendor lock-in. No terms of service changes breaking your workflow overnight.
After running this setup for over a year on a Proxmox homelab, I consider the combination of Open WebUI and Ollama the most production-ready self-hosted AI stack available today.
Private AI means your data stays on your hardware — no third-party APIs required
What You Are Building
The stack has two components:
- Ollama — runs LLM models locally, exposes an API on port 11434
- Open WebUI — a browser-based chat interface that connects to Ollama, supports multiple users, conversation history, file uploads, and model switching
The result is a ChatGPT-like interface accessible from any browser on your network, powered entirely by models running on your own hardware.
Hardware Requirements
You do not need a GPU for smaller models. CPU inference on quantized models is completely viable for personal and small team use.
| Setup | RAM | Storage | Models You Can Run |
|---|---|---|---|
| Minimum | 8GB | 20GB | Phi-3 mini, Gemma 2B, TinyLlama |
| Recommended | 16GB | 50GB | Llama 3.1 8B, Mistral 7B, DeepSeek R1 8B |
| Power user | 32GB+ | 100GB+ | Llama 3.1 70B (Q4), Mixtral, CodeLlama 34B |
For teams of 2-5 people, a machine with 32GB RAM and an older NVIDIA GPU (RTX 3060 12GB or better) handles concurrent requests well.
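The table above maps directly to a sizing rule of thumb. As an illustrative sketch (the model tags are examples; check the Ollama library for exact names), you can encode it like this:

```python
def suggest_models(ram_gb: int) -> list[str]:
    """Map available RAM to the model tiers from the table above (illustrative only)."""
    if ram_gb >= 32:
        return ["llama3.1:70b", "mixtral", "codellama:34b"]
    if ram_gb >= 16:
        return ["llama3.1:8b", "mistral:7b", "deepseek-r1:8b"]
    if ram_gb >= 8:
        return ["phi3:mini", "gemma:2b", "tinyllama"]
    return []  # below the table's minimum; consider smaller hardware or cloud APIs

print(suggest_models(16))  # -> ['llama3.1:8b', 'mistral:7b', 'deepseek-r1:8b']
```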
Step 1: Install Ollama
On Linux or macOS:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama --version
Pull your first model:
ollama pull llama3.1:8b
This downloads the Llama 3.1 8B model (4.7GB). Test it:
ollama run llama3.1:8b "Explain Kubernetes pods in one paragraph"
If you get a response, Ollama is working.
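Beyond the CLI, you can also confirm the server is up programmatically: Ollama's API exposes a /api/tags endpoint that lists installed models. A minimal stdlib sketch, assuming Ollama on its default port:

```python
import json
import urllib.request

def installed_models(raw_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(raw_json).get("models", [])]

def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Query the local Ollama API; raises OSError if the server is down."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return installed_models(resp.read().decode())

if __name__ == "__main__":
    try:
        print("Installed models:", check_ollama())
    except OSError as e:
        print("Ollama not reachable:", e)
```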
Step 2: Deploy Open WebUI with Docker
Open WebUI runs as a single Docker container. Create a docker-compose.yml:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open-webui-data:
Start it:
docker compose up -d
Open http://your-server-ip:3000 in your browser. Create an admin account on first visit — this becomes the system administrator.
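If you script the deployment, it helps to wait until the container actually answers before opening the browser or running follow-up steps. A small polling helper, stdlib only (the URL is your deployment's, shown here with the default port):

```python
import time
import urllib.request

def wait_for_ready(url: str, timeout_s: float = 60.0) -> bool:
    """Poll a URL until it returns a successful HTTP response, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except OSError:
            time.sleep(1)  # container still starting; retry
    return False

# Usage:
#   if wait_for_ready("http://localhost:3000"):
#       print("Open WebUI is up")
```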
Open WebUI deploys as a single Docker container — production-ready in minutes
Step 3: Configure Models
Open WebUI automatically detects all models available in Ollama. Pull additional models based on your use case:
# General purpose conversation
ollama pull llama3.1:8b
# Coding assistance
ollama pull codellama:13b
# Fast responses for simple tasks
ollama pull phi3:mini
# Reasoning and analysis
ollama pull deepseek-r1:8b
In the Open WebUI interface, click the model selector at the top of any conversation to switch between models.
Step 4: Multi-User Setup
Open WebUI supports multiple users out of the box. As admin, go to Admin Panel and configure:
- Default Sign Up Role: Set to “User” or “Pending” (requires admin approval)
- Default Model: Choose the model new users see first
- Enable Signup: Toggle on for team access
Each user gets their own conversation history, saved prompts, and preferences. No data crosses between accounts.
Step 5: Secure the Deployment
For production use beyond localhost, add these configurations:
Reverse Proxy with Nginx
server {
    listen 443 ssl;
    server_name ai.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
Restrict Ollama to Localhost
By default, Ollama's Linux install listens on 127.0.0.1:11434. Do not expose this port to the internet. One caveat: when Open WebUI runs in Docker and Ollama runs on the host, the container connects through the Docker bridge rather than loopback, so a loopback-only Ollama may refuse its connections. If that happens, set the OLLAMA_HOST environment variable to an address the container can reach and keep 11434 firewalled from the outside; external access is never necessary.
Firewall Rules
# Allow only HTTPS
sudo ufw allow 443/tcp
# Block direct access to Ollama (a host service, so UFW applies)
sudo ufw deny 11434/tcp
One warning: UFW cannot reliably block the Open WebUI port itself. Docker publishes ports through its own iptables rules, which take effect ahead of UFW's, so sudo ufw deny 3000/tcp does not stop traffic to a Docker-published port. Bind the port to localhost in docker-compose.yml instead.
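Because Docker-published ports bypass UFW through Docker's own iptables rules, the most reliable way to keep port 3000 private is to publish it on loopback only in docker-compose.yml:

```yaml
services:
  open-webui:
    ports:
      - "127.0.0.1:3000:8080"  # loopback only; the reverse proxy connects locally
```

With this binding, the Nginx proxy on the same host still reaches http://localhost:3000, but other machines cannot connect directly.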
Step 6: API Integration
Ollama exposes a REST API compatible with the OpenAI format. Integrate it into your existing tools:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "user", "content": "Write a Terraform VPC module"}
    ]
  }'
This means any tool that works with the OpenAI API can point to your local Ollama instance instead — scripts, AI DevOps tools, IDE extensions, and automation pipelines.
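For example, a script can call the same endpoint with nothing but the standard library (a sketch; the model name and prompt are placeholders):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str) -> dict:
    """Construct an OpenAI-format chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, model: str = "llama3.1:8b",
         base_url: str = "http://localhost:11434") -> str:
    """Send one chat turn to Ollama's OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Pointing an existing OpenAI SDK at http://localhost:11434/v1 works the same way, since the request and response shapes match.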
Performance Tips
Model selection matters more than hardware. A 7B model on 16GB RAM starts streaming a response in 2-3 seconds. A 70B model on the same hardware can take 30+ seconds before producing useful output.
Use quantized models. Q4 quantization cuts model size by 75% with minimal quality loss. Ollama uses Q4 by default.
Keep hot models loaded. Ollama keeps recently used models in memory. Switching models constantly forces reload from disk. For team use, stick to 1-2 primary models.
Monitor memory usage:
ollama ps
This shows which models are loaded and their memory footprint.
Self-hosted AI runs on your own infrastructure — from a homelab to a dedicated server
Cost Comparison: Self-Hosted vs Cloud AI
| | Self-Hosted (Ollama) | OpenAI API | Claude API |
|---|---|---|---|
| Monthly cost (light use) | $0 (existing hardware) | $20-50 | $20-50 |
| Monthly cost (team of 5) | $0 | $100-500 | $100-500 |
| Data privacy | Complete | Third-party | Third-party |
| Internet required | No | Yes | Yes |
| Model choice | Any open model | OpenAI models only | Anthropic models only |
| Uptime dependency | Your hardware | Vendor status | Vendor status |
For teams doing 50+ queries per day, self-hosting on existing hardware saves money from day one; even purpose-bought hardware typically pays for itself within the first few months.
Key Takeaways
- Open WebUI + Ollama gives you a fully private ChatGPT alternative with zero API costs
- Runs on existing hardware — no GPU required for smaller models
- Multi-user support with separate conversation histories out of the box
- OpenAI-compatible API means existing tools and scripts work without changes
- Secure the deployment with a reverse proxy and firewall rules for team access
- Start with Llama 3.1 8B — best balance of speed and quality for most use cases
FAQ
Can I run Open WebUI without a GPU?
Yes. Ollama runs models on CPU with quantization. A 7B parameter model on CPU with 16GB RAM starts responding within 2-3 seconds. Smaller models like Phi-3 mini run on 8GB RAM. A GPU accelerates inference but is not required.
How does Open WebUI compare to ChatGPT?
The interface is very similar — conversations, chat history, file uploads, and model switching. The difference is that Open WebUI runs on your hardware, supports any open-source model, and costs nothing beyond electricity. The tradeoff is that open-source models are generally less capable than GPT-4 or Claude for complex reasoning tasks.
Can multiple people use it at the same time?
Yes. Open WebUI supports concurrent users with separate accounts and conversation histories. Each user can select different models and maintain their own settings. For 5+ concurrent users, 32GB RAM and a GPU are recommended.
Is it secure enough for company use?
With proper setup — HTTPS, reverse proxy, firewall rules, and authentication — yes. The critical advantage is that no data leaves your network. For compliance-sensitive environments (healthcare, finance, government), this is often a requirement.
Which model should I start with?
Llama 3.1 8B is the best starting point. It handles general conversation, summarization, and code generation well on 16GB RAM. Add DeepSeek R1 8B for reasoning tasks and CodeLlama for programming-specific work.
Conclusion
Running a private ChatGPT is no longer experimental. Open WebUI and Ollama make it production-ready with a single Docker container and a few commands. Your data stays on your hardware, your costs stay at zero, and your team gets the AI tools they need without vendor dependencies.
The setup takes under 30 minutes. The models keep improving every month. There has never been a better time to self-host AI.
Need help deploying a private AI stack for your team? View our Local AI Deployment service
Read next: How to Run DeepSeek R1 Locally with Ollama