
How to Run DeepSeek R1 Locally with Ollama (No Cloud, No API Costs)

What Makes DeepSeek R1 Different

DeepSeek R1 is not just another language model. It is a reasoning model — meaning it thinks through problems step by step before answering, similar to how OpenAI’s o1 works. The difference is that DeepSeek R1 is open-source and runs locally on your own hardware.

For DevOps and infrastructure work, this matters. When you ask DeepSeek R1 to debug a Terraform state conflict or analyze a Kubernetes pod failure, it breaks down the problem methodically rather than pattern-matching to the most common answer. The reasoning chain is visible in the output, so you can follow and verify its logic.

Running it locally through Ollama means zero API costs, complete privacy, and no internet dependency. After several months of testing on a Proxmox homelab, it has become the go-to model for complex infrastructure questions.

[Image: DeepSeek R1 brings reasoning-capable AI to local hardware — no cloud required]

DeepSeek R1 Model Variants

DeepSeek R1 comes in multiple sizes. Each variant is a distilled version of the full model, optimized for different hardware:

| Model | Parameters | RAM Required | Download Size | Best For |
|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5B | 4GB | 1.1GB | Testing, simple queries |
| deepseek-r1:7b | 7B | 8GB | 4.7GB | Daily use on laptops |
| deepseek-r1:8b | 8B | 8GB | 4.9GB | Best quality/speed balance |
| deepseek-r1:14b | 14B | 16GB | 9.0GB | Complex reasoning |
| deepseek-r1:32b | 32B | 24GB | 19GB | Near-full quality |
| deepseek-r1:70b | 70B | 48GB+ | 43GB | Maximum quality, needs GPU |

For most DevOps work, the 8B variant hits the right balance: fast enough for interactive use, smart enough to handle real infrastructure problems.
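If you script your homelab setup, the RAM column in the table above can drive model selection automatically. A minimal sketch — the requirements are hard-coded from the table (approximate values), and the function name is just illustrative:

```python
# Pick the largest DeepSeek R1 variant whose RAM requirement fits.
# RAM figures are the approximate values from the table above.
VARIANTS = [
    ("deepseek-r1:1.5b", 4),
    ("deepseek-r1:7b", 8),
    ("deepseek-r1:8b", 8),
    ("deepseek-r1:14b", 16),
    ("deepseek-r1:32b", 24),
    ("deepseek-r1:70b", 48),
]

def pick_variant(ram_gb):
    """Return the largest variant that fits in ram_gb, or None if none do."""
    fitting = [name for name, need in VARIANTS if need <= ram_gb]
    return fitting[-1] if fitting else None

print(pick_variant(16))  # deepseek-r1:14b
print(pick_variant(8))   # deepseek-r1:8b
```

On a 16GB machine this selects the 14B variant; below 4GB it returns None, meaning even the smallest variant will not fit comfortably.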

Step 1: Install Ollama

If you do not have Ollama installed:

# Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh

# Verify
ollama --version

On macOS, you can also install via Homebrew:

brew install ollama
ollama serve

Step 2: Pull DeepSeek R1

Download the 8B variant (recommended starting point):

ollama pull deepseek-r1:8b

This downloads approximately 4.9GB. On a typical connection, it takes 5-10 minutes.

For machines with 16GB+ RAM, the 14B variant delivers noticeably better reasoning:

ollama pull deepseek-r1:14b

Step 3: Run DeepSeek R1

Interactive mode:

ollama run deepseek-r1:8b

You get a chat prompt. Try a reasoning-heavy question:

>>> My Terraform apply failed with "Error: cycle detected" between an 
    AWS security group and an EC2 instance. How do I debug this?

DeepSeek R1 will show its reasoning process in <think> tags before delivering the answer. This chain-of-thought output is what separates it from standard models.
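If you consume that output in scripts, you usually want the reasoning and the final answer separated. A small helper, assuming the response contains a single `<think>…</think>` block before the answer (the format described above):

```python
import re

def split_reasoning(output):
    """Split a DeepSeek R1 response into (reasoning, final_answer).

    Assumes the chain of thought is wrapped in one <think>...</think>
    block preceding the answer; returns empty reasoning otherwise.
    """
    match = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    if not match:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>A cycle means each resource references the other.</think>Break the cycle with a separate aws_security_group_rule resource."
thinking, answer = split_reasoning(raw)
print(answer)  # Break the cycle with a separate aws_security_group_rule resource.
```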

Single query mode:

ollama run deepseek-r1:8b "Explain the difference between RDS Multi-AZ and Read Replicas"

[Image: Running DeepSeek R1 locally — the model thinks through problems before answering]

Step 4: Use the API

Ollama exposes a REST API compatible with OpenAI’s format:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:8b",
    "messages": [
      {
        "role": "user",
        "content": "Write a bash script that checks for unattached EBS volumes in all AWS regions"
      }
    ]
  }'

Python Integration

import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-r1:8b",
    "prompt": "Generate a Kubernetes NetworkPolicy that allows only port 443 ingress from the frontend namespace",
    "stream": False
})

print(response.json()["response"])
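The example above disables streaming. With streaming left on (the default), Ollama's `/api/generate` endpoint returns one JSON object per line, each carrying a `response` fragment, with `"done": true` on the last one. A sketch of the client-side reassembly — the simulated stream stands in for what `requests`' `iter_lines()` would yield:

```python
import json

def join_stream(lines):
    """Reassemble a streamed Ollama /api/generate response.

    Each line is a JSON object; the "response" fragments concatenate
    into the full output, and the final object has "done": true.
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated stream for illustration:
stream = [
    '{"response": "apiVersion: ", "done": false}',
    '{"response": "networking.k8s.io/v1", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(stream))
```

Streaming matters for reasoning models in particular: the `<think>` section can be long, and printing fragments as they arrive lets you watch the reasoning instead of staring at a blank terminal.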

Use with Open WebUI

For a browser-based ChatGPT-like interface with DeepSeek R1, deploy Open WebUI. It automatically detects all Ollama models including DeepSeek R1 and provides conversation history, multi-user support, and file uploads.

Step 5: Create a Custom Modelfile

Customize DeepSeek R1’s behavior for your specific use case:

cat << 'EOF' > Modelfile-devops
FROM deepseek-r1:8b

SYSTEM """You are a senior DevOps engineer specializing in AWS, Kubernetes, 
and Terraform. When asked about infrastructure, always consider: cost 
implications, security best practices, high availability, and disaster 
recovery. Provide specific commands and configurations, not general advice."""

PARAMETER temperature 0.3
PARAMETER num_ctx 8192
EOF

ollama create devops-r1 -f Modelfile-devops

Now run your custom model:

ollama run devops-r1 "Design a multi-AZ RDS setup with automated failover"

The lower temperature (0.3) gives more consistent, deterministic answers — better for infrastructure work where you want reliability over creativity.

DeepSeek R1 vs Other Local Models

After running multiple models on the same hardware, here is how DeepSeek R1 compares for DevOps tasks:

| Model | Reasoning | Code Quality | Speed | Infrastructure Knowledge |
|---|---|---|---|---|
| DeepSeek R1 8B | Excellent | Very good | 15 tok/s | Strong |
| Llama 3.1 8B | Good | Good | 18 tok/s | Good |
| Mistral 7B | Good | Good | 20 tok/s | Moderate |
| Gemma 2 9B | Good | Moderate | 16 tok/s | Moderate |
| Phi-3 Mini | Basic | Basic | 25 tok/s | Basic |

DeepSeek R1 wins on reasoning tasks. When you ask it to debug a complex Terraform dependency issue or design a disaster recovery architecture, the step-by-step thinking produces significantly better answers than models that jump straight to a response.

Llama 3.1 is faster for simple tasks. For quick lookups, command syntax, and straightforward questions, Llama 3.1 responds faster because it skips the reasoning step.

Best strategy: Keep both models available and switch based on the task complexity.
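One way to automate that switch is a tiny router that picks a model per prompt. The keyword heuristic below is purely illustrative — not part of Ollama — and should be tuned to your own workload:

```python
# Route prompts between a reasoning model and a faster general model.
# The keyword list is a stand-in heuristic, not a recommendation.
REASONING_MODEL = "deepseek-r1:8b"
FAST_MODEL = "llama3.1:8b"

COMPLEX_HINTS = ("debug", "why", "design", "root cause", "architecture", "failover")

def choose_model(prompt):
    """Send reasoning-heavy prompts to DeepSeek R1, the rest to Llama 3.1."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in COMPLEX_HINTS):
        return REASONING_MODEL
    return FAST_MODEL

print(choose_model("Why does my Terraform plan force replacement?"))  # deepseek-r1:8b
print(choose_model("Syntax for kubectl port-forward"))                # llama3.1:8b
```

Pair `choose_model()` with the API call from Step 4 and quick lookups stay fast while hard problems get the reasoning treatment.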

Performance Optimization

Memory Management

Check what is running:

ollama ps

Ollama keeps the last-used model in memory. To free memory:

ollama stop deepseek-r1:8b

GPU Acceleration

If you have an NVIDIA GPU, Ollama uses it automatically. Verify:

nvidia-smi

For partial GPU offloading (model too large for VRAM):

OLLAMA_NUM_GPU=20 ollama run deepseek-r1:14b

This offloads 20 layers to GPU while keeping the rest in RAM — a useful trick when your model barely exceeds VRAM capacity.

Increase Context Window

Default context is 2048 tokens. For longer conversations:

ollama run deepseek-r1:8b --num-ctx 8192

More context uses more memory. The 8B model with 8192 context needs approximately 10GB RAM.

[Image: DeepSeek R1 handles complex infrastructure questions that simpler models struggle with]

Real-World Use Cases

Debugging Terraform Errors

Prompt: My terraform plan shows "forces replacement" on an RDS instance 
when I only changed the backup_retention_period. Why is this happening 
and how do I fix it without destroying the database?

DeepSeek R1 correctly identifies that certain RDS parameters require replacement while others can be modified in-place, and suggests using lifecycle { ignore_changes } as an immediate fix while explaining the proper long-term approach.

Writing Ansible Playbooks

Prompt: Write an Ansible playbook that hardens SSH on Ubuntu 22.04 — 
disable root login, change default port, enable key-only auth, and 
configure fail2ban.

The reasoning model checks each hardening step for dependency order and produces a working playbook with proper handlers and idempotent tasks.

Incident Response Analysis

Prompt: CloudWatch shows CPU at 98% on our application server, but the 
application logs show normal request rates. Memory is at 60%. What 
should I check?

DeepSeek R1 systematically works through possible causes — zombie processes, runaway cron jobs, OOM killer approaching, swap thrashing — rather than giving a generic “check your processes” response.

Key Takeaways

  • DeepSeek R1 is a reasoning model — it thinks step by step, producing better answers for complex technical questions
  • The 8B variant runs on 8GB RAM without a GPU and responds in 2-3 seconds
  • Custom Modelfiles let you create specialized versions for DevOps, security, or coding tasks
  • The OpenAI-compatible API integrates with existing tools and scripts
  • Keep DeepSeek R1 for complex reasoning and Llama 3.1 for quick lookups — use both
  • Zero API costs and complete data privacy make it ideal for internal team use

FAQ

Is DeepSeek R1 as good as ChatGPT for DevOps work?

For reasoning-heavy tasks like debugging, architecture decisions, and root cause analysis — it is surprisingly close to GPT-4 quality, especially the 14B and 32B variants. For general conversation and creative writing, cloud models still have an edge. The key advantage is that DeepSeek R1 runs locally with zero cost and complete privacy.

How much disk space does DeepSeek R1 need?

The 8B model needs 4.9GB of disk space. The 14B model needs 9GB. The 70B model needs 43GB. Ollama stores models in ~/.ollama/models/ — make sure this partition has enough space. You can change the storage location with the OLLAMA_MODELS environment variable.

Can I run DeepSeek R1 on a Mac?

Yes. Ollama runs natively on macOS with Apple Silicon (M1/M2/M3/M4) acceleration. The 8B model performs well on any Mac with 16GB unified memory. Apple’s Metal GPU acceleration is used automatically, giving better performance than CPU-only Linux machines with similar specs.

What is the thinking/reasoning output in DeepSeek R1?

DeepSeek R1 outputs its reasoning process in <think> tags before the final answer. This shows how the model breaks down the problem, considers alternatives, and arrives at its conclusion. You can see it following a logical chain rather than jumping to an answer. This is especially valuable for infrastructure decisions where understanding the “why” matters as much as the “what.”

Can I use DeepSeek R1 with my team?

Yes. Deploy Open WebUI with Ollama for a multi-user ChatGPT-like interface. Each team member gets their own account with separate conversation history. For a team of 5, a machine with 32GB RAM and a mid-range GPU handles concurrent requests well.

Conclusion

DeepSeek R1 brings reasoning-capable AI to your own hardware. The setup takes 10 minutes, costs nothing to run, and keeps your data completely private. For DevOps engineers dealing with complex infrastructure decisions daily, having a reasoning model available locally is a genuine productivity multiplier.

Start with the 8B variant. If you need more capability, move up to 14B. The models keep improving with every release — and your self-hosted setup benefits from every upgrade without changing a line of configuration.

Need help setting up a local AI inference server for your team? View our Local AI Deployment service

Read next: Build Your Own Private ChatGPT with Open WebUI and Ollama

Written by
SysOpX
Battle-tested DevOps & AWS engineering guides