Can Your Laptop Run Local LLMs?
Almost certainly yes. As of March 2026, most laptops made in the last three to four years can run smaller AI models, and high-end machines can run models that rival GPT-4.
This guide walks through exactly how to get started on Mac (Apple Silicon) and Windows (NVIDIA RTX GPUs). We'll cover hardware requirements, software installation, and optimization tips.
Quick Hardware Check
Before you start, check if your laptop meets these specs:
Minimum Specs (3B-8B models)
- RAM: 8GB minimum
- Storage: 10GB free space
- GPU: Any modern GPU with 4GB+ VRAM, OR Apple Silicon M1/M2/M3/M4
- Models you can run: Phi-3 Mini (3B), Llama 4 Vega 8B (Q3-Q4 quantization), MiniMax M2.1 Flash 8B
Recommended Specs (8B-30B models)
- RAM: 16GB
- Storage: 25GB free space
- GPU: NVIDIA with 8GB+ VRAM, OR Apple Silicon M4 Pro with 24GB+ unified memory
- Models you can run: GLM-4.7 Flash 30B (Q4), Qwen3 30B, MiMo-V2 34B, DeepSeek V3.2 7B
High-End Specs (70B-109B models)
- RAM: 48GB+
- Storage: 80GB+ free space
- GPU: NVIDIA with 24GB+ VRAM (RTX 5070 Ti+, 5090), OR Apple M4 Max/Ultra with 64GB+ unified memory
- Models you can run: Llama 4 Scout 109B, DeepSeek V3.2 235B (Q3), Kimi K2.5 72B, GLM-5 128B
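The tiers above follow a simple rule of thumb: a model's memory footprint is roughly its parameter count times the bits per weight of its quantization, plus runtime overhead for the KV cache and framework. A minimal sketch — the 20% overhead factor is an illustrative assumption, and real quantized files vary by scheme:

```python
def estimate_footprint_gb(params_billion: float, quant_bits: float,
                          overhead: float = 1.2) -> float:
    """Rough memory footprint: parameters x bits-per-weight, plus runtime overhead.

    The 1.2x overhead factor is an assumption for illustration, not a measured value.
    """
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8 bits ~= 1 GB
    return round(weights_gb * overhead, 1)

# An 8B model at Q4 needs roughly 5 GB; a 70B model at Q4 roughly 42 GB.
print(estimate_footprint_gb(8, 4))   # 4.8
print(estimate_footprint_gb(70, 4))  # 42.0
```

This is why an 8B model at Q4 fits comfortably in 8GB of RAM, while 70B-class models push you into the 48GB+ tier.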
macOS Setup (Apple Silicon)
Apple Silicon Macs (M1, M2, M3, M4) are excellent for local LLMs because they use unified memory — your system RAM doubles as VRAM.
Step 1: Check Your Mac Specs
| Mac Chip | Memory | Best Model Size | Example Models |
|---|---|---|---|
| M1, M2, M3 | 8GB | 3B-8B (Q3-Q4) | Phi-3 Mini, Llama 4 Vega 8B Q3 |
| M1, M2, M3, M4 | 16GB | 8B (Q5-Q8) | Llama 4 Vega 8B, GLM-4.7 Flash |
| M4 Pro | 24-32GB | 30B (Q4-Q5) | Qwen3 30B, GLM-4.7 30B |
| M4 Pro, M4 Max | 48GB | 30B (Q8) or 72B (Q4) | Kimi K2.5 72B Q4, MiMo-V2 34B |
| M4 Max | 64-96GB | 72B (Q5-Q8) | Kimi K2.5, Qwen3-Coder 70B |
| M4 Ultra | 128-192GB | 109B+ (Q5-Q8) | Llama 4 Scout 109B, DeepSeek V3.2 235B Q3 |
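If you want to script this decision, the table collapses into a small lookup from unified memory to a sensible model size. The thresholds below mirror the table; treat them as guidelines rather than hard limits:

```python
def mac_model_tier(memory_gb: int) -> str:
    """Map unified memory (GB) to a recommended model size, per the table above."""
    if memory_gb >= 128:
        return "109B+ (Q5-Q8)"
    if memory_gb >= 64:
        return "72B (Q5-Q8)"
    if memory_gb >= 48:
        return "30B (Q8) or 72B (Q4)"
    if memory_gb >= 24:
        return "30B (Q4-Q5)"
    if memory_gb >= 16:
        return "8B (Q5-Q8)"
    return "3B-8B (Q3-Q4)"

print(mac_model_tier(24))  # 30B (Q4-Q5)
```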
Step 2: Install Homebrew (if not installed)
Open Terminal (Applications → Utilities → Terminal) and run:
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Follow the on-screen instructions. This takes 2-5 minutes.
Step 3: Install Ollama
```shell
brew install ollama
```

This installs Ollama, the easiest way to run local models on Mac.
Step 4: Start Ollama Service
```shell
ollama serve
```

Keep this Terminal window open. Ollama is now running in the background.
Step 5: Download and Run Your First Model
Open a new Terminal window (⌘T) and run:
```shell
ollama pull glm4-flash
```

This downloads the GLM-4.7 Flash 30B model (~18GB). Wait for it to complete.
Then start chatting:
```shell
ollama run glm4-flash
```

Type your question and press Enter. You're now running a local AI model!
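Besides the interactive prompt, Ollama exposes a local HTTP API (on port 11434 by default), so you can drive the model from a script. A minimal sketch using only the Python standard library against Ollama's `/api/generate` endpoint — run it while `ollama serve` is active:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the full response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("glm4-flash", "Explain unified memory in one sentence."))
```

With `"stream": False` the server returns one JSON object containing the whole answer; set it to `True` if you want tokens as they are generated.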
Step 6: Optimize Performance (Optional)
For faster inference on Mac, pick a model sized to your memory tier:
- M1/M2/M3/M4 (8-16GB): `glm4-flash`, `phi3:mini`, `llama4-vega`
- M4 Pro (24-32GB): `qwen3`, `glm4-flash`, `deepseek-v3.2:7b`
- M4 Max (48-96GB): `kimi-k2.5`, `llama4-scout`, `qwen3-coder`
- M4 Ultra (128-192GB): `llama4-scout`, `deepseek-v3.2:speciale`, `glm5`
Windows Setup (NVIDIA GPUs)
If your Windows laptop has an NVIDIA GPU (RTX 3060 or newer, RTX 50-series recommended), you can run local LLMs with excellent performance.
Step 1: Check Your GPU
| VRAM | Best Model Size | Example Models |
|---|---|---|
| 4GB | 3B-8B (Q2-Q3) | Phi-3 Mini, Llama 4 Vega 8B Q2 |
| 6GB | 8B (Q3-Q4) | Llama 4 Vega 8B Q4 |
| 8GB | 8B (Q5-Q8) | Llama 4 Vega Q8, GLM-4.7 Flash |
| 12GB | 30B (Q3-Q4) | Qwen3 30B Q4, MiMo-V2 13B Q8 |
| 16GB | 30B (Q5) or 72B (Q2) | GLM-4.7 30B Q5, Kimi K2.5 Q2 |
| 24GB+ | 72B (Q4) or 109B (Q3) | Llama 4 Scout 109B Q3, Kimi K2.5 Q4 |
Step 2: Install Ollama for Windows
Download the Windows installer from ollama.com and run it. Ollama will start automatically as a Windows service.
Step 3: Open Command Prompt or PowerShell
Press Win+R, type cmd, press Enter.
Step 4: Download and Run Your First Model
```shell
ollama pull llama4-vega
```

This downloads Llama 4 Vega 8B (~5.2GB). A progress bar will show download status.
Then run:
```shell
ollama run llama4-vega
```

Type your question and press Enter. Your GPU is now running a local AI model.
Step 5: Optimize Performance (NVIDIA)
- Update GPU drivers
- Plug in your laptop — GPU throttles on battery
- Close background apps — especially browsers with many tabs
- Use Q4 quantization — faster inference with minimal quality loss
- Check GPU usage — Task Manager → Performance → GPU should show 80-100% utilization during inference

Recommended models by VRAM:
- 4-6GB: `llama4-vega`, `phi3`, `minimax-m2.1:flash`
- 8GB: `glm4-flash`, `qwen3`, `deepseek-v3.2:7b`
- 12-16GB: `qwen3`, `mimo-v2:13b`, `kimi-k2.5:q3`
- 24GB+: `llama4-scout`, `kimi-k2.5`, `glm5`
Alternative: LM Studio (GUI Option)
If you prefer a visual interface over the command line, try LM Studio (download it from lmstudio.ai). It's great for beginners but uses slightly more RAM than Ollama.
---
Troubleshooting
"Model runs slow (< 5 tokens/second)"
- Mac: Close other apps, plug in power, try a smaller model or lower quantization
- Windows: Update GPU drivers, check GPU usage in Task Manager, ensure model is using GPU (not CPU)
"Out of memory" error
- Try a smaller model: `ollama pull phi3:mini`
- Use lower quantization: `ollama pull llama4-vega:q2`
- Check available RAM/VRAM: Mac (Activity Monitor), Windows (Task Manager)
"Ollama command not found" (Mac)
- Restart Terminal
- Run `brew reinstall ollama`
"CUDA not available" (Windows)
- Update NVIDIA drivers
- Restart your computer
- Reinstall Ollama
What Models Should You Try?
After getting your first model running, experiment with these:

For coding:
```shell
ollama pull deepseek-v3.2
ollama run deepseek-v3.2
```

For reasoning / math:
```shell
ollama pull glm5
ollama run glm5
```

For creative writing:
```shell
ollama pull minimax-m2.1
ollama run minimax-m2.1
```

For lightweight / fast:
```shell
ollama pull phi3:mini
ollama run phi3:mini
```
Use `ollama list` to see all downloaded models.
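`ollama list` prints a simple table (NAME, ID, SIZE, MODIFIED) that is easy to script against. A sketch that sums the disk space your models use — the column layout is assumed from current Ollama versions and may change, and the sample output below is illustrative:

```python
def total_model_gb(listing: str) -> float:
    """Sum the SIZE column of `ollama list` output (assumes sizes reported in GB)."""
    total = 0.0
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        # Assumed columns: NAME ID SIZE UNIT MODIFIED... -> size is the third field
        total += float(parts[2])
    return round(total, 1)

# Illustrative sample of `ollama list` output (IDs are made up):
sample = """NAME               ID            SIZE    MODIFIED
glm4-flash:latest  1a2b3c4d5e6f  18 GB   2 days ago
phi3:mini          9f8e7d6c5b4a  2.2 GB  5 hours ago"""

print(total_model_gb(sample))  # 20.2
```

Handy for spotting when downloaded weights are eating your free storage.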
---
Next Steps
- Learn about quantization — see our Quantization Guide to understand Q4, Q5, Q8 formats
- Calculate VRAM needs — use our VRAM Calculator to plan model upgrades
- Compare tools — read Ollama vs LM Studio vs Jan for alternative options
- Upgrade hardware — check Best GPUs for Local LLMs if you need more power