Can Your Laptop Run Local LLMs?
Almost certainly yes. As of March 2026, most laptops made in the last three to four years can run smaller AI models, and high-end machines can run models that rival GPT-4.
This guide walks through exactly how to get started on Mac (Apple Silicon) and Windows (NVIDIA RTX GPUs). We'll cover hardware requirements, software installation, and optimization tips.
Quick Hardware Check
Before you start, check if your laptop meets these specs:
Minimum Specs (3B-8B models)
- RAM: 8GB minimum
- Storage: 10GB free space
- GPU: Any modern GPU with 4GB+ VRAM, OR Apple Silicon M1/M2/M3/M4
- Models you can run: Phi-3 Mini (3B), Llama 4 Vega 8B (Q3-Q4 quantization), MiniMax M2.1 Flash 8B
Recommended Specs (8B-30B models)
- RAM: 16GB
- Storage: 25GB free space
- GPU: NVIDIA with 8GB+ VRAM, OR Apple Silicon M4 Pro with 24GB+ unified memory
- Models you can run: GLM-4.7 Flash 30B (Q4), Qwen3 30B, MiMo-V2 34B, DeepSeek V3.2 7B
High-End Specs (70B-109B models)
- RAM: 48GB+
- Storage: 80GB+ free space
- GPU: NVIDIA with 24GB+ VRAM (RTX 5070 Ti+, 5090), OR Apple M4 Max/Ultra with 64GB+ unified memory
- Models you can run: Llama 4 Scout 109B, DeepSeek V3.2 235B (Q3), Kimi K2.5 72B, GLM-5 128B
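The tiers above follow a simple rule of thumb: a model's memory footprint is roughly its parameter count times the bits per weight of its quantization, plus runtime overhead for the KV cache and framework. A minimal sketch — the 20% overhead factor is an illustrative assumption, and real quantized files vary by scheme:

```python
def estimate_footprint_gb(params_billion: float, quant_bits: float,
                          overhead: float = 1.2) -> float:
    """Rough memory footprint: parameters x bits-per-weight, plus runtime overhead.

    The 1.2x overhead factor is an assumption for illustration, not a measured value.
    """
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8 bits ~= 1 GB
    return round(weights_gb * overhead, 1)

# An 8B model at Q4 needs roughly 5 GB; a 70B model at Q4 roughly 42 GB.
print(estimate_footprint_gb(8, 4))   # 4.8
print(estimate_footprint_gb(70, 4))  # 42.0
```

This is why an 8B model at Q4 fits comfortably in 8GB of RAM, while 70B-class models push you into the 48GB+ tier.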
macOS Setup (Apple Silicon)
Apple Silicon Macs (M1, M2, M3, M4) are excellent for local LLMs because they use unified memory — your system RAM doubles as VRAM.
Step 1: Check Your Mac Specs
| Mac Chip | Memory | Best Model Size | Example Models |
|---|---|---|---|
| M1, M2, M3 | 8GB | 3B-8B (Q3-Q4) | Phi-3 Mini, Llama 4 Vega 8B Q3 |
| M1, M2, M3, M4 | 16GB | 8B (Q5-Q8) | Llama 4 Vega 8B, GLM-4.7 Flash |
| M4 Pro | 24-32GB | 30B (Q4-Q5) | Qwen3 30B, GLM-4.7 30B |
| M4 Pro, M4 Max | 48GB | 30B (Q8) or 72B (Q4) | Kimi K2.5 72B Q4, MiMo-V2 34B |
| M4 Max | 64-96GB | 72B (Q5-Q8) | Kimi K2.5, Qwen3-Coder 70B |
| M4 Ultra | 128-192GB | 109B+ (Q5-Q8) | Llama 4 Scout 109B, DeepSeek V3.2 235B Q3 |
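If you want to script this decision, the table collapses into a small lookup from unified memory to a sensible model size. The thresholds below mirror the table; treat them as guidelines rather than hard limits:

```python
def mac_model_tier(memory_gb: int) -> str:
    """Map unified memory (GB) to a recommended model size, per the table above."""
    if memory_gb >= 128:
        return "109B+ (Q5-Q8)"
    if memory_gb >= 64:
        return "72B (Q5-Q8)"
    if memory_gb >= 48:
        return "30B (Q8) or 72B (Q4)"
    if memory_gb >= 24:
        return "30B (Q4-Q5)"
    if memory_gb >= 16:
        return "8B (Q5-Q8)"
    return "3B-8B (Q3-Q4)"

print(mac_model_tier(24))  # 30B (Q4-Q5)
```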
Step 2: Install Homebrew (if not installed)
Open Terminal (Applications → Utilities → Terminal) and run:
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Follow the on-screen instructions. This takes 2-5 minutes.
Step 3: Install Ollama
```shell
brew install ollama
```

This installs Ollama, the easiest way to run local models on Mac.
Step 4: Start Ollama Service
```shell
ollama serve
```

Keep this Terminal window open. Ollama is now running in the background.
Step 5: Download and Run Your First Model
Open a new Terminal window (⌘T) and run:
```shell
ollama pull glm4-flash
```

This downloads the GLM-4.7 Flash 30B model (~18GB). Wait for it to complete.
Then start chatting:
```shell
ollama run glm4-flash
```

Type your question and press Enter. You're now running a local AI model!
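Besides the interactive prompt, Ollama exposes a local HTTP API (on port 11434 by default), so you can drive the model from a script. A minimal sketch using only the Python standard library against Ollama's `/api/generate` endpoint — run it while `ollama serve` is active:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the full response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("glm4-flash", "Explain unified memory in one sentence."))
```

With `"stream": False` the server returns one JSON object containing the whole answer; set it to `True` if you want tokens as they are generated.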
Step 6: Optimize Performance (Optional)
For faster inference on Mac, pick a model sized to your memory tier:
- M1/M2/M3/M4 (8-16GB): `glm4-flash`, `phi3:mini`, `llama4-vega`
- M4 Pro (24-32GB): `qwen3`, `glm4-flash`, `deepseek-v3.2:7b`
- M4 Max (48-96GB): `kimi-k2.5`, `llama4-scout`, `qwen3-coder`
- M4 Ultra (128-192GB): `llama4-scout`, `deepseek-v3.2:speciale`, `glm5`
Windows Setup (NVIDIA GPUs)
If your Windows laptop has an NVIDIA GPU (RTX 3060 or newer, RTX 50-series recommended), you can run local LLMs with excellent performance.
Step 1: Check Your GPU
| VRAM | Best Model Size | Example Models |
|---|---|---|
| 4GB | 3B-8B (Q2-Q3) | Phi-3 Mini, Llama 4 Vega 8B Q2 |
| 6GB | 8B (Q3-Q4) | Llama 4 Vega 8B Q4 |
| 8GB | 8B (Q5-Q8) | Llama 4 Vega Q8, GLM-4.7 Flash |
| 12GB | 30B (Q3-Q4) | Qwen3 30B Q4, MiMo-V2 13B Q8 |
| 16GB | 30B (Q5) or 72B (Q2) | GLM-4.7 30B Q5, Kimi K2.5 Q2 |
| 24GB+ | 72B (Q4) or 109B (Q3) | Llama 4 Scout 109B Q3, Kimi K2.5 Q4 |
Step 2: Install Ollama for Windows
Download the Windows installer from ollama.com and run it. Ollama will start automatically as a Windows service.
Step 3: Open Command Prompt or PowerShell
Press Win+R, type cmd, press Enter.
Step 4: Download and Run Your First Model
```shell
ollama pull llama4-vega
```

This downloads Llama 4 Vega 8B (~5.2GB). A progress bar will show download status.
Then run:
```shell
ollama run llama4-vega
```

Type your question and press Enter. Your GPU is now running a local AI model.
Step 5: Optimize Performance (NVIDIA)
- Update GPU drivers
- Plug in your laptop — GPU throttles on battery
- Close background apps — especially browsers with many tabs
- Use Q4 quantization — faster inference with minimal quality loss
- Check GPU usage — Task Manager → Performance → GPU should show 80-100% utilization during inference

Recommended models by VRAM:
- 4-6GB: `llama4-vega`, `phi3`, `minimax-m2.1:flash`
- 8GB: `glm4-flash`, `qwen3`, `deepseek-v3.2:7b`
- 12-16GB: `qwen3`, `mimo-v2:13b`, `kimi-k2.5:q3`
- 24GB+: `llama4-scout`, `kimi-k2.5`, `glm5`
Alternative: LM Studio (GUI Option)
If you prefer a visual interface over the command line, try LM Studio (download it from lmstudio.ai). It's great for beginners but uses slightly more RAM than Ollama.
---
Troubleshooting
"Model runs slow (< 5 tokens/second)"
- Mac: Close other apps, plug in power, try a smaller model or lower quantization
- Windows: Update GPU drivers, check GPU usage in Task Manager, ensure model is using GPU (not CPU)
"Out of memory" error
- Try a smaller model: `ollama pull phi3:mini`
- Use lower quantization: `ollama pull llama4-vega:q2`
- Check available RAM/VRAM: Mac (Activity Monitor), Windows (Task Manager)
"Ollama command not found" (Mac)
- Restart Terminal
- Run `brew reinstall ollama`
"CUDA not available" (Windows)
- Update NVIDIA drivers
- Restart your computer
- Reinstall Ollama
What Models Should You Try?
After getting your first model running, experiment with these:

For coding:
```shell
ollama pull deepseek-v3.2
ollama run deepseek-v3.2
```

For reasoning / math:
```shell
ollama pull glm5
ollama run glm5
```

For creative writing:
```shell
ollama pull minimax-m2.1
ollama run minimax-m2.1
```

For lightweight / fast:
```shell
ollama pull phi3:mini
ollama run phi3:mini
```
Use `ollama list` to see all downloaded models.
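`ollama list` prints a simple table (NAME, ID, SIZE, MODIFIED) that is easy to script against. A sketch that sums the disk space your models use — the column layout is assumed from current Ollama versions and may change, and the sample output below is illustrative:

```python
def total_model_gb(listing: str) -> float:
    """Sum the SIZE column of `ollama list` output (assumes sizes reported in GB)."""
    total = 0.0
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        # Assumed columns: NAME ID SIZE UNIT MODIFIED... -> size is the third field
        total += float(parts[2])
    return round(total, 1)

# Illustrative sample of `ollama list` output (IDs are made up):
sample = """NAME               ID            SIZE    MODIFIED
glm4-flash:latest  1a2b3c4d5e6f  18 GB   2 days ago
phi3:mini          9f8e7d6c5b4a  2.2 GB  5 hours ago"""

print(total_model_gb(sample))  # 20.2
```

Handy for spotting when downloaded weights are eating your free storage.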
---
Next Steps
- Learn about quantization — see our Quantization Guide to understand Q4, Q5, Q8 formats
- Calculate VRAM needs — use our VRAM Calculator to plan model upgrades
- Compare tools — read Ollama vs LM Studio vs Jan for alternative options
- Upgrade hardware — check Best GPUs for Local LLMs if you need more power