
How to Run a Local LLM on Your Laptop in 2026

Step-by-step guide for Mac (Apple Silicon M4) and Windows (NVIDIA RTX 50-series). From checking specs to running your first model in under 20 minutes.

Can Your Laptop Run Local LLMs?

Almost certainly yes. In March 2026, most laptops made in the last 3-4 years can run smaller AI models. High-end laptops can run models that rival GPT-4.

This guide walks through exactly how to get started on Mac (Apple Silicon M4) and Windows (NVIDIA RTX GPUs). We'll cover hardware requirements, software installation, and optimization tips.

Quick Hardware Check


Before you start, check if your laptop meets these specs:

Minimum Specs (3B-8B models)

8GB of RAM (Apple Silicon unified memory) or a GPU with 4-6GB of VRAM.

Recommended Specs (8B-30B models)

16-32GB of unified memory, or a GPU with 8-16GB of VRAM.

High-End Specs (70B-109B models)

48GB or more of unified memory, or a GPU with 24GB+ of VRAM.

Not sure what you have? Use our VRAM Calculator to find the largest model your laptop can run.

macOS Setup (Apple Silicon)

Apple Silicon Macs (M1, M2, M3, M4) are excellent for local LLMs because they use unified memory — your system RAM doubles as VRAM.

Step 1: Check Your Mac Specs

  • Click the Apple logo → About This Mac
  • Note your chip (M4, M4 Pro, M4 Max, M4 Ultra, etc.) and Memory amount
  • Match it to this chart:

| Mac Chip | Memory | Best Model Size | Example Models |
| --- | --- | --- | --- |
| M1, M2, M3 | 8GB | 3B-8B (Q3-Q4) | Phi-3 Mini, Llama 4 Vega 8B Q3 |
| M1, M2, M3, M4 | 16GB | 8B (Q5-Q8) | Llama 4 Vega 8B, GLM-4.7 Flash |
| M4 Pro | 24-32GB | 30B (Q4-Q5) | Qwen3 30B, GLM-4.7 30B |
| M4 Pro, M4 Max | 48GB | 30B (Q8) or 72B (Q4) | Kimi K2.5 72B Q4, MiMo-V2 34B |
| M4 Max | 64-96GB | 72B (Q5-Q8) | Kimi K2.5, Qwen3-Coder 70B |
| M4 Ultra | 128-192GB | 109B+ (Q5-Q8) | Llama 4 Scout 109B, DeepSeek V3.2 235B Q3 |
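If you'd rather check from the Terminal, both values are available from macOS's built-in system_profiler tool (macOS only; the grep just trims the output down):

```shell
# Print the chip name and installed memory (macOS only)
system_profiler SPHardwareDataType | grep -E "Chip|Memory"
```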

Step 2: Install Homebrew (if not installed)

Open Terminal (Applications → Utilities → Terminal) and run:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Follow the on-screen instructions. This takes 2-5 minutes.
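To confirm Homebrew landed on your PATH before moving on:

```shell
# Verify Homebrew is installed and reachable
brew --version
brew doctor   # optional: reports common setup problems
```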

Step 3: Install Ollama

    brew install ollama

This installs Ollama, the easiest way to run local models on Mac.

Step 4: Start the Ollama Service

    ollama serve

Keep this Terminal window open; the Ollama server runs inside it.
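If you'd rather not keep a window open, Homebrew can run Ollama as a managed background service instead (assuming you installed it via brew as above):

```shell
# Run Ollama as a background service that survives closing the Terminal
brew services start ollama

# Later, to stop it:
brew services stop ollama
```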

Step 5: Download and Run Your First Model

Open a new Terminal window (⌘T) and run:

    ollama pull glm4-flash

This downloads the GLM-4.7 Flash 30B model (~18GB). Wait for it to complete.

Then start chatting:

    ollama run glm4-flash

Type your question and press Enter. You're now running a local AI model!
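Ollama also exposes a local REST API on port 11434, which means you can script the model instead of chatting interactively. A minimal sketch, assuming the server from Step 4 is running and you pulled glm4-flash above:

```shell
# One-shot generation via Ollama's local HTTP API (default port 11434)
curl -s http://localhost:11434/api/generate \
  -d '{"model": "glm4-flash", "prompt": "Explain quantization in one sentence.", "stream": false}'
```

The same endpoint works from any language with an HTTP client, so your data still never leaves the machine.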

Step 6: Optimize Performance (Optional)

For faster inference on Mac:

  • Close unnecessary apps to free up RAM for the model
  • Plug in your laptop, since macOS throttles performance on battery
  • Use Q4 quantization for the best balance of speed and quality

Windows Setup (NVIDIA GPUs)

If your Windows laptop has an NVIDIA GPU (RTX 3060 or newer, RTX 50-series recommended), you can run local LLMs with excellent performance.

Step 1: Check Your GPU

  • Right-click on the Desktop → NVIDIA Control Panel
  • Go to System Information (bottom-left)
  • Find Dedicated video memory; this is your VRAM
  • Or open Task Manager (Ctrl+Shift+Esc) → Performance tab → GPU → check Dedicated GPU memory

Match your VRAM to this chart:

| VRAM | Best Model Size | Example Models |
| --- | --- | --- |
| 4GB | 3B-8B (Q2-Q3) | Phi-3 Mini, Llama 4 Vega 8B Q2 |
| 6GB | 8B (Q3-Q4) | Llama 4 Vega 8B Q4 |
| 8GB | 8B (Q5-Q8) | Llama 4 Vega Q8, GLM-4.7 Flash |
| 12GB | 30B (Q3-Q4) | Qwen3 30B Q4, MiMo-V2 13B Q8 |
| 16GB | 30B (Q5) or 72B (Q2) | GLM-4.7 30B Q5, Kimi K2.5 Q2 |
| 24GB+ | 72B (Q4) or 109B (Q3) | Llama 4 Scout 109B Q3, Kimi K2.5 Q4 |
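You can also read your VRAM from the command line with nvidia-smi, the diagnostic tool bundled with NVIDIA's drivers:

```shell
# Show GPU name and total VRAM (works in Command Prompt or PowerShell)
nvidia-smi --query-gpu=name,memory.total --format=csv
```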

Step 2: Install Ollama for Windows

  • Go to ollama.com/download
  • Download OllamaSetup.exe
  • Run the installer; it auto-detects your NVIDIA GPU
  • Follow the prompts (takes ~2 minutes)

Ollama will start automatically as a Windows service.
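To confirm the install worked, open a terminal and ask Ollama for its version:

```shell
# Verify Ollama installed correctly (run in Command Prompt or PowerShell)
ollama --version
```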

Step 3: Open Command Prompt or PowerShell

Press Win+R, type cmd, and press Enter.

Step 4: Download and Run Your First Model

    ollama pull llama4-vega

This downloads Llama 4 Vega 8B (~5.2GB). A progress bar will show download status.

Then run:

    ollama run llama4-vega

Type your question and press Enter. Your GPU is now running a local AI model.
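To confirm the model is actually running on the GPU rather than silently falling back to CPU, check Ollama's process list while the model is loaded:

```shell
# Show loaded models and where they are running.
# The PROCESSOR column should read "100% GPU" (or mostly GPU);
# "100% CPU" means the model did not fit in VRAM.
ollama ps
```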

Step 5: Optimize Performance (NVIDIA)

Update your GPU drivers:

  • Go to nvidia.com/drivers
  • Download the latest Game Ready Driver (or Studio Driver) for your GPU
  • Install and restart

Performance tips:

  • Pick a model that fits entirely in VRAM; layers that spill into system RAM slow inference dramatically
  • Close other GPU-heavy apps (games, video editors) before running a model
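To see whether a driver update or a smaller quantization actually helped, Ollama can print timing statistics after each response:

```shell
# --verbose prints token counts and eval rate (tokens/second) after each reply
ollama run llama4-vega --verbose
```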

Alternative: LM Studio (GUI Option)

If you prefer a visual interface over the command line, try LM Studio:

  • Download from lmstudio.ai
  • Install (works on both Mac and Windows)
  • Open LM Studio → Discover tab
  • Browse and download models with one click
  • Go to the Chat tab and start talking

LM Studio is great for beginners but uses slightly more RAM than Ollama.
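LM Studio can also serve loaded models over an OpenAI-compatible local API (its local server defaults to port 1234), so existing OpenAI client code can point at your laptop instead of the cloud. A hedged sketch; "local-model" below is a placeholder, and you should use the identifier LM Studio shows for the model you loaded:

```shell
# Query LM Studio's OpenAI-compatible local server (default port 1234).
# "local-model" is a placeholder: substitute the model identifier from LM Studio.
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello!"}]}'
```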

---

Troubleshooting

"Model runs slow (< 5 tokens/second)"

The model is probably too large for your hardware. Try a smaller model or a lower quantization (Q4 instead of Q8), and make sure the whole model fits in VRAM or unified memory.

"Out of memory" error

Close other apps, then switch to a smaller model or a lower quantization. On Windows, check the model's download size against your VRAM using the chart above.

"Ollama command not found" (Mac)

Homebrew's bin directory may not be on your PATH yet. Open a new Terminal window; if that doesn't help, rerun brew install ollama and follow any PATH instructions Homebrew prints.

"CUDA not available" (Windows)

Update your NVIDIA driver from nvidia.com/drivers, restart, and reinstall Ollama so the installer can detect your GPU.

---

What Models Should You Try?

After getting your first model running, experiment with these:

For coding:

    ollama pull deepseek-v3.2
    ollama run deepseek-v3.2

For reasoning / math:

    ollama pull glm5
    ollama run glm5

For creative writing:

    ollama pull minimax-m2.1
    ollama run minimax-m2.1

For lightweight / fast:

    ollama pull phi3:mini
    ollama run phi3:mini

Use ollama list to see all downloaded models.
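Models add up on disk quickly (roughly 5-20GB each), so it's worth pruning the ones you no longer use:

```shell
# List downloaded models with their sizes
ollama list

# Remove a model you no longer need to reclaim disk space
ollama rm phi3:mini
```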

---

Next Steps

You're now running AI models locally. Your data stays private, costs are fixed, and you're in full control. Welcome to local AI in 2026.
