
Hardware for running local AI

Because Tholos AI runs models on your own machine, your hardware sets the ceiling on which models you can run and how fast they respond. The good news: it runs on ordinary laptops and desktops — a GPU is an enhancement, not a requirement. This guide covers what Tholos AI needs, and how to think about hardware for running local LLMs more generally.

What Tholos AI needs

The single most important factor is memory — the model has to fit. A capable everyday baseline is around 16 GB RAM with a modest GPU (e.g. an RTX 3060 6 GB) or an Apple Silicon machine (e.g. M1 Pro). You can run comfortably below that with a Light model; you’ll want more for the larger tiers.

Tier | Example model | Memory needed | Best for
Light | Llama 3.2 3B | ~8 GB RAM | Fast summaries, simple Q&A, extraction on modest machines.
Balanced | Qwen 2.5 7B | 12 GB+ RAM, or 4–6 GB VRAM | The sweet spot: strong instruction-following and multilingual work.
Power | Qwen 2.5 14B | 24–32 GB+ RAM, or 12 GB VRAM | Highest-quality reasoning: contract review, complex analysis.
Workstation | gpt-oss 120B (MoE) | 80 GB+ RAM and 24 GB+ VRAM, or 96 GB+ unified memory | Frontier-class reasoning for regulated-industry workloads (opt-in).

On first run, Tholos AI inspects your hardware and pre-selects a sensible tier; you can change it any time. See Choosing the right AI model.
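
As a rough illustration of how the table above maps a machine to a tier, here is a minimal Python sketch. It is not Tholos AI’s actual detection logic: the `Machine` class, the `suggest_tier` function, and the choice to treat the lower bound of each memory range as the threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical description of a machine; not a Tholos AI type."""
    ram_gb: float           # system RAM, or total unified memory on Apple Silicon
    vram_gb: float = 0.0    # dedicated GPU memory; 0 for CPU-only or unified memory
    unified: bool = False   # True when CPU and GPU share one memory pool


def suggest_tier(m: Machine, allow_workstation: bool = False) -> str:
    """Return the largest tier whose table threshold the machine meets."""
    # Workstation is opt-in in the table, so it is only considered on request.
    if allow_workstation:
        if m.unified and m.ram_gb >= 96:
            return "Workstation"
        if not m.unified and m.ram_gb >= 80 and m.vram_gb >= 24:
            return "Workstation"
    if m.ram_gb >= 24 or m.vram_gb >= 12:
        return "Power"
    if m.ram_gb >= 12 or m.vram_gb >= 4:
        return "Balanced"
    # Anything smaller falls back to Light, the smallest tier in the table.
    return "Light"


print(suggest_tier(Machine(ram_gb=16, vram_gb=6)))  # Balanced
```

The 16 GB RAM, 6 GB VRAM baseline described earlier lands on Balanced under these assumed thresholds, and a 32 GB machine would move up to Power.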

GPU acceleration (optional)

A GPU speeds up generation substantially, but Tholos AI runs on the CPU alone if you don’t have one. A supported GPU is detected and used automatically. Support by platform:

Platform | GPU backend | Notes
Windows (NVIDIA / AMD / Intel) | Vulkan (language model) + DirectML (speech & search) | Auto-detected and vendor-agnostic. Keep your GPU drivers current.
macOS (Apple Silicon) | Metal | Auto-activated. Unified memory is a real advantage (see below).
macOS (Intel) | CPU only | No GPU acceleration.

Setup is essentially automatic — for details and troubleshooting, see Setting up GPU acceleration.
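
If you want to check which row of the platform table applies to a machine, the lookup can be sketched in Python with the standard `platform` module. The `gpu_backend` function is a hypothetical helper written for this example; it only restates the table and says nothing about how Tholos AI performs its own detection.

```python
import platform


def gpu_backend(system: str, machine: str) -> str:
    """Map an OS name and CPU architecture to the backend named in the table."""
    if system == "Windows":
        return "Vulkan + DirectML"
    if system == "Darwin":  # macOS reports itself as "Darwin"
        # Apple Silicon Macs report an ARM architecture; Intel Macs report x86_64.
        return "Metal" if machine.lower() in ("arm64", "aarch64") else "CPU only"
    return "not listed"  # the table does not cover other platforms


print(gpu_backend(platform.system(), platform.machine()))
```

One caveat: a Python interpreter running under Rosetta on an Apple Silicon Mac reports `x86_64`, so treat the result as a hint rather than a definitive answer.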

Understanding the hardware: RAM vs VRAM

Running an LLM is mostly a memory problem. Two numbers matter:

  • VRAM (GPU memory) determines how much of the model can live on the fast GPU. If the whole model fits in VRAM, it runs fastest.
  • System RAM holds whatever doesn’t fit in VRAM (and the entire model on a CPU-only machine). It’s slower than VRAM but lets you run larger models than your GPU alone could hold.
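
The two bullets above amount to simple arithmetic: fill VRAM first, then spill the rest into system RAM. The sketch below assumes a hypothetical `split_model` helper and a 1 GB VRAM reserve for the display and working buffers; both are illustrative choices, not values taken from Tholos AI.

```python
def split_model(model_gb: float, vram_gb: float, reserve_gb: float = 1.0) -> tuple[float, float]:
    """Return (GB placed in VRAM, GB left in system RAM) for a model of model_gb."""
    usable_vram = max(vram_gb - reserve_gb, 0.0)  # reserve_gb is an assumed margin
    on_gpu = min(model_gb, usable_vram)           # as much as fits goes to the GPU
    in_ram = model_gb - on_gpu                    # the remainder stays in system RAM
    return on_gpu, in_ram


print(split_model(5.0, 12.0))  # (5.0, 0.0): the whole model fits on the GPU
print(split_model(9.0, 6.0))   # (5.0, 4.0): 4 GB spills into system RAM
print(split_model(5.0, 0.0))   # (0.0, 5.0): CPU-only, everything sits in RAM
```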

A useful rule of thumb: a model’s memory footprint is close to its file size plus working space for the conversation context, so a ~5 GB quantized 7B model wants roughly 6–8 GB free to run well. Quantization (see AI models in Tholos AI) is what makes big models fit. A fast SSD also helps, because models load from disk into memory.
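
The rule of thumb can be turned into a quick estimate. In the sketch below, file size is approximated as parameter count times bits per weight, and the 1.2x to 1.6x overhead factors are assumptions chosen only so that a 5 GB file reproduces the 6–8 GB range quoted above. Both function names are hypothetical.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate file size: parameters x bits per weight, converted to GB."""
    return params_billion * bits_per_weight / 8


def free_memory_range_gb(file_gb: float, low: float = 1.2, high: float = 1.6) -> tuple[float, float]:
    """Scale the file size by assumed overhead factors to get a free-memory range."""
    return file_gb * low, file_gb * high


# A 7B model at about 5 bits per weight comes out a little under 5 GB.
print(f"{quantized_size_gb(7.0, 5.0):.1f} GB")  # 4.4 GB

low_gb, high_gb = free_memory_range_gb(5.0)
print(f"{low_gb:.0f}-{high_gb:.0f} GB free")    # 6-8 GB free
```

Actual overhead depends on context length and runtime, so treat the range as a starting point rather than a guarantee.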

Hardware for running LLMs, by ambition

Laptop / everyday desktop

16 GB RAM runs Light and Balanced models well; an 8 GB+ discrete GPU makes them snappy. This covers the large majority of professional use.

Prosumer workstation

32–64 GB RAM with a 16–24 GB GPU (e.g. a high-end consumer card) comfortably runs Power-tier 14B models on the GPU and leaves headroom for documents and other apps. VRAM is the limiting factor for keeping a model fully on the GPU, so prioritize it.

Apple Silicon

Apple’s unified memory is shared between CPU and GPU, so a Mac with 32–128 GB can run models that would need an expensive multi-GPU rig on a PC — a cost-effective route to the Power tier and beyond. Use your total RAM as the budget when judging which tier fits.

Regulated-industry / frontier workstation

To run 100B-class Mixture-of-Experts models (the Workstation tier) comfortably, aim for 128 GB+ system RAM and 24–48 GB+ VRAM (a high-VRAM professional card or dual GPUs), or 128 GB+ unified memory on Apple Silicon. These targets sit above the minimums in the tier table to leave headroom. MoE models activate only a slice of their parameters per token, so they run faster than their total size suggests, but they still have to fit in memory, which is why RAM capacity dominates here. The download alone is 60–100+ GB.
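
To see why an MoE model can be fast yet still memory-hungry, compare the weights that must stay loaded with the weights a single token actually uses. The sketch below uses illustrative round numbers (120B total parameters, 5B active per token, 4.5 bits per weight); these are assumptions for the arithmetic, not the specification of any particular model, and `moe_profile` is a hypothetical helper.

```python
def moe_profile(total_billion: float, active_billion: float, bits_per_weight: float) -> dict[str, float]:
    """Compare what an MoE model must hold in memory with what one token uses."""
    gb_per_billion = bits_per_weight / 8  # GB of weights per billion parameters
    return {
        "resident_gb": total_billion * gb_per_billion,    # every expert must be loaded
        "per_token_gb": active_billion * gb_per_billion,  # weights read for one token
        "active_fraction": active_billion / total_billion,
    }


# Illustrative round numbers only.
profile = moe_profile(total_billion=120, active_billion=5, bits_per_weight=4.5)
print(profile["resident_gb"])   # 67.5
print(profile["per_token_gb"])  # 2.8125
```

Under these assumptions the resident weights land inside the 60–100+ GB range quoted for the download, while each token touches only a few gigabytes of them. That gap is what lets speed track the active parameters while the memory requirement tracks the total.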

You don’t have to get this exactly right up front. Start with the tier Tholos AI suggests for your machine; if answers feel shallow, step up a tier; if responses feel slow, step down. Adding RAM is usually the cheapest way to unlock a bigger model.
