Setting up GPU acceleration
A GPU makes Tholos AI noticeably faster — the model generates more words per second — but it’s an enhancement, not a requirement. Everything works on the CPU alone. The good news is there’s almost nothing to set up: Tholos AI detects a usable GPU automatically and uses it. This article explains what happens by default, how to confirm it’s working, and what to do if it isn’t.
It’s automatic
On launch, Tholos AI checks your hardware and uses your GPU if it finds a usable one — there’s no “turn on GPU” switch to flip. If it can’t use a GPU, it falls back to the CPU and keeps working. You’ll see a GPU acceleration line in the startup system check: a green tick means a GPU is in use; a warning (“No usable GPU detected”) means it’s running on CPU — that’s a heads-up, not an error.
Platform support
| Platform | GPU acceleration |
|---|---|
| Windows (NVIDIA, AMD, or Intel GPU) | Automatic and vendor-agnostic. The language model is accelerated through a Vulkan backend, and the speech and document-search models through DirectML. Most reasonably modern GPUs with their own video memory qualify; integrated graphics often don’t. |
| macOS (Apple Silicon — M1 and later) | Automatic via Metal. Apple’s unified memory is a real advantage for larger models (see below). |
| macOS (Intel) | CPU only — no GPU acceleration. |
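To make the fallback behaviour concrete, here is a minimal sketch of the decision the table describes. It is an illustration only, not Tholos AI’s actual code: the `DetectedGpu` record, the platform strings, and both function names are hypothetical, and it covers only the language-model backend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectedGpu:
    """Hypothetical record of a GPU found at startup (not a real Tholos AI type)."""
    name: str
    vram_gb: float


def choose_backend(platform: str, gpu: Optional[DetectedGpu]) -> str:
    """Pick a language-model backend, following the platform table above."""
    if gpu is None:
        return "cpu"      # no usable GPU: fall back and keep working
    if platform == "windows":
        return "vulkan"   # NVIDIA, AMD, or Intel
    if platform == "macos-apple-silicon":
        return "metal"
    return "cpu"          # for example, Intel Macs


def status_line(backend: str) -> str:
    """Mimic the startup check: a warning on CPU, a pass otherwise."""
    if backend == "cpu":
        return "Warning: No usable GPU detected (running on CPU)"
    return f"OK: GPU acceleration via {backend}"


print(status_line(choose_backend("windows", None)))
print(status_line(choose_backend("macos-apple-silicon", DetectedGpu("Example GPU", 16.0))))
```

The point of the sketch is that every branch ends in a working backend: a missing GPU changes the speed, never whether the app runs.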
What you actually need to do
- On Windows: keep your GPU drivers current. Installing the latest driver from NVIDIA, AMD, or Intel is the most common way to make acceleration available, and the most common fix when it’s missing.
- On Apple Silicon: nothing — Metal is built in and used automatically.
- Make sure the model fits. A GPU only helps if there’s room for the model. If a model is larger than your GPU’s VRAM, part of it runs on the GPU and the rest on the CPU — still faster than CPU alone, just not the full speed-up. See Hardware for running local AI for VRAM guidance.
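For a rough sense of how much of a model fits on the GPU, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate under stated assumptions, not how Tholos AI actually splits a model: the function name is hypothetical, and the 1 GB reserve for the display and other apps is an assumed figure.

```python
def gpu_share(model_gb: float, vram_gb: float, reserve_gb: float = 1.0) -> float:
    """Estimate the fraction of a model that fits in VRAM (0.0 to 1.0).

    reserve_gb is an assumed allowance for the display and other apps;
    the real overhead varies by system.
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(usable_gb / model_gb, 1.0)


# A 12 GB model on a card with 7 GB of VRAM: about half runs on the GPU.
print(f"{gpu_share(12.0, 7.0):.0%}")
```

A result of 100% means the whole model should fit and you get the full speed-up; anything lower means the remainder runs on the CPU.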
Confirming it’s working
- The startup system check shows the GPU acceleration line with a green tick.
- The Models view reports your detected GPU and its VRAM — if it shows your card (rather than “CPU only”), the app can see it.
- Generation simply feels faster — more words appear per second than on CPU.
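If you’d rather measure than go by feel, time a response with a stopwatch, paste the text into a small script like this one, and compare the figure with and without the GPU. The helper is a hypothetical convenience, not part of Tholos AI, and it counts whitespace-separated words rather than model tokens.

```python
def words_per_second(text: str, seconds: float) -> float:
    """Return the number of whitespace-separated words generated per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return len(text.split()) / seconds


response = "one two three four"  # paste the generated response here
print(words_per_second(response, 2.0))
```

Use the same prompt and the same model for both runs so the comparison is fair.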
If you see “No usable GPU detected”
- Update your GPU driver and relaunch — this resolves most cases on Windows.
- Integrated graphics with no dedicated video memory often don’t qualify; the app will use the CPU instead. A discrete GPU with its own VRAM is what unlocks acceleration.
- On an Intel Mac, this is expected — those run on CPU only.
- It’s safe to keep using Tholos AI on CPU. If responses feel slow, step down to a smaller model tier — see Choosing the right AI model.