Choosing the right AI model for your hardware
Tholos AI runs open-weight AI models on your own hardware, so the right model depends on the machine you have. You never need to learn quantization formats or parameter counts — the catalog presents models as simple tiers with plain quality labels. This guide explains the tiers and how to choose.
The four tiers
| Tier | Typical models | Download size | Best for |
|---|---|---|---|
| Light | Llama 3.2 3B, Phi-3 Mini | ~2 GB | Fast summaries, simple Q&A, entity extraction. Runs on 8 GB RAM. |
| Balanced | Qwen 2.5 7B, Mistral 7B | ~4–5 GB | Strong instruction following, multilingual work, structured output. The sweet spot for most users. |
| Power | Qwen 2.5 14B | ~8–10 GB | Highest-quality reasoning of the auto-selected tiers: contract review, complex analysis. |
| Workstation | Frontier-class 100B+ models | 60–100+ GB | Opt-in tier for regulated-industry workloads on workstation-grade hardware. |
Let the app pick for you
On first run, Tholos AI inspects your hardware and pre-selects a sensible tier:
- Light — under 12 GB RAM
- Balanced — 12–24 GB RAM, or a GPU with 6+ GB VRAM
- Power — 32 GB+ RAM, or 12+ GB VRAM
- Workstation — only offered (never auto-selected) when the hardware clearly supports it, such as 80 GB+ RAM with a 24 GB+ GPU or 96 GB+ unified memory on Apple Silicon — the download alone is 60–100+ GB
You can always change tiers later in the Models view. A practical rule: if answers feel shallow or a workflow struggles with implicit references, step up one tier; if responses feel slow, step down.
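The pre-selection rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds listed in this guide, not Tholos AI's actual detection code: the `Hardware` type and both function names are hypothetical. The list does not say what happens between 24 and 32 GB of RAM, so the sketch assumes Balanced for that range, and it assumes that when RAM and VRAM point to different tiers, the higher one wins.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical hardware summary; field names are illustrative only."""
    ram_gb: float
    vram_gb: float = 0.0
    unified_memory: bool = False  # True on Apple Silicon-style machines


def preselect_tier(hw: Hardware) -> str:
    """Return the tier that would be pre-selected on first run.

    Workstation is never returned here because the guide says it is
    only offered, never auto-selected.
    """
    if hw.ram_gb >= 32 or hw.vram_gb >= 12:
        return "Power"
    # Assumption: 24-32 GB RAM (not covered by the guide) falls into Balanced.
    if hw.ram_gb >= 12 or hw.vram_gb >= 6:
        return "Balanced"
    return "Light"


def workstation_offered(hw: Hardware) -> bool:
    """Return True when the Workstation tier could be shown as an opt-in."""
    if hw.unified_memory:
        return hw.ram_gb >= 96
    return hw.ram_gb >= 80 and hw.vram_gb >= 24


if __name__ == "__main__":
    laptop = Hardware(ram_gb=16)
    print(preselect_tier(laptop), workstation_offered(laptop))
```

Keeping `workstation_offered` separate from `preselect_tier` mirrors the guide's distinction between tiers the app picks for you and the one tier you have to opt into.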
Quality over quantity — what's behind the catalog
The catalog is deliberately small. Every model in it passed a workflow-specific test suite built from real tasks, such as summarizing a legal contract, answering cited questions about a financial report, and spotting asymmetric clauses in an NDA, rather than public benchmarks alone. Models that fail the critical workflows are rejected regardless of their benchmark scores. Real model names are shown in the UI: Tholos AI doesn't rebrand open-weight models.
The models you don't have to think about
Besides the main language model, workflows use specialist models — mostly handled for you:
- Transcription (speech-to-text): a fast English dictation model ships with the installer; larger multilingual and high-accuracy models (covering 99 languages) are optional downloads — pick a bigger one if meeting transcription accuracy matters more than speed.
- OCR: English and Chinese ship bundled; Japanese, Korean, German, French, Spanish and more are small optional downloads (~12–20 MB each).
- Embeddings (for document search/RAG): bundled and selected automatically based on your documents' languages — nothing to configure.
Downloads, integrity, and bringing your own
- Model downloads come from their original public sources, and every file is verified against a SHA-256 checksum before it's allowed to load.
- The catalog check is a plain request that sends no user data, and you can disable it entirely — see How Tholos AI keeps your data private.
- Prefer your own weights? Drop any GGUF language model into the models folder, and the app detects and validates it. This is also the install path for air-gapped machines.
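If you bring your own weights to an air-gapped machine, you may want to check a file yourself before copying it over. The sketch below shows the general technique in Python: stream the file through SHA-256 and compare the result with the checksum published by the model's source, then confirm the file begins with the four-byte `GGUF` magic that GGUF files start with. The function names are hypothetical, and the magic-byte check is only a quick sanity test; it is not a description of how Tholos AI validates models internally.

```python
import hashlib

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True when the file's SHA-256 matches the published checksum."""
    return sha256_of(path) == expected_hex.strip().lower()


def looks_like_gguf(path):
    """Cheap sanity check: does the file start with the GGUF magic bytes?"""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

A typical use is `verify_download("model.gguf", published_checksum) and looks_like_gguf("model.gguf")`, where `published_checksum` is the hex string copied from wherever you obtained the model.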
Curious how model choice interacts with long documents? Read Working with long documents: context windows explained.