Choosing the right AI model for your hardware

Updated Jun 10, 2026

Tholos AI runs open-weight AI models on your own hardware, so the right model depends on the machine you have. You never need to learn quantization formats or parameter counts — the catalog presents models as simple tiers with plain quality labels. This guide explains the tiers and how to choose.

The four tiers

Tier	Typical models	Download size	Best for
Light	Llama 3.2 3B, Phi-3 Mini	~2 GB	Fast summaries, simple Q&A, entity extraction — pick it when you want the quickest responses.
Balanced	Qwen 2.5 7B, Mistral 7B	~4–5 GB	Strong instruction following, multilingual work, structured output. The sweet spot for most users.
Power	Qwen 2.5 14B	~8–10 GB	Highest-quality reasoning — contract review, complex analysis.
Workstation	Frontier-class 100B+ models	60–100+ GB	Opt-in tier for regulated-industry workloads on workstation-grade hardware.

Let the app pick for you

Tholos AI requires at least 16 GB of RAM. On first run, it inspects your hardware and pre-selects a sensible tier:

Balanced — the default on a typical 16 GB machine, or a GPU with 6+ GB VRAM
Power — 32 GB+ RAM, or 12+ GB VRAM
Workstation — only offered (never auto-selected) when the hardware clearly supports it, such as 80 GB+ RAM with a 24 GB+ GPU or 96 GB+ unified memory on Apple Silicon — the download alone is 60–100+ GB
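The pre-selection rules above can be sketched as a small function. This is a minimal illustration, not Tholos AI's actual code. The `Hardware` type, the function names, and treating Apple Silicon's unified memory as plain RAM are all assumptions, and the Workstation thresholds are taken from the examples in the list. The 16 GB minimum is checked first, so every supported machine lands on at least Balanced. Workstation is reported separately as an offer, not a selection.

```python
from dataclasses import dataclass

MIN_RAM_GB = 16  # stated minimum for Tholos AI


@dataclass
class Hardware:
    """Hypothetical hardware summary; not a real Tholos AI type."""
    ram_gb: float
    vram_gb: float = 0.0          # dedicated GPU memory, 0 if there is no GPU
    apple_silicon: bool = False   # if True, ram_gb is the unified memory size


def preselect_tier(hw: Hardware) -> str:
    """Return the tier pre-selected on first run (never Light or Workstation)."""
    if hw.ram_gb < MIN_RAM_GB:
        raise ValueError(f"{hw.ram_gb} GB RAM is below the {MIN_RAM_GB} GB minimum")
    if hw.ram_gb >= 32 or hw.vram_gb >= 12:
        return "Power"
    # 16 GB machines, with or without a 6+ GB GPU, get the default.
    return "Balanced"


def workstation_offered(hw: Hardware) -> bool:
    """True if the opt-in Workstation tier should appear as a choice."""
    if hw.apple_silicon:
        return hw.ram_gb >= 96
    return hw.ram_gb >= 80 and hw.vram_gb >= 24
```

In this sketch, a 16 GB laptop gets Balanced and a 32 GB desktop gets Power. A 128 GB machine with a 24 GB GPU is still pre-selected at Power, but it also sees Workstation as an option.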

You can change tiers any time in the Models view. Want the fastest possible responses? Drop to the Light tier manually — it trades some quality for speed. A practical rule: if answers feel shallow or a workflow struggles with implicit references, step up one tier; if responses feel slow, step down.
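The step-up/step-down rule of thumb can also be written down, purely as an illustration. The tier order comes from the table of tiers. The function name and its boolean inputs are hypothetical, and two details are assumptions: conflicting signals leave the tier unchanged, and stepping up stops at Power unless Workstation is actually offered on your hardware, because that tier is opt-in.

```python
TIERS = ["Light", "Balanced", "Power", "Workstation"]  # lowest to highest


def adjust_tier(current: str, feels_shallow: bool = False,
                feels_slow: bool = False, workstation_offered: bool = False) -> str:
    """Suggest the neighboring tier per the rule of thumb (hypothetical helper)."""
    i = TIERS.index(current)
    if feels_shallow and not feels_slow:
        # Workstation is opt-in, so only step into it when it is offered.
        top = TIERS.index("Workstation") if workstation_offered else TIERS.index("Power")
        if i < top:
            i += 1
    elif feels_slow and not feels_shallow:
        if i > 0:
            i -= 1
    # Both signals or neither: keep the current tier.
    return TIERS[i]
```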

Quality over quantity — what's behind the catalog

The catalog is deliberately small. Every model in it passed a workflow-specific test suite built from real tasks, such as summarizing a legal contract, answering cited questions about a financial report, and spotting asymmetric clauses in an NDA, rather than public benchmarks alone. Models that fail the critical workflows are rejected regardless of their benchmark scores. The UI shows each model's real name: Tholos AI doesn't rebrand open-weight models.

The models you don't have to think about

Besides the main language model, workflows use specialist models — mostly handled for you:

Transcription (speech-to-text): a fast English dictation model ships with the installer; larger multilingual and high-accuracy models (covering 99 languages) are optional downloads — pick a bigger one if meeting transcription accuracy matters more than speed.
OCR: English and Chinese ship bundled; Japanese, Korean, German, French, Spanish and more are small optional downloads (~12–20 MB each).
Embeddings (for document search/RAG): bundled and selected automatically based on your documents' languages — nothing to configure.

Downloads, integrity, and bringing your own

Model downloads come from their original public sources, and every file is verified against a SHA-256 checksum before it's allowed to load.
The catalog check is a plain request that sends no user data, and you can disable it entirely — see How Tholos AI keeps your data private.
Prefer your own weights? Drop any GGUF language model into the models folder — the app detects and validates it. This is also the install path for air-gapped machines.
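Both file checks in this section can be sketched in a few lines of Python. This is not Tholos AI's implementation. The function names, the 1 MiB read size, and the assumption that "validates" at least means reading the GGUF header are all illustrative. The header check relies only on the GGUF format itself: a GGUF file begins with the four ASCII bytes `GGUF`, followed by a little-endian 32-bit version number. A real loader would check far more than this.

```python
import hashlib
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB models never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex: str) -> bool:
    """Compare against a published SHA-256 (hex, case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


def looks_like_gguf(path) -> bool:
    """Cheap header check: GGUF magic plus a plausible version number."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version >= 1


def find_user_models(models_dir) -> list:
    """Hypothetical folder scan: keep only .gguf files whose header checks out."""
    return sorted(p for p in Path(models_dir).glob("*.gguf") if looks_like_gguf(p))
```

On an air-gapped machine, the same idea applies: copy the `.gguf` file into the models folder, and if the source publishes a SHA-256 checksum, compare it with `sha256_of` before trusting the file.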

Keep in mind your edition's model slots: Standard holds 2 installed models, Professional 6, Business unlimited. See Which edition is right for you?
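Slot limits reduce to simple arithmetic. Here is a minimal sketch that assumes each installed model takes exactly one slot. The article doesn't say whether bundled specialist models (transcription, OCR, embeddings) count toward the limit, so this sketch leaves them out. The names are hypothetical.

```python
# Slot limits per edition, from the article; None means unlimited.
MODEL_SLOTS = {"Standard": 2, "Professional": 6, "Business": None}


def free_slots(edition: str, installed: int):
    """Remaining slots, or None if the edition is unlimited."""
    limit = MODEL_SLOTS[edition]
    if limit is None:
        return None
    return max(limit - installed, 0)


def can_install(edition: str, installed: int) -> bool:
    """True if one more model fits in this edition."""
    free = free_slots(edition, installed)
    return free is None or free > 0
```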

Curious how model choice interacts with long documents? Read Working with long documents: context windows explained.
