Choosing the right AI model for your hardware

Tholos AI runs open-weight AI models on your own hardware, so the right model depends on the machine you have. You never need to learn quantization formats or parameter counts — the catalog presents models as simple tiers with plain quality labels. This guide explains the tiers and how to choose.

The four tiers

Tier | Typical models | Download size | Best for
Light | Llama 3.2 3B, Phi-3 Mini | ~2 GB | Fast summaries, simple Q&A, entity extraction. Runs on 8 GB RAM.
Balanced | Qwen 2.5 7B, Mistral 7B | ~4–5 GB | Strong instruction following, multilingual work, structured output. The sweet spot for most users.
Power | Qwen 2.5 14B | ~8–10 GB | The highest-quality reasoning outside the Workstation tier, for contract review and complex analysis.
Workstation | Frontier-class 100B+ models | 60–100+ GB | Opt-in tier for regulated-industry workloads on workstation-grade hardware.

Let the app pick for you

On first run, Tholos AI inspects your hardware and pre-selects a sensible tier:

  • Light: under 12 GB RAM
  • Balanced: 12–24 GB RAM, or a GPU with 6+ GB VRAM
  • Power: 32 GB+ RAM, or 12+ GB VRAM
  • Workstation: never auto-selected. It is offered only when the hardware clearly supports it, such as 80 GB+ RAM with a 24 GB+ GPU, or 96 GB+ unified memory on Apple Silicon, since the download alone is 60–100+ GB.
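For the curious, the pre-selection rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tholos AI's actual code: the `Hardware` type and both function names are hypothetical, the checks run from the most capable tier down so that either the RAM or the VRAM condition is enough, and machines with 24–32 GB RAM (a range the list doesn't cover) are assumed to fall through to Balanced. Workstation gets its own function because it is only ever offered, never chosen.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical summary of the detected machine."""
    ram_gb: float                # system RAM (unified memory on Apple Silicon)
    vram_gb: float = 0.0         # dedicated GPU memory; 0 if there is no GPU
    apple_silicon: bool = False  # True on Apple Silicon Macs


def recommend_tier(hw: Hardware) -> str:
    """Pre-select Light, Balanced, or Power from the published thresholds.

    Checks run from the most capable tier down, so either condition
    (RAM or VRAM) qualifies. Machines with 24-32 GB RAM are not covered
    by the list; here they fall through to Balanced (assumption).
    Workstation is never returned: it is opt-in only.
    """
    if hw.ram_gb >= 32 or hw.vram_gb >= 12:
        return "Power"
    if hw.ram_gb >= 12 or hw.vram_gb >= 6:
        return "Balanced"
    return "Light"


def workstation_offered(hw: Hardware) -> bool:
    """Whether to show the opt-in Workstation tier at all.

    Encodes only the two example configurations from the list;
    the real criteria may be broader (assumption).
    """
    discrete_gpu = hw.ram_gb >= 80 and hw.vram_gb >= 24
    unified = hw.apple_silicon and hw.ram_gb >= 96
    return discrete_gpu or unified
```

For example, a 16 GB laptop with no GPU lands on Balanced, and so does an 8 GB machine with an 8 GB GPU, because the VRAM rule lifts it out of Light.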

You can always change tiers later in the Models view. A practical rule: if answers feel shallow or a workflow struggles with implicit references, step up one tier; if responses feel slow, step down.

Quality over quantity — what's behind the catalog

The catalog is deliberately small. Every model in it passed a workflow-specific test suite built from real tasks, such as summarizing a legal contract, answering cited questions about a financial report, and spotting asymmetric clauses in an NDA, rather than public benchmarks alone. Models that fail the critical workflows are rejected regardless of their benchmark scores. The UI shows real model names: Tholos AI doesn't rebrand open-weight models.

The models you don't have to think about

Besides the main language model, workflows use specialist models — mostly handled for you:

  • Transcription (speech-to-text): a fast English dictation model ships with the installer; larger multilingual and high-accuracy models (covering 99 languages) are optional downloads — pick a bigger one if meeting transcription accuracy matters more than speed.
  • OCR: English and Chinese ship bundled; Japanese, Korean, German, French, Spanish and more are small optional downloads (~12–20 MB each).
  • Embeddings (for document search/RAG): bundled and selected automatically based on your documents' languages — nothing to configure.

Downloads, integrity, and bringing your own

  • Model downloads come from their original public sources, and every file is verified against a SHA-256 checksum before it's allowed to load.
  • The catalog check is a plain request that sends no user data, and you can disable it entirely — see How Tholos AI keeps your data private.
  • Prefer your own weights? Drop any GGUF language model into the models folder — the app detects and validates it. This is also the install path for air-gapped machines.
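The two integrity steps above can be illustrated with Python's standard library. This is a hedged sketch, not the app's implementation: the function names are hypothetical, and the GGUF check only inspects the file header (the 4-byte magic `GGUF` followed by a little-endian version number, per the public GGUF format). Whatever deeper validation Tholos AI performs on your own weights isn't described here.

```python
import hashlib
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-GB model never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Catalog downloads: allow loading only if the checksum matches exactly."""
    return sha256_of(path) == expected_sha256.strip().lower()


def looks_like_gguf(path: Path) -> bool:
    """Bring-your-own weights: cheap header check before any deeper validation."""
    with path.open("rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return False
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version >= 1
```

Streaming the hash in chunks matters here: a Workstation-tier download can exceed 60 GB, far more than should ever be read into memory at once.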

Keep in mind your edition's model slots: Standard holds 2 installed models, Professional 6, and Business unlimited. See Which edition is right for you?
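The slot limits amount to a small lookup table. A minimal sketch with hypothetical names, assuming the count covers the language models you have installed (this article doesn't say whether specialist models such as OCR language packs count against the limit):

```python
from typing import Optional

# Installed-model limits per edition, as listed above; None means unlimited.
MODEL_SLOTS: dict[str, Optional[int]] = {
    "Standard": 2,
    "Professional": 6,
    "Business": None,
}


def can_install_another(edition: str, installed: int) -> bool:
    """Return True if one more model fits within the edition's slot limit."""
    limit = MODEL_SLOTS[edition]
    return limit is None or installed < limit
```

So a Standard user with two models installed would need to remove one before adding a third, while a Business user is never blocked.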

Curious how model choice interacts with long documents? Read Working with long documents: context windows explained.
