Using Entity Extraction

Entity Extraction pulls structured fields out of unstructured documents and hands you back a clean table you can drop straight into a spreadsheet. Instead of copying vendor names and amounts off 200 invoices by hand — or parties and governing law out of a stack of contracts — you describe the columns you want and let Tholos AI fill them in, entirely on-device.

When to use it

  • An invoice register — extract vendor, amount, currency, due date, and invoice number from a folder of invoice PDFs into a spreadsheet ready for the accounting system.
  • A contract abstract — pull parties, effective date, term, governing law, and termination notice from a batch of contracts into a register.
  • A prescription audit — extract drug, dose, frequency, and prescriber from a set of clinical notes.
  • An ad-hoc task — define your own fields (e.g. counterparty, fee, milestone) and run them against one file or a whole folder.

How to use it

  1. Open the Workflows view and choose Extract Entities.
  2. Attach your document(s). Supported formats: PDF, DOCX, TXT; scanned files are OCR’d automatically.
  3. Choose a schema — the fields (columns) you want extracted (see below).
  4. Optionally turn on export and pick a format, then run the workflow. A preview table appears; export once the results look right.

Choosing what to extract

A schema dropdown lets you pick how the columns are defined:

  • General entities — tick the kinds you want from a chip set: Person, Organization, Location, Date, Money, Email, Phone, URL. (Person, Organization, Location, and Date are pre-selected.)
  • Invoices — Vendor, Amount, Currency, Due Date, Invoice Number.
  • Contracts — Parties, Effective Date, Term, Governing Law, Termination Notice.
  • Clinical notes — Drug, Dose, Frequency, Prescriber.
  • Custom fields — type a comma-separated list of field names; anything you name becomes a column.

The result

The preview table has one column per field and one row per document (or one row per detected entity instance). Cells the model couldn’t find a value for are left empty rather than guessed. You can sort or filter the preview before exporting. Export to CSV or XLSX; the file is saved to the export folder set in Settings. A batch run produces a consolidated workbook plus a status column showing whether extraction succeeded for each file.

Processing many files? Stage them in the Batch Process view to extract from a whole list in one unattended pass — ideal for a finance team turning a folder of invoices into spreadsheets.

Checking the result

  • Open the exported file and skim for empty or suspicious cells.
  • Spot-check five random rows against the source documents — every populated cell should trace back to actual text in the source.
  • For date fields, confirm a consistent format across rows (ISO is a safe choice) before downstream use.
  • If a row has many empty cells, open the source and confirm the data really is absent from the document rather than missed by the model.
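
If you prefer to script part of this review, the checks above can be sketched in Python against a CSV export. This is a minimal sketch, not a Tholos AI feature: the function name audit_export is hypothetical, and it assumes the exported CSV has a header row whose column names match your schema fields (for example "Due Date"). It counts empty cells per column, flags populated date cells that are not ISO dates, and picks a random sample of rows to compare by hand against the source documents.

```python
import csv
import io
import random
from datetime import date


def audit_export(csv_text, date_fields=(), sample_size=5, seed=None):
    """Summarize an exported table (hypothetical helper, not part of the app)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []

    # Count empty cells per column; a mostly-empty column may need a clearer field name.
    empty_counts = {
        col: sum(1 for row in rows if not (row.get(col) or "").strip())
        for col in columns
    }

    # Collect (row number, field, value) for populated date cells that fail ISO parsing.
    bad_dates = []
    for i, row in enumerate(rows, start=1):
        for field in date_fields:
            value = (row.get(field) or "").strip()
            if not value:
                continue  # empty cells are reported in empty_counts instead
            try:
                date.fromisoformat(value)
            except ValueError:
                bad_dates.append((i, field, value))

    # Pick rows to check by hand against the source documents.
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))

    return {"empty_counts": empty_counts, "bad_dates": bad_dates, "sample": sample}
```

To run it on a real export, read the file as text (UTF-8 is an assumption about the export encoding) and pass it in along with the names of your date columns. The script only narrows down where to look; tracing each sampled cell back to the source text is still a manual step.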

Tips

  • If a field is empty across most rows, the name may be too jargon-specific — try a clearer synonym.
  • For invoices in multiple currencies, include a Currency field alongside Amount so the spreadsheet stays unambiguous.
  • For tougher documents, switching to a larger model in the Models view usually extracts more reliably (see Choosing the right AI model).
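
To illustrate the multi-currency tip, here is a minimal Python sketch that totals an exported invoice register per currency instead of adding all amounts together. The function name totals_by_currency is hypothetical, and the sketch assumes the CSV export has "Amount" and "Currency" columns and that amounts are plain decimal strings such as 1200.00; rows with an empty or unparseable amount are reported rather than guessed at.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal, InvalidOperation


def totals_by_currency(csv_text, amount_field="Amount", currency_field="Currency"):
    """Sum amounts per currency (hypothetical helper; column names are assumptions)."""
    totals = defaultdict(Decimal)  # Decimal() starts each currency at 0
    skipped = []  # row numbers that could not be included in a total

    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        amount = (row.get(amount_field) or "").strip()
        currency = (row.get(currency_field) or "").strip()
        if not amount or not currency:
            skipped.append(i)
            continue
        try:
            totals[currency] += Decimal(amount)
        except InvalidOperation:
            # e.g. "1,200.00" with a thousands separator; review these rows by hand
            skipped.append(i)

    return dict(totals), skipped
```

Without the Currency column there would be nothing to group by, and a single total across mixed currencies would be meaningless.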
