Building Your First Production AI Workflow in Python (Tutorial)

2026-05-19 AI Automation 13 min read Sree Jagatab

Most AI tutorials show you how to call the OpenAI API and stop there. Real production AI workflows need five additional layers: schema validation, structured retry, confidence routing, audit logging, and reversible actions. This tutorial walks through building all five for a representative workflow — invoice data extraction — using only Python and the libraries that show up in every real production system we ship.

The workflow we'll build

Input: a PDF invoice or its raw text. Output: structured data (supplier, invoice number, dates, line items with VAT, totals) posted to Xero (or any other destination) as a draft, with a full audit trail.

Stack

Step 1 — Define the schema

Don't skip this. Pydantic schemas are the contract between the LLM and your business logic. If the LLM returns malformed data, Pydantic catches it before it can do damage.

Pseudocode (real code formatting will vary):

Critical detail: money in pence as int, not pounds as float. Floats and money don't mix; one rounding error and your accounting reconciliation breaks.

Step 2 — Call the LLM with structured output

Use OpenAI's structured output mode (or Anthropic's tool-use, or Mistral's schema mode). The model returns JSON that matches your schema, or it errors. No more "parse the response and pray".

Pseudocode flow:

  1. Call openai.chat.completions.parse(...) with response_format=Invoice
  2. OpenAI guarantees the response conforms to your schema, or raises
  3. Validate it through Pydantic anyway (belt and braces)
  4. Cross-check: sum of line_items.line_total_pence + vat_total_pence should equal total_pence

The arithmetic check at step 4 is crucial. LLMs are bad at sums. Compute the totals yourself from the line items; if the LLM's reported totals don't match, log a warning and use the computed values.

Step 3 — Wrap in retry logic

Network errors happen. API rate limits happen. Use tenacity to retry with exponential backoff:

Step 4 — Confidence scoring + routing

Not every successful extraction is equally trustworthy. Compute a confidence score from signals you have:

Combine into a single confidence score (0-100). Route based on threshold:

Step 5 — Audit logging

Every decision the system makes is written to an append-only audit log BEFORE any side effect. This means:

When a customer asks "why did the system do X?", you can answer authoritatively. When an audit asks for proof of process, you have it. Without an audit log, you're running blind.

Step 6 — Reversible side effects

Post to Xero as DRAFT, never as posted. Send to Slack for human approval before final action. Schedule emails with 5-minute delay so they can be cancelled. Anything that can't be undone needs a human in the loop.

The pattern: suggest, don't execute. The system does 90% of the work; the human approves the final 10%.

Putting it together — the workflow

Conceptual flow:

  1. Receive PDF (via email webhook, file upload, scheduled poll)
  2. Run OCR (Textract or Mistral OCR) if needed; extract text
  3. Call LLM with structured-output Pydantic schema
  4. Validate output; retry once with stricter prompt if it fails
  5. Compute totals from line items; warn if LLM totals disagree
  6. Compute confidence score from all signals
  7. Write to audit log (BEFORE any side effect)
  8. Route based on confidence: auto-draft / review queue / flag
  9. Send action (Xero API, Slack message, email)
  10. Notify bookkeeper of any review/flag items via Slack

What you've built

A production-grade AI workflow that:

This is the layer between "AI that's impressive in a demo" and "AI that runs unattended overnight in a production accounting practice". The libraries are simple; the pattern is rigorous.

Next steps

For a full real-world case study of this exact pattern in production, see our invoice OCR automation case study with architecture diagram and ROI math.

Or if you'd rather not build it yourself: we'll build it for you, typically £3,500 fixed-price, payback in 3 months.

Sree Jagatab
Sree Jagatab is an AI automation engineer based in Wisbech, Cambridgeshire. He builds custom Python and AI automation for UK SMEs across Cambridge, Peterborough, and the surrounding region. More about Sree →

Got a workflow you want to talk through?

30 minutes, no pitch. We'll tell you honestly what we'd build — or whether automation isn't right yet.