Invoice OCR Automation for an Accounting Firm

From 12 hours/week of manual data entry to under 1 hour of review. A representative end-to-end engagement.

Sector: Accounting / Bookkeeping
Duration: 4 weeks build + 2 weeks bedding-in
Budget band: £3,500 fixed-price build + £150/month run

1. The problem

A mid-sized Cambridgeshire accounting firm processes ~800 supplier invoices per month across 60+ clients. Most arrive as emailed PDFs or scanned images. The existing process: a bookkeeper opens each invoice, types the supplier name, invoice number, line items, VAT, and totals into Xero, then files the original. This averages 12 hours per week of skilled time, split across two team members, on work no client wanted to be billed for. The firm already uses Dext for the easy cases; the awkward 30% of invoices (multi-line, badly photographed, foreign-supplier formats, hand-corrected) still need manual handling.

2. Stack

Language: Python 3.12
LLM: OpenAI GPT-4o (structured output mode)
OCR: AWS Textract (Form / Table extraction)
Schema validation: Pydantic v2
Compute: AWS Lambda + EventBridge (event-driven)
Database: PostgreSQL on Neon (audit log + idempotency)
Integration: Xero API (bills as drafts, not direct post)
Email ingest: AWS SES inbound to S3 → Lambda trigger
Alerting: Slack webhooks for exception queue + daily summary
Observability: CloudWatch + Sentry
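The stack uses Postgres for both the audit log and idempotency. A minimal sketch of those two ideas, using Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere: a SHA-256 hash of the attachment bytes acts as the idempotency key (so a retried Lambda invocation does not create a second Xero draft), and triggers make the audit table append-only. All table and function names are hypothetical. In Postgres the same `ON CONFLICT (...) DO NOTHING` insert works; append-only is typically enforced there with a trigger or by revoking UPDATE/DELETE from the application role.

```python
import hashlib
import json
import sqlite3

# sqlite3 stands in for Postgres here; table and function names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_inputs (
    input_hash TEXT PRIMARY KEY            -- one row per unique attachment
);
CREATE TABLE IF NOT EXISTS audit_log (
    id         INTEGER PRIMARY KEY,
    input_hash TEXT NOT NULL,
    event      TEXT NOT NULL,
    detail     TEXT NOT NULL,              -- JSON payload
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: reject any UPDATE or DELETE on the audit log.
CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def input_hash(attachment: bytes) -> str:
    """SHA-256 of the raw attachment bytes, used as the idempotency key."""
    return hashlib.sha256(attachment).hexdigest()


def claim_input(conn: sqlite3.Connection, key: str) -> bool:
    """Return True the first time a key is seen, False on any repeat delivery."""
    cur = conn.execute(
        "INSERT INTO processed_inputs (input_hash) VALUES (?) "
        "ON CONFLICT (input_hash) DO NOTHING",
        (key,),
    )
    conn.commit()
    return cur.rowcount == 1


def log_event(conn: sqlite3.Connection, key: str, event: str, detail: dict) -> None:
    """Append one decision or action to the audit log."""
    conn.execute(
        "INSERT INTO audit_log (input_hash, event, detail) VALUES (?, ?, ?)",
        (key, event, json.dumps(detail, sort_keys=True)),
    )
    conn.commit()
```

The Lambda handler would call `claim_input` first and exit early on `False`, so duplicate S3 events are harmless.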

3. Architecture

System topology — what runs where, what talks to what:

    +--------------------+      +--------------------+      +---------------------+
    |  Email inbox       |      |  AWS SES inbound   |      |  S3 attachment      |
    |  (forwarded from   | ---> |  receives, stores  | ---> |  bucket (PDF/PNG)   |
    |   suppliers)       |      |  attachments       |      |                     |
    +--------------------+      +--------------------+      +----------+----------+
                                                                       |
                 +-----------------------------------------------------+
                 |
                 v
    +-------------------------+   +-------------------------+   +-------------------------+
    |  AWS Textract           |   |  GPT-4o structured      |   |  Pydantic schema        |
    |  - Form extraction      |   |  output:                |   |  validation             |
    |  - Table extraction     | ->|  - Supplier             | ->|  Reject malformed,      |
    |  - Confidence scores    |   |  - Invoice #            |   |  retry once,            |
    |                         |   |  - Line items + VAT     |   |  escalate to queue      |
    |                         |   |  - Totals + due date    |   |                         |
    +-------------------------+   +-------------------------+   +------------+------------+
                                                                             |
                                               +-----------------------------+
                                               |
                                               v
                                  +-------------------------+      +-------------------------+
                                  |  Confidence route:      |      |  Audit log              |
                                  |  >=85% + known supplier |----->|  (Postgres,             |
                                  |  -> auto Xero draft     |      |   append-only)          |
                                  |  60-84% -> review queue |      |  Every decision,        |
                                  |  <60% -> flag           |      |  every action           |
                                  +------------+------------+      +-------------------------+
                                               |
                                               v
                                  +-------------------------+      +-------------------------+
                                  |  Xero API               |      |  Slack notify           |
                                  |  - Bill as DRAFT        |----->|  Per-bookkeeper         |
                                  |  - Attach original      |      |  daily summary          |
                                  |  - Tag with confidence  |      |                         |
                                  +-------------------------+      +-------------------------+

4. Automation flow

End-to-end runtime flow — what happens when a real input arrives:

  1. Trigger. A supplier emails an invoice to bookkeeping@.co.uk. AWS SES receives it, deposits the attachment in S3, and fires a Lambda within 5 seconds.
  2. Extraction. Textract extracts raw text and tables. The output is passed to GPT-4o with a strict schema-output prompt, which returns structured JSON for supplier name, invoice number, dates, line items with VAT breakdown, and totals.
  3. Validation. Pydantic validates the JSON against a strict schema. Any failure triggers ONE retry with stricter prompting before escalating to human-review queue.
  4. Confidence scoring. Each field gets a confidence score, and the field scores are combined into an overall confidence. Routing decision: 0.85+ with a known supplier = auto-draft to Xero; 0.60 to below 0.85 = bookkeeper review queue; below 0.60, or a detected duplicate = flag for partner attention.
  5. Action. The Xero bill is created as a DRAFT (never auto-posted), with the original PDF attached and the confidence score and trace ID recorded in internal metadata. A bookkeeper reviews and approves it to post.
  6. Audit + notify. Every decision written to Postgres append-only audit log: input hash, extracted fields, confidence, route taken, human action. Daily Slack summary per bookkeeper.
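The routing rule in step 4 is small enough to show directly. This sketch assumes each field confidence is in [0, 1] and combines them by taking the minimum (one conservative choice; how the scores are actually combined is not specified). It also assumes a high-confidence invoice from an unknown supplier goes to the review queue, which fits that queue being mostly new-supplier cases. All names are hypothetical.

```python
from enum import Enum


class Route(Enum):
    AUTO_DRAFT = "auto_draft"      # Xero bill created as DRAFT
    REVIEW_QUEUE = "review_queue"  # bookkeeper reviews before drafting
    FLAG = "flag"                  # partner attention


AUTO_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.60


def overall_confidence(field_scores: dict[str, float]) -> float:
    """Assumed combination rule: the weakest field sets the overall score."""
    if not field_scores:
        return 0.0
    return min(field_scores.values())


def route_invoice(confidence: float, known_supplier: bool, is_duplicate: bool) -> Route:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    # Duplicates and low-confidence extractions go straight to a partner.
    if is_duplicate or confidence < REVIEW_THRESHOLD:
        return Route.FLAG
    if confidence >= AUTO_THRESHOLD and known_supplier:
        return Route.AUTO_DRAFT
    # 0.60 to below 0.85, or a high score from a supplier not seen before.
    return Route.REVIEW_QUEUE
```

Keeping the thresholds as named constants makes the monthly false-positive re-tuning a one-line change.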

5. What success looked like

6. Outcomes

Manual entry time: 12 hours/week → under 1 hour/week (review only)
End-to-end latency: email arrives → Xero draft created in a median of 90 seconds
Auto-draft rate: ~72% of invoices auto-drafted at 85%+ confidence
Bookkeeper review queue: ~22% (mostly new-supplier cases)
Flagged for attention: ~6% (duplicates, unusual amounts, malformed)
Error rate: comparable to manual entry; no false-positive duplicates after week 3 tuning
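Duplicate detection accounts for part of the flagged share, and the week-3 tuning targeted its false positives. One way to build a duplicate key is to normalise the supplier name, invoice number, and gross total before comparing, so that "Acme Ltd." and "ACME Limited" with "INV-001" vs "INV 001" collide. The normalisation rules and names below are assumptions for illustration, not the tuned rules from this engagement.

```python
import re
from decimal import Decimal

# Hypothetical list of company-form suffixes to ignore when comparing names.
_SUFFIXES = re.compile(r"\b(ltd|limited|plc|llp|inc)\b\.?")


def duplicate_key(supplier: str, invoice_number: str, total_gross: Decimal) -> tuple[str, str, str]:
    """Normalise the fields most likely to identify a repeat invoice."""
    name = _SUFFIXES.sub("", supplier.lower())
    name = re.sub(r"[^a-z0-9]", "", name)                      # drop spaces/punctuation
    number = re.sub(r"[^A-Z0-9]", "", invoice_number.upper())  # "inv-001" -> "INV001"
    amount = str(total_gross.quantize(Decimal("0.01")))        # "120" -> "120.00"
    return (name, number, amount)


def is_duplicate(key: tuple[str, str, str], seen: set[tuple[str, str, str]]) -> bool:
    """Check and record against previously processed keys (in practice, a DB lookup)."""
    if key in seen:
        return True
    seen.add(key)
    return False
```

Including the amount in the key means a genuinely re-issued invoice with a corrected total is not treated as a duplicate, which is one lever for reducing false positives.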

7. Speed improvements

8. ROI math

Payback & ongoing return: At a loaded bookkeeper rate of £35/hr, 11 hours/week of saved time = £20,020/year. Build cost £3,500 + £1,800/year run = payback in under 3 months, with an ongoing net saving of ~£18,200/year. Partner time recovered from "is this a duplicate?" Slack pings is harder to quantify but real.
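The ROI arithmetic can be reproduced in a few lines. The inputs are the figures quoted in this section (11 hours/week saved, £35/hr, £3,500 build, £150/month run) with 52 weeks per year, which reproduces the £20,020 figure; the function name and structure are just an illustration.

```python
def roi(hours_saved_per_week: float, hourly_rate: float, build_cost: float,
        run_cost_per_month: float, weeks_per_year: int = 52) -> dict[str, float]:
    annual_saving = hours_saved_per_week * hourly_rate * weeks_per_year
    annual_run = run_cost_per_month * 12
    net_annual = annual_saving - annual_run
    # Months of net saving needed to recover the one-off build cost.
    payback_months = build_cost * 12 / net_annual
    return {
        "annual_saving": annual_saving,
        "net_annual": net_annual,
        "payback_months": payback_months,
    }


result = roi(hours_saved_per_week=11, hourly_rate=35, build_cost=3_500, run_cost_per_month=150)
```

On these inputs it gives an annual saving of £20,020, a net annual saving of £18,220, and a payback of about 2.3 months.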

9. Maintenance model

Silver care plan (£299/month). Includes: weekly prompt/model performance review (catches drift), monthly false-positive review (re-tunes thresholds if needed), and quarterly cost optimisation (AI API costs scale linearly with volume, so we periodically swap GPT-4o for gpt-4o-mini on simpler cases). Plus 2 hours/month of small dev changes: adding new supplier patterns, tweaking the audit-log query interface, etc. Most clients stay on Silver; complex deployments (multi-entity, multi-currency, custom integrations) usually need Gold (£699/month).

See care plan tiers for full structure.

10. Honest gotchas — what we'd do differently
