Invoice OCR Automation for an Accounting Firm

From 12 hours/week of manual data entry to under 1 hour of review. A representative end-to-end engagement.

Sector: Accounting / Bookkeeping
Duration: 4 weeks build + 2 weeks bedding-in
Budget band: £3,500 fixed-price build + £150/month run

1. The problem

A mid-sized Cambridgeshire accounting firm processes ~800 supplier invoices per month across 60+ clients. Most arrive as emailed PDFs or scanned images. The existing process: a bookkeeper opens each invoice, types the supplier name, invoice number, line items, VAT, and totals into Xero, and files the original. That averaged 12 hours per week of skilled time, spread across two team members, on work no client wanted to be billed for. The firm uses Dext for the easy cases; the awkward 30% of invoices (multi-line, badly photographed, foreign-supplier formats, hand-corrected) still need manual handling.

2. Stack

Language: Python 3.12

LLM: OpenAI GPT-4o (structured output mode)

OCR: AWS Textract (Form / Table extraction)

Schema validation: Pydantic v2

Compute: AWS Lambda + EventBridge (event-driven)

Database: PostgreSQL on Neon (audit log + idempotency)

Integration: Xero API (bills as drafts, not direct post)

Email ingest: AWS SES inbound to S3 → Lambda trigger

Alerting: Slack webhooks for exception queue + daily summary

Observability: CloudWatch + Sentry
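On the email-ingest leg, note that SES's S3 receipt action stores the whole raw MIME message, not just the attachments. The Lambda it triggers therefore has to pull the PDF/PNG parts out itself. The following is a minimal sketch using only Python's standard email package. The function name and the set of accepted MIME types are assumptions for illustration, and fetching the object from S3 is left out.

```python
from email import policy
from email.parser import BytesParser

# MIME types treated as invoice attachments (an assumption for this sketch).
INVOICE_TYPES = {"application/pdf", "image/png", "image/jpeg"}


def extract_invoice_attachments(raw_email: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, bytes) for every PDF/image part of a raw MIME email."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_email)
    found: list[tuple[str, bytes]] = []
    # walk() also visits nested parts (e.g. multipart/related, or a forwarded
    # message/rfc822), so attachments buried a level down are not missed.
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() in INVOICE_TYPES:
            name = part.get_filename() or "unnamed"
            found.append((name, part.get_payload(decode=True)))
    return found
```

In the Lambda, raw_email would be the body of the S3 object named in the triggering event. Each returned file then goes to Textract on its own.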

3. Architecture

System topology — what runs where, what talks to what:

    +--------------------+      +--------------------+      +---------------------+
    |  Email inbox       |      |  AWS SES inbound   |      |  S3 attachment      |
    |  (forwarded from   | ---> |  receives, stores  | ---> |  bucket (PDF/PNG)   |
    |   suppliers)       |      |  attachments       |      |                     |
    +--------------------+      +--------------------+      +----------+----------+
                                                                       |
               +-------------------------------------------------------+
               |
               v
    +-------------------------+   +-------------------------+   +-------------------------+
    |  AWS Textract           |   |  GPT-4o structured      |   |  Pydantic schema        |
    |  - Form extraction      |   |  output:                |   |  validation             |
    |  - Table extraction     |-->|  - Supplier             |-->|  Reject malformed,      |
    |  - Confidence scores    |   |  - Invoice #            |   |  retry once,            |
    |                         |   |  - Line items + VAT     |   |  escalate to queue      |
    +-------------------------+   |  - Totals + due date    |   +------------+------------+
                                  +-------------------------+                |
                                                                             v
                                          +--------------------+   +--------------------+
                                          |  Audit log         |   |  Confidence route: |
                                          |  (Postgres,        |<--|  >=85%, known supp |
                                          |   append-only)     |   |  -> auto Xero draft|
                                          |  Every decision,   |   |  60-84% -> queue   |
                                          |  every action      |   |  <60% -> flag      |
                                          +--------------------+   +---------+----------+
                                                                             |
                                                                             v
                                          +--------------------+   +--------------------+
                                          |  Slack notify      |   |  Xero API          |
                                          |  Per-bookkeeper    |<--|  - Bill as DRAFT   |
                                          |  daily summary     |   |  - Attach original |
                                          |                    |   |  - Tag with conf.  |
                                          +--------------------+   +--------------------+
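The "Confidence route" box can be read as a small pure function. The sketch below is hedged in three ways:

- The write-up does not say how per-field scores are combined, so taking the minimum (an invoice is only as trustworthy as its weakest field) is our stand-in.
- The known-supplier and duplicate checks are assumed to happen upstream.
- All names are hypothetical.

```python
from enum import Enum


class Route(Enum):
    AUTO_DRAFT = "auto_draft"      # create the Xero draft bill directly
    REVIEW_QUEUE = "review_queue"  # bookkeeper checks before drafting
    FLAG = "flag"                  # partner attention


AUTO_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.60


def overall_confidence(field_scores: dict[str, float]) -> float:
    # Hypothetical combination rule: the weakest field sets the overall score.
    return min(field_scores.values()) if field_scores else 0.0


def route(field_scores: dict[str, float], known_supplier: bool, is_duplicate: bool) -> Route:
    conf = overall_confidence(field_scores)
    if is_duplicate or conf < REVIEW_THRESHOLD:
        return Route.FLAG
    if conf >= AUTO_THRESHOLD and known_supplier:
        return Route.AUTO_DRAFT
    # 0.60 to 0.85, or high confidence from a supplier we have not seen before.
    return Route.REVIEW_QUEUE
```

Under this rule, a high-confidence invoice from a new supplier still lands in the review queue. That is consistent with the review queue in the outcomes section being mostly new-supplier cases.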

4. Automation flow

End-to-end runtime flow — what happens when a real input arrives:

Trigger. A supplier emails an invoice to bookkeeping@.co.uk. AWS SES receives the message, writes it to S3, and a Lambda fires within 5 seconds.
Extraction. Textract extracts the raw text and tables. That output goes to GPT-4o with a strict schema-output prompt, which returns structured JSON: supplier name, invoice number, dates, line items with VAT breakdown, and totals.
Validation. Pydantic validates the JSON against a strict schema. Any failure triggers ONE retry with stricter prompting before the invoice is escalated to the human-review queue.
Confidence scoring. Each field gets a confidence score, and the field scores are combined into an overall confidence. Routing: 0.85 or above with a known supplier auto-drafts to Xero; 0.60 to 0.85 (or 0.85+ from an unknown supplier) goes to the bookkeeper review queue; below 0.60, or a detected duplicate, is flagged for partner attention.
Action. The Xero bill is created as a DRAFT (never auto-posted), with the original PDF attached and internal metadata tagging the confidence score and trace ID. A bookkeeper reviews it and approves it to post.
Audit + notify. Every decision is written to an append-only Postgres audit log (input hash, extracted fields, confidence, route taken, human action). Each bookkeeper gets a daily Slack summary.

5. What success looked like

Cut manual invoice data-entry time by 80%+ without compromising accuracy
Catch unusual invoices (new supplier, large amount, possible duplicate) for human review before posting
Keep a full audit trail of every decision the system made and every human review
No disruption to the existing Xero workflow — bookkeepers retain final approve-and-post control
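The audit-trail requirement maps to the append-only log in the Audit + notify step. That log needs two properties: rows keyed by a hash of the input, and rows that can be added but never changed. A minimal sketch of both follows, using the standard library's SQLite as a stand-in for the Neon Postgres database. In Postgres the same guarantee would more typically come from revoking UPDATE and DELETE from the application role, or from an equivalent trigger. Table, column and function names are hypothetical.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    input_hash  TEXT NOT NULL,  -- sha256 of the original attachment bytes
    event       TEXT NOT NULL,  -- e.g. 'extracted', 'routed', 'human_approved'
    detail      TEXT NOT NULL,  -- JSON: fields, confidence, route, human action
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: any UPDATE or DELETE is aborted.
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""


def input_hash(attachment: bytes) -> str:
    return hashlib.sha256(attachment).hexdigest()


def record(conn: sqlite3.Connection, attachment: bytes, event: str, detail: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (input_hash, event, detail) VALUES (?, ?, ?)",
            (input_hash(attachment), event, json.dumps(detail, sort_keys=True)),
        )


def seen_before(conn: sqlite3.Connection, attachment: bytes) -> bool:
    """Idempotency check: has this exact file been processed already?"""
    row = conn.execute(
        "SELECT 1 FROM audit_log WHERE input_hash = ? LIMIT 1", (input_hash(attachment),)
    ).fetchone()
    return row is not None
```

Hashing the exact bytes only catches the same file arriving twice. A re-scanned or re-exported copy of an invoice hashes differently, so duplicate detection would also need a check on fields such as supplier and invoice number.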

6. Outcomes

Manual entry time	12 hours/week → under 1 hour/week (review only)
End-to-end latency	Email arrives → Xero draft posted in median 90 seconds
Auto-draft rate	~72% of invoices auto-drafted at 85%+ confidence
Bookkeeper review queue	~22% (mostly new-supplier cases)
Flagged for attention	~6% (duplicates, unusual amounts, malformed)
Error rate	Comparable to manual entry; no false-positive duplicate flags after week-3 tuning

7. Speed improvements

Email-to-draft. Manual: median 6 hours (next batch processing). Automated: median 90 seconds end-to-end.
Onboarding a new supplier. Manual: 10 min per first invoice. Automated: pattern-learns after 3-5 invoices, then handles them automatically.
Bookkeeper context-switching. Invoices needing hands-on attention drop from 800/month to ~224 (~176 in the review queue plus ~48 flagged). Deep-work time recovered.

8. ROI math

Payback & ongoing return: At a loaded bookkeeper rate of £35/hr, 11 hours/week of saved time = £20,020/year. Against a £3,500 build and £1,800/year run cost, payback comes in under 3 months (about 2.3), with an ongoing net saving of ~£18,200/year. Partner time recovered from "is this a duplicate?" Slack pings is harder to quantify but real.
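The payback figure can be rechecked from the numbers stated above. The check uses only the build cost, run cost and loaded rate; care-plan fees are not included, matching the text. The sketch uses Decimal to avoid float rounding:

```python
from decimal import Decimal, ROUND_HALF_UP

HOURLY_RATE = Decimal("35")          # loaded bookkeeper rate, £/hr
HOURS_SAVED_PER_WEEK = Decimal("11")
WEEKS_PER_YEAR = Decimal("52")
BUILD_COST = Decimal("3500")         # one-off fixed-price build, £
RUN_COST_PER_MONTH = Decimal("150")  # £

annual_saving = HOURLY_RATE * HOURS_SAVED_PER_WEEK * WEEKS_PER_YEAR
annual_run_cost = RUN_COST_PER_MONTH * 12
net_annual_saving = annual_saving - annual_run_cost

# Payback: months of net saving needed to recover the build cost.
payback_months = BUILD_COST * 12 / net_annual_saving
payback_rounded = payback_months.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(f"Gross saving: £{int(annual_saving):,}/year")
print(f"Net saving:   £{int(net_annual_saving):,}/year")
print(f"Payback:      {payback_rounded} months")
```

The straight arithmetic gives roughly 2.3 months of net saving to recover the build cost.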

9. Maintenance model

Silver care plan (£299/month). Includes: weekly prompt/model performance review (catches drift), monthly false-positive review (re-tune thresholds if needed), and quarterly cost optimisation (AI API costs scale linearly with volume, so we periodically swap in gpt-4o-mini for GPT-4o on the simpler cases). Plus 2 hours/month of small dev changes: adding new supplier patterns, tweaking the audit-log query interface, etc. Most clients stay on Silver; complex deployments (multi-entity, multi-currency, custom integrations) usually need Gold (£699/month).

See care plan tiers for full structure.

10. Honest gotchas — what we'd do differently

GPT-4o's VAT arithmetic on multi-line invoices needed a deterministic post-validation step: never trust an LLM with sums. Each line's VAT (net × VAT rate) and the invoice total (sum of line nets + sum of line VAT) are verified in Python before Xero gets the draft.
Some suppliers send invoices as merged PDFs (5 invoices in one file). The pipeline now splits these by page and processes each invoice individually. Merged files are detected with a page-count heuristic plus a first-page text-density signal.
Initial confidence thresholds were too aggressive; tuned over the first two weeks based on bookkeeper feedback ("we'd have caught this anyway" vs "we missed this one ourselves").
Two suppliers sent invoices with VAT split across multiple lines, one per VAT rate; handling them needed a structured-output schema update to support line.vat_breakdown[].
Foreign-currency invoices (EUR, USD) needed exchange-rate snapshot logic, using the cached ECB daily reference rate. Some practices wanted invoice-date FX rather than receipt-date FX, so the basis is configurable.
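The "never trust an LLM with sums" rule and the per-rate vat_breakdown[] shape can be combined into one deterministic check, run before the Xero draft is created. This sketch rests on several assumptions:

- Amounts are Decimal pounds.
- Each breakdown entry carries a rate, a net amount and the VAT as extracted.
- A 1p tolerance absorbs rounding differences, applied per portion and per portion in the VAT sum, since some suppliers round VAT once at invoice level.

The class and function names are ours.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


@dataclass(frozen=True)
class VatPortion:
    rate: Decimal  # e.g. Decimal("0.20") for UK standard rate
    net: Decimal   # net amount charged at this rate
    vat: Decimal   # VAT as extracted from the invoice


@dataclass(frozen=True)
class LineItem:
    description: str
    vat_breakdown: tuple[VatPortion, ...]  # one entry per VAT rate on this line


def check_vat(lines: list[LineItem], stated_net: Decimal,
              stated_vat: Decimal, stated_total: Decimal) -> list[str]:
    """Recompute VAT and totals deterministically; return problems (empty = OK)."""
    problems: list[str] = []
    portions = [p for line in lines for p in line.vat_breakdown]
    for p in portions:
        expected = (p.net * p.rate).quantize(PENNY, rounding=ROUND_HALF_UP)
        if abs(expected - p.vat) > PENNY:  # allow a 1p rounding difference
            problems.append(f"VAT {p.vat} on net {p.net} at {p.rate} (expected {expected})")
    net = sum((p.net for p in portions), Decimal("0"))
    vat = sum((p.vat for p in portions), Decimal("0"))
    tolerance = PENNY * max(len(portions), 1)
    if net != stated_net:
        problems.append(f"line nets sum to {net}, invoice says {stated_net}")
    if abs(vat - stated_vat) > tolerance:
        problems.append(f"line VAT sums to {vat}, invoice says {stated_vat}")
    # Total = sum of nets + sum of VAT (sum(net * rate) alone is only the VAT).
    if stated_net + stated_vat != stated_total:
        problems.append(f"net {stated_net} + VAT {stated_vat} != total {stated_total}")
    return problems
```

An empty list would let the draft proceed. Any problem could, for example, send the invoice to the review queue instead.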

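ECB reference rates are quoted against the euro (1 EUR = x units of each currency). Converting a USD invoice into GBP therefore goes through a cross rate. The following sketch makes these assumptions:

- A cached mapping of fix date to rates has already been loaded from the ECB daily data.
- A per-practice setting picks invoice-date or receipt-date rates, with receipt date as the default.
- Dates with no fix (weekends, holidays) fall back to the most recent earlier fix.

All names are hypothetical.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Literal

# Cached ECB reference rates: fix date -> {currency: units per 1 EUR}.
RateTable = dict[date, dict[str, Decimal]]


def rate_for(rates: RateTable, on: date, currency: str) -> Decimal:
    """Units of currency per 1 EUR, from the latest fix on or before `on`."""
    if currency == "EUR":
        return Decimal("1")
    fixes = [d for d, table in rates.items() if d <= on and currency in table]
    if not fixes:
        raise LookupError(f"no ECB rate for {currency} on or before {on}")
    return rates[max(fixes)][currency]


def to_gbp(amount: Decimal, currency: str, invoice_date: date, received: date,
           rates: RateTable, basis: Literal["invoice", "receipt"] = "receipt") -> Decimal:
    """Convert to GBP via EUR cross rates on the practice's chosen date basis."""
    if currency == "GBP":
        return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    on = invoice_date if basis == "invoice" else received
    # amount / (CCY per EUR) gives EUR; EUR * (GBP per EUR) gives GBP.
    gbp = amount / rate_for(rates, on, currency) * rate_for(rates, on, "GBP")
    return gbp.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Keeping the date basis as a parameter, rather than hard-coding it, is what lets each practice choose invoice-date or receipt-date FX without code changes.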
Have a workflow that looks like this?

30 minutes, no pitch. We'll tell you honestly whether automation will pay back for your specific case.

WhatsApp Sree 07864 880790