Invoice OCR Automation for an Accounting Firm
From 12 hours/week of manual data entry to under 1 hour of review. End-to-end representative engagement.
- Sector: Accounting / Bookkeeping
- Duration: 4 weeks build + 2 weeks bedding-in
- Budget band: £3,500 fixed-price build + £150/month run
1. The problem
A mid-sized Cambridgeshire accounting firm processes ~800 supplier invoices per month across 60+ clients. Most arrive as email PDFs or scanned images. The existing process: a bookkeeper opens each invoice, types supplier name, invoice number, line items, VAT, and totals into Xero, then files the original. That averaged 12 hours per week of skilled time, spread across two team members, on work no client wanted to be billed for. The firm uses Dext for the easy cases; the awkward 30% of invoices (multi-line, badly photographed, foreign-supplier formats, hand-corrected) still need manual handling.
2. Stack
3. Architecture
System topology — what runs where, what talks to what:
+--------------------+      +--------------------+      +---------------------+
| Email inbox        |      | AWS SES inbound    |      | S3 attachment       |
| (forwarded from    | ---> | receives, stores   | ---> | bucket (PDF/PNG)    |
| suppliers)         |      | attachments        |      |                     |
+--------------------+      +--------------------+      +----------+----------+
                                                                   |
             +-----------------------------------------------------+
             |
             v
+-------------------------+      +-------------------------+      +-------------------------+
| AWS Textract            |      | GPT-4o structured       |      | Pydantic schema         |
| - Form extraction       |      | output:                 |      | validation              |
| - Table extraction      | ---> | - Supplier              | ---> | Reject malformed,       |
| - Confidence scores     |      | - Invoice #             |      | retry once,             |
|                         |      | - Line items + VAT      |      | escalate to queue       |
+-------------------------+      | - Totals + due date     |      +------------+------------+
                                 +-------------------------+                   |
         +---------------------------------------------------------------------+
         |
         v
+--------------------+      +--------------------+
| Confidence route:  |      | Audit log          |
| >=85%, known supp  | ---> | (Postgres,         |
| -> auto Xero draft |      | append-only)       |
| 60-84% -> queue    |      | Every decision,    |
| <60% -> flag       |      | every action       |
+--------+-----------+      +--------------------+
         |
         v
+--------------------+      +--------------------+
| Xero API           |      | Slack notify       |
| - Bill as DRAFT    | ---> | Per-bookkeeper     |
| - Attach original  |      | daily summary      |
| - Tag with conf.   |      |                    |
+--------------------+      +--------------------+
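The "reject malformed, retry once, escalate to queue" step in the topology can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the engagement's actual code: `extract` is a hypothetical stand-in for the GPT-4o call, the `strict` flag stands in for "stricter prompting", the field names are illustrative, and `validate` is a plain function playing the role of the Pydantic model so that the example runs on the standard library alone.

```python
import json
from typing import Callable, Optional

# Assumed field names for illustration; the real schema is not shown in the source.
REQUIRED_FIELDS = ("supplier", "invoice_number", "line_items", "total")


def validate(raw_json: str) -> dict:
    """Stand-in for the Pydantic model: parse JSON and check required fields."""
    data = json.loads(raw_json)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def extract_validated(
    ocr_text: str, extract: Callable[[str, bool], str]
) -> tuple[str, Optional[dict]]:
    """Try a normal prompt, then exactly one stricter retry, then escalate."""
    for strict in (False, True):
        try:
            return "ok", validate(extract(ocr_text, strict))
        except ValueError:
            continue  # malformed output: move on to the retry, or leave the loop
    return "escalate_to_review_queue", None
```

Capping the loop at two attempts keeps cost and latency bounded: a document that fails twice is more likely an awkward invoice than a prompting problem, so a person looks at it.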
4. Automation flow
End-to-end runtime flow — what happens when a real input arrives:
- Trigger. Supplier emails an invoice to the firm's bookkeeping@ address (.co.uk domain). AWS SES receives, deposits attachment to S3, fires a Lambda within 5 seconds.
- Extraction. Textract extracts raw text + tables. Output passed to GPT-4o with a strict schema-output prompt. Returns structured JSON for supplier name, invoice number, dates, line items with VAT breakdown, totals.
- Validation. Pydantic validates the JSON against a strict schema. Any failure triggers ONE retry with stricter prompting before escalating to human-review queue.
- Confidence scoring. Each field gets a confidence score, and the field scores are combined into an overall confidence. Routing decision: 0.85 or above with a known supplier = auto-draft to Xero. 0.60 up to (but not including) 0.85 = bookkeeper review queue. Below 0.60, or duplicate detected = flag for partner attention.
- Action. Xero bill created as DRAFT (never auto-posted). Original PDF attached. Internal metadata tags confidence score and trace ID. Bookkeeper reviews + approves to post.
- Audit + notify. Every decision written to Postgres append-only audit log: input hash, extracted fields, confidence, route taken, human action. Daily Slack summary per bookkeeper.
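The routing rule in the confidence-scoring step can be written as a pure function. The thresholds come from the text; everything else here is an assumption made for illustration: the `Extraction` type and route names are hypothetical, the overall score is taken as the weakest field score (the source does not say how field scores are combined), and a high-confidence invoice from an unknown supplier is sent to the review queue.

```python
from dataclasses import dataclass

AUTO_DRAFT_MIN = 0.85  # thresholds quoted in the flow description
REVIEW_MIN = 0.60


@dataclass(frozen=True)
class Extraction:
    field_confidence: dict[str, float]  # per-field scores in [0, 1]
    supplier_known: bool
    is_duplicate: bool


def overall_confidence(field_confidence: dict[str, float]) -> float:
    """Assumed combination rule: the weakest field decides the overall score."""
    return min(field_confidence.values(), default=0.0)


def route(x: Extraction) -> str:
    """Map one extraction to a route name."""
    score = overall_confidence(x.field_confidence)
    if x.is_duplicate or score < REVIEW_MIN:
        return "flag_for_partner"
    if score >= AUTO_DRAFT_MIN and x.supplier_known:
        return "auto_draft"
    # Mid-band scores, and (by assumption) confident reads of unknown suppliers.
    return "review_queue"
```

Keeping the rule free of I/O makes threshold tuning cheap: historical extractions can be replayed through `route` with different constants before anything changes in production.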
5. What success looked like
- Cut manual invoice data-entry time by 80%+ without compromising accuracy
- Catch unusual invoices (new supplier, large amount, possible duplicate) for human review before posting
- Keep a full audit trail of every decision the system made and every human review
- No disruption to the existing Xero workflow — bookkeepers retain final approve-and-post control
6. Outcomes
| Manual entry time | 12 hours/week → under 1 hour/week (review only) |
| End-to-end latency | Email arrives → Xero draft created in median 90 seconds |
| Auto-draft rate | ~72% of invoices auto-drafted at >=85% confidence |
| Bookkeeper review queue | ~22% (mostly new-supplier cases) |
| Flagged for attention | ~6% (duplicates, unusual amounts, malformed) |
| Error rate | Comparable to manual entry — no false-positive duplicates after week 3 tuning |
7. Speed improvements
- Email-to-draft. Manual: median 6 hours (next batch processing). Automated: median 90 seconds end-to-end.
- Onboarding new supplier. Manual: 10 min per first invoice. Automated: pattern-learns after 3-5 invoices, then auto-handles.
- Bookkeeper context-switch. Down from 800 invoices/month touched to ~224 (~176 in the review queue + ~48 flagged). Deep work time recovered.
8. ROI math
9. Maintenance model
See care plan tiers for full structure.
10. Honest gotchas — what we'd do differently
- GPT-4o's VAT arithmetic on multi-line invoices needed a deterministic post-validation step: never trust an LLM with sums. Total = sum over line items of net × (1 + VAT rate), verified in Python before Xero gets the draft.
- Some suppliers send invoices as merged PDFs (5 invoices in one file). Pipeline now splits by page and processes individually. Detection done via page-count heuristic + first-page text-density signal.
- Initial confidence thresholds were too aggressive; tuned over the first two weeks based on bookkeeper feedback ("we'd have caught this anyway" vs "we missed this one ourselves").
- Two suppliers sent invoices with VAT split into multiple lines per VAT rate — needed a structured-output schema update to handle line.vat_breakdown[].
- Foreign currency invoices (EUR, USD) needed exchange-rate snapshot logic — uses ECB daily fix, cached. Some practices wanted invoice-date FX rather than receipt-date FX; configurable.
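The deterministic VAT check from the first gotcha can be sketched with the standard library's `decimal` module. This is an illustrative sketch, not the production validator: the `net` and `vat_rate` keys, per-line rounding to the penny, and the two-pence tolerance are all assumptions (suppliers differ on whether they round VAT per line or per invoice, which is why a small tolerance is used).

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


def line_gross(net: str, vat_rate: str) -> Decimal:
    """Gross for one line: net plus VAT, with VAT rounded to the penny."""
    net_d = Decimal(net)
    vat = (net_d * Decimal(vat_rate)).quantize(PENNY, rounding=ROUND_HALF_UP)
    return net_d + vat


def totals_agree(
    line_items: list[dict],
    stated_total: str,
    tolerance: Decimal = Decimal("0.02"),  # assumed allowance for rounding differences
) -> bool:
    """True when the recomputed gross total matches the total read off the invoice."""
    computed = sum(
        (line_gross(item["net"], item["vat_rate"]) for item in line_items),
        Decimal("0"),
    )
    return abs(computed - Decimal(stated_total)) <= tolerance


items = [
    {"net": "100.00", "vat_rate": "0.20"},  # standard rate
    {"net": "50.00", "vat_rate": "0.05"},   # reduced rate
    {"net": "19.99", "vat_rate": "0"},      # zero-rated
]
print(totals_agree(items, "192.49"))  # stated total matches the recomputed one
```

Amounts are passed as strings and parsed with `Decimal` so that values such as 19.99 are never routed through binary floating point. An invoice that fails the check would go to the review queue rather than to Xero.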