How We Build
The engineering methodology behind Jagatab.UK AI automation projects. Documented because procurement teams ask for it and because clients deserve to know what they're buying.
1. Project lifecycle
Five phases. The timing is illustrative — a smaller workflow ships in 3 weeks end-to-end; a larger build with multiple integrations takes 8–10 weeks.
1. Discovery (week 1)
30-minute initial call, then a 60–90 minute structured discovery session: current workflow walkthrough, system inventory, data flow mapping, success criteria, error tolerance, deployment constraints. Output is a written scope document with a fixed-price quote.
Typical duration: 3–5 working days. No cost.
2. Design (weeks 1–2)
Architecture document: system components, data flow diagram, API choices, model selection (LLM, embedding, OCR), hosting topology, observability plan, rollback plan, security/compliance considerations. Shared with you for sign-off before any code is written.
Typical duration: 3–5 working days, parallel with the next phase.
3. Build (weeks 2–6)
Implementation in Git from day one. Weekly demo calls. CI/CD pipeline live by the end of the first week of this phase. Anything that goes live does so behind a feature flag we can toggle. You see incremental progress, not a black box.
Typical duration: 2–5 weeks depending on scope.
4. Deploy & bed in (weeks 6–8)
Staged rollout: shadow mode (system runs but takes no action) → restricted live (one team / one workflow segment) → full live. Observability dashboards are live throughout. Daily tuning loop with your team for the first week.
Typical duration: 1–2 weeks.
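The staged rollout works as a gate in front of every side effect. A minimal sketch, assuming the current stage is read from a feature flag or config value; `RolloutStage`, `should_act`, `handle` and the pilot-team allow-list are hypothetical names for illustration, not an existing library:

```python
from enum import Enum


class RolloutStage(Enum):
    """Hypothetical stages mirroring shadow -> restricted live -> full live."""

    SHADOW = "shadow"          # runs and logs, but takes no action
    RESTRICTED = "restricted"  # acts only for allow-listed pilot teams
    FULL = "full"              # acts for everyone


def should_act(stage: RolloutStage, team: str, pilot_teams: frozenset[str]) -> bool:
    """Return True if a proposed action may actually be executed."""
    if stage is RolloutStage.SHADOW:
        return False
    if stage is RolloutStage.RESTRICTED:
        return team in pilot_teams
    return True


def handle(action: str, team: str, stage: RolloutStage,
           pilot_teams: frozenset[str], log: list[str]) -> bool:
    """Log every decision; execute it only if the current stage allows."""
    acted = should_act(stage, team, pilot_teams)
    log.append(f"{stage.value}:{team}:{action}:{'executed' if acted else 'shadowed'}")
    # The real side effect (post, send, update) would be called here when acted is True.
    return acted


# Example: restricted rollout to a single pilot team (hypothetical team names).
pilot = frozenset({"accounts-payable"})
decisions: list[str] = []
handle("post_invoice", "accounts-payable", RolloutStage.RESTRICTED, pilot, decisions)
handle("post_invoice", "sales", RolloutStage.RESTRICTED, pilot, decisions)
```

Because shadow mode still logs every decision, the system's proposed actions can be compared with what your team actually did before anything is switched on.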
5. Handover & support (week 8+)
Full documentation: architecture, runbooks, prompts, model choices, integration credentials (rotated to yours), monitoring access, incident response plan. 30 days of bug-fix support included. Optional ongoing maintenance retainer.
Typical duration: 3–5 working days for handover, then ongoing.
2. Reference architecture
The shape of a typical AI automation project. Specific components vary, but the pattern below is the load-bearing skeleton in most of what we build.
Key engineering choices baked into this pattern:
- Idempotent workers. Every job can be retried safely — no double-posting, no duplicate emails.
- Audit log first. Every decision the system makes is written to an append-only log before any side effect. Full traceability.
- Human-in-the-loop where needed. Low-confidence outputs queue for review rather than auto-acting. The threshold is tunable.
- Schema-validated outputs. LLM responses are parsed against a Pydantic / Zod schema before use. Malformed output triggers a retry with stricter prompting, then escalates.
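The first three choices fit together in a single worker. This is a minimal in-memory sketch, assuming the job ID doubles as the idempotency key and a review threshold of 0.85; the class, its fields and the threshold value are hypothetical. It only shows the ordering: in production the processed-key check and the audit row would live in Postgres (for example behind a unique constraint) so they survive restarts and concurrent workers.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical in-memory worker; a real one would keep this state in Postgres."""

    review_threshold: float = 0.85  # hypothetical, tunable human-in-the-loop cut-off
    audit_log: list[str] = field(default_factory=list)  # append-only: never edited
    processed: set[str] = field(default_factory=set)    # idempotency keys already handled
    sent: list[str] = field(default_factory=list)       # stands in for the real side effect
    review_queue: list[str] = field(default_factory=list)

    def run(self, job_id: str, action: str, confidence: float) -> str:
        # Idempotency: a retry of an already-handled job does nothing.
        if job_id in self.processed:
            return "duplicate"
        outcome = "act" if confidence >= self.review_threshold else "review"
        # Audit log first: the decision is recorded before any side effect runs.
        self.audit_log.append(json.dumps({
            "job_id": job_id,
            "action": action,
            "confidence": confidence,
            "outcome": outcome,
        }))
        if outcome == "act":
            self.sent.append(action)
        else:
            # Low confidence: queue for a person instead of acting automatically.
            self.review_queue.append(job_id)
        self.processed.add(job_id)
        return outcome


worker = Worker()
worker.run("job-1", "send_email", 0.95)    # "act"
worker.run("job-1", "send_email", 0.95)    # "duplicate": the email is not sent twice
worker.run("job-2", "post_invoice", 0.40)  # "review"
```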
3. Stack choices
The defaults. We deviate when there's a specific reason; we don't deviate for novelty.
- Language: Python 3.12 (backend / workers). TypeScript (frontend / serverless).
- LLM API: OpenAI GPT-4o, Anthropic Claude. Default mix depends on the task; we route per job.
- Embeddings & vector search: OpenAI text-embedding-3-small + pgvector on Postgres. Cheap, fast, simple.
- OCR / document AI: AWS Textract for structured docs. Mistral OCR / vision LLM for messy inputs.
- Database: Postgres (Neon, RDS, or Supabase). pgvector for embedding storage. SQLite only for tiny tools.
- Hosting / compute: Vercel for web. AWS Lambda + EventBridge for jobs. Cloud Run when Lambda doesn't fit.
- Frontend: Next.js 15 (App Router) + Tailwind. Static HTML where Next.js would be overkill.
- Auth: Clerk for new apps. For existing systems, we integrate with the auth already in place.
- Observability: Sentry for errors. Axiom for structured logs. CloudWatch for AWS-native telemetry.
- Payments: Stripe Checkout + webhooks. Customer Portal for self-serve.
- CI/CD: GitHub Actions. Preview deploys on Vercel. Lambda deploys via SAM or Serverless Framework.
- Region: UK / EU by default. AWS eu-west-2 (London), Vercel London edge, Neon eu-west-2.
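Per-job routing can be as simple as a lookup table keyed by job type. A sketch, assuming hypothetical job-type names; `gpt-4o` is an OpenAI model name, while the Anthropic entry is a placeholder string rather than a real model ID:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    provider: str  # e.g. "openai" or "anthropic"
    model: str     # model identifier passed to that provider's API


# Hypothetical job types; the Anthropic model string is a placeholder, not a real ID.
ROUTES: dict[str, ModelRoute] = {
    "invoice_extraction": ModelRoute("openai", "gpt-4o"),
    "contract_summary": ModelRoute("anthropic", "claude-model-placeholder"),
}
DEFAULT_ROUTE = ModelRoute("openai", "gpt-4o")


def route_for(job_type: str) -> ModelRoute:
    """Return the model configured for this job type, or the default."""
    return ROUTES.get(job_type, DEFAULT_ROUTE)
```

One benefit of keeping the table in code or config is that a routing change becomes a reviewed diff that can run through the workflow's eval suite, rather than a silent switch.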
4. Quality, testing & safety
Five practices we apply to every project, irrespective of size.
- Schema-validated LLM I/O. Every LLM output parses to a Pydantic / Zod schema. Failure → retry → escalate. Malformed or out-of-schema output never reaches a side effect.
- Eval suite per workflow. A small test set (20–100 examples) of real inputs with known-good outputs. Runs in CI and flags regressions when prompts or models change.
- Confidence routing. Every AI decision tagged with a confidence score. Below threshold → human review queue, not silent failure.
- Audit log for every action. What input came in, what the system extracted, what action was taken, when, with what confidence. Searchable, exportable, retained for at least the regulatory minimum.
- Reversible side effects where possible. Drafts are posted to Xero and pending approvals to Slack rather than sent directly, so a real-world “undo” is available.
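Schema validation with retry-then-escalate, plus a tiny eval harness, sketched in Python. To keep it dependency-free, a frozen dataclass with hand-written checks stands in for a Pydantic model; `InvoiceFields`, `parse`, `extract`, `STRICT_SUFFIX` and `eval_pass_rate` are hypothetical, and `llm` is any function that takes a prompt and returns raw text (a real provider client would sit behind it):

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class InvoiceFields:
    """Hypothetical output schema; a stdlib stand-in for a Pydantic model."""

    vendor: str
    total_pence: int
    confidence: float


def parse(raw: str) -> InvoiceFields:
    """Validate raw LLM text against the schema, raising ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    vendor = data.get("vendor")
    total = data.get("total_pence")
    conf = data.get("confidence")
    if not isinstance(vendor, str) or not vendor:
        raise ValueError("vendor must be a non-empty string")
    if type(total) is not int or total < 0:  # type() check also rejects bools
        raise ValueError("total_pence must be a non-negative integer")
    if type(conf) not in (int, float) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return InvoiceFields(vendor, total, float(conf))


STRICT_SUFFIX = (
    "\nReturn only a JSON object with keys vendor (string), "
    "total_pence (integer) and confidence (0 to 1)."
)


def extract(llm: Callable[[str], str], prompt: str,
            max_attempts: int = 2) -> Optional[InvoiceFields]:
    """Validate output; retry with stricter prompting; return None to escalate."""
    for attempt in range(max_attempts):
        raw = llm(prompt if attempt == 0 else prompt + STRICT_SUFFIX)
        try:
            return parse(raw)
        except ValueError:
            continue  # malformed output is dropped, never passed to a side effect
    return None  # caller sends the job to the human review queue


def eval_pass_rate(llm: Callable[[str], str],
                   cases: list[tuple[str, tuple[str, int]]]) -> float:
    """Fraction of eval cases whose (vendor, total_pence) match the known-good output."""
    hits = 0
    for prompt, (vendor, total) in cases:
        got = extract(llm, prompt)
        if got is not None and (got.vendor, got.total_pence) == (vendor, total):
            hits += 1
    return hits / len(cases)
```

One way to use the harness in CI is to compare the pass rate against the last accepted baseline, so a prompt or model change that lowers it fails the build.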
5. Security & compliance
Detailed on the security page. Summary: UK GDPR + Data Protection Act 2018 alignment, UK/EU-region hosting by default, DPA available, no PII in training data, no third-party model training on your inputs (we use APIs configured for zero-retention where the provider supports it).
Want to discuss your specific architecture?
30 minutes, no pitch. Bring an architecture question or a workflow you'd like a second opinion on, and we'll talk it through honestly.