How We Build

The engineering methodology behind Jagatab.UK AI automation projects. Documented because procurement teams ask for it and because clients deserve to know what they're buying.

1 Project lifecycle

Five phases. The timing is illustrative — a smaller workflow ships in 3 weeks end-to-end; a larger build with multiple integrations takes 8–10 weeks.

1. Discovery (week 1)

30-minute initial call, then a 60–90 minute structured discovery: current workflow walkthrough, system inventory, data flow mapping, success criteria, error tolerance, deployment constraints. Output is a written scope document with a fixed-price quote.

Typical duration: 3–5 working days. No cost.

2. Design (week 1–2)

Architecture document: system components, data flow diagram, API choices, model selection (LLM, embedding, OCR), hosting topology, observability plan, rollback plan, security/compliance considerations. Shared with you for sign-off before any code is written.

Typical duration: 3–5 working days, parallel with the next phase.

3. Build (week 2–6)

Implementation in Git from day one. Weekly demo calls. CI/CD pipeline live by end of week 1 of this phase. Anything that goes live does so behind a feature flag we can toggle. You see incremental progress, not a black box.

Typical duration: 2–5 weeks depending on scope.

4. Deploy & bed in (week 6–8)

Staged rollout: shadow mode (system runs but doesn't take action) → restricted live (one team / one workflow segment) → full live. Observability dashboards live throughout. Tuning loop with your team daily for the first week.

Typical duration: 1–2 weeks.
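
A minimal sketch of how the three rollout stages above could gate side effects. The names here (RolloutStage, RolloutConfig, should_act, handle, and the segment labels) are illustrative assumptions, not the production code; the point is that the proposed action is always computed and only its execution is gated.

```python
from dataclasses import dataclass
from enum import Enum


class RolloutStage(Enum):
    SHADOW = "shadow"          # compute and log, never act
    RESTRICTED = "restricted"  # act only for an allow-listed segment
    FULL = "full"              # act for every segment


@dataclass(frozen=True)
class RolloutConfig:
    stage: RolloutStage
    # Hypothetical segment names, e.g. one team or one workflow slice.
    allowed_segments: frozenset = frozenset()


def should_act(config: RolloutConfig, segment: str) -> bool:
    """Return True if side effects are allowed for this segment."""
    if config.stage is RolloutStage.SHADOW:
        return False
    if config.stage is RolloutStage.RESTRICTED:
        return segment in config.allowed_segments
    return True


def handle(config: RolloutConfig, segment: str, proposed_action: str) -> str:
    # The proposed action is produced in every stage; only execution is gated,
    # so shadow-mode output can be compared against what humans actually did.
    if should_act(config, segment):
        return f"EXECUTED {proposed_action}"
    return f"LOGGED_ONLY {proposed_action}"
```

Moving from shadow to restricted to full is then a configuration change rather than a code change, which keeps rollback cheap.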

5. Handover & support (week 8+)

Full documentation: architecture, runbooks, prompts, model choices, integration credentials (rotated to yours), monitoring access, incident response plan. 30 days of bug-fix support included. Optional ongoing maintenance retainer.

Typical duration: 3–5 working days for handover, then ongoing.

2 Reference architecture

The shape of a typical AI automation project. Specific components vary, but the pattern below is the load-bearing skeleton in most of what we build.

+-------------------+        +-------------------+        +-------------------+
| Source systems    |        | Trigger / queue   |        | Worker (Python)   |
| (Email, Xero,     |  --->  | (Webhook, SQS,    |  --->  | Lambda / Cloud    |
| CRM, S3, etc.)    |        | EventBridge)      |        | Run / Fargate     |
+-------------------+        +-------------------+        +---------+---------+
                                                                    |
                                       +----------------------------+
                                       |
                                       v
+-------------------+        +-------------------+        +-------------------+
| AI services       |        | Database          |        | Audit log /       |
| (GPT-4o, Claude,  |        | (Postgres /       |        | observability     |
| Textract,         |        | Neon /            |        | (Sentry, Axiom,   |
| embeddings)       |        | pgvector)         |        | CloudWatch)       |
+-------------------+        +-------------------+        +-------------------+
                                       |
                                       v
                             +-------------------+
                             | Destination       |
                             | (Xero post,       |
                             | Slack alert,      |
                             | CRM update,       |
                             | customer email)   |
                             +-------------------+

Key engineering choices baked into this pattern:

Idempotent workers. Every job can be retried safely — no double-posting, no duplicate emails.
Audit log first. Every decision the system makes is written to an append-only log before any side effect. Full traceability.
Human-in-the-loop where needed. Low-confidence outputs queue for review rather than auto-acting. The threshold is tunable.
Schema-validated outputs. LLM responses are parsed against a Pydantic / Zod schema before use. Malformed output triggers a retry with stricter prompting, then escalates.
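
To make the first two choices concrete, here is a minimal sketch of an idempotent worker that writes to its audit log before acting. It uses only the standard library; the Worker class, the idempotency_key helper, and the in-memory log and key set are assumptions for illustration, standing in for a durable append-only table and a unique-key constraint in Postgres.

```python
import hashlib
import json
from typing import Callable


def idempotency_key(job: dict) -> str:
    """Stable hash of the job payload; identical payloads share a key."""
    canonical = json.dumps(job, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Worker:
    def __init__(self, side_effect: Callable[[dict], None]) -> None:
        self.side_effect = side_effect
        self.audit_log: list = []  # append-only; stand-in for a durable log table
        self.completed: set = set()  # stand-in for a unique-key table

    def process(self, job: dict) -> str:
        key = idempotency_key(job)
        if key in self.completed:
            # A retry of a finished job is recorded but causes no side effect.
            self.audit_log.append({"key": key, "event": "skipped_duplicate"})
            return "skipped"
        # Audit log first: record the decision before touching the outside world.
        self.audit_log.append({"key": key, "event": "decided", "job": job})
        # If this call raises, the key is not marked complete, so a later retry
        # runs it again; the destination should also accept the key for dedup.
        self.side_effect(job)
        self.completed.add(key)
        self.audit_log.append({"key": key, "event": "completed"})
        return "done"
```

Processing the same payload twice, even with its keys in a different order, produces one side effect and three audit entries: decided, completed, skipped_duplicate.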

3 Stack choices

The defaults. We deviate when there's a specific reason; we don't deviate for novelty.

Language

Python 3.12 (backend / workers). TypeScript (frontend / serverless).

LLM API

OpenAI GPT-4o, Anthropic Claude. Default mix depends on the task; we route per-job.

Embeddings & vector search

OpenAI text-embedding-3-small + pgvector on Postgres. Cheap, fast, simple.

OCR / document AI

AWS Textract for structured docs. Mistral OCR / vision LLM for messy inputs.

Database

Postgres (Neon, RDS, or Supabase). pgvector for embedding storage. SQLite only for tiny tools.

Hosting / compute

Vercel for web. AWS Lambda + EventBridge for jobs. Cloud Run when Lambda doesn't fit.

Frontend

Next.js 15 (App Router) + Tailwind. Static HTML where Next.js would be overkill.

Auth

Clerk for new apps. Existing systems integrate where they already live.

Observability

Sentry for errors. Axiom for structured logs. CloudWatch for AWS-native telemetry.

Payments

Stripe Checkout + webhooks. Customer Portal for self-serve.

CI/CD

GitHub Actions. Preview deploys on Vercel. Lambda deploys via SAM or Serverless Framework.

Region

UK / EU by default: AWS eu-west-2 (London), Vercel London edge, Neon eu-west-2.

4 Quality, testing & safety

Five practices we apply to every project, irrespective of size.

Schema-validated LLM I/O. Every LLM output parses to a Pydantic / Zod schema. Failure → retry → escalate. Output that fails validation never reaches side effects.
Eval suite per workflow. A small (20–100 example) test set of real inputs with known-good outputs. Run in CI. Catches regressions when prompts or models change.
Confidence routing. Every AI decision tagged with a confidence score. Below threshold → human review queue, not silent failure.
Audit log for every action. What input came in, what the system extracted, what action was taken, when, with what confidence. Searchable, exportable, retained for at least the regulatory minimum.
Reversible side effects where possible. Posting drafts to Xero / pending approvals in Slack rather than direct send. Real-world “undo” available.
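
The validate, retry, escalate loop and the confidence routing described above can be sketched as follows. This is an illustration under stated assumptions, not the production code: it uses hand-written standard-library checks where a Pydantic model would normally sit, and the Extraction fields, the call_model(strict) wrapper, and the 0.8 threshold are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

REVIEW_THRESHOLD = 0.8  # hypothetical default; tuned per workflow


@dataclass(frozen=True)
class Extraction:
    supplier: str
    total: float
    confidence: float


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_extraction(raw: str) -> Extraction:
    """Parse and validate raw model output; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    supplier = data.get("supplier")
    total = data.get("total")
    confidence = data.get("confidence")
    if not isinstance(supplier, str) or not supplier:
        raise ValueError("supplier must be a non-empty string")
    if not _is_number(total) or total < 0:
        raise ValueError("total must be a non-negative number")
    if not _is_number(confidence) or not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return Extraction(supplier, float(total), float(confidence))


def extract_and_route(
    call_model: Callable[[bool], str], max_attempts: int = 2
) -> Tuple[str, Optional[Extraction]]:
    """Return (route, extraction); route is 'auto', 'human_review' or 'escalate'."""
    for attempt in range(max_attempts):
        try:
            # Retries ask the (hypothetical) wrapper for the stricter prompt.
            result = parse_extraction(call_model(attempt > 0))
        except ValueError:
            continue
        if result.confidence < REVIEW_THRESHOLD:
            return "human_review", result
        return "auto", result
    return "escalate", None
```

Only the "auto" route should go on to trigger a side effect; "human_review" and "escalate" both end in a queue a person looks at.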

5 Security & compliance

Detailed on the security page. Summary: UK GDPR + Data Protection Act 2018 alignment, UK/EU-region hosting by default, DPA available, no PII in training data, no third-party model training on your inputs (we use APIs configured for zero-retention where the provider supports it).

Want to discuss your specific architecture?

30 minutes, no pitch. Bring an architecture question or a workflow you want a second opinion on, and we'll talk through it honestly.

WhatsApp Sree 07864 880790