GPT-4 Customer Support Chatbot for a Cambridge Service Business

24/7 tier-1 support agent grounded in the company's own docs. A representative end-to-end engagement.

Sector: Cambridge service business (~30 employees)
Duration: 3 weeks build + 1 week tuning
Budget band: £4,800 fixed-price build + £80/month run

1. The problem

A Cambridge service business with ~30 employees was fielding ~150 customer support emails per week. Around 70% were repeat questions answerable from the existing FAQ, knowledge base, and Zendesk reply history. The support team was small (3 people), and out-of-hours emails piled up; customer feedback consistently mentioned slow response times. The business had tried generic "ChatGPT for support" widgets, which either hallucinated incorrect answers (and customers noticed) or refused to engage with anything specific to the business.

2. Stack

Language: Python 3.12 (backend) + TypeScript (widget)
LLM: OpenAI GPT-4o (chat completion with structured citations)
Embeddings: OpenAI text-embedding-3-small
Vector store: pgvector on Neon Postgres
Compute: Vercel Edge Functions (low-latency widget API)
Frontend: Lightweight React widget (~12KB gzipped) embedded on site
Handoff: Zendesk API — ticket creation with full conversation context
Analytics: Per-conversation logging + weekly quality review dashboard
Observability: Sentry + Axiom structured logs

3. Architecture

System topology — what runs where, what talks to what:

        +-----------------+
        |  Customer       |
        |  on website     |
        +--------+--------+
                 |
                 v
        +--------------------+      +---------------------+
        |  React widget      |      |  Vercel Edge        |
        |  (12KB)            | ---> |  Function (API)     |
        |  Conversational UI |      |  Sub-100ms p95      |
        +--------------------+      +----------+----------+
                                                |
                                  +-------------+------------+
                                  |                          |
                                  v                          v
                  +---------------------------+   +---------------------+
                  |  Embed query              |   |  Conversation       |
                  |  text-embedding-3-small   |   |  history (Postgres) |
                  |  (1536 dims)              |   |  Last N turns       |
                  +-------------+-------------+   +---------------------+
                                |
                                v
                  +---------------------------+
                  |  pgvector similarity      |
                  |  search over ~2,000 chunks|
                  |  (docs, FAQ, past replies)|
                  |  Top 5 by cosine distance |
                  +-------------+-------------+
                                |
                                v
                  +---------------------------+
                  |  GPT-4o with strict prompt|
                  |  + retrieved context      |
                  |  + conversation history   |
                  |  Output:                  |
                  |    - answer text          |
                  |    - confidence score     |
                  |    - cited chunk IDs      |
                  +-------------+-------------+
                                |
              +-----------------+-----------------+
              |                                   |
              v                                   v
      +---------------+               +----------------------+
      |  High conf:   |               |  Low conf / explicit |
      |  Show answer  |               |  handoff requested:  |
      |  + citations  |               |  Create Zendesk      |
      |  to customer  |               |  ticket w/ context   |
      +-------+-------+               +-----------+----------+
              |                                   |
              v                                   v
      +-------------------------------------------------+
      |  Audit log: every Q+A, confidence, citations,   |
      |  customer rating (thumbs up/down)               |
      +-------------------------------------------------+
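The "Top 5 by cosine distance" step in the diagram can be illustrated in plain Python. This is a minimal sketch, not the production query: in the deployed system the ranking is done inside Postgres by pgvector (whose `<=>` operator computes cosine distance), and the `Chunk` type, the `top_k` helper, and the toy 3-dimensional vectors below are assumptions made for illustration.

```python
import math
from typing import NamedTuple


class Chunk(NamedTuple):
    """Hypothetical record: one indexed chunk with its embedding."""
    chunk_id: str
    text: str
    embedding: list[float]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[Chunk], k: int = 5) -> list[tuple[Chunk, float]]:
    """Rank chunks by ascending cosine distance and keep the closest k."""
    scored = [(chunk, cosine_distance(query, chunk.embedding)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings.
chunks = [
    Chunk("faq-1", "Opening hours", [1.0, 0.0, 0.0]),
    Chunk("faq-2", "Refund policy", [0.0, 1.0, 0.0]),
    Chunk("doc-7", "Holiday opening hours", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
for chunk, distance in results:
    print(chunk.chunk_id, round(distance, 3))
```

With ~2,000 chunks a brute-force scan like this would also be workable, but keeping the search in pgvector means the chunk text and source metadata come back in the same query.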

4. Automation flow

End-to-end runtime flow — what happens when a real input arrives:

  1. Indexing (one-off + weekly). Crawled the company website, FAQ, product docs, and 18 months of anonymised Zendesk reply history. Chunked into ~2,000 semantic chunks (~400 tokens each). Embedded with text-embedding-3-small. Stored in pgvector alongside chunk text and source metadata. Weekly re-index for any updated docs.
  2. Query. Customer types message. Widget streams to Vercel Edge Function. Edge function embeds the query, performs pgvector similarity search, returns top 5 chunks.
  3. Generation. GPT-4o is called with a strict system prompt: "Answer only from the provided context. Cite which chunks support each claim. If context insufficient, say so and offer human handoff." The retrieved chunks and the conversation history are passed alongside it.
  4. Confidence + routing. Confidence derived from: chunk relevance scores + LLM self-assessment of completeness. High → show answer with citation links. Low or "I don't know" → escalate to Zendesk with structured summary.
  5. Handoff. On escalation, Zendesk ticket auto-created with: customer details, issue summary, attempted answers, retrieved-but-rejected chunks (useful for support team training), conversation transcript.
  6. Feedback loop. Customer rates each answer thumbs up/down. Down-rated conversations flow to weekly review queue → support team updates knowledge base or tweaks prompts.
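Steps 4 and 5 above can be sketched as a small routing function. This is an illustration under stated assumptions, not the deployed logic: the write-up does not give the weights, the threshold, or the exact data shapes, so `RELEVANCE_WEIGHT`, `SELF_ASSESSMENT_WEIGHT`, `CONFIDENCE_THRESHOLD`, `ModelAnswer`, and `build_handoff_context` are all hypothetical. The handoff dictionary mirrors the fields listed in step 5 and is not a Zendesk API payload.

```python
from dataclasses import dataclass, field

# Hypothetical tuning values: the real weights and threshold are not stated.
RELEVANCE_WEIGHT = 0.5
SELF_ASSESSMENT_WEIGHT = 0.5
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class ModelAnswer:
    text: str
    self_assessment: float  # model's own completeness rating, 0.0 to 1.0
    cited_chunk_ids: list[str] = field(default_factory=list)
    insufficient_context: bool = False  # True when the model says it cannot answer


def confidence(similarities: list[float], answer: ModelAnswer) -> float:
    """Blend mean chunk similarity (0.0 to 1.0) with the model's self-assessment."""
    if answer.insufficient_context or not similarities:
        return 0.0
    mean_similarity = sum(similarities) / len(similarities)
    return RELEVANCE_WEIGHT * mean_similarity + SELF_ASSESSMENT_WEIGHT * answer.self_assessment


def route(similarities: list[float], answer: ModelAnswer, handoff_requested: bool = False) -> str:
    """Return "answer" for high confidence, otherwise "escalate"."""
    if handoff_requested:
        return "escalate"
    if confidence(similarities, answer) >= CONFIDENCE_THRESHOLD:
        return "answer"
    return "escalate"


def build_handoff_context(customer: dict, summary: str, attempted_answers: list[str],
                          rejected_chunk_ids: list[str], transcript: list[str]) -> dict:
    """Collect the context attached to the escalation ticket (step 5)."""
    return {
        "customer": customer,
        "issue_summary": summary,
        "attempted_answers": attempted_answers,
        "retrieved_but_rejected_chunks": rejected_chunk_ids,
        "transcript": transcript,
    }


strong = ModelAnswer("We open at 9am on weekdays.", 0.9, ["faq-1"])
weak = ModelAnswer("I'm not sure.", 0.4)
print(route([0.9, 0.8], strong))
print(route([0.3, 0.2], weak))
```

Treating an explicit "insufficient context" answer as zero confidence keeps the refusal path and the low-score path on the same escalation branch, which matches the single "Low or I don't know" route described in step 4.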

5. What success looked like

6. Outcomes

Tier-1 deflection rate: Settled at 55–65% after first two weeks of tuning
Customer satisfaction (CSAT): Out-of-hours satisfaction up materially; in-hours equivalent to human-only baseline
Median response time: <5 seconds (vs ~4 hours during business hours, ~14 hours overnight)
Hallucination incidents: 0 confirmed after week 3 (refusal rate ~8% on novel questions)
Support team capacity: Freed 12-15 hours/week → reinvested in complex case resolution + customer success
AI API cost: ~£75/month at 600 conversations/week (~£0.03 per conversation)
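The per-conversation API cost follows from the two figures quoted above. A quick check, assuming 52 weeks spread evenly over 12 months (the variable names are illustrative):

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12

conversations_per_week = 600      # from the outcomes above
api_cost_per_month_gbp = 75.0     # from the outcomes above

# Convert weekly volume to an average monthly volume.
conversations_per_month = conversations_per_week * WEEKS_PER_YEAR / MONTHS_PER_YEAR
cost_per_conversation_gbp = api_cost_per_month_gbp / conversations_per_month

print(f"{conversations_per_month:.0f} conversations/month")
print(f"£{cost_per_conversation_gbp:.3f} per conversation")
```

That works out to about 2,600 conversations a month, so the quoted ~£0.03 is the per-conversation figure rounded to the nearest penny.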

7. Speed improvements

8. SEO & visibility growth

9. ROI math

Payback and ongoing return: 6 hours/week of support team time recovered × £25/hr loaded cost × 52 weeks = £7,800/year. Plus 6 hours/week of customer-satisfaction-driven value (harder to quantify but real). Build £4,800 + ~£1,000/year run = payback in ~7-8 months, then net ~£6,800/year ongoing on the conservative line alone.
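The payback arithmetic can be reproduced in a few lines. This sketch uses only the figures quoted above; it computes the payback period two ways, since the text does not say whether the ~7-8 month figure counts the run cost (variable names are illustrative):

```python
WEEKS_PER_YEAR = 52

hours_recovered_per_week = 6    # conservative line only
loaded_cost_per_hour_gbp = 25.0
build_cost_gbp = 4800.0
run_cost_per_year_gbp = 1000.0  # ~£80/month, rounded up as in the text

annual_saving_gbp = hours_recovered_per_week * loaded_cost_per_hour_gbp * WEEKS_PER_YEAR
net_annual_gbp = annual_saving_gbp - run_cost_per_year_gbp

# Payback measured against gross savings, then against savings net of run cost.
payback_months_gross = build_cost_gbp / annual_saving_gbp * 12
payback_months_net = build_cost_gbp / net_annual_gbp * 12

print(f"Annual saving: £{annual_saving_gbp:,.0f}")
print(f"Net after run cost: £{net_annual_gbp:,.0f}")
print(f"Payback: {payback_months_gross:.1f} months gross, {payback_months_net:.1f} months net")
```

Against gross savings the build pays back in a little over 7 months; netting off the run cost pushes that to roughly 8.5 months, so the quoted range sits at the optimistic end.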

10. Maintenance model

Gold care plan (£699/month). Includes: monthly knowledge-base refresh (re-index any updated docs), weekly quality review (60-90 min walkthrough of low-confidence and thumbs-down conversations, with prompt tuning), monthly AI cost optimisation review (route simple queries to gpt-4o-mini, save 40-60% on API spend), and prompt evals on a synthetic test set. Plus 6 hours/month of dev work for new features (themed dark mode, new integrations, custom analytics). Customer-facing chatbots almost always need Gold or Enterprise; the alternative is degradation over time as the knowledge base drifts from reality.

See care plan tiers for full structure.

11. Honest gotchas — what we'd do differently
