GPT-4 Customer Support Chatbot for a Cambridge Service Business

A 24/7 tier-1 support agent grounded in the company's own docs. A representative end-to-end engagement.

Sector: Cambridge service business (~30 employees)
Duration: 3 weeks build + 1 week tuning
Budget band: £4,800 fixed-price build + £80/month run

1. The problem

A Cambridge service business with ~30 employees was fielding ~150 customer support emails per week. Around 70% were repeat questions answerable from the existing FAQ, knowledge base, and Zendesk reply history. The support team was small (3 people), and out-of-hours emails piled up; customer feedback consistently mentioned slow response times. The business had tried generic "ChatGPT for support" widgets, which either hallucinated incorrect answers (and customers noticed) or refused to engage with anything specific to the business.

2. Stack

Language: Python 3.12 (backend) + TypeScript (widget)

LLM: OpenAI GPT-4o (chat completion with structured citations)

Embeddings: OpenAI text-embedding-3-small

Vector store: pgvector on Neon Postgres

Compute: Vercel Edge Functions (low-latency widget API)

Frontend: Lightweight React widget (~12KB gzipped) embedded on site

Handoff: Zendesk API — ticket creation with full conversation context

Analytics: Per-conversation logging + weekly quality review dashboard

Observability: Sentry + Axiom structured logs

3. Architecture

System topology — what runs where, what talks to what:

        +-----------------+
        |  Customer       |
        |  on website     |
        +--------+--------+
                 |
                 v
        +--------------------+      +---------------------+
        |  React widget      |      |  Vercel Edge        |
        |  (12KB)            | ---> |  Function (API)     |
        |  Conversational UI |      |  Sub-100ms p95      |
        +--------------------+      +----------+----------+
                                                |
                                  +-------------+------------+
                                  |                          |
                                  v                          v
                  +---------------------------+   +---------------------+
                  |  Embed query              |   |  Conversation       |
                  |  text-embedding-3-small   |   |  history (Postgres) |
                  |  (1536 dims)              |   |  Last N turns       |
                  +-------------+-------------+   +---------------------+
                                |
                                v
                  +---------------------------+
                  |  pgvector similarity      |
                  |  search over ~2,000 chunks|
                  |  (docs, FAQ, past replies)|
                  |  Top 5 by cosine distance |
                  +-------------+-------------+
                                |
                                v
                  +---------------------------+
                  |  GPT-4o with strict prompt|
                  |  + retrieved context      |
                  |  + conversation history   |
                  |  Output:                  |
                  |    - answer text          |
                  |    - confidence score     |
                  |    - cited chunk IDs      |
                  +-------------+-------------+
                                |
              +-----------------+-----------------+
              |                                   |
              v                                   v
      +---------------+               +----------------------+
      |  High conf:   |               |  Low conf / explicit |
      |  Show answer  |               |  handoff requested:  |
      |  + citations  |               |  Create Zendesk      |
      |  to customer  |               |  ticket w/ context   |
      +-------+-------+               +-----------+----------+
              |                                   |
              v                                   v
      +-------------------------------------------------+
      |  Audit log: every Q+A, confidence, citations,   |
      |  customer rating (thumbs up/down)               |
      +-------------------------------------------------+
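
The "Top 5 by cosine distance" step in the diagram can be illustrated in plain Python. This is a minimal sketch, not the production query: in the build described, pgvector runs the search inside Postgres. The function names `cosine_distance` and `top_k_chunks` and the in-memory list of chunks are assumptions made for illustration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 5,
) -> list[tuple[str, float]]:
    """Rank (chunk_id, embedding) pairs by distance to the query; keep the k nearest."""
    scored = [(chunk_id, cosine_distance(query_vec, emb)) for chunk_id, emb in chunks]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]
```

A linear scan like this is workable at ~2,000 chunks; doing the search in the database keeps chunk text, metadata, and vectors in one place.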

4. Automation flow

End-to-end runtime flow — what happens when a real input arrives:

Indexing (one-off + weekly). We crawled the company website, FAQ, product docs, and 18 months of anonymised Zendesk reply history, split the content into ~2,000 semantic chunks (~400 tokens each), embedded each chunk with text-embedding-3-small, and stored the vectors in pgvector alongside chunk text and source metadata. Updated docs are re-indexed weekly.
Query. The customer types a message. The widget sends it to the Vercel Edge Function, which embeds the query, runs a pgvector similarity search, and returns the top 5 chunks.
Generation. GPT-4o is called with the retrieved chunks, the conversation history, and a strict system prompt: "Answer only from the provided context. Cite which chunks support each claim. If context insufficient, say so and offer human handoff."
Confidence + routing. Confidence derived from: chunk relevance scores + LLM self-assessment of completeness. High → show answer with citation links. Low or "I don't know" → escalate to Zendesk with structured summary.
Handoff. On escalation, Zendesk ticket auto-created with: customer details, issue summary, attempted answers, retrieved-but-rejected chunks (useful for support team training), conversation transcript.
Feedback loop. Customer rates each answer thumbs up/down. Down-rated conversations flow to weekly review queue → support team updates knowledge base or tweaks prompts.
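
The confidence + routing step above can be sketched roughly as follows. The equal weighting, the 0.7 threshold, and the names `DraftAnswer`, `combined_confidence`, and `route` are all assumptions for illustration; the source only states that confidence blends chunk relevance scores with the model's self-assessment.

```python
from dataclasses import dataclass


@dataclass
class DraftAnswer:
    text: str
    cited_chunk_ids: list[str]
    self_assessed_completeness: float  # 0.0-1.0, reported by the model (assumed field)
    declined: bool = False             # model said the context was insufficient


def combined_confidence(
    similarities: list[float],
    draft: DraftAnswer,
    retrieval_weight: float = 0.5,     # assumed weighting, tune on real data
) -> float:
    """Blend mean retrieval similarity with the model's own completeness estimate."""
    if not similarities:
        return 0.0
    retrieval = sum(similarities) / len(similarities)
    score = (
        retrieval_weight * retrieval
        + (1 - retrieval_weight) * draft.self_assessed_completeness
    )
    return max(0.0, min(1.0, score))


def route(
    similarities: list[float],
    draft: DraftAnswer,
    customer_wants_human: bool = False,
    threshold: float = 0.7,            # assumed cut-off
) -> str:
    """Return "answer" to show the reply, or "handoff" to create a ticket."""
    # An explicit request, a refusal, or an uncited answer always escalates.
    if customer_wants_human or draft.declined or not draft.cited_chunk_ids:
        return "handoff"
    if combined_confidence(similarities, draft) >= threshold:
        return "answer"
    return "handoff"
```

Checking the explicit handoff request before any score keeps the "customer always gets a human if they want one" guarantee independent of how the threshold is tuned.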

5. What success looked like

Deflect 50%+ of tier-1 customer enquiries with answers indistinguishable from human support
Never hallucinate — only answer from grounded knowledge base, escalate cleanly otherwise
Match existing support team's tone (warm, specific, not corporate)
Customer always gets a human if they want one — no dark-pattern dead-ends

6. Outcomes

Tier-1 deflection rate: Settled at 55–65% after first two weeks of tuning
Customer satisfaction (CSAT): Out-of-hours satisfaction up materially; in-hours equivalent to human-only baseline
Median response time: <5 seconds (vs ~4 hours during business hours, ~14 hours overnight)
Hallucination incidents: 0 confirmed after week 3 (refusal rate ~8% on novel questions)
Support team capacity: Freed 12-15 hours/week → reinvested in complex case resolution + customer success
AI API cost: ~£75/month at 600 conversations/week (~£0.03 per conversation)
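
The per-conversation cost figure can be checked with a few lines of arithmetic. This is a small sketch using only the numbers quoted above; the function name and the 52-weeks-over-12-months conversion are assumptions.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def cost_per_conversation(monthly_cost_gbp: float, conversations_per_week: float) -> float:
    """Convert weekly volume to an average month, then divide the monthly spend by it."""
    conversations_per_month = conversations_per_week * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return monthly_cost_gbp / conversations_per_month


# 600 conversations/week is 2,600 per average month.
print(round(cost_per_conversation(75, 600), 3))  # 0.029
```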

7. Speed improvements

First-byte response time. <300ms p95 via Vercel Edge Functions (vs ~2-4 seconds with naive serverless cold-start)
Customer answer latency. Streaming response, first word in <2s, full answer typically 4-8s
Out-of-hours coverage. Before: no reply until the next morning. Now: an immediate one. The customer-perceived improvement is large.

8. SEO & visibility growth

Site engagement signals. Average session duration +28% (customers stay to chat). Bounce rate -12% on docs pages.
Schema rich results. Site now eligible for FAQPage rich snippets (we generated FAQ schema from the AI's most-asked questions).
Indirect: support docs improved. Weekly knowledge-gap reviews surfaced 30+ FAQ improvements over 3 months → better content → better organic rankings.

9. ROI math

Payback & ongoing return: 6 hours/week of support team time recovered × £25/hr loaded cost × 52 weeks = £7,800/year. Plus a further 6 hours/week whose value shows up in customer satisfaction (harder to quantify but real). Build £4,800 + ~£1,000/year run = payback ~7-8 months, then net ~£6,800/year ongoing on the conservative line alone.
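
The payback arithmetic can be reproduced as below. This is a sketch on the conservative line only (6 hours/week at £25/hr); the function names are assumptions. On these inputs payback comes out at roughly 7.4 months before run costs and about 8.5 months once the ~£1,000/year run cost is netted off.

```python
def annual_saving(hours_per_week: float, hourly_cost_gbp: float, weeks_per_year: int = 52) -> float:
    """Yearly value of recovered staff time."""
    return hours_per_week * hourly_cost_gbp * weeks_per_year


def payback_months(build_cost_gbp: float, annual_saving_gbp: float,
                   annual_run_cost_gbp: float = 0.0) -> float:
    """Months until cumulative net savings cover the one-off build cost."""
    net_monthly = (annual_saving_gbp - annual_run_cost_gbp) / 12
    if net_monthly <= 0:
        raise ValueError("no payback: run cost meets or exceeds savings")
    return build_cost_gbp / net_monthly


saving = annual_saving(6, 25)                        # 6 x 25 x 52
print(saving)                                        # 7800
print(saving - 1000)                                 # 6800 net of run cost
print(round(payback_months(4800, saving), 1))        # 7.4
print(round(payback_months(4800, saving, 1000), 1))  # 8.5
```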

10. Maintenance model

Gold care plan (£699/month). Includes: monthly knowledge-base refresh (re-index any updated docs), weekly quality review (60-90 min walkthrough of low-confidence + thumbs-down conversations, tune prompts), monthly AI cost optimisation review (route simple queries to gpt-4o-mini, save 40-60% on API spend), prompt evals on synthetic test set. Plus 6 hours/month of dev work for new features (themed dark mode, new integration, custom analytics). Customer-facing chatbots almost always need Gold or Enterprise — the alternative is degradation over time as the knowledge base drifts from reality.

See care plan tiers for full structure.

11. Honest gotchas — what we'd do differently

The first version hallucinated when retrieved chunks were ambiguous. We fixed this by tightening the system prompt to require explicit citation of which chunk supports each claim, which made the model materially less prone to confabulating.
Pricing-page queries needed a special-case live-fetch path (pricing changes weekly, so there is no point caching). We added a custom tool to the LLM toolbox, get_current_pricing(), which calls the live API.
Privacy review: chat transcripts retained 30 days only. No PII in embeddings (we redact emails/phone numbers before embedding). UK/EU pgvector instance.
Initial latency was bad (~3-5s first byte) before Vercel Edge migration. Moving the API to Edge Functions cut p95 by 4×.
The customer "did this answer help?" rating widget initially had a 60% click-through rate but dropped to 15% after a month. We removed the persistent prompt and now show it only after a specific signal of a high-stakes question. Engagement is back to 35%.
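
The redaction step mentioned in the privacy note above could look roughly like this. It is a minimal sketch, not the patterns used in the build: the regexes, the `[EMAIL]` and `[PHONE]` placeholders, and the name `redact_pii` are assumptions, and the phone pattern only targets UK-style numbers.

```python
import re

# Simple email pattern; deliberately loose rather than RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# UK-style numbers: "+44" or a leading 0, then 9-10 digits with optional single spaces.
PHONE_RE = re.compile(r"(?<!\d)(?:\+44\s?|0)\d(?:\s?\d){8,9}\b")


def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholders before embedding."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Running this on each chunk before the embedding call keeps contact details out of both the vectors and the stored chunk text. Regex redaction will miss some formats, so a periodic spot-check of indexed chunks is a sensible complement.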
