LLM & RAGlive

Consulate Chatbot

A grounded RAG chatbot for the Korean Consulate in Toronto.

216 official posts · BM25 + embeddings hybrid

The problem

Consulate visitors and Korean nationals in Toronto repeatedly ask the same procedural questions (passport, visa, notarization, military service, family registration). The official bulletin has the answers but is hard to search. This chatbot grounds answers strictly in 216 official posts using hybrid retrieval and refuses to invent anything outside that corpus.

Who this is for

Civic-tech and government-AI teams, anyone shipping a public-facing RAG chatbot where hallucinations cause real harm.

Architecture

Query normalizer: Strips colloquial Korean endings and detects topic (passport / visa / notarization / military service / family registration).
BM25 sub-indexes per topic: When topic is detected, search a topic-specific BM25 index for precision.
Full BM25 + embedding hybrid (RRF): When no topic is detected, run full BM25 plus OpenAI embedding search and merge with Reciprocal Rank Fusion.
Context assembler: Top 5 posts assembled into a context window up to 16,000 characters.
GPT-4o generator: Temperature 0.05, instructed to answer strictly from the provided posts, with source links and disclaimer.
Streaming SSE: Token-by-token response streamed to the browser; marked.js renders markdown live.

Request / data flow

01User asks a question (Korean) → query normalized + topic detected.
02If topic detected → BM25 sub-index search. Else → full BM25 + embeddings → RRF merge.
03Top 5 posts assembled into a ≤16,000-character context.
04GPT-4o streams the answer grounded in that context, with source links + disclaimer.

Key decisions

Hybrid BM25 + embeddings, not pure vector.

whyGovernment terminology is rare and exact-match-heavy (form numbers, statute names). Pure embeddings drift on those; BM25 nails them.

Per-topic sub-indexes when topic is detected.

whyPassport answers should never come from the military-service section, even if vector cosine says they're "close".

Strict grounding + disclaimer + source links.

whyWrong civic-service information has real consequences. Better to say "I don't know" than to invent.

Temperature 0.05.

whyDeterminism matters more than fluency variety for a public-facing reference bot.

Stack

PythonFastAPIOpenAIRAGBM25SSE

If I rebuilt it

›Auto-resync the bulletin scrape on a schedule and version the embedding index so old answers can be traced to old content.
›Surface confidence (best-hit rank / score) alongside answers so the user knows when retrieval was weak.

← All projects Contact me about this →