
Building a RAG Chatbot with Azure OpenAI and AI Search

Saurav Rai

Founder & Lead Architect

AI / ML · 14 min read

Overview

How we build production RAG systems: chunking strategies, hybrid search, re-ranking, and grounding verification.

What Is Retrieval-Augmented Generation?

RAG solves the fundamental problem with LLMs: they don't know your proprietary data. Instead of fine-tuning (expensive, data-hungry), RAG retrieves relevant context at query time and injects it into the prompt.

The flow is: user asks a question → search your knowledge base for relevant chunks → inject chunks into GPT-4 prompt → return grounded answer with citations.

  • Far fewer hallucinations about your data
  • No fine-tuning cost or complexity
  • Always up-to-date with your latest documents
  • Verifiable answers with source citations
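The flow above can be sketched end to end. This is a language-agnostic illustration in Python; `search_chunks` and `complete` stand in for your retrieval layer and the Azure OpenAI chat call (the names are ours, not a real API):

```python
def build_prompt(question, chunks):
    """Inject retrieved chunks into the prompt, tagged for citation."""
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks)
    )
    return (
        "Answer ONLY from the sources below. Cite sources as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

def answer(question, search_chunks, complete):
    # 1. search the knowledge base for relevant chunks
    chunks = search_chunks(question, top=3)
    # 2. ground the prompt in those chunks
    prompt = build_prompt(question, chunks)
    # 3. generate a cited answer
    return complete(prompt)
```

The citation tags `[n]` are what makes answers verifiable: the model is told to cite, and each tag maps back to a source document.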

Chunking Strategy Matters

How you split documents into chunks dramatically affects retrieval quality. Fixed-size chunks are simple but break semantic units. Sentence-window chunking maintains context. Hierarchical chunking creates summary → detail relationships.

We use semantic chunking based on markdown headers for technical documentation—splitting at h1/h2 boundaries preserves complete concepts and dramatically improves retrieval precision.

  • Fixed size: simple, often adequate
  • Sentence window: preserves local context
  • Semantic/markdown: best for structured docs
  • Hierarchical: summary nodes accelerate multi-hop reasoning
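The markdown-header approach is simple enough to sketch. A minimal version that splits at h1/h2 boundaries while keeping h3+ subsections attached to their parent concept:

```python
import re

def chunk_by_headers(markdown: str):
    """Split markdown at h1/h2 boundaries so each chunk is a complete concept."""
    chunks, current = [], []
    for line in markdown.splitlines():
        # A new h1/h2 heading starts a new chunk; h3+ stays with its parent.
        if re.match(r"^#{1,2} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

In production you'd also cap chunk length and carry the heading path into each chunk's metadata, but the boundary logic is the part that drives precision.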

Hybrid Search with Re-Ranking

Azure AI Search supports hybrid retrieval, combining dense vector search (semantic similarity) with sparse BM25 keyword search; Reciprocal Rank Fusion (RRF) then merges the two result sets.
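Azure AI Search performs the fusion server-side, but the formula itself is simple: each document scores 1/(k + rank) in every result list it appears in, with k conventionally around 60. A minimal sketch for intuition:

```python
def rrf_merge(result_lists, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well by both vector and keyword search can beat one ranked first by only a single method, which is exactly the behavior you want from hybrid retrieval.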

Adding a cross-encoder re-ranker as a final step—scoring each candidate chunk against the full query—consistently pushes retrieval accuracy above 90% in our production systems.

// Hybrid query: vector similarity + BM25 keywords (pass the raw query
// text to SearchAsync alongside these options), with semantic re-ranking.
var searchOptions = new SearchOptions {
    VectorSearch = new VectorSearchOptions {
        Queries = { new VectorizedQuery(embedding) { Fields = { "contentVector" } } }
    },
    // Semantic ranking only applies when the query type is Semantic.
    QueryType = SearchQueryType.Semantic,
    SemanticSearch = new SemanticSearchOptions {
        SemanticConfigurationName = "my-semantic-config",
        QueryAnswer = new QueryAnswer(QueryAnswerType.Extractive),
        QueryCaption = new QueryCaption(QueryCaptionType.Extractive)
    }
};
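The cross-encoder step itself is just "score every (query, chunk) pair, keep the best". Here's a sketch with the scoring model abstracted behind a `score` callback (in production that would be a real cross-encoder such as one from `sentence-transformers`; the toy scorer below is only for illustration):

```python
def rerank(query, candidates, score, top_k=5):
    """Re-score each (query, chunk) pair with a cross-encoder and keep top_k."""
    scored = [(score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

Because the cross-encoder sees the full query and chunk together, it catches relevance signals that independent embeddings miss, at the cost of one model call per candidate, which is why it runs only on the short fused list.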

Key Takeaways

  • RAG is cheaper and more maintainable than fine-tuning
  • Chunking strategy is the most impactful RAG tuning lever
  • Hybrid search outperforms pure vector or keyword search
  • Always include source citations in responses
  • Monitor retrieval quality with evaluation datasets

