Vivek Kaushik
🔎 Azure AI Search

Created: Feb 14, 2026 07:37 AM
Azure AI Search is not just a search engine; it is the retrieval backbone of enterprise RAG systems.

1️⃣ What is Azure AI Search?

Azure AI Search is a fully managed search service that enables:
  • Full-text search (BM25)
  • Semantic ranking
  • Vector search
  • Hybrid search
  • Indexing pipelines
  • Skill-based enrichment
  • Filtering & faceting
  • Security trimming
In modern AI systems, it acts as the retrieval layer in a RAG architecture.

2️⃣ Core Components

1️⃣ Index

Logical container that stores searchable documents.
Defines:
  • Fields
  • Data types
  • Filterable fields
  • Sortable fields
  • Vector fields
  • Semantic configuration
Important:
Schema changes require reindexing.
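The field attributes above map directly onto the JSON index definition you send to the service. A minimal sketch as a Python dict mirroring that REST body (the index name, field names, and profile names here are illustrative, not from this post):

```python
# Sketch of an index definition. The "vectorSearch" section wires the
# vector field to an HNSW algorithm via a named profile.
index_definition = {
    "name": "content-index",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "tenantId", "type": "Edm.String",
         "filterable": True, "searchable": False},
        {
            "name": "vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,                # must match the embedding model
            "vectorSearchProfile": "default",  # refers to a profile below
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "hnsw-1", "kind": "hnsw"}],
        "profiles": [{"name": "default", "algorithm": "hnsw-1"}],
    },
}
```

Note that `tenantId` is filterable but not searchable: it exists purely for filtering and security trimming, which keeps it out of full-text ranking.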

2️⃣ Documents

Each document is a JSON object whose fields are defined by the index schema.
Example:
{
  "id": "doc-123",
  "content": "...",
  "title": "...",
  "tenantId": "mankind",
  "vector": [0.12, -0.45, ...]
}

3️⃣ Data Source

Where documents come from:
  • Blob Storage
  • Azure SQL
  • Cosmos DB
  • SharePoint
  • Custom REST endpoint

4️⃣ Indexer

Automated crawler that:
  • Reads from data source
  • Applies skillset
  • Pushes documents to index
Supports:
  • Scheduled runs
  • Incremental indexing
  • Change detection
  • Soft delete detection

5️⃣ Skillset

Used for enrichment during ingestion.
Examples:
  • OCR
  • Text splitting
  • Entity recognition
  • Key phrase extraction
  • Embedding generation
  • Image analysis

3️⃣ Search Types

1️⃣ Full-Text Search (BM25)

Traditional keyword-based ranking.
Best for:
  • Exact matches
  • IDs
  • Structured text

2️⃣ Semantic Search

Uses deep language understanding to:
  • Rerank results
  • Extract captions
  • Extract answers
Improves precision.
Requires:
  • Semantic configuration

3️⃣ Vector Search

Uses embeddings for similarity search.
Best for:
  • Conceptual similarity
  • Synonym handling
  • Natural language search

4️⃣ Hybrid Search (Recommended)

Combines:
  • Keyword (BM25)
  • Semantic reranking
  • Vector similarity
Hybrid search improves both recall and precision.
Enterprise best practice.
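Azure AI Search merges the keyword and vector result lists using Reciprocal Rank Fusion (RRF): each document's fused score is the sum of 1/(k + rank) over the lists it appears in, so documents ranked well by both retrievers rise to the top. A stdlib-only sketch of the idea (the constant k=60 is the commonly used RRF default, not an Azure-specific value):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) across every list it
    appears in; appearing high in multiple lists compounds the score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["d1", "d2", "d3"]    # keyword ranking
vector_results = ["d3", "d1", "d4"]  # vector ranking
print(rrf([bm25_results, vector_results]))  # ['d1', 'd3', 'd2', 'd4']
```

Note how `d1` and `d3`, which appear in both lists, outrank `d2` and `d4`, which appear in only one. This is why hybrid search improves both recall and precision.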

4️⃣ Index Design Best Practices

Separate Fields Properly

  • searchable
  • filterable
  • sortable
  • facetable
  • retrievable
Do NOT make everything searchable.

Vector Fields

Define:
  • Collection(Edm.Single)
  • dimensions
  • vectorSearchConfiguration
The dimensions value must exactly match the output dimension of the embedding model.
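A dimension mismatch surfaces later as schema drift, so it is worth a guard at ingestion time. A trivial sketch (the 1536 value is a hypothetical schema setting, e.g. for a 1536-dimension embedding model):

```python
INDEX_VECTOR_DIMENSIONS = 1536  # hypothetical value from the index schema

def validate_embedding(vector, expected=INDEX_VECTOR_DIMENSIONS):
    """Reject a vector whose length differs from the index field's dimensions."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dims, index expects {expected}"
        )
    return vector
```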

Semantic Configuration

Defines:
  • Title field
  • Content field
  • Keywords
Used by semantic ranker.

5️⃣ Ingestion Models

1️⃣ On-Demand (Synchronous)

User triggers ingestion.
Pros:
  • Immediate availability
Cons:
  • High latency
  • Cost spikes
  • Poor scalability
Best for:
  • Small POCs

2️⃣ Batch Ingestion (Offline)

Scheduled indexers.
Pros:
  • Predictable cost
  • Scalable
  • Controlled
Best for:
  • Large document corpus

3️⃣ Event-Driven Ingestion

Triggered by:
  • Blob events
  • Cosmos change feed
  • Event Grid
Flow:
Event → Function → Push API → Index update
Best for:
  • Near real-time updates
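In the flow above, the Azure Function calls the documents-index ("Push") REST endpoint with a batch body of the form `{"value": [{"@search.action": ..., ...fields}]}`. A sketch of building that payload (endpoint and index names are omitted; the function name is mine):

```python
import json

def build_push_payload(docs, action="mergeOrUpload"):
    """Build a Push API batch body.

    Each document carries an @search.action: "upload", "merge",
    "mergeOrUpload", or "delete".
    """
    return {"value": [{"@search.action": action, **doc} for doc in docs]}

# e.g. triggered by a blob-changed event for doc-123
payload = build_push_payload([{"id": "doc-123", "content": "updated text"}])
print(json.dumps(payload))
```

"mergeOrUpload" is the usual choice for event-driven updates: it updates the document if it exists and creates it otherwise, so the Function does not need to know whether the blob is new.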

4️⃣ Hybrid Ingestion

Combine:
  • Batch for large sets
  • Event-driven for updates
  • On-demand for small uploads
Most enterprise systems use hybrid.

6️⃣ Index Update Strategies

Azure AI Search does NOT automatically reindex when source data changes.
Use one of:

1️⃣ Scheduled Indexers

Supports incremental indexing.

2️⃣ Push API

Direct document upload/update.

3️⃣ Event Grid + Azure Function

Simulates trigger-based indexing, since the service has no native change triggers.

Zero-Downtime Schema Changes

Use Index Aliases:
  1. Create new index (v2)
  2. Backfill
  3. Switch alias
  4. Delete old index
Never modify schema in-place in production.
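An alias points at a single index, and queries target the alias name, so repointing it is invisible to callers. A sketch of the alias bodies before and after the cutover (alias and index names are illustrative):

```python
# Queries always hit the alias "docs", never a versioned index name.
alias_before = {"name": "docs", "indexes": ["docs-v1"]}

# After docs-v2 is created and backfilled, PUT the alias with the new target.
alias_after = {"name": "docs", "indexes": ["docs-v2"]}
```

Once the alias points at `docs-v2` and traffic looks healthy, `docs-v1` can be deleted.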

7️⃣ Retrieval Strategy in RAG

Typical Query Flow:
  1. Generate query embedding
  2. Execute hybrid search
  3. Apply filters
  4. Retrieve Top-K
  5. Inject into prompt
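The steps above collapse into a single hybrid query body: keyword text, a vector clause, a security filter, and Top-K. A sketch of assembling that request (field names and the helper are mine; the tenant id is assumed already validated upstream):

```python
def build_hybrid_query(text, embedding, tenant_id, top_k=5):
    """Assemble one hybrid search request body.

    BM25 runs over "search", the vector clause over the "vector" field,
    and the filter trims results before ranking.
    """
    return {
        "search": text,                # keyword (BM25) side
        "vectorQueries": [{
            "kind": "vector",
            "vector": embedding,       # query embedding from step 1
            "fields": "vector",
            "k": top_k,
        }],
        "filter": f"tenantId eq '{tenant_id}'",  # security trimming
        "top": top_k,                  # Top-K handed to the prompt
    }

query = build_hybrid_query("refund policy", [0.12, -0.45], "mankind", top_k=3)
```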

Top-K Tuning

Too low:
  • Miss context
Too high:
  • Context overflow
  • Noise
Tune experimentally.

8️⃣ Common Failure Modes

1️⃣ Poor Chunking

Fragmented or noisy context.

2️⃣ Low Retrieval Recall

Relevant docs not retrieved.

3️⃣ High Hallucination Despite Retrieval

Often prompt or context overflow issue.

4️⃣ Stale Index

Ingestion pipeline failed.

5️⃣ Schema Drift

Embedding dimension mismatch.

9️⃣ Document-Level RBAC (CRITICAL)

Security must be enforced at retrieval layer.

Implementation

Add metadata fields:
  • tenantId
  • department
  • role
  • securityGroupIds
Mark as:
  • filterable: true
Apply filter in query:
opts.Filter = $"tenantId eq '{userTenant}'";
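The filter above interpolates the tenant id directly into the OData expression. In OData string literals a single quote must be doubled; without that, a crafted value can break out of the literal and widen the filter. A defensive sketch (helper names are mine):

```python
def odata_quote(value: str) -> str:
    """Escape a value for use as an OData string literal:
    double embedded single quotes, then wrap in quotes."""
    return "'" + value.replace("'", "''") + "'"

def tenant_filter(tenant_id: str) -> str:
    return f"tenantId eq {odata_quote(tenant_id)}"

print(tenant_filter("o'brien"))  # tenantId eq 'o''brien'
```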

Why This Matters

The LLM is not a security boundary.
Unauthorized documents must never be retrieved.

🔟 Performance Considerations

Retrieval Latency Factors

  • Semantic ranking cost
  • Vector similarity cost
  • Top-K size
  • Index size
  • Replica/partition count

Scaling

Scale with:
  • Replicas (query throughput)
  • Partitions (index size)
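The two axes multiply into the billable capacity: one search unit (SU) per replica-partition pair, so scaling either dimension scales cost.

```python
def search_units(replicas: int, partitions: int) -> int:
    """Billable capacity: SU = replicas x partitions.
    Replicas buy query throughput and availability; partitions buy
    index storage and indexing parallelism."""
    return replicas * partitions

print(search_units(3, 2))  # 6 SUs
```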

1️⃣1️⃣ Cost Optimization

Costs come from:
  • Index size
  • Semantic ranker usage
  • Vector search
  • Embedding generation
  • Replica count
Optimize by:
  • Reducing Top-K
  • Using smaller embeddings
  • Caching embeddings
  • Batch ingestion

1️⃣2️⃣ Observability

Monitor:
  • Query latency
  • Search score distribution
  • Top-K usage
  • Indexer failures
  • Replica usage
  • Throttling
Use:
  • Azure Monitor
  • Application Insights

1️⃣3️⃣ Enterprise Patterns

Multi-Tenant Strategy

Option A:
  • Separate index per tenant
Option B:
  • Shared index + metadata filters
Most SaaS platforms use a shared index with tenant filters.

Metadata Filtering + Vector

Combine:
Vector search + filter condition
Prevents cross-tenant leakage.

Security Trimming

Always apply RBAC filter before LLM call.

1️⃣4️⃣ RAG Evaluation with AI Search

Measure:

Retrieval Metrics

  • Recall@K
  • Precision
  • MRR
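The retrieval metrics above are small functions over a ranked result list and a labelled relevant set; a toy-data sketch:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top-k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d4", "d1", "d7"]   # ranked search output
relevant = {"d1", "d2"}          # labelled ground truth
print(recall_at_k(retrieved, relevant, 3))  # 0.5
print(mrr(retrieved, relevant))             # 0.5 (first hit at rank 2)
```

In practice these are averaged over a query set; per-query values are too noisy to tune Top-K against.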

Generation Metrics

  • Faithfulness
  • Groundedness
  • Hallucination rate
Evaluate retrieval and generation separately.

🔥 Core Mental Models

Retrieval quality determines generation quality.
Enforce security at search layer, not LLM layer.
Ingestion and query pipelines must scale independently.
Hybrid search is almost always superior to pure vector.
Index versioning prevents downtime.

Copyright 2026 Vivek Kaushik