Azure AI Search is not just a search engine — it is the retrieval backbone of enterprise RAG systems.
1️⃣ What is Azure AI Search?
Azure AI Search is a fully managed search service that enables:
- Full-text search (BM25)
- Semantic ranking
- Vector search
- Hybrid search
- Indexing pipelines
- Skill-based enrichment
- Filtering & faceting
- Security trimming
In modern AI systems, it acts as the retrieval layer in a RAG architecture.
2️⃣ Core Components
1️⃣ Index
Logical container that stores searchable documents.
Defines:
- Fields
- Data types
- Filterable fields
- Sortable fields
- Vector fields
- Semantic configuration
Important:
Schema changes require reindexing.
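A minimal index definition, expressed as the JSON body sent to the REST API (PUT /indexes/{name}?api-version=...). The field names, index name, and vector profile name are illustrative assumptions, not a prescribed schema:

```python
# Sketch of a minimal index definition as a REST request body.
# Names ("docs-v1", "contentVector", "default-profile") are assumptions.
index_definition = {
    "name": "docs-v1",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "tenantId", "type": "Edm.String", "filterable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # must match the embedding model exactly
            "vectorSearchProfile": "default-profile",
        },
    ],
}

# Every index needs exactly one key field.
key_fields = [f["name"] for f in index_definition["fields"] if f.get("key")]
```

Changing any of these field definitions later means building a new index and reindexing.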
2️⃣ Documents
Each document is a JSON object whose fields are defined by the index schema.
Example:
```json
{
  "id": "doc-123",
  "content": "...",
  "title": "...",
  "tenantId": "mankind",
  "vector": [0.12, -0.45, ...]
}
```
3️⃣ Data Source
Where documents come from:
- Blob Storage
- Azure SQL
- Cosmos DB
- SharePoint
- Custom REST endpoint
4️⃣ Indexer
Automated crawler that:
- Reads from data source
- Applies skillset
- Pushes documents to index
Supports:
- Scheduled runs
- Incremental indexing
- Change detection
- Soft delete detection
5️⃣ Skillset
Used for enrichment during ingestion.
Examples:
- OCR
- Text splitting
- Entity recognition
- Key phrase extraction
- Embedding generation
- Image analysis
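A skillset is itself a JSON definition attached to the indexer. A sketch with two of the skills listed above (OCR, then text splitting); the skillset name, source paths, and split parameters are assumptions to adapt:

```python
# Sketch of a skillset body (PUT /skillsets/{name}?api-version=...).
# "enrichment-skillset" and the source/target names are assumptions.
skillset = {
    "name": "enrichment-skillset",
    "skills": [
        {
            # OCR extracts text from images found in the document
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {
            # Split skill chunks long content before embedding
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "chunks"}],
        },
    ],
}

skill_types = [s["@odata.type"].rsplit(".", 1)[-1] for s in skillset["skills"]]
```

Skills chain: the output of one (here, split chunks) typically feeds the next (e.g., an embedding skill).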
3️⃣ Search Types
1️⃣ Full-Text Search (BM25)
Traditional keyword-based ranking.
Best for:
- Exact matches
- IDs
- Structured text
2️⃣ Semantic Search
Uses deep language understanding to:
- Rerank results
- Extract captions
- Extract answers
Improves precision.
Requires:
- Semantic configuration
3️⃣ Vector Search
Uses embeddings for similarity search.
Best for:
- Conceptual similarity
- Synonym handling
- Natural language search
4️⃣ Hybrid Search (Recommended)
Combines:
- Keyword (BM25)
- Semantic reranking
- Vector similarity
Hybrid search improves both recall and precision.
Enterprise best practice.
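Azure AI Search fuses the keyword and vector result lists with Reciprocal Rank Fusion (RRF). A self-contained sketch of the fusion idea, with toy document IDs:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1/(k + rank).
    A document ranked well by EITHER keyword or vector search rises to the top."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["a", "b", "c"]   # BM25 ranking (toy data)
vector_results = ["b", "d", "a"]    # vector similarity ranking (toy data)
fused = rrf_fuse([keyword_results, vector_results])
print(fused)  # → ['b', 'a', 'd', 'c']
```

Note how "b" wins: it is ranked decently by both lists, which is exactly why hybrid search improves recall and precision together.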
4️⃣ Index Design Best Practices
Separate Fields Properly
- searchable
- filterable
- sortable
- facetable
- retrievable
Do NOT make everything searchable.
Vector Fields
Define:
- Type: Collection(Edm.Single)
- dimensions
- vectorSearchConfiguration (called vectorSearchProfile in newer API versions)
The dimensions value must match the embedding model's output size exactly.
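A dimension mismatch between the index and the embedding model is a silent killer, so it is worth a fail-fast check at deploy time. A sketch using the documented output sizes of common Azure OpenAI embedding models:

```python
# Default output dimensions of common Azure OpenAI embedding models.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_vector_field(field: dict, model: str) -> None:
    """Raise if the index field's declared dimensions don't match the model."""
    expected = MODEL_DIMENSIONS[model]
    if field["dimensions"] != expected:
        raise ValueError(
            f"{field['name']}: index declares {field['dimensions']} dims, "
            f"but {model} emits {expected}"
        )

field = {"name": "contentVector", "type": "Collection(Edm.Single)", "dimensions": 1536}
check_vector_field(field, "text-embedding-ada-002")  # passes silently
```

Run this whenever the embedding model or index schema changes; catching the mismatch before ingestion is far cheaper than after.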
Semantic Configuration
Defines:
- Title field
- Content field
- Keywords
Used by semantic ranker.
5️⃣ Ingestion Models
1️⃣ On-Demand (Synchronous)
User triggers ingestion.
Pros:
- Immediate availability
Cons:
- High latency
- Cost spikes
- Poor scalability
Best for:
- Small POCs
2️⃣ Batch Ingestion (Offline)
Scheduled indexers.
Pros:
- Predictable cost
- Scalable
- Controlled
Best for:
- Large document corpus
3️⃣ Event-Driven Ingestion
Triggered by:
- Blob events
- Cosmos change feed
- Event Grid
Flow:
Event → Function → Push API → Index update
Best for:
- Near real-time updates
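In the Event → Function → Push API flow above, the Function posts a batch to the documents endpoint (POST /indexes/{index}/docs/index). A sketch of the payload shape; the action verb mergeOrUpload updates a document if it exists and creates it otherwise:

```python
def build_push_payload(docs: list[dict], action: str = "mergeOrUpload") -> dict:
    """Body for POST /indexes/{index}/docs/index?api-version=...
    Each document carries an @search.action verb (upload, merge,
    mergeOrUpload, or delete)."""
    return {"value": [{"@search.action": action, **doc} for doc in docs]}

# Toy document; real payloads would include the vector field too.
payload = build_push_payload([{"id": "doc-123", "content": "updated text"}])
```

Batching multiple changed documents into one push call keeps per-event overhead down in high-volume event-driven pipelines.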
4️⃣ Hybrid Ingestion
Combine:
- Batch for large sets
- Event-driven for updates
- On-demand for small uploads
Most enterprise systems use hybrid.
6️⃣ Index Update Strategies
Azure AI Search does NOT automatically reindex when source data changes.
Use:
1️⃣ Scheduled Indexers
Supports incremental indexing.
2️⃣ Push API
Direct document upload/update.
3️⃣ Event Grid + Azure Function
Simulates trigger-based indexing (there is no native change trigger).
Zero-Downtime Schema Changes
Use Index Aliases:
- Create new index (v2)
- Backfill
- Switch alias
- Delete old index
Never modify schema in-place in production.
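The alias cutover above maps to a small, ordered sequence of REST calls. A sketch; the alias body shape ({"name", "indexes"}) follows the aliases API, but the exact api-version to use is an assumption to verify for your service:

```python
def alias_swap_plan(alias: str, old_index: str, new_index: str) -> list:
    """Ordered REST steps for a zero-downtime schema change via an index alias.
    Queries always target the alias, so step 3 is the atomic cutover."""
    return [
        ("PUT", f"/indexes/{new_index}", "create new index with new schema"),
        ("POST", f"/indexes/{new_index}/docs/index", "backfill documents"),
        ("PUT", f"/aliases/{alias}", {"name": alias, "indexes": [new_index]}),
        ("DELETE", f"/indexes/{old_index}", "remove old index after cutover"),
    ]

plan = alias_swap_plan("docs", "docs-v1", "docs-v2")
```

Because clients query the alias rather than a concrete index name, step 3 swaps traffic without any client-side change.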
7️⃣ Retrieval Strategy in RAG
Typical Query Flow:
- Generate query embedding
- Execute hybrid search
- Apply filters
- Retrieve Top-K
- Inject into prompt
Top-K Tuning
Too low:
- Miss context
Too high:
- Context overflow
- Noise
Tune experimentally.
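One practical tuning approach: cap Top-K by the prompt's context budget rather than picking a fixed number. A sketch using a rough 4-characters-per-token heuristic (an assumption; swap in a real tokenizer in production):

```python
def fit_top_k(chunks, token_budget, est_tokens=lambda s: len(s) // 4):
    """Greedily keep the highest-ranked chunks that fit the context budget.
    `chunks` must already be ordered by search score (best first)."""
    kept, used = [], 0
    for chunk in chunks:
        cost = est_tokens(chunk)
        if used + cost > token_budget:
            break  # everything below this rank is dropped
        kept.append(chunk)
        used += cost
    return kept

chunks = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each
print(len(fit_top_k(chunks, token_budget=250)))  # → 2
```

This turns "too high: context overflow" into a hard invariant instead of a tuning accident.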
8️⃣ Common Failure Modes
1️⃣ Poor Chunking
Fragmented or noisy context.
2️⃣ Low Retrieval Recall
Relevant docs not retrieved.
3️⃣ High Hallucination Despite Retrieval
Often prompt or context overflow issue.
4️⃣ Stale Index
Ingestion pipeline failed.
5️⃣ Schema Drift
Embedding dimension mismatch.
9️⃣ Document-Level RBAC (CRITICAL)
Security must be enforced at retrieval layer.
Implementation
Add metadata fields:
- tenantId
- department
- role
- securityGroupIds
Mark as:
filterable: true
Apply filter in query:
opts.Filter = $"tenantId eq '{userTenant.Replace("'", "''")}'"; // double single quotes to prevent filter injection
Why This Matters
LLM is not a security boundary.
Unauthorized documents must never be retrieved.
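Building the filter safely matters as much as applying it: user-supplied values must be escaped, and group membership maps to the OData search.in function over a collection field. A sketch; the field names (tenantId, securityGroupIds) are the metadata assumptions from above:

```python
def escape_odata(value: str) -> str:
    """OData string literals escape a single quote by doubling it."""
    return value.replace("'", "''")

def security_filter(tenant_id: str, group_ids: list[str]) -> str:
    """Build the $filter applied on EVERY query so only documents the
    caller may see are retrievable. Field names are assumptions."""
    groups = ",".join(escape_odata(g) for g in group_ids)
    return (
        f"tenantId eq '{escape_odata(tenant_id)}' "
        f"and securityGroupIds/any(g: search.in(g, '{groups}', ','))"
    )

print(security_filter("mankind", ["g1", "g2"]))
```

Centralizing this in one function makes it auditable: there is exactly one place where the security filter can be constructed, and no query path can skip it.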
🔟 Performance Considerations
Retrieval Latency Factors
- Semantic ranking cost
- Vector similarity cost
- Top-K size
- Index size
- Replica/partition count
Scaling
Scale with:
- Replicas (query throughput)
- Partitions (index size)
1️⃣1️⃣ Cost Optimization
Costs come from:
- Index size
- Semantic ranker usage
- Vector search
- Embedding generation
- Replica count
Optimize by:
- Reducing Top-K
- Using smaller embeddings
- Caching embeddings
- Batch ingestion
1️⃣2️⃣ Observability
Monitor:
- Query latency
- Search score distribution
- Top-K usage
- Indexer failures
- Replica usage
- Throttling
Use:
- Azure Monitor
- Application Insights
1️⃣3️⃣ Enterprise Patterns
Multi-Tenant Strategy
Option A:
- Separate index per tenant
Option B:
- Shared index + metadata filters
Most SaaS platforms use a shared index with per-tenant filters.
Metadata Filtering + Vector
Combine:
Vector search + filter condition
Prevents cross-tenant leakage.
Security Trimming
Always apply RBAC filter before LLM call.
1️⃣4️⃣ RAG Evaluation with AI Search
Measure:
Retrieval Metrics
- Recall@K
- Precision
- MRR
Generation Metrics
- Faithfulness
- Groundedness
- Hallucination rate
Evaluate retrieval and generation separately.
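The retrieval metrics above are straightforward to compute offline against a labeled query set. A self-contained sketch of Recall@K and MRR with toy data:

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def mrr(queries: list) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit.
    `queries` is a list of (retrieved_ids, relevant_id_set) pairs."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

print(recall_at_k(["a", "b", "c"], {"a", "d"}, k=3))      # → 0.5
print(mrr([(["x", "a"], {"a"}), (["a"], {"a"})]))          # → 0.75
```

Running these against a fixed query set before and after an index or chunking change gives a regression signal that is independent of the LLM.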
🔥 Core Mental Models
Retrieval quality determines generation quality.
Enforce security at search layer, not LLM layer.
Ingestion and query pipelines must scale independently.
Hybrid search is almost always superior to pure vector.
Index versioning prevents downtime.
