Azure AI Search is not just a search engine — it is the retrieval backbone of enterprise RAG systems.
1️⃣ What is Azure AI Search?
Azure AI Search is a fully managed search service that enables:
- Full-text search (BM25)
- Semantic ranking
- Vector search
- Hybrid search
- Indexing pipelines
- Skill-based enrichment
- Filtering & faceting
- Security trimming
In modern AI systems, it acts as the retrieval layer in a RAG architecture.
2️⃣ Core Components
1️⃣ Index
Logical container that stores searchable documents.
Defines:
- Fields
- Data types
- Filterable fields
- Sortable fields
- Vector fields
- Semantic configuration
Important:
Schema changes require reindexing.
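A minimal index definition, expressed as the JSON body sent to the REST API (PUT /indexes/{name}?api-version=...). The field names, index name, and vector profile name are illustrative assumptions, not a prescribed schema:

```python
# Sketch of a minimal index definition as a REST request body.
# Names ("docs-v1", "contentVector", "default-profile") are assumptions.
index_definition = {
    "name": "docs-v1",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "tenantId", "type": "Edm.String", "filterable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # must match the embedding model exactly
            "vectorSearchProfile": "default-profile",
        },
    ],
}

# Every index needs exactly one key field.
key_fields = [f["name"] for f in index_definition["fields"] if f.get("key")]
```

Changing any of these field definitions later means building a new index and reindexing.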
2️⃣ Documents
Each document is a JSON object whose fields are defined by the index schema.
Example:
```json
{
  "id": "doc-123",
  "content": "...",
  "title": "...",
  "tenantId": "mankind",
  "vector": [0.12, -0.45, ...]
}
```
3️⃣ Data Source
Where documents come from:
- Blob Storage
- Azure SQL
- Cosmos DB
- SharePoint
- Custom REST endpoint
4️⃣ Indexer
Automated crawler that:
- Reads from data source
- Applies skillset
- Pushes documents to index
Supports:
- Scheduled runs
- Incremental indexing
- Change detection
- Soft delete detection
5️⃣ Skillset
Used for enrichment during ingestion.
Examples:
- OCR
- Text splitting
- Entity recognition
- Key phrase extraction
- Embedding generation
- Image analysis
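A skillset is itself a JSON definition attached to the indexer. A sketch with two of the skills listed above (OCR, then text splitting); the skillset name, source paths, and split parameters are assumptions to adapt:

```python
# Sketch of a skillset body (PUT /skillsets/{name}?api-version=...).
# "enrichment-skillset" and the source/target names are assumptions.
skillset = {
    "name": "enrichment-skillset",
    "skills": [
        {
            # OCR extracts text from images found in the document
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "ocrText"}],
        },
        {
            # Split skill chunks long content before embedding
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "chunks"}],
        },
    ],
}

skill_types = [s["@odata.type"].rsplit(".", 1)[-1] for s in skillset["skills"]]
```

Skills chain: the output of one (here, split chunks) typically feeds the next (e.g., an embedding skill).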
3️⃣ Search Types
1️⃣ Full-Text Search (BM25)
Traditional keyword-based ranking.
Best for:
- Exact matches
- IDs
- Structured text
2️⃣ Semantic Search
Uses deep language understanding to:
- Rerank results
- Extract captions
- Extract answers
Improves precision.
Requires:
- Semantic configuration
3️⃣ Vector Search
Uses embeddings for similarity search.
Best for:
- Conceptual similarity
- Synonym handling
- Natural language search
4️⃣ Hybrid Search (Recommended)
Combines:
- Keyword (BM25)
- Semantic reranking
- Vector similarity
Hybrid search improves both recall and precision.
Enterprise best practice.
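Azure AI Search fuses the keyword and vector result lists with Reciprocal Rank Fusion (RRF). A self-contained sketch of the fusion idea, with toy document IDs:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over result lists of 1/(k + rank).
    A document ranked well by EITHER keyword or vector search rises to the top."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["a", "b", "c"]   # BM25 ranking (toy data)
vector_results = ["b", "d", "a"]    # vector similarity ranking (toy data)
fused = rrf_fuse([keyword_results, vector_results])
print(fused)  # → ['b', 'a', 'd', 'c']
```

Note how "b" wins: it is ranked decently by both lists, which is exactly why hybrid search improves recall and precision together.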
4️⃣ Index Design Best Practices
Separate Fields Properly
- searchable
- filterable
- sortable
- facetable
- retrievable
Do NOT make everything searchable.
Vector Fields
Define:
- Type: Collection(Edm.Single)
- dimensions
- vectorSearchConfiguration (called vectorSearchProfile in newer API versions)
The dimensions value must match the embedding model's output size exactly.
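A dimension mismatch between the index and the embedding model is a silent killer, so it is worth a fail-fast check at deploy time. A sketch using the documented output sizes of common Azure OpenAI embedding models:

```python
# Default output dimensions of common Azure OpenAI embedding models.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_vector_field(field: dict, model: str) -> None:
    """Raise if the index field's declared dimensions don't match the model."""
    expected = MODEL_DIMENSIONS[model]
    if field["dimensions"] != expected:
        raise ValueError(
            f"{field['name']}: index declares {field['dimensions']} dims, "
            f"but {model} emits {expected}"
        )

field = {"name": "contentVector", "type": "Collection(Edm.Single)", "dimensions": 1536}
check_vector_field(field, "text-embedding-ada-002")  # passes silently
```

Run this whenever the embedding model or index schema changes; catching the mismatch before ingestion is far cheaper than after.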
Semantic Configuration
Defines:
- Title field
- Content field
- Keywords
Used by semantic ranker.
5️⃣ Ingestion Models
1️⃣ On-Demand (Synchronous)
User triggers ingestion.
Pros:
- Immediate availability
Cons:
- High latency
- Cost spikes
- Poor scalability
Best for:
- Small POCs
2️⃣ Batch Ingestion (Offline)
Scheduled indexers.
Pros:
- Predictable cost
- Scalable
- Controlled
Best for:
- Large document corpus
3️⃣ Event-Driven Ingestion
Triggered by:
- Blob events
- Cosmos change feed
- Event Grid
Flow:
Event → Function → Push API → Index update
Best for:
- Near real-time updates
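In the Event → Function → Push API flow above, the Function posts a batch to the documents endpoint (POST /indexes/{index}/docs/index). A sketch of the payload shape; the action verb mergeOrUpload updates a document if it exists and creates it otherwise:

```python
def build_push_payload(docs: list[dict], action: str = "mergeOrUpload") -> dict:
    """Body for POST /indexes/{index}/docs/index?api-version=...
    Each document carries an @search.action verb (upload, merge,
    mergeOrUpload, or delete)."""
    return {"value": [{"@search.action": action, **doc} for doc in docs]}

# Toy document; real payloads would include the vector field too.
payload = build_push_payload([{"id": "doc-123", "content": "updated text"}])
```

Batching multiple changed documents into one push call keeps per-event overhead down in high-volume event-driven pipelines.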
4️⃣ Hybrid Ingestion
Combine:
- Batch for large sets
- Event-driven for updates
- On-demand for small uploads
Most enterprise systems use hybrid.
6️⃣ Index Update Strategies
Azure AI Search does NOT automatically reindex when source data changes.
Use:
1️⃣ Scheduled Indexers
Supports incremental indexing.
2️⃣ Push API
Direct document upload/update.
3️⃣ Event Grid + Azure Function
Simulates trigger-based indexing (there is no native change trigger).
Zero-Downtime Schema Changes
Use Index Aliases:
- Create new index (v2)
- Backfill
- Switch alias
- Delete old index
Never modify schema in-place in production.
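The alias cutover above maps to a small, ordered sequence of REST calls. A sketch; the alias body shape ({"name", "indexes"}) follows the aliases API, but the exact api-version to use is an assumption to verify for your service:

```python
def alias_swap_plan(alias: str, old_index: str, new_index: str) -> list:
    """Ordered REST steps for a zero-downtime schema change via an index alias.
    Queries always target the alias, so step 3 is the atomic cutover."""
    return [
        ("PUT", f"/indexes/{new_index}", "create new index with new schema"),
        ("POST", f"/indexes/{new_index}/docs/index", "backfill documents"),
        ("PUT", f"/aliases/{alias}", {"name": alias, "indexes": [new_index]}),
        ("DELETE", f"/indexes/{old_index}", "remove old index after cutover"),
    ]

plan = alias_swap_plan("docs", "docs-v1", "docs-v2")
```

Because clients query the alias rather than a concrete index name, step 3 swaps traffic without any client-side change.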
7️⃣ Retrieval Strategy in RAG
Typical Query Flow:
- Generate query embedding
- Execute hybrid search
- Apply filters
- Retrieve Top-K
- Inject into prompt
Top-K Tuning
Too low:
- Miss context
Too high:
- Context overflow
- Noise
Tune experimentally.
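One practical tuning approach: cap Top-K by the prompt's context budget rather than picking a fixed number. A sketch using a rough 4-characters-per-token heuristic (an assumption; swap in a real tokenizer in production):

```python
def fit_top_k(chunks, token_budget, est_tokens=lambda s: len(s) // 4):
    """Greedily keep the highest-ranked chunks that fit the context budget.
    `chunks` must already be ordered by search score (best first)."""
    kept, used = [], 0
    for chunk in chunks:
        cost = est_tokens(chunk)
        if used + cost > token_budget:
            break  # everything below this rank is dropped
        kept.append(chunk)
        used += cost
    return kept

chunks = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each
print(len(fit_top_k(chunks, token_budget=250)))  # → 2
```

This turns "too high: context overflow" into a hard invariant instead of a tuning accident.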
8️⃣ Common Failure Modes
1️⃣ Poor Chunking
Fragmented or noisy context.
2️⃣ Low Retrieval Recall
Relevant docs not retrieved.
3️⃣ High Hallucination Despite Retrieval
Often prompt or context overflow issue.
4️⃣ Stale Index
Ingestion pipeline failed.
5️⃣ Schema Drift
Embedding dimension mismatch.
9️⃣ Document-Level RBAC (CRITICAL)
Security must be enforced at retrieval layer.
Implementation
Add metadata fields:
- tenantId
- department
- role
- securityGroupIds
Mark as:
filterable: true
Apply filter in query:
opts.Filter = $"tenantId eq '{userTenant.Replace("'", "''")}'"; // double single quotes to prevent filter injection
Why This Matters
LLM is not a security boundary.
Unauthorized documents must never be retrieved.
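Building the filter safely matters as much as applying it: user-supplied values must be escaped, and group membership maps to the OData search.in function over a collection field. A sketch; the field names (tenantId, securityGroupIds) are the metadata assumptions from above:

```python
def escape_odata(value: str) -> str:
    """OData string literals escape a single quote by doubling it."""
    return value.replace("'", "''")

def security_filter(tenant_id: str, group_ids: list[str]) -> str:
    """Build the $filter applied on EVERY query so only documents the
    caller may see are retrievable. Field names are assumptions."""
    groups = ",".join(escape_odata(g) for g in group_ids)
    return (
        f"tenantId eq '{escape_odata(tenant_id)}' "
        f"and securityGroupIds/any(g: search.in(g, '{groups}', ','))"
    )

print(security_filter("mankind", ["g1", "g2"]))
```

Centralizing this in one function makes it auditable: there is exactly one place where the security filter can be constructed, and no query path can skip it.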
🔟 Performance Considerations
Retrieval Latency Factors
- Semantic ranking cost
- Vector similarity cost
- Top-K size
- Index size
- Replica/partition count
Scaling
Scale with:
- Replicas (query throughput)
- Partitions (index size)
1️⃣1️⃣ Cost Optimization
Costs come from:
- Index size
- Semantic ranker usage
- Vector search
- Embedding generation
- Replica count
Optimize by:
- Reducing Top-K
- Using smaller embeddings
- Caching embeddings
- Batch ingestion
1️⃣2️⃣ Observability
Monitor:
- Query latency
- Search score distribution
- Top-K usage
- Indexer failures
- Replica usage
- Throttling
Use:
- Azure Monitor
- Application Insights
1️⃣3️⃣ Enterprise Patterns
Multi-Tenant Strategy
Option A:
- Separate index per tenant
Option B:
- Shared index + metadata filters
Most SaaS platforms use a shared index with per-tenant filters.
Metadata Filtering + Vector
Combine:
Vector search + filter condition
Prevents cross-tenant leakage.
Security Trimming
Always apply RBAC filter before LLM call.
1️⃣4️⃣ RAG Evaluation with AI Search
Measure:
Retrieval Metrics
- Recall@K
- Precision
- MRR
Generation Metrics
- Faithfulness
- Groundedness
- Hallucination rate
Evaluate retrieval and generation separately.
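The retrieval metrics above are straightforward to compute offline against a labeled query set. A self-contained sketch of Recall@K and MRR with toy data:

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def mrr(queries: list) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit.
    `queries` is a list of (retrieved_ids, relevant_id_set) pairs."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

print(recall_at_k(["a", "b", "c"], {"a", "d"}, k=3))      # → 0.5
print(mrr([(["x", "a"], {"a"}), (["a"], {"a"})]))          # → 0.75
```

Running these against a fixed query set before and after an index or chunking change gives a regression signal that is independent of the LLM.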
🔥 Core Mental Models
Retrieval quality determines generation quality.
Enforce security at search layer, not LLM layer.
Ingestion and query pipelines must scale independently.
Hybrid search is almost always superior to pure vector.
Index versioning prevents downtime.
