Azure Cosmos DB

| Aspect | Details |
|---|---|
| Key Functions | Globally distributed multi-model NoSQL database with tunable consistency |
| Pricing Basis | Request Units (RUs) consumed |
| Scalability | Global distribution, multi-region writes |
| Security & IAM | Encryption, RBAC, firewall, private link |
| Deployment & Ease of Use | Multi-model API support, portal interface, and SDKs |

1. What is Azure Cosmos DB?

Azure Cosmos DB is Microsoft's fully managed, globally distributed, multi-model NoSQL database service designed for high availability, low latency, elastic scalability, and rich querying for modern application needs.
  • It supports multiple data models: document, key-value, graph, column-family.
  • Offers turnkey global distribution across Azure regions with multi-master writes.
  • Provides comprehensive SLAs covering throughput, latency, availability, and consistency.
  • Enables schema-agnostic, JSON-based data storage with automatic indexing.
  • Supports multiple APIs (Core SQL, MongoDB, Cassandra, Gremlin, Table).

2. Core Architecture and Resource Model

  • Database Account: The top-level resource managing global distribution and API type.
  • Databases: Logical namespaces that group one or more containers.
  • Containers (collections, tables, graphs): Store user data in schema-agnostic documents or records.
  • Items: Individual data entries (e.g., JSON documents).
  • Partitions: Physical partitions split data horizontally using a partition key for scalability.
    • Each physical partition can store up to 50 GB and handle 10,000 request units per second (RU/s).
    • Logical partitions store data grouped by partition key, up to 20 GB each.
  • Throughput: Provisioned in Request Units (RU/s), a normalized measure of CPU, IOPS, and memory needed per operation.
  • Multi-master / Multi-region replication for active-active globally distributed apps.
  • Consistency Models: Five well-defined levels from strong to eventual consistency with tradeoffs between latency and availability.

3. Data Models and APIs Supported

| Model | API | Use Case |
|---|---|---|
| Document | Core (SQL) API, MongoDB API | JSON documents, flexible schema |
| Key-Value | Table API | Simple key-value stores |
| Graph | Gremlin API | Graph traversals, social networks, recommendations |
| Column-Family | Cassandra API | Wide-column stores, time series |

4. Key Features

  • Global Distribution: Add multiple Azure regions and replicate data transparently.
  • Elastic Scalability: Scale throughput (RU/s) and storage independently.
  • Multi-API Support: Develop using familiar APIs like MongoDB, Cassandra.
  • Automatic Indexing: Schema-agnostic indexing with customizable policies.
  • Comprehensive SLAs: Covers latency (<10 ms reads, <15 ms writes at the 99th percentile), throughput, and availability (99.99% single region, 99.999% multi-region).
  • Multi-master Writes: Concurrent writes across regions with Conflict Resolution.
  • Consistency Levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual.
  • Integration: Azure Functions, Logic Apps, Synapse, Azure Search.
  • Change Feed: An ordered stream of item changes for event-driven architectures.
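
The change feed can be consumed with the .NET SDK's change feed processor. A minimal sketch, assuming a monitored container named `Items`, a pre-created lease container named `leases` (partitioned on `/id`), and illustrative processor/instance names; it cannot run without a live Cosmos DB account:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<your-endpoint>", "<your-key>");
Database db = client.GetDatabase("SampleDB");

// The container being watched and a lease container that tracks progress
Container monitored = db.GetContainer("Items");
Container leases = db.GetContainer("leases");

ChangeFeedProcessor processor = monitored
    .GetChangeFeedProcessorBuilder<dynamic>(
        processorName: "itemProcessor",
        onChangesDelegate: (IReadOnlyCollection<dynamic> changes, CancellationToken ct) =>
        {
            foreach (var change in changes)
                Console.WriteLine($"Changed item: {change.id}");
            return Task.CompletedTask;
        })
    .WithInstanceName("worker-1")
    .WithLeaseContainer(leases)
    .Build();

await processor.StartAsync();
```

Multiple instances sharing the same lease container automatically split partitions between them, which is how the processor scales out.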

5. Pricing and Scaling

  • Pricing Model:
    • Provisioned Request Units per second (RU/s); you are billed for what you provision, whether or not it is fully consumed.
    • Storage size (GB) used for data and indexes.
    • Number of regions configured for replication.
  • Autoscale: Dynamic RU/s scaling based on load, capped at max RU/s.
  • Serverless Mode: Pay per RU consumed with no provisioned capacity; suitable for spiky or infrequent workloads.
  • Multi-region writes: Additional charges for replication traffic.
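
Autoscale throughput can be set when a container is created. A minimal sketch using the .NET SDK (database/container names and the 4,000 RU/s ceiling are illustrative assumptions; requires a live account):

```csharp
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<your-endpoint>", "<your-key>");
Database db = await client.CreateDatabaseIfNotExistsAsync("SampleDB");

// Autoscale between 10% of the maximum (400 RU/s) and the maximum (4,000 RU/s)
ThroughputProperties autoscale = ThroughputProperties.CreateAutoscaleThroughput(4000);
Container container = await db.CreateContainerIfNotExistsAsync(
    new ContainerProperties(id: "Items", partitionKeyPath: "/partitionKey"),
    autoscale);
```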

6. Development

  • SDKs available for .NET, Java, Python, JavaScript/Node.js, Go, and others.
  • Rich query support with SQL-like syntax (for Core SQL API).
  • Support for stored procedures, triggers, user-defined functions (server-side JavaScript).
  • Use Cosmos DB Emulator for local development.

Simple Sample: Create and Query Container (C#)

```csharp
using System;
using Microsoft.Azure.Cosmos;

string endpoint = "<your-endpoint>";
string key = "<your-key>";
string databaseId = "SampleDB";
string containerId = "Items";
int throughput = 400;

using CosmosClient client = new CosmosClient(endpoint, key);

// Create database and container if they do not already exist
Database db = await client.CreateDatabaseIfNotExistsAsync(databaseId);
Container container = await db.CreateContainerIfNotExistsAsync(
    containerId, "/partitionKey", throughput);

// Insert an item
dynamic item = new { id = "1", partitionKey = "users", name = "Cosmos User" };
await container.CreateItemAsync(item, new PartitionKey(item.partitionKey));

// Query items
var query = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c WHERE c.partitionKey = 'users'");
while (query.HasMoreResults)
{
    foreach (var result in await query.ReadNextAsync())
    {
        Console.WriteLine(result.name);
    }
}
```

7. Security and IAM

  • Supports Azure Active Directory (Azure AD) authentication.
  • Role-Based Access Control (RBAC) integrated with Azure AD.
  • Resource tokens: scoped permissions for clients.
  • Network restrictions via Virtual Network Service Endpoints, Private Endpoints (Azure Private Link).
  • Data encrypted at rest by default using Microsoft-managed keys; optional Customer Managed Keys (CMK).
  • Encryption in transit with TLS.
  • Advanced Threat Protection and Firewall rules.
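
Key-based authentication can be replaced with Azure AD by passing a token credential to the client. A minimal sketch, assuming the `Azure.Identity` package is installed and the caller has a Cosmos DB RBAC role assignment on the account:

```csharp
using Azure.Identity;
using Microsoft.Azure.Cosmos;

// DefaultAzureCredential picks up managed identity, CLI login, etc.,
// so no account keys are stored in application configuration
var credential = new DefaultAzureCredential();
CosmosClient client = new CosmosClient("<your-endpoint>", credential);
```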

8. Consistency Models Explained

| Consistency Level | Characteristics | Use Case |
|---|---|---|
| Strong | Linearizability; reads always see the latest writes | Mission-critical financial and inventory apps |
| Bounded Staleness | Reads lag behind writes by a configurable time or version window | Geo-distributed apps needing predictable staleness bounds |
| Session | Monotonic reads/writes within a client session | Interactive user apps requiring consistent reads per session |
| Consistent Prefix | Reads never see out-of-order writes | Messaging, social networking with causal ordering |
| Eventual | No ordering guarantees; lowest latency | High-throughput, relaxed-consistency scenarios |
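
The default consistency level is set on the account, but a client can request a weaker level per connection. A minimal sketch with the .NET SDK (endpoint and key are placeholders):

```csharp
using Microsoft.Azure.Cosmos;

// A client may relax, but never strengthen, the account's
// default consistency level
CosmosClient client = new CosmosClient(
    "<your-endpoint>",
    "<your-key>",
    new CosmosClientOptions { ConsistencyLevel = ConsistencyLevel.Session });
```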

9. Deployment and Configuration

  • Provision Cosmos DB accounts with desired APIs, region(s), and consistency.
  • Configure throughput at the database or container level.
  • Manage indexing policies, Time to Live (TTL) on items, and analytical store (Synapse Link).
  • Set up multi-region writes and failover priorities.
  • Use ARM templates, Azure CLI, PowerShell, or Azure Portal for deployment automation.
  • Monitor via Azure Monitor, Cosmos DB metrics, and diagnostic logs.

10. Advanced Features

  • Change Feed: Stream item-level changes for real-time event processing.
  • Synapse Link: Analytics store for near real-time analytics with Azure Synapse without ETL.
  • Multi-master: Provide write availability and low latency globally.
  • Backup & Restore: Automatic backups with restore capabilities.
  • Time to Live (TTL): Auto-delete expired data.
  • Custom Conflict Resolution: When multi-master conflicts occur, custom policies can be used.
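
A conflict resolution policy is declared on the container at creation time. A minimal sketch showing last-writer-wins on the server timestamp path (database/container names are illustrative assumptions; a custom path or a custom merge stored procedure can be configured instead):

```csharp
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<your-endpoint>", "<your-key>");
Database db = await client.CreateDatabaseIfNotExistsAsync("SampleDB");

// Resolve write conflicts by keeping the version with the
// highest _ts (server timestamp) value
var props = new ContainerProperties("Items", "/partitionKey")
{
    ConflictResolutionPolicy = new ConflictResolutionPolicy
    {
        Mode = ConflictResolutionMode.LastWriterWins,
        ResolutionPath = "/_ts"
    }
};
Container container = await db.CreateContainerIfNotExistsAsync(props);
```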

11. Monitoring and Troubleshooting

  • Monitor RU consumption, latency, availability using Azure metrics.
  • Use diagnostic logs for detailed operation insights.
  • Cosmos DB account health and throughput metrics in Azure Monitor.
  • Query and indexing performance analysis.
  • Alerts and autoscale configuration for cost/performance management.

12. Summary Table: Azure Cosmos DB

| Service | Key Functions | Pricing Basis | Scalability | Security & IAM | Deployment & Ease of Use |
|---|---|---|---|---|---|
| Azure Cosmos DB | Globally distributed, multi-model NoSQL database with turnkey global distribution and guaranteed low latency | Provisioned request units + storage | Elastic RU/s scaling, multi-region writes | Azure AD, RBAC, private endpoints, encryption at rest/in transit | Deploy with Portal, CLI, ARM; supports multiple APIs/SDKs |

FAQ

Q1: What is Azure Cosmos DB?
A: Azure Cosmos DB is a fully managed, globally distributed, multi-model NoSQL database service designed for high availability, low latency, and elastic scalability. It supports multiple data models including document, key-value, graph, and column-family, and provides turnkey global distribution with multi-master writes and comprehensive SLAs for throughput, latency, availability, and consistency.
Q2: What data models and APIs does Cosmos DB support?
A: Cosmos DB supports:
  • Document Model: via Core (SQL) API and MongoDB API (JSON documents)
  • Key-Value Model: via Table API
  • Graph Model: via Gremlin API for graph traversals
  • Column-Family Model: via Cassandra API
This allows developers to use familiar tools and access patterns per application needs.
Q3: How does Cosmos DB achieve global distribution?
A: Cosmos DB lets you replicate your data across any number of Azure regions with a single click. It offers active-active multi-master replication allowing writes in all regions, with low-latency global reads and configurable failover priorities for disaster recovery.
Q4: Explain the throughput model of Cosmos DB.
A: Cosmos DB throughput is provisioned in Request Units per second (RU/s), a normalized measure of compute, I/O, and memory to serve requests. You can provision RU/s at database or container level, and can scale manually or use autoscale mode. Serverless mode provides pay-per-request billing for spiky applications.
Q5: Describe the consistency levels available in Cosmos DB.
A: Cosmos DB offers five consistency levels balancing availability and latency:
  • Strong: Linearizability; reads always see the latest writes.
  • Bounded Staleness: Reads lag behind writes by some staleness window.
  • Session: Guarantees monotonic reads/writes within a client session.
  • Consistent Prefix: Reads see writes in order but possibly delayed.
  • Eventual: No ordering guarantees; lowest latency and highest availability.

Intermediate Interview Questions and Answers

Q6: What is partitioning in Cosmos DB and why is it important?
A: Partitioning splits data horizontally across multiple physical partitions. Each partition handles a subset of the data based on a partition key. This allows Cosmos DB to scale throughput and storage elastically. Choosing an effective partition key that distributes workload evenly is critical for performance.
Q7: What is the Azure Cosmos DB Change Feed?
A: The Change Feed provides a persistent record of changes (adds and updates) to items within a container, ordered by time. It enables building event-driven architectures, real-time streaming, and data replication pipelines.
Q8: How does Cosmos DB handle indexing?
A: Cosmos DB automatically indexes all data by default with no schema or index management required. You can customize indexing policies per container to exclude specific paths or indexes or to optimize for specific query patterns.
Q9: What are Cosmos DB's multi-master capabilities?
A: Multi-master (multi-region writes) enables multiple regions to accept writes simultaneously, providing low-latency writes worldwide, with conflict resolution policies to handle concurrent updates.
Q10: How can you secure data in Azure Cosmos DB?
A: Cosmos DB supports Azure AD authentication with RBAC, resource tokens for fine-grained access, encryption at rest using Microsoft-managed or customer-managed keys, TLS for data in transit, private endpoints (Azure Private Link), and IP firewall rules.

Advanced and Tricky Interview Questions and Answers

Q11: What best practices would you follow when choosing a partition key?
A: A good partition key distributes data and request volume evenly across partitions to avoid hotspots, aligns with query patterns and transactional boundaries, keeps logical partitions below the 20 GB limit, and supports scalable throughput. Common poor choices include monotonically increasing values (such as timestamps).
Q12: How does Cosmos DB ensure ACID transactions?
A: Cosmos DB provides ACID transactions scoped to a single logical partition. It supports transactional batch operations (Create, Update, Delete) on multiple items within the same partition key, ensuring atomicity and isolation.
Q13: How would you handle hot partitions in Cosmos DB?
A: Identify partition keys causing skewed traffic and redesign the data model or partition strategy to flatten the load. Implement retries with exponential backoff on throttled requests, and scale up RU/s or enable autoscale. Synthetic or composite partition keys can also help spread the load.
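
The .NET SDK already retries rate-limited (HTTP 429) requests with backoff; the retry budget is tuned through client options. A minimal sketch (the specific limits are illustrative assumptions):

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Bound how often and how long the SDK retries 429 responses
// before surfacing the throttling error to the application
var options = new CosmosClientOptions
{
    MaxRetryAttemptsOnRateLimitedRequests = 9,
    MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
};
CosmosClient client = new CosmosClient("<your-endpoint>", "<your-key>", options);
```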
Q14: What challenges arise from selecting the wrong consistency level?
A: Using Strong consistency increases latency and reduces availability in geo-distributed systems; Eventual may return stale or out-of-order reads that violate application expectations. Picking an inappropriate level can affect throughput, latency, and correctness.
Q15: How does Cosmos DB conflict resolution work in multi-master?
A: It supports Last Writer Wins (timestamp-based) conflict resolution by default. You can also implement custom conflict handlers via server-side stored procedures or your application logic.
Q16: How do indexing policies impact provisioned throughput?
A: Complex or broad indexing increases RU consumption on write operations. Excluding large or rarely queried paths from indexing or switching to range/spatial indexes where needed optimizes RU costs and write latency.
Q17: How can you optimize performance and reduce RU consumption in Cosmos DB?
A: Use parameterized queries, leverage the appropriate consistency level, avoid cross-partition queries, project only necessary fields, apply indexing policies selectively, use server-side logic (stored procedures/triggers), and cache query results when feasible.

Scenario-Based Interview Questions and Answers

Q18: Design a globally distributed IoT data ingestion pipeline using Cosmos DB.
A: Use Cosmos DB with multi-region writes for low-latency ingestion globally. Select a partition key that includes device or geographic regions to distribute load. Enable the Change Feed to trigger Azure Functions for real-time processing and integrate with Synapse Link for analytics.
Q19: How do you migrate from a relational database to Cosmos DB?
A: Identify schema elements to map to Cosmos DB's schema-agnostic documents; design a suitable partition key; migrate and transform relational data into JSON documents; use bulk import tools; rework queries to Cosmos DB SQL API or relevant API; adapt for eventual consistency and scale.
Q20: How do you implement time to live (TTL) in Cosmos DB?
A: Enable TTL on the container by setting a default time-to-live, optionally overridden per item via a ttl property; expired items are deleted automatically. This is useful for caching, session storage, or logs to reduce storage and cost.
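
TTL configuration in the .NET SDK can be sketched as follows (the `Sessions` container, `/userId` key, and lifetimes are illustrative assumptions):

```csharp
using Microsoft.Azure.Cosmos;

CosmosClient client = new CosmosClient("<your-endpoint>", "<your-key>");
Database db = await client.CreateDatabaseIfNotExistsAsync("SampleDB");

// TTL is switched on at the container level: -1 enables it with no
// default expiry; a positive value is the default lifetime in seconds
var props = new ContainerProperties("Sessions", "/userId")
{
    DefaultTimeToLive = 3600 // items expire one hour after last write
};
Container container = await db.CreateContainerIfNotExistsAsync(props);

// Individual items can override the default via a "ttl" property
var session = new { id = "s1", userId = "u1", ttl = 600 }; // 10 minutes
await container.CreateItemAsync(session, new PartitionKey(session.userId));
```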