- What is MongoDB?
- Why use MongoDB?
- What are the Advantages of MongoDB?
- When Should You Use MongoDB?
- How exactly does MongoDB store data?
- MongoDB vs. RDBMS: What are the differences?
- How does MongoDB scale?
- How does MongoDB scale horizontally?
- What are the advantages of sharding?
- What methods can we use for sharding in MongoDB?
- How do atomicity and transactions work in MongoDB?
- What is an Aggregation Pipeline in MongoDB?
- Where should I use an index in MongoDB?
- How would you choose an indexing strategy in MongoDB, and what are some common considerations?
MongoDB is a document database built on a horizontal scale-out architecture that uses a flexible schema for storing data. Founded in 2007, MongoDB has a worldwide following in the developer community.
As a document database, MongoDB makes it easy for developers to store structured or unstructured data. It uses a JSON-like format to store documents. This format maps directly to native objects in most modern programming languages, making it a natural choice for developers, as they don’t need to think about normalizing data. MongoDB can also handle high volume and can scale either vertically or horizontally to accommodate large data loads.
- A Powerful Document-Oriented Database
- Developer User Experience
- Scalability and Transactionality
- Platform and Ecosystem Maturity
- Integrating large amounts of diverse data
- Describing complex data structures that evolve
- Delivering data in high-performance applications
- Supporting hybrid and multi-cloud applications
- Supporting agile development and collaboration
In MongoDB, records are stored as documents in compressed BSON files. The documents can be retrieved directly in JSON format, which has many benefits:
- It is a natural form to store data.
- It is human-readable.
- Structured and unstructured information can be stored in the same document.
- You can nest JSON to store complex data objects.
- JSON has a flexible and dynamic schema, so adding fields or leaving a field out is not a problem.
- Documents map to objects in most popular programming languages.
Most developers find it easy to work with JSON because it is a simple and powerful way to describe and store data.
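As a rough illustration of the points above, here is a minimal Python sketch of a document with nested data. The `order` document and its field names are made up for this example, and plain `json` stands in for BSON, which is a binary format with extra types.

```python
import json

# Hypothetical order document; every field name here is illustrative only.
order = {
    "_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},  # nested object
    "items": [                                                 # array of sub-documents
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
    "notes": "Leave at the front desk",                        # free-form text
}

# Serialise to human-readable JSON text and parse it back.
as_json = json.dumps(order, indent=2)
restored = json.loads(as_json)

# Flexible schema: another document may leave a field out or add a new one.
order_without_notes = {k: v for k, v in order.items() if k != "notes"}
order_without_notes["gift_wrap"] = True

# The document maps directly onto native dicts and lists.
total = sum(item["qty"] * item["price"] for item in restored["items"])
```

The nested `customer` object and the `items` array travel with the order as one unit, which is what makes the document model feel close to the objects an application already uses.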
One of the main differences between MongoDB and an RDBMS is that an RDBMS is relational while MongoDB is non-relational. Likewise, while most RDBMSs use SQL to query and manage data, MongoDB uses its own query language (MQL) and stores documents as BSON.
In an RDBMS, data lives in tables made of rows and columns; in MongoDB, it lives in collections made of documents and fields. A table is the equivalent of a MongoDB collection, a row is the equivalent of a document, and a column, which holds one kind of value for every row, is the equivalent of a field.
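To make the table/collection mapping concrete, the sketch below reshapes two relational-style tables into one document. The `customers` and `orders` data and the `to_document` helper are hypothetical, and embedding is only one of several possible modelling choices.

```python
# Relational shape: two tables linked by a foreign key (hypothetical data).
customers = [{"id": 1, "name": "Ada"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 39.0},
    {"id": 11, "customer_id": 1, "total": 12.5},
]

def to_document(customer, order_rows):
    """Embed a customer's order rows inside a single customer document."""
    return {
        "_id": customer["id"],
        "name": customer["name"],
        "orders": [
            {"order_id": o["id"], "total": o["total"]}
            for o in order_rows
            if o["customer_id"] == customer["id"]
        ],
    }

# One document now holds what the relational model spreads over two tables.
customer_doc = to_document(customers[0], orders)
```

Reading `customer_doc` needs no join, at the cost of the document growing as orders are added.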
MongoDB scales the same way any distributed application does: either throw more resources at a single server (vertical scaling) or distribute the data across multiple physical servers or nodes (horizontal scaling).
In other words:
Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space.
Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding additional servers to increase capacity as required.
While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed high-capacity server.
MongoDB supports horizontal scaling through sharding.
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
A MongoDB sharded cluster consists of the following components:
- shard: Each shard contains a subset of the sharded data. Each shard must be deployed as a replica set.
- mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster. mongos can support hedged reads to minimize latencies.
- config servers: Config servers store metadata and configuration settings for the cluster. As of MongoDB 3.4, config servers must be deployed as a replica set (CSRS).
The following graphic describes the interaction of components within a sharded cluster:
MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.
MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. Both read and write workloads can be scaled horizontally across the cluster by adding more shards.
For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards. These targeted operations are generally more efficient than broadcasting to every shard in the cluster.
mongos can support hedged reads to minimize latencies.
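The difference between a targeted and a broadcast query can be sketched with a toy router. This is not how mongos is implemented; the shard names, the `user_id` shard key, and the range layout are all assumptions made for the example.

```python
# Hypothetical ranged layout on the shard key "user_id": [low, high) per shard.
SHARD_RANGES = {
    "shardA": (0, 1000),
    "shardB": (1000, 2000),
    "shardC": (2000, 3000),
}

def route(query, shard_key="user_id"):
    """Return the shards a router would need to contact for an equality query."""
    if shard_key in query:
        value = query[shard_key]
        return [name for name, (lo, hi) in SHARD_RANGES.items() if lo <= value < hi]
    # No shard key in the filter: every shard has to be asked.
    return list(SHARD_RANGES)

targeted = route({"user_id": 1500, "status": "active"})   # contacts one shard
broadcast = route({"status": "active"})                   # contacts all shards
```

With the shard key in the filter only `shardB` is contacted; without it, all three shards do work for the same query.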
Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. As the data set grows, additional shards increase the storage capacity of the cluster.
The deployment of config servers and shards as replica sets provides increased availability.
Even if one or more shard replica sets become completely unavailable, the sharded cluster can continue to perform partial reads and writes. That is, while data on the unavailable shard(s) cannot be accessed, reads or writes directed at the available shards can still succeed.
MongoDB supports two sharding strategies for distributing data across sharded clusters.
Hashed Sharding involves computing a hash of the shard key field's value. Each chunk is then assigned a range based on the hashed shard key values.
Ranged sharding involves dividing data into ranges based on the shard key values. Each chunk is then assigned a range based on the shard key values.
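The two strategies can be compared with a small simulation. The shard names and range boundaries are invented, and MD5 modulo the shard count is only a stand-in: it is not MongoDB's actual hash function or chunk layout.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shardA", "shardB", "shardC"]  # hypothetical shard names

def ranged_shard(key, boundaries=(100, 200)):
    """Ranged strategy: key < 100 -> shardA, 100..199 -> shardB, >= 200 -> shardC."""
    return SHARDS[bisect_right(boundaries, key)]

def hashed_shard(key):
    """Hashed strategy: hash the key first, then map the hash onto a shard.
    MD5 modulo the shard count is an assumption of this sketch."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Monotonically increasing keys, e.g. an auto-incrementing id.
keys = list(range(1000, 1030))
ranged_targets = {ranged_shard(k) for k in keys}   # all land in the last range
hashed_targets = {hashed_shard(k) for k in keys}   # spread by hash value
```

Under these assumptions, ranged sharding keeps neighbouring keys together, which suits range queries but sends every new increasing key to the same shard. Hashing tends to spread those writes across shards, at the cost of scattering adjacent keys.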
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic.
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports distributed transactions, including transactions on replica sets and sharded clusters.
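The distinction between per-document atomicity and a multi-document transaction can be illustrated with a toy in-memory model. It does not use MongoDB's API and says nothing about how MongoDB implements transactions; `update_many`, `in_transaction`, and the `fail_at` parameter are hypothetical names for this sketch.

```python
import copy

def update_many(collection, update, fail_at=None):
    """Apply `update` to each document in turn. Each document changes
    all-or-nothing, but a failure part-way leaves earlier documents updated."""
    for i, doc in enumerate(collection):
        if i == fail_at:
            raise RuntimeError("simulated failure")
        new_doc = copy.deepcopy(doc)
        update(new_doc)            # build the new version off to the side
        collection[i] = new_doc    # single-document swap: the atomic step

def in_transaction(collection, update, fail_at=None):
    """All-or-nothing across documents: work on a snapshot, commit on success."""
    snapshot = copy.deepcopy(collection)
    update_many(snapshot, update, fail_at)
    collection[:] = snapshot       # commit; never reached if an error was raised

def add_bonus(doc):
    doc["balance"] += 10
    doc["history"].append("bonus")  # embedded array changes in the same write

accounts = [{"balance": 0, "history": []} for _ in range(3)]
try:
    update_many(accounts, add_bonus, fail_at=2)
except RuntimeError:
    pass
partial = [a["balance"] for a in accounts]          # first two updated, third not

accounts_tx = [{"balance": 0, "history": []} for _ in range(3)]
try:
    in_transaction(accounts_tx, add_bonus, fail_at=2)
except RuntimeError:
    pass
rolled_back = [a["balance"] for a in accounts_tx]   # nothing was committed
```

In the first run no document is ever half-updated (balance and history change together), yet the operation as a whole stops midway. In the second run the failure leaves every document untouched.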
An aggregation pipeline consists of one or more stages that process documents:
- Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values.
- The documents that are output from a stage are passed to the next stage.
- An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.
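The stage-by-stage flow described above can be mimicked in plain Python. This is a conceptual sketch, not the MongoDB aggregation API: `match`, `group_stats`, and `run_pipeline` are hypothetical helpers and the order data is invented.

```python
from collections import defaultdict

def match(predicate):
    """Stage that keeps only the documents satisfying `predicate`."""
    return lambda docs: [d for d in docs if predicate(d)]

def group_stats(key, field):
    """Stage that groups by `key` and computes total/avg/max/min of `field`."""
    def stage(docs):
        buckets = defaultdict(list)
        for d in docs:
            buckets[d[key]].append(d[field])
        return [
            {"_id": k, "total": sum(v), "avg": sum(v) / len(v),
             "max": max(v), "min": min(v)}
            for k, v in buckets.items()
        ]
    return stage

def run_pipeline(docs, stages):
    for stage in stages:
        docs = stage(docs)   # output of one stage is the input of the next
    return docs

orders = [
    {"size": "medium", "name": "Pepperoni", "qty": 20},
    {"size": "medium", "name": "Cheese", "qty": 50},
    {"size": "small", "name": "Vegan", "qty": 10},
    {"size": "medium", "name": "Cheese", "qty": 30},
]

result = run_pipeline(orders, [
    match(lambda d: d["size"] == "medium"),   # filter stage
    group_stats("name", "qty"),               # group-and-calculate stage
])
```

The filter runs first, so the grouping stage only ever sees medium orders; swapping the stage order would change both the work done and the result.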
If your application is repeatedly running queries on the same fields, you can create an index on those fields to improve performance.
Although indexes improve query performance, adding an index has a negative performance impact on write operations. For collections with a high write-to-read ratio, indexes are expensive because each insert must also update any indexes.
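The read/write trade-off can be seen in a toy collection that counts how many documents a query examines and how much extra work each insert does. The `Collection` class below is a hypothetical in-memory model using a dictionary per index, not MongoDB's B-tree indexes.

```python
from collections import defaultdict

class Collection:
    """Toy collection with optional single-field indexes."""

    def __init__(self):
        self.docs = []
        self.indexes = {}        # field name -> {value: [positions]}
        self.index_writes = 0    # extra index updates caused by inserts

    def create_index(self, field):
        index = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            index[doc.get(field)].append(pos)
        self.indexes[field] = index

    def insert(self, doc):
        self.docs.append(doc)
        for field, index in self.indexes.items():   # every index must be updated
            index[doc.get(field)].append(len(self.docs) - 1)
            self.index_writes += 1

    def find(self, field, value):
        """Return (matching documents, number of documents examined)."""
        if field in self.indexes:
            positions = self.indexes[field].get(value, [])
            return [self.docs[p] for p in positions], len(positions)
        return [d for d in self.docs if d.get(field) == value], len(self.docs)

users = Collection()
for i in range(1000):
    users.insert({"user_id": i, "country": "NL" if i % 100 == 0 else "US"})

_, examined_without_index = users.find("country", "NL")   # full scan
users.create_index("country")
_, examined_with_index = users.find("country", "NL")      # index lookup
users.insert({"user_id": 1000, "country": "NL"})          # now costs an index write
```

In this model the indexed query examines only the matching documents, while each later insert pays one extra write per index on the collection.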
How would you choose an indexing strategy in MongoDB, and what are some common considerations?
The best indexes for your application must take a number of factors into account, including the kinds of queries you expect, the ratio of reads to writes, and the amount of free memory on your system.
The best overall strategy for designing indexes is to profile a variety of index configurations with data sets similar to the ones you'll be running in production to see which configurations perform best. Inspect the current indexes created for your collections to ensure they are supporting your current and planned queries. If an index is no longer used, drop the index.
These are some considerations you need to take for your indexing strategy:
- Use the ESR (Equality, Sort, Range) Rule: The ESR rule is a guide to creating indexes that support your queries efficiently.
- Create Indexes to Support Your Queries: An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support queries results in greatly increased query performance.
- Use Indexes to Sort Query Results: To support efficient queries, pay attention to both the sequential order and the sort order of index fields.
- Ensure Indexes Fit in RAM: When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing.
- Create Indexes to Ensure Query Selectivity: Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
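As a small illustration of the ESR rule from the list above, the helper below orders the fields of a compound index as equality fields first, then sort fields, then range fields. `esr_index` is a hypothetical helper, the query shape is invented, and defaulting equality and range fields to ascending order is an assumption of the sketch.

```python
def esr_index(equality, sort, range_fields):
    """Order compound-index keys as Equality, then Sort, then Range.

    `sort` is a list of (field, direction) pairs, where 1 means ascending
    and -1 descending. Equality and range fields default to ascending.
    """
    keys = [(f, 1) for f in equality]
    keys += [(f, d) for f, d in sort if f not in equality]
    seen = {f for f, _ in keys}
    keys += [(f, 1) for f in range_fields if f not in seen]
    return keys

# Hypothetical query shape:
#   manufacturer == "GTX", cost > 30000, results sorted by model ascending.
index_keys = esr_index(
    equality=["manufacturer"],
    sort=[("model", 1)],
    range_fields=["cost"],
)
```

The result is a list of (field, direction) pairs, the same shape that drivers such as PyMongo accept when creating a compound index. Whether this ordering is actually the best choice for a given workload still needs to be checked by profiling, as described earlier.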