Sharding refers to the process of splitting data up across machines, the term partitioning is also sometimes used to describe the concept. By putting a subset of data on each machine, it becomes possible to store more data and handle more load without requiring large or more powerful machines, instead just a large quantity of less-powerful machines.

What sharding is?

Sharding is the method MongoDB uses to split a large collection across several servers. MongoDB’s sharding allows you to create a cluster of many machines (shards) and break up your collection across them, putting a subset of data on each shard.

Simple goals of sharding

Make cluster “invisible”

One of the goals of sharding is to make a cluster machine look like a single machine to your application. To accomplish this, MongoDB comes with a special routing process called mongos. Mongos sits in front of your cluster and looks like an ordinary mongod server to anything that connects to it. This router keeps a “table of content” that tells it which shard contains which data. Application can connect to this router and issue request normally. Knowing what data is on which shard, mongos is able to forward the request to the appropriate shards.

Make the cluster always available

A cluster can’t guarantee it’ll always be available but MongoDB ensures maximum uptime in a couple different ways. Every part of a cluster can and should have atleast some redundant processes running on other machines so that if one machine goes down, the other one can automatically pick up the slack.

Let the cluster grow easily

As the system needs more space or resources, you should be able to add them. With MongoDB you can add as much capacity as you need it.

How sharding works?

To understand how sharding works you must understand the basic components of sharding which are explained as follows:

Sharding Components

A sharded cluster consists of shards, mongos routers and config servers.
Sharding Components
Sharded cluster consists of shards, mongos routers and config servers.

Shards

A shard is a replica set or a single mongod that contains a subset of the data for the sharded cluster. Together, the cluster’s shards hold the entire data set of the cluster.

Mongos Routers

If each shard contains part of the cluster’s data, then you still need an interface to the cluster as a whole. The mongos process is a router that directs all reads and writes to the appropriate shards. Mongos processes are lightweight and non-persistent.

Config Servers

As Mongos processes are non-persistent, then something must durably store the shard cluster’s, that’s the job of the config servers. The config servers persist the shard cluster’s metadata.

Sharding Components

Sharding in MonogDB allows the clusters to be easy to use and easy to administrate and it allows your application to grow easily, robustly and naturally as far as it needs to.