The storage engine is one of the key components of any database. It is the software module that a database management system uses to perform all storage-related operations, such as creating, reading, and updating data. Here, storage means both disk storage and memory storage. Choosing the right storage engine matters because it has a large impact on the performance of the database and of the applications built on it.
MongoDB has paid attention to this from the beginning. Its original storage engine was MMAPv1, which was designed to excel at high-volume inserts, reads, and in-place updates. However, as the need for a more robust database grew, MongoDB looked for better options. MMAPv1 lacked some key features: it did not support compression, it had no snapshots or checkpoints, and it provided concurrency control only at the collection level, whereas document-level concurrency control was needed. In addition, applications now have to support a variety of workloads, such as in-memory reads and writes and real-time analytics.
With version 3.0, MongoDB introduced a key feature: the “pluggable” storage engine. The storage API is written so that MongoDB can be configured with different storage engines of the developer's choice. This serves differing storage access requirements and reduces the complexity of running multiple databases, with the added advantage that more than one type of “pluggable” storage engine can coexist. However, even this did not seem sufficient to meet growing storage access requirements.
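The idea of a pluggable engine can be illustrated with a small sketch. Everything here is hypothetical: `StorageEngine`, `DictEngine`, `AppendLogEngine`, `Database`, and `open_database` are names invented for illustration and are not MongoDB's real storage API. The point is only that the layer above talks to one interface, so engines can be swapped by name.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical minimal storage API (not MongoDB's real interface)."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...


class DictEngine(StorageEngine):
    """Toy engine that keeps every record in a hash map."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AppendLogEngine(StorageEngine):
    """Toy engine that appends every write and scans backwards on read."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        self._log.append((key, value))

    def get(self, key):
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        return None


class Database:
    """The layer above the engine only uses the StorageEngine API."""

    def __init__(self, engine):
        self._engine = engine

    def insert(self, doc_id, doc):
        self._engine.put(doc_id, doc)

    def find(self, doc_id):
        return self._engine.get(doc_id)


# Registry of available engines, selected by name at startup.
ENGINES = {"dict": DictEngine, "log": AppendLogEngine}


def open_database(engine_name):
    return Database(ENGINES[engine_name]())
```

Calling `open_database("dict")` or `open_database("log")` returns a database that behaves the same from the caller's side, which is the property that lets different engines serve different workloads without changing application code.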
The game changer in storage engine technology for MongoDB came when it acquired WiredTiger, a high-performance, open source, extensible NoSQL platform for data management, on December 16, 2014. Besides “Encryption At Rest”, WiredTiger provides many features, including the much-needed document-level concurrency control, checkpointing, and compression. With this storage engine, MongoDB finally achieved what it required of its storage engine. Even so, the MongoDB and WiredTiger teams are constantly striving for further improvement.
On the strength of these qualities, WiredTiger replaced MMAPv1 as the default storage engine starting from version 3.2.
According to WiredTiger itself, this storage engine is designed to offer:
- Both low latency and high throughput (in-cache reads require no latching, writes typically require a single latch)
- Handles data sets much larger than RAM without performance or resource degradation
- Predictable behavior under heavy access and large volumes of data
- Transactional semantics without blocking
- Stores are not corrupted by torn writes, reverting to the last checkpoint after system failure
- Support for petabyte tables, records up to 4GB, and record numbers up to 64 bits
The robustness of the WiredTiger storage engine comes from the way it was developed. Built with a range of programming techniques, it not only performs more work per CPU than other storage engines but also scales on modern multi-CPU architectures. Because it uses an “optimistic concurrency control algorithm”, operations on one thread do not block another thread. WiredTiger still provides strong isolation: every update conflict is detected to preserve data consistency.
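The general idea of optimistic concurrency control can be sketched in a few lines. This is a single-threaded toy under stated assumptions, not WiredTiger's implementation: `Store`, `Transaction`, and `WriteConflict` are hypothetical names, and the version-number check at commit time is just one common way to detect conflicts. Transactions never wait for each other; a conflicting update is detected when the later transaction tries to commit.

```python
class WriteConflict(Exception):
    """Raised when another transaction changed a key we depended on."""


class Store:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._reads = {}   # key -> version seen at first access
        self._writes = {}  # key -> new value, buffered until commit

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        version, value = self._store._data.get(key, (0, None))
        self._reads.setdefault(key, version)
        return value

    def write(self, key, value):
        self.read(key)  # remember which version this update is based on
        self._writes[key] = value

    def commit(self):
        # Validate: every key we touched must still be at the version we saw.
        for key, seen in self._reads.items():
            current, _ = self._store._data.get(key, (0, None))
            if current != seen:
                raise WriteConflict(key)
        # No conflicts: publish the buffered writes with a new version.
        for key, value in self._writes.items():
            version, _ = self._store._data.get(key, (0, None))
            self._store._data[key] = (version + 1, value)
```

If two transactions both read the same key and then write it, the first `commit()` succeeds and the second raises `WriteConflict`, so the caller can retry against the fresh value instead of silently overwriting it.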
Further, WiredTiger supports both row-oriented and column-oriented storage and is very efficient in memory management; row and column stores can be mixed and matched at the table level. The storage engine is versatile: it supports different-sized B-tree internal and leaf pages in the same file, and it also supports the “Log-Structured merge tree”, which buffers updates in small files that fit into cache and then merges them automatically into larger files. Moreover, the “Log-Structured merge tree” automatically creates Bloom filters to avoid unnecessary reads from files that do not contain matching keys.
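The buffer-then-merge behaviour and the role of Bloom filters can be shown with a toy model. The sketch below is an assumption-laden illustration and says nothing about WiredTiger's actual file layout: `BloomFilter`, `Run`, and `TinyLSM` are hypothetical names, the "files" are in-memory dictionaries, and the sizes are arbitrary.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Run:
    """An immutable sorted 'file' with its own Bloom filter."""

    def __init__(self, items):
        self.items = dict(sorted(items.items()))
        self.bloom = BloomFilter()
        for key in self.items:
            self.bloom.add(key)


class TinyLSM:
    def __init__(self, buffer_limit=4, max_runs=2):
        self.buffer = {}  # small in-memory buffer for recent updates
        self.runs = []    # flushed runs, newest first
        self.buffer_limit = buffer_limit
        self.max_runs = max_runs

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        self.runs.insert(0, Run(self.buffer))
        self.buffer = {}
        if len(self.runs) > self.max_runs:
            self._merge()

    def _merge(self):
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values win
            merged.update(run.items)
        self.runs = [Run(merged)]

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        for run in self.runs:
            if not run.bloom.might_contain(key):
                continue  # filter says the key is absent: skip this run
            if key in run.items:
                return run.items[key]
        return None
```

A Bloom filter can report a key as possibly present when it is not, but never the reverse, so skipping a run on a negative answer is always safe; the only cost of a false positive is one unnecessary lookup.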
WiredTiger also has an advantage when it comes to the amount of information written to disk. Its compact file format reduces its own work by 20-50%. Block compression and variable-length pages then reduce the amount of information that has to be written by a further 30-80%.
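To get a feel for why block compression helps with document data, the following sketch compresses a batch of similar documents as one block. It uses Python's standard `zlib` module purely as a stand-in; the `compression_savings` helper and the sample documents are made up for illustration, and the ratio it prints says nothing about the percentages quoted above or about any real deployment.

```python
import json
import zlib


def compression_savings(documents, level=6):
    """Return the fraction of bytes saved by compressing the documents as one block."""
    raw = "\n".join(json.dumps(doc, sort_keys=True) for doc in documents).encode()
    packed = zlib.compress(raw, level)
    return 1 - len(packed) / len(raw)


# Hypothetical sample data: many documents sharing the same field names.
docs = [
    {"user_id": i, "status": "active", "country": "IN", "plan": "free"}
    for i in range(1000)
]
savings = compression_savings(docs)
print(f"saved {savings:.0%} of the raw bytes")
```

Documents in one collection tend to repeat the same field names and many of the same values, which is exactly the kind of redundancy a block compressor removes.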
With the many other features that come with WiredTiger, this storage engine has made the work of MongoDB's engineers easier and allows them to focus on areas other than the storage engine.