In previous blog, we have detailed out about the inception of PostgreSQL. In this blog, we will explain the architecture (MVCC) with respect to PostgreSQL.
Concurrency Control:
There are more than one way to provide concurrency control by database. Concurrency is an important factor to allow many users to work on the same dataset at the same point of time by any databases.
MVCC (Multiversion concurrency control)
MVCC is a concurrency control method that is commonly being used by databases to achieve this. MVCC solves the concurrency issue by keeping multiple copies of data using snapshots. Different databases have implemented MVCC features in different ways.
In this blog, we will explore the PostgreSQL implementation of MVCC.
To understand it in a much better way, we have to explore how DMLs are being handled and what a transaction is.
Let’s start with PostgreSQL way of handling DMLs ( Insert, Update & Delete). In PostgreSQL’s world row of a table is known as “Tuple
“. Whenever an insert is received, it stores the data on block. There will be only one copy of this data created. This tuple is versioned by this database. For the sake of simplicity, now, let’s name it as version and assign the value 1.0. Later when we explain transactions, we will relate this.
However, when an update is received, it creates a copy of the tuple (version 1.0) in question on a new block and modify the newest copy of the tuple and increment its version (for example, say version 1.1). The database follows the same process for any and every update. For every update, a new version of the tuple created as well as an old version exists on disks. This technique is also known as copy-on-write
.
The same happens with delete also. The only difference in update and delete is, there will not be any further versioning of the deleted tuple.
Whenever a select comes in, the database always looks for the highest available version of the tuple in the table and provide the output from there. It will be more transparent to us once we understand how transactions happen.
A transaction is any piece of code that is executed inside the BEGIN & END block. If the end-user does not include BEGIN & END keyword, Postgres itself add it to the statement. Postgres assigns an ID to every transaction and called it XID
. It takes the value of the current XID, increments it, and assign it to the current transaction. Every tuple that has been committed to the database also includes the value of XID. This is achieved by adding this value as extra information to the tuple and name it as xmin
. We can now replace the above used name “version” with the xmin
. With each update on this tuple, the data is not updated in place. Actually, the tuple is copied to a new location on the block with a new incremented xmin
.
Postgres, attach another extra information with the same tuple known as xmax
in case of the delete operation.
We will explore this more in the next blog.