PostgreSQL Physical Storage

This blog will discuss about how data is stored in PostgreSQL and other files that helps database operations.

A bit of terminology :

A tuple or an item is a synonym for a row
A relation is a synonym for a table
A file node is an id – a reference to a table or an index.
A block and page are equals – a 8kb segment information.
A heap refer to heap file. Heap files are lists of unordered records of variable size.
OID stands for Object Identifier.

Where is my database stored :

As we aware the Postgres instance will contain two paths.

Bin path — Where all the executables will be stored.
Data path — Where the actual databases will be stored.

Inside data/base folder you can see folders with each database OID. There will be one folder for each database. We can relate them by using pg_database view.

What is there in data path ?

The data path contains mostly below folders and configuration files.

Where tables are stored :

Inside the database folder, you can see each file for each relation/index. Some file names will end with suffix like fsm(free space mapping), and vm(virtual mapping). These are created and utilized during vacuum process.

FSM: which stores information about free space available in the relation.

VM: VM to keep track of which pages contain only tuples that are known to be visible to all active transactions. In others words, to track which pages are known to have no dead row.

When a table exceeds 1 GB, it is divided into gigabyte-sized segments. The first segment’s file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2, etc. This arrangement avoids problems on platforms that have file size limitations.

TOAST :

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. When a row is attempted to be stored that exceeds this size, TOAST basically breaks up the data of large columns into smaller “pieces” and stores them into a TOAST table. Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user and enabled by default. The mechanism is accomplished by splitting up the large column entry into 2KB bytes and storing them as chunks in the TOAST tables. It then stores the length and a pointer to the TOAST entry back where the column is normally stored. Because of how the pointer system is implemented, most TOAST’able column types are limited to a max size of 1GB.

TOAST has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. The big values of TOASTed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. The table itself will much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage (TOAST). It’s also more likely that the sort sets get smaller which imply having sorts being done entirely in memory.

I am Harisai