PostgreSQL Internals

PostgreSQL 18.1 DocumentationPostgreSQL Documentation

The Data

Where is all the data stored

In Postgres, all data is generally stored inside /base folder.

You can check where this folder is with:

SHOW data_directory;

Find which data folder belongs to which database with:

SELECT oid, datname FROM pg_database;

How data is stored in the files

There are some different types of files for different purposes.

The Heap or Heap file, is the file that contains all the data (rows) of a table, (usually the files that has a number as name).

A single Heap file is divided into Blocks or Pages, where each page/block stores some number of rows.

At last, an individual row is called Tuple or Item.

The Connection

Crashed connections will ROLLBACK transactions automatically.

Connection pools

A pool internally maintains several different clients that can be reused.

Explain & Explain Analyze

About

EXPLAIN is a tool to show the Planner Step of a query, which is the step that decides the most optimal way to execute the desired query.

EXPLAIN builds a query plan and display info about it.

EXPLAIN ANALYZE builds a query plan, run it, and info about it.

EXPLAIN ANALYZE output

The time metrics from normal queries may be misleading, since it may include networking time.

Use Execution Time result to check how long the query actually took.

EXPLAIN ANALYZE
SELECT * FROM table_name;

Example

Hash Join (cost=8.28..1756.30 rows=11 width=81) (actual time=0.030..34.031 rows=7 loops=1)
  Hash Cond: (comments.user_id = users.id)
  -> Seq Scan on comments (cost=0.00..1589.30 rows=60410 width=72) (actual time=0.030..14.031 rows=60410 loops=1)
  -> Hash (cost=8.30..8.30 rows=1 width=17) (actual time=0.050..0.051 rows=1 loops=1)
    Buckets: 1024 Batches: 1 Memory Usage: 9kb
    -> Index Scan using users_username_idx on users (cost=0.28..8.30 rows=1 width=17) (actual time=0.030..0.031 rows=1 loops=1)
       Index Cond: ((username)::text = 'Admin'::text)
Planning Time: 3.4545 ms
Execution Time: 34.496 ms

Lines starting with arrows -> are query nodes, which are the steps trying to access data inside the database or trying to do some processing. (The first line is also a query node, even though it doesn't have the arrow)

The order to read the EXPLAIN planning result, is starting from the inner most rows.

And the results from each the query nodes, will be passed to the nearest query node PARENT.

The cost of a Parent node will start up from the SUM of costs of its children, and then, add its own processing cost.

Costs

How the cost of a query node is calculated

Not the full calculation, only some of the basic variables.

Tip

Not all query types will use all the variables.

For a sequencial read, as example, only #page read sequentially and #rows scanned.

pg_stats

rows and width fields from EXPLAIN are taken from a stats table that Postgres maintain about the tables.

Check them with:

SELECT * FROM pg_stats WHERE tablename = 'table-name';

Scan types

Sequential scan

Will be used:

As default when no other valuable alternative is available.
When reading from the begining of the dataset. (e.g. SELECT * FROM table_name)
When the filtering clause is not very limiting, so that the end result will be almost the whole table contents.

It is fast when data is read sequencially.

Index nodes scans

Will be used:

When an index is used to access the dataset.
When data file and index files are separated but they are nearby.

Scans on index nodes can be separated in:

Index scan: index -> seeking the tuples -> the read again the data. (2 trip scan, it must get data of the index col and then get the rest of the data for other columns) (e.g. SELECT * FROM table_name WHERE index_col = 1)
Index Only scan: requested index columns only -> directly get data from index file. (when it only needs to return data from the index column) (e.g. SELECT index_col FROM table_name WHERE index_col = 1)
Bitmap index scan: builds a memory bitmap of where the tuples that satistfy the statement clauses.

Join nodes scans

Will be used:

When joining tables.
When joins are performaed on two tables at a time; If more tables are joined together, the output of one join is treated as input to a subsequent join.

When joining a large number of tables, the genetic query optimizer settings may affect what combinstions of joins are considered.

Scans on join nodes can be separated in:

Hashed join: Builds a hash table from the inner table, keyed by the join key, then scan the outer table, checking if a corresponding value is present.
Merged join: Joins two children already sorted bu their shared join key. This only needs to scan each relation once, but both inputs need to be sorted buy the join key first.
Nested loop: For each row in the outer table, iterate through all the rows in the inner table and see if they match the join condition. If the inner relation can be scanned with an index, that can improve the performance of a nested loop join. (This is generally an inefficient way to process joins but is always available and sometimes mey be the only option)

Indexes

Full Table Scan

Is when PG has to load many (or all) rows from the heap file to memory.

(Frequently, but not always, poor performance)

Postgres' Planner, may decide not to use indexes on a query if doing sequencial scans are faster.

About

Index is a data structure that efficiently tells which block/index a record is stored at, thus significantly reducing search time, and avoiding Full Table Scans.

Retrieving information with Indexes, will usually do random page read and in some cases can be slower than doing sequencial scans.

Expecially if returning lots of data.

This means that indexes are really good for returning small selections of data, relative to the size of the data.

Indexes are stored in files that can considerably take a lot of space, since their size scale with the Data that you have.

Types of index

Type

Description

B-Tree

General purpose index. 99% of the time you want this.

Hash

Speeds up simple equality checks.

GiST

Geometry, full-text search.

SP-GiST

Clustered data, such as dates - many rows might have the same year.

GIN

For columns that contain arrays of JSON data.

BRIN

Specialized for really large datasets.

B-Tree

It is the default index type when creating an index. B-Tree is a self-balancing tree that can run SELECT, INSERT, DELETE and Sequential reads in logarithmic time log n.

Hash

An index type only used for equality operators =.

They also can consume more space compared to B-Tree.

GIN

Stands for (Generalized Inverted Index), and its core capability is to speed up full text searches.

GIN index stores "key" (or an element or a value) and the "position list" pairs. The position list is the rowID of the key. This means, if the "key" occurs at multiple places in the document, GIN index stores the key only once along with its position of occurrences which not only keeps the GIN index compact in size and also helps speed-up the searches.

Depending on the complexity of the data, maintaining GIN indexes can be expensive.

CREATE INDEX index_name ON table_name USING GIN(col_name) (col_name);

BRIN

It is based on the range of values and data blocks on a storage.

It is smaller and less contly to maintain than B-Tree index.

Creating

-- CREATE [UNIQUE] INDEX index_name ON table_name [USING index_type]
-- (
--     column_name [ASC | DESC] [NULLS [FIRST | LAST]],
--     ...
-- )

Where index_name conventionally follows table_colname1_colname2_idx.

When creating multi-column indexes, always place the most selective columns first, so the index access method will be the cheapest.

Deleting

DROP INDEX [CONCURRENTLY] [IF EXISTS] index_name [CASCADE | RESTRICT];

CONCURRENTLY means when you execute the DROP statement, Postgres aquires an exclusive lock on the table and block other accesses until the index removal completes.

CASCADE means if the index has dependent objects, it will automatically drop these objects and all objects that depend on those objects. (It is recursive)

RESTRICT tells Postgres to refuse to drop the index if any objects depend on it. (Used by default)

Managing

Listing all indexes

-- Entire database
SELECT * FROM pg_indexes;

-- Only from public schema
SELECT * FROM pg_indexes WHERE schemaname = 'public';

-- Only for a particular table
SELECT * FROM pg_indexes WHERE
    tablename = 'table_name'
    and schemaname = 'public'
;

Check the size of index table

SELECT pg_size_pretty(pg_indexes_size('table_name'));

Check how much space indexes take in relation to the table

SELECT pg_size_pretty(pg_relation_size('table_name'));

Check stats for all indexes

-- Entire database
SELECT * FROM pg_stat_all_indexes;

-- Only from public schema
SELECT * FROM pg_stat_all_indexes WHERE schemaname = 'public';

-- Only for a particular table
SELECT * FROM pg_stat_all_indexes WHERE
    schemaname = 'public'
    and tablename = 'table_name'
;

Schemas

Schemas in Postgres are namespaces (like folders) to help organize things in a database. Each Schema can have its own separate copy of a table.

Every database gets a default schema called "public".

To runs commands and queries in custom schemas, the schema-name must prepend the table-name. Otherwise it will use the default "public" schema.

This behavior can be changed Changing default schema.

Schemas are interesting for API tests (Isolation). So that tests will access the same database but a different schema, thus changes in it will not reflect in the "public" schema.

Schemas allow to organize and limit database objects access to users.

Users can only access objects in the schemas that they own.

Creating

When creating a new schema in a database, it will not pull the tables nor data from the public schema. It will be totally empty.

CREATE SCHEMA schema_name;

Creating tables in this schema

CREATE TABLE schema_name.table_name (
    -- columns
);

Moving tables between schemas

ALTER TABLE schema_name.table_name SET SCHEMA other_schema;

Access rights

There are two access levels rights:

USAGE: to access schema.
CREATE: to create objects in the schema.

-- GRANT [USAGE | CREATE] ON SCHEMA schema_name TO [user | role]

-- Will grant access to the schema
GRANT USAGE ON SCHEMA schema_name TO role_name;

-- But will also have to grant read to tables inside the schema
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO role_name;

Changing default schema

Show how Postgres is handling the default value

SHOW search_path;

Usually, the default result of this query will be "$user", public. This means that it will always - if not specified - try to use:

First, a schema named after the User that holds the connection.
Or if it does not exist, the schema named public.

Change the search path

SET search_path TO test, public;

This will change the search path to be test, public.

Meaning it will first, try to find a schema named test. And if not existent, then the public one.

Domains Data Types

CREATE DOMAIN statement creates a user-defined data type with a range, optional DEFAULT, NOT NULL and CHECK constraint.

They have to be unique within a Schema scope and cannot be used outside of the Schema scope where they are defined.

They help standardize your database data types in one place.

Domain data types are different from Composite (Objects) , which can hold composite objects.

Creating

CREATE DOMAIN us_postal_code TEXT NOT NULL
CHECK (
    VALUE ~'^\d{5}$'
    OR VALUE ~'^\d{5}-\d{4}$'
)

Then use it in other tables like:

CREATE TABLE locations {
    id SERIAL PRIMARY KEY
    postal us_postal_code
};

Listing

List all the Domain data types in a Schema with:

SELECT typname
FROM pg_catalog.pg_type
JOIN pg_catalog.pg_namespace
    ON pg_namespace.oid = pg_type.typnamespace
WHERE
    typtype = 'd' and nspname = 'public'
;

Deleting

Before dropping domains data types, make sure to not use them anymore in other tables.

DROP DOMAIN name;

-- Will forcefully delete Columns in tables that use the domain data type
DROP DOMAIN name CASCADE;

NextInstall & Management

Last updated 1 month ago

hashtagThe Data

hashtagWhere is all the data stored

hashtagHow data is stored in the filesarrow-up-right

hashtagThe Connection

hashtagConnection pools

hashtagExplain & Explain Analyze

hashtagAbout

hashtagEXPLAIN ANALYZE output

hashtagCosts

hashtagHow the cost of a query node is calculatedarrow-up-right

hashtagpg_stats

hashtagScan types

hashtagSequential scan

hashtagIndex nodes scans

hashtagJoin nodes scans

hashtagIndexes

hashtagAbout

hashtagTypes of index

hashtagB-Tree

hashtagHash

hashtagGIN

hashtagBRIN

hashtagCreating

hashtagDeleting

hashtagManaging

hashtagListing all indexes

hashtagCheck the size of index table

hashtagCheck how much space indexes take in relation to the table

hashtagCheck stats for all indexes

hashtagSchemas

hashtagCreating

hashtagCreating tables in this schema

hashtagMoving tables between schemas

hashtagAccess rights

hashtagChanging default schema

hashtagShow how Postgres is handling the default value

hashtagChange the search path

hashtagDomains Data Types

hashtagCreating

hashtagListing

hashtagDeleting

The Data

Where is all the data stored

How data is stored in the files

The Connection

Connection pools

Explain & Explain Analyze

About

EXPLAIN ANALYZE output

Costs

How the cost of a query node is calculated

pg_stats

Scan types

Sequential scan

Index nodes scans

Join nodes scans

Indexes

About

Types of index

B-Tree

Hash

GIN

BRIN

Creating

Deleting

Managing

Listing all indexes

Check the size of index table

Check how much space indexes take in relation to the table

Check stats for all indexes

Schemas

Creating

Creating tables in this schema

Moving tables between schemas

Access rights

Changing default schema

Show how Postgres is handling the default value

Change the search path

Domains Data Types

Creating

Listing

Deleting