PostgreSQL

Internals

The Data

Where is all the data stored

In Postgres, all data is generally stored inside /base folder.

You can check where this folder is with:

SHOW data_directory;

Find which data folder belongs to which database with:

SELECT oid, datname FROM pg_database;

There are some different types of files for different purposes.

The Heap or Heap file, is the file that contains all the data (rows) of a table, (usually the files that has a number as name).

A single Heap file is divided into Blocks or Pages, where each page/block stores some number of rows.

At last, an individual row is called Tuple or Item.

Drawing
Drawing

The Connection

Connection pools

A pool internally maintains several different clients that can be reused.

Drawing

Explain & Explain Analyze

About

EXPLAIN is a tool to show the Planner Step of a query, which is the step that decides the most optimal way to execute the desired query.

EXPLAIN builds a query plan and display info about it.

EXPLAIN ANALYZE builds a query plan, run it, and info about it.

EXPLAIN ANALYZE output

EXPLAIN ANALYZE
SELECT * FROM table_name;
Example
Hash Join (cost=8.28..1756.30 rows=11 width=81) (actual time=0.030..34.031 rows=7 loops=1)
  Hash Cond: (comments.user_id = users.id)
  -> Seq Scan on comments (cost=0.00..1589.30 rows=60410 width=72) (actual time=0.030..14.031 rows=60410 loops=1)
  -> Hash (cost=8.30..8.30 rows=1 width=17) (actual time=0.050..0.051 rows=1 loops=1)
    Buckets: 1024 Batches: 1 Memory Usage: 9kb
    -> Index Scan using users_username_idx on users (cost=0.28..8.30 rows=1 width=17) (actual time=0.030..0.031 rows=1 loops=1)
       Index Cond: ((username)::text = 'Admin'::text)
Planning Time: 3.4545 ms
Execution Time: 34.496 ms

Lines starting with arrows -> are query nodes, which are the steps trying to access data inside the database or trying to do some processing. (The first line is also a query node, even though it doesn't have the arrow)

The order to read the EXPLAIN planning result, is starting from the inner most rows.

And the results from each the query nodes, will be passed to the nearest query node PARENT.

Drawing

Costs

Not the full calculation, only some of the basic variables.

Drawing

Tip

Not all query types will use all the variables.

For a sequencial read, as example, only #page read sequentially and #rows scanned.

pg_stats

Check them with:

SELECT * FROM pg_stats WHERE tablename = 'table-name';

Scan types

Sequential scan

Will be used:

  • As default when no other valuable alternative is available.

  • When reading from the begining of the dataset. (e.g. SELECT * FROM table_name)

  • When the filtering clause is not very limiting, so that the end result will be almost the whole table contents.

It is fast when data is read sequencially.

Index nodes scans

Will be used:

  • When an index is used to access the dataset.

  • When data file and index files are separated but they are nearby.

Scans on index nodes can be separated in:

  • Index scan: index -> seeking the tuples -> the read again the data. (2 trip scan, it must get data of the index col and then get the rest of the data for other columns) (e.g. SELECT * FROM table_name WHERE index_col = 1)

  • Index Only scan: requested index columns only -> directly get data from index file. (when it only needs to return data from the index column) (e.g. SELECT index_col FROM table_name WHERE index_col = 1)

  • Bitmap index scan: builds a memory bitmap of where the tuples that satistfy the statement clauses.

Join nodes scans

Will be used:

  • When joining tables.

  • When joins are performaed on two tables at a time; If more tables are joined together, the output of one join is treated as input to a subsequent join.

When joining a large number of tables, the genetic query optimizer settings may affect what combinstions of joins are considered.

Scans on join nodes can be separated in:

  • Hashed join: Builds a hash table from the inner table, keyed by the join key, then scan the outer table, checking if a corresponding value is present.

  • Merged join: Joins two children already sorted bu their shared join key. This only needs to scan each relation once, but both inputs need to be sorted buy the join key first.

  • Nested loop: For each row in the outer table, iterate through all the rows in the inner table and see if they match the join condition. If the inner relation can be scanned with an index, that can improve the performance of a nested loop join. (This is generally an inefficient way to process joins but is always available and sometimes mey be the only option)

Indexes

About

Index is a data structure that efficiently tells which block/index a record is stored at, thus significantly reducing search time, and avoiding Full Table Scans.

Drawing

Types of index

Type
Description

B-Tree

General purpose index. 99% of the time you want this.

Hash

Speeds up simple equality checks.

GiST

Geometry, full-text search.

SP-GiST

Clustered data, such as dates - many rows might have the same year.

GIN

For columns that contain arrays of JSON data.

BRIN

Specialized for really large datasets.

B-Tree

It is the default index type when creating an index. B-Tree is a self-balancing tree that can run SELECT, INSERT, DELETE and Sequential reads in logarithmic time log n.

Hash

An index type only used for equality operators =.

They also can consume more space compared to B-Tree.

GIN

Stands for (Generalized Inverted Index), and its core capability is to speed up full text searches.

GIN index stores "key" (or an element or a value) and the "position list" pairs. The position list is the rowID of the key. This means, if the "key" occurs at multiple places in the document, GIN index stores the key only once along with its position of occurrences which not only keeps the GIN index compact in size and also helps speed-up the searches.

CREATE INDEX index_name ON table_name USING GIN(col_name) (col_name);

BRIN

It is based on the range of values and data blocks on a storage.

It is smaller and less contly to maintain than B-Tree index.

Creating

-- CREATE [UNIQUE] INDEX index_name ON table_name [USING index_type]
-- (
--     column_name [ASC | DESC] [NULLS [FIRST | LAST]],
--     ...
-- )

Where index_name conventionally follows table_colname1_colname2_idx.

Deleting

DROP INDEX [CONCURRENTLY] [IF EXISTS] index_name [CASCADE | RESTRICT];

CONCURRENTLY means when you execute the DROP statement, Postgres aquires an exclusive lock on the table and block other accesses until the index removal completes.

CASCADE means if the index has dependent objects, it will automatically drop these objects and all objects that depend on those objects. (It is recursive)

RESTRICT tells Postgres to refuse to drop the index if any objects depend on it. (Used by default)

Managing

Listing all indexes

-- Entire database
SELECT * FROM pg_indexes;

-- Only from public schema
SELECT * FROM pg_indexes WHERE schemaname = 'public';

-- Only for a particular table
SELECT * FROM pg_indexes WHERE
    tablename = 'table_name'
    and schemaname = 'public'
;

Check the size of index table

SELECT pg_size_pretty(pg_indexes_size('table_name'));

Check how much space indexes take in relation to the table

SELECT pg_size_pretty(pg_relation_size('table_name'));

Check stats for all indexes

-- Entire database
SELECT * FROM pg_stat_all_indexes;

-- Only from public schema
SELECT * FROM pg_stat_all_indexes WHERE schemaname = 'public';

-- Only for a particular table
SELECT * FROM pg_stat_all_indexes WHERE
    schemaname = 'public'
    and tablename = 'table_name'
;

Schemas

Schemas in Postgres are namespaces (like folders) to help organize things in a database. Each Schema can have its own separate copy of a table.

Drawing

Every database gets a default schema called "public".

Creating

When creating a new schema in a database, it will not pull the tables nor data from the public schema. It will be totally empty.

CREATE SCHEMA schema_name;

Creating tables in this schema

CREATE TABLE schema_name.table_name (
    -- columns
);

Moving tables between schemas

ALTER TABLE schema_name.table_name SET SCHEMA other_schema;

Access rights

There are two access levels rights:

  • USAGE: to access schema.

  • CREATE: to create objects in the schema.

-- GRANT [USAGE | CREATE] ON SCHEMA schema_name TO [user | role]

-- Will grant access to the schema
GRANT USAGE ON SCHEMA schema_name TO role_name;

-- But will also have to grant read to tables inside the schema
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO role_name;

More on user priviledges Roles/Users.

Changing default schema

Show how Postgres is handling the default value

SHOW search_path;

Usually, the default result of this query will be "$user", public. This means that it will always - if not specified - try to use:

  1. First, a schema named after the User that holds the connection.

  2. Or if it does not exist, the schema named public.

Change the search path

SET search_path TO test, public;

This will change the search path to be test, public.

Meaning it will first, try to find a schema named test. And if not existent, then the public one.

Domains Data Types

CREATE DOMAIN statement creates a user-defined data type with a range, optional DEFAULT, NOT NULL and CHECK constraint.

They have to be unique within a Schema scope and cannot be used outside of the Schema scope where they are defined.

They help standardize your database data types in one place.

Creating

CREATE DOMAIN us_postal_code TEXT NOT NULL
CHECK (
    VALUE ~'^\d{5}$'
    OR VALUE ~'^\d{5}-\d{4}$'
)

Then use it in other tables like:

CREATE TABLE locations {
    id SERIAL PRIMARY KEY
    postal us_postal_code
};

Listing

List all the Domain data types in a Schema with:

SELECT typname
FROM pg_catalog.pg_type
JOIN pg_catalog.pg_namespace
    ON pg_namespace.oid = pg_type.typnamespace
WHERE
    typtype = 'd' and nspname = 'public'
;

Deleting

DROP DOMAIN name;

-- Will forcefully delete Columns in tables that use the domain data type
DROP DOMAIN name CASCADE;

Last updated