PostgreSQL
Internals
The Data
Where is all the data stored
In Postgres, all data is generally stored inside /base
folder.
You can check where this folder is with:
SHOW data_directory;
Find which data folder belongs to which database with:
SELECT oid, datname FROM pg_database;
There are some different types of files for different purposes.
The Heap or Heap file, is the file that contains all the data (rows) of a table, (usually the files that has a number as name).
A single Heap file is divided into Blocks or Pages, where each page/block stores some number of rows.
At last, an individual row is called Tuple or Item.
The Connection
Crashed connections will ROLLBACK
transactions automatically.
Connection pools
A pool internally maintains several different clients that can be reused.
Explain & Explain Analyze
About
EXPLAIN
is a tool to show the Planner Step of a query, which is the step that decides the most optimal way to execute the desired query.
EXPLAIN
builds a query plan and display info about it.
EXPLAIN ANALYZE
builds a query plan, run it, and info about it.
EXPLAIN ANALYZE output
The time metrics from normal queries may be misleading, since it may include networking time.
Use Execution Time result to check how long the query actually took.
EXPLAIN ANALYZE
SELECT * FROM table_name;
Hash Join (cost=8.28..1756.30 rows=11 width=81) (actual time=0.030..34.031 rows=7 loops=1)
Hash Cond: (comments.user_id = users.id)
-> Seq Scan on comments (cost=0.00..1589.30 rows=60410 width=72) (actual time=0.030..14.031 rows=60410 loops=1)
-> Hash (cost=8.30..8.30 rows=1 width=17) (actual time=0.050..0.051 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kb
-> Index Scan using users_username_idx on users (cost=0.28..8.30 rows=1 width=17) (actual time=0.030..0.031 rows=1 loops=1)
Index Cond: ((username)::text = 'Admin'::text)
Planning Time: 3.4545 ms
Execution Time: 34.496 ms
Lines starting with arrows ->
are query nodes, which are the steps trying to access data inside the database or trying to do some processing. (The first line is also a query node, even though it doesn't have the arrow)
The order to read the EXPLAIN
planning result, is starting from the inner most rows.
And the results from each the query nodes, will be passed to the nearest query node PARENT.
The cost of a Parent node will start up from the SUM of costs of its children, and then, add its own processing cost.
Costs
Not the full calculation, only some of the basic variables.
pg_stats
rows
and width
fields from EXPLAIN
are taken from a stats table that Postgres maintain about the tables.
Check them with:
SELECT * FROM pg_stats WHERE tablename = 'table-name';
Scan types
Sequential scan
Will be used:
As default when no other valuable alternative is available.
When reading from the begining of the dataset. (e.g.
SELECT * FROM table_name
)When the filtering clause is not very limiting, so that the end result will be almost the whole table contents.
It is fast when data is read sequencially.
Index nodes scans
Will be used:
When an index is used to access the dataset.
When data file and index files are separated but they are nearby.
Scans on index nodes can be separated in:
Index scan:
index -> seeking the tuples -> the read again the data
. (2 trip scan, it must get data of the index col and then get the rest of the data for other columns) (e.g.SELECT * FROM table_name WHERE index_col = 1
)Index Only scan:
requested index columns only -> directly get data from index file
. (when it only needs to return data from the index column) (e.g.SELECT index_col FROM table_name WHERE index_col = 1
)Bitmap index scan:
builds a memory bitmap of where the tuples that satistfy the statement clauses
.
Join nodes scans
Will be used:
When joining tables.
When joins are performaed on two tables at a time; If more tables are joined together, the output of one join is treated as input to a subsequent join.
When joining a large number of tables, the genetic query optimizer settings may affect what combinstions of joins are considered.
Scans on join nodes can be separated in:
Hashed join: Builds a hash table from the inner table, keyed by the join key, then scan the outer table, checking if a corresponding value is present.
Merged join: Joins two children already sorted bu their shared join key. This only needs to scan each relation once, but both inputs need to be sorted buy the join key first.
Nested loop: For each row in the outer table, iterate through all the rows in the inner table and see if they match the join condition. If the inner relation can be scanned with an index, that can improve the performance of a nested loop join. (This is generally an inefficient way to process joins but is always available and sometimes mey be the only option)
Indexes
Full Table Scan
Is when PG has to load many (or all) rows from the heap file to memory.
(Frequently, but not always, poor performance)
Postgres' Planner, may decide not to use indexes on a query if doing sequencial scans are faster.
About
Index is a data structure that efficiently tells which block/index a record is stored at, thus significantly reducing search time, and avoiding Full Table Scans.
Retrieving information with Indexes, will usually do random page read
and in some cases can be slower than doing sequencial scans.
Expecially if returning lots of data.
This means that indexes are really good for returning small selections of data, relative to the size of the data.
Indexes are stored in files that can considerably take a lot of space, since their size scale with the Data that you have.
Types of index
B-Tree
General purpose index. 99% of the time you want this.
Hash
Speeds up simple equality checks.
GiST
Geometry, full-text search.
SP-GiST
Clustered data, such as dates - many rows might have the same year.
GIN
For columns that contain arrays of JSON data.
BRIN
Specialized for really large datasets.
B-Tree
It is the default index type when creating an index. B-Tree is a self-balancing tree that can run SELECT
, INSERT
, DELETE
and Sequential reads in logarithmic time log n
.
Hash
An index type only used for equality operators =
.
They also can consume more space compared to B-Tree.
GIN
Stands for (Generalized Inverted Index), and its core capability is to speed up full text searches.
GIN index stores "key" (or an element or a value) and the "position list" pairs. The position list is the rowID of the key. This means, if the "key" occurs at multiple places in the document, GIN index stores the key only once along with its position of occurrences which not only keeps the GIN index compact in size and also helps speed-up the searches.
Depending on the complexity of the data, maintaining GIN indexes can be expensive.
CREATE INDEX index_name ON table_name USING GIN(col_name) (col_name);
BRIN
It is based on the range of values and data blocks on a storage.
It is smaller and less contly to maintain than B-Tree index.
Creating
-- CREATE [UNIQUE] INDEX index_name ON table_name [USING index_type]
-- (
-- column_name [ASC | DESC] [NULLS [FIRST | LAST]],
-- ...
-- )
Where index_name
conventionally follows table_colname1_colname2_idx
.
When creating multi-column indexes, always place the most selective columns first, so the index access method will be the cheapest.
Deleting
DROP INDEX [CONCURRENTLY] [IF EXISTS] index_name [CASCADE | RESTRICT];
CONCURRENTLY
means when you execute the DROP statement, Postgres aquires an exclusive lock on the table and block other accesses until the index removal completes.
CASCADE
means if the index has dependent objects, it will automatically drop these objects and all objects that depend on those objects. (It is recursive)
RESTRICT
tells Postgres to refuse to drop the index if any objects depend on it. (Used by default)
Managing
Listing all indexes
-- Entire database
SELECT * FROM pg_indexes;
-- Only from public schema
SELECT * FROM pg_indexes WHERE schemaname = 'public';
-- Only for a particular table
SELECT * FROM pg_indexes WHERE
tablename = 'table_name'
and schemaname = 'public'
;
Check the size of index table
SELECT pg_size_pretty(pg_indexes_size('table_name'));
Check how much space indexes take in relation to the table
SELECT pg_size_pretty(pg_relation_size('table_name'));
Check stats for all indexes
-- Entire database
SELECT * FROM pg_stat_all_indexes;
-- Only from public schema
SELECT * FROM pg_stat_all_indexes WHERE schemaname = 'public';
-- Only for a particular table
SELECT * FROM pg_stat_all_indexes WHERE
schemaname = 'public'
and tablename = 'table_name'
;
Schemas
Schemas in Postgres are namespaces (like folders) to help organize things in a database. Each Schema can have its own separate copy of a table.
Every database gets a default schema called "public".
To runs commands and queries in custom schemas, the schema-name
must prepend the table-name
. Otherwise it will use the default "public" schema.
This behavior can be changed Changing default schema.
Schemas are interesting for API tests (Isolation). So that tests will access the same database but a different schema, thus changes in it will not reflect in the "public" schema.
Schemas allow to organize and limit database objects access to users.
Users can only access objects in the schemas that they own.
Creating
When creating a new schema in a database, it will not pull the tables nor data from the public
schema. It will be totally empty.
CREATE SCHEMA schema_name;
Creating tables in this schema
CREATE TABLE schema_name.table_name (
-- columns
);
Moving tables between schemas
ALTER TABLE schema_name.table_name SET SCHEMA other_schema;
Access rights
There are two access levels rights:
USAGE
: to access schema.CREATE
: to create objects in the schema.
-- GRANT [USAGE | CREATE] ON SCHEMA schema_name TO [user | role]
-- Will grant access to the schema
GRANT USAGE ON SCHEMA schema_name TO role_name;
-- But will also have to grant read to tables inside the schema
GRANT SELECT ON ALL TABLES IN SCHEMA schema_name TO role_name;
More on user priviledges Roles/Users.
Changing default schema
Show how Postgres is handling the default value
SHOW search_path;
Usually, the default result of this query will be "$user", public
. This means that it will always - if not specified - try to use:
First, a schema named after the User that holds the connection.
Or if it does not exist, the schema named
public
.
Change the search path
SET search_path TO test, public;
This will change the search path to be test, public
.
Meaning it will first, try to find a schema named test
. And if not existent, then the public
one.
Domains Data Types
CREATE DOMAIN
statement creates a user-defined data type with a range, optional DEFAULT
, NOT NULL
and CHECK
constraint.
They have to be unique within a Schema scope and cannot be used outside of the Schema scope where they are defined.
Domain data types are different from Composite (Objects) , which can hold composite objects.
Creating
CREATE DOMAIN us_postal_code TEXT NOT NULL
CHECK (
VALUE ~'^\d{5}$'
OR VALUE ~'^\d{5}-\d{4}$'
)
Then use it in other tables like:
CREATE TABLE locations {
id SERIAL PRIMARY KEY
postal us_postal_code
};
Listing
List all the Domain data types in a Schema with:
SELECT typname
FROM pg_catalog.pg_type
JOIN pg_catalog.pg_namespace
ON pg_namespace.oid = pg_type.typnamespace
WHERE
typtype = 'd' and nspname = 'public'
;
Deleting
Before dropping domains data types, make sure to not use them anymore in other tables.
DROP DOMAIN name;
-- Will forcefully delete Columns in tables that use the domain data type
DROP DOMAIN name CASCADE;
Last updated