PostgreSQL
Data Types
Casting values
You can cast values in Postgres with the :: operator.
It enforces the target type's constraints and raises an error if the value does not comply.
-- This way Postgres treats the value as `integer` instead of `numeric`
SELECT 2.0::integer;
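The standard SQL form CAST(value AS type) is equivalent to the :: operator; a quick sketch:

```sql
-- Both return the integer 2
SELECT CAST(2.0 AS integer);
SELECT 2.0::integer;

-- A value that cannot be converted raises an error:
-- SELECT 'abc'::integer;  -- ERROR: invalid input syntax for type integer
```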
Numeric
| Type | Category | Description |
| --- | --- | --- |
| smallint | Integer | 2 bytes, range -32,768 to +32,767 |
| integer | Integer | 4 bytes, range roughly -2.1 billion to +2.1 billion |
| bigint | Integer | 8 bytes, very large range |
| decimal / numeric | Fixed-precision | Arbitrary-precision exact numbers |
| real / float4 | Floating-point | 4 bytes, approximate, ~6 decimal digits |
| double precision / float8 | Floating-point | 8 bytes, approximate, ~15 decimal digits |
| serial, bigserial, smallserial | Auto-increment | Auto-incrementing integer sequences |
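The exact vs. approximate distinction in the table above can be sketched like this:

```sql
-- numeric is exact: arbitrary-precision decimal arithmetic
SELECT 0.1::numeric + 0.2::numeric;  -- exactly 0.3

-- real and double precision use binary floating point,
-- so some decimal fractions cannot be represented exactly
SELECT 0.1::double precision + 0.2::double precision;  -- approximately 0.3
```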
Date/Time
| Type | Description |
| --- | --- |
| timestamp | Date and time (no timezone) |
| timestamptz | Timestamp with timezone |
| date | Calendar date (year, month, day) |
| time | Time of day (no date, no timezone) |
| timetz | Time with timezone |
| interval | Time span (e.g., 3 days, 2 hours) |
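Intervals compose naturally with the other date/time types; a small sketch:

```sql
-- Add a span to a timestamp
SELECT now() + interval '3 days 2 hours';

-- Subtracting timestamps yields an interval
SELECT timestamp '2024-03-10 12:00' - timestamp '2024-03-08 06:00';  -- 2 days 06:00:00
```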
Character
There is no performance difference between char(n), varchar(n), and text (in terms of memory layout, storage size, index size, or access speed). Internally, Postgres treats them the same, unlike MySQL.
Even varchar(254) vs varchar(500) behaves the same at the storage level. The only difference is the enforced length check on input.
When using varchar(500), for instance, Postgres doesn't allocate 500 bytes, only the actual length of the content.
Postgres B-tree indexes work the same way on text and varchar, since the index is built from the actual string values.
A 50-character string indexed on varchar(254) or text will produce the same index size and access path. Limiting the max length does not reduce index size unless the actual values are smaller.
| Type | Description |
| --- | --- |
| char(n) | Fixed-length string (blank-padded at the end) |
| varchar(n) | Variable-length string, with a limit |
| text | Variable-length string, unlimited |
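The length check behaves differently for explicit casts and inserts; a small sketch (the demo table is illustrative):

```sql
-- An explicit cast to varchar(n) silently truncates to n characters
SELECT 'abcdef'::varchar(3);  -- 'abc'

-- Inserting an over-long value raises an error instead
CREATE TABLE demo (code varchar(3));
INSERT INTO demo VALUES ('abcdef');
-- ERROR: value too long for type character varying(3)
```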
UUID
| Type | Description |
| --- | --- |
| uuid | Universally Unique Identifier (gen_random_uuid() generates UUIDv4 values) |
Boolean
| Type | Description |
| --- | --- |
| boolean | Stores TRUE, FALSE, or NULL |
Enumerated
enum
Custom, user-defined list of valid string values
CREATE TYPE mood AS ENUM ('happy', 'sad', 'ok');
Array
| Type | Description |
| --- | --- |
| type[] | One-dimensional or multi-dimensional arrays, e.g. text[], integer[][] |
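A sketch of querying array columns (the posts table is illustrative):

```sql
CREATE TABLE posts (
  id serial PRIMARY KEY,
  tags text[]
);

INSERT INTO posts (tags) VALUES (ARRAY['sql', 'postgres']);

-- Match rows where the array contains a given element
SELECT * FROM posts WHERE 'postgres' = ANY (tags);

-- Access individual elements (arrays are 1-indexed)
SELECT tags[1] FROM posts;  -- 'sql'
```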
Json & Jsonb
| Type | Description |
| --- | --- |
| json | Text-based JSON (slower, preserves formatting) |
| jsonb | Binary JSON (faster, formatting not preserved) |
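Common jsonb operators, sketched on literal values:

```sql
-- -> returns jsonb, ->> returns text
SELECT '{"a": {"b": 1}}'::jsonb -> 'a';          -- {"b": 1}
SELECT '{"a": {"b": 1}}'::jsonb -> 'a' ->> 'b';  -- '1'

-- @> tests containment (pairs well with GIN indexes)
SELECT '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'::jsonb;  -- true
```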
Geometric
| Type | Description |
| --- | --- |
| point | (x, y) coordinate |
| line | Infinite line |
| lseg | Line segment |
| box | Rectangular box |
| path | Closed/open path |
| polygon | Polygon |
| circle | Center and radius |
Network Address
| Type | Description |
| --- | --- |
| cidr | IP network |
| inet | IP address |
| macaddr | MAC address |
| macaddr8 | MAC address (EUI-64) |
Bit String
| Type | Description |
| --- | --- |
| bit(n) | Fixed-length bit string |
| bit varying(n) | Variable-length bit string |
Range
| Type | Description |
| --- | --- |
| int4range | Range of integers (4-byte) |
| tsrange | Range of timestamps (no timezone) |
| daterange | Range of dates |
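Ranges support containment and overlap operators; note that the default bounds are half-open, [lower, upper):

```sql
SELECT int4range(1, 10) @> 5;   -- true: 5 is inside [1, 10)
SELECT int4range(1, 10) @> 10;  -- false: the upper bound is exclusive

-- && tests whether two ranges overlap
SELECT daterange('2024-01-01', '2024-02-01') && daterange('2024-01-15', '2024-03-01');  -- true
```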
Composite (Objects)
row
Custom record-like structures
CREATE TYPE full_name AS (
first text,
last text
);
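Using the full_name type from above as a column (the people table is illustrative):

```sql
CREATE TABLE people (name full_name);

INSERT INTO people VALUES (ROW('Ada', 'Lovelace'));

-- Parentheses are required when accessing a field of a composite column
SELECT (name).first FROM people;  -- 'Ada'
```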
Query
Aggregates
| Keyword | Description |
| --- | --- |
| group by | Groups rows by column values. |
| count | Counts rows (or non-null values) in each group. |
| having | Filters the resulting groups. |
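Putting the three together (the comments table is illustrative):

```sql
-- Users with more than 10 comments
SELECT user_id, COUNT(*) AS comment_count
FROM comments
GROUP BY user_id
HAVING COUNT(*) > 10;
```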
Utility functions
greatest
Pick the highest out of a list of values.
SELECT GREATEST(col1, col2, x, ...)
least
Pick the lowest out of a list of values.
SELECT LEAST(col1, col2, x, ...)
case
Conditional if/else logic inside a select statement.
SELECT
col1,
CASE
WHEN col2 > x THEN 'value1'
WHEN col2 < x THEN 'value2'
ELSE 'value3'
END
Recursive CTE (Common Table Expressions)
A way to perform recursive queries, meaning a query that calls itself until a stopping condition is met.
The Anchor member runs once and provides the initial dataset.
The Recursive member runs repeatedly, using the results from previous iterations.
The process stops when the recursive part returns no new rows.
WITH RECURSIVE countdown(val) AS (
SELECT 3 AS val -- Anchor member (base case)
UNION
SELECT val - 1 FROM countdown WHERE val > 1 -- Recursive member (self-reference)
)
SELECT * FROM countdown;
Check the differences between using UNION and UNION ALL.
Example
This example is prone to infinite loops if, for some reason, an employee is its own manager, or if someone's manager is someone that was already visited (A -> B -> C -> A).
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES employees(id)
);
-- Recursive query to get subordinates of employee with id = 1
WITH RECURSIVE emp_tree(id, name, manager_id) AS (
-- Anchor: CEO or root person
SELECT id, name, manager_id
FROM employees
WHERE id = 1
UNION ALL
-- Recursive part: find direct reports
SELECT e.id, e.name, e.manager_id
FROM employees e
INNER JOIN emp_tree et ON e.manager_id = et.id
)
SELECT * FROM emp_tree;
Union vs Union All
There is a difference between using UNION and UNION ALL in the recursive block.
Union All
The recursive part reuses all rows, including duplicates.
This is usually what you want.
It's faster because there's no sorting or deduplication.
You'll need to manually prevent infinite loops if needed. (e.g., via path tracking)
Union
PostgreSQL will remove duplicate rows at every recursion level.
This adds CPU cost due to hashing or sorting.
If you're not careful, it can also block valid recursion paths.
You may think this helps with cycle prevention, but:
It only deduplicates entire rows — so even small changes can create new "unique" rows.
It doesn’t guarantee correct logic in graphs/hierarchies.
Fixing infinite loops
To prevent cycles or infinite loops, build a path history with an array:
WITH RECURSIVE emp_tree(id, name, manager_id, path_ids) AS (
SELECT id, name, manager_id, ARRAY[id] AS path_ids
FROM employees
WHERE id = 1
UNION ALL
SELECT e.id, e.name, e.manager_id, et.path_ids || e.id -- build history path
FROM employees e
INNER JOIN emp_tree et ON e.manager_id = et.id
-- Don't revisit rows where the ID was already seen
WHERE NOT e.id = ANY(et.path_ids)
)
SELECT * FROM emp_tree;
Limiting depth of recursion
WITH RECURSIVE emp_tree(id, name, manager_id, depth) AS (
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE id = 1
UNION ALL
SELECT e.id, e.name, e.manager_id, et.depth + 1
FROM employees e
INNER JOIN emp_tree et ON e.manager_id = et.id
-- Depth limit of 5 (Don't go deeper than 4 levels of subordinates)
WHERE et.depth < 5
)
SELECT * FROM emp_tree;
Views
Views are persistent "fake" tables whose rows come from other tables. These can be the exact rows as they exist in another table, or computed values.
They can be referenced in any place where you would normally reference a table.
Views don't create new tables or move any data around.
Views don't improve performance on heavy queries.
Since a View's data is not saved, every access to the View reruns the underlying heavy query.
Creating
CREATE VIEW view_name AS (
-- In here, any kind of SELECT query
SELECT * FROM somewhere
);
Updating
CREATE OR REPLACE VIEW view_name AS (
-- In here, any kind of SELECT query
SELECT * FROM somewhere
);
Deleting
DROP VIEW view_name;
Materialized Views
With Materialized Views, the query gets executed only at specific times, but the results are saved and can be referenced without rerunning the query.
Creating
CREATE MATERIALIZED VIEW view_name AS (
-- In here, the heavy query
SELECT * FROM somewhere
) WITH DATA;
WITH DATA tells Postgres to automatically run the query, one time, when the Materialized View is created.
Refreshing data
Data must be manually refreshed.
Postgres doesn't automatically update the data on them.
REFRESH MATERIALIZED VIEW view_name;
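If the materialized view has a unique index, REFRESH ... CONCURRENTLY lets reads continue during the refresh (the id column here is illustrative):

```sql
-- CONCURRENTLY requires at least one unique index on the materialized view
CREATE UNIQUE INDEX ON view_name (id);

REFRESH MATERIALIZED VIEW CONCURRENTLY view_name;
```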
Internals
The Data
Where is all the data stored
In Postgres, all data is generally stored inside the base folder of the data directory.
You can check where this folder is with:
SHOW data_directory;
Find which data folder belongs to which database with:
SELECT oid, datname FROM pg_database;
There are some different types of files for different purposes.
The Heap, or Heap file, is the file that contains all the data (rows) of a table (usually the files that have a number as their name).
A single Heap file is divided into Blocks or Pages, where each page/block stores some number of rows.
Finally, an individual row is called a Tuple or Item.
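The hidden ctid system column exposes a row's physical location as (page, tuple), which makes this layout visible (works on any table):

```sql
-- (0,1) means page 0, tuple 1 within the table's heap file
SELECT ctid, * FROM employees LIMIT 5;
```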
Explain & Explain Analyze
EXPLAIN is a tool that shows the Planner step of a query, the step that decides the most optimal way to execute the desired query.
EXPLAIN builds a query plan and displays info about it.
EXPLAIN ANALYZE builds a query plan, runs it, and displays info about it.
EXPLAIN ANALYZE output
The time metrics from normal queries may be misleading, since it may include networking time.
Use Execution Time result to check how long the query actually took.
EXPLAIN ANALYZE
SELECT * FROM table;
Hash Join (cost=8.28..1756.30 rows=11 width=81) (actual time=0.030..34.031 rows=7 loops=1)
Hash Cond: (comments.user_id = users.id)
-> Seq Scan on comments (cost=0.00..1589.30 rows=60410 width=72) (actual time=0.030..14.031 rows=60410 loops=1)
-> Hash (cost=8.30..8.30 rows=1 width=17) (actual time=0.050..0.051 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kb
-> Index Scan using users_username_idx on users (cost=0.28..8.30 rows=1 width=17) (actual time=0.030..0.031 rows=1 loops=1)
Index Cond: ((username)::text = 'Admin'::text)
Planning Time: 3.4545 ms
Execution Time: 34.496 ms
Lines starting with arrows (->) are query nodes, the steps that access data inside the database or do some processing. (The first line is also a query node, even though it doesn't have the arrow.)
The order to read the EXPLAIN plan is from the innermost nodes outwards.
The results from each query node are passed up to its nearest parent query node.
The cost of a parent node starts from the SUM of the costs of its children, and then adds its own processing cost.
(This is not the full calculation, only some of the basic variables.)
pg_stats
The rows and width fields from EXPLAIN are taken from a statistics table that Postgres maintains about each table.
Check them with:
SELECT * FROM pg_stats WHERE tablename = 'table-name';
The Connection
A crashed connection will automatically ROLLBACK its open transaction.
Connection pools
A pool internally maintains several different clients that can be reused.
Indexes
Full Table Scan
This is when Postgres has to load many (or all) rows from the heap file into memory.
(Frequently, but not always, poor performance.)
Postgres' planner may decide not to use an index on a query if a sequential scan is faster.
About
An index is a data structure that efficiently tells which block a record is stored at, significantly reducing search time and avoiding Full Table Scans.
Retrieving information with indexes usually does random page reads, and in some cases can be slower than a sequential scan, especially if returning lots of data.
This means that indexes are really good for returning small selections of data, relative to the size of the dataset.
Indexes are stored in files that can take considerable space, since their size scales with the data that you have.
Create an index with
CREATE INDEX <index-name> ON table (column, ...);
Where index-name conventionally follows the pattern table_colname1_colname2_idx.
Delete an index with
DROP INDEX <index-name>;
Check how much space indexes take in relation to the table
SELECT pg_size_pretty(pg_relation_size('table-name'));
SELECT pg_size_pretty(pg_relation_size('index-name'));
Types of index
| Index type | Use case |
| --- | --- |
| B-Tree | General-purpose index. 99% of the time you want this. |
| Hash | Speeds up simple equality checks. |
| GiST | Geometry, full-text search. |
| SP-GiST | Clustered data, such as dates (many rows might share the same year). |
| GIN | Columns that contain arrays or JSON data. |
| BRIN | Specialized for really large datasets. |
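Choosing a non-default index type is done with the USING clause (table and column names here are illustrative):

```sql
-- GIN index for jsonb containment queries (@>)
CREATE INDEX items_payload_idx ON items USING gin (payload);

-- Hash index for simple equality lookups
CREATE INDEX users_email_idx ON users USING hash (email);
```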
Schemas
Schemas in Postgres are like folders to organize things in a database. Each Schema can have its own separate copy of a table.
Every database gets a default schema called "public".
To run commands and queries against a custom schema, the schema name must prefix the table name. Otherwise, the default "public" schema is used.
This behavior can be changed (see Changing default schema).
Schemas are useful for API test isolation: tests can access the same database but a different schema, so changes in it will not affect the "public" schema.
Creating
A newly created schema does not pull tables or data from the public schema. It starts completely empty.
CREATE SCHEMA schema_name;
Creating tables in this schema
CREATE TABLE schema_name.table_name (
-- columns
);
Changing default schema
Show how Postgres is handling the default value
SHOW search_path;
Usually, the default result of this query is "$user", public. This means that, if no schema is specified, Postgres will try to use:
First, a schema named after the User that holds the connection.
Or, if that does not exist, the schema named public.
Change the search path
SET search_path TO test, public;
This will change the search path to test, public.
Meaning it will first try to find a schema named test, and if it does not exist, fall back to the public one.