PostgreSQL

Data Types

Casting values

You can CAST values in postgres with ::.

-- This way postgres will treat it as `integer` instead of `numeric`
SELECT (2.0::integer)

Numeric

Type
Subtype
Description

smallint

Integer

2-byte, range: -32,768 to +32,767

integer

Integer

4-byte, range: -2B to +2B

bigint

Integer

8-byte, large range

decimal / numeric

Fixed-Precision

Arbitrary precision exact numbers

real / float

Floating-Point

4-byte, approx precision ~6 decimal digits

double precision

Floating-Point

8-byte, approx precision ~15 digits

serial, bigserial, smallserial

Auto-increment

Integer auto sequences

Date/Time

Type
Description

timestamp

Date and time (no timezone)

timestamptz

Timestamp with timezone

date

Calendar date (year, month, day)

time

Time of day (no date, no timezone)

timetz

Time with timezone

interval

Time span (e.g., 3 days, 2 hours)

Character

Type
Description

char(n)

Fixed-length string (Spaces are added to pad at the end)

varchar(n)

Variable-length string, with limit

text

Variable unlimited-length string

UUID

Type
Description

uuid

Universally Unique Identifier (By default uses uuidv4)

Boolean

Type
Description

boolean

Stores TRUE, FALSE, or NULL

Enumerated

Type
Description

enum

Custom, user-defined list of valid string values

CREATE TYPE mood AS ENUM ('happy', 'sad', 'ok');

Array

Type
Description

type[]

One-dimensional or multi-dimensional arrays

E.g. text[], integer[][]

Json & Jsonb

Type
Description

json

Text-based JSON (slower, preserves formatting)

jsonb

Binary JSON (faster, no formatting preserved)

Geometric

Type
Description

point

(x, y) coordinate

line

Infinite line

lseg

Line segment

box

Rectangular box

path

Closed/open path

polygon

Polygon

circle

Center and radius

Network Address

Type
Description

cidr

IP network

inet

IP address

macaddr

MAC address

macaddr8

MAC address (EUI-64)

Bit String

Type
Description

bit(n)

Fixed-length bit string

bit varying(n)

Variable-length bit string

Range

Type
Description

int4range

Range of integers (4-byte)

tsrange

Range of timestamps (no timezone)

daterange

Range of dates

Composite (Objects)

Type
Description

row

Custom record-like structures

CREATE TYPE full_name AS (
  first text,
  last text
);

Query

Aggregates

group by

Groups rows by columns.

Can only directly access the columns used to group the rows.

Any other column can only be accessed with an aggregate functions.

count

When using Count to count a particular column, have in mind that null values of this column are not counted.

Use Count(*) then, to count the rows.

having

Filter a set of groups.

Utility functions

greatest

Pick the highest out of a list of values.

SELECT GREATEST(col1, col2, x, ...)

least

Pick the lowest out of a list of values.

SELECT LEAST(col1, col2, x, ...)

case

To use ifs in a select statement.

If ELSE is not used, a null value will be returned if none of the conditions were met.

SELECT
  col1,
  CASE
    WHEN col2 > x THEN 'value1'
    WHEN col2 < x THEN 'value2'
    ELSE 'values3'
  END

Recursive CTE (Common Table Expressions)

A way to perform recursive queries, meaning a query that calls itself until a stopping condition is met.

Useful when having a tree or graph-type data structure.

  1. The Anchor member runs once and provides the initial dataset.

  2. The Recursive member runs repeatedly, using results from previous iterations.

    1. The process stops when the recursive part returns no new rows.

WITH RECURSIVE countdown(val) AS (
  SELECT 3 AS val -- Anchor member (base case)
  UNION
  SELECT val - 1 FROM countdown WHERE val > 1 -- Recursive member (self-reference)
)
SELECT * FROM countdown;

Example

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name TEXT,
    manager_id INT REFERENCES employees(id)
);

-- Recursive query to get subordinates of employee with id = 1

WITH RECURSIVE emp_tree(id, name, manager_id) AS (
    -- Anchor: CEO or root person
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 1

    UNION ALL

    -- Recursive part: find direct reports
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN emp_tree et ON e.manager_id = et.id
)
SELECT * FROM emp_tree;

Union vs Union All

There is a difference in using UNION and UNION ALL in the recursive block.

Union All

  • The recursive part reuses all rows, including duplicates.

  • This is usually what you want.

  • It's faster because there's no sorting or deduplication.

  • You'll need to manually prevent infinite loops if needed. (e.g., via path tracking)

Union

  • PostgreSQL will remove duplicate rows at every recursion level.

  • This adds CPU cost due to hashing or sorting.

  • If you're not careful, it can also block valid recursion paths.

You may think this helps with cycle prevention, but:

  • It only deduplicates entire rows — so even small changes can create new "unique" rows.

  • It doesn’t guarantee correct logic in graphs/hierarchies.

Fixing infinite loops

To prevent cycles or infinite loops, build a path history with an array:

WITH RECURSIVE emp_tree(id, name, manager_id, path_ids) AS (
    SELECT id, name, manager_id, ARRAY[id] AS path_ids
    FROM employees
    WHERE id = 1

    UNION ALL

    SELECT e.id, e.name, e.manager_id, et.paths_ids || e.id -- build history path
    FROM employees e
    INNER JOIN emp_tree et ON e.manager_id = et.id
    -- Don't revisit rows where the ID was already seen
    WHERE NOT e.id = ANY(et.path_ids)
)
SELECT * FROM emp_tree;

Limiting depth of recusion

WITH RECURSIVE emp_tree(id, name, manager_id, depth) AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE id = 1

    UNION ALL

    SELECT e.id, e.name, e.manager_id, et.depth + 1
    FROM employees e
    INNER JOIN emp_tree et ON e.manager_id = et.id
    -- Depth limit of 5 (Don't go deeper than 4 levels of subordinates)
    WHERE et.depth< 5
)
SELECT * FROM emp_tree;

Views

Views are persistent "fake" tables that have rows from other tables. These can be exact rows as they exists on another table, or a computed value.

They can be referenced in any place where you would normally reference a table.

Creating

CREATE VIEW view_name AS (
    -- In here, any kind of SELECT query
    SELECT * FROM somewhere
);

Updating

CREATE OR REPLACE VIEW view_name AS (
    -- In here, any kind of SELECT query
    SELECT * FROM somewhere
);

Deleting

DROP VIEW view_name;

Materialized Views

With Materialized Views, the query gets executed only at specific times, but the results are saved and can be referenced without rerunning the query.

The ideia is that it will not garantee data integrity, since it can hold old values, until its data is refreshed.

Creating

CREATE MATERIALIZED VIEW view_name (
    -- In here, the heavy query
    SELECT * FROM somewhere
) WITH DATA;

WITH DATA will tell Postgres to automatically run the query, one time, when the Materialized View is created.

Refreshing data

REFRESH MATERIALIZED VIEW view_name;

Internals

The Data

Where is all the data stored

In Postgres, all data is generally stored inside /base folder.

You can check where this folder is with:

SHOW data_directory;

Find which data folder belongs to which database with:

SELECT oid, datname FROM pg_database;

There are some different types of files for different purposes.

The Heap or Heap file, is the file that contains all the data (rows) of a table, (usually the files that has a number as name).

A single Heap file is divided into Blocks or Pages, where each page/block stores some number of rows.

At last, an individual row is called Tuple or Item.

Drawing
Drawing

Explain & Explain Analyze

EXPLAIN is a tool to show the Planner Step of a query, which is the step that decides the most optimal way to execute the desired query.

EXPLAIN builds a query plan and display info about it.

EXPLAIN ANALYZE builds a query plan, run it, and info about it.

EXPLAIN ANALYZE output

EXPLAIN ANALYZE
SELECT * FROM table;
Example
Hash Join (cost=8.28..1756.30 rows=11 width=81) (actual time=0.030..34.031 rows=7 loops=1)
  Hash Cond: (comments.user_id = users.id)
  -> Seq Scan on comments (cost=0.00..1589.30 rows=60410 width=72) (actual time=0.030..14.031 rows=60410 loops=1)
  -> Hash (cost=8.30..8.30 rows=1 width=17) (actual time=0.050..0.051 rows=1 loops=1)
    Buckets: 1024 Batches: 1 Memory Usage: 9kb
    -> Index Scan using users_username_idx on users (cost=0.28..8.30 rows=1 width=17) (actual time=0.030..0.031 rows=1 loops=1)
       Index Cond: ((username)::text = 'Admin'::text)
Planning Time: 3.4545 ms
Execution Time: 34.496 ms

Lines starting with arrows -> are query nodes, which are the steps trying to access data inside the database or trying to do some processing. (The first line is also a query node, even though it doesn't have the arrow)

The order to read the EXPLAIN planning result, is starting from the inner most rows.

And the results from each the query nodes, will be passed to the nearest query node PARENT.

Drawing

Not the full calculation, only some of the basic variables.

Drawing

Tip

Not all query types will use all the variables.

For a sequencial read, as example, only #page read sequentially and #rows scanned.

pg_stats

Check them with:

SELECT * FROM pg_stats WHERE tablename = 'table-name';

The Connection

Connection pools

A pool internally maintains several different clients that can be reused.

Drawing

Indexes

About

Index is a data structure that efficiently tells which block/index a record is stored at, thus significantly reducing search time, and avoiding Full Table Scans.

Drawing

Create an index with

CREATE INDEX <index-name> ON table (column, ...);

Where index-name conventionally follows table_colname1_colname2_idx.

Delete an index with

DROP INDEX <index-name>;

Check how much space indexes take in relation to the table

SELECT pg_size_pretty(pg_relation_size('table-name'));

SELECT pg_size_pretty(pg_relation_size('index-name'));

Types of index

Type
Description

B-Tree

General purpose index. 99% of the time you want this.

Hash

Speeds up simple equality checks.

GiST

Geometry, full-text search.

SP-GiST

Clustered data, such as dates - many rows might have the same year.

GIN

For columns that contain arrays of JSON data.

BRIN

Specialized for really large datasets.

Schemas

Schemas in Postgres are like folders to organize things in a database. Each Schema can have its own separate copy of a table.

Every database gets a default schema called "public".

Creating

When creating a new schema in a database, it will not pull the tables nor data from the public schema. It will be totally empty.

CREATE SCHEMA schema_name;

Creating tables in this schema

CREATE TABLE schema_name.table_name (
    -- columns
);

Changing default schema

Show how Postgres is handling the default value

SHOW search_path;

Usually, the default result of this query will be "$user", public. This means that it will always - if not specified - try to use:

  1. First, a schema named after the User that holds the connection.

  2. Or if it does not exist, the schema named public.

Change the search path

SET search_path TO test, public;

This will change the search path to be test, public.

Meaning it will first, try to find a schema named test. And if not existent, then the public one.

Last updated