Prometheus

Power your metrics and alerting with a leading open-source monitoring solution.

Features

  • Multi-dimensional data model with time series data identified by metric name and key/value pairs, making it a powerful tool for data collection.

  • PromQL, a flexible query language that leverages this dimensionality to build powerful queries.

  • Efficient storage, no reliance on distributed storage; single server nodes are autonomous.

  • Easy integration with Grafana and other clients. (Despite being a powerful tool, Prometheus only ships a basic built-in expression browser and no dashboards for displaying its data.)

  • Intelligent alert system.
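
To make the multi-dimensional data model more concrete, here is a minimal Python sketch in which a series is identified by its metric name plus its set of key/value labels. The `store`, `append`, and `select` names are hypothetical helpers invented for illustration; they are not part of Prometheus or its client libraries, and `select` only supports equality matching.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series identity: metric name plus an unordered set of (label, value) pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]

# Hypothetical in-memory store: series identity -> list of (timestamp, value).
store: Dict[SeriesKey, List[Tuple[float, float]]] = {}


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Same name and same labels (in any order) give the same series."""
    return (name, frozenset(labels.items()))


def format_series(name: str, labels: Dict[str, str]) -> str:
    """Render the identity in the usual name{key="value",...} notation."""
    if not labels:
        return name
    inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{inner}}}"


def append(name: str, labels: Dict[str, str], ts: float, value: float) -> None:
    """Add one sample to the series identified by name + labels."""
    store.setdefault(series_key(name, labels), []).append((ts, value))


def select(name: str, **matchers: str) -> Dict[SeriesKey, List[Tuple[float, float]]]:
    """Return every series with this name whose labels include all matchers."""
    wanted = set(matchers.items())
    return {k: v for k, v in store.items() if k[0] == name and wanted <= k[1]}
```

Calling `append` twice with the same labels in a different order extends one series, while changing any label value creates a new one. This is why each label combination counts as a separate time series.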

Components

The Prometheus ecosystem consists of multiple components:

  • The main Prometheus server, which scrapes and stores time series data.

  • Client libraries for instrumenting application code.

  • A push gateway for supporting short-lived jobs.

  • Special-purpose exporters for services.

  • An Alertmanager to handle alerts.

  • Other support tools.

Architecture

Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs.

It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts.

Grafana or other API consumers can be used to visualize the collected data.

Retrieval

The component that scrapes (pulls) metrics from the configured targets and brings them into the server.
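
As a rough illustration of what retrieval has to do with a scraped response, the sketch below parses a small subset of the Prometheus text exposition format into `(name, labels, value)` tuples. It is a simplification under stated assumptions: it skips comment lines, ignores optional timestamps, and does not handle escaped quotes inside label values. `parse_exposition` is a hypothetical name, not a Prometheus API.

```python
import re
from typing import Dict, List, Tuple

# metric_name, optional {labels}, then the sample value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs (assumes no escaped quotes in the value).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse simple exposition-format lines into (name, labels, value)."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and "# HELP" / "# TYPE" comments carry no samples.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Skip anything this simplified parser cannot read.
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```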

TSDB (Time series Database)

The component that stores all samples as time series.

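How the data is stored changes over time: recent samples are held in an in-memory head block, while older samples are compacted into larger blocks on disk and deleted once the retention period expires.

Prometheus itself keeps old samples at full resolution. Having old data that is coarser than recent data requires recording rules or an external long-term store that downsamples.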

HTTP server

An internal HTTP server that makes all the data available to the outside (API and web UI) and also exposes Prometheus's own metrics, so it can monitor itself.
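
The idea of exposing metrics over HTTP can be sketched with Python's standard library alone. The example below serves one made-up counter on `/metrics` in the text exposition format. It is only an illustration of the pattern, not how Prometheus implements its own server, and the metric name `demo_requests_served_total` is an assumption.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical process-wide counter exposed by this demo.
REQUESTS_SERVED = 0


def render_metrics() -> str:
    """Render the current state in the text exposition format."""
    return (
        "# HELP demo_requests_served_total Scrapes served by this demo.\n"
        "# TYPE demo_requests_served_total counter\n"
        f"demo_requests_served_total {REQUESTS_SERVED}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        global REQUESTS_SERVED
        if self.path != "/metrics":
            self.send_error(404)
            return
        REQUESTS_SERVED += 1
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # Keep the demo quiet.


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on localhost."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `127.0.0.1:8000` would be one way to try it. The port is arbitrary.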

Service Discovery

Prometheus can communicate with service discovery mechanisms to find new targets to monitor (for instance, in dynamic architectures where nodes are added and removed by auto-scaling).
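
One of the simplest discovery mechanisms is file-based: Prometheus watches JSON files that list target groups, each with `targets` and optional `labels`. The sketch below builds that structure from a made-up inventory. The `build_file_sd` helper, the `host` field, and the default port `9100` are assumptions chosen for illustration.

```python
import json
from typing import Dict, List


def build_file_sd(instances: List[dict], port: int = 9100) -> List[Dict]:
    """Group instances by identical labels into file-SD target groups."""
    groups: Dict[tuple, List[str]] = {}
    for inst in instances:
        # Instances sharing the same label set end up in one target group.
        key = tuple(sorted(inst.get("labels", {}).items()))
        groups.setdefault(key, []).append(f'{inst["host"]}:{port}')
    return [
        {"targets": sorted(targets), "labels": dict(key)}
        for key, targets in groups.items()
    ]


def to_json(instances: List[dict], port: int = 9100) -> str:
    """Serialize the target groups as the JSON document to write to disk."""
    return json.dumps(build_file_sd(instances, port), indent=2)
```

An auto-scaling hook could regenerate this file whenever nodes come and go, and Prometheus would pick up the change without a restart.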

Pushgateway

For services or applications that are not running all the time (e.g. an application that runs only once a day at a specific time).

Instead of Prometheus repeatedly and unnecessarily trying to pull metrics from such a job directly, the job pushes its metrics to the Pushgateway, and the Pushgateway "stores" this data until Prometheus scrapes it.

So, the Pushgateway can be used for services that don't produce useful metrics all the time.
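
A short-lived job pushes to the Pushgateway with a plain HTTP request whose path names the job (and optionally an instance). The sketch below only builds such a request with the standard library and leaves sending it to the caller. The gateway URL, job name, and metric used in the usage note are hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_push_request(
    gateway_url: str,
    job: str,
    body: str,
    instance: Optional[str] = None,
) -> urllib.request.Request:
    """Build a PUT request that replaces the metrics of one job group."""
    path = f"/metrics/job/{quote(job, safe='')}"
    if instance:
        path += f"/instance/{quote(instance, safe='')}"
    if not body.endswith("\n"):
        body += "\n"  # The text format expects every line to end with LF.
    return urllib.request.Request(
        gateway_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

A nightly batch job might call `build_push_request("http://pushgateway.example:9091", "nightly_backup", "backup_last_success_timestamp_seconds 1700000000")` as its last step and pass the result to `urllib.request.urlopen`.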

Exporters

Useful for exposing metrics from third-party software (that you cannot instrument yourself).

You can build your own Exporter if needed.

Alertmanager

The Prometheus server evaluates the alerting rules against its data and sends firing alerts to the Alertmanager, which deduplicates, groups, and routes them to receivers such as email or chat.

Metrics

The Prometheus client libraries offer four core metric types.

Counter

A metric with an incremental (cumulative) value.

It can only increase or be reset to zero on restarts.
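
The counter semantics can be sketched in a few lines of Python. The `Counter` class below is a toy stand-in, not the client library class, and `increase` is a simplified version of the idea behind rate-style queries: any drop between two samples is treated as a restart from zero. The real PromQL functions also extrapolate to the window boundaries, which this sketch does not attempt.

```python
from typing import Iterable


class Counter:
    """Toy counter: the value can only go up."""

    def __init__(self) -> None:
        self._value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only be incremented")
        self._value += amount

    @property
    def value(self) -> float:
        return self._value


def increase(samples: Iterable[float]) -> float:
    """Total growth across samples, treating any drop as a reset to zero."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            # After a reset the counter restarted at 0, so the new value
            # itself is the growth since the restart.
            total += (value - prev) if value >= prev else value
        prev = value
    return total
```

For the samples `10, 15, 3, 8` the counter grew by 5, restarted and grew to 3, then grew by another 5, so `increase` returns 13.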

Gauge

It is a metric that represents a single numerical value that can arbitrarily go up and down.

Ex.: measurements of memory usage, temperature, etc.
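
For contrast with a counter, a gauge can be sketched as a value that may be set, raised, or lowered freely. The `Gauge` class below is a toy written for illustration, not the client library class.

```python
class Gauge:
    """Toy gauge: a single value that can go up and down arbitrarily."""

    def __init__(self, value: float = 0.0) -> None:
        self._value = float(value)

    def set(self, value: float) -> None:
        self._value = float(value)

    def inc(self, amount: float = 1.0) -> None:
        self._value += amount

    def dec(self, amount: float = 1.0) -> None:
        self._value -= amount

    @property
    def value(self) -> float:
        return self._value
```

Because a gauge reports a current level rather than an accumulated total, its raw value is usually read directly instead of being turned into a rate.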

Histogram

A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.

Useful for analyzing frequency distributions.
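
A histogram's buckets are cumulative: each bucket counts every observation less than or equal to its upper bound (the `le` label), and an implicit `+Inf` bucket counts everything. The following Python sketch mimics that behaviour. The `Histogram` class and the bucket bounds used in the note are illustrative assumptions, not the client library implementation.

```python
import math
from typing import List, Sequence


class Histogram:
    """Toy histogram with cumulative buckets, a sum, and a count."""

    def __init__(self, bounds: Sequence[float]) -> None:
        # The +Inf bucket always exists and ends up equal to the count.
        self.bounds: List[float] = sorted(bounds) + [math.inf]
        self.bucket_counts: List[int] = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        # Cumulative: increment every bucket whose bound covers the value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1

    def samples(self, name: str) -> List[str]:
        """Render the series a histogram exposes: _bucket, _sum, _count."""
        lines = []
        for bound, count in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if bound == math.inf else str(bound)
            lines.append(f'{name}_bucket{{le="{le}"}} {count}')
        lines.append(f"{name}_sum {self.sum}")
        lines.append(f"{name}_count {self.count}")
        return lines
```

With bounds `0.1, 0.5, 1.0` and observations `0.05, 0.3, 0.7, 2.0`, the cumulative bucket counts are 1, 2, 3 and 4. Quantiles are then estimated at query time from these counts.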

Summary

Similar to a histogram, a summary samples observations (usually things like request durations and response sizes).

While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
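
The sliding-window behaviour of a summary can be sketched as follows. This toy `Summary` keeps raw observations for `max_age_seconds` and computes an exact nearest-rank quantile over them. Real client libraries typically use streaming approximations instead, so treat this as an illustration of the semantics only. The class name, the parameter names, and the explicit `now` argument are assumptions made for testability.

```python
import math
from collections import deque
from typing import Deque, Tuple


class Summary:
    """Toy summary: lifetime count/sum plus quantiles over a time window."""

    def __init__(self, max_age_seconds: float = 600.0) -> None:
        self.max_age = max_age_seconds
        self.count = 0      # Total observations, never expires.
        self.sum = 0.0      # Sum of all observations, never expires.
        self._window: Deque[Tuple[float, float]] = deque()  # (time, value)

    def _expire(self, now: float) -> None:
        # Drop observations older than the window.
        while self._window and now - self._window[0][0] > self.max_age:
            self._window.popleft()

    def observe(self, value: float, now: float) -> None:
        self.count += 1
        self.sum += value
        self._window.append((now, value))
        self._expire(now)

    def quantile(self, q: float, now: float) -> float:
        """Nearest-rank quantile over observations still in the window."""
        self._expire(now)
        values = sorted(v for _, v in self._window)
        if not values:
            return math.nan
        rank = max(0, math.ceil(q * len(values)) - 1)
        return values[rank]
```

Note the asymmetry: `count` and `sum` cover the whole lifetime of the process, while the quantiles only reflect the recent window.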

The documentation is kind of weak, but Grafana, for instance, has really good autocompletion for Prometheus queries.
