Observability Stack

Monowar
Oct 26, 2022

Observability is the tooling, or solution, that lets us actively debug a running system. It is based on exploring properties and patterns that were not defined in advance, rather than only checking for known failure conditions. Observability matters because it gives us visibility into what is happening inside the system. The three pillars of observability are logs, metrics and traces.

A Dockerized Grafana/Prometheus/VictoriaMetrics/Loki/Jaeger environment

  • Ensure Docker and docker-compose are installed and running (see https://docs.docker.com/get-docker/)
  • Run docker-compose up from the project directory
  • Once the containers are up, connect to http://localhost:3000 (or http://<ip of server>:3000)
  • The default Grafana credentials are admin/passw0rd
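Grafana exposes a health endpoint, /api/health, whose JSON response reports whether its database is working. The minimal sketch below polls that endpoint after docker-compose up, so a script can wait until Grafana is ready before you log in. The base URL, timeout and poll interval are assumptions matching the steps above.

```python
import json
import time
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # assumption: the default port used in the steps above


def grafana_is_healthy(body: bytes) -> bool:
    """Return True if a Grafana /api/health response body reports a working database."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("database") == "ok"


def wait_for_grafana(base_url: str = GRAFANA_URL, timeout_s: float = 60.0) -> bool:
    """Poll /api/health until Grafana reports healthy or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/api/health", timeout=5) as resp:
                if grafana_is_healthy(resp.read()):
                    return True
        except OSError:  # URLError/HTTPError and connection errors: container not ready yet
            pass
        time.sleep(2)
    return False
```

Calling wait_for_grafana() from a setup script or CI step returns True once the container answers, or False after the timeout.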

Grafana is a multi-platform, open-source analytics and interactive visualization web application. It lets you query, visualize, alert on and understand your metrics.

Prometheus is an open-source monitoring solution for collecting and aggregating metrics as time series data.

VictoriaMetrics is an open-source time series database (see https://github.com/VictoriaMetrics/VictoriaMetrics).

When collecting 280K samples per second, both VictoriaMetrics and Prometheus write data to disk at roughly 2 MB/s. Prometheus, however, produces more disk-write spikes, and much larger ones: up to 50 MB/s, versus a maximum of 15 MB/s for VictoriaMetrics. When scraping thousands of node_exporter targets, VictoriaMetrics needs up to 5x less RAM and 7x less disk space than Prometheus. Either one can serve as the metrics store in this stack, so you can pick whichever fits your scale.
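The figures above can be turned into a quick sanity check. This small sketch, assuming 1 MB = 10^6 bytes, derives the average number of bytes written per ingested sample and how much larger Prometheus's peak write spikes are. Keep in mind that write throughput is not the same as the compressed on-disk size per sample.

```python
def bytes_written_per_sample(write_rate_mb_s: float, samples_per_s: float) -> float:
    """Average bytes written to disk per ingested sample (assumes 1 MB = 10**6 bytes)."""
    return write_rate_mb_s * 1_000_000 / samples_per_s


per_sample = bytes_written_per_sample(2, 280_000)  # ~2 MB/s at 280K samples/s, as quoted
spike_ratio = 50 / 15  # Prometheus vs VictoriaMetrics peak write spikes, in MB/s

print(f"~{per_sample:.1f} bytes written per sample")  # ~7.1
print(f"Prometheus peaks are ~{spike_ratio:.1f}x higher")  # ~3.3
```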

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system. It is designed to be very cost-effective and easy to operate. Rather than indexing the contents of the logs, it indexes only a set of labels for each log stream (see https://github.com/grafana/loki).
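To make the labels-versus-contents distinction concrete, here is a toy in-memory model (hypothetical, not Loki's code or API). An inverted index maps each label pair to the streams that carry it, so selecting streams is a cheap index lookup, while filtering by text is a brute-force scan of only the selected streams' lines. The label names job and host are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


class LabelIndexedLogStore:
    """Toy illustration of Loki's design choice: index labels, never log contents."""

    def __init__(self) -> None:
        # A stream is identified by its full label set and keeps its raw lines.
        self._streams: dict[frozenset, list[str]] = defaultdict(list)
        # Inverted index: (label, value) -> streams carrying that pair.
        self._index: dict[tuple[str, str], set[frozenset]] = defaultdict(set)

    def push(self, labels: dict[str, str], line: str) -> None:
        key = frozenset(labels.items())
        self._streams[key].append(line)
        for pair in key:
            self._index[pair].add(key)

    def query(self, selector: dict[str, str], contains: str = "") -> list[str]:
        # Step 1: index lookup picks the streams whose labels match the selector.
        matches = [self._index.get(pair, set()) for pair in selector.items()]
        streams = set.intersection(*matches) if matches else set(self._streams)
        # Step 2: brute-force scan of only those streams' lines for the text filter.
        return [line for key in streams for line in self._streams[key] if contains in line]


store = LabelIndexedLogStore()
store.push({"job": "varlogs", "host": "a"}, "kernel: eth0 link up")
store.push({"job": "varlogs", "host": "b"}, "kernel: disk error")
store.push({"job": "nginx", "host": "a"}, "GET / 200")
print(store.query({"job": "varlogs"}, contains="error"))
```

Keeping the index this small is what makes the approach cheap: its size grows with the number of distinct label sets, not with the volume of log text.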

Promtail is the agent responsible for gathering logs and sending them to Loki.
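Loki receives log lines over HTTP at /loki/api/v1/push. The sketch below builds the JSON body that endpoint accepts: one entry per label set, where each value is a [timestamp-in-nanoseconds-as-string, line] pair. It only illustrates the shape of the data an agent such as Promtail delivers and is not Promtail's code. The URL assumes Loki's default port 3100 is published by the compose file, and the labels are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: Loki listens on its default port 3100 and the compose file publishes it.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_payload(labels, lines, ts_ns=None):
    """Build a Loki JSON push body with one stream; timestamps are nanoseconds as strings."""
    start = time.time_ns() if ts_ns is None else ts_ns
    # Advance by 1 ns per line so the lines keep their order within the stream.
    values = [[str(start + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push_to_loki(labels, lines, url=LOKI_PUSH_URL):
    """POST the payload to Loki and return the HTTP status (Loki typically replies 204)."""
    body = json.dumps(build_push_payload(labels, lines)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

For example, push_to_loki({"job": "demo"}, ["hello from python"]) would send one line that can then be queried in Grafana with the selector {job="demo"}.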

Jaeger is an open-source distributed tracing tool for monitoring and troubleshooting transactions in distributed systems (see https://github.com/jaegertracing/jaeger).
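A trace is a tree of timed spans that share one trace ID, with each child span pointing at its parent. The toy tracer below (hypothetical, not Jaeger's client API) records that structure in-process to show what Jaeger stores and visualizes; in practice services are instrumented with a client library such as the OpenTelemetry SDK, which exports spans to Jaeger. The ID sizes follow the common 128-bit trace ID / 64-bit span ID convention.

```python
from __future__ import annotations

import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    """One timed operation; spans sharing a trace_id belong to the same request."""
    name: str
    trace_id: str  # 128-bit id as 32 hex chars, shared by every span in the trace
    span_id: str   # 64-bit id as 16 hex chars, unique per span
    parent_id: str | None  # None marks the root span
    start: float = field(default_factory=time.perf_counter)
    end: float | None = None


class ToyTracer:
    """Hypothetical in-process tracer; a real service would export spans to Jaeger."""

    def __init__(self) -> None:
        self.finished: list[Span] = []
        self._active: list[Span] = []  # stack of currently open spans

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._active[-1] if self._active else None
        current = Span(
            name=name,
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            parent_id=parent.span_id if parent else None,
        )
        self._active.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._active.pop()
            self.finished.append(current)


# Usage: one request that fans out into two child operations.
tracer = ToyTracer()
with tracer.span("checkout") as root:
    with tracer.span("query-db"):
        pass
    with tracer.span("call-payment-api"):
        pass

for s in tracer.finished:
    print(s.name, s.trace_id[:8], s.parent_id)
```

Jaeger's UI essentially reconstructs this parent/child tree from the stored spans and draws each span's start and end as a timeline.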

Telegraf is an agent for collecting and sending metrics and events from a wide range of systems (see https://github.com/influxdata/telegraf).

Telegraf's input plugins are listed at https://github.com/influxdata/telegraf/tree/master/plugins/inputs.
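Telegraf's metrics follow the InfluxDB line protocol data model: a measurement name, optional tags, one or more fields and an optional nanosecond timestamp. The sketch below serializes a single point in that format, applying the documented backslash escaping for commas, spaces and equals signs. It is illustrative only, since Telegraf produces these lines itself; the measurement, tags and values are hypothetical (usage_idle mirrors a field of Telegraf's cpu input plugin).

```python
from __future__ import annotations


def _escape(text: str, special: str) -> str:
    """Backslash-escape each character in `special` (line protocol escaping rules)."""
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value: bool | int | float | str) -> str:
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement: str, tags: dict[str, str],
                     fields: dict[str, bool | int | float | str],
                     timestamp_ns: int | None = None) -> str:
    """Serialize one point as: measurement,tag=v,... field=v,... [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # sorting tags by key is the recommended order
        head += f",{_escape(key, ',= ')}={_escape(val, ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_format_field_value(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


print(to_line_protocol("cpu", {"host": "server01", "cpu": "cpu-total"},
                       {"usage_idle": 97.5, "usage_user": 1.2}, 1666742400000000000))
```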

Download: https://github.com/mhoshim/observability

Screenshots

System Metrics
Kernel Logs


Monowar

Technology Leader | Cloud Architect | DevOps | AWS Certified Professional