Telemetry and Metrics in Elixir
Learn about the metrics you can collect in Elixir with the telemetry_metrics library
In the previous article, I introduced you to Telemetry. The telemetry library provides a way to:
- generate events
- create handler functions that will receive the events
- register which handlers will be called when an event occurs
The principal advantage of telemetry is that it offers a standardized way to emit events and collect measurements in Elixir. This way, third-party libraries and code that we write ourselves can use it to generate events in a consistent way.
Let's now take a step back and consider a couple of things. An event is just a measurement of something that happened at a given point in time. By itself, it doesn't tell us much. We need more data to compare it against, or to infer a trend or a rate of change. A single measurement on its own is not very useful.
With that in mind, the telemetry developers created a library called telemetry_metrics. According to their documentation, telemetry_metrics is:
Common interface for defining metrics based on :telemetry events.
Ok, but what are metrics, and how are they different from the measurements taken with events? Well, again according to their docs:
Metrics are aggregations of Telemetry events with specific name, providing a view of the system's behavior over time.
Ok, it seems clear, right? Not really.
If you start using the telemetry_metrics library you'll notice that, in fact, it DOESN'T AGGREGATE anything!
What?
Exactly that. It doesn't aggregate anything.
If you generate 1000 events with a measurement each, this library is not going to keep, manipulate, calculate, summarize or do anything with them by itself.
So, what is it useful for, you might ask?
Its aim is to define what kind of metrics you're going to create from events. That's it. Just a declaration, a definition, a contract if you will. Nothing else.
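To make the "just a declaration" point concrete, here is a minimal sketch. It assumes a standalone script run with `elixir`, network access so that Mix.install (available since Elixir 1.12) can fetch the package, and a metric name chosen purely for illustration. Building a counter returns a plain struct describing the metric; no handler is attached and nothing is counted.

```elixir
# Standalone script; assumes Mix.install can download telemetry_metrics.
Mix.install([{:telemetry_metrics, "~> 0.6.1"}])

# Defining a metric only returns a data structure that describes it.
# The metric name is an illustrative assumption.
metric = Telemetry.Metrics.counter("metrics.emit.value")

IO.inspect(metric.__struct__, label: "struct")
IO.inspect(metric.event_name, label: "event name")
IO.inspect(metric.measurement, label: "measurement")
```

Emitting events at this point would change nothing in `metric`: it is only a description that a reporter can read later.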
So, how do you get the real metrics, the processed results of your carefully collected measurements?
For that, you need something else. You need a reporter.
The reporter's responsibility is to aggregate and process each event received so that it conforms to the specified metric it handles.
telemetry_metrics ships with a simple reporter called ConsoleReporter that doesn't do much. It only outputs the measurement and metric details to the console. That's not much, but it allows you to verify that everything works and that you're collecting the data needed to generate metrics.
For more useful reporters you should either write them yourself or get them from someone who has already written them.
In this article, I am going to show you how to use the default ConsoleReporter and how to create a custom reporter.
Prerequisites
Use asdf and install these versions of Elixir and Erlang:
asdf install erlang 24.2.1
asdf global erlang 24.2.1
asdf install elixir 1.13.3-otp-24
asdf global elixir 1.13.3-otp-24
Create an Elixir app
mix new metrics --sup
This will create an Elixir app with an application callback where we can attach our telemetry events and configure our metrics.
Install dependencies
Add telemetry and telemetry_metrics to your mix.exs:
defp deps do
  [
    {:telemetry, "~> 1.0"},
    {:telemetry_metrics, "~> 0.6.1"}
  ]
end
Create an event emitter
Let's create a simple event emitter that we can use to test our metrics. Open the lib/metrics.ex file that mix created for you and replace its contents with this:
defmodule Metrics do
  def emit(value) do
    :telemetry.execute([:metrics, :emit], %{value: value})
  end
end
As you can see, it uses :telemetry.execute to emit an event, passing the value we provide.
Define some metrics
Ok, now the interesting part: let's define some metrics we want to collect. telemetry_metrics allows you to define several types of metrics:
- counter/2, which counts the total number of emitted events
- sum/2, which keeps track of the sum of the selected measurement
- last_value/2, which holds the value of the selected measurement from the most recent event
- summary/2, which calculates statistics of the selected measurement, like maximum, mean, percentiles, etc.
- distribution/2, which builds a histogram of the selected measurement
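As a rough sketch of what these five definitions look like side by side, the script below builds one of each for the same measurement. It assumes a standalone script using Mix.install with telemetry_metrics 0.6.x (where, to my knowledge, distribution no longer requires bucket options), and it reuses an illustrative metric name.

```elixir
# Standalone script; assumes Mix.install can download telemetry_metrics.
Mix.install([{:telemetry_metrics, "~> 0.6.1"}])

# One definition of each type for the same (illustrative) measurement.
metrics = [
  Telemetry.Metrics.counter("metrics.emit.value"),
  Telemetry.Metrics.sum("metrics.emit.value"),
  Telemetry.Metrics.last_value("metrics.emit.value"),
  Telemetry.Metrics.summary("metrics.emit.value"),
  Telemetry.Metrics.distribution("metrics.emit.value")
]

# Each definition is a different struct; that is how a reporter tells them apart.
for metric <- metrics do
  IO.puts(inspect(metric.__struct__))
end
```

The struct type is the only thing that distinguishes these definitions from each other; how each one is actually computed is left to the reporter.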
We are going to start with the basics and define a counter metric, assuming we want to count how many times the event has happened.
Create a new file lib/metrics/telemetry.ex and put this in it:
defmodule Metrics.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  def init(_arg) do
    children = [
      {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp metrics do
    [
      counter("metrics.emit.value")
    ]
  end
end
Let's analyze it. First of all, it is a simple Supervisor that some other process needs to start in order for it to work. We'll attach it to our main application supervision tree. More on that later.
Then, the init function configures the children of this supervisor. We have a single child here, the ConsoleReporter, and we pass a list of metrics to it as its initial argument.
Here is where it becomes interesting. Telemetry.Metrics.counter is the definition of the metric we want to use. We pass a string: "metrics.emit.value". This string is split using the period as separator, and everything but the last part (metrics.emit) becomes the event to attach to. The last part (value) is the key that will be looked up in the measurements passed to the telemetry event handler.
Let's explain it. We have this call in Metrics.emit/1:
:telemetry.execute([:metrics, :emit], %{value: value})
and we have a metrics definition in Metrics.Telemetry.metrics/0:
counter("metrics.emit.value")
As you see, the counter is going to attach a handler for the [:metrics, :emit] event and get the :value key from the second argument of the :telemetry.execute call (the measurements):
- event: "metrics.emit" -> [:metrics, :emit]
- measurement attribute: "value" -> :value
If we were collecting the :query_time of the [:my_app, :repo, :query] event, we would write "my_app.repo.query.query_time".
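The naming convention described above can be sketched in a few lines of plain Elixir. This is only an illustration of the rule, not necessarily how telemetry_metrics implements it, and the MetricNameSketch module is a hypothetical name introduced for this example.

```elixir
defmodule MetricNameSketch do
  # Splits "a.b.c" into the event name [:a, :b] and the measurement key :c.
  # Hypothetical helper that only illustrates the convention.
  def split(metric_name) when is_binary(metric_name) do
    segments =
      metric_name
      |> String.split(".")
      |> Enum.map(&String.to_atom/1)

    # A negative count makes Enum.split/2 count from the end of the list,
    # so the last segment ends up alone in the second list.
    {event_name, [measurement]} = Enum.split(segments, -1)
    {event_name, measurement}
  end
end

IO.inspect(MetricNameSketch.split("metrics.emit.value"))
IO.inspect(MetricNameSketch.split("my_app.repo.query.query_time"))
```

The first element of each tuple is what the handler is attached to; the second is the key read from the measurements map.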
Ok, let's continue.
We are using the ConsoleReporter to collect our metrics. We defined the counter metric. We also said that the ConsoleReporter does nothing but output the values received. That is enough, for now, to check that we are in fact collecting metrics with our ConsoleReporter.
But one thing is missing. We need to attach our Metrics.Telemetry supervisor to the application supervisor; otherwise, nothing will happen. Open application.ex and change the start/2 function to this:
def start(_type, _args) do
  children = [
    Metrics.Telemetry
  ]

  opts = [strategy: :one_for_one, name: Metrics.Supervisor]
  Supervisor.start_link(children, opts)
end
As you see, we are putting our Metrics.Telemetry as a child of the application supervisor.
Test it
Let's try it with iex. Open a shell terminal, get the dependencies and open an iex session:
mix deps.get
iex -S mix
Emit an event:
iex(1)> Metrics.emit(4)
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: metrics.emit
All measurements: %{value: 4}
All metadata: %{}
Metric measurement: :value (counter)
Tag values: %{}
:ok
iex(2)>
Yay, it works. Although we didn't attach any handler to the [:metrics, :emit] event manually, the ConsoleReporter handled it for us thanks to telemetry_metrics. As I said, it only echoes what is passed in, but the important thing here is that it works.
If the reporter were a little more advanced it would do something with the passed value.
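For comparison, this is roughly the manual work a reporter saves us: attaching a handler to the event by hand. It is a minimal sketch that assumes a standalone script using Mix.install; the ManualHandlerSketch module and the handler id are hypothetical names. The handler simply forwards what it receives to the pid passed as its config.

```elixir
# Standalone script; assumes Mix.install can download telemetry.
Mix.install([{:telemetry, "~> 1.0"}])
{:ok, _apps} = Application.ensure_all_started(:telemetry)

defmodule ManualHandlerSketch do
  # Handlers receive the event name, the measurements, the metadata and the
  # config given at attach time. Here the config is the pid to notify.
  def handle_event(event_name, measurements, metadata, notify_pid) do
    send(notify_pid, {:telemetry_event, event_name, measurements, metadata})
  end
end

:ok =
  :telemetry.attach(
    "manual-handler-sketch",
    [:metrics, :emit],
    &ManualHandlerSketch.handle_event/4,
    self()
  )

# Same call that Metrics.emit/1 makes in the article.
:telemetry.execute([:metrics, :emit], %{value: 4})

receive do
  {:telemetry_event, event_name, measurements, _metadata} ->
    IO.inspect({event_name, measurements}, label: "handled")
after
  1_000 -> IO.puts("no event received")
end

:telemetry.detach("manual-handler-sketch")
```

A reporter does the same attach and detach calls, but derives the event names from the metric definitions it is given.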
CustomReporter
Let's create a simple CustomReporter ourselves to see how this works. One thing to notice is that the reporter needs to remember previous values in order to do its calculations. In our case, we need to remember how many events we have received so far, so that we can increment that number every time we receive a new event. We could use :ets, a GenServer or an Agent, depending on how complex our implementation needs to be. For this tutorial, I am going to use an Agent, as it is a very simple way to keep a value around between events.
Create a file called lib/metrics/telemetry/reporter_state.ex and write this:
defmodule Metrics.Telemetry.ReporterState do
  use Agent

  def start_link(initial_value) do
    Agent.start_link(fn -> initial_value end, name: __MODULE__)
  end

  def value do
    Agent.get(__MODULE__, & &1)
  end

  def increment do
    Agent.update(__MODULE__, &(&1 + 1))
  end
end
Let's attach it as a child to our main application supervisor so that it is started when the app starts. Edit application.ex and change the start/2 function to this:
def start(_type, _args) do
  children = [
    {Metrics.Telemetry.ReporterState, 0},
    Metrics.Telemetry
  ]

  opts = [strategy: :one_for_one, name: Metrics.Supervisor]
  Supervisor.start_link(children, opts)
end
Now the application will automatically start an agent with an initial value of 0 that we can use from any other part of the app through its name, Metrics.Telemetry.ReporterState. Let's try it:
iex -S mix
iex(1)> Metrics.Telemetry.ReporterState.value()
0
iex(2)> Metrics.Telemetry.ReporterState.increment()
:ok
iex(3)> Metrics.Telemetry.ReporterState.value()
1
Nice, we have our agent working. We can use it to maintain the state of our custom reporter.
Let's write our reporter. Create a file called lib/metrics/telemetry/custom_reporter.ex:
defmodule Metrics.Telemetry.CustomReporter do
  use GenServer

  alias Metrics.Telemetry.ReporterState
  alias Telemetry.Metrics

  def start_link(metrics: metrics) do
    GenServer.start_link(__MODULE__, metrics)
  end

  @impl true
  def init(metrics) do
    # Trap exits so that terminate/2 runs and the handlers get detached.
    Process.flag(:trap_exit, true)

    # One handler per event, shared by every metric defined on that event.
    groups = Enum.group_by(metrics, & &1.event_name)

    for {event, metrics} <- groups do
      id = {__MODULE__, event, self()}
      :telemetry.attach(id, event, &__MODULE__.handle_event/4, metrics)
    end

    {:ok, Map.keys(groups)}
  end

  # Called by :telemetry in the process that emitted the event.
  def handle_event(_event_name, measurements, metadata, metrics) do
    Enum.each(metrics, &handle_metric(&1, measurements, metadata))
  end

  defp handle_metric(%Metrics.Counter{} = metric, _measurements, _metadata) do
    ReporterState.increment()
    current_value = ReporterState.value()
    IO.puts("Metric: #{metric.__struct__}. Current value: #{current_value}")
  end

  defp handle_metric(metric, _measurements, _metadata) do
    IO.puts("Unsupported metric: #{metric.__struct__}")
  end

  @impl true
  def terminate(_, events) do
    for event <- events do
      :telemetry.detach({__MODULE__, event, self()})
    end

    :ok
  end
end
A lot of things are happening here, but don't get distracted by those. What you should notice is the handle_event function, which distributes the event to all the specific metric handlers to, well, handle the event.
The handle_metric function uses ReporterState to keep the current state and increments the running counter that we use to track how many times we have received this event. It also prints the new current state to the console.
Let's use our CustomReporter instead of the ConsoleReporter. Change the init/1 function in telemetry.ex to be like this:
def init(_arg) do
  children = [
    {Metrics.Telemetry.CustomReporter, metrics: metrics()}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
As you see, now we use our CustomReporter and we pass the metrics to it just as we did with the ConsoleReporter. Let's try it. Open the console and start iex:
iex -S mix
iex(1)> Metrics.emit(4)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 1
:ok
iex(2)> Metrics.emit(5)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 2
:ok
iex(3)> Metrics.emit(2)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 3
:ok
And there it is, now the counter is correctly keeping track of how many times we have received the event.
Let's review a few more things about the CustomReporter. Most of them are better explained in the official docs.
It traps the exit signal to detach the handlers when the app terminates or the process is about to finish. It also groups the metrics by event, so that an event tracked by several metrics gets a single handler. For example, suppose you want to collect counter and sum metrics for "my_app.repo.query.query_time". This code gets the event once and then calls handle_metric two times, once with the %Telemetry.Metrics.Counter{} as the first argument and once with the %Telemetry.Metrics.Sum{} as the first argument. For both of those calls, the second argument is the measurements map we just received.
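The grouping step can be tried in isolation. The sketch below assumes a standalone script using Mix.install and reuses the metric name from the example above; it shows that two metrics defined on the same event end up under a single key, which is why the reporter attaches only one handler for them.

```elixir
# Standalone script; assumes Mix.install can download telemetry_metrics.
Mix.install([{:telemetry_metrics, "~> 0.6.1"}])

metrics = [
  Telemetry.Metrics.counter("my_app.repo.query.query_time"),
  Telemetry.Metrics.sum("my_app.repo.query.query_time")
]

# Same grouping the CustomReporter does in init/1.
groups = Enum.group_by(metrics, & &1.event_name)

for {event_name, event_metrics} <- groups do
  kinds = Enum.map(event_metrics, & &1.__struct__)
  IO.puts("#{inspect(event_name)} -> #{inspect(kinds)}")
end
```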
Add support for more metrics
Ok, so far so good. But as you may have noticed, we are handling just one metric. Let's add support for the sum metric.
First, to sum all the event measurements, we also need to keep track of the running sum so far. Our ReporterState only handles a single integer as its state. Let's change it to store a tuple, with the first element being the count and the second element being the sum.
Change ReporterState to this:
defmodule Metrics.Telemetry.ReporterState do
  use Agent

  def start_link(initial_value) do
    Agent.start_link(fn -> initial_value end, name: __MODULE__)
  end

  def value do
    Agent.get(__MODULE__, & &1)
  end

  def increment do
    Agent.update(__MODULE__, fn {count, sum} -> {count + 1, sum} end)
  end

  def sum(value) do
    Agent.update(__MODULE__, fn {count, sum} -> {count, sum + value} end)
  end
end
Now the agent has a composite state and a function to increment the count part and a function to add to the total part. This is a naive implementation, of course, and doesn't even care about both parts of the state getting out of sync, but for our purposes, it will suffice. We needed a way to store state, we have it.
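One small way to tighten this, shown here only as a sketch and not as the approach the article uses, is to update and read the state in a single Agent call with Agent.get_and_update/2, so that no other process can change the state between the update and the read. The ReporterStateSketch module and its add/1 function are hypothetical names.

```elixir
defmodule ReporterStateSketch do
  use Agent

  # State is {count, sum}, as in the article's ReporterState.
  def start_link(initial \\ {0, 0}) do
    Agent.start_link(fn -> initial end, name: __MODULE__)
  end

  # Increments the count and returns the new state in one Agent call.
  def increment do
    Agent.get_and_update(__MODULE__, fn {count, sum} ->
      new_state = {count + 1, sum}
      {new_state, new_state}
    end)
  end

  # Adds a measurement to the sum and returns the new state in one Agent call.
  def add(value) do
    Agent.get_and_update(__MODULE__, fn {count, sum} ->
      new_state = {count, sum + value}
      {new_state, new_state}
    end)
  end
end

{:ok, _pid} = ReporterStateSketch.start_link()
IO.inspect(ReporterStateSketch.increment(), label: "after increment")
IO.inspect(ReporterStateSketch.add(4), label: "after add")
```

The count and the sum are still updated by two separate calls per event, so this only removes the gap between writing and reading, not the one between the two metrics.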
Let's change our CustomReporter. Remove the old handle_metric functions and put these ones instead:
defp handle_metric(%Metrics.Counter{} = metric, _measurements, _metadata) do
  ReporterState.increment()
  current_value = ReporterState.value()
  IO.puts("Metric: #{metric.__struct__}. Current value: #{inspect(current_value)}")
end

defp handle_metric(%Metrics.Sum{} = metric, %{value: value}, _metadata) do
  ReporterState.sum(value)
  current_value = ReporterState.value()
  IO.puts("Metric: #{metric.__struct__}. Current value: #{inspect(current_value)}")
end

defp handle_metric(metric, _measurements, _metadata) do
  IO.puts("Unsupported metric: #{metric.__struct__}")
end
Again, nothing fancy. Now we handle a second type of metric, the %Metrics.Sum{}, and, similarly to the counter one, we use the ReporterState to keep track of the event measurements so far.
Let's tell telemetry_metrics to also handle this type of metric. Update the metrics/0 function of telemetry.ex to this:
defp metrics do
  [
    counter("metrics.emit.value"),
    sum("metrics.emit.value")
  ]
end
Now the sum metric is also being collected.
Finally, change the start/2 function in application.ex to this, so that the agent starts with a {count, sum} tuple:
def start(_type, _args) do
  children = [
    {Metrics.Telemetry.ReporterState, {0, 0}},
    Metrics.Telemetry
  ]

  opts = [strategy: :one_for_one, name: Metrics.Supervisor]
  Supervisor.start_link(children, opts)
end
We are set. Let's try it. In the shell:
iex -S mix
iex(1)> Metrics.emit(4)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {1, 0}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {1, 4}
:ok
iex(2)> Metrics.emit(3)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {2, 4}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {2, 7}
:ok
iex(3)> Metrics.emit(2)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {3, 7}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {3, 9}
:ok
iex(4)> Metrics.emit(1)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {4, 9}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {4, 10}
:ok
And that's it. Our CustomReporter is now capable of tracking two different metrics for us. Of course, a production system will have a better way to store and keep track of the series of measurements so that it has more guarantees about the data ingested. But that is out of the scope of this article.
Summary
We learned that telemetry_metrics:
- offers a standard way to define five types of metrics
- doesn't care about or dictate how the measurements are stored, aggregated, or manipulated
- leaves it to the reporter to ingest the events and do the real manipulation of the data
There are several open-source implementations of reporters that allow us to make our measurements available to tools like Prometheus or StatsD servers.
In a future article, I'll talk about integrating with them.
Source code
You can find the source code for this article in this GitHub repository.
About
I'm Miguel Cobá. I write about Elixir, Elm, Software Development, and eBook writing.