# Telemetry and Metrics in Elixir

In the [previous](https://blog.miguelcoba.com/introduction-to-telemetry-in-elixir) article, I introduced you to Telemetry. The `telemetry` library provides a way to:

- **generate** events
- **create** handler functions that will receive the events
- **register** which handlers will be called when an event occurs

The principal advantage of `telemetry` is a **standardized** way to emit events and collect measurements in Elixir. This way, third-party libraries and code that we write ourselves can use it to generate events in a consistent way.

Let's now take a step back and consider a couple of things. An event is just a measure of something that happened at a given point in time. By itself, it doesn't tell us much. We might need more data to compare it against, or to infer some trend or rate of change.

With that in mind, the telemetry developers created a library called `telemetry_metrics`. According to their documentation, `telemetry_metrics` is:

> Common interface for defining metrics based on :telemetry events.

Ok, but what are metrics, and how are they different from the measurements taken with events? Well, again according to their docs:

> Metrics are aggregations of Telemetry events with specific name, providing a view of the system's behavior over time.

Ok, it seems clear, right? Not really. 

If you start using the `telemetry_metrics` library, you'll notice that it in fact **DOESN'T AGGREGATE** anything!!!

What? 

Exactly that. It doesn't aggregate anything.

If you generate 1000 events with a measurement each, this library is not going to keep, manipulate, calculate, summarize or do anything by itself.

So, what is it useful for, you might ask?

Its aim is to **define** what kind of metrics you're going to create from events. That's it. Just a declaration, a definition, a contract if you will. Nothing else.

So, how do you get the real metrics, the processed results of your carefully collected measurements?

For that, you need something else. You need a `reporter`.

The reporter's responsibility is to aggregate and process each event received so that it conforms to the specified metric it handles.

`telemetry_metrics` ships with a simple reporter called `ConsoleReporter` that doesn't do much. It only outputs the measurement and metric details to the console. Not much, but it allows you to verify that everything works and that you're collecting data to generate metrics correctly.

For more useful reporters, you should either write them yourself or get them from someone who has already written them.

In this article, I am going to show you how to use the default `ConsoleReporter` and how to create a custom reporter.

## Prerequisites

Use `asdf` to install these versions of Elixir and Erlang:

```console
asdf install erlang 24.2.1
asdf global erlang 24.2.1
asdf install elixir 1.13.3-otp-24
asdf global elixir 1.13.3-otp-24
```

## Create an Elixir app

```console
mix new metrics --sup
```

This will create an Elixir app with an application callback where we can attach our telemetry events and configure our metrics.

## Install dependencies

Add `telemetry` and `telemetry_metrics` to your `mix.exs`:

```elixir
  defp deps do
    [
      {:telemetry, "~> 1.0"},
      {:telemetry_metrics, "~> 0.6.1"}
    ]
  end
```

## Create an event emitter

Let's create a simple event emitter that we can use to test our metrics. Open the `lib/metrics.ex` file that mix created for you and replace its contents with this:

```elixir
defmodule Metrics do
  def emit(value) do
    :telemetry.execute([:metrics, :emit], %{value: value})
  end
end
```

As you can see, it uses `:telemetry.execute/2` to emit an event, passing the value we provide as a measurement.

## Define some metrics

Ok, now the interesting part: let's define some metrics we want to collect. `telemetry_metrics` allows you to create several types of metrics:

- `counter/2`, which counts the total number of emitted events
- `sum/2`, which keeps track of the sum of the selected measurement
- `last_value/2`, which holds the value of the selected measurement from the most recent event
- `summary/2`, which calculates statistics of the selected measurement, like maximum, mean, percentiles, etc.
- `distribution/2`, which builds a histogram of the selected measurement
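To make these five metric types more concrete, here is a minimal sketch in plain Elixir that computes each aggregation over a fixed list of values. It is only an illustration: `telemetry_metrics` itself computes none of this, and a real reporter would update its results incrementally as events arrive. The sample values, the map keys, and the single bucket boundary are assumptions made for this example.

```elixir
# Measurements as they might arrive from four events, oldest first (made-up values).
values = [4, 3, 2, 1]

aggregations = %{
  # counter: how many events arrived; the measurement value itself is ignored
  counter: length(values),
  # sum: the total of the selected measurement
  sum: Enum.sum(values),
  # last_value: the measurement of the most recent event
  last_value: List.last(values),
  # summary: statistics such as minimum, maximum and mean
  summary: %{
    min: Enum.min(values),
    max: Enum.max(values),
    mean: Enum.sum(values) / length(values)
  },
  # distribution: how many measurements fall into each bucket
  # (a single boundary at 2 is an arbitrary choice for this sketch)
  distribution:
    Enum.frequencies_by(values, fn v -> if v <= 2, do: "<= 2", else: "> 2" end)
}

IO.inspect(aggregations)
```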

We are going to start with the basics and define a counter metric, assuming we want to count how many times the event has happened.

Create a new file `lib/metrics/telemetry.ex` and put this in it:

```elixir
defmodule Metrics.Telemetry do
  use Supervisor
  import Telemetry.Metrics

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  def init(_arg) do
    children = [
      {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp metrics do
    [
      counter("metrics.emit.value")
    ]
  end
end
```

Let's analyze it. First of all, it is a simple `Supervisor` that we need to start from some other process in order to work. We'll attach it to our main application supervision tree. More on that later.

Then, the `init` function is configuring the children for this supervisor. We have a single child here, the `ConsoleReporter`. And we are passing a list of metrics to it as initial arguments.

Here is where it becomes interesting. `Telemetry.Metrics.counter/2` is the **definition** of the metric we want to use. We pass a string: `"metrics.emit.value"`. This string is split using the period as a separator. Everything but the last part (`metrics.emit`) is the event to attach to. The last part (`value`) is the key to read from the measurements passed to the telemetry event handler.

Let's explain it. We have this call in `Metrics.emit/1`:

```elixir
    :telemetry.execute([:metrics, :emit], %{value: value})
```

and we have a metrics definition in `Metrics.Telemetry.metrics/0`:

```elixir
      counter("metrics.emit.value")
```

As you can see, the reporter is going to attach a handler for the `[:metrics, :emit]` event and, for the counter metric, read the `:value` key from the second argument of the `:telemetry.execute/2` call (the measurements):

- **event**: "metrics.emit" -> [:metrics, :emit]
- **measurement attribute**: "value" ->  :value

If we were collecting the `:query_time` of `[:my_app, :repo, :query]` event, we would write `"my_app.repo.query.query_time"`.
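The naming convention can be sketched in a few lines of plain Elixir. This is not the code `telemetry_metrics` uses internally; `MetricNameSketch` and its `split/1` function are hypothetical names invented for this illustration, and the sketch only covers the simple string form of a metric name.

```elixir
defmodule MetricNameSketch do
  # Hypothetical helper, not part of telemetry_metrics.
  # Splits "a.b.c" into the event name [:a, :b] and the measurement key :c.
  def split(metric_name) when is_binary(metric_name) do
    segments =
      metric_name
      |> String.split(".")
      |> Enum.map(&String.to_atom/1)

    # A negative count makes Enum.split/2 count from the end of the list,
    # so the last segment is separated from everything before it.
    {event_name, [measurement]} = Enum.split(segments, -1)
    {event_name, measurement}
  end
end

IO.inspect(MetricNameSketch.split("metrics.emit.value"))
IO.inspect(MetricNameSketch.split("my_app.repo.query.query_time"))
```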

Ok, let's continue.

We are using the `ConsoleReporter` to collect our metrics. We defined the `counter` metric. We also said that `ConsoleReporter` does nothing but output the values received. That is enough, for now, to check that we are in fact collecting metrics with our `ConsoleReporter`.

But one thing is missing. We need to attach our `Metrics.Telemetry` supervisor to the application supervisor, otherwise, nothing will happen. Open `application.ex` and change the `start/2` function to this:

```elixir
  def start(_type, _args) do
    children = [
      Metrics.Telemetry
    ]

    opts = [strategy: :one_for_one, name: Metrics.Supervisor]
    Supervisor.start_link(children, opts)
  end
```

As you see we are putting our `Metrics.Telemetry` as a child of the application Supervisor.


## Test it

Let's try it with `iex`. Open a shell terminal, get the dependencies and open an `iex` session:

```bash
mix deps.get
iex -S mix
```

Emit an event:

```elixir
iex(1)> Metrics.emit(4)
[Telemetry.Metrics.ConsoleReporter] Got new event!
Event name: metrics.emit
All measurements: %{value: 4}
All metadata: %{}

Metric measurement: :value (counter)
Tag values: %{}

:ok
iex(2)> 
```

Yay, it works. Although we didn't attach any handler to the `[:metrics, :emit]` event manually, the `ConsoleReporter` attached one for us based on the metrics we defined with `telemetry_metrics`. As I said, it only echoes what is passed in, but the important thing here is that it works.

If the reporter were a little more advanced it would do something with the passed value.

## CustomReporter

Let's create a simple `CustomReporter` by ourselves to see how this works. One thing to notice is that the reporter needs to have a memory of previous values in order to do its calculations. In our case, we need to remember how many events we have received in order to increment it every time we receive a new event. We could use `:ets` or a `GenServer` or `Agent` depending on how complex our implementation needs to be. For this tutorial, I am going to use an `Agent` as it is very simple to use to keep some persistent value.

Create a file called `lib/metrics/telemetry/reporter_state.ex` and write this:

```elixir
defmodule Metrics.Telemetry.ReporterState do
  use Agent

  def start_link(initial_value) do
    Agent.start_link(fn -> initial_value end, name: __MODULE__)
  end

  def value do
    Agent.get(__MODULE__, & &1)
  end

  def increment do
    Agent.update(__MODULE__, &(&1 + 1))
  end
end
```

Let's attach it as a child to our main application supervisor so that it is started when the app starts. Edit `application.ex` and change the `start/2` function to this:

```elixir
  def start(_type, _args) do
    children = [
      {Metrics.Telemetry.ReporterState, 0},
      Metrics.Telemetry
    ]

    opts = [strategy: :one_for_one, name: Metrics.Supervisor]
    Supervisor.start_link(children, opts)
  end
```

Now the application will automatically start an agent with an initial value of 0 that we can use from every other part of the app through the `Metrics.Telemetry.ReporterState` module. Let's try it:

```console
iex -S mix
iex(1)> Metrics.Telemetry.ReporterState.value()
0
iex(2)> Metrics.Telemetry.ReporterState.increment()
:ok
iex(3)> Metrics.Telemetry.ReporterState.value()
1
```

Nice, we have our agent working. We can use it to maintain the state of our custom reporter.

Let's write our reporter. Create a file called `lib/metrics/telemetry/custom_reporter.ex`:

```elixir
defmodule Metrics.Telemetry.CustomReporter do
  use GenServer

  alias Metrics.Telemetry.ReporterState
  alias Telemetry.Metrics

  def start_link(metrics: metrics) do
    GenServer.start_link(__MODULE__, metrics)
  end

  @impl true
  def init(metrics) do
    Process.flag(:trap_exit, true)

    groups = Enum.group_by(metrics, & &1.event_name)

    for {event, metrics} <- groups do
      id = {__MODULE__, event, self()}
      :telemetry.attach(id, event, &__MODULE__.handle_event/4, metrics)
    end

    {:ok, Map.keys(groups)}
  end

  def handle_event(_event_name, measurements, metadata, metrics) do
    metrics
    |> Enum.map(&handle_metric(&1, measurements, metadata))
  end

  defp handle_metric(%Metrics.Counter{} = metric, _measurements, _metadata) do
    ReporterState.increment()

    current_value = ReporterState.value()

    IO.puts("Metric: #{metric.__struct__}. Current value: #{current_value}")
  end

  defp handle_metric(metric, _measurements, _metadata) do
    IO.puts("Unsupported metric: #{metric.__struct__}")
  end

  @impl true
  def terminate(_, events) do
    for event <- events do
      :telemetry.detach({__MODULE__, event, self()})
    end

    :ok
  end
end
```

A lot of things are happening here, but don't get distracted by them. What you should notice is the `handle_event` function, which distributes the event to all the specific metric handlers to, well, handle the event.

The `handle_metric` function uses `ReporterState` to keep the current state and increments the running counter that we use to track how many times we have received this event. It also prints the new current state to the console.

Let's use our `CustomReporter` instead of the `ConsoleReporter`. Change the `init/1` function in `telemetry.ex` to be like this:

```elixir
  def init(_arg) do
    children = [
      {Metrics.Telemetry.CustomReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
```

As you see, now we use our `CustomReporter` and we pass the metrics to it just as we did with the `ConsoleReporter`. Let's try it. Open the console and start `iex`:

```console
iex -S mix
iex(1)> Metrics.emit(4)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 1
:ok
iex(2)> Metrics.emit(5)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 2
:ok
iex(3)> Metrics.emit(2)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: 3
:ok
```

And there it is, now the counter is correctly keeping track of how many times we have received the event.

Let's review a few more things about the `CustomReporter`. Most of them are better explained in the official docs [here](https://hexdocs.pm/telemetry_metrics/Telemetry.Metrics.ConsoleReporter.html).

It traps exits so that `terminate/2` can detach the handlers when the app terminates or the process is about to finish. It also groups the metrics by event, so an event that is tracked by several metrics gets a single handler. For example, suppose you want to collect `counter` and `sum` metrics for `"my_app.repo.query.query_time"`. This code receives the `[:my_app, :repo, :query]` event once and then calls `handle_metric` twice, once with the `%Telemetry.Metrics.Counter{}` as the first argument and once with the `%Telemetry.Metrics.Sum{}` as the first argument. For both of those calls, the second argument is the measurements we just received.
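The grouping step can be seen in isolation with a small sketch. To keep it runnable without any dependency, plain maps stand in for the metric structs here; the `:type` key is an assumption added only for printing, while `:event_name` and `:measurement` mirror the fields the reporter reads from the real structs.

```elixir
# Plain maps standing in for %Telemetry.Metrics.Counter{} and %Telemetry.Metrics.Sum{}.
metrics = [
  %{type: :counter, event_name: [:my_app, :repo, :query], measurement: :query_time},
  %{type: :sum, event_name: [:my_app, :repo, :query], measurement: :query_time},
  %{type: :counter, event_name: [:metrics, :emit], measurement: :value}
]

# Same call the reporter makes in init/1: one entry per distinct event name.
groups = Enum.group_by(metrics, & &1.event_name)

# Each event would get one handler, which receives all metrics defined for it.
for {event_name, event_metrics} <- groups do
  types = Enum.map(event_metrics, & &1.type)
  IO.puts("#{inspect(event_name)} -> #{inspect(types)}")
end
```

Three metric definitions collapse into two groups, so only two handlers would be attached.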

## Add support for more metrics

Ok, so far so good. But notice that we are handling just one metric. Let's add support for the `sum` metric.

First, to sum all the event measurements, we need to also keep track of the running sum so far. Our `ReporterState` is only handling a single integer as the state. Let's change it to now store a tuple, with the first element being the count and the second element being the sum.

Change `ReporterState` to this:

```elixir
defmodule Metrics.Telemetry.ReporterState do
  use Agent

  def start_link(initial_value) do
    Agent.start_link(fn -> initial_value end, name: __MODULE__)
  end

  def value do
    Agent.get(__MODULE__, & &1)
  end

  def increment do
    Agent.update(__MODULE__, fn {count, sum} -> {count + 1, sum} end)
  end

  def sum(value) do
    Agent.update(__MODULE__, fn {count, sum} -> {count, sum + value} end)
  end
end
```

Now the agent has a composite state, a function to increment the count part, and a function to add to the sum part. This is a naive implementation, of course, and doesn't even care about both parts of the state getting out of sync, but for our purposes it will suffice. We needed a way to store state, and now we have it.
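If you are curious how the two parts could be kept in sync, one option is to update both in a single call to the agent and return the new state from that same call. The sketch below is an alternative, not what this tutorial uses: `AtomicStateSketch` and `record/1` are hypothetical names, and it assumes every event should change both the count and the sum.

```elixir
defmodule AtomicStateSketch do
  use Agent

  # Hypothetical alternative to ReporterState: a map instead of a tuple.
  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{count: 0, sum: 0} end, name: __MODULE__)
  end

  # Updates count and sum in one agent call and returns the new state,
  # so a caller never observes one part updated without the other.
  def record(value) do
    Agent.get_and_update(__MODULE__, fn %{count: count, sum: sum} ->
      new_state = %{count: count + 1, sum: sum + value}
      {new_state, new_state}
    end)
  end
end

{:ok, _pid} = AtomicStateSketch.start_link()
IO.inspect(AtomicStateSketch.record(4))
IO.inspect(AtomicStateSketch.record(3))
```

Because the function passed to `Agent.get_and_update/2` runs inside the agent process, no other caller can read the state between the two changes.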

Let's change our `CustomReporter`. Remove the old `handle_metric` functions and put these ones instead:

```elixir
  defp handle_metric(%Metrics.Counter{} = metric, _measurements, _metadata) do
    ReporterState.increment()

    current_value = ReporterState.value()

    IO.puts("Metric: #{metric.__struct__}. Current value: #{inspect(current_value)}")
  end

  defp handle_metric(%Metrics.Sum{} = metric, %{value: value}, _metadata) do
    ReporterState.sum(value)

    current_value = ReporterState.value()

    IO.puts("Metric: #{metric.__struct__}. Current value: #{inspect(current_value)}")
  end

  defp handle_metric(metric, _measurements, _metadata) do
    IO.puts("Unsupported metric: #{metric.__struct__}")
  end
```

Again, nothing fancy. Now we handle a second type of metric, the `%Metrics.Sum{}` and, similarly to the count one, we use the `ReporterState` to keep track of the event measurements so far.

Let's tell `telemetry_metrics` to also handle this type of metric. Update the `metrics/0` function of `telemetry.ex` to this:

```elixir
  defp metrics do
    [
      counter("metrics.emit.value"),
      sum("metrics.emit.value")
    ]
  end
```

Now the `sum` metric is also being collected.

Finally, change the `start/2` function in `application.ex` to this:

```elixir
  def start(_type, _args) do
    children = [
      {Metrics.Telemetry.ReporterState, {0, 0}},
      Metrics.Telemetry
    ]

    opts = [strategy: :one_for_one, name: Metrics.Supervisor]
    Supervisor.start_link(children, opts)
  end
```

We are set. Let's try it. In the shell:

```console
iex -S mix
iex(1)> Metrics.emit(4)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {1, 0}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {1, 4}
:ok
iex(2)> Metrics.emit(3)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {2, 4}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {2, 7}
:ok
iex(3)> Metrics.emit(2)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {3, 7}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {3, 9}
:ok
iex(4)> Metrics.emit(1)
Metric: Elixir.Telemetry.Metrics.Counter. Current value: {4, 9}
Metric: Elixir.Telemetry.Metrics.Sum. Current value: {4, 10}
:ok
```

And that's it. Our `CustomReporter` is now capable of tracking two different metrics for us. Of course, a production system will have a better way to store and keep track of the series of measurements so that it has more guarantees about the data ingested. But that is out of the scope of this article.

## Summary

We learned that:

- `telemetry_metrics` offers a way to define 5 types of metrics in a standard way
- `telemetry_metrics` doesn't care about or dictate how the measurements should be stored, aggregated, or manipulated
- the reporter's responsibility is to ingest the events and implement the real manipulation of the data

There are several open-source implementations of reporters that allow us to make our measurements available to tools like Prometheus or StatsD servers.

In a future article, I'll talk about integrating with them.

## Source code

You can find the source code for this article in this [GitHub repository](https://github.com/miguelcoba/metrics).

## About

I'm [Miguel Cobá](https://miguelcoba.com). I write about Elixir, Elm, Software Development, and eBook writing.



