Introduction to Telemetry in Elixir

Learn the basics of telemetry in Elixir

Maybe you, like me, have found Telemetry while reading Elixir but have had no time to learn more details about it. I recently put some time aside to learn it and decided to write about it.

In this article I am going to show you what Telemetry is, why is needed, and how to use it in your own Elixir project.

Software Telemetry

First of all, what is telemetry?

Wikipedia explains it very clearly:

Telemetry is the [...] collection of measurements [...] and their automatic transmission [...] for monitoring

We can apply that to software systems, too, by collecting measurements while a software system is running and keeping track of that data.

Why would we do this, you may ask? Glad you ask. There are several reasons, but the main one is to discover what is happening inside our systems and then optimize them based on our findings.

What problem does telemetry solve for us? It gives us visibility into things that are important to us to measure and monitor.

Maybe we want to keep track of CPU utilization to see why our AWS bill is so high, or we need to see how many times a query is executed and how long it took to finish in our PostgreSQL database.

Measuring things and monitoring is not enough. If we don't use the collected data to make decisions and change or optimize the system measured then we might as well not measure anything at all.

By measuring our systems we get several benefits:

  • we can look for parts of the system that are slow and can be improved.
  • we could discover parts of the system that are not used frequently enough and that maybe can be scaled down to reduce our infrastructure bill.
  • we could aggregate how many resources a customer is using every month and generate utilization graphs and other insights to make better use of our system. This would allow us to invite the customer to upgrade to a higher plan that includes this kind of data analysis.

In summary, software telemetry allows us to measure and act upon the collected data of a running system for our own benefit.

Telemetry in Elixir

There are several ways to take measures from a running Elixir app.

For example, the following fragment would take the duration of a function execution:

def some_function do
    start = System.monotonic_time()
    # do something that you want to measure
    stop = System.monotonic_time()
    record_measure("operation took: ", stop - start)
    # ...
end

In the strict sense, that is telemetry. But is not the best approach. It has several drawbacks:

  • It is repetitive
  • You have to reimplement it in every app you want to use it
  • You have no way to disable measuring, for example, during unit tests
  • more things that I can't think right now

Elixir Telemetry library

Given that telemetry is something that many people want, a library was created to address this need. It is called telemetry.

Advantages:

  • provides a standard interface to measure and emit telemetry data
  • it is small and simple to use

How does it work? It is very simple:

  • you execute a measurement
  • you create a function that will be invoked when the measure is taken
  • you attach the function to the event measured

Using telemetry

Let's try a simple example to see how easy is to start collecting telemetry data.

I am going to create a very simple Elixir for a fictional grocery store. We want to measure each sale.

First, create a simple Elixir project:

mix new telemetry_intro
cd telemetry_intro

Add telemetry to the list of dependencies in mix.exs:

defp deps() do
  [
    {:telemetry, "~> 1.0"}
  ]
end

Download the dependencies:

mix deps.get

Now our business logic module. Create a file named lib/grocery/store.ex and put this inside:

defmodule TelemetryIntro.Grocery.Store do
  def sale(product, quantity, amount) do
    total = quantity * amount

    "Sold #{product}: #{quantity} units at #{amount} each. Total #{total}"
  end
end

As you see it is very simple. It just receives some parameters, does something, and then returns a value. Doesn't have any measure code yet.

Let's just try it with iex. In the shell, write:

iex -S mix
iex(1)> TelemetryIntro.Grocery.Store.sale("apple", 4, 0.5)
"Sold apple: 4 units at 0.5 each. Total 2.0"

Nothing unexpected here. Let's add some measurements. Say we want to keep track of the total and also to track what product was sold.

Change the sale function to be like this:

defmodule TelemetryIntro.Grocery.Store do
  def sale(product, quantity, amount) do
    total = quantity * amount

    :telemetry.execute(
      [:grocery, :store, :sale],
      %{total: total},
      %{product: product}
    )

    "Sold #{product}: #{quantity} units at #{amount} each. Total #{total}"
  end
end

Good. Now we are doing something. We are using the execute function of the telemetry module and we are passing three things:

  • the event name as a list of keywords
  • the measurement we are interested in (sale total in our example)
  • additional metadata we might need (here I am just putting the product name)

The metadata could be anything we want: our users' IP address, some tracking header from a web request, the user id, etc.

The measurement is what we are interested in. This is what we'll collect and analyze and then use to decide things. It can be anything, but it must be a measurement of something.

Finally is the event name. You can put anything here. Most people put a hierarchy of words to easily identify the events. What is important to keep in mind is that this event name will be used later to match which function will handle this event.

Let's see how the counterpart, the event handler, looks.

Create a file lib/metrics/instrumenter.ex

defmodule TelemetryIntro.Metrics.Instrumenter do
  require Logger

  def handle_event([:grocery, :store, :sale], measurements, metadata, _config) do
    Logger.info("[Sale telemetry: #{measurements.total}] total for #{metadata.product}")
  end
end

It is very simple. It has a function (the name doesn't matter but the convention is to name it handle_event) that has as the first parameter the keyword list we are going to match against. Then the measurements and metadata. The last parameter doesn't matter right now. Just ignore it.

This function is going to be called every time the execute function is called and the event name matches. That means that this function must avoid blocking execution for much time as it will be evaluated synchronously.

Finally, we do a lame thing and just pass the received info to Logger.info. Ideally, we'll send that telemetry data to some repository for further processing and analysis but that is outside of the scope of this article (I'll cover it in a future one, tho).

What you need to notice here is that we decoupled the emitting of the telemetry data with the handling of the data collected. Our app can use :telemetry.execute everywhere with the same standard interface and, as long as we have a corresponding function to handle that event, we'll be collecting that data.

Even more important, if we don't have an event handler for that event, our emitting code will still work. It won't even notice there is nothing collecting data.

We are only missing one thing. We need to specify which function will handle a given event. We use the :telemetry.attach function to do that.

Create a function called setup in the Instrumenter module:

defmodule TelemetryIntro.Metrics.Instrumenter do
  require Logger

  def setup do
    events = [
      [:grocery, :store, :sale]
    ]

    :telemetry.attach_many("telemetry-intro-instrumenter", events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event([:grocery, :store, :sale], measurements, metadata, _config) do
    Logger.info("[Sale telemetry: #{measurements.total}] total for #{metadata.product}")
  end
end

As you can see, it is passing to :telemetry.attach_many a list of event names to monitor and the function that will handle all those event names.

By using pattern matching, the same handle_event function can handle all the necessary events in your app.

Now we need to call this setup function when the app starts.

Let's change the way our elixir app starts by modifying the mix.exs file and changing the application function inside it so that it is like this:

  def application do
    [
      extra_applications: [:logger],
      mod: {TelemetryIntro, []}
    ]
  end

We added the mod: to specify a module that will be called when the application starts to execute some custom code. Our custom code will be the telemetry configuration.

Now replace the contents of the lib/telemetry_intro.ex module with this:

defmodule TelemetryIntro do
  use Application

  def start(_type, _args) do
    TelemetryIntro.Metrics.Instrumenter.setup()

    children = []
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

This module implements the start callback of the Application behavior. This function's responsibility is to spawn and link a supervisor and return {:ok, pid} or {:ok, pid, state}. Here we just start a supervisor with no children. That will suffice for this example.

Let's try it. Start again the application with iex:

iex -S mix
iex(1)> TelemetryIntro.Grocery.Store.sale("apple", 4, 0.2)

00:18:01.867 [info]  [Sale telemetry: 0.8] total for apple
"Sold apple: 4 units at 0.2 each. Total 0.8"

Great, it works. The first output line is the result of handle_event call. The second one is the return value of the sale function.

Let's see what happens if we don't configure a handler for the event.

Comment the TelemetryIntro.Metrics.Instrumenter.setup() in telemetry_intro.ex

defmodule TelemetryIntro do
  use Application

  def start(_type, _args) do
    # TelemetryIntro.Metrics.Instrumenter.setup()

    children = []
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

And try again with iex:

iex -S mix
iex(1)> TelemetryIntro.Grocery.Store.sale("apple", 4, 0.2)
"Sold apple: 4 units at 0.2 each. Total 0.8"

As you can see, the code still works, even if there is no handler configured for that event. Our app still calls the :telemetry.execute function but nothing breaks when nothing is configured to handle that event. How cool is that?

We learned how to emit events and handle them. You can send this info to a third-party error monitoring service or ingest it to a big data store. From here, the sky is the limit.

Summary

In this article we learned:

  • what is telemetry
  • how to add it to an Elixir project
  • how to measure and collect data
  • how telemetry works

As we have seen, telemetry is a powerful tool to inspect the behavior of a running system.

As long as we use the collected data for making decisions to improve our system or its behavior, data collection through telemetry is an investment that every system must seriously consider making.

If you want to learn more about telemetry, you can read the full documentation here

About

I'm Miguel Cobá. I write about Elixir, Elm, Software Development, and eBook writing.