Observability Part 1: Demystifying MELT Telemetry Data (Metrics, Events, Logs, Traces)
To improve the observability of a system, the first step is to gather performance data. In this article, we will look at the four key types of telemetry data that we rely on to gain insight into system performance. The acronym for these data types is MELT, which stands for Metrics, Events, Logs, and Traces. We will explore each of them using a flight reservation system as an example.
Metrics are aggregated numbers calculated by observing events over a period of time. They give us a high-level summary of what is happening in our system. Think of metrics as the big-picture view.
In our flight reservation system example, a metric could be the average booking time for flights over the last hour. This metric is calculated by observing the start and end times of flight bookings during that time period.
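As a rough illustration of how such a metric could be derived, the sketch below averages booking durations over the last hour. The sample data, the fixed `now` value, and the function name `average_booking_seconds` are all assumptions made for this example, not part of any real reservation system.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical sample data: (start_time, end_time) for each booking.
now = datetime(2024, 1, 1, 12, 0, 0)
bookings = [
    (now - timedelta(minutes=50), now - timedelta(minutes=48)),  # took 120 s
    (now - timedelta(minutes=30), now - timedelta(minutes=26)),  # took 240 s
    (now - timedelta(minutes=90), now - timedelta(minutes=85)),  # older than 1 hour
]


def average_booking_seconds(bookings, now, window=timedelta(hours=1)):
    """Average duration, in seconds, of bookings that finished within the window."""
    cutoff = now - window
    durations = [
        (end - start).total_seconds()
        for start, end in bookings
        if cutoff <= end <= now
    ]
    # With no bookings in the window there is nothing to average.
    return mean(durations) if durations else None


avg = average_booking_seconds(bookings, now)
print(avg)
```

Note that the individual bookings disappear in the result: the metric keeps only the aggregate, which is what makes it cheap to store and quick to read.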
Events are distinct actions occurring at a specific moment in time. They capture discrete occurrences in our system. An event contains information such as a timestamp and relevant details about the action.
When we purchase a flight ticket, that purchase transaction is an event. It includes essential information such as the timestamp of the purchase, the ticket price, the source, and the destination. Another event might be a flight search for a specific route and date. Events tell us which specific actions took place within our system at a particular point in time.
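One way to picture the purchase event is as a small, immutable record. The following is a minimal sketch; the class name `TicketPurchased`, its field names, the `currency` field, and the sample values are hypothetical choices for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TicketPurchased:
    """A discrete action that happened at one specific moment."""
    timestamp: datetime
    price: float
    currency: str
    source: str
    destination: str


event = TicketPurchased(
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    price=249.99,
    currency="USD",
    source="JFK",
    destination="LHR",
)

# Serialize the event so it could be sent to a telemetry backend.
record = asdict(event)
payload = json.dumps({**record, "timestamp": event.timestamp.isoformat()})
print(payload)
```

Making the record frozen reflects the idea that an event describes something that already happened and should not change afterwards.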
Logs are like events but with a lot more detail. While an event summarizes an action, logs provide a comprehensive, granular record of that action. One event can generate multiple lines of logs.
Continuing with our flight reservation system, an event could be a successful booking, but the logs associated with it might include every step of the booking process. This could encompass details such as user interactions, server responses, and error messages. Logs are invaluable for troubleshooting and debugging, as they provide a detailed record of what happened during an event.
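To make the one-event-to-many-log-lines relationship concrete, here is a small sketch using Python's standard `logging` module. The logger name `reservations`, the function `book_flight`, and the individual booking steps are assumptions for illustration; the log output is written to an in-memory stream only so that the lines can be inspected afterwards.

```python
import io
import logging

# Capture log output in memory so we can inspect it afterwards.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("reservations")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the output out of the root logger


def book_flight(booking_id, source, destination):
    """One successful booking (a single event) emits several log lines."""
    logger.info("booking %s: started %s -> %s", booking_id, source, destination)
    logger.debug("booking %s: seat availability checked", booking_id)
    logger.debug("booking %s: payment authorized", booking_id)
    logger.info("booking %s: confirmed", booking_id)


book_flight("B-1001", "JFK", "LHR")
lines = log_stream.getvalue().splitlines()
print("\n".join(lines))
```

Including the booking identifier in every line is a deliberate choice: it lets us pull together all the log lines that belong to the same event when debugging.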