Getting started with Prometheus
Posted by Pieter Vandercammen, Last modified by Pieter Vandercammen on 18 July 2022 04:31 PM
Long-duration test runs
Tests over extended periods are important to demonstrate robustness at the scale of whole networks. They reveal rare but high-impact issues that can flood support centers with angry customers.
Since v2.18, the ByteBlower GUI has gained capabilities to support such long tests. In these cases the software now even suggests saving the intermediate results only in a dedicated datastore such as Prometheus.
In this article we give a short introduction to Prometheus, describe its advantages, and explain how to configure it.
Why are endurance tests difficult?
Long tests collect lots of results. Each second of the test, for each of the participating traffic flows, the ByteBlower server sends out many results: #bytes, #packets, various timestamps, aggregated latency results, ... All these results are shown in the ByteBlower GUI report.
The tables and graphs shown above are built from different result types though:
The difference between the two types is clearly visible when the device under test experiences a short issue during the test run. For example, take the traffic flow described below.
The tabular results show a small amount of packet loss. Most applications tolerate a small but persistent loss; a modem reboot, on the other hand, has a much more significant impact. Such differences may remain hidden in the summary of the whole scenario but are clearly visible in the results over time.
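The gap between the two views can be made concrete with a small calculation. The sketch below uses purely illustrative numbers (a one-hour flow at 1000 packets per second with a single 18-second outage; none of these come from a real test run) and compares the summary loss with the worst one-second interval.

```python
def loss_summary(sent_per_second, lost_per_second):
    """Return (overall loss ratio, worst per-second loss ratio)."""
    overall = sum(lost_per_second) / sum(sent_per_second)
    worst = max(
        lost / sent for sent, lost in zip(sent_per_second, lost_per_second)
    )
    return overall, worst


# Illustrative numbers only, not measured results.
DURATION_S = 3600            # one-hour test
RATE_PPS = 1000              # packets sent per second
OUTAGE = range(1200, 1218)   # 18 seconds without any traffic getting through

sent = [RATE_PPS] * DURATION_S
lost = [RATE_PPS if second in OUTAGE else 0 for second in range(DURATION_S)]

overall, worst = loss_summary(sent, lost)
print(f"summary loss: {overall:.2%}, worst second: {worst:.0%}")
```

With these assumed numbers the summary reports only 0.50% loss, while the per-second view shows seconds with 100% loss. A steady 0.5% loss would give the same summary value, which is why the over-time results matter.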
As mentioned earlier, finding such rare but high-impact issues requires long test runs with many devices. Increasing the number of devices and the duration of the test as a whole gives a much higher chance of encountering such device issues.
Saving the over-time results
These over-time results are clearly important, but each intermediate value needs to be saved in order to be shown later on. Long tests thus result in large volumes of data.
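A rough estimate shows how quickly this adds up. The sketch below assumes one stored value per second for each result type of each flow; the flow count and the number of result types per flow are hypothetical and only serve to show the order of magnitude.

```python
def estimate_samples(flows, metrics_per_flow, duration_s, interval_s=1):
    """Number of stored values for a test, one value per metric per interval."""
    return flows * metrics_per_flow * (duration_s // interval_s)


ONE_HOUR_S = 3600
ONE_WEEK_S = 7 * 24 * 3600

# Hypothetical scenario: 100 flows, 10 result types per flow.
print(estimate_samples(flows=100, metrics_per_flow=10, duration_s=ONE_HOUR_S))
print(estimate_samples(flows=100, metrics_per_flow=10, duration_s=ONE_WEEK_S))
```

Under these assumptions, a one-hour run stores 3.6 million values and a one-week run about 605 million.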
Before v2.18, the ByteBlower GUI itself was responsible for saving all over-time results. This limits how large tests can grow before overloading the application. To lift this limit, the ByteBlower GUI now also supports offloading this responsibility to a dedicated datastore (Prometheus). In addition, such tools tie into much more extensive ecosystems that can visualize these results better.
The Prometheus storage engine specializes in saving long runs of numerical results. In addition, Prometheus has the advantage of:
When to use Prometheus?
The real-time results are always available to Prometheus. For small tests nothing changes: at the end of the test, you receive a graphical report with summary results in tabular form and over-time results shown in graphs.
When running large scenarios for extended periods of time with the GUI, storing the real-time results poses a bottleneck for the application. In such cases the ByteBlower GUI will propose to save the results only externally.
You will still get an HTML report for large test scenarios. This report will only contain summary results. The over-time data can be found in the external tooling. How to set up this tooling is described in the next parts of this article.
Saving results in Prometheus
Prometheus is a storage engine with several advantages for saving results over extended periods of time. It started as a side project at SoundCloud and has since gained considerable traction in the open-source community.
As described in the previous section, no ByteBlower GUI configuration is required. Prometheus takes the initiative to scrape these results and store them (pull model).
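To illustrate the pull model, the sketch below mimics a single scrape: an HTTP GET on the target, followed by parsing the plain-text exposition format that Prometheus reads. It assumes the results are served under the Prometheus default path `/metrics`, and the metric names in the sample are hypothetical; the names the ByteBlower GUI actually exposes are listed in the API guide.

```python
from urllib.request import urlopen


def scrape(url="http://localhost:8123/metrics"):
    """Fetch the raw metrics text from a target (the path is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_exposition(text):
    """Parse Prometheus text-format lines into (series, value) pairs."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            end = line.rindex("}") + 1  # labels may contain spaces
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        value = float(rest.split()[0])  # an optional timestamp may follow
        samples.append((series, value))
    return samples


# Hypothetical sample of what a scrape could return.
SAMPLE = """\
# HELP example_rx_bytes_total Hypothetical counter.
# TYPE example_rx_bytes_total counter
example_rx_bytes_total{flow="flow 1"} 1024
example_up 1
"""

for series, value in parse_exposition(SAMPLE):
    print(series, value)
```

Prometheus repeats such a scrape at a fixed interval and stores each value with a timestamp, so the scraped application only has to expose its current state.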
The general getting started guide for Prometheus is well written. To make getting started even easier, the sections below focus on the parts specific to the ByteBlower GUI.
To try it out, no additional hardware is required: the ByteBlower GUI and the Prometheus server can run on the same machine.
Prometheus uses a plain-text configuration file (‘prometheus.yml’). Below, the default configuration is modified to scrape the results from a ByteBlower GUI on the same computer (localhost). The results are available over TCP on port 8123.
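A minimal sketch of such a configuration can be generated with a short Python script. Only the target `localhost:8123` comes from the text above; the job name `byteblower_gui` and the 15-second scrape interval are assumptions, and the metrics path is left at the Prometheus default.

```python
def render_prometheus_config(
    target="localhost:8123",    # ByteBlower GUI on the same machine
    job_name="byteblower_gui",  # hypothetical job name
    scrape_interval="15s",      # assumed interval, tune to your test
):
    """Return the text of a minimal prometheus.yml with one scrape target."""
    return f"""\
global:
  scrape_interval: {scrape_interval}

scrape_configs:
  - job_name: "{job_name}"
    static_configs:
      - targets: ["{target}"]
"""


print(render_prometheus_config())
```

Saving the printed text as `prometheus.yml` and starting Prometheus with `--config.file=prometheus.yml` should be enough for a first local trial.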
Viewing the results
Prometheus is primarily a storage engine. It has an integrated webserver that allows you to configure the data targets. You can also see the scraped data in a very basic way.
With the ByteBlower GUI performing a test run, the following query can be tried:
More monitoring is available in the ‘Status’ dropdown in the top menu-bar. If everything is running fine, the ‘Targets’-page should show the state ‘UP’ and a recent “Last Scrape”.
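The same information is available through the Prometheus HTTP API, which is convenient for scripted checks. The sketch below reads `/api/v1/targets` and lists the health and last scrape time of each target. It assumes Prometheus listens on its default port 9090 on the same machine; the sample payload is trimmed and illustrative.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url="http://localhost:9090"):
    """Query the targets endpoint of a running Prometheus server."""
    with urlopen(f"{base_url}/api/v1/targets") as response:
        return json.load(response)


def summarize_targets(payload):
    """Return (scrape URL, health, last scrape time) for each active target."""
    return [
        (target["scrapeUrl"], target["health"], target["lastScrape"])
        for target in payload["data"]["activeTargets"]
    ]


# Trimmed, illustrative response; a real one carries more fields.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "scrapeUrl": "http://localhost:8123/metrics",
                "health": "up",
                "lastScrape": "2022-07-18T16:31:00Z",
            }
        ]
    },
}

for url, health, last_scrape in summarize_targets(SAMPLE_PAYLOAD):
    print(url, health, last_scrape)
```

Against a running server, `summarize_targets(fetch_targets())` returns the live state; a health value of `up` corresponds to the ‘UP’ state on the ‘Targets’ page.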
Dedicated dashboards with Grafana
Grafana is a dedicated dashboarding tool that matches well with Prometheus. It makes it easy to visualize the collected results in various ways. These visuals are saved and updated at regular intervals.
Like Prometheus, Grafana is an open-source tool with publicly available binaries. To get started, it can run together with the ByteBlower GUI and Prometheus on the same machine.
The getting started guide is also available online. From here there are two additional steps:
Where to go next from here?
Check out our Dashboard examples. Contact us if you don’t find what you need among the examples.
The full API guide is available at https://api.byteblower.com/prometheus. This guide contains all the details.
Finally, Prometheus is part of a whole ecosystem that offers much more than collecting and displaying results. Alerting is one of these features. We don’t have much experience with this part yet, but we’re happy to hear about your results.