Prometheus and Grafana

Introduction

Tests that run over extended periods are important for demonstrating robustness at the scale of whole networks. They reveal rare but high-impact issues that can flood support centers with angry customers.

Since v2.18, the ByteBlower GUI has added capabilities to support such long tests. For these tests, the software now even suggests saving the intermediate results only in a dedicated data store such as Prometheus.

In this article, we’ll give a short introduction to Prometheus, describe its advantages, and explain how to configure it.

Why are endurance tests difficult?

Long tests collect lots of results. Every second of the test, and for each participating traffic flow, the ByteBlower server sends out many results: number of bytes, number of packets, various timestamps, aggregated latency results, and so on. All these results are shown in the ByteBlower GUI report.

The tables and graphs in the report are built from two different result types:

  • The cumulative results: These results are a summary of the whole testing period. These values are used for the tables.
  • Over-time results, or the interval results: These results are a summary of small, individual periods. The graphs show these values.

The difference between the two types is clearly visible when the device under test experiences a short issue during the test run. For example, take the traffic flow described below.

The tabular results show a small amount of packet loss. Most applications tolerate a small but persistent loss; a modem reboot, on the other hand, has a much more significant impact. Such differences may remain hidden in the summary of the whole scenario but are very visible in the results over time.
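
To make the difference concrete, here is a minimal sketch in Python. The per-second counters are made up for illustration (they are not ByteBlower output): a one-hour flow sending 1000 packets per second, with a hypothetical 30-second outage in the middle.

```python
# Hypothetical per-second counters for one flow: (packets sent, packets received).
# The numbers are invented for illustration; they are not ByteBlower output.
intervals = [(1000, 1000)] * 3600          # one hour, one sample per second
for second in range(1800, 1830):           # a 30-second outage, e.g. a modem reboot
    intervals[second] = (1000, 0)


def loss_percent(sent, received):
    """Packet loss as a percentage of the packets sent."""
    return 100.0 * (sent - received) / sent if sent else 0.0


# Cumulative result: one number for the whole test, as used in the tables.
total_sent = sum(sent for sent, _ in intervals)
total_received = sum(received for _, received in intervals)
cumulative_loss = loss_percent(total_sent, total_received)

# Interval results: one number per second, as used in the graphs.
interval_loss = [loss_percent(sent, received) for sent, received in intervals]
worst_loss = max(interval_loss)
seconds_with_loss = sum(1 for loss in interval_loss if loss > 0)

print(f"cumulative loss: {cumulative_loss:.2f}%")   # 0.83%
print(f"worst interval:  {worst_loss:.0f}% ({seconds_with_loss} s affected)")
```

The cumulative value (under 1% loss) looks harmless, while the interval values show 30 consecutive seconds in which no traffic arrived at all.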

As mentioned earlier, finding such rare but high-impact issues requires long test runs with many devices. Increasing both the number of devices and the duration of the test as a whole gives a much higher chance of encountering the issues that devices would otherwise run into later.

Saving the over-time results

These over-time results are clearly important, but each intermediate value needs to be saved in order to be shown later on. Long tests thus result in large volumes of data.

Before v2.18, the ByteBlower GUI itself was responsible for saving all over-time results. This limits how large tests can grow before they overload the application. To support larger tests, the ByteBlower GUI can now offload this responsibility to a dedicated data store (Prometheus). In addition, such tools tie into much more extensive ecosystems that can visualize these results better.

Why use Prometheus?

The Prometheus storage engine specializes in saving long runs of numerical results. In addition, Prometheus offers the following advantages:

  • Automatic storage management to keep the total size of the storage limited.
  • Compression of the gathered results keeps disk requirements small.
  • Fast querying of results collected over extended periods of time.
  • Easy installation and configuration.

When to use Prometheus?

The real-time results are always available for Prometheus to collect. For small tests, this makes no difference to the report: at the end of the test, you receive a graphical report with summary results in tabular form and over-time results shown in graphs.

When running large scenarios for extended periods of time with the GUI, storing the real-time results becomes a bottleneck for the application. In such cases, the ByteBlower GUI will propose to save the results only externally.

You will still get an HTML report for large test scenarios. This report will only contain summary results. The over-time data can be found in the external tooling. How to set up this tooling is described in the next parts of this article.

Saving results in Prometheus

Prometheus is a storage engine that has several advantages for saving results over extended periods of time. It was started as a side project at SoundCloud but has since gained considerable traction in the open-source community.

As described in the previous section, no ByteBlower GUI configuration is required. It is Prometheus that takes the initiative to scrape these results and store them (pull model).

The general getting-started guide for Prometheus is well written. To make getting started even easier, the sections below focus on the parts specific to the ByteBlower GUI.

To try it out, no additional hardware is required: the ByteBlower GUI and the Prometheus server can run on the same machine.

Prometheus uses a plain-text configuration file (‘prometheus.yml’). Below, the default configuration is modified to scrape the results from a ByteBlower GUI on the same computer (localhost). The results are available over TCP on port 8123.

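global:
  scrape_interval:     15s   # how often Prometheus scrapes its targets
  evaluation_interval: 15s   # how often Prometheus evaluates its rules

scrape_configs:
  - job_name: 'ByteBlower GUI'
    static_configs:
      # The ByteBlower GUI on this machine, serving results on port 8123
      - targets: ['localhost:8123']

Viewing the results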

Prometheus is primarily a storage engine. It has an integrated web server that allows you to inspect the configured data targets. You can also view the scraped data there in a very basic way.

By default, Prometheus hosts its main page on port 9090. Locally you can browse to http://localhost:9090, which shows the following page:

With the ByteBlower GUI performing a test run, the following query can be tried:

  • byteblower_gui_traffic_bytes_total
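
The same data can also be retrieved from a script through the Prometheus HTTP API (the ‘/api/v1/query’ endpoint). The sketch below is a minimal example under a few assumptions: Prometheus runs on its default port on localhost, and the metric is a counter (as the ‘_total’ suffix suggests), so that the PromQL function rate() can turn it into a throughput. The helper names are our own, not part of any library.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"   # default Prometheus port, local machine

# Assumption: the metric is a counter, so rate() yields bytes per second.
# Multiplying by 8 converts this to bits per second.
THROUGHPUT_QUERY = "rate(byteblower_gui_traffic_bytes_total[1m]) * 8"


def build_query_url(expression, base_url=PROMETHEUS_URL):
    """Build the URL for an instant query on the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expression})


def extract_samples(response):
    """Return (labels, value) pairs from an instant-query response of type 'vector'."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    return [(item["metric"], float(item["value"][1]))
            for item in response["data"]["result"]]


def run_query(expression):
    """Send the query to a running Prometheus server and return its samples."""
    with urllib.request.urlopen(build_query_url(expression), timeout=5) as reply:
        return extract_samples(json.load(reply))
```

With Prometheus running and a test in progress, run_query(THROUGHPUT_QUERY) returns one entry per stored series. The labels attached to each series depend on what the ByteBlower GUI exports, so it is worth inspecting them in the web page first.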

More monitoring information is available in the ‘Status’ dropdown in the top menu bar. If everything is running fine, the ‘Targets’ page should show the state ‘UP’ and a recent ‘Last Scrape’.
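
If the target does not come up, it can help to check what the ByteBlower GUI is serving without Prometheus in between. The sketch below assumes that the results are served on the path ‘/metrics’ (the default path Prometheus scrapes when the configuration does not set another one) in the Prometheus text format. The ‘flow’ label in the sample text is hypothetical and only illustrates the format.

```python
import urllib.request

# Assumption: the GUI serves its results on the default scrape path.
METRICS_URL = "http://localhost:8123/metrics"

# Illustrative sample in the Prometheus text format; the label is hypothetical.
SAMPLE = """\
# TYPE byteblower_gui_traffic_bytes_total counter
byteblower_gui_traffic_bytes_total{flow="Flow 1"} 1500000
"""


def parse_metrics(text):
    """Parse Prometheus text-format lines into (series, value) pairs.

    Minimal sketch: comment lines (# HELP, # TYPE) and blank lines are
    skipped, and an optional trailing timestamp is ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        samples.append((series, float(rest.split()[0])))
    return samples


def fetch_metrics(url=METRICS_URL):
    """Download and parse the results currently offered for scraping."""
    with urllib.request.urlopen(url, timeout=5) as reply:
        return parse_metrics(reply.read().decode("utf-8"))


print(parse_metrics(SAMPLE))
```

Calling fetch_metrics() while the ByteBlower GUI is running should return a non-empty list. A connection error at this point suggests the problem lies with the GUI or the port rather than with the Prometheus configuration.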

Dedicated dashboards with Grafana