
Latency is the time a packet takes to travel from source to destination. Seeing negative latency values in a report is therefore always a surprise: it's as if packets arrived before they were sent! The figure below shows one such example. This article explains how this is possible and what actions can be taken to prevent it.

Intro: Measuring latency with ByteBlower

This section explains how ByteBlower performs the latency measurement. This helps to understand the cause of negative latency values.

The picture below shows two FrameBlasting flows. The one at the top is a regular flow with packets going from PORT_1 to PORT_2, the one at the bottom is a latency flow.

Contrary to the regular flow, the ByteBlower server will modify traffic for the latency flow! Part of the payload content is replaced with a real-time timestamp. This value represents the moment the packet leaves the ByteBlower, i.e. the current local time at the source ByteBlower port.

The receiving ByteBlower port only needs to inspect the packet and compare the timestamp in the packet to its current time. The difference (local time at the destination/receiver ByteBlower port minus the timestamp value in the packet) represents how long the packet was in transit, i.e. the latency of the packet.
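In code terms, the computation boils down to subtracting two clock readings. A minimal Python sketch (illustrative names, not the actual ByteBlower implementation; timestamps in nanoseconds):

```python
def latency_ns(timestamp_in_packet_ns, receive_time_ns):
    """Latency = local time at the receiving port minus the
    timestamp the sending port wrote into the payload."""
    return receive_time_ns - timestamp_in_packet_ns

# A packet stamped at t=1_000_000 ns and received at t=1_250_000 ns
# was 250 microseconds in transit.
print(latency_ns(1_000_000, 1_250_000))  # 250000
```

Note that the two readings come from two different clocks, which is exactly why clock synchronization matters.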

The advantage of this approach is that the only communication between PORT_1 and PORT_2 is through the test traffic itself, no other protocols are needed. The ports don't have to be on the same interface, same server or even in the same lab.

The above approach is used both by ByteBlower servers and Wireless Endpoints. All information is available in the traffic itself. This makes it very flexible to measure the latency between ports docked to the same server, between a server and Wireless Endpoint or between different ByteBlower servers.

Since we rely on local clocks on the ports to generate and compare the timestamps, and since the timestamps are carried in the packet payload, there are two major reasons for problems with measuring latency:

  • The sending and receiving side measure the local time differently (clocks not synchronized)
  • The packets have been corrupted

We'll have a look at both problems in more detail.

Synchronized clocks

In the section above we explained how the transmitting side adds a timestamp to the frames. The receiving end compares this value to its local time. The difference between the value in the frame and the time at the receiving end is how long the packet was under way. Hence the clocks need to be synchronized; otherwise we're measuring the difference between the clocks rather than the packet transit time. A receiver port clock that trails the sender port clock is the major cause of negative latency values!
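To make the effect concrete, here is a small illustrative sketch (made-up numbers): the measured value mixes the true transit time with the offset between the two clocks, so a receiver clock trailing the sender by more than the transit time yields a negative result.

```python
def measured_latency_ns(true_transit_ns, receiver_minus_sender_clock_ns):
    """What the receiver computes: the real transit time plus the
    clock offset (receiver clock minus sender clock)."""
    return true_transit_ns + receiver_minus_sender_clock_ns

# True transit time of 200 us, but the receiver clock trails by 500 us:
print(measured_latency_ns(200_000, -500_000))  # -300000, i.e. negative latency
```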

Within the same ByteBlower

Not a problem here, since source and destination port use the same clock!

If the setup allows you to, using a single ByteBlower server is the preferred way to measure latency!

ByteBlower to ByteBlower

As mentioned above, both ByteBlowers need to be time-synced. The article below offers more info on how to configure this:

https://support.excentis.com/index.php?/Knowledgebase/Article/View/15

Note that for latency measurements, it is important to keep the clocks of the different ByteBlower servers in sync (using NTP or PTP); otherwise they can drift apart. If you only sync them once, the first measurement may give good latency results, but repeating the same test a couple of months later could give wrong results.

Wireless Endpoint to ByteBlower

Measuring latency with a Wireless Endpoint is a challenge. Contrary to regular ByteBlower traffic, it is not a ByteBlower server port that sends or receives the traffic, but the Wireless Endpoint (e.g. your phone). The key question: which timestamp to use?

When registering with a Meeting Point, the Wireless Endpoint tries to synchronize its local time to the time of the Meeting Point (which in turn takes the time of the ByteBlower server it is connected to). Note that this time synchronization is not as good as two ByteBlower servers using the same NTP server: whereas the ByteBlower servers can use a (typically stable) management network to synchronize, the Wireless Endpoint uses the same connection as the traffic itself.

Furthermore, there is no update of time synchronization during a test, so a Wireless Endpoint moving from one AP to another AP during a test could experience a severe time drift without its clock being updated.

Bottom line: latency measurements for a Wireless Endpoint are provided as a best-effort service. Be careful when interpreting these results!

Packet corruption

Since the timestamps are carried in the packet payload, there is another reason why latency measurements can go wrong: corrupted packets.

In case of a corrupted packet, the most likely outcomes are:

  1. the packet is dropped due to a CRC failure
  2. the CRC is valid, but the timestamp was altered: the packet is counted as invalid (no latency) because the calculated time offset is too large (>1 minute)
  3. the CRC is valid, but the timestamp was altered: the latency shows unexpected peaks (multiple seconds, but less than 1 minute)
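The cut-off separating cases 2 and 3 can be sketched as a simple validity check. The 1-minute threshold comes from the list above; the function name is illustrative, not ByteBlower internals:

```python
ONE_MINUTE_NS = 60 * 1_000_000_000

def classify_packet(timestamp_ns, receive_time_ns):
    """Mimic the outcome for a packet whose CRC was valid."""
    offset_ns = receive_time_ns - timestamp_ns
    if abs(offset_ns) > ONE_MINUTE_NS:
        return "invalid"  # case 2: counted, but no latency recorded
    return "latency=%d" % offset_ns  # case 3 when the timestamp was altered

print(classify_packet(0, 2 * ONE_MINUTE_NS))  # invalid
print(classify_packet(0, 5_000))              # latency=5000
```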

Below is an example of such a latency measurement (simulated through an impairment node).

Next to the obvious spikes in latency, you can look at the packet loss measurements to know whether the negative latency was caused by packet corruption.

To solve these problems, you will need to fix the packet corruption itself (in the network or the device).

Another case?

If your test matches none of the above cases, then it's a good idea to contact us at support.byteblower@excentis.com. We'll help you further from there.

Pinpointing packet loss in time

When packet loss happens in a flow, it can be useful to know when it happens.  Typically some questions pop up:

  • Was this packet loss a single event?
  • How long did the loss-event take?
  • When did the loss occur?  Was it in the beginning of the frame-blasting flow?  Was it near the end?

This article shows several approaches to find where the packet loss occurred.

Approach 1: Use the ByteBlower GUI reporting

When loss is significant, the ByteBlower GUI will show this in its report.  The "results over time"-graph will show a dip. 

Using the zoom function, a more precise take on this can be made.

Approach 2: Shorten the flow

Sometimes traffic loss occurs only in the beginning of the flow. 

A typical symptom: 30% traffic loss in a 10 second test, but the longer the test runs (e.g. half an hour, a day), the lower the loss percentage becomes.

So when the flow is shortened the loss percentage can do 3 things:

  • The loss percentage decreases.
    This means there were multiple loss events in the original scenario, and there are still loss events in the current one.
  • The loss percentage stays about the same.
    There is continuous loss over time.
  • The loss percentage rises.
    This probably means that the main loss event is at the start of the test.
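The effect described above is plain arithmetic: a fixed burst of lost packets weighs less as the test gets longer. A quick sketch with made-up numbers (a 3000-packet loss burst in a flow running at 1000 packets per second):

```python
def loss_percentage(lost_packets, total_packets):
    """Loss as a percentage of what was sent."""
    return 100.0 * lost_packets / total_packets

BURST = 3000  # packets lost once, at the start of the flow
RATE = 1000   # packets per second
print(loss_percentage(BURST, 10 * RATE))    # 30.0   -> 10 second test
print(loss_percentage(BURST, 1800 * RATE))  # ~0.17  -> 30 minute test
```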

Approach 3: Divide the long flow in multiple shorter flows

This approach is a combination of the two approaches above.  When a long flow is divided into multiple (shorter) flows, it should be easier to pinpoint frame-loss events.  An example:

Here, a 10 second flow is split up in 20 flows.  These flows all take 500ms and start at 500ms intervals.

The tabular data in the report will show which short flow has loss.  It is then easier to pin-point the actual moment in time to investigate further.

Approach 4: Using Out of Sequence detection

The ByteBlower has a neat feature one can use to debug loss over time: Out of Sequence detection.  Whilst this feature was implemented to detect bad reordering of packets after queuing them, it can be used to investigate frame loss.

Out of Sequence (OoS) detection inserts an incrementing frame number into the payload.  This enables ByteBlower to detect frames arriving out of order.

E.g. the frame with ID 5 should arrive after frame 4 and before frame 7, but it arrives after frame 7.
This triggers the ByteBlower server to mark frame 5 as out of sequence.

What ByteBlower doesn't track (yet) are the frames which were lost.  In the example above, frame 6 was lost.
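The bookkeeping behind OoS detection can be sketched in a few lines of Python; tracking the running maximum also reveals reordered frames. Names are illustrative, not ByteBlower internals:

```python
def find_out_of_sequence(arrived_frame_ids):
    """Flag frames that arrive after a higher frame number was already seen."""
    seen_max = -1
    out_of_sequence = []
    for frame_id in arrived_frame_ids:
        if frame_id < seen_max:
            out_of_sequence.append(frame_id)
        seen_max = max(seen_max, frame_id)
    return out_of_sequence

# Frame 5 arrives after frame 7, as in the example above:
print(find_out_of_sequence([4, 7, 5]))  # [5]
```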

Luckily, the ByteBlower server provides another way to do this ourselves: Capturing on the interfaces.

A capture can be created using the ByteBlower GUI.  When this capture is opened in a packet analyzer (e.g. Wireshark), it is possible to extract the frame number out of the payload.

In the screenshot above, the frame number (identifier) is marked.  This is an 8-byte field at the end of the payload.  The first 2 bytes (not marked, 0xFFe9) are used for checksum correction, the last 6 are the frame identifier (0x000000000016).

When the time frame of the loss event can be pinpointed to a reasonable scope with the first 3 approaches, the exact moments can be narrowed down by noting the missing frame numbers.
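Given the payload layout described above (the last 8 bytes of the payload: 2 bytes of checksum correction followed by a 6-byte frame identifier), the capture can be post-processed with a short script. This is a sketch under that assumption; the raw payload bytes would come from e.g. a Wireshark export:

```python
def frame_identifier(payload):
    """The frame identifier is the last 6 bytes of the payload."""
    return int.from_bytes(payload[-6:], "big")

def missing_frames(captured_ids):
    """Frame numbers absent from the captured range."""
    return sorted(set(range(min(captured_ids), max(captured_ids) + 1))
                  - set(captured_ids))

# The trailing 8 payload bytes from the screenshot: 0xFFE9 + identifier 0x16
tail = bytes.fromhex("ffe9000000000016")
print(hex(frame_identifier(tail)))   # 0x16
print(missing_frames([4, 5, 7, 8]))  # [6]
```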

Did you know you can ping a ByteBlower Port?  This requires no extra configuration on your end: the port will respond to ping requests as soon as it has an IP address.

Pinging a ByteBlower Port is especially helpful when debugging connectivity issues: it allows you to check where the ByteBlower Port is still reachable and on which link connectivity is lost.

In the examples below we use IPv4; the same of course also works with IPv6.

ByteBlower GUI

A ByteBlower Port is reachable with ping as soon as it shows a valid address in the Realtime View. The address becomes available very early in the configuration phase and remains so throughout the whole test run.

To increase the time for debugging, you can enable a pause between scenario configuration and test-run. Right before the test traffic starts, you'll receive the pop-up below.

When the issue is easily solved, you can still continue the test-run. From ByteBlower 2.11.4 on, the NAT entries will be kept alive until the test starts.


This pop-up is shown by default. To disable it, tick the checkbox; it can later be enabled again from the Preferences.



Finally, to make debugging even easier, it helps to have a very minimal scenario: only enough to configure the ByteBlower Ports. To this end we suggest disabling NAT (Port View) and using only TCP flows.

ByteBlower API

Pinging works just the same with the ByteBlower API: a ByteBlower Port is pingable as soon as it has a proper IP address. As the example below shows, this is the default behavior and requires no extra configuration.

More examples can be found via https://api.byteblower.com/

import byteblowerll.byteblower as byteblower

# Connect to the ByteBlower server and create a port on the first interface
api = byteblower.ByteBlower.InstanceGet()
bb_server = api.ServerAdd('10.8.254.111')
bb_port = bb_server.PortCreate('nontrunk-1')

# Layer 2 configuration: the MAC address of the port
l2 = bb_port.Layer2EthIISet()
l2.MacSet('00-bb-00-11-22-33')

# Layer 3 configuration: obtain an IPv4 address through DHCP
l3 = bb_port.Layer3IPv4Set()
dhcp = l3.ProtocolDhcpGet()
dhcp.Perform()

# As soon as the DHCP exchange finishes, the port answers to ping
print('ByteBlower Port is pingable on %s' % (l3.IpGet()))
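To verify reachability from the same script, one option is to shell out to the system ping. A sketch assuming a Unix-like host where the `ping -c`/`-W` options are available:

```python
import subprocess

def port_is_pingable(ip_address, count=1, timeout_s=2):
    """True when the address answers at least one echo request."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), ip_address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # no ping binary on this host
    return result.returncode == 0

print(port_is_pingable("127.0.0.1"))
```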

This article is intended for debugging a ByteBlower 1300. Only when the ByteBlower debug log contains the line below should you go through these steps, preferably after contacting support.byteblower@excentis.com.

 > Stopped ByteBlower due to: No cores found on NUMA node 0


Unfortunately the NUMA configuration can't be checked over TeamViewer; it needs to be done on premises. The steps below list how to get to these settings. Most will probably look familiar. Depending on the result we might need to RMA the system; contacting support will help you further.

Step 1. Enter the BIOS menu
Attach a keyboard and screen to the ByteBlower and reboot the system. You'll need to press DEL when the SUPERMICRO logo shows up.

Step 2. Navigate to the ACPI settings.
In the BIOS we're interested in the ACPI Configuration. This setting is found in the Advanced menu. As you can see below, the option is third to last.

Step 3. Verify NUMA Support.
In the ACPI Configuration you'll find the NUMA Support option. This option should be Enabled.

Press "Esc" to go back to the main menu and then select Exit -> Save and Exit. (Do this even if NUMA Support was already enabled: this step will reprogram the BIOS!)

The ByteBlower GUI generates several types of reports. The HTML and PDF reports are limited by the size of the test run. On this page you'll find a guideline for how large your tests can grow. This limitation comes from graphing the results over time.

For ByteBlower GUI v2.10 the limit of the graphical report is about 12 000 Graph-hours. This value is calculated as follows:

  1. Count how long your test runs.
  2. Each type of graph in the report has a different cost. For each graph in the report, multiply with the right value from the table below.

    Graphing type     Multiplier   Lines
    FrameBlasting     1
    Latency           5            Minimum, Average, Maximum and Jitter twice
    TCP               5            Goodput, TCP throughput, round-trip time, transmit window, retransmissions
    Out of Sequence   1

  3. Multiply the cost with the duration. The total should be less than 12 000.
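The calculation can be sketched in a few lines of Python (multiplier values taken from the table above; the function name is illustrative):

```python
# Multipliers from the table above
MULTIPLIER = {"FrameBlasting": 1, "Latency": 5, "TCP": 5, "Out of Sequence": 1}
LIMIT_GRAPH_HOURS = 12_000

def graph_hours(duration_hours, graphs):
    """Total graphing cost: the summed multipliers times the duration."""
    return duration_hours * sum(MULTIPLIER[g] for g in graphs)

# 2 TCP flows, 1 FrameBlasting flow, 1 FrameBlasting flow with Latency, 5 days:
cost = graph_hours(5 * 24, ["TCP", "TCP", "FrameBlasting", "FrameBlasting", "Latency"])
print(cost, cost < LIMIT_GRAPH_HOURS)  # 2040 True
```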

An example

For the example we use the configuration below. This scenario has 4 flows of 3 different types.

  • Duration of the test run:
    This test runs for 5 days. This can be read in the Duration column next to the name of the scenario.
  • Calculate the multiplier:
    This test run has a multiplier of 10 + 1 + 6 = 17
    • 2 TCP flows: 2 X 5 = 10.
    • 1 FrameBlasting flow: 1 X 1 = 1
    • 1 FrameBlasting + Latency flow: 1 X (1 + 5) = 6
  • Multiply
    17 x 5 days = 17 x 5 x 24 hours = 2040 Graph-hours
    2040 Graph-hours < 12 000 Graph-hours.
    This test run is OK.

In total this test stays below the reporting limit, so you can generate HTML reports from it.

Creating larger tests?

The above guideline applies only to the HTML and PDF reports. For larger test runs you have the following options:

  1. Split up the test-run into smaller scenarios.
  2. Only generate the csv reports using the ByteBlower CLT.
  3. Use the ByteBlower API.

This article is intended for the owners of a ByteBlower 3100 or 3200 model. These systems use an off-the-shelf Intel NIC with traffic generation being handled in software. It has come to our attention that the default firmware on this NIC is not without issues. This guide explains how to upgrade.

In particular, the NIC has issues when the other side of the fiber connection does not shut down in a controlled way, for example when the switch restarts due to a configuration change or due to an intermittent loss of power. In response to such events the NIC will continuously cycle: bringing the link up and losing the connection again. This can be seen as:

  • a slow blinking of the LEDs, with a cycle time of about 4 s.
  • a square wave in the throughput with the same cycle period.
  • tests that fail to properly start, only to start a couple of moments later.

Before updating the NIC, it helps to reach out to support.byteblower@excentis.com. We'll walk you through the following steps:

  1. SSH into the ByteBlower server.
    The login username is "root", default password is "excentis"

  2. Temporarily put the OS back into control of the NIC
    /etc/init.d/dpdk stop

  3. Download and unpack the new firmware in a temporary directory.
    cd /tmp
    wget --no-check-certificate http://setup.byteblower.com/assets/700Series_NVMUpdatePackage_v8_40_Linux.tar.gz
    tar xzvf 700Series_NVMUpdatePackage_v8_40_Linux.tar.gz

  4. Perform the update on the NIC

    cd 700Series/Linux_x64/
    ./nvmupdate64e

    This last command brings up a screen quite similar to the one below. Only the x710-2 requires an update. Your ByteBlower server has two such NICs inside; both require an update.


    After the update you will see a screen like the one below. The updater isn't perfect, unfortunately: it will complain about communication with the base driver. This message can be ignored.


  5. Reboot the ByteBlower server.
    On the next boot, the traffic generation services will be (re)started and normal traffic generation mode resumes.
    reboot

On new devices you might encounter something like the screenshot below: your new Wireless Endpoint is not supported yet.

What can you do with those devices?

With caution, almost everything; the disclaimer is only text.

The disclaimer means we didn't yet have time to verify the Wireless Endpoint on this new OS: we haven't verified the functionality nor the performance. Since most OS versions are backwards compatible, there's a good chance that everything works just fine.

Can I do something to fix the disclaimer?

Updating to the latest MeetingPoint and Wireless Endpoint helps.

Which devices are supported depends on the software running on the ByteBlower system and on the device itself. Your new phone might already be supported by a newly released update.

Should you notify us?

Yes please, especially when the disclaimer remains after updating. Sending a mail to support.byteblower@excentis.com or contacting your account manager is sufficient.

Reaching out helps us determine priorities. And truth be told, on occasion we might have missed a release.

What devices are supported?

Follow the link below

The ByteBlower GUI can run multiple times on the same machine. This page lists a couple of the caveats.

Prometheus Exporter

Since v2.18, the ByteBlower GUI exports its real-time status for Prometheus and other tools. This exporter requires a unique TCP port, by default 8123. The server is available on http://localhost:8123 .

Enabling multiple exporters on the same machine is possible by selecting free TCP ports in the Preferences. The exporter restarts with the new TCP port immediately after 'Apply and Close'.
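To check whether an exporter answers on a given port, you can fetch its page with only the standard library. A sketch; the `/metrics` path follows the usual Prometheus convention and is an assumption here:

```python
from urllib.request import urlopen
from urllib.error import URLError

def exporter_is_up(port=8123, timeout_s=2):
    """True when something answers HTTP on this local port."""
    try:
        # /metrics is the conventional Prometheus endpoint (assumption)
        with urlopen("http://localhost:%d/metrics" % port, timeout=timeout_s) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

print(exporter_is_up(8123))
```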

Saved Testruns (Archive View)

The Archive view collects all past test runs, their reports, and a copy of the ByteBlower project (recent runs only). The GUI saves this list in the home folder of the user (~/byteblower).

Multiple ByteBlower GUIs can use this folder simultaneously; in that case the Archive view might become outdated. Restarting the ByteBlower GUI is sufficient to refresh the list of test runs.

If the folder is missing, the ByteBlower GUI recreates the folder structure automatically.

09:01 A.M., Work is done! 

Easy automation with the new JSON report. 

 

The ByteBlower GUI and CLT now have a new type of report: JSON. Just like CSV, this format is intended for machines rather than us humans. Actually, JSON is a huge improvement over CSV. 

As an example, check out the Python code below: 15 lines to check whether your network device did or did not lose traffic in an overnight test.

Opening the JSON Report 

JSON files can be read easily in many languages. Check out the Python example below.

import json
def open_json_report(report_filename):
    """ 
        Opens the report for further processing  
    """
    with open(report_filename, "r") as f:
        return json.load(f)

 

Processing a testrun 

Once the report is loaded, you have access to the same information as in the other reports. Labor-intensive questions can easily be answered in a short script.

Does your project have many FrameBlasting flows?  
Do you want to count how much traffic was lost in total?
Did you check out the script below? 

    def count_received_traffic(report_data):
        """
            Lost traffic is bad news. How much did we receive?
            Returns the value as percentage: 100 is nothing lost.
        """
        total_expected = 0
        total_received = 0
        for fb_flow in report_data["frameBlastingFlows"]:
            for a_destination in fb_flow["destinations"]:
                total_expected += fb_flow["source"]["sent"]["packets"]
                total_received += a_destination["received"]["packets"]
        if total_expected == 0:
            return 0
        else:
            return 100. * total_received / total_expected
    

Making work easier

But in the end our goal is to make life easier. Just that is possible with the JSON format. Rather than going over a single test run by hand, why not let the computer do it?

    def test_a_report(report_filename):
        """
            Pass or Fail? Did this night's test succeed?
        """
        report_data = open_json_report(report_filename)
        received = count_received_traffic(report_data)

        # The test passes only when at least 99.9% of the traffic arrived.
        assert received >= 99.9, "FAIL, too much traffic lost in " + report_filename
        print("PASS: We had a good testrun")

For most Android devices there's the Google Play store to install the ByteBlower Wireless Endpoint. This page is intended for those devices that can't find the app there. We'll focus on Android TV, but the same steps also apply to others, like devices without public internet access.

The steps below use the Android .apk file. This file is available on our setup pages. As found in the section below, it doesn't hurt to just try this file first.