The ByteBlower GUI can run multiple times on the same machine. This page lists a couple of the caveats.

Prometheus Exporter

Since v2.18, the ByteBlower GUI exports its real-time status for Prometheus and other tools. This exporter requires a unique TCP port; the default is 8123. With the default settings the exporter is available on http://localhost:8123.

Enabling multiple exporters on the same machine is possible by selecting free TCP ports in the Preferences. The exporter restarts with this new TCP port immediately after using 'Apply and Close'.
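
Before picking a port in the Preferences, it can help to check that it is really free. Below is a minimal sketch using only the Python standard library. The helper name is_port_free and the list of candidate ports are illustrative assumptions; they are not part of ByteBlower.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True when a TCP socket can still be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another program owns the port.
            return False
    return True


if __name__ == "__main__":
    # 8123 is the exporter default; the other values are arbitrary candidates.
    for candidate in (8123, 8124, 8125):
        state = "free" if is_port_free(candidate) else "in use"
        print("TCP port %d is %s" % (candidate, state))
```

A port reported as free can still be taken by another program before the GUI restarts its exporter, so treat the result as a hint only.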

Saved Testruns (Archive View)

The archive view collects all past test runs, their reports, and a copy of the ByteBlower project (recent only). The GUI saves this list in the home folder of the user (~/byteblower).

Multiple ByteBlower GUIs can use this folder simultaneously. In such cases the Archive view might become outdated. Restarting the ByteBlower GUI is sufficient to refresh the latest test runs.

If the folder is missing, the ByteBlower GUI recreates the folder structure automatically.

This article is intended for the owners of a ByteBlower 3100 or 3200 model. These systems use an off-the-shelf Intel NIC with traffic generation being handled in software. It has come to our attention that the default firmware on this NIC is not without issues. This guide explains how to upgrade.

The NIC in particular has trouble when the other side of the fiber connection does not shut down in a controlled way, for example when the switch restarts after a configuration change or because of an intermittent loss of power. In response to such events the NIC continuously cycles between bringing the link up and losing the connection again. This can be seen as:

  • A slow blinking of the LEDs, with a cycle time of about 4s.
  • A square wave in the throughput, with the same cycle period.
  • Tests that fail to start properly, only to start fine a few moments later.

Before updating the NIC, it does help to reach out to support.byteblower@excentis.com. We'll help you walk through the following steps:

  1. SSH into the ByteBlower server.
    The login username is "root", default password is "excentis"

  2. Temporarily put the OS back into control of the NIC
    /etc/init.d/dpdk stop

  3. Download and unpack the new firmware in a temporary directory.
    cd /tmp
    wget --no-check-certificate http://setup.byteblower.com/assets/700Series_NVMUpdatePackage_v8_40_Linux.tar.gz
    tar xzvf 700Series_NVMUpdatePackage_v8_40_Linux.tar.gz

  4. Perform the update on the NIC

    cd 700Series/Linux_x64/
    ./nvmupdate64e

    This last command brings up a screen quite similar to the one below. Only the x710-2 requires an update. Your ByteBlower server has two such NICs inside; both require an update.


    After the update you will get a screen like below. Unfortunately the updater isn't perfect: it will complain about communication with the base driver. This message can be ignored.


  5. Reboot the ByteBlower server.
    On the next boot the traffic generation services are (re)started and normal traffic generation resumes.
    reboot

Introduction

When testing your device with ByteBlower, you sometimes want a pcap capture of the data for debugging purposes. With the ByteBlower API we can easily capture that network traffic and present it to you as a pcap file. This feature helps you to quickly debug problems with your device.

Let me explain how to do this.

Using the ByteBlower GUI

This is the simplest way of capturing traffic. It is available since GUI v2.11 and server v2.9. Here you can see how it works:


Options to keep the file size manageable

With the default settings, all network traffic on the selected interface is captured. This often results in very large PCAP files. Since version 2.13 of the ByteBlower GUI, two options are available to reduce the file size:

  • Configure a BPF filter. This filter is applied by the ByteBlower server; only traffic matching the filter is forwarded to the ByteBlower GUI.
  • Truncate individual frames. Only the first part of each frame, up to a configured number of bytes, is kept in the PCAP and the remainder is dropped.

By default, both options are set to capture all traffic.

These two options are available in the advanced config part of the capture dialog. They are configured before the capture starts. Did the capture already begin? The options become editable again in the dialog after stopping the capture.
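
To get a feel for how much truncation saves, here is a rough back-of-the-envelope estimate. It assumes the classic pcap file layout (a 24-byte file header plus a 16-byte record header per captured packet); whether the GUI writes classic pcap rather than pcapng is an assumption, and the function name and traffic numbers are made up for illustration.

```python
PCAP_FILE_HEADER = 24    # bytes, classic pcap global header
PCAP_RECORD_HEADER = 16  # bytes, header stored in front of every packet


def estimate_pcap_size(frame_count, frame_size, snap_length=None):
    """Estimate the capture file size in bytes.

    snap_length is the truncation length; None means frames are stored whole.
    """
    stored = frame_size if snap_length is None else min(frame_size, snap_length)
    return PCAP_FILE_HEADER + frame_count * (PCAP_RECORD_HEADER + stored)


# Hypothetical capture: one million frames of 1514 bytes each.
full = estimate_pcap_size(1_000_000, 1514)
truncated = estimate_pcap_size(1_000_000, 1514, snap_length=64)
print("full frames : %d bytes" % full)
print("first 64 B  : %d bytes" % truncated)
```

In this example, keeping only the first 64 bytes shrinks the file from roughly 1.5 GB to roughly 80 MB, while the headers that matter for most debugging are still present.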

Using the Remote Capture Tool

This was the easiest way of capturing traffic on your port, until we brought the capture functionality to the GUI. It's a command-line tool that can be downloaded from the setup pages. It can be used on Windows, Mac and Linux.

Note:

This tool only works with ByteBlower servers running version 2.1 or higher.

Using the ByteBlower lower-layer API

When you are using our TCL API to transmit your traffic you can use the Rx.Capture of a ByteBlower Port to create a capture. Using our API allows you to automate when to create a capture. Let your script determine when you need to create a capture.

All you need is the Rx.Capture.Add call on your ByteBlower Port.

Rx.Capture.Add

Just like you add a Trigger to a ByteBlower port, you can add a Capture. On this capture object you can set a capture filter and thus define which frames you would like to see captured. After that, just start the capture and you are all set. Now let's put these simple words into a working script.

For this post, we assume we have created a back-to-back scenario with:

  • Two configured ByteBlower ports srcPort and dstPort
  • a stream Stream configured to flow between srcPort and dstPort

 Create a capture on the dstPort and configure it

set capture [ $dstPort Rx.Capture.Add ]

Now you have a capture Object. Using the Tk command you can visualize it to see what you can do with this object.

Tk screenshot of Rx.Capture object

It is important to set a capture filter on this capture. This will allow you to capture only the packets you are interested in.

$capture Filter.Set "dst port 513"

The filter must be a BPF filter. On http://biot.com/capstats/bpf.html you can find more info on the syntax of these filters and some day-to-day examples.

Start the capture

You can start the capture now.

$capture Start

Now start your traffic and every frame that matches your filter will be captured. You can see how many frames have been captured through the result object of the capture:

set captureResult [ $capture Result.Get ]
$captureResult Refresh
$captureResult PacketCount.Get

Stop the capture and get the PCAP-file

Like the Start method, there is a Stop method to end the capture.

$capture Stop

To retrieve your pcap-file use the Pcap.Save method.

$captureResult Refresh
$captureResult Pcap.Save "C:/Users/Excentis/Sniffs/DeviceX.pcap"

On your disk you will find DeviceX.pcap, containing the packets that matched your filter and arrived on your ByteBlower destination port (dstPort). If you want, you can use Frames.Get to retrieve a TCL list containing the packets in hex encoding. This way you could use TCL to parse the retrieved packets.

API

You can find the api documentation of the RxCapture here: https://api.byteblower.com/tcl/classRx_8Capture_8RawPacket.html

Pinpointing packet loss in time

When packet loss happens in a flow, it can be useful to know when it happens.  Typically some questions pop up:

  • Was this packet loss a single event?
  • How long did the loss-event take?
  • When did the loss occur?  Was it in the beginning of the frame-blasting flow?  Was it near the end?

This article shows several approaches to find where the packet loss occurred.

Approach 1: Use the ByteBlower GUI reporting

When loss is significant, the ByteBlower GUI will show this in its report.  The "results over time"-graph will show a dip. 

Using the zoom function, a more precise take on this can be made.

Approach 2: Shorten the flow

Sometimes traffic loss occurs only in the beginning of the flow. 

A typical symptom of this is that e.g. 30% traffic loss occurs in a 10 second test, while the loss percentage decreases the longer the test runs (e.g. half an hour, a day).

So when the flow is shortened the loss percentage can do 3 things:

  • The loss percentage decreases. 
    This means there are multiple loss events in the original scenario.  But there are still loss events in the current scenario
  • The loss percentage stays about the same.
    There is a continuous loss over time. 
  • The loss percentage rises.
    This probably means that the main loss-event is at the start of the test.
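
The arithmetic behind this reasoning can be sketched in a few lines of Python. The frame rate and the size of the loss burst are hypothetical numbers, chosen to reproduce the 30% example above with a single loss event at the start of the flow.

```python
def loss_percentage(lost_frames, frame_rate, duration_s):
    """Loss percentage for a flow sending frame_rate frames/s during duration_s seconds."""
    sent_frames = frame_rate * duration_s
    return 100.0 * lost_frames / sent_frames


# Hypothetical flow: 1000 frames/s, one burst of 3000 lost frames at the start.
for duration_s in (5, 10, 1800):
    print("%5d s: %.2f%% loss" % (duration_s, loss_percentage(3000, 1000, duration_s)))
```

With these numbers the 10 second test shows 30% loss, the shortened 5 second test shows 60%, and the half-hour test stays well below 1%. The number of lost frames is the same each time; only the number of sent frames changes.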

Approach 3: Divide the long flow in multiple shorter flows.

This approach is a combination of the two approaches above.  When a long flow is divided in multiple (shorter) flows, it should be easier to pin-point some frame-loss events.  An example:

Here, a 10 second flow is split up in 20 flows.  These flows all take 500ms and start at 500ms intervals.

The tabular data in the report will show which short flow has loss.  It is then easier to pin-point the actual moment in time to investigate further.
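
The timing of such a split is easy to compute up front. The sketch below only calculates start times and durations; it does not configure anything in ByteBlower, and the function names are illustrative.

```python
def split_flow(total_ms, parts):
    """Return a list of (start_ms, duration_ms) tuples, one per shorter flow."""
    if parts <= 0 or total_ms % parts != 0:
        raise ValueError("total_ms must be evenly divisible by a positive number of parts")
    duration_ms = total_ms // parts
    return [(index * duration_ms, duration_ms) for index in range(parts)]


def loss_window(schedule, flow_index):
    """Time window (start_ms, end_ms) covered by the short flow that showed loss."""
    start_ms, duration_ms = schedule[flow_index]
    return start_ms, start_ms + duration_ms


# The example from the text: a 10 second flow split into 20 flows of 500ms.
schedule = split_flow(10_000, 20)
print(schedule[:3])
print(loss_window(schedule, 7))
```

If, say, the eighth short flow (index 7) reports loss, the loss event happened between 3500ms and 4000ms after the start of the original flow.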

Approach 4: Using Out of Sequence detection

The ByteBlower has a neat feature one can use to debug loss over time: Out of Sequence detection.  Whilst this feature was implemented to detect bad reordering of packets after queuing them, it can be used to investigate frame loss.

Out of Sequence (OoS) inserts an incrementing frame number into the payload. This enables ByteBlower to detect frames that arrive out of order.

E.g. Frame with ID 5 should arrive after frame 4 and before frame 7.  It arrives after frame 7.
This triggers the ByteBlower server to mark frame 5 out of sequence.

What ByteBlower doesn't track (yet), are the frames which are lost.  In the example above, frame 6 was lost.

Luckily, the ByteBlower server provides another way to do this ourselves: Capturing on the interfaces.

A capture can be created using the ByteBlower GUI.  When this capture is opened in a packet analyzer (e.g. Wireshark), it is possible to extract the frame number out of the payload.

In the screenshot above, the frame number (identifier) is marked.  This is an 8-byte field at the end of the payload.  The first 2 bytes (not marked, 0xFFe9) are used for checksum correction, the last 6 are the frame identifier (0x000000000016).
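
Extracting this field by hand quickly becomes tedious. The sketch below splits the 8-byte trailer as described above. It assumes the identifier is stored big-endian, which matches the way the example value is written, and that the bytes passed in end exactly at the end of the payload. The function name and the padding in the example are hypothetical.

```python
def parse_oos_trailer(payload):
    """Split the last 8 payload bytes into (checksum_correction, frame_id)."""
    if len(payload) < 8:
        raise ValueError("payload is too short to hold the 8-byte trailer")
    trailer = payload[-8:]
    checksum_correction = trailer[:2]              # first 2 bytes
    frame_id = int.from_bytes(trailer[2:], "big")  # last 6 bytes, assumed big-endian
    return checksum_correction, frame_id


# Example payload: 40 padding bytes followed by the trailer values from the text.
payload = bytes(40) + bytes.fromhex("ffe9000000000016")
correction, frame_id = parse_oos_trailer(payload)
print(correction.hex(), frame_id)
```

For the trailer in the example, the frame identifier 0x000000000016 is frame number 22.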

When the time-frame of the loss event can be pin-pointed to a reasonable scope with the first 3 approaches, the exact moments can be narrowed down by noting the missing frame numbers.
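
Once the frame numbers are extracted from the capture, finding the gaps is straightforward. A minimal sketch, assuming the identifiers are available as a list of integers (the function name is illustrative):

```python
def missing_frame_ids(received_ids):
    """Return the frame numbers absent between the lowest and highest received one."""
    seen = set(received_ids)
    if not seen:
        return []
    return [frame_id for frame_id in range(min(seen), max(seen) + 1)
            if frame_id not in seen]


# Frame 5 arrives late (out of sequence) and frame 6 never arrives.
print(missing_frame_ids([3, 4, 7, 5, 8]))
```

This only finds gaps inside the captured range. Frames lost before the first or after the last captured frame do not show up this way.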

This article is intended for 2.10.2 ByteBlower GUI users or even older versions. Since version 2.11 the ByteBlower GUI opens the report in an external browser and there's thus no need anymore to follow the guidelines below.

Situation in ByteBlower GUI 2.10

After running a scenario in the ByteBlower GUI, a report is always created. Such reports are typically displayed within the ByteBlower GUI application.

On Linux this doesn't work! The following behaviour may occur:

  • After a scenario is run, the new report should automatically be presented to the user. This will not happen.
  • An empty 'report' tab is created, but without a title and without content.
  • If you want to explicitly open a report from the Archive view, an error is shown.

The problem is present in Ubuntu 16.04, Ubuntu 18.04 and Fedora 21.

Problem

The reports are presented in the GUI using an internal browser. If no valid HTML rendering engine is present on the system, the reports cannot be shown.

Sometimes, other programs (such as a standalone browser) may have installed these engines and this issue will not occur.

Solution

To solve this issue, you need to install the correct WebKit rendering package. This package depends on the Linux distribution you use.

Distribution   Package name         Installation command
Ubuntu         libwebkitgtk-1.0-0   sudo apt-get install libwebkitgtk-1.0-0
Fedora         webkitgtk            sudo yum install webkitgtk

If you have stumbled on this issue and are using a distribution not listed above, don't hesitate to contact us at support.byteblower@excentis.com for help!

Since the ByteBlower Wireless Endpoint app runs on a lot of hardware and operating system configurations, some questions come up regularly.

The most common questions are listed below.

Wi-Fi statistics availability

Q: The Wi-Fi statistics have strange values in them?

A: The ByteBlower Wireless Endpoint tries to collect as many statistics as it can. In modern operating systems (like iOS, Android...), access to some of these statistics is restricted; other statistics are not available at all. This is usually done to protect the privacy of the end-user. When the ByteBlower Wireless Endpoint is not able to collect a parameter, it will return a default value. The defaults are listed below:

  • SSID: an empty string
  • BSSID: the null-MAC address: 00:00:00:00:00:00
  • RSSI: -1
  • Channel number: -1
  • Tx rate: -1
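
When post-processing results, it helps to treat these defaults as "not available" rather than as real measurements. The sketch below shows one way to do that. The dictionary keys and the function name are hypothetical; they are not the names used by the ByteBlower API.

```python
# Default values returned when a statistic could not be collected (see the list above).
UNAVAILABLE_DEFAULTS = {
    "ssid": "",
    "bssid": "00:00:00:00:00:00",
    "rssi": -1,
    "channel": -1,
    "tx_rate": -1,
}


def unavailable_fields(stats):
    """Return the sorted names of the statistics that still hold their default value."""
    return sorted(name for name, default in UNAVAILABLE_DEFAULTS.items()
                  if stats.get(name, default) == default)


# Hypothetical sample: SSID and channel are known, the rest is not.
sample = {"ssid": "lab-ap", "bssid": "00:00:00:00:00:00",
          "rssi": -1, "channel": 36, "tx_rate": -1}
print(unavailable_fields(sample))
```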

Q: On Android, the Wi-Fi statistics are missing

A: Android uses a permission system to grant apps access to certain components of the operating system.  Wi-Fi parameters are part of the location permission.  Granting this permission to the ByteBlower Wireless Endpoint app enables it to collect these.  Also location must be enabled.

Q: On iOS, the Wi-Fi statistics are missing

A: iOS does not provide access to the RSSI, Channel number and Tx Rate, so the Wireless Endpoint will always return -1 as value.

Miscellaneous

Q: The Wireless Endpoint announces the wrong IP address

A: The application collects all available IP addresses on the device it is running on.  The first IP address found, is shown in the ByteBlower GUI.

Windows 10 provides the last known IP address of an interface even when the interface is down (e.g. the Ethernet cable was disconnected from the interface).

This is a general troubleshooting guide. The focus is on large, mainly hardware problems where the ByteBlower stopped working. The goal of this guide is to give you a head start. We are reachable at support.byteblower@excentis.com; even in case of doubt, don’t hesitate to mail.

ByteBlower server has no power. 

Symptoms: 

  • Power is plugged in but no indication lights are lit 

There are a number of indication lights. Which ones are present differs from model to model. You can look for the following:

  • The main power LED on the front panel.
  • The LEDs on the power supply.
  • The link lights on the management interfaces.
  • The link lights on the IPMI interface.

Troubleshooting: 

  • Most ByteBlower servers have a second power supply; do try this one.
  • Remove the power connection to the ByteBlower for 5 minutes and try booting the system again.
  • Check the power connection with another system or check the power outlet directly. 

Diagnosis & treatment: 

  • In case of a broken power supply we can ship you one directly.
  • When the power supply isn't the root cause, it's often necessary to replace the whole motherboard.

ByteBlower server frozen in the BIOS screen 

Symptoms 

  • While starting up, the ByteBlower server doesn’t get further than the SuperMicro Bios Screen 

Troubleshooting 

  • If you have one, unplug the external USB security key and try booting the system again 
  • If this doesn't work or isn't feasible, we can help you best with the following information: 
    • The status-code at the left bottom 
    • The Model of the ByteBlower server (5100, 4100, 3100, 2100 or 1300) 
    • If possible, the license number. When needed we can look up this info. 

Diagnosis & treatment 

  • Depending on the error code, a hardware failure of the USB dongle is possible. A new dongle can be shipped directly to you.
  • Hardware failure of the memory in the system. This tends to require replacing the unit. We’ll work with you for the best solution. 

ByteBlower server doesn’t find a hard-disk at startup 

Symptoms 

  •  At start-up of the ByteBlower server, the BIOS complains that no hard-disk is installed. 

Troubleshooting: 

  • If you have one, remove the external USB security dongle and try booting again. 
  • Contact us with the ByteBlower server model and, if possible, the license number.

Diagnosis & treatment 

  • We can replace a broken security dongle. This can be shipped directly to you.
  • In case of a broken hard disk we can provide a new disk image or ship a freshly installed hard-disk.

ByteBlower server management interfaces do not have link

Symptoms 

  •  The ByteBlower server management interface is connected to a PC, but the link is still down

Troubleshooting: 

  • The link LED on the PC and/or ByteBlower server is not lit
  • The command ifconfig on the ByteBlower server states the link is down

Diagnosis & treatment 

  • Some ByteBlower server series (e.g. ByteBlower 3100, generation 3) have issues in link negotiation (MDI-X).  There are 2 remedies:
    • Add a switch between the ByteBlower server and the PC
    • Enable 'Legacy Switch Support Mode' on your PC's ethernet adapter.

ByteBlower process doesn’t start 

Symptoms 

  • The ByteBlower server is reachable. You can ping the system. You are able to login 
  • The ByteBlower GUI or API can’t connect to the ByteBlower server 
    • The API throws a ServerUnreachable Exception 
    • The GUI can’t refresh the ByteBlower server, and can’t start tests on the system 

Troubleshooting 

  • Logged in on the ByteBlower server, run the command byteblower-get-license. 
    This should print out the available license information 
  • Update the ByteBlower server to the latest version. 
  • Run the byteblower-support-tool. This tool collects the server logs and forwards them to ByteBlower support. 
    • This can be started from the ByteBlower GUI from the Server View. 
      Right click on the server icon and choose the Run Support Tool option 
    • When the ByteBlower server has internet access, you can also start it from the ByteBlower server itself.

Diagnosis & treatment 

  • In case of a broken Security dongle, we’ll replace it and ship directly to you. 
  • Critical Software bugs are fixed with the highest priority. We’ll try to reproduce the error in our lab, but through the support-tool we can check crashes on your system. 

Not able to transmit traffic 

Symptoms 

  • In the ByteBlower GUI the interfaces are crossed out 
  • Fails to start tests 

Troubleshooting 

  • Refresh the ByteBlower server in the ByteBlower GUI 
  • Check whether the Link lights are lit at the ByteBlower server or on the ByteBlower switch. When easily possible do check both sides of the connection.  
  • Start a capture on the ByteBlower port. Do you receive any network traffic? 

Diagnosis & treatment 

When you can’t send traffic with the ByteBlower it’s important to diagnose first whether 

  • It’s a hardware issue.
    No lit link lights on the ByteBlower server are a very good indication.
  • or it’s a network issue elsewhere in the network. Is the modem online? Are other systems reachable?

ByteBlower trunking interface loses link

Symptoms

  • The GUI shows the trunking interface with a red cross, after a refresh the link is up, after another refresh it is down again
  • When sending traffic, a loss event is seen approximately every second

Troubleshooting

  • The link indicators on the ByteBlower server and/or the ByteBlower switch are flashing, even when there is no traffic

Diagnosis & treatment

  • Unplug the fibers from the (Q)SFP modules
  • Remove the (Q)SFP modules from the switch and the ByteBlower server
  • Re-insert the (Q)SFP modules in the switch and ByteBlower server
  • Plug the fibers into the (Q)SFP modules

ByteBlower trunking interface link speed negotiation fails

Symptoms

  • The ByteBlower trunking interface only reaches 40Gbps on a 100Gbps server

Troubleshooting

  • On the ByteBlower 100Gbps switch, the trunking interface is colored orange

Treatment

  • Reboot the 100Gbps switch

ByteBlower physical interface fails to initiate link when an SFP+ module is inserted

Symptoms

  • After inserting the SFP+ module and connecting the fiber, the link does not come up.

Troubleshooting

  • The ByteBlower server is a ByteBlower 4100
  • After a reboot the link comes up.

Diagnosis & Treatment

  • Due to a firmware issue in the 4100 NICs, the NIC fails to detect link when the server boots without SFP+ modules installed.
    Treatment: Always boot the ByteBlower 4100 server with the SFP+
    modules installed.

Known Issue: Interface Incompatibilities

Due to hardware restrictions, some of our traffic interfaces do not interoperate with some hardware (e.g. restrictions to SFP modules, restrictions to direct-attached-cables...).
In this article we list incompatibilities known to us.  If you encounter other equipment not working with ByteBlower, please inform us using our support portal.

ByteBlower 3100/3200

Due to our hardware vendor, we are restricted to use Intel SFP+ modules

ByteBlower 4100

  • Customers reported issues with Twinax DAC-cables.
  • We've noticed issues with 1Gbit/s SFP transceivers.
    • Copper SFPs (1Gbit/s) cause system crashes.
    • No traffic was possible over Fiber SFP (1Gbit/s)
  • 10Gbit/s NBASE-T SFP+ transceivers have limited connectivity. Check essential connectivity before using these with the ByteBlower 4100. For example:
    • No traffic was possible between the 'native' 10Gbit/s on the M4300
    • No issues were reported between 2 copper SFP+ of the same manufacturer

ByteBlower 5100

  • No known issues. Do note that at the time of writing (Oct 2020) there are a number of competing 100Gbit/s Ethernet standards.
    Don't hesitate to contact us with questions or to report your experience.

Netgear M4200

This switch is used as 8 port NBASE-T add-on. This unit is supplied with a custom configuration for the ByteBlower systems. This config is tested continuously.

  • Only use optical transceivers on the SFP+ ports.
    No functional copper SFP+ (NBASE-T) transceivers found yet.

Netgear M4300

This is also called the Flex switch due to its flexibility in configuration. Following limitations were found.

  • Copper 10Gbit/s SFP+ (NBASE-T) won't negotiate to lower speeds. Connect these transceivers only with other 10Gbit/s links.

Allied Telesis x950 28xsq

This switch is used for the 100Gbit/s ByteBlower 5100.

  • Copper 10Gbit/s SFP+ (NBASE-T) won't negotiate to lower speeds. Connect these transceivers only with other 10Gbit/s links.

Howto: Using the Support Tool

Introduction

Sometimes a ByteBlower Server doesn't behave as expected because of a bad configuration or a bug. When you encounter such problems, the ByteBlower Support Team is ready to help you out! But sometimes a lot of information about the system is required to find the cause of the problem. This is why each ByteBlower server includes a Support Tool. This tool does all the hard work of gathering the information we might need to sort out the problem for you.

All gathered information is then compressed and sent to the ByteBlower Support Team.

The new ByteBlower 5100 model (November 2020) has a different OS. This system has no support tool yet. Do contact us at support.byteblower@excentis.com when you experience issues with this system.

When to use the tool

When problems occur with a ByteBlower server, you can contact us at support.byteblower@excentis.com. If the cause of the problem isn't immediately clear, we will ask you to use the Support Tool.

Running the tool 

  • Open a console and log into your ByteBlower Server using the ssh-protocol.
    Default username: root
    Default password: excentis
ssh root@byteblower.example.com
    ____        __       ____  __                       
   / __ )__  __/ /____  / __ )/ /___ _      _____  _____
  / __  / / / / __/ _ \/ __  / / __ \ | /| / / _ \/ ___/
 / /_/ / /_/ / /_/  __/ /_/ / / /_/ / |/ |/ /  __/ /    
/_____/\__, /\__/\___/_____/_/\____/|__/|__/\___/_/     
      /____/                                by Excentis  
root@byteblower.example.com's password: 
  • Execute the byteblower-support-tool command
root@byteblower ~# byteblower-support-tool
Welcome to the ByteBlower support tool!

This tool will collect all relevant information
in order to process ByteBlower Server issues.

The collected information will be uploaded to the
ByteBlower support server.

Do you wish to continue (y/n)?
  • Press Y to start the gathering of the needed information
Getting ByteBlower username
Collecting core dumps (this may take a while)
Gathering ByteBlower logging
Gathering system information
Gathering network information
Gathering process information
Gathering ByteBlower information
Gathering Napatech support tool data
  napatech not installed
Creating support archive `support_archive.151113-174204.ByteBlower160672342.tar.bz2'
Uploading support archive `support_archive.151113-174204.ByteBlower160672342.tar.bz2'
Support diagnostics have been uploaded
Cleaning up...

What if upload failed

If your ByteBlower can't reach our servers (bbdl.excentis.com) due to lab-restrictions then the upload will fail. The support-information is stored on the disk of the ByteBlower at following location: /mnt/storage/reports/
You can copy the report-file ( support_archive.xxxxx-<date>.ByteBlower<serialnumber>.tar.bz2 ) to your laptop using scp/winscp tools. If the file isn't that big you can email it to us at support.byteblower@excentis.com.

If it's too big to email, just write a mail to us ( support.byteblower@excentis.com ) and we will provide you with a link where you can upload your support-report.

Gathered Information

The following information is collected:

  • Core dumps
    When a ByteBlower Server crashes, the state of the working memory of the processes is written to a file.  This file also contains memory management information, and other processor and operating system flags and information. These core dump files can then be used to assist in diagnosing and debugging errors.
  • ByteBlower logs
    While using the ByteBlower server, usage information and occurred errors are being logged to /var/log/ByteBlower.
  • System information
    A number of diagnostic tools is executed to gather information about present hardware : lsusb, lspci, lshw, uname, ethtool.
  • Network information
    Some specific commands are executed to get detailed information about the network configuration : ifconfig, route, the content of /etc/resolv.conf and /etc/conf.d/net.
  • Process information
    Information is gathered about currently running processes, and the resources they are using : the content of /proc/cpuinfo, ps auxw, the content of /proc/meminfo.
  • ByteBlower information
    ByteBlower specific configuration files : port_cfg.xml, byteblower_cfg.xml, username, and license information.
  • Napatech support tool data
    If a Napatech network card is present in the system, the Napatech support tool will be activated too.
  • ByteBlower Filter and Usage statistics
    Log of the applied filters to get insight in the usage of the ByteBlower system. You can opt-out using the ByteBlower-Configurator->ByteBlower Server configuration -> Preferences -> Collect Statistics .

Follow up

The gathered information can then be investigated by the ByteBlower Support Team.

This article is intended for debugging a ByteBlower 1300. Go through the steps below only when the ByteBlower debug log contains the following line, and preferably after contacting support.byteblower@excentis.com.

 > Stopped ByteBlower due to: No cores found on NUMA node 0


Unfortunately the NUMA configuration can't be checked over TeamViewer; it needs to be done on the premises. The steps below list how to get to these settings. Most will probably look familiar. Depending on the result we might need to RMA the system; contacting support will help you further.

Step 1. Enter the BIOS menu
Attach a keyboard and screen to the ByteBlower and reboot the system. You'll need to press DEL when the SUPERMICRO logo shows up.

Step 2. Navigate to the ACPI settings.
In the BIOS we're interested in the ACPI Configuration. This setting is found in the Advanced menu. As you can see below, the option is third to last.

Step 3. Verify NUMA Support.
In the ACPI Configuration you'll find the NUMA Support option. This option should be Enabled.

Press "Esc" to go back to the main menu

And then select Exit -> Save and Exit. Do this even if NUMA Support was already enabled: this step reprograms the BIOS.

Did you know you can ping a ByteBlower Port?  This requires no extra configuration on your end, the port will respond to Ping requests as soon as it has an IP address.

Pinging a ByteBlower Port is especially helpful when debugging connectivity issues. It allows you to check from where the ByteBlower Port is still reachable and on which link connectivity is lost.

In the examples below we use IPv4; this of course also works with IPv6.

ByteBlower GUI

A ByteBlower Port is reachable with Ping as soon as the port has a valid address in the Realtime View. This happens very early in the configuration phase, and the port remains reachable throughout the whole test run.

To increase the time for debugging, you can enable a pause between scenario configuration and test-run. Right before the test traffic starts, you'll receive the pop-up below.

When the issue is easily solved, you can still continue the test-run. From ByteBlower 2.11.4 on, the NAT entries will be kept alive until the test starts.


This pop-up is shown by default. To disable it, use the checkbox. It can later be enabled again from the Preferences.



Finally, to make debugging even easier, it tends to help to have a very minimal scenario: only enough to configure the ByteBlower Ports. To this end we suggest disabling NAT (Port View) and using only TCP flows.

ByteBlower API

Pinging works just the same for the ByteBlower API: a ByteBlower Port is pingable as soon as it has a proper IP address. As the example below shows, this is the default behavior and requires no extra configuration.

More examples can be found via https://api.byteblower.com/

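import byteblowerll.byteblower as byteblower

# Connect to the ByteBlower server and create a port on one of its interfaces.
api = byteblower.ByteBlower.InstanceGet()
bb_server = api.ServerAdd('10.8.254.111')
bb_port = bb_server.PortCreate('nontrunk-1')

# Layer 2: give the port a MAC address.
l2 = bb_port.Layer2EthIISet()
l2.MacSet('00-bb-00-11-22-33')

# Layer 3: obtain an IPv4 address through DHCP.
l3 = bb_port.Layer3IPv4Set()
dhcp = l3.ProtocolDhcpGet()
dhcp.Perform()

# From here on the port answers ping requests on this address.
print('ByteBlower Port is pingable on %s' % (l3.IpGet()))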

On new devices you might encounter something like the screenshot below: your new Wireless Endpoint is not supported yet.

What can you do with those devices?

With caution, almost everything; the disclaimer is only text.

The disclaimer means we didn't have time to verify the Wireless Endpoint for this new OS. We didn't verify the functionality and haven't yet verified the performance. Since most OS versions are backwards compatible, there's a good chance that everything works just fine.

Can I do something to fix the disclaimer?

Updating to the latest MeetingPoint and Wireless Endpoint helps.

Which devices are supported depends on the software running on the ByteBlower system and on the device itself. Your new phone might already be supported by a newly released update.

Should you notify us?

Yes please, especially when the disclaimer remains after updating. Sending a mail to support.byteblower@excentis.com or contacting your account manager is sufficient.

Reaching out helps us determine priorities. And truth be told, on occasion we might have missed the release.

What devices are supported?

Follow the link below

Latency is the time a packet takes to travel from source to destination. Hence, seeing negative latency values in the report is always a surprise: it's as if packets arrived even before they were sent! The figure below is one such example. This article explains how this is possible and what actions can be taken to prevent it.

Intro: Measuring latency with ByteBlower

This section explains how ByteBlower performs the latency measurement. This helps to understand the cause of negative latency values.

The picture below shows two FrameBlasting flows. The one at the top is a regular flow with packets going from PORT_1 to PORT_2, the one at the bottom is a latency flow.

Contrary to the regular flow, the ByteBlower server will modify traffic for the latency flow! Part of the payload content is replaced with a realtime timestamp. This value represents the moment the packet leaves the ByteBlower, so the current local time at the source ByteBlower port.

The receiving ByteBlower port only needs to inspect the packet and compare the timestamp in the packet to its current time. The difference (local time at the destination/receiver ByteBlower port minus the timestamp value in the packet) represents how long the packet was in transit, i.e. the latency of the packet.

The advantage of this approach is that the only communication between PORT_1 and PORT_2 is through the test traffic itself, no other protocols are needed. The ports don't have to be on the same interface, same server or even in the same lab.

The above approach is used both by ByteBlower servers and Wireless Endpoints. All information is available in the traffic itself. This makes it very flexible to measure the latency between ports docked to the same server, between a server and Wireless Endpoint or between different ByteBlower servers.

Since we rely on local clocks on the ports to generate and compare the timestamps, and since the timestamps are carried in the packet payload, there are two major reasons for problems with measuring latency:

  • The sending and receiving side measure the local time differently (clocks not synchronized)
  • The packets have been corrupted

We'll have a look at both problems in more detail.

Synchronized clocks

In the section above we've explained how the transmitting side adds a timestamp to the frames. The receiving end compares this value to its local time. We expect to measure a difference between the value in the frame and the time at the receiving end: this is how long the packet was under way. Hence the clocks need to be synchronized, otherwise we're just measuring the difference between the clocks rather than the packet transit time. A receiver port's clock that trails the sender port's clock is the major cause of negative latency values!
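The effect of a clock offset can be illustrated with a minimal sketch. The function name and the numbers are purely illustrative assumptions; they are not part of the ByteBlower API.

```python
def measured_latency_ns(tx_timestamp_ns, transit_ns, rx_clock_offset_ns):
    """Latency as the receiver would compute it.

    tx_timestamp_ns:    timestamp written into the packet (sender's clock)
    transit_ns:         true time the packet spent in the network
    rx_clock_offset_ns: receiver clock minus sender clock (illustrative value)
    """
    # Local time at the receiver when the packet arrives.
    rx_local_ns = tx_timestamp_ns + transit_ns + rx_clock_offset_ns
    # The receiver subtracts the timestamp carried in the packet.
    return rx_local_ns - tx_timestamp_ns


true_transit_ns = 200_000  # 200 microseconds in the network

in_sync = measured_latency_ns(1_000_000_000, true_transit_ns, 0)
trailing = measured_latency_ns(1_000_000_000, true_transit_ns, -500_000)

print(in_sync)   # 200000
print(trailing)  # -300000
```

With synchronized clocks the result equals the true transit time. When the receiver's clock trails by more than the transit time, the computed latency becomes negative.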

Within the same ByteBlower

Not a problem here, since source and destination port use the same clock!

If the setup allows you to, using a single ByteBlower server is the preferred way to measure latency!

ByteBlower to ByteBlower

As mentioned above, both ByteBlowers need to be time-synced. The article below offers more info on how to configure this:

https://support.excentis.com/index.php?/Knowledgebase/Article/View/15

Note that for latency measurements, it is important to keep the clocks of the different ByteBlower servers in sync (using NTP or PTP), since otherwise they can drift away from each other. If you only sync them once, the first measurement can give good latency results, but repeating that same test a couple of months later could provide wrong results.

Wireless Endpoint to ByteBlower

Measuring latency using a Wireless Endpoint is a challenge. Contrary to regular ByteBlower traffic, here it is not a ByteBlower server port that sends or receives traffic, but the Wireless Endpoint (e.g. your phone). Key question: which timestamp to use?

When registering with a Meeting Point, the Wireless Endpoint tries to synchronize its local time to the time of the Meeting Point (the latter taking the time of the ByteBlower server it is connected to). It is important to note that this time synchronization is not as good as two ByteBlower servers using the same NTP server! Whereas the ByteBlower servers can use a (typically stable) management network to synchronize, the Wireless Endpoint uses the same connection as the traffic itself.

Furthermore, there is no update of time synchronization during a test, so a Wireless Endpoint moving from one AP to another AP during a test could experience a severe time drift without its clock being updated.

Bottom line: latency measures for a Wireless Endpoint are provided as a best effort service. Be careful when interpreting these results!

Packet corruption

Since the timestamps are carried in the packet payload, there is another reason why latency measurements can go wrong: corrupted packets.

In case of a corrupted packet, the most likely outcomes are:

  1. packet is dropped due to CRC failure
  2. CRC is valid, but timestamp was altered, packet is counted as invalid (no latency) because the calculated time offset is too large (>1 minute)
  3. CRC is valid, but timestamp was altered, latency shows unexpected peaks (multiple seconds, but less than 1 minute)

Below is an example of such a latency measurement (simulated through an impairment node).

Next to the obvious spikes in latency, you can have a look at the packet loss measurements to know whether or not the negative latency was caused by packet corruption.

To solve these problems, you will need to solve the packet corruption itself (network or device).
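The three outcomes listed above can be sketched as a small classifier. This is only an illustration: the function name is hypothetical, and the 1 second spike threshold is an assumption chosen for the example. Only the 1 minute limit comes from the text.

```python
ONE_MINUTE_NS = 60 * 1_000_000_000


def classify_latency_sample(crc_ok, latency_ns, spike_threshold_ns=1_000_000_000):
    """Classify a received latency packet (illustrative sketch)."""
    if not crc_ok:
        # Case 1: the packet never reaches the latency calculation.
        return "dropped"
    if abs(latency_ns) > ONE_MINUTE_NS:
        # Case 2: offset too large, counted as invalid (no latency value).
        return "invalid"
    if abs(latency_ns) > spike_threshold_ns:
        # Case 3: plausible enough to be counted, shows up as a peak.
        return "spike"
    return "ok"


print(classify_latency_sample(False, 0))                # dropped
print(classify_latency_sample(True, 90 * 10**9))        # invalid
print(classify_latency_sample(True, 5 * 10**9))         # spike
print(classify_latency_sample(True, 200_000))           # ok
```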

Another case?

If your test matches none of the above cases, then it's a good idea to contact us at support.byteblower@excentis.com . We'll help you further from there.

You probably arrived at this page after clicking on an Info Item in a ByteBlower Report looking like this one:

This article explains what this means.

IMPORTANT : This issue has been fixed since the release of ByteBlower 2.9.  You will only see this message when using an older version.
Since v.2.9 the UDP checksum is set correctly.

What happened ?

The test scenario you just executed contains a flow, sending IPv4 UDP Frames with the Automatic Checksum enabled on the Layer 4 tab in the Frame View.

On top of that, the Latency Measurement option was enabled for this flow. To be able to measure the latency, a timestamp value is inserted at the end of each Frame. Of course, this makes the UDP checksum invalid.

So, after setting the timestamp, the UDP checksum should be re-calculated automatically. But on some ByteBlower servers, this is not possible. Due to limitations of the hardware used inside the ByteBlower server, it is not possible to send Frames with a timestamp and a correct UDP checksum.

The workaround we came up with, is to set the checksum value automatically to zero, which means that the checksum is disabled.

This way, the frames sent out by the ByteBlower server are perfectly valid, and will not be dropped by any device in your test setup.

For more technical details about this issue, check out this article: Known Issue: 100% packetloss on latency measurements 2x00/4x00 series

So, should I be worried ?

No, this is no reason to be worried.  In most cases, this will not affect the value of your report.

Some UDP checksum values have been set to zero, which means that they were disabled.

So only when you are testing devices that actually use this checksum value, you should be aware of the possible consequences caused by this behavior.

 

 

On the Link Layer, each Frame is surrounded by a number of extra bytes. The table below shows the complete Ethernet Frame.

 
Field                                     Size
Preamble                                  7 Bytes
Start of Frame Delimiter (SFD)            1 Byte
Frame (as displayed in the Frame View)    60 - 10000 Bytes
Frame Check Sequence (32 bit CRC)         4 Bytes
Interframe Gap (Pause)                    12 Bytes

For every Frame, an extra 24 Bytes need to be taken into account. This knowledge is crucial, for example when determining the maximum throughput on an Ethernet link.

For example, when sending Frames with a length of 60 Bytes, the actual amount of bytes sent is 84. This means that only 60/84 = 71.43% of the transmitted data consists of the bytes displayed in the Frame View. So, to achieve Line Rate on a 1 Gbps link, the Frame Rate is calculated as follows:

  • Frame Rate = 1 000 000 000 bps / ( ( 1B + 7B + 60B + 4B + 12B ) * 8b/B ) = 1 488 095.24 fps
  • Ethernet Bitrate (without overhead) = 1 488 095.24 fps * 60B * 8b/B = 714 285 714.29 bps = 714 Mbps
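The calculation above can be reproduced for any frame size with a few lines of Python. This is a minimal sketch; the function and constant names are illustrative and not part of any ByteBlower tool.

```python
# Per-frame overhead on the wire: preamble, SFD, FCS and interframe gap.
OVERHEAD_BYTES = 7 + 1 + 4 + 12


def line_rate(frame_bytes, link_bps=1_000_000_000):
    """Return (frame rate in fps, bitrate of the Frame View bytes in bps)."""
    wire_bits = (frame_bytes + OVERHEAD_BYTES) * 8
    frame_rate = link_bps / wire_bits
    frame_bitrate = frame_rate * frame_bytes * 8
    return frame_rate, frame_bitrate


fps, bps = line_rate(60)
print(f"{fps:.2f} fps, {bps:.2f} bps")  # 1488095.24 fps, 714285714.29 bps
```

Calling `line_rate` with a larger frame size shows the relative overhead shrinking as frames grow.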

 

Tip : When sending bigger Frames, the relative overhead becomes smaller.

In the Preferences of the ByteBlower GUI under Project>BitRate, you can specify what you want to be included in the Layer 2 Speed calculation. A screencapture of this view can be found at the bottom of this article.

  • Frame (as displayed in the Frame View)
  • Frame and FCS (includes the CRC, so each Frame gets 4 Bytes extra)
  • Frame, FCS, Preamble, SFD and Pause (each Frame gets 24 Bytes extra)

This affects the calculated rates displayed in the Frame Blasting Flow Templates, and in the Reports.

Configuring the reported bitrate

 

 

It rarely occurs that the ByteBlower process fails to start. One of the errors that can be encountered is:

Failed to allocate 1156 MB hostbuffers for numa node 1

This article explains what happens and how to get out of this state.

Memory allocation

Computers have RAM, and processes use this memory. When a computer has been running tasks for some time, large parts of the memory get allocated by processes and freed afterwards. The memory management part of the Linux kernel divides the memory into so-called pages. When a program requests some memory, one or more pages are reserved for that program. When the program terminates, the memory is freed (deallocated if you like) again.

A side effect of this is that memory can get fragmented. When a program requests a certain amount of memory (e.g. 100 megabytes), the kernel tries to find that number of contiguous pages; if these are not available, it can return fragments instead.

ByteBlower

Some parts of the ByteBlower process require large chunks of memory. That memory must not be fragmented for technical reasons, so if that allocation fails, this specific error is given.  

The bad news is: there is no way to force the kernel to rearrange all memory so that large chunks become available. Such functionality would require reassigning memory that a process may be using at that specific moment. So the only remedy is rebooting the server.

$ reboot

The ByteBlower GUI generates several types of reports. The HTML and PDF reports from the ByteBlower GUI are limited by the size of the test run. On this page you'll find a guideline on how large your tests can grow. The limitation comes from graphing the results over time.

For ByteBlower GUI v2.10 the limit of the graphical report is about 12 000 Graphing-Hr. This value is counted as follows:

  1. Count how long your test runs, in hours.
  2. Each type of graph in the report has a different cost. For each graph in the report, look up the multiplier in the table below.
    Graphing type     Multiplier   Lines
    FrameBlasting     1
    Latency           5            Minimum, Average, Maximum and Jitter twice
    TCP               5            Goodput, TCP throughput, round-trip time, transmit window, retransmissions
    Out of Sequence   1
  3. Multiply the total cost with the duration in hours. The result should be less than 12 000.

An example

For the example we use the configuration below. This scenario has 4 flows of 3 different types.

  • Duration of the test run:
    This test runs for 5 days. This can be read in the Duration column next to the name of the scenario.
  • Calculate the multiplier:
    This test run has a multiplier of 10 + 1 + 6 = 17
    • 2 TCP flows: 2 x 5 = 10
    • 1 FrameBlasting flow: 1 x 1 = 1
    • 1 FrameBlasting + Latency flow: 1 x (1 + 5) = 6
  • Multiply:
    17 x 5 days = 17 x 5 x 24 Hr = 2040 Graphing-Hr
    2040 Graphing-Hr < 12 000 Graphing-Hr
    This test run is ok.

In total this test is below the reporting limit. You can thus generate HTML reports out of it.
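The same bookkeeping can be done in a few lines of Python. This is a minimal sketch using the multipliers from the table above; the names are illustrative and not part of the ByteBlower GUI or API.

```python
# Cost per graph type, taken from the table above.
MULTIPLIER = {
    "FrameBlasting": 1,
    "Latency": 5,
    "TCP": 5,
    "Out of Sequence": 1,
}
LIMIT_GRAPHING_HR = 12_000


def graphing_hours(duration_hours, graphs):
    """Total cost of a test run: sum of the multipliers times the duration."""
    cost = sum(MULTIPLIER[graph] for graph in graphs)
    return cost * duration_hours


# 2 TCP flows, 1 FrameBlasting flow, 1 FrameBlasting + Latency flow.
graphs = ["TCP", "TCP", "FrameBlasting", "FrameBlasting", "Latency"]
total = graphing_hours(5 * 24, graphs)

print(total)                      # 2040
print(total < LIMIT_GRAPHING_HR)  # True
```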

Creating larger tests?

The above guideline applies only to the HTML and PDF reports. For larger test runs you have the following options:

  1. Split up the test-run into smaller scenarios.
  2. Only generate the CSV reports using the ByteBlower CLT.
  3. Use the ByteBlower API.

This article describes the behavior of the NATDiscovery in the ByteBlower GUI. The text intends to answer technical questions. For a step-by-step guide on how to use NATDiscovery we have an article in the examples section.

The NAT discovery is enabled in the port view. We'll assume that the reader is already somewhat familiar with how a NAT operates. The focus in this article is solely on using such ByteBlower ports for FrameBlasting.



Problem description

Any ByteBlower Port can be used as source or destination of a FrameBlasting flow. As we'll see further, this makes an important difference. For clarity we'll call traffic out of the NAT the upstream direction. The ByteBlower Port with the NAT config is then the source of the flow. The reverse direction, traffic streaming into the NAT, is called the downstream direction. In this second case, the NAT config is found on the destination. An example is found in the figure below.



In the figure above, both NAT_CPE_1 and NAT_CPE_2 are inside the LAN. They are configured with a valid IPv4 address and can reach each other using that address. Yet these addresses won't be known outside of their LAN. A router or modem provides the connection with the wider network. It will modify all traffic into and out of the LAN.

Which part of the packets changes? Only the addresses. This keeps the devices inside the LAN private. A very common situation is that all upstream traffic shares the same IPv4 source address. When needed, the Layer 4 port numbers (e.g. UDP) also change. This packet modification is called Network Address Translation or NAT. One such NAT mapping is made for each IPv4 address and UDP port number sending traffic upstream. The devices inside the NAT don't know themselves which values their packets will be translated to.

This upstream traffic is shown in the figure below. CPE1 and CPE2 both send traffic upstream (green arrows). The addresses of this traffic are translated. A node in the Wide Area Network (WAN) can no longer tell whether the lighter shade came from CPE1 or CPE2; only the router in the middle is able to.

Downstream traffic is trickier. The IPv4 addresses of the devices inside the LAN are kept secret, so you can't reach them with these. In fact, you can only reach the devices using an already existing NAT mapping. This mapping is only created by upstream traffic. In summary, if you want to send traffic downstream, the CPE first needs to contact you upstream.

This translation has an impact on your ByteBlower. By default the ByteBlower uses both source and destination addresses to recognize to whom the traffic belongs. After translation these values will have changed. The addresses thus need to be resolved; this is described in the Upstream Discovery section. In addition, for downstream traffic the NAT mapping needs to be initialized with upstream data first.

Upstream Discovery

As presented above, the addresses of packets from the CPE to the WAN are translated by the NAT. The upstream discovery determines the values they are being translated to. This discovery is done for FrameBlasting flows with a source that has the NAT config enabled. It's performed while setting up the test.

Determining the translation is straightforward; the ByteBlower GUI takes the steps below:
  1. Create a frame with an easy to recognize payload. Base the addresses on the data you wish to send.
  2. Send the frame through the NAT.
  3. At the destination, listen for the traffic from the previous step.
  4. Discover the translated addresses from the received traffic.
This upstream discovery is done for all addresses. When multiple flows use the same frames and the same ports, the results of the discovery are reused. Most NAT devices will retain this translation for at least 2 minutes.

Below we'll briefly describe the steps in this discovery. This will help troubleshooting potential issues.

Step 1: Frame creation

A new frame is created based on the original frame. We'll call this the NAT Discovery Frame. It has the same values for following fields:
  • MAC addresses
  • VLAN headers when applicable
  • IPv4 addresses taken from the ByteBlower Port config
  • OSI Layer 4 type. For most configurations this will be UDP.
  • Layer 4 port numbers.
The frame differs solely in the payload. This has been replaced with a small textual description and a unique token. It stays small: the frame is about 100 bytes including Ethernet overhead.

Step 2: Upstream traffic

The probing frame is sent out from the ByteBlower port with the NAT config. The frame rate is low: about 10 packets a second, or about 8 kbit/s. Traffic is generated for at most 20 seconds, but as we'll see next, most of the time the NATDiscovery finishes earlier.
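The quoted bitrate follows directly from the frame size and frame rate. A quick check, using the approximate values from the text (the variable names are illustrative):

```python
FRAME_BYTES = 100        # approximate size of the NAT Discovery Frame
FRAMES_PER_SECOND = 10   # approximate probing rate
MAX_DURATION_S = 20      # probing stops after at most 20 seconds

bitrate_bps = FRAME_BYTES * 8 * FRAMES_PER_SECOND
max_frames = FRAMES_PER_SECOND * MAX_DURATION_S

print(bitrate_bps)  # 8000
print(max_frames)   # 200
```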

Step 3 and 4: Receiving the frame

A RawBasicCapture captures all traffic. A BPF filter based on the IPv4 and Layer 4 (mostly UDP) destination addresses of the frame limits the number of captured packets. Each received packet is compared to the expected payload from step 1. This comparison is done eagerly: as soon as new frames arrive.

The source IPv4 address and the source Layer 4 port of the received frame are retained. We call these the public addresses. These values are used to count the traffic during the test-run.
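The upstream discovery described above can be mimicked with a toy simulation. This is only a sketch: `ToyNat`, `UdpFrame` and `discover_upstream` are hypothetical names, the NAT is a stand-in written for this example, and the real ByteBlower GUI works on actual network traffic rather than Python objects.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class UdpFrame:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


class ToyNat:
    """Stand-in for the router: rewrites the source of upstream frames."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.mappings = {}  # (private ip, private port) -> public port

    def upstream(self, frame):
        key = (frame.src_ip, frame.src_port)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return UdpFrame(self.public_ip, self.mappings[key],
                        frame.dst_ip, frame.dst_port, frame.payload)


def discover_upstream(nat, original):
    # Step 1: same addresses as the original frame, recognizable payload.
    token = uuid.uuid4().hex.encode()
    probe = UdpFrame(original.src_ip, original.src_port,
                     original.dst_ip, original.dst_port,
                     b"NAT discovery " + token)
    # Step 2: send the probe through the NAT.
    received = nat.upstream(probe)
    # Step 3: only accept frames carrying the expected payload.
    if received.payload != probe.payload:
        return None
    # Step 4: the source of the received frame is the public address.
    return received.src_ip, received.src_port


nat = ToyNat("203.0.113.7")
flow_frame = UdpFrame("192.168.0.10", 4096, "198.51.100.20", 4096, b"test data")
public = discover_upstream(nat, flow_frame)
print(public)  # ('203.0.113.7', 40000)
```

Running the discovery a second time for the same source address and port returns the same public values, which mirrors how the results are reused across flows.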

Downstream Discovery

In this section we'll explain the downstream discovery. This algorithm is used when the destination of the FrameBlasting flow is behind a NAT.

As we mentioned in the introduction, downstream traffic through a NAT requires first upstream data. These first packets create the NAT mapping. Only after this step, downstream traffic is possible. This is reflected in the steps for the downstream discovery:
  1. Use the upstream discovery to create the NAT mapping and determine its values
  2. Adapt the UDP frame to these new settings: the IPv4 and UDP destination addresses take the values learned in the upstream discovery

Step 1: Do upstream discovery

Upstream discovery is started from the destination ByteBlower port, this is the ByteBlower port inside the NAT. This can be confusing: even though this port is configured to receive the traffic of the flow, it will transmit during initialization.

The public addresses are used in the second step.

Step 2: Adapt the Frames

The configured ByteBlower frames are modified to the learned NAT mapping. The public IPv4 address and learned Layer 4 port are used as the destination of the frame.
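As a minimal illustration of this second step, the sketch below rewrites the destination of a configured frame to the public address. The `UdpFrame` type, the `adapt_downstream` helper and all addresses are hypothetical and only serve the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UdpFrame:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def adapt_downstream(frame, public_ip, public_port):
    """Point the frame at the NAT mapping instead of the private address."""
    return replace(frame, dst_ip=public_ip, dst_port=public_port)


# Frame as configured: addressed to the private address inside the LAN.
configured = UdpFrame("198.51.100.20", 4096, "192.168.0.10", 4096, b"test data")

# Public address as learned in step 1 (example values).
adapted = adapt_downstream(configured, "203.0.113.7", 40000)

print(adapted.dst_ip, adapted.dst_port)  # 203.0.113.7 40000
```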

Brief summary

Observation        A 100% packetloss when performing latency measurements
Affected version   ByteBlower server 2x00/4x00 ( version 1.10.18 and later )
Cause              Incorrect UDP checksum
Fix                Waiting for fix from hardware vendor
Workaround         IPv4: Set UDP checksum to 0x00 instead of automatic
                   IPv6: Set UDP length to UDPLength - 8

 

The issue

Connect 2 ByteBlower ports back-to-back. Configure a FrameBlasting flow on these ports and activate a latency measurement. The report at the end of the test will show a packet loss of 100%.

What went wrong

Let's re-run the test and take a capture of the flow. Opening the capture in Wireshark shows us the problem: the UDP checksums are incorrect. Why? In the ByteBlower server 2100 series it is the network card itself that injects the timestamp to perform the latency measurement. This injection by the card allows us to have a 10ns precision. But the card calculates the UDP checksum BEFORE it injects the timestamp. This results in an incorrect checksum. Your device-under-test (DUT) may drop those packets because of the incorrect checksum.
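The effect can be reproduced with a small sketch that computes the standard IPv4 UDP checksum before and after the last 8 bytes of the payload are overwritten. The addresses, ports, payload and the encoding of the timestamp are assumptions made for this example; the actual timetag format is not shown here.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit ones' complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """UDP checksum over the IPv4 pseudo-header, UDP header and payload."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF  # 0 means "no checksum" in IPv4


payload = bytearray(32)  # placeholder payload; last 8 bytes hold the timetag

# Checksum calculated first ...
before = udp_checksum("10.0.0.1", "10.0.0.2", 4096, 4096, bytes(payload))

# ... then the timestamp is injected (illustrative value and encoding).
payload[-8:] = struct.pack("!Q", 123_456_789)
after = udp_checksum("10.0.0.1", "10.0.0.2", 4096, 4096, bytes(payload))

print(before != after)  # True: the checksum in the frame no longer matches
```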

Fix

The vendor of the network card has already confirmed the issue and is working on a fix. Once this fix is released to us, we will apply it to our ByteBlower software and release an update. We do have a workaround for this issue.

Workaround

IPv4 testing

In IPv4 the UDP checksum is optional. By setting it to 0 you disable the checksum and thereby also the checks on the receiving side, so your DUT won't drop the packets. The screenshot below shows where you can set the UDP checksum to 0: go to the Frame View and change the checksum under the "Layer 4" tab.

IPv4Workaround

IPv6 testing

In IPv6 the UDP checksum is mandatory, so you can't use the same trick as on IPv4. Here we have to change the UDP length instead. As explained in the following article ( Background: Adding FrameTags to your ByteBlower frame - structure and behaviour ), the timetag is normally added at the end of the UDP packet. If we subtract the length of a timetag (8 bytes) from the UDP length, then the timetag no longer influences the UDP checksum, and the checksum therefore remains correct when the network card adds the timetag. The screenshot below shows where and how you can change the UDP length value: go to the Frame View and, under the "Layer 4" tab, change the Total Length. In this example the frame is 64 Bytes. The UDP length is thus 0x0A, and subtracting 8 gives a UDP length of 0x02.

 

IPv6Workaround

BUT: Since the timetag is not covered by the UDP length, some DUTs may strip those last 8 bytes and thereby remove the timetag.

The ByteBlower Configurator is a minimal GUI allowing users to easily configure the ByteBlower server. This minimal GUI can even be run over SSH.
In most cases, nothing needs to be done, but Mac OS X users can encounter an issue where the GUI does not start.

The reason for this issue is the TERM environment variable. The ssh client on Mac OS X defines this variable as xterm-256color. The current ByteBlower OS does not recognize this value and won't start.

To fix this, the following commands will do the trick:

export TERM=xterm; byteblower-configurator

This will start the ByteBlower configurator in a normal way.

You probably arrived at this page after clicking on an Info Item in a ByteBlower Report looking like this one:

This article explains what this means.

What happened ?

The test scenario you just executed contains a flow, sending IPv6 UDP Frames with the Automatic Checksum enabled on the Layer 4 tab in the Frame View.

On top of that, the Latency Measurement option was enabled for this flow. To be able to measure the latency, a timestamp value is inserted at the end of each Frame. Of course, this makes the UDP checksum invalid.

So, after setting the timestamp, the UDP checksum should be re-calculated automatically. But on some ByteBlower servers, this is not possible. Due to limitations of the hardware used inside the ByteBlower server, it is not possible to send Frames with a timestamp and a correct UDP checksum.

Because the UDP checksum is required in IPv6 Frames, we cannot apply the same workaround as for IPv4 frames (automatically setting the checksum value to zero, which disables the checksum).

As a result, the IPv6 frames sent out by the ByteBlower server will have an invalid UDP checksum. Some devices in your test setup may drop these frames.

For more technical details about this issue, check out this article: Known Issue: 100% packetloss on latency measurements 2x00/4x00 series

So, should I be worried ?

This issue may cause 100% packet loss because devices in your test setup may drop the frames with an incorrect IPv6 UDP checksum.

Otherwise, the results displayed in the report are perfectly valid.

If you run into trouble with the ByteBlower GUI you can contact our support ( support.byteblower@excentis.com ). It is best to add as much info as possible so we can directly assess what the problem is and help you further. The relevant info is:

  • ByteBlower project
  • HTML-Report ( if you have one )
  • GUI logs

The last one can be found through the following menu:

Clicking on SystemInfo will open a text editor with the full log. Just add this file to your support request and send it to us.

When creating a packet capture (.pcap file) using the ByteBlower GUI, then it's possible that some packets will have an incorrect checksum in the capture. Wireshark typically marks those in red with the hint: "maybe caused by UDP checksum offload?".