Pinpointing packet loss in time with the ByteBlower GUI
Posted by Vincent De Maertelaere, Last modified by Vincent De Maertelaere on 21 April 2021 09:33 AM
Pinpointing packet loss in time
When packet loss happens in a flow, it can be useful to know when it happens. Typically some questions pop up:
Approach 1: Use the ByteBlower GUI reporting
When loss is significant, the ByteBlower GUI shows it in its report: the "results over time" graph will show a dip.
Zooming in on the graph narrows down the moment at which the dip occurs.
Approach 2: Shorten the flow
Sometimes traffic loss occurs only at the beginning of the flow.
A typical symptom is that a 10 second test shows, for example, 30% traffic loss, while the loss percentage decreases as the test is lengthened (e.g. to half an hour or a day).
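The dilution effect can be illustrated with a little arithmetic. The sketch below assumes a constant frame rate and a single burst of loss at the start of the flow. The function name and the figures are illustrative only, not ByteBlower output.

```python
def loss_percentage(lost_seconds: float, duration_seconds: float) -> float:
    """Loss percentage for a flow sent at a constant frame rate.

    lost_seconds: amount of traffic lost, expressed in seconds of traffic
                  (assumption: all of it is lost at the start of the flow).
    duration_seconds: total duration of the flow.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return 100.0 * lost_seconds / duration_seconds


# Assumption: the first 3 seconds of traffic are lost, whatever the duration.
for label, duration in [("10 seconds", 10), ("half an hour", 1800), ("a day", 86400)]:
    print(f"{label:>12}: {loss_percentage(3, duration):.4f}% loss")
```

The same absolute amount of loss shrinks as a percentage when the flow runs longer, which is why a short flow makes loss at the start easier to spot.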
So when the flow is shortened the loss percentage can do 3 things:
Approach 3: Divide the long flow into multiple shorter flows
This approach combines the two approaches above. When a long flow is divided into multiple shorter flows, it becomes easier to pinpoint individual frame-loss events. An example:
Here, a 10 second flow is split up into 20 flows. Each flow lasts 500 ms, and the flows start at 500 ms intervals.
The tabular data in the report will show which short flow has loss. It is then easier to pinpoint the actual moment in time and investigate further.
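The split described above can be worked out as follows. This is a minimal sketch: `split_flow` is a hypothetical helper that only computes the time windows, and it does not configure anything in the ByteBlower GUI.

```python
def split_flow(total_ms: int, count: int) -> list[tuple[int, int]]:
    """Split one long flow into `count` back-to-back windows.

    Returns (start_ms, end_ms) pairs, relative to the start of the scenario.
    """
    if count <= 0 or total_ms % count != 0:
        raise ValueError("total duration must divide evenly over the flows")
    length = total_ms // count
    return [(i * length, (i + 1) * length) for i in range(count)]


# A 10 second flow split into 20 flows of 500 ms each.
windows = split_flow(10_000, 20)

# If the report shows loss in the 8th short flow (index 7),
# the loss happened somewhere in this window:
start_ms, end_ms = windows[7]
print(f"loss between {start_ms} ms and {end_ms} ms")
```

Mapping the lossy flow back to its window gives the time range to inspect more closely.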
Approach 4: Using Out of Sequence detection
The ByteBlower has a neat feature one can use to debug loss over time: Out of Sequence detection. Whilst this feature was implemented to detect bad reordering of packets after queuing them, it can be used to investigate frame loss.
Out of Sequence (OoS) detection inserts an incrementing frame number into the payload. This enables ByteBlower to detect frames that arrive out of order.
E.g. the frame with ID 5 should arrive after frame 4 and before frame 7, but it arrives after frame 7.
What ByteBlower doesn't track (yet) is which frames were lost. In the example above, frame 6 was lost.
Luckily, the ByteBlower server provides another way to do this ourselves: Capturing on the interfaces.
In the screenshot above, the frame number (identifier) is marked. This is an 8-byte field at the end of the payload. The first 2 bytes (not marked, 0xFFe9) are used for checksum correction; the last 6 bytes are the frame identifier (0x000000000016).
Once the time frame of the loss event has been narrowed down to a reasonable window with the first 3 approaches, the exact moments can be found by noting which frame numbers are missing from the capture.
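Looking for missing frame numbers by hand gets tedious for larger captures, so it can be scripted. The sketch below assumes the payloads have already been extracted from the capture as raw bytes, and that the 6-byte identifier is stored in network (big-endian) byte order. Both function names are hypothetical helpers, not part of any ByteBlower API.

```python
def frame_id(payload: bytes) -> int:
    """Read the frame identifier from the 8-byte field at the end of the payload."""
    if len(payload) < 8:
        raise ValueError("payload is shorter than the 8-byte trailer")
    trailer = payload[-8:]
    # trailer[0:2] is the checksum-correction field and is skipped;
    # trailer[2:8] is the frame identifier (byte order is an assumption).
    return int.from_bytes(trailer[2:], byteorder="big")


def missing_ids(received_ids: list[int]) -> list[int]:
    """List the identifiers absent between the lowest and highest ID seen."""
    seen = set(received_ids)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


# Trailer from the example in the text: 0xFFe9 followed by 0x000000000016.
example_payload = bytes(10) + bytes.fromhex("ffe9000000000016")
print(frame_id(example_payload))   # 22 (0x16)

# Arrival order from the out-of-sequence example: 4, 7, 5. Frame 6 never arrived.
print(missing_ids([4, 7, 5]))      # [6]
```

Frames lost before the first or after the last captured frame cannot be detected this way, so the capture should start before and end after the suspected loss window.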