Pinpointing packet loss in time

Introduction

When packet loss happens in a flow, it can be useful to know when it happens.  Typically, some questions pop up:

  • Was this packet loss a single event?
  • How long did the loss event last?
  • When did the loss occur?  Was it at the beginning of the frame-blasting flow?  Was it nearing the end?

This article shows several approaches to find where the packet loss occurred.

Approach 1: Use the ByteBlower GUI reporting

When loss is significant, the ByteBlower GUI shows it in its report: the "results over time" graph shows a dip.

Zooming in on that graph gives a more precise view of when the dip occurred.


Approach 2: Shorten the test

Sometimes traffic loss occurs only at the beginning of the flow. 

A typical symptom: a 10 second test shows e.g. 30% traffic loss, but the longer the test runs (e.g. half an hour, a day), the lower the loss percentage becomes.

So, when the flow is shortened, the loss percentage can do one of three things:

  • The loss percentage decreases.
    This means there are multiple loss events in the original scenario, some of which fall outside the shortened test.  The shortened scenario still contains loss events, however.
  • The loss percentage stays about the same.
    The loss is continuous over time.
  • The loss percentage increases.
    This probably means that the main loss-event is at the start of the test.
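
The last case can be checked with some simple arithmetic.  The sketch below is only an illustration: the frame rate and the number of lost frames are hypothetical values, chosen so that a 10 second test shows the 30% loss mentioned above.  It assumes a single loss event at the start of the flow and no loss afterwards.

```python
def loss_percentage(frames_lost: int, frame_rate: float, duration_s: float) -> float:
    """Loss percentage of a flow that loses a fixed number of frames."""
    frames_sent = frame_rate * duration_s
    return 100.0 * frames_lost / frames_sent


# Hypothetical scenario: 1000 frames/s, 3000 frames lost in one start-up event.
FRAME_RATE = 1000
FRAMES_LOST_AT_START = 3000

# 10 seconds, half an hour, a day
for duration_s in (10, 1800, 86400):
    pct = loss_percentage(FRAMES_LOST_AT_START, FRAME_RATE, duration_s)
    print(f"{duration_s:>6} s test: {pct:.4f}% loss")
```

The absolute number of lost frames stays the same, so the percentage shrinks as the test gets longer and grows as it gets shorter.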


Approach 3: Divide the long flow into multiple shorter flows

This approach is a combination of the two approaches above.  When a long flow is divided into multiple (shorter) flows, it should be easier to pinpoint some frame-loss events.  An example:

Here, a 10 second flow is split up into 20 flows.  Each flow lasts 500 ms, and the flows start at 500 ms intervals.

The tabular data in the report will show which short flow has a loss.  It is then easier to pinpoint the actual moment in time to investigate further.
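
Mapping a short flow back to a moment in the original test is straightforward.  The following sketch computes the time window of each short flow for the example above.  The helper `split_flow` and the index of the lossy flow are hypothetical; they are not part of ByteBlower.

```python
def split_flow(total_ms: int, parts: int) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) windows for equally sized, back-to-back flows."""
    if parts <= 0 or total_ms % parts != 0:
        raise ValueError("total duration must divide evenly into the number of flows")
    length_ms = total_ms // parts
    return [(i * length_ms, (i + 1) * length_ms) for i in range(parts)]


# The example from the text: a 10 second flow split into 20 flows of 500 ms.
windows = split_flow(10_000, 20)

# Hypothetical report result: the 7th short flow (index 6) shows loss.
lossy_index = 6
start_ms, end_ms = windows[lossy_index]
print(f"Loss occurred between {start_ms} ms and {end_ms} ms")
```

Smaller windows give a finer resolution, at the cost of more flows to configure and more rows to read in the report.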


Approach 4: Using Out of Sequence detection

The ByteBlower has a neat feature one can use to debug loss over time: Out of Sequence detection.  Whilst this feature was implemented to detect bad reordering of packets after queuing them, it can be used to investigate frame loss.

Out of Sequence (OoS) detection inserts an incrementing frame number into the payload.  This enables ByteBlower to detect frames that arrive out of order.

E.g. the frame with ID 5 should arrive after frame 4 and before frame 7.  Instead, it arrives after frame 7.
This triggers the ByteBlower server to mark frame 5 as out of sequence.

What ByteBlower doesn't track (yet) are the frames that are lost.  In the example above, frame 6 was lost, but this is not reported.

Luckily, the ByteBlower server provides another way to find lost frames: capturing the traffic on the interfaces.

A capture can be created using the ByteBlower GUI.  When this capture is opened in a packet analyzer (e.g. Wireshark), it is possible to extract the frame number out of the payload.

In the screenshot above, the frame number (identifier) is marked.  It is part of an 8-byte field at the end of the payload.  The first 2 bytes (not marked, 0xFFE9) are used for checksum correction; the last 6 bytes are the frame identifier (0x000000000016).
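
The layout of this field can be decoded with a few lines of code.  This is a minimal sketch, not ByteBlower code: the function name is hypothetical, and it assumes the identifier is stored in big-endian (network) byte order, which is an assumption and not something the capture confirms.

```python
def parse_oos_trailer(payload: bytes) -> tuple[bytes, int]:
    """Split the last 8 payload bytes into checksum correction and frame identifier."""
    if len(payload) < 8:
        raise ValueError("payload is too short to contain the 8-byte field")
    trailer = payload[-8:]
    checksum_correction = trailer[:2]  # first 2 bytes
    # Assumption: the 6-byte identifier is big-endian.
    frame_id = int.from_bytes(trailer[2:], byteorder="big")
    return checksum_correction, frame_id


# The field from the example above, preceded by arbitrary padding bytes.
payload = b"\x00" * 52 + bytes.fromhex("ffe9000000000016")
correction, frame_id = parse_oos_trailer(payload)
print(correction.hex(), frame_id)  # ffe9 22
```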

Once the first 3 approaches have narrowed the loss event down to a reasonable time window, the exact moments can be found by noting which frame numbers are missing from the capture.
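
Finding the missing numbers by hand is tedious for large captures.  As a minimal sketch, assuming the frame identifiers have already been extracted from the capture into a list (the function name and the input list are hypothetical), the gaps can be found like this:

```python
def missing_frame_ids(received_ids: list[int]) -> list[int]:
    """Return the identifiers absent between the lowest and highest one received."""
    seen = set(received_ids)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


# The example from this article: frame 5 arrives after frame 7, frame 6 is lost.
received = [3, 4, 7, 5, 8]
print(missing_frame_ids(received))  # [6]
```

Note that this only finds gaps inside the captured range.  Frames lost before the first or after the last captured frame cannot be detected this way.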