Interesting Multipath TCP talks

Various tutorials and training sessions on Multipath TCP have been given over the last few years. Some of them have been recorded and are available on youtube.com.

The most recent video presentation is the talk given by Octavian Purdila from Intel at the netdev 0.1 conference. In this talk, Octavian starts with a brief tutorial on Multipath TCP targeted at Linux networking kernel developers and then describes in detail the structure of the current code and the plans for upstreaming it to the official Linux kernel.

A longer tutorial on the Multipath TCP protocol was given by Olivier Bonaventure at IETF’87 in Berlin in August 2013.

Christoph Paasch gave a shorter Multipath TCP tutorial earlier, during FOSDEM’13 in Brussels.

Earlier, Costin Raiciu and Christoph Paasch gave a one-hour Google Research talk on the design of the protocol and several use cases.


The Google Research talk was given a few days after the presentation of the USENIX NSDI’12 paper that received the community award. This presentation is available from the USENIX website.

mptcptrace demo, experiment five

This is the fifth post of a series of five. Context is presented in the first post. The second post is here. The third post is here. The fourth post is here.

Fifth experiment

Green at 0s:
  • delay: 10ms
  • bandwidth: 4mbit/s
  • loss: 0%
Green at 5s:
  • delay: 100ms
  • bandwidth: 4mbit/s
  • loss: 10%
Green at 15s:
  • delay: 10ms
  • bandwidth: 4mbit/s
  • loss: 0%
Red:
  • delay: 40ms
  • bandwidth: 4mbit/s
Client:
  • Scheduler: round-robin

For the last experiment of this series, we come back to the setup of the third experiment: instead of adding a 1% loss rate on the red subflow after 15 seconds (as in the fourth experiment), we change the MPTCP scheduler and use the round-robin scheduler. It is worth noting that the round-robin scheduler still respects the congestion window of each subflow.
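As a rough sketch of how such a scenario can be set up (the interface name eth0 and the exact tc/sysctl invocations are assumptions, not the commands actually used for these experiments), the round-robin scheduler and the time-varying green path could be configured on a Linux host running the out-of-tree MPTCP kernel as follows:

    # Select the round-robin scheduler (assumes the out-of-tree MPTCP
    # kernel with the mptcp_rr scheduler module available).
    modprobe mptcp_rr
    sysctl -w net.mptcp.mptcp_scheduler=roundrobin

    # Emulate the green path with netem on the (assumed) interface eth0:
    # 10ms delay, 4mbit/s, no loss at t=0s.
    tc qdisc add dev eth0 root netem delay 10ms rate 4mbit

    # At t=5s, degrade the path to 100ms delay and 10% loss.
    sleep 5
    tc qdisc change dev eth0 root netem delay 100ms rate 4mbit loss 10%

    # At t=15s, restore the initial conditions.
    sleep 10
    tc qdisc change dev eth0 root netem delay 10ms rate 4mbit loss 0%

Older kernels whose netem does not support the rate option would need a separate tbf or htb qdisc for the bandwidth limit.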

Let’s see the evolution of the sequence number:

../../../_images/sequence4.png

The first thing that we can see is the small steps between 5 and 15 seconds. We also get the impression that the red subflow is used more, but if we zoom in:

../../../_images/sequence_zoom4.png
../../../_images/sequence_zoom_21.png

we can confirm that both subflows are used. We see three lines:

  1. The segments: because they are sent together at the sender, the red and green subflows do not form two separate lines.
  2. The green acks: we can see that they are closer to the segment line.
  3. The red acks: all red acks are late from the MPTCP point of view. This is expected since the green delay is shorter.

If we look at the evolution of the sequence number between 5 and 15 seconds, we can observe a series of stairs.

../../../_images/sequence_zoom_3.png

If we take a closer look at one of these stairs:

../../../_images/sequence_zoom_4.png

Because the green subflow is lossy during this period, we have reordering. Since we use the round-robin scheduler, MPTCP still decides to send some data over the green path.

If we now take a look at the evolution of the goodput:

../../../_images/gput4.png

We can see the perturbation caused by the lossy link on the “instantaneous” goodput.

However, the impact on the average goodput is somewhat mitigated. Depending on the application, these variations may or may not be problematic.

../../../_images/flight2.png

If we take a look at the evolution of the MPTCP unacked data, we see a lot of variation during the period from 5 to 15 seconds. This is due to the reordering that happens during this period. This is not a big issue as long as the receive window is big enough to absorb these variations; in some scenarios it may become an issue if the window is too small. We may also note that, due to receive buffer auto-tuning, MPTCP may use more memory on the receiver in this case.

Finally, we can take a look at the evolution of the unacked data at the TCP level.

../../../_images/flightpf2.png

We can observe that both subflows are used during the whole connection, but the losses on the green subflow between 5 and 15 seconds lead to a heavier use of the red subflow during this period.

Conclusion

This ends the series of posts showing some basic MPTCP experiments. We have more experiments and visualizations that we will present later. mptcptrace was used to extract the values from the traces and R scripts were used to produce the graphs; however, we did not really post-process the data in R.
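As a rough sketch of this workflow (the interface name and the mptcptrace options are assumptions; check the mptcptrace documentation for the exact flags), the traces could be captured and turned into graphs along these lines:

    # Capture the client's traffic for the duration of the transfer.
    tcpdump -i any -w client.pcap &

    # ... run the experiment, then stop the capture ...

    # Process the trace with mptcptrace; -f is assumed to select the input
    # capture and -s to produce the MPTCP sequence-number graphs, which can
    # then be plotted (here, with the R scripts mentioned above).
    mptcptrace -f client.pcap -s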

Why are there sometimes long delays before the establishment of MPTCP connections?

Multipath TCP users sometimes complain that Multipath TCP connections are established after a longer delay than regular TCP connections. This can happen in some networks, and the culprit is usually a middlebox hidden on the path between the client and the server. This problem can easily be detected by capturing the packets on the client with tcpdump.
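A minimal capture command could look like the following (the interface and the port are assumptions to adapt to the actual setup):

    # Capture the TCP traffic exchanged with the server on any interface.
    tcpdump -i any "tcp port 80 and host multipath-tcp.org"

On a network affected by this problem, such a capture looks like: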

11:24:05.225096 IP client.59499 > multipath-tcp.org.http:
      Flags [S], seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7358805 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

The client sends a SYN with the MP_CAPABLE option... Since it receives no answer, it retransmits the SYN.

11:24:06.224215 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7359055 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

And, unfortunately, two more times...

11:24:08.228242 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7359556 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

11:24:12.236284 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7360558 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

At this point, Multipath TCP considers that a middlebox on the path to the server could be discarding SYN segments with the MP_CAPABLE option, and it disables Multipath TCP for this connection.

11:24:20.244351 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7362560 ecr 0,nop,wscale 7], length 0

This segment immediately reaches the server, which replies:

11:24:20.396718 IP multipath-tcp.org.http > client.59499: Flags [S.],
      seq 3954135908, ack 270592033, win 28960, options [mss 1380,sackOK,
      TS val 2522075773 ecr 7362560,nop,wscale 7], length 0
11:24:20.396748 IP client.59499 > multipath-tcp.org.http: Flags [.],
      ack 1, win 229, options [nop,nop,TS val 7362598 ecr 2522075773], length 0

As shown by the trace, by dropping the SYN segments containing the MP_CAPABLE option, the middlebox has delayed the establishment of the TCP connection by fifteen seconds. This delay is controlled by the initial retransmission timer (one second in this example) and the exponential backoff applied by TCP to successive retransmissions of the same segment: the SYN is retransmitted after 1, 2, 4 and 8 seconds, so the fifth SYN, sent without the MP_CAPABLE option, leaves the client only about fifteen seconds after the first one.

What can Multipath TCP users do to reduce this delay?

  • the best answer is to contact their system/network administrators, use a tool like tracebox to detect where packets with the MP_CAPABLE option are dropped, and get this middlebox fixed or upgraded
  • if changing the network is not possible, the Linux kernel implementation of Multipath TCP can be configured to fall back to regular TCP more aggressively through the net.mptcp.mptcp_syn_retries configuration variable described at http://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP. This variable controls the number of retransmissions of the initial SYN before the MP_CAPABLE option stops being used (the default is 3); see the sketch below
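As a sketch (the value 1 is just an example of a more aggressive setting), the variable can be lowered at runtime and made persistent:

    # Retransmit the MP_CAPABLE SYN only once before falling back to
    # regular TCP.
    sysctl -w net.mptcp.mptcp_syn_retries=1

    # Make the setting persistent across reboots (assuming /etc/sysctl.conf
    # is read at boot on this system).
    echo "net.mptcp.mptcp_syn_retries=1" >> /etc/sysctl.conf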

mptcptrace demo, experiment four

This is the fourth post of a series of five. Context is presented in the first post. The second post is here. The third post is here.

Fourth experiment

Green at 0s:
  • delay: 10ms
  • bandwidth: 4mbit/s
  • loss: 0%
Green at 5s:
  • delay: 100ms
  • bandwidth: 4mbit/s
  • loss: 10%
Green at 15s:
  • delay: 10ms
  • bandwidth: 4mbit/s
  • loss: 0%
Red at 0s:
  • delay: 40ms
  • bandwidth: 4mbit/s
Red at 15s:
  • delay: 40ms
  • bandwidth: 4mbit/s
  • loss: 1%
Client:
  • Scheduler: default

In this fourth experiment, we change the loss rate of the red path to 1% after 15 seconds.
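As a rough sketch (the interface name eth1 is an assumption, and older kernels whose netem lacks the rate option would need a tbf or htb qdisc for the bandwidth limit), the red path change could be emulated with netem like this:

    # Red path at t=0s: 40ms delay, 4mbit/s, no loss.
    tc qdisc add dev eth1 root netem delay 40ms rate 4mbit

    # At t=15s, add a 1% loss rate while keeping the delay and bandwidth.
    sleep 15
    tc qdisc change dev eth1 root netem delay 40ms rate 4mbit loss 1%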

Again we take a look at the evolution of the sequence number.

../../../_images/sequence3.png

In this case, however, we can see the shift after 15 seconds. Because the red link becomes lossy after 15 seconds, the congestion window of the red subflow shrinks, and MPTCP needs to send data on the green subflow again if the congestion window of the red subflow becomes too small to sustain the application rate. Because MPTCP now sends a little bit of traffic on the green subflow, it notices that the green subflow has changed and now has a lower delay and a lower loss rate. As a consequence, the green subflow will open its congestion window again and will have a congestion window large enough to sustain the application rate.

../../../_images/sequence_zoom3.png
../../../_images/sequence_zoom_2.png

If we take a look at the evolution of the goodput, we see the two shifts, at 5s and at 15s.

../../../_images/gput3.png
../../../_images/gput_zoom2.png
../../../_images/gput_zoom_2.png

The evolution of the MPTCP unacked data is shown below:

../../../_images/flight1.png

We can see the change after 15 seconds: less of the receive window is used after that point. In this case, because the red link is lossy, we consume less of the receive window. In our case the receive window is big enough anyway, but we would get different results if the window were smaller. We could reduce the window size by reducing the rmem, but this is left for another experiment.
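As a rough sketch for such a follow-up experiment (the values are arbitrary examples, and we assume that the out-of-tree MPTCP implementation sizes its receive buffer with the regular TCP sysctls), the maximum receive window could be capped as follows:

    # Cap the receive buffer (min, default, max in bytes); the max bounds
    # the window that auto-tuning can reach.
    sysctl -w net.ipv4.tcp_rmem="4096 87380 262144"

    # Optionally disable receive buffer auto-tuning for the experiment.
    sysctl -w net.ipv4.tcp_moderate_rcvbuf=0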

Again, we can look at the evolution of the unacked data at the TCP level.

../../../_images/flightpf1.png

Again, we observe the shift after 15 seconds.