Can Multipath TCP cope with middleboxes?

As explained in a previous blog post, Multipath TCP had to cope with a variety of middleboxes which could interfere with this TCP extension.

Shortly after we detected the first cases of interference between a firewall and Multipath TCP, Honda et al. presented a detailed analysis [HNR+11], based on Internet measurements, of the limits of TCP's extensibility. To correctly understand the problems caused by middleboxes, we first need to remember that they can operate in any layer of the protocol stack, as illustrated in the figure below.


When a router forwards an IPv4 packet that contains a TCP segment, it may modify some fields of the IPv4 header but never changes any field of the TCP header. This is one of the foundations of the layering principle.


Middleboxes are different. Since they can operate in any layer of the protocol stack, they can change any field of the packet headers, in any layer. Some of them also modify packet payloads.


The main difficulty in such a network environment is that the TCP state on the client and on the server is updated based on information carried inside packets. When this information changes after one of the communicating hosts has transmitted the packet, strange problems can arise. Several functions of Multipath TCP were designed to cope with middlebox interference. Here are a few examples:

  • During the three-way handshake, the client sends the MP_CAPABLE option in the third ACK to cope with a middlebox that could remove it from the SYN+ACK
  • The ADD_ADDR, REMOVE_ADDR and MP_JOIN options contain an address identifier to cope with Network Address Translation
  • The DSS option uses relative sequence numbers to cope with middleboxes that randomize the initial TCP sequence number
  • The DSS option maps a block of data from the bytestream onto the TCP subflow. Its length field allows Multipath TCP to cope with middleboxes (or fast NICs) that segment or reassemble packets
  • The DSS option contains a checksum to cope with middleboxes that add or remove bytes in the payload
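To see why relative subflow sequence numbers survive a middlebox that rewrites the initial sequence number (ISN), consider the following model of a DSS mapping. It is a minimal sketch under stated assumptions: the `DssMapping` class and the `to_data_seq` function are hypothetical names, and the model ignores the DSS wire format, 64-bit data sequence numbers and the checksum.

```python
from dataclasses import dataclass

MOD32 = 2 ** 32  # TCP sequence numbers wrap around modulo 2^32

@dataclass
class DssMapping:
    """Hypothetical model of one DSS mapping (not the on-the-wire format)."""
    data_seq: int     # data sequence number of the first mapped byte
    subflow_seq: int  # subflow sequence number, relative to the subflow's ISN
    length: int       # number of bytes covered by the mapping

def to_data_seq(mapping: DssMapping, tcp_seq: int, isn: int) -> int:
    """Translate an absolute TCP sequence number on a subflow into a data sequence number."""
    relative = (tcp_seq - isn) % MOD32                 # offset from the subflow's ISN
    offset = (relative - mapping.subflow_seq) % MOD32  # offset inside the mapping
    if offset >= mapping.length:
        raise ValueError("byte not covered by this mapping")
    return mapping.data_seq + offset

# In this model, the first data byte of a subflow has relative sequence number 1
# (the SYN occupies relative sequence number 0).
mapping = DssMapping(data_seq=10_000, subflow_seq=1, length=1400)
print(to_data_seq(mapping, tcp_seq=1001, isn=1000))                    # 10000
# A middlebox that rewrites the ISN shifts every sequence number by the same offset.
print(to_data_seq(mapping, tcp_seq=1001 + 123456, isn=1000 + 123456))  # 10000
```

Because the mapping is expressed relative to the subflow's ISN, shifting every sequence number by the same amount leaves the result unchanged. The length field bounds the range of bytes covered, so the mapping still applies if a middlebox splits or coalesces the segments that carry those bytes.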

Multipath TCP and its implementation in the Linux kernel can cope with these interferences and others. This makes Multipath TCP very robust compared to older TCP extensions. An example with a strange middlebox was published in another blog post.

A detailed analysis of how Multipath TCP reacts to this interference was published in [HDP+13]. In some cases, Multipath TCP closes the subflow that passes through the middlebox. In other cases, it falls back to regular TCP. A summary of this analysis may be found in the table below.


If you suspect that a middlebox interferes with Multipath TCP connections on a path, you can use tracebox [Gre] to locate it. Examples of using tracebox on Linux/macOS and Android appeared in earlier blog posts.


[Gre] Gregory Detal. tracebox.
[HDP+13] Benjamin Hesmans, Fabien Duchene, Christoph Paasch, Gregory Detal, and Olivier Bonaventure. Are TCP Extensions Middlebox-proof? In Proceedings of the 2013 Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox). 2013.
[HNR+11] M. Honda, Y. Nishida, C. Raiciu, A. Greenhalgh, M. Handley, and H. Tokuda. Is it still possible to extend TCP? In Proceedings of the 2011 ACM SIGCOMM conference on Internet Measurement Conference (IMC). 2011.

Fixing problems before the submission deadline

In the academic community, paper submission deadlines are sometimes strong incentives that encourage researchers to find solutions to problems that they ignored until then. While preparing the final version of a paper [RPB+12] that describes the design and the implementation of Multipath TCP, we thought that it would be interesting to add some measurement results to confirm that the protocol worked well for the important use case of combining the Wi-Fi and cellular interfaces on smartphones. We had already performed various experiments with such wireless networks and were expecting that the results could be obtained in a few hours.

Our initial objective was to meet one of the functional goals of Multipath TCP described in RFC 6182:

*Improve Throughput: Multipath TCP MUST support the concurrent use of multiple paths. To meet the minimum performance incentives for deployment, a Multipath TCP connection over multiple paths SHOULD achieve no worse throughput than a single TCP connection over the best constituent path.*

We created a small measurement setup in the lab, using two servers connected over Gigabit Ethernet and tc to emulate the Wi-Fi and cellular links.


We first verified that TCP could use each of the two wireless links on its own. This was indeed the case, as shown in the figure below (source [RPB+12]).


For this measurement, we looked at the impact of the receive window on the measured throughput. For TCP, the impact is low, except when the window is smaller than the bandwidth-delay product, which is not a surprise. We then ran the same experiments using both interfaces with Multipath TCP. We expected some impact with a small window but did not anticipate the results shown below (source [RPB+12]).


When the maximum window is large, Multipath TCP aggregates the cellular and the Wi-Fi interfaces as expected. However, when the receive window is smaller, Multipath TCP can transfer at a lower rate than regular TCP. This result was annoying, and we were less than a week away from the submission deadline. It was difficult to submit the paper without covering this basic use case. We organised daily teleconferences to understand the problem and then try to solve it.

tcpdump helped us understand the problem by collecting packet traces. The main issue was the difference between the delay of the cellular link and that of the Wi-Fi link. We frequently observed the following situation in the packet traces. The server sent many packets over the Wi-Fi interface and one over the cellular interface. The acknowledgements came back quickly over the Wi-Fi interface, but the sender often had to wait for an acknowledgement over the cellular interface. During these periods, the receive window was full and the sender could not transmit packets over the Wi-Fi link, although that link was idle. This explains the reduced throughput with a small receive window.
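A rough back-of-the-envelope estimate helps explain why a window that is ample for single-path TCP can starve Multipath TCP. This is a simplified sketch, not a formula taken from [RPB+12]: it assumes that while one segment travels on the slowest path, every subflow keeps sending, so the receiver must buffer roughly the sum of all bandwidths times the largest RTT. The function names and the link parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product of a single path, in bytes."""
    return bandwidth_bps * rtt_ms / 8000

def mptcp_window_estimate(paths: List[Tuple[int, int]]) -> float:
    """Rough receive window (bytes) that avoids window-blocking.

    paths: (bandwidth in bit/s, RTT in ms) for each subflow.
    While a segment is in flight on the slowest path, all subflows keep
    sending, and the receiver must hold that out-of-order data until the
    slow segment arrives: sum(bandwidths) * max(RTT).
    """
    total_bps = sum(bw for bw, _ in paths)
    max_rtt_ms = max(rtt for _, rtt in paths)
    return bdp_bytes(total_bps, max_rtt_ms)

# Hypothetical link parameters, for illustration only.
wifi = (8_000_000, 20)       # 8 Mbit/s, 20 ms RTT
cellular = (2_000_000, 200)  # 2 Mbit/s, 200 ms RTT

print(bdp_bytes(*wifi))                         # 20000.0
print(mptcp_window_estimate([wifi, cellular]))  # 250000.0
```

With these made-up numbers, about 20 KB keeps a single TCP connection over Wi-Fi busy, while Multipath TCP needs more than twelve times as much to keep both subflows busy. A window between the two values is enough for TCP but leaves Multipath TCP window-blocked.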

Once the problem was identified, it could be solved. The solution has two parts. First, when Multipath TCP detects that it is window-blocked and there is some unacknowledged data, it tries to re-inject that data over another subflow whose congestion window is open. If this data is acknowledged quickly, the receiver advertises a larger receive window that enables the sender to transmit again. Unfortunately, this is not sufficient, as the same situation could happen again later. The second part of the solution is to penalise the slow subflow by halving its congestion window. Together, these two elements fixed the problem over Wi-Fi and cellular.
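The two-part heuristic can be modelled in a few lines. This is a hypothetical, heavily simplified sketch, not the Linux kernel code: the `Subflow` class, the `on_window_blocked` function and the choice of counting windows in segments are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Subflow:
    """Hypothetical, simplified subflow: windows are counted in segments."""
    name: str
    cwnd: int  # congestion window, in segments
    in_flight: List[int] = field(default_factory=list)  # unacknowledged data sequence numbers

    def cwnd_open(self) -> bool:
        return len(self.in_flight) < self.cwnd

def on_window_blocked(subflows: List[Subflow]) -> Optional[Tuple[int, Optional[str]]]:
    """Called when the connection-level receive window is full.

    Returns (re-injected data sequence number, subflow used for the
    re-injection or None), or None if there is no unacknowledged data.
    """
    holders = [(min(sf.in_flight), sf) for sf in subflows if sf.in_flight]
    if not holders:
        return None
    # The oldest unacknowledged data sits at the left edge of the receive
    # window: the subflow carrying it is the one blocking the connection.
    oldest_seq, slow = min(holders, key=lambda h: h[0])

    # Part 1: re-inject that data on another subflow with an open cwnd.
    target = None
    for sf in subflows:
        if sf is not slow and sf.cwnd_open():
            sf.in_flight.append(oldest_seq)
            target = sf.name
            break

    # Part 2: penalise the slow subflow by halving its congestion window.
    slow.cwnd = max(1, slow.cwnd // 2)
    return oldest_seq, target

wifi = Subflow("wifi", cwnd=10, in_flight=[2, 3, 4])
cellular = Subflow("cellular", cwnd=4, in_flight=[1])
print(on_window_blocked([wifi, cellular]))  # (1, 'wifi')
print(cellular.cwnd)                        # 2
```

In this model the penalty is applied even when no other subflow can take the re-injected data, since the slow subflow is still the one holding the window.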


This heuristic was later improved after a detailed experimental evaluation over a wide range of network conditions [PKB13].


[PKB13] Christoph Paasch, Ramin Khalili, and Olivier Bonaventure. On the benefits of applying experimental design to improve Multipath TCP. In Proceedings of the ninth ACM conference on Emerging networking experiments and technologies, 393–398. ACM, 2013.
[RPB+12] C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, F. Duchene, O. Bonaventure, and M. Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI). 2012.

Multipath TCP inside the beast

One of the nice things about releasing open-source software such as the Multipath TCP implementation in the Linux kernel is that it finds unexpected use cases. In early 2013, we were contacted by Niels Laukens, who works for VRT, the Dutch-speaking public broadcaster in Belgium. He had been following the project and had identified a nice use case. Journalists increasingly use computers, not only to prepare their stories but also when they go off-site for interviews. Once an interview has been recorded, they often need to edit it locally before uploading it to the broadcaster's services, which air it or publish it on the web site.

For live videos, they often rely on dedicated satellite channels, but these are expensive and require a large antenna. Such antennas are fine when an event is planned and needs large coverage. However, there are many situations where they cannot send a large team to record interviews and short movies. To cover those cases, they have equipped a small car, the “mini”, that serves as a mobile studio. A single journalist can record an interview, edit it and then send it over the air. This last part is the most interesting one for us. Satellite links are expensive and there are many situations where it is difficult to use a satellite. 3G, 4G and Wi-Fi could help, but their performance varies. Asking each journalist to learn how to select the best network to upload their work was not a feasible solution. Fortunately, Niels found the right solution with Multipath TCP. The mini is equipped with a simple Multipath TCP proxy that is attached to all the available networks. Journalists use their regular laptop through the proxy to upload their movies via all the available connections. This is much faster and simpler than always moving the car to a location where the satellite works well.

VRT published a nice video of their mini, which is internally called “The Beast”:

Multipath TCP in the datacenter

In the scientific literature, one of the first important use cases for Multipath TCP was to distribute the load in datacenters. Several architectures have been proposed for datacenters. They differ in how links are organised, but all offer multiple paths between the servers. Measurement studies [benson2010network] have shown that datacenter traffic is composed of many short, delay-sensitive flows called mice, but that most of the data is transported in long flows, called elephants, which consume most of the bandwidth and can compete with the mice. One of the problems in a datacenter is that congestion can happen on some of the network links while others are unused. This is illustrated in the figure below, which shows two TCP connections competing for the same link.


Raiciu et al. studied this problem using simulations [RBP+11]. They showed that these collisions between competing flows significantly impact the performance of TCP.


Different techniques have been explored in the literature to solve this problem. Many of the proposed approaches use a centralised controller with OpenFlow or similar techniques to reroute flows and avoid congestion.


With Multipath TCP, a completely distributed solution is possible. It leverages Equal Cost Multipath (ECMP) routing on datacenter switches. When a router or switch has several paths with the same cost towards a given destination, it can send packets over any of these paths. To maximise load balancing, routers install all the available paths in their forwarding tables and balance the arriving packets over all of them. To ensure that all the packets belonging to the same layer-4 flow follow the same path, and thus experience roughly the same delay, routers usually select the outgoing equal-cost path by computing $H(IP_{src} \| IP_{dst} \| Port_{src} \| Port_{dst}) \bmod n$, where $n$ is the number of equal-cost paths towards the packet's destination and $H$ is a hash function.
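A minimal sketch of this path selection, assuming CRC32 from Python's `zlib` module as the hash function $H$ and a `|`-separated string as the concatenation. The function name `ecmp_next_hop` is hypothetical; real switches use their own hash functions.

```python
import zlib

def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_paths: int) -> int:
    """Return the index (0..n_paths-1) of the equal-cost path chosen for a flow.

    Computes H(IP_src || IP_dst || Port_src || Port_dst) mod n, with CRC32
    standing in for H (an assumption of this sketch).
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_paths

# Hypothetical 4-tuples: same hosts and destination port, two source ports.
print(ecmp_next_hop("10.0.0.1", "10.0.1.1", 40000, 80, 4))
print(ecmp_next_hop("10.0.0.1", "10.0.1.1", 40001, 80, 4))
```

Every packet of a given flow yields the same index, so the flow stays on one path. Changing only the source port changes the hash input and may therefore select a different path, which is the property a path manager can exploit by opening subflows from different source ports.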

A consequence of this use of ECMP is that TCP connections with different source ports between the same two hosts will sometimes follow different paths. This motivated the design of the ndiffports path manager in the Linux kernel. This path manager opens different subflows using the same source and destination IP addresses and the same destination port, but different source ports. As a result, the different subflows of a Multipath TCP connection will likely follow different paths inside the datacenter. With this path manager, Multipath TCP improves the utilisation of the datacenter, as illustrated by the simulation results below.


One of the limitations of ECMP is that flows with different source ports may still be hashed onto the same path. This problem can be fixed by using a reversible hash function [DPVDL+13].
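To get a feel for how often such collisions occur, assume (a simplification, since real hash functions are not perfectly random) that each subflow's source port hashes uniformly and independently onto one of $n$ equal-cost paths. A given path then receives none of the $k$ subflows with probability $(1 - 1/n)^k$, so the expected number of distinct paths used is $n\,(1 - (1 - 1/n)^k)$. The function name below is hypothetical.

```python
from fractions import Fraction

def expected_distinct_paths(n_paths: int, n_subflows: int) -> Fraction:
    """Expected number of distinct ECMP paths used by n_subflows subflows,
    assuming uniform and independent hashing onto n_paths paths."""
    miss = (1 - Fraction(1, n_paths)) ** n_subflows  # P(a given path gets no subflow)
    return n_paths * (1 - miss)

print(float(expected_distinct_paths(4, 4)))  # 2.734375
print(float(expected_distinct_paths(8, 2)))  # 1.875
```

Under this assumption, four subflows spread over four equal-cost paths use only about 2.73 of them on average, and two subflows over eight paths share a path one time in eight. Exact fractions are used so that the results are not affected by floating-point rounding.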

From an operational viewpoint, the most convincing argument of [RBP+11] was that similar results were obtained with the Linux implementation of Multipath TCP [PBD+14] in a real datacenter, using Amazon EC2 servers.


The SIGCOMM’11 article [RBP+11] attracted a lot of interest in the scientific community and is one of the most widely cited Multipath TCP articles. However, as of 2018, no real deployment of Multipath TCP in the datacenter has been publicly documented.


[DPVDL+13] Gregory Detal, Christoph Paasch, Simon Van Der Linden, Pascal Merindol, Gildas Avoine, and Olivier Bonaventure. Revisiting flow-based load balancing: stateless path selection in data center networks. Computer Networks, 57(5):1204–1216, 2013.
[PBD+14] C. Paasch, S. Barré, G. Detal, F. Duchene, and others. Linux kernel implementation of Multipath TCP. 2014.
[RBP+11] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley. Improving Datacenter Performance and Robustness with Multipath TCP. ACM SIGCOMM Computer Communication Review (CCR), 41(4):266–277, 2011.