Multipath TCP and load balancers

Load balancers play a central role in today’s Internet. Most Internet services are provided by servers that reside behind one or several layers of load balancers. Various load balancers have been proposed and implemented. They can operate at layer 3, layer 4 or layer 7. Layer-4 load balancers are very popular and we focus on them in this blog post. A layer-4 load balancer uses information from the transport layer to distribute TCP connections over different servers. There are two main types of layer-4 load balancers:

  • The stateful load balancers

  • The stateless load balancers

Schematically, a load balancer is a device or network function that processes incoming packets and forwards all packets that belong to the same connection to a specific server. A stateful load balancer maintains a table that associates the five-tuple identifying a TCP connection with a specific server. When a packet arrives, the load balancer looks for a matching entry in the table. If a match is found, the packet is forwarded to the selected server. If there is no match, e.g. because the packet is a SYN, a server is chosen and the table is updated before the packet is forwarded. Table entries are removed when they expire or when the associated connection is closed. A stateless load balancer does not maintain a table. Instead, it relies on a hash function that is computed over each incoming packet. A simple approach is to compute a CRC over the source and destination addresses and ports and to associate each server with a range of CRC values.
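The two behaviours described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original text: the class and function names are hypothetical, the stateful variant picks servers round-robin, and the stateless variant uses `zlib.crc32` as the CRC and splits the 32-bit CRC space into equal ranges.

```python
import zlib
from typing import Dict, List, Tuple

# source address, source port, destination address, destination port, protocol
FiveTuple = Tuple[str, int, str, int, str]


class StatefulBalancer:
    """Keeps one table entry per connection (hypothetical sketch)."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.table: Dict[FiveTuple, str] = {}
        self.next_index = 0  # round-robin pointer, an arbitrary choice for this sketch

    def forward(self, flow: FiveTuple) -> str:
        server = self.table.get(flow)
        if server is None:
            # No match (e.g. the packet is a SYN): choose a server and remember it.
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
            self.table[flow] = server
        return server

    def close(self, flow: FiveTuple) -> None:
        # Entries are removed when the connection closes (expiry timers are omitted).
        self.table.pop(flow, None)


def stateless_forward(flow: FiveTuple, servers: List[str]) -> str:
    """Maps a CRC of the addresses and ports onto equal-sized ranges of CRC values."""
    src, sport, dst, dport, _proto = flow
    crc = zlib.crc32(f"{src}:{sport}-{dst}:{dport}".encode())
    range_size = (2**32 + len(servers) - 1) // len(servers)  # ceiling division
    return servers[crc // range_size]
```

The stateless function needs no memory at all, which is why a second subflow with an unrelated five-tuple is a problem for it: nothing ties the new hash value to the server that was chosen for the first subflow.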

With Multipath TCP, a single connection can be composed of different subflows that each have their own five-tuple. This implies that data corresponding to a given Multipath TCP connection can be received over several different TCP subflows, and the load balancer must forward all of them to the same server. Several approaches have been proposed in the literature to solve this problem.

In Datacenter Scale Load Balancing for Multipath Transport, V. Olteanu and C. Raiciu proposed two different tricks to support stateless load balancers with Multipath TCP. First, the load balancer selects the key that will be used by the server for each incoming Multipath TCP connection. As this key is used to derive the Token that identifies the Multipath TCP connection in the MP_JOIN option, this enables the load balancer to control the Token that clients will send when creating subflows. This allows the load balancer to correctly associate MP_JOINs with the server that terminates the corresponding connection. This is not sufficient for a stateless load balancer, which also needs to associate each incoming packet with a specific server. A packet that belongs to an additional subflow carries source and destination addresses and ports, but these have no relationship with those of the initial subflow. They solve this problem by encoding the identification of the server inside a part of the TCP timestamp option.

In Towards a Multipath TCP Aware Load Balancer, S. Lienardy and B. Donnet propose a mix between stateless and stateful approaches. The packets from the first subflow are sent to a specific server by hashing their source and destination addresses and ports. The load balancer then extracts the key exchanged in the third ACK and derives the token associated with this connection. This token is placed in a map that is used to load balance the SYN MP_JOIN packets. The reception of an MP_JOIN packet forces the creation of an entry in a table that is used to map the packets from the additional subflows.
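The hybrid approach described above can be sketched as follows. This is a rough illustration and not the authors' implementation: the class and method names are hypothetical, the hash is `zlib.crc32` over the four-tuple, and the token is computed as the most significant 32 bits of the SHA-1 hash of the key, as specified in RFC 6824.

```python
import hashlib
import zlib
from typing import Dict, List, Tuple

# source address, source port, destination address, destination port
Flow = Tuple[str, int, str, int]


def token_from_key(key: bytes) -> int:
    """RFC 6824 token: the most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


class HybridBalancer:
    """Hashes the first subflow, keeps state only for tokens and joins (sketch)."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.token_map: Dict[int, str] = {}    # token -> server
        self.join_table: Dict[Flow, str] = {}  # additional subflow -> server

    def hash_server(self, flow: Flow) -> str:
        # Stateless choice used for the initial subflow.
        crc = zlib.crc32(repr(flow).encode())
        return self.servers[crc % len(self.servers)]

    def on_third_ack(self, flow: Flow, server_key: bytes) -> None:
        # The third ACK echoes the server's key: remember which server owns the token.
        self.token_map[token_from_key(server_key)] = self.hash_server(flow)

    def on_join_syn(self, flow: Flow, token: int) -> str:
        # An unknown token raises KeyError here; a real device would drop or reset.
        server = self.token_map[token]
        self.join_table[flow] = server
        return server

    def forward(self, flow: Flow) -> str:
        # Additional subflows hit the table, everything else is hashed.
        return self.join_table.get(flow) or self.hash_server(flow)
```

The state kept by such a device grows with the number of Multipath TCP connections and additional subflows, not with the total number of TCP connections.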

In Making Multipath TCP friendlier to Load Balancers and Anycast, F. Duchene and O. Bonaventure leverage a feature of the forthcoming standards-track version of Multipath TCP. In this revision, the MP_CAPABLE option has been modified compared to RFC6824. A first modification is that the client no longer sends its key in the SYN packet. A second modification is the C bit: when set by a server in the SYN+ACK, it indicates that the server will not accept additional MPTCP subflows to the source address and port of the SYN+ACK. This bit was specifically introduced to support load balancers. It works as follows. When a client creates a connection, it sends a SYN towards the load balancer with the MP_CAPABLE option but no key. The load balancer selects one server to handle the connection, e.g. based on a stateless hash. Each server has a dedicated IP address or a dedicated port number. The selected server replies to the SYN with a SYN+ACK that contains the MP_CAPABLE option with the C bit set. Once the connection is established, the server sends an ADD_ADDR option with its direct IP address to the client. The client then uses the direct address to create the subflows and those can completely bypass the load balancer. The source code of the implementation is available from
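From the client's point of view, the effect of the C bit can be sketched as a small decision function. This is a hypothetical illustration, not code from the implementation mentioned above: the function name is made up, and the only protocol fact it relies on is that C is the third flag bit of MP_CAPABLE (value 0x20) in the revised specification.

```python
from typing import List, Tuple

FLAG_C = 0x20  # third MP_CAPABLE flag bit (A is the most significant bit)

Endpoint = Tuple[str, int]  # (address, port)


def subflow_targets(synack_flags: int, initial: Endpoint,
                    advertised: List[Endpoint]) -> List[Endpoint]:
    """Returns the endpoints towards which the client may open extra subflows."""
    c_bit_set = bool(synack_flags & FLAG_C)
    # With the C bit set, the address and port that answered the SYN
    # (here, the load balancer) must not be used for additional subflows.
    targets = [] if c_bit_set else [initial]
    for endpoint in advertised:  # addresses learned through ADD_ADDR
        if endpoint != initial and endpoint not in targets:
            targets.append(endpoint)
    return targets
```

With the C bit set and one address advertised through ADD_ADDR, the only remaining target is the server's direct address, so the additional subflows never cross the load balancer.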

The latest Multipath TCP load balancer was proposed in Stateless Datacenter Load-balancing with Beamer by V. Olteanu et al. It assigns one port to each load-balanced server and forces the client to create the subflows towards this per-server port number. The load balancer is implemented in both software (Click elements) and hardware (P4) and is evaluated in detail. The source code is available from

Experimenting with MPTCP using raw sockets

Although Multipath TCP is already available on several platforms (Linux, FreeBSD, iOS11), applications like Tracebox (or Mobile Tracebox) are still a convenient choice for users eager to experiment with the new protocol without installing a full MPTCP stack. These tools (along with zmap, tcpexposure, etc.) make it possible to forge custom packets (e.g. using raw sockets) that emulate newer extensions or protocols.

For instance, Tracebox can highlight middleboxes interfering with MPTCP by sending MP_CAPABLE Syn packets with increasing TTL and collecting the ICMP Time Exceeded messages returned by intermediate routers. Even when routers don’t respond or don’t quote the full TCP header, a Syn Ack without the MP_CAPABLE option received from a well-known MPTCP server can still reveal interference. When the path is free of middleboxes, an MP_CAPABLE Syn can also be used to assess whether a server supports MPTCP.

Unfortunately, middleboxes can be very subtle: e.g. they can be completely transparent to MP_CAPABLE packets but still interfere with ADD_ADDR and DSS, or they can pretend to support all the options carried by a Syn.

This calls for more elaborate MPTCP tests: in this post we describe a test (included in Mobile Tracebox) that uses raw sockets to establish an MPTCP connection, exchange data and associate a second subflow.

In the first part, we detail for every step how the options must be crafted for the MPTCP experiment to succeed. In the second part, we explore further scenarios (e.g. options that are not perfectly compliant with the protocol) to see how an MPTCP stack reacts to them. This can benefit the development of similar applications by avoiding pitfalls when dealing with MPTCP at a low level, and it can also help to better understand how the protocol works in practice. The figure summarizes the packets exchanged between A, our client running the test, and B, an MPTCP-enabled server (


MPTCP-compliant scenario

We report the output of Mobile Tracebox (only interesting header fields are included).

 0:   [TCP Syn] TCP::SourcePort(24d2)  TCP::Option_MPTCP(00811000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (00810c4d5dfc94d0a464)

 0:   [TCP Ack]  TCP::SourcePort(24d2) TCP::Option_MPTCP(008110000000000000000c4d5dfc94d0a464)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(24d2) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  [TCP Syn] TCP::SourcePort (cefc) TCP::Option_MPTCP(10023a03caf210000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (100256c7a377b2e33fdaa29163c5)

All fields are in hexadecimal format: we can easily recognize the MPTCP option subtype from the first digit. A full trace of the packets exchanged during the probe is also reported.

18:48:32.485197 IP client1.9426 > Flags [S], seq 19922944, win 65535, options [mptcp capable csum {0x1000000000000000}], length 0
18:48:32.573554 IP > client1.9426: Flags [S.], seq 3005334072, ack 19922945, win 28800, options [mss 1452,mptcp capable csum {0xc4d5dfc94d0a464}], length 0
18:48:32.573792 IP client1.9426 > Flags [.], ack 1, win 65535, options [mptcp capable csum {0x1000000000000000,0xc4d5dfc94d0a464}], length 0
18:48:35.577198 IP client1.9426 > Flags [.], seq 1:73, ack 1, win 65535, options [mptcp dss seq 4216210269 subseq 1 len 72 csum 0x3aca], length 72: HTTP: GET / HTTP/1.1
18:48:35.664046 IP > client1.9426: Flags [.], ack 1, win 28800, options [mptcp add-addr id 8,mptcp dss ack 4216210269], length 0
18:48:35.664556 IP > client1.9426: Flags [.], ack 73, win 28800, options [mptcp dss ack 4216210341], length 0
18:48:35.666894 IP > client1.9426: Flags [P.], seq 1:503, ack 73, win 28800, options [mptcp dss ack 4216210341 seq 2447520560 subseq 1 len 502 csum 0xc36b], length 502: HTTP: HTTP/1.1 200 OK
18:48:38.670543 IP client2.52988 > Flags [S], seq 1793048487, win 65535, options [mptcp join id 2 token 0x3a03caf2 nonce 0x10000000], length 0
18:48:38.756268 IP > client2.52988: Flags [S.], seq 1665111958, ack 1793048488, win 28800, options [mss 1452,mptcp join id 2 hmac 0x56c7a377b2e33fda nonce 0xa29163c5], length 0

The test uses two client addresses (client1 and client2) for the two subflows, but it is also possible to use the same address with different source ports.

Everything starts with a Syn carrying the MP_CAPABLE option (subtype 0x0) with flags A (checksum required) and H (use of HMAC-SHA1 as the crypto algorithm) and a 64-bit key chosen by the client (0x1000000000000000). The server replies with an MP_CAPABLE Syn Ack containing the same flags and its own key: the client takes note of the server’s key to echo it in the MP_CAPABLE Ack (and also to forge the subsequent MP_JOIN).
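The MP_CAPABLE strings printed by Mobile Tracebox can be decoded with a few lines of Python. This is a minimal sketch with hypothetical helper names; it assumes, based on their length, that the hexadecimal strings in the output start at the subtype byte and do not include the TCP option kind and length bytes.

```python
from typing import Dict, List, Union

SUBTYPES = {
    0x0: "MP_CAPABLE", 0x1: "MP_JOIN", 0x2: "DSS", 0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR", 0x5: "MP_PRIO", 0x6: "MP_FAIL", 0x7: "MP_FASTCLOSE",
}

FLAG_A = 0x80  # checksum required
FLAG_H = 0x01  # HMAC-SHA1


def subtype_name(option_hex: str) -> str:
    """The first hexadecimal digit of the option is the MPTCP subtype."""
    return SUBTYPES[int(option_hex[0], 16)]


def parse_mp_capable(option_hex: str) -> Dict[str, Union[int, bool, List[str]]]:
    body = bytes.fromhex(option_hex)
    if body[0] >> 4 != 0x0:
        raise ValueError("not an MP_CAPABLE option")
    flags = body[1]
    # One 64-bit key in the Syn and Syn Ack, two keys (sender, receiver) in the Ack.
    keys = [body[i:i + 8].hex() for i in range(2, len(body), 8)]
    return {
        "version": body[0] & 0x0F,
        "checksum_required": bool(flags & FLAG_A),
        "hmac_sha1": bool(flags & FLAG_H),
        "keys": keys,
    }
```

For example, `parse_mp_capable("00811000000000000000")` reports version 0, both flags set and the single key `1000000000000000`, which matches the description of the client's Syn.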

If the client attempts to send an MP_JOIN at this point, the MPTCP stack will discard the new subflow with a Rst, since no data has actually been exchanged on the first subflow. This means we have to send a packet with a real payload and a DSS option. To prevent the server from dropping our packet or simply closing the connection, the payload must be a real HTTP request.

GET / HTTP/1.1
Connection: keep-alive

We also have to assemble a compliant DSS option (subtype 0x2): we set the flags to 0x4 (a 32-bit Data Sequence Number is present), the Subflow Sequence Number to 1 and the Data-Level Length to the length of our TCP payload (72 bytes). The Data Sequence Number is generated from the SHA-1 hash of the client’s key. Finally, a DSS checksum has to be computed over the payload and the DSS pseudo-header. The server answers with two packets. The first carries an ADD_ADDR option (subtype 0x3) advertising the server’s IPv6 address: this is a sign that the MPTCP stack has recognized that we are speaking MPTCP. The second contains a DSS option: we can see that the data we sent is acknowledged at both the TCP and MPTCP levels.
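The DSS fields and the checksum can be sketched in Python as follows. The helper names are hypothetical. The parser handles the DSS strings as they appear in the Mobile Tracebox output (assumed to start at the subtype byte), and the checksum follows the RFC 6824 definition: the standard Internet checksum over a pseudo-header (64-bit Data Sequence Number, 32-bit Subflow Sequence Number, 16-bit Data-Level Length, 16 zero bits) followed by the payload. The payload and sequence number in the usage note are example values only.

```python
import struct
from typing import Dict

FLAG_DATA_ACK = 0x01  # A: Data ACK present
FLAG_ACK_8 = 0x02     # a: Data ACK is 8 octets long
FLAG_MAPPING = 0x04   # M: DSN, SSN and Data-Level Length present
FLAG_DSN_8 = 0x08     # m: DSN is 8 octets long


def parse_dss(option_hex: str) -> Dict[str, int]:
    body = bytes.fromhex(option_hex)
    if body[0] >> 4 != 0x2:
        raise ValueError("not a DSS option")
    flags, offset, fields = body[1], 2, {}
    if flags & FLAG_DATA_ACK:
        size = 8 if flags & FLAG_ACK_8 else 4
        fields["data_ack"] = int.from_bytes(body[offset:offset + size], "big")
        offset += size
    if flags & FLAG_MAPPING:
        size = 8 if flags & FLAG_DSN_8 else 4
        fields["dsn"] = int.from_bytes(body[offset:offset + size], "big")
        offset += size
        fields["ssn"], fields["length"] = struct.unpack_from("!IH", body, offset)
        offset += 6
        if offset + 2 <= len(body):  # the checksum is only there when negotiated
            fields["checksum"] = struct.unpack_from("!H", body, offset)[0]
    return fields


def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def dss_checksum(dsn64: int, ssn: int, payload: bytes) -> int:
    # The pseudo-header always uses the 64-bit DSN, even if the option carries 32 bits.
    pseudo = struct.pack("!QIHH", dsn64, ssn, len(payload), 0)
    return ~ones_complement_sum(pseudo + payload) & 0xFFFF
```

Applied to the trace above, `parse_dss("2004fb4e435d0000000100483aca")` gives a Data Sequence Number of 0xfb4e435d and a length of 72, and the Data ACK returned by the server (0xfb4e43a5) is exactly their sum. A call such as `dss_checksum(0x8710f99bfb4e435d, 1, payload)` returns the 16-bit value to place in the option for a given `payload`.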


To avoid computing the DSS checksum, we can use a Data-Level Length greater than the actual TCP payload length: in this case the packet is accepted, but the DSS checksum is not evaluated because the server waits for the next TCP segment (the packet is acknowledged at the TCP level but not at the MPTCP level).

After data has been exchanged on the first subflow, we can finally use a second subflow to join the MPTCP connection. We send a new Syn from a different source address and port (or just a different port) with an MP_JOIN option (subtype 0x1) carrying a Token derived from the key sent by the server in its MP_CAPABLE Syn Ack (the first 32 bits of the SHA-1 hash of the server’s key) and a random number; the Address ID is set to 2. The server answers with an MP_JOIN Syn Ack (carrying a Hash-based Message Authentication Code and a random number), a sign that our MPTCP experiment has succeeded.
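The MP_JOIN fields can be computed as sketched below. The function names are hypothetical. The token and HMAC definitions follow RFC 6824 (token: most significant 32 bits of SHA-1 of the key; Syn Ack HMAC: HMAC-SHA1 keyed with the server's key followed by the client's key, over the server's nonce followed by the client's nonce, truncated to the leftmost 64 bits). The option body uses the same layout as the Mobile Tracebox output, which is assumed to omit the TCP option kind and length bytes.

```python
import hashlib
import hmac
import struct


def token_from_key(key: bytes) -> int:
    """Most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def mp_join_syn(address_id: int, token: int, nonce: int, backup: bool = False) -> bytes:
    """Body of the MP_JOIN option of a Syn: subtype/flags, Address ID, token, nonce."""
    first_byte = (0x1 << 4) | int(backup)  # subtype 0x1, B flag in the lowest bit
    return struct.pack("!BBII", first_byte, address_id, token, nonce)


def synack_hmac(server_key: bytes, client_key: bytes,
                server_nonce: int, client_nonce: int) -> bytes:
    """Truncated HMAC that the server is expected to place in its MP_JOIN Syn Ack."""
    message = struct.pack("!II", server_nonce, client_nonce)
    mac = hmac.new(server_key + client_key, message, hashlib.sha1)
    return mac.digest()[:8]
```

With the values of the trace, `mp_join_syn(2, 0x3a03caf2, 0x10000000).hex()` reproduces the string `10023a03caf210000000` sent by the client. A client that wants to authenticate the server can compare the output of `synack_hmac` with the 64 bits received in the Syn Ack.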

Other scenarios

Another advantage of raw sockets is that we can send packets that are not perfectly compliant with the protocol, to observe how an MPTCP stack reacts to possible malfunctions or tricky middlebox interference.

Invalid MP_CAPABLE Key

In this scenario, the client echoes a wrong server key in the MP_CAPABLE Ack: this inconsistency is ignored and communication proceeds normally at both the TCP and MPTCP levels on the first subflow. The MP_JOIN also still succeeds, as long as the token is calculated from the correct server key.

 0:   [TCP Syn] TCP::SourcePort(fac4)  TCP::Option_MPTCP(00811000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (0081422b61826574250e)

 0:   [TCP Ack]  TCP::SourcePort(fac4) TCP::Option_MPTCP(008110000000000000002000000000000000)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(fac4) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  [TCP Syn] TCP::SourcePort (cc2e) TCP::Option_MPTCP(1002531ed1b010000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (1002ea5ea69620cd23fc2c3feb6a)

No DSS Checksum, despite being requested by the counterpart

In the next scenario, the client sends a DSS option without a checksum, although the server has requested the DSS checksum in its MP_CAPABLE Syn Ack: the server replies with a Rst terminating the subflow, but the subsequent MP_JOIN still succeeds.

 0:   [TCP Syn] TCP::SourcePort(c49a)  TCP::Option_MPTCP(00011000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (008106ef03ac4b958a2f)

 0:   [TCP Ack]  TCP::SourcePort(c49a) TCP::Option_MPTCP(0001100000000000000006ef03ac4b958a2f)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(c49a) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d000000010048)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Rst Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(2001fb4e435d)

 0:  [TCP Syn] TCP::SourcePort (6bf0) TCP::Option_MPTCP(100252ad76cd10000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (10028ce03d14fa336cdae211d46d)

Bad DSS Checksum (MP_FAIL)

In another scenario, a wrong DSS checksum is sent. In this case the server correctly acknowledges the data at the TCP level, but sends an MP_FAIL option (subtype 0x6), causing a fallback to a single subflow. As a consequence, the subsequent MP_JOIN Syn is rejected.

 0:   [TCP Syn] TCP::SourcePort(f1ba)  TCP::Option_MPTCP(00811000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (0081b144931ac84d865a)

 0:   [TCP Ack]  TCP::SourcePort(f1ba) TCP::Option_MPTCP(00811000000000000000b144931ac84d865a)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(f1ba) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100480100)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(60008710f99bfb4e435d) TCP::Option_MPTCP(2001fb4e435d)

 0:  [TCP Syn] TCP::SourcePort (ebbb) TCP::Option_MPTCP(10023d41ba9910000000)
64: [TCP Rst Ack] -TCP::Option_MPTCP
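The MP_FAIL option in the trace above can be decoded with a short Python sketch. The function names are hypothetical; the layout follows RFC 6824 (subtype, 12 reserved bits, then a 64-bit Data Sequence Number), and the string is assumed to start at the subtype byte, as in the rest of the Mobile Tracebox output.

```python
import struct
from typing import Iterable


def parse_mp_fail(option_hex: str) -> int:
    """Returns the 64-bit Data Sequence Number carried by an MP_FAIL option."""
    body = bytes.fromhex(option_hex)
    if len(body) != 10 or body[0] >> 4 != 0x6:
        raise ValueError("not an MP_FAIL option body")
    (dsn64,) = struct.unpack_from("!Q", body, 2)
    return dsn64


def contains_mp_fail(option_hexes: Iterable[str]) -> bool:
    """True if any MPTCP option of a packet has subtype 0x6."""
    return any(int(option_hex[0], 16) == 0x6 for option_hex in option_hexes)
```

For the option `60008710f99bfb4e435d`, the function returns 0x8710f99bfb4e435d. Its 32 low-order bits, 0xfb4e435d, are the Data Sequence Number that the client placed in the DSS option carrying the bad checksum.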

Fall back without MP_FAIL

A fallback can also occur when the client sets the DSS Data-Level Length to 0 (“infinite mapping”): in this scenario the server acknowledges the data at the TCP and MPTCP levels and doesn’t send any MP_FAIL (since this case is interpreted as a choice made by the client and not as an anomalous event such as a wrong DSS checksum). The fallback is still evident when the client attempts to associate a new subflow and the MP_JOIN is not accepted.

 0:   [TCP Syn] TCP::SourcePort(de0a)  TCP::Option_MPTCP(00811000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (0081b6e7a1b307358b82)

 0:   [TCP Ack]  TCP::SourcePort(de0a) TCP::Option_MPTCP(00811000000000000000b6e7a1b307358b82)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(de0a) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100000000)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  [TCP Syn] TCP::SourcePort (2c6d) TCP::Option_MPTCP(10020edf2fd810000000)
64: [TCP Rst Ack] -TCP::Option_MPTCP

Bad MP_JOIN Token

In the last scenario tested the client sends a wrong Token in MP_JOIN Syn. The server unsurprisingly replies with a Rst.

 0:   [TCP Syn] TCP::SourcePort(1594)  TCP::Option_MPTCP(00811000000000000000)
64: [TCP Syn Ack] TCP::Option_MPTCP (008191d0ae47af67a0f2)

 0:   [TCP Ack]  TCP::SourcePort(1594) TCP::Option_MPTCP(0081100000000000000091d0ae47af67a0f2)
64:  *

 0:   [TCP Ack 72 bytes] TCP::SourcePort(1594) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64: [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64: [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  [TCP Syn] TCP::SourcePort (d26e) TCP::Option_MPTCP(10020200000010000000)
64: [TCP Rst Ack] -TCP::Option_MPTCP

Mobile Tracebox

The MPTCP test described in the first part has been included in the new version of Mobile Tracebox. The screenshots show how to select the destination address and the appropriate probe. To avoid a full traceroute on every packet sent, the minimum TTL can conveniently be set to 64.

[Screenshots: selecting the destination address, selecting the probe category, selecting the probe, setting the minimum TTL in the advanced options, output of a successful probe]

Since raw sockets are needed, this probe is available only on rooted Android devices.

Multipath TCP on iOS11 : A closer look at the TCP Options

Multipath TCP uses a variety of TCP options to use different paths simultaneously. Several Multipath TCP options are defined in RFC6824 :

  • subtype 0x0: MP_CAPABLE
  • subtype 0x1: MP_JOIN
  • subtype 0x2: DSS
  • subtype 0x3: ADD_ADDR
  • subtype 0x4: REMOVE_ADDR
  • subtype 0x5: MP_PRIO
  • subtype 0x6: MP_FAIL
  • subtype 0x7: MP_FASTCLOSE

In this blog post, we explore in more detail the packet trace collected on an iPhone using iOS11 beta. We start our analysis with the three-way handshake. The trace contains one Multipath TCP connection. Recent versions of Wireshark support Multipath TCP and we use the tcp.options.mptcp.subtype==0 filter to match all the packets that contain the MP_CAPABLE option. This option only appears in the three packets of the initial three-way handshake. Let us first analyse the SYN sent by the iPhone. In our test over an LTE network, iOS11 beta2 advertises the following options:

  • MSS set to 1410 bytes. This is a relatively small value that was probably chosen to reduce the risk of fragmentation or Path MTU discovery problems since cellular networks often use tunnels internally
  • Selective Acknowledgements are proposed
  • The Window scale factor is set to 6 and the iPhone advertises a 64 KBytes window.
  • The Timestamp option is used as well.
  • The MP_CAPABLE option sent by the iPhone does not request the utilisation of the DSS checksum. The DSS checksum was introduced in RFC6824 to detect middlebox interference. Previous versions of iOS, which used Multipath TCP to support Siri, did not use this checksum because Siri ran over HTTPS and this prevents most middlebox interference. However, when Multipath TCP is used to support a protocol such as HTTP, there is a risk of interference from middleboxes that inject HTTP headers. If you plan to use Multipath TCP on iOS11, you should probably rely on HTTPS and forget HTTP, for reasons that go beyond Multipath TCP.

The server, in this trace the Linux implementation running on replies with Selective Acknowledgements, Timestamps, a Window Scaling factor set to 7 and requires the utilisation of the DSS Checksum.


The MP_CAPABLE option contained in the third ACK sent by the iPhone confirms that the iPhone will use the DSS checksum for this connection as requested by the server.


The utilisation of the DSS Checksum is clearly visible in the first data packet that is sent by the iPhone. It uses 32-bit Data Sequence Numbers and Data Acknowledgement numbers.


The first data packet returned by the Linux server is shown below. It also uses 32-bit Data Sequence Numbers and Data Acknowledgement numbers.


With iOS11 beta2, the iPhone uses the MP_PRIO option and sets the cellular subflow as a backup subflow. This is immediately visible in the fourth packet of the trace that is shown below.


Apple has already explained earlier that they do not use the ADD_ADDR option because their stack is focussed on clients and they do not see a benefit in advertising client addresses since those are often behind a NAT or firewall. We did not observe ADD_ADDR or REMOVE_ADDR in our first trace.

The MP_JOIN option is used to create subflows. In our trace, this happens at time 4.74 when we enable the WiFi interface. The MP_JOIN option contains the token derived from the key advertised by the server in the MP_CAPABLE option, and its backup flag is reset. This indicates that the WiFi subflow is preferred to the cellular subflow that was initially created. It is interesting to note that iOS11 beta advertises a larger MSS over the WiFi interface than over the cellular one. The same window scaling factor (6) is used.


We did not observe MP_FASTCLOSE in this trace.

We’ll discuss MP_FAIL in another post since it is related to fallbacks to TCP.

MPTCP experiments on iOS 11 beta

MPTCP support was announced for iOS 11 during WWDC 2017. The developer documentation presents a new instance property called multipathServiceType in the URLSessionConfiguration class. It can be set to one of the constants specified in the MultipathServiceType enumeration, which is also defined in the URLSessionConfiguration class. The enumeration contains four constants and the documentation has a short description for each of them:

  1. none : The default service type indicating that Multipath TCP should not be used.
  2. handover : A Multipath TCP service that provides seamless handover between Wi-Fi and cellular in order to preserve the connection.
  3. interactive : A service whereby Multipath TCP attempts to use the lowest-latency interface.
  4. aggregate : A service that aggregates the capacities of other Multipath options in an attempt to increase throughput and minimize latency.

The code below shows a simple example of usage:

let config = URLSessionConfiguration.ephemeral

config.multipathServiceType = URLSessionConfiguration.MultipathServiceType.handover
let session = URLSession(configuration: config)

let url = URL(string: "")

let task = session.dataTask(with: url!, completionHandler:{...})


We will present experiments done with iOS in a series of posts on this blog. In our first experiment, we use the handover service type. We start the connection with the wifi interface down and after a few seconds, we turn on the wifi interface. The trace of the connection is available here. We use mptcptrace to see how the subflows are used. Let’s take a look at the Multipath-TCP sequence numbers over time :


As expected, the connection starts on the mobile interface because it is the only interface available at that time. When the wifi interface becomes available, around five seconds after the start of the connection, all the traffic is immediately sent to the wifi subflow.

Let’s take a closer look at what happens during the transition around five seconds after the start of the connection:


On this graph, MPTCP acknowledgements are pictured as blue crosses. In the upper left corner of this zoom, we can see that the client receives out-of-sequence packets (from Multipath-TCP’s perspective) during the transition. This happens because iOS tries to terminate the connection on the mobile interface as soon as possible, while the server does not know yet that this interface should no longer be used. Starting from packet 4647 in the trace, we can see the zero window advertisement and the resets sent by the iPhone on the mobile subflow. Once the server receives the reset and detects that some packets will not arrive on the mobile subflow, it reinjects these packets on the wifi subflow. During the reinjections, out-of-order packets are kept in the MPTCP out-of-order queue on the client side. To observe this out-of-sequence queue, we zoom on the top right corner of the graph:


On this graph, we can observe the MPTCP ACKs that cover the out-of-sequence packets received earlier. In particular we can observe a hole in the middle of the graph. If we zoom on other parts of the graph we can see several holes like this one.

This concludes our first analysis of Multipath-TCP on iOS. Stay tuned for more detailed analyses and tests. In upcoming posts, we will discuss the other Multipath TCP services offered by iOS11.