Native Multipath TCP support for OpenSSH (Fri, 08 Jul 2022)
http://blog.multipath-tcp.org/blog/html/2022/07/08/openssh.html

During the Open Week at UCLouvain, we added native Multipath TCP support to the OpenSSH client and server, making it possible to connect to a remote machine over multiple network interfaces.

What’s the point?

There are several benefits to this native support:

  1. Higher total bandwidth, by combining the bandwidth of each interface.
  2. Moving the ssh session from one interface to another without losing the connection.
  3. Keeping the connection alive even when no interface is available for some time.

Compilation and installation

Follow the instructions shown in the README file.

Setting up mptcpd

mptcpd is a daemon that automatically configures new network addresses for Multipath TCP. It comes with a configuration file in which a mode, or a list of modes, can be specified for each new Multipath TCP address. Our tests were done using the ‘subflow’ mode.
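
On distributions that ship mptcpd, the mode is set in its configuration file. The path and the exact keys vary between mptcpd versions, so the excerpt below is only a sketch of how a ‘subflow’ setup could look:

```
# /etc/mptcpd/mptcpd.conf (excerpt; keys may differ between versions)
[core]
# Flags applied to each new local address: 'subflow' means every new
# address is used to create an additional subflow.
addr-flags=subflow
```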

Manual mptcp configuration

New addresses can also be added manually using the ip command. To do so, please refer to the Red Hat documentation.
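
As a sketch, with a recent iproute2 the commands below would let a second interface carry an extra subflow; the address, device name and limits are examples, not values from our setup:

```
# Allow up to 2 extra subflows, then advertise a second address as a
# subflow endpoint (run as root; 192.0.2.2 and wwan0 are examples).
ip mptcp limits set subflow 2
ip mptcp endpoint add 192.0.2.2 dev wwan0 subflow
ip mptcp endpoint show
```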

Prevent timeouts

Configure these options in sshd_config:

  • ClientAliveInterval 60
    > Sends a probe to the client after 60 seconds of inactivity.
  • ClientAliveCountMax 60
    > Closes the connection once 60 probes have been sent and no response has been received.
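
With these two values, sshd sends a probe every 60 seconds and gives up after 60 unanswered probes, so an idle session survives roughly an hour (60 × 60 s) without any reachable path:

```
# sshd_config excerpt: tolerate ~3600 s without a client response
ClientAliveInterval 60
ClientAliveCountMax 60
```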

Usage

Clone this fork of OpenSSH, compile the mptcp_support branch, install it, and set the client and server configuration files:

  • In ssh_config, uncomment UseMPTCP no and change it to yes
  • In sshd_config, uncomment UseMPTCP no and change it to yes

Then run SSH as usual.

Alternatively, run the following commands from the directory where the compiled binaries are located:

On the server side (sshd must be started with an absolute path, hence the $(pwd)):

$(pwd)/sshd -o UseMPTCP=yes

On the client side:

./ssh -o UseMPTCP=yes user@hostname

Real life testing

The first step in checking our Multipath TCP port of ssh was to verify in Wireshark that Multipath TCP packets were being exchanged. Then the real-life testing could begin.
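
One possible way to perform this check from the command line rather than the Wireshark GUI is a display filter on the Multipath TCP option subtype (the interface name is an example):

```
# Show only packets carrying a Multipath TCP option
tshark -i wlan0 -Y 'tcp.options.mptcp.subtype'
```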

Here at UCLouvain, we have many auditoriums and computer labs close to each other. The eduroam network is available across buildings to give students access to the Internet. This gave us the idea to walk from building to building, switching access points often, while keeping a mobile 4G connection that Multipath TCP could use. The goal was to see whether an ssh session would break over time. The following picture shows the path we took across the city:

../../../_images/ssh-path.jpg

Along the way, we captured the packets on the client device to see which interface was used and when. The plot only shows the “interesting” part of the data, where the connection switches between 4G and Wi-Fi. In the first part, 4G took over; we believe this is because there was a dead zone in eduroam coverage between the first two access points on our path. The same phenomenon can be seen a bit later.

In the end, the ssh connection was still alive even after switching interfaces multiple times. From this experiment, the Multipath TCP port of ssh appears to be successful.

Apple Music on iOS13 uses Multipath TCP through load-balancers (Sun, 27 Oct 2019)
http://blog.multipath-tcp.org/blog/html/2019/10/27/apple_music_on_ios13_uses_multipath_tcp_through_load_balancers.html


Since the publication of RFC6824 in 2013, interest in Multipath TCP has continued to grow and various use cases have been deployed. Apple has used Multipath TCP for its Siri voice-recognition application since 2013 to support seamless handovers while walking. Tessares uses Multipath TCP to deploy Hybrid Access Networks that combine xDSL and LTE to provide faster Internet access services in rural areas. Samsung, LG and Huawei use Multipath TCP on their Android smartphones to combine Wi-Fi and 4G. Recently, 3GPP selected Multipath TCP to support Wi-Fi/5G co-existence in future 5G networks, and a first prototype has been demonstrated.

Despite these growing deployments, web hosting companies and CDNs have complained that Multipath TCP was difficult to deploy because they assumed that load balancers would need to be changed to terminate Multipath TCP connections.

It turns out that it is possible to support Multipath TCP on servers with today’s load-balancers. Fabien Duchêne proposed and evaluated this solution in the Making Multipath TCP Friendlier to Load-Balancers and Anycast paper that he presented at ICNP’17. A simple load-balancer is illustrated in the figure below. The load-balancer uses IP address 1.2.3.4 and forwards connections to the servers behind it, selecting them e.g. based on a hash function.

../../../_images/lb.png

With Multipath TCP, this simple approach does not work anymore, as the second subflow from the client could be hashed to a different server than the one handling the initial subflow.

../../../_images/lb3.png

The solution proposed by Fabien Duchêne is both simple and efficient. The load-balancer has a public IP address that is advertised in the DNS. Furthermore, each server that resides behind this load balancer has its own public IP address. When a client contacts the load-balanced service, its SYN packet reaches the load-balancer, which selects one of the servers in the pool. The server confirms the establishment of the connection using the load-balanced IP address. As soon as the Multipath TCP connection is established, the server sends a packet containing its own IP address inside an ADD_ADDR option. Thanks to this address, the client can establish the subsequent subflows directly towards the server, effectively bypassing the load-balancer.

../../../_images/lb4.png

Given its benefits, this solution has been included in the standards-track version of Multipath TCP that is being finalised by the IETF. However, it had not yet been deployed at a large scale, until the release of iOS13 by Apple in September 2019. Given the benefits of Multipath TCP for Siri, Apple has decided to also use Multipath TCP for its Apple Maps and Apple Music applications. These two applications are often used while walking and thus suffered from interruptions during Wi-Fi/cellular handovers. A closer look at a packet trace collected from an iPhone using Apple Music shows interesting results.

First, Apple Music uses Multipath TCP as shown by the options contained in the SYN packet. It is interesting to point out that both the iPhone and the server use the first version of Multipath TCP defined in RFC6824.

../../../_images/mptcp11.png

Once the connection has been established, the server quickly sends a packet that advertises its own address in the ADD_ADDR option.

../../../_images/mptcp21.png

A closer look at this option indicates that this address is advertised with address identifier 0. According to RFC6824, this identifier is reserved for the address of the initial subflow. This advertisement thus changes the address of the initial subflow, and we can expect that iOS13 will use this new address to establish subsequent subflows on this Multipath TCP connection. We checked by contacting one of the Apple Music servers from a Linux client to see how Linux reacts to such an option, and it supports it correctly.

../../../_images/mptcp3.png

According to statista, there were more than 60 million subscribers to the Apple Music service in June 2019. As they upgrade their smartphones to iOS13, we should observe a huge growth in Multipath TCP traffic during the coming weeks… If you see other servers or CDNs that enable Multipath TCP, let us know…

SOCKS and Multipath TCP (Fri, 21 Dec 2018)
http://blog.multipath-tcp.org/blog/html/2018/12/21/socks_and_multipath_tcp.html

The main benefit of Multipath TCP is that it either enables the simultaneous utilisation of different paths or provides fast handovers from one path to another. Several deployments of Multipath TCP leverage these unique capabilities of multipath transport. When Apple opted for Multipath TCP in 2013, they could leverage its benefits by enabling it on both their iPhones and iPads and on the servers that support the Siri application. This is the end-to-end approach to deploying Multipath TCP.

However, there are many deployment scenarios where it would be beneficial to use Multipath TCP to interact with servers that have not yet been upgraded to support this new protocol. A first use case is smartphones that want to combine Wi-Fi and cellular or quickly switch from one to the other. Another use case is combining different access networks to benefit from a higher bandwidth. In large cities, users can often obtain high-bandwidth access links through VDSL, FTTH or cable networks. In rural areas, however, bandwidth is often limited, and many home users or SMEs need to combine different access links to reach 10 Mbps or a small multiple of this low bandwidth. Multipath TCP has been used in both deployments together with SOCKS (RFC 1928).

SOCKS is an old application-layer protocol that was designed to allow users in a corporate network to access the Internet through a firewall that filters connections. Several implementations of SOCKS exist; a popular one is shadowsocks. In South Korea, the smartphones that use the Gigapath service interact with a SOCKS server over Multipath TCP, as illustrated in the figure below.

../../../_images/KT-fig1.png

This SOCKS service can be easily deployed once the Multipath TCP implementation has been ported to the smartphones. It appears as a simple application that interacts with the SOCKS server managed by the network operator. However, SOCKS is a chatty protocol that significantly increases the delay to establish a connection. To establish one TCP connection through the SOCKS proxy, the smartphone needs to exchange several packets, as shown in the figure below.

../../../_images/socks-mptcp.png

First, the smartphone sends a SYN with the MP_CAPABLE option to create a connection to the SOCKS server. The SOCKS server replies, which consumes one round-trip-time. Then, the smartphone sends a SOCKS message to initiate the SOCKS session and discover the methods that the SOCKS server supports. In some cases, an additional step is required to authenticate the client (not shown in the figure). Finally, the smartphone sends a CONNECT message to request the creation of a connection towards the final server. At this point, the smartphone can also create a subflow towards the SOCKS server. A benefit of some SOCKS deployments is that they can also encrypt all the data exchanged between the smartphone and the SOCKS server. This was important a few years ago, but is less crucial today given the number of services that already use TLS.
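
To make the extra round-trips concrete, here is a byte-level sketch of the two client messages defined in RFC 1928, assuming the “no authentication” method; the destination address and port are examples:

```python
import ipaddress
import struct

def greeting(methods: tuple = (0x00,)) -> bytes:
    """SOCKS5 version/method-selection message; its reply costs one RTT."""
    return bytes([0x05, len(methods), *methods])

def connect_request(host: str, port: int) -> bytes:
    """SOCKS5 CONNECT request towards the final server; another RTT."""
    addr = ipaddress.ip_address(host)
    atyp = 0x01 if addr.version == 4 else 0x04  # IPv4 or IPv6 address type
    return bytes([0x05, 0x01, 0x00, atyp]) + addr.packed + struct.pack("!H", port)

# Before any application data flows, the client pays: the TCP (or MPTCP)
# handshake, then the greeting exchange, then the CONNECT exchange.
first_msg = greeting()
second_msg = connect_request("192.0.2.1", 443)
```

Each of these exchanges adds a full client-to-proxy round-trip on top of the transport handshake, which is exactly the delay penalty discussed above.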

A similar approach has been used to combine different access links. For example, OVH provides the overthebox, which uses a SOCKS proxy running on an access router to combine several xDSL/cable/cellular links into a single pipe. The SOCKS proxy running on this router interacts with a SOCKS server as shown above. The code running on their access router is open-source and available from https://github.com/ovh/overthebox. OpenMPTCPRouter uses a similar approach.

The main benefit of SOCKS in these deployments is that it enables the simultaneous utilization of different access links. This increases the throughput of long TCP connections, as shown by measurements from speedtest and similar services. However, SOCKS has a major drawback: it increases the time required to establish every TCP connection by several round-trip-times between the client and the SOCKS server. This additional delay can be significant for applications that rely on short TCP connections. The figure below (source [CBHB16a]) shows that the round-trip-time on cellular networks can be significant.

../../../_images/socks-rtt.png

Measurements carried out on smartphones [CBHB16a] show that many applications use very short connections that exchange a small amount of data. Increasing the setup time of these connections by forcing them to be proxied through SOCKS may affect the user experience.

../../../_images/socks-measurements.png

References

[CBHB16a](1, 2) Quentin De Coninck, Matthieu Baerts, Benjamin Hesmans, and Olivier Bonaventure. A first analysis of multipath tcp on smartphones. In 17th International Passive and Active Measurements Conference, volume 17. Springer, March-April 2016. URL: https://inl.info.ucl.ac.be/publications/first-analysis-multipath-tcp-smartphones.html.

Using load balancers in front of Multipath TCP servers (Thu, 20 Dec 2018)
http://blog.multipath-tcp.org/blog/html/2018/12/20/load_balancers_in_front_of_multipath_tcp_servers.html

Load balancers play a very important role in today’s Internet. Most Internet services are provided by servers that reside behind one or several layers of load-balancers. Various load-balancers have been proposed and implemented. They can operate at layer 3, layer 4 or layer 7. Layer 4 is very popular and we focus on such load balancers in this blog post. A layer-4 load balancer uses information from the transport layer to load balance TCP connections over different servers. There are two main types of layer-4 load balancers:

  • Stateful load balancers
  • Stateless load balancers

Schematically, a load balancer is a device or network function that processes incoming packets and forwards all packets that belong to the same connection to a specific server. A stateful load balancer maintains a table that associates the five-tuple identifying a TCP connection with a specific server. When a packet arrives, the load balancer looks for a matching entry in the table. If a match is found, the packet is forwarded to the selected server. If there is no match, e.g. the packet is a SYN, a server is chosen and the table is updated before forwarding the packet. Table entries are removed when they expire or when the associated connection is closed. A stateless load balancer does not maintain a table. Instead, it relies on a hash function computed over each incoming packet. A simple approach is to compute a CRC over the source and destination addresses and ports and associate each server with a range of CRC values. This is illustrated in the figure below (source: Fabien Duchene’s presentation of [DB17]).

../../../_images/lb-1.png
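
A minimal sketch of such a stateless dispatch, using a CRC-32 and an illustrative server pool:

```python
import zlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative pool

def pick_server(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Map a five-tuple (protocol fixed to TCP here) onto the server pool."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/tcp".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]

# All packets of one TCP connection share a five-tuple, so they always
# reach the same server without any per-connection state on the balancer.
server = pick_server("192.0.2.7", 49152, "198.51.100.1", 80)
```

The same property is what breaks with Multipath TCP: a second subflow carries a different five-tuple and may therefore hash to a different server.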

With Multipath TCP, a single connection can be composed of different subflows that have their own five-tuples. This implies that the data corresponding to a given Multipath TCP connection can be received over several different TCP subflows, which obviously need to be forwarded to the same server by the load balancer. This is illustrated in the figure below (source: Fabien Duchene’s presentation of [DB17]).

../../../_images/lb-2.png

Multipath TCP aware load balancers

Several approaches have been proposed in the literature to solve this problem. In Datacenter Scale Load Balancing for Multipath Transport [OR16], V. Olteanu and C. Raiciu proposed two different tricks to support stateless load balancers with Multipath TCP. First, the load balancer selects the key that will be used by the server for each incoming Multipath TCP connection. As this key is used to compute the token that identifies the Multipath TCP connection in the MP_JOIN option, this enables the load balancer to control the token that clients will send when creating subflows. This allows the load balancer to correctly associate MP_JOINs with the server that terminates the corresponding connection. This is not sufficient for a stateless load balancer, however: it also needs to associate each incoming packet with a specific server. Such a packet carries the source and destination addresses and ports of its subflow, but those of an additional subflow have no relationship with the initial one. They solve this problem by encoding the identification of the server inside a part of the TCP timestamp option.
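
The key-selection trick works because RFC 6824 derives the 32-bit token announced in MP_JOIN deterministically from the key: it is the most significant 32 bits of the SHA-1 hash of the key. A short sketch (the example key is arbitrary):

```python
import hashlib
import struct

def mptcp_token(key: bytes) -> int:
    """Token = most significant 32 bits of SHA-1(key), per RFC 6824."""
    assert len(key) == 8  # Multipath TCP keys are 64 bits
    return struct.unpack("!I", hashlib.sha1(key).digest()[:4])[0]

# A load balancer that picks the server's key therefore knows in advance
# which token the client will place in its MP_JOIN options.
example_token = mptcp_token(b"\x00\x01\x02\x03\x04\x05\x06\x07")
```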

In Towards a Multipath TCP Aware Load Balancer [LienardyD16], S. Lienardy and B. Donnet propose a mix between the stateless and stateful approaches. The packets of the first subflow are sent to a specific server by hashing their source and destination addresses and ports. The load balancer then extracts the key exchanged in the third ACK and stores the token associated with this connection. This token is placed in a map that is used to load balance the SYN MP_JOIN packets. The reception of an MP_JOIN packet forces the creation of an entry in a table that is used to map the packets of the additional subflows. This is illustrated in the figure below (source [LienardyD16]).

../../../_images/lb-3.png

A benefit of this approach is that since the second subflow is directly associated with the server, all the packets exchanged over this subflow can reach the server without passing through the load balancer.

../../../_images/lb-4.png

In Making Multipath TCP friendlier to Load Balancers and Anycast, F. Duchene and O. Bonaventure leverage a feature of the forthcoming standards-track version of Multipath TCP. In this revision, the MP_CAPABLE option has been modified compared to RFC6824. A first modification is that the client no longer sends its key in the SYN packet. A second modification is the C bit: when set by a server in the SYN+ACK, it indicates that the server will not accept additional MPTCP subflows towards the source address and port of the SYN. This bit was specifically introduced to support load balancers. It works as follows. When a client creates a connection, it sends a SYN towards the load balancer with the MP_CAPABLE option but no key. The load balancer selects one server to handle the connection, e.g. based on a stateless hash. Each server has a dedicated IP address or a dedicated port number. It replies to the SYN with a SYN+ACK that contains the MP_CAPABLE option with the C bit set. Once the connection is established, the server sends an ADD_ADDR option with its direct IP address to the client. The client then uses this direct address to create the subflows, which can completely bypass the load balancer. The source code of the implementation is available from https://github.com/fduchene/ICNP2017

The latest Multipath TCP load balancer was proposed in Stateless Datacenter Load-balancing with Beamer by V. Olteanu et al. It assigns one port to each load-balanced server and forces the client to create its subflows towards this per-server port number. The load balancer is implemented in both software (click elements) and hardware (P4) and evaluated in detail. The source code is available from https://github.com/Beamer-LB

Commercial load balancers such as F5 or Citrix also support Multipath TCP.

References

[DB17](1, 2) Fabien Duchene and Olivier Bonaventure. Making multipath tcp friendlier to load balancers and anycast. In Network Protocols (ICNP), 2017 IEEE 25th International Conference on, 1–10. IEEE, 2017. URL: https://inl.info.ucl.ac.be/publications/making-multipath-tcp-friendlier-load-balancers-and-anycast.html.
[LienardyD16](1, 2) Simon Liénardy and Benoit Donnet. Towards a multipath tcp aware load balancer. In Proceedings of the 2016 Applied Networking Research Workshop, 13–15. ACM, 2016.
[OR16]Vladimir Olteanu and Costin Raiciu. Datacenter scale load balancing for multipath transport. In Proceedings of the 2016 workshop on Hot topics in Middleboxes and Network Function Virtualization, 20–25. ACM, 2016.

Which servers use Multipath TCP? (Wed, 19 Dec 2018)
http://blog.multipath-tcp.org/blog/html/2018/12/19/which_servers_use_multipath_tcp.html

Since the publication of RFC 6824, Multipath TCP has been deployed for several use cases described in RFC 8041 and [BS16]. However, there has been no public study of the deployment of Multipath TCP on servers.

A first study appeared in [MHFB15]. The authors used zmap to send SYN packets with the MP_CAPABLE option and checked whether the servers replied with the MP_CAPABLE option. They probed 452,008 unique IP addresses corresponding to slightly fewer than 2 million domains. Their results are summarised in the table below (source [MHFB15]).

../../../_images/mptcp-scan.png

Unfortunately, there was a flaw in the methodology. Checking the presence of the MP_CAPABLE option in the SYN+ACK is not sufficient, as some middleboxes simply echo in the SYN+ACK any option contained in the SYN, even if they do not understand it. The dataset was later corrected and updated. Any zmap scan must check that a TCP option returned in a SYN+ACK is not simply echoed by a middlebox. For Multipath TCP, this means that the returned MP_CAPABLE option must differ from the option sent in the SYN.
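
The corrected check is easy to express: an MP_CAPABLE option in the SYN+ACK only counts if it differs from the one the scanner sent. A sketch with illustrative option bytes (not a full TCP option parser):

```python
def genuine_mptcp_reply(sent_option: bytes, received_option: bytes) -> bool:
    """True only if the peer answered with its *own* MP_CAPABLE option."""
    return bool(received_option) and received_option != sent_option

# MP_CAPABLE is TCP option kind 30; the bytes below are illustrative.
sent = bytes([30, 12, 0x81]) + b"\x00" * 9     # option sent in the SYN
echoed = sent                                  # a middlebox echoing it back
genuine = bytes([30, 12, 0x81]) + b"\x17" * 9  # a server answering with its own key
```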

Since the publication of this paper, no other study using zmap has been published. Several research groups use zmap to carry out measurement studies. One of them is the netray Internet Observatory at RWTH Aachen University. They have used zmap to study the deployment of HTTP/2, QUIC and the initial TCP congestion window. I recently discussed this with Jan Ruth and Oliver Hohlfeld, who had some interesting data. In August 2015, a scan of the entire IPv4 addressing space revealed 921 Multipath TCP servers, with about one sixth of them hosted by Apple. At the same time, about 24,000 addresses returned in the SYN+ACK an MP_CAPABLE option that was exactly the same as the one sent in the SYN packet.

They proposed to launch another zmap scan last week over the entire IPv4 addressing space on port 443. 88,484 addresses replied with an MP_CAPABLE option. Of these, 77,384 simply returned exactly the same MP_CAPABLE option as the one included in the SYN packet. There are thus more middleboxes that echo unsupported TCP options than in August 2015. The interesting point of their study is that about 11,000 addresses replied with a valid MP_CAPABLE option. This indicates an important growth in the deployment of Multipath TCP on servers. This analysis ignores the utilisation of Multipath TCP on clients such as iPhones and other smartphones, and in Hybrid Access Networks, since none of those devices respond to SYN packets.

The exact usage of these servers is not known. If you manage one of them, we’d be interested in knowing more about your use case to document it in revisions of RFC 8041. If your server receives a lot of traffic, we’d also be interested in analysing packet traces in more detail to extend the measurements published in [CBHB16a][CBHB16b][HTSB15][TCH+16].

References

[BS16]Olivier Bonaventure and SungHoon Seo. Multipath tcp deployments. IETF Journal, 12(2):24–27, 2016. URL: https://www.ietfjournal.org/multipath-tcp-deployments/.
[CBHB16a]Quentin De Coninck, Matthieu Baerts, Benjamin Hesmans, and Olivier Bonaventure. A first analysis of multipath tcp on smartphones. In 17th International Passive and Active Measurements Conference, volume 17. Springer, March-April 2016. URL: https://inl.info.ucl.ac.be/publications/first-analysis-multipath-tcp-smartphones.html.
[CBHB16b]Quentin De Coninck, Matthieu Baerts, Benjamin Hesmans, and Olivier Bonaventure. Observing real smartphone applications over multipath tcp. IEEE Communications Magazine, Network Testing Series, March 2016. URL: https://inl.info.ucl.ac.be/publications/observing-real-smartphone-applications-over-multipath-tcp.html.
[HTSB15]Benjamin Hesmans, Viet-Hoang Tran, Ramin Sadre, and Olivier Bonaventure. A first look at real multipath tcp traffic. In Traffic Monitoring and Analysis. 2015. URL: https://inl.info.ucl.ac.be/publications/first-look-real-multipath-tcp-traffic.html.
[MHFB15](1, 2) Olivier Mehani, Ralph Holz, Simone Ferlin, and Roksana Boreli. An early look at multipath tcp deployment in the wild. In Proceedings of the 6th International Workshop on Hot Topics in Planet-Scale Measurement, HotPlanet ‘15, 7–12. New York, NY, USA, 2015. ACM. URL: http://doi.acm.org/10.1145/2798087.2798088, doi:10.1145/2798087.2798088.
[TCH+16]Viet-Hoang Tran, Quentin De Coninck, Benjamin Hesmans, Ramin Sadre, and Olivier Bonaventure. Observing real multipath tcp traffic. Computer Communications, 2016. URL: https://inl.info.ucl.ac.be/publications/observing-real-multipath-tcp-traffic.html, doi:10.1016/j.comcom.2016.01.014.

Advertising addresses with Multipath TCP (Tue, 18 Dec 2018)
http://blog.multipath-tcp.org/blog/html/2018/12/18/advertising_addresses_with_multipath_tcp.html

An important feature of Multipath TCP is that the communicating hosts can easily learn the addresses that can be used to reach their peers. Multipath TCP uses special TCP options to advertise this information. In this post, we look at the evolution of these options during the design of RFC 6824.

The first MPTCP draft, draft-ford-mptcp-multiaddressed-00 defined four options to deal with addresses.

../../../_images/mptcp-0.png

The first option was called Add Address. It was intended to advertise one IP address owned by the host that sends it over a given Multipath TCP connection. This option contained an IP version field, to support IPv4 and IPv6, and an index. This index was intended to cope with NAT that could translate addresses in the IP header without translating information in the TCP options.

../../../_images/mptcp-addr-0.png

A companion option was the remove address option that simply contained the index of the address to be removed.

../../../_images/mptcp-rm-addr-0.png

In addition to these two options that have been slightly modified later in RFC 6824, draft-ford-mptcp-multiaddressed-00 also defined two options for implicit path management. The first one is Request-SYN. It was intended to request the peer to initiate a subflow towards the announced address. The Add Address option was intended to be purely informational and the Request-SYN was intended to trigger the establishment of a new subflow.

../../../_images/mptcp-req-syn.png

There was also a Request-FIN option that was intended to request the termination of a subflow.

../../../_images/mptcp-req-fin.png

These two options were removed in draft-ford-mptcp-multiaddressed-01 and only the Add and Remove address options were kept.

The working group draft, draft-ietf-mptcp-multiaddressed-00, introduced two further changes to these options. First, it became possible to advertise a specific port number together with an IP address. Second, the document included a discussion on adding a backup flag to indicate that the advertised address should be treated as a backup one. This bit became fully defined in draft-ietf-mptcp-multiaddressed-02. As of December 2018, it is unclear whether the ability to advertise a port number is used by any real implementation. Some measurements about the utilization of this option on www.multipath-tcp.org are discussed in [HTSB15] and [TCH+16].

../../../_images/mptcp-add-addr-1.png

Then, the backup bit was removed in draft-ietf-mptcp-multiaddressed-03. In draft-ietf-mptcp-multiaddressed-04, a new option called MP_PRIO was introduced to allow a host to dynamically change the priority (backup or not) of an address. This separation between the address advertisement and the announcement of the backup status was intended to provide more reactivity.

../../../_images/mp-prio.png

This option is sent by a receiver to indicate to its peer that it wants to change the backup status of the subflow over which this option is sent.

It should be noted that the MP_JOIN option used to create subflows contains a backup flag that indicates whether a new subflow should be treated as a backup one or not. In contrast with the MP_PRIO option, which is sent unreliably in a TCP option, the MP_JOIN option is always exchanged reliably, which guarantees that the communicating hosts know the backup status of newly established subflows.

The discussion on advertising addresses continued within the IETF after the publication of RFC 6824. The Linux implementation supports this option and uses it to announce a new address as soon as this address is known. However, the reception of an ADD_ADDR option does not necessarily trigger the establishment of a subflow; this is the responsibility of the path manager. The fullmesh path manager running on a client tries to use all possible addresses, but a Linux server does not create subflows. Apple’s implementation does not support the ADD_ADDR option. It assumes that the servers have a single IP address and that the client will anyway create subflows by using the MP_JOIN option.

One of the concerns with the ADD_ADDR option included in RFC 6824 was the risk of attacks where an attacker could inject an address into an existing Multipath TCP connection by sending a spoofed packet containing this option and valid TCP sequence numbers. This attack is described in the threats analysis of RFC 7430. A new ADD_ADDR option was proposed at IETF88 and included in draft-ietf-mptcp-rfc6824bis-00. It includes a truncated HMAC that authenticates the option with the keys exchanged during the initial handshake. This also ensures that the option cannot be modified by an on-path attacker that has not observed the initial handshake.

../../../_images/mptcp-add-addr-2.png
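
As a sketch of the idea, the snippet below computes a 64-bit truncated HMAC over an advertised address, keyed with the concatenated keys. The hash algorithm, the input layout and the choice of which 64 bits to keep are illustrative here; the normative construction is the one defined in the 6824bis draft.

```python
import hashlib
import hmac
import ipaddress

def add_addr_hmac(local_key: bytes, peer_key: bytes,
                  addr_id: int, address: str) -> bytes:
    """Truncated 64-bit HMAC covering an advertised address (sketch only)."""
    msg = bytes([addr_id]) + ipaddress.ip_address(address).packed
    digest = hmac.new(local_key + peer_key, msg, hashlib.sha256).digest()
    return digest[:8]  # keep 64 of the 256 bits

# Keys and address are arbitrary example values.
tag = add_addr_hmac(b"\x01" * 8, b"\x02" * 8, 1, "198.51.100.7")
```

An attacker who has not seen the keys of the initial handshake cannot produce a valid tag, so a spoofed ADD_ADDR is rejected.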

The discussion continued and draft-ietf-mptcp-rfc6824bis-01 introduced an Echo bit to provide a reliable delivery of this option.

../../../_images/mptcp-add-addr-3.png

As of December 2018, this option is included in the latest version of the draft, draft-ietf-mptcp-rfc6824bis-12, but has not yet been deployed in the field.

References

[HTSB15]Benjamin Hesmans, Viet-Hoang Tran, Ramin Sadre, and Olivier Bonaventure. A first look at real multipath tcp traffic. In Traffic Monitoring and Analysis. 2015. URL: https://inl.info.ucl.ac.be/publications/first-look-real-multipath-tcp-traffic.html.
[TCH+16]Viet-Hoang Tran, Quentin De Coninck, Benjamin Hesmans, Ramin Sadre, and Olivier Bonaventure. Observing real multipath tcp traffic. Computer Communications, 2016. URL: https://inl.info.ucl.ac.be/publications/observing-real-multipath-tcp-traffic.html, doi:10.1016/j.comcom.2016.01.014.

Multipath TCP APIs
http://blog.multipath-tcp.org/blog/html/2018/12/17/multipath_tcp_apis.html

Shortly after the publication of RFC 6824, the IETF published RFC 6897. This RFC provides a first discussion of how a socket interface could be designed to support Multipath TCP. When RFC 6897 was published, none of the existing implementations included a socket API for Multipath TCP. The Linux implementation used the standard sockets with a few options and sysctls. This enabled Multipath TCP to be usable by any application, but without special means to control the utilisation of the subflows. This ability to support standard applications remains one of the main advantages of this implementation. Once Multipath TCP has been added to the Linux kernel, all applications immediately benefit from its multipath capabilities.
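
This is what makes the Linux approach attractive: completely ordinary socket code, like the echo exchange below, runs unmodified and becomes a Multipath TCP connection once a Multipath TCP kernel (with the relevant sysctls enabled) negotiates MP_CAPABLE underneath:

```python
import socket
import threading

def serve_once(srv: socket.socket) -> None:
    """Accept one client and echo back whatever it sends."""
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(64))

# The application only sees regular TCP sockets; on a Multipath TCP
# kernel these very same calls transparently use MPTCP underneath.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
cli.sendall(b"hello over (MP)TCP")
reply = cli.recv(64)
cli.close()
srv.close()
```

The flip side, discussed next, is that such an application has no handle on the individual subflows.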

However, there are some use cases where the applications know that they are using Multipath TCP and want to control some of its features, such as the utilisation of the subflows or the packet scheduler. A first approach to provide this control was proposed in [HDB+15]. This solution exposes the Multipath TCP kernel API to user-space applications through netlink messages.

../../../_images/netlink.png

Thanks to these messages, it is possible to design a userspace path manager library that controls the utilisation of the subflows. This is important for use cases where one path is more costly than the other, e.g. on smartphones where the cellular interface is billed on a volume basis while the Wi-Fi interface is not metered. Several use cases and examples are described in [HDB+15].

A second possible approach is to extend the traditional socket API by using socket options that can be set or retrieved by the applications [HB16]. Six new socket options are proposed in this article and described later in draft-hesmans-mptcp-socket-03.

../../../_images/mptcp-socket.png

These socket options enable an application to control the creation and the removal of subflows during the lifetime of a Multipath TCP connection. Two use cases are described in this short paper: refreshing subflows on a regular basis and delaying the establishment of subflows. The patches that implement these socket options have been distributed on the mptcp-dev mailing list but have not yet been integrated into the official releases of the Multipath TCP implementation in the Linux kernel.

Apple has also defined a specific API to enable applications to interact with Multipath TCP. A simple example was provided in [HB16].

../../../_images/socket-apple.png

This API was private and not officially supported by Apple, although there were examples in the Apple source code. When Apple announced Multipath TCP support for all iOS11 applications, they also released a simple API that extends the URLSessionConfiguration property. This API does not provide as fine-grained control over the utilization of the subflows as the socket API implemented on Linux, but it allows applications to select between the handover and interactive modes. Additional information is available from https://developer.apple.com/documentation/foundation/urlsessionconfiguration/improving_network_reliability_using_multipath_tcp

../../../_images/apple-url.png

References

[HB16](1, 2) Benjamin Hesmans and Olivier Bonaventure. An enhanced socket api for multipath tcp. In Proceeding ANRW ‘16 Proceedings of the 2016 Applied Networking Research Workshop. 2016. URL: https://inl.info.ucl.ac.be/publications/enhanced-socket-api-multipath-tcp.html.
[HDB+15](1, 2) Benjamin Hesmans, Gregory Detal, Sebastien Barre, Raphael Bauduin, and Olivier Bonaventure. Smapp : towards smart multipath tcp-enabled applications. In Proc. Conext 2015. Heidelberg, December 2015. URL: https://inl.info.ucl.ac.be/publications/smapp-towards-smart-multipath-tcp-enabled-applications.html.

]]>
Mon, 17 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/16/evolution_of_the_documents_produced_by_the_mptcp_working_group.html http://blog.multipath-tcp.org/blog/html/2018/12/16/evolution_of_the_documents_produced_by_the_mptcp_working_group.html <![CDATA[Evolution of the documents produced by the MPTCP working group]]> Evolution of the documents produced by the MPTCP working group

Since its creation, the mptcp working group of the IETF has produced 7 documents that were published as RFCs, and an eighth one is currently in last call. This post provides a brief description of these different documents as a starting point for someone who wants to start looking at multipath transport protocols.

../../../_images/rfcs.png

The working group has produced two Experimental RFCs and five Informational ones. The first two RFCs are RFC 6181, which discusses the security issues that were taken into account in the design of Multipath TCP, and RFC 6182, which describes the basic architectural principles for Multipath TCP. RFC 6182 has already been discussed in a previous blog post. RFC 6181 was written by Marcelo Bagnulo and builds upon earlier security analyses of Mobile IPv6 (RFC 4225) and shim6 (RFC 4218). The key security issues discussed in this document are:

  • a flooding attack, where an attacker tries to force a server to send packets to a victim address by tricking the server into creating a subflow towards the victim. This attack was a concern for network-layer protocols, but it is not an issue for Multipath TCP since Multipath TCP validates the creation of the subflows with a three-way handshake and the MP_JOIN option.
  • a hijacking attack, where an attacker leverages the address agility mechanisms of Multipath TCP to hijack an existing connection. One concern was the risk of an attacker mounting a man-in-the-middle attack against existing Multipath TCP connections. This attack is prevented by the mechanism used by Multipath TCP to create subflows.
  • a discussion of time-shifted hijacking attacks

The threat analysis continued and was later expanded in RFC 7430. This RFC introduces other types of attacks: an attack on ADD_ADDR which could allow an off-path attacker to mount a man-in-the-middle attack, but only under very unlikely circumstances; a denial-of-service attack on MP_JOIN; a SYN-flood amplification attack; and an eavesdropper that observes the initial handshake. These attacks were considered in the design of Multipath TCP or its implementations.

The other Informational RFCs are RFC 6897 which discusses API considerations and RFC 8041 which describes the known use cases where Multipath TCP has been deployed. Some of these use cases are also discussed in [BS16].

The main documents produced by the mptcp working group are RFC 6356, which defines the coupled congestion control scheme, and RFC 6824, which specifies the protocol itself. As their publication dates suggest, the congestion control scheme was stable much earlier than the protocol specification.

Experimental and Standards-track RFCs contain MUST, SHOULD and other keywords defined in RFC 2119. As RFC 6356 specifies a congestion control scheme, it only includes two MUST keywords. RFC 6824, however, provides a precise protocol specification that makes heavy use of these keywords.

../../../_images/draft-ietf-mptcp-multiaddressed_keywords.png

The above figure, plotted with a script developed by Maxime Piraux for [PDCB18], shows the evolution of the utilization of the RFC 2119 keywords across the different versions of draft-ietf-mptcp-multiaddressed. There were several phases in the utilization of these keywords: a low usage in the first four drafts, then a sudden increase in draft-ietf-mptcp-multiaddressed-03, which included changes to the MP_CAPABLE option, the addition of an address identifier in MP_PRIO and, in answer to review comments, a huge list of changes.
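The counting behind such figures can be reproduced with a few lines of Python. The sketch below is a hypothetical reimplementation of the idea, not the actual script used for [PDCB18]:

```python
import re
from collections import Counter

# RFC 2119 requirement keywords; multi-word forms must be tried first
# so that "MUST NOT" is not also counted as a bare "MUST".
KEYWORDS = ["MUST NOT", "MUST", "REQUIRED", "SHALL NOT", "SHALL",
            "SHOULD NOT", "SHOULD", "NOT RECOMMENDED", "RECOMMENDED",
            "MAY", "OPTIONAL"]

def count_rfc2119_keywords(text):
    """Count occurrences of RFC 2119 keywords in a draft's text."""
    # Sort alternatives longest-first so the regex prefers multi-word forms.
    pattern = re.compile(r"\b(" + "|".join(
        sorted((re.escape(k) for k in KEYWORDS), key=len, reverse=True)) + r")\b")
    counts = Counter()
    for match in pattern.finditer(text):
        counts[match.group(1)] += 1
    return counts

draft = ("The client MUST echo the key in the third ACK. "
         "A host SHOULD NOT delay the MP_CAPABLE option and MAY "
         "close the subflow. It MUST NOT reuse an address identifier.")
print(count_rfc2119_keywords(draft))
```

Running such a counter over every revision of a draft and plotting the totals per version reproduces the kind of figure shown above.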

It is interesting to observe the evolution of draft-ietf-mptcp-rfc6824bis

../../../_images/draft-ietf-mptcp-rfc6824bis_keywords.png

The above figure shows that when looking at the RFC 2119 keywords, the specification did not change a lot compared to RFC 6824. Most of the changes were clarifications except for the redefinition of the MP_CAPABLE option as discussed in a previous blog post.

In contrast, the transport part of QUIC, defined in draft-ietf-quic-transport, appears to be much more complex when counting the RFC 2119 keywords, and the specification is not yet finished.

../../../_images/draft-ietf-quic-transport_keywords.png

References

[BS16]Olivier Bonaventure and SungHoon Seo. Multipath tcp deployments. IETF Journal, 12(2):24–27, 2016. URL: https://www.ietfjournal.org/multipath-tcp-deployments/.
[PDCB18]Maxime Piraux, Quentin De Coninck, and Olivier Bonaventure. Observing the evolution of quic implementations. In Proceedings of the Workshop on the Evolution, Performance, and Interoperability of QUIC, EPIQ’18, 8–14. New York, NY, USA, 2018. ACM. URL: https://quic-tracker.info.ucl.ac.be/blog/results/paper/2018/11/19/epiq-18-paper-accepted.html, doi:10.1145/3284850.3284852.

]]>
Sun, 16 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/15/apple_and_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2018/12/15/apple_and_multipath_tcp.html <![CDATA[Apple uses Multipath TCP]]> Apple uses Multipath TCP

The initial specification for Multipath TCP was published in January 2013 as RFC 6824. Apple had participated in some of the discussions during earlier IETF meetings, but never announced a deployment. Shortly after the publication of RFC 6824, Phil Eardley published an internet draft, draft-eardley-mptcp-implementations-survey-01, that explicitly asked implementers a series of questions.

../../../_images/survey.png

Four implementations were disclosed during the summer of 2013:

  • the stable Linux implementation discussed on page 11
  • an ongoing implementation on FreeBSD discussed on page 19
  • an anonymous implementation discussed on page 23
  • an implementation on Citrix load balancers discussed on page 31

Five years later, it is interesting to look back at the characteristics of this anonymous implementation.

  • This implementation only supports client-initiated subflows
  • It uses 4-byte DSNs by default, but can support 8-byte DSNs
  • The support for ADD_ADDR and REMOVE_ADDR was described as: “It does not support sending ADD_ADDR or processing ADD_ADDR as it is considered a security risk. Also, we only have a client side implementation at the moment which always initiates the sub flows. The remote end does not send ADD_ADDR in our configuration. The client can send REMOVE_ADDR however when one of the established sub flow’s source address goes away. The client ignores incoming REMOVE_ADDR options also.”
  • It does not implement the coupled congestion control defined in RFC 6356
  • It uses a private API and not the socket API proposed in RFC 6897
  • The proposed deployment is described as follows: “MPTCP in mobile environments is very powerful when used in the active/backup mode. Since the network interfaces available on mobile devices have different cost characteristics as well as different bring up and power usage characteristics, it is not useful to share load across all available network interfaces - at least not currently. Providing session continuity across changing network environments is the key deployment scenario.”

In September 2013, Apple launched iOS7, which included support for Multipath TCP. Apple’s motivations for using Multipath TCP on iOS have been explained in detail in [BS16]:

Siri is the digital assistant in Apple’s iOS and macOS operating systems. Because speech recognition requires tremendous processing power, Siri streams spoken commands to Apple’s datacenter for speech recognition; the result is sent back to the smartphone. Although the duration of a user’s interaction with Siri is relatively short, Siri’s usage pattern made this data transfer a perfect client for MPTCP.

Many people use Siri while walking or driving. As they move farther away from a WiFi access point, the TCP connection used by Siri to stream its voice eventually fails, resulting in error messages.

To address this issue, Apple has been using MPTCP—and benefiting from its handover capabilities—since its iOS 7 release. When a user issues a Siri voice command, iOS establishes an MPTCP connection over WiFi and cellular. If the phone loses connectivity to the WiFi access point, traffic is handed over to the cellular interface. A WiFi connection that is still in sight of an access point can have a channel become so lossy that barely any segments can be transmitted. In this case, another retransmission timeout happens and iOS retransmits the traffic over the cellular link.

The article continues with additional information that describes how Apple has tuned Multipath TCP to this specific use case. A description of the Multipath TCP handshake used by Siri has been published in a previous blog post.

While Multipath TCP was part of iOS, it was only used by Apple’s own Siri application. Regular applications could not leverage the benefits of Multipath TCP. This changed in 2017 with the launch of iOS11. During WWDC2017, Christoph Paasch and his colleagues announced that any application would be able to use Multipath TCP on iOS11.

../../../_images/christoph.png

A detailed summary of these announcements appears on the Tessares blog. iOS11 supports two modes of operation: Handover and Interactive.

In Handover mode, the connection starts over the WiFi link and no packet is sent over the cellular interface. If the signal gets worse, a new TCP subflow is created on the cellular interface automatically. The cellular subflow is removed once the user is back on a WiFi network.

../../../_images/handover-apple.png

The Interactive mode establishes both WiFi and cellular subflows for each Multipath TCP connection, even if the WiFi network appears to be working well. The objective of this mode is to reduce latency: the Multipath TCP scheduler selects the subflow that provides the lowest latency.
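The idea behind the interactive mode can be illustrated with a toy scheduler that always picks the usable subflow with the lowest estimated round-trip time. This is a simplified sketch; the names and structure are illustrative and do not reflect Apple’s or the Linux kernel’s actual scheduler:

```python
from dataclasses import dataclass

@dataclass
class Subflow:
    name: str
    srtt_ms: float        # smoothed round-trip-time estimate for this subflow
    cwnd_available: bool  # can this subflow accept another segment right now?

def pick_subflow(subflows):
    """Return the usable subflow with the lowest smoothed RTT,
    mimicking a latency-minimising packet scheduler."""
    usable = [s for s in subflows if s.cwnd_available]
    if not usable:
        return None
    return min(usable, key=lambda s: s.srtt_ms)

flows = [Subflow("wifi", srtt_ms=35.0, cwnd_available=True),
         Subflow("cellular", srtt_ms=60.0, cwnd_available=True)]
print(pick_subflow(flows).name)   # wifi has the lower RTT here
```

If the WiFi subflow becomes congested or lossy (its congestion window closes or its RTT estimate grows), the same rule naturally shifts traffic to the cellular subflow.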

../../../_images/interactive.png

Since the release of iOS11, some applications have started to use Multipath TCP. One of them is MultipathTester, an application written by Quentin De Coninck that allows users to compare the performance of Multipath TCP and Multipath QUIC on iOS11 [CB18]. You can download it from https://itunes.apple.com/us/app/multipathtester/id1351286809

../../../_images/mptester.png

References

[BS16]Olivier Bonaventure and SungHoon Seo. Multipath tcp deployments. IETF Journal, 12(2):24–27, 2016. URL: https://www.ietfjournal.org/multipath-tcp-deployments/.
[CB18]Quentin De Coninck and Olivier Bonaventure. Observing network handovers with multipath TCP. In Proceedings of the ACM SIGCOMM 2018 Conference on Posters and Demos, SIGCOMM 2018, Budapest, Hungary, August 20-25, 2018, 54–56. 2018. URL: https://multipath-quic.org/multipathtester/2018/08/28/sigcomm-poster.html, doi:10.1145/3234200.3234214.

]]>
Sat, 15 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/14/can_multipath_tcp_cope_with_middleboxes.html http://blog.multipath-tcp.org/blog/html/2018/12/14/can_multipath_tcp_cope_with_middleboxes.html <![CDATA[Can Multipath TCP cope with middleboxes ?]]> Can Multipath TCP cope with middleboxes ?

As explained in a previous blog post, Multipath TCP had to cope with a variety of middleboxes which could interfere with this TCP extension.

Shortly after we detected the first interferences between a firewall and Multipath TCP, Honda et al. presented a detailed analysis [HNR+11] of the limits of the extensibility of TCP based on Internet measurements. To correctly understand the problems caused by middleboxes, we first need to remember that they can operate in any layer of the protocol stack as illustrated in the figure below.

../../../_images/mbox1.png

When a router forwards an IPv4 packet that contains a TCP segment, it may modify some fields of the IPv4 header but never changes any field of the TCP header. This is one of the bases of the layering principle.

../../../_images/mbox2.png

Middleboxes are different. As they potentially operate in any layer of the protocol stack, they can potentially change any field of the packet headers, in any layer. Some of them also modify packet payloads.

../../../_images/mbox3.png

The main difficulty in such a network environment is that the TCP state on the client and on the server is updated based on information carried inside packets. When the information placed in these packets changes after their transmission by one of the communicating hosts, strange problems can occur. Several functions of Multipath TCP were designed to cope with middlebox interference. Here are a few examples:

  • During the three-way handshake, the client sends the MP_CAPABLE option in the third ack to cope with a middlebox that could remove it from the SYN+ACK
  • The ADD_ADDR, REMOVE_ADDR and MP_JOIN option contain an address identifier to cope with Network Address Translation
  • The DSS option uses relative sequence numbers to cope with middleboxes that randomize the initial TCP sequence number
  • The DSS option maps a block of data from the bytestream onto the TCP subflow. The length field of the DSS option makes it possible to cope with middleboxes (or fast NICs) that segment/reassemble packets
  • The DSS option contains a checksum to cope with middleboxes that add/remove bytes in the payload
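The relative-sequence-number trick can be made concrete with a short sketch (illustrative Python, not the kernel code). Because the DSS mapping expresses the subflow sequence number relative to the subflow’s initial sequence number, a middlebox that randomizes the absolute TCP sequence numbers does not invalidate the mapping:

```python
def dss_mapping(data_seq, subflow_seq, isn, length):
    """Build a DSS-style mapping: the subflow sequence number is expressed
    relative to the subflow's initial sequence number (ISN), so the mapping
    stays valid even if a middlebox rewrites absolute TCP sequence numbers."""
    return {"data_seq": data_seq,
            "relative_subflow_seq": (subflow_seq - isn) & 0xFFFFFFFF,
            "length": length}

# A sequence-number-randomizing middlebox shifts the ISN from 1000 to 98765,
# moving every absolute sequence number on the subflow by the same amount...
m1 = dss_mapping(data_seq=5000, subflow_seq=1000 + 500, isn=1000, length=1460)
m2 = dss_mapping(data_seq=5000, subflow_seq=98765 + 500, isn=98765, length=1460)
# ...but the relative offset carried in the DSS option is unchanged.
assert m1 == m2
print(m1)
```

The length field plays a similar role against resegmentation: the mapping covers a byte range, not individual packets, so it survives a middlebox that splits or coalesces segments.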

Multipath TCP and its implementation in the Linux kernel can cope with these interferences and others. This makes Multipath TCP very robust compared to older TCP extensions. An example with a strange middlebox was published in another blog post.

A detailed analysis of the reactions of Multipath TCP to these interferences was published in [HDP+13]. In some cases, Multipath TCP reacts by closing the subflow that passes through the middlebox. In other cases, it falls back to regular TCP. A summary of this analysis may be found in the table below.

../../../_images/mbox4.png

If you suspect that there is a middlebox that interferes with Multipath TCP connections on a path, you can use tracebox [Gre] to detect the location of this middlebox. Examples of the utilisation of tracebox on Linux/MacOS and Android appeared on earlier blog posts.

References

[Gre]Gregory Detal. tracebox. http://www.tracebox.org.
[HDP+13]Benjamin Hesmans, Fabien Duchene, Christoph Paasch, Gregory Detal, and Olivier Bonaventure. Are TCP Extensions Middlebox-proof? In Proceedings of the 2013 Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox). 2013. URL: https://inl.info.ucl.ac.be/publications/are-tcp-extensions-middlebox-proof.html.
[HNR+11]M. Honda, Y. Nishida, C. Raiciu, A. Greenhalgh, M. Handley, and H. Tokuda. Is it still possible to extend TCP? In Proceedings of the 2011 ACM SIGCOMM conference on Internet Measurement Conference (IMC). 2011.

]]>
Fri, 14 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/13/fixing_problems_before_the_submission_deadline.html http://blog.multipath-tcp.org/blog/html/2018/12/13/fixing_problems_before_the_submission_deadline.html <![CDATA[Fixing problems before the submission deadline]]> Fixing problems before the submission deadline

In the academic community, paper submission deadlines are sometimes strong incentives that encourage researchers to find solutions to problems that they ignored until then. While preparing the final version of a paper [RPB+12] that describes the design and the implementation of Multipath TCP, we thought that it would be interesting to add some measurement results to confirm that the protocol worked well for the important use case of combining the Wi-Fi and cellular interfaces on smartphones. We had already performed various experiments with such wireless networks and were expecting that the results could be obtained in a few hours.

Our initial objective was to meet one of the functional goals described in RFC 6182:

Improve Throughput: Multipath TCP MUST support the concurrent use of multiple paths. To meet the minimum performance incentives for deployment, a Multipath TCP connection over multiple paths SHOULD achieve no worse throughput than a single TCP connection over the best constituent path.

We created a small measurement setup in the lab, using two servers connected over Gigabit Ethernet with tc emulating the characteristics of the wireless links.

../../../_images/wifi0.png

We first verified that TCP could use each of the two wireless links when used alone. This was indeed the case, as shown in the figure below (source [RPB+12]).

../../../_images/wifi1.png

For this measurement, we looked at the impact of the receive window on the measured throughput. For TCP, the impact is low, except when the window is smaller than the bandwidth-delay product, which is not a surprise. We then ran the same experiments over the two interfaces with Multipath TCP. We were expecting some impact with a small window but did not anticipate the results shown below (source [RPB+12]).

../../../_images/wifi2.png

When the maximum window is large, Multipath TCP aggregates the cellular and the Wi-Fi interfaces as expected. However, when the receive window is smaller, Multipath TCP can transfer at a rate that is smaller than regular TCP’s. This result was annoying, and we were less than a week away from the submission deadline. It was difficult to submit the paper without describing this basic use case. We organised daily teleconferences to understand the problem and then try to solve it.

tcpdump helped us to understand the problem by collecting packet traces. The main issue was the difference between the delay of the cellular link and the delay of the Wi-Fi link. We frequently observed the following situation in the packet traces. The server sent many packets over the Wi-Fi interface and one over the cellular interface. The acknowledgements came back quickly over the Wi-Fi interface, but the sender frequently had to wait for an acknowledgement over the cellular interface. During these periods, the receive window was full and the sender could not transmit packets over the Wi-Fi link even though it was idle. This explained the reduced throughput with small receive windows.

Once identified, the problem could be solved. The solution is composed of two parts. First, when Multipath TCP detects that it is window-blocked and there is some unacknowledged data, it tries to re-inject this data over another subflow whose congestion window is open. If this data is acknowledged quickly, the receiver will advertise a large receive window that enables the sender to transmit. Unfortunately, this is not sufficient, as the same situation could happen again later. The second part of the solution is to penalise the slow subflow by halving its congestion window. These two elements of the solution fixed the problem over Wi-Fi and cellular.
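The two-part heuristic can be sketched in a few lines of illustrative Python. This is a simplified model under assumed names and data structures, not the actual kernel code:

```python
def handle_window_blocked(subflows, unacked):
    """Sketch of the opportunistic-retransmission-and-penalisation heuristic:
    when the connection is blocked by the receive window, re-inject the
    oldest unacknowledged data on another subflow whose congestion window
    is open, and halve the congestion window of the slow subflow."""
    slow = max(subflows, key=lambda s: s["rtt_ms"])  # subflow holding things up
    open_flows = [s for s in subflows
                  if s is not slow and s["cwnd"] > s["inflight"]]
    if not open_flows or not unacked:
        return None
    fast = min(open_flows, key=lambda s: s["rtt_ms"])
    fast.setdefault("reinjected", []).append(unacked[0])  # part 1: re-injection
    slow["cwnd"] = max(1, slow["cwnd"] // 2)              # part 2: penalisation
    return fast["name"]

wifi = {"name": "wifi", "rtt_ms": 30, "cwnd": 10, "inflight": 4}
cell = {"name": "cellular", "rtt_ms": 200, "cwnd": 8, "inflight": 8}
chosen = handle_window_blocked([wifi, cell], unacked=["seg-42"])
print(chosen, cell["cwnd"])   # segment re-injected on wifi; cellular cwnd halved to 4
```

Halving the slow subflow’s congestion window keeps fewer bytes in flight on the high-delay path, so the receive window is less likely to fill up while waiting for its acknowledgements.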

../../../_images/wifi3.png

This heuristic was later improved after a detailed experimental evaluation over a wide range of network conditions [PKB13].

References

[PKB13]Christoph Paasch, Ramin Khalili, and Olivier Bonaventure. On the benefits of applying experimental design to improve multipath tcp. In Proceedings of the ninth ACM conference on Emerging networking experiments and technologies, 393–398. ACM, 2013. URL: https://inl.info.ucl.ac.be/publications/benefits-applying-experimental-design-improve-multipath-tcp.
[RPB+12](1, 2, 3) C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, F. Duchene, O. Bonaventure, and M. Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI). 2012. URL: https://inl.info.ucl.ac.be/publications/how-hard-can-it-be-designing-and-implementing-deployable-multipath-tcp.html.

]]>
Thu, 13 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/12/multipath_tcp_inside_the_beast.html http://blog.multipath-tcp.org/blog/html/2018/12/12/multipath_tcp_inside_the_beast.html <![CDATA[Multipath TCP inside the beast]]> Multipath TCP inside the beast

One of the nice things about releasing open-source software such as the Multipath TCP implementation in the Linux kernel is that unexpected use cases appear. In early 2013, we were contacted by Niels Laukens, who works for VRT, the Dutch-speaking public broadcaster in Belgium. He had been following the project and identified a nice use case. Journalists increasingly use computers to prepare their articles, but also when they go off-site for interviews. Once an interview has been recorded, they often need to edit it locally before uploading it to the television’s services, which broadcast it or place it on the web site.

For live video, they often rely on dedicated satellite channels, but these are expensive and require a large antenna. Such antennas are fine when an event is planned and large coverage is needed. However, there are many situations where they cannot send a large team to record interviews and short movies. To cover those cases, they have equipped a small Mini that serves as a mobile studio. A single journalist can record an interview, edit it and then send it over the air. This last part is the most interesting one for us. Satellite links are expensive and there are many situations where it is difficult to use a satellite. 3G, 4G and Wi-Fi could help, but their performance differs. Asking each journalist to learn to select the best network to upload his work was not a feasible solution. Fortunately, Niels found the right solution with Multipath TCP. The Mini is equipped with a simple Multipath TCP proxy that is attached to all the available networks. The journalist uses his/her regular laptop through the proxy to upload his/her movies via all the available connections. This is much faster and simpler than always moving the car to a location where the satellite works well.

VRT published a nice video of their Mini, which is internally called “The Beast”:

https://www.youtube.com/watch?v=JMRWq7aqi9o

]]>
Wed, 12 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/11/multipath_tcp_in_the_datacenter.html http://blog.multipath-tcp.org/blog/html/2018/12/11/multipath_tcp_in_the_datacenter.html <![CDATA[Multipath TCP in the datacenter]]> Multipath TCP in the datacenter

In the scientific literature, one of the first important use cases for Multipath TCP was to distribute load inside datacenters. Several architectures have been proposed for datacenters. They differ in how links are organised, but all offer multiple paths between the servers. Measurement studies have shown that datacenter traffic is composed of a lot of short, delay-sensitive flows called mice, but most of the data is transported in long flows, called elephants, that consume most of the bandwidth and can compete with the mice. One of the problems in a datacenter is that congestion can happen on some of the network links while others are unused. This is illustrated in the figure below, which shows two TCP connections competing for the same link.

../../../_images/dc-1.png

This problem was studied through simulations by Raiciu et al. [RBP+11]. They demonstrated that these collisions between competing flows significantly impact the performance of TCP.

../../../_images/dc-2.png

Different techniques have been explored in the literature to solve this problem. Many of the proposed approaches used a centralised controller with Openflow or other similar techniques to reroute flows to avoid congestion.

../../../_images/dc-3.png

With Multipath TCP, a completely distributed solution is possible. It leverages the utilisation of Equal Cost Multipath (ECMP) on datacenter switches. When a router/switch has several paths having the same cost towards a given destination, it can send packets over any of these paths. To maximise load-balancing, routers install all the available paths in their forwarding tables and balance the arriving packets over all of them. To ensure that all the packets that correspond to the same layer-4 flow follow the same path, and thus experience roughly the same delay, routers usually select the outgoing equal-cost path by computing H(IPsrc || IPdst || Portsrc || Portdst) mod n, where n is the number of equal-cost paths towards the packet’s destination and H is a hash function.
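A toy version of this per-flow hashing (illustrative Python, not an actual router implementation) shows why all packets of one TCP flow follow the same path while flows with different source ports spread over the available paths:

```python
import hashlib

def ecmp_path(ip_src, ip_dst, port_src, port_dst, n_paths):
    """Select an equal-cost path as H(IPsrc || IPdst || Portsrc || Portdst) mod n:
    every packet of a given four-tuple maps to the same outgoing path."""
    key = f"{ip_src}|{ip_dst}|{port_src}|{port_dst}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_paths

# Every packet of one flow deterministically hashes to the same path...
p1 = ecmp_path("10.0.0.1", "10.0.1.2", 45678, 80, n_paths=4)
assert p1 == ecmp_path("10.0.0.1", "10.0.1.2", 45678, 80, n_paths=4)

# ...while subflows that differ only in their source port (the ndiffports
# idea) are likely to hash to different paths.
paths = {ecmp_path("10.0.0.1", "10.0.1.2", sport, 80, 4)
         for sport in range(45000, 45050)}
print(sorted(paths))
```

Real switches typically use a cheaper hash than SHA-256, but the property that matters here is only that the hash is deterministic per four-tuple.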

A consequence of this utilisation of ECMP is that TCP connections with different source ports between two hosts will sometimes follow different paths. This motivated the design of the ndiffports path manager in the Linux kernel. This path manager opens different subflows using the same source and destination IP addresses and the same destination port, but different source ports. The benefit of this approach is that the different subflows of a Multipath TCP connection will likely follow different paths inside the datacenter. With this path manager, Multipath TCP improves the utilisation of the datacenter, as illustrated by the simulation results below.

../../../_images/dc-4.png

One of the limiting factors of ECMP is that flows with different source ports may still use the same paths. This problem can be fixed by using a reversible hash function [DPVDL+13].

From an operational viewpoint, the most convincing argument of [RBP+11] was that similar results were obtained in real datacenters, using Amazon EC2 servers, with the Linux implementation of Multipath TCP [PBarreD+14].

../../../_images/dc-5.png

The SIGCOMM’11 article [RBP+11] attracted a lot of interest in the scientific community and is one of the most widely cited Multipath TCP articles. However, as of 2018, no real deployment of Multipath TCP in the datacenter has been publicly documented.

References

[DPVDL+13]Gregory Detal, Christoph Paasch, Simon Van Der Linden, Pascal Merindol, Gildas Avoine, and Olivier Bonaventure. Revisiting flow-based load balancing: stateless path selection in data center networks. Computer Networks, 57(5):1204–1216, 2013. URL: https://inl.info.ucl.ac.be/publications/revisiting-flow-based-load-balancing-stateless-path-selection-data-center-networks.html.
[PBarreD+14]C. Paasch, S. Barré, G. Detal, F. Duchene, and others. Linux kernel implementation of Multipath TCP. https://www.multipath-tcp.org, 2014.
[RBP+11](1, 2, 3) C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley. Improving Datacenter Performance and Robustness with Multipath TCP. ACM SIGCOMM Computer Communication Review (CCR), 41(4):266–277, 2011. URL: https://inl.info.ucl.ac.be/publications/improving-datacenter-performance-and-robustness-multipath-tcp.html.

]]>
Tue, 11 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/10/the_first_multipath_tcp_enabled_smartphones.html http://blog.multipath-tcp.org/blog/html/2018/12/10/the_first_multipath_tcp_enabled_smartphones.html <![CDATA[The first Multipath TCP enabled smartphones]]> The first Multipath TCP enabled smartphones

Smartphones are a very important use case for Multipath TCP, with two very large-scale deployments:

  • Apple uses Multipath TCP on iPhones and iPads. In 2013, the initial motivation was to support fast handovers for the Siri voice-recognition application. Since iOS11, any iOS application can use Multipath TCP
  • High-end Samsung and LG smartphones in South Korea use Multipath TCP to bond their Wi-Fi and cellular interfaces for the Giga LTE service and reach a bandwidth of up to 1 Gbps.

Although the designers of Multipath TCP were aware of the importance of smartphones, the initial focus of the design was to enable resource pooling. A first analysis of the benefits of Multipath TCP appeared in [RNBH11], with experiments carried out with a userspace implementation running on laptops.

The initial support for Multipath TCP on smartphones started with interactions with the engineers of a young Irish company called Multipath Networks. They had developed a first prototype that bonds several access networks together by using Multipath TCP. To cover new use cases, such as in-vehicle data transfers, they were testing the ability of Multipath TCP to quickly recover from the failure of any network interface. At that time, Multipath TCP was mainly used on servers and laptops and this part of the code was not heavily tested. Thanks to their detailed tests, we could improve the Multipath TCP implementation in the Linux kernel to better react to link failures.

When a smartphone moves, it may switch frequently from one access network to another. We explored these handovers in more detail in [PDD+12], leveraging the recent updates to the Linux implementation. A key result of this paper was to demonstrate that Multipath TCP can provide fast handovers, as shown in the figure below.

../../../_images/handover.png

Our measurements indicated that an application could take up to 3 seconds to recover from a network failure with TCP. For Multipath TCP, we proposed two different modes of operation: Full MPTCP and the Backup Mode. With Full MPTCP, the two subflows are used simultaneously, while the Backup Mode uses the MP_PRIO option to indicate that a subflow should only be used once the other subflow fails.
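The difference between the two modes can be sketched as follows (illustrative Python under assumed names, not the actual implementation):

```python
def select_subflows(subflows, mode):
    """Return the subflows eligible to carry data.
    'full'   : all live subflows are used simultaneously (Full MPTCP).
    'backup' : subflows flagged as backup (via MP_PRIO) are used only
               once every non-backup subflow has failed."""
    active = [s for s in subflows if s["alive"]]
    if mode == "full":
        return active
    primary = [s for s in active if not s["backup"]]
    return primary if primary else [s for s in active if s["backup"]]

wifi = {"name": "wifi", "alive": True, "backup": False}
cell = {"name": "cellular", "alive": True, "backup": True}

print([s["name"] for s in select_subflows([wifi, cell], "full")])    # both subflows
print([s["name"] for s in select_subflows([wifi, cell], "backup")])  # wifi only
wifi["alive"] = False
print([s["name"] for s in select_subflows([wifi, cell], "backup")])  # cellular takes over
```

The Backup Mode trades some aggregation for lower energy and data cost: the cellular subflow is already established, so the switch after a Wi-Fi failure does not require a new handshake.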

This paper was also the first to describe measurements with Multipath TCP on a real smartphone: the Nokia N950.

../../../_images/n950.png

It was mainly used to study the energy consumption of the cellular and Wi-Fi interfaces when using Multipath TCP.

../../../_images/n950-energy.png

Unfortunately, this developer phone never became widely used. We then performed some experiments with Samsung Galaxy S2 smartphones and later obtained a dozen Nexus 5 devices. The Samsung Galaxy S2 was used to record a nice video that demonstrates how Multipath TCP enables smooth handovers.

The video shows two live video streams. The smartphone uses a Wi-Fi hotspot provided by a laptop that captures packets on its Wi-Fi interface and displays them on the right part of the screen. With regular TCP, the video stream stops when the smartphone leaves the area covered by the Wi-Fi access point. With Multipath TCP, the video stream switches automatically to the cellular interface and later comes back to Wi-Fi when the user returns.

Several researchers have explored the performance of the Linux Multipath TCP implementation on smartphones. De Coninck et al. configured a set of Nexus devices to use a SOCKS proxy [CBHB16a]. This is similar to the GigaLTE deployment mentioned earlier, but at a much smaller scale. This analysis captured 71 million packets over more than 7 weeks. Using the same devices, the same team of researchers analysed the behavior of the most popular smartphone applications over Multipath TCP [CBHB16b]. A similar study was published in [NGQ+16]. Later, De Coninck proposed several improvements to the Multipath TCP implementation to tune it for interactive applications running on smartphones [CB18].

As of December 2018, there is no published detailed analysis of the performance of Multipath TCP on iPhones, despite the large user base. We would be very interested in collecting packet traces on servers that are used by iPhone applications that have enabled Multipath TCP. Feel free to contact me if you have deployed such an application.

References

[CBHB16a]Quentin De Coninck, Matthieu Baerts, Benjamin Hesmans, and Olivier Bonaventure. A first analysis of multipath tcp on smartphones. In 17th International Passive and Active Measurements Conference, volume 17. Springer, March-April 2016. URL: https://inl.info.ucl.ac.be/publications/first-analysis-multipath-tcp-smartphones.html.
[CBHB16b]Quentin De Coninck, Matthieu Baerts, Benjamin Hesmans, and Olivier Bonaventure. Observing real smartphone applications over multipath tcp. IEEE Communications Magazine, Network Testing Series, March 2016. URL: https://inl.info.ucl.ac.be/publications/observing-real-smartphone-applications-over-multipath-tcp.html.
[CB18]Quentin De Coninck and Olivier Bonaventure. Tuning multipath tcp for interactive applications on smartphones. In IFIP Networking 2018. 2018. URL: https://inl.info.ucl.ac.be/publications/tuning-multipath-tcp-interactive-applications-smartphones.html.
[NGQ+16]Ashkan Nikravesh, Yihua Guo, Feng Qian, Z Morley Mao, and Subhabrata Sen. An in-depth understanding of multipath tcp on mobile devices: measurement and system design. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, 189–201. ACM, 2016.
[PDD+12]C. Paasch, G. Detal, F. Duchene, C. Raiciu, and O. Bonaventure. Exploring Mobile/WiFi Handover with Multipath TCP. In Proceedings of the 2012 ACM SIGCOMM workshop on Cellular Networks: Operations, Challenges, and Future Design (CellNet). 2012. URL: https://inl.info.ucl.ac.be/publications/exploring-mobilewifi-handover-multipath-tcp.html.
[RNBH11]Costin Raiciu, Dragos Niculescu, Marcelo Bagnulo, and Mark James Handley. Opportunistic mobility with multipath tcp. In Proceedings of the sixth international workshop on MobiArch, 7–12. ACM, 2011.

]]>
Mon, 10 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/09/the_multipath_tcp_packet_scheduler.html http://blog.multipath-tcp.org/blog/html/2018/12/09/the_multipath_tcp_packet_scheduler.html <![CDATA[Why is the Multipath TCP scheduler so important ?]]> Why is the Multipath TCP scheduler so important ?

Multipath TCP can pool several links together. An important use case for Multipath TCP is smartphones and tablets equipped with both 3G and WiFi interfaces. On such devices, Multipath TCP establishes two subflows, one over the WiFi interface and one over the 3G interface. Once the two subflows have been established, one of the main decisions taken by Multipath TCP is the scheduling [1] of the packets over the different subflows.

This scheduling decision is very important because it can impact performance and quality of experience. In the current implementation of Multipath TCP in the Linux kernel, the scheduler always prefers the subflow with the smallest round-trip-time to send data. A typical example of the operation of this scheduler is shown in the demo below from the https://www.multipath-tcp.org web site :

In this demo, the Multipath TCP client uses SSH over Multipath TCP to connect to a server that exports a screensaver over the SSH session. The client has three interfaces : WiFi, 3G and Ethernet. Multipath TCP continuously measures the round-trip-time every time it sends data over any of these subflows. The Ethernet subflow has the lowest round-trip-time, WiFi has a slightly higher round-trip-time and 3G has the worst. The SSH session is usually not limited by the network throughput, so all subflows are available every time data needs to be transmitted. When Ethernet is available, it is preferred over the other interfaces. WiFi is preferred over 3G, and 3G is only used when the two other interfaces are unavailable.
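The behaviour described above amounts to a minimum-RTT selection among the subflows that can currently send. A minimal Python sketch of this idea, with illustrative field names rather than the kernel's data structures (the real implementation also handles many corner cases):

```python
# Minimal sketch of a lowest-RTT scheduler; field names are illustrative.
def select_subflow(subflows):
    """Pick the available subflow with the smallest smoothed RTT.

    A subflow is available when it is up and still has room in its
    congestion window for more in-flight data.
    """
    available = [sf for sf in subflows
                 if sf["up"] and sf["inflight"] < sf["cwnd"]]
    if not available:
        return None                 # nothing can send right now
    return min(available, key=lambda sf: sf["srtt_ms"])

subflows = [
    {"name": "ethernet", "up": True, "srtt_ms": 2,  "cwnd": 10, "inflight": 3},
    {"name": "wifi",     "up": True, "srtt_ms": 15, "cwnd": 10, "inflight": 0},
    {"name": "3g",       "up": True, "srtt_ms": 80, "cwnd": 10, "inflight": 0},
]
print(select_subflow(subflows)["name"])   # ethernet
```

When Ethernet goes down or its window fills up, the same rule naturally falls back to WiFi, then to 3G, matching the preference order seen in the demo.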

Sending data over the subflow with the smallest round-trip-time is not sufficient to achieve good performance on memory constrained devices that use a small receive window. This problem was first explored in [RPB+12], where reinjections and penalizations were proposed to mitigate the head-of-line blocking that can occur when the receiver advertises a limited receive window. The typical scenario is a smartphone using 3G and WiFi where 3G is slower than WiFi. If the receiver is window-limited, it might happen that a packet is sent on the 3G subflow and the WiFi subflow then becomes blocked due to the limited receive window. In this case, the algorithm proposed in [RPB+12] reinjects the unacknowledged data from the 3G subflow on the WiFi subflow and reduces the congestion window on the 3G subflow. This problem has been analyzed in more detail in [PKB13] by considering a large number of scenarios. This analysis resulted in various improvements to the Linux Multipath TCP implementation [PBarreD+14]. A detailed analysis of the performance of the current packet schedulers has been published in [PFAB14].
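The reinjection-and-penalization idea from [RPB+12] can be sketched in a few lines. This is a hedged illustration, not kernel code: the field names and the `send` callback are invented for the example.

```python
# Hedged sketch of reinjection and penalization from [RPB+12];
# field names and the send callback are illustrative, not kernel code.
def handle_window_stall(fast, slow, send):
    """Called when the connection is blocked by the receive window while
    data is still outstanding on the slow subflow."""
    # Reinject the slow subflow's unacknowledged data on the fast subflow
    # so that the data-level sequence space can advance.
    for segment in slow["unacked"]:
        send(fast, segment)
    # Penalize the slow subflow: halve its congestion window so it is
    # less likely to stall the connection again.
    slow["cwnd"] = max(slow["cwnd"] // 2, 1)

sent = []
fast = {"name": "wifi", "cwnd": 20}
slow = {"name": "3g", "cwnd": 8, "unacked": ["dsn-1000", "dsn-2000"]}
handle_window_stall(fast, slow, lambda sf, seg: sent.append((sf["name"], seg)))
```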

../../../_images/mptcp-scheduler.png

Several researchers have proposed other types of schedulers for Multipath TCP or other transport protocols. In theory, if a scheduler has perfect knowledge of the network characteristics (bandwidth, delay), it could optimally schedule the packets that are transmitted to prevent head-of-line blocking problems and minimize buffer occupancy. In practice, and in a real implementation, this is more difficult because the delay varies and the bandwidth is unknown and varies as a function of the other TCP connections.

The Delay-Aware Packet Scheduling for Multipath Transport proposed in [KLM+14] is a recent example of such schedulers. [KLM+14] considers two paths with different delays and generates a schedule, i.e. a list of sequence numbers to be transmitted over the different paths. Some limitations of the proposed scheduler are listed in [KLM+14], notably : the DAPS scheduler assumes that there is a large difference in delays between the different paths and that the congestion windows are stable. In practice, these conditions are not always true and a scheduler should operate in all situations. [KLM+14] implements the proposed scheduler in the ns-2 CMT simulator and evaluates its performance in small networks.

Another scheduler is proposed in [YAE13]. This scheduler tries to estimate the available capacity on each subflow and measures the number of bytes transmitted over each subflow. This enables the scheduler to detect when a subflow is sending too much data and to select the other subflow at that time. The proposed scheduler is implemented in the Linux kernel, but unfortunately the source code does not seem to have been released by the authors of [YAE13]. The performance of the scheduler is evaluated by considering a simulation scenario with very long file transfers in a network with a very small amount of buffering. It is unclear whether this represents a real use case for Multipath TCP.

Since the publication of [PFAB14], various schedulers have been proposed and evaluated based on simulations and measurements. A detailed review of these schedulers would be much longer than a blog post. Here are a few pointers to recent and interesting papers.

  • Corbillon et al. propose a cross-layer scheduler that is optimised for video content [CAPK+16]
  • Kimura et al. propose three alternate Multipath TCP schedulers [KLL17]
  • Several researchers have proposed Multipath TCP schedulers that transmit the same packet over different subflows [WZS16], [FErbshausserB+16], [HKV16], [FSS+18]
  • De Coninck proposes and implements a scheduler that is tuned for Multipath TCP applications running on smartphones [CB18]. In a nutshell, this scheduler tries to send packets over the last subflow from which data was received. This enables it to better support handovers from cellular to Wi-Fi compared to classical schedulers.

The most interesting approach to solving the Multipath TCP scheduling problem was proposed by Frömmgen et al. [FrommgenRErbshausser+17]. This paper proposes a high-level API that enables application developers to inject eBPF code containing their scheduling decisions directly in the Linux kernel. Such a programming model provides a lot of flexibility and makes it possible to solve this problem in a generic manner. Unfortunately, the code developed for this paper has not been submitted for inclusion in the Linux kernel implementation.

It can be expected that researchers will continue to propose schedulers for Multipath TCP and other multipath transport protocols (a new scheduler [RHB18] has already been proposed for the recent Multipath QUIC [DCB17]). There is still room for improvement in the Multipath TCP scheduler. However, to be convincing, the evaluation of a new scheduler should not be limited to small-scale simulations. It should consider a wide range of scenarios like [PKB13] and demonstrate that the scheduler can be efficiently implemented in the Linux kernel. Ideally, it should provide a flexible API that leverages the eBPF VM in the Linux kernel, as proposed in [FrommgenRErbshausser+17].

[1]A first version of this post was published in 2014

References

[CB18]Quentin De Coninck and Olivier Bonaventure. Tuning multipath tcp for interactive applications on smartphones. In IFIP Networking 2018. 2018. URL: https://inl.info.ucl.ac.be/publications/tuning-multipath-tcp-interactive-applications-smartphones.html.
[CAPK+16]Xavier Corbillon, Ramon Aparicio-Pardo, Nicolas Kuhn, Géraldine Texier, and Gwendal Simon. Cross-layer scheduler for video streaming over mptcp. In Proceedings of the 7th International Conference on Multimedia Systems, 7. ACM, 2016.
[DCB17]Quentin De Coninck and Olivier Bonaventure. Multipath quic: design and evaluation. In Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies, 160–166. ACM, 2017. URL: https://www.multipath-quic.org.
[FSS+18]Benevid Felix, Igor Steuck, Aldri Santos, Stefano Secci, and Michele Nogueira. Redundant packet scheduling by uncorrelated paths in heterogeneous wireless networks. In 2018 IEEE Symposium on Computers and Communications (ISCC), 00498–00503. IEEE, 2018.
[FErbshausserB+16]Alexander Frommgen, Tobias Erbshäußer, Alejandro Buchmann, Torsten Zimmermann, and Klaus Wehrle. Remp tcp: low latency multipath tcp. In Communications (ICC), 2016 IEEE International Conference on, 1–7. IEEE, 2016.
[FrommgenRErbshausser+17](1, 2) Alexander Frömmgen, Amr Rizk, Tobias Erbshäußer, Max Weller, Boris Koldehofe, Alejandro Buchmann, and Ralf Steinmetz. A programming model for application-defined multipath tcp scheduling. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, 134–146. ACM, 2017.
[HKV16]Axel Hunger, Pascal A Klein, and Martin H Verbunt. Evaluation of the redundancy-bandwidth trade-off and jitter compensation in rmptcp. In New Technologies, Mobility and Security (NTMS), 2016 8th IFIP International Conference on, 1–5. IEEE, 2016.
[KLL17]Bruno YL Kimura, Demetrius CSF Lima, and Antonio AF Loureiro. Alternative scheduling decisions for multipath tcp. IEEE Communications Letters, 21(11):2412–2415, 2017.
[KLM+14](1, 2, 3, 4) Nicolas Kuhn, Emmanuel Lochin, Ahlem Mifdaoui, Golam Sarwar, Olivier Mehani, and Roksana Boreli. Daps: intelligent delay-aware packet scheduling for multipath transport. In Communications (ICC), 2014 IEEE International Conference on, 1222–1227. IEEE, 2014.
[PBarreD+14]C. Paasch, S. Barré, G. Detal, F. Duchene, and others. Linux kernel implementation of Multipath TCP. https://www.multipath-tcp.org, 2014.
[PFAB14](1, 2) Christoph Paasch, Simone Ferlin, Ozgu Alay, and Olivier Bonaventure. Experimental evaluation of multipath tcp schedulers. In ACM SIGCOMM Capacity Sharing Workshop (CSWS). ACM, 2014. URL: https://inl.info.ucl.ac.be/publications/experimental-evaluation-multipath-tcp-schedulers.html.
[PKB13](1, 2) Christoph Paasch, Ramin Khalili, and Olivier Bonaventure. On the benefits of applying experimental design to improve multipath tcp. In Proceedings of the ninth ACM conference on Emerging networking experiments and technologies, 393–398. ACM, 2013. URL: https://inl.info.ucl.ac.be/publications/benefits-applying-experimental-design-improve-multipath-tcp.
[RHB18]Alexander Rabitsch, Per Hurtig, and Anna Brunstrom. A stream-aware multipath quic scheduler for heterogeneous paths. In Proceedings of the Workshop on the Evolution, Performance, and Interoperability of QUIC, 29–35. ACM, 2018.
[RPB+12](1, 2) C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, F. Duchene, O. Bonaventure, and M. Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI). 2012. URL: https://inl.info.ucl.ac.be/publications/how-hard-can-it-be-designing-and-implementing-deployable-multipath-tcp.html.
[WZS16]Wei Wang, Liang Zhou, and Yi Sun. Improving multipath tcp for latency sensitive flows in the cloud. In Cloud Networking (Cloudnet), 2016 5th IEEE International Conference on, 45–50. IEEE, 2016.
[YAE13](1, 2) Fan Yang, Paul Amer, and Nasif Ekiz. A scheduler for multipath tcp. In Computer Communications and Networks (ICCCN), 2013 22nd International Conference on, 1–7. IEEE, 2013.

]]>
Sun, 09 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/08/the_evolution_of_the_mp_capable_option.html http://blog.multipath-tcp.org/blog/html/2018/12/08/the_evolution_of_the_mp_capable_option.html <![CDATA[The evolution of the MP_CAPABLE option]]> The evolution of the MP_CAPABLE option

A TCP connection always starts with a three-way handshake. The client sends a SYN packet that contains the TCP options that it wants to negotiate with the server. The server replies with the TCP options that it supports. Like most TCP extensions (the only known exception is the support for Explicit Congestion Notification defined in RFC 3168, which uses bits of the TCP header for its negotiation), Multipath TCP defines the MP_CAPABLE option that needs to be used in the SYN.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE\n\n"];
|||;
c=>s [label="ACK\n\n"];
}

This option first appeared in draft-ford-mptcp-multiaddressed-00. Its evolution since 2009 reflects the evolution of the protocol and some of the lessons learned during its design and based on experiments.

../../../_images/draft-1.png

This first draft proposed the following format for the MP_CAPABLE option.

../../../_images/mpc-1.png

This initial format contains several interesting fields. Like most TCP options, the MP_CAPABLE option is encoded using a type-length-value format. draft-ford-mptcp-multiaddressed-00 decided to use a different option type for each Multipath TCP option. At that time, this was considered to be normal practice. The Selective Acknowledgement option (RFC 2018) also defined two TCP options: one to negotiate its utilization in the SYN packet and one to carry the selective acknowledgements. Besides the Kind and Length fields, this first MP_CAPABLE option contained a Version number (the protocol designers already anticipated the possibility of having different versions of Multipath TCP) and a 32-bit Sender Token.
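For illustration, the generic type-length-value shape of such an option can be sketched as follows. The kind value and payload layout here are assumptions for the sketch, not the exact bit layout of draft-ford-mptcp-multiaddressed-00:

```python
import struct

# Illustrative type-length-value encoding of a TCP option carrying a
# 32-bit token. The kind value and payload layout are assumptions for
# the sketch, not the exact bit layout of the draft.
def encode_option(kind: int, payload: bytes) -> bytes:
    # Kind (1 byte), Length of the whole option (1 byte), then the value.
    return struct.pack("!BB", kind, 2 + len(payload)) + payload

def decode_option(raw: bytes):
    kind, length = struct.unpack("!BB", raw[:2])
    return kind, raw[2:length]

token = struct.pack("!I", 0x12345678)      # a 32-bit sender token
kind, value = decode_option(encode_option(30, token))
```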

TCP uses IP addresses and port numbers to identify connections. Each packet belonging to a connection carries the source and destination addresses and ports that uniquely identify it. Multipath TCP cannot rely on this to identify connections since each Multipath TCP connection is composed of a set of TCP connections, called subflows, which can vary over time. As indicated in RFC 6182, Multipath TCP uses unique identifiers that are called tokens to identify connections. Each host generates a token that identifies each connection locally. The MP_CAPABLE option carries the token chosen by the client in the SYN and the token chosen by the server in the SYN+ACK.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE(Client Token)\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE(Server Token)\n\n"];
|||;
c=>s [label="ACK\n\n"];
}

The first change to this important option appeared in draft-ford-mptcp-multiaddressed-03.txt which was published in March 2010.

../../../_images/mpc-2.png

When a TCP connection starts, the client selects its initial sequence number and places it inside the SYN. The server does the same with its SYN+ACK. In addition to the TCP sequence numbers that are included in the TCP header, Multipath TCP uses Data Sequence Numbers (DSN) that are placed in the DSN option. These DSNs are used to number the data that is sent over the different TCP subflows. The first drafts assumed that the DSN would start at 0 in both directions when a Multipath TCP connection is created. However, this choice raised security concerns, since an attacker could easily predict the content of the DSN option needed to inject data inside a Multipath TCP connection. The normal way to solve such a problem is to use a random value for the initial DSN in each direction. The second version of the MP_CAPABLE option carries this value.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE(Client Token+IDSN)\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE(Server Token+IDSN)\n\n"];
|||;
c=>s [label="ACK\n\n"];
}

This draft was adopted by the working group as draft-ietf-mptcp-multiaddressed-00. The MP_CAPABLE option was modified again in draft-ietf-mptcp-multiaddressed-02. The new format is shown below.

../../../_images/mpc-3.png

The new format still contains a version number but it adds two keys. The client sends an MP_CAPABLE option that contains the sender key in the SYN, while the server replies with a SYN+ACK that contains an MP_CAPABLE option with both the Sender and the Receiver keys. These keys were introduced to improve the security of the addition of new subflows with the MP_JOIN option. The client and the server exchange their keys during the three-way handshake and later use them to prove their ownership of the connection when they add a new subflow. A simple solution to carry these keys would have been to extend the MP_CAPABLE option shown earlier. Unfortunately, it would have become difficult to open a Multipath TCP connection given the limited length of the TCP extended header.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE(Client Key)\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE(Server Key)\n\n"];
|||;
c=>s [label="ACK\n\n"];
}

These two keys help to authenticate the establishment of subflows, but we saw earlier that Multipath TCP hosts need to exchange tokens and agree on initial sequence numbers and that this information was included in the MP_CAPABLE option. With the new format, we had to find a way to compute these tokens and initial data sequence numbers from the keys. draft-ietf-mptcp-multiaddressed-02 computes them with a hash function and uses the high (resp. low) order bits of this hash as the token (resp. initial data sequence number).
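This derivation is easy to sketch. Assuming the SHA-1 hash that was eventually standardised in RFC 6824, a 64-bit key yields the 32-bit token and the 64-bit initial data sequence number as follows:

```python
import hashlib
import struct

# Derive the connection token and initial data sequence number from a
# 64-bit key: hash the key and split the digest. The SHA-1 choice matches
# what RFC 6824 eventually standardised; the field sizes (32-bit token,
# 64-bit IDSN) are those described in the text.
def token_and_idsn(key: bytes):
    assert len(key) == 8                          # keys are 64 bits
    digest = hashlib.sha1(key).digest()
    token = struct.unpack("!I", digest[:4])[0]    # high-order 32 bits
    idsn = struct.unpack("!Q", digest[-8:])[0]    # low-order 64 bits
    return token, idsn

token, idsn = token_and_idsn(bytes(range(1, 9)))
```

The same derivation has to be run for every freshly generated key, so that the host can check that the resulting token does not already identify an existing connection.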

Thanks to these hashes, we could reduce the length of the MP_CAPABLE option. However, this created a small risk of collision. When a host generates a random 64-bit key, it must verify that the hash of this key does not collide with the token of an existing Multipath TCP connection. The figure below, extracted from [RPB+12], shows that the performance impact of this verification is not too high.

../../../_images/collision.png

Two modifications appeared in draft-ietf-mptcp-multiaddressed-03. The first one is that this document requests a single TCP Option Kind for all Multipath TCP options. This was mainly because we feared that a middlebox could drop one type of Multipath TCP option and not the other. Dealing with such corner cases would have made the protocol much more complex. We hoped that if a middlebox decided to drop unknown TCP options, it would do so based on their TCP Option Kind. Such a middlebox would thus either remove the MP_CAPABLE option from the SYN packet or discard such a SYN packet. Multipath TCP clients already coped with the latter by removing the MP_CAPABLE option after 3 unsuccessful attempts to transmit a SYN that carries this option.

../../../_images/mpc-4.png

The second modification introduced in draft-ietf-mptcp-multiaddressed-03 is the definition of two flags : S and C. S was intended to negotiate the hash function used to compute the tokens and the initial sequence numbers from the keys. C negotiates the utilization of the DSN checksum, which will be explained in a subsequent blog post.

The format of this option changed again in draft-ietf-mptcp-multiaddressed-10 to clarify the different flags.

../../../_images/mpc-5.png

Besides the changes to the format of the MP_CAPABLE option, there were also changes to the processing of this option. As middleboxes such as firewalls could block SYNs that contain an unknown option, the first Multipath TCP draft, draft-ford-mptcp-multiaddressed-00, already suggested that a client should only set the MP_CAPABLE option in the first few transmissions of a SYN and fall back to TCP if it does not receive any SYN+ACK in response. This is illustrated in the figure below.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
fw [label="Firewall", linecolour=red],
s [label="Server", linecolour=black];
|||;
c=>fw [ label = "SYN MP_CAPABLE"];
|||;
c=>fw [ label = "SYN MP_CAPABLE"];
|||;
c=>fw [ label = "SYN MP_CAPABLE"];
|||;
c=>s [ label = "SYN \n\n"];
|||;
s=>c [label = "SYN+ACK \n\n"];
|||;
c=>s [label="ACK\n\n"];
}
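The retry-then-fallback behaviour shown above can be sketched as follows; `send_syn` is a hypothetical stand-in for one connection attempt, not a real socket API:

```python
# Sketch of the fallback behaviour: retry the SYN with MP_CAPABLE a few
# times, then fall back to plain TCP. send_syn is a hypothetical stand-in
# for one connection attempt; it returns True when a SYN+ACK comes back.
MAX_MPTCP_SYN_RETRIES = 3

def connect(send_syn):
    for _ in range(MAX_MPTCP_SYN_RETRIES):
        if send_syn(use_mptcp=True):
            return "mptcp"
    # No SYN+ACK: a middlebox is probably dropping SYNs that carry the
    # unknown option, so retry without MP_CAPABLE.
    if send_syn(use_mptcp=False):
        return "tcp"
    return None        # the server is simply unreachable

# A firewall that silently drops every SYN carrying MP_CAPABLE:
print(connect(lambda use_mptcp: not use_mptcp))   # tcp
```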

Another issue with Multipath TCP is that exchanging keys in the SYN and SYN+ACK forces the server to maintain state for each connection attempt upon reception of the client SYN. This could lead to the denial-of-service attacks that have affected TCP implementations in the past (RFC 4987). Since draft-ietf-mptcp-multiaddressed-02, Multipath TCP copes with this problem by allowing the server to be stateless. For this, the client needs to retransmit its key and the server key inside an MP_CAPABLE option carried in the third ACK.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE(Client Key)\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE(Server Key)\n\n"];
|||;
c=>s [label="ACK MP_CAPABLE(Client and Server Keys)\n\n"];
}

The presence of the MP_CAPABLE option in the third packet also confirms the utilization of Multipath TCP to the server.

At this point, RFC 6824 was published in January 2013. In September 2013, Apple deployed Multipath TCP on all iPhones to support the Siri application. This deployment will be discussed in a specific blog post.

Based on the feedback received from implementors and given the successful deployments, the MPTCP working group started in late 2013 to work on a revision of RFC 6824 : draft-ietf-mptcp-rfc6824bis-00. A first change was proposed in draft-ietf-mptcp-rfc6824bis-05. This was an attempt to reduce the length of the options in the SYN packet.

../../../_images/mpc-6.png

The client sends a small MP_CAPABLE option that only contains the first four bytes. The server replies with its key, and the client then echoes the client and server keys in the third ACK. The MP_CAPABLE option is also placed in the first packet that carries data. This choice was motivated by the fact that the third ACK is not transmitted reliably. If this packet does not reach the server, the server will not be aware of the client key. By placing the MP_CAPABLE option inside a packet that carries data, we ensure that it will eventually reach the server.

msc {
width=800, arcgradient = 4;

c [label="Client", linecolour=black],
s [label="Server", linecolour=black];
|||;
c=>s [ label = "SYN MP_CAPABLE(4 bytes only)\n\n"];
|||;
s=>c [label = "SYN+ACK MP_CAPABLE(Server Key)\n\n"];
|||;
c=>s [label="ACK MP_CAPABLE(Client and Server Keys)\n\n"];
|||;
c=>s [label="ACK MP_CAPABLE(Client and Server Keys, length of data and checksum)\n[First data]\n"];
}

Another important modification appeared in draft-ietf-mptcp-rfc6824bis-07. Although this modification only specifies the value of one bit in the MP_CAPABLE option, it is important to efficiently support load balancers. This has been described in detail in [DB17].

../../../_images/mpc-7.png

References

[DB17]Fabien Duchene and Olivier Bonaventure. Making multipath tcp friendlier to load balancers and anycast. In Network Protocols (ICNP), 2017 IEEE 25th International Conference on, 1–10. IEEE, 2017. URL: https://inl.info.ucl.ac.be/publications/making-multipath-tcp-friendlier-load-balancers-and-anycast.html.
[RPB+12]C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, F. Duchene, O. Bonaventure, and M. Handley. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In Proceedings of the 9th Symposium on Networked Systems Design and Implementation (NSDI). 2012. URL: https://inl.info.ucl.ac.be/publications/how-hard-can-it-be-designing-and-implementing-deployable-multipath-tcp.html.

]]>
Sat, 08 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/07/multipath_tcp_encounters_its_first_middlebox.html http://blog.multipath-tcp.org/blog/html/2018/12/07/multipath_tcp_encounters_its_first_middlebox.html <![CDATA[Multipath TCP encounters its first middlebox]]> Multipath TCP encounters its first middlebox

Like many Internet protocols, Multipath TCP was designed by a working group. By involving several network engineers, one expects that they will think about all possible problems and that the final design will be stronger. However, there is also a risk that this design by committee becomes a series of compromises and that more and more options are added to the protocol. The best way to avoid this risk is to implement the protocol while it is being specified or to base the key decisions of the design on an existing implementation. During the first year, the MPTCP working group mainly worked by email and many ideas were exchanged.

Fortunately, two members of the working group started to write implementations based on the first drafts. Sebastien Barre announced his first prototype implementation in the Linux kernel on November 18th, 2009. Costin Raiciu replied on November 24th, 2009 with a userspace implementation that also extended the Linux TCP stack.

Sebastien Barre’s implementation was designed to test the feasibility of implementing Multipath TCP inside an operating system kernel. Costin Raiciu’s implementation was designed to evaluate the performance of the coupled congestion control scheme [WRGH11]. These two implementations were complementary. Sebastien’s implementation leveraged his earlier experience in implementing shim6 (RFC 5533) in the Linux kernel [BarreRB11]. For this reason, it only supported IPv6, while Costin’s implementation only worked over IPv4. Sebastien continued to improve his first implementation. Version 0.2 was released in March 2010. When he added IPv4 support to this implementation, he sent the kernel sources to colleagues in Finland and the UK. The Multipath TCP implementation was working well inside our labs, and this was an opportunity to test it over longer-distance paths to see how retransmissions and other techniques reacted to longer delays. The first tests were a disaster. This version of Multipath TCP could establish a connection to the remote server, but no data was exchanged. Sebastien looked at all the possible sources of problems and eventually took a packet trace on both servers to manually check the packets that were exchanged. Eventually, he found that a middlebox somewhere on the Internet was changing the TCP sequence numbers of the packets without modifying the Multipath TCP options. He summarized his findings in the email below.

../../../_images/mbox-1.png

He continued to explore the problem and found that the culprit was our campus firewall… This firewall was configured to rewrite TCP sequence numbers to protect TCP connections from weak machines such as those running Windows 98…

../../../_images/mbox-2.png

This was the first time that a middlebox interfered with a Multipath TCP implementation, but unfortunately by far not the last. The protocol designers learned this lesson and looked at different ways to make Multipath TCP much more resilient to middlebox interference. We will explore these issues in more detail in another blog post.

An important lesson that we learned is that by having developed a fully functional implementation early, we could quickly detect operational problems and fix them. Sebastien’s initial implementation later became the reference Multipath TCP implementation and various developers have contributed to the current code base. The global Internet is far more complex than what students learn in textbooks…

References

[BarreRB11]Sébastien Barré, John Ronan, and Olivier Bonaventure. Implementation and evaluation of the shim6 protocol in the linux kernel. Computer Communications, 34(14):1685–1695, 2011. URL: https://inl.info.ucl.ac.be/publications/implementation-and-evaluation-shim6-protocol-linux-kernel.
[WRGH11]D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley. Design, Implementation and Evaluation of Congestion Control for Multipath TCP. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 2011.

]]>
Fri, 07 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/06/multipath_tcp_controlling_congestion.html http://blog.multipath-tcp.org/blog/html/2018/12/06/multipath_tcp_controlling_congestion.html <![CDATA[Multipath TCP : controlling congestion]]> Multipath TCP : controlling congestion

Since the publication of Van Jacobson’s seminal paper on Congestion Avoidance and Control in 1988 [Jac88], congestion control has been one of the most active topics in transport protocol research. This was one of the key scientific challenges for the design of a multipath transport protocol.

As explained in RFC 6182, one of the goals of Multipath TCP was to solve the congestion problem while remaining fair with regular TCP traffic. As Multipath TCP uses multiple parallel TCP connections that are called subflows, a naive implementation that used a standard TCP congestion control scheme on each subflow would be unfair to single TCP flows competing for the same resources.

Several researchers had earlier addressed problems similar to multipath congestion control, but from a mathematical viewpoint and without considering an implementation in a real protocol [KV05], [KMassoulieT07]. An interesting survey article was published later [KMassoulieT11].

The first practical multipath congestion control scheme was proposed in the paper Design, Implementation and Evaluation of Congestion Control for Multipath TCP [WRGH11]. It is an evolution of the classical TCP congestion control scheme of RFC 5681. Compared to single-flow TCP congestion control, the multipath congestion control scheme slows down the increase of the congestion window while still halving it when congestion occurs. A summary of this multipath congestion control algorithm can be found below (source [WRGH11]).

../../../_images/coupled.png
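The coupled increase rule summarized in the figure can be sketched as follows. This is a simplified illustration of the RFC 6356 ("LIA") rules with windows expressed in packets; the function names are ours and this is not how the kernel implements it:

```python
# Simplified sketch of the coupled ("LIA") congestion control of RFC 6356.
# Windows are in packets, RTTs in seconds; an illustration, not a kernel patch.

def lia_alpha(windows, rtts):
    """Aggressiveness factor alpha of RFC 6356 (equals 1 for a single subflow)."""
    total = sum(windows)
    best = max(w / (rtt ** 2) for w, rtt in zip(windows, rtts))
    denom = sum(w / rtt for w, rtt in zip(windows, rtts)) ** 2
    return total * best / denom

def on_ack(windows, rtts, r):
    """Coupled increase of subflow r: never more aggressive than one TCP flow."""
    alpha = lia_alpha(windows, rtts)
    total = sum(windows)
    windows[r] += min(alpha / total, 1.0 / windows[r])

def on_loss(windows, r):
    """Standard halving of the congestion window when congestion occurs."""
    windows[r] = max(windows[r] / 2, 1.0)
```

With a single subflow, alpha evaluates to 1 and the increase reduces to the classical one-packet-per-RTT behaviour, which is exactly the fairness property the designers were after.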

The IETF has adopted this multipath congestion control algorithm as the default one for Multipath TCP RFC 6356. It was initially designed based on simulations with htsim and a preliminary userspace implementation of MPTCP. It has later been added to the Multipath TCP implementation in the Linux kernel.

Since the publication of [WRGH11], several other congestion control algorithms have been proposed and implemented. These include OLIA [KGP+12], BALIA [PWHL16] and a multipath adaptation of TCP Vegas [CXF12]. The authors of these three algorithms released their implementations in the Multipath TCP implementation in the Linux kernel. OLIA [KGP+12] is often preferred by Multipath TCP users.

Each of the above articles discusses the merits of the proposed congestion control scheme and compares it, by simulations or measurements, with other congestion control schemes. It is likely that other multipath congestion control algorithms will be proposed by the research community. As the IETF is adopting the CUBIC congestion control algorithm for single-path TCP RFC 8312, a multipath variant would clearly be a useful contribution. It would also be interesting to develop a multipath variant of BBR [CCG+16].

A recent survey [KL18] summarises the multipath congestion control schemes implemented in the Linux kernel in a nice table shown below.

../../../_images/mptcp-cc.png

References

[CXF12]Yu Cao, Mingwei Xu, and Xiaoming Fu. Delay-based congestion control for multipath tcp. In Network Protocols (ICNP), 2012 20th IEEE International Conference on, 1–10. IEEE, 2012.
[CCG+16]Neal Cardwell, Yuchung Cheng, C Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. Bbr: congestion-based congestion control. Queue, 14(5):50, 2016.
[Jac88]V. Jacobson. Congestion avoidance and control. ACM SIGCOMM Computer Communication Review, 18(4):314–329, 1988.
[KV05]Frank Kelly and Thomas Voice. Stability of end-to-end algorithms for joint routing and rate control. ACM SIGCOMM Computer Communication Review, 35(2):5–12, 2005.
[KMassoulieT07]Peter Key, Laurent Massoulié, and Don Towsley. Path selection and multipath congestion control. In INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE, 143–151. IEEE, 2007.
[KMassoulieT11]Peter Key, Laurent Massoulié, and Don Towsley. Path selection and multipath congestion control. Communications of the ACM, 54(1):109–116, 2011.
[KGP+12](1, 2) R. Khalili, N. Gast, M. Popovic, U. Upadhyay, and J.-Y. Le Boudec. MPTCP is not Pareto-Optimal: Performance Issues and a Possible Solution. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies (CoNEXT). 2012.
[KL18]Bruno Yuji Lino Kimura and Antonio Alfredo Frederico Loureiro. Mptcp linux kernel congestion controls. Technical Report arXiv:1812.03210, Arxiv, Dec. 2018. URL: https://arxiv.org/abs/1812.03210.
[PWHL16]Qiuyu Peng, Anwar Walid, Jaehyun Hwang, and Steven H Low. Multipath tcp: analysis, design, and implementation. IEEE/ACM Transactions on Networking (ToN), 24(1):596–609, 2016.
[WRGH11](1, 2, 3) D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley. Design, Implementation and Evaluation of Congestion Control for Multipath TCP. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 2011.

]]>
Thu, 06 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/05/multipath_tcp_the_maastricht_consensus.html http://blog.multipath-tcp.org/blog/html/2018/12/05/multipath_tcp_the_maastricht_consensus.html <![CDATA[Multipath TCP : the Maastricht consensus]]> Multipath TCP : the Maastricht consensus

The work on Multipath TCP started in 2008 and its designers quickly approached the IETF to create a new working group. This process started with a BOF that was held in July 2009 in Stockholm. Reading the meeting minutes again, the discussion at the MPTCP BOF was both open and constructive, and the MPTCP working group was quickly approved.

Its initial charter was pretty ambitious:

Mar 2010          Established WG consensus on the Architecture
Aug 2010          Submit to IESG architectural guidelines and security threat analysis as informational RFC(s)
Mar 2011          Submit to IESG basic coupled congestion control as an experimental RFC
Mar 2011          Submit to IESG protocol specification for MPTCP extensions as an experimental RFC
Mar 2011          Submit to IESG an extended API for MPTCP as an or part of an experimental or informational RFC
Mar 2011          Submit to IESG application considerations as an informational RFC
Mar 2011          Recharter or close WG

All the design was supposed to be finished in less than two years. The congestion control problem, which was considered the most important one, was almost solved and the protocol design did not seem too difficult.

Shortly after the creation of the working group, Alan Ford asked an interesting question on the multipathtcp mailing list: given that endhosts will need to signal information through a TCP connection, should they use TCP options or the payload with a TLV format to encode this information?

This was a very important design decision. On one hand, encoding control information in TCP options is the standard way of extending TCP. An important benefit of using TCP options is that there is a clear separation between the control information and the user payload. However, the extended TCP header has a limited size and it is difficult to exchange a lot of control information inside TCP options. Another issue is that TCP options are not exchanged reliably since they are not acknowledged. On the other hand, placing control information inside the packet payload requires a Type/Length/Value format in the packet payload to distinguish between user data and control information.

The debate over this key design question lasted for almost a year. Michael Scharf was convinced by the idea of placing control information inside the payload and proposed a detailed design in Multi-Connection TCP (MCTCP) Transport. An important advantage of using the payload to carry control information was that MCTCP could be implemented as a library that intercepts system calls without requiring modifications to the kernel TCP implementation. Michael Scharf provided additional information in a subsequent paper [SB11]. However, using the payload to carry control information has several drawbacks. First, it is difficult to implement flow control at the Multipath TCP level since it depends on the flow control of the underlying TCP connection. Second, the TLV format imposes a specific format on the packets that are exchanged by Multipath TCP hosts. This format might interfere with middleboxes such as firewalls. For example, consider a firewall that is configured to verify that all connections on port 80 only carry valid HTTP requests and responses. With MCTCP, this firewall would observe a traffic pattern that is not exactly HTTP.

The figure below (source Multi-Connection TCP (MCTCP) Transport) describes the architecture of MCTCP.

../../../_images/mctcp.png

The payload/options debate lasted for almost a year and the progress of the working group was slow. There was a risk that these two designs could evolve in parallel for a long period of time. In July 2010, the working group met in Maastricht and one of the main objectives of this meeting was to reach a consensus on this important design decision. Costin Raiciu explained the arguments in favor of using TCP options while Michael Scharf defended the use of TLVs in the payload. In the end, the working group agreed to focus its energy on developing a Multipath TCP protocol that uses TCP options.

References

[SB11]M. Scharf and T. Banniza. Mctcp: a multipath transport shim layer. In 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011, volume, 1–5. Dec 2011. doi:10.1109/GLOCOM.2011.6134021.

]]>
Wed, 05 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/04/multipath_tcp_the_architectural_principles.html http://blog.multipath-tcp.org/blog/html/2018/12/04/multipath_tcp_the_architectural_principles.html <![CDATA[Multipath TCP: the architectural principles]]> Multipath TCP: the architectural principles

When an IETF working group starts, its members first need to agree on a charter and a set of principles that will guide their work. For Multipath TCP, the key architectural principles have been documented in RFC 6182.

../../../_images/rfc6182-1.png

The first, and very important, assumption for the design of Multipath TCP was that at least one of the two communicating hosts would have two or more IP addresses. This is captured in the figure below (source RFC 6182).

../../../_images/rfc6182-2.png

Then, it is interesting to reconsider the functional goals that are listed in Section 2.1 of RFC 6182:

  • Improve Throughput
  • Improve Resilience

Today, given the wide deployment of Multipath TCP on Apple smartphones, it could be surprising that this document did not anticipate the need to support fast handovers, i.e. the ability to quickly switch a connection from Wi-Fi to cellular or the opposite. The resilience requirement takes into account the ability to retransmit data from one path to another.

After these functional goals, four important compatibility goals were listed in RFC 6182. The first one is that Multipath TCP should remain compatible with the existing socket API, although the document expected that another more advanced API would be developed later. Second, and this turned out to be a very difficult compatibility goal, Multipath TCP must preserve the ability to transfer data. This implies that if two hosts were able to exchange data over a given network path, they should still be able to exchange the same data once TCP has been replaced by Multipath TCP. In some network scenarios, preserving the ability to exchange data could rely on a fallback to regular TCP. The third goal is that Multipath TCP should not harm existing TCP flows from a congestion control viewpoint. Finally, the fourth goal is that Multipath TCP should not be less secure than regular TCP.

Section 4 of RFC 6182 provides a high level functional decomposition of Multipath TCP. One of the key elements of this decomposition is that a Multipath TCP connection will be composed of a set of subflows as illustrated in the figure below (source RFC 6182).

../../../_images/rfc6182-3.png

Then, Section 5 provides the key design principles that will be discussed in more detail in other blog posts. Section 7 concerns the interactions with middleboxes, another important problem that will also be discussed in subsequent blog posts.

]]>
Tue, 04 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/03/the_multipath_tcp_foundations.html http://blog.multipath-tcp.org/blog/html/2018/12/03/the_multipath_tcp_foundations.html <![CDATA[The Multipath TCP foundations]]> The Multipath TCP foundations

When looking at a new protocol, it is also interesting to start by reading the initial motivations for its design. The initial design of Multipath TCP was heavily influenced by the Resource Pooling Principle written by Damon Wischik, Mark Handley and Marcelo Bagnulo [WHB08] and published as an editorial in SIGCOMM’s Computer Communication Review. Since the early days of computer networks, statistical multiplexing, failure resilience and load balancing have played a key role in enabling networks to carry a growing amount of traffic. However, many of the techniques that are used today were designed under the assumption that they needed to have a local impact. Many of these designs missed the opportunity of considering the problem of pooling all the available resources as an end-to-end problem.

Multipath TCP, by enabling endhosts to efficiently use different paths to exchange packets, was designed to solve one aspect of this problem. Content Distribution Networks and the more recent Mobile Edge Computing approaches also contribute to this overall goal of improving the sharing of all the available network resources. The initial design for Multipath TCP is briefly sketched on page 3:

../../../_images/pooling.png

Later on this page, Damon Wischik, Mark Handley and Marcelo Bagnulo provide an interesting comment about the design of Multipath TCP:

Adding multipath support to TCP is so obvious that it has been re-invented many times [Hui95], [HS02], [RA04], [DWPW07], and multihoming is built into SCTP, though no protocol that simultaneously uses multiple paths has ever been standardized let alone widely deployed. Why is there not more multipath at the transport layer? Perhaps because it has not been understood that multipath lets end systems solve network-wide resource pooling problems, and because the issues with current mechanisms are only now becoming pressing enough to fix.

This paragraph clearly suggests that one of the objectives of Multipath TCP is to put the endhosts in control of the selection and utilisation of multiple end-to-end paths to reach a given destination. In fact, the Resource Pooling Principle could be considered as a natural extension of Saltzer, Reed and Clark’s End-to-end arguments in system design paper [SRC84] when considering resource utilisation.

The paper ends with one definition and two observations that remain valid today:

Definition. Resource pooling means making a collection of networked resources behave as though they make up a single pooled resource. The general method of resource pooling is to build mechanisms for shifting load between various parts of the network.

Observation 1. Resource pooling is often the only practical way to achieve resilience at acceptable cost.

Observation 2. Resource pooling is also a cost-effective way to achieve flexibility and high utilization.

References

[DWPW07]Yu Dong, Dingding Wang, Niki Pissinou, and Jian Wang. Multi-path load balancing in transport layer. In Next Generation Internet Networks, 3rd EuroNGI Conference on, 135–142. IEEE, 2007.
[HS02]Hung-Yun Hsieh and Raghupathy Sivakumar. Ptcp: an end-to-end transport layer protocol for striped connections. In Proceedings of the 10th IEEE International Conference on Network Protocols (ICNP). IEEE, 2002.
[Hui95]C Huitema. Multi-homed tcp. draft-huitema-multi-homed-01. Internet Engineering Task Force (IETF), 1995.
[RA04]Kultida Rojviboonchai and Hitoshi Aida. An evaluation of multi-path transmission control protocol (m/tcp) with robust acknowledgement schemes. IEICE transactions on communications, 87(9):2699–2707, 2004.
[SRC84]J. Saltzer, D. Reed, and D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems (TOCS), 2(4):277–288, 1984.
[WHB08]D. Wischik, M. Handley, and M. Bagnulo. The resource pooling principle. SIGCOMM Comput. Commun. Rev., 38(5):47–52, September 2008. URL: http://doi.acm.org/10.1145/1452335.1452342, doi:10.1145/1452335.1452342.

]]>
Mon, 03 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/12/02/multipath_tcp_tutorials.html http://blog.multipath-tcp.org/blog/html/2018/12/02/multipath_tcp_tutorials.html <![CDATA[Multipath TCP Tutorials]]> Multipath TCP Tutorials

Many scientific articles and IETF documents have been published on Multipath TCP. A network engineer, researcher or student who wants to learn Multipath TCP will probably start from a search engine or Wikipedia. A sample result is provided below.

../../../_images/search-mptcp.png

The Multipath TCP page on Wikipedia provides some pointers, but this is probably not the simplest starting point to learn Multipath TCP. Fortunately, several tutorial articles that describe the basic principles of this TCP extension have been published.

One of the first tutorial articles is An overview of Multipath TCP that was published in USENIX ;login: in May 2012 [BHR12]. This article provides a basic overview of some of the principles of Multipath TCP.

The second article is simply entitled Multipath TCP and appeared in Communications of the ACM in 2014 [PB14]. It provides a more detailed overview of the protocol and some of its use cases. This is probably the most complete tutorial article on Multipath TCP.

If you prefer to listen to video tutorials instead of reading articles, several of them have been posted on YouTube.

A long tutorial on the Multipath TCP protocol was given by Olivier Bonaventure at IETF’87 in Berlin in August 2013.

Christoph Paasch gave a shorter Multipath TCP tutorial earlier during FOSDEM’13 in Brussels.

Earlier, Costin Raiciu and Christoph Paasch gave a one hour Google Research talk on the design of the protocol and several use cases.

[BHR12]O. Bonaventure, M. Handley, and C. Raiciu. An Overview of Multipath TCP. Usenix ;login: magazine, October 2012.
[PB14]Christoph Paasch and Olivier Bonaventure. Multipath tcp. Commun. ACM, 57(4):51–57, April 2014. URL: http://doi.acm.org/10.1145/2578901, doi:10.1145/2578901.
]]> Sun, 02 Dec 2018 00:00:00 +0100 http://blog.multipath-tcp.org/blog/html/2018/12/01/multipath_tcp_a_retrospective.html http://blog.multipath-tcp.org/blog/html/2018/12/01/multipath_tcp_a_retrospective.html <![CDATA[The first ten years of Multipath TCP]]> The first ten years of Multipath TCP

Multipath TCP was designed within the FP7 Trilogy project that started in early 2008. The first ideas on Multipath TCP were discussed in 2008, slightly more than a decade ago. During this decade, Multipath TCP has evolved a lot. It has also generated a lot of interest within the scientific community with several hundred articles that use, extend or reference Multipath TCP. As an illustration of the scientific impact of Multipath TCP, the figure below shows the cumulative number of citations for the sequence of internet drafts that became RFC 6824 according to Google Scholar.

../../../_images/citations.png

The industrial impact of Multipath TCP is also very important as Apple uses it on all iPhones and several network operators use it to create Hybrid Access Networks that combine xDSL and LTE to provide faster Internet services in rural areas.

On all the remaining days until Christmas, a new post will appear on this blog to illustrate one particular aspect of Multipath TCP with pointers to relevant scientific papers, commercial deployments, … This series of blog posts will constitute a simple advent calendar that could be useful for network engineers and researchers who want to understand how this new protocol works and why it is becoming more and more important in today’s Internet.

../../../_images/advent.png ]]>
Sat, 01 Dec 2018 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2018/10/04/mptcp_load_balancers.html http://blog.multipath-tcp.org/blog/html/2018/10/04/mptcp_load_balancers.html <![CDATA[Multipath TCP and load balancers]]> Multipath TCP and load balancers

Load balancers play a very important role in today’s Internet. Most Internet services are provided by servers that reside behind one or several layers of load balancers. Various load balancers have been proposed and implemented. They can operate at layer 3, layer 4 or layer 7. Layer 4 is very popular and we focus on such load balancers in this blog post. A layer-4 load balancer uses information from the transport layer to load balance TCP connections over different servers. There are two main types of layer-4 load balancers:

  • The stateful load balancers

  • The stateless load balancers

    Schematically, a load balancer is a device or network function that processes incoming packets and forwards all packets that belong to the same connection to a specific server. A stateful load balancer maintains a table that associates the five-tuple that identifies a TCP connection to a specific server. When a packet arrives, the load balancer seeks a matching entry in the table. If a match is found, the packet is forwarded to the selected server. If there is no match, e.g. the packet is a SYN, a server is chosen and the table is updated before forwarding the packet. The table entries are removed when they expire or when the associated connection is closed. A stateless load balancer does not maintain a table. Instead, it relies on a hash function that is computed over each incoming packet. A simple approach is to use a CRC over the source and destination addresses and ports and associate each server to a range of CRC values.
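As an illustration of the stateless variant, here is a minimal sketch that hashes the five-tuple with CRC32; the server addresses and the modulo-based mapping are hypothetical simplifications, not how any particular product works:

```python
import zlib

# Hypothetical pool of backend servers
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_server(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Hash the five-tuple so that every packet of a given TCP
    connection is forwarded to the same server, without any state."""
    key = f"{proto}|{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return SERVERS[zlib.crc32(key) % len(SERVERS)]
```

Because the decision depends only on packet headers, two packets of the same connection always land on the same server, which is exactly the property that breaks once a Multipath TCP connection spans several five-tuples.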

With Multipath TCP, a single connection can be composed of different subflows that have their own five-tuples. This implies that the data corresponding to a given Multipath TCP connection can be received over several different TCP subflows, which obviously need to be forwarded to the same server by the load balancer. Several approaches have been proposed in the literature to solve this problem.

In Datacenter Scale Load Balancing for Multipath Transport, V. Olteanu and C. Raiciu proposed two different tricks to support stateless load balancers with Multipath TCP. First, the load balancer selects the key that will be used by the server for each incoming Multipath TCP connection. As this key is used to derive the Token that identifies the Multipath connection in the MP_JOIN option, this enables the load balancer to control the Token that clients will send when creating subflows. This allows the load balancer to correctly associate MP_JOINs with the server that terminates the corresponding connection. This is not sufficient for a stateless load balancer, which also needs to associate each incoming packet to a specific server. A packet that belongs to an additional subflow carries source and destination addresses and ports that have no relationship with those of the initial subflow. They solve this problem by encoding the identification of the server inside a part of the TCP timestamp option.

In Towards a Multipath TCP Aware Load Balancer, S. Lienardy and B. Donnet propose a mix between the stateless and stateful approaches. The packets from the first subflow are sent to a specific server by hashing their source and destination addresses and ports. The load balancer then extracts the key exchanged in the third ack to store the token associated with this connection. This token is placed in a map that is used to load balance the SYN MP_JOIN packets. The reception of an MP_JOIN packet forces the creation of an entry in a table that is used to map the packets from the additional subflows.
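The hybrid idea can be sketched as follows, assuming a CRC32 hash for initial subflows and an in-memory token map; all names and the data structures are illustrative, not taken from the paper's implementation:

```python
import zlib

SERVERS = ["srv-a", "srv-b"]     # hypothetical backends
token_map = {}                   # MPTCP token -> server

def on_initial_subflow(five_tuple: str, token: int) -> str:
    """First subflow: stateless hash on the five-tuple, then remember the
    connection token (learned from the key seen in the third ack)."""
    server = SERVERS[zlib.crc32(five_tuple.encode()) % len(SERVERS)]
    token_map[token] = server
    return server

def on_join_syn(token: int):
    """A SYN MP_JOIN carries the token, so the lookup is direct."""
    return token_map.get(token)
```

The state is only needed per connection (one map entry), not per packet, which is what makes the approach a middle ground between the two designs above.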

In Making Multipath TCP friendlier to Load Balancers and Anycast, F. Duchene and O. Bonaventure leverage a feature of the forthcoming standards-track version of Multipath TCP. In this revision, the MP_CAPABLE option has been modified compared to RFC6824. A first modification is that the client does not send its key anymore in the SYN packet. A second modification is the C bit: when set by a server in the SYN+ACK, it indicates that the server will not accept additional MPTCP subflows towards the source address and port of the SYN. This bit was specifically introduced to support load balancers. It works as follows. When a client creates a connection, it sends a SYN towards the load balancer with the MP_CAPABLE option but no key. The load balancer selects one server to handle the connection, e.g. based on a stateless hash. Each server has a dedicated IP address or a dedicated port number. It replies to the SYN with a SYN+ACK that contains the MP_CAPABLE option with the C bit set. Once the connection is established, it sends an ADD_ADDR option with its direct IP address to the client. The client then uses this direct address to create the subflows, and those can completely bypass the load balancer. The source code of the implementation is available from https://github.com/fduchene/ICNP2017

The latest Multipath TCP load balancer was proposed in Stateless Datacenter Load-balancing with Beamer by V. Olteanu et al. It assigns one port to each load-balanced server and also forces the client to create the subflows towards this per-server port number. The load balancer is implemented in both software (click elements) and hardware (P4) and evaluated in detail. The source code is available from https://github.com/Beamer-LB

]]>
Thu, 04 Oct 2018 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2018/04/18/mobile_tracebox.html http://blog.multipath-tcp.org/blog/html/2018/04/18/mobile_tracebox.html <![CDATA[Experimenting with MPTCP using raw sockets]]> Experimenting with MPTCP using raw sockets

Although Multipath TCP is already available on several platforms (Linux, FreeBSD, iOS11), applications like Tracebox (or Mobile Tracebox) are still a convenient choice for users eager to experiment with the new protocol without installing the full MPTCP stack. These tools (along with zmap, tcpexposure, etc.) allow users to forge custom packets (e.g. using raw sockets) emulating newer extensions or protocols.

For instance, Tracebox can highlight middleboxes interfering with MPTCP by sending MP_CAPABLE Syns with increasing TTLs and collecting ICMP Time Exceeded messages from intermediate routers. Even when routers don’t respond or don’t quote the full TCP header, a Syn Ack without the MP_CAPABLE option received from a well-known MPTCP server can still reveal interference. When the path is free of middleboxes, a MP_CAPABLE Syn can also be used to assess whether a server supports MPTCP.

Unfortunately, middleboxes can be very subtle: e.g. they can be completely transparent to MP_CAPABLE packets but still interfere with ADD_ADDR and DSS, or they can fictitiously support all options carried by the Syn.

This leads to the need for more elaborate MPTCP tests: in this post we describe a test (included in Mobile Tracebox) that uses raw sockets to establish an MPTCP connection, exchange data and also associate a second subflow.

In the first part we detail, for every step, how options should be correctly crafted for the MPTCP experiment to succeed; in the second part we explore further scenarios (e.g. options not perfectly compliant with the protocol) to see how an MPTCP stack reacts to them. This can benefit the development of similar applications, avoiding pitfalls when dealing with MPTCP at a low level, but it can also help to better understand how the protocol concretely works. The figure summarizes the packets exchanged between A, our client running the test, and B, a MPTCP-enabled server (multipath-tcp.org).

../../../_images/Figure_1_client_server.png

MPTCP-compliant scenario

We report the output of Mobile Tracebox (only interesting header fields are included).

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(24d2)  TCP::Option_MPTCP(00811000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (00810c4d5dfc94d0a464)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(24d2) TCP::Option_MPTCP(008110000000000000000c4d5dfc94d0a464)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(24d2) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (cefc) TCP::Option_MPTCP(10023a03caf210000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (100256c7a377b2e33fdaa29163c5)

All fields are in hexadecimal format: we can easily recognize the MPTCP option subtype from the first digit. A full trace of the packets exchanged during the probe is also reported.

18:48:32.485197 IP client1.9426 > mptcp.info.ucl.ac.be.http: Flags [S], seq 19922944, win 65535, options [mptcp capable csum {0x1000000000000000}], length 0
18:48:32.573554 IP mptcp.info.ucl.ac.be.http > client1.9426: Flags [S.], seq 3005334072, ack 19922945, win 28800, options [mss 1452,mptcp capable csum {0xc4d5dfc94d0a464}], length 0
18:48:32.573792 IP client1.9426 > mptcp.info.ucl.ac.be.http: Flags [.], ack 1, win 65535, options [mptcp capable csum {0x1000000000000000,0xc4d5dfc94d0a464}], length 0
18:48:35.577198 IP client1.9426 > mptcp.info.ucl.ac.be.http: Flags [.], seq 1:73, ack 1, win 65535, options [mptcp dss seq 4216210269 subseq 1 len 72 csum 0x3aca], length 72: HTTP: GET / HTTP/1.1
18:48:35.664046 IP mptcp.info.ucl.ac.be.http > client1.9426: Flags [.], ack 1, win 28800, options [mptcp add-addr id 8 mptcp.info.ucl.ac.be,mptcp dss ack 4216210269], length 0
18:48:35.664556 IP mptcp.info.ucl.ac.be.http > client1.9426: Flags [.], ack 73, win 28800, options [mptcp dss ack 4216210341], length 0
18:48:35.666894 IP mptcp.info.ucl.ac.be.http > client1.9426: Flags [P.], seq 1:503, ack 73, win 28800, options [mptcp dss ack 4216210341 seq 2447520560 subseq 1 len 502 csum 0xc36b], length 502: HTTP: HTTP/1.1 200 OK
18:48:38.670543 IP client2.52988 > mptcp.info.ucl.ac.be.http: Flags [S], seq 1793048487, win 65535, options [mptcp join id 2 token 0x3a03caf2 nonce 0x10000000], length 0
18:48:38.756268 IP mptcp.info.ucl.ac.be.http > client2.52988: Flags [S.], seq 1665111958, ack 1793048488, win 28800, options [mss 1452,mptcp join id 2 hmac 0x56c7a377b2e33fda nonce 0xa29163c5], length 0

The test uses two client addresses (192.168.42.7, client1 – 192.168.1.102, client2) for the two subflows, but it is also possible to use the same address with different source ports.

Everything starts with a Syn carrying the MP_CAPABLE option (subtype 0x0) with flags A (Checksum required) and H (use of HMAC-SHA1 as crypto algorithm) and a 64-bit key chosen by the client (0x1000000000000000). The server replies with a MP_CAPABLE Syn Ack containing the same flags and its own key: the client takes note of the server’s key to echo it in the MP_CAPABLE Ack (but also to forge the subsequent MP_JOIN).
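The MP_CAPABLE option of the initial Syn can be assembled with a few lines of Python; this is a hedged sketch reproducing the exact bytes seen in the trace above (kind 30, length 12, subtype/version byte 0x00, flags 0x81 = A|H, then the 64-bit key), not a full packet-crafting tool:

```python
import struct

def build_mp_capable_syn(key: int) -> bytes:
    """MP_CAPABLE option for a Syn: kind 30, length 12, subtype 0 /
    version 0, flags A (0x80, checksum required) | H (0x01, HMAC-SHA1)."""
    kind, length = 30, 12
    subtype_version = 0x00
    flags = 0x80 | 0x01
    return struct.pack("!BBBBQ", kind, length, subtype_version, flags, key)

option = build_mp_capable_syn(0x1000000000000000)
# option.hex() == "1e0c00811000000000000000"
```

The last 10 bytes match the Mobile Tracebox output 00811000000000000000; Tracebox simply omits the kind and length bytes.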

If the client attempts to send a MP_JOIN message at this point, the MPTCP stack will discard the new subflow with a Rst, since no data has actually been exchanged on the first subflow. This means we have to send a packet with a real payload and a DSS option. To avoid the server dropping our packet or simply closing the connection, the payload must be a real HTTP request.

GET / HTTP/1.1
Host: blog.multipath-tcp.org
Connection: keep-alive

We also have to assemble a compliant DSS option (subtype 0x2): we set the flags to 0x4 (data sequence number of 32 bits), the Subflow Sequence Number to 1 and the Data-Level Length to the length of our TCP payload (72 bytes); the Data Sequence Number is generated from the SHA-1 hash of the client’s key; finally, a DSS checksum has to be calculated on the payload and the DSS pseudo-header. The server answers with 2 packets; the first carries an ADD_ADDR option (subtype 0x3) advertising the server’s IPv6 address: this is a sign that the MPTCP stack has recognized that we are speaking MPTCP. The second contains a DSS option: we can see how the sent data is acked at both the TCP and MPTCP levels.
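The layout of that DSS option can be sketched with struct packing; a hedged illustration that reproduces the bytes of the trace above (subtype 0x2, flags 0x04 meaning a 4-byte DSN is present, then DSN, subflow sequence number, data-level length and checksum), assuming the checksum has already been computed elsewhere:

```python
import struct

def build_dss(dsn: int, ssn: int, data_len: int, csum: int) -> bytes:
    """DSS option: kind 30, subtype 0x2, flags 0x04 (M bit, 32-bit DSN),
    followed by DSN, Subflow Sequence Number, Data-Level Length, checksum."""
    subtype_flags = struct.pack("!BB", 0x2 << 4, 0x04)
    body = subtype_flags + struct.pack("!IIHH", dsn, ssn, data_len, csum)
    return struct.pack("!BB", 30, 2 + len(body)) + body

# Values taken from the trace: DSN 0xfb4e435d (4216210269), subseq 1,
# length 72, checksum 0x3aca
option = build_dss(0xfb4e435d, 1, 72, 0x3aca)
# option.hex() == "1e102004fb4e435d0000000100483aca"
```

Again, everything after the first two bytes matches the option dump 2004fb4e435d0000000100483aca shown in the Mobile Tracebox output.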

Note

To avoid DSS checksum calculation we can use a Data-Level Length greater than the actual TCP payload length: in this case the packet will be accepted but DSS checksum will not be evaluated waiting for the next TCP segment (packet will be acked at TCP level but not MPTCP level).

After data has been exchanged on the first subflow we can finally use a second subflow to join the MPTCP connection. We send a new Syn from a different source address and port (or just a different port) with an MP_JOIN option (subtype 0x1) carrying a Token obtained from the key sent by the server in its MP_CAPABLE Syn Ack (the first 32 bits of the SHA-1 hash of the server’s key) and a Random number; the Address Id is obviously set to 2. The server answers with an MP_JOIN Syn Ack (carrying a Hash-based Message Authentication Code and a Random number), a sign that our MPTCP experiment has succeeded.
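The key derivations used above are easy to reproduce (a sketch, with helper names of our own; per RFC 6824, the token is the most significant 32 bits of the SHA-1 hash of a key, and the initial data sequence number the least significant 64 bits):

```python
import hashlib

def mptcp_token(key: bytes) -> bytes:
    # Token: most significant 32 bits of SHA-1(key)
    return hashlib.sha1(key).digest()[:4]

def mptcp_idsn(key: bytes) -> bytes:
    # Initial data sequence number: least significant 64 bits of SHA-1(key)
    return hashlib.sha1(key).digest()[-8:]

client_key = bytes.fromhex("1000000000000000")  # the client key used above
token = mptcp_token(client_key)
idsn = mptcp_idsn(client_key)
```

The client applies mptcp_token to the server’s key to build the MP_JOIN, and mptcp_idsn to its own key to number its data.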

Other scenarios

Another advantage of raw sockets is that we can send packets that are not perfectly compliant with the protocol, to observe how the MPTCP stack reacts to possible malfunctions or tricky middlebox interference.

Invalid MP_CAPABLE Key

In this scenario the client echoes a wrong server key in the MP_CAPABLE Ack: this inconsistency is ignored and communication proceeds normally at both the TCP and MPTCP levels on the first subflow. MP_JOIN also still succeeds, as long as the token is calculated from the correct server key.

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(fac4)  TCP::Option_MPTCP(00811000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (0081422b61826574250e)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(fac4) TCP::Option_MPTCP(008110000000000000002000000000000000)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(fac4) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (cc2e) TCP::Option_MPTCP(1002531ed1b010000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (1002ea5ea69620cd23fc2c3feb6a)

No DSS Checksum, despite being requested by the counterpart

In the next scenario the client sends a DSS option without a checksum, although the server requested DSS checksums in its MP_CAPABLE Syn Ack: the server replies with a Rst terminating the subflow, but the subsequent MP_JOIN still succeeds.

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(c49a)  TCP::Option_MPTCP(00011000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (008106ef03ac4b958a2f)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(c49a) TCP::Option_MPTCP(0001100000000000000006ef03ac4b958a2f)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(c49a) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d000000010048)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Rst Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(2001fb4e435d)

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (6bf0) TCP::Option_MPTCP(100252ad76cd10000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (10028ce03d14fa336cdae211d46d)

Bad DSS Checksum (MP_FAIL)

In another scenario a wrong DSS checksum is sent. In this case the server correctly acknowledges the data at the TCP level, but sends an MP_FAIL option (subtype 0x6), causing a fallback to a single subflow. Obviously, the subsequent MP_JOIN Syn is rejected.

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(f1ba)  TCP::Option_MPTCP(00811000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (0081b144931ac84d865a)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(f1ba) TCP::Option_MPTCP(00811000000000000000b144931ac84d865a)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(f1ba) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100480100)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(60008710f99bfb4e435d) TCP::Option_MPTCP(2001fb4e435d)
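Reading these dumps is easier with a tiny helper (ours, purely illustrative): since the kind/length bytes are stripped, the first nibble of each option is the MPTCP subtype, so the two options in the server’s last packet above decode as an MP_FAIL followed by a DSS.

```python
SUBTYPES = {0x0: "MP_CAPABLE", 0x1: "MP_JOIN", 0x2: "DSS", 0x3: "ADD_ADDR",
            0x4: "REMOVE_ADDR", 0x5: "MP_PRIO", 0x6: "MP_FAIL", 0x7: "MP_FASTCLOSE"}

def subtype_of(option_hex: str) -> str:
    """The dumps omit the TCP option kind/length, so the subtype is the first nibble."""
    return SUBTYPES[int(option_hex[0], 16)]

# The server's last packet above carries both options:
print(subtype_of("60008710f99bfb4e435d"))  # MP_FAIL (followed by the 64-bit DSN)
print(subtype_of("2001fb4e435d"))          # DSS (data ack only)
```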

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (ebbb) TCP::Option_MPTCP(10023d41ba9910000000)
64:  130.104.230.45 [TCP Rst Ack] -TCP::Option_MPTCP

Fall back without MP_FAIL

A fallback can also occur when the client sets the DSS Data-Level Length to 0 (“infinite mapping”): in this scenario the server acknowledges the data at both the TCP and MPTCP levels and does not send any MP_FAIL (this case is interpreted as a choice by the client rather than an anomalous event like a wrong DSS checksum), but the fallback is still evident when the client attempts to associate a new subflow and the MP_JOIN is not accepted.

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(de0a)  TCP::Option_MPTCP(00811000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (0081b6e7a1b307358b82)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(de0a) TCP::Option_MPTCP(00811000000000000000b6e7a1b307358b82)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(de0a) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100000000)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (2c6d) TCP::Option_MPTCP(10020edf2fd810000000)
64:  130.104.230.45 [TCP Rst Ack] -TCP::Option_MPTCP

Bad MP_JOIN Token

In the last scenario tested, the client sends a wrong Token in the MP_JOIN Syn. The server unsurprisingly replies with a Rst.

 0:  192.168.42.7   [TCP Syn] TCP::SourcePort(1594)  TCP::Option_MPTCP(00811000000000000000)
64:  130.104.230.45 [TCP Syn Ack] TCP::Option_MPTCP (008191d0ae47af67a0f2)

 0:  192.168.42.7   [TCP Ack]  TCP::SourcePort(1594) TCP::Option_MPTCP(0081100000000000000091d0ae47af67a0f2)
64:  *

 0:  192.168.42.7   [TCP Ack 72 bytes] TCP::SourcePort(1594) TCP::SeqNumber(01300001) TCP::Option_MPTCP(2004fb4e435d0000000100483aca)  TCP::Payload ("GET / HTTP/1.1...")
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300001) TCP::Option_MPTCP(3608200106a8308f000102163efffec5c815)
64:  130.104.230.45 [TCP Ack]  TCP::AckNumber(01300049) TCP::Option_MPTCP(2001fb4e43a5)

 0:  192.168.1.102  [TCP Syn] TCP::SourcePort (d26e) TCP::Option_MPTCP(10020200000010000000)
64:  130.104.230.45 [TCP Rst Ack] -TCP::Option_MPTCP

Mobile Tracebox

The MPTCP test described in the first part has been included in the new version of Mobile Tracebox. The screenshots show how to select the destination address and the appropriate probe. To avoid a full traceroute for every packet sent, the minimum TTL can conveniently be set to 64.

../../../_images/Screenshot_1_select_address.png ../../../_images/Screenshot_2_select_probe_category.png ../../../_images/Screenshot_3_select_probe.png ../../../_images/Screenshot_4_advanced_min_ttl.png ../../../_images/Screenshot_5_successfull_probe_output.png

Since raw sockets are needed this probe is available only on rooted Android devices.

]]>
Wed, 18 Apr 2018 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2017/07/10/ios11_options.html http://blog.multipath-tcp.org/blog/html/2017/07/10/ios11_options.html <![CDATA[Multipath TCP on iOS11 : A closer look at the TCP Options]]> Multipath TCP on iOS11 : A closer look at the TCP Options

Multipath TCP uses a variety of TCP options to use different paths simultaneously. Several Multipath TCP options are defined in RFC6824 :

  • subtype 0x0: MP_CAPABLE
  • subtype 0x1: MP_JOIN
  • subtype 0x2: DSS
  • subtype 0x3: ADD_ADDR
  • subtype 0x4: REMOVE_ADDR
  • subtype 0x5: MP_PRIO
  • subtype 0x6: MP_FAIL
  • subtype 0x7: MP_FASTCLOSE

In this blog post, we explore in more detail the packet trace collected on an iPhone using the iOS11 beta. We start our analysis with the three-way handshake. The trace contains one Multipath TCP connection. Recent versions of Wireshark support Multipath TCP and we use the tcp.options.mptcp.subtype==0 filter to match all the packets that contain the MP_CAPABLE option. This option only appears in the three packets of the initial three-way handshake. Let us first analyse the SYN sent by the iPhone. In our test over an LTE network, iOS11 beta2 advertises the following options:

  • MSS set to 1410 bytes. This is a relatively small value that was probably chosen to reduce the risk of fragmentation or Path MTU discovery problems since cellular networks often use tunnels internally
  • Selective Acknowledgements are proposed
  • The Window scale factor is set to 6 and the iPhone advertises a 64Kbytes window.
  • The Timestamp option is used as well.
  • The MP_CAPABLE option sent by the iPhone does not request the utilisation of the DSS checksum. The DSS checksum was introduced in RFC6824 to detect middlebox interference. Previous versions of iOS did not use this checksum to support Siri because Siri runs over HTTPS, which prevents most middlebox interference. However, when Multipath TCP is used to support a protocol such as HTTP, there is a risk of interference from middleboxes that inject HTTP headers. If you plan to use Multipath TCP on iOS11, you should probably rely on HTTPS rather than HTTP, for reasons that go beyond Multipath TCP.
../../../_images/syn.png

The server, in this trace the Linux implementation running on multipath-tcp.org replies with Selective Acknowledgements, Timestamps, a Window Scaling factor set to 7 and requires the utilisation of the DSS Checksum.

../../../_images/synack.png

The MP_CAPABLE option contained in the third ACK sent by the iPhone confirms that the iPhone will use the DSS checksum for this connection as requested by the server.

../../../_images/thirdack.png

The utilisation of the DSS Checksum is clearly visible in the first data packet that is sent by the iPhone. It uses 32-bit data sequence numbers and data acknowledgement numbers.

../../../_images/dss1.png

The first data packet returned by the Linux server is shown below. It also uses 32-bit data sequence and data acknowledgement numbers.

../../../_images/dss2.png

With iOS11 beta2, the iPhone uses the MP_PRIO option and sets the cellular subflow as a backup subflow. This is immediately visible in the fourth packet of the trace that is shown below.

../../../_images/mpprio.png

Apple has already explained earlier that they do not use the ADD_ADDR option because their stack is focussed on clients and they do not see a benefit in advertising client addresses since those are often behind a NAT or firewall. We did not observe ADD_ADDR or REMOVE_ADDR in our first trace.

The MP_JOIN option is used to create subflows. In our trace, this happens at time 4.74 when we enable the WiFi interface. The MP_JOIN option contains the token derived from the key advertised by the server in the MP_CAPABLE option, and its backup flag is reset. This indicates that the WiFi subflow is preferred over the cellular subflow that was initially created. It is interesting to note that the iOS11 beta advertises a larger MSS over the WiFi interface than over the cellular one. The same window scaling factor (6) is used.

../../../_images/mpjoin.png

We did not observe MP_FASTCLOSE in this trace.

We’ll discuss MP_FAIL in another post since it is related to fallbacks to TCP.

]]>
Mon, 10 Jul 2017 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2017/07/05/mptcp_experiments_on_ios_11_beta.html http://blog.multipath-tcp.org/blog/html/2017/07/05/mptcp_experiments_on_ios_11_beta.html <![CDATA[MPTCP experiments on iOS 11 beta]]> MPTCP experiments on iOS 11 beta

MPTCP support has been announced for iOS 11 during WWDC 2017. The developer documentation presents a new instance property called multipathServiceType inside the URLSessionConfiguration class that can be set to one of the constants specified in the MultipathServiceType enumeration, which is also in the URLSessionConfiguration class. The enumeration contains four constants and the documentation has a small description for each constant:

  1. none : The default service type indicating that Multipath TCP should not be used.
  2. handover : A Multipath TCP service that provides seamless handover between Wi-Fi and cellular in order to preserve the connection.
  3. interactive : A service whereby Multipath TCP attempts to use the lowest-latency interface.
  4. aggregate : A service that aggregates the capacities of other Multipath options in an attempt to increase throughput and minimize latency.

The code below shows a simple example of usage:

import Foundation

// Use an ephemeral session configuration and request the handover service type.
let config = URLSessionConfiguration.ephemeral
config.multipathServiceType = URLSessionConfiguration.MultipathServiceType.handover
let session = URLSession(configuration: config)

// Download a large file; the completion handler is elided here.
let url = URL(string: "http://multipath-tcp.org/data/uml/vmlinux_64")
let task = session.dataTask(with: url!, completionHandler: {...})
task.resume()

We will present experiments done with iOS in a series of posts on this blog. In our first experiment, we use the handover service type. We start the connection with the wifi interface down and after a few seconds, we turn on the wifi interface. The trace of the connection is available here. We use mptcptrace to see how the subflows are used. Let’s take a look at the Multipath-TCP sequence numbers over time :

../../../_images/sequence5.png

As expected, the connection starts on the mobile interface because it is the only interface available at that time. When the wifi interface becomes available, around five seconds after the start of the connection, all the traffic is immediately sent to the wifi subflow.

Let’s take a closer look at what happens during the transition around five seconds after the start of the connection:

../../../_images/zoom1.png

On this graph, MPTCP acknowledgements are pictured as blue crosses. We can see on this zoom, in the upper-left corner, that the client receives out-of-sequence (from Multipath-TCP’s perspective) packets during the transition. This is because iOS tries to terminate the connection on the mobile interface as soon as possible, while the server does not yet know that this interface should not be used anymore. Starting from packet 4647 in the trace, we can see the zero window advertisements and resets sent by the iPhone on the mobile subflow. Once the server detects that some packets will not arrive on the mobile subflow, i.e. when it receives the reset, it reinjects the packets on the wifi subflow. During the reinjections, out-of-order packets are kept in the out-of-order queue of MPTCP on the client side. To observe this out-of-sequence queue, we zoom on the top-right corner of the graph:

../../../_images/zoom2.png

On this graph, we can observe the MPTCP ACKs that cover the out-of-sequence packets received earlier. In particular we can observe a hole in the middle of the graph. If we zoom on other parts of the graph we can see several holes like this one.

This concludes our first analysis of Multipath-TCP on iOS. Stay tuned for more detailed analyses and tests. In the next posts, we will discuss the other Multipath TCP services offered by iOS11.

]]>
Wed, 05 Jul 2017 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2017/01/04/experimental.html http://blog.multipath-tcp.org/blog/html/2017/01/04/experimental.html <![CDATA[The “Experimental” status of Multipath TCP]]> The “Experimental” status of Multipath TCP

Multipath TCP is defined in RFC 6824 and I recently heard feedback from someone working in industry who mentioned that Multipath TCP should not be considered for deployment given its Experimental status. I was surprised by this comment and I think that it would be useful to clarify some facts about the maturity of Multipath TCP.

First, from an administrative viewpoint, the Experimental status of Multipath TCP was decided at the creation of the IETF MPTCP working group. At that time, it was unclear whether it would even be possible to specify a protocol like Multipath TCP, and the IESG wanted to encourage experiments with the new protocol. By selecting this option, the IESG prepared a future standardisation of the protocol, and this is happening right now with the definition of a standards-track version of Multipath TCP in RFC6824bis. According to the milestones of the IETF MPTCP working group, this revision should be ready in 2017.

Second, from a technical viewpoint, the maturity of a protocol cannot be inferred from the status of its specification. The best way to measure this maturity is to observe the interoperable implementations and the deployment of the protocol. From these two viewpoints, Multipath TCP is a clear success. There are end-host implementations on Linux, FreeBSD, Apple iOS, MacOS and Oracle Solaris. Multipath TCP is also supported on various middleboxes, including Citrix Netscaler, F5 BIG-IP LTM and Ericsson.

From a deployment viewpoint, Multipath TCP is also a huge success. Hundreds of millions of users of Apple devices (iPhone, iPad, laptops) use Multipath TCP every time they use the Siri voice recognition application. In Korea, a dozen models of high-end smartphones from Samsung and LG include a port of the reference implementation of Multipath TCP in the Linux kernel and use SOCKS proxies to bond WiFi and fast LTE. Several network operators provide those proxies as a commercial service. Other companies such as Swisscom or OVH also rely on SOCKS proxies to bond different types of links together. Another emerging use case is hybrid access networks. In various countries, network operators are required to provide fast broadband services, even in rural areas where deploying fiber is too expensive. Many of these operators want to combine their xDSL and LTE networks in order to improve the bandwidth offered to their customers. Tessares has already deployed a pilot hybrid access network solution that leverages Multipath TCP in Belgium.

]]>
Wed, 04 Jan 2017 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2016/11/13/multipath_tcp_projects_during_the_ietf97_hackathon.html http://blog.multipath-tcp.org/blog/html/2016/11/13/multipath_tcp_projects_during_the_ietf97_hackathon.html <![CDATA[Multipath TCP projects during the IETF97 Hackathon]]> Multipath TCP projects during the IETF97 Hackathon

The IETF organised a hackathon during the weekend before the IETF’97 meeting in Seoul. There are already several large scale deployments of Multipath TCP. However, these deployments focus on very specific utilisations of Multipath TCP for special applications or through various forms of proxies.

Recently, Benjamin Hesmans has released an enhanced socket API for Multipath TCP. This API has the potential of enabling new use cases for Multipath TCP by allowing application developers to control the establishment and the utilisation of the Multipath TCP subflows. To understand how this new API could be used, we organised a remote hackathon at Ecole Polytechnique de Louvain. We had two teams working on Multipath TCP during the IETF’97 Hackathon: five IETFers, including three PhD students from the IP Networking Lab, worked in Seoul, while 25 students worked in Louvain-la-Neuve on this new socket API.

../../../_images/hackathon.png

These two teams received the best overall award from the organisers of the IETF’97 Hackathon in Seoul for their effort.

The Seoul team, composed of Benjamin Hesmans, Fabien Duchene, Olivier Tilmans, SungHoon Seo and François Serman, worked on developing a library that can be pre-loaded before launching an unmodified application and that uses the new socket API to control how this application uses the underlying Multipath TCP subflows. This is described in these slides

In Louvain-la-Neuve, eight teams worked on different use cases.

Two groups of students worked on porting the Multipath TCP socket API to languages other than C. They have created prototype code for Java and Ruby. The other teams worked on curl, lighttpd, OpenSSH, iperf3 and nc.

Grégory Vander Schueren, Raphaël Bauduin and Thibault Gérondal worked on modifying Ruby to support the new Multipath TCP socket API. They obtained running code that supports some of the new socket operations directly from Ruby. Their work is summarised in these slides

../../../_images/ruby.jpg

Guillaume Demaude and Pierre Ortegat have analysed the problem of supporting the new Multipath TCP socket API in Java. It turned out that the Socket class in Java had not been designed to be extended. They have thus written static methods that implement the new socket API. Their results are summarised in these slides and their code is available from https://github.com/reirep/matcp-java.git

../../../_images/java.jpg

Hoang Tran Viet, Remi Chauvenne and Thibault Libioulle have worked on iperf3, a throughput measurement tool. They have added the support for Multipath TCP inside iperf3 and modified the application to exchange the addresses of the client and of the server that are used to perform the tests. Their results are summarised in these slides

../../../_images/iperf3.jpg

Charles-Henry Bertrand and Sylvain Dassier have explored how to modify the netcat testing tool to support Multipath TCP. Their results are summarised in these slides

../../../_images/nc.jpg

Maxime Beugom, Antoine Denauw, Alexandre Dubray, Julien Gomez and Julian Roussieau have worked on OpenSSH. Their prototype changes the underlying subflows after the transmission of a number of bytes or after some time. This prototype demonstrates that, by influencing the underlying subflows, a security application can select different paths and thus counter on-path attacks. Their results are summarised in these slides and their code is posted on https://github.com/Derwaan/openssh-portable

../../../_images/ssh.jpg

Arnaud Dethise and Jacob Eliat-Eliat have modified curl to only create Multipath TCP subflows on connections that carry a sufficient number of bytes or last a sufficient time. Experiments have shown that Multipath TCP does not bring benefits for very short flows and this demonstrates how an application can defer the establishment of subflows. Their results are summarised in these slides and their prototype code is available from https://github.com/adethise/curl/tree/mptcp

../../../_images/curl.jpg

Maxime Andries, Pablo Gonzalez Alvarez and Antoine Lambot have explored the possibility of creating subflows from the server. For this, they started from the lighttpd server and have modified it to create subflows when the web object returned by the server is large enough. Their results are summarised in these slides

../../../_images/lighttpd.jpg

Alexis Clarembeau has explored the possibility of developing a higher-level API for Multipath TCP that exposes a more abstract interface to the application. His results are summarised in these slides

../../../_images/alexis.jpg ]]>
Sun, 13 Nov 2016 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2016/08/23/mptcp_analyzer.html http://blog.multipath-tcp.org/blog/html/2016/08/23/mptcp_analyzer.html <![CDATA[MPTCP in Wireshark]]> MPTCP in Wireshark

Wireshark is a widely used network analyzer that can capture network traffic, save the captured packets (*.pcap) for later analysis and, most importantly, help with analyzing such packet traces. Wireshark supports many protocols, which means it is able to assign meaning to bytes (to dissect them, in the Wireshark nomenclature) and display them accordingly. In some cases, as in the TCP dissector, Wireshark even builds some state to provide expert information, for instance to identify TCP retransmissions. Until recently, Wireshark only supported stateless dissection of MPTCP, i.e., it could dissect MPTCP options correctly without being able to identify Multipath TCP connections.

Since November 2015 and the following patch (i.e., starting from Wireshark >= 2.1), Wireshark considers MPTCP as a separate protocol and builds state for MPTCP as well, thus mimicking TCP dissection.

../../../_images/dissection.png

This means Wireshark is now able to (provided the matching features are enabled):

  • map TCP subflows (tcp.stream) to MPTCP connections (mptcp.stream, see also mptcp.analysis.subflows)
  • list MPTCP connections
../../../_images/conversations.png
  • identify the master subflows (*mptcp.master == 1*)
  • check for mismatched keys/tokens and keys/initial data sequence numbers (IDSN)
  • etc. Start filtering packets with *mptcp.* and Wireshark autocompletion should show the different possibilities
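As an illustration, a few display filters built from the fields above (a sketch; exact field availability depends on the Wireshark version): the first matches all subflows of a given MPTCP connection, the second the packets of master subflows, and the last two the packets carrying MP_CAPABLE and DSS options.

```
mptcp.stream == 0
mptcp.master == 1
tcp.options.mptcp.subtype == 0
tcp.options.mptcp.subtype == 2
```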

Full MPTCP dissection can be quite CPU-consuming, thus some options are disabled by default and can be enabled through the menu Edit -> Preferences -> Protocols -> MPTCP.

../../../_images/options.png
  • Display relative MPTCP sequence numbers subtracts the initial data sequence number from the data sequence numbers. This works only if the initial packets carrying the keys (the three-way handshake) are captured and the Wireshark option tcp relative sequence numbers is enabled.
  • In depth analysis of data sequence signal (DSS) mappings tells Wireshark to look for the packets that sent the DSS mappings covering the current packet; Wireshark then displays a clickable item that brings you to that packet. This feature relies on interval trees (introduced especially for this feature), which can consume quite a bit of memory/CPU, so use with care!
  • Check for data duplication across subflows was intended to help detect opportunistic reinjections or redundant schedulers, but it is mostly experimental, so use with care.

Matthieu Coudron

]]>
Tue, 23 Aug 2016 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2016/01/05/mptcpnews.html http://blog.multipath-tcp.org/blog/html/2016/01/05/mptcpnews.html <![CDATA[Multipath TCP News : January 2016]]> Multipath TCP News : January 2016

Multipath TCP continues to attract interest from both academic researchers who write papers that use or improve the protocol as well as engineers from industry who are deploying new innovative services on top of this new TCP extension. In this newsletter that we’ll try to post every month on the Multipath TCP blog, we’ll summarise the main information about Multipath TCP that we have collected during the previous month. Feel free to contact Olivier Bonaventure if you would like to publish something in this newsletter.

Implementation news

The MPTCP-DEV mailing list has been pretty active during the last month. Three patches have been announced :

Three bug fixes pushed by Christoph Paasch :

A first implementation of the ADD_ADDR2 option by Fabrizio Demaria. This option was proposed in RFC6824bis and includes an HMAC to authenticate the advertised address.

Alexander Frommgen has announced a new website that can be used to verify that Multipath TCP works end-to-end : http://amiusingmptcp.de

This new website goes beyond the original http://amiusingmptcp.com that is not available anymore.

Another useful tool is an improved AndroidTracebox by Raffaele Zullo. It can be used on smartphones to detect middlebox interference in cellular and WiFi networks.

Scientific publications

December 2015 has been a busy month for scientific publications on Multipath TCP. Almost an entire session was devoted to Multipath TCP at Conext’2015 in Heidelberg with three papers :

Other papers have been posted.

IETF

The IETF mailing list has been rather quiet during the last month. One relevant draft has been updated:

This draft addresses Hybrid Access Networks, i.e. access networks that combine two different link-layer technologies, typically DSL and LTE. The Broadband Forum is developing solutions to enable network operators to efficiently use two heterogeneous networks together and some of the proposed solutions rely on Multipath TCP. This draft proposes a TCP option similar to the one proposed in Multipath in the Middle(Box) and discusses how such a solution could be used to support UDP.

]]>
Tue, 05 Jan 2016 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/12/25/commercial_usage_of_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2015/12/25/commercial_usage_of_multipath_tcp.html <![CDATA[Commercial usage of Multipath TCP]]> Commercial usage of Multipath TCP

Since the publication of RFC 6824 in January 2013, various companies have started to leverage Multipath TCP’s unique capabilities to create new innovative products. This post is a short summary of some of the publicly announced utilisations of Multipath TCP.

Multipath networks

Multipath networks is an Irish company that was the first to use Multipath TCP to bond two ADSL links or an ADSL and a wireless link. Their initial product relied on a modified home router that used the Linux Multipath TCP kernel together with OpenVPN and an HTTP proxy. The router intercepts all TCP traffic, sends it to a server running in the cloud over Multipath TCP and the server uses regular TCP to reach the final destination. Unfortunately, the company went bankrupt and the service is not sold anymore as of this writing.

VRT

VRT is the Flemish TV broadcaster in Belgium. They have designed their own cars to allow web journalists to capture videos, edit them and upload them to the VRT headquarters. Videos are large files that require a lot of bandwidth to be uploaded quickly. To allow the journalists to send their video reports as quickly as possible, the latest VRT car, called The Beast, has been equipped with three types of antennas:

  • one satellite antenna
  • several 3G antennas with the corresponding SIMs
  • several WiFi antennas

Once a video is ready, the server running in the car automatically starts all the available network interfaces and combines them thanks to Multipath TCP to upload the entire video to the VRT headquarters. This car has been used in production for more than a year at VRT.

Apple

Apple has started to use Multipath TCP on iPhones and iPads in September 2013 to support the Siri voice recognition application. Thanks to Multipath TCP, these mobile devices can better cope with losses and connectivity problems over the wireless interfaces. This deployment uses an implementation written by Apple’s engineers that is now also included in MacOS. Apple’s implementation of Multipath TCP does not include all the features of the protocol defined in RFC 6824 but it is fully interoperable with the Linux implementation.

Tessares

Tessares is a recent spinoff from UCL that was created with funding from Proximus, the Belgian network operator, and the VIVES investment fund. Its objective is to develop new innovative network services on top of Multipath TCP. The first product developed by this company is a solution for Hybrid Access Networks. Such an access network combines two different types of technologies, typically DSL and 3G/4G. It is illustrated in the figure below.

Hybrid access networks with Multipath TCP

This solution is composed of two different network devices :

  • The Hybrid CPE (HCPE)
  • The Hybrid Aggregation Gateway (HAG)

The HCPE is a CPE device that is capable of using two separate access networks. It is typically a home router that has been extended with a 3G/4G interface. Tessares provides a tuned version of the Multipath TCP implementation in the Linux kernel that has been optimised for this platform. It also includes a Multipath TCP proxy that intercepts the TCP connections established by the devices in the home network and converts them into Multipath TCP connections. Thanks to the utilisation of Multipath TCP, the devices used in the home network can use both the DSL and the 3G/4G network. The Hybrid Aggregation Gateway terminates the Multipath TCP connections and converts them into regular TCP connections so that regular servers that have not been upgraded to support Multipath TCP can be contacted.

The Broadband Forum is working on solutions to support Hybrid Access Networks. During the last Broadband World Forum in London, several companies demonstrated solutions that include the Multipath TCP implementation in the Linux kernel: Tessares, which received a highly commended award, SoftatHome, Sagemcom, Technicolor, Intel and Ericsson.

Gigapath

Gigapath is a commercial service that was launched during the summer of 2015 by Korean Telecom. In Korea, competition among network operators forces them to provide higher bandwidth mobile services. The cellular networks deployed in this country are among the fastest in the world, but this is still not sufficient. Gigapath allows smartphone users to combine their 4G and WiFi networks to reach bandwidths of 800 Mbps and more.

From a technical viewpoint, the solution deployed by KT combines Multipath TCP and the SOCKS protocol. Korean Telecom has convinced Samsung and LG Electronics to port the open-source Multipath TCP implementation in the Linux kernel to their high-end smartphones. As of December 2015, about half a dozen smartphone models from these two vendors include Multipath TCP. Each smartphone also includes a SOCKS client that intercepts all TCP connection establishments and redirects them to a SOCKS proxy running on a server managed by Korean Telecom. The SOCKS proxy uses the Multipath TCP implementation in the Linux kernel and terminates the Multipath TCP connections.

Architecture of the Gigapath service
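The SOCKSv5 redirection itself relies on a standard, simple handshake (RFC 1928). The sketch below shows, in Python, the two messages a client sends to a SOCKS proxy to open a proxied connection; it illustrates the protocol in general, not KT’s actual client code, and the function names are ours.

```python
import struct

def socks5_greeting():
    # VER=5, one authentication method offered: 0x00 (no authentication).
    return bytes([0x05, 0x01, 0x00])

def socks5_connect(host, port):
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # followed by the length-prefixed host name and the
    # destination port in network byte order.
    name = host.encode()
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack("!H", port)
```

On the Gigapath smartphones, every TCP connection establishment is rewritten into such a CONNECT request towards the proxy, which then opens the Multipath TCP connection on the device’s behalf.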

In July 2015, 5,000 users had subscribed to the Gigapath service. In November 2015, there were about 20,000 users.

Overthebox

OVH is a French cloud company that also provides DSL services. In September 2015, they announced a new product called Overthebox. This product combines Multipath TCP and SOCKS proxies to enable users to bond different DSL lines together. In contrast with the SOCKS-based solution deployed by KT, OVH did not modify the end-user devices. Instead, they provide a device that is attached to the different DSL routers that need to be combined. This device acts as the default gateway in the home network and serves as its DHCP server. Its SOCKS client intercepts all TCP connection establishments and converts the connections into Multipath TCP towards a SOCKS server running in the cloud. The SOCKS server terminates the Multipath TCP connection and creates a regular TCP connection to the final destination. In December 2015, more than 300 users were already participating in the beta, and the commercial deployment is expected in January 2016.

]]>
Fri, 25 Dec 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/12/16/mptcp_tools.html http://blog.multipath-tcp.org/blog/html/2015/12/16/mptcp_tools.html <![CDATA[New ways to verify that Multipath TCP works through your network]]> New ways to verify that Multipath TCP works through your network

The design of Multipath TCP has been heavily influenced by the middleboxes that have been deployed in a wide range of networks, notably in cellular and enterprise networks. Some of these middleboxes like regular NATs interact correctly with Multipath TCP and many Multipath TCP users work behind NATs. However, some middleboxes, such as firewalls or TCP optimisers, terminate TCP connections or interfere with TCP options and thus interact badly with Multipath TCP.

Several tools can be used to verify that Multipath TCP works through a given network. If you have installed a Multipath TCP enabled kernel, you can simply use curl and issue the following command :

curl http://www.multipath-tcp.org

The webserver that supports http://www.multipath-tcp.org has been configured to send a special response to an HTTP request with the curl User-Agent. If the request is sent over a regular TCP connection, the server replies with :

Nay, Nay, Nay, your have an old computer that does not speak MPTCP. Shame on you!

If the HTTP request is sent over a Multipath TCP connection, the server replies with :

Yay, you are MPTCP-capable! You can now rest in peace.

This is a basic test that is often used to validate the correct installation of a Multipath TCP enabled Linux kernel.
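The same check can be scripted. Here is a minimal Python sketch (the helper names are ours) that mimics curl’s User-Agent, since the server keys its answer on it, and classifies the reply:

```python
import urllib.request

def is_mptcp_reply(body):
    # The server answers "Yay, ..." over Multipath TCP
    # and "Nay, Nay, Nay, ..." over regular TCP.
    return body.lstrip().startswith("Yay")

def check_mptcp(url="http://www.multipath-tcp.org"):
    # Mimic curl's User-Agent so the server sends its special response.
    req = urllib.request.Request(url, headers={"User-Agent": "curl/7.50.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return is_mptcp_reply(resp.read().decode(errors="replace"))
```

Note that the result only tells you whether this machine’s own kernel negotiated Multipath TCP with that particular server; it says nothing about other paths or ports.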

However, many users are interested in simpler tests through a web interface or through a smartphone application. Two young researchers have recently released two useful tools.

http://amiusingmptcp.com was the first website created to verify that Multipath TCP was working correctly. Unfortunately, it is no longer up and running. Alexander Frommgen and his colleagues at TU Darmstadt have posted an updated version of this website. In addition to verifying that the web page is served over a Multipath TCP connection, the new website also checks whether Multipath TCP passes correctly through other ports. You can test it at http://amiusingmptcp.de. Other tests will be added soon.

Screenshot of http://amiusingmptcp.de

Another option is tracebox. This command-line tool performs traceroute-like tests with different TCP options to verify whether they pass through middleboxes. tracebox works well on Linux and MacOS, but not yet on smartphones.

Raffaele Zullo, a student at the University of Napoli in Italy, has spent several months at the University of Liege working with Benoit Donnet. During his internship, he developed a new version of tracebox that runs on Android smartphones. It requires a rooted smartphone, but does not need a Multipath TCP kernel on the smartphone. You can download it from

https://play.google.com/store/apps/details?id=be.ac.ulg.mobiletracebox

Screenshot of Tracebox on Android ]]>
Wed, 16 Dec 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/10/27/adoption.html http://blog.multipath-tcp.org/blog/html/2015/10/27/adoption.html <![CDATA[Measuring the adoption of Multipath TCP is not so simple…]]> Measuring the adoption of Multipath TCP is not so simple…

In September 2015, a Google alert announced a new workshop paper entitled An Early Look at Multipath TCP Deployment in the Wild. The paper abstract was intriguing, with sentences like We find that less than 0.1% of Alexa unique domains and IP addresses currently support MPTCP. Their geographic distribution is quite different from that of clients reported in other studies, with the majority of domains being in China. Based on the abstract and the results reported in the paper, one could assume that Multipath TCP has already been deployed on various Internet servers, and the paper lists several important websites in China that are supposed to support Multipath TCP.

Unfortunately, the initial measurements reported in this workshop paper were flawed. Most studies on the deployment of TCP extensions have used a network scanner (zmap in this case) to send SYN segments containing a specific TCP option. If the server replies with the same option, it is assumed to support the TCP extension. The authors of this paper applied the same methodology to Multipath TCP. Unfortunately, simply looking at the presence of an option in the SYN+ACK is not sufficient, because some middleboxes simply echo any option that they receive. This problem has been discussed on the Multipath TCP mailing list in the past and has influenced the design of Multipath TCP. It is notably described in the article that proposed the tracebox debugging tool.

With tracebox, it is easy to check whether a server really supports Multipath TCP. Let us start with multipath-tcp.org :

sudo tracebox -n -v -p "IP/TCP/MPCAPABLE" multipath-tcp.org
tracebox to 130.104.230.45 (multipath-tcp.org): 64 hops max
...
...
...
9: 130.104.230.45 TCP::SrcPort (47416 -> 80) TCP::DstPort (80 -> 47416)
   TCP::SeqNumber (554672918 -> 3111441317) TCP::AckNumber (0 -> 554672919)
   TCP::DataOffset (8 -> 9) TCP::Flags (( SYN ) -> ( SYN ACK ))
   TCP::WindowsSize (5840 -> 28800) TCP::CheckSum (0x5eb9 -> 0x206d)
   IP::TotalLength (52 -> 56) IP::Identification (0x50ab -> 0x0)
   IP::Flags (0 -> 2) IP::TTL (9 -> 57) IP::CheckSum (0x406 -> 0x1879)
   IP::SourceIP (192.168.0.9 -> 130.104.230.45)
   IP::DestinationIP (130.104.230.45 -> 192.168.0.9)
   +TCPOptionMaxSegSize < TCPOptionMaxSegSize (4 bytes) :: Kind = 2 , Length = 4 , MaxSegSize = 1380 , >
   TCPOptionMPTCPCapable::Sender's Key (Sender's Key = 692439777126907904 -> Sender's Key = 17898842517462319104)

The tracebox command is used to send a SYN segment with the MP_CAPABLE option. The output above (only the last hop, the interesting one, is shown) indicates that the server has replied by adding the MSS option that was not present in the SYN. Furthermore, the SYN+ACK includes the MP_CAPABLE option with a different key than the one sent in the SYN segment.

The same test towards a server that does not (yet ?) support Multipath TCP is shown below :

sudo tracebox -n -v -p "IP/TCP/MPCAPABLE" google.com
tracebox to 109.88.203.231 (google.com): 64 hops max
...
...
...
5: 109.88.203.231 TCP::SrcPort (12002 -> 80) TCP::DstPort (80 -> 12002)
TCP::SeqNumber (304251108 -> 752364946) TCP::AckNumber (0 -> 304251109)
TCP::DataOffset (8 -> 6) TCP::Flags (( SYN ) -> ( SYN ACK ))
TCP::WindowsSize (5840 -> 29200) TCP::CheckSum (0x6797 -> 0xf71)
IP::TotalLength (52 -> 44) IP::Identification (0x7c31 -> 0xaeac)
IP::TTL (5 -> 60) IP::CheckSum (0xbd6 -> 0xd62e)
IP::SourceIP (192.168.0.9 -> 109.88.203.231)
IP::DestinationIP (109.88.203.231 -> 192.168.0.9)
+TCPOptionMaxSegSize < TCPOptionMaxSegSize (4 bytes) :: Kind = 2 , Length = 4 , MaxSegSize = 1460 , >
-TCPOptionMPTCPCapable < TCPOptionMPTCPCapable (12 bytes) :: Kind = 30 , Length = 12 , Subtype = 0 , Version = 0 , Checksum = 1 (Checksum Enabled) , Flags = 0 , Crypto = 1 (HMAC-SHA1) , Sender's Key = Sender's Key = 1674215399152943104 , >

Here, the - (minus) sign before the MP_CAPABLE option indicates that this option was not included in the SYN+ACK. This server clearly does not support Multipath TCP.

Now, let us perform the same test to one server located in China :

sudo tracebox -n -v -p "IP/TCP/MPCAPABLE" cnzz.com
tracebox to 42.156.162.55 (cnzz.com): 64 hops max
...
...
...
23: 42.156.162.55 TCP::SrcPort (7487 -> 80) TCP::DstPort (80 -> 7487)
TCP::SeqNumber (915764810 -> 316943796) TCP::AckNumber (0 -> 915764811)
TCP::Flags (( SYN ) -> ( SYN ACK )) TCP::CheckSum (0x61b5 -> 0x210c)
IP::TTL (23 -> 46) IP::CheckSum (0xaf7c -> 0xcc48) IP::SourceIP
(192.168.0.9 -> 42.156.162.55) IP::DestinationIP
(42.156.162.55 -> 192.168.0.9)

Here, there is no - (minus) sign before the MP_CAPABLE option, which indicates that this option has been echoed unmodified by this server. A valid Multipath TCP server would not select the same key as the client, but a middlebox that echoes unknown options would…
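This gives a simple heuristic to tell a genuine Multipath TCP server from an echoing middlebox: compare the key returned in the SYN+ACK with the one placed in the SYN. A sketch of that decision logic in Python (function name is ours):

```python
def classify_mp_capable(sent_key, received_key):
    """Classify a SYN+ACK observed after sending a SYN with MP_CAPABLE.

    sent_key is the 64-bit key placed in the SYN; received_key is the
    key found in the SYN+ACK's MP_CAPABLE option, or None if the
    option was absent from the SYN+ACK.
    """
    if received_key is None:
        return "no MPTCP support"      # option stripped or ignored
    if received_key == sent_key:
        return "echoing middlebox"     # a real server picks its own key
    return "genuine MPTCP server"
```

This is essentially the correction the authors of the scan had to apply to their methodology.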

Note that this behaviour is not specific to the Multipath TCP option. Any unknown option is echoed by these middleboxes :

sudo tracebox -n -v -p "IP/TCP/TCPOption.new{kind=222, length=4, data={249,137}}" baidu.com
tracebox to 111.13.101.208 (baidu.com): 64 hops max
...
...
...
20: 111.13.101.208 TCP::SrcPort (46253 -> 80) TCP::DstPort (80 -> 46253)
TCP::SeqNumber (569044298 -> 733320366) TCP::AckNumber(0 -> 569044299)
TCP::Flags (( SYN ) -> ( SYN ACK )) TCP::CheckSum (0x57be -> 0x9749)
IP::TTL (20 -> 47) IP::CheckSum (0x9426 -> 0xa4fa)
IP::SourceIP (192.168.0.9 -> 111.13.101.208)
IP::DestinationIP (111.13.101.208 -> 192.168.0.9)

A regular server, in contrast, does not echo this unknown option in its SYN+ACK :

sudo tracebox -n -v -p "IP/TCP/TCPOption.new{kind=222, length=4, data={249,137}}" multipath-tcp.org
tracebox to 130.104.230.45 (multipath-tcp.org): 64 hops max
...
...
...
9: 130.104.230.45 TCP::SrcPort (51065 -> 80) TCP::DstPort (80 -> 51065)
TCP::SeqNumber (10489234 -> 3886645051) TCP::AckNumber (0 -> 10489235)
TCP::Flags (( SYN ) -> ( SYN ACK )) TCP::WindowsSize (5840 -> 29200)
TCP::CheckSum (0xb23c -> 0xc02c) IP::Identification (0x2b81 -> 0x0)
IP::Flags (0 -> 2) IP::TTL (9 -> 57) IP::CheckSum (0x3130 -> 0x1885)
IP::SourceIP (192.168.0.9 -> 130.104.230.45)
IP::DestinationIP (130.104.230.45 -> 192.168.0.9)
-TCPOption < TCPOption (4 bytes) :: Kind = 222 , Length = 4 , Payload = \xf9\x89>
+TCPOptionMaxSegSize < TCPOptionMaxSegSize (4 bytes) :: Kind = 2 , Length = 4 , MaxSegSize = 1380 , >

Here, the MSS option has been added and the unknown option has been removed.

After some discussions, the authors of this scan updated their methodology and now correctly distinguish between real Multipath TCP enabled servers and middleboxes that echo Multipath TCP options. They even provide an interesting dashboard that summarises their measurements at

https://academic-network-security.research.nicta.com.au/mptcp/deployment/

]]>
Tue, 27 Oct 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/09/07/netperfmeter.html http://blog.multipath-tcp.org/blog/html/2015/09/07/netperfmeter.html <![CDATA[NetPerfMeter : A Network Performance Metering Tool]]> NetPerfMeter : A Network Performance Metering Tool

Introduction

A common problem when evaluating multiple transport protocols in a multi-platform environment is to have a test tool that is capable of running in all these environments and, of course, of supporting all the necessary protocols. Using different evaluation tools is not a good solution, since each tool may introduce its own, and possibly incompatible, parametrisation scheme. To overcome this problem, originally for the use case of evaluating the Stream Control Transmission Protocol (SCTP) and comparing it to the Transmission Control Protocol (TCP), NetPerfMeter was designed and developed.

What is NetPerfMeter?

NetPerfMeter [3][5] is an open-source, multi-platform transport protocol performance evaluation software. It currently supports the Linux, FreeBSD and MacOS platforms (with the possibility of easily extending it to further platforms), and the transport protocols SCTP, TCP including Multi-Path TCP (MPTCP, if supported by the operating system), UDP (User Datagram Protocol) and DCCP (Datagram Congestion Control Protocol, if supported by the operating system). The figure below presents the NetPerfMeter protocol stack.

../../../_images/EN-NetPerfMeter-ProtocolStack.png

In each direction, a data channel can be operated in saturated mode (send as much as possible; not available for UDP-based data channels) or in non-saturated mode. In the non-saturated mode, the traffic is configured in the form of frame rates and frame sizes. That is, frames of a given size are generated at given intervals. Both the frame rate and the frame size can be randomised as well.

The performance of the data channels (bandwidth, delay, jitter, etc.) is evaluated and recorded at both the active and the passive side. At the end of a measurement, all collected data is transferred over the control channel to the active side, so a user can conveniently collect all results there. Further details on NetPerfMeter can be found in [3]. [3] and [4] also introduce an OMNeT++ simulation model for NetPerfMeter that can be used in simulations, in order to easily compare simulation results and real-world measurements. [1], [2] and [6] provide examples of NetPerfMeter usage for SCTP and MPTCP protocol performance analyses.

Testing MPTCP with NetPerfMeter

In the following, a short tutorial for using NetPerfMeter with MPTCP is provided. It explains the basic features to start experimenting. A more detailed overview of all possible options can be found in the manpage of NetPerfMeter.

Starting the Passive Side

First, the passive side of NetPerfMeter needs to be started:

netperfmeter 9000

This command starts NetPerfMeter in server mode, waiting for control channel connections on port 9000.

Starting the Active Side

One Non-Saturated TCP Flow

Let us start with just one TCP flow:

netperfmeter <passive side>:9000 -tcp const10:const1460:const0:const0

This command starts NetPerfMeter in client mode, connecting to the given peer (given by hostname or IP address), port 9000. A TCP data channel is established, sending 10 frames/s with 1460 bytes/frame from the active to the passive side. The second parameter block specifies 0 frames/s with 0 bytes/frame (that is, no data) in the opposite direction. The measurement ends when stopped by SIGINT (i.e. Ctrl+C).
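For reference, the application-level rate of such a non-saturated flow is simply the product of the frame rate and the frame size. A quick back-of-the-envelope check in Python (our helper, ignoring TCP/IP header overhead):

```python
def goodput_bits_per_s(frames_per_s, bytes_per_frame):
    # Application-level payload rate; headers are not counted.
    return frames_per_s * bytes_per_frame * 8

# 10 frames/s of 1460 bytes each, as in the example above:
rate = goodput_bits_per_s(10, 1460)   # 116800 bit/s, i.e. ~116.8 kbit/s
```

This makes it easy to size a non-saturated flow to a target bitrate before starting a measurement.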

One Saturated TCP Flow

Making the TCP flow saturated (that is, sending as much as possible) is as easy as setting the frame rate to 0 frames/s. Each frame again has a size of 1460 bytes:

netperfmeter <passive side>:9000 -tcp const0:const1460:const0:const0

A Bidirectional TCP Flow

In order to also transmit data in the direction from passive side to active side, just update the parameters:

netperfmeter <passive side>:9000 -tcp const0:const1460:const10:const1460

A fixed frame size of 1460 bytes may not be useful in all scenarios. Therefore, NetPerfMeter provides random distributions as well:

  • exp<average>: Negative exponential distribution with given average. Example: exp1000.
  • uniform<lower>,<upper>: Uniform distribution from lower to upper bound. Example: uniform500,25000.

Random distributions can of course be used for the frame rate as well.
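These two distributions are easy to reproduce if you want to generate a comparable synthetic workload in your own scripts. A sketch in Python that parses NetPerfMeter-style specifications (our code, not NetPerfMeter's):

```python
import random

def frame_size(spec):
    """Draw one frame size from a NetPerfMeter-style specification.

    Supported here: 'const<n>', 'exp<average>' and
    'uniform<lower>,<upper>', mirroring the options described above.
    """
    if spec.startswith("const"):
        return int(spec[5:])
    if spec.startswith("exp"):
        # Negative exponential distribution with the given average.
        return int(random.expovariate(1.0 / float(spec[3:])))
    if spec.startswith("uniform"):
        lower, upper = (float(x) for x in spec[7:].split(","))
        return int(random.uniform(lower, upper))
    raise ValueError("unknown distribution: " + spec)
```

The same parsing applies to frame rates, since NetPerfMeter accepts the same distribution syntax there.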

Multi-Path Transport

NetPerfMeter is also able to turn MPTCP on or off per socket (provided that the implementation supports the TCP_MULTIPATH_ENABLE socket option). MPTCP can then be enabled with the “cmt=mptcp” option:

netperfmeter <passive side>:9000 -tcp const0:const1460:const0:const0:cmt=mptcp

MPTCP versus TCP

Of course, NetPerfMeter can use multiple data channels as well:

netperfmeter <passive side>:9000 \
    -tcp const0:const1460:const0:const0:cmt=mptcp \
    -tcp const0:const1460:const0:const0:cmt=off

In this case, NetPerfMeter starts one MPTCP flow and one concurrent regular TCP flow. “cmt=off” turns MPTCP off (this is the default; simply not specifying the “cmt” option has the same effect).

Recording Statistics

One of the most important features for researchers is, of course, to easily get machine-readable results files. Two parameters control the generation of such files :

  • -scalar=<prefix>.<suffix> : Generates files named <prefix>-<active|passive>-<flow ID>-<stream ID>.<suffix> with aggregates over the whole measurement. The scalar file format is compatible with OMNeT++ scalar files.
  • -vector=<prefix>.<suffix> : Generates files named <prefix>-<active|passive>-<flow ID>-<stream ID>.<suffix> with per-frame statistics. The vector files are text tables that can, for example, be processed by GNU R or GNU Plot.

For convenience, if the suffix ends with .bz2, the results file is BZip2-compressed on the fly. To automate measurements, the -runtime=<seconds> option specifies the duration of the measurement. For example, the following command would run a 60 s MPTCP versus TCP comparison and record scalars as well as vectors :

netperfmeter <passive side>:9000 \
   -runtime=60 \
   -scalar=scalars.sca.bz2 \
   -vector=vectors.vec.bz2 \
   -tcp const0:const1460:const0:const0:cmt=mptcp \
   -tcp const0:const1460:const0:const0:cmt=off
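The files produced by the command above follow the <prefix>-<active|passive>-<flow ID>-<stream ID>.<suffix> pattern described earlier. A small helper to predict the names (ours, for illustration only, not part of NetPerfMeter):

```python
def result_file_name(name, side, flow_id, stream_id):
    # Split e.g. "scalars.sca.bz2" into prefix "scalars"
    # and suffix "sca.bz2", then rebuild the per-flow file name.
    prefix, suffix = name.split(".", 1)
    return "{}-{}-{}-{}.{}".format(prefix, side, flow_id, stream_id, suffix)
```

Such a helper is handy when post-processing a batch of measurements, since each flow and stream ends up in its own file on both the active and the passive side.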

Conclusion

NetPerfMeter is a convenient and flexible open source tool for transport protocol performance analysis. It particularly provides multi-platform support and works with TCP, SCTP, UDP as well as DCCP.

../../../_images/EN-Logo-NetPerfMeter.png

References

[1] Dreibholz, T.; Zhou, X. and Fa, F.: “Multi-Path TCP in Real-World Setups – An Evaluation in the NorNet Core Testbed”, in 5th International Workshop on Protocols and Applications with Multi-Homing Support (PAMS), pp. 617–622, Gwangju/South Korea, March 2015.

[2] Dreibholz, T.; Adhari, H.; Becke, M. and Rathgeb, E. P.: “Simulation and Experimental Evaluation of Multipath Congestion Control Strategies”, in Proceedings of the 2nd International Workshop on Protocols and Applications with Multi-Homing Support (PAMS), Fukuoka/Japan, March 2012.

[3] Dreibholz, T.: “Evaluation and Optimisation of Multi-Path Transport using the Stream Control Transmission Protocol”, Habilitation Treatise, University of Duisburg-Essen, Faculty of Economics, Institute for Computer Science and Business Information Systems, March 2012.

[4] Dreibholz, T.; Adhari, H.; Becke, M. and Rathgeb, E. P.: “NetPerfMeter – A Versatile Tool for Multi-Protocol Network Performance Evaluations”, OMNeT++ Code Contribution, University of Duisburg-Essen, Institute for Experimental Mathematics, February 2012.

[5] Dreibholz, T.; Becke, M.; Adhari, H. and Rathgeb, E. P.: “Evaluation of A New Multipath Congestion Control Scheme using the NetPerfMeter Tool-Chain”, in Proceedings of the 19th IEEE International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pp. 1–6, Hvar/Croatia, September 2011.

[6] Adhari, H.; Dreibholz, T.; Becke, M.; Rathgeb, E. P. and Tüxen, M.: “Evaluation of Concurrent Multipath Transfer over Dissimilar Paths”, in Proceedings of the 1st International Workshop on Protocols and Applications with Multi-Homing Support (PAMS), pp. 708–714, Singapore, March 2011.

]]>
Mon, 07 Sep 2015 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2015/07/24/korea.html http://blog.multipath-tcp.org/blog/html/2015/07/24/korea.html <![CDATA[In Korean, Multipath TCP is pronounced GIGA Path]]> In Korean, Multipath TCP is pronounced GIGA Path

In September 2013, Apple surprised the networking community by enabling Multipath TCP on all iOS devices. The main motivation for this deployment was to support Apple’s voice recognition application and enable it to work seamlessly over both WiFi and cellular networks. Multipath TCP is a good match for this application, but it can also be used for other use cases.

At IETF’93 in Prague, SungHoon Seo provided several very interesting details about the Gigapath commercial service that is now sold by KT. This service enables smartphone users to reach bandwidths of up to 1 Gbps on existing smartphones; it is probably the fastest commercially deployed mobile network. This high bandwidth is achieved by combining both fast LTE (with carrier aggregation) and fast WiFi networks on Multipath TCP enabled smartphones. At this stage, only the Samsung Galaxy S6 and Galaxy S6 Edge smartphones support the Gigapath service, but KT is working with other vendors to add Multipath TCP to their smartphones. Measurements presented at the MPTCP Working Group meeting revealed that current smartphones are able to reach throughputs of about 800 Mbps out of a theoretical maximum of 1.17 Gbps.

What is more impressive is how the system has been implemented and how users can benefit from it. The figure below, extracted from SungHoon Seo’s presentation, shows the general architecture of the GIGA Path system.

../../../_images/kt.png

On the client side, the smartphones include the open-source Multipath TCP implementation in the Linux kernel. Samsung reused release 0.89.4 and backported it into their Android kernel. The full source code of their Multipath TCP kernel is available online.

Enabling Multipath TCP on the smartphone is only the first step in deploying it. It is not sufficient, since very few servers support Multipath TCP today. To enable their users to benefit from Multipath TCP for all the applications that they use, KT has opted for a SOCKSv5 proxy. This proxy runs on x86 servers using release 0.89.5 of the open-source Multipath TCP implementation in the Linux kernel. During the presentation, SungHoon Seo mentioned that, despite the recent rollout of the service, there were already 5,500 active users on the SOCKS proxy the last time he checked. Thanks to this proxy, the subscribers of the Giga Path service in Korea can benefit from Multipath TCP with all the TCP-based applications that they use.

At the end of KT’s presentation, another network engineer mentioned that he would go back to his management and propose a similar approach to deploy Multipath TCP in his own network. We can thus expect other large-scale deployments in the coming months.

]]>
Fri, 24 Jul 2015 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2015/04/09/bibliography.html http://blog.multipath-tcp.org/blog/html/2015/04/09/bibliography.html <![CDATA[A closer look at the scientific literature on Multipath TCP]]> A closer look at the scientific literature on Multipath TCP

Some time ago, @ben_pfaff sent a tweet indicating that he was impressed by a Google Scholar search that returns more than 1,000 hits for “open vswitch”. This triggered my curiosity, and I wondered what the current impact of Multipath TCP in the scientific literature was.

Google Scholar clearly indicates that there is a Multipath TCP effect in the academic community. For the “Multipath TCP” query, Google Scholar already lists more than 1,500 results.

../../../_images/mptcp1.png

For the “mptcp” query, Google Scholar reports a similar number of documents.

../../../_images/mptcp2.png

This is a clear indication that a growing number of researchers are adopting Multipath TCP and proposing extensions and improvements to it. Researchers who start to work on Multipath TCP need to understand an already large set of articles. As a first step towards a detailed bibliography on Multipath TCP, I have started to collect notes on IETF documents and scientific papers on this topic. This is a work in progress that will be updated every time I find some time to dig into older papers or new interesting papers are published.

The current version of the annotated bibliography will always be available from https://github.com/obonaventure/mptcp-bib. This repository contains all the LaTeX and BibTeX files and could be useful to anyone doing research on Multipath TCP. The pdf version (mptcp-bib.pdf) will also be updated after each major change, for those who prefer pdf documents.

]]>
Thu, 09 Apr 2015 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2015/03/25/multipath_tcp_videos.html http://blog.multipath-tcp.org/blog/html/2015/03/25/multipath_tcp_videos.html <![CDATA[Interesting Multipath TCP talks]]> Interesting Multipath TCP talks

Various tutorials and trainings on Multipath TCP have been given during the last years. Some of these have been recorded and are available on youtube.com.

The most recent video is the talk given by Octavian Purdila from Intel at the netdev’01 conference. In this talk, Octavian first gives a brief tutorial on Multipath TCP targeted at Linux networking kernel developers and then describes in detail the structure of the current code and the plans for upstreaming it to the official Linux kernel.

https://www.youtube.com/watch?v=wftz2cU5SZs

A longer tutorial on the Multipath TCP protocol was given by Olivier Bonaventure at IETF’87 in Berlin in August 2013.

https://www.youtube.com/watch?v=Wp0Kr3B64tA

Christoph Paasch gave a shorter Multipath TCP tutorial earlier during FOSDEM’13 in Brussels.

https://www.youtube.com/watch?v=wvO0bcWgXCs

Earlier, Costin Raiciu and Christoph Paasch gave a one-hour Google Research talk on the design of the protocol and several use cases.


https://www.youtube.com/watch?v=02nBaaIoFWU

The Google Research talk was given a few days after the presentation of the USENIX NSDI’12 paper that received the community award. That presentation is available from the USENIX website.

]]>
Wed, 25 Mar 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/06/mptcptrace_demo_experiment_five.html http://blog.multipath-tcp.org/blog/html/2015/02/06/mptcptrace_demo_experiment_five.html <![CDATA[mptcptrace demo, experiment five]]> mptcptrace demo, experiment five

This is the fifth post of a series of five. Context is presented in the first post. The second post is here. The third post is here. The fourth post is here.

Fifth experiment

Green at 0s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Green at 5s:
  • delay : 100ms
  • bandwidth : 4mbit/s
  • loss : 10%
Green at 15s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Red:
  • delay : 40ms
  • bandwidth : 4mbit/s
Client:
  • Scheduler : Round robin

For the last experiment of this series, we come back to the third experiment; instead of adding a 1% loss rate on the red subflow after 15 seconds, we change the MPTCP scheduler and use round-robin. It is worth noting that the round-robin scheduler still respects the congestion window of the subflows.
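The behaviour of such a scheduler can be sketched in a few lines: pick the subflows in turn, but skip any subflow whose congestion window is already full. This is a simplified model for illustration, not the kernel code:

```python
def rr_pick(subflows, last):
    """Round-robin over subflows, respecting each congestion window.

    subflows: list of dicts with 'cwnd' and 'in_flight' (in segments).
    last: index of the previously used subflow. Returns the index of
    the next usable subflow, or None if every window is full.
    """
    n = len(subflows)
    for i in range(1, n + 1):
        candidate = (last + i) % n
        sf = subflows[candidate]
        if sf["in_flight"] < sf["cwnd"]:
            return candidate
    return None
```

This explains why, even during the lossy period, some data keeps being sent on the green path: the scheduler only skips a subflow when its congestion window leaves no room, not when the path is merely slower or lossier.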

Let’s see the evolution of the sequence number :

../../../_images/sequence4.png

The first thing that we can see is the small steps between 5 and 15 seconds. We also get the impression that the red subflow is used more, but if we zoom in :

../../../_images/sequence_zoom4.png
../../../_images/sequence_zoom_21.png

we can confirm that both subflows are used. We see 3 lines :

  1. The segments : because they are sent at the same time by the sender, the red and green subflows do not form two separate lines
  2. The green acks : we can see that they are closer to the segment line
  3. The red acks : we can see that all red acks arrive late from the MPTCP point of view. This is normal, since the green delay is shorter.

If we look at the evolution of the sequence number between 5 and 15 seconds, we can observe a series of stairs.

../../../_images/sequence_zoom_3.png

If we take a close look at one of these stairs :

../../../_images/sequence_zoom_4.png

Because the green subflow is lossy during this period, we have reordering. And because we use the round-robin scheduler, MPTCP still decides to send some data over the green path.

If we now take a look at the evolution of the goodput :

../../../_images/gput4.png

We can see the perturbation caused by the lossy link on the “instantaneous” goodput.

However, the impact on the average goodput is somewhat mitigated. Depending on the application, these variations may or may not be problematic.

../../../_images/flight2.png

If we look at the evolution of the MPTCP unacked data, we see a lot of variation during the period from 5 to 15 seconds. This is due to the reordering that happens during this period. It is not a big issue as long as the receive window is big enough to absorb these variations, but it may become one if the window is too small. We may also remark that, in this case, MPTCP may use more memory on the receiver due to the buffer auto-tuning.

Finally, we can take a look at the evolution of the unacked data at the TCP level.

../../../_images/flightpf2.png

We can observe that both subflows are used during the whole connection, but the losses on the green subflow between 5 and 15 seconds lead to a heavier use of the red subflow during this period.

Conclusion

This ends the series of posts showing some basic MPTCP experiments. mptcptrace has been used to extract the values from the traces, and R scripts have been used to produce the graphs. However, we did not really post-process the data in R. We have more experiments and visualisations that we will present later.

]]>
Fri, 06 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/05/synslow.html http://blog.multipath-tcp.org/blog/html/2015/02/05/synslow.html <![CDATA[Why are there sometimes long delays before the establishment of MPTCP connections ?]]> Why are there sometimes long delays before the establishment of MPTCP connections ?

Multipath TCP users sometimes complain that Multipath TCP connections are established after a longer delay than regular TCP connections. This can happen in some networks, and the culprit is usually a middlebox hidden on the path between the client and the server. The problem can easily be detected by capturing the packets on the client with tcpdump. Such a capture looks like :

11:24:05.225096 IP client.59499 > multipath-tcp.org.http:
      Flags [S], seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7358805 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

The client sends a SYN with the MP_CAPABLE option… Since it receives no answer, it retransmits the SYN.

11:24:06.224215 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7359055 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

And, unfortunately, twice more…

11:24:08.228242 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7359556 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

11:24:12.236284 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7360558 ecr 0,nop,wscale 7,
      mptcp capable csum {0xaa7fa775d16fa6bf}], length 0

At this point, Multipath TCP considers that there could be a middlebox that discards SYN segments with the MP_CAPABLE option on the path to reach the server and disables Multipath TCP.

11:24:20.244351 IP client.59499 > multipath-tcp.org.http: Flags [S],
      seq 270592032, win 29200, options [mss 1460,sackOK,
      TS val 7362560 ecr 0,nop,wscale 7], length 0

This segment immediately reaches the server that replies :

11:24:20.396718 IP multipath-tcp.org.http > client.59499: Flags [S.],
      seq 3954135908, ack 270592033, win 28960, options [mss 1380,sackOK,
      TS val 2522075773 ecr 7362560,nop,wscale 7], length 0
11:24:20.396748 IP client.59499 > multipath-tcp.org.http: Flags [.],
      ack 1, win 229, options [nop,nop,TS val 7362598 ecr 2522075773], length 0

As shown by the trace, the middlebox, by dropping the SYN segments containing the MP_CAPABLE option, has delayed the establishment of the TCP connection by fifteen seconds. This delay is controlled by the initial retransmission timer (one second in this example) and the exponential backoff that TCP applies to successive retransmissions of the same segment: 1 + 2 + 4 + 8 = 15 seconds elapse before the fifth SYN, the first one sent without the MP_CAPABLE option.

What can Multipath TCP users do to reduce this delay ?

  • the best answer is to contact their sysadmins/network administrators, use a tool like tracebox to detect where packets with the MP_CAPABLE option are dropped, and upgrade this middlebox
  • if changing the network is not possible, the Multipath TCP implementation in the Linux kernel can be configured to fall back more aggressively to regular TCP through the net.mptcp.mptcp_syn_retries configuration variable described on http://multipath-tcp.org/pmwiki.php/Users/ConfigureMPTCP. This variable controls the number of retransmissions of the initial SYN before the MP_CAPABLE option is dropped (the default is 3)
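For example, on a host running the multipath-tcp.org kernel, the variable can be lowered with sysctl so that the stack gives up on MP_CAPABLE after a single retransmission. This is only a sketch: it requires root and a kernel that exposes this sysctl.

```shell
# Check the current number of SYN retransmissions with MP_CAPABLE (default: 3)
sysctl net.mptcp.mptcp_syn_retries

# Fall back to regular TCP after a single MP_CAPABLE SYN retransmission,
# reducing the worst-case extra delay from ~15 s to ~1 s
sysctl -w net.mptcp.mptcp_syn_retries=1

# To make the change persistent across reboots, add to /etc/sysctl.conf:
#   net.mptcp.mptcp_syn_retries=1
```

The trade-off is that a single lost SYN (e.g. on a congested link) now also disables Multipath TCP for that connection.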
]]>
Thu, 05 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/05/mptcptrace_demo_experiment_four.html http://blog.multipath-tcp.org/blog/html/2015/02/05/mptcptrace_demo_experiment_four.html <![CDATA[mptcptrace demo, experiment four]]> mptcptrace demo, experiment four

This is the fourth post of a series of five. Context is presented in the first post. The second post is here. The third post is here.

Fourth experiment

Green at 0s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Green at 5s:
  • delay : 100ms
  • bandwidth : 4mbit/s
  • loss : 10%
Green at 15s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Red at 0:
  • delay : 40ms
  • bandwidth : 4mbit/s
Red at 15:
  • delay : 40ms
  • bandwidth : 4mbit/s
  • loss : 1%
Client:
  • Scheduler : default

In this fourth experiment, we change the loss rate of the red path to 1% after 15 seconds.

Again we take a look at the evolution of the sequence number.

../../../_images/sequence3.png

In this case, however, we can see the shift after 15 seconds. Because the red link becomes lossy after 15 seconds, the congestion window of the red subflow shrinks and MPTCP needs to send data on the green subflow again when the congestion window of the red subflow becomes too small to sustain the application rate. Because MPTCP now sends a little traffic on the green subflow, it realises that the green subflow has changed and now has a lower delay and a lower loss rate. As a consequence, the green subflow opens its congestion window again until it is large enough to sustain the application rate.

../../../_images/sequence_zoom3.png
../../../_images/sequence_zoom_2.png

If we take a look at the evolution of the goodput, we see the two shifts at 5s and at 15s

../../../_images/gput3.png
../../../_images/gput_zoom2.png
../../../_images/gput_zoom_2.png

The evolution of the MTPCP unacked data is shown below :

../../../_images/flight1.png

We can see the change after 15 seconds: we use less of the receive window after that point. Because the red link is lossy, we consume less of the receive window. In our case the receive window is big enough anyway, but we would observe different results if the window were smaller. We could reduce the window size by reducing the rmem, but that is for another experiment.

Again we can look at the evolution of the unacked data at the TCP level

../../../_images/flightpf1.png

Again we observe the shift after 15 seconds.

]]>
Thu, 05 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/04/mptcptrace_demo_experiment_three.html http://blog.multipath-tcp.org/blog/html/2015/02/04/mptcptrace_demo_experiment_three.html <![CDATA[mptcptrace demo, experiment three]]> mptcptrace demo, experiment three

This is the third post of a series of five. Context is presented in the first post. The second post is here.

Third experiment

Green at 0s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Green at 5s:
  • delay : 100ms
  • bandwidth : 4mbit/s
  • loss : 10%
Green at 15s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Red:
  • delay : 40ms
  • bandwidth : 4mbit/s
Client:
  • Scheduler : default

In this third scenario, the green link becomes lossless again after 15 seconds and its delay comes back to 10ms.

Again let’s see the evolution of the sequence numbers.

../../../_images/sequence2.png

We can observe the same shift as in the previous experiment from the green path to the red path after 5 seconds (see below). However, quite surprisingly, we cannot see any shift after 15 seconds. Due to the many losses and the high delay between 5 and 15 seconds on the green path, the green subflow may have a small congestion window and a stale estimation of the RTT, because no new traffic has been sent over it. MPTCP pushed no traffic on the green subflow because the red subflow, even though it has a longer delay, can still sustain the application rate. In conclusion, because MPTCP never probes the green subflow, it has no way to notice that this subflow could now be better.

../../../_images/sequence_zoom2.png

We can also take a look at the goodput. We see the same perturbation at 5 seconds that we saw in the previous experiment, and we do not see any change around 15 seconds. This is expected because MPTCP does not come back to the green subflow.

../../../_images/gput2.png

Again the zoom shows similarities with the previous experiment.

../../../_images/gput_zoom1.png

We can also take a look at the amount of data that is not yet acked from the perspective of the sender.

../../../_images/flight.png

On this figure, the black line shows the evolution of the receive window announced by the receiver. The cyan line shows the amount of data that is not MPTCP-acked, while the blue line shows the amount of data that is not TCP-acked. If the blue line and the cyan line diverge, it means that segments arrive out of order (from the MPTCP sequence number viewpoint) at the receiver.

We can also look at the evolution of the tcp unacked data by subflow.

../../../_images/flightpf.png

This graph shows the evolution of the unacked data at the TCP layer. We can observe that MPTCP chooses either one path or the other, but hardly ever uses both at the same time. On this graph we can also observe the consequence of the longer delay on the red path: the amount of data in flight is higher.

]]>
Wed, 04 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/03/mptcptrace_demo_experiment_two.html http://blog.multipath-tcp.org/blog/html/2015/02/03/mptcptrace_demo_experiment_two.html <![CDATA[mptcptrace demo, experiment two]]> mptcptrace demo, experiment two

This is the second post of a series of five. Context is presented in the first post.

Second experiment

Green at 0s:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Green at 5s:
  • delay : 100ms
  • bandwidth : 4mbit/s
  • loss : 10%
Red:
  • delay : 40ms
  • bandwidth : 4mbit/s
Client:
  • Scheduler : default

In this scenario, we change the delay and the loss rate of the green link after 5 seconds.

Just like we did for the first experiment, we take a look at the ack/seq graph.

../../../_images/sequence1.png

As expected, we see the shift from the green path to the red path after 5 seconds. We will now take a closer look at what happens around 5 seconds, when the delay and the loss rate rise.

../../../_images/sequence_zoom1.png

Because of the losses, the congestion window of the green subflow starts to shrink. As a consequence, MPTCP starts to send segments over the red subflow because there is not enough space in the green congestion window to sustain the rate. The spacing between the green segments and the green triangles differs from the spacing between the red segments and the red triangles; this reflects the different RTTs of the green and red subflows. Because the green link is lossy and has a longer delay, we observe duplicate acks on the red link, triggered by segments sent on the green link that are missing at the receiver. On the red link, however, even though the delay is higher, there are few losses. This leads to jumps in the acks during the transition, which produce the stair shape.

As we did for the previous experiment, we now take a look at the evolution of the goodput.

../../../_images/gput1.png

We see the transition after 5 seconds on the blue line. The consequence on the cyan line is, however, quite small. Let's zoom in on this specific event.

../../../_images/gput_zoom.png

As we can see, after five seconds there is a drop in the “instantaneous” goodput; however, this change does not last long enough to have an impact on the average goodput. If the application is sensitive to goodput variations, this may be an issue. If it is not, the user should barely see the difference.

]]>
Tue, 03 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/03/synproxy.html http://blog.multipath-tcp.org/blog/html/2015/02/03/synproxy.html <![CDATA[With synproxy, the middlebox can be on the server itself]]> With synproxy, the middlebox can be on the server itself

Multipath TCP works by adding the new TCP options defined in RFC 6824 to all TCP segments. A Multipath TCP connection always starts with a SYN segment that contains the MP_CAPABLE option. To benefit from Multipath TCP, both the client and the server must run an operating system that supports Multipath TCP. With such a kernel on the client and the server, Multipath TCP should be used for all connections between the two hosts, provided that there are no middleboxes on the path between them.

A user of the Multipath TCP implementation in the Linux kernel recently reported problems when using Multipath TCP on a server. During the discussion, it appeared that a possible source of problems was the synproxy module that is part of recent iptables implementations. synproxy, as described in a RedHat blog post, can be used to mitigate denial-of-service attacks on TCP by filtering the SYN segments. This module could be installed by default on your server or could have been enabled by the system administrators. If you plan to use Multipath TCP on the server, you need to disable it, because synproxy does not currently support Multipath TCP and will discard the SYN segments that contain the unknown MP_CAPABLE option. In this case, the middlebox that breaks Multipath TCP resides on the Multipath TCP enabled server itself…
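To check whether synproxy is active on a server, one can inspect the installed iptables rules for the SYNPROXY target. This is only a sketch (it requires root, and the exact chains and rule specifications vary between setups):

```shell
# Look for SYNPROXY rules in the tables where they are usually installed
iptables -t raw -S | grep -i SYNPROXY
iptables -t mangle -S | grep -i SYNPROXY
iptables -S | grep -i SYNPROXY

# If a rule such as the following (hypothetical) one appears, it will
# intercept incoming SYNs and drop the MP_CAPABLE option:
#   -A INPUT -p tcp --dport 80 --syn -j SYNPROXY --sack-perm --timestamp ...
# Deleting it (iptables -D with the same rule specification) restores
# Multipath TCP at the cost of losing the SYN-flood protection.
```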

]]>
Tue, 03 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/02/02/mptcptrace_demo.html http://blog.multipath-tcp.org/blog/html/2015/02/02/mptcptrace_demo.html <![CDATA[mptcptrace demo]]> mptcptrace demo

In this post we will show a small demonstration of mptcptrace usage. mptcptrace is available from http://bitbucket.org/bhesmans/mptcptrace.

The traces used for these examples are available from /data/blogPostPCAP.tar.gz . This is the first post of a series of 5 posts.

Context

We consider the following use case:

../../../_images/topo.png

We have the client on the left that has two interfaces and two links to the server.

The green and red links each have well defined delay, bandwidth and loss rate. These links are respectively only used by the first interface of the client and the second interface of the client.

For our experiments, the client pushes a given amount of data at a given rate (at the application level) to the server.

For all the experiments the client sends 15000 kB of data at 400 kB/s. Each experiment should therefore last at least 15000/400 = 37.5 s.

We use the fullmesh path manager for all the experiments. We also set the congestion control scheme to lia for all the experiments.

The rmem for all the experiments is set to 10240 87380 16777216
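These settings can be applied with sysctl on a host running the multipath-tcp.org kernel. A sketch, assuming the sysctl names of that out-of-tree kernel:

```shell
# Path manager and coupled congestion control used in these experiments
sysctl -w net.mptcp.mptcp_path_manager=fullmesh
sysctl -w net.ipv4.tcp_congestion_control=lia

# TCP receive buffer: min, default and max, in bytes
sysctl -w net.ipv4.tcp_rmem="10240 87380 16777216"
```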

All these topologies are emulated within Mininet.

We analyze the client trace with mptcptrace with the following command:

mptcptrace -f client.pcap -s -G 50 -F 3 -a

but xplot files are not always easy to convert to png files, so we use:

mptcptrace -f client.pcap -s -G 50 -F 3 -a -w 2

to output csv files instead of xpl files and we parse them with R scripts.

First experiment

Green:
  • delay : 10ms
  • bandwidth : 4mbit/s
  • loss : 0%
Red:
  • delay : 40ms
  • bandwidth : 4mbit/s
  • loss : 0%
Client:
  • Scheduler : default

We collect the trace of the connection and generate the following graphs

To get a first general idea, we can take a look at the sequence/acknowledgments graph

../../../_images/sequence.png

The graph is composed of vertical lines for each segment sent and small triangles for each ack. The bottom of a vertical line is the start of the MPTCP map and the top is its end. Since there are many segments and acks, it looks like a simple line; however, the zoom below shows the individual segments and acks. The color depends on the path. Here most of the data goes through the green path, because the default MPTCP scheduler always chooses the path with the smallest RTT as long as there is enough space in its congestion window. Because the green path alone can support the sending rate of the application, MPTCP does not need to use the red path. However, we can see a small difference at roughly 20 seconds. Let's zoom in on this part.

../../../_images/sequence_zoom.png

On this zoom we can see that MPTCP decides to send one segment on the red path. This may be due to a loss on the green subflow. Because the red path has a higher delay, segments sent on the green subflow after the red segment will arrive at the receiver before it. In other words, green data will arrive out of sequence at the receiver. We see a series of duplicate green MPTCP acks before a jump in the acks. We can also see that the red ack covers data that has not been sent on the red subflow, an indication that some green segments arrived before the red segment. We can also observe that the red ack arrives late and acks data that is already acked at the MPTCP layer. This is due to the delay on the return path of the red ack. Nevertheless, this ack still carries a useful acknowledgement for the TCP layer.

On the next figure, we take a look at the evolution of the goodput.

../../../_images/gput.png

The cyan line shows the average goodput since the beginning of the connection, while the blue line shows the average goodput over the last 50 acks (see the -G parameter of mptcptrace). The blue line thus gives a more instantaneous value of the goodput; an even more instantaneous view could be obtained by reducing the value of the -G parameter in the mptcptrace command.

]]>
Mon, 02 Feb 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/01/30/multipath_tcp_through_a_strange_middlebox.html http://blog.multipath-tcp.org/blog/html/2015/01/30/multipath_tcp_through_a_strange_middlebox.html <![CDATA[Multipath TCP through a strange middlebox]]> Multipath TCP through a strange middlebox

Users of the Multipath TCP implementation in the Linux kernel perform experiments in various networks that the developers could not have access to. One of these users complained that Multipath TCP was not working in a satellite environment. Such networks often contain Performance Enhancing Proxies (PEPs) that “tune” TCP connections to improve their performance. Often, these PEPs terminate TCP connections, and the MPTCP options sent by the client never reach the server. This was not the case in this network, yet the user complained that Multipath TCP did not advertise the addresses of the server. Fortunately, he managed to capture a packet trace on both the client and the server. An analysis of this packet trace gives interesting insights on the impact of such PEPs on TCP extensions.

The network topology is very simple. The client has two private interfaces (client1 and client2), both behind NATs, and the server has two public IP addresses. In the trace below we replace the private IP addresses of the client by client1 and client2, its public IP address by client, and the two server addresses by server1 and server2.

The client opens a TCP connection towards the server.

09:27:12.316613 IP (tos 0x0, ttl 64, id 15494, offset 0, flags [DF], proto TCP (6), length 72)
 client1.47862 > server1.49803: Flags [S], cksum,
 seq 3452765235, win 28440, options [mss 1422,sackOK,TS val 55654581 ecr 0,
 nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

09:27:13.318852 IP (tos 0x0, ttl 64, id 15495, offset 0, flags [DF], proto TCP (6), length 72)
  client1.47862 > server1.49803: Flags [S], cksum,
  seq 3452765235, win 28440, options [mss 1422,sackOK,TS val 55655584 ecr 0,
  nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

This is a normal TCP SYN segment with the MSS, SACK, timestamp and MP_CAPABLE options. The second packet does not seem to reach the server. The first is translated by the NAT and received as follows by the server.

09:27:22.729048 IP (tos 0x0, ttl 47, id 15494, offset 0, flags [DF], proto TCP (6), length 72)
  client.47862 > server1.49803: Flags [S], cksum,
  seq 3452765235, win 384, options [mss 1285,sackOK,TS val 55654581 ecr 0,
  nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

There are several interesting points to observe when comparing the two packets. First, the MSS option is modified. This is not unusual but indicates a middlebox on the path. Note that the window is severely reduced (384 instead of 28440). The server replies with a SYN+ACK.

09:27:22.729220 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 72)
   server1.49803 > client.47862: Flags [S.], cksum,
   seq 3437506945, ack 3452765236, win 28560, options [mss 1460,sackOK,
   TS val 155835098 ecr 55654581,nop,wscale 8,
   mptcp capable {0x32205e67a94ad606}], length 0

This segment is also modified by the middlebox. It updates the MSS and the window, but does not change the timestamp chosen by the server.

09:27:14.188324 IP (tos 0x0, ttl 51, id 0, offset 0, flags [DF], proto TCP (6), length 72)
   server1.49803 > client1.47862: Flags [S.], cksum,
   seq 3437506945, ack 3452765236, win 384, options [mss 1285,sackOK,
   TS val 155835098 ecr 55654581,nop,wscale 8,
   mptcp capable {0x32205e67a94ad606}], length 0

Since the MP_CAPABLE option has been received in the SYN+ACK segment, the client can confirm the utilisation of Multipath TCP on this connection. This is done by placing the MP_CAPABLE option in the third ack.

09:27:14.188574 IP (tos 0x0, ttl 64, id 15496, offset 0, flags [DF], proto TCP (6), length 80)
   client1.47862 > server1.49803: Flags [.], cksum,
   seq 1, ack 1, win 112, options [nop,nop,TS val 55656453 ecr 155835098,
   mptcp capable {0x69ccde41dca19b8f,0x32205e67a94ad606},
   mptcp dss ack 3426753824], length 0

This segment is received by the server as follows.

09:27:23.456784 IP (tos 0x0, ttl 47, id 15495, offset 0, flags [DF], proto TCP (6), length 80)
   client.47862 > server1.49803: Flags [.], cksum,
   seq 1, ack 1, win 384, options [nop,nop,TS val 55654655 ecr 155835098,
   mptcp capable {0x69ccde41dca19b8f,0x32205e67a94ad606},
   mptcp dss ack 3426753824], length 0

The middlebox has updated the window and the timestamp, but it did not change anything in the MP_CAPABLE option, and Multipath TCP is confirmed on both the client and the server. The server immediately sends a duplicate acknowledgement containing the ADD_ADDR option to announce its second address.

09:27:23.456960 IP (tos 0x0, ttl 64, id 60464, offset 0, flags [DF], proto TCP (6), length 68)
  server1.49803 > client.47862: Flags [.], cksum,
  seq 1, ack 1, win 112, options [nop,nop,TS val 155835826 ecr 55654655,
  mptcp add-addr id 3 server2,mptcp dss ack 2495228045], length 0

Unfortunately, this segment never reaches the client. As the current path managers do not retransmit the ADD_ADDR option on a regular basis, the client is never informed of the second address.

Since the client also has a second address, it tries to inform the server by sending a duplicate acknowledgement.

09:27:14.188636 IP (tos 0x0, ttl 64, id 15497, offset 0, flags [DF], proto TCP (6), length 68)
  client1.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 1, win 112, options [nop,nop,TS val 55656453 ecr 155835098,
  mptcp add-addr id 4 client2,mptcp dss ack 3426753824], length 0

This segment never reaches the server. It is likely that the PEP notices that the segment is a duplicate acknowledgement and filters it. A solution to enable Multipath TCP to pass correctly through this particular middlebox might be to place the ADD_ADDR option inside segments that contain data, or to use techniques that ensure its reliable delivery, as proposed in Exploring Mobile/WiFi Handover with Multipath TCP.

Note that the Multipath TCP options are correctly transported in other packets. For example, here is the first data segment sent by the server.

09:27:23.466575 IP (tos 0x0, ttl 64, id 60465, offset 0, flags [DF], proto TCP (6), length 107)
  server1.49803 > client.47862: Flags [P.], cksum,
  seq 1:36, ack 1, win 112, options [nop,nop,TS val 155835836 ecr 55654655,
  mptcp dss ack 2495228045 seq 3426753824 subseq 1 len 35,nop,nop], length 35

This segment is received by the client as follows.

09:27:14.987619 IP (tos 0x0, ttl 51, id 60465, offset 0, flags [DF], proto TCP (6), length 107)
  server1.49803 > client1.47862: Flags [P.], cksum,
  seq 1:36, ack 1, win 320, options [nop,nop,TS val 155835173 ecr 55656453,
  mptcp dss ack 2495228045 seq 3426753824 subseq 1 len 35,nop,nop], length 35

The middlebox has modified the timestamp and windows but did not change the Multipath TCP options.

The client can also send data to the server.

09:27:14.988371 IP (tos 0x0, ttl 64, id 15499, offset 0, flags [DF], proto TCP (6), length 93)
  client1.47862 > server1.49803: Flags [P.], cksum,
  seq 1:22, ack 36, win 112, options [nop,nop,TS val 55657253 ecr 155835173,
  mptcp dss ack 3426753859 seq 2495228045 subseq 1 len 21,nop,nop], length 21

The server receives this segment as follows.

09:27:24.320654 IP (tos 0x0, ttl 47, id 15497, offset 0, flags [DF], proto TCP (6), length 93)
  client.47862 > server1.49803: Flags [P.], cksum,
  seq 1:22, ack 36, win 320, options [nop,nop,TS val 55654742 ecr 155835836,
  mptcp dss ack 3426753859 seq 2495228045 subseq 1 len 21,nop,nop], length 21

It is interesting to compare the acknowledgement sent by the client for this segment.

09:27:14.987885 IP (tos 0x0, ttl 64, id 15498, offset 0, flags [DF], proto TCP (6), length 60)
  client1.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 36, win 112, options [nop,nop,TS val 55657253 ecr 155835173,
  mptcp dss ack 3426753859], length 0

with the acknowledgement that the server actually receives.

09:27:23.487569 IP (tos 0x0, ttl 237, id 15496, offset 0, flags [DF], proto TCP (6), length 68)
  client.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 36, win 320, options [nop,eol], length 0

The server receives the acknowledgement within 21 msec of the transmission of the data segment. Furthermore, it has a TTL of 237 while the acknowledgement sent by the client had a TTL of 64. Since the two packets have different IPv4 ids, it is very likely that the acknowledgement was generated by the PEP and not copied from the client. Note that the middlebox has replaced the second nop option with an eol option. A closer look at the packet reveals something even stranger. The IPv4 packet is 68 bytes long while it contains an IPv4 header (20 bytes), a TCP header (20 bytes) and the nop and eol options, both one byte long. The packet thus contains 26 bytes of garbage (starting with 0c1e below):

0x0010:            baf6 c28b cdcd 0434 cce4 31a5
0x0020:  c010 0140 1515 0000 0100 0c1e 69cc de41
0x0030:  dca1 9b8f 0a08 0101 0351 3902 0949 ddbc
0x0040:  0000 0000

The value of the TCP Data Offset (0xc, i.e. 48 bytes) indicates that the middlebox considers that the bytes 0c1e ... 0000 belong to the TCP options, but since they appear after the eol option, they are ignored by the TCP stack on the receiver.

This effectively removes the timestamp option and the DSS option. Removing the timestamp option was permitted by RFC 1323, but this behaviour is no longer allowed by RFC 7323. The removal of the DSS option is a problem for Multipath TCP since there is no data acknowledgement anymore. Fortunately, RFC 6824 and the Multipath TCP implementation in the Linux kernel have anticipated this problem: since this ack acknowledges new data without containing a DSS option, Multipath TCP immediately falls back to regular TCP. This preserves the connectivity at the cost of losing the benefits of Multipath TCP.

A similar problem happens in the other direction. The server has stopped using Multipath TCP and sends the following packet.

09:27:24.320870 IP (tos 0x0, ttl 64, id 60466, offset 0, flags [DF], proto TCP (6), length 52)
  server1.49803 > client.47862: Flags [.], cksum,
  seq 36, ack 22, win 112, options [nop,nop,TS val 155836690 ecr 55654742], length 0

This ACK does not contain any DSS option. It is processed by the middlebox that removes the timestamp option.

09:27:15.787814 IP (tos 0x0, ttl 254, id 60466, offset 0, flags [DF], proto TCP (6), length 68)
  server1.49803 > client1.47862: Flags [.], cksum,
  seq 36, ack 22, win 320, options [nop,eol], length 0

Again, the change in the TTL indicates that the middlebox has created a new packet to convey the acknowledgement to the client. At this point, the client falls back to regular TCP as well, as shown by the next segment that it sends.

09:27:15.788047 IP (tos 0x0, ttl 64, id 15500, offset 0, flags [DF], proto TCP (6), length 844)
  client1.47862 > server1.49803: Flags [P.], cksum,
  seq 22:814, ack 36, win 112, options [nop,nop,TS val 55658053 ecr 155835173], length 792

The transfer continues like a regular TCP connection. Note that the TCP timestamps are back. This strange middlebox shows that the objective of preserving connectivity in the presence of middleboxes is well met by the Multipath TCP implementation in the Linux kernel.

Text updated on February 2nd and February 3rd based on comments from Raphael Bauduin and Gregory Detal

]]>
Fri, 30 Jan 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/01/19/babel.html http://blog.multipath-tcp.org/blog/html/2015/01/19/babel.html <![CDATA[Simplifying Multipath TCP configuration with Babel]]> Simplifying Multipath TCP configuration with Babel

Multipath TCP builds its sub-flows based on pairs of the client’s and server’s IP addresses. When a host is connected to two different providers, it should have one IP address associated to each provider, which allows Multipath TCP to effectively use these two paths simultaneously. However, with a single default route, packets will follow the same path, independently of their source address, which prevents Multipath TCP from working properly.

The Multipath TCP website provides a recipe for manually configuring multiple routes on a single host directly connected to multiple providers, using policy routing. Even for such a simple topology, the procedure consists of 7 commands. The analogous configuration for a more complex network, while possible, is exceedingly tedious, error-prone and fragile.
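For reference, that single-host recipe boils down to policy-routing rules of the following shape. The addresses, gateways and interface names below are made up for illustration; see the Multipath TCP website for the exact commands for your setup.

```shell
# One routing table per provider, selected by the packet's source address
ip rule add from 10.1.1.2 table 1
ip rule add from 10.1.2.2 table 2

# In each table: the local subnet, then a default route via that provider
ip route add 10.1.1.0/24 dev eth0 scope link table 1
ip route add default via 10.1.1.1 dev eth0 table 1
ip route add 10.1.2.0/24 dev eth1 scope link table 2
ip route add default via 10.1.2.1 dev eth1 table 2

# Default route in the main table for traffic that binds no source address
ip route add default via 10.1.1.1 dev eth0
```

With these rules, each subflow's packets leave through the interface that matches their source address, which is exactly what Multipath TCP needs.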

Consider for example the following topology, where routers A and B are connected to distinct providers, and the host H is connected to router C. Routers A and B have default routes for packets sourced in 10.1.1.0/24 and 10.1.2.0/24 respectively.

     0.0.0.0/0    --- A ----- B ---    0.0.0.0/0
from 10.1.1.0/24       \     /      from 10.1.2.0/24
                        \   /
                         \ /
                          C
                          |
                          |
                          H

A manual configuration of this network will require an intervention on at least routers A, B and C. If for some reason the topology changes, a link fails or a router crashes, the network will fail to work correctly until an explicit intervention of the administrator.

Routing configuration of complex networks should not be done manually. That is the role of a dynamic routing protocol.

The source-specific version of the Babel routing protocol (RFC 6126) will dynamically configure a network to route packets based on both their source and destination addresses. Babel will learn the routes installed in the kernel of the gateway routers (A and B), announce them through the network, and install them in the routing tables of the routers. Only the edge gateways need manual configuration, to teach Babel which source prefix to apply to the default route they provide. In contrast, internal routers (C) do not need any manual configuration. In our example, we need to add the following directives to the Babel configuration file (/etc/babeld.conf) of the two gateways:

../../../_images/babel.png

On router A:

redistribute ip 0.0.0.0/0 eq 0 src-prefix 10.1.1.0/24
redistribute ip 0.0.0.0/0 eq 0 proto 3 src-prefix 10.1.1.0/24

On router B:

redistribute ip 0.0.0.0/0 eq 0 src-prefix 10.1.2.0/24
redistribute ip 0.0.0.0/0 eq 0 proto 3 src-prefix 10.1.2.0/24

The second line specifies proto 3 because Babel does not redistribute proto boot routes by default.

The case of IPv6 is similar. In fact, it may be simpler, because Babel recognises kernel-side source-specific routes: in the best case, no additional configuration is needed for IPv6.
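For example, a kernel-side source-specific IPv6 default route that Babel can pick up directly can be installed with the ip command. The prefixes and next hop below are illustrative:

```shell
# Default route that only applies to packets sourced from 2001:db8:1::/48
ip -6 route add default from 2001:db8:1::/48 via fe80::1 dev eth0
```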

The source-specific version of Babel works on Linux, and is available in OpenWRT as the babels package. You can also retrieve its source code from:

git clone https://github.com/boutier/babeld

Additional information about the Babel protocol and its operation with Multipath TCP may be found in the technical report entitled Source-specific routing, written by Matthieu Boutier and Juliusz Chroboczek.

]]>
Mon, 19 Jan 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2015/01/13/useful_software.html http://blog.multipath-tcp.org/blog/html/2015/01/13/useful_software.html <![CDATA[Useful Multipath TCP software]]> Useful Multipath TCP software

Since the first release of the Multipath TCP implementation in the Linux kernel, various software packages have been written by project students and project contributors. Some of this software is available from the official Multipath TCP github repository. The core packages are :

In addition to these packages, the repository contains several ports for specific platforms including the Raspberry PI and several smartphones. However, most of these ports were based on an old version of the Multipath TCP kernel and have not been maintained.

Extensions to tcpdump and wireshark were initially posted as patches. The basic support for Multipath TCP is now included in these two important packages. Extensions are still being developed, e.g. https://github.com/teto/wireshark

Besides these core packages, several other open-source packages can be very useful for Multipath TCP users and developers:

  • tracebox is a flexible traceroute-like tool that detects middlebox interference inside a network. It can be used to verify whether packets containing any of the MPTCP options pass through firewalls. The source code for tracebox is available from https://github.com/tracebox/tracebox
  • mptcp-scapy is an extension to the popular scapy packet manipulation software that supports Multipath TCP. It was recently updated and can be obtained from https://github.com/Neohapsis/mptcp-abuse
  • MPTCPtrace is the Multipath TCP equivalent of the popular tcptrace package. It parses libpcap files and extracts both statistics and plots from the packet capture. This is one of the best ways to analyse the dynamics of Multipath TCP in a real network.
  • MBClick is a set of Click elements that reproduce the behaviour of various types of middleboxes. It proved to be very useful while developing the middlebox support in Multipath TCP.
  • packetdrill is a tool that allows precise testing of the behaviour of a TCP implementation. It has been extended to support Multipath TCP and a set of tests has been developed: https://github.com/ArnaudSchils/packetdrill_mptcp
  • MPTCP vagrant is a set of scripts that can be used to automate the creation of Multipath TCP capable VirtualBox images.
]]>
Tue, 13 Jan 2015 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/12/17/measurements3.html http://blog.multipath-tcp.org/blog/html/2014/12/17/measurements3.html <![CDATA[A first look at multipath-tcp.org : subflows]]> A first look at multipath-tcp.org : subflows

In theory, a Multipath TCP connection can comprise an unlimited number of subflows. In practice, implementations limit the number of concurrent subflows. The Linux implementation used on the monitored server supports up to 32 different subflows. We analyse here the number of subflows that are established for each Multipath TCP connection. Since our server never initiates subflows itself, this number is an indication of the capabilities of the clients that interact with it.

The figure below provides the distribution of the number of subflows per Multipath TCP connection. We show the distribution for the number of successfully established subflows, i.e., subflows that complete the handshake, as well as for all attempted ones. As can be seen, several connection attempts either fail completely or establish fewer subflows than intended. In total, we observe 5098 successful connections with 8701 subflows. The majority of the observed connections (57%) establish only one subflow. Around 27% of them use two subflows. Only 10 connections use more than 8 subflows; these are omitted from the figure.

../../../_images/nflows.png
]]>
Wed, 17 Dec 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/12/16/measurements2.html http://blog.multipath-tcp.org/blog/html/2014/12/16/measurements2.html <![CDATA[A first look at multipath-tcp.org : ADD_ADDR usage]]> A first look at multipath-tcp.org : ADD_ADDR usage

The first question that we asked ourselves about the usage of Multipath TCP was whether the communicating hosts were using multiple addresses.

Since the packet trace was collected on the server that hosts the Multipath TCP implementation in the Linux kernel, we can expect that many Linux enthusiasts use it to download new versions of the code, visit the documentation, perform tests or verify that their configuration is correct. These users might run different versions of Multipath TCP in the Linux kernel or on other operating systems. Unfortunately, as of this writing, there is not enough experience with the Multipath TCP implementations to detect which operating system was used to generate specific Multipath TCP packets.

Thanks to the ADD_ADDR option, it is however possible to collect interesting data about the characteristics of the clients that contact our server. Of the 5098 observed Multipath TCP connections, 3321 announced at least one address. Surprisingly, only 21% of the IPv4 addresses collected in the ADD_ADDR option were globally routable addresses.

The remaining 79% of the IPv4 addresses found in the ADD_ADDR option were private addresses and, in some cases, link-local addresses. This confirms that Multipath TCP's ability to pass through NATs is an important feature of the protocol (RFC 6824).

The IPv6 addresses collected in the ADD_ADDR option showed more diversity: 72% of them were globally routable. The other types of addresses that we observed are shown in the table below. The IPv4-compatible and 6to4 IPv6 addresses were expected, but the link-local and documentation addresses should have been filtered by the client and never announced over Multipath TCP connections. The Multipath TCP specification RFC 6824 should be updated to specify which types of IPv4 and IPv6 addresses can be advertised over a Multipath TCP connection.

Address type                 Count
Link-local (IPv4)              51
Link-local (IPv6)             241
Documentation only (IPv6)      21
IPv4-compatible IPv6           13
6to4                          206
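
The classification above can be reproduced with the standard library's ipaddress module. This is an illustrative sketch, not the script used for the study; the classify helper and its category names are our own:

```python
import ipaddress

def classify(addr: str) -> str:
    """Rough classification of an address seen in an ADD_ADDR option."""
    ip = ipaddress.ip_address(addr)
    if ip.is_link_local:
        return "link-local"
    if ip.version == 6 and ip in ipaddress.ip_network("2001:db8::/32"):
        return "documentation"
    if ip.version == 6 and ip in ipaddress.ip_network("2002::/16"):
        return "6to4"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "global"
    return "other"

print(classify("192.168.122.1"))     # private
print(classify("fe80::1"))           # link-local
print(classify("2002:c058:6301::1")) # 6to4
```
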
]]>
Tue, 16 Dec 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/12/15/measurements.html http://blog.multipath-tcp.org/blog/html/2014/12/15/measurements.html <![CDATA[A first look at Multipath TCP traffic]]> A first look at Multipath TCP traffic

The Multipath TCP website is a unique vantage point to observe Multipath TCP traffic on the global Internet. We have recently collected a one-week long packet trace from this server. It was collected using tcpdump and contains the headers of all TCP packets received and sent by the server hosting the Multipath TCP Linux kernel implementation. Apart from a web server, the machine also hosts an FTP server and an Iperf server. The machine has one physical network interface with two IP addresses (IPv4 and IPv6) and runs the stable version 0.89 of the Multipath TCP implementation in the Linux kernel.

To analyse the Multipath TCP connections in the dataset, we have extended the mptcptrace software. mptcptrace handles all the main features of the Multipath TCP protocol and can extract various statistics from a packet trace. Where necessary, we have combined it with tcptrace and/or processed its output further with custom scripts.

The table below summarizes the general characteristics of the dataset. In total, the server received around 136 million TCP packets with 134 GiBytes of data (including the TCP and IP headers) during the measurement period. As shown in the table (in the Multipath TCP block), a significant part of the TCP traffic was related to Multipath TCP. Unsurprisingly, IPv4 remains more popular than IPv6, but it is interesting to note that the fraction of IPv6 traffic from the hosts using Multipath TCP (9.8%) is larger than from the hosts using regular TCP (3.7%). This confirms that dual-stack hosts are an important use case for Multipath TCP.

We have also studied the application protocols used in the Multipath TCP traffic. Around 22.7% of the packets were sent or received on port 80 (HTTP) of the server. A similar percentage of packets (21.2%) was sent to port 5001 (Iperf) by users conducting performance measurements. The FTP server was responsible for the majority of the packets. It hosts the Debian and Ubuntu packages for the Multipath TCP kernel and is thus often used by Multipath TCP users.

In terms of connections, HTTP was responsible for 89.7% of the traffic, Iperf for 6.4% and FTP control connections for 1.9%; the remaining 2.0% used higher ports and are probably FTP data connections.

All TCP                Total    IPv4    IPv6
# of packets [Mpkt]    136.1   128.5     7.6
# of bytes [GiByte]    134.0   129.0     5.0

Multipath TCP          Total    IPv4    IPv6
# of packets [Mpkt]     29.4    25.0     4.4
# of bytes [GiByte]     20.5    18.5     2.0

In subsequent posts, we will explore the packet trace and provide additional information about what we have learned about Multipath TCP when analysing it.

]]>
Mon, 15 Dec 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/12/07/flowbender.html http://blog.multipath-tcp.org/blog/html/2014/12/07/flowbender.html <![CDATA[FlowBender : revisiting Equal Cost Multipath in Datacenters]]> FlowBender : revisiting Equal Cost Multipath in Datacenters

Equal Cost Multipath (ECMP) is a widely used technique that allows routers and switches to spread packets over several paths having the same cost. When a router/switch has several equal-cost paths towards a given destination, it can send packets over any of them. To maximise load-balancing, routers install all the available paths in their forwarding tables and balance the arriving packets over all of them. To ensure that all the packets that correspond to the same layer-4 flow follow the same path, and thus experience roughly the same delay, routers usually select the outgoing equal-cost path by computing H(IP_{src}||IP_{dst}||Port_{src}||Port_{dst}) mod n, where n is the number of equal-cost paths towards the packet's destination and H is a hash function. This technique works well in practice and is used in both datacenters and ISP networks.
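
The hash-based selection can be sketched in a few lines. Real routers use vendor-specific hardware hashes; SHA-1 below is purely illustrative:

```python
import hashlib

def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n: int) -> int:
    """Toy ECMP: return H(IPsrc || IPdst || Portsrc || Portdst) mod n."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % n

# Every packet of a given flow hashes to the same path...
p = ecmp_path("10.0.0.1", "10.0.0.2", 46421, 9, 4)
assert p == ecmp_path("10.0.0.1", "10.0.0.2", 46421, 9, 4)
# ...while another source port may (or may not) select a different path.
```
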

A consequence of this utilisation of ECMP is that TCP connections with different source ports between two hosts will sometimes follow different paths. In large ISP networks, this may lead to very different round-trip-times for different flows between a pair of hosts. In datacenters, it has been shown that Multipath TCP can better exploit the available network resources by load balancing TCP traffic over all equal cost paths. The ndiffports path manager was designed with this use case in mind.

In a recent paper presented at CoNEXT 2014, researchers from Google and Purdue University, together with Fabien Duchene, propose another approach to allow TCP to efficiently utilise all paths inside a datacenter. Instead of using Multipath TCP to spread the packets of each connection over several paths (and risk increased delays due to reordering at the destination), they change the hash function used by the routers/switches. For this, they build upon the smart hashing algorithm found in some Broadcom switches. In a homogeneous datacenter that uses a single type of switch, they select the outgoing path as H(TTL||IP_{src}||IP_{dst}||Port_{src}||Port_{dst}) mod n, where TTL is the Time-to-Live extracted from the packet. This is not the first flow-based load-balancing strategy that uses the TTL. Another example is CFLB, which even allows controlling the path followed by the packets. In addition to using the TTL for load-balancing, the datacenter switches that they use support Explicit Congestion Notification and set the CE bit when their buffers grow. They then modify the TCP sources to react to congestion events. When a source receives a TCP acknowledgement that indicates congestion, it simply reacts by changing the TTL of all the packets sent over this connection. As illustrated in the figure below, this improves the flow completion time under higher loads.
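
The FlowBender idea can be sketched by extending the toy hash with the TTL (again, the hash function is illustrative, not the Broadcom smart-hashing algorithm):

```python
import hashlib

def path_with_ttl(ttl: int, src_ip: str, dst_ip: str,
                  sport: int, dport: int, n: int) -> int:
    """Toy FlowBender-style hash: H(TTL || 4-tuple) mod n."""
    key = f"{ttl}|{src_ip}|{dst_ip}|{sport}|{dport}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % n

# On a congestion signal (ECN-marked ACKs), the sender bumps the TTL,
# which re-hashes the flow onto (likely) a different equal-cost path
# without creating per-packet reordering:
old = path_with_ttl(64, "10.0.0.1", "10.0.0.2", 46421, 80, 8)
new = path_with_ttl(63, "10.0.0.1", "10.0.0.2", 46421, 80, 8)
```
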

../../../_images/flowbender.png

In homogeneous datacenters, the FlowBender approach is probably a viable solution. However, Multipath TCP continues to have benefits in public datacenters where the endhosts cannot influence the operation of the routers and switches.

]]>
Sun, 07 Dec 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/11/15/measurements.html http://blog.multipath-tcp.org/blog/html/2014/11/15/measurements.html <![CDATA[Help us measure MPTCP over the public Internet]]> Help us measure MPTCP over the public Internet

We have started to analyse the MPTCP packets received on http://www.multipath-tcp.org and would like to have more data on the behavior of MPTCP over the public Internet. We’ve seen some unexpected results by looking at these packets. For example, some MPTCP connections announce up to 14 different addresses. We would like to better understand all the factors that influence the performance of MPTCP over the public Internet.

To collect more MPTCP packets, we have installed a new measurement server. The server is currently connected at 100 Mbps and we have enabled the echo and discard services. We would appreciate it if all MPTCP users could help us improve our understanding of the operation of MPTCP in the global Internet by generating MPTCP traffic towards this server. The results of this analysis will, of course, be released as a technical report.

If you use MPTCP on Linux or (even better) on any other OS, preferably on a physical host that has two or more interfaces (or at least both an IPv4 and an IPv6 address), could you perform the following tests:

dd if=/dev/zero bs=1M count=10 | nc discard.multipath-tcp.org 9

This command will send 10 Mbytes of zeros to our server.

dd if=/dev/zero bs=1M count=10 | nc echo.multipath-tcp.org 7 >> /dev/null

This command will send 10 Mbytes of zeros to our server, and the server will return them back to you.

Feel free to increase the number of blocks for larger transfers. If you use MPTCP on a mobile device, we'd be very interested in measurements from different locations. If you have access to another MPTCP implementation, we'd love to receive measurement packets from non-Linux hosts.

]]>
Sat, 15 Nov 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/10/30/testing_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2014/10/30/testing_multipath_tcp.html <![CDATA[Testing Multipath TCP]]>

Testing Multipath TCP

Once you have installed your mptcp-enabled kernel, you can test that it is working as expected using the echo and discard services available on http://multipath-tcp.org

The echo service RFC 862 can be reached with a telnet client on port 7, and will send back every line you send it. The discard service RFC 863 is available on port 9 and discards all data you send it.

Using those services makes it easy to test your MPTCP stack: those services are not normally used, and when capturing packets to and from them, you can be nearly sure you won't see unrelated packets (i.e., packets from other connections), which would certainly be the case if you tested with port 80.

Opening a connection

In this post, we will look at the TCP segments exchanged between your host and the discard service running on discard.multipath-tcp.org by using tcpdump (an alternative is to use wireshark).

Let’s first see what happens when we open a connection to the discard service with the command

telnet discard.multipath-tcp.org 9

and let’s capture the packets exchanged in another terminal with the command

tcpdump -n -i any port 9

This captures segments to and from port 9 on all interfaces of the host, which has an ethernet interface (IPv4 and IPv6) and a wifi interface (IPv4). It also avoids name resolution with the -n flag.

Here are the packets captured when the connection is established. The first three packets captured are the classical 3-way handshake:

10:44:24.854234 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [S], seq 2314766615, win 28800, options [mss 1440,sackOK,TS val 1395168 ecr 0,nop,wscale 7,mptcp capable csum {0xb49d03c2011d7aba}], length 0
10:44:24.877157 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421: Flags [S.], seq 3417687790, ack 2314766616, win 28160, options [mss 1440,sackOK,TS val 296602671 ecr 1395168,nop,wscale 7,mptcp capable csum {0x8fb9c7b493f33d4b}], length 0
10:44:24.878529 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687791, win 225, options [nop,nop,TS val 1395174 ecr 296602671,mptcp capable csum {0xb49d03c2011d7aba,0x8fb9c7b493f33d4b},mptcp dss ack 2962569294], length 0

If your host is correctly configured to use mptcp, each of these 3 packets should include the option “mptcp capable” as above.

The 3-way handshake is immediately followed by other packets, 5 in this case:

10:44:24.878538 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687791, win 225, options [nop,nop,TS val 1395174 ecr 296602671,mptcp add-addr id 2 130.104.228.97,mptcp dss ack 2962569294], length 0
10:44:24.878547 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687791, win 225, options [nop,nop,TS val 1395174 ecr 296602671,mptcp add-addr id 3 192.168.122.1,mptcp dss ack 2962569294], length 0
10:44:24.878551 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687791, win 225, options [nop,nop,TS val 1395174 ecr 296602671,mptcp add-addr id 4 130.104.111.30,mptcp dss ack 2962569294], length 0
10:44:24.878557 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687791, win 225, options [nop,nop,TS val 1395174 ecr 296602671,mptcp add-addr id 8 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8,mptcp dss ack 2962569294], length 0
10:44:24.904876 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421: Flags [.], ack 2314766616, win 220, options [nop,nop,TS val 296602677 ecr 1395174,mptcp add-addr id 2 37.187.114.89,mptcp dss ack 1565554328], length 0

At this time, no additional mptcp subflow has been opened. This will only happen after a data packet with mptcp options has been received, to be sure no middlebox is messing things up.

These are packets with the add-addr option, communicating the other addresses used by the client and the server. The client advertises 4 additional addresses:

192.168.122.1   (libvirt bridge)
130.104.228.97  (IPv4)
130.104.111.30  (wifi)
2001:6a8:3080:2:f24d:a2ff:fe96:8ce8  (second IPv6 global scope)

In our case, having the client advertise its addresses does not add any value, but let’s analyse it further for the sake of the experiment.

The first address is the address of a bridge used by libvirt on the client. Advertising this address should be avoided. You can disable mptcp for an interface with the patched iproute2 available from multipath-tcp.org (if you added the apt repository, you can install it with apt-get install iproute2).

In my case, disabling the advertising of the virbr0 interface’s address is achieved with:

ip link set dev virbr0 multipath off

Advertising the IPv4 address on the same interface as the IPv6 address that was used to open the connection makes sense as the path used by each will probably be different, and hence will have different performance characteristics.

The server announces one additional address: 37.187.114.89

Here is, for comparison, the trace obtained up to this point when disabling mptcp by issuing the command

echo 0 > /proc/sys/net/mptcp/mptcp_enabled

11:00:32.731162 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522 > 2001:41d0:a:6759::1.9: Flags [S], seq 3444541109, win 28800, options [mss 1440,sackOK,TS val 1637138 ecr 0,nop,wscale 7], length 0
11:00:32.752091 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522: Flags [S.], seq 1047133288, ack 3444541110, win 28560, options [mss 1440,sackOK,TS val 296844644 ecr 1637138,nop,wscale 7], length 0
11:00:32.752128 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522 > 2001:41d0:a:6759::1.9: Flags [.], ack 1047133289, win 225, options [nop,nop,TS val 1637143 ecr 296844644], length 0

There's no mptcp capable option, and no additional address is advertised. Only the 3 packets of the 3-way handshake are exchanged.

At this time the connection is open, and both hosts using mptcp have advertised their additional addresses and received the other host's addresses.

Data transfer

We can now send a line of text to the service; I just type one character and press enter.

Let's first look at what happens when normal TCP is used:

11:01:16.960802 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522 > 2001:41d0:a:6759::1.9: Flags [P.], seq 3444541110:3444541113, ack 1047133289, win 225, options [nop,nop,TS val 1648195 ecr 296844644], length 3
11:01:16.981867 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522: Flags [.], ack 3444541113, win 224, options [nop,nop,TS val 296855701 ecr 1648195], length 0

Only 2 segments are exchanged, the first sending the data to the server, the second being the ack from the server.

Things are different when using multipath tcp. The first two segments are equivalent to the normal tcp connection:

10:51:54.636308 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [P.], seq 2314766616:2314766619, ack 3417687791, win 225, options [nop,nop,TS val 1507614 ecr 296602677,mptcp dss ack 2962569294 seq 1565554328 subseq 1 len 3 csum 0xbd71], length 3
10:51:54.657885 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421: Flags [.], ack 2314766619, win 220, options [nop,nop,TS val 296715118 ecr 1507614,mptcp dss ack 1565554331], length 0

but those are followed by other segments:

10:51:54.657944 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [S], seq 508980983, win 29200, options [mss 1460,sackOK,TS val 1507619 ecr 0,nop,wscale 7,mptcp join id 2 token 0x7b14451d nonce 0x1ca1c3df], length 0
10:51:54.657958 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [S], seq 2757512376, win 29200, options [mss 1460,sackOK,TS val 1507619 ecr 0,nop,wscale 7,mptcp join id 3 token 0x7b14451d nonce 0xbd6f9678], length 0
10:51:54.657971 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1507619 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:51:54.657984 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [S], seq 899129682, win 28800, options [mss 1440,sackOK,TS val 1507619 ecr 0,nop,wscale 7,mptcp join id 8 token 0x7b14451d nonce 0xdad211ee], length 0

Those 4 segments are all SYN segments with the mptcp join option. This is the client trying to open additional subflows. We see subflows are requested from addresses

130.104.228.97 (2 times)
130.104.111.30
2001:6a8:3080:2:f24d:a2ff:fe96:8ce8

The first address listed is the source of 2 requests to open subflows. Note that the id in the second packet is 3, which, if you look at the add-addr segments above, you'll see associated with IP 192.168.122.1. This is because the private address of the libvirt bridge is used to open a new subflow, but it is NATed by libvirt.
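
The token carried in each mptcp join option identifies the connection that the new subflow should be attached to. With the default crypto algorithm of RFC 6824, it is derived from the 64-bit key exchanged during the initial MP_CAPABLE handshake; a minimal sketch (the key value below is made up):

```python
import hashlib

def mptcp_token(key: bytes) -> int:
    """RFC 6824: the token is the most significant 32 bits of SHA-1(key)."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")

# Example with an arbitrary 64-bit key:
token = mptcp_token(bytes.fromhex("0123456789abcdef"))
print(f"mptcp join token 0x{token:08x}")
```
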

The SYN segments are requesting the opening of additional subflows, and here are the segments completing these 3-way handshakes.

First, the subflow requested from IP 130.104.228.97 and port 52462 is opened:

10:51:54.670368 IP 37.187.114.89.9 > 130.104.228.97.52462: Flags [S.], seq 276814345, ack 508980984, win 28560, options [mss 1460,sackOK,TS val 296715122 ecr 1507619,nop,wscale 7,mptcp join id 2 hmac 0x680d8e1f8915b0c nonce 0x8a55d382], length 0
10:51:54.670420 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [.], ack 276814346, win 454, options [nop,nop,TS val 1507622 ecr 296715122,mptcp join hmac 0x75c4548dd3b71d3172b6fab1dc1ad62b94ba58a4], length 0

Then the handshake of the subflow requested from the client's private IP is completed:

10:51:54.670427 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [S.], seq 3653300584, ack 2757512377, win 28560, options [mss 1460,sackOK,TS val 296715122 ecr 1507619,nop,wscale 7,mptcp join id 2 hmac 0x7bc3d5be52b067e4 nonce 0x5195517a], length 0
10:51:54.670441 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300585, win 682, options [nop,nop,TS val 1507622 ecr 296715122,mptcp join hmac 0x5e10da51f9a7205f97c8a1e90283341f1b17b557], length 0

Finally the third 3-way handshake is completed:

10:51:54.681913 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280: Flags [S.], seq 3184835425, ack 899129683, win 28160, options [mss 1440,sackOK,TS val 296715123 ecr 1507619,nop,wscale 7,mptcp join id 8 hmac 0xfbb9cf34c4bc25bf nonce 0xf9e90440], length 0
10:51:54.681941 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [.], ack 3184835426, win 907, options [nop,nop,TS val 1507625 ecr 296715123,mptcp join hmac 0x420cc6a3bcbd81fadd3298b264e2a20f97bd98ea], length 0

At this time, the subflows are in the PRE_ESTABLISHED state, and cannot be used yet, because the last segment sent by the initiating party is the only one containing its authentication information. An acknowledgement of this last segment is required before data can be sent through the subflow. Here are the 3 acknowledgments:
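
The hmac fields exchanged above are what authenticate the new subflow. With the default algorithm of RFC 6824, they are HMAC-SHA1 digests keyed with the concatenated keys from the MP_CAPABLE handshake and computed over the two random nonces from the join SYN and SYN/ACK; the SYN/ACK carries only the leftmost 64 bits, while the third ACK carries the full 160 bits. A sketch with made-up keys (the nonces are the ones from the first subflow above):

```python
import hashlib
import hmac

def join_hmac(key_a: bytes, key_b: bytes, r_a: bytes, r_b: bytes) -> bytes:
    """HMAC-SHA1 over the nonces, keyed with the concatenated keys (RFC 6824)."""
    return hmac.new(key_a + key_b, r_a + r_b, hashlib.sha1).digest()

key_a, key_b = bytes(8), bytes(range(8))             # made-up 64-bit keys
r_a, r_b = b"\x1c\xa1\xc3\xdf", b"\x8a\x55\xd3\x82"  # nonces from the trace
full = join_hmac(key_a, key_b, r_a, r_b)             # 160 bits, in the third ACK
truncated = join_hmac(key_b, key_a, r_b, r_a)[:8]    # leftmost 64 bits, in the SYN/ACK
```
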

10:51:54.682476 IP 37.187.114.89.9 > 130.104.228.97.52462: Flags [.], ack 508980984, win 444, options [nop,nop,TS val 296715125 ecr 1507622,mptcp dss ack 1565554331], length 0
10:51:54.682497 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [.], ack 2757512377, win 667, options [nop,nop,TS val 296715125 ecr 1507622,mptcp dss ack 1565554331], length 0
10:51:54.702843 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280: Flags [.], ack 899129683, win 887, options [nop,nop,TS val 296715129 ecr 1507625,mptcp dss ack 1565554331], length 0

At this time, 3 additional subflows have been set up. The subflow on the wireless interface has not been set up, and we can see new attempts:

10:51:55.654975 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1507869 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:51:57.658963 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1508370 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:52:01.666976 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1509372 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:52:09.682970 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1511376 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:52:25.714981 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1515384 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0
10:52:57.746964 IP 130.104.111.30.45837 > 37.187.114.89.9: Flags [S], seq 77997614, win 29200, options [mss 1460,sackOK,TS val 1523392 ecr 0,nop,wscale 7,mptcp join id 4 token 0x7b14451d nonce 0x6d029f47], length 0

This is due to a firewall blocking access to port 9.
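
Note also the spacing of the retransmitted SYNs: the interval doubles each time, the classic exponential backoff of TCP SYN retransmissions. A quick check from the timestamps above:

```python
from datetime import datetime

# Timestamps of the retransmitted join SYNs captured above
stamps = ["10:51:55.654975", "10:51:57.658963", "10:52:01.666976",
          "10:52:09.682970", "10:52:25.714981", "10:52:57.746964"]
times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
intervals = [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(intervals)  # [2, 4, 8, 16, 32]
```
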

Connection tear down

Now that we have opened a connection, transferred data and observed subflows being established, we can close the connection. In the telnet session, press the escape character (usually ^], i.e. CTRL-]), then press enter and type quit to exit telnet. At that time the connection is closed.

Let’s first look at what happens in standard TCP:

11:01:45.672776 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522 > 2001:41d0:a:6759::1.9: Flags [F.], seq 3444541113, ack 1047133289, win 225, options [nop,nop,TS val 1655373 ecr 296855701], length 0
11:01:45.694726 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522: Flags [F.], seq 1047133289, ack 3444541114, win 224, options [nop,nop,TS val 296862880 ecr 1655373], length 0
11:01:45.694766 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46522 > 2001:41d0:a:6759::1.9: Flags [.], ack 1047133290, win 225, options [nop,nop,TS val 1655378 ecr 296862880], length 0

In short, both ends send a FIN segment that is acknowledged to close the connection in both directions.

With mptcp, things are more complex as we also have opened multiple subflows.

First, at the MPTCP level, the end of the data stream is signalled with a DATA_FIN flagged segment. This segment has to be acknowledged at the DSS level. As with TCP, this is done in both directions, as seen in segments 1, 2 and 4 below. Since no more data will be transmitted, the subflows can then be torn down like classical tcp connections. This is what happens from segment 3. We see that segment 4 is used both to signal an acknowledgement at the DSS level and a FIN at the subflow level.

10:58:03.422967 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300585, win 907, options [nop,nop,TS val 1599810 ecr 296715125,mptcp dss fin ack 2962569294 seq 1565554331 subseq 0 len 1 csum 0x2f7f], length 0
10:58:03.435575 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [.], ack 2757512377, win 887, options [nop,nop,TS val 296807315 ecr 1599810,mptcp dss fin ack 1565554332 seq 2962569294 subseq 0 len 1 csum 0x31ad], length 0
10:58:03.435631 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [F.], seq 899129683, ack 3184835426, win 907, options [nop,nop,TS val 1599814 ecr 296715129,mptcp dss ack 2962569294], length 0
10:58:03.435647 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [F.], seq 2757512377, ack 3653300585, win 907, options [nop,nop,TS val 1599814 ecr 296715125,mptcp dss ack 2962569294], length 0
10:58:03.435653 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [F.], seq 508980984, ack 276814346, win 907, options [nop,nop,TS val 1599814 ecr 296715125,mptcp dss ack 2962569294], length 0
10:58:03.435658 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [F.], seq 2314766619, ack 3417687791, win 907, options [nop,nop,TS val 1599814 ecr 296715118,mptcp dss ack 2962569294], length 0
10:58:03.435669 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300585, win 907, options [nop,nop,TS val 1599814 ecr 296807315,mptcp dss ack 2962569295], length 0
10:58:03.447702 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [F.], seq 3653300585, ack 2757512378, win 887, options [nop,nop,TS val 296807318 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.447735 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300586, win 907, options [nop,nop,TS val 1599817 ecr 296807318,mptcp dss ack 2962569295], length 0
10:58:03.447743 IP 37.187.114.89.9 > 130.104.228.97.52462: Flags [F.], seq 276814346, ack 508980985, win 887, options [nop,nop,TS val 296807318 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.447748 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [.], ack 276814347, win 907, options [nop,nop,TS val 1599817 ecr 296807318,mptcp dss ack 2962569295], length 0
10:58:03.457119 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421: Flags [F.], seq 3417687791, ack 2314766620, win 887, options [nop,nop,TS val 296807319 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.457147 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687792, win 907, options [nop,nop,TS val 1599819 ecr 296807319,mptcp dss ack 2962569295], length 0
10:58:03.460016 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280: Flags [F.], seq 3184835426, ack 899129684, win 887, options [nop,nop,TS val 296807319 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.460032 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [.], ack 3184835427, win 907, options [nop,nop,TS val 1599820 ecr 296807319,mptcp dss ack 2962569295], length 0

We can also look at what happens per subflow. Here is the DATA_FIN sent and acknowledged, with the subflow closed afterwards:

10:58:03.422967 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300585, win 907, options [nop,nop,TS val 1599810 ecr 296715125,mptcp dss fin ack 2962569294 seq 1565554331 subseq 0 len 1 csum 0x2f7f], length 0
10:58:03.435575 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [.], ack 2757512377, win 887, options [nop,nop,TS val 296807315 ecr 1599810,mptcp dss fin ack 1565554332 seq 2962569294 subseq 0 len 1 csum 0x31ad], length 0
10:58:03.435647 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [F.], seq 2757512377, ack 3653300585, win 907, options [nop,nop,TS val 1599814 ecr 296715125,mptcp dss ack 2962569294], length 0
10:58:03.435669 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300585, win 907, options [nop,nop,TS val 1599814 ecr 296807315,mptcp dss ack 2962569295], length 0
10:58:03.447702 IP 37.187.114.89.9 > 130.104.228.97.34887: Flags [F.], seq 3653300585, ack 2757512378, win 887, options [nop,nop,TS val 296807318 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.447735 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], ack 3653300586, win 907, options [nop,nop,TS val 1599817 ecr 296807318,mptcp dss ack 2962569295], length 0

Hereafter we see the tear-down of the three remaining subflows:

10:58:03.435631 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [F.], seq 899129683, ack 3184835426, win 907, options [nop,nop,TS val 1599814 ecr 296715129,mptcp dss ack 2962569294], length 0
10:58:03.460016 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280: Flags [F.], seq 3184835426, ack 899129684, win 887, options [nop,nop,TS val 296807319 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.460032 IP6 2001:6a8:3080:2:f24d:a2ff:fe96:8ce8.39280 > 2001:41d0:a:6759::1.9: Flags [.], ack 3184835427, win 907, options [nop,nop,TS val 1599820 ecr 296807319,mptcp dss ack 2962569295], length 0


10:58:03.435653 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [F.], seq 508980984, ack 276814346, win 907, options [nop,nop,TS val 1599814 ecr 296715125,mptcp dss ack 2962569294], length 0
10:58:03.447743 IP 37.187.114.89.9 > 130.104.228.97.52462: Flags [F.], seq 276814346, ack 508980985, win 887, options [nop,nop,TS val 296807318 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.447748 IP 130.104.228.97.52462 > 37.187.114.89.9: Flags [.], ack 276814347, win 907, options [nop,nop,TS val 1599817 ecr 296807318,mptcp dss ack 2962569295], length 0


10:58:03.435658 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [F.], seq 2314766619, ack 3417687791, win 907, options [nop,nop,TS val 1599814 ecr 296715118,mptcp dss ack 2962569294], length 0
10:58:03.457119 IP6 2001:41d0:a:6759::1.9 > 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421: Flags [F.], seq 3417687791, ack 2314766620, win 887, options [nop,nop,TS val 296807319 ecr 1599814,mptcp dss ack 1565554332], length 0
10:58:03.457147 IP6 2001:6a8:3080:2:95ad:6e51:ba2:31ea.46421 > 2001:41d0:a:6759::1.9: Flags [.], ack 3417687792, win 907, options [nop,nop,TS val 1599819 ecr 296807319,mptcp dss ack 2962569295], length 0
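For quick sanity checks on traces like the ones above, the DSS fields printed by tcpdump can be pulled out with a few lines of Python. This is only an illustrative sketch targeting the exact textual format shown in this post, not a general MPTCP option parser:

```python
import re

def parse_dss(line):
    """Extract the MPTCP DSS fields from one tcpdump line, as printed in
    the traces above (illustrative sketch, not a general parser)."""
    fields = {}
    m = re.search(r'mptcp dss (.*?)\]', line)
    if not m:
        return fields
    opts = m.group(1)
    if 'fin' in opts.split():
        fields['data_fin'] = True
    for key in ('ack', 'seq', 'subseq', 'len'):
        km = re.search(r'\b%s (\d+)' % key, opts)
        if km:
            fields[key] = int(km.group(1))
    return fields

# The DATA_FIN segment from the trace above.
line = ("10:58:03.422967 IP 130.104.228.97.34887 > 37.187.114.89.9: Flags [.], "
        "ack 3653300585, win 907, options [nop,nop,TS val 1599810 ecr 296715125,"
        "mptcp dss fin ack 2962569294 seq 1565554331 subseq 0 len 1 csum 0x2f7f], length 0")
print(parse_dss(line))
```

Running it on the DATA_FIN line recovers the data-level acknowledgment and sequence numbers discussed above.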
]]>
Thu, 30 Oct 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/10/10/citing_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2014/10/10/citing_multipath_tcp.html <![CDATA[Citing Multipath TCP]]> Citing Multipath TCP

A growing number of scientific papers use the Multipath TCP implementation in the Linux kernel to perform experiments, develop new features or compare Multipath TCP with newly proposed techniques. While reading these scientific papers, we often see different ways of citing the Multipath TCP implementation in the Linux kernel. As of this writing, more than twenty developers have contributed to this implementation and the number continues to grow. The full list of contributors is available from : http://multipath-tcp.org/mptcp_stats/authors.html

If you write a scientific paper that uses the Multipath TCP implementation in the Linux kernel, we encourage you to cite it using the following reference :

Christoph Paasch, Sebastien Barre, et al., Multipath TCP implementation in the Linux kernel, available from http://www.multipath-tcp.org

The corresponding bibtex entry may be found below :

@Misc{MPTCPLinux,
    author =    {Christoph Paasch and Sebastien Barre and others},
    title =     {Multipath TCP implementation in the Linux kernel},
    howpublished = {Available from http://www.multipath-tcp.org}
}

Please also indicate the precise version of the implementation that you used to ease the reproduction of your results. We also strongly encourage you to distribute the software that you used to perform your experiments and the patches that you have written on top of this implementation. This will allow other researchers to reproduce your results.

]]>
Fri, 10 Oct 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/09/16/recommended_multipath_tcp_configuration.html http://blog.multipath-tcp.org/blog/html/2014/09/16/recommended_multipath_tcp_configuration.html <![CDATA[Recommended Multipath TCP configuration]]> Recommended Multipath TCP configuration

A growing number of researchers and users are downloading the pre-compiled Linux kernels that include the Multipath TCP implementation. Besides the researchers who performed experiments on improving the protocol or its implementation, we see a growing number of users that deploy Multipath TCP on real machines to benefit from its multihoming capabilities. Several of these users have asked questions on the mptcp-dev mailing list on how to configure Multipath TCP in the Linux kernel. There are several parts of the Multipath TCP implementation that can be tuned.

The first element that can be configured is the path manager. This is the part of the software that controls the establishment of new subflows. The path manager has recently been reimplemented with a modular architecture, but as of this writing only two path managers have been implemented : the fullmesh and the ndiffports path managers.

The fullmesh path manager is the default one and should be used in most deployments. On a client, it advertises all the IP addresses of the client to the server and listens on all the IP addresses that are advertised by the server. It also listens to events from the network interfaces and reacts by adding/removing addresses when interfaces go up or down. On a server, it allows the server to automatically learn all the available addresses and announce them to the client. Note that in the current implementation the server never creates subflows, even if it learns different addresses from the client. The reason is that the client is often behind a NAT or firewall, and creating subflows from the server is not a good idea in this case. The typical use case for the fullmesh path manager is a dual-homed client connected to a single-homed server (e.g. a smartphone connected to a regular server). In this case, two subflows will be established, one over each interface of the dual-homed client. We expect that this is the most popular use case for Multipath TCP. It should be noted that if the client has N addresses and the server M addresses, this path manager will establish N × M subflows. This is probably not optimal in all scenarios.
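The full mesh is easy to picture: every client address is paired with every server address. A minimal sketch, using purely hypothetical addresses:

```python
from itertools import product

# Hypothetical addresses for a dual-homed client and a dual-homed server.
client_addrs = ['10.0.0.1', '192.168.1.1']        # e.g. WiFi and 3G
server_addrs = ['203.0.113.5', '198.51.100.7']

# The fullmesh path manager pairs every client address with every
# server address, yielding N x M subflows.
subflows = list(product(client_addrs, server_addrs))
print(len(subflows))  # 2 x 2 = 4 subflows
```

With two addresses on each side, four subflows are created; with a single-homed server, only one subflow per client interface.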

The ndiffports path manager was designed with a specific use case in mind : exploiting the equal-cost multiple paths that are available in a datacenter. This made it possible to demonstrate nice performance results with Multipath TCP in the Amazon EC2 datacenter in a SIGCOMM11 paper. It can also be used to perform some tests between single-homed hosts. However, this path manager does not automatically learn the IP addresses on the client and the server and does not react to interface changes. As with the fullmesh path manager, the server never creates subflows. The ndiffports path manager should not be used in production and should rather be considered as an example of how a path manager can be written inside the Linux kernel.

A second important module in the Multipath TCP implementation in the Linux kernel is the packet scheduler. This scheduler is used every time a new packet needs to be sent. When several subflows are active and have an open congestion window, the default scheduler selects the subflow with the smallest round-trip-time. The various measurements performed during the last few years with the Multipath TCP implementation in the Linux kernel indicate that this scheduler is the best compromise from a performance viewpoint. Recently, the implementation of the scheduler has been made more modular to enable researchers to experiment with other schedulers. A round-robin scheduler has been implemented and evaluated in a recent paper that shows that the default scheduler remains the best choice. Researchers might later come up with a scheduler that improves the performance of Multipath TCP under specific circumstances, but as of this writing the default rtt-based scheduler remains the best choice.

A third important part of Multipath TCP is the congestion control scheme. The standard congestion control scheme is the Linked Increase Algorithm (LIA) defined in RFC 6356. It provides performance similar to the NewReno congestion control algorithm with single-path TCP. An alternative is the OLIA congestion control algorithm. The paper that proposes this algorithm has shown that it gives some benefits over LIA in several environments. Our experience indicates that LIA and OLIA can safely be used as a default in deployments. Recently, a delay-based congestion control scheme tuned for Multipath TCP has been added to the Linux implementation. Users who plan to use this congestion control scheme in specific environments should first perform tests before deploying it.

There are two other configuration parameters that can be tuned to improve the performance of Multipath TCP. First, Multipath TCP tends to consume more buffer space than regular TCP since data is transmitted over paths with different delays. If you experience performance issues with the default buffer sizes, you might try to increase them; see https://fasterdata.es.net/host-tuning/linux/ for additional information. Second, if Multipath TCP is used on paths with different Maximum Segment Sizes, there are scenarios where the performance can be significantly reduced. A patch that solves this problem has been posted recently. If your version of the Multipath TCP kernel does not include this patch, you might want to force the MTU on all the interfaces of the client to the same value (or force a lower MTU on the server to ensure that the clients always use the same MSS).
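On the buffer side, a rule of thumb from the Multipath TCP literature is that the receive buffer should hold twice the sum of the subflow bandwidths multiplied by the largest round-trip-time (the factor of two leaves room for fast retransmissions). A back-of-the-envelope sketch, with purely illustrative link figures:

```python
def mptcp_rcv_buffer(bandwidths_bps, rtts_s):
    """Rule-of-thumb receive buffer (in bytes) for Multipath TCP:
    twice the sum of the subflow bandwidths times the largest RTT."""
    return int(2 * sum(bandwidths_bps) * max(rtts_s) / 8)

# Illustrative figures: 50 Mbps WiFi at 20 ms and 10 Mbps 3G at 100 ms.
buf = mptcp_rcv_buffer([50e6, 10e6], [0.020, 0.100])
print(buf)  # 2 * 60 Mbps * 100 ms / 8 = 1.5 MB
```

The slow 3G path dominates: its 100 ms round-trip-time forces a buffer sized for both links at that delay, which is why Multipath TCP needs more memory than regular TCP over the fast path alone.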

]]>
Tue, 16 Sep 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/08/07/multipath_tcp_discussed_at_blackhat_2014.html http://blog.multipath-tcp.org/blog/html/2014/08/07/multipath_tcp_discussed_at_blackhat_2014.html <![CDATA[Multipath TCP discussed at Blackhat 2014]]> Multipath TCP discussed at Blackhat 2014

The interest in Multipath TCP continues to grow. During IETF90, an engineer from Oracle confirmed that they were working on an implementation of Multipath TCP for Solaris. This indicates that companies see a possible benefit in Multipath TCP. Earlier this week, Catherine Pearce and Patrick Thomas from Neohapsis gave a presentation on how the deployment of Multipath TCP could affect enterprises that rely heavily on firewalls and IDS in their corporate networks. This first ‘heads up’ for the security community will likely be followed by many other attempts to analyse the security of Multipath TCP and its implications for the security of an enterprise network.

In parallel with their presentation, Catherine and Patrick have released two software packages that could be useful for Multipath TCP users. Both are based on a first implementation of Multipath TCP inside scapy written by Nicolas Maitre during his Master thesis at UCL.

  • mptcp_scanner is a tool that probes remote hosts to verify whether they support Multipath TCP. It would be interesting to see whether an iPhone is detected as such (probably not, because there are no servers running on the iPhone). In the long term, we can expect that nmap will gain a similar detection capability.
  • mptcp_fragmenter is a tool that mimics how data sent over a Multipath TCP connection could be spread over different subflows. Currently, the tool is very simple: five subflows are used and their source port numbers are fixed. Despite this limitation, it is a good starting point to test the support of Multipath TCP on firewalls. We can expect that new features will be added as firewalls add support for Multipath TCP.
]]>
Thu, 07 Aug 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/05/30/nornet.html http://blog.multipath-tcp.org/blog/html/2014/05/30/nornet.html <![CDATA[NorNet moving to Multipath TCP]]> NorNet moving to Multipath TCP

The NorNet testbed is a recent and very interesting initiative from the Simula laboratory in Norway. This testbed is composed of two parts :

  • the NorNet Core is a set of servers at different locations in Norway and possibly abroad. Each location has a few servers that are connected to several ISPs. There are currently more than a dozen NorNet Core sites in Norway
  • the NorNet Edge comprises hundreds of nodes that are connected to several cellular network providers

As of this writing, NorNet is the largest experimental platform that gathers nodes that are really multihomed. Recently, NorNet has upgraded its kernels to include Multipath TCP. This will enable large-scale experiments with Multipath TCP in different network environments. We can expect more experimental papers that use Multipath TCP at a large scale.

Additional information about NorNet may be found on the web and in scientific articles such as :

Gran, Ernst Gunnar; Dreibholz, Thomas and Kvalbein, Amund: NorNet Core - A Multi-Homed Research Testbed Computer Networks, Special Issue on Future Internet Testbeds, vol. 61, pp. 75-87, DOI 10.1016/j.bjp.2013.12.035, ISSN 1389-1286, March 14, 2014.

]]>
Fri, 30 May 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/05/15/tracebox_for_android.html http://blog.multipath-tcp.org/blog/html/2014/05/15/tracebox_for_android.html <![CDATA[Tracebox for Android]]> Tracebox for Android

tracebox is a middlebox detection tool that was designed by Gregory Detal to detect middleboxes that modify IP or TCP headers [IMC]. The command-line version of tracebox runs on various Linux variants and makes it possible to craft special packets with various options to test for middleboxes that :

  • add TCP options (e.g. many ADSL routers would insert an MSS option in the SYN segment)
  • modify TCP options (e.g. a transparent proxy will change the timestamp option)
  • remove TCP options (e.g. some firewalls remove unknown TCP options)

The latter is problematic for extensions like Multipath TCP <http://www.multipath-tcp.org>, and various testers have informed us that some 3G/4G networks block Multipath TCP by default. Some operators have even told us that they used tracebox to detect the offending firewall and change its configuration. However, running tracebox from a laptop is not always convenient.

Valentin Thirion, a student from the University of Liege, has recently been working on a port of some of the features of tracebox to the Android platform. His app, Android Tracebox, runs on rooted smartphones and contains a basic implementation of tracebox that can be used to detect where some specific options, including the MP_CAPABLE option, are removed or modified in the network.

../../../_images/tracebox.png

In the Mobistar network that I use, it does not reveal any strange behavior

traceboxing to 173.252.110.27
1 * * *
2 10.40.211.145 IP::DSCP/ECN IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
3 172.31.5.238 IP::DSCP/ECN IP::TTL IP::Checksum
4 10.30.23.89 IP::DSCP/ECN IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
5 212.65.36.129 IP::DSCP/ECN IP::TTL IP::Checksum
6 81.52.186.121 IP::TTL IP::Checksum
7 193.251.240.218 IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
8 193.251.132.15 IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
9 193.251.252.166 IP::TTL IP::Checksum
10 204.15.22.198 IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
11 31.13.29.255 IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
12 31.13.24.95 IP::TTL IP::Checksum TCP::Checksum TCP::Option_MSS
13 173.252.65.99 IP::DSCP/ECN IP::TTL IP::Checksum

This output indicates that this 3G provider is using DSCP to mark packets, and that it updates or inserts the MSS option. This is a common behavior to prevent some problems with Path MTU discovery.

On the WiFi access provided by Voo, the path seems to be even cleaner

traceboxing to 173.252.110.27
1 192.168.0.1 IP::Checksum
2 10.163.0.1 IP::TTL IP::Checksum
3 78.129.125.89 IP::TTL IP::Checksum
4 212.3.237.49 IP::TTL IP::Checksum
5 4.69.148.182 IP::TTL IP::Checksum TCP::Checksum
6 4.69.143.94 IP::TTL IP::Checksum TCP::Checksum
7 4.69.137.70 IP::TTL IP::Checksum TCP::Checksum
8 4.69.141.18 IP::TTL IP::Checksum TCP::Checksum
9 4.69.202.57 IP::TTL IP::Checksum TCP::Checksum
10 4.69.134.150 IP::TTL IP::Checksum TCP::Checksum
11 4.69.149.210 IP::TTL IP::Checksum
12 4.53.116.78 IP::TTL IP::Checksum
13 31.13.24.8 IP::TTL IP::Checksum TCP::Checksum
14 31.13.29.232 IP::TTL IP::Checksum TCP::Checksum
15 173.252.64.191 IP::TTL IP::Checksum

References

[IMC] Gregory Detal, Benjamin Hesmans, Olivier Bonaventure, Yves Vanaubel, and Benoit Donnet. 2013. Revealing middlebox interference with tracebox. In Proceedings of the 2013 conference on Internet measurement conference (IMC ‘13). ACM, New York, NY, USA, 1-8. DOI=10.1145/2504730.2504757 http://doi.acm.org/10.1145/2504730.2504757
]]>
Thu, 15 May 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/04/01/dissecting_siri.html http://blog.multipath-tcp.org/blog/html/2014/04/01/dissecting_siri.html <![CDATA[Dissecting Siri]]> Dissecting Siri

Siri is the voice recognition application used by Apple’s iPhones and iPads. The application captures the user’s voice and sends it to Apple servers in the cloud that run voice recognition algorithms and return the voice samples converted into text. Since this is a closed-source application, there are very few details about its operation. Still, it is the largest user of Multipath TCP and for this reason it is worth discussing here.

A recent paper [Cavivlione] written by Luca Caviglione briefly analyses the Siri application from a networking viewpoint. The paper looks at the sizes of the packets that were exchanged, tries to infer the type of data exchanged and measures the duration of the TCP connections. It discusses several scenarios in which the user dictates various sentences. Unfortunately, the paper was written before the release of iOS7, which started to use Multipath TCP for Siri.

It would be interesting to perform similar tests with a recent version of Siri that uses Multipath TCP. Unfortunately, since the data is encrypted and potentially partially transmitted over cellular networks, this is more challenging than when a single TCP connection was used. Up to iOS6, the open-source SiriProxy could be used to intercept Siri messages and even use them to trigger specific operations, e.g. for home automation. Unfortunately, SiriProxy does not seem to be usable anymore with iOS7, as discussed in detail at https://github.com/plamoni/SiriProxy/issues/542

References

[Cavivlione] Luca Caviglione, A first look at traffic patterns of Siri, Transactions on Emerging Telecommunications Technologies, 2013, http://dx.doi.org/10.1002/ett.2697
]]>
Tue, 01 Apr 2014 00:00:00 +0200
http://blog.multipath-tcp.org/blog/html/2014/03/30/why_is_the_multipath_tcp_scheduler_so_important.html http://blog.multipath-tcp.org/blog/html/2014/03/30/why_is_the_multipath_tcp_scheduler_so_important.html <![CDATA[Why is the Multipath TCP scheduler so important ?]]> Why is the Multipath TCP scheduler so important ?

Multipath TCP can pool several links together. An important use case for Multipath TCP is smartphones and tablets equipped with both 3G and WiFi interfaces. On such devices, Multipath TCP would establish two subflows, one over the WiFi interface and one over the 3G interface. Once the two subflows have been established, one of the main decisions taken by Multipath TCP is the scheduling of the packets over the different subflows.

This scheduling decision is very important because it can impact performance and quality of experience. In the current implementation of Multipath TCP in the Linux kernel, the scheduler always prefers the subflow with the smallest round-trip-time to send data. A typical example of the operation of this scheduler is shown in the demo below from the http://www.multipath-tcp.org web site :

In this demo, the Multipath TCP client uses SSH over Multipath TCP to connect to a server that exports a screensaver over the SSH session. The client has three interfaces : WiFi, 3G and Ethernet. Multipath TCP continuously measures the round-trip-time every time it sends data over any of these subflows. The Ethernet subflow has the lowest round-trip-time, WiFi has a slightly higher round-trip-time and 3G has the worst. The SSH session is usually not limited by the network throughput, so all subflows are available every time data needs to be transmitted. When Ethernet is available, it is preferred over the other interfaces; WiFi is preferred over 3G, and 3G is only used when the two other interfaces are unavailable.
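The selection logic just described can be sketched in a few lines. This is not the kernel code, only an illustration of the default behaviour: among the subflows whose congestion window still has room, pick the one with the smallest smoothed round-trip-time:

```python
def select_subflow(subflows):
    """Pick the subflow with the smallest smoothed RTT among those whose
    congestion window is not full (a simplified sketch of the default
    Multipath TCP scheduler, not the kernel implementation)."""
    available = [sf for sf in subflows if sf['inflight'] < sf['cwnd']]
    if not available:
        return None
    return min(available, key=lambda sf: sf['srtt'])

# Illustrative RTTs (ms) matching the demo: Ethernet < WiFi < 3G.
subflows = [
    {'name': 'ethernet', 'srtt': 2.0,  'cwnd': 10, 'inflight': 3},
    {'name': 'wifi',     'srtt': 15.0, 'cwnd': 10, 'inflight': 0},
    {'name': '3g',       'srtt': 60.0, 'cwnd': 10, 'inflight': 0},
]
print(select_subflow(subflows)['name'])  # ethernet, as in the demo
```

If the Ethernet congestion window fills up, the same function falls back to WiFi, and to 3G only when both faster subflows are unavailable.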

Sending data over the subflow with the smallest round-trip-time is not sufficient to achieve good performance on memory-constrained devices that use a small receive window. This problem was first explored in [NSDI12], where reinjection and penalization were proposed to mitigate the head-of-line blocking that can occur when the receiver advertises a limited receive window. The typical scenario is a smartphone using 3G and WiFi where 3G is slower than WiFi. If the receiver is window-limited, it might happen that a packet is sent on the 3G subflow and the WiFi subflow then becomes blocked due to the limited receive window. In this case, the algorithm proposed in [NSDI12] will reinject the unacknowledged data from the 3G subflow on the WiFi subflow and reduce the congestion window of the 3G subflow. This problem has been analyzed in more detail in [Conext13] by considering a large number of scenarios. This analysis has resulted in various improvements to the Linux Multipath TCP implementation.

During the last few years, several researchers have proposed other types of schedulers for Multipath TCP and other transport protocols. In theory, if a scheduler had perfect knowledge of the network characteristics (bandwidth, delay), it could optimally schedule the packets to be transmitted so as to prevent head-of-line blocking and minimize buffer occupancy. In practice, in a real implementation, this is more difficult because the delay varies and the bandwidth is unknown and varies as a function of the other TCP connections.

A few articles have tried to solve the scheduling problem by using a different approach than the one currently implemented in the Linux kernel.

The Delay-Aware Packet Scheduling For Multipath Transport proposed in [DAPS] is a recent example of such a scheduler. [DAPS] considers two paths with different delays and generates a schedule, i.e. a list of sequence numbers to be transmitted over the different paths. Some limitations of the proposed scheduler are listed in [DAPS], notably : the DAPS scheduler assumes that there is a large difference in delay between the different paths and that the congestion windows are stable. In practice, these conditions do not always hold, and a scheduler should operate in all situations. [DAPS] implements the proposed scheduler in the ns-2 CMT simulator and evaluates its performance in small networks.

Another scheduler is proposed in [YAE2013]. This scheduler tries to estimate the available capacity on each subflow and measures the number of bytes transmitted over each subflow. This enables the scheduler to detect when a subflow is sending too much data and to select the other subflow at that time. The proposed scheduler is implemented in the Linux kernel, but unfortunately the source code does not seem to have been released by the authors of [YAE2013]. The performance of the scheduler is evaluated in a simulation scenario with very long file transfers in a network with a very small amount of buffering. It is unclear whether this represents a real use case for Multipath TCP.

It can be expected that other researchers will propose new Multipath TCP schedulers. There is room for improvement in this part of the Multipath TCP code. However, to be convincing, the evaluation of a new scheduler should not be limited to small-scale simulations. It should consider a wide range of scenarios, as in [Conext13], and demonstrate that the scheduler can be efficiently implemented in the Linux kernel.

References

[NSDI12] Costin Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda, O. Bonaventure and M. Handley, How hard can it be? Designing and implementing a deployable Multipath TCP, USENIX NSDI, 2012.
[Conext13] Christoph Paasch, R. Khalili, and O. Bonaventure, On the benefits of applying experimental design to improve Multipath TCP, presented at CoNEXT ‘13: Proceedings of the ninth ACM conference on Emerging networking experiments and technologies, 2013.
[DAPS] Nicolas Kuhn, E. Lochin, A. Mifdaoui, G. Sarwar, O. Mehani, and R. Boreli, DAPS: Intelligent Delay-Aware Packet Scheduling For Multipath Transport, presented at ICCC, 2014.
[YAE2013] Fan Yang, P. Amer, and N. Ekiz, A Scheduler for Multipath TCP, presented at the 22nd International Conference on Computer Communications and Networks (ICCCN), 2013, pp. 1-7.
]]>
Sun, 30 Mar 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/03/28/researchers_contribute_multipath_tcp_code.html http://blog.multipath-tcp.org/blog/html/2014/03/28/researchers_contribute_multipath_tcp_code.html <![CDATA[Researchers contribute Multipath TCP code]]> Researchers contribute Multipath TCP code

Our Multipath TCP implementation in the Linux kernel continues to attract a lot of interest from both researchers and industry. Until now, most of the work on the implementation has been done by researchers at UCL or close collaborators who work with us in the framework of scientific projects, with a few exceptions. During the last week, two research groups have contributed new patches to Multipath TCP.

The first patch, proposed last week by Enhuan Dong, adds an implementation of a Multipath-aware Vegas congestion control scheme. Most TCP congestion control schemes rely on packet losses to detect congestion, with one notable exception : TCP Vegas [1]. TCP Vegas measures the round-trip-time, uses increases in the round-trip-time as an indication of congestion and adapts its congestion window accordingly. In 2012, several researchers proposed to adapt TCP Vegas for Multipath TCP [2]. This patch is a first step towards implementing this extension of TCP Vegas in the Linux kernel. It has already generated some discussion on the mailing list.
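The intuition behind Vegas fits in a few lines of arithmetic: the sender compares the throughput it expects at the base (minimum) round-trip-time with the throughput it actually achieves, and interprets the difference as packets queued in the network. The sketch below is a simplified single-path Vegas update, not the multipath-coupled variant in the patch, and the alpha/beta thresholds are chosen for illustration:

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=2, beta=4):
    """One simplified Vegas window update (windows in packets, RTTs in
    seconds): estimate the packets queued in the network and grow the
    window below alpha, shrink it above beta."""
    expected = cwnd / base_rtt            # throughput at the base RTT
    actual = cwnd / rtt                   # throughput actually achieved
    diff = (expected - actual) * base_rtt # packets buffered in the network
    if diff < alpha:
        return cwnd + 1                   # path underused: increase
    elif diff > beta:
        return cwnd - 1                   # queues building up: decrease
    return cwnd                           # within the target band

print(vegas_update(10, base_rtt=0.05, rtt=0.05))  # no queueing -> 11
print(vegas_update(10, base_rtt=0.05, rtt=0.10))  # RTT doubled -> 9
```

The key property is that the window shrinks before any loss occurs, purely from the RTT increase, which is what the multipath adaptation couples across subflows.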

The second patch is an extension to the Multipath TCP path manager. The path manager is a recent addition to the Linux Multipath TCP implementation. It acts as a control plane for Multipath TCP since it includes the logic that decides when and how subflows are created. The default path manager creates a full-mesh of subflows, but this is not always the best solution. The path manager was designed to be flexible and extensible. The patch sent by Duncan Eastoe and Luca Boccassi supports the Binder system described in [3] . It also includes some support for using the IPv6 Routing header with Multipath TCP. Given that this header has been deprecated, it is unlikely that this will end up in the standard Multipath TCP implementation, but it could be useful for research experiments.

[1]Lawrence S. Brakmo, S. W. O’Malley, and L. L. Peterson, TCP Vegas: new techniques for congestion detection and avoidance presented at the SIGCOMM’94: Proceedings of the conference on Communications architectures, protocols and applications, New York, New York, USA, 1994, pp. 24-35.
[2]Yu Cao, M. Xu, and X. Fu, Delay-based congestion control for multipath TCP , presented at the Network Protocols (ICNP), 2012 20th IEEE International Conference on, 2012, pp. 1-10.
[3]Luca Boccassi, M. M. Fayed, and M. K. Marina, Binder: a system to aggregate multiple internet gateways in community networks presented at the LCDNet’13: Proceedings of the 2013 ACM MobiCom workshop on Lowest cost denominator networking for universal access, New York, New York, USA, 2013, p. 3.
]]>
Fri, 28 Mar 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/02/24/observing_siri.html http://blog.multipath-tcp.org/blog/html/2014/02/24/observing_siri.html <![CDATA[Observing Siri : the three-way handshake]]> Observing Siri : the three-way handshake

Apple’s Siri is the largest user of Multipath TCP as of this writing. This post looks at one Multipath TCP connection established by a single-homed iPad running iOS7 over a single WiFi interface. The trace below shows a simple Multipath TCP session between this iPad and the standard Siri server. Like all Multipath TCP connections, it starts with a three-way exchange :

12:43:31.311061 IP (tos 0x0, ttl 64, id 54778, offset 0, flags [DF], proto TCP (6), length 76)
      192.168.2.2.62787 > siri.4.https: Flags [S], cksum 0x5e3a (correct), seq 2739181685, win 65535, options [mss 1460,nop,wscale 3,mp capable flags:H sndkey:96e576198c475350,nop,nop,TS val 1363555813 ecr 0,sackOK,eol], length 0

The first segment is a SYN segment. It contains several TCP options :

  • the MSS option advertising a standard maximum segment size of 1460 bytes
  • the wscale option defined in RFC 1323 that advertises a scaling factor of 3
  • the timestamp option defined in RFC 1323
  • the sackOK option that negotiates the use of selective acknowledgements

These TCP options are standard TCP options that are used on modern TCP stacks. It is a bit surprising to see a window scale option for an application like Siri where typically only a small amount of data will be exchanged.

The last option is the MP_CAPABLE option defined in RFC 6824 that proposes the utilisation of Multipath TCP. In the SYN segment, this option contains the random 64-bit key chosen by the sender.

12:43:31.342236 IP (tos 0x0, ttl 244, id 52382, offset 0, flags [DF], proto TCP (6), length 64)
      siri.4.https > 192.168.2.2.62787: Flags [S.], cksum 0x034b (correct), seq 1880401460, ack 2739181686, win 8190, options [mss 1460,nop,wscale 4,nop,nop,sackOK,mp capable flags:H sndkey:d7b705e4d86c1a66], length 0

The second segment is the SYN+ACK segment returned by the server. It is interesting to note that this segment does not contain the RFC 1323 timestamp option and uses a different window scale than the one proposed by the client in the SYN segment. The absence of the timestamp option probably serves to avoid consuming too many option bytes in the data segments.

12:43:31.345448 IP (tos 0x0, ttl 64, id 47496, offset 0, flags [DF], proto TCP (6), length 60)
      192.168.2.2.62787 > siri.4.https: Flags [.], cksum 0x3719 (correct), seq 1, ack 1, win 8280, options [mp capable flags:H sndkey:96e576198c475350 rcvkey:d7b705e4d86c1a66], length 0

The third segment contains the MP_CAPABLE option that includes the keys chosen by the sender and the receiver. Since the client repeats both keys in the ACK segment, the server can remain stateless.
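The keys exchanged in this handshake are not sent again later: RFC 6824 instead derives from a SHA-1 hash of each key a 32-bit token, used to identify the connection when additional subflows join, and the 64-bit initial data sequence number. The sketch below applies that derivation to the client key from the trace above:

```python
import hashlib

def key_to_token_and_idsn(key_hex):
    """RFC 6824 key derivation: the token is the most significant 32 bits
    of SHA-1(key) and the initial data sequence number (IDSN) the least
    significant 64 bits."""
    digest = hashlib.sha1(bytes.fromhex(key_hex)).digest()
    token = int.from_bytes(digest[:4], 'big')
    idsn = int.from_bytes(digest[-8:], 'big')
    return token, idsn

# Client key (sndkey) from the SYN segment shown above.
token, idsn = key_to_token_and_idsn('96e576198c475350')
print(hex(token), hex(idsn))
```

This is why an MP_JOIN on a new subflow only needs to carry the 32-bit token rather than the full key.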

12:43:31.357386 IP (tos 0x0, ttl 64, id 53779, offset 0, flags [DF], proto TCP (6), length 204)
     192.168.2.2.62787 > siri.4.https: Flags [P.], cksum 0xeb34 (correct), seq 1:145, ack 1, win 8280, options [mp dss flags:MA dack: 2248627404 dsn: 3845908739 sfsn: 1 dlen: 144,eol], length 144
12:43:31.357430 IP (tos 0x0, ttl 64, id 23748, offset 0, flags [DF], proto TCP (6), length 100)
     192.168.2.2.62787 > siri.4.https: Flags [P.], cksum 0x0c31 (correct), seq 145:185, ack 1, win 8280, options [mp dss flags:MA dack: 2248627404 dsn: 3845908883 sfsn: 145 dlen: 40,eol], length 40
12:43:31.385032 IP (tos 0x0, ttl 244, id 30705, offset 0, flags [DF], proto TCP (6), length 48)
     siri.4.https > 192.168.2.2.62787: Flags [.], cksum 0x4c82 (correct), seq 1, ack 145, win 2221, options [mp dss flags:A dack: 3845908883], length 0
12:43:31.389460 IP (tos 0x0, ttl 244, id 31058, offset 0, flags [DF], proto TCP (6), length 48)
      siri.4.https > 192.168.2.2.62787: Flags [.], cksum 0x4c34 (correct), seq 1, ack 185, win 2219, options [mp dss flags:A dack: 3845908923], length 0

The data transfer can now start. Siri uses HTTPS and thus the Multipath TCP connection begins with a TLS handshake. The details of this handshake are not important for Multipath TCP, but there are some interesting details to mention concerning this utilisation of Multipath TCP. First, iOS7 does not seem to use the DSS checksum. This checksum was designed to detect payload modifications by middleboxes. With TLS, it is unlikely that a middlebox will modify the contents of the segments. Second, the DSNs and Data ACKs are 32 bits wide, while RFC 6824 defines both 32-bit and 64-bit Data Sequence Numbers. iOS7 seems to place one DSS option inside each segment.
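Conceptually, the DSS mapping is what lets the receiver stitch the bytestream back together regardless of which subflow delivered each segment. A minimal sketch (a toy helper, not code from any MPTCP stack) using the 32-bit DSNs observed in this trace:

```python
def reassemble(segments, initial_dsn):
    """Toy data-level reassembly. Each tuple is (dsn, payload), where the
    DSN comes from the DSS option of whatever subflow carried the segment.
    The receiver delivers bytes in DSN order, independently of arrival order."""
    pending = {dsn: payload for dsn, payload in segments}
    out, dsn = b"", initial_dsn
    while dsn in pending:
        payload = pending.pop(dsn)
        out += payload
        dsn = (dsn + len(payload)) % 2**32  # 32-bit DSNs wrap around
    return out
```

Segments arriving out of order, e.g. because they travelled over different subflows, are held in `pending` until the contiguous DSN catches up with them, which is exactly the reordering role the DSS option plays at the receiver.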

When analyzing packet traces, it is often interesting to show graphically the evolution of the connection. For regular TCP, tcptrace provides various ways to visualise the evolution of a TCP connection. Benjamin Hesmans is developing a tool that will provide the same features but for Multipath TCP. This tool is still being developed, but it already provides some nice visualisations. Since Siri only sends a small amount of data, we can only plot the evolution of the Multipath TCP Data Sequence Number.

The figure below shows the flow of data from the client (i.e. the iPad) to the server. Each vertical bar corresponds to one or more segments and the red dots represent acknowledgments. The WiFi network used for the test worked well and there were no losses.

../../../_images/siri-client-server.png

With Siri, most of the data is sent by the client, as shown by the server sequence number trace below. After the TLS handshake, very little data is sent by the server.

../../../_images/siri-server-client.png
]]>
Mon, 24 Feb 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/02/19/control_stream.html http://blog.multipath-tcp.org/blog/html/2014/02/19/control_stream.html <![CDATA[The Multipath TCP control stream]]> The Multipath TCP control stream

During IETF89 we will present a draft [CS] that proposes to define the semantics of a single bit inside the DSS option. This change might appear small at first glance, but it could have a huge impact on the evolution of Multipath TCP and its future.

The DSS option is defined in RFC 6824 to encode the mapping between the Data Sequence Number and the subflow sequence number. RFC 6824 supports one bytestream in each direction between the communicating hosts. This bytestream is used to carry the data supplied by the user applications.

The Control Stream draft [CS] proposes to support two bytestreams in each direction. The first is the regular bytestream that is used to transport regular data. The second is a bytestream that allows the communicating hosts to exchange control information that is relevant for the Multipath TCP connection. [CS] defines the S bit in the DSS option shown below to indicate whether the mapping corresponds to the regular bytestream or to the control stream.

                    1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+
|     Kind      |    Length     |Subtype|(reserved)|S|F|m|M|a|A|
+---------------+---------------+-------+----------------------+
|        Control ACK (4 or 8 octets, depending on flags)       |
+--------------------------------------------------------------+
|Control sequence number (4 or 8 octets, depending on flags)   |
+--------------------------------------------------------------+
|              Subflow Sequence Number (4 octets)              |
+-------------------------------+------------------------------+
|Control-Level Length (2 octets)|      Checksum (2 octets)     |
+-------------------------------+------------------------------+

 The S bit of the 'reserved' field is set to 1 when sending on the
                       control stream.

Why would someone want to support two bytestreams over a single Multipath TCP connection ?

The main motivation is that we would like to exchange control information between communicating Multipath TCP hosts without being limited by the existing TCP options :

  • TCP options are sent unreliably. When a host sends a segment that contains an ADD_ADDR option inside an acknowledgement, it cannot be certain that this option will be delivered to the other host. Some techniques to improve the reliability of the delivery of this option are discussed in [CellNet12].
  • TCP options have a limited size. In the Multipath TCP handshake, we use several tricks to extract the required tokens and IDSN from a hash computation to minimize the length of the MP_CAPABLE option, but this hack is far from perfect.

With a bytestream that can carry control information inside the payload of TCP segments, it is possible to define new techniques to synchronise the two communicating state machines. As a first example, the ADD_ADDR option can be delivered reliably. Consider a client having several IPv6 addresses. An RFC 6824 compliant implementation would probably send these addresses inside independent TCP segments as shown below :

msc { width=800, arcgradient = 4;  c [label="Client", linecolour=black], s [label="Server", linecolour=black]; |||; c=>s [ label = "SYN(seq=x)\n\n"]; |||; s=>c [label = "SYN+ACK(seq=y,ack=x+1)\n\n"]; |||; c=>s [label="ACK(ack=y+1)\n\n"]; |||; c=>s [label="ACK(seq=x+1,ack=y+1)ADD_ADDR(IP1)\n\n"]; |||; c-Xs [label="ACK(seq=x+1,ack=y+1)ADD_ADDR(IP2)\n\n"]; |||; c=>s [label="ACK(seq=x+1,ack=y+1)ADD_ADDR(IP3)\n\n"]; |||; }

The only way for the sender to recover from the loss of the segment advertising IP2 is to regularly send the list of addresses that it owns. This is inefficient.

With the control stream, advertising several addresses becomes much simpler.

msc { width=800, arcgradient = 4;  c [label="Client", linecolour=black], s [label="Server", linecolour=black]; |||; c=>s [ label = "SYN(seq=x)\n\n"]; |||; s=>c [label = "SYN+ACK(seq=y,ack=x+1)\n\n"]; |||; c=>s [label="ACK(ack=y+1)\n\n"]; |||; c-Xs [label="ACK(seq=x+1,ack=y+1)DSS(CS,IP1-IP2-IP3)\n\n", linecolour=red]; |||; ...; |||; c=>s [label="ACK(seq=x+1,ack=y+1)DSS(CS,IP1-IP2-IP3)\n\n", linecolour=red]; |||; s=>c [label="ACK()\n\n", linecolour=red]; }

The same applies to the RM_ADDR option. With the control stream, the list of the addresses owned by each host can be exchanged reliably.
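To make this idea concrete, here is a sketch of how an address list could be carried as a single TLV record on the control stream. The type value and record layout below are hypothetical (they are illustrative choices, not taken from the [CS] draft); the point is that one reliably delivered record, retransmitted by TCP itself if lost, replaces a series of unreliable ADD_ADDR options:

```python
import socket
import struct

# Hypothetical TLV type for an address-list record on the control stream.
TLV_ADD_ADDR = 1

def encode_addr_list(addrs):
    """Encode a list of IPv6 addresses as one Type-Length-Value record.
    Sent on the control stream, its delivery is guaranteed by TCP's own
    retransmission machinery, unlike a TCP option."""
    payload = b"".join(socket.inet_pton(socket.AF_INET6, a) for a in addrs)
    return struct.pack("!BH", TLV_ADD_ADDR, len(payload)) + payload

def decode(record):
    """Parse a TLV record back into (type, list of addresses)."""
    tlv_type, length = struct.unpack("!BH", record[:3])
    payload = record[3:3 + length]
    addrs = [socket.inet_ntop(socket.AF_INET6, payload[i:i + 16])
             for i in range(0, length, 16)]
    return tlv_type, addrs
```

A client with three addresses would emit a single `encode_addr_list(["2001:db8::1", "2001:db8::2", "2001:db8::3"])` record instead of three independent ADD_ADDR options, any one of which could be silently lost.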

This is not the only application of the proposed control stream. The control stream could prove to be very useful to enhance the security of Multipath TCP. RFC 6824 includes a basic method to “authenticate” the addition of subflows by exchanging 64-bit keys in clear during the initial three-way handshake. From a security viewpoint, exchanging these keys in clear is obviously not the best solution. A better solution would be to use longer keys and rely on a key exchange scheme that remains secure even if a passive listener captures the segments exchanged. By relying exclusively on TCP options, this is impossible. With the control stream, it becomes possible to use a secure key agreement mechanism such as Diffie-Hellman, or any other scheme, to agree on a shared secret. Once the shared secret has been negotiated, it can be used to authenticate the establishment of additional subflows.

Transporting control information inside the payload of segments may sound familiar to those who have followed the discussions that led to the design of Multipath TCP. During several months in 2010, the MPTCP working group discussed two solutions to transport data over different paths.

The first approach, which later became RFC 6824, only uses TCP options to encode all the control information. This solution was considered to be optimal to pass through various types of middleboxes. Recent experience with Multipath TCP implementations shows that Multipath TCP can indeed pass through most types of middleboxes.

The second approach, proposed by Michael Scharf in [MCTCP], was to encode all the control information inside the payload of the TCP segments. For this, MC-TCP relies on a TLV format to exchange both control and user data. Compared to the first approach, the advantage of MC-TCP was that it could be implemented as a library in user space, but the MPTCP working group felt that this solution was too risky given the prevalence of middleboxes.

The control stream makes a minimal use of the TLV format to encode some control information. It remains to be seen whether there are interactions with some types of middleboxes that could lead to problems. DPI boxes are a likely source of problems for the control stream, but they already have a problem today with Multipath TCP if they do not process all the data for a given Multipath TCP connection. Adding the control stream does not create an additional problem, and one can expect that with the deployment of Multipath TCP on all iOS7 devices, middlebox vendors will start to add support for Multipath TCP to their DPI boxes…

Bibliography

[CS] C. Paasch, O. Bonaventure, A generic control stream for Multipath TCP, February 2014, Internet draft, work in progress, https://datatracker.ietf.org/doc/draft-paasch-mptcp-control-stream/
[MCTCP] Michael Scharf, Multi-Connection TCP (MCTCP) Transport, Internet draft, July 2010, work in progress
[CellNet12] Christoph Paasch, Gregory Detal, Fabien Duchene, Costin Raiciu, and Olivier Bonaventure. 2012. Exploring mobile/WiFi handover with multipath TCP. In Proceedings of the 2012 ACM SIGCOMM workshop on Cellular networks: operations, challenges, and future design (CellNet ‘12). ACM, New York, NY, USA, 31-36. http://doi.acm.org/10.1145/2342468.2342476
]]>
Wed, 19 Feb 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/02/17/network_coding_meets_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2014/02/17/network_coding_meets_multipath_tcp.html <![CDATA[Source Coding and Fountain Codes meet Multipath TCP]]> Source Coding and Fountain Codes meet Multipath TCP

Multipath TCP continues to attract the interest of networking researchers. In less than one week, two preprints of papers that apply network coding techniques to improve the performance of Multipath TCP have been published.

In the first paper [FMTCP], Yong Cui and his colleagues propose to apply Fountain Codes, as defined in RFC 6380, to Multipath TCP. This is an interesting approach that encodes the data and adds redundancy to it before transmission. Thanks to this coding scheme, FMTCP does not need to retransmit all segments when losses occur. Of course, the tradeoff is that FMTCP generates more data than Multipath TCP. The paper evaluates the performance of FMTCP by comparing it with Multipath TCP using ns-2 simulations. The simulations show that on lossy links FMTCP performs better than Multipath TCP. This is not surprising given that Multipath TCP’s congestion control scheme tries to move traffic away from lossy paths by reducing its transmission rate on those paths. FMTCP, on the other hand, can easily recover from a limited number of losses by relying on its coding scheme. It would be interesting to complement these simulations by :

  • evaluating the performance of several competing FMTCP flows. This would show the impact of the losses on the congestion control scheme that is used by FMTCP (the paper is not very clear about the congestion control scheme that it used and unfortunately, the source code of the simulator used is not referenced in the paper)
  • evaluating the support of interactive applications. Such applications are more likely to benefit from FMTCP than bulk transfers, but this might have an impact on the coding scheme, which would no longer have to deal with fixed-size segments

Another related paper is SC-MPTCP [SCMPTCP]. SC-MPTCP relies on source coding to encode the segments before they are transmitted. It also compares SC-MPTCP with regular Multipath TCP, but explicitly mentions the utilisation of the coupled congestion control scheme and relies on ns-3 simulations. The simulations analyze the impact of losses on the performance of both SC-MPTCP and regular Multipath TCP. In addition, the paper proposes a scheduling technique to select the subflows on which segments have to be sent to minimize reordering at the receiver and thus prevent head-of-line blocking when the receive buffer is limited. This technique is applicable to both regular Multipath TCP and SC-MPTCP. The paper then evaluates two interesting multihoming scenarios and shows the benefits of SC-MPTCP compared to Multipath TCP.

These two papers show that by adding redundancy to the data segments, it is possible to improve the performance of Multipath TCP in lossy environments. These results were based on simulations and there are two remaining questions that need to be answered :

  • How can these techniques be efficiently implemented in TCP stacks, either in kernel-based stacks or by modifying the user-level TCP stacks that are starting to (re)appear ? How do applications interact with such modified stacks ?
  • What happens when several sources using source coding or fountain codes compete for the same bottleneck link where packets are discarded due to congestion ? Is the loss pattern caused by congestion as favorable for source coding and fountain codes as the random losses that mainly occur on wireless links ?

Bibliography

[FMTCP] Yong Cui, Lian Wang, Xin Wang, Hongyi Wang, and Yining Wang, FMTCP: A Fountain Code-Based Multipath Transmission Control Protocol, to appear in IEEE/ACM Transactions on Networking, http://dx.doi.org/10.1109/TNET.2014.2300140
[SCMPTCP] Ming Li, Andrey Lukyanenko, Sasu Tarkoma, Yong Cui, Antti Yla-Jaaski, Tolerating path heterogeneity in multipath TCP with bounded receive buffers, Computer Networks, Available online 6 February 2014, ISSN 1389-1286, http://dx.doi.org/10.1016/j.comnet.2014.01.011
]]>
Mon, 17 Feb 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/02/04/idsn.html http://blog.multipath-tcp.org/blog/html/2014/02/04/idsn.html <![CDATA[Computing MPTCP’s initial Data Sequence Number (IDSN)]]> Computing MPTCP’s initial Data Sequence Number (IDSN)

On a regular TCP subflow, the sequence number used in the SYN segment serves as the initial sequence number. All subsequent segments are numbered starting at this initial sequence number.

msc { width=800, arcgradient = 4;  c [label="Client", linecolour=black], s [label="Server", linecolour=black]; |||; c=>s [ label = "SYN(seq=x)\n\n"]; |||; s=>c [label = "SYN+ACK(seq=y,ack=x+1)\n\n"]; |||; c=>s [label="ACK(ack=y+1)\n\n"]; |||; c=>s [label="First data(seq=x+1,ack=y+1)\n\n"]; |||; }

Multipath TCP uses two levels of sequence numbers : the regular sequence numbers that appear inside the header of each TCP segment and the Multipath-level Data Sequence Numbers that are used inside Multipath TCP options. The DSNs enable the receiver to reorder the data received over the different subflows. The Data Sequence Number is incremented every time data is sent and an initial Data Sequence Number is negotiated during the three way handshake on the first subflow. Due to the limited TCP option space, the initial DSN is computed from the information exchanged during the three-way handshake.

msc { width=800,arcgradient = 4;  c [label="Client", linecolour=black], s [label="Server", linecolour=black];  |||; c=>s [ label = "SYN(seq=x, MP_CAPABLE(ClientKey))\n\n" ]; |||; s=>c [label = "SYN+ACK(seq=y,ack=x+1,  MP_CAPABLE(ServerKey))\n\n" ]; |||; c=>s [label="ACK(ack=y+1),  MP_CAPABLE(ClientKey,ServerKey)\n\n"]; |||; }

At the end of the three-way handshake, the initial DSN can be computed as the low-order 64 bits of the hash of the sender’s key. At this point, one could wonder why we need an initial DSN for each Multipath TCP connection. There are two reasons for that. The first is that an initial DSN that cannot be easily predicted by attackers improves the resilience against segment injection attacks. The second reason is to prevent data losses in case of failure of the initial subflow. Consider the scenario depicted below, where the first data segment sent by the client over the initial subflow is lost and the client then sends data over a second subflow.

msc { width=800,arcgradient = 4; c1 [label="Client1", linecolour=black], c2 [label="Client2", linecolour=black], s [label="Server", linecolour=black];  |||; c1=>s [ label = "SYN(seq=x, MP_CAPABLE(ClientKey))\n\n" ]; |||; s=>c1 [label = "SYN+ACK(seq=y,ack=x+1,  MP_CAPABLE(ServerKey))\n\n" ]; |||; c1=>s [label="ACK(ack=y+1),  MP_CAPABLE(ClientKey,ServerKey)\n\n"]; |||; c1-x s [label="First Data(seq=x+1,DSS(seq=idsn),data=a)\n\n"]; |||; c2=>s [ label = "SYN(seq=w, MP_JOIN)\n\n" ]; |||; s=>c2 [label = "SYN+ACK(seq=z,ack=w+1, MP_JOIN)\n\n"]; |||; c2=>s [label="ACK(ack=z+1))\n\n"]; |||; c2=> s [label="Second Data(seq=w+1,DSS(seq=idsn+1),data=b)\n\n"]; |||; }

Without an agreement on the initial DSN, the server would not know whether the first data that it receives on the second subflow is the initial data at the Multipath TCP level or not.
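For reference, RFC 6824 derives both the connection token and the IDSN from a SHA-1 hash of the key exchanged in the MP_CAPABLE option: the token is the most significant 32 bits of the hash and the IDSN the least significant 64 bits. A small sketch (the key value below is an arbitrary illustration, not taken from a real trace):

```python
import hashlib
import struct

def token_and_idsn(key64):
    """Per RFC 6824: token = most significant 32 bits of SHA-1(key),
    IDSN = least significant 64 bits of SHA-1(key)."""
    digest = hashlib.sha1(struct.pack("!Q", key64)).digest()  # 20 bytes
    token = struct.unpack("!I", digest[:4])[0]
    idsn = struct.unpack("!Q", digest[-8:])[0]
    return token, idsn

# Example with an arbitrary 64-bit key (illustrative only)
token, idsn = token_and_idsn(0x0123456789ABCDEF)
```

This is why the three-way handshake only needs to carry the 64-bit keys: both ends can then compute the token (used by MP_JOIN to identify the connection) and the IDSN locally, without spending further option space.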

]]>
Tue, 04 Feb 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/01/31/measuring_with_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2014/01/31/measuring_with_multipath_tcp.html <![CDATA[Measuring with Multipath TCP]]> Measuring with Multipath TCP

We often receive questions from students or engineers who start to use Multipath TCP about the measurement software that they can use to assess the performance of the protocol. The best approach is often to use real traffic with real applications because this corresponds to the real use case for Multipath TCP. However, it is not always possible to deploy Multipath TCP on a large number of clients and servers to perform such experiments. Over the last few years, we’ve used several software tools to measure the performance of Multipath TCP :

  • iperf3 measures memory-to-memory throughput
  • netperf supports both memory-to-memory transfers and request/response types of transfers
  • ab, the Apache HTTP server benchmarking tool, measures the performance of web servers such as Apache
  • weighttp, another web server benchmarking tool

There are other, more generic measurement tools that could also be useful for some types of measurements, but we do not have direct experience with them :

  • mgen from NRL is capable of generating various types of traffic patterns over UDP and TCP
  • D-ITG [Botta2012] generates traffic according to some statistical properties. It can also generate traffic above TCP.

If you’ve used other open-source software to measure Multipath TCP performance, feel free to add comments below so that we can update this page.

References

[Botta2012] A. Botta, A. Dainotti, A. Pescape, A tool for the generation of realistic network workload for emerging networking scenarios, Computer Networks (Elsevier), 2012, Volume 56, Issue 15, pp 3531-3547, http://dx.doi.org/10.1016/j.comnet.2012.02.019
]]>
Fri, 31 Jan 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/01/30/multipath_tcp_algorithms_theory_and_design.html http://blog.multipath-tcp.org/blog/html/2014/01/30/multipath_tcp_algorithms_theory_and_design.html <![CDATA[Multipath TCP algorithms : theory and design]]> Multipath TCP algorithms : theory and design

The congestion control schemes used by Multipath TCP have been designed based on earlier theoretical work. The current implementation in the Linux kernel supports two congestion control schemes : Coupled RFC 6356 and OLIA [Khalili2012] . Several other multipath congestion control schemes have been proposed and it is likely that the quest for the best multipath congestion control scheme will continue in the coming years.
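The flavour of these coupled schemes can be illustrated with the RFC 6356 ("LIA") increase rule: on each ACK, subflow i grows by min(alpha/cwnd_total, 1/cwnd_i) segments, with alpha chosen so that the whole connection is no more aggressive than a single TCP flow on its best path. A sketch with windows expressed in segments (a simplification of the byte-based rule in the RFC):

```python
def lia_alpha(subflows):
    """RFC 6356 aggressiveness factor. subflows is a list of (cwnd, rtt)
    pairs, with cwnd in MSS-sized segments and rtt in seconds."""
    total = sum(cwnd for cwnd, _ in subflows)
    best = max(cwnd / rtt**2 for cwnd, rtt in subflows)
    denom = sum(cwnd / rtt for cwnd, rtt in subflows) ** 2
    return total * best / denom

def lia_increase(cwnd_i, subflows):
    """Per-ACK window increase (in segments) on subflow i: the coupled
    term is capped by what regular TCP would add on that subflow."""
    total = sum(cwnd for cwnd, _ in subflows)
    return min(lia_alpha(subflows) / total, 1.0 / cwnd_i)
```

With a single subflow, alpha evaluates to 1 and the rule degenerates to regular TCP's additive increase of 1/cwnd per ACK; with several subflows, the increase on each subflow is scaled down so that the aggregate stays TCP-friendly at a shared bottleneck.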

In [Peng2013], Qiuyu Peng and his colleagues propose a theoretical analysis of several multipath TCP congestion control schemes and compare them based on ns-2 simulations. The theoretical analysis considers three criteria :

  • TCP Friendliness : is the congestion control scheme fair towards regular TCP (NewReno in this case, RFC 5681) ?
  • Responsiveness : how quickly can the congestion control scheme adapt to changing network conditions ?
  • Window Fluctuations

The second criterion is very interesting because real traffic is more bursty than the large file transfers that are used to evaluate congestion control schemes in simulations.

Four existing congestion control schemes are compared with a newly proposed one in [Peng2013]. Unfortunately, the paper only considers early multipath congestion control schemes that predate RFC 6356 and does not compare the proposed algorithm with OLIA [Khalili2012]. Considering OLIA and perhaps other congestion control schemes and analyzing more complex scenarios would be a useful extension to this work.

References

[Khalili2012] Ramin Khalili, Nicolas Gast, Miroslav Popovic, Utkarsh Upadhyay, and Jean-Yves Le Boudec. 2012. MPTCP is not pareto-optimal: performance issues and a possible solution. In Proceedings of the 8th international conference on Emerging networking experiments and technologies (CoNEXT ‘12). ACM, New York, NY, USA, 1-12. http://doi.acm.org/10.1145/2413176.2413178
[Peng2013] Qiuyu Peng, Anwar Walid, and Steven H. Low. 2013. Multipath TCP algorithms: theory and design. In Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems (SIGMETRICS ‘13). ACM, New York, NY, USA, 305-316. http://doi.acm.org/10.1145/2465529.2466585
]]>
Thu, 30 Jan 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/01/29/multipath_tcp_and_middleboxes.html http://blog.multipath-tcp.org/blog/html/2014/01/29/multipath_tcp_and_middleboxes.html <![CDATA[Multipath TCP and middleboxes]]> Multipath TCP and middleboxes

The design of Multipath TCP RFC 6824 has been heavily influenced by the presence of middleboxes in the global Internet. In [Honda2011], Michio Honda and his colleagues showed that middleboxes can change almost any field of the IP and TCP headers. Based on these measurements, the MPTCP working group developed various heuristics to enable Multipath TCP to cope with the interference caused by middleboxes. Implementing these heuristics inside the Multipath TCP implementation in the Linux kernel was more difficult than initially thought, given all the corner cases that had to be supported. To verify the correct operation of these heuristics, Benjamin Hesmans wrote a set of Click elements that implement models of the various interferences that middleboxes can cause on TCP segments [Hesmans2013b]. MBClick [Hesmans2013a] could enable other implementors of Multipath TCP or other extensions to validate the interactions between their implementation and middleboxes.

While testing the impact of middleboxes, we also evaluated whether already deployed TCP extensions were vulnerable to middlebox interference. We knew from practical experience with our local firewalls that sequence number randomisation was still enabled by default in many firewalls. We measured the impact of sequence number randomisation on existing implementations of the TCP Selective Acknowledgement option.

The figure above shows that when an old firewall randomises the TCP sequence numbers without randomising the SACK blocks, the TCP throughput is lower when SACKs are enabled than when they are disabled. This unexpected result is due to an implementation choice in the Linux and MacOS versions that we tested. Both stacks completely ignore a packet that contains an invalid SACK block. With a dumb sequence number randomiser, almost all received SACK blocks are invalid. This implies that as soon as there are packet losses, TCP acknowledgements are considered to be invalid and ignored. This blocks the fast retransmit mechanism and TCP can only rely on its retransmission timer to recover from packet losses.
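The interaction can be reproduced with a toy model (a deliberate simplification of the firewall and receiver behaviour): the firewall adds a per-connection offset to the cumulative acknowledgement number in the TCP header but forgets the SACK blocks carried in the options, so the blocks end up below the rewritten ack and the receiving stack discards the whole acknowledgement:

```python
OFFSET = 0x1A2B3C4D  # per-connection offset chosen by the firewall

def dumb_randomise(ack, sack_blocks):
    """A 'dumb' randomiser: rewrites the cumulative ack number in the
    TCP header but leaves the SACK blocks in the options untouched."""
    return (ack + OFFSET) % 2**32, sack_blocks

def sack_blocks_valid(ack, sack_blocks):
    """Simplified version of the sanity check performed by the stacks we
    tested: every SACK block must lie above the cumulative ack."""
    return all(ack <= left < right for left, right in sack_blocks)
```

Before randomisation a block such as (1200, 1400) with ack 1000 is valid; after the offset is applied to the ack alone, the same block falls far below the ack and the acknowledgement is ignored, which is exactly what disables fast retransmit in the measurements above.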

References

[Hesmans2013a] Benjamin Hesmans, Click elements to model middleboxes, https://bitbucket.org/bhesmans/click
[Hesmans2013b] Benjamin Hesmans, Fabien Duchene, Christoph Paasch, Gregory Detal and Olivier Bonaventure, Are TCP Extensions Middlebox-proof?, Proceedings of the 2013 Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox), 2013, http://dx.doi.org/10.1145/2535828.2535830
[Honda2011] Michio Honda, Yoshifumi Nishida, Costin Raiciu, Adam Greenhalgh, Mark Handley, Hideyuki Tokuda, Is it still possible to extend TCP?, Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, November 02-04, 2011, Berlin, Germany, http://dx.doi.org/10.1145/2068816.2068834
]]>
Wed, 29 Jan 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/01/29/multipath_algorithms_and_strategies_to_improve_tcp_performance_over_wireless_mesh_networks.html http://blog.multipath-tcp.org/blog/html/2014/01/29/multipath_algorithms_and_strategies_to_improve_tcp_performance_over_wireless_mesh_networks.html <![CDATA[Multipath Algorithms and Strategies to Improve TCP Performance over Wireless Mesh Networks]]> Multipath Algorithms and Strategies to Improve TCP Performance over Wireless Mesh Networks

Multipath TCP can be applied in a wide range of network environments. Wireless mesh networks, with their ability to provide multiple paths between a pair of nodes, are particularly appealing. David Gomez and his colleagues discuss in [1] some possible usages of Multipath TCP in wireless mesh networks. The article relies on ns-3 simulations to evaluate the performance of Multipath TCP in a wireless mesh network. Three simple scenarios are considered. The first one uses regular TCP as the baseline. The second uses Multipath TCP with only one radio interface, while the third considers two non-overlapping radio interfaces per node. Unsurprisingly, the simulations show that Multipath TCP outperforms regular TCP when hosts have two radio interfaces.

../../../_images/mptcp-mesh.png

Simulations with MPTCP over a wireless mesh network (source: [1])

Unfortunately, the authors do not provide the simulator and the simulation scripts used for their research. Given the availability of Multipath TCP in the Linux kernel , it would be interesting to deploy Multipath TCP in real test mesh networks and analyze its performance.

[1] David Gomez, Carlos Rabadan, Pablo Garrido, Ramon Aguero, Multipath Algorithms and Strategies to Improve TCP Performance over Wireless Mesh Networks, Mobile Networks and Management, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Volume 125, 2013, pp 15-28, Springer, http://dx.doi.org/10.1007/978-3-319-04277-0_2
]]>
Wed, 29 Jan 2014 00:00:00 +0100
http://blog.multipath-tcp.org/blog/html/2014/01/28/on_the_benefits_of_applying_experimental_design_to_improve_multipath_tcp.html http://blog.multipath-tcp.org/blog/html/2014/01/28/on_the_benefits_of_applying_experimental_design_to_improve_multipath_tcp.html <![CDATA[On the Benefits of Applying Experimental Design to Improve Multipath TCP]]> On the Benefits of Applying Experimental Design to Improve Multipath TCP

Multipath TCP, despite being an extension to TCP, is still a relatively young protocol. Achieving high performance in a wide range of network conditions remains an issue. The paper On the Benefits of Applying Experimental Design to Improve Multipath TCP is an interesting attempt to improve the performance of Multipath TCP. While many researchers rely on simulations and, often due to space limitations, evaluate only a few network scenarios, this paper takes a different approach.

First, it applies the Experimental Design methodology developed by statisticians. This methodology makes it possible to efficiently and accurately (from a statistical viewpoint) evaluate the performance of a system by taking into account the impact of various parameters and intelligently selecting their values.

Second, the improvements proposed to Multipath TCP are not simply applied to and evaluated on a simplified model of Multipath TCP implemented inside a simulator. They are evaluated directly on the Linux kernel implementation of Multipath TCP available from http://www.multipath-tcp.org . The measurements are performed on the mininet platform. The source code of the Linux implementation and all the measurement scripts are available from http://multipath-tcp.org/conext2013 . This ensures that any researcher or protocol developer can validate, reproduce and improve the algorithms described in the paper. In an ideal world, all research papers would provide the information needed to quickly reproduce the research that they describe…

Reference

[PKB2013] C. Paasch, R. Khalili, O. Bonaventure, On the Benefits of Applying Experimental Design to Improve Multipath TCP, CoNEXT 2013, Dec. 2013, Santa Barbara, USA, http://dx.doi.org/10.1145/2535372.2535403

]]>
Tue, 28 Jan 2014 00:00:00 +0100