Multipath TCP through a strange middlebox

Users of the Multipath TCP implementation in the Linux kernel run experiments in various networks that the developers do not have access to. One of these users complained that Multipath TCP was not working in a satellite environment. Such networks often contain Performance Enhancing Proxies (PEPs) that “tune” TCP connections to improve their performance. Often, those PEPs terminate TCP connections, so the MPTCP options sent by the client never reach the server. This was not the case in this network: the user complained that Multipath TCP did not advertise the addresses of the server. Fortunately, he managed to capture a packet trace on both the client and the server. An analysis of this packet trace gives interesting insights into the impact of such PEPs on TCP extensions.

The network topology is very simple. The client has two private interfaces (client1 and client2), both behind NATs, and the server has two public IP addresses. In the traces below, the private IP addresses of the client are replaced by client1 and client2, its public IP address by client, and the two server addresses by server1 and server2.

The client opens a TCP connection towards the server.

09:27:12.316613 IP (tos 0x0, ttl 64, id 15494, offset 0, flags [DF], proto TCP (6), length 72)
 client1.47862 > server1.49803: Flags [S], cksum,
 seq 3452765235, win 28440, options [mss 1422,sackOK,TS val 55654581 ecr 0,
 nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

09:27:13.318852 IP (tos 0x0, ttl 64, id 15495, offset 0, flags [DF], proto TCP (6), length 72)
  client1.47862 > server1.49803: Flags [S], cksum,
  seq 3452765235, win 28440, options [mss 1422,sackOK,TS val 55655584 ecr 0,
  nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

This is a normal TCP SYN segment with the MSS, SACK-permitted, timestamp, window scale and MP_CAPABLE options. The second packet, a retransmission of the SYN sent one second later, does not seem to reach the server. The first is translated by the NAT and received as follows by the server.

09:27:22.729048 IP (tos 0x0, ttl 47, id 15494, offset 0, flags [DF], proto TCP (6), length 72)
  client.47862 > server1.49803: Flags [S], cksum,
  seq 3452765235, win 384, options [mss 1285,sackOK,TS val 55654581 ecr 0,
  nop,wscale 8,mptcp capable {0x69ccde41dca19b8f}], length 0

There are several interesting points to observe when comparing the two packets. First, the MSS option is modified (1285 instead of 1422). This is not unusual but indicates a middlebox on the path. Second, the window is severely reduced (384 instead of 28440). The server replies with a SYN+ACK.

09:27:22.729220 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 72)
   server1.49803 > client.47862: Flags [S.], cksum,
   seq 3437506945, ack 3452765236, win 28560, options [mss 1460,sackOK,
   TS val 155835098 ecr 55654581,nop,wscale 8,
   mptcp capable {0x32205e67a94ad606}], length 0

This segment is also modified by the middlebox, which updates the MSS and the window but does not change the timestamp chosen by the server. The client receives it as follows.

09:27:14.188324 IP (tos 0x0, ttl 51, id 0, offset 0, flags [DF], proto TCP (6), length 72)
   server1.49803 > client1.47862: Flags [S.], cksum,
   seq 3437506945, ack 3452765236, win 384, options [mss 1285,sackOK,
   TS val 155835098 ecr 55654581,nop,wscale 8,
   mptcp capable {0x32205e67a94ad606}], length 0
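The comparison between the two copies of the SYN+ACK can also be done mechanically. The sketch below is only an illustration: the field names and the `changed_fields` helper are hypothetical, and the values are copied by hand from the two traces above instead of being parsed from a capture.

```python
def changed_fields(sent: dict, received: dict) -> dict:
    """Return {field: (sent_value, received_value)} for every field that differs."""
    return {
        name: (sent[name], received.get(name))
        for name in sent
        if sent[name] != received.get(name)
    }


# SYN+ACK as sent by the server (values copied from the server-side trace).
synack_sent = {
    "ttl": 64, "win": 28560, "mss": 1460,
    "ts_val": 155835098, "ts_ecr": 55654581,
    "wscale": 8, "mptcp_key": 0x32205E67A94AD606,
}

# The same SYN+ACK as received by the client.
synack_received = {
    "ttl": 51, "win": 384, "mss": 1285,
    "ts_val": 155835098, "ts_ecr": 55654581,
    "wscale": 8, "mptcp_key": 0x32205E67A94AD606,
}

diff = changed_fields(synack_sent, synack_received)
# The TTL change is expected (routers decrement it); win and mss point at the middlebox.
print(diff)  # {'ttl': (64, 51), 'win': (28560, 384), 'mss': (1460, 1285)}
```

The timestamp and the MP_CAPABLE key do not appear in the result, which matches the observation that the middlebox left them untouched in this segment.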

Since the MP_CAPABLE option has been received in the SYN+ACK segment, the client can confirm the use of Multipath TCP on this connection. This is done by placing the MP_CAPABLE option in the third ACK of the three-way handshake.

09:27:14.188574 IP (tos 0x0, ttl 64, id 15496, offset 0, flags [DF], proto TCP (6), length 80)
   client1.47862 > server1.49803: Flags [.], cksum,
   seq 1, ack 1, win 112, options [nop,nop,TS val 55656453 ecr 155835098,
   mptcp capable {0x69ccde41dca19b8f,0x32205e67a94ad606},
   mptcp dss ack 3426753824], length 0

This segment is received by the server as follows.

09:27:23.456784 IP (tos 0x0, ttl 47, id 15495, offset 0, flags [DF], proto TCP (6), length 80)
   client.47862 > server1.49803: Flags [.], cksum,
   seq 1, ack 1, win 384, options [nop,nop,TS val 55654655 ecr 155835098,
   mptcp capable {0x69ccde41dca19b8f,0x32205e67a94ad606},
   mptcp dss ack 3426753824], length 0

The middlebox has updated the window and the timestamp but has not changed anything in the MP_CAPABLE option, so Multipath TCP is confirmed on both the client and the server. The server immediately sends a duplicate acknowledgement containing the ADD_ADDR option to announce its second address.

09:27:23.456960 IP (tos 0x0, ttl 64, id 60464, offset 0, flags [DF], proto TCP (6), length 68)
  server1.49803 > client.47862: Flags [.], cksum,
  seq 1, ack 1, win 112, options [nop,nop,TS val 155835826 ecr 55654655,
  mptcp add-addr id 3 server2,mptcp dss ack 2495228045], length 0

Unfortunately, this segment never reaches the client. As the current path managers do not retransmit the ADD_ADDR option on a regular basis, the client is never informed of the second address.

Since the client also has a second address, it tries to inform the server by sending a duplicate acknowledgement.

09:27:14.188636 IP (tos 0x0, ttl 64, id 15497, offset 0, flags [DF], proto TCP (6), length 68)
  client1.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 1, win 112, options [nop,nop,TS val 55656453 ecr 155835098,
  mptcp add-addr id 4 client2,mptcp dss ack 3426753824], length 0

This segment never reaches the server. It is likely that the PEP notices that the segment is a duplicate acknowledgement and filters it. A possible way to let Multipath TCP pass correctly through this particular middlebox would be to place the ADD_ADDR option inside segments that contain data, or to use techniques that ensure its reliable delivery, as proposed in Exploring Mobile/WiFi Handover with Multipath TCP.

Note that the Multipath TCP options are correctly transported in other packets. For example, here is the first data segment sent by the server.

09:27:23.466575 IP (tos 0x0, ttl 64, id 60465, offset 0, flags [DF], proto TCP (6), length 107)
  server1.49803 > client.47862: Flags [P.], cksum,
  seq 1:36, ack 1, win 112, options [nop,nop,TS val 155835836 ecr 55654655,
  mptcp dss ack 2495228045 seq 3426753824 subseq 1 len 35,nop,nop], length 35

This segment is received by the client as follows.

09:27:14.987619 IP (tos 0x0, ttl 51, id 60465, offset 0, flags [DF], proto TCP (6), length 107)
  server1.49803 > client1.47862: Flags [P.], cksum,
  seq 1:36, ack 1, win 320, options [nop,nop,TS val 155835173 ecr 55656453,
  mptcp dss ack 2495228045 seq 3426753824 subseq 1 len 35,nop,nop], length 35

The middlebox has modified the timestamp and the window but did not change the Multipath TCP options.

The client can also send data to the server.

09:27:14.988371 IP (tos 0x0, ttl 64, id 15499, offset 0, flags [DF], proto TCP (6), length 93)
  client1.47862 > server1.49803: Flags [P.], cksum,
  seq 1:22, ack 36, win 112, options [nop,nop,TS val 55657253 ecr 155835173,
  mptcp dss ack 3426753859 seq 2495228045 subseq 1 len 21,nop,nop], length 21

The server receives this segment as follows.

09:27:24.320654 IP (tos 0x0, ttl 47, id 15497, offset 0, flags [DF], proto TCP (6), length 93)
  client.47862 > server1.49803: Flags [P.], cksum,
  seq 1:22, ack 36, win 320, options [nop,nop,TS val 55654742 ecr 155835836,
  mptcp dss ack 3426753859 seq 2495228045 subseq 1 len 21,nop,nop], length 21

It is interesting to compare the acknowledgement that the client sends for the server's data segment

09:27:14.987885 IP (tos 0x0, ttl 64, id 15498, offset 0, flags [DF], proto TCP (6), length 60)
  client1.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 36, win 112, options [nop,nop,TS val 55657253 ecr 155835173,
  mptcp dss ack 3426753859], length 0

with the acknowledgement that the server actually receives.

09:27:23.487569 IP (tos 0x0, ttl 237, id 15496, offset 0, flags [DF], proto TCP (6), length 68)
  client.47862 > server1.49803: Flags [.], cksum,
  seq 1, ack 36, win 320, options [nop,eol], length 0

The server receives this acknowledgement within 21 msec of transmitting its data segment. Furthermore, it arrives with a TTL of 237, whereas the acknowledgement left the client with a TTL of 64 and the other packets from the client reach the server with a TTL of 47. Since the two packets also have different IPv4 ids, it is very likely that the acknowledgement was generated by the PEP and not copied from the client. Note that the middlebox has replaced the second nop option with an eol option. A closer look at the packet reveals something even stranger. The IPv4 packet is 68 bytes long, yet it contains only an IPv4 header (20 bytes), a TCP header (20 bytes) and the nop and eol options, both one byte long. The packet thus contains 26 bytes of garbage (starting with 0c1e below):

0x0010:            baf6 c28b cdcd 0434 cce4 31a5
0x0020:  c010 0140 1515 0000 0100 0c1e 69cc de41
0x0030:  dca1 9b8f 0a08 0101 0351 3902 0949 ddbc
0x0040:  0000 0000

The value of the TCP Data Offset (0xc, i.e. twelve 32-bit words or 48 bytes) indicates that the middlebox considers that bytes 0c1e ... 0000 belong to the TCP options, but since they appear after the eol option, they are ignored by the TCP stack on the receiver.
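The byte count can be checked against the hex dump. The following sketch is not taken from any TCP stack: `walk_options` is a hypothetical helper that reads the Data Offset, walks the option list and stops at the first eol option, as a receiver would. The input is the TCP part of the dump above, with the bytes of the IPv4 header left out.

```python
# TCP header bytes copied from the hex dump (IPv4 header not included).
TCP_HEX = (
    "baf6 c28b cdcd 0434 cce4 31a5"
    " c010 0140 1515 0000 0100 0c1e 69cc de41"
    " dca1 9b8f 0a08 0101 0351 3902 0949 ddbc"
    " 0000 0000"
)

EOL, NOP = 0, 1  # single-byte TCP option kinds


def walk_options(segment: bytes):
    """Return (header length in bytes, option kinds parsed, bytes ignored after eol)."""
    header_len = (segment[12] >> 4) * 4      # Data Offset is counted in 32-bit words
    options = segment[20:header_len]         # everything after the fixed 20-byte header
    kinds = []
    i = 0
    while i < len(options):
        kind = options[i]
        kinds.append(kind)
        if kind == EOL:                      # end of option list: stop parsing here
            i += 1
            break
        if kind == NOP:                      # one-byte padding option
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            break                            # malformed length, give up
        i += options[i + 1]                  # skip kind, length and payload
    return header_len, kinds, len(options) - i


segment = bytes.fromhex(TCP_HEX.replace(" ", ""))
print(walk_options(segment))  # (48, [1, 0], 26)
```

The result shows a 48-byte TCP header, two parsed options (nop, then eol) and 26 option bytes that are never interpreted.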

In effect, the middlebox removes the timestamp option and the DSS option. Removing the timestamp option was possible according to RFC 1323, but this behaviour is no longer permitted by RFC 7323. The removal of the DSS option is a problem for Multipath TCP since there is no data acknowledgement. Fortunately, RFC 6824 and the Multipath TCP implementation in the Linux kernel have anticipated this problem. Indeed, this ACK acknowledges new data without containing a DSS option, and Multipath TCP immediately falls back to regular TCP. This preserves connectivity at the cost of losing the benefits of Multipath TCP.
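The rule that triggers the fallback here can be written down as a toy model. This is a simplified sketch and not the code of the Linux implementation: the `Ack` type and `must_fall_back` function are hypothetical names, sequence numbers are relative and wraparound is ignored, and the other conditions that RFC 6824 attaches to fallback are left out.

```python
from dataclasses import dataclass


@dataclass
class Ack:
    ack: int        # subflow-level cumulative acknowledgement (relative numbering)
    has_dss: bool   # True when the segment carries an MPTCP DSS option


def must_fall_back(snd_una: int, segment: Ack) -> bool:
    """Toy rule: an ACK that covers new data but carries no DSS triggers fallback."""
    acks_new_data = segment.ack > snd_una
    return acks_new_data and not segment.has_dss


# The server has sent bytes 1:36 and nothing has been acknowledged yet.
snd_una = 1

print(must_fall_back(snd_una, Ack(ack=36, has_dss=False)))  # ACK as rewritten by the PEP: True
print(must_fall_back(snd_una, Ack(ack=36, has_dss=True)))   # ACK as sent by the client: False
print(must_fall_back(snd_una, Ack(ack=1, has_dss=False)))   # no new data acknowledged: False
```

Applied to the trace, only the acknowledgement rewritten by the PEP satisfies the condition, which is consistent with the server abandoning Multipath TCP at that point.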

A similar problem happens in the other direction. The server has stopped using Multipath TCP and sends the following packet.

09:27:24.320870 IP (tos 0x0, ttl 64, id 60466, offset 0, flags [DF], proto TCP (6), length 52)
  server1.49803 > client.47862: Flags [.], cksum,
  seq 36, ack 22, win 112, options [nop,nop,TS val 155836690 ecr 55654742], length 0

This ACK does not contain any DSS option. It is processed by the middlebox, which removes the timestamp option.

09:27:15.787814 IP (tos 0x0, ttl 254, id 60466, offset 0, flags [DF], proto TCP (6), length 68)
  server1.49803 > client1.47862: Flags [.], cksum,
  seq 36, ack 22, win 320, options [nop,eol], length 0

Again, note that the change in the TTL indicates that the middlebox has created a new packet to convey the acknowledgement to the client. At this point, the client falls back to regular TCP as well, as shown by the next segment that it sends.

09:27:15.788047 IP (tos 0x0, ttl 64, id 15500, offset 0, flags [DF], proto TCP (6), length 844)
  client1.47862 > server1.49803: Flags [P.], cksum,
  seq 22:814, ack 36, win 112, options [nop,nop,TS val 55658053 ecr 155835173], length 792

The transfer continues like a regular TCP connection. Note that the TCP timestamps are back. This strange middlebox shows that the objective of preserving connectivity in the presence of middleboxes is well met by the Multipath TCP implementation in the Linux kernel.

Text updated on February 2nd and February 3rd based on comments from Raphael Bauduin and Gregory Detal