Subflow Creation / Management Issues

* Subflow Creation / Management Issues
@ 2021-11-15 22:44 Phil Greenland
  2021-11-16 11:21 ` Paolo Abeni
  2021-11-17  1:22 ` Mat Martineau
  0 siblings, 2 replies; 4+ messages in thread
From: Phil Greenland @ 2021-11-15 22:44 UTC (permalink / raw)
  To: mptcp

Hi,

I’m currently working with a client, integrating MPTCP into their platform, with a goal of achieving reliable connectivity over a mix of WAN technologies.

We started using 5.14 but were having problems getting MPTCP to handover between subflows when connectivity on individual links became poor.

Updating to 5.15 brought a massive improvement. With the scheduler updates (stale subflow detection) handovers between links are now virtually seamless. Great work :-)

We’ve started looking at corner cases that we might experience in the field.

In our use case all connections originate from the client, to a server on the internet. The client has multiple endpoint addresses registered via “ip mptcp”, over which multiple subflows are created.

———————————

Scenario 1)

If we take network links up and down, adding and removing endpoint addresses as we do so, everything appears to work well, subflows are connected and disconnected as expected.

However if a subflow fails, for example due to reception of a TCP RST from a NAT this seems to cause problems.

The next time a link is taken down (a link other than the failing one), the associated subflow is disconnected as expected.

When the link is brought back up, more often than not the failed subflow (the one stopped by a TCP RST) it re-connected and the newly added endpoint address, associated with the interface that was just brought up is not used.

I believe this behaviour is due to an interaction between the local_addr_used variable within the MPTCP meta socket pm structure and the logic in select_local_address.

The variable is incremented and decremented on subflow connection / disconnection (following endpoint addition / removal), but it's not decremented following the failure of a subflow.

The select_local_address function only seems to check if a local address is currently in use before returning it.

Therefore when an address is removed / re-added a single new subflow is permitted to be created (due to local_addr_used < local_addr_max), however the select_local_address function returns the first available address.

Scenario 2)

Following the initial MPTCP flow creation (with id 0?), if a subflow connection fails to be established, due to a connect timeout for example, no further subflows are established for the connection.

It appears that the subflow connection success handler mptcp_pm_nl_subflow_established triggers the establishment of the next. Such that once the chain is broken by a failed connection no further subflows are established, even if unused endpoint addresses remain.

Scenario 3)

As a continuation of scenario 1, if for some strange reason all subflows were to fail, all at once, or over time. I'm left with just the MPTCP meta socket. The application believes it's still connected but there’s no way it seems to recover the connection. Other than adding a new endpoint address, which will trigger one of the subflows (again not necessarily the new endpoint) to be re-connected.

———————————

Is there anything on your roadmap that might help address these few issues above?

I’ve been considering a temporary hack, which I hope would address all three. Adding a netlink call, based heavily on your add address handler, to walk the meta socket list, and try to reconnect any / all failed subflows. Which I can call periodically or in response to network events.

I’d happily attempt to develop a more formal fix, with some guidance, but haven’t done lot of Linux kernel development in the past.

Apologies for the long email / keep up the great work! It’s really good to see MPTCP make it into the mainline kernel.

Thanks,

Phil

Phil Greenland | Software Engineer | Quantulum Ltd

E  phil@quantulum.co.uk
W www.quantulum.co.uk

^ permalink raw reply	[flat|nested] 4+ messages in thread