* Subflow Creation / Management Issues
@ 2021-11-15 22:44 Phil Greenland
  2021-11-16 11:21 ` Paolo Abeni
  2021-11-17  1:22 ` Mat Martineau
  0 siblings, 2 replies; 4+ messages in thread
From: Phil Greenland @ 2021-11-15 22:44 UTC (permalink / raw)
  To: mptcp

Hi,

I’m currently working with a client, integrating MPTCP into their platform, with a goal of achieving reliable connectivity over a mix of WAN technologies.

We started using 5.14 but were having problems getting MPTCP to hand over between subflows when connectivity on individual links became poor.

Updating to 5.15 brought a massive improvement. With the scheduler updates (stale subflow detection) handovers between links are now virtually seamless. Great work :-)

We’ve started looking at corner cases that we might experience in the field.

In our use case all connections originate from the client, to a server on the internet. The client has multiple endpoint addresses registered via “ip mptcp”, over which multiple subflows are created.

———————————

Scenario 1)

If we take network links up and down, adding and removing endpoint addresses as we do so, everything appears to work well, subflows are connected and disconnected as expected.

However, if a subflow fails, for example due to reception of a TCP RST from a NAT, this seems to cause problems.

The next time a link is taken down (a link other than the failing one), the associated subflow is disconnected as expected.

When the link is brought back up, more often than not the failed subflow (the one stopped by a TCP RST) is re-connected, and the newly added endpoint address, associated with the interface that was just brought up, is not used.

I believe this behaviour is due to an interaction between the local_addr_used variable within the MPTCP meta socket pm structure and the logic in select_local_address.

The variable is incremented and decremented on subflow connection / disconnection (following endpoint addition / removal), but it's not decremented following the failure of a subflow.

The select_local_address function only seems to check if a local address is currently in use before returning it.

Therefore, when an address is removed / re-added, a single new subflow is permitted to be created (since local_addr_used < local_addr_max); however, the select_local_address function returns the first available address, not necessarily the one that was just re-added.
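
To illustrate, here is a small userspace model of the behaviour as I understand it (purely a sketch of my reading of the logic, not the actual kernel code; the addresses and the limit of 2 are made up):

/*
 * Toy model of the counter-vs-selection mismatch described above.
 * NOT the kernel code, just a sketch of the observed behaviour.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_ENDPOINTS 2

struct endpoint {
        const char *addr;
        bool registered;   /* added via "ip mptcp endpoint add" */
        bool in_use;       /* a live subflow currently uses it */
};

static struct endpoint endpoints[NR_ENDPOINTS] = {
        { "10.0.0.1", true, false },   /* later killed by a TCP RST */
        { "10.0.1.1", true, false },   /* later removed / re-added */
};

static int local_addr_used;            /* like msk->pm.local_addr_used */
static const int local_addr_max = 2;

/* like select_local_address(): first registered address not in use */
static struct endpoint *select_local_address(void)
{
        for (int i = 0; i < NR_ENDPOINTS; i++)
                if (endpoints[i].registered && !endpoints[i].in_use)
                        return &endpoints[i];
        return NULL;
}

static void try_create_subflow(void)
{
        struct endpoint *ep;

        if (local_addr_used >= local_addr_max)
                return;
        ep = select_local_address();
        if (!ep)
                return;
        ep->in_use = true;
        local_addr_used++;
        printf("new subflow from %s (local_addr_used=%d)\n",
               ep->addr, local_addr_used);
}

int main(void)
{
        try_create_subflow();          /* subflow on 10.0.0.1 */
        try_create_subflow();          /* subflow on 10.0.1.1 */

        /* NAT sends a RST: the 10.0.0.1 subflow dies, counter untouched */
        endpoints[0].in_use = false;

        /* link for 10.0.1.1 goes down: endpoint removed, counter dropped */
        endpoints[1].registered = false;
        endpoints[1].in_use = false;
        local_addr_used--;

        /* link back up, endpoint re-added: one new subflow is allowed... */
        endpoints[1].registered = true;
        try_create_subflow();          /* ...but it picks 10.0.0.1 again */

        return 0;
}

The final call re-connects via 10.0.0.1 rather than the freshly re-added 10.0.1.1, which matches what we see on the wire.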

Scenario 2)

Following the initial MPTCP flow creation (with id 0?), if a subflow connection fails to be established, due to a connect timeout for example, no further subflows are established for the connection.

It appears that the subflow connection success handler, mptcp_pm_nl_subflow_established, triggers the establishment of the next one, such that once the chain is broken by a failed connection no further subflows are established, even if unused endpoint addresses remain.
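
Again as a rough userspace sketch of the chain as I read it (illustrative only, not kernel code; the addresses and the failing index are invented):

/*
 * Toy model of the chained establishment: the next endpoint is only
 * attempted from the previous subflow's "established" event, so one
 * failed connect stalls the remaining endpoints.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_ENDPOINTS 3

static const char *endpoints[NR_ENDPOINTS] = {
        "10.0.0.1", "10.0.1.1", "10.0.2.1"
};

/* pretend the second connect() times out */
static bool connect_subflow(int i)
{
        bool ok = (i != 1);

        printf("connect via %s: %s\n", endpoints[i],
               ok ? "established" : "timed out");
        return ok;
}

/* like mptcp_pm_nl_subflow_established(): only runs on success */
static void subflow_established(int i)
{
        int next = i + 1;

        if (next < NR_ENDPOINTS && connect_subflow(next))
                subflow_established(next);
        /* nothing re-arms the chain after a failed connect */
}

int main(void)
{
        if (connect_subflow(0))
                subflow_established(0);
        /* 10.0.2.1 is never attempted, even though it is free */
        return 0;
}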

Scenario 3)

As a continuation of scenario 1: if for some strange reason all subflows were to fail, all at once or over time, I'm left with just the MPTCP meta socket. The application believes it's still connected, but there seems to be no way to recover the connection, other than adding a new endpoint address, which will trigger one of the subflows (again, not necessarily on the new endpoint) to be re-connected.

———————————

Is there anything on your roadmap that might help address these few issues above?

I’ve been considering a temporary hack which I hope would address all three: adding a netlink call, based heavily on your add-address handler, to walk the meta socket list and try to reconnect any / all failed subflows, which I can call periodically or in response to network events.
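
On the userspace side it would be driven by something along these lines (a rough sketch; mptcp_reconnect_all() is only a placeholder for the hypothetical new netlink command, not an existing API):

/*
 * Watch rtnetlink for link / address changes and poke the
 * (hypothetical) reconnect command each time one arrives.
 */
#include <linux/rtnetlink.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void mptcp_reconnect_all(void)
{
        /* placeholder: would send the new genetlink command that walks
         * the msk list and retries any failed subflows */
        printf("network event -> request subflow reconnect\n");
}

int main(void)
{
        struct sockaddr_nl sa;
        char buf[8192];
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0) {
                perror("socket");
                return 1;
        }

        memset(&sa, 0, sizeof(sa));
        sa.nl_family = AF_NETLINK;
        sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                perror("bind");
                close(fd);
                return 1;
        }

        while (recv(fd, buf, sizeof(buf), 0) > 0)
                mptcp_reconnect_all();

        close(fd);
        return 0;
}

The kernel side (the new netlink command itself) is the part I’d need guidance on.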

I’d happily attempt to develop a more formal fix, with some guidance, but haven’t done a lot of Linux kernel development in the past.

Apologies for the long email / keep up the great work! It’s really good to see MPTCP make it into the mainline kernel.

Thanks,

Phil

Phil Greenland | Software Engineer | Quantulum Ltd

E  phil@quantulum.co.uk
W www.quantulum.co.uk



* Re: Subflow Creation / Management Issues
  2021-11-15 22:44 Subflow Creation / Management Issues Phil Greenland
@ 2021-11-16 11:21 ` Paolo Abeni
  2021-11-17  1:22 ` Mat Martineau
  1 sibling, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2021-11-16 11:21 UTC (permalink / raw)
  To: Phil Greenland, mptcp

Hello,

On Mon, 2021-11-15 at 22:44 +0000, Phil Greenland wrote:

> Scenario 1)
> 
> If we take network links up and down, adding and removing endpoint
> addresses as we do so, everything appears to work well, subflows are
> connected and disconnected as expected.
> 
> However if a subflow fails, for example due to reception of a TCP RST
> from a NAT this seems to cause problems.
> 
> The next time a link is taken down (a link other than the failing one),
> the associated subflow is disconnected as expected.
> 
> When the link is brought back up, more often than not the failed
> subflow (the one stopped by a TCP RST) is re-connected, and the newly
> added endpoint address, associated with the interface that was just
> brought up, is not used.
> 
> I believe this behaviour is due to an interaction between the
> local_addr_used variable within the MPTCP meta socket pm structure and
> the logic in select_local_address.
> 
> The variable is incremented and decremented on subflow connection /
> disconnection (following endpoint addition / removal), but it's not
> decremented following the failure of a subflow.
> 
> The select_local_address function only seems to check if a local
> address is currently in use before returning it.
> 
> Therefore when an address is removed / re-added a single new subflow is
> permitted to be created (due to local_addr_used < local_addr_max),
> however the select_local_address function returns the first available
> address.
> 
> Scenario 2)
> 
> Following the initial MPTCP flow creation (with id 0?), if a subflow
> connection fails to be established, due to a connect timeout for
> example, no further subflows are established for the connection.
> 
> It appears that the subflow connection success handler,
> mptcp_pm_nl_subflow_established, triggers the establishment of the
> next one, such that once the chain is broken by a failed connection
> no further subflows are established, even if unused endpoint
> addresses remain.
> 
> Scenario 3)
> 
> As a continuation of scenario 1: if for some strange reason all
> subflows were to fail, all at once or over time, I'm left with just
> the MPTCP meta socket. The application believes it's still connected,
> but there seems to be no way to recover the connection, other than
> adding a new endpoint address, which will trigger one of the subflows
> (again, not necessarily on the new endpoint) to be re-connected.
> 
> ———————————
> 
> Is there anything on your roadmap that might help address these few
> issues above?

Thank you for the extensive report. There is a lot of useful
information above. I think it would help if you could file individual
GitHub tickets for the different scenarios. Possibly 1 && 3 could be
bundled in the same ticket.

It would be great if you could be as accurate as possible, e.g. down to
the involved 'ip mptcp' commands and/or pcap traces.

This one:

https://github.com/multipath-tcp/mptcp_net-next/issues/235

is likely related to your 2nd scenario; please add the relevant
info/data there.

> I’ve been considering a temporary hack which I hope would address all
> three: adding a netlink call, based heavily on your add-address
> handler, to walk the meta socket list and try to reconnect any / all
> failed subflows, which I can call periodically or in response to
> network events.
> 
> I’d happily attempt to develop a more formal fix, with some guidance,
> but haven’t done a lot of Linux kernel development in the past.

I suggest following up on the GitHub issue tracker. I think a solution
for the 2nd scenario should be quite straightforward, while 1 && 3
will likely require more work.

The tricky part will likely be making the 'correct' decision later,
when the next subflow is created. Should we always skip endpoints that
got a RST in the past? Should we allow a limited number of
re-connection attempts? In any case, that will require quite a bit of
additional state to be tracked.
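
Just to give an idea, the kind of extra per-endpoint state could look
like this (purely illustrative, the struct and field names are
invented):

#include <stdbool.h>
#include <stdint.h>

struct pm_local_endpoint_state {
        uint8_t id;               /* endpoint id, as in "ip mptcp endpoint show" */
        uint8_t reconnect_tries;  /* attempts since the last failure */
        uint8_t max_tries;        /* stop (or back off) after this many */
        bool    peer_sent_rst;    /* the peer reset this endpoint in the past */
};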

> Apologies for the long email / keep up the great work! It’s really
> good to see MPTCP make it into the mainline kernel.

This kind of feedback is very helpful! Thank you for providing this
whole bunch of info!

Cheers,

Paolo



* Re: Subflow Creation / Management Issues
  2021-11-15 22:44 Subflow Creation / Management Issues Phil Greenland
  2021-11-16 11:21 ` Paolo Abeni
@ 2021-11-17  1:22 ` Mat Martineau
  2021-11-20 17:47   ` Phil Greenland
  1 sibling, 1 reply; 4+ messages in thread
From: Mat Martineau @ 2021-11-17  1:22 UTC (permalink / raw)
  To: Phil Greenland; +Cc: mptcp



Hello Phil,

On Mon, 15 Nov 2021, Phil Greenland wrote:

> Hi,
>
> I’m currently working with a client, integrating MPTCP into their platform, with a goal of achieving reliable connectivity over a mix of WAN technologies.
>
> We started using 5.14 but were having problems getting MPTCP to handover between subflows when connectivity on individual links became poor.
>
> Updating to 5.15 brought a massive improvement. With the scheduler updates (stale subflow detection) handovers between links are now virtually seamless. Great work :-)
>
> We’ve started looking at corner cases that we might experience in the field.
>
> In our use case all connections originate from the client, to a server on the internet. The client has multiple endpoint addresses registered via “ip mptcp”, over which multiple subflows are created.
>
> ———————————
>
> Scenario 1)
>
> If we take network links up and down, adding and removing endpoint addresses as we do so, everything appears to work well, subflows are connected and disconnected as expected.
>
> However if a subflow fails, for example due to reception of a TCP RST from a NAT this seems to cause problems.
>
> The next time a link is taken down (a link other than the failing one), the associated subflow is disconnected as expected.
>
> When the link is brought back up, more often than not the failed subflow (the one stopped by a TCP RST) is re-connected, and the newly added endpoint address, associated with the interface that was just brought up, is not used.
>
> I believe this behaviour is due to an interaction between the local_addr_used variable within the MPTCP meta socket pm structure and the logic in select_local_address.
>
> The variable is incremented and decremented on subflow connection / disconnection (following endpoint addition / removal), but it's not decremented following the failure of a subflow.
>
> The select_local_address function only seems to check if a local address is currently in use before returning it.
>
> Therefore when an address is removed / re-added a single new subflow is permitted to be created (due to local_addr_used < local_addr_max), however the select_local_address function returns the first available address.
>
> Scenario 2)
>
> Following the initial MPTCP flow creation (with id 0?), if a subflow connection fails to be established, due to a connect timeout for example, no further subflows are established for the connection.
>
> It appears that the subflow connection success handler, mptcp_pm_nl_subflow_established, triggers the establishment of the next one, such that once the chain is broken by a failed connection no further subflows are established, even if unused endpoint addresses remain.
>
> Scenario 3)
>
> As a continuation of scenario 1: if for some strange reason all subflows were to fail, all at once or over time, I'm left with just the MPTCP meta socket. The application believes it's still connected, but there seems to be no way to recover the connection, other than adding a new endpoint address, which will trigger one of the subflows (again, not necessarily on the new endpoint) to be re-connected.
>
> ———————————
>

First, thank you for the feedback - it's very helpful to know how this 
MPTCP implementation is getting used and what issues people are running
into.

Paolo covered your questions well in relation to the existing in-kernel 
path manager that's responsible for controlling subflow establishment. 
(Thanks Paolo!)

> Is there anything on your roadmap that might help address these few issues above?
>

I think we may have something in development that will fit your needs 
better on the client side: userspace path management.

The existing in-kernel path manager is best suited to server-side use 
cases. We can still improve it to better handle the issues you've 
observed, and Paolo's advice on github issues 
(https://github.com/multipath-tcp/mptcp_net-next/issues) will help with 
that.

Userspace path management will add more netlink calls that allow a 
userspace program like mptcpd (https://github.com/intel/mptcpd) to control 
the establishment of subflows - and to customize path management without
requiring changes to the kernel. The tradeoff is the overhead of using the 
netlink API, so this is better suited to client devices with a smaller 
number of connections, as opposed to a server with hundreds or thousands 
of connections.

Does that sound like a fit for what you're doing on the client side?

> I’ve been considering a temporary hack which I hope would address all three: adding a netlink call, based heavily on your add-address handler, to walk the meta socket list and try to reconnect any / all failed subflows, which I can call periodically or in response to network events.
>
> I’d happily attempt to develop a more formal fix, with some guidance, but haven’t done a lot of Linux kernel development in the past.
>
> Apologies for the long email / keep up the great work! It’s really good to see MPTCP make it into the mainline kernel.
>

Thanks again for the long email and the encouragement!


--
Mat Martineau
Intel


* Re: Subflow Creation / Management Issues
  2021-11-17  1:22 ` Mat Martineau
@ 2021-11-20 17:47   ` Phil Greenland
  0 siblings, 0 replies; 4+ messages in thread
From: Phil Greenland @ 2021-11-20 17:47 UTC (permalink / raw)
  To: Mat Martineau, pabeni; +Cc: mptcp

Hi Mat and Paolo,

I’ve raised an issue for scenarios 1+3 (https://github.com/multipath-tcp/mptcp_net-next/issues/242) with a packet capture and hopefully detailed enough notes to reproduce. I’ve not mentioned scenario 3 specifically, but from my nosing around the code it appears to be a follow-on from the behaviour described.

I’ve added a comment to the other ticket on the subflow management front.

On the userspace path management side, I built the daemon a while ago, before spotting that it was the out-of-tree kernel that had support for userspace subflow management, and that it was still in the pipeline for the in-tree implementation (I think I’m right there?).

When it comes along it would certainly be one way of achieving our re-connection goals.

As mentioned in my GitHub comment, our use case is pretty simple: creating a limited number of connections from the client, over unreliable links, with subflows up and connected whenever possible. If the kernel could support a basic reconnection strategy, this would be our preferred option initially.

Down the line we were looking to set up something slightly more elaborate.

At present we’ve got a low-bandwidth stream of critical control data, which should be allowed to travel over any and all available interfaces, although we’d prefer our faster / lower-cost ones when possible (we’re hoping to use the subflow backup option for this… not quite got there yet though).

We’d like to add a high-bandwidth stream of additional telemetry, which should only be allowed to create subflows over specific interfaces (our fast, low-cost ones, avoiding a slow and comparatively expensive link).

This scenario would likely benefit from user space path management.

It seems that this would be completely achievable via a custom mptcpd plugin expressing our rules, once the in-tree kernel supports userspace subflow management.

A generic, configuration-file-driven, rules-based plugin would be the dream, to save me writing my own… although I’m not sure what the rules would look like, so possibly a bit of a stretch :-P.

Thanks for the advice and super fast replies!

Regards,

Phil

> On 17 Nov 2021, at 01:22, Mat Martineau <mathew.j.martineau@linux.intel.com> wrote:
> 
> 
> Hello Phil,
> 
> On Mon, 15 Nov 2021, Phil Greenland wrote:
> 
>> Hi,
>> 
>> I’m currently working with a client, integrating MPTCP into their platform, with a goal of achieving reliable connectivity over a mix of WAN technologies.
>> 
>> We started using 5.14 but were having problems getting MPTCP to handover between subflows when connectivity on individual links became poor.
>> 
>> Updating to 5.15 brought a massive improvement. With the scheduler updates (stale subflow detection) handovers between links are now virtually seamless. Great work :-)
>> 
>> We’ve started looking at corner cases that we might experience in the field.
>> 
>> In our use case all connections originate from the client, to a server on the internet. The client has multiple endpoint addresses registered via “ip mptcp”, over which multiple subflows are created.
>> 
>> ———————————
>> 
>> Scenario 1)
>> 
>> If we take network links up and down, adding and removing endpoint addresses as we do so, everything appears to work well, subflows are connected and disconnected as expected.
>> 
>> However if a subflow fails, for example due to reception of a TCP RST from a NAT this seems to cause problems.
>> 
>> The next time a link is taken down (a link other than the failing one), the associated subflow is disconnected as expected.
>> 
>> When the link is brought back up, more often than not the failed subflow (the one stopped by a TCP RST) is re-connected, and the newly added endpoint address, associated with the interface that was just brought up, is not used.
>> 
>> I believe this behaviour is due to an interaction between the local_addr_used variable within the MPTCP meta socket pm structure and the logic in select_local_address.
>> 
>> The variable is incremented and decremented on subflow connection / disconnection (following endpoint addition / removal), but it's not decremented following the failure of a subflow.
>> 
>> The select_local_address function only seems to check if a local address is currently in use before returning it.
>> 
>> Therefore when an address is removed / re-added a single new subflow is permitted to be created (due to local_addr_used < local_addr_max), however the select_local_address function returns the first available address.
>> 
>> Scenario 2)
>> 
>> Following the initial MPTCP flow creation (with id 0?), if a subflow connection fails to be established, due to a connect timeout for example, no further subflows are established for the connection.
>> 
>> It appears that the subflow connection success handler, mptcp_pm_nl_subflow_established, triggers the establishment of the next one, such that once the chain is broken by a failed connection no further subflows are established, even if unused endpoint addresses remain.
>> 
>> Scenario 3)
>> 
>> As a continuation of scenario 1: if for some strange reason all subflows were to fail, all at once or over time, I'm left with just the MPTCP meta socket. The application believes it's still connected, but there seems to be no way to recover the connection, other than adding a new endpoint address, which will trigger one of the subflows (again, not necessarily on the new endpoint) to be re-connected.
>> 
>> ———————————
>> 
> 
> First, thank you for the feedback - it's very helpful to know how this MPTCP implementation is getting used and what issues people are running into.
> 
> Paolo covered your questions well in relation to the existing in-kernel path manager that's responsible for controlling subflow establishment. (Thanks Paolo!)
> 
>> Is there anything on your roadmap that might help address these few issues above?
>> 
> 
> I think we may have something in development that will fit your needs better on the client side: userspace path management.
> 
> The existing in-kernel path manager is best suited to server-side use cases. We can still improve it to better handle the issues you've observed, and Paolo's advice on github issues (https://github.com/multipath-tcp/mptcp_net-next/issues) will help with that.
> 
> Userspace path management will add more netlink calls that allow a userspace program like mptcpd (https://github.com/intel/mptcpd) to control the establishment of subflows - and to customize path management without requiring changes to the kernel. The tradeoff is the overhead of using the netlink API, so this is better suited to client devices with a smaller number of connections, as opposed to a server with hundreds or thousands of connections.
> 
> Does that sound like a fit for what you're doing on the client side?
> 
>> I’ve been considering a temporary hack which I hope would address all three: adding a netlink call, based heavily on your add-address handler, to walk the meta socket list and try to reconnect any / all failed subflows, which I can call periodically or in response to network events.
>> 
>> I’d happily attempt to develop a more formal fix, with some guidance, but haven’t done a lot of Linux kernel development in the past.
>> 
>> Apologies for the long email / keep up the great work! It’s really good to see MPTCP make it into the mainline kernel.
>> 
> 
> Thanks again for the long email and the encouragement!
> 
> 
> --
> Mat Martineau
> Intel


