All of lore.kernel.org
 help / color / mirror / Atom feed
* problems with MP_FAIL timer
@ 2022-05-11 11:28 Paolo Abeni
  2022-05-11 14:16 ` Geliang Tang
  2022-05-11 15:32 ` Mat Martineau
  0 siblings, 2 replies; 5+ messages in thread
From: Paolo Abeni @ 2022-05-11 11:28 UTC (permalink / raw)
  To: mptcp, Geliang Tang

Hello,

re-reviewing again the commit:

mptcp: reset subflow when MP_FAIL doesn't respond

There are a few issue with sk_timer handling, specifically:
- no additional mptcp_data_lock is needed to stop/reset it
- I think overlapping infinite mapping reception and close may lead to
some inconsistent timer usage

More importantly I could not find the RFC reference for such
timer?!? @Geliang, @Mat could you please point that out?!?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with MP_FAIL timer
  2022-05-11 11:28 problems with MP_FAIL timer Paolo Abeni
@ 2022-05-11 14:16 ` Geliang Tang
  2022-05-11 15:32 ` Mat Martineau
  1 sibling, 0 replies; 5+ messages in thread
From: Geliang Tang @ 2022-05-11 14:16 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: MPTCP Upstream, Geliang Tang

Hi Paolo,

Paolo Abeni <pabeni@redhat.com> 于2022年5月11日周三 19:28写道:
>
> Hello,
>
> re-reviewing again the commit:
>
> mptcp: reset subflow when MP_FAIL doesn't respond
>
> There are a few issue with sk_timer handling, specifically:
> - no additional mptcp_data_lock is needed to stop/reset it
> - I think overlapping infinite mapping reception and close may lead to
> some inconsistent timer usage
>
> More importantly I could not find the RFC reference for such
> timer?!? @Geliang, @Mat could you please point that out?!?

The timer isn't mentioned in the RFC 8684. It only mentions MP_FAIL
should be sent back here:

'''
3.7. Fallback

In this case, if a receiver identifies a checksum failure when there
is only one path, it will send back an MP_FAIL option on the
subflow-level ACK, referring to the data-level sequence number of the
start of the segment on which the checksum error was detected.
'''

Thanks,
-Geliang

>
> Thanks,
>
> Paolo
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with MP_FAIL timer
  2022-05-11 11:28 problems with MP_FAIL timer Paolo Abeni
  2022-05-11 14:16 ` Geliang Tang
@ 2022-05-11 15:32 ` Mat Martineau
  2022-05-11 16:10   ` Paolo Abeni
  1 sibling, 1 reply; 5+ messages in thread
From: Mat Martineau @ 2022-05-11 15:32 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, Geliang Tang

[-- Attachment #1: Type: text/plain, Size: 1234 bytes --]

On Wed, 11 May 2022, Paolo Abeni wrote:

> Hello,
>
> re-reviewing again the commit:
>
> mptcp: reset subflow when MP_FAIL doesn't respond
>
> There are a few issue with sk_timer handling, specifically:
> - no additional mptcp_data_lock is needed to stop/reset it
> - I think overlapping infinite mapping reception and close may lead to
> some inconsistent timer usage
>

I'm working on a patch for this.

The issues you describe are what I was trying to comment on in
https://lore.kernel.org/mptcp/bdde3a40-1471-c8b9-65bd-f53ddebacdb4@linux.intel.com/, 
but my proposed solution (deferred events for coordinating with 
msk->sk_state) isn't the way to fix it.

I think checking the subflows for SOCK_DEAD before changing the timer 
should prevent bad interaction with msk close, since the subflows are 
always orphaned before the msk resets sk_timer.

> More importantly I could not find the RFC reference for such
> timer?!? @Geliang, @Mat could you please point that out?!?
>

The RFC doesn't specify what to do if the peer ignores the MP_FAIL. 
Resending seemed to be hard to resolve correctly (what if it turned out to 
be a duplicate?), so the choice was to revert to the reset behavior after 
a timeout.

--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with MP_FAIL timer
  2022-05-11 15:32 ` Mat Martineau
@ 2022-05-11 16:10   ` Paolo Abeni
  2022-05-11 18:24     ` Mat Martineau
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2022-05-11 16:10 UTC (permalink / raw)
  To: Mat Martineau; +Cc: mptcp, Geliang Tang

On Wed, 2022-05-11 at 08:32 -0700, Mat Martineau wrote:
> On Wed, 11 May 2022, Paolo Abeni wrote:
> 
> > Hello,
> > 
> > re-reviewing again the commit:
> > 
> > mptcp: reset subflow when MP_FAIL doesn't respond
> > 
> > There are a few issue with sk_timer handling, specifically:
> > - no additional mptcp_data_lock is needed to stop/reset it
> > - I think overlapping infinite mapping reception and close may lead to
> > some inconsistent timer usage
> > 
> 
> I'm working on a patch for this.
> 
> The issues you describe are what I was trying to comment on in
> https://lore.kernel.org/mptcp/bdde3a40-1471-c8b9-65bd-f53ddebacdb4@linux.intel.com/, 
> but my proposed solution (deferred events for coordinating with 
> msk->sk_state) isn't the way to fix it.

I think there is anoter issue: in mptcp_timeout_timer(), via
mptcp_check_mp_fail_response() -> mp_fail_response_expect_subflow() ->
mp_fail_response_expect_subflow(), the code traverse the conn_list
under the msk socket spin_lock. That is not enough: it should instead
acquire the full msk socket lock. 

We can use the is_not_owned || defer to release_cb schema, but I think
we are simply better off reverting such patch. Any strong opinion
against that?

Paolo
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: problems with MP_FAIL timer
  2022-05-11 16:10   ` Paolo Abeni
@ 2022-05-11 18:24     ` Mat Martineau
  0 siblings, 0 replies; 5+ messages in thread
From: Mat Martineau @ 2022-05-11 18:24 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: mptcp, Geliang Tang

On Wed, 11 May 2022, Paolo Abeni wrote:

> On Wed, 2022-05-11 at 08:32 -0700, Mat Martineau wrote:
>> On Wed, 11 May 2022, Paolo Abeni wrote:
>>
>>> Hello,
>>>
>>> re-reviewing again the commit:
>>>
>>> mptcp: reset subflow when MP_FAIL doesn't respond
>>>
>>> There are a few issue with sk_timer handling, specifically:
>>> - no additional mptcp_data_lock is needed to stop/reset it
>>> - I think overlapping infinite mapping reception and close may lead to
>>> some inconsistent timer usage
>>>
>>
>> I'm working on a patch for this.
>>
>> The issues you describe are what I was trying to comment on in
>> https://lore.kernel.org/mptcp/bdde3a40-1471-c8b9-65bd-f53ddebacdb4@linux.intel.com/,
>> but my proposed solution (deferred events for coordinating with
>> msk->sk_state) isn't the way to fix it.
>
> I think there is anoter issue: in mptcp_timeout_timer(), via
> mptcp_check_mp_fail_response() -> mp_fail_response_expect_subflow() ->
> mp_fail_response_expect_subflow(), the code traverse the conn_list
> under the msk socket spin_lock. That is not enough: it should instead
> acquire the full msk socket lock.

Agreed.

> We can use the is_not_owned || defer to release_cb schema, but I think
> we are simply better off reverting such patch. Any strong opinion
> against that?

I don't think we can just revert: it's important to either fall back or 
reset, and not leave the connection in some indefinite state after sending 
a MP_FAIL that's ignored (the RFC is specific that the peer must respond 
with MP_FAIL). So we need to use some timer to implement that behavior, or 
use other events (rx/tx/etc) to check for a timely MP_FAIL reply and reset 
if it hasn't arrived. The timer approach seems simpler.


--
Mat Martineau
Intel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-05-11 18:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11 11:28 problems with MP_FAIL timer Paolo Abeni
2022-05-11 14:16 ` Geliang Tang
2022-05-11 15:32 ` Mat Martineau
2022-05-11 16:10   ` Paolo Abeni
2022-05-11 18:24     ` Mat Martineau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.