From: Mat Martineau <mathew.j.martineau@linux.intel.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp-net v2 3/6] Squash-to: "mptcp: invoke MP_FAIL response when needed"
Date: Thu, 16 Jun 2022 17:27:48 -0700 (PDT)
Message-ID: <97bec763-478b-8dd9-917a-8bf1867b13be@linux.intel.com>
In-Reply-To: <0eaffe68b6ffa6ad0465e8f0ec85e896a8a49d7b.camel@redhat.com>

On Thu, 16 Jun 2022, Paolo Abeni wrote:

> On Wed, 2022-06-15 at 17:59 -0700, Mat Martineau wrote:
>> On Wed, 15 Jun 2022, Paolo Abeni wrote:
>>
>>> This tries to address a few issues outstanding in the mentioned
>>> patch:
>>> - we explicitly need to reset the timeout timer for mp_fail's sake
>>> - we need to explicitly generate a tcp ack for mp_fail, otherwise
>>>  there are no guarantees for such option being sent out
>>> - the timeout timer handling needs some care, as it's still
>>>  shared between mp_fail and msk socket timeout.
>>> - we can re-use msk->first for msk->fail_ssk, as only the first/mpc
>>>  subflow can fail without reset. That additionally avoids the need
>>>  to clear fail_ssk on the relevant subflow close.
>>> - fail_tout would need some additional annotation. Just to be on the
>>>  safe side move its manipulation under the ssk socket lock.
>>>
>>> The last 2 paragraphs of the squash-to commit should be replaced with:
>>>
>>> """
>>> It leverages the fact that only the MPC/first subflow can gracefully
>>> fail to avoid unneeded subflows traversal: the failing subflow can
>>> be only msk->first.
>>>
>>> A new 'fail_tout' field is added to the subflow context to record the
>>> MP_FAIL response timeout and use such field to reliably share the
>>> timeout timer between the MP_FAIL event and the MPTCP socket close timeout.
>>>
>>> Finally, a new ack is generated to send out MP_FAIL notification as soon
>>> as we hit the relevant condition, instead of waiting a possibly unbound
>>> time for the next data packet.
>>> """
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> net/mptcp/pm.c       |  4 +++-
>>> net/mptcp/protocol.c | 54 ++++++++++++++++++++++++++++++++++++--------
>>> net/mptcp/protocol.h |  4 ++--
>>> net/mptcp/subflow.c  | 24 ++++++++++++++++++--
>>> 4 files changed, 72 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>> index 3c7f07bb124e..45e2a48397b9 100644
>>> --- a/net/mptcp/pm.c
>>> +++ b/net/mptcp/pm.c
>>> @@ -305,13 +305,15 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
>>> 	if (!READ_ONCE(msk->allow_infinite_fallback))
>>> 		return;
>>>
>>> -	if (!msk->fail_ssk) {
>>> +	if (!subflow->fail_tout) {
>>> 		pr_debug("send MP_FAIL response and infinite map");
>>>
>>> 		subflow->send_mp_fail = 1;
>>> 		subflow->send_infinite_map = 1;
>>> +		tcp_send_ack(sk);
>>> 	} else {
>>> 		pr_debug("MP_FAIL response received");
>>> +		WRITE_ONCE(subflow->fail_tout, 0);
>>> 	}
>>> }
>>>
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index a0f9f3831509..50026b8da625 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -500,7 +500,7 @@ static void mptcp_set_timeout(struct sock *sk)
>>> 	__mptcp_set_timeout(sk, tout);
>>> }
>>>
>>> -static bool tcp_can_send_ack(const struct sock *ssk)
>>> +static inline bool tcp_can_send_ack(const struct sock *ssk)
>>> {
>>> 	return !((1 << inet_sk_state_load(ssk)) &
>>> 	       (TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_TIME_WAIT | TCPF_CLOSE | TCPF_LISTEN));
>>> @@ -2490,24 +2490,56 @@ static void __mptcp_retrans(struct sock *sk)
>>> 		mptcp_reset_timer(sk);
>>> }
>>>
>>> +/* schedule the timeout timer for the nearest relevant event: either
>>> + * close timeout or mp_fail timeout. Both of them could be not
>>> + * scheduled yet
>>> + */
>>> +void mptcp_reset_timeout(struct mptcp_sock *msk, unsigned long fail_tout)
>>> +{
>>> +	struct sock *sk = (struct sock *)msk;
>>> +	unsigned long timeout, close_timeout;
>>> +
>>> +	if (!fail_tout && !sock_flag(sk, SOCK_DEAD))
>>> +		return;
>>> +
>>> +	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
>>> +
>>> +	/* the following is basically time_min(close_timeout, fail_tout) */
>>> +	if (!fail_tout)
>>> +		timeout = close_timeout;
>>> +	else if (!sock_flag(sk, SOCK_DEAD))
>>> +		timeout = fail_tout;
>>> +	else if (time_after(close_timeout, fail_tout))
>>> +		timeout = fail_tout;
>>> +	else
>>> +		timeout = close_timeout;
>>> +
>>> +	sk_reset_timer(sk, &sk->sk_timer, timeout);
>>> +}
>>
>> Hi Paolo -
>>
>> The above function seems more complex than needed. If mptcp_close() has
>> been called, the fail timeout should be considered canceled and the
>> "close" has exclusive use of sk_timer. subflow->fail_tout can be set to 0
>> and sk_timer can be set only based on TCP_TIMEWAIT_LEN.
>
> I agree the above function is non-trivial; on the flip side, it helps
> keep all the timeout logic in a single place.
>
> Note that msk->first->fail_tout is under the subflow socket lock
> protection, and this function is not always invoked under such lock, so
> we will need to either add more read/write once annotation or split the
> logic in the caller.
>
> Side note: I agree a bit with David Laight wrt *_once proliferation,
> possibly with less divisive language:
> https://lore.kernel.org/netdev/cea2c2c39d0e4f27b2e75cdbc8fce09d@AcuMS.aculab.com/
>
> ;)

Heh - yeah, I saw that exchange too :)

>
> *_ONCE are a bit hard to track and difficult to verify formally, so I
> tried hard to reduce their usage. Currently we have only a single READ
> operation outside the relevant lock, which is AFAIK the 'safest'
> possible scenario with such annotations. All the write operations are
> under the subflow socket lock.
>
>> TCP_RTO_MAX for the MP_FAIL timeout was kind of arbitrary - the spec
>> doesn't say what exactly to do. We didn't want the connection to be in the
>> "unacknowledged" MP_FAIL state forever, but also didn't want to give up
>> too fast.
>
> This function doesn't make any assumption about the relative timeout
> lengths and should still fit if we make them configurable someday.
>
>> Given that, do you think the complexity is justified to possibly reset the
>> subflow earlier in this "unacknowledged MP_FAIL followed by a close"
>> scenario?
>
> IMHO yes ;) But if you have strong opinion otherwise I can refactor the
> code...
>

I do think it's better to cancel the MP_FAIL timeout once mptcp_close() is
called. It's not just about the code in mptcp_reset_timeout(): it doesn't
seem meaningful to continue MP_FAIL handling after close, or to have
various scenarios to consider/test/etc. for timeout or MP_FAIL echo
ordering.
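
To make the suggestion concrete, here is a minimal userspace sketch of
the simpler selection logic. It is not kernel code: model_msk,
model_subflow and pick_timeout() are hypothetical names that only model
msk, msk->first and the timer choice, and a fail_tout of 0 is assumed to
mean "MP_FAIL timeout not armed", as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the subflow context; 0 means "not armed". */
struct model_subflow {
	unsigned long fail_tout;
};

/* Hypothetical stand-in for the msk; 'dead' models SOCK_DEAD after close. */
struct model_msk {
	bool dead;
	struct model_subflow *first;
};

/*
 * Choose the expiry for the single shared timer.
 * Returns true and fills *expires if the timer should be armed.
 *
 * After close, the MP_FAIL timeout is cancelled and the close timeout has
 * exclusive use of the timer, so the two values are never compared.
 */
static bool pick_timeout(struct model_msk *msk, unsigned long close_timeout,
			 unsigned long *expires)
{
	assert(msk != NULL && expires != NULL);

	if (msk->dead) {
		if (msk->first)
			msk->first->fail_tout = 0;	/* cancel MP_FAIL timeout */
		*expires = close_timeout;
		return true;
	}

	if (msk->first && msk->first->fail_tout) {
		*expires = msk->first->fail_tout;	/* only MP_FAIL is pending */
		return true;
	}

	return false;	/* nothing to schedule */
}
```

With this shape there are only two states to test: "open, MP_FAIL
pending" and "closed". The locking and *_ONCE questions raised above are
deliberately left out of the model.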


--
Mat Martineau
Intel

Thread overview: 11+ messages
2022-06-15 20:28 [PATCH mptcp-net v2 0/6] mptcp: mp_fail related fixes Paolo Abeni
2022-06-15 20:28 ` [PATCH mptcp-net v2 1/6] mptcp: fix error mibs accounting Paolo Abeni
2022-06-15 20:28 ` [PATCH mptcp-net v2 2/6] mptcp: introduce MAPPING_BAD_CSUM Paolo Abeni
2022-06-15 20:28 ` [PATCH mptcp-net v2 3/6] Squash-to: "mptcp: invoke MP_FAIL response when needed" Paolo Abeni
2022-06-16  0:59   ` Mat Martineau
2022-06-16  7:11     ` Paolo Abeni
2022-06-17  0:27       ` Mat Martineau [this message]
2022-06-15 20:28 ` [PATCH mptcp-net v2 4/6] mptcp: fix shutdown vs fallback race Paolo Abeni
2022-06-15 20:28 ` [PATCH mptcp-net v2 5/6] mptcp: consistent map handling on failure Paolo Abeni
2022-06-15 20:28 ` [PATCH mptcp-net v2 6/6] mptcp: fix race on unaccepted mptcp sockets Paolo Abeni
2022-06-15 22:10   ` mptcp: fix race on unaccepted mptcp sockets: Tests Results MPTCP CI