From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 482B87F for ; Fri, 17 Jun 2022 00:27:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1655425670; x=1686961670; h=date:from:to:cc:subject:in-reply-to:message-id: references:mime-version; bh=ubZBXvjq6EG2W6pPpBceUDNlQkCRbxLyd8CokoDLArg=; b=LmAC5UhTgewcriACSwxN+LQRDOXVrWxrmcFtrcwT37XAZNU+cfXYpHlJ mtxqHF9cO9FpL/ukg1cycI8SzVPfRW0xz1nwurWyz7iqg9qIPkJ2Dg9l3 PEVfpei/hHKrTZdPzYlV4wA2x0h5BssccOrnmVOnS+negv/rQ98B0gBfz 0xp0HXK6yyaPX3jjzSwnouJyfUFfpm0GF7wO2sMHnFIiR8FDBRQMST392 fDhYRdKm2nW89a5dWJrMB7DTr+EHMImbu6sFcJ3qlno7ZOexISeeYR9G1 ZlIvXCz2XglX7Cjh0A4cmW638KqK0ZZMaIliLywi7VYjljiMfmCeM5B0Y Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10380"; a="343345856" X-IronPort-AV: E=Sophos;i="5.92,306,1650956400"; d="scan'208";a="343345856" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jun 2022 17:27:49 -0700 X-IronPort-AV: E=Sophos;i="5.92,306,1650956400"; d="scan'208";a="688059640" Received: from lhossain-mobl.amr.corp.intel.com ([10.255.231.225]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Jun 2022 17:27:49 -0700 Date: Thu, 16 Jun 2022 17:27:48 -0700 (PDT) From: Mat Martineau To: Paolo Abeni cc: mptcp@lists.linux.dev Subject: Re: [PATCH mptcp-net v2 3/6] Squash-to: "mptcp: invoke MP_FAIL response when needed" In-Reply-To: <0eaffe68b6ffa6ad0465e8f0ec85e896a8a49d7b.camel@redhat.com> Message-ID: <97bec763-478b-8dd9-917a-8bf1867b13be@linux.intel.com> References: <1deca06703f8666dc199b76f39cd881733741666.1655324843.git.pabeni@redhat.com> <58c2377-cc90-7f55-9aa5-1a9d6578fd8a@linux.intel.com> <0eaffe68b6ffa6ad0465e8f0ec85e896a8a49d7b.camel@redhat.com> Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed On Thu, 16 Jun 2022, Paolo Abeni wrote: > On Wed, 2022-06-15 at 17:59 -0700, Mat Martineau wrote: >> On Wed, 15 Jun 2022, Paolo Abeni wrote: >> >>> This tries to address a few issues outstanding in the mentioned >>> patch: >>> - we explicitly need to reset the timeout timer for mp_fail's sake >>> - we need to explicitly generate a tcp ack for mp_fail, otherwise >>> there are no guarantees for suck option being sent out >>> - the timeout timer needs handling need some caring, as it's still >>> shared between mp_fail and msk socket timeout. >>> - we can re-use msk->first for msk->fail_ssk, as only the first/mpc >>> subflow can fail without reset. That additionally avoid the need >>> to clear fail_ssk on the relevant subflow close. >>> - fail_tout would need some additional annotation. Just to be on the >>> safe side move its manipulaiton under the ssk socket lock. >>> >>> Last 2 paragraph of the squash to commit should be replaced with: >>> >>> """ >>> It leverages the fact that only the MPC/first subflow can gracefully >>> fail to avoid unneeded subflows traversal: the failing subflow can >>> be only msk->first. >>> >>> A new 'fail_tout' field is added to the subflow context to record the >>> MP_FAIL response timeout and use such field to reliably share the >>> timeout timer between the MP_FAIL event and the MPTCP socket close timeout. >>> >>> Finally, a new ack is generated to send out MP_FAIL notification as soon >>> as we hit the relevant condition, instead of waiting a possibly unbound >>> time for the next data packet. >>> """ >>> >>> Signed-off-by: Paolo Abeni >>> --- >>> net/mptcp/pm.c | 4 +++- >>> net/mptcp/protocol.c | 54 ++++++++++++++++++++++++++++++++++++-------- >>> net/mptcp/protocol.h | 4 ++-- >>> net/mptcp/subflow.c | 24 ++++++++++++++++++-- >>> 4 files changed, 72 insertions(+), 14 deletions(-) >>> >>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c >>> index 3c7f07bb124e..45e2a48397b9 100644 >>> --- a/net/mptcp/pm.c >>> +++ b/net/mptcp/pm.c >>> @@ -305,13 +305,15 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq) >>> if (!READ_ONCE(msk->allow_infinite_fallback)) >>> return; >>> >>> - if (!msk->fail_ssk) { >>> + if (!subflow->fail_tout) { >>> pr_debug("send MP_FAIL response and infinite map"); >>> >>> subflow->send_mp_fail = 1; >>> subflow->send_infinite_map = 1; >>> + tcp_send_ack(sk); >>> } else { >>> pr_debug("MP_FAIL response received"); >>> + WRITE_ONCE(subflow->fail_tout, 0); >>> } >>> } >>> >>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >>> index a0f9f3831509..50026b8da625 100644 >>> --- a/net/mptcp/protocol.c >>> +++ b/net/mptcp/protocol.c >>> @@ -500,7 +500,7 @@ static void mptcp_set_timeout(struct sock *sk) >>> __mptcp_set_timeout(sk, tout); >>> } >>> >>> -static bool tcp_can_send_ack(const struct sock *ssk) >>> +static inline bool tcp_can_send_ack(const struct sock *ssk) >>> { >>> return !((1 << inet_sk_state_load(ssk)) & >>> (TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_TIME_WAIT | TCPF_CLOSE | TCPF_LISTEN)); >>> @@ -2490,24 +2490,56 @@ static void __mptcp_retrans(struct sock *sk) >>> mptcp_reset_timer(sk); >>> } >>> >>> +/* schedule the timeout timer for the nearest relevant event: either >>> + * close timeout or mp_fail timeout. Both of them could be not >>> + * scheduled yet >>> + */ >>> +void mptcp_reset_timeout(struct mptcp_sock *msk, unsigned long fail_tout) >>> +{ >>> + struct sock *sk = (struct sock *)msk; >>> + unsigned long timeout, close_timeout; >>> + >>> + if (!fail_tout && !sock_flag(sk, SOCK_DEAD)) >>> + return; >>> + >>> + close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN; >>> + >>> + /* the following is basically time_min(close_timeout, fail_tout) */ >>> + if (!fail_tout) >>> + timeout = close_timeout; >>> + else if (!sock_flag(sk, SOCK_DEAD)) >>> + timeout = fail_tout; >>> + else if (time_after(close_timeout, fail_tout)) >>> + timeout = fail_tout; >>> + else >>> + timeout = close_timeout; >>> + >>> + sk_reset_timer(sk, &sk->sk_timer, timeout); >>> +} >> >> Hi Paolo - >> >> The above function seems more complex than needed. If mptcp_close() has >> been called, the fail timeout should be considered canceled and the >> "close" has exclusive use of sk_timer. subflow->fail_tout can be set to 0 >> and sk_timer can be set only based on TCP_TIMEWAIT_LEN. > > I agree the above function is non trivial, on the flip side, it helps > keeping all the timeout logic in a single place. > > Note that msk->first->fail_tout is under the subflow socket lock > protection, and this function is not always invoked under such lock, so > we will need to either add more read/write once annotation or split the > logic in the caller. > > Side note: I agree a bit with David Laight wrt *_once preoliferation, > possibly with less divisive language: > https://lore.kernel.org/netdev/cea2c2c39d0e4f27b2e75cdbc8fce09d@AcuMS.aculab.com/ > > ;) Heh - yeah, I saw that exchange too :) > > *_ONCE are a bit hard to track and difficult to verify formally, so I > tried hard to reduce their usage. Currently we have only a single READ > operation outside the relevant lock, which is AFAIK the 'safer' > possible scenario with such annotations. All the write operations are > under the subflow socket lock. > >> TCP_RTO_MAX for the MP_FAIL timeout was kind of arbitrary - the spec >> doesn't say what exactly to do. We didn't want the connection to be in the >> "unacknowledged" MP_FAIL state forever, but also didn't want to give up >> too fast. > > This function don't make any assumption on the relative timeout length > and should fit the case if we will make them configurable someday > >> Given that, do you think the complexity is justified to possibly reset the >> subflow earlier in this "unacknowledged MP_FAIL followed by a close" >> scenario? > > IMHO yes ;) But if you have strong opinion otherwise I can refactor the > code... > I do think it's better to cancel the MP_FAIL timeout once mptcp_close() is called. It's not just about the code in mptcp_reset_timeout(), it doesn't seem meaningful to continue MP_FAIL handling after close and to have various scenarios to consider/test/etc. for timeout or MP_FAIL echo ordering. -- Mat Martineau Intel