netdev.vger.kernel.org archive mirror
From: Eric Dumazet <edumazet@google.com>
To: "Duan,Muquan" <duanmuquan@baidu.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	 "kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] tcp: fix connection reset due to tw hashdance race.
Date: Thu, 8 Jun 2023 13:54:09 +0200
Message-ID: <CANn89iJ5kHmksR=nGSMVjacuV0uqu5Hs0g1s343gvAM9Yf=+Bg@mail.gmail.com>
In-Reply-To: <7FD2F3ED-A3B5-40EF-A505-E7A642D73208@baidu.com>

On Thu, Jun 8, 2023 at 1:24 PM Duan,Muquan <duanmuquan@baidu.com> wrote:
>
> Besides trying to find the right tw sock, another idea is that if a FIN segment finds a listener sock, we just discard the segment, because this is obviously a bad case and the peer will retransmit it. Or, for a FIN segment, we only look up in the established hash table and discard the segment if no socket is found.
>

Sure, please give the RFC number and section number that discusses
this point, and then we might consider this.

Just another reminder about TW: timewait sockets are "best effort".

Their allocation can fail, and /proc/sys/net/ipv4/tcp_max_tw_buckets
can cap their number, even at 0.
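
As a minimal sketch of how a userland tool might check this limit: the
helper below reads the sysctl file named above and reports whether
timewait sockets are capped at 0. The function names
read_max_tw_buckets() and timewait_disabled() are hypothetical and exist
only for this illustration; the path parameter is there so the helper
can be pointed at another file.

```python
from pathlib import Path

# Path taken from the message above.
TW_BUCKETS_PATH = "/proc/sys/net/ipv4/tcp_max_tw_buckets"


def read_max_tw_buckets(path=TW_BUCKETS_PATH):
    """Return the configured limit as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (e.g. not Linux), unreadable, or not a number.
        return None


def timewait_disabled(limit):
    """True only when the limit is known to be 0."""
    return limit == 0


if __name__ == "__main__":
    limit = read_max_tw_buckets()
    print("tcp_max_tw_buckets:", limit, "disabled:", timewait_disabled(limit))
```

A nonzero limit does not guarantee that a timewait socket exists for any
given connection, since allocation can still fail.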

Applications must be able to recover gracefully if a 4-tuple is reused too fast.
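
One way an application could recover, sketched under assumptions: retry
the whole request with exponential backoff when the connection is reset
or refused. request_with_retry() and fetch() are hypothetical names, the
attempt count and delays are arbitrary, and retrying is only safe when
the request is idempotent.

```python
import socket
import time


def request_with_retry(send_request, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call send_request(); on a reset or refusal, back off and try again.

    Re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc = None
    for attempt in range(attempts):
        try:
            return send_request()
        except (ConnectionResetError, ConnectionRefusedError) as exc:
            last_exc = exc
            if attempt + 1 < attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_exc


def fetch(host, port, payload):
    """Open a fresh connection per attempt, send payload, read one reply chunk."""
    def once():
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(payload)
            return s.recv(4096)
    return request_with_retry(once)
```

Each attempt opens a new socket, so the local port (and thus the
4-tuple) normally differs between attempts.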

>
> On Jun 8, 2023, at 12:13 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jun 8, 2023 at 5:59 AM Duan,Muquan <duanmuquan@baidu.com> wrote:
>
>
> Hi, Eric,
>
> Thanks a lot for your explanation!
>
> Even if we add a reader lock, if the refcnt is set outside spin_lock()/spin_unlock(), then during the interval between spin_unlock() and refcnt_set(), other CPUs will see the tw sock with refcnt 0, and the refcnt validation will fail.
>
> A suggestion: before the tw sock is added to the ehash table, it is already used by the tw timer and the bhash chain, so we can first set the refcnt to 2 before adding tw to the ehash table, or add to the refcnt one by one for the timer, bhash and ehash. This avoids the refcnt validation failure on other CPUs.
>
> This reduces the frequency of the connection reset issue from once every 20 min to once every 180 min for our product. We may wait quite a long time before the best solution is ready; if this obvious defect is fixed in the meantime, userland applications can benefit from it.
>
> Looking forward to your opinions!
>
>
> Again, my opinion is that we need a proper fix, not workarounds.
>
> I will work on this a bit later.
>
> In the meantime you can apply locally your patch if you feel this is
> what you want.
>
>
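
The refcnt ordering described in the quoted message above can be
illustrated with a single-threaded toy model. This is not kernel code:
TwSock, inc_not_zero(), lookup() and the two insert functions are
hypothetical names, the "other CPU" is simulated by a callback that runs
inside the window in question, and the final count of 3 (timer, bhash,
ehash) is an assumption made for the illustration.

```python
class TwSock:
    """Toy stand-in for a timewait socket; only the refcount is modeled."""
    def __init__(self):
        self.refcnt = 0


def inc_not_zero(sk):
    """Take a reference unless the count is 0 (models the lookup validation)."""
    if sk.refcnt == 0:
        return False
    sk.refcnt += 1
    return True


def lookup(ehash, key):
    """Return 'miss', 'hit', or 'refcnt-zero' for a lookup of key."""
    sk = ehash.get(key)
    if sk is None:
        return "miss"
    return "hit" if inc_not_zero(sk) else "refcnt-zero"


def publish_then_set(ehash, key, observe):
    """Insert into the table first, set the refcnt afterwards."""
    tw = TwSock()
    ehash[key] = tw   # tw is now visible to lookups, refcnt still 0
    observe()         # a lookup from another CPU lands in this window
    tw.refcnt = 3     # too late for that lookup


def set_then_publish(ehash, key, observe):
    """Suggested order: give tw a nonzero refcnt before it becomes visible."""
    tw = TwSock()
    tw.refcnt = 2     # references for the timer and the bhash chain
    ehash[key] = tw
    observe()         # a concurrent lookup now passes validation
    tw.refcnt += 1    # reference for the ehash chain


if __name__ == "__main__":
    for insert in (publish_then_set, set_then_publish):
        table, seen = {}, []
        insert(table, "4-tuple", lambda: seen.append(lookup(table, "4-tuple")))
        print(insert.__name__, seen)
```

The model only shows why a lookup can observe refcnt 0 in the first
ordering; it says nothing about the other races discussed in this thread.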


Thread overview: 17+ messages
2023-06-06  6:43 [PATCH v2] tcp: fix connection reset due to tw hashdance race Duan Muquan
2023-06-06  7:07 ` Eric Dumazet
     [not found]   ` <DFBEBE81-34A5-4394-9C5B-1A849A6415F1@baidu.com>
2023-06-07 13:32     ` Eric Dumazet
2023-06-07 15:18       ` Duan,Muquan
2023-06-07 15:27         ` Eric Dumazet
     [not found]           ` <8C32A1F5-1160-4863-9201-CF9346290115@baidu.com>
2023-06-08  4:13             ` Eric Dumazet
     [not found]               ` <7FD2F3ED-A3B5-40EF-A505-E7A642D73208@baidu.com>
2023-06-08 11:54                 ` Eric Dumazet [this message]
2023-06-15 12:14                   ` Duan,Muquan
2023-06-15 15:24                     ` Eric Dumazet
2023-06-20  3:30                       ` Duan,Muquan
2023-06-20  8:44                         ` Eric Dumazet
2023-06-20 10:37                           ` Duan,Muquan
2023-06-08  5:47       ` Kuniyuki Iwashima
2023-06-08  6:35         ` Eric Dumazet
2023-06-19 17:03           ` Kuniyuki Iwashima
2023-06-19 17:39             ` Eric Dumazet
2023-06-19 17:58               ` Kuniyuki Iwashima
