linux-kernel.vger.kernel.org archive mirror
From: Cong Wang <cwang@twopensource.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Tejun Heo <tj@kernel.org>, David Miller <davem@davemloft.net>,
	Tom Herbert <tom@herbertland.com>,
	kafai@fb.com, kernel-team <kernel-team@fb.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jiri Pirko <jiri@resnulli.us>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	Thomas Graf <tgraf@suug.ch>, Scott Feldman <sfeldma@gmail.com>
Subject: Re: Possible netlink autobind regression
Date: Wed, 16 Sep 2015 22:02:00 -0700
Message-ID: <CAHA+R7OdJyZiidHarKhp-U6R5fkHj=5KjugK=XoWj=q4D8krxw@mail.gmail.com>
In-Reply-To: <20150917034134.GA19327@gondor.apana.org.au>

On Wed, Sep 16, 2015 at 8:41 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Sep 17, 2015 at 11:08:45AM +0800, Herbert Xu wrote:
>>
>> Good catch! I think your explanation makes perfect sense.  Linus
>> ran into this previously too after suspend-and-resume.
>
> Unfortunately you can't just postpone the setting of portid because
> once you pass it onto rhashtable the portid must never change while
> it's in custody.
>
> So what I've done is essentially revert my previous fix and instead
> add a new boolean "bound" to indicate whether the socket has been
> bound.
>
> ---8<---
> netlink: Fix autobind race condition that leads to zero port ID
>
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads tried to autobind the same socket
> one of them may end up with a zero port ID.
>
> This patch reverts that commit and instead fixes it by introducing
> a separate "bound" variable to indicate whether a socket has been
> bound.
>
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo <tj@kernel.org>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>

In our data center we saw a soft lockup similar to the one Tejun
reported.


> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Just one comment below.

[...]

> @@ -1285,7 +1287,7 @@ static int netlink_release(struct socket *sock)
>
>         skb_queue_purge(&sk->sk_write_queue);
>
> -       if (nlk->portid) {
> +       if (nlk->bound) {
>                 struct netlink_notify n = {
>                                                 .net = sock_net(sk),
>                                                 .protocol = sk->sk_protocol,

This part doesn't look correct. It seems the check is testing whether
this is a kernel netlink socket rather than whether it is bound. But I
am not sure...
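
To make the concern concrete, here is a minimal user-space sketch, not
kernel code. The struct nl_sock_model type and the helpers notify_old(),
notify_new() and model_bind() are hypothetical names invented for
illustration, and the sketch assumes that a kernel-side socket is bound
with port ID 0. Under that assumption the two conditions in the hunk
above give different answers for such a socket:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two fields under discussion.
 * This is NOT the kernel's struct netlink_sock.
 */
struct nl_sock_model {
	uint32_t portid;	/* assumption: 0 denotes a kernel-side socket */
	bool bound;		/* set once the socket has been bound */
};

/* Condition before the hunk: true only for a non-zero port ID. */
static bool notify_old(const struct nl_sock_model *s)
{
	return s->portid != 0;
}

/* Condition after the hunk: true for any bound socket. */
static bool notify_new(const struct nl_sock_model *s)
{
	return s->bound;
}

/* Model of binding: record the port ID and mark the socket bound. */
static void model_bind(struct nl_sock_model *s, uint32_t portid)
{
	assert(!s->bound);	/* rebinding is outside this model */
	s->portid = portid;
	s->bound = true;
}
```

In this model a socket bound with port ID 0 fails notify_old() but
passes notify_new(), while a socket bound with a non-zero port ID
passes both. If the assumption holds for real kernel sockets, the new
check would take the notify path in a case the old check skipped.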

Other than this, looks good to me:

Reviewed-by: Cong Wang <cwang@twopensource.com>
