From: Jakub Kicinski <kuba@kernel.org>
To: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	netdev@vger.kernel.org, linyunsheng@huawei.com,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Antoine Tenart" <atenart@kernel.org>,
	"Alexander Lobakin" <alobakin@pm.me>,
	"Wei Wang" <weiwan@google.com>, "Taehee Yoo" <ap420073@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>, "Arnd Bergmann" <arnd@arndb.de>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Neil Horman" <nhorman@redhat.com>,
	"Dust Li" <dust.li@linux.alibaba.com>
Subject: Re: [PATCH net v2] napi: fix race inside napi_enable
Date: Mon, 18 Oct 2021 15:55:03 -0700
Message-ID: <20211018155503.74aeaba9@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <YW3t8AGxW6p261hw@us.ibm.com>

On Mon, 18 Oct 2021 14:58:08 -0700 Sukadev Bhattiprolu wrote:
> >                       CPU0       |                   CPU1       | napi.state
> > ===============================================================================
> > napi_disable()                   |                              | SCHED | NPSVC
> > napi_enable()                    |                              |
> > {                                |                              |
> >     smp_mb__before_atomic();     |                              |
> >     clear_bit(SCHED, &n->state); |                              | NPSVC
> >                                  | napi_schedule_prep()         | SCHED | NPSVC
> >                                  | napi_poll()                  |
> >                                  |   napi_complete_done()       |
> >                                  |   {                          |
> >                                  |      if (n->state & (NPSVC | | (1)
> >                                  |               _BUSY_POLL)))  |
> >                                  |           return false;      |
> >                                  |     ................         |
> >                                  |   }                          | SCHED | NPSVC
> >                                  |                              |
> >     clear_bit(NPSVC, &n->state); |                              | SCHED
> > }                                |                              |  
> 
> So it's possible that after CPU0 cleared SCHED, CPU1 could have set it
> back, and we are going to use cmpxchg() to detect that and retry, right? If so,

This is the state diagram from before the change; there's no cmpxchg() here.
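To make the pre-change interleaving concrete, here is a small single-threaded C model that replays the steps of the diagram in order, including the final napi_schedule_prep() row. It is a sketch under stated assumptions, not kernel code: the bit values and the model_* helpers are hypothetical, and napi_schedule_prep() / napi_complete_done() are reduced to the single behaviour each one contributes to this race.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the real NAPIF_STATE_*
 * masks live in include/linux/netdevice.h and differ from these. */
#define SCHED  (1UL << 0)
#define MISSED (1UL << 1)
#define NPSVC  (1UL << 2)

/* Simplified napi_schedule_prep(): take ownership if SCHED is clear,
 * otherwise only record a missed event. Returns true if we now own it. */
static bool model_schedule_prep(unsigned long *state)
{
	if (*state & SCHED) {
		*state |= MISSED;
		return false;
	}
	*state |= SCHED;
	return true;
}

/* Simplified napi_complete_done(): bail out early while NPSVC is set,
 * otherwise release SCHED. (The real function also reschedules when
 * MISSED is set; this model ignores that.) */
static bool model_complete_done(unsigned long *state)
{
	if (*state & NPSVC)
		return false;
	*state &= ~SCHED;
	return true;
}

/* Replays the old napi_enable() race, one diagram row at a time. */
static unsigned long replay_old_napi_enable_race(void)
{
	unsigned long state = SCHED | NPSVC;	/* disabled NAPI, netpoll bit held */
	bool owned, done;

	state &= ~SCHED;			/* CPU0: clear_bit(SCHED) */
	owned = model_schedule_prep(&state);	/* CPU1: grabs SCHED */
	assert(owned);
	done = model_complete_done(&state);	/* CPU1: returns early, NPSVC set */
	assert(!done);
	state &= ~NPSVC;			/* CPU0: clear_bit(NPSVC) */
	owned = model_schedule_prep(&state);	/* next schedule: SCHED already set */
	assert(!owned);
	(void)owned;
	(void)done;
	return state;				/* SCHED | MISSED, and nobody owns it */
}
```

The returned state is SCHED | MISSED with no poller left to release SCHED, which is the stuck state the patch description goes on to explain.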
        
> > napi_schedule_prep()             |                              | SCHED | MISSED (2)
> > 
> > (1) Returns here directly, because NAPI_STATE_NPSVC is set.
> > (2) NAPI_STATE_SCHED is already set, so napi.poll_list is not added to sd->poll_list.
> > 
> > Since NAPI_STATE_SCHED is already set and the napi is not on the
> > sd->poll_list queue, nothing will ever clear NAPI_STATE_SCHED; it
> > stays set forever.
> > 
> > 1. This queue will no longer receive packets.
> > 2. If napi_disable() is later called under rtnl_lock, it will wait
> >    forever while holding rtnl_lock, which affects the whole
> >    system.
> > 
> > This patch uses cmpxchg() to implement napi_enable(), so both bits are
> > cleared in one atomic step, closing the race caused by clearing them separately.
> > 
> > Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >  net/core/dev.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 74fd402d26dd..7ee9fecd3aff 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable);
> >   */
> >  void napi_enable(struct napi_struct *n)
> >  {
> > -	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > -	smp_mb__before_atomic();
> > -	clear_bit(NAPI_STATE_SCHED, &n->state);
> > -	clear_bit(NAPI_STATE_NPSVC, &n->state);
> > -	if (n->dev->threaded && n->thread)
> > -		set_bit(NAPI_STATE_THREADED, &n->state);
> > +	unsigned long val, new;
> > +
> > +	do {
> > +		val = READ_ONCE(n->state);
> > +		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));  
> 
> Is this BUG_ON() valid/needed? We could have lost the cmpxchg() race, and
> the other thread could have set NAPI_STATE_SCHED?

The BUG_ON() is here to make sure that when napi_enable() is called the
NAPI instance is dormant, i.e. disabled. We keep the NAPI_STATE_SCHED bit
set on disabled NAPIs because that bit means ownership. Whoever disabled
the NAPI owns it.
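The "SCHED means ownership" rule can be sketched with C11 atomics: setting the bit with a fetch-or works like test_and_set_bit(), so exactly one caller observes it clear and becomes the owner, in the same spirit as the test_and_set_bit() loop napi_disable() used to take SCHED in kernels of this era. This is an illustrative userspace model only; MODEL_SCHED, model_try_own() and model_release() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the NAPI_STATE_SCHED bit. */
#define MODEL_SCHED (1UL << 0)

/* Try to become the owner. atomic_fetch_or() returns the previous value,
 * so only the caller that saw the bit clear wins ownership. */
static bool model_try_own(atomic_ulong *state)
{
	unsigned long old = atomic_fetch_or(state, MODEL_SCHED);

	return !(old & MODEL_SCHED);
}

/* Only the owner may release; the assert stands in for the kernel's
 * BUG_ON() and fires if the instance was not owned. */
static void model_release(atomic_ulong *state)
{
	unsigned long old = atomic_fetch_and(state, ~MODEL_SCHED);

	assert(old & MODEL_SCHED);
	(void)old;
}
```

While the disabling side holds the bit, a concurrent scheduler's model_try_own() fails, which is why napi_enable() can rely on SCHED still being set when it runs.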

That BUG_ON() could have been taken outside of the loop; there's no
point in re-checking it on every try.
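A minimal userspace sketch of napi_enable() with the ownership check hoisted out of the retry loop might look like the following. Assumptions: the M_* bit values are hypothetical, atomic_compare_exchange_weak() stands in for the kernel's cmpxchg(), assert() stands in for BUG_ON(), and the threaded flag replaces the `n->dev->threaded && n->thread` test from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for the NAPIF_STATE_* masks. */
#define M_SCHED    (1UL << 0)
#define M_MISSED   (1UL << 1)
#define M_NPSVC    (1UL << 2)
#define M_THREADED (1UL << 3)

/* Model of napi_enable(). The caller owns the instance, so SCHED cannot be
 * cleared underneath us and checking it once is enough. Other CPUs may still
 * set unrelated bits (e.g. MISSED), so the update must be a compare-and-swap. */
static void model_napi_enable(atomic_ulong *state, bool threaded)
{
	unsigned long val = atomic_load(state);
	unsigned long new;

	assert(val & M_SCHED);	/* BUG_ON() stand-in: must be disabled */

	do {
		/* Clear SCHED and NPSVC together in a single atomic update. */
		new = val & ~(M_SCHED | M_NPSVC);
		if (threaded)
			new |= M_THREADED;
		/* On failure, val is reloaded with the current state; retry. */
	} while (!atomic_compare_exchange_weak(state, &val, new));
}
```

Only the compare-and-swap sits inside the loop; any bits another CPU sets concurrently (such as MISSED) are preserved, because each retry recomputes `new` from the freshly observed value.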

Are you seeing NAPI-related failures? We've had at least 3 reports in the
last two weeks of strange failures on net-next that look like the NAPI
state getting corrupted...


Thread overview: 13+ messages
2021-09-18  8:52 [PATCH net v2] napi: fix race inside napi_enable Xuan Zhuo
2021-09-20  8:50 ` patchwork-bot+netdevbpf
2021-09-20 19:20 ` Jakub Kicinski
2021-09-22  6:47   ` Xuan Zhuo
2021-09-23 13:14     ` Jakub Kicinski
     [not found]       ` <1632404456.506512-1-xuanzhuo@linux.alibaba.com>
2021-09-23 14:54         ` Jakub Kicinski
2021-10-18 21:58 ` Sukadev Bhattiprolu
2021-10-18 22:55   ` Jakub Kicinski [this message]
2021-10-18 23:36     ` Dany Madden
2021-10-18 23:47       ` Jakub Kicinski
2021-10-19  0:01         ` Sukadev Bhattiprolu
2021-10-22  3:16       ` Sukadev Bhattiprolu
2021-10-25 17:36         ` Jakub Kicinski
