From: Wei Wang <weiwan@google.com>
To: Martin Zaharinov <micron10@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Alexander Duyck <alexanderduyck@fb.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy poll
Date: Fri, 26 Feb 2021 14:37:56 -0800
Message-ID: <CAEA6p_A93G5he_kqMzCZWgOThOT_rkB0WQ1oNA4wLVpthv2H9A@mail.gmail.com>
In-Reply-To: <CALidq=UWupwXMMYAMMF2GW4ifR0WQJos6VqXPuzQ0_seHGUHdA@mail.gmail.com>

On Fri, Feb 26, 2021 at 2:36 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
>
>
> On Sat, Feb 27, 2021, 00:24 Wei Wang <weiwan@google.com> wrote:
>>
>> On Fri, Feb 26, 2021 at 1:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >
>> > On Fri, 26 Feb 2021 10:28:00 -0800 Wei Wang wrote:
>> > > Hi Martin,
>> > > Could you help try the following new patch on your setup and let me
>> > > know if there are still issues?
>> >
>> > FWIW your email got line wrapped for me.
>> >
>> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> > > index ddf4cfc12615..9ed0f89ccdd5 100644
>> > > --- a/include/linux/netdevice.h
>> > > +++ b/include/linux/netdevice.h
>> > > @@ -357,9 +357,10 @@ enum {
>> > >         NAPI_STATE_NPSVC,               /* Netpoll - don't dequeue
>> > > from poll_list */
>> > >         NAPI_STATE_LISTED,              /* NAPI added to system lists */
>> > >         NAPI_STATE_NO_BUSY_POLL,        /* Do not add in napi_hash, no
>> > > busy polling */
>> > > -       NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>> > > +       NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() grabs SHED
>> >
>> > nit: SHED -> SCHED
>> Ack.
>>
>> >
>> > > bit and could busy poll */
>> > >         NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over
>> > > softirq processing*/
>> > >         NAPI_STATE_THREADED,            /* The poll is performed
>> > > inside its own thread*/
>> > > +       NAPI_STATE_SCHED_BUSY_POLL,     /* Napi is currently scheduled
>> > > in busy poll mode */
>> >
>> > nit: Napi -> NAPI ?
>> Ack.
>>
>> >
>> > >  };
>> > >
>> > >  enum {
>> > > @@ -372,6 +373,7 @@ enum {
>> > >         NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>> > >         NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>> > >         NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>> > > +       NAPIF_STATE_SCHED_BUSY_POLL     = BIT(NAPI_STATE_SCHED_BUSY_POLL),
>> > >  };
>> > >
>> > >  enum gro_result {
>> > > diff --git a/net/core/dev.c b/net/core/dev.c
>> > > index 6c5967e80132..c717b67ce137 100644
>> > > --- a/net/core/dev.c
>> > > +++ b/net/core/dev.c
>> > > @@ -1501,15 +1501,14 @@ static int napi_kthread_create(struct napi_struct *n)
>> > >  {
>> > >         int err = 0;
>> > >
>> > > -       /* Create and wake up the kthread once to put it in
>> > > -        * TASK_INTERRUPTIBLE mode to avoid the blocked task
>> > > -        * warning and work with loadavg.
>> > > +       /* Avoid using  kthread_run() here to prevent race
>> > > +        * between softirq and kthread polling.
>> > >          */
>> > > -       n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
>> > > -                               n->dev->name, n->napi_id);
>> > > +       n->thread = kthread_create(napi_threaded_poll, n, "napi/%s-%d",
>> > > +                                  n->dev->name, n->napi_id);
>> >
>> > I'm not sure this takes care of rapid:
>> >
>> > dev_set_threaded(0)
>> >  # NAPI gets sent to sirq
>> > dev_set_threaded(1)
>> >
>> > since subsequent set_threaded(1) doesn't spawn the thread "afresh".
>> >
>>
>> I think the race between softirq and kthread could be purely dependent
>> on the SCHED bit. In napi_schedule_prep(), we check if SCHED bit is
>> set. And we only call ____napi_schedule() when SCHED bit is not set.
>> In ____napi_schedule(), we either wake up kthread, or raise softirq,
>> never both.
>> So as long as we don't wake up the kthread when creating it, there
>> should not be a chance of race between softirq and kthread.
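
To make the SCHED-bit argument above concrete, here is a minimal userspace
sketch. It is not kernel code: the model_* names and the two counters are
hypothetical, and the real napi_schedule_prep() is simplified here to a
single atomic fetch-or. It only shows that whoever sets SCHED first is the
sole caller of the schedule step, and that the schedule step takes exactly
one of the two paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPI state bits discussed above. */
enum {
	MODEL_SCHED    = 1u << 0,
	MODEL_THREADED = 1u << 1,
};

struct model_napi {
	atomic_uint state;
	int kthread_wakeups;	/* times the model "woke the kthread" */
	int softirq_raises;	/* times the model "raised the softirq" */
};

static void model_init(struct model_napi *n, unsigned int flags)
{
	atomic_init(&n->state, flags);
	n->kthread_wakeups = 0;
	n->softirq_raises = 0;
}

/* Simplified napi_schedule_prep(): true only for the caller that sets SCHED. */
static bool model_schedule_prep(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return !(old & MODEL_SCHED);
}

/* Simplified ____napi_schedule(): wake the kthread or raise softirq, never both. */
static void model_do_schedule(struct model_napi *n)
{
	if (atomic_load(&n->state) & MODEL_THREADED)
		n->kthread_wakeups++;
	else
		n->softirq_raises++;
}

static void model_napi_schedule(struct model_napi *n)
{
	if (model_schedule_prep(n))
		model_do_schedule(n);
}

/* Simplified completion: drop SCHED so the next schedule can win again. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(unsigned int)MODEL_SCHED);
}

/* Two back-to-back schedules cause one wakeup; after completion, one more. */
static void model_demo(void)
{
	struct model_napi n;

	model_init(&n, MODEL_THREADED);
	model_napi_schedule(&n);
	model_napi_schedule(&n);	/* loses: SCHED is already set */
	assert(n.kthread_wakeups == 1 && n.softirq_raises == 0);

	model_complete(&n);
	model_napi_schedule(&n);
	assert(n.kthread_wakeups == 2 && n.softirq_raises == 0);
}
```

Calling model_demo() from a main() exercises the model. It says nothing
about other owners of the SCHED bit, which is where this reasoning falls
short.
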
>>
>> > >         if (IS_ERR(n->thread)) {
>> > >                 err = PTR_ERR(n->thread);
>> > > -               pr_err("kthread_run failed with err %d\n", err);
>> > > +               pr_err("kthread_create failed with err %d\n", err);
>> > >                 n->thread = NULL;
>> > >         }
>> > >
>> > > @@ -6486,6 +6485,7 @@ bool napi_complete_done(struct napi_struct *n,
>> > > int work_done)
>> > >                 WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>> > >
>> > >                 new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>> > > +                             NAPIF_STATE_SCHED_BUSY_POLL |
>> > >                               NAPIF_STATE_PREFER_BUSY_POLL);
>> > >
>> > >                 /* If STATE_MISSED was set, leave STATE_SCHED set,
>> > > @@ -6525,6 +6525,7 @@ static struct napi_struct *napi_by_id(unsigned
>> > > int napi_id)
>> > >
>> > >  static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
>> > >  {
>> > > +       clear_bit(NAPI_STATE_SCHED_BUSY_POLL, &napi->state);
>> > >         if (!skip_schedule) {
>> > >                 gro_normal_list(napi);
>> > >                 __napi_schedule(napi);
>> > > @@ -6624,7 +6625,8 @@ void napi_busy_loop(unsigned int napi_id,
>> > >                         }
>> > >                         if (cmpxchg(&napi->state, val,
>> > >                                     val | NAPIF_STATE_IN_BUSY_POLL |
>> > > -                                         NAPIF_STATE_SCHED) != val) {
>> > > +                                         NAPIF_STATE_SCHED |
>> > > +                                         NAPIF_STATE_SCHED_BUSY_POLL) != val) {
>> > >                                 if (prefer_busy_poll)
>> > >
>> > > set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
>> > >                                 goto count;
>> > > @@ -6971,7 +6973,10 @@ static int napi_thread_wait(struct napi_struct *napi)
>> > >         set_current_state(TASK_INTERRUPTIBLE);
>> > >
>> > >         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>> > > -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>> > > +               unsigned long val = READ_ONCE(napi->state);
>> > > +
>> > > +               if (val & NAPIF_STATE_SCHED &&
>> > > +                   !(val & NAPIF_STATE_SCHED_BUSY_POLL)) {
>> >
>> > Again, not protected from the napi_disable() case AFAICT.
>>
>> Hmmm..... Yes. I think you are right. I missed that napi_disable()
>> also grabs the SCHED bit. In this case, I think we have to use the
>> SCHED_THREADED bit. The SCHED_BUSY_POLL bit is not enough to protect
>> the race between napi_disable() and napi_threaded_poll(). :(
>> Sorry, I missed this point when evaluating both solutions. I will have
>> to switch to use the SCHED_THREADED bit.
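
A small userspace sketch of why the SCHED_BUSY_POLL check is not enough,
for illustration only. The MODEL_* names and helper functions are
hypothetical, and the SCHED_THREADED semantics shown here (the bit is set
only when the NAPI is scheduled for its kthread) are an assumption about
the alternative approach, not the final patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state bits under discussion. */
enum {
	MODEL_SCHED           = 1u << 0,
	MODEL_SCHED_BUSY_POLL = 1u << 1, /* bit added by the patch quoted above */
	MODEL_SCHED_THREADED  = 1u << 2, /* alternative bit; semantics assumed */
};

/* Wake condition from the quoted napi_thread_wait() hunk. */
static bool poll_ok_busy_poll_check(unsigned long val)
{
	return (val & MODEL_SCHED) && !(val & MODEL_SCHED_BUSY_POLL);
}

/* Assumed alternative: poll only when SCHED was taken on the kthread's behalf. */
static bool poll_ok_threaded_check(unsigned long val)
{
	return (val & MODEL_SCHED_THREADED) != 0;
}

/* State after busy poll takes ownership: SCHED plus the busy-poll marker. */
static unsigned long state_busy_poll(void)
{
	return MODEL_SCHED | MODEL_SCHED_BUSY_POLL;
}

/* State after napi_disable() grabs SCHED: no marker bit accompanies it. */
static unsigned long state_napi_disable(void)
{
	return MODEL_SCHED;
}

/* State after the NAPI is scheduled for its kthread (assumed behaviour). */
static unsigned long state_kthread_sched(void)
{
	return MODEL_SCHED | MODEL_SCHED_THREADED;
}

/* Walk the three owners through both checks. */
static void model_disable_race_demo(void)
{
	/* Busy poll is filtered out by the SCHED_BUSY_POLL check. */
	assert(!poll_ok_busy_poll_check(state_busy_poll()));
	/* napi_disable() is not: the kthread would poll a disabled NAPI. */
	assert(poll_ok_busy_poll_check(state_napi_disable()));
	/* A positive "scheduled for the kthread" bit rejects both other owners. */
	assert(!poll_ok_threaded_check(state_napi_disable()));
	assert(!poll_ok_threaded_check(state_busy_poll()));
	assert(poll_ok_threaded_check(state_kthread_sched()));
}
```

The point of the sketch is that SCHED_BUSY_POLL excludes one known owner,
while a bit that marks kthread ownership positively excludes every other
owner, napi_disable() included.
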
>
>
>
> Should I hold off on testing
> until you fix this?
>
Yes. Please. Sorry for the confusion.

>>
>>
>> >
>> > >                         WARN_ON(!list_empty(&napi->poll_list));
>> > >                         __set_current_state(TASK_RUNNING);
>> > >                         return 0;
