From: Wei Wang <weiwan@google.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Alexander Duyck <alexanderduyck@fb.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Martin Zaharinov <micron10@gmail.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy poll
Date: Thu, 25 Feb 2021 10:29:47 -0800
Message-ID: <CAEA6p_DdccvmymRWEtggHgqb9dQ6NjK8rsrA03HH+r7mzt=5uw@mail.gmail.com>
In-Reply-To: <20210225002115.5f6215d8@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Thu, Feb 25, 2021 at 12:21 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 24 Feb 2021 18:31:55 -0800 Wei Wang wrote:
> > On Wed, Feb 24, 2021 at 6:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 25 Feb 2021 01:22:08 +0000 Alexander Duyck wrote:
> > > > Yeah, that was the patch Wei had done earlier. Eric complained about the extra set_bit atomic operation in the threaded path. That is when I came up with the idea of just adding a bit to the busy poll logic so that the only extra cost in the threaded path was having to check 2 bits instead of 1.
> > >
> > > Maybe we can set the bit only if the thread is running? When thread
> > > comes out of schedule() it can be sure that it has an NAPI to service.
> > > But when it enters napi_thread_wait() and before it hits schedule()
> > > it must be careful to make sure the NAPI is still (or already in the
> > > very first run after creation) owned by it.
> >
> > Are you suggesting setting the SCHED_THREAD bit in napi_thread_wait()
> > somewhere instead of in ____napi_schedule() as you previously plotted?
> > What does it help? I think if we have to do an extra set_bit(), it
> > seems cleaner to set it in ____napi_schedule(). This would solve the
> > warning issue as well.
>
> I was thinking of something roughly like this:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ddf4cfc12615..3bce94e8c110 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -360,6 +360,7 @@ enum {
>         NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>         NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>         NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> +       NAPI_STATE_SCHED_THREAD,        /* Thread owns the NAPI and will poll */
>  };
>
>  enum {
> @@ -372,6 +373,7 @@ enum {
>         NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>         NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>         NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> +       NAPIF_STATE_SCHED_THREAD        = BIT(NAPI_STATE_SCHED_THREAD),
>  };
>
>  enum gro_result {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..852b992d0ebb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>                  */
>                 thread = READ_ONCE(napi->thread);
>                 if (thread) {
> +                       if (thread->state == TASK_RUNNING)
> +                               set_bit(NAPIF_STATE_SCHED_THREAD, &napi->state);
>                         wake_up_process(thread);
>                         return;
>                 }
> @@ -6486,7 +6488,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>                 WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>
>                 new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> -                             NAPIF_STATE_PREFER_BUSY_POLL);
> +                             NAPIF_STATE_PREFER_BUSY_POLL |
> +                             NAPIF_STATE_SCHED_THREAD);
>
>                 /* If STATE_MISSED was set, leave STATE_SCHED set,
>                  * because we will call napi->poll() one more time.
> @@ -6968,16 +6971,24 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>
>  static int napi_thread_wait(struct napi_struct *napi)
>  {
> +       bool woken = false;
> +
>         set_current_state(TASK_INTERRUPTIBLE);
>
>         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> +               unsigned long state = READ_ONCE(napi->state);
> +
> +               if ((state & NAPIF_STATE_SCHED) &&
> +                   ((state & NAPIF_STATE_SCHED_THREAD) || woken)) {
>                         WARN_ON(!list_empty(&napi->poll_list));
>                         __set_current_state(TASK_RUNNING);
>                         return 0;
> +               } else {
> +                       WARN_ON(woken);
>                 }
>
>                 schedule();
> +               woken = true;
>                 set_current_state(TASK_INTERRUPTIBLE);
>         }
>         __set_current_state(TASK_RUNNING);
>
>
> Extra set_bit() is only done if napi_schedule() comes early enough to
> see the thread still running. When the thread is woken we continue to
> assume ownership.
>
> It's just an idea (but it may solve the first run and the disable case).

Hmm... I don't think the above patch would work. Consider the following
sequence:
1. Initially, the kthread is asleep.
2. Someone calls napi_schedule() to schedule work on this napi, so
____napi_schedule() is called. At this moment the kthread is not yet in
TASK_RUNNING state, so the function does not set the SCHED_THREAD bit.
3. wake_up_process() is then called to wake up the thread.
4. napi_threaded_poll() calls napi_thread_wait(). woken is false and
the SCHED_THREAD bit is not set, so the kthread goes back to sleep (in
TASK_INTERRUPTIBLE state) when schedule() is called, and waits to be
woken up by the next napi_schedule().
That would introduce an arbitrary delay before napi->poll() is called,
wouldn't it? Please correct me if I have misunderstood.
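
To make the sequence above concrete, here is a rough user-space model
of it. This is only a sketch, not kernel code: struct model, the M_*
flags and the model_*() helpers are all made-up names that stand in for
napi->state, the NAPIF_STATE_* bits and the two functions touched by
the quoted diff. It also takes as given the premise of step 4, namely
that the thread reaches the check with woken == false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for NAPIF_STATE_SCHED / NAPIF_STATE_SCHED_THREAD. */
enum {
	M_SCHED        = 1u << 0,
	M_SCHED_THREAD = 1u << 1,
};

struct model {
	unsigned int state;   /* stands in for napi->state */
	bool thread_running;  /* stands in for thread->state == TASK_RUNNING */
};

/* Models the proposed ____napi_schedule(): the extra bit is set only
 * when the thread is observed running, then the thread is woken. */
static void model_napi_schedule(struct model *m)
{
	m->state |= M_SCHED;
	if (m->thread_running)
		m->state |= M_SCHED_THREAD;
	m->thread_running = true;	/* models wake_up_process() */
}

/* Models the condition in the proposed napi_thread_wait(): true means
 * the thread returns and polls, false means it calls schedule(). */
static bool model_thread_would_poll(const struct model *m, bool woken)
{
	return (m->state & M_SCHED) &&
	       ((m->state & M_SCHED_THREAD) || woken);
}

/* Replays steps 1-4 and reports whether the thread would poll. */
static bool model_replay_scenario(void)
{
	struct model m = { .state = 0, .thread_running = false }; /* step 1 */

	model_napi_schedule(&m);			/* steps 2 and 3 */
	assert(m.state == M_SCHED);			/* SCHED_THREAD not set */
	return model_thread_would_poll(&m, false);	/* step 4 */
}
```

In this model, model_replay_scenario() returns false, i.e. the thread
would call schedule() again even though SCHED is set. If the same
schedule call sees thread_running == true, the check passes.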

Personally, I would prefer to set the SCHED_THREAD bit directly in
____napi_schedule(), or else stick with the SCHED_BUSY_POLL solution and
replace kthread_run() with kthread_create().
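
For comparison, the first option could be modelled roughly as below.
Again this is a user-space sketch with made-up names (struct alt_napi,
the ALT_* flags, the alt_*() helpers), using C11 atomics in place of
the kernel bitops. It assumes that a busy poller claims the NAPI by
setting only SCHED, and it leaves out details such as the MISSED bit
and the woken flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPIF_STATE_SCHED / NAPIF_STATE_SCHED_THREAD. */
enum {
	ALT_SCHED        = 1u << 0,
	ALT_SCHED_THREAD = 1u << 1,
};

struct alt_napi {
	atomic_uint state;	/* stands in for napi->state */
};

/* Assumed busy-poll claim: take SCHED only if nobody holds it, and
 * never touch SCHED_THREAD. */
static bool alt_busy_poll_claim(struct alt_napi *n)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & ALT_SCHED)
			return false;
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | ALT_SCHED));
	return true;
}

/* Schedule path: if SCHED was free, unconditionally mark the thread as
 * owner. The second atomic op is the extra set_bit() under discussion. */
static bool alt_napi_schedule(struct alt_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, ALT_SCHED);

	if (old & ALT_SCHED)
		return false;	/* already owned by someone else */
	atomic_fetch_or(&n->state, ALT_SCHED_THREAD);
	return true;
}

/* The thread polls only when the schedule path handed it ownership. */
static bool alt_thread_would_poll(struct alt_napi *n)
{
	return (atomic_load(&n->state) & ALT_SCHED_THREAD) != 0;
}
```

With this shape the thread's decision does not depend on whether it
was seen running at schedule time, at the cost of one more atomic
operation in the threaded path.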
