All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Wang <weiwan@google.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Alexander Duyck <alexanderduyck@fb.com>,
	Martin Zaharinov <micron10@gmail.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy poll
Date: Wed, 24 Feb 2021 14:29:21 -0800	[thread overview]
Message-ID: <CAEA6p_Crfx8_izk+GCE30a-DAwiKbNmxNKJ0=7be1Wtm8AbX8Q@mail.gmail.com> (raw)
In-Reply-To: <20210224133032.4227a60c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Wed, Feb 24, 2021 at 1:30 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 24 Feb 2021 21:37:36 +0100 Eric Dumazet wrote:
> > On Wed, Feb 24, 2021 at 8:48 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Tue, 23 Feb 2021 15:41:30 -0800 Wei Wang wrote:
> > > > Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> > > > determine if the kthread owns this napi and could call napi->poll() on
> > > > it. However, if socket busy poll is enabled, it is possible that the
> > > > busy poll thread grabs this SCHED bit (after the previous napi->poll()
> > > > invokes napi_complete_done() and clears SCHED bit) and tries to poll
> > > > on the same napi.
> > > > This patch tries to fix this race by adding a new bit
> > > > NAPI_STATE_SCHED_BUSY_POLL in napi->state. This bit gets set in
> > > > napi_busy_loop() togther with NAPI_STATE_SCHED, and gets cleared in
> > > > napi_complete_done() together with NAPI_STATE_SCHED. This helps
> > > > distinguish the ownership of the napi between kthread and the busy poll
> > > > thread, and prevents the kthread from polling on the napi when this napi
> > > > is still owned by the busy poll thread.
> > > >
> > > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > > > Reported-by: Martin Zaharinov <micron10@gmail.com>
> > > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
> > > > Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> > > > Reviewed-by: Eric Dumazet <edumazet@google.come>
> > >
> > > AFAIU sched bit controls the ownership of the poll_list
> >
> > I disagree. BUSY POLL never inserted the napi into a list,
> > because the user thread was polling one napi.
> >
> > Same for the kthread.
>
> There is no delayed execution in busy_poll. It either got the sched bit
> and it knows it, or it didn't.
>
> > wake_up_process() should be good enough.
>
> Well, if that's the direction maybe we should depend on the thread
> state more?  IOW pay less attention to SCHED and have
> napi_complete_done() set_current_state() if thread is running?
>
> I didn't think that through fully but you can't say "wake_up_process()
> should be good enough" and at the same time add another bit proving
> it's not enough.
>
> > > Can we pleaseadd a poll_list for the thread and make sure the
> > > thread polls based on the list?
> >
> > A list ? That would require a spinlock or something ?
>
> Does the softnet list require a spinlock?
>
> Obviously with current code the list would only ever have one napi
> instance per thread but I think it's worth the code simplicity.
> napi_complete_done() dels from the list / releases that ownership
> already.
>

I think what Jakub proposed here should work. But I have a similar
concern as Eric. I think the kthread belongs to the NAPI instance, and
the kthread only polls on that specific NAPI if threaded mode is
enabled. Adding the NAPI to a list that the kthread polls seems to be
a reverse of logic. And it is unlike the sd->poll_list, where multiple
NAPI instances could be added to that list and get polled. But
functionality-wise, it does seem it will work.

> > > IMO that's far clearer than defining a forest of ownership state
> > > bits.
> >
> > Adding a bit seems simpler than adding a list.
>
> In terms of what? LoC?
>
> Just to find out what the LoC is I sketched this out:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ddf4cfc12615..77f09ced9ee4 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -348,6 +348,7 @@ struct napi_struct {
>         struct hlist_node       napi_hash_node;
>         unsigned int            napi_id;
>         struct task_struct      *thread;
> +       struct list_head        thread_poll_list;
>  };
>
>  enum {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..99ff083232e9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>                  */
>                 thread = READ_ONCE(napi->thread);
>                 if (thread) {
> +                       list_add_tail(&napi->poll_list,
> +                                     &napi->thread_poll_list);
>                         wake_up_process(thread);
>                         return;
>                 }
> @@ -6777,6 +6779,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>                 return;
>
>         INIT_LIST_HEAD(&napi->poll_list);
> +       INIT_LIST_HEAD(&napi->thread_poll_list);
>         INIT_HLIST_NODE(&napi->napi_hash_node);
>         hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
>         napi->timer.function = napi_watchdog;
> @@ -6971,8 +6974,7 @@ static int napi_thread_wait(struct napi_struct *napi)
>         set_current_state(TASK_INTERRUPTIBLE);
>
>         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> -                       WARN_ON(!list_empty(&napi->poll_list));
> +               if (!list_emtpy(&napi->thread_poll_list)) {
>                         __set_current_state(TASK_RUNNING);
>                         return 0;
>                 }
>
> $ git diff --stat
>  include/linux/netdevice.h | 1 +
>  net/core/dev.c            | 6 ++++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> > > I think with just the right (wrong?) timing this patch will still
> > > not protect against disabling the NAPI.
> >
> > Maybe, but this patch is solving one issue that was easy to trigger.
> >
> > disabling the NAPI is handled already.
>
> The thread checks if NAPI is getting disabled, then time passes, then
> it checks if it's scheduled. If napi gets disabled in the "time passes"
> period thread will think that it got scheduled again.
>

Not sure if I understand it correctly, when you say "then it checks if
it's scheduled", do you mean the schedule() call in napi_thread_wait()
that re-enters this function? If so, it still checks to make sure
!napi_disable_pending(napi) before it goes to poll on the napi
instance. I think that is sufficient to make sure we don't poll on a
NAPI that is in DISABLE state?


> Sure, we can go and make special annotations in all other parts of NAPI
> infra, but isn't that an obvious sign of a bad design?
>
>
> I wanted to add that I have spent quite a bit of time hacking around
> the threaded NAPI thing before I had to maintain, and (admittedly my
> brain is not very capable but) I had a hard time getting things working
> reliably with netpoll, busy polling, disabling etc. IOW I'm not just
> claiming that "more bits" is not a good solution on a whim.

  reply	other threads:[~2021-02-24 22:30 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23 23:41 [PATCH net] net: fix race between napi kthread mode and busy poll Wei Wang
2021-02-24 19:48 ` Jakub Kicinski
2021-02-24 20:37   ` Eric Dumazet
2021-02-24 21:30     ` Jakub Kicinski
2021-02-24 22:29       ` Wei Wang [this message]
2021-02-24 23:29         ` Jakub Kicinski
     [not found]       ` <CANn89i+xGsMpRfPwZK281jyfum_1fhTNFXq7Z8HOww9H1BHmiw@mail.gmail.com>
2021-02-24 23:52         ` Jakub Kicinski
2021-02-24 23:59           ` Eric Dumazet
2021-02-25  0:07             ` Jakub Kicinski
2021-02-25  0:11               ` Alexander Duyck
2021-02-25  0:16                 ` Wei Wang
2021-02-25  0:32                   ` Jakub Kicinski
2021-02-25  0:44                     ` Wei Wang
2021-02-25  0:49                       ` Jakub Kicinski
2021-02-25  1:06                         ` Wei Wang
2021-02-25  1:40                           ` Jakub Kicinski
2021-02-25  2:16                             ` Wei Wang
2021-02-25  0:20                 ` Jakub Kicinski
2021-02-25  1:22                   ` Alexander Duyck
2021-02-25  2:03                     ` Jakub Kicinski
2021-02-25  2:31                       ` Wei Wang
2021-02-25  5:52                         ` Martin Zaharinov
2021-02-25  8:21                         ` Jakub Kicinski
2021-02-25 18:29                           ` Wei Wang
2021-02-25 23:00                             ` Jakub Kicinski
2021-02-26  0:16                               ` Wei Wang
2021-02-26  1:18                                 ` Jakub Kicinski
2021-02-26  1:49                                   ` Wei Wang
2021-02-26  3:52                                   ` Alexander Duyck
2021-02-26 18:28                                     ` Wei Wang
2021-02-26 21:35                                       ` Jakub Kicinski
2021-02-26 22:24                                         ` Wei Wang
     [not found]                                           ` <CALidq=UWupwXMMYAMMF2GW4ifR0WQJos6VqXPuzQ0_seHGUHdA@mail.gmail.com>
2021-02-26 22:37                                             ` Wei Wang
2021-02-26 23:10                                           ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEA6p_Crfx8_izk+GCE30a-DAwiKbNmxNKJ0=7be1Wtm8AbX8Q@mail.gmail.com' \
    --to=weiwan@google.com \
    --cc=alexanderduyck@fb.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@stressinduktion.org \
    --cc=kuba@kernel.org \
    --cc=micron10@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.