* [PATCH net v2] napi: fix race inside napi_enable
@ 2021-09-18  8:52 Xuan Zhuo
  2021-09-20  8:50 ` patchwork-bot+netdevbpf
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Xuan Zhuo @ 2021-09-18  8:52 UTC (permalink / raw)
  To: netdev, linyunsheng
  Cc: David S. Miller, Jakub Kicinski, Eric Dumazet, Daniel Borkmann,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Taehee Yoo,
	Björn Töpel, Arnd Bergmann, Kumar Kartikeya Dwivedi,
	Neil Horman, Dust Li

The following sequence leaves napi.state with NAPI_STATE_SCHED set while
the napi is not on the poll_list, which causes napi_disable() to get stuck.

The prefix "NAPI_STATE_" is removed in the figure below, and
NAPI_STATE_HASHED is ignored in napi.state.

                      CPU0       |                   CPU1       | napi.state
===============================================================================
napi_disable()                   |                              | SCHED | NPSVC
napi_enable()                    |                              |
{                                |                              |
    smp_mb__before_atomic();     |                              |
    clear_bit(SCHED, &n->state); |                              | NPSVC
                                 | napi_schedule_prep()         | SCHED | NPSVC
                                 | napi_poll()                  |
                                 |   napi_complete_done()       |
                                 |   {                          |
                                 |      if (n->state & (NPSVC | | (1)
                                 |               _BUSY_POLL)))  |
                                 |           return false;      |
                                 |     ................         |
                                 |   }                          | SCHED | NPSVC
                                 |                              |
    clear_bit(NPSVC, &n->state); |                              | SCHED
}                                |                              |
                                 |                              |
napi_schedule_prep()             |                              | SCHED | MISSED (2)

(1) Returns early here because NAPI_STATE_NPSVC is set.
(2) NAPI_STATE_SCHED is already set, so the napi is not added to sd->poll_list.

Since NAPI_STATE_SCHED is set but the napi is not on the sd->poll_list
queue, NAPI_STATE_SCHED can never be cleared and remains set forever.

1. The queue no longer receives packets.
2. If napi_disable() is then called under the protection of rtnl_lock, it
   blocks forever while holding rtnl_lock, stalling the whole system.

This patch reimplements napi_enable() with cmpxchg(), clearing both bits
in a single atomic update so the race caused by clearing them separately
cannot occur.

Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/core/dev.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 74fd402d26dd..7ee9fecd3aff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable);
  */
 void napi_enable(struct napi_struct *n)
 {
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-	if (n->dev->threaded && n->thread)
-		set_bit(NAPI_STATE_THREADED, &n->state);
+	unsigned long val, new;
+
+	do {
+		val = READ_ONCE(n->state);
+		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));
+
+		new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC);
+		if (n->dev->threaded && n->thread)
+			new |= NAPIF_STATE_THREADED;
+	} while (cmpxchg(&n->state, val, new) != val);
 }
 EXPORT_SYMBOL(napi_enable);
 
-- 
2.31.0



* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-09-18  8:52 [PATCH net v2] napi: fix race inside napi_enable Xuan Zhuo
@ 2021-09-20  8:50 ` patchwork-bot+netdevbpf
  2021-09-20 19:20 ` Jakub Kicinski
  2021-10-18 21:58 ` Sukadev Bhattiprolu
  2 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-09-20  8:50 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, linyunsheng, davem, kuba, edumazet, daniel, atenart,
	alobakin, weiwan, ap420073, bjorn, arnd, memxor, nhorman,
	dust.li

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Sat, 18 Sep 2021 16:52:32 +0800 you wrote:
> The process will cause napi.state to contain NAPI_STATE_SCHED and
> not in the poll_list, which will cause napi_disable() to get stuck.
> 
> The prefix "NAPI_STATE_" is removed in the figure below, and
> NAPI_STATE_HASHED is ignored in napi.state.
> 
>                       CPU0       |                   CPU1       | napi.state
> ===============================================================================
> napi_disable()                   |                              | SCHED | NPSVC
> napi_enable()                    |                              |
> {                                |                              |
>     smp_mb__before_atomic();     |                              |
>     clear_bit(SCHED, &n->state); |                              | NPSVC
>                                  | napi_schedule_prep()         | SCHED | NPSVC
>                                  | napi_poll()                  |
>                                  |   napi_complete_done()       |
>                                  |   {                          |
>                                  |      if (n->state & (NPSVC | | (1)
>                                  |               _BUSY_POLL)))  |
>                                  |           return false;      |
>                                  |     ................         |
>                                  |   }                          | SCHED | NPSVC
>                                  |                              |
>     clear_bit(NPSVC, &n->state); |                              | SCHED
> }                                |                              |
>                                  |                              |
> napi_schedule_prep()             |                              | SCHED | MISSED (2)
> 
> [...]

Here is the summary with links:
  - [net,v2] napi: fix race inside napi_enable
    https://git.kernel.org/netdev/net/c/3765996e4f0b

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-09-18  8:52 [PATCH net v2] napi: fix race inside napi_enable Xuan Zhuo
  2021-09-20  8:50 ` patchwork-bot+netdevbpf
@ 2021-09-20 19:20 ` Jakub Kicinski
  2021-09-22  6:47   ` Xuan Zhuo
  2021-10-18 21:58 ` Sukadev Bhattiprolu
  2 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2021-09-20 19:20 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, linyunsheng, David S. Miller, Eric Dumazet,
	Daniel Borkmann, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

On Sat, 18 Sep 2021 16:52:32 +0800 Xuan Zhuo wrote:
> The process will cause napi.state to contain NAPI_STATE_SCHED and
> not in the poll_list, which will cause napi_disable() to get stuck.
> 
> The prefix "NAPI_STATE_" is removed in the figure below, and
> NAPI_STATE_HASHED is ignored in napi.state.
> 
>                       CPU0       |                   CPU1       | napi.state
> ===============================================================================
> napi_disable()                   |                              | SCHED | NPSVC
> napi_enable()                    |                              |
> {                                |                              |
>     smp_mb__before_atomic();     |                              |
>     clear_bit(SCHED, &n->state); |                              | NPSVC
>                                  | napi_schedule_prep()         | SCHED | NPSVC
>                                  | napi_poll()                  |
>                                  |   napi_complete_done()       |
>                                  |   {                          |
>                                  |      if (n->state & (NPSVC | | (1)
>                                  |               _BUSY_POLL)))  |
>                                  |           return false;      |
>                                  |     ................         |
>                                  |   }                          | SCHED | NPSVC
>                                  |                              |
>     clear_bit(NPSVC, &n->state); |                              | SCHED
> }                                |                              |
>                                  |                              |
> napi_schedule_prep()             |                              | SCHED | MISSED (2)
> 
> (1) Here return direct. Because of NAPI_STATE_NPSVC exists.
> (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
> 
> Since NAPI_STATE_SCHED already exists and napi is not in the
> sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
> exist.
> 
> 1. This will cause this queue to no longer receive packets.
> 2. If you encounter napi_disable under the protection of rtnl_lock, it
>    will cause the entire rtnl_lock to be locked, affecting the overall
>    system.
> 
> This patch uses cmpxchg to implement napi_enable(), which ensures that
> there will be no race due to the separation of clear two bits.
> 
> Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>

Why don't you just invert the order of clearing the bits:

diff --git a/net/core/dev.c b/net/core/dev.c
index a796754f75cc..706eca8112c1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6953,8 +6953,8 @@ void napi_enable(struct napi_struct *n)
 {
        BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
        smp_mb__before_atomic();
-       clear_bit(NAPI_STATE_SCHED, &n->state);
        clear_bit(NAPI_STATE_NPSVC, &n->state);
+       clear_bit(NAPI_STATE_SCHED, &n->state);
        if (n->dev->threaded && n->thread)
                set_bit(NAPI_STATE_THREADED, &n->state);
 }

That's simpler and symmetric with the disable path.


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-09-20 19:20 ` Jakub Kicinski
@ 2021-09-22  6:47   ` Xuan Zhuo
  2021-09-23 13:14     ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Xuan Zhuo @ 2021-09-22  6:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, linyunsheng@huawei.com,
	David S. Miller <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>,
	Daniel Borkmann <daniel@iogearbox.net>, Antoine Tenart <atenart@kernel.org>,
	Alexander Lobakin <alobakin@pm.me>, Wei Wang <weiwan@google.com>,
	Taehee Yoo <ap420073@gmail.com>, Björn Töpel <bjorn@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>, Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Neil Horman <nhorman@redhat.com>, Dust Li <dust.li@linux.alibaba.com>

On Mon, 20 Sep 2021 12:20:24 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Sat, 18 Sep 2021 16:52:32 +0800 Xuan Zhuo wrote:
> > The process will cause napi.state to contain NAPI_STATE_SCHED and
> > not in the poll_list, which will cause napi_disable() to get stuck.
> >
> > The prefix "NAPI_STATE_" is removed in the figure below, and
> > NAPI_STATE_HASHED is ignored in napi.state.
> >
> >                       CPU0       |                   CPU1       | napi.state
> > ===============================================================================
> > napi_disable()                   |                              | SCHED | NPSVC
> > napi_enable()                    |                              |
> > {                                |                              |
> >     smp_mb__before_atomic();     |                              |
> >     clear_bit(SCHED, &n->state); |                              | NPSVC
> >                                  | napi_schedule_prep()         | SCHED | NPSVC
> >                                  | napi_poll()                  |
> >                                  |   napi_complete_done()       |
> >                                  |   {                          |
> >                                  |      if (n->state & (NPSVC | | (1)
> >                                  |               _BUSY_POLL)))  |
> >                                  |           return false;      |
> >                                  |     ................         |
> >                                  |   }                          | SCHED | NPSVC
> >                                  |                              |
> >     clear_bit(NPSVC, &n->state); |                              | SCHED
> > }                                |                              |
> >                                  |                              |
> > napi_schedule_prep()             |                              | SCHED | MISSED (2)
> >
> > (1) Here return direct. Because of NAPI_STATE_NPSVC exists.
> > (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
> >
> > Since NAPI_STATE_SCHED already exists and napi is not in the
> > sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
> > exist.
> >
> > 1. This will cause this queue to no longer receive packets.
> > 2. If you encounter napi_disable under the protection of rtnl_lock, it
> >    will cause the entire rtnl_lock to be locked, affecting the overall
> >    system.
> >
> > This patch uses cmpxchg to implement napi_enable(), which ensures that
> > there will be no race due to the separation of clear two bits.
> >
> > Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>
> Why don't you just invert the order of clearing the bits:

I think it should be an atomic operation. The original two-step clearing
is itself problematic, so from that perspective this is more than just a
fix for this particular race.

Thanks.

>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a796754f75cc..706eca8112c1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6953,8 +6953,8 @@ void napi_enable(struct napi_struct *n)
>  {
>         BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>         smp_mb__before_atomic();
> -       clear_bit(NAPI_STATE_SCHED, &n->state);
>         clear_bit(NAPI_STATE_NPSVC, &n->state);
> +       clear_bit(NAPI_STATE_SCHED, &n->state);
>         if (n->dev->threaded && n->thread)
>                 set_bit(NAPI_STATE_THREADED, &n->state);
>  }
>
> That's simpler and symmetric with the disable path.


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-09-22  6:47   ` Xuan Zhuo
@ 2021-09-23 13:14     ` Jakub Kicinski
       [not found]       ` <1632404456.506512-1-xuanzhuo@linux.alibaba.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2021-09-23 13:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, linyunsheng, David S. Miller, Eric Dumazet, Neil Horman,
	Dust Li, Wei Wang

On Wed, 22 Sep 2021 14:47:47 +0800 Xuan Zhuo wrote:
> > Why don't you just invert the order of clearing the bits:  
> 
> I think it should be an atomic operation. The original two-step clear itself is
> problematic. So from this perspective, it is not just a solution to this
> problem.

[resending, my MUA seems to have corrupted the CC list previously]

Can you show what breaks by it being non-atomic?

Because, again, the disable part is not atomic. Either it's needed on
both sides or it's not needed on either.
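
(For reference, a simplified sketch of the non-atomic disable path being
discussed; illustrative only, details elided and not copied from any
particular tree:)

void napi_disable(struct napi_struct *n)
{
	might_sleep();
	set_bit(NAPI_STATE_DISABLE, &n->state);

	/* take ownership: spin until we win the SCHED bit */
	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
		msleep(1);
	/* then block netpoll separately via NPSVC */
	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
		msleep(1);

	hrtimer_cancel(&n->timer);

	clear_bit(NAPI_STATE_DISABLE, &n->state);
}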


* Re: [PATCH net v2] napi: fix race inside napi_enable
       [not found]       ` <1632404456.506512-1-xuanzhuo@linux.alibaba.com>
@ 2021-09-23 14:54         ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2021-09-23 14:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, linyunsheng, David S. Miller, Eric Dumazet, Neil Horman,
	Dust Li, Wei Wang

On Thu, 23 Sep 2021 21:40:56 +0800 Xuan Zhuo wrote:
> On Thu, 23 Sep 2021 06:14:17 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > > I think it should be an atomic operation. The original two-step clear itself is
> > > problematic. So from this perspective, it is not just a solution to this
> > > problem.  
> >
> > [resending, my MUA seems to have corrupted the CC list previously]
> >
> > Can you show what breaks by it being non-atomic?  
> 
> Isn't this problem caused precisely by the update not being atomic?
> 
> Of course, reordering the two clears does seem to solve this particular
> problem. But compared to switching to an atomic operation, we would also
> have to test for other problems that the reordering might introduce.
> 
> Relatively speaking, using an atomic operation is the simpler approach.

Whether atomics are simple or not is not the question.

What I'm saying is that having asymmetric enable and disable paths
is fragile.

> > Because, again, the disable part is not atomic. Either it's needed on
> > both sides or it's not needed on either.  
> 
> For the disable part, I think it's okay not to use atomic operations.
> Have you considered any special scenarios?

The point is both sides should do the same thing.
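
(Illustrative only: if napi_enable() moves to a cmpxchg() loop, a
symmetric napi_disable() could take SCHED and NPSVC in one atomic update
as well. A hypothetical sketch, not an actual patch:)

void napi_disable(struct napi_struct *n)
{
	unsigned long val, new;

	might_sleep();
	set_bit(NAPI_STATE_DISABLE, &n->state);

	for ( ; ; ) {
		val = READ_ONCE(n->state);
		if (val & (NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC)) {
			/* still owned by someone else, wait and retry */
			usleep_range(20, 200);
			continue;
		}

		new = val | NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC;
		if (cmpxchg(&n->state, val, new) == val)
			break;
	}

	hrtimer_cancel(&n->timer);
	clear_bit(NAPI_STATE_DISABLE, &n->state);
}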


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-09-18  8:52 [PATCH net v2] napi: fix race inside napi_enable Xuan Zhuo
  2021-09-20  8:50 ` patchwork-bot+netdevbpf
  2021-09-20 19:20 ` Jakub Kicinski
@ 2021-10-18 21:58 ` Sukadev Bhattiprolu
  2021-10-18 22:55   ` Jakub Kicinski
  2 siblings, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-18 21:58 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, linyunsheng, David S. Miller, Jakub Kicinski,
	Eric Dumazet, Daniel Borkmann, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

Xuan Zhuo [xuanzhuo@linux.alibaba.com] wrote:
> The process will cause napi.state to contain NAPI_STATE_SCHED and
> not in the poll_list, which will cause napi_disable() to get stuck.
> 
> The prefix "NAPI_STATE_" is removed in the figure below, and
> NAPI_STATE_HASHED is ignored in napi.state.
> 
>                       CPU0       |                   CPU1       | napi.state
> ===============================================================================
> napi_disable()                   |                              | SCHED | NPSVC
> napi_enable()                    |                              |
> {                                |                              |
>     smp_mb__before_atomic();     |                              |
>     clear_bit(SCHED, &n->state); |                              | NPSVC
>                                  | napi_schedule_prep()         | SCHED | NPSVC
>                                  | napi_poll()                  |
>                                  |   napi_complete_done()       |
>                                  |   {                          |
>                                  |      if (n->state & (NPSVC | | (1)
>                                  |               _BUSY_POLL)))  |
>                                  |           return false;      |
>                                  |     ................         |
>                                  |   }                          | SCHED | NPSVC
>                                  |                              |
>     clear_bit(NPSVC, &n->state); |                              | SCHED
> }                                |                              |

So it's possible that after CPU0 cleared SCHED, CPU1 could have set it
back, and we are going to use cmpxchg() to detect that and retry, right?
If so,


>                                  |                              |
> napi_schedule_prep()             |                              | SCHED | MISSED (2)
> 
> (1) Here return direct. Because of NAPI_STATE_NPSVC exists.
> (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
> 
> Since NAPI_STATE_SCHED already exists and napi is not in the
> sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
> exist.
> 
> 1. This will cause this queue to no longer receive packets.
> 2. If you encounter napi_disable under the protection of rtnl_lock, it
>    will cause the entire rtnl_lock to be locked, affecting the overall
>    system.
> 
> This patch uses cmpxchg to implement napi_enable(), which ensures that
> there will be no race due to the separation of clear two bits.
> 
> Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>  net/core/dev.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 74fd402d26dd..7ee9fecd3aff 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable);
>   */
>  void napi_enable(struct napi_struct *n)
>  {
> -	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> -	smp_mb__before_atomic();
> -	clear_bit(NAPI_STATE_SCHED, &n->state);
> -	clear_bit(NAPI_STATE_NPSVC, &n->state);
> -	if (n->dev->threaded && n->thread)
> -		set_bit(NAPI_STATE_THREADED, &n->state);
> +	unsigned long val, new;
> +
> +	do {
> +		val = READ_ONCE(n->state);
> +		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));

is this BUG_ON() valid/needed? We could have lost the cmpxchg() race and
the other thread could have set NAPI_STATE_SCHED?

Sukadev

> +
> +		new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC);
> +		if (n->dev->threaded && n->thread)
> +			new |= NAPIF_STATE_THREADED;
> +	} while (cmpxchg(&n->state, val, new) != val);
>  }
>  EXPORT_SYMBOL(napi_enable);
>  
> -- 
> 2.31.0


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-18 21:58 ` Sukadev Bhattiprolu
@ 2021-10-18 22:55   ` Jakub Kicinski
  2021-10-18 23:36     ` Dany Madden
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2021-10-18 22:55 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Xuan Zhuo, netdev, linyunsheng, David S. Miller, Eric Dumazet,
	Daniel Borkmann, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

On Mon, 18 Oct 2021 14:58:08 -0700 Sukadev Bhattiprolu wrote:
> >                       CPU0       |                   CPU1       | napi.state
> > ===============================================================================
> > napi_disable()                   |                              | SCHED | NPSVC
> > napi_enable()                    |                              |
> > {                                |                              |
> >     smp_mb__before_atomic();     |                              |
> >     clear_bit(SCHED, &n->state); |                              | NPSVC
> >                                  | napi_schedule_prep()         | SCHED | NPSVC
> >                                  | napi_poll()                  |
> >                                  |   napi_complete_done()       |
> >                                  |   {                          |
> >                                  |      if (n->state & (NPSVC | | (1)
> >                                  |               _BUSY_POLL)))  |
> >                                  |           return false;      |
> >                                  |     ................         |
> >                                  |   }                          | SCHED | NPSVC
> >                                  |                              |
> >     clear_bit(NPSVC, &n->state); |                              | SCHED
> > }                                |                              |  
> 
> So its possible that after cpu0 cleared SCHED, cpu1 could have set it
> back and we are going to use cmpxchg() to detect and retry right? If so,

This is a state diagram before the change. There's no cmpxchg() here.
        
> > napi_schedule_prep()             |                              | SCHED | MISSED (2)
> > 
> > (1) Here return direct. Because of NAPI_STATE_NPSVC exists.
> > (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
> > 
> > Since NAPI_STATE_SCHED already exists and napi is not in the
> > sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
> > exist.
> > 
> > 1. This will cause this queue to no longer receive packets.
> > 2. If you encounter napi_disable under the protection of rtnl_lock, it
> >    will cause the entire rtnl_lock to be locked, affecting the overall
> >    system.
> > 
> > This patch uses cmpxchg to implement napi_enable(), which ensures that
> > there will be no race due to the separation of clear two bits.
> > 
> > Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >  net/core/dev.c | 16 ++++++++++------
> >  1 file changed, 10 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 74fd402d26dd..7ee9fecd3aff 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable);
> >   */
> >  void napi_enable(struct napi_struct *n)
> >  {
> > -	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
> > -	smp_mb__before_atomic();
> > -	clear_bit(NAPI_STATE_SCHED, &n->state);
> > -	clear_bit(NAPI_STATE_NPSVC, &n->state);
> > -	if (n->dev->threaded && n->thread)
> > -		set_bit(NAPI_STATE_THREADED, &n->state);
> > +	unsigned long val, new;
> > +
> > +	do {
> > +		val = READ_ONCE(n->state);
> > +		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));  
> 
> is this BUG_ON valid/needed? We could have lost the cmpxchg() and
> the other thread could have set NAPI_STATE_SCHED?

The BUG_ON() is here to make sure that when napi_enable() is called the
napi instance was dormant, i.e. disabled. We have "STATE_SCHED" bit set
on disabled NAPIs because that bit means ownership. Whoever disabled
the NAPI owns it.

That BUG_ON() could have been taken outside of the loop, there's no
point re-checking on every try. 
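
(i.e. something like the following, a sketch of the same patch with the
check hoisted out of the retry loop; illustrative only:)

void napi_enable(struct napi_struct *n)
{
	unsigned long val, new;

	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));

	do {
		val = READ_ONCE(n->state);
		new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC);
		if (n->dev->threaded && n->thread)
			new |= NAPIF_STATE_THREADED;
	} while (cmpxchg(&n->state, val, new) != val);
}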

Are you seeing NAPI-related failures? We had at least 3 reports in the
last two weeks of strange failures which look like NAPI state getting
corrupted on net-next...


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-18 22:55   ` Jakub Kicinski
@ 2021-10-18 23:36     ` Dany Madden
  2021-10-18 23:47       ` Jakub Kicinski
  2021-10-22  3:16       ` Sukadev Bhattiprolu
  0 siblings, 2 replies; 13+ messages in thread
From: Dany Madden @ 2021-10-18 23:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Sukadev Bhattiprolu, Xuan Zhuo, netdev, linyunsheng,
	David S. Miller, Eric Dumazet, Daniel Borkmann, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Taehee Yoo, Björn Töpel,
	Arnd Bergmann, Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

On 2021-10-18 15:55, Jakub Kicinski wrote:
> On Mon, 18 Oct 2021 14:58:08 -0700 Sukadev Bhattiprolu wrote:
>> >                       CPU0       |                   CPU1       | napi.state
>> > ===============================================================================
>> > napi_disable()                   |                              | SCHED | NPSVC
>> > napi_enable()                    |                              |
>> > {                                |                              |
>> >     smp_mb__before_atomic();     |                              |
>> >     clear_bit(SCHED, &n->state); |                              | NPSVC
>> >                                  | napi_schedule_prep()         | SCHED | NPSVC
>> >                                  | napi_poll()                  |
>> >                                  |   napi_complete_done()       |
>> >                                  |   {                          |
>> >                                  |      if (n->state & (NPSVC | | (1)
>> >                                  |               _BUSY_POLL)))  |
>> >                                  |           return false;      |
>> >                                  |     ................         |
>> >                                  |   }                          | SCHED | NPSVC
>> >                                  |                              |
>> >     clear_bit(NPSVC, &n->state); |                              | SCHED
>> > }                                |                              |
>> 
>> So its possible that after cpu0 cleared SCHED, cpu1 could have set it
>> back and we are going to use cmpxchg() to detect and retry right? If 
>> so,
> 
> This is a state diagram before the change. There's no cmpxchg() here.
> 
>> > napi_schedule_prep()             |                              | SCHED | MISSED (2)
>> >
>> > (1) Here return direct. Because of NAPI_STATE_NPSVC exists.
>> > (2) NAPI_STATE_SCHED exists. So not add napi.poll_list to sd->poll_list
>> >
>> > Since NAPI_STATE_SCHED already exists and napi is not in the
>> > sd->poll_list queue, NAPI_STATE_SCHED cannot be cleared and will always
>> > exist.
>> >
>> > 1. This will cause this queue to no longer receive packets.
>> > 2. If you encounter napi_disable under the protection of rtnl_lock, it
>> >    will cause the entire rtnl_lock to be locked, affecting the overall
>> >    system.
>> >
>> > This patch uses cmpxchg to implement napi_enable(), which ensures that
>> > there will be no race due to the separation of clear two bits.
>> >
>> > Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable")
>> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>> > ---
>> >  net/core/dev.c | 16 ++++++++++------
>> >  1 file changed, 10 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/net/core/dev.c b/net/core/dev.c
>> > index 74fd402d26dd..7ee9fecd3aff 100644
>> > --- a/net/core/dev.c
>> > +++ b/net/core/dev.c
>> > @@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable);
>> >   */
>> >  void napi_enable(struct napi_struct *n)
>> >  {
>> > -	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
>> > -	smp_mb__before_atomic();
>> > -	clear_bit(NAPI_STATE_SCHED, &n->state);
>> > -	clear_bit(NAPI_STATE_NPSVC, &n->state);
>> > -	if (n->dev->threaded && n->thread)
>> > -		set_bit(NAPI_STATE_THREADED, &n->state);
>> > +	unsigned long val, new;
>> > +
>> > +	do {
>> > +		val = READ_ONCE(n->state);
>> > +		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val));
>> 
>> is this BUG_ON valid/needed? We could have lost the cmpxchg() and
>> the other thread could have set NAPI_STATE_SCHED?
> 
> The BUG_ON() is here to make sure that when napi_enable() is called the
> napi instance was dormant, i.e. disabled. We have "STATE_SCHED" bit set
> on disabled NAPIs because that bit means ownership. Whoever disabled
> the NAPI owns it.
> 
> That BUG_ON() could have been taken outside of the loop, there's no
> point re-checking on every try.
> 
> Are you seeing NAPI-related failures? We had at least 3 reports in the
> last two weeks of strange failures which look like NAPI state getting
> corrupted on net-next...

We hit two napi-related crashes while attempting an MTU size change.

1st crash:
[430425.020051] ------------[ cut here ]------------
[430425.020053] kernel BUG at ../net/core/dev.c:6938!
[430425.020057] Oops: Exception in kernel mode, sig: 5 [#1]
[430425.020068] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[430425.020075] Modules linked in: binfmt_misc rpadlpar_io rpaphp 
xt_tcpudp iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core 
ibmveth vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress 
lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror 
dm_region_hash dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
[430425.020123] CPU: 0 PID: 34337 Comm: kworker/0:3 Kdump: loaded 
Tainted: G    W     5.15.0-rc2-suka-00486-gce916130f5f6 #3
[430425.020133] Workqueue: events_long __ibmvnic_reset [ibmvnic]
[430425.020145] NIP: c000000000cb03f4 LR: c0080000014a4ce8 CTR: 
c000000000cb03b0
[430425.020151] REGS: c00000002e5d37e0 TRAP: 0700  Tainted: G    W     
(5.15.0-rc2-suka-00486-gce916130f5f6)
[430425.020159] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> 
CR: 28248428 XER: 20000001
[430425.020176] CFAR: c0080000014ad9cc IRQMASK: 0
         GPR00: c0080000014a4ce8 c00000002e5d3a80 c000000001b12100 
c0000001274f3190
         GPR04: 00000000ffff36dc fffffffffffffff6 0000000000000019 
0000000000000010
         GPR08: c00000002ec48900 0000000000000001 c0000001274f31a0 
c0080000014ad9b8
         GPR12: c000000000cb03b0 c000000001d00000 0000000080000000 
00000000000003fe
         GPR16: 00000000000006e3 0000000000000000 0000000000000008 
c00000002ec48af8
         GPR20: c00000002ec48db0 0000000000000000 0000000000000004 
0000000000000000
         GPR24: c00000002ec48000 0000000000000004 c00000002ec49070 
0000000000000006
         GPR28: c00000002ec48900 c00000002ec48900 0000000000000002 
c00000002ec48000
[430425.020248] NIP [c000000000cb03f4] napi_enable+0x44/0xc0
[430425.020257] LR [c0080000014a4ce8] __ibmvnic_open+0xf0/0x440 
[ibmvnic]
[430425.020265] Call Trace:
[430425.020269] [c00000002e5d3a80] [c00000002ec48900] 0xc00000002ec48900 
(unreliable)
[430425.020277] [c00000002e5d3ab0] [c0080000014a4f40] 
__ibmvnic_open+0x348/0x440 [ibmvnic]
[430425.020286] [c00000002e5d3b40] [c0080000014ace58] 
__ibmvnic_reset+0xb10/0xe40 [ibmvnic]
[430425.020296] [c00000002e5d3c60] [c0000000001673a4] 
process_one_work+0x2d4/0x5d0
[430425.020305] [c00000002e5d3d00] [c000000000167718] 
worker_thread+0x78/0x6c0
[430425.020314] [c00000002e5d3da0] [c000000000173388] 
kthread+0x188/0x190
[430425.020322] [c00000002e5d3e10] [c00000000000cee4] 
ret_from_kernel_thread+0x5c/0x64
[430425.020331] Instruction dump:
[430425.020335] 38a0fff6 39430010 e92d0c80 f9210028 39200000 60000000 
60000000 e9030010
[430425.020348] f9010020 e9210020 7d2948f8 792907e0 <0b090000> e9230038 
7d072838 89290889
[430425.020364] ---[ end trace 3abb5ec5589518ca ]---
[430425.068100]
[430425.068108] Sending IPI to other CPUs
[430425.068206] IPI complete
[430425.090333] kexec: Starting switchover sequence.

2nd crash:
[ 1526.539335] NAPI poll function 0x96b6b00f7adfd returned 0, exceeding 
its budget of -49738736.
[ 1526.539349] BUG: Kernel NULL pointer dereference on read at 
0x00000000
[ 1526.539354] Faulting instruction address: 0xc000000000cc4054
[ 1526.539358] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1526.539376] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1526.539390] Modules linked in: rpadlpar_io rpaphp xt_tcpudp 
iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core ibmveth 
vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress 
lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror 
dm_region_hash dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
[ 1526.539469] CPU: 0 PID: 11 Comm: ksoftirqd/0 Kdump: loaded Not 
tainted 5.15.0-rc2-suka-00489-gd86e74e0e2e9 #4
[ 1526.539484] NIP: c000000000cc4054 LR: c000000000cc4494 CTR: 
c00000000089b9f0
[ 1526.539495] REGS: c00000000652b790 TRAP: 0380  Not tainted 
(5.15.0-rc2-suka-00489-gd86e74e0e2e9)
[ 1526.539506] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> 
CR: 28004224 XER: 20000001
[ 1526.539538] CFAR: c000000000cc4490 IRQMASK: 0
         GPR00: c000000000cc4494 c00000000652ba30 c000000001b12100 
c0000003fd090d08
         GPR04: 00000000fffeffff c00000000652b800 0000000000000027 
c0000003fd007e08
         GPR08: 0000000000000023 0000000000000000 0000000000000027 
0000000000000001
         GPR12: 0000000028004224 c000000001d00000 c0000000064ceb00 
0000000000000001
         GPR16: 0000000000000000 c000000001b33a00 000000010001df1f 
0000000000000003
         GPR20: c0000003fd090c00 0000000000000026 ffffffffffffffff 
0000000000000000
         GPR24: c000000001b33a00 c00000000652bba8 000000010001df20 
c00000000652ba58
         GPR28: c00000000652bb97 fffffffffd090c10 c0000003fd090d08 
0000000000000000
[ 1526.539661] NIP [c000000000cc4054] 
netif_receive_skb_list_internal+0x54/0x340
[ 1526.539675] LR [c000000000cc4494] gro_normal_list.part.149+0x34/0x60
[ 1526.539687] Call Trace:
[ 1526.539696] [c00000000652ba30] [c0000000001ca308] 
vprintk_emit+0xe8/0x2b0 (unreliable)
[ 1526.539713] [c00000000652bab0] [c000000000cc4494] 
gro_normal_list.part.149+0x34/0x60
[ 1526.539727] [c00000000652bae0] [c000000000cc6340] 
__napi_poll+0x250/0x330
[ 1526.539741] [c00000000652bb70] [c000000000cc697c] 
net_rx_action+0x31c/0x370
[ 1526.539754] [c00000000652bc20] [c000000000f3c8fc] 
__do_softirq+0x16c/0x420
[ 1526.539768] [c00000000652bd20] [c000000000147654] 
run_ksoftirqd+0x64/0x90
[ 1526.539783] [c00000000652bd40] [c00000000017955c] 
smpboot_thread_fn+0x21c/0x260
[ 1526.539796] [c00000000652bda0] [c000000000173388] kthread+0x188/0x190
[ 1526.539812] [c00000000652be10] [c00000000000cee4] 
ret_from_kernel_thread+0x5c/0x64
[ 1526.539827] Instruction dump:
[ 1526.539835] fbe1fff8 7c7e1b78 f8010010 f821ff81 ebe30000 3b610028 
e92d0c80 f9210048
[ 1526.539858] 39200000 fb610028 fb610030 7fa3f840 <eb9f0000> 419e004c 
7ffdfb78 60000000
[ 1526.539884] ---[ end trace e9681bc32653835d ]---
[ 1526.559758]
[ 1526.559766] Sending IPI to other CPUs
[ 1526.559858] IPI complete
[ 1526.581983] kexec: Starting switchover sequence.


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-18 23:36     ` Dany Madden
@ 2021-10-18 23:47       ` Jakub Kicinski
  2021-10-19  0:01         ` Sukadev Bhattiprolu
  2021-10-22  3:16       ` Sukadev Bhattiprolu
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2021-10-18 23:47 UTC (permalink / raw)
  To: Dany Madden
  Cc: Sukadev Bhattiprolu, Xuan Zhuo, netdev, linyunsheng,
	David S. Miller, Eric Dumazet, Daniel Borkmann, Antoine Tenart,
	Alexander Lobakin, Wei Wang, Taehee Yoo, Björn Töpel,
	Arnd Bergmann, Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

On Mon, 18 Oct 2021 16:36:36 -0700 Dany Madden wrote:
> > The BUG_ON() is here to make sure that when napi_enable() is called the
> > napi instance was dormant, i.e. disabled. We have "STATE_SCHED" bit set
> > on disabled NAPIs because that bit means ownership. Whoever disabled
> > the NAPI owns it.
> > 
> > That BUG_ON() could have been taken outside of the loop, there's no
> > point re-checking on every try.
> > 
> > Are you seeing NAPI-related failures? We had at least 3 reports in the
> > last two weeks of strange failures which look like NAPI state getting
> > corrupted on net-next...  
> 
> We hit two napi related crashes while attempting mtu size change.

Is it reproducible or happens rarely and randomly?


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-18 23:47       ` Jakub Kicinski
@ 2021-10-19  0:01         ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-19  0:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Dany Madden, Xuan Zhuo, netdev, linyunsheng, David S. Miller,
	Eric Dumazet, Daniel Borkmann, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

Jakub Kicinski [kuba@kernel.org] wrote:
> On Mon, 18 Oct 2021 16:36:36 -0700 Dany Madden wrote:
> > > The BUG_ON() is here to make sure that when napi_enable() is called the
> > > napi instance was dormant, i.e. disabled. We have "STATE_SCHED" bit set
> > > on disabled NAPIs because that bit means ownership. Whoever disabled
> > > the NAPI owns it.
> > > 
> > > That BUG_ON() could have been taken outside of the loop, there's no
> > > point re-checking on every try.
> > > 
> > > Are you seeing NAPI-related failures? We had at least 3 reports in the
> > > last two weeks of strange failures which look like NAPI state getting
> > > corrupted on net-next...  
> > 
> > We hit two napi related crashes while attempting mtu size change.

BTW these are with a couple of bug fixes in the ibmvnic driver applied
on top of 1e0083bd0777 ("gve: DQO: avoid unused variable warnings").

We are trying to narrow it down to the following change, which is
required to be able to change the MTU on ibmvnic.

> 
> Is it reproducible or happens rarely and randomly?

Random. We have been testing for a couple of weeks but hit both today.

Sukadev

---
index 8f17096e614d..a1533979c670 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1914,8 +1914,6 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
        ind_bufp = &tx_scrq->ind_buf;
 
        if (test_bit(0, &adapter->resetting)) {
-               if (!netif_subqueue_stopped(netdev, skb))
-                       netif_stop_subqueue(netdev, queue_num);
                dev_kfree_skb_any(skb);
 


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-18 23:36     ` Dany Madden
  2021-10-18 23:47       ` Jakub Kicinski
@ 2021-10-22  3:16       ` Sukadev Bhattiprolu
  2021-10-25 17:36         ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2021-10-22  3:16 UTC (permalink / raw)
  To: Dany Madden
  Cc: Jakub Kicinski, Xuan Zhuo, netdev, linyunsheng, David S. Miller,
	Eric Dumazet, Daniel Borkmann, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

Dany Madden [drt@linux.ibm.com] wrote:
> 
> We hit two napi related crashes while attempting mtu size change.
> 
> 1st crash:
> [430425.020051] ------------[ cut here ]------------
> [430425.020053] kernel BUG at ../net/core/dev.c:6938!
> [430425.020057] Oops: Exception in kernel mode, sig: 5 [#1]
> [430425.020068] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [430425.020075] Modules linked in: binfmt_misc rpadlpar_io rpaphp xt_tcpudp
> iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core ibmveth
> vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress
> lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror dm_region_hash
> dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> [430425.020123] CPU: 0 PID: 34337 Comm: kworker/0:3 Kdump: loaded Tainted: G
> W     5.15.0-rc2-suka-00486-gce916130f5f6 #3
> [430425.020133] Workqueue: events_long __ibmvnic_reset [ibmvnic]
> [430425.020145] NIP: c000000000cb03f4 LR: c0080000014a4ce8 CTR:
> c000000000cb03b0
> [430425.020151] REGS: c00000002e5d37e0 TRAP: 0700  Tainted: G    W
> (5.15.0-rc2-suka-00486-gce916130f5f6)
> [430425.020159] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
> 28248428 XER: 20000001
> [430425.020176] CFAR: c0080000014ad9cc IRQMASK: 0
>         GPR00: c0080000014a4ce8 c00000002e5d3a80 c000000001b12100
> c0000001274f3190
>         GPR04: 00000000ffff36dc fffffffffffffff6 0000000000000019
> 0000000000000010
>         GPR08: c00000002ec48900 0000000000000001 c0000001274f31a0
> c0080000014ad9b8
>         GPR12: c000000000cb03b0 c000000001d00000 0000000080000000
> 00000000000003fe
>         GPR16: 00000000000006e3 0000000000000000 0000000000000008
> c00000002ec48af8
>         GPR20: c00000002ec48db0 0000000000000000 0000000000000004
> 0000000000000000
>         GPR24: c00000002ec48000 0000000000000004 c00000002ec49070
> 0000000000000006
>         GPR28: c00000002ec48900 c00000002ec48900 0000000000000002
> c00000002ec48000
> [430425.020248] NIP [c000000000cb03f4] napi_enable+0x44/0xc0
> [430425.020257] LR [c0080000014a4ce8] __ibmvnic_open+0xf0/0x440 [ibmvnic]
> [430425.020265] Call Trace:
> [430425.020269] [c00000002e5d3a80] [c00000002ec48900] 0xc00000002ec48900
> (unreliable)
> [430425.020277] [c00000002e5d3ab0] [c0080000014a4f40]
> __ibmvnic_open+0x348/0x440 [ibmvnic]
> [430425.020286] [c00000002e5d3b40] [c0080000014ace58]
> __ibmvnic_reset+0xb10/0xe40 [ibmvnic]
> [430425.020296] [c00000002e5d3c60] [c0000000001673a4]
> process_one_work+0x2d4/0x5d0
> [430425.020305] [c00000002e5d3d00] [c000000000167718]
> worker_thread+0x78/0x6c0
> [430425.020314] [c00000002e5d3da0] [c000000000173388] kthread+0x188/0x190
> [430425.020322] [c00000002e5d3e10] [c00000000000cee4]
> ret_from_kernel_thread+0x5c/0x64
> [430425.020331] Instruction dump:
> [430425.020335] 38a0fff6 39430010 e92d0c80 f9210028 39200000 60000000
> 60000000 e9030010
> [430425.020348] f9010020 e9210020 7d2948f8 792907e0 <0b090000> e9230038
> 7d072838 89290889
> [430425.020364] ---[ end trace 3abb5ec5589518ca ]---
> [430425.068100]
> [430425.068108] Sending IPI to other CPUs
> [430425.068206] IPI complete
> [430425.090333] kexec: Starting switchover sequence.

Jakub,

We hit this napi_enable() BUG_ON() crash three times this week. In two
of those cases it appears that

	napi->state = netdev_priv(netdev)

i.e. it contains the ibmvnic_adapter * in our case.

	# Crash was on eth3

	crash> net |grep eth3
	c00000002e948000  eth3   10.1.194.173

	crash> net_device |grep SIZE
	SIZE: 2304

	crash> px 2304
	$1 = 0x900

	crash> ibmvnic_adapter c00000002e948900 |grep napi
	  napi = 0xc00000003b7dc000,
	  num_active_rx_napi = 8,
	  napi_enabled = false,

	crash> napi_struct 0xc00000003b7dc000 |grep state
	  state = 13835058056063650048,
	    state = 0 '\000',

	crash> px 13835058056063650048
	$2 = 0xc00000002e948900		#eth3 ibmvnic_adapter above

In the third case napi->state was 16 (i.e. NAPI_STATE_SCHED was clear,
hence the BUG_ON() in napi_enable()).

Let us know if any other fields are of interest. Do we have any clues on
when this started?

Sukadev


* Re: [PATCH net v2] napi: fix race inside napi_enable
  2021-10-22  3:16       ` Sukadev Bhattiprolu
@ 2021-10-25 17:36         ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2021-10-25 17:36 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Dany Madden, Xuan Zhuo, netdev, linyunsheng, David S. Miller,
	Eric Dumazet, Daniel Borkmann, Antoine Tenart, Alexander Lobakin,
	Wei Wang, Taehee Yoo, Björn Töpel, Arnd Bergmann,
	Kumar Kartikeya Dwivedi, Neil Horman, Dust Li

On Thu, 21 Oct 2021 20:16:04 -0700 Sukadev Bhattiprolu wrote:
> Let us know if any other fields are of interest. Do we have any clues on
> when this started?

I think this is the first of a series of similar, hard-to-explain
reports:

https://lore.kernel.org/all/000000000000c1524005cdeacc5f@google.com/

