* [PATCH net v4] net: fix race between napi kthread mode and busy poll
@ 2021-03-16 22:36 Wei Wang
  2021-03-17 18:58 ` Jakub Kicinski
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Wei Wang @ 2021-03-16 22:36 UTC
  To: David S . Miller, Jakub Kicinski, netdev
  Cc: Martin Zaharinov, Alexander Duyck, Eric Dumazet, Paolo Abeni,
	Hannes Frederic Sowa

Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.
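
For readers following the description above, here is a minimal userspace
sketch of the ownership problem. It is not kernel code: it assumes C11
atomics in place of the kernel bit operations, and every sketch_* name and
bit value is made up for illustration. It also leaves out the thread->state
check and the woken flag that the actual patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values only; they do not match the kernel enum. */
enum {
	SKETCH_SCHED          = 1u << 0,	/* someone owns the instance */
	SKETCH_SCHED_THREADED = 1u << 1,	/* the owner is the kthread */
};

struct sketch_napi {
	atomic_uint state;
};

/* Any poller (busy poll, disable path, ...) claims the instance by
 * atomically setting SCHED. Returns true if the caller won the bit.
 */
static bool sketch_try_claim(struct sketch_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, SKETCH_SCHED);

	return !(old & SKETCH_SCHED);
}

/* Scheduling on behalf of the kthread: claim SCHED, then also mark
 * SCHED_THREADED so the kthread can tell the claim was made for it.
 */
static bool sketch_schedule_threaded(struct sketch_napi *n)
{
	if (!sketch_try_claim(n))
		return false;
	atomic_fetch_or(&n->state, SKETCH_SCHED_THREADED);
	return true;
}

/* Completion drops both bits, mirroring the napi_complete_done() hunk. */
static void sketch_complete(struct sketch_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(SKETCH_SCHED | SKETCH_SCHED_THREADED));
}

/* Old check: the kthread assumes ownership whenever SCHED is set. */
static bool sketch_kthread_owns_old(struct sketch_napi *n)
{
	return (atomic_load(&n->state) & SKETCH_SCHED) != 0;
}

/* New check: the kthread only proceeds if SCHED_THREADED is set. */
static bool sketch_kthread_owns_new(struct sketch_napi *n)
{
	return (atomic_load(&n->state) & SKETCH_SCHED_THREADED) != 0;
}
```

If a busy poller wins sketch_try_claim(), the old check still reports that
the kthread owns the instance, while the new check does not. After
sketch_schedule_threaded() succeeds, both checks agree.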

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
Change since v3:
  - Add READ_ONCE() for thread->state and add comments in
    ____napi_schedule().
  
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b67ea89d5f2..87a5d186faff 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,6 +360,7 @@ enum {
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
+	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
 };
 
 enum {
@@ -372,6 +373,7 @@ enum {
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
+	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..d3195a95f30e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 		 */
 		thread = READ_ONCE(napi->thread);
 		if (thread) {
+			/* Avoid doing set_bit() if the thread is in
+			 * INTERRUPTIBLE state, cause napi_thread_wait()
+			 * makes sure to proceed with napi polling
+			 * if the thread is explicitly woken from here.
+			 */
+			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
+				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
 			wake_up_process(thread);
 			return;
 		}
@@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
 
 		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+			      NAPIF_STATE_SCHED_THREADED |
 			      NAPIF_STATE_PREFER_BUSY_POLL);
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
@@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 
 static int napi_thread_wait(struct napi_struct *napi)
 {
+	bool woken = false;
+
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+		/* Testing SCHED_THREADED bit here to make sure the current
+		 * kthread owns this napi and could poll on this napi.
+		 * Testing SCHED bit is not enough because SCHED bit might be
+		 * set by some other busy poll thread or by napi_disable().
+		 */
+		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
 			WARN_ON(!list_empty(&napi->poll_list));
 			__set_current_state(TASK_RUNNING);
 			return 0;
 		}
 
 		schedule();
+		/* woken being true indicates this thread owns this napi. */
+		woken = true;
 		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	__set_current_state(TASK_RUNNING);
-- 
2.31.0.rc2.261.g7f71774620-goog
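
The two net/core/dev.c hunks above cooperate: the scheduling side skips
set_bit() when the kthread is sleeping, and the waiting side treats a
wake-up as proof of ownership. The following is a rough single-threaded
model of that handoff, assuming plain fields instead of task state and
napi->state. All sketch_* names are hypothetical, and memory ordering and
real concurrency are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task states the kthread moves between. */
enum sketch_task_state {
	SKETCH_TASK_RUNNING,
	SKETCH_TASK_INTERRUPTIBLE,
};

struct sketch_kthread {
	enum sketch_task_state state;
	bool sched_threaded;	/* models NAPI_STATE_SCHED_THREADED */
};

/* Scheduling side, modelled on the ____napi_schedule() hunk: the bit is
 * only set when the thread is not sleeping. A sleeping thread learns
 * about the work from the wake-up itself.
 */
static void sketch_schedule(struct sketch_kthread *t)
{
	if (t->state != SKETCH_TASK_INTERRUPTIBLE)
		t->sched_threaded = true;
	t->state = SKETCH_TASK_RUNNING;	/* stands in for wake_up_process() */
}

/* Waiting side: the condition tested inside the napi_thread_wait() loop.
 * woken is true once the thread has returned from schedule().
 */
static bool sketch_should_poll(const struct sketch_kthread *t, bool woken)
{
	return t->sched_threaded || woken;
}
```

In this model, a thread that was asleep when sketch_schedule() ran polls
because woken is true, and a thread that was still busy polls on its next
pass through the wait loop because the bit was set for it.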



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-16 22:36 [PATCH net v4] net: fix race between napi kthread mode and busy poll Wei Wang
@ 2021-03-17 18:58 ` Jakub Kicinski
  2021-03-17 21:50 ` patchwork-bot+netdevbpf
  2021-03-20  8:45 ` Martin Zaharinov
  2 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2021-03-17 18:58 UTC
  To: Wei Wang
  Cc: David S . Miller, netdev, Martin Zaharinov, Alexander Duyck,
	Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa

On Tue, 16 Mar 2021 15:36:47 -0700 Wei Wang wrote:
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
> 
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-16 22:36 [PATCH net v4] net: fix race between napi kthread mode and busy poll Wei Wang
  2021-03-17 18:58 ` Jakub Kicinski
@ 2021-03-17 21:50 ` patchwork-bot+netdevbpf
  2021-03-20  8:45 ` Martin Zaharinov
  2 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-03-17 21:50 UTC
  To: Wei Wang
  Cc: davem, kuba, netdev, micron10, alexanderduyck, edumazet, pabeni, hannes

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Tue, 16 Mar 2021 15:36:47 -0700 you wrote:
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
> 
> [...]

Here is the summary with links:
  - [net,v4] net: fix race between napi kthread mode and busy poll
    https://git.kernel.org/netdev/net/c/cb038357937e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-16 22:36 [PATCH net v4] net: fix race between napi kthread mode and busy poll Wei Wang
  2021-03-17 18:58 ` Jakub Kicinski
  2021-03-17 21:50 ` patchwork-bot+netdevbpf
@ 2021-03-20  8:45 ` Martin Zaharinov
  2021-03-20  9:55   ` Eric Dumazet
  2 siblings, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-03-20  8:45 UTC
  To: Wei Wang
  Cc: David S . Miller, Jakub Kicinski, netdev, Alexander Duyck,
	Eric Dumazet, Paolo Abeni, Hannes Frederic Sowa

Hi Wei,
Please check this warning:

[   39.706567] ------------[ cut here ]------------
[   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
[   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
[   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
[   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
[   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
[   39.706619] Workqueue: events work_for_cpu_fn
[   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
[   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
[   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
[   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
[   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
[   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
[   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
[   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
[   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
[   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   39.706656] Call Trace:
[   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
[   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
[   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
[   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
[   39.706716]  ? __kmalloc+0x37/0x160
[   39.706720]  ? kmem_cache_alloc+0xcb/0x120
[   39.706723]  ? irq_get_irq_data+0x5/0x20
[   39.706726]  ? mp_check_pin_attr+0xe/0xf0
[   39.706729]  ? irq_get_irq_data+0x5/0x20
[   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
[   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
[   39.706739]  ? pci_conf1_read+0x9f/0xf0
[   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
[   39.706746]  local_pci_probe+0x1b/0x40
[   39.706750]  work_for_cpu_fn+0xb/0x20
[   39.706754]  process_one_work+0x1ec/0x350
[   39.706758]  worker_thread+0x24b/0x4d0
[   39.706760]  ? process_one_work+0x350/0x350
[   39.706762]  kthread+0xea/0x120
[   39.706766]  ? kthread_park+0x80/0x80
[   39.706770]  ret_from_fork+0x1f/0x30
[   39.706774] ---[ end trace 7a203f3ec972a377 ]---

Martin
	




* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-20  8:45 ` Martin Zaharinov
@ 2021-03-20  9:55   ` Eric Dumazet
  2021-03-20 10:31     ` Martin Zaharinov
  2021-03-30  9:25     ` Martin Zaharinov
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2021-03-20  9:55 UTC
  To: Martin Zaharinov
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa

On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
> Check this:
>
> [   39.706567] ------------[ cut here ]------------
> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100

Probably more relevant to Intel maintainers than Wei :/

> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> [   39.706619] Workqueue: events work_for_cpu_fn
> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   39.706656] Call Trace:
> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> [   39.706716]  ? __kmalloc+0x37/0x160
> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> [   39.706746]  local_pci_probe+0x1b/0x40
> [   39.706750]  work_for_cpu_fn+0xb/0x20
> [   39.706754]  process_one_work+0x1ec/0x350
> [   39.706758]  worker_thread+0x24b/0x4d0
> [   39.706760]  ? process_one_work+0x350/0x350
> [   39.706762]  kthread+0xea/0x120
> [   39.706766]  ? kthread_park+0x80/0x80
> [   39.706770]  ret_from_fork+0x1f/0x30
> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>
> Martin
>
>


* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-20  9:55   ` Eric Dumazet
@ 2021-03-20 10:31     ` Martin Zaharinov
  2021-03-30  9:25     ` Martin Zaharinov
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Zaharinov @ 2021-03-20 10:31 UTC
  To: Eric Dumazet
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa

Hi Eric,
Maybe. I wrote to Wei to check anyway.

There is one other problem, possibly for the network team; I don't know how closely it is related.

Mar 20 06:06:28 [367562.703896][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:33 [367567.824137][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:39 [367572.944079][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:44 [367578.064217][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:49 [367583.184378][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:54 [367588.304470][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:59 [367593.414634][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:04 [367598.534996][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:09 [367603.664872][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:14 [367608.785017][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:20 [367613.905101][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:25 [367619.025236][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:30 [367624.145448][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:35 [367629.265489][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:40 [367634.385630][T1217504] team0: Failed to send options change via netlink (err -105)


When this happens, connections to the server stop working.


Martin	

> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Wei
>> Check this:
>> 
>> [   39.706567] ------------[ cut here ]------------
>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> 
> Probably more relevant to Intel maintainers than Wei :/
> 
>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>> [   39.706619] Workqueue: events work_for_cpu_fn
>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [   39.706656] Call Trace:
>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>> [   39.706716]  ? __kmalloc+0x37/0x160
>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>> [   39.706746]  local_pci_probe+0x1b/0x40
>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>> [   39.706754]  process_one_work+0x1ec/0x350
>> [   39.706758]  worker_thread+0x24b/0x4d0
>> [   39.706760]  ? process_one_work+0x350/0x350
>> [   39.706762]  kthread+0xea/0x120
>> [   39.706766]  ? kthread_park+0x80/0x80
>> [   39.706770]  ret_from_fork+0x1f/0x30
>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>> 
>> Martin
>> 
>> 
>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>> 
>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>> determine if the kthread owns this napi and could call napi->poll() on
>>> it. However, if socket busy poll is enabled, it is possible that the
>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>> This patch tries to fix this race by adding a new bit
>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>> bit is set. This helps distinguish the ownership of the napi between
>>> kthread and other scenarios and fixes the race issue.
>>> 
>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>> ---
>>> Change since v3:
>>> - Add READ_ONCE() for thread->state and add comments in
>>>   ____napi_schedule().
>>> 
>>> include/linux/netdevice.h |  2 ++
>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 5b67ea89d5f2..87a5d186faff 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -360,6 +360,7 @@ enum {
>>>      NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>      NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>      NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>> };
>>> 
>>> enum {
>>> @@ -372,6 +373,7 @@ enum {
>>>      NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>      NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>      NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>> };
>>> 
>>> enum gro_result {
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 6c5967e80132..d3195a95f30e 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>               */
>>>              thread = READ_ONCE(napi->thread);
>>>              if (thread) {
>>> +                     /* Avoid doing set_bit() if the thread is in
>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>> +                      * makes sure to proceed with napi polling
>>> +                      * if the thread is explicitly woken from here.
>>> +                      */
>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>                      wake_up_process(thread);
>>>                      return;
>>>              }
>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>              WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>> 
>>>              new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>                            NAPIF_STATE_PREFER_BUSY_POLL);
>>> 
>>>              /* If STATE_MISSED was set, leave STATE_SCHED set,
>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>> 
>>> static int napi_thread_wait(struct napi_struct *napi)
>>> {
>>> +     bool woken = false;
>>> +
>>>      set_current_state(TASK_INTERRUPTIBLE);
>>> 
>>>      while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>> +              * kthread owns this napi and could poll on this napi.
>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>> +              * set by some other busy poll thread or by napi_disable().
>>> +              */
>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>                      WARN_ON(!list_empty(&napi->poll_list));
>>>                      __set_current_state(TASK_RUNNING);
>>>                      return 0;
>>>              }
>>> 
>>>              schedule();
>>> +             /* woken being true indicates this thread owns this napi. */
>>> +             woken = true;
>>>              set_current_state(TASK_INTERRUPTIBLE);
>>>      }
>>>      __set_current_state(TASK_RUNNING);
>>> --
>>> 2.31.0.rc2.261.g7f71774620-goog
>>> 
>> 



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-20  9:55   ` Eric Dumazet
  2021-03-20 10:31     ` Martin Zaharinov
@ 2021-03-30  9:25     ` Martin Zaharinov
  2021-03-30 13:39       ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-03-30  9:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa

Hi Eric and Wei

Please check this log:


[1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
[1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
[1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
[1584289.107263] Call Trace:
[1584289.107266]  dump_stack+0x58/0x6b
[1584289.209562]  warn_alloc.cold+0x70/0xd4
[1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
[1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
[1584289.474009]  allocate_slab+0x272/0x450
[1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
[1584289.519147]  kmem_cache_alloc+0x110/0x120
[1584289.541416]  build_skb+0x1a/0x200
[1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
[1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
[1584289.605528]  __napi_poll+0x1f/0x130
[1584289.625842]  napi_threaded_poll+0x110/0x160
[1584289.646110]  ? __napi_poll+0x130/0x130
[1584289.665810]  kthread+0xea/0x120
[1584289.684836]  ? kthread_park+0x80/0x80
[1584289.703440]  ret_from_fork+0x1f/0x30
[1584289.721616] Mem-Info:
[1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
                  active_file:17408 inactive_file:149 isolated_file:32
                  unevictable:1440359 dirty:17500 writeback:0
                  slab_reclaimable:43368 slab_unreclaimable:155124
                  mapped:817431 shmem:7650 pagetables:32093 bounce:0
                  free:17832 free_pcp:113 free_cma:0
[1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
[1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
[1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
[1584290.104980] lowmem_reserve[]: 0 0 13985 13985
[1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
[1584290.237051] lowmem_reserve[]: 0 0 0 0
[1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
[1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
[1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
[1584290.409087] 1465768 total pagecache pages
[1584290.434531] 4165289 pages RAM
[1584290.459616] 0 pages HighMem/MovableOnly
[1584290.484480] 104766 pages reserved
[1584290.508709] 0 pages hwpoisoned
[1584301.710231] team0: Failed to send options change via netlink (err -105)
[1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
[1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
[1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
[1584302.776532] Call Trace:
[1584302.799361]  dump_stack+0x58/0x6b
[1584302.821791]  dump_header+0x4c/0x2e6
[1584302.843580]  oom_kill_process.cold+0xb/0x10
[1584302.865223]  out_of_memory.part.0+0x125/0x5f0
[1584302.886641]  out_of_memory+0x54/0xa0
[1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
[1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
[1584302.947874]  __get_free_pages+0x8/0x30
[1584302.967246]  pgd_alloc+0x21/0x180
[1584302.986355]  mm_alloc+0x1af/0x250
[1584303.005085]  alloc_bprm+0x80/0x2a0
[1584303.023328]  do_execveat_common+0x8b/0x330
[1584303.041181]  __x64_sys_execve+0x2b/0x40
[1584303.058513]  do_syscall_64+0x2d/0x40
[1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1584303.091891] RIP: 0033:0x488376
[1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
[1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
[1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
[1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
[1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
[1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
[1584303.379094] Mem-Info:
[1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
                  active_file:12975 inactive_file:168 isolated_file:32
                  unevictable:909709 dirty:12864 writeback:10
                  slab_reclaimable:42415 slab_unreclaimable:154783
                  mapped:39825 shmem:14744 pagetables:26041 bounce:0
                  free:537002 free_pcp:1813 free_cma:0
[1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
[1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
[1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
[1584303.888935] lowmem_reserve[]: 0 0 13985 13985
[1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
[1584304.036531] lowmem_reserve[]: 0 0 0 0
[1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
[1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
[1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
[1584304.287094] 933871 total pagecache pages
[1584304.312815] 4165289 pages RAM
[1584304.337915] 0 pages HighMem/MovableOnly
[1584304.362522] 104766 pages reserved
[1584304.386516] 0 pages hwpoisoned

> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Wei
>> Check this:
>> 
>> [   39.706567] ------------[ cut here ]------------
>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> 
> Probably more relevant to Intel maintainers than Wei :/
> 
>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>> [   39.706619] Workqueue: events work_for_cpu_fn
>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [   39.706656] Call Trace:
>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>> [   39.706716]  ? __kmalloc+0x37/0x160
>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>> [   39.706746]  local_pci_probe+0x1b/0x40
>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>> [   39.706754]  process_one_work+0x1ec/0x350
>> [   39.706758]  worker_thread+0x24b/0x4d0
>> [   39.706760]  ? process_one_work+0x350/0x350
>> [   39.706762]  kthread+0xea/0x120
>> [   39.706766]  ? kthread_park+0x80/0x80
>> [   39.706770]  ret_from_fork+0x1f/0x30
>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>> 
>> Martin
>> 
>> 
>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>> 
>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>> determine if the kthread owns this napi and could call napi->poll() on
>>> it. However, if socket busy poll is enabled, it is possible that the
>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>> This patch tries to fix this race by adding a new bit
>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>> bit is set. This helps distinguish the ownership of the napi between
>>> kthread and other scenarios and fixes the race issue.
>>> 
>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>> ---
>>> Change since v3:
>>> - Add READ_ONCE() for thread->state and add comments in
>>>   ____napi_schedule().
>>> 
>>> include/linux/netdevice.h |  2 ++
>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 5b67ea89d5f2..87a5d186faff 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -360,6 +360,7 @@ enum {
>>>      NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>      NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>      NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>> };
>>> 
>>> enum {
>>> @@ -372,6 +373,7 @@ enum {
>>>      NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>      NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>      NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>> };
>>> 
>>> enum gro_result {
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 6c5967e80132..d3195a95f30e 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>               */
>>>              thread = READ_ONCE(napi->thread);
>>>              if (thread) {
>>> +                     /* Avoid doing set_bit() if the thread is in
>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>> +                      * makes sure to proceed with napi polling
>>> +                      * if the thread is explicitly woken from here.
>>> +                      */
>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>                      wake_up_process(thread);
>>>                      return;
>>>              }
>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>              WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>> 
>>>              new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>                            NAPIF_STATE_PREFER_BUSY_POLL);
>>> 
>>>              /* If STATE_MISSED was set, leave STATE_SCHED set,
>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>> 
>>> static int napi_thread_wait(struct napi_struct *napi)
>>> {
>>> +     bool woken = false;
>>> +
>>>      set_current_state(TASK_INTERRUPTIBLE);
>>> 
>>>      while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>> +              * kthread owns this napi and could poll on this napi.
>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>> +              * set by some other busy poll thread or by napi_disable().
>>> +              */
>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>                      WARN_ON(!list_empty(&napi->poll_list));
>>>                      __set_current_state(TASK_RUNNING);
>>>                      return 0;
>>>              }
>>> 
>>>              schedule();
>>> +             /* woken being true indicates this thread owns this napi. */
>>> +             woken = true;
>>>              set_current_state(TASK_INTERRUPTIBLE);
>>>      }
>>>      __set_current_state(TASK_RUNNING);
>>> --
>>> 2.31.0.rc2.261.g7f71774620-goog
>>> 
>> 



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-30  9:25     ` Martin Zaharinov
@ 2021-03-30 13:39       ` Eric Dumazet
  2021-04-10 11:22         ` Bug Report Napi GRO ixgbe Martin Zaharinov
  2021-09-09 11:18         ` [PATCH net v4] net: fix race between napi kthread mode and busy poll Martin Zaharinov
  0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2021-03-30 13:39 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa

On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Eric and Wei
>
> Please check this log:
>

Please send a normal report to netdev.

This has nothing to do with us (Eric & Wei).

Thanks.

>
> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> [1584289.107263] Call Trace:
> [1584289.107266]  dump_stack+0x58/0x6b
> [1584289.209562]  warn_alloc.cold+0x70/0xd4
> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
> [1584289.474009]  allocate_slab+0x272/0x450
> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
> [1584289.519147]  kmem_cache_alloc+0x110/0x120
> [1584289.541416]  build_skb+0x1a/0x200
> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
> [1584289.605528]  __napi_poll+0x1f/0x130
> [1584289.625842]  napi_threaded_poll+0x110/0x160
> [1584289.646110]  ? __napi_poll+0x130/0x130
> [1584289.665810]  kthread+0xea/0x120
> [1584289.684836]  ? kthread_park+0x80/0x80
> [1584289.703440]  ret_from_fork+0x1f/0x30
> [1584289.721616] Mem-Info:
> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>                   active_file:17408 inactive_file:149 isolated_file:32
>                   unevictable:1440359 dirty:17500 writeback:0
>                   slab_reclaimable:43368 slab_unreclaimable:155124
>                   mapped:817431 shmem:7650 pagetables:32093 bounce:0
>                   free:17832 free_pcp:113 free_cma:0
> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
> [1584290.237051] lowmem_reserve[]: 0 0 0 0
> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
> [1584290.409087] 1465768 total pagecache pages
> [1584290.434531] 4165289 pages RAM
> [1584290.459616] 0 pages HighMem/MovableOnly
> [1584290.484480] 104766 pages reserved
> [1584290.508709] 0 pages hwpoisoned
> [1584301.710231] team0: Failed to send options change via netlink (err -105)
> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> [1584302.776532] Call Trace:
> [1584302.799361]  dump_stack+0x58/0x6b
> [1584302.821791]  dump_header+0x4c/0x2e6
> [1584302.843580]  oom_kill_process.cold+0xb/0x10
> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
> [1584302.886641]  out_of_memory+0x54/0xa0
> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
> [1584302.947874]  __get_free_pages+0x8/0x30
> [1584302.967246]  pgd_alloc+0x21/0x180
> [1584302.986355]  mm_alloc+0x1af/0x250
> [1584303.005085]  alloc_bprm+0x80/0x2a0
> [1584303.023328]  do_execveat_common+0x8b/0x330
> [1584303.041181]  __x64_sys_execve+0x2b/0x40
> [1584303.058513]  do_syscall_64+0x2d/0x40
> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [1584303.091891] RIP: 0033:0x488376
> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
> [1584303.379094] Mem-Info:
> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>                   active_file:12975 inactive_file:168 isolated_file:32
>                   unevictable:909709 dirty:12864 writeback:10
>                   slab_reclaimable:42415 slab_unreclaimable:154783
>                   mapped:39825 shmem:14744 pagetables:26041 bounce:0
>                   free:537002 free_pcp:1813 free_cma:0
> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
> [1584304.036531] lowmem_reserve[]: 0 0 0 0
> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
> [1584304.287094] 933871 total pagecache pages
> [1584304.312815] 4165289 pages RAM
> [1584304.337915] 0 pages HighMem/MovableOnly
> [1584304.362522] 104766 pages reserved
> [1584304.386516] 0 pages hwpoisoned
>
> > On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>
> >> Hi Wei
> >> Check this:
> >>
> >> [   39.706567] ------------[ cut here ]------------
> >> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> >> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> >
> > Probably more relevant to Intel maintainers than Wei :/
> >
> >> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> >> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> >> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> >> [   39.706619] Workqueue: events work_for_cpu_fn
> >> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> >> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> >> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> >> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> >> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> >> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> >> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> >> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> >> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> >> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> >> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> [   39.706656] Call Trace:
> >> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> >> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> >> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> >> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> >> [   39.706716]  ? __kmalloc+0x37/0x160
> >> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> >> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> >> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> >> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> >> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> >> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> >> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> >> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> >> [   39.706746]  local_pci_probe+0x1b/0x40
> >> [   39.706750]  work_for_cpu_fn+0xb/0x20
> >> [   39.706754]  process_one_work+0x1ec/0x350
> >> [   39.706758]  worker_thread+0x24b/0x4d0
> >> [   39.706760]  ? process_one_work+0x350/0x350
> >> [   39.706762]  kthread+0xea/0x120
> >> [   39.706766]  ? kthread_park+0x80/0x80
> >> [   39.706770]  ret_from_fork+0x1f/0x30
> >> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>
> >> Martin
> >>
> >>
> >>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>
> >>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>> determine if the kthread owns this napi and could call napi->poll() on
> >>> it. However, if socket busy poll is enabled, it is possible that the
> >>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>> This patch tries to fix this race by adding a new bit
> >>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>> bit is set. This helps distinguish the ownership of the napi between
> >>> kthread and other scenarios and fixes the race issue.
> >>>
> >>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> >>> Reported-by: Martin Zaharinov <micron10@gmail.com>
> >>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> >>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>> Cc: Alexander Duyck <alexanderduyck@fb.com>
> >>> Cc: Eric Dumazet <edumazet@google.com>
> >>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >>> ---
> >>> Change since v3:
> >>> - Add READ_ONCE() for thread->state and add comments in
> >>>   ____napi_schedule().
> >>>
> >>> include/linux/netdevice.h |  2 ++
> >>> net/core/dev.c            | 19 ++++++++++++++++++-
> >>> 2 files changed, 20 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index 5b67ea89d5f2..87a5d186faff 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -360,6 +360,7 @@ enum {
> >>>      NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >>>      NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >>>      NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> >>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >>> };
> >>>
> >>> enum {
> >>> @@ -372,6 +373,7 @@ enum {
> >>>      NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >>>      NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >>>      NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> >>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >>> };
> >>>
> >>> enum gro_result {
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index 6c5967e80132..d3195a95f30e 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >>>               */
> >>>              thread = READ_ONCE(napi->thread);
> >>>              if (thread) {
> >>> +                     /* Avoid doing set_bit() if the thread is in
> >>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
> >>> +                      * makes sure to proceed with napi polling
> >>> +                      * if the thread is explicitly woken from here.
> >>> +                      */
> >>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> >>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> >>>                      wake_up_process(thread);
> >>>                      return;
> >>>              }
> >>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> >>>              WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> >>>
> >>>              new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> >>> +                           NAPIF_STATE_SCHED_THREADED |
> >>>                            NAPIF_STATE_PREFER_BUSY_POLL);
> >>>
> >>>              /* If STATE_MISSED was set, leave STATE_SCHED set,
> >>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> >>>
> >>> static int napi_thread_wait(struct napi_struct *napi)
> >>> {
> >>> +     bool woken = false;
> >>> +
> >>>      set_current_state(TASK_INTERRUPTIBLE);
> >>>
> >>>      while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> >>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> >>> +             /* Testing SCHED_THREADED bit here to make sure the current
> >>> +              * kthread owns this napi and could poll on this napi.
> >>> +              * Testing SCHED bit is not enough because SCHED bit might be
> >>> +              * set by some other busy poll thread or by napi_disable().
> >>> +              */
> >>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> >>>                      WARN_ON(!list_empty(&napi->poll_list));
> >>>                      __set_current_state(TASK_RUNNING);
> >>>                      return 0;
> >>>              }
> >>>
> >>>              schedule();
> >>> +             /* woken being true indicates this thread owns this napi. */
> >>> +             woken = true;
> >>>              set_current_state(TASK_INTERRUPTIBLE);
> >>>      }
> >>>      __set_current_state(TASK_RUNNING);
> >>> --
> >>> 2.31.0.rc2.261.g7f71774620-goog
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Bug Report Napi GRO ixgbe
  2021-03-30 13:39       ` Eric Dumazet
@ 2021-04-10 11:22         ` Martin Zaharinov
  2021-04-12  8:36           ` Paolo Abeni
  2021-09-09 11:18         ` [PATCH net v4] net: fix race between napi kthread mode and busy poll Martin Zaharinov
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-04-10 11:22 UTC (permalink / raw)
  To: netdev
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, Alexander Duyck,
	Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, alobakin

Hi team,

Here is a report from the latest kernel, 5.11.12.

Please check it and help find and fix the problem.

Apr 10 12:46:25  [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
Apr 10 12:46:26  [214315.570814][ T3345] FS:  0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
Apr 10 12:46:26  [214315.622416][ T3345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 10 12:46:26  [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
Apr 10 12:46:26  [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 10 12:46:26  [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 10 12:46:26  [214315.799749][ T3345] Call Trace:
Apr 10 12:46:26  [214315.824268][ T3345]  netif_receive_skb_list_internal+0x5e/0x2c0
Apr 10 12:46:26  [214315.848996][ T3345]  napi_gro_flush+0x11b/0x260
Apr 10 12:46:26  [214315.873320][ T3345]  napi_complete_done+0x107/0x180
Apr 10 12:46:26  [214315.897160][ T3345]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
Apr 10 12:46:26  [214315.920564][ T3345]  __napi_poll+0x1f/0x130
Apr 10 12:46:26  [214315.943475][ T3345]  napi_threaded_poll+0x110/0x160
Apr 10 12:46:26  [214315.966252][ T3345]  ? __napi_poll+0x130/0x130
Apr 10 12:46:26  [214315.988424][ T3345]  kthread+0xea/0x120
Apr 10 12:46:26  [214316.010247][ T3345]  ? kthread_park+0x80/0x80
Apr 10 12:46:26  [214316.031729][ T3345]  ret_from_fork+0x1f/0x30
Apr 10 12:46:26  [214316.052904][ T3345] ---[ end trace c7726a0541128b42 ]---

Martin


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bug Report Napi GRO ixgbe
  2021-04-10 11:22         ` Bug Report Napi GRO ixgbe Martin Zaharinov
@ 2021-04-12  8:36           ` Paolo Abeni
  2021-04-26  8:31             ` Martin Zaharinov
  2021-05-09 10:40             ` Bug Report Napi GRO ixgbe Martin Zaharinov
  0 siblings, 2 replies; 25+ messages in thread
From: Paolo Abeni @ 2021-04-12  8:36 UTC (permalink / raw)
  To: Martin Zaharinov, netdev
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, Alexander Duyck,
	Hannes Frederic Sowa, Eric Dumazet, alobakin

Hello,

On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
> Hi team,
>
> Here is a report from the latest kernel, 5.11.12.
>
> Please check it and help find and fix the problem.

Please provide a complete splat, including the trapping instruction.
> 
> Apr 10 12:46:25  [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
> Apr 10 12:46:26  [214315.570814][ T3345] FS:  0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
> Apr 10 12:46:26  [214315.622416][ T3345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 10 12:46:26  [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
> Apr 10 12:46:26  [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 10 12:46:26  [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Apr 10 12:46:26  [214315.799749][ T3345] Call Trace:
> Apr 10 12:46:26  [214315.824268][ T3345]  netif_receive_skb_list_internal+0x5e/0x2c0
> Apr 10 12:46:26  [214315.848996][ T3345]  napi_gro_flush+0x11b/0x260
> Apr 10 12:46:26  [214315.873320][ T3345]  napi_complete_done+0x107/0x180
> Apr 10 12:46:26  [214315.897160][ T3345]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
> Apr 10 12:46:26  [214315.920564][ T3345]  __napi_poll+0x1f/0x130
> Apr 10 12:46:26  [214315.943475][ T3345]  napi_threaded_poll+0x110/0x160
> Apr 10 12:46:26  [214315.966252][ T3345]  ? __napi_poll+0x130/0x130
> Apr 10 12:46:26  [214315.988424][ T3345]  kthread+0xea/0x120
> Apr 10 12:46:26  [214316.010247][ T3345]  ? kthread_park+0x80/0x80
> Apr 10 12:46:26  [214316.031729][ T3345]  ret_from_fork+0x1f/0x30

Could you please also provide the decoded stack trace? Something
like the following will do:

cat <file containing the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>

Even more importantly:

Threaded NAPI was introduced by merge
commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed in the
vanilla tree in v5.12-rc1 and has not been backported to 5.11.x. Which
kernel are you actually running?

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bug Report Napi GRO ixgbe
  2021-04-12  8:36           ` Paolo Abeni
@ 2021-04-26  8:31             ` Martin Zaharinov
  2021-05-08 10:48               ` Bug Report Napi kthread rcd Martin Zaharinov
  2021-05-09 10:40             ` Bug Report Napi GRO ixgbe Martin Zaharinov
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-04-26  8:31 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Wei Wang, David S . Miller, Jakub Kicinski,
	Alexander Duyck, Hannes Frederic Sowa, Eric Dumazet, alobakin

Hi Paolo,
Sorry for the delay.

After disabling GRO on the eth interface and on team0, everything now works fine.

In this case I am using kernel 5.11.12, but once 5.12 is released I will migrate to it and check for problems with the kthread.

Thanks,
I will send an update if I hit any other problems.

Martin

> On 12 Apr 2021, at 11:36, Paolo Abeni <pabeni@redhat.com> wrote:
> 
> Hello,
> 
> On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
>> Hi team,
>>
>> Here is a report from the latest kernel, 5.11.12.
>>
>> Please check it and help find and fix the problem.
> 
> Please provide a complete splat, including the trapping instruction.
>> 
>> Apr 10 12:46:25  [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
>> Apr 10 12:46:26  [214315.570814][ T3345] FS:  0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
>> Apr 10 12:46:26  [214315.622416][ T3345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Apr 10 12:46:26  [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
>> Apr 10 12:46:26  [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Apr 10 12:46:26  [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Apr 10 12:46:26  [214315.799749][ T3345] Call Trace:
>> Apr 10 12:46:26  [214315.824268][ T3345]  netif_receive_skb_list_internal+0x5e/0x2c0
>> Apr 10 12:46:26  [214315.848996][ T3345]  napi_gro_flush+0x11b/0x260
>> Apr 10 12:46:26  [214315.873320][ T3345]  napi_complete_done+0x107/0x180
>> Apr 10 12:46:26  [214315.897160][ T3345]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
>> Apr 10 12:46:26  [214315.920564][ T3345]  __napi_poll+0x1f/0x130
>> Apr 10 12:46:26  [214315.943475][ T3345]  napi_threaded_poll+0x110/0x160
>> Apr 10 12:46:26  [214315.966252][ T3345]  ? __napi_poll+0x130/0x130
>> Apr 10 12:46:26  [214315.988424][ T3345]  kthread+0xea/0x120
>> Apr 10 12:46:26  [214316.010247][ T3345]  ? kthread_park+0x80/0x80
>> Apr 10 12:46:26  [214316.031729][ T3345]  ret_from_fork+0x1f/0x30
> 
> Could you please also provide the decoded stack trace? Something
> like the following will do:
>
> cat <file containing the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>
>
> Even more importantly:
>
> Threaded NAPI was introduced by merge
> commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed in the
> vanilla tree in v5.12-rc1 and has not been backported to 5.11.x. Which
> kernel are you actually running?
> 
> Thanks,
> 
> Paolo


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Bug Report Napi kthread rcd
  2021-04-26  8:31             ` Martin Zaharinov
@ 2021-05-08 10:48               ` Martin Zaharinov
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Zaharinov @ 2021-05-08 10:48 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Wei Wang, David S . Miller, Jakub Kicinski,
	Alexander Duyck, Hannes Frederic Sowa, Eric Dumazet, alobakin

Hi all,
One more bug report.
The kernel is 5.12.1.

If you need more information, I will provide it.

The server serves 200 users with NAT.

[81402.540906] rcu: INFO: rcu_sched self-detected stall on CPU
[81402.540909] rcu: 5-....: (3314 ticks this GP) idle=74e/1/0x4000000000000000 softirq=4979878/4979878 fqs=2554 last_accelerate: a926/c0a0 dyntick_enabled: 1
[81402.540911] (t=6001 jiffies g=7517749 q=44479)
[81402.540913] NMI backtrace for cpu 5
[81402.540914] CPU: 5 PID: 36 Comm: ksoftirqd/5 Tainted: G O 5.12.1 #1
[81402.540916] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
[81402.540917] Call Trace:
[81402.540919]
[81402.540920] dump_stack+0x65/0x7d
[81402.540924] ? lapic_can_unplug_cpu+0x70/0x70
[81402.540927] nmi_trigger_cpumask_backtrace.cold+0x40/0x4d
[81402.540929] rcu_dump_cpu_stacks+0xbe/0xec
[81402.540932] rcu_sched_clock_irq.cold+0x195/0x3f1
[81402.540934] ? enqueue_task_fair+0x796/0xbd0
[81402.540938] update_process_times+0x88/0xc0
[81402.540942] tick_sched_timer+0x7f/0x110
[81402.540944] ? tick_nohz_dep_set_task+0x80/0x80
[81402.540945] __hrtimer_run_queues+0x10b/0x1b0
[81402.540947] hrtimer_interrupt+0x10a/0x420
[81402.540949] __sysvec_apic_timer_interrupt+0x47/0x60
[81402.540952] sysvec_apic_timer_interrupt+0x65/0x90
[81402.540955]
[81402.540955] asm_sysvec_apic_timer_interrupt+0xf/0x20
[81402.540959] RIP: 0010:console_unlock+0x366/0x5e0
[81402.540961] Code: ff ff 8b 05 44 5f b2 01 85 c0 75 66 c7 05 3a 5f b2 01 01 00 00 00 e9 0f fd ff ff e8 f4 1c 00 00 48 85 db 74 01 fb 8b 54 24 0c <85> d2 0f 84 4a fd ff ff e8 1d 2b 7c 00 e9 40 fd ff ff 4d 85 ff 74
[81402.540963] RSP: 0018:ffff9dc980203a80 EFLAGS: 00000206
[81402.540964] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
[81402.540965] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffffffff82b59898
[81402.540966] RBP: 0000000000000000 R08: ffff9786814db080 R09: 0000000000000000
[81402.540966] R10: ffff9786a85bf260 R11: ffff9786f7bd7cf0 R12: 0000000000000048
[81402.540967] R13: 0000000000000000 R14: 20c49ba5e353f7cf R15: 0000000000000000
[81402.540968] ? common_interrupt+0x14/0xa0
[81402.540969] ? asm_common_interrupt+0x1b/0x40
[81402.540971] vprintk_default+0x5a/0x150
[81402.540972] printk+0x43/0x45
[81402.540975] create_nat_session+0x1c5e/0x1cfd [xt_NAT]
[81402.540978] ipt_do_table+0x2e5/0x670 [ip_tables]
[81402.540980] ? ip_route_input_noref+0xa8/0x1e0
[81402.540983] nf_hook_slow+0x36/0xa0
[81402.540986] ip_forward+0x40d/0x450
[81402.540987] ? ip4_obj_hashfn+0xc0/0xc0
[81402.540989] process_backlog+0x11a/0x230
[81402.540992] __napi_poll+0x1f/0x130
[81402.540994] net_rx_action+0x239/0x2f0
[81402.540996] ? run_timer_softirq+0x730/0x880
[81402.540998] __do_softirq+0xaf/0x1da
[81402.541000] run_ksoftirqd+0x15/0x20
[81402.541004] smpboot_thread_fn+0xb3/0x140
[81402.541006] ? sort_range+0x20/0x20
[81402.541008] kthread+0xea/0x120
[81402.541010] ? kthread_park+0x80/0x80
[81402.541012] ret_from_fork+0x1f/0x30
[81416.300055] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
[81476.311498] rcu: INFO: rcu_sched self-detected stall on CPU
[81476.311500] rcu: 3-....: (1 GPs behind) idle=86a/1/0x4000000000000000 softirq=4703397/4703398 fqs=2596 last_accelerate: c5ff/dd71 dyntick_enabled: 1
[81476.311503] (t=6001 jiffies g=7517753 q=82419)
[81476.311505] NMI backtrace for cpu 3
[81476.311506] CPU: 3 PID: 527214 Comm: kworker/3:2 Tainted: G O 5.12.1 #1
[81476.311507] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
[81476.311509] Workqueue: rcu_gp wait_rcu_exp_gp
[81476.311512] Call Trace:
[81476.311514]
[81476.311515] dump_stack+0x65/0x7d
[81476.311519] ? lapic_can_unplug_cpu+0x70/0x70
[81476.311521] nmi_trigger_cpumask_backtrace.cold+0x40/0x4d
[81476.311523] rcu_dump_cpu_stacks+0xbe/0xec
[81476.311527] rcu_sched_clock_irq.cold+0x195/0x3f1
[81476.311529] ? timekeeping_advance+0x34e/0x540
[81476.311531] update_process_times+0x88/0xc0
[81476.311534] tick_sched_timer+0x7f/0x110
[81476.311536] ? tick_nohz_dep_set_task+0x80/0x80
[81476.311537] __hrtimer_run_queues+0x10b/0x1b0
[81476.311539] hrtimer_interrupt+0x10a/0x420
[81476.311541] __sysvec_apic_timer_interrupt+0x47/0x60
[81476.311544] sysvec_apic_timer_interrupt+0x65/0x90
[81476.311547]
[81476.311547] asm_sysvec_apic_timer_interrupt+0xf/0x20
[81476.311551] RIP: 0010:console_unlock+0x366/0x5e0
[81476.311554] Code: ff ff 8b 05 44 5f b2 01 85 c0 75 66 c7 05 3a 5f b2 01 01 00 00 00 e9 0f fd ff ff e8 f4 1c 00 00 48 85 db 74 01 fb 8b 54 24 0c <85> d2 0f 84 4a fd ff ff e8 1d 2b 7c 00 e9 40 fd ff ff 4d 85 ff 74
[81476.311555] RSP: 0018:ffff9dc980313cc0 EFLAGS: 00000206
[81476.311556] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
[81476.311557] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffffffff82b59898
[81476.311557] RBP: 0000000000000000 R08: ffff9786814db080 R09: 0000000000000000
[81476.311558] R10: ffff9786a85bac10 R11: ffff97872e90acf0 R12: 0000000000000048
[81476.311559] R13: 0000000000000000 R14: 20c49ba5e353f7cf R15: 0000000000000000
[81476.311560] vprintk_default+0x5a/0x150
[81476.311562] printk+0x43/0x45
[81476.311563] synchronize_rcu_expedited_wait.cold+0x20/0x2db
[81476.311565] rcu_exp_wait_wake+0xc/0x110
[81476.311567] process_one_work+0x1ec/0x350
[81476.311569] worker_thread+0x4f/0x4d0
[81476.311570] ? process_one_work+0x350/0x350
[81476.311571] kthread+0xea/0x120
[81476.311573] ? kthread_park+0x80/0x80
[81476.311574] ret_from_fork+0x1f/0x30
[81551.199572] } 19586 jiffies s: 14473 root: 0x0/.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Bug Report Napi GRO ixgbe
  2021-04-12  8:36           ` Paolo Abeni
  2021-04-26  8:31             ` Martin Zaharinov
@ 2021-05-09 10:40             ` Martin Zaharinov
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Zaharinov @ 2021-05-09 10:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Wei Wang, David S . Miller, Jakub Kicinski,
	Alexander Duyck, Hannes Frederic Sowa, Eric Dumazet, alobakin

Hi Paolo,

This is urgent.

I get the same bug with the new kernel, 5.12.1.

It is an ordinary server carrying NAT traffic, and it needs GRO enabled to
deliver adequate speed to its users.

Please check:

May  9 12:30:23 [126568.653018][ T3527] ------------[ cut here ]------------
May  9 12:30:23 [126568.653019][ T3527] list_del corruption. prev->next should be ffff9478d6b55a00, but was ffffb0ebc3123d88
May  9 12:30:23 [126568.653023][ T3527] WARNING: CPU: 20 PID: 3527 at lib/list_debug.c:51 __list_del_entry_valid+0x79/0x90
May  9 12:30:23 [126568.653026][ T3527] Modules linked in: nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc xt_dtvqos(O) xt_TCPMSS xt_nat iptable_mangle iptable_nat ip_tables team_mode_loadbalance team netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
May  9 12:30:23 [126568.653049][ T3527] CPU: 20 PID: 3527 Comm: napi/eth1-542 Tainted: G        W  O      5.12.1 #1
May  9 12:30:23 [126568.653050][ T3527] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 3.3 10/28/2020
May  9 12:30:23 [126568.653051][ T3527] RIP: 0010:__list_del_entry_valid+0x79/0x90
May  9 12:30:23 [126568.653054][ T3527] Code: c3 48 89 fe 4c 89 c2 48 c7 c7 08 db 34 b8 e8 2c df 51 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 db 34 b8 e8 15 df 51 00 <0f> 0b 31 c0 c3 48 c7 c7 80 db 34 b8 e8 04 df 51 00 0f 0b 31 c0 c3
May  9 12:30:23 [126568.653055][ T3527] RSP: 0018:ffffb0ebc3123d78 EFLAGS: 00010296
May  9 12:30:23 [126568.653056][ T3527] RAX: 0000000000000054 RBX: ffff9478d6b55a00 RCX: 80000000fff832ec
May  9 12:30:23 [126568.653057][ T3527] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffffb8b59898
May  9 12:30:23 [126568.653058][ T3527] RBP: ffff9477eac08158 R08: 00000000000098c4 R09: 000000000000000f
May  9 12:30:23 [126568.653059][ T3527] R10: 0000000000000004 R11: ffff947f1e8fa1b4 R12: ffff9478d6b54400
May  9 12:30:23 [126568.653059][ T3527] R13: ffff9478d6b55a00 R14: ffff94789340d400 R15: ffffb0ebc3123d88
May  9 12:30:23 [126568.653060][ T3527] FS:  0000000000000000(0000) GS:ffff947f1fd00000(0000) knlGS:0000000000000000
May  9 12:30:23 [126568.653061][ T3527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 12:30:24 [126568.653062][ T3527] CR2: 00007fc73d0e0000 CR3: 00000001dea18003 CR4: 00000000001706e0
May  9 12:30:24 [126568.653063][ T3527] Call Trace:
May  9 12:30:24 [126568.653063][ T3527]  netif_receive_skb_list_internal+0x5e/0x2b0
May  9 12:30:24 [126568.653066][ T3527]  ? napi_gro_receive+0x14d/0x160
May  9 12:30:24 [126568.653068][ T3527]  ? enqueue_to_backlog+0x39/0x250
May  9 12:30:24 [126568.653069][ T3527]  napi_gro_flush+0x11b/0x260
May  9 12:30:24 [126568.653071][ T3527]  napi_complete_done+0x107/0x180
May  9 12:30:24 [126568.653073][ T3527]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
May  9 12:30:24 [126568.653080][ T3527]  __napi_poll+0x1f/0x130
May  9 12:30:24 [126568.653082][ T3527]  napi_threaded_poll+0x105/0x150
May  9 12:30:24 [126568.653084][ T3527]  ? __napi_poll+0x130/0x130
May  9 12:30:24 [126568.653086][ T3527]  kthread+0xea/0x120
May  9 12:30:24 [126568.653088][ T3527]  ? kthread_park+0x80/0x80
May  9 12:30:24 [126568.653090][ T3527]  ret_from_fork+0x1f/0x30
May  9 12:30:24 [126568.653092][ T3527] ---[ end trace 946b481f5c11bfe9 ]---
May  9 12:30:24 [126568.653092][ T3527] ------------[ cut here ]------------
May  9 12:30:24 [126568.653093][ T3527] list_del corruption. prev->next should be ffff9478d6b54400, but was ffffb0ebc3123d88
May  9 12:30:24 [126568.653097][ T3527] WARNING: CPU: 20 PID: 3527 at lib/list_debug.c:51 __list_del_entry_valid+0x79/0x90
May  9 12:30:24 [126568.653099][ T3527] Modules linked in: nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc xt_dtvqos(O) xt_TCPMSS xt_nat iptable_mangle iptable_nat ip_tables team_mode_loadbalance team netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
May  9 12:30:24 [126568.653113][ T3527] CPU: 20 PID: 3527 Comm: napi/eth1-542 Tainted: G        W  O      5.12.1 #1
May  9 12:30:24 [126568.653114][ T3527] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 3.3 10/28/2020
May  9 12:30:24 [126568.653114][ T3527] RIP: 0010:__list_del_entry_valid+0x79/0x90
May  9 12:30:24 [126568.653116][ T3527] Code: c3 48 89 fe 4c 89 c2 48 c7 c7 08 db 34 b8 e8 2c df 51 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 db 34 b8 e8 15 df 51 00 <0f> 0b 31 c0 c3 48 c7 c7 80 db 34 b8 e8 04 df 51 00 0f 0b 31 c0 c3
May  9 12:30:24 [126568.653117][ T3527] RSP: 0018:ffffb0ebc3123d78 EFLAGS: 00010296
May  9 12:30:24 [126568.653118][ T3527] RAX: 0000000000000054 RBX: ffff9478d6b54400 RCX: 80000000fff8330b
May  9 12:30:24 [126568.653119][ T3527] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffffb8b59898
May  9 12:30:24 [126568.653120][ T3527] RBP: ffff9477eac08158 R08: 0000000000009921 R09: 000000000000000f
May  9 12:30:24 [126568.653120][ T3527] R10: 0000000000000004 R11: ffff947f1e8fab74 R12: ffff9478d6b55800
May  9 12:30:24 [126568.653121][ T3527] R13: ffff9478d6b54400 R14: ffff9478d6b54700 R15: ffffb0ebc3123d88
May  9 12:30:25 [126568.653122][ T3527] FS:  0000000000000000(0000) GS:ffff947f1fd00000(0000) knlGS:0000000000000000
May  9 12:30:25 [126568.653123][ T3527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 12:30:25 [126568.653124][ T3527] CR2: 00007fc73d0e0000 CR3: 00000001dea18003 CR4: 00000000001706e0
May  9 12:30:25 [126568.653124][ T3527] Call Trace:
May  9 12:30:25 [126568.653125][ T3527]  netif_receive_skb_list_internal+0x5e/0x2b0
May  9 12:30:25 [126568.653127][ T3527]  ? napi_gro_receive+0x14d/0x160
May  9 12:30:25 [126568.653129][ T3527]  ? enqueue_to_backlog+0x39/0x250
May  9 12:30:25 [126568.653130][ T3527]  napi_gro_flush+0x11b/0x260
May  9 12:30:25 [126568.653132][ T3527]  napi_complete_done+0x107/0x180
May  9 12:30:25 [126568.653134][ T3527]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
May  9 12:30:25 [126568.653140][ T3527]  __napi_poll+0x1f/0x130
May  9 12:30:25 [126568.653142][ T3527]  napi_threaded_poll+0x105/0x150
May  9 12:30:25 [126568.653144][ T3527]  ? __napi_poll+0x130/0x130
May  9 12:30:25 [126568.653146][ T3527]  kthread+0xea/0x120
May  9 12:30:25 [126568.653148][ T3527]  ? kthread_park+0x80/0x80
May  9 12:30:25 [126568.653151][ T3527]  ret_from_fork+0x1f/0x30
May  9 12:30:25 [126568.653152][ T3527] ---[ end trace 946b481f5c11bfea ]---

Best Regards,
Martin

On Mon, 12.04.2021 at 11:37, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
> > Hi team,
> >
> > Here is a report from the latest kernel, 5.11.12.
> >
> > Please check it and help find and fix the problem.
>
> Please provide a complete splat, including the trapping instruction.
> >
> > Apr 10 12:46:25  [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
> > Apr 10 12:46:26  [214315.570814][ T3345] FS:  0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
> > Apr 10 12:46:26  [214315.622416][ T3345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Apr 10 12:46:26  [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
> > Apr 10 12:46:26  [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Apr 10 12:46:26  [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Apr 10 12:46:26  [214315.799749][ T3345] Call Trace:
> > Apr 10 12:46:26  [214315.824268][ T3345]  netif_receive_skb_list_internal+0x5e/0x2c0
> > Apr 10 12:46:26  [214315.848996][ T3345]  napi_gro_flush+0x11b/0x260
> > Apr 10 12:46:26  [214315.873320][ T3345]  napi_complete_done+0x107/0x180
> > Apr 10 12:46:26  [214315.897160][ T3345]  ixgbe_poll+0x10e/0x2a0 [ixgbe]
> > Apr 10 12:46:26  [214315.920564][ T3345]  __napi_poll+0x1f/0x130
> > Apr 10 12:46:26  [214315.943475][ T3345]  napi_threaded_poll+0x110/0x160
> > Apr 10 12:46:26  [214315.966252][ T3345]  ? __napi_poll+0x130/0x130
> > Apr 10 12:46:26  [214315.988424][ T3345]  kthread+0xea/0x120
> > Apr 10 12:46:26  [214316.010247][ T3345]  ? kthread_park+0x80/0x80
> > Apr 10 12:46:26  [214316.031729][ T3345]  ret_from_fork+0x1f/0x30
>
> Could you please also provide the decoded stack trace? Something
> like the following will do:
>
> cat <file containing the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>
>
> Even more importantly:
>
> Threaded NAPI was introduced by merge
> commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed in the
> vanilla tree in v5.12-rc1 and has not been backported to 5.11.x. Which
> kernel are you actually running?
>
> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-03-30 13:39       ` Eric Dumazet
  2021-04-10 11:22         ` Bug Report Napi GRO ixgbe Martin Zaharinov
@ 2021-09-09 11:18         ` Martin Zaharinov
  2021-09-10  0:30           ` Wei Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-09 11:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Wei Wang, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Eric and Wei,

Please see this bug report from the last hour.
Kernel 5.13.13; traffic is 7 Gb/s down / 7 Gb/s up.
Uptime before the crash: 10 days.

Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.

> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> 
> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Eric and Wei
>> 
>> Please check this log :
>> 
> 
> Please send a normal report to netdev.
> 
> This has nothing to do with us (Eric & Wei)
> 
> Thanks.
> 
>> 
>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>> [1584289.107263] Call Trace:
>> [1584289.107266]  dump_stack+0x58/0x6b
>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>> [1584289.474009]  allocate_slab+0x272/0x450
>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>> [1584289.541416]  build_skb+0x1a/0x200
>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>> [1584289.605528]  __napi_poll+0x1f/0x130
>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>> [1584289.646110]  ? __napi_poll+0x130/0x130
>> [1584289.665810]  kthread+0xea/0x120
>> [1584289.684836]  ? kthread_park+0x80/0x80
>> [1584289.703440]  ret_from_fork+0x1f/0x30
>> [1584289.721616] Mem-Info:
>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>                  active_file:17408 inactive_file:149 isolated_file:32
>>                  unevictable:1440359 dirty:17500 writeback:0
>>                  slab_reclaimable:43368 slab_unreclaimable:155124
>>                  mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>                  free:17832 free_pcp:113 free_cma:0
>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>> [1584290.409087] 1465768 total pagecache pages
>> [1584290.434531] 4165289 pages RAM
>> [1584290.459616] 0 pages HighMem/MovableOnly
>> [1584290.484480] 104766 pages reserved
>> [1584290.508709] 0 pages hwpoisoned
>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>> [1584302.776532] Call Trace:
>> [1584302.799361]  dump_stack+0x58/0x6b
>> [1584302.821791]  dump_header+0x4c/0x2e6
>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>> [1584302.886641]  out_of_memory+0x54/0xa0
>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>> [1584302.947874]  __get_free_pages+0x8/0x30
>> [1584302.967246]  pgd_alloc+0x21/0x180
>> [1584302.986355]  mm_alloc+0x1af/0x250
>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>> [1584303.023328]  do_execveat_common+0x8b/0x330
>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>> [1584303.058513]  do_syscall_64+0x2d/0x40
>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [1584303.091891] RIP: 0033:0x488376
>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>> [1584303.379094] Mem-Info:
>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>                  active_file:12975 inactive_file:168 isolated_file:32
>>                  unevictable:909709 dirty:12864 writeback:10
>>                  slab_reclaimable:42415 slab_unreclaimable:154783
>>                  mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>                  free:537002 free_pcp:1813 free_cma:0
>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>> [1584304.287094] 933871 total pagecache pages
>> [1584304.312815] 4165289 pages RAM
>> [1584304.337915] 0 pages HighMem/MovableOnly
>> [1584304.362522] 104766 pages reserved
>> [1584304.386516] 0 pages hwpoisoned
>> 
>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>> 
>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>> 
>>>> Hi Wei
>>>> Check this:
>>>> 
>>>> [   39.706567] ------------[ cut here ]------------
>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>> 
>>> Probably more relevant to Intel maintainers than Wei :/
>>> 
>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [   39.706656] Call Trace:
>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>> [   39.706762]  kthread+0xea/0x120
>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>> 
>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>> This patch tries to fix this race by adding a new bit
>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>> kthread and other scenarios and fixes the race issue.
>>>>> 
>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>> ---
>>>>> Change since v3:
>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>  ____napi_schedule().
>>>>> 
>>>>> include/linux/netdevice.h |  2 ++
>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>> --- a/include/linux/netdevice.h
>>>>> +++ b/include/linux/netdevice.h
>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>     NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>     NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>     NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>> };
>>>>> 
>>>>> enum {
>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>     NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>     NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>     NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>> };
>>>>> 
>>>>> enum gro_result {
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>              */
>>>>>             thread = READ_ONCE(napi->thread);
>>>>>             if (thread) {
>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>> +                      * makes sure to proceed with napi polling
>>>>> +                      * if the thread is explicitly woken from here.
>>>>> +                      */
>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>                     wake_up_process(thread);
>>>>>                     return;
>>>>>             }
>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>             WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>> 
>>>>>             new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>                           NAPIF_STATE_PREFER_BUSY_POLL);
>>>>> 
>>>>>             /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>> 
>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>> {
>>>>> +     bool woken = false;
>>>>> +
>>>>>     set_current_state(TASK_INTERRUPTIBLE);
>>>>> 
>>>>>     while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>> +              */
>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>                     WARN_ON(!list_empty(&napi->poll_list));
>>>>>                     __set_current_state(TASK_RUNNING);
>>>>>                     return 0;
>>>>>             }
>>>>> 
>>>>>             schedule();
>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>> +             woken = true;
>>>>>             set_current_state(TASK_INTERRUPTIBLE);
>>>>>     }
>>>>>     __set_current_state(TASK_RUNNING);
>>>>> --
>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>> 
>>>> 
>> 

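For anyone reading the quoted patch, the ownership rule it adds can be modelled in user space roughly as below. This is only a sketch using C11 atomics: the model_* names are invented here and are not kernel symbols, and the TASK_INTERRUPTIBLE / "woken" shortcut from the patch is left out on purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for NAPIF_STATE_SCHED / NAPIF_STATE_SCHED_THREADED. */
enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the napi */
	MODEL_SCHED_THREADED = 1u << 1, /* ...and that somebody is the kthread */
};

struct model_napi {
	atomic_uint state;
};

/* Busy poll (or napi_disable) path: grabs SCHED only. */
bool model_busy_poll_grab(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Threaded schedule path: grabs SCHED, then marks the napi for the kthread. */
bool model_schedule_threaded(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	if (old & MODEL_SCHED)
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Check used before the patch: SCHED alone was taken as "kthread owns it". */
bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check used after the patch: only SCHED_THREADED means the kthread owns it. */
bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Completion clears both bits, as napi_complete_done() does in the patch. */
void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Walk through the race: busy poll takes SCHED while the kthread looks on. */
void model_demo(void)
{
	struct model_napi n;

	atomic_init(&n.state, 0);
	assert(model_busy_poll_grab(&n));
	assert(model_kthread_may_poll_old(&n));  /* old check: kthread would poll too */
	assert(!model_kthread_may_poll_new(&n)); /* new check: kthread stays out */
	model_complete(&n);
}
```

With SCHED alone, the old check cannot tell the busy poll owner from the kthread; the extra bit is what separates the two cases in this model.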

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-09 11:18         ` [PATCH net v4] net: fix race between napi kthread mode and busy poll Martin Zaharinov
@ 2021-09-10  0:30           ` Wei Wang
  2021-09-10  1:57             ` Martin Zaharinov
  2021-09-15 14:22             ` Martin Zaharinov
  0 siblings, 2 replies; 25+ messages in thread
From: Wei Wang @ 2021-09-10  0:30 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Martin,

Is there a reproducer for this? What kind of traffic is it running?
What are the current values of the following:
cat /proc/sys/net/core/busy_poll
cat /proc/sys/net/core/busy_read
cat /sys/class/net/<ixgbe_dev>/threaded
And is SO_PREFER_BUSY_POLL used?
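
If it is easier to collect these from a program, a rough C sketch could look like the one below. The helper names (read_int_setting, report_busy_poll_settings, socket_prefers_busy_poll) are made up for illustration, the interface name is a placeholder, and the fallback value 69 for SO_PREFER_BUSY_POLL is the asm-generic number, assumed only for headers that predate 5.11.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/* Assumption: older headers may not define this; 69 is the asm-generic
 * value (a few architectures use a different number). */
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif

/* Read one integer from a sysctl/sysfs style text file.
 * Returns 0 and stores the value in *out, or -1 on any failure. */
int read_int_setting(const char *path, long *out)
{
	FILE *f;
	int ok;

	assert(path && out);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print the three settings; ifname stands in for the ixgbe device name. */
void report_busy_poll_settings(const char *ifname)
{
	char threaded[256];
	const char *paths[3];
	long v;
	size_t i;

	snprintf(threaded, sizeof(threaded), "/sys/class/net/%s/threaded", ifname);
	paths[0] = "/proc/sys/net/core/busy_poll";
	paths[1] = "/proc/sys/net/core/busy_read";
	paths[2] = threaded;

	for (i = 0; i < 3; i++) {
		if (read_int_setting(paths[i], &v) == 0)
			printf("%s = %ld\n", paths[i], v);
		else
			printf("%s: unreadable\n", paths[i]);
	}
}

/* Per-socket check: 1 if SO_PREFER_BUSY_POLL is set, 0 if not,
 * -1 if the kernel rejects the query (e.g. option not supported). */
int socket_prefers_busy_poll(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

SO_PREFER_BUSY_POLL is a per-socket option, so it cannot be read from /proc or /sys; the last helper would have to run inside the application that owns the socket.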

Thanks.
Wei



On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Eric and Wei
>
> Please see this bug report from the last hour.
> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up.
> Uptime before crash: 10 days.
>
> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>
> > On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>
> >> Hi Eric and Wei
> >>
> >> Please check this log :
> >>
> >
> > Please send a normal report to netdev.
> >
> > This has nothing to do with us (Eric & Wei)
> >
> > Thanks.
> >
> >>
> >> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
> >> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
> >> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >> [1584289.107263] Call Trace:
> >> [1584289.107266]  dump_stack+0x58/0x6b
> >> [1584289.209562]  warn_alloc.cold+0x70/0xd4
> >> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
> >> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
> >> [1584289.474009]  allocate_slab+0x272/0x450
> >> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
> >> [1584289.519147]  kmem_cache_alloc+0x110/0x120
> >> [1584289.541416]  build_skb+0x1a/0x200
> >> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
> >> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
> >> [1584289.605528]  __napi_poll+0x1f/0x130
> >> [1584289.625842]  napi_threaded_poll+0x110/0x160
> >> [1584289.646110]  ? __napi_poll+0x130/0x130
> >> [1584289.665810]  kthread+0xea/0x120
> >> [1584289.684836]  ? kthread_park+0x80/0x80
> >> [1584289.703440]  ret_from_fork+0x1f/0x30
> >> [1584289.721616] Mem-Info:
> >> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
> >>                  active_file:17408 inactive_file:149 isolated_file:32
> >>                  unevictable:1440359 dirty:17500 writeback:0
> >>                  slab_reclaimable:43368 slab_unreclaimable:155124
> >>                  mapped:817431 shmem:7650 pagetables:32093 bounce:0
> >>                  free:17832 free_pcp:113 free_cma:0
> >> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
> >> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
> >> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
> >> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
> >> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
> >> [1584290.237051] lowmem_reserve[]: 0 0 0 0
> >> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
> >> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
> >> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
> >> [1584290.409087] 1465768 total pagecache pages
> >> [1584290.434531] 4165289 pages RAM
> >> [1584290.459616] 0 pages HighMem/MovableOnly
> >> [1584290.484480] 104766 pages reserved
> >> [1584290.508709] 0 pages hwpoisoned
> >> [1584301.710231] team0: Failed to send options change via netlink (err -105)
> >> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
> >> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
> >> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >> [1584302.776532] Call Trace:
> >> [1584302.799361]  dump_stack+0x58/0x6b
> >> [1584302.821791]  dump_header+0x4c/0x2e6
> >> [1584302.843580]  oom_kill_process.cold+0xb/0x10
> >> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
> >> [1584302.886641]  out_of_memory+0x54/0xa0
> >> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
> >> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
> >> [1584302.947874]  __get_free_pages+0x8/0x30
> >> [1584302.967246]  pgd_alloc+0x21/0x180
> >> [1584302.986355]  mm_alloc+0x1af/0x250
> >> [1584303.005085]  alloc_bprm+0x80/0x2a0
> >> [1584303.023328]  do_execveat_common+0x8b/0x330
> >> [1584303.041181]  __x64_sys_execve+0x2b/0x40
> >> [1584303.058513]  do_syscall_64+0x2d/0x40
> >> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [1584303.091891] RIP: 0033:0x488376
> >> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> >> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
> >> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
> >> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
> >> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
> >> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
> >> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
> >> [1584303.379094] Mem-Info:
> >> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
> >>                  active_file:12975 inactive_file:168 isolated_file:32
> >>                  unevictable:909709 dirty:12864 writeback:10
> >>                  slab_reclaimable:42415 slab_unreclaimable:154783
> >>                  mapped:39825 shmem:14744 pagetables:26041 bounce:0
> >>                  free:537002 free_pcp:1813 free_cma:0
> >> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
> >> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
> >> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
> >> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
> >> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
> >> [1584304.036531] lowmem_reserve[]: 0 0 0 0
> >> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
> >> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
> >> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
> >> [1584304.287094] 933871 total pagecache pages
> >> [1584304.312815] 4165289 pages RAM
> >> [1584304.337915] 0 pages HighMem/MovableOnly
> >> [1584304.362522] 104766 pages reserved
> >> [1584304.386516] 0 pages hwpoisoned
> >>
> >>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> >>>
> >>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>
> >>>> Hi Wei
> >>>> Check this:
> >>>>
> >>>> [   39.706567] ------------[ cut here ]------------
> >>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> >>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>
> >>> Probably more relevant to Intel maintainers than Wei :/
> >>>
> >>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> >>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> >>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> >>>> [   39.706619] Workqueue: events work_for_cpu_fn
> >>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> >>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> >>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> >>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> >>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> >>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> >>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> >>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> >>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> >>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>> [   39.706656] Call Trace:
> >>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> >>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> >>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> >>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> >>>> [   39.706716]  ? __kmalloc+0x37/0x160
> >>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> >>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> >>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> >>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> >>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> >>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> >>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> >>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> >>>> [   39.706746]  local_pci_probe+0x1b/0x40
> >>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
> >>>> [   39.706754]  process_one_work+0x1ec/0x350
> >>>> [   39.706758]  worker_thread+0x24b/0x4d0
> >>>> [   39.706760]  ? process_one_work+0x350/0x350
> >>>> [   39.706762]  kthread+0xea/0x120
> >>>> [   39.706766]  ? kthread_park+0x80/0x80
> >>>> [   39.706770]  ret_from_fork+0x1f/0x30
> >>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>>>
> >>>> Martin
> >>>>
> >>>>
> >>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>>>
> >>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>>>> determine if the kthread owns this napi and could call napi->poll() on
> >>>>> it. However, if socket busy poll is enabled, it is possible that the
> >>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>>>> This patch tries to fix this race by adding a new bit
> >>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>>>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>>>> bit is set. This helps distinguish the ownership of the napi between
> >>>>> kthread and other scenarios and fixes the race issue.
> >>>>>
> >>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> >>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
> >>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> >>>>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
> >>>>> Cc: Eric Dumazet <edumazet@google.com>
> >>>>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >>>>> ---
> >>>>> Change since v3:
> >>>>> - Add READ_ONCE() for thread->state and add comments in
> >>>>>  ____napi_schedule().
> >>>>>
> >>>>> include/linux/netdevice.h |  2 ++
> >>>>> net/core/dev.c            | 19 ++++++++++++++++++-
> >>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>> index 5b67ea89d5f2..87a5d186faff 100644
> >>>>> --- a/include/linux/netdevice.h
> >>>>> +++ b/include/linux/netdevice.h
> >>>>> @@ -360,6 +360,7 @@ enum {
> >>>>>     NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >>>>>     NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >>>>>     NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> >>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >>>>> };
> >>>>>
> >>>>> enum {
> >>>>> @@ -372,6 +373,7 @@ enum {
> >>>>>     NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >>>>>     NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >>>>>     NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> >>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >>>>> };
> >>>>>
> >>>>> enum gro_result {
> >>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>> index 6c5967e80132..d3195a95f30e 100644
> >>>>> --- a/net/core/dev.c
> >>>>> +++ b/net/core/dev.c
> >>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >>>>>              */
> >>>>>             thread = READ_ONCE(napi->thread);
> >>>>>             if (thread) {
> >>>>> +                     /* Avoid doing set_bit() if the thread is in
> >>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
> >>>>> +                      * makes sure to proceed with napi polling
> >>>>> +                      * if the thread is explicitly woken from here.
> >>>>> +                      */
> >>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> >>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> >>>>>                     wake_up_process(thread);
> >>>>>                     return;
> >>>>>             }
> >>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> >>>>>             WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> >>>>>
> >>>>>             new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> >>>>> +                           NAPIF_STATE_SCHED_THREADED |
> >>>>>                           NAPIF_STATE_PREFER_BUSY_POLL);
> >>>>>
> >>>>>             /* If STATE_MISSED was set, leave STATE_SCHED set,
> >>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> >>>>>
> >>>>> static int napi_thread_wait(struct napi_struct *napi)
> >>>>> {
> >>>>> +     bool woken = false;
> >>>>> +
> >>>>>     set_current_state(TASK_INTERRUPTIBLE);
> >>>>>
> >>>>>     while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> >>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> >>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
> >>>>> +              * kthread owns this napi and could poll on this napi.
> >>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
> >>>>> +              * set by some other busy poll thread or by napi_disable().
> >>>>> +              */
> >>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> >>>>>                     WARN_ON(!list_empty(&napi->poll_list));
> >>>>>                     __set_current_state(TASK_RUNNING);
> >>>>>                     return 0;
> >>>>>             }
> >>>>>
> >>>>>             schedule();
> >>>>> +             /* woken being true indicates this thread owns this napi. */
> >>>>> +             woken = true;
> >>>>>             set_current_state(TASK_INTERRUPTIBLE);
> >>>>>     }
> >>>>>     __set_current_state(TASK_RUNNING);
> >>>>> --
> >>>>> 2.31.0.rc2.261.g7f71774620-goog
> >>>>>
> >>>>
> >>
>

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-10  0:30           ` Wei Wang
@ 2021-09-10  1:57             ` Martin Zaharinov
  2021-09-15 14:22             ` Martin Zaharinov
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-10  1:57 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei,

The problem is hard to reproduce. This is the second time I have seen it within one month.
The server is used for firewall/NAT/PPPoE and has users connected to it.


cat /proc/sys/net/core/busy_poll - 50
cat /proc/sys/net/core/busy_read - 50
cat /sys/class/net/eth0/threaded - 1
cat /sys/class/net/eth1/threaded - 1 


SO_PREFER_BUSY_POLL is probably not used. How can I check and enable it?
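
For reference on the question above: SO_PREFER_BUSY_POLL is a per-socket option that an application sets with setsockopt(); it is not a sysctl, so it cannot be read from /proc. Below is a minimal userspace sketch, assuming Linux 5.11 or later built with CONFIG_NET_RX_BUSY_POLL. The helper names query_prefer_busy_poll() and enable_prefer_busy_poll() are hypothetical, the fallback option numbers are the asm-generic ones (some architectures use different values), and nothing here says whether kresd itself sets the option.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallback values from the asm-generic socket.h, for older libc headers.
 * Assumption: an architecture that uses the asm-generic numbering (e.g. x86).
 */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Hypothetical helper: report whether SO_PREFER_BUSY_POLL is set on fd.
 * Returns 1 or 0 for the current setting, or -errno on failure
 * (for example -ENOPROTOOPT on kernels without the option).
 */
int query_prefer_busy_poll(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &val, &len) < 0)
		return -errno;
	return val != 0;
}

/* Hypothetical helper: turn on preferred busy polling for one socket.
 * usecs is the per-socket busy poll time and budget the poll budget.
 * Returns 0 on success or -errno; the kernel may answer -EPERM when
 * the caller lacks CAP_NET_ADMIN.
 */
int enable_prefer_busy_poll(int fd, int usecs, int budget)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &on, sizeof(on)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget)) < 0)
		return -errno;
	return 0;
}
```

An application would call enable_prefer_busy_poll(fd, 50, 8) on its own socket after creating it; the values 50 and 8 are only illustrative. Because the option lives on the socket, the check has to be done from inside the process that owns the socket.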

P.S.

eth0 and eth1 are joined in one common teamd/bond device.

Best regards,
Martin

> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> 
> Hi Martin,
> 
> Is there a reproducer for this? What kind of traffic is it running?
> What is the following config:
> cat /proc/sys/net/core/busy_poll
> cat /proc/sys/net/core/busy_read
> cat /sys/class/net/<ixgbe_dev>/threaded
> And is SO_PREFER_BUSY_POLL used?
> 
> Thanks.
> Wei
> 
> 
> 
> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Eric and Wei
>> 
>> Please see this bug report from last hour ,
>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up
>> Uptime before crash : 10day
>> 
>> 
>> 
>> 
>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>> 
>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>> 
>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>> 
>>>> Hi Eric and Wei
>>>> 
>>>> Please check this log:
>>>> 
>>> 
>>> Please send a normal report to netdev.
>>> 
>>> This has nothing to do with us (Eric & Wei)
>>> 
>>> Thanks.
>>> 
>>>> 
>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>> [1584289.107263] Call Trace:
>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>> [1584289.665810]  kthread+0xea/0x120
>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>> [1584289.721616] Mem-Info:
>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>                 active_file:17408 inactive_file:149 isolated_file:32
>>>>                 unevictable:1440359 dirty:17500 writeback:0
>>>>                 slab_reclaimable:43368 slab_unreclaimable:155124
>>>>                 mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>                 free:17832 free_pcp:113 free_cma:0
>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>> [1584290.409087] 1465768 total pagecache pages
>>>> [1584290.434531] 4165289 pages RAM
>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>> [1584290.484480] 104766 pages reserved
>>>> [1584290.508709] 0 pages hwpoisoned
>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>> [1584302.776532] Call Trace:
>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [1584303.091891] RIP: 0033:0x488376
>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>> [1584303.379094] Mem-Info:
>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>                 active_file:12975 inactive_file:168 isolated_file:32
>>>>                 unevictable:909709 dirty:12864 writeback:10
>>>>                 slab_reclaimable:42415 slab_unreclaimable:154783
>>>>                 mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>                 free:537002 free_pcp:1813 free_cma:0
>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>> [1584304.287094] 933871 total pagecache pages
>>>> [1584304.312815] 4165289 pages RAM
>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>> [1584304.362522] 104766 pages reserved
>>>> [1584304.386516] 0 pages hwpoisoned
>>>> 
>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>> 
>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Wei
>>>>>> Check this:
>>>>>> 
>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>> 
>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>> 
>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> [   39.706656] Call Trace:
>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>> 
>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>> 
>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>> 
>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>> ---
>>>>>>> Change since v3:
>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>> ____napi_schedule().
>>>>>>> 
>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>> 
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>    NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>    NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>    NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>> };
>>>>>>> 
>>>>>>> enum {
>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>    NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>    NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>    NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>> };
>>>>>>> 
>>>>>>> enum gro_result {
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>             */
>>>>>>>            thread = READ_ONCE(napi->thread);
>>>>>>>            if (thread) {
>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>> +                      */
>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>                    wake_up_process(thread);
>>>>>>>                    return;
>>>>>>>            }
>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>            WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>> 
>>>>>>>            new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>                          NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>> 
>>>>>>>            /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>> 
>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>> {
>>>>>>> +     bool woken = false;
>>>>>>> +
>>>>>>>    set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> 
>>>>>>>    while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>> +              */
>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>                    WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>                    __set_current_state(TASK_RUNNING);
>>>>>>>                    return 0;
>>>>>>>            }
>>>>>>> 
>>>>>>>            schedule();
>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>> +             woken = true;
>>>>>>>            set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>    }
>>>>>>>    __set_current_state(TASK_RUNNING);
>>>>>>> --
>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>> 
>>>>>> 
>>>> 
>> 
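The ownership scheme described in the quoted patch can be modeled in userspace. This is only a sketch under stated assumptions: the MODEL_* bits and model_* helpers are hypothetical names, C11 atomics stand in for the kernel bit operations, and the TASK_INTERRUPTIBLE check and the "woken" flag from the patch are left out. It shows why testing SCHED alone lets the kthread believe it owns a napi that a busy poller has grabbed, while testing SCHED_THREADED does not.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the poll */
	MODEL_SCHED_THREADED = 1u << 1, /* the owner is the napi kthread */
};

struct model_napi {
	atomic_uint state;
};

/* Busy poll (or a disable path) grabs SCHED only. Returns true on success. */
static bool model_busy_poll_grab(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Threaded-mode schedule: grab SCHED, then mark kthread ownership. */
static bool model_schedule_threaded(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	if (old & MODEL_SCHED)
		return false; /* already owned by someone else */
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Check used before the patch: SCHED alone. */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check used after the patch: the kthread-specific bit. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Completion clears both bits, as the patch does in napi_complete_done(). */
static void model_complete(struct model_napi *n)
{
	unsigned int mask = MODEL_SCHED | MODEL_SCHED_THREADED;
	unsigned int old = atomic_fetch_and(&n->state, ~mask);

	assert(old & MODEL_SCHED); /* completing an unowned napi is a bug */
}
```

In this model, after model_busy_poll_grab() succeeds the old check reports that the kthread may poll, which is the race; the new check reports that it may not.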



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-10  0:30           ` Wei Wang
  2021-09-10  1:57             ` Martin Zaharinov
@ 2021-09-15 14:22             ` Martin Zaharinov
  2021-09-15 15:45               ` Wei Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-15 14:22 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei,
Please see this bug log:


Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.




> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> 
> Hi Martin,
> 
> Is there a reproducer for this? What kind of traffic is it running?
> What are the following settings:
> cat /proc/sys/net/core/busy_poll
> cat /proc/sys/net/core/busy_read
> cat /sys/class/net/<ixgbe_dev>/threaded
> And is SO_PREFER_BUSY_POLL used?
> 
> Thanks.
> Wei
> 
> 
> 
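The settings asked about above could also be gathered in one go. A minimal sketch, assuming a Linux host: read_setting() and print_napi_settings() are hypothetical helper names, and the interface name is supplied by the caller. SO_PREFER_BUSY_POLL is a per-socket option that the application sets with setsockopt(), so it cannot be read from these files and is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a file into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
static int read_setting(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Print busy_poll, busy_read and the per-device threaded flag.
 * Returns how many of the three settings could be read. */
static int print_napi_settings(const char *ifname, FILE *out)
{
	const char *fixed[] = {
		"/proc/sys/net/core/busy_poll",
		"/proc/sys/net/core/busy_read",
	};
	char path[256], val[64];
	int ok = 0;
	size_t i;

	for (i = 0; i < sizeof(fixed) / sizeof(fixed[0]); i++) {
		if (read_setting(fixed[i], val, sizeof(val)) == 0) {
			fprintf(out, "%s = %s\n", fixed[i], val);
			ok++;
		} else {
			fprintf(out, "%s: unreadable\n", fixed[i]);
		}
	}

	snprintf(path, sizeof(path), "/sys/class/net/%s/threaded", ifname);
	if (read_setting(path, val, sizeof(val)) == 0) {
		fprintf(out, "%s = %s\n", path, val);
		ok++;
	} else {
		fprintf(out, "%s: unreadable\n", path);
	}
	return ok;
}
```

Calling print_napi_settings() with the ixgbe interface name and stdout prints one line per setting; a file that is missing (for example on a kernel without threaded napi) is reported as unreadable rather than treated as an error.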
> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Eric and Wei
>> 
>> Please see this bug report from the last hour.
>> Kernel 5.13.13; traffic is 7 Gb/s down / 7 Gb/s up.
>> Uptime before the crash: 10 days.
>> 
>> 
>> 
>> 
>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>> 
>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>> 
>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>> 
>>>> Hi Eric and Wei
>>>> 
>>>> Please check this log:
>>>> 
>>> 
>>> Please send a normal report to netdev.
>>> 
>>> This has nothing to do with us (Eric & Wei)
>>> 
>>> Thanks.
>>> 
>>>> 
>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>> [1584289.107263] Call Trace:
>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>> [1584289.665810]  kthread+0xea/0x120
>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>> [1584289.721616] Mem-Info:
>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>                 active_file:17408 inactive_file:149 isolated_file:32
>>>>                 unevictable:1440359 dirty:17500 writeback:0
>>>>                 slab_reclaimable:43368 slab_unreclaimable:155124
>>>>                 mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>                 free:17832 free_pcp:113 free_cma:0
>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>> [1584290.409087] 1465768 total pagecache pages
>>>> [1584290.434531] 4165289 pages RAM
>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>> [1584290.484480] 104766 pages reserved
>>>> [1584290.508709] 0 pages hwpoisoned
>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>> [1584302.776532] Call Trace:
>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [1584303.091891] RIP: 0033:0x488376
>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>> [1584303.379094] Mem-Info:
>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>                 active_file:12975 inactive_file:168 isolated_file:32
>>>>                 unevictable:909709 dirty:12864 writeback:10
>>>>                 slab_reclaimable:42415 slab_unreclaimable:154783
>>>>                 mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>                 free:537002 free_pcp:1813 free_cma:0
>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>> [1584304.287094] 933871 total pagecache pages
>>>> [1584304.312815] 4165289 pages RAM
>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>> [1584304.362522] 104766 pages reserved
>>>> [1584304.386516] 0 pages hwpoisoned
>>>> 
>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>> 
>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Wei
>>>>>> Check this:
>>>>>> 
>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>> 
>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>> 
>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> [   39.706656] Call Trace:
>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>> 
>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>> 
>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>> 
>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>> ---
>>>>>>> Change since v3:
>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>> ____napi_schedule().
>>>>>>> 
>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>> 
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>    NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>    NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>    NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>> };
>>>>>>> 
>>>>>>> enum {
>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>    NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>    NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>    NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>> };
>>>>>>> 
>>>>>>> enum gro_result {
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>             */
>>>>>>>            thread = READ_ONCE(napi->thread);
>>>>>>>            if (thread) {
>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>> +                      */
>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>                    wake_up_process(thread);
>>>>>>>                    return;
>>>>>>>            }
>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>            WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>> 
>>>>>>>            new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>                          NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>> 
>>>>>>>            /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>> 
>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>> {
>>>>>>> +     bool woken = false;
>>>>>>> +
>>>>>>>    set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> 
>>>>>>>    while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>> +              */
>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>                    WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>                    __set_current_state(TASK_RUNNING);
>>>>>>>                    return 0;
>>>>>>>            }
>>>>>>> 
>>>>>>>            schedule();
>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>> +             woken = true;
>>>>>>>            set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>    }
>>>>>>>    __set_current_state(TASK_RUNNING);
>>>>>>> --
>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>> 
>>>>>> 
>>>> 
>> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-15 14:22             ` Martin Zaharinov
@ 2021-09-15 15:45               ` Wei Wang
  2021-09-15 20:57                 ` Martin Zaharinov
  2021-09-22 14:12                 ` Martin Zaharinov
  0 siblings, 2 replies; 25+ messages in thread
From: Wei Wang @ 2021-09-15 15:45 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Thanks, Martin, for the report.
Without a reproducer, this may be hard to debug. I will double-check
the code for a potential race between kthread poll and busy
poll.

Thanks.
Wei
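
For readers following the thread, here is a minimal user-space sketch of
the ownership question behind "race between kthread poll and busy poll".
It is not kernel code. The `model_*` names and the two-bit state word are
hypothetical stand-ins for NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED,
and the sketch leaves out the TASK_INTERRUPTIBLE check and the `woken`
flag from the v4 patch. It only shows why testing the SCHED bit alone
cannot tell the kthread whether it owns the napi.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's NAPI state bits (model only). */
enum {
	MODEL_SCHED          = 1 << 0, /* some context owns the napi for polling */
	MODEL_SCHED_THREADED = 1 << 1, /* that owner is the napi kthread */
};

struct model_napi {
	atomic_uint state;
};

/* Take ownership only if SCHED is currently clear; "extra" bits ride along. */
static bool model_try_own(struct model_napi *n, unsigned int extra)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & MODEL_SCHED)
			return false;
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | MODEL_SCHED | extra));
	return true;
}

/* Threaded-mode scheduling: simplified to set both bits in one step. */
static bool model_schedule_threaded(struct model_napi *n)
{
	return model_try_own(n, MODEL_SCHED_THREADED);
}

/* Busy poll takes SCHED only and never marks the kthread as owner. */
static bool model_busy_poll_grab(struct model_napi *n)
{
	return model_try_own(n, 0);
}

/* Completion drops both bits, mirroring the patched napi_complete_done(). */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch style check: is SCHED set at all? */
static bool kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Patched style check: is the kthread-specific bit set? */
static bool kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Replays the interleaving from the patch description; returns 0 on success. */
static int model_replay_race(void)
{
	struct model_napi n;

	atomic_init(&n.state, 0);

	assert(model_schedule_threaded(&n)); /* kthread owns the napi */
	model_complete(&n);                  /* poll done, both bits cleared */
	assert(model_busy_poll_grab(&n));    /* busy poll grabs SCHED */

	assert(kthread_may_poll_old(&n));    /* old check: kthread would also poll */
	assert(!kthread_may_poll_new(&n));   /* new check: kthread stays out */
	return 0;
}
```

Calling `model_replay_race()` from a small `main()` walks through the
sequence. The point is only that the second bit records who took
ownership. Whether the crashes reported in this thread come from this or
from a different path is not established here.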

On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
> Please see this bug log:
>
>
> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>
>
>
>
> > On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> >
> > Hi Martin,
> >
> > Is there a reproducer for this? What kind of traffic is it running?
> > What are the following settings:
> > cat /proc/sys/net/core/busy_poll
> > cat /proc/sys/net/core/busy_read
> > cat /sys/class/net/<ixgbe_dev>/threaded
> > And is SO_PREFER_BUSY_POLL used?
> >
> > Thanks.
> > Wei
> >
> >
> >
> > On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>
> >> Hi Eric and Wei
> >>
> >> Please see this bug report from the last hour.
> >> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up.
> >> Uptime before crash: 10 days.
> >>
> >>
> >>
> >>
> >> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> >> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> >> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> >> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> >> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> >> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> >> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> >> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> >> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> >> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> >> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> >> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> >> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> >> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> >> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> >> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> >> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> >> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> >> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> >> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> >> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> >> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> >> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> >> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> >> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> >> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> >> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> >> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> >> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> >> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> >> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> >> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> >> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> >> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> >> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> >> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> >> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> >> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> >> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> >> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>
> >>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >>>
> >>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>
> >>>> Hi Eric and Wei
> >>>>
> >>>> Please check this log :
> >>>>
> >>>
> >>> Please send a normal report to netdev.
> >>>
> >>> This has nothing to do with us (Eric & Wei)
> >>>
> >>> Thanks.
> >>>
> >>>>
> >>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
> >>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
> >>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>> [1584289.107263] Call Trace:
> >>>> [1584289.107266]  dump_stack+0x58/0x6b
> >>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
> >>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
> >>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
> >>>> [1584289.474009]  allocate_slab+0x272/0x450
> >>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
> >>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
> >>>> [1584289.541416]  build_skb+0x1a/0x200
> >>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
> >>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
> >>>> [1584289.605528]  __napi_poll+0x1f/0x130
> >>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
> >>>> [1584289.646110]  ? __napi_poll+0x130/0x130
> >>>> [1584289.665810]  kthread+0xea/0x120
> >>>> [1584289.684836]  ? kthread_park+0x80/0x80
> >>>> [1584289.703440]  ret_from_fork+0x1f/0x30
> >>>> [1584289.721616] Mem-Info:
> >>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
> >>>>                 active_file:17408 inactive_file:149 isolated_file:32
> >>>>                 unevictable:1440359 dirty:17500 writeback:0
> >>>>                 slab_reclaimable:43368 slab_unreclaimable:155124
> >>>>                 mapped:817431 shmem:7650 pagetables:32093 bounce:0
> >>>>                 free:17832 free_pcp:113 free_cma:0
> >>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
> >>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
> >>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
> >>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
> >>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
> >>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
> >>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
> >>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
> >>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
> >>>> [1584290.409087] 1465768 total pagecache pages
> >>>> [1584290.434531] 4165289 pages RAM
> >>>> [1584290.459616] 0 pages HighMem/MovableOnly
> >>>> [1584290.484480] 104766 pages reserved
> >>>> [1584290.508709] 0 pages hwpoisoned
> >>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
> >>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
> >>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
> >>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>> [1584302.776532] Call Trace:
> >>>> [1584302.799361]  dump_stack+0x58/0x6b
> >>>> [1584302.821791]  dump_header+0x4c/0x2e6
> >>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
> >>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
> >>>> [1584302.886641]  out_of_memory+0x54/0xa0
> >>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
> >>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
> >>>> [1584302.947874]  __get_free_pages+0x8/0x30
> >>>> [1584302.967246]  pgd_alloc+0x21/0x180
> >>>> [1584302.986355]  mm_alloc+0x1af/0x250
> >>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
> >>>> [1584303.023328]  do_execveat_common+0x8b/0x330
> >>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
> >>>> [1584303.058513]  do_syscall_64+0x2d/0x40
> >>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>> [1584303.091891] RIP: 0033:0x488376
> >>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> >>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
> >>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
> >>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
> >>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
> >>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
> >>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
> >>>> [1584303.379094] Mem-Info:
> >>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
> >>>>                 active_file:12975 inactive_file:168 isolated_file:32
> >>>>                 unevictable:909709 dirty:12864 writeback:10
> >>>>                 slab_reclaimable:42415 slab_unreclaimable:154783
> >>>>                 mapped:39825 shmem:14744 pagetables:26041 bounce:0
> >>>>                 free:537002 free_pcp:1813 free_cma:0
> >>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
> >>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
> >>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
> >>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
> >>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
> >>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
> >>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
> >>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
> >>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
> >>>> [1584304.287094] 933871 total pagecache pages
> >>>> [1584304.312815] 4165289 pages RAM
> >>>> [1584304.337915] 0 pages HighMem/MovableOnly
> >>>> [1584304.362522] 104766 pages reserved
> >>>> [1584304.386516] 0 pages hwpoisoned
> >>>>
> >>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>
> >>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Wei
> >>>>>> Check this:
> >>>>>>
> >>>>>> [   39.706567] ------------[ cut here ]------------
> >>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> >>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>
> >>>>> Probably more relevant to Intel maintainers than Wei :/
> >>>>>
> >>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> >>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> >>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> >>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
> >>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> >>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> >>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> >>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> >>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> >>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> >>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> >>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> >>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> >>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>> [   39.706656] Call Trace:
> >>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> >>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> >>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> >>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> >>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
> >>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> >>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> >>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> >>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> >>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> >>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> >>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> >>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> >>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
> >>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
> >>>>>> [   39.706754]  process_one_work+0x1ec/0x350
> >>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
> >>>>>> [   39.706760]  ? process_one_work+0x350/0x350
> >>>>>> [   39.706762]  kthread+0xea/0x120
> >>>>>> [   39.706766]  ? kthread_park+0x80/0x80
> >>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
> >>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>>>>>
> >>>>>> Martin
> >>>>>>
> >>>>>>
> >>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>>>>>
> >>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>>>>>> determine if the kthread owns this napi and could call napi->poll() on
> >>>>>>> it. However, if socket busy poll is enabled, it is possible that the
> >>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>>>>>> This patch tries to fix this race by adding a new bit
> >>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>>>>>> bit is set. This helps distinguish the ownership of the napi between
> >>>>>>> kthread and other scenarios and fixes the race issue.
> >>>>>>>
> >>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> >>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
> >>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
> >>>>>>> Cc: Eric Dumazet <edumazet@google.com>
> >>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >>>>>>> ---
> >>>>>>> Change since v3:
> >>>>>>> - Add READ_ONCE() for thread->state and add comments in
> >>>>>>> ____napi_schedule().
> >>>>>>>
> >>>>>>> include/linux/netdevice.h |  2 ++
> >>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
> >>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
> >>>>>>> --- a/include/linux/netdevice.h
> >>>>>>> +++ b/include/linux/netdevice.h
> >>>>>>> @@ -360,6 +360,7 @@ enum {
> >>>>>>>    NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >>>>>>>    NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >>>>>>>    NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> >>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >>>>>>> };
> >>>>>>>
> >>>>>>> enum {
> >>>>>>> @@ -372,6 +373,7 @@ enum {
> >>>>>>>    NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >>>>>>>    NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >>>>>>>    NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> >>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >>>>>>> };
> >>>>>>>
> >>>>>>> enum gro_result {
> >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>>>> index 6c5967e80132..d3195a95f30e 100644
> >>>>>>> --- a/net/core/dev.c
> >>>>>>> +++ b/net/core/dev.c
> >>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >>>>>>>             */
> >>>>>>>            thread = READ_ONCE(napi->thread);
> >>>>>>>            if (thread) {
> >>>>>>> +                     /* Avoid doing set_bit() if the thread is in
> >>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
> >>>>>>> +                      * makes sure to proceed with napi polling
> >>>>>>> +                      * if the thread is explicitly woken from here.
> >>>>>>> +                      */
> >>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> >>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> >>>>>>>                    wake_up_process(thread);
> >>>>>>>                    return;
> >>>>>>>            }
> >>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> >>>>>>>            WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> >>>>>>>
> >>>>>>>            new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> >>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
> >>>>>>>                          NAPIF_STATE_PREFER_BUSY_POLL);
> >>>>>>>
> >>>>>>>            /* If STATE_MISSED was set, leave STATE_SCHED set,
> >>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> >>>>>>>
> >>>>>>> static int napi_thread_wait(struct napi_struct *napi)
> >>>>>>> {
> >>>>>>> +     bool woken = false;
> >>>>>>> +
> >>>>>>>    set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>
> >>>>>>>    while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> >>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> >>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
> >>>>>>> +              * kthread owns this napi and could poll on this napi.
> >>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
> >>>>>>> +              * set by some other busy poll thread or by napi_disable().
> >>>>>>> +              */
> >>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> >>>>>>>                    WARN_ON(!list_empty(&napi->poll_list));
> >>>>>>>                    __set_current_state(TASK_RUNNING);
> >>>>>>>                    return 0;
> >>>>>>>            }
> >>>>>>>
> >>>>>>>            schedule();
> >>>>>>> +             /* woken being true indicates this thread owns this napi. */
> >>>>>>> +             woken = true;
> >>>>>>>            set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>    }
> >>>>>>>    __set_current_state(TASK_RUNNING);
> >>>>>>> --
> >>>>>>> 2.31.0.rc2.261.g7f71774620-goog
> >>>>>>>
> >>>>>>
> >>>>
> >>
>
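
The ownership rule described in the quoted patch can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not the kernel code: the model_* names and flag values are
hypothetical, C11 atomics stand in for the kernel bit operations, and
the "woken" shortcut in napi_thread_wait() is left out. It shows why
testing the SCHED bit alone cannot tell the kthread apart from a busy
poll thread, while the SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's real bit numbers differ. */
#define MODEL_SCHED          (1u << 0) /* somebody owns the NAPI */
#define MODEL_SCHED_THREADED (1u << 1) /* that somebody is the kthread */

/* Hypothetical stand-in for the state word of struct napi_struct. */
struct model_napi {
	atomic_uint state;
};

/* Busy poll and napi_disable() take ownership through SCHED alone. */
static bool model_grab_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Scheduling in threaded mode also marks the kthread as the owner. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_grab_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion drops both bits, as napi_complete_done() does in the patch. */
static void model_complete(struct model_napi *n)
{
	unsigned int old = atomic_fetch_and(&n->state,
			~(MODEL_SCHED | MODEL_SCHED_THREADED));

	assert(old & MODEL_SCHED); /* completing an unowned NAPI is a bug */
}

/* Pre-patch test in napi_thread_wait(): any owner looks like the kthread. */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch test: only a threaded schedule lets the kthread poll. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, after model_grab_sched() succeeds on behalf of a busy
poll thread, model_kthread_may_poll_old() reports true (the race) while
model_kthread_may_poll_new() reports false. After
model_schedule_threaded() both report true, and a concurrent
model_grab_sched() fails until model_complete() runs.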

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-15 15:45               ` Wei Wang
@ 2021-09-15 20:57                 ` Martin Zaharinov
  2021-09-22 14:12                 ` Martin Zaharinov
  1 sibling, 0 replies; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-15 20:57 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei

Yes, this is not easy to reproduce. I run this on 20 machines, and the one with the dump ran for the last 10-12 days without a problem.

Martin	 

> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
> 
> Thanks Martin for the report.
> Without a reproducer, it might be hard to debug. I will double check
> the code to check for potential race between kthread poll and busy
> poll.
> 
> Thanks.
> Wei
> 
> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Wei
>> Please see this bug log :
>> 
>> 
>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>> 
>> 
>> 
>> 
>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>>> 
>>> Hi Martin,
>>> 
>>> Is there a reproducer for this? What kind of traffic is it running?
>>> What do the following settings report:
>>> cat /proc/sys/net/core/busy_poll
>>> cat /proc/sys/net/core/busy_read
>>> cat /sys/class/net/<ixgbe_dev>/threaded
>>> And is SO_PREFER_BUSY_POLL used?
>>> 
>>> Thanks.
>>> Wei
>>> 
>>> 
>>> 
>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>> 
>>>> Hi Eric and Wei
>>>> 
>>>> Please see this bug report from the last hour.
>>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up
>>>> Uptime before crash: 10 days
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>>> 
>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>>> 
>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Eric and Wei
>>>>>> 
>>>>>> Please check this log:
>>>>>> 
>>>>> 
>>>>> Please send a normal report to netdev.
>>>>> 
>>>>> This has nothing to do with us (Eric & Wei).
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>> 
>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>> [1584289.107263] Call Trace:
>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>>>> [1584289.665810]  kthread+0xea/0x120
>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>>>> [1584289.721616] Mem-Info:
>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>>>                active_file:17408 inactive_file:149 isolated_file:32
>>>>>>                unevictable:1440359 dirty:17500 writeback:0
>>>>>>                slab_reclaimable:43368 slab_unreclaimable:155124
>>>>>>                mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>>>                free:17832 free_pcp:113 free_cma:0
>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>>>> [1584290.409087] 1465768 total pagecache pages
>>>>>> [1584290.434531] 4165289 pages RAM
>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>>>> [1584290.484480] 104766 pages reserved
>>>>>> [1584290.508709] 0 pages hwpoisoned
>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>> [1584302.776532] Call Trace:
>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [1584303.091891] RIP: 0033:0x488376
>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>>>> [1584303.379094] Mem-Info:
>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>>>                active_file:12975 inactive_file:168 isolated_file:32
>>>>>>                unevictable:909709 dirty:12864 writeback:10
>>>>>>                slab_reclaimable:42415 slab_unreclaimable:154783
>>>>>>                mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>>>                free:537002 free_pcp:1813 free_cma:0
>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>>>> [1584304.287094] 933871 total pagecache pages
>>>>>> [1584304.312815] 4165289 pages RAM
>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>>>> [1584304.362522] 104766 pages reserved
>>>>>> [1584304.386516] 0 pages hwpoisoned
>>>>>> 
>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>> 
>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Wei
>>>>>>>> Check this:
>>>>>>>> 
>>>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>> 
>>>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>>>> 
>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [   39.706656] Call Trace:
>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>>>> 
>>>>>>>> Martin
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>>>> 
>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>>> 
>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>>>> ---
>>>>>>>>> Change since v3:
>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>>>> ____napi_schedule().
>>>>>>>>> 
>>>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>>>   NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>>>   NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>>>   NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> enum {
>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>>>   NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>>>   NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>>>   NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> enum gro_result {
>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>>>> --- a/net/core/dev.c
>>>>>>>>> +++ b/net/core/dev.c
>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>>>            */
>>>>>>>>>           thread = READ_ONCE(napi->thread);
>>>>>>>>>           if (thread) {
>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>>>> +                      */
>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>>>                   wake_up_process(thread);
>>>>>>>>>                   return;
>>>>>>>>>           }
>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>>>           WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>>> 
>>>>>>>>>           new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>>>                         NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>>> 
>>>>>>>>>           /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>>> 
>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>>> {
>>>>>>>>> +     bool woken = false;
>>>>>>>>> +
>>>>>>>>>   set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>> 
>>>>>>>>>   while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>>>> +              */
>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>>>                   WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>>>                   __set_current_state(TASK_RUNNING);
>>>>>>>>>                   return 0;
>>>>>>>>>           }
>>>>>>>>> 
>>>>>>>>>           schedule();
>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>>>> +             woken = true;
>>>>>>>>>           set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>   }
>>>>>>>>>   __set_current_state(TASK_RUNNING);
>>>>>>>>> --
>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> 
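
The ownership rule described in the patch comments above can be sketched
in userspace C. This is only an illustrative model under stated
assumptions, not the kernel code: the DEMO_* bit values, struct demo_napi
and all demo_* helpers are hypothetical names, C11 atomics stand in for
the kernel bitops, and the task-state check and the "woken" path from the
patch are left out for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values only; the real definitions live in netdevice.h. */
enum {
	DEMO_SCHED          = 1u << 0, /* somebody owns the napi */
	DEMO_SCHED_THREADED = 1u << 1, /* the owner is the napi kthread */
};

/* Hypothetical stand-in for the state word of struct napi_struct. */
struct demo_napi {
	atomic_uint state;
};

/* Take ownership if nobody holds SCHED; optionally set extra bits too. */
static bool demo_try_acquire(struct demo_napi *n, unsigned int extra)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & DEMO_SCHED)
			return false; /* already owned by someone else */
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | DEMO_SCHED | extra));
	return true;
}

/* Schedule path with threaded mode on: the kthread becomes the owner. */
static bool demo_schedule_threaded(struct demo_napi *n)
{
	return demo_try_acquire(n, DEMO_SCHED_THREADED);
}

/* Busy-poll path: grabs SCHED only, never SCHED_THREADED. */
static bool demo_busy_poll_acquire(struct demo_napi *n)
{
	return demo_try_acquire(n, 0);
}

/* Check used before the patch: SCHED alone cannot tell who the owner is. */
static bool demo_kthread_may_poll_old(struct demo_napi *n)
{
	return (atomic_load(&n->state) & DEMO_SCHED) != 0;
}

/* Check used after the patch: only poll if the kthread is the owner. */
static bool demo_kthread_may_poll_new(struct demo_napi *n)
{
	return (atomic_load(&n->state) & DEMO_SCHED_THREADED) != 0;
}

/* Completion clears both bits, as the napi_complete_done() hunk does. */
static void demo_complete(struct demo_napi *n)
{
	assert(atomic_load(&n->state) & DEMO_SCHED);
	atomic_fetch_and(&n->state, ~(unsigned int)(DEMO_SCHED | DEMO_SCHED_THREADED));
}
```

In this model, once demo_busy_poll_acquire() has succeeded, the old check
still reports that the kthread may poll, while the new check does not.
That is the distinction the patch relies on.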


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-15 15:45               ` Wei Wang
  2021-09-15 20:57                 ` Martin Zaharinov
@ 2021-09-22 14:12                 ` Martin Zaharinov
  2021-09-23 20:31                   ` Martin Zaharinov
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-22 14:12 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei

One more bug report from the last few hours:



Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.

> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
> 
> Thanks Martin for the report.
> Without a reproducer, it might be hard to debug. I will double-check
> the code for a potential race between kthread poll and busy
> poll.
> 
> Thanks.
> Wei
> 
> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi Wei
>> Please see this bug log:
>> 
>> 
>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>> 
>> 
>> 
>> 
>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>>> 
>>> Hi Martin,
>>> 
>>> Is there a reproducer for this? What kind of traffic is it running?
>>> What is the following config:
>>> cat /proc/sys/net/core/busy_poll
>>> cat /proc/sys/net/core/busy_read
>>> cat /sys/class/net/<ixgbe_dev>/threaded
>>> And is SO_PREFER_BUSY_POLL used?
>>> 
>>> Thanks.
>>> Wei
>>> 
>>> 
>>> 
>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>> 
>>>> Hi Eric and Wei
>>>> 
>>>> Please see this bug report from the last hour.
>>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up.
>>>> Uptime before crash: 10 days
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>>> 
>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>>> 
>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Eric and Wei
>>>>>> 
>>>>>> Please check this log:
>>>>>> 
>>>>> 
>>>>> Please send a normal report to netdev.
>>>>> 
>>>>> This has nothing to do with us (Eric & Wei)
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>> 
>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>> [1584289.107263] Call Trace:
>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>>>> [1584289.665810]  kthread+0xea/0x120
>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>>>> [1584289.721616] Mem-Info:
>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>>>                active_file:17408 inactive_file:149 isolated_file:32
>>>>>>                unevictable:1440359 dirty:17500 writeback:0
>>>>>>                slab_reclaimable:43368 slab_unreclaimable:155124
>>>>>>                mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>>>                free:17832 free_pcp:113 free_cma:0
>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>>>> [1584290.409087] 1465768 total pagecache pages
>>>>>> [1584290.434531] 4165289 pages RAM
>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>>>> [1584290.484480] 104766 pages reserved
>>>>>> [1584290.508709] 0 pages hwpoisoned
>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>> [1584302.776532] Call Trace:
>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [1584303.091891] RIP: 0033:0x488376
>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>>>> [1584303.379094] Mem-Info:
>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>>>                active_file:12975 inactive_file:168 isolated_file:32
>>>>>>                unevictable:909709 dirty:12864 writeback:10
>>>>>>                slab_reclaimable:42415 slab_unreclaimable:154783
>>>>>>                mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>>>                free:537002 free_pcp:1813 free_cma:0
>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>>>> [1584304.287094] 933871 total pagecache pages
>>>>>> [1584304.312815] 4165289 pages RAM
>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>>>> [1584304.362522] 104766 pages reserved
>>>>>> [1584304.386516] 0 pages hwpoisoned
>>>>>> 
>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>> 
>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Wei
>>>>>>>> Check this:
>>>>>>>> 
>>>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>> 
>>>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>>>> 
>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [   39.706656] Call Trace:
>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>>>> 
>>>>>>>> Martin
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>>>> 
>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>>> 
>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>>>> ---
>>>>>>>>> Change since v3:
>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>>>> ____napi_schedule().
>>>>>>>>> 
>>>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>>>   NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>>>   NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>>>   NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> enum {
>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>>>   NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>>>   NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>>>   NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>>> };
>>>>>>>>> 
>>>>>>>>> enum gro_result {
>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>>>> --- a/net/core/dev.c
>>>>>>>>> +++ b/net/core/dev.c
>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>>>            */
>>>>>>>>>           thread = READ_ONCE(napi->thread);
>>>>>>>>>           if (thread) {
>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>>>> +                      */
>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>>>                   wake_up_process(thread);
>>>>>>>>>                   return;
>>>>>>>>>           }
>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>>>           WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>>> 
>>>>>>>>>           new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>>>                         NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>>> 
>>>>>>>>>           /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>>> 
>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>>> {
>>>>>>>>> +     bool woken = false;
>>>>>>>>> +
>>>>>>>>>   set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>> 
>>>>>>>>>   while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>>>> +              */
>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>>>                   WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>>>                   __set_current_state(TASK_RUNNING);
>>>>>>>>>                   return 0;
>>>>>>>>>           }
>>>>>>>>> 
>>>>>>>>>           schedule();
>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>>>> +             woken = true;
>>>>>>>>>           set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>   }
>>>>>>>>>   __set_current_state(TASK_RUNNING);
>>>>>>>>> --
>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>> 
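
For readers following the quoted patch, the ownership problem it describes can be modelled in user space. The sketch below is only an illustration, not the kernel code. The model_* names and bit values are hypothetical. It assumes a single atomic state word and merges the SCHED and SCHED_THREADED updates into one step, whereas the patch sets SCHED_THREADED separately in ____napi_schedule(). It also leaves out the "woken" path and the TASK_INTERRUPTIBLE check.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model. Bit names mirror the patch, values do not. */
enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the napi */
	MODEL_SCHED_THREADED = 1u << 1, /* the owner is the napi kthread */
};

struct model_napi {
	atomic_uint state;
};

/* Try to take ownership; fails if SCHED is already set by anyone. */
static bool model_try_sched(struct model_napi *n, unsigned int extra)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & MODEL_SCHED)
			return false;
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | MODEL_SCHED | extra));
	return true;
}

/* Threaded scheduling path: takes SCHED and marks it as kthread-owned. */
static bool model_schedule_threaded(struct model_napi *n)
{
	return model_try_sched(n, MODEL_SCHED_THREADED);
}

/* Busy poll (or a napi_disable()-like caller) takes SCHED only. */
static bool model_busy_poll_grab(struct model_napi *n)
{
	return model_try_sched(n, 0);
}

/* Completion clears both bits, as napi_complete_done() does in the patch. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch check: any SCHED owner is mistaken for the kthread. */
static bool model_kthread_owns_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch check: only the threaded scheduling path counts. */
static bool model_kthread_owns_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, when model_busy_poll_grab() wins the SCHED bit, model_kthread_owns_old() still reports ownership while model_kthread_owns_new() does not. That is the distinction the new bit is meant to provide.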


* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-22 14:12                 ` Martin Zaharinov
@ 2021-09-23 20:31                   ` Martin Zaharinov
  2021-09-24  0:54                     ` Wei Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-23 20:31 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei

If you find a fix for this, let me know and I will test it.

Threaded NAPI (kthread mode) is a very good solution for servers under heavy network load, but we need to find where this bug comes from.


Martin

> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Hi Wei
> 
> One more bug report from the last few hours:
> 
> 
> 
> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> 
>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
>> 
>> Thanks Martin for the report.
>> Without a reproducer, it might be hard to debug. I will double-check
>> the code for a potential race between kthread poll and busy
>> poll.
>> 
>> Thanks.
>> Wei
>> 
>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>> 
>>> Hi Wei
>>> Please see this bug log:
>>> 
>>> 
>>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
>>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
>>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
>>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
>>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
>>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
>>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
>>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
>>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
>>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
>>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
>>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
>>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
>>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
>>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
>>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
>>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
>>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
>>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
>>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
>>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
>>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
>>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
>>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
>>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
>>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
>>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
>>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>>> 
>>> 
>>> 
>>> 
>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>>>> 
>>>> Hi Martin,
>>>> 
>>>> Is there a reproducer for this? What kind of traffic is it running?
>>>> What is the following config:
>>>> cat /proc/sys/net/core/busy_poll
>>>> cat /proc/sys/net/core/busy_read
>>>> cat /sys/class/net/<ixgbe_dev>/threaded
>>>> And is SO_PREFER_BUSY_POLL used?
>>>> 
>>>> Thanks.
>>>> Wei
>>>> 
>>>> 
>>>> 
>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>> 
>>>>> Hi Eric and Wei
>>>>> 
>>>>> Please see this bug report from the last hour.
>>>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up.
>>>>> Uptime before the crash: 10 days.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>>>> 
>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>>>> 
>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Eric and Wei
>>>>>>> 
>>>>>>> Please check this log:
>>>>>>> 
>>>>>> 
>>>>>> Please send a normal report to netdev.
>>>>>> 
>>>>>> This has nothing to do with us (Eric & Wei)
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>>> 
>>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>>> [1584289.107263] Call Trace:
>>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>>>>> [1584289.665810]  kthread+0xea/0x120
>>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>>>>> [1584289.721616] Mem-Info:
>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>>>>               active_file:17408 inactive_file:149 isolated_file:32
>>>>>>>               unevictable:1440359 dirty:17500 writeback:0
>>>>>>>               slab_reclaimable:43368 slab_unreclaimable:155124
>>>>>>>               mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>>>>               free:17832 free_pcp:113 free_cma:0
>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>>>>> [1584290.409087] 1465768 total pagecache pages
>>>>>>> [1584290.434531] 4165289 pages RAM
>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>>>>> [1584290.484480] 104766 pages reserved
>>>>>>> [1584290.508709] 0 pages hwpoisoned
>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>>> [1584302.776532] Call Trace:
>>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>> [1584303.091891] RIP: 0033:0x488376
>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>>>>> [1584303.379094] Mem-Info:
>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>>>>               active_file:12975 inactive_file:168 isolated_file:32
>>>>>>>               unevictable:909709 dirty:12864 writeback:10
>>>>>>>               slab_reclaimable:42415 slab_unreclaimable:154783
>>>>>>>               mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>>>>               free:537002 free_pcp:1813 free_cma:0
>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>>>>> [1584304.287094] 933871 total pagecache pages
>>>>>>> [1584304.312815] 4165289 pages RAM
>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>>>>> [1584304.362522] 104766 pages reserved
>>>>>>> [1584304.386516] 0 pages hwpoisoned
>>>>>>> 
>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>> 
>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Wei
>>>>>>>>> Check this:
>>>>>>>>> 
>>>>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>> 
>>>>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>>>>> 
>>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> [   39.706656] Call Trace:
>>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>>>>> 
>>>>>>>>> Martin
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>>>> 
>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>>>>> ---
>>>>>>>>>> Change since v3:
>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>>>>> ____napi_schedule().
>>>>>>>>>> 
>>>>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>>>> 
>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>>>>  NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>>>>  NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>>>>  NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>>>>> };
>>>>>>>>>> 
>>>>>>>>>> enum {
>>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>>>>  NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>>>>  NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>>>>  NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>>>> };
>>>>>>>>>> 
>>>>>>>>>> enum gro_result {
>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>>>>> --- a/net/core/dev.c
>>>>>>>>>> +++ b/net/core/dev.c
>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>>>>           */
>>>>>>>>>>          thread = READ_ONCE(napi->thread);
>>>>>>>>>>          if (thread) {
>>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>>>>> +                      */
>>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>>>>                  wake_up_process(thread);
>>>>>>>>>>                  return;
>>>>>>>>>>          }
>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>>>>          WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>>>> 
>>>>>>>>>>          new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>>>>                        NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>>>> 
>>>>>>>>>>          /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>>>> 
>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>>>> {
>>>>>>>>>> +     bool woken = false;
>>>>>>>>>> +
>>>>>>>>>>  set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>> 
>>>>>>>>>>  while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>>>>> +              */
>>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>>>>                  WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>>>>                  __set_current_state(TASK_RUNNING);
>>>>>>>>>>                  return 0;
>>>>>>>>>>          }
>>>>>>>>>> 
>>>>>>>>>>          schedule();
>>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>>>>> +             woken = true;
>>>>>>>>>>          set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>>  }
>>>>>>>>>>  __set_current_state(TASK_RUNNING);
>>>>>>>>>> --
>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-23 20:31                   ` Martin Zaharinov
@ 2021-09-24  0:54                     ` Wei Wang
  2021-09-24  6:18                       ` Martin Zaharinov
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Wang @ 2021-09-24  0:54 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Martin,

It looks like there might still be a race between kthread polling and
busy polling. I have been looking into the code but have not yet been
able to identify the cause.
May I ask why you need to enable both at the same time?

Thanks.
Wei
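
For readers following the thread, the ownership rule that the v4 patch
introduces can be sketched in userspace C. This is only a simplified
model under stated assumptions, not the kernel code: the names
napi_model, model_try_own(), model_complete() and the two
kthread_may_poll_*() helpers are hypothetical, the bit values are
illustrative, and the "woken" flag and the TASK_INTERRUPTIBLE check
from the real patch are left out. It shows only why testing SCHED
alone cannot tell the kthread whether it owns the napi, while testing
SCHED_THREADED can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values; the kernel defines these via an enum and BIT(). */
#define MODEL_SCHED          (1u << 0) /* some context owns the napi */
#define MODEL_SCHED_THREADED (1u << 1) /* the owner is the napi kthread */

/* Hypothetical stand-in for the state word of struct napi_struct. */
struct napi_model {
	atomic_uint state;
};

/* Take ownership if SCHED is clear; 'extra' marks who the owner is. */
static bool model_try_own(struct napi_model *n, unsigned int extra)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & MODEL_SCHED)
			return false; /* already owned by someone else */
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | MODEL_SCHED | extra));
	return true;
}

/* Models completion: both ownership bits are cleared together. */
static void model_complete(struct napi_model *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch check: any owner looks like "the kthread may poll". */
static bool kthread_may_poll_old(struct napi_model *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch check: only a threaded schedule lets the kthread poll. */
static bool kthread_may_poll_new(struct napi_model *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Walks through the interleaving from the commit message; returns 0. */
static int model_demo(void)
{
	struct napi_model n;

	atomic_init(&n.state, 0);

	/* Busy poll grabs SCHED after a previous poll completed. */
	assert(model_try_own(&n, 0));
	/* Old check: the kthread would poll the same napi concurrently. */
	assert(kthread_may_poll_old(&n));
	/* New check: the kthread correctly stays out. */
	assert(!kthread_may_poll_new(&n));

	/* Once busy poll completes, a threaded schedule hands over ownership. */
	model_complete(&n);
	assert(model_try_own(&n, MODEL_SCHED_THREADED));
	assert(kthread_may_poll_new(&n));
	return 0;
}
```

The model covers only the case the patch was written for. The later
crash reports in this thread show busy_poll_stop() in the call trace,
so whatever race remains is presumably outside what this sketch
captures.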


On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hey Wei
>
> If you find a fix for this, send it to me and I will test it.
>
> Threaded NAPI (kthread) is a very good solution for servers under heavy network load, but we need to find out where this bug comes from.
>
>
> Martin
>
> > On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
> >
> > Hi Wei
> >
> > One more bug report from the last few hours:
> >
> >
> >
> > Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> > Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> > Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> > Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> > Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> > Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> > Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> > Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> > Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> > Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> > Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> > Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> > Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> > Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> > Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> > Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> > Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> > Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> > Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> > Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> > Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> > Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> > Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> > Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> > Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> > Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> > Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> > Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> > Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> > Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> > Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> > Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> > Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> > Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> > Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> > Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> > Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> > Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> > Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> > Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> > Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> > Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> > Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> > Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> > Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> > Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> > Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> > Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> > Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> > Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> > Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> > Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> > Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> > Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> > Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> > Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> > Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> > Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> > Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> > Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> > Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> > Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> > Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> > Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> > Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> > Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> > Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> > Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> > Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> > Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> > Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> > Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> > Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> > Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> > Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> > Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >
> >> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
> >>
> >> Thanks Martin for the report.
> >> Without a reproducer, it might be hard to debug. I will double check
> >> the code to check for potential race between kthread poll and busy
> >> poll.
> >>
> >> Thanks.
> >> Wei
> >>
> >> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>
> >>> Hi Wei
> >>> Please see this bug log:
> >>>
> >>>
> >>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
> >>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
> >>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
> >>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
> >>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
> >>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> >>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
> >>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
> >>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
> >>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
> >>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
> >>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
> >>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
> >>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
> >>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
> >>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
> >>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
> >>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
> >>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> >>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> >>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> >>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> >>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> >>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> >>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> >>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> >>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> >>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
> >>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
> >>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
> >>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
> >>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> >>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
> >>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
> >>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
> >>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
> >>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
> >>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
> >>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
> >>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
> >>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
> >>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> >>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> >>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> >>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> >>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> >>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> >>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> >>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> >>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> >>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
> >>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
> >>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
> >>>
> >>>
> >>>
> >>>
> >>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> >>>>
> >>>> Hi Martin,
> >>>>
> >>>> Is there a reproducer for this? What kind of traffic is it running?
> >>>> What do the following settings show?
> >>>> cat /proc/sys/net/core/busy_poll
> >>>> cat /proc/sys/net/core/busy_read
> >>>> cat /sys/class/net/<ixgbe_dev>/threaded
> >>>> And is SO_PREFER_BUSY_POLL used?
> >>>>
> >>>> Thanks.
> >>>> Wei
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>
> >>>>> Hi Eric and Wei
> >>>>>
> >>>>> Please see this bug report from the last hour.
> >>>>> Kernel 5.13.13; traffic is 7 Gb/s down / 7 Gb/s up.
> >>>>> Uptime before crash: 10 days
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> >>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> >>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> >>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> >>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> >>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> >>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> >>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> >>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> >>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> >>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> >>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> >>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> >>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> >>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> >>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> >>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> >>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> >>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> >>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> >>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> >>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> >>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> >>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> >>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> >>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> >>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> >>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> >>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> >>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> >>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> >>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> >>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> >>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> >>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> >>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> >>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> >>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> >>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> >>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>>>>
> >>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>
> >>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Eric and Wei
> >>>>>>>
> >>>>>>> Please check this log:
> >>>>>>>
> >>>>>>
> >>>>>> Please send a normal report to netdev.
> >>>>>>
> >>>>>> This has nothing to do with us (Eric & Wei).
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>>
> >>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
> >>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
> >>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>>>>> [1584289.107263] Call Trace:
> >>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
> >>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
> >>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
> >>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
> >>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
> >>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
> >>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
> >>>>>>> [1584289.541416]  build_skb+0x1a/0x200
> >>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
> >>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
> >>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
> >>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
> >>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
> >>>>>>> [1584289.665810]  kthread+0xea/0x120
> >>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
> >>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
> >>>>>>> [1584289.721616] Mem-Info:
> >>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
> >>>>>>>               active_file:17408 inactive_file:149 isolated_file:32
> >>>>>>>               unevictable:1440359 dirty:17500 writeback:0
> >>>>>>>               slab_reclaimable:43368 slab_unreclaimable:155124
> >>>>>>>               mapped:817431 shmem:7650 pagetables:32093 bounce:0
> >>>>>>>               free:17832 free_pcp:113 free_cma:0
> >>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
> >>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
> >>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
> >>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
> >>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
> >>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
> >>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
> >>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
> >>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
> >>>>>>> [1584290.409087] 1465768 total pagecache pages
> >>>>>>> [1584290.434531] 4165289 pages RAM
> >>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
> >>>>>>> [1584290.484480] 104766 pages reserved
> >>>>>>> [1584290.508709] 0 pages hwpoisoned
> >>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
> >>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
> >>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
> >>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>>>>> [1584302.776532] Call Trace:
> >>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
> >>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
> >>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
> >>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
> >>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
> >>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
> >>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
> >>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
> >>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
> >>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
> >>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
> >>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
> >>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
> >>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
> >>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>>>> [1584303.091891] RIP: 0033:0x488376
> >>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> >>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
> >>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
> >>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
> >>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
> >>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
> >>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
> >>>>>>> [1584303.379094] Mem-Info:
> >>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
> >>>>>>>               active_file:12975 inactive_file:168 isolated_file:32
> >>>>>>>               unevictable:909709 dirty:12864 writeback:10
> >>>>>>>               slab_reclaimable:42415 slab_unreclaimable:154783
> >>>>>>>               mapped:39825 shmem:14744 pagetables:26041 bounce:0
> >>>>>>>               free:537002 free_pcp:1813 free_cma:0
> >>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
> >>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
> >>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
> >>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
> >>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
> >>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
> >>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
> >>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
> >>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
> >>>>>>> [1584304.287094] 933871 total pagecache pages
> >>>>>>> [1584304.312815] 4165289 pages RAM
> >>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
> >>>>>>> [1584304.362522] 104766 pages reserved
> >>>>>>> [1584304.386516] 0 pages hwpoisoned
> >>>>>>>
> >>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Wei
> >>>>>>>>> Check this:
> >>>>>>>>>
> >>>>>>>>> [   39.706567] ------------[ cut here ]------------
> >>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> >>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>>>>
> >>>>>>>> Probably more relevant to Intel maintainers than Wei :/
> >>>>>>>>
> >>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> >>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> >>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> >>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
> >>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> >>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> >>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> >>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> >>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> >>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> >>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> >>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> >>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> >>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>>>> [   39.706656] Call Trace:
> >>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> >>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> >>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> >>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> >>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
> >>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> >>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> >>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> >>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> >>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> >>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> >>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> >>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> >>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
> >>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
> >>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
> >>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
> >>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
> >>>>>>>>> [   39.706762]  kthread+0xea/0x120
> >>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
> >>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
> >>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>>>>>>>>
> >>>>>>>>> Martin
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
> >>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
> >>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>>>>>>>>> This patch tries to fix this race by adding a new bit
> >>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
> >>>>>>>>>> kthread and other scenarios and fixes the race issue.
> >>>>>>>>>>
> >>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> >>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
> >>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> >>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
> >>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
> >>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >>>>>>>>>> ---
> >>>>>>>>>> Change since v3:
> >>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
> >>>>>>>>>> ____napi_schedule().
> >>>>>>>>>>
> >>>>>>>>>> include/linux/netdevice.h |  2 ++
> >>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
> >>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
> >>>>>>>>>> --- a/include/linux/netdevice.h
> >>>>>>>>>> +++ b/include/linux/netdevice.h
> >>>>>>>>>> @@ -360,6 +360,7 @@ enum {
> >>>>>>>>>>  NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >>>>>>>>>>  NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >>>>>>>>>>  NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> >>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >>>>>>>>>> };
> >>>>>>>>>>
> >>>>>>>>>> enum {
> >>>>>>>>>> @@ -372,6 +373,7 @@ enum {
> >>>>>>>>>>  NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >>>>>>>>>>  NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >>>>>>>>>>  NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> >>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >>>>>>>>>> };
> >>>>>>>>>>
> >>>>>>>>>> enum gro_result {
> >>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
> >>>>>>>>>> --- a/net/core/dev.c
> >>>>>>>>>> +++ b/net/core/dev.c
> >>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >>>>>>>>>>           */
> >>>>>>>>>>          thread = READ_ONCE(napi->thread);
> >>>>>>>>>>          if (thread) {
> >>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
> >>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
> >>>>>>>>>> +                      * makes sure to proceed with napi polling
> >>>>>>>>>> +                      * if the thread is explicitly woken from here.
> >>>>>>>>>> +                      */
> >>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> >>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> >>>>>>>>>>                  wake_up_process(thread);
> >>>>>>>>>>                  return;
> >>>>>>>>>>          }
> >>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> >>>>>>>>>>          WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> >>>>>>>>>>
> >>>>>>>>>>          new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> >>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
> >>>>>>>>>>                        NAPIF_STATE_PREFER_BUSY_POLL);
> >>>>>>>>>>
> >>>>>>>>>>          /* If STATE_MISSED was set, leave STATE_SCHED set,
> >>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> >>>>>>>>>>
> >>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
> >>>>>>>>>> {
> >>>>>>>>>> +     bool woken = false;
> >>>>>>>>>> +
> >>>>>>>>>>  set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>>>>
> >>>>>>>>>>  while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> >>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> >>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
> >>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
> >>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
> >>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
> >>>>>>>>>> +              */
> >>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> >>>>>>>>>>                  WARN_ON(!list_empty(&napi->poll_list));
> >>>>>>>>>>                  __set_current_state(TASK_RUNNING);
> >>>>>>>>>>                  return 0;
> >>>>>>>>>>          }
> >>>>>>>>>>
> >>>>>>>>>>          schedule();
> >>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
> >>>>>>>>>> +             woken = true;
> >>>>>>>>>>          set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>>>>  }
> >>>>>>>>>>  __set_current_state(TASK_RUNNING);
> >>>>>>>>>> --
> >>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-24  0:54                     ` Wei Wang
@ 2021-09-24  6:18                       ` Martin Zaharinov
  2021-09-24 16:42                         ` Wei Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Zaharinov @ 2021-09-24  6:18 UTC (permalink / raw)
  To: Wei Wang
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

Hi Wei

I think we discussed somewhere in this thread that it should be enabled.
Both sysctls are set to 50:

cat /proc/sys/net/core/busy_poll - 50
cat /proc/sys/net/core/busy_read - 50

And one more thing:

" packet receipt:

                   high-latency
interrupt-based -------------------> poll-based

Busy polling helps reduce latency in the network receive path by

	• allowing socket layer code to poll the receive queue of a network device,
	• and disabling network interrupts.
This eliminates

	• delays caused by the interrupts
	• and the resultant context switches.
However, it

	• increases CPU utilization.
	• It also prevents the CPU from sleeping, which can incur additional power consumption.
Busy polling is disabled by default.

Set net.core.busy_poll to a value other than 0 to enable it.

This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and select. Red Hat recommends a value of 50.

Add the SO_BUSY_POLL socket option to the socket. "



Do you think the problem comes from this setting?

Martin

> On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote:
> 
> Hi Martin,
> 
> It looks like there might still be a race between kthread polling and
> busy polling. I am looking into the code but was not able to identify
> the cause.
> May I ask why you need to enable both at the same time?
> 
> Thanks.
> Wei
> 
> 
> On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hey Wei
>> 
>> If you find a fix for this, write to me and I will test it.
>> 
>> kthread mode is a very good solution for a heavily loaded network server, but we need to find where this bug comes from.
>> 
>> 
>> Martin
>> 
>>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
>>> 
>>> Hi Wei
>>> 
>>> One more bug report from last hours:
>>> 
>>> 
>>> 
>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>> 
>>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
>>>> 
>>>> Thanks Martin for the report.
>>>> Without a reproducer, it might be hard to debug. I will double check
>>>> the code to check for potential race between kthread poll and busy
>>>> poll.
>>>> 
>>>> Thanks.
>>>> Wei
>>>> 
>>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>> 
>>>>> Hi Wei
>>>>> Please see this bug log :
>>>>> 
>>>>> 
>>>>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>>>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
>>>>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
>>>>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
>>>>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
>>>>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
>>>>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>>>>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
>>>>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
>>>>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
>>>>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
>>>>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
>>>>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
>>>>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>>>>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>>>>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>>>>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>>>>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>>>>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
>>>>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>>>>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>>>>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>>>>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>>>>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>>>>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>>>>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>>>>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>>>>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>>>>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>>>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
>>>>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
>>>>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
>>>>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
>>>>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>>>>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
>>>>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
>>>>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
>>>>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
>>>>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
>>>>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
>>>>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
>>>>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
>>>>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
>>>>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>>>>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>>>>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>>>>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>>>>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>>>>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>>>>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>>>>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>>>>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>>>>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>>>>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
>>>>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
>>>>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>>>>>> 
>>>>>> Hi Martin,
>>>>>> 
>>>>>> Is there a reproducer for this? What kind of traffic is it running?
>>>>>> What is the following config:
>>>>>> cat /proc/sys/net/core/busy_poll
>>>>>> cat /proc/sys/net/core/busy_read
>>>>>> cat /sys/class/net/<ixgbe_dev>/threaded
>>>>>> And is SO_PREFER_BUSY_POLL used?
>>>>>> 
>>>>>> Thanks.
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Eric and Wei
>>>>>>> 
>>>>>>> Please see this bug report from the last hour.
>>>>>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up.
>>>>>>> Uptime before the crash: 10 days.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>>>>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>>>>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>>>>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>>>>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>>>>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>>>>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>>>>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>>>>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>>>>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>>>>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>>>>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>>>>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>>>>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>>>>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>>>>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>>>>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>>>>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>>>>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>>>>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>>>>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>>>>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>>>>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>>>>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>>>>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>>>>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>>>>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>>>>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>>>>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>>>>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>>>>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>>>>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>>>>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>>>>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>>>>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>>>>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>>>>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>>>>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>>>>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>>>>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>>>>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>>>>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>>>>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>>>>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>>>>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>>>>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>>>>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>>>>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>>>>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>>>>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>>>>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>>>>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>>>>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>>>>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>>>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>>>>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>>>>>> 
>>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>> 
>>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Eric and Wei
>>>>>>>>> 
>>>>>>>>> Please check this log :
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Please send a normal report to netdev.
>>>>>>>> 
>>>>>>>> This has nothing to do with us (Eric & Wei).
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>>>>> [1584289.107263] Call Trace:
>>>>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
>>>>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>>>>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>>>>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
>>>>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>>>>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>>>>>>>>> [1584289.541416]  build_skb+0x1a/0x200
>>>>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>>>>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>>>>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>>>>>>>>> [1584289.665810]  kthread+0xea/0x120
>>>>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>>>>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>>>>>>>>> [1584289.721616] Mem-Info:
>>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>>>>>>>>>              active_file:17408 inactive_file:149 isolated_file:32
>>>>>>>>>              unevictable:1440359 dirty:17500 writeback:0
>>>>>>>>>              slab_reclaimable:43368 slab_unreclaimable:155124
>>>>>>>>>              mapped:817431 shmem:7650 pagetables:32093 bounce:0
>>>>>>>>>              free:17832 free_pcp:113 free_cma:0
>>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>>>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>>>>>>>>> [1584290.409087] 1465768 total pagecache pages
>>>>>>>>> [1584290.434531] 4165289 pages RAM
>>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>>>>>>>>> [1584290.484480] 104766 pages reserved
>>>>>>>>> [1584290.508709] 0 pages hwpoisoned
>>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>>>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>>>>>>> [1584302.776532] Call Trace:
>>>>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
>>>>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>>>>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>>>>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>>>>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>>>>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>>>>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>>>>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>>>>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>>>>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>>>>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>>>>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>>>>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>>>>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>>>>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>>>> [1584303.091891] RIP: 0033:0x488376
>>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>>>>>>>>> [1584303.379094] Mem-Info:
>>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>>>>>>>>>              active_file:12975 inactive_file:168 isolated_file:32
>>>>>>>>>              unevictable:909709 dirty:12864 writeback:10
>>>>>>>>>              slab_reclaimable:42415 slab_unreclaimable:154783
>>>>>>>>>              mapped:39825 shmem:14744 pagetables:26041 bounce:0
>>>>>>>>>              free:537002 free_pcp:1813 free_cma:0
>>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>>>>>>>>> [1584304.287094] 933871 total pagecache pages
>>>>>>>>> [1584304.312815] 4165289 pages RAM
>>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>>>>>>>>> [1584304.362522] 104766 pages reserved
>>>>>>>>> [1584304.386516] 0 pages hwpoisoned
>>>>>>>>> 
>>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Wei
>>>>>>>>>>> Check this:
>>>>>>>>>>> 
>>>>>>>>>>> [   39.706567] ------------[ cut here ]------------
>>>>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>>>>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>>>> 
>>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/
>>>>>>>>>> 
>>>>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>>>>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>>>>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>>>>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>>>>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>>>>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>>>>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>>>>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>>>>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>>>>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>>>>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>>>>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>>>>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>>>>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>> [   39.706656] Call Trace:
>>>>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>>>>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>>>>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>>>>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>>>>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>>>>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>>>>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>>>>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>>>>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>>>>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>>>>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>>>>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>>>>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>>>>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>>>>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>>>>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>>>>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>>>>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>>>>>>>>>>> [   39.706762]  kthread+0xea/0x120
>>>>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>>>>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>>>>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>>>>>>> 
>>>>>>>>>>> Martin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>>>>>>> ---
>>>>>>>>>>>> Change since v3:
>>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>>>>>>> ____napi_schedule().
>>>>>>>>>>>> 
>>>>>>>>>>>> include/linux/netdevice.h |  2 ++
>>>>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>>>>>>>>>>>> NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>>>>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> enum {
>>>>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>>>>>> NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>>>>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>>>>>> };
>>>>>>>>>>>> 
>>>>>>>>>>>> enum gro_result {
>>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>>>>>>> --- a/net/core/dev.c
>>>>>>>>>>>> +++ b/net/core/dev.c
>>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>>>>>>          */
>>>>>>>>>>>>         thread = READ_ONCE(napi->thread);
>>>>>>>>>>>>         if (thread) {
>>>>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>>>>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>>>>>>> +                      * makes sure to proceed with napi polling
>>>>>>>>>>>> +                      * if the thread is explicitly woken from here.
>>>>>>>>>>>> +                      */
>>>>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>>>>>>                 wake_up_process(thread);
>>>>>>>>>>>>                 return;
>>>>>>>>>>>>         }
>>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>>>>>>         WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>>>>>> 
>>>>>>>>>>>>         new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>>>>>>>>>>>>                       NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>>>>>> 
>>>>>>>>>>>>         /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>>>>>> 
>>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>>>>>> {
>>>>>>>>>>>> +     bool woken = false;
>>>>>>>>>>>> +
>>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>>>> 
>>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
>>>>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>>>>>>>>>>>> +              */
>>>>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>>>>>>                 WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>>>>>>                 __set_current_state(TASK_RUNNING);
>>>>>>>>>>>>                 return 0;
>>>>>>>>>>>>         }
>>>>>>>>>>>> 
>>>>>>>>>>>>         schedule();
>>>>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>>>>>>>>>>>> +             woken = true;
>>>>>>>>>>>>         set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>>>> }
>>>>>>>>>>>> __set_current_state(TASK_RUNNING);
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 



* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
  2021-09-24  6:18                       ` Martin Zaharinov
@ 2021-09-24 16:42                         ` Wei Wang
       [not found]                           ` <CALidq=V5O6oco+JRWbdKZ4pUXzZOoaUJCu_yCh55M_ccA_6QYQ@mail.gmail.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Wei Wang @ 2021-09-24 16:42 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

On Thu, Sep 23, 2021 at 11:18 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
>
> I think we discussed it somewhere here that it should be enabled:
>
> cat /proc/sys/net/core/busy_poll - 50
> cat /proc/sys/net/core/busy_read - 50
>
> and one more :
>
> “ packet receipt:
>
>                    high-latency
> interrupt-based -------------------> poll-based
>
> Busy polling helps reduce latency in the network receive path by
>
>         • allowing socket layer code to poll the receive queue of a network device,
>         • and disable network interrupts.
> This eliminates
>
>         • delays caused by the interrupts
>         • and the resultant context switches
> However, it
>
>         • increases CPU utilization.
>         • Also prevents the CPU from sleeping, which can incur additional power consumption.
> Busy polling is disabled by default.
>
> Set net.core.busy_poll to a value other than 0 to enable it.
>
> This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and selects. Red Hat recommends a value of 50.
>
> Add the SO_BUSY_POLL socket option to the socket. "
>
>
>
> Do you think the problem comes from this setting?

Yes. I understand that you enabled busy polling. I was just wondering
why you need to enable busy polling + thread polling.
Could you help me confirm on the receive side, if the application uses
either of the following socket options:
- SO_BUSY_POLL_BUDGET
- SO_PREFER_BUSY_POLL

And what kernel version are you using?
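
For reference, here is a minimal sketch of how the application could read
these options back on one of its receive sockets. query_busy_poll_opts()
and struct busy_poll_opts are hypothetical names made up for this example.
The fallback #define values are the asm-generic ones (an assumption that
holds on x86, some other architectures use different numbers). A field is
reported as -1 when getsockopt() fails, which can also just mean the
running kernel does not support reading that option.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers. Values are taken from
 * include/uapi/asm-generic/socket.h (assumption: x86 or another
 * architecture that uses the generic numbering).
 */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Hypothetical container: each field holds the value read from the
 * socket, or -1 if getsockopt() failed for that option.
 */
struct busy_poll_opts {
	int busy_poll_usecs;	/* SO_BUSY_POLL */
	int prefer_busy_poll;	/* SO_PREFER_BUSY_POLL */
	int busy_poll_budget;	/* SO_BUSY_POLL_BUDGET */
};

/* Read one integer SOL_SOCKET option; return -1 on any error. */
int read_sock_opt(int fd, int optname)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, optname, &val, &len) < 0)
		return -1;
	return val;
}

/* Fill *out and return how many of the three options could be read. */
int query_busy_poll_opts(int fd, struct busy_poll_opts *out)
{
	int ok = 0;

	out->busy_poll_usecs = read_sock_opt(fd, SO_BUSY_POLL);
	out->prefer_busy_poll = read_sock_opt(fd, SO_PREFER_BUSY_POLL);
	out->busy_poll_budget = read_sock_opt(fd, SO_BUSY_POLL_BUDGET);

	ok += out->busy_poll_usecs >= 0;
	ok += out->prefer_busy_poll >= 0;
	ok += out->busy_poll_budget >= 0;
	return ok;
}

/* Print the result in a form that is easy to paste into a reply. */
void print_busy_poll_opts(int fd)
{
	struct busy_poll_opts o;
	int n = query_busy_poll_opts(fd, &o);

	printf("read %d of 3 options: busy_poll=%d prefer=%d budget=%d\n",
	       n, o.busy_poll_usecs, o.prefer_busy_poll, o.busy_poll_budget);
}
```

Calling print_busy_poll_opts(fd) right after the application sets up a
receive socket should be enough. If changing the application is not
practical, checking its source or an strace of its setsockopt() calls for
these option names would answer the same question.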

>
> Martin
>
> > On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote:
> >
> > Hi Martin,
> >
> > It looks like there might still be a race between kthread polling and
> > busy polling. I am looking into the code but was not able to identify
> > the cause.
> > May I ask why you need to enable both at the same time?
> >
> > Thanks.
> > Wei
> >
> >
> > On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
> >>
> >> Hi Wei
> >>
> >> If you find a fix for this, let me know and I will test it.
> >>
> >> Threaded NAPI (kthread) is a very good solution for a heavily loaded network server, but we need to find where this bug comes from.
> >>
> >>
> >> Martin
> >>
> >>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
> >>>
> >>> Hi Wei
> >>>
> >>> One more bug report from the last few hours:
> >>>
> >>>
> >>>
> >>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> >>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> >>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> >>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> >>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> >>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> >>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> >>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> >>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> >>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> >>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> >>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> >>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> >>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> >>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> >>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> >>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> >>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> >>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> >>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> >>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> >>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> >>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> >>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> >>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> >>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> >>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> >>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> >>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> >>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> >>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> >>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> >>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> >>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> >>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> >>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> >>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> >>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> >>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> >>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>>
> >>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
> >>>>
> >>>> Thanks Martin for the report.
> >>>> Without a reproducer, it might be hard to debug. I will double check
> >>>> the code to check for potential race between kthread poll and busy
> >>>> poll.
> >>>>
> >>>> Thanks.
> >>>> Wei
> >>>>
> >>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>
> >>>>> Hi Wei
> >>>>> Please see this bug log :
> >>>>>
> >>>>>
> >>>>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>>>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
> >>>>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
> >>>>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
> >>>>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
> >>>>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
> >>>>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> >>>>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
> >>>>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
> >>>>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
> >>>>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
> >>>>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
> >>>>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
> >>>>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
> >>>>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>>>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
> >>>>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
> >>>>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
> >>>>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
> >>>>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
> >>>>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>>>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> >>>>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> >>>>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> >>>>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> >>>>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> >>>>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> >>>>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> >>>>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> >>>>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> >>>>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>>>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
> >>>>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
> >>>>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
> >>>>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
> >>>>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
> >>>>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
> >>>>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
> >>>>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
> >>>>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
> >>>>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
> >>>>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
> >>>>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
> >>>>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
> >>>>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
> >>>>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
> >>>>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
> >>>>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
> >>>>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> >>>>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
> >>>>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
> >>>>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
> >>>>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
> >>>>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
> >>>>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>>>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
> >>>>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>>>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
> >>>>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> >>>>>>
> >>>>>> Hi Martin,
> >>>>>>
> >>>>>> Is there a reproducer for this? What kind of traffic is it running?
> >>>>>> What is the following config:
> >>>>>> cat /proc/sys/net/core/busy_poll
> >>>>>> cat /proc/sys/net/core/busy_read
> >>>>>> cat /sys/class/net/<ixgbe_dev>/threaded
> >>>>>> And is SO_PREFER_BUSY_POLL used?
> >>>>>>
> >>>>>> Thanks.
> >>>>>> Wei
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Eric and Wei
> >>>>>>>
> >>>>>>> Please see this bug report from the last hour.
> >>>>>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up
> >>>>>>> Uptime before crash: 10 days
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
> >>>>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> >>>>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> >>>>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>>>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
> >>>>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>>>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> >>>>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
> >>>>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
> >>>>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
> >>>>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
> >>>>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
> >>>>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
> >>>>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
> >>>>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
> >>>>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
> >>>>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
> >>>>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>>>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>>>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
> >>>>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
> >>>>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
> >>>>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
> >>>>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>>>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>>>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
> >>>>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
> >>>>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
> >>>>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
> >>>>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
> >>>>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
> >>>>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
> >>>>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
> >>>>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> >>>>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>>>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>>>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>>>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>>>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>>>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>>>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>>>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>>>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
> >>>>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
> >>>>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
> >>>>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
> >>>>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
> >>>>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
> >>>>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
> >>>>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
> >>>>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
> >>>>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
> >>>>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
> >>>>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
> >>>>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
> >>>>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>>>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> >>>>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
> >>>>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
> >>>>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
> >>>>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
> >>>>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
> >>>>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
> >>>>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
> >>>>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
> >>>>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
> >>>>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
> >>>>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
> >>>>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >>>>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>>>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
> >>>>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>>>>>>
> >>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Eric and Wei
> >>>>>>>>>
> >>>>>>>>> Please check this log :
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Please send a normal report to netdev.
> >>>>>>>>
> >>>>>>>> This has nothing to do with us (Eric & Wei)
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
> >>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
> >>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>>>>>>> [1584289.107263] Call Trace:
> >>>>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
> >>>>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
> >>>>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
> >>>>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
> >>>>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
> >>>>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
> >>>>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
> >>>>>>>>> [1584289.541416]  build_skb+0x1a/0x200
> >>>>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
> >>>>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
> >>>>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
> >>>>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
> >>>>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
> >>>>>>>>> [1584289.665810]  kthread+0xea/0x120
> >>>>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
> >>>>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
> >>>>>>>>> [1584289.721616] Mem-Info:
> >>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
> >>>>>>>>>              active_file:17408 inactive_file:149 isolated_file:32
> >>>>>>>>>              unevictable:1440359 dirty:17500 writeback:0
> >>>>>>>>>              slab_reclaimable:43368 slab_unreclaimable:155124
> >>>>>>>>>              mapped:817431 shmem:7650 pagetables:32093 bounce:0
> >>>>>>>>>              free:17832 free_pcp:113 free_cma:0
> >>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
> >>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
> >>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
> >>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
> >>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
> >>>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
> >>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
> >>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
> >>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
> >>>>>>>>> [1584290.409087] 1465768 total pagecache pages
> >>>>>>>>> [1584290.434531] 4165289 pages RAM
> >>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
> >>>>>>>>> [1584290.484480] 104766 pages reserved
> >>>>>>>>> [1584290.508709] 0 pages hwpoisoned
> >>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
> >>>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
> >>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
> >>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
> >>>>>>>>> [1584302.776532] Call Trace:
> >>>>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
> >>>>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
> >>>>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
> >>>>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
> >>>>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
> >>>>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
> >>>>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
> >>>>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
> >>>>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
> >>>>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
> >>>>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
> >>>>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
> >>>>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
> >>>>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
> >>>>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>>>>>> [1584303.091891] RIP: 0033:0x488376
> >>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> >>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
> >>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
> >>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
> >>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
> >>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
> >>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
> >>>>>>>>> [1584303.379094] Mem-Info:
> >>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
> >>>>>>>>>              active_file:12975 inactive_file:168 isolated_file:32
> >>>>>>>>>              unevictable:909709 dirty:12864 writeback:10
> >>>>>>>>>              slab_reclaimable:42415 slab_unreclaimable:154783
> >>>>>>>>>              mapped:39825 shmem:14744 pagetables:26041 bounce:0
> >>>>>>>>>              free:537002 free_pcp:1813 free_cma:0
> >>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
> >>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> >>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
> >>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
> >>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
> >>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
> >>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
> >>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
> >>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
> >>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
> >>>>>>>>> [1584304.287094] 933871 total pagecache pages
> >>>>>>>>> [1584304.312815] 4165289 pages RAM
> >>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
> >>>>>>>>> [1584304.362522] 104766 pages reserved
> >>>>>>>>> [1584304.386516] 0 pages hwpoisoned
> >>>>>>>>>
> >>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Wei
> >>>>>>>>>>> Check this:
> >>>>>>>>>>>
> >>>>>>>>>>> [   39.706567] ------------[ cut here ]------------
> >>>>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> >>>>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>>>>>>
> >>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/
> >>>>>>>>>>
> >>>>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
> >>>>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
> >>>>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
> >>>>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
> >>>>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
> >>>>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
> >>>>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
> >>>>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
> >>>>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
> >>>>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
> >>>>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
> >>>>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
> >>>>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
> >>>>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
> >>>>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>>>>>> [   39.706656] Call Trace:
> >>>>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
> >>>>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
> >>>>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
> >>>>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
> >>>>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
> >>>>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
> >>>>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
> >>>>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
> >>>>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
> >>>>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
> >>>>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
> >>>>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
> >>>>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
> >>>>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
> >>>>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
> >>>>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
> >>>>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
> >>>>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
> >>>>>>>>>>> [   39.706762]  kthread+0xea/0x120
> >>>>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
> >>>>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
> >>>>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>>>>>>>>>>
> >>>>>>>>>>> Martin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
> >>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
> >>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>>>>>>>>>>> This patch tries to fix this race by adding a new bit
> >>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
> >>>>>>>>>>>> kthread and other scenarios and fixes the race issue.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> >>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
> >>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> >>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
> >>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
> >>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
> >>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> Change since v3:
> >>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
> >>>>>>>>>>>> ____napi_schedule().
> >>>>>>>>>>>>
> >>>>>>>>>>>> include/linux/netdevice.h |  2 ++
> >>>>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
> >>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
> >>>>>>>>>>>> --- a/include/linux/netdevice.h
> >>>>>>>>>>>> +++ b/include/linux/netdevice.h
> >>>>>>>>>>>> @@ -360,6 +360,7 @@ enum {
> >>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >>>>>>>>>>>> NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> >>>>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >>>>>>>>>>>> };
> >>>>>>>>>>>>
> >>>>>>>>>>>> enum {
> >>>>>>>>>>>> @@ -372,6 +373,7 @@ enum {
> >>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >>>>>>>>>>>> NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> >>>>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >>>>>>>>>>>> };
> >>>>>>>>>>>>
> >>>>>>>>>>>> enum gro_result {
> >>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
> >>>>>>>>>>>> --- a/net/core/dev.c
> >>>>>>>>>>>> +++ b/net/core/dev.c
> >>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >>>>>>>>>>>>          */
> >>>>>>>>>>>>         thread = READ_ONCE(napi->thread);
> >>>>>>>>>>>>         if (thread) {
> >>>>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
> >>>>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
> >>>>>>>>>>>> +                      * makes sure to proceed with napi polling
> >>>>>>>>>>>> +                      * if the thread is explicitly woken from here.
> >>>>>>>>>>>> +                      */
> >>>>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> >>>>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> >>>>>>>>>>>>                 wake_up_process(thread);
> >>>>>>>>>>>>                 return;
> >>>>>>>>>>>>         }
> >>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> >>>>>>>>>>>>         WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> >>>>>>>>>>>>
> >>>>>>>>>>>>         new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> >>>>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
> >>>>>>>>>>>>                       NAPIF_STATE_PREFER_BUSY_POLL);
> >>>>>>>>>>>>
> >>>>>>>>>>>>         /* If STATE_MISSED was set, leave STATE_SCHED set,
> >>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> >>>>>>>>>>>>
> >>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
> >>>>>>>>>>>> {
> >>>>>>>>>>>> +     bool woken = false;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>>>>>>
> >>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> >>>>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> >>>>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
> >>>>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
> >>>>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
> >>>>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
> >>>>>>>>>>>> +              */
> >>>>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> >>>>>>>>>>>>                 WARN_ON(!list_empty(&napi->poll_list));
> >>>>>>>>>>>>                 __set_current_state(TASK_RUNNING);
> >>>>>>>>>>>>                 return 0;
> >>>>>>>>>>>>         }
> >>>>>>>>>>>>
> >>>>>>>>>>>>         schedule();
> >>>>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
> >>>>>>>>>>>> +             woken = true;
> >>>>>>>>>>>>         set_current_state(TASK_INTERRUPTIBLE);
> >>>>>>>>>>>> }
> >>>>>>>>>>>> __set_current_state(TASK_RUNNING);
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
>
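
The ownership rule described in the quoted patch can be illustrated with a
small user-space model. This is only a sketch built on C11 atomics, not
kernel code: model_napi, model_try_sched(), model_schedule_threaded() and
the MODEL_* flags are hypothetical stand-ins for napi_struct, the SCHED bit
grab done by busy poll or napi_disable(), the threaded path of
____napi_schedule(), and the NAPI_STATE_* bits. It leaves out the "woken"
path and the TASK_INTERRUPTIBLE check, and only shows why testing SCHED
alone cannot tell the kthread that it owns the napi.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags mirroring NAPI_STATE_SCHED and
 * NAPI_STATE_SCHED_THREADED; the values are arbitrary. */
enum {
	MODEL_SCHED          = 1u << 0,
	MODEL_SCHED_THREADED = 1u << 1,
};

/* Hypothetical stand-in for the state word of struct napi_struct. */
struct model_napi {
	atomic_uint state;
};

/* Any poller (busy poll, napi_disable(), the schedule path) takes
 * ownership by atomically setting SCHED; only one caller wins. */
static bool model_try_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Threaded schedule path: take SCHED, then record that the kthread
 * is the owner by also setting SCHED_THREADED. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_try_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion clears both bits, as napi_complete_done() does in the patch. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Check used before the patch: SCHED may belong to somebody else. */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check used after the patch: only set when scheduled for the kthread. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

If a busy poller wins model_try_sched(), the old check still reports that
the kthread may poll, while the new check does not; after
model_schedule_threaded() both checks agree.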

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
       [not found]                           ` <CALidq=V5O6oco+JRWbdKZ4pUXzZOoaUJCu_yCh55M_ccA_6QYQ@mail.gmail.com>
@ 2021-09-24 16:57                             ` Wei Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Wei Wang @ 2021-09-24 16:57 UTC (permalink / raw)
  To: Martin Zaharinov
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Hannes Frederic Sowa, Greg KH

On Fri, Sep 24, 2021 at 9:50 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
>
> The application is a pppd server
> and it does not use
> - SO_BUSY_POLL_BUDGET
> - SO_PREFER_BUSY_POLL
> The kernel that crashed is 5.13.13

I see. Thanks.

>
> I moved to 5.14.7 but disabled kthread mode until I talk with you.
>
> I checked, and the Intel 82599 card does not use busy poll.
>
> And maybe I do not need to use busy poll.
>
> If not enable busypoll is will go to same crash.

Just to confirm, are you saying that if you disable busypoll, the
issue no longer happens?

> and maybe the documentation should state that busy poll needs to be disabled when kthread mode is enabled

In theory, it should still work. But there is probably some race
hidden somewhere. If it is OK for your workload to disable busy poll,
maybe it's better to do that for now.
I will dig a bit more and see what I can find.
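
For anyone checking their own receive path, here is a rough sketch of how
the per-socket busy poll settings could be inspected from user space with
getsockopt(). The helper names query_sock_int() and report_busy_poll_opts()
are made up for this example, and the fallback option numbers are the
asm-generic values, so they are an assumption on architectures that carry
their own socket.h. SO_BUSY_POLL normally reports the per-socket timeout,
which starts out as net.core.busy_read; older kernels may reject the two
newer options, in which case the helper just prints the error.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for old libc headers; values as in asm-generic/socket.h
 * (assumption: not valid on every architecture). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Read one integer SOL_SOCKET option.
 * Returns 0 on success, -1 (with errno set) if the kernel rejects it. */
static int query_sock_int(int fd, int optname, int *val)
{
	socklen_t len = sizeof(*val);

	return getsockopt(fd, SOL_SOCKET, optname, val, &len);
}

/* Print the busy poll related options of one socket. */
static void report_busy_poll_opts(int fd)
{
	static const struct {
		int opt;
		const char *name;
	} opts[] = {
		{ SO_BUSY_POLL,        "SO_BUSY_POLL (usec)" },
		{ SO_PREFER_BUSY_POLL, "SO_PREFER_BUSY_POLL" },
		{ SO_BUSY_POLL_BUDGET, "SO_BUSY_POLL_BUDGET" },
	};
	size_t i;

	for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
		int val = 0;

		if (query_sock_int(fd, opts[i].opt, &val) == 0)
			printf("%s = %d\n", opts[i].name, val);
		else
			perror(opts[i].name);
	}
}
```

Calling report_busy_poll_opts(fd) on the receiving socket prints one line
per option, either its current value or the error returned by the kernel.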


>
>
> Martin
>
>
> On Fri, Sep 24, 2021, 19:42 Wei Wang <weiwan@google.com> wrote:
>>
>> On Thu, Sep 23, 2021 at 11:18 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> >
>> > Hi Wei
>> >
>> > I think we discussed it somewhere here that it should be enabled:
>> >
>> > cat /proc/sys/net/core/busy_poll - 50
>> > cat /proc/sys/net/core/busy_read - 50
>> >
>> > and one more :
>> >
>> > “ packet receipt:
>> >
>> >                    high-latency
>> > interrupt-based -------------------> poll-based
>> >
>> > Busy polling helps reduce latency in the network receive path by
>> >
>> >         • allowing socket layer code to poll the receive queue of a network device,
>> >         • and disabling network interrupts.
>> > This eliminates
>> >
>> >         • delays caused by the interrupts
>> >         • and the resultant context switches
>> > However, it
>> >
>> >         • increases CPU utilization.
>> >         • Also prevents the CPU from sleeping, which can incur additional power consumption.
>> > Busy polling is disabled by default.
>> >
>> > Set net.core.busy_poll to a value other than 0 to enable it.
>> >
>> > This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and select. Red Hat recommends a value of 50.
>> >
>> > Add the SO_BUSY_POLL socket option to the socket. "
>> >
>> >
>> >
>> > do you think it comes from this?
>>
>> Yes. I understand that you enabled busy polling. I was just wondering
>> why you need to enable busy polling + thread polling.
>> Could you help me confirm on the receive side, if the application uses
>> either of the following socket options:
>> - SO_BUSY_POLL_BUDGET
>> - SO_PREFER_BUSY_POLL
>>
>> And what kernel version are you using?
>>
>> >
>> > Martin
>> >
>> > > On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote:
>> > >
>> > > Hi Martin,
>> > >
>> > > It looks like there might still be a race between kthread polling and
>> > > busy polling. I am looking into the code but was not able to identify
>> > > the cause.
>> > > May I ask why you need to enable both at the same time?
>> > >
>> > > Thanks.
>> > > Wei
>> > >
>> > >
>> > > On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>
>> > >> Hi Wei
>> > >>
>> > >> If you find a fix for this, send it to me and I will test it.
>> > >>
>> > >> kthread mode is a very good solution for a server with heavy network load, but we need to find where this bug comes from.
>> > >>
>> > >>
>> > >> Martin
>> > >>
>> > >>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>
>> > >>> Hi Wei
>> > >>>
>> > >>> One more bug report from the last few hours:
>> > >>>
>> > >>>
>> > >>>
>> > >>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>> > >>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>> > >>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>> > >>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> > >>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>> > >>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> > >>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>> > >>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>> > >>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>> > >>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>> > >>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>> > >>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>> > >>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>> > >>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>> > >>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>> > >>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>> > >>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>> > >>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> > >>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> > >>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>> > >>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>> > >>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>> > >>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>> > >>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> > >>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> > >>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>> > >>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> > >>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>> > >>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>> > >>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>> > >>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>> > >>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>> > >>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>> > >>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>> > >>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> > >>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> > >>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> > >>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> > >>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> > >>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> > >>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> > >>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> > >>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> > >>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>> > >>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>> > >>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>> > >>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>> > >>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> > >>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> > >>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>> > >>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>> > >>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>> > >>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>> > >>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> > >>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> > >>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>> > >>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> > >>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> > >>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>> > >>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>> > >>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> > >>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> > >>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> > >>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> > >>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> > >>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> > >>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> > >>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> > >>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>> > >>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> > >>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>> > >>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>> > >>>
>> > >>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
>> > >>>>
>> > >>>> Thanks Martin for the report.
>> > >>>> Without a reproducer, it might be hard to debug. I will double-check
>> > >>>> the code for a potential race between kthread poll and busy
>> > >>>> poll.
>> > >>>>
>> > >>>> Thanks.
>> > >>>> Wei
>> > >>>>
>> > >>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>>>
>> > >>>>> Hi Wei
>> > >>>>> Please see this bug log:
>> > >>>>>
>> > >>>>>
>> > >>>>> Sep 15 08:04:56  [2034411.548669][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>> Sep 15 08:04:56  [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> > >>>>> Sep 15 08:04:56  [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>> Sep 15 08:04:56  [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>> Sep 15 08:04:56  [2034411.725536][ T3195] Call Trace:
>> > >>>>> Sep 15 08:04:56  [2034411.749948][ T3195]  netif_receive_skb_list_internal+0x25c/0x2b0
>> > >>>>> Sep 15 08:04:56  [2034411.774579][ T3195]  gro_normal_one+0x6e/0x90
>> > >>>>> Sep 15 08:04:56  [2034411.798786][ T3195]  napi_gro_flush+0xb1/0x100
>> > >>>>> Sep 15 08:04:56  [2034411.822410][ T3195]  napi_complete_done+0x107/0x180
>> > >>>>> Sep 15 08:04:56  [2034411.845614][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> > >>>>> Sep 15 08:04:56  [2034411.868480][ T3195]  __napi_poll+0x1f/0x100
>> > >>>>> Sep 15 08:04:56  [2034411.890899][ T3195]  ? __napi_poll+0x100/0x100
>> > >>>>> Sep 15 08:04:56  [2034411.912799][ T3195]  napi_threaded_poll+0x105/0x150
>> > >>>>> Sep 15 08:04:56  [2034411.934567][ T3195]  kthread+0x101/0x120
>> > >>>>> Sep 15 08:04:56  [2034411.955873][ T3195]  ? set_kthread_struct+0x30/0x30
>> > >>>>> Sep 15 08:04:56  [2034411.977157][ T3195]  ret_from_fork+0x1f/0x30
>> > >>>>> Sep 15 08:04:56  [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>> > >>>>> Sep 15 08:04:56  [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> > >>>>> Sep 15 08:04:56  [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>> > >>>>> Sep 15 08:04:56  [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>> > >>>>> Sep 15 08:04:56  [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>> > >>>>> Sep 15 08:04:56  [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>> > >>>>> Sep 15 08:04:56  [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S      W  O      5.13.12 #1
>> > >>>>> Sep 15 08:04:56  [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> > >>>>> Sep 15 08:04:56  [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> > >>>>> Sep 15 08:04:56  [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> > >>>>> Sep 15 08:04:56  [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> > >>>>> Sep 15 08:04:56  [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> > >>>>> Sep 15 08:04:56  [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> > >>>>> Sep 15 08:04:56  [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> > >>>>> Sep 15 08:04:56  [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> > >>>>> Sep 15 08:04:57  [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> > >>>>> Sep 15 08:04:57  [2034412.507493][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> > >>>>> Sep 15 08:04:57  [2034412.553528][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>> Sep 15 08:04:57  [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> > >>>>> Sep 15 08:04:57  [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>> Sep 15 08:04:57  [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>> Sep 15 08:04:57  [2034412.721656][ T3195] Call Trace:
>> > >>>>> Sep 15 08:04:57  [2034412.746016][ T3195]  gro_normal_one+0x6e/0x90
>> > >>>>> Sep 15 08:04:57  [2034412.770321][ T3195]  napi_gro_flush+0xb1/0x100
>> > >>>>> Sep 15 08:04:57  [2034412.794137][ T3195]  napi_complete_done+0x107/0x180
>> > >>>>> Sep 15 08:04:57  [2034412.817556][ T3195]  ixgbe_poll+0x10e/0x240 [ixgbe]
>> > >>>>> Sep 15 08:04:57  [2034412.840522][ T3195]  __napi_poll+0x1f/0x100
>> > >>>>> Sep 15 08:04:57  [2034412.862829][ T3195]  ? __napi_poll+0x100/0x100
>> > >>>>> Sep 15 08:04:57  [2034412.884804][ T3195]  napi_threaded_poll+0x105/0x150
>> > >>>>> Sep 15 08:04:57  [2034412.906305][ T3195]  kthread+0x101/0x120
>> > >>>>> Sep 15 08:04:57  [2034412.927502][ T3195]  ? set_kthread_struct+0x30/0x30
>> > >>>>> Sep 15 08:04:57  [2034412.948434][ T3195]  ret_from_fork+0x1f/0x30
>> > >>>>> Sep 15 08:04:57  [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC]
>> > >>>>> Sep 15 08:04:57  [2034413.136792][ T3195] CR2: 0000000000000000
>> > >>>>> Sep 15 08:04:57  [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]---
>> > >>>>> Sep 15 08:04:57  [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> > >>>>> Sep 15 08:04:57  [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> > >>>>> Sep 15 08:04:57  [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> > >>>>> Sep 15 08:04:57  [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> > >>>>> Sep 15 08:04:57  [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> > >>>>> Sep 15 08:04:57  [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> > >>>>> Sep 15 08:04:57  [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> > >>>>> Sep 15 08:04:58  [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> > >>>>> Sep 15 08:04:58  [2034413.487558][ T3195] FS:  0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> > >>>>> Sep 15 08:04:58  [2034413.535263][ T3195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>> Sep 15 08:04:58  [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> > >>>>> Sep 15 08:04:58  [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>> Sep 15 08:04:58  [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>> Sep 15 08:04:58  [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
>> > >>>>> Sep 15 08:04:58  [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> > >>>>> Sep 15 08:04:58  [2034413.906445][ T3195] Rebooting in 10 seconds..
>> > >>>>> Sep 15 08:05:08  [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>> > >>>>>>
>> > >>>>>> Hi Martin,
>> > >>>>>>
>> > >>>>>> Is there a reproducer for this? What kind of traffic is it running?
>> > >>>>>> What are the values of the following settings:
>> > >>>>>> cat /proc/sys/net/core/busy_poll
>> > >>>>>> cat /proc/sys/net/core/busy_read
>> > >>>>>> cat /sys/class/net/<ixgbe_dev>/threaded
>> > >>>>>> And is SO_PREFER_BUSY_POLL used?
>> > >>>>>>
>> > >>>>>> Thanks.
>> > >>>>>> Wei
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>>>>>
>> > >>>>>>> Hi Eric and Wei
>> > >>>>>>>
>> > >>>>>>> Please see this bug report from the last hour.
>> > >>>>>>> Kernel 5.13.13; traffic is 7 Gb/s down / 7 Gb/s up.
>> > >>>>>>> Uptime before crash: 10 days
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> Sep  9 12:49:31  [829553.899833][ T2925] ------------[ cut here ]------------
>> > >>>>>>> Sep  9 12:49:31  [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
>> > >>>>>>> Sep  9 12:49:31  [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
>> > >>>>>>> Sep  9 12:49:31  [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> > >>>>>>> Sep  9 12:49:31  [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G           O      5.13.13 #1
>> > >>>>>>> Sep  9 12:49:31  [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> > >>>>>>> Sep  9 12:49:31  [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
>> > >>>>>>> Sep  9 12:49:31  [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6
>> > >>>>>>> Sep  9 12:49:31  [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286
>> > >>>>>>> Sep  9 12:49:31  [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001
>> > >>>>>>> Sep  9 12:49:32  [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff
>> > >>>>>>> Sep  9 12:49:32  [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff
>> > >>>>>>> Sep  9 12:49:32  [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00
>> > >>>>>>> Sep  9 12:49:32  [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00
>> > >>>>>>> Sep  9 12:49:32  [829554.744221][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>>>>>> Sep  9 12:49:32  [829554.795701][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>>>> Sep  9 12:49:32  [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>>>>>> Sep  9 12:49:32  [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>>>> Sep  9 12:49:32  [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>>>> Sep  9 12:49:32  [829554.972250][ T2925] Call Trace:
>> > >>>>>>> Sep  9 12:49:32  [829554.996597][ T2925]  netif_receive_skb_list_internal+0x25c/0x2b0
>> > >>>>>>> Sep  9 12:49:32  [829555.021270][ T2925]  busy_poll_stop+0x113/0x140
>> > >>>>>>> Sep  9 12:49:32  [829555.045679][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> > >>>>>>> Sep  9 12:49:32  [829555.069833][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> > >>>>>>> Sep  9 12:49:32  [829555.093659][ T2925]  napi_busy_loop+0x212/0x280
>> > >>>>>>> Sep  9 12:49:32  [829555.117046][ T2925]  ep_poll+0xba/0x380
>> > >>>>>>> Sep  9 12:49:32  [829555.140048][ T2925]  ? __napi_poll+0x1f/0x100
>> > >>>>>>> Sep  9 12:49:32  [829555.162477][ T2925]  do_epoll_wait+0xa6/0xc0
>> > >>>>>>> Sep  9 12:49:32  [829555.184504][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> > >>>>>>> Sep  9 12:49:32  [829555.206138][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> > >>>>>>> Sep  9 12:49:32  [829555.227619][ T2925]  ? do_syscall_64+0x3a/0x70
>> > >>>>>>> Sep  9 12:49:32  [829555.248592][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> > >>>>>>> Sep  9 12:49:32  [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]---
>> > >>>>>>> Sep  9 12:49:32  [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9
>> > >>>>>>> Sep  9 12:49:32  [829555.357231][ T2925] #PF: supervisor read access in kernel mode
>> > >>>>>>> Sep  9 12:49:32  [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page
>> > >>>>>>> Sep  9 12:49:32  [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0
>> > >>>>>>> Sep  9 12:49:32  [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI
>> > >>>>>>> Sep  9 12:49:32  [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G        W  O      5.13.13 #1
>> > >>>>>>> Sep  9 12:49:32  [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> > >>>>>>> Sep  9 12:49:32  [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> > >>>>>>> Sep  9 12:49:33  [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> > >>>>>>> Sep  9 12:49:33  [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> > >>>>>>> Sep  9 12:49:33  [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> > >>>>>>> Sep  9 12:49:33  [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> > >>>>>>> Sep  9 12:49:33  [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> > >>>>>>> Sep  9 12:49:33  [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> > >>>>>>> Sep  9 12:49:33  [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> > >>>>>>> Sep  9 12:49:33  [829555.797754][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>>>>>> Sep  9 12:49:33  [829555.843229][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>>>> Sep  9 12:49:33  [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>>>>>> Sep  9 12:49:33  [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>>>> Sep  9 12:49:33  [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>>>> Sep  9 12:49:33  [829556.008563][ T2925] Call Trace:
>> > >>>>>>> Sep  9 12:49:33  [829556.032547][ T2925]  ? enqueue_to_backlog+0x81/0x250
>> > >>>>>>> Sep  9 12:49:33  [829556.056686][ T2925]  netif_receive_skb_list_internal+0x24d/0x2b0
>> > >>>>>>> Sep  9 12:49:33  [829556.080870][ T2925]  busy_poll_stop+0x113/0x140
>> > >>>>>>> Sep  9 12:49:33  [829556.104559][ T2925]  ? ep_destroy_wakeup_source+0x20/0x20
>> > >>>>>>> Sep  9 12:49:33  [829556.128028][ T2925]  ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe]
>> > >>>>>>> Sep  9 12:49:33  [829556.151405][ T2925]  napi_busy_loop+0x212/0x280
>> > >>>>>>> Sep  9 12:49:33  [829556.174478][ T2925]  ep_poll+0xba/0x380
>> > >>>>>>> Sep  9 12:49:33  [829556.196887][ T2925]  ? __napi_poll+0x1f/0x100
>> > >>>>>>> Sep  9 12:49:33  [829556.219070][ T2925]  do_epoll_wait+0xa6/0xc0
>> > >>>>>>> Sep  9 12:49:33  [829556.240778][ T2925]  do_epoll_pwait.part.0+0x9/0x70
>> > >>>>>>> Sep  9 12:49:33  [829556.262203][ T2925]  __x64_sys_epoll_pwait+0x6a/0x100
>> > >>>>>>> Sep  9 12:49:33  [829556.283188][ T2925]  ? do_syscall_64+0x3a/0x70
>> > >>>>>>> Sep  9 12:49:33  [829556.303666][ T2925]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
>> > >>>>>>> Sep  9 12:49:33  [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
>> > >>>>>>> Sep  9 12:49:33  [829556.487037][ T2925] CR2: 00000000000496c9
>> > >>>>>>> Sep  9 12:49:33  [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]---
>> > >>>>>>> Sep  9 12:49:33  [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0
>> > >>>>>>> Sep  9 12:49:34  [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02
>> > >>>>>>> Sep  9 12:49:34  [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282
>> > >>>>>>> Sep  9 12:49:34  [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015
>> > >>>>>>> Sep  9 12:49:34  [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1
>> > >>>>>>> Sep  9 12:49:34  [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900
>> > >>>>>>> Sep  9 12:49:34  [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8
>> > >>>>>>> Sep  9 12:49:34  [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8
>> > >>>>>>> Sep  9 12:49:34  [829556.831749][ T2925] FS:  00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000
>> > >>>>>>> Sep  9 12:49:34  [829556.879295][ T2925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>>>> Sep  9 12:49:34  [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>> > >>>>>>> Sep  9 12:49:34  [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>>>> Sep  9 12:49:34  [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>>>> Sep  9 12:49:34  [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>> > >>>>>>> Sep  9 12:49:34  [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> > >>>>>>> Sep  9 12:49:34  [829557.231174][ T2925] Rebooting in 10 seconds..
>> > >>>>>>> Sep  9 12:49:44  [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>> > >>>>>>>
>> > >>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>> Hi Eric and Wei
>> > >>>>>>>>>
>> > >>>>>>>>> Please check this log:
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> Please send a normal report to netdev.
>> > >>>>>>>>
>> > >>>>>>>> This has nothing to do with us (Eric & Wei)
>> > >>>>>>>>
>> > >>>>>>>> Thanks.
>> > >>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> [1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>> > >>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G           O      5.11.4 #1
>> > >>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>> > >>>>>>>>> [1584289.107263] Call Trace:
>> > >>>>>>>>> [1584289.107266]  dump_stack+0x58/0x6b
>> > >>>>>>>>> [1584289.209562]  warn_alloc.cold+0x70/0xd4
>> > >>>>>>>>> [1584289.209569]  __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>> > >>>>>>>>> [1584289.209574]  __alloc_pages_nodemask+0x15a/0x180
>> > >>>>>>>>> [1584289.474009]  allocate_slab+0x272/0x450
>> > >>>>>>>>> [1584289.496731]  ___slab_alloc.constprop.0+0x41e/0x4d0
>> > >>>>>>>>> [1584289.519147]  kmem_cache_alloc+0x110/0x120
>> > >>>>>>>>> [1584289.541416]  build_skb+0x1a/0x200
>> > >>>>>>>>> [1584289.563121]  ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>> > >>>>>>>>> [1584289.584618]  ixgbe_poll+0xeb/0x2a0 [ixgbe]
>> > >>>>>>>>> [1584289.605528]  __napi_poll+0x1f/0x130
>> > >>>>>>>>> [1584289.625842]  napi_threaded_poll+0x110/0x160
>> > >>>>>>>>> [1584289.646110]  ? __napi_poll+0x130/0x130
>> > >>>>>>>>> [1584289.665810]  kthread+0xea/0x120
>> > >>>>>>>>> [1584289.684836]  ? kthread_park+0x80/0x80
>> > >>>>>>>>> [1584289.703440]  ret_from_fork+0x1f/0x30
>> > >>>>>>>>> [1584289.721616] Mem-Info:
>> > >>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0
>> > >>>>>>>>>              active_file:17408 inactive_file:149 isolated_file:32
>> > >>>>>>>>>              unevictable:1440359 dirty:17500 writeback:0
>> > >>>>>>>>>              slab_reclaimable:43368 slab_unreclaimable:155124
>> > >>>>>>>>>              mapped:817431 shmem:7650 pagetables:32093 bounce:0
>> > >>>>>>>>>              free:17832 free_pcp:113 free_cma:0
>> > >>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
>> > >>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> > >>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
>> > >>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
>> > >>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985
>> > >>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
>> > >>>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0
>> > >>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
>> > >>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
>> > >>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
>> > >>>>>>>>> [1584290.409087] 1465768 total pagecache pages
>> > >>>>>>>>> [1584290.434531] 4165289 pages RAM
>> > >>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly
>> > >>>>>>>>> [1584290.484480] 104766 pages reserved
>> > >>>>>>>>> [1584290.508709] 0 pages hwpoisoned
>> > >>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105)
>> > >>>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
>> > >>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G           O      5.11.4 #1
>> > >>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>> > >>>>>>>>> [1584302.776532] Call Trace:
>> > >>>>>>>>> [1584302.799361]  dump_stack+0x58/0x6b
>> > >>>>>>>>> [1584302.821791]  dump_header+0x4c/0x2e6
>> > >>>>>>>>> [1584302.843580]  oom_kill_process.cold+0xb/0x10
>> > >>>>>>>>> [1584302.865223]  out_of_memory.part.0+0x125/0x5f0
>> > >>>>>>>>> [1584302.886641]  out_of_memory+0x54/0xa0
>> > >>>>>>>>> [1584302.907302]  __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
>> > >>>>>>>>> [1584302.927913]  __alloc_pages_nodemask+0x15a/0x180
>> > >>>>>>>>> [1584302.947874]  __get_free_pages+0x8/0x30
>> > >>>>>>>>> [1584302.967246]  pgd_alloc+0x21/0x180
>> > >>>>>>>>> [1584302.986355]  mm_alloc+0x1af/0x250
>> > >>>>>>>>> [1584303.005085]  alloc_bprm+0x80/0x2a0
>> > >>>>>>>>> [1584303.023328]  do_execveat_common+0x8b/0x330
>> > >>>>>>>>> [1584303.041181]  __x64_sys_execve+0x2b/0x40
>> > >>>>>>>>> [1584303.058513]  do_syscall_64+0x2d/0x40
>> > >>>>>>>>> [1584303.075281]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> > >>>>>>>>> [1584303.091891] RIP: 0033:0x488376
>> > >>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>> > >>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
>> > >>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
>> > >>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
>> > >>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
>> > >>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
>> > >>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
>> > >>>>>>>>> [1584303.379094] Mem-Info:
>> > >>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0
>> > >>>>>>>>>              active_file:12975 inactive_file:168 isolated_file:32
>> > >>>>>>>>>              unevictable:909709 dirty:12864 writeback:10
>> > >>>>>>>>>              slab_reclaimable:42415 slab_unreclaimable:154783
>> > >>>>>>>>>              mapped:39825 shmem:14744 pagetables:26041 bounce:0
>> > >>>>>>>>>              free:537002 free_pcp:1813 free_cma:0
>> > >>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
>> > >>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> > >>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
>> > >>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
>> > >>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985
>> > >>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
>> > >>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0
>> > >>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
>> > >>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
>> > >>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
>> > >>>>>>>>> [1584304.287094] 933871 total pagecache pages
>> > >>>>>>>>> [1584304.312815] 4165289 pages RAM
>> > >>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly
>> > >>>>>>>>> [1584304.362522] 104766 pages reserved
>> > >>>>>>>>> [1584304.386516] 0 pages hwpoisoned
>> > >>>>>>>>>
>> > >>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Hi Wei
>> > >>>>>>>>>>> Check this:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> [   39.706567] ------------[ cut here ]------------
>> > >>>>>>>>>>> [   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>> > >>>>>>>>>>> [   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>> > >>>>>>>>>>
>> > >>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/
>> > >>>>>>>>>>
>> > >>>>>>>>>>> [   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>> > >>>>>>>>>>> [   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
>> > >>>>>>>>>>> [   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>> > >>>>>>>>>>> [   39.706619] Workqueue: events work_for_cpu_fn
>> > >>>>>>>>>>> [   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>> > >>>>>>>>>>> [   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>> > >>>>>>>>>>> [   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>> > >>>>>>>>>>> [   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>> > >>>>>>>>>>> [   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>> > >>>>>>>>>>> [   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>> > >>>>>>>>>>> [   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>> > >>>>>>>>>>> [   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>> > >>>>>>>>>>> [   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>> > >>>>>>>>>>> [   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >>>>>>>>>>> [   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
>> > >>>>>>>>>>> [   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >>>>>>>>>>> [   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > >>>>>>>>>>> [   39.706656] Call Trace:
>> > >>>>>>>>>>> [   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
>> > >>>>>>>>>>> [   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
>> > >>>>>>>>>>> [   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
>> > >>>>>>>>>>> [   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
>> > >>>>>>>>>>> [   39.706716]  ? __kmalloc+0x37/0x160
>> > >>>>>>>>>>> [   39.706720]  ? kmem_cache_alloc+0xcb/0x120
>> > >>>>>>>>>>> [   39.706723]  ? irq_get_irq_data+0x5/0x20
>> > >>>>>>>>>>> [   39.706726]  ? mp_check_pin_attr+0xe/0xf0
>> > >>>>>>>>>>> [   39.706729]  ? irq_get_irq_data+0x5/0x20
>> > >>>>>>>>>>> [   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
>> > >>>>>>>>>>> [   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
>> > >>>>>>>>>>> [   39.706739]  ? pci_conf1_read+0x9f/0xf0
>> > >>>>>>>>>>> [   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
>> > >>>>>>>>>>> [   39.706746]  local_pci_probe+0x1b/0x40
>> > >>>>>>>>>>> [   39.706750]  work_for_cpu_fn+0xb/0x20
>> > >>>>>>>>>>> [   39.706754]  process_one_work+0x1ec/0x350
>> > >>>>>>>>>>> [   39.706758]  worker_thread+0x24b/0x4d0
>> > >>>>>>>>>>> [   39.706760]  ? process_one_work+0x350/0x350
>> > >>>>>>>>>>> [   39.706762]  kthread+0xea/0x120
>> > >>>>>>>>>>> [   39.706766]  ? kthread_park+0x80/0x80
>> > >>>>>>>>>>> [   39.706770]  ret_from_fork+0x1f/0x30
>> > >>>>>>>>>>> [   39.706774] ---[ end trace 7a203f3ec972a377 ]---
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Martin
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>> > >>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>> > >>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>> > >>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>> > >>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>> > >>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>> > >>>>>>>>>>>> This patch tries to fix this race by adding a new bit
>> > >>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>> > >>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>> > >>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>> > >>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>> > >>>>>>>>>>>> kthread and other scenarios and fixes the race issue.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>> > >>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>> > >>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>> > >>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>> > >>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>> > >>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>> > >>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>> > >>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> > >>>>>>>>>>>> ---
>> > >>>>>>>>>>>> Change since v3:
>> > >>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>> > >>>>>>>>>>>> ____napi_schedule().
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> include/linux/netdevice.h |  2 ++
>> > >>>>>>>>>>>> net/core/dev.c            | 19 ++++++++++++++++++-
>> > >>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> > >>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>> > >>>>>>>>>>>> --- a/include/linux/netdevice.h
>> > >>>>>>>>>>>> +++ b/include/linux/netdevice.h
>> > >>>>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>> > >>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>> > >>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>> > >>>>>>>>>>>> NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
>> > >>>>>>>>>>>> +     NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>> > >>>>>>>>>>>> };
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> enum {
>> > >>>>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>> > >>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>> > >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>> > >>>>>>>>>>>> NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
>> > >>>>>>>>>>>> +     NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>> > >>>>>>>>>>>> };
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> enum gro_result {
>> > >>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>> > >>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>> > >>>>>>>>>>>> --- a/net/core/dev.c
>> > >>>>>>>>>>>> +++ b/net/core/dev.c
>> > >>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>> > >>>>>>>>>>>>          */
>> > >>>>>>>>>>>>         thread = READ_ONCE(napi->thread);
>> > >>>>>>>>>>>>         if (thread) {
>> > >>>>>>>>>>>> +                     /* Avoid doing set_bit() if the thread is in
>> > >>>>>>>>>>>> +                      * INTERRUPTIBLE state, cause napi_thread_wait()
>> > >>>>>>>>>>>> +                      * makes sure to proceed with napi polling
>> > >>>>>>>>>>>> +                      * if the thread is explicitly woken from here.
>> > >>>>>>>>>>>> +                      */
>> > >>>>>>>>>>>> +                     if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>> > >>>>>>>>>>>> +                             set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>> > >>>>>>>>>>>>                 wake_up_process(thread);
>> > >>>>>>>>>>>>                 return;
>> > >>>>>>>>>>>>         }
>> > >>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>> > >>>>>>>>>>>>         WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>         new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>> > >>>>>>>>>>>> +                           NAPIF_STATE_SCHED_THREADED |
>> > >>>>>>>>>>>>                       NAPIF_STATE_PREFER_BUSY_POLL);
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>         /* If STATE_MISSED was set, leave STATE_SCHED set,
>> > >>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>> > >>>>>>>>>>>> {
>> > >>>>>>>>>>>> +     bool woken = false;
>> > >>>>>>>>>>>> +
>> > >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE);
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>> > >>>>>>>>>>>> -             if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>> > >>>>>>>>>>>> +             /* Testing SCHED_THREADED bit here to make sure the current
>> > >>>>>>>>>>>> +              * kthread owns this napi and could poll on this napi.
>> > >>>>>>>>>>>> +              * Testing SCHED bit is not enough because SCHED bit might be
>> > >>>>>>>>>>>> +              * set by some other busy poll thread or by napi_disable().
>> > >>>>>>>>>>>> +              */
>> > >>>>>>>>>>>> +             if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>> > >>>>>>>>>>>>                 WARN_ON(!list_empty(&napi->poll_list));
>> > >>>>>>>>>>>>                 __set_current_state(TASK_RUNNING);
>> > >>>>>>>>>>>>                 return 0;
>> > >>>>>>>>>>>>         }
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>         schedule();
>> > >>>>>>>>>>>> +             /* woken being true indicates this thread owns this napi. */
>> > >>>>>>>>>>>> +             woken = true;
>> > >>>>>>>>>>>>         set_current_state(TASK_INTERRUPTIBLE);
>> > >>>>>>>>>>>> }
>> > >>>>>>>>>>>> __set_current_state(TASK_RUNNING);
>> > >>>>>>>>>>>> --
>> > >>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>
>> > >>>>>
>> > >>>
>> > >>
>> >
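
The ownership rule described in the quoted commit message can be sketched in user space. This is only a minimal model under stated assumptions, not kernel code: the MODEL_* flags, struct model_napi and every model_* function are hypothetical names, C11 atomics stand in for the kernel's bit operations, and the TASK_INTERRUPTIBLE / "woken" shortcut from the patch is left out. It only illustrates why testing a separate "scheduled by the threaded path" bit is stricter than testing SCHED alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPIF_STATE_SCHED and NAPIF_STATE_SCHED_THREADED. */
#define MODEL_SCHED          (1u << 0) /* some context owns the instance */
#define MODEL_SCHED_THREADED (1u << 1) /* the owner is the kthread path  */

struct model_napi {
	atomic_uint state;
};

/* Take ownership; returns true only if the caller was the one to set SCHED. */
static bool model_try_acquire(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Threaded scheduling path: take SCHED, then record that the kthread owns it. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_try_acquire(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Busy-poll path: takes SCHED only and never sets SCHED_THREADED. */
static bool model_busy_poll_acquire(struct model_napi *n)
{
	return model_try_acquire(n);
}

/* Completion clears both bits, as the patch does in napi_complete_done(). */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch style check: any owner looks like "the kthread may poll". */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch style check: only the threaded scheduling path qualifies. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Walks through the busy-poll case and the threaded case in sequence. */
static void model_demo(void)
{
	struct model_napi n;

	atomic_init(&n.state, 0);

	/* Busy poll grabs SCHED: the old check is fooled, the new one is not. */
	assert(model_busy_poll_acquire(&n));
	assert(model_kthread_may_poll_old(&n));
	assert(!model_kthread_may_poll_new(&n));
	model_complete(&n);

	/* Threaded scheduling sets both bits, so the kthread may poll. */
	assert(model_schedule_threaded(&n));
	assert(model_kthread_may_poll_new(&n));
	model_complete(&n);
}
```

Calling model_demo() from a test harness exercises both cases; the sequence is single-threaded on purpose, since the point is the meaning of the two bits rather than a reproduction of the race itself.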

end of thread (last message: 2021-09-24 16:57 UTC)

Thread overview: 25+ messages
2021-03-16 22:36 [PATCH net v4] net: fix race between napi kthread mode and busy poll Wei Wang
2021-03-17 18:58 ` Jakub Kicinski
2021-03-17 21:50 ` patchwork-bot+netdevbpf
2021-03-20  8:45 ` Martin Zaharinov
2021-03-20  9:55   ` Eric Dumazet
2021-03-20 10:31     ` Martin Zaharinov
2021-03-30  9:25     ` Martin Zaharinov
2021-03-30 13:39       ` Eric Dumazet
2021-04-10 11:22         ` Bug Report Napi GRO ixgbe Martin Zaharinov
2021-04-12  8:36           ` Paolo Abeni
2021-04-26  8:31             ` Martin Zaharinov
2021-05-08 10:48               ` Bug Report Napi kthread rcd Martin Zaharinov
2021-05-09 10:40             ` Bug Report Napi GRO ixgbe Martin Zaharinov
2021-09-09 11:18         ` [PATCH net v4] net: fix race between napi kthread mode and busy poll Martin Zaharinov
2021-09-10  0:30           ` Wei Wang
2021-09-10  1:57             ` Martin Zaharinov
2021-09-15 14:22             ` Martin Zaharinov
2021-09-15 15:45               ` Wei Wang
2021-09-15 20:57                 ` Martin Zaharinov
2021-09-22 14:12                 ` Martin Zaharinov
2021-09-23 20:31                   ` Martin Zaharinov
2021-09-24  0:54                     ` Wei Wang
2021-09-24  6:18                       ` Martin Zaharinov
2021-09-24 16:42                         ` Wei Wang
     [not found]                           ` <CALidq=V5O6oco+JRWbdKZ4pUXzZOoaUJCu_yCh55M_ccA_6QYQ@mail.gmail.com>
2021-09-24 16:57                             ` Wei Wang
