From: Martin Zaharinov <micron10@gmail.com>
To: Wei Wang <weiwan@google.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, Alexander Duyck <alexanderduyck@fb.com>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: [PATCH net v4] net: fix race between napi kthread mode and busy poll
Date: Sat, 20 Mar 2021 10:45:28 +0200
Message-ID: <6AF20AA6-07E7-4DDD-8A9E-BE093FC03802@gmail.com>
In-Reply-To: <20210316223647.4080796-1-weiwan@google.com>

Hi Wei,

Please check this warning. It shows up on 5.11.7 while the i40e driver is probing:

[   39.706567] ------------[ cut here ]------------
[   39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
[   39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
[   39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
[   39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G           O      5.11.7 #1
[   39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
[   39.706619] Workqueue: events work_for_cpu_fn
[   39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
[   39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[   39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
[   39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
[   39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
[   39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
[   39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
[   39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
[   39.706646] FS:  0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
[   39.706649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
[   39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   39.706656] Call Trace:
[   39.706658]  i40e_setup_pf_switch+0x617/0xf80 [i40e]
[   39.706683]  i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
[   39.706708]  ? acpi_ns_check_object_type+0xd4/0x193
[   39.706713]  ? acpi_ns_check_package_list+0xfd/0x205
[   39.706716]  ? __kmalloc+0x37/0x160
[   39.706720]  ? kmem_cache_alloc+0xcb/0x120
[   39.706723]  ? irq_get_irq_data+0x5/0x20
[   39.706726]  ? mp_check_pin_attr+0xe/0xf0
[   39.706729]  ? irq_get_irq_data+0x5/0x20
[   39.706731]  ? mp_map_pin_to_irq+0xb7/0x2c0
[   39.706735]  ? acpi_register_gsi_ioapic+0x86/0x150
[   39.706739]  ? pci_conf1_read+0x9f/0xf0
[   39.706743]  ? pci_bus_read_config_word+0x2e/0x40
[   39.706746]  local_pci_probe+0x1b/0x40
[   39.706750]  work_for_cpu_fn+0xb/0x20
[   39.706754]  process_one_work+0x1ec/0x350
[   39.706758]  worker_thread+0x24b/0x4d0
[   39.706760]  ? process_one_work+0x350/0x350
[   39.706762]  kthread+0xea/0x120
[   39.706766]  ? kthread_park+0x80/0x80
[   39.706770]  ret_from_fork+0x1f/0x30
[   39.706774] ---[ end trace 7a203f3ec972a377 ]---

Martin

> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> 
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
> 
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
> Change since v3:
>  - Add READ_ONCE() for thread->state and add comments in
>    ____napi_schedule().
> 
> include/linux/netdevice.h |  2 ++
> net/core/dev.c            | 19 ++++++++++++++++++-
> 2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5b67ea89d5f2..87a5d186faff 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -360,6 +360,7 @@ enum {
> 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
> 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
> 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
> +	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
> };
> 
> enum {
> @@ -372,6 +373,7 @@ enum {
> 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
> 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
> 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
> +	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
> };
> 
> enum gro_result {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..d3195a95f30e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> 		 */
> 		thread = READ_ONCE(napi->thread);
> 		if (thread) {
> +			/* Avoid doing set_bit() if the thread is in
> +			 * INTERRUPTIBLE state, cause napi_thread_wait()
> +			 * makes sure to proceed with napi polling
> +			 * if the thread is explicitly woken from here.
> +			 */
> +			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> +				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
> 			wake_up_process(thread);
> 			return;
> 		}
> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
> 		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
> 
> 		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> +			      NAPIF_STATE_SCHED_THREADED |
> 			      NAPIF_STATE_PREFER_BUSY_POLL);
> 
> 		/* If STATE_MISSED was set, leave STATE_SCHED set,
> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
> 
> static int napi_thread_wait(struct napi_struct *napi)
> {
> +	bool woken = false;
> +
> 	set_current_state(TASK_INTERRUPTIBLE);
> 
> 	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> +		/* Testing SCHED_THREADED bit here to make sure the current
> +		 * kthread owns this napi and could poll on this napi.
> +		 * Testing SCHED bit is not enough because SCHED bit might be
> +		 * set by some other busy poll thread or by napi_disable().
> +		 */
> +		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
> 			WARN_ON(!list_empty(&napi->poll_list));
> 			__set_current_state(TASK_RUNNING);
> 			return 0;
> 		}
> 
> 		schedule();
> +		/* woken being true indicates this thread owns this napi. */
> +		woken = true;
> 		set_current_state(TASK_INTERRUPTIBLE);
> 	}
> 	__set_current_state(TASK_RUNNING);
> -- 
> 2.31.0.rc2.261.g7f71774620-goog
> 
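
The ownership rule described in the quoted commit message can be modeled in
userspace. This is only a minimal sketch under stated assumptions, not the
kernel implementation: struct model_napi, the MODEL_* bits and every model_*()
helper are hypothetical names made up for illustration, and the model leaves
out the TASK_INTERRUPTIBLE check and the "woken" flag that the real patch uses.
It only shows why testing the SCHED bit alone cannot tell the kthread apart
from a busy poller, while a separate SCHED_THREADED bit can.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model bits; they only mirror the roles of NAPI_STATE_SCHED
 * and NAPI_STATE_SCHED_THREADED from the quoted patch.
 */
#define MODEL_SCHED          0x1u /* some context owns the napi */
#define MODEL_SCHED_THREADED 0x2u /* the owner is the napi kthread */

struct model_napi {
	atomic_uint state;
};

/* Claim the SCHED bit; returns true only if it was clear before the call. */
static inline bool model_try_claim(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return !(old & MODEL_SCHED);
}

/* Busy poll (or a disable path) claims SCHED and nothing else. */
static inline bool model_busy_poll_claim(struct model_napi *n)
{
	return model_try_claim(n);
}

/* Scheduling in threaded mode claims SCHED, then marks the kthread as owner. */
static inline bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_try_claim(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion clears both bits, as napi_complete_done() does in the patch. */
static inline void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch style check: SCHED alone, so any owner looks like the kthread. */
static inline bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch style check: only the threaded-ownership bit counts. */
static inline bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, after model_busy_poll_claim() succeeds on an idle napi, the old
check reports that the kthread may poll even though the busy poller owns the
napi, while the new check does not. After model_schedule_threaded() succeeds,
the new check passes and a later busy poll claim fails until model_complete()
has run.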

