linux-kernel.vger.kernel.org archive mirror
* [RT] Question about i40e threaded irq
@ 2021-05-11  6:09 Juri Lelli
  2021-05-11  6:24 ` Stefan Assmann
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Juri Lelli @ 2021-05-11  6:09 UTC (permalink / raw)
  To: Thomas Gleixner, Sebastian Andrzej Siewior; +Cc: linux-rt-users, LKML, sassmann

Hi,

The following has been reported on RT.

[ 2007.106483] list_add corruption. next->prev should be prev (ffff8d86a0aadd00), but was ffff8d86a0aadec8. (next=ffff8d86a0aadd00).
[ 2007.118155] ------------[ cut here ]------------
[ 2007.118156] kernel BUG at lib/list_debug.c:25!
[ 2007.118160] invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 2007.118162] CPU: 36 PID: 4651 Comm: irq/429-i40e-en Kdump: loaded Tainted: G          I  #1
[ 2007.118163] Hardware name: Dell Inc. PowerEdge R740xd/04FC42, BIOS 2.6.4 04/09/2020
[ 2007.118168] RIP: 0010:__list_add_valid.cold.0+0x12/0x28
[ 2007.118169] Code: 85 5d 00 00 00 48 8b 50 08 48 39 f2 0f 85 42 00 00 00 b8 01 00 00 00 c3 48 89 d1 48 c7 c7 20 1e 6e ad 48 89 c2 e8 e0 11 cd ff <0f> 0b 48 89 c1 4c 89 c6 48 c7 c7 78 1e 6e ad e8 cc 11 cd ff 0f 0b
[ 2007.118170] RSP: 0018:ffffa598d9b0be68 EFLAGS: 00010246
[ 2007.118171] RAX: 0000000000000075 RBX: ffff8d86a0aadd00 RCX: 0000000000000001
[ 2007.118171] RDX: 0000000000000000 RSI: ffffffffad6cf723 RDI: 00000000ffffffff
[ 2007.118172] RBP: ffff8d8694112010 R08: 0000000000000000 R09: 0000000000000a36
[ 2007.118173] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d86a0aadd00
[ 2007.118173] R13: ffffffffac71f240 R14: ffff8d869d3758a0 R15: ffff8d993af88000
[ 2007.118174] FS:  0000000000000000(0000) GS:ffff8d86a0a80000(0000) knlGS:0000000000000000
[ 2007.118174] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2007.118175] CR2: 00007fa2d29211a0 CR3: 00000011d8c0e002 CR4: 00000000007606e0
[ 2007.118175] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2007.118176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2007.118176] PKRU: 55555554
[ 2007.118176] Call Trace:
[ 2007.118180]  __napi_schedule_irqoff+0x34/0x60
[ 2007.118191]  i40e_msix_clean_rings+0x3f/0x50 [i40e]
[ 2007.118195]  irq_forced_thread_fn+0x30/0x80
[ 2007.118196]  irq_thread+0xdd/0x180
[ 2007.118198]  ? wake_threads_waitq+0x30/0x30
[ 2007.118198]  ? irq_thread_check_affinity+0x20/0x20
[ 2007.118202]  kthread+0x112/0x130
[ 2007.118203]  ? kthread_flush_work_fn+0x10/0x10
[ 2007.118207]  ret_from_fork+0x1f/0x40

The following trace excerpts were then collected and seem
relevant.

irq/532--4667   36....2 13343.788389: softirq_raise:        vec=3 [action=NET_RX]
irq/532--4667   36....2 13343.788391: kernel_stack:         <stack trace>
=> trace_event_raw_event_softirq (ffffffff83eb6f77)
=> __raise_softirq_irqoff (ffffffff83eb7acd)
=> i40e_msix_clean_rings (ffffffffc03fe6ef)
=> irq_forced_thread_fn (ffffffff83f1f2a0)
=> irq_thread (ffffffff83f1f58d)
=> kthread (ffffffff83ed5f42)
=> ret_from_fork (ffffffff8480023f)

irq/529--4664   36d.h.2 13343.788402: softirq_raise:        vec=3 [action=NET_RX]
irq/529--4664   36d.h.2 13343.788404: kernel_stack:         <stack trace>
=> trace_event_raw_event_softirq (ffffffff83eb6f77)
=> __raise_softirq_irqoff (ffffffff83eb7acd)
=> rps_trigger_softirq (ffffffff8452fa49)
=> flush_smp_call_function_queue (ffffffff83f591c3)
=> smp_call_function_single_interrupt (ffffffff8480294b)
=> call_function_single_interrupt (ffffffff84801c3f)
=> __list_add_valid (ffffffff8424b050)
=> __napi_schedule_irqoff (ffffffff8452fa94)
=> i40e_msix_clean_rings (ffffffffc03fe6ef)
=> irq_forced_thread_fn (ffffffff83f1f2a0)
=> irq_thread (ffffffff83f1f58d)
=> kthread (ffffffff83ed5f42)
=> ret_from_fork (ffffffff8480023f)

My understanding is that rps_trigger_softirq() sneaked in while the
i40e_msix_clean_rings() threaded irq was running and, since the latter
uses napi_schedule_irqoff(), the softnet_data poll_list eventually got
corrupted.
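
To make the suspected sequence concrete, here is a minimal userspace
sketch. It assumes the IPI lands after head->prev has been sampled but
before the neighbours are validated, which is what the second stack
trace suggests. struct list_head is a stand-in for the kernel type, and
racy_add_tail() with its irq_entry parameter is a hypothetical helper
used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same idea as __list_add_valid(): the neighbours must still point at each other. */
static bool list_add_valid(struct list_head *prev, struct list_head *next)
{
	return next->prev == prev && prev->next == next;
}

static void list_link(struct list_head *new, struct list_head *prev,
		      struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	list_link(new, head->prev, head);
}

/*
 * Hypothetical model of the race: the threaded handler samples head->prev
 * with "interrupts" enabled, then the IPI handler queues its own entry on
 * the same list before the first insertion is validated.
 * Returns false where the kernel's list debugging would BUG().
 */
static bool racy_add_tail(struct list_head *new, struct list_head *head,
			  struct list_head *irq_entry)
{
	struct list_head *prev, *next;

	assert(new != NULL && head != NULL);
	prev = head->prev;		/* sampled before the interrupt */
	next = head;

	if (irq_entry)
		list_add_tail(irq_entry, head);	/* the interrupt lands here */

	if (!list_add_valid(prev, next))
		return false;		/* "next->prev should be prev" */
	list_link(new, prev, next);
	return true;
}
```

Calling racy_add_tail() with irq_entry == NULL behaves like a plain
list_add_tail(); passing a second entry reproduces the stale-prev
condition that the list debugging code complains about.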

Now, doing s/napi_schedule_irqoff/napi_schedule/ in the i40e driver seems
to cure the problem. I'm not sure that there isn't a more general
solution, though. Is it expected that napi_schedule_irqoff users in RT
know what they are doing or do we want/need to fix this in a more
general way?

Thanks!

Juri



* Re: [RT] Question about i40e threaded irq
  2021-05-11  6:09 [RT] Question about i40e threaded irq Juri Lelli
@ 2021-05-11  6:24 ` Stefan Assmann
  2021-05-11  7:46 ` Thomas Gleixner
  2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
  2 siblings, 0 replies; 15+ messages in thread
From: Stefan Assmann @ 2021-05-11  6:24 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, linux-rt-users, LKML

On 2021-05-11 08:09, Juri Lelli wrote:
> Hi,
> 
> The following has been reported on RT.
> 
> [ 2007.106483] list_add corruption. next->prev should be prev (ffff8d86a0aadd00), but was ffff8d86a0aadec8. (next=ffff8d86a0aadd00).
> [ 2007.118155] ------------[ cut here ]------------
> [ 2007.118156] kernel BUG at lib/list_debug.c:25!
> [ 2007.118160] invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
> [ 2007.118162] CPU: 36 PID: 4651 Comm: irq/429-i40e-en Kdump: loaded Tainted: G          I  #1
> [ 2007.118163] Hardware name: Dell Inc. PowerEdge R740xd/04FC42, BIOS 2.6.4 04/09/2020
> [ 2007.118168] RIP: 0010:__list_add_valid.cold.0+0x12/0x28
> [ 2007.118169] Code: 85 5d 00 00 00 48 8b 50 08 48 39 f2 0f 85 42 00 00 00 b8 01 00 00 00 c3 48 89 d1 48 c7 c7 20 1e 6e ad 48 89 c2 e8 e0 11 cd ff <0f> 0b 48 89 c1 4c 89 c6 48 c7 c7 78 1e 6e ad e8 cc 11 cd ff 0f 0b
> [ 2007.118170] RSP: 0018:ffffa598d9b0be68 EFLAGS: 00010246
> [ 2007.118171] RAX: 0000000000000075 RBX: ffff8d86a0aadd00 RCX: 0000000000000001
> [ 2007.118171] RDX: 0000000000000000 RSI: ffffffffad6cf723 RDI: 00000000ffffffff
> [ 2007.118172] RBP: ffff8d8694112010 R08: 0000000000000000 R09: 0000000000000a36
> [ 2007.118173] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8d86a0aadd00
> [ 2007.118173] R13: ffffffffac71f240 R14: ffff8d869d3758a0 R15: ffff8d993af88000
> [ 2007.118174] FS:  0000000000000000(0000) GS:ffff8d86a0a80000(0000) knlGS:0000000000000000
> [ 2007.118174] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2007.118175] CR2: 00007fa2d29211a0 CR3: 00000011d8c0e002 CR4: 00000000007606e0
> [ 2007.118175] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2007.118176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2007.118176] PKRU: 55555554
> [ 2007.118176] Call Trace:
> [ 2007.118180]  __napi_schedule_irqoff+0x34/0x60
> [ 2007.118191]  i40e_msix_clean_rings+0x3f/0x50 [i40e]
> [ 2007.118195]  irq_forced_thread_fn+0x30/0x80
> [ 2007.118196]  irq_thread+0xdd/0x180
> [ 2007.118198]  ? wake_threads_waitq+0x30/0x30
> [ 2007.118198]  ? irq_thread_check_affinity+0x20/0x20
> [ 2007.118202]  kthread+0x112/0x130
> [ 2007.118203]  ? kthread_flush_work_fn+0x10/0x10
> [ 2007.118207]  ret_from_fork+0x1f/0x40
> 
> The following trace excerpts were then collected and seem
> relevant.
> 
> irq/532--4667   36....2 13343.788389: softirq_raise:        vec=3 [action=NET_RX]
> irq/532--4667   36....2 13343.788391: kernel_stack:         <stack trace>
> => trace_event_raw_event_softirq (ffffffff83eb6f77)
> => __raise_softirq_irqoff (ffffffff83eb7acd)
> => i40e_msix_clean_rings (ffffffffc03fe6ef)
> => irq_forced_thread_fn (ffffffff83f1f2a0)
> => irq_thread (ffffffff83f1f58d)
> => kthread (ffffffff83ed5f42)
> => ret_from_fork (ffffffff8480023f)
> 
> irq/529--4664   36d.h.2 13343.788402: softirq_raise:        vec=3 [action=NET_RX]
> irq/529--4664   36d.h.2 13343.788404: kernel_stack:         <stack trace>
> => trace_event_raw_event_softirq (ffffffff83eb6f77)
> => __raise_softirq_irqoff (ffffffff83eb7acd)
> => rps_trigger_softirq (ffffffff8452fa49)
> => flush_smp_call_function_queue (ffffffff83f591c3)
> => smp_call_function_single_interrupt (ffffffff8480294b)
> => call_function_single_interrupt (ffffffff84801c3f)
> => __list_add_valid (ffffffff8424b050)
> => __napi_schedule_irqoff (ffffffff8452fa94)
> => i40e_msix_clean_rings (ffffffffc03fe6ef)
> => irq_forced_thread_fn (ffffffff83f1f2a0)
> => irq_thread (ffffffff83f1f58d)
> => kthread (ffffffff83ed5f42)
> => ret_from_fork (ffffffff8480023f)
> 
> My understanding is that rps_trigger_softirq() sneaked in while the
> i40e_msix_clean_rings() threaded irq was running and, since the latter
> uses napi_schedule_irqoff(), the softnet_data poll_list eventually got
> corrupted.
> 
> Now, doing s/napi_schedule_irqoff/napi_schedule/ in the i40e driver seems
> to cure the problem. I'm not sure that there isn't a more general
> solution, though. Is it expected that napi_schedule_irqoff users in RT
> know what they are doing or do we want/need to fix this in a more
> general way?

Just wanted to add that this does not concern i40e alone. A good number
of wired NIC drivers use napi_schedule_irqoff() in the same way.

Thanks!

  Stefan



* Re: [RT] Question about i40e threaded irq
  2021-05-11  6:09 [RT] Question about i40e threaded irq Juri Lelli
  2021-05-11  6:24 ` Stefan Assmann
@ 2021-05-11  7:46 ` Thomas Gleixner
  2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2021-05-11  7:46 UTC (permalink / raw)
  To: Juri Lelli, Sebastian Andrzej Siewior; +Cc: linux-rt-users, LKML, sassmann

On Tue, May 11 2021 at 08:09, Juri Lelli wrote:
> My understanding is that rps_trigger_softirq() sneaked in while the
> i40e_msix_clean_rings() threaded irq was running and, since the latter
> uses napi_schedule_irqoff(), the softnet_data poll_list eventually got
> corrupted.
>
> Now, doing s/napi_schedule_irqoff/napi_schedule/ in the i40e driver seems
> to cure the problem. I'm not sure that there isn't a more general
> solution, though. Is it expected that napi_schedule_irqoff users in RT
> know what they are doing or do we want/need to fix this in a more
> general way?

The straightforward fix is to map napi_schedule_irqoff() to
napi_schedule() on RT. Fixing this at the driver level is a whack-a-mole
game.

Thanks,

        tglx


* [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-11  6:09 [RT] Question about i40e threaded irq Juri Lelli
  2021-05-11  6:24 ` Stefan Assmann
  2021-05-11  7:46 ` Thomas Gleixner
@ 2021-05-12 21:43 ` Sebastian Andrzej Siewior
  2021-05-12 22:28   ` Thomas Gleixner
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-05-12 21:43 UTC (permalink / raw)
  To: netdev
  Cc: Juri Lelli, Thomas Gleixner, linux-rt-users, Steven Rostedt,
	LKML, sassmann, David S. Miller, Jakub Kicinski

__napi_schedule_irqoff() is an optimized version of __napi_schedule()
which can be used where it is known that interrupts are disabled,
e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
callbacks.

On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
interrupt handlers and spinlocks do not disable interrupts, and the NAPI
hrtimer callback is forced into softirq context, which runs with
interrupts enabled as well.

Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
game so make __napi_schedule_irqoff() invoke __napi_schedule() for
PREEMPT_RT kernels.

The callers of ____napi_schedule() in the networking core have been
audited and are correct on PREEMPT_RT kernels as well.

Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
an inline provided which invokes __napi_schedule().

This was not chosen as it creates #ifdeffery all over the place and with
the proposed solution the code reflects the documentation consistently
and in one obvious place.
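
For reference, the rejected alternative would look roughly like the
sketch below. It is a userspace approximation, not kernel code: struct
napi_struct and both function bodies are stubs that only count calls,
and CONFIG_PREEMPT_RT is defined by hand so that the RT branch is the
one compiled:

```c
#include <assert.h>

/* Defined by hand for the sketch; in the kernel this comes from Kconfig. */
#define CONFIG_PREEMPT_RT 1

/* Stub: the real struct napi_struct has nothing to do with these counters. */
struct napi_struct {
	int scheduled_safe;	/* went through the irq-safe variant */
	int scheduled_irqoff;	/* went through the irqs-off fast path */
};

/* Stand-in for the irq-safe variant, which disables interrupts itself. */
static void __napi_schedule(struct napi_struct *n)
{
	n->scheduled_safe++;
}

#ifdef CONFIG_PREEMPT_RT
/* On RT: an inline that simply forwards to the irq-safe variant. */
static inline void __napi_schedule_irqoff(struct napi_struct *n)
{
	__napi_schedule(n);
}
#else
/* On !RT: the fast path that assumes hard irqs are already masked. */
static void __napi_schedule_irqoff(struct napi_struct *n)
{
	n->scheduled_irqoff++;
}
#endif
```

In the real tree the #ifdef would have to be split between the header
and net/core/dev.c, which is the #ifdeffery referred to above.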

 net/core/dev.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 222b1d322c969..febb23708184e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6501,11 +6501,18 @@ EXPORT_SYMBOL(napi_schedule_prep);
  * __napi_schedule_irqoff - schedule for receive
  * @n: entry to schedule
  *
- * Variant of __napi_schedule() assuming hard irqs are masked
+ * Variant of __napi_schedule() assuming hard irqs are masked.
+ *
+ * On PREEMPT_RT enabled kernels this maps to __napi_schedule()
+ * because the interrupt disabled assumption might not be true
+ * due to force-threaded interrupts and spinlock substitution.
  */
 void __napi_schedule_irqoff(struct napi_struct *n)
 {
-	____napi_schedule(this_cpu_ptr(&softnet_data), n);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		____napi_schedule(this_cpu_ptr(&softnet_data), n);
+	else
+		__napi_schedule(n);
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
 
-- 
2.31.1



* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
@ 2021-05-12 22:28   ` Thomas Gleixner
  2021-05-13  0:50     ` Steven Rostedt
  2021-05-14 18:56     ` Jakub Kicinski
  2021-05-13  5:12   ` Juri Lelli
  2021-05-13 20:20   ` patchwork-bot+netdevbpf
  2 siblings, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2021-05-12 22:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev
  Cc: Juri Lelli, linux-rt-users, Steven Rostedt, LKML, sassmann,
	David S. Miller, Jakub Kicinski, stable-rt

On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
>
> On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
> interrupt handlers and spinlocks do not disable interrupts, and the NAPI
> hrtimer callback is forced into softirq context, which runs with
> interrupts enabled as well.
>
> Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> PREEMPT_RT kernels.
>
> The callers of ____napi_schedule() in the networking core have been
> audited and are correct on PREEMPT_RT kernels as well.
>
> Reported-by: Juri Lelli <juri.lelli@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

> ---
> Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> an inline provided which invokes __napi_schedule().
>
> This was not chosen as it creates #ifdeffery all over the place and with
> the proposed solution the code reflects the documentation consistently
> and in one obvious place.

Blame me for that decision.

No matter which variant we end up with, this needs to go into all stable
RT kernels ASAP.

Thanks,

        tglx


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-12 22:28   ` Thomas Gleixner
@ 2021-05-13  0:50     ` Steven Rostedt
  2021-05-13 16:43       ` Steven Rostedt
  2021-05-14 18:56     ` Jakub Kicinski
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2021-05-13  0:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
	LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt

On Thu, 13 May 2021 00:28:02 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> No matter which variant we end up with, this needs to go into all stable
> RT kernels ASAP.

Is this in rt-devel already?

I'll start pulling in whatever is in there.

-- Steve


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
  2021-05-12 22:28   ` Thomas Gleixner
@ 2021-05-13  5:12   ` Juri Lelli
  2021-05-13 20:20   ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 15+ messages in thread
From: Juri Lelli @ 2021-05-13  5:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, Thomas Gleixner, linux-rt-users, Steven Rostedt, LKML,
	sassmann, David S. Miller, Jakub Kicinski

Hi,

On 12/05/21 23:43, Sebastian Andrzej Siewior wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
> 
> On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
> interrupt handlers and spinlocks do not disable interrupts, and the NAPI
> hrtimer callback is forced into softirq context, which runs with
> interrupts enabled as well.
> 
> Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> PREEMPT_RT kernels.
> 
> The callers of ____napi_schedule() in the networking core have been
> audited and are correct on PREEMPT_RT kernels as well.
> 
> Reported-by: Juri Lelli <juri.lelli@redhat.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> an inline provided which invokes __napi_schedule().
> 
> This was not chosen as it creates #ifdeffery all over the place and with
> the proposed solution the code reflects the documentation consistently
> and in one obvious place.
> 
>  net/core/dev.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 222b1d322c969..febb23708184e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6501,11 +6501,18 @@ EXPORT_SYMBOL(napi_schedule_prep);
>   * __napi_schedule_irqoff - schedule for receive
>   * @n: entry to schedule
>   *
> - * Variant of __napi_schedule() assuming hard irqs are masked
> + * Variant of __napi_schedule() assuming hard irqs are masked.
> + *
> + * On PREEMPT_RT enabled kernels this maps to __napi_schedule()
> + * because the interrupt disabled assumption might not be true
> + * due to force-threaded interrupts and spinlock substitution.
>   */
>  void __napi_schedule_irqoff(struct napi_struct *n)
>  {
> -	____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> +		____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +	else
> +		__napi_schedule(n);
>  }
>  EXPORT_SYMBOL(__napi_schedule_irqoff);

Thanks for the patch!

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Best,
Juri



* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-13  0:50     ` Steven Rostedt
@ 2021-05-13 16:43       ` Steven Rostedt
  2021-05-14 12:11         ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2021-05-13 16:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
	LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt

On Wed, 12 May 2021 20:50:46 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 13 May 2021 00:28:02 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > No matter which variant we end up with, this needs to go into all stable
> > RT kernels ASAP.  
> 
> Is this in rt-devel already?
> 
> I'll start pulling in whatever is in there.

I don't see this in the rt-devel tree. The stable-rt releases always pull
from there (following the stable vs mainline relationship).

Is there going to be a new rt-devel release?

-- Steve


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
  2021-05-12 22:28   ` Thomas Gleixner
  2021-05-13  5:12   ` Juri Lelli
@ 2021-05-13 20:20   ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-05-13 20:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, juri.lelli, tglx, linux-rt-users, rostedt, linux-kernel,
	sassmann, davem, kuba

Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Wed, 12 May 2021 23:43:24 +0200 you wrote:
> __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> which can be used where it is known that interrupts are disabled,
> e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> callbacks.
> 
> On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
> interrupt handlers and spinlocks do not disable interrupts, and the NAPI
> hrtimer callback is forced into softirq context, which runs with
> interrupts enabled as well.
> 
> [...]

Here is the summary with links:
  - [net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
    https://git.kernel.org/netdev/net-next/c/8380c81d5c4f

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-13 16:43       ` Steven Rostedt
@ 2021-05-14 12:11         ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2021-05-14 12:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
	LKML, sassmann, David S. Miller, Jakub Kicinski, stable-rt

On Thu, May 13 2021 at 12:43, Steven Rostedt wrote:
> On Wed, 12 May 2021 20:50:46 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Thu, 13 May 2021 00:28:02 +0200
>> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> > No matter which variant we end up with, this needs to go into all stable
>> > RT kernels ASAP.  
>> 
>> Is this in rt-devel already?
>> 
>> I'll start pulling in whatever is in there.
>
> I don't see this in the rt-devel tree. The stable-rt releases always pull
> from there (following the stable vs mainline relationship).
>
> Is there going to be a new rt-devel release?

Once we have time to work on it.

The patch got applied to net-next, so please pick it up from there.

Thanks,

        tglx


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-12 22:28   ` Thomas Gleixner
  2021-05-13  0:50     ` Steven Rostedt
@ 2021-05-14 18:56     ` Jakub Kicinski
  2021-05-14 19:44       ` Alison Chaiken
  2021-05-14 20:16       ` Thomas Gleixner
  1 sibling, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2021-05-14 18:56 UTC (permalink / raw)
  To: Thomas Gleixner, Sebastian Andrzej Siewior
  Cc: netdev, Juri Lelli, linux-rt-users, Steven Rostedt, LKML,
	sassmann, David S. Miller, stable-rt

On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
> On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> > __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> > which can be used where it is known that interrupts are disabled,
> > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> > callbacks.
> >
> > On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
> > interrupt handlers and spinlocks do not disable interrupts, and the NAPI
> > hrtimer callback is forced into softirq context, which runs with
> > interrupts enabled as well.
> >
> > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> > game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> > PREEMPT_RT kernels.
> >
> > The callers of ____napi_schedule() in the networking core have been
> > audited and are correct on PREEMPT_RT kernels as well.
> >
> > Reported-by: Juri Lelli <juri.lelli@redhat.com>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>  
> 
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> 
> > ---
> > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> > an inline provided which invokes __napi_schedule().
> >
> > This was not chosen as it creates #ifdeffery all over the place and with
> > the proposed solution the code reflects the documentation consistently
> > and in one obvious place.  
> 
> Blame me for that decision.
> 
> No matter which variant we end up with, this needs to go into all stable
> RT kernels ASAP.

Mumble mumble. I thought we concluded that drivers used on RT can be
fixed, we've already done it for a couple drivers (by which I mean two).
If all the IRQ handler is doing is scheduling NAPI (which it is for
modern NICs) - IRQF_NO_THREAD seems like the right option.
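
As a rough illustration of why IRQF_NO_THREAD sidesteps the problem,
the sketch below models the decision the IRQ core makes when forced
threading is active. It is a simplified userspace approximation: the
flag values are illustrative, handler_gets_force_threaded() is a
hypothetical name, and the real code in kernel/irq/manage.c handles
more cases than shown here:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for the sketch; see include/linux/interrupt.h. */
#define IRQF_PERCPU	0x00000400UL
#define IRQF_ONESHOT	0x00002000UL
#define IRQF_NO_THREAD	0x00010000UL

/*
 * Hypothetical helper: returns true when a handler requested with @flags
 * would be moved into an irq thread. @force_irqthreads stands for the
 * "threadirqs" boot option or PREEMPT_RT, which always forces threading.
 */
static bool handler_gets_force_threaded(bool force_irqthreads,
					unsigned long flags)
{
	if (!force_irqthreads)
		return false;
	/* Handlers marked like this stay in hard interrupt context. */
	if (flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT))
		return false;
	return true;
}
```

A handler that stays in hard interrupt context runs with interrupts
disabled, so the assumption behind napi_schedule_irqoff() holds again.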

Is there any driver you care about that we can convert to using
IRQF_NO_THREAD so we can have new drivers to "do the right thing"
while the old ones depend on this workaround for now?


Another thing while I have your attention - ____napi_schedule() does
__raise_softirq_irqoff() which AFAIU does not wake the ksoftirqd thread.
On non-RT we get occasional NOHZ warnings when drivers schedule napi
from process context, but on RT this is even more of a problem, right?
ksoftirqd won't run until something else actually wakes it up?
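
A small userspace model of the concern, under the assumption that the
only difference between the two raise variants is the ksoftirqd wakeup
outside interrupt context. struct cpu_model and all model_* names are
hypothetical and only loosely mirror kernel/softirq.c:

```c
#include <assert.h>
#include <stdbool.h>

enum { NET_RX_SOFTIRQ = 3 };	/* matches vec=3 [action=NET_RX] */

/* Hypothetical per-CPU state, reduced to what the argument needs. */
struct cpu_model {
	unsigned int pending;	/* softirq pending mask */
	bool in_interrupt;	/* hard/soft irq context or BH disabled */
	bool ksoftirqd_woken;
};

/* Only sets the pending bit; relies on a later irq exit or bh enable. */
static void model_raise_softirq_irqoff_nowake(struct cpu_model *cpu,
					      unsigned int nr)
{
	cpu->pending |= 1u << nr;
}

/* Also wakes ksoftirqd when nothing else is going to run the softirq. */
static void model_raise_softirq_irqoff(struct cpu_model *cpu, unsigned int nr)
{
	model_raise_softirq_irqoff_nowake(cpu, nr);
	if (!cpu->in_interrupt)
		cpu->ksoftirqd_woken = true;
}

/* True when a softirq is pending but nothing is guaranteed to process it. */
static bool softirq_stranded(const struct cpu_model *cpu)
{
	return cpu->pending && !cpu->in_interrupt && !cpu->ksoftirqd_woken;
}
```

In this model, raising NET_RX from plain thread context through the
no-wake variant leaves the softirq stranded until something else wakes
ksoftirqd, which is the situation described above.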


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-14 18:56     ` Jakub Kicinski
@ 2021-05-14 19:44       ` Alison Chaiken
  2021-05-14 21:53         ` Thomas Gleixner
  2021-05-14 20:16       ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Alison Chaiken @ 2021-05-14 19:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, netdev, Juri Lelli,
	linux-rt-users, Steven Rostedt, LKML, sassmann, David S. Miller,
	stable-rt

On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
> > On Wed, May 12 2021 at 23:43, Sebastian Andrzej Siewior wrote:
> > > __napi_schedule_irqoff() is an optimized version of __napi_schedule()
> > > which can be used where it is known that interrupts are disabled,
> > > e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
> > > callbacks.
> > >
> > > On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
> > > interrupt handlers and spinlocks do not disable interrupts, and the NAPI
> > > hrtimer callback is forced into softirq context, which runs with
> > > interrupts enabled as well.
> > >
> > > Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
> > > game so make __napi_schedule_irqoff() invoke __napi_schedule() for
> > > PREEMPT_RT kernels.
> > >
> > > The callers of ____napi_schedule() in the networking core have been
> > > audited and are correct on PREEMPT_RT kernels as well.
> > >
> > > Reported-by: Juri Lelli <juri.lelli@redhat.com>
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > > ---
> > > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
> > > an inline provided which invokes __napi_schedule().
> > >
> > > This was not chosen as it creates #ifdeffery all over the place and with
> > > the proposed solution the code reflects the documentation consistently
> > > and in one obvious place.
> >
> > Blame me for that decision.
> >
> > No matter which variant we end up with, this needs to go into all stable
> > RT kernels ASAP.
>
> Mumble mumble. I thought we concluded that drivers used on RT can be
> fixed, we've already done it for a couple drivers (by which I mean two).
> If all the IRQ handler is doing is scheduling NAPI (which it is for
> modern NICs) - IRQF_NO_THREAD seems like the right option.
>
> Is there any driver you care about that we can convert to using
> IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> while the old ones depend on this workaround for now?
>
>
> Another thing while I have your attention - ____napi_schedule() does
> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirqd thread.
> On non-RT we get occasional NOHZ warnings when drivers schedule napi
> from process context, but on RT this is even more of a problem, right?
> ksoftirqd won't run until something else actually wakes it up?

By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see
that message about once a week with 4.19. Presumably any failure of
____napi_schedule() to wake ksoftirqd could only cause problems for the
NET_RX softirq, so if the pending softirq is different, the cause lies
elsewhere.

-- Alison Chaiken
   Aurora Innovation


* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-14 18:56     ` Jakub Kicinski
  2021-05-14 19:44       ` Alison Chaiken
@ 2021-05-14 20:16       ` Thomas Gleixner
  2021-05-14 20:38         ` Jakub Kicinski
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:16 UTC (permalink / raw)
  To: Jakub Kicinski, Sebastian Andrzej Siewior
  Cc: netdev, Juri Lelli, linux-rt-users, Steven Rostedt, LKML,
	sassmann, David S. Miller, stable-rt

On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote:
> On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:
>> > ---
>> > Alternatively __napi_schedule_irqoff() could be #ifdef'ed out on RT and
>> > an inline provided which invokes __napi_schedule().
>> >
>> > This was not chosen as it creates #ifdeffery all over the place and with
>> > the proposed solution the code reflects the documentation consistently
>> > and in one obvious place.  
>> 
>> Blame me for that decision.
>> 
>> No matter which variant we end up with, this needs to go into all stable
>> RT kernels ASAP.
>
> Mumble mumble. I thought we concluded that drivers used on RT can be
> fixed, we've already done it for a couple drivers (by which I mean two).
> If all the IRQ handler is doing is scheduling NAPI (which it is for
> modern NICs) - IRQF_NO_THREAD seems like the right option.

Yes. That works, but there are a bunch which do more than that IIRC.

> Is there any driver you care about that we can convert to using
> IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> while the old ones depend on this workaround for now?

The start of this thread was about i40e_msix_clean_rings() which
probably falls under the IRQF_NO_THREAD category, but I'm sure that
there are others. So I chose the safe way for RT for now.

> Another thing while I have your attention - ____napi_schedule() does
> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirqd thread.
> On non-RT we get occasional NOHZ warnings when drivers schedule napi
> from process context, but on RT this is even more of a problem, right?
> ksoftirqd won't run until something else actually wakes it up?

Correct. I sent a patch for the r8152 usb network driver today which
suffers from that problem. :)

As I said there, we want a (debug/lockdep) check in __napi_schedule()
whether soft interrupts are disabled, but let me have a look whether
that check might make more sense directly in __raise_softirq_irqoff().

Thanks,

        tglx
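
The wakeup gap discussed in this message can be sketched as a small
userspace model. This is not kernel code: struct cpu_model and every
model_* name are invented for illustration. The only things taken from
the kernel are the position of NET_RX_SOFTIRQ in the softirq enum
(entry 3) and the mainline behaviour that raise_softirq_irqoff() wakes
ksoftirqd when called outside interrupt context, while
__raise_softirq_irqoff() only sets the pending bit. The exact wake
condition on PREEMPT_RT may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* NET_RX_SOFTIRQ is entry 3 in the kernel's softirq enum. */
enum { MODEL_NET_RX_SOFTIRQ = 3 };

/* Hypothetical per-CPU state for this sketch; not a kernel structure. */
struct cpu_model {
	uint32_t pending;      /* bitmask of raised softirqs */
	bool in_interrupt;     /* true if pending softirqs get run on context exit */
	bool ksoftirqd_woken;  /* true once something woke ksoftirqd */
};

/* Models __raise_softirq_irqoff(): marks the softirq pending, nothing more. */
static void model_raise_softirq_nowake(struct cpu_model *cpu, unsigned int nr)
{
	cpu->pending |= UINT32_C(1) << nr;
}

/*
 * Models raise_softirq_irqoff(): marks the softirq pending and, when the
 * caller is not in interrupt context, wakes ksoftirqd to process it.
 */
static void model_raise_softirq(struct cpu_model *cpu, unsigned int nr)
{
	model_raise_softirq_nowake(cpu, nr);
	if (!cpu->in_interrupt)
		cpu->ksoftirqd_woken = true;
}

/* True when a softirq is pending and nothing is lined up to process it. */
static bool model_softirq_stranded(const struct cpu_model *cpu)
{
	return cpu->pending != 0 && !cpu->in_interrupt && !cpu->ksoftirqd_woken;
}
```

In this model, calling model_raise_softirq_nowake() on a CPU whose
in_interrupt is false leaves pending at 0x08 with nobody woken, which
is the task-context case described above. The same call with
in_interrupt set to true is not stranded, because the pending softirq
is handled when that context ends.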

* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-14 20:16       ` Thomas Gleixner
@ 2021-05-14 20:38         ` Jakub Kicinski
  0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2021-05-14 20:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
	Steven Rostedt, LKML, sassmann, David S. Miller, stable-rt

On Fri, 14 May 2021 22:16:10 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 11:56, Jakub Kicinski wrote:
> > On Thu, 13 May 2021 00:28:02 +0200 Thomas Gleixner wrote:  
> >> Blame me for that decision.
> >> 
> >> No matter which variant we end up with, this needs to go into all stable
> >> RT kernels ASAP.  
> >
> > Mumble mumble. I thought we concluded that drivers used on RT can be
> > fixed, we've already done it for a couple drivers (by which I mean two).
> > If all the IRQ handler is doing is scheduling NAPI (which it is for
> > modern NICs) - IRQF_NO_THREAD seems like the right option.  
> 
> Yes. That works, but there are a bunch which do more than that IIRC.
> 
> > Is there any driver you care about that we can convert to using
> > IRQF_NO_THREAD so we can have new drivers to "do the right thing"
> > while the old ones depend on this workaround for now?  
> 
> The start of this thread was about i40e_msix_clean_rings() which
> probably falls under the IRQF_NO_THREAD category, but I'm sure that
> there are others. So I chose the safe way for RT for now.

Sounds reasonable. I'll send a patch with a new helper and convert 
an example driver I'm sure falls into the "napi_schedule(); return;"
category. I just want to make sure "the right thing to do" is
accessible for people writing new drivers.

* Re: [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
  2021-05-14 19:44       ` Alison Chaiken
@ 2021-05-14 21:53         ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2021-05-14 21:53 UTC (permalink / raw)
  To: Alison Chaiken, Jakub Kicinski
  Cc: Sebastian Andrzej Siewior, netdev, Juri Lelli, linux-rt-users,
	Steven Rostedt, LKML, sassmann, David S. Miller, stable-rt

On Fri, May 14 2021 at 12:44, Alison Chaiken wrote:
> On Fri, May 14, 2021 at 11:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Another thing while I have your attention - ____napi_schedule() does
>> __raise_softirq_irqoff() which AFAIU does not wake the ksoftirqd thread.
>> On non-RT we get occasional NOHZ warnings when drivers schedule napi
>> from process context, but on RT this is even more of a problem, right?
>> ksoftirqd won't run until something else actually wakes it up?
>
> By "NOHZ warnings," do you mean "NOHZ: local_softirq_pending"? We see
> that message about once a week with 4.19. Presumably any failure of
> ____napi_schedule() to wake ksoftirqd could only cause problems for the
> NET_RX softirq, so if the pending softirq is different, the cause lies
> elsewhere.

If you read the above carefully you might notice that this _IS_ about
____napi_schedule() being invoked from task context which raises NET_RX
and then results in pending 08! which is NET_RX.

Thanks,

        tglx
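
For readers decoding such a warning by hand: the value printed is a
bitmask indexed by softirq number, so 08 is bit 3. A minimal userspace
sketch of that decoding follows. The function name
decode_softirq_pending() and the name table are illustrative
assumptions; only the ordering of the entries follows the kernel's
softirq enum (HI, TIMER, NET_TX, NET_RX, BLOCK, IRQ_POLL, TASKLET,
SCHED, HRTIMER, RCU).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Softirq names in kernel enum order; the index is the bit number. */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

#define NR_SOFTIRQ_NAMES (sizeof(softirq_names) / sizeof(softirq_names[0]))

/*
 * Hypothetical helper: writes the space-separated names of the bits set
 * in mask into buf and returns how many names were written completely.
 * Stops at the first name that does not fit into buf.
 */
static int decode_softirq_pending(uint32_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (size_t nr = 0; nr < NR_SOFTIRQ_NAMES; nr++) {
		if (!(mask & (UINT32_C(1) << nr)))
			continue;

		int n = snprintf(buf + used, len - used, "%s%s",
				 count ? " " : "", softirq_names[nr]);
		if (n < 0 || (size_t)n >= len - used)
			break;

		used += (size_t)n;
		count++;
	}
	return count;
}
```

Passing 0x08 fills the buffer with "NET_RX", matching the explanation
above. A mask with other bits set would point at a different softirq
and therefore at a different cause.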



end of thread, other threads:[~2021-05-14 21:54 UTC | newest]

Thread overview: 15+ messages
2021-05-11  6:09 [RT] Question about i40e threaded irq Juri Lelli
2021-05-11  6:24 ` Stefan Assmann
2021-05-11  7:46 ` Thomas Gleixner
2021-05-12 21:43 ` [PATCH net-next] net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT Sebastian Andrzej Siewior
2021-05-12 22:28   ` Thomas Gleixner
2021-05-13  0:50     ` Steven Rostedt
2021-05-13 16:43       ` Steven Rostedt
2021-05-14 12:11         ` Thomas Gleixner
2021-05-14 18:56     ` Jakub Kicinski
2021-05-14 19:44       ` Alison Chaiken
2021-05-14 21:53         ` Thomas Gleixner
2021-05-14 20:16       ` Thomas Gleixner
2021-05-14 20:38         ` Jakub Kicinski
2021-05-13  5:12   ` Juri Lelli
2021-05-13 20:20   ` patchwork-bot+netdevbpf
