netdev.vger.kernel.org archive mirror
* [PATCH RFC] r8152: Ensure that napi_schedule() is handled
@ 2021-05-14 10:17 Thomas Gleixner
  2021-05-14 19:38 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2021-05-14 10:17 UTC (permalink / raw)
  To: linux-usb, netdev
  Cc: Michal Svec, David S. Miller, Jakub Kicinski, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 14 May 2021 11:46:08 +0200

The driver invokes napi_schedule() in several places from task
context. napi_schedule() raises the NET_RX softirq bit and relies on the
calling context to ensure that the softirq is handled. That's usually on
return from interrupt or on the outermost local_bh_enable().

But that's not the case here, which causes the soft interrupt handling to be
delayed to the next interrupt or local_bh_enable(). If the task from whose
context this is invoked is the last runnable task on a CPU and the CPU goes
idle before an interrupt arrives or a local_bh_disable/enable() pair
handles the pending soft interrupt, then the NOHZ idle code emits the
following warning:

  NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Prevent this by wrapping the napi_schedule() invocation from task context
into a local_bh_disable/enable() pair.

Reported-by: Michal Svec <msvec@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

Note: That's not the first incident of this. Shouldn't napi_schedule()
      have a debug check (under lockdep) to catch this?

---
 drivers/net/usb/r8152.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1543,6 +1543,17 @@ void write_mii_word(struct net_device *n
 	r8152_mdio_write(tp, reg, val);
 }
 
+/*
+ * Wrapper around napi_schedule() to ensure that the raised network softirq
+ * is actually handled.
+ */
+static void r8152_napi_schedule(struct napi_struct *napi)
+{
+	local_bh_disable();
+	napi_schedule(napi);
+	local_bh_enable();
+}
+
 static int
 r8152_submit_rx(struct r8152 *tp, struct rx_agg *agg, gfp_t mem_flags);
 
@@ -1753,7 +1764,7 @@ static void read_bulk_callback(struct ur
 		spin_lock_irqsave(&tp->rx_lock, flags);
 		list_add_tail(&agg->list, &tp->rx_done);
 		spin_unlock_irqrestore(&tp->rx_lock, flags);
-		napi_schedule(&tp->napi);
+		r8152_napi_schedule(&tp->napi);
 		return;
 	case -ESHUTDOWN:
 		rtl_set_unplug(tp);
@@ -2640,7 +2651,7 @@ int r8152_submit_rx(struct r8152 *tp, st
 		netif_err(tp, rx_err, tp->netdev,
 			  "Couldn't submit rx[%p], ret = %d\n", agg, ret);
 
-		napi_schedule(&tp->napi);
+		r8152_napi_schedule(&tp->napi);
 	}
 
 	return ret;
@@ -8202,7 +8213,7 @@ static int rtl8152_post_reset(struct usb
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
 	if (!list_empty(&tp->rx_done))
-		napi_schedule(&tp->napi);
+		r8152_napi_schedule(&tp->napi);
 
 	return 0;
 }
@@ -8256,7 +8267,7 @@ static int rtl8152_runtime_resume(struct
 		smp_mb__after_atomic();
 
 		if (!list_empty(&tp->rx_done))
-			napi_schedule(&tp->napi);
+			r8152_napi_schedule(&tp->napi);
 
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	} else {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 10:17 [PATCH RFC] r8152: Ensure that napi_schedule() is handled Thomas Gleixner
@ 2021-05-14 19:38 ` Jakub Kicinski
  2021-05-14 20:25   ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2021-05-14 19:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

On Fri, 14 May 2021 12:17:19 +0200 Thomas Gleixner wrote:
> The driver invokes napi_schedule() in several places from task
> context. napi_schedule() raises the NET_RX softirq bit and relies on the
> calling context to ensure that the softirq is handled. That's usually on
> return from interrupt or on the outermost local_bh_enable().
> 
> But that's not the case here, which causes the soft interrupt handling to be
> delayed to the next interrupt or local_bh_enable(). If the task from whose
> context this is invoked is the last runnable task on a CPU and the CPU goes
> idle before an interrupt arrives or a local_bh_disable/enable() pair
> handles the pending soft interrupt, then the NOHZ idle code emits the
> following warning:
> 
>   NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> 
> Prevent this by wrapping the napi_schedule() invocation from task context
> into a local_bh_disable/enable() pair.

I should have read through my inbox before replying :)

I'd go for switching to raise_softirq_irqoff() in ____napi_schedule()...
why not?
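
For concreteness, the suggested variant might look like this (a simplified
sketch against the net/core/dev.c of that time; the threaded-NAPI handling
is elided):

```c
/* Sketch only: let ____napi_schedule() use raise_softirq_irqoff(), which
 * additionally wakes ksoftirqd when invoked outside hard/soft interrupt
 * context, so the raised softirq is always handled.
 */
static inline void ____napi_schedule(struct softnet_data *sd,
				     struct napi_struct *napi)
{
	list_add_tail(&napi->poll_list, &sd->poll_list);
	raise_softirq_irqoff(NET_RX_SOFTIRQ);	/* was __raise_softirq_irqoff() */
}
```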


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 19:38 ` Jakub Kicinski
@ 2021-05-14 20:25   ` Thomas Gleixner
  2021-05-14 20:46     ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2021-05-14 20:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

On Fri, May 14 2021 at 12:38, Jakub Kicinski wrote:

> On Fri, 14 May 2021 12:17:19 +0200 Thomas Gleixner wrote:
>> The driver invokes napi_schedule() in several places from task
>> context. napi_schedule() raises the NET_RX softirq bit and relies on the
>> calling context to ensure that the softirq is handled. That's usually on
>> return from interrupt or on the outermost local_bh_enable().
>> 
>> But that's not the case here, which causes the soft interrupt handling to be
>> delayed to the next interrupt or local_bh_enable(). If the task from whose
>> context this is invoked is the last runnable task on a CPU and the CPU goes
>> idle before an interrupt arrives or a local_bh_disable/enable() pair
>> handles the pending soft interrupt, then the NOHZ idle code emits the
>> following warning:
>> 
>>   NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
>> 
>> Prevent this by wrapping the napi_schedule() invocation from task context
>> into a local_bh_disable/enable() pair.
>
> I should have read through my inbox before replying :)
>
> I'd go for switching to raise_softirq_irqoff() in ____napi_schedule()...
> why not?

Except that some instruction cycle beancounters might complain about
the extra conditional for the sane cases.

But yes, I'm fine with that as well. That's why this patch is marked RFC :)

Thanks,

        tglx



* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 20:25   ` Thomas Gleixner
@ 2021-05-14 20:46     ` Jakub Kicinski
  2021-05-14 21:10       ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2021-05-14 20:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

On Fri, 14 May 2021 22:25:50 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 12:38, Jakub Kicinski wrote:
> 
> > On Fri, 14 May 2021 12:17:19 +0200 Thomas Gleixner wrote:  
> >> The driver invokes napi_schedule() in several places from task
> >> context. napi_schedule() raises the NET_RX softirq bit and relies on the
> >> calling context to ensure that the softirq is handled. That's usually on
> >> return from interrupt or on the outermost local_bh_enable().
> >> 
> >> But that's not the case here, which causes the soft interrupt handling to be
> >> delayed to the next interrupt or local_bh_enable(). If the task from whose
> >> context this is invoked is the last runnable task on a CPU and the CPU goes
> >> idle before an interrupt arrives or a local_bh_disable/enable() pair
> >> handles the pending soft interrupt, then the NOHZ idle code emits the
> >> following warning:
> >> 
> >>   NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> >> 
> >> Prevent this by wrapping the napi_schedule() invocation from task context
> >> into a local_bh_disable/enable() pair.  
> >
> > I should have read through my inbox before replying :)
> >
> > I'd go for switching to raise_softirq_irqoff() in ____napi_schedule()...
> > why not?  
> 
> Except that some instruction cycle beancounters might complain about
> the extra conditional for the sane cases.
> 
> But yes, I'm fine with that as well. That's why this patch is marked RFC :)

When we're in the right context (irq/bh disabled etc.) the cost is just
read of preempt_count() and jump, right? And presumably preempt_count()
is in the cache already, because those sections aren't very long. Let me
make this change locally and see if it is in any way perceivable.
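
For reference, the conditional under discussion is the in_interrupt() check
in raise_softirq_irqoff() (kernel/softirq.c, abridged):

```c
/* kernel/softirq.c, abridged: the "extra conditional" is in_interrupt(),
 * i.e. a read of preempt_count(). Outside hard/soft interrupt context
 * ksoftirqd is woken, so the raised softirq is guaranteed to run.
 */
inline void raise_softirq_irqoff(unsigned int nr)
{
	__raise_softirq_irqoff(nr);

	/*
	 * If we're in an interrupt or softirq, we're done
	 * (this also catches softirq-disabled code). We will
	 * actually run the softirq once we return from
	 * the irq or softirq.
	 */
	if (!in_interrupt())
		wakeup_softirqd();
}
```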

Obviously if anyone sees a way to solve the problem without much
ifdefinery and force_irqthreads checks that'd be great - I don't.
I'd rather avoid pushing this kind of stuff out to the drivers.


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 20:46     ` Jakub Kicinski
@ 2021-05-14 21:10       ` Thomas Gleixner
  2021-05-14 21:41         ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2021-05-14 21:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

On Fri, May 14 2021 at 13:46, Jakub Kicinski wrote:
> On Fri, 14 May 2021 22:25:50 +0200 Thomas Gleixner wrote:
>> Except that some instruction cycle beancounters might complain about
>> the extra conditional for the sane cases.
>> 
>> But yes, I'm fine with that as well. That's why this patch is marked RFC :)
>
> When we're in the right context (irq/bh disabled etc.) the cost is just
> read of preempt_count() and jump, right? And presumably preempt_count()
> is in the cache already, because those sections aren't very long. Let me
> make this change locally and see if it is in any way perceivable.

Right. Just wanted to mention it :)

> Obviously if anyone sees a way to solve the problem without much
> ifdefinery and force_irqthreads checks that'd be great - I don't.

This is not related to force_irqthreads at all. This very driver invokes
it from plain thread context.

> I'd rather avoid pushing this kind of stuff out to the drivers.

You could have napi_schedule_intask() or something like that which would
do the local_bh_disable()/enable() dance around the invocation of
napi_schedule(). That would also document it clearly in the drivers. A
quick grep shows a bunch of instances which could be replaced:

drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c-5704-		local_bh_disable();
drivers/net/ethernet/mellanox/mlx4/en_netdev.c-1830-		local_bh_disable();
drivers/net/usb/r8152.c-1552-	local_bh_disable();
drivers/net/virtio_net.c-1355-	local_bh_disable();
drivers/net/wireless/intel/iwlwifi/pcie/rx.c-1650-	local_bh_disable();
drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2015-		local_bh_disable();
drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2225-		local_bh_disable();
drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2235-		local_bh_disable();
drivers/s390/net/qeth_core_main.c-3515-	local_bh_disable();
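
Such a helper could be as small as this (hypothetical name and placement,
mirroring the wrapper from the patch above):

```c
/* Hypothetical helper, as proposed: schedule NAPI from task context.
 * The closing local_bh_enable() runs the raised NET_RX softirq, so the
 * caller does not depend on a later interrupt or bh section.
 */
static inline void napi_schedule_intask(struct napi_struct *napi)
{
	local_bh_disable();
	napi_schedule(napi);
	local_bh_enable();
}
```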

Thanks,

        tglx


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 21:10       ` Thomas Gleixner
@ 2021-05-14 21:41         ` Jakub Kicinski
  2021-05-14 23:23           ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2021-05-14 21:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov

On Fri, 14 May 2021 23:10:43 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 13:46, Jakub Kicinski wrote:
> > On Fri, 14 May 2021 22:25:50 +0200 Thomas Gleixner wrote:  
> >> Except that some instruction cycle beancounters might complain about
> >> the extra conditional for the sane cases.
> >> 
> >> But yes, I'm fine with that as well. That's why this patch is marked RFC :)  
> >
> > When we're in the right context (irq/bh disabled etc.) the cost is just
> > read of preempt_count() and jump, right? And presumably preempt_count()
> > is in the cache already, because those sections aren't very long. Let me
> > make this change locally and see if it is in any way perceivable.  
> 
> Right. Just wanted to mention it :)
> 
> > Obviously if anyone sees a way to solve the problem without much
> > ifdefinery and force_irqthreads checks that'd be great - I don't.  
> 
> This is not related to force_irqthreads at all. This very driver invokes
> it from plain thread context.

I see, but a driver calling __napi_schedule_irqoff() from its IRQ
handler _would_ be an issue, right? Or do irq threads trigger softirq
processing on exit?

> > I'd rather avoid pushing this kind of stuff out to the drivers.  
> 
> You could have napi_schedule_intask() or something like that which would
> do the local_bh_disable()/enable() dance around the invocation of
> napi_schedule(). That would also document it clearly in the drivers. A
> quick grep shows a bunch of instances which could be replaced:
> 
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c-5704-		local_bh_disable();
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c-1830-		local_bh_disable();
> drivers/net/usb/r8152.c-1552-	local_bh_disable();
> drivers/net/virtio_net.c-1355-	local_bh_disable();
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-1650-	local_bh_disable();
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2015-		local_bh_disable();
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2225-		local_bh_disable();
> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2235-		local_bh_disable();
> drivers/s390/net/qeth_core_main.c-3515-	local_bh_disable();

Very well aware, I've just sent a patch for mlx5 last week :)

My initial reaction was the same as yours - we should add a lockdep
check and napi_schedule_intask(). But then I started wondering
if it's all for nothing on RT or with force_irqthreads, and therefore
we should just eat the extra check.


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 21:41         ` Jakub Kicinski
@ 2021-05-14 23:23           ` Thomas Gleixner
  2021-05-14 23:36             ` Jakub Kicinski
  2021-05-15 13:09             ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Gleixner @ 2021-05-14 23:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov, LKML, Peter Zijlstra,
	Boqun Feng

On Fri, May 14 2021 at 14:41, Jakub Kicinski wrote:
> On Fri, 14 May 2021 23:10:43 +0200 Thomas Gleixner wrote:
>> On Fri, May 14 2021 at 13:46, Jakub Kicinski wrote:
>> > On Fri, 14 May 2021 22:25:50 +0200 Thomas Gleixner wrote:  
>> >> Except that some instruction cycle beancounters might complain about
>> >> the extra conditional for the sane cases.
>> >> 
>> >> But yes, I'm fine with that as well. That's why this patch is marked RFC :)  
>> >
>> > When we're in the right context (irq/bh disabled etc.) the cost is just
>> > read of preempt_count() and jump, right? And presumably preempt_count()
>> > is in the cache already, because those sections aren't very long. Let me
>> > make this change locally and see if it is in any way perceivable.  
>> 
>> Right. Just wanted to mention it :)
>> 
>> > Obviously if anyone sees a way to solve the problem without much
>> > ifdefinery and force_irqthreads checks that'd be great - I don't.  
>> 
>> This is not related to force_irqthreads at all. This very driver invokes
>> it from plain thread context.
>
> I see, but a driver calling __napi_schedule_irqoff() from its IRQ
> handler _would_ be an issue, right? Or do irq threads trigger softirq
> processing on exit?

Yes, they do. See irq_forced_thread_fn(). It has a local_bh_disable() /
local_bh_enable() pair around the invocation to ensure that.
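
Abridged from kernel/irq/manage.c (the PREEMPT_RT and local irq-disable
details elided), the pair in question:

```c
/* kernel/irq/manage.c, abridged sketch: forced-threaded handlers run
 * inside a bh-disabled section, so any softirq the handler raises is
 * processed on the closing local_bh_enable().
 */
static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
					struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();
	ret = action->thread_fn(action->irq, action->dev_id);
	if (ret == IRQ_HANDLED)
		atomic_inc(&desc->threads_handled);
	irq_finalize_oneshot(desc, action);
	local_bh_enable();
	return ret;
}
```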

>> > I'd rather avoid pushing this kind of stuff out to the drivers.  
>> 
>> You could have napi_schedule_intask() or something like that which would
>> do the local_bh_disable()/enable() dance around the invocation of
>> napi_schedule(). That would also document it clearly in the drivers. A
>> quick grep shows a bunch of instances which could be replaced:
>> 
>> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c-5704-		local_bh_disable();
>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c-1830-		local_bh_disable();
>> drivers/net/usb/r8152.c-1552-	local_bh_disable();
>> drivers/net/virtio_net.c-1355-	local_bh_disable();
>> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-1650-	local_bh_disable();
>> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2015-		local_bh_disable();
>> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2225-		local_bh_disable();
>> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2235-		local_bh_disable();
>> drivers/s390/net/qeth_core_main.c-3515-	local_bh_disable();
>
> Very well aware, I've just sent a patch for mlx5 last week :)
>
> My initial reaction was the same as yours - we should add a lockdep
> check and napi_schedule_intask(). But then I started wondering
> if it's all for nothing on RT or with force_irqthreads, and therefore
> we should just eat the extra check.

We can make that work, but sure, I'm not going to argue when you decide to
just go for raise_softirq_irqoff().

I just hacked that check up, which is actually useful beyond NAPI. It's
straightforward except for that flush_smp_call_function_from_idle()
oddball, which immediately triggered that assert because block mq uses
__raise_softirq_irqoff() in an smp function call...

See below. Peter might have opinions though :)

Thanks,

        tglx
---
 include/linux/lockdep.h |   21 +++++++++++++++++++++
 include/linux/sched.h   |    1 +
 kernel/smp.c            |    2 ++
 kernel/softirq.c        |   18 ++++++++++++++----
 4 files changed, 38 insertions(+), 4 deletions(-)

--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -636,6 +636,23 @@ do {									\
 		     (!in_softirq() || in_irq() || in_nmi()));		\
 } while (0)
 
+#define lockdep_set_softirq_raise_safe()				\
+do {									\
+	current->softirq_raise_safe = 1;				\
+} while (0)
+
+#define lockdep_clear_softirq_raise_safe()				\
+do {									\
+	current->softirq_raise_safe = 0;				\
+} while (0)
+
+#define lockdep_assert_softirq_raise_ok()				\
+do {									\
+	WARN_ON_ONCE(__lockdep_enabled &&				\
+		     !current->softirq_raise_safe &&			\
+		     !(softirq_count() | hardirq_count()));		\
+} while (0)
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -648,6 +665,10 @@ do {									\
 # define lockdep_assert_preemption_enabled() do { } while (0)
 # define lockdep_assert_preemption_disabled() do { } while (0)
 # define lockdep_assert_in_softirq() do { } while (0)
+
+# define lockdep_set_softirq_raise_safe()	do { } while (0)
+# define lockdep_clear_softirq_raise_safe()	do { } while (0)
+# define lockdep_assert_softirq_raise_ok()	do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1058,6 +1058,7 @@ struct task_struct {
 	u64				curr_chain_key;
 	int				lockdep_depth;
 	unsigned int			lockdep_recursion;
+	unsigned int			softirq_raise_safe;
 	struct held_lock		held_locks[MAX_LOCK_DEPTH];
 #endif
 
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -691,7 +691,9 @@ void flush_smp_call_function_from_idle(v
 	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
 		      smp_processor_id(), CFD_SEQ_IDLE);
 	local_irq_save(flags);
+	lockdep_set_softirq_raise_safe();
 	flush_smp_call_function_queue(true);
+	lockdep_clear_softirq_raise_safe();
 	if (local_softirq_pending())
 		do_softirq();
 
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -664,12 +664,19 @@ void irq_exit(void)
 	lockdep_hardirq_exit();
 }
 
+static inline void ____raise_softirq_irqoff(unsigned int nr)
+{
+	lockdep_assert_irqs_disabled();
+	trace_softirq_raise(nr);
+	or_softirq_pending(1UL << nr);
+}
+
 /*
  * This function must run with irqs disabled!
  */
 inline void raise_softirq_irqoff(unsigned int nr)
 {
-	__raise_softirq_irqoff(nr);
+	____raise_softirq_irqoff(nr);
 
 	/*
 	 * If we're in an interrupt or softirq, we're done
@@ -693,11 +700,14 @@ void raise_softirq(unsigned int nr)
 	local_irq_restore(flags);
 }
 
+/*
+ * Must be invoked with interrupts disabled and either from softirq serving
+ * context or with local bottom halves disabled.
+ */
 void __raise_softirq_irqoff(unsigned int nr)
 {
-	lockdep_assert_irqs_disabled();
-	trace_softirq_raise(nr);
-	or_softirq_pending(1UL << nr);
+	lockdep_assert_softirq_raise_ok();
+	____raise_softirq_irqoff(nr);
 }
 
 void open_softirq(int nr, void (*action)(struct softirq_action *))



* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 23:23           ` Thomas Gleixner
@ 2021-05-14 23:36             ` Jakub Kicinski
  2021-05-15 13:09             ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2021-05-14 23:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-usb, netdev, Michal Svec, David S. Miller, Hayes Wang,
	Thierry Reding, Lee Jones, Borislav Petkov, LKML, Peter Zijlstra,
	Boqun Feng

On Sat, 15 May 2021 01:23:02 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 14:41, Jakub Kicinski wrote:
> >> This is not related to force_irqthreads at all. This very driver invokes
> >> it from plain thread context.  
> >
> > I see, but a driver calling __napi_schedule_irqoff() from its IRQ
> > handler _would_ be an issue, right? Or do irq threads trigger softirq
> > processing on exit?  
> 
> Yes, they do. See irq_forced_thread_fn(). It has a local_bh_disable() /
> local_bh_ enable() pair around the invocation to ensure that.

Ah, excellent!

> >> You could have napi_schedule_intask() or something like that which would
> >> do the local_bh_disable()/enable() dance around the invocation of
> >> napi_schedule(). That would also document it clearly in the drivers. A
> >> quick grep shows a bunch of instances which could be replaced:
> >> 
> >> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c-5704-		local_bh_disable();
> >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c-1830-		local_bh_disable();
> >> drivers/net/usb/r8152.c-1552-	local_bh_disable();
> >> drivers/net/virtio_net.c-1355-	local_bh_disable();
> >> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-1650-	local_bh_disable();
> >> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2015-		local_bh_disable();
> >> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2225-		local_bh_disable();
> >> drivers/net/wireless/intel/iwlwifi/pcie/rx.c-2235-		local_bh_disable();
> >> drivers/s390/net/qeth_core_main.c-3515-	local_bh_disable();  
> >
> > Very well aware, I've just sent a patch for mlx5 last week :)
> >
> > My initial reaction was the same as yours - we should add a lockdep
> > check and napi_schedule_intask(). But then I started wondering
> > if it's all for nothing on RT or with force_irqthreads, and therefore
> > we should just eat the extra check.  
> 
> We can make that work, but sure, I'm not going to argue when you decide to
> just go for raise_softirq_irqoff().
> 
> I just hacked that check up, which is actually useful beyond NAPI. It's
> straightforward except for that flush_smp_call_function_from_idle()
> oddball, which immediately triggered that assert because block mq uses
> __raise_softirq_irqoff() in an smp function call...
> 
> See below. Peter might have opinions though :)

Looks good to me. Since my thinking that RT complicates things here was
wrong, I'm perfectly happy with the lockdep + napi_schedule_intask().


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-14 23:23           ` Thomas Gleixner
  2021-05-14 23:36             ` Jakub Kicinski
@ 2021-05-15 13:09             ` Peter Zijlstra
  2021-05-15 19:06               ` Thomas Gleixner
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-05-15 13:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jakub Kicinski, linux-usb, netdev, Michal Svec, David S. Miller,
	Hayes Wang, Thierry Reding, Lee Jones, Borislav Petkov, LKML,
	Boqun Feng

On Sat, May 15, 2021 at 01:23:02AM +0200, Thomas Gleixner wrote:
> We can make that work, but sure, I'm not going to argue when you decide to
> just go for raise_softirq_irqoff().
> 
> I just hacked that check up, which is actually useful beyond NAPI. It's
> straightforward except for that flush_smp_call_function_from_idle()
> oddball, which immediately triggered that assert because block mq uses
> __raise_softirq_irqoff() in an smp function call...
> 
> See below. Peter might have opinions though :)

Yeah, lovely stuff :-)


> +#define lockdep_assert_softirq_raise_ok()				\
> +do {									\
> +	WARN_ON_ONCE(__lockdep_enabled &&				\
> +		     !current->softirq_raise_safe &&			\
> +		     !(softirq_count() | hardirq_count()));		\
> +} while (0)

> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -691,7 +691,9 @@ void flush_smp_call_function_from_idle(v
>  	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
>  		      smp_processor_id(), CFD_SEQ_IDLE);
>  	local_irq_save(flags);
> +	lockdep_set_softirq_raise_safe();
>  	flush_smp_call_function_queue(true);
> +	lockdep_clear_softirq_raise_safe();
>  	if (local_softirq_pending())
>  		do_softirq();

I think it might make more sense to raise hardirq_count() in/for
flush_smp_call_function_queue() callers that aren't already from hardirq
context. That's this site and smpcfd_dying_cpu().

Then we can do away with this new special case.


* Re: [PATCH RFC] r8152: Ensure that napi_schedule() is handled
  2021-05-15 13:09             ` Peter Zijlstra
@ 2021-05-15 19:06               ` Thomas Gleixner
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2021-05-15 19:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jakub Kicinski, linux-usb, netdev, Michal Svec, David S. Miller,
	Hayes Wang, Thierry Reding, Lee Jones, Borislav Petkov, LKML,
	Boqun Feng

On Sat, May 15 2021 at 15:09, Peter Zijlstra wrote:
> On Sat, May 15, 2021 at 01:23:02AM +0200, Thomas Gleixner wrote:
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -691,7 +691,9 @@ void flush_smp_call_function_from_idle(v
>>  	cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
>>  		      smp_processor_id(), CFD_SEQ_IDLE);
>>  	local_irq_save(flags);
>> +	lockdep_set_softirq_raise_safe();
>>  	flush_smp_call_function_queue(true);
>> +	lockdep_clear_softirq_raise_safe();
>>  	if (local_softirq_pending())
>>  		do_softirq();
>
> I think it might make more sense to raise hardirq_count() in/for
> flush_smp_call_function_queue() callers that aren't already from hardirq
> context. That's this site and smpcfd_dying_cpu().
>
> Then we can do away with this new special case.

Right.

Though I just checked smpcfd_dying_cpu(). That one does not run
softirqs after flushing the function queue, and it can't do that because
that's in the CPU dying phase with interrupts disabled, where the CPU is
already half torn down.

Especially as softirq processing enables interrupts, which might cause
even more havoc.

Anyway, how is it safe to run arbitrary functions there after the CPU
removed itself from the online mask? That's daft, to put it mildly.

Thanks,

        tglx
