ath10k.lists.infradead.org archive mirror
* NOHZ tick-stop error with ath10k SDIO
@ 2021-08-18 15:18 Fabio Estevam
  2021-08-18 15:43 ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Fabio Estevam @ 2021-08-18 15:18 UTC (permalink / raw)
  To: Kalle Valo, Paul E . McKenney; +Cc: ath10k, linux-mmc, Ulf Hansson, Marek Vasut

Hi,

When launching the hostapd application on an i.MX7-based board with an
ath10k device connected via SDIO, the following "NOHZ tick-stop error"
messages are seen:

# hostapd /etc/wifi.conf
Configuration file: /etc/wifi.conf
wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
[   63.021149] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
Using interface wlan0 with hwaddr 00:1f:7b:31:04:a0 and ssid "thessid"
[   67.332470] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: interface state COUNTRY_UPDATE->ENABLED
wlan0: AP-ENABLED
[   68.025845] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   69.025973] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   69.607432] cfg80211: failed to load regulatory.db
[   72.026748] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   73.027039] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   74.027159] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   75.027109] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   76.027461] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   77.027391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[   78.027560] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

This happens on all kernel versions from 5.10 to 5.13.

Any ideas on how to fix this problem?

Thanks,

Fabio Estevam

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 15:18 NOHZ tick-stop error with ath10k SDIO Fabio Estevam
@ 2021-08-18 15:43 ` Paul E. McKenney
  2021-08-18 16:29   ` Fabio Estevam
  2021-09-17 16:32   ` Qais Yousef
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-08-18 15:43 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut, qais.yousef

On Wed, Aug 18, 2021 at 12:18:25PM -0300, Fabio Estevam wrote:
> Hi,
> 
> When launching the hostapd application on an i.MX7-based board with an
> ath10k device connected via SDIO, the following "NOHZ tick-stop error"
> messages are seen:
> 
> # hostapd /etc/wifi.conf
> Configuration file: /etc/wifi.conf
> wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
> [   63.021149] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> Using interface wlan0 with hwaddr 00:1f:7b:31:04:a0 and ssid "thessid"
> [   67.332470] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> wlan0: interface state COUNTRY_UPDATE->ENABLED
> wlan0: AP-ENABLED
> [   68.025845] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   69.025973] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   69.607432] cfg80211: failed to load regulatory.db
> [   72.026748] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   73.027039] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   74.027159] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   75.027109] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   76.027461] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   77.027391] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> [   78.027560] NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!
> 
> This happens on all kernel versions from 5.10 to 5.13.
> 
> Any ideas on how to fix this problem?

I believe that you need this commit (and possibly some prerequisites):

47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")

Adding Qais on CC for his thoughts.

							Thanx, Paul

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 15:43 ` Paul E. McKenney
@ 2021-08-18 16:29   ` Fabio Estevam
  2021-08-18 17:02     ` Fabio Estevam
  2021-09-17 16:32   ` Qais Yousef
  1 sibling, 1 reply; 14+ messages in thread
From: Fabio Estevam @ 2021-08-18 16:29 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut, qais.yousef

Hi Paul,

On Wed, Aug 18, 2021 at 12:43 PM Paul E. McKenney <paulmck@kernel.org> wrote:

> I believe that you need this commit (and possibly some prerequisites):
>
> 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
>
> Adding Qais on CC for his thoughts.

Thanks for the suggestion, but I am running 5.13.11, which already
contains this commit.

Any extra logs I should capture to help us understand the problem?

Thanks

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 16:29   ` Fabio Estevam
@ 2021-08-18 17:02     ` Fabio Estevam
  2021-08-18 17:56       ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Fabio Estevam @ 2021-08-18 17:02 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut, qais.yousef

Hi Paul,

On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Paul,
>
> On Wed, Aug 18, 2021 at 12:43 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> > I believe that you need this commit (and possibly some prerequisites):
> >
> > 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
> >
> > Adding Qais on CC for his thoughts.
>
> Thanks for the suggestion, but I am running 5.13.11, which already
> contains this commit.
>
> Any extra logs I should capture to help us understand the problem?

In case it helps, I followed your suggestion from:
https://lkml.org/lkml/2020/12/10/676

With the debug patch and suggested command line, I get the following log:
https://pastebin.com/raw/X96zKw7i

Thanks

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 17:02     ` Fabio Estevam
@ 2021-08-18 17:56       ` Paul E. McKenney
  2021-08-19 13:24         ` Fabio Estevam
  2021-09-02 21:51         ` Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-08-18 17:56 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut,
	qais.yousef, frederic, tglx

On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
> Hi Paul,
> 
> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@gmail.com> wrote:
> >
> > Hi Paul,
> >
> > On Wed, Aug 18, 2021 at 12:43 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > > I believe that you need this commit (and possibly some prerequisites):
> > >
> > > 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
> > >
> > > Adding Qais on CC for his thoughts.
> >
> > Thanks for the suggestion, but I am running 5.13.11, which already
> > contains this commit.
> >
> > Any extra logs I should capture to help us understand the problem?
> 
> In case it helps, I followed your suggestion from:
> https://lkml.org/lkml/2020/12/10/676
> 
> With the debug patch and suggested command line, I get the following log:
> https://pastebin.com/raw/X96zKw7i

And it turns out that I am also seeing it in v5.14-rc2, just a lot less
frequently than earlier.  I have seen three instances of handler #02
(NET_TX_SOFTIRQ?) over the past month or so while you are seeing handler
#08 (BLOCK_SOFTIRQ?), in case that makes a difference.

Adding Frederic and Thomas on CC, though I believe Frederic is off
the grid at the moment.

							Thanx, Paul

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 17:56       ` Paul E. McKenney
@ 2021-08-19 13:24         ` Fabio Estevam
  2021-09-02 21:51         ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Fabio Estevam @ 2021-08-19 13:24 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut,
	qais.yousef, Frederic Weisbecker, Thomas Gleixner

Hi Paul,

On Wed, Aug 18, 2021 at 2:56 PM Paul E. McKenney <paulmck@kernel.org> wrote:

> And it turns out that I am also seeing it in v5.14-rc2, just a lot less
> frequently than earlier.  I have seen three instances of handler #02
> (NET_TX_SOFTIRQ?) over the past month or so while you are seeing handler
> #08 (BLOCK_SOFTIRQ?), in case that makes a difference.
>
> Adding Frederic and Thomas on CC, though I believe Frederic is off
> the grid at the moment.

In my case, these errors are very easy to reproduce, so if you need me
to collect any debug info, just let me know.

Thanks!

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 17:56       ` Paul E. McKenney
  2021-08-19 13:24         ` Fabio Estevam
@ 2021-09-02 21:51         ` Thomas Gleixner
  2021-09-02 22:09           ` Paul E. McKenney
  2021-09-03  8:07           ` Thomas Gleixner
  1 sibling, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2021-09-02 21:51 UTC (permalink / raw)
  To: paulmck, Fabio Estevam
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut,
	qais.yousef, frederic

Paul,

On Wed, Aug 18 2021 at 10:56, Paul E. McKenney wrote:
> On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
>> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@gmail.com> wrote:
>> > On Wed, Aug 18, 2021 at 12:43 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>> >
>> > > I believe that you need this commit (and possibly some prerequisites):
>> > >
>> > > 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
>> > >
>> > > Adding Qais on CC for his thoughts.
>> >
>> > Thanks for the suggestion, but I am running 5.13.11, which already
>> > contains this commit.
>> >
>> > Any extra logs I should capture to help us understand the problem?
>> 
>> In case it helps, I followed your suggestion from:
>> https://lkml.org/lkml/2020/12/10/676
>> 
>> With the debug patch and suggested command line, I get the following log:
>> https://pastebin.com/raw/X96zKw7i
>
> And it turns out that I am also seeing it in v5.14-rc2, just a lot less
> frequently than earlier.  I have seen three instances of handler #02
> (NET_TX_SOFTIRQ?) over the past month or so while you are seeing handler
> #08 (BLOCK_SOFTIRQ?), in case that makes a difference.

Huch? #02 is TIMER_SOFTIRQ and #08 is NET_TX_SOFTIRQ.

And looking at that ftrace output in the pastebin there is nothing which
raises NET_TX_SOFTIRQ but then the warning claims it is pending.

This does not make any sense at all.

Thanks,

        tglx



* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-02 21:51         ` Thomas Gleixner
@ 2021-09-02 22:09           ` Paul E. McKenney
  2021-09-03  8:07           ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-09-02 22:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fabio Estevam, Kalle Valo, ath10k, linux-mmc, Ulf Hansson,
	Marek Vasut, qais.yousef, frederic

On Thu, Sep 02, 2021 at 11:51:15PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Wed, Aug 18 2021 at 10:56, Paul E. McKenney wrote:
> > On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
> >> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@gmail.com> wrote:
> >> > On Wed, Aug 18, 2021 at 12:43 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >> >
> >> > > I believe that you need this commit (and possibly some prerequisites):
> >> > >
> >> > > 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
> >> > >
> >> > > Adding Qais on CC for his thoughts.
> >> >
> >> > Thanks for the suggestion, but I am running 5.13.11, which already
> >> > contains this commit.
> >> >
> >> > Any extra logs I should capture to help us understand the problem?
> >> 
> >> In case it helps, I followed your suggestion from:
> >> https://lkml.org/lkml/2020/12/10/676
> >> 
> >> With the debug patch and suggested command line, I get the following log:
> >> https://pastebin.com/raw/X96zKw7i
> >
> > And it turns out that I am also seeing it in v5.14-rc2, just a lot less
> > frequently than earlier.  I have seen three instances of handler #02
> > (NET_TX_SOFTIRQ?) over the past month or so while you are seeing handler
> > #08 (BLOCK_SOFTIRQ?), in case that makes a difference.
> 
> Huch? #02 is TIMER_SOFTIRQ and #08 is NET_TX_SOFTIRQ.

Idiot here was forgetting that the #02 represents bit 1 (as you say,
TIMER_SOFTIRQ) rather than numeral 2.  Ditto for the #08.  :-/
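
For reference, the value after "handler #" is printed with %02x from
local_softirq_pending(), so it is a hexadecimal bitmask with one bit per
softirq vector. The following is only a minimal sketch of decoding it.
It assumes the vector order of the kernel's softirq enum (HI, TIMER,
NET_TX, NET_RX, ...), and decode_softirq_mask() is a hypothetical
user-space helper, not a kernel API.

```c
#include <stdio.h>

/* Softirq vector names in the order of the kernel's softirq enum.
 * The order is an assumption of this sketch; it is not read from a
 * running kernel. */
static const char *const softirq_names[] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

#define MODEL_NR_SOFTIRQS (sizeof(softirq_names) / sizeof(softirq_names[0]))

/* Hypothetical helper: write the names of all vectors whose bit is set in
 * `pending` into buf, separated by '|', and return buf. */
static char *decode_softirq_mask(unsigned int pending, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (unsigned int bit = 0; bit < MODEL_NR_SOFTIRQS; bit++) {
        int n;

        if (!(pending & (1u << bit)))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", softirq_names[bit]);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* output truncated, stop here */
        used += (size_t)n;
    }
    return buf;
}
```

Under that assumed order, 0x02 decodes to TIMER (bit 1) and 0x08 decodes to
NET_RX (bit 3), which agrees with the "vec=3 [action=NET_RX]" event in the
trace captured later in this thread.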

							Thanx, Paul

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-02 21:51         ` Thomas Gleixner
  2021-09-02 22:09           ` Paul E. McKenney
@ 2021-09-03  8:07           ` Thomas Gleixner
  2021-09-04 21:10             ` Fabio Estevam
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2021-09-03  8:07 UTC (permalink / raw)
  To: paulmck, Fabio Estevam
  Cc: Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut,
	qais.yousef, frederic

Fabio,

On Thu, Sep 02 2021 at 23:51, Thomas Gleixner wrote:
> On Wed, Aug 18 2021 at 10:56, Paul E. McKenney wrote:
>> On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
>>> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@gmail.com> wrote:
>>>
>>> With the debug patch and suggested command line, I get the following log:
>>> https://pastebin.com/raw/X96zKw7i
>
> And looking at that ftrace output in the pastebin there is nothing which
> raises NET_TX_SOFTIRQ but then the warning claims it is pending.
>
> This does not make any sense at all.

Looked once more at the trace output. It seems to be incomplete. The
last recording of softirq raise was at 379568us ~= 0.38s post boot, but
the splat comes about 20 seconds post boot. Did your kernel trigger a
WARN_ON before that splat? If so, that might have disabled tracing.

As you are triggering this manually by invoking hostapd and the machine
should be still functional afterwards, can you please replace Paul's
debug patch with the one below? Please remove the command line option
and do the following:

# echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_raise/enable
# echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_entry/enable
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# hostapd ...

Once the warning has triggered, do:

# cat /sys/kernel/debug/tracing/trace >trace.txt

That should give us the full trace data and hopefully a better
understanding of the problem.
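
Once trace.txt has been captured, the softirq_raise events can be picked
out mechanically. This is a rough sketch only: it assumes the usual ftrace
text format for this event ("softirq_raise: vec=N [action=NAME]"), and
parse_softirq_raise() is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of ftrace text output for the irq:softirq_raise event, e.g.
 *   "kworker/u4:2-70  [000] d..1  87.940929: softirq_raise: vec=3 [action=NET_RX]"
 * On a match, store the vector number and action name and return 1;
 * return 0 for any other line. Hypothetical helper for post-processing
 * trace.txt, not part of any kernel tool. */
static int parse_softirq_raise(const char *line, unsigned int *vec,
                               char action[16])
{
    const char *event = strstr(line, "softirq_raise: vec=");

    if (!event)
        return 0;

    /* %15[^]] reads up to 15 characters that are not ']' */
    return sscanf(event, "softirq_raise: vec=%u [action=%15[^]]]",
                  vec, action) == 2;
}
```

Feeding each line of trace.txt through such a helper and keeping the last
match before the warning shows which task raised the vector that was left
pending.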

Thanks,

        tglx
---
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb1..269f804090ef 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1015,6 +1015,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 
 		if (ratelimit < 10 && !local_bh_blocked() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
+			tracing_off();
 			pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
 				(unsigned int) local_softirq_pending());
 			ratelimit++;

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-03  8:07           ` Thomas Gleixner
@ 2021-09-04 21:10             ` Fabio Estevam
  2021-09-05 13:00               ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Fabio Estevam @ 2021-09-04 21:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E . McKenney, Kalle Valo, ath10k, linux-mmc, Ulf Hansson,
	Marek Vasut, qais.yousef, Frederic Weisbecker

Hi Thomas,

Thanks for your response.

On Fri, Sep 3, 2021 at 5:07 AM Thomas Gleixner <tglx@linutronix.de> wrote:

> Looked once more at the trace output. It seems to be incomplete. The
> last recording of softirq raise was at 379568us ~= 0.38s post boot, but
> the splat comes about 20 seconds post boot. Did your kernel trigger a
> WARN_ON before that splat? If so, that might have disabled tracing.

You are right. The WARN_ON only happens after hostapd runs, which is at a
much later stage.

> As you are triggering this manually by invoking hostapd and the machine
> should be still functional afterwards, can you please replace Paul's
> debug patch with the one below? Please remove the command line option
> and do the following:
>
> # echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_raise/enable
> # echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_entry/enable
> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
> # hostapd ...
>
> Once the warning has triggered, do:
>
> # cat /sys/kernel/debug/tracing/trace >trace.txt
>
> That should give us the full trace data and hopefully a better
> understanding of the problem.

I did as suggested and here is trace.txt:
https://pastebin.com/VUfLRJ8a

Also, while investigating this problem I saw a commit that fixed a
similar issue:
e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()").

I then tried the same approach on the ath10k sdio driver:

diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index b746052737e0..eb705214f3f0 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -1363,8 +1363,11 @@ static void ath10k_rx_indication_async_work(struct work_struct *work)
         ep->ep_ops.ep_rx_complete(ar, skb);
     }

-    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags))
+    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags)) {
+        local_bh_disable();
         napi_schedule(&ar->napi);
+        local_bh_enable();
+    }
 }

and no longer get the "NOHZ tick-stop error: Non-RCU local softirq work is
pending, handler #08!!!" error messages after launching hostapd.

Is this a proper fix?

Thanks,

Fabio Estevam

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-04 21:10             ` Fabio Estevam
@ 2021-09-05 13:00               ` Thomas Gleixner
  2021-09-05 13:07                 ` Fabio Estevam
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2021-09-05 13:00 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Paul E . McKenney, Kalle Valo, ath10k, linux-mmc, Ulf Hansson,
	Marek Vasut, qais.yousef, Frederic Weisbecker

Fabio,

On Sat, Sep 04 2021 at 18:10, Fabio Estevam wrote:
> On Fri, Sep 3, 2021 at 5:07 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> I did as suggested and here is trace.txt:
> https://pastebin.com/VUfLRJ8a

Lacks a stack trace, but yes this one is the culprit:

kworker/u4:2-70      [000] d..1    87.940929: softirq_raise: vec=3 [action=NET_RX]

It has only interrupts and preemption disabled, and it's in task
context. So if no interrupt is raised and no local_bh_disable() /
local_bh_enable() pair is invoked before the CPU goes idle, nothing will
handle the softirq and the raised bit stays pending, which makes the NOHZ
idle code complain.
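
The mechanism can be illustrated with a toy user-space model. This is only
a sketch under simplifying assumptions, not kernel code: every model_*
name is hypothetical, the real idle check masks the pending bits with
SOFTIRQ_STOP_IDLE_MASK, and the real kernel has further ways of running
softirqs that the model ignores.

```c
#include <stdbool.h>

/* NET_RX is vector 3 in the kernel's softirq enum order (assumed here). */
enum { MODEL_NET_RX = 3 };

/* Toy per-CPU state; a user-space illustration, not kernel code. */
struct cpu_model {
    unsigned int pending;   /* stands in for local_softirq_pending() */
    unsigned int handled;   /* every bit that has been serviced so far */
    int bh_disable_depth;   /* nesting count of local_bh_disable() */
};

/* Service everything that is pending, as softirq processing would. */
static void model_run_softirqs(struct cpu_model *cpu)
{
    cpu->handled |= cpu->pending;
    cpu->pending = 0;
}

/* Only sets the pending bit; nothing runs the handler yet. */
static void model_raise_softirq(struct cpu_model *cpu, unsigned int nr)
{
    cpu->pending |= 1u << nr;
}

static void model_local_bh_disable(struct cpu_model *cpu)
{
    cpu->bh_disable_depth++;
}

/* Leaving the outermost BH-disabled section runs pending softirqs. */
static void model_local_bh_enable(struct cpu_model *cpu)
{
    if (--cpu->bh_disable_depth == 0 && cpu->pending)
        model_run_softirqs(cpu);
}

/* Returning from a hardware interrupt also runs pending softirqs,
 * unless bottom halves are disabled. */
static void model_irq_exit(struct cpu_model *cpu)
{
    if (cpu->bh_disable_depth == 0 && cpu->pending)
        model_run_softirqs(cpu);
}

/* Idle-entry check: true means the NOHZ code would print the warning. */
static bool model_idle_would_warn(const struct cpu_model *cpu)
{
    return cpu->pending != 0;
}

/* Work item that raises from task context and simply returns. */
static void model_work_unfixed(struct cpu_model *cpu)
{
    model_raise_softirq(cpu, MODEL_NET_RX);
}

/* Work item with a BH disable/enable pair around the raise. */
static void model_work_fixed(struct cpu_model *cpu)
{
    model_local_bh_disable(cpu);
    model_raise_softirq(cpu, MODEL_NET_RX);
    model_local_bh_enable(cpu);
}
```

In the model, model_work_unfixed() leaves bit 3 (0x08) pending until some
interrupt happens to arrive, so an idle check in between would warn, while
model_work_fixed() has the bit serviced as soon as the BH-disabled section
ends.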

> Also, while investigating this problem I saw a commit that fixed a
> similar issue:
> e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()").
>
> I then tried the same approach on the ath10k sdio driver:
>
> diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
> index b746052737e0..eb705214f3f0 100644
> --- a/drivers/net/wireless/ath/ath10k/sdio.c
> +++ b/drivers/net/wireless/ath/ath10k/sdio.c
> @@ -1363,8 +1363,11 @@ static void ath10k_rx_indication_async_work(struct work_struct *work)
>          ep->ep_ops.ep_rx_complete(ar, skb);
>      }
>
> -    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags))
> +    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags)) {
> +        local_bh_disable();
>          napi_schedule(&ar->napi);
> +        local_bh_enable();
> +    }
>  }
>
> and no longer get the "NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!" error messages after launching hostapd.
>
> Is this a proper fix?

Yes. This is correct. See above.

Thanks,

        tglx

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-05 13:00               ` Thomas Gleixner
@ 2021-09-05 13:07                 ` Fabio Estevam
  0 siblings, 0 replies; 14+ messages in thread
From: Fabio Estevam @ 2021-09-05 13:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E . McKenney, Kalle Valo, ath10k, linux-mmc, Ulf Hansson,
	Marek Vasut, qais.yousef, Frederic Weisbecker

Hi Thomas,

On Sun, Sep 5, 2021 at 10:00 AM Thomas Gleixner <tglx@linutronix.de> wrote:

> Yes. This is correct. See above.

Thanks for your help. Appreciated.

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-08-18 15:43 ` Paul E. McKenney
  2021-08-18 16:29   ` Fabio Estevam
@ 2021-09-17 16:32   ` Qais Yousef
  2021-09-17 17:09     ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Qais Yousef @ 2021-09-17 16:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Fabio Estevam, Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut

Hi Paul

On 08/18/21 08:43, Paul E. McKenney wrote:
> On Wed, Aug 18, 2021 at 12:18:25PM -0300, Fabio Estevam wrote:
> > Hi,
> > 
> > When launching the hostapd application on an i.MX7-based board with an
> > ath10k device connected via SDIO, the following "NOHZ tick-stop error"
> > messages are seen:
> > 
> > # hostapd /etc/wifi.conf
> > Configuration file: /etc/wifi.conf
> > wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
> > [   63.021149] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > Using interface wlan0 with hwaddr 00:1f:7b:31:04:a0 and ssid "thessid"
> > [   67.332470] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> > wlan0: interface state COUNTRY_UPDATE->ENABLED
> > wlan0: AP-ENABLED
> > [   68.025845] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   69.025973] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   69.607432] cfg80211: failed to load regulatory.db
> > [   72.026748] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   73.027039] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   74.027159] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   75.027109] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   76.027461] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   77.027391] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > [   78.027560] NOHZ tick-stop error: Non-RCU local softirq work is
> > pending, handler #08!!!
> > 
> > This happens on all kernel versions from 5.10 to 5.13.
> > 
> > Any ideas on how to fix this problem?
> 
> I believe that you need this commit (and possibly some prerequisites):
> 
> 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
> 
> Adding Qais on CC for his thoughts.

Sorry for the late response. A combination of holidays and sickness kept me
away from email for a while.

I did see an issue on 5.10 recently, but I was running an Android kernel. I
initially thought the problem was similar to the upstream one we had been
seeing on mainline for a while, but it turned out to be a genuine bug caused by
a patch that tries to 'fix' softirq interference with RT. Reverting that patch
fixed the issue for me. It later turned out that the bug was specific to the
platform I was running on and was not reproducible by others on other platforms.

Upstream 5.10-LTS was fine for me.

HTH.

Thanks

--
Qais Yousef

* Re: NOHZ tick-stop error with ath10k SDIO
  2021-09-17 16:32   ` Qais Yousef
@ 2021-09-17 17:09     ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2021-09-17 17:09 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Fabio Estevam, Kalle Valo, ath10k, linux-mmc, Ulf Hansson, Marek Vasut

On Fri, Sep 17, 2021 at 05:32:45PM +0100, Qais Yousef wrote:
> Hi Paul
> 
> On 08/18/21 08:43, Paul E. McKenney wrote:
> > On Wed, Aug 18, 2021 at 12:18:25PM -0300, Fabio Estevam wrote:
> > > Hi,
> > > 
> > > When launching the hostapd application on an i.MX7-based board with an
> > > ath10k device connected via SDIO, the following "NOHZ tick-stop error"
> > > messages are seen:
> > > 
> > > # hostapd /etc/wifi.conf
> > > Configuration file: /etc/wifi.conf
> > > wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
> > > [   63.021149] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > Using interface wlan0 with hwaddr 00:1f:7b:31:04:a0 and ssid "thessid"
> > > [   67.332470] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> > > wlan0: interface state COUNTRY_UPDATE->ENABLED
> > > wlan0: AP-ENABLED
> > > [   68.025845] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   69.025973] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   69.607432] cfg80211: failed to load regulatory.db
> > > [   72.026748] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   73.027039] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   74.027159] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   75.027109] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   76.027461] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   77.027391] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > [   78.027560] NOHZ tick-stop error: Non-RCU local softirq work is
> > > pending, handler #08!!!
> > > 
> > > This happens on all kernel versions from 5.10 to 5.13.
> > > 
> > > Any ideas on how to fix this problem?
> > 
> > I believe that you need this commit (and possibly some prerequisites):
> > 
> > 47c218dcae65 ("tick/sched: Prevent false positive softirq pending warnings on RT")
> > 
> > Adding Qais on CC for his thoughts.
> 
> Sorry for the late response. A combination of holidays and sickness kept me
> away from email for a while.
> 
> I did see an issue on 5.10 recently, but I was running an Android kernel. I
> initially thought the problem was similar to the upstream one we had been
> seeing on mainline for a while, but it turned out to be a genuine bug caused by
> a patch that tries to 'fix' softirq interference with RT. Reverting that patch
> fixed the issue for me. It later turned out that the bug was specific to the
> platform I was running on and was not reproducible by others on other platforms.
> 
> Upstream 5.10-LTS was fine for me.
> 
> HTH.

Good to hear that you found the problem and fixed it!

							Thanx, Paul

Thread overview: 14+ messages
2021-08-18 15:18 NOHZ tick-stop error with ath10k SDIO Fabio Estevam
2021-08-18 15:43 ` Paul E. McKenney
2021-08-18 16:29   ` Fabio Estevam
2021-08-18 17:02     ` Fabio Estevam
2021-08-18 17:56       ` Paul E. McKenney
2021-08-19 13:24         ` Fabio Estevam
2021-09-02 21:51         ` Thomas Gleixner
2021-09-02 22:09           ` Paul E. McKenney
2021-09-03  8:07           ` Thomas Gleixner
2021-09-04 21:10             ` Fabio Estevam
2021-09-05 13:00               ` Thomas Gleixner
2021-09-05 13:07                 ` Fabio Estevam
2021-09-17 16:32   ` Qais Yousef
2021-09-17 17:09     ` Paul E. McKenney
