All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
@ 2021-01-23 23:37 Qais Yousef
  2021-01-23 23:37 ` [PATCH 1/2] printk: Add new pr_*_deferred_once() variants Qais Yousef
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Qais Yousef @ 2021-01-23 23:37 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner
  Cc: Steven Rostedt, John Ogness, Peter Zijlstra (Intel),
	linux-kernel, Qais Yousef

I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
context splat.

The problem wasn't reproducible but I think the cause is obvious, printk can't
be called from interrupt context.

AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
from hrtimer_interrupt() too. Adding a pr_warn_once() inside
hrtimer_interrupt() in a location where it is always hit produces the BUG
splat. Replacing it with pr_warn_deferred_once() generates the printk warning
without any splat.

I added a new pr_*_deferred_once() variants to avoid open coding; but the name
ended not much shorter and I'm not sure if the wrappers are a win overall.
Since I've already done it, I'm sticking to it in this post. But will be happy
to drop it and just open code the printk_deferred_once(KERN_WARN, ...) in
hrtimer_interrupt() instead.

Thanks

Qais Yousef (2):
  printk: Add new pr_*_deferred_once() variants
  hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt()

 include/linux/printk.h | 24 ++++++++++++++++++++++++
 kernel/time/hrtimer.c  |  3 ++-
 2 files changed, 26 insertions(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2021-01-23 23:37 [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt() Qais Yousef
@ 2021-01-23 23:37 ` Qais Yousef
  2021-01-25 10:37   ` Peter Zijlstra
  2021-01-23 23:37 ` [PATCH 2/2] hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt() Qais Yousef
  2021-01-26  4:46 ` [PATCH 0/2] Fix BUG: Invalid wait context " Sergey Senozhatsky
  2 siblings, 1 reply; 16+ messages in thread
From: Qais Yousef @ 2021-01-23 23:37 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner
  Cc: Steven Rostedt, John Ogness, Peter Zijlstra (Intel),
	linux-kernel, Qais Yousef

To allow users in code where printk is not allowed.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 include/linux/printk.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe7eb2351610..92c0978c7b44 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -480,21 +480,45 @@ extern int kptr_restrict;
 	printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
 /* no pr_cont_once, don't do that... */
 
+#define pr_emerg_deferred_once(fmt, ...)				\
+	printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_alert_deferred_once(fmt, ...)				\
+	printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_crit_deferred_once(fmt, ...)					\
+	printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_err_deferred_once(fmt, ...)					\
+	printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn_deferred_once(fmt, ...)					\
+	printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_notice_deferred_once(fmt, ...)				\
+	printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_info_deferred_once(fmt, ...)					\
+	printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+/* no pr_cont_deferred_once, don't do that... */
+
 #if defined(DEBUG)
 #define pr_devel_once(fmt, ...)					\
 	printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_devel_deferred_once(fmt, ...)			\
+	printk_deferred_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #else
 #define pr_devel_once(fmt, ...)					\
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_devel_deferred_once(fmt, ...)			\
+	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
 /* If you are writing a driver, please use dev_dbg instead */
 #if defined(DEBUG)
 #define pr_debug_once(fmt, ...)					\
 	printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_debug_deferred_once(fmt, ...)			\
+	printk_deferred_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #else
 #define pr_debug_once(fmt, ...)					\
 	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_debug_deferred_once(fmt, ...)			\
+	no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
 #endif
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt()
  2021-01-23 23:37 [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt() Qais Yousef
  2021-01-23 23:37 ` [PATCH 1/2] printk: Add new pr_*_deferred_once() variants Qais Yousef
@ 2021-01-23 23:37 ` Qais Yousef
  2021-01-26  4:46 ` [PATCH 0/2] Fix BUG: Invalid wait context " Sergey Senozhatsky
  2 siblings, 0 replies; 16+ messages in thread
From: Qais Yousef @ 2021-01-23 23:37 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner
  Cc: Steven Rostedt, John Ogness, Peter Zijlstra (Intel),
	linux-kernel, Qais Yousef

printk is not allowed in this context and causes a BUG: Invalid wait context.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 kernel/time/hrtimer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 743c852e10f2..2d9b7cf1d5e2 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1700,7 +1700,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	else
 		expires_next = ktime_add(now, delta);
 	tick_program_event(expires_next, 1);
-	pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta));
+	pr_warn_deferred_once("hrtimer: interrupt took %llu ns\n",
+			      ktime_to_ns(delta));
 }
 
 /* called with interrupts disabled */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2021-01-23 23:37 ` [PATCH 1/2] printk: Add new pr_*_deferred_once() variants Qais Yousef
@ 2021-01-25 10:37   ` Peter Zijlstra
  2021-01-25 10:58     ` John Ogness
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2021-01-25 10:37 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, Steven Rostedt,
	John Ogness, linux-kernel

On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
> To allow users in code where printk is not allowed.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  include/linux/printk.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index fe7eb2351610..92c0978c7b44 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -480,21 +480,45 @@ extern int kptr_restrict;
>  	printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>  /* no pr_cont_once, don't do that... */
>  
> +#define pr_emerg_deferred_once(fmt, ...)				\
> +	printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_alert_deferred_once(fmt, ...)				\
> +	printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_crit_deferred_once(fmt, ...)					\
> +	printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_err_deferred_once(fmt, ...)					\
> +	printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_warn_deferred_once(fmt, ...)					\
> +	printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_notice_deferred_once(fmt, ...)				\
> +	printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
> +#define pr_info_deferred_once(fmt, ...)					\
> +	printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +/* no pr_cont_deferred_once, don't do that... */

I absolutely detest this.. the whole _deferred thing is an abomination.

We should be very close to printk not needing this anymore, printk
people?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2021-01-25 10:37   ` Peter Zijlstra
@ 2021-01-25 10:58     ` John Ogness
  2021-01-25 19:46       ` Qais Yousef
  2022-01-11 13:40       ` Andy Shevchenko
  0 siblings, 2 replies; 16+ messages in thread
From: John Ogness @ 2021-01-25 10:58 UTC (permalink / raw)
  To: Peter Zijlstra, Qais Yousef
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, Steven Rostedt,
	linux-kernel

On 2021-01-25, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
>> To allow users in code where printk is not allowed.
>> 
>> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
>> ---
>>  include/linux/printk.h | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>> 
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index fe7eb2351610..92c0978c7b44 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> @@ -480,21 +480,45 @@ extern int kptr_restrict;
>>  	printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>>  /* no pr_cont_once, don't do that... */
>>  
>> +#define pr_emerg_deferred_once(fmt, ...)				\
>> +	printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_alert_deferred_once(fmt, ...)				\
>> +	printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_crit_deferred_once(fmt, ...)					\
>> +	printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_err_deferred_once(fmt, ...)					\
>> +	printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_warn_deferred_once(fmt, ...)					\
>> +	printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_notice_deferred_once(fmt, ...)				\
>> +	printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
>> +#define pr_info_deferred_once(fmt, ...)					\
>> +	printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>> +/* no pr_cont_deferred_once, don't do that... */
>
> I absolutely detest this.. the whole _deferred thing is an
> abomination.

And it will disappear at some point.

> We should be very close to printk not needing this anymore, printk
> people?

It will disappear once console printing threads are introduced. We
probably still have a few kernel releases until we see that. First we
need to finish merging full lockless access, remove the safe buffers,
and merge the atomic consoles.

John Ogness

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2021-01-25 10:58     ` John Ogness
@ 2021-01-25 19:46       ` Qais Yousef
  2022-01-11 13:40       ` Andy Shevchenko
  1 sibling, 0 replies; 16+ messages in thread
From: Qais Yousef @ 2021-01-25 19:46 UTC (permalink / raw)
  To: John Ogness
  Cc: Peter Zijlstra, Petr Mladek, Sergey Senozhatsky, Thomas Gleixner,
	Steven Rostedt, linux-kernel

On 01/25/21 12:04, John Ogness wrote:
> On 2021-01-25, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
> >> To allow users in code where printk is not allowed.
> >> 
> >> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> >> ---
> >>  include/linux/printk.h | 24 ++++++++++++++++++++++++
> >>  1 file changed, 24 insertions(+)
> >> 
> >> diff --git a/include/linux/printk.h b/include/linux/printk.h
> >> index fe7eb2351610..92c0978c7b44 100644
> >> --- a/include/linux/printk.h
> >> +++ b/include/linux/printk.h
> >> @@ -480,21 +480,45 @@ extern int kptr_restrict;
> >>  	printk_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> >>  /* no pr_cont_once, don't do that... */
> >>  
> >> +#define pr_emerg_deferred_once(fmt, ...)				\
> >> +	printk_deferred_once(KERN_EMERG pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_alert_deferred_once(fmt, ...)				\
> >> +	printk_deferred_once(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_crit_deferred_once(fmt, ...)					\
> >> +	printk_deferred_once(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_err_deferred_once(fmt, ...)					\
> >> +	printk_deferred_once(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_warn_deferred_once(fmt, ...)					\
> >> +	printk_deferred_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_notice_deferred_once(fmt, ...)				\
> >> +	printk_deferred_once(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
> >> +#define pr_info_deferred_once(fmt, ...)					\
> >> +	printk_deferred_once(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> >> +/* no pr_cont_deferred_once, don't do that... */
> >
> > I absolutely detest this.. the whole _deferred thing is an
> > abomination.
> 
> And it will disappear at some point.
> 
> > We should be very close to printk not needing this anymore, printk
> > people?
> 
> It will disappear once console printing threads are introduced. We
> probably still have a few kernel releases until we see that. First we
> need to finish merging full lockless access, remove the safe buffers,
> and merge the atomic consoles.

Okay. As I said in the cover letter, I didn't think the addition of these new
macros looked like a win overall.

I will drop this patch and just open code the use of printk_deferred_once() in
hrtimer_interrupt(). Which should be easier to fix later when it should
disappear.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-23 23:37 [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt() Qais Yousef
  2021-01-23 23:37 ` [PATCH 1/2] printk: Add new pr_*_deferred_once() variants Qais Yousef
  2021-01-23 23:37 ` [PATCH 2/2] hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt() Qais Yousef
@ 2021-01-26  4:46 ` Sergey Senozhatsky
  2021-01-26 14:59   ` Qais Yousef
  2 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-01-26  4:46 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Petr Mladek, Sergey Senozhatsky, Thomas Gleixner, Steven Rostedt,
	John Ogness, Peter Zijlstra (Intel),
	linux-kernel

On (21/01/23 23:37), Qais Yousef wrote:
> 
> I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
> context splat.
> 
> The problem wasn't reproducible but I think the cause is obvious, printk can't
> be called from interrupt context.
> 
> AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
> from hrtimer_interrupt() too. Adding a pr_warn_once() inside
> hrtimer_interrupt() in a location where it is always hit produces the BUG
> splat. Replacing it with pr_warn_deferred_once() generates the printk warning
> without any splat.

Could you please post the lockdep splat?
Why is it invalid? Is this... -rt kernel, perhaps?

	-ss

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26  4:46 ` [PATCH 0/2] Fix BUG: Invalid wait context " Sergey Senozhatsky
@ 2021-01-26 14:59   ` Qais Yousef
  2021-01-26 15:22     ` John Ogness
  2021-01-26 15:58     ` Sergey Senozhatsky
  0 siblings, 2 replies; 16+ messages in thread
From: Qais Yousef @ 2021-01-26 14:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Thomas Gleixner, Steven Rostedt, John Ogness,
	Peter Zijlstra (Intel),
	linux-kernel

On 01/26/21 13:46, Sergey Senozhatsky wrote:
> On (21/01/23 23:37), Qais Yousef wrote:
> > 
> > I hit a pr_warn() inside hrtimer_interrupt() which lead to a BUG: Invalid wait
> > context splat.
> > 
> > The problem wasn't reproducible but I think the cause is obvious, printk can't
> > be called from interrupt context.
> > 
> > AFAICU printk_deferred() is safe from NMI, so I assumed it is safe to be called
> > from hrtimer_interrupt() too. Adding a pr_warn_once() inside
> > hrtimer_interrupt() in a location where it is always hit produces the BUG
> > splat. Replacing it with pr_warn_deferred_once() generates the printk warning
> > without any splat.
> 
> Could you please post the lockdep splat?

Sorry I should have done that. I thought printk() in interrupt is not allowed
and obvious.

I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/

	# [67628.388606] hrtimer: interrupt took 304720 ns
	[67628.393546]
	[67628.393550] =============================
	[67628.393554] [ BUG: Invalid wait context ]
	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
	[67628.393560] -----------------------------
	[67628.393563] sugov:0/192 is trying to lock:
	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
	[67628.393581] other info that might help us debug this:
	[67628.393584] context-{2:2}
	[67628.393586] 4 locks held by sugov:0/192:
	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
	[67628.393646] stack backtrace:
	[67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
	[67628.393653] Hardware name: ARM Juno development board (r2) (DT)
	[67628.393656] Call trace:
	[67628.393659]  dump_backtrace+0x0/0x1b0
	[67628.393661]  show_stack+0x20/0x70
	[67628.393664]  dump_stack+0xf8/0x168
	[67628.393666]  __lock_acquire+0x964/0x1c88
	[67628.393669]  lock_acquire+0x26c/0x500
	[67628.393671]  _raw_spin_lock+0x64/0x88
	[67628.393674]  pl011_console_write+0x138/0x218
	[67628.393677]  console_unlock+0x2c4/0x6d8
	[67628.393680]  vprintk_emit+0x134/0x310
	[67628.393682]  vprintk_default+0x40/0x50
	[67628.393685]  vprintk_func+0xfc/0x2b0
	[67628.393687]  printk+0x68/0x90
	[67628.393690]  hrtimer_interrupt+0x24c/0x250
	[67628.393693]  arch_timer_handler_phys+0x3c/0x50
	[67628.393695]  handle_percpu_devid_irq+0xd8/0x460
	[67628.393698]  generic_handle_irq+0x38/0x50
	[67628.393701]  __handle_domain_irq+0x6c/0xc8
	[67628.393704]  gic_handle_irq+0xb0/0xf0
	[67628.393706]  el1_irq+0xb4/0x180
	[67628.393709]  _raw_spin_unlock_irqrestore+0x58/0xa8
	[67628.393712]  hrtimer_start_range_ns+0x1b0/0x420
	[67628.393715]  msg_submit+0x100/0x108
	[67628.393717]  mbox_send_message+0x84/0x128
	[67628.393720]  scpi_send_message+0x11c/0x2a8
	[67628.393723]  scpi_dvfs_set_idx+0x48/0x70
	[67628.393726]  scpi_dvfs_set_rate+0x60/0x78
	[67628.393728]  clk_change_rate+0x184/0x8a8
	[67628.393731]  clk_core_set_rate_nolock+0x1c0/0x280
	[67628.393734]  clk_set_rate+0x40/0x98
	[67628.393736]  scpi_cpufreq_set_target+0x40/0x68
	[67628.393739]  __cpufreq_driver_target+0x1a0/0x5d0

> Why is it invalid? Is this... -rt kernel, perhaps?

No I was running on Linus master at the time.

AFAIU printk will hold the console lock which could sleep in interrupt context.

Did I miss something?

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 14:59   ` Qais Yousef
@ 2021-01-26 15:22     ` John Ogness
  2021-01-26 16:18       ` Peter Zijlstra
  2021-01-26 15:58     ` Sergey Senozhatsky
  1 sibling, 1 reply; 16+ messages in thread
From: John Ogness @ 2021-01-26 15:22 UTC (permalink / raw)
  To: Qais Yousef, Sergey Senozhatsky
  Cc: Petr Mladek, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra (Intel),
	linux-kernel

On 2021-01-26, Qais Yousef <qais.yousef@arm.com> wrote:
> 	[67628.393550] =============================
> 	[67628.393554] [ BUG: Invalid wait context ]
> 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> 	[67628.393560] -----------------------------
> 	[67628.393563] sugov:0/192 is trying to lock:
> 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> 	[67628.393581] other info that might help us debug this:
> 	[67628.393584] context-{2:2}
> 	[67628.393586] 4 locks held by sugov:0/192:
> 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> 	[67628.393646] stack backtrace:
> 	[67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> 	[67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> 	[67628.393656] Call trace:
> 	[67628.393659]  dump_backtrace+0x0/0x1b0
> 	[67628.393661]  show_stack+0x20/0x70
> 	[67628.393664]  dump_stack+0xf8/0x168
> 	[67628.393666]  __lock_acquire+0x964/0x1c88
> 	[67628.393669]  lock_acquire+0x26c/0x500
> 	[67628.393671]  _raw_spin_lock+0x64/0x88
> 	[67628.393674]  pl011_console_write+0x138/0x218
> 	[67628.393677]  console_unlock+0x2c4/0x6d8
> 	[67628.393680]  vprintk_emit+0x134/0x310
> 	[67628.393682]  vprintk_default+0x40/0x50
> 	[67628.393685]  vprintk_func+0xfc/0x2b0
> 	[67628.393687]  printk+0x68/0x90
> 	[67628.393690]  hrtimer_interrupt+0x24c/0x250
> 	[67628.393693]  arch_timer_handler_phys+0x3c/0x50
> 	[67628.393695]  handle_percpu_devid_irq+0xd8/0x460
> 	[67628.393698]  generic_handle_irq+0x38/0x50
> 	[67628.393701]  __handle_domain_irq+0x6c/0xc8
> 	[67628.393704]  gic_handle_irq+0xb0/0xf0
> 	[67628.393706]  el1_irq+0xb4/0x180
> 	[67628.393709]  _raw_spin_unlock_irqrestore+0x58/0xa8
> 	[67628.393712]  hrtimer_start_range_ns+0x1b0/0x420
> 	[67628.393715]  msg_submit+0x100/0x108
> 	[67628.393717]  mbox_send_message+0x84/0x128
> 	[67628.393720]  scpi_send_message+0x11c/0x2a8
> 	[67628.393723]  scpi_dvfs_set_idx+0x48/0x70
> 	[67628.393726]  scpi_dvfs_set_rate+0x60/0x78
> 	[67628.393728]  clk_change_rate+0x184/0x8a8
> 	[67628.393731]  clk_core_set_rate_nolock+0x1c0/0x280
> 	[67628.393734]  clk_set_rate+0x40/0x98
> 	[67628.393736]  scpi_cpufreq_set_target+0x40/0x68
> 	[67628.393739]  __cpufreq_driver_target+0x1a0/0x5d0

pl011_console_write() does:

    local_irq_save(flags);
    spin_lock(&uap->port.lock);

This needs to be:

    spin_lock_irqsave(&uap->port.lock, flags);

Of course, it will be a trickier than that because of the
@oops_in_progress madness. This bug has been around for quite a while.

John Ogness

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 14:59   ` Qais Yousef
  2021-01-26 15:22     ` John Ogness
@ 2021-01-26 15:58     ` Sergey Senozhatsky
  2021-01-26 16:23       ` Peter Zijlstra
  2021-01-26 16:58       ` Petr Mladek
  1 sibling, 2 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2021-01-26 15:58 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Sergey Senozhatsky, Petr Mladek, Thomas Gleixner, Steven Rostedt,
	John Ogness, Peter Zijlstra (Intel),
	linux-kernel

On (21/01/26 14:59), Qais Yousef wrote:
> On 01/26/21 13:46, Sergey Senozhatsky wrote:
> > On (21/01/23 23:37), Qais Yousef wrote:

[..]

> I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/
>
> 	# [67628.388606] hrtimer: interrupt took 304720 ns
> 	[67628.393546]
> 	[67628.393550] =============================
> 	[67628.393554] [ BUG: Invalid wait context ]
> 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> 	[67628.393560] -----------------------------
> 	[67628.393563] sugov:0/192 is trying to lock:
> 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> 	[67628.393581] other info that might help us debug this:
> 	[67628.393584] context-{2:2}
> 	[67628.393586] 4 locks held by sugov:0/192:
> 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> 	[67628.393646] stack backtrace:
> 	[67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> 	[67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> 	[67628.393656] Call trace:
> 	[67628.393659]  dump_backtrace+0x0/0x1b0
> 	[67628.393661]  show_stack+0x20/0x70
> 	[67628.393664]  dump_stack+0xf8/0x168
> 	[67628.393666]  __lock_acquire+0x964/0x1c88
> 	[67628.393669]  lock_acquire+0x26c/0x500
> 	[67628.393671]  _raw_spin_lock+0x64/0x88
> 	[67628.393674]  pl011_console_write+0x138/0x218
> 	[67628.393677]  console_unlock+0x2c4/0x6d8
> 	[67628.393680]  vprintk_emit+0x134/0x310
> 	[67628.393682]  vprintk_default+0x40/0x50
> 	[67628.393685]  vprintk_func+0xfc/0x2b0
> 	[67628.393687]  printk+0x68/0x90
> 	[67628.393690]  hrtimer_interrupt+0x24c/0x250
> 	[67628.393693]  arch_timer_handler_phys+0x3c/0x50
> 	[67628.393695]  handle_percpu_devid_irq+0xd8/0x460
> 	[67628.393698]  generic_handle_irq+0x38/0x50
> 	[67628.393701]  __handle_domain_irq+0x6c/0xc8
> 	[67628.393704]  gic_handle_irq+0xb0/0xf0
> 	[67628.393706]  el1_irq+0xb4/0x180
> 	[67628.393709]  _raw_spin_unlock_irqrestore+0x58/0xa8
> 	[67628.393712]  hrtimer_start_range_ns+0x1b0/0x420
> 	[67628.393715]  msg_submit+0x100/0x108
> 	[67628.393717]  mbox_send_message+0x84/0x128
> 	[67628.393720]  scpi_send_message+0x11c/0x2a8
> 	[67628.393723]  scpi_dvfs_set_idx+0x48/0x70
> 	[67628.393726]  scpi_dvfs_set_rate+0x60/0x78
> 	[67628.393728]  clk_change_rate+0x184/0x8a8
> 	[67628.393731]  clk_core_set_rate_nolock+0x1c0/0x280
> 	[67628.393734]  clk_set_rate+0x40/0x98
> 	[67628.393736]  scpi_cpufreq_set_target+0x40/0x68
> 	[67628.393739]  __cpufreq_driver_target+0x1a0/0x5d0

Thanks.

> > Why is it invalid? Is this... -rt kernel, perhaps?
> 
> No I was running on Linus master at the time.
> 
> AFAIU printk will hold the console lock which could sleep in interrupt context.
> 
> Did I miss something?

printk() is not permitted to sleep/schedule/etc and it never does.
Generally it should be OK to call it from IRQ (module recursion paths).

	-ss

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 15:22     ` John Ogness
@ 2021-01-26 16:18       ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2021-01-26 16:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Qais Yousef, Sergey Senozhatsky, Petr Mladek, Thomas Gleixner,
	Steven Rostedt, linux-kernel

On Tue, Jan 26, 2021 at 04:28:42PM +0106, John Ogness wrote:
> On 2021-01-26, Qais Yousef <qais.yousef@arm.com> wrote:
> > 	[67628.393550] =============================
> > 	[67628.393554] [ BUG: Invalid wait context ]
> > 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > 	[67628.393560] -----------------------------
> > 	[67628.393563] sugov:0/192 is trying to lock:
> > 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > 	[67628.393581] other info that might help us debug this:
> > 	[67628.393584] context-{2:2}
> > 	[67628.393586] 4 locks held by sugov:0/192:
> > 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0


> pl011_console_write() does:
> 
>     local_irq_save(flags);
>     spin_lock(&uap->port.lock);
> 
> This needs to be:
> 
>     spin_lock_irqsave(&uap->port.lock, flags);
> 
> Of course, it will be a trickier than that because of the
> @oops_in_progress madness. This bug has been around for quite a while.

Well yes, but that's not going to be fixing anything.

The problem is that sugov_work() is holding a raw_spin_lock, and you're
trying to acquire a spin_lock.

That's an invalid lock nesting.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 15:58     ` Sergey Senozhatsky
@ 2021-01-26 16:23       ` Peter Zijlstra
  2021-01-26 18:08         ` Qais Yousef
  2021-01-26 16:58       ` Petr Mladek
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2021-01-26 16:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qais Yousef, Petr Mladek, Thomas Gleixner, Steven Rostedt,
	John Ogness, linux-kernel

On Wed, Jan 27, 2021 at 12:58:33AM +0900, Sergey Senozhatsky wrote:
> On (21/01/26 14:59), Qais Yousef wrote:

> > 	# [67628.388606] hrtimer: interrupt took 304720 ns
> > 	[67628.393546]
> > 	[67628.393550] =============================
> > 	[67628.393554] [ BUG: Invalid wait context ]
> > 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > 	[67628.393560] -----------------------------
> > 	[67628.393563] sugov:0/192 is trying to lock:
> > 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > 	[67628.393581] other info that might help us debug this:
> > 	[67628.393584] context-{2:2}
> > 	[67628.393586] 4 locks held by sugov:0/192:
> > 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8

> > Did I miss something?
> 
> printk() is not permitted to sleep/schedule/etc and it never does.
> Generally it should be OK to call it from IRQ (module recursion paths).

The report is that it is trying to acquire spin_lock() while holding
raw_spin_lock(), which is an invalid lock nesting.

Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
checks for this.

On current (mainline) kernel configs this is not yet a problem, but the
moment we do land PREEMPT_RT this order will be problematic.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 15:58     ` Sergey Senozhatsky
  2021-01-26 16:23       ` Peter Zijlstra
@ 2021-01-26 16:58       ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2021-01-26 16:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qais Yousef, Thomas Gleixner, Steven Rostedt, John Ogness,
	Peter Zijlstra (Intel),
	linux-kernel

On Wed 2021-01-27 00:58:33, Sergey Senozhatsky wrote:
> On (21/01/26 14:59), Qais Yousef wrote:
> > On 01/26/21 13:46, Sergey Senozhatsky wrote:
> > > On (21/01/23 23:37), Qais Yousef wrote:
> 
> [..]
> 
> > I reported it here: https://lore.kernel.org/lkml/20210114124512.mg3vexudfmoadef5@e107158-lin/
> >
> > 	# [67628.388606] hrtimer: interrupt took 304720 ns
> > 	[67628.393546]
> > 	[67628.393550] =============================
> > 	[67628.393554] [ BUG: Invalid wait context ]
> > 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > 	[67628.393560] -----------------------------
> > 	[67628.393563] sugov:0/192 is trying to lock:
> > 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > 	[67628.393581] other info that might help us debug this:
> > 	[67628.393584] context-{2:2}
> > 	[67628.393586] 4 locks held by sugov:0/192:
> > 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> > 	[67628.393646] stack backtrace:
> > 	[67628.393649] CPU: 0 PID: 192 Comm: sugov:0 Not tainted 5.11.0-rc3-00019-g86be331946f7 #37
> > 	[67628.393653] Hardware name: ARM Juno development board (r2) (DT)
> > 	[67628.393656] Call trace:
> > 	[67628.393659]  dump_backtrace+0x0/0x1b0
> > 	[67628.393661]  show_stack+0x20/0x70
> > 	[67628.393664]  dump_stack+0xf8/0x168
> > 	[67628.393666]  __lock_acquire+0x964/0x1c88
> > 	[67628.393669]  lock_acquire+0x26c/0x500
> > 	[67628.393671]  _raw_spin_lock+0x64/0x88
> > 	[67628.393674]  pl011_console_write+0x138/0x218
> > 	[67628.393677]  console_unlock+0x2c4/0x6d8
> > 	[67628.393680]  vprintk_emit+0x134/0x310
> > 	[67628.393682]  vprintk_default+0x40/0x50
> > 	[67628.393685]  vprintk_func+0xfc/0x2b0
> > 	[67628.393687]  printk+0x68/0x90
> > 	[67628.393690]  hrtimer_interrupt+0x24c/0x250
> > 	[67628.393693]  arch_timer_handler_phys+0x3c/0x50
> > 	[67628.393695]  handle_percpu_devid_irq+0xd8/0x460
> > 	[67628.393698]  generic_handle_irq+0x38/0x50
> > 	[67628.393701]  __handle_domain_irq+0x6c/0xc8
> > 	[67628.393704]  gic_handle_irq+0xb0/0xf0
> > 	[67628.393706]  el1_irq+0xb4/0x180
> > 	[67628.393709]  _raw_spin_unlock_irqrestore+0x58/0xa8
> > 	[67628.393712]  hrtimer_start_range_ns+0x1b0/0x420
> > 	[67628.393715]  msg_submit+0x100/0x108
> > 	[67628.393717]  mbox_send_message+0x84/0x128
> > 	[67628.393720]  scpi_send_message+0x11c/0x2a8
> > 	[67628.393723]  scpi_dvfs_set_idx+0x48/0x70
> > 	[67628.393726]  scpi_dvfs_set_rate+0x60/0x78
> > 	[67628.393728]  clk_change_rate+0x184/0x8a8
> > 	[67628.393731]  clk_core_set_rate_nolock+0x1c0/0x280
> > 	[67628.393734]  clk_set_rate+0x40/0x98
> > 	[67628.393736]  scpi_cpufreq_set_target+0x40/0x68
> > 	[67628.393739]  __cpufreq_driver_target+0x1a0/0x5d0
> 
> Thanks.
> 
> > > Why is it invalid? Is this... -rt kernel, perhaps?
> > 
> > No I was running on Linus master at the time.
> > 
> > AFAIU printk will hold the console lock which could sleep in interrupt context.
> > 
> > Did I miss something?
> 
> printk() is not permitted to sleep/schedule/etc and it never does.
> Generally it should be OK to call it from IRQ (module recursion paths).

To make it clear. printk() takes the mutex via console_trylock().
It does not sleep when the lock is not available. But lockdep should
be aware of this.

Honestly, I am not sure what the lockdep complains about here.
IMHO, the key is that we try to take lock

     (&port_lock_key){-.-.}-{3:3}

in context

     context-{2:2}

where {-.-.} looks fine. Both dots and dashes looks safe, see
      get_usage_char()

So, it is likely the {3:3} vs. {2:2}. I have never seen this.
It has been added by the commit de8f5e4f2dc1f032 ("lockdep: Introduce
wait-type checks").

Might it be that we take normal mutex under a raw one?

It would be safe in normal kernel. But it would be bad
in RT one. But lockep should complain about it even in
the normal kernel.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt()
  2021-01-26 16:23       ` Peter Zijlstra
@ 2021-01-26 18:08         ` Qais Yousef
  0 siblings, 0 replies; 16+ messages in thread
From: Qais Yousef @ 2021-01-26 18:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Petr Mladek, Thomas Gleixner, Steven Rostedt,
	John Ogness, linux-kernel

On 01/26/21 17:23, Peter Zijlstra wrote:
> On Wed, Jan 27, 2021 at 12:58:33AM +0900, Sergey Senozhatsky wrote:
> > On (21/01/26 14:59), Qais Yousef wrote:
> 
> > > 	# [67628.388606] hrtimer: interrupt took 304720 ns
> > > 	[67628.393546]
> > > 	[67628.393550] =============================
> > > 	[67628.393554] [ BUG: Invalid wait context ]
> > > 	[67628.393557] 5.11.0-rc3-00019-g86be331946f7 #37 Not tainted
> > > 	[67628.393560] -----------------------------
> > > 	[67628.393563] sugov:0/192 is trying to lock:
> > > 	[67628.393566] ffff000800b1d898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x138/0x218
> > > 	[67628.393581] other info that might help us debug this:
> > > 	[67628.393584] context-{2:2}
> > > 	[67628.393586] 4 locks held by sugov:0/192:
> > > 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > > 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0
> > > 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310
> > > 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8
> 
> > > Did I miss something?
> > 
> > printk() is not permitted to sleep/schedule/etc and it never does.
> > Generally it should be OK to call it from IRQ (module recursion paths).
> 
> The report is that it is trying to acquire spin_lock() while holding
> raw_spin_lock(), which is an invalid lock nesting.
> 
> Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
> checks for this.
> 
> On current (mainline) kernel configs this is not yet a problem, but the
> moment we do land PREEMPT_RT this order will be problematic.

I should have dug more into the history of printk() and the meaning of the
splat. Sorry for the noise.

Looking at v5.10.8-rt24 the following fix is applied in RT

	https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/drivers/tty/serial/amba-pl011.c?h=linux-5.10.y-rt&id=008cc77aff249e830e5eb90b7ae3a6784597b8cf

which is what John suggested.

Looking at the locks held

> > > 	[67628.393589]  #0: ffff0008059cb720 (&sg_policy->work_lock){+.+.}-{4:4}, at: sugov_work+0x58/0x88
> > > 	[67628.393603]  #1: ffff800015446f20 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x34/0xb0

These two are mutexes.

> > > 	[67628.393618]  #2: ffff8000152aaa60 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x12c/0x310

This is a semaphore.

> > > 	[67628.393632]  #3: ffff8000152aab88 (console_owner){-.-.}-{0:0}, at: console_unlock+0x190/0x6d8

I think this is acquired by console_lock_spinning_enable() which has acquiring
syntax I'm not familiar with. console_owner_lock is defined as RAW_SPINLOCK, so
regardless of how it is acquired, it must be the problem.

Looks like John has reworked this code in RT too. So maybe this is just a red
herring after all..

	https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/kernel/printk/printk.c?h=linux-5.10.y-rt&id=0097798fd99948d3ffea535005eee7eb3b14fd06

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2021-01-25 10:58     ` John Ogness
  2021-01-25 19:46       ` Qais Yousef
@ 2022-01-11 13:40       ` Andy Shevchenko
  2022-01-13 11:20         ` Qais Yousef
  1 sibling, 1 reply; 16+ messages in thread
From: Andy Shevchenko @ 2022-01-11 13:40 UTC (permalink / raw)
  To: John Ogness
  Cc: Peter Zijlstra, Qais Yousef, Petr Mladek, Sergey Senozhatsky,
	Thomas Gleixner, Steven Rostedt, linux-kernel

On Mon, Jan 25, 2021 at 12:04:01PM +0106, John Ogness wrote:
> On 2021-01-25, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:

...

> > We should be very close to printk not needing this anymore, printk
> > people?
> 
> It will disappear once console printing threads are introduced. We
> probably still have a few kernel releases until we see that. First we
> need to finish merging full lockless access, remove the safe buffers,
> and merge the atomic consoles.

Where are we now at this?

I almost easily can reproduce the below at boot time for last several releases

 cblist_init_generic: Setting adjustable number of callback queues.

 =============================
 [ BUG: Invalid wait context ]
 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28 Not tainted
 -----------------------------
 swapper/0/1 is trying to lock:
 ffffffffb3dce758 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x3a8/0x480
 other info that might help us debug this:
 context-{5:5}
 3 locks held by swapper/0/1:
  #0: ffffffffb2955e08 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1e0
  #1: ffffffffb2953ec0 (console_lock){+.+.}-{0:0}, at: _printk+0x53/0x6a
  #2: ffffffffb2873ca0 (console_owner){....}-{0:0}, at: console_unlock+0x157/0x580
 stack backtrace:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28
 Hardware name: Intel Corporation CHERRYVIEW D0 PLATFORM/Braswell CRB, BIOS BRAS.X64.B082.R00.1507270557 07/27/2015
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  __lock_acquire.cold+0xca/0x2e8
  ? stack_trace_save+0x46/0x70
  lock_acquire+0xd2/0x300
  ? serial8250_console_write+0x3a8/0x480
  _raw_spin_lock_irqsave+0x39/0x50
  ? serial8250_console_write+0x3a8/0x480
  serial8250_console_write+0x3a8/0x480
  console_unlock+0x397/0x580
  vprintk_emit+0x99/0x2e0
  _printk+0x53/0x6a
  cblist_init_generic.constprop.0.cold+0x24/0x31
  rcu_init_tasks_generic+0x5/0xd9
  kernel_init_freeable+0x14d/0x294
  ? rest_init+0x250/0x250
  kernel_init+0x11/0x110
  ret_from_fork+0x22/0x30
  </TASK>
 cblist_init_generic: Setting shift to 2 and lim to 1.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] printk: Add new pr_*_deferred_once() variants
  2022-01-11 13:40       ` Andy Shevchenko
@ 2022-01-13 11:20         ` Qais Yousef
  0 siblings, 0 replies; 16+ messages in thread
From: Qais Yousef @ 2022-01-13 11:20 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: John Ogness, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
	Thomas Gleixner, Steven Rostedt, linux-kernel

On 01/11/22 15:40, Andy Shevchenko wrote:
> On Mon, Jan 25, 2021 at 12:04:01PM +0106, John Ogness wrote:
> > On 2021-01-25, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Sat, Jan 23, 2021 at 11:37:40PM +0000, Qais Yousef wrote:
> 
> ...
> 
> > > We should be very close to printk not needing this anymore, printk
> > > people?
> > 
> > It will disappear once console printing threads are introduced. We
> > probably still have a few kernel releases until we see that. First we
> > need to finish merging full lockless access, remove the safe buffers,
> > and merge the atomic consoles.
> 
> Where are we now at this?

I think this answer from Peter then the summarizes the problem

	\x7fLINKIFYGDDfGbBIJGdfBFBfaIHdbHFdAGCFbIDAACadHAAJ

"""
Note that this is CONFIG_PROVE_RAW_LOCK_NESTING=y which specifically
checks for this.

On current (mainline) kernel configs this is not yet a problem, but the
moment we do land PREEMPT_RT this order will be problematic.
"""

PREEMPT_RT patchset has fixes that should land on mainline eventually and AFAIU
this is a red herring on mainline. Although this might be a new call site that
breaks the rule for PREEMPT_RT and they'd want to know about it.

I wonder if CONFIG_PROVE_RAW_LOCK_NESTING should depend on PREEMPT_RT until the
fixes land - assuming they haven't, I'm not following its state.

HTH.

Cheers

--
Qais Yousef

> 
> I almost easily can reproduce the below at boot time for last several releases
> 
>  cblist_init_generic: Setting adjustable number of callback queues.
> 
>  =============================
>  [ BUG: Invalid wait context ]
>  5.16.0-next-20220111-00101-gb3e9f395c6f1 #28 Not tainted
>  -----------------------------
>  swapper/0/1 is trying to lock:
>  ffffffffb3dce758 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x3a8/0x480
>  other info that might help us debug this:
>  context-{5:5}
>  3 locks held by swapper/0/1:
>   #0: ffffffffb2955e08 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1e0
>   #1: ffffffffb2953ec0 (console_lock){+.+.}-{0:0}, at: _printk+0x53/0x6a
>   #2: ffffffffb2873ca0 (console_owner){....}-{0:0}, at: console_unlock+0x157/0x580
>  stack backtrace:
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-next-20220111-00101-gb3e9f395c6f1 #28
>  Hardware name: Intel Corporation CHERRYVIEW D0 PLATFORM/Braswell CRB, BIOS BRAS.X64.B082.R00.1507270557 07/27/2015
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x57/0x7d
>   __lock_acquire.cold+0xca/0x2e8
>   ? stack_trace_save+0x46/0x70
>   lock_acquire+0xd2/0x300
>   ? serial8250_console_write+0x3a8/0x480
>   _raw_spin_lock_irqsave+0x39/0x50
>   ? serial8250_console_write+0x3a8/0x480
>   serial8250_console_write+0x3a8/0x480
>   console_unlock+0x397/0x580
>   vprintk_emit+0x99/0x2e0
>   _printk+0x53/0x6a
>   cblist_init_generic.constprop.0.cold+0x24/0x31
>   rcu_init_tasks_generic+0x5/0xd9
>   kernel_init_freeable+0x14d/0x294
>   ? rest_init+0x250/0x250
>   kernel_init+0x11/0x110
>   ret_from_fork+0x22/0x30
>   </TASK>
>  cblist_init_generic: Setting shift to 2 and lim to 1.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-01-13 11:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-23 23:37 [PATCH 0/2] Fix BUG: Invalid wait context in hrtimer_interrupt() Qais Yousef
2021-01-23 23:37 ` [PATCH 1/2] printk: Add new pr_*_deferred_once() variants Qais Yousef
2021-01-25 10:37   ` Peter Zijlstra
2021-01-25 10:58     ` John Ogness
2021-01-25 19:46       ` Qais Yousef
2022-01-11 13:40       ` Andy Shevchenko
2022-01-13 11:20         ` Qais Yousef
2021-01-23 23:37 ` [PATCH 2/2] hrtimer: Use pr_warn_deferred_once() in hrtimer_interrupt() Qais Yousef
2021-01-26  4:46 ` [PATCH 0/2] Fix BUG: Invalid wait context " Sergey Senozhatsky
2021-01-26 14:59   ` Qais Yousef
2021-01-26 15:22     ` John Ogness
2021-01-26 16:18       ` Peter Zijlstra
2021-01-26 15:58     ` Sergey Senozhatsky
2021-01-26 16:23       ` Peter Zijlstra
2021-01-26 18:08         ` Qais Yousef
2021-01-26 16:58       ` Petr Mladek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.