linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] sched: Use printk_deferred() in __schedule()
@ 2022-02-09 22:13 Stephen Brennan
  2022-02-09 22:13 ` [PATCH 1/1] sched: Use printk_deferred during __schedule() Stephen Brennan
  2022-02-09 23:00 ` [PATCH 0/1] sched: Use printk_deferred() in __schedule() Phil Auld
  0 siblings, 2 replies; 3+ messages in thread
From: Stephen Brennan @ 2022-02-09 22:13 UTC (permalink / raw)
  To: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ben Segall, Stephen Brennan

Hello all,

I have a customer whose systems are encountering a deadlock due to a
warning in the perf_cgroup_switch function, which is called during
__schedule:

  #5 [ffffa82fd8fd3700] native_queued_spin_lock_slowpath at ffffffff8ef0ef2b
  #6 [ffffa82fd8fd3708] queued_spin_lock_slowpath at ffffffff8ef0eafb
  #7 [ffffa82fd8fd3718] _raw_spin_lock at ffffffff8f805a93
  #8 [ffffa82fd8fd3728] try_to_wake_up at ffffffff8eee1f8e
  #9 [ffffa82fd8fd3790] wake_up_process at ffffffff8eee2425
 #10 [ffffa82fd8fd37a0] wake_up_worker at ffffffff8eec67b8
 #11 [ffffa82fd8fd37b0] insert_work at ffffffff8eec7997
 #12 [ffffa82fd8fd37f0] __queue_work at ffffffff8eec7af2
 #13 [ffffa82fd8fd3838] queue_work_on at ffffffff8eec82f8
 #14 [ffffa82fd8fd3850] drm_fb_helper_dirty at ffffffffc1d3cea5 [drm_kms_helper]
 #15 [ffffa82fd8fd3890] drm_fb_helper_sys_imageblit at ffffffffc1d3d010 [drm_kms_helper]
 #16 [ffffa82fd8fd38b0] soft_cursor at ffffffff8f316906
 #17 [ffffa82fd8fd3900] bit_cursor at ffffffff8f3166da
 #18 [ffffa82fd8fd39d0] fbcon_cursor at ffffffff8f311e3f
 #19 [ffffa82fd8fd3a18] hide_cursor at ffffffff8f3c6213
 #20 [ffffa82fd8fd3a30] vt_console_print at ffffffff8f3c7e42
 #21 [ffffa82fd8fd3a98] console_unlock at ffffffff8ef1d4ac
 #22 [ffffa82fd8fd3ae0] vprintk_emit at ffffffff8ef1ecf5
 #23 [ffffa82fd8fd3b40] vprintk_default at ffffffff8ef1ee29
 #24 [ffffa82fd8fd3b50] vprintk_func at ffffffff8ef1f8a4
 #25 [ffffa82fd8fd3b70] printk at ffffffff8ef1bf58
 #26 [ffffa82fd8fd3bd0] report_bug at ffffffff8f7dbbc5
 #27 [ffffa82fd8fd3c10] fixup_bug at ffffffff8ee3592c
 #28 [ffffa82fd8fd3c30] do_error_trap at ffffffff8ee35989
 #29 [ffffa82fd8fd3c78] do_invalid_op at ffffffff8ee35ac2
 #30 [ffffa82fd8fd3ca0] invalid_op at ffffffff8fa013b7
     [exception RIP: perf_cgroup_switch+330]
     RIP: ffffffff8f01c26a  RSP: ffffa82fd8fd3d58  RFLAGS: 00010086
     RAX: 0000000000000002  RBX: ffff99013f7b2708  RCX: 000000000000038f
     RDX: 00000000000105a0  RSI: 0000000000000000  RDI: 000000000000038f
     RBP: ffffa82fd8fd3d98   R8: 000000000000016e   R9: 0000000000000039
     R10: 0000000000000000  R11: 0000000000000000  R12: ffff99013f7a8010
     R13: 0000000000000000  R14: 0000000000000000  R15: ffff99013f7b2700
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #31 [ffffa82fd8fd3d50] perf_cgroup_switch at ffffffff8f01c223
 #32 [ffffa82fd8fd3da0] __perf_event_task_sched_in at ffffffff8f01c538
 #33 [ffffa82fd8fd3e18] finish_task_switch at ffffffff8eedd460
 #34 [ffffa82fd8fd3e50] __schedule at ffffffff8f800794
 #35 [ffffa82fd8fd3ea0] schedule_idle at ffffffff8f800ebc
 #36 [ffffa82fd8fd3eb8] do_idle at ffffffff8eee794d
 #37 [ffffa82fd8fd3f10] cpu_startup_entry at ffffffff8eee7bfd
 #38 [ffffa82fd8fd3f28] start_secondary at ffffffff8ee6f6e9
 #39 [ffffa82fd8fd3f50] secondary_startup_64 at ffffffff8ee000e6

While certainly, the perf bug which results in this warning should be
fixed, and I am working on tracking it down, it's also clear that this
sort of warning should not deadlock the kernel. My proposed change is to
enter a "printk_deferred" section at the beginning of __schedule as the
rq lock is taken, and then end it as the lock is dropped. This style of
fix has already been proposed before [1], but was NAK'd.

I think enough has changed that we should reconsider this change.

First, one reason for the NAK was that a better printk implementation
was forthcoming, which would only write to atomic-safe console
drivers ("sane" drivers as they were called in that discussion),
delegating the rest to a kthread. This implementation is still not
upstream, but in the meantime we've had several kernel releases with
this sort of issue present.

Second, another reason for the NAK was that WARN statements require
immediate printing, since the system may not be stable enough to print
the warning later. Though this is true, warnings are not meant to be
fatal, and we should hope that the system can limp along long enough to
exit a single context switch. Even if that's not possible, the printk
infrastructure contains several ways to allow the buffered messages to
reach the console even if the system crashes later:

1) the log buffer can be written to several places via kmsg_dump()
2) panic tries very hard to empty the log buffer to console drivers
3) crash dumps contain the full log buffer data, and debug tools can
readily reconstruct these records

Third, since the posting of the previous series, commit
57116ce17b04 ("workqueue: fix state-dump console deadlock") was
introduced with no objection, to address the same deadlock in a
workqueue function.

Of course, I'm not bound to this particular solution -- I'd be glad to
discuss alternatives and other ideas. Thanks for reading and
considering!

[1]: https://lore.kernel.org/lkml/20200927161130.33172-1-zhouchengming@bytedance.com/


Stephen Brennan (1):
  sched: Use printk_deferred during __schedule()

 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] sched: Use printk_deferred during __schedule()
  2022-02-09 22:13 [PATCH 0/1] sched: Use printk_deferred() in __schedule() Stephen Brennan
@ 2022-02-09 22:13 ` Stephen Brennan
  2022-02-09 23:00 ` [PATCH 0/1] sched: Use printk_deferred() in __schedule() Phil Auld
  1 sibling, 0 replies; 3+ messages in thread
From: Stephen Brennan @ 2022-02-09 22:13 UTC (permalink / raw)
  To: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot
  Cc: linux-kernel, Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ben Segall, Stephen Brennan

When the rq lock is held and a printk occurs, some console drivers may
trigger a deadlock by calling into scheduler functions (e.g. scheduling
work). Though printk() is avoided in these contexts, there are some WARN
or WARN_ONCE statements which can trigger deadlocks during context
switch operations. These warnings should not trigger deadlocks. Use the
printk_deferred helper to ensure console drivers aren't called until
after the critical section is over.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 848eaa0efe0e..bc8afc3020a4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4861,6 +4861,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	perf_event_task_sched_in(prev, current);
 	finish_task(prev);
 	tick_nohz_task_switch();
+	printk_deferred_exit();
 	finish_lock_switch(rq);
 	finish_arch_post_lock_switch();
 	kcov_finish_switch(current);
@@ -6208,6 +6209,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
+	printk_deferred_enter();
 
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
@@ -6298,6 +6300,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 
 		rq_unpin_lock(rq, &rf);
 		__balance_callbacks(rq);
+		printk_deferred_exit();
 		raw_spin_rq_unlock_irq(rq);
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/1] sched: Use printk_deferred() in __schedule()
  2022-02-09 22:13 [PATCH 0/1] sched: Use printk_deferred() in __schedule() Stephen Brennan
  2022-02-09 22:13 ` [PATCH 1/1] sched: Use printk_deferred during __schedule() Stephen Brennan
@ 2022-02-09 23:00 ` Phil Auld
  1 sibling, 0 replies; 3+ messages in thread
From: Phil Auld @ 2022-02-09 23:00 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	linux-kernel, Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ben Segall

Hi Stephen,

On Wed, Feb 09, 2022 at 02:13:18PM -0800 Stephen Brennan wrote:
> Hello all,
> 
> I have a customer whose systems are encountering a deadlock due to a
> warning in the perf_cgroup_switch function, which is called during
> __schedule:
> 
>   #5 [ffffa82fd8fd3700] native_queued_spin_lock_slowpath at ffffffff8ef0ef2b
>   #6 [ffffa82fd8fd3708] queued_spin_lock_slowpath at ffffffff8ef0eafb
>   #7 [ffffa82fd8fd3718] _raw_spin_lock at ffffffff8f805a93
>   #8 [ffffa82fd8fd3728] try_to_wake_up at ffffffff8eee1f8e
>   #9 [ffffa82fd8fd3790] wake_up_process at ffffffff8eee2425
>  #10 [ffffa82fd8fd37a0] wake_up_worker at ffffffff8eec67b8
>  #11 [ffffa82fd8fd37b0] insert_work at ffffffff8eec7997
>  #12 [ffffa82fd8fd37f0] __queue_work at ffffffff8eec7af2
>  #13 [ffffa82fd8fd3838] queue_work_on at ffffffff8eec82f8
>  #14 [ffffa82fd8fd3850] drm_fb_helper_dirty at ffffffffc1d3cea5 [drm_kms_helper]
>  #15 [ffffa82fd8fd3890] drm_fb_helper_sys_imageblit at ffffffffc1d3d010 [drm_kms_helper]
>  #16 [ffffa82fd8fd38b0] soft_cursor at ffffffff8f316906
>  #17 [ffffa82fd8fd3900] bit_cursor at ffffffff8f3166da
>  #18 [ffffa82fd8fd39d0] fbcon_cursor at ffffffff8f311e3f
>  #19 [ffffa82fd8fd3a18] hide_cursor at ffffffff8f3c6213
>  #20 [ffffa82fd8fd3a30] vt_console_print at ffffffff8f3c7e42
>  #21 [ffffa82fd8fd3a98] console_unlock at ffffffff8ef1d4ac
>  #22 [ffffa82fd8fd3ae0] vprintk_emit at ffffffff8ef1ecf5
>  #23 [ffffa82fd8fd3b40] vprintk_default at ffffffff8ef1ee29
>  #24 [ffffa82fd8fd3b50] vprintk_func at ffffffff8ef1f8a4
>  #25 [ffffa82fd8fd3b70] printk at ffffffff8ef1bf58
>  #26 [ffffa82fd8fd3bd0] report_bug at ffffffff8f7dbbc5
>  #27 [ffffa82fd8fd3c10] fixup_bug at ffffffff8ee3592c
>  #28 [ffffa82fd8fd3c30] do_error_trap at ffffffff8ee35989
>  #29 [ffffa82fd8fd3c78] do_invalid_op at ffffffff8ee35ac2
>  #30 [ffffa82fd8fd3ca0] invalid_op at ffffffff8fa013b7
>      [exception RIP: perf_cgroup_switch+330]
>      RIP: ffffffff8f01c26a  RSP: ffffa82fd8fd3d58  RFLAGS: 00010086
>      RAX: 0000000000000002  RBX: ffff99013f7b2708  RCX: 000000000000038f
>      RDX: 00000000000105a0  RSI: 0000000000000000  RDI: 000000000000038f
>      RBP: ffffa82fd8fd3d98   R8: 000000000000016e   R9: 0000000000000039
>      R10: 0000000000000000  R11: 0000000000000000  R12: ffff99013f7a8010
>      R13: 0000000000000000  R14: 0000000000000000  R15: ffff99013f7b2700
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #31 [ffffa82fd8fd3d50] perf_cgroup_switch at ffffffff8f01c223
>  #32 [ffffa82fd8fd3da0] __perf_event_task_sched_in at ffffffff8f01c538
>  #33 [ffffa82fd8fd3e18] finish_task_switch at ffffffff8eedd460
>  #34 [ffffa82fd8fd3e50] __schedule at ffffffff8f800794
>  #35 [ffffa82fd8fd3ea0] schedule_idle at ffffffff8f800ebc
>  #36 [ffffa82fd8fd3eb8] do_idle at ffffffff8eee794d
>  #37 [ffffa82fd8fd3f10] cpu_startup_entry at ffffffff8eee7bfd
>  #38 [ffffa82fd8fd3f28] start_secondary at ffffffff8ee6f6e9
>  #39 [ffffa82fd8fd3f50] secondary_startup_64 at ffffffff8ee000e6
> 
> While certainly, the perf bug which results in this warning should be
> fixed, and I am working on tracking it down, it's also clear that this
> sort of warning should not deadlock the kernel. My proposed change is to
> enter a "printk_deferred" section at the beginning of __schedule as the
> rq lock is taken, and then end it as the lock is dropped. This style of
> fix has already been proposed before [1], but was NAK'd.
> 
> I think enough has changed that we should reconsider this change.
> 
> First, one reason for the NAK was that a better printk implementation
> was forthcoming, which would only write to atomic-safe console
> drivers ("sane" drivers as they were called in that discussion),
> delegating the rest to a kthread. This implementation is still not
> upstream, but in the meantime we've had several kernel releases with
> this sort of issue present.
> 
> Second, another reason for the NAK was that WARN statements require
> immediate printing, since the system may not be stable enough to print
> the warning later. Though this is true, warnings are not meant to be
> fatal, and we should hope that the system can limp along long enough to
> exit a single context switch. Even if that's not possible, the printk
> infrastructure contains several ways to allow the buffered messages to
> reach the console even if the system crashes later:
> 
> 1) the log buffer can be written to several places via kmsg_dump()
> 2) panic tries very hard to empty the log buffer to console drivers
> 3) crash dumps contain the full log buffer data, and debug tools can
> readily reconstruct these records
> 
> Third, since the posting of the previous series, commit
> 57116ce17b04 ("workqueue: fix state-dump console deadlock") was
> introduced with no objection, to address the same deadlock in a
> workqueue function.
> 
> Of course, I'm not bound to this particular solution -- I'd be glad to
> discuss alternatives and other ideas. Thanks for reading and
> considering!
> 
> [1]: https://lore.kernel.org/lkml/20200927161130.33172-1-zhouchengming@bytedance.com/
> 

Thanks for raising this. I've got a bug report with the same issue. In this
case we hit the warning in cfs_rq_is_decayed(). That should be less likely
to fire now with recent changes but that's only a bandaid.

I think the printk rework is coming Real Soon Now, but it's not here yet as
you say. It would be helpful to get some visibility back on this issue. This
seems like a reasonable way out in the meantime.


Cheers,
Phil


> 
> Stephen Brennan (1):
>   sched: Use printk_deferred during __schedule()
> 
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> -- 
> 2.30.2
> 

-- 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-02-09 23:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 22:13 [PATCH 0/1] sched: Use printk_deferred() in __schedule() Stephen Brennan
2022-02-09 22:13 ` [PATCH 1/1] sched: Use printk_deferred during __schedule() Stephen Brennan
2022-02-09 23:00 ` [PATCH 0/1] sched: Use printk_deferred() in __schedule() Phil Auld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).