* [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock
@ 2021-03-27 23:25 Waiman Long
  2021-03-27 23:25 ` [PATCH 2/2] sched/debug: Use sched_debug_lock to serialize use of cgroup_path[] only Waiman Long
  2021-03-29 10:23 ` [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Daniel Thompson
  0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2021-03-27 23:25 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira
  Cc: Bharata B Rao, Phil Auld, linux-kernel, Waiman Long

The sched_debug_lock was used only in print_cpu().  The
print_cpu() function has two callers - sched_debug_show() and
sysrq_sched_debug_show(). Both of them are invoked by user action
(sched_debug file and sysrq-t). As print_cpu() won't be called from
interrupt context at all, there is no point in disabling IRQ when
acquiring sched_debug_lock.
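
For illustration, a rough sketch of the two call paths described above
(not part of the patch, paths only approximate):

/*
 *   read of /proc/sched_debug -> sched_debug_show()       -> print_cpu()
 *   sysrq-t                   -> sysrq_sched_debug_show() -> print_cpu()
 *
 * Neither path was expected to run in interrupt context, hence dropping
 * the IRQ-disabling lock variant.
 */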

Besides, if the system has many cpus and the sched_debug_lock is somehow
busy (e.g. parallel sysrq-t), the system may hit a hard lockup panic, like

[ 7809.796262] Kernel panic - not syncing: Hard LOCKUP
[ 7809.796264] CPU: 13 PID: 79867 Comm: reproducer.sh Kdump: loaded Tainted: G          I      --------- -  - 4.18.0-301.el8.x86_64 #1
[ 7809.796264] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
[ 7809.796265] Call Trace:
[ 7809.796265]  <NMI>
[ 7809.796266]  dump_stack+0x5c/0x80
[ 7809.796266]  panic+0xe7/0x2a9
[ 7809.796267]  nmi_panic.cold.9+0xc/0xc
[ 7809.796267]  watchdog_overflow_callback.cold.7+0x5c/0x70
[ 7809.796268]  __perf_event_overflow+0x52/0xf0
[ 7809.796268]  handle_pmi_common+0x204/0x2a0
[ 7809.796269]  ? __set_pte_vaddr+0x32/0x50
[ 7809.796269]  ? __native_set_fixmap+0x24/0x30
[ 7809.796270]  ? ghes_copy_tofrom_phys+0xd3/0x1c0
[ 7809.796271]  intel_pmu_handle_irq+0xbf/0x160
[ 7809.796271]  perf_event_nmi_handler+0x2d/0x50
[ 7809.796272]  nmi_handle+0x63/0x110
[ 7809.796272]  default_do_nmi+0x49/0x100
[ 7809.796273]  do_nmi+0x17e/0x1e0
[ 7809.796273]  end_repeat_nmi+0x16/0x6f
[ 7809.796274] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1d0
[ 7809.796275] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
[ 7809.796276] RSP: 0018:ffffaa54cd887df8 EFLAGS: 00000002
[ 7809.796277] RAX: 0000000000000101 RBX: 0000000000000246 RCX: 0000000000000000
[ 7809.796278] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff936b66d0
[ 7809.796278] RBP: ffffffff9301fb40 R08: 0000000000000004 R09: 000000000000004f
[ 7809.796279] R10: 0000000000000000 R11: ffffaa54cd887cc0 R12: ffff907fd0a29ec0
[ 7809.796280] R13: 0000000000000000 R14: ffffffff926ab7c0 R15: 0000000000000000
[ 7809.796280]  ? native_queued_spin_lock_slowpath+0x5b/0x1d0
[ 7809.796281]  ? native_queued_spin_lock_slowpath+0x5b/0x1d0
[ 7809.796281]  </NMI>
[ 7809.796282]  _raw_spin_lock_irqsave+0x32/0x40
[ 7809.796283]  print_cpu+0x261/0x7c0
[ 7809.796283]  sysrq_sched_debug_show+0x34/0x50
[ 7809.796284]  sysrq_handle_showstate+0xc/0x20
[ 7809.796284]  __handle_sysrq.cold.11+0x48/0xfb
[ 7809.796285]  write_sysrq_trigger+0x2b/0x30
[ 7809.796285]  proc_reg_write+0x39/0x60
[ 7809.796286]  vfs_write+0xa5/0x1a0
[ 7809.796286]  ksys_write+0x4f/0xb0
[ 7809.796287]  do_syscall_64+0x5b/0x1a0
[ 7809.796287]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 7809.796288] RIP: 0033:0x7fabe4ceb648

It is not desirable to disable IRQ for so long that a hard lockup panic is
triggered. Fix that by not disabling IRQ when taking sched_debug_lock.

Fixes: efe25c2c7b3a ("sched: Reinstate group names in /proc/sched_debug")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/debug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 486f403a778b..c4ae8a0853a1 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -666,7 +666,6 @@ void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq)
 static void print_cpu(struct seq_file *m, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	unsigned long flags;
 
 #ifdef CONFIG_X86
 	{
@@ -717,13 +716,13 @@ do {									\
 	}
 #undef P
 
-	spin_lock_irqsave(&sched_debug_lock, flags);
+	spin_lock(&sched_debug_lock);
 	print_cfs_stats(m, cpu);
 	print_rt_stats(m, cpu);
 	print_dl_stats(m, cpu);
 
 	print_rq(m, rq, cpu);
-	spin_unlock_irqrestore(&sched_debug_lock, flags);
+	spin_unlock(&sched_debug_lock);
 	SEQ_printf(m, "\n");
 }
 
-- 
2.18.1


* [PATCH 2/2] sched/debug: Use sched_debug_lock to serialize use of cgroup_path[] only
  2021-03-27 23:25 [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Waiman Long
@ 2021-03-27 23:25 ` Waiman Long
  2021-03-29 10:23 ` [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Daniel Thompson
  1 sibling, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-03-27 23:25 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira
  Cc: Bharata B Rao, Phil Auld, linux-kernel, Waiman Long

The purpose of sched_debug_lock is to serialize the use of the global
cgroup_path[] buffer in print_cpu(). The rest of the printf calls don't
need serialization from sched_debug_lock. Since printing the sched debug
data to the console can take quite a while, taking sched_debug_lock at
the print_cpu() level unnecessarily blocks the progress of other
print_cpu() users. Fix that by holding sched_debug_lock only when using
cgroup_path[] via task_group_path().
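
For context, an approximate sketch of the code being patched, showing why
task_group_path() needs serializing - every caller shares one global
PATH_MAX-sized buffer:

#ifdef CONFIG_CGROUP_SCHED
static DEFINE_SPINLOCK(sched_debug_lock);
static char group_path[PATH_MAX];

static char *task_group_path(struct task_group *tg)
{
	if (autogroup_path(tg, group_path, PATH_MAX))
		return group_path;

	/* cgroup_path() fills the single shared group_path[] buffer */
	cgroup_path(tg->css.cgroup, group_path, PATH_MAX);

	return group_path;
}
#endif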

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/sched/debug.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index c4ae8a0853a1..87168d13714f 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -8,8 +8,6 @@
  */
 #include "sched.h"
 
-static DEFINE_SPINLOCK(sched_debug_lock);
-
 /*
  * This allows printing both to /proc/sched_debug and
  * to the console
@@ -470,6 +468,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 #endif
 
 #ifdef CONFIG_CGROUP_SCHED
+static DEFINE_SPINLOCK(sched_debug_lock);
 static char group_path[PATH_MAX];
 
 static char *task_group_path(struct task_group *tg)
@@ -506,7 +505,9 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 	SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
 #endif
 #ifdef CONFIG_CGROUP_SCHED
+	spin_lock(&sched_debug_lock);
 	SEQ_printf(m, " %s", task_group_path(task_group(p)));
+	spin_unlock(&sched_debug_lock);
 #endif
 
 	SEQ_printf(m, "\n");
@@ -543,7 +544,9 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	SEQ_printf(m, "\n");
+	spin_lock(&sched_debug_lock);
 	SEQ_printf(m, "cfs_rq[%d]:%s\n", cpu, task_group_path(cfs_rq->tg));
+	spin_unlock(&sched_debug_lock);
 #else
 	SEQ_printf(m, "\n");
 	SEQ_printf(m, "cfs_rq[%d]:\n", cpu);
@@ -614,7 +617,9 @@ void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
 {
 #ifdef CONFIG_RT_GROUP_SCHED
 	SEQ_printf(m, "\n");
+	spin_lock(&sched_debug_lock);
 	SEQ_printf(m, "rt_rq[%d]:%s\n", cpu, task_group_path(rt_rq->tg));
+	spin_unlock(&sched_debug_lock);
 #else
 	SEQ_printf(m, "\n");
 	SEQ_printf(m, "rt_rq[%d]:\n", cpu);
@@ -716,13 +721,11 @@ do {									\
 	}
 #undef P
 
-	spin_lock(&sched_debug_lock);
 	print_cfs_stats(m, cpu);
 	print_rt_stats(m, cpu);
 	print_dl_stats(m, cpu);
 
 	print_rq(m, rq, cpu);
-	spin_unlock(&sched_debug_lock);
 	SEQ_printf(m, "\n");
 }
 
-- 
2.18.1


* Re: [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock
  2021-03-27 23:25 [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Waiman Long
  2021-03-27 23:25 ` [PATCH 2/2] sched/debug: Use sched_debug_lock to serialize use of cgroup_path[] only Waiman Long
@ 2021-03-29 10:23 ` Daniel Thompson
  2021-03-29 11:19   ` Peter Zijlstra
  2021-03-29 14:23   ` Waiman Long
  1 sibling, 2 replies; 5+ messages in thread
From: Daniel Thompson @ 2021-03-29 10:23 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Bharata B Rao, Phil Auld,
	linux-kernel

On Sat, Mar 27, 2021 at 07:25:28PM -0400, Waiman Long wrote:
> The sched_debug_lock was used only in print_cpu().  The
> print_cpu() function has two callers - sched_debug_show() and
> sysrq_sched_debug_show(). Both of them are invoked by user action
> (sched_debug file and sysrq-t). As print_cpu() won't be called from
> interrupt context at all, there is no point in disabling IRQ when
> acquiring sched_debug_lock.

This looks like it introduces a deadlock risk if sysrq-t triggers from an
interrupt context. Has the behaviour of sysrq changed recently or will
tools like MAGIC_SYSRQ_SERIAL still trigger from interrupt context?
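
For illustration, a minimal sketch of the kind of self-deadlock I have in
mind (the exact call chain is an assumption about a serial console sysrq,
not something taken from the patch):

/*
 *   Task context:  echo t > /proc/sysrq-trigger
 *     print_cpu()
 *       spin_lock(&sched_debug_lock);    lock taken, IRQs left enabled
 *       ... serial interrupt arrives on the same CPU ...
 *
 *   IRQ context:   sysrq 't' received on the serial console
 *     ... -> print_cpu()
 *       spin_lock(&sched_debug_lock);    spins forever, the owner is the
 *                                        task this interrupt preempted
 */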


Daniel.


> 
> Besides, if the system has many cpus and the sched_debug_lock is somehow
> busy (e.g. parallel sysrq-t), the system may hit a hard lockup panic, like
> 
> [ 7809.796262] Kernel panic - not syncing: Hard LOCKUP
> [ 7809.796264] CPU: 13 PID: 79867 Comm: reproducer.sh Kdump: loaded Tainted: G          I      --------- -  - 4.18.0-301.el8.x86_64 #1
> [ 7809.796264] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.9 06/29/2018
> [ 7809.796265] Call Trace:
> [ 7809.796265]  <NMI>
> [ 7809.796266]  dump_stack+0x5c/0x80
> [ 7809.796266]  panic+0xe7/0x2a9
> [ 7809.796267]  nmi_panic.cold.9+0xc/0xc
> [ 7809.796267]  watchdog_overflow_callback.cold.7+0x5c/0x70
> [ 7809.796268]  __perf_event_overflow+0x52/0xf0
> [ 7809.796268]  handle_pmi_common+0x204/0x2a0
> [ 7809.796269]  ? __set_pte_vaddr+0x32/0x50
> [ 7809.796269]  ? __native_set_fixmap+0x24/0x30
> [ 7809.796270]  ? ghes_copy_tofrom_phys+0xd3/0x1c0
> [ 7809.796271]  intel_pmu_handle_irq+0xbf/0x160
> [ 7809.796271]  perf_event_nmi_handler+0x2d/0x50
> [ 7809.796272]  nmi_handle+0x63/0x110
> [ 7809.796272]  default_do_nmi+0x49/0x100
> [ 7809.796273]  do_nmi+0x17e/0x1e0
> [ 7809.796273]  end_repeat_nmi+0x16/0x6f
> [ 7809.796274] RIP: 0010:native_queued_spin_lock_slowpath+0x5b/0x1d0
> [ 7809.796275] Code: 6d f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00
> [ 7809.796276] RSP: 0018:ffffaa54cd887df8 EFLAGS: 00000002
> [ 7809.796277] RAX: 0000000000000101 RBX: 0000000000000246 RCX: 0000000000000000
> [ 7809.796278] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff936b66d0
> [ 7809.796278] RBP: ffffffff9301fb40 R08: 0000000000000004 R09: 000000000000004f
> [ 7809.796279] R10: 0000000000000000 R11: ffffaa54cd887cc0 R12: ffff907fd0a29ec0
> [ 7809.796280] R13: 0000000000000000 R14: ffffffff926ab7c0 R15: 0000000000000000
> [ 7809.796280]  ? native_queued_spin_lock_slowpath+0x5b/0x1d0
> [ 7809.796281]  ? native_queued_spin_lock_slowpath+0x5b/0x1d0
> [ 7809.796281]  </NMI>
> [ 7809.796282]  _raw_spin_lock_irqsave+0x32/0x40
> [ 7809.796283]  print_cpu+0x261/0x7c0
> [ 7809.796283]  sysrq_sched_debug_show+0x34/0x50
> [ 7809.796284]  sysrq_handle_showstate+0xc/0x20
> [ 7809.796284]  __handle_sysrq.cold.11+0x48/0xfb
> [ 7809.796285]  write_sysrq_trigger+0x2b/0x30
> [ 7809.796285]  proc_reg_write+0x39/0x60
> [ 7809.796286]  vfs_write+0xa5/0x1a0
> [ 7809.796286]  ksys_write+0x4f/0xb0
> [ 7809.796287]  do_syscall_64+0x5b/0x1a0
> [ 7809.796287]  entry_SYSCALL_64_after_hwframe+0x65/0xca
> [ 7809.796288] RIP: 0033:0x7fabe4ceb648
> 
> It is not desirable to disable IRQ for so long that a hard lockup panic is
> triggered. Fix that by not disabling IRQ when taking sched_debug_lock.
> 
> Fixes: efe25c2c7b3a ("sched: Reinstate group names in /proc/sched_debug")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  kernel/sched/debug.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 486f403a778b..c4ae8a0853a1 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -666,7 +666,6 @@ void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq)
>  static void print_cpu(struct seq_file *m, int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> -	unsigned long flags;
>  
>  #ifdef CONFIG_X86
>  	{
> @@ -717,13 +716,13 @@ do {									\
>  	}
>  #undef P
>  
> -	spin_lock_irqsave(&sched_debug_lock, flags);
> +	spin_lock(&sched_debug_lock);
>  	print_cfs_stats(m, cpu);
>  	print_rt_stats(m, cpu);
>  	print_dl_stats(m, cpu);
>  
>  	print_rq(m, rq, cpu);
> -	spin_unlock_irqrestore(&sched_debug_lock, flags);
> +	spin_unlock(&sched_debug_lock);
>  	SEQ_printf(m, "\n");
>  }
>  
> -- 
> 2.18.1
> 

* Re: [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock
  2021-03-29 10:23 ` [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Daniel Thompson
@ 2021-03-29 11:19   ` Peter Zijlstra
  2021-03-29 14:23   ` Waiman Long
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2021-03-29 11:19 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Waiman Long, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Bharata B Rao, Phil Auld,
	linux-kernel

On Mon, Mar 29, 2021 at 11:23:45AM +0100, Daniel Thompson wrote:
> On Sat, Mar 27, 2021 at 07:25:28PM -0400, Waiman Long wrote:
> > The sched_debug_lock was used only in print_cpu().  The
> > print_cpu() function has two callers - sched_debug_show() and
> > sysrq_sched_debug_show(). Both of them are invoked by user action
> > (sched_debug file and sysrq-t). As print_cpu() won't be called from
> > interrupt context at all, there is no point in disabling IRQ when
> > acquiring sched_debug_lock.
> 
> This looks like it introduces a deadlock risk if sysrq-t triggers from an
> interrupt context. Has the behaviour of sysrq changed recently or will
> tools like MAGIC_SYSRQ_SERIAL still trigger from interrupt context?

Yeah, sysrq-t is very often interrupt context. The patch is clearly
bogus.

* Re: [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock
  2021-03-29 10:23 ` [PATCH 1/2] sched/debug: Don't disable IRQ when acquiring sched_debug_lock Daniel Thompson
  2021-03-29 11:19   ` Peter Zijlstra
@ 2021-03-29 14:23   ` Waiman Long
  1 sibling, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-03-29 14:23 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Bharata B Rao, Phil Auld,
	linux-kernel

On 3/29/21 6:23 AM, Daniel Thompson wrote:
> On Sat, Mar 27, 2021 at 07:25:28PM -0400, Waiman Long wrote:
>> The sched_debug_lock was used only in print_cpu().  The
>> print_cpu() function has two callers - sched_debug_show() and
>> sysrq_sched_debug_show(). Both of them are invoked by user action
>> (sched_debug file and sysrq-t). As print_cpu() won't be called from
>> interrupt context at all, there is no point in disabling IRQ when
>> acquiring sched_debug_lock.
> This looks like it introduces a deadlock risk if sysrq-t triggers from an
> interrupt context. Has the behaviour of sysrq changed recently or will
> tools like MAGIC_SYSRQ_SERIAL still trigger from interrupt context?

You are right. It looks like, if CONFIG_MAGIC_SYSRQ_SERIAL is set, it is
possible for the sysrq information dump to happen in interrupt context. I
missed that in my initial analysis. However, doing the time-consuming info
dump with interrupts disabled for an extended period of time is not a
good idea.
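
Roughly, the path I missed looks something like the following (function
names from memory, the details may differ by driver):

/*
 *   serial port interrupt handler
 *     uart_handle_sysrq_char(port, 't')
 *       handle_sysrq('t')
 *         sysrq_handle_showstate()
 *           show_state() -> sysrq_sched_debug_show()
 *             print_cpu()              now running in hard IRQ context
 */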

Still, with my second patch minimizing the size of the critical sections,
the chance of causing trouble is minimized, except when the dump is done
directly to the console.

Thanks,
Longman

