All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: SysRq nice-all-RT-tasks is broken
@ 2017-03-08 15:23 Laurent Dufour
  2017-03-08 16:51 ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Laurent Dufour @ 2017-03-08 15:23 UTC (permalink / raw)
  To: Andrew Morton, Steven Rostedt
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar

Hi,

It appears that triggering the SysRq nice-all-RT-tasks from the console
while a real task is active is leading to panic the system like this :

 sysrq: SysRq : Nice All RT Tasks
 ------------[ cut here ]------------
 kernel BUG at /build/linux-twbIHf/linux-4.10.0/kernel/sched/core.c:4089!
 Oops: Exception in kernel mode, sig: 5 [#1]
 SMP NR_CPUS=2048
 NUMA
 pSeries
 Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace
fscache pseries_rng vmx_crypto binfmt_misc dm_multipath scsi_dh_rdac
scsi_dh_emc scsi_dh_alua sunrpc ip_tables x_tables autofs4 btrfs xor
raid6_pq ses enclosure scsi_transport_sas crc32c_vpmsum mlx5_core be2net
ipr devlink
 CPU: 100 PID: 0 Comm: swapper/100 Not tainted 4.10.0-8-generic #10-Ubuntu
 task: c0000003b6179c00 task.stack: c0000003b620c000
 NIP: c0000000001044f0 LR: c00000000010e524 CTR: c0000000006d0ec0
 REGS: c00000077fc978d0 TRAP: 0700   Not tainted  (4.10.0-8-generic)
 MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
   CR: 28002228  XER: 20000001
 CFAR: c00000000010e520 SOFTE: 0
 GPR00: c00000000010e524 c00000077fc97b50 c00000000143c900 c00000038785fa00
 GPR04: c00000077fc97c00 0000000000000000 0000000000000000 ffffffffffffffff
 GPR08: 0000000000000000 0000000000010000 0000000000000000 0000000000000406
 GPR12: 0000000028002228 c000000007b58400 c0000003b620ff90 000000001ede2d80
 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 GPR20: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
 GPR24: 0000000000000000 c00000077a0f8b50 0000000000000063 0000000000000000
 GPR28: c00000038785fa00 c00000077fc97c00 0000000000000000 c00000038785fa00
 NIP [c0000000001044f0] __sched_setscheduler+0x90/0xd30
 LR [c00000000010e524] normalize_rt_tasks+0x184/0x1c0
 Call Trace:
 [c00000077fc97be0] [c00000000010e524] normalize_rt_tasks+0x184/0x1c0
 [c00000077fc97c60] [c0000000006d0ee0] sysrq_handle_unrt+0x20/0x40
 [c00000077fc97c80] [c0000000006d1c98] __handle_sysrq+0xe8/0x280
 [c00000077fc97d20] [c0000000006eab38] hvc_poll+0x1c8/0x360
 [c00000077fc97dc0] [c0000000006ebd74] hvc_handle_interrupt+0x24/0x50
 [c00000077fc97de0] [c0000000001453d0] __handle_irq_event_percpu+0x90/0x2c0
 [c00000077fc97ea0] [c000000000145638] handle_irq_event_percpu+0x38/0x90
 [c00000077fc97ee0] [c0000000001456f4] handle_irq_event+0x64/0xb0
 [c00000077fc97f10] [c00000000014add8] handle_fasteoi_irq+0xe8/0x290
 [c00000077fc97f40] [c000000000143ebc] generic_handle_irq+0x4c/0x80
 [c00000077fc97f60] [c0000000000167b0] __do_irq+0x80/0x1d0
 [c00000077fc97f90] [c000000000029414] call_do_irq+0x14/0x24
 [c0000003b620fa40] [c000000000016994] do_IRQ+0x94/0x140
 [c0000003b620fa80] [c000000000008be4] hardware_interrupt_common+0x114/0x120
 --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
     LR = check_and_cede_processor+0x34/0x50
 [c0000003b620fd70] [c0000003b620fe10] 0xc0000003b620fe10 (unreliable)
 [c0000003b620fdd0] [c00000000095db50] shared_cede_loop+0x50/0x150
 [c0000003b620fe00] [c00000000095accc] cpuidle_enter_state+0x16c/0x430
 [c0000003b620fe60] [c00000000012c8bc] call_cpuidle+0x4c/0x90
 [c0000003b620fe80] [c00000000012cce0] do_idle+0x2c0/0x330
 [c0000003b620ff00] [c00000000012cfa8] cpu_startup_entry+0x38/0x50
 [c0000003b620ff30] [c000000000044180] start_secondary+0x330/0x370
 [c0000003b620ff90] [c00000000000aa6c] start_secondary_prolog+0x10/0x14
 Instruction dump:
 7c7f1b78 7cb62b78 7cdb3378 2b8a0006 7d5507b4 419e0010 83440014 235a0063
 7f5a07b4 78290464 8129000c 552902ee <0b090000> 2f950000 419c04b8 2f950005
 ---[ end trace 891618fbca78ebe7 ]---

I got it on Power and on X86_64, but I guess it should happen in all
architectures.
Here are the steps to recreate it :
1. Create a RT task : sudo chrt -f 50 /bin/sleep 999999
2. On the console trigger the 'nice-all-RT-tasks' SYS-RQ.

The panic is triggered by the BUG_ON(in_interrupt()) introduced by this
commit:

66e5393a78b3 ("[PATCH] BUG() if setscheduler is called from interrupt
context")

Since SysRq is run from the interrupt context, the panic is expected.

Looking at the code, I'm wondering if the BUG_ON() is still required in
__sched_setscheduler(). But I'm not so confident, so requesting your
advise here.

Thanks,
Laurent.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 15:23 RFC: SysRq nice-all-RT-tasks is broken Laurent Dufour
@ 2017-03-08 16:51 ` Steven Rostedt
  2017-03-08 16:57   ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2017-03-08 16:51 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar

On Wed, 8 Mar 2017 16:23:35 +0100
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> I got it on Power and on X86_64, but I guess it should happen in all
> architectures.
> Here are the steps to recreate it :
> 1. Create a RT task : sudo chrt -f 50 /bin/sleep 999999
> 2. On the console trigger the 'nice-all-RT-tasks' SYS-RQ.
> 
> The panic is triggered by the BUG_ON(in_interrupt()) introduced by this
> commit:
> 
> 66e5393a78b3 ("[PATCH] BUG() if setscheduler is called from interrupt
> context")
> 
> Since SysRq is run from the interrupt context, the panic is expected.
> 
> Looking at the code, I'm wondering if the BUG_ON() is still required in
> __sched_setscheduler(). But I'm not so confident, so requesting your
> advise here.
> 

Hmm, that commit was added in 2.6.18, and you're right, a lot has
changed since then. Have you tried removing it and running it under
lockdep, and see if it triggers any warnings?

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 16:51 ` Steven Rostedt
@ 2017-03-08 16:57   ` Steven Rostedt
  2017-03-08 17:03     ` Laurent Dufour
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2017-03-08 16:57 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar

On Wed, 8 Mar 2017 11:51:14 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:


> Hmm, that commit was added in 2.6.18, and you're right, a lot has
> changed since then. Have you tried removing it and running it under
> lockdep, and see if it triggers any warnings?

I did a little digging, and it appears that its the rt mutex wait lock
that the comment was referring to. Today that spin lock is irq safe. I
believe its safe to remove the BUG_ON(). Want me to send a patch?

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 16:57   ` Steven Rostedt
@ 2017-03-08 17:03     ` Laurent Dufour
  2017-03-08 17:40       ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Laurent Dufour @ 2017-03-08 17:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar

On 08/03/2017 17:57, Steven Rostedt wrote:
> On Wed, 8 Mar 2017 11:51:14 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> 
>> Hmm, that commit was added in 2.6.18, and you're right, a lot has
>> changed since then. Have you tried removing it and running it under
>> lockdep, and see if it triggers any warnings?
> 
> I did a little digging, and it appears that its the rt mutex wait lock
> that the comment was referring to. Today that spin lock is irq safe. I
> believe its safe to remove the BUG_ON(). Want me to send a patch?

Sure, go ahead ;)

Thanks,
Laurent.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 17:03     ` Laurent Dufour
@ 2017-03-08 17:40       ` Steven Rostedt
  2017-03-08 17:46         ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2017-03-08 17:40 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra


[
  Added Peter

   Update: Laurent noticed that sysrq 'n' (nice-all-RT-tasks) calls
   __sched_setscheduler() form interrupt context. At the start of that
   function, there's a BUG_ON(in_interrupt()). The reason for that was
   due to the rt mutex pi code calling wait_lock. Which was not irq
   safe. Now it is, but that's not good enough.
]

On Wed, 8 Mar 2017 18:03:55 +0100
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> On 08/03/2017 17:57, Steven Rostedt wrote:
> > On Wed, 8 Mar 2017 11:51:14 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> >   
> >> Hmm, that commit was added in 2.6.18, and you're right, a lot has
> >> changed since then. Have you tried removing it and running it under
> >> lockdep, and see if it triggers any warnings?  
> > 
> > I did a little digging, and it appears that its the rt mutex wait lock
> > that the comment was referring to. Today that spin lock is irq safe. I
> > believe its safe to remove the BUG_ON(). Want me to send a patch?  
> 
> Sure, go ahead ;)
>

Actually, it's still not safe :-/

I just noticed this in the call path:

	raw_spin_unlock_irq(&task->pi_lock);

As well as other raw_spin_unlock_irq()s.

Which would enable interrupts regardless of the previous state.

One solution is to change all those to irqsave() but that seems to be a
big step for something that is rarely done (how many years has it been
since 2.6.18?).

I wonder if we should just have a special flag sent by that sysrq
trigger. Since it is causing all tasks to go "nice" there's no need to
do the pi chain walk in __sched_setscheduler().

-- Steve

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 17:40       ` Steven Rostedt
@ 2017-03-08 17:46         ` Steven Rostedt
  2017-03-09  9:02           ` Laurent Dufour
  2017-05-23  8:46           ` [tip:sched/core] sched/core: Allow __sched_setscheduler() in interrupts when PI is not used tip-bot for Steven Rostedt (VMware)
  0 siblings, 2 replies; 8+ messages in thread
From: Steven Rostedt @ 2017-03-08 17:46 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra

On Wed, 8 Mar 2017 12:40:12 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I wonder if we should just have a special flag sent by that sysrq
> trigger. Since it is causing all tasks to go "nice" there's no need to
> do the pi chain walk in __sched_setscheduler().

Hah, there already is a flag!

Laurent, can you test this patch:

-- Steve

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b31fc0..7292fa9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4129,8 +4129,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 	struct rq *rq;
 
-	/* May grab non-irq protected spin_locks: */
-	BUG_ON(in_interrupt());
+	/* The pi code expects interrupts enabled */
+	BUG_ON(pi && in_interrupt());
 recheck:
 	/* Double check policy once rq lock held: */
 	if (policy < 0) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: RFC: SysRq nice-all-RT-tasks is broken
  2017-03-08 17:46         ` Steven Rostedt
@ 2017-03-09  9:02           ` Laurent Dufour
  2017-05-23  8:46           ` [tip:sched/core] sched/core: Allow __sched_setscheduler() in interrupts when PI is not used tip-bot for Steven Rostedt (VMware)
  1 sibling, 0 replies; 8+ messages in thread
From: Laurent Dufour @ 2017-03-09  9:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra

On 08/03/2017 18:46, Steven Rostedt wrote:
> On Wed, 8 Mar 2017 12:40:12 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> I wonder if we should just have a special flag sent by that sysrq
>> trigger. Since it is causing all tasks to go "nice" there's no need to
>> do the pi chain walk in __sched_setscheduler().
> 
> Hah, there already is a flag!
> 
> Laurent, can you test this patch:

Tested on ppc64, no more panic \o/

FWIW,
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Thanks,
Laurent.

> 
> -- Steve
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3b31fc0..7292fa9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4129,8 +4129,8 @@ static int __sched_setscheduler(struct task_struct *p,
>  	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
>  	struct rq *rq;
> 
> -	/* May grab non-irq protected spin_locks: */
> -	BUG_ON(in_interrupt());
> +	/* The pi code expects interrupts enabled */
> +	BUG_ON(pi && in_interrupt());
>  recheck:
>  	/* Double check policy once rq lock held: */
>  	if (policy < 0) {
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:sched/core] sched/core: Allow __sched_setscheduler() in interrupts when PI is not used
  2017-03-08 17:46         ` Steven Rostedt
  2017-03-09  9:02           ` Laurent Dufour
@ 2017-05-23  8:46           ` tip-bot for Steven Rostedt (VMware)
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Steven Rostedt (VMware) @ 2017-05-23  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rostedt, ldufour, hpa, peterz, torvalds, mingo, tglx, linux-kernel, akpm

Commit-ID:  896bbb2522587e3b8eb2a0d204d43ccc1042a00d
Gitweb:     http://git.kernel.org/tip/896bbb2522587e3b8eb2a0d204d43ccc1042a00d
Author:     Steven Rostedt (VMware) <rostedt@goodmis.org>
AuthorDate: Thu, 9 Mar 2017 10:18:42 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 23 May 2017 10:01:34 +0200

sched/core: Allow __sched_setscheduler() in interrupts when PI is not used

When priority inheritance was added back in 2.6.18 to sched_setscheduler(), it
added a path to taking an rt-mutex wait_lock, which is not IRQ safe. As PI
is not a common occurrence, lockdep will likely never trigger if
sched_setscheduler was called from interrupt context. A BUG_ON() was added
to trigger if __sched_setscheduler() was ever called from interrupt context
because there was a possibility to take the wait_lock.

Today the wait_lock is irq safe, but the path to taking it in
sched_setscheduler() is the same as the path to taking it from normal
context. The wait_lock is taken with raw_spin_lock_irq() and released with
raw_spin_unlock_irq() which will indiscriminately enable interrupts,
which would be bad in interrupt context.

The problem is that normalize_rt_tasks, which is called by triggering the
sysrq nice-all-RT-tasks was changed to call __sched_setscheduler(), and this
is done from interrupt context!

Now __sched_setscheduler() takes a "pi" parameter that is used to know if
the priority inheritance should be called or not. As the BUG_ON() only cares
about calling the PI code, it should only bug if called from interrupt
context with the "pi" parameter set to true.

Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: dbc7f069b93a ("sched: Use replace normalize_task() with __sched_setscheduler()")
Link: http://lkml.kernel.org/r/20170308124654.10e598f2@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4a31239..877241e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4188,8 +4188,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 	struct rq *rq;
 
-	/* May grab non-irq protected spin_locks: */
-	BUG_ON(in_interrupt());
+	/* The pi code expects interrupts enabled */
+	BUG_ON(pi && in_interrupt());
 recheck:
 	/* Double check policy once rq lock held: */
 	if (policy < 0) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-23  8:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-08 15:23 RFC: SysRq nice-all-RT-tasks is broken Laurent Dufour
2017-03-08 16:51 ` Steven Rostedt
2017-03-08 16:57   ` Steven Rostedt
2017-03-08 17:03     ` Laurent Dufour
2017-03-08 17:40       ` Steven Rostedt
2017-03-08 17:46         ` Steven Rostedt
2017-03-09  9:02           ` Laurent Dufour
2017-05-23  8:46           ` [tip:sched/core] sched/core: Allow __sched_setscheduler() in interrupts when PI is not used tip-bot for Steven Rostedt (VMware)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.