linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
       [not found]                 ` <20181207095004.GB3729@jagdpanzerIV>
@ 2018-12-10  9:45                   ` Feng Tang
  2018-12-10 15:57                     ` Petr Mladek
  2018-12-11  8:00                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 8+ messages in thread
From: Feng Tang @ 2018-12-10  9:45 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

Hi Sergey,

+ lkml.

On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote:
> On (12/06/18 11:58), Feng Tang wrote:
> > > Same here, I tried on several platforms and hardly get the sysrq magic key
> > > working, though it works while system is running.
> > > 
> > > And it make me wondering if those workqueue dependent led blinking code
> > > can still really work.
> > 
> > Also, IMHO, if we need a panic blink method, it should better be simple
> > and robust with only HW registers access plus delay function, as I'm not
> > sure if the scheduling can still work. 
> > 
> > Anyway, can I propose to make the "local_irq_enable" conditional and off
> > by default, and add a warning.
> 
> I'm not sure what to do about this. I think that the behaviour is platform
> specific. For instance, arm64 keeps secondary CPUs in a busy loop
> 	while (1)
> 		cpu_relax();

Yes, it's similar to x86's handling for non-panic CPU.

> 
> (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> if I got it right); so it seems that arm64 can handle IRQs after panic. And
> if there are platforms that handle IRQ (including sysrq) after panic, then
> both options - making printk a noop or keeping local irqs off - maybe can
> cause some problems. Or maybe not. We better ask arch people.

Yes, this is very valid concern. And after Petr and you raised it, I did
some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
all works well before panic, and fails after panic. But I did remember the
PageUp/PageDown key worked on some laptop years ago. And you actually raised a
good question: what do we expect for the post-panic kernel? 

For the v4 patch, my thought is, for experienced developers to make
sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
or runtime change it by 
 "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
while for normal user, they can by default see the clean panic call stack
either on a screen or a serial console. 

Thanks,
Feng

> 
> Personally, on my x86 laptop, I'd prefer the srollback to work after panic.
> Just my 5 cents.
> 
> 	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10  9:45                   ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang
@ 2018-12-10 15:57                     ` Petr Mladek
  2018-12-11  8:07                       ` Sergey Senozhatsky
  2018-12-11  8:00                     ` Sergey Senozhatsky
  1 sibling, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2018-12-10 15:57 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook,
	mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt,
	Sasha Levin, Andi Kleen, linux-kernel

On Mon 2018-12-10 17:45:54, Feng Tang wrote:
> Hi Sergey,
> 
> + lkml.
> 
> On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote:
> > On (12/06/18 11:58), Feng Tang wrote:
> > > > Same here, I tried on several platforms and hardly get the sysrq magic key
> > > > working, though it works while system is running.
> > > > 
> > > > And it make me wondering if those workqueue dependent led blinking code
> > > > can still really work.
> > > 
> > > Also, IMHO, if we need a panic blink method, it should better be simple
> > > and robust with only HW registers access plus delay function, as I'm not
> > > sure if the scheduling can still work. 
> > > 
> > > Anyway, can I propose to make the "local_irq_enable" conditional and off
> > > by default, and add a warning.
> > 
> > I'm not sure what to do about this. I think that the behaviour is platform
> > specific. For instance, arm64 keeps secondary CPUs in a busy loop
> > 	while (1)
> > 		cpu_relax();
> 
> Yes, it's similar to x86's handling for non-panic CPU.
> 
> > 
> > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > if there are platforms that handle IRQ (including sysrq) after panic, then
> > both options - making printk a noop or keeping local irqs off - maybe can
> > cause some problems. Or maybe not. We better ask arch people.
> 
> Yes, this is very valid concern. And after Petr and you raised it, I did
> some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> all works well before panic, and fails after panic. But I did remember the
> PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> good question: what do we expect for the post-panic kernel?

I am not sure why it does not work. But it would be nice if sysrq
worked.


> For the v4 patch, my thought is, for experienced developers to make
> sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
> or runtime change it by
>  "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
> while for normal user, they can by default see the clean panic call stack
> either on a screen or a serial console.

I see this problematic from two reasons:

First, the blinking and especially sysrq might be handy but they
are disabled by default. It is too late to do anything when the
system is panicing.

Second, new parameters or configure options should be avoided
whenever possible. They are easy to implement but the are less
easy to use. The current amount of them is already overhelming.


I still think that calming down printk() is acceptable when
it can be restored from sysrq.

I think that only few people might be interested into debugging
post-panic problems. We could print a warning for them about
that printk() has got disabled.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10  9:45                   ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang
  2018-12-10 15:57                     ` Petr Mladek
@ 2018-12-11  8:00                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:00 UTC (permalink / raw)
  To: Feng Tang
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

Hi,

On (12/10/18 17:45), Feng Tang wrote:
> Yes, this is very valid concern. And after Petr and you raised it, I did
> some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> all works well before panic, and fails after panic. But I did remember the
> PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> good question: what do we expect for the post-panic kernel?

Yeah.
It used to be case that people expected some things to work after panic.

> For the v4 patch, my thought is, for experienced developers to make
> sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline,
> or runtime change it by
>  "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on"
> while for normal user, they can by default see the clean panic call stack
> either on a screen or a serial console.

Before we move on, just a quick question, since I wasn't Cc-ed to v1
and v2 of this patch - did you have a chance to ask x86 people if they
can help in any way? Asking to make sure that we are not fixing a _maybe_
x86-specific problem in arch-independent/common code.


/* offtopic */

LOL, wish this was a "dumb-and-ugly-solutions" contest; I'm pretty sure
I'd take the first prize with this one:

---
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 04adc8d60aed..40f643bb7fdc 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -181,6 +181,16 @@ asmlinkage __visible void smp_reboot_interrupt(void)
 	irq_exit();
 }
 
+static void native_smp_suppress_reschedule(int cpu)
+{
+}
+
+static void native_smp_to_up(void)
+{
+	WARN_ON_ONCE(num_online_cpus() > 1);
+	smp_ops.smp_send_reschedule = native_smp_suppress_reschedule;
+}
+
 static void native_stop_other_cpus(int wait)
 {
 	unsigned long flags;
@@ -250,6 +260,7 @@ static void native_stop_other_cpus(int wait)
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+	native_smp_to_up();
 	local_irq_restore(flags);
 }
---

If the system is not SMP anymore (hlt non-panic CPUs) - rewrite
some smp_ops pointers to NOOP stubs to suppress some of those
warnings.

I know it's utterly awful.

	-ss

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-10 15:57                     ` Petr Mladek
@ 2018-12-11  8:07                       ` Sergey Senozhatsky
  2018-12-11  8:22                         ` Petr Mladek
  2018-12-11  8:32                         ` Feng Tang
  0 siblings, 2 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:07 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Feng Tang, Sergey Senozhatsky, Peter Zijlstra, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

On (12/10/18 16:57), Petr Mladek wrote:
> > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > both options - making printk a noop or keeping local irqs off - maybe can
> > > cause some problems. Or maybe not. We better ask arch people.
> > 
> > Yes, this is very valid concern. And after Petr and you raised it, I did
> > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > all works well before panic, and fails after panic. But I did remember the
> > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > good question: what do we expect for the post-panic kernel?
> 
> I am not sure why it does not work. But it would be nice if sysrq
> worked.

Absolutely.

[..]
> I still think that calming down printk() is acceptable when
> it can be restored from sysrq.

I would agree; peeking one of the two solutions, printk patch is
probably preferable.

> I think that only few people might be interested into debugging
> post-panic problems. We could print a warning for them about
> that printk() has got disabled.

Dunno. This _maybe_ (speculation!) can upset folks on those platforms
that have sysrq working after panic. printk is a common code.

I'm probably missing a lot of things here, but just in case, I'm not
sure at which point the idea of patching some files under arch/x86
directory was ruled out and why.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:07                       ` Sergey Senozhatsky
@ 2018-12-11  8:22                         ` Petr Mladek
  2018-12-11  8:26                           ` Sergey Senozhatsky
  2018-12-11  8:32                         ` Feng Tang
  1 sibling, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2018-12-11  8:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

On Tue 2018-12-11 17:07:43, Sergey Senozhatsky wrote:
> On (12/10/18 16:57), Petr Mladek wrote:
> > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > > both options - making printk a noop or keeping local irqs off - maybe can
> > > > cause some problems. Or maybe not. We better ask arch people.
> > > 
> > > Yes, this is very valid concern. And after Petr and you raised it, I did
> > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > > all works well before panic, and fails after panic. But I did remember the
> > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > > good question: what do we expect for the post-panic kernel?
> > 
> > I am not sure why it does not work. But it would be nice if sysrq
> > worked.
> 
> Absolutely.
> 
> [..]
> > I still think that calming down printk() is acceptable when
> > it can be restored from sysrq.
> 
> I would agree; peeking one of the two solutions, printk patch is
> probably preferable.
> 
> > I think that only few people might be interested into debugging
> > post-panic problems. We could print a warning for them about
> > that printk() has got disabled.
> 
> Dunno. This _maybe_ (speculation!) can upset folks on those platforms
> that have sysrq working after panic. printk is a common code.
> 
> I'm probably missing a lot of things here, but just in case, I'm not
> sure at which point the idea of patching some files under arch/x86
> directory was ruled out and why.

I suggested to clear the panic_blinking (or whatever name) in
__handle_sysrq(). The idea is that sysrq needs manual intervention.
It allows to see the original message before it gets overridden
by a potential sysrq-related output.

It assumes that sysrq is the only interesting operation when
printk() might be useful at this state.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:22                         ` Petr Mladek
@ 2018-12-11  8:26                           ` Sergey Senozhatsky
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  8:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Feng Tang, Peter Zijlstra, akpm, bp,
	keescook, mm-commits, sergey.senozhatsky, stable, tglx,
	Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel

On (12/11/18 09:22), Petr Mladek wrote:
> I suggested to clear the panic_blinking (or whatever name) in
> __handle_sysrq(). The idea is that sysrq needs manual intervention.
> It allows to see the original message before it gets overridden
> by a potential sysrq-related output.

Ah, indeed. Sounds good.

> It assumes that sysrq is the only interesting operation when
> printk() might be useful at this state.

Agreed. Makes sense.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:07                       ` Sergey Senozhatsky
  2018-12-11  8:22                         ` Petr Mladek
@ 2018-12-11  8:32                         ` Feng Tang
  2018-12-11  9:08                           ` Sergey Senozhatsky
  1 sibling, 1 reply; 8+ messages in thread
From: Feng Tang @ 2018-12-11  8:32 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Peter Zijlstra, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin,
	Andi Kleen, linux-kernel

Hi Sergey,

On Tue, Dec 11, 2018 at 05:07:43PM +0900, Sergey Senozhatsky wrote:
> On (12/10/18 16:57), Petr Mladek wrote:
> > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware,
> > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And
> > > > if there are platforms that handle IRQ (including sysrq) after panic, then
> > > > both options - making printk a noop or keeping local irqs off - maybe can
> > > > cause some problems. Or maybe not. We better ask arch people.
> > > 
> > > Yes, this is very valid concern. And after Petr and you raised it, I did
> > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device
> > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key
> > > all works well before panic, and fails after panic. But I did remember the
> > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a
> > > good question: what do we expect for the post-panic kernel?
> > 
> > I am not sure why it does not work. But it would be nice if sysrq
> > worked.
> 
> Absolutely.
> 
> [..]
> > I still think that calming down printk() is acceptable when
> > it can be restored from sysrq.
> 
> I would agree; peeking one of the two solutions, printk patch is
> probably preferable.
> 
> > I think that only few people might be interested into debugging
> > post-panic problems. We could print a warning for them about
> > that printk() has got disabled.
> 
> Dunno. This _maybe_ (speculation!) can upset folks on those platforms
> that have sysrq working after panic. printk is a common code.
> 
> I'm probably missing a lot of things here, but just in case, I'm not
> sure at which point the idea of patching some files under arch/x86
> directory was ruled out and why.

Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 

And actually no one ruled out the v1 patch :), I don't have HW of other
archs like arm/ppc, so I just read some of the arch code, and found
most of them use the similar flow like x86, that's why I chosed to
finding a soluton inside panic.c itself.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
  2018-12-11  8:32                         ` Feng Tang
@ 2018-12-11  9:08                           ` Sergey Senozhatsky
  0 siblings, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-12-11  9:08 UTC (permalink / raw)
  To: Feng Tang, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Thomas Gleixner, Frederic Weisbecker
  Cc: Sergey Senozhatsky, akpm, bp, keescook, mm-commits,
	sergey.senozhatsky, stable, Sasha Levin, Andi Kleen,
	linux-kernel

Adding Frederic,

https://lkml.org/lkml/2018/10/11/304


On (12/11/18 16:32), Feng Tang wrote:
> Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304
>
> And actually no one ruled out the v1 patch :), I don't have HW of other
> archs like arm/ppc, so I just read some of the arch code, and found
> most of them use the similar flow like x86, that's why I chosed to
> finding a soluton inside panic.c itself.

Interesting. So if the problem is that we need to clear cpu bit in
several cpumaks (e.g. nohz.idle_cpus_mask) when we stop_this_cpu(),
then I'd say let's clear cpumasks which are needed to be clear (doing
some of the things which sched_cpu_dying() does, except that we need
it on !CONFIG_HOTPLUG_CPU systems too). The idea of notifiers also looks
interesting.


x86 and sched gurus, can you please help?

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-12-11  9:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181204102033.ltdvc7gmev2gvlkq@pathway.suse.cz>
     [not found] ` <20181204154936.wbgcovzpc54n6dvs@shbuild888>
     [not found]   ` <20181205022654.GA503@jagdpanzerIV>
     [not found]     ` <20181205024713.nqyt6qiamokq7qtl@shbuild888>
     [not found]       ` <20181205025728.GC503@jagdpanzerIV>
     [not found]         ` <20181205052912.GA423@jagdpanzerIV>
     [not found]           ` <20181205080044.GA11190@jagdpanzerIV>
     [not found]             ` <20181205154620.4dqtledc2duhrp2c@shbuild888>
     [not found]               ` <20181206035825.jz2bfh3errj23rjq@shbuild888>
     [not found]                 ` <20181207095004.GB3729@jagdpanzerIV>
2018-12-10  9:45                   ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang
2018-12-10 15:57                     ` Petr Mladek
2018-12-11  8:07                       ` Sergey Senozhatsky
2018-12-11  8:22                         ` Petr Mladek
2018-12-11  8:26                           ` Sergey Senozhatsky
2018-12-11  8:32                         ` Feng Tang
2018-12-11  9:08                           ` Sergey Senozhatsky
2018-12-11  8:00                     ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).