* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree [not found] ` <20181207095004.GB3729@jagdpanzerIV> @ 2018-12-10 9:45 ` Feng Tang 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:00 ` Sergey Senozhatsky 0 siblings, 2 replies; 8+ messages in thread From: Feng Tang @ 2018-12-10 9:45 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi Sergey, + lkml. On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote: > On (12/06/18 11:58), Feng Tang wrote: > > > Same here, I tried on several platforms and hardly get the sysrq magic key > > > working, though it works while system is running. > > > > > > And it make me wondering if those workqueue dependent led blinking code > > > can still really work. > > > > Also, IMHO, if we need a panic blink method, it should better be simple > > and robust with only HW registers access plus delay function, as I'm not > > sure if the scheduling can still work. > > > > Anyway, can I propose to make the "local_irq_enable" conditional and off > > by default, and add a warning. > > I'm not sure what to do about this. I think that the behaviour is platform > specific. For instance, arm64 keeps secondary CPUs in a busy loop > while (1) > cpu_relax(); Yes, it's similar to x86's handling for non-panic CPU. > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > if I got it right); so it seems that arm64 can handle IRQs after panic. And > if there are platforms that handle IRQ (including sysrq) after panic, then > both options - making printk a noop or keeping local irqs off - maybe can > cause some problems. Or maybe not. We better ask arch people. Yes, this is very valid concern. And after Petr and you raised it, I did some experiments with 3 x86 platforms at my hand, one Apollolake IOT device with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key all works well before panic, and fails after panic. But I did remember the PageUp/PageDown key worked on some laptop years ago. And you actually raised a good question: what do we expect for the post-panic kernel? For the v4 patch, my thought is, for experienced developers to make sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, or runtime change it by "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" while for normal user, they can by default see the clean panic call stack either on a screen or a serial console. Thanks, Feng > > Personally, on my x86 laptop, I'd prefer the srollback to work after panic. > Just my 5 cents. > > -ss ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 9:45 ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang @ 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:00 ` Sergey Senozhatsky 1 sibling, 1 reply; 8+ messages in thread From: Petr Mladek @ 2018-12-10 15:57 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On Mon 2018-12-10 17:45:54, Feng Tang wrote: > Hi Sergey, > > + lkml. > > On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote: > > On (12/06/18 11:58), Feng Tang wrote: > > > > Same here, I tried on several platforms and hardly get the sysrq magic key > > > > working, though it works while system is running. > > > > > > > > And it make me wondering if those workqueue dependent led blinking code > > > > can still really work. > > > > > > Also, IMHO, if we need a panic blink method, it should better be simple > > > and robust with only HW registers access plus delay function, as I'm not > > > sure if the scheduling can still work. > > > > > > Anyway, can I propose to make the "local_irq_enable" conditional and off > > > by default, and add a warning. > > > > I'm not sure what to do about this. I think that the behaviour is platform > > specific. For instance, arm64 keeps secondary CPUs in a busy loop > > while (1) > > cpu_relax(); > > Yes, it's similar to x86's handling for non-panic CPU. > > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > if there are platforms that handle IRQ (including sysrq) after panic, then > > both options - making printk a noop or keeping local irqs off - maybe can > > cause some problems. Or maybe not. We better ask arch people. > > Yes, this is very valid concern. And after Petr and you raised it, I did > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > all works well before panic, and fails after panic. But I did remember the > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > good question: what do we expect for the post-panic kernel? I am not sure why it does not work. But it would be nice if sysrq worked. > For the v4 patch, my thought is, for experienced developers to make > sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, > or runtime change it by > "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" > while for normal user, they can by default see the clean panic call stack > either on a screen or a serial console. I see this problematic from two reasons: First, the blinking and especially sysrq might be handy but they are disabled by default. It is too late to do anything when the system is panicing. Second, new parameters or configure options should be avoided whenever possible. They are easy to implement but the are less easy to use. The current amount of them is already overhelming. I still think that calming down printk() is acceptable when it can be restored from sysrq. I think that only few people might be interested into debugging post-panic problems. We could print a warning for them about that printk() has got disabled. Best Regards, Petr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 15:57 ` Petr Mladek @ 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:32 ` Feng Tang 0 siblings, 2 replies; 8+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:07 UTC (permalink / raw) To: Petr Mladek Cc: Feng Tang, Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On (12/10/18 16:57), Petr Mladek wrote: > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > both options - making printk a noop or keeping local irqs off - maybe can > > > cause some problems. Or maybe not. We better ask arch people. > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > all works well before panic, and fails after panic. But I did remember the > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > good question: what do we expect for the post-panic kernel? > > I am not sure why it does not work. But it would be nice if sysrq > worked. Absolutely. [..] > I still think that calming down printk() is acceptable when > it can be restored from sysrq. I would agree; peeking one of the two solutions, printk patch is probably preferable. > I think that only few people might be interested into debugging > post-panic problems. We could print a warning for them about > that printk() has got disabled. Dunno. This _maybe_ (speculation!) can upset folks on those platforms that have sysrq working after panic. printk is a common code. I'm probably missing a lot of things here, but just in case, I'm not sure at which point the idea of patching some files under arch/x86 directory was ruled out and why. -ss ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:07 ` Sergey Senozhatsky @ 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:26 ` Sergey Senozhatsky 2018-12-11 8:32 ` Feng Tang 1 sibling, 1 reply; 8+ messages in thread From: Petr Mladek @ 2018-12-11 8:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On Tue 2018-12-11 17:07:43, Sergey Senozhatsky wrote: > On (12/10/18 16:57), Petr Mladek wrote: > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > > both options - making printk a noop or keeping local irqs off - maybe can > > > > cause some problems. Or maybe not. We better ask arch people. > > > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > > all works well before panic, and fails after panic. But I did remember the > > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > > good question: what do we expect for the post-panic kernel? > > > > I am not sure why it does not work. But it would be nice if sysrq > > worked. > > Absolutely. > > [..] > > I still think that calming down printk() is acceptable when > > it can be restored from sysrq. > > I would agree; peeking one of the two solutions, printk patch is > probably preferable. > > > I think that only few people might be interested into debugging > > post-panic problems. We could print a warning for them about > > that printk() has got disabled. > > Dunno. This _maybe_ (speculation!) can upset folks on those platforms > that have sysrq working after panic. printk is a common code. > > I'm probably missing a lot of things here, but just in case, I'm not > sure at which point the idea of patching some files under arch/x86 > directory was ruled out and why. I suggested to clear the panic_blinking (or whatever name) in __handle_sysrq(). The idea is that sysrq needs manual intervention. It allows to see the original message before it gets overridden by a potential sysrq-related output. It assumes that sysrq is the only interesting operation when printk() might be useful at this state. Best Regards, Petr ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:22 ` Petr Mladek @ 2018-12-11 8:26 ` Sergey Senozhatsky 0 siblings, 0 replies; 8+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:26 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On (12/11/18 09:22), Petr Mladek wrote: > I suggested to clear the panic_blinking (or whatever name) in > __handle_sysrq(). The idea is that sysrq needs manual intervention. > It allows to see the original message before it gets overridden > by a potential sysrq-related output. Ah, indeed. Sounds good. > It assumes that sysrq is the only interesting operation when > printk() might be useful at this state. Agreed. Makes sense. -ss ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek @ 2018-12-11 8:32 ` Feng Tang 2018-12-11 9:08 ` Sergey Senozhatsky 1 sibling, 1 reply; 8+ messages in thread From: Feng Tang @ 2018-12-11 8:32 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi Sergey, On Tue, Dec 11, 2018 at 05:07:43PM +0900, Sergey Senozhatsky wrote: > On (12/10/18 16:57), Petr Mladek wrote: > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > > both options - making printk a noop or keeping local irqs off - maybe can > > > > cause some problems. Or maybe not. We better ask arch people. > > > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > > all works well before panic, and fails after panic. But I did remember the > > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > > good question: what do we expect for the post-panic kernel? > > > > I am not sure why it does not work. But it would be nice if sysrq > > worked. > > Absolutely. > > [..] > > I still think that calming down printk() is acceptable when > > it can be restored from sysrq. > > I would agree; peeking one of the two solutions, printk patch is > probably preferable. > > > I think that only few people might be interested into debugging > > post-panic problems. We could print a warning for them about > > that printk() has got disabled. > > Dunno. This _maybe_ (speculation!) can upset folks on those platforms > that have sysrq working after panic. printk is a common code. > > I'm probably missing a lot of things here, but just in case, I'm not > sure at which point the idea of patching some files under arch/x86 > directory was ruled out and why. Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 And actually no one ruled out the v1 patch :), I don't have HW of other archs like arm/ppc, so I just read some of the arch code, and found most of them use the similar flow like x86, that's why I chosed to finding a soluton inside panic.c itself. Thanks, Feng ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:32 ` Feng Tang @ 2018-12-11 9:08 ` Sergey Senozhatsky 0 siblings, 0 replies; 8+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 9:08 UTC (permalink / raw) To: Feng Tang, Peter Zijlstra, Petr Mladek, Steven Rostedt, Thomas Gleixner, Frederic Weisbecker Cc: Sergey Senozhatsky, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, Sasha Levin, Andi Kleen, linux-kernel Adding Frederic, https://lkml.org/lkml/2018/10/11/304 On (12/11/18 16:32), Feng Tang wrote: > Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 > > And actually no one ruled out the v1 patch :), I don't have HW of other > archs like arm/ppc, so I just read some of the arch code, and found > most of them use the similar flow like x86, that's why I chosed to > finding a soluton inside panic.c itself. Interesting. So if the problem is that we need to clear cpu bit in several cpumaks (e.g. nohz.idle_cpus_mask) when we stop_this_cpu(), then I'd say let's clear cpumasks which are needed to be clear (doing some of the things which sched_cpu_dying() does, except that we need it on !CONFIG_HOTPLUG_CPU systems too). The idea of notifiers also looks interesting. x86 and sched gurus, can you please help? -ss ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 9:45 ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang 2018-12-10 15:57 ` Petr Mladek @ 2018-12-11 8:00 ` Sergey Senozhatsky 1 sibling, 0 replies; 8+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:00 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi, On (12/10/18 17:45), Feng Tang wrote: > Yes, this is very valid concern. And after Petr and you raised it, I did > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > all works well before panic, and fails after panic. But I did remember the > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > good question: what do we expect for the post-panic kernel? Yeah. It used to be case that people expected some things to work after panic. > For the v4 patch, my thought is, for experienced developers to make > sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, > or runtime change it by > "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" > while for normal user, they can by default see the clean panic call stack > either on a screen or a serial console. Before we move on, just a quick question, since I wasn't Cc-ed to v1 and v2 of this patch - did you have a chance to ask x86 people if they can help in any way? Asking to make sure that we are not fixing a _maybe_ x86-specific problem in arch-independent/common code. /* offtopic */ LOL, wish this was a "dumb-and-ugly-solutions" contest; I'm pretty sure I'd take the first prize with this one: --- diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 04adc8d60aed..40f643bb7fdc 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -181,6 +181,16 @@ asmlinkage __visible void smp_reboot_interrupt(void) irq_exit(); } +static void native_smp_suppress_reschedule(int cpu) +{ +} + +static void native_smp_to_up(void) +{ + WARN_ON_ONCE(num_online_cpus() > 1); + smp_ops.smp_send_reschedule = native_smp_suppress_reschedule; +} + static void native_stop_other_cpus(int wait) { unsigned long flags; @@ -250,6 +260,7 @@ static void native_stop_other_cpus(int wait) local_irq_save(flags); disable_local_APIC(); mcheck_cpu_clear(this_cpu_ptr(&cpu_info)); + native_smp_to_up(); local_irq_restore(flags); } --- If the system is not SMP anymore (hlt non-panic CPUs) - rewrite some smp_ops pointers to NOOP stubs to suppress some of those warnings. I know it's utterly awful. -ss ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-12-11 9:08 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20181204102033.ltdvc7gmev2gvlkq@pathway.suse.cz> [not found] ` <20181204154936.wbgcovzpc54n6dvs@shbuild888> [not found] ` <20181205022654.GA503@jagdpanzerIV> [not found] ` <20181205024713.nqyt6qiamokq7qtl@shbuild888> [not found] ` <20181205025728.GC503@jagdpanzerIV> [not found] ` <20181205052912.GA423@jagdpanzerIV> [not found] ` <20181205080044.GA11190@jagdpanzerIV> [not found] ` <20181205154620.4dqtledc2duhrp2c@shbuild888> [not found] ` <20181206035825.jz2bfh3errj23rjq@shbuild888> [not found] ` <20181207095004.GB3729@jagdpanzerIV> 2018-12-10 9:45 ` + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree Feng Tang 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:26 ` Sergey Senozhatsky 2018-12-11 8:32 ` Feng Tang 2018-12-11 9:08 ` Sergey Senozhatsky 2018-12-11 8:00 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).