* [PATCH 0/2] random debug patches @ 2016-03-18 15:28 Peter Zijlstra 2016-03-18 15:28 ` [PATCH 1/2] watchdog: Fix output Peter Zijlstra 2016-03-18 15:28 ` [PATCH 2/2] trace: Record and show NMI state Peter Zijlstra 0 siblings, 2 replies; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 15:28 UTC (permalink / raw) To: linux-kernel; +Cc: rostedt, dzickus, mingo, tglx, peterz The only relation between these patches is that they are the result of me trying to hunt down fail. Other than that they're unrelated and can be independently applied. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] watchdog: Fix output 2016-03-18 15:28 [PATCH 0/2] random debug patches Peter Zijlstra @ 2016-03-18 15:28 ` Peter Zijlstra 2016-03-18 16:37 ` Don Zickus 2016-03-18 15:28 ` [PATCH 2/2] trace: Record and show NMI state Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 15:28 UTC (permalink / raw) To: linux-kernel; +Cc: rostedt, dzickus, mingo, tglx, peterz [-- Attachment #1: peterz-watchdog-fix-output.patch --] [-- Type: text/plain, Size: 1309 bytes --] The pr_crap() functions generate idiotic output; use printk(). Broken output: [ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646] [ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 Fixed output: [14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650] [ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11 Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/watchdog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(s if (__this_cpu_read(hard_watchdog_warn) == true) return; - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); + printk(KERN_EMERG "BUG: NMI detected hard LOCKUP on cpu %d", this_cpu); print_modules(); print_irqtrace_events(current); if (regs) @@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_tim } } - pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", + printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, current->comm, task_pid_nr(current)); __this_cpu_write(softlockup_task_ptr_saved, current); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 15:28 ` [PATCH 1/2] watchdog: Fix output Peter Zijlstra @ 2016-03-18 16:37 ` Don Zickus 2016-03-18 16:44 ` Peter Zijlstra 2016-03-18 17:49 ` Peter Zijlstra 0 siblings, 2 replies; 11+ messages in thread From: Don Zickus @ 2016-03-18 16:37 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, rostedt, mingo, tglx On Fri, Mar 18, 2016 at 04:28:03PM +0100, Peter Zijlstra wrote: > The pr_crap() functions generate idiotic output; use printk(). > > Broken output: > > [ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646] > [ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 > > Fixed output: > > [14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650] > [ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11 Would something like this be a better patch? Cheers, Don diff --git a/kernel/watchdog.c b/kernel/watchdog.c index b3ace6e..e3302a0 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -9,7 +9,7 @@ * to those contributors as well. */ -#define pr_fmt(fmt) "NMI watchdog: " fmt +#define pr_fmt(fmt) "Lockup detector: " fmt #include <linux/mm.h> #include <linux/cpu.h> @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event, if (__this_cpu_read(hard_watchdog_warn) == true) return; - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu); print_modules(); print_irqtrace_events(current); if (regs) @@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) } } - pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", + pr_emerg("Detected soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, current->comm, task_pid_nr(current)); __this_cpu_write(softlockup_task_ptr_saved, current); > > Cc: Don Zickus <dzickus@redhat.com> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/watchdog.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(s > if (__this_cpu_read(hard_watchdog_warn) == true) > return; > > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); > + printk(KERN_EMERG "BUG: NMI detected hard LOCKUP on cpu %d", this_cpu); > print_modules(); > print_irqtrace_events(current); > if (regs) > @@ -467,7 +467,7 @@ static enum hrtimer_restart watchdog_tim > } > } > > - pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", > + printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", > smp_processor_id(), duration, > current->comm, task_pid_nr(current)); > __this_cpu_write(softlockup_task_ptr_saved, current); > > ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 16:37 ` Don Zickus @ 2016-03-18 16:44 ` Peter Zijlstra 2016-03-18 16:48 ` Peter Zijlstra 2016-03-18 17:49 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 16:44 UTC (permalink / raw) To: Don Zickus; +Cc: linux-kernel, rostedt, mingo, tglx On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote: > On Fri, Mar 18, 2016 at 04:28:03PM +0100, Peter Zijlstra wrote: > > The pr_crap() functions generate idiotic output; use printk(). > > > > Broken output: > > > > [ 3538.718135] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [perf_fuzzer:2646] > > [ 3383.233583] NMI watchdog: Watchdog detected hard LOCKUP on cpu 27 > > > > Fixed output: > > > > [14180.328194] BUG: soft lockup - CPU#2 stuck for 23s! [perf_fuzzer:13650] > > [ 1064.914925] BUG: NMI Watchdog detected hard LOCKUP on cpu 11 > > Would something like this be a better patch? > -#define pr_fmt(fmt) "NMI watchdog: " fmt > +#define pr_fmt(fmt) "Lockup detector: " fmt > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); > + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu); That still results in something like: Lockup detector: Detected hard LOCKUP on cpu 22 Which is still weird, doesn't mention NMI nor BUG (and when the NMI triggers it really rather frequently is a dead system). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 16:44 ` Peter Zijlstra @ 2016-03-18 16:48 ` Peter Zijlstra 2016-03-18 17:42 ` Joe Perches 2016-03-18 18:12 ` Don Zickus 0 siblings, 2 replies; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 16:48 UTC (permalink / raw) To: Don Zickus; +Cc: linux-kernel, rostedt, mingo, tglx On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote: > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote: > > Would something like this be a better patch? > > > -#define pr_fmt(fmt) "NMI watchdog: " fmt > > +#define pr_fmt(fmt) "Lockup detector: " fmt You might want to audit all the other pr_ nonsense in that file; a lot of the msgs don't particularly make sense when combined with the prefix. Some are for the NMI thing, others for the soft lockup one. I would just chuck pr_crap() altogether and use sensible text. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 16:48 ` Peter Zijlstra @ 2016-03-18 17:42 ` Joe Perches 2016-03-18 18:12 ` Don Zickus 1 sibling, 0 replies; 11+ messages in thread From: Joe Perches @ 2016-03-18 17:42 UTC (permalink / raw) To: Peter Zijlstra, Don Zickus; +Cc: linux-kernel, rostedt, mingo, tglx On Fri, 2016-03-18 at 17:48 +0100, Peter Zijlstra wrote: > On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote: > > > > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote: > > > > > > Would something like this be a better patch? > > > > > > -#define pr_fmt(fmt) "NMI watchdog: " fmt > > > +#define pr_fmt(fmt) "Lockup detector: " fmt > You might want to audit all the other pr_ nonsense in that file; a > lot > of the msgs don't particularly make sense when combined with the > prefix. > > Some are for the NMI thing, others for the soft lockup one. > > I would just chuck pr_crap() altogether and use sensible text. or change to #define pr_fmt(fmt) fmt and output whatever unprefixed content you want. btw: the first printk/pr_emerg should add a newline pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); sb: pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 16:48 ` Peter Zijlstra 2016-03-18 17:42 ` Joe Perches @ 2016-03-18 18:12 ` Don Zickus 1 sibling, 0 replies; 11+ messages in thread From: Don Zickus @ 2016-03-18 18:12 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, rostedt, mingo, tglx On Fri, Mar 18, 2016 at 05:48:58PM +0100, Peter Zijlstra wrote: > On Fri, Mar 18, 2016 at 05:44:41PM +0100, Peter Zijlstra wrote: > > On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote: > > > Would something like this be a better patch? > > > > > -#define pr_fmt(fmt) "NMI watchdog: " fmt > > > +#define pr_fmt(fmt) "Lockup detector: " fmt > > You might want to audit all the other pr_ nonsense in that file; a lot > of the msgs don't particularly make sense when combined with the prefix. > > Some are for the NMI thing, others for the soft lockup one. > > I would just chuck pr_crap() altogether and use sensible text. Hehe. I'll look into that next week. :-) Cheers, Don ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 16:37 ` Don Zickus 2016-03-18 16:44 ` Peter Zijlstra @ 2016-03-18 17:49 ` Peter Zijlstra 2016-03-19 0:11 ` Jiri Kosina 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 17:49 UTC (permalink / raw) To: Don Zickus; +Cc: linux-kernel, rostedt, mingo, tglx, jkosina On Fri, Mar 18, 2016 at 12:37:48PM -0400, Don Zickus wrote: > +++ b/kernel/watchdog.c > @@ -9,7 +9,7 @@ > * to those contributors as well. > */ > > -#define pr_fmt(fmt) "NMI watchdog: " fmt > +#define pr_fmt(fmt) "Lockup detector: " fmt > > #include <linux/mm.h> > #include <linux/cpu.h> > @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event, > if (__this_cpu_read(hard_watchdog_warn) == true) > return; > > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); > + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu); > print_modules(); > print_irqtrace_events(current); > if (regs) It was Jiri who made this mess by replacing WARN(), which has a very distinct format, with this custom stuff. I think we should go back to the WARN() thing. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-18 17:49 ` Peter Zijlstra @ 2016-03-19 0:11 ` Jiri Kosina 2016-03-19 0:39 ` Jiri Kosina 0 siblings, 1 reply; 11+ messages in thread From: Jiri Kosina @ 2016-03-19 0:11 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Don Zickus, linux-kernel, rostedt, mingo, tglx On Fri, 18 Mar 2016, Peter Zijlstra wrote: > > -#define pr_fmt(fmt) "NMI watchdog: " fmt > > +#define pr_fmt(fmt) "Lockup detector: " fmt > > > > #include <linux/mm.h> > > #include <linux/cpu.h> > > @@ -350,7 +350,7 @@ static void watchdog_overflow_callback(struct perf_event *event, > > if (__this_cpu_read(hard_watchdog_warn) == true) > > return; > > > > - pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu); > > + pr_emerg("Detected hard LOCKUP on cpu %d", this_cpu); > > print_modules(); > > print_irqtrace_events(current); > > if (regs) > > It was Jiri who made this mess by replacing WARN(), which has a very > distinct format, with this custom stuff. Right, this was in 55537871e ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup"). To be really honest, I don't really have 100% reliable explanation why I did it this way; I *think* it was because otherwise, in case WARN is preserved, we'll either - have different output format for current (warning) CPU and all the others (trigger_allbutself_cpu_backtrace()) or - have duplicated backtrace for current CPU, one coming from WARN and one coming from the all-cpu backtrace > I think we should go back to the WARN() thing. I don't really have strong opinion on this, but someone should pick the poison from the two options above; I tried in the mentioned commit, but apparently not to general satisfaction :) Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] watchdog: Fix output 2016-03-19 0:11 ` Jiri Kosina @ 2016-03-19 0:39 ` Jiri Kosina 0 siblings, 0 replies; 11+ messages in thread From: Jiri Kosina @ 2016-03-19 0:39 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Don Zickus, linux-kernel, rostedt, mingo, tglx On Sat, 19 Mar 2016, Jiri Kosina wrote: > > It was Jiri who made this mess by replacing WARN(), which has a very > > distinct format, with this custom stuff. > > Right, this was in 55537871e ("kernel/watchdog.c: perform all-CPU > backtrace in case of hard lockup"). > > To be really honest, I don't really have 100% reliable explanation why I > did it this way; I *think* it was because otherwise, in case WARN is > preserved, we'll either > > - have different output format for current (warning) CPU and all the > others (trigger_allbutself_cpu_backtrace()) > > or > > - have duplicated backtrace for current CPU, one coming from WARN and one > coming from the all-cpu backtrace > > > I think we should go back to the WARN() thing. > > I don't really have strong opinion on this, but someone should pick the > poison from the two options above; I tried in the mentioned commit, but > apparently not to general satisfaction :) Hmm, or we can actually make the whole thing behave the way that was introduced by 55537871e only in case of sysctl_hardlockup_all_cpu_backtrace being set, and use WARN() otherwise. I'll send out a patch for this during / after the weekend if noone else beats me to do it. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] trace: Record and show NMI state 2016-03-18 15:28 [PATCH 0/2] random debug patches Peter Zijlstra 2016-03-18 15:28 ` [PATCH 1/2] watchdog: Fix output Peter Zijlstra @ 2016-03-18 15:28 ` Peter Zijlstra 1 sibling, 0 replies; 11+ messages in thread From: Peter Zijlstra @ 2016-03-18 15:28 UTC (permalink / raw) To: linux-kernel; +Cc: rostedt, dzickus, mingo, tglx, peterz [-- Attachment #1: trace-nmi.patch --] [-- Type: text/plain, Size: 1831 bytes --] The latency tracer format has a nice column to indicate IRQ state, but this is not able to tell us about NMI state. When tracing perf interrupt handlers (which often run in NMI context) it is very useful to see how the events nest. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/trace/trace.c | 1 + kernel/trace/trace.h | 1 + kernel/trace/trace_output.c | 10 +++++++--- 3 files changed, 9 insertions(+), 3 deletions(-) --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1662,6 +1662,7 @@ tracing_generic_entry_update(struct trac #else TRACE_FLAG_IRQS_NOSUPPORT | #endif + ((pc & NMI_MASK ) ? TRACE_FLAG_NMI : 0) | ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) | ((pc & SOFTIRQ_MASK) ? TRACE_FLAG_SOFTIRQ : 0) | (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) | --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -125,6 +125,7 @@ enum trace_flag_type { TRACE_FLAG_HARDIRQ = 0x08, TRACE_FLAG_SOFTIRQ = 0x10, TRACE_FLAG_PREEMPT_RESCHED = 0x20, + TRACE_FLAG_NMI = 0x40, }; #define TRACE_BUF_SIZE 1024 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -389,7 +389,9 @@ int trace_print_lat_fmt(struct trace_seq char irqs_off; int hardirq; int softirq; + int nmi; + nmi = entry->flags & TRACE_FLAG_NMI; hardirq = entry->flags & TRACE_FLAG_HARDIRQ; softirq = entry->flags & TRACE_FLAG_SOFTIRQ; @@ -415,10 +417,12 @@ int trace_print_lat_fmt(struct trace_seq } hardsoft_irq = + (nmi && hardirq) ? 'Z' : + nmi ? 'z' : (hardirq && softirq) ? 'H' : - hardirq ? 'h' : - softirq ? 's' : - '.'; + hardirq ? 'h' : + softirq ? 's' : + '.' ; trace_seq_printf(s, "%c%c%c", irqs_off, need_resched, hardsoft_irq); ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-03-19 0:39 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-03-18 15:28 [PATCH 0/2] random debug patches Peter Zijlstra 2016-03-18 15:28 ` [PATCH 1/2] watchdog: Fix output Peter Zijlstra 2016-03-18 16:37 ` Don Zickus 2016-03-18 16:44 ` Peter Zijlstra 2016-03-18 16:48 ` Peter Zijlstra 2016-03-18 17:42 ` Joe Perches 2016-03-18 18:12 ` Don Zickus 2016-03-18 17:49 ` Peter Zijlstra 2016-03-19 0:11 ` Jiri Kosina 2016-03-19 0:39 ` Jiri Kosina 2016-03-18 15:28 ` [PATCH 2/2] trace: Record and show NMI state Peter Zijlstra
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.