* v3.0: Weird kernel log message when resuming about NMI received
@ 2011-07-31  6:56 Francis Moreau
  2011-07-31 11:06 ` Cyrill Gorcunov
  2011-08-02 19:52 ` Don Zickus
  0 siblings, 2 replies; 15+ messages in thread
From: Francis Moreau @ 2011-07-31  6:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hello,

I'm seeing these kernel messages when resuming:

[  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[  524.973288] Do you have a strange power saving mode enabled?
[  524.973289] Dazed and confused, but trying to continue

I don't know whether it's important, because the system seems to work
afterwards, but it may be worth reporting.

-- 
Francis


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31  6:56 v3.0: Weird kernel log message when resuming avout NMI received Francis Moreau
@ 2011-07-31 11:06 ` Cyrill Gorcunov
  2011-07-31 11:19   ` Jiri Slaby
  2011-07-31 15:20   ` Francis Moreau
  2011-08-02 19:52 ` Don Zickus
  1 sibling, 2 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-07-31 11:06 UTC (permalink / raw)
  To: Francis Moreau; +Cc: Linux Kernel Mailing List

On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> Hello,
> 
> I'm seeing those kernel message when resuming:
> 
> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> [  524.973288] Do you have a strange power saving mode enabled?
> [  524.973289] Dazed and confused, but trying to continue
> 
> I don't know if it's important or not because the system seems to work
> after but maybe it worths to report
> 

Hi Francis, cpu please?

	Cyrill


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 11:06 ` Cyrill Gorcunov
@ 2011-07-31 11:19   ` Jiri Slaby
  2011-07-31 11:26     ` Cyrill Gorcunov
  2011-08-02 19:48     ` Don Zickus
  2011-07-31 15:20   ` Francis Moreau
  1 sibling, 2 replies; 15+ messages in thread
From: Jiri Slaby @ 2011-07-31 11:19 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Francis Moreau, Linux Kernel Mailing List

On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote:
> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
>> Hello,
>>
>> I'm seeing those kernel message when resuming:
>>
>> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>> [  524.973288] Do you have a strange power saving mode enabled?
>> [  524.973289] Dazed and confused, but trying to continue
>>
>> I don't know if it's important or not because the system seems to work
>> after but maybe it worths to report
>>
> 
> Hi Francis, cpu please?

Hi, I'm seeing those too, and have been for longer. Please see
https://bugzilla.novell.com/show_bug.cgi?id=678882#c16

I suspect the NMI watchdog rewrite caused this.

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU     E6850  @ 3.00GHz
stepping        : 11
cpu MHz         : 2993.060
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
tpr_shadow vnmi flexpriority
bogomips        : 5986.12
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Duo CPU     E6850  @ 3.00GHz
stepping        : 11
cpu MHz         : 2993.060
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts
tpr_shadow vnmi flexpriority
bogomips        : 5984.98
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

thanks,
-- 
js


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 11:19   ` Jiri Slaby
@ 2011-07-31 11:26     ` Cyrill Gorcunov
  2011-08-02 19:48     ` Don Zickus
  1 sibling, 0 replies; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-07-31 11:26 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Francis Moreau, Linux Kernel Mailing List, Don Zickus,
	Ingo Molnar, Stephane Eranian

On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote:
> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote:
> > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> >> Hello,
> >>
> >> I'm seeing those kernel message when resuming:
> >>
> >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> >> [  524.973288] Do you have a strange power saving mode enabled?
> >> [  524.973289] Dazed and confused, but trying to continue
> >>
> >> I don't know if it's important or not because the system seems to work
> >> after but maybe it worths to report
> >>
> > 
> > Hi Francis, cpu please?
> 
> Hi, I'm seeing those too. For a longer time though. Please see
> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16
> 
> I suspect NMI watchdog rewrite caused this.
> 

Crap. I thought we had resolved it (not exactly this one, but still an
unknown-NMI issue) :) Still, I don't remember whether we paid much
attention to suspend/resume time. Don?

	Cyrill


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 11:06 ` Cyrill Gorcunov
  2011-07-31 11:19   ` Jiri Slaby
@ 2011-07-31 15:20   ` Francis Moreau
  2011-07-31 15:32     ` Cyrill Gorcunov
  1 sibling, 1 reply; 15+ messages in thread
From: Francis Moreau @ 2011-07-31 15:20 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Linux Kernel Mailing List

Hi Cyrill

On Sun, Jul 31, 2011 at 1:06 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
>> Hello,
>>
>> I'm seeing those kernel message when resuming:
>>
>> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>> [  524.973288] Do you have a strange power saving mode enabled?
>> [  524.973289] Dazed and confused, but trying to continue
>>
>> I don't know if it's important or not because the system seems to work
>> after but maybe it worths to report
>>
>
> Hi Francis, cpu please?
>

Still the same one that causes some kvm-tools madness ;)

$ cat /proc/cpuinfo

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 37
model name	: Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping	: 5
cpu MHz		: 1197.000
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow
vnmi flexpriority ept vpid
bogomips	: 5053.42
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 37
model name	: Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping	: 5
cpu MHz		: 1197.000
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 2
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow
vnmi flexpriority ept vpid
bogomips	: 5053.72
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 37
model name	: Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping	: 5
cpu MHz		: 1197.000
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow
vnmi flexpriority ept vpid
bogomips	: 5053.70
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 37
model name	: Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping	: 5
cpu MHz		: 1197.000
cache size	: 3072 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 2
apicid		: 5
initial apicid	: 5
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3
cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida arat dts tpr_shadow
vnmi flexpriority ept vpid
bogomips	: 5053.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:



-- 
Francis


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 15:20   ` Francis Moreau
@ 2011-07-31 15:32     ` Cyrill Gorcunov
  2011-08-01 11:05       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-07-31 15:32 UTC (permalink / raw)
  To: Francis Moreau
  Cc: LKML, Don Zickus, Peter Zijlstra, Stephane Eranian, Ingo Molnar

On Sun, Jul 31, 2011 at 05:20:15PM +0200, Francis Moreau wrote:
> Hi Cyrill
> 
> On Sun, Jul 31, 2011 at 1:06 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> >> Hello,
> >>
> >> I'm seeing those kernel message when resuming:
> >>
> >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> >> [  524.973288] Do you have a strange power saving mode enabled?
> >> [  524.973289] Dazed and confused, but trying to continue
> >>
> >> I don't know if it's important or not because the system seems to work
> >> after but maybe it worths to report
> >>
> >
> > Hi Francis, cpu please?
> >
> 
> Still the same which causes some kvm-tools madness ;)
> 

OK, thanks Francis. I've CC'ed a couple of people involved in the
x86 perf implementation. I believe that at the moment of wakeup we could
drop any/all overflow flags in the perf registers (I haven't looked
into it yet; I'll try to find some time a bit later).
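
The idea of dropping pending overflow flags at wakeup is not spelled out
further in this thread. As a rough user-space model only, assuming a
write-1-to-clear overflow-control register of the kind Intel's architectural
perfmon provides, it could look like the sketch below. The struct and
function names are hypothetical, and no real MSR is touched.

```c
#include <stdint.h>

/* Hypothetical stand-in for a PMU's global overflow status register. */
struct mock_pmu {
	uint64_t global_status;	/* one bit per counter that has overflowed */
};

/*
 * Models a write-1-to-clear "overflow control" write: every bit set in
 * ack clears the matching bit in global_status.
 */
static void mock_ovf_ctrl_write(struct mock_pmu *pmu, uint64_t ack)
{
	pmu->global_status &= ~ack;
}

/*
 * What a wakeup hook might do: acknowledge whatever is pending and
 * return the bits that were dropped so the caller can log them.
 */
static uint64_t mock_pmu_drop_overflows(struct mock_pmu *pmu)
{
	uint64_t pending = pmu->global_status;

	if (pending)
		mock_ovf_ctrl_write(pmu, pending);
	return pending;
}
```

Calling mock_pmu_drop_overflows() a second time returns 0, which is the
property a resume path would want: a stale overflow bit cannot raise a second
NMI once it has been acknowledged.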

	Cyrill


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 15:32     ` Cyrill Gorcunov
@ 2011-08-01 11:05       ` Peter Zijlstra
  2011-08-01 11:21         ` Cyrill Gorcunov
  2011-08-15  8:37         ` Francis Moreau
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-08-01 11:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar

On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:

> > >> I'm seeing those kernel message when resuming:
> > >>
> > >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> > >> [  524.973288] Do you have a strange power saving mode enabled?
> > >> [  524.973289] Dazed and confused, but trying to continue
> > >>
> > >> I don't know if it's important or not because the system seems to work
> > >> after but maybe it worths to report

So I guess the problem is the NMI watchdog and suspend stuff not
shutting things down properly.. 

Argh, the PM notifier muck runs before the hotplug notifiers and it
doesn't avoid hotplug races on its own.. what crap.

something like the below perhaps, compile tested only.. does it work?

---
Subject: perf: Add PM notifiers
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Aug 01 12:49:14 CEST 2011

Francis reports that s2r gets him spurious NMIs, this is because the
suspend code leaves the boot cpu up and running.

Cure this by adding a suspend notifier. The problem is that hotplug
and suspend are completely un-serialized and the PM notifiers run
before the suspend cpu unplug of all but the boot cpu.

This leaves a window where the user can initiate another hotplug
operation (either remove or add a cpu) resulting in either one too
many or one too few hotplug ops. Thus we cannot use the hotplug code
for the suspend case.

There's another reason to not use the hotplug code, which is that the
hotplug code totally destroys the perf state, we can do better for
suspend and simply remove all counters from the PMU so that we can
re-instate them on resume.

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/events/core.c |   97 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/events/core.c
===================================================================
--- linux-2.6.orig/kernel/events/core.c
+++ linux-2.6/kernel/events/core.c
@@ -29,6 +29,7 @@
 #include <linux/hardirq.h>
 #include <linux/rculist.h>
 #include <linux/uaccess.h>
+#include <linux/suspend.h>
 #include <linux/syscalls.h>
 #include <linux/anon_inodes.h>
 #include <linux/kernel_stat.h>
@@ -6809,7 +6810,7 @@ static void __cpuinit perf_event_init_cp
 	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
 
 	mutex_lock(&swhash->hlist_mutex);
-	if (swhash->hlist_refcount > 0) {
+	if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {
 		struct swevent_hlist *hlist;
 
 		hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
@@ -6898,7 +6899,14 @@ perf_cpu_notify(struct notifier_block *s
 {
 	unsigned int cpu = (long)hcpu;
 
-	switch (action & ~CPU_TASKS_FROZEN) {
+	/*
+	 * Ignore suspend/resume action, the perf_pm_notifier will
+	 * take care of that.
+	 */
+	if (action & CPU_TASKS_FROZEN)
+		return NOTIFY_OK;
+
+	switch (action) {
 
 	case CPU_UP_PREPARE:
 	case CPU_DOWN_FAILED:
@@ -6917,6 +6925,90 @@ perf_cpu_notify(struct notifier_block *s
 	return NOTIFY_OK;
 }
 
+static void perf_pm_resume_cpu(void *unused)
+{
+	struct perf_cpu_context *cpuctx;
+	struct perf_event_context *ctx;
+	struct pmu *pmu;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+		ctx = cpuctx->task_ctx;
+
+		perf_ctx_lock(cpuctx, ctx);
+		perf_pmu_disable(cpuctx->ctx.pmu);
+
+		cpu_ctx_sched_out(cpuctx, EVENT_ALL);
+		if (ctx)
+			ctx_sched_out(ctx, cpuctx, EVENT_ALL);
+
+		perf_pmu_enable(cpuctx->ctx.pmu);
+		perf_ctx_unlock(cpuctx, ctx);
+	}
+	srcu_read_unlock(&pmus_srcu, idx);
+}
+
+static void perf_pm_suspend_cpu(void *unused)
+{
+	struct perf_cpu_context *cpuctx;
+	struct perf_event_context *ctx;
+	struct pmu *pmu;
+	int idx;
+
+	idx = srcu_read_lock(&pmus_srcu);
+	list_for_each_entry_rcu(pmu, &pmus, entry) {
+		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+		ctx = cpuctx->task_ctx;
+
+		perf_ctx_lock(cpuctx, ctx);
+		perf_pmu_disable(cpuctx->ctx.pmu);
+
+		perf_event_sched_in(cpuctx, ctx, current);
+
+		perf_pmu_enable(cpuctx->ctx.pmu);
+		perf_ctx_unlock(cpuctx, ctx);
+	}
+	srcu_read_unlock(&pmus_srcu, idx);
+}
+
+static int perf_resume(void)
+{
+	get_online_cpus();
+	smp_call_function(perf_pm_resume_cpu, NULL, 1);
+	put_online_cpus();
+
+	return NOTIFY_OK;
+}
+
+static int perf_suspend(void)
+{
+	get_online_cpus();
+	smp_call_function(perf_pm_suspend_cpu, NULL, 1);
+	put_online_cpus();
+
+	return NOTIFY_OK;
+}
+
+static int perf_pm(struct notifier_block *self, unsigned long action, void *ptr)
+{
+	switch (action) {
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+		return perf_resume();
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+		return perf_suspend();
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+static struct notifier_block perf_pm_notifier = {
+	.notifier_call = perf_pm,
+};
+
 void __init perf_event_init(void)
 {
 	int ret;
@@ -6931,6 +7023,7 @@ void __init perf_event_init(void)
 	perf_tp_register();
 	perf_cpu_notifier(perf_cpu_notify);
 	register_reboot_notifier(&perf_reboot_notifier);
+	register_pm_notifier(&perf_pm_notifier);
 
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
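
Read against the commit message, the notifier is meant to take counters off
the PMU before suspend or hibernation and put them back afterwards. The
user-space sketch below models only that dispatch; the enum names and the
state type are hypothetical stand-ins, not the kernel's PM_* constants. One
thing that may be worth checking in the diff as posted: perf_pm_resume_cpu()
is the helper that calls ctx_sched_out(), while perf_pm_suspend_cpu() calls
perf_event_sched_in(), which looks like the reverse of what the names
suggest. The sketch follows the commit message rather than the helper names.

```c
/* Hypothetical event codes standing in for the PM notifier actions. */
enum mock_pm_event {
	MOCK_PM_HIBERNATION_PREPARE,
	MOCK_PM_POST_HIBERNATION,
	MOCK_PM_SUSPEND_PREPARE,
	MOCK_PM_POST_SUSPEND,
	MOCK_PM_OTHER,
};

/* Hypothetical PMU state: counters scheduled in, or removed from the PMU. */
enum mock_pmu_state {
	MOCK_PMU_RUNNING,
	MOCK_PMU_QUIESCED,
};

/*
 * Returns the PMU state after handling one PM event:
 * *_PREPARE events remove the counters, POST_* events reinstate them,
 * and anything else leaves the state unchanged.
 */
static enum mock_pmu_state mock_perf_pm(enum mock_pmu_state cur,
					enum mock_pm_event ev)
{
	switch (ev) {
	case MOCK_PM_HIBERNATION_PREPARE:
	case MOCK_PM_SUSPEND_PREPARE:
		return MOCK_PMU_QUIESCED;
	case MOCK_PM_POST_HIBERNATION:
	case MOCK_PM_POST_SUSPEND:
		return MOCK_PMU_RUNNING;
	default:
		return cur;
	}
}
```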




* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-01 11:05       ` Peter Zijlstra
@ 2011-08-01 11:21         ` Cyrill Gorcunov
  2011-08-01 12:12           ` Peter Zijlstra
  2011-08-15  8:37         ` Francis Moreau
  1 sibling, 1 reply; 15+ messages in thread
From: Cyrill Gorcunov @ 2011-08-01 11:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar,
	Jiri Slaby

On Mon, Aug 01, 2011 at 01:05:02PM +0200, Peter Zijlstra wrote:
> On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
> 
> > > >> I'm seeing those kernel message when resuming:
> > > >>
> > > >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> > > >> [  524.973288] Do you have a strange power saving mode enabled?
> > > >> [  524.973289] Dazed and confused, but trying to continue
> > > >>
> > > >> I don't know if it's important or not because the system seems to work
> > > >> after but maybe it worths to report
> 
> So I guess the problem is the NMI watchdog and suspend stuff not
> shutting things down properly.. 
> 
> Argh, the PM notifier muck runs before the hotplug notifiers and it
> doesn't avoid hotplug races on its own.. what crap.
> 
> something like the below perhaps, compile tested only.. does it work?
> 

Thanks a lot, Peter! (I'm CC'ing Jiri as well; he has the same issue.)

...
> @@ -6809,7 +6810,7 @@ static void __cpuinit perf_event_init_cp
>  	struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
>  
>  	mutex_lock(&swhash->hlist_mutex);
> -	if (swhash->hlist_refcount > 0) {
> +	if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {

Shouldn't there be an rcu_dereference(swhash->swevent_hlist) here?

>  		struct swevent_hlist *hlist;
>  
>  		hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
...

	Cyrill


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-01 11:21         ` Cyrill Gorcunov
@ 2011-08-01 12:12           ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-08-01 12:12 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Francis Moreau, LKML, Don Zickus, Stephane Eranian, Ingo Molnar,
	Jiri Slaby

On Mon, 2011-08-01 at 15:21 +0400, Cyrill Gorcunov wrote:
> >       mutex_lock(&swhash->hlist_mutex);
> > -     if (swhash->hlist_refcount > 0) {
> > +     if (swhash->hlist_refcount > 0 && !swhash->swevent_hlist) {
> 
> Should not there be rcu_dereference(swhash->swevent_hlist)? 

swhash->hlist_mutex is the modifier lock for that rcu use and should
thus serialize things sufficiently.
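
The point about the modifier lock can be illustrated outside the kernel. The
sketch below is a user-space analogy, not kernel code: a pthread mutex plays
the role of hlist_mutex, and C11 atomics stand in for the RCU publish and
dereference primitives (an acquire load is used as a conservative substitute
for rcu_dereference()). All type and function names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct hlist_model {
	int dummy;
};

struct swhash_model {
	pthread_mutex_t lock;			/* plays the role of hlist_mutex */
	int refcount;				/* plays the role of hlist_refcount */
	_Atomic(struct hlist_model *) hlist;	/* published to lockless readers */
};

/*
 * Update side. Returns 1 if a list was allocated, 0 if there was
 * nothing to do, and -1 if the allocation failed.
 */
static int model_init_cpu(struct swhash_model *s)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	/*
	 * A relaxed read is enough here: every writer of s->hlist holds
	 * s->lock, so the pointer cannot change underneath us.
	 */
	if (s->refcount > 0 &&
	    !atomic_load_explicit(&s->hlist, memory_order_relaxed)) {
		struct hlist_model *h = calloc(1, sizeof(*h));

		if (!h) {
			ret = -1;
		} else {
			/* Release store so readers see an initialised object. */
			atomic_store_explicit(&s->hlist, h, memory_order_release);
			ret = 1;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Read side: takes no lock, so it needs the ordered load. */
static struct hlist_model *model_reader(struct swhash_model *s)
{
	return atomic_load_explicit(&s->hlist, memory_order_acquire);
}
```

In the kernel itself, the same intent is usually documented with
rcu_dereference_protected(), which names the lock that makes the plain
access safe.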



* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31 11:19   ` Jiri Slaby
  2011-07-31 11:26     ` Cyrill Gorcunov
@ 2011-08-02 19:48     ` Don Zickus
  2011-08-02 21:00       ` Jiri Slaby
  1 sibling, 1 reply; 15+ messages in thread
From: Don Zickus @ 2011-08-02 19:48 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List

On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote:
> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote:
> > On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> >> Hello,
> >>
> >> I'm seeing those kernel message when resuming:
> >>
> >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> >> [  524.973288] Do you have a strange power saving mode enabled?
> >> [  524.973289] Dazed and confused, but trying to continue
> >>
> >> I don't know if it's important or not because the system seems to work
> >> after but maybe it worths to report
> >>
> > 
> > Hi Francis, cpu please?
> 
> Hi, I'm seeing those too. For a longer time though. Please see
> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16

That is with 2.6.37.  Lots of watchdog/perf nmi fixes have been added
since then (though dmesg suggests the watchdog is disabled due to a broken
PMU).

I think for the Core iX family, 2.6.39 and higher have been stable.

Cheers,
Don


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-07-31  6:56 v3.0: Weird kernel log message when resuming avout NMI received Francis Moreau
  2011-07-31 11:06 ` Cyrill Gorcunov
@ 2011-08-02 19:52 ` Don Zickus
  1 sibling, 0 replies; 15+ messages in thread
From: Don Zickus @ 2011-08-02 19:52 UTC (permalink / raw)
  To: Francis Moreau; +Cc: Linux Kernel Mailing List

On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> Hello,
> 
> I'm seeing those kernel message when resuming:
> 
> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> [  524.973288] Do you have a strange power saving mode enabled?
> [  524.973289] Dazed and confused, but trying to continue
> 
> I don't know if it's important or not because the system seems to work
> after but maybe it worths to report

That depends on what you were doing when this happened, and on whether it is repeatable.

If it is from perf/nmi_watchdog it probably isn't that big of a deal
(though we should fix it).  Otherwise it could be a hardware failure.

Can you attach a dmesg log?

Cheers,
Don


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-02 19:48     ` Don Zickus
@ 2011-08-02 21:00       ` Jiri Slaby
  2011-08-03  0:52         ` Donald Zickus II
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2011-08-02 21:00 UTC (permalink / raw)
  To: Don Zickus; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List

On 08/02/2011 09:48 PM, Don Zickus wrote:
> On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote:
>> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote:
>>> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
>>>> Hello,
>>>>
>>>> I'm seeing those kernel message when resuming:
>>>>
>>>> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>>>> [  524.973288] Do you have a strange power saving mode enabled?
>>>> [  524.973289] Dazed and confused, but trying to continue
>>>>
>>>> I don't know if it's important or not because the system seems to work
>>>> after but maybe it worths to report
>>>>
>>>
>>> Hi Francis, cpu please?
>>
>> Hi, I'm seeing those too. For a longer time though. Please see
>> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16
> 
> That is with 2.6.37.  Lots of watchdog/perf nmi fixes have been added
> since then (though dmesg suggests the watchdog is disabled due to a broken
> PMU).

No, comment 16 (the link above) is about 3.0.

> I think for the Core iX family, 2.6.39 and higher have been stable.

Apparently not quite...

regards,
-- 
js


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-02 21:00       ` Jiri Slaby
@ 2011-08-03  0:52         ` Donald Zickus II
  0 siblings, 0 replies; 15+ messages in thread
From: Donald Zickus II @ 2011-08-03  0:52 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Cyrill Gorcunov, Francis Moreau, Linux Kernel Mailing List



----- Original Message -----
> On 08/02/2011 09:48 PM, Don Zickus wrote:
> > On Sun, Jul 31, 2011 at 01:19:29PM +0200, Jiri Slaby wrote:
> >> On 07/31/2011 01:06 PM, Cyrill Gorcunov wrote:
> >>> On Sun, Jul 31, 2011 at 08:56:30AM +0200, Francis Moreau wrote:
> >>>> Hello,
> >>>>
> >>>> I'm seeing those kernel message when resuming:
> >>>>
> >>>> [ 524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
> >>>> [ 524.973288] Do you have a strange power saving mode enabled?
> >>>> [ 524.973289] Dazed and confused, but trying to continue
> >>>>
> >>>> I don't know if it's important or not because the system seems to
> >>>> work
> >>>> after but maybe it worths to report
> >>>>
> >>>
> >>> Hi Francis, cpu please?
> >>
> >> Hi, I'm seeing those too. For a longer time though. Please see
> >> https://bugzilla.novell.com/show_bug.cgi?id=678882#c16
> >
> > That is with 2.6.37. Lots of watchdog/perf nmi fixes have been added
> > since then (though dmesg suggests the watchdog is disabled due to a
> > broken
> > PMU).
> 
> No, comment 16 (the link above) is about 3.0.

Hi Jiri,

My apologies.  When I see a bugzilla I just scroll to the top; habit, I guess.  It seems like Peter has a fix already.  Hopefully it works.

Cheers,
Don


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-01 11:05       ` Peter Zijlstra
  2011-08-01 11:21         ` Cyrill Gorcunov
@ 2011-08-15  8:37         ` Francis Moreau
  2011-08-16 15:07           ` Francis Moreau
  1 sibling, 1 reply; 15+ messages in thread
From: Francis Moreau @ 2011-08-15  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Cyrill Gorcunov, LKML, Don Zickus, Stephane Eranian, Ingo Molnar

Hello Peter,

Sorry for the long delay, but I was on a summer break :)

On Mon, Aug 1, 2011 at 1:05 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
>
>> > >> I'm seeing those kernel message when resuming:
>> > >>
>> > >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>> > >> [  524.973288] Do you have a strange power saving mode enabled?
>> > >> [  524.973289] Dazed and confused, but trying to continue
>> > >>
>> > >> I don't know if it's important or not because the system seems to work
>> > >> after but maybe it worths to report
>
> So I guess the problem is the NMI watchdog and suspend stuff not
> shutting things down properly..
>
> Argh, the PM notifier muck runs before the hotplug notifiers and it
> doesn't avoid hotplug races on its own.. what crap.
>
> something like the below perhaps, compile tested only.. does it work?

Thanks for the fix, I'm going to give it a test and report in 2 days.

-- 
Francis


* Re: v3.0: Weird kernel log message when resuming avout NMI received
  2011-08-15  8:37         ` Francis Moreau
@ 2011-08-16 15:07           ` Francis Moreau
  0 siblings, 0 replies; 15+ messages in thread
From: Francis Moreau @ 2011-08-16 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Cyrill Gorcunov, LKML, Don Zickus, Stephane Eranian, Ingo Molnar

Peter,

On Mon, Aug 15, 2011 at 10:37 AM, Francis Moreau <francis.moro@gmail.com> wrote:
> Hello Peter,
>
> Sorry for the loooonnng delay but summer rest :)
>
> On Mon, Aug 1, 2011 at 1:05 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> On Sun, 2011-07-31 at 19:32 +0400, Cyrill Gorcunov wrote:
>>
>>> > >> I'm seeing those kernel message when resuming:
>>> > >>
>>> > >> [  524.973283] Uhhuh. NMI received for unknown reason 3d on CPU 0.
>>> > >> [  524.973288] Do you have a strange power saving mode enabled?
>>> > >> [  524.973289] Dazed and confused, but trying to continue
>>> > >>
>>> > >> I don't know if it's important or not because the system seems to work
>>> > >> after but maybe it worths to report
>>
>> So I guess the problem is the NMI watchdog and suspend stuff not
>> shutting things down properly..
>>
>> Argh, the PM notifier muck runs before the hotplug notifiers and it
>> doesn't avoid hotplug races on its own.. what crap.
>>
>> something like the below perhaps, compile tested only.. does it work?
>
> Thanks for the fix, I'm going to give it a test and report in 2 days.
>

I'm trying to test the fix, but I have to stick with the 3.0 kernel and
your patch seems to be based on 3.1.

So I had to backport a couple of patches; otherwise I get the
following errors:

  CC      kernel/events/core.o
kernel/events/core.c: In function ‘perf_pm_resume_cpu’:
kernel/events/core.c:7342: error: implicit declaration of function ‘perf_ctx_lock’
kernel/events/core.c:7350: error: implicit declaration of function ‘perf_ctx_unlock’
kernel/events/core.c: In function ‘perf_pm_suspend_cpu’:
kernel/events/core.c:7370: error: implicit declaration of function ‘perf_event_sched_in’

Here are the patches that I backported on top of 3.0:

 perf: Collect the schedule-in rules in one function
 perf: Change and simplify ctx::is_active semantics
 perf: Simplify and fix __perf_install_in_context()
 perf: Remove task_ctx_sched_in()
 perf: Optimize event scheduling locking
 perf: Clean up 'ctx' reference counting
 perf: Optimize ctx_sched_out()

After the second suspend/resume sequence I still have the same warning
except that the reason has changed to '2d':

 kernel:[  404.091393] Uhhuh. NMI received for unknown reason 2d on CPU 0.
 kernel:[  404.091394] Do you have a strange power saving mode enabled?
 kernel:[  404.091396] Dazed and confused, but trying to continue
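
The two reason values seen in this thread, 3d and 2d, differ only in bit 4
(0x10). Assuming the reason byte is the one the x86 NMI code reads from I/O
port 0x61, where bits 7 and 6 flag SERR and IOCHK, neither bit is set in
either value, which would explain why both land on the "unknown reason" path.
On PC-compatible hardware, bit 4 of that port is commonly documented as a
refresh toggle, so the change from 3d to 2d may not be significant; that
reading is an assumption, not something established in this thread. The mask
values below are likewise quoted from memory and should be checked against
the tree in use.

```c
#include <stdint.h>

/* Assumed bit masks for the NMI status byte read from port 0x61. */
#define NMI_REASON_SERR		0x80u	/* system error reported */
#define NMI_REASON_IOCHK	0x40u	/* I/O channel check reported */

/* Returns 1 if neither known NMI source is flagged in the reason byte. */
static int nmi_reason_is_unknown(uint8_t reason)
{
	return (reason & (NMI_REASON_SERR | NMI_REASON_IOCHK)) == 0;
}

/* Returns the bits that differ between two reason bytes. */
static uint8_t nmi_reason_diff(uint8_t a, uint8_t b)
{
	return (uint8_t)(a ^ b);
}
```

With these helpers, nmi_reason_is_unknown() is true for both 0x3d and 0x2d,
and nmi_reason_diff(0x3d, 0x2d) yields 0x10.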

Thanks
-- 
Francis

