linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG: using smp_processor_id() during suspend with 2.6.25-rc8
@ 2008-04-07 13:25 Zdenek Kabelac
  2008-04-07 21:54 ` Jiri Kosina
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Kabelac @ 2008-04-07 13:25 UTC (permalink / raw)
  To: Kernel development list

Hello

This has appeared in my kernel log:

Zdenek

----

 hwsleep-0322 [00] enter_sleep_state     : Entering sleep state [S3]
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791
caller is do_machine_check+0xa9/0x500
Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32

Call Trace:
 [debug_smp_processor_id+196/208] debug_smp_processor_id+0xc4/0xd0
 [do_machine_check+169/1280] do_machine_check+0xa9/0x500
 [init_8259A+27/256] ? init_8259A+0x1b/0x100
 [mce_init+86/240] mce_init+0x56/0xf0
 [mce_resume+11/16] mce_resume+0xb/0x10
 [__sysdev_resume+32/96] __sysdev_resume+0x20/0x60
 [sysdev_resume+88/144] sysdev_resume+0x58/0x90
 [device_power_up+9/16] device_power_up+0x9/0x10
 [suspend_devices_and_enter+327/416] suspend_devices_and_enter+0x147/0x1a0
 [enter_state+326/464] enter_state+0x146/0x1d0
 [state_store+186/256] state_store+0xba/0x100
 [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20
 [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140
 [vfs_write+203/368] vfs_write+0xcb/0x170
 [sys_write+80/144] sys_write+0x50/0x90
 [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80

Extended CMOS year: 2000
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791
caller is retrigger_next_event+0x1c/0xf0
Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32

Call Trace:
 [debug_smp_processor_id+196/208] debug_smp_processor_id+0xc4/0xd0
 [retrigger_next_event+28/240] retrigger_next_event+0x1c/0xf0
 [_spin_unlock+48/96] ? _spin_unlock+0x30/0x60
 [hres_timers_resume+30/80] hres_timers_resume+0x1e/0x50
 [timekeeping_resume+293/352] timekeeping_resume+0x125/0x160
 [__sysdev_resume+32/96] __sysdev_resume+0x20/0x60
 [sysdev_resume+88/144] sysdev_resume+0x58/0x90
 [device_power_up+9/16] device_power_up+0x9/0x10
 [suspend_devices_and_enter+327/416] suspend_devices_and_enter+0x147/0x1a0
 [enter_state+326/464] enter_state+0x146/0x1d0
 [state_store+186/256] state_store+0xba/0x100
 [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20
 [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140
 [vfs_write+203/368] vfs_write+0xcb/0x170
 [sys_write+80/144] sys_write+0x50/0x90
 [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80

BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791
caller is retrigger_next_event+0x84/0xf0
Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32

Call Trace:
 [debug_smp_processor_id+196/208] debug_smp_processor_id+0xc4/0xd0
 [retrigger_next_event+132/240] retrigger_next_event+0x84/0xf0
 [hres_timers_resume+30/80] hres_timers_resume+0x1e/0x50
 [timekeeping_resume+293/352] timekeeping_resume+0x125/0x160
 [__sysdev_resume+32/96] __sysdev_resume+0x20/0x60
 [sysdev_resume+88/144] sysdev_resume+0x58/0x90
 [device_power_up+9/16] device_power_up+0x9/0x10
 [suspend_devices_and_enter+327/416] suspend_devices_and_enter+0x147/0x1a0
 [enter_state+326/464] enter_state+0x146/0x1d0
 [state_store+186/256] state_store+0xba/0x100
 [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20
 [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140
 [vfs_write+203/368] vfs_write+0xcb/0x170
 [sys_write+80/144] sys_write+0x50/0x90
 [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80

Enabling non-boot CPUs ...
lockdep: fixing up alternatives.
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 13:25 BUG: using smp_processor_id() during suspend with 2.6.25-rc8 Zdenek Kabelac
@ 2008-04-07 21:54 ` Jiri Kosina
  2008-04-07 22:02   ` Andi Kleen
  2008-04-11 10:25   ` Ingo Molnar
  0 siblings, 2 replies; 31+ messages in thread
From: Jiri Kosina @ 2008-04-07 21:54 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Kernel development list, Ingo Molnar, Thomas Gleixner

[ CCs added ]

On Mon, 7 Apr 2008, Zdenek Kabelac wrote:

> This has appeared in my kernel log:

The two patches below should fix these, shouldn't they?

>  hwsleep-0322 [00] enter_sleep_state     : Entering sleep state [S3]
> BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791
> caller is do_machine_check+0xa9/0x500

From: Jiri Kosina <jkosina@suse.cz>

x86: fix possible race in do_machine_check()

It is not safe to call smp_processor_id() in cases we are not sure someone 
will not reschedule us.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

diff --git a/arch/x86/kernel/cpu/mcheck/mce_64.c b/arch/x86/kernel/cpu/mcheck/mce_64.c
index 9a699ed..bf92375 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -189,6 +189,7 @@ void do_machine_check(struct pt_regs * regs, long error_code)
 	 * error.
 	 */
 	int kill_it = 0;
+	unsigned long flags;
 
 	atomic_inc(&mce_entry);
 
@@ -199,6 +200,8 @@ void do_machine_check(struct pt_regs * regs, long error_code)
 		goto out2;
 
 	memset(&m, 0, sizeof(struct mce));
+
+	local_irq_save(flags);
 	m.cpu = smp_processor_id();
 	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
 	/* if the restart IP is not valid, we're done for */
@@ -315,6 +318,7 @@ void do_machine_check(struct pt_regs * regs, long error_code)
 	wrmsrl(MSR_IA32_MCG_STATUS, 0);
  out2:
 	atomic_dec(&mce_entry);
+	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_X86_MCE_INTEL

> Extended CMOS year: 2000
> BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/18791
> caller is retrigger_next_event+0x1c/0xf0
> Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32

From: Jiri Kosina <jkosina@suse.cz>

hrtimer: fix possible race in retrigger_next_event()

It is not safe to perform per-cpu variable read and use it later in cases 
we are not sure someone will not reschedule us.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 98bee01..c008096 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -486,6 +486,7 @@ static void retrigger_next_event(void *arg)
 	struct hrtimer_cpu_base *base;
 	struct timespec realtime_offset;
 	unsigned long seq;
+	unsigned long flags;
 
 	if (!hrtimer_hres_active())
 		return;
@@ -497,6 +498,7 @@ static void retrigger_next_event(void *arg)
 					-wall_to_monotonic.tv_nsec);
 	} while (read_seqretry(&xtime_lock, seq));
 
+	local_irq_save(flags);
 	base = &__get_cpu_var(hrtimer_bases);
 
 	/* Adjust CLOCK_REALTIME offset */
@@ -506,6 +508,7 @@ static void retrigger_next_event(void *arg)
 
 	hrtimer_force_reprogram(base);
 	spin_unlock(&base->lock);
+	local_irq_restore(flags);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 21:54 ` Jiri Kosina
@ 2008-04-07 22:02   ` Andi Kleen
  2008-04-07 22:04     ` Jiri Kosina
  2008-04-11 10:25   ` Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2008-04-07 22:02 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Zdenek Kabelac, Kernel development list, Ingo Molnar, Thomas Gleixner

Jiri Kosina <jkosina@suse.cz> writes:
>
> It is not safe to call smp_processor_id() in cases we are not sure someone 
> will not reschedule us.

We are sure. Machine checks always run with interrupts off.

Also BTW in a non preempt-rt kernel all exceptions run non preemptable
anyways.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:02   ` Andi Kleen
@ 2008-04-07 22:04     ` Jiri Kosina
  2008-04-07 22:12       ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2008-04-07 22:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zdenek Kabelac, Kernel development list, Ingo Molnar, Thomas Gleixner

On Tue, 8 Apr 2008, Andi Kleen wrote:

> > It is not safe to call smp_processor_id() in cases we are not sure 
> > someone will not reschedule us.
> We are sure. Machine checks always run with interrupts off.

I know. However preempt_count is a little bit inconsistent in such cases 
though.

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:12       ` Andi Kleen
@ 2008-04-07 22:11         ` Jiri Kosina
  2008-04-07 22:14           ` Jiri Kosina
  2008-04-07 22:23           ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Jiri Kosina @ 2008-04-07 22:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zdenek Kabelac, Kernel development list, Ingo Molnar, Thomas Gleixner

On Tue, 8 Apr 2008, Andi Kleen wrote:

> > I know. However preempt_count is a little bit inconsistent in such cases 
> > though.
> And? interrupts off beats preempt count anyways. Why did you write the 
> patch? Was there a (incorrect) warning triggered?

Reported at http://lkml.org/lkml/2008/4/7/130

BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
IRQs off?

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:04     ` Jiri Kosina
@ 2008-04-07 22:12       ` Andi Kleen
  2008-04-07 22:11         ` Jiri Kosina
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2008-04-07 22:12 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Zdenek Kabelac, Kernel development list, Ingo Molnar,
	Thomas Gleixner

On Tue, Apr 08, 2008 at 12:04:39AM +0200, Jiri Kosina wrote:
> On Tue, 8 Apr 2008, Andi Kleen wrote:
> 
> > > It is not safe to call smp_processor_id() in cases we are not sure 
> > > someone will not reschedule us.
> > We are sure. Machine checks always run with interrupts off.
> 
> I know. However preempt_count is a little bit inconsistent in such cases 
> though.

And? interrupts off beats preempt count anyways.

Why did you write the patch? Was there a (incorrect) warning triggered?

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:11         ` Jiri Kosina
@ 2008-04-07 22:14           ` Jiri Kosina
  2008-04-07 22:23           ` Andi Kleen
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Kosina @ 2008-04-07 22:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zdenek Kabelac, Kernel development list, Ingo Molnar, Thomas Gleixner

On Tue, 8 Apr 2008, Jiri Kosina wrote:

> Reported at http://lkml.org/lkml/2008/4/7/130
> BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
> IRQs off?

... I wouldn't say so, otherwise the BUG wouldn't trigger, right? 
debug_smp_processor_id() is explicitly checking both for preempt_count 
value and irqs_disabled().

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:11         ` Jiri Kosina
  2008-04-07 22:14           ` Jiri Kosina
@ 2008-04-07 22:23           ` Andi Kleen
  2008-04-07 22:29             ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2008-04-07 22:23 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Zdenek Kabelac, Kernel development list, Ingo Molnar,
	Thomas Gleixner, rjw

On Tue, Apr 08, 2008 at 12:11:17AM +0200, Jiri Kosina wrote:
> On Tue, 8 Apr 2008, Andi Kleen wrote:
> 
> > > I know. However preempt_count is a little bit inconsistent in such cases 
> > > though.
> > And? interrupts off beats preempt count anyways. Why did you write the 
> > patch? Was there a (incorrect) warning triggered?
> 
> Reported at http://lkml.org/lkml/2008/4/7/130
> 
> BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
> IRQs off?

[cc rafael]

The mce resume is a sysdev.

sysdevs were always supposed to run completely with interrupts off. If they 
don't anymore that's some kind of higher level resume code bug which you need 
to fix there, not hack around in the low level code.

If it does that it likely broke more code too.

Obviously turning on preemption anywhere around the machine check is
fatal because it touches CPU state and if you reschedule you could
switch to another CPU and change or access the wrong CPU's state.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:23           ` Andi Kleen
@ 2008-04-07 22:29             ` Rafael J. Wysocki
  2008-04-07 22:33               ` Jiri Kosina
  2008-04-07 22:35               ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-07 22:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jiri Kosina, Zdenek Kabelac, Kernel development list,
	Ingo Molnar, Thomas Gleixner

On Tuesday, 8 of April 2008, Andi Kleen wrote:
> On Tue, Apr 08, 2008 at 12:11:17AM +0200, Jiri Kosina wrote:
> > On Tue, 8 Apr 2008, Andi Kleen wrote:
> > 
> > > > I know. However preempt_count is a little bit inconsistent in such cases 
> > > > though.
> > > And? interrupts off beats preempt count anyways. Why did you write the 
> > > patch? Was there a (incorrect) warning triggered?
> > 
> > Reported at http://lkml.org/lkml/2008/4/7/130
> > 
> > BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
> > IRQs off?
> 
> [cc rafael]
> 
> The mce resume is a sysdev.
> 
> sysdevs were always supposed to run completely with interrupts off. If they 
> don't anymore that's some kind of higher level resume code bug which you need 
> to fix there, not hack around in the low level code.

They are executed with interrupts disabled, on one CPU.

> If it does that it likely broke more code too.
> 
> Obviously turning on preemption anywhere around the machine check is
> fatal because it touches CPU state and if you reschedule you could
> switch to another CPU and change or access the wrong CPU's state.

FWIW, at the point when sysdevs are resumed we are single-threaded.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:29             ` Rafael J. Wysocki
@ 2008-04-07 22:33               ` Jiri Kosina
  2008-04-07 22:53                 ` Zdenek Kabelac
                                   ` (2 more replies)
  2008-04-07 22:35               ` Andi Kleen
  1 sibling, 3 replies; 31+ messages in thread
From: Jiri Kosina @ 2008-04-07 22:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andi Kleen, Zdenek Kabelac, Kernel development list, Ingo Molnar,
	Thomas Gleixner

On Tue, 8 Apr 2008, Rafael J. Wysocki wrote:

> > The mce resume is a sysdev.
> > sysdevs were always supposed to run completely with interrupts off. If they 
> > don't anymore that's some kind of higher level resume code bug which you need 
> > to fix there, not hack around in the low level code.
> They are executed with interrupts disabled, on one CPU.

So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() 
triggers the warning? Apparently preempt_count is zero, irqs_disabled() 
returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.

So there clearly is a bug somewhere.

> > Obviously turning on preemption anywhere around the machine check is
> > fatal because it touches CPU state and if you reschedule you could
> > switch to another CPU and change or access the wrong CPU's state.
> FWIW, at the point when sysdevs are resumed we are single-threaded.

Is that really relevant here? We still could be switched over to another 
CPU, and that would break things.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:29             ` Rafael J. Wysocki
  2008-04-07 22:33               ` Jiri Kosina
@ 2008-04-07 22:35               ` Andi Kleen
  2008-04-07 22:54                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2008-04-07 22:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andi Kleen, Jiri Kosina, Zdenek Kabelac, Kernel development list,
	Ingo Molnar, Thomas Gleixner

On Tue, Apr 08, 2008 at 12:29:30AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, 8 of April 2008, Andi Kleen wrote:
> > On Tue, Apr 08, 2008 at 12:11:17AM +0200, Jiri Kosina wrote:
> > > On Tue, 8 Apr 2008, Andi Kleen wrote:
> > > 
> > > > > I know. However preempt_count is a little bit inconsistent in such cases 
> > > > > though.
> > > > And? interrupts off beats preempt count anyways. Why did you write the 
> > > > patch? Was there a (incorrect) warning triggered?
> > > 
> > > Reported at http://lkml.org/lkml/2008/4/7/130
> > > 
> > > BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
> > > IRQs off?
> > 
> > [cc rafael]
> > 
> > The mce resume is a sysdev.
> > 
> > sysdevs were always supposed to run completely with interrupts off. If they 
> > don't anymore that's some kind of higher level resume code bug which you need 
> > to fix there, not hack around in the low level code.
> 
> They are executed with interrupts disabled, on one CPU.

Well then someone enables them incorrectly, see the report above.

> 
> > If it does that it likely broke more code too.
> > 
> > Obviously turning on preemption anywhere around the machine check is
> > fatal because it touches CPU state and if you reschedule you could
> > switch to another CPU and change or access the wrong CPU's state.
> 
> FWIW, at the point when sysdevs are resumed we are single-threaded.

You mean single CPUed? Even a single thread could switch to another CPU.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:33               ` Jiri Kosina
@ 2008-04-07 22:53                 ` Zdenek Kabelac
  2008-04-07 23:10                   ` Rafael J. Wysocki
  2008-04-08  8:09                   ` Jiri Kosina
  2008-04-07 22:56                 ` Rafael J. Wysocki
  2008-04-10  9:45                 ` Pavel Machek
  2 siblings, 2 replies; 31+ messages in thread
From: Zdenek Kabelac @ 2008-04-07 22:53 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Andi Kleen, Kernel development list,
	Ingo Molnar, Thomas Gleixner

2008/4/8, Jiri Kosina <jkosina@suse.cz>:
> On Tue, 8 Apr 2008, Rafael J. Wysocki wrote:
>
>  > > The mce resume is a sysdev.
>  > > sysdevs were always supposed to run completely with interrupts off. If they
>  > > don't anymore that's some kind of higher level resume code bug which you need
>  > > to fix there, not hack around in the low level code.
>  > They are executed with interrupts disabled, on one CPU.
>
>
> So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id()
>  triggers the warning? Apparently preempt_count is zero, irqs_disabled()
>  returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.
>
>  So there clearly is a bug somewhere.
>
>
>  > > Obviously turning on preemption anywhere around the machine check is
>  > > fatal because it touches CPU state and if you reschedule you could
>  > > switch to another CPU and change or access the wrong CPU's state.
>  > FWIW, at the point when sysdevs are resumed we are single-threaded.
>
>
> Is that really relevant here? We still could be switched over to another
>  CPU, and that would break things.


To avoid some mystery - these two warns precede warnings I've
initially posted here - the reason why I've not posted also these 2
kvm related warns is that kvm maintainers already know about this
problem and said in the past that they will eventually fix them (If I
remember correctly)

Warnings and bug looked somewhat unrelated - but I could be wrong and
the mystery irq could be mangles in kvm....


ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...
kvm: disabling virtualization on CPU1
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp_64.c:424 smp_call_function_single+0xea/0x100()
Modules linked in: cpufreq_userspace tun i915 drm ipt_MASQUERADE
iptable_nat nf_na
t nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables bridge llc nfsd lockd nfs_acl auth
_rpcgss exportfs autofs4 sunrpc binfmt_misc dm_mirror dm_log
dm_multipath dm_mod uinput kvm_intel kvm snd_hda_intel snd_seq_oss snd
_seq_midi_event snd_seq arc4 snd_seq_device snd_pcm_oss ecb
snd_mixer_oss snd_pcm crypto_blkcipher video snd_timer iTCO_wdt
cryptom
gr crypto_algapi rtc_cmos iwl3945 sdhci mmc_core thinkpad_acpi snd
mac80211 psmouse evdev iTCO_vendor_support rtc_core i2c_i801 out
put rtc_lib soundcore snd_page_alloc e1000e sr_mod intel_agp battery
ac backlight nvram serio_raw i2c_core cfg80211 cdrom button uh
ci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode]
Pid: 18951, comm: kstopmachine Not tainted 2.6.25-rc8 #32

Call Trace:
 [warn_on_slowpath+95/144] warn_on_slowpath+0x5f/0x90
 [vprintk+560/1136] ? vprintk+0x230/0x470
 [hrtick_set+57/304] ? hrtick_set+0x39/0x130
 [_end+108902941/2109239016] ? :kvm:decache_vcpus_on_cpu+0x65/0xe0
 [smp_call_function_single+234/256] smp_call_function_single+0xea/0x100
 [_end+109030358/2109239016] :kvm_intel:vmx_vcpu_decache+0x2e/0x40
 [_end+108903005/2109239016] :kvm:decache_vcpus_on_cpu+0xa5/0xe0
 [_end+108882341/2109239016] :kvm:hardware_disable+0x2d/0x40
 [_end+108882503/2109239016] :kvm:kvm_cpu_hotplug+0x8f/0xe0
 [notifier_call_chain+63/128] notifier_call_chain+0x3f/0x80
 [raw_notifier_call_chain+17/32] raw_notifier_call_chain+0x11/0x20
 [take_cpu_down+27/64] take_cpu_down+0x1b/0x40
 [do_stop+328/576] do_stop+0x148/0x240
 [do_stop+0/576] ? do_stop+0x0/0x240
 [do_stop+0/576] ? do_stop+0x0/0x240
 [kthread+73/144] kthread+0x49/0x90
 [child_rip+10/18] child_rip+0xa/0x12
 [restore_args+0/48] ? restore_args+0x0/0x30
 [kthread+0/144] ? kthread+0x0/0x90
 [child_rip+0/18] ? child_rip+0x0/0x12

---[ end trace ca143223eefdc828 ]---
CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU1 is down
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp_64.c:424 smp_call_function_single+0xea/0x100()
Modules linked in: cpufreq_userspace tun i915 drm ipt_MASQUERADE
iptable_nat nf_na
ECT xt_tcpudp iptable_filter ip_tables x_tables bridge llc nfsd lockd
nfs_acl auth
rror dm_log dm_multipath dm_mod uinput kvm_intel kvm snd_hda_intel
snd_seq_oss snd
cm_oss ecb snd_mixer_oss snd_pcm crypto_blkcipher video snd_timer
iTCO_wdt cryptom
thinkpad_acpi snd mac80211 psmouse evdev iTCO_vendor_support rtc_core
i2c_i801 out
d intel_agp battery ac backlight nvram serio_raw i2c_core cfg80211
cdrom button uh
microcode]
Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32

Call Trace:
 [warn_on_slowpath+95/144] warn_on_slowpath+0x5f/0x90
 [_spin_unlock_irqrestore+69/144] ? _spin_unlock_irqrestore+0x45/0x90
 [release_console_sem+535/560] ? release_console_sem+0x217/0x230
 [vprintk+560/1136] ? vprintk+0x230/0x470
 [_end+108902941/2109239016] ? :kvm:decache_vcpus_on_cpu+0x65/0xe0
 [smp_call_function_single+234/256] smp_call_function_single+0xea/0x100
 [_end+109030358/2109239016] :kvm_intel:vmx_vcpu_decache+0x2e/0x40
 [_end+108903005/2109239016] :kvm:decache_vcpus_on_cpu+0xa5/0xe0
 [_end+108882341/2109239016] :kvm:hardware_disable+0x2d/0x40
 [_end+108882595/2109239016] :kvm:kvm_suspend+0xb/0x10
 [sysdev_suspend+209/560] sysdev_suspend+0xd1/0x230
 [device_power_down+177/256] device_power_down+0xb1/0x100
 [suspend_devices_and_enter+300/416] suspend_devices_and_enter+0x12c/0x1a0
 [enter_state+326/464] enter_state+0x146/0x1d0
 [state_store+186/256] state_store+0xba/0x100
 [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20
 [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140
 [vfs_write+203/368] vfs_write+0xcb/0x170
 [sys_write+80/144] sys_write+0x50/0x90
 [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80

---[ end trace ca143223eefdc828 ]---
Extended CMOS year: 2000

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:35               ` Andi Kleen
@ 2008-04-07 22:54                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-07 22:54 UTC (permalink / raw)
  To: Andi Kleen, Zdenek Kabelac
  Cc: Jiri Kosina, Kernel development list, Ingo Molnar, Thomas Gleixner

On Tuesday, 8 of April 2008, Andi Kleen wrote:
> On Tue, Apr 08, 2008 at 12:29:30AM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 8 of April 2008, Andi Kleen wrote:
> > > On Tue, Apr 08, 2008 at 12:11:17AM +0200, Jiri Kosina wrote:
> > > > On Tue, 8 Apr 2008, Andi Kleen wrote:
> > > > 
> > > > > > I know. However preempt_count is a little bit inconsistent in such cases 
> > > > > > though.
> > > > > And? interrupts off beats preempt count anyways. Why did you write the 
> > > > > patch? Was there a (incorrect) warning triggered?
> > > > 
> > > > Reported at http://lkml.org/lkml/2008/4/7/130
> > > > 
> > > > BTW is also mce_init() (called from mce_resume()) guaranteed to run with 
> > > > IRQs off?
> > > 
> > > [cc rafael]
> > > 
> > > The mce resume is a sysdev.
> > > 
> > > sysdevs were always supposed to run completely with interrupts off. If they 
> > > don't anymore that's some kind of higher level resume code bug which you need 
> > > to fix there, not hack around in the low level code.
> > 
> > They are executed with interrupts disabled, on one CPU.
> 
> Well then someone enables them incorrectly, see the report above.

Hm, well.  I really don't think that ACPI or the core does it, but anyway
Zdenek, could you please apply the appended patch and see if it
produces any additional warnings?

If it doesn't, then probably one of the sysdevs does it in its ->resume()
routine, which also will be easy to check against.

> > 
> > > If it does that it likely broke more code too.
> > > 
> > > Obviously turning on preemption anywhere around the machine check is
> > > fatal because it touches CPU state and if you reschedule you could
> > > switch to another CPU and change or access the wrong CPU's state.
> > 
> > FWIW, at the point when sysdevs are resumed we are single-threaded.
> 
> You mean single CPUed? Even a single thread could switch to another CPU.

Signle CPUed too.

Thanks,
Rafael

---
---
 drivers/acpi/sleep/main.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-2.6/drivers/acpi/sleep/main.c
===================================================================
--- linux-2.6.orig/drivers/acpi/sleep/main.c
+++ linux-2.6/drivers/acpi/sleep/main.c
@@ -141,9 +141,13 @@ static int acpi_pm_enter(suspend_state_t
 		break;
 	}
 
+	WARN_ON(!irqs_disabled());
+
 	/* Reprogram control registers and execute _BFS */
 	acpi_leave_sleep_state_prep(acpi_state);
 
+	WARN_ON(!irqs_disabled());
+
 	/* ACPI 3.0 specs (P62) says that it's the responsibility
 	 * of the OSPM to clear the status bit [ implying that the
 	 * POWER_BUTTON event should not reach userspace ]
@@ -158,6 +162,8 @@ static int acpi_pm_enter(suspend_state_t
 	 */
 	acpi_hw_disable_all_gpes();
 
+	WARN_ON(!irqs_disabled());
+
 	local_irq_restore(flags);
 	printk(KERN_DEBUG "Back to C!\n");
 
@@ -165,6 +171,8 @@ static int acpi_pm_enter(suspend_state_t
 	if (acpi_state == ACPI_STATE_S3)
 		acpi_restore_state_mem();
 
+	WARN_ON(!irqs_disabled());
+
 	return ACPI_SUCCESS(status) ? 0 : -EFAULT;
 }
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:33               ` Jiri Kosina
  2008-04-07 22:53                 ` Zdenek Kabelac
@ 2008-04-07 22:56                 ` Rafael J. Wysocki
  2008-04-10  9:45                 ` Pavel Machek
  2 siblings, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-07 22:56 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Zdenek Kabelac, Kernel development list, Ingo Molnar,
	Thomas Gleixner

On Tuesday, 8 of April 2008, Jiri Kosina wrote:
> On Tue, 8 Apr 2008, Rafael J. Wysocki wrote:
> 
> > > The mce resume is a sysdev.
> > > sysdevs were always supposed to run completely with interrupts off. If they 
> > > don't anymore that's some kind of higher level resume code bug which you need 
> > > to fix there, not hack around in the low level code.
> > They are executed with interrupts disabled, on one CPU.
> 
> So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() 
> triggers the warning? Apparently preempt_count is zero, irqs_disabled() 
> returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.
> 
> So there clearly is a bug somewhere.

Yes, there is.  Still, I wonder why doesn't everyone see it.
 
> > > Obviously turning on preemption anywhere around the machine check is
> > > fatal because it touches CPU state and if you reschedule you could
> > > switch to another CPU and change or access the wrong CPU's state.
> > FWIW, at the point when sysdevs are resumed we are single-threaded.
> 
> Is that really relevant here? We still could be switched over to another 
> CPU, and that would break things.

No, we couldn't, because the other CPUs are off at this point.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:53                 ` Zdenek Kabelac
@ 2008-04-07 23:10                   ` Rafael J. Wysocki
  2008-04-08  8:09                   ` Jiri Kosina
  1 sibling, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-07 23:10 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Jiri Kosina, Andi Kleen, Kernel development list, Ingo Molnar,
	Thomas Gleixner

On Tuesday, 8 of April 2008, Zdenek Kabelac wrote:
> 2008/4/8, Jiri Kosina <jkosina@suse.cz>:
> > On Tue, 8 Apr 2008, Rafael J. Wysocki wrote:
> >
> >  > > The mce resume is a sysdev.
> >  > > sysdevs were always supposed to run completely with interrupts off. If they
> >  > > don't anymore that's some kind of higher level resume code bug which you need
> >  > > to fix there, not hack around in the low level code.
> >  > They are executed with interrupts disabled, on one CPU.
> >
> >
> > So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id()
> >  triggers the warning? Apparently preempt_count is zero, irqs_disabled()
> >  returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.
> >
> >  So there clearly is a bug somewhere.
> >
> >
> >  > > Obviously turning on preemption anywhere around the machine check is
> >  > > fatal because it touches CPU state and if you reschedule you could
> >  > > switch to another CPU and change or access the wrong CPU's state.
> >  > FWIW, at the point when sysdevs are resumed we are single-threaded.
> >
> >
> > Is that really relevant here? We still could be switched over to another
> >  CPU, and that would break things.
> 
> 
> To avoid some mystery - these two warns precede warnings I've
> initially posted here - the reason why I've not posted also these 2
> kvm related warns is that kvm maintainers already know about this
> problem and said in the past that they will eventually fix them (If I
> remember correctly)
> 
> Warnings and bug looked somewhat unrelated - but I could be wrong and
> the mystery irq could be mangles in kvm....

Can you please test without the kvm, then?

> ACPI: Preparing to enter system sleep state S3
> Disabling non-boot CPUs ...
> kvm: disabling virtualization on CPU1
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/smp_64.c:424 smp_call_function_single+0xea/0x100()
> Modules linked in: cpufreq_userspace tun i915 drm ipt_MASQUERADE
> iptable_nat nf_na
> t nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp
> iptable_filter ip_tables x_tables bridge llc nfsd lockd nfs_acl auth
> _rpcgss exportfs autofs4 sunrpc binfmt_misc dm_mirror dm_log
> dm_multipath dm_mod uinput kvm_intel kvm snd_hda_intel snd_seq_oss snd
> _seq_midi_event snd_seq arc4 snd_seq_device snd_pcm_oss ecb
> snd_mixer_oss snd_pcm crypto_blkcipher video snd_timer iTCO_wdt
> cryptom
> gr crypto_algapi rtc_cmos iwl3945 sdhci mmc_core thinkpad_acpi snd
> mac80211 psmouse evdev iTCO_vendor_support rtc_core i2c_i801 out
> put rtc_lib soundcore snd_page_alloc e1000e sr_mod intel_agp battery
> ac backlight nvram serio_raw i2c_core cfg80211 cdrom button uh
> ci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode]
> Pid: 18951, comm: kstopmachine Not tainted 2.6.25-rc8 #32
> 
> Call Trace:
>  [warn_on_slowpath+95/144] warn_on_slowpath+0x5f/0x90
>  [vprintk+560/1136] ? vprintk+0x230/0x470
>  [hrtick_set+57/304] ? hrtick_set+0x39/0x130
>  [_end+108902941/2109239016] ? :kvm:decache_vcpus_on_cpu+0x65/0xe0
>  [smp_call_function_single+234/256] smp_call_function_single+0xea/0x100
>  [_end+109030358/2109239016] :kvm_intel:vmx_vcpu_decache+0x2e/0x40
>  [_end+108903005/2109239016] :kvm:decache_vcpus_on_cpu+0xa5/0xe0
>  [_end+108882341/2109239016] :kvm:hardware_disable+0x2d/0x40
>  [_end+108882503/2109239016] :kvm:kvm_cpu_hotplug+0x8f/0xe0
>  [notifier_call_chain+63/128] notifier_call_chain+0x3f/0x80
>  [raw_notifier_call_chain+17/32] raw_notifier_call_chain+0x11/0x20
>  [take_cpu_down+27/64] take_cpu_down+0x1b/0x40
>  [do_stop+328/576] do_stop+0x148/0x240
>  [do_stop+0/576] ? do_stop+0x0/0x240
>  [do_stop+0/576] ? do_stop+0x0/0x240
>  [kthread+73/144] kthread+0x49/0x90
>  [child_rip+10/18] child_rip+0xa/0x12
>  [restore_args+0/48] ? restore_args+0x0/0x30
>  [kthread+0/144] ? kthread+0x0/0x90
>  [child_rip+0/18] ? child_rip+0x0/0x12
> 
> ---[ end trace ca143223eefdc828 ]---
> CPU 1 is now offline
> lockdep: fixing up alternatives.
> SMP alternatives: switching to UP code
> CPU1 is down
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/smp_64.c:424 smp_call_function_single+0xea/0x100()
> Modules linked in: cpufreq_userspace tun i915 drm ipt_MASQUERADE
> iptable_nat nf_na
> ECT xt_tcpudp iptable_filter ip_tables x_tables bridge llc nfsd lockd
> nfs_acl auth
> rror dm_log dm_multipath dm_mod uinput kvm_intel kvm snd_hda_intel
> snd_seq_oss snd
> cm_oss ecb snd_mixer_oss snd_pcm crypto_blkcipher video snd_timer
> iTCO_wdt cryptom
> thinkpad_acpi snd mac80211 psmouse evdev iTCO_vendor_support rtc_core
> i2c_i801 out
> d intel_agp battery ac backlight nvram serio_raw i2c_core cfg80211
> cdrom button uh
> microcode]
> Pid: 18791, comm: pm-suspend Not tainted 2.6.25-rc8 #32
> 
> Call Trace:
>  [warn_on_slowpath+95/144] warn_on_slowpath+0x5f/0x90
>  [_spin_unlock_irqrestore+69/144] ? _spin_unlock_irqrestore+0x45/0x90
>  [release_console_sem+535/560] ? release_console_sem+0x217/0x230
>  [vprintk+560/1136] ? vprintk+0x230/0x470
>  [_end+108902941/2109239016] ? :kvm:decache_vcpus_on_cpu+0x65/0xe0
>  [smp_call_function_single+234/256] smp_call_function_single+0xea/0x100
>  [_end+109030358/2109239016] :kvm_intel:vmx_vcpu_decache+0x2e/0x40
>  [_end+108903005/2109239016] :kvm:decache_vcpus_on_cpu+0xa5/0xe0
>  [_end+108882341/2109239016] :kvm:hardware_disable+0x2d/0x40

This looks suspiciously.

Apparently, kvm_suspend() does something not entirely sane and that may
very well confuse things later on.

>  [_end+108882595/2109239016] :kvm:kvm_suspend+0xb/0x10
>  [sysdev_suspend+209/560] sysdev_suspend+0xd1/0x230
>  [device_power_down+177/256] device_power_down+0xb1/0x100
>  [suspend_devices_and_enter+300/416] suspend_devices_and_enter+0x12c/0x1a0
>  [enter_state+326/464] enter_state+0x146/0x1d0
>  [state_store+186/256] state_store+0xba/0x100
>  [kobj_attr_store+23/32] kobj_attr_store+0x17/0x20
>  [sysfs_write_file+202/320] sysfs_write_file+0xca/0x140
>  [vfs_write+203/368] vfs_write+0xcb/0x170
>  [sys_write+80/144] sys_write+0x50/0x90
>  [system_call_after_swapgs+123/128] system_call_after_swapgs+0x7b/0x80
> 
> ---[ end trace ca143223eefdc828 ]---
> Extended CMOS year: 2000

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:53                 ` Zdenek Kabelac
  2008-04-07 23:10                   ` Rafael J. Wysocki
@ 2008-04-08  8:09                   ` Jiri Kosina
  2008-04-10 11:16                     ` Zdenek Kabelac
  1 sibling, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2008-04-08  8:09 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: Rafael J. Wysocki, Andi Kleen, Kernel development list,
	Ingo Molnar, Thomas Gleixner

On Tue, 8 Apr 2008, Zdenek Kabelac wrote:

> To avoid some mystery - these two warns precede warnings I've initially 
> posted here - the reason why I've not posted also these 2 kvm related 
> warns is that kvm maintainers already know about this problem and said 
> in the past that they will eventually fix them (If I remember correctly)
> Warnings and bug looked somewhat unrelated - but I could be wrong and
> the mystery irq could be mangles in kvm....

Yes, please try to reproduce without KVM. It could happen that some KVM 
code wrongly enabled IRQs somewhere.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 22:33               ` Jiri Kosina
  2008-04-07 22:53                 ` Zdenek Kabelac
  2008-04-07 22:56                 ` Rafael J. Wysocki
@ 2008-04-10  9:45                 ` Pavel Machek
  2008-04-10 23:25                   ` Jiri Kosina
  2 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2008-04-10  9:45 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Ingo Molnar, Thomas Gleixner

On Tue 2008-04-08 00:33:48, Jiri Kosina wrote:
> On Tue, 8 Apr 2008, Rafael J. Wysocki wrote:
> 
> > > The mce resume is a sysdev.
> > > sysdevs were always supposed to run completely with interrupts off. If they 
> > > don't anymore that's some kind of higher level resume code bug which you need 
> > > to fix there, not hack around in the low level code.
> > They are executed with interrupts disabled, on one CPU.
> 
> So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() 
> triggers the warning? Apparently preempt_count is zero, irqs_disabled() 
> returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.

We are single-threaded because we 'unplugged' all the other cpus...
but I'm not sure the BUG() code realises that...

> So there clearly is a bug somewhere.
> 
> > > Obviously turning on preemption anywhere around the machine check is
> > > fatal because it touches CPU state and if you reschedule you could
> > > switch to another CPU and change or access the wrong CPU's state.
> > FWIW, at the point when sysdevs are resumed we are single-threaded.
> 
> Is that really relevant here? We still could be switched over to another 
> CPU, and that would break things.

There are no other CPUs.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-08  8:09                   ` Jiri Kosina
@ 2008-04-10 11:16                     ` Zdenek Kabelac
  0 siblings, 0 replies; 31+ messages in thread
From: Zdenek Kabelac @ 2008-04-10 11:16 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Andi Kleen, Kernel development list,
	Ingo Molnar, Thomas Gleixner

2008/4/8, Jiri Kosina <jkosina@suse.cz>:
> On Tue, 8 Apr 2008, Zdenek Kabelac wrote:
>
>  > To avoid some mystery - these two warns precede warnings I've initially
>  > posted here - the reason why I've not posted also these 2 kvm related
>  > warns is that kvm maintainers already know about this problem and said
>  > in the past that they will eventually fix them (If I remember correctly)
>  > Warnings and bug looked somewhat unrelated - but I could be wrong and
>  > the mystery irq could be mangles in kvm....
>
>
> Yes, please try to reproduce without KVM. It could happen that some KVM
>  code wrongly enabled IRQs somewhere.

Well I'm unable to simulate this warning message - it happened to me
once even with kvm, so for now I have no idea how to easily replicate
this problem. But I'll keep an eye on this.

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-10  9:45                 ` Pavel Machek
@ 2008-04-10 23:25                   ` Jiri Kosina
  2008-04-11 10:49                     ` Pavel Machek
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2008-04-10 23:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Ingo Molnar, Thomas Gleixner

On Thu, 10 Apr 2008, Pavel Machek wrote:

> > > > sysdevs were always supposed to run completely with interrupts off. If they 
> > > > don't anymore that's some kind of higher level resume code bug which you need 
> > > > to fix there, not hack around in the low level code.
> > > They are executed with interrupts disabled, on one CPU.
> > So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() 
> > triggers the warning? Apparently preempt_count is zero, irqs_disabled() 
> > returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.
> We are single-threaded because we 'unplugged' all the other cpus... but 
> I'm not sure the BUG() code realises that...

That's unrelated, right? The real problem here is that irqs_disabled() 
call inside debug_smp_processor_id() signalized that interrupts are 
enabled, and that (correctly) triggered the bug.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-07 21:54 ` Jiri Kosina
  2008-04-07 22:02   ` Andi Kleen
@ 2008-04-11 10:25   ` Ingo Molnar
  2008-04-11 10:34     ` Jiri Kosina
  2008-04-11 10:39     ` Andi Kleen
  1 sibling, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-04-11 10:25 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Zdenek Kabelac, Kernel development list, Thomas Gleixner


* Jiri Kosina <jkosina@suse.cz> wrote:

> [ CCs added ]

thanks Jiri, i've applied both patches of yours. (let me know if there 
are any updated patches)

do they solve the problem?

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:25   ` Ingo Molnar
@ 2008-04-11 10:34     ` Jiri Kosina
  2008-04-11 10:40       ` Ingo Molnar
  2008-04-11 10:39     ` Andi Kleen
  1 sibling, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2008-04-11 10:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Zdenek Kabelac, Kernel development list, Thomas Gleixner

On Fri, 11 Apr 2008, Ingo Molnar wrote:

> > [ CCs added ]
> thanks Jiri, i've applied both patches of yours. (let me know if there 
> are any updated patches) do they solve the problem?

Hi Ingo,

thanks, but I don't think they are the proper fix anymore. We'd rather 
figure out why interrupts were enabled in the first place while the code 
was running.

Which is quite difficult, because Zdenek doesn't seem to be able to 
reproduce it, so we even don't know whether it has been caused by KVM or 
not.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:25   ` Ingo Molnar
  2008-04-11 10:34     ` Jiri Kosina
@ 2008-04-11 10:39     ` Andi Kleen
  1 sibling, 0 replies; 31+ messages in thread
From: Andi Kleen @ 2008-04-11 10:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Zdenek Kabelac, Kernel development list, Thomas Gleixner

Ingo Molnar <mingo@elte.hu> writes:

> * Jiri Kosina <jkosina@suse.cz> wrote:
>
>> [ CCs added ]
>
> thanks Jiri, i've applied both patches of yours. (let me know if there 
> are any updated patches)

Please unapply them, as discussed in the earlier thread they are not
the correct fix. The original bug is still not found unfortunately, but it's 
most likely somewhere in the KVM suspend/resume code.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:34     ` Jiri Kosina
@ 2008-04-11 10:40       ` Ingo Molnar
  0 siblings, 0 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-04-11 10:40 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Zdenek Kabelac, Kernel development list, Thomas Gleixner


* Jiri Kosina <jkosina@suse.cz> wrote:

> On Fri, 11 Apr 2008, Ingo Molnar wrote:
> 
> > > [ CCs added ]
> > thanks Jiri, i've applied both patches of yours. (let me know if there 
> > are any updated patches) do they solve the problem?
> 
> Hi Ingo,
> 
> thanks, but I don't think they are the proper fix anymore. We'd rather 
> figure out why interrupts were enabled in the first place while the 
> code was running.
> 
> Which is quite difficult, because Zdenek doesn't seem to be able to 
> reproduce it, so we even don't know whether it has been caused by KVM 
> or not.

if sticking a WARN_ON_ONCE(!irqs_disabled()) somewhere would help in 
making the breakage more visible i'd be glad to apply such a patch.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-10 23:25                   ` Jiri Kosina
@ 2008-04-11 10:49                     ` Pavel Machek
  2008-04-11 10:51                       ` Ingo Molnar
  0 siblings, 1 reply; 31+ messages in thread
From: Pavel Machek @ 2008-04-11 10:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Ingo Molnar, Thomas Gleixner

On Fri 2008-04-11 01:25:13, Jiri Kosina wrote:
> On Thu, 10 Apr 2008, Pavel Machek wrote:
> 
> > > > > sysdevs were always supposed to run completely with interrupts off. If they 
> > > > > don't anymore that's some kind of higher level resume code bug which you need 
> > > > > to fix there, not hack around in the low level code.
> > > > They are executed with interrupts disabled, on one CPU.
> > > So, any idea why mce_resume() -> mce_init() -> debug_smp_processor_id() 
> > > triggers the warning? Apparently preempt_count is zero, irqs_disabled() 
> > > returns false, and cpumask_of_cpu() is not equal to current->cpus_allowed.

...that is okay in case of just one CPU.

> > We are single-threaded because we 'unplugged' all the other cpus... but 
> > I'm not sure the BUG() code realises that...
> 
> That's unrelated, right? The real problem here is that irqs_disabled() 
> call inside debug_smp_processor_id() signalized that interrupts are 
> enabled, and that (correctly) triggered the bug.

Well, we can discuss it personally.

Yes, it is probably bug that interrupts are enabled there. No, the bug
can't potentially bite the user because there should be just one CPU
"plugged in"... so it is kind of false-positive, too..

Does the warning go away with this?


									Pavel

diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 6c90fb9..8195c37 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -35,6 +35,13 @@ unsigned int debug_smp_processor_id(void
 		goto out;
 
 	/*
+	 * It is valid to assume CPU-locality if there's just one
+	 * CPU active
+	 */
+	if (num_online_cpus() == 1)
+		goto out;
+
+	/*
 	 * Avoid recursion:
 	 */
 	preempt_disable();

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:49                     ` Pavel Machek
@ 2008-04-11 10:51                       ` Ingo Molnar
  2008-04-11 10:54                         ` Pavel Machek
  2008-04-11 11:37                         ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2008-04-11 10:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner


* Pavel Machek <pavel@ucw.cz> wrote:

> diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
> index 6c90fb9..8195c37 100644
> --- a/lib/smp_processor_id.c
> +++ b/lib/smp_processor_id.c
> @@ -35,6 +35,13 @@ unsigned int debug_smp_processor_id(void
>  		goto out;
>  
>  	/*
> +	 * It is valid to assume CPU-locality if there's just one
> +	 * CPU active
> +	 */
> +	if (num_online_cpus() == 1)
> +		goto out;

thanks Pavel, i picked this up into sched-devel.git - it makes sense 
independently of whether it solves the warning.

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:51                       ` Ingo Molnar
@ 2008-04-11 10:54                         ` Pavel Machek
  2008-04-11 11:09                           ` Ingo Molnar
  2008-04-11 15:29                           ` Rafael J. Wysocki
  2008-04-11 11:37                         ` Andi Kleen
  1 sibling, 2 replies; 31+ messages in thread
From: Pavel Machek @ 2008-04-11 10:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner

On Fri 2008-04-11 12:51:09, Ingo Molnar wrote:
> 
> * Pavel Machek <pavel@ucw.cz> wrote:
> 
> > diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
> > index 6c90fb9..8195c37 100644
> > --- a/lib/smp_processor_id.c
> > +++ b/lib/smp_processor_id.c
> > @@ -35,6 +35,13 @@ unsigned int debug_smp_processor_id(void
> >  		goto out;
> >  
> >  	/*
> > +	 * It is valid to assume CPU-locality if there's just one
> > +	 * CPU active
> > +	 */
> > +	if (num_online_cpus() == 1)
> > +		goto out;
> 
> thanks Pavel, i picked this up into sched-devel.git - it makes sense 
> independently of whether it solves the warning.

Thanks!

(I just want to fix the underlying problem in suspend, too. I guess
I'll just do something like

diff --git a/drivers/base/sys.c b/drivers/base/sys.c
index 8e13fd9..adb7850 100644
--- a/drivers/base/sys.c
+++ b/drivers/base/sys.c
@@ -367,6 +367,7 @@ int sysdev_suspend(pm_message_t state)
 			/* Call auxillary drivers first */
 			list_for_each_entry(drv, &cls->drivers, entry) {
 				if (drv->suspend) {
+					BUG_ON(!in_interrupt());
 					ret = drv->suspend(sysdev, state);
 					if (ret)
 						goto aux_driver;
@@ -442,6 +443,7 @@ int sysdev_resume(void)
 		list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
 			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
 
+			BUG_ON(!in_interrupt());
 			__sysdev_resume(sysdev);
 		}
 	}

...to catch this kind of problems early).
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:54                         ` Pavel Machek
@ 2008-04-11 11:09                           ` Ingo Molnar
  2008-04-12  9:51                             ` Pavel Machek
  2008-04-11 15:29                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2008-04-11 11:09 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner


* Pavel Machek <pavel@ucw.cz> wrote:

> (I just want to fix the underlying problem in suspend, too. I guess
> I'll just do something like

>  				if (drv->suspend) {
> +					BUG_ON(!in_interrupt());
>  					ret = drv->suspend(sysdev, state);

>  			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
>  
> +			BUG_ON(!in_interrupt());
>  			__sysdev_resume(sysdev);

small note: please always use WARN_ON_ONCE() in such cases. That way 
people's boxes dont blow up and the problem will be debuggable if the 
box survives despite the error condition. (and thus it can be relayed to 
kerneloops.org - all major distros have the package included already and 
Fedora enables the client by default)

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:51                       ` Ingo Molnar
  2008-04-11 10:54                         ` Pavel Machek
@ 2008-04-11 11:37                         ` Andi Kleen
  2008-04-11 15:27                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2008-04-11 11:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Jiri Kosina, Rafael J. Wysocki, Andi Kleen,
	Zdenek Kabelac, Kernel development list, Thomas Gleixner

> thanks Pavel, i picked this up into sched-devel.git - it makes sense 
> independently of whether it solves the warning.

This means the CPU preempt sanity checking is effectively disabled
on UP kernels. Even though more and more people should run multi-core
now there is still a sizeable user base left on single core.

Don't think this is a good idea. We need even the UP testers.

If he wants this for suspend or machine check/oops (always useful there
to disable such warnings) then there should
be separate checks for this. Perhaps system_state could be enhanced for this?

-Andi


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 11:37                         ` Andi Kleen
@ 2008-04-11 15:27                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-11 15:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Pavel Machek, Jiri Kosina, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner

On Friday, 11 of April 2008, Andi Kleen wrote:
> > thanks Pavel, i picked this up into sched-devel.git - it makes sense 
> > independently of whether it solves the warning.
> 
> This means the CPU preempt sanity checking is effectively disabled
> on UP kernels. Even though more and more people should run multi-core
> now there is still a sizeable user base left on single core.
> 
> Don't think this is a good idea. We need even the UP testers.

Agreed.

Besides, I'd like to learn what caused the problem to appear in the first
place.

> If he wants this for suspend or machine check/oops (always useful there
> to disable such warnings) then there should
> be separate checks for this. Perhaps system_state could be enhanced for this?

That would make sense.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 10:54                         ` Pavel Machek
  2008-04-11 11:09                           ` Ingo Molnar
@ 2008-04-11 15:29                           ` Rafael J. Wysocki
  1 sibling, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2008-04-11 15:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Jiri Kosina, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner

On Friday, 11 of April 2008, Pavel Machek wrote:
> On Fri 2008-04-11 12:51:09, Ingo Molnar wrote:
> > 
> > * Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > > diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
> > > index 6c90fb9..8195c37 100644
> > > --- a/lib/smp_processor_id.c
> > > +++ b/lib/smp_processor_id.c
> > > @@ -35,6 +35,13 @@ unsigned int debug_smp_processor_id(void
> > >  		goto out;
> > >  
> > >  	/*
> > > +	 * It is valid to assume CPU-locality if there's just one
> > > +	 * CPU active
> > > +	 */
> > > +	if (num_online_cpus() == 1)
> > > +		goto out;
> > 
> > thanks Pavel, i picked this up into sched-devel.git - it makes sense 
> > independently of whether it solves the warning.
> 
> Thanks!
> 
> (I just want to fix the underlying problem in suspend, too. I guess
> I'll just do something like
> 
> diff --git a/drivers/base/sys.c b/drivers/base/sys.c
> index 8e13fd9..adb7850 100644
> --- a/drivers/base/sys.c
> +++ b/drivers/base/sys.c
> @@ -367,6 +367,7 @@ int sysdev_suspend(pm_message_t state)
>  			/* Call auxillary drivers first */
>  			list_for_each_entry(drv, &cls->drivers, entry) {
>  				if (drv->suspend) {
> +					BUG_ON(!in_interrupt());
>  					ret = drv->suspend(sysdev, state);
>  					if (ret)
>  						goto aux_driver;
> @@ -442,6 +443,7 @@ int sysdev_resume(void)
>  		list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
>  			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
>  
> +			BUG_ON(!in_interrupt());
>  			__sysdev_resume(sysdev);
>  		}
>  	}
> 
> ...to catch this kind of problems early).

Well, I'd say the BUG_ON()s are over the top in here (why to crash the system
even if it wouldn't have crashed otherwise?).

WARN_ON_ONCE() should be enough.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG: using smp_processor_id() during suspend with 2.6.25-rc8
  2008-04-11 11:09                           ` Ingo Molnar
@ 2008-04-12  9:51                             ` Pavel Machek
  0 siblings, 0 replies; 31+ messages in thread
From: Pavel Machek @ 2008-04-12  9:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Rafael J. Wysocki, Andi Kleen, Zdenek Kabelac,
	Kernel development list, Thomas Gleixner

On Fri 2008-04-11 13:09:39, Ingo Molnar wrote:
> 
> * Pavel Machek <pavel@ucw.cz> wrote:
> 
> > (I just want to fix the underlying problem in suspend, too. I guess
> > I'll just do something like
> 
> >  				if (drv->suspend) {
> > +					BUG_ON(!in_interrupt());
> >  					ret = drv->suspend(sysdev, state);
> 
> >  			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
> >  
> > +			BUG_ON(!in_interrupt());
> >  			__sysdev_resume(sysdev);
> 
> small note: please always use WARN_ON_ONCE() in such cases. That way 
> people's boxes dont blow up and the problem will be debuggable if the 
> box survives despite the error condition. (and thus it can be relayed to 
> kerneloops.org - all major distros have the package included already and 
> Fedora enables the client by default)

Agreed, BUG_ON() was bad idea and I got the condition wrong.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2008-04-12  9:50 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-07 13:25 BUG: using smp_processor_id() during suspend with 2.6.25-rc8 Zdenek Kabelac
2008-04-07 21:54 ` Jiri Kosina
2008-04-07 22:02   ` Andi Kleen
2008-04-07 22:04     ` Jiri Kosina
2008-04-07 22:12       ` Andi Kleen
2008-04-07 22:11         ` Jiri Kosina
2008-04-07 22:14           ` Jiri Kosina
2008-04-07 22:23           ` Andi Kleen
2008-04-07 22:29             ` Rafael J. Wysocki
2008-04-07 22:33               ` Jiri Kosina
2008-04-07 22:53                 ` Zdenek Kabelac
2008-04-07 23:10                   ` Rafael J. Wysocki
2008-04-08  8:09                   ` Jiri Kosina
2008-04-10 11:16                     ` Zdenek Kabelac
2008-04-07 22:56                 ` Rafael J. Wysocki
2008-04-10  9:45                 ` Pavel Machek
2008-04-10 23:25                   ` Jiri Kosina
2008-04-11 10:49                     ` Pavel Machek
2008-04-11 10:51                       ` Ingo Molnar
2008-04-11 10:54                         ` Pavel Machek
2008-04-11 11:09                           ` Ingo Molnar
2008-04-12  9:51                             ` Pavel Machek
2008-04-11 15:29                           ` Rafael J. Wysocki
2008-04-11 11:37                         ` Andi Kleen
2008-04-11 15:27                           ` Rafael J. Wysocki
2008-04-07 22:35               ` Andi Kleen
2008-04-07 22:54                 ` Rafael J. Wysocki
2008-04-11 10:25   ` Ingo Molnar
2008-04-11 10:34     ` Jiri Kosina
2008-04-11 10:40       ` Ingo Molnar
2008-04-11 10:39     ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).