linux-kernel.vger.kernel.org archive mirror
* [BUG] 2.5.69 oops at sysenter_past_esp
@ 2003-05-06 19:52 mikpe
  2003-05-06 22:35 ` Dave Jones
  0 siblings, 1 reply; 7+ messages in thread
From: mikpe @ 2003-05-06 19:52 UTC (permalink / raw)
  To: linux-kernel

Old Dell Latitude with very basic .config: PII, IDE/PIIX, ext2, cardbus,
hotplug, networking, but no SMP, {IO-,}APIC, ACPI, usb.

I boot into a text console (without starting X or inserting the cardbus NIC)
and suspend the box (apm). At resume, I am immediately greeted with an
oops that looks like:

general protection fault: 0000 [#?]
CPU:	0
EIP:	0060:[<c0109079>]	Not tainted
EFLAGS:	00010246
EIP is at sysenter_past_esp+0x6e/0x71
<register dump>
Process <varies, any one of the daemons>
Stack: ...
Call Trace: <empty>

The machine is almost but not completely dead at this point.
The oops repeats several times with varying intervals (from
seconds up to minutes). The keyboard is initially not dead
(it responds to RET) but it too locks up after a while.

I don't know if this is new in 2.5.69, as I didn't test suspend
with 2.5.68 -- I've had resume-related PS/2 mouse problems with
recent 2.5 kernels, fixed finally by the "psmouse_noext" option.

/Mikael


* Re: [BUG] 2.5.69 oops at sysenter_past_esp
  2003-05-06 19:52 [BUG] 2.5.69 oops at sysenter_past_esp mikpe
@ 2003-05-06 22:35 ` Dave Jones
  2003-05-07  9:33   ` [PATCH] restore sysenter MSRs at resume mikpe
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Jones @ 2003-05-06 22:35 UTC (permalink / raw)
  To: mikpe; +Cc: linux-kernel

On Tue, May 06, 2003 at 09:52:24PM +0200, mikpe@csd.uu.se wrote:
 > suspending the box (apm). At resume, I am immediately greeted with an
 > oops looking like:
 > 
 > general protection fault: 0000 [#?]
 > CPU:	0
 > EIP:	0060:[<c0109079>]	Not tainted
 > EFLAGS:	00010246
 > EIP is at sysenter_past_esp+0x6e/0x71

I wonder if your BIOS is trashing the sysenter MSRs on suspend.
Maybe they need restoring ?

		Dave



* [PATCH] restore sysenter MSRs at resume
  2003-05-06 22:35 ` Dave Jones
@ 2003-05-07  9:33   ` mikpe
  2003-05-07 14:41     ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: mikpe @ 2003-05-07  9:33 UTC (permalink / raw)
  To: Dave Jones; +Cc: torvalds, linux-kernel

Dave Jones writes:
 > On Tue, May 06, 2003 at 09:52:24PM +0200, mikpe@csd.uu.se wrote:
 >  > suspending the box (apm). At resume, I am immediately greeted with an
 >  > oops looking like:
 >  > 
 >  > general protection fault: 0000 [#?]
 >  > CPU:	0
 >  > EIP:	0060:[<c0109079>]	Not tainted
 >  > EFLAGS:	00010246
 >  > EIP is at sysenter_past_esp+0x6e/0x71
 > 
 > I wonder if your BIOS is trashing the sysenter MSRs on suspend.
 > Maybe they need restoring ?

I've confirmed that that's exactly what's happening. EIP points to
the sysexit instruction in entry.S, and the sysenter MSRs are all zero.

The patch below hooks sysenter into the driver model and implements
a resume() method which restores the sysenter MSRs. On my '98 vintage
Latitude, this is necessary since those MSRs are cleared at resume.
Failure to restore them leads to oopses and eventual kernel hang.
(Of course, your user-space must also use sysenter. RH9 does.)

The patch has a debug printk() for problematic systems that require
the fix. If it says your machine didn't preserve the MSRs, please
post a note about this to LKML with your machine model, so we can
estimate the scope of the problem.

/Mikael

diff -ruN linux-2.5.69/arch/i386/kernel/sysenter.c linux-2.5.69.sysenter-pm/arch/i386/kernel/sysenter.c
--- linux-2.5.69/arch/i386/kernel/sysenter.c	2003-05-05 22:56:28.000000000 +0200
+++ linux-2.5.69.sysenter-pm/arch/i386/kernel/sysenter.c	2003-05-07 10:50:39.690468848 +0200
@@ -51,6 +51,53 @@
 	put_cpu();	
 }
 
+#ifdef CONFIG_PM
+#include <linux/device.h>
+
+static int sysenter_resume(struct device *dev, u32 state, u32 level)
+{
+	if (level != RESUME_POWER_ON)
+		return 0;
+	/* for collecting statistics, will go away */
+	{
+		unsigned int h, l0, l1, l2;
+		rdmsr(MSR_IA32_SYSENTER_CS, l0, h);
+		rdmsr(MSR_IA32_SYSENTER_ESP, l1, h);
+		rdmsr(MSR_IA32_SYSENTER_EIP, l2, h);
+		if (!l0 || !l1 || !l2)
+			printk("sysenter_resume: your BIOS didn't preserve the SYSENTER MSRs\n");
+		else
+			printk("sysenter_resume: congratulations, your BIOS seems Ok\n");
+	}
+	enable_sep_cpu(NULL);
+	return 0;
+}
+
+static struct device_driver sysenter_driver = {
+	.name		= "sysenter",
+	.bus		= &system_bus_type,
+	.resume		= sysenter_resume,
+};
+
+static struct sys_device device_sysenter = {
+	.name		= "sysenter",
+	.id		= 0,
+	.dev		= {
+		.name	= "sysenter",
+		.driver	= &sysenter_driver,
+	},
+};
+
+static int __init init_sysenter_devicefs(void)
+{
+	driver_register(&sysenter_driver);
+	return sys_device_register(&device_sysenter);
+}
+
+#else	/* CONFIG_PM */
+static inline int init_sysenter_devicefs(void) { return 0; }
+#endif	/* CONFIG_PM */
+
 /*
  * These symbols are defined by vsyscall.o to mark the bounds
  * of the ELF DSO images included therein.
@@ -76,7 +123,7 @@
 	       &vsyscall_sysenter_end - &vsyscall_sysenter_start);
 
 	on_each_cpu(enable_sep_cpu, NULL, 1, 1);
-	return 0;
+	return init_sysenter_devicefs();
 }
 
 __initcall(sysenter_setup);


* Re: [PATCH] restore sysenter MSRs at resume
  2003-05-07  9:33   ` [PATCH] restore sysenter MSRs at resume mikpe
@ 2003-05-07 14:41     ` Linus Torvalds
  2003-05-07 17:23       ` mikpe
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2003-05-07 14:41 UTC (permalink / raw)
  To: mikpe; +Cc: Dave Jones, linux-kernel


On Wed, 7 May 2003 mikpe@csd.uu.se wrote:
> 
> The patch below hooks sysenter into the driver model and implements
> a resume() method which restores the sysenter MSRs.

This is wrong.

For one thing, you screw up SMP seriously, by not enabling sysenter on all
CPU's, only the boot one.

For another, we shouldn't have "device drivers" for the CPU. I certainly
agree about restoring the sysenter MSR's, but they should be restored by
the CPU-specific code long _before_ we start initializing devices.

So I think we should just make it part of the CPU initialization (which
should be in two parts: the low-level asm part for the "core" CPU
registers, and then the high-level C part for things like the MSR's, 
user-space segment stuff etc).

So why not just add an explicit call to "cpu_resume()" in one of the 
"do_magic_resume()" things, instead of playing games with device trees..

> The patch has a debug printk() for problematic systems that require
> the fix. If it says your machine didn't preserve the MSRs, please
> post a note about this to LKML with your machine model, so we can
> estimate the scope of the problem.

I really think that it should be done unconditionally - there's no point 
in even _expecting_ the BIOS to restore various random MSR's. I can't 
imagine that many do.

		Linus




* Re: [PATCH] restore sysenter MSRs at resume
  2003-05-07 14:41     ` Linus Torvalds
@ 2003-05-07 17:23       ` mikpe
  2003-05-07 17:39         ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: mikpe @ 2003-05-07 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, linux-kernel

Linus Torvalds writes:
 > 
 > On Wed, 7 May 2003 mikpe@csd.uu.se wrote:
 > > 
 > > The patch below hooks sysenter into the driver model and implements
 > > a resume() method which restores the sysenter MSRs.
 > 
 > This is wrong.
 > 
 > For one thing, you screw up SMP seriously, by not enabling sysenter on all
 > CPU's, only the boot one.

We don't do apm suspend/resume on SMP, so this is no different from the
current situation. I don't know if acpi does it or not.

 > For another, we shouldn't have "device drivers" for the CPU. I certainly
 > agree about restoring the sysenter MSR's, but they should be restored by
 > the CPU-specific code long _before_ we start initializing devices.
 > 
 > So I think we should just make it part of the CPU initialization (which
 > should be in two parts: the low-level asm part for the "core" CPU
 > registers, and then the high-level C part for things like the MSR's, 
 > user-space segment stuff etc).
 > 
 > So why not just add an explicit call to "cpu_resume()" in one of the 
 > "do_magic_resume()" things, instead of playing games with device trees..

Where would cpu_resume() [and cpu_suspend()] live?
arch/i386/kernel/suspend* belong to SOFTWARE_SUSPEND, but I don't
think that approach is desirable when apm mostly works for UP.

I could probably get away with simply having apm.c invoke the C code
in suspend.c, which does restore the SYSENTER MSRs. suspend.c itself
doesn't seem to depend on the SOFTWARE_SUSPEND machinery, but
suspend_asm.S does.

Does that sound reasonable?

 > > The patch has a debug printk() for problematic systems that require
 > > the fix. If it says your machine didn't preserve the MSRs, please
 > > post a note about this to LKML with your machine model, so we can
 > > estimate the scope of the problem.
 > 
 > I really think that it should be done unconditionally - there's no point 
 > in even _expecting_ the BIOS to restore various random MSR's. I can't 
 > imagine that many do.

It does the restore unconditionally, the check is just informational.

/Mikael


* Re: [PATCH] restore sysenter MSRs at resume
  2003-05-07 17:23       ` mikpe
@ 2003-05-07 17:39         ` Linus Torvalds
  2003-05-08 21:47           ` Pavel Machek
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2003-05-07 17:39 UTC (permalink / raw)
  To: mikpe; +Cc: Dave Jones, linux-kernel


On Wed, 7 May 2003 mikpe@csd.uu.se wrote:
> 
> We don't do apm suspend/resume on SMP, so this is no different from the
> current situation. I don't know if acpi does it or not.

Well, the thing is, if we ever do want to support it (and I suspect we 
do), we should have the infrastructure ready. It shouldn't be too hard to 
support SMP suspend in a 2.7.x timeframe, since it from a technology angle 
looks like simply hot-plug CPU's. Some of the infrastructure for that 
already exists.

But I seriously doubt we want to do CPU hot-plug as a device driver. 
Having a hook in place for it in the arch directory will make it easyish 
to add once we integrate all the other hotplug code (which is very 
unlikely in the 2.6.x timeframe).

> I could probably get away with simply having apm.c invoke the C code
> in suspend.c, which does restore the SYSENTER MSRs. suspend.c itself
> doesn't seem to depend on the SOFTWARE_SUSPEND machinery, but
> suspend_asm.S does.
> 
> Does that sound reasonable?

Sounds reasonable to me. In fact, it looks like it really already exists
as the current "restore_processor_state()" thing.

In fact, that one already _does_ call "enable_sep_cpu()", so what's up?

		Linus



* Re: [PATCH] restore sysenter MSRs at resume
  2003-05-07 17:39         ` Linus Torvalds
@ 2003-05-08 21:47           ` Pavel Machek
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Machek @ 2003-05-08 21:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mikpe, Dave Jones, linux-kernel

Hi!

> > We don't do apm suspend/resume on SMP, so this is no different from the
> > current situation. I don't know if acpi does it or not.
> 
> Well, the thing is, if we ever do want to support it (and I suspect we 
> do), we should have the infrastructure ready. It shouldn't be too hard to 
> support SMP suspend in a 2.7.x timeframe, since it from a technology angle 
> looks like simply hot-plug CPU's. Some of the infrastructure for that 
> already exists.

Actually, then the MSRs should be restored during the hot-add operation, so resume
still does *not* care about non-boot CPUs...
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

