All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: General protection fault after STR (32 bit systems only)
@ 2015-06-11 23:45 Srinivas Pandruvada
  2015-06-12  6:07 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-11 23:45 UTC (permalink / raw)
  To: mingo, tglx, hpa, pavel, rjw
  Cc: x86, linux-pm, linux-kernel, Srinivas Pandruvada

Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
after resume, this cause general protection fault. This is very difficult
to reproduce, but this does happen on 32 bit systems. This crash is not
reproduced after save/restore DS during calls to _save_processor_state/
__restore_processor_state respectively.
Similar issue was reported in the past
https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
was not identified.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/suspend_32.h | 2 +-
 arch/x86/power/cpu.c              | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 552d6c9..3503d0b 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -11,7 +11,7 @@
 
 /* image of the saved processor state */
 struct saved_context {
-	u16 es, fs, gs, ss;
+	u16 ds, es, fs, gs, ss;
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 757678f..e0dfb01 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
 	 * segment registers
 	 */
 #ifdef CONFIG_X86_32
+	savesegment(ds, ctxt->ds);
 	savesegment(es, ctxt->es);
 	savesegment(fs, ctxt->fs);
 	savesegment(gs, ctxt->gs);
@@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	 * segment registers
 	 */
 #ifdef CONFIG_X86_32
+	loadsegment(ds, ctxt->ds);
 	loadsegment(es, ctxt->es);
 	loadsegment(fs, ctxt->fs);
 	loadsegment(gs, ctxt->gs);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-11 23:45 [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
@ 2015-06-12  6:07 ` Ingo Molnar
  2015-06-12  6:48   ` Andy Lutomirski
                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ingo Molnar @ 2015-06-12  6:07 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, pavel, rjw, x86, linux-pm, linux-kernel,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Linus Torvalds


* Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:

> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
> after resume, this cause general protection fault. [...]

But s2ram has no influence on 'returning to user-space'. So unless I'm missing 
something this changelog makes no sense.

Unfortunately the patch seems to be completely bogus as well:

> [...] This is very difficult to reproduce, but this does happen on 32 bit 
> systems. This crash is not reproduced after save/restore DS during calls to 
> _save_processor_state/ __restore_processor_state respectively.
>
> Similar issue was reported in the past
> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
> was not identified.
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/include/asm/suspend_32.h | 2 +-
>  arch/x86/power/cpu.c              | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> index 552d6c9..3503d0b 100644
> --- a/arch/x86/include/asm/suspend_32.h
> +++ b/arch/x86/include/asm/suspend_32.h
> @@ -11,7 +11,7 @@
>  
>  /* image of the saved processor state */
>  struct saved_context {
> -	u16 es, fs, gs, ss;
> +	u16 ds, es, fs, gs, ss;
>  	unsigned long cr0, cr2, cr3, cr4;
>  	u64 misc_enable;
>  	bool misc_enable_saved;
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 757678f..e0dfb01 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
>  	 * segment registers
>  	 */
>  #ifdef CONFIG_X86_32
> +	savesegment(ds, ctxt->ds);
>  	savesegment(es, ctxt->es);
>  	savesegment(fs, ctxt->fs);
>  	savesegment(gs, ctxt->gs);
> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>  	 * segment registers
>  	 */
>  #ifdef CONFIG_X86_32
> +	loadsegment(ds, ctxt->ds);
>  	loadsegment(es, ctxt->es);
>  	loadsegment(fs, ctxt->fs);
>  	loadsegment(gs, ctxt->gs);

So save_processor_state() is used by 32-bit s2ram in the following place:

arch/x86/kernel/acpi/wakeup_32.S:       call    save_processor_state

Other uses are:

arch/x86/kernel/acpi/wakeup_64.S:       call    save_processor_state
arch/x86/kernel/apm_32.c:       save_processor_state();
arch/x86/kernel/machine_kexec_32.c:             save_processor_state();
arch/x86/kernel/machine_kexec_64.c:             save_processor_state();
arch/x86/platform/olpc/xo1-wakeup.S:    call    save_processor_state
kernel/power/hibernate.c:       save_processor_state();
kernel/power/hibernate.c:       save_processor_state();

but neither of these call sites seems to matter to the bug: the 32-bit system does 
not use APM, kexec, is not an OLPC and does not use hibernation.

And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full 
state save/restore before ACPI (BIOS) downcalls.

Furthermore, the bug report in:

    https://bugzilla.kernel.org/show_bug.cgi?id=61781

talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does 
not trigger in v3.10.

1)

So the first critical question is: if the ACPI/BIOS suspend code corrupts the 
kernel's DS, how can we get so far as to resume fully, return to user-space, and 
segfault there so that it can all be reported?

So neither the explanation nor the code makes any sense in the context of the 
reported bugs. Can anyone else offer any plausible theory about why this patch 
would fix 32-bit user-space segfaults?

2)

Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is 
prudent to avoid the BIOS stomping on our DS.

But the restoration done in this patch is bogus, it's done way too late in a C 
function, there's a number of places where we might use the kernel DS before it's 
restored, such as restore_registers():

restore_registers:
        movl    saved_context_ebp, %ebp
        movl    saved_context_ebx, %ebx
        movl    saved_context_esi, %esi
        movl    saved_context_edi, %edi
        pushl   saved_context_eflags
        popfl
        ret

this is called before restore_processor_state().

those 'saved_context_*' references are already using the kernel DS! So if it's 
corrupted, we'll crash there already. So your patch seems totally nonsensical.

3)

So the patch below (totally untested) does the DS restoration early on, as the 
first thing after we emerge from the sleep. This should be equivalent to your 
patch, but is more robust and much simpler: there's no need to save DS, because we 
know that it has to be __KERNEL_DS.

Could you please test whether this solves the problem as well? Also, could you 
please describe how the failure triggers in your system: how many times do you 
have to suspend/resume to trigger the segfaults, and is there anything that makes 
the failures less or more likely?

Thanks,

	Ingo

=================>
 arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..9a3ce66e0dd8 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
 	jmp	ret_point
 	.p2align 4,,7
 ret_point:
+	/* In case the BIOS corrupted DS, make the kernel context minimally functional: */
+	movl	$__KERNEL_DS, %eax
+	movl	%eax, %ds
+
 	call	restore_registers
 	call	restore_processor_state
 	ret

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  6:07 ` Ingo Molnar
@ 2015-06-12  6:48   ` Andy Lutomirski
  2015-06-12  7:15     ` Ingo Molnar
  2015-06-12  7:41   ` Andy Lutomirski
  2015-06-12 16:15   ` [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
  2 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2015-06-12  6:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Brian Gerst, Linus Torvalds

On Thu, Jun 11, 2015 at 11:07 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
>
>> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
>> after resume, this cause general protection fault. [...]
>
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing
> something this changelog makes no sense.
>
> Unfortunately the patch seems to be completely bogus as well:
>
>> [...] This is very difficult to reproduce, but this does happen on 32 bit
>> systems. This crash is not reproduced after save/restore DS during calls to
>> _save_processor_state/ __restore_processor_state respectively.
>>
>> Similar issue was reported in the past
>> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
>> was not identified.
>>
>> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> ---
>>  arch/x86/include/asm/suspend_32.h | 2 +-
>>  arch/x86/power/cpu.c              | 2 ++
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
>> index 552d6c9..3503d0b 100644
>> --- a/arch/x86/include/asm/suspend_32.h
>> +++ b/arch/x86/include/asm/suspend_32.h
>> @@ -11,7 +11,7 @@
>>
>>  /* image of the saved processor state */
>>  struct saved_context {
>> -     u16 es, fs, gs, ss;
>> +     u16 ds, es, fs, gs, ss;
>>       unsigned long cr0, cr2, cr3, cr4;
>>       u64 misc_enable;
>>       bool misc_enable_saved;
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 757678f..e0dfb01 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
>>        * segment registers
>>        */
>>  #ifdef CONFIG_X86_32
>> +     savesegment(ds, ctxt->ds);
>>       savesegment(es, ctxt->es);
>>       savesegment(fs, ctxt->fs);
>>       savesegment(gs, ctxt->gs);
>> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>>        * segment registers
>>        */
>>  #ifdef CONFIG_X86_32
>> +     loadsegment(ds, ctxt->ds);
>>       loadsegment(es, ctxt->es);
>>       loadsegment(fs, ctxt->fs);
>>       loadsegment(gs, ctxt->gs);
>
> So save_processor_state() is used by 32-bit s2ram in the following place:
>
> arch/x86/kernel/acpi/wakeup_32.S:       call    save_processor_state
>
> Other uses are:
>
> arch/x86/kernel/acpi/wakeup_64.S:       call    save_processor_state
> arch/x86/kernel/apm_32.c:       save_processor_state();
> arch/x86/kernel/machine_kexec_32.c:             save_processor_state();
> arch/x86/kernel/machine_kexec_64.c:             save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S:    call    save_processor_state
> kernel/power/hibernate.c:       save_processor_state();
> kernel/power/hibernate.c:       save_processor_state();
>
> but neither of these call sites seems to matter to the bug: the 32-bit system does
> not use APM, kexec, is not an OLPC and does not use hibernation.
>
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
> state save/restore before ACPI (BIOS) downcalls.
>
> Furthermore, the bug report in:
>
>     https://bugzilla.kernel.org/show_bug.cgi?id=61781
>
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
> not trigger in v3.10.
>
> 1)
>
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> kernel's DS, how can we get so far as to resume fully, return to user-space, and
> segfault there so that it can all be reported?
>
> So neither the explanation nor the code makes any sense in the context of the
> reported bugs. Can anyone else offer any plausible theory about why this patch
> would fix 32-bit user-space segfaults?

I'm too tired to look at this intelligently right now, but this
reminds me of the sysret_ss_attrs thing.  What if we have a situation
where, after suspend/resume, we end up with a perfectly valid ss
*selector* (or, on 64-bit kernels, a ds selector that does not matter
one whit) but a somehow-screwed-up ds *cached hidden descriptor*.  (On
32-bit kernels, this could be something exotic like grows-down limit
2^31.)

Now we do the very first return.  If we're on AMD hardware and that
return is SYSRET, then we end up with some complete random garbage
loaded in the hidden DS descriptor if SYSRET on 32-bit mode is indeed
screwed up on AMD.

Of course, this is apparently DS and not SS, but maybe something
similar is happening.  How easily reproducible is this, and what cpu
is it on?  Would something like 'push %ds; pop %ds' after SYSENTER in
vdso32/sysenter.S fix it?

Also, damnit systemd, stop catching SEGV.  I want to know all the SEGV
details, and you helpfully threw them away.  Good job.

>
> 2)
>
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
> prudent to avoid the BIOS stomping on our DS.
>
> But the restoration done in this patch is bogus, it's done way too late in a C
> function, there's a number of places where we might use the kernel DS before it's
> restored, such as restore_registers():
>
> restore_registers:
>         movl    saved_context_ebp, %ebp
>         movl    saved_context_ebx, %ebx
>         movl    saved_context_esi, %esi
>         movl    saved_context_edi, %edi
>         pushl   saved_context_eflags
>         popfl
>         ret
>
> this is called before restore_processor_state().
>
> those 'saved_context_*' references are already using the kernel DS! So if it's
> corrupted, we'll crash there already. So your patch seems totally nonsensical.

Don't even bother saving it.  Just load the known value on resume.

This is a bit odd, in that we don't do the X86_BUG_SYSRET_SS_ATTRS
fixup on 32-bit kernels, and yet no one reported the
sysret_ss_attrs_32 test failing on 32-bit kernels.  I assume that
what's going on is that 32-bit kernels don't have any ways to enter
the kernel with SS=0 (because ss0 is in the TSS and it isn't zero),
but we never thought of the other way to have a bogus SS descriptor in
kernel mode: change the GDT without reloading SS.

I'm presently mystified about DS.  The 32-bit sysenter path loads
__KERNEL_DS into ds (in a macro helpfully called SAVE_ALL), but what,
if anything, restores DS on return?  Could we actually be executing in
32-bit userspace with DS & 3 == 0 after every SYSEXIT on 32-bit
kernels?  If so, yikes!

Should we load __USER_DS into DS and ES right before SYSEXIT?

Here's my full-fledged half-asleep theory:

We suspend to RAM.  We resume.  DS and/or ES contains something
unusual but not unusual enough to crash us.  Our first entry to
userspace is via SYSEXIT.  Because we're daft, we don't reload DS or
ES at any point along the way.  Now we're in userspace with an even
more screwed up DS or ES than usual.  We get SIGSEGV (presumably #GP)
and try to deliver the signal.  We end up with impossible pt_regs
(bogus RPL) but who cares?  We get to __setup_frame, which fixes the
garbage in pt_regs and we re-enter user mode through an IRET patch, so
we finally reload DS and ES.  As a result, we successfully deliver the
signal.  The saved regs would reveal the damage, but systemd throws
them away, and we remain confused for a full ten kernel versions.

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  6:48   ` Andy Lutomirski
@ 2015-06-12  7:15     ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2015-06-12  7:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Brian Gerst, Linus Torvalds


* Andy Lutomirski <luto@amacapital.net> wrote:

> > 1)
> >
> > So the first critical question is: if the ACPI/BIOS suspend code corrupts the 
> > kernel's DS, how can we get so far as to resume fully, return to user-space, 
> > and segfault there so that it can all be reported?
> >
> > So neither the explanation nor the code makes any sense in the context of the 
> > reported bugs. Can anyone else offer any plausible theory about why this patch 
> > would fix 32-bit user-space segfaults?
> 
> I'm too tired to look at this intelligently right now, but this reminds me of 
> the sysret_ss_attrs thing.  What if we have a situation where, after 
> suspend/resume, we end up with a perfectly valid ss *selector* (or, on 64-bit 
> kernels, a ds selector that does not matter one whit) but a somehow-screwed-up 
> ds *cached hidden descriptor*.  (On 32-bit kernels, this could be something 
> exotic like grows-down limit 2^31.)

Yes, that theory is what my patch tests, by reloading DS with __KERNEL_DS.

This should be safe as the first thing to execute after re-entry, as we don't 
save/restore the GDT. (If the BIOS mucks with the GDT without restoring it to our 
value we are probably screwed in any case.)

> Now we do the very first return.  If we're on AMD hardware and that return is 
> SYSRET, then we end up with some complete random garbage loaded in the hidden DS 
> descriptor if SYSRET on 32-bit mode is indeed screwed up on AMD.

But why would this change from v3.10 to v3.11? I cannot see any low level x86 
change that should make a difference there.

> Don't even bother saving it.  Just load the known value on resume.

Yeah, so that's what my simple patch does.

> Here's my full-fledged half-asleep theory:
> 
> We suspend to RAM.  We resume.  DS and/or ES contains something unusual but not 
> unusual enough to crash us.  Our first entry to userspace is via SYSEXIT.  
> Because we're daft, we don't reload DS or ES at any point along the way.  Now 
> we're in userspace with an even more screwed up DS or ES than usual.  We get 
> SIGSEGV (presumably #GP) and try to deliver the signal.  We end up with 
> impossible pt_regs (bogus RPL) but who cares?  We get to __setup_frame, which 
> fixes the garbage in pt_regs and we re-enter user mode through an IRET patch, so 
> we finally reload DS and ES.  As a result, we successfully deliver the signal.  
> The saved regs would reveal the damage, but systemd throws them away, and we 
> remain confused for a full ten kernel versions.

That's indeed plausible.

If so then the DS reloading patch I sent should help.

So we should also do a full review of all the DS/ES save/restore paths, 
everywhere, as they don't seem to be very consistently done.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  6:07 ` Ingo Molnar
  2015-06-12  6:48   ` Andy Lutomirski
@ 2015-06-12  7:41   ` Andy Lutomirski
  2015-06-12  7:50     ` Ingo Molnar
  2015-06-12 16:15   ` [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
  2 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2015-06-12  7:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Brian Gerst, Linus Torvalds

On Thu, Jun 11, 2015 at 11:07 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
>
>> Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
>> after resume, this cause general protection fault. [...]
>
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing
> something this changelog makes no sense.
>
> Unfortunately the patch seems to be completely bogus as well:
>
>> [...] This is very difficult to reproduce, but this does happen on 32 bit
>> systems. This crash is not reproduced after save/restore DS during calls to
>> _save_processor_state/ __restore_processor_state respectively.
>>
>> Similar issue was reported in the past
>> https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
>> was not identified.
>>
>> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> ---
>>  arch/x86/include/asm/suspend_32.h | 2 +-
>>  arch/x86/power/cpu.c              | 2 ++
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
>> index 552d6c9..3503d0b 100644
>> --- a/arch/x86/include/asm/suspend_32.h
>> +++ b/arch/x86/include/asm/suspend_32.h
>> @@ -11,7 +11,7 @@
>>
>>  /* image of the saved processor state */
>>  struct saved_context {
>> -     u16 es, fs, gs, ss;
>> +     u16 ds, es, fs, gs, ss;
>>       unsigned long cr0, cr2, cr3, cr4;
>>       u64 misc_enable;
>>       bool misc_enable_saved;
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 757678f..e0dfb01 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
>>        * segment registers
>>        */
>>  #ifdef CONFIG_X86_32
>> +     savesegment(ds, ctxt->ds);
>>       savesegment(es, ctxt->es);
>>       savesegment(fs, ctxt->fs);
>>       savesegment(gs, ctxt->gs);
>> @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>>        * segment registers
>>        */
>>  #ifdef CONFIG_X86_32
>> +     loadsegment(ds, ctxt->ds);
>>       loadsegment(es, ctxt->es);
>>       loadsegment(fs, ctxt->fs);
>>       loadsegment(gs, ctxt->gs);
>
> So save_processor_state() is used by 32-bit s2ram in the following place:
>
> arch/x86/kernel/acpi/wakeup_32.S:       call    save_processor_state
>
> Other uses are:
>
> arch/x86/kernel/acpi/wakeup_64.S:       call    save_processor_state
> arch/x86/kernel/apm_32.c:       save_processor_state();
> arch/x86/kernel/machine_kexec_32.c:             save_processor_state();
> arch/x86/kernel/machine_kexec_64.c:             save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S:    call    save_processor_state
> kernel/power/hibernate.c:       save_processor_state();
> kernel/power/hibernate.c:       save_processor_state();
>
> but neither of these call sites seems to matter to the bug: the 32-bit system does
> not use APM, kexec, is not an OLPC and does not use hibernation.
>
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full
> state save/restore before ACPI (BIOS) downcalls.
>
> Furthermore, the bug report in:
>
>     https://bugzilla.kernel.org/show_bug.cgi?id=61781
>
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does
> not trigger in v3.10.
>
> 1)
>
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the
> kernel's DS, how can we get so far as to resume fully, return to user-space, and
> segfault there so that it can all be reported?
>
> So neither the explanation nor the code makes any sense in the context of the
> reported bugs. Can anyone else offer any plausible theory about why this patch
> would fix 32-bit user-space segfaults?
>
> 2)
>
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is
> prudent to avoid the BIOS stomping on our DS.
>
> But the restoration done in this patch is bogus, it's done way too late in a C
> function, there's a number of places where we might use the kernel DS before it's
> restored, such as restore_registers():
>
> restore_registers:
>         movl    saved_context_ebp, %ebp
>         movl    saved_context_ebx, %ebx
>         movl    saved_context_esi, %esi
>         movl    saved_context_edi, %edi
>         pushl   saved_context_eflags
>         popfl
>         ret
>
> this is called before restore_processor_state().
>
> those 'saved_context_*' references are already using the kernel DS! So if it's
> corrupted, we'll crash there already. So your patch seems totally nonsensical.
>
> 3)
>
> So the patch below (totally untested) does the DS restoration early on, as the
> first thing after we emerge from the sleep. This should be equivalent to your
> patch, but is more robust and much simpler: there's no need to save DS, because we
> know that it has to be __KERNEL_DS.
>
> Could you please test whether this solves the problem as well? Also, could you
> please describe how the failure triggers in your system: how many times do you
> have to suspend/resume to trigger the segfaults, and is there anything that makes
> the failures less or more likely?
>
> Thanks,
>
>         Ingo
>
> =================>
>  arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..9a3ce66e0dd8 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
>         jmp     ret_point
>         .p2align 4,,7
>  ret_point:
> +       /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> +       movl    $__KERNEL_DS, %eax
> +       movl    %eax, %ds
> +

On further thought, I think you want movl $__USER_DS, %eax.  The
32-bit kernel is a strange beast.  Also, you should probably fix up
%es as well.

--Andy

>         call    restore_registers
>         call    restore_processor_state
>         ret



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  7:41   ` Andy Lutomirski
@ 2015-06-12  7:50     ` Ingo Molnar
  2015-06-12  8:15       ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2015-06-12  7:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Brian Gerst, Linus Torvalds


* Andy Lutomirski <luto@amacapital.net> wrote:

> > --- a/arch/x86/kernel/acpi/wakeup_32.S
> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
> >         jmp     ret_point
> >         .p2align 4,,7
> >  ret_point:
> > +       /* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> > +       movl    $__KERNEL_DS, %eax
> > +       movl    %eax, %ds
> > +
> 
> On further thought, I think you want movl $__USER_DS, %eax.  The
> 32-bit kernel is a strange beast.  Also, you should probably fix up
> %es as well.

So restore_processor_state() already restores ES. The idea here was to reload DS 
early on, because the kernel implicitly uses it for data access so we need it to 
be good to be able to continue executing any generic kernel code.

We don't use %es: prefixed assembly AFAICS, what are the implicit users of ES?

Also, to further confuse things, we also have:

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
        movw    $__KERNEL_DS, %ax
        movw    %ax, %ss
        movw    %ax, %ds
        movw    %ax, %es
        movw    %ax, %fs
        movw    %ax, %gs

        # reload the gdt, as we need the full 32 bit address
        lidt    saved_idt
        lldt    saved_ldt
        ljmp    $(__KERNEL_CS), $1f
1:
        movl    %cr3, %eax
        movl    %eax, %cr3
        wbinvd

which seems to be another layer of restoration - but it possibly does not trigger 
in the S2RAM case here.

Oh, funny the 'reload the gdt' comment: do you see an LGDT there? It reloads all 
segment selectors, the IDT, LDT and CR3, but does not seem to reload the GDT - the 
only thing the comment describes.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  7:50     ` Ingo Molnar
@ 2015-06-12  8:15       ` H. Peter Anvin
  2015-06-12  8:36         ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2015-06-12  8:15 UTC (permalink / raw)
  To: Ingo Molnar, Andy Lutomirski
  Cc: Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner, Pavel Machek,
	Rafael J. Wysocki, X86 ML, linux-pm, linux-kernel,
	Denys Vlasenko, Borislav Petkov, Brian Gerst, Linus Torvalds

%es is used implicitly by string instructions.

On June 12, 2015 12:50:13 AM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Andy Lutomirski <luto@amacapital.net> wrote:
>
>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> > @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
>> >         jmp     ret_point
>> >         .p2align 4,,7
>> >  ret_point:
>> > +       /* In case the BIOS corrupted DS, make the kernel context
>minimally functional: */
>> > +       movl    $__KERNEL_DS, %eax
>> > +       movl    %eax, %ds
>> > +
>> 
>> On further thought, I think you want movl $__USER_DS, %eax.  The
>> 32-bit kernel is a strange beast.  Also, you should probably fix up
>> %es as well.
>
>So restore_processor_state() already restores ES. The idea here was to
>reload DS 
>early on, because the kernel implicitly uses it for data access so we
>need it to 
>be good to be able to continue executing any generic kernel code.
>
>We don't use %es: prefixed assembly AFAICS, what are the implicit users
>of ES?
>
>Also, to further confuse things, we also have:
>
>ENTRY(wakeup_pmode_return)
>wakeup_pmode_return:
>        movw    $__KERNEL_DS, %ax
>        movw    %ax, %ss
>        movw    %ax, %ds
>        movw    %ax, %es
>        movw    %ax, %fs
>        movw    %ax, %gs
>
>        # reload the gdt, as we need the full 32 bit address
>        lidt    saved_idt
>        lldt    saved_ldt
>        ljmp    $(__KERNEL_CS), $1f
>1:
>        movl    %cr3, %eax
>        movl    %eax, %cr3
>        wbinvd
>
>which seems to be another layer of restoration - but it possibly does
>not trigger 
>in the S2RAM case here.
>
>Oh, funny the 'reload the gdt' comment: do you see an LGDT there? It
>reloads all 
>segment selectors, the IDT, LDT and CR3, but does not seem to reload
>the GDT - the 
>only thing the comment describes.
>
>Thanks,
>
>	Ingo

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  8:15       ` H. Peter Anvin
@ 2015-06-12  8:36         ` Ingo Molnar
  2015-06-12 15:48           ` Brian Gerst
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2015-06-12  8:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Srinivas Pandruvada, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Brian Gerst, Linus Torvalds


* H. Peter Anvin <hpa@zytor.com> wrote:

> %es is used implicitly by string instructions.

Ok, so we are probably better off reloading ES as well early, right
when we return from the firmware, just in case something does
a copy before we hit the ES restore in restore_processor_state(),
which is a generic C function?

Something like the patch below?

I also added FS/GS/SS reloading to make it complete. If this (or a variant 
thereof, it's still totally untested) works then we can remove the segment 
save/restore layer in __save/restore_processor_state().

Thanks,

	Ingo

===========>
 arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..1376a7fc21b7 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
 
 
 restore_registers:
+	/*
+	 * In case the BIOS corrupted our segment descriptors,
+	 * reload them to clear out any shadow descriptor
+	 * state:
+	 */
+	movl	$__USER_DS, %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	%eax, %fs
+	movl	%eax, %gs
+	movl	$__KERNEL_DS, %eax
+	movl	%eax, %ss
+
 	movl	saved_context_ebp, %ebp
 	movl	saved_context_ebx, %ebx
 	movl	saved_context_esi, %esi

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  8:36         ` Ingo Molnar
@ 2015-06-12 15:48           ` Brian Gerst
  2015-06-12 18:11             ` Andy Lutomirski
                               ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Brian Gerst @ 2015-06-12 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: H. Peter Anvin, Andy Lutomirski, Srinivas Pandruvada,
	Ingo Molnar, Thomas Gleixner, Pavel Machek, Rafael J. Wysocki,
	X86 ML, linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds

On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * H. Peter Anvin <hpa@zytor.com> wrote:
>
>> %es is used implicitly by string instructions.
>
> Ok, so we are probably better off reloading ES as well early, right
> when we return from the firmware, just in case something does
> a copy before we hit the ES restore in restore_processor_state(),
> which is a generic C function?
>
> Something like the patch below?
>
> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> thereof, it's still totally untested) works then we can remove the segment
> save/restore layer in __save/restore_processor_state().
>
> Thanks,
>
>         Ingo
>
> ===========>
>  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..1376a7fc21b7 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>
>
>  restore_registers:
> +       /*
> +        * In case the BIOS corrupted our segment descriptors,
> +        * reload them to clear out any shadow descriptor
> +        * state:
> +        */
> +       movl    $__USER_DS, %eax
> +       movl    %eax, %ds
> +       movl    %eax, %es
> +       movl    %eax, %fs
> +       movl    %eax, %gs
> +       movl    $__KERNEL_DS, %eax
> +       movl    %eax, %ss
> +
>         movl    saved_context_ebp, %ebp
>         movl    saved_context_ebx, %ebx
>         movl    saved_context_esi, %esi

If you follow the convoluted flow of the calls in this file,
wakeup_pmode_return is the first thing called from the trampoline on
resume, and that loads the data segments with __KERNEL_DS.  The better
fix would be to set ds/es to __USER_DS there.  Also since we are in
the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
__KERNEL_STACK_CANARY or user's gs.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12  6:07 ` Ingo Molnar
  2015-06-12  6:48   ` Andy Lutomirski
  2015-06-12  7:41   ` Andy Lutomirski
@ 2015-06-12 16:15   ` Srinivas Pandruvada
  2015-06-13  7:15     ` [PATCH, DEBUG] x86/32: Add small delay after resume Ingo Molnar
  2 siblings, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-12 16:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, tglx, hpa, pavel, rjw, x86, linux-pm, linux-kernel,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Linus Torvalds, Kleen, Andi

On Fri, 2015-06-12 at 08:07 +0200, Ingo Molnar wrote:
> * Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> 
> > Suspend to RAM process is returning to userspsace with DS set to KERNEL_DS
> > after resume, this cause general protection fault. [...]
> 
> But s2ram has no influence on 'returning to user-space'. So unless I'm missing 
> something this changelog makes no sense.
> 
> Unfortunately the patch seems to be completely bogus as well:
> 
> > [...] This is very difficult to reproduce, but this does happen on 32 bit 
> > systems. This crash is not reproduced after save/restore DS during calls to 
> > _save_processor_state/ __restore_processor_state respectively.
> >
> > Similar issue was reported in the past
> > https://bugzilla.kernel.org/show_bug.cgi?id=61781, for which the root cause
> > was not identified.
> > 
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Reviewed-by: Andi Kleen <ak@linux.intel.com>
> > ---
> >  arch/x86/include/asm/suspend_32.h | 2 +-
> >  arch/x86/power/cpu.c              | 2 ++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
> > index 552d6c9..3503d0b 100644
> > --- a/arch/x86/include/asm/suspend_32.h
> > +++ b/arch/x86/include/asm/suspend_32.h
> > @@ -11,7 +11,7 @@
> >  
> >  /* image of the saved processor state */
> >  struct saved_context {
> > -	u16 es, fs, gs, ss;
> > +	u16 ds, es, fs, gs, ss;
> >  	unsigned long cr0, cr2, cr3, cr4;
> >  	u64 misc_enable;
> >  	bool misc_enable_saved;
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index 757678f..e0dfb01 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -79,6 +79,7 @@ static void __save_processor_state(struct saved_context *ctxt)
> >  	 * segment registers
> >  	 */
> >  #ifdef CONFIG_X86_32
> > +	savesegment(ds, ctxt->ds);
> >  	savesegment(es, ctxt->es);
> >  	savesegment(fs, ctxt->fs);
> >  	savesegment(gs, ctxt->gs);
> > @@ -198,6 +199,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> >  	 * segment registers
> >  	 */
> >  #ifdef CONFIG_X86_32
> > +	loadsegment(ds, ctxt->ds);
> >  	loadsegment(es, ctxt->es);
> >  	loadsegment(fs, ctxt->fs);
> >  	loadsegment(gs, ctxt->gs);
> 
> So save_processor_state() is used by 32-bit s2ram in the following place:
> 
> arch/x86/kernel/acpi/wakeup_32.S:       call    save_processor_state
> 
> Other uses are:
> 
> arch/x86/kernel/acpi/wakeup_64.S:       call    save_processor_state
> arch/x86/kernel/apm_32.c:       save_processor_state();
> arch/x86/kernel/machine_kexec_32.c:             save_processor_state();
> arch/x86/kernel/machine_kexec_64.c:             save_processor_state();
> arch/x86/platform/olpc/xo1-wakeup.S:    call    save_processor_state
> kernel/power/hibernate.c:       save_processor_state();
> kernel/power/hibernate.c:       save_processor_state();
> 
> but neither of these call sites seems to matter to the bug: the 32-bit system does 
> not use APM, kexec, is not an OLPC and does not use hibernation.
> 
> And if we look at arch/x86/kernel/acpi/wakeup_32.S it's a straightforward full 
> state save/restore before ACPI (BIOS) downcalls.
> 
> Furthermore, the bug report in:
> 
>     https://bugzilla.kernel.org/show_bug.cgi?id=61781
> 
> talks about SIGSEGVs, and claims that this regression triggers in v3.11 but does 
> not trigger in v3.10.
> 
> 1)
> 
> So the first critical question is: if the ACPI/BIOS suspend code corrupts the 
> kernel's DS, how can we get so far as to resume fully, return to user-space, and 
> segfault there so that it can all be reported?
> 
> So neither the explanation nor the code makes any sense in the context of the 
> reported bugs. Can anyone else offer any plausible theory about why this patch 
> would fix 32-bit user-space segfaults?
> 
> 2)
> 
> Btw., I don't mind the idea of the patch itself per se: saving/restoring DS is 
> prudent to avoid the BIOS stomping on our DS.
> 
> But the restoration done in this patch is bogus, it's done way too late in a C 
> function, there's a number of places where we might use the kernel DS before it's 
> restored, such as restore_registers():
> 
> restore_registers:
>         movl    saved_context_ebp, %ebp
>         movl    saved_context_ebx, %ebx
>         movl    saved_context_esi, %esi
>         movl    saved_context_edi, %edi
>         pushl   saved_context_eflags
>         popfl
>         ret
> 
> this is called before restore_processor_state().
> 
> those 'saved_context_*' references are already using the kernel DS! So if it's 
> corrupted, we'll crash there already. So your patch seems totally nonsensical.
> 
> 3)
> 
> So the patch below (totally untested) does the DS restoration early on, as the 
> first thing after we emerge from the sleep. This should be equivalent to your 
> patch, but is more robust and much simpler: there's no need to save DS, because we 
> know that it has to be __KERNEL_DS.
> 
> Could you please test whether this solves the problem as well? Also, could you 
> please describe how the failure triggers in your system: how many times do you 
> have to suspend/resume to trigger the segfaults, and is there anything that makes 
> the failures less or more likely?
It is very random. Sometimes only few hundred trys reproduce this issue.
Some other times it requires thousands of trys (sometimes not
reproducible at all for days)
It is very time sensitive. A small delay or some debug code in resume
path prevents this to crash.
The BIOS folks created special version to check if they are corrupting
any DS, but they were not able to catch any corruption.
Since these are special deployed systems running critical application,
need to request the tests again with your changes. May take long time.

Thanks,
Srinivas
> 
> Thanks,
> 
> 	Ingo
> 
> =================>
>  arch/x86/kernel/acpi/wakeup_32.S | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..9a3ce66e0dd8 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -81,6 +81,10 @@ ENTRY(do_suspend_lowlevel)
>  	jmp	ret_point
>  	.p2align 4,,7
>  ret_point:
> +	/* In case the BIOS corrupted DS, make the kernel context minimally functional: */
> +	movl	$__KERNEL_DS, %eax
> +	movl	%eax, %ds
> +
>  	call	restore_registers
>  	call	restore_processor_state
>  	ret



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 15:48           ` Brian Gerst
@ 2015-06-12 18:11             ` Andy Lutomirski
  2015-06-12 18:31               ` Srinivas Pandruvada
  2015-06-12 22:45             ` Denys Vlasenko
  2015-06-13  7:03             ` Ingo Molnar
  2 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2015-06-12 18:11 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ingo Molnar, H. Peter Anvin, Srinivas Pandruvada, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds

On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <brgerst@gmail.com> wrote:
> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * H. Peter Anvin <hpa@zytor.com> wrote:
>>
>>> %es is used implicitly by string instructions.
>>
>> Ok, so we are probably better off reloading ES as well early, right
>> when we return from the firmware, just in case something does
>> a copy before we hit the ES restore in restore_processor_state(),
>> which is a generic C function?
>>
>> Something like the patch below?
>>
>> I also added FS/GS/SS reloading to make it complete. If this (or a variant
>> thereof, it's still totally untested) works then we can remove the segment
>> save/restore layer in __save/restore_processor_state().
>>
>> Thanks,
>>
>>         Ingo
>>
>> ===========>
>>  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>> index 665c6b7d2ea9..1376a7fc21b7 100644
>> --- a/arch/x86/kernel/acpi/wakeup_32.S
>> +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>>
>>
>>  restore_registers:
>> +       /*
>> +        * In case the BIOS corrupted our segment descriptors,
>> +        * reload them to clear out any shadow descriptor
>> +        * state:
>> +        */
>> +       movl    $__USER_DS, %eax
>> +       movl    %eax, %ds
>> +       movl    %eax, %es
>> +       movl    %eax, %fs
>> +       movl    %eax, %gs
>> +       movl    $__KERNEL_DS, %eax
>> +       movl    %eax, %ss
>> +
>>         movl    saved_context_ebp, %ebp
>>         movl    saved_context_ebx, %ebx
>>         movl    saved_context_esi, %esi
>
> If you follow the convoluted flow of the calls in this file,
> wakeup_pmode_return is the first thing called from the trampoline on
> resume, and that loads the data segments with __KERNEL_DS.  The better
> fix would be to set ds/es to __USER_DS there.  Also since we are in
> the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> __KERNEL_STACK_CANARY or user's gs.

So why is this issue rare?  Is it just that the first entry to
userspace after resume is only very rarely via SYSEXIT?  Or is there
some other code path that usually, but not quite always, fixes up the
segment registers?

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 18:11             ` Andy Lutomirski
@ 2015-06-12 18:31               ` Srinivas Pandruvada
  2015-06-13  7:00                 ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-12 18:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Brian Gerst, Ingo Molnar, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds

On Fri, 2015-06-12 at 11:11 -0700, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <brgerst@gmail.com> wrote:
> > On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >>
> >> * H. Peter Anvin <hpa@zytor.com> wrote:
> >>
> >>> %es is used implicitly by string instructions.
> >>
> >> Ok, so we are probably better off reloading ES as well early, right
> >> when we return from the firmware, just in case something does
> >> a copy before we hit the ES restore in restore_processor_state(),
> >> which is a generic C function?
> >>
> >> Something like the patch below?
> >>
> >> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> >> thereof, it's still totally untested) works then we can remove the segment
> >> save/restore layer in __save/restore_processor_state().
> >>
> >> Thanks,
> >>
> >>         Ingo
> >>
> >> ===========>
> >>  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> >> index 665c6b7d2ea9..1376a7fc21b7 100644
> >> --- a/arch/x86/kernel/acpi/wakeup_32.S
> >> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> >> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> >>
> >>
> >>  restore_registers:
> >> +       /*
> >> +        * In case the BIOS corrupted our segment descriptors,
> >> +        * reload them to clear out any shadow descriptor
> >> +        * state:
> >> +        */
> >> +       movl    $__USER_DS, %eax
> >> +       movl    %eax, %ds
> >> +       movl    %eax, %es
> >> +       movl    %eax, %fs
> >> +       movl    %eax, %gs
> >> +       movl    $__KERNEL_DS, %eax
> >> +       movl    %eax, %ss
> >> +
> >>         movl    saved_context_ebp, %ebp
> >>         movl    saved_context_ebx, %ebx
> >>         movl    saved_context_esi, %esi
> >
> > If you follow the convoluted flow of the calls in this file,
> > wakeup_pmode_return is the first thing called from the trampoline on
> > resume, and that loads the data segments with __KERNEL_DS.  The better
> > fix would be to set ds/es to __USER_DS there.  Also since we are in
> > the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> > __KERNEL_STACK_CANARY or user's gs.
> 
> So why is this issue rare?  Is it just that the first entry to
> userspace after resume is only very rarely via SYSEXIT?  Or is there
> some other code path that usually, but not quite always, fixes up the
> segment registers?
We spent a quite a bit of time debugging this. Any small debug code or
printk causes problem to disappear. We noticed that we can reproduce
this more frequently when the we run just one user space application to
just do suspend/resume test, which supports yours theory.


> 
> --Andy



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 15:48           ` Brian Gerst
  2015-06-12 18:11             ` Andy Lutomirski
@ 2015-06-12 22:45             ` Denys Vlasenko
  2015-06-13 14:20               ` Pavel Machek
  2015-06-13  7:03             ` Ingo Molnar
  2 siblings, 1 reply; 36+ messages in thread
From: Denys Vlasenko @ 2015-06-12 22:45 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner, Pavel Machek,
	Rafael J. Wysocki, X86 ML, linux-pm, linux-kernel,
	Denys Vlasenko, Borislav Petkov, Linus Torvalds

On Fri, Jun 12, 2015 at 5:48 PM, Brian Gerst <brgerst@gmail.com> wrote:
> If you follow the convoluted flow of the calls in this file,
> ...

Speaking of which. It is indeed quite bad.

For one, saved_eip is only ever set to point to ret_point:

ENTRY(saved_eip)        .long   0
...

        movl    $ret_point, saved_eip

and it has just a single user, where an indirect jump
through it is performed:

        # jump to place where we left off
        movl    saved_eip, %eax
        jmp     *%eax

No comments why it is so.

All this seems to be equivalent to trivial

        # jump to place where we left off
        jmp     ret_point

Am I missing something?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 18:31               ` Srinivas Pandruvada
@ 2015-06-13  7:00                 ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2015-06-13  7:00 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Andy Lutomirski, Brian Gerst, H. Peter Anvin, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds


* Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:

> On Fri, 2015-06-12 at 11:11 -0700, Andy Lutomirski wrote:
> > On Fri, Jun 12, 2015 at 8:48 AM, Brian Gerst <brgerst@gmail.com> wrote:
> > > On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >>
> > >> * H. Peter Anvin <hpa@zytor.com> wrote:
> > >>
> > >>> %es is used implicitly by string instructions.
> > >>
> > >> Ok, so we are probably better off reloading ES as well early, right
> > >> when we return from the firmware, just in case something does
> > >> a copy before we hit the ES restore in restore_processor_state(),
> > >> which is a generic C function?
> > >>
> > >> Something like the patch below?
> > >>
> > >> I also added FS/GS/SS reloading to make it complete. If this (or a variant
> > >> thereof, it's still totally untested) works then we can remove the segment
> > >> save/restore layer in __save/restore_processor_state().
> > >>
> > >> Thanks,
> > >>
> > >>         Ingo
> > >>
> > >> ===========>
> > >>  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> > >>  1 file changed, 13 insertions(+)
> > >>
> > >> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> > >> index 665c6b7d2ea9..1376a7fc21b7 100644
> > >> --- a/arch/x86/kernel/acpi/wakeup_32.S
> > >> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > >> @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> > >>
> > >>
> > >>  restore_registers:
> > >> +       /*
> > >> +        * In case the BIOS corrupted our segment descriptors,
> > >> +        * reload them to clear out any shadow descriptor
> > >> +        * state:
> > >> +        */
> > >> +       movl    $__USER_DS, %eax
> > >> +       movl    %eax, %ds
> > >> +       movl    %eax, %es
> > >> +       movl    %eax, %fs
> > >> +       movl    %eax, %gs
> > >> +       movl    $__KERNEL_DS, %eax
> > >> +       movl    %eax, %ss
> > >> +
> > >>         movl    saved_context_ebp, %ebp
> > >>         movl    saved_context_ebx, %ebx
> > >>         movl    saved_context_esi, %esi
> > >
> > > If you follow the convoluted flow of the calls in this file,
> > > wakeup_pmode_return is the first thing called from the trampoline on
> > > resume, and that loads the data segments with __KERNEL_DS.  The better
> > > fix would be to set ds/es to __USER_DS there.  Also since we are in
> > > the kernel, fs is fixed at __KERNEL_PERCPU, and gs is either
> > > __KERNEL_STACK_CANARY or user's gs.
> > 
> > So why is this issue rare?  Is it just that the first entry to
> > userspace after resume is only very rarely via SYSEXIT?  Or is there
> > some other code path that usually, but not quite always, fixes up the
> > segment registers?
>
> We spent a quite a bit of time debugging this. Any small debug code or printk 
> causes problem to disappear. We noticed that we can reproduce this more 
> frequently when the we run just one user space application to just do 
> suspend/resume test, which supports yours theory.

So what happens if you add a couple of CPUID (or NOP) calls? If it's timing 
related, not segment register related, then that might 'fix' the bug too.

Which would suggest that it might be some hardware problem, or a race with 
something (which could not really be on the Linux side, as we are the only thread 
of execution at this point)?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 15:48           ` Brian Gerst
  2015-06-12 18:11             ` Andy Lutomirski
  2015-06-12 22:45             ` Denys Vlasenko
@ 2015-06-13  7:03             ` Ingo Molnar
  2015-06-13 18:23               ` Andy Lutomirski
  2 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2015-06-13  7:03 UTC (permalink / raw)
  To: Brian Gerst
  Cc: H. Peter Anvin, Andy Lutomirski, Srinivas Pandruvada,
	Ingo Molnar, Thomas Gleixner, Pavel Machek, Rafael J. Wysocki,
	X86 ML, linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds


* Brian Gerst <brgerst@gmail.com> wrote:

> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * H. Peter Anvin <hpa@zytor.com> wrote:
> >
> >> %es is used implicitly by string instructions.
> >
> > Ok, so we are probably better off reloading ES as well early, right
> > when we return from the firmware, just in case something does
> > a copy before we hit the ES restore in restore_processor_state(),
> > which is a generic C function?
> >
> > Something like the patch below?
> >
> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
> > thereof, it's still totally untested) works then we can remove the segment
> > save/restore layer in __save/restore_processor_state().
> >
> > Thanks,
> >
> >         Ingo
> >
> > ===========>
> >  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> > index 665c6b7d2ea9..1376a7fc21b7 100644
> > --- a/arch/x86/kernel/acpi/wakeup_32.S
> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
> >
> >
> >  restore_registers:
> > +       /*
> > +        * In case the BIOS corrupted our segment descriptors,
> > +        * reload them to clear out any shadow descriptor
> > +        * state:
> > +        */
> > +       movl    $__USER_DS, %eax
> > +       movl    %eax, %ds
> > +       movl    %eax, %es
> > +       movl    %eax, %fs
> > +       movl    %eax, %gs
> > +       movl    $__KERNEL_DS, %eax
> > +       movl    %eax, %ss
> > +
> >         movl    saved_context_ebp, %ebp
> >         movl    saved_context_ebx, %ebx
> >         movl    saved_context_esi, %esi
> 
> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return 
> is the first thing called from the trampoline on resume, and that loads the data 
> segments with __KERNEL_DS. [...]

So if wakeup_pmode_return is really the first thing called then the whole premise 
of shadow descriptor corruption goes out the window: we reload all relevant 
segment registers.

Which leaves us with only two small channels through which the patch might make a 
bug go away:

  - timing, as it introduces a small delay

  - code/cache layout, as it slightly rearranges the code

... but both of these are in the 'grasping at straws' category of hypotheses 
really.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-12 16:15   ` [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
@ 2015-06-13  7:15     ` Ingo Molnar
  2015-06-15 16:10       ` Srinivas Pandruvada
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2015-06-13  7:15 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, pavel, rjw, x86, linux-pm, linux-kernel,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Linus Torvalds, Kleen, Andi


* Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:

> >  Also, could you please describe how the failure triggers in your system: how 
> > many times do you have to suspend/resume to trigger the segfaults, and is 
> > there anything that makes the failures less or more likely?
>
> It is very random. Sometimes only few hundred trys reproduce this issue. Some 
> other times it requires thousands of trys (sometimes not reproducible at all for 
> days) It is very time sensitive.

So the very same kernel image will produce different crash patterns depending on 
the time of day? That suggests heat/hardware problems.

> [...] A small delay or some debug code in resume path prevents this to crash.

Fun...

> The BIOS folks created special version to check if they are corrupting any DS, 
> but they were not able to catch any corruption. [...]

So is it true that we always execute wakeup_pmode_return first after we return 
from the BIOS?

If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
segment selectors, which reloads the descriptors:

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
        movw    $__KERNEL_DS, %ax
        movw    %ax, %ss
        movw    %ax, %ds
        movw    %ax, %es
        movw    %ax, %fs
        movw    %ax, %gs

        # reload the gdt, as we need the full 32 bit address
        lidt    saved_idt
        lldt    saved_ldt
        ljmp    $(__KERNEL_CS), $1f

> [...] Since these are special deployed systems running critical application, 
> need to request the tests again with your changes. May take long time.

So my second patch is clearly broken as per Brian Gerst's comments.

What I would suggest is to try a patch that adds just 100 NOPs or so - attached 
below. This patch will add a small delay without any side effects (other than 
changing the kernel image layout).

If that makes the crash go away, then I'd say the likelihood that it's hardware 
related increases substantially: maybe a PLL has not stabilized yet sufficiently 
after resume, or there's some latent heat sensitivity and the fan has not started 
up yet, etc.

( You can then use this simple delay generating patch in production systems as 
  well, to work around the problem. Maybe convince the BIOS folks to add a delay 
  like this to their resume path before they call Linux. )

Thanks,

	Ingo

=================>

 arch/x86/kernel/acpi/wakeup_32.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..ef26999da80a 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -10,6 +10,12 @@
 
 ENTRY(wakeup_pmode_return)
 wakeup_pmode_return:
+
+	/* Timing delay of a few dozen cycles: give the hardware some time to recover */
+	.rept 100
+	nop
+	.endr
+
 	movw	$__KERNEL_DS, %ax
 	movw	%ax, %ss
 	movw	%ax, %ds

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-12 22:45             ` Denys Vlasenko
@ 2015-06-13 14:20               ` Pavel Machek
  0 siblings, 0 replies; 36+ messages in thread
From: Pavel Machek @ 2015-06-13 14:20 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Brian Gerst, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, X86 ML, linux-pm, linux-kernel,
	Denys Vlasenko, Borislav Petkov, Linus Torvalds

On Sat 2015-06-13 00:45:29, Denys Vlasenko wrote:
> On Fri, Jun 12, 2015 at 5:48 PM, Brian Gerst <brgerst@gmail.com> wrote:
> > If you follow the convoluted flow of the calls in this file,
> > ...
> 
> Speaking of which. It is indeed quite bad.
> 
> For one, saved_eip is only ever set to point to ret_point:
> 
> ENTRY(saved_eip)        .long   0
> ...
> 
>         movl    $ret_point, saved_eip
> 
> and it has just a single user, where an indirect jump
> through it is performed:
> 
>         # jump to place where we left off
>         movl    saved_eip, %eax
>         jmp     *%eax
> 
> No comments why it is so.
> 
> All this seems to be equivalent to trivial
> 
>         # jump to place where we left off
>         jmp     ret_point
> 
> Am I missing something?

I don't think so. Its just that slowdown was not bad enough tofix it.

...patch would be welcome, and even better if you could check the issue
on 64-bit kernel, too...
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-13  7:03             ` Ingo Molnar
@ 2015-06-13 18:23               ` Andy Lutomirski
  2015-06-13 21:30                 ` Brian Gerst
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2015-06-13 18:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, H. Peter Anvin, Srinivas Pandruvada, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds

On Sat, Jun 13, 2015 at 12:03 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
>> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> >
>> > * H. Peter Anvin <hpa@zytor.com> wrote:
>> >
>> >> %es is used implicitly by string instructions.
>> >
>> > Ok, so we are probably better off reloading ES as well early, right
>> > when we return from the firmware, just in case something does
>> > a copy before we hit the ES restore in restore_processor_state(),
>> > which is a generic C function?
>> >
>> > Something like the patch below?
>> >
>> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
>> > thereof, it's still totally untested) works then we can remove the segment
>> > save/restore layer in __save/restore_processor_state().
>> >
>> > Thanks,
>> >
>> >         Ingo
>> >
>> > ===========>
>> >  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>> >  1 file changed, 13 insertions(+)
>> >
>> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>> > index 665c6b7d2ea9..1376a7fc21b7 100644
>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>> >
>> >
>> >  restore_registers:
>> > +       /*
>> > +        * In case the BIOS corrupted our segment descriptors,
>> > +        * reload them to clear out any shadow descriptor
>> > +        * state:
>> > +        */
>> > +       movl    $__USER_DS, %eax
>> > +       movl    %eax, %ds
>> > +       movl    %eax, %es
>> > +       movl    %eax, %fs
>> > +       movl    %eax, %gs
>> > +       movl    $__KERNEL_DS, %eax
>> > +       movl    %eax, %ss
>> > +
>> >         movl    saved_context_ebp, %ebp
>> >         movl    saved_context_ebx, %ebx
>> >         movl    saved_context_esi, %esi
>>
>> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return
>> is the first thing called from the trampoline on resume, and that loads the data
>> segments with __KERNEL_DS. [...]
>
> So if wakeup_pmode_return is really the first thing called then the whole premise
> of shadow descriptor corruption goes out the window: we reload all relevant
> segment registers.

True, but it still leaves the fact that we're loading __KERNEL_DS
instead of __USER_DS, right?  So we end up in the kernel in some
context (I have no clue what context) with __KERNEL_DS loaded.  It's
very easy for us to inadvertently fix it: we could return to userspace
by any means whatsoever except SYSEXIT, or we could even return back
to some preempted kernel context.

I still think we should replace __KERNEL_DS with __USER_DS in
wakeup_pmode_return and see if the problem goes away.

--Andy

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: General protection fault after STR (32 bit systems only)
  2015-06-13 18:23               ` Andy Lutomirski
@ 2015-06-13 21:30                 ` Brian Gerst
  2015-06-14  6:56                   ` [PATCH] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Brian Gerst @ 2015-06-13 21:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, H. Peter Anvin, Srinivas Pandruvada, Ingo Molnar,
	Thomas Gleixner, Pavel Machek, Rafael J. Wysocki, X86 ML,
	linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds

On Sat, Jun 13, 2015 at 2:23 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sat, Jun 13, 2015 at 12:03 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Brian Gerst <brgerst@gmail.com> wrote:
>>
>>> On Fri, Jun 12, 2015 at 4:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>> >
>>> > * H. Peter Anvin <hpa@zytor.com> wrote:
>>> >
>>> >> %es is used implicitly by string instructions.
>>> >
>>> > Ok, so we are probably better off reloading ES as well early, right
>>> > when we return from the firmware, just in case something does
>>> > a copy before we hit the ES restore in restore_processor_state(),
>>> > which is a generic C function?
>>> >
>>> > Something like the patch below?
>>> >
>>> > I also added FS/GS/SS reloading to make it complete. If this (or a variant
>>> > thereof, it's still totally untested) works then we can remove the segment
>>> > save/restore layer in __save/restore_processor_state().
>>> >
>>> > Thanks,
>>> >
>>> >         Ingo
>>> >
>>> > ===========>
>>> >  arch/x86/kernel/acpi/wakeup_32.S | 13 +++++++++++++
>>> >  1 file changed, 13 insertions(+)
>>> >
>>> > diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
>>> > index 665c6b7d2ea9..1376a7fc21b7 100644
>>> > --- a/arch/x86/kernel/acpi/wakeup_32.S
>>> > +++ b/arch/x86/kernel/acpi/wakeup_32.S
>>> > @@ -61,6 +61,19 @@ ENTRY(wakeup_pmode_return)
>>> >
>>> >
>>> >  restore_registers:
>>> > +       /*
>>> > +        * In case the BIOS corrupted our segment descriptors,
>>> > +        * reload them to clear out any shadow descriptor
>>> > +        * state:
>>> > +        */
>>> > +       movl    $__USER_DS, %eax
>>> > +       movl    %eax, %ds
>>> > +       movl    %eax, %es
>>> > +       movl    %eax, %fs
>>> > +       movl    %eax, %gs
>>> > +       movl    $__KERNEL_DS, %eax
>>> > +       movl    %eax, %ss
>>> > +
>>> >         movl    saved_context_ebp, %ebp
>>> >         movl    saved_context_ebx, %ebx
>>> >         movl    saved_context_esi, %esi
>>>
>>> If you follow the convoluted flow of the calls in this file, wakeup_pmode_return
>>> is the first thing called from the trampoline on resume, and that loads the data
>>> segments with __KERNEL_DS. [...]
>>
>> So if wakeup_pmode_return is really the first thing called then the whole premise
>> of shadow descriptor corruption goes out the window: we reload all relevant
>> segment registers.
>
> True, but it still leaves the fact that we're loading __KERNEL_DS
> instead of __USER_DS, right?  So we end up in the kernel in some
> context (I have no clue what context) with __KERNEL_DS loaded.  It's
> very easy for us to inadvertently fix it: we could return to userspace
> by any means whatsoever except SYSEXIT, or we could even return back
> to some preempted kernel context.
>
> I still think we should replace __KERNEL_DS with __USER_DS in
> wakeup_pmode_return and see if the problem goes away.

I'm pretty sure that's what the problem is.  If you look at the
sysexit path, it never reloads ds/es.  It assumes they are still
__USER_DS set at sysenter.  The iret path does restore all the user
segments.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] x86: Load __USER_DS into DS/ES after resume
  2015-06-13 21:30                 ` Brian Gerst
@ 2015-06-14  6:56                   ` Ingo Molnar
  2015-06-14  7:03                     ` Pavel Machek
       [not found]                     ` <CA+55aFzB9dYidEf_7Hs47FOF7WPPJnJQwj_RiwL--c5Gb1uqyw@mail.gmail.com>
  0 siblings, 2 replies; 36+ messages in thread
From: Ingo Molnar @ 2015-06-14  6:56 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Andy Lutomirski, H. Peter Anvin, Srinivas Pandruvada,
	Ingo Molnar, Thomas Gleixner, Pavel Machek, Rafael J. Wysocki,
	X86 ML, linux-pm, linux-kernel, Denys Vlasenko, Borislav Petkov,
	Linus Torvalds


* Brian Gerst <brgerst@gmail.com> wrote:

> >> So if wakeup_pmode_return is really the first thing called then the whole 
> >> premise of shadow descriptor corruption goes out the window: we reload all 
> >> relevant segment registers.
> >
> > True, but it still leaves the fact that we're loading __KERNEL_DS instead of 
> > __USER_DS, right?  So we end up in the kernel in some context (I have no clue 
> > what context) with __KERNEL_DS loaded.  It's very easy for us to inadvertently 
> > fix it: we could return to userspace by any means whatsoever except SYSEXIT, 
> > or we could even return back to some preempted kernel context.
> >
> > I still think we should replace __KERNEL_DS with __USER_DS in 
> > wakeup_pmode_return and see if the problem goes away.
> 
> I'm pretty sure that's what the problem is.  If you look at the sysexit path, it 
> never reloads ds/es.  It assumes they are still __USER_DS set at sysenter.  The 
> iret path does restore all the user segments.

Ok, so something like the patch below, right?

Thanks,

	Ingo

=====================>
 arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..7302bbaea184 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
 wakeup_pmode_return:
 	movw	$__KERNEL_DS, %ax
 	movw	%ax, %ss
-	movw	%ax, %ds
-	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
 
+	movw	$__KERNEL_DS, %ax
+	movw	%ax, %ds
+	movw	%ax, %es
+
 	# reload the gdt, as we need the full 32 bit address
 	lidt	saved_idt
 	lldt	saved_ldt

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] x86: Load __USER_DS into DS/ES after resume
  2015-06-14  6:56                   ` [PATCH] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
@ 2015-06-14  7:03                     ` Pavel Machek
       [not found]                     ` <CA+55aFzB9dYidEf_7Hs47FOF7WPPJnJQwj_RiwL--c5Gb1uqyw@mail.gmail.com>
  1 sibling, 0 replies; 36+ messages in thread
From: Pavel Machek @ 2015-06-14  7:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, Andy Lutomirski, H. Peter Anvin,
	Srinivas Pandruvada, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, X86 ML, linux-pm, linux-kernel,
	Denys Vlasenko, Borislav Petkov, Linus Torvalds

On Sun 2015-06-14 08:56:35, Ingo Molnar wrote:
> 
> * Brian Gerst <brgerst@gmail.com> wrote:
> 
> > >> So if wakeup_pmode_return is really the first thing called then the whole 
> > >> premise of shadow descriptor corruption goes out the window: we reload all 
> > >> relevant segment registers.
> > >
> > > True, but it still leaves the fact that we're loading __KERNEL_DS instead of 
> > > __USER_DS, right?  So we end up in the kernel in some context (I have no clue 
> > > what context) with __KERNEL_DS loaded.  It's very easy for us to inadvertently 
> > > fix it: we could return to userspace by any means whatsoever except SYSEXIT, 
> > > or we could even return back to some preempted kernel context.
> > >
> > > I still think we should replace __KERNEL_DS with __USER_DS in 
> > > wakeup_pmode_return and see if the problem goes away.
> > 
> > I'm pretty sure that's what the problem is.  If you look at the sysexit path, it 
> > never reloads ds/es.  It assumes they are still __USER_DS set at sysenter.  The 
> > iret path does restore all the user segments.
> 
> Ok, so something like the patch below, right?
> 
> Thanks,
> 
> 	Ingo
> 
> =====================>
>  arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..7302bbaea184 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
>  wakeup_pmode_return:
>  	movw	$__KERNEL_DS, %ax
>  	movw	%ax, %ss
> -	movw	%ax, %ds
> -	movw	%ax, %es
>  	movw	%ax, %fs
>  	movw	%ax, %gs
>  
> +	movw	$__KERNEL_DS, %ax
> +	movw	%ax, %ds
> +	movw	%ax, %es

Umm. Are you sure? :-).
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2] x86: Load __USER_DS into DS/ES after resume
       [not found]                     ` <CA+55aFzB9dYidEf_7Hs47FOF7WPPJnJQwj_RiwL--c5Gb1uqyw@mail.gmail.com>
@ 2015-06-14  7:49                       ` Ingo Molnar
  2015-06-14  8:57                         ` Pavel Machek
                                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ingo Molnar @ 2015-06-14  7:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denys Vlasenko, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Srinivas Pandruvada, the arch/x86 maintainers, linux-pm,
	Rafael J. Wysocki, Pavel Machek, Andy Lutomirski, Brian Gerst,
	Peter Anvin


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Jun 13, 2015 8:56 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
> >
> > Ok, so something like the patch below, right?
> >
> >         movw    $__KERNEL_DS, %ax
> >         movw    %ax, %ss
> > -       movw    %ax, %ds
> > -       movw    %ax, %es
> >         movw    %ax, %fs
> >         movw    %ax, %gs
> >
> > +       movw    $__KERNEL_DS, %ax
> > +       movw    %ax, %ds
> > +       movw    %ax, %es
> 
> .. except with less cut-and-paste bugs.
> 
> That second KERNEL should be USER.

Yeah, doh :-/ Updated patch below.

Thanks,

	Ingo

=====
 arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..0c26b1b44e51 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
 wakeup_pmode_return:
 	movw	$__KERNEL_DS, %ax
 	movw	%ax, %ss
-	movw	%ax, %ds
-	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
 
+	movw	$__USER_DS, %ax
+	movw	%ax, %ds
+	movw	%ax, %es
+
 	# reload the gdt, as we need the full 32 bit address
 	lidt	saved_idt
 	lldt	saved_ldt

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-14  7:49                       ` [PATCH v2] " Ingo Molnar
@ 2015-06-14  8:57                         ` Pavel Machek
  2015-06-14 14:22                           ` Brian Gerst
  2015-06-15 16:12                         ` Srinivas Pandruvada
  2015-06-16  9:13                         ` Pavel Machek
  2 siblings, 1 reply; 36+ messages in thread
From: Pavel Machek @ 2015-06-14  8:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Denys Vlasenko, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Srinivas Pandruvada, the arch/x86 maintainers,
	linux-pm, Rafael J. Wysocki, Andy Lutomirski, Brian Gerst,
	Peter Anvin

On Sun 2015-06-14 09:49:22, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Jun 13, 2015 8:56 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
> > >
> > > Ok, so something like the patch below, right?
> > >
> > >         movw    $__KERNEL_DS, %ax
> > >         movw    %ax, %ss
> > > -       movw    %ax, %ds
> > > -       movw    %ax, %es
> > >         movw    %ax, %fs
> > >         movw    %ax, %gs
> > >
> > > +       movw    $__KERNEL_DS, %ax
> > > +       movw    %ax, %ds
> > > +       movw    %ax, %es
> > 
> > .. except with less cut-and-paste bugs.
> > 
> > That second KERNEL should be USER.
> 
> Yeah, doh :-/ Updated patch below.

Do we want similar patch for 64-bit, too?

Signed-off-by: Pavel Machek <pavel@ucw.cz>

diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
index ae693b5..2ac2bc7 100644
--- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -18,10 +18,12 @@ ENTRY(wakeup_long64)
 	cmpq	%rdx, %rax
 	jne	bogus_64_magic
 
-	movw	$__KERNEL_DS, %ax
-	movw	%ax, %ss	
+	movw	$__USER_DS, %ax
 	movw	%ax, %ds
 	movw	%ax, %es
+
+	movw	$__KERNEL_DS, %ax
+	movw	%ax, %ss	
 	movw	%ax, %fs
 	movw	%ax, %gs
 	movq	saved_rsp, %rsp


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-14  8:57                         ` Pavel Machek
@ 2015-06-14 14:22                           ` Brian Gerst
  0 siblings, 0 replies; 36+ messages in thread
From: Brian Gerst @ 2015-06-14 14:22 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, Linus Torvalds, Denys Vlasenko, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Srinivas Pandruvada,
	the arch/x86 maintainers, linux-pm, Rafael J. Wysocki,
	Andy Lutomirski, Peter Anvin

On Sun, Jun 14, 2015 at 4:57 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Sun 2015-06-14 09:49:22, Ingo Molnar wrote:
>>
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> > On Jun 13, 2015 8:56 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
>> > >
>> > > Ok, so something like the patch below, right?
>> > >
>> > >         movw    $__KERNEL_DS, %ax
>> > >         movw    %ax, %ss
>> > > -       movw    %ax, %ds
>> > > -       movw    %ax, %es
>> > >         movw    %ax, %fs
>> > >         movw    %ax, %gs
>> > >
>> > > +       movw    $__KERNEL_DS, %ax
>> > > +       movw    %ax, %ds
>> > > +       movw    %ax, %es
>> >
>> > .. except with less cut-and-paste bugs.
>> >
>> > That second KERNEL should be USER.
>>
>> Yeah, doh :-/ Updated patch below.
>
> Do we want similar patch for 64-bit, too?
>
> Signed-off-by: Pavel Machek <pavel@ucw.cz>
>
> diff --git a/arch/x86/kernel/acpi/wakeup_64.S b/arch/x86/kernel/acpi/wakeup_64.S
> index ae693b5..2ac2bc7 100644
> --- a/arch/x86/kernel/acpi/wakeup_64.S
> +++ b/arch/x86/kernel/acpi/wakeup_64.S
> @@ -18,10 +18,12 @@ ENTRY(wakeup_long64)
>         cmpq    %rdx, %rax
>         jne     bogus_64_magic
>
> -       movw    $__KERNEL_DS, %ax
> -       movw    %ax, %ss
> +       movw    $__USER_DS, %ax
>         movw    %ax, %ds
>         movw    %ax, %es
> +
> +       movw    $__KERNEL_DS, %ax
> +       movw    %ax, %ss
>         movw    %ax, %fs
>         movw    %ax, %gs
>         movq    saved_rsp, %rsp

64-bit is a different animal.  DS/ES are ignored in 64-bit mode, so
they are always userspace-owned, and only matter for 32-bit user
processes.  64-bit does currently save/restore these in
x86/power/cpu.c.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-13  7:15     ` [PATCH, DEBUG] x86/32: Add small delay after resume Ingo Molnar
@ 2015-06-15 16:10       ` Srinivas Pandruvada
  2015-06-16 21:33         ` H. Peter Anvin
  0 siblings, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-15 16:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: mingo, tglx, hpa, pavel, rjw, x86, linux-pm, linux-kernel,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Linus Torvalds, Kleen, Andi

On Sat, 2015-06-13 at 09:15 +0200, Ingo Molnar wrote:
> * Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> 
> > >  Also, could you please describe how the failure triggers in your system: how 
> > > many times do you have to suspend/resume to trigger the segfaults, and is 
> > > there anything that makes the failures less or more likely?
> >
> > It is very random. Sometimes only few hundred trys reproduce this issue. Some 
> > other times it requires thousands of trys (sometimes not reproducible at all for 
> > days) It is very time sensitive.
> 
> So the very same kernel image will produce different crash patterns depending on 
> the time of day? That suggests heat/hardware problems.
> 
> > [...] A small delay or some debug code in resume path prevents this to crash.
> 
> Fun...
> 
> > The BIOS folks created special version to check if they are corrupting any DS, 
> > but they were not able to catch any corruption. [...]
> 
> So is it true that we always execute wakeup_pmode_return first after we return 
> from the BIOS?
> 
> If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
> segment selectors, which reloads the descriptors:
> 
> ENTRY(wakeup_pmode_return)
> wakeup_pmode_return:
>         movw    $__KERNEL_DS, %ax
>         movw    %ax, %ss
>         movw    %ax, %ds
>         movw    %ax, %es
>         movw    %ax, %fs
>         movw    %ax, %gs
> 
>         # reload the gdt, as we need the full 32 bit address
>         lidt    saved_idt
>         lldt    saved_ldt
>         ljmp    $(__KERNEL_CS), $1f
> 
> > [...] Since these are special deployed systems running critical application, 
> > need to request the tests again with your changes. May take long time.
> 
> So my second patch is clearly broken as per Brian Gerst's comments.
> 
> What I would suggest is to try a patch that adds just 100 NOPs or so - attached 
> below. This patch will add a small delay without any side effects (other than 
> changing the kernel image layout).
> 
> If that makes the crash go away, then I'd say the likelihood that it's hardware 
> related increases substantially: maybe a PLL has not stabilized yet sufficiently 
> after resume, or there's some latent heat sensitivity and the fan has not started 
> up yet, etc.

> ( You can then use this simple delay generating patch in production systems as 
>   well, to work around the problem. Maybe convince the BIOS folks to add a delay 
>   like this to their resume path before they call Linux. )
This was already experimented. They added delay in BIOS before handing
over to OS, the crash still occurred.
We were thinking that BIOS SMI handler responsible for suspend/wake up
corrupted the DS even after control passed over to OS. But couldn't
prove it.
Thanks for your valuable debugging suggestions.

Thanks,
Srinivas
 
> 
> Thanks,
> 
> 	Ingo
> 
> =================>
> 
>  arch/x86/kernel/acpi/wakeup_32.S | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..ef26999da80a 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -10,6 +10,12 @@
>  
>  ENTRY(wakeup_pmode_return)
>  wakeup_pmode_return:
> +
> +	/* Timing delay of a few dozen cycles: give the hardware some time to recover */
> +	.rept 100
> +	nop
> +	.endr
> +
>  	movw	$__KERNEL_DS, %ax
>  	movw	%ax, %ss
>  	movw	%ax, %ds



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-14  7:49                       ` [PATCH v2] " Ingo Molnar
  2015-06-14  8:57                         ` Pavel Machek
@ 2015-06-15 16:12                         ` Srinivas Pandruvada
  2015-06-16  9:13                         ` Pavel Machek
  2 siblings, 0 replies; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-15 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Denys Vlasenko, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, linux-pm,
	Rafael J. Wysocki, Pavel Machek, Andy Lutomirski, Brian Gerst,
	Peter Anvin

On Sun, 2015-06-14 at 09:49 +0200, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Jun 13, 2015 8:56 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
> > >
> > > Ok, so something like the patch below, right?
> > >
> > >         movw    $__KERNEL_DS, %ax
> > >         movw    %ax, %ss
> > > -       movw    %ax, %ds
> > > -       movw    %ax, %es
> > >         movw    %ax, %fs
> > >         movw    %ax, %gs
> > >
> > > +       movw    $__KERNEL_DS, %ax
> > > +       movw    %ax, %ds
> > > +       movw    %ax, %es
> > 
> > .. except with less cut-and-paste bugs.
> > 
> > That second KERNEL should be USER.
> 
> Yeah, doh :-/ Updated patch below.
> 
> Thanks,
> 
> 	Ingo
> 
> =====
>  arch/x86/kernel/acpi/wakeup_32.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..0c26b1b44e51 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
>  wakeup_pmode_return:
>  	movw	$__KERNEL_DS, %ax
>  	movw	%ax, %ss
> -	movw	%ax, %ds
> -	movw	%ax, %es
>  	movw	%ax, %fs
>  	movw	%ax, %gs
>  
> +	movw	$__USER_DS, %ax
> +	movw	%ax, %ds
> +	movw	%ax, %es
> +
>  	# reload the gdt, as we need the full 32 bit address
>  	lidt	saved_idt
>  	lldt	saved_ldt

We did suggest this patch to our customer and initial test results
showed no crash. But they didn't merge this because they did months of
testing with my earlier patch with load/save segment of ds, which will
store restore _USER_DS.

I tried this patch again today on another 32 bit system and STR didn't
break, so if you apply this change, during next kernel upgrade this will
be picked up.

Thanks to you, Andy, Brian for helping on this issue.

- Srinivas



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-14  7:49                       ` [PATCH v2] " Ingo Molnar
  2015-06-14  8:57                         ` Pavel Machek
  2015-06-15 16:12                         ` Srinivas Pandruvada
@ 2015-06-16  9:13                         ` Pavel Machek
  2015-06-16 21:40                           ` Rafael J. Wysocki
  2 siblings, 1 reply; 36+ messages in thread
From: Pavel Machek @ 2015-06-16  9:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Denys Vlasenko, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Srinivas Pandruvada, the arch/x86 maintainers,
	linux-pm, Rafael J. Wysocki, Andy Lutomirski, Brian Gerst,
	Peter Anvin

> > That second KERNEL should be USER.
> 
> Yeah, doh :-/ Updated patch below.
> 
> Thanks,
> 
> 	Ingo

Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>

[of course, it worked for me before, so I only tested it also works
with the patch.]

Rafael, can you apply it?
									Pavel


> diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
> index 665c6b7d2ea9..0c26b1b44e51 100644
> --- a/arch/x86/kernel/acpi/wakeup_32.S
> +++ b/arch/x86/kernel/acpi/wakeup_32.S
> @@ -12,11 +12,13 @@ ENTRY(wakeup_pmode_return)
>  wakeup_pmode_return:
>  	movw	$__KERNEL_DS, %ax
>  	movw	%ax, %ss
> -	movw	%ax, %ds
> -	movw	%ax, %es
>  	movw	%ax, %fs
>  	movw	%ax, %gs
>  
> +	movw	$__USER_DS, %ax
> +	movw	%ax, %ds
> +	movw	%ax, %es
> +
>  	# reload the gdt, as we need the full 32 bit address
>  	lidt	saved_idt
>  	lldt	saved_ldt

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-15 16:10       ` Srinivas Pandruvada
@ 2015-06-16 21:33         ` H. Peter Anvin
  2015-06-16 22:25           ` Srinivas Pandruvada
  2015-06-17 16:33           ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 36+ messages in thread
From: H. Peter Anvin @ 2015-06-16 21:33 UTC (permalink / raw)
  To: Srinivas Pandruvada, Ingo Molnar
  Cc: mingo, tglx, pavel, rjw, x86, linux-pm, linux-kernel,
	Denys Vlasenko, Andy Lutomirski, Borislav Petkov, Brian Gerst,
	Linus Torvalds, Kleen, Andi

On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
>>
>> So is it true that we always execute wakeup_pmode_return first after we return 
>> from the BIOS?
>>
>> If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
>> segment selectors, which reloads the descriptors:
>>
>> ENTRY(wakeup_pmode_return)
>> wakeup_pmode_return:
>>         movw    $__KERNEL_DS, %ax
>>         movw    %ax, %ss
>>         movw    %ax, %ds
>>         movw    %ax, %es
>>         movw    %ax, %fs
>>         movw    %ax, %gs
>>
>>         # reload the gdt, as we need the full 32 bit address
>>         lidt    saved_idt
>>         lldt    saved_ldt
>>         ljmp    $(__KERNEL_CS), $1f
>>

Where does the GDT get initialized?

	-hpa



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-16  9:13                         ` Pavel Machek
@ 2015-06-16 21:40                           ` Rafael J. Wysocki
  2015-06-17  8:59                             ` x86: allow using different kernel version for 32-bit, too Pavel Machek
  2015-06-18  9:13                             ` [PATCH v2] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
  0 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2015-06-16 21:40 UTC (permalink / raw)
  To: Pavel Machek, Ingo Molnar
  Cc: Linus Torvalds, Denys Vlasenko, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Srinivas Pandruvada, the arch/x86 maintainers,
	linux-pm, Andy Lutomirski, Brian Gerst, Peter Anvin

On Tuesday, June 16, 2015 11:13:33 AM Pavel Machek wrote:
> > > That second KERNEL should be USER.
> > 
> > Yeah, doh :-/ Updated patch below.
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> Acked-by: Pavel Machek <pavel@ucw.cz>
> Tested-by: Pavel Machek <pavel@ucw.cz>
> 
> [of course, it worked for me before, so I only tested it also works
> with the patch.]
> 
> Rafael, can you apply it?

Yes, I can, but I thought Ingo would take it into the tip tree.

Ingo, what's your plan regarding this one?

Rafael


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-16 21:33         ` H. Peter Anvin
@ 2015-06-16 22:25           ` Srinivas Pandruvada
  2015-06-17 16:33           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 36+ messages in thread
From: Srinivas Pandruvada @ 2015-06-16 22:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, mingo, tglx, pavel, rjw, x86, linux-pm,
	linux-kernel, Denys Vlasenko, Andy Lutomirski, Borislav Petkov,
	Brian Gerst, Linus Torvalds, Kleen, Andi

On Tue, 2015-06-16 at 14:33 -0700, H. Peter Anvin wrote:
> On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
> >>
> >> So is it true that we always execute wakeup_pmode_return first after we return 
> >> from the BIOS?
> >>
> >> If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
> >> segment selectors, which reloads the descriptors:
> >>
> >> ENTRY(wakeup_pmode_return)
> >> wakeup_pmode_return:
> >>         movw    $__KERNEL_DS, %ax
> >>         movw    %ax, %ss
> >>         movw    %ax, %ds
> >>         movw    %ax, %es
> >>         movw    %ax, %fs
> >>         movw    %ax, %gs
> >>
> >>         # reload the gdt, as we need the full 32 bit address
> >>         lidt    saved_idt
> >>         lldt    saved_ldt
> >>         ljmp    $(__KERNEL_CS), $1f
> >>
> 
> Where does the GDT get initialized?
In wakeup_start
During acpi_sleep_prepare we set a 32 bit FW waking vector which points
to wakeup_start.

Thanks,
Srinivas
> 
> 	-hpa
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 36+ messages in thread

* x86: allow using different kernel version for 32-bit, too
  2015-06-16 21:40                           ` Rafael J. Wysocki
@ 2015-06-17  8:59                             ` Pavel Machek
  2015-06-18  9:13                             ` [PATCH v2] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
  1 sibling, 0 replies; 36+ messages in thread
From: Pavel Machek @ 2015-06-17  8:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Denys Vlasenko, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Srinivas Pandruvada, the arch/x86 maintainers,
	linux-pm, Andy Lutomirski, Brian Gerst, Peter Anvin, jbohac,
	vojtech

Rafael, would something like this be acceptable?

I think I'd like to unify hibernate_32/64.c into hibernate.c with
ifdefs as a next step.

Best regards,
								Pavel

commit e4d6e488069b1452fdaab06ecc812969d76ef777
Author: Pavel <pavel@ucw.cz>
Date:   Wed Jun 17 10:35:01 2015 +0200

    Port d158cbdf39ffaec9dd5299fdfdfdd2c7897a71dc to i386.

Signed-off-by: Pavel Machek <pavel@ucw.cz>

(Thanks go to czech SUSE, Vojtech Pavlik and Jiri Bohac for this one,
altough they probably have no chance to understand why.)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8c47337..808c262 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2074,7 +2074,7 @@ menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
 	def_bool y
-	depends on X86_64 && HIBERNATION
+	depends on HIBERNATION
 
 source "kernel/power/Kconfig"
 
diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 552d6c9..d5711c3 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -24,4 +24,7 @@ struct saved_context {
 	unsigned long return_address;
 } __attribute__((packed));
 
+extern char core_restore_code;
+extern char restore_registers;
+
 #endif /* _ASM_X86_SUSPEND_32_H */
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
new file mode 100644
index 0000000..c5f2fce
--- /dev/null
+++ b/arch/x86/power/hibernate.c
@@ -0,0 +1,46 @@
+int reallocate_restore_code(void)
+{
+	relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC);
+	if (!relocated_restore_code)
+		return -ENOMEM;
+	memcpy(relocated_restore_code, &core_restore_code,
+	       &restore_registers - &core_restore_code);
+	return 0;
+}
+
+struct restore_data_record {
+	unsigned long jump_address;
+	unsigned long cr3;
+	unsigned long magic;
+};
+
+/**
+ *	arch_hibernation_header_save - populate the architecture specific part
+ *		of a hibernation image header
+ *	@addr: address to save the data at
+ */
+int arch_hibernation_header_save(void *addr, unsigned int max_size)
+{
+	struct restore_data_record *rdr = addr;
+
+	if (max_size < sizeof(struct restore_data_record))
+		return -EOVERFLOW;
+	rdr->jump_address = restore_jump_address;
+	rdr->cr3 = restore_cr3;
+	rdr->magic = RESTORE_MAGIC;
+	return 0;
+}
+
+/**
+ *	arch_hibernation_header_restore - read the architecture specific data
+ *		from the hibernation image header
+ *	@addr: address to read the data from
+ */
+int arch_hibernation_header_restore(void *addr)
+{
+	struct restore_data_record *rdr = addr;
+
+	restore_jump_address = rdr->jump_address;
+	restore_cr3 = rdr->cr3;
+	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
+}
diff --git a/arch/x86/power/hibernate_32.c b/arch/x86/power/hibernate_32.c
index 291226b..29c8cdf 100644
--- a/arch/x86/power/hibernate_32.c
+++ b/arch/x86/power/hibernate_32.c
@@ -4,6 +4,7 @@
  * Distribute under GPLv2
  *
  * Copyright (c) 2006 Rafael J. Wysocki <rjw@sisk.pl>
+ * Copyright (c) 2015 Pavel Machek <pavel@ucw.cz>
  */
 
 #include <linux/gfp.h>
@@ -14,13 +15,28 @@
 #include <asm/pgtable.h>
 #include <asm/mmzone.h>
 #include <asm/sections.h>
+#include <asm/suspend.h>
 
 /* Defined in hibernate_asm_32.S */
 extern int restore_image(void);
 
+/*
+ * Address to jump to in the last phase of restore in order to get to the image
+ * kernel's text (this value is passed in the image header).
+ */
+unsigned long restore_jump_address __visible;
+
+/*
+ * Value of the cr3 register from before the hibernation (this value is passed
+ * in the image header).
+ */
+unsigned long restore_cr3 __visible;
+
 /* Pointer to the temporary resume page tables */
 pgd_t *resume_pg_dir;
 
+void *relocated_restore_code __visible;
+
 /* The following three functions are based on the analogous code in
  * arch/x86/mm/init_32.c
  */
@@ -142,6 +158,9 @@ static inline void resume_init_first_level_page_table(pgd_t *pg_dir)
 #endif
 }
 
+#define RESTORE_MAGIC	0x2468aceUL
+#include "hibernate.c"
+
 int swsusp_arch_resume(void)
 {
 	int error;
@@ -155,6 +174,10 @@ int swsusp_arch_resume(void)
 	if (error)
 		return error;
 
+	error = reallocate_restore_code();
+	if (error)
+		return error;
+
 	/* We have got enough memory and from now on we cannot recover */
 	restore_image();
 	return 0;
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
index 009947d..527a902 100644
--- a/arch/x86/power/hibernate_64.c
+++ b/arch/x86/power/hibernate_64.c
@@ -78,6 +78,9 @@ static int set_up_temporary_mappings(void)
 	return 0;
 }
 
+#define RESTORE_MAGIC	0x0123456789ABCDEFUL
+#include "hibernate.c"
+
 int swsusp_arch_resume(void)
 {
 	int error;
@@ -86,11 +89,9 @@ int swsusp_arch_resume(void)
 	if ((error = set_up_temporary_mappings()))
 		return error;
 
-	relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC);
-	if (!relocated_restore_code)
-		return -ENOMEM;
-	memcpy(relocated_restore_code, &core_restore_code,
-	       &restore_registers - &core_restore_code);
+	error = reallocate_restore_code();
+	if (error)
+		return error;
 
 	restore_image();
 	return 0;
@@ -107,41 +108,4 @@ int pfn_is_nosave(unsigned long pfn)
 	return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
 }
 
-struct restore_data_record {
-	unsigned long jump_address;
-	unsigned long cr3;
-	unsigned long magic;
-};
-
-#define RESTORE_MAGIC	0x0123456789ABCDEFUL
-
-/**
- *	arch_hibernation_header_save - populate the architecture specific part
- *		of a hibernation image header
- *	@addr: address to save the data at
- */
-int arch_hibernation_header_save(void *addr, unsigned int max_size)
-{
-	struct restore_data_record *rdr = addr;
-
-	if (max_size < sizeof(struct restore_data_record))
-		return -EOVERFLOW;
-	rdr->jump_address = restore_jump_address;
-	rdr->cr3 = restore_cr3;
-	rdr->magic = RESTORE_MAGIC;
-	return 0;
-}
-
-/**
- *	arch_hibernation_header_restore - read the architecture specific data
- *		from the hibernation image header
- *	@addr: address to read the data from
- */
-int arch_hibernation_header_restore(void *addr)
-{
-	struct restore_data_record *rdr = addr;
 
-	restore_jump_address = rdr->jump_address;
-	restore_cr3 = rdr->cr3;
-	return (rdr->magic == RESTORE_MAGIC) ? 0 : -EINVAL;
-}
diff --git a/arch/x86/power/hibernate_asm_32.S b/arch/x86/power/hibernate_asm_32.S
index 1d0fa0e..db5f22a 100644
--- a/arch/x86/power/hibernate_asm_32.S
+++ b/arch/x86/power/hibernate_asm_32.S
@@ -1,5 +1,14 @@
 /*
- * This may not use any stack, nor any variable that is not "NoSave":
+ * Hibernation support for i386
+ *
+ * Distribute under GPLv2.
+ *
+ * Copyright 2007 Rafael J. Wysocki <rjw@sisk.pl>
+ * Copyright 2005 Andi Kleen <ak@suse.de>
+ * Copyright 2004, 2015 Pavel Machek <pavel@ucw.cz>
+ *
+ * swsusp_arch_resume must not use any stack or any nonlocal variables while
+ * copying pages:
  *
  * Its rewriting one kernel image with another. What is stack in "old"
  * image could very well be data page in "new" image, and overwriting
@@ -23,6 +32,13 @@ ENTRY(swsusp_arch_suspend)
 	pushfl
 	popl saved_context_eflags
 
+	/* save the address of restore_registers */
+	movl	$restore_registers, %eax
+	movl	%eax, restore_jump_address
+	/* save cr3 */
+	movl	%cr3, %eax
+	movl	%eax, restore_cr3
+
 	call swsusp_save
 	ret
 
@@ -38,9 +54,18 @@ ENTRY(restore_image)
 	movl	%cr3, %eax;  # flush TLB
 	movl	%eax, %cr3
 1:
+
+	/* prepare to jump to the image kernel */
+	movl	restore_jump_address, %eax
+	movl	restore_cr3, %ebx
+
+	/* prepare to copy image data to their original locations */
 	movl	restore_pblist, %edx
+	movl	relocated_restore_code, %ecx
+	jmpl	*%ecx
 	.p2align 4,,7
 
+ENTRY(core_restore_code)
 copy_loop:
 	testl	%edx, %edx
 	jz	done
@@ -48,7 +73,7 @@ copy_loop:
 	movl	pbe_address(%edx), %esi
 	movl	pbe_orig_address(%edx), %edi
 
-	movl	$1024, %ecx
+	movl	$(PAGE_SIZE >> 2), %ecx
 	rep
 	movsl
 
@@ -57,6 +82,20 @@ copy_loop:
 	.p2align 4,,7
 
 done:
+	/* jump to the restore_registers address from the image header */
+	jmpl	*%eax
+	/*
+	 * NOTE: This assumes that the boot kernel's text mapping covers the
+	 * image kernel's page containing restore_registers and the address of
+	 * this page is the same as in the image kernel's text mapping (it
+	 * should always be true, because the text mapping is linear, starting
+	 * from 0, and is supposed to cover the entire kernel text for every
+	 * kernel).
+	 *
+	 * code below belongs to the image kernel
+	 */
+
+ENTRY(restore_registers)
 	/* go back to the original page tables */
 	movl	$swapper_pg_dir, %eax
 	subl	$__PAGE_OFFSET, %eax
@@ -81,4 +120,7 @@ done:
 
 	xorl	%eax, %eax
 
+	/* tell the hibernation core that we've just restored the memory */
+	movl	%eax, in_suspend
+
 	ret


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-16 21:33         ` H. Peter Anvin
  2015-06-16 22:25           ` Srinivas Pandruvada
@ 2015-06-17 16:33           ` Konrad Rzeszutek Wilk
  2015-06-17 17:22             ` H. Peter Anvin
  1 sibling, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-17 16:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Srinivas Pandruvada, Ingo Molnar, mingo, tglx, pavel, rjw, x86,
	linux-pm, linux-kernel, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Linus Torvalds, Kleen, Andi

On Tue, Jun 16, 2015 at 02:33:10PM -0700, H. Peter Anvin wrote:
> On 06/15/2015 09:10 AM, Srinivas Pandruvada wrote:
> >>
> >> So is it true that we always execute wakeup_pmode_return first after we return 
> >> from the BIOS?
> >>
> >> If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
> >> segment selectors, which reloads the descriptors:
> >>
> >> ENTRY(wakeup_pmode_return)
> >> wakeup_pmode_return:
> >>         movw    $__KERNEL_DS, %ax
> >>         movw    %ax, %ss
> >>         movw    %ax, %ds
> >>         movw    %ax, %es
> >>         movw    %ax, %fs
> >>         movw    %ax, %gs
> >>
> >>         # reload the gdt, as we need the full 32 bit address
> >>         lidt    saved_idt
> >>         lldt    saved_ldt
> >>         ljmp    $(__KERNEL_CS), $1f
> >>
> 
> Where does the GDT get initialized?
> 
> 	-hpa

mit 84e70971e67d97bc2db18a4e76d42846272a54bd
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Fri Apr 5 16:42:22 2013 -0400

    x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-17 16:33           ` Konrad Rzeszutek Wilk
@ 2015-06-17 17:22             ` H. Peter Anvin
  2015-06-17 18:29               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: H. Peter Anvin @ 2015-06-17 17:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Srinivas Pandruvada, Ingo Molnar, mingo, tglx, pavel, rjw, x86,
	linux-pm, linux-kernel, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Linus Torvalds, Kleen, Andi

On 06/17/2015 09:33 AM, Konrad Rzeszutek Wilk wrote:
>>>>
>>
>> Where does the GDT get initialized?
>>
>> 	-hpa
> 
> mit 84e70971e67d97bc2db18a4e76d42846272a54bd
> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date:   Fri Apr 5 16:42:22 2013 -0400
> 
>     x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
> 

Store, no.  LOAD?

	-hpa



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH, DEBUG] x86/32: Add small delay after resume
  2015-06-17 17:22             ` H. Peter Anvin
@ 2015-06-17 18:29               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-06-17 18:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Srinivas Pandruvada, Ingo Molnar, mingo, tglx, pavel, rjw, x86,
	linux-pm, linux-kernel, Denys Vlasenko, Andy Lutomirski,
	Borislav Petkov, Brian Gerst, Linus Torvalds, Kleen, Andi

On Wed, Jun 17, 2015 at 10:22:49AM -0700, H. Peter Anvin wrote:
> On 06/17/2015 09:33 AM, Konrad Rzeszutek Wilk wrote:
> >>>>
> >>
> >> Where does the GDT get initialized?
> >>
> >> 	-hpa
> > 
> > mit 84e70971e67d97bc2db18a4e76d42846272a54bd
> > Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Date:   Fri Apr 5 16:42:22 2013 -0400
> > 
> >     x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
> > 
> 
> Store, no.  LOAD?

__save_processor_state:
..
  /*
         * We save it here, but restore it only in the hibernate case.
         * For ACPI S3 resume, this is loaded via 'early_gdt_desc' in
         * 64-bit
         * mode in "secondary_startup_64". In 32-bit mode it is done via
         * 'pmode_gdt' in wakeup_start.
         */

> 
> 	-hpa
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-16 21:40                           ` Rafael J. Wysocki
  2015-06-17  8:59                             ` x86: allow using different kernel version for 32-bit, too Pavel Machek
@ 2015-06-18  9:13                             ` Ingo Molnar
  2015-06-22 14:06                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2015-06-18  9:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Linus Torvalds, Denys Vlasenko, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Srinivas Pandruvada,
	the arch/x86 maintainers, linux-pm, Andy Lutomirski, Brian Gerst,
	Peter Anvin


* Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> On Tuesday, June 16, 2015 11:13:33 AM Pavel Machek wrote:
> > > > That second KERNEL should be USER.
> > > 
> > > Yeah, doh :-/ Updated patch below.
> > > 
> > > Thanks,
> > > 
> > > 	Ingo
> > 
> > Acked-by: Pavel Machek <pavel@ucw.cz>
> > Tested-by: Pavel Machek <pavel@ucw.cz>
> > 
> > [of course, it worked for me before, so I only tested it also works
> > with the patch.]
> > 
> > Rafael, can you apply it?
> 
> Yes, I can, but I thought Ingo would take it into the tip tree.
> 
> Ingo, what's your plan regarding this one?

Feel free to push it!

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] x86: Load __USER_DS into DS/ES after resume
  2015-06-18  9:13                             ` [PATCH v2] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
@ 2015-06-22 14:06                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2015-06-22 14:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavel Machek, Linus Torvalds, Denys Vlasenko, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Srinivas Pandruvada,
	the arch/x86 maintainers, linux-pm, Andy Lutomirski, Brian Gerst,
	Peter Anvin

On Thursday, June 18, 2015 11:13:58 AM Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> > On Tuesday, June 16, 2015 11:13:33 AM Pavel Machek wrote:
> > > > > That second KERNEL should be USER.
> > > > 
> > > > Yeah, doh :-/ Updated patch below.
> > > > 
> > > > Thanks,
> > > > 
> > > > 	Ingo
> > > 
> > > Acked-by: Pavel Machek <pavel@ucw.cz>
> > > Tested-by: Pavel Machek <pavel@ucw.cz>
> > > 
> > > [of course, it worked for me before, so I only tested it also works
> > > with the patch.]
> > > 
> > > Rafael, can you apply it?
> > 
> > Yes, I can, but I thought Ingo would take it into the tip tree.
> > 
> > Ingo, what's your plan regarding this one?
> 
> Feel free to push it!
> 
> Acked-by: Ingo Molnar <mingo@kernel.org>

OK, applied, thanks!

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-06-22 13:40 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-11 23:45 [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
2015-06-12  6:07 ` Ingo Molnar
2015-06-12  6:48   ` Andy Lutomirski
2015-06-12  7:15     ` Ingo Molnar
2015-06-12  7:41   ` Andy Lutomirski
2015-06-12  7:50     ` Ingo Molnar
2015-06-12  8:15       ` H. Peter Anvin
2015-06-12  8:36         ` Ingo Molnar
2015-06-12 15:48           ` Brian Gerst
2015-06-12 18:11             ` Andy Lutomirski
2015-06-12 18:31               ` Srinivas Pandruvada
2015-06-13  7:00                 ` Ingo Molnar
2015-06-12 22:45             ` Denys Vlasenko
2015-06-13 14:20               ` Pavel Machek
2015-06-13  7:03             ` Ingo Molnar
2015-06-13 18:23               ` Andy Lutomirski
2015-06-13 21:30                 ` Brian Gerst
2015-06-14  6:56                   ` [PATCH] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
2015-06-14  7:03                     ` Pavel Machek
     [not found]                     ` <CA+55aFzB9dYidEf_7Hs47FOF7WPPJnJQwj_RiwL--c5Gb1uqyw@mail.gmail.com>
2015-06-14  7:49                       ` [PATCH v2] " Ingo Molnar
2015-06-14  8:57                         ` Pavel Machek
2015-06-14 14:22                           ` Brian Gerst
2015-06-15 16:12                         ` Srinivas Pandruvada
2015-06-16  9:13                         ` Pavel Machek
2015-06-16 21:40                           ` Rafael J. Wysocki
2015-06-17  8:59                             ` x86: allow using different kernel version for 32-bit, too Pavel Machek
2015-06-18  9:13                             ` [PATCH v2] x86: Load __USER_DS into DS/ES after resume Ingo Molnar
2015-06-22 14:06                               ` Rafael J. Wysocki
2015-06-12 16:15   ` [PATCH] x86: General protection fault after STR (32 bit systems only) Srinivas Pandruvada
2015-06-13  7:15     ` [PATCH, DEBUG] x86/32: Add small delay after resume Ingo Molnar
2015-06-15 16:10       ` Srinivas Pandruvada
2015-06-16 21:33         ` H. Peter Anvin
2015-06-16 22:25           ` Srinivas Pandruvada
2015-06-17 16:33           ` Konrad Rzeszutek Wilk
2015-06-17 17:22             ` H. Peter Anvin
2015-06-17 18:29               ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.