From: Ingo Molnar <mingo@kernel.org>
To: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: mingo@redhat.com, tglx@linutronix.de, hpa@zytor.com,
	pavel@ucw.cz, rjw@rjwysocki.net, x86@kernel.org,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Borislav Petkov <bp@alien8.de>, Brian Gerst <brgerst@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kleen, Andi" <andi.kleen@intel.com>
Subject: [PATCH, DEBUG] x86/32: Add small delay after resume
Date: Sat, 13 Jun 2015 09:15:47 +0200
Message-ID: <20150613071547.GA27446@gmail.com>
In-Reply-To: <1434125724.2353.19.camel@spandruv-DESK3.jf.intel.com>


* Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:

> > Also, could you please describe how the failure triggers in your system: how 
> > many times do you have to suspend/resume to trigger the segfaults, and is 
> > there anything that makes the failures less or more likely?
>
> It is very random. Sometimes only a few hundred tries reproduce this issue.
> Other times it requires thousands of tries (sometimes it is not reproducible
> at all for days). It is very time sensitive.

So the very same kernel image will produce different crash patterns depending on 
the time of day? That suggests heat/hardware problems.

> [...] A small delay or some debug code in the resume path prevents the crash.

Fun...

> The BIOS folks created a special version to check whether they are corrupting
> DS, but they were not able to catch any corruption. [...]

So is it true that we always execute wakeup_pmode_return first after we return 
from the BIOS?

If so then the BIOS touching DS cannot be an issue, as we re-initialize all 
segment selectors, which reloads the descriptors:

ENTRY(wakeup_pmode_return)
wakeup_pmode_return:
        movw    $__KERNEL_DS, %ax
        movw    %ax, %ss
        movw    %ax, %ds
        movw    %ax, %es
        movw    %ax, %fs
        movw    %ax, %gs

        # reload the gdt, as we need the full 32 bit address
        lidt    saved_idt
        lldt    saved_ldt
        ljmp    $(__KERNEL_CS), $1f
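
For reference, a selector is just a 16-bit value (descriptor table index, table
indicator bit, requested privilege level), and each move into a segment register
makes the CPU fetch the descriptor again. Below is a minimal C sketch of that
selector layout. The GDT slot numbers 13 and 15 are an assumption about the
32-bit GDT layout (they are not stated in this thread), and the helper names are
made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Architectural fields of an x86 segment selector. */
struct selector_fields {
	unsigned index;	/* bits 15..3: index into the descriptor table */
	unsigned ti;	/* bit 2: 0 = GDT, 1 = LDT */
	unsigned rpl;	/* bits 1..0: requested privilege level */
};

/* Split a 16-bit selector into its three fields. */
static struct selector_fields selector_decode(uint16_t sel)
{
	struct selector_fields f;

	f.index = sel >> 3;
	f.ti = (sel >> 2) & 1;
	f.rpl = sel & 3;
	return f;
}

/* Build a selector from its fields (hypothetical helper, not a kernel API). */
static uint16_t selector_make(unsigned index, unsigned ti, unsigned rpl)
{
	assert(index < 8192 && ti <= 1 && rpl <= 3);
	return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* ASSUMPTION: kernel data in GDT slot 13, user data in GDT slot 15. */
#define EXAMPLE_KERNEL_DS_INDEX	13
#define EXAMPLE_USER_DS_INDEX	15
```

Under that assumption, selector_make(EXAMPLE_KERNEL_DS_INDEX, 0, 0) gives 0x68
and selector_make(EXAMPLE_USER_DS_INDEX, 0, 3) gives 0x7b.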

> [...] Since these are specially deployed systems running critical
> applications, I need to request the tests again with your changes. This may
> take a long time.

So my second patch is clearly broken as per Brian Gerst's comments.

What I would suggest is to try a patch that adds just 100 NOPs or so - attached 
below. This patch will add a small delay without any side effects (other than 
changing the kernel image layout).

If that makes the crash go away, then I'd say the likelihood that it's hardware 
related increases substantially: maybe a PLL has not stabilized yet sufficiently 
after resume, or there's some latent heat sensitivity and the fan has not started 
up yet, etc.
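
To get a feel for how small this delay is, here is a rough back-of-the-envelope
sketch in C. The NOP throughput (NOPs retired per cycle) and the clock frequency
are pure assumptions, not measurements from the affected systems, and the
function names are made up. Right after resume, instruction fetch with cold
caches may well dominate, so treat the result as a lower bound at best:

```c
#include <assert.h>

/*
 * Cycles needed to retire `count` NOPs if the CPU retires
 * `nops_per_cycle` of them per cycle (rounded up).
 */
static unsigned nop_delay_cycles(unsigned count, unsigned nops_per_cycle)
{
	assert(nops_per_cycle > 0);
	return (count + nops_per_cycle - 1) / nops_per_cycle;
}

/* Convert a cycle count to nanoseconds at a core clock given in MHz. */
static double cycles_to_ns(unsigned cycles, unsigned mhz)
{
	assert(mhz > 0);
	return cycles * 1000.0 / mhz;
}
```

With an assumed throughput of 4 NOPs per cycle, nop_delay_cycles(100, 4) is 25
cycles, and at an assumed 1000 MHz clock cycles_to_ns(25, 1000) is 25 ns. With
one NOP per cycle it is 100 cycles.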

( You can then use this simple delay generating patch in production systems as 
  well, to work around the problem. Maybe convince the BIOS folks to add a delay 
  like this to their resume path before they call Linux. )
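
Since the failure is rare, it is worth estimating up front how many clean
suspend/resume cycles are needed before "the crash went away" means anything.
The sketch below is only a rough guide: it assumes each cycle fails
independently with a fixed probability, which may not hold here given that the
problem sometimes does not reproduce for days. The per-cycle failure probability
is a guess you have to supply, and the function name is made up:

```c
#include <assert.h>

/*
 * Smallest n such that (1 - p_fail)^n <= alpha, i.e. the number of
 * consecutive clean cycles after which an unfixed system would have
 * survived with probability at most alpha.
 *
 * ASSUMPTION: cycles fail independently with probability p_fail.
 */
static unsigned long clean_cycles_needed(double p_fail, double alpha)
{
	double p_all_clean = 1.0;
	unsigned long n = 0;

	assert(p_fail > 0.0 && p_fail < 1.0);
	assert(alpha > 0.0 && alpha < 1.0);

	/* Multiply in one more clean cycle until the bound is met. */
	while (p_all_clean > alpha) {
		p_all_clean *= 1.0 - p_fail;
		n++;
	}
	return n;
}
```

For example, with an assumed failure rate of 1 in 1000 cycles and alpha = 0.05,
this comes out at roughly 3000 clean cycles.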

Thanks,

	Ingo

=================>

 arch/x86/kernel/acpi/wakeup_32.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 665c6b7d2ea9..ef26999da80a 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -10,6 +10,12 @@
 
 ENTRY(wakeup_pmode_return)
 wakeup_pmode_return:
+
+	/* Timing delay of a few dozen cycles: give the hardware some time to recover */
+	.rept 100
+	nop
+	.endr
+
 	movw	$__KERNEL_DS, %ax
 	movw	%ax, %ss
 	movw	%ax, %ds

