All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Druzhinin <igor.druzhinin@citrix.com>
To: andrew.cooper3@citrix.com, jbeulich@suse.com
Cc: Igor Druzhinin <igor.druzhinin@citrix.com>, xen-devel@lists.xen.org
Subject: [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
Date: Fri, 16 Feb 2018 17:46:48 +0000	[thread overview]
Message-ID: <1518803208-25855-1-git-send-email-igor.druzhinin@citrix.com> (raw)

We're noticing a reproducible system boot hang on certain
post-Skylake platforms where the BIOS is configured in
legacy boot mode with x2APIC disabled. The system stalls
immediately after writing the first SMP initialization
sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during INIT-SIPI-SIPI
sequence. As the result, the system is constantly moving between
NMI and SMI handler and not making any progress.

To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/apic.c    | 2 +-
 xen/arch/x86/smpboot.c | 3 +++
 xen/arch/x86/traps.c   | 9 +++++++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 5039173..ffa5a69 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -684,7 +684,7 @@ void setup_local_APIC(void)
         printk("Leaving ESR disabled.\n");
     }
 
-    if (nmi_watchdog == NMI_LOCAL_APIC)
+    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
         setup_apic_nmi_watchdog();
     apic_pm_activate();
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2ebef03..1844116 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
 void __init smp_cpus_done(void)
 {
     if ( nmi_watchdog == NMI_LOCAL_APIC )
+    {
+        setup_apic_nmi_watchdog();
         check_nmi_watchdog();
+    }
 
     setup_ioapic_dest();
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2e022b0..c16f146 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
-    unsigned char reason;
+    unsigned char reason = 0;
     bool handle_unknown = false;
 
     ++nmi_count(cpu);
@@ -1714,6 +1714,12 @@ void do_nmi(const struct cpu_user_regs *regs)
     if ( nmi_callback(regs, cpu) )
         return;
 
+    /* This IO port access is likely to produce SMI which, in turn,
+     * may take enough time for the next NMI tick to happen. To avoid having
+     * nested NMIs as the result let's call it before watchdog re-scheduling */
+    if ( cpu == 0 )
+        reason = inb(0x61);
+
     if ( (nmi_watchdog == NMI_NONE) ||
          (!nmi_watchdog_tick(regs) && watchdog_force) )
         handle_unknown = true;
@@ -1721,7 +1727,6 @@ void do_nmi(const struct cpu_user_regs *regs)
     /* Only the BSP gets external NMIs from the system. */
     if ( cpu == 0 )
     {
-        reason = inb(0x61);
         if ( reason & 0x80 )
             pci_serr_error(regs);
         if ( reason & 0x40 )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

             reply	other threads:[~2018-02-16 17:46 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-16 17:46 Igor Druzhinin [this message]
2018-02-17  5:20 ` [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Alexey G
2018-02-19 13:12 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1518803208-25855-1-git-send-email-igor.druzhinin@citrix.com \
    --to=igor.druzhinin@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.