All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
@ 2018-02-16 17:46 Igor Druzhinin
  2018-02-17  5:20 ` Alexey G
  2018-02-19 13:12 ` Jan Beulich
  0 siblings, 2 replies; 3+ messages in thread
From: Igor Druzhinin @ 2018-02-16 17:46 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich; +Cc: Igor Druzhinin, xen-devel

We're noticing a reproducible system boot hang on certain
post-Skylake platforms where the BIOS is configured in
legacy boot mode with x2APIC disabled. The system stalls
immediately after writing the first SMP initialization
sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during INIT-SIPI-SIPI
sequence. As the result, the system is constantly moving between
NMI and SMI handler and not making any progress.

To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/apic.c    | 2 +-
 xen/arch/x86/smpboot.c | 3 +++
 xen/arch/x86/traps.c   | 9 +++++++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 5039173..ffa5a69 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -684,7 +684,7 @@ void setup_local_APIC(void)
         printk("Leaving ESR disabled.\n");
     }
 
-    if (nmi_watchdog == NMI_LOCAL_APIC)
+    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
         setup_apic_nmi_watchdog();
     apic_pm_activate();
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2ebef03..1844116 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
 void __init smp_cpus_done(void)
 {
     if ( nmi_watchdog == NMI_LOCAL_APIC )
+    {
+        setup_apic_nmi_watchdog();
         check_nmi_watchdog();
+    }
 
     setup_ioapic_dest();
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2e022b0..c16f146 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
-    unsigned char reason;
+    unsigned char reason = 0;
     bool handle_unknown = false;
 
     ++nmi_count(cpu);
@@ -1714,6 +1714,12 @@ void do_nmi(const struct cpu_user_regs *regs)
     if ( nmi_callback(regs, cpu) )
         return;
 
+    /* This IO port access is likely to produce SMI which, in turn,
+     * may take enough time for the next NMI tick to happen. To avoid having
+     * nested NMIs as the result let's call it before watchdog re-scheduling */
+    if ( cpu == 0 )
+        reason = inb(0x61);
+
     if ( (nmi_watchdog == NMI_NONE) ||
          (!nmi_watchdog_tick(regs) && watchdog_force) )
         handle_unknown = true;
@@ -1721,7 +1727,6 @@ void do_nmi(const struct cpu_user_regs *regs)
     /* Only the BSP gets external NMIs from the system. */
     if ( cpu == 0 )
     {
-        reason = inb(0x61);
         if ( reason & 0x80 )
             pci_serr_error(regs);
         if ( reason & 0x40 )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-16 17:46 [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
@ 2018-02-17  5:20 ` Alexey G
  2018-02-19 13:12 ` Jan Beulich
  1 sibling, 0 replies; 3+ messages in thread
From: Alexey G @ 2018-02-17  5:20 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: andrew.cooper3, jbeulich, xen-devel

On Fri, 16 Feb 2018 17:46:48 +0000
Igor Druzhinin <igor.druzhinin@citrix.com> wrote:

>We're noticing a reproducible system boot hang on certain
>post-Skylake platforms where the BIOS is configured in
>legacy boot mode with x2APIC disabled. The system stalls
>immediately after writing the first SMP initialization
>sequence into APIC ICR.
>
>The cause of the problem is watchdog NMI handler execution -
>somewhere near the end of NMI handling (after it's already
>rescheduled the next NMI) it tries to access IO port 0x61
>to get the actual NMI reason on CPU0. Unfortunately, this
>port is emulated by BIOS using SMIs and this emulation for
>some reason takes more time than we expect during INIT-SIPI-SIPI
>sequence. As the result, the system is constantly moving between
>NMI and SMI handler and not making any progress.
>
>To avoid this, initialize the watchdog after SMP bootstrap on
>CPU0 and, additionally, protect the NMI handler by moving
>IO port access before NMI re-scheduling.
>
>Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

This is far better than merely reducing the NMI rate.

The reason why SMI execution takes so long in the MP initialization
sequence particularly is (very likely) a necessity to wait for other
CPUs in the SMI handler.

Sending INIT causes target CPU to become numb for a relatively long
time (SDM propose 10ms delay in the MP init seq after it). If, right
after sending the INIT IPI, we immediately cause SMI I/O trap by reading
port 61h -- this means that at the beginning of the SMI handler
the primary CPU will have to wait (via a spinlock) for other CPUs
including the one which only started to come out from INIT coma.
AFAIR it doesn't even begin execution of SMI processing until all CPUs
sync their execution in SMM via that spinlock. Thus overall long SMI
processing time, enough for a next NMI watchdog tick to arrive.

Basically, it's a race between sending INIT to APs, triggering SMI by
reading the port 61h, spin-waiting for other CPUs at the start of the
SMI handler and the NMI watchdog timer (which generates flow of SMIs).

If this assumptions are correct, delaying triggering SMIs due to a
NMI watchdog until MP initialization is complete is the right
solution.

>---
> xen/arch/x86/apic.c    | 2 +-
> xen/arch/x86/smpboot.c | 3 +++
> xen/arch/x86/traps.c   | 9 +++++++--
> 3 files changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
>index 5039173..ffa5a69 100644
>--- a/xen/arch/x86/apic.c
>+++ b/xen/arch/x86/apic.c
>@@ -684,7 +684,7 @@ void setup_local_APIC(void)
>         printk("Leaving ESR disabled.\n");
>     }
> 
>-    if (nmi_watchdog == NMI_LOCAL_APIC)
>+    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
>         setup_apic_nmi_watchdog();
>     apic_pm_activate();
> }
>diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>index 2ebef03..1844116 100644
>--- a/xen/arch/x86/smpboot.c
>+++ b/xen/arch/x86/smpboot.c
>@@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
> void __init smp_cpus_done(void)
> {
>     if ( nmi_watchdog == NMI_LOCAL_APIC )
>+    {
>+        setup_apic_nmi_watchdog();
>         check_nmi_watchdog();
>+    }
> 
>     setup_ioapic_dest();
> 
>diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>index 2e022b0..c16f146 100644
>--- a/xen/arch/x86/traps.c
>+++ b/xen/arch/x86/traps.c
>@@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback =
>dummy_nmi_callback;
> void do_nmi(const struct cpu_user_regs *regs)
> {
>     unsigned int cpu = smp_processor_id();
>-    unsigned char reason;
>+    unsigned char reason = 0;
>     bool handle_unknown = false;
> 
>     ++nmi_count(cpu);
>@@ -1714,6 +1714,12 @@ void do_nmi(const struct cpu_user_regs *regs)
>     if ( nmi_callback(regs, cpu) )
>         return;
> 
>+    /* This IO port access is likely to produce SMI which, in turn,
>+     * may take enough time for the next NMI tick to happen. To avoid
>having
>+     * nested NMIs as the result let's call it before watchdog
>re-scheduling */
>+    if ( cpu == 0 )
>+        reason = inb(0x61);
>+
>     if ( (nmi_watchdog == NMI_NONE) ||
>          (!nmi_watchdog_tick(regs) && watchdog_force) )
>         handle_unknown = true;
>@@ -1721,7 +1727,6 @@ void do_nmi(const struct cpu_user_regs *regs)
>     /* Only the BSP gets external NMIs from the system. */
>     if ( cpu == 0 )
>     {
>-        reason = inb(0x61);
>         if ( reason & 0x80 )
>             pci_serr_error(regs);
>         if ( reason & 0x40 )


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-16 17:46 [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
  2018-02-17  5:20 ` Alexey G
@ 2018-02-19 13:12 ` Jan Beulich
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2018-02-19 13:12 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: andrew.cooper3, xen-devel

>>> On 16.02.18 at 18:46, <igor.druzhinin@citrix.com> wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in
> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writing the first SMP initialization
> sequence into APIC ICR.
> 
> The cause of the problem is watchdog NMI handler execution -
> somewhere near the end of NMI handling (after it's already
> rescheduled the next NMI) it tries to access IO port 0x61
> to get the actual NMI reason on CPU0. Unfortunately, this
> port is emulated by BIOS using SMIs and this emulation for
> some reason takes more time than we expect during INIT-SIPI-SIPI
> sequence. As the result, the system is constantly moving between
> NMI and SMI handler and not making any progress.
> 
> To avoid this, initialize the watchdog after SMP bootstrap on
> CPU0 and, additionally, protect the NMI handler by moving
> IO port access before NMI re-scheduling.

Much better, yet what about post boot onlining of CPUs? I think we
assume to be safe in that case just because at that time we run at
a lower nmi_hz. Might be worthwhile to spell this out above.

> @@ -1714,6 +1714,12 @@ void do_nmi(const struct cpu_user_regs *regs)
>      if ( nmi_callback(regs, cpu) )
>          return;
>  
> +    /* This IO port access is likely to produce SMI which, in turn,
> +     * may take enough time for the next NMI tick to happen. To avoid having
> +     * nested NMIs as the result let's call it before watchdog re-scheduling */

Please correct the comment style (/* and */ on their own lines,
full stop after second sentence. Also following the earlier
discussion I don't think "likely" is appropriate - how about "not
impossible"? Also perhaps "do it" instead of "call it" (as you're
talking about a port access, not a function call)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-02-19 13:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-16 17:46 [PATCH v2] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
2018-02-17  5:20 ` Alexey G
2018-02-19 13:12 ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.