All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv1] x86: only check for one watchdog NMI
@ 2015-06-22 16:21 David Vrabel
  2015-06-23  9:09 ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: David Vrabel @ 2015-06-22 16:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, David Vrabel, Jan Beulich

Since the NMI handler can now recognize watchdog NMIs, make
check_nmi_watchdog() only check for at least one watchdog NMI.  This
prevents false negatives caused by other processors (which may be
being power managed by the BIOS) running at reduced clock frequencies.

This will also slightly speed up boot times since we only wait the
full 10 ticks if the NMI watchdog on one or more CPUs is not working.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 xen/arch/x86/nmi.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index 2ab97a0..725ea04 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -139,7 +139,18 @@ int nmi_active;
 
 static void __init wait_for_nmis(void *p)
 {
-    mdelay((10*1000)/nmi_hz); /* wait 10 ticks */
+    unsigned int cpu = smp_processor_id();
+    unsigned int start_count = nmi_count(cpu);
+    unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
+    unsigned long s, e;
+
+    s = rdtsc();
+    do {
+        cpu_relax();
+        if ( nmi_count(cpu) > start_count )
+            break;
+        e = rdtsc();
+    } while( e - s < ticks );
 }
 
 int __init check_nmi_watchdog (void)
@@ -156,15 +167,16 @@ int __init check_nmi_watchdog (void)
     for_each_online_cpu ( cpu )
         prev_nmi_count[cpu] = nmi_count(cpu);
 
-    /* Wait for 10 ticks.  Busy-wait on all CPUs: the LAPIC counter that
-     * the NMI watchdog uses only runs while the core's not halted */
-    if ( nmi_watchdog == NMI_LOCAL_APIC )
-        smp_call_function(wait_for_nmis, NULL, 0);
-    wait_for_nmis(NULL);
+    /*
+     * Wait at most 10 ticks for a watchdog NMI on each CPU.
+     * Busy-wait on all CPUs: the LAPIC counter that the NMI watchdog
+     * uses only runs while the core's not halted
+     */
+    on_selected_cpus(&cpu_online_map, wait_for_nmis, NULL, 1);
 
     for_each_online_cpu ( cpu )
     {
-        if ( nmi_count(cpu) - prev_nmi_count[cpu] <= 5 )
+        if ( nmi_count(cpu) - prev_nmi_count[cpu] < 1 )
         {
             printk(" %d", cpu);
             ok = 0;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCHv1] x86: only check for one watchdog NMI
  2015-06-22 16:21 [PATCHv1] x86: only check for one watchdog NMI David Vrabel
@ 2015-06-23  9:09 ` Jan Beulich
  2015-06-23  9:34   ` Andrew Cooper
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2015-06-23  9:09 UTC (permalink / raw)
  To: David Vrabel; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 22.06.15 at 18:21, <david.vrabel@citrix.com> wrote:
> Since the NMI handler can now recognize watchdog NMIs, make
> check_nmi_watchdog() only check for at least one watchdog NMI.  This
> prevents false negatives caused by other processors (which may be
> being power managed by the BIOS) running at reduced clock frequencies.

Doesn't this go a little too far, in that it allows through the case where
an NMI works exactly once (a behavior not unheard of)? Lowering the
count to e.g. 3 would seem acceptable, but not any further. Raising
the wait time might then need to be the way to go if the (approximate)
1:3 ratio still isn't enough to cope with BIOS power managed CPUs.

Btw., can an OS know of the power state CPUs come up in? I.e. can
the wait time be adjusted dynamically? Or is this (perhaps intentionally)
completely transparent to the OS?

Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCHv1] x86: only check for one watchdog NMI
  2015-06-23  9:09 ` Jan Beulich
@ 2015-06-23  9:34   ` Andrew Cooper
  2015-06-23  9:51     ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2015-06-23  9:34 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: xen-devel, Keir Fraser

On 23/06/15 10:09, Jan Beulich wrote:
>>>> On 22.06.15 at 18:21, <david.vrabel@citrix.com> wrote:
>> Since the NMI handler can now recognize watchdog NMIs, make
>> check_nmi_watchdog() only check for at least one watchdog NMI.  This
>> prevents false negatives caused by other processors (which may be
>> being power managed by the BIOS) running at reduced clock frequencies.
> Doesn't this go a little too far, in that it allows through the case where
> an NMI works exactly once (a behavior not unheard of)? Lowering the
> count to e.g. 3 would seem acceptable, but not any further. Raising
> the wait time might then need to be the way to go if the (approximate)
> 1:3 ratio still isn't enough to cope with BIOS power managed CPUs.
>
> Btw., can an OS know of the power state CPUs come up in? I.e. can
> the wait time be adjusted dynamically? Or is this (perhaps intentionally)
> completely transparent to the OS?

With the mwait driver, probably, as it is specifically designed to
completely bypass any firmware settings which might be in place.

Anything else is dependent on how much information can be gleaned from
the ACPI tables, but most firmware deliberately has a "BIOS controlled"
mode which is designed to restrict what the OS is capable of doing.

~Andrew

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCHv1] x86: only check for one watchdog NMI
  2015-06-23  9:34   ` Andrew Cooper
@ 2015-06-23  9:51     ` Jan Beulich
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2015-06-23  9:51 UTC (permalink / raw)
  To: Andrew Cooper, David Vrabel; +Cc: xen-devel, Keir Fraser

>>> On 23.06.15 at 11:34, <andrew.cooper3@citrix.com> wrote:
> On 23/06/15 10:09, Jan Beulich wrote:
>>>>> On 22.06.15 at 18:21, <david.vrabel@citrix.com> wrote:
>>> Since the NMI handler can now recognize watchdog NMIs, make
>>> check_nmi_watchdog() only check for at least one watchdog NMI.  This
>>> prevents false negatives caused by other processors (which may be
>>> being power managed by the BIOS) running at reduced clock frequencies.
>> Doesn't this go a little too far, in that it allows through the case where
>> an NMI works exactly once (a behavior not unheard of)? Lowering the
>> count to e.g. 3 would seem acceptable, but not any further. Raising
>> the wait time might then need to be the way to go if the (approximate)
>> 1:3 ratio still isn't enough to cope with BIOS power managed CPUs.
>>
>> Btw., can an OS know of the power state CPUs come up in? I.e. can
>> the wait time be adjusted dynamically? Or is this (perhaps intentionally)
>> completely transparent to the OS?
> 
> With the mwait driver, probably, as it is specifically designed to
> completely bypass any firmware settings which might be in place.

The mwait driver deals with C states, while reduced frequency
means P states (and we specifically busy wait here to prevent
entering and C state).

> Anything else is dependent on how much information can be gleaned from
> the ACPI tables, but most firmware deliberately has a "BIOS controlled"
> mode which is designed to restrict what the OS is capable of doing.

None of which would help in identifying what state a CPU is in.

Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-06-23  9:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 16:21 [PATCHv1] x86: only check for one watchdog NMI David Vrabel
2015-06-23  9:09 ` Jan Beulich
2015-06-23  9:34   ` Andrew Cooper
2015-06-23  9:51     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.