* [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
@ 2020-02-17 18:43 Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 1/6] x86/smp: unify header includes in smp.h Roger Pau Monne
                   ` (6 more replies)
  0 siblings, 7 replies; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	Roger Pau Monne

Hello,

Commit:

5500d265a2a8fa63d60c08beb549de8ec82ff7a5
x86/smp: use APIC ALLBUT destination shorthand when possible

Introduced a bogus usage of the scratch cpumask: it was used in a
function that could be called from interrupt context, and hence using
the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
and also prevents the usage of any per-CPU variables when send_IPI_mask
is called from #MC or #NMI context. Previous patches are preparatory
changes.
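
For illustration, the problematic pattern looks roughly like this (a
minimal sketch, not the literal offending code):

    cpumask_t *mask = this_cpu(scratch_cpumask);

    cpumask_and(mask, &cpu_online_map, &cpu_present_map);
    /*
     * If an interrupt arrives here and its handler also picks up
     * this_cpu(scratch_cpumask), the contents of mask are silently
     * clobbered before the flush below.
     */
    flush_tlb_mask(mask);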

Patch #6 adds some debug infrastructure to make sure the scratch cpumask
is used in the right context, and hence should prevent further misuses.
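
With patch #6 applied every user follows a get/put pattern along these
lines (sketch):

    cpumask_t *mask = get_scratch_cpumask();

    cpumask_and(mask, &cpu_online_map, &cpu_present_map);
    /* ... use mask ... */
    put_scratch_cpumask();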

Thanks, Roger.

Roger Pau Monne (6):
  x86/smp: unify header includes in smp.h
  x86: introduce a nmi_count tracking variable
  x86: track when in #MC context
  x86: track when in #NMI context
  x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/cpu/mcheck/mce.c |  2 ++
 xen/arch/x86/io_apic.c        |  6 +++--
 xen/arch/x86/irq.c            | 13 ++++++---
 xen/arch/x86/mm.c             | 30 ++++++++++++++-------
 xen/arch/x86/msi.c            |  4 ++-
 xen/arch/x86/nmi.c            | 11 ++++----
 xen/arch/x86/smp.c            | 51 ++++++++++++++++++++++++++++++++++-
 xen/arch/x86/smpboot.c        | 10 +++++--
 xen/arch/x86/traps.c          | 10 ++++++-
 xen/include/asm-x86/hardirq.h | 23 +++++++++++++++-
 xen/include/asm-x86/nmi.h     |  2 ++
 xen/include/asm-x86/smp.h     | 15 ++++++++---
 xen/include/xen/irq_cpustat.h |  1 +
 13 files changed, 148 insertions(+), 30 deletions(-)

-- 
2.25.0



* [Xen-devel] [PATCH v2 1/6] x86/smp: unify header includes in smp.h
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 2/6] x86: introduce a nmi_count tracking variable Roger Pau Monne
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Unify the two adjacent header includes that are both gated with ifndef
__ASSEMBLY__.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/include/asm-x86/smp.h | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 1aa55d41e1..92d69a5ea0 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -5,13 +5,10 @@
  * We need the APIC definitions automatically as part of 'smp.h'
  */
 #ifndef __ASSEMBLY__
+#include <xen/bitops.h>
 #include <xen/kernel.h>
 #include <xen/cpumask.h>
 #include <asm/current.h>
-#endif
-
-#ifndef __ASSEMBLY__
-#include <xen/bitops.h>
 #include <asm/mpspec.h>
 #endif
 
-- 
2.25.0



* [Xen-devel] [PATCH v2 2/6] x86: introduce a nmi_count tracking variable
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 1/6] x86/smp: unify header includes in smp.h Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context Roger Pau Monne
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

This is modeled after the irq_count variable, and is used to account
for all the NMIs handled by the system.

This will allow repurposing the nmi_count() helper so it can be used
in a similar manner to local_irq_count(): accounting for the NMIs
currently in service.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/nmi.c        | 11 +++++------
 xen/arch/x86/traps.c      |  4 +++-
 xen/include/asm-x86/nmi.h |  2 ++
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index a5c6bdd0ce..e286ceeb40 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -151,15 +151,14 @@ int nmi_active;
 
 static void __init wait_for_nmis(void *p)
 {
-    unsigned int cpu = smp_processor_id();
-    unsigned int start_count = nmi_count(cpu);
+    unsigned int start_count = this_cpu(nmi_count);
     unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
     unsigned long s, e;
 
     s = rdtsc();
     do {
         cpu_relax();
-        if ( nmi_count(cpu) >= start_count + 2 )
+        if ( this_cpu(nmi_count) >= start_count + 2 )
             break;
         e = rdtsc();
     } while( e - s < ticks );
@@ -177,7 +176,7 @@ void __init check_nmi_watchdog(void)
     printk("Testing NMI watchdog on all CPUs:");
 
     for_each_online_cpu ( cpu )
-        prev_nmi_count[cpu] = nmi_count(cpu);
+        prev_nmi_count[cpu] = per_cpu(nmi_count, cpu);
 
     /*
      * Wait at most 10 ticks for 2 watchdog NMIs on each CPU.
@@ -188,7 +187,7 @@ void __init check_nmi_watchdog(void)
 
     for_each_online_cpu ( cpu )
     {
-        if ( nmi_count(cpu) - prev_nmi_count[cpu] < 2 )
+        if ( per_cpu(nmi_count, cpu) - prev_nmi_count[cpu] < 2 )
         {
             printk(" %d", cpu);
             ok = false;
@@ -593,7 +592,7 @@ static void do_nmi_stats(unsigned char key)
 
     printk("CPU\tNMI\n");
     for_each_online_cpu ( i )
-        printk("%3d\t%3d\n", i, nmi_count(i));
+        printk("%3d\t%3u\n", i, per_cpu(nmi_count, i));
 
     if ( ((d = hardware_domain) == NULL) || (d->vcpu == NULL) ||
          ((v = d->vcpu[0]) == NULL) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 56067f85d1..3dbc66bb64 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1683,13 +1683,15 @@ static int dummy_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 
 static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 
+DEFINE_PER_CPU(unsigned int, nmi_count);
+
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
     unsigned char reason = 0;
     bool handle_unknown = false;
 
-    ++nmi_count(cpu);
+    this_cpu(nmi_count)++;
 
     if ( nmi_callback(regs, cpu) )
         return;
diff --git a/xen/include/asm-x86/nmi.h b/xen/include/asm-x86/nmi.h
index f9dfca6afb..a288f02a50 100644
--- a/xen/include/asm-x86/nmi.h
+++ b/xen/include/asm-x86/nmi.h
@@ -31,5 +31,7 @@ nmi_callback_t *set_nmi_callback(nmi_callback_t *callback);
  * Remove the handler previously set.
  */
 void unset_nmi_callback(void);
+
+DECLARE_PER_CPU(unsigned int, nmi_count);
  
 #endif /* ASM_NMI_H */
-- 
2.25.0



* [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 1/6] x86/smp: unify header includes in smp.h Roger Pau Monne
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 2/6] x86: introduce a nmi_count tracking variable Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-17 19:29   ` Julien Grall
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context Roger Pau Monne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	Roger Pau Monne

Add helpers to track when executing in #MC context. This is modeled
after the in_irq helpers.

Note that there are no users of in_mc() introduced by the change;
further users will be added by follow-up changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 2 ++
 xen/include/asm-x86/hardirq.h | 5 +++++
 xen/include/xen/irq_cpustat.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index d61e582af3..93ed5752ac 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
 
 void do_machine_check(const struct cpu_user_regs *regs)
 {
+    mc_enter();
     _machine_check_vector(regs);
+    mc_exit();
 }
 
 /*
diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
index 34e1b49260..af3eab6a4d 100644
--- a/xen/include/asm-x86/hardirq.h
+++ b/xen/include/asm-x86/hardirq.h
@@ -8,6 +8,7 @@ typedef struct {
 	unsigned int __softirq_pending;
 	unsigned int __local_irq_count;
 	unsigned int __nmi_count;
+	unsigned int mc_count;
 	bool_t __mwait_wakeup;
 } __cacheline_aligned irq_cpustat_t;
 
@@ -18,6 +19,10 @@ typedef struct {
 #define irq_enter()	(local_irq_count(smp_processor_id())++)
 #define irq_exit()	(local_irq_count(smp_processor_id())--)
 
+#define in_mc() 	(mc_count(smp_processor_id()) != 0)
+#define mc_enter()	(mc_count(smp_processor_id())++)
+#define mc_exit()	(mc_count(smp_processor_id())--)
+
 void ack_bad_irq(unsigned int irq);
 
 extern void apic_intr_init(void);
diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
index 73629f6ec8..12b932fc39 100644
--- a/xen/include/xen/irq_cpustat.h
+++ b/xen/include/xen/irq_cpustat.h
@@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
 #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
 #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
 #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
+#define mc_count(cpu)		__IRQ_STAT((cpu), mc_count)
 
 #endif	/* __irq_cpustat_h */
-- 
2.25.0



* [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (2 preceding siblings ...)
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-18 10:40   ` Andrew Cooper
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Add helpers to track when running in #MC context. This is modeled
after the in_irq helpers, but does not support reentry.

Note that there are no users of in_mc() introduced by the change;
further users will be added by follow-up changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/traps.c          |  6 ++++++
 xen/include/asm-x86/hardirq.h | 18 +++++++++++++++++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3dbc66bb64..f4f2c13ae9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1692,9 +1692,13 @@ void do_nmi(const struct cpu_user_regs *regs)
     bool handle_unknown = false;
 
     this_cpu(nmi_count)++;
+    nmi_enter();
 
     if ( nmi_callback(regs, cpu) )
+    {
+        nmi_exit();
         return;
+    }
 
     /*
      * Accessing port 0x61 may trap to SMM which has been actually
@@ -1720,6 +1724,8 @@ void do_nmi(const struct cpu_user_regs *regs)
         if ( !(reason & 0xc0) && handle_unknown )
             unknown_nmi_error(regs, reason);
     }
+
+    nmi_exit();
 }
 
 nmi_callback_t *set_nmi_callback(nmi_callback_t *callback)
diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
index af3eab6a4d..8bcae99eac 100644
--- a/xen/include/asm-x86/hardirq.h
+++ b/xen/include/asm-x86/hardirq.h
@@ -2,12 +2,14 @@
 #define __ASM_HARDIRQ_H
 
 #include <xen/cache.h>
+#include <xen/lib.h>
+#include <xen/smp.h>
 #include <xen/types.h>
 
 typedef struct {
 	unsigned int __softirq_pending;
 	unsigned int __local_irq_count;
-	unsigned int __nmi_count;
+	bool in_nmi;
 	unsigned int mc_count;
 	bool_t __mwait_wakeup;
 } __cacheline_aligned irq_cpustat_t;
@@ -23,6 +25,20 @@ typedef struct {
 #define mc_enter()	(mc_count(smp_processor_id())++)
 #define mc_exit()	(mc_count(smp_processor_id())--)
 
+#define in_nmi()	__IRQ_STAT(smp_processor_id(), in_nmi)
+
+static inline void nmi_enter(void)
+{
+    ASSERT(!in_nmi());
+    in_nmi() = true;
+}
+
+static inline void nmi_exit(void)
+{
+    ASSERT(in_nmi());
+    in_nmi() = false;
+}
+
 void ack_bad_irq(unsigned int irq);
 
 extern void apic_intr_init(void);
-- 
2.25.0



* [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (3 preceding siblings ...)
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-18 10:53   ` Andrew Cooper
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 6/6] x86: add accessors for scratch cpu mask Roger Pau Monne
  2020-02-18  7:40 ` [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Jürgen Groß
  6 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Sander Eikelenboom, Wei Liu, Jan Beulich, Roger Pau Monne

Using scratch_cpumask in send_IPI_mask is not safe because the function
can be called from interrupt context, and hence Xen would have to make
sure all users of the scratch cpumask disable interrupts while using
it.

Instead introduce a new cpumask to be used by send_IPI_mask, and
disable interrupts while using it.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Don't use the shorthand when in #MC or #NMI context.
---
 xen/arch/x86/smp.c     | 26 +++++++++++++++++++++++++-
 xen/arch/x86/smpboot.c |  9 ++++++++-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index c7caf5bc26..0a9a9e7f02 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
     apic_write(APIC_ICR, cfg);
 }
 
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
 /*
  * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
  * excluding the local CPU. @cpumask may be empty.
@@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
+    unsigned long flags;
+
+    if ( in_mc() || in_nmi() )
+    {
+        /*
+         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
+         * because we have no way to avoid reentry, so do not use the APIC
+         * shorthand.
+         */
+        alternative_vcall(genapic.send_IPI_mask, mask, vector);
+        return;
+    }
+
 
     /*
      * This can only be safely used when no CPU hotplug or unplug operations
@@ -81,7 +95,15 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
          local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
          (park_offline_cpus ||
           cpumask_equal(&cpu_online_map, &cpu_present_map)) )
+    {
+        /*
+         * send_IPI_mask can be called from interrupt context, and hence we
+         * need to disable interrupts in order to protect the per-cpu
+         * send_ipi_cpumask while being used.
+         */
+        local_irq_save(flags);
         cpumask_or(scratch, mask, cpumask_of(smp_processor_id()));
+    }
     else
     {
         if ( cpus_locked )
@@ -89,6 +111,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
             put_cpu_maps();
             cpus_locked = false;
         }
+        local_irq_save(flags);
         cpumask_clear(scratch);
     }
 
@@ -97,6 +120,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
     else
         alternative_vcall(genapic.send_IPI_mask, mask, vector);
 
+    local_irq_restore(flags);
     if ( cpus_locked )
         put_cpu_maps();
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index e83e4564a4..82e89201b3 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
         FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
         if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
             FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+        if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
     }
 
     cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
     if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
            cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
         goto out;
 
     rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
     cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
     per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+    per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
     get_cpu_info()->use_pv_cr3 = false;
-- 
2.25.0



* [Xen-devel] [PATCH v2 6/6] x86: add accessors for scratch cpu mask
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (4 preceding siblings ...)
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
@ 2020-02-17 18:43 ` Roger Pau Monne
  2020-02-18  7:40 ` [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Jürgen Groß
  6 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monne @ 2020-02-17 18:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.

Move the definition of scratch_cpumask to smp.c in order to place it
and the accessors as close together as possible.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move declaration of scratch_cpumask and scratch_cpumask accessor to
   smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c    |  6 ++++--
 xen/arch/x86/irq.c        | 13 ++++++++++---
 xen/arch/x86/mm.c         | 30 +++++++++++++++++++++---------
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 25 +++++++++++++++++++++++++
 xen/arch/x86/smpboot.c    |  1 -
 xen/include/asm-x86/smp.h | 10 ++++++++++
 7 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..4ee261b632 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
     entry.vector = vector;
 
     if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..7ecf5376e3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
     trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
 
     if ( likely(!desc->arch.move_in_progress) )
+    {
+        put_scratch_cpumask();
         return;
+    }
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
@@ -236,6 +239,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
+    put_scratch_cpumask();
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1156,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask();
         return;
     }
 
@@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
     unsigned int irq;
     static int warned;
     struct irq_desc *desc;
+    cpumask_t *affinity = get_scratch_cpumask();
 
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = false, set_affinity = true;
         unsigned int vector;
-        cpumask_t *affinity = this_cpu(scratch_cpumask);
 
         if ( irq == 2 )
             continue;
@@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
                    irq, CPUMASK_PR(affinity));
     }
 
+    put_scratch_cpumask();
+
     /* That doesn't seem sufficient.  Give it 1ms. */
     local_irq_enable();
     mdelay(1);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index edc238e51a..75b6114c1c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1261,7 +1261,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             struct vcpu *v;
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             cpumask_clear(mask);
 
@@ -1278,6 +1278,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 
             if ( !cpumask_empty(mask) )
                 flush_tlb_mask(mask);
+            put_scratch_cpumask();
         }
 #endif /* CONFIG_PV_LDT_PAGING */
         put_page(page);
@@ -2902,7 +2903,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                  * vital that no other CPUs are left with mappings of a frame
                  * which is about to become writeable to the guest.
                  */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 BUG_ON(in_irq());
                 cpumask_copy(mask, d->dirty_cpumask);
@@ -2918,6 +2919,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                     perfc_incr(need_flush_tlb_flush);
                     flush_tlb_mask(mask);
                 }
+                put_scratch_cpumask();
 
                 /* We lose existing type and validity. */
                 nx &= ~(PGT_type_mask | PGT_validated);
@@ -3634,7 +3636,7 @@ long do_mmuext_op(
         case MMUEXT_TLB_FLUSH_MULTI:
         case MMUEXT_INVLPG_MULTI:
         {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             if ( unlikely(currd != pg_owner) )
                 rc = -EPERM;
@@ -3644,12 +3646,17 @@ long do_mmuext_op(
                                    mask)) )
                 rc = -EINVAL;
             if ( unlikely(rc) )
+            {
+                put_scratch_cpumask();
                 break;
+            }
 
             if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
                 flush_tlb_mask(mask);
             else if ( __addr_ok(op.arg1.linear_addr) )
                 flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            put_scratch_cpumask();
+
             break;
         }
 
@@ -3682,7 +3689,7 @@ long do_mmuext_op(
             else if ( likely(cache_flush_permitted(currd)) )
             {
                 unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 cpumask_clear(mask);
                 for_each_online_cpu(cpu)
@@ -3690,6 +3697,7 @@ long do_mmuext_op(
                                              per_cpu(cpu_sibling_mask, cpu)) )
                         __cpumask_set_cpu(cpu, mask);
                 flush_mask(mask, FLUSH_CACHE);
+                put_scratch_cpumask();
             }
             else
                 rc = -EINVAL;
@@ -4155,12 +4163,13 @@ long do_mmu_update(
          * Force other vCPU-s of the affected guest to pick up L4 entry
          * changes (if any).
          */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
+        cpumask_t *mask = get_scratch_cpumask();
 
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
+        cpumask_andnot(mask, pt_owner->dirty_cpumask,
+                       cpumask_of(smp_processor_id()));
         if ( !cpumask_empty(mask) )
             flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
+        put_scratch_cpumask();
     }
 
     perfc_add(num_page_updates, i);
@@ -4352,7 +4361,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4372,7 +4381,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4383,6 +4392,9 @@ static int __do_update_va_mapping(
         break;
     }
 
+    if ( mask && mask != d->dirty_cpumask )
+        put_scratch_cpumask();
+
     return rc;
 }
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 161ee60dbe..6624ea20d0 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
 
     if ( cpu_mask )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask;
 
         if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
             return;
 
+        mask = get_scratch_cpumask();
         cpumask_and(mask, cpu_mask, &cpu_online_map);
         msg->dest32 = cpu_mask_to_apicid(mask);
+        put_scratch_cpumask();
     }
 
     msg->address_hi = MSI_ADDR_BASE_HI;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0a9a9e7f02..8ad2b6912f 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -25,6 +25,31 @@
 #include <irq_vectors.h>
 #include <mach_apic.h>
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
+
+#ifndef NDEBUG
+cpumask_t *scratch_cpumask(bool use)
+{
+    static DEFINE_PER_CPU(void *, scratch_cpumask_use);
+
+    /*
+     * Due to reentrancy the scratch cpumask cannot be used in IRQ, #MC or
+     * #NMI context.
+     */
+    BUG_ON(in_irq() || in_mc() || in_nmi());
+
+    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
+    {
+        printk("%p: scratch CPU mask already in use by %p\n",
+               __builtin_return_address(0), this_cpu(scratch_cpumask_use));
+        BUG();
+    }
+    this_cpu(scratch_cpumask_use) = use ? __builtin_return_address(0) : NULL;
+
+    return use ? this_cpu(scratch_cpumask) : NULL;
+}
+#endif
+
 /* Helper functions to prepare APIC register values. */
 static unsigned int prepare_ICR(unsigned int shortcut, int vector)
 {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 82e89201b3..a2ac3adb38 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -54,7 +54,6 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
 /* representing HT and core siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 
-DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 92d69a5ea0..40ab6c251d 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -23,6 +23,16 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+#ifndef NDEBUG
+/* Not to be called directly, use {get/put}_scratch_cpumask(). */
+cpumask_t *scratch_cpumask(bool use);
+#define get_scratch_cpumask() scratch_cpumask(true)
+#define put_scratch_cpumask() ((void)scratch_cpumask(false))
+#else
+#define get_scratch_cpumask() this_cpu(scratch_cpumask)
+#define put_scratch_cpumask()
+#endif
+
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
  * would otherwise prefer them to be off?
-- 
2.25.0



* Re: [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context Roger Pau Monne
@ 2020-02-17 19:29   ` Julien Grall
  2020-02-18  9:45     ` Roger Pau Monné
  2020-02-18  9:48     ` Roger Pau Monné
  0 siblings, 2 replies; 32+ messages in thread
From: Julien Grall @ 2020-02-17 19:29 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich

Hi Roger,

On 17/02/2020 18:43, Roger Pau Monne wrote:
> Add helpers to track when executing in #MC context. This is modeled
> after the in_irq helpers.
> 
> Note that there are no users of in_mc() introduced by the change;
> further users will be added by follow-up changes.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>   xen/arch/x86/cpu/mcheck/mce.c | 2 ++
>   xen/include/asm-x86/hardirq.h | 5 +++++
>   xen/include/xen/irq_cpustat.h | 1 +
>   3 files changed, 8 insertions(+)
> 
> diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> index d61e582af3..93ed5752ac 100644
> --- a/xen/arch/x86/cpu/mcheck/mce.c
> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> @@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
>   
>   void do_machine_check(const struct cpu_user_regs *regs)
>   {
> +    mc_enter();
>       _machine_check_vector(regs);
> +    mc_exit();
>   }
>   
>   /*
> diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
> index 34e1b49260..af3eab6a4d 100644
> --- a/xen/include/asm-x86/hardirq.h
> +++ b/xen/include/asm-x86/hardirq.h
> @@ -8,6 +8,7 @@ typedef struct {
>   	unsigned int __softirq_pending;
>   	unsigned int __local_irq_count;
>   	unsigned int __nmi_count;
> +	unsigned int mc_count;
>   	bool_t __mwait_wakeup;
>   } __cacheline_aligned irq_cpustat_t;
>   
> @@ -18,6 +19,10 @@ typedef struct {
>   #define irq_enter()	(local_irq_count(smp_processor_id())++)
>   #define irq_exit()	(local_irq_count(smp_processor_id())--)
>   
> +#define in_mc() 	(mc_count(smp_processor_id()) != 0)
> +#define mc_enter()	(mc_count(smp_processor_id())++)
> +#define mc_exit()	(mc_count(smp_processor_id())--)
> +
>   void ack_bad_irq(unsigned int irq);
>   
>   extern void apic_intr_init(void);
> diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
> index 73629f6ec8..12b932fc39 100644
> --- a/xen/include/xen/irq_cpustat.h
> +++ b/xen/include/xen/irq_cpustat.h
> @@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
>   #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
>   #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
>   #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
> +#define mc_count(cpu)		__IRQ_STAT((cpu), mc_count)

The header is only meant to contain arch-independent IRQ stats (see 
comment a few lines above). This is unlikely to be used on Arm, so can 
you move this into an x86 specific header?

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
  2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (5 preceding siblings ...)
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 6/6] x86: add accessors for scratch cpu mask Roger Pau Monne
@ 2020-02-18  7:40 ` Jürgen Groß
  2020-02-18 10:15   ` Roger Pau Monné
  2020-02-18 10:26   ` Andrew Cooper
  6 siblings, 2 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-02-18  7:40 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich

On 17.02.20 19:43, Roger Pau Monne wrote:
> Hello,
> 
> Commit:
> 
> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> x86/smp: use APIC ALLBUT destination shorthand when possible
> 
> Introduced a bogus usage of the scratch cpumask: it was used in a
> function that could be called from interrupt context, and hence using
> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
> and also prevents the usage of any per-CPU variables when send_IPI_mask
> is called from #MC or #NMI context. Previous patches are preparatory
> changes.
> 
> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
> is used in the right context, and hence should prevent further misuses.

I wonder whether it wouldn't be better to have a common percpu scratch
cpumask handling instead of introducing local ones all over the
hypervisor.

So basically an array of percpu cpumasks allocated when bringing up a
cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
cpus), a percpu counter of the next free index and get_ and put_
functions acting in a lifo manner.

This would help removing all the still existing cpumasks on the stack
and any illegal nesting would be avoided. The only remaining question
would be the size of the array.
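
Something like the following, as a rough sketch (all names and the
array size are made up):

    #define NR_SCRATCH_MASKS 8

    struct scratch_masks {
        cpumask_var_t mask[NR_SCRATCH_MASKS]; /* allocated at cpu bringup */
        unsigned int used;                    /* index of next free entry */
    };
    static DEFINE_PER_CPU(struct scratch_masks, scratch_masks);

    cpumask_t *get_scratch_mask(void)
    {
        struct scratch_masks *s = &this_cpu(scratch_masks);

        BUG_ON(s->used >= NR_SCRATCH_MASKS);
        return s->mask[s->used++];
    }

    void put_scratch_mask(cpumask_t *mask)
    {
        struct scratch_masks *s = &this_cpu(scratch_masks);

        /* Lifo: only the most recently taken mask may be returned. */
        ASSERT(s->used && mask == s->mask[s->used - 1]);
        s->used--;
    }

An interrupt handler nesting on top of an interrupted user would just
take the next free index and return it before exiting, so the lifo
discipline would hold naturally.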

Thoughts?


Juergen


* Re: [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context
  2020-02-17 19:29   ` Julien Grall
@ 2020-02-18  9:45     ` Roger Pau Monné
  2020-02-18  9:48     ` Roger Pau Monné
  1 sibling, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18  9:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	xen-devel

On Mon, Feb 17, 2020 at 07:29:29PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 17/02/2020 18:43, Roger Pau Monne wrote:
> > Add helpers to track when executing in #MC context. This is modeled
> > after the in_irq helpers.
> > 
> > Note that there are no users of in_mc() introduced by the change;
> > further users will be added by follow-up changes.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >   xen/arch/x86/cpu/mcheck/mce.c | 2 ++
> >   xen/include/asm-x86/hardirq.h | 5 +++++
> >   xen/include/xen/irq_cpustat.h | 1 +
> >   3 files changed, 8 insertions(+)
> > 
> > diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> > index d61e582af3..93ed5752ac 100644
> > --- a/xen/arch/x86/cpu/mcheck/mce.c
> > +++ b/xen/arch/x86/cpu/mcheck/mce.c
> > @@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
> >   void do_machine_check(const struct cpu_user_regs *regs)
> >   {
> > +    mc_enter();
> >       _machine_check_vector(regs);
> > +    mc_exit();
> >   }
> >   /*
> > diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
> > index 34e1b49260..af3eab6a4d 100644
> > --- a/xen/include/asm-x86/hardirq.h
> > +++ b/xen/include/asm-x86/hardirq.h
> > @@ -8,6 +8,7 @@ typedef struct {
> >   	unsigned int __softirq_pending;
> >   	unsigned int __local_irq_count;
> >   	unsigned int __nmi_count;
> > +	unsigned int mc_count;
> >   	bool_t __mwait_wakeup;
> >   } __cacheline_aligned irq_cpustat_t;
> > @@ -18,6 +19,10 @@ typedef struct {
> >   #define irq_enter()	(local_irq_count(smp_processor_id())++)
> >   #define irq_exit()	(local_irq_count(smp_processor_id())--)
> > +#define in_mc() 	(mc_count(smp_processor_id()) != 0)
> > +#define mc_enter()	(mc_count(smp_processor_id())++)
> > +#define mc_exit()	(mc_count(smp_processor_id())--)
> > +
> >   void ack_bad_irq(unsigned int irq);
> >   extern void apic_intr_init(void);
> > diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
> > index 73629f6ec8..12b932fc39 100644
> > --- a/xen/include/xen/irq_cpustat.h
> > +++ b/xen/include/xen/irq_cpustat.h
> > @@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
> >   #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
> >   #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
> >   #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
> > +#define mc_count(cpu)		__IRQ_STAT((cpu), mc_count)
> 
> The header is only meant to contain arch-independent IRQ stats (see comment
> a few lines above). This is unlikely to be used on Arm, so can you move this
> into an x86 specific header?

Uh, sure. Sorry for not realizing this.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context
  2020-02-17 19:29   ` Julien Grall
  2020-02-18  9:45     ` Roger Pau Monné
@ 2020-02-18  9:48     ` Roger Pau Monné
  2020-02-18 11:20       ` Julien Grall
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18  9:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	xen-devel

On Mon, Feb 17, 2020 at 07:29:29PM +0000, Julien Grall wrote:
> Hi Roger,
> 
> On 17/02/2020 18:43, Roger Pau Monne wrote:
> > Add helpers to track when executing in #MC context. This is modeled
> > after the in_irq helpers.
> > 
> > Note that there are no users of in_mc() introduced by the change;
> > further users will be added by follow-up changes.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >   xen/arch/x86/cpu/mcheck/mce.c | 2 ++
> >   xen/include/asm-x86/hardirq.h | 5 +++++
> >   xen/include/xen/irq_cpustat.h | 1 +
> >   3 files changed, 8 insertions(+)
> > 
> > diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> > index d61e582af3..93ed5752ac 100644
> > --- a/xen/arch/x86/cpu/mcheck/mce.c
> > +++ b/xen/arch/x86/cpu/mcheck/mce.c
> > @@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
> >   void do_machine_check(const struct cpu_user_regs *regs)
> >   {
> > +    mc_enter();
> >       _machine_check_vector(regs);
> > +    mc_exit();
> >   }
> >   /*
> > diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
> > index 34e1b49260..af3eab6a4d 100644
> > --- a/xen/include/asm-x86/hardirq.h
> > +++ b/xen/include/asm-x86/hardirq.h
> > @@ -8,6 +8,7 @@ typedef struct {
> >   	unsigned int __softirq_pending;
> >   	unsigned int __local_irq_count;
> >   	unsigned int __nmi_count;
> > +	unsigned int mc_count;
> >   	bool_t __mwait_wakeup;
> >   } __cacheline_aligned irq_cpustat_t;
> > @@ -18,6 +19,10 @@ typedef struct {
> >   #define irq_enter()	(local_irq_count(smp_processor_id())++)
> >   #define irq_exit()	(local_irq_count(smp_processor_id())--)
> > +#define in_mc() 	(mc_count(smp_processor_id()) != 0)
> > +#define mc_enter()	(mc_count(smp_processor_id())++)
> > +#define mc_exit()	(mc_count(smp_processor_id())--)
> > +
> >   void ack_bad_irq(unsigned int irq);
> >   extern void apic_intr_init(void);
> > diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
> > index 73629f6ec8..12b932fc39 100644
> > --- a/xen/include/xen/irq_cpustat.h
> > +++ b/xen/include/xen/irq_cpustat.h
> > @@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
> >   #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
> >   #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
> >   #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
> > +#define mc_count(cpu)		__IRQ_STAT((cpu), mc_count)
> 
> The header is only meant to contain arch-independent IRQ stats (see comment
> a few lines above). This is unlikely to be used on Arm, so can you move this
> into an x86 specific header?

Now that I look at it, there's also nmi_count and mwait_wakeup defined
in irq_cpustat.h which won't build if used on Arm, since the fields
don't exist in the Arm version of irq_cpustat_t.

Roger.


* Re: [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
  2020-02-18  7:40 ` [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Jürgen Groß
@ 2020-02-18 10:15   ` Roger Pau Monné
  2020-02-18 10:46     ` Jürgen Groß
  2020-02-18 10:26   ` Andrew Cooper
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 10:15 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	xen-devel

On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote:
> On 17.02.20 19:43, Roger Pau Monne wrote:
> > Hello,
> > 
> > Commit:
> > 
> > 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> > x86/smp: use APIC ALLBUT destination shorthand when possible
> > 
> > Introduced a bogus usage of the scratch cpumask: it was used in a
> > function that could be called from interrupt context, and hence using
> > the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
> > and also prevents the usage of any per-CPU variables when send_IPI_mask
> > is called from #MC or #NMI context. Previous patches are preparatory
> > changes.
> > 
> > Patch #6 adds some debug infrastructure to make sure the scratch cpumask
> > is used in the right context, and hence should prevent further misuses.
> 
> I wonder whether it wouldn't be better to have a common percpu scratch
> cpumask handling instead of introducing local ones all over the
> hypervisor.

But the scratch CPU mask already accomplishes this: it's a single
per-CPU mask allocated when the CPU is brought up.

> So basically an array of percpu cpumasks allocated when bringing up a
> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
> cpus), a percpu counter of the next free index and get_ and put_
> functions acting in a lifo manner.

Sizing this array would be complicated IMO, and it's possible that a
failure to get a mask in certain places could lead to a panic.

> This would help removing all the still existing cpumasks on the stack
> and any illegal nesting would be avoided. The only remaining question
> would be the size of the array.
> 
> Thoughts?

I'm mostly worried about the size of such a stack, since we would then
allow nested calls to the get_ cpumask helper. Also I'm not sure
whether we should prevent the usage in interrupt context then, in
order to try to limit the nesting as much as possible.

I think this series is a move in the right direction, and we can add
the per-CPU stack afterwards (as the get_/put_ helpers will already be
there).

Thanks, Roger.


* Re: [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
  2020-02-18  7:40 ` [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Jürgen Groß
  2020-02-18 10:15   ` Roger Pau Monné
@ 2020-02-18 10:26   ` Andrew Cooper
  2020-02-18 10:54     ` Jürgen Groß
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 10:26 UTC (permalink / raw)
  To: Jürgen Groß, Roger Pau Monne, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Jan Beulich

On 18/02/2020 07:40, Jürgen Groß wrote:
> On 17.02.20 19:43, Roger Pau Monne wrote:
>> Hello,
>>
>> Commit:
>>
>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>
>> Introduced a bogus usage of the scratch cpumask: it was used in a
>> function that could be called from interrupt context, and hence using
>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>> and also prevents the usage of any per-CPU variables when send_IPI_mask
>> is called from #MC or #NMI context. Previous patches are preparatory
>> changes.
>>
>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>> is used in the right context, and hence should prevent further misuses.
>
> I wonder whether it wouldn't be better to have a common percpu scratch
> cpumask handling instead of introducing local ones all over the
> hypervisor.
>
> So basically an array of percpu cpumasks allocated when bringing up a
> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
> cpus), a percpu counter of the next free index and get_ and put_
> functions acting in a lifo manner.
>
> This would help removing all the still existing cpumasks on the stack
> and any illegal nesting would be avoided. The only remaining question
> would be the size of the array.
>
> Thoughts?

I like the approach, but there is a major caveat.

It is certainly problematic that we have both cpumask_scratch and
scratch_cpumask which have different rules for how to use safely, and now
we're gaining custom logic to fix up the recursive safety of one of them.

That said, I'm pretty sure it will be x86 specific, because the safety
of this is tied to using per-pcpu stacks rather than per-vcpu stacks. 
The only reason the scheduler cpumasks are safe for use on ARM is
because the scheduler code which uses them doesn't call schedule() in
the middle of use.

As a consequence, I don't think this is infrastructure which would be
safe for common code to use.

~Andrew


* Re: [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context Roger Pau Monne
@ 2020-02-18 10:40   ` Andrew Cooper
  2020-02-18 10:59     ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 10:40 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Wei Liu, Jan Beulich

On 17/02/2020 18:43, Roger Pau Monne wrote:
> Add helpers to track when running in #MC context. This is modeled
> after the in_irq helpers, but does not support reentry.
>
> Note that there are no users of in_mc() introduced by the change;
> further users will be added by follow-up changes.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

You probably mean s/mc/nmi/ throughout the commit message, but I'm
afraid these are rather problematic.

NMIs can be recursively entered, especially on corner cases in the crash
path.  Asserting that the crash path is not recursive can lead to never
entering the crash kernel.
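
If in_nmi() needs to stay accurate on that path, a (bounded) depth
counter rather than a bool would be required, e.g. (just a sketch,
nmi_depth being a made-up per-CPU variable):

    /* DEFINE_PER_CPU(unsigned int, nmi_depth); */

    static inline void nmi_enter(void)
    {
        this_cpu(nmi_depth)++;    /* tolerates recursion, unlike ASSERT() */
    }

    static inline void nmi_exit(void)
    {
        ASSERT(this_cpu(nmi_depth));
        this_cpu(nmi_depth)--;
    }

    #define in_nmi() (this_cpu(nmi_depth) != 0)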

~Andrew


* Re: [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
  2020-02-18 10:15   ` Roger Pau Monné
@ 2020-02-18 10:46     ` Jürgen Groß
  0 siblings, 0 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-02-18 10:46 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	xen-devel

On 18.02.20 11:15, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 08:40:58AM +0100, Jürgen Groß wrote:
>> On 17.02.20 19:43, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> Commit:
>>>
>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>> function that could be called from interrupt context, and hence using
>>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>>> and also prevents the usage of any per-CPU variables when send_IPI_mask
>>> is called from #MC or #NMI context. Previous patches are preparatory
>>> changes.
>>>
>>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>>> is used in the right context, and hence should prevent further misuses.
>>
>> I wonder whether it wouldn't be better to have a common percpu scratch
>> cpumask handling instead of introducing local ones all over the
>> hypervisor.
> 
> But the scratch CPU mask already accomplishes this: it's a single
> per-CPU mask allocated when the CPU is brought up.

This one, yes. There are others which are just defined like:

DEFINE_PER_CPU(cpumask_t, ...)

>> So basically an array of percpu cpumasks allocated when bringing up a
>> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
>> cpus), a percpu counter of the next free index and get_ and put_
>> functions acting in a lifo manner.
> 
> Sizing this array would be complicated IMO, and it's possible that a
> failure to get a mask in certain places could lead to a panic.

The question is whether a silent double usage is better (which your
series is already addressing at least for the scratch cpumask).

>> This would help removing all the still existing cpumasks on the stack
>> and any illegal nesting would be avoided. The only remaining question
>> would be the size of the array.
>>
>> Thoughts?
> 
> I'm mostly worried about the size of such a stack, since we would then
> allow nested calls to the get_ cpumask helper. Also I'm not sure
> whether we should prevent the usage in interrupt context then, in
> order to try to limit the nesting as much as possible.

No, excluding interrupt context would add the need for special
purpose masks again.

> I think this series is a move in the right direction, and we can add
> the per-CPU stack afterwards (as the get_/put_ helpers will already be
> there).

Yes, I completely agree.


Juergen


* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-17 18:43 ` [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
@ 2020-02-18 10:53   ` Andrew Cooper
  2020-02-18 11:10     ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 10:53 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Sander Eikelenboom, Wei Liu, Jan Beulich

On 17/02/2020 18:43, Roger Pau Monne wrote:
> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> +    unsigned long flags;
> +
> +    if ( in_mc() || in_nmi() )
> +    {
> +        /*
> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> +         * because we have no way to avoid reentry, so do not use the APIC
> +         * shorthand.
> +         */
> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> +        return;

The set of things you can safely do in an NMI/MCE handler is small, and
does not include sending IPIs.  (In reality, if you're using x2apic, it
is safe to send an IPI because there is no risk of clobbering ICR2
behind your outer context's back).

However, if we escalate from NMI/MCE context into crash context, then
anything goes.  In reality, we only ever send NMIs from the crash path,
and that is not permitted to use a shorthand, making this code dead.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask
  2020-02-18 10:26   ` Andrew Cooper
@ 2020-02-18 10:54     ` Jürgen Groß
  0 siblings, 0 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-02-18 10:54 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monne, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Ian Jackson, Jan Beulich

On 18.02.20 11:26, Andrew Cooper wrote:
> On 18/02/2020 07:40, Jürgen Groß wrote:
>> On 17.02.20 19:43, Roger Pau Monne wrote:
>>> Hello,
>>>
>>> Commit:
>>>
>>> 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Introduced a bogus usage of the scratch cpumask: it was used in a
>>> function that could be called from interrupt context, and hence using
>>> the scratch cpumask there is not safe. Patch #5 is a fix for that usage,
>>> together with also preventing the usage of any per-CPU variables when
>>> send_IPI_mask is called from #MC or #NMI context. Previous patches are
>>> preparatory changes.
>>>
>>> Patch #6 adds some debug infrastructure to make sure the scratch cpumask
>>> is used in the right context, and hence should prevent further misuses.
>>
>> I wonder whether it wouldn't be better to have a common percpu scratch
>> cpumask handling instead of introducing local ones all over the
>> hypervisor.
>>
>> So basically an array of percpu cpumasks allocated when bringing up a
>> cpu (this spares memory as the masks wouldn't need to cover NR_CPUS
>> cpus), a percpu counter of the next free index and get_ and put_
>> functions acting in a lifo manner.
>>
>> This would help removing all the still existing cpumasks on the stack
>> and any illegal nesting would be avoided. The only remaining question
>> would be the size of the array.
>>
>> Thoughts?
> 
> I like the approach, but there is a major caveat.
> 
> It is certainly problematic that we have both cpumask_scratch and
> scratch_cpumask which have different rules for how to use safely, and now
> we're gaining custom logic to fix up the recursive safety of one of them.
> 
> That said, I'm pretty sure it will be x86 specific, because the safety
> of this is tied to using per-pcpu stacks rather than per-vcpu stacks.
> The only reason the scheduler cpumasks are safe for use on ARM is
> because the scheduler code which uses them doesn't call schedule() in
> the middle of use.

No, the reason the scheduler cpumasks are safe is that using one of
those requires taking the scheduler lock of the cpu having the mask in
its percpu data.

That restriction could probably be dropped if the scheduler were
using the common infrastructure.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context
  2020-02-18 10:40   ` Andrew Cooper
@ 2020-02-18 10:59     ` Roger Pau Monné
  0 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 10:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich

On Tue, Feb 18, 2020 at 10:40:02AM +0000, Andrew Cooper wrote:
> On 17/02/2020 18:43, Roger Pau Monne wrote:
> > Add helpers to track when running in #MC context. This is modeled
> > after the in_irq helpers, but does not support reentry.
> >
> > Note that there are no users of in_mc() introduced by the change,
> > further users will be added by followup changes.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> You probably mean s/mc/nmi/ throughout the commit message, but I'm
> afraid these are rather problematic.

Er, yes, sorry, c&p from the previous commit and I failed to adjust
it.

> 
> NMIs can be recursively entered, especially on corner cases in the crash
> path.  Asserting that the crash path is not recursive can lead to never
> entering the crash kernel.

Is this specific to how Xen handles #NMI?

Intel SDM states that #NMI is not reentrant, as further #NMIs are
blocked until the execution of the iret instruction:

"While an NMI interrupt handler is executing, the processor blocks
delivery of subsequent NMIs until the next execution of the IRET
instruction. This blocking of NMIs prevents nested execution of the
NMI handler. It is recommended that the NMI interrupt handler be
accessed through an interrupt gate to disable maskable hardware
interrupts (see Section 6.8.1, “Masking Maskable Hardware
Interrupts”)."

AFAICT there's no iret in do_nmi until it has finished execution.
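
For reference, the tracking being discussed is just a per-CPU nesting
counter bumped around the handler, following the mc_count() pattern
from patch 3 (names below are illustrative, to keep them distinct from
the cumulative nmi_count() added earlier in the series):

#define in_nmi()    (nmi_nesting(smp_processor_id()) != 0)
#define nmi_enter() (nmi_nesting(smp_processor_id())++)
#define nmi_exit()  (nmi_nesting(smp_processor_id())--)

void do_nmi(const struct cpu_user_regs *regs)
{
    nmi_enter();
    /* ... existing handler body; no iret until we return ... */
    nmi_exit();
}

Given the SDM behavior above, the counter should never exceed 1 in
practice, which is why reentry isn't supported.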

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 10:53   ` Andrew Cooper
@ 2020-02-18 11:10     ` Roger Pau Monné
  2020-02-18 11:21       ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 11:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
> On 17/02/2020 18:43, Roger Pau Monne wrote:
> > @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >  void send_IPI_mask(const cpumask_t *mask, int vector)
> >  {
> >      bool cpus_locked = false;
> > -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> > +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> > +    unsigned long flags;
> > +
> > +    if ( in_mc() || in_nmi() )
> > +    {
> > +        /*
> > +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> > +         * because we have no way to avoid reentry, so do not use the APIC
> > +         * shorthand.
> > +         */
> > +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> > +        return;
> 
> The set of things you can safely do in an NMI/MCE handler is small, and
> does not include sending IPIs.  (In reality, if you're using x2apic, it
> is safe to send an IPI because there is no risk of clobbering ICR2
> behind your outer context's back).
> 
> However, if we escalate from NMI/MCE context into crash context, then
> anything goes.  In reality, we only ever send NMIs from the crash path,
> and that is not permitted to use a shorthand, making this code dead.

This was requested by Jan, as a safety measure even though we might not
currently send IPIs from such contexts.

I think it's better to be safe than sorry, as ultimately someone
adding an IPI usage in #MC or #NMI context could go unnoticed without
those checks.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context
  2020-02-18  9:48     ` Roger Pau Monné
@ 2020-02-18 11:20       ` Julien Grall
  0 siblings, 0 replies; 32+ messages in thread
From: Julien Grall @ 2020-02-18 11:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Jan Beulich,
	xen-devel

Hi Roger,

On 18/02/2020 09:48, Roger Pau Monné wrote:
> On Mon, Feb 17, 2020 at 07:29:29PM +0000, Julien Grall wrote:
>> Hi Roger,
>>
>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>> Add helpers to track when executing in #MC context. This is modeled
>>> after the in_irq helpers.
>>>
>>> Note that there are no users of in_mc() introduced by the change,
>>> further users will be added by followup changes.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>>    xen/arch/x86/cpu/mcheck/mce.c | 2 ++
>>>    xen/include/asm-x86/hardirq.h | 5 +++++
>>>    xen/include/xen/irq_cpustat.h | 1 +
>>>    3 files changed, 8 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
>>> index d61e582af3..93ed5752ac 100644
>>> --- a/xen/arch/x86/cpu/mcheck/mce.c
>>> +++ b/xen/arch/x86/cpu/mcheck/mce.c
>>> @@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
>>>    void do_machine_check(const struct cpu_user_regs *regs)
>>>    {
>>> +    mc_enter();
>>>        _machine_check_vector(regs);
>>> +    mc_exit();
>>>    }
>>>    /*
>>> diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
>>> index 34e1b49260..af3eab6a4d 100644
>>> --- a/xen/include/asm-x86/hardirq.h
>>> +++ b/xen/include/asm-x86/hardirq.h
>>> @@ -8,6 +8,7 @@ typedef struct {
>>>    	unsigned int __softirq_pending;
>>>    	unsigned int __local_irq_count;
>>>    	unsigned int __nmi_count;
>>> +	unsigned int mc_count;
>>>    	bool_t __mwait_wakeup;
>>>    } __cacheline_aligned irq_cpustat_t;
>>> @@ -18,6 +19,10 @@ typedef struct {
>>>    #define irq_enter()	(local_irq_count(smp_processor_id())++)
>>>    #define irq_exit()	(local_irq_count(smp_processor_id())--)
>>> +#define in_mc() 	(mc_count(smp_processor_id()) != 0)
>>> +#define mc_enter()	(mc_count(smp_processor_id())++)
>>> +#define mc_exit()	(mc_count(smp_processor_id())--)
>>> +
>>>    void ack_bad_irq(unsigned int irq);
>>>    extern void apic_intr_init(void);
>>> diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
>>> index 73629f6ec8..12b932fc39 100644
>>> --- a/xen/include/xen/irq_cpustat.h
>>> +++ b/xen/include/xen/irq_cpustat.h
>>> @@ -26,5 +26,6 @@ extern irq_cpustat_t irq_stat[];
>>>    #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
>>>    #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
>>>    #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
>>> +#define mc_count(cpu)		__IRQ_STAT((cpu), mc_count)
>>
>> The header is only meant to contain arch-independent IRQ stats (see comment
>> a few lines above). This is unlikely to be used on Arm, so can you move this
>> into an x86 specific header?
> 
> Now that I look at it, there's also nmi_count and mwait_wakeup defined
> in irq_cpustat.h which won't build if used on Arm, since the fields
> don't exist in the Arm version of irq_cpustat_t.

I would prefer if we don't introduce more cases in xen/irq_cpustat.h. It 
would be good to remove nmi_count() and mwait_wakeup() from the common 
header, but that's a separate cleanup.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:10     ` Roger Pau Monné
@ 2020-02-18 11:21       ` Andrew Cooper
  2020-02-18 11:22         ` Roger Pau Monné
  2020-02-18 11:28         ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 11:21 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On 18/02/2020 11:10, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>  {
>>>      bool cpus_locked = false;
>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>> +    unsigned long flags;
>>> +
>>> +    if ( in_mc() || in_nmi() )
>>> +    {
>>> +        /*
>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>> +         * shorthand.
>>> +         */
>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>> +        return;
>> The set of things you can safely do in an NMI/MCE handler is small, and
>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>> is safe to send an IPI because there is no risk of clobbering ICR2
>> behind your outer context's back).
>>
>> However, if we escalate from NMI/MCE context into crash context, then
>> anything goes.  In reality, we only ever send NMIs from the crash path,
>> and that is not permitted to use a shorthand, making this code dead.
> This was requested by Jan, as a safety measure

That may be, but it doesn't mean it is correct.  If execution ever
enters this function in NMI/MCE context, there is a real,
state-corrupting bug, higher up the call stack.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:21       ` Andrew Cooper
@ 2020-02-18 11:22         ` Roger Pau Monné
  2020-02-18 11:35           ` Andrew Cooper
  2020-02-18 11:28         ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 11:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
> On 18/02/2020 11:10, Roger Pau Monné wrote:
> > On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
> >> On 17/02/2020 18:43, Roger Pau Monne wrote:
> >>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>  void send_IPI_mask(const cpumask_t *mask, int vector)
> >>>  {
> >>>      bool cpus_locked = false;
> >>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> >>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> >>> +    unsigned long flags;
> >>> +
> >>> +    if ( in_mc() || in_nmi() )
> >>> +    {
> >>> +        /*
> >>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>> +         * because we have no way to avoid reentry, so do not use the APIC
> >>> +         * shorthand.
> >>> +         */
> >>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>> +        return;
> >> The set of things you can safely do in an NMI/MCE handler is small, and
> >> does not include sending IPIs.  (In reality, if you're using x2apic, it
> >> is safe to send an IPI because there is no risk of clobbering ICR2
> >> behind your outer context's back).
> >>
> >> However, if we escalate from NMI/MCE context into crash context, then
> >> anything goes.  In reality, we only ever send NMIs from the crash path,
> >> and that is not permitted to use a shorthand, making this code dead.
> > This was requested by Jan, as safety measure
> 
> That may be, but it doesn't mean it is correct.  If execution ever
> enters this function in NMI/MCE context, there is a real,
> state-corrupting bug, higher up the call stack.

Ack, then I guess we should just BUG() here if ever called from #NMI
or #MC context?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:21       ` Andrew Cooper
  2020-02-18 11:22         ` Roger Pau Monné
@ 2020-02-18 11:28         ` Jan Beulich
  2020-02-18 11:44           ` Andrew Cooper
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2020-02-18 11:28 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné
  Cc: xen-devel, Wei Liu, Sander Eikelenboom

On 18.02.2020 12:21, Andrew Cooper wrote:
> On 18/02/2020 11:10, Roger Pau Monné wrote:
>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>  {
>>>>      bool cpus_locked = false;
>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>> +    unsigned long flags;
>>>> +
>>>> +    if ( in_mc() || in_nmi() )
>>>> +    {
>>>> +        /*
>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>> +         * shorthand.
>>>> +         */
>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>> +        return;
>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>> behind your outer context's back).
>>>
>>> However, if we escalate from NMI/MCE context into crash context, then
>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>> and that is not permitted to use a shorthand, making this code dead.
>> This was requested by Jan, as a safety measure
> 
> That may be, but it doesn't mean it is correct.  If execution ever
> enters this function in NMI/MCE context, there is a real,
> state-corrupting bug, higher up the call stack.

Besides the issue of any locks needing taking on such paths (which
must not happen in NMI/#MC context), the only thing getting in the
way of IPI sending is - afaics - ICR2, which could be saved /
restored around such operations. That said, BUG()ing or panic()ing
if we get in here from such a context would also be sufficient to
satisfy the "safety measure" aspect.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:22         ` Roger Pau Monné
@ 2020-02-18 11:35           ` Andrew Cooper
  2020-02-18 11:46             ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 11:35 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom



On 18/02/2020 11:22, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>  {
>>>>>      bool cpus_locked = false;
>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>> +    unsigned long flags;
>>>>> +
>>>>> +    if ( in_mc() || in_nmi() )
>>>>> +    {
>>>>> +        /*
>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>>> +         * shorthand.
>>>>> +         */
>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>> +        return;
>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>> behind your outer context's back).
>>>>
>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>> and that is not permitted to use a shorthand, making this code dead.
>>> This was requested by Jan, as a safety measure
>> That may be, but it doesn't mean it is correct.  If execution ever
>> enters this function in NMI/MCE context, there is a real,
>> state-corrupting bug, higher up the call stack.
> Ack, then I guess we should just BUG() here if ever called from #NMI
> or #MC context?

Well.  There is a reason I suggested removing it, and not using BUG().

If NMI/MCE context escalates to crash context, we do need to send NMIs. 
It won't be this function specifically, but it will be part of the
general IPI infrastructure.

We definitely don't want to get into the game of trying to clobber each
of the state variables, so the only thing throwing BUG()'s around in
this area will do is make the crash path more fragile.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:28         ` Jan Beulich
@ 2020-02-18 11:44           ` Andrew Cooper
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 11:44 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné; +Cc: xen-devel, Wei Liu, Sander Eikelenboom

On 18/02/2020 11:28, Jan Beulich wrote:
> On 18.02.2020 12:21, Andrew Cooper wrote:
>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>  {
>>>>>      bool cpus_locked = false;
>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>> +    unsigned long flags;
>>>>> +
>>>>> +    if ( in_mc() || in_nmi() )
>>>>> +    {
>>>>> +        /*
>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>>> +         * shorthand.
>>>>> +         */
>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>> +        return;
>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>> behind your outer context's back).
>>>>
>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>> and that is not permitted to use a shorthand, making this code dead.
>>> This was requested by Jan, as a safety measure
>> That may be, but it doesn't mean it is correct.  If execution ever
>> enters this function in NMI/MCE context, there is a real,
>> state-corrupting bug, higher up the call stack.
> Besides the issue of any locks needing taking on such paths (which
> must not happen in NMI/#MC context), the only thing getting in the
> way of IPI sending is - afaics - ICR2, which could be saved /
> restored around such operations.

It's the important xAPIC register for sure, but you've also got to
account for compound effects such as causing an LAPIC error.

It is far easier to say "thou shalt not IPI from NMI/MCE context",
because we don't have code needing to do this in the first place.

> That said, BUG()ing or panic()ing
> if we get in here from such a context would also be sufficient to
> satisfy the "safety measure" aspect.

No - safety checks in the crash path make it worse, because if they
trigger, they reliably trigger recursively and never enter the crash kernel.

Once we are in crash context, the most important task is to successfully
transition to the crash kernel.  Sure - there is no guarantee that we
will manage it, but hitting poorly-thought-through safety checks really
has wasted months of customer (and my) time during investigations.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:35           ` Andrew Cooper
@ 2020-02-18 11:46             ` Roger Pau Monné
  2020-02-18 13:29               ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 11:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
> 
> 
> On 18/02/2020 11:22, Roger Pau Monné wrote:
> > On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
> >> On 18/02/2020 11:10, Roger Pau Monné wrote:
> >>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
> >>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
> >>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
> >>>>>  {
> >>>>>      bool cpus_locked = false;
> >>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> >>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> >>>>> +    unsigned long flags;
> >>>>> +
> >>>>> +    if ( in_mc() || in_nmi() )
> >>>>> +    {
> >>>>> +        /*
> >>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>>>> +         * because we have no way to avoid reentry, so do not use the APIC
> >>>>> +         * shorthand.
> >>>>> +         */
> >>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>>>> +        return;
> >>>> The set of things you can safely do in an NMI/MCE handler is small, and
> >>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
> >>>> is safe to send an IPI because there is no risk of clobbering ICR2
> >>>> behind your outer context's back).
> >>>>
> >>>> However, if we escalate from NMI/MCE context into crash context, then
> >>>> anything goes.  In reality, we only ever send NMIs from the crash path,
> >>>> and that is not permitted to use a shorthand, making this code dead.
> >>> This was requested by Jan, as a safety measure
> >> That may be, but it doesn't mean it is correct.  If execution ever
> >> enters this function in NMI/MCE context, there is a real,
> >> state-corrupting bug, higher up the call stack.
> > Ack, then I guess we should just BUG() here if ever called from #NMI
> > or #MC context?
> 
> Well.  There is a reason I suggested removing it, and not using BUG().
> 
> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
> It won't be this function specifically, but it will be part of the
> general IPI infrastructure.
> 
> We definitely don't want to get into the game of trying to clobber each
> of the state variables, so the only thing throwing BUG()'s around in
> this area will do is make the crash path more fragile.

I see, panicking in such context will just clobber the previous crash
that happened in NMI/MC context.

So you would rather keep the current version of falling back to the
usage of the non-shorthand IPI sending routine instead of panicking?

What about:

if ( in_mc() || in_nmi() )
{
    /*
     * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
     * because we have no way to avoid reentry, so do not use the APIC
     * shorthand. The only IPI that should be sent from such context
     * is a #NMI to shut down the system in case of a crash.
     */
    if ( vector == APIC_DM_NMI )
    	alternative_vcall(genapic.send_IPI_mask, mask, vector);
    else
        BUG();

    return;
}

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 11:46             ` Roger Pau Monné
@ 2020-02-18 13:29               ` Andrew Cooper
  2020-02-18 14:43                 ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 13:29 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On 18/02/2020 11:46, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
>>
>> On 18/02/2020 11:22, Roger Pau Monné wrote:
>>> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
>>>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>>>  {
>>>>>>>      bool cpus_locked = false;
>>>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>>>> +    unsigned long flags;
>>>>>>> +
>>>>>>> +    if ( in_mc() || in_nmi() )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>>>>> +         * shorthand.
>>>>>>> +         */
>>>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>>>> +        return;
>>>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>>>> behind your outer context's back).
>>>>>>
>>>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>>>> and that is not permitted to use a shorthand, making this code dead.
>>>>> This was requested by Jan, as a safety measure
>>>> That may be, but it doesn't mean it is correct.  If execution ever
>>>> enters this function in NMI/MCE context, there is a real,
>>>> state-corrupting bug, higher up the call stack.
>>> Ack, then I guess we should just BUG() here if ever called from #NMI
>>> or #MC context?
>> Well.  There is a reason I suggested removing it, and not using BUG().
>>
>> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
>> It won't be this function specifically, but it will be part of the
>> general IPI infrastructure.
>>
>> We definitely don't want to get into the game of trying to clobber each
>> of the state variables, so the only thing throwing BUG()'s around in
>> this area will do is make the crash path more fragile.
> I see, panicking in such context will just clobber the previous crash
> that happened in NMI/MC context.
>
> So you would rather keep the current version of falling back to the
> usage of the non-shorthand IPI sending routine instead of panicking?
>
> What about:
>
> if ( in_mc() || in_nmi() )
> {
>     /*
>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>      * because we have no way to avoid reentry, so do not use the APIC
>      * shorthand. The only IPI that should be sent from such context
>      * is a #NMI to shut down the system in case of a crash.
>      */
>     if ( vector == APIC_DM_NMI )
>     	alternative_vcall(genapic.send_IPI_mask, mask, vector);
>     else
>         BUG();
>
>     return;
> }

How do you intend to test it?

It might be correct now[*] but it doesn't protect against someone
modifying code, violating the constraint, and this going unnoticed
because the above codepath will only be entered in exceptional
circumstances.  Sods law says that code inside that block is first going
to be tested in a customer environment.

ASSERT()s would be less bad, but any technical countermeasures, however
well intentioned, get in the way of the crash path functioning when it
matters most.

~Andrew

[*] There is a long outstanding bug in machine_restart() which blindly
enables interrupts and IPIs CPU 0.  You can get here in the middle of a
crash, and this BUG() will trigger in at least one case I've seen before.

Fixing this isn't a 5 minute job, and it hasn't bubbled sufficiently up
my TODO list yet.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 13:29               ` Andrew Cooper
@ 2020-02-18 14:43                 ` Roger Pau Monné
  2020-02-18 15:34                   ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 14:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On Tue, Feb 18, 2020 at 01:29:56PM +0000, Andrew Cooper wrote:
> On 18/02/2020 11:46, Roger Pau Monné wrote:
> > On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
> >>
> >> On 18/02/2020 11:22, Roger Pau Monné wrote:
> >>> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
> >>>> On 18/02/2020 11:10, Roger Pau Monné wrote:
> >>>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
> >>>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
> >>>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
> >>>>>>>  {
> >>>>>>>      bool cpus_locked = false;
> >>>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> >>>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> >>>>>>> +    unsigned long flags;
> >>>>>>> +
> >>>>>>> +    if ( in_mc() || in_nmi() )
> >>>>>>> +    {
> >>>>>>> +        /*
> >>>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
> >>>>>>> +         * shorthand.
> >>>>>>> +         */
> >>>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>>>>>> +        return;
> >>>>>> The set of things you can safely do in an NMI/MCE handler is small, and
> >>>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
> >>>>>> is safe to send an IPI because there is no risk of clobbering ICR2
> >>>>>> behind your outer context's back).
> >>>>>>
> >>>>>> However, if we escalate from NMI/MCE context into crash context, then
> >>>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
> >>>>>> and that is not permitted to use a shorthand, making this code dead.
> >>>>> This was requested by Jan, as a safety measure
> >>>> That may be, but it doesn't mean it is correct.  If execution ever
> >>>> enters this function in NMI/MCE context, there is a real,
> >>>> state-corrupting bug, higher up the call stack.
> >>> Ack, then I guess we should just BUG() here if ever called from #NMI
> >>> or #MC context?
> >> Well.  There is a reason I suggested removing it, and not using BUG().
> >>
> >> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
> >> It won't be this function specifically, but it will be part of the
> >> general IPI infrastructure.
> >>
> >> We definitely don't want to get into the game of trying to clobber each
> >> of the state variables, so the only thing throwing BUG()'s around in
> >> this area will do is make the crash path more fragile.
> > I see, panicking in such context will just clobber the previous crash
> > that happened in NMI/MC context.
> >
> > So you would rather keep the current version of falling back to the
> > usage of the non-shorthand IPI sending routine instead of panicking?
> >
> > What about:
> >
> > if ( in_mc() || in_nmi() )
> > {
> >     /*
> >      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >      * because we have no way to avoid reentry, so do not use the APIC
> >      * shorthand. The only IPI that should be sent from such context
> >      * is a #NMI to shut down the system in case of a crash.
> >      */
> >     if ( vector == APIC_DM_NMI )
> >     	alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >     else
> >         BUG();
> >
> >     return;
> > }
> 
> How do you intend to test it?
> 
> It might be correct now[*] but it doesn't protect against someone
> modifying code, violating the constraint, and this going unnoticed
> because the above codepath will only be entered in exceptional
> circumstances.  Sod's law says that code inside that block is first going
> to be tested in a customer environment.
> 
> ASSERT()s would be less bad, but any technical countermeasures, however
> well intentioned, get in the way of the crash path functioning when it
> matters most.

OK, so what about:

if ( in_mc() || in_nmi() )
{
    bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
    unsigned int icr2;

    /*
     * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
     * because we have no way to avoid reentry, so do not use the APIC
     * shorthand. The only IPI that should be sent from such context
     * is a #NMI to shut down the system in case of a crash.
     */
    ASSERT(vector == APIC_DM_NMI);
    if ( !x2apic )
        icr2 = apic_read(APIC_ICR2);
    alternative_vcall(genapic.send_IPI_mask, mask, vector);
    if ( !x2apic )
        apic_write(APIC_ICR2, icr2);

    return;
}

I'm unsure as to whether the assert is actually helpful, but would
like to settle this before sending a new version.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 14:43                 ` Roger Pau Monné
@ 2020-02-18 15:34                   ` Andrew Cooper
  2020-02-18 15:40                     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-02-18 15:34 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Sander Eikelenboom

On 18/02/2020 14:43, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 01:29:56PM +0000, Andrew Cooper wrote:
>> On 18/02/2020 11:46, Roger Pau Monné wrote:
>>> On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
>>>> On 18/02/2020 11:22, Roger Pau Monné wrote:
>>>>> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
>>>>>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>>>>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>>>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>>>>>  {
>>>>>>>>>      bool cpus_locked = false;
>>>>>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>>>>>> +    unsigned long flags;
>>>>>>>>> +
>>>>>>>>> +    if ( in_mc() || in_nmi() )
>>>>>>>>> +    {
>>>>>>>>> +        /*
>>>>>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>>>>>>> +         * shorthand.
>>>>>>>>> +         */
>>>>>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>>>>>> +        return;
>>>>>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>>>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>>>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>>>>>> behind your outer context's back).
>>>>>>>>
>>>>>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>>>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>>>>>> and that is not permitted to use a shorthand, making this code dead.
>>>>>>> This was requested by Jan, as a safety measure
>>>>>> That may be, but it doesn't mean it is correct.  If execution ever
>>>>>> enters this function in NMI/MCE context, there is a real,
>>>>>> state-corrupting bug, higher up the call stack.
>>>>> Ack, then I guess we should just BUG() here if ever called from #NMI
>>>>> or #MC context?
>>>> Well.  There is a reason I suggested removing it, and not using BUG().
>>>>
>>>> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
>>>> It won't be this function specifically, but it will be part of the
>>>> general IPI infrastructure.
>>>>
>>>> We definitely don't want to get into the game of trying to clobber each
>>>> of the state variables, so the only thing throwing BUG()'s around in
>>>> this area will do is make the crash path more fragile.
>>> I see, panicking in such context will just clobber the previous crash
>>> that happened in NMI/MC context.
>>>
>>> So you would rather keep the current version of falling back to the
>>> usage of the non-shorthand IPI sending routine instead of panicking?
>>>
>>> What about:
>>>
>>> if ( in_mc() || in_nmi() )
>>> {
>>>     /*
>>>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>      * because we have no way to avoid reentry, so do not use the APIC
>>>      * shorthand. The only IPI that should be sent from such context
>>>      * is a #NMI to shut down the system in case of a crash.
>>>      */
>>>     if ( vector == APIC_DM_NMI )
>>>     	alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>     else
>>>         BUG();
>>>
>>>     return;
>>> }
>> How do you intend to test it?
>>
>> It might be correct now[*] but it doesn't protect against someone
>> modifying code, violating the constraint, and this going unnoticed
>> because the above codepath will only be entered in exceptional
>> circumstances.  Sod's law says that code inside that block is first going
>> to be tested in a customer environment.
>>
>> ASSERT()s would be less bad, but any technical countermeasures, however
>> well intentioned, get in the way of the crash path functioning when it
>> matters most.
> OK, so what about:
>
> if ( in_mc() || in_nmi() )
> {
>     bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
>     unsigned int icr2;
>
>     /*
>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>      * because we have no way to avoid reentry, so do not use the APIC
>      * shorthand. The only IPI that should be sent from such context
>      * is a #NMI to shut down the system in case of a crash.
>      */
>     ASSERT(vector == APIC_DM_NMI);
>     if ( !x2apic )
>         icr2 = apic_read(APIC_ICR2);
>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>     if ( !x2apic )
>         apic_write(APIC_ICR2, icr2);
>
>     return;
> }
>
> I'm unsure as to whether the assert is actually helpful, but would
> like to settle this before sending a new version.

I can only repeat my previous email (questions and statements).

*Any* logic inside "if ( in_mc() || in_nmi() )" can't be tested
usefully, making it problematic as a sanity check.

(For this version of the code specifically, you absolutely don't want to
be reading MSR_APIC_BASE every time, and when we're on the crash path
sending NMIs, we don't care at all about clobbering ICR2.)

Doing nothing is less bad than doing this.  There is no point trying to
cope with a corner case we don't support, and there is nothing you can
do, sanity wise, which doesn't come with a high chance of blowing up
first in a customer environment.

Literally, do nothing.  It is the least bad option going.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 15:34                   ` Andrew Cooper
@ 2020-02-18 15:40                     ` Jan Beulich
  2020-02-18 16:18                       ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2020-02-18 15:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Sander Eikelenboom, Wei Liu, Roger Pau Monné

On 18.02.2020 16:34, Andrew Cooper wrote:
> On 18/02/2020 14:43, Roger Pau Monné wrote:
>> On Tue, Feb 18, 2020 at 01:29:56PM +0000, Andrew Cooper wrote:
>>> On 18/02/2020 11:46, Roger Pau Monné wrote:
>>>> On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
>>>>> On 18/02/2020 11:22, Roger Pau Monné wrote:
>>>>>> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
>>>>>>> On 18/02/2020 11:10, Roger Pau Monné wrote:
>>>>>>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
>>>>>>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
>>>>>>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>>>>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>>>>>>>>  {
>>>>>>>>>>      bool cpus_locked = false;
>>>>>>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>>>>>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>>>>>>>>> +    unsigned long flags;
>>>>>>>>>> +
>>>>>>>>>> +    if ( in_mc() || in_nmi() )
>>>>>>>>>> +    {
>>>>>>>>>> +        /*
>>>>>>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>>>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
>>>>>>>>>> +         * shorthand.
>>>>>>>>>> +         */
>>>>>>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>>>>>>> +        return;
>>>>>>>>> The set of things you can safely do in an NMI/MCE handler is small, and
>>>>>>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
>>>>>>>>> is safe to send an IPI because there is no risk of clobbering ICR2
>>>>>>>>> behind your outer context's back).
>>>>>>>>>
>>>>>>>>> However, if we escalate from NMI/MCE context into crash context, then
>>>>>>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
>>>>>>>>> and that is not permitted to use a shorthand, making this code dead.
>>>>>>>> This was requested by Jan, as a safety measure
>>>>>>> That may be, but it doesn't mean it is correct.  If execution ever
>>>>>>> enters this function in NMI/MCE context, there is a real,
>>>>>>> state-corrupting bug, higher up the call stack.
>>>>>> Ack, then I guess we should just BUG() here if ever called from #NMI
>>>>>> or #MC context?
>>>>> Well.  There is a reason I suggested removing it, and not using BUG().
>>>>>
>>>>> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
>>>>> It won't be this function specifically, but it will be part of the
>>>>> general IPI infrastructure.
>>>>>
>>>>> We definitely don't want to get into the game of trying to clobber each
>>>>> of the state variables, so the only thing throwing BUG()'s around in
>>>>> this area will do is make the crash path more fragile.
>>>> I see, panicking in such context will just clobber the previous crash
>>>> that happened in NMI/MC context.
>>>>
>>>> So you would rather keep the current version of falling back to the
>>>> usage of the non-shorthand IPI sending routine instead of panicking?
>>>>
>>>> What about:
>>>>
>>>> if ( in_mc() || in_nmi() )
>>>> {
>>>>     /*
>>>>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>      * because we have no way to avoid reentry, so do not use the APIC
>>>>      * shorthand. The only IPI that should be sent from such context
>>>>      * is a #NMI to shut down the system in case of a crash.
>>>>      */
>>>>     if ( vector == APIC_DM_NMI )
>>>>     	alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>     else
>>>>         BUG();
>>>>
>>>>     return;
>>>> }
>>> How do you intend to test it?
>>>
>>> It might be correct now[*] but it doesn't protect against someone
>>> modifying code, violating the constraint, and this going unnoticed
>>> because the above codepath will only be entered in exceptional
>>> circumstances.  Sod's law says that code inside that block is first going
>>> to be tested in a customer environment.
>>>
>>> ASSERT()s would be less bad, but any technical countermeasures, however
>>> well intentioned, get in the way of the crash path functioning when it
>>> matters most.
>> OK, so what about:
>>
>> if ( in_mc() || in_nmi() )
>> {
>>     bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
>>     unsigned int icr2;
>>
>>     /*
>>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>      * because we have no way to avoid reentry, so do not use the APIC
>>      * shorthand. The only IPI that should be sent from such context
>>      * is a #NMI to shut down the system in case of a crash.
>>      */
>>     ASSERT(vector == APIC_DM_NMI);
>>     if ( !x2apic )
>>         icr2 = apic_read(APIC_ICR2);
>>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>     if ( !x2apic )
>>         apic_write(APIC_ICR2, icr2);
>>
>>     return;
>> }
>>
>> I'm unsure as to whether the assert is actually helpful, but would
>> like to settle this before sending a new version.
> 
> I can only repeat my previous email (questions and statements).
> 
> *Any* logic inside "if ( in_mc() || in_nmi() )" can't be tested
> usefully, making it problematic as a sanity check.
> 
> (For this version of the code specifically, you absolutely don't want to
> be reading MSR_APIC_BASE every time, and when we're on the crash path
> sending NMIs, we don't care at all about clobbering ICR2.)
> 
> Doing nothing is less bad than doing this.  There is no point trying to
> cope with a corner case we don't support, and there is nothing you can
> do, sanity wise, which doesn't come with a high chance of blowing up
> first in a customer environment.
> 
> Literally, do nothing.  It is the least bad option going.

I think you're a little too focused on the crash path. Doing nothing
here likely means having problems later if we get into here, in a
far harder to debug manner. May I suggest we introduce e.g.
SYS_STATE_crashed, and bypass any such potentially problematic
checks if in this state? Your argument about not being able to test
these paths applies to a "don't do anything" approach as well, after
all - we won't know if the absence of any extra logic is fine until
someone (perhaps even multiple "someone"-s) actually hit that path.
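
As a rough sketch of what I have in mind (SYS_STATE_crashed doesn't
exist today; where exactly it would get set is to be determined, with
kexec_crash() below merely one plausible spot):

void kexec_crash(void)
{
    system_state = SYS_STATE_crashed;
    /* ... existing crash sequence, including sending NMIs ... */
}

void send_IPI_mask(const cpumask_t *mask, int vector)
{
    if ( in_mc() || in_nmi() )
    {
        /* Outside of a crash, getting here indicates a bug up the stack. */
        if ( system_state != SYS_STATE_crashed )
            BUG();
        alternative_vcall(genapic.send_IPI_mask, mask, vector);
        return;
    }
    /* ... shorthand logic as per patch 5 ... */
}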

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 15:40                     ` Jan Beulich
@ 2020-02-18 16:18                       ` Roger Pau Monné
  2020-02-18 16:33                         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2020-02-18 16:18 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Wei Liu, Sander Eikelenboom

On Tue, Feb 18, 2020 at 04:40:29PM +0100, Jan Beulich wrote:
> On 18.02.2020 16:34, Andrew Cooper wrote:
> > On 18/02/2020 14:43, Roger Pau Monné wrote:
> >> On Tue, Feb 18, 2020 at 01:29:56PM +0000, Andrew Cooper wrote:
> >>> On 18/02/2020 11:46, Roger Pau Monné wrote:
> >>>> On Tue, Feb 18, 2020 at 11:35:37AM +0000, Andrew Cooper wrote:
> >>>>> On 18/02/2020 11:22, Roger Pau Monné wrote:
> >>>>>> On Tue, Feb 18, 2020 at 11:21:12AM +0000, Andrew Cooper wrote:
> >>>>>>> On 18/02/2020 11:10, Roger Pau Monné wrote:
> >>>>>>>> On Tue, Feb 18, 2020 at 10:53:45AM +0000, Andrew Cooper wrote:
> >>>>>>>>> On 17/02/2020 18:43, Roger Pau Monne wrote:
> >>>>>>>>>> @@ -67,7 +68,20 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>>>>>>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
> >>>>>>>>>>  {
> >>>>>>>>>>      bool cpus_locked = false;
> >>>>>>>>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> >>>>>>>>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> >>>>>>>>>> +    unsigned long flags;
> >>>>>>>>>> +
> >>>>>>>>>> +    if ( in_mc() || in_nmi() )
> >>>>>>>>>> +    {
> >>>>>>>>>> +        /*
> >>>>>>>>>> +         * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>>>>>>>>> +         * because we have no way to avoid reentry, so do not use the APIC
> >>>>>>>>>> +         * shorthand.
> >>>>>>>>>> +         */
> >>>>>>>>>> +        alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>>>>>>>>> +        return;
> >>>>>>>>> The set of things you can safely do in an NMI/MCE handler is small, and
> >>>>>>>>> does not include sending IPIs.  (In reality, if you're using x2apic, it
> >>>>>>>>> is safe to send an IPI because there is no risk of clobbering ICR2
> >>>>>>>>> behind your outer context's back).
> >>>>>>>>>
> >>>>>>>>> However, if we escalate from NMI/MCE context into crash context, then
> >>>>>>>>> anything goes.  In reality, we only ever send NMIs from the crash path,
> >>>>>>>>> and that is not permitted to use a shorthand, making this code dead.
> >>>>>>>> This was requested by Jan, as a safety measure
> >>>>>>> That may be, but it doesn't mean it is correct.  If execution ever
> >>>>>>> enters this function in NMI/MCE context, there is a real,
> >>>>>>> state-corrupting bug, higher up the call stack.
> >>>>>> Ack, then I guess we should just BUG() here if ever called from #NMI
> >>>>>> or #MC context?
> >>>>> Well.  There is a reason I suggested removing it, and not using BUG().
> >>>>>
> >>>>> If NMI/MCE context escalates to crash context, we do need to send NMIs. 
> >>>>> It won't be this function specifically, but it will be part of the
> >>>>> general IPI infrastructure.
> >>>>>
> >>>>> We definitely don't want to get into the game of trying to clobber each
> >>>>> of the state variables, so the only thing throwing BUG()'s around in
> >>>>> this area will do is make the crash path more fragile.
> >>>> I see, panicking in such context will just clobber the previous crash
> >>>> that happened in NMI/MC context.
> >>>>
> >>>> So you would rather keep the current version of falling back to the
> >>>> usage of the non-shorthand IPI sending routine instead of panicking?
> >>>>
> >>>> What about:
> >>>>
> >>>> if ( in_mc() || in_nmi() )
> >>>> {
> >>>>     /*
> >>>>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>>>      * because we have no way to avoid reentry, so do not use the APIC
> >>>>      * shorthand. The only IPI that should be sent from such context
> >>>>      * is a #NMI to shut down the system in case of a crash.
> >>>>      */
> >>>>     if ( vector == APIC_DM_NMI )
> >>>>     	alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>>>     else
> >>>>         BUG();
> >>>>
> >>>>     return;
> >>>> }
> >>> How do you intend to test it?
> >>>
> >>> It might be correct now[*] but it doesn't protect against someone
> >>> modifying code, violating the constraint, and this going unnoticed
> >>> because the above codepath will only be entered in exceptional
> >>> circumstances.  Sod's law says that code inside that block is first going
> >>> to be tested in a customer environment.
> >>>
> >>> ASSERT()s would be less bad, but any technical countermeasures, however
> >>> well intentioned, get in the way of the crash path functioning when it
> >>> matters most.
> >> OK, so what about:
> >>
> >> if ( in_mc() || in_nmi() )
> >> {
> >>     bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
> >>     unsigned int icr2;
> >>
> >>     /*
> >>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
> >>      * because we have no way to avoid reentry, so do not use the APIC
> >>      * shorthand. The only IPI that should be sent from such context
> >>      * is a #NMI to shutdown the system in case of a crash.
> >>      */
> >>     ASSERT(vector == APIC_DM_NMI);
> >>     if ( !x2apic )
> >>         icr2 = apic_read(APIC_ICR2);
> >>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >>     if ( !x2apic )
> >>         apic_write(APIC_ICR2, icr2);
> >>
> >>     return;
> >> }
> >>
> >> I'm unsure as to whether the assert is actually helpful, but would
> >> like to settle this before sending a new version.
> > 
> > I can only repeat my previous email (questions and statements).
> > 
> > *Any* logic inside "if ( in_mc() || in_nmi() )" can't be tested
> > usefully, making it problematic as a sanity check.

Right, so what about keeping the logic in "if ( in_mc() || in_nmi() )"
using the code as it was prior to introducing the shorthand, i.e.:

if ( in_mc() || in_nmi() )
{
    alternative_vcall(genapic.send_IPI_mask, mask, vector);
    return;
}

That would be exactly what send_IPI_mask would do prior to the
introduction of the shorthand (pre 5500d265a2a8f), and I think
it's a compromise between "don't do anything" and "let's try to handle
this in a non-broken way".

Using the shorthand adds more logic, which we would like to avoid in
such critical crash paths, so let's avoid as much of it as possible
by just falling back to what was there previously.
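
Concretely, the top of send_IPI_mask would then look something like the
below (a minimal sketch of the idea only, not the exact patch):

void send_IPI_mask(const cpumask_t *mask, int vector)
{
    bool cpus_locked = false;
    cpumask_t *scratch = this_cpu(send_ipi_cpumask);

    /*
     * In #MC or #NMI context the per-CPU scratch mask may already be in
     * use by the interrupted context, so skip the shorthand logic and
     * send the IPI as it was done prior to 5500d265a2a8f.
     */
    if ( in_mc() || in_nmi() )
    {
        alternative_vcall(genapic.send_IPI_mask, mask, vector);
        return;
    }

    /* ... shorthand logic using the scratch mask continues below ... */
}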

> > (For this version of the code specifically, you absolutely don't want to
> > be reading MSR_APIC_BASE every time, and when we're on the crash path
> > sending NMIs, we don't care at all about clobbering ICR2.)
> > 
> > Doing nothing is less bad than doing this.  There is no point trying to
> > cope with a corner case we don't support, and there is nothing you can
> > do, sanity wise, which doesn't come with a high chance of blowing up
> > first in a customer environment.
> > 
> > Literally, do nothing.  It is the least bad option going.
> 
> I think you're a little too focused on the crash path. Doing nothing
> here likely means having problems later if we ever get here, in a
> far harder-to-debug manner. May I suggest we introduce e.g.
> SYS_STATE_crashed, and bypass any such potentially problematic
> checks if in this state? Your argument about not being able to test
> these paths applies to a "don't do anything" approach as well, after
> all - we won't know if the absence of any extra logic is fine until
> someone (perhaps even multiple "someone"-s) actually hit that path.

Introducing such a state would be another option (or a further
improvement), but we still need to handle what happens when
send_IPI_mask gets called from non-maskable interrupt context, because
using the per-CPU mask in that context is definitely not safe
(regardless of whether it's a crash path or not).

Falling back to not using the shorthand in such contexts seems like a
good compromise: it's not adding new logic, just restoring the logic
prior to the introduction of the shorthand.
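
To spell out the hazard, here is an illustrative, hypothetical caller
(some_ipi_user and its arguments are made up for this sketch):

void some_ipi_user(const cpumask_t *a, const cpumask_t *b, int vector)
{
    cpumask_t *scratch = this_cpu(scratch_cpumask);

    /* Compute the set of destinations in the per-CPU scratch mask. */
    cpumask_and(scratch, a, b);

    /*
     * If a #NMI or #MC is delivered at this point and its handler also
     * uses this_cpu(scratch_cpumask), the contents computed above are
     * silently overwritten before this context resumes.
     */

    send_IPI_mask(scratch, vector); /* may target the wrong set of CPUs */
}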

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-18 16:18                       ` Roger Pau Monné
@ 2020-02-18 16:33                         ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-02-18 16:33 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Sander Eikelenboom, Wei Liu, xen-devel

On 18.02.2020 17:18, Roger Pau Monné wrote:
> On Tue, Feb 18, 2020 at 04:40:29PM +0100, Jan Beulich wrote:
>> On 18.02.2020 16:34, Andrew Cooper wrote:
>>> On 18/02/2020 14:43, Roger Pau Monné wrote:
>>>> OK, so what about:
>>>>
>>>> if ( in_mc() || in_nmi() )
>>>> {
>>>>     bool x2apic = current_local_apic_mode() == APIC_MODE_X2APIC;
>>>>     unsigned int icr2;
>>>>
>>>>     /*
>>>>      * When in #MC or #NMI context Xen cannot use the per-CPU scratch mask
>>>>      * because we have no way to avoid reentry, so do not use the APIC
>>>>      * shorthand. The only IPI that should be sent from such context
>>>>      * is a #NMI to shutdown the system in case of a crash.
>>>>      */
>>>>     ASSERT(vector == APIC_DM_NMI);
>>>>     if ( !x2apic )
>>>>         icr2 = apic_read(APIC_ICR2);
>>>>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>>>>     if ( !x2apic )
>>>>         apic_write(APIC_ICR2, icr2);
>>>>
>>>>     return;
>>>> }
>>>>
>>>> I'm unsure as to whether the assert is actually helpful, but would
>>>> like to settle this before sending a new version.
>>>
>>> I can only repeat my previous email (questions and statements).
>>>
>>> *Any* logic inside "if ( in_mc() || in_nmi() )" can't be tested
>>> usefully, making it problematic as a sanity check.
> 
> Right, so what about keeping the logic in "if ( in_mc() || in_nmi() )"
> using the code as it was prior to introducing the shorthand, i.e.:
> 
> if ( in_mc() || in_nmi() )
> {
>     alternative_vcall(genapic.send_IPI_mask, mask, vector);
>     return;
> }
> 
> That would be exactly what send_IPI_mask would do prior to the
> introduction of the shorthand (pre 5500d265a2a8f), and I think
> it's a compromise between "don't do anything" and "let's try to handle
> this in a non-broken way".
> 
> Using the shorthand adds more logic, which we would like to avoid in
> such critical crash paths, so let's avoid as much of it as possible
> by just falling back to what was there previously.
> 
>>> (For this version of the code specifically, you absolutely don't want to
>>> be reading MSR_APIC_BASE every time, and when we're on the crash path
>>> sending NMIs, we don't care at all about clobbering ICR2.)
>>>
>>> Doing nothing is less bad than doing this.  There is no point trying to
>>> cope with a corner case we don't support, and there is nothing you can
>>> do, sanity wise, which doesn't come with a high chance of blowing up
>>> first in a customer environment.
>>>
>>> Literally, do nothing.  It is the least bad option going.
>>
>> I think you're a little too focused on the crash path. Doing nothing
>> here likely means having problems later if we ever get here, in a
>> far harder-to-debug manner. May I suggest we introduce e.g.
>> SYS_STATE_crashed, and bypass any such potentially problematic
>> checks if in this state? Your argument about not being able to test
>> these paths applies to a "don't do anything" approach as well, after
>> all - we won't know if the absence of any extra logic is fine until
>> someone (perhaps even multiple "someone"-s) actually hit that path.
> 
> Introducing such a state would be another option (or a further
> improvement), but we still need to handle what happens when
> send_IPI_mask gets called from non-maskable interrupt context, because
> using the per-CPU mask in that context is definitely not safe
> (regardless of whether it's a crash path or not).
> 
> Falling back to not using the shorthand in such contexts seems like a
> good compromise: it's not adding new logic, just restoring the logic
> prior to the introduction of the shorthand.

I'd be okay with this.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-02-18 16:33 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-17 18:43 [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Roger Pau Monne
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 1/6] x86/smp: unify header includes in smp.h Roger Pau Monne
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 2/6] x86: introduce a nmi_count tracking variable Roger Pau Monne
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 3/6] x86: track when in #MC context Roger Pau Monne
2020-02-17 19:29   ` Julien Grall
2020-02-18  9:45     ` Roger Pau Monné
2020-02-18  9:48     ` Roger Pau Monné
2020-02-18 11:20       ` Julien Grall
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 4/6] x86: track when in #NMI context Roger Pau Monne
2020-02-18 10:40   ` Andrew Cooper
2020-02-18 10:59     ` Roger Pau Monné
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 5/6] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
2020-02-18 10:53   ` Andrew Cooper
2020-02-18 11:10     ` Roger Pau Monné
2020-02-18 11:21       ` Andrew Cooper
2020-02-18 11:22         ` Roger Pau Monné
2020-02-18 11:35           ` Andrew Cooper
2020-02-18 11:46             ` Roger Pau Monné
2020-02-18 13:29               ` Andrew Cooper
2020-02-18 14:43                 ` Roger Pau Monné
2020-02-18 15:34                   ` Andrew Cooper
2020-02-18 15:40                     ` Jan Beulich
2020-02-18 16:18                       ` Roger Pau Monné
2020-02-18 16:33                         ` Jan Beulich
2020-02-18 11:28         ` Jan Beulich
2020-02-18 11:44           ` Andrew Cooper
2020-02-17 18:43 ` [Xen-devel] [PATCH v2 6/6] x86: add accessors for scratch cpu mask Roger Pau Monne
2020-02-18  7:40 ` [Xen-devel] [PATCH v2 0/6] x86: fixes/improvements for scratch cpumask Jürgen Groß
2020-02-18 10:15   ` Roger Pau Monné
2020-02-18 10:46     ` Jürgen Groß
2020-02-18 10:26   ` Andrew Cooper
2020-02-18 10:54     ` Jürgen Groß
