* [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask
@ 2020-02-24 10:46 Roger Pau Monne
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable Roger Pau Monne
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Roger Pau Monne

Hello,

Commit:

5500d265a2a8fa63d60c08beb549de8ec82ff7a5
x86/smp: use APIC ALLBUT destination shorthand when possible

introduced a bogus usage of the scratch cpumask: it was used in a
function that can be called from interrupt context, and hence using the
scratch cpumask there is not safe. Patch #4 fixes that usage, and also
prevents the use of any per-CPU variables when send_IPI_mask is called
from #MC or #NMI context. The preceding patches are preparatory changes.

Patch #5 adds some debug infrastructure to make sure the scratch cpumask
is used in the right context, and hence should prevent further misuses.
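
To illustrate the problem, here is a rough sketch (not code from the tree;
the function name is made up) of how a scratch mask user can be corrupted
by an interrupt handler that reuses the same per-CPU mask:

    /* Hypothetical caller of the per-CPU scratch cpumask. */
    void some_hypercall_path(void)
    {
        cpumask_t *mask = this_cpu(scratch_cpumask);

        cpumask_copy(mask, &cpu_online_map);
        /*
         * If an interrupt arrives here and its handler also writes to
         * this_cpu(scratch_cpumask) (as the old send_IPI_mask did), the
         * contents computed above are silently overwritten before they
         * are consumed below.
         */
        flush_tlb_mask(mask);
    }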

Thanks, Roger.

Roger Pau Monne (5):
  x86: introduce a nmi_count tracking variable
  x86: track when in #NMI context
  x86: track when in #MC context
  x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  x86: add accessors for scratch cpu mask

 xen/arch/x86/cpu/mcheck/mce.c |  2 ++
 xen/arch/x86/io_apic.c        |  6 ++--
 xen/arch/x86/irq.c            | 13 +++++++--
 xen/arch/x86/mm.c             | 30 ++++++++++++++------
 xen/arch/x86/msi.c            |  4 ++-
 xen/arch/x86/nmi.c            | 11 ++++----
 xen/arch/x86/smp.c            | 52 ++++++++++++++++++++++++++++++++++-
 xen/arch/x86/smpboot.c        | 10 +++++--
 xen/arch/x86/traps.c          | 10 ++++++-
 xen/include/asm-x86/hardirq.h | 13 ++++++++-
 xen/include/asm-x86/nmi.h     |  2 ++
 xen/include/asm-x86/smp.h     | 10 +++++++
 xen/include/xen/irq_cpustat.h |  1 -
 13 files changed, 137 insertions(+), 27 deletions(-)

-- 
2.25.0


* [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable
  2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
@ 2020-02-24 10:46 ` Roger Pau Monne
  2020-02-25 16:39   ` Jan Beulich
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context Roger Pau Monne
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

This is modeled after the irq_count variable, and is used to account
for all the NMIs handled by the system.

This will allow repurposing the nmi_count() helper so it can be used
in a similar manner to local_irq_count(): accounting for the NMIs
currently in service.
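
As a condensed sketch of the end state (combining this patch with the
following one; not code introduced by this patch alone), do_nmi() ends up
maintaining both counters:

    void do_nmi(const struct cpu_user_regs *regs)
    {
        this_cpu(nmi_count)++;   /* total NMIs handled on this CPU (this patch) */
        nmi_enter();             /* NMIs currently in service (next patch) */

        /* ... actual NMI handling ... */

        nmi_exit();
    }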

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/nmi.c        | 11 +++++------
 xen/arch/x86/traps.c      |  4 +++-
 xen/include/asm-x86/nmi.h |  2 ++
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index a69b91a924..c3f92ed231 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -151,15 +151,14 @@ int nmi_active;
 
 static void __init wait_for_nmis(void *p)
 {
-    unsigned int cpu = smp_processor_id();
-    unsigned int start_count = nmi_count(cpu);
+    unsigned int start_count = this_cpu(nmi_count);
     unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
     unsigned long s, e;
 
     s = rdtsc();
     do {
         cpu_relax();
-        if ( nmi_count(cpu) >= start_count + 2 )
+        if ( this_cpu(nmi_count) >= start_count + 2 )
             break;
         e = rdtsc();
     } while( e - s < ticks );
@@ -177,7 +176,7 @@ void __init check_nmi_watchdog(void)
     printk("Testing NMI watchdog on all CPUs:");
 
     for_each_online_cpu ( cpu )
-        prev_nmi_count[cpu] = nmi_count(cpu);
+        prev_nmi_count[cpu] = per_cpu(nmi_count, cpu);
 
     /*
      * Wait at most 10 ticks for 2 watchdog NMIs on each CPU.
@@ -188,7 +187,7 @@ void __init check_nmi_watchdog(void)
 
     for_each_online_cpu ( cpu )
     {
-        if ( nmi_count(cpu) - prev_nmi_count[cpu] < 2 )
+        if ( per_cpu(nmi_count, cpu) - prev_nmi_count[cpu] < 2 )
         {
             printk(" %d", cpu);
             ok = false;
@@ -593,7 +592,7 @@ static void do_nmi_stats(unsigned char key)
 
     printk("CPU\tNMI\n");
     for_each_online_cpu ( cpu )
-        printk("%3u\t%3u\n", cpu, nmi_count(cpu));
+        printk("%3u\t%3u\n", cpu, per_cpu(nmi_count, cpu));
 
     if ( !hardware_domain || !(v = domain_vcpu(hardware_domain, 0)) )
         return;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 56067f85d1..3dbc66bb64 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1683,13 +1683,15 @@ static int dummy_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 
 static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 
+DEFINE_PER_CPU(unsigned int, nmi_count);
+
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
     unsigned char reason = 0;
     bool handle_unknown = false;
 
-    ++nmi_count(cpu);
+    this_cpu(nmi_count)++;
 
     if ( nmi_callback(regs, cpu) )
         return;
diff --git a/xen/include/asm-x86/nmi.h b/xen/include/asm-x86/nmi.h
index f9dfca6afb..a288f02a50 100644
--- a/xen/include/asm-x86/nmi.h
+++ b/xen/include/asm-x86/nmi.h
@@ -31,5 +31,7 @@ nmi_callback_t *set_nmi_callback(nmi_callback_t *callback);
  * Remove the handler previously set.
  */
 void unset_nmi_callback(void);
+
+DECLARE_PER_CPU(unsigned int, nmi_count);
  
 #endif /* ASM_NMI_H */
-- 
2.25.0


* [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context
  2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable Roger Pau Monne
@ 2020-02-24 10:46 ` Roger Pau Monne
  2020-02-25 16:49   ` Jan Beulich
  2020-02-25 16:51   ` Jan Beulich
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context Roger Pau Monne
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, Jan Beulich,
	Roger Pau Monne

Add helpers to track when running in #NMI context. This is modeled
after the in_irq helpers.

The SDM states that no #NMI can be delivered while handling a #NMI
until the processor has executed an iret instruction. It's possible
however that another fault is received while handling the #NMI (a #MC
for example), and thus the iret from that fault would allow further
#NMIs to be injected while still processing the previous one, and
hence an integer is needed in order to keep track of in service #NMIs.

While there, move nmi_count() into an x86-specific header and drop the
leading underscores from the __nmi_count field.

Note that there are no users of in_nmi() introduced by this change;
further users will be added by follow-up changes.
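
As a toy model (plain C, not Xen code) of why a nesting counter is needed
rather than a boolean flag for the scenario described above:

    #include <assert.h>
    #include <stdbool.h>

    static unsigned int nmi_depth; /* what this patch effectively implements */
    static bool nmi_flag;          /* what a boolean-based approach would do */

    static void enter(void) { nmi_depth++; nmi_flag = true; }
    static void leave(void) { nmi_depth--; nmi_flag = false; }

    int main(void)
    {
        enter();                 /* outer NMI */
        enter();                 /* nested NMI, possible after a #MC's iret */
        leave();                 /* nested NMI finishes */

        assert(nmi_depth == 1);  /* counter: still inside the outer NMI */
        assert(!nmi_flag);       /* boolean: has already lost that fact */

        leave();
        return 0;
    }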

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Use an integer instead of a boolean to keep track of in service
   #NMIs.
 - Move nmi_count into x86 specific header.
 - Drop leading underscores from __nmi_count field.
---
 xen/arch/x86/traps.c          | 6 ++++++
 xen/include/asm-x86/hardirq.h | 7 ++++++-
 xen/include/xen/irq_cpustat.h | 1 -
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 3dbc66bb64..f4f2c13ae9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1692,9 +1692,13 @@ void do_nmi(const struct cpu_user_regs *regs)
     bool handle_unknown = false;
 
     this_cpu(nmi_count)++;
+    nmi_enter();
 
     if ( nmi_callback(regs, cpu) )
+    {
+        nmi_exit();
         return;
+    }
 
     /*
      * Accessing port 0x61 may trap to SMM which has been actually
@@ -1720,6 +1724,8 @@ void do_nmi(const struct cpu_user_regs *regs)
         if ( !(reason & 0xc0) && handle_unknown )
             unknown_nmi_error(regs, reason);
     }
+
+    nmi_exit();
 }
 
 nmi_callback_t *set_nmi_callback(nmi_callback_t *callback)
diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
index 34e1b49260..6ccce75881 100644
--- a/xen/include/asm-x86/hardirq.h
+++ b/xen/include/asm-x86/hardirq.h
@@ -7,7 +7,7 @@
 typedef struct {
 	unsigned int __softirq_pending;
 	unsigned int __local_irq_count;
-	unsigned int __nmi_count;
+	unsigned int nmi_count;
 	bool_t __mwait_wakeup;
 } __cacheline_aligned irq_cpustat_t;
 
@@ -18,6 +18,11 @@ typedef struct {
 #define irq_enter()	(local_irq_count(smp_processor_id())++)
 #define irq_exit()	(local_irq_count(smp_processor_id())--)
 
+#define nmi_count(cpu)	__IRQ_STAT((cpu), nmi_count)
+#define in_nmi()	(nmi_count(smp_processor_id()) != 0)
+#define nmi_enter()	(nmi_count(smp_processor_id())++)
+#define nmi_exit()	(nmi_count(smp_processor_id())--)
+
 void ack_bad_irq(unsigned int irq);
 
 extern void apic_intr_init(void);
diff --git a/xen/include/xen/irq_cpustat.h b/xen/include/xen/irq_cpustat.h
index 73629f6ec8..b9629f25c2 100644
--- a/xen/include/xen/irq_cpustat.h
+++ b/xen/include/xen/irq_cpustat.h
@@ -24,7 +24,6 @@ extern irq_cpustat_t irq_stat[];
   /* arch independent irq_stat fields */
 #define softirq_pending(cpu)	__IRQ_STAT((cpu), __softirq_pending)
 #define local_irq_count(cpu)	__IRQ_STAT((cpu), __local_irq_count)
-#define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)
 #define mwait_wakeup(cpu)	__IRQ_STAT((cpu), __mwait_wakeup)
 
 #endif	/* __irq_cpustat_h */
-- 
2.25.0


* [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context
  2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable Roger Pau Monne
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context Roger Pau Monne
@ 2020-02-24 10:46 ` Roger Pau Monne
  2020-02-25 17:06   ` Jan Beulich
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask Roger Pau Monne
  4 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Add helpers to track when executing in #MC context. This is modeled
after the in_irq helpers.

Note that there are no users of in_mc() introduced by this change;
further users will be added by follow-up changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Move definition of mc_count to x86 hardirq.h.
---
 xen/arch/x86/cpu/mcheck/mce.c | 2 ++
 xen/include/asm-x86/hardirq.h | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index d61e582af3..93ed5752ac 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -93,7 +93,9 @@ void x86_mce_vector_register(x86_mce_vector_t hdlr)
 
 void do_machine_check(const struct cpu_user_regs *regs)
 {
+    mc_enter();
     _machine_check_vector(regs);
+    mc_exit();
 }
 
 /*
diff --git a/xen/include/asm-x86/hardirq.h b/xen/include/asm-x86/hardirq.h
index 6ccce75881..16dbe27de4 100644
--- a/xen/include/asm-x86/hardirq.h
+++ b/xen/include/asm-x86/hardirq.h
@@ -8,6 +8,7 @@ typedef struct {
 	unsigned int __softirq_pending;
 	unsigned int __local_irq_count;
 	unsigned int nmi_count;
+	unsigned int mc_count;
 	bool_t __mwait_wakeup;
 } __cacheline_aligned irq_cpustat_t;
 
@@ -23,6 +24,11 @@ typedef struct {
 #define nmi_enter()	(nmi_count(smp_processor_id())++)
 #define nmi_exit()	(nmi_count(smp_processor_id())--)
 
+#define mc_count(cpu)	__IRQ_STAT((cpu), mc_count)
+#define in_mc() 	(mc_count(smp_processor_id()) != 0)
+#define mc_enter()	(mc_count(smp_processor_id())++)
+#define mc_exit()	(mc_count(smp_processor_id())--)
+
 void ack_bad_irq(unsigned int irq);
 
 extern void apic_intr_init(void);
-- 
2.25.0


* [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (2 preceding siblings ...)
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context Roger Pau Monne
@ 2020-02-24 10:46 ` Roger Pau Monne
  2020-02-26 10:07   ` Jan Beulich
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask Roger Pau Monne
  4 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Sander Eikelenboom, Wei Liu, Jan Beulich, Roger Pau Monne

Using scratch_cpumask in send_IPI_mask is not safe because it can be
called from interrupt context, and hence Xen would have to make sure
all the users of the scratch cpumask disable interrupts while using
it.

Instead introduce a new cpumask to be used by send_IPI_mask, and
disable interrupts while using it.

Note that such a mask cannot be used when non-maskable interrupts are
being serviced (#NMI or #MC), hence fall back to not using the shorthand
in that case, as was done previously.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Fallback to the previous IPI sending mechanism in #MC or #NMI
   context.

Changes since v1:
 - Don't use the shorthand when in #MC or #NMI context.
---
 xen/arch/x86/smp.c     | 27 ++++++++++++++++++++++++++-
 xen/arch/x86/smpboot.c |  9 ++++++++-
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 55d08c9d52..53e0de2a70 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
     apic_write(APIC_ICR, cfg);
 }
 
+DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
 /*
  * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
  * excluding the local CPU. @cpumask may be empty.
@@ -67,7 +68,21 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
+    unsigned long flags;
+
+    if ( in_mc() || in_nmi() )
+    {
+        /*
+         * When in #NMI or #MC context fallback to the old (and simpler) IPI
+         * sending routine, and avoid doing any performance optimizations (like
+         * using a shorthand). Sending an IPI from such contexts is likely a
+         * sign of a crash, and hence we would like to avoid as much complexity
+         * as possible in order to make sure the IPI is delivered.
+         */
+        alternative_vcall(genapic.send_IPI_mask, mask, vector);
+        return;
+    }
 
     /*
      * This can only be safely used when no CPU hotplug or unplug operations
@@ -81,7 +96,15 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
          local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
          (park_offline_cpus ||
           cpumask_equal(&cpu_online_map, &cpu_present_map)) )
+    {
+        /*
+         * send_IPI_mask can be called from interrupt context, and hence we
+         * need to disable interrupts in order to protect the per-cpu
+         * send_ipi_cpumask while being used.
+         */
+        local_irq_save(flags);
         cpumask_or(scratch, mask, cpumask_of(smp_processor_id()));
+    }
     else
     {
         if ( cpus_locked )
@@ -89,6 +112,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
             put_cpu_maps();
             cpus_locked = false;
         }
+        local_irq_save(flags);
         cpumask_clear(scratch);
     }
 
@@ -97,6 +121,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
     else
         alternative_vcall(genapic.send_IPI_mask, mask, vector);
 
+    local_irq_restore(flags);
     if ( cpus_locked )
         put_cpu_maps();
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index e83e4564a4..82e89201b3 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -57,6 +57,9 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
+static cpumask_t send_ipi_cpu0mask;
+
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
@@ -930,6 +933,8 @@ static void cpu_smpboot_free(unsigned int cpu, bool remove)
         FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
         if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
             FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+        if ( per_cpu(send_ipi_cpumask, cpu) != &send_ipi_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(send_ipi_cpumask, cpu));
     }
 
     cleanup_cpu_root_pgt(cpu);
@@ -1034,7 +1039,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
 
     if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
            cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(send_ipi_cpumask, cpu))) )
         goto out;
 
     rc = 0;
@@ -1175,6 +1181,7 @@ void __init smp_prepare_boot_cpu(void)
     cpumask_set_cpu(cpu, &cpu_present_map);
 #if NR_CPUS > 2 * BITS_PER_LONG
     per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
+    per_cpu(send_ipi_cpumask, cpu) = &send_ipi_cpu0mask;
 #endif
 
     get_cpu_info()->use_pv_cr3 = false;
-- 
2.25.0


* [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask
  2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
                   ` (3 preceding siblings ...)
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
@ 2020-02-24 10:46 ` Roger Pau Monne
  2020-02-26 10:36   ` Jan Beulich
  4 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monne @ 2020-02-24 10:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Current usage of the per-CPU scratch cpumask is dangerous since
there's no way to figure out if the mask is already being used except
for manual code inspection of all the callers and possible call paths.

This is unsafe and not reliable, so introduce a minimal get/put
infrastructure to prevent nested usage of the scratch mask and usage
in interrupt context.

Move the declaration of scratch_cpumask to smp.c in order to place the
declaration and the accessors as close as possible.
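
For illustration (hypothetical caller; the helper name below is made up),
the kind of misuse the debug accessors are meant to catch is:

    cpumask_t *mask = get_scratch_cpumask();

    cpumask_and(mask, a, b);
    helper_that_also_uses_the_scratch_mask();
    /*
     * If that helper calls get_scratch_cpumask() itself, debug builds now
     * hit BUG() and report both callers, instead of silently corrupting
     * 'mask' as would happen with direct this_cpu(scratch_cpumask) usage.
     */
    put_scratch_cpumask();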

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Use __builtin_return_address(0) instead of __func__.
 - Move declaration of scratch_cpumask and scratch_cpumask accessor to
   smp.c.
 - Do not allow usage in #MC or #NMI context.
---
 xen/arch/x86/io_apic.c    |  6 ++++--
 xen/arch/x86/irq.c        | 13 ++++++++++---
 xen/arch/x86/mm.c         | 30 +++++++++++++++++++++---------
 xen/arch/x86/msi.c        |  4 +++-
 xen/arch/x86/smp.c        | 25 +++++++++++++++++++++++++
 xen/arch/x86/smpboot.c    |  1 -
 xen/include/asm-x86/smp.h | 10 ++++++++++
 7 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..4ee261b632 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
     entry.vector = vector;
 
     if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..7ecf5376e3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
     trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
 
     if ( likely(!desc->arch.move_in_progress) )
+    {
+        put_scratch_cpumask();
         return;
+    }
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
@@ -236,6 +239,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
+    put_scratch_cpumask();
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1156,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask();
         return;
     }
 
@@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
     unsigned int irq;
     static int warned;
     struct irq_desc *desc;
+    cpumask_t *affinity = get_scratch_cpumask();
 
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = false, set_affinity = true;
         unsigned int vector;
-        cpumask_t *affinity = this_cpu(scratch_cpumask);
 
         if ( irq == 2 )
             continue;
@@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
                    irq, CPUMASK_PR(affinity));
     }
 
+    put_scratch_cpumask();
+
     /* That doesn't seem sufficient.  Give it 1ms. */
     local_irq_enable();
     mdelay(1);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 70b87c4830..0320a9ad98 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1262,7 +1262,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             struct vcpu *v;
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             cpumask_clear(mask);
 
@@ -1279,6 +1279,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 
             if ( !cpumask_empty(mask) )
                 flush_tlb_mask(mask);
+            put_scratch_cpumask();
         }
 #endif /* CONFIG_PV_LDT_PAGING */
         put_page(page);
@@ -2903,7 +2904,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                  * vital that no other CPUs are left with mappings of a frame
                  * which is about to become writeable to the guest.
                  */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 BUG_ON(in_irq());
                 cpumask_copy(mask, d->dirty_cpumask);
@@ -2919,6 +2920,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                     perfc_incr(need_flush_tlb_flush);
                     flush_tlb_mask(mask);
                 }
+                put_scratch_cpumask();
 
                 /* We lose existing type and validity. */
                 nx &= ~(PGT_type_mask | PGT_validated);
@@ -3635,7 +3637,7 @@ long do_mmuext_op(
         case MMUEXT_TLB_FLUSH_MULTI:
         case MMUEXT_INVLPG_MULTI:
         {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             if ( unlikely(currd != pg_owner) )
                 rc = -EPERM;
@@ -3645,12 +3647,17 @@ long do_mmuext_op(
                                    mask)) )
                 rc = -EINVAL;
             if ( unlikely(rc) )
+            {
+                put_scratch_cpumask();
                 break;
+            }
 
             if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
                 flush_tlb_mask(mask);
             else if ( __addr_ok(op.arg1.linear_addr) )
                 flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            put_scratch_cpumask();
+
             break;
         }
 
@@ -3683,7 +3690,7 @@ long do_mmuext_op(
             else if ( likely(cache_flush_permitted(currd)) )
             {
                 unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 cpumask_clear(mask);
                 for_each_online_cpu(cpu)
@@ -3691,6 +3698,7 @@ long do_mmuext_op(
                                              per_cpu(cpu_sibling_mask, cpu)) )
                         __cpumask_set_cpu(cpu, mask);
                 flush_mask(mask, FLUSH_CACHE);
+                put_scratch_cpumask();
             }
             else
                 rc = -EINVAL;
@@ -4156,12 +4164,13 @@ long do_mmu_update(
          * Force other vCPU-s of the affected guest to pick up L4 entry
          * changes (if any).
          */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
+        cpumask_t *mask = get_scratch_cpumask();
 
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
+        cpumask_andnot(mask, pt_owner->dirty_cpumask,
+                       cpumask_of(smp_processor_id()));
         if ( !cpumask_empty(mask) )
             flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
+        put_scratch_cpumask();
     }
 
     perfc_add(num_page_updates, i);
@@ -4353,7 +4362,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4373,7 +4382,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4384,6 +4393,9 @@ static int __do_update_va_mapping(
         break;
     }
 
+    if ( mask && mask != d->dirty_cpumask )
+        put_scratch_cpumask();
+
     return rc;
 }
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 161ee60dbe..6624ea20d0 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
 
     if ( cpu_mask )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask;
 
         if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
             return;
 
+        mask = get_scratch_cpumask();
         cpumask_and(mask, cpu_mask, &cpu_online_map);
         msg->dest32 = cpu_mask_to_apicid(mask);
+        put_scratch_cpumask();
     }
 
     msg->address_hi = MSI_ADDR_BASE_HI;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 53e0de2a70..4d9640d135 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -25,6 +25,31 @@
 #include <irq_vectors.h>
 #include <mach_apic.h>
 
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
+
+#ifndef NDEBUG
+cpumask_t *scratch_cpumask(bool use)
+{
+    static DEFINE_PER_CPU(void *, scratch_cpumask_use);
+
+    /*
+     * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
+     * context.
+     */
+    BUG_ON(in_irq() || in_mc() || in_nmi());
+
+    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
+    {
+        printk("%p: scratch CPU mask already in use by %p\n",
+               __builtin_return_address(0), this_cpu(scratch_cpumask_use));
+        BUG();
+    }
+    this_cpu(scratch_cpumask_use) = use ? __builtin_return_address(0) : NULL;
+
+    return use ? this_cpu(scratch_cpumask) : NULL;
+}
+#endif
+
 /* Helper functions to prepare APIC register values. */
 static unsigned int prepare_ICR(unsigned int shortcut, int vector)
 {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 82e89201b3..a2ac3adb38 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -54,7 +54,6 @@ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask);
 /* representing HT and core siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask);
 
-DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
 static cpumask_t scratch_cpu0mask;
 
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, send_ipi_cpumask);
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 92d69a5ea0..40ab6c251d 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -23,6 +23,16 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+#ifndef NDEBUG
+/* Not to be called directly, use {get/put}_scratch_cpumask(). */
+cpumask_t *scratch_cpumask(bool use);
+#define get_scratch_cpumask() scratch_cpumask(true)
+#define put_scratch_cpumask() ((void)scratch_cpumask(false))
+#else
+#define get_scratch_cpumask() this_cpu(scratch_cpumask)
+#define put_scratch_cpumask()
+#endif
+
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
  * would otherwise prefer them to be off?
-- 
2.25.0


* Re: [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable Roger Pau Monne
@ 2020-02-25 16:39   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-02-25 16:39 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 24.02.2020 11:46, Roger Pau Monne wrote:
> This is modeled after the irq_count variable, and is used to account
> for all the NMIs handled by the system.
> 
> This will allow repurposing the nmi_count() helper so it can be used
> in a similar manner to local_irq_count(): accounting for the NMIs
> currently in service.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

This all looks fine, but shouldn't this be accompanied by dropping
nmi_count() and its underlying struct field?

Jan

* Re: [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context Roger Pau Monne
@ 2020-02-25 16:49   ` Jan Beulich
  2020-02-25 16:51   ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-02-25 16:49 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, xen-devel

On 24.02.2020 11:46, Roger Pau Monne wrote:
> Add helpers to track when running in #NMI context. This is modeled
> after the in_irq helpers.
> 
> The SDM states that no #NMI can be delivered while handling a #NMI
> until the processor has executed an iret instruction. It's possible
> however that another fault is received while handling the #NMI (a #MC
> for example), and thus the iret from that fault would allow further
> #NMIs to be injected while still processing the previous one, and
> hence an integer is needed in order to keep track of in service #NMIs.

While I agree that this needs taking care of, I'm afraid in_nmi()
is becoming ambiguous because of this - you give it the meaning
"we're handling an NMI", while one could also assume it to mean
"we're in NMI context, i.e. further NMIs are not possible". IOW
I think we want to consider using another name, despite this
getting things less nicely aligned with in_irq(). Perhaps
in_nmi_handler()?

> While there, move nmi_count() into an x86-specific header and drop the
> leading underscores from the __nmi_count field.

I notice you re-use the fields that I suggested to be removed by
the prior patch. I think logically they should indeed be removed
there, and a new field and a new macro be introduced here.

Jan

* Re: [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context Roger Pau Monne
  2020-02-25 16:49   ` Jan Beulich
@ 2020-02-25 16:51   ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-02-25 16:51 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Ian Jackson, George Dunlap, xen-devel

On 24.02.2020 11:46, Roger Pau Monne wrote:
> @@ -18,6 +18,11 @@ typedef struct {
>  #define irq_enter()	(local_irq_count(smp_processor_id())++)
>  #define irq_exit()	(local_irq_count(smp_processor_id())--)
>  
> +#define nmi_count(cpu)	__IRQ_STAT((cpu), nmi_count)

Oh, btw (noticed only while already looking at the next
patch) - no need for the parentheses around "cpu" afaict.

Jan

* Re: [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context Roger Pau Monne
@ 2020-02-25 17:06   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-02-25 17:06 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 24.02.2020 11:46, Roger Pau Monne wrote:
> Add helpers to track when executing in #MC context. This is modeled
> after the in_irq helpers.

Same concern regarding the name as for NMI, i.e. perhaps
in_mce_handler() here. (Note also the added 'e', which I think
would make for slightly less in risk of becoming ambiguous
names.) It may be worthwhile also talking about the possible
(in theory) nesting here, as you do in the NMI patch
description. (In practice nesting of #MC is fatal, because of
the stack switch we request when the handler gets invoked.
But this is something that could be changed, so assuming here
that nesting is possible seems appropriate to me.)

Jan


* Re: [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
@ 2020-02-26 10:07   ` Jan Beulich
  2020-02-26 10:18     ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2020-02-26 10:07 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Sander Eikelenboom, Wei Liu, Andrew Cooper

On 24.02.2020 11:46, Roger Pau Monne wrote:
> Using scratch_cpumask in send_IPI_mask is not safe because it can be
> called from interrupt context, and hence Xen would have to make sure
> all the users of the scratch cpumask disable interrupts while using
> it.
> 
> Instead introduce a new cpumask to be used by send_IPI_mask, and
> disable interrupts while using it.

The alternative of also adding ...

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>      apic_write(APIC_ICR, cfg);
>  }
>  
> +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
>  /*
>   * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
>   * excluding the local CPU. @cpumask may be empty.
> @@ -67,7 +68,21 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> +    unsigned long flags;
> +
> +    if ( in_mc() || in_nmi() )

... in_irq() here was considered, and discarded because of? With
this you'd not need the second CPU mask and you'd also be able
to avoid disabling and re-enabling IRQs.

In order to not disturb the partial sentence, a small request on
the previous hunk as well: Could you add a blank line after the
new definition, please?

> +    {
> +        /*
> +         * When in #NMI or #MC context fallback to the old (and simpler) IPI

Note that conventional notation indeed is #MC but just NMI (applies
here, in the description, and also elsewhere in the series).

> @@ -81,7 +96,15 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
>           local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
>           (park_offline_cpus ||
>            cpumask_equal(&cpu_online_map, &cpu_present_map)) )
> +    {
> +        /*
> +         * send_IPI_mask can be called from interrupt context, and hence we
> +         * need to disable interrupts in order to protect the per-cpu
> +         * send_ipi_cpumask while being used.
> +         */
> +        local_irq_save(flags);
>          cpumask_or(scratch, mask, cpumask_of(smp_processor_id()));
> +    }
>      else
>      {
>          if ( cpus_locked )
> @@ -89,6 +112,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
>              put_cpu_maps();
>              cpus_locked = false;
>          }
> +        local_irq_save(flags);
>          cpumask_clear(scratch);
>      }
>  
> @@ -97,6 +121,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
>      else
>          alternative_vcall(genapic.send_IPI_mask, mask, vector);
>  
> +    local_irq_restore(flags);

Wouldn't it be better to re-enable interrupts in the "else" branch
visible in context ahead of the call?

Jan

* Re: [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-26 10:07   ` Jan Beulich
@ 2020-02-26 10:18     ` Roger Pau Monné
  2020-02-26 10:45       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2020-02-26 10:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Sander Eikelenboom, Wei Liu, Andrew Cooper

On Wed, Feb 26, 2020 at 11:07:44AM +0100, Jan Beulich wrote:
> On 24.02.2020 11:46, Roger Pau Monne wrote:
> > Using scratch_cpumask in send_IPI_mask is not safe because it can be
> > called from interrupt context, and hence Xen would have to make sure
> > all the users of the scratch cpumask disable interrupts while using
> > it.
> > 
> > Instead introduce a new cpumask to be used by send_IPI_mask, and
> > disable interrupts while using it.
> 
> The alternative of also adding ...
> 
> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >      apic_write(APIC_ICR, cfg);
> >  }
> >  
> > +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
> >  /*
> >   * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
> >   * excluding the local CPU. @cpumask may be empty.
> > @@ -67,7 +68,21 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >  void send_IPI_mask(const cpumask_t *mask, int vector)
> >  {
> >      bool cpus_locked = false;
> > -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> > +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> > +    unsigned long flags;
> > +
> > +    if ( in_mc() || in_nmi() )
> 
> ... in_irq() here was considered, and discarded because of? With
> this you'd not need the second CPU mask and you'd also be able
> to avoid disabling and re-enabling IRQs.

I aimed to use the shorthand as much as possible, but I would also be
fine with not using it in irq context. I assume there aren't that many
flushes in irq context, and then the IPIs sent are probably not
broadcast ones.

> 
> In order to not disturb the partial sentence, a small request on
> the previous hunk as well: Could you add a blank line after the
> new definition, please?
> 
> > +    {
> > +        /*
> > +         * When in #NMI or #MC context fallback to the old (and simpler) IPI
> 
> Note that conventional notation indeed is #MC but just NMI (applies
> here, in the description, and also elsewhere in the series).
> 
> > @@ -81,7 +96,15 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
> >           local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
> >           (park_offline_cpus ||
> >            cpumask_equal(&cpu_online_map, &cpu_present_map)) )
> > +    {
> > +        /*
> > +         * send_IPI_mask can be called from interrupt context, and hence we
> > +         * need to disable interrupts in order to protect the per-cpu
> > +         * send_ipi_cpumask while being used.
> > +         */
> > +        local_irq_save(flags);
> >          cpumask_or(scratch, mask, cpumask_of(smp_processor_id()));
> > +    }
> >      else
> >      {
> >          if ( cpus_locked )
> > @@ -89,6 +112,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
> >              put_cpu_maps();
> >              cpus_locked = false;
> >          }
> > +        local_irq_save(flags);
> >          cpumask_clear(scratch);
> >      }
> >  
> > @@ -97,6 +121,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
> >      else
> >          alternative_vcall(genapic.send_IPI_mask, mask, vector);
> >  
> > +    local_irq_restore(flags);
> 
> Wouldn't it be better to re-enable interrupts in the "else" branch
> visible in context ahead of the call?

I think I will go with your suggestion and not use the shorthand in
irq context, and hence we won't need to disable interrupts then.

Thanks, Roger.

* Re: [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask
  2020-02-24 10:46 ` [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask Roger Pau Monne
@ 2020-02-26 10:36   ` Jan Beulich
  2020-02-27 17:54     ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2020-02-26 10:36 UTC (permalink / raw)
  To: Roger Pau Monne, Andrew Cooper; +Cc: xen-devel, Wei Liu

On 24.02.2020 11:46, Roger Pau Monne wrote:
> Current usage of the per-CPU scratch cpumask is dangerous since
> there's no way to figure out if the mask is already being used except
> for manual code inspection of all the callers and possible call paths.
> 
> This is unsafe and not reliable, so introduce a minimal get/put
> infrastructure to prevent nested usage of the scratch mask and usage
> in interrupt context.

While I can see the reasoning (especially in light of the change
which did violate the assumption), I'm still uncertain if this isn't
"over-engineering". Andrew, do you have a clear opinion one way or
the other here?

> Move the declaration of scratch_cpumask to smp.c in order to place the
> declaration and the accessors as close as possible.

s/declaration/definition/g

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>  {
>      unsigned int cpu, old_vector, irq = desc->irq;
>      unsigned int vector = desc->arch.vector;
> -    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
> +    cpumask_t *tmp_mask = get_scratch_cpumask();
>  
>      BUG_ON(!valid_irq_vector(vector));
>  
> @@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
>      trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
>  
>      if ( likely(!desc->arch.move_in_progress) )
> +    {
> +        put_scratch_cpumask();
>          return;
> +    }

I think if possible such error path adjustments would better be
avoided. And this seems feasible here: There are two entirely
independent uses of the scratch mask in this function. You could
therefore put the mask above from here, and get it again further
down, or you could leverage a property of the current
implementation, plus the fact that the 2nd use doesn't involve
any "real" function calls, and avoid a 2nd get/put altogether.

Of course another question then is whether it is a good property
of the current model, i.e. whether it wouldn't be better for
"put" to actually zap the pointer, to prevent subsequent use.

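(Purely as a sketch of that idea, not something present in this patch: such
a zapping variant could take the mask variable itself, e.g.

    #define put_scratch_cpumask(m) do {                \
        ASSERT((m) == this_cpu(scratch_cpumask));      \
        scratch_cpumask(false);                        \
        (m) = NULL;                                    \
    } while ( false )

so that any use of the pointer after the "put" becomes an obvious NULL
dereference rather than silent reuse.)
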
> @@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>      unsigned int irq;
>      static int warned;
>      struct irq_desc *desc;
> +    cpumask_t *affinity = get_scratch_cpumask();
>  
>      for ( irq = 0; irq < nr_irqs; irq++ )
>      {
>          bool break_affinity = false, set_affinity = true;
>          unsigned int vector;
> -        cpumask_t *affinity = this_cpu(scratch_cpumask);
>  
>          if ( irq == 2 )
>              continue;
> @@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>                     irq, CPUMASK_PR(affinity));
>      }
>  
> +    put_scratch_cpumask();

Just as a remark, not necessarily as a request to change the code: I
wonder if down the road this pretty wide scope of "holding" the mask
isn't going to bite us, when a function called from here (in a range
of code not actively needing the mask) also may want to use the mask.
But of course we can make this finer grained at the point where it
might actually start mattering.

> @@ -3645,12 +3647,17 @@ long do_mmuext_op(
>                                     mask)) )
>                  rc = -EINVAL;
>              if ( unlikely(rc) )
> +            {
> +                put_scratch_cpumask();
>                  break;
> +            }

Again, instead of adjusting an error path, how about making this
have an empty statement (i.e. dropping the break) and ...

>              if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )

... having this become "else if()"?
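
I.e., roughly (an untested sketch of what I mean; the single
put_scratch_cpumask() and break at the end then cover all cases):

    if ( unlikely(rc) )
        ; /* nothing to flush */
    else if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
        flush_tlb_mask(mask);
    else if ( __addr_ok(op.arg1.linear_addr) )
        flush_tlb_one_mask(mask, op.arg1.linear_addr);

    put_scratch_cpumask();
    break;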

> @@ -4384,6 +4393,9 @@ static int __do_update_va_mapping(
>          break;
>      }
>  
> +    if ( mask && mask != d->dirty_cpumask )
> +        put_scratch_cpumask();

The right side of the && here makes things feel a little fragile for
me.

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
>  
>      if ( cpu_mask )
>      {
> -        cpumask_t *mask = this_cpu(scratch_cpumask);
> +        cpumask_t *mask;
>  
>          if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
>              return;
>  
> +        mask = get_scratch_cpumask();
>          cpumask_and(mask, cpu_mask, &cpu_online_map);
>          msg->dest32 = cpu_mask_to_apicid(mask);
> +        put_scratch_cpumask();
>      }

This, I think, could do with a little more changing:

    if ( cpu_mask )
    {
        cpumask_t *mask = get_scratch_cpumask();

        cpumask_and(mask, cpu_mask, &cpu_online_map);
        if ( !cpumask_empty(mask) )
            msg->dest32 = cpu_mask_to_apicid(mask);
        put_scratch_cpumask();
    }

This way instead of looking twice at two cpumask_t instances, the
2nd one involves just one. Thoughts?

> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -25,6 +25,31 @@
>  #include <irq_vectors.h>
>  #include <mach_apic.h>
>  
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> +
> +#ifndef NDEBUG
> +cpumask_t *scratch_cpumask(bool use)
> +{
> +    static DEFINE_PER_CPU(void *, scratch_cpumask_use);
> +
> +    /*
> +     * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> +     * context.
> +     */
> +    BUG_ON(in_irq() || in_mc() || in_nmi());
> +
> +    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> +    {
> +        printk("%p: scratch CPU mask already in use by %p\n",
> +               __builtin_return_address(0), this_cpu(scratch_cpumask_use));

__builtin_return_address(0) simply shows another time what ...

> +        BUG();

... this already will show. I'd suggest to drop it. Also I think
you want %ps here.

Jan

* Re: [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-26 10:18     ` Roger Pau Monné
@ 2020-02-26 10:45       ` Jan Beulich
  2020-02-26 10:51         ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2020-02-26 10:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Sander Eikelenboom, Wei Liu, Andrew Cooper

On 26.02.2020 11:18, Roger Pau Monné wrote:
> On Wed, Feb 26, 2020 at 11:07:44AM +0100, Jan Beulich wrote:
>> On 24.02.2020 11:46, Roger Pau Monne wrote:
>>> Using scratch_cpumask in send_IPI_mask is not safe because it can be
>>> called from interrupt context, and hence Xen would have to make sure
>>> all the users of the scratch cpumask disable interrupts while using
>>> it.
>>>
>>> Instead introduce a new cpumask to be used by send_IPI_mask, and
>>> disable interrupts while using it.
>>
>> The alternative of also adding ...
>>
>>> --- a/xen/arch/x86/smp.c
>>> +++ b/xen/arch/x86/smp.c
>>> @@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>      apic_write(APIC_ICR, cfg);
>>>  }
>>>  
>>> +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
>>>  /*
>>>   * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
>>>   * excluding the local CPU. @cpumask may be empty.
>>> @@ -67,7 +68,21 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>>  {
>>>      bool cpus_locked = false;
>>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
>>> +    unsigned long flags;
>>> +
>>> +    if ( in_mc() || in_nmi() )
>>
>> ... in_irq() here was considered, and discarded because of? With
>> this you'd not need the second CPU mask and you'd also be able
>> to avoid disabling and re-enabling IRQs.
> 
> I aimed to use the shorthand as much as possible, but I would also be
> fine with not using it in irq context. I assume there aren't that many
> flushes in irq context, and then the IPIs sent are probably not
> broadcast ones.

Well, here it's not just flushes, is it? Knowing some (typical)
IRQ context uses could allow us to take a better decision. After
Sander's report, did you actually identify what path(s) the
earlier change broke?

Jan

* Re: [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask
  2020-02-26 10:45       ` Jan Beulich
@ 2020-02-26 10:51         ` Roger Pau Monné
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Pau Monné @ 2020-02-26 10:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Sander Eikelenboom, Wei Liu, Andrew Cooper

On Wed, Feb 26, 2020 at 11:45:40AM +0100, Jan Beulich wrote:
> On 26.02.2020 11:18, Roger Pau Monné wrote:
> > On Wed, Feb 26, 2020 at 11:07:44AM +0100, Jan Beulich wrote:
> >> On 24.02.2020 11:46, Roger Pau Monne wrote:
> >>> Using scratch_cpumask in send_IPI_mask is not safe because it can be
> >>> called from interrupt context, and hence Xen would have to make sure
> >>> all the users of the scratch cpumask disable interrupts while using
> >>> it.
> >>>
> >>> Instead introduce a new cpumask to be used by send_IPI_mask, and
> >>> disable interrupts while using it.
> >>
> >> The alternative of also adding ...
> >>
> >>> --- a/xen/arch/x86/smp.c
> >>> +++ b/xen/arch/x86/smp.c
> >>> @@ -59,6 +59,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>      apic_write(APIC_ICR, cfg);
> >>>  }
> >>>  
> >>> +DECLARE_PER_CPU(cpumask_var_t, send_ipi_cpumask);
> >>>  /*
> >>>   * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
> >>>   * excluding the local CPU. @cpumask may be empty.
> >>> @@ -67,7 +68,21 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
> >>>  void send_IPI_mask(const cpumask_t *mask, int vector)
> >>>  {
> >>>      bool cpus_locked = false;
> >>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> >>> +    cpumask_t *scratch = this_cpu(send_ipi_cpumask);
> >>> +    unsigned long flags;
> >>> +
> >>> +    if ( in_mc() || in_nmi() )
> >>
> >> ... in_irq() here was considered, and discarded because of? With
> >> this you'd not need the second CPU mask and you'd also be able
> >> to avoid disabling and re-enabling IRQs.
> > 
> > I aimed to use the shorthand as much as possible, but I would also be
> > fine with not using it in irq context. I assume there aren't that many
> > flushes in irq context, and then the IPIs sent are probably not
> > broadcast ones.
> 
> Well, here it's not just flushes, is it?

No, this covers all IPIs. My remark was that flush IPIs (together with
global rendezvous) are more likely to target all CPUs than other IPIs,
but I assume those aren't that frequent.

> Knowing some (typical)
> IRQ context uses could allow us to take a better decision. After
> Sander's report, did you actually identify what path(s) the
> earlier change broke?

No, I assume it's something related to passthrough, but I haven't been able
to trigger it myself.

Going for the easier solution right now seems like the most sensible
approach; we can always add a dedicated mask later if necessary.
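
Roughly something along these lines, i.e. simply don't use the shorthand
(and hence any scratch mask) when we might have interrupted another user
of it. This is only a sketch, not a posted hunk, and the exact fallback
call is an assumption about the surrounding code:

void send_IPI_mask(const cpumask_t *mask, int vector)
{
    if ( in_irq() || in_mc() || in_nmi() )
    {
        /*
         * We might have interrupted another user of the per-CPU scratch
         * mask (or of other per-CPU state), so skip the shorthand and
         * fall back to the plain per-CPU IPI path.
         */
        alternative_vcall(genapic.send_IPI_mask, mask, vector);
        return;
    }

    /* ... existing shorthand logic, free to use this_cpu(scratch_cpumask) ... */
}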

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask
  2020-02-26 10:36   ` Jan Beulich
@ 2020-02-27 17:54     ` Roger Pau Monné
  2020-02-28  8:48       ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2020-02-27 17:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Feb 26, 2020 at 11:36:52AM +0100, Jan Beulich wrote:
> On 24.02.2020 11:46, Roger Pau Monne wrote:
> > Current usage of the per-CPU scratch cpumask is dangerous since
> > there's no way to figure out if the mask is already being used except
> > for manual code inspection of all the callers and possible call paths.
> > 
> > This is unsafe and not reliable, so introduce a minimal get/put
> > infrastructure to prevent nested usage of the scratch mask and usage
> > in interrupt context.
> 
> While I can see the reasoning (especially in light of the change
> which did violate the assumption), I'm still uncertain if this isn't
> "over-engineering". Andrew, do you have a clear opinion one way or
> the other here?
> 
> > Move the declaration of scratch_cpumask to smp.c in order to place the
> > declaration and the accessors as close as possible.
> 
> s/declaration/definition/g

Done.

> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >  {
> >      unsigned int cpu, old_vector, irq = desc->irq;
> >      unsigned int vector = desc->arch.vector;
> > -    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
> > +    cpumask_t *tmp_mask = get_scratch_cpumask();
> >  
> >      BUG_ON(!valid_irq_vector(vector));
> >  
> > @@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
> >      trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
> >  
> >      if ( likely(!desc->arch.move_in_progress) )
> > +    {
> > +        put_scratch_cpumask();
> >          return;
> > +    }
> 
> I think if possible such error path adjustments would better be
> avoided. And this seems feasible here: There are two entirely
> independent uses of the scratch mask in this function. You could
> therefore put the mask above from here, and get it again further
> down, or you could leverage a property of the current
> implementation, plus the fact that the 2nd use doesn't involve
> any "real" function calls, and avoid a 2nd get/put altogether.

No, it's very easy to add function calls later on and forget to use
get_scratch_cpumask.

> Of course another question then is whether it is a good property
> of the current model, i.e. whether it wouldn't be better for
> "put" to actually zap the pointer, to prevent subsequent use.

So that put_scratch_cpumask takes the pointer as a parameter and
writes NULL to it?
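
Something like the below, maybe. Just a sketch of the idea; the exact
body of the helper doesn't matter much here:

#define put_scratch_cpumask(mask) do {                    \
    scratch_cpumask(false); /* release, as done today */  \
    (mask) = NULL;          /* catch stale uses */        \
} while ( false )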

> > @@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >      unsigned int irq;
> >      static int warned;
> >      struct irq_desc *desc;
> > +    cpumask_t *affinity = get_scratch_cpumask();
> >  
> >      for ( irq = 0; irq < nr_irqs; irq++ )
> >      {
> >          bool break_affinity = false, set_affinity = true;
> >          unsigned int vector;
> > -        cpumask_t *affinity = this_cpu(scratch_cpumask);
> >  
> >          if ( irq == 2 )
> >              continue;
> > @@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                     irq, CPUMASK_PR(affinity));
> >      }
> >  
> > +    put_scratch_cpumask();
> 
> Just as a remark, not necessarily as a request to change the code: I
> wonder if down the road this pretty wide scope of "holding" the mask
> isn't going to bite us, when a function called from here (in a range
> of code not actively needing the mask) may also want to use the mask.
> But of course we can make this finer grained at the point where it
> might actually start mattering.

We can always reduce the scope if there's a need for it; until then I
would rather leave this as-is.

> 
> > @@ -3645,12 +3647,17 @@ long do_mmuext_op(
> >                                     mask)) )
> >                  rc = -EINVAL;
> >              if ( unlikely(rc) )
> > +            {
> > +                put_scratch_cpumask();
> >                  break;
> > +            }
> 
> Again, instead of adjusting an error path, how about making this
> have an empty statement (i.e. dropping the break) and ...
> 
> >              if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
> 
> ... having this become "else if()"?
> 
> > @@ -4384,6 +4393,9 @@ static int __do_update_va_mapping(
> >          break;
> >      }
> >  
> > +    if ( mask && mask != d->dirty_cpumask )
> > +        put_scratch_cpumask();
> 
> The right side of the && here makes things feel a little fragile for
> me.

What about using:

switch ( flags & ~UVMF_FLUSHTYPE_MASK )
{
case UVMF_LOCAL:
case UVMF_ALL:
    break;

default:
    put_scratch_cpumask();
}

I could also use an if, but I think it's clearer with a switch.

> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
> >  
> >      if ( cpu_mask )
> >      {
> > -        cpumask_t *mask = this_cpu(scratch_cpumask);
> > +        cpumask_t *mask;
> >  
> >          if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
> >              return;
> >  
> > +        mask = get_scratch_cpumask();
> >          cpumask_and(mask, cpu_mask, &cpu_online_map);
> >          msg->dest32 = cpu_mask_to_apicid(mask);
> > +        put_scratch_cpumask();
> >      }
> 
> This, I think, could do with a little more changing:
> 
>     if ( cpu_mask )
>     {
>         cpumask_t *mask = get_scratch_cpumask();
> 
>         cpumask_and(mask, cpu_mask, &cpu_online_map);
>         if ( !cpumask_empty(mask) )
>             msg->dest32 = cpu_mask_to_apicid(mask);
>         put_scratch_cpumask();
>     }
> 
> This way instead of looking twice at two cpumask_t instances, the
> 2nd one involves just one. Thoughts?

LGTM.

Note, however, that this won't exit early when the masks don't intersect,
and the address field will then be set from whatever happens to be in
msg->dest32.

> > --- a/xen/arch/x86/smp.c
> > +++ b/xen/arch/x86/smp.c
> > @@ -25,6 +25,31 @@
> >  #include <irq_vectors.h>
> >  #include <mach_apic.h>
> >  
> > +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, scratch_cpumask);
> > +
> > +#ifndef NDEBUG
> > +cpumask_t *scratch_cpumask(bool use)
> > +{
> > +    static DEFINE_PER_CPU(void *, scratch_cpumask_use);
> > +
> > +    /*
> > +     * Due to reentrancy scratch cpumask cannot be used in IRQ, #MC or #NMI
> > +     * context.
> > +     */
> > +    BUG_ON(in_irq() || in_mc() || in_nmi());
> > +
> > +    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
> > +    {
> > +        printk("%p: scratch CPU mask already in use by %p\n",
> > +               __builtin_return_address(0), this_cpu(scratch_cpumask_use));
> 
> __builtin_return_address(0) simply shows another time what ...
> 
> > +        BUG();
> 
> ... this already will show. I'd suggest to drop it. Also I think
> you want %ps here.

Done, thanks.
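
I.e. the check roughly becomes (sketch of the result, assuming nothing
else in that hunk changes):

    if ( use && unlikely(this_cpu(scratch_cpumask_use)) )
    {
        printk("scratch CPU mask already in use by %ps\n",
               this_cpu(scratch_cpumask_use));
        BUG();
    }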

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask
  2020-02-27 17:54     ` Roger Pau Monné
@ 2020-02-28  8:48       ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2020-02-28  8:48 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 27.02.2020 18:54, Roger Pau Monné wrote:
> On Wed, Feb 26, 2020 at 11:36:52AM +0100, Jan Beulich wrote:
>> On 24.02.2020 11:46, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
>>>  {
>>>      unsigned int cpu, old_vector, irq = desc->irq;
>>>      unsigned int vector = desc->arch.vector;
>>> -    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
>>> +    cpumask_t *tmp_mask = get_scratch_cpumask();
>>>  
>>>      BUG_ON(!valid_irq_vector(vector));
>>>  
>>> @@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
>>>      trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
>>>  
>>>      if ( likely(!desc->arch.move_in_progress) )
>>> +    {
>>> +        put_scratch_cpumask();
>>>          return;
>>> +    }
>>
>> I think if possible such error path adjustments would better be
>> avoided. And this seems feasible here: There are two entirely
>> independent uses of the scratch mask in this function. You could
>> therefore put the mask above from here, and get it again further
>> down, or you could leverage a property of the current
>> implementation, plus the fact that the 2nd use doesn't involve
>> any "real" function calls, and avoid a 2nd get/put altogether.
> 
> No, it's very easy to add function calls later on and forget to use
> get_scratch_cpumask.

Well, yes, such a deliberate omission would of course require a
bold comment. 

>> Of course another question then is whether it is a good property
>> of the current model, i.e. whether it wouldn't be better for
>> "put" to actually zap the pointer, to prevent subsequent use.
> 
> So that put_scratch_cpumask takes the pointer as a parameter and
> writes NULL to it?

For example, yes.

>>> @@ -3645,12 +3647,17 @@ long do_mmuext_op(
>>>                                     mask)) )
>>>                  rc = -EINVAL;
>>>              if ( unlikely(rc) )
>>> +            {
>>> +                put_scratch_cpumask();
>>>                  break;
>>> +            }
>>
>> Again, instead of adjusting an error path, how about making this
>> have an empty statement (i.e. dropping the break) and ...
>>
>>>              if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
>>
>> ... having this become "else if()"?
>>
>>> @@ -4384,6 +4393,9 @@ static int __do_update_va_mapping(
>>>          break;
>>>      }
>>>  
>>> +    if ( mask && mask != d->dirty_cpumask )
>>> +        put_scratch_cpumask();
>>
>> The right side of the && here makes things feel a little fragile for
>> me.
> 
> What about using:
> 
> switch ( flags & ~UVMF_FLUSHTYPE_MASK )
> {
> case UVMF_LOCAL:
> case UVMF_ALL:
>     break;
> 
> default:
>     put_scratch_cpumask();
> }

Fine with me.

> I could also use an if, but I think it's clearer with a switch.

Agreed.

>>> --- a/xen/arch/x86/msi.c
>>> +++ b/xen/arch/x86/msi.c
>>> @@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
>>>  
>>>      if ( cpu_mask )
>>>      {
>>> -        cpumask_t *mask = this_cpu(scratch_cpumask);
>>> +        cpumask_t *mask;
>>>  
>>>          if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
>>>              return;
>>>  
>>> +        mask = get_scratch_cpumask();
>>>          cpumask_and(mask, cpu_mask, &cpu_online_map);
>>>          msg->dest32 = cpu_mask_to_apicid(mask);
>>> +        put_scratch_cpumask();
>>>      }
>>
>> This, I think, could do with a little more changing:
>>
>>     if ( cpu_mask )
>>     {
>>         cpumask_t *mask = get_scratch_cpumask();
>>
>>         cpumask_and(mask, cpu_mask, &cpu_online_map);
>>         if ( !cpumask_empty(mask) )
>>             msg->dest32 = cpu_mask_to_apicid(mask);
>>         put_scratch_cpumask();
>>     }
>>
>> This way instead of looking twice at two cpumask_t instances, the
>> 2nd one involves just one. Thoughts?
> 
> LGTM.
> 
> Note, however, that this won't exit early when the masks don't intersect,
> and the address field will then be set from whatever happens to be in
> msg->dest32.

Oh, I should have noticed this. No, the early exit has to remain
one way or another. I guess I'm fine then with the original
variant.
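
Just for completeness, one way the early exit could be kept while still
computing the intersection only once (untested sketch; as said, I'm fine
with the original variant):

    if ( cpu_mask )
    {
        cpumask_t *mask = get_scratch_cpumask();

        cpumask_and(mask, cpu_mask, &cpu_online_map);
        if ( cpumask_empty(mask) )
        {
            put_scratch_cpumask();
            return;
        }

        msg->dest32 = cpu_mask_to_apicid(mask);
        put_scratch_cpumask();
    }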

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2020-02-24 10:46 [Xen-devel] [PATCH v3 0/5] x86: fixes/improvements for scratch cpumask Roger Pau Monne
2020-02-24 10:46 ` [Xen-devel] [PATCH v3 1/5] x86: introduce a nmi_count tracking variable Roger Pau Monne
2020-02-25 16:39   ` Jan Beulich
2020-02-24 10:46 ` [Xen-devel] [PATCH v3 2/5] x86: track when in #NMI context Roger Pau Monne
2020-02-25 16:49   ` Jan Beulich
2020-02-25 16:51   ` Jan Beulich
2020-02-24 10:46 ` [Xen-devel] [PATCH v3 3/5] x86: track when in #MC context Roger Pau Monne
2020-02-25 17:06   ` Jan Beulich
2020-02-24 10:46 ` [Xen-devel] [PATCH v3 4/5] x86/smp: use a dedicated scratch cpumask in send_IPI_mask Roger Pau Monne
2020-02-26 10:07   ` Jan Beulich
2020-02-26 10:18     ` Roger Pau Monné
2020-02-26 10:45       ` Jan Beulich
2020-02-26 10:51         ` Roger Pau Monné
2020-02-24 10:46 ` [Xen-devel] [PATCH v3 5/5] x86: add accessors for scratch cpu mask Roger Pau Monne
2020-02-26 10:36   ` Jan Beulich
2020-02-27 17:54     ` Roger Pau Monné
2020-02-28  8:48       ` Jan Beulich
