[PATCH v3 0/4] x86: allow to suppress use of hyper-threading


* [PATCH v3 0/4] x86: allow to suppress use of hyper-threading
@ 2018-07-19 10:25 Jan Beulich
  2018-07-19 10:31 ` [PATCH v3 1/4] x86: distinguish CPU offlining from CPU removal Jan Beulich
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

I've been considering adding such a command line option for quite a
long time, but never got around to it. Now that the TLBleed
information is public[1], we're at a point where we not only want,
but need this, and where perhaps it needs to be the default on
affected systems. The first 2 patches are prerequisites to the 3rd
one; the final one is simply cleanup.

I've retained all tags provided for v2, as the changes are really
small and were largely those requested as a condition for the tags to
apply.

1: x86: distinguish CPU offlining from CPU removal
2: x86: possibly bring up all CPUs even if not all are supposed to be used
3: x86: command line option to avoid use of secondary hyper-threads
4: cpumask: tidy {,z}alloc_cpumask_var() 

Jan

[1] https://www.vusec.net/projects/tlbleed/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH v3 1/4] x86: distinguish CPU offlining from CPU removal
  2018-07-19 10:25 [PATCH v3 0/4] x86: allow to suppress use of hyper-threading Jan Beulich
@ 2018-07-19 10:31 ` Jan Beulich
  2018-07-19 10:32 ` [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used Jan Beulich
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:31 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

In order to be able to service #MC on offlined CPUs, the GDT, IDT,
stack, and per-CPU data (which includes the TSS) need to be kept
allocated. They should only be freed upon CPU removal (which we
currently don't support, so some code is becoming effectively dead for
the moment).

Note that for now park_offline_cpus doesn't get set to true anywhere -
this is going to be the subject of a subsequent patch.
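
As an illustration only (not part of the patch), the free-on-DEAD versus
free-on-REMOVE decision made by the notifier callbacks below can be
modelled in isolation. The enum and the function name are hypothetical
stand-ins; only the comparison mirrors the check added to x2apic.c.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the notifier actions used in the patch. */
enum cpu_action { CPU_UP_CANCELED, CPU_DEAD, CPU_REMOVE };

/*
 * Returns true when per-CPU resources should be freed for this action.
 * The patch skips freeing when
 *     park_offline_cpus == (action != CPU_REMOVE)
 * so freeing happens exactly when the two sides differ.
 */
static bool should_free(bool park_offline_cpus, enum cpu_action action)
{
    return park_offline_cpus != (action != CPU_REMOVE);
}
```

Without parking, resources go away on CPU_UP_CANCELED / CPU_DEAD as
before; with parking, they are kept until CPU_REMOVE.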

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Simplify an expression. Alias FREE_CPUMASK_VAR() to
    free_cpumask_var() in the low-NR_CPUS case. Slightly re-write /
    extend description.
v2: Rename cpu_smpboot_free()'s new parameter. Introduce XFREE(),
    FREE_XENHEAP_PAGES(), FREE_XENHEAP_PAGE(), and FREE_CPUMASK_VAR().
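
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of why the free-and-clear macros and the cond_*alloc helpers go
together: a parked CPU's allocation is reused on the next bring-up, and
a cleared pointer is what tells the allocator that nothing is there.
The names ending in _sketch / _SKETCH are hypothetical, and calloc() /
free() stand in for the Xen allocators.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for XFREE(): free the allocation, then clear the pointer. */
#define XFREE_SKETCH(p) do { \
    free(p);                 \
    (p) = NULL;              \
} while ( 0 )

/*
 * Stand-in for cond_zalloc_cpumask_var(): allocate only if nothing is
 * there yet, otherwise clear and reuse the existing buffer.
 */
static bool cond_zalloc_sketch(unsigned long **mask, size_t nlongs)
{
    if ( *mask == NULL )
        *mask = calloc(nlongs, sizeof(**mask));
    else
        for ( size_t i = 0; i < nlongs; ++i )
            (*mask)[i] = 0;

    return *mask != NULL;
}
```

Calling cond_zalloc_sketch() twice on the same pointer hands back the
same (zeroed) buffer; only after XFREE_SKETCH() does a fresh allocation
take place.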

--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -692,12 +692,15 @@ static void cpu_bank_free(unsigned int c
 
     mcabanks_free(poll);
     mcabanks_free(clr);
+
+    per_cpu(poll_bankmask, cpu) = NULL;
+    per_cpu(mce_clear_banks, cpu) = NULL;
 }
 
 static int cpu_bank_alloc(unsigned int cpu)
 {
-    struct mca_banks *poll = mcabanks_alloc();
-    struct mca_banks *clr = mcabanks_alloc();
+    struct mca_banks *poll = per_cpu(poll_bankmask, cpu) ?: mcabanks_alloc();
+    struct mca_banks *clr = per_cpu(mce_clear_banks, cpu) ?: mcabanks_alloc();
 
     if ( !poll || !clr )
     {
@@ -725,7 +728,13 @@ static int cpu_callback(
 
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_bank_free(cpu);
+        if ( !park_offline_cpus )
+            cpu_bank_free(cpu);
+        break;
+
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            cpu_bank_free(cpu);
         break;
     }
 
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -107,10 +107,11 @@ static void play_dead(void)
     local_irq_disable();
 
     /*
-     * NOTE: After cpu_exit_clear, per-cpu variables are no longer accessible,
-     * as they may be freed at any time. In this case, heap corruption or
-     * #PF can occur (when heap debugging is enabled). For example, even
-     * printk() can involve tasklet scheduling, which touches per-cpu vars.
+     * NOTE: After cpu_exit_clear, per-cpu variables may no longer be
+     * accessible, as they may be freed at any time if offline CPUs don't get
+     * parked. In this case, heap corruption or #PF can occur (when heap
+     * debugging is enabled). For example, even printk() can involve tasklet
+     * scheduling, which touches per-cpu vars.
      * 
      * Consider very carefully when adding code to *dead_idle. Most hypervisor
      * subsystems are unsafe to call.
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -201,18 +201,21 @@ static int update_clusterinfo(
         if ( !cluster_cpus_spare )
             cluster_cpus_spare = xzalloc(cpumask_t);
         if ( !cluster_cpus_spare ||
-             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
+             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
             err = -ENOMEM;
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
+    case CPU_REMOVE:
+        if ( park_offline_cpus == (action != CPU_REMOVE) )
+            break;
         if ( per_cpu(cluster_cpus, cpu) )
         {
             cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
             if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
-                xfree(per_cpu(cluster_cpus, cpu));
+                XFREE(per_cpu(cluster_cpus, cpu));
         }
-        free_cpumask_var(per_cpu(scratch_mask, cpu));
+        FREE_CPUMASK_VAR(per_cpu(scratch_mask, cpu));
         break;
     }
 
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -28,7 +28,7 @@ static int init_percpu_area(unsigned int
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return -EBUSY;
+        return 0;
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;
@@ -76,9 +76,12 @@ static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        free_percpu_area(cpu);
+        if ( !park_offline_cpus )
+            free_percpu_area(cpu);
         break;
-    default:
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            free_percpu_area(cpu);
         break;
     }
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
+bool __read_mostly park_offline_cpus;
+
 unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
 static cpumask_t *secondary_socket_cpumask;
@@ -895,7 +897,14 @@ static void cleanup_cpu_root_pgt(unsigne
     }
 }
 
-static void cpu_smpboot_free(unsigned int cpu)
+/*
+ * The 'remove' boolean controls whether a CPU is just getting offlined (and
+ * parked), or outright removed / offlined without parking. Parked CPUs need
+ * things like their stack, GDT, IDT, TSS, and per-CPU data still available.
+ * A few other items, in particular CPU masks, are also retained, as it's
+ * difficult to prove that they're entirely unreferenced from parked CPUs.
+ */
+static void cpu_smpboot_free(unsigned int cpu, bool remove)
 {
     unsigned int order, socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
@@ -906,15 +915,19 @@ static void cpu_smpboot_free(unsigned in
         socket_cpumask[socket] = NULL;
     }
 
-    c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
-    c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
-    c[cpu].compute_unit_id = INVALID_CUID;
     cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
 
-    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
-    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
-    if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
-        free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+    if ( remove )
+    {
+        c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
+        c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
+        c[cpu].compute_unit_id = INVALID_CUID;
+
+        FREE_CPUMASK_VAR(per_cpu(cpu_sibling_mask, cpu));
+        FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
+        if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+    }
 
     cleanup_cpu_root_pgt(cpu);
 
@@ -936,19 +949,21 @@ static void cpu_smpboot_free(unsigned in
     }
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+    if ( remove )
+        FREE_XENHEAP_PAGES(per_cpu(gdt_table, cpu), order);
 
     free_xenheap_pages(per_cpu(compat_gdt_table, cpu), order);
 
-    order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    free_xenheap_pages(idt_tables[cpu], order);
-    idt_tables[cpu] = NULL;
-
-    if ( stack_base[cpu] != NULL )
+    if ( remove )
     {
-        memguard_unguard_stack(stack_base[cpu]);
-        free_xenheap_pages(stack_base[cpu], STACK_ORDER);
-        stack_base[cpu] = NULL;
+        order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
+        FREE_XENHEAP_PAGES(idt_tables[cpu], order);
+
+        if ( stack_base[cpu] )
+        {
+            memguard_unguard_stack(stack_base[cpu]);
+            FREE_XENHEAP_PAGES(stack_base[cpu], STACK_ORDER);
+        }
     }
 }
 
@@ -963,15 +978,17 @@ static int cpu_smpboot_alloc(unsigned in
     if ( node != NUMA_NO_NODE )
         memflags = MEMF_node(node);
 
-    stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
+    if ( stack_base[cpu] == NULL )
+        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
     if ( stack_base[cpu] == NULL )
         goto out;
     memguard_guard_stack(stack_base[cpu]);
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
+    gdt = per_cpu(gdt_table, cpu) ?: alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(gdt_table, cpu) = gdt;
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -983,7 +1000,8 @@ static int cpu_smpboot_alloc(unsigned in
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
     order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
+    if ( idt_tables[cpu] == NULL )
+        idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
     if ( idt_tables[cpu] == NULL )
         goto out;
     memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
@@ -1011,16 +1029,16 @@ static int cpu_smpboot_alloc(unsigned in
          (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
         goto out;
 
-    if ( !(zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-           zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+    if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
+           cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
         goto out;
 
     rc = 0;
 
  out:
     if ( rc )
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, true);
 
     return rc;
 }
@@ -1038,9 +1056,10 @@ static int cpu_smpboot_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, !park_offline_cpus);
         break;
-    default:
+    case CPU_REMOVE:
+        cpu_smpboot_free(cpu, true);
         break;
     }
 
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,8 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibli
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+extern bool park_offline_cpus;
+
 void smp_send_nmi_allbutself(void);
 
 void send_IPI_mask(const cpumask_t *, int vector);
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -47,6 +47,8 @@ void register_cpu_notifier(struct notifi
 #define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
 /* CPU_DEAD: CPU is dead. */
 #define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
+/* CPU_REMOVE: CPU was removed. */
+#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
 
 /* Perform CPU hotplug. May return -EAGAIN. */
 int cpu_down(unsigned int cpu);
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -351,16 +351,35 @@ static inline bool_t alloc_cpumask_var(c
 	return *mask != NULL;
 }
 
+static inline bool cond_alloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	return *mask != NULL;
+}
+
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
+static inline bool cond_zalloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	else
+		cpumask_clear(*mask);
+	return *mask != NULL;
+}
+
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 	xfree(mask);
 }
+
+/* Free an allocated mask, and zero the pointer to it. */
+#define FREE_CPUMASK_VAR(m) XFREE(m)
 #else
 typedef cpumask_t cpumask_var_t[1];
 
@@ -368,16 +387,20 @@ static inline bool_t alloc_cpumask_var(c
 {
 	return 1;
 }
+#define cond_alloc_cpumask_var alloc_cpumask_var
 
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
 	return 1;
 }
+#define cond_zalloc_cpumask_var zalloc_cpumask_var
 
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 }
+
+#define FREE_CPUMASK_VAR(m) free_cpumask_var(m)
 #endif
 
 #if NR_CPUS > 1
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -162,6 +162,14 @@ void free_xenheap_pages(void *v, unsigne
 bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
+
+/* Free an allocation, and zero the pointer to it. */
+#define FREE_XENHEAP_PAGES(p, o) do { \
+    free_xenheap_pages(p, o);         \
+    (p) = NULL;                       \
+} while ( false )
+#define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
+
 /* Map machine page range in Xen virtual address space. */
 int map_pages_to_xen(
     unsigned long virt,
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -42,6 +42,12 @@
 /* Free any of the above. */
 extern void xfree(void *);
 
+/* Free an allocation, and zero the pointer to it. */
+#define XFREE(p) do { \
+    xfree(p);         \
+    (p) = NULL;       \
+} while ( false )
+
 /* Underlying functions */
 extern void *_xmalloc(unsigned long size, unsigned long align);
 extern void *_xzalloc(unsigned long size, unsigned long align);





* [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used
  2018-07-19 10:25 [PATCH v3 0/4] x86: allow to suppress use of hyper-threading Jan Beulich
  2018-07-19 10:31 ` [PATCH v3 1/4] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-19 10:32 ` Jan Beulich
  2018-07-19 11:16   ` Joao Martins
  2018-07-19 10:32 ` [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads Jan Beulich
  2018-07-19 10:33 ` [PATCH v3 4/4] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Reportedly Intel CPUs which can't broadcast #MC to all targeted
cores/threads because some have CR4.MCE clear will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.

The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau MonnÃ© <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
v2: Use ROUNDUP().
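
As a rough sketch of the sizing logic in set_nr_cpu_ids() after this
change (illustration only): with parking enabled, nr_cpu_ids covers all
CPUs so that each of them can be brought up, while "maxcpus=" only
limits how many stay online. NR_CPUS_SKETCH, the struct and the
function name are hypothetical, and ROUNDUP_SKETCH() assumes the usual
power-of-two rounding definition.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 256u  /* assumed build-time limit */
#define BITS_PER_LONG_SKETCH ((unsigned int)(sizeof(long) * CHAR_BIT))
/* Assumed equivalent of ROUNDUP(); 'a' must be a power of two. */
#define ROUNDUP_SKETCH(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct cpu_limits {
    unsigned int max_cpus;        /* clamped "maxcpus=" value */
    unsigned int nr_cpu_ids;      /* number of CPU IDs that get set up */
    unsigned int nr_cpumask_bits; /* mask size, a multiple of long's width */
};

static struct cpu_limits set_nr_cpu_ids_sketch(unsigned int max_cpus,
                                               unsigned int num_processors,
                                               unsigned int disabled_cpus,
                                               bool park_offline_cpus)
{
    struct cpu_limits l;
    unsigned int tot_cpus = num_processors + disabled_cpus;

    if ( !max_cpus )
        max_cpus = tot_cpus;
    if ( max_cpus > NR_CPUS_SKETCH )
        max_cpus = NR_CPUS_SKETCH;
    else if ( !max_cpus )
        max_cpus = 1;

    /* Without parking, CPUs beyond "maxcpus=" are never set up. */
    if ( !park_offline_cpus )
        tot_cpus = max_cpus;

    l.max_cpus = max_cpus;
    l.nr_cpu_ids = tot_cpus < NR_CPUS_SKETCH ? tot_cpus : NR_CPUS_SKETCH;
    l.nr_cpumask_bits = ROUNDUP_SKETCH(l.nr_cpu_ids, BITS_PER_LONG_SKETCH);

    return l;
}
```

With 8 processors and "maxcpus=4", this yields nr_cpu_ids of 8 when
parking is enabled and 4 otherwise.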
---
Instead of fully bringing up CPUs and then calling cpu_down(), another
option would be to suppress/cancel full bringup in smp_callin(). But I
guess we should try to keep things simple for now, and see later whether
this can be "optimized".
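
The approach taken here (bring every CPU up, then take the excess ones
down again) can be modelled roughly as follows. This is a simplified
sketch that assumes cpu_up() and cpu_down() always succeed and that
CPU0 is already online; struct boot_result and bring_up_sketch() are
hypothetical names.

```c
#include <stdbool.h>

struct boot_result {
    unsigned int online; /* CPUs left online after boot */
    unsigned int parked; /* CPUs brought up and then offlined again */
};

static struct boot_result bring_up_sketch(unsigned int present,
                                          unsigned int max_cpus,
                                          bool park_offline_cpus)
{
    struct boot_result r = { .online = 1, .parked = 0 }; /* boot CPU */

    for ( unsigned int i = 1; i < present; ++i )
    {
        /* Without parking, CPUs beyond the limit are never started. */
        if ( !park_offline_cpus && r.online >= max_cpus )
            continue;

        ++r.online;              /* cpu_up(i), assumed to succeed */

        if ( r.online > max_cpus )
        {
            --r.online;          /* cpu_down(i), assumed to succeed */
            ++r.parked;
        }
    }

    return r;
}
```

For 8 present CPUs and "maxcpus=4", parking leaves 4 CPUs online and 4
parked; without parking, 4 are online and the rest are never started.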
---
Note: The parked CPUs can be brought online (i.e. the meaning of
      "maxcpus=" isn't as strict anymore as it was before), but won't
      immediately be used for scheduling pre-existing Dom0 CPUs. That's
      because dom0_setup_vcpu() artificially restricts the affinity. For
      DomU-s whose affinity was not artificially restricted, no such
      limitation exists, albeit the shown "soft" affinity appears to
      suffer a similar issue. As that's not a goal of this patch, I've
      put the issues on the side for now, perhaps for someone else to
      take care of.
Note: On one of my test systems the parked CPUs get _PSD data reported
      by Dom0 that is different from the non-parked ones (coord_type is
      0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
      problem, so there is apparently something amiss in the processor
      driver. I've tried to figure out what, but I couldn't, despite the
      AML suggesting that this might be some _OSC invocation (but if it
      is, I can't find it - acpi_run_osc() clearly does not anywhere get
      invoked in a per-CPU fashion).

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -13,6 +13,7 @@
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
 
 #include "cpu.h"
+#include "mcheck/x86_mca.h"
 
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
@@ -343,6 +344,9 @@ static void __init early_cpu_detect(void
 			hap_paddr_bits = PADDR_BITS;
 	}
 
+	if (c->x86_vendor != X86_VENDOR_AMD)
+		park_offline_cpus = opt_mce;
+
 	initialize_cpu_data(0);
 }
 
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -636,8 +636,6 @@ static void clear_cmci(void)
 
 static void cpu_mcheck_disable(void)
 {
-    clear_in_cr4(X86_CR4_MCE);
-
     if ( cmci_support && opt_mce )
         clear_cmci();
 }
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -68,19 +68,26 @@ physid_mask_t phys_cpu_present_map;
 
 void __init set_nr_cpu_ids(unsigned int max_cpus)
 {
+	unsigned int tot_cpus = num_processors + disabled_cpus;
+
 	if (!max_cpus)
-		max_cpus = num_processors + disabled_cpus;
+		max_cpus = tot_cpus;
 	if (max_cpus > NR_CPUS)
 		max_cpus = NR_CPUS;
 	else if (!max_cpus)
 		max_cpus = 1;
 	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
 	       max_cpus, max_t(int, max_cpus - num_processors, 0));
-	nr_cpu_ids = max_cpus;
+
+	if (!park_offline_cpus)
+		tot_cpus = max_cpus;
+	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
+	if (park_offline_cpus && nr_cpu_ids < num_processors)
+		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
+		       num_processors - nr_cpu_ids);
 
 #ifndef nr_cpumask_bits
-	nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
-			  ~(BITS_PER_LONG - 1);
+	nr_cpumask_bits = ROUNDUP(nr_cpu_ids, BITS_PER_LONG);
 	printk(XENLOG_DEBUG "NR_CPUS:%u nr_cpumask_bits:%u\n",
 	       NR_CPUS, nr_cpumask_bits);
 #endif
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -665,7 +665,7 @@ void __init noreturn __start_xen(unsigne
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, num_parked = 0;
     multiboot_info_t *mbi;
     module_t *mod;
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1512,7 +1512,8 @@ void __init noreturn __start_xen(unsigne
     else
     {
         set_nr_cpu_ids(max_cpus);
-        max_cpus = nr_cpu_ids;
+        if ( !max_cpus )
+            max_cpus = nr_cpu_ids;
     }
 
     if ( xen_guest )
@@ -1635,16 +1636,27 @@ void __init noreturn __start_xen(unsigne
             /* Set up node_to_cpumask based on cpu_to_node[]. */
             numa_add_cpu(i);
 
-            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+            if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
+                 !cpu_online(i) )
             {
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+                else if ( num_online_cpus() > max_cpus )
+                {
+                    ret = cpu_down(i);
+                    if ( !ret )
+                        ++num_parked;
+                    else
+                        printk("Could not re-offline CPU%u (%d)\n", i, ret);
+                }
             }
         }
     }
 
     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
+    if ( num_parked )
+        printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
     do_initcalls();
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,12 +342,6 @@ static always_inline void set_in_cr4 (un
     write_cr4(read_cr4() | mask);
 }
 
-static always_inline void clear_in_cr4 (unsigned long mask)
-{
-    mmu_cr4_features &= ~mask;
-    write_cr4(read_cr4() & ~mask);
-}
-
 static inline unsigned int read_pkru(void)
 {
     unsigned int pkru;





* [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads
  2018-07-19 10:25 [PATCH v3 0/4] x86: allow to suppress use of hyper-threading Jan Beulich
  2018-07-19 10:31 ` [PATCH v3 1/4] x86: distinguish CPU offlining from CPU removal Jan Beulich
  2018-07-19 10:32 ` [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-19 10:32 ` Jan Beulich
  2018-07-19 10:37   ` Andrew Cooper
  2018-07-19 10:33 ` [PATCH v3 4/4] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Shared resources (L1 cache and TLB in particular) present a risk of
information leak via side channels. Provide a means to avoid use of
hyperthreads in such cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau MonnÃ© <roger.pau@citrix.com>
---
v3: Also change the actual option string to "smt". Don't default the
    option to off for now, as being too impactful a default for things
    like TLBleed.
v2: Rename option to "smt".
---
An option to avoid the up/down cycle would be to avoid clearing the
sibling (and then perhaps also core) map of parked CPUs, which would
allow bailing early from cpu_up_helper().
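
For reference, the up/down cycle in cpu_up_helper() amounts to roughly
the following. This is a sketch only: struct cpu_sketch and both
function names are hypothetical, cpu_up() and cpu_down_helper() are
assumed to succeed, and sibling_count stands for the weight of the
CPU's sibling mask once it has come online.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct cpu_sketch {
    bool in_compute_unit;       /* compute_unit_id != INVALID_CUID */
    unsigned int sibling_count; /* online threads in this core, incl. self */
    bool online;
};

/* A CPU counts as a secondary hyper-thread if a sibling is already online. */
static bool is_secondary_thread(const struct cpu_sketch *c)
{
    return !c->in_compute_unit && c->sibling_count > 1;
}

/* opt_smt: -1 = not specified, 0 = "smt=0", 1 = "smt=1". */
static int cpu_up_helper_sketch(struct cpu_sketch *c, int8_t opt_smt)
{
    c->online = true;           /* cpu_up(), assumed to succeed */

    if ( !opt_smt && is_secondary_thread(c) )
    {
        c->online = false;      /* cpu_down_helper(), assumed to succeed */
        return -EPERM;
    }

    return 0;
}
```

Note that with the default of -1, !opt_smt is false, so secondary
threads stay online unless "smt=0" is given explicitly.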

TBD: How to prevent the CPU from transiently becoming available for
     scheduling when being onlined at runtime?

TBD: For now the patch assumes all HT-enabled CPUs are affected by side
     channel attacks through shared resources. There are claims that AMD
     ones aren't, but it hasn't really become clear to me why that would
     be, as I don't see the fully associative L1 TLBs to be sufficient
     reason for there to not be other possible avenues (L2 TLB, caches).

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1764,6 +1764,13 @@ Use `smap=hvm` to allow SMAP use by HVM
 Flag to enable Supervisor Mode Execution Protection
 Use `smep=hvm` to allow SMEP use by HVM guests only.
 
+### smt (x86)
+> `= <boolean>`
+
+Default: `true`
+
+Control bring up of multiple hyper-threads per CPU core.
+
 ### snb\_igd\_quirk
 > `= <boolean> | cap | <integer>`
 
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
 static unsigned int __initdata max_cpus;
 integer_param("maxcpus", max_cpus);
 
+int8_t __read_mostly opt_smt = -1;
+boolean_param("smt", opt_smt);
+
 /* opt_invpcid: If false, don't use INVPCID instruction even if available. */
 static bool __initdata opt_invpcid = true;
 boolean_param("invpcid", opt_invpcid);
@@ -1642,7 +1645,10 @@ void __init noreturn __start_xen(unsigne
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
-                else if ( num_online_cpus() > max_cpus )
+                else if ( num_online_cpus() > max_cpus ||
+                          (!opt_smt &&
+                           cpu_data[i].compute_unit_id == INVALID_CUID &&
+                           cpumask_weight(per_cpu(cpu_sibling_mask, i)) > 1) )
                 {
                     ret = cpu_down(i);
                     if ( !ret )
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -23,6 +23,7 @@
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 #include <asm/smp.h>
 #include <asm/numa.h>
 #include <xen/nodemask.h>
@@ -48,14 +49,27 @@ static void l3_cache_get(void *arg)
 
 long cpu_up_helper(void *data)
 {
-    int cpu = (unsigned long)data;
+    unsigned int cpu = (unsigned long)data;
     int ret = cpu_up(cpu);
+
     if ( ret == -EBUSY )
     {
         /* On EBUSY, flush RCU work and have one more go. */
         rcu_barrier();
         ret = cpu_up(cpu);
     }
+
+    if ( !ret && !opt_smt &&
+         cpu_data[cpu].compute_unit_id == INVALID_CUID &&
+         cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 )
+    {
+        ret = cpu_down_helper(data);
+        if ( ret )
+            printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
+        else
+            ret = -EPERM;
+    }
+
     return ret;
 }
 
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -59,6 +59,8 @@ extern uint8_t kbd_shift_flags;
 extern unsigned long highmem_start;
 #endif
 
+extern int8_t opt_smt;
+
 #ifdef CONFIG_SHADOW_PAGING
 extern bool opt_dom0_shadow;
 #else





* [PATCH v3 4/4] cpumask: tidy {,z}alloc_cpumask_var()
  2018-07-19 10:25 [PATCH v3 0/4] x86: allow to suppress use of hyper-threading Jan Beulich
                   ` (2 preceding siblings ...)
  2018-07-19 10:32 ` [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-19 10:33 ` Jan Beulich
  3 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

Drop unnecessary casts and use bool in favor of bool_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -345,9 +345,9 @@ static inline int cpulist_scnprintf(char
 
 typedef cpumask_t *cpumask_var_t;
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -358,9 +358,9 @@ static inline bool cond_alloc_cpumask_va
 	return *mask != NULL;
 }
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -383,16 +383,16 @@ static inline void free_cpumask_var(cpum
 #else
 typedef cpumask_t cpumask_var_t[1];
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	return 1;
+	return true;
 }
 #define cond_alloc_cpumask_var alloc_cpumask_var
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
-	return 1;
+	return true;
 }
 #define cond_zalloc_cpumask_var zalloc_cpumask_var
 




* Re: [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads
  2018-07-19 10:32 ` [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-19 10:37   ` Andrew Cooper
  2018-07-19 10:44     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2018-07-19 10:37 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 19/07/18 11:32, Jan Beulich wrote:
> Shared resources (L1 cache and TLB in particular) present a risk of
> information leak via side channels. Provide a means to avoid use of
> hyperthreads in such cases.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau MonnÃ© <roger.pau@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

It appears as if you've got some encoding issues with Roger's name in
patch 2 and here, but patch 4 is fine.  I trust you can fix this up on
commit.

~Andrew


* Re: [PATCH v3 3/4] x86: command line option to avoid use of secondary hyper-threads
  2018-07-19 10:37   ` Andrew Cooper
@ 2018-07-19 10:44     ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 10:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 19.07.18 at 12:37, <andrew.cooper3@citrix.com> wrote:
> On 19/07/18 11:32, Jan Beulich wrote:
>> Shared resources (L1 cache and TLB in particular) present a risk of
>> information leak via side channels. Provide a means to avoid use of
>> hyperthreads in such cases.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Roger Pau MonnÃ© <roger.pau@citrix.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

> It appears as if you've got some encoding issues with Roger's name in
> patch 2 and here, but patch 4 is fine.  I trust you can fix this up on
> commit.

The actual patches are fine - it was just an issue with me forgetting to
do the necessary adjustment when sending the other two.

Jan



* Re: [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used
  2018-07-19 10:32 ` [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-19 11:16   ` Joao Martins
  2018-07-19 11:45     ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Joao Martins @ 2018-07-19 11:16 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

On 07/19/2018 11:32 AM, Jan Beulich wrote:
> Note: On one of my test systems the parked CPUs get _PSD data reported
>       by Dom0 that is different from the non-parked ones (coord_type is
>       0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
>       problem, so there is apparently something amiss in the processor
>       driver. I've tried to figure out what, but I couldn't, despite the
>       AML suggesting that this might be some _OSC invocation (but if it
>       is, I can't find it - acpi_run_osc() clearly does not anywhere get
>       invoked in a per-CPU fashion).
> 

Regarding your second note, could the commit below (in the Linux ACPI
processor driver) help fix it?

4d0f1ce69559 xen/acpi: upload _PSD info for non Dom0 CPUs too


* Re: [PATCH v3 2/4] x86: possibly bring up all CPUs even if not all are supposed to be used
  2018-07-19 11:16   ` Joao Martins
@ 2018-07-19 11:45     ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2018-07-19 11:45 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 19.07.18 at 13:16, <joao.m.martins@oracle.com> wrote:
> On 07/19/2018 11:32 AM, Jan Beulich wrote:
>> Note: On one of my test systems the parked CPUs get _PSD data reported
>>       by Dom0 that is different from the non-parked ones (coord_type is
>>       0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
>>       problem, so there is apparently something amiss in the processor
>>       driver. I've tried to figure out what, but I couldn't, despite the
>>       AML suggesting that this might be some _OSC invocation (but if it
>>       is, I can't find it - acpi_run_osc() clearly does not anywhere get
>>       invoked in a per-CPU fashion).
>> 
> 
> Regarding your second note, could the commit below (in the Linux ACPI
> processor driver) help fix it?
> 
> 4d0f1ce69559 xen/acpi: upload _PSD info for non Dom0 CPUs too

No, that is in place already.

Jan


