* [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading
@ 2018-07-18  8:15 Jan Beulich
  2018-07-18  8:18 ` [PATCH v2 1/6] cpupools: fix state when downing a CPU failed Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:15 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

I've been considering adding a respective command line option for
quite a long time, but never got around to it. Now that the TLBleed
information is public[1], we're at a point where we not only want,
but need this, and where perhaps it needs to be the default on
affected systems. The first 4 patches are all prerequisites to the
5th one; the final one is simply cleanup.

1: cpupools: fix state when downing a CPU failed
2: x86: distinguish CPU offlining from CPU removal
3: x86/AMD: distinguish compute units from hyper-threads
4: x86: bring up all CPUs even if not all are supposed to be used
5: x86: command line option to avoid use of secondary hyper-threads
6: cpumask: tidy {,z}alloc_cpumask_var() 

Jan

[1] https://www.vusec.net/projects/tlbleed/




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/6] cpupools: fix state when downing a CPU failed
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
@ 2018-07-18  8:18 ` Jan Beulich
  2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:18 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross

While I've run into the issue with further patches in place which no
longer guarantee the per-CPU area to start out as all zeros, the
CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
of schedule_cpu_switch() will trigger the "c != old_pool" assertion
there.

Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
should not happen before cpu_disable_scheduler()). Clearing it in
CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
piece of code twice. Since the field's value shouldn't matter while the
CPU is offline, simply clear it (implicitly) for CPU_ONLINE and
CPU_DOWN_FAILED, but only for other than the suspend/resume case (which
gets specially handled in cpupool_cpu_remove()).

By adjusting the conditional in cpupool_cpu_add(), CPU_DOWN_FAILED in the
suspend case should now also be handled better.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Move change into cpupool_cpu_add(). Adjust conditional there.

--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -490,7 +490,7 @@ static int cpupool_cpu_add(unsigned int
     cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
     cpumask_set_cpu(cpu, &cpupool_free_cpus);
 
-    if ( system_state == SYS_STATE_resume )
+    if ( system_state == SYS_STATE_suspend || system_state == SYS_STATE_resume )
     {
         struct cpupool **c;
 
@@ -522,6 +522,7 @@ static int cpupool_cpu_add(unsigned int
          * (or unplugging would have failed) and that is the default behavior
          * anyway.
          */
+        per_cpu(cpupool, cpu) = NULL;
         ret = cpupool_assign_cpu_locked(cpupool0, cpu);
     }
  out:





* [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
  2018-07-18  8:18 ` [PATCH v2 1/6] cpupools: fix state when downing a CPU failed Jan Beulich
@ 2018-07-18  8:19 ` Jan Beulich
  2018-07-18  8:49   ` Wei Liu
                     ` (2 more replies)
  2018-07-18  8:20 ` [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

In order to be able to service #MC on offlined CPUs, the GDT, IDT,
stack, and per-CPU data (which includes the TSS) need to be kept
allocated. They should only be freed upon CPU removal (which we
currently don't support, so some code is becoming effectively dead
for the moment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Rename cpu_smpboot_free()'s new parameter. Introduce XFREE(),
    FREE_XENHEAP_PAGES(), FREE_XENHEAP_PAGE(), and FREE_CPUMASK_VAR().

--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -692,12 +692,15 @@ static void cpu_bank_free(unsigned int c
 
     mcabanks_free(poll);
     mcabanks_free(clr);
+
+    per_cpu(poll_bankmask, cpu) = NULL;
+    per_cpu(mce_clear_banks, cpu) = NULL;
 }
 
 static int cpu_bank_alloc(unsigned int cpu)
 {
-    struct mca_banks *poll = mcabanks_alloc();
-    struct mca_banks *clr = mcabanks_alloc();
+    struct mca_banks *poll = per_cpu(poll_bankmask, cpu) ?: mcabanks_alloc();
+    struct mca_banks *clr = per_cpu(mce_clear_banks, cpu) ?: mcabanks_alloc();
 
     if ( !poll || !clr )
     {
@@ -725,7 +728,13 @@ static int cpu_callback(
 
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_bank_free(cpu);
+        if ( !park_offline_cpus )
+            cpu_bank_free(cpu);
+        break;
+
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            cpu_bank_free(cpu);
         break;
     }
 
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -107,10 +107,11 @@ static void play_dead(void)
     local_irq_disable();
 
     /*
-     * NOTE: After cpu_exit_clear, per-cpu variables are no longer accessible,
-     * as they may be freed at any time. In this case, heap corruption or
-     * #PF can occur (when heap debugging is enabled). For example, even
-     * printk() can involve tasklet scheduling, which touches per-cpu vars.
+     * NOTE: After cpu_exit_clear, per-cpu variables may no longer be
+     * accessible, as they may be freed at any time if offline CPUs don't
+     * get parked. In this case, heap corruption or #PF can occur (when heap
+     * debugging is enabled). For example, even printk() can involve tasklet
+     * scheduling, which touches per-cpu vars.
      * 
      * Consider very carefully when adding code to *dead_idle. Most hypervisor
      * subsystems are unsafe to call.
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -201,18 +201,21 @@ static int update_clusterinfo(
         if ( !cluster_cpus_spare )
             cluster_cpus_spare = xzalloc(cpumask_t);
         if ( !cluster_cpus_spare ||
-             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
+             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
             err = -ENOMEM;
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
+    case CPU_REMOVE:
+        if ( park_offline_cpus == (action != CPU_REMOVE) )
+            break;
         if ( per_cpu(cluster_cpus, cpu) )
         {
             cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
             if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
-                xfree(per_cpu(cluster_cpus, cpu));
+                XFREE(per_cpu(cluster_cpus, cpu));
         }
-        free_cpumask_var(per_cpu(scratch_mask, cpu));
+        FREE_CPUMASK_VAR(per_cpu(scratch_mask, cpu));
         break;
     }
 
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -28,7 +28,7 @@ static int init_percpu_area(unsigned int
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return -EBUSY;
+        return 0;
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;
@@ -76,9 +76,12 @@ static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        free_percpu_area(cpu);
+        if ( !park_offline_cpus )
+            free_percpu_area(cpu);
         break;
-    default:
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            free_percpu_area(cpu);
         break;
     }
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
+bool __read_mostly park_offline_cpus;
+
 unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
 static cpumask_t *secondary_socket_cpumask;
@@ -887,7 +889,14 @@ static void cleanup_cpu_root_pgt(unsigne
     }
 }
 
-static void cpu_smpboot_free(unsigned int cpu)
+/*
+ * The 'remove' boolean controls whether a CPU is just getting offlined (and
+ * parked), or outright removed / offlined without parking. Parked CPUs need
+ * things like their stack, GDT, IDT, TSS, and per-CPU data still available.
+ * A few other items, in particular CPU masks, are also retained, as it's
+ * difficult to prove that they're entirely unreferenced from parked CPUs.
+ */
+static void cpu_smpboot_free(unsigned int cpu, bool remove)
 {
     unsigned int order, socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
@@ -898,15 +907,19 @@ static void cpu_smpboot_free(unsigned in
         socket_cpumask[socket] = NULL;
     }
 
-    c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
-    c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
-    c[cpu].compute_unit_id = INVALID_CUID;
     cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
 
-    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
-    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
-    if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
-        free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+    if ( remove )
+    {
+        c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
+        c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
+        c[cpu].compute_unit_id = INVALID_CUID;
+
+        FREE_CPUMASK_VAR(per_cpu(cpu_sibling_mask, cpu));
+        FREE_CPUMASK_VAR(per_cpu(cpu_core_mask, cpu));
+        if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
+            FREE_CPUMASK_VAR(per_cpu(scratch_cpumask, cpu));
+    }
 
     cleanup_cpu_root_pgt(cpu);
 
@@ -928,19 +941,21 @@ static void cpu_smpboot_free(unsigned in
     }
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+    if ( remove )
+        FREE_XENHEAP_PAGES(per_cpu(gdt_table, cpu), order);
 
     free_xenheap_pages(per_cpu(compat_gdt_table, cpu), order);
 
-    order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    free_xenheap_pages(idt_tables[cpu], order);
-    idt_tables[cpu] = NULL;
-
-    if ( stack_base[cpu] != NULL )
+    if ( remove )
     {
-        memguard_unguard_stack(stack_base[cpu]);
-        free_xenheap_pages(stack_base[cpu], STACK_ORDER);
-        stack_base[cpu] = NULL;
+        order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
+        FREE_XENHEAP_PAGES(idt_tables[cpu], order);
+
+        if ( stack_base[cpu] )
+        {
+            memguard_unguard_stack(stack_base[cpu]);
+            FREE_XENHEAP_PAGES(stack_base[cpu], STACK_ORDER);
+        }
     }
 }
 
@@ -955,15 +970,19 @@ static int cpu_smpboot_alloc(unsigned in
     if ( node != NUMA_NO_NODE )
         memflags = MEMF_node(node);
 
-    stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
+    if ( stack_base[cpu] == NULL )
+        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
     if ( stack_base[cpu] == NULL )
         goto out;
     memguard_guard_stack(stack_base[cpu]);
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
+    gdt = per_cpu(gdt_table, cpu);
+    if ( gdt == NULL )
+        gdt = alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(gdt_table, cpu) = gdt;
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -975,7 +994,8 @@ static int cpu_smpboot_alloc(unsigned in
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
     order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
+    if ( idt_tables[cpu] == NULL )
+        idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
     if ( idt_tables[cpu] == NULL )
         goto out;
     memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
@@ -1003,16 +1023,16 @@ static int cpu_smpboot_alloc(unsigned in
          (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
         goto out;
 
-    if ( !(zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-           zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+    if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
+           cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
         goto out;
 
     rc = 0;
 
  out:
     if ( rc )
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, true);
 
     return rc;
 }
@@ -1030,9 +1050,10 @@ static int cpu_smpboot_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, !park_offline_cpus);
         break;
-    default:
+    case CPU_REMOVE:
+        cpu_smpboot_free(cpu, true);
         break;
     }
 
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,8 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibli
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+extern bool park_offline_cpus;
+
 void smp_send_nmi_allbutself(void);
 
 void send_IPI_mask(const cpumask_t *, int vector);
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -47,6 +47,8 @@ void register_cpu_notifier(struct notifi
 #define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
 /* CPU_DEAD: CPU is dead. */
 #define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
+/* CPU_REMOVE: CPU was removed. */
+#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
 
 /* Perform CPU hotplug. May return -EAGAIN. */
 int cpu_down(unsigned int cpu);
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -351,16 +351,35 @@ static inline bool_t alloc_cpumask_var(c
 	return *mask != NULL;
 }
 
+static inline bool cond_alloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	return *mask != NULL;
+}
+
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
+static inline bool cond_zalloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	else
+		cpumask_clear(*mask);
+	return *mask != NULL;
+}
+
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 	xfree(mask);
 }
+
+/* Free an allocated mask, and zero the pointer to it. */
+#define FREE_CPUMASK_VAR(m) XFREE(m)
 #else
 typedef cpumask_t cpumask_var_t[1];
 
@@ -368,16 +387,20 @@ static inline bool_t alloc_cpumask_var(c
 {
 	return 1;
 }
+#define cond_alloc_cpumask_var alloc_cpumask_var
 
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
 	return 1;
 }
+#define cond_zalloc_cpumask_var zalloc_cpumask_var
 
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 }
+
+#define FREE_CPUMASK_VAR(m) ((void)(m))
 #endif
 
 #if NR_CPUS > 1
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -162,6 +162,14 @@ void free_xenheap_pages(void *v, unsigne
 bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
+
+/* Free an allocation, and zero the pointer to it. */
+#define FREE_XENHEAP_PAGES(p, o) do { \
+    free_xenheap_pages(p, o);         \
+    (p) = NULL;                       \
+} while ( false )
+#define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
+
 /* Map machine page range in Xen virtual address space. */
 int map_pages_to_xen(
     unsigned long virt,
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -42,6 +42,12 @@
 /* Free any of the above. */
 extern void xfree(void *);
 
+/* Free an allocation, and zero the pointer to it. */
+#define XFREE(p) do { \
+    xfree(p);         \
+    (p) = NULL;       \
+} while ( false )
+
 /* Underlying functions */
 extern void *_xmalloc(unsigned long size, unsigned long align);
 extern void *_xzalloc(unsigned long size, unsigned long align);





* [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
  2018-07-18  8:18 ` [PATCH v2 1/6] cpupools: fix state when downing a CPU failed Jan Beulich
  2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-18  8:20 ` Jan Beulich
  2018-07-18 11:45   ` Roger Pau Monné
  2018-07-18 12:54   ` Andrew Cooper
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
the difference is not very big. The most relevant change (requiring some
code restructuring) is that the topoext feature no longer means there is
a valid CU ID.

Take the opportunity and convert wrongly plain int variables in
set_cpu_sibling_map() to unsigned int.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
---
v2: Don't break (bogus) indentation style in amd_get_topology().

--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->compute_unit_id = ebx & 0xFF;
                 c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+
+                if (c->x86 < 0x17)
+                        c->compute_unit_id = ebx & 0xFF;
+                else {
+                        c->cpu_core_id = ebx & 0xFF;
+                        c->x86_max_cores /= c->x86_num_siblings;
+                }
         }
         
         if (opt_cpu_info)
                 printk("CPU %d(%d) -> Processor %d, %s %d\n",
                        cpu, c->x86_max_cores, c->phys_proc_id,
-                       cpu_has(c, X86_FEATURE_TOPOEXT) ? "Compute Unit" : 
-                                                         "Core",
-                       cpu_has(c, X86_FEATURE_TOPOEXT) ? c->compute_unit_id :
-                                                         c->cpu_core_id);
+                       c->compute_unit_id != INVALID_CUID ? "Compute Unit"
+                                                          : "Core",
+                       c->compute_unit_id != INVALID_CUID ? c->compute_unit_id
+                                                          : c->cpu_core_id);
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -236,33 +236,41 @@ static void link_thread_siblings(int cpu
     cpumask_set_cpu(cpu2, per_cpu(cpu_core_mask, cpu1));
 }
 
-static void set_cpu_sibling_map(int cpu)
+static void set_cpu_sibling_map(unsigned int cpu)
 {
-    int i;
+    unsigned int i;
     struct cpuinfo_x86 *c = cpu_data;
 
     cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
 
     cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
+    cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
+    cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
 
     if ( c[cpu].x86_num_siblings > 1 )
     {
         for_each_cpu ( i, &cpu_sibling_setup_map )
         {
-            if ( cpu_has(c, X86_FEATURE_TOPOEXT) ) {
-                if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
-                     (c[cpu].compute_unit_id == c[i].compute_unit_id) )
+            if ( cpu == i || c[cpu].phys_proc_id != c[i].phys_proc_id )
+                continue;
+            if ( c[cpu].compute_unit_id != INVALID_CUID &&
+                 c[i].compute_unit_id != INVALID_CUID )
+            {
+                if ( c[cpu].compute_unit_id == c[i].compute_unit_id )
                     link_thread_siblings(cpu, i);
-            } else if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
-                        (c[cpu].cpu_core_id == c[i].cpu_core_id) ) {
-                link_thread_siblings(cpu, i);
             }
+            else if ( c[cpu].cpu_core_id != XEN_INVALID_CORE_ID &&
+                      c[i].cpu_core_id != XEN_INVALID_CORE_ID )
+            {
+                if ( c[cpu].cpu_core_id == c[i].cpu_core_id )
+                    link_thread_siblings(cpu, i);
+            }
+            else
+                printk(XENLOG_WARNING
+                       "CPU%u: unclear relationship with CPU%u\n",
+                       cpu, i);
         }
     }
-    else
-    {
-        cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
-    }
 
     if ( c[cpu].x86_max_cores == 1 )
     {






* [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (2 preceding siblings ...)
  2018-07-18  8:20 ` [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-18  8:21 ` Jan Beulich
  2018-07-18 13:10   ` Andrew Cooper
                     ` (3 more replies)
  2018-07-18  8:24 ` [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 4 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Reportedly Intel CPUs which can't broadcast #MC to all targeted
cores/threads because some have CR4.MCE clear will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.

The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Use ROUNDUP().
---
Instead of fully bringing up CPUs and then calling cpu_down(), another
option would be to suppress/cancel full bringup in smp_callin(). But I
guess we should try to keep things simple for now, and see later whether
this can be "optimized".
---
Note: The parked CPUs can be brought online (i.e. the meaning of
      "maxcpus=" isn't as strict anymore as it was before), but won't
      immediately be used for scheduling pre-existing Dom0 CPUs. That's
       because dom0_setup_vcpu() artificially restricts the affinity. For
       DomU-s whose affinity was not artificially restricted, no such
      limitation exists, albeit the shown "soft" affinity appears to
      suffer a similar issue. As that's not a goal of this patch, I've
      put the issues on the side for now, perhaps for someone else to
      take care of.
Note: On one of my test systems the parked CPUs get _PSD data reported
      by Dom0 that is different from the non-parked ones (coord_type is
      0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
      problem, so there is apparently something amiss in the processor
      driver. I've tried to figure out what, but I couldn't, despite the
      AML suggesting that this might be some _OSC invocation (but if it
      is, I can't find it - acpi_run_osc() clearly does not anywhere get
      invoked in a per-CPU fashion).

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -13,6 +13,7 @@
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
 
 #include "cpu.h"
+#include "mcheck/x86_mca.h"
 
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
@@ -343,6 +344,9 @@ static void __init early_cpu_detect(void
 			hap_paddr_bits = PADDR_BITS;
 	}
 
+	if (c->x86_vendor != X86_VENDOR_AMD)
+		park_offline_cpus = opt_mce;
+
 	initialize_cpu_data(0);
 }
 
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -636,8 +636,6 @@ static void clear_cmci(void)
 
 static void cpu_mcheck_disable(void)
 {
-    clear_in_cr4(X86_CR4_MCE);
-
     if ( cmci_support && opt_mce )
         clear_cmci();
 }
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -68,19 +68,26 @@ physid_mask_t phys_cpu_present_map;
 
 void __init set_nr_cpu_ids(unsigned int max_cpus)
 {
+	unsigned int tot_cpus = num_processors + disabled_cpus;
+
 	if (!max_cpus)
-		max_cpus = num_processors + disabled_cpus;
+		max_cpus = tot_cpus;
 	if (max_cpus > NR_CPUS)
 		max_cpus = NR_CPUS;
 	else if (!max_cpus)
 		max_cpus = 1;
 	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
 	       max_cpus, max_t(int, max_cpus - num_processors, 0));
-	nr_cpu_ids = max_cpus;
+
+	if (!park_offline_cpus)
+		tot_cpus = max_cpus;
+	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
+	if (park_offline_cpus && nr_cpu_ids < num_processors)
+		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
+		       num_processors - nr_cpu_ids);
 
 #ifndef nr_cpumask_bits
-	nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
-			  ~(BITS_PER_LONG - 1);
+	nr_cpumask_bits = ROUNDUP(nr_cpu_ids, BITS_PER_LONG);
 	printk(XENLOG_DEBUG "NR_CPUS:%u nr_cpumask_bits:%u\n",
 	       NR_CPUS, nr_cpumask_bits);
 #endif
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -665,7 +665,7 @@ void __init noreturn __start_xen(unsigne
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, num_parked = 0;
     multiboot_info_t *mbi;
     module_t *mod;
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1512,7 +1512,8 @@ void __init noreturn __start_xen(unsigne
     else
     {
         set_nr_cpu_ids(max_cpus);
-        max_cpus = nr_cpu_ids;
+        if ( !max_cpus )
+            max_cpus = nr_cpu_ids;
     }
 
     if ( xen_guest )
@@ -1635,16 +1636,27 @@ void __init noreturn __start_xen(unsigne
             /* Set up node_to_cpumask based on cpu_to_node[]. */
             numa_add_cpu(i);
 
-            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+            if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
+                 !cpu_online(i) )
             {
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+                else if ( num_online_cpus() > max_cpus )
+                {
+                    ret = cpu_down(i);
+                    if ( !ret )
+                        ++num_parked;
+                    else
+                        printk("Could not re-offline CPU%u (%d)\n", i, ret);
+                }
             }
         }
     }
 
     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
+    if ( num_parked )
+        printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
     do_initcalls();
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,12 +342,6 @@ static always_inline void set_in_cr4 (un
     write_cr4(read_cr4() | mask);
 }
 
-static always_inline void clear_in_cr4 (unsigned long mask)
-{
-    mmu_cr4_features &= ~mask;
-    write_cr4(read_cr4() & ~mask);
-}
-
 static inline unsigned int read_pkru(void)
 {
     unsigned int pkru;





* [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (3 preceding siblings ...)
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-18  8:24 ` Jan Beulich
  2018-07-18 13:43   ` Roger Pau Monné
  2018-07-18 13:48   ` Wei Liu
  2018-07-18  8:24 ` [PATCH v2 6/6] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
       [not found] ` <5B4EF7D302000078001D53D1@suse.com>
  6 siblings, 2 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:24 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

Shared resources (L1 cache and TLB in particular) present a risk of
information leak via side channels. Don't use hyper-threads in such
cases, but allow independent control of their use at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Rename option to "smt".
---
An option to avoid the up/down cycle would be to avoid clearing the
sibling (and then perhaps also core) map of parked CPUs, allowing to
bail early from cpu_up_helper().

TBD: How to prevent the CPU from transiently becoming available for
     scheduling when being onlined at runtime?

TBD: For now the patch assumes all HT-enabled CPUs are affected by side
     channel attacks through shared resources. There are claims that AMD
     ones aren't, but it hasn't really become clear to me why that would
     be, as I don't see the fully associative L1 TLBs to be sufficient
     reason for there to not be other possible avenues (L2 TLB, caches).

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1764,6 +1764,13 @@ Use `smap=hvm` to allow SMAP use by HVM
 Flag to enable Supervisor Mode Execution Protection
 Use `smep=hvm` to allow SMEP use by HVM guests only.
 
+### smt (x86)
+> `= <boolean>`
+
+Default: `false`
+
+Control bring up of multiple hyper-threads per CPU core.
+
 ### snb\_igd\_quirk
 > `= <boolean> | cap | <integer>`
 
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
 static unsigned int __initdata max_cpus;
 integer_param("maxcpus", max_cpus);
 
+int8_t __read_mostly opt_smt = -1;
+boolean_param("smt", opt_smt);
+
 /* opt_invpcid: If false, don't use INVPCID instruction even if available. */
 static bool __initdata opt_invpcid = true;
 boolean_param("invpcid", opt_invpcid);
@@ -1642,7 +1645,10 @@ void __init noreturn __start_xen(unsigne
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
-                else if ( num_online_cpus() > max_cpus )
+                else if ( num_online_cpus() > max_cpus ||
+                          (!opt_smt &&
+                           cpu_data[i].compute_unit_id == INVALID_CUID &&
+                           cpumask_weight(per_cpu(cpu_sibling_mask, i)) > 1) )
                 {
                     ret = cpu_down(i);
                     if ( !ret )
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -23,6 +23,7 @@
 #include <asm/microcode.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 #include <asm/spec_ctrl.h>
 #include <asm/spec_ctrl_asm.h>
 
@@ -126,6 +127,9 @@ static int __init parse_spec_ctrl(const
 
             opt_eager_fpu = 0;
 
+            if ( opt_smt < 0 )
+                opt_smt = 1;
+
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
@@ -627,6 +631,9 @@ void __init init_speculation_mitigations
     if ( default_xen_spec_ctrl )
         setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
 
+    if ( opt_smt < 0 )
+        opt_smt = 0;
+
     xpti_init_default(false);
     if ( opt_xpti == 0 )
         setup_force_cpu_cap(X86_FEATURE_NO_XPTI);
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -23,6 +23,7 @@
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 #include <asm/smp.h>
 #include <asm/numa.h>
 #include <xen/nodemask.h>
@@ -48,14 +49,27 @@ static void l3_cache_get(void *arg)
 
 long cpu_up_helper(void *data)
 {
-    int cpu = (unsigned long)data;
+    unsigned int cpu = (unsigned long)data;
     int ret = cpu_up(cpu);
+
     if ( ret == -EBUSY )
     {
         /* On EBUSY, flush RCU work and have one more go. */
         rcu_barrier();
         ret = cpu_up(cpu);
     }
+
+    if ( !ret && !opt_smt &&
+         cpu_data[cpu].compute_unit_id == INVALID_CUID &&
+         cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 )
+    {
+        ret = cpu_down_helper(data);
+        if ( ret )
+            printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
+        else
+            ret = -EPERM;
+    }
+
     return ret;
 }
 
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -59,6 +59,8 @@ extern uint8_t kbd_shift_flags;
 extern unsigned long highmem_start;
 #endif
 
+extern int8_t opt_smt;
+
 #ifdef CONFIG_SHADOW_PAGING
 extern bool opt_dom0_shadow;
 #else




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 6/6] cpumask: tidy {,z}alloc_cpumask_var()
  2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (4 preceding siblings ...)
  2018-07-18  8:24 ` [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-18  8:24 ` Jan Beulich
  2018-07-18 13:44   ` [PATCH v2 6/6] cpumask: tidy {, z}alloc_cpumask_var() Roger Pau Monné
       [not found] ` <5B4EF7D302000078001D53D1@suse.com>
  6 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-07-18  8:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

Drop unnecessary casts and use bool in favor of bool_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>

--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -345,9 +345,9 @@ static inline int cpulist_scnprintf(char
 
 typedef cpumask_t *cpumask_var_t;
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -358,9 +358,9 @@ static inline bool cond_alloc_cpumask_va
 	return *mask != NULL;
 }
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -383,16 +383,16 @@ static inline void free_cpumask_var(cpum
 #else
 typedef cpumask_t cpumask_var_t[1];
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	return 1;
+	return true;
 }
 #define cond_alloc_cpumask_var alloc_cpumask_var
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
-	return 1;
+	return true;
 }
 #define cond_zalloc_cpumask_var zalloc_cpumask_var
 






* Re: [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-18  8:49   ` Wei Liu
  2018-07-18 11:37   ` Roger Pau Monné
  2018-07-18 12:53   ` Andrew Cooper
  2 siblings, 0 replies; 25+ messages in thread
From: Wei Liu @ 2018-07-18  8:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 18, 2018 at 02:19:29AM -0600, Jan Beulich wrote:
> In order to be able to service #MC on offlined CPUs, the GDT, IDT,
> stack, and per-CPU data (which includes the TSS) need to be kept
> allocated. They should only be freed upon CPU removal (which we
> currently don't support, so some code is becoming effectively dead
> for the moment).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v2 1/6] cpupools: fix state when downing a CPU failed
       [not found] ` <5B4EF7D302000078001D53D1@suse.com>
@ 2018-07-18  9:13   ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2018-07-18  9:13 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 18/07/18 10:18, Jan Beulich wrote:
> While I've run into the issue with further patches in place which no
> longer guarantee the per-CPU area to start out as all zeros, the
> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
> there.
> 
> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
> should not happen before cpu_disable_scheduler()). Clearing it in
> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
> piece of code twice. Since the field's value shouldn't matter while the
> CPU is offline, simply clear it (implicitly) for CPU_ONLINE and
> CPU_DOWN_FAILED, but only for other than the suspend/resume case (which
> gets specially handled in cpupool_cpu_remove()).
> 
> By adjusting the conditional in cpupool_cpu_add() CPU_DOWN_FAILED
> handling in the suspend case should now also be handled better.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


* Re: [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
  2018-07-18  8:49   ` Wei Liu
@ 2018-07-18 11:37   ` Roger Pau Monné
  2018-07-18 14:30     ` Jan Beulich
  2018-07-18 12:53   ` Andrew Cooper
  2 siblings, 1 reply; 25+ messages in thread
From: Roger Pau Monné @ 2018-07-18 11:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 18, 2018 at 02:19:29AM -0600, Jan Beulich wrote:
> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -201,18 +201,21 @@ static int update_clusterinfo(
>          if ( !cluster_cpus_spare )
>              cluster_cpus_spare = xzalloc(cpumask_t);
>          if ( !cluster_cpus_spare ||
> -             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
> +             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
>              err = -ENOMEM;
>          break;
>      case CPU_UP_CANCELED:
>      case CPU_DEAD:
> +    case CPU_REMOVE:
> +        if ( park_offline_cpus == (action != CPU_REMOVE) )
> +            break;

I think this would be clearer as:

case CPU_UP_CANCELED:
case CPU_DEAD:
    if ( park_offline_cpus )
        break;

    /* fallthrough */
case CPU_REMOVE:
    if ( per_cpu...

If it's safe to do so (similar to what you do in cpu_callback).

>          if ( per_cpu(cluster_cpus, cpu) )
>          {
>              cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
>              if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
> -                xfree(per_cpu(cluster_cpus, cpu));
> +                XFREE(per_cpu(cluster_cpus, cpu));
>          }
> -        free_cpumask_var(per_cpu(scratch_mask, cpu));
> +        FREE_CPUMASK_VAR(per_cpu(scratch_mask, cpu));
>          break;
>      }
>  
> --- a/xen/arch/x86/percpu.c
> +++ b/xen/arch/x86/percpu.c
> @@ -28,7 +28,7 @@ static int init_percpu_area(unsigned int
>      char *p;
>  
>      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> -        return -EBUSY;
> +        return 0;
>  
>      if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
>          return -ENOMEM;
> @@ -76,9 +76,12 @@ static int cpu_percpu_callback(
>          break;
>      case CPU_UP_CANCELED:
>      case CPU_DEAD:
> -        free_percpu_area(cpu);
> +        if ( !park_offline_cpus )
> +            free_percpu_area(cpu);
>          break;
> -    default:
> +    case CPU_REMOVE:
> +        if ( park_offline_cpus )
> +            free_percpu_area(cpu);
>          break;
>      }
>  
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> +bool __read_mostly park_offline_cpus;

park_offline_cpus seems to be left at its initial value (false) in
this patch. It might be good to mention in the commit message that
further patches will allow setting this value (I haven't looked yet,
but I assume so)?

> @@ -955,15 +970,19 @@ static int cpu_smpboot_alloc(unsigned in
>      if ( node != NUMA_NO_NODE )
>          memflags = MEMF_node(node);
>  
> -    stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
> +    if ( stack_base[cpu] == NULL )
> +        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
>      if ( stack_base[cpu] == NULL )
>          goto out;
>      memguard_guard_stack(stack_base[cpu]);
>  
>      order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
> -    per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
> +    gdt = per_cpu(gdt_table, cpu);
> +    if ( gdt == NULL )
> +        gdt = alloc_xenheap_pages(order, memflags);

gdt = per_cpu(gdt_table, cpu) ?: alloc_xenheap_pages(order, memflags);

Is equivalent and shorter AFAICT.

> @@ -368,16 +387,20 @@ static inline bool_t alloc_cpumask_var(c
>  {
>  	return 1;
>  }
> +#define cond_alloc_cpumask_var alloc_cpumask_var
>  
>  static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
>  {
>  	cpumask_clear(*mask);
>  	return 1;
>  }
> +#define cond_zalloc_cpumask_var zalloc_cpumask_var
>  
>  static inline void free_cpumask_var(cpumask_var_t mask)
>  {
>  }
> +
> +#define FREE_CPUMASK_VAR(m) ((void)(m))

For correctness shouldn't this call free_cpumask_var?

>  #endif
>  
>  #if NR_CPUS > 1
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -162,6 +162,14 @@ void free_xenheap_pages(void *v, unsigne
>  bool scrub_free_pages(void);
>  #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
>  #define free_xenheap_page(v) (free_xenheap_pages(v,0))
> +
> +/* Free an allocation, and zero the pointer to it. */
> +#define FREE_XENHEAP_PAGES(p, o) do { \
> +    free_xenheap_pages(p, o);         \
> +    (p) = NULL;                       \
> +} while ( false )
> +#define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
> +
>  /* Map machine page range in Xen virtual address space. */
>  int map_pages_to_xen(
>      unsigned long virt,
> --- a/xen/include/xen/xmalloc.h
> +++ b/xen/include/xen/xmalloc.h
> @@ -42,6 +42,12 @@
>  /* Free any of the above. */
>  extern void xfree(void *);
>  
> +/* Free an allocation, and zero the pointer to it. */
> +#define XFREE(p) do { \
> +    xfree(p);         \
> +    (p) = NULL;       \
> +} while ( false )

Do you need the do { ... } while construct here? Isn't it enough to
use ({ ... })?

Thanks, Roger.


* Re: [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads
  2018-07-18  8:20 ` [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-18 11:45   ` Roger Pau Monné
  2018-07-18 12:54   ` Andrew Cooper
  1 sibling, 0 replies; 25+ messages in thread
From: Roger Pau Monné @ 2018-07-18 11:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

On Wed, Jul 18, 2018 at 02:20:29AM -0600, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
> 
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Brian Woods <brian.woods@amd.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v2: Don't break (bogus) indentation style in amd_get_topology().
> 
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
>                  u32 eax, ebx, ecx, edx;
>  
>                  cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> -                c->compute_unit_id = ebx & 0xFF;
>                  c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
> +
> +                if (c->x86 < 0x17)
> +                        c->compute_unit_id = ebx & 0xFF;
> +                else {
> +                        c->cpu_core_id = ebx & 0xFF;
> +                        c->x86_max_cores /= c->x86_num_siblings;
> +                }

Should compute_unit_id and cpu_core_id be inside of an union in
cpuinfo_x86?

Roger.


* Re: [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
  2018-07-18  8:49   ` Wei Liu
  2018-07-18 11:37   ` Roger Pau Monné
@ 2018-07-18 12:53   ` Andrew Cooper
  2 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-07-18 12:53 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall

On 18/07/18 09:19, Jan Beulich wrote:
> In order to be able to service #MC on offlined CPUs, the GDT, IDT,
> stack, and per-CPU data (which includes the TSS) need to be kept
> allocated. They should only be freed upon CPU removal (which we
> currently don't support, so some code is becoming effectively dead
> for the moment).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads
  2018-07-18  8:20 ` [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
  2018-07-18 11:45   ` Roger Pau Monné
@ 2018-07-18 12:54   ` Andrew Cooper
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-07-18 12:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 18/07/18 09:20, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
>
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Brian Woods <brian.woods@amd.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-18 13:10   ` Andrew Cooper
  2018-07-18 13:36   ` Roger Pau Monné
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-07-18 13:10 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 18/07/18 09:21, Jan Beulich wrote:
> Reportedly Intel CPUs which can't broadcast #MC to all targeted
> cores/threads because some have CR4.MCE clear will shut down. Therefore
> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>
> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
> altogether right here.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
  2018-07-18 13:10   ` Andrew Cooper
@ 2018-07-18 13:36   ` Roger Pau Monné
  2018-07-18 13:56   ` Wei Liu
  2018-07-18 23:48   ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 25+ messages in thread
From: Roger Pau Monné @ 2018-07-18 13:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

On Wed, Jul 18, 2018 at 02:21:53AM -0600, Jan Beulich wrote:
> Reportedly Intel CPUs which can't broadcast #MC to all targeted
> cores/threads because some have CR4.MCE clear will shut down. Therefore
> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
> bring up all CPUs in order to be able to set CR4.MCE in the first place.
> 
> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
> altogether right here.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>


* Re: [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-18  8:24 ` [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-18 13:43   ` Roger Pau Monné
  2018-07-18 13:48   ` Wei Liu
  1 sibling, 0 replies; 25+ messages in thread
From: Roger Pau Monné @ 2018-07-18 13:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, xen-devel, Brian Woods

On Wed, Jul 18, 2018 at 02:24:14AM -0600, Jan Beulich wrote:
> Shared resources (L1 cache and TLB in particular) present a risk of
> information leak via side channels. Don't use hyperthreads in such
> cases, but allow independent control of their use at the same time.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


* Re: [PATCH v2 6/6] cpumask: tidy {, z}alloc_cpumask_var()
  2018-07-18  8:24 ` [PATCH v2 6/6] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
@ 2018-07-18 13:44   ` Roger Pau Monné
  0 siblings, 0 replies; 25+ messages in thread
From: Roger Pau Monné @ 2018-07-18 13:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 18, 2018 at 02:24:57AM -0600, Jan Beulich wrote:
> Drop unnecessary casts and use bool in favor of bool_t.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>


* Re: [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-18  8:24 ` [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
  2018-07-18 13:43   ` Roger Pau Monné
@ 2018-07-18 13:48   ` Wei Liu
  2018-07-18 14:00     ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Wei Liu @ 2018-07-18 13:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, xen-devel, Brian Woods

On Wed, Jul 18, 2018 at 02:24:14AM -0600, Jan Beulich wrote:
> Shared resources (L1 cache and TLB in particular) present a risk of
> information leak via side channels. Don't use hyperthreads in such
> cases, but allow independent control of their use at the same time.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Rename option to "smt".
> ---
> An option to avoid the up/down cycle would be to avoid clearing the
> sibling (and then perhaps also core) map of parked CPUs, allowing to
> bail early from cpu_up_helper().
> 
> TBD: How to prevent the CPU from transiently becoming available for
>      scheduling when being onlined at runtime?
> 
> TBD: For now the patch assumes all HT-enabled CPUs are affected by side
>      channel attacks through shared resources. There are claims that AMD
>      ones aren't, but it hasn't really become clear to me why that would
>      be, as I don't see the fully associative L1 TLBs to be sufficient
>      reason for there to not be other possible avenues (L2 TLB, caches).
> 
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1764,6 +1764,13 @@ Use `smap=hvm` to allow SMAP use by HVM
>  Flag to enable Supervisor Mode Execution Protection
>  Use `smep=hvm` to allow SMEP use by HVM guests only.
>  
> +### smt (x86)
> +> `= <boolean>`
> +
> +Default: `false`
> +
> +Control bring up of multiple hyper-threads per CPU core.
> +
>  ### snb\_igd\_quirk
>  > `= <boolean> | cap | <integer>`
>  
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
>  static unsigned int __initdata max_cpus;
>  integer_param("maxcpus", max_cpus);
>  
> +int8_t __read_mostly opt_smt = -1;
> +boolean_param("ht", opt_smt);
> +

But here it is still "ht"?

Wei.


* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
  2018-07-18 13:10   ` Andrew Cooper
  2018-07-18 13:36   ` Roger Pau Monné
@ 2018-07-18 13:56   ` Wei Liu
  2018-07-18 14:02     ` Jan Beulich
  2018-07-18 23:48   ` Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2018-07-18 13:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Wed, Jul 18, 2018 at 02:21:53AM -0600, Jan Beulich wrote:
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -68,19 +68,26 @@ physid_mask_t phys_cpu_present_map;
>  
>  void __init set_nr_cpu_ids(unsigned int max_cpus)
>  {
> +	unsigned int tot_cpus = num_processors + disabled_cpus;
> +
>  	if (!max_cpus)
> -		max_cpus = num_processors + disabled_cpus;
> +		max_cpus = tot_cpus;
>  	if (max_cpus > NR_CPUS)
>  		max_cpus = NR_CPUS;
>  	else if (!max_cpus)
>  		max_cpus = 1;
>  	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
>  	       max_cpus, max_t(int, max_cpus - num_processors, 0));
> -	nr_cpu_ids = max_cpus;
> +
> +	if (!park_offline_cpus)
> +		tot_cpus = max_cpus;
> +	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
> +	if (park_offline_cpus && nr_cpu_ids < num_processors)
> +		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
> +		       num_processors - nr_cpu_ids);

Minor nit:

Maybe it is just me, but "Cannot" is normally used when an operation
fails; I would use "Won't" to indicate this is a deliberate action, not
a consequence of failure, or say "Skip bringing up ...".

Whatever this log message ends up like:

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-18 13:48   ` Wei Liu
@ 2018-07-18 14:00     ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18 14:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: George Dunlap, Andrew Cooper, Dario Faggioli,
	Suravee Suthikulpanit, xen-devel, Brian Woods

>>> On 18.07.18 at 15:48, <wei.liu2@citrix.com> wrote:
> On Wed, Jul 18, 2018 at 02:24:14AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
>>  static unsigned int __initdata max_cpus;
>>  integer_param("maxcpus", max_cpus);
>>  
>> +int8_t __read_mostly opt_smt = -1;
>> +boolean_param("ht", opt_smt);
>> +
> 
> But here it is still "ht"?

Argh - thanks for noticing. I did specifically run a test for this, but
then failed to actually inspect the log.

Jan




* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18 13:56   ` Wei Liu
@ 2018-07-18 14:02     ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-18 14:02 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, xen-devel

>>> On 18.07.18 at 15:56, <wei.liu2@citrix.com> wrote:
> On Wed, Jul 18, 2018 at 02:21:53AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/mpparse.c
>> +++ b/xen/arch/x86/mpparse.c
>> @@ -68,19 +68,26 @@ physid_mask_t phys_cpu_present_map;
>>  
>>  void __init set_nr_cpu_ids(unsigned int max_cpus)
>>  {
>> +	unsigned int tot_cpus = num_processors + disabled_cpus;
>> +
>>  	if (!max_cpus)
>> -		max_cpus = num_processors + disabled_cpus;
>> +		max_cpus = tot_cpus;
>>  	if (max_cpus > NR_CPUS)
>>  		max_cpus = NR_CPUS;
>>  	else if (!max_cpus)
>>  		max_cpus = 1;
>>  	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
>>  	       max_cpus, max_t(int, max_cpus - num_processors, 0));
>> -	nr_cpu_ids = max_cpus;
>> +
>> +	if (!park_offline_cpus)
>> +		tot_cpus = max_cpus;
>> +	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
>> +	if (park_offline_cpus && nr_cpu_ids < num_processors)
>> +		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
>> +		       num_processors - nr_cpu_ids);
> 
> Minor nit:
> 
> Maybe it is just me, but "Cannot" is normally used when an operation
> fails, I would use "Won't" to indicate this is a deliberate action, not
> a consequence of failure, or say "Skip bringing up ...".

But it isn't a deliberate action - we just don't have enough bits to
represent all of them. To me this is the indication of a failure, just
that we catch it before it could actually cause problems.

> Whatever this log message ends up like:
> 
> Reviewed-by: Wei Liu <wei.liu2@citrix.com>

Thanks.

Jan




* Re: [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18 11:37   ` Roger Pau Monné
@ 2018-07-18 14:30     ` Jan Beulich
  2018-07-18 14:35       ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2018-07-18 14:30 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 18.07.18 at 13:37, <roger.pau@citrix.com> wrote:
> On Wed, Jul 18, 2018 at 02:19:29AM -0600, Jan Beulich wrote:
>> --- a/xen/arch/x86/genapic/x2apic.c
>> +++ b/xen/arch/x86/genapic/x2apic.c
>> @@ -201,18 +201,21 @@ static int update_clusterinfo(
>>          if ( !cluster_cpus_spare )
>>              cluster_cpus_spare = xzalloc(cpumask_t);
>>          if ( !cluster_cpus_spare ||
>> -             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
>> +             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
>>              err = -ENOMEM;
>>          break;
>>      case CPU_UP_CANCELED:
>>      case CPU_DEAD:
>> +    case CPU_REMOVE:
>> +        if ( park_offline_cpus == (action != CPU_REMOVE) )
>> +            break;
> 
> I think this would be clearer as:
> 
> case CPU_UP_CANCELED:
> case CPU_DEAD:
>     if ( park_offline_cpus )
>         break;
> 
>     /* fallthrough */
> case CPU_REMOVE:
>     if ( per_cpu...

But this is not equivalent. I want to do nothing for UP_CANCELED
and DEAD when parking, and for REMOVE when not parking.

> If it's safe to do so (similar to what you do in cpu_callback).

There I'm replicating what needs doing (as it's a simple function
call).

>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>>  cpumask_t cpu_online_map __read_mostly;
>>  EXPORT_SYMBOL(cpu_online_map);
>>  
>> +bool __read_mostly park_offline_cpus;
> 
> park_offline_cpus seems to be left at its initial value (false) in
> this patch. It might be good to mention in the commit message that
> further patches will allow setting this value (I haven't looked yet,
> but I assume so)?

Ah, yes, I had meant to, but forgot by the time I wrote the text.

>> @@ -955,15 +970,19 @@ static int cpu_smpboot_alloc(unsigned in
>>      if ( node != NUMA_NO_NODE )
>>          memflags = MEMF_node(node);
>>  
>> -    stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
>> +    if ( stack_base[cpu] == NULL )
>> +        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
>>      if ( stack_base[cpu] == NULL )
>>          goto out;
>>      memguard_guard_stack(stack_base[cpu]);
>>  
>>      order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
>> -    per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
>> +    gdt = per_cpu(gdt_table, cpu);
>> +    if ( gdt == NULL )
>> +        gdt = alloc_xenheap_pages(order, memflags);
> 
> gdt = per_cpu(gdt_table, cpu) ?: alloc_xenheap_pages(order, memflags);
> 
> Is equivalent and shorter AFAICT.

Indeed - I didn't notice this because of the sequence of transformations
that led to the current result.

>> @@ -368,16 +387,20 @@ static inline bool_t alloc_cpumask_var(c
>>  {
>>  	return 1;
>>  }
>> +#define cond_alloc_cpumask_var alloc_cpumask_var
>>  
>>  static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
>>  {
>>  	cpumask_clear(*mask);
>>  	return 1;
>>  }
>> +#define cond_zalloc_cpumask_var zalloc_cpumask_var
>>  
>>  static inline void free_cpumask_var(cpumask_var_t mask)
>>  {
>>  }
>> +
>> +#define FREE_CPUMASK_VAR(m) ((void)(m))
> 
> For correctness shouldn't this call free_cpumask_var?

Hmm, yes, that would also let us get away without a cast.

>> --- a/xen/include/xen/xmalloc.h
>> +++ b/xen/include/xen/xmalloc.h
>> @@ -42,6 +42,12 @@
>>  /* Free any of the above. */
>>  extern void xfree(void *);
>>  
>> +/* Free an allocation, and zero the pointer to it. */
>> +#define XFREE(p) do { \
>> +    xfree(p);         \
>> +    (p) = NULL;       \
>> +} while ( false )
> 
> Do you need the do { ... } while construct here? Isn't it enough to
> use ({ ... })?

I try to avoid gcc extensions when the same can be achieved without.
Furthermore, I'd prefer that the construct not be usable as an rvalue.

Jan





* Re: [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal
  2018-07-18 14:30     ` Jan Beulich
@ 2018-07-18 14:35       ` Andrew Cooper
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2018-07-18 14:35 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall, xen-devel



On 18/07/18 15:30, Jan Beulich wrote:
>>> --- a/xen/include/xen/xmalloc.h
>>> +++ b/xen/include/xen/xmalloc.h
>>> @@ -42,6 +42,12 @@
>>>  /* Free any of the above. */
>>>  extern void xfree(void *);
>>>  
>>> +/* Free an allocation, and zero the pointer to it. */
>>> +#define XFREE(p) do { \
>>> +    xfree(p);         \
>>> +    (p) = NULL;       \
>>> +} while ( false )
>> Do you need the do { ... } while construct here? Isn't it enough to
>> use ({ ... })?
> I try to avoid gcc extensions when the same can be achieved without.
> Furthermore I'd prefer if the construct was not usable as rvalue.

Yes - the do { } while ( 0 ) style here is actually more useful than
({ }) to prevent caller (mis)use.

~Andrew




* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
                     ` (2 preceding siblings ...)
  2018-07-18 13:56   ` Wei Liu
@ 2018-07-18 23:48   ` Konrad Rzeszutek Wilk
  2018-07-19  8:25     ` Jan Beulich
  3 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-07-18 23:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper

On Wed, Jul 18, 2018 at 02:21:53AM -0600, Jan Beulich wrote:
> Reportedly Intel CPUs which can't broadcast #MC to all targeted
> cores/threads because some have CR4.MCE clear will shut down. Therefore
> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
> bring up all CPUs in order to be able to set CR4.MCE in the first place.
> 
> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
> altogether right here.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Use ROUNDUP().
> ---
> Instead of fully bringing up CPUs and then calling cpu_down(), another
> option would be to suppress/cancel full bringup in smp_callin(). But I
> guess we should try to keep things simple for now, and see later whether
> this can be "optimized".
> ---
> Note: The parked CPUs can be brought online (i.e. the meaning of
>       "maxcpus=" isn't as strict anymore as it was before), but won't
>       immediately be used for scheduling pre-existing Dom0 CPUs. That's
>       because dom0_setup_vcpu() artificially restricts the affinity. For
>       DomU-s whose affinity was not artificially restricted, no such
>       limitation exists, albeit the shown "soft" affinity appears to
>       suffer a similar issue. As that's not a goal of this patch, I've
>       put the issues on the side for now, perhaps for someone else to
>       take care of.
> Note: On one of my test systems the parked CPUs get _PSD data reported
>       by Dom0 that is different from the non-parked ones (coord_type is
>       0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this

From drivers/xen/xen-acpi-processor.c:

181         /* 'acpi_processor_preregister_performance' does not parse if the       
182          * num_processors <= 1, but Xen still requires it. Do it manually here. 
183          */                                                                     
184         if (pdomain->num_processors <= 1) {                                     
185                 if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ALL)            
186                         dst->shared_type = CPUFREQ_SHARED_TYPE_ALL;             
187                 else if (pdomain->coord_type == DOMAIN_COORD_TYPE_HW_ALL)       
188                         dst->shared_type = CPUFREQ_SHARED_TYPE_HW;              
189                 else if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ANY)       
190                         dst->shared_type = CPUFREQ_SHARED_TYPE_ANY;             
191                                                                                 
192         }                                        

?
>       problem, so there is apparently something amiss in the processor
>       driver. I've tried to figure out what, but I couldn't, despite the
>       AML suggesting that this might be some _OSC invocation (but if it
>       is, I can't find it - acpi_run_osc() clearly does not anywhere get
>       invoked in a per-CPU fashion).
> 
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -13,6 +13,7 @@
>  #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
>  
>  #include "cpu.h"
> +#include "mcheck/x86_mca.h"
>  
>  bool_t opt_arat = 1;
>  boolean_param("arat", opt_arat);
> @@ -343,6 +344,9 @@ static void __init early_cpu_detect(void
>  			hap_paddr_bits = PADDR_BITS;
>  	}
>  
> +	if (c->x86_vendor != X86_VENDOR_AMD)
> +		park_offline_cpus = opt_mce;
> +
>  	initialize_cpu_data(0);
>  }
>  
> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
> @@ -636,8 +636,6 @@ static void clear_cmci(void)
>  
>  static void cpu_mcheck_disable(void)
>  {
> -    clear_in_cr4(X86_CR4_MCE);
> -
>      if ( cmci_support && opt_mce )
>          clear_cmci();
>  }
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -68,19 +68,26 @@ physid_mask_t phys_cpu_present_map;
>  
>  void __init set_nr_cpu_ids(unsigned int max_cpus)
>  {
> +	unsigned int tot_cpus = num_processors + disabled_cpus;
> +
>  	if (!max_cpus)
> -		max_cpus = num_processors + disabled_cpus;
> +		max_cpus = tot_cpus;
>  	if (max_cpus > NR_CPUS)
>  		max_cpus = NR_CPUS;
>  	else if (!max_cpus)
>  		max_cpus = 1;
>  	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
>  	       max_cpus, max_t(int, max_cpus - num_processors, 0));
> -	nr_cpu_ids = max_cpus;
> +
> +	if (!park_offline_cpus)
> +		tot_cpus = max_cpus;
> +	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
> +	if (park_offline_cpus && nr_cpu_ids < num_processors)
> +		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
> +		       num_processors - nr_cpu_ids);
>  
>  #ifndef nr_cpumask_bits
> -	nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
> -			  ~(BITS_PER_LONG - 1);
> +	nr_cpumask_bits = ROUNDUP(nr_cpu_ids, BITS_PER_LONG);
>  	printk(XENLOG_DEBUG "NR_CPUS:%u nr_cpumask_bits:%u\n",
>  	       NR_CPUS, nr_cpumask_bits);
>  #endif
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -665,7 +665,7 @@ void __init noreturn __start_xen(unsigne
>  {
>      char *memmap_type = NULL;
>      char *cmdline, *kextra, *loader;
> -    unsigned int initrdidx;
> +    unsigned int initrdidx, num_parked = 0;
>      multiboot_info_t *mbi;
>      module_t *mod;
>      unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
> @@ -1512,7 +1512,8 @@ void __init noreturn __start_xen(unsigne
>      else
>      {
>          set_nr_cpu_ids(max_cpus);
> -        max_cpus = nr_cpu_ids;
> +        if ( !max_cpus )
> +            max_cpus = nr_cpu_ids;
>      }
>  
>      if ( xen_guest )
> @@ -1635,16 +1636,27 @@ void __init noreturn __start_xen(unsigne
>              /* Set up node_to_cpumask based on cpu_to_node[]. */
>              numa_add_cpu(i);
>  
> -            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
> +            if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
> +                 !cpu_online(i) )
>              {
>                  int ret = cpu_up(i);
>                  if ( ret != 0 )
>                      printk("Failed to bring up CPU %u (error %d)\n", i, ret);
> +                else if ( num_online_cpus() > max_cpus )
> +                {
> +                    ret = cpu_down(i);
> +                    if ( !ret )
> +                        ++num_parked;
> +                    else
> +                        printk("Could not re-offline CPU%u (%d)\n", i, ret);
> +                }
>              }
>          }
>      }
>  
>      printk("Brought up %ld CPUs\n", (long)num_online_cpus());
> +    if ( num_parked )
> +        printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
>      smp_cpus_done();
>  
>      do_initcalls();
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -342,12 +342,6 @@ static always_inline void set_in_cr4 (un
>      write_cr4(read_cr4() | mask);
>  }
>  
> -static always_inline void clear_in_cr4 (unsigned long mask)
> -{
> -    mmu_cr4_features &= ~mask;
> -    write_cr4(read_cr4() & ~mask);
> -}
> -
>  static inline unsigned int read_pkru(void)
>  {
>      unsigned int pkru;
> 
> 
> 
> 



* Re: [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-18 23:48   ` Konrad Rzeszutek Wilk
@ 2018-07-19  8:25     ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2018-07-19  8:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, xen-devel

>>> On 19.07.18 at 01:48, <konrad.wilk@oracle.com> wrote:
> On Wed, Jul 18, 2018 at 02:21:53AM -0600, Jan Beulich wrote:
>> Reportedly Intel CPUs which can't broadcast #MC to all targeted
>> cores/threads because some have CR4.MCE clear will shut down. Therefore
>> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
>> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>> 
>> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
>> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
>> altogether right here.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Use ROUNDUP().
>> ---
>> Instead of fully bringing up CPUs and then calling cpu_down(), another
>> option would be to suppress/cancel full bringup in smp_callin(). But I
>> guess we should try to keep things simple for now, and see later whether
>> this can be "optimized".
>> ---
>> Note: The parked CPUs can be brought online (i.e. the meaning of
>>       "maxcpus=" isn't as strict anymore as it was before), but won't
>>       immediately be used for scheduling pre-existing Dom0 CPUs. That's
>>       because dom0_setup_vcpu() artificially restricts the affinity. For
>>       DomU-s whose affinity was not artificially restricted, no such
>>       limitation exists, albeit the shown "soft" affinity appears to
>>       suffer a similar issue. As that's not a goal of this patch, I've
>>       put the issues on the side for now, perhaps for someone else to
>>       take care of.
>> Note: On one of my test systems the parked CPUs get _PSD data reported
>>       by Dom0 that is different from the non-parked ones (coord_type is
>>       0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
> 
> From drivers/xen/xen-acpi-processor.c:
> 
> 181         /* 'acpi_processor_preregister_performance' does not parse if the       
> 182          * num_processors <= 1, but Xen still requires it. Do it manually here. 
> 183          */                                                                     
> 184         if (pdomain->num_processors <= 1) {                                     
> 185                 if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ALL)            
> 186                         dst->shared_type = CPUFREQ_SHARED_TYPE_ALL;             
> 187                 else if (pdomain->coord_type == DOMAIN_COORD_TYPE_HW_ALL)       
> 188                         dst->shared_type = CPUFREQ_SHARED_TYPE_HW;              
> 189                 else if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ANY)       
> 190                         dst->shared_type = CPUFREQ_SHARED_TYPE_ANY;             
> 191                                                                                 
> 192         }                                        
> 
> ?

Yes, I had found that code, but pdomain->num_processors shouldn't
depend on the number of _v_CPU-s in Dom0 afaict. Yet as said - the
problem went away when running Dom0 with as many vCPU-s as
there are onlined _and_ parked threads / cores. When I get back to
debug this further (unless an explanation / solution turns up earlier)
I could certainly go and double check whether this code comes into
play at all, and if so whether it has a bad effect in the particular case
here.

Jan





end of thread, other threads:[~2018-07-19  8:25 UTC | newest]

Thread overview: 25+ messages
2018-07-18  8:15 [PATCH v2 0/6] x86: (allow to) suppress use of hyper-threading Jan Beulich
2018-07-18  8:18 ` [PATCH v2 1/6] cpupools: fix state when downing a CPU failed Jan Beulich
2018-07-18  8:19 ` [PATCH v2 2/6] x86: distinguish CPU offlining from CPU removal Jan Beulich
2018-07-18  8:49   ` Wei Liu
2018-07-18 11:37   ` Roger Pau Monné
2018-07-18 14:30     ` Jan Beulich
2018-07-18 14:35       ` Andrew Cooper
2018-07-18 12:53   ` Andrew Cooper
2018-07-18  8:20 ` [PATCH v2 3/6] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
2018-07-18 11:45   ` Roger Pau Monné
2018-07-18 12:54   ` Andrew Cooper
2018-07-18  8:21 ` [PATCH v2 4/6] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
2018-07-18 13:10   ` Andrew Cooper
2018-07-18 13:36   ` Roger Pau Monné
2018-07-18 13:56   ` Wei Liu
2018-07-18 14:02     ` Jan Beulich
2018-07-18 23:48   ` Konrad Rzeszutek Wilk
2018-07-19  8:25     ` Jan Beulich
2018-07-18  8:24 ` [PATCH v2 5/6] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
2018-07-18 13:43   ` Roger Pau Monné
2018-07-18 13:48   ` Wei Liu
2018-07-18 14:00     ` Jan Beulich
2018-07-18  8:24 ` [PATCH v2 6/6] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
2018-07-18 13:44   ` [PATCH v2 6/6] cpumask: tidy {, z}alloc_cpumask_var() Roger Pau Monné
     [not found] ` <5B4EF7D302000078001D53D1@suse.com>
2018-07-18  9:13   ` [PATCH v2 1/6] cpupools: fix state when downing a CPU failed Juergen Gross
