* [PATCH 0/8] x86: (allow to) suppress use of hyper-threading
@ 2018-07-11 11:55 Jan Beulich
  2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 11:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

I've been considering adding a respective command line option for
quite a long time, but never got around to it. Now that the TLBleed
information is public[1], we're at a point where we not only want,
but need this, and where it perhaps needs to be the default on
affected systems. The first 5 patches are all prerequisites to the
6th one; the final two simply address things I had noticed while
putting together the rest.

1: cpupools: fix state when downing a CPU failed
2: x86: distinguish CPU offlining from CPU removal
3: allow cpu_down() to be called earlier
4: x86/AMD: distinguish compute units from hyper-threads
5: x86: bring up all CPUs even if not all are supposed to be used
6: x86: command line option to avoid use of secondary hyper-threads
7: x86/shim: fully ignore "nosmp" and "maxcpus="
8: cpumask: tidy {,z}alloc_cpumask_var() 

Jan

[1] https://www.vusec.net/projects/tlbleed/



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH 1/8] cpupools: fix state when downing a CPU failed
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
@ 2018-07-11 12:04 ` Jan Beulich
  2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Dario Faggioli

While I've run into the issue with further patches in place which no
longer guarantee the per-CPU area to start out as all zeros, the
CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
of schedule_cpu_switch() will trigger the "c != old_pool" assertion
there.

Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
should not happen before cpu_disable_scheduler()). Clearing it in both
CPU_DEAD and CPU_DOWN_FAILED would be an option, but would duplicate the
same piece of code. Since the field's value shouldn't matter while the
CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED,
except in the suspend/resume case (which gets specially handled in
cpupool_cpu_remove()).
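The intended notifier behaviour can be sketched in standalone C (hypothetical
names and state values for illustration only; this is a model, not the actual
Xen code):

```c
#include <assert.h>
#include <stddef.h>

/* Model of the cpupool notifier change: the per-CPU cpupool pointer is
 * cleared on CPU_ONLINE and CPU_DOWN_FAILED, but only while system_state
 * is still at or below "active"; the suspend/resume path keeps the
 * pointer for cpupool_cpu_remove() to deal with. */
enum model_sys_state { MODEL_STATE_active, MODEL_STATE_resume };
enum model_cpu_event { MODEL_CPU_ONLINE, MODEL_CPU_DOWN_FAILED,
                       MODEL_CPU_DOWN_PREPARE };

struct model_pool { int id; };

/* Returns the value the per-CPU cpupool pointer should hold after the
 * notifier ran for the given event. */
static struct model_pool *model_callback(enum model_cpu_event ev,
                                         enum model_sys_state state,
                                         struct model_pool *cur)
{
    switch (ev) {
    case MODEL_CPU_DOWN_FAILED:
    case MODEL_CPU_ONLINE:
        if (state <= MODEL_STATE_active)
            cur = NULL; /* zap stale pointer before cpupool_cpu_add() */
        break;
    default: /* other events leave the field alone in this model */
        break;
    }
    return cur;
}
```

With the pointer zapped, the subsequent (indirect) schedule_cpu_switch() sees
NULL rather than a stale pool, avoiding the "c != old_pool" assertion.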

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: I think it would be better to call schedule_cpu_switch(cpu, NULL)
     from cpupool_cpu_remove(), but besides that - as per above - likely
     being too early, that function has further prerequisites to be met.
     It also doesn't look as if cpupool_unassign_cpu_helper() could be
     used there.

--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -778,6 +778,8 @@ static int cpu_callback(
     {
     case CPU_DOWN_FAILED:
     case CPU_ONLINE:
+        if ( system_state <= SYS_STATE_active )
+            per_cpu(cpupool, cpu) = NULL;
         rc = cpupool_cpu_add(cpu);
         break;
     case CPU_DOWN_PREPARE:


* [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
  2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
@ 2018-07-11 12:06 ` Jan Beulich
  2018-07-12 10:53   ` Wei Liu
  2018-07-12 12:42   ` Andrew Cooper
  2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

In order to be able to service #MC on offlined CPUs, the GDT, IDT, stack,
and per-CPU data (which includes the TSS) need to be kept allocated.
They should only be freed upon CPU removal (which we don't currently
support, so some code becomes effectively dead for the moment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
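The allocation policy introduced by the cond_*alloc_cpumask_var() helpers
below can be sketched as follows (simplified to a plain byte buffer for
illustration; the real helpers operate on cpumask_var_t):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the cond_zalloc idea: when a parked CPU comes back online,
 * its earlier allocation is reused and scrubbed, rather than being
 * freed on offline and re-allocated on re-online. */
static int cond_zalloc_buf(unsigned char **buf, size_t size)
{
    if (*buf == NULL)
        *buf = calloc(1, size);   /* first bring-up: fresh allocation */
    else
        memset(*buf, 0, size);    /* re-online of parked CPU: just clear */
    return *buf != NULL;
}
```

Either way the caller sees zeroed contents, which is why the CPU_DEAD
path may skip freeing when offline CPUs get parked.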

--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -692,12 +692,15 @@ static void cpu_bank_free(unsigned int c
 
     mcabanks_free(poll);
     mcabanks_free(clr);
+
+    per_cpu(poll_bankmask, cpu) = NULL;
+    per_cpu(mce_clear_banks, cpu) = NULL;
 }
 
 static int cpu_bank_alloc(unsigned int cpu)
 {
-    struct mca_banks *poll = mcabanks_alloc();
-    struct mca_banks *clr = mcabanks_alloc();
+    struct mca_banks *poll = per_cpu(poll_bankmask, cpu) ?: mcabanks_alloc();
+    struct mca_banks *clr = per_cpu(mce_clear_banks, cpu) ?: mcabanks_alloc();
 
     if ( !poll || !clr )
     {
@@ -725,7 +728,13 @@ static int cpu_callback(
 
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_bank_free(cpu);
+        if ( !park_offline_cpus )
+            cpu_bank_free(cpu);
+        break;
+
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            cpu_bank_free(cpu);
         break;
     }
 
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -107,10 +107,11 @@ static void play_dead(void)
     local_irq_disable();
 
     /*
-     * NOTE: After cpu_exit_clear, per-cpu variables are no longer accessible,
-     * as they may be freed at any time. In this case, heap corruption or
-     * #PF can occur (when heap debugging is enabled). For example, even
-     * printk() can involve tasklet scheduling, which touches per-cpu vars.
+     * NOTE: After cpu_exit_clear, per-cpu variables may no longer be accessible,
+     * as they may be freed at any time if offline CPUs don't get parked. In
+     * this case, heap corruption or #PF can occur (when heap debugging is
+     * enabled). For example, even printk() can involve tasklet scheduling,
+     * which touches per-cpu vars.
      * 
      * Consider very carefully when adding code to *dead_idle. Most hypervisor
      * subsystems are unsafe to call.
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -201,18 +201,25 @@ static int update_clusterinfo(
         if ( !cluster_cpus_spare )
             cluster_cpus_spare = xzalloc(cpumask_t);
         if ( !cluster_cpus_spare ||
-             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
+             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
             err = -ENOMEM;
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
+    case CPU_REMOVE:
+        if ( park_offline_cpus == (action != CPU_REMOVE) )
+            break;
         if ( per_cpu(cluster_cpus, cpu) )
         {
             cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
             if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
+            {
                 xfree(per_cpu(cluster_cpus, cpu));
+                per_cpu(cluster_cpus, cpu) = NULL;
+            }
         }
         free_cpumask_var(per_cpu(scratch_mask, cpu));
+        clear_cpumask_var(&per_cpu(scratch_mask, cpu));
         break;
     }
 
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -28,7 +28,7 @@ static int init_percpu_area(unsigned int
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return -EBUSY;
+        return 0;
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;
@@ -76,9 +76,12 @@ static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        free_percpu_area(cpu);
+        if ( !park_offline_cpus )
+            free_percpu_area(cpu);
         break;
-    default:
+    case CPU_REMOVE:
+        if ( park_offline_cpus )
+            free_percpu_area(cpu);
         break;
     }
 
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
 cpumask_t cpu_online_map __read_mostly;
 EXPORT_SYMBOL(cpu_online_map);
 
+bool __read_mostly park_offline_cpus;
+
 unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
 static cpumask_t *secondary_socket_cpumask;
@@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
     }
 }
 
-static void cpu_smpboot_free(unsigned int cpu)
+static void cpu_smpboot_free(unsigned int cpu, bool all)
 {
     unsigned int order, socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
@@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
         socket_cpumask[socket] = NULL;
     }
 
-    c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
-    c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
-    c[cpu].compute_unit_id = INVALID_CUID;
     cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
 
-    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
-    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
-    if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
-        free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+    if ( all )
+    {
+        c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
+        c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
+        c[cpu].compute_unit_id = INVALID_CUID;
+
+        free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
+        clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
+        free_cpumask_var(per_cpu(cpu_core_mask, cpu));
+        clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
+        if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
+        {
+            free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+            clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
+        }
+    }
 
     cleanup_cpu_root_pgt(cpu);
 
@@ -928,19 +939,26 @@ static void cpu_smpboot_free(unsigned in
     }
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+    if ( all )
+    {
+        free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+        per_cpu(gdt_table, cpu) = NULL;
+    }
 
     free_xenheap_pages(per_cpu(compat_gdt_table, cpu), order);
 
-    order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    free_xenheap_pages(idt_tables[cpu], order);
-    idt_tables[cpu] = NULL;
-
-    if ( stack_base[cpu] != NULL )
+    if ( all )
     {
-        memguard_unguard_stack(stack_base[cpu]);
-        free_xenheap_pages(stack_base[cpu], STACK_ORDER);
-        stack_base[cpu] = NULL;
+        order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
+        free_xenheap_pages(idt_tables[cpu], order);
+        idt_tables[cpu] = NULL;
+
+        if ( stack_base[cpu] != NULL )
+        {
+            memguard_unguard_stack(stack_base[cpu]);
+            free_xenheap_pages(stack_base[cpu], STACK_ORDER);
+            stack_base[cpu] = NULL;
+        }
     }
 }
 
@@ -955,15 +973,19 @@ static int cpu_smpboot_alloc(unsigned in
     if ( node != NUMA_NO_NODE )
         memflags = MEMF_node(node);
 
-    stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
+    if ( stack_base[cpu] == NULL )
+        stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
     if ( stack_base[cpu] == NULL )
         goto out;
     memguard_guard_stack(stack_base[cpu]);
 
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
-    per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
+    gdt = per_cpu(gdt_table, cpu);
+    if ( gdt == NULL )
+        gdt = alloc_xenheap_pages(order, memflags);
     if ( gdt == NULL )
         goto out;
+    per_cpu(gdt_table, cpu) = gdt;
     memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
     BUILD_BUG_ON(NR_CPUS > 0x10000);
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -975,7 +997,8 @@ static int cpu_smpboot_alloc(unsigned in
     gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
 
     order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
-    idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
+    if ( idt_tables[cpu] == NULL )
+        idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
     if ( idt_tables[cpu] == NULL )
         goto out;
     memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
@@ -1003,16 +1026,16 @@ static int cpu_smpboot_alloc(unsigned in
          (secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
         goto out;
 
-    if ( !(zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-           zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
-           alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+    if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
+           cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+           cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
         goto out;
 
     rc = 0;
 
  out:
     if ( rc )
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, true);
 
     return rc;
 }
@@ -1030,9 +1053,10 @@ static int cpu_smpboot_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        cpu_smpboot_free(cpu);
+        cpu_smpboot_free(cpu, !park_offline_cpus);
         break;
-    default:
+    case CPU_REMOVE:
+        cpu_smpboot_free(cpu, true);
         break;
     }
 
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,8 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibli
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+extern bool park_offline_cpus;
+
 void smp_send_nmi_allbutself(void);
 
 void send_IPI_mask(const cpumask_t *, int vector);
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -47,6 +47,8 @@ void register_cpu_notifier(struct notifi
 #define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
 /* CPU_DEAD: CPU is dead. */
 #define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
+/* CPU_REMOVE: CPU was removed. */
+#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
 
 /* Perform CPU hotplug. May return -EAGAIN. */
 int cpu_down(unsigned int cpu);
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -351,16 +351,37 @@ static inline bool_t alloc_cpumask_var(c
 	return *mask != NULL;
 }
 
+static inline bool cond_alloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	return *mask != NULL;
+}
+
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
+static inline bool cond_zalloc_cpumask_var(cpumask_var_t *mask)
+{
+	if (*mask == NULL)
+		*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	else
+		cpumask_clear(*mask);
+	return *mask != NULL;
+}
+
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 	xfree(mask);
 }
+
+static inline void clear_cpumask_var(cpumask_var_t *mask)
+{
+	*mask = NULL;
+}
 #else
 typedef cpumask_t cpumask_var_t[1];
 
@@ -368,16 +389,22 @@ static inline bool_t alloc_cpumask_var(c
 {
 	return 1;
 }
+#define cond_alloc_cpumask_var alloc_cpumask_var
 
 static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
 	return 1;
 }
+#define cond_zalloc_cpumask_var zalloc_cpumask_var
 
 static inline void free_cpumask_var(cpumask_var_t mask)
 {
 }
+
+static inline void clear_cpumask_var(cpumask_var_t *mask)
+{
+}
 #endif
 
 #if NR_CPUS > 1


* [PATCH 3/8] allow cpu_down() to be called earlier
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
  2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
  2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-11 12:06 ` Jan Beulich
  2018-07-12 10:55   ` Wei Liu
  2018-07-12 12:44   ` Andrew Cooper
  2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

The function's use of the stop-machine logic has so far prevented its
use ahead of the processing of the "ordinary" initcalls. Since at this
early time we're in a controlled environment anyway, there's no need for
such a heavy tool. Additionally this ought to have less of a performance
impact especially on large systems, compared to the alternative of
making stop-machine functionality available earlier.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
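The path selection added to cpu_down() can be modelled in standalone C
(hypothetical enum and function names; the real code compares system_state
against SYS_STATE_active):

```c
#include <assert.h>

/* Model: before the system reaches the "active" state the environment is
 * controlled enough that an IPI to just the target CPU suffices, so the
 * heavier stop-machine rendezvous is reserved for runtime offlining. */
enum model_state { MODEL_early_boot, MODEL_boot, MODEL_active };
enum model_path { MODEL_PATH_IPI, MODEL_PATH_STOP_MACHINE };

static enum model_path pick_takedown_path(enum model_state s)
{
    /* mirrors: if ( unlikely(system_state < SYS_STATE_active) ) ... */
    return s < MODEL_active ? MODEL_PATH_IPI : MODEL_PATH_STOP_MACHINE;
}
```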

--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -67,12 +67,17 @@ void __init register_cpu_notifier(struct
     spin_unlock(&cpu_add_remove_lock);
 }
 
-static int take_cpu_down(void *unused)
+static void _take_cpu_down(void *unused)
 {
     void *hcpu = (void *)(long)smp_processor_id();
     int notifier_rc = notifier_call_chain(&cpu_chain, CPU_DYING, hcpu, NULL);
     BUG_ON(notifier_rc != NOTIFY_DONE);
     __cpu_disable();
+}
+
+static int take_cpu_down(void *arg)
+{
+    _take_cpu_down(arg);
     return 0;
 }
 
@@ -98,7 +103,9 @@ int cpu_down(unsigned int cpu)
         goto fail;
     }
 
-    if ( (err = stop_machine_run(take_cpu_down, NULL, cpu)) < 0 )
+    if ( unlikely(system_state < SYS_STATE_active) )
+        on_selected_cpus(cpumask_of(cpu), _take_cpu_down, NULL, true);
+    else if ( (err = stop_machine_run(take_cpu_down, NULL, cpu)) < 0 )
         goto fail;
 
     __cpu_die(cpu);


* [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (2 preceding siblings ...)
  2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
@ 2018-07-11 12:07 ` Jan Beulich
  2018-07-11 18:11   ` Brian Woods
  2018-07-12 13:02   ` Andrew Cooper
  2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

Fam17 replaces CUs with HTs, which we should reflect accordingly, even if
the difference is not very big. The most relevant change (requiring some
code restructuring) is that the topoext feature no longer implies there is
a valid CU ID.

Take the opportunity and convert wrongly plain int variables in
set_cpu_sibling_map() to unsigned int.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
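The reworked sibling test in set_cpu_sibling_map() boils down to the
following decision ladder (condensed into a standalone sketch; the two
CPUs are assumed to already share phys_proc_id, and the INVALID_* values
here are placeholders):

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_CUID        0xffu
#define XEN_INVALID_CORE_ID 0xffffu

struct topo { unsigned int compute_unit_id, cpu_core_id; };

/* Prefer compute-unit IDs when both CPUs have a valid one (pre-Fam17 AMD
 * with topoext), otherwise fall back to core IDs (Fam17 and others).
 * Returns true when the two CPUs are threads of the same core/CU. */
static bool same_sibling_group(const struct topo *a, const struct topo *b)
{
    if (a->compute_unit_id != INVALID_CUID &&
        b->compute_unit_id != INVALID_CUID)
        return a->compute_unit_id == b->compute_unit_id;
    if (a->cpu_core_id != XEN_INVALID_CORE_ID &&
        b->cpu_core_id != XEN_INVALID_CORE_ID)
        return a->cpu_core_id == b->cpu_core_id;
    return false; /* unclear relationship; the real code logs a warning */
}
```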

--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
                 u32 eax, ebx, ecx, edx;
 
                 cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-                c->compute_unit_id = ebx & 0xFF;
                 c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+
+		if (c->x86 < 0x17)
+	                c->compute_unit_id = ebx & 0xFF;
+		else {
+	                c->cpu_core_id = ebx & 0xFF;
+			c->x86_max_cores /= c->x86_num_siblings;
+		}
         }
         
         if (opt_cpu_info)
                 printk("CPU %d(%d) -> Processor %d, %s %d\n",
                        cpu, c->x86_max_cores, c->phys_proc_id,
-                       cpu_has(c, X86_FEATURE_TOPOEXT) ? "Compute Unit" : 
-                                                         "Core",
-                       cpu_has(c, X86_FEATURE_TOPOEXT) ? c->compute_unit_id :
-                                                         c->cpu_core_id);
+                       c->compute_unit_id != INVALID_CUID ? "Compute Unit"
+                                                          : "Core",
+                       c->compute_unit_id != INVALID_CUID ? c->compute_unit_id
+                                                          : c->cpu_core_id);
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -236,33 +236,41 @@ static void link_thread_siblings(int cpu
     cpumask_set_cpu(cpu2, per_cpu(cpu_core_mask, cpu1));
 }
 
-static void set_cpu_sibling_map(int cpu)
+static void set_cpu_sibling_map(unsigned int cpu)
 {
-    int i;
+    unsigned int i;
     struct cpuinfo_x86 *c = cpu_data;
 
     cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
 
     cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
+    cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
+    cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
 
     if ( c[cpu].x86_num_siblings > 1 )
     {
         for_each_cpu ( i, &cpu_sibling_setup_map )
         {
-            if ( cpu_has(c, X86_FEATURE_TOPOEXT) ) {
-                if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
-                     (c[cpu].compute_unit_id == c[i].compute_unit_id) )
+            if ( cpu == i || c[cpu].phys_proc_id != c[i].phys_proc_id )
+                continue;
+            if ( c[cpu].compute_unit_id != INVALID_CUID &&
+                 c[i].compute_unit_id != INVALID_CUID )
+            {
+                if ( c[cpu].compute_unit_id == c[i].compute_unit_id )
                     link_thread_siblings(cpu, i);
-            } else if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
-                        (c[cpu].cpu_core_id == c[i].cpu_core_id) ) {
-                link_thread_siblings(cpu, i);
             }
+            else if ( c[cpu].cpu_core_id != XEN_INVALID_CORE_ID &&
+                      c[i].cpu_core_id != XEN_INVALID_CORE_ID )
+            {
+                if ( c[cpu].cpu_core_id == c[i].cpu_core_id )
+                    link_thread_siblings(cpu, i);
+            }
+            else
+                printk(XENLOG_WARNING
+                       "CPU%u: unclear relationship with CPU%u\n",
+                       cpu, i);
         }
     }
-    else
-    {
-        cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
-    }
 
     if ( c[cpu].x86_max_cores == 1 )
     {


* [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (3 preceding siblings ...)
  2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-11 12:09 ` Jan Beulich
  2018-07-12 15:38   ` Andrew Cooper
  2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:09 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Reportedly, Intel CPUs which can't broadcast #MC to all targeted
cores/threads (because some have CR4.MCE clear) will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.

The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Instead of fully bringing up CPUs and then calling cpu_down(), another
option would be to suppress/cancel full bringup in smp_callin().
---
Note: The parked CPUs can be brought online (i.e. the meaning of
      "maxcpus=" isn't as strict anymore as it was before), but won't
      immediately be used for scheduling pre-existing Dom0 CPUs. That's
      because dom0_setup_vcpu() artificially restricts the affinity. For
      DomU-s whose affinity was not artificially restricted, no such
      limitation exists, although the shown "soft" affinity appears to
      suffer a similar issue. As that's not a goal of this patch, I've
      put these issues aside for now, perhaps for someone else to take
      care of.
Note: On one of my test systems the parked CPUs get _PSD data reported
      by Dom0 that is different from the non-parked ones (coord_type is
      0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
      problem, so there is apparently something amiss in the processor
      driver. I've tried to figure out what, but I couldn't, despite the
      AML suggesting that this might be some _OSC invocation (but if it
      is, I can't find it - acpi_run_osc() clearly does not anywhere get
      invoked in a per-CPU fashion).
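The boot-time flow this patch adds to __start_xen() can be sketched with
simple counters standing in for cpu_up()/cpu_down() (an illustrative model,
not the real code):

```c
#include <assert.h>

/* Bring every present CPU online (so CR4.MCE gets set everywhere), then
 * immediately take CPUs beyond "maxcpus=" back down into the parked
 * state. Returns the number of parked CPUs; *online reflects how many
 * stay online. Error paths of the real cpu_up()/cpu_down() are omitted. */
static unsigned int park_excess(unsigned int present, unsigned int max_cpus,
                                unsigned int *online)
{
    unsigned int parked = 0;
    unsigned int i;

    *online = 0;
    for (i = 0; i < present; i++) {
        ++*online;                 /* cpu_up(i): CR4.MCE now set there */
        if (*online > max_cpus) {  /* beyond maxcpus=: park it again */
            --*online;             /* cpu_down(i) */
            ++parked;
        }
    }
    return parked;
}
```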

--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -13,6 +13,7 @@
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
 
 #include "cpu.h"
+#include "mcheck/x86_mca.h"
 
 bool_t opt_arat = 1;
 boolean_param("arat", opt_arat);
@@ -343,6 +344,9 @@ static void __init early_cpu_detect(void
 			hap_paddr_bits = PADDR_BITS;
 	}
 
+	if (c->x86_vendor != X86_VENDOR_AMD)
+		park_offline_cpus = opt_mce;
+
 	initialize_cpu_data(0);
 }
 
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -636,8 +636,6 @@ static void clear_cmci(void)
 
 static void cpu_mcheck_disable(void)
 {
-    clear_in_cr4(X86_CR4_MCE);
-
     if ( cmci_support && opt_mce )
         clear_cmci();
 }
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -68,18 +68,26 @@ physid_mask_t phys_cpu_present_map;
 
 void __init set_nr_cpu_ids(unsigned int max_cpus)
 {
+	unsigned int tot_cpus = num_processors + disabled_cpus;
+
 	if (!max_cpus)
-		max_cpus = num_processors + disabled_cpus;
+		max_cpus = tot_cpus;
 	if (max_cpus > NR_CPUS)
 		max_cpus = NR_CPUS;
 	else if (!max_cpus)
 		max_cpus = 1;
 	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
 	       max_cpus, max_t(int, max_cpus - num_processors, 0));
-	nr_cpu_ids = max_cpus;
+
+	if (!park_offline_cpus)
+		tot_cpus = max_cpus;
+	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
+	if (park_offline_cpus && nr_cpu_ids < num_processors)
+		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
+		       num_processors - nr_cpu_ids);
 
 #ifndef nr_cpumask_bits
-	nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
+	nr_cpumask_bits = (nr_cpu_ids + (BITS_PER_LONG - 1)) &
 			  ~(BITS_PER_LONG - 1);
 	printk(XENLOG_DEBUG "NR_CPUS:%u nr_cpumask_bits:%u\n",
 	       NR_CPUS, nr_cpumask_bits);
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -665,7 +665,7 @@ void __init noreturn __start_xen(unsigne
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, num_parked = 0;
     multiboot_info_t *mbi;
     module_t *mod;
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1503,7 +1503,8 @@ void __init noreturn __start_xen(unsigne
     else
     {
         set_nr_cpu_ids(max_cpus);
-        max_cpus = nr_cpu_ids;
+        if ( !max_cpus )
+            max_cpus = nr_cpu_ids;
     }
 
     if ( xen_guest )
@@ -1626,16 +1627,27 @@ void __init noreturn __start_xen(unsigne
             /* Set up node_to_cpumask based on cpu_to_node[]. */
             numa_add_cpu(i);
 
-            if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+            if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
+                 !cpu_online(i) )
             {
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+                else if ( num_online_cpus() > max_cpus )
+                {
+                    ret = cpu_down(i);
+                    if ( !ret )
+                        ++num_parked;
+                    else
+                        printk("Could not re-offline CPU%u (%d)\n", i, ret);
+                }
             }
         }
     }
 
     printk("Brought up %ld CPUs\n", (long)num_online_cpus());
+    if ( num_parked )
+        printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
     do_initcalls();
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,12 +342,6 @@ static always_inline void set_in_cr4 (un
     write_cr4(read_cr4() | mask);
 }
 
-static always_inline void clear_in_cr4 (unsigned long mask)
-{
-    mmu_cr4_features &= ~mask;
-    write_cr4(read_cr4() & ~mask);
-}
-
 static inline unsigned int read_pkru(void)
 {
     unsigned int pkru;


* [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (4 preceding siblings ...)
  2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-11 12:10 ` Jan Beulich
  2018-07-12 15:45   ` Andrew Cooper
  2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:10 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

Shared resources (L1 cache and TLB in particular) present a risk of
information leaks via side channels. Don't use hyper-threads in such
cases, but at the same time allow independent control of their use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
An option to avoid the up/down cycle would be to avoid clearing the
sibling (and then perhaps also core) map of parked CPUs, allowing
cpu_up_helper() to bail out early.

TBD: How to prevent the CPU from transiently becoming available for
     scheduling when being onlined at runtime?

TBD: For now the patch assumes all HT-enabled CPUs are affected by side
     channel attacks through shared resources. There are claims that AMD
     ones aren't, but it hasn't really become clear to me why that would
     be, as I don't see the fully associative L1 TLBs to be sufficient
     reason for there to not be other possible avenues (L2 TLB, caches).
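The resolution of the tri-state opt_ht (-1 meaning "no explicit choice on
the command line") across parse_spec_ctrl() and
init_speculation_mitigations() can be condensed into one sketch
(hypothetical helper name and parameterisation):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of how opt_ht settles: disabling speculation mitigations
 * ("spec-ctrl=no") turns an unset opt_ht into 1 (keep hyper-threads),
 * while init_speculation_mitigations() finally defaults any remaining
 * -1 to 0 (park secondary threads). An explicit "ht=" always wins. */
static int resolve_opt_ht(int opt_ht, bool spec_ctrl_disabled)
{
    if (spec_ctrl_disabled && opt_ht < 0)
        opt_ht = 1;
    if (opt_ht < 0)
        opt_ht = 0;
    return opt_ht;
}
```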

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
 ### hpetbroadcast (x86)
 > `= <boolean>`
 
+### ht (x86)
+> `= <boolean>`
+
+Default: `false`
+
+Control bring up of multiple hyper-threads per CPU core.
+
 ### hvm\_debug (x86)
 > `= <integer>`
 
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
 static unsigned int __initdata max_cpus;
 integer_param("maxcpus", max_cpus);
 
+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);
+
 /* opt_invpcid: If false, don't use INVPCID instruction even if available. */
 static bool __initdata opt_invpcid = true;
 boolean_param("invpcid", opt_invpcid);
@@ -1633,7 +1636,10 @@ void __init noreturn __start_xen(unsigne
                 int ret = cpu_up(i);
                 if ( ret != 0 )
                     printk("Failed to bring up CPU %u (error %d)\n", i, ret);
-                else if ( num_online_cpus() > max_cpus )
+                else if ( num_online_cpus() > max_cpus ||
+                          (!opt_ht &&
+                           cpu_data[i].compute_unit_id == INVALID_CUID &&
+                           cpumask_weight(per_cpu(cpu_sibling_mask, i)) > 1) )
                 {
                     ret = cpu_down(i);
                     if ( !ret )
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -23,6 +23,7 @@
 #include <asm/microcode.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 #include <asm/spec_ctrl.h>
 #include <asm/spec_ctrl_asm.h>
 
@@ -126,6 +127,9 @@ static int __init parse_spec_ctrl(const
 
             opt_eager_fpu = 0;
 
+            if ( opt_ht < 0 )
+                opt_ht = 1;
+
         disable_common:
             opt_rsb_pv = false;
             opt_rsb_hvm = false;
@@ -627,6 +631,9 @@ void __init init_speculation_mitigations
     if ( default_xen_spec_ctrl )
         setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
 
+    if ( opt_ht < 0 )
+        opt_ht = 0;
+
     xpti_init_default(false);
     if ( opt_xpti == 0 )
         setup_force_cpu_cap(X86_FEATURE_NO_XPTI);
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -23,6 +23,7 @@
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/support.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 #include <asm/smp.h>
 #include <asm/numa.h>
 #include <xen/nodemask.h>
@@ -48,14 +49,27 @@ static void l3_cache_get(void *arg)
 
 long cpu_up_helper(void *data)
 {
-    int cpu = (unsigned long)data;
+    unsigned int cpu = (unsigned long)data;
     int ret = cpu_up(cpu);
+
     if ( ret == -EBUSY )
     {
         /* On EBUSY, flush RCU work and have one more go. */
         rcu_barrier();
         ret = cpu_up(cpu);
     }
+
+    if ( !ret && !opt_ht &&
+         cpu_data[cpu].compute_unit_id == INVALID_CUID &&
+         cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 )
+    {
+        ret = cpu_down_helper(data);
+        if ( ret )
+            printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
+        else
+            ret = -EPERM;
+    }
+
     return ret;
 }
 
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -59,6 +59,8 @@ extern uint8_t kbd_shift_flags;
 extern unsigned long highmem_start;
 #endif
 
+extern int8_t opt_ht;
+
 #ifdef CONFIG_SHADOW_PAGING
 extern bool opt_dom0_shadow;
 #else




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (5 preceding siblings ...)
  2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-11 12:11 ` Jan Beulich
  2018-07-11 12:23   ` Andrew Cooper
                     ` (2 more replies)
  2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
       [not found] ` <5B45F26A02000078001D312F@suse.com>
  8 siblings, 3 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:11 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne

In the shim case, the number of CPUs should be solely controlled by the
guest configuration file. Make sure the command line options are fully
(and not just partially) ignored.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1498,6 +1498,15 @@ void __init noreturn __start_xen(unsigne
     if ( smp_found_config )
         get_smp_config();
 
+    /*
+     * In the shim case, the number of CPUs should be solely controlled by the
+     * guest configuration file.
+     */
+    if ( pv_shim )
+    {
+        opt_nosmp = false;
+        max_cpus = 0;
+    }
     if ( opt_nosmp )
     {
         max_cpus = 0;




* [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
  2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
                   ` (6 preceding siblings ...)
  2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
@ 2018-07-11 12:12 ` Jan Beulich
  2018-07-11 12:20   ` Andrew Cooper
  2018-07-12 15:13   ` Wei Liu
       [not found] ` <5B45F26A02000078001D312F@suse.com>
  8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall

Drop unnecessary casts and use bool in favor of bool_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -345,9 +345,9 @@ static inline int cpulist_scnprintf(char
 
 typedef cpumask_t *cpumask_var_t;
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -358,9 +358,9 @@ static inline bool cond_alloc_cpumask_va
 	return *mask != NULL;
 }
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
-	*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+	*mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
 	return *mask != NULL;
 }
 
@@ -385,16 +385,16 @@ static inline void clear_cpumask_var(cpu
 #else
 typedef cpumask_t cpumask_var_t[1];
 
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
 {
-	return 1;
+	return true;
 }
 #define cond_alloc_cpumask_var alloc_cpumask_var
 
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
 {
 	cpumask_clear(*mask);
-	return 1;
+	return true;
 }
 #define cond_zalloc_cpumask_var zalloc_cpumask_var
 






* Re: [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
  2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
@ 2018-07-11 12:20   ` Andrew Cooper
  2018-07-12 15:13   ` Wei Liu
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-11 12:20 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall

On 11/07/18 13:12, Jan Beulich wrote:
> Drop unnecessary casts and use bool in favor of bool_t.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
  2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
@ 2018-07-11 12:23   ` Andrew Cooper
  2018-07-11 15:18   ` Roger Pau Monné
  2018-07-11 16:02   ` Wei Liu
  2 siblings, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-11 12:23 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monne

On 11/07/18 13:11, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Ideally with "This option is ignored in **pv-shim** mode" added to the
docs in the relevant places,

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
  2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
  2018-07-11 12:23   ` Andrew Cooper
@ 2018-07-11 15:18   ` Roger Pau Monné
  2018-07-11 16:02   ` Wei Liu
  2 siblings, 0 replies; 39+ messages in thread
From: Roger Pau Monné @ 2018-07-11 15:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Wed, Jul 11, 2018 at 06:11:36AM -0600, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I agree with Andrew that it should be mentioned in the command line
document. Sorry for not doing that before.

Thanks, Roger.


* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
  2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
  2018-07-11 12:23   ` Andrew Cooper
  2018-07-11 15:18   ` Roger Pau Monné
@ 2018-07-11 16:02   ` Wei Liu
  2 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-11 16:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne, Wei Liu, Andrew Cooper

On Wed, Jul 11, 2018 at 06:11:36AM -0600, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
  2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-11 18:11   ` Brian Woods
  2018-07-12 13:02   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Brian Woods @ 2018-07-11 18:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit, Andrew Cooper

On Wed, Jul 11, 2018 at 06:07:42AM -0600, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
> 
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
>                  u32 eax, ebx, ecx, edx;
>  
>                  cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> -                c->compute_unit_id = ebx & 0xFF;
>                  c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
> +
> +		if (c->x86 < 0x17)
> +	                c->compute_unit_id = ebx & 0xFF;
> +		else {
> +	                c->cpu_core_id = ebx & 0xFF;
> +			c->x86_max_cores /= c->x86_num_siblings;
> +		}
>          }
>          
>          if (opt_cpu_info)
>                  printk("CPU %d(%d) -> Processor %d, %s %d\n",
>                         cpu, c->x86_max_cores, c->phys_proc_id,
> -                       cpu_has(c, X86_FEATURE_TOPOEXT) ? "Compute Unit" : 
> -                                                         "Core",
> -                       cpu_has(c, X86_FEATURE_TOPOEXT) ? c->compute_unit_id :
> -                                                         c->cpu_core_id);
> +                       c->compute_unit_id != INVALID_CUID ? "Compute Unit"
> +                                                          : "Core",
> +                       c->compute_unit_id != INVALID_CUID ? c->compute_unit_id
> +                                                          : c->cpu_core_id);
>  }
>  
>  static void early_init_amd(struct cpuinfo_x86 *c)
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -236,33 +236,41 @@ static void link_thread_siblings(int cpu
>      cpumask_set_cpu(cpu2, per_cpu(cpu_core_mask, cpu1));
>  }
>  
> -static void set_cpu_sibling_map(int cpu)
> +static void set_cpu_sibling_map(unsigned int cpu)
>  {
> -    int i;
> +    unsigned int i;
>      struct cpuinfo_x86 *c = cpu_data;
>  
>      cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
>  
>      cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
> +    cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
> +    cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
>  
>      if ( c[cpu].x86_num_siblings > 1 )
>      {
>          for_each_cpu ( i, &cpu_sibling_setup_map )
>          {
> -            if ( cpu_has(c, X86_FEATURE_TOPOEXT) ) {
> -                if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
> -                     (c[cpu].compute_unit_id == c[i].compute_unit_id) )
> +            if ( cpu == i || c[cpu].phys_proc_id != c[i].phys_proc_id )
> +                continue;
> +            if ( c[cpu].compute_unit_id != INVALID_CUID &&
> +                 c[i].compute_unit_id != INVALID_CUID )
> +            {
> +                if ( c[cpu].compute_unit_id == c[i].compute_unit_id )
>                      link_thread_siblings(cpu, i);
> -            } else if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
> -                        (c[cpu].cpu_core_id == c[i].cpu_core_id) ) {
> -                link_thread_siblings(cpu, i);
>              }
> +            else if ( c[cpu].cpu_core_id != XEN_INVALID_CORE_ID &&
> +                      c[i].cpu_core_id != XEN_INVALID_CORE_ID )
> +            {
> +                if ( c[cpu].cpu_core_id == c[i].cpu_core_id )
> +                    link_thread_siblings(cpu, i);
> +            }
> +            else
> +                printk(XENLOG_WARNING
> +                       "CPU%u: unclear relationship with CPU%u\n",
> +                       cpu, i);
>          }
>      }
> -    else
> -    {
> -        cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
> -    }
>  
>      if ( c[cpu].x86_max_cores == 1 )
>      {
> 
> 
> 
> 

Side note: if cpu_core_id isn't the logical cpu, it might be worth
updating the comments in processor.h

Reviewed-by: Brian Woods <brian.woods@amd.com>

-- 
Brian Woods


* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
  2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-12 10:53   ` Wei Liu
  2018-07-12 11:48     ` Jan Beulich
  2018-07-12 12:42   ` Andrew Cooper
  1 sibling, 1 reply; 39+ messages in thread
From: Wei Liu @ 2018-07-12 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
[...]
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> +bool __read_mostly park_offline_cpus;
> +
>  unsigned int __read_mostly nr_sockets;
>  cpumask_t **__read_mostly socket_cpumask;
>  static cpumask_t *secondary_socket_cpumask;
> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
>      }
>  }
>  
> -static void cpu_smpboot_free(unsigned int cpu)
> +static void cpu_smpboot_free(unsigned int cpu, bool all)

I think "all" is too vague. It doesn't convey what constitutes
"partial". But I don't have a better suggestion either.

>  {
>      unsigned int order, socket = cpu_to_socket(cpu);
>      struct cpuinfo_x86 *c = cpu_data;
> @@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
>          socket_cpumask[socket] = NULL;
>      }
>  
> -    c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
> -    c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
> -    c[cpu].compute_unit_id = INVALID_CUID;
>      cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
>  
> -    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
> -    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> -    if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
> -        free_cpumask_var(per_cpu(scratch_cpumask, cpu));
> +    if ( all )
> +    {
> +        c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
> +        c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
> +        c[cpu].compute_unit_id = INVALID_CUID;
> +
> +        free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
> +        clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
> +        free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> +        clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
> +        if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
> +        {
> +            free_cpumask_var(per_cpu(scratch_cpumask, cpu));
> +            clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
> +        }

One thing that's not mentioned in the commit message is why various
masks are kept. I guess they are needed for some other parts of the
system to remain operational.

The rest looks fine to me.

Wei.


* Re: [PATCH 3/8] allow cpu_down() to be called earlier
  2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
@ 2018-07-12 10:55   ` Wei Liu
  2018-07-12 12:44   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-12 10:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 11, 2018 at 06:06:52AM -0600, Jan Beulich wrote:
> The function's use of the stop-machine logic has so far prevented its
> use ahead of the processing of the "ordinary" initcalls. Since at this
> early time we're in a controlled environment anyway, there's no need for
> such a heavy tool. Additionally this ought to have less of a performance
> impact especially on large systems, compared to the alternative of
> making stop-machine functionality available earlier.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
  2018-07-12 10:53   ` Wei Liu
@ 2018-07-12 11:48     ` Jan Beulich
  2018-07-13  8:39       ` Wei Liu
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-12 11:48 UTC (permalink / raw)
  To: Wei Liu
  Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, xen-devel

>>> On 12.07.18 at 12:53, <wei.liu2@citrix.com> wrote:
> On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
> [...]
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>>  cpumask_t cpu_online_map __read_mostly;
>>  EXPORT_SYMBOL(cpu_online_map);
>>  
>> +bool __read_mostly park_offline_cpus;
>> +
>>  unsigned int __read_mostly nr_sockets;
>>  cpumask_t **__read_mostly socket_cpumask;
>>  static cpumask_t *secondary_socket_cpumask;
>> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
>>      }
>>  }
>>  
>> -static void cpu_smpboot_free(unsigned int cpu)
>> +static void cpu_smpboot_free(unsigned int cpu, bool all)
> 
> I think "all" is too vague. It doesn't convey the idea what constitutes
> "partial". But I don't have any better suggestion either.

Indeed I've been trying to come up with a better name before
posting, but couldn't. One thing though - I don't think the name
should in any way be required to express what "partial" means.

>> @@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
>>          socket_cpumask[socket] = NULL;
>>      }
>>  
>> -    c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
>> -    c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
>> -    c[cpu].compute_unit_id = INVALID_CUID;
>>      cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
>>  
>> -    free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> -    free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> -    if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
>> -        free_cpumask_var(per_cpu(scratch_cpumask, cpu));
>> +    if ( all )
>> +    {
>> +        c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
>> +        c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
>> +        c[cpu].compute_unit_id = INVALID_CUID;
>> +
>> +        free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> +        clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
>> +        free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> +        clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
>> +        if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
>> +        {
>> +            free_cpumask_var(per_cpu(scratch_cpumask, cpu));
>> +            clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
>> +        }
> 
> One thing that's not mentioned in the commit message is why various
> masks are kept. I guess they are needed for some other parts of the
> system to remain operational.

Hmm, I've taken the opposite approach: Free in "partial" mode
only what I could prove won't be needed.

Jan




* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
  2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
  2018-07-12 10:53   ` Wei Liu
@ 2018-07-12 12:42   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 12:42 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall

On 11/07/18 13:06, Jan Beulich wrote:
> In order to be able to service #MC on offlined CPUs, GDT, IDT, stack,

For clarity, I'd phrase this as "CPUs, the GDT, ..." to help visually
separate CPUs as it isn't a part of the rest of the list.

> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -201,18 +201,25 @@ static int update_clusterinfo(
>          if ( !cluster_cpus_spare )
>              cluster_cpus_spare = xzalloc(cpumask_t);
>          if ( !cluster_cpus_spare ||
> -             !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
> +             !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
>              err = -ENOMEM;
>          break;
>      case CPU_UP_CANCELED:
>      case CPU_DEAD:
> +    case CPU_REMOVE:
> +        if ( park_offline_cpus == (action != CPU_REMOVE) )
> +            break;
>          if ( per_cpu(cluster_cpus, cpu) )
>          {
>              cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
>              if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
> +            {
>                  xfree(per_cpu(cluster_cpus, cpu));
> +                per_cpu(cluster_cpus, cpu) = NULL;
> +            }
>          }
>          free_cpumask_var(per_cpu(scratch_mask, cpu));
> +        clear_cpumask_var(&per_cpu(scratch_mask, cpu));

freeing and NULL-ing the pointer at the same time is already a common
pattern. How about introducing

/* Free an allocation, and zero the pointer to it. */
#define XFREE(p)
({
    typeof(*p) *_p = (p);

    xfree(_p);
    _p = NULL;
})

and a similar wrapper for FREE_CPUMASK_VAR(p) which takes care of the
NULL-ing as well?

In time as these start to get used more widely, it should reduce the
chances of double-freeing in cleanup paths.

> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>  cpumask_t cpu_online_map __read_mostly;
>  EXPORT_SYMBOL(cpu_online_map);
>  
> +bool __read_mostly park_offline_cpus;
> +
>  unsigned int __read_mostly nr_sockets;
>  cpumask_t **__read_mostly socket_cpumask;
>  static cpumask_t *secondary_socket_cpumask;
> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
>      }
>  }
>  
> -static void cpu_smpboot_free(unsigned int cpu)
> +static void cpu_smpboot_free(unsigned int cpu, bool all)

Perhaps "remove" or "remove_cpu", to match the CPU_REMOVE terminology?

Also, we probably want a comment here explaining the difference between
parking and removing in terms of what infrastructure needs to remain valid.

~Andrew


* Re: [PATCH 3/8] allow cpu_down() to be called earlier
  2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
  2018-07-12 10:55   ` Wei Liu
@ 2018-07-12 12:44   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 12:44 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, Julien Grall

On 11/07/18 13:06, Jan Beulich wrote:
> The function's use of the stop-machine logic has so far prevented its
> use ahead of the processing of the "ordinary" initcalls. Since at this
> early time we're in a controlled environment anyway, there's no need for
> such a heavy tool. Additionally this ought to have less of a performance
> impact especially on large systems, compared to the alternative of
> making stop-machine functionality available earlier.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
  2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
  2018-07-11 18:11   ` Brian Woods
@ 2018-07-12 13:02   ` Andrew Cooper
  2018-07-12 14:22     ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 13:02 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit

On 11/07/18 13:07, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
>
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
>                  u32 eax, ebx, ecx, edx;
>  
>                  cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> -                c->compute_unit_id = ebx & 0xFF;
>                  c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
> +
> +		if (c->x86 < 0x17)
> +	                c->compute_unit_id = ebx & 0xFF;
> +		else {
> +	                c->cpu_core_id = ebx & 0xFF;
> +			c->x86_max_cores /= c->x86_num_siblings;
> +		}

The indentation here is odd.  It turns out the function uses 8 spaces
throughout, rather than tabs like the rest of the file.

Would you mind either retaining spaces, or fixing up the entire function
(probably easiest as a prereq patch) ?

Otherwise LGTM.

~Andrew


* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
  2018-07-12 13:02   ` Andrew Cooper
@ 2018-07-12 14:22     ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-12 14:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit

>>> On 12.07.18 at 15:02, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:07, Jan Beulich wrote:
>> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
>> the difference is not very big. The most relevant change (requiring some
>> code restructuring) is that the topoext feature no longer means there is
>> a valid CU ID.
>>
>> Take the opportunity and convert wrongly plain int variables in
>> set_cpu_sibling_map() to unsigned int.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
>>                  u32 eax, ebx, ecx, edx;
>>  
>>                  cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
>> -                c->compute_unit_id = ebx & 0xFF;
>>                  c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
>> +
>> +		if (c->x86 < 0x17)
>> +	                c->compute_unit_id = ebx & 0xFF;
>> +		else {
>> +	                c->cpu_core_id = ebx & 0xFF;
>> +			c->x86_max_cores /= c->x86_num_siblings;
>> +		}
> 
> The indentation here is odd.  It turns out the function uses 8 spaces
> throughout, rather than tabs like the rest of the file.

Ouch - didn't notice this.

> Would you mind either retaining spaces, or fixing up the entire function
> (probably easiest as a prereq patch) ?

I'll fix this up here, not in a prereq patch (for backporting's sake).
I'll consider adding a follow-up patch.

Jan




* Re: [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
  2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
  2018-07-11 12:20   ` Andrew Cooper
@ 2018-07-12 15:13   ` Wei Liu
  1 sibling, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-12 15:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Jul 11, 2018 at 06:12:14AM -0600, Jan Beulich wrote:
> Drop unnecessary casts and use bool in favor of bool_t.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-12 15:38   ` Andrew Cooper
  2018-07-13  8:11     ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 15:38 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 11/07/18 13:09, Jan Beulich wrote:
> Reportedly Intel CPUs which can't broadcast #MC to all targeted
> cores/threads because some have CR4.MCE clear will shut down. Therefore
> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>
> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
> altogether right here.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Instead of fully bringing up CPUs and then calling cpu_down(), another
> option would be to suppress/cancel full bringup in smp_callin().

What is the practical difference?  When we know ahead of time that we
are intending to part the core, then cancelling in smp_callin() seems
cleaner.

> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -68,18 +68,26 @@ physid_mask_t phys_cpu_present_map;
>  
>  void __init set_nr_cpu_ids(unsigned int max_cpus)
>  {
> +	unsigned int tot_cpus = num_processors + disabled_cpus;
> +
>  	if (!max_cpus)
> -		max_cpus = num_processors + disabled_cpus;
> +		max_cpus = tot_cpus;
>  	if (max_cpus > NR_CPUS)
>  		max_cpus = NR_CPUS;
>  	else if (!max_cpus)
>  		max_cpus = 1;
>  	printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
>  	       max_cpus, max_t(int, max_cpus - num_processors, 0));
> -	nr_cpu_ids = max_cpus;
> +
> +	if (!park_offline_cpus)
> +		tot_cpus = max_cpus;
> +	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
> +	if (park_offline_cpus && nr_cpu_ids < num_processors)
> +		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
> +		       num_processors - nr_cpu_ids);
>  
>  #ifndef nr_cpumask_bits
> -	nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
> +	nr_cpumask_bits = (nr_cpu_ids + (BITS_PER_LONG - 1)) &
>  			  ~(BITS_PER_LONG - 1);

ROUNDUP() ?

~Andrew


* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-12 15:45   ` Andrew Cooper
  2018-07-13  8:13     ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 15:45 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: George Dunlap, Brian Woods, Suravee Suthikulpanit, Dario Faggioli

On 11/07/18 13:10, Jan Beulich wrote:
> Shared resources (L1 cache and TLB in particular) present a risk of
> information leak via side channels. Don't use hyperthreads in such
> cases, but allow independent control of their use at the same time.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> An option to avoid the up/down cycle would be to avoid clearing the
> sibling (and then perhaps also core) map of parked CPUs, allowing to
> bail early from cpu_up_helper().
>
> TBD: How to prevent the CPU from transiently becoming available for
>      scheduling when being onlined at runtime?

This looks like an argument for cancelling at call-in time, no?

> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>  ### hpetbroadcast (x86)
>  > `= <boolean>`
>  
> +### ht (x86)

I'd suggest smt rather than ht here.  SMT is the technical term, while
HT is Intel's marketing name.

~Andrew


* Re: [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
  2018-07-12 15:38   ` Andrew Cooper
@ 2018-07-13  8:11     ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-13  8:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 12.07.18 at 17:38, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:09, Jan Beulich wrote:
>> Reportedly Intel CPUs which can't broadcast #MC to all targeted
>> cores/threads because some have CR4.MCE clear will shut down. Therefore
>> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
>> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>>
>> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
>> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
>> altogether right here.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Instead of fully bringing up CPUs and then calling cpu_down(), another
>> option would be to suppress/cancel full bringup in smp_callin().
> 
> What is the practical difference?  When we know ahead of time that we
> are intending to part the core, then cancelling in smp_callin() seems
> cleaner.

There are things that get left out in that case, in particular anything
done in CPU_STARTING and CPU_ONLINE notifications. A prime
omission here would be mwait_idle_cpu_init() setting up C-state
information for the CPU.

The other risk I see is that of the apparent error that this causes
to be handed up the call tree. As you can see from the cpupool fix
that was necessary, we'd chance to run into more buggy error
handling paths.

All in all - originally I had also thought that this approach might be
cleaner, but the above together with the ease of invoking
cpu_down() (requiring pretty little change overall) made me change
my mind. The main downside is really that runtime soft-online
attempts will make (in patch 6, as mentioned there) the hyperthreads
visible to the scheduler for a brief moment.

Jan




* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-12 15:45   ` Andrew Cooper
@ 2018-07-13  8:13     ` Jan Beulich
  2018-07-16 12:37       ` Andrew Cooper
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-13  8:13 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:10, Jan Beulich wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>  ### hpetbroadcast (x86)
>>  > `= <boolean>`
>>  
>> +### ht (x86)
> 
> I'd suggest smt rather than ht here.  SMT is the technical term, while
> HT is Intel's marketing name.

Hmm, many BIOSes (if they have such an option) talk about HT, which
to me makes "ht" a closer match. How about we allow both?

Jan




* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
  2018-07-12 11:48     ` Jan Beulich
@ 2018-07-13  8:39       ` Wei Liu
  0 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-13  8:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Thu, Jul 12, 2018 at 05:48:40AM -0600, Jan Beulich wrote:
> >>> On 12.07.18 at 12:53, <wei.liu2@citrix.com> wrote:
> > On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
> > [...]
> >> --- a/xen/arch/x86/smpboot.c
> >> +++ b/xen/arch/x86/smpboot.c
> >> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
> >>  cpumask_t cpu_online_map __read_mostly;
> >>  EXPORT_SYMBOL(cpu_online_map);
> >>  
> >> +bool __read_mostly park_offline_cpus;
> >> +
> >>  unsigned int __read_mostly nr_sockets;
> >>  cpumask_t **__read_mostly socket_cpumask;
> >>  static cpumask_t *secondary_socket_cpumask;
> >> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
> >>      }
> >>  }
> >>  
> >> -static void cpu_smpboot_free(unsigned int cpu)
> >> +static void cpu_smpboot_free(unsigned int cpu, bool all)
> > 
> > I think "all" is too vague. It doesn't convey what constitutes
> > "partial". But I don't have any better suggestion either.
> 
> Indeed I've been trying to come up with a better name before
> posting, but couldn't. One thing though - I don't think the name
> should in any way be required to express what "partial" means.
> 

A sentence or two to describe what !all is for would be helpful.

Wei.


* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
       [not found] ` <5B45F26A02000078001D312F@suse.com>
@ 2018-07-13  9:02   ` Juergen Gross
  2018-07-16  9:17     ` Jan Beulich
       [not found]     ` <5B4C629002000078001D4346@suse.com>
  0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-13  9:02 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Dario Faggioli

On 11/07/18 14:04, Jan Beulich wrote:
> While I've run into the issue with further patches in place which no
> longer guarantee the per-CPU area to start out as all zeros, the
> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
> there.
> 
> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
> should not happen before cpu_disable_scheduler()). Clearing it in
> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
> piece of code twice. Since the field's value shouldn't matter while the
> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
> only for other than the suspend/resume case (which gets specially
> handled in cpupool_cpu_remove()).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>      cpupool_cpu_remove(), but besides that - as per above - likely
>      being too early, that function has further prereqs to be met. It
>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>      there.
> 
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -778,6 +778,8 @@ static int cpu_callback(
>      {
>      case CPU_DOWN_FAILED:
>      case CPU_ONLINE:
> +        if ( system_state <= SYS_STATE_active )
> +            per_cpu(cpupool, cpu) = NULL;
>          rc = cpupool_cpu_add(cpu);

Wouldn't it make more sense to clear the field in cpupool_cpu_add()
which already is testing system_state?

Modifying the condition in cpupool_cpu_add() to

  if ( system_state <= SYS_STATE_active )

at the same time would have the benefit to catch problems in case
suspending cpus is failing during SYS_STATE_suspend (I'd expect
triggering the first ASSERT in schedule_cpu_switch() in this case).

It should be noted that this scenario is theoretical only, as today
the CPU_DOWN_FAILED case can't happen in the suspend case.


Juergen


* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
  2018-07-13  9:02   ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Juergen Gross
@ 2018-07-16  9:17     ` Jan Beulich
       [not found]     ` <5B4C629002000078001D4346@suse.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16  9:17 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli

>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
> On 11/07/18 14:04, Jan Beulich wrote:
>> While I've run into the issue with further patches in place which no
>> longer guarantee the per-CPU area to start out as all zeros, the
>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>> there.
>> 
>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>> should not happen before cpu_disable_scheduler()). Clearing it in
>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>> piece of code twice. Since the field's value shouldn't matter while the
>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>> only for other than the suspend/resume case (which gets specially
>> handled in cpupool_cpu_remove()).
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>      being too early, that function has further prereqs to be met. It
>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>      there.
>> 
>> --- a/xen/common/cpupool.c
>> +++ b/xen/common/cpupool.c
>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>      {
>>      case CPU_DOWN_FAILED:
>>      case CPU_ONLINE:
>> +        if ( system_state <= SYS_STATE_active )
>> +            per_cpu(cpupool, cpu) = NULL;
>>          rc = cpupool_cpu_add(cpu);
> 
> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
> which already is testing system_state?

Hmm, this may be a matter of taste: I consider the change done here
a prereq to calling the function in the first place. As said in the
description, I actually think this should come earlier, and it's just that
I can't see how to cleanly do so.

> Modifying the condition in cpupool_cpu_add() to
> 
>   if ( system_state <= SYS_STATE_active )
> 
> at the same time would have the benefit to catch problems in case
> suspending cpus is failing during SYS_STATE_suspend (I'd expect
> triggering the first ASSERT in schedule_cpu_switch() in this case).

You mean the if() there, not the else? If so - how would the "else"
body then ever be reached? IOW if anything I could only see the
"else" to become "else if ( system_state <= SYS_STATE_active )".

Jan




* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
       [not found]     ` <5B4C629002000078001D4346@suse.com>
@ 2018-07-16 11:47       ` Juergen Gross
  2018-07-16 12:19         ` Jan Beulich
       [not found]         ` <5B4C8D3702000078001D45EA@suse.com>
  0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 11:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Dario Faggioli

On 16/07/18 11:17, Jan Beulich wrote:
>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>> On 11/07/18 14:04, Jan Beulich wrote:
>>> While I've run into the issue with further patches in place which no
>>> longer guarantee the per-CPU area to start out as all zeros, the
>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>> there.
>>>
>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>> piece of code twice. Since the field's value shouldn't matter while the
>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>> only for other than the suspend/resume case (which gets specially
>>> handled in cpupool_cpu_remove()).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>      being too early, that function has further prereqs to be met. It
>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>      there.
>>>
>>> --- a/xen/common/cpupool.c
>>> +++ b/xen/common/cpupool.c
>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>      {
>>>      case CPU_DOWN_FAILED:
>>>      case CPU_ONLINE:
>>> +        if ( system_state <= SYS_STATE_active )
>>> +            per_cpu(cpupool, cpu) = NULL;
>>>          rc = cpupool_cpu_add(cpu);
>>
>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>> which already is testing system_state?
> 
> Hmm, this may be a matter of taste: I consider the change done here
> a prereq to calling the function in the first place. As said in the
> description, I actually think this should come earlier, and it's just that
> I can't see how to cleanly do so.
> 
>> Modifying the condition in cpupool_cpu_add() to
>>
>>   if ( system_state <= SYS_STATE_active )
>>
>> at the same time would have the benefit to catch problems in case
>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>> triggering the first ASSERT in schedule_cpu_switch() in this case).
> 
> You mean the if() there, not the else? If so - how would the "else"
> body then ever be reached? IOW if anything I could only see the
> "else" to become "else if ( system_state <= SYS_STATE_active )".

Bad wording on my side.

I should have written "the condition in cpupool_cpu_add() should match
if ( system_state <= SYS_STATE_active )."

So: "if ( system_state > SYS_STATE_active )", as the test is for the
other case.


Juergen


* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
  2018-07-16 11:47       ` Juergen Gross
@ 2018-07-16 12:19         ` Jan Beulich
       [not found]         ` <5B4C8D3702000078001D45EA@suse.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 12:19 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli

>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
> On 16/07/18 11:17, Jan Beulich wrote:
>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>> While I've run into the issue with further patches in place which no
>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>> there.
>>>>
>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>> only for other than the suspend/resume case (which gets specially
>>>> handled in cpupool_cpu_remove()).
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>>      being too early, that function has further prereqs to be met. It
>>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>      there.
>>>>
>>>> --- a/xen/common/cpupool.c
>>>> +++ b/xen/common/cpupool.c
>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>      {
>>>>      case CPU_DOWN_FAILED:
>>>>      case CPU_ONLINE:
>>>> +        if ( system_state <= SYS_STATE_active )
>>>> +            per_cpu(cpupool, cpu) = NULL;
>>>>          rc = cpupool_cpu_add(cpu);
>>>
>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>> which already is testing system_state?
>> 
>> Hmm, this may be a matter of taste: I consider the change done here
>> a prereq to calling the function in the first place. As said in the
>> description, I actually think this should come earlier, and it's just that
>> I can't see how to cleanly do so.

You didn't comment on this one at all, yet it matters for what a v2
is supposed to look like.

>>> Modifying the condition in cpupool_cpu_add() to
>>>
>>>   if ( system_state <= SYS_STATE_active )
>>>
>>> at the same time would have the benefit to catch problems in case
>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>> 
>> You mean the if() there, not the else? If so - how would the "else"
>> body then ever be reached? IOW if anything I could only see the
>> "else" to become "else if ( system_state <= SYS_STATE_active )".
> 
> Bad wording on my side.
> 
> I should have written "the condition in cpupool_cpu_add() should match
> if ( system_state <= SYS_STATE_active )."
> 
> So: "if ( system_state > SYS_STATE_active )", as the test is for the
> other case.

I'd recommend against this, as someone adding a new SYS_STATE_*
past suspend/resume would quite likely miss this one. The strong
ordering of states imo should only be used for active and lower states.
But yes, I could see the if() there to become suspend || resume to
address the problem you describe.

Coming back to your DOWN_FAILED consideration: Why are you
thinking this can't happen during suspend? disable_nonboot_cpus()
uses plain cpu_down() after all.

Jan




* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-13  8:13     ` Jan Beulich
@ 2018-07-16 12:37       ` Andrew Cooper
  2018-07-16 12:53         ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-16 12:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

On 13/07/18 09:13, Jan Beulich wrote:
>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>> On 11/07/18 13:10, Jan Beulich wrote:
>>> --- a/docs/misc/xen-command-line.markdown
>>> +++ b/docs/misc/xen-command-line.markdown
>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>>  ### hpetbroadcast (x86)
>>>  > `= <boolean>`
>>>  
>>> +### ht (x86)
>> I'd suggest smt rather than ht here.  SMT is the technical term, while
>> HT is Intel's marketing name.
> Hmm, many BIOSes (if they have such an option) talk about HT, which
> to me makes "ht" a closer match. How about we allow both?

That's because a BIOS is custom to the hardware it runs on.

Have you tried setting up an alias before? (given the specific
deficiency of the *_param() infrastructure in this area)  I don't
think an alias is worth the effort.

~Andrew


* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
       [not found]         ` <5B4C8D3702000078001D45EA@suse.com>
@ 2018-07-16 12:47           ` Juergen Gross
  2018-07-16 13:01             ` Jan Beulich
       [not found]             ` <5B4C973D02000078001D4693@suse.com>
  0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 12:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Dario Faggioli

On 16/07/18 14:19, Jan Beulich wrote:
>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>> While I've run into the issue with further patches in place which no
>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>> there.
>>>>>
>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>> only for other than the suspend/resume case (which gets specially
>>>>> handled in cpupool_cpu_remove()).
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> ---
>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>      being too early, that function has further prereqs to be met. It
>>>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>      there.
>>>>>
>>>>> --- a/xen/common/cpupool.c
>>>>> +++ b/xen/common/cpupool.c
>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>      {
>>>>>      case CPU_DOWN_FAILED:
>>>>>      case CPU_ONLINE:
>>>>> +        if ( system_state <= SYS_STATE_active )
>>>>> +            per_cpu(cpupool, cpu) = NULL;
>>>>>          rc = cpupool_cpu_add(cpu);
>>>>
>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>> which already is testing system_state?
>>>
>>> Hmm, this may be a matter of taste: I consider the change done here
>>> a prereq to calling the function in the first place. As said in the
>>> description, I actually think this should come earlier, and it's just that
>>> I can't see how to cleanly do so.
> 
> You didn't comment on this one at all, yet it matters for how a v2
> is supposed to look like.

My comment was meant to address this question, too. cpupool_cpu_add()
is handling the special case of resuming explicitly, where the old cpu
assignment to a cpupool is kept. So I believe setting
  per_cpu(cpupool, cpu) = NULL
in the else clause of cpupool_cpu_add() only is better.

>>>> Modifying the condition in cpupool_cpu_add() to
>>>>
>>>>   if ( system_state <= SYS_STATE_active )
>>>>
>>>> at the same time would have the benefit to catch problems in case
>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>
>>> You mean the if() there, not the else? If so - how would the "else"
>>> body then ever be reached? IOW if anything I could only see the
>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>
>> Bad wording on my side.
>>
>> I should have written "the condition in cpupool_cpu_add() should match
>> if ( system_state <= SYS_STATE_active )."
>>
>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>> other case.
> 
> I'd recommend against this, as someone adding a new SYS_STATE_*
> past suspend/resume would quite likely miss this one. The strong
> ordering of states imo should only be used for active and lower states.
> But yes, I could see the if() there to become suspend || resume to
> address the problem you describe.

Yes, this would seem to be a better choice here.

> Coming back to your DOWN_FAILED consideration: Why are you
> thinking this can't happen during suspend? disable_nonboot_cpus()
> uses plain cpu_down() after all.

Right.

DOWN_FAILED is used only once, and that is in cpu_down() after the
CPU_DOWN_PREPARE step returned an error. And CPU_DOWN_PREPARE is only
used by the cpufreq driver, where it never returns an error, and by
cpupools, which don't matter here; so only other components failing
at the CPU_DOWN_PREPARE step would lead to cpupool seeing DOWN_FAILED.


Juergen


* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-16 12:37       ` Andrew Cooper
@ 2018-07-16 12:53         ` Jan Beulich
  2018-07-16 13:01           ` Andrew Cooper
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 12:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

>>> On 16.07.18 at 14:37, <andrew.cooper3@citrix.com> wrote:
> On 13/07/18 09:13, Jan Beulich wrote:
>>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>>> On 11/07/18 13:10, Jan Beulich wrote:
>>>> --- a/docs/misc/xen-command-line.markdown
>>>> +++ b/docs/misc/xen-command-line.markdown
>>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>>>  ### hpetbroadcast (x86)
>>>>  > `= <boolean>`
>>>>  
>>>> +### ht (x86)
>>> I'd suggest smt rather than ht here.  SMT is the technical term, while
>>> HT is Intel's marketing name.
>> Hmm, many BIOSes (if they have such an option) talk about HT, which
>> to me makes "ht" a closer match. How about we allow both?
> 
> That's because a BIOS is custom to the hardware it runs on.
> 
> Have you tried setting up an alias before? (given the specific
> deficiency of the *_param() infrastructure in this area)  I don't
> think an alias is worth the effort.

This reads as if you were expecting problems. Instead of

+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);

what we'd have is simply

+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);
+boolean_param("smt", opt_ht);

(and whether we use opt_ht or opt_smt doesn't matter much
to me). I don't see any source of possible issues with this.

Jan




* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
  2018-07-16 12:53         ` Jan Beulich
@ 2018-07-16 13:01           ` Andrew Cooper
  0 siblings, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-16 13:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
	Dario Faggioli

On 16/07/18 13:53, Jan Beulich wrote:
>>>> On 16.07.18 at 14:37, <andrew.cooper3@citrix.com> wrote:
>> On 13/07/18 09:13, Jan Beulich wrote:
>>>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>>>> On 11/07/18 13:10, Jan Beulich wrote:
>>>>> --- a/docs/misc/xen-command-line.markdown
>>>>> +++ b/docs/misc/xen-command-line.markdown
>>>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>>>>  ### hpetbroadcast (x86)
>>>>>  > `= <boolean>`
>>>>>  
>>>>> +### ht (x86)
>>>> I'd suggest smt rather than ht here.  SMT is the technical term, while
>>>> HT is Intel's marketing name.
>>> Hmm, many BIOSes (if they have such an option) talk about HT, which
>>> to me makes "ht" a closer match. How about we allow both?
>> That's because a BIOS is custom to the hardware it runs on.
>>
>> Have you tried setting up an alias before? (given the specific
>> deficiency of the *_param() infrastructure in this area)  I don't
>> think an alias is worth the effort.
> This reads as if you were expecting problems. Instead of
>
> +int8_t __read_mostly opt_ht = -1;
> +boolean_param("ht", opt_ht);
>
> what we'd have is simply
>
> +int8_t __read_mostly opt_ht = -1;
> +boolean_param("ht", opt_ht);
> +boolean_param("smt", opt_ht);
>
> (and whether we use opt_ht or opt_smt doesn't matter much
> to me). I don't see any source of possible issues with this.

Try compiling it.

I tried exactly this with bti= and spec-ctrl= originally.  The problem
is that the second argument must be (translation unit) unique, because
of the way it is used to form the name of the struct kernel_param.

~Andrew



* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
  2018-07-16 12:47           ` Juergen Gross
@ 2018-07-16 13:01             ` Jan Beulich
       [not found]             ` <5B4C973D02000078001D4693@suse.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 13:01 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli

>>> On 16.07.18 at 14:47, <jgross@suse.com> wrote:
> On 16/07/18 14:19, Jan Beulich wrote:
>>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>>> While I've run into the issue with further patches in place which no
>>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>>> there.
>>>>>>
>>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>>> only for other than the suspend/resume case (which gets specially
>>>>>> handled in cpupool_cpu_remove()).
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> ---
>>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>>      being too early, that function has further prereqs to be met. It
>>>>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>>      there.
>>>>>>
>>>>>> --- a/xen/common/cpupool.c
>>>>>> +++ b/xen/common/cpupool.c
>>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>>      {
>>>>>>      case CPU_DOWN_FAILED:
>>>>>>      case CPU_ONLINE:
>>>>>> +        if ( system_state <= SYS_STATE_active )
>>>>>> +            per_cpu(cpupool, cpu) = NULL;
>>>>>>          rc = cpupool_cpu_add(cpu);
>>>>>
>>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>>> which already is testing system_state?
>>>>
>>>> Hmm, this may be a matter of taste: I consider the change done here
>>>> a prereq to calling the function in the first place. As said in the
>>>> description, I actually think this should come earlier, and it's just that
>>>> I can't see how to cleanly do so.
>> 
>> You didn't comment on this one at all, yet it matters for how a v2
>> is supposed to look like.
> 
> My comment was meant to address this question, too. cpupool_cpu_add()
> is handling the special case of resuming explicitly, where the old cpu
> assignment to a cpupool is kept. So I believe setting
>   per_cpu(cpupool, cpu) = NULL
> in the else clause of cpupool_cpu_add() only is better.

Well, okay then. You're the maintainer.

>>>>> Modifying the condition in cpupool_cpu_add() to
>>>>>
>>>>>   if ( system_state <= SYS_STATE_active )
>>>>>
>>>>> at the same time would have the benefit to catch problems in case
>>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>>
>>>> You mean the if() there, not the else? If so - how would the "else"
>>>> body then ever be reached? IOW if anything I could only see the
>>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>>
>>> Bad wording on my side.
>>>
>>> I should have written "the condition in cpupool_cpu_add() should match
>>> if ( system_state <= SYS_STATE_active )."
>>>
>>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>>> other case.
>> 
>> I'd recommend against this, as someone adding a new SYS_STATE_*
>> past suspend/resume would quite likely miss this one. The strong
>> ordering of states imo should only be used for active and lower states.
>> But yes, I could see the if() there to become suspend || resume to
>> address the problem you describe.
> 
> Yes, this would seem to be a better choice here.
> 
>> Coming back to your DOWN_FAILED consideration: Why are you
>> thinking this can't happen during suspend? disable_nonboot_cpus()
>> uses plain cpu_down() after all.
> 
> Right.
> 
> DOWN_FAILED is used only once, and that is in cpu_down() after the step
> CPU_DOWN_PREPARE returned an error. And CPU_DOWN_PREPARE is only used
> for the cpufreq driver, where it never returns an error, and for cpupools
> which don't matter here, as only other components failing at step
> CPU_DOWN_PREPARE would lead to calling cpupool/DOWN_FAILED.

What about the stop_machine_run() failure case?

Jan




* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
       [not found]             ` <5B4C973D02000078001D4693@suse.com>
@ 2018-07-16 14:21               ` Juergen Gross
  2018-07-16 14:26                 ` Jan Beulich
       [not found]                 ` <5B4CAB1202000078001D47BC@suse.com>
  0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 14:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Dario Faggioli

On 16/07/18 15:01, Jan Beulich wrote:
>>>> On 16.07.18 at 14:47, <jgross@suse.com> wrote:
>> On 16/07/18 14:19, Jan Beulich wrote:
>>>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>>>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>>>> While I've run into the issue with further patches in place which no
>>>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>>>> there.
>>>>>>>
>>>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>>>> only for other than the suspend/resume case (which gets specially
>>>>>>> handled in cpupool_cpu_remove()).
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>> ---
>>>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>>>      cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>>>      being too early, that function has further prereqs to be met. It
>>>>>>>      also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>>>      there.
>>>>>>>
>>>>>>> --- a/xen/common/cpupool.c
>>>>>>> +++ b/xen/common/cpupool.c
>>>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>>>      {
>>>>>>>      case CPU_DOWN_FAILED:
>>>>>>>      case CPU_ONLINE:
>>>>>>> +        if ( system_state <= SYS_STATE_active )
>>>>>>> +            per_cpu(cpupool, cpu) = NULL;
>>>>>>>          rc = cpupool_cpu_add(cpu);
>>>>>>
>>>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>>>> which already is testing system_state?
>>>>>
>>>>> Hmm, this may be a matter of taste: I consider the change done here
>>>>> a prereq to calling the function in the first place. As said in the
>>>>> description, I actually think this should come earlier, and it's just that
>>>>> I can't see how to cleanly do so.
>>>
>>> You didn't comment on this one at all, yet it matters for how a v2
>>> is supposed to look like.
>>
>> My comment was meant to address this question, too. cpupool_cpu_add()
>> is handling the special case of resuming explicitly, where the old cpu
>> assignment to a cpupool is kept. So I believe setting
>>   per_cpu(cpupool, cpu) = NULL
>> in the else clause of cpupool_cpu_add() only is better.
> 
> Well, okay then. You're the maintainer.
> 
>>>>>> Modifying the condition in cpupool_cpu_add() to
>>>>>>
>>>>>>   if ( system_state <= SYS_STATE_active )
>>>>>>
>>>>>> at the same time would have the benefit to catch problems in case
>>>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>>>
>>>>> You mean the if() there, not the else? If so - how would the "else"
>>>>> body then ever be reached? IOW if anything I could only see the
>>>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>>>
>>>> Bad wording on my side.
>>>>
>>>> I should have written "the condition in cpupool_cpu_add() should match
>>>> if ( system_state <= SYS_STATE_active )."
>>>>
>>>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>>>> other case.
>>>
>>> I'd recommend against this, as someone adding a new SYS_STATE_*
>>> past suspend/resume would quite likely miss this one. The strong
>>> ordering of states imo should only be used for active and lower states.
>>> But yes, I could see the if() there to become suspend || resume to
>>> address the problem you describe.
>>
>> Yes, this would seem to be a better choice here.
>>
>>> Coming back to your DOWN_FAILED consideration: Why are you
>>> thinking this can't happen during suspend? disable_nonboot_cpus()
>>> uses plain cpu_down() after all.
>>
>> Right.
>>
>> DOWN_FAILED is used only once, and that is in cpu_down() after the step
>> CPU_DOWN_PREPARE returned an error. And CPU_DOWN_PREPARE is only used
>> for the cpufreq driver, where it never returns an error, and for cpupools
>> which don't matter here, as only other components failing at step
>> CPU_DOWN_PREPARE would lead to calling cpupool/DOWN_FAILED.
> 
> What about the stop_machine_run() failure case?

Oh. No idea how I missed that.

So maybe changing the condition in cpupool_cpu_add() should be split out
into a patch of its own in order to be able to backport it?


Juergen




* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
  2018-07-16 14:21               ` Juergen Gross
@ 2018-07-16 14:26                 ` Jan Beulich
       [not found]                 ` <5B4CAB1202000078001D47BC@suse.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 14:26 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Dario Faggioli

>>> On 16.07.18 at 16:21, <jgross@suse.com> wrote:
> So maybe changing the condition in cpupool_cpu_add() should be split out
> into a patch of its own in order to be able to backport it?

We'll want/need to backport this change anyway.

Jan





* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
       [not found]                 ` <5B4CAB1202000078001D47BC@suse.com>
@ 2018-07-16 14:53                   ` Juergen Gross
  0 siblings, 0 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 14:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Dario Faggioli

On 16/07/18 16:26, Jan Beulich wrote:
>>>> On 16.07.18 at 16:21, <jgross@suse.com> wrote:
>> So maybe changing the condition in cpupool_cpu_add() should be split out
>> into a patch of its own in order to be able to backport it?
> 
> We'll want/need to backport this change anyway.

Okay, so even if it's a bugfix in its own right I'm fine with keeping it
in the patch. If you'd rather split the patches, that's fine with me, too.


Juergen




end of thread, other threads:[~2018-07-16 14:53 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
2018-07-12 10:53   ` Wei Liu
2018-07-12 11:48     ` Jan Beulich
2018-07-13  8:39       ` Wei Liu
2018-07-12 12:42   ` Andrew Cooper
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
2018-07-12 10:55   ` Wei Liu
2018-07-12 12:44   ` Andrew Cooper
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
2018-07-11 18:11   ` Brian Woods
2018-07-12 13:02   ` Andrew Cooper
2018-07-12 14:22     ` Jan Beulich
2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
2018-07-12 15:38   ` Andrew Cooper
2018-07-13  8:11     ` Jan Beulich
2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
2018-07-12 15:45   ` Andrew Cooper
2018-07-13  8:13     ` Jan Beulich
2018-07-16 12:37       ` Andrew Cooper
2018-07-16 12:53         ` Jan Beulich
2018-07-16 13:01           ` Andrew Cooper
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
2018-07-11 12:23   ` Andrew Cooper
2018-07-11 15:18   ` Roger Pau Monné
2018-07-11 16:02   ` Wei Liu
2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
2018-07-11 12:20   ` Andrew Cooper
2018-07-12 15:13   ` Wei Liu
     [not found] ` <5B45F26A02000078001D312F@suse.com>
2018-07-13  9:02   ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Juergen Gross
2018-07-16  9:17     ` Jan Beulich
     [not found]     ` <5B4C629002000078001D4346@suse.com>
2018-07-16 11:47       ` Juergen Gross
2018-07-16 12:19         ` Jan Beulich
     [not found]         ` <5B4C8D3702000078001D45EA@suse.com>
2018-07-16 12:47           ` Juergen Gross
2018-07-16 13:01             ` Jan Beulich
     [not found]             ` <5B4C973D02000078001D4693@suse.com>
2018-07-16 14:21               ` Juergen Gross
2018-07-16 14:26                 ` Jan Beulich
     [not found]                 ` <5B4CAB1202000078001D47BC@suse.com>
2018-07-16 14:53                   ` Juergen Gross
