* [PATCH 1/8] cpupools: fix state when downing a CPU failed
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
@ 2018-07-11 12:04 ` Jan Beulich
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
` (7 subsequent siblings)
8 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:04 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, Dario Faggioli
While I've run into the issue with further patches in place which no
longer guarantee the per-CPU area to start out as all zeros, the
CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
of schedule_cpu_switch() will trigger the "c != old_pool" assertion
there.
Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
should not happen before cpu_disable_scheduler()). Clearing it in
CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
piece of code twice. Since the field's value shouldn't matter while the
CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
only for other than the suspend/resume case (which gets specially
handled in cpupool_cpu_remove()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
cpupool_cpu_remove(), but besides that - as per above - likely
being too early, that function has further prereqs to be met. It
also doesn't look as if cpupool_unassign_cpu_helper() could be used
there.
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -778,6 +778,8 @@ static int cpu_callback(
{
case CPU_DOWN_FAILED:
case CPU_ONLINE:
+ if ( system_state <= SYS_STATE_active )
+ per_cpu(cpupool, cpu) = NULL;
rc = cpupool_cpu_add(cpu);
break;
case CPU_DOWN_PREPARE:
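[Editor's note: the hunk above only shows the decision itself. A minimal, self-contained sketch of that decision — with simplified, illustrative enum values and names standing in for Xen's real `system_state`, notifier actions, and per-CPU machinery — might look like this:]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; the real Xen enums have more states/actions
 * and different numeric values. */
enum sys_state { SYS_STATE_active = 4, SYS_STATE_suspend = 5 };
enum cpu_action { CPU_ONLINE, CPU_DOWN_FAILED, CPU_DOWN_PREPARE };

/* Model of the patch: the per-CPU cpupool pointer gets zapped on both
 * CPU_ONLINE and CPU_DOWN_FAILED, but only outside the suspend/resume
 * path, which cpupool_cpu_remove() handles specially. */
static bool should_zap_cpupool(enum cpu_action action, enum sys_state state)
{
    switch ( action )
    {
    case CPU_DOWN_FAILED:
    case CPU_ONLINE:
        /* Matches "system_state <= SYS_STATE_active" in the patch. */
        return state <= SYS_STATE_active;
    default:
        return false;
    }
}
```

This keeps the stale pointer from reaching schedule_cpu_switch()'s "c != old_pool" assertion when the CPU is re-added.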
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
@ 2018-07-11 12:06 ` Jan Beulich
2018-07-12 10:53 ` Wei Liu
2018-07-12 12:42 ` Andrew Cooper
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
` (6 subsequent siblings)
8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:06 UTC (permalink / raw)
To: xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall
In order to be able to service #MC on offlined CPUs, GDT, IDT, stack,
and per-CPU data (which includes the TSS) need to be kept allocated.
They should only be freed upon CPU removal (which we currently don't
support, so some code is becoming effectively dead for the moment).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -692,12 +692,15 @@ static void cpu_bank_free(unsigned int c
mcabanks_free(poll);
mcabanks_free(clr);
+
+ per_cpu(poll_bankmask, cpu) = NULL;
+ per_cpu(mce_clear_banks, cpu) = NULL;
}
static int cpu_bank_alloc(unsigned int cpu)
{
- struct mca_banks *poll = mcabanks_alloc();
- struct mca_banks *clr = mcabanks_alloc();
+ struct mca_banks *poll = per_cpu(poll_bankmask, cpu) ?: mcabanks_alloc();
+ struct mca_banks *clr = per_cpu(mce_clear_banks, cpu) ?: mcabanks_alloc();
if ( !poll || !clr )
{
@@ -725,7 +728,13 @@ static int cpu_callback(
case CPU_UP_CANCELED:
case CPU_DEAD:
- cpu_bank_free(cpu);
+ if ( !park_offline_cpus )
+ cpu_bank_free(cpu);
+ break;
+
+ case CPU_REMOVE:
+ if ( park_offline_cpus )
+ cpu_bank_free(cpu);
break;
}
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -107,10 +107,11 @@ static void play_dead(void)
local_irq_disable();
/*
- * NOTE: After cpu_exit_clear, per-cpu variables are no longer accessible,
- * as they may be freed at any time. In this case, heap corruption or
- * #PF can occur (when heap debugging is enabled). For example, even
- * printk() can involve tasklet scheduling, which touches per-cpu vars.
+ * NOTE: After cpu_exit_clear, per-cpu variables may no longer be accessible,
+ * as they may be freed at any time if offline CPUs don't get parked. In
+ * this case, heap corruption or #PF can occur (when heap debugging is
+ * enabled). For example, even printk() can involve tasklet scheduling,
+ * which touches per-cpu vars.
*
* Consider very carefully when adding code to *dead_idle. Most hypervisor
* subsystems are unsafe to call.
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -201,18 +201,25 @@ static int update_clusterinfo(
if ( !cluster_cpus_spare )
cluster_cpus_spare = xzalloc(cpumask_t);
if ( !cluster_cpus_spare ||
- !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
+ !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
err = -ENOMEM;
break;
case CPU_UP_CANCELED:
case CPU_DEAD:
+ case CPU_REMOVE:
+ if ( park_offline_cpus == (action != CPU_REMOVE) )
+ break;
if ( per_cpu(cluster_cpus, cpu) )
{
cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
+ {
xfree(per_cpu(cluster_cpus, cpu));
+ per_cpu(cluster_cpus, cpu) = NULL;
+ }
}
free_cpumask_var(per_cpu(scratch_mask, cpu));
+ clear_cpumask_var(&per_cpu(scratch_mask, cpu));
break;
}
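[Editor's note: the combined guard `park_offline_cpus == (action != CPU_REMOVE)` in the x2apic hunk above is compact but subtle. A standalone sketch of its truth table, with a simplified action enum standing in for Xen's notifier constants:]

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_action { CPU_DEAD, CPU_UP_CANCELED, CPU_REMOVE };

/* Model of the guard in update_clusterinfo(): free on
 * CPU_DEAD/CPU_UP_CANCELED only when offline CPUs are NOT parked, and
 * on CPU_REMOVE only when they ARE parked; every other combination
 * takes the early "break" in the patch. */
static bool should_free(enum cpu_action action, bool park_offline_cpus)
{
    if ( park_offline_cpus == (action != CPU_REMOVE) )
        return false; /* the "break" in the patch */
    return true;
}
```

So a parked CPU keeps its cluster masks across offline/online cycles, and they are only torn down on actual removal.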
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -28,7 +28,7 @@ static int init_percpu_area(unsigned int
char *p;
if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
- return -EBUSY;
+ return 0;
if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
return -ENOMEM;
@@ -76,9 +76,12 @@ static int cpu_percpu_callback(
break;
case CPU_UP_CANCELED:
case CPU_DEAD:
- free_percpu_area(cpu);
+ if ( !park_offline_cpus )
+ free_percpu_area(cpu);
break;
- default:
+ case CPU_REMOVE:
+ if ( park_offline_cpus )
+ free_percpu_area(cpu);
break;
}
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
cpumask_t cpu_online_map __read_mostly;
EXPORT_SYMBOL(cpu_online_map);
+bool __read_mostly park_offline_cpus;
+
unsigned int __read_mostly nr_sockets;
cpumask_t **__read_mostly socket_cpumask;
static cpumask_t *secondary_socket_cpumask;
@@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
}
}
-static void cpu_smpboot_free(unsigned int cpu)
+static void cpu_smpboot_free(unsigned int cpu, bool all)
{
unsigned int order, socket = cpu_to_socket(cpu);
struct cpuinfo_x86 *c = cpu_data;
@@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
socket_cpumask[socket] = NULL;
}
- c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
- c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
- c[cpu].compute_unit_id = INVALID_CUID;
cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
- free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
- free_cpumask_var(per_cpu(cpu_core_mask, cpu));
- if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
- free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+ if ( all )
+ {
+ c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
+ c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
+ c[cpu].compute_unit_id = INVALID_CUID;
+
+ free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
+ clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
+ free_cpumask_var(per_cpu(cpu_core_mask, cpu));
+ clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
+ if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
+ {
+ free_cpumask_var(per_cpu(scratch_cpumask, cpu));
+ clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
+ }
+ }
cleanup_cpu_root_pgt(cpu);
@@ -928,19 +939,26 @@ static void cpu_smpboot_free(unsigned in
}
order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
- free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+ if ( all )
+ {
+ free_xenheap_pages(per_cpu(gdt_table, cpu), order);
+ per_cpu(gdt_table, cpu) = NULL;
+ }
free_xenheap_pages(per_cpu(compat_gdt_table, cpu), order);
- order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
- free_xenheap_pages(idt_tables[cpu], order);
- idt_tables[cpu] = NULL;
-
- if ( stack_base[cpu] != NULL )
+ if ( all )
{
- memguard_unguard_stack(stack_base[cpu]);
- free_xenheap_pages(stack_base[cpu], STACK_ORDER);
- stack_base[cpu] = NULL;
+ order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
+ free_xenheap_pages(idt_tables[cpu], order);
+ idt_tables[cpu] = NULL;
+
+ if ( stack_base[cpu] != NULL )
+ {
+ memguard_unguard_stack(stack_base[cpu]);
+ free_xenheap_pages(stack_base[cpu], STACK_ORDER);
+ stack_base[cpu] = NULL;
+ }
}
}
@@ -955,15 +973,19 @@ static int cpu_smpboot_alloc(unsigned in
if ( node != NUMA_NO_NODE )
memflags = MEMF_node(node);
- stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
+ if ( stack_base[cpu] == NULL )
+ stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
if ( stack_base[cpu] == NULL )
goto out;
memguard_guard_stack(stack_base[cpu]);
order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
- per_cpu(gdt_table, cpu) = gdt = alloc_xenheap_pages(order, memflags);
+ gdt = per_cpu(gdt_table, cpu);
+ if ( gdt == NULL )
+ gdt = alloc_xenheap_pages(order, memflags);
if ( gdt == NULL )
goto out;
+ per_cpu(gdt_table, cpu) = gdt;
memcpy(gdt, boot_cpu_gdt_table, NR_RESERVED_GDT_PAGES * PAGE_SIZE);
BUILD_BUG_ON(NR_CPUS > 0x10000);
gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
@@ -975,7 +997,8 @@ static int cpu_smpboot_alloc(unsigned in
gdt[PER_CPU_GDT_ENTRY - FIRST_RESERVED_GDT_ENTRY].a = cpu;
order = get_order_from_bytes(IDT_ENTRIES * sizeof(idt_entry_t));
- idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
+ if ( idt_tables[cpu] == NULL )
+ idt_tables[cpu] = alloc_xenheap_pages(order, memflags);
if ( idt_tables[cpu] == NULL )
goto out;
memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
@@ -1003,16 +1026,16 @@ static int cpu_smpboot_alloc(unsigned in
(secondary_socket_cpumask = xzalloc(cpumask_t)) == NULL )
goto out;
- if ( !(zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
- zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
- alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
+ if ( !(cond_zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
+ cond_zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) &&
+ cond_alloc_cpumask_var(&per_cpu(scratch_cpumask, cpu))) )
goto out;
rc = 0;
out:
if ( rc )
- cpu_smpboot_free(cpu);
+ cpu_smpboot_free(cpu, true);
return rc;
}
@@ -1030,9 +1053,10 @@ static int cpu_smpboot_callback(
break;
case CPU_UP_CANCELED:
case CPU_DEAD:
- cpu_smpboot_free(cpu);
+ cpu_smpboot_free(cpu, !park_offline_cpus);
break;
- default:
+ case CPU_REMOVE:
+ cpu_smpboot_free(cpu, true);
break;
}
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,8 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibli
DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
+extern bool park_offline_cpus;
+
void smp_send_nmi_allbutself(void);
void send_IPI_mask(const cpumask_t *, int vector);
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -47,6 +47,8 @@ void register_cpu_notifier(struct notifi
#define CPU_DYING (0x0007 | NOTIFY_REVERSE)
/* CPU_DEAD: CPU is dead. */
#define CPU_DEAD (0x0008 | NOTIFY_REVERSE)
+/* CPU_REMOVE: CPU was removed. */
+#define CPU_REMOVE (0x0009 | NOTIFY_REVERSE)
/* Perform CPU hotplug. May return -EAGAIN. */
int cpu_down(unsigned int cpu);
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -351,16 +351,37 @@ static inline bool_t alloc_cpumask_var(c
return *mask != NULL;
}
+static inline bool cond_alloc_cpumask_var(cpumask_var_t *mask)
+{
+ if (*mask == NULL)
+ *mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+ return *mask != NULL;
+}
+
static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
{
*(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
return *mask != NULL;
}
+static inline bool cond_zalloc_cpumask_var(cpumask_var_t *mask)
+{
+ if (*mask == NULL)
+ *mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+ else
+ cpumask_clear(*mask);
+ return *mask != NULL;
+}
+
static inline void free_cpumask_var(cpumask_var_t mask)
{
xfree(mask);
}
+
+static inline void clear_cpumask_var(cpumask_var_t *mask)
+{
+ *mask = NULL;
+}
#else
typedef cpumask_t cpumask_var_t[1];
@@ -368,16 +389,22 @@ static inline bool_t alloc_cpumask_var(c
{
return 1;
}
+#define cond_alloc_cpumask_var alloc_cpumask_var
static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
{
cpumask_clear(*mask);
return 1;
}
+#define cond_zalloc_cpumask_var zalloc_cpumask_var
static inline void free_cpumask_var(cpumask_var_t mask)
{
}
+
+static inline void clear_cpumask_var(cpumask_var_t *mask)
+{
+}
#endif
#if NR_CPUS > 1
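[Editor's note: the key property of the new `cond_*alloc_cpumask_var()` helpers is that re-onlining a parked CPU reuses the surviving allocation instead of making a fresh one. An illustrative stand-in, with `calloc`/`memset` in place of Xen's `_xzalloc`/`cpumask_clear`:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of cond_zalloc_cpumask_var() semantics: allocate only when the
 * pointer is still NULL, otherwise just clear the existing buffer, so a
 * parked CPU's mask survives an offline/online cycle without churn. */
static bool cond_zalloc(unsigned char **mask, size_t bytes)
{
    if ( *mask == NULL )
        *mask = calloc(1, bytes);
    else
        memset(*mask, 0, bytes);
    return *mask != NULL;
}
```

Paired with `clear_cpumask_var()` (which only NULLs the pointer after a real free), this makes the alloc/free paths idempotent across repeated notifier invocations.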
* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-12 10:53 ` Wei Liu
2018-07-12 11:48 ` Jan Beulich
2018-07-12 12:42 ` Andrew Cooper
1 sibling, 1 reply; 39+ messages in thread
From: Wei Liu @ 2018-07-12 10:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall, xen-devel
On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
[...]
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
> cpumask_t cpu_online_map __read_mostly;
> EXPORT_SYMBOL(cpu_online_map);
>
> +bool __read_mostly park_offline_cpus;
> +
> unsigned int __read_mostly nr_sockets;
> cpumask_t **__read_mostly socket_cpumask;
> static cpumask_t *secondary_socket_cpumask;
> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
> }
> }
>
> -static void cpu_smpboot_free(unsigned int cpu)
> +static void cpu_smpboot_free(unsigned int cpu, bool all)
I think "all" is too vague. It doesn't convey the idea what constitutes
"partial". But I don't have any better suggestion either.
> {
> unsigned int order, socket = cpu_to_socket(cpu);
> struct cpuinfo_x86 *c = cpu_data;
> @@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
> socket_cpumask[socket] = NULL;
> }
>
> - c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
> - c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
> - c[cpu].compute_unit_id = INVALID_CUID;
> cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
>
> - free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
> - free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> - if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
> - free_cpumask_var(per_cpu(scratch_cpumask, cpu));
> + if ( all )
> + {
> + c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
> + c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
> + c[cpu].compute_unit_id = INVALID_CUID;
> +
> + free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
> + clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
> + free_cpumask_var(per_cpu(cpu_core_mask, cpu));
> + clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
> + if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
> + {
> + free_cpumask_var(per_cpu(scratch_cpumask, cpu));
> + clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
> + }
One thing that's not mentioned in the commit message is why various
masks are kept. I guess they are needed for some other parts of the
system to remain operational.
The rest looks fine to me.
Wei.
* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
2018-07-12 10:53 ` Wei Liu
@ 2018-07-12 11:48 ` Jan Beulich
2018-07-13 8:39 ` Wei Liu
0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-12 11:48 UTC (permalink / raw)
To: Wei Liu
Cc: Stefano Stabellini, George Dunlap, Andrew Cooper, Ian Jackson,
Tim Deegan, Julien Grall, xen-devel
>>> On 12.07.18 at 12:53, <wei.liu2@citrix.com> wrote:
> On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
> [...]
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
>> cpumask_t cpu_online_map __read_mostly;
>> EXPORT_SYMBOL(cpu_online_map);
>>
>> +bool __read_mostly park_offline_cpus;
>> +
>> unsigned int __read_mostly nr_sockets;
>> cpumask_t **__read_mostly socket_cpumask;
>> static cpumask_t *secondary_socket_cpumask;
>> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
>> }
>> }
>>
>> -static void cpu_smpboot_free(unsigned int cpu)
>> +static void cpu_smpboot_free(unsigned int cpu, bool all)
>
> I think "all" is too vague. It doesn't convey the idea what constitutes
> "partial". But I don't have any better suggestion either.
Indeed I've been trying to come up with a better name before
posting, but couldn't. One thing though - I don't think the name
should in anyway be required to express what "partial" means.
>> @@ -898,15 +900,24 @@ static void cpu_smpboot_free(unsigned in
>> socket_cpumask[socket] = NULL;
>> }
>>
>> - c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
>> - c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
>> - c[cpu].compute_unit_id = INVALID_CUID;
>> cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
>>
>> - free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> - free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> - if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
>> - free_cpumask_var(per_cpu(scratch_cpumask, cpu));
>> + if ( all )
>> + {
>> + c[cpu].phys_proc_id = XEN_INVALID_SOCKET_ID;
>> + c[cpu].cpu_core_id = XEN_INVALID_CORE_ID;
>> + c[cpu].compute_unit_id = INVALID_CUID;
>> +
>> + free_cpumask_var(per_cpu(cpu_sibling_mask, cpu));
>> + clear_cpumask_var(&per_cpu(cpu_sibling_mask, cpu));
>> + free_cpumask_var(per_cpu(cpu_core_mask, cpu));
>> + clear_cpumask_var(&per_cpu(cpu_core_mask, cpu));
>> + if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
>> + {
>> + free_cpumask_var(per_cpu(scratch_cpumask, cpu));
>> + clear_cpumask_var(&per_cpu(scratch_cpumask, cpu));
>> + }
>
> One thing that's not mentioned in the commit message is why various
> masks are kept. I guess they are needed for some other parts of the
> system to remain operational.
Hmm, I've taken the opposite approach: Free in "partial" mode
only what I could prove won't be needed.
Jan
* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
2018-07-12 11:48 ` Jan Beulich
@ 2018-07-13 8:39 ` Wei Liu
0 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-13 8:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall, xen-devel
On Thu, Jul 12, 2018 at 05:48:40AM -0600, Jan Beulich wrote:
> >>> On 12.07.18 at 12:53, <wei.liu2@citrix.com> wrote:
> > On Wed, Jul 11, 2018 at 06:06:04AM -0600, Jan Beulich wrote:
> > [...]
> >> --- a/xen/arch/x86/smpboot.c
> >> +++ b/xen/arch/x86/smpboot.c
> >> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
> >> cpumask_t cpu_online_map __read_mostly;
> >> EXPORT_SYMBOL(cpu_online_map);
> >>
> >> +bool __read_mostly park_offline_cpus;
> >> +
> >> unsigned int __read_mostly nr_sockets;
> >> cpumask_t **__read_mostly socket_cpumask;
> >> static cpumask_t *secondary_socket_cpumask;
> >> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
> >> }
> >> }
> >>
> >> -static void cpu_smpboot_free(unsigned int cpu)
> >> +static void cpu_smpboot_free(unsigned int cpu, bool all)
> >
> > I think "all" is too vague. It doesn't convey the idea what constitutes
> > "partial". But I don't have any better suggestion either.
>
> Indeed I've been trying to come up with a better name before
> posting, but couldn't. One thing though - I don't think the name
> should in anyway be required to express what "partial" means.
>
A sentence or two to describe what !all is for would be helpful.
Wei.
* Re: [PATCH 2/8] x86: distinguish CPU offlining from CPU removal
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
2018-07-12 10:53 ` Wei Liu
@ 2018-07-12 12:42 ` Andrew Cooper
1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 12:42 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
Ian Jackson, Julien Grall
On 11/07/18 13:06, Jan Beulich wrote:
> In order to be able to service #MC on offlined CPUs, GDT, IDT, stack,
For clarity, I'd phrase this as "CPUs, the GDT, ..." to help visually
separate CPUs as it isn't a part of the rest of the list.
> --- a/xen/arch/x86/genapic/x2apic.c
> +++ b/xen/arch/x86/genapic/x2apic.c
> @@ -201,18 +201,25 @@ static int update_clusterinfo(
> if ( !cluster_cpus_spare )
> cluster_cpus_spare = xzalloc(cpumask_t);
> if ( !cluster_cpus_spare ||
> - !alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
> + !cond_alloc_cpumask_var(&per_cpu(scratch_mask, cpu)) )
> err = -ENOMEM;
> break;
> case CPU_UP_CANCELED:
> case CPU_DEAD:
> + case CPU_REMOVE:
> + if ( park_offline_cpus == (action != CPU_REMOVE) )
> + break;
> if ( per_cpu(cluster_cpus, cpu) )
> {
> cpumask_clear_cpu(cpu, per_cpu(cluster_cpus, cpu));
> if ( cpumask_empty(per_cpu(cluster_cpus, cpu)) )
> + {
> xfree(per_cpu(cluster_cpus, cpu));
> + per_cpu(cluster_cpus, cpu) = NULL;
> + }
> }
> free_cpumask_var(per_cpu(scratch_mask, cpu));
> + clear_cpumask_var(&per_cpu(scratch_mask, cpu));
freeing and NULL-ing the pointer at the same time is already a common
pattern. How about introducing
/* Free an allocation, and zero the pointer to it. */
#define XFREE(p)                \
({                              \
    typeof(*p) *_p = (p);       \
    xfree(_p);                  \
    _p = NULL;                  \
})
and a similar wrapper for FREE_CPUMASK_VAR(p) which takes care of the
NULL-ing as well?
In time as these start to get used more widely, it should reduce the
chances of double-freeing in cleanup paths.
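[Editor's note: a runnable sketch of the free-and-NULL pattern under discussion. Note that, as quoted above, assigning NULL to the statement-expression local `_p` would not clear the caller's pointer; a variant that writes back through the macro argument does, as shown here. `free()` stands in for Xen's `xfree()`:]

```c
#include <assert.h>
#include <stdlib.h>

/* Free an allocation and zero the pointer to it, so a later accidental
 * second XFREE() degenerates to free(NULL), a harmless no-op. */
#define XFREE(p) do { free(p); (p) = NULL; } while ( 0 )
```

This is the property that reduces double-free risk in cleanup paths: after the first call the pointer can no longer reach the allocator with a stale value.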
> @@ -63,6 +63,8 @@ static cpumask_t scratch_cpu0mask;
> cpumask_t cpu_online_map __read_mostly;
> EXPORT_SYMBOL(cpu_online_map);
>
> +bool __read_mostly park_offline_cpus;
> +
> unsigned int __read_mostly nr_sockets;
> cpumask_t **__read_mostly socket_cpumask;
> static cpumask_t *secondary_socket_cpumask;
> @@ -887,7 +889,7 @@ static void cleanup_cpu_root_pgt(unsigne
> }
> }
>
> -static void cpu_smpboot_free(unsigned int cpu)
> +static void cpu_smpboot_free(unsigned int cpu, bool all)
Perhaps "remove" or "remove_cpu", to match the CPU_REMOVE terminology?
Also, we probably want a comment here explaining the difference between
parking and removing in terms of what infrastructure needs to remain valid.
~Andrew
* [PATCH 3/8] allow cpu_down() to be called earlier
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
2018-07-11 12:04 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Jan Beulich
2018-07-11 12:06 ` [PATCH 2/8] x86: distinguish CPU offlining from CPU removal Jan Beulich
@ 2018-07-11 12:06 ` Jan Beulich
2018-07-12 10:55 ` Wei Liu
2018-07-12 12:44 ` Andrew Cooper
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
` (5 subsequent siblings)
8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:06 UTC (permalink / raw)
To: xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall
The function's use of the stop-machine logic has so far prevented its
use ahead of the processing of the "ordinary" initcalls. Since at this
early time we're in a controlled environment anyway, there's no need for
such a heavy tool. Additionally this ought to have less of a performance
impact especially on large systems, compared to the alternative of
making stop-machine functionality available earlier.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -67,12 +67,17 @@ void __init register_cpu_notifier(struct
spin_unlock(&cpu_add_remove_lock);
}
-static int take_cpu_down(void *unused)
+static void _take_cpu_down(void *unused)
{
void *hcpu = (void *)(long)smp_processor_id();
int notifier_rc = notifier_call_chain(&cpu_chain, CPU_DYING, hcpu, NULL);
BUG_ON(notifier_rc != NOTIFY_DONE);
__cpu_disable();
+}
+
+static int take_cpu_down(void *arg)
+{
+ _take_cpu_down(arg);
return 0;
}
@@ -98,7 +103,9 @@ int cpu_down(unsigned int cpu)
goto fail;
}
- if ( (err = stop_machine_run(take_cpu_down, NULL, cpu)) < 0 )
+ if ( unlikely(system_state < SYS_STATE_active) )
+ on_selected_cpus(cpumask_of(cpu), _take_cpu_down, NULL, true);
+ else if ( (err = stop_machine_run(take_cpu_down, NULL, cpu)) < 0 )
goto fail;
__cpu_die(cpu);
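[Editor's note: the dispatch the patch adds can be sketched in isolation. The enum values below are illustrative, not Xen's actual ones:]

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the patch's decision: before the system reaches
 * SYS_STATE_active the environment is controlled (effectively one CPU
 * doing meaningful work), so a plain targeted IPI via on_selected_cpus()
 * suffices; once fully active, the heavyweight stop_machine_run()
 * rendezvous is still required to quiesce all CPUs. */
enum sys_state { SYS_STATE_early_boot, SYS_STATE_boot, SYS_STATE_active };

static bool needs_stop_machine(enum sys_state state)
{
    /* Matches "unlikely(system_state < SYS_STATE_active)" in the patch. */
    return !(state < SYS_STATE_active);
}
```

Splitting `take_cpu_down()` into a `void`-returning `_take_cpu_down()` lets the same body serve both the `on_selected_cpus()` callback signature and the `int`-returning stop-machine one.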
* Re: [PATCH 3/8] allow cpu_down() to be called earlier
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
@ 2018-07-12 10:55 ` Wei Liu
2018-07-12 12:44 ` Andrew Cooper
1 sibling, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-12 10:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall, xen-devel
On Wed, Jul 11, 2018 at 06:06:52AM -0600, Jan Beulich wrote:
> The function's use of the stop-machine logic has so far prevented its
> use ahead of the processing of the "ordinary" initcalls. Since at this
> early time we're in a controlled environment anyway, there's no need for
> such a heavy tool. Additionally this ought to have less of a performance
> impact especially on large systems, compared to the alternative of
> making stop-machine functionality available earlier.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
* Re: [PATCH 3/8] allow cpu_down() to be called earlier
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
2018-07-12 10:55 ` Wei Liu
@ 2018-07-12 12:44 ` Andrew Cooper
1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 12:44 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
Ian Jackson, Julien Grall
On 11/07/18 13:06, Jan Beulich wrote:
> The function's use of the stop-machine logic has so far prevented its
> use ahead of the processing of the "ordinary" initcalls. Since at this
> early time we're in a controlled environment anyway, there's no need for
> such a heavy tool. Additionally this ought to have less of a performance
> impact especially on large systems, compared to the alternative of
> making stop-machine functionality available earlier.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
` (2 preceding siblings ...)
2018-07-11 12:06 ` [PATCH 3/8] allow cpu_down() to be called earlier Jan Beulich
@ 2018-07-11 12:07 ` Jan Beulich
2018-07-11 18:11 ` Brian Woods
2018-07-12 13:02 ` Andrew Cooper
2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
` (4 subsequent siblings)
8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:07 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit
Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
the difference is not very big. The most relevant change (requiring some
code restructuring) is that the topoext feature no longer means there is
a valid CU ID.
Take the opportunity and convert wrongly plain int variables in
set_cpu_sibling_map() to unsigned int.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
u32 eax, ebx, ecx, edx;
cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
- c->compute_unit_id = ebx & 0xFF;
c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
+
+ if (c->x86 < 0x17)
+ c->compute_unit_id = ebx & 0xFF;
+ else {
+ c->cpu_core_id = ebx & 0xFF;
+ c->x86_max_cores /= c->x86_num_siblings;
+ }
}
if (opt_cpu_info)
printk("CPU %d(%d) -> Processor %d, %s %d\n",
cpu, c->x86_max_cores, c->phys_proc_id,
- cpu_has(c, X86_FEATURE_TOPOEXT) ? "Compute Unit" :
- "Core",
- cpu_has(c, X86_FEATURE_TOPOEXT) ? c->compute_unit_id :
- c->cpu_core_id);
+ c->compute_unit_id != INVALID_CUID ? "Compute Unit"
+ : "Core",
+ c->compute_unit_id != INVALID_CUID ? c->compute_unit_id
+ : c->cpu_core_id);
}
static void early_init_amd(struct cpuinfo_x86 *c)
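[Editor's note: the amd.c hunk above changes how CPUID leaf 0x8000001e EBX is decoded. A simplified, self-contained sketch of that decode — the struct is an illustrative stand-in for Xen's `struct cpuinfo_x86`, with field names mirroring the patch:]

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_CUID (~0u)

struct cpu_topo {
    unsigned int family;
    unsigned int compute_unit_id;  /* stays INVALID_CUID on Fam17+ */
    unsigned int cpu_core_id;
    unsigned int x86_num_siblings;
    unsigned int x86_max_cores;
};

/* Per the patch: byte 0 of EBX is a compute-unit ID on pre-Fam17 parts
 * but a core ID on Fam17 and later, where bits 15:8 now count
 * hyper-threads per core rather than cores per compute unit. */
static void decode_topology(struct cpu_topo *c, uint32_t ebx)
{
    c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
    if ( c->family < 0x17 )
        c->compute_unit_id = ebx & 0xff;
    else
    {
        c->cpu_core_id = ebx & 0xff;
        c->x86_max_cores /= c->x86_num_siblings;
    }
}
```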
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -236,33 +236,41 @@ static void link_thread_siblings(int cpu
cpumask_set_cpu(cpu2, per_cpu(cpu_core_mask, cpu1));
}
-static void set_cpu_sibling_map(int cpu)
+static void set_cpu_sibling_map(unsigned int cpu)
{
- int i;
+ unsigned int i;
struct cpuinfo_x86 *c = cpu_data;
cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
+ cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
+ cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
if ( c[cpu].x86_num_siblings > 1 )
{
for_each_cpu ( i, &cpu_sibling_setup_map )
{
- if ( cpu_has(c, X86_FEATURE_TOPOEXT) ) {
- if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
- (c[cpu].compute_unit_id == c[i].compute_unit_id) )
+ if ( cpu == i || c[cpu].phys_proc_id != c[i].phys_proc_id )
+ continue;
+ if ( c[cpu].compute_unit_id != INVALID_CUID &&
+ c[i].compute_unit_id != INVALID_CUID )
+ {
+ if ( c[cpu].compute_unit_id == c[i].compute_unit_id )
link_thread_siblings(cpu, i);
- } else if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
- (c[cpu].cpu_core_id == c[i].cpu_core_id) ) {
- link_thread_siblings(cpu, i);
}
+ else if ( c[cpu].cpu_core_id != XEN_INVALID_CORE_ID &&
+ c[i].cpu_core_id != XEN_INVALID_CORE_ID )
+ {
+ if ( c[cpu].cpu_core_id == c[i].cpu_core_id )
+ link_thread_siblings(cpu, i);
+ }
+ else
+ printk(XENLOG_WARNING
+ "CPU%u: unclear relationship with CPU%u\n",
+ cpu, i);
}
}
- else
- {
- cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
- }
if ( c[cpu].x86_max_cores == 1 )
{
* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-11 18:11 ` Brian Woods
2018-07-12 13:02 ` Andrew Cooper
1 sibling, 0 replies; 39+ messages in thread
From: Brian Woods @ 2018-07-11 18:11 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit, Andrew Cooper
On Wed, Jul 11, 2018 at 06:07:42AM -0600, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
>
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
> u32 eax, ebx, ecx, edx;
>
> cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> - c->compute_unit_id = ebx & 0xFF;
> c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
> +
> + if (c->x86 < 0x17)
> + c->compute_unit_id = ebx & 0xFF;
> + else {
> + c->cpu_core_id = ebx & 0xFF;
> + c->x86_max_cores /= c->x86_num_siblings;
> + }
> }
>
> if (opt_cpu_info)
> printk("CPU %d(%d) -> Processor %d, %s %d\n",
> cpu, c->x86_max_cores, c->phys_proc_id,
> - cpu_has(c, X86_FEATURE_TOPOEXT) ? "Compute Unit" :
> - "Core",
> - cpu_has(c, X86_FEATURE_TOPOEXT) ? c->compute_unit_id :
> - c->cpu_core_id);
> + c->compute_unit_id != INVALID_CUID ? "Compute Unit"
> + : "Core",
> + c->compute_unit_id != INVALID_CUID ? c->compute_unit_id
> + : c->cpu_core_id);
> }
>
> static void early_init_amd(struct cpuinfo_x86 *c)
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -236,33 +236,41 @@ static void link_thread_siblings(int cpu
> cpumask_set_cpu(cpu2, per_cpu(cpu_core_mask, cpu1));
> }
>
> -static void set_cpu_sibling_map(int cpu)
> +static void set_cpu_sibling_map(unsigned int cpu)
> {
> - int i;
> + unsigned int i;
> struct cpuinfo_x86 *c = cpu_data;
>
> cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
>
> cpumask_set_cpu(cpu, socket_cpumask[cpu_to_socket(cpu)]);
> + cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu));
> + cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
>
> if ( c[cpu].x86_num_siblings > 1 )
> {
> for_each_cpu ( i, &cpu_sibling_setup_map )
> {
> - if ( cpu_has(c, X86_FEATURE_TOPOEXT) ) {
> - if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
> - (c[cpu].compute_unit_id == c[i].compute_unit_id) )
> + if ( cpu == i || c[cpu].phys_proc_id != c[i].phys_proc_id )
> + continue;
> + if ( c[cpu].compute_unit_id != INVALID_CUID &&
> + c[i].compute_unit_id != INVALID_CUID )
> + {
> + if ( c[cpu].compute_unit_id == c[i].compute_unit_id )
> link_thread_siblings(cpu, i);
> - } else if ( (c[cpu].phys_proc_id == c[i].phys_proc_id) &&
> - (c[cpu].cpu_core_id == c[i].cpu_core_id) ) {
> - link_thread_siblings(cpu, i);
> }
> + else if ( c[cpu].cpu_core_id != XEN_INVALID_CORE_ID &&
> + c[i].cpu_core_id != XEN_INVALID_CORE_ID )
> + {
> + if ( c[cpu].cpu_core_id == c[i].cpu_core_id )
> + link_thread_siblings(cpu, i);
> + }
> + else
> + printk(XENLOG_WARNING
> + "CPU%u: unclear relationship with CPU%u\n",
> + cpu, i);
> }
> }
> - else
> - {
> - cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
> - }
>
> if ( c[cpu].x86_max_cores == 1 )
> {
>
>
>
>
Side note: if cpu_core_id isn't the logical cpu, it might be worth
updating the comments in processor.h
Reviewed-by: Brian Woods <brian.woods@amd.com>
--
Brian Woods
* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
2018-07-11 18:11 ` Brian Woods
@ 2018-07-12 13:02 ` Andrew Cooper
2018-07-12 14:22 ` Jan Beulich
1 sibling, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 13:02 UTC (permalink / raw)
To: Jan Beulich, xen-devel; +Cc: Brian Woods, Suravee Suthikulpanit
On 11/07/18 13:07, Jan Beulich wrote:
> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
> the difference is not very big. The most relevant change (requiring some
> code restructuring) is that the topoext feature no longer means there is
> a valid CU ID.
>
> Take the opportunity and convert wrongly plain int variables in
> set_cpu_sibling_map() to unsigned int.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
> u32 eax, ebx, ecx, edx;
>
> cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> - c->compute_unit_id = ebx & 0xFF;
> c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
> +
> + if (c->x86 < 0x17)
> + c->compute_unit_id = ebx & 0xFF;
> + else {
> + c->cpu_core_id = ebx & 0xFF;
> + c->x86_max_cores /= c->x86_num_siblings;
> + }
The indentation here is odd. It turns out the function is indented
entirely with 8 spaces, rather than the tabs used in the rest of the file.
Would you mind either retaining spaces, or fixing up the entire function
(probably easiest as a prereq patch) ?
Otherwise LGTM.
~Andrew
* Re: [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads
2018-07-12 13:02 ` Andrew Cooper
@ 2018-07-12 14:22 ` Jan Beulich
0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-12 14:22 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel, Brian Woods, Suravee Suthikulpanit
>>> On 12.07.18 at 15:02, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:07, Jan Beulich wrote:
>> Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
>> the difference is not very big. The most relevant change (requiring some
>> code restructuring) is that the topoext feature no longer means there is
>> a valid CU ID.
>>
>> Take the opportunity and convert wrongly plain int variables in
>> set_cpu_sibling_map() to unsigned int.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/cpu/amd.c
>> +++ b/xen/arch/x86/cpu/amd.c
>> @@ -504,17 +504,23 @@ static void amd_get_topology(struct cpui
>> u32 eax, ebx, ecx, edx;
>>
>> cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
>> - c->compute_unit_id = ebx & 0xFF;
>> c->x86_num_siblings = ((ebx >> 8) & 0x3) + 1;
>> +
>> + if (c->x86 < 0x17)
>> + c->compute_unit_id = ebx & 0xFF;
>> + else {
>> + c->cpu_core_id = ebx & 0xFF;
>> + c->x86_max_cores /= c->x86_num_siblings;
>> + }
>
> The indentation here is odd. It turns out the function is indented
> entirely with 8 spaces, rather than the tabs used in the rest of the file.
Ouch - didn't notice this.
> Would you mind either retaining spaces, or fixing up the entire function
> (probably easiest as a prereq patch) ?
I'll fix this up here, not in a prereq patch (for backporting's sake).
I'll consider adding a follow-up patch.
Jan
* [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
` (3 preceding siblings ...)
2018-07-11 12:07 ` [PATCH 4/8] x86/AMD: distinguish compute units from hyper-threads Jan Beulich
@ 2018-07-11 12:09 ` Jan Beulich
2018-07-12 15:38 ` Andrew Cooper
2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
` (3 subsequent siblings)
8 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:09 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
Reportedly Intel CPUs which can't broadcast #MC to all targeted
cores/threads because some have CR4.MCE clear will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.
The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Instead of fully bringing up CPUs and then calling cpu_down(), another
option would be to suppress/cancel full bringup in smp_callin().
---
Note: The parked CPUs can be brought online (i.e. the meaning of
"maxcpus=" isn't as strict anymore as it was before), but won't
immediately be used for scheduling pre-existing Dom0 CPUs. That's
because dom0_setup_vcpu() artificially restricts the affinity. For
DomU-s whose affinity was not artificially restricted, no such
limitation exists, albeit the shown "soft" affinity appears to
suffer a similar issue. As that's not a goal of this patch, I've
put the issues on the side for now, perhaps for someone else to
take care of.
Note: On one of my test systems the parked CPUs get _PSD data reported
by Dom0 that is different from the non-parked ones (coord_type is
0xFC instead of 0xFE). Giving Dom0 enough vCPU-s eliminates this
problem, so there is apparently something amiss in the processor
driver. I've tried to figure out what, but I couldn't, despite the
AML suggesting that this might be some _OSC invocation (but if it
is, I can't find it - acpi_run_osc() clearly does not anywhere get
invoked in a per-CPU fashion).
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -13,6 +13,7 @@
#include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
#include "cpu.h"
+#include "mcheck/x86_mca.h"
bool_t opt_arat = 1;
boolean_param("arat", opt_arat);
@@ -343,6 +344,9 @@ static void __init early_cpu_detect(void
hap_paddr_bits = PADDR_BITS;
}
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ park_offline_cpus = opt_mce;
+
initialize_cpu_data(0);
}
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -636,8 +636,6 @@ static void clear_cmci(void)
static void cpu_mcheck_disable(void)
{
- clear_in_cr4(X86_CR4_MCE);
-
if ( cmci_support && opt_mce )
clear_cmci();
}
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -68,18 +68,26 @@ physid_mask_t phys_cpu_present_map;
void __init set_nr_cpu_ids(unsigned int max_cpus)
{
+ unsigned int tot_cpus = num_processors + disabled_cpus;
+
if (!max_cpus)
- max_cpus = num_processors + disabled_cpus;
+ max_cpus = tot_cpus;
if (max_cpus > NR_CPUS)
max_cpus = NR_CPUS;
else if (!max_cpus)
max_cpus = 1;
printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
max_cpus, max_t(int, max_cpus - num_processors, 0));
- nr_cpu_ids = max_cpus;
+
+ if (!park_offline_cpus)
+ tot_cpus = max_cpus;
+ nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
+ if (park_offline_cpus && nr_cpu_ids < num_processors)
+ printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
+ num_processors - nr_cpu_ids);
#ifndef nr_cpumask_bits
- nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
+ nr_cpumask_bits = (nr_cpu_ids + (BITS_PER_LONG - 1)) &
~(BITS_PER_LONG - 1);
printk(XENLOG_DEBUG "NR_CPUS:%u nr_cpumask_bits:%u\n",
NR_CPUS, nr_cpumask_bits);
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -665,7 +665,7 @@ void __init noreturn __start_xen(unsigne
{
char *memmap_type = NULL;
char *cmdline, *kextra, *loader;
- unsigned int initrdidx;
+ unsigned int initrdidx, num_parked = 0;
multiboot_info_t *mbi;
module_t *mod;
unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1503,7 +1503,8 @@ void __init noreturn __start_xen(unsigne
else
{
set_nr_cpu_ids(max_cpus);
- max_cpus = nr_cpu_ids;
+ if ( !max_cpus )
+ max_cpus = nr_cpu_ids;
}
if ( xen_guest )
@@ -1626,16 +1627,27 @@ void __init noreturn __start_xen(unsigne
/* Set up node_to_cpumask based on cpu_to_node[]. */
numa_add_cpu(i);
- if ( (num_online_cpus() < max_cpus) && !cpu_online(i) )
+ if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
+ !cpu_online(i) )
{
int ret = cpu_up(i);
if ( ret != 0 )
printk("Failed to bring up CPU %u (error %d)\n", i, ret);
+ else if ( num_online_cpus() > max_cpus )
+ {
+ ret = cpu_down(i);
+ if ( !ret )
+ ++num_parked;
+ else
+ printk("Could not re-offline CPU%u (%d)\n", i, ret);
+ }
}
}
}
printk("Brought up %ld CPUs\n", (long)num_online_cpus());
+ if ( num_parked )
+ printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
smp_cpus_done();
do_initcalls();
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -342,12 +342,6 @@ static always_inline void set_in_cr4 (un
write_cr4(read_cr4() | mask);
}
-static always_inline void clear_in_cr4 (unsigned long mask)
-{
- mmu_cr4_features &= ~mask;
- write_cr4(read_cr4() & ~mask);
-}
-
static inline unsigned int read_pkru(void)
{
unsigned int pkru;
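The effect of the set_nr_cpu_ids() rework above can be seen in a small standalone model. The function and constants below are hypothetical condensations for illustration only; the real code also emits the "Cannot bring up" warning and derives nr_cpumask_bits.

```c
/* Hypothetical model of the reworked set_nr_cpu_ids(): with CPU
 * parking enabled, the CPU ID space covers all present plus disabled
 * CPUs (capped at NR_CPUS) rather than just max_cpus, so parked CPUs
 * keep their IDs.
 */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 256  /* arbitrary cap for the sketch */

static unsigned int compute_nr_cpu_ids(unsigned int max_cpus,
                                       unsigned int num_processors,
                                       unsigned int disabled_cpus,
                                       bool park_offline_cpus)
{
    unsigned int tot_cpus = num_processors + disabled_cpus;

    if ( !max_cpus )
        max_cpus = tot_cpus;          /* 0 means "all" */
    if ( max_cpus > NR_CPUS )
        max_cpus = NR_CPUS;
    else if ( !max_cpus )
        max_cpus = 1;

    if ( !park_offline_cpus )
        tot_cpus = max_cpus;          /* legacy behavior */
    return tot_cpus < NR_CPUS ? tot_cpus : NR_CPUS;
}
```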
* Re: [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-12 15:38 ` Andrew Cooper
2018-07-13 8:11 ` Jan Beulich
0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 15:38 UTC (permalink / raw)
To: Jan Beulich, xen-devel
On 11/07/18 13:09, Jan Beulich wrote:
> Reportedly Intel CPUs which can't broadcast #MC to all targeted
> cores/threads because some have CR4.MCE clear will shut down. Therefore
> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>
> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
> altogether right here.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Instead of fully bringing up CPUs and then calling cpu_down(), another
> option would be to suppress/cancel full bringup in smp_callin().
What is the practical difference? When we know ahead of time that we
are intending to park the core, then cancelling in smp_callin() seems
cleaner.
> --- a/xen/arch/x86/mpparse.c
> +++ b/xen/arch/x86/mpparse.c
> @@ -68,18 +68,26 @@ physid_mask_t phys_cpu_present_map;
>
> void __init set_nr_cpu_ids(unsigned int max_cpus)
> {
> + unsigned int tot_cpus = num_processors + disabled_cpus;
> +
> if (!max_cpus)
> - max_cpus = num_processors + disabled_cpus;
> + max_cpus = tot_cpus;
> if (max_cpus > NR_CPUS)
> max_cpus = NR_CPUS;
> else if (!max_cpus)
> max_cpus = 1;
> printk(XENLOG_INFO "SMP: Allowing %u CPUs (%d hotplug CPUs)\n",
> max_cpus, max_t(int, max_cpus - num_processors, 0));
> - nr_cpu_ids = max_cpus;
> +
> + if (!park_offline_cpus)
> + tot_cpus = max_cpus;
> + nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
> + if (park_offline_cpus && nr_cpu_ids < num_processors)
> + printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
> + num_processors - nr_cpu_ids);
>
> #ifndef nr_cpumask_bits
> - nr_cpumask_bits = (max_cpus + (BITS_PER_LONG - 1)) &
> + nr_cpumask_bits = (nr_cpu_ids + (BITS_PER_LONG - 1)) &
> ~(BITS_PER_LONG - 1);
ROUNDUP() ?
~Andrew
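For readers following along: the open-coded mask and a ROUNDUP()-style macro are equivalent for power-of-two alignments, as this throwaway check shows. The macro definition here is assumed and may differ in detail from Xen's.

```c
#include <assert.h>

#define BITS_PER_LONG 64

/* Open-coded rounding as in the patch context. */
static unsigned int round_open_coded(unsigned int n)
{
    return (n + (BITS_PER_LONG - 1)) & ~(BITS_PER_LONG - 1u);
}

/* ROUNDUP-style macro as suggested in the review; assumed definition,
 * valid for power-of-two alignment 'a'. */
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((a) - 1))

static unsigned int round_macro(unsigned int n)
{
    return ROUNDUP(n, BITS_PER_LONG);
}
```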
* Re: [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used
2018-07-12 15:38 ` Andrew Cooper
@ 2018-07-13 8:11 ` Jan Beulich
0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-13 8:11 UTC (permalink / raw)
To: Andrew Cooper; +Cc: xen-devel
>>> On 12.07.18 at 17:38, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:09, Jan Beulich wrote:
>> Reportedly Intel CPUs which can't broadcast #MC to all targeted
>> cores/threads because some have CR4.MCE clear will shut down. Therefore
>> we want to keep CR4.MCE enabled when offlining a CPU, and we need to
>> bring up all CPUs in order to be able to set CR4.MCE in the first place.
>>
>> The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
>> anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
>> altogether right here.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Instead of fully bringing up CPUs and then calling cpu_down(), another
>> option would be to suppress/cancel full bringup in smp_callin().
>
> What is the practical difference? When we know ahead of time that we
> are intending to part the core, then cancelling in smp_callin() seems
> cleaner.
There are things that get left out in that case, in particular anything
done in CPU_STARTING and CPU_ONLINE notifications. A prime
omission here would be mwait_idle_cpu_init() setting up C-state
information for the CPU.
The other risk I see is that of the apparent error that this causes
to be handed up the call tree. As you can see from the cpupool fix
that was necessary, we'd chance to run into more buggy error
handling paths.
All in all - originally I had also thought that this approach might be
cleaner, but the above together with the ease of invoking
cpu_down() (requiring pretty little change overall) made me
change minds. The main downside is really that runtime soft-
online attempts will make (in patch 6, as mentioned there) the
hyperthreads visible to the scheduler for a brief moment.
Jan
* [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
` (4 preceding siblings ...)
2018-07-11 12:09 ` [PATCH 5/8] x86: bring up all CPUs even if not all are supposed to be used Jan Beulich
@ 2018-07-11 12:10 ` Jan Beulich
2018-07-12 15:45 ` Andrew Cooper
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
` (2 subsequent siblings)
8 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:10 UTC (permalink / raw)
To: xen-devel
Cc: George Dunlap, Andrew Cooper, Brian Woods, Suravee Suthikulpanit,
Dario Faggioli
Shared resources (L1 cache and TLB in particular) present a risk of
information leak via side channels. Don't use hyperthreads in such
cases, but allow independent control of their use at the same time.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
An option to avoid the up/down cycle would be to avoid clearing the
sibling (and then perhaps also core) map of parked CPUs, allowing to
bail early from cpu_up_helper().
TBD: How to prevent the CPU from transiently becoming available for
scheduling when being onlined at runtime?
TBD: For now the patch assumes all HT-enabled CPUs are affected by side
channel attacks through shared resources. There are claims that AMD
ones aren't, but it hasn't really become clear to me why that would
be, as I don't see the fully associative L1 TLBs to be sufficient
reason for there to not be other possible avenues (L2 TLB, caches).
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
### hpetbroadcast (x86)
> `= <boolean>`
+### ht (x86)
+> `= <boolean>`
+
+Default: `false`
+
+Control bring up of multiple hyper-threads per CPU core.
+
### hvm\_debug (x86)
> `= <integer>`
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -62,6 +62,9 @@ boolean_param("nosmp", opt_nosmp);
static unsigned int __initdata max_cpus;
integer_param("maxcpus", max_cpus);
+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);
+
/* opt_invpcid: If false, don't use INVPCID instruction even if available. */
static bool __initdata opt_invpcid = true;
boolean_param("invpcid", opt_invpcid);
@@ -1633,7 +1636,10 @@ void __init noreturn __start_xen(unsigne
int ret = cpu_up(i);
if ( ret != 0 )
printk("Failed to bring up CPU %u (error %d)\n", i, ret);
- else if ( num_online_cpus() > max_cpus )
+ else if ( num_online_cpus() > max_cpus ||
+ (!opt_ht &&
+ cpu_data[i].compute_unit_id == INVALID_CUID &&
+ cpumask_weight(per_cpu(cpu_sibling_mask, i)) > 1) )
{
ret = cpu_down(i);
if ( !ret )
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -23,6 +23,7 @@
#include <asm/microcode.h>
#include <asm/msr.h>
#include <asm/processor.h>
+#include <asm/setup.h>
#include <asm/spec_ctrl.h>
#include <asm/spec_ctrl_asm.h>
@@ -126,6 +127,9 @@ static int __init parse_spec_ctrl(const
opt_eager_fpu = 0;
+ if ( opt_ht < 0 )
+ opt_ht = 1;
+
disable_common:
opt_rsb_pv = false;
opt_rsb_hvm = false;
@@ -627,6 +631,9 @@ void __init init_speculation_mitigations
if ( default_xen_spec_ctrl )
setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
+ if ( opt_ht < 0 )
+ opt_ht = 0;
+
xpti_init_default(false);
if ( opt_xpti == 0 )
setup_force_cpu_cap(X86_FEATURE_NO_XPTI);
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -23,6 +23,7 @@
#include <asm/hvm/hvm.h>
#include <asm/hvm/support.h>
#include <asm/processor.h>
+#include <asm/setup.h>
#include <asm/smp.h>
#include <asm/numa.h>
#include <xen/nodemask.h>
@@ -48,14 +49,27 @@ static void l3_cache_get(void *arg)
long cpu_up_helper(void *data)
{
- int cpu = (unsigned long)data;
+ unsigned int cpu = (unsigned long)data;
int ret = cpu_up(cpu);
+
if ( ret == -EBUSY )
{
/* On EBUSY, flush RCU work and have one more go. */
rcu_barrier();
ret = cpu_up(cpu);
}
+
+ if ( !ret && !opt_ht &&
+ cpu_data[cpu].compute_unit_id == INVALID_CUID &&
+ cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) > 1 )
+ {
+ ret = cpu_down_helper(data);
+ if ( ret )
+ printk("Could not re-offline CPU%u (%d)\n", cpu, ret);
+ else
+ ret = -EPERM;
+ }
+
return ret;
}
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -59,6 +59,8 @@ extern uint8_t kbd_shift_flags;
extern unsigned long highmem_start;
#endif
+extern int8_t opt_ht;
+
#ifdef CONFIG_SHADOW_PAGING
extern bool opt_dom0_shadow;
#else
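The re-offline condition added to the boot loop in this patch can be condensed into a standalone predicate. The helper below is hypothetical and simplified for illustration; in the real code the sibling count comes from cpumask_weight(per_cpu(cpu_sibling_mask, i)).

```c
/* Sketch: a freshly onlined CPU is immediately parked again if it
 * exceeds maxcpus=, or if "ht=0" and it is a secondary sibling on a
 * topology without compute units (per the patch, CU-based AMD
 * topologies are not affected by the ht= check).
 */
#include <assert.h>
#include <stdbool.h>

#define INVALID_CUID (~0u)

static bool should_repark(unsigned int num_online, unsigned int max_cpus,
                          bool opt_ht, unsigned int compute_unit_id,
                          unsigned int sibling_count)
{
    return num_online > max_cpus ||
           (!opt_ht && compute_unit_id == INVALID_CUID &&
            sibling_count > 1);
}
```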
* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-12 15:45 ` Andrew Cooper
2018-07-13 8:13 ` Jan Beulich
0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-12 15:45 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Cc: George Dunlap, Brian Woods, Suravee Suthikulpanit, Dario Faggioli
On 11/07/18 13:10, Jan Beulich wrote:
> Shared resources (L1 cache and TLB in particular) present a risk of
> information leak via side channels. Don't use hyperthreads in such
> cases, but allow independent control of their use at the same time.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> An option to avoid the up/down cycle would be to avoid clearing the
> sibling (and then perhaps also core) map of parked CPUs, allowing to
> bail early from cpu_up_helper().
>
> TBD: How to prevent the CPU from transiently becoming available for
> scheduling when being onlined at runtime?
This looks like an argument for cancelling at call-in time, no?
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
> ### hpetbroadcast (x86)
> > `= <boolean>`
>
> +### ht (x86)
I'd suggest smt rather than ht here. SMT is the technical term, while
HT is Intel's marketing name.
~Andrew
* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-12 15:45 ` Andrew Cooper
@ 2018-07-13 8:13 ` Jan Beulich
2018-07-16 12:37 ` Andrew Cooper
0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-13 8:13 UTC (permalink / raw)
To: Andrew Cooper
Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
Dario Faggioli
>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
> On 11/07/18 13:10, Jan Beulich wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>> ### hpetbroadcast (x86)
>> > `= <boolean>`
>>
>> +### ht (x86)
>
> I'd suggest smt rather than ht here. SMT is the technical term, while
> HT is Intel's marketing name.
Hmm, many BIOSes (if they have such an option) talk about HT, which
to me makes "ht" a closer match. How about we allow both?
Jan
* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-13 8:13 ` Jan Beulich
@ 2018-07-16 12:37 ` Andrew Cooper
2018-07-16 12:53 ` Jan Beulich
0 siblings, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2018-07-16 12:37 UTC (permalink / raw)
To: Jan Beulich
Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
Dario Faggioli
On 13/07/18 09:13, Jan Beulich wrote:
>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>> On 11/07/18 13:10, Jan Beulich wrote:
>>> --- a/docs/misc/xen-command-line.markdown
>>> +++ b/docs/misc/xen-command-line.markdown
>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>> ### hpetbroadcast (x86)
>>> > `= <boolean>`
>>>
>>> +### ht (x86)
>> I'd suggest smt rather than ht here. SMT is the technical term, while
>> HT is Intel's marketing name.
> Hmm, many BIOSes (if they have such an option) talk about HT, which
> to me makes "ht" a closer match. How about we allow both?
That's because a BIOS is custom to the hardware it runs on.
Have you tried setting up an alias before? (given the specific
deficiency of the *_param() infrastructure in this area) I don't
think an alias is worth the effort.
~Andrew
* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-16 12:37 ` Andrew Cooper
@ 2018-07-16 12:53 ` Jan Beulich
2018-07-16 13:01 ` Andrew Cooper
0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 12:53 UTC (permalink / raw)
To: Andrew Cooper
Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
Dario Faggioli
>>> On 16.07.18 at 14:37, <andrew.cooper3@citrix.com> wrote:
> On 13/07/18 09:13, Jan Beulich wrote:
>>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>>> On 11/07/18 13:10, Jan Beulich wrote:
>>>> --- a/docs/misc/xen-command-line.markdown
>>>> +++ b/docs/misc/xen-command-line.markdown
>>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>>> ### hpetbroadcast (x86)
>>>> > `= <boolean>`
>>>>
>>>> +### ht (x86)
>>> I'd suggest smt rather than ht here. SMT is the technical term, while
>>> HT is Intel's marketing name.
>> Hmm, many BIOSes (if they have such an option) talk about HT, which
>> to me makes "ht" a closer match. How about we allow both?
>
> That's because a BIOS is custom to the hardware it runs on.
>
> Have you tried setting up an alias before? (given the specific
> deficiency of the *_param() infrastructure in this area) I don't
> think an alias is worth the effort.
This reads as if you were expecting problems. Instead of
+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);
what we'd have is simply
+int8_t __read_mostly opt_ht = -1;
+boolean_param("ht", opt_ht);
+boolean_param("smt", opt_ht);
(and whether we use opt_ht or opt_smt doesn't matter much
to me). I don't see any source of possible issues with this.
Jan
* Re: [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads
2018-07-16 12:53 ` Jan Beulich
@ 2018-07-16 13:01 ` Andrew Cooper
0 siblings, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-16 13:01 UTC (permalink / raw)
To: Jan Beulich
Cc: George Dunlap, xen-devel, Brian Woods, Suravee Suthikulpanit,
Dario Faggioli
On 16/07/18 13:53, Jan Beulich wrote:
>>>> On 16.07.18 at 14:37, <andrew.cooper3@citrix.com> wrote:
>> On 13/07/18 09:13, Jan Beulich wrote:
>>>>>> On 12.07.18 at 17:45, <andrew.cooper3@citrix.com> wrote:
>>>> On 11/07/18 13:10, Jan Beulich wrote:
>>>>> --- a/docs/misc/xen-command-line.markdown
>>>>> +++ b/docs/misc/xen-command-line.markdown
>>>>> @@ -1040,6 +1040,13 @@ identical to the boot CPU will be parked
>>>>> ### hpetbroadcast (x86)
>>>>> > `= <boolean>`
>>>>>
>>>>> +### ht (x86)
>>>> I'd suggest smt rather than ht here. SMT is the technical term, while
>>>> HT is Intel's marketing name.
>>> Hmm, many BIOSes (if they have such an option) talk about HT, which
>>> to me makes "ht" a closer match. How about we allow both?
>> That's because a BIOS is custom to the hardware it runs on.
>>
>> Have you tried setting up an alias before? (given the specific
>> deficiency of the *_param() infrastructure in this area) I don't
>> think an alias is worth the effort.
> This reads as if you were expecting problems. Instead of
>
> +int8_t __read_mostly opt_ht = -1;
> +boolean_param("ht", opt_ht);
>
> what we'd have is simply
>
> +int8_t __read_mostly opt_ht = -1;
> +boolean_param("ht", opt_ht);
> +boolean_param("smt", opt_ht);
>
> (and whether we use opt_ht or opt_smt doesn't matter much
> to me). I don't see any source of possible issues with this.
Try compiling it.
I tried exactly this with bti= and spec-ctrl= originally. The problem
is that the second argument must be (translation unit) unique, because
of the way it is used to form the name of the struct kernel_param.
~Andrew
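The uniqueness problem referred to here can be illustrated with a deliberately simplified stand-in for the registration macro. The struct layout, macro name, and generated identifier below are invented; Xen's real *_param() machinery differs in detail, but shares the property that the identifier is token-pasted from the variable name rather than the option string.

```c
/* Minimal stand-in: the generated struct's identifier comes from the
 * variable name, so registering the same variable under two option
 * names would define the same identifier twice and fail to compile.
 */
#include <string.h>
#include <assert.h>

struct kernel_param {
    const char *name;
    void *var;
};

static int opt_ht = -1;

#define my_boolean_param(name, var) \
    static struct kernel_param __setup_##var = { name, &var }

my_boolean_param("ht", opt_ht);
/* my_boolean_param("smt", opt_ht);  <- would redefine __setup_opt_ht */
```

Uncommenting the second registration reproduces the redefinition error that makes a naive alias fail.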
* [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
` (5 preceding siblings ...)
2018-07-11 12:10 ` [PATCH 6/8] x86: (command line option to) avoid use of secondary hyper-threads Jan Beulich
@ 2018-07-11 12:11 ` Jan Beulich
2018-07-11 12:23 ` Andrew Cooper
` (2 more replies)
2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
[not found] ` <5B45F26A02000078001D312F@suse.com>
8 siblings, 3 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:11 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne
In the shim case, the number of CPUs should be solely controlled by the
guest configuration file. Make sure the command line options are fully
(and not just partially) ignored.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1498,6 +1498,15 @@ void __init noreturn __start_xen(unsigne
if ( smp_found_config )
get_smp_config();
+ /*
+ * In the shim case, the number of CPUs should be solely controlled by the
+ * guest configuration file.
+ */
+ if ( pv_shim )
+ {
+ opt_nosmp = false;
+ max_cpus = 0;
+ }
if ( opt_nosmp )
{
max_cpus = 0;
* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
@ 2018-07-11 12:23 ` Andrew Cooper
2018-07-11 15:18 ` Roger Pau Monné
2018-07-11 16:02 ` Wei Liu
2 siblings, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-11 12:23 UTC (permalink / raw)
To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monne
On 11/07/18 13:11, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Ideally with "This option is ignored in **pv-shim** mode" added to the
docs in the relevant places,
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
2018-07-11 12:23 ` Andrew Cooper
@ 2018-07-11 15:18 ` Roger Pau Monné
2018-07-11 16:02 ` Wei Liu
2 siblings, 0 replies; 39+ messages in thread
From: Roger Pau Monné @ 2018-07-11 15:18 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper
On Wed, Jul 11, 2018 at 06:11:36AM -0600, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
I agree with Andrew that it should be mentioned in the command line
document. Sorry for not doing that before.
Thanks, Roger.
* Re: [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus="
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
2018-07-11 12:23 ` Andrew Cooper
2018-07-11 15:18 ` Roger Pau Monné
@ 2018-07-11 16:02 ` Wei Liu
2 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-11 16:02 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Roger Pau Monne, Wei Liu, Andrew Cooper
On Wed, Jul 11, 2018 at 06:11:36AM -0600, Jan Beulich wrote:
> In the shim case, the number of CPUs should be solely controlled by the
> guest configuration file. Make sure the command line options are fully
> (and not just partially) ignored.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
* [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
2018-07-11 11:55 [PATCH 0/8] x86: (allow to) suppress use of hyper-threading Jan Beulich
` (6 preceding siblings ...)
2018-07-11 12:11 ` [PATCH 7/8] x86/shim: fully ignore "nosmp" and "maxcpus=" Jan Beulich
@ 2018-07-11 12:12 ` Jan Beulich
2018-07-11 12:20 ` Andrew Cooper
2018-07-12 15:13 ` Wei Liu
[not found] ` <5B45F26A02000078001D312F@suse.com>
8 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-11 12:12 UTC (permalink / raw)
To: xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall
Drop unnecessary casts and use bool in favor of bool_t.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -345,9 +345,9 @@ static inline int cpulist_scnprintf(char
typedef cpumask_t *cpumask_var_t;
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
{
- *(void **)mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
+ *mask = _xmalloc(nr_cpumask_bits / 8, sizeof(long));
return *mask != NULL;
}
@@ -358,9 +358,9 @@ static inline bool cond_alloc_cpumask_va
return *mask != NULL;
}
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
{
- *(void **)mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
+ *mask = _xzalloc(nr_cpumask_bits / 8, sizeof(long));
return *mask != NULL;
}
@@ -385,16 +385,16 @@ static inline void clear_cpumask_var(cpu
#else
typedef cpumask_t cpumask_var_t[1];
-static inline bool_t alloc_cpumask_var(cpumask_var_t *mask)
+static inline bool alloc_cpumask_var(cpumask_var_t *mask)
{
- return 1;
+ return true;
}
#define cond_alloc_cpumask_var alloc_cpumask_var
-static inline bool_t zalloc_cpumask_var(cpumask_var_t *mask)
+static inline bool zalloc_cpumask_var(cpumask_var_t *mask)
{
cpumask_clear(*mask);
- return 1;
+ return true;
}
#define cond_zalloc_cpumask_var zalloc_cpumask_var
* Re: [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
@ 2018-07-11 12:20 ` Andrew Cooper
2018-07-12 15:13 ` Wei Liu
1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2018-07-11 12:20 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
Ian Jackson, Julien Grall
On 11/07/18 13:12, Jan Beulich wrote:
> Drop unnecessary casts and use bool in favor of bool_t.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var()
2018-07-11 12:12 ` [PATCH 8/8] cpumask: tidy {,z}alloc_cpumask_var() Jan Beulich
2018-07-11 12:20 ` Andrew Cooper
@ 2018-07-12 15:13 ` Wei Liu
1 sibling, 0 replies; 39+ messages in thread
From: Wei Liu @ 2018-07-12 15:13 UTC (permalink / raw)
To: Jan Beulich
Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
Ian Jackson, Tim Deegan, Julien Grall, xen-devel
On Wed, Jul 11, 2018 at 06:12:14AM -0600, Jan Beulich wrote:
> Drop unnecessary casts and use bool in favor of bool_t.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
[parent not found: <5B45F26A02000078001D312F@suse.com>]
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
[not found] ` <5B45F26A02000078001D312F@suse.com>
@ 2018-07-13 9:02 ` Juergen Gross
2018-07-16 9:17 ` Jan Beulich
[not found] ` <5B4C629002000078001D4346@suse.com>
0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-13 9:02 UTC (permalink / raw)
To: Jan Beulich, xen-devel; +Cc: Dario Faggioli
On 11/07/18 14:04, Jan Beulich wrote:
> While I've run into the issue with further patches in place which no
> longer guarantee the per-CPU area to start out as all zeros, the
> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
> there.
>
> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
> should not happen before cpu_disable_scheduler()). Clearing it in
> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
> piece of code twice. Since the field's value shouldn't matter while the
> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
> only for other than the suspend/resume case (which gets specially
> handled in cpupool_cpu_remove()).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
> cpupool_cpu_remove(), but besides that - as per above - likely
> being too early, that function has further prereqs to be met. It
> also doesn't look as if cpupool_unassign_cpu_helper() could be used
> there.
>
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -778,6 +778,8 @@ static int cpu_callback(
> {
> case CPU_DOWN_FAILED:
> case CPU_ONLINE:
> + if ( system_state <= SYS_STATE_active )
> + per_cpu(cpupool, cpu) = NULL;
> rc = cpupool_cpu_add(cpu);
Wouldn't it make more sense to clear the field in cpupool_cpu_add()
which already is testing system_state?
Modifying the condition in cpupool_cpu_add() to
if ( system_state <= SYS_STATE_active )
at the same time would have the benefit to catch problems in case
suspending cpus is failing during SYS_STATE_suspend (I'd expect
triggering the first ASSERT in schedule_cpu_switch() in this case).
It should be noted that this scenario is theoretical only, as today
the CPU_DOWN_FAILED case can't happen in the suspend case.
Juergen
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
2018-07-13 9:02 ` [PATCH 1/8] cpupools: fix state when downing a CPU failed Juergen Gross
@ 2018-07-16 9:17 ` Jan Beulich
[not found] ` <5B4C629002000078001D4346@suse.com>
1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 9:17 UTC (permalink / raw)
To: Juergen Gross; +Cc: xen-devel, Dario Faggioli
>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
> On 11/07/18 14:04, Jan Beulich wrote:
>> While I've run into the issue with further patches in place which no
>> longer guarantee the per-CPU area to start out as all zeros, the
>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>> there.
>>
>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>> should not happen before cpu_disable_scheduler()). Clearing it in
>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>> piece of code twice. Since the field's value shouldn't matter while the
>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>> only for other than the suspend/resume case (which gets specially
>> handled in cpupool_cpu_remove()).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>> cpupool_cpu_remove(), but besides that - as per above - likely
>> being too early, that function has further prereqs to be met. It
>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>> there.
>>
>> --- a/xen/common/cpupool.c
>> +++ b/xen/common/cpupool.c
>> @@ -778,6 +778,8 @@ static int cpu_callback(
>> {
>> case CPU_DOWN_FAILED:
>> case CPU_ONLINE:
>> + if ( system_state <= SYS_STATE_active )
>> + per_cpu(cpupool, cpu) = NULL;
>> rc = cpupool_cpu_add(cpu);
>
> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
> which already is testing system_state?
Hmm, this may be a matter of taste: I consider the change done here
a prereq to calling the function in the first place. As said in the
description, I actually think this should come earlier, and it's just that
I can't see how to cleanly do so.
> Modifying the condition in cpupool_cpu_add() to
>
> if ( system_state <= SYS_STATE_active )
>
> at the same time would have the benefit to catch problems in case
> suspending cpus is failing during SYS_STATE_suspend (I'd expect
> triggering the first ASSERT in schedule_cpu_switch() in this case).
You mean the if() there, not the else? If so - how would the "else"
body then ever be reached? IOW if anything I could only see the
"else" to become "else if ( system_state <= SYS_STATE_active )".
Jan
[parent not found: <5B4C629002000078001D4346@suse.com>]
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
[not found] ` <5B4C629002000078001D4346@suse.com>
@ 2018-07-16 11:47 ` Juergen Gross
2018-07-16 12:19 ` Jan Beulich
[not found] ` <5B4C8D3702000078001D45EA@suse.com>
0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 11:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Dario Faggioli
On 16/07/18 11:17, Jan Beulich wrote:
>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>> On 11/07/18 14:04, Jan Beulich wrote:
>>> While I've run into the issue with further patches in place which no
>>> longer guarantee the per-CPU area to start out as all zeros, the
>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>> there.
>>>
>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>> piece of code twice. Since the field's value shouldn't matter while the
>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>> only for other than the suspend/resume case (which gets specially
>>> handled in cpupool_cpu_remove()).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>> cpupool_cpu_remove(), but besides that - as per above - likely
>>> being too early, that function has further prereqs to be met. It
>>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>> there.
>>>
>>> --- a/xen/common/cpupool.c
>>> +++ b/xen/common/cpupool.c
>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>> {
>>> case CPU_DOWN_FAILED:
>>> case CPU_ONLINE:
>>> + if ( system_state <= SYS_STATE_active )
>>> + per_cpu(cpupool, cpu) = NULL;
>>> rc = cpupool_cpu_add(cpu);
>>
>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>> which already is testing system_state?
>
> Hmm, this may be a matter of taste: I consider the change done here
> a prereq to calling the function in the first place. As said in the
> description, I actually think this should come earlier, and it's just that
> I can't see how to cleanly do so.
>
>> Modifying the condition in cpupool_cpu_add() to
>>
>> if ( system_state <= SYS_STATE_active )
>>
>> at the same time would have the benefit to catch problems in case
>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>
> You mean the if() there, not the else? If so - how would the "else"
> body then ever be reached? IOW if anything I could only see the
> "else" to become "else if ( system_state <= SYS_STATE_active )".
Bad wording on my side.
I should have written "the condition in cpupool_cpu_add() should match
if ( system_state <= SYS_STATE_active )."
So: "if ( system_state > SYS_STATE_active )", as the test is for the
other case.
Juergen
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
2018-07-16 11:47 ` Juergen Gross
@ 2018-07-16 12:19 ` Jan Beulich
[not found] ` <5B4C8D3702000078001D45EA@suse.com>
1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 12:19 UTC (permalink / raw)
To: Juergen Gross; +Cc: xen-devel, Dario Faggioli
>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
> On 16/07/18 11:17, Jan Beulich wrote:
>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>> While I've run into the issue with further patches in place which no
>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>> there.
>>>>
>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>> only for other than the suspend/resume case (which gets specially
>>>> handled in cpupool_cpu_remove()).
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>> cpupool_cpu_remove(), but besides that - as per above - likely
>>>> being too early, that function has further prereqs to be met. It
>>>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>> there.
>>>>
>>>> --- a/xen/common/cpupool.c
>>>> +++ b/xen/common/cpupool.c
>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>> {
>>>> case CPU_DOWN_FAILED:
>>>> case CPU_ONLINE:
>>>> + if ( system_state <= SYS_STATE_active )
>>>> + per_cpu(cpupool, cpu) = NULL;
>>>> rc = cpupool_cpu_add(cpu);
>>>
>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>> which already is testing system_state?
>>
>> Hmm, this may be a matter of taste: I consider the change done here
>> a prereq to calling the function in the first place. As said in the
>> description, I actually think this should come earlier, and it's just that
>> I can't see how to cleanly do so.
You didn't comment on this one at all, yet it matters for how a v2
is supposed to look like.
>>> Modifying the condition in cpupool_cpu_add() to
>>>
>>> if ( system_state <= SYS_STATE_active )
>>>
>>> at the same time would have the benefit to catch problems in case
>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>
>> You mean the if() there, not the else? If so - how would the "else"
>> body then ever be reached? IOW if anything I could only see the
>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>
> Bad wording on my side.
>
> I should have written "the condition in cpupool_cpu_add() should match
> if ( system_state <= SYS_STATE_active )."
>
> So: "if ( system_state > SYS_STATE_active )", as the test is for the
> other case.
I'd recommend against this, as someone adding a new SYS_STATE_*
past suspend/resume would quite likely miss this one. The strong
ordering of states imo should only be used for active and lower states.
But yes, I could see the if() there to become suspend || resume to
address the problem you describe.
Coming back to your DOWN_FAILED consideration: Why are you
thinking this can't happen during suspend? disable_nonboot_cpus()
uses plain cpu_down() after all.
Jan
[parent not found: <5B4C8D3702000078001D45EA@suse.com>]
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
[not found] ` <5B4C8D3702000078001D45EA@suse.com>
@ 2018-07-16 12:47 ` Juergen Gross
2018-07-16 13:01 ` Jan Beulich
[not found] ` <5B4C973D02000078001D4693@suse.com>
0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 12:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Dario Faggioli
On 16/07/18 14:19, Jan Beulich wrote:
>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>> While I've run into the issue with further patches in place which no
>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>> there.
>>>>>
>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>> only for other than the suspend/resume case (which gets specially
>>>>> handled in cpupool_cpu_remove()).
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> ---
>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>> cpupool_cpu_remove(), but besides that - as per above - likely
>>>>> being too early, that function has further prereqs to be met. It
>>>>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>> there.
>>>>>
>>>>> --- a/xen/common/cpupool.c
>>>>> +++ b/xen/common/cpupool.c
>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>> {
>>>>> case CPU_DOWN_FAILED:
>>>>> case CPU_ONLINE:
>>>>> + if ( system_state <= SYS_STATE_active )
>>>>> + per_cpu(cpupool, cpu) = NULL;
>>>>> rc = cpupool_cpu_add(cpu);
>>>>
>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>> which already is testing system_state?
>>>
>>> Hmm, this may be a matter of taste: I consider the change done here
>>> a prereq to calling the function in the first place. As said in the
>>> description, I actually think this should come earlier, and it's just that
>>> I can't see how to cleanly do so.
>
> You didn't comment on this one at all, yet it matters for how a v2
> is supposed to look like.
My comment was meant to address this question, too. cpupool_cpu_add()
is handling the special case of resuming explicitly, where the old cpu
assignment to a cpupool is kept. So I believe setting
per_cpu(cpupool, cpu) = NULL
in the else clause of cpupool_cpu_add() only is better.
>>>> Modifying the condition in cpupool_cpu_add() to
>>>>
>>>> if ( system_state <= SYS_STATE_active )
>>>>
>>>> at the same time would have the benefit to catch problems in case
>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>
>>> You mean the if() there, not the else? If so - how would the "else"
>>> body then ever be reached? IOW if anything I could only see the
>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>
>> Bad wording on my side.
>>
>> I should have written "the condition in cpupool_cpu_add() should match
>> if ( system_state <= SYS_STATE_active )."
>>
>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>> other case.
>
> I'd recommend against this, as someone adding a new SYS_STATE_*
> past suspend/resume would quite likely miss this one. The strong
> ordering of states imo should only be used for active and lower states.
> But yes, I could see the if() there to become suspend || resume to
> address the problem you describe.
Yes, this would seem to be a better choice here.
> Coming back to your DOWN_FAILED consideration: Why are you
> thinking this can't happen during suspend? disable_nonboot_cpus()
> uses plain cpu_down() after all.
Right.
DOWN_FAILED is used only once, namely in cpu_down() after the
CPU_DOWN_PREPARE step returned an error. And CPU_DOWN_PREPARE is only
used by the cpufreq driver, where it never returns an error, and by
cpupools themselves, which don't matter here: only some other component
failing at the CPU_DOWN_PREPARE step would lead to cpupool's
DOWN_FAILED handling being invoked.
Juergen
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
2018-07-16 12:47 ` Juergen Gross
@ 2018-07-16 13:01 ` Jan Beulich
[not found] ` <5B4C973D02000078001D4693@suse.com>
1 sibling, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2018-07-16 13:01 UTC (permalink / raw)
To: Juergen Gross; +Cc: xen-devel, Dario Faggioli
>>> On 16.07.18 at 14:47, <jgross@suse.com> wrote:
> On 16/07/18 14:19, Jan Beulich wrote:
>>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>>> While I've run into the issue with further patches in place which no
>>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>>> there.
>>>>>>
>>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>>> only for other than the suspend/resume case (which gets specially
>>>>>> handled in cpupool_cpu_remove()).
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> ---
>>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>> cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>> being too early, that function has further prereqs to be met. It
>>>>>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>> there.
>>>>>>
>>>>>> --- a/xen/common/cpupool.c
>>>>>> +++ b/xen/common/cpupool.c
>>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>> {
>>>>>> case CPU_DOWN_FAILED:
>>>>>> case CPU_ONLINE:
>>>>>> + if ( system_state <= SYS_STATE_active )
>>>>>> + per_cpu(cpupool, cpu) = NULL;
>>>>>> rc = cpupool_cpu_add(cpu);
>>>>>
>>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>>> which already is testing system_state?
>>>>
>>>> Hmm, this may be a matter of taste: I consider the change done here
>>>> a prereq to calling the function in the first place. As said in the
>>>> description, I actually think this should come earlier, and it's just that
>>>> I can't see how to cleanly do so.
>>
>> You didn't comment on this one at all, yet it matters for how a v2
>> is supposed to look like.
>
> My comment was thought to address this question, too. cpupool_cpu_add()
> is handling the special case of resuming explicitly, where the old cpu
> assignment to a cpupool is kept. So I believe setting
> per_cpu(cpupool, cpu) = NULL
> in the else clause of cpupool_cpu_add() only is better.
Well, okay then. You're the maintainer.
>>>>> Modifying the condition in cpupool_cpu_add() to
>>>>>
>>>>> if ( system_state <= SYS_STATE_active )
>>>>>
>>>>> at the same time would have the benefit to catch problems in case
>>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>>
>>>> You mean the if() there, not the else? If so - how would the "else"
>>>> body then ever be reached? IOW if anything I could only see the
>>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>>
>>> Bad wording on my side.
>>>
>>> I should have written "the condition in cpupool_cpu_add() should match
>>> if ( system_state <= SYS_STATE_active )."
>>>
>>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>>> other case.
>>
>> I'd recommend against this, as someone adding a new SYS_STATE_*
>> past suspend/resume would quite likely miss this one. The strong
>> ordering of states imo should only be used for active and lower states.
>> But yes, I could see the if() there to become suspend || resume to
>> address the problem you describe.
>
> Yes, this would seem to be a better choice here.
>
>> Coming back to your DOWN_FAILED consideration: Why are you
>> thinking this can't happen during suspend? disable_nonboot_cpus()
>> uses plain cpu_down() after all.
>
> Right.
>
> DOWN_FAILED is used only once, and that is in cpu_down() after the step
> CPU_DOWN_PREPARE returned an error. And CPU_DOWN_PREPARE is only used
> for cpufreq driver where it never returns an error, and for cpupools
> which don't matter here, as only other components failing at step
> CPU_DOWN_PREPARE would lead to calling cpupool/DOWN_FAILED.
What about the stop_machine_run() failure case?
Jan
[parent not found: <5B4C973D02000078001D4693@suse.com>]
* Re: [PATCH 1/8] cpupools: fix state when downing a CPU failed
[not found] ` <5B4C973D02000078001D4693@suse.com>
@ 2018-07-16 14:21 ` Juergen Gross
2018-07-16 14:26 ` Jan Beulich
[not found] ` <5B4CAB1202000078001D47BC@suse.com>
0 siblings, 2 replies; 39+ messages in thread
From: Juergen Gross @ 2018-07-16 14:21 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Dario Faggioli
On 16/07/18 15:01, Jan Beulich wrote:
>>>> On 16.07.18 at 14:47, <jgross@suse.com> wrote:
>> On 16/07/18 14:19, Jan Beulich wrote:
>>>>>> On 16.07.18 at 13:47, <jgross@suse.com> wrote:
>>>> On 16/07/18 11:17, Jan Beulich wrote:
>>>>>>>> On 13.07.18 at 11:02, <jgross@suse.com> wrote:
>>>>>> On 11/07/18 14:04, Jan Beulich wrote:
>>>>>>> While I've run into the issue with further patches in place which no
>>>>>>> longer guarantee the per-CPU area to start out as all zeros, the
>>>>>>> CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
>>>>>>> the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
>>>>>>> of schedule_cpu_switch() will trigger the "c != old_pool" assertion
>>>>>>> there.
>>>>>>>
>>>>>>> Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
>>>>>>> should not happen before cpu_disable_scheduler()). Clearing it in
>>>>>>> CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
>>>>>>> piece of code twice. Since the field's value shouldn't matter while the
>>>>>>> CPU is offline, simply clear it in CPU_ONLINE and CPU_DOWN_FAILED, but
>>>>>>> only for other than the suspend/resume case (which gets specially
>>>>>>> handled in cpupool_cpu_remove()).
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>> ---
>>>>>>> TBD: I think this would better call schedule_cpu_switch(cpu, NULL) from
>>>>>>> cpupool_cpu_remove(), but besides that - as per above - likely
>>>>>>> being too early, that function has further prereqs to be met. It
>>>>>>> also doesn't look as if cpupool_unassign_cpu_helper() could be used
>>>>>>> there.
>>>>>>>
>>>>>>> --- a/xen/common/cpupool.c
>>>>>>> +++ b/xen/common/cpupool.c
>>>>>>> @@ -778,6 +778,8 @@ static int cpu_callback(
>>>>>>> {
>>>>>>> case CPU_DOWN_FAILED:
>>>>>>> case CPU_ONLINE:
>>>>>>> + if ( system_state <= SYS_STATE_active )
>>>>>>> + per_cpu(cpupool, cpu) = NULL;
>>>>>>> rc = cpupool_cpu_add(cpu);
>>>>>>
>>>>>> Wouldn't it make more sense to clear the field in cpupool_cpu_add()
>>>>>> which already is testing system_state?
>>>>>
>>>>> Hmm, this may be a matter of taste: I consider the change done here
>>>>> a prereq to calling the function in the first place. As said in the
>>>>> description, I actually think this should come earlier, and it's just that
>>>>> I can't see how to cleanly do so.
>>>
>>> You didn't comment on this one at all, yet it matters for what a v2
>>> is supposed to look like.
>>
>> My comment was meant to address this question, too. cpupool_cpu_add()
>> is handling the special case of resuming explicitly, where the old cpu
>> assignment to a cpupool is kept. So I believe setting
>> per_cpu(cpupool, cpu) = NULL
>> only in the else clause of cpupool_cpu_add() is better.
>
> Well, okay then. You're the maintainer.
>
>>>>>> Modifying the condition in cpupool_cpu_add() to
>>>>>>
>>>>>> if ( system_state <= SYS_STATE_active )
>>>>>>
>>>>>> at the same time would have the benefit to catch problems in case
>>>>>> suspending cpus is failing during SYS_STATE_suspend (I'd expect
>>>>>> triggering the first ASSERT in schedule_cpu_switch() in this case).
>>>>>
>>>>> You mean the if() there, not the else? If so - how would the "else"
>>>>> body then ever be reached? IOW if anything I could only see the
>>>>> "else" to become "else if ( system_state <= SYS_STATE_active )".
>>>>
>>>> Bad wording on my side.
>>>>
>>>> I should have written "the condition in cpupool_cpu_add() should match
>>>> if ( system_state <= SYS_STATE_active )."
>>>>
>>>> So: "if ( system_state > SYS_STATE_active )", as the test is for the
>>>> other case.
>>>
>>> I'd recommend against this, as someone adding a new SYS_STATE_*
>>> past suspend/resume would quite likely miss this one. The strong
>>> ordering of states imo should only be used for active and lower states.
>>> But yes, I could see the if() there to become suspend || resume to
>>> address the problem you describe.
>>
>> Yes, this would seem to be a better choice here.
>>
>>> Coming back to your DOWN_FAILED consideration: Why are you
>>> thinking this can't happen during suspend? disable_nonboot_cpus()
>>> uses plain cpu_down() after all.
>>
>> Right.
>>
>> DOWN_FAILED is used only once, and that is in cpu_down() after the step
>> CPU_DOWN_PREPARE returned an error. And CPU_DOWN_PREPARE is only used
>> for cpufreq driver where it never returns an error, and for cpupools
>> which don't matter here, as only other components failing at step
>> CPU_DOWN_PREPARE would lead to calling cpupool/DOWN_FAILED.
>
> What about the stop_machine_run() failure case?
Oh. No idea how I missed that.
So maybe changing the condition in cpupool_cpu_add() should be split out
into a patch of its own in order to be able to backport it?
Juergen