* [PATCH 0/5] x86: more power-efficient CPU parking @ 2018-08-01 14:22 Jan Beulich 2018-08-01 14:31 ` [PATCH 1/5] x86/cpuidle: replace a pointless NULL check Jan Beulich ` (6 more replies) 0 siblings, 7 replies; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:22 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper When putting CPUs to sleep permanently, we should try to put them into the most power conserving state possible. For now it is unclear whether, especially in a deep C-state, the P-state also matters, so this series only arranges for the C-state side of things (plus some cleanup). 1: x86/cpuidle: replace a pointless NULL check 2: x86/idle: re-arrange dead-idle handling 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible 4: x86/cpuidle: clean up Cx dumping 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/5] x86/cpuidle: replace a pointless NULL check 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich @ 2018-08-01 14:31 ` Jan Beulich 2018-08-01 14:33 ` Andrew Cooper 2018-08-01 14:31 ` [PATCH 2/5] x86/idle: re-arrange dead-idle handling Jan Beulich ` (5 subsequent siblings) 6 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:31 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper The address of an array slot can't be NULL. Instead add a bounds check to make sure the array indexing is valid (the check is against 2 since slot zero of the array - corresponding to C0 - is of no interest here). Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -759,11 +759,11 @@ void acpi_dead_idle(void) struct acpi_processor_power *power; struct acpi_processor_cx *cx; - if ( (power = processor_powers[smp_processor_id()]) == NULL ) + if ( (power = processor_powers[smp_processor_id()]) == NULL || + power->count < 2 ) goto default_halt; - if ( (cx = &power->states[power->count-1]) == NULL ) - goto default_halt; + cx = &power->states[power->count - 1]; if ( cx->entry_method == ACPI_CSTATE_EM_FFH ) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/5] x86/cpuidle: replace a pointless NULL check 2018-08-01 14:31 ` [PATCH 1/5] x86/cpuidle: replace a pointless NULL check Jan Beulich @ 2018-08-01 14:33 ` Andrew Cooper 2018-08-01 15:12 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2018-08-01 14:33 UTC (permalink / raw) To: Jan Beulich, xen-devel On 01/08/18 15:31, Jan Beulich wrote: > The address of an array slot can't be NULL. Instead add a bounds check > to make sure the array indexing is valid (the check is against 2 since > slot zero of the array - corresponding to C0 - is of no interest here). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Wouldn't an expression involving ARRAY_SIZE() be more appropriate? Either way, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/5] x86/cpuidle: replace a pointless NULL check 2018-08-01 14:33 ` Andrew Cooper @ 2018-08-01 15:12 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-08-01 15:12 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 01.08.18 at 16:33, <andrew.cooper3@citrix.com> wrote: > On 01/08/18 15:31, Jan Beulich wrote: >> The address of an array slot can't be NULL. Instead add a bounds check >> to make sure the array indexing is valid (the check is against 2 since >> slot zero of the array - corresponding to C0 - is of no interest here). >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > Wouldn't an expression involving ARRAY_SIZE() be more appropriate? No, as this is a lower bounds check; ->count will never exceed the upper bound. > Either way, Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich 2018-08-01 14:31 ` [PATCH 1/5] x86/cpuidle: replace a pointless NULL check Jan Beulich @ 2018-08-01 14:31 ` Jan Beulich 2018-09-07 17:08 ` Andrew Cooper 2018-08-01 14:32 ` [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich ` (4 subsequent siblings) 6 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:31 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper In order to be able to wake parked CPUs from default_dead_idle(), the function must not itself loop. Move the loop into play_dead(), and use play_dead() as well on the AP boot error path. Furthermore, not the least considering the comment in play_dead(), make sure NMI raised (for now this would be a bug elsewhere, but that's about to change) against a parked or fully offline CPU won't invoke the actual, full-blown NMI handler. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -98,14 +98,19 @@ void default_dead_idle(void) */ spec_ctrl_enter_idle(get_cpu_info()); wbinvd(); - for ( ; ; ) - halt(); + halt(); } -static void play_dead(void) +void play_dead(void) { + unsigned int cpu = smp_processor_id(); + local_irq_disable(); + /* Change the NMI handler to a nop (see comment below). */ + _set_gate_lower(&idt_tables[cpu][TRAP_nmi], SYS_DESC_irq_gate, 0, + &trap_nop); + /* * NOTE: After cpu_exit_clear, per-cpu variables may no longer accessible, * as they may be freed at any time if offline CPUs don't get parked. In @@ -116,9 +121,10 @@ static void play_dead(void) * Consider very carefully when adding code to *dead_idle. Most hypervisor * subsystems are unsafe to call. */ - cpu_exit_clear(smp_processor_id()); + cpu_exit_clear(cpu); - (*dead_idle)(); + for ( ; ; ) + dead_idle(); } static void idle_loop(void) --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -33,6 +33,7 @@ #include <xen/serial.h> #include <xen/numa.h> #include <xen/cpu.h> +#include <asm/cpuidle.h> #include <asm/current.h> #include <asm/mc146818rtc.h> #include <asm/desc.h> @@ -209,8 +210,7 @@ static void smp_callin(void) halt: clear_local_APIC(); spin_debug_enable(); - cpu_exit_clear(cpu); - (*dead_idle)(); + play_dead(); } /* Allow the master to continue. */ --- a/xen/include/asm-x86/cpuidle.h +++ b/xen/include/asm-x86/cpuidle.h @@ -20,6 +20,7 @@ int mwait_idle_init(struct notifier_bloc int cpuidle_init_cpu(unsigned int cpu); void default_dead_idle(void); void acpi_dead_idle(void); +void play_dead(void); void trace_exit_reason(u32 *irq_traced); void update_idle_stats(struct acpi_processor_power *, struct acpi_processor_cx *, uint64_t, uint64_t); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-08-01 14:31 ` [PATCH 2/5] x86/idle: re-arrange dead-idle handling Jan Beulich @ 2018-09-07 17:08 ` Andrew Cooper 2018-09-10 10:13 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2018-09-07 17:08 UTC (permalink / raw) To: Jan Beulich, xen-devel On 01/08/18 15:31, Jan Beulich wrote: > In order to be able to wake parked CPUs from default_dead_idle(), the > function must not itself loop. Move the loop into play_dead(), and use > play_dead() as well on the AP boot error path. > > Furthermore, not the least considering the comment in play_dead(), > make sure NMI raised (for now this would be a bug elsewhere, but that's > about to change) against a parked or fully offline CPU won't invoke the > actual, full-blown NMI handler. I'm concerned by this. It isn't necessarily a bug elsewhere, because the CPU can still participate in SMM, and still have the SMM handler raise an NMI. Equally, it may still be able to service #MC's, so I can't see how it is safe for us to ever free the percpu data. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-09-07 17:08 ` Andrew Cooper @ 2018-09-10 10:13 ` Jan Beulich 2018-10-26 10:55 ` Ping: " Jan Beulich 2018-12-05 20:33 ` Andrew Cooper 0 siblings, 2 replies; 33+ messages in thread From: Jan Beulich @ 2018-09-10 10:13 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 07.09.18 at 19:08, <andrew.cooper3@citrix.com> wrote: > On 01/08/18 15:31, Jan Beulich wrote: >> In order to be able to wake parked CPUs from default_dead_idle(), the >> function must not itself loop. Move the loop into play_dead(), and use >> play_dead() as well on the AP boot error path. >> >> Furthermore, not the least considering the comment in play_dead(), >> make sure NMI raised (for now this would be a bug elsewhere, but that's >> about to change) against a parked or fully offline CPU won't invoke the >> actual, full-blown NMI handler. > > I'm concerned by this. It isn't necessarily a bug elsewhere, because > the CPU can still participate in SMM, and still have the SMM handler > raise an NMI. What significance does an NMI have when raised towards an offline CPU? If SMM does this, then I'd very much guess this is happening in a broadcast fashion (otherwise they'd surely raised it against CPU 0), and in that case NOP-ing the effect for parked CPUs seems quite appropriate to me. > Equally, it may still be able to service #MC's, so I can't see how it is > safe for us to ever free the percpu data. I'm having trouble seeing how this remark relates to the series here. Plus it's a theoretical problem at present only anyway: - physical hot remove is not implemented (there's no source of the new CPU_REMOVE notification), - Intel CPUs get parked, i.e. never have their per-CPU data freed, - AMD CPUs don't broadcast #MC. Bottom line for both parts of your reply: I don't see any proposal / request of what you think needs changing, and how. Unless, perhaps, you're suggesting the entire series is rubbish, in which case I'd like you to propose an alternative approach to address the problem of parking CPUs in C1 instead of deepest possible C-states. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Ping: Re: [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-09-10 10:13 ` Jan Beulich @ 2018-10-26 10:55 ` Jan Beulich 2018-12-05 20:33 ` Andrew Cooper 1 sibling, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-10-26 10:55 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 10.09.18 at 12:13, wrote: >>>> On 07.09.18 at 19:08, <andrew.cooper3@citrix.com> wrote: > > On 01/08/18 15:31, Jan Beulich wrote: > >> In order to be able to wake parked CPUs from default_dead_idle(), the > >> function must not itself loop. Move the loop into play_dead(), and use > >> play_dead() as well on the AP boot error path. > >> > >> Furthermore, not the least considering the comment in play_dead(), > >> make sure NMI raised (for now this would be a bug elsewhere, but that's > >> about to change) against a parked or fully offline CPU won't invoke the > >> actual, full-blown NMI handler. > > > > I'm concerned by this. It isn't necessarily a bug elsewhere, because > > the CPU can still participate in SMM, and still have the SMM handler > > raise an NMI. > > What significance does an NMI have when raised towards an offline > CPU? If SMM does this, then I'd very much guess this is happening in > a broadcast fashion (otherwise they'd surely raised it against CPU 0), > and in that case NOP-ing the effect for parked CPUs seems quite > appropriate to me. > > > Equally, it may still be able to service #MC's, so I can't see how it is > > safe for us to ever free the percpu data. > > I'm having trouble seeing how this remark relates to the series here. > Plus it's a theoretical problem at present only anyway: > - physical hot remove is not implemented (there's no source of the > new CPU_REMOVE notification), > - Intel CPUs get parked, i.e. never have their per-CPU data freed, > - AMD CPUs don't broadcast #MC. > > Bottom line for both parts of your reply: I don't see any proposal / > request of what you think needs changing, and how. Unless, > perhaps, you're suggesting the entire series is rubbish, in which > case I'd like you to propose an alternative approach to address > the problem of parking CPUs in C1 instead of deepest possible > C-states. > > Jan > > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-09-10 10:13 ` Jan Beulich 2018-10-26 10:55 ` Ping: " Jan Beulich @ 2018-12-05 20:33 ` Andrew Cooper 2018-12-06 8:16 ` Jan Beulich 1 sibling, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2018-12-05 20:33 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel On 10/09/2018 11:13, Jan Beulich wrote: > >> Equally, it may still be able to service #MC's, so I can't see how it is >> safe for us to ever free the percpu data. > I'm having trouble seeing how this remark relates to the series here. Because you've tried to make NMIs safe, but not made equivalent adjustments to the #MC side of things. > Plus it's a theoretical problem at present only anyway: > - physical hot remove is not implemented (there's no source of the > new CPU_REMOVE notification), > - Intel CPUs get parked, i.e. never have their per-CPU data freed, > - AMD CPUs don't broadcast #MC. Ignoring MCE's is never an option, but whenever CR4.MCE is set, we must be prepared to handle #MC. Just because an AMD CPU is playing dead doesn't mean it is immune to receiving #MC's. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/5] x86/idle: re-arrange dead-idle handling 2018-12-05 20:33 ` Andrew Cooper @ 2018-12-06 8:16 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-12-06 8:16 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 05.12.18 at 21:33, <andrew.cooper3@citrix.com> wrote: > On 10/09/2018 11:13, Jan Beulich wrote: >> >>> Equally, it may still be able to service #MC's, so I can't see how it is >>> safe for us to ever free the percpu data. >> I'm having trouble seeing how this remark relates to the series here. > > Because you've tried to make NMIs safe, but not made equivalent > adjustments to the #MC side of things. Explain to me how this is getting worse with the patch in question. It doesn't alter under what conditions per-CPU data gets freed. Of course I can short-circuit the #MC handler just like I do for the NMI one, but that's only going to delay shutdown of the core until a second #MC surfaces (as the first one would never get dealt with). >> Plus it's a theoretical problem at present only anyway: >> - physical hot remove is not implemented (there's no source of the >> new CPU_REMOVE notification), >> - Intel CPUs get parked, i.e. never have their per-CPU data freed, >> - AMD CPUs don't broadcast #MC. > > Ignoring MCE's is never an option, but whenever CR4.MCE is set, we must > be prepared to handle #MC. Just because an AMD CPU is playing dead > doesn't mean it is immune to receiving #MC's. Again - this is nothing the patch here changes in any way. It's not clear to me whether clearing CR4.MCE is an option on AMD CPUs. Anything beyond wiring #MC into trap_nop() (please let me know if that's what you want to see added) should imo not be part of this patch, and again imo doesn't even have to be part of this series. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich 2018-08-01 14:31 ` [PATCH 1/5] x86/cpuidle: replace a pointless NULL check Jan Beulich 2018-08-01 14:31 ` [PATCH 2/5] x86/idle: re-arrange dead-idle handling Jan Beulich @ 2018-08-01 14:32 ` Jan Beulich 2018-10-26 10:56 ` Ping: " Jan Beulich 2018-08-01 14:33 ` [PATCH 4/5] x86/cpuidle: clean up Cx dumping Jan Beulich ` (3 subsequent siblings) 6 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:32 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper When the mwait-idle driver isn't used, C-state information becomes available only in the course of Dom0 starting up. Use the provided data to allow parked CPUs to sleep in a more energy efficient way, by waking them briefly (via NMI) once the data has been recorded. This involves re-arranging how/when the governor's ->enable() hook gets invoked. The changes there include addition of so far missing error handling in the respective CPU notifier handlers. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -351,12 +351,22 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); - for_each_online_cpu ( cpu ) - if (processor_powers[cpu]) - { - print_acpi_power(cpu, processor_powers[cpu]); - process_pending_softirqs(); - } + for_each_present_cpu ( cpu ) + { + struct acpi_processor_power *power = processor_powers[cpu]; + + if ( !power ) + continue; + + if ( cpu_online(cpu) ) + print_acpi_power(cpu, power); + else if ( park_offline_cpus ) + printk("CPU%u parked in state %u (C%u)\n", cpu, + power->last_state ? power->last_state->idx : 1, + power->last_state ? power->last_state->type : 1); + + process_pending_softirqs(); + } } static int __init cpu_idle_key_init(void) @@ -764,6 +774,7 @@ void acpi_dead_idle(void) goto default_halt; cx = &power->states[power->count - 1]; + power->last_state = cx; if ( cx->entry_method == ACPI_CSTATE_EM_FFH ) { @@ -1216,9 +1227,30 @@ long set_cx_pminfo(uint32_t acpi_id, str set_cx(acpi_power, &xen_cx); } - if ( cpuidle_current_governor->enable && - cpuidle_current_governor->enable(acpi_power) ) - return -EFAULT; + if ( !cpu_online(cpu_id) ) + { + uint32_t apic_id = x86_cpu_to_apicid[cpu_id]; + + /* + * If we've just learned of more available C states, wake the CPU if + * it's parked, so it can go back to sleep in perhaps a deeper state. + */ + if ( park_offline_cpus && apic_id != BAD_APICID ) + { + unsigned long flags; + + local_irq_save(flags); + apic_wait_icr_idle(); + apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id); + local_irq_restore(flags); + } + } + else if ( cpuidle_current_governor->enable ) + { + ret = cpuidle_current_governor->enable(acpi_power); + if ( ret < 0 ) + return ret; + } /* FIXME: C-state dependency is not supported by far */ @@ -1378,19 +1410,22 @@ static int cpu_callback( struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; + int rc = 0; - /* Only hook on CPU_ONLINE because a dead cpu may utilize the info to - * to enter deep C-state */ + /* + * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info + * to enter deep C-state. + */ switch ( action ) { - case CPU_ONLINE: - (void)cpuidle_init_cpu(cpu); - break; - default: + case CPU_UP_PREPARE: + rc = cpuidle_init_cpu(cpu); + if ( !rc && cpuidle_current_governor->enable ) + rc = cpuidle_current_governor->enable(processor_powers[cpu]); break; } - return NOTIFY_DONE; + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); } static struct notifier_block cpu_nfb = { @@ -1405,6 +1440,7 @@ static int __init cpuidle_presmp_init(vo return 0; mwait_idle_init(&cpu_nfb); + cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, cpu); cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, cpu); register_cpu_notifier(&cpu_nfb); return 0; --- a/xen/arch/x86/acpi/cpuidle_menu.c +++ b/xen/arch/x86/acpi/cpuidle_menu.c @@ -277,9 +277,6 @@ static void menu_reflect(struct acpi_pro static int menu_enable_device(struct acpi_processor_power *power) { - if (!cpu_online(power->cpu)) - return -1; - memset(&per_cpu(menu_devices, power->cpu), 0, sizeof(struct menu_device)); return 0; --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -1162,12 +1162,17 @@ static int mwait_idle_cpu_init(struct no struct acpi_processor_power *dev = processor_powers[cpu]; switch (action) { + int rc; + default: return NOTIFY_DONE; case CPU_UP_PREPARE: - cpuidle_init_cpu(cpu); - return NOTIFY_DONE; + rc = cpuidle_init_cpu(cpu); + dev = processor_powers[cpu]; + if (!rc && cpuidle_current_governor->enable) + rc = cpuidle_current_governor->enable(dev); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); case CPU_ONLINE: if (!dev) @@ -1256,8 +1261,6 @@ int __init mwait_idle_init(struct notifi } if (!err) { nfb->notifier_call = mwait_idle_cpu_init; - mwait_idle_cpu_init(nfb, CPU_UP_PREPARE, NULL); - pm_idle_save = pm_idle; pm_idle = mwait_idle; dead_idle = acpi_dead_idle; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Ping: [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible 2018-08-01 14:32 ` [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich @ 2018-10-26 10:56 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-10-26 10:56 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 01.08.18 at 16:32, wrote: > When the mwait-idle driver isn't used, C-state information becomes > available only in the course of Dom0 starting up. Use the provided data > to allow parked CPUs to sleep in a more energy efficient way, by waking > them briefly (via NMI) once the data has been recorded. > > This involves re-arranging how/when the governor's ->enable() hook gets > invoked. The changes there include addition of so far missing error > handling in the respective CPU notifier handlers. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/arch/x86/acpi/cpu_idle.c > +++ b/xen/arch/x86/acpi/cpu_idle.c > @@ -351,12 +351,22 @@ static void dump_cx(unsigned char key) > unsigned int cpu; > > printk("'%c' pressed -> printing ACPI Cx structures\n", key); > - for_each_online_cpu ( cpu ) > - if (processor_powers[cpu]) > - { > - print_acpi_power(cpu, processor_powers[cpu]); > - process_pending_softirqs(); > - } > + for_each_present_cpu ( cpu ) > + { > + struct acpi_processor_power *power = processor_powers[cpu]; > + > + if ( !power ) > + continue; > + > + if ( cpu_online(cpu) ) > + print_acpi_power(cpu, power); > + else if ( park_offline_cpus ) > + printk("CPU%u parked in state %u (C%u)\n", cpu, > + power->last_state ? power->last_state->idx : 1, > + power->last_state ? power->last_state->type : 1); > + > + process_pending_softirqs(); > + } > } > > static int __init cpu_idle_key_init(void) > @@ -764,6 +774,7 @@ void acpi_dead_idle(void) > goto default_halt; > > cx = &power->states[power->count - 1]; > + power->last_state = cx; > > if ( cx->entry_method == ACPI_CSTATE_EM_FFH ) > { > @@ -1216,9 +1227,30 @@ long set_cx_pminfo(uint32_t acpi_id, str > set_cx(acpi_power, &xen_cx); > } > > - if ( cpuidle_current_governor->enable && > - cpuidle_current_governor->enable(acpi_power) ) > - return -EFAULT; > + if ( !cpu_online(cpu_id) ) > + { > + uint32_t apic_id = x86_cpu_to_apicid[cpu_id]; > + > + /* > + * If we've just learned of more available C states, wake the CPU > if > + * it's parked, so it can go back to sleep in perhaps a deeper > state. > + */ > + if ( park_offline_cpus && apic_id != BAD_APICID ) > + { > + unsigned long flags; > + > + local_irq_save(flags); > + apic_wait_icr_idle(); > + apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id); > + local_irq_restore(flags); > + } > + } > + else if ( cpuidle_current_governor->enable ) > + { > + ret = cpuidle_current_governor->enable(acpi_power); > + if ( ret < 0 ) > + return ret; > + } > > /* FIXME: C-state dependency is not supported by far */ > > @@ -1378,19 +1410,22 @@ static int cpu_callback( > struct notifier_block *nfb, unsigned long action, void *hcpu) > { > unsigned int cpu = (unsigned long)hcpu; > + int rc = 0; > > - /* Only hook on CPU_ONLINE because a dead cpu may utilize the info to > - * to enter deep C-state */ > + /* > + * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info > + * to enter deep C-state. > + */ > switch ( action ) > { > - case CPU_ONLINE: > - (void)cpuidle_init_cpu(cpu); > - break; > - default: > + case CPU_UP_PREPARE: > + rc = cpuidle_init_cpu(cpu); > + if ( !rc && cpuidle_current_governor->enable ) > + rc = cpuidle_current_governor->enable(processor_powers[cpu]); > break; > } > > - return NOTIFY_DONE; > + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); > } > > static struct notifier_block cpu_nfb = { > @@ -1405,6 +1440,7 @@ static int __init cpuidle_presmp_init(vo > return 0; > > mwait_idle_init(&cpu_nfb); > + cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, cpu); > cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, cpu); > register_cpu_notifier(&cpu_nfb); > return 0; > --- a/xen/arch/x86/acpi/cpuidle_menu.c > +++ b/xen/arch/x86/acpi/cpuidle_menu.c > @@ -277,9 +277,6 @@ static void menu_reflect(struct acpi_pro > > static int menu_enable_device(struct acpi_processor_power *power) > { > - if (!cpu_online(power->cpu)) > - return -1; > - > memset(&per_cpu(menu_devices, power->cpu), 0, sizeof(struct > menu_device)); > > return 0; > --- a/xen/arch/x86/cpu/mwait-idle.c > +++ b/xen/arch/x86/cpu/mwait-idle.c > @@ -1162,12 +1162,17 @@ static int mwait_idle_cpu_init(struct no > struct acpi_processor_power *dev = processor_powers[cpu]; > > switch (action) { > + int rc; > + > default: > return NOTIFY_DONE; > > case CPU_UP_PREPARE: > - cpuidle_init_cpu(cpu); > - return NOTIFY_DONE; > + rc = cpuidle_init_cpu(cpu); > + dev = processor_powers[cpu]; > + if (!rc && cpuidle_current_governor->enable) > + rc = cpuidle_current_governor->enable(dev); > + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); > > case CPU_ONLINE: > if (!dev) > @@ -1256,8 +1261,6 @@ int __init mwait_idle_init(struct notifi > } > if (!err) { > nfb->notifier_call = mwait_idle_cpu_init; > - mwait_idle_cpu_init(nfb, CPU_UP_PREPARE, NULL); > - > pm_idle_save = pm_idle; > pm_idle = mwait_idle; > dead_idle = acpi_dead_idle; > > > > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 4/5] x86/cpuidle: clean up Cx dumping 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich ` (2 preceding siblings ...) 2018-08-01 14:32 ` [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich @ 2018-08-01 14:33 ` Jan Beulich 2018-08-01 14:40 ` Andrew Cooper 2018-08-01 14:33 ` [PATCH 5/5] x86: place non-parked CPUs into wait-for-SIPI state after offlining Jan Beulich ` (2 subsequent siblings) 6 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:33 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper Don't log the same global information once per CPU. Don't log the same information (here: the currently active state) twice. Don't prefix decimal numbers with zeros (giving the impression they're octal). Use format specifiers matching the type of the corresponding expressions. Don't split printk()-s without intervening new-lines. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -304,9 +304,6 @@ static void print_acpi_power(uint32_t cp printk("==cpu%d==\n", cpu); last_state_idx = power->last_state ? power->last_state->idx : -1; - printk("active state:\t\tC%d\n", last_state_idx); - printk("max_cstate:\t\tC%d\n", max_cstate); - printk("states:\n"); spin_lock_irq(&power->stat_lock); current_tick = cpuidle_get_tick(); @@ -331,16 +328,14 @@ static void print_acpi_power(uint32_t cp idle_usage += usage[i]; idle_res += tick_to_ns(res_tick[i]); - printk((last_state_idx == i) ? " *" : " "); - printk("C%d:\t", i); - printk("type[C%d] ", power->states[i].type); - printk("latency[%03d] ", power->states[i].latency); - printk("usage[%08"PRIu64"] ", usage[i]); - printk("method[%5s] ", acpi_cstate_method_name[power->states[i].entry_method]); - printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i])); + printk(" %cC%u:\ttype[C%d] latency[%3u] usage[%8"PRIu64"] method[%5s] duration[%"PRIu64"]\n", + (last_state_idx == i) ? '*' : ' ', i, + power->states[i].type, power->states[i].latency, usage[i], + acpi_cstate_method_name[power->states[i].entry_method], + tick_to_ns(res_tick[i])); } - printk((last_state_idx == 0) ? " *" : " "); - printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n", + printk(" %cC0:\tusage[%8"PRIu64"] duration[%"PRIu64"]\n", + (last_state_idx == 0) ? '*' : ' ', usage[0] + idle_usage, current_stime - idle_res); print_hw_residencies(cpu); @@ -351,6 +346,7 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); + printk("max cstate: C%u\n", max_cstate); for_each_present_cpu ( cpu ) { struct acpi_processor_power *power = processor_powers[cpu]; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 4/5] x86/cpuidle: clean up Cx dumping 2018-08-01 14:33 ` [PATCH 4/5] x86/cpuidle: clean up Cx dumping Jan Beulich @ 2018-08-01 14:40 ` Andrew Cooper 0 siblings, 0 replies; 33+ messages in thread From: Andrew Cooper @ 2018-08-01 14:40 UTC (permalink / raw) To: Jan Beulich, xen-devel On 01/08/18 15:33, Jan Beulich wrote: > Don't log the same global information once per CPU. Don't log the same > information (here: the currently active state) twice. Don't prefix > decimal numbers with zeros (giving the impression they're octal). Use > format specifiers matching the type of the corresponding expressions. > Don't split printk()-s without intervening new-lines. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 5/5] x86: place non-parked CPUs into wait-for-SIPI state after offlining 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich ` (3 preceding siblings ...) 2018-08-01 14:33 ` [PATCH 4/5] x86/cpuidle: clean up Cx dumping Jan Beulich @ 2018-08-01 14:33 ` Jan Beulich 2018-08-29 7:08 ` Ping: [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich [not found] ` <5B61C21202000000000FC1F1@prv1-mh.provo.novell.com> 6 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-08-01 14:33 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper This is presumably more power efficient than keeping them in a HLT loop, and I supposed also the state in which they're being handed off by firmware. Split off from wakeup_secondary_cpu() the code to assert/deassert INIT (and in turn also recurring wait-for-send-completion code), and re-use it from __cpu_die(). Take the opportunity and add the previously missing apic_wait_icr_idle() at the start of the sequence. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -415,10 +415,25 @@ void start_secondary(void *unused) extern void *stack_start; -static int wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip) +static unsigned int wait_send(void) { - unsigned long send_status = 0, accept_status = 0; - int maxlvt, timeout, i; + unsigned int send_status, timeout = 0; + + Dprintk("Waiting for send to finish...\n"); + do { + Dprintk("+"); + udelay(100); + send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; + } while ( send_status && (timeout++ < 1000) ); + + return send_status; +} + +static unsigned int init_secondary_cpu(unsigned int phys_apicid) +{ + unsigned int send_status = 0; + + apic_wait_icr_idle(); /* * Be paranoid about clearing APIC errors. @@ -436,13 +451,7 @@ static int wakeup_secondary_cpu(int phys if ( !x2apic_enabled ) { - Dprintk("Waiting for send to finish...\n"); - timeout = 0; - do { - Dprintk("+"); - udelay(100); - send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; - } while ( send_status && (timeout++ < 1000) ); + send_status = wait_send(); mdelay(10); @@ -450,13 +459,7 @@ static int wakeup_secondary_cpu(int phys apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid); - Dprintk("Waiting for send to finish...\n"); - timeout = 0; - do { - Dprintk("+"); - udelay(100); - send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; - } while ( send_status && (timeout++ < 1000) ); + send_status = wait_send(); } else if ( tboot_in_measured_env() ) { @@ -471,7 +474,16 @@ static int wakeup_secondary_cpu(int phys udelay(10); } - maxlvt = get_maxlvt(); + return send_status; +} + +static unsigned int wakeup_secondary_cpu(unsigned int phys_apicid, + unsigned int start_eip) +{ + unsigned int send_status, accept_status = 0; + unsigned int maxlvt = get_maxlvt(), i; + + send_status = init_secondary_cpu(phys_apicid); for ( i = 0; i < 2; i++ ) { @@ -491,15 +503,9 @@ static int wakeup_secondary_cpu(int phys /* Give the other CPU some time to accept the IPI. */ udelay(300); - Dprintk("Startup point 1.\n"); + Dprintk("Startup point %u.\n", i + 1); - Dprintk("Waiting for send to finish...\n"); - timeout = 0; - do { - Dprintk("+"); - udelay(100); - send_status = apic_read(APIC_ICR) & APIC_ICR_BUSY; - } while ( send_status && (timeout++ < 1000) ); + send_status = wait_send(); /* Give the other CPU some time to accept the IPI. */ udelay(200); @@ -519,7 +525,7 @@ static int wakeup_secondary_cpu(int phys if ( send_status ) printk("APIC never delivered???\n"); if ( accept_status ) - printk("APIC delivery error (%lx).\n", accept_status); + printk("APIC delivery error (%x).\n", accept_status); return (send_status | accept_status); } @@ -1238,6 +1244,9 @@ void __cpu_die(unsigned int cpu) if ( (++i % 10) == 0 ) printk(KERN_ERR "CPU %u still not dead...\n", cpu); } + + if ( !park_offline_cpus ) + init_secondary_cpu(x86_cpu_to_apicid[cpu]); } int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich ` (4 preceding siblings ...) 2018-08-01 14:33 ` [PATCH 5/5] x86: place non-parked CPUs into wait-for-SIPI state after offlining Jan Beulich @ 2018-08-29 7:08 ` Jan Beulich 2018-08-29 17:01 ` Andrew Cooper [not found] ` <5B61C21202000000000FC1F1@prv1-mh.provo.novell.com> 6 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2018-08-29 7:08 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel >>> On 01.08.18 at 16:22, wrote: > When putting CPUs to sleep permanently, we should try to put them into > the most power conserving state possible. For now it is unclear whether, > especially in a deep C-state, the P-state also matters, so this series only > arranges for the C-state side of things (plus some cleanup). > > 1: x86/cpuidle: replace a pointless NULL check > 2: x86/idle: re-arrange dead-idle handling > 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible > 4: x86/cpuidle: clean up Cx dumping > 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining > > Jan > > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2018-08-29 7:08 ` Ping: [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich @ 2018-08-29 17:01 ` Andrew Cooper 2018-08-30 7:29 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2018-08-29 17:01 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel On 29/08/18 08:08, Jan Beulich wrote: >>>> On 01.08.18 at 16:22, wrote: >> When putting CPUs to sleep permanently, we should try to put them into >> the most power conserving state possible. For now it is unclear whether, >> especially in a deep C-state, the P-state also matters, so this series only >> arranges for the C-state side of things (plus some cleanup). >> >> 1: x86/cpuidle: replace a pointless NULL check >> 2: x86/idle: re-arrange dead-idle handling >> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >> 4: x86/cpuidle: clean up Cx dumping >> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining I don't have a problem in principle, but I'm afraid that you're going to need either positive documentation, or a positive statement from Intel and AMD that wait-for-SIPI is the most power efficient state to park a CPU in. At a guess, I'd expect that wait-for-SIPI isn't an optimised state (because the vendors wouldn't expect hardware to be in this state after boot), and from the various discussions around L1TF, there are definitely differences between wait-for-SIPI and mwait. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2018-08-29 17:01 ` Andrew Cooper @ 2018-08-30 7:29 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2018-08-30 7:29 UTC (permalink / raw) To: Brian Woods, Suravee Suthikulpanit, Andrew Cooper Cc: xen-devel, Kevin Tian, Jun Nakajima >>> On 29.08.18 at 19:01, <andrew.cooper3@citrix.com> wrote: > On 29/08/18 08:08, Jan Beulich wrote: >>>>> On 01.08.18 at 16:22, wrote: >>> When putting CPUs to sleep permanently, we should try to put them into >>> the most power conserving state possible. For now it is unclear whether, >>> especially in a deep C-state, the P-state also matters, so this series only >>> arranges for the C-state side of things (plus some cleanup). >>> >>> 1: x86/cpuidle: replace a pointless NULL check >>> 2: x86/idle: re-arrange dead-idle handling >>> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >>> 4: x86/cpuidle: clean up Cx dumping >>> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining > > I don't have a problem in principle, but I'm afraid that you're going to > need either positive documentation, or a positive statement from Intel > and AMD that wait-for-SIPI is the most power efficient state to park a > CPU in. > > At a guess, I'd expect that wait-for-SIPI isn't an optimised state > (because the vendors wouldn't expect hardware to be in this state after > boot), and from the various discussions around L1TF, there are > definitely differences between wait-for-SIPI and mwait. Well, that's a comment on patch 5 alone, which for that very reason I've put last in the series. Do I imply your response to mean patches 2 and 3 can have your ack (4 already has and 1 has already gone in)? Also you're comparing to MWAIT, but such CPUs would more likely get put in a loop over HLT (as patch 5's description also says). That's (part of) the background of why I'd like AMD to provide data to enable mwait-idle for their CPUs. As to the actual question raised: AMD (first and foremost), Intel? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <5B61C21202000000000FC1F1@prv1-mh.provo.novell.com>]
[parent not found: <5B61C21202000078001F8805@prv1-mh.provo.novell.com>]
[parent not found: <5B61C21202000000000FC6BD@prv1-mh.provo.novell.com>]
[parent not found: <5B61C212020000780020B6D8@prv1-mh.provo.novell.com>]
[parent not found: <5B61C21202000000000FF27E@prv1-mh.provo.novell.com>]
[parent not found: <5B61C2120200007800224310@prv1-mh.provo.novell.com>]
* Ping: [PATCH 0/5] x86: more power-efficient CPU parking [not found] ` <5B61C2120200007800224310@prv1-mh.provo.novell.com> @ 2019-04-03 10:12 ` Jan Beulich 2019-04-03 11:14 ` Andrew Cooper [not found] ` <5B61C2120200000000101EDC@prv1-mh.provo.novell.com> 1 sibling, 1 reply; 33+ messages in thread From: Jan Beulich @ 2019-04-03 10:12 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne >>> On 01.08.18 at 16:22, wrote: > When putting CPUs to sleep permanently, we should try to put them into > the most power conserving state possible. For now it is unclear whether, > especially in a deep C-state, the P-state also matters, so this series only > arranges for the C-state side of things (plus some cleanup). > > 1: x86/cpuidle: replace a pointless NULL check > 2: x86/idle: re-arrange dead-idle handling > 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible > 4: x86/cpuidle: clean up Cx dumping > 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining So patch 5 is understandably controversial, and I'm explicitly excluding it from the ping. Patch 1 has gone in long ago and patch 4 has been acked. While there was some feedback on 2 (albeit stalled then after my reply), I don't think I've had any substantial feedback on 3 though, yet _they_ are supposed to provide the main improvement here. Patch 5 really was meant to be more optional than I had expressed, hence its placement even after the pure cleanup patch 4. The series has been pending for well over half a year now, yet other than for the indirect call patching series (where it already doesn't feel really well) I don't want to propose simply committing this in the absences of sufficient feedback. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2019-04-03 10:12 ` Jan Beulich @ 2019-04-03 11:14 ` Andrew Cooper 2019-04-03 12:43 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2019-04-03 11:14 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel, Wei Liu, Roger Pau Monne On 03/04/2019 11:12, Jan Beulich wrote: >>>> On 01.08.18 at 16:22, wrote: >> When putting CPUs to sleep permanently, we should try to put them into >> the most power conserving state possible. For now it is unclear whether, >> especially in a deep C-state, the P-state also matters, so this series only >> arranges for the C-state side of things (plus some cleanup). >> >> 1: x86/cpuidle: replace a pointless NULL check >> 2: x86/idle: re-arrange dead-idle handling >> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >> 4: x86/cpuidle: clean up Cx dumping >> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining > So patch 5 is understandably controversial, and I'm explicitly > excluding it from the ping. Considering that it causes EFI firmware to explode in several interesting ways, I'm afraid it is a complete nonstarter. > Patch 1 has gone in long ago and > patch 4 has been acked. While there was some feedback on > 2 (albeit stalled then after my reply), I don't think I've had any > substantial feedback on 3 though, yet _they_ are supposed to > provide the main improvement here. Patch 5 really was meant > to be more optional than I had expressed, hence its placement > even after the pure cleanup patch 4. Some of the patches have cycled out of my inbox, so I'm reviewing from the mail archive. 2) Now that default_dead_idle() isn't a terminal function, you must use spec_ctrl_enter_idle() on the way back out for safety. I'm still going to insist on something about #MC, even if it is just a note in the commit message saying "this doesn't yet make #MC any safer for parked CPUs". The real problem is that your comment about NMI now makes the code read as if everything is safe, and while this patch is an improvement (and therefore ok on its own merits), it doesn't actually make the result #MC safe. 3) Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2019-04-03 11:14 ` Andrew Cooper @ 2019-04-03 12:43 ` Jan Beulich 2019-04-03 14:44 ` Andrew Cooper 0 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2019-04-03 12:43 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne >>> On 03.04.19 at 13:14, <andrew.cooper3@citrix.com> wrote: > On 03/04/2019 11:12, Jan Beulich wrote: >>>>> On 01.08.18 at 16:22, wrote: >>> When putting CPUs to sleep permanently, we should try to put them into >>> the most power conserving state possible. For now it is unclear whether, >>> especially in a deep C-state, the P-state also matters, so this series only >>> arranges for the C-state side of things (plus some cleanup). >>> >>> 1: x86/cpuidle: replace a pointless NULL check >>> 2: x86/idle: re-arrange dead-idle handling >>> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >>> 4: x86/cpuidle: clean up Cx dumping >>> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining >> So patch 5 is understandably controversial, and I'm explicitly >> excluding it from the ping. > > Considering that it causes EFI firmware to explode in several > interesting ways, I'm afraid it is a complete nonstarter. I didn't know this - neither of my two EFI boxes have exploded in any way during the last half year. Care to share details? >> Patch 1 has gone in long ago and >> patch 4 has been acked. While there was some feedback on >> 2 (albeit stalled then after my reply), I don't think I've had any >> substantial feedback on 3 though, yet _they_ are supposed to >> provide the main improvement here. Patch 5 really was meant >> to be more optional than I had expressed, hence its placement >> even after the pure cleanup patch 4. > > Some of the patches have cycled out of my inbox, so I'm reviewing from > the mail archive. > > 2) Now that default_dead_idle() isn't a terminal function, you must use > spec_ctrl_enter_idle() on the way back out for safety. Ah, yes, perhaps better to add it, than to implicitly rely on callers (there's really exactly one) to be terminal. (I take it you mean spec_ctrl_exit_idle().) > I'm still going to insist on something about #MC, even if it is just a > note in the commit message saying "this doesn't yet make #MC any safer > for parked CPUs". I can add something like this, but what is it that's known unsafe for parked CPUs? They in particular retain their per-CPU data. If anything I see a problem with fully offline CPUs. For now I'll add a similar sentence, but with "parked" replaced. One other question (I apparently forgot about this aspect between putting together the series and posting it): acpi_dead_idle() has built-in loops as well. While it's not expected for a CPU to need waking from there (as no "even better" dead-idle handler could get installed) I wonder whether for consistency we wouldn't better drop the loops there too. The downside of doing so would be added overhead in case of spurious wakeups (which ought to have a small chance of being possible in particular in the MWAIT case). > 3) Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Thanks! Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2019-04-03 12:43 ` Jan Beulich @ 2019-04-03 14:44 ` Andrew Cooper 2019-04-03 15:20 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2019-04-03 14:44 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel, Wei Liu, Roger Pau Monne [-- Attachment #1.1: Type: text/plain, Size: 4065 bytes --] On 03/04/2019 13:43, Jan Beulich wrote: >>>> On 03.04.19 at 13:14, <andrew.cooper3@citrix.com> wrote: >> On 03/04/2019 11:12, Jan Beulich wrote: >>>>>> On 01.08.18 at 16:22, wrote: >>>> When putting CPUs to sleep permanently, we should try to put them into >>>> the most power conserving state possible. For now it is unclear whether, >>>> especially in a deep C-state, the P-state also matters, so this series only >>>> arranges for the C-state side of things (plus some cleanup). >>>> >>>> 1: x86/cpuidle: replace a pointless NULL check >>>> 2: x86/idle: re-arrange dead-idle handling >>>> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >>>> 4: x86/cpuidle: clean up Cx dumping >>>> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining >>> So patch 5 is understandably controversial, and I'm explicitly >>> excluding it from the ping. >> Considering that it causes EFI firmware to explode in several >> interesting ways, I'm afraid it is a complete nonstarter. > I didn't know this - neither of my two EFI boxes have exploded in > any way during the last half year. Care to share details? It was an assertion failure when the CPU failed to call into the SMM rendezvous. LogLibaErrorLogSmmLib.c(276): ((BOOLEAN)(0==1)) This is a production Dell system IIRC (or maybe Supermicro, but either way, a production firmware). In retrospect, fully offlining a CPU behind the back of the firmware is an extremely antisocial thing to do, and I'm not surprised that the firmware doesn't tolerate it. >>> Patch 1 has gone in long ago and >>> patch 4 has been acked. While there was some feedback on >>> 2 (albeit stalled then after my reply), I don't think I've had any >>> substantial feedback on 3 though, yet _they_ are supposed to >>> provide the main improvement here. Patch 5 really was meant >>> to be more optional than I had expressed, hence its placement >>> even after the pure cleanup patch 4. >> Some of the patches have cycled out of my inbox, so I'm reviewing from >> the mail archive. >> >> 2) Now that default_dead_idle() isn't a terminal function, you must use >> spec_ctrl_enter_idle() on the way back out for safety. > Ah, yes, perhaps better to add it, than to implicitly rely on callers > (there's really exactly one) to be terminal. (I take it you mean > spec_ctrl_exit_idle().) I did, sorry. >> I'm still going to insist on something about #MC, even if it is just a >> note in the commit message saying "this doesn't yet make #MC any safer >> for parked CPUs". > I can add something like this, but what is it that's known unsafe > for parked CPUs? They in particular retain their per-CPU data. If > anything I see a problem with fully offline CPUs. For now I'll add > a similar sentence, but with "parked" replaced. Ah ok - my mistake. Basically, for any CPU we have ever started, we need to maintain a valid stack because we can take NMIs and MCEs. This is also a strict requirement placed on the BIOS even for APs by the various Bios Writers Guides. At a minimum, this is the IVT and enough stack to handle NMIs with just an IRET, and an SMM stack. Even for AMD CPUs which don't broadcast MCEs, we have a non-zero chance of legitimately taking an #MC on a core we have recently downed, for a previous action which has taken a while to propagate back. > One other question (I apparently forgot about this aspect > between putting together the series and posting it): > acpi_dead_idle() has built-in loops as well. While it's not > expected for a CPU to need waking from there (as no "even > better" dead-idle handler could get installed) I wonder whether > for consistency we wouldn't better drop the loops there too. I think that would be a good idea, along with a similar speculative adjustment. > The downside of doing so would be added overhead in case > of spurious wakeups (which ought to have a small chance of > being possible in particular in the MWAIT case). I really don't think that is a concern. I don't think you'll be able to measure the difference in the noise. ~Andrew [-- Attachment #1.2: Type: text/html, Size: 6469 bytes --] [-- Attachment #2: Type: text/plain, Size: 157 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Ping: [PATCH 0/5] x86: more power-efficient CPU parking 2019-04-03 14:44 ` Andrew Cooper @ 2019-04-03 15:20 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2019-04-03 15:20 UTC (permalink / raw) To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne >>> On 03.04.19 at 16:44, <andrew.cooper3@citrix.com> wrote: > On 03/04/2019 13:43, Jan Beulich wrote: >>>>> On 03.04.19 at 13:14, <andrew.cooper3@citrix.com> wrote: >>> On 03/04/2019 11:12, Jan Beulich wrote: >>>>>>> On 01.08.18 at 16:22, wrote: >>>>> When putting CPUs to sleep permanently, we should try to put them into >>>>> the most power conserving state possible. For now it is unclear whether, >>>>> especially in a deep C-state, the P-state also matters, so this series only >>>>> arranges for the C-state side of things (plus some cleanup). >>>>> >>>>> 1: x86/cpuidle: replace a pointless NULL check >>>>> 2: x86/idle: re-arrange dead-idle handling >>>>> 3: x86/cpuidle: push parked CPUs into deeper sleep states when possible >>>>> 4: x86/cpuidle: clean up Cx dumping >>>>> 5: x86: place non-parked CPUs into wait-for-SIPI state after offlining >>>> So patch 5 is understandably controversial, and I'm explicitly >>>> excluding it from the ping. >>> Considering that it causes EFI firmware to explode in several >>> interesting ways, I'm afraid it is a complete nonstarter. >> I didn't know this - neither of my two EFI boxes have exploded in >> any way during the last half year. Care to share details? > > It was an assertion failure when the CPU failed to call into the SMM > rendezvous. > > LogLibaErrorLogSmmLib.c(276): ((BOOLEAN)(0==1)) > > This is a production Dell system IIRC (or maybe Supermicro, but either > way, a production firmware). > > In retrospect, fully offlining a CPU behind the back of the firmware is > an extremely antisocial thing to do, and I'm not surprised that the > firmware doesn't tolerate it. Oh, I see. I withdraw this patch then. >> One other question (I apparently forgot about this aspect >> between putting together the series and posting it): >> acpi_dead_idle() has built-in loops as well. While it's not >> expected for a CPU to need waking from there (as no "even >> better" dead-idle handler could get installed) I wonder whether >> for consistency we wouldn't better drop the loops there too. > > I think that would be a good idea, along with a similar speculative > adjustment. Will do. >> The downside of doing so would be added overhead in case >> of spurious wakeups (which ought to have a small chance of >> being possible in particular in the MWAIT case). > > I really don't think that is a concern. I don't think you'll be able to > measure the difference in the noise. Well, not over any extended period of time. But a single WBINVD may be quite noticeable to at least the other thread while it executes. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <5B61C2120200000000101EDC@prv1-mh.provo.novell.com>]
[parent not found: <5B61C212020000780022FF0D@prv1-mh.provo.novell.com>]
* [PATCH v2 0/3] x86: more power-efficient CPU parking [not found] ` <5B61C212020000780022FF0D@prv1-mh.provo.novell.com> @ 2019-05-17 10:10 ` Jan Beulich 2019-05-17 10:10 ` [Xen-devel] " Jan Beulich ` (3 more replies) 0 siblings, 4 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:10 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne When putting CPUs to sleep permanently, we should try to put them into the most power conserving state possible. For now it is unclear whether, especially in a deep C-state, the P-state also matters, so this series only arranges for the C-state side of things (plus some cleanup). 1: x86/idle: re-arrange dead-idle handling 2: x86/cpuidle: push parked CPUs into deeper sleep states when possible 3: x86/cpuidle: clean up Cx dumping Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [Xen-devel] [PATCH v2 0/3] x86: more power-efficient CPU parking 2019-05-17 10:10 ` [PATCH v2 0/3] " Jan Beulich @ 2019-05-17 10:10 ` Jan Beulich 2019-05-17 10:11 ` [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling Jan Beulich ` (2 subsequent siblings) 3 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:10 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne When putting CPUs to sleep permanently, we should try to put them into the most power conserving state possible. For now it is unclear whether, especially in a deep C-state, the P-state also matters, so this series only arranges for the C-state side of things (plus some cleanup). 1: x86/idle: re-arrange dead-idle handling 2: x86/cpuidle: push parked CPUs into deeper sleep states when possible 3: x86/cpuidle: clean up Cx dumping Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling 2019-05-17 10:10 ` [PATCH v2 0/3] " Jan Beulich 2019-05-17 10:10 ` [Xen-devel] " Jan Beulich @ 2019-05-17 10:11 ` Jan Beulich 2019-05-17 10:11 ` [Xen-devel] " Jan Beulich 2019-05-20 14:25 ` Andrew Cooper 2019-05-17 10:12 ` [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich 2019-05-17 10:12 ` [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping Jan Beulich 3 siblings, 2 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:11 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne In order to be able to wake parked CPUs from default_dead_idle() (for them to then enter a different dead-idle routine), the function should not itself loop. Move the loop into play_dead(), and use play_dead() as well on the AP boot error path. Furthermore, not the least considering the comment in play_dead(), make sure NMI raised (for now this would be a bug elsewhere, but that's about to change) against a parked or fully offline CPU won't invoke the actual, full-blown NMI handler. Note however that this doesn't make #MC any safer for fully offline CPUs. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v2: Add spec_ctrl_exit_idle() to default_dead_idle(). Add #MC related remark to description. --- Note: I had to drop the discussed acpi_dead_idle() adjustment again, as it breaks booting with "smt=0" and "maxcpus=" on at least one of my systems. I've not yet managed to understand why that would be. --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -100,14 +100,20 @@ void default_dead_idle(void) */ spec_ctrl_enter_idle(get_cpu_info()); wbinvd(); - for ( ; ; ) - halt(); + halt(); + spec_ctrl_exit_idle(get_cpu_info()); } -static void play_dead(void) +void play_dead(void) { + unsigned int cpu = smp_processor_id(); + local_irq_disable(); + /* Change the NMI handler to a nop (see comment below). */ + _set_gate_lower(&idt_tables[cpu][TRAP_nmi], SYS_DESC_irq_gate, 0, + &trap_nop); + /* * NOTE: After cpu_exit_clear, per-cpu variables may no longer accessible, * as they may be freed at any time if offline CPUs don't get parked. In @@ -118,9 +124,10 @@ static void play_dead(void) * Consider very carefully when adding code to *dead_idle. Most hypervisor * subsystems are unsafe to call. */ - cpu_exit_clear(smp_processor_id()); + cpu_exit_clear(cpu); - (*dead_idle)(); + for ( ; ; ) + dead_idle(); } static void idle_loop(void) --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -33,6 +33,7 @@ #include <xen/serial.h> #include <xen/numa.h> #include <xen/cpu.h> +#include <asm/cpuidle.h> #include <asm/current.h> #include <asm/mc146818rtc.h> #include <asm/desc.h> @@ -209,8 +210,7 @@ static void smp_callin(void) halt: clear_local_APIC(); spin_debug_enable(); - cpu_exit_clear(cpu); - (*dead_idle)(); + play_dead(); } /* Allow the master to continue. */ --- a/xen/include/asm-x86/cpuidle.h +++ b/xen/include/asm-x86/cpuidle.h @@ -20,6 +20,7 @@ int mwait_idle_init(struct notifier_bloc int cpuidle_init_cpu(unsigned int cpu); void default_dead_idle(void); void acpi_dead_idle(void); +void play_dead(void); void trace_exit_reason(u32 *irq_traced); void update_idle_stats(struct acpi_processor_power *, struct acpi_processor_cx *, uint64_t, uint64_t); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [Xen-devel] [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling 2019-05-17 10:11 ` [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling Jan Beulich @ 2019-05-17 10:11 ` Jan Beulich 2019-05-20 14:25 ` Andrew Cooper 1 sibling, 0 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:11 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne In order to be able to wake parked CPUs from default_dead_idle() (for them to then enter a different dead-idle routine), the function should not itself loop. Move the loop into play_dead(), and use play_dead() as well on the AP boot error path. Furthermore, not the least considering the comment in play_dead(), make sure NMI raised (for now this would be a bug elsewhere, but that's about to change) against a parked or fully offline CPU won't invoke the actual, full-blown NMI handler. Note however that this doesn't make #MC any safer for fully offline CPUs. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v2: Add spec_ctrl_exit_idle() to default_dead_idle(). Add #MC related remark to description. --- Note: I had to drop the discussed acpi_dead_idle() adjustment again, as it breaks booting with "smt=0" and "maxcpus=" on at least one of my systems. I've not yet managed to understand why that would be. --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -100,14 +100,20 @@ void default_dead_idle(void) */ spec_ctrl_enter_idle(get_cpu_info()); wbinvd(); - for ( ; ; ) - halt(); + halt(); + spec_ctrl_exit_idle(get_cpu_info()); } -static void play_dead(void) +void play_dead(void) { + unsigned int cpu = smp_processor_id(); + local_irq_disable(); + /* Change the NMI handler to a nop (see comment below). */ + _set_gate_lower(&idt_tables[cpu][TRAP_nmi], SYS_DESC_irq_gate, 0, + &trap_nop); + /* * NOTE: After cpu_exit_clear, per-cpu variables may no longer accessible, * as they may be freed at any time if offline CPUs don't get parked. In @@ -118,9 +124,10 @@ static void play_dead(void) * Consider very carefully when adding code to *dead_idle. Most hypervisor * subsystems are unsafe to call. */ - cpu_exit_clear(smp_processor_id()); + cpu_exit_clear(cpu); - (*dead_idle)(); + for ( ; ; ) + dead_idle(); } static void idle_loop(void) --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -33,6 +33,7 @@ #include <xen/serial.h> #include <xen/numa.h> #include <xen/cpu.h> +#include <asm/cpuidle.h> #include <asm/current.h> #include <asm/mc146818rtc.h> #include <asm/desc.h> @@ -209,8 +210,7 @@ static void smp_callin(void) halt: clear_local_APIC(); spin_debug_enable(); - cpu_exit_clear(cpu); - (*dead_idle)(); + play_dead(); } /* Allow the master to continue. */ --- a/xen/include/asm-x86/cpuidle.h +++ b/xen/include/asm-x86/cpuidle.h @@ -20,6 +20,7 @@ int mwait_idle_init(struct notifier_bloc int cpuidle_init_cpu(unsigned int cpu); void default_dead_idle(void); void acpi_dead_idle(void); +void play_dead(void); void trace_exit_reason(u32 *irq_traced); void update_idle_stats(struct acpi_processor_power *, struct acpi_processor_cx *, uint64_t, uint64_t); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling 2019-05-17 10:11 ` [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling Jan Beulich 2019-05-17 10:11 ` [Xen-devel] " Jan Beulich @ 2019-05-20 14:25 ` Andrew Cooper 2019-05-20 14:25 ` [Xen-devel] " Andrew Cooper 1 sibling, 1 reply; 33+ messages in thread From: Andrew Cooper @ 2019-05-20 14:25 UTC (permalink / raw) To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monne On 17/05/2019 11:11, Jan Beulich wrote: > In order to be able to wake parked CPUs from default_dead_idle() (for > them to then enter a different dead-idle routine), the function should > not itself loop. Move the loop into play_dead(), and use play_dead() as > well on the AP boot error path. > > Furthermore, not the least considering the comment in play_dead(), > make sure NMI raised (for now this would be a bug elsewhere, but that's > about to change) against a parked or fully offline CPU won't invoke the > actual, full-blown NMI handler. > > Note however that this doesn't make #MC any safer for fully offline > CPUs. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Xen-devel] [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling 2019-05-20 14:25 ` Andrew Cooper @ 2019-05-20 14:25 ` Andrew Cooper 0 siblings, 0 replies; 33+ messages in thread From: Andrew Cooper @ 2019-05-20 14:25 UTC (permalink / raw) To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monne On 17/05/2019 11:11, Jan Beulich wrote: > In order to be able to wake parked CPUs from default_dead_idle() (for > them to then enter a different dead-idle routine), the function should > not itself loop. Move the loop into play_dead(), and use play_dead() as > well on the AP boot error path. > > Furthermore, not the least considering the comment in play_dead(), > make sure NMI raised (for now this would be a bug elsewhere, but that's > about to change) against a parked or fully offline CPU won't invoke the > actual, full-blown NMI handler. > > Note however that this doesn't make #MC any safer for fully offline > CPUs. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible 2019-05-17 10:10 ` [PATCH v2 0/3] " Jan Beulich 2019-05-17 10:10 ` [Xen-devel] " Jan Beulich 2019-05-17 10:11 ` [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling Jan Beulich @ 2019-05-17 10:12 ` Jan Beulich 2019-05-17 10:12 ` [Xen-devel] " Jan Beulich 2019-05-17 10:12 ` [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping Jan Beulich 3 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:12 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne When the mwait-idle driver isn't used, C-state information becomes available only in the course of Dom0 starting up. Use the provided data to allow parked CPUs to sleep in a more energy efficient way, by waking them briefly (via NMI) once the data has been recorded. This involves re-arranging how/when the governor's ->enable() hook gets invoked. The changes there include addition of so far missing error handling in the respective CPU notifier handlers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -351,12 +351,22 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); - for_each_online_cpu ( cpu ) - if (processor_powers[cpu]) - { - print_acpi_power(cpu, processor_powers[cpu]); - process_pending_softirqs(); - } + for_each_present_cpu ( cpu ) + { + struct acpi_processor_power *power = processor_powers[cpu]; + + if ( !power ) + continue; + + if ( cpu_online(cpu) ) + print_acpi_power(cpu, power); + else if ( park_offline_cpus ) + printk("CPU%u parked in state %u (C%u)\n", cpu, + power->last_state ? power->last_state->idx : 1, + power->last_state ? power->last_state->type : 1); + + process_pending_softirqs(); + } } static int __init cpu_idle_key_init(void) @@ -765,6 +775,7 @@ void acpi_dead_idle(void) goto default_halt; cx = &power->states[power->count - 1]; + power->last_state = cx; if ( cx->entry_method == ACPI_CSTATE_EM_FFH ) { @@ -1217,9 +1228,30 @@ long set_cx_pminfo(uint32_t acpi_id, str set_cx(acpi_power, &xen_cx); } - if ( cpuidle_current_governor->enable && - cpuidle_current_governor->enable(acpi_power) ) - return -EFAULT; + if ( !cpu_online(cpu_id) ) + { + uint32_t apic_id = x86_cpu_to_apicid[cpu_id]; + + /* + * If we've just learned of more available C states, wake the CPU if + * it's parked, so it can go back to sleep in perhaps a deeper state. + */ + if ( park_offline_cpus && apic_id != BAD_APICID ) + { + unsigned long flags; + + local_irq_save(flags); + apic_wait_icr_idle(); + apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id); + local_irq_restore(flags); + } + } + else if ( cpuidle_current_governor->enable ) + { + ret = cpuidle_current_governor->enable(acpi_power); + if ( ret < 0 ) + return ret; + } /* FIXME: C-state dependency is not supported by far */ @@ -1379,19 +1411,22 @@ static int cpu_callback( struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; + int rc = 0; - /* Only hook on CPU_ONLINE because a dead cpu may utilize the info to - * to enter deep C-state */ + /* + * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info + * to enter deep C-state. + */ switch ( action ) { - case CPU_ONLINE: - (void)cpuidle_init_cpu(cpu); - break; - default: + case CPU_UP_PREPARE: + rc = cpuidle_init_cpu(cpu); + if ( !rc && cpuidle_current_governor->enable ) + rc = cpuidle_current_governor->enable(processor_powers[cpu]); break; } - return NOTIFY_DONE; + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); } static struct notifier_block cpu_nfb = { @@ -1406,6 +1441,7 @@ static int __init cpuidle_presmp_init(vo return 0; mwait_idle_init(&cpu_nfb); + cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, cpu); cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, cpu); register_cpu_notifier(&cpu_nfb); return 0; --- a/xen/arch/x86/acpi/cpuidle_menu.c +++ b/xen/arch/x86/acpi/cpuidle_menu.c @@ -277,9 +277,6 @@ static void menu_reflect(struct acpi_pro static int menu_enable_device(struct acpi_processor_power *power) { - if (!cpu_online(power->cpu)) - return -1; - memset(&per_cpu(menu_devices, power->cpu), 0, sizeof(struct menu_device)); return 0; --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -1166,12 +1166,17 @@ static int mwait_idle_cpu_init(struct no struct acpi_processor_power *dev = processor_powers[cpu]; switch (action) { + int rc; + default: return NOTIFY_DONE; case CPU_UP_PREPARE: - cpuidle_init_cpu(cpu); - return NOTIFY_DONE; + rc = cpuidle_init_cpu(cpu); + dev = processor_powers[cpu]; + if (!rc && cpuidle_current_governor->enable) + rc = cpuidle_current_governor->enable(dev); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); case CPU_ONLINE: if (!dev) @@ -1260,8 +1265,6 @@ int __init mwait_idle_init(struct notifi } if (!err) { nfb->notifier_call = mwait_idle_cpu_init; - mwait_idle_cpu_init(nfb, CPU_UP_PREPARE, NULL); - pm_idle_save = pm_idle; pm_idle = mwait_idle; dead_idle = acpi_dead_idle; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [Xen-devel] [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible 2019-05-17 10:12 ` [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich @ 2019-05-17 10:12 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:12 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne When the mwait-idle driver isn't used, C-state information becomes available only in the course of Dom0 starting up. Use the provided data to allow parked CPUs to sleep in a more energy efficient way, by waking them briefly (via NMI) once the data has been recorded. This involves re-arranging how/when the governor's ->enable() hook gets invoked. The changes there include addition of so far missing error handling in the respective CPU notifier handlers. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -351,12 +351,22 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); - for_each_online_cpu ( cpu ) - if (processor_powers[cpu]) - { - print_acpi_power(cpu, processor_powers[cpu]); - process_pending_softirqs(); - } + for_each_present_cpu ( cpu ) + { + struct acpi_processor_power *power = processor_powers[cpu]; + + if ( !power ) + continue; + + if ( cpu_online(cpu) ) + print_acpi_power(cpu, power); + else if ( park_offline_cpus ) + printk("CPU%u parked in state %u (C%u)\n", cpu, + power->last_state ? power->last_state->idx : 1, + power->last_state ? power->last_state->type : 1); + + process_pending_softirqs(); + } } static int __init cpu_idle_key_init(void) @@ -765,6 +775,7 @@ void acpi_dead_idle(void) goto default_halt; cx = &power->states[power->count - 1]; + power->last_state = cx; if ( cx->entry_method == ACPI_CSTATE_EM_FFH ) { @@ -1217,9 +1228,30 @@ long set_cx_pminfo(uint32_t acpi_id, str set_cx(acpi_power, &xen_cx); } - if ( cpuidle_current_governor->enable && - cpuidle_current_governor->enable(acpi_power) ) - return -EFAULT; + if ( !cpu_online(cpu_id) ) + { + uint32_t apic_id = x86_cpu_to_apicid[cpu_id]; + + /* + * If we've just learned of more available C states, wake the CPU if + * it's parked, so it can go back to sleep in perhaps a deeper state. + */ + if ( park_offline_cpus && apic_id != BAD_APICID ) + { + unsigned long flags; + + local_irq_save(flags); + apic_wait_icr_idle(); + apic_icr_write(APIC_DM_NMI | APIC_DEST_PHYSICAL, apic_id); + local_irq_restore(flags); + } + } + else if ( cpuidle_current_governor->enable ) + { + ret = cpuidle_current_governor->enable(acpi_power); + if ( ret < 0 ) + return ret; + } /* FIXME: C-state dependency is not supported by far */ @@ -1379,19 +1411,22 @@ static int cpu_callback( struct notifier_block *nfb, unsigned long action, void *hcpu) { unsigned int cpu = (unsigned long)hcpu; + int rc = 0; - /* Only hook on CPU_ONLINE because a dead cpu may utilize the info to - * to enter deep C-state */ + /* + * Only hook on CPU_UP_PREPARE because a dead cpu may utilize the info + * to enter deep C-state. + */ switch ( action ) { - case CPU_ONLINE: - (void)cpuidle_init_cpu(cpu); - break; - default: + case CPU_UP_PREPARE: + rc = cpuidle_init_cpu(cpu); + if ( !rc && cpuidle_current_governor->enable ) + rc = cpuidle_current_governor->enable(processor_powers[cpu]); break; } - return NOTIFY_DONE; + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); } static struct notifier_block cpu_nfb = { @@ -1406,6 +1441,7 @@ static int __init cpuidle_presmp_init(vo return 0; mwait_idle_init(&cpu_nfb); + cpu_nfb.notifier_call(&cpu_nfb, CPU_UP_PREPARE, cpu); cpu_nfb.notifier_call(&cpu_nfb, CPU_ONLINE, cpu); register_cpu_notifier(&cpu_nfb); return 0; --- a/xen/arch/x86/acpi/cpuidle_menu.c +++ b/xen/arch/x86/acpi/cpuidle_menu.c @@ -277,9 +277,6 @@ static void menu_reflect(struct acpi_pro static int menu_enable_device(struct acpi_processor_power *power) { - if (!cpu_online(power->cpu)) - return -1; - memset(&per_cpu(menu_devices, power->cpu), 0, sizeof(struct menu_device)); return 0; --- a/xen/arch/x86/cpu/mwait-idle.c +++ b/xen/arch/x86/cpu/mwait-idle.c @@ -1166,12 +1166,17 @@ static int mwait_idle_cpu_init(struct no struct acpi_processor_power *dev = processor_powers[cpu]; switch (action) { + int rc; + default: return NOTIFY_DONE; case CPU_UP_PREPARE: - cpuidle_init_cpu(cpu); - return NOTIFY_DONE; + rc = cpuidle_init_cpu(cpu); + dev = processor_powers[cpu]; + if (!rc && cpuidle_current_governor->enable) + rc = cpuidle_current_governor->enable(dev); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); case CPU_ONLINE: if (!dev) @@ -1260,8 +1265,6 @@ int __init mwait_idle_init(struct notifi } if (!err) { nfb->notifier_call = mwait_idle_cpu_init; - mwait_idle_cpu_init(nfb, CPU_UP_PREPARE, NULL); - pm_idle_save = pm_idle; pm_idle = mwait_idle; dead_idle = acpi_dead_idle; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping 2019-05-17 10:10 ` [PATCH v2 0/3] " Jan Beulich ` (2 preceding siblings ...) 2019-05-17 10:12 ` [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich @ 2019-05-17 10:12 ` Jan Beulich 2019-05-17 10:12 ` [Xen-devel] " Jan Beulich 3 siblings, 1 reply; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:12 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne Don't log the same global information once per CPU. Don't log the same information (here: the currently active state) twice. Don't prefix decimal numbers with zeros (giving the impression they're octal). Use format specifiers matching the type of the corresponding expressions. Don't split printk()-s without intervening new-lines. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -304,9 +304,6 @@ static void print_acpi_power(uint32_t cp printk("==cpu%d==\n", cpu); last_state_idx = power->last_state ? power->last_state->idx : -1; - printk("active state:\t\tC%d\n", last_state_idx); - printk("max_cstate:\t\tC%d\n", max_cstate); - printk("states:\n"); spin_lock_irq(&power->stat_lock); current_tick = cpuidle_get_tick(); @@ -331,16 +328,14 @@ static void print_acpi_power(uint32_t cp idle_usage += usage[i]; idle_res += tick_to_ns(res_tick[i]); - printk((last_state_idx == i) ? " *" : " "); - printk("C%d:\t", i); - printk("type[C%d] ", power->states[i].type); - printk("latency[%03d] ", power->states[i].latency); - printk("usage[%08"PRIu64"] ", usage[i]); - printk("method[%5s] ", acpi_cstate_method_name[power->states[i].entry_method]); - printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i])); + printk(" %cC%u:\ttype[C%d] latency[%3u] usage[%8"PRIu64"] method[%5s] duration[%"PRIu64"]\n", + (last_state_idx == i) ? '*' : ' ', i, + power->states[i].type, power->states[i].latency, usage[i], + acpi_cstate_method_name[power->states[i].entry_method], + tick_to_ns(res_tick[i])); } - printk((last_state_idx == 0) ? " *" : " "); - printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n", + printk(" %cC0:\tusage[%8"PRIu64"] duration[%"PRIu64"]\n", + (last_state_idx == 0) ? '*' : ' ', usage[0] + idle_usage, current_stime - idle_res); print_hw_residencies(cpu); @@ -351,6 +346,7 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); + printk("max cstate: C%u\n", max_cstate); for_each_present_cpu ( cpu ) { struct acpi_processor_power *power = processor_powers[cpu]; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* [Xen-devel] [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping 2019-05-17 10:12 ` [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping Jan Beulich @ 2019-05-17 10:12 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2019-05-17 10:12 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monne Don't log the same global information once per CPU. Don't log the same information (here: the currently active state) twice. Don't prefix decimal numbers with zeros (giving the impression they're octal). Use format specifiers matching the type of the corresponding expressions. Don't split printk()-s without intervening new-lines. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -304,9 +304,6 @@ static void print_acpi_power(uint32_t cp printk("==cpu%d==\n", cpu); last_state_idx = power->last_state ? power->last_state->idx : -1; - printk("active state:\t\tC%d\n", last_state_idx); - printk("max_cstate:\t\tC%d\n", max_cstate); - printk("states:\n"); spin_lock_irq(&power->stat_lock); current_tick = cpuidle_get_tick(); @@ -331,16 +328,14 @@ static void print_acpi_power(uint32_t cp idle_usage += usage[i]; idle_res += tick_to_ns(res_tick[i]); - printk((last_state_idx == i) ? " *" : " "); - printk("C%d:\t", i); - printk("type[C%d] ", power->states[i].type); - printk("latency[%03d] ", power->states[i].latency); - printk("usage[%08"PRIu64"] ", usage[i]); - printk("method[%5s] ", acpi_cstate_method_name[power->states[i].entry_method]); - printk("duration[%"PRIu64"]\n", tick_to_ns(res_tick[i])); + printk(" %cC%u:\ttype[C%d] latency[%3u] usage[%8"PRIu64"] method[%5s] duration[%"PRIu64"]\n", + (last_state_idx == i) ? '*' : ' ', i, + power->states[i].type, power->states[i].latency, usage[i], + acpi_cstate_method_name[power->states[i].entry_method], + tick_to_ns(res_tick[i])); } - printk((last_state_idx == 0) ? " *" : " "); - printk("C0:\tusage[%08"PRIu64"] duration[%"PRIu64"]\n", + printk(" %cC0:\tusage[%8"PRIu64"] duration[%"PRIu64"]\n", + (last_state_idx == 0) ? '*' : ' ', usage[0] + idle_usage, current_stime - idle_res); print_hw_residencies(cpu); @@ -351,6 +346,7 @@ static void dump_cx(unsigned char key) unsigned int cpu; printk("'%c' pressed -> printing ACPI Cx structures\n", key); + printk("max cstate: C%u\n", max_cstate); for_each_present_cpu ( cpu ) { struct acpi_processor_power *power = processor_powers[cpu]; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2019-05-20 14:26 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-01 14:22 [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich 2018-08-01 14:31 ` [PATCH 1/5] x86/cpuidle: replace a pointless NULL check Jan Beulich 2018-08-01 14:33 ` Andrew Cooper 2018-08-01 15:12 ` Jan Beulich 2018-08-01 14:31 ` [PATCH 2/5] x86/idle: re-arrange dead-idle handling Jan Beulich 2018-09-07 17:08 ` Andrew Cooper 2018-09-10 10:13 ` Jan Beulich 2018-10-26 10:55 ` Ping: " Jan Beulich 2018-12-05 20:33 ` Andrew Cooper 2018-12-06 8:16 ` Jan Beulich 2018-08-01 14:32 ` [PATCH 3/5] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich 2018-10-26 10:56 ` Ping: " Jan Beulich 2018-08-01 14:33 ` [PATCH 4/5] x86/cpuidle: clean up Cx dumping Jan Beulich 2018-08-01 14:40 ` Andrew Cooper 2018-08-01 14:33 ` [PATCH 5/5] x86: place non-parked CPUs into wait-for-SIPI state after offlining Jan Beulich 2018-08-29 7:08 ` Ping: [PATCH 0/5] x86: more power-efficient CPU parking Jan Beulich 2018-08-29 17:01 ` Andrew Cooper 2018-08-30 7:29 ` Jan Beulich [not found] ` <5B61C21202000000000FC1F1@prv1-mh.provo.novell.com> [not found] ` <5B61C21202000078001F8805@prv1-mh.provo.novell.com> [not found] ` <5B61C21202000000000FC6BD@prv1-mh.provo.novell.com> [not found] ` <5B61C212020000780020B6D8@prv1-mh.provo.novell.com> [not found] ` <5B61C21202000000000FF27E@prv1-mh.provo.novell.com> [not found] ` <5B61C2120200007800224310@prv1-mh.provo.novell.com> 2019-04-03 10:12 ` Jan Beulich 2019-04-03 11:14 ` Andrew Cooper 2019-04-03 12:43 ` Jan Beulich 2019-04-03 14:44 ` Andrew Cooper 2019-04-03 15:20 ` Jan Beulich [not found] ` <5B61C2120200000000101EDC@prv1-mh.provo.novell.com> [not found] ` <5B61C212020000780022FF0D@prv1-mh.provo.novell.com> 2019-05-17 10:10 ` [PATCH v2 0/3] " Jan Beulich 2019-05-17 10:10 ` [Xen-devel] " Jan Beulich 2019-05-17 10:11 ` [PATCH v2 1/3] x86/idle: re-arrange dead-idle handling Jan Beulich 2019-05-17 10:11 ` [Xen-devel] " Jan Beulich 2019-05-20 14:25 ` Andrew Cooper 2019-05-20 14:25 ` [Xen-devel] " Andrew Cooper 2019-05-17 10:12 ` [PATCH v2 2/3] x86/cpuidle: push parked CPUs into deeper sleep states when possible Jan Beulich 2019-05-17 10:12 ` [Xen-devel] " Jan Beulich 2019-05-17 10:12 ` [PATCH v2 3/3] x86/cpuidle: clean up Cx dumping Jan Beulich 2019-05-17 10:12 ` [Xen-devel] " Jan Beulich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).