* acpi_pad mwait usage
@ 2013-11-19  9:00 Peter Zijlstra
  2013-11-19  9:08 ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 9:00 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel; +Cc: shaohua.li

Hi Len, Rafael,

I stumbled over acpi_pad (yuck! ;-), but noticed that you don't set the
task in polling mode while using mwait. This means we'll still happily
send an IPI to wake you up.

A little something like the below should do; you might even be able to
remove the smp_mb() but since it is completely undocumented (another
fail) I couldn't tell if the implied barrier in
current_set_polling_and_test() suffices to replace it, so I left it.

If it compiles and works, change it to a proper SOB:

Maybe-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 drivers/acpi/acpi_pad.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index fc6008fbce35..e9126efe7786 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -195,8 +195,10 @@ static int power_saving_thread(void *data)
 
 			__monitor((void *)&current_thread_info()->flags, 0, 0);
 			smp_mb();
-			if (!need_resched())
+			if (!current_set_polling_and_test()) {
 				__mwait(power_saving_mwait_eax, 1);
+				__current_clr_polling();
+			}
 
 			start_critical_timings();
 			if (lapic_marked_unstable)

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: acpi_pad mwait usage
  2013-11-19  9:00 acpi_pad mwait usage Peter Zijlstra
@ 2013-11-19  9:08 ` Peter Zijlstra
  2013-11-19 11:31   ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 9:08 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, arjan, jacob.jun.pan

On Tue, Nov 19, 2013 at 10:00:19AM +0100, Peter Zijlstra wrote:
> Hi Len, Rafael,
>
> I stumbled over acpi_pad (yuck! ;-), but noticed that you don't set the
> task in polling mode while using mwait. This means we'll still happily
> send an IPI to wake you up.
>
> A little something like the below should do; you might even be able to
> remove the smp_mb() but since it is completely undocumented (another
> fail) I couldn't tell if the implied barrier in
> current_set_polling_and_test() suffices to replace it, so I left it.
>
> If it compiles and works, change it to a proper SOB:
>
> Maybe-Signed-off-by: Peter Zijlstra <peterz@infradead.org>

Of course one such driver is insufficient and you lot needed a second;
same issue:

---
 drivers/thermal/intel_powerclamp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..d8111e1ac62e 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -439,8 +439,11 @@ static int clamp_thread(void *arg)
 			local_touch_nmi();
 			stop_critical_timings();
 			__monitor((void *)&current_thread_info()->flags, 0, 0);
-			cpu_relax(); /* allow HT sibling to run */
-			__mwait(eax, ecx);
+			if (!current_set_polling_and_test()) {
+				cpu_relax(); /* allow HT sibling to run */
+				__mwait(eax, ecx);
+				__current_clr_polling();
+			}
 			start_critical_timings();
 			atomic_inc(&idle_wakeup_counter);
 		}

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19  9:08 ` Peter Zijlstra
@ 2013-11-19 11:31   ` Peter Zijlstra
  2013-11-19 13:06     ` Rafael J. Wysocki
  ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 11:31 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, arjan, jacob.jun.pan, Mike Galbraith,
	Ingo Molnar, Thomas Gleixner, hpa

People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.

This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.

Also remove __sti_mwait() as it's unused.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---

Mike, does this cure your core2?

 arch/x86/include/asm/mwait.h       | 42 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h   | 23 ---------------------
 arch/x86/kernel/acpi/cstate.c      | 23 ---------------------
 drivers/acpi/acpi_pad.c            |  5 +----
 drivers/acpi/processor_idle.c      | 15 --------------
 drivers/idle/intel_idle.c          |  8 +-------
 drivers/thermal/intel_powerclamp.c |  4 +---
 7 files changed, 45 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 2f366d0ac6b4..80014dade987 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_MWAIT_H
 #define _ASM_X86_MWAIT_H
 
+#include <linux/sched.h>
+
 #define MWAIT_SUBSTATE_MASK		0xf
 #define MWAIT_CSTATE_MASK		0xf
 #define MWAIT_SUBSTATE_SIZE		4
@@ -13,4 +15,44 @@
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
 
+static inline void __monitor(const void *eax, unsigned long ecx,
+			     unsigned long edx)
+{
+	/* "monitor %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xc8;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
+static inline void __mwait(unsigned long eax, unsigned long ecx)
+{
+	/* "mwait %eax, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xc9;"
+		     :: "a" (eax), "c" (ecx));
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+{
+	if (need_resched())
+		return;
+
+	if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+		clflush((void *)&current_thread_info()->flags);
+
+	__monitor((void *)&current_thread_info()->flags, 0, 0);
+	if (!current_set_polling_and_test())
+		__mwait(eax, ecx);
+
+	__current_clr_polling();
+}
+
 #endif /* _ASM_X86_MWAIT_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7b034a4057f9..24821f5768bc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -700,29 +700,6 @@ static inline void sync_core(void)
 #endif
 }
 
-static inline void __monitor(const void *eax, unsigned long ecx,
-			     unsigned long edx)
-{
-	/* "monitor %eax, %ecx, %edx;" */
-	asm volatile(".byte 0x0f, 0x01, 0xc8;"
-		     :: "a" (eax), "c" (ecx), "d"(edx));
-}
-
-static inline void __mwait(unsigned long eax, unsigned long ecx)
-{
-	/* "mwait %eax, %ecx;" */
-	asm volatile(".byte 0x0f, 0x01, 0xc9;"
-		     :: "a" (eax), "c" (ecx));
-}
-
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
-{
-	trace_hardirqs_on();
-	/* "mwait %eax, %ecx;" */
-	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
-		     :: "a" (eax), "c" (ecx));
-}
-
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void init_amd_e400_c1e_mask(void);
 
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index d2b7f27781bc..e69182fd01cf 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
 
-/*
- * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
- * which can obviate IPI to trigger checking of need_resched.
- * We execute MONITOR against need_resched and enter optimized wait state
- * through MWAIT. Whenever someone changes need_resched, we would be woken
- * up from MWAIT (without an IPI).
- *
- * New with Core Duo processors, MWAIT can take some hints based on CPU
- * capability.
- */
-void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
-{
-	if (!need_resched()) {
-		if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
-			clflush((void *)&current_thread_info()->flags);
-
-		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		smp_mb();
-		if (!need_resched())
-			__mwait(ax, cx);
-	}
-}
-
 void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 {
 	unsigned int cpu = smp_processor_id();
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index fc6008fbce35..509452a62f96 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -193,10 +193,7 @@ static int power_saving_thread(void *data)
 					CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
 			stop_critical_timings();
 
-			__monitor((void *)&current_thread_info()->flags, 0, 0);
-			smp_mb();
-			if (!need_resched())
-				__mwait(power_saving_mwait_eax, 1);
+			mwait_idle_with_hints(power_saving_mwait_eax, 1);
 
 			start_critical_timings();
 			if (lapic_marked_unstable)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 644516d9bde6..f90c56c8379e 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
 	if (unlikely(!pr))
 		return -EINVAL;
 
-	if (cx->entry_method == ACPI_CSTATE_FFH) {
-		if (current_set_polling_and_test())
-			return -EINVAL;
-	}
-
 	lapic_timer_state_broadcast(pr, cx, 1);
 	acpi_idle_do_entry(cx);
 
@@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
 	if (unlikely(!pr))
 		return -EINVAL;
 
-	if (cx->entry_method == ACPI_CSTATE_FFH) {
-		if (current_set_polling_and_test())
-			return -EINVAL;
-	}
-
 	/*
 	 * Must be done before busmaster disable as we might need to
 	 * access HPET !
@@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
 		}
 	}
 
-	if (cx->entry_method == ACPI_CSTATE_FFH) {
-		if (current_set_polling_and_test())
-			return -EINVAL;
-	}
-
 	acpi_unlazy_tlb(smp_processor_id());
 
 	/* Tell the scheduler that we are going deep-idle: */
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3226ce98fb18..3b56d76a5bca 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_device *dev,
 	if (!(lapic_timer_reliable_states & (1 << (cstate))))
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
 
-	if (!current_set_polling_and_test()) {
-
-		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		smp_mb();
-		if (!need_resched())
-			__mwait(eax, ecx);
-	}
+	mwait_idle_with_hints(eax, ecx);
 
 	if (!(lapic_timer_reliable_states & (1 << (cstate))))
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..e8275f2df9af 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
 			 */
 			local_touch_nmi();
 			stop_critical_timings();
-			__monitor((void *)&current_thread_info()->flags, 0, 0);
-			cpu_relax(); /* allow HT sibling to run */
-			__mwait(eax, ecx);
+			mwait_idle_with_hints(eax, ecx);
 			start_critical_timings();
 			atomic_inc(&idle_wakeup_counter);
 		}

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 11:31 ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra
@ 2013-11-19 13:06   ` Rafael J. Wysocki
  2013-11-19 13:21   ` Mike Galbraith
  2013-11-19 14:22   ` Arjan van de Ven
  2 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2013-11-19 13:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lenb, linux-acpi, linux-kernel, shaohua.li, rui.zhang, arjan,
	jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa

On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote:
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
>
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
>
> Also remove __sti_mwait() as it's unused.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>

For the ACPI part:

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>
> Mike, does this cure your core2?
>
>  arch/x86/include/asm/mwait.h       | 42 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/processor.h   | 23 ---------------------
>  arch/x86/kernel/acpi/cstate.c      | 23 ---------------------
>  drivers/acpi/acpi_pad.c            |  5 +----
>  drivers/acpi/processor_idle.c      | 15 --------------
>  drivers/idle/intel_idle.c          |  8 +-------
>  drivers/thermal/intel_powerclamp.c |  4 +---
>  7 files changed, 45 insertions(+), 75 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 2f366d0ac6b4..80014dade987 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_MWAIT_H
>  #define _ASM_X86_MWAIT_H
>
> +#include <linux/sched.h>
> +
>  #define MWAIT_SUBSTATE_MASK		0xf
>  #define MWAIT_CSTATE_MASK		0xf
>  #define MWAIT_SUBSTATE_SIZE		4
> @@ -13,4 +15,44 @@
>
>  #define MWAIT_ECX_INTERRUPT_BREAK	0x1
>
> +static inline void __monitor(const void *eax, unsigned long ecx,
> +			     unsigned long edx)
> +{
> +	/* "monitor %eax, %ecx, %edx;" */
> +	asm volatile(".byte 0x0f, 0x01, 0xc8;"
> +		     :: "a" (eax), "c" (ecx), "d"(edx));
> +}
> +
> +static inline void __mwait(unsigned long eax, unsigned long ecx)
> +{
> +	/* "mwait %eax, %ecx;" */
> +	asm volatile(".byte 0x0f, 0x01, 0xc9;"
> +		     :: "a" (eax), "c" (ecx));
> +}
> +
> +/*
> + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> + * which can obviate IPI to trigger checking of need_resched.
> + * We execute MONITOR against need_resched and enter optimized wait state
> + * through MWAIT. Whenever someone changes need_resched, we would be woken
> + * up from MWAIT (without an IPI).
> + *
> + * New with Core Duo processors, MWAIT can take some hints based on CPU
> + * capability.
> + */
> +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> +{
> +	if (need_resched())
> +		return;
> +
> +	if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> +		clflush((void *)&current_thread_info()->flags);
> +
> +	__monitor((void *)&current_thread_info()->flags, 0, 0);
> +	if (!current_set_polling_and_test())
> +		__mwait(eax, ecx);
> +
> +	__current_clr_polling();
> +}
> +
>  #endif /* _ASM_X86_MWAIT_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7b034a4057f9..24821f5768bc 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -700,29 +700,6 @@ static inline void sync_core(void)
>  #endif
>  }
>
> -static inline void __monitor(const void *eax, unsigned long ecx,
> -			     unsigned long edx)
> -{
> -	/* "monitor %eax, %ecx, %edx;" */
> -	asm volatile(".byte 0x0f, 0x01, 0xc8;"
> -		     :: "a" (eax), "c" (ecx), "d"(edx));
> -}
> -
> -static inline void __mwait(unsigned long eax, unsigned long ecx)
> -{
> -	/* "mwait %eax, %ecx;" */
> -	asm volatile(".byte 0x0f, 0x01, 0xc9;"
> -		     :: "a" (eax), "c" (ecx));
> -}
> -
> -static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> -{
> -	trace_hardirqs_on();
> -	/* "mwait %eax, %ecx;" */
> -	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
> -		     :: "a" (eax), "c" (ecx));
> -}
> -
>  extern void select_idle_routine(const struct cpuinfo_x86 *c);
>  extern void init_amd_e400_c1e_mask(void);
>
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index d2b7f27781bc..e69182fd01cf 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
>  }
>  EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>
> -/*
> - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> - * which can obviate IPI to trigger checking of need_resched.
> - * We execute MONITOR against need_resched and enter optimized wait state
> - * through MWAIT. Whenever someone changes need_resched, we would be woken
> - * up from MWAIT (without an IPI).
> - *
> - * New with Core Duo processors, MWAIT can take some hints based on CPU
> - * capability.
> - */
> -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> -{
> -	if (!need_resched()) {
> -		if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> -			clflush((void *)&current_thread_info()->flags);
> -
> -		__monitor((void *)&current_thread_info()->flags, 0, 0);
> -		smp_mb();
> -		if (!need_resched())
> -			__mwait(ax, cx);
> -	}
> -}
> -
>  void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>  {
>  	unsigned int cpu = smp_processor_id();
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index fc6008fbce35..509452a62f96 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -193,10 +193,7 @@ static int power_saving_thread(void *data)
>  					CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>  			stop_critical_timings();
>
> -			__monitor((void *)&current_thread_info()->flags, 0, 0);
> -			smp_mb();
> -			if (!need_resched())
> -				__mwait(power_saving_mwait_eax, 1);
> +			mwait_idle_with_hints(power_saving_mwait_eax, 1);
>
>  			start_critical_timings();
>  			if (lapic_marked_unstable)
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 644516d9bde6..f90c56c8379e 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>  	if (unlikely(!pr))
>  		return -EINVAL;
>
> -	if (cx->entry_method == ACPI_CSTATE_FFH) {
> -		if (current_set_polling_and_test())
> -			return -EINVAL;
> -	}
> -
>  	lapic_timer_state_broadcast(pr, cx, 1);
>  	acpi_idle_do_entry(cx);
>
> @@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>  	if (unlikely(!pr))
>  		return -EINVAL;
>
> -	if (cx->entry_method == ACPI_CSTATE_FFH) {
> -		if (current_set_polling_and_test())
> -			return -EINVAL;
> -	}
> -
>  	/*
>  	 * Must be done before busmaster disable as we might need to
>  	 * access HPET !
> @@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>  		}
>  	}
>
> -	if (cx->entry_method == ACPI_CSTATE_FFH) {
> -		if (current_set_polling_and_test())
> -			return -EINVAL;
> -	}
> -
>  	acpi_unlazy_tlb(smp_processor_id());
>
>  	/* Tell the scheduler that we are going deep-idle: */
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 3226ce98fb18..3b56d76a5bca 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_device *dev,
>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
> -	if (!current_set_polling_and_test()) {
> -
> -		__monitor((void *)&current_thread_info()->flags, 0, 0);
> -		smp_mb();
> -		if (!need_resched())
> -			__mwait(eax, ecx);
> -	}
> +	mwait_idle_with_hints(eax, ecx);
>
>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..e8275f2df9af 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
>  			 */
>  			local_touch_nmi();
>  			stop_critical_timings();
> -			__monitor((void *)&current_thread_info()->flags, 0, 0);
> -			cpu_relax(); /* allow HT sibling to run */
> -			__mwait(eax, ecx);
> +			mwait_idle_with_hints(eax, ecx);
>  			start_critical_timings();
>  			atomic_inc(&idle_wakeup_counter);
>  		}

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 11:31 ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra
  2013-11-19 13:06   ` Rafael J. Wysocki
@ 2013-11-19 13:21   ` Mike Galbraith
  2013-11-19 14:22   ` Arjan van de Ven
  2 siblings, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2013-11-19 13:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, arjan,
	jacob.jun.pan, Ingo Molnar, Thomas Gleixner, hpa

On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote:
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
>
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
>
> Also remove __sti_mwait() as it's unused.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>
> Mike, does this cure your core2?

Nope.  Maybe an acpi/bios thingy on this box, since diags that fired on
lappy during boot did not fire on desktop, and both are core2 booting
the same kernel.

I kinda suspect I'll either be stuck with default_idle or have to
resurrect mwait_idle and carry it locally if I want the thing to work
well.  Guess I'll find out if/when I have time to squabble with it.

Meanwhile, desktop box works fine modulo benchmarks, lappy works fine
too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I
don't care about much.  Neither box is wonderful at rt testing, where I
usually boot max_cstate=1; both just became a bit _less_ wonderful :)

-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 11:31 ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra
  2013-11-19 13:06   ` Rafael J. Wysocki
  2013-11-19 13:21   ` Mike Galbraith
@ 2013-11-19 14:22   ` Arjan van de Ven
  2013-11-19 14:46     ` Peter Zijlstra
  2013-11-19 14:51     ` Peter Zijlstra
  2 siblings, 2 replies; 15+ messages in thread
From: Arjan van de Ven @ 2013-11-19 14:22 UTC (permalink / raw)
To: Peter Zijlstra, lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, jacob.jun.pan, Mike Galbraith, Ingo Molnar,
	Thomas Gleixner, hpa

> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..e8275f2df9af 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
>  			 */
>  			local_touch_nmi();
>  			stop_critical_timings();
> -			__monitor((void *)&current_thread_info()->flags, 0, 0);
> -			cpu_relax(); /* allow HT sibling to run */
> -			__mwait(eax, ecx);
> +			mwait_idle_with_hints(eax, ecx);
>  			start_critical_timings();
>  			atomic_inc(&idle_wakeup_counter);
>  		}

hmm, I take it that mwait_idle_with_hints is the one that also checks
need_resched()?  If so... powerclamp may not want to use that (the
whole point is to NOT give the cpu to tasks!)

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 14:22   ` Arjan van de Ven
@ 2013-11-19 14:46     ` Peter Zijlstra
  2013-11-19 14:51     ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 14:46 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
	jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa

On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> > diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> > index 8f181b3f842b..e8275f2df9af 100644
> > --- a/drivers/thermal/intel_powerclamp.c
> > +++ b/drivers/thermal/intel_powerclamp.c
> > @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> >  			 */
> >  			local_touch_nmi();
> >  			stop_critical_timings();
> > -			__monitor((void *)&current_thread_info()->flags, 0, 0);
> > -			cpu_relax(); /* allow HT sibling to run */
> > -			__mwait(eax, ecx);
> > +			mwait_idle_with_hints(eax, ecx);
> >  			start_critical_timings();
> >  			atomic_inc(&idle_wakeup_counter);
> >  		}
>
> hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give the cpu
> to tasks!)

Of course, you're very much not getting something that wins from
stop_machine or a fifo-99 task.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 14:22   ` Arjan van de Ven
  2013-11-19 14:46     ` Peter Zijlstra
@ 2013-11-19 14:51     ` Peter Zijlstra
  2013-11-19 15:13       ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 14:51 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
	jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa

On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> > diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> > index 8f181b3f842b..e8275f2df9af 100644
> > --- a/drivers/thermal/intel_powerclamp.c
> > +++ b/drivers/thermal/intel_powerclamp.c
> > @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> >  			 */
> >  			local_touch_nmi();
> >  			stop_critical_timings();
> > -			__monitor((void *)&current_thread_info()->flags, 0, 0);
> > -			cpu_relax(); /* allow HT sibling to run */
> > -			__mwait(eax, ecx);
> > +			mwait_idle_with_hints(eax, ecx);
> >  			start_critical_timings();
> >  			atomic_inc(&idle_wakeup_counter);
> >  		}
>
> hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give the cpu
> to tasks!)

That said, that driver is completely wrecked. It uses
preempt_enable_no_resched() wrong too, and it has uncommented barriers...

Dude, wtf are you guys smoking?

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 14:51     ` Peter Zijlstra
@ 2013-11-19 15:13       ` Peter Zijlstra
  2013-11-19 21:06         ` Jacob Pan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-19 15:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
	jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa

On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> That said, that driver is completely wrecked. It uses
> preempt_enable_no_resched() wrong too, it has uncommented barriers..
>
> Dude, wtf are you guys smoking?

---
Subject: sched: Take away preempt_enable_no_resched() and friends from modules

There is no way in hell modules are going to play preemption tricks
like this.

Cc: eliezer.tamir@linux.intel.com
Cc: arjan@linux.intel.com
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 drivers/thermal/intel_powerclamp.c |  2 +-
 include/linux/preempt.h            |  8 +++++++-
 include/net/busy_poll.h            | 15 +++------------
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..0a12ddc2eb4c 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
 			atomic_inc(&idle_wakeup_counter);
 		}
 		tick_nohz_idle_exit();
-		preempt_enable_no_resched();
+		preempt_enable();
 	}
 	del_timer_sync(&wakeup_timer);
 	clear_bit(cpunr, cpu_clamping_mask);
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index a3d9dc8c2c00..3ed2b5335ab4 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -64,7 +64,7 @@ do { \
 } while (0)
 
 #else
-#define preempt_enable()	preempt_enable_no_resched()
+#define preempt_enable()	sched_preempt_enable_no_resched()
 #define preempt_check_resched()	do { } while (0)
 #endif
 
@@ -116,6 +116,12 @@ do { \
 
 #endif /* CONFIG_PREEMPT_COUNT */
 
+#ifdef MODULE
+#undef preempt_enable_no_resched
+#undef preempt_enable_no_resched_notrace
+#undef preempt_check_resched
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 
 struct preempt_notifier;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 829627d7b846..756827a86c2d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
 	return sysctl_net_busy_poll;
 }
 
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
 static inline u64 busy_loop_us_clock(void)
 {
 	u64 rc;
 
+	/* XXX with interrupts enabled sched_clock() can return utter garbage */
+
 	preempt_disable_notrace();
 	rc = sched_clock();
-	preempt_enable_no_resched_notrace();
+	preempt_enable_notrace();
 
 	return rc >> 10;
 }
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
-	return sched_clock() >> 10;
-}
-#endif /* CONFIG_DEBUG_PREEMPT */
 
 static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
 {

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
  2013-11-19 15:13       ` Peter Zijlstra
@ 2013-11-19 21:06         ` Jacob Pan
  2013-11-20 10:28           ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Jacob Pan @ 2013-11-19 21:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel, shaohua.li,
	rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa

On Tue, 19 Nov 2013 16:13:38 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > That said, that driver is completely wrecked. It uses
> > preempt_enable_no_resched() wrong too, it has uncommented barriers..
> >
> > Dude, wtf are you guys smoking?
>
I applied this patch on top of the upstream kernel (801a760) and found
that my machine completely failed to enter idle when nothing is running.
turbostat shows 100% C0. ftrace shows the kernel coming in and out of
idle frequently.

Both ACPI idle and intel_idle behave the same way. I have to make the
following change to allow entering C-states again.

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 80014da..b51d1e1 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 		clflush((void *)&current_thread_info()->flags);
 
 	__monitor((void *)&current_thread_info()->flags, 0, 0);
-	if (!current_set_polling_and_test())
+	if (!need_resched())
 		__mwait(eax, ecx);
-
-	__current_clr_polling();
 }
 
 #endif /* _ASM_X86_MWAIT_H */

Did I miss any other patches?

Jacob

> ---
> Subject: sched: Take away preempt_enable_no_resched() and friends
> from modules
>
> There is no way in hell modules are going to play preemption tricks
> like this.
>
> Cc: eliezer.tamir@linux.intel.com
> Cc: arjan@linux.intel.com
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  drivers/thermal/intel_powerclamp.c |  2 +-
>  include/linux/preempt.h            |  8 +++++++-
>  include/net/busy_poll.h            | 15 +++------------
>  3 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..0a12ddc2eb4c 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
>  			atomic_inc(&idle_wakeup_counter);
>  		}
>  		tick_nohz_idle_exit();
> -		preempt_enable_no_resched();
> +		preempt_enable();
>  	}
>  	del_timer_sync(&wakeup_timer);
>  	clear_bit(cpunr, cpu_clamping_mask);
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index a3d9dc8c2c00..3ed2b5335ab4 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -64,7 +64,7 @@ do { \
>  } while (0)
>
>  #else
> -#define preempt_enable()	preempt_enable_no_resched()
> +#define preempt_enable()	sched_preempt_enable_no_resched()
>  #define preempt_check_resched()	do { } while (0)
>  #endif
>
> @@ -116,6 +116,12 @@ do { \
>
>  #endif /* CONFIG_PREEMPT_COUNT */
>
> +#ifdef MODULE
> +#undef preempt_enable_no_resched
> +#undef preempt_enable_no_resched_notrace
> +#undef preempt_check_resched
> +#endif
> +
>  #ifdef CONFIG_PREEMPT_NOTIFIERS
>
>  struct preempt_notifier;
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 829627d7b846..756827a86c2d 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
>  	return sysctl_net_busy_poll;
>  }
>
> -/* a wrapper to make debug_smp_processor_id() happy
> - * we can use sched_clock() because we don't care much about precision
> - * we only care that the average is bounded
> - */
> -#ifdef CONFIG_DEBUG_PREEMPT
> static inline u64 busy_loop_us_clock(void)
> {
> 	u64 rc;
>
> +	/* XXX with interrupts enabled sched_clock() can return utter garbage */
> +
> 	preempt_disable_notrace();
> 	rc = sched_clock();
> -	preempt_enable_no_resched_notrace();
> +	preempt_enable_notrace();
>
> 	return rc >> 10;
> }
> -#else /* CONFIG_DEBUG_PREEMPT */
> -static inline u64 busy_loop_us_clock(void)
> -{
> -	return sched_clock() >> 10;
> -}
> -#endif /* CONFIG_DEBUG_PREEMPT */
>
> static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
> {

[Jacob Pan]

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines 2013-11-19 21:06 ` Jacob Pan @ 2013-11-20 10:28 ` Peter Zijlstra 2013-11-20 10:58 ` Peter Zijlstra 0 siblings, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2013-11-20 10:28 UTC (permalink / raw) To: Jacob Pan Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: > On Tue, 19 Nov 2013 16:13:38 +0100 > Peter Zijlstra <peterz@infradead.org> wrote: > > > On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote: > > > That said, that driver is completely wrecked. It uses > > > preempt_enable_no_resched() wrong too, it has uncommented barriers.. > > > > > > Dude, wtf are you guys smoking? > > > I applied this patch on top of upstream kernel (801a760) and found out > my machine completely failed to enter idle when nothing is running. > turbostat shows 100% C0. ftrace shows kernel coming in and out of idle > frequently. > > Both ACPI idle and intel_idle behave the same way. I have to do the > following change to allow entering C-states again. > > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h > index 80014da..b51d1e1 100644 > --- a/arch/x86/include/asm/mwait.h > +++ b/arch/x86/include/asm/mwait.h > @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned > long eax, unsigned long ecx) clflush((void > *)&current_thread_info()->flags); > __monitor((void *)&current_thread_info()->flags, 0, 0); > - if (!current_set_polling_and_test()) > + if (!need_resched()) > __mwait(eax, ecx); > - > - __current_clr_polling(); > } > > #endif /* _ASM_X86_MWAIT_H */ That doesn't make any sense; current_set_polling_and_test() returns the same thing need_resched() does. But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% C1 residency... most weird. /me goes prod at it
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines 2013-11-20 10:28 ` Peter Zijlstra @ 2013-11-20 10:58 ` Peter Zijlstra 2013-11-20 16:24 ` Arjan van de Ven 0 siblings, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2013-11-20 10:58 UTC (permalink / raw) To: Jacob Pan Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: > On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: > > I applied this patch on top of upstream kernel (801a760) and found out > > my machine completely failed to enter idle when nothing is running. > > turbostat shows 100% C0. ftrace shows kernel coming in and out of idle > > frequently. > > > > Both ACPI idle and intel_idle behave the same way. I have to do the > > following change to allow entering C-states again. > That doesn't make any sense; current_set_polling_and_test() returns the > same thing need_resched() does. > > But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% > C1 residency... most weird. So pretty silly actually; you cannot do a store (any store) in between monitor and mwait. The below version seems to work properly again with both acpi_idle and intel_idle. Now to go make that preempt_disable_no_resched cleanup compile.. :-) --- Subject: x86, acpi, idle: Restructure the mwait idle routines From: Peter Zijlstra <peterz@infradead.org> Date: Tue, 19 Nov 2013 12:31:53 +0100 People seem to delight in writing wrong and broken mwait idle routines; collapse the lot. This leaves mwait_play_dead() the sole remaining user of __mwait() and new __mwait() users are probably doing it wrong. Also remove __sti_mwait() as it's unused. 
Cc: arjan@linux.intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Cc: lenb@kernel.org Cc: shaohua.li@intel.com Cc: rui.zhang@intel.com Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> --- arch/x86/include/asm/mwait.h | 40 +++++++++++++++++++++++++++++++++++++ arch/x86/include/asm/processor.h | 23 --------------------- arch/x86/kernel/acpi/cstate.c | 23 --------------------- drivers/acpi/acpi_pad.c | 5 ---- drivers/acpi/processor_idle.c | 15 ------------- drivers/idle/intel_idle.c | 8 ------- drivers/thermal/intel_powerclamp.c | 4 --- 7 files changed, 43 insertions(+), 75 deletions(-) --- a/arch/x86/include/asm/mwait.h +++ b/arch/x86/include/asm/mwait.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_MWAIT_H #define _ASM_X86_MWAIT_H +#include <linux/sched.h> + #define MWAIT_SUBSTATE_MASK 0xf #define MWAIT_CSTATE_MASK 0xf #define MWAIT_SUBSTATE_SIZE 4 @@ -13,4 +15,42 @@ #define MWAIT_ECX_INTERRUPT_BREAK 0x1 +static inline void __monitor(const void *eax, unsigned long ecx, + unsigned long edx) +{ + /* "monitor %eax, %ecx, %edx;" */ + asm volatile(".byte 0x0f, 0x01, 0xc8;" + :: "a" (eax), "c" (ecx), "d"(edx)); +} + +static inline void __mwait(unsigned long eax, unsigned long ecx) +{ + /* "mwait %eax, %ecx;" */ + asm volatile(".byte 0x0f, 0x01, 0xc9;" + :: "a" (eax), "c" (ecx)); +} + +/* + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, + * which can obviate IPI to trigger checking of need_resched. + * We execute MONITOR against need_resched and enter optimized wait state + * through MWAIT. Whenever someone changes need_resched, we would be woken + * up from MWAIT (without an IPI). + * + * New with Core Duo processors, MWAIT can take some hints based on CPU + * capability. 
+ */ +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx) +{ + if (!current_set_polling_and_test()) { + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)&current_thread_info()->flags); + + __monitor((void *)&current_thread_info()->flags, 0, 0); + if (!need_resched()) + __mwait(eax, ecx); + } + __current_clr_polling(); +} + #endif /* _ASM_X86_MWAIT_H */ --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -700,29 +700,6 @@ static inline void sync_core(void) #endif } -static inline void __monitor(const void *eax, unsigned long ecx, - unsigned long edx) -{ - /* "monitor %eax, %ecx, %edx;" */ - asm volatile(".byte 0x0f, 0x01, 0xc8;" - :: "a" (eax), "c" (ecx), "d"(edx)); -} - -static inline void __mwait(unsigned long eax, unsigned long ecx) -{ - /* "mwait %eax, %ecx;" */ - asm volatile(".byte 0x0f, 0x01, 0xc9;" - :: "a" (eax), "c" (ecx)); -} - -static inline void __sti_mwait(unsigned long eax, unsigned long ecx) -{ - trace_hardirqs_on(); - /* "mwait %eax, %ecx;" */ - asm volatile("sti; .byte 0x0f, 0x01, 0xc9;" - :: "a" (eax), "c" (ecx)); -} - extern void select_idle_routine(const struct cpuinfo_x86 *c); extern void init_amd_e400_c1e_mask(void); --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi } EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe); -/* - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI, - * which can obviate IPI to trigger checking of need_resched. - * We execute MONITOR against need_resched and enter optimized wait state - * through MWAIT. Whenever someone changes need_resched, we would be woken - * up from MWAIT (without an IPI). - * - * New with Core Duo processors, MWAIT can take some hints based on CPU - * capability. 
- */ -void mwait_idle_with_hints(unsigned long ax, unsigned long cx) -{ - if (!need_resched()) { - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) - clflush((void *)&current_thread_info()->flags); - - __monitor((void *)&current_thread_info()->flags, 0, 0); - smp_mb(); - if (!need_resched()) - __mwait(ax, cx); - } -} - void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) { unsigned int cpu = smp_processor_id(); --- a/drivers/acpi/acpi_pad.c +++ b/drivers/acpi/acpi_pad.c @@ -193,10 +193,7 @@ static int power_saving_thread(void *dat CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu); stop_critical_timings(); - __monitor((void *)&current_thread_info()->flags, 0, 0); - smp_mb(); - if (!need_resched()) - __mwait(power_saving_mwait_eax, 1); + mwait_idle_with_hints(power_saving_mwait_eax, 1); start_critical_timings(); if (lapic_marked_unstable) --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpu if (unlikely(!pr)) return -EINVAL; - if (cx->entry_method == ACPI_CSTATE_FFH) { - if (current_set_polling_and_test()) - return -EINVAL; - } - lapic_timer_state_broadcast(pr, cx, 1); acpi_idle_do_entry(cx); @@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct if (unlikely(!pr)) return -EINVAL; - if (cx->entry_method == ACPI_CSTATE_FFH) { - if (current_set_polling_and_test()) - return -EINVAL; - } - /* * Must be done before busmaster disable as we might need to * access HPET ! 
@@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpu } } - if (cx->entry_method == ACPI_CSTATE_FFH) { - if (current_set_polling_and_test()) - return -EINVAL; - } - acpi_unlazy_tlb(smp_processor_id()); /* Tell the scheduler that we are going deep-idle: */ --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_dev if (!(lapic_timer_reliable_states & (1 << (cstate)))) clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu); - if (!current_set_polling_and_test()) { - - __monitor((void *)&current_thread_info()->flags, 0, 0); - smp_mb(); - if (!need_resched()) - __mwait(eax, ecx); - } + mwait_idle_with_hints(eax, ecx); if (!(lapic_timer_reliable_states & (1 << (cstate)))) clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu); --- a/drivers/thermal/intel_powerclamp.c +++ b/drivers/thermal/intel_powerclamp.c @@ -438,9 +438,7 @@ static int clamp_thread(void *arg) */ local_touch_nmi(); stop_critical_timings(); - __monitor((void *)&current_thread_info()->flags, 0, 0); - cpu_relax(); /* allow HT sibling to run */ - __mwait(eax, ecx); + mwait_idle_with_hints(eax, ecx); start_critical_timings(); atomic_inc(&idle_wakeup_counter); }
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines 2013-11-20 10:58 ` Peter Zijlstra @ 2013-11-20 16:24 ` Arjan van de Ven 2013-11-20 16:33 ` Peter Zijlstra 0 siblings, 1 reply; 15+ messages in thread From: Arjan van de Ven @ 2013-11-20 16:24 UTC (permalink / raw) To: Peter Zijlstra, Jacob Pan Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa On 11/20/2013 2:58 AM, Peter Zijlstra wrote: > On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote: >> On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote: >>> I applied this patch on top of upstream kernel (801a760) and found out >>> my machine completely failed to enter idle when nothing is running. >>> turbostat shows 100% C0. ftrace shows kernel coming in and out of idle >>> frequently. >>> >>> Both ACPI idle and intel_idle behave the same way. I have to do the >>> following change to allow entering C-states again. > >> That doesn't make any sense; current_set_polling_and_test() returns the >> same thing need_resched() does. >> >> But you're right, intel_idle resides 100% in C0 and acpi_idle has 100% >> C1 residency... most weird. > > So pretty silly actually; you cannot do a store (any store) in between > monitor and mwait. you can just not to the cacheline you are watching (or things that alias with that)
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines 2013-11-20 16:24 ` Arjan van de Ven @ 2013-11-20 16:33 ` Peter Zijlstra 2013-11-20 16:38 ` Arjan van de Ven 0 siblings, 1 reply; 15+ messages in thread From: Peter Zijlstra @ 2013-11-20 16:33 UTC (permalink / raw) To: Arjan van de Ven Cc: Jacob Pan, lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: > On 11/20/2013 2:58 AM, Peter Zijlstra wrote: > >So pretty silly actually; you cannot do a store (any store) in between > >monitor and mwait. > > you can > just not to the cacheline you are watching (or things that alias with that) Ah indeed, the thread_info::status is in the same cacheline.
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines 2013-11-20 16:33 ` Peter Zijlstra @ 2013-11-20 16:38 ` Arjan van de Ven 0 siblings, 0 replies; 15+ messages in thread From: Arjan van de Ven @ 2013-11-20 16:38 UTC (permalink / raw) To: Peter Zijlstra Cc: Jacob Pan, lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa On 11/20/2013 8:33 AM, Peter Zijlstra wrote: > On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote: >> On 11/20/2013 2:58 AM, Peter Zijlstra wrote: >>> So pretty silly actually; you cannot do a store (any store) in between >>> monitor and mwait. >> >> you can >> just not to the cacheline you are watching (or things that alias with that) > > Ah indeed, the thread_info::status is in the same cacheline. > you told it to wake on any write to that cacheline and then you write to it ;-)
end of thread, other threads:[~2013-11-20 16:38 UTC | newest] Thread overview: 15+ messages -- 2013-11-19 9:00 acpi_pad mwait usage Peter Zijlstra 2013-11-19 9:08 ` Peter Zijlstra 2013-11-19 11:31 ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra 2013-11-19 13:06 ` Rafael J. Wysocki 2013-11-19 13:21 ` Mike Galbraith 2013-11-19 14:22 ` Arjan van de Ven 2013-11-19 14:46 ` Peter Zijlstra 2013-11-19 14:51 ` Peter Zijlstra 2013-11-19 15:13 ` Peter Zijlstra 2013-11-19 21:06 ` Jacob Pan 2013-11-20 10:28 ` Peter Zijlstra 2013-11-20 10:58 ` Peter Zijlstra 2013-11-20 16:24 ` Arjan van de Ven 2013-11-20 16:33 ` Peter Zijlstra 2013-11-20 16:38 ` Arjan van de Ven