* acpi_pad mwait usage
From: Peter Zijlstra @ 2013-11-19 9:00 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel; +Cc: shaohua.li
Hi Len, Rafael,
I stumbled over acpi_pad (yuck! ;-), but noticed that you don't set the
task in polling mode while using mwait. This means we'll still happily
send an IPI to wake you up.
A little something like the below should do; you might even be able to
remove the smp_mb() but since it is completely undocumented (another
fail) I couldn't tell if the implied barrier in
current_set_polling_and_test() suffices to replace it, so I left it.
If it compiles and works, change it to a proper SOB:
Maybe-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
drivers/acpi/acpi_pad.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index fc6008fbce35..e9126efe7786 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -195,8 +195,10 @@ static int power_saving_thread(void *data)
__monitor((void *)&current_thread_info()->flags, 0, 0);
smp_mb();
- if (!need_resched())
+ if (!current_set_polling_and_test()) {
__mwait(power_saving_mwait_eax, 1);
+ __current_clr_polling();
+ }
start_critical_timings();
if (lapic_marked_unstable)
* Re: acpi_pad mwait usage
From: Peter Zijlstra @ 2013-11-19 9:08 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, arjan, jacob.jun.pan
On Tue, Nov 19, 2013 at 10:00:19AM +0100, Peter Zijlstra wrote:
> Hi Len, Rafael,
>
> I stumbled over acpi_pad (yuck! ;-), but noticed that you don't set the
> task in polling mode while using mwait. This means we'll still happily
> send an IPI to wake you up.
>
> A little something like the below should do; you might even be able to
> remove the smp_mb() but since it is completely undocumented (another
> fail) I couldn't tell if the implied barrier in
> current_set_polling_and_test() suffices to replace it, so I left it.
>
> If it compiles and works, change it to a proper SOB:
>
> Maybe-Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Of course one such driver is insufficient and you lot needed a second;
same issue:
---
drivers/thermal/intel_powerclamp.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..d8111e1ac62e 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -439,8 +439,11 @@ static int clamp_thread(void *arg)
local_touch_nmi();
stop_critical_timings();
__monitor((void *)&current_thread_info()->flags, 0, 0);
- cpu_relax(); /* allow HT sibling to run */
- __mwait(eax, ecx);
+ if (!current_set_polling_and_test()) {
+ cpu_relax(); /* allow HT sibling to run */
+ __mwait(eax, ecx);
+ __current_clr_polling();
+ }
start_critical_timings();
atomic_inc(&idle_wakeup_counter);
}
* [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra @ 2013-11-19 11:31 UTC (permalink / raw)
To: lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, arjan, jacob.jun.pan, Mike Galbraith,
Ingo Molnar, Thomas Gleixner, hpa
People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.
This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.
Also remove __sti_mwait() as it's unused.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
Mike, does this cure your core2?
arch/x86/include/asm/mwait.h | 42 ++++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/processor.h | 23 ---------------------
arch/x86/kernel/acpi/cstate.c | 23 ---------------------
drivers/acpi/acpi_pad.c | 5 +----
drivers/acpi/processor_idle.c | 15 --------------
drivers/idle/intel_idle.c | 8 +-------
drivers/thermal/intel_powerclamp.c | 4 +---
7 files changed, 45 insertions(+), 75 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 2f366d0ac6b4..80014dade987 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_MWAIT_H
#define _ASM_X86_MWAIT_H
+#include <linux/sched.h>
+
#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK 0xf
#define MWAIT_SUBSTATE_SIZE 4
@@ -13,4 +15,44 @@
#define MWAIT_ECX_INTERRUPT_BREAK 0x1
+static inline void __monitor(const void *eax, unsigned long ecx,
+ unsigned long edx)
+{
+ /* "monitor %eax, %ecx, %edx;" */
+ asm volatile(".byte 0x0f, 0x01, 0xc8;"
+ :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
+static inline void __mwait(unsigned long eax, unsigned long ecx)
+{
+ /* "mwait %eax, %ecx;" */
+ asm volatile(".byte 0x0f, 0x01, 0xc9;"
+ :: "a" (eax), "c" (ecx));
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+{
+ if (need_resched())
+ return;
+
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+ clflush((void *)&current_thread_info()->flags);
+
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
+ if (!current_set_polling_and_test())
+ __mwait(eax, ecx);
+
+ __current_clr_polling();
+}
+
#endif /* _ASM_X86_MWAIT_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 7b034a4057f9..24821f5768bc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -700,29 +700,6 @@ static inline void sync_core(void)
#endif
}
-static inline void __monitor(const void *eax, unsigned long ecx,
- unsigned long edx)
-{
- /* "monitor %eax, %ecx, %edx;" */
- asm volatile(".byte 0x0f, 0x01, 0xc8;"
- :: "a" (eax), "c" (ecx), "d"(edx));
-}
-
-static inline void __mwait(unsigned long eax, unsigned long ecx)
-{
- /* "mwait %eax, %ecx;" */
- asm volatile(".byte 0x0f, 0x01, 0xc9;"
- :: "a" (eax), "c" (ecx));
-}
-
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
-{
- trace_hardirqs_on();
- /* "mwait %eax, %ecx;" */
- asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
- :: "a" (eax), "c" (ecx));
-}
-
extern void select_idle_routine(const struct cpuinfo_x86 *c);
extern void init_amd_e400_c1e_mask(void);
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index d2b7f27781bc..e69182fd01cf 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
-/*
- * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
- * which can obviate IPI to trigger checking of need_resched.
- * We execute MONITOR against need_resched and enter optimized wait state
- * through MWAIT. Whenever someone changes need_resched, we would be woken
- * up from MWAIT (without an IPI).
- *
- * New with Core Duo processors, MWAIT can take some hints based on CPU
- * capability.
- */
-void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
-{
- if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
- clflush((void *)&current_thread_info()->flags);
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(ax, cx);
- }
-}
-
void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index fc6008fbce35..509452a62f96 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -193,10 +193,7 @@ static int power_saving_thread(void *data)
CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
stop_critical_timings();
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(power_saving_mwait_eax, 1);
+ mwait_idle_with_hints(power_saving_mwait_eax, 1);
start_critical_timings();
if (lapic_marked_unstable)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 644516d9bde6..f90c56c8379e 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
if (unlikely(!pr))
return -EINVAL;
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
lapic_timer_state_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
@@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
if (unlikely(!pr))
return -EINVAL;
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
/*
* Must be done before busmaster disable as we might need to
* access HPET !
@@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
}
}
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
acpi_unlazy_tlb(smp_processor_id());
/* Tell the scheduler that we are going deep-idle: */
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3226ce98fb18..3b56d76a5bca 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_device *dev,
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
- if (!current_set_polling_and_test()) {
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(eax, ecx);
- }
+ mwait_idle_with_hints(eax, ecx);
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..e8275f2df9af 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
*/
local_touch_nmi();
stop_critical_timings();
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- cpu_relax(); /* allow HT sibling to run */
- __mwait(eax, ecx);
+ mwait_idle_with_hints(eax, ecx);
start_critical_timings();
atomic_inc(&idle_wakeup_counter);
}
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Rafael J. Wysocki @ 2013-11-19 13:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lenb, linux-acpi, linux-kernel, shaohua.li, rui.zhang, arjan,
jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On Tuesday, November 19, 2013 12:31:53 PM Peter Zijlstra wrote:
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
>
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
>
> Also remove __sti_mwait() as it's unused.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
For the ACPI part:
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> Mike, does this cure your core2?
>
> arch/x86/include/asm/mwait.h | 42 ++++++++++++++++++++++++++++++++++++++
> arch/x86/include/asm/processor.h | 23 ---------------------
> arch/x86/kernel/acpi/cstate.c | 23 ---------------------
> drivers/acpi/acpi_pad.c | 5 +----
> drivers/acpi/processor_idle.c | 15 --------------
> drivers/idle/intel_idle.c | 8 +-------
> drivers/thermal/intel_powerclamp.c | 4 +---
> 7 files changed, 45 insertions(+), 75 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 2f366d0ac6b4..80014dade987 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -1,6 +1,8 @@
> #ifndef _ASM_X86_MWAIT_H
> #define _ASM_X86_MWAIT_H
>
> +#include <linux/sched.h>
> +
> #define MWAIT_SUBSTATE_MASK 0xf
> #define MWAIT_CSTATE_MASK 0xf
> #define MWAIT_SUBSTATE_SIZE 4
> @@ -13,4 +15,44 @@
>
> #define MWAIT_ECX_INTERRUPT_BREAK 0x1
>
> +static inline void __monitor(const void *eax, unsigned long ecx,
> + unsigned long edx)
> +{
> + /* "monitor %eax, %ecx, %edx;" */
> + asm volatile(".byte 0x0f, 0x01, 0xc8;"
> + :: "a" (eax), "c" (ecx), "d"(edx));
> +}
> +
> +static inline void __mwait(unsigned long eax, unsigned long ecx)
> +{
> + /* "mwait %eax, %ecx;" */
> + asm volatile(".byte 0x0f, 0x01, 0xc9;"
> + :: "a" (eax), "c" (ecx));
> +}
> +
> +/*
> + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> + * which can obviate IPI to trigger checking of need_resched.
> + * We execute MONITOR against need_resched and enter optimized wait state
> + * through MWAIT. Whenever someone changes need_resched, we would be woken
> + * up from MWAIT (without an IPI).
> + *
> + * New with Core Duo processors, MWAIT can take some hints based on CPU
> + * capability.
> + */
> +static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> +{
> + if (need_resched())
> + return;
> +
> + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> + clflush((void *)&current_thread_info()->flags);
> +
> + __monitor((void *)&current_thread_info()->flags, 0, 0);
> + if (!current_set_polling_and_test())
> + __mwait(eax, ecx);
> +
> + __current_clr_polling();
> +}
> +
> #endif /* _ASM_X86_MWAIT_H */
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 7b034a4057f9..24821f5768bc 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -700,29 +700,6 @@ static inline void sync_core(void)
> #endif
> }
>
> -static inline void __monitor(const void *eax, unsigned long ecx,
> - unsigned long edx)
> -{
> - /* "monitor %eax, %ecx, %edx;" */
> - asm volatile(".byte 0x0f, 0x01, 0xc8;"
> - :: "a" (eax), "c" (ecx), "d"(edx));
> -}
> -
> -static inline void __mwait(unsigned long eax, unsigned long ecx)
> -{
> - /* "mwait %eax, %ecx;" */
> - asm volatile(".byte 0x0f, 0x01, 0xc9;"
> - :: "a" (eax), "c" (ecx));
> -}
> -
> -static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> -{
> - trace_hardirqs_on();
> - /* "mwait %eax, %ecx;" */
> - asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
> - :: "a" (eax), "c" (ecx));
> -}
> -
> extern void select_idle_routine(const struct cpuinfo_x86 *c);
> extern void init_amd_e400_c1e_mask(void);
>
> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
> index d2b7f27781bc..e69182fd01cf 100644
> --- a/arch/x86/kernel/acpi/cstate.c
> +++ b/arch/x86/kernel/acpi/cstate.c
> @@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
> }
> EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
>
> -/*
> - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> - * which can obviate IPI to trigger checking of need_resched.
> - * We execute MONITOR against need_resched and enter optimized wait state
> - * through MWAIT. Whenever someone changes need_resched, we would be woken
> - * up from MWAIT (without an IPI).
> - *
> - * New with Core Duo processors, MWAIT can take some hints based on CPU
> - * capability.
> - */
> -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> -{
> - if (!need_resched()) {
> - if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
> - clflush((void *)&current_thread_info()->flags);
> -
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __mwait(ax, cx);
> - }
> -}
> -
> void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
> {
> unsigned int cpu = smp_processor_id();
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index fc6008fbce35..509452a62f96 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -193,10 +193,7 @@ static int power_saving_thread(void *data)
> CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
> stop_critical_timings();
>
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __mwait(power_saving_mwait_eax, 1);
> + mwait_idle_with_hints(power_saving_mwait_eax, 1);
>
> start_critical_timings();
> if (lapic_marked_unstable)
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 644516d9bde6..f90c56c8379e 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
> if (unlikely(!pr))
> return -EINVAL;
>
> - if (cx->entry_method == ACPI_CSTATE_FFH) {
> - if (current_set_polling_and_test())
> - return -EINVAL;
> - }
> -
> lapic_timer_state_broadcast(pr, cx, 1);
> acpi_idle_do_entry(cx);
>
> @@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
> if (unlikely(!pr))
> return -EINVAL;
>
> - if (cx->entry_method == ACPI_CSTATE_FFH) {
> - if (current_set_polling_and_test())
> - return -EINVAL;
> - }
> -
> /*
> * Must be done before busmaster disable as we might need to
> * access HPET !
> @@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
> }
> }
>
> - if (cx->entry_method == ACPI_CSTATE_FFH) {
> - if (current_set_polling_and_test())
> - return -EINVAL;
> - }
> -
> acpi_unlazy_tlb(smp_processor_id());
>
> /* Tell the scheduler that we are going deep-idle: */
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 3226ce98fb18..3b56d76a5bca 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_device *dev,
> if (!(lapic_timer_reliable_states & (1 << (cstate))))
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>
> - if (!current_set_polling_and_test()) {
> -
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __mwait(eax, ecx);
> - }
> + mwait_idle_with_hints(eax, ecx);
>
> if (!(lapic_timer_reliable_states & (1 << (cstate))))
> clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..e8275f2df9af 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> */
> local_touch_nmi();
> stop_critical_timings();
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - cpu_relax(); /* allow HT sibling to run */
> - __mwait(eax, ecx);
> + mwait_idle_with_hints(eax, ecx);
> start_critical_timings();
> atomic_inc(&idle_wakeup_counter);
> }
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Mike Galbraith @ 2013-11-19 13:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
arjan, jacob.jun.pan, Ingo Molnar, Thomas Gleixner, hpa
On Tue, 2013-11-19 at 12:31 +0100, Peter Zijlstra wrote:
> People seem to delight in writing wrong and broken mwait idle routines;
> collapse the lot.
>
> This leaves mwait_play_dead() the sole remaining user of __mwait() and
> new __mwait() users are probably doing it wrong.
>
> Also remove __sti_mwait() as it's unused.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>
> Mike, does this cure your core2?
Nope. Maybe an acpi/bios thingy on this box since diags that fired on
lappy during boot did not fire on desktop, and both are core2 booting
the same kernel. I kinda suspect I'll either be stuck with default_idle or
have to resurrect mwait_idle and carry it locally if I want the thing to
work well. Guess I'll find out if/when I have time to squabble with it.
Meanwhile, desktop box works fine modulo benchmarks, lappy works fine
too, modulo max_cstate=1 scaring mwait_idle_with_hints away, which I
don't care about much. Neither box is wonderful at rt testing where I
usually boot max_cstate=1, both just became a bit _less_ wonderful :)
-Mike
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Arjan van de Ven @ 2013-11-19 14:22 UTC (permalink / raw)
To: Peter Zijlstra, lenb, rjw, linux-acpi, linux-kernel
Cc: shaohua.li, rui.zhang, jacob.jun.pan, Mike Galbraith,
Ingo Molnar, Thomas Gleixner, hpa
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..e8275f2df9af 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> */
> local_touch_nmi();
> stop_critical_timings();
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - cpu_relax(); /* allow HT sibling to run */
> - __mwait(eax, ecx);
> + mwait_idle_with_hints(eax, ecx);
> start_critical_timings();
> atomic_inc(&idle_wakeup_counter);
> }
>
Hmm, I take it that mwait_idle_with_hints() is the one that also checks need_resched()?
If so, powerclamp may not want to use that (the whole point is to NOT give the CPU
to tasks!)
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra @ 2013-11-19 14:46 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> >diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> >index 8f181b3f842b..e8275f2df9af 100644
> >--- a/drivers/thermal/intel_powerclamp.c
> >+++ b/drivers/thermal/intel_powerclamp.c
> >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> > */
> > local_touch_nmi();
> > stop_critical_timings();
> >- __monitor((void *)&current_thread_info()->flags, 0, 0);
> >- cpu_relax(); /* allow HT sibling to run */
> >- __mwait(eax, ecx);
> >+ mwait_idle_with_hints(eax, ecx);
> > start_critical_timings();
> > atomic_inc(&idle_wakeup_counter);
> > }
> >
>
> hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give the cpu
> to tasks!)
Of course, you're very much not getting something that wins from
stop_machine or a fifo-99 task.
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra @ 2013-11-19 14:51 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On Tue, Nov 19, 2013 at 06:22:43AM -0800, Arjan van de Ven wrote:
> >diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> >index 8f181b3f842b..e8275f2df9af 100644
> >--- a/drivers/thermal/intel_powerclamp.c
> >+++ b/drivers/thermal/intel_powerclamp.c
> >@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
> > */
> > local_touch_nmi();
> > stop_critical_timings();
> >- __monitor((void *)&current_thread_info()->flags, 0, 0);
> >- cpu_relax(); /* allow HT sibling to run */
> >- __mwait(eax, ecx);
> >+ mwait_idle_with_hints(eax, ecx);
> > start_critical_timings();
> > atomic_inc(&idle_wakeup_counter);
> > }
> >
>
> hmm I take it that mwait_idle_with_hints is the one that also checks need_resched() ?
> if so... powerclamp may not want to use that (the whole point is to NOT give the cpu
> to tasks!)
That said, that driver is completely wrecked. It uses
preempt_enable_no_resched() wrong too, and it has uncommented barriers.
Dude, wtf are you guys smoking?
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra @ 2013-11-19 15:13 UTC (permalink / raw)
To: Arjan van de Ven
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
jacob.jun.pan, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> That said, that driver is completely wrecked. It uses
> preempt_enable_no_resched() wrong too, it has uncommented barriers..
>
> Dude, wtf are you guys smoking?
---
Subject: sched: Take away preempt_enable_no_resched() and friends from modules
There is no way in hell modules are going to play preemption tricks like
this.
Cc: eliezer.tamir@linux.intel.com
Cc: arjan@linux.intel.com
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
drivers/thermal/intel_powerclamp.c | 2 +-
include/linux/preempt.h | 8 +++++++-
include/net/busy_poll.h | 15 +++------------
3 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 8f181b3f842b..0a12ddc2eb4c 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
atomic_inc(&idle_wakeup_counter);
}
tick_nohz_idle_exit();
- preempt_enable_no_resched();
+ preempt_enable();
}
del_timer_sync(&wakeup_timer);
clear_bit(cpunr, cpu_clamping_mask);
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index a3d9dc8c2c00..3ed2b5335ab4 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -64,7 +64,7 @@ do { \
} while (0)
#else
-#define preempt_enable() preempt_enable_no_resched()
+#define preempt_enable() sched_preempt_enable_no_resched()
#define preempt_check_resched() do { } while (0)
#endif
@@ -116,6 +116,12 @@ do { \
#endif /* CONFIG_PREEMPT_COUNT */
+#ifdef MODULE
+#undef preempt_enable_no_resched
+#undef preempt_enable_no_resched_notrace
+#undef preempt_check_resched
+#endif
+
#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 829627d7b846..756827a86c2d 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
return sysctl_net_busy_poll;
}
-/* a wrapper to make debug_smp_processor_id() happy
- * we can use sched_clock() because we don't care much about precision
- * we only care that the average is bounded
- */
-#ifdef CONFIG_DEBUG_PREEMPT
static inline u64 busy_loop_us_clock(void)
{
u64 rc;
+ /* XXX with interrupts enabled sched_clock() can return utter garbage */
+
preempt_disable_notrace();
rc = sched_clock();
- preempt_enable_no_resched_notrace();
+ preempt_enable_notrace();
return rc >> 10;
}
-#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 busy_loop_us_clock(void)
-{
- return sched_clock() >> 10;
-}
-#endif /* CONFIG_DEBUG_PREEMPT */
static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
{
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
From: Jacob Pan @ 2013-11-19 21:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel,
shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar,
Thomas Gleixner, hpa
On Tue, 19 Nov 2013 16:13:38 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > That said, that driver is completely wrecked. It uses
> > preempt_enable_no_resched() wrong too, it has uncommented barriers..
> >
> > Dude, wtf are you guys smoking?
>
I applied this patch on top of the upstream kernel (801a760) and found
that my machine completely fails to enter idle when nothing is running.
turbostat shows 100% C0, and ftrace shows the kernel coming in and out of
idle frequently.
Both ACPI idle and intel_idle behave the same way. I had to make the
following change to allow entering C-states again.
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 80014da..b51d1e1 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
clflush((void *)&current_thread_info()->flags);
__monitor((void *)&current_thread_info()->flags, 0, 0);
- if (!current_set_polling_and_test())
+ if (!need_resched())
__mwait(eax, ecx);
-
- __current_clr_polling();
}
#endif /* _ASM_X86_MWAIT_H */
Did I miss any other patches?
Jacob
> ---
> Subject: sched: Take away preempt_enable_no_resched() and friends from modules
>
> There is no way in hell modules are going to play preemption tricks
> like this.
>
> Cc: eliezer.tamir@linux.intel.com
> Cc: arjan@linux.intel.com
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
> drivers/thermal/intel_powerclamp.c | 2 +-
> include/linux/preempt.h | 8 +++++++-
> include/net/busy_poll.h | 15 +++------------
> 3 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 8f181b3f842b..0a12ddc2eb4c 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -445,7 +445,7 @@ static int clamp_thread(void *arg)
> atomic_inc(&idle_wakeup_counter);
> }
> tick_nohz_idle_exit();
> - preempt_enable_no_resched();
> + preempt_enable();
> }
> del_timer_sync(&wakeup_timer);
> clear_bit(cpunr, cpu_clamping_mask);
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index a3d9dc8c2c00..3ed2b5335ab4 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -64,7 +64,7 @@ do { \
> } while (0)
>
> #else
> -#define preempt_enable() preempt_enable_no_resched()
> +#define preempt_enable() sched_preempt_enable_no_resched()
> #define preempt_check_resched() do { } while (0)
> #endif
>
> @@ -116,6 +116,12 @@ do { \
>
> #endif /* CONFIG_PREEMPT_COUNT */
>
> +#ifdef MODULE
> +#undef preempt_enable_no_resched
> +#undef preempt_enable_no_resched_notrace
> +#undef preempt_check_resched
> +#endif
> +
> #ifdef CONFIG_PREEMPT_NOTIFIERS
>
> struct preempt_notifier;
> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
> index 829627d7b846..756827a86c2d 100644
> --- a/include/net/busy_poll.h
> +++ b/include/net/busy_poll.h
> @@ -42,27 +42,18 @@ static inline bool net_busy_loop_on(void)
> return sysctl_net_busy_poll;
> }
>
> -/* a wrapper to make debug_smp_processor_id() happy
> - * we can use sched_clock() because we don't care much about precision
> - * we only care that the average is bounded
> - */
> -#ifdef CONFIG_DEBUG_PREEMPT
> static inline u64 busy_loop_us_clock(void)
> {
> u64 rc;
>
> +	/* XXX with interrupts enabled sched_clock() can return utter garbage */
> +
> preempt_disable_notrace();
> rc = sched_clock();
> - preempt_enable_no_resched_notrace();
> + preempt_enable_notrace();
>
> return rc >> 10;
> }
> -#else /* CONFIG_DEBUG_PREEMPT */
> -static inline u64 busy_loop_us_clock(void)
> -{
> - return sched_clock() >> 10;
> -}
> -#endif /* CONFIG_DEBUG_PREEMPT */
>
> static inline unsigned long sk_busy_loop_end_time(struct sock *sk)
> {
>
[Jacob Pan]
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
2013-11-19 21:06 ` Jacob Pan
@ 2013-11-20 10:28 ` Peter Zijlstra
2013-11-20 10:58 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-20 10:28 UTC (permalink / raw)
To: Jacob Pan
Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel,
shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar,
Thomas Gleixner, hpa
On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
> On Tue, 19 Nov 2013 16:13:38 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Tue, Nov 19, 2013 at 03:51:43PM +0100, Peter Zijlstra wrote:
> > > That said, that driver is completely wrecked. It uses
> > > preempt_enable_no_resched() wrong too, and it has uncommented barriers...
> > >
> > > Dude, wtf are you guys smoking?
> >
> I applied this patch on top of upstream kernel (801a760) and found out
> my machine completely failed to enter idle when nothing is running.
> turbostat shows 100% C0. ftrace shows the kernel coming in and out of idle
> frequently.
>
> Both ACPI idle and intel_idle behave the same way. I have to do the
> following change to allow entering C-states again.
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 80014da..b51d1e1 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -49,10 +49,8 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> 		clflush((void *)&current_thread_info()->flags);
> 	__monitor((void *)&current_thread_info()->flags, 0, 0);
> -	if (!current_set_polling_and_test())
> +	if (!need_resched())
> 		__mwait(eax, ecx);
> -
> -	__current_clr_polling();
> }
>
> #endif /* _ASM_X86_MWAIT_H */
That doesn't make any sense; current_set_polling_and_test() returns the
same thing need_resched() does.
But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
C1 residency... most weird.
/me goes prod at it
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
2013-11-20 10:28 ` Peter Zijlstra
@ 2013-11-20 10:58 ` Peter Zijlstra
2013-11-20 16:24 ` Arjan van de Ven
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-20 10:58 UTC (permalink / raw)
To: Jacob Pan
Cc: Arjan van de Ven, lenb, rjw, linux-acpi, linux-kernel,
shaohua.li, rui.zhang, Mike Galbraith, Ingo Molnar,
Thomas Gleixner, hpa
On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
> > I applied this patch on top of upstream kernel (801a760) and found out
> > my machine completely failed to enter idle when nothing is running.
> > turbostat shows 100% C0. ftrace shows the kernel coming in and out of idle
> > frequently.
> >
> > Both ACPI idle and intel_idle behave the same way. I have to do the
> > following change to allow entering C-states again.
> That doesn't make any sense; current_set_polling_and_test() returns the
> same thing need_resched() does.
>
> But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
> C1 residency... most weird.
So pretty silly actually; you cannot do a store (any store) in between
monitor and mwait.
The below version seems to work properly again with both acpi_idle and
intel_idle.
Now to go make that preempt_enable_no_resched cleanup compile... :-)
---
Subject: x86, acpi, idle: Restructure the mwait idle routines
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 19 Nov 2013 12:31:53 +0100
People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.
This leaves mwait_play_dead() the sole remaining user of __mwait();
new __mwait() users are probably doing it wrong.
Also remove __sti_mwait() as it's unused.
Cc: arjan@linux.intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Cc: lenb@kernel.org
Cc: shaohua.li@intel.com
Cc: rui.zhang@intel.com
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
arch/x86/include/asm/mwait.h | 40 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/processor.h | 23 ---------------------
arch/x86/kernel/acpi/cstate.c | 23 ---------------------
drivers/acpi/acpi_pad.c | 5 ----
drivers/acpi/processor_idle.c | 15 -------------
drivers/idle/intel_idle.c | 8 -------
drivers/thermal/intel_powerclamp.c | 4 ---
7 files changed, 43 insertions(+), 75 deletions(-)
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_MWAIT_H
#define _ASM_X86_MWAIT_H
+#include <linux/sched.h>
+
#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_CSTATE_MASK 0xf
#define MWAIT_SUBSTATE_SIZE 4
@@ -13,4 +15,42 @@
#define MWAIT_ECX_INTERRUPT_BREAK 0x1
+static inline void __monitor(const void *eax, unsigned long ecx,
+ unsigned long edx)
+{
+ /* "monitor %eax, %ecx, %edx;" */
+ asm volatile(".byte 0x0f, 0x01, 0xc8;"
+ :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
+static inline void __mwait(unsigned long eax, unsigned long ecx)
+{
+ /* "mwait %eax, %ecx;" */
+ asm volatile(".byte 0x0f, 0x01, 0xc9;"
+ :: "a" (eax), "c" (ecx));
+}
+
+/*
+ * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
+ * which can obviate IPI to trigger checking of need_resched.
+ * We execute MONITOR against need_resched and enter optimized wait state
+ * through MWAIT. Whenever someone changes need_resched, we would be woken
+ * up from MWAIT (without an IPI).
+ *
+ * New with Core Duo processors, MWAIT can take some hints based on CPU
+ * capability.
+ */
+static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+{
+ if (!current_set_polling_and_test()) {
+ if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+		clflush((void *)&current_thread_info()->flags);
+
+	__monitor((void *)&current_thread_info()->flags, 0, 0);
+ if (!need_resched())
+ __mwait(eax, ecx);
+ }
+ __current_clr_polling();
+}
+
#endif /* _ASM_X86_MWAIT_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -700,29 +700,6 @@ static inline void sync_core(void)
#endif
}
-static inline void __monitor(const void *eax, unsigned long ecx,
- unsigned long edx)
-{
- /* "monitor %eax, %ecx, %edx;" */
- asm volatile(".byte 0x0f, 0x01, 0xc8;"
- :: "a" (eax), "c" (ecx), "d"(edx));
-}
-
-static inline void __mwait(unsigned long eax, unsigned long ecx)
-{
- /* "mwait %eax, %ecx;" */
- asm volatile(".byte 0x0f, 0x01, 0xc9;"
- :: "a" (eax), "c" (ecx));
-}
-
-static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
-{
- trace_hardirqs_on();
- /* "mwait %eax, %ecx;" */
- asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
- :: "a" (eax), "c" (ecx));
-}
-
extern void select_idle_routine(const struct cpuinfo_x86 *c);
extern void init_amd_e400_c1e_mask(void);
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -150,29 +150,6 @@ int acpi_processor_ffh_cstate_probe(unsi
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);
-/*
- * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
- * which can obviate IPI to trigger checking of need_resched.
- * We execute MONITOR against need_resched and enter optimized wait state
- * through MWAIT. Whenever someone changes need_resched, we would be woken
- * up from MWAIT (without an IPI).
- *
- * New with Core Duo processors, MWAIT can take some hints based on CPU
- * capability.
- */
-void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
-{
- if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
-		clflush((void *)&current_thread_info()->flags);
-
-	__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(ax, cx);
- }
-}
-
void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -193,10 +193,7 @@ static int power_saving_thread(void *dat
CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
stop_critical_timings();
-	__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(power_saving_mwait_eax, 1);
+ mwait_idle_with_hints(power_saving_mwait_eax, 1);
start_critical_timings();
if (lapic_marked_unstable)
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -727,11 +727,6 @@ static int acpi_idle_enter_c1(struct cpu
if (unlikely(!pr))
return -EINVAL;
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
lapic_timer_state_broadcast(pr, cx, 1);
acpi_idle_do_entry(cx);
@@ -785,11 +780,6 @@ static int acpi_idle_enter_simple(struct
if (unlikely(!pr))
return -EINVAL;
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
/*
* Must be done before busmaster disable as we might need to
* access HPET !
@@ -841,11 +831,6 @@ static int acpi_idle_enter_bm(struct cpu
}
}
- if (cx->entry_method == ACPI_CSTATE_FFH) {
- if (current_set_polling_and_test())
- return -EINVAL;
- }
-
acpi_unlazy_tlb(smp_processor_id());
/* Tell the scheduler that we are going deep-idle: */
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -359,13 +359,7 @@ static int intel_idle(struct cpuidle_dev
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
- if (!current_set_polling_and_test()) {
-
-	__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __mwait(eax, ecx);
- }
+ mwait_idle_with_hints(eax, ecx);
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -438,9 +438,7 @@ static int clamp_thread(void *arg)
*/
local_touch_nmi();
stop_critical_timings();
-	__monitor((void *)&current_thread_info()->flags, 0, 0);
- cpu_relax(); /* allow HT sibling to run */
- __mwait(eax, ecx);
+ mwait_idle_with_hints(eax, ecx);
start_critical_timings();
atomic_inc(&idle_wakeup_counter);
}
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
2013-11-20 10:58 ` Peter Zijlstra
@ 2013-11-20 16:24 ` Arjan van de Ven
2013-11-20 16:33 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Arjan van de Ven @ 2013-11-20 16:24 UTC (permalink / raw)
To: Peter Zijlstra, Jacob Pan
Cc: lenb, rjw, linux-acpi, linux-kernel, shaohua.li, rui.zhang,
Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On 11/20/2013 2:58 AM, Peter Zijlstra wrote:
> On Wed, Nov 20, 2013 at 11:28:03AM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 19, 2013 at 01:06:30PM -0800, Jacob Pan wrote:
>>> I applied this patch on top of upstream kernel (801a760) and found out
>>> my machine completely failed to enter idle when nothing is running.
>>> turbostat shows 100% C0. ftrace shows the kernel coming in and out of idle
>>> frequently.
>>>
>>> Both ACPI idle and intel_idle behave the same way. I have to do the
>>> following change to allow entering C-states again.
>
>> That doesn't make any sense; current_set_polling_and_test() returns the
>> same thing need_resched() does.
>>
>> But you're right, intel_idle resides 100% in C0 and acpi_idle has 100%
>> C1 residency... most weird.
>
> So pretty silly actually; you cannot do a store (any store) in between
> monitor and mwait.
you can
just not to the cacheline you are watching (or things that alias with that)
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
2013-11-20 16:24 ` Arjan van de Ven
@ 2013-11-20 16:33 ` Peter Zijlstra
2013-11-20 16:38 ` Arjan van de Ven
0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2013-11-20 16:33 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Jacob Pan, lenb, rjw, linux-acpi, linux-kernel, shaohua.li,
rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:
> On 11/20/2013 2:58 AM, Peter Zijlstra wrote:
> >So pretty silly actually; you cannot do a store (any store) in between
> >monitor and mwait.
>
> you can
> just not to the cacheline you are watching (or things that alias with that)
Ah indeed, the thread_info::status is in the same cacheline.
* Re: [PATCH] x86, acpi, idle: Restructure the mwait idle routines
2013-11-20 16:33 ` Peter Zijlstra
@ 2013-11-20 16:38 ` Arjan van de Ven
0 siblings, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2013-11-20 16:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jacob Pan, lenb, rjw, linux-acpi, linux-kernel, shaohua.li,
rui.zhang, Mike Galbraith, Ingo Molnar, Thomas Gleixner, hpa
On 11/20/2013 8:33 AM, Peter Zijlstra wrote:
> On Wed, Nov 20, 2013 at 08:24:47AM -0800, Arjan van de Ven wrote:
>> On 11/20/2013 2:58 AM, Peter Zijlstra wrote:
>>> So pretty silly actually; you cannot do a store (any store) in between
>>> monitor and mwait.
>>
>> you can
>> just not to the cacheline you are watching (or things that alias with that)
>
> Ah indeed, the thread_info::status is in the same cacheline.
>
you told it to wake on any write to that cacheline
and then you write to it ;-)
Thread overview: 15+ messages
2013-11-19 9:00 acpi_pad mwait usage Peter Zijlstra
2013-11-19 9:08 ` Peter Zijlstra
2013-11-19 11:31 ` [PATCH] x86, acpi, idle: Restructure the mwait idle routines Peter Zijlstra
2013-11-19 13:06 ` Rafael J. Wysocki
2013-11-19 13:21 ` Mike Galbraith
2013-11-19 14:22 ` Arjan van de Ven
2013-11-19 14:46 ` Peter Zijlstra
2013-11-19 14:51 ` Peter Zijlstra
2013-11-19 15:13 ` Peter Zijlstra
2013-11-19 21:06 ` Jacob Pan
2013-11-20 10:28 ` Peter Zijlstra
2013-11-20 10:58 ` Peter Zijlstra
2013-11-20 16:24 ` Arjan van de Ven
2013-11-20 16:33 ` Peter Zijlstra
2013-11-20 16:38 ` Arjan van de Ven