* [PATCH] intel_idle: use static_key to optimize idle enter/exit paths
From: Jason Baron @ 2014-07-11 17:54 UTC (permalink / raw)
To: lenb; +Cc: linux-pm, linux-kernel
If 'arat' is set in the cpuflags, we can avoid the checks for entering/exiting
the tick broadcast code entirely. It would seem that this is a hot enough code
path to make this worthwhile. I ran a few hackbench runs, and consistently see
reduced branches and cycles.
Signed-off-by: Jason Baron <jbaron@akamai.com>
---
drivers/idle/intel_idle.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 4d140bb..61e965c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -80,6 +80,8 @@ static unsigned int mwait_substates;
#define LAPIC_TIMER_ALWAYS_RELIABLE 0xFFFFFFFF
/* Reliable LAPIC Timer States, bit 1 for C1 etc. */
static unsigned int lapic_timer_reliable_states = (1 << 1); /* Default to only C1 */
+/* if arat is set no sense in checking on each c-state transition */
+static struct static_key lapic_timer_unreliable __read_mostly;
struct idle_cpu {
struct cpuidle_state *state_table;
@@ -507,12 +509,10 @@ static int intel_idle(struct cpuidle_device *dev,
{
unsigned long ecx = 1; /* break on interrupt flag */
struct cpuidle_state *state = &drv->states[index];
- unsigned long eax = flg2MWAIT(state->flags);
- unsigned int cstate;
+ unsigned long uninitialized_var(eax);
+ unsigned int uninitialized_var(cstate);
int cpu = smp_processor_id();
- cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
-
/*
* leave_mm() to avoid costly and often unnecessary wakeups
* for flushing the user TLB's associated with the active mm.
@@ -520,13 +520,22 @@ static int intel_idle(struct cpuidle_device *dev,
if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
leave_mm(cpu);
- if (!(lapic_timer_reliable_states & (1 << (cstate))))
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
+ if (static_key_false(&lapic_timer_unreliable)) {
+ eax = flg2MWAIT(state->flags);
+ cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) &
+ MWAIT_CSTATE_MASK) + 1;
+ if (!(lapic_timer_reliable_states & (1 << (cstate))))
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER,
+ &cpu);
+ }
mwait_idle_with_hints(eax, ecx);
- if (!(lapic_timer_reliable_states & (1 << (cstate))))
- clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+ if (static_key_false(&lapic_timer_unreliable)) {
+ if (!(lapic_timer_reliable_states & (1 << (cstate))))
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT,
+ &cpu);
+ }
return index;
}
@@ -702,8 +711,10 @@ static int __init intel_idle_probe(void)
if (boot_cpu_has(X86_FEATURE_ARAT)) /* Always Reliable APIC Timer */
lapic_timer_reliable_states = LAPIC_TIMER_ALWAYS_RELIABLE;
- else
+ else {
+ static_key_slow_inc(&lapic_timer_unreliable);
on_each_cpu(__setup_broadcast_timer, (void *)true, 1);
+ }
pr_debug(PREFIX "v" INTEL_IDLE_VERSION
" model 0x%X\n", boot_cpu_data.x86_model);
--
1.8.2.rc2
* Re: [PATCH] intel_idle: use static_key to optimize idle enter/exit paths
From: Len Brown @ 2014-07-28 20:38 UTC (permalink / raw)
To: Jason Baron; +Cc: Linux PM list, linux-kernel
On Fri, Jul 11, 2014 at 1:54 PM, Jason Baron <jbaron@akamai.com> wrote:
> If 'arat' is set in the cpuflags, we can avoid the checks for entering/exiting
> the tick broadcast code entirely. It would seem that this is a hot enough code
> path to make this worthwhile. I ran a few hackbench runs, and consistently see
> reduced branches and cycles.
Hi Jason,
Your logic looks right -- though I've never used this
static_key_slow_inc() stuff.
I'm impressed that something in user-space could detect this change.
Can you share how to run the workload where you detected a difference,
and describe the hardware you measured?
thanks,
-Len Brown, Intel Open Source Technology Center
* Re: [PATCH] intel_idle: use static_key to optimize idle enter/exit paths
From: Jason Baron @ 2014-07-28 21:50 UTC (permalink / raw)
To: Len Brown; +Cc: Linux PM list, linux-kernel
On 07/28/2014 04:38 PM, Len Brown wrote:
> On Fri, Jul 11, 2014 at 1:54 PM, Jason Baron <jbaron@akamai.com> wrote:
>> If 'arat' is set in the cpuflags, we can avoid the checks for entering/exiting
>> the tick broadcast code entirely. It would seem that this is a hot enough code
>> path to make this worthwhile. I ran a few hackbench runs, and consistently see
>> reduced branches and cycles.
>
> Hi Jason,
>
> Your logic looks right -- though I've never used this
> static_key_slow_inc() stuff.
> I'm impressed that something in user-space could detect this change.
>
> Can you share how to run the workload where you detected a difference,
> and describe the hardware you measured?
>
> thanks,
> -Len Brown, Intel Open Source Technology Center
>
Hi Len,
So using something like hackbench appears to show the difference
(with CONFIG_JUMP_LABEL enabled):
Without the patch:
Performance counter stats for 'perf bench sched messaging' (200 runs):
641.113816 task-clock # 8.020 CPUs utilized ( +- 0.16% ) [100.00%]
29020 context-switches # 0.045 M/sec ( +- 1.66% ) [100.00%]
2487 cpu-migrations # 0.004 M/sec ( +- 0.89% ) [100.00%]
10514 page-faults # 0.016 M/sec ( +- 0.11% )
2085813986 cycles # 3.253 GHz ( +- 0.16% ) [100.00%]
1658381753 stalled-cycles-frontend # 79.51% frontend cycles idle ( +- 0.18% ) [100.00%]
<not supported> stalled-cycles-backend
1221737228 instructions # 0.59 insns per cycle
# 1.36 stalled cycles per insn ( +- 0.12% ) [100.00%]
211723499 branches # 330.243 M/sec ( +- 0.14% ) [100.00%]
716846 branch-misses # 0.34% of all branches ( +- 0.66% )
0.079936660 seconds time elapsed ( +- 0.16% )
With the patch:
Performance counter stats for 'perf bench sched messaging' (200 runs):
638.819963 task-clock # 8.020 CPUs utilized ( +- 0.15% ) [100.00%]
27751 context-switches # 0.043 M/sec ( +- 1.61% ) [100.00%]
2502 cpu-migrations # 0.004 M/sec ( +- 0.92% ) [100.00%]
10503 page-faults # 0.016 M/sec ( +- 0.09% )
2078109565 cycles # 3.253 GHz ( +- 0.14% ) [100.00%]
1653002141 stalled-cycles-frontend # 79.54% frontend cycles idle ( +- 0.17% ) [100.00%]
<not supported> stalled-cycles-backend
1218013520 instructions # 0.59 insns per cycle
# 1.36 stalled cycles per insn ( +- 0.12% ) [100.00%]
210943815 branches # 330.209 M/sec ( +- 0.14% ) [100.00%]
697865 branch-misses # 0.33% of all branches ( +- 0.66% )
0.079648462 seconds time elapsed ( +- 0.15% )
So you can see that 'branches' is higher without the patch. Yes, there is some
'noise' here, but there is a measurable impact. It doesn't seem to make too much
sense to me to check for the presence of a h/w feature every time through this kind
of code path if it's easily avoidable.
Hardware is 4 core Intel box:
model name : Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz
stepping : 9
microcode : 0x12
cpu MHz : 3501.000
cache size : 8192 KB
Thanks,
-Jason