linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] per cpu resume latency
@ 2017-01-12 13:27 Alex Shi
  2017-01-12 13:27 ` [PATCH 1/3] cpuidle/menu: stop seeking deeper idle if current state is too deep Alex Shi
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-12 13:27 UTC
  To: Greg Kroah-Hartman, Daniel Lezcano, Rafael J . Wysocki,
	vincent.guittot, linux-pm, linux-kernel

V2 changes: remove the #ifdef CONFIG_CPU_IDLE_GOV_MENU guard around
dev_pm_qos_expose_latency_limit(), since CONFIG_PM already covers it.

---
cpu_dma_latency is designed to keep all cpus out of deep C-states. That
is good for keeping system response latency short, but these days, with
more and more cores, we often don't need every cpu kept ready. Making
all cpus restless just leads to a big waste of power.

A better way is to keep a short response latency on the cpus that need
it, while letting the other, unneeded cpus go into deep idle. That is
what this patchset does, using pm_qos_resume_latency per cpu. A short
latency is requested on a chosen cpu by writing a value to
/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us
The wanted latency value can be picked according to the values of
/sys/devices/system/cpu/cpuX/cpuidle/stateX/latency: setting it just
below a state's latency keeps the cpu out of that state and any deeper
ones.
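
For example (a usage sketch, not part of the patchset): on the test
board below, state1's latency is 280us, so writing any smaller value
keeps cpu0 out of state1:

# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/latency
280
# echo 279 > /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us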

Here is some testing data from my 4-core dragonboard 410c, where the
latency of state1 is 280us.

Benchmark: cyclictest -t 1  -n -i 10000 -l 1000 -q --latency=10000

Without the patch:
Latency (us) Min:     87 Act:  209 Avg:  205 Max:     239
With the patch and cpu0/power/pm_qos_resume_latency_us set lower than
280us (any value from 1 to 279), the benchmark result on cpu0 is:
Latency (us) Min:     82 Act:   91 Avg:   95 Max:     110
In repeated testing, the Avg latency always drops to about half of the
vanilla kernel's value, and mostly the Max latency does as well,
although sometimes the Max latency is similar to the vanilla kernel's.

We could also use cpu_dma_latency to get a similarly short latency, but
then 'idlestat' shows all cpus are kept restless. Here is the idle
state comparison between cpu_dma_latency and this feature:

To record the idle states:
#./idlestat --trace -t 10 -f /tmp/mytracepmlat -p -c -w -- cyclictest -t 1  -n -i 10000 -l 1000 -q --latency=10000

Comparing the idle states, the 'total' column shows that cpu1~3 stay
almost entirely in the WFI state with cpu_dma_latency, while with my
patch they get about 10 seconds of sleep in the 'spc' state.
# ./idlestat --import -f /tmp/mytracepmlat -b /tmp/mytrace -r comparison
Log is 10.055305 secs long with 7514 events
Log is 10.055370 secs long with 7545 events
--------------------------------------------------------------------------------
| C-state  |   min    |   max    |   avg    |   total  | hits  |  over | under |
--------------------------------------------------------------------------------
| clusterA                                                                     |
--------------------------------------------------------------------------------
|      WFI |      2us |  12.88ms |   4.18ms |    9.76s |  2334 |     0 |     0 |
|          |     -2us |  -14.4ms |    -17us |  -72.5ms |    -8 |     0 |     0 |
--------------------------------------------------------------------------------
|             cpu0                                                             |
--------------------------------------------------------------------------------
|      WFI |      3us | 100.98ms |  26.81ms |   10.03s |   374 |     0 |     0 |
|          |     -1us |     -1us |   -350us |   +5.0ms |    +5 |     0 |     0 |
--------------------------------------------------------------------------------
|             cpu1                                                             |
--------------------------------------------------------------------------------
|      WFI |    280us |   3.96ms |   1.96ms |  19.64ms |    10 |     0 |     5 |
|          |   +221us | -891.7ms |   -9.1ms |    -9.9s |  -889 |     0 |     0 |
|      spc |    234us |  19.71ms |   9.79ms |    9.91s |  1012 |     4 |     0 |
|          |   +167us |  +17.9ms |   +8.6ms |    +9.9s | +1009 |    +1 |     0 |
--------------------------------------------------------------------------------
|             cpu2                                                             |
--------------------------------------------------------------------------------
|      WFI |     86us |   1.01ms |    637us |   1.91ms |     3 |     0 |     0 |
|          |    -16us |  -26.5ms |   -8.8ms |   -10.0s | -1057 |     0 |     0 |
|      spc |    930us |  47.67ms |  10.05ms |    9.92s |   987 |     2 |     0 |
|          |   -1.4ms |  +43.7ms |   +6.9ms |    +9.9s |  +985 |    +2 |     0 |
--------------------------------------------------------------------------------
|             cpu3                                                             |
--------------------------------------------------------------------------------
|      WFI |      0us |      0us |      0us |      0us |     0 |     0 |     0 |
|          |          |    -4.0s | -152.1ms |   -10.0s |   -66 |     0 |     0 |
|      spc |    420us |    3.50s | 913.74ms |   10.05s |    11 |     3 |     0 |
|          |   -891us |    +3.5s | +911.0ms |   +10.0s |    +8 |    +1 |     0 |
--------------------------------------------------------------------------------

Thanks
Alex


* [PATCH 1/3] cpuidle/menu: stop seeking deeper idle if current state is too deep
  2017-01-12 13:27 [PATCH v2 0/3] per cpu resume latency Alex Shi
@ 2017-01-12 13:27 ` Alex Shi
  2017-01-12 13:27 ` [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu Alex Shi
  2017-01-12 13:27 ` [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration Alex Shi
  2 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-12 13:27 UTC
  To: Greg Kroah-Hartman, Daniel Lezcano, Rafael J . Wysocki,
	vincent.guittot, linux-pm, linux-kernel
  Cc: Alex Shi, Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

The obsolete commit 71abbbf85 wanted to introduce dynamic C-states, but
that support was removed long ago, leaving behind the now-pointless
checking of ever deeper C-states.

Since target_residency and exit_latency only grow longer in deeper idle
states, there is no need to waste cpu cycles seeking further once a
state fails the checks.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Acked-by: Rik van Riel <riel@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/menu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index d9b5b93..07e36bb 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -357,9 +357,9 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		if (s->disabled || su->disable)
 			continue;
 		if (s->target_residency > data->predicted_us)
-			continue;
+			break;
 		if (s->exit_latency > latency_req)
-			continue;
+			break;
 
 		data->last_state_idx = i;
 	}
-- 
2.8.1.101.g72d917a


* [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu
  2017-01-12 13:27 [PATCH v2 0/3] per cpu resume latency Alex Shi
  2017-01-12 13:27 ` [PATCH 1/3] cpuidle/menu: stop seeking deeper idle if current state is too deep Alex Shi
@ 2017-01-12 13:27 ` Alex Shi
  2017-01-17 10:23   ` Daniel Lezcano
  2017-01-12 13:27 ` [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration Alex Shi
  2 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2017-01-12 13:27 UTC
  To: Greg Kroah-Hartman, Daniel Lezcano, Rafael J . Wysocki,
	vincent.guittot, linux-pm, linux-kernel
  Cc: Alex Shi, Ulf Hansson

The cpu-dma PM QoS constraint impacts all the cpus in the system. There
is no way for the user to choose a PM QoS constraint per cpu.

This patch exposes a per cpu sysfs file to userspace, so that userspace
can change the value of the per cpu PM QoS latency constraint.

This change is inoperative on its own; the cpuidle governors have to
take the per cpu latency constraint into account, in addition to the
global cpu-dma latency constraint, in order for it to take effect.

For reference, the pm_qos_resume_latency usage is defined in
Documentation/ABI/testing/sysfs-devices-power:
The /sys/devices/.../power/pm_qos_resume_latency_us attribute
contains the PM QoS resume latency limit for the given device,
which is the maximum allowed time it can take to resume the
device, after it has been suspended at run time, from a resume
request to the moment the device will be ready to process I/O,
in microseconds.  If it is equal to 0, however, this means that
the PM QoS resume latency may be arbitrary.
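
As a usage sketch (assuming this patch is applied), the new per cpu
attribute reads back the 0 passed to dev_pm_qos_expose_latency_limit()
in the diff below, meaning no restriction, and can be written like any
other PM QoS resume latency file:

# cat /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us
0
# echo 100 > /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us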

Signed-off-by: Alex Shi <alex.shi@linaro.org>
To: linux-kernel@vger.kernel.org
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-pm@vger.kernel.org
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/base/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 4c28e1a..2c3b359 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -17,6 +17,7 @@
 #include <linux/of.h>
 #include <linux/cpufeature.h>
 #include <linux/tick.h>
+#include <linux/pm_qos.h>
 
 #include "base.h"
 
@@ -376,6 +377,7 @@ int register_cpu(struct cpu *cpu, int num)
 
 	per_cpu(cpu_sys_devices, num) = &cpu->dev;
 	register_cpu_under_node(num, cpu_to_node(num));
+	dev_pm_qos_expose_latency_limit(&cpu->dev, 0);
 
 	return 0;
 }
-- 
2.8.1.101.g72d917a


* [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-12 13:27 [PATCH v2 0/3] per cpu resume latency Alex Shi
  2017-01-12 13:27 ` [PATCH 1/3] cpuidle/menu: stop seeking deeper idle if current state is too deep Alex Shi
  2017-01-12 13:27 ` [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu Alex Shi
@ 2017-01-12 13:27 ` Alex Shi
  2017-01-12 20:03   ` Rik van Riel
  2017-01-17  9:38   ` Daniel Lezcano
  2 siblings, 2 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-12 13:27 UTC
  To: Greg Kroah-Hartman, Daniel Lezcano, Rafael J . Wysocki,
	vincent.guittot, linux-pm, linux-kernel
  Cc: Alex Shi, Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

The kernel or user may have special requirements on cpu response time,
e.g. if an interrupt is pinned to a cpu, we don't want that cpu to go
into too deep a sleep. This patch prevents that by considering the per
cpu resume_latency setting during sleep state selection in the menu
governor.

pm_qos_resume_latency asks the device to respond within the given time,
which for a cpu is roughly a C-state's entry_latency + exit_latency.
But since most cpu C-states either have no entry_latency or fold it
into exit_latency, we can treat this time requirement as the state's
exit_latency.

A wanted latency value can be chosen according to the values of
/sys/devices/system/cpu/cpuX/cpuidle/stateX/latency: setting it just
below a state's latency keeps the cpu out of that state and any deeper
ones.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/menu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 07e36bb..8d6d25c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -19,6 +19,7 @@
 #include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/cpu.h>
 
 /*
  * Please note when changing the tuning values:
@@ -280,17 +281,23 @@ static unsigned int get_typical_interval(struct menu_device *data)
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
+	struct device *device = get_cpu_device(dev->cpu);
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	unsigned int interactivity_req;
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
+	int resume_latency = dev_pm_qos_read_value(device);
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
 		data->needs_update = 0;
 	}
 
+	/* a resume_latency of 0 means no restriction */
+	if (resume_latency && resume_latency < latency_req)
+		latency_req = resume_latency;
+
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
 		return 0;
-- 
2.8.1.101.g72d917a


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-12 13:27 ` [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration Alex Shi
@ 2017-01-12 20:03   ` Rik van Riel
  2017-01-16  1:11     ` Alex Shi
  2017-01-17  9:38   ` Daniel Lezcano
  1 sibling, 1 reply; 18+ messages in thread
From: Rik van Riel @ 2017-01-12 20:03 UTC
  To: Alex Shi, Greg Kroah-Hartman, Daniel Lezcano, Rafael J . Wysocki,
	vincent.guittot, linux-pm, linux-kernel
  Cc: Ulf Hansson, Rasmus Villemoes, Arjan van de Ven


On Thu, 2017-01-12 at 21:27 +0800, Alex Shi wrote:
> The kernel or user may have special requirements on cpu response time,
> e.g. if an interrupt is pinned to a cpu, we don't want that cpu to go
> into too deep a sleep. This patch prevents that by considering the per
> cpu resume_latency setting during sleep state selection in the menu
> governor.
> 
> pm_qos_resume_latency asks the device to respond within the given time,
> which for a cpu is roughly a C-state's entry_latency + exit_latency.
> But since most cpu C-states either have no entry_latency or fold it
> into exit_latency, we can treat this time requirement as the state's
> exit_latency.
> 
> A wanted latency value can be chosen according to the values of
> /sys/devices/system/cpu/cpuX/cpuidle/stateX/latency: setting it just
> below a state's latency keeps the cpu out of that state and any deeper
> ones.
> 
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> To: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> 

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed



* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-12 20:03   ` Rik van Riel
@ 2017-01-16  1:11     ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-16  1:11 UTC
  To: Rik van Riel, Greg Kroah-Hartman, Daniel Lezcano,
	Rafael J . Wysocki, vincent.guittot, linux-pm, linux-kernel
  Cc: Ulf Hansson, Rasmus Villemoes, Arjan van de Ven

Thanks a lot, Rik!

Would anyone like to give more comments, or pick it up?

Regards
Alex

On 01/13/2017 04:03 AM, Rik van Riel wrote:
> Acked-by: Rik van Riel <riel@redhat.com>


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-12 13:27 ` [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration Alex Shi
  2017-01-12 20:03   ` Rik van Riel
@ 2017-01-17  9:38   ` Daniel Lezcano
  2017-01-19  9:25     ` Alex Shi
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2017-01-17  9:38 UTC
  To: Alex Shi
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, vincent.guittot,
	linux-pm, linux-kernel, Ulf Hansson, Rasmus Villemoes,
	Arjan van de Ven, Rik van Riel

On Thu, Jan 12, 2017 at 09:27:04PM +0800, Alex Shi wrote:
> The kernel or user may have special requirements on cpu response time,
> e.g. if an interrupt is pinned to a cpu, we don't want that cpu to go
> into too deep a sleep. This patch prevents that by considering the per
> cpu resume_latency setting during sleep state selection in the menu
> governor.
> 
> pm_qos_resume_latency asks the device to respond within the given time,
> which for a cpu is roughly a C-state's entry_latency + exit_latency.
> But since most cpu C-states either have no entry_latency or fold it
> into exit_latency, we can treat this time requirement as the state's
> exit_latency.
> 
> A wanted latency value can be chosen according to the values of
> /sys/devices/system/cpu/cpuX/cpuidle/stateX/latency: setting it just
> below a state's latency keeps the cpu out of that state and any deeper
> ones.
> 
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> To: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpuidle/governors/menu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 07e36bb..8d6d25c 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -19,6 +19,7 @@
>  #include <linux/tick.h>
>  #include <linux/sched.h>
>  #include <linux/math64.h>
> +#include <linux/cpu.h>
>  
>  /*
>   * Please note when changing the tuning values:
> @@ -280,17 +281,23 @@ static unsigned int get_typical_interval(struct menu_device *data)
>  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
> +	struct device *device = get_cpu_device(dev->cpu);
>  	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
>  	int i;
>  	unsigned int interactivity_req;
>  	unsigned int expected_interval;
>  	unsigned long nr_iowaiters, cpu_load;
> +	int resume_latency = dev_pm_qos_read_value(device);
>  
>  	if (data->needs_update) {
>  		menu_update(drv, dev);
>  		data->needs_update = 0;
>  	}
>  
> +	/* a resume_latency of 0 means no restriction */
> +	if (resume_latency && resume_latency < latency_req)
> +		latency_req = resume_latency;
> +

Calling dev_pm_qos_read_value() only after checking that latency_req is
different from zero would make more sense: if a zero latency is
requested, there is no need to add the overhead, since we will return
zero in all cases anyway.

	if (unlikely(latency_req == 0))
		return 0;

	device = get_cpu_device(dev->cpu);

	resume_latency = dev_pm_qos_read_value(device);
	if (resume_latency)
		latency_req = min(latency_req, resume_latency);

That said, I have the feeling this is taking the wrong direction. Each time we
enter idle, we check the latencies, and idle can be entered thousands of times
per second. Wouldn't it make more sense to disable the states not fulfilling
the constraints at the moment the latencies are changed? As the idle states
have increasing exit latencies, setting an idle state limit to disable all
states beyond that limit may be more efficient than checking again and again
in the idle path, no?

For example, a zero PM_QOS_CPU_DMA_LATENCY should prevent entering the
governor's select routine at all.

>  	/* Special case when user has set very strict latency requirement */
>  	if (unlikely(latency_req == 0))
>  		return 0;
> -- 
> 2.8.1.101.g72d917a
> 



* Re: [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu
  2017-01-12 13:27 ` [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu Alex Shi
@ 2017-01-17 10:23   ` Daniel Lezcano
  2017-01-19  8:18     ` Alex Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2017-01-17 10:23 UTC
  To: Alex Shi
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, vincent.guittot,
	linux-pm, linux-kernel, Ulf Hansson

On Thu, Jan 12, 2017 at 09:27:03PM +0800, Alex Shi wrote:
> The cpu-dma PM QoS constraint impacts all the cpus in the system. There
> is no way for the user to choose a PM QoS constraint per cpu.
> 
> This patch exposes a per cpu sysfs file to userspace, so that userspace
> can change the value of the per cpu PM QoS latency constraint.
> 
> This change is inoperative on its own; the cpuidle governors have to
> take the per cpu latency constraint into account, in addition to the
> global cpu-dma latency constraint, in order for it to take effect.
> 
> For reference, the pm_qos_resume_latency usage is defined in
> Documentation/ABI/testing/sysfs-devices-power:
> The /sys/devices/.../power/pm_qos_resume_latency_us attribute
> contains the PM QoS resume latency limit for the given device,
> which is the maximum allowed time it can take to resume the
> device, after it has been suspended at run time, from a resume
> request to the moment the device will be ready to process I/O,
> in microseconds.  If it is equal to 0, however, this means that
> the PM QoS resume latency may be arbitrary.
> 
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> To: linux-kernel@vger.kernel.org
> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: linux-pm@vger.kernel.org
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> ---
>  drivers/base/cpu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 4c28e1a..2c3b359 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -17,6 +17,7 @@
>  #include <linux/of.h>
>  #include <linux/cpufeature.h>
>  #include <linux/tick.h>
> +#include <linux/pm_qos.h>
>  
>  #include "base.h"
>  
> @@ -376,6 +377,7 @@ int register_cpu(struct cpu *cpu, int num)
>  
>  	per_cpu(cpu_sys_devices, num) = &cpu->dev;
>  	register_cpu_under_node(num, cpu_to_node(num));
> +	dev_pm_qos_expose_latency_limit(&cpu->dev, 0);

This patch should be submitted as the last patch of the latency
constraint patchset, IMO. It is pointless to provide an interface before
the feature itself, which is still under discussion.




* Re: [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu
  2017-01-17 10:23   ` Daniel Lezcano
@ 2017-01-19  8:18     ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-19  8:18 UTC
  To: Daniel Lezcano
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, vincent.guittot,
	linux-pm, linux-kernel, Ulf Hansson



On 01/17/2017 06:23 PM, Daniel Lezcano wrote:
>> >  
>> > @@ -376,6 +377,7 @@ int register_cpu(struct cpu *cpu, int num)
>> >  
>> >  	per_cpu(cpu_sys_devices, num) = &cpu->dev;
>> >  	register_cpu_under_node(num, cpu_to_node(num));
>> > +	dev_pm_qos_expose_latency_limit(&cpu->dev, 0);
> This patch should be submitted as the last patch of the latency
> constraint patchset, IMO. It is pointless to provide an interface
> before the feature itself, which is still under discussion.

Thanks for the comments!

I will fold it into the next patch!


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-17  9:38   ` Daniel Lezcano
@ 2017-01-19  9:25     ` Alex Shi
  2017-01-19 10:21       ` Daniel Lezcano
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2017-01-19  9:25 UTC
  To: Daniel Lezcano
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, vincent.guittot,
	linux-pm, linux-kernel, Ulf Hansson, Rasmus Villemoes,
	Arjan van de Ven, Rik van Riel


> That said, I have the feeling this is taking the wrong direction. Each time we
> enter idle, we check the latencies, and idle can be entered thousands of times
> per second. Wouldn't it make more sense to disable the states not fulfilling
> the constraints at the moment the latencies are changed? As the idle states
> have increasing exit latencies, setting an idle state limit to disable all
> states beyond that limit may be more efficient than checking again and again
> in the idle path, no?

You're right. Saving some checks is a good thing to do.


From 9e1cc3e02b8d954e606dd5a0f6466a8d5b3efab7 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Wed, 26 Oct 2016 15:26:22 +0800
Subject: [PATCH 2/2] cpuidle/menu: add per cpu pm_qos_resume_latency
 consideration

The kernel or user may have special requirements on cpu response time,
e.g. if an interrupt is pinned to a cpu, we don't want that cpu to go
into too deep a sleep. This patch prevents that by considering the per
cpu resume_latency setting during sleep state selection in the menu
governor.

pm_qos_resume_latency asks the device to respond within the given time,
which for a cpu is roughly a C-state's entry_latency + exit_latency.
But since most cpu C-states either have no entry_latency or fold it
into exit_latency, we can treat this time requirement as the state's
exit_latency.

A wanted latency value can be chosen according to the values of
/sys/devices/system/cpu/cpuX/cpuidle/stateX/latency: setting it just
below a state's latency keeps the cpu out of that state and any deeper
ones.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Acked-by: Rik van Riel <riel@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/base/cpu.c               |  2 ++
 drivers/cpuidle/governors/menu.c | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 4c28e1a..2c3b359 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -17,6 +17,7 @@
 #include <linux/of.h>
 #include <linux/cpufeature.h>
 #include <linux/tick.h>
+#include <linux/pm_qos.h>
 
 #include "base.h"
 
@@ -376,6 +377,7 @@ int register_cpu(struct cpu *cpu, int num)
 
 	per_cpu(cpu_sys_devices, num) = &cpu->dev;
 	register_cpu_under_node(num, cpu_to_node(num));
+	dev_pm_qos_expose_latency_limit(&cpu->dev, 0);
 
 	return 0;
 }
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 07e36bb..cc7d873 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -19,6 +19,7 @@
 #include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/cpu.h>
 
 /*
  * Please note when changing the tuning values:
@@ -280,11 +281,13 @@ static unsigned int get_typical_interval(struct menu_device *data)
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
+	struct device *device;
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	unsigned int interactivity_req;
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
+	int resume_latency;
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
@@ -295,6 +298,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	if (unlikely(latency_req == 0))
 		return 0;
 
+	device = get_cpu_device(dev->cpu);
+
+	/* a resume_latency of 0 means no restriction */
+	resume_latency = dev_pm_qos_read_value(device);
+	if (resume_latency)
+		latency_req = min(latency_req, resume_latency);
+
 	/* determine the expected residency time, round up */
 	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
 
-- 
2.8.1.101.g72d917a

> 
> For example, a zero PM_QOS_CPU_DMA_LATENCY should prevent entering the
> governor's select routine at all.

That's a good idea. I will send a draft change for review! :)


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-19  9:25     ` Alex Shi
@ 2017-01-19 10:21       ` Daniel Lezcano
  2017-01-19 21:43         ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2017-01-19 10:21 UTC
  To: Alex Shi
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, vincent.guittot,
	linux-pm, linux-kernel, Ulf Hansson, Rasmus Villemoes,
	Arjan van de Ven, Rik van Riel

On Thu, Jan 19, 2017 at 05:25:37PM +0800, Alex Shi wrote:
> 
> > That said, I have the feeling this is taking the wrong direction. Each time
> > we enter idle, we check the latencies, and idle can be entered thousands of
> > times per second. Wouldn't it make more sense to disable the states not
> > fulfilling the constraints at the moment the latencies are changed? As the
> > idle states have increasing exit latencies, setting an idle state limit to
> > disable all states beyond that limit may be more efficient than checking
> > again and again in the idle path, no?
> 
> You're right. Saving some checks is a good thing to do.

Hi Alex,

I think you missed the point.

What I am proposing is to change the current approach by disabling all
the states beyond a specific latency.

We add a specific internal function:

static int cpuidle_set_latency(struct cpuidle_driver *drv,
				struct cpuidle_device *dev,
				int latency)
{
	int i, idx;

	for (i = 0, idx = 0; i < drv->state_count; i++) {

		struct cpuidle_state *s = &drv->states[i];

		if (s->exit_latency > latency)
			break;

		idx = i;
	}

	dev->state_count = idx + 1;

	return 0;
}

This function is called from the notifier callback:

static int cpuidle_latency_notify(struct notifier_block *b,
                unsigned long l, void *v)
 {
-       wake_up_all_idle_cpus();
+       struct cpuidle_device *dev;
+       struct cpuidle_driver *drv;
+       int cpu;
+
+       cpuidle_pause_and_lock();
+       for_each_possible_cpu(cpu) {
+               dev = &per_cpu(cpuidle_dev, cpu);
+               drv = cpuidle_get_cpu_driver(dev);
+               cpuidle_set_latency(drv, dev, l);
+       }
+       cpuidle_resume_and_unlock();
+
        return NOTIFY_OK;
 }

-----------------------------------------------------------------------------

The menu governor becomes:

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index bba3c2af..87e58e3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -352,7 +352,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
         * Find the idle state with the lowest power while satisfying
         * our constraints.
         */
-       for (i = data->last_state_idx + 1; i < drv->state_count; i++) {
+       for (i = data->last_state_idx + 1; i < dev->state_count; i++) {
                struct cpuidle_state *s = &drv->states[i];
                struct cpuidle_state_usage *su = &dev->states_usage[i];


... with a cleanup around latency_req.

-----------------------------------------------------------------------------

And the cpuidle_device structure is changed to:

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index b923c32..2fc966cb 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -88,6 +88,7 @@ struct cpuidle_device {
        cpumask_t               coupled_cpus;
        struct cpuidle_coupled  *coupled;
 #endif
+       int state_count;
 };
 
 DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);


At init time, drv->state_count and every cpu's dev->state_count are the same.

Well, that is the rough idea: instead of reading the latency when entering
idle, let's disable/enable the idle states when we set a new latency.

I did not check how that fits with the per cpu latency, but I think it
will be cleaner to change the approach rather than spreading latency
dances around.


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-19 10:21       ` Daniel Lezcano
@ 2017-01-19 21:43         ` Rafael J. Wysocki
  2017-01-20  8:35           ` Alex Shi
  2017-01-20 10:54           ` Daniel Lezcano
  0 siblings, 2 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2017-01-19 21:43 UTC
  To: Daniel Lezcano
  Cc: Alex Shi, Greg Kroah-Hartman, Rafael J . Wysocki,
	Vincent Guittot, Linux PM, Linux Kernel Mailing List,
	Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

On Thu, Jan 19, 2017 at 11:21 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On Thu, Jan 19, 2017 at 05:25:37PM +0800, Alex Shi wrote:
>>
>> > That said, I have the feeling this is taking the wrong direction. Each time
>> > we enter idle, we check the latencies, and idle can be entered thousands of
>> > times per second. Wouldn't it make more sense to disable the states not
>> > fulfilling the constraints at the moment the latencies are changed? As the
>> > idle states have increasing exit latencies, setting an idle state limit to
>> > disable all states beyond that limit may be more efficient than checking
>> > again and again in the idle path, no?
>>
>> You're right. Saving some checks is a good thing to do.
>
> Hi Alex,
>
> I think you missed the point.
>
> What I am proposing is to change the current approach by disabling all the
> states after a specific latency.
>
> We add a specific internal function:
>
> static int cpuidle_set_latency(struct cpuidle_driver *drv,
>                                 struct cpuidle_device *dev,
>                                 int latency)
> {
>         int i, idx;
>
>         for (i = 0, idx = 0; i < drv->state_count; i++) {
>
>                 struct cpuidle_state *s = &drv->states[i];
>
>                 if (s->exit_latency > latency)
>                         break;
>
>                 idx = i;
>         }
>
>         dev->state_count = idx + 1;
>
>         return 0;
> }
>
> This function is called from the notifier callback:
>
> static int cpuidle_latency_notify(struct notifier_block *b,
>                 unsigned long l, void *v)
>  {
> -       wake_up_all_idle_cpus();
> +       struct cpuidle_device *dev;
> +       struct cpuidle_driver *drv;
> +       int cpu;
> +
> +       cpuidle_pause_and_lock();
> +       for_each_possible_cpu(cpu) {
> +               dev = &per_cpu(cpuidle_dev, cpu);
> +               drv = cpuidle_get_cpu_driver(dev);
> +               cpuidle_set_latency(drv, dev, l);
> +       }
> +       cpuidle_resume_and_unlock();
> +
>         return NOTIFY_OK;
>  }

The above may be problematic if the constraints change relatively
often.  It is global and it will affect all of the CPUs in the system
every time; now think about systems with hundreds of them.

Thanks,
Rafael


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-19 21:43         ` Rafael J. Wysocki
@ 2017-01-20  8:35           ` Alex Shi
  2017-01-20 10:54           ` Daniel Lezcano
  1 sibling, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-20  8:35 UTC
  To: Rafael J. Wysocki, Daniel Lezcano
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, Vincent Guittot,
	Linux PM, Linux Kernel Mailing List, Ulf Hansson,
	Rasmus Villemoes, Arjan van de Ven, Rik van Riel



On 01/20/2017 05:43 AM, Rafael J. Wysocki wrote:
> The above may be problematic if the constraints change relatively
> often.  It is global and it will affect all of the CPUs in the system
> every time; now think about systems with hundreds of them.

Yes, the disadvantage is waking up all idle cpus when the value changes.
As to the multi-core concern, maybe a per cpu notifier would be better?
But that's another story for pm_qos...

So Rafael, any comments on this patch version?

Regards
Alex


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-19 21:43         ` Rafael J. Wysocki
  2017-01-20  8:35           ` Alex Shi
@ 2017-01-20 10:54           ` Daniel Lezcano
  2017-01-22  1:31             ` Alex Shi
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2017-01-20 10:54 UTC
  To: Rafael J. Wysocki
  Cc: Alex Shi, Greg Kroah-Hartman, Rafael J . Wysocki,
	Vincent Guittot, Linux PM, Linux Kernel Mailing List,
	Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

On Thu, Jan 19, 2017 at 10:43:23PM +0100, Rafael J. Wysocki wrote:

[ ... ]

> > This function is called from the notifier callback:
> >
> > static int cpuidle_latency_notify(struct notifier_block *b,
> >                 unsigned long l, void *v)
> >  {
> > -       wake_up_all_idle_cpus();
> > +       struct cpuidle_device *dev;
> > +       struct cpuidle_driver *drv;
> > +       int cpu;
> > +
> > +       cpuidle_pause_and_lock();
> > +       for_each_possible_cpu(cpu) {
> > +               dev = &per_cpu(cpuidle_dev, cpu);
> > +               drv = cpuidle_get_cpu_driver(dev);
> > +               cpuidle_set_latency(drv, dev, l);
> > +       }
> > +       cpuidle_resume_and_unlock();
> > +
> >         return NOTIFY_OK;
> >  }
> 
> The above may be problematic if the constraints change relatively
> often.  It is global and it will affect all of the CPUs in the system
> every time; now think about systems with hundreds of them.

Yeah, that could be problematic. The code snippet gives the general idea, but it
could be changed, for example, to use a flag telling the cpus to update their
state_count when they enter idle. Or something like that.

But if you think the patchset is fine, that is ok; we can improve things afterwards.

  -- Daniel



* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-20 10:54           ` Daniel Lezcano
@ 2017-01-22  1:31             ` Alex Shi
  2017-01-23 12:50               ` Daniel Lezcano
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Shi @ 2017-01-22  1:31 UTC
  To: Daniel Lezcano, Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Rafael J . Wysocki, Vincent Guittot,
	Linux PM, Linux Kernel Mailing List, Ulf Hansson,
	Rasmus Villemoes, Arjan van de Ven, Rik van Riel


> Yeah, that could be problematic. The code snippet gives the general idea, but
> it could be changed, for example, to use a flag telling the cpus to update
> their state_count when they enter idle. Or something like that.

Yes, this idea could be helpful.

But the idle path isn't a hot path, and a few memory accesses won't
cost a lot, so I doubt the benefit would be measurable.


>
> But if you think the patchset is fine, that is ok; we can improve things afterwards.
>


* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-22  1:31             ` Alex Shi
@ 2017-01-23 12:50               ` Daniel Lezcano
  2017-01-23 14:58                 ` Alex Shi
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Lezcano @ 2017-01-23 12:50 UTC
  To: Alex Shi
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Rafael J . Wysocki,
	Vincent Guittot, Linux PM, Linux Kernel Mailing List,
	Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

On Sun, Jan 22, 2017 at 09:31:44AM +0800, Alex Shi wrote:
> 
> >Yeah, that could be problematic. The code snippet gives the general idea, but
> >it could be changed, for example, to use a flag telling the cpus to update
> >their state_count when they enter idle. Or something like that.
> 
> Yes, this idea could be helpful.
> 
> But the idle path isn't a hot path, and a few memory accesses won't cost
> a lot, so I doubt the benefit would be measurable.

It won't be measurable, any more than reading the cpu device latency
before checking that the latency req is zero would be, but it makes sense.

The idle routine is not a hot path, but it is a very special place where
interrupts are disabled, RCU is not usable, the tick is disabled, etc.

Perhaps it is not a problem for the moment, but it is probably worth
mentioning that using APIs from other subsystems in the idle select path
could be problematic, and perhaps it is time to think about another
approach for the future.



* Re: [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
  2017-01-23 12:50               ` Daniel Lezcano
@ 2017-01-23 14:58                 ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-23 14:58 UTC
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Rafael J . Wysocki,
	Vincent Guittot, Linux PM, Linux Kernel Mailing List,
	Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel



On 01/23/2017 08:50 PM, Daniel Lezcano wrote:
> On Sun, Jan 22, 2017 at 09:31:44AM +0800, Alex Shi wrote:
>>
>>> Yeah, that could be problematic. The code snippet gives the general idea, but
>>> it could be changed, for example, to use a flag telling the cpus to update
>>> their state_count when they enter idle. Or something like that.
>>
>> Yes, this idea could be helpful.
>>
>> But the idle path isn't a hot path, and a few memory accesses won't cost
>> a lot, so I doubt the benefit would be measurable.
> 
> It won't be measurable, any more than reading the cpu device latency
> before checking that the latency req is zero would be, but it makes sense.

Simply changing the cpu state count like that may make it look unnatural. :)
> 
> The idle routine is not a hot path, but it is a very special place where
> interrupts are disabled, RCU is not usable, the tick is disabled, etc.
> 
> Perhaps it is not a problem for the moment, but it is probably worth
> mentioning that using APIs from other subsystems in the idle select path
> could be problematic, and perhaps it is time to think about another
> approach for the future.
> 

Yes, before going idle it already considers lots of things, and that
has included pm_qos for a long time... :)


* [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration
       [not found] <1483630187-29622-1-git-send-email-alex.shi@linaro.org>
@ 2017-01-05 15:29 ` Alex Shi
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Shi @ 2017-01-05 15:29 UTC
  To: Daniel Lezcano, Rafael J . Wysocki, vincent.guittot, open list
  Cc: linux-pm, Ulf Hansson, Rasmus Villemoes, Arjan van de Ven, Rik van Riel

The kernel or user may have special requirements on cpu response time,
e.g. if an interrupt is pinned to a cpu, we don't want that cpu to go
into too deep a sleep. This patch prevents that by considering the per
cpu resume_latency setting during sleep state selection in the menu
governor.

pm_qos_resume_latency asks the device to respond within the given time,
which for a cpu is roughly a C-state's entry_latency + exit_latency.
But since most cpu C-states either have no entry_latency or fold it
into exit_latency, we can treat this time requirement as the state's
exit_latency.

A pm_qos_resume_latency value of 0 means no limitation, according to
the ABI definition. So if a 0us latency requirement is really wanted,
set the value to 1 instead.
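
For instance (a hypothetical usage sketch): writing 1 requests the
strictest possible constraint, limiting the cpu to states with an exit
latency of at most 1us, i.e. the shallowest idle:

# echo 1 > /sys/devices/system/cpu/cpu0/power/pm_qos_resume_latency_us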

Signed-off-by: Alex Shi <alex.shi@linaro.org>
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/menu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 07e36bb..8d6d25c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -19,6 +19,7 @@
 #include <linux/tick.h>
 #include <linux/sched.h>
 #include <linux/math64.h>
+#include <linux/cpu.h>
 
 /*
  * Please note when changing the tuning values:
@@ -280,17 +281,23 @@ static unsigned int get_typical_interval(struct menu_device *data)
 static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
+	struct device *device = get_cpu_device(dev->cpu);
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	unsigned int interactivity_req;
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
+	int resume_latency = dev_pm_qos_read_value(device);
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
 		data->needs_update = 0;
 	}
 
+	/* a resume_latency of 0 means no restriction */
+	if (resume_latency && resume_latency < latency_req)
+		latency_req = resume_latency;
+
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
 		return 0;
-- 
2.8.1.101.g72d917a


End of thread.

Thread overview: 18+ messages
2017-01-12 13:27 [PATCH v2 0/3] per cpu resume latency Alex Shi
2017-01-12 13:27 ` [PATCH 1/3] cpuidle/menu: stop seeking deeper idle if current state is too deep Alex Shi
2017-01-12 13:27 ` [PATCH v2 2/3] cpu: expose pm_qos_resume_latency for each cpu Alex Shi
2017-01-17 10:23   ` Daniel Lezcano
2017-01-19  8:18     ` Alex Shi
2017-01-12 13:27 ` [PATCH 3/3] cpuidle/menu: add per cpu pm_qos_resume_latency consideration Alex Shi
2017-01-12 20:03   ` Rik van Riel
2017-01-16  1:11     ` Alex Shi
2017-01-17  9:38   ` Daniel Lezcano
2017-01-19  9:25     ` Alex Shi
2017-01-19 10:21       ` Daniel Lezcano
2017-01-19 21:43         ` Rafael J. Wysocki
2017-01-20  8:35           ` Alex Shi
2017-01-20 10:54           ` Daniel Lezcano
2017-01-22  1:31             ` Alex Shi
2017-01-23 12:50               ` Daniel Lezcano
2017-01-23 14:58                 ` Alex Shi
     [not found] <1483630187-29622-1-git-send-email-alex.shi@linaro.org>
2017-01-05 15:29 ` Alex Shi
