On Tue, 2016-02-23 at 14:29 +0000, Mel Gorman wrote:
> Added a suggested change from Doug Smythies and can add a Signed-off-by
> if Doug is ok with that.
>
> Changelog since v1
> o Remove divide that is likely unnecessary (dsmythies)
> o Rebase on top of linux-pm/linux-next
>
> The PID relies on samples of equal time but this does not apply for
> deferrable timers when the CPU is idle. intel_pstate checks if the actual
> duration between samples is large and if so, the "busyness" of the CPU
> is scaled.
>
> This assumes the delay was a deferred timer but a workload may simply have
> been idle for a short time if it's context switching between a server and
> client or waiting very briefly on IO. It's compounded by the problem that
> server/clients migrate between CPUs due to wake-affine trying to maximise
> hot cache usage. In such cases, the cores are not considered busy and the
> frequency is dropped prematurely.
>
> This patch increases the hold-off value before the busyness is scaled. It
> was selected based simply on testing until the desired result was found.
> Tests were conducted with workloads that are either client/server based
> or short-lived IO.

Attached is a SPECpower comparison for a Haswell EP (Grantley) server.
This workload ran for over an hour.

Difference in OPS: +1019
Difference in power: +308.6
Difference in perf/watt: -312.479023

So we are consuming about 308 W more on average to perform 1019 more
operations.

Thanks,
Srinivas
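The busyness scaling discussed in the quoted patch can be sketched roughly as below. This is a hypothetical C illustration, not the intel_pstate source: the names `scale_busy` and `HOLDOFF_FACTOR` are invented for the example, floating point is used only for readability (kernel code avoids it), and the thread does not state the new hold-off value, so the factor shown is a placeholder.

```c
#include <stdint.h>

/* Hypothetical hold-off multiplier (placeholder value): busyness is only
 * scaled when the gap between samples exceeds this many sample periods.
 * The quoted patch raises the hold-off, but its new value is not given here. */
#define HOLDOFF_FACTOR 3

/* Scale the busy percentage down when the interval since the last sample
 * was much longer than the nominal sample rate, on the assumption that the
 * CPU was idle and a deferrable timer fired late. */
static double scale_busy(double core_busy, int64_t duration_ns,
                         int64_t sample_rate_ns)
{
    if (duration_ns > sample_rate_ns * HOLDOFF_FACTOR) {
        /* Fraction of the interval that a normal sample would cover. */
        double ratio = (double)sample_rate_ns / (double)duration_ns;
        return core_busy * ratio;
    }
    /* Gap is within the hold-off window: treat busyness as measured. */
    return core_busy;
}
```

For example, with a 10 ms sample rate and a factor of 3, a 40 ms gap scales 100% busy down to 25%, while a 30 ms gap leaves it unscaled. Raising the factor lets short idle periods (client/server ping-pong, brief IO waits) pass without the busyness being discounted, at the cost of holding a higher frequency for longer.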