* [PATCH 0/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-08-19 12:18 UTC
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger

Hello,

I've observed performance degradation with various workloads on HP
ProLiant systems that use the pcc-cpufreq driver when comparing older
and more recent kernels. Bisection showed that commit 6393d6a102
(cpufreq: ondemand: Eliminate the deadband effect), which went into
v3.17, caused a significant performance drop.

I am not too familiar with the PCC stuff, but I think that eliminating
the deadband effect causes a significant increase in the number of
requested frequency changes, which in turn have to be served by
pcc-cpufreq and slow the affected systems down significantly.

AFAIK there is no frequency table for this driver; instead the driver
just asks firmware to set any requested frequency for a CPU (as long
as it's within the min-max range of allowed frequencies). Thus I think
the probability that a requested target frequency matches the current
frequency of a CPU is lower than for drivers that use a fixed set of
frequencies.

There are two exceptions:

(1) the CPU is under full load -- the maximum frequency is already set
    and the maximum frequency is requested.

(2) the CPU has just "minor load" -- the minimum frequency is already
    set and the minimum frequency is requested.

I think what commit 6393d6a102 changed is that case (2) now occurs
only if the CPU is fully idle, whereas before that commit it occurred
whenever

	load < freq_min/freq_max * 100

I suggest restoring the old behaviour (re-introducing the deadband
effect) for pcc-cpufreq with the patch that follows.
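
For reference, here is a minimal sketch of the mapping change as I
understand commit 6393d6a102 -- not the actual governor code, just the
arithmetic:

    /* Sketch only: how ondemand derives freq_next from a load of
     * 0..100.  Before commit 6393d6a102 (deadband; the result is
     * resolved upwards with CPUFREQ_RELATION_L): */
    static unsigned int freq_next_old(unsigned int load,
                                      unsigned int max_freq)
    {
        return load * max_freq / 100;
    }

    /* After commit 6393d6a102 (proportional to the min..max range,
     * so every non-zero load requests more than min_freq): */
    static unsigned int freq_next_new(unsigned int load,
                                      unsigned int min_freq,
                                      unsigned int max_freq)
    {
        return min_freq + load * (max_freq - min_freq) / 100;
    }

With the old formula, every load below freq_min/freq_max * 100 yields
a freq_next at or below min_freq and therefore ends up at min_freq;
that is the deadband the commit eliminated.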

Here are typical numbers from kernel compilation tests with a varying
number of compile jobs:

                     v4.8.0-rc2               4.8.0-rc2-pcc-cpufreq-deadband
 # of jobs    user     sys   elapsed   CPU     user     sys   elapsed   CPU
       2     440.39  116.49  4:33.35   203%   404.85  109.10  4:10.35   205%
       4     436.87  133.39  2:22.88   399%   381.83  128.00  2:06.84   401%
       8     475.49  157.68  1:22.24   769%   344.36  149.08  1:04.29   767%
      16     620.69  188.33  0:54.74  1477%   374.60  157.40  0:36.76  1447%
      32     815.79  209.58  0:37.22  2754%   490.46  160.22  0:24.87  2616%
      64     394.13   60.55  0:13.54  3355%   386.54   60.33  0:12.79  3493%
     120     398.24   61.55  0:14.60  3148%   390.44   61.19  0:13.07  3453%

As expected, under full load (64, 120 jobs) performance is almost
comparable. But with partial load (especially 32 jobs) the kernel
build takes only about 2/3 as long with the patched kernel as with
current mainline. Similar behaviour has been observable since the
introduction of commit 6393d6a102 (i.e. since v3.17).

Numbers are from an HP ProLiant DL580 Gen8 system:
	- Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
	- 60 CPUs, 128GB RAM

PCC info of this system

  PCCH header (virtual) addr: 0xffffc9000d840000
  PCCH header is at physical address: 0x7ac59000, signature: 0x24504343,
    length: 64 bytes, major: 1, minor: 0, supported features: 0x1,
    command field: 0x1, status field: 0x0, nominal latency: 300 us
    min time between commands: 50000 us, max time between commands: 1000000 us,
    nominal CPU frequency: 2800 MHz, minimum CPU frequency: 150 MHz,
    minimum CPU frequency without throttling: 1200 MHz

Kernel message when the driver loads:
  pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz

My patch adds a debug message, which on this system looks like:
  pcc-cpufreq: setting deadband limit to 1885000 kHz
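
For clarity, this value follows directly from the integer arithmetic
in the patch below (frequencies in kHz, divided by 1000 first):

	f_min = 1200000 / 1000 = 1200
	f_max = 2800000 / 1000 = 2800
	limit = (2800 - 1200) * 1200 / 2800 = 685    (integer division)
	limit = 685 * 1000 + 1200 * 1000 = 1885000 kHz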


Regards,

Andreas


* [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-08-19 12:21 UTC
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger


Commit 6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect)
introduced a performance regression for systems using pcc-cpufreq with
the ondemand governor. This is measurable with various workloads;
e.g., the wall-clock time for kernel compilation increased
significantly.

The elimination of the deadband effect significantly increased the
number of frequency changes with pcc-cpufreq.

Instead of reverting commit 6393d6a102, I suggest adding a workaround
in pcc-cpufreq that re-introduces the deadband effect for this driver
only, to restore the old performance behaviour of pcc-cpufreq with the
ondemand governor.

Below are typical numbers from kernel compilation tests with a varying
number of compile jobs, illustrating the effect of commit 6393d6a102
and the proposed fix:

                     v4.8.0-rc2               4.8.0-rc2-pcc-cpufreq-deadband
 # of jobs    user     sys   elapsed   CPU     user     sys   elapsed   CPU
       2     440.39  116.49  4:33.35   203%   404.85  109.10  4:10.35   205%
       4     436.87  133.39  2:22.88   399%   381.83  128.00  2:06.84   401%
       8     475.49  157.68  1:22.24   769%   344.36  149.08  1:04.29   767%
      16     620.69  188.33  0:54.74  1477%   374.60  157.40  0:36.76  1447%
      32     815.79  209.58  0:37.22  2754%   490.46  160.22  0:24.87  2616%
      64     394.13   60.55  0:13.54  3355%   386.54   60.33  0:12.79  3493%
     120     398.24   61.55  0:14.60  3148%   390.44   61.19  0:13.07  3453%

(HP ProLiant DL580 Gen8 system, 60 CPUs @ 2.80GHz)

Link: http://marc.info/?l=linux-pm&m=147160912625600
Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
---
 drivers/cpufreq/pcc-cpufreq.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

If this change is accepted, it might be a good idea to also tag it for
stable kernels, e.g. starting with v4.4.


Thanks,

Andreas


diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
index 3f0ce2a..d2e2963 100644
--- a/drivers/cpufreq/pcc-cpufreq.c
+++ b/drivers/cpufreq/pcc-cpufreq.c
@@ -200,10 +200,26 @@ static int pcc_cpufreq_target(struct cpufreq_policy *policy,
 {
 	struct pcc_cpu *pcc_cpu_data;
 	struct cpufreq_freqs freqs;
+	static u32 limit = 0;
+
 	u16 status;
 	u32 input_buffer;
 	int cpu;
 
+	if (!limit) {
+		u32 f_min = policy->cpuinfo.min_freq / 1000;
+		u32 f_max = policy->cpuinfo.max_freq / 1000;
+		limit = (f_max - f_min) * f_min;
+		limit /= f_max;
+		limit *= 1000;
+		limit += f_min * 1000;
+		pr_debug("pcc-cpufreq: setting deadband limit to %u kHz\n",
+			limit);
+	}
+
+	if (target_freq < limit)
+		target_freq = policy->min;
+
 	cpu = policy->cpu;
 	pcc_cpu_data = per_cpu_ptr(pcc_cpu_info, cpu);
 
@@ -214,6 +230,10 @@ static int pcc_cpufreq_target(struct cpufreq_policy *policy,
 
 	freqs.old = policy->cur;
 	freqs.new = target_freq;
+
+	if (freqs.new == freqs.old)
+		return 0;
+
 	cpufreq_freq_transition_begin(policy, &freqs);
 	spin_lock(&pcc_lock);
 
-- 
2.1.4


* Re: [PATCH 0/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-08-19 12:40 UTC
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger

On Fri, Aug 19, 2016 at 02:18:14PM +0200, Andreas Herrmann wrote:
> Hello,
> 
> I've observed performance degradation with different workloads on HP
> ProLiant systems that use pcc-cpufreq driver between older and more
> recent kernels. Bisection showed that commit 6393d6a102 (cpufreq:
> ondemand: Eliminate the deadband effect) caused a significant
> performance drop. This patch was introduced in v3.17.
> 
> I am not too familiar with the PCC stuff but I think that elimination
> of the deadband effect causes a significant increase of requested
> frequency changes which in turn will have to be served by pcc-cpufreq
> and slow down corresponding systems significantly.
> 
> AFAIK there is no frequency table for this driver, instead the driver
> just asks firmware to set any requested frequency for a CPU (if its in
> the min-max-range of allowed frequencies). Thus I think the
> probability that a requested target frequency is matching the current
> frequency of a CPU is lower in comparison to drivers that use a
> fixed set of frequencies.
> 
> This is with two exceptions:
> 
> (1) when a CPU is under full load -- maximum frequency already set and
>     maximum frequency is requested.
> 
> (2) CPU has just "minor load" -- minimum frequency already set and
>     minimum frequency is requested.
> 
> I think what commit 6393d6a102 caused is, that case (2) occurs only if
> the CPU is fully idle. Whereas before commit 6393d6a102 this case
> occurred in all situations when
> 
> 	load < freq_min/freq_max * 100
> 
> I suggest to introduce the old behaviour (re-introduce the deadband effect)
> for pcc-cpufreq with the patch that follows.
> 
> Here are typical numbers from kernel compilation tests with varying
> number of compile jobs:
> 
>                      v4.8.0-rc2               4.8.0-rc2-pcc-cpufreq-deadband
 >  # of jobs    user     sys   elapsed   CPU     user     sys   elapsed   CPU
>        2     440.39  116.49  4:33.35   203%   404.85  109.10  4:10.35   205%
>        4     436.87  133.39  2:22.88   399%   381.83  128.00  2:06.84   401%
>        8     475.49  157.68  1:22.24   769%   344.36  149.08  1:04.29   767%
>       16     620.69  188.33  0:54.74  1477%   374.60  157.40  0:36.76  1447%
>       32     815.79  209.58  0:37.22  2754%   490.46  160.22  0:24.87  2616%
>       64     394.13   60.55  0:13.54  3355%   386.54   60.33  0:12.79  3493%
>      120     398.24   61.55  0:14.60  3148%   390.44   61.19  0:13.07  3453%
> 
> As expected under full load (64, 120 jobs) performance is "almost"
> comparable. But with partial load (esp. 32 jobs) length of kernel
> build is just 2/3 with patched kernel in comparison to current
> mainline. Similar behaviour is observable since introduction of commit
> 6393d6a102 (ie. since v3.17).
> 
> Numbers are from an HP ProLiant DL580 Gen8 system:
>  	- Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
>         - 60 CPUs, 128GB RAM
> 
> PCC info of this system
> 
>   PCCH header (virtual) addr: 0xffffc9000d840000
>   PCCH header is at physical address: 0x7ac59000, signature: 0x24504343,
>     length: 64 bytes, major: 1, minor: 0, supported features: 0x1,
>     command field: 0x1, status field: 0x0, nominal latency: 300 us
>     min time between commands: 50000 us, max time between commands: 1000000 us,
>     nominal CPU frequency: 2800 MHz, minimum CPU frequency: 150 MHz,
>     minimum CPU frequency without throttling: 1200 MHz
> 
> Kernel message when driver loads:
>   pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
> 
> My patch adds a debug message, which on this system looks like
>   pcc-cpufreq: setting deadband limit to 1885000 kHz

BTW, I forgot to point out that my proposed change does not fully
restore the old behaviour (before v3.17).

The difference is that the combination of commit 6393d6a102 and my
change still maps frequencies previously requested from the range
[min_freq,max_freq] to the range [limit,max_freq].

So on the above system pcc-cpufreq will only request frequencies that
are either 1200 MHz or in the range [1885 MHz, 2800 MHz].


Andreas


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Viresh Kumar @ 2016-08-29  6:01 UTC
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On 19-08-16, 14:21, Andreas Herrmann wrote:
> 
> Commit 6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect)
> introduced a performance regression for systems using pcc-cpufreq and
> ondemand governor. This is measurable with different workloads. E.g.
> wall-clock time for kernel compilation significantly increased.
> 
> The elimination of the deadband effect significantly increased the
> number of frequency changes with pcc-cpufreq.
> 
> Instead of reverting commit 6393d6a102 I suggest to add a workaround
> in pcc-cpufreq to re-introduce the deadband effect for this driver
> only - to restore the old performance behaviour with pcc-cpufreq with
> ondemand governor.
> 
> Following some performance numbers for similar kernel compilations to
> illustrate the effect of commit 6393d6a102 and the proposed fix.
> 
> Following typical numbers of kernel compilation tests with varying number of
> compile jobs:
> 
>                      v4.8.0-rc2               4.8.0-rc2-pcc-cpufreq-deadband
 >  # of jobs    user     sys   elapsed   CPU     user     sys   elapsed   CPU
>        2     440.39  116.49  4:33.35   203%   404.85  109.10  4:10.35   205%
>        4     436.87  133.39  2:22.88   399%   381.83  128.00  2:06.84   401%
>        8     475.49  157.68  1:22.24   769%   344.36  149.08  1:04.29   767%
>       16     620.69  188.33  0:54.74  1477%   374.60  157.40  0:36.76  1447%
>       32     815.79  209.58  0:37.22  2754%   490.46  160.22  0:24.87  2616%
>       64     394.13   60.55  0:13.54  3355%   386.54   60.33  0:12.79  3493%
>      120     398.24   61.55  0:14.60  3148%   390.44   61.19  0:13.07  3453%
> 
> (HP ProLiant DL580 Gen8 system, 60 CPUs @ 2.80GHz)
> 
> Link: http://marc.info/?l=linux-pm&m=147160912625600
> Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
> ---
>  drivers/cpufreq/pcc-cpufreq.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> If this change is accepted maybe it's a good idea to tag it also for
> stable kernels, e.g. starting with v4.4.

I am _really_ worried about such hacks in drivers to negate the effect
of a patch that was actually good.

Did you try increasing the sampling period of the ondemand governor to
see if that helps without this patch?

Also, it is important to understand why the performance is going down
when the original commit should have made it better. Is it only about
more transitions?

-- 
viresh


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-09-01 13:21 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> On 19-08-16, 14:21, Andreas Herrmann wrote:
> > 
> > Commit 6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect)
> > introduced a performance regression for systems using pcc-cpufreq and
> > ondemand governor. This is measurable with different workloads. E.g.
> > wall-clock time for kernel compilation significantly increased.
> > 
> > The elimination of the deadband effect significantly increased the
> > number of frequency changes with pcc-cpufreq.
> > 
> > Instead of reverting commit 6393d6a102 I suggest to add a workaround
> > in pcc-cpufreq to re-introduce the deadband effect for this driver
> > only - to restore the old performance behaviour with pcc-cpufreq with
> > ondemand governor.
> > 
> > Following some performance numbers for similar kernel compilations to
> > illustrate the effect of commit 6393d6a102 and the proposed fix.
> > 
> > Following typical numbers of kernel compilation tests with varying number of
> > compile jobs:
> > 
> >                      v4.8.0-rc2               4.8.0-rc2-pcc-cpufreq-deadband
> >  # of jobs    user     sys   elapsed   CPU     user     sys   elapsed   CPU
> >        2     440.39  116.49  4:33.35   203%   404.85  109.10  4:10.35   205%
> >        4     436.87  133.39  2:22.88   399%   381.83  128.00  2:06.84   401%
> >        8     475.49  157.68  1:22.24   769%   344.36  149.08  1:04.29   767%
> >       16     620.69  188.33  0:54.74  1477%   374.60  157.40  0:36.76  1447%
> >       32     815.79  209.58  0:37.22  2754%   490.46  160.22  0:24.87  2616%
> >       64     394.13   60.55  0:13.54  3355%   386.54   60.33  0:12.79  3493%
> >      120     398.24   61.55  0:14.60  3148%   390.44   61.19  0:13.07  3453%
> > 
> > (HP ProLiant DL580 Gen8 system, 60 CPUs @ 2.80GHz)
> > 
> > Link: http://marc.info/?l=linux-pm&m=147160912625600
> > Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
> > ---
> >  drivers/cpufreq/pcc-cpufreq.c | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> > 
> > If this change is accepted maybe it's a good idea to tag it also for
> > stable kernels, e.g. starting with v4.4.

> I am _really_ worried about such hacks in drivers to negate the effect of a
> patch, that was actually good.

> Did you try to increase the sampling period of ondemand governor to see if that
> helps without this patch.

With an older kernel I modified the transition_latency of the driver,
which in turn is used to calculate the sampling rate.

I started with the value returned as "nominal latency" for PCC. This
was 300000 ns on the test system and made things worse. I've tested
other values as well until I found a local optimum at 45000 ns, but
performance was lower than when I applied my hack.

> Also, it is important to understand why is the performance going
> down, while the original commit should have made it better.

My understanding is that the original commit was tested with certain
combinations of hardware and cpufreq drivers, and the claim was that
for those (two?) tested combinations performance increased and power
consumption was lower. So I am not so sure what to expect from all
other cpufreq-driver/hardware combinations.

> Is it only about more transitions ?

I think this is the main issue.

In an older kernel version I activated/added debug output in
__cpufreq_driver_target(). Of course this creates a huge number of
messages. But with the original patch reverted it looked like:

[   29.489677] cpufreq: target for CPU 0: 1760000 kHz (1200000 kHz), relation 2, requested 1760000 kHz
[   29.570364] cpufreq: target for CPU 0: 1216000 kHz (1760000 kHz), relation 2, requested 1216000 kHz
[   29.571055] cpufreq: target for CPU 1: 1200000 kHz (1148000 kHz), relation 0, requested 1200000 kHz
[   29.571483] cpufreq: target for CPU 1: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
[   29.572042] cpufreq: target for CPU 2: 1200000 kHz (1064000 kHz), relation 0, requested 1200000 kHz
[   29.572503] cpufreq: target for CPU 2: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz

A lot of output, but the system could still handle it and booted to
the prompt.

With the original patch applied the system was really flooded and
eventually became unresponsive:

** 459 printk messages dropped ** [   29.838689] cpufreq: target for CPU 43: 1408000 kHz (2384000 kHz), relation 2, requested 1408000 kHz
** 480 printk messages dropped ** [   29.993849] cpufreq: target for CPU 54: 1200000 kHz (1248000 kHz), relation 2, requested 1200000 kHz
** 413 printk messages dropped ** [   30.113921] cpufreq: target for CPU 59: 2064000 kHz (1248000 kHz), relation 2, requested 2064000 kHz
** 437 printk messages dropped ** [   30.245846] cpufreq: target for CPU 21: 1296000 kHz (1296000 kHz), relation 2, requested 1296000 kHz
** 435 printk messages dropped ** [   30.397748] cpufreq: target for CPU 13: 1280000 kHz (2640000 kHz), relation 2, requested 1280000 kHz
** 480 printk messages dropped ** [   30.541846] cpufreq: target for CPU 58: 2112000 kHz (1632000 kHz), relation 2, requested 2112000 kHz


Of course, ideas for further debugging, or suggestions to fix the
issue in another way, are welcome.


Thanks,

Andreas


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Viresh Kumar @ 2016-09-07  5:02 UTC
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On 01-09-16, 15:21, Andreas Herrmann wrote:
> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:

> > I am _really_ worried about such hacks in drivers to negate the effect of a
> > patch, that was actually good.
> 
> > Did you try to increase the sampling period of ondemand governor to see if that
> > helps without this patch.
> 
> With an older kernel I've modified transition_latency of the driver
> which in turn is used to calculate the sampling rate.

Naah, that isn't what I was looking for, sorry :(

To explain it a bit more, this is what the patch did.

Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
sampling rate and system load tries to change the frequency of the
underlying hardware and select one of those.

Before the original patch came in, F2 and F3 were never getting
selected and the system was stuck in F1 for a long time, which
decreases performance for that period as we really should have
switched to a higher frequency.

With the new patch we switch to a frequency proportional to the
current load.

The requests made by the cpufreq governor wouldn't have changed at all
with that, but the number of frequency changes at the hardware level
may actually change, as we might have been returning very quickly
whenever the target frequency evaluated to F1 in the earlier case.

That may lead to switching to frequencies F2 and F3, which wasn't
happening earlier.

I don't think the result of that should be too bad. We should have
performed better almost everywhere.

transition_latency is used only to initialize the sampling rate, but
that may be modified from userspace later on. Please tell us the value
read from the sampling_rate sysfs file present in:

/sys/devices/system/cpu/cpufreq/policy0/ondemand/

And try playing with that variable a bit to see if you can make things
better or not.

> I started with the value returned as "nominal latency" for PCC. This
> was 300000 ns on the test system and made things worse. I've tested
> other values as well until I found a local optimum at 45000 ns, but
> performance was lower than when I applied my hack.

Can you try using the kernel tracer (ftrace) to see how the
frequencies are getting changed and at what rate?

We need to understand the problem better, as I am not 100% sure what's
going on right now.

> > Also, it is important to understand why is the performance going
> > down, while the original commit should have made it better.
> 
> My understanding is that the original commit was tested with certain
> combinations of hardware and cpufreq-drivers and the claim was that
> for those (two?) tested combinations performance increased and power
> consumption was lower. So I am not so sure what to expect from all
> other cpufreq-driver/hardware combinations.

It was principally the right thing to do IMO. And I don't think any
other hardware should be affected badly. At most, the tuning needs to
be made a bit better.

> > Is it only about more transitions ?
> 
> I think this is the main issue.

Then it can be controlled with the sampling rate from userspace.

> In an older kernel version I activated/added debug output in
> __cpufreq_driver_target(). Of course this creates a huge amount of
> messages. But with original patch reverted it was like:
> 
> [   29.489677] cpufreq: target for CPU 0: 1760000 kHz (1200000 kHz), relation 2, requested 1760000 kHz
> [   29.570364] cpufreq: target for CPU 0: 1216000 kHz (1760000 kHz), relation 2, requested 1216000 kHz
> [   29.571055] cpufreq: target for CPU 1: 1200000 kHz (1148000 kHz), relation 0, requested 1200000 kHz
> [   29.571483] cpufreq: target for CPU 1: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
> [   29.572042] cpufreq: target for CPU 2: 1200000 kHz (1064000 kHz), relation 0, requested 1200000 kHz
> [   29.572503] cpufreq: target for CPU 2: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz

Your platform is a bit special as it uses the ->target() callback and
not ->target_index(), so you can switch to pretty much any frequency.

Can you please read the values of all the sysfs files present in the
governor directory? That would be helpful. Maybe we can play with some
more files, like up_threshold, to see what the results are.

> a lot of stuff, but system could still handle it and booted to the
> prompt.
> 
> With the original patch applied the system was really flooded and
> eventually became unresponsive:
> 
> ** 459 printk messages dropped ** [   29.838689] cpufreq: target for CPU 43: 1408000 kHz (2384000 kHz), relation 2, requested 1408000 kHz
> ** 480 printk messages dropped ** [   29.993849] cpufreq: target for CPU 54: 1200000 kHz (1248000 kHz), relation 2, requested 1200000 kHz
> ** 413 printk messages dropped ** [   30.113921] cpufreq: target for CPU 59: 2064000 kHz (1248000 kHz), relation 2, requested 2064000 kHz
> ** 437 printk messages dropped ** [   30.245846] cpufreq: target for CPU 21: 1296000 kHz (1296000 kHz), relation 2, requested 1296000 kHz
> ** 435 printk messages dropped ** [   30.397748] cpufreq: target for CPU 13: 1280000 kHz (2640000 kHz), relation 2, requested 1280000 kHz
> ** 480 printk messages dropped ** [   30.541846] cpufreq: target for CPU 58: 2112000 kHz (1632000 kHz), relation 2, requested 2112000 kHz

This looks even more dangerous :)

-- 
viresh


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-09-13 10:53 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> On 01-09-16, 15:21, Andreas Herrmann wrote:
> > On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> 
> > > I am _really_ worried about such hacks in drivers to negate the effect of a
> > > patch, that was actually good.
> > 
> > > Did you try to increase the sampling period of ondemand governor to see if that
> > > helps without this patch.
> > 
> > With an older kernel I've modified transition_latency of the driver
> > which in turn is used to calculate the sampling rate.
> 
> Naah, that isn't what I was looking for, sorry :(
> 
> To explain it a bit more, this is what the patch did.
> 
> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
> sampling rate and system load tries to change the frequency of the
> underlying hardware and select one of those.
> 
> Before the original patch came in, F2 and F3 were never getting
> selected and the system was stuck in F1 for a long time. Which will
> decrease the performance for that period of time as we should have
> switched to a higher frequency really.
> 
> With the new patch we switch to the frequency proportional to current
> load.
> 
> The requests made from cpufreq-governor wouldn't have changed at all
> with that, but the number of freq changes at hardware level may
> actually change as we might be returning very quickly if the target
> freq evaluated to F1 in the earlier case.
> 
> That may lead to switching to frequencies F2 and F3, which wasn't
> happening earlier.
> 
> I don't think the result of that should be too bad.. We should have
> performed better almost everywhere.
> 
> transition_latency is used only to initialize sampling rate, but that
> may get modified from userspace later on. Please tell us the value
> read from sysfs file sampling_rate present in:
> 
> /sys/devices/system/cpu/cpufreq/policy0/ondemand/

/sys/devices/system/cpu/cpufreq/ondemand contains the following values:

ignore_nice_load           : 0
io_is_busy                 : 1
min_sampling_rate          : 10000
powersave_bias             : 0
up_threshold               : 95
sampling_rate              : 10000
sampling_down_factor       : 1

FYI, for each CPU the directories /sys/devices/system/cpu/cpufreq/policy* look like this:

scaling_cur_freq           : 1200000
cpuinfo_max_freq           : 2800000
cpuinfo_transition_latency : 0
cpuinfo_min_freq           : 1200000
cpuinfo_cur_freq           : 1204000
affected_cpus              : 0
scaling_governor           : ondemand
scaling_driver             : pcc-cpufreq
scaling_max_freq           : 2800000
related_cpus               : 0
scaling_setspeed           : <unsupported>
scaling_available_governors: ondemand performance 
scaling_min_freq           : 1200000

(which is for policy0 for CPU0)

> And try playing with that variable a bit to see if you can make things
> better or not.

Done that. Below are the elapsed times with the ondemand governor and
different values for the sampling rate (for each of 4.8-rc5 vanilla,
modified with my patch, and with commit 6393d6a reverted).

4.8-rc5 (vanilla)
                                                 sample_rate
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     283.47   203.00  273.79   204.00  276.08   204.00  276.99   204.00  281.80   203.60  291.00   203.00 
4     146.92   399.20  140.98   399.80  141.55   399.80  140.91   400.00  142.91   400.00  146.49   399.80 
8      81.66   770.00   77.44   769.20   76.81   769.20   73.74   769.20   73.17   767.20   72.75   767.00 
16     54.31  1466.20   49.58  1458.40   48.37  1456.60   44.29  1447.80   42.89  1436.40   41.12  1416.80
32     36.71  2720.60   30.70  2672.80   29.22  2578.80   26.34  2563.20   25.57  2506.80   24.58  2457.80
64     13.99  3291.20   13.63  3295.40   13.55  3309.20   13.30  3354.60   13.66  3264.40   13.55  3288.60
120    14.84  3114.20   14.10  3215.00   14.57  3115.20   14.11  3209.80   14.17  3198.60   14.09  3220.80
------------------------------
4.8-rc5 re-intro deadband for pcc-cpufreq only (my patch)
                                                 sample_rate
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     270.88   192.00  270.87   192.80  270.65   192.00  274.08   192.00  277.97   192.80  289.85   193.00 
4     140.23   379.20  138.94   379.80  138.20   378.80  138.99   379.20  140.81   379.60  145.83   380.40 
8      76.15   738.20   73.86   733.60   71.58   734.20   70.43   730.40   70.32   730.60   71.14   728.20 
16     47.39  1414.80   44.20  1395.60   43.02  1384.60   41.09  1368.00   40.37  1362.80   39.56  1351.20
32     32.78  2659.80   29.05  2531.80   27.18  2482.40   25.36  2471.60   24.55  2441.00   24.18  2365.80
64     13.34  3213.20   13.29  3204.60   13.38  3158.60   13.30  3162.40   13.57  3089.60   13.48  3125.20
120    13.89  3103.00   14.00  3047.40   13.87  3068.40   13.83  3077.80   13.93  3054.80   14.10  3033.60
------------------------------
4.8-rc5 reverted commit 6393d6a
                                                 sample_rate
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     250.94   205.00  256.26   205.00  261.29   205.00  268.92   205.00  275.60   204.80  288.64   204.00 
4     127.20   401.20  129.63   401.80  131.80   402.00  135.41   401.20  138.88   401.20  144.96   401.00 
8      64.24   767.60   64.18   769.60   64.49   770.20   65.69   766.00   66.38   768.60   68.64   767.40 
16     37.02  1438.00   36.09  1432.20   36.10  1429.40   36.10  1424.60   36.61  1408.80   36.87  1405.60
32     24.73  2618.80   23.28  2533.80   23.00  2513.40   22.45  2526.00   22.92  2463.00   23.12  2422.20
64     12.62  3550.60   12.94  3439.60   12.95  3425.60   12.94  3421.80   13.29  3336.80   13.65  3257.80
120    13.30  3408.80   13.43  3346.40   13.44  3347.80   13.55  3322.60   13.76  3277.20   14.19  3192.40

> > I started with the value returned as "nominal latency" for PCC. This
> > was 300000 ns on the test system and made things worse. I've tested
> > other values as well until I found a local optimum at 45000 ns, but
> > performance was lower than when I applied my hack.
> 
> Can you try to use kernel tracer (ftrace) and see how the frequencies
> are getting changed and at what frequency.

I am currently looking into this ...

> We need to understand the problem better, as I am not 100% sure what's
> going on right now.

... to gather related data.

> > > Also, it is important to understand why is the performance going
> > > down, while the original commit should have made it better.
> > 
> > My understanding is that the original commit was tested with certain
> > combinations of hardware and cpufreq-drivers and the claim was that
> > for those (two?) tested combinations performance increased and power
> > consumption was lower. So I am not so sure what to expect from all
> > other cpufreq-driver/hardware combinations.
> 
> It was principally the right thing to do IMO. And I don't think any
> other hardware should get affected badly. At the max, the tuning needs
> to be made a bit better.
> 
> > > Is it only about more transitions ?
> > 
> > I think this is the main issue.
> 
> Then it can be controlled with sampling rate from userspace.
> 
> > In an older kernel version I activated/added debug output in
> > __cpufreq_driver_target(). Of course this creates a huge amount of
> > messages. But with original patch reverted it was like:
> > 
> > [   29.489677] cpufreq: target for CPU 0: 1760000 kHz (1200000 kHz), relation 2, requested 1760000 kHz
> > [   29.570364] cpufreq: target for CPU 0: 1216000 kHz (1760000 kHz), relation 2, requested 1216000 kHz
> > [   29.571055] cpufreq: target for CPU 1: 1200000 kHz (1148000 kHz), relation 0, requested 1200000 kHz
> > [   29.571483] cpufreq: target for CPU 1: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
> > [   29.572042] cpufreq: target for CPU 2: 1200000 kHz (1064000 kHz), relation 0, requested 1200000 kHz
> > [   29.572503] cpufreq: target for CPU 2: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
> 
> Your platform is a bit special as it uses ->target() callback and not
> ->target_index(). And so you can pretty much switch to any frequency.
> 
> Can you please read value of all the sysfs files present in the
> governor directory? That would be helpful. Maybe we can play with some
> more files like: up_threshold to see what the results are.

For sysfs values see above. I've not yet played with up_threshold but
plan to do this as well.

> > a lot of stuff, but system could still handle it and booted to the
> > prompt.
> > 
> > With the original patch applied the system was really flooded and
> > eventually became unresponsive:
> > 
> > ** 459 printk messages dropped ** [   29.838689] cpufreq: target for CPU 43: 1408000 kHz (2384000 kHz), relation 2, requested 1408000 kHz
> > ** 480 printk messages dropped ** [   29.993849] cpufreq: target for CPU 54: 1200000 kHz (1248000 kHz), relation 2, requested 1200000 kHz
> > ** 413 printk messages dropped ** [   30.113921] cpufreq: target for CPU 59: 2064000 kHz (1248000 kHz), relation 2, requested 2064000 kHz
> > ** 437 printk messages dropped ** [   30.245846] cpufreq: target for CPU 21: 1296000 kHz (1296000 kHz), relation 2, requested 1296000 kHz
> > ** 435 printk messages dropped ** [   30.397748] cpufreq: target for CPU 13: 1280000 kHz (2640000 kHz), relation 2, requested 1280000 kHz
> > ** 480 printk messages dropped ** [   30.541846] cpufreq: target for CPU 58: 2112000 kHz (1632000 kHz), relation 2, requested 2112000 kHz
> 
> This looks even more dangerous :)
> 
> -- 
> viresh


Thanks,

Andreas


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-09-14 14:56 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> On 01-09-16, 15:21, Andreas Herrmann wrote:

  ---8<---

> > I started with the value returned as "nominal latency" for PCC. This
> > was 300000 ns on the test system and made things worse. I've tested
> > other values as well until I found a local optimum at 45000 ns, but
> > performance was lower than when I applied my hack.
> 
> Can you try to use kernel tracer (ftrace) and see how the frequencies
> are getting changed and at what frequency.

Below is some trace data. I hope it is of some help.

(A) - 10s sampling period while the system is idle
(B) - 10s sampling period while the system is partially loaded (kernel
      compilation using 2 jobs)

(1) 4.8-rc5
(2) 4.8-rc5 with my patch (re-introduction of the deadband effect
    within pcc-cpufreq)
(3) 4.8-rc5 with commit 6393d6a102 (cpufreq: ondemand: Eliminate the
    deadband effect) reverted

Let me know whether you are looking for other trace data regarding
this issue.


Thanks,

Andreas

---

(A)-(1)

 # Total Lost Samples: 0
 # Samples: 41  of event 'power:cpu_frequency'
 # Event count (approx.): 41
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
     39.02%  kworker/14:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     29.27%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
     19.51%  kworker/10:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      7.32%  kworker/5:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
      2.44%  kworker/23:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
      2.44%  kworker/40:1  [kernel.vmlinux]  [k] cpufreq_notify_transition

(A)-(2)

 # Total Lost Samples: 0
 # Samples: 6  of event 'power:cpu_frequency'
 # Event count (approx.): 6
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
     33.33%  kworker/1:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
     16.67%  kworker/16:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     16.67%  kworker/22:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     16.67%  kworker/26:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     16.67%  kworker/33:1  [kernel.vmlinux]  [k] cpufreq_notify_transition

(A)-(3)

 # Total Lost Samples: 0
 # Samples: 7  of event 'power:cpu_frequency'
 # Event count (approx.): 7
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
     28.57%  kworker/58:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     14.29%  kworker/19:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
     14.29%  kworker/20:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
     14.29%  kworker/22:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
     14.29%  kworker/23:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
     14.29%  kworker/35:1  [kernel.vmlinux]  [k] cpufreq_notify_transition

---

(B)-(1)

 # Total Lost Samples: 0
 # Samples: 2K of event 'power:cpu_frequency'
 # Event count (approx.): 2382
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
      5.75%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.16%  kworker/12:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
      3.11%  kworker/17:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      2.94%  kworker/2:1   [kernel.vmlinux]  [k] cpufreq_notify_transition
      2.73%  kworker/19:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      ...

(B)-(2)

 # Total Lost Samples: 0
 # Samples: 320  of event 'power:cpu_frequency'
 # Event count (approx.): 320
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
      4.69%  kworker/56:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.06%  kworker/12:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.06%  kworker/28:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.06%  kworker/6:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
      3.75%  kworker/32:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
      ...

(B)-(3)

 # Total Lost Samples: 0
 # Samples: 333  of event 'power:cpu_frequency'
 # Event count (approx.): 333
 # Overhead  Command       Shared Object     Symbol                       
 # ........  ............  ................  .............................
      4.80%  kworker/51:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.50%  kworker/39:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      4.20%  kworker/47:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      3.90%  kworker/59:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
      3.90%  kworker/7:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
      ...

---

With (1), pcc-cpufreq tries to set pretty much every frequency even
when the system is idle; e.g., here is the start of the perf script
output from (A)-(1):

     kworker/5:2   904 [005]   255.778343: power:cpu_frequency: state=1216000 cpu_id=5
     kworker/5:2   904 [005]   255.794382: power:cpu_frequency: state=1264000 cpu_id=5
     kworker/5:2   904 [005]   256.102400: power:cpu_frequency: state=1200000 cpu_id=5
    kworker/10:1   171 [010]   258.010362: power:cpu_frequency: state=2224000 cpu_id=10
    kworker/10:1   171 [010]   258.026366: power:cpu_frequency: state=1264000 cpu_id=10
    kworker/10:1   171 [010]   258.594514: power:cpu_frequency: state=1200000 cpu_id=10
    kworker/10:1   171 [010]   258.618417: power:cpu_frequency: state=1232000 cpu_id=10
    kworker/10:1   171 [010]   258.634409: power:cpu_frequency: state=1264000 cpu_id=10
    kworker/10:1   171 [010]   258.674467: power:cpu_frequency: state=1200000 cpu_id=10
    kworker/10:1   171 [010]   258.730486: power:cpu_frequency: state=1216000 cpu_id=10
    kworker/40:1   388 [040]   258.730999: power:cpu_frequency: state=1200000 cpu_id=40
    kworker/23:2   775 [023]   258.731504: power:cpu_frequency: state=1200000 cpu_id=23
    kworker/14:1   178 [014]   258.732013: power:cpu_frequency: state=1216000 cpu_id=14
    kworker/10:1   171 [010]   258.906434: power:cpu_frequency: state=1200000 cpu_id=10
    kworker/14:1   178 [014]   258.970500: power:cpu_frequency: state=1200000 cpu_id=14
    kworker/14:1   178 [014]   258.998440: power:cpu_frequency: state=1232000 cpu_id=14
    kworker/14:1   178 [014]   259.034490: power:cpu_frequency: state=1200000 cpu_id=14
    kworker/14:1   178 [014]   259.095089: power:cpu_frequency: state=1216000 cpu_id=14
    kworker/14:1   178 [014]   259.270470: power:cpu_frequency: state=1200000 cpu_id=14
    ...


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-09-16  9:47 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> On 01-09-16, 15:21, Andreas Herrmann wrote:
> > On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:

> > > I am _really_ worried about such hacks in drivers to negate the effect of a
> > > patch, that was actually good.
> > 
> > > Did you try to increase the sampling period of ondemand governor to see if that
> > > helps without this patch.
> > 
> > With an older kernel I've modified transition_latency of the driver
> > which in turn is used to calculate the sampling rate.

> Naah, that isn't what I was looking for, sorry :(

> To explain it a bit more, this is what the patch did.

> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
> sampling rate and system load tries to change the frequency of the
> underlying hardware and select one of those.

> Before the original patch came in, F2 and F3 were never getting
> selected and the system was stuck in F1 for a long time.

I don't think this is a general statement; such behaviour is not
common to all systems. Before commit 6393d6a the target frequency was
based on

   freq_next = load * policy->cpuinfo.max_freq / 100;

F2 would have been selected if

  load = F2 * 100 / F7

If F2 was not seen, it can mean

(1) either the load value was not hit in practice during monitoring of
    a certain workload

(2) or the calculated load value (in integer representation) would
    select F1 or F3 (there is no corresponding integer value that
    would select F2)

E.g. for the Intel i7-3770 system mentioned in the commit message of
6393d6a, I think a load value of 49 should have selected 1700000,
which is not shown in the provided frequency table.
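
(Checking the arithmetic for that example: the i7-3770 has a maximum
frequency of 3400000 kHz, so a load of 49 gives freq_next =
49 * 3400000 / 100 = 1666000 kHz, and CPUFREQ_RELATION_L -- lowest
frequency at or above the target -- rounds that up to 1700000 kHz.)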

What essentially changed was how load values are mapped to target
frequencies. For the HP system (min_freq=1200000, max_freq=2800000)
that I used in my tests, the old code would create the following
mapping:

load | freq_next | used target frequency
________________________________________
0      0            1200000
10     280000       1200000
20     560000       1200000
30     840000       1200000
40     1120000      1200000
42     1176000      1200000
43     1204000      1204000
50     1400000      1400000
60     1680000      1680000
70     1960000      1960000
80     2240000      2240000
90     2520000      2520000
100    2800000      2800000

The new code (introduced with commit 6393d6a) changed the mapping as
follows:

load | freq_next | used target frequency
________________________________________
0      1200000      1200000
10     1360000	    1360000
20     1520000	    1520000
30     1680000	    1680000
40     1840000	    1840000
42     1872000	    1872000
43     1888000	    1888000
50     2000000	    2000000
60     2160000	    2160000
70     2320000	    2320000
80     2480000	    2480000
90     2640000	    2640000
100    2800000	    2800000

My patch creates a third mapping. It basically ensures that up to a
load value of 42 the minimum frequency is used.
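
For completeness, the resulting third mapping (derived from the new
formula plus the 1885000 kHz limit computed by my patch) is:

load | freq_next | used target frequency
________________________________________
0      1200000      1200000
10     1360000      1200000
20     1520000      1200000
30     1680000      1200000
40     1840000      1200000
42     1872000      1200000
43     1888000      1888000
50     2000000      2000000
60     2160000      2160000
70     2320000      2320000
80     2480000      2480000
90     2640000      2640000
100    2800000      2800000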

> Which will decrease the performance for that period of time as we
> should have switched to a higher frequency really.

I am not sure whether it's really useful for all systems using the
ondemand governor to increase the frequency above min_freq even if the
load is just above 0. Of course the expectation is that performance
will be equal to or better than before. But how overall power
consumption changes depends on the hardware and its power saving
capabilities.

  ---8<---

> > My understanding is that the original commit was tested with certain
> > combinations of hardware and cpufreq-drivers and the claim was that
> > for those (two?) tested combinations performance increased and power
> > consumption was lower. So I am not so sure what to expect from all
> > other cpufreq-driver/hardware combinations.

> It was principally the right thing to do IMO. And I don't think any
> other hardware should get affected badly. At the max, the tuning needs
> to be made a bit better.

  ---8<---

It seems that the decision of how best to map load values to target
frequencies is somewhat hardware specific.

Maybe a solution to this is that the cpufreq driver should be able to
provide a mapping function to override the current default
calculation.
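
Purely to illustrate that idea, here is a rough sketch; the callback
name and the surrounding plumbing are hypothetical, not an existing
cpufreq interface:

    /* Hypothetical: let a driver override the governor's default
     * load-to-frequency mapping.  Sketch only. */
    struct cpufreq_driver {
        /* ... existing members ... */
        unsigned int (*map_load_to_freq)(struct cpufreq_policy *policy,
                                         unsigned int load);
    };

    /* ondemand could fall back to its current default mapping
     * whenever a driver does not provide the callback: */
    static unsigned int od_freq_next(struct cpufreq_policy *policy,
                                     unsigned int load)
    {
        unsigned int min_f = policy->cpuinfo.min_freq;
        unsigned int max_f = policy->cpuinfo.max_freq;

        if (cpufreq_driver->map_load_to_freq)
            return cpufreq_driver->map_load_to_freq(policy, load);

        return min_f + load * (max_f - min_f) / 100;
    }

pcc-cpufreq could then install its old deadband mapping without
affecting any other driver.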


Regards,

Andreas


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Stratos Karafotis @ 2016-09-16 18:48 UTC
  To: Andreas Herrmann, Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Thomas Renninger

Hi,

On 16/09/2016 12:47 PM, Andreas Herrmann wrote:
> On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
>> On 01-09-16, 15:21, Andreas Herrmann wrote:
>>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> 
>>>> I am _really_ worried about such hacks in drivers to negate the effect of a
>>>> patch, that was actually good.
>>>
>>>> Did you try to increase the sampling period of ondemand governor to see if that
>>>> helps without this patch.
>>>
>>> With an older kernel I've modified transition_latency of the driver
>>> which in turn is used to calculate the sampling rate.
> 
>> Naah, that isn't what I was looking for, sorry :(
> 
>> To explain it a bit more, this is what the patch did.
> 
>> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
>> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
>> sampling rate and system load tries to change the frequency of the
>> underlying hardware and select one of those.
> 
>> Before the original patch came in, F2 and F3 were never getting
>> selected and the system was stuck in F1 for a long time.
> 
> I think this is not a general statement. Such a behaviour is not
> common to all systems. Before commit 6393d6a target frequency was
> based on
> 
>    freq_next = load * policy->cpuinfo.max_freq / 100;
> 
> F2 would have been selected if
> 
>   load = F2 * 100 / F7
> 
> If F2 was not seen it can mean
> 
> (1) either the load value was not hit in practice during monitoring of
>     a certain workload
> 
> (2) or the calculated load value (in integer representation) would
>     select F1 or F3 (there is no corresponding integer value that
>     would select F2)
> 
> E.g. for the Intel i7-3770 system mentioned in commit message for
> 6393d6a I think a load value of 49 should have selected 1700000 which
> is not shown in the provided frequency table.

I think this is not true, because before that specific patch the
relation used for frequency selection was CPUFREQ_RELATION_L. This is
the reason that a load value of 49 with a freq_next of 1666490 would
have a target frequency of 1600000.

> 
> What essentially changed was how load values are mapped to target
> frequencies. For the HP system (min_freq=1200000, max_freq=2800000)
> that I used in my tests, the old code would create following mapping:
> 
> load | freq_next | used target frequency
> ________________________________________
> 0      0            1200000
> 10     280000       1200000
> 20     560000       1200000
> 30     840000       1200000
> 40     1120000      1200000
> 42     1176000      1200000
> 43     1204000      1204000
> 50     1400000      1400000
> 60     1680000      1680000
> 70     1960000      1960000
> 80     2240000      2240000
> 90     2520000      2520000
> 100    2800000      2800000
> 
> The new code (introduced with commit 6393d6a) changed the mapping as
> follows:
> 
> load | freq_next | used target frequency
> ________________________________________
> 0      1200000      1200000
> 10     1360000	    1360000
> 20     1520000	    1520000
> 30     1680000	    1680000
> 40     1840000	    1840000
> 42     1872000	    1872000
> 43     1888000	    1888000
> 50     2000000	    2000000
> 60     2160000	    2160000
> 70     2320000	    2320000
> 80     2480000	    2480000
> 90     2640000	    2640000
> 100    2800000	    2800000
> 
> My patch creates a third mapping. It basically ensures that up to a
> load value of 42 the minimum frequency is used.
> 
>> Which will decrease the performance for that period of time as we
>> should have switched to a higher frequency really.
> 
> I am not sure whether it's really useful for all systems using
> ondemand governor to increase frequency above min_freq even if load is
> just above 0. Of course expectation is that performance will be equal
> or better than before. But how overall power consumption changes
> depends on the hardware and its power saving capabilites.
> 
>   ---8<---
> 
>>> My understanding is that the original commit was tested with certain
>>> combinations of hardware and cpufreq-drivers and the claim was that
>>> for those (two?) tested combinations performance increased and power
>>> consumption was lower. So I am not so sure what to expect from all
>>> other cpufreq-driver/hardware combinations.
> 
>> It was principally the right thing to do IMO. And I don't think any
>> other hardware should get affected badly. At the max, the tuning needs
>> to be made a bit better.
> 
>   ---8<---
> 
> It seems that the decision how to best map load values to target
> frequencies is kind of hardware specific.
> 
> Maybe a solution to this is that the cpufreq driver should be able to
> provide a mapping function to overwrite the current default
> calculation.
> 

I'm not familiar with the pcc-cpufreq driver, but maybe patch 6393d6
just uncovered an "issue" that already existed, but only at higher
loads.

Because, with or without patch 6393d6, if the specific CPU doesn't use
a frequency table, there will be many frequency transitions at higher
loads too. I believe, though, that the side effect is smaller at
higher frequencies because CPUs tend to work at the lowest and highest
frequencies.

What about a patch in the pcc-cpufreq driver that permits frequency
changes only in specific steps and not at arbitrary values? A rough
sketch follows.
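
For illustration, such stepping could look roughly like the sketch
below; the 200 MHz step size is an arbitrary example value, not
something tested:

    /* Illustrative sketch: round the requested frequency down to a
     * fixed grid above policy->min so that small load changes do not
     * trigger a firmware transition. */
    #define PCC_FREQ_STEP_KHZ	200000

    static unsigned int pcc_quantize_freq(struct cpufreq_policy *policy,
                                          unsigned int target_freq)
    {
        unsigned int offset = target_freq - policy->min;

        offset -= offset % PCC_FREQ_STEP_KHZ;
        return policy->min + offset;
    }

This would reduce the number of distinct frequencies requested from
the firmware while keeping the proportional mapping.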


Regards,
Stratos


* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
From: Andreas Herrmann @ 2016-09-19 16:16 UTC
  To: Stratos Karafotis
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm,
	Linux Kernel Mailing List, Stratos Karafotis, Thomas Renninger

On Fri, Sep 16, 2016 at 09:58:42PM +0300, Stratos Karafotis wrote:
> Hi,
> 
> [ I'm resending this message, because I think some recipients didn't receive
> it. ]
> 
> On 16/09/2016 12:47 PM, Andreas Herrmann wrote:
> > On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> >> On 01-09-16, 15:21, Andreas Herrmann wrote:
> >>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> >
> >>>> I am _really_ worried about such hacks in drivers to negate the
> >>>> effect of a patch that was actually good.
> >>>
> >>>> Did you try to increase the sampling period of the ondemand
> >>>> governor to see if that helps without this patch?
> >>>
> >>> With an older kernel I've modified transition_latency of the driver
> >>> which in turn is used to calculate the sampling rate.
> >
> >> Naah, that isn't what I was looking for, sorry
> >
> >> To explain it a bit more, this is what the patch did.
> >
> >> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
> >> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
> >> sampling rate and system load tries to change the frequency of the
> >> underlying hardware and select one of those.
> >
> >> Before the original patch came in, F2 and F3 were never getting
> >> selected and the system was stuck in F1 for a long time.
> >
> > I think this is not a general statement. Such a behaviour is not
> > common to all systems. Before commit 6393d6a target frequency was
> > based on
> >
> >    freq_next = load * policy->cpuinfo.max_freq / 100;
> >
> > F2 would have been selected if
> >
> >   load = F2 * 100 / F7
> >
> > If F2 was not seen it can mean
> >
> > (1) either the load value was not hit in practice during monitoring of
> >     a certain workload
> >
> > (2) or the calculated load value (in integer representation) would
> >     select F1 or F3 (there is no corresponding integer value that
> >     would select F2)
> >
> > E.g. for the Intel i7-3770 system mentioned in the commit message for
> > 6393d6a, I think a load value of 49 should have selected 1700000,
> > which is not shown in the provided frequency table.
> 
> I think this is not true, because before the specific patch the
> relation used for frequency selection was CPUFREQ_RELATION_L. This is
> the reason that a load value of 49 with a freq_next of 1666490 would
> have a target frequency of 1600000.

Hmm...
CPUFREQ_RELATION_L should select the "lowest frequency at or above
target", which would be 1700000 in this case. Otherwise (if it selected
the "highest frequency at or below target") this would imply that load
values of 50, 51, 52 should select 1700000, which would contradict what
was written in the commit message of 6393d6a1.

In any case the probability of seeing such load values and thus
selecting a frequency of 1700000 is quite low. So I fully understand
why the patch was introduced.
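
For reference, here is a minimal standalone sketch of the two mappings
being compared (plain C, with the min/max values of the HP system
filled in; illustrative only, not the actual kernel code):

  #include <stdio.h>

  /* Old mapping (pre-6393d6a102): proportional to max_freq only.
   * Every load below min_freq/max_freq*100 ends up clamped to
   * min_freq -- the "deadband". */
  static unsigned int old_map(unsigned int load,
                              unsigned int min_f, unsigned int max_f)
  {
          unsigned int f = load * max_f / 100;
          return f < min_f ? min_f : f;
  }

  /* New mapping (6393d6a102): interpolate between min_freq and
   * max_freq, so any load > 0 already requests more than min_freq. */
  static unsigned int new_map(unsigned int load,
                              unsigned int min_f, unsigned int max_f)
  {
          return min_f + load * (max_f - min_f) / 100;
  }

  int main(void)
  {
          unsigned int load;

          /* reproduces the 10%-step rows of the tables quoted below
           * for min_freq=1200000, max_freq=2800000 */
          for (load = 0; load <= 100; load += 10)
                  printf("%3u  %8u  %8u\n", load,
                         old_map(load, 1200000, 2800000),
                         new_map(load, 1200000, 2800000));
          return 0;
  }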

> > What essentially changed was how load values are mapped to target
> > frequencies. For the HP system (min_freq=1200000, max_freq=2800000)
> > that I used in my tests, the old code would create the following mapping:
> >
> > load | freq_next | used target frequency
> > ________________________________________
> > 0      0            1200000
> > 10     280000       1200000
> > 20     560000       1200000
> > 30     840000       1200000
> > 40     1120000      1200000
> > 42     1176000      1200000
> > 43     1204000      1204000
> > 50     1400000      1400000
> > 60     1680000      1680000
> > 70     1960000      1960000
> > 80     2240000      2240000
> > 90     2520000      2520000
> > 100    2800000      2800000
> >
> > The new code (introduced with commit 6393d6a) changed the mapping as
> > follows:
> >
> > load | freq_next | used target frequency
> > ________________________________________
> > 0      1200000      1200000
> > 10     1360000      1360000
> > 20     1520000      1520000
> > 30     1680000      1680000
> > 40     1840000      1840000
> > 42     1872000      1872000
> > 43     1888000      1888000
> > 50     2000000      2000000
> > 60     2160000      2160000
> > 70     2320000      2320000
> > 80     2480000      2480000
> > 90     2640000      2640000
> > 100    2800000      2800000
> >
> > My patch creates a third mapping. It basically ensures that up to a
> > load value of 42 the minimum frequency is used.
> >
> >> Which will decrease the performance for that period of time as we
> >> should have switched to a higher frequency really.
> >
> > I am not sure whether it's really useful for all systems using the
> > ondemand governor to increase the frequency above min_freq even if
> > the load is just above 0. Of course the expectation is that
> > performance will be equal to or better than before. But how overall
> > power consumption changes depends on the hardware and its power
> > saving capabilities.
> >
> >   ---8<---
> >
> >>> My understanding is that the original commit was tested with certain
> >>> combinations of hardware and cpufreq-drivers and the claim was that
> >>> for those (two?) tested combinations performance increased and power
> >>> consumption was lower. So I am not so sure what to expect from all
> >>> other cpufreq-driver/hardware combinations.
> >
> >> It was principally the right thing to do IMO. And I don't think any
> >> other hardware should get affected badly. At the max, the tuning needs
> >> to be made a bit better.
> >
> >   ---8<---
> >
> > It seems that the decision of how best to map load values to target
> > frequencies is somewhat hardware specific.
> >
> > Maybe a solution to this is that the cpufreq driver should be able to
> > provide a mapping function to override the current default
> > calculation.
> >
> 
> I'm not familiar with the pcc-cpufreq driver, but maybe patch 6393d6
> just uncovered an "issue" that already existed, only at higher loads.
>
> Because, with or without patch 6393d6, if the specific CPU doesn't
> use a frequency table, there will be many frequency transitions at
> higher loads too. I believe, though, that the side effect is smaller
> at higher frequencies because CPUs tend to work at the lowest and
> highest frequencies.

Might be. I didn't test this specifically.

> What about a patch in the pcc-cpufreq driver that permits frequency
> changes only in specific steps and not at arbitrary values?

Which steps would you use? What scheme would be universally usable for
all affected systems using this driver?

I had played with an approach to only make use of min_freq and
max_freq, which eventually didn't result in better performance
in comparison to the code before commit 6393d6.
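
(For illustration, such a min/max-only mapping would look roughly like
the sketch below -- kernel-style C against struct cpufreq_policy; the
threshold of 50 is an arbitrary value chosen for the example, not
necessarily what I tested:)

  /* Sketch: two-level mapping using only the policy limits, so the
   * firmware only ever sees two distinct frequency requests. */
  static unsigned int minmax_map_load_to_freq(struct cpufreq_policy *policy,
                                              unsigned int load)
  {
          if (load < 50)
                  return policy->cpuinfo.min_freq;
          return policy->cpuinfo.max_freq;
  }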


Andreas

> Regards,
> Stratos

  ---8<---

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-09-19 16:16               ` Andreas Herrmann
@ 2016-09-19 19:39                 ` Stratos Karafotis
  2016-09-22 17:54                   ` Andreas Herrmann
  0 siblings, 1 reply; 22+ messages in thread
From: Stratos Karafotis @ 2016-09-19 19:39 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm,
	Linux Kernel Mailing List, Stratos Karafotis, Thomas Renninger

On Mon, Sep 19, 2016 at 7:16 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Fri, Sep 16, 2016 at 09:58:42PM +0300, Stratos Karafotis wrote:
>> Hi,
>>
>> [ I'm resending this message because I think some recipients didn't
>> receive it. ]
>>
>> On 16/09/2016 12:47 PM, Andreas Herrmann wrote:
>> > On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
>> >> On 01-09-16, 15:21, Andreas Herrmann wrote:
>> >>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
>> >
>> >>>> I am _really_ worried about such hacks in drivers to negate the
>> >>>> effect of a patch that was actually good.
>> >>>
>> >>>> Did you try to increase the sampling period of the ondemand
>> >>>> governor to see if that helps without this patch?
>> >>>
>> >>> With an older kernel I've modified transition_latency of the driver
>> >>> which in turn is used to calculate the sampling rate.
>> >
>> >> Naah, that isn't what I was looking for, sorry
>> >
>> >> To explain it a bit more, this is what the patch did.
>> >
>> >> Suppose, your platform supports frequencies: F1 (lowest), F2, F3, F4,
>> >> F5, F6 and F7 (highest). The cpufreq governor (ondemand) based on a
>> >> sampling rate and system load tries to change the frequency of the
>> >> underlying hardware and select one of those.
>> >
>> >> Before the original patch came in, F2 and F3 were never getting
>> >> selected and the system was stuck in F1 for a long time.
>> >
>> > I think this is not a general statement. Such a behaviour is not
>> > common to all systems. Before commit 6393d6a target frequency was
>> > based on
>> >
>> >    freq_next = load * policy->cpuinfo.max_freq / 100;
>> >
>> > F2 would have been selected if
>> >
>> >   load = F2 * 100 / F7
>> >
>> > If F2 was not seen it can mean
>> >
>> > (1) either the load value was not hit in practice during monitoring of
>> >     a certain workload
>> >
>> > (2) or the calculated load value (in integer representation) would
>> >     select F1 or F3 (there is no corresponding integer value that
>> >     would select F2)
>> >
>> > E.g. for the Intel i7-3770 system mentioned in the commit message for
>> > 6393d6a, I think a load value of 49 should have selected 1700000,
>> > which is not shown in the provided frequency table.
>>
>> I think this is not true, because before the specific patch the
>> relation used for frequency selection was CPUFREQ_RELATION_L. This is
>> the reason that a load value of 49 with a freq_next of 1666490 would
>> have a target frequency of 1600000.
>
> Hmm...
> CPUFREQ_RELATION_L should select the "lowest frequency at or above
> target", which would be 1700000 in this case. Otherwise (if it selected
> the "highest frequency at or below target") this would imply that load
> values of 50, 51, 52 should select 1700000, which would contradict what
> was written in the commit message of 6393d6a1.

Yes, you are right. I'm sorry for the noise.

> In any case the probability of seeing such load values and thus
> selecting a frequency of 1700000 is quite low. So I fully understand
> why the patch was introduced.

>> > What essentially changed was how load values are mapped to target
>> > frequencies. For the HP system (min_freq=1200000, max_freq=2800000)
>> > that I used in my tests, the old code would create the following mapping:
>> >
>> > load | freq_next | used target frequency
>> > ________________________________________
>> > 0      0            1200000
>> > 10     280000       1200000
>> > 20     560000       1200000
>> > 30     840000       1200000
>> > 40     1120000      1200000
>> > 42     1176000      1200000
>> > 43     1204000      1204000
>> > 50     1400000      1400000
>> > 60     1680000      1680000
>> > 70     1960000      1960000
>> > 80     2240000      2240000
>> > 90     2520000      2520000
>> > 100    2800000      2800000
>> >
>> > The new code (introduced with commit 6393d6a) changed the mapping as
>> > follows:
>> >
>> > load | freq_next | used target frequency
>> > ________________________________________
>> > 0      1200000      1200000
>> > 10     1360000      1360000
>> > 20     1520000      1520000
>> > 30     1680000      1680000
>> > 40     1840000      1840000
>> > 42     1872000      1872000
>> > 43     1888000      1888000
>> > 50     2000000      2000000
>> > 60     2160000      2160000
>> > 70     2320000      2320000
>> > 80     2480000      2480000
>> > 90     2640000      2640000
>> > 100    2800000      2800000
>> >
>> > My patch creates a third mapping. It basically ensures that up to a
>> > load value of 42 the minimum frequency is used.
>> >
>> >> Which will decrease the performance for that period of time as we
>> >> should have switched to a higher frequency really.
>> >
>> > I am not sure whether it's really useful for all systems using the
>> > ondemand governor to increase the frequency above min_freq even if
>> > the load is just above 0. Of course the expectation is that
>> > performance will be equal to or better than before. But how overall
>> > power consumption changes depends on the hardware and its power
>> > saving capabilities.
>> >
>> >   ---8<---
>> >
>> >>> My understanding is that the original commit was tested with certain
>> >>> combinations of hardware and cpufreq-drivers and the claim was that
>> >>> for those (two?) tested combinations performance increased and power
>> >>> consumption was lower. So I am not so sure what to expect from all
>> >>> other cpufreq-driver/hardware combinations.
>> >
>> >> It was principally the right thing to do IMO. And I don't think any
>> >> other hardware should get affected badly. At the max, the tuning needs
>> >> to be made a bit better.
>> >
>> >   ---8<---
>> >
>> > It seems that the decision of how best to map load values to target
>> > frequencies is somewhat hardware specific.
>> >
>> > Maybe a solution to this is that the cpufreq driver should be able to
>> > provide a mapping function to override the current default
>> > calculation.
>> >
>>
>> I'm not familiar with the pcc-cpufreq driver, but maybe patch 6393d6
>> just uncovered an "issue" that already existed, only at higher loads.
>>
>> Because, with or without patch 6393d6, if the specific CPU doesn't
>> use a frequency table, there will be many frequency transitions at
>> higher loads too. I believe, though, that the side effect is smaller
>> at higher frequencies because CPUs tend to work at the lowest and
>> highest frequencies.
>
> Might be. I didn't test this specifically.
>
>> What about a patch in the pcc-cpufreq driver that permits frequency
>> changes only in specific steps and not at arbitrary values?
>
> Which steps would you use? What scheme would be universally usable for
> all affected systems using this driver?

Just an idea. I would split the frequency range (max_freq - min_freq)
into ~10 steps. But I'm not familiar with the affected systems and
of course I can't prove this is an ideal approach.

> I had played with an approach to only make use of min_freq and
> max_freq, which eventually didn't result in better performance
> in comparison to the code before commit 6393d6.
>


Regards,
Stratos

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-09-19 19:39                 ` Stratos Karafotis
@ 2016-09-22 17:54                   ` Andreas Herrmann
  2016-10-05  5:21                     ` Viresh Kumar
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Herrmann @ 2016-09-22 17:54 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm,
	Linux Kernel Mailing List, Stratos Karafotis, Thomas Renninger

On Mon, Sep 19, 2016 at 10:39:18PM +0300, Stratos Karafotis wrote:
> On Mon, Sep 19, 2016 at 7:16 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > On Fri, Sep 16, 2016 at 09:58:42PM +0300, Stratos Karafotis wrote:
> >> Hi,
> >>
> >> [ I'm resending this message because I think some recipients didn't
> >> receive it. ]
> >>
> >> On 16/09/2016 12:47 PM, Andreas Herrmann wrote:
> >> > On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> >> >> On 01-09-16, 15:21, Andreas Herrmann wrote:
> >> >>> On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:

  ---8<---

> >> > It seems that the decision of how best to map load values to target
> >> > frequencies is somewhat hardware specific.
> >> >
> >> > Maybe a solution to this is that the cpufreq driver should be able to
> >> > provide a mapping function to override the current default
> >> > calculation.

FYI, I've created new patches to address the issue.

The first one will introduce a map_load_to_freq function, the default
being what commit 6393d6 introduced (no deadband). The second patch
will then introduce a specific function for pcc-cpufreq to fall back
to what was used before commit 6393d6.

I just want to assemble the gathered performance data, and I am
planning to send those patches tomorrow.

> >> I'm not familiar with the pcc-cpufreq driver, but maybe patch 6393d6
> >> just uncovered an "issue" that already existed, only at higher loads.
> >>
> >> Because, with or without patch 6393d6, if the specific CPU doesn't
> >> use a frequency table, there will be many frequency transitions at
> >> higher loads too. I believe, though, that the side effect is smaller
> >> at higher frequencies because CPUs tend to work at the lowest and
> >> highest frequencies.
> >
> > Might be. I didn't test this specifically.

Hopefully I'll also find time to gather some ftrace data wrt this.

> >> What about a patch in the pcc-cpufreq driver that permits frequency
> >> changes only in specific steps and not at arbitrary values?
> >
> > Which steps would you use? What scheme would be universally usable
> > for all affected systems using this driver?
> 
> Just an idea. I would split the frequency range (max_freq - min_freq)
> into ~10 steps. But I'm not familiar with the affected systems and
> of course I can't prove this is an ideal approach.

I've modified the pcc-cpufreq specific map_load_to_freq function to do
just that (map load values to 10 discrete frequency values) instead of
falling back to the deadband (pre-commit-6393d6-version).
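
(A quantized mapping of that kind would look roughly like the sketch
below -- kernel-style C; this shows the idea, not necessarily the exact
code I tested:)

  /* Sketch: compute the proportional target, then snap it up to one
   * of 10 discrete steps between min_freq and max_freq, so the
   * firmware sees at most 10 distinct frequency requests. Assumes
   * max_f > min_f; DIV_ROUND_UP is from <linux/kernel.h>. */
  static unsigned int pcc_map_load_to_freq_steps(struct cpufreq_policy *policy,
                                                 unsigned int load)
  {
          unsigned int min_f = policy->cpuinfo.min_freq;
          unsigned int max_f = policy->cpuinfo.max_freq;
          unsigned int step = (max_f - min_f) / 10;
          unsigned int f = min_f + load * (max_f - min_f) / 100;

          /* snap to the next step boundary at or above f */
          f = min_f + DIV_ROUND_UP(f - min_f, step) * step;
          return f > max_f ? max_f : f;
  }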

Unfortunately this resulted in lower performance compared to the
pre-commit-6393d6 version.

> > I had played with an approach to only make use of min_freq and
> > max_freq, which eventually didn't result in better performance
> > in comparison to the code before commit 6393d6.
> 
> Regards,
> Stratos


Regards,

Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/2] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-08-19 12:18 [PATCH 0/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes Andreas Herrmann
  2016-08-19 12:21 ` [PATCH 1/1] " Andreas Herrmann
  2016-08-19 12:40 ` [PATCH 0/1] " Andreas Herrmann
@ 2016-09-23 16:56 ` Andreas Herrmann
  2016-09-23 17:02   ` [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency Andreas Herrmann
  2016-09-23 17:07   ` [PATCH v2 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op Andreas Herrmann
  2 siblings, 2 replies; 22+ messages in thread
From: Andreas Herrmann @ 2016-09-23 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger

Hi,

following patches address the performance degradation due to commit
6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect) on
systems using pcc-cpufreq driver and ondemand governor.

Patch 1 introduces a generic_map_load_to_freq function which is
similar to what is used since commit 6393d6a102 (cpufreq: ondemand:
Eliminate the deadband effect) to calculate freq_next in od_update.

Patch 2 provides a specific function for the pcc-cpufreq driver which
falls back to the calculation that was used before commit
6393d6a102.

I've also tested a pcc-specific function without the deadband effect
but using only 10 frequency values. That was suboptimal in comparison
to patch 2. Here is the performance data for this comparison (kernel
compilation with different numbers of jobs):

            pcc specific map_load_to_freq     pcc specific map_load_to_freq
              function (with deadband)        function (10 frequency steps)
 # of jobs  user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
    2      413.19  102.34  250.97   205.00    426.21  106.32  260.00   204.00
    4      390.56  120.79  127.25   401.20    408.38  124.03  132.63   401.00
    8      354.22  140.09   64.20   769.60    383.33  146.31   68.70   770.40
   16      384.20  148.69   37.07  1436.60    466.20  164.18   43.30  1455.00
   32      496.70  152.77   25.15  2581.40    658.50  179.74   31.27  2680.60
   64      399.48   49.13   12.80  3505.80    404.27   51.24   13.14  3467.00
  120      406.52   46.89   13.60  3331.60    409.42   48.71   13.58  3371.40


Regards,

Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency
  2016-09-23 16:56 ` [PATCH v2 0/2] " Andreas Herrmann
@ 2016-09-23 17:02   ` Andreas Herrmann
  2016-10-05  4:01     ` Viresh Kumar
  2016-09-23 17:07   ` [PATCH v2 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op Andreas Herrmann
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Herrmann @ 2016-09-23 17:02 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger

Introduce op for ondemand governor that is used to map load to
frequency. It allows a cpufreq driver to provide a specific mapping
function if the generic function is not optimal for the driver.

Performance results (kernel compile with different number of jobs)
based on 4.8.0-rc7 (with and w/o my patches on top) from
an HP ProLiant DL580 Gen8 system using pcc-cpufreq:
 - Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
 - 60 CPUs, 128GB RAM
 
                    vanilla                  generic_map_load_to_freq function
 # of jobs  user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
    2      445.44  110.51  272.99   203.00    445.56  111.22  273.35   203.00
    4      444.41  126.20  142.81   399.00    445.61  126.10  143.12   399.00
    8      483.04  150.58   82.19   770.40    483.51  150.84   82.17   771.40
   16      626.81  185.01   55.00  1475.40    628.01  185.54   55.02  1477.80
   32      816.72  204.39   37.26  2740.00    818.58  205.51   37.02  2765.40
   64      406.59   51.12   14.04  3257.80    406.22   51.84   13.84  3308.80
  120      413.00   48.39   14.36  3211.20    413.61   49.06   14.54  3181.00

Similar tests on another system using acpi_cpufreq didn't show
significant performance differences between these two kernel versions.

Link: https://marc.info/?i=20160819121814.GA17296%40suselix.suse.de
Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
---
 drivers/cpufreq/cpufreq_governor.h |  5 +++++
 drivers/cpufreq/cpufreq_ondemand.c | 35 ++++++++++++++++++++++++++++++-----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ef1037e..9fef947 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -171,6 +171,8 @@ void cpufreq_dbs_governor_limits(struct cpufreq_policy *policy);
 struct od_ops {
 	unsigned int (*powersave_bias_target)(struct cpufreq_policy *policy,
 			unsigned int freq_next, unsigned int relation);
+	unsigned int (*map_load_to_freq)(struct cpufreq_policy *policy,
+			unsigned int load);
 };
 
 unsigned int dbs_update(struct cpufreq_policy *policy);
@@ -178,6 +180,9 @@ void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
 void od_unregister_powersave_bias_handler(void);
+void od_register_map_load_to_freq_handler(unsigned int (*f)
+		(struct cpufreq_policy *, unsigned int));
+void od_unregister_map_load_to_freq_handler(void);
 ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
 			    size_t count);
 void gov_update_cpu_data(struct dbs_data *dbs_data);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 3a1f49f..d245f1c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -112,6 +112,20 @@ static void ondemand_powersave_bias_init(struct cpufreq_policy *policy)
 	dbs_info->freq_lo = 0;
 }
 
+/*
+ * Calculate the next frequency proportional to load
+ */
+static unsigned int generic_map_load_to_freq(struct cpufreq_policy *policy,
+					unsigned int load)
+{
+	unsigned int min_f, max_f;
+
+	min_f = policy->cpuinfo.min_freq;
+	max_f = policy->cpuinfo.max_freq;
+
+	return (min_f + load * (max_f - min_f) / 100);
+}
+
 static void dbs_freq_increase(struct cpufreq_policy *policy, unsigned int freq)
 {
 	struct policy_dbs_info *policy_dbs = policy->governor_data;
@@ -150,12 +164,9 @@ static void od_update(struct cpufreq_policy *policy)
 			policy_dbs->rate_mult = dbs_data->sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 	} else {
-		/* Calculate the next frequency proportional to load */
-		unsigned int freq_next, min_f, max_f;
+		unsigned int freq_next;
 
-		min_f = policy->cpuinfo.min_freq;
-		max_f = policy->cpuinfo.max_freq;
-		freq_next = min_f + load * (max_f - min_f) / 100;
+		freq_next = od_ops.map_load_to_freq(policy, load);
 
 		/* No longer fully busy, reset rate_mult */
 		policy_dbs->rate_mult = 1;
@@ -410,6 +421,7 @@ static void od_start(struct cpufreq_policy *policy)
 
 static struct od_ops od_ops = {
 	.powersave_bias_target = generic_powersave_bias_target,
+	.map_load_to_freq = generic_map_load_to_freq,
 };
 
 static struct dbs_governor od_dbs_gov = {
@@ -476,6 +488,19 @@ void od_unregister_powersave_bias_handler(void)
 }
 EXPORT_SYMBOL_GPL(od_unregister_powersave_bias_handler);
 
+void od_register_map_load_to_freq_handler(unsigned int (*f)
+					(struct cpufreq_policy *, unsigned int))
+{
+	od_ops.map_load_to_freq = f;
+}
+EXPORT_SYMBOL_GPL(od_register_map_load_to_freq_handler);
+
+void od_unregister_map_load_to_freq_handler(void)
+{
+	od_ops.map_load_to_freq = generic_map_load_to_freq;
+}
+EXPORT_SYMBOL_GPL(od_unregister_map_load_to_freq_handler);
+
 static int __init cpufreq_gov_dbs_init(void)
 {
 	return cpufreq_register_governor(CPU_FREQ_GOV_ONDEMAND);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op
  2016-09-23 16:56 ` [PATCH v2 0/2] " Andreas Herrmann
  2016-09-23 17:02   ` [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency Andreas Herrmann
@ 2016-09-23 17:07   ` Andreas Herrmann
  2016-09-26  9:05     ` [PATCH v3 " Andreas Herrmann
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Herrmann @ 2016-09-23 17:07 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger


Commit 6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect)
introduced a performance regression for systems using pcc-cpufreq and
ondemand governor. This is measurable with different workloads. E.g.
wall-clock time for kernel compilation significantly increased.

The elimination of the deadband effect significantly increased the
number of frequency changes with pcc-cpufreq.

Provide a pcc-cpufreq specific function that reintroduces the mapping
of load to frequency which was used before commit 6393d6a102 and hence
re-introduces the deadband effect for this cpufreq driver.

Performance results (kernel compile with different number of jobs)
based on 4.8.0-rc7 (with and w/o my patches on top) from
an HP ProLiant DL580 Gen8 system using pcc-cpufreq:
 - Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
 - 60 CPUs, 128GB RAM
                                                       pcc specific
            generic_map_load_to_freq function     map_load_to_freq function
 # of jobs    user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
    2        445.56  111.22  273.35   203.00    413.19  102.34  250.97   205.00
    4        445.61  126.10  143.12   399.00    390.56  120.79  127.25   401.20
    8        483.51  150.84   82.17   771.40    354.22  140.09   64.20   769.60
   16        628.01  185.54   55.02  1477.80    384.20  148.69   37.07  1436.60
   32        818.58  205.51   37.02  2765.40    496.70  152.77   25.15  2581.40
   64        406.22   51.84   13.84  3308.80    399.48   49.13   12.80  3505.80
  120        413.61   49.06   14.54  3181.00    406.52   46.89   13.60  3331.60

Link: https://marc.info/?i=20160819121814.GA17296%40suselix.suse.de
Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
---
 drivers/cpufreq/pcc-cpufreq.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
index 3f0ce2a..1522747 100644
--- a/drivers/cpufreq/pcc-cpufreq.c
+++ b/drivers/cpufreq/pcc-cpufreq.c
@@ -39,6 +39,8 @@
 
 #include <acpi/processor.h>
 
+#include "cpufreq_governor.h"
+
 #define PCC_VERSION	"1.10.00"
 #define POLL_LOOPS 	300
 
@@ -534,6 +536,12 @@ out_free:
 	return ret;
 }
 
+static unsigned int pcc_map_load_to_freq(struct cpufreq_policy *policy,
+					unsigned int load)
+{
+	return (load * policy->cpuinfo.max_freq / 100);
+}
+
 static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
 	unsigned int cpu = policy->cpu;
@@ -555,6 +563,8 @@ static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	policy->min = policy->cpuinfo.min_freq =
 		ioread32(&pcch_hdr->minimum_frequency) * 1000;
 
+	od_register_map_load_to_freq_handler(pcc_map_load_to_freq);
+
 	pr_debug("init: policy->max is %d, policy->min is %d\n",
 		policy->max, policy->min);
 out:
@@ -563,6 +573,7 @@ out:
 
 static int pcc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
+	od_unregister_map_load_to_freq_handler();
 	return 0;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op
  2016-09-23 17:07   ` [PATCH v2 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op Andreas Herrmann
@ 2016-09-26  9:05     ` Andreas Herrmann
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Herrmann @ 2016-09-26  9:05 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: linux-pm, linux-kernel, Stratos Karafotis, Thomas Renninger


Commit 6393d6a102 (cpufreq: ondemand: Eliminate the deadband effect)
introduced a performance regression for systems using pcc-cpufreq and
ondemand governor. This is measurable with different workloads. E.g.
wall-clock time for kernel compilation significantly increased.

The elimination of the deadband effect significantly increased the
number of frequency changes with pcc-cpufreq.

Provide a pcc-cpufreq specific function that reintroduces the mapping
of load to frequency which was used before commit 6393d6a102 and hence
re-introduces the deadband effect for this cpufreq driver.

Performance results (kernel compile with different number of jobs)
based on 4.8.0-rc7 (with and w/o my patches on top) from
an HP ProLiant DL580 Gen8 system using pcc-cpufreq:
 - Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
 - 60 CPUs, 128GB RAM
                                                       pcc specific
            generic_map_load_to_freq function     map_load_to_freq function
 # of jobs    user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
    2        445.56  111.22  273.35   203.00    413.19  102.34  250.97   205.00
    4        445.61  126.10  143.12   399.00    390.56  120.79  127.25   401.20
    8        483.51  150.84   82.17   771.40    354.22  140.09   64.20   769.60
   16        628.01  185.54   55.02  1477.80    384.20  148.69   37.07  1436.60
   32        818.58  205.51   37.02  2765.40    496.70  152.77   25.15  2581.40
   64        406.22   51.84   13.84  3308.80    399.48   49.13   12.80  3505.80
  120        413.61   49.06   14.54  3181.00    406.52   46.89   13.60  3331.60

Link: https://marc.info/?i=20160819121814.GA17296%40suselix.suse.de
Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
---
 drivers/cpufreq/Kconfig.x86   |  2 +-
 drivers/cpufreq/pcc-cpufreq.c | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

Kbuild test robot detected a build issue with my patch:

   drivers/built-in.o: In function `pcc_cpufreq_cpu_exit':
>> pcc-cpufreq.c:(.text+0x321019): undefined reference to `od_unregister_map_load_to_freq_handler'
   drivers/built-in.o: In function `pcc_cpufreq_cpu_init':
>> pcc-cpufreq.c:(.text+0x32136a): undefined reference to `od_register_map_load_to_freq_handler'

for configs that selected X86_PCC_CPUFREQ but not
CPU_FREQ_GOV_ONDEMAND.

This new patch version adds the required build dependency as my change
makes pcc-cpufreq dependent on cpufreq_ondemand.
Sorry for the omission in the previous patch version.


Regards,

Andreas

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..f159726 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -18,7 +18,7 @@ config X86_INTEL_PSTATE
 
 config X86_PCC_CPUFREQ
 	tristate "Processor Clocking Control interface driver"
-	depends on ACPI && ACPI_PROCESSOR
+	depends on CPU_FREQ_GOV_ONDEMAND && ACPI && ACPI_PROCESSOR
 	help
 	  This driver adds support for the PCC interface.
 
diff --git a/drivers/cpufreq/pcc-cpufreq.c b/drivers/cpufreq/pcc-cpufreq.c
index 3f0ce2a..1522747 100644
--- a/drivers/cpufreq/pcc-cpufreq.c
+++ b/drivers/cpufreq/pcc-cpufreq.c
@@ -39,6 +39,8 @@
 
 #include <acpi/processor.h>
 
+#include "cpufreq_governor.h"
+
 #define PCC_VERSION	"1.10.00"
 #define POLL_LOOPS 	300
 
@@ -534,6 +536,12 @@ out_free:
 	return ret;
 }
 
+static unsigned int pcc_map_load_to_freq(struct cpufreq_policy *policy,
+					unsigned int load)
+{
+	return (load * policy->cpuinfo.max_freq / 100);
+}
+
 static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
 	unsigned int cpu = policy->cpu;
@@ -555,6 +563,8 @@ static int pcc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	policy->min = policy->cpuinfo.min_freq =
 		ioread32(&pcch_hdr->minimum_frequency) * 1000;
 
+	od_register_map_load_to_freq_handler(pcc_map_load_to_freq);
+
 	pr_debug("init: policy->max is %d, policy->min is %d\n",
 		policy->max, policy->min);
 out:
@@ -563,6 +573,7 @@ out:
 
 static int pcc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
+	od_unregister_map_load_to_freq_handler();
 	return 0;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency
  2016-09-23 17:02   ` [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency Andreas Herrmann
@ 2016-10-05  4:01     ` Viresh Kumar
  2016-10-11  6:30       ` Andreas Herrmann
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2016-10-05  4:01 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On 23-09-16, 19:02, Andreas Herrmann wrote:
> Introduce op for ondemand governor that is used to map load to
> frequency. It allows a cpufreq driver to provide a specific mapping
> function if the generic function is not optimal for the driver.
> 
> Performance results (kernel compile with different number of jobs)
> based on 4.8.0-rc7 (with and w/o my patches on top) from
> an HP ProLiant DL580 Gen8 system using pcc-cpufreq:
>  - Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
>  - 60 CPUs, 128GB RAM
>  
>                     vanilla                  generic_map_load_to_freq function
>  # of jobs  user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
>     2      445.44  110.51  272.99   203.00    445.56  111.22  273.35   203.00
>     4      444.41  126.20  142.81   399.00    445.61  126.10  143.12   399.00
>     8      483.04  150.58   82.19   770.40    483.51  150.84   82.17   771.40
>    16      626.81  185.01   55.00  1475.40    628.01  185.54   55.02  1477.80
>    32      816.72  204.39   37.26  2740.00    818.58  205.51   37.02  2765.40
>    64      406.59   51.12   14.04  3257.80    406.22   51.84   13.84  3308.80
>   120      413.00   48.39   14.36  3211.20    413.61   49.06   14.54  3181.00
> 
> Similar tests on another system using acpi_cpufreq didn't show
> significant performance differences between these two kernel versions.
> 
> Link: https://marc.info/?i=20160819121814.GA17296%40suselix.suse.de
> Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
> ---
>  drivers/cpufreq/cpufreq_governor.h |  5 +++++
>  drivers/cpufreq/cpufreq_ondemand.c | 35 ++++++++++++++++++++++++++++++-----
>  2 files changed, 35 insertions(+), 5 deletions(-)

NAK.

If we are absolutely required to hack it in some way, then I would
prefer your first patchset as the noise was limited to only your
driver.

We aren't going to provide such operations from the governors, sorry.

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-09-14 14:56         ` Andreas Herrmann
@ 2016-10-05  5:17           ` Viresh Kumar
  2016-10-11  6:28             ` Andreas Herrmann
  0 siblings, 1 reply; 22+ messages in thread
From: Viresh Kumar @ 2016-10-05  5:17 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

Sorry for being late Andreas :(

On 14-09-16, 16:56, Andreas Herrmann wrote:

First of all, thanks for your hard work in getting these numbers out.
Really appreciate it.

> Below is some trace data. I hope it is of some help.
> 
> (A) - sampling 10s period when system is idle
> (B) - sampling 10s period when system partially loaded (kernel
>       compilation using 2 jobs)
> 
> (1) 4.8-rc5
> (2) 4.8-rc5 with my patch (reintro of deadband effect within
>     pcc-cpufreq)
> (3) 4.8-rc5 with reversal of 6393d6a102 (cpufreq: ondemand: Eliminate
>     the deadband effect)
> 
> Let me know whether you are looking for other trace data wrt this
> issue.
> 
> 
> Thanks,
> 
> Andreas
> 
> ---
> 
> (A)-(1)
> 
>  # Total Lost Samples: 0
>  # Samples: 41  of event 'power:cpu_frequency'
>  # Event count (approx.): 41
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>      39.02%  kworker/14:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      29.27%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
>      19.51%  kworker/10:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       7.32%  kworker/5:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
>       2.44%  kworker/23:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       2.44%  kworker/40:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> 
> (A)-(2)
> 
>  # Total Lost Samples: 0
>  # Samples: 6  of event 'power:cpu_frequency'
>  # Event count (approx.): 6
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>      33.33%  kworker/1:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
>      16.67%  kworker/16:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      16.67%  kworker/22:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      16.67%  kworker/26:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      16.67%  kworker/33:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> 
> (A)-(3)
> 
>  # Total Lost Samples: 0
>  # Samples: 7  of event 'power:cpu_frequency'
>  # Event count (approx.): 7
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>      28.57%  kworker/58:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      14.29%  kworker/19:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      14.29%  kworker/20:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      14.29%  kworker/22:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      14.29%  kworker/23:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>      14.29%  kworker/35:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> 
> ---
> 
> (B)-(1)
> 
>  # Total Lost Samples: 0
>  # Samples: 2K of event 'power:cpu_frequency'
>  # Event count (approx.): 2382
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>       5.75%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.16%  kworker/12:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       3.11%  kworker/17:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       2.94%  kworker/2:1   [kernel.vmlinux]  [k] cpufreq_notify_transition
>       2.73%  kworker/19:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       ...
> 
> (B)-(2)
> 
>  # Total Lost Samples: 0
>  # Samples: 320  of event 'power:cpu_frequency'
>  # Event count (approx.): 320
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>       4.69%  kworker/56:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.06%  kworker/12:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.06%  kworker/28:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.06%  kworker/6:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
>       3.75%  kworker/32:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       ...
> 
> (B)-(3)
> 
>  # Total Lost Samples: 0
>  # Samples: 333  of event 'power:cpu_frequency'
>  # Event count (approx.): 333
>  # Overhead  Command       Shared Object     Symbol                       
>  # ........  ............  ................  .............................
>       4.80%  kworker/51:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.50%  kworker/39:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       4.20%  kworker/47:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       3.90%  kworker/59:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
>       3.90%  kworker/7:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
>       ...

I am worried by all of these. So, it's not the DVFS transition which
took time, but cpufreq_notify_transition(). Probably one of the
drivers in your setup which has registered with
cpufreq_register_notifier() is screwing up here.

Can you please take a look at that? Just check which routines are
getting called as part of srcu_notifier_call_chain().
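
(One way to check this -- a sketch using ftrace's function_graph
tracer; it assumes tracefs is available under /sys/kernel/debug/tracing
and that srcu_notifier_call_chain() is traceable on your build:)

  cd /sys/kernel/debug/tracing
  echo srcu_notifier_call_chain > set_graph_function
  echo function_graph > current_tracer
  echo 1 > tracing_on
  sleep 10            # let some frequency transitions happen
  echo 0 > tracing_on
  cat trace           # the callees show which notifier callbacks ran;
                      # other srcu notifier chains may show up here
                      # too, not just cpufreq's transition chain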

Also, a simple (hacky) solution to fix the problem you have got is to
divide the frequency range into steps (which you have already done,
AFAICT, in one of your patches) and not offer the frequencies that
were part of the deadband earlier. That will keep the CPU at either
the low freq or some higher freqs.

But I am quite sure that we have an abuser here and we better find it
now.

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-09-22 17:54                   ` Andreas Herrmann
@ 2016-10-05  5:21                     ` Viresh Kumar
  0 siblings, 0 replies; 22+ messages in thread
From: Viresh Kumar @ 2016-10-05  5:21 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Stratos Karafotis, Rafael J. Wysocki, linux-pm,
	Linux Kernel Mailing List, Stratos Karafotis, Thomas Renninger

On 22-09-16, 19:54, Andreas Herrmann wrote:
> On Mon, Sep 19, 2016 at 10:39:18PM +0300, Stratos Karafotis wrote:
> > Just an idea. I would split the frequency range (max_freq - min_freq)
> > into ~10 steps. But I'm not familiar with the affected systems and
> > of course I can't prove this is an ideal approach.
> 
> I've modified the pcc-cpufreq specific map_load_to_freq function to do
> just that (map load values to 10 discrete frequency values) instead of
> falling back to the deadband (pre-commit-6393d6-version).

And in that you can remove the deadband frequencies from the table to
get almost the same results, as I already stated in an earlier email.

-- 
viresh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
  2016-10-05  5:17           ` Viresh Kumar
@ 2016-10-11  6:28             ` Andreas Herrmann
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Herrmann @ 2016-10-11  6:28 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Wed, Oct 05, 2016 at 10:47:53AM +0530, Viresh Kumar wrote:
> Sorry for being late Andreas :(
> 
> On 14-09-16, 16:56, Andreas Herrmann wrote:
> 
> First of all, thanks for your hard work in getting these numbers out.
> Really appreciate it.
> 
> > Below is some trace data. I hope it is of some help.
> > 
> > (A) - sampling 10s period when system is idle
> > (B) - sampling 10s period when system partially loaded (kernel
> >       compilation using 2 jobs)
> > 
> > (1) 4.8-rc5
> > (2) 4.8-rc5 with my patch (reintro of deadband effect within
> >     pcc-cpufreq)
> > (3) 4.8-rc5 with reversal of 6393d6a102 (cpufreq: ondemand: Eliminate
> >     the deadband effect)
> > 
> > Let me know whether you are looking for other trace data wrt this
> > issue.
> > 
> > 
> > Thanks,
> > 
> > Andreas
> > 
> > ---
> > 
> > (A)-(1)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 41  of event 'power:cpu_frequency'
> >  # Event count (approx.): 41
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >      39.02%  kworker/14:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      29.27%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      19.51%  kworker/10:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       7.32%  kworker/5:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       2.44%  kworker/23:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       2.44%  kworker/40:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> > 
> > (A)-(2)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 6  of event 'power:cpu_frequency'
> >  # Event count (approx.): 6
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >      33.33%  kworker/1:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      16.67%  kworker/16:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      16.67%  kworker/22:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      16.67%  kworker/26:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      16.67%  kworker/33:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> > 
> > (A)-(3)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 7  of event 'power:cpu_frequency'
> >  # Event count (approx.): 7
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >      28.57%  kworker/58:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      14.29%  kworker/19:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      14.29%  kworker/20:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      14.29%  kworker/22:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      14.29%  kworker/23:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >      14.29%  kworker/35:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> > 
> > ---
> > 
> > (B)-(1)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 2K of event 'power:cpu_frequency'
> >  # Event count (approx.): 2382
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >       5.75%  kworker/0:0   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.16%  kworker/12:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       3.11%  kworker/17:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       2.94%  kworker/2:1   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       2.73%  kworker/19:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       ...
> > 
> > (B)-(2)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 320  of event 'power:cpu_frequency'
> >  # Event count (approx.): 320
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >       4.69%  kworker/56:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.06%  kworker/12:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.06%  kworker/28:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.06%  kworker/6:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       3.75%  kworker/32:2  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       ...
> > 
> > (B)-(3)
> > 
> >  # Total Lost Samples: 0
> >  # Samples: 333  of event 'power:cpu_frequency'
> >  # Event count (approx.): 333
> >  # Overhead  Command       Shared Object     Symbol                       
> >  # ........  ............  ................  .............................
> >       4.80%  kworker/51:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.50%  kworker/39:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       4.20%  kworker/47:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       3.90%  kworker/59:1  [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       3.90%  kworker/7:2   [kernel.vmlinux]  [k] cpufreq_notify_transition
> >       ...
> 
> I am worried by all of these. So, it's not the DVFS transition which
> took time, but cpufreq_notify_transition(). Probably one of the
> drivers in your setup which has registered with
> cpufreq_register_notifier() is screwing up here.
> 
> Can you please take a look at that? Just check which routines are
> getting called as part of srcu_notifier_call_chain().

I'll take a look into this.

> Also, a simple (hacky) solution to fix the problem you have got is to
> divide the frequency range into steps (which you have already done,
> AFAICT, in one of your patches) and not offer the frequencies that
> were part of the deadband earlier. That will keep the CPU at either
> the low freq or some higher freqs.

Yes, seems worth trying this.

> But I am quite sure that we have an abuser here and we better find it
> now.


Thanks,

Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency
  2016-10-05  4:01     ` Viresh Kumar
@ 2016-10-11  6:30       ` Andreas Herrmann
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Herrmann @ 2016-10-11  6:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, linux-kernel, Stratos Karafotis,
	Thomas Renninger

On Wed, Oct 05, 2016 at 09:31:18AM +0530, Viresh Kumar wrote:
> On 23-09-16, 19:02, Andreas Herrmann wrote:
> > Introduce op for ondemand governor that is used to map load to
> > frequency. It allows a cpufreq driver to provide a specific mapping
> > function if the generic function is not optimal for the driver.
> > 
> > Performance results (kernel compile with different number of jobs)
> > based on 4.8.0-rc7 (with and w/o my patches on top) from
> > an HP ProLiant DL580 Gen8 system using pcc-cpufreq:
> >  - Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
> >  - 60 CPUs, 128GB RAM
> >  
> >                     vanilla                  generic_map_load_to_freq function
> >  # of jobs  user    sys   elapsed   % CPU      user    sys   elapsed   % CPU
> >     2      445.44  110.51  272.99   203.00    445.56  111.22  273.35   203.00
> >     4      444.41  126.20  142.81   399.00    445.61  126.10  143.12   399.00
> >     8      483.04  150.58   82.19   770.40    483.51  150.84   82.17   771.40
> >    16      626.81  185.01   55.00  1475.40    628.01  185.54   55.02  1477.80
> >    32      816.72  204.39   37.26  2740.00    818.58  205.51   37.02  2765.40
> >    64      406.59   51.12   14.04  3257.80    406.22   51.84   13.84  3308.80
> >   120      413.00   48.39   14.36  3211.20    413.61   49.06   14.54  3181.00
> > 
> > Similar tests on another system using acpi_cpufreq didn't show
> > significant performance differences between these two kernel versions.
> > 
> > Link: https://marc.info/?i=20160819121814.GA17296%40suselix.suse.de
> > Signed-off-by: Andreas Herrmann <aherrmann@suse.com>
> > ---
> >  drivers/cpufreq/cpufreq_governor.h |  5 +++++
> >  drivers/cpufreq/cpufreq_ondemand.c | 35 ++++++++++++++++++++++++++++++-----
> >  2 files changed, 35 insertions(+), 5 deletions(-)
> 
> NAK.
> 
> If we are absolutely required to hack it in some way, then I would
> prefer your first patchset as the noise was limited to only your
> driver.
> 
> We aren't going to provide such operations from the governors, sorry.

Yep, understandable.


Andreas

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-10-11  6:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-19 12:18 [PATCH 0/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes Andreas Herrmann
2016-08-19 12:21 ` [PATCH 1/1] " Andreas Herrmann
2016-08-29  6:01   ` Viresh Kumar
2016-09-01 13:21     ` Andreas Herrmann
2016-09-07  5:02       ` Viresh Kumar
2016-09-13 10:53         ` Andreas Herrmann
2016-09-14 14:56         ` Andreas Herrmann
2016-10-05  5:17           ` Viresh Kumar
2016-10-11  6:28             ` Andreas Herrmann
2016-09-16  9:47         ` Andreas Herrmann
2016-09-16 18:48           ` Stratos Karafotis
     [not found]             ` <CADmjqpNE9f7fzQjWsHKB4wEjLq-4ZvQpaC314OcLdQ-i_TAABg@mail.gmail.com>
2016-09-19 16:16               ` Andreas Herrmann
2016-09-19 19:39                 ` Stratos Karafotis
2016-09-22 17:54                   ` Andreas Herrmann
2016-10-05  5:21                     ` Viresh Kumar
2016-08-19 12:40 ` [PATCH 0/1] " Andreas Herrmann
2016-09-23 16:56 ` [PATCH v2 0/2] " Andreas Herrmann
2016-09-23 17:02   ` [PATCH v2 1/2] cpufreq/ondemand: Introduce op to customize mapping of load to frequency Andreas Herrmann
2016-10-05  4:01     ` Viresh Kumar
2016-10-11  6:30       ` Andreas Herrmann
2016-09-23 17:07   ` [PATCH v2 2/2] cpufreq/pcc-cpufreq: Make use of map_load_to_freq op Andreas Herrmann
2016-09-26  9:05     ` [PATCH v3 " Andreas Herrmann
