linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
@ 2018-07-17  6:50 Andreas Herrmann
  2018-07-17  7:33 ` Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  6:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

Hello,

I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
idle state before stopping the tick") causes severe performance drop
for systems using pcc-cpufreq driver. Depending on the number of CPUs
the system might be almost unusable. The OS jitter for 4.17.y and
4.18.-rcx kernels is off the charts, you can even spot it with top
command (issued when the system is supposedly idle), e.g.

 top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
 Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
 %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
 KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
 KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
  3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
    67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
    25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
   182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
    43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
   103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
   334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
   789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
   418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
   779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
   773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
   762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
   769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
   805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
   840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
   812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
   847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
   763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
   772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
   821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
   923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
  1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
    61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
  3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
   771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
   767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
   764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
   765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1

When I apply below patch (trying to revert essential parts of commit
554c8aa8ecad) behaviour seems back to normal.

I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
to cpufreq drivers and you better not use it. But I wonder whether
commit 554c8aa8ecad ("sched: idle: Select idle state before stopping
the tick") introduced bad behaviour for other cases as well.

I'll send some performance results to illustrate the issue asap. I've
also tried to modify pcc-cpufreq to reduce the amount of frequency
changes triggered by this driver but this does not help for kernels
where commit 554c8aa8ecad is applied.


Andreas

---

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1a3e9bddd17b..377a62ec475c 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -185,18 +185,13 @@ static void cpuidle_idle_call(void)
 	} else {
 		bool stop_tick = true;
 
+		tick_nohz_idle_stop_tick();
+		rcu_idle_enter();
 		/*
 		 * Ask the cpuidle framework to choose a convenient idle state.
 		 */
 		next_state = cpuidle_select(drv, dev, &stop_tick);
 
-		if (stop_tick)
-			tick_nohz_idle_stop_tick();
-		else
-			tick_nohz_idle_retain_tick();
-
-		rcu_idle_enter();
-
 		entered_state = call_cpuidle(drv, dev, next_state);
 		/*
 		 * Give the governor an opportunity to reflect on the outcome

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  6:50 Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
@ 2018-07-17  7:33 ` Rafael J. Wysocki
  2018-07-17  8:03   ` Rafael J. Wysocki
                     ` (2 more replies)
  2018-07-17  8:15 ` Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  7:33 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

Hi,

Thanks for your report!

On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> Hello,
>
> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> idle state before stopping the tick") causes severe performance drop
> for systems using pcc-cpufreq driver. Depending on the number of CPUs
> the system might be almost unusable. The OS jitter for 4.17.y and
> 4.18.-rcx kernels is off the charts, you can even spot it with top
> command (issued when the system is supposedly idle), e.g.
>
>  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
>  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
>  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
>  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
>  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
>
>    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
>   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
>     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
>     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
>    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
>     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
>    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
>    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
>    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
>    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
>    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
>    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
>    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
>    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
>    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
>    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
>    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
>    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
>    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
>    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
>    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
>    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
>   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
>     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
>   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
>    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
>    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
>    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
>    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
>
> When I apply below patch (trying to revert essential parts of commit
> 554c8aa8ecad) behaviour seems back to normal.

Well, that basically defeats the purpose of the change in commit
554c8aa8ecad, so it's not what I'd like to do to fix this problem.

Also it would be good to understand what actually happens.

> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
> to cpufreq drivers and you better not use it.

That's exactly right.

> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
> stopping the tick") introduced bad behaviour for other cases as well.

It has been tested quite extensively in that respect, although
admittedly not with the pcc-cpufreq driver.

Nothing bad related to it has been has been reported so far, FWIW.

> I'll send some performance results to illustrate the issue asap. I've
> also tried to modify pcc-cpufreq to reduce the amount of frequency
> changes triggered by this driver but this does not help for kernels
> where commit 554c8aa8ecad is applied.

Can you replace pcc-cpufreq with a different cpufreq driver on the
affected systems?  If so, do performance numbers look bad after that
too?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  7:33 ` Rafael J. Wysocki
@ 2018-07-17  8:03   ` Rafael J. Wysocki
  2018-07-17  8:50     ` Andreas Herrmann
  2018-07-17  8:08   ` Daniel Lezcano
  2018-07-17  8:36   ` Andreas Herrmann
  2 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  8:03 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> Hi,
>
> Thanks for your report!
>
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> Hello,
>>
>> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> idle state before stopping the tick") causes severe performance drop
>> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> the system might be almost unusable. The OS jitter for 4.17.y and
>> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> command (issued when the system is supposedly idle), e.g.
>>
>>  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
>>  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
>>  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
>>  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
>>  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
>>
>>    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
>>   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
>>     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
>>     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
>>    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
>>     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
>>    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
>>    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
>>    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
>>    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
>>    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
>>    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
>>    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
>>    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
>>    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
>>    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
>>    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
>>    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
>>    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
>>    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
>>    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
>>    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
>>   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
>>     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
>>   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
>>    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
>>    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
>>    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
>>    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
>>
>> When I apply below patch (trying to revert essential parts of commit
>> 554c8aa8ecad) behaviour seems back to normal.
>
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>
> Also it would be good to understand what actually happens.
>
>> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> to cpufreq drivers and you better not use it.
>
> That's exactly right.
>
>> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> stopping the tick") introduced bad behaviour for other cases as well.
>
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
>
> Nothing bad related to it has been has been reported so far, FWIW.
>
>> I'll send some performance results to illustrate the issue asap. I've
>> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> changes triggered by this driver but this does not help for kernels
>> where commit 554c8aa8ecad is applied.
>
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems?  If so, do performance numbers look bad after that
> too?

Also, what cpufreq governor do you use with pcc-cpufreq?  Does
changing it to something like "performance" improve things?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  7:33 ` Rafael J. Wysocki
  2018-07-17  8:03   ` Rafael J. Wysocki
@ 2018-07-17  8:08   ` Daniel Lezcano
  2018-07-17  8:36   ` Andreas Herrmann
  2 siblings, 0 replies; 36+ messages in thread
From: Daniel Lezcano @ 2018-07-17  8:08 UTC (permalink / raw)
  To: Rafael J. Wysocki, Andreas Herrmann
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On 17/07/2018 09:33, Rafael J. Wysocki wrote:
> Hi,
> 
> Thanks for your report!
> 
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> Hello,
>>
>> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> idle state before stopping the tick") causes severe performance drop
>> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> the system might be almost unusable. The OS jitter for 4.17.y and
>> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> command (issued when the system is supposedly idle), e.g.
>>
>>  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
>>  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
>>  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
>>  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
>>  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
>>
>>    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
>>   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
>>     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
>>     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
>>    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
>>     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
>>    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
>>    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
>>    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
>>    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
>>    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
>>    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
>>    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
>>    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
>>    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
>>    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
>>    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
>>    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
>>    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
>>    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
>>    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
>>    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
>>   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
>>     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
>>   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
>>    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
>>    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
>>    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
>>    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
>>
>> When I apply below patch (trying to revert essential parts of commit
>> 554c8aa8ecad) behaviour seems back to normal.
> 
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
> 
> Also it would be good to understand what actually happens.
> 
>> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> to cpufreq drivers and you better not use it.
> 
> That's exactly right.
> 
>> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> stopping the tick") introduced bad behaviour for other cases as well.
> 
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
> 
> Nothing bad related to it has been has been reported so far, FWIW.

I confirm we benchmarked on ARM64 and we did not notice a big
performance drop expect a few percent due to more cluster idle states
which is logical and compensated by a big energy improvement.

>> I'll send some performance results to illustrate the issue asap. I've
>> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> changes triggered by this driver but this does not help for kernels
>> where commit 554c8aa8ecad is applied.
> 
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems?  If so, do performance numbers look bad after that
> too?
> 
> Thanks,
> Rafael
> 


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  6:50 Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  2018-07-17  7:33 ` Rafael J. Wysocki
@ 2018-07-17  8:15 ` Peter Zijlstra
  2018-07-17  9:05   ` Andreas Herrmann
  2018-07-17 12:02 ` [PATCH] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems Rafael J. Wysocki
  2018-07-18 15:25 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  3 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2018-07-17  8:15 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

On Tue, Jul 17, 2018 at 08:50:48AM +0200, Andreas Herrmann wrote:
> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> idle state before stopping the tick") causes severe performance drop
> for systems using pcc-cpufreq driver. Depending on the number of CPUs
> the system might be almost unusable. 

Have you looked at that driver? It features a global lock which is taken
for all cpufreq operations. Once you get past a handful of cpus that
will constrict your system exactly as you say.

What kind of machine are you running this on, and can you configure the
thing to use anything else? If so, do so and never look back.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  7:33 ` Rafael J. Wysocki
  2018-07-17  8:03   ` Rafael J. Wysocki
  2018-07-17  8:08   ` Daniel Lezcano
@ 2018-07-17  8:36   ` Andreas Herrmann
  2018-07-17  8:52     ` Rafael J. Wysocki
  2 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  8:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 09:33:53AM +0200, Rafael J. Wysocki wrote:
> Hi,
> 
> Thanks for your report!
> 
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > Hello,
> >
> > I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> > idle state before stopping the tick") causes severe performance drop
> > for systems using pcc-cpufreq driver. Depending on the number of CPUs
> > the system might be almost unusable. The OS jitter for 4.17.y and
> > 4.18.-rcx kernels is off the charts, you can even spot it with top
> > command (issued when the system is supposedly idle), e.g.
> >
> >  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
> >  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
> >  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
> >  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
> >  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
> >
> >    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
> >   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
> >     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
> >     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
> >    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
> >     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
> >    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
> >    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
> >    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
> >    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
> >    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
> >    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
> >    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
> >    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
> >    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
> >    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
> >    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
> >    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
> >    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
> >    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
> >    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
> >    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
> >   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
> >     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
> >   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
> >    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
> >    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
> >    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
> >    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
> >
> > When I apply below patch (trying to revert essential parts of commit
> > 554c8aa8ecad) behaviour seems back to normal.
> 
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
> 
> Also it would be good to understand what actually happens.
> 
> > I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
> > to cpufreq drivers and you better not use it.
> 
> That's exactly right.
> 
> > But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
> > stopping the tick") introduced bad behaviour for other cases as well.
> 
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
> 
> Nothing bad related to it has been has been reported so far, FWIW.
> 
> > I'll send some performance results to illustrate the issue asap. I've
> > also tried to modify pcc-cpufreq to reduce the amount of frequency
> > changes triggered by this driver but this does not help for kernels
> > where commit 554c8aa8ecad is applied.
> 
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems?  If so, do performance numbers look bad after that
> too?

I have no performance numbers yet for other cpufreq drivers on this
system (checking this commit).  But I'll look it at next.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  8:03   ` Rafael J. Wysocki
@ 2018-07-17  8:50     ` Andreas Herrmann
  2018-07-17  8:58       ` Rafael J. Wysocki
  2018-07-17  9:06       ` Rafael J. Wysocki
  0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  8:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 10:03:41AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > Hi,
> >
> > Thanks for your report!
> >
> > On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> >> Hello,
> >>
> >> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> >> idle state before stopping the tick") causes severe performance drop
> >> for systems using pcc-cpufreq driver. Depending on the number of CPUs
> >> the system might be almost unusable. The OS jitter for 4.17.y and
> >> 4.18.-rcx kernels is off the charts, you can even spot it with top
> >> command (issued when the system is supposedly idle), e.g.
> >>
> >>  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
> >>  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
> >>  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
> >>  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
> >>  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
> >>
> >>    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
> >>   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
> >>     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
> >>     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
> >>    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
> >>     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
> >>    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
> >>    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
> >>    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
> >>    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
> >>    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
> >>    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
> >>    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
> >>    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
> >>    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
> >>    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
> >>    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
> >>    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
> >>    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
> >>    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
> >>    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
> >>    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
> >>   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
> >>     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
> >>   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
> >>    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
> >>    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
> >>    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
> >>    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
> >>
> >> When I apply below patch (trying to revert essential parts of commit
> >> 554c8aa8ecad) behaviour seems back to normal.
> >
> > Well, that basically defeats the purpose of the change in commit
> > 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
> >
> > Also it would be good to understand what actually happens.
> >
> >> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
> >> to cpufreq drivers and you better not use it.
> >
> > That's exactly right.
> >
> >> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
> >> stopping the tick") introduced bad behaviour for other cases as well.
> >
> > It has been tested quite extensively in that respect, although
> > admittedly not with the pcc-cpufreq driver.
> >
> > Nothing bad related to it has been has been reported so far, FWIW.
> >
> >> I'll send some performance results to illustrate the issue asap. I've
> >> also tried to modify pcc-cpufreq to reduce the amount of frequency
> >> changes triggered by this driver but this does not help for kernels
> >> where commit 554c8aa8ecad is applied.
> >
> > Can you replace pcc-cpufreq with a different cpufreq driver on the
> > affected systems?  If so, do performance numbers look bad after that
> > too?
> 
> Also, what cpufreq governor do you use with pcc-cpufreq?

Ondemand governor. Which triggers a lot of PCC related platform calls.
And as Peter noticed already the driver has a severe bottleneck (lock
protecting shared memory used for all CPUs to pass data to/from
platform for PCC calls).

> Does changing it to something like "performance" improve things?

With performance governor above mentioned bottleneck is no issue.

On balance before this commit users could use pcc-cpufreq but had
already suboptimal performance (compared to say intel_pstate driver
which can be used changing BIOS options). Starting with this commit
systems using pcc-cpufreq are unusable with high number of CPUs (top
output above is for system with 120 CPUs).

So should the driver be removed (sooner or later), or this behaviour
be documented somewhere, or just leave it as is.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  8:36   ` Andreas Herrmann
@ 2018-07-17  8:52     ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  8:52 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 10:36 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 09:33:53AM +0200, Rafael J. Wysocki wrote:
>> Hi,
>>
>> Thanks for your report!
>>
>> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > Hello,
>> >
>> > I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> > idle state before stopping the tick") causes severe performance drop
>> > for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> > the system might be almost unusable. The OS jitter for 4.17.y and
>> > 4.18.-rcx kernels is off the charts, you can even spot it with top
>> > command (issued when the system is supposedly idle), e.g.
>> >
>> >  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
>> >  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
>> >  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
>> >  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
>> >  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
>> >
>> >    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
>> >   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
>> >     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
>> >     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
>> >    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
>> >     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
>> >    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
>> >    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
>> >    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
>> >    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
>> >    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
>> >    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
>> >    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
>> >    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
>> >    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
>> >    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
>> >    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
>> >    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
>> >    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
>> >    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
>> >    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
>> >    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
>> >   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
>> >     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
>> >   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
>> >    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
>> >    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
>> >    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
>> >    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
>> >
>> > When I apply below patch (trying to revert essential parts of commit
>> > 554c8aa8ecad) behaviour seems back to normal.
>>
>> Well, that basically defeats the purpose of the change in commit
>> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>>
>> Also it would be good to understand what actually happens.
>>
>> > I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> > to cpufreq drivers and you better not use it.
>>
>> That's exactly right.
>>
>> > But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> > stopping the tick") introduced bad behaviour for other cases as well.
>>
>> It has been tested quite extensively in that respect, although
>> admittedly not with the pcc-cpufreq driver.
>>
>> Nothing bad related to it has been has been reported so far, FWIW.
>>
>> > I'll send some performance results to illustrate the issue asap. I've
>> > also tried to modify pcc-cpufreq to reduce the amount of frequency
>> > changes triggered by this driver but this does not help for kernels
>> > where commit 554c8aa8ecad is applied.
>>
>> Can you replace pcc-cpufreq with a different cpufreq driver on the
>> affected systems?  If so, do performance numbers look bad after that
>> too?
>
> I have no performance numbers yet for other cpufreq drivers on this
> system (checking this commit).  But I'll look it at next.

Thanks!

Generally speaking, pcc-cpufreq is fundamentally not scalable, so the
additional concurrency brought in by the commit in question may have
exposed that weakness if that driver is run on a system with multiple
logical CPUs.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  8:50     ` Andreas Herrmann
@ 2018-07-17  8:58       ` Rafael J. Wysocki
  2018-07-17  9:06       ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  8:58 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 10:03:41AM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> > Hi,
>> >
>> > Thanks for your report!
>> >
>> > On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> >> Hello,
>> >>
>> >> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> >> idle state before stopping the tick") causes severe performance drop
>> >> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> >> the system might be almost unusable. The OS jitter for 4.17.y and
>> >> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> >> command (issued when the system is supposedly idle), e.g.
>> >>
>> >>  top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
>> >>  Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
>> >>  %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
>> >>  KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
>> >>  KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
>> >>
>> >>    PID USER      PR  NI    VIRT    RES    SHR S    %CPU  %MEM     TIME+ COMMAND
>> >>   3373 root      20   0  982024  49916  36120 R  96.691 0.038   0:19.54 kubelet
>> >>     67 root      20   0       0      0      0 R  78.676 0.000   0:49.36 kworker/9:0
>> >>     25 root      20   0       0      0      0 R  78.125 0.000   0:49.67 kworker/2:0
>> >>    182 root      20   0       0      0      0 R  75.735 0.000   1:18.17 kworker/28:0
>> >>     43 root      20   0       0      0      0 R  75.000 0.000   0:11.56 kworker/5:0
>> >>    103 root      20   0       0      0      0 R  74.449 0.000   0:46.83 kworker/15:0
>> >>    334 root      20   0       0      0      0 R  72.978 0.000   1:06.88 kworker/53:0
>> >>    789 root      20   0       0      0      0 R  69.853 0.000   1:29.50 kworker/38:1
>> >>    418 root      20   0       0      0      0 R  69.301 0.000   0:41.33 kworker/67:0
>> >>    779 root      20   0       0      0      0 R  68.934 0.000   1:33.60 kworker/27:1
>> >>    773 root      20   0       0      0      0 R  68.566 0.000   1:37.91 kworker/22:1
>> >>    762 root      20   0       0      0      0 R  68.015 0.000   1:41.01 kworker/11:1
>> >>    769 root      20   0       0      0      0 R  67.647 0.000   1:37.65 kworker/18:1
>> >>    805 root      20   0       0      0      0 R  67.096 0.000   1:30.96 kworker/54:1
>> >>    840 root      20   0       0      0      0 R  66.912 0.000   1:23.82 kworker/89:1
>> >>    812 root      20   0       0      0      0 R  66.728 0.000   1:31.89 kworker/59:1
>> >>    847 root      20   0       0      0      0 R  66.360 0.000   1:28.40 kworker/96:1
>> >>    763 root      20   0       0      0      0 R  66.176 0.000   1:42.57 kworker/12:1
>> >>    772 root      20   0       0      0      0 R  66.176 0.000   1:12.58 kworker/21:1
>> >>    821 root      20   0       0      0      0 R  66.176 0.000   1:29.62 kworker/69:1
>> >>    923 root      20   0       0      0      0 R  65.809 0.000   1:44.32 kworker/3:18
>> >>   1284 root      20   0       0      0      0 R  65.809 0.000   1:23.50 kworker/101:2
>> >>     61 root      20   0       0      0      0 R  65.625 0.000   1:29.37 kworker/8:0
>> >>   3531 root      20   0   24384   3768   2356 R  65.625 0.003   0:08.91 top
>> >>    771 root      20   0       0      0      0 R  65.074 0.000   1:37.90 kworker/20:1
>> >>    767 root      20   0       0      0      0 R  64.706 0.000   1:38.01 kworker/16:1
>> >>    764 root      20   0       0      0      0 R  64.522 0.000   1:40.28 kworker/13:1
>> >>    765 root      20   0       0      0      0 R  64.154 0.000   1:40.13 kworker/14:1
>> >>
>> >> When I apply below patch (trying to revert essential parts of commit
>> >> 554c8aa8ecad) behaviour seems back to normal.
>> >
>> > Well, that basically defeats the purpose of the change in commit
>> > 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>> >
>> > Also it would be good to understand what actually happens.
>> >
>> >> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> >> to cpufreq drivers and you better not use it.
>> >
>> > That's exactly right.
>> >
>> >> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> >> stopping the tick") introduced bad behaviour for other cases as well.
>> >
>> > It has been tested quite extensively in that respect, although
>> > admittedly not with the pcc-cpufreq driver.
>> >
>> > Nothing bad related to it has been has been reported so far, FWIW.
>> >
>> >> I'll send some performance results to illustrate the issue asap. I've
>> >> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> >> changes triggered by this driver but this does not help for kernels
>> >> where commit 554c8aa8ecad is applied.
>> >
>> > Can you replace pcc-cpufreq with a different cpufreq driver on the
>> > affected systems?  If so, do performance numbers look bad after that
>> > too?
>>
>> Also, what cpufreq governor do you use with pcc-cpufreq?
>
> Ondemand governor. Which triggers a lot of PCC related platform calls.
> And as Peter noticed already the driver has a severe bottleneck (lock
> protecting shared memory used for all CPUs to pass data to/from
> platform for PCC calls).
>
>> Does changing it to something like "performance" improve things?
>
> With performance governor above mentioned bottleneck is no issue.

OK

> On balance before this commit users could use pcc-cpufreq but had
> already suboptimal performance (compared to say intel_pstate driver
> which can be used changing BIOS options). Starting with this commit
> systems using pcc-cpufreq are unusable with high number of CPUs (top
> output above is for system with 120 CPUs).

I see. :-)

> So should the driver be removed (sooner or later), or this behaviour
> be documented somewhere, or just leave it as is.

At least it should be documented in the driver that it is not scalable
and not for use on many-CPU systems (with "many" meaning anything
greater than 4 probably).

Or we could just make the driver not load if the number of CPUs in the
system is greater than 4 or similar.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  8:15 ` Peter Zijlstra
@ 2018-07-17  9:05   ` Andreas Herrmann
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  9:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

On Tue, Jul 17, 2018 at 10:15:07AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 17, 2018 at 08:50:48AM +0200, Andreas Herrmann wrote:
> > I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> > idle state before stopping the tick") causes severe performance drop
> > for systems using pcc-cpufreq driver. Depending on the number of CPUs
> > the system might be almost unusable. 
> 
> Have you looked at that driver? It features a global lock which is taken
> for all cpufreq operations. Once you get past a handful of cpus that
> will constrict your system exactly as you say.

Yes, I figured that already and you better use another cpufreq driver
on those systems. (But that such systems with this driver are almost
unusable is a new feature ;-)

> What kind of machine are you running this on, and can you configure the
> thing to use anything else?

Its an older system.

HP ProLiant DL580 Gen8, CPUs are Intel(R) Xeon(R) CPU E7-4890
4 nodes, With HT enabled 120 logical CPUs.

Yes you can configure it using something else.

For the record. On this and similar systems pcc-cpufreq driver is used
when BIOS setting for power management option "Power Regulator" is set
to

  "Dynamic Power Savings Mode"

AFAIK this is the default setting for this option. When changing this
setting to "OS Control Mode" intel_pstate driver should be loaded.

> If so, do so and never look back.

Ok, I am fine with this.


Thanks,

Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  8:50     ` Andreas Herrmann
  2018-07-17  8:58       ` Rafael J. Wysocki
@ 2018-07-17  9:06       ` Rafael J. Wysocki
  2018-07-17  9:11         ` Andreas Herrmann
  1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  9:06 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:

[cut]

>
> On balance before this commit users could use pcc-cpufreq but had
> already suboptimal performance (compared to say intel_pstate driver
> which can be used changing BIOS options).

BTW, I wonder why you need to change the BIOS options for intel_pstate to load.

It should be initialized before pcc-cpufreq (according to their
respective initcall levels), so in theory intel_pstate should be used
by default on the affected systems anyway.

What BIOS settings need to be changed for that?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:06       ` Rafael J. Wysocki
@ 2018-07-17  9:11         ` Andreas Herrmann
  2018-07-17  9:23           ` Rafael J. Wysocki
  0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  9:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> 
> [cut]
> 
> >
> > On balance before this commit users could use pcc-cpufreq but had
> > already suboptimal performance (compared to say intel_pstate driver
> > which can be used changing BIOS options).
> 
> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.

I think this is because of (in intel_pstate_init()):

        /*
         * The Intel pstate driver will be ignored if the platform
         * firmware has its own power management modes.
         */
        if (intel_pstate_platform_pwr_mgmt_exists())
                return -ENODEV;


> It should be initialized before pcc-cpufreq (according to their
> respective initcall levels), so in theory intel_pstate should be used
> by default on the affected systems anyway.

> What BIOS settings need to be changed for that?

Already answered in other mail.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:11         ` Andreas Herrmann
@ 2018-07-17  9:23           ` Rafael J. Wysocki
  2018-07-17  9:27             ` Andreas Herrmann
  0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17  9:23 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>>
>> [cut]
>>
>> >
>> > On balance before this commit users could use pcc-cpufreq but had
>> > already suboptimal performance (compared to say intel_pstate driver
>> > which can be used changing BIOS options).
>>
>> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
>
> I think this is because of (in intel_pstate_init()):
>
>         /*
>          * The Intel pstate driver will be ignored if the platform
>          * firmware has its own power management modes.
>          */
>         if (intel_pstate_platform_pwr_mgmt_exists())
>                 return -ENODEV;
>

OK, because of the "Proliant" entry, right?

So it looks like we have an issue there.  We find the entry and we
look for _PSS.  It is not there, so we assume that the firmware is
expected to control performance, which is not the case.  It looks like
we also should look for the presence of the PCC interface in there.

I can provide a patch for that, will you be able to test it?

>> It should be initialized before pcc-cpufreq (according to their
>> respective initcall levels), so in theory intel_pstate should be used
>> by default on the affected systems anyway.
>
>> What BIOS settings need to be changed for that?
>
> Already answered in other mail.

Indeed.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:23           ` Rafael J. Wysocki
@ 2018-07-17  9:27             ` Andreas Herrmann
  2018-07-17  9:36               ` Andreas Herrmann
  0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  9:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> >>
> >> [cut]
> >>
> >> >
> >> > On balance before this commit users could use pcc-cpufreq but had
> >> > already suboptimal performance (compared to say intel_pstate driver
> >> > which can be used changing BIOS options).
> >>
> >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> >
> > I think this is because of (in intel_pstate_init()):
> >
> >         /*
> >          * The Intel pstate driver will be ignored if the platform
> >          * firmware has its own power management modes.
> >          */
> >         if (intel_pstate_platform_pwr_mgmt_exists())
> >                 return -ENODEV;
> >
> 
> OK, because of the "Proliant" entry, right?
> 
> So it looks like we have an issue there.  We find the entry and we
> look for _PSS.  It is not there, so we assume that the firmware is
> expected to control performance, which is not the case.  It looks like
> we also should look for the presence of the PCC interface in there.
> 
> I can provide a patch for that, will you be able to test it?

Yes, I can test it.

> >> It should be initialized before pcc-cpufreq (according to their
> >> respective initcall levels), so in theory intel_pstate should be used
> >> by default on the affected systems anyway.
> >
> >> What BIOS settings need to be changed for that?
> >
> > Already answered in other mail.
> 
> Indeed.


Andreas


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:27             ` Andreas Herrmann
@ 2018-07-17  9:36               ` Andreas Herrmann
  2018-07-17 10:09                 ` Rafael J. Wysocki
  2018-07-17 10:18                 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > >>
> > >> [cut]
> > >>
> > >> >
> > >> > On balance before this commit users could use pcc-cpufreq but had
> > >> > already suboptimal performance (compared to say intel_pstate driver
> > >> > which can be used changing BIOS options).
> > >>
> > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > >
> > > I think this is because of (in intel_pstate_init()):
> > >
> > >         /*
> > >          * The Intel pstate driver will be ignored if the platform
> > >          * firmware has its own power management modes.
> > >          */
> > >         if (intel_pstate_platform_pwr_mgmt_exists())
> > >                 return -ENODEV;
> > >
> > 
> > OK, because of the "Proliant" entry, right?
> > 
> > So it looks like we have an issue there.  We find the entry and we
> > look for _PSS.  It is not there, so we assume that the firmware is
> > expected to control performance, which is not the case.

FYI, there is another BIOS setting on those systems. It's called
"Collaborative Power Control" (AFAIK enabled by default).

Only if this is disabled, firmware is (alone) in control of
performance. (And of course in this case neither pcc-cpufreq nor
intel_pstate will be loaded).

> > It looks like we also should look for the presence of the PCC
> > interface in there.
> >
> > I can provide a patch for that, will you be able to test it?
> 
> Yes, I can test it.
> 
> > >> It should be initialized before pcc-cpufreq (according to their
> > >> respective initcall levels), so in theory intel_pstate should be used
> > >> by default on the affected systems anyway.
> > >
> > >> What BIOS settings need to be changed for that?
> > >
> > > Already answered in other mail.
> > 
> > Indeed.
> 
> 
> Andreas
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:36               ` Andreas Herrmann
@ 2018-07-17 10:09                 ` Rafael J. Wysocki
  2018-07-17 10:21                   ` Andreas Herrmann
  2018-07-17 10:18                 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 10:09 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > >> > which can be used changing BIOS options).
> > > >>
> > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > >
> > > > I think this is because of (in intel_pstate_init()):
> > > >
> > > >         /*
> > > >          * The Intel pstate driver will be ignored if the platform
> > > >          * firmware has its own power management modes.
> > > >          */
> > > >         if (intel_pstate_platform_pwr_mgmt_exists())
> > > >                 return -ENODEV;
> > > >
> > > 
> > > OK, because of the "Proliant" entry, right?
> > > 
> > > So it looks like we have an issue there.  We find the entry and we
> > > look for _PSS.  It is not there, so we assume that the firmware is
> > > expected to control performance, which is not the case.
> 
> FYI, there is another BIOS setting on those systems. It's called
> "Collaborative Power Control" (AFAIK enabled by default).
> 
> Only if this is disabled, firmware is (alone) in control of
> performance. (And of course in this case neither pcc-cpufreq nor
> intel_pstate will be loaded).

OK, the patch is below.

First, I hope that if "Collaborative Power Control" is disabled, it will
simply hide the PCCH object and so intel_pstate will still not load then.

The main question basically is what the OS is expected to do if "Dynamic
Power Savings Mode" is set.  If we are *expected* to use the PCC interface
then, intel_pstate may not work in that case, but I suspect that the PCC
interface allows extra energy to be saved over what is possible without it.

---
 drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
 	return true;
 }
 
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+	acpi_status status;
+	acpi_handle handle;
+
+	status = acpi_get_handle(NULL, "\\_SB", &handle);
+	if (ACPI_FAILURE(status))
+		return true;
+
+	return !acpi_has_method(handle, "PCCH");
+}
+
 static bool __init intel_pstate_has_acpi_ppc(void)
 {
 	int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
 
 	switch (plat_info[idx].data) {
 	case PSS:
-		return intel_pstate_no_acpi_pss();
+		if (!intel_pstate_no_acpi_pss())
+			return false;
+
+		return intel_pstate_no_acpi_pcch();
 	case PPC:
 		return intel_pstate_has_acpi_ppc() && !force_load;
 	}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  9:36               ` Andreas Herrmann
  2018-07-17 10:09                 ` Rafael J. Wysocki
@ 2018-07-17 10:18                 ` Andreas Herrmann
  1 sibling, 0 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 10:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 11:36:20AM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > >> > which can be used changing BIOS options).
> > > >>
> > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > >
> > > > I think this is because of (in intel_pstate_init()):
> > > >
> > > >         /*
> > > >          * The Intel pstate driver will be ignored if the platform
> > > >          * firmware has its own power management modes.
> > > >          */
> > > >         if (intel_pstate_platform_pwr_mgmt_exists())
> > > >                 return -ENODEV;
> > > >
> > > 
> > > OK, because of the "Proliant" entry, right?
> > > 
> > > So it looks like we have an issue there.  We find the entry and we
> > > look for _PSS.  It is not there, so we assume that the firmware is
> > > expected to control performance, which is not the case.

> FYI, there is another BIOS setting on those systems. It's called
> "Collaborative Power Control" (AFAIK enabled by default).
> 
> Only if this is disabled, firmware is (alone) in control of
> performance. (And of course in this case neither pcc-cpufreq nor
> intel_pstate will be loaded).

To clarify:

(i)  When "Dynamic Power Savings Mode" is enabled _PSS objects are
     missing (in fact they seem to be renamed to "XPSS").
     pcc-cpufreq driver evaluates PCCH header and loads.

     Same when "Collaborative Power Control" is disabled. No _PSS but
     XPSS objects. pcc-cpufreq driver fails to evaluate PCCH header,
     no cpufreq driver is loaded.

(ii) There are _PSS objects when "OS Control Mode" is selected.

     (But I think when "Collaborative Power Control" is disabled there
     are no _PSS objects either and the "OS Control Mode" selection
     does not matter.)

> > > It looks like we also should look for the presence of the PCC
> > > interface in there.
> > >
> > > I can provide a patch for that, will you be able to test it?
> > 
> > Yes, I can test it.
> > 
> > > >> It should be initialized before pcc-cpufreq (according to their
> > > >> respective initcall levels), so in theory intel_pstate should be used
> > > >> by default on the affected systems anyway.
> > > >
> > > >> What BIOS settings need to be changed for that?
> > > >
> > > > Already answered in other mail.
> > > 
> > > Indeed.
> > 
> > 
> > Andreas
> > 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17 10:09                 ` Rafael J. Wysocki
@ 2018-07-17 10:21                   ` Andreas Herrmann
  2018-07-17 10:23                     ` Rafael J. Wysocki
  2018-07-17 14:03                     ` Andreas Herrmann
  0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 10:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > >>
> > > > >> [cut]
> > > > >>
> > > > >> >
> > > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > > >> > which can be used changing BIOS options).
> > > > >>
> > > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > > >
> > > > > I think this is because of (in intel_pstate_init()):
> > > > >
> > > > >         /*
> > > > >          * The Intel pstate driver will be ignored if the platform
> > > > >          * firmware has its own power management modes.
> > > > >          */
> > > > >         if (intel_pstate_platform_pwr_mgmt_exists())
> > > > >                 return -ENODEV;
> > > > >
> > > > 
> > > > OK, because of the "Proliant" entry, right?
> > > > 
> > > > So it looks like we have an issue there.  We find the entry and we
> > > > look for _PSS.  It is not there, so we assume that the firmware is
> > > > expected to control performance, which is not the case.
> > 
> > FYI, there is another BIOS setting on those systems. It's called
> > "Collaborative Power Control" (AFAIK enabled by default).
> > 
> > Only if this is disabled, firmware is (alone) in control of
> > performance. (And of course in this case neither pcc-cpufreq nor
> > intel_pstate will be loaded).
> 
> OK, the patch is below.
> 
> First, I hope that if "Collaborative Power Control" is disabled, it will
> simply hide the PCCH object and so intel_pstate will still not load then.

PCCH is hidden in that case.

> The main question basically is what the OS is expected to do if
> "Dynamic Power Savings Mode" is set.  If we are *expected* to use
> the PCC interface then, intel_pstate may not work in that case, but
> I suspect that the PCC interface allows extra energy to be saved
> over what is possible without it.

I'll test it and see what happens.


Andreas

> ---
>  drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
>  	return true;
>  }
>  
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> +	acpi_status status;
> +	acpi_handle handle;
> +
> +	status = acpi_get_handle(NULL, "\\_SB", &handle);
> +	if (ACPI_FAILURE(status))
> +		return true;
> +
> +	return !acpi_has_method(handle, "PCCH");
> +}
> +
>  static bool __init intel_pstate_has_acpi_ppc(void)
>  {
>  	int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>  
>  	switch (plat_info[idx].data) {
>  	case PSS:
> -		return intel_pstate_no_acpi_pss();
> +		if (!intel_pstate_no_acpi_pss())
> +			return false;
> +
> +		return intel_pstate_no_acpi_pcch();
>  	case PPC:
>  		return intel_pstate_has_acpi_ppc() && !force_load;
>  	}
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17 10:21                   ` Andreas Herrmann
@ 2018-07-17 10:23                     ` Rafael J. Wysocki
  2018-07-17 14:03                     ` Andreas Herrmann
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 10:23 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 12:21 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
>> On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
>> > On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
>> > > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
>> > > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
>> > > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > > > >>
>> > > > >> [cut]
>> > > > >>
>> > > > >> >
>> > > > >> > On balance before this commit users could use pcc-cpufreq but had
>> > > > >> > already suboptimal performance (compared to say intel_pstate driver
>> > > > >> > which can be used changing BIOS options).
>> > > > >>
>> > > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
>> > > > >
>> > > > > I think this is because of (in intel_pstate_init()):
>> > > > >
>> > > > >         /*
>> > > > >          * The Intel pstate driver will be ignored if the platform
>> > > > >          * firmware has its own power management modes.
>> > > > >          */
>> > > > >         if (intel_pstate_platform_pwr_mgmt_exists())
>> > > > >                 return -ENODEV;
>> > > > >
>> > > >
>> > > > OK, because of the "Proliant" entry, right?
>> > > >
>> > > > So it looks like we have an issue there.  We find the entry and we
>> > > > look for _PSS.  It is not there, so we assume that the firmware is
>> > > > expected to control performance, which is not the case.
>> >
>> > FYI, there is another BIOS setting on those systems. It's called
>> > "Collaborative Power Control" (AFAIK enabled by default).
>> >
>> > Only if this is disabled, firmware is (alone) in control of
>> > performance. (And of course in this case neither pcc-cpufreq nor
>> > intel_pstate will be loaded).
>>
>> OK, the patch is below.
>>
>> First, I hope that if "Collaborative Power Control" is disabled, it will
>> simply hide the PCCH object and so intel_pstate will still not load then.
>
> PCCH is hidden in that case.

OK

>> The main question basically is what the OS is expected to do if
>> "Dynamic Power Savings Mode" is set.  If we are *expected* to use
>> the PCC interface then, intel_pstate may not work in that case, but
>> I suspect that the PCC interface allows extra energy to be saved
>> over what is possible without it.
>
> I'll test it and see what happens.

Thanks!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-17  6:50 Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  2018-07-17  7:33 ` Rafael J. Wysocki
  2018-07-17  8:15 ` Peter Zijlstra
@ 2018-07-17 12:02 ` Rafael J. Wysocki
  2018-07-17 16:14   ` [PATCH v2] " Rafael J. Wysocki
  2018-07-18 15:25 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
  3 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 12:02 UTC (permalink / raw)
  To: linux-pm
  Cc: Andreas Herrmann, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, linux-kernel

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The firmware interface used by the pcc-cpufreq driver is
fundamentally not scalable and using it for dynamic CPU performance
scaling on systems with many CPUs leads to degraded performance.

For this reason, disable dynamic CPU performance scaling on systems
with pcc-cpufreq where the number of CPUs present at the driver init
time is greater than 4.  Also make the driver print corresponding
complaints to the kernel log.

Reported-by: Andreas Herrmann <aherrmann@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/pcc-cpufreq.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
+++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
@@ -589,6 +589,14 @@ static int __init pcc_cpufreq_init(void)
 		return ret;
 	}
 
+	if (num_present_cpus() > 4) {
+		pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
+		pr_err("%s: Unsuitable system, dynamic performance scaling disabled\n",
+		       __func__);
+		pr_err("%s: Change BIOS settings and complain to the hardware vendor\n",
+		       __func__);
+	}
+
 	ret = cpufreq_register_driver(&pcc_cpufreq_driver);
 
 	return ret;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17 10:21                   ` Andreas Herrmann
  2018-07-17 10:23                     ` Rafael J. Wysocki
@ 2018-07-17 14:03                     ` Andreas Herrmann
  2018-07-17 15:29                       ` Rafael J. Wysocki
  2018-07-17 16:13                       ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 14:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 12:21:36PM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:

  ---8<---

> > OK, the patch is below.
> > 
> > First, I hope that if "Collaborative Power Control" is disabled, it will
> > simply hide the PCCH object and so intel_pstate will still not load then.
> 
> PCCH is hidden in that case.
> 
> > The main question basically is what the OS is expected to do if
> > "Dynamic Power Savings Mode" is set.  If we are *expected* to use
> > the PCC interface then, intel_pstate may not work in that case, but
> > I suspect that the PCC interface allows extra energy to be saved
> > over what is possible without it.
> 
> I'll test it and see what happens.

I've tested it on top of v4.18-rc5-36-g30b06abfb92b. intel_pstate now
loads instead of pcc-cpufreq and system looks stable.

When disabling "Collaborative Power Control" no cpufreq driver is loaded
(as expected).

Performance (with kernbench) is as expected (always better than any
brew of pcc-cpufreq + misc modifications to this driver + partial
rollback of commit 554c8aa8ecad).

If you like you can add either Tested-by or
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>

I think this patch should be tagged for 4.17-stable.


Thanks,

Andreas

> > ---
> >  drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
> >  	return true;
> >  }
> >  
> > +static bool __init intel_pstate_no_acpi_pcch(void)
> > +{
> > +	acpi_status status;
> > +	acpi_handle handle;
> > +
> > +	status = acpi_get_handle(NULL, "\\_SB", &handle);
> > +	if (ACPI_FAILURE(status))
> > +		return true;
> > +
> > +	return !acpi_has_method(handle, "PCCH");
> > +}
> > +
> >  static bool __init intel_pstate_has_acpi_ppc(void)
> >  {
> >  	int i;
> > @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
> >  
> >  	switch (plat_info[idx].data) {
> >  	case PSS:
> > -		return intel_pstate_no_acpi_pss();
> > +		if (!intel_pstate_no_acpi_pss())
> > +			return false;
> > +
> > +		return intel_pstate_no_acpi_pcch();
> >  	case PPC:
> >  		return intel_pstate_has_acpi_ppc() && !force_load;
> >  	}
> > 
> > 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17 14:03                     ` Andreas Herrmann
@ 2018-07-17 15:29                       ` Rafael J. Wysocki
  2018-07-17 16:13                       ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 15:29 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 4:03 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 12:21:36PM +0200, Andreas Herrmann wrote:
>> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
>
>   ---8<---
>
>> > OK, the patch is below.
>> >
>> > First, I hope that if "Collaborative Power Control" is disabled, it will
>> > simply hide the PCCH object and so intel_pstate will still not load then.
>>
>> PCCH is hidden in that case.
>>
>> > The main question basically is what the OS is expected to do if
>> > "Dynamic Power Savings Mode" is set.  If we are *expected* to use
>> > the PCC interface then, intel_pstate may not work in that case, but
>> > I suspect that the PCC interface allows extra energy to be saved
>> > over what is possible without it.
>>
>> I'll test it and see what happens.
>
> I've tested it on top of v4.18-rc5-36-g30b06abfb92b. intel_pstate now
> loads instead of pcc-cpufreq and system looks stable.
>
> When disabling "Collaborative Power Control" no cpufreq driver is loaded
> (as expected).
>
> Performance (with kernbench) is as expected (always better than any
> brew of pcc-cpufreq + misc modifications to this driver + partial
> rollback of commit 554c8aa8ecad).
>
> If you like you can add either Tested-by or
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>
> I think this patch should be tagged for 4.17-stable.

OK, thank you for testing!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
  2018-07-17 14:03                     ` Andreas Herrmann
  2018-07-17 15:29                       ` Rafael J. Wysocki
@ 2018-07-17 16:13                       ` Rafael J. Wysocki
  2018-07-17 17:23                         ` Srinivas Pandruvada
  2018-07-17 18:06                         ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 16:13 UTC (permalink / raw)
  To: Andreas Herrmann, Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Currently, intel_pstate doesn't load if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case.  However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.

Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs.  In fact, it is better to
avoid using it at all.

For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and load if it is there.

Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Tested-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
 	return true;
 }
 
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+	acpi_status status;
+	acpi_handle handle;
+
+	status = acpi_get_handle(NULL, "\\_SB", &handle);
+	if (ACPI_FAILURE(status))
+		return true;
+
+	return !acpi_has_method(handle, "PCCH");
+}
+
 static bool __init intel_pstate_has_acpi_ppc(void)
 {
 	int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
 
 	switch (plat_info[idx].data) {
 	case PSS:
-		return intel_pstate_no_acpi_pss();
+		if (!intel_pstate_no_acpi_pss())
+			return false;
+
+		return intel_pstate_no_acpi_pcch();
 	case PPC:
 		return intel_pstate_has_acpi_ppc() && !force_load;
 	}


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-17 12:02 ` [PATCH] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems Rafael J. Wysocki
@ 2018-07-17 16:14   ` Rafael J. Wysocki
  2018-07-17 20:13     ` Andreas Herrmann
  0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 16:14 UTC (permalink / raw)
  To: linux-pm
  Cc: Andreas Herrmann, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, linux-kernel

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The firmware interface used by the pcc-cpufreq driver is
fundamentally not scalable and using it for dynamic CPU performance
scaling on systems with many CPUs leads to degraded performance.

For this reason, disable dynamic CPU performance scaling on systems
with pcc-cpufreq where the number of CPUs present at the driver init
time is greater than 4.  Also make the driver print corresponding
complaints to the kernel log.

Reported-by: Andreas Herrmann <aherrmann@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: Rework the messages printed in the problematic case.

---
 drivers/cpufreq/pcc-cpufreq.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
+++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
@@ -589,6 +589,15 @@ static int __init pcc_cpufreq_init(void)
 		return ret;
 	}
 
+	if (num_present_cpus() > 4) {
+		pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
+		pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
+		       __func__);
+		pr_err("%s: Try to enable a different scaling driver through BIOS settings\n",
+		       __func__);
+		pr_err("%s: and complain to the system vendor\n", __func__);
+	}
+
 	ret = cpufreq_register_driver(&pcc_cpufreq_driver);
 
 	return ret;


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
  2018-07-17 16:13                       ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
@ 2018-07-17 17:23                         ` Srinivas Pandruvada
  2018-07-17 17:28                           ` Rafael J. Wysocki
  2018-07-17 18:06                         ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
  1 sibling, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2018-07-17 17:23 UTC (permalink / raw)
  To: Rafael J. Wysocki, Andreas Herrmann
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux PM, Linux Kernel Mailing List

On Tue, 2018-07-17 at 18:13 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Currently, intel_pstate doesn't load if _PSS is not present on
> HP Proliant systems, because it expects the firmware to take over
> CPU performance scaling in that case.  However, if ACPI PCCH is
> present, the firmware expects the kernel to use it for CPU
> performance scaling and the pcc-cpufreq driver is loaded for that.
> 
> Unfortunately, the firmware interface used by that driver is not
> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
> on systems with more than just a few CPUs.  In fact, it is better to
> avoid using it at all.
> 
> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
> is not present and load if it is there.
> 
> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power
> mgmt option)
> Reported-by: Andreas Herrmann <aherrmann@suse.com>
> Tested-by: Andreas Herrmann <aherrmann@suse.com>
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

But do we need a change as done by the following commit in in pcc-
cpufreq.c?

"
commit 8a61e12e84597b5f8155ac91b44dea866ccfaac2
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Sep 20 10:43:56 2013 -0700

    acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
    
"

Thanks,
Srinivas

> ---
>  drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
>  	return true;
>  }
>  
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> +	acpi_status status;
> +	acpi_handle handle;
> +
> +	status = acpi_get_handle(NULL, "\\_SB", &handle);
> +	if (ACPI_FAILURE(status))
> +		return true;
> +
> +	return !acpi_has_method(handle, "PCCH");
> +}
> +
>  static bool __init intel_pstate_has_acpi_ppc(void)
>  {
>  	int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>  
>  	switch (plat_info[idx].data) {
>  	case PSS:
> -		return intel_pstate_no_acpi_pss();
> +		if (!intel_pstate_no_acpi_pss())
> +			return false;
> +
> +		return intel_pstate_no_acpi_pcch();
>  	case PPC:
>  		return intel_pstate_has_acpi_ppc() && !force_load;
>  	}
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
  2018-07-17 17:23                         ` Srinivas Pandruvada
@ 2018-07-17 17:28                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 17:28 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Andreas Herrmann, Rafael J. Wysocki,
	Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, Linux PM,
	Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 7:23 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Tue, 2018-07-17 at 18:13 +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Currently, intel_pstate doesn't load if _PSS is not present on
>> HP Proliant systems, because it expects the firmware to take over
>> CPU performance scaling in that case.  However, if ACPI PCCH is
>> present, the firmware expects the kernel to use it for CPU
>> performance scaling and the pcc-cpufreq driver is loaded for that.
>>
>> Unfortunately, the firmware interface used by that driver is not
>> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
>> on systems with more than just a few CPUs.  In fact, it is better to
>> avoid using it at all.
>>
>> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
>> is not present and load if it is there.
>>
>> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power
>> mgmt option)
>> Reported-by: Andreas Herrmann <aherrmann@suse.com>
>> Tested-by: Andreas Herrmann <aherrmann@suse.com>
>> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>
> But do we need a change as done by the following commit in in pcc-
> cpufreq.c?
>
> "
> commit 8a61e12e84597b5f8155ac91b44dea866ccfaac2
> Author: Yinghai Lu <yinghai@kernel.org>
> Date:   Fri Sep 20 10:43:56 2013 -0700
>
>     acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
>
> "

Quite likely yes, good point!


>> ---
>>  drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
>>  1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
>>       return true;
>>  }
>>
>> +static bool __init intel_pstate_no_acpi_pcch(void)
>> +{
>> +     acpi_status status;
>> +     acpi_handle handle;
>> +
>> +     status = acpi_get_handle(NULL, "\\_SB", &handle);
>> +     if (ACPI_FAILURE(status))
>> +             return true;
>> +
>> +     return !acpi_has_method(handle, "PCCH");
>> +}
>> +
>>  static bool __init intel_pstate_has_acpi_ppc(void)
>>  {
>>       int i;
>> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>>
>>       switch (plat_info[idx].data) {
>>       case PSS:
>> -             return intel_pstate_no_acpi_pss();
>> +             if (!intel_pstate_no_acpi_pss())
>> +                     return false;
>> +
>> +             return intel_pstate_no_acpi_pcch();
>>       case PPC:
>>               return intel_pstate_has_acpi_ppc() && !force_load;
>>       }
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
  2018-07-17 16:13                       ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
  2018-07-17 17:23                         ` Srinivas Pandruvada
@ 2018-07-17 18:06                         ` Rafael J. Wysocki
  2018-07-18 10:43                           ` Andreas Herrmann
  1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 18:06 UTC (permalink / raw)
  To: Andreas Herrmann, Srinivas Pandruvada, Linux PM
  Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux Kernel Mailing List

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Currently, intel_pstate doesn't load if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case.  However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.

Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs.  In fact, it is better to
avoid using it at all.

For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and register if it is there.  Also prevent the
pcc-cpufreq driver from trying to initialize if intel_pstate has
been registered already.

Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This is a replacement for https://patchwork.kernel.org/patch/10530091/

In addition to the intel_pstate changes in the above, it also
prevents pcc-cpufreq from trying to initialize (which will fail
ultimately, but may confuse the firmware in the meantime).

Andreas, please test this one and let me know if it still works for you.

Thanks,
Rafael

---
 drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
 drivers/cpufreq/pcc-cpufreq.c  |    4 ++++
 2 files changed, 20 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
 	return true;
 }
 
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+	acpi_status status;
+	acpi_handle handle;
+
+	status = acpi_get_handle(NULL, "\\_SB", &handle);
+	if (ACPI_FAILURE(status))
+		return true;
+
+	return !acpi_has_method(handle, "PCCH");
+}
+
 static bool __init intel_pstate_has_acpi_ppc(void)
 {
 	int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
 
 	switch (plat_info[idx].data) {
 	case PSS:
-		return intel_pstate_no_acpi_pss();
+		if (!intel_pstate_no_acpi_pss())
+			return false;
+
+		return intel_pstate_no_acpi_pcch();
 	case PPC:
 		return intel_pstate_has_acpi_ppc() && !force_load;
 	}
Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
+++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
@@ -580,6 +580,10 @@ static int __init pcc_cpufreq_init(void)
 {
 	int ret;
 
+	/* Skip initialization if another cpufreq driver is there. */
+	if (cpufreq_get_current_driver())
+		return 0;
+
 	if (acpi_disabled)
 		return 0;
 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-17 16:14   ` [PATCH v2] " Rafael J. Wysocki
@ 2018-07-17 20:13     ` Andreas Herrmann
  2018-07-18  7:44       ` Rafael J. Wysocki
  2018-07-18  8:23       ` Peter Zijlstra
  0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 20:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, linux-kernel

On Tue, Jul 17, 2018 at 06:14:58PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The firmware interface used by the pcc-cpufreq driver is
> fundamentally not scalable and using it for dynamic CPU performance
> scaling on systems with many CPUs leads to degraded performance.
> 
> For this reason, disable dynamic CPU performance scaling on systems
> with pcc-cpufreq where the number of CPUs present at the driver init
> time is greater than 4.  Also make the driver print corresponding
> complaints to the kernel log.
> 
> Reported-by: Andreas Herrmann <aherrmann@suse.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2: Rework the messages printed in the problematic case.

I've tested this patch. Effect is as expected: driver loads but use of
ondemand governor is not allowed. Sample output:

[   40.757519] pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
[   40.831705] pcc_cpufreq_init: Too many CPUs, dynamic performance scaling disabled
[   40.898353] pcc_cpufreq_init: Try to enable a different scaling driver through BIOS settings
[   40.972327] pcc_cpufreq_init: and complain to the system vendor
[   41.025620] cpufreq: Can't use ondemand governor as dynamic switching is disallowed. Fallback to performance governor
...
[   41.187928] cpufreq: Can't use ondemand governor as dynamic switching is disallowed. Fallback to performance governor

Last message is shown for each online CPU in the system  (ie. 120x).

Looks good to me.


Andreas

> ---
>  drivers/cpufreq/pcc-cpufreq.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
> +++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
> @@ -589,6 +589,15 @@ static int __init pcc_cpufreq_init(void)
>  		return ret;
>  	}
>  
> +	if (num_present_cpus() > 4) {
> +		pcc_cpufreq_driver.flags |= CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING;
> +		pr_err("%s: Too many CPUs, dynamic performance scaling disabled\n",
> +		       __func__);
> +		pr_err("%s: Try to enable a different scaling driver through BIOS settings\n",
> +		       __func__);
> +		pr_err("%s: and complain to the system vendor\n", __func__);
> +	}
> +
>  	ret = cpufreq_register_driver(&pcc_cpufreq_driver);
>  
>  	return ret;
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-17 20:13     ` Andreas Herrmann
@ 2018-07-18  7:44       ` Rafael J. Wysocki
  2018-07-18  8:23       ` Peter Zijlstra
  1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-18  7:44 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Linux PM, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 10:13 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 06:14:58PM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The firmware interface used by the pcc-cpufreq driver is
>> fundamentally not scalable and using it for dynamic CPU performance
>> scaling on systems with many CPUs leads to degraded performance.
>>
>> For this reason, disable dynamic CPU performance scaling on systems
>> with pcc-cpufreq where the number of CPUs present at the driver init
>> time is greater than 4.  Also make the driver print corresponding
>> complaints to the kernel log.
>>
>> Reported-by: Andreas Herrmann <aherrmann@suse.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>
>> -> v2: Rework the messages printed in the problematic case.
>
> I've tested this patch. Effect is as expected: driver loads but use of
> ondemand governor is not allowed. Sample output:
>
> [   40.757519] pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
> [   40.831705] pcc_cpufreq_init: Too many CPUs, dynamic performance scaling disabled
> [   40.898353] pcc_cpufreq_init: Try to enable a different scaling driver through BIOS settings
> [   40.972327] pcc_cpufreq_init: and complain to the system vendor
> [   41.025620] cpufreq: Can't use ondemand governor as dynamic switching is disallowed. Fallback to performance governor
> ...
> [   41.187928] cpufreq: Can't use ondemand governor as dynamic switching is disallowed. Fallback to performance governor
>
> Last message is shown for each online CPU in the system  (ie. 120x).
>
> Looks good to me.

Thanks a lot!

Please also try https://patchwork.kernel.org/patch/10530321/

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-17 20:13     ` Andreas Herrmann
  2018-07-18  7:44       ` Rafael J. Wysocki
@ 2018-07-18  8:23       ` Peter Zijlstra
  2018-07-18  9:34         ` Andreas Herrmann
  1 sibling, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2018-07-18  8:23 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, linux-pm, Rafael J. Wysocki,
	Frederic Weisbecker, Viresh Kumar, linux-kernel

On Tue, Jul 17, 2018 at 10:13:23PM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 06:14:58PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The firmware interface used by the pcc-cpufreq driver is
> > fundamentally not scalable and using it for dynamic CPU performance
> > scaling on systems with many CPUs leads to degraded performance.
> > 
> > For this reason, disable dynamic CPU performance scaling on systems
> > with pcc-cpufreq where the number of CPUs present at the driver init
> > time is greater than 4.  Also make the driver print corresponding
> > complaints to the kernel log.
> > 
> > Reported-by: Andreas Herrmann <aherrmann@suse.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > 
> > -> v2: Rework the messages printed in the problematic case.
> 
> I've tested this patch. Effect is as expected: driver loads but use of
> ondemand governor is not allowed. Sample output:
> 
> [   40.757519] pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
> [   40.831705] pcc_cpufreq_init: Too many CPUs, dynamic performance scaling disabled
> [   40.898353] pcc_cpufreq_init: Try to enable a different scaling driver through BIOS settings

BTW, Andreas, is that BIOS option available through the normal BIOS
settings, or it is in the "secret" BIOS menu that HP has? If it is in
the "secret" one (^A IIRC) then we might want to explicitly mention
that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
  2018-07-18  8:23       ` Peter Zijlstra
@ 2018-07-18  9:34         ` Andreas Herrmann
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-18  9:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, linux-pm, Rafael J. Wysocki,
	Frederic Weisbecker, Viresh Kumar, linux-kernel

On Wed, Jul 18, 2018 at 10:23:52AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 17, 2018 at 10:13:23PM +0200, Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 06:14:58PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > The firmware interface used by the pcc-cpufreq driver is
> > > fundamentally not scalable and using it for dynamic CPU performance
> > > scaling on systems with many CPUs leads to degraded performance.
> > > 
> > > For this reason, disable dynamic CPU performance scaling on systems
> > > with pcc-cpufreq where the number of CPUs present at the driver init
> > > time is greater than 4.  Also make the driver print corresponding
> > > complaints to the kernel log.
> > > 
> > > Reported-by: Andreas Herrmann <aherrmann@suse.com>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > > 
> > > -> v2: Rework the messages printed in the problematic case.
> > 
> > I've tested this patch. Effect is as expected: driver loads but use of
> > ondemand governor is not allowed. Sample output:
> > 
> > [   40.757519] pcc-cpufreq: (v1.10.00) driver loaded with frequency limits: 1200 MHz, 2800 MHz
> > [   40.831705] pcc_cpufreq_init: Too many CPUs, dynamic performance scaling disabled
> > [   40.898353] pcc_cpufreq_init: Try to enable a different scaling driver through BIOS settings
> 
> BTW, Andreas, is that BIOS option available through the normal BIOS
> settings, or it is in the "secret" BIOS menu that HP has? If it is in
> the "secret" one (^A IIRC) then we might want to explicitly mention
> that.

The relevant options are not in a secret section (so far I was not
even aware of a secret section). But on those systems some stuff is
available via iLO interface (browser) and some only via "BIOS/Platform
Configuration (RBSU)". The iLO interface has a couple of options that
are only visible with an advanced ILO license.

The options affecting Linux cpufreq subsystem are

(a) available only via "BIOS/Platform Configuration (RBSU)"

  [Power Management]->
    [Advanced Power Options]->
      [Collaborative Power Control] == "enabled" | "disabled"

 If set to "disabled": No cpufreq driver will load and platform is
 solely responsible for CPU frequency adaptions (whatever that means).

(b) available via "BIOS/Platform Configuration (RBSU)" and iLO interface

  [Power Management]->
    [Power Regulator] == "Dynamic Power Savings Mode" | "OS Control Mode" |
                         "Static Low Power Mode" | "Static High Performance Mode"

 "Dynamic Power Savings Mode" allows pcc-cpufreq to load and "OS
 Control Mode" allows intel-pstate to be loaded. We now change it such
 that also with "Dynamic Power Savings Mode" intel-pstate is loaded
 (if available; if not, pcc-cpufreq will still be loaded but it now
 emits a warning and disallows use of ondemand governor if too many
 CPUs are in use).

 AFAIK the other (static) modes have no real meaning when
 "Collaborative Power Control" is enabled because as soon as a cpufreq
 driver is loaded frequency will be adapted by OS. I can't remember
 which one (pcc-cpufreq or intel-pstate) would be loaded, I've just
 tried it once or twice.

Note that the above is valid for a DL580 Gen8 with Intel CPUs. It
might be slightly different on Gen9/Gen10 and/or other models. Ie. for
models based on AMD CPUs we just have added a new warning in
pcc-cpufreq and disabled use of ondemand governor when the system has
more than 4 CPUs.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
  2018-07-17 18:06                         ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
@ 2018-07-18 10:43                           ` Andreas Herrmann
  2018-07-18 10:51                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-18 10:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Linux PM, Rafael J. Wysocki, Peter Zijlstra,
	Frederic Weisbecker, Viresh Kumar, Linux Kernel Mailing List

On Tue, Jul 17, 2018 at 08:06:54PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Currently, intel_pstate doesn't load if _PSS is not present on
> HP Proliant systems, because it expects the firmware to take over
> CPU performance scaling in that case.  However, if ACPI PCCH is
> present, the firmware expects the kernel to use it for CPU
> performance scaling and the pcc-cpufreq driver is loaded for that.
> 
> Unfortunately, the firmware interface used by that driver is not
> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
> on systems with more than just a few CPUs.  In fact, it is better to
> avoid using it at all.
> 
> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
> is not present and register if it is there.  Also prevent the
> pcc-cpufreq driver from trying to initialize if intel_pstate has
> been registered already.
> 
> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
> Reported-by: Andreas Herrmann <aherrmann@suse.com>
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> This is a replacement for https://patchwork.kernel.org/patch/10530091/
> 
> In addition to the intel_pstate changes in the above, it also
> prevents pcc-cpufreq from trying to initialize (which will fail
> ultimately, but may confuse the firmware in the meantime).
> 
> Andreas, please test this one and let me know if it still works for you.

Done that (with system in "Dynamic Power Savings Mode"). It still
works and system was stable.

FYI, as a sniff test I've run a kernbench test. I now repeat the test
with system switched to "OS control mode". Just in case -- if there
was interference with platform code while system was in "Dynamic Power
Savings Mode" (significantly affecting performance) I might spot it
this way.


Andreas


> Thanks,
> Rafael
> 
> ---
>  drivers/cpufreq/intel_pstate.c |   17 ++++++++++++++++-
>  drivers/cpufreq/pcc-cpufreq.c  |    4 ++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
>  	return true;
>  }
>  
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> +	acpi_status status;
> +	acpi_handle handle;
> +
> +	status = acpi_get_handle(NULL, "\\_SB", &handle);
> +	if (ACPI_FAILURE(status))
> +		return true;
> +
> +	return !acpi_has_method(handle, "PCCH");
> +}
> +
>  static bool __init intel_pstate_has_acpi_ppc(void)
>  {
>  	int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>  
>  	switch (plat_info[idx].data) {
>  	case PSS:
> -		return intel_pstate_no_acpi_pss();
> +		if (!intel_pstate_no_acpi_pss())
> +			return false;
> +
> +		return intel_pstate_no_acpi_pcch();
>  	case PPC:
>  		return intel_pstate_has_acpi_ppc() && !force_load;
>  	}
> Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
> +++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
> @@ -580,6 +580,10 @@ static int __init pcc_cpufreq_init(void)
>  {
>  	int ret;
>  
> +	/* Skip initialization if another cpufreq driver is there. */
> +	if (cpufreq_get_current_driver())
> +		return 0;
> +
>  	if (acpi_disabled)
>  		return 0;
>  
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
  2018-07-18 10:43                           ` Andreas Herrmann
@ 2018-07-18 10:51                             ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-18 10:51 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Linux PM,
	Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
	Viresh Kumar, Linux Kernel Mailing List

On Wed, Jul 18, 2018 at 12:43 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 08:06:54PM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Currently, intel_pstate doesn't load if _PSS is not present on
>> HP Proliant systems, because it expects the firmware to take over
>> CPU performance scaling in that case.  However, if ACPI PCCH is
>> present, the firmware expects the kernel to use it for CPU
>> performance scaling and the pcc-cpufreq driver is loaded for that.
>>
>> Unfortunately, the firmware interface used by that driver is not
>> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
>> on systems with more than just a few CPUs.  In fact, it is better to
>> avoid using it at all.
>>
>> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
>> is not present and register if it is there.  Also prevent the
>> pcc-cpufreq driver from trying to initialize if intel_pstate has
>> been registered already.
>>
>> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
>> Reported-by: Andreas Herrmann <aherrmann@suse.com>
>> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>
>> This is a replacement for https://patchwork.kernel.org/patch/10530091/
>>
>> In addition to the intel_pstate changes in the above, it also
>> prevents pcc-cpufreq from trying to initialize (which will fail
>> ultimately, but may confuse the firmware in the meantime).
>>
>> Andreas, please test this one and let me know if it still works for you.
>
> Done that (with system in "Dynamic Power Savings Mode"). It still
> works and system was stable.

Thanks!

> FYI, as a sniff test I've run a kernbench test. I now repeat the test
> with system switched to "OS control mode". Just in case -- if there
> was interference with platform code while system was in "Dynamic Power
> Savings Mode" (significantly affecting performance) I might spot it
> this way.

I'd rather not expect performance to be affected by this, but power
very well may be affected.

Anyway, good idea!

Cheers,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-17  6:50 Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
                   ` (2 preceding siblings ...)
  2018-07-17 12:02 ` [PATCH] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems Rafael J. Wysocki
@ 2018-07-18 15:25 ` Andreas Herrmann
  2018-07-18 15:31   ` Andreas Herrmann
  3 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-18 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

I think I still owe some performance numbers to show what is wrong
with systems using pcc-cpufreq with Linux after commit 554c8aa8ecad.

Following are results for kernbench tests (from MMTests test suite).
That's just a kernel compile with different number of compile jobs.
As result the time is measured, 5 runs are done for each configuration and
average values calculated.

I've restricted maximum number of jobs to 30. That means that tests
were done for 2, 4, 8, 16, and 30 compile jobs. I had bound all tests
to node 0. (I've used something like "numactl -N 0 ./run-mmtests.sh
--run-monitor <test_name>" to start those tests.)

Tests were done with kernel 4.18.0-rc3 on an HP DL580 Gen8 with Intel
Xeon CPU E7-4890 with latest BIOS installed. System had 4 nodes, 15
CPUs per node (30 logical CPUs with HT enabled). pcc-cpufreq was
active and ondemand governor in use.

I've tested with different number of online CPUs which better
illustrates how idle online CPUs interfere with compile load on node 0
(due to the jitter caused by pcc-cpufreq and its locking).

Average mean for user/system/elapsed time and standard deviation for each
subtest (=number of compile jobs) are as follows:

(Nodes)                      N0                N01                   N0                  N01                 N0123
 (CPUs)                  15CPUs             30CPUs               30CPUs               60CPUs               120CPUs
Amean   user-2   640.82 (0.00%)  675.90   (-5.47%)   789.03   (-23.13%)  1448.58  (-126.05%)  3575.79   (-458.01%)
Amean   user-4   652.18 (0.00%)  689.12   (-5.67%)   868.19   (-33.12%)  1846.66  (-183.15%)  5437.37   (-733.73%)
Amean   user-8   695.00 (0.00%)  732.22   (-5.35%)  1138.30   (-63.78%)  2598.74  (-273.92%)  7413.43   (-966.67%)
Amean   user-16  653.94 (0.00%)  772.48  (-18.13%)  1734.80  (-165.29%)  2699.65  (-312.83%)  9224.47  (-1310.61%)
Amean   user-30  634.91 (0.00%)  701.11  (-10.43%)  1197.37   (-88.59%)  1360.02  (-114.21%)  3732.34   (-487.85%)
Amean   syst-2   235.45 (0.00%)  235.68   (-0.10%)   321.99   (-36.76%)   574.44  (-143.98%)   869.35   (-269.23%)
Amean   syst-4   239.34 (0.00%)  243.09   (-1.57%)   345.07   (-44.18%)   621.00  (-159.47%)  1145.13   (-378.46%)
Amean   syst-8   246.51 (0.00%)  254.83   (-3.37%)   387.49   (-57.19%)   786.63  (-219.10%)  1406.17   (-470.42%)
Amean   syst-16  110.85 (0.00%)  122.21  (-10.25%)   408.25  (-268.31%)   644.41  (-481.36%)  1513.04  (-1264.99%)
Amean   syst-30   82.74 (0.00%)   94.07  (-13.69%)   155.38   (-87.80%)   207.03  (-150.22%)   547.73   (-562.01%)
Amean   elsp-2   625.33 (0.00%)  724.51  (-15.86%)   792.47   (-26.73%)  1537.44  (-145.86%)  3510.22   (-461.34%)
Amean   elsp-4   482.02 (0.00%)  568.26  (-17.89%)   670.26   (-39.05%)  1257.34  (-160.85%)  3120.89   (-547.46%)
Amean   elsp-8   267.75 (0.00%)  337.88  (-26.19%)   430.56   (-60.80%)   978.47  (-265.44%)  2321.91   (-767.18%)
Amean   elsp-16   63.55 (0.00%)   71.79  (-12.97%)   224.83  (-253.79%)   403.94  (-535.65%)  1121.04  (-1664.09%)
Amean   elsp-30   56.76 (0.00%)   62.82  (-10.69%)    66.50   (-17.16%)   124.20  (-118.84%)   303.47   (-434.70%)
Stddev  user-2     1.36 (0.00%)    1.94  (-42.57%)    16.17 (-1090.46%)   119.09 (-8669.75%)   382.74 (-28085.60%)
Stddev  user-4     2.81 (0.00%)    5.08  (-80.78%)     4.88   (-73.66%)   252.56 (-8881.80%)  1133.02 (-40193.16%)
Stddev  user-8     2.30 (0.00%)   15.58 (-578.28%)    30.60 (-1232.63%)   279.35 (-12064.01%) 1050.00 (-45621.61%)
Stddev  user-16    6.76 (0.00%)   25.52 (-277.80%)    78.44 (-1060.97%)   118.29 (-1650.94%)   724.11 (-10617.95%)
Stddev  user-30    0.51 (0.00%)    1.80 (-249.13%)    12.63 (-2354.11%)    25.82 (-4915.43%)  1098.82 (-213365.28%)
Stddev  syst-2     1.52 (0.00%)    2.76  (-81.04%)     3.98  (-161.58%)    36.35 (-2287.16%)    59.09  (-3781.09%)
Stddev  syst-4     2.39 (0.00%)    1.55   (35.25%)     3.24  ( -35.92%)    51.51 (-2057.65%)   175.75  (-7262.43%)
Stddev  syst-8     1.08 (0.00%)    3.70 (-241.40%)     6.83  (-531.33%)    65.80 (-5977.97%)   151.17 (-13864.10%)
Stddev  syst-16    3.78 (0.00%)    5.58  (-47.53%)     4.63  ( -22.44%)    47.90 (-1167.18%)    99.94  (-2543.88%)
Stddev  syst-30    0.31 (0.00%)    0.38  (-22.41%)     3.01  (-862.79%)    27.45 (-8688.85%)   137.94 (-44072.77%)
Stddev  elsp-2    55.14 (0.00%)   55.04    (0.18%)    95.33  ( -72.90%)   103.91   (-88.45%)   302.31   (-448.29%)
Stddev  elsp-4    60.90 (0.00%)   84.42  (-38.62%)    18.92  (  68.94%)   197.60  (-224.46%)   323.53   (-431.24%)
Stddev  elsp-8    16.77 (0.00%)   30.77  (-83.47%)    49.57  (-195.57%)    79.02  (-371.16%)   261.85  (-1461.28%)
Stddev  elsp-16    1.99 (0.00%)    2.88  (-44.60%)    28.11 (-1311.79%)   101.81 (-5012.88%)    62.29  (-3028.36%)
Stddev  elsp-30    0.65 (0.00%)    1.04  (-59.06%)     1.64  (-151.81%)    41.84 (-6308.81%)    75.37 (-11445.61%)

Overall test time for each mmtests invocation was as follows (this is
also given for number-of-cpu configs for which I did not provide
details above).

               N0      N01       N0     N012    N0123      N01    N0123    N0123     N012    N0123     N0123
           15CPUs   30CPUs   30CPUs   45CPUs   60CPUs   60CPUs   75CPUs   90CPUs   90CPUs  105CPUs   120CPUs
User     17196.67 18714.36 30105.65 19239.27 19505.35 53089.39 22690.33 26731.06 38131.74 47627.61 153424.99
System    4807.98  4970.89  8533.95  5136.97  5184.24 16351.67  6135.29  7152.66 10920.76 12362.39  32129.74
Elapsed   7796.46  9166.55 11518.51  9274.77  9030.39 25465.38  9361.60 10677.63 15633.49 18900.46  60908.28

The results given for 120 online CPUs on nodes 0-3 indicate what I
meant with the "system being almost unusable". When trying to gather
results with kernel 4.17.5 and 120 CPUs, one iteration of kernbench (1
kernel compile) with 2 jobs even took about 6 hours. Maybe it was an
extreme outlier but I dismissed to further use that kernel (w/o
modifications) for further tests.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-18 15:25 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
@ 2018-07-18 15:31   ` Andreas Herrmann
  2018-07-19 11:04     ` Andreas Herrmann
  0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-18 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

On Wed, Jul 18, 2018 at 05:25:56PM +0200, Andreas Herrmann wrote:

 ...

> Average mean for user/system/elapsed time and standard deviation for each

Oops, of course I meant arithmetic mean.

 ...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
  2018-07-18 15:31   ` Andreas Herrmann
@ 2018-07-19 11:04     ` Andreas Herrmann
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-19 11:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, linux-pm,
	linux-kernel

For the sake of completeness following are given the remaining sets of
kernbench results related to this thread.

Setup for kernbench test is as described in previous mails but now all
120 logical CPUs were online in all tests. Test runs were still pinned
to node 0.

Common legend for below tables is:

   OSCM: "OS Control Mode"
   DPSM: "Dynamic Power Savings Mode"
idle_rb: partial rollback of 554c8aa8ecad ("sched: idle: Select idle
         state before stopping the tick") as described in initial mail
         of this thread

(A) intel_pstate (in powersave mode) performance wrt effect of commit
    554c8aa8ecad and wrt to potential interference from platform code

 Kernel v4.18-rc5-36-g30b06abfb92b + patch for intel_pstate to load it
 instead of pcc-cpufreq when system is in DPSM.

 Detailed results for each number of compile jobs:
 (OSCM is baseline, values in parenthesis show comparison to baseline)

                    OSCM                OSCM                DPSM                DPSM
                                     idle_rb                                 idle_rb
 Amean  user-2    600.58   596.38 (   0.70%)   685.94 ( -14.21%)   688.78 ( -14.69%)
 Amean  user-4    583.90   586.34 (  -0.42%)   626.37 (  -7.27%)   622.17 (  -6.55%)
 Amean  user-8    584.78   581.52 (   0.56%)   600.89 (  -2.75%)   595.53 (  -1.84%)
 Amean  user-16   705.07   688.62 (   2.33%)   705.16 (  -0.01%)   682.44 (   3.21%)
 Amean  user-30  1017.25  1022.39 (  -0.51%)  1025.23 (  -0.78%)  1022.61 (  -0.53%)
 Amean  syst-2    172.17   174.08 (  -1.11%)   184.73 (  -7.30%)   186.13 (  -8.11%)
 Amean  syst-4    183.88   180.44 (   1.87%)   191.70 (  -4.25%)   192.24 (  -4.54%)
 Amean  syst-8    193.40   193.81 (  -0.21%)   198.01 (  -2.38%)   193.96 (  -0.29%)
 Amean  syst-16   183.97   180.40 (   1.94%)   184.00 (  -0.01%)   182.10 (   1.02%)
 Amean  syst-30   122.36   122.08 (   0.23%)   122.53 (  -0.14%)   122.17 (   0.15%)
 Amean  elsp-2    610.90   634.64 (  -3.89%)   667.67 (  -9.29%)   661.81 (  -8.33%)
 Amean  elsp-4    413.54   488.02 ( -18.01%)   433.79 (  -4.90%)   407.30 (   1.51%)
 Amean  elsp-8    261.85   218.25 (  16.65%)   246.62 (   5.82%)   219.55 (  16.15%)
 Amean  elsp-16    89.27    99.36 ( -11.30%)    92.74 (  -3.89%)   102.74 ( -15.09%)
 Amean  elsp-30    47.07    47.04 (   0.08%)    48.82 (  -3.72%)    48.28 (  -2.57%)
 Stddev user-2      6.06     7.53 ( -24.21%)    31.88 (-425.98%)    25.79 (-325.57%)
 Stddev user-4      7.05    14.48 (-105.40%)    11.82 ( -67.63%)    12.14 ( -72.22%)
 Stddev user-8      5.69     1.18 (  79.28%)    18.75 (-229.45%)     7.03 ( -23.51%)
 Stddev user-16     6.41    15.74 (-145.55%)    12.87 (-100.75%)    10.59 ( -65.19%)
 Stddev user-30     2.62     2.80 (  -6.56%)     2.92 ( -11.31%)     2.45 (   6.52%)
 Stddev syst-2      3.48     2.81 (  19.28%)     2.27 (  34.73%)     1.47 (  57.83%)
 Stddev syst-4      4.04     4.69 ( -16.03%)     2.16 (  46.42%)     0.84 (  79.32%)
 Stddev syst-8      3.96     1.42 (  64.11%)     2.34 (  40.98%)     1.93 (  51.24%)
 Stddev syst-16     2.01     2.33 ( -15.76%)     1.33 (  33.89%)     1.94 (   3.74%)
 Stddev syst-30     0.76     0.38 (  50.10%)     0.91 ( -19.48%)     0.17 (  77.86%)
 Stddev elsp-2     44.55    58.37 ( -31.01%)   110.11 (-147.15%)    82.81 ( -85.88%)
 Stddev elsp-4     62.39   109.75 ( -75.90%)    48.32 (  22.56%)    47.10 (  24.52%)
 Stddev elsp-8     59.01    25.95 (  56.02%)    71.44 ( -21.07%)    37.83 (  35.89%)
 Stddev elsp-16    10.47    23.88 (-128.08%)    11.98 ( -14.41%)    15.42 ( -47.32%)
 Stddev elsp-30     0.26     0.64 (-142.06%)     0.39 ( -46.53%)     0.44 ( -66.71%)

 Overall test time:

               OSCM     OSCM     DPSM     DPSM
                     idle_rb           idle_rb
 User      18681.59 18599.99 19450.38 19289.33
 System     4487.76  4458.55  4620.80  4595.13
 Elapsed    7407.07  7725.86  7765.91  7502.72

 Overall test run-time is comparable. Commit 554c8aa8ecad does not
 seem to have a significant impact on performance (I don't have
 numbers for power consumption). Comparing OSCM vs. DPSM: it seems
 that its better to switch system into OSCM.


(B) performance of intel_pstate (in powersave mode and system in DPSM)
    vs. pcc-cpufreq (with ondemand governor)

 Results for pcc-cpufreq were obtained with v4.17.5+misc modifications.

 intel_pstate results were obtained with v4.18-rc5-36-g30b06abfb92b +
 patch for intel_pstate to load it instead of pcc-cpufreq when system
 is in DPSM.

 So strictly speaking this is no correct comparison but at least it
 gives an idea where the limits are with pcc-cpufreq and why its
 better to just switch to intel_pstate.
 
 pcc-cpufreq driver modifications were

 freqtable: pcc-cpufreq modified to use fixed table of 4 frequencies
  deadband: pcc-cpufreq modified to re-introduce so called deadband
            effect which keeps CPU at minimum frequency if target
            frequency would be in the calculated deadband

           intel_pstate        pcc-cpufreq        pcc-cpufreq        pcc-cpufreq
                   DPSM            idle_rb  idle_rb+freqtable   idle_rb+deadband
 Amean  user-2   685.94  834.15 ( -21.61%)  648.68 (   5.43%)  636.63 (   7.19%)
 Amean  user-4   626.37  902.09 ( -44.02%)  657.43 (  -4.96%)  615.49 (   1.74%)
 Amean  user-8   600.89 1078.37 ( -79.46%)  723.05 ( -20.33%)  646.23 (  -7.55%)
 Amean  user-16  705.16 1640.89 (-132.70%) 1096.61 ( -55.51%)  904.17 ( -28.22%)
 Amean  user-30 1025.23 1463.90 ( -42.79%) 1156.17 ( -12.77%) 1151.40 ( -12.31%)
 Amean  syst-2   184.73  232.17 ( -25.68%)  178.24 (   3.51%)  172.09 (   6.84%)
 Amean  syst-4   191.70  257.22 ( -34.18%)  194.16 (  -1.29%)  188.10 (   1.88%)
 Amean  syst-8   198.01  313.67 ( -58.41%)  228.34 ( -15.31%)  206.99 (  -4.53%)
 Amean  syst-16  184.00  393.92 (-114.09%)  279.89 ( -52.12%)  241.83 ( -31.43%)
 Amean  syst-30  122.53  185.98 ( -51.79%)  143.28 ( -16.94%)  140.45 ( -14.62%)
 Amean  elsp-2   667.67  769.28 ( -15.22%)  635.68 (   4.79%)  651.51 (   2.42%)
 Amean  elsp-4   433.79  614.27 ( -41.60%)  440.45 (  -1.53%)  392.80 (   9.45%)
 Amean  elsp-8   246.62  397.54 ( -61.19%)  252.27 (  -2.29%)  239.21 (   3.01%)
 Amean  elsp-16   92.74  207.43 (-123.68%)  138.00 ( -48.81%)  119.98 ( -29.37%)
 Amean  elsp-30   48.82   72.66 ( -48.83%)   55.95 ( -14.60%)   54.32 ( -11.27%)
 Stddev user-2    31.88   15.22 (  52.26%)    7.77 (  75.63%)    6.63 (  79.21%)
 Stddev user-4    11.82   32.20 (-172.49%)    3.37 (  71.44%)    6.44 (  45.49%)
 Stddev user-8    18.75   33.99 ( -81.29%)    6.96 (  62.86%)    5.82 (  68.97%)
 Stddev user-16   12.87   70.72 (-449.46%)   31.19 (-142.30%)   28.88 (-124.40%)
 Stddev user-30    2.92   26.08 (-792.64%)    6.16 (-110.99%)   10.90 (-273.16%)
 Stddev syst-2     2.27    4.44 ( -95.54%)    4.15 ( -82.48%)    2.09 (   8.11%)
 Stddev syst-4     2.16    8.46 (-290.74%)    3.71 ( -71.58%)    2.45 ( -12.99%)
 Stddev syst-8     2.34   10.73 (-359.70%)    3.98 ( -70.62%)    4.39 ( -87.80%)
 Stddev syst-16    1.33   11.44 (-759.46%)    2.14 ( -60.49%)    2.93 (-120.24%)
 Stddev syst-30    0.91    4.88 (-436.79%)    1.37 ( -50.11%)    2.36 (-159.71%)
 Stddev elsp-2   110.11   85.53 (  22.32%)   87.11 (  20.89%)   37.33 (  66.10%)
 Stddev elsp-4    48.32  130.17 (-169.39%)   59.81 ( -23.79%)   26.15 (  45.88%)
 Stddev elsp-8    71.44   86.47 ( -21.03%)   12.87 (  81.98%)   43.88 (  38.58%)
 Stddev elsp-16   11.98   13.63 ( -13.82%)    8.94 (  25.35%)    5.97 (  50.15%)
 Stddev elsp-30    0.39    2.64 (-582.23%)    0.62 ( -58.97%)    0.95 (-144.47%)

         intel_pstate pcc-cpufreq pcc-cpufreq pcc-cpufreq
                 DPSM     idle_rb    idle_rb+    idle_rb+
                                    freqtable    deadband
 User        19450.38    31273.96    22689.14    21050.35
 System       4620.80     7327.67     5364.63     4984.36
 Elapsed      7765.91    10997.49     7935.53     7593.74

 Again I have no numbers for power consumption.

 Note that I've stopped an attempt to collect results for pcc-cpufreq
 with unmodififed v4.17.5 (ie. w/o idle_rb) after the first iteration
 (compiling kernel with 2 jobs) took several hours.


Andreas

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-07-19 11:04 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-17  6:50 Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
2018-07-17  7:33 ` Rafael J. Wysocki
2018-07-17  8:03   ` Rafael J. Wysocki
2018-07-17  8:50     ` Andreas Herrmann
2018-07-17  8:58       ` Rafael J. Wysocki
2018-07-17  9:06       ` Rafael J. Wysocki
2018-07-17  9:11         ` Andreas Herrmann
2018-07-17  9:23           ` Rafael J. Wysocki
2018-07-17  9:27             ` Andreas Herrmann
2018-07-17  9:36               ` Andreas Herrmann
2018-07-17 10:09                 ` Rafael J. Wysocki
2018-07-17 10:21                   ` Andreas Herrmann
2018-07-17 10:23                     ` Rafael J. Wysocki
2018-07-17 14:03                     ` Andreas Herrmann
2018-07-17 15:29                       ` Rafael J. Wysocki
2018-07-17 16:13                       ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
2018-07-17 17:23                         ` Srinivas Pandruvada
2018-07-17 17:28                           ` Rafael J. Wysocki
2018-07-17 18:06                         ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
2018-07-18 10:43                           ` Andreas Herrmann
2018-07-18 10:51                             ` Rafael J. Wysocki
2018-07-17 10:18                 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
2018-07-17  8:08   ` Daniel Lezcano
2018-07-17  8:36   ` Andreas Herrmann
2018-07-17  8:52     ` Rafael J. Wysocki
2018-07-17  8:15 ` Peter Zijlstra
2018-07-17  9:05   ` Andreas Herrmann
2018-07-17 12:02 ` [PATCH] cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems Rafael J. Wysocki
2018-07-17 16:14   ` [PATCH v2] " Rafael J. Wysocki
2018-07-17 20:13     ` Andreas Herrmann
2018-07-18  7:44       ` Rafael J. Wysocki
2018-07-18  8:23       ` Peter Zijlstra
2018-07-18  9:34         ` Andreas Herrmann
2018-07-18 15:25 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
2018-07-18 15:31   ` Andreas Herrmann
2018-07-19 11:04     ` Andreas Herrmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).