* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 7:33 ` Rafael J. Wysocki
@ 2018-07-17 8:03 ` Rafael J. Wysocki
2018-07-17 8:50 ` Andreas Herrmann
2018-07-17 8:08 ` Daniel Lezcano
2018-07-17 8:36 ` Andreas Herrmann
2 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 8:03 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> Hi,
>
> Thanks for your report!
>
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> Hello,
>>
>> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> idle state before stopping the tick") causes severe performance drop
>> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> the system might be almost unusable. The OS jitter for 4.17.y and
>> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> command (issued when the system is supposedly idle), e.g.
>>
>> top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
>> Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
>> KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
>> KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
>> 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
>> 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
>> 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
>> 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
>> 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
>> 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
>> 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
>> 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
>> 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
>> 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
>> 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
>> 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
>> 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
>> 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
>> 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
>> 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
>> 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
>> 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
>> 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
>> 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
>> 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
>> 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
>> 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
>> 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
>> 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
>> 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
>> 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
>>
>> When I apply below patch (trying to revert essential parts of commit
>> 554c8aa8ecad) behaviour seems back to normal.
>
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>
> Also it would be good to understand what actually happens.
>
>> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> to cpufreq drivers and you better not use it.
>
> That's exactly right.
>
>> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> stopping the tick") introduced bad behaviour for other cases as well.
>
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
>
> Nothing bad related to it has been has been reported so far, FWIW.
>
>> I'll send some performance results to illustrate the issue asap. I've
>> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> changes triggered by this driver but this does not help for kernels
>> where commit 554c8aa8ecad is applied.
>
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems? If so, do performance numbers look bad after that
> too?
Also, what cpufreq governor do you use with pcc-cpufreq? Does
changing it to something like "performance" improve things?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 8:03 ` Rafael J. Wysocki
@ 2018-07-17 8:50 ` Andreas Herrmann
2018-07-17 8:58 ` Rafael J. Wysocki
2018-07-17 9:06 ` Rafael J. Wysocki
0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 8:50 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 10:03:41AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > Hi,
> >
> > Thanks for your report!
> >
> > On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> >> Hello,
> >>
> >> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> >> idle state before stopping the tick") causes severe performance drop
> >> for systems using pcc-cpufreq driver. Depending on the number of CPUs
> >> the system might be almost unusable. The OS jitter for 4.17.y and
> >> 4.18.-rcx kernels is off the charts, you can even spot it with top
> >> command (issued when the system is supposedly idle), e.g.
> >>
> >> top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
> >> Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
> >> %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
> >> KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
> >> KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
> >>
> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >> 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
> >> 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
> >> 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
> >> 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
> >> 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
> >> 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
> >> 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
> >> 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
> >> 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
> >> 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
> >> 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
> >> 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
> >> 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
> >> 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
> >> 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
> >> 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
> >> 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
> >> 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
> >> 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
> >> 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
> >> 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
> >> 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
> >> 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
> >> 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
> >> 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
> >> 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
> >> 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
> >> 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
> >>
> >> When I apply below patch (trying to revert essential parts of commit
> >> 554c8aa8ecad) behaviour seems back to normal.
> >
> > Well, that basically defeats the purpose of the change in commit
> > 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
> >
> > Also it would be good to understand what actually happens.
> >
> >> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
> >> to cpufreq drivers and you better not use it.
> >
> > That's exactly right.
> >
> >> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
> >> stopping the tick") introduced bad behaviour for other cases as well.
> >
> > It has been tested quite extensively in that respect, although
> > admittedly not with the pcc-cpufreq driver.
> >
> > Nothing bad related to it has been has been reported so far, FWIW.
> >
> >> I'll send some performance results to illustrate the issue asap. I've
> >> also tried to modify pcc-cpufreq to reduce the amount of frequency
> >> changes triggered by this driver but this does not help for kernels
> >> where commit 554c8aa8ecad is applied.
> >
> > Can you replace pcc-cpufreq with a different cpufreq driver on the
> > affected systems? If so, do performance numbers look bad after that
> > too?
>
> Also, what cpufreq governor do you use with pcc-cpufreq?
Ondemand governor. Which triggers a lot of PCC related platform calls.
And as Peter noticed already the driver has a severe bottleneck (lock
protecting shared memory used for all CPUs to pass data to/from
platform for PCC calls).
> Does changing it to something like "performance" improve things?
With performance governor above mentioned bottleneck is no issue.
On balance before this commit users could use pcc-cpufreq but had
already suboptimal performance (compared to say intel_pstate driver
which can be used changing BIOS options). Starting with this commit
systems using pcc-cpufreq are unusable with high number of CPUs (top
output above is for system with 120 CPUs).
So should the driver be removed (sooner or later), or this behaviour
be documented somewhere, or just leave it as is.
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 8:50 ` Andreas Herrmann
@ 2018-07-17 8:58 ` Rafael J. Wysocki
2018-07-17 9:06 ` Rafael J. Wysocki
1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 8:58 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 10:03:41AM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> > Hi,
>> >
>> > Thanks for your report!
>> >
>> > On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> >> Hello,
>> >>
>> >> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> >> idle state before stopping the tick") causes severe performance drop
>> >> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> >> the system might be almost unusable. The OS jitter for 4.17.y and
>> >> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> >> command (issued when the system is supposedly idle), e.g.
>> >>
>> >> top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
>> >> Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
>> >> %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
>> >> KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
>> >> KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
>> >>
>> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> >> 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
>> >> 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
>> >> 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
>> >> 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
>> >> 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
>> >> 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
>> >> 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
>> >> 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
>> >> 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
>> >> 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
>> >> 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
>> >> 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
>> >> 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
>> >> 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
>> >> 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
>> >> 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
>> >> 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
>> >> 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
>> >> 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
>> >> 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
>> >> 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
>> >> 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
>> >> 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
>> >> 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
>> >> 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
>> >> 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
>> >> 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
>> >> 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
>> >>
>> >> When I apply below patch (trying to revert essential parts of commit
>> >> 554c8aa8ecad) behaviour seems back to normal.
>> >
>> > Well, that basically defeats the purpose of the change in commit
>> > 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>> >
>> > Also it would be good to understand what actually happens.
>> >
>> >> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> >> to cpufreq drivers and you better not use it.
>> >
>> > That's exactly right.
>> >
>> >> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> >> stopping the tick") introduced bad behaviour for other cases as well.
>> >
>> > It has been tested quite extensively in that respect, although
>> > admittedly not with the pcc-cpufreq driver.
>> >
>> > Nothing bad related to it has been has been reported so far, FWIW.
>> >
>> >> I'll send some performance results to illustrate the issue asap. I've
>> >> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> >> changes triggered by this driver but this does not help for kernels
>> >> where commit 554c8aa8ecad is applied.
>> >
>> > Can you replace pcc-cpufreq with a different cpufreq driver on the
>> > affected systems? If so, do performance numbers look bad after that
>> > too?
>>
>> Also, what cpufreq governor do you use with pcc-cpufreq?
>
> Ondemand governor. Which triggers a lot of PCC related platform calls.
> And as Peter noticed already the driver has a severe bottleneck (lock
> protecting shared memory used for all CPUs to pass data to/from
> platform for PCC calls).
>
>> Does changing it to something like "performance" improve things?
>
> With performance governor above mentioned bottleneck is no issue.
OK
> On balance before this commit users could use pcc-cpufreq but had
> already suboptimal performance (compared to say intel_pstate driver
> which can be used changing BIOS options). Starting with this commit
> systems using pcc-cpufreq are unusable with high number of CPUs (top
> output above is for system with 120 CPUs).
I see. :-)
> So should the driver be removed (sooner or later), or this behaviour
> be documented somewhere, or just leave it as is.
At least it should be documented in the driver that it is not scalable
and not for use on many-CPU systems (with "many" meaning anything
greater than 4 probably).
Or we could just make the driver not load if the number of CPUs in the
system is greater than 4 or similar.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 8:50 ` Andreas Herrmann
2018-07-17 8:58 ` Rafael J. Wysocki
@ 2018-07-17 9:06 ` Rafael J. Wysocki
2018-07-17 9:11 ` Andreas Herrmann
1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 9:06 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
[cut]
>
> On balance before this commit users could use pcc-cpufreq but had
> already suboptimal performance (compared to say intel_pstate driver
> which can be used changing BIOS options).
BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
It should be initialized before pcc-cpufreq (according to their
respective initcall levels), so in theory intel_pstate should be used
by default on the affected systems anyway.
What BIOS settings need to be changed for that?
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:06 ` Rafael J. Wysocki
@ 2018-07-17 9:11 ` Andreas Herrmann
2018-07-17 9:23 ` Rafael J. Wysocki
0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 9:11 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>
> [cut]
>
> >
> > On balance before this commit users could use pcc-cpufreq but had
> > already suboptimal performance (compared to say intel_pstate driver
> > which can be used changing BIOS options).
>
> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
I think this is because of (in intel_pstate_init()):
/*
* The Intel pstate driver will be ignored if the platform
* firmware has its own power management modes.
*/
if (intel_pstate_platform_pwr_mgmt_exists())
return -ENODEV;
> It should be initialized before pcc-cpufreq (according to their
> respective initcall levels), so in theory intel_pstate should be used
> by default on the affected systems anyway.
> What BIOS settings need to be changed for that?
Already answered in other mail.
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:11 ` Andreas Herrmann
@ 2018-07-17 9:23 ` Rafael J. Wysocki
2018-07-17 9:27 ` Andreas Herrmann
0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 9:23 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
>> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>>
>> [cut]
>>
>> >
>> > On balance before this commit users could use pcc-cpufreq but had
>> > already suboptimal performance (compared to say intel_pstate driver
>> > which can be used changing BIOS options).
>>
>> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
>
> I think this is because of (in intel_pstate_init()):
>
> /*
> * The Intel pstate driver will be ignored if the platform
> * firmware has its own power management modes.
> */
> if (intel_pstate_platform_pwr_mgmt_exists())
> return -ENODEV;
>
OK, because of the "Proliant" entry, right?
So it looks like we have an issue there. We find the entry and we
look for _PSS. It is not there, so we assume that the firmware is
expected to control performance, which is not the case. It looks like
we also should look for the presence of the PCC interface in there.
I can provide a patch for that, will you be able to test it?
>> It should be initialized before pcc-cpufreq (according to their
>> respective initcall levels), so in theory intel_pstate should be used
>> by default on the affected systems anyway.
>
>> What BIOS settings need to be changed for that?
>
> Already answered in other mail.
Indeed.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:23 ` Rafael J. Wysocki
@ 2018-07-17 9:27 ` Andreas Herrmann
2018-07-17 9:36 ` Andreas Herrmann
0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 9:27 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> >>
> >> [cut]
> >>
> >> >
> >> > On balance before this commit users could use pcc-cpufreq but had
> >> > already suboptimal performance (compared to say intel_pstate driver
> >> > which can be used changing BIOS options).
> >>
> >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> >
> > I think this is because of (in intel_pstate_init()):
> >
> > /*
> > * The Intel pstate driver will be ignored if the platform
> > * firmware has its own power management modes.
> > */
> > if (intel_pstate_platform_pwr_mgmt_exists())
> > return -ENODEV;
> >
>
> OK, because of the "Proliant" entry, right?
>
> So it looks like we have an issue there. We find the entry and we
> look for _PSS. It is not there, so we assume that the firmware is
> expected to control performance, which is not the case. It looks like
> we also should look for the presence of the PCC interface in there.
>
> I can provide a patch for that, will you be able to test it?
Yes, I can test it.
> >> It should be initialized before pcc-cpufreq (according to their
> >> respective initcall levels), so in theory intel_pstate should be used
> >> by default on the affected systems anyway.
> >
> >> What BIOS settings need to be changed for that?
> >
> > Already answered in other mail.
>
> Indeed.
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:27 ` Andreas Herrmann
@ 2018-07-17 9:36 ` Andreas Herrmann
2018-07-17 10:09 ` Rafael J. Wysocki
2018-07-17 10:18 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 9:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > >>
> > >> [cut]
> > >>
> > >> >
> > >> > On balance before this commit users could use pcc-cpufreq but had
> > >> > already suboptimal performance (compared to say intel_pstate driver
> > >> > which can be used changing BIOS options).
> > >>
> > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > >
> > > I think this is because of (in intel_pstate_init()):
> > >
> > > /*
> > > * The Intel pstate driver will be ignored if the platform
> > > * firmware has its own power management modes.
> > > */
> > > if (intel_pstate_platform_pwr_mgmt_exists())
> > > return -ENODEV;
> > >
> >
> > OK, because of the "Proliant" entry, right?
> >
> > So it looks like we have an issue there. We find the entry and we
> > look for _PSS. It is not there, so we assume that the firmware is
> > expected to control performance, which is not the case.
FYI, there is another BIOS setting on those systems. It's called
"Collaborative Power Control" (AFAIK enabled by default).
Only if this is disabled, firmware is (alone) in control of
performance. (And of course in this case neither pcc-cpufreq nor
intel_pstate will be loaded).
> > It looks like we also should look for the presence of the PCC
> > interface in there.
> >
> > I can provide a patch for that, will you be able to test it?
>
> Yes, I can test it.
>
> > >> It should be initialized before pcc-cpufreq (according to their
> > >> respective initcall levels), so in theory intel_pstate should be used
> > >> by default on the affected systems anyway.
> > >
> > >> What BIOS settings need to be changed for that?
> > >
> > > Already answered in other mail.
> >
> > Indeed.
>
>
> Andreas
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:36 ` Andreas Herrmann
@ 2018-07-17 10:09 ` Rafael J. Wysocki
2018-07-17 10:21 ` Andreas Herrmann
2018-07-17 10:18 ` Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq Andreas Herrmann
1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 10:09 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > >> > which can be used changing BIOS options).
> > > >>
> > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > >
> > > > I think this is because of (in intel_pstate_init()):
> > > >
> > > > /*
> > > > * The Intel pstate driver will be ignored if the platform
> > > > * firmware has its own power management modes.
> > > > */
> > > > if (intel_pstate_platform_pwr_mgmt_exists())
> > > > return -ENODEV;
> > > >
> > >
> > > OK, because of the "Proliant" entry, right?
> > >
> > > So it looks like we have an issue there. We find the entry and we
> > > look for _PSS. It is not there, so we assume that the firmware is
> > > expected to control performance, which is not the case.
>
> FYI, there is another BIOS setting on those systems. It's called
> "Collaborative Power Control" (AFAIK enabled by default).
>
> Only if this is disabled, firmware is (alone) in control of
> performance. (And of course in this case neither pcc-cpufreq nor
> intel_pstate will be loaded).
OK, the patch is below.
First, I hope that if "Collaborative Power Control" is disabled, it will
simply hide the PCCH object and so intel_pstate will still not load then.
The main question basically is what the OS is expected to do if "Dynamic
Power Savings Mode" is set. If we are *expected* to use the PCC interface
then, intel_pstate may not work in that case, but I suspect that the PCC
interface allows extra energy to be saved over what is possible without it.
---
drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
return true;
}
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+ acpi_status status;
+ acpi_handle handle;
+
+ status = acpi_get_handle(NULL, "\\_SB", &handle);
+ if (ACPI_FAILURE(status))
+ return true;
+
+ return !acpi_has_method(handle, "PCCH");
+}
+
static bool __init intel_pstate_has_acpi_ppc(void)
{
int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
switch (plat_info[idx].data) {
case PSS:
- return intel_pstate_no_acpi_pss();
+ if (!intel_pstate_no_acpi_pss())
+ return false;
+
+ return intel_pstate_no_acpi_pcch();
case PPC:
return intel_pstate_has_acpi_ppc() && !force_load;
}
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 10:09 ` Rafael J. Wysocki
@ 2018-07-17 10:21 ` Andreas Herrmann
2018-07-17 10:23 ` Rafael J. Wysocki
2018-07-17 14:03 ` Andreas Herrmann
0 siblings, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 10:21 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > >>
> > > > >> [cut]
> > > > >>
> > > > >> >
> > > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > > >> > which can be used changing BIOS options).
> > > > >>
> > > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > > >
> > > > > I think this is because of (in intel_pstate_init()):
> > > > >
> > > > > /*
> > > > > * The Intel pstate driver will be ignored if the platform
> > > > > * firmware has its own power management modes.
> > > > > */
> > > > > if (intel_pstate_platform_pwr_mgmt_exists())
> > > > > return -ENODEV;
> > > > >
> > > >
> > > > OK, because of the "Proliant" entry, right?
> > > >
> > > > So it looks like we have an issue there. We find the entry and we
> > > > look for _PSS. It is not there, so we assume that the firmware is
> > > > expected to control performance, which is not the case.
> >
> > FYI, there is another BIOS setting on those systems. It's called
> > "Collaborative Power Control" (AFAIK enabled by default).
> >
> > Only if this is disabled, firmware is (alone) in control of
> > performance. (And of course in this case neither pcc-cpufreq nor
> > intel_pstate will be loaded).
>
> OK, the patch is below.
>
> First, I hope that if "Collaborative Power Control" is disabled, it will
> simply hide the PCCH object and so intel_pstate will still not load then.
PCCH is hidden in that case.
> The main question basically is what the OS is expected to do if
> "Dynamic Power Savings Mode" is set. If we are *expected* to use
> the PCC interface then, intel_pstate may not work in that case, but
> I suspect that the PCC interface allows extra energy to be saved
> over what is possible without it.
I'll test it and see what happens.
Andreas
> ---
> drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
> return true;
> }
>
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> + acpi_status status;
> + acpi_handle handle;
> +
> + status = acpi_get_handle(NULL, "\\_SB", &handle);
> + if (ACPI_FAILURE(status))
> + return true;
> +
> + return !acpi_has_method(handle, "PCCH");
> +}
> +
> static bool __init intel_pstate_has_acpi_ppc(void)
> {
> int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>
> switch (plat_info[idx].data) {
> case PSS:
> - return intel_pstate_no_acpi_pss();
> + if (!intel_pstate_no_acpi_pss())
> + return false;
> +
> + return intel_pstate_no_acpi_pcch();
> case PPC:
> return intel_pstate_has_acpi_ppc() && !force_load;
> }
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 10:21 ` Andreas Herrmann
@ 2018-07-17 10:23 ` Rafael J. Wysocki
2018-07-17 14:03 ` Andreas Herrmann
1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 10:23 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 12:21 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
>> On Tuesday, July 17, 2018 11:36:20 AM CEST Andreas Herrmann wrote:
>> > On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
>> > > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
>> > > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
>> > > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > > > >>
>> > > > >> [cut]
>> > > > >>
>> > > > >> >
>> > > > >> > On balance before this commit users could use pcc-cpufreq but had
>> > > > >> > already suboptimal performance (compared to say intel_pstate driver
>> > > > >> > which can be used changing BIOS options).
>> > > > >>
>> > > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
>> > > > >
>> > > > > I think this is because of (in intel_pstate_init()):
>> > > > >
>> > > > > /*
>> > > > > * The Intel pstate driver will be ignored if the platform
>> > > > > * firmware has its own power management modes.
>> > > > > */
>> > > > > if (intel_pstate_platform_pwr_mgmt_exists())
>> > > > > return -ENODEV;
>> > > > >
>> > > >
>> > > > OK, because of the "Proliant" entry, right?
>> > > >
>> > > > So it looks like we have an issue there. We find the entry and we
>> > > > look for _PSS. It is not there, so we assume that the firmware is
>> > > > expected to control performance, which is not the case.
>> >
>> > FYI, there is another BIOS setting on those systems. It's called
>> > "Collaborative Power Control" (AFAIK enabled by default).
>> >
>> > Only if this is disabled, firmware is (alone) in control of
>> > performance. (And of course in this case neither pcc-cpufreq nor
>> > intel_pstate will be loaded).
>>
>> OK, the patch is below.
>>
>> First, I hope that if "Collaborative Power Control" is disabled, it will
>> simply hide the PCCH object and so intel_pstate will still not load then.
>
> PCCH is hidden in that case.
OK
>> The main question basically is what the OS is expected to do if
>> "Dynamic Power Savings Mode" is set. If we are *expected* to use
>> the PCC interface then, intel_pstate may not work in that case, but
>> I suspect that the PCC interface allows extra energy to be saved
>> over what is possible without it.
>
> I'll test it and see what happens.
Thanks!
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 10:21 ` Andreas Herrmann
2018-07-17 10:23 ` Rafael J. Wysocki
@ 2018-07-17 14:03 ` Andreas Herrmann
2018-07-17 15:29 ` Rafael J. Wysocki
2018-07-17 16:13 ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
1 sibling, 2 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 14:03 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 12:21:36PM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
---8<---
> > OK, the patch is below.
> >
> > First, I hope that if "Collaborative Power Control" is disabled, it will
> > simply hide the PCCH object and so intel_pstate will still not load then.
>
> PCCH is hidden in that case.
>
> > The main question basically is what the OS is expected to do if
> > "Dynamic Power Savings Mode" is set. If we are *expected* to use
> > the PCC interface then, intel_pstate may not work in that case, but
> > I suspect that the PCC interface allows extra energy to be saved
> > over what is possible without it.
>
> I'll test it and see what happens.
I've tested it on top of v4.18-rc5-36-g30b06abfb92b. intel_pstate now
loads instead of pcc-cpufreq and system looks stable.
When disabling "Collaborative Power Control" no cpufreq driver is loaded
(as expected).
Performance (with kernbench) is as expected (always better than any
brew of pcc-cpufreq + misc modifications to this driver + partial
rollback of commit 554c8aa8ecad).
If you like you can add either Tested-by or
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
I think this patch should be tagged for 4.17-stable.
Thanks,
Andreas
> > ---
> > drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
> > return true;
> > }
> >
> > +static bool __init intel_pstate_no_acpi_pcch(void)
> > +{
> > + acpi_status status;
> > + acpi_handle handle;
> > +
> > + status = acpi_get_handle(NULL, "\\_SB", &handle);
> > + if (ACPI_FAILURE(status))
> > + return true;
> > +
> > + return !acpi_has_method(handle, "PCCH");
> > +}
> > +
> > static bool __init intel_pstate_has_acpi_ppc(void)
> > {
> > int i;
> > @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
> >
> > switch (plat_info[idx].data) {
> > case PSS:
> > - return intel_pstate_no_acpi_pss();
> > + if (!intel_pstate_no_acpi_pss())
> > + return false;
> > +
> > + return intel_pstate_no_acpi_pcch();
> > case PPC:
> > return intel_pstate_has_acpi_ppc() && !force_load;
> > }
> >
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 14:03 ` Andreas Herrmann
@ 2018-07-17 15:29 ` Rafael J. Wysocki
2018-07-17 16:13 ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
1 sibling, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 15:29 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 4:03 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 12:21:36PM +0200, Andreas Herrmann wrote:
>> On Tue, Jul 17, 2018 at 12:09:21PM +0200, Rafael J. Wysocki wrote:
>
> ---8<---
>
>> > OK, the patch is below.
>> >
>> > First, I hope that if "Collaborative Power Control" is disabled, it will
>> > simply hide the PCCH object and so intel_pstate will still not load then.
>>
>> PCCH is hidden in that case.
>>
>> > The main question basically is what the OS is expected to do if
>> > "Dynamic Power Savings Mode" is set. If we are *expected* to use
>> > the PCC interface then, intel_pstate may not work in that case, but
>> > I suspect that the PCC interface allows extra energy to be saved
>> > over what is possible without it.
>>
>> I'll test it and see what happens.
>
> I've tested it on top of v4.18-rc5-36-g30b06abfb92b. intel_pstate now
> loads instead of pcc-cpufreq and system looks stable.
>
> When disabling "Collaborative Power Control" no cpufreq driver is loaded
> (as expected).
>
> Performance (with kernbench) is as expected (always better than any
> brew of pcc-cpufreq + misc modifications to this driver + partial
> rollback of commit 554c8aa8ecad).
>
> If you like you can add either Tested-by or
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>
> I think this patch should be tagged for 4.17-stable.
OK, thank you for testing!
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
2018-07-17 14:03 ` Andreas Herrmann
2018-07-17 15:29 ` Rafael J. Wysocki
@ 2018-07-17 16:13 ` Rafael J. Wysocki
2018-07-17 17:23 ` Srinivas Pandruvada
2018-07-17 18:06 ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 16:13 UTC (permalink / raw)
To: Andreas Herrmann, Srinivas Pandruvada
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, intel_pstate doesn't load if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case. However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.
Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs. In fact, it is better to
avoid using it at all.
For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and load if it is there.
Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Tested-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
return true;
}
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+ acpi_status status;
+ acpi_handle handle;
+
+ status = acpi_get_handle(NULL, "\\_SB", &handle);
+ if (ACPI_FAILURE(status))
+ return true;
+
+ return !acpi_has_method(handle, "PCCH");
+}
+
static bool __init intel_pstate_has_acpi_ppc(void)
{
int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
switch (plat_info[idx].data) {
case PSS:
- return intel_pstate_no_acpi_pss();
+ if (!intel_pstate_no_acpi_pss())
+ return false;
+
+ return intel_pstate_no_acpi_pcch();
case PPC:
return intel_pstate_has_acpi_ppc() && !force_load;
}
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
2018-07-17 16:13 ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
@ 2018-07-17 17:23 ` Srinivas Pandruvada
2018-07-17 17:28 ` Rafael J. Wysocki
2018-07-17 18:06 ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
1 sibling, 1 reply; 36+ messages in thread
From: Srinivas Pandruvada @ 2018-07-17 17:23 UTC (permalink / raw)
To: Rafael J. Wysocki, Andreas Herrmann
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, 2018-07-17 at 18:13 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Currently, intel_pstate doesn't load if _PSS is not present on
> HP Proliant systems, because it expects the firmware to take over
> CPU performance scaling in that case. However, if ACPI PCCH is
> present, the firmware expects the kernel to use it for CPU
> performance scaling and the pcc-cpufreq driver is loaded for that.
>
> Unfortunately, the firmware interface used by that driver is not
> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
> on systems with more than just a few CPUs. In fact, it is better to
> avoid using it at all.
>
> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
> is not present and load if it is there.
>
> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power
> mgmt option)
> Reported-by: Andreas Herrmann <aherrmann@suse.com>
> Tested-by: Andreas Herrmann <aherrmann@suse.com>
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
But do we need a change as done by the following commit in in pcc-
cpufreq.c?
"
commit 8a61e12e84597b5f8155ac91b44dea866ccfaac2
Author: Yinghai Lu <yinghai@kernel.org>
Date: Fri Sep 20 10:43:56 2013 -0700
acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
"
Thanks,
Srinivas
> ---
> drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
> return true;
> }
>
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> + acpi_status status;
> + acpi_handle handle;
> +
> + status = acpi_get_handle(NULL, "\\_SB", &handle);
> + if (ACPI_FAILURE(status))
> + return true;
> +
> + return !acpi_has_method(handle, "PCCH");
> +}
> +
> static bool __init intel_pstate_has_acpi_ppc(void)
> {
> int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>
> switch (plat_info[idx].data) {
> case PSS:
> - return intel_pstate_no_acpi_pss();
> + if (!intel_pstate_no_acpi_pss())
> + return false;
> +
> + return intel_pstate_no_acpi_pcch();
> case PPC:
> return intel_pstate_has_acpi_ppc() && !force_load;
> }
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present
2018-07-17 17:23 ` Srinivas Pandruvada
@ 2018-07-17 17:28 ` Rafael J. Wysocki
0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 17:28 UTC (permalink / raw)
To: Srinivas Pandruvada
Cc: Rafael J. Wysocki, Andreas Herrmann, Rafael J. Wysocki,
Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 7:23 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> On Tue, 2018-07-17 at 18:13 +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Currently, intel_pstate doesn't load if _PSS is not present on
>> HP Proliant systems, because it expects the firmware to take over
>> CPU performance scaling in that case. However, if ACPI PCCH is
>> present, the firmware expects the kernel to use it for CPU
>> performance scaling and the pcc-cpufreq driver is loaded for that.
>>
>> Unfortunately, the firmware interface used by that driver is not
>> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
>> on systems with more than just a few CPUs. In fact, it is better to
>> avoid using it at all.
>>
>> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
>> is not present and load if it is there.
>>
>> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power
>> mgmt option)
>> Reported-by: Andreas Herrmann <aherrmann@suse.com>
>> Tested-by: Andreas Herrmann <aherrmann@suse.com>
>> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>
> But do we need a change as done by the following commit in in pcc-
> cpufreq.c?
>
> "
> commit 8a61e12e84597b5f8155ac91b44dea866ccfaac2
> Author: Yinghai Lu <yinghai@kernel.org>
> Date: Fri Sep 20 10:43:56 2013 -0700
>
> acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
>
> "
Quite likely yes, good point!
>> ---
>> drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
>> return true;
>> }
>>
>> +static bool __init intel_pstate_no_acpi_pcch(void)
>> +{
>> + acpi_status status;
>> + acpi_handle handle;
>> +
>> + status = acpi_get_handle(NULL, "\\_SB", &handle);
>> + if (ACPI_FAILURE(status))
>> + return true;
>> +
>> + return !acpi_has_method(handle, "PCCH");
>> +}
>> +
>> static bool __init intel_pstate_has_acpi_ppc(void)
>> {
>> int i;
>> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>>
>> switch (plat_info[idx].data) {
>> case PSS:
>> - return intel_pstate_no_acpi_pss();
>> + if (!intel_pstate_no_acpi_pss())
>> + return false;
>> +
>> + return intel_pstate_no_acpi_pcch();
>> case PPC:
>> return intel_pstate_has_acpi_ppc() && !force_load;
>> }
>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
2018-07-17 16:13 ` [PATCH] cpufreq: intel_pstate: Load when ACPI PCCH is present Rafael J. Wysocki
2018-07-17 17:23 ` Srinivas Pandruvada
@ 2018-07-17 18:06 ` Rafael J. Wysocki
2018-07-18 10:43 ` Andreas Herrmann
1 sibling, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 18:06 UTC (permalink / raw)
To: Andreas Herrmann, Srinivas Pandruvada, Linux PM
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux Kernel Mailing List
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, intel_pstate doesn't load if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case. However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.
Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs. In fact, it is better to
avoid using it at all.
For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and register if it is there. Also prevent the
pcc-cpufreq driver from trying to initialize if intel_pstate has
been registered already.
Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
Reported-by: Andreas Herrmann <aherrmann@suse.com>
Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
This is a replacement for https://patchwork.kernel.org/patch/10530091/
In addition to the intel_pstate changes in the above, it also
prevents pcc-cpufreq from trying to initialize (which will fail
ultimately, but may confuse the firmware in the meantime).
Andreas, please test this one and let me know if it still works for you.
Thanks,
Rafael
---
drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
drivers/cpufreq/pcc-cpufreq.c | 4 ++++
2 files changed, 20 insertions(+), 1 deletion(-)
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
return true;
}
+static bool __init intel_pstate_no_acpi_pcch(void)
+{
+ acpi_status status;
+ acpi_handle handle;
+
+ status = acpi_get_handle(NULL, "\\_SB", &handle);
+ if (ACPI_FAILURE(status))
+ return true;
+
+ return !acpi_has_method(handle, "PCCH");
+}
+
static bool __init intel_pstate_has_acpi_ppc(void)
{
int i;
@@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
switch (plat_info[idx].data) {
case PSS:
- return intel_pstate_no_acpi_pss();
+ if (!intel_pstate_no_acpi_pss())
+ return false;
+
+ return intel_pstate_no_acpi_pcch();
case PPC:
return intel_pstate_has_acpi_ppc() && !force_load;
}
Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
+++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
@@ -580,6 +580,10 @@ static int __init pcc_cpufreq_init(void)
{
int ret;
+ /* Skip initialization if another cpufreq driver is there. */
+ if (cpufreq_get_current_driver())
+ return 0;
+
if (acpi_disabled)
return 0;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
2018-07-17 18:06 ` [PATCH] cpufreq: intel_pstate: Register " Rafael J. Wysocki
@ 2018-07-18 10:43 ` Andreas Herrmann
2018-07-18 10:51 ` Rafael J. Wysocki
0 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-18 10:43 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Srinivas Pandruvada, Linux PM, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 08:06:54PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Currently, intel_pstate doesn't load if _PSS is not present on
> HP Proliant systems, because it expects the firmware to take over
> CPU performance scaling in that case. However, if ACPI PCCH is
> present, the firmware expects the kernel to use it for CPU
> performance scaling and the pcc-cpufreq driver is loaded for that.
>
> Unfortunately, the firmware interface used by that driver is not
> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
> on systems with more than just a few CPUs. In fact, it is better to
> avoid using it at all.
>
> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
> is not present and register if it is there. Also prevent the
> pcc-cpufreq driver from trying to initialize if intel_pstate has
> been registered already.
>
> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
> Reported-by: Andreas Herrmann <aherrmann@suse.com>
> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> This is a replacement for https://patchwork.kernel.org/patch/10530091/
>
> In addition to the intel_pstate changes in the above, it also
> prevents pcc-cpufreq from trying to initialize (which will fail
> ultimately, but may confuse the firmware in the meantime).
>
> Andreas, please test this one and let me know if it still works for you.
Done that (with system in "Dynamic Power Savings Mode"). It still
works and system was stable.
FYI, as a sniff test I've run a kernbench test. I now repeat the test
with system switched to "OS control mode". Just in case -- if there
was interference with platform code while system was in "Dynamic Power
Savings Mode" (significantly affecting performance) I might spot it
this way.
Andreas
> Thanks,
> Rafael
>
> ---
> drivers/cpufreq/intel_pstate.c | 17 ++++++++++++++++-
> drivers/cpufreq/pcc-cpufreq.c | 4 ++++
> 2 files changed, 20 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -2391,6 +2391,18 @@ static bool __init intel_pstate_no_acpi_
> return true;
> }
>
> +static bool __init intel_pstate_no_acpi_pcch(void)
> +{
> + acpi_status status;
> + acpi_handle handle;
> +
> + status = acpi_get_handle(NULL, "\\_SB", &handle);
> + if (ACPI_FAILURE(status))
> + return true;
> +
> + return !acpi_has_method(handle, "PCCH");
> +}
> +
> static bool __init intel_pstate_has_acpi_ppc(void)
> {
> int i;
> @@ -2450,7 +2462,10 @@ static bool __init intel_pstate_platform
>
> switch (plat_info[idx].data) {
> case PSS:
> - return intel_pstate_no_acpi_pss();
> + if (!intel_pstate_no_acpi_pss())
> + return false;
> +
> + return intel_pstate_no_acpi_pcch();
> case PPC:
> return intel_pstate_has_acpi_ppc() && !force_load;
> }
> Index: linux-pm/drivers/cpufreq/pcc-cpufreq.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/pcc-cpufreq.c
> +++ linux-pm/drivers/cpufreq/pcc-cpufreq.c
> @@ -580,6 +580,10 @@ static int __init pcc_cpufreq_init(void)
> {
> int ret;
>
> + /* Skip initialization if another cpufreq driver is there. */
> + if (cpufreq_get_current_driver())
> + return 0;
> +
> if (acpi_disabled)
> return 0;
>
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] cpufreq: intel_pstate: Register when ACPI PCCH is present
2018-07-18 10:43 ` Andreas Herrmann
@ 2018-07-18 10:51 ` Rafael J. Wysocki
0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-18 10:51 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Srinivas Pandruvada, Linux PM,
Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux Kernel Mailing List
On Wed, Jul 18, 2018 at 12:43 PM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 08:06:54PM +0200, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Currently, intel_pstate doesn't load if _PSS is not present on
>> HP Proliant systems, because it expects the firmware to take over
>> CPU performance scaling in that case. However, if ACPI PCCH is
>> present, the firmware expects the kernel to use it for CPU
>> performance scaling and the pcc-cpufreq driver is loaded for that.
>>
>> Unfortunately, the firmware interface used by that driver is not
>> scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
>> on systems with more than just a few CPUs. In fact, it is better to
>> avoid using it at all.
>>
>> For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
>> is not present and register if it is there. Also prevent the
>> pcc-cpufreq driver from trying to initialize if intel_pstate has
>> been registered already.
>>
>> Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option)
>> Reported-by: Andreas Herrmann <aherrmann@suse.com>
>> Reviewed-by: Andreas Herrmann <aherrmann@suse.com>
>> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>
>> This is a replacement for https://patchwork.kernel.org/patch/10530091/
>>
>> In addition to the intel_pstate changes in the above, it also
>> prevents pcc-cpufreq from trying to initialize (which will fail
>> ultimately, but may confuse the firmware in the meantime).
>>
>> Andreas, please test this one and let me know if it still works for you.
>
> Done that (with system in "Dynamic Power Savings Mode"). It still
> works and system was stable.
Thanks!
> FYI, as a sniff test I've run a kernbench test. I now repeat the test
> with system switched to "OS control mode". Just in case -- if there
> was interference with platform code while system was in "Dynamic Power
> Savings Mode" (significantly affecting performance) I might spot it
> this way.
I'd rather not expect performance to be affected by this, but power
very well may be affected.
Anyway, good idea!
Cheers,
Rafael
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 9:36 ` Andreas Herrmann
2018-07-17 10:09 ` Rafael J. Wysocki
@ 2018-07-17 10:18 ` Andreas Herrmann
1 sibling, 0 replies; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 10:18 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 11:36:20AM +0200, Andreas Herrmann wrote:
> On Tue, Jul 17, 2018 at 11:27:21AM +0200, Andreas Herrmann wrote:
> > On Tue, Jul 17, 2018 at 11:23:25AM +0200, Rafael J. Wysocki wrote:
> > > On Tue, Jul 17, 2018 at 11:11 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > > On Tue, Jul 17, 2018 at 11:06:29AM +0200, Rafael J. Wysocki wrote:
> > > >> On Tue, Jul 17, 2018 at 10:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > > >>
> > > >> [cut]
> > > >>
> > > >> >
> > > >> > On balance before this commit users could use pcc-cpufreq but had
> > > >> > already suboptimal performance (compared to say intel_pstate driver
> > > >> > which can be used changing BIOS options).
> > > >>
> > > >> BTW, I wonder why you need to change the BIOS options for intel_pstate to load.
> > > >
> > > > I think this is because of (in intel_pstate_init()):
> > > >
> > > > /*
> > > > * The Intel pstate driver will be ignored if the platform
> > > > * firmware has its own power management modes.
> > > > */
> > > > if (intel_pstate_platform_pwr_mgmt_exists())
> > > > return -ENODEV;
> > > >
> > >
> > > OK, because of the "Proliant" entry, right?
> > >
> > > So it looks like we have an issue there. We find the entry and we
> > > look for _PSS. It is not there, so we assume that the firmware is
> > > expected to control performance, which is not the case.
> FYI, there is another BIOS setting on those systems. It's called
> "Collaborative Power Control" (AFAIK enabled by default).
>
> Only if this is disabled, firmware is (alone) in control of
> performance. (And of course in this case neither pcc-cpufreq nor
> intel_pstate will be loaded).
To clarify:
(i) When "Dynamic Power Savings Mode" is enabled _PSS objects are
missing (in fact they seem to be renamed to "XPSS").
pcc-cpufreq driver evaluates PCCH header and loads.
Same when "Collaborative Power Control" is disabled. No _PSS but
XPSS objects. pcc-cpufreq driver fails to evaluate PCCH header,
no cpufreq driver is loaded.
(ii) There are _PSS objects when "OS Control Mode" is selected.
(But I think when "Collaborative Power Control" is disabled there
are no _PSS objects either and the "OS Control Mode" selection
does not matter.)
> > > It looks like we also should look for the presence of the PCC
> > > interface in there.
> > >
> > > I can provide a patch for that, will you be able to test it?
> >
> > Yes, I can test it.
> >
> > > >> It should be initialized before pcc-cpufreq (according to their
> > > >> respective initcall levels), so in theory intel_pstate should be used
> > > >> by default on the affected systems anyway.
> > > >
> > > >> What BIOS settings need to be changed for that?
> > > >
> > > > Already answered in other mail.
> > >
> > > Indeed.
> >
> >
> > Andreas
> >
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 7:33 ` Rafael J. Wysocki
2018-07-17 8:03 ` Rafael J. Wysocki
@ 2018-07-17 8:08 ` Daniel Lezcano
2018-07-17 8:36 ` Andreas Herrmann
2 siblings, 0 replies; 36+ messages in thread
From: Daniel Lezcano @ 2018-07-17 8:08 UTC (permalink / raw)
To: Rafael J. Wysocki, Andreas Herrmann
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On 17/07/2018 09:33, Rafael J. Wysocki wrote:
> Hi,
>
> Thanks for your report!
>
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> Hello,
>>
>> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> idle state before stopping the tick") causes severe performance drop
>> for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> the system might be almost unusable. The OS jitter for 4.17.y and
>> 4.18.-rcx kernels is off the charts, you can even spot it with top
>> command (issued when the system is supposedly idle), e.g.
>>
>> top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
>> Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
>> KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
>> KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
>> 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
>> 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
>> 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
>> 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
>> 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
>> 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
>> 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
>> 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
>> 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
>> 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
>> 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
>> 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
>> 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
>> 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
>> 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
>> 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
>> 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
>> 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
>> 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
>> 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
>> 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
>> 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
>> 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
>> 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
>> 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
>> 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
>> 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
>>
>> When I apply below patch (trying to revert essential parts of commit
>> 554c8aa8ecad) behaviour seems back to normal.
>
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>
> Also it would be good to understand what actually happens.
>
>> I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> to cpufreq drivers and you better not use it.
>
> That's exactly right.
>
>> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> stopping the tick") introduced bad behaviour for other cases as well.
>
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
>
> Nothing bad related to it has been has been reported so far, FWIW.
I confirm we benchmarked on ARM64 and we did not notice a big
performance drop expect a few percent due to more cluster idle states
which is logical and compensated by a big energy improvement.
>> I'll send some performance results to illustrate the issue asap. I've
>> also tried to modify pcc-cpufreq to reduce the amount of frequency
>> changes triggered by this driver but this does not help for kernels
>> where commit 554c8aa8ecad is applied.
>
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems? If so, do performance numbers look bad after that
> too?
>
> Thanks,
> Rafael
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 7:33 ` Rafael J. Wysocki
2018-07-17 8:03 ` Rafael J. Wysocki
2018-07-17 8:08 ` Daniel Lezcano
@ 2018-07-17 8:36 ` Andreas Herrmann
2018-07-17 8:52 ` Rafael J. Wysocki
2 siblings, 1 reply; 36+ messages in thread
From: Andreas Herrmann @ 2018-07-17 8:36 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Rafael J. Wysocki, Peter Zijlstra, Frederic Weisbecker,
Viresh Kumar, Linux PM, Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 09:33:53AM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> Thanks for your report!
>
> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> > Hello,
> >
> > I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> > idle state before stopping the tick") causes severe performance drop
> > for systems using pcc-cpufreq driver. Depending on the number of CPUs
> > the system might be almost unusable. The OS jitter for 4.17.y and
> > 4.18.-rcx kernels is off the charts, you can even spot it with top
> > command (issued when the system is supposedly idle), e.g.
> >
> > top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
> > Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
> > KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
> > KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
> > 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
> > 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
> > 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
> > 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
> > 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
> > 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
> > 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
> > 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
> > 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
> > 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
> > 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
> > 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
> > 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
> > 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
> > 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
> > 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
> > 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
> > 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
> > 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
> > 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
> > 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
> > 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
> > 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
> > 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
> > 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
> > 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
> > 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
> >
> > When I apply below patch (trying to revert essential parts of commit
> > 554c8aa8ecad) behaviour seems back to normal.
>
> Well, that basically defeats the purpose of the change in commit
> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>
> Also it would be good to understand what actually happens.
>
> > I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
> > to cpufreq drivers and you better not use it.
>
> That's exactly right.
>
> > But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
> > stopping the tick") introduced bad behaviour for other cases as well.
>
> It has been tested quite extensively in that respect, although
> admittedly not with the pcc-cpufreq driver.
>
> Nothing bad related to it has been has been reported so far, FWIW.
>
> > I'll send some performance results to illustrate the issue asap. I've
> > also tried to modify pcc-cpufreq to reduce the amount of frequency
> > changes triggered by this driver but this does not help for kernels
> > where commit 554c8aa8ecad is applied.
>
> Can you replace pcc-cpufreq with a different cpufreq driver on the
> affected systems? If so, do performance numbers look bad after that
> too?
I have no performance numbers yet for other cpufreq drivers on this
system (checking this commit). But I'll look it at next.
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
2018-07-17 8:36 ` Andreas Herrmann
@ 2018-07-17 8:52 ` Rafael J. Wysocki
0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2018-07-17 8:52 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra,
Frederic Weisbecker, Viresh Kumar, Linux PM,
Linux Kernel Mailing List
On Tue, Jul 17, 2018 at 10:36 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
> On Tue, Jul 17, 2018 at 09:33:53AM +0200, Rafael J. Wysocki wrote:
>> Hi,
>>
>> Thanks for your report!
>>
>> On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann <aherrmann@suse.com> wrote:
>> > Hello,
>> >
>> > I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
>> > idle state before stopping the tick") causes severe performance drop
>> > for systems using pcc-cpufreq driver. Depending on the number of CPUs
>> > the system might be almost unusable. The OS jitter for 4.17.y and
>> > 4.18.-rcx kernels is off the charts, you can even spot it with top
>> > command (issued when the system is supposedly idle), e.g.
>> >
>> > top - 14:44:24 up 2 min, 1 user, load average: 90.11, 38.20, 14.38
>> > Tasks: 1199 total, 109 running, 541 sleeping, 0 stopped, 0 zombie
>> > %Cpu(s): 1.2 us, 58.7 sy, 0.0 ni, 39.3 id, 0.6 wa, 0.0 hi, 0.3 si, 0.0 st
>> > KiB Mem: 13137064+total, 1192168 used, 13017848+free, 2340 buffers
>> > KiB Swap: 2104316 total, 0 used, 2104316 free. 522296 cached Mem
>> >
>> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> > 3373 root 20 0 982024 49916 36120 R 96.691 0.038 0:19.54 kubelet
>> > 67 root 20 0 0 0 0 R 78.676 0.000 0:49.36 kworker/9:0
>> > 25 root 20 0 0 0 0 R 78.125 0.000 0:49.67 kworker/2:0
>> > 182 root 20 0 0 0 0 R 75.735 0.000 1:18.17 kworker/28:0
>> > 43 root 20 0 0 0 0 R 75.000 0.000 0:11.56 kworker/5:0
>> > 103 root 20 0 0 0 0 R 74.449 0.000 0:46.83 kworker/15:0
>> > 334 root 20 0 0 0 0 R 72.978 0.000 1:06.88 kworker/53:0
>> > 789 root 20 0 0 0 0 R 69.853 0.000 1:29.50 kworker/38:1
>> > 418 root 20 0 0 0 0 R 69.301 0.000 0:41.33 kworker/67:0
>> > 779 root 20 0 0 0 0 R 68.934 0.000 1:33.60 kworker/27:1
>> > 773 root 20 0 0 0 0 R 68.566 0.000 1:37.91 kworker/22:1
>> > 762 root 20 0 0 0 0 R 68.015 0.000 1:41.01 kworker/11:1
>> > 769 root 20 0 0 0 0 R 67.647 0.000 1:37.65 kworker/18:1
>> > 805 root 20 0 0 0 0 R 67.096 0.000 1:30.96 kworker/54:1
>> > 840 root 20 0 0 0 0 R 66.912 0.000 1:23.82 kworker/89:1
>> > 812 root 20 0 0 0 0 R 66.728 0.000 1:31.89 kworker/59:1
>> > 847 root 20 0 0 0 0 R 66.360 0.000 1:28.40 kworker/96:1
>> > 763 root 20 0 0 0 0 R 66.176 0.000 1:42.57 kworker/12:1
>> > 772 root 20 0 0 0 0 R 66.176 0.000 1:12.58 kworker/21:1
>> > 821 root 20 0 0 0 0 R 66.176 0.000 1:29.62 kworker/69:1
>> > 923 root 20 0 0 0 0 R 65.809 0.000 1:44.32 kworker/3:18
>> > 1284 root 20 0 0 0 0 R 65.809 0.000 1:23.50 kworker/101:2
>> > 61 root 20 0 0 0 0 R 65.625 0.000 1:29.37 kworker/8:0
>> > 3531 root 20 0 24384 3768 2356 R 65.625 0.003 0:08.91 top
>> > 771 root 20 0 0 0 0 R 65.074 0.000 1:37.90 kworker/20:1
>> > 767 root 20 0 0 0 0 R 64.706 0.000 1:38.01 kworker/16:1
>> > 764 root 20 0 0 0 0 R 64.522 0.000 1:40.28 kworker/13:1
>> > 765 root 20 0 0 0 0 R 64.154 0.000 1:40.13 kworker/14:1
>> >
>> > When I apply below patch (trying to revert essential parts of commit
>> > 554c8aa8ecad) behaviour seems back to normal.
>>
>> Well, that basically defeats the purpose of the change in commit
>> 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
>>
>> Also it would be good to understand what actually happens.
>>
>> > I know that pcc-cpufreq driver is not "state-of-the-art" when it comes
>> > to cpufreq drivers and you better not use it.
>>
>> That's exactly right.
>>
>> > But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle state before
>> > stopping the tick") introduced bad behaviour for other cases as well.
>>
>> It has been tested quite extensively in that respect, although
>> admittedly not with the pcc-cpufreq driver.
>>
>> Nothing bad related to it has been has been reported so far, FWIW.
>>
>> > I'll send some performance results to illustrate the issue asap. I've
>> > also tried to modify pcc-cpufreq to reduce the amount of frequency
>> > changes triggered by this driver but this does not help for kernels
>> > where commit 554c8aa8ecad is applied.
>>
>> Can you replace pcc-cpufreq with a different cpufreq driver on the
>> affected systems? If so, do performance numbers look bad after that
>> too?
>
> I have no performance numbers yet for other cpufreq drivers on this
> system (checking this commit). But I'll look it at next.
Thanks!
Generally speaking, pcc-cpufreq is fundamentally not scalable, so the
additional concurrency brought in by the commit in question may have
exposed that weakness if that driver is run on a system with multiple
logical CPUs.
^ permalink raw reply [flat|nested] 36+ messages in thread