[v2,0/1] AMD EPYC: fix schedutil perf regression (freq-invariance)

Message ID 20210122204038.3238-1-ggherdovich@suse.cz

Message

Giovanni Gherdovich Jan. 22, 2021, 8:40 p.m. UTC
Michael Larabel from Phoronix.com graciously tested v1; see results at:

AMD EPYC 7702 -
https://openbenchmarking.org/result/2101210-PTS-LINUX51178

AMD Ryzen 9 5950X - 
https://openbenchmarking.org/result/2101212-HA-RYZEN959566

The reported regression is recovered, and some workloads even report an
improvement over the v5.10 results.

Thanks Michael for the feedback!


v1 at https://lore.kernel.org/lkml/20210121003223.20257-1-ggherdovich@suse.cz/

Changes wrt v1:

- move code around so that it builds for non-x86 architectures too

Giovanni Gherdovich (1):
  x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant
    formula

 drivers/cpufreq/acpi-cpufreq.c   | 64 +++++++++++++++++++++++++++++++-
 drivers/cpufreq/cpufreq.c        |  3 ++
 include/linux/cpufreq.h          |  5 +++
 kernel/sched/cpufreq_schedutil.c |  8 +++-
 4 files changed, 76 insertions(+), 4 deletions(-)
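For readers unfamiliar with the formula named in the shortlog: in 5.10/5.11-era kernels, schedutil picks its next frequency roughly as next_freq = 1.25 * freq_max * util / max_capacity, where util is frequency-invariant. The Python sketch below is a simplified model, not the patch itself. The helper names, the steady-state utilization model and the 2000/3400 MHz figures are illustrative assumptions. It assumes, as the shortlog suggests, that util is scaled against the max boost frequency while freq_max is the lower nominal frequency. Under that assumption, a fully busy CPU can be asked to run slower than it already does.

```python
SCHED_CAPACITY_SCALE = 1024  # kernel's full-capacity value for util and cap


def map_util_freq(util: int, freq: int, cap: int) -> int:
    # Same integer arithmetic as schedutil's helper: 1.25 * freq * util / cap,
    # where freq + (freq >> 2) is the integer form of 1.25 * freq.
    return (freq + (freq >> 2)) * util // cap


def invariant_util(freq_cur: int, freq_ref: int) -> int:
    # Hypothetical, simplified steady state: an always-running task at
    # freq_cur accrues utilization scaled by freq_cur / freq_ref, where
    # freq_ref is the reference frequency used by frequency invariance.
    return SCHED_CAPACITY_SCALE * freq_cur // freq_ref


# Illustrative frequencies in MHz (hypothetical, not taken from the patch).
NOMINAL_MHZ = 2000
MAX_BOOST_MHZ = 3400

# Fully busy CPU at nominal frequency; invariance references max boost.
util = invariant_util(NOMINAL_MHZ, MAX_BOOST_MHZ)

# freq_max = nominal (the mismatch) versus freq_max = max_boost (the fix).
next_with_nominal = map_util_freq(util, NOMINAL_MHZ, SCHED_CAPACITY_SCALE)
next_with_boost = map_util_freq(util, MAX_BOOST_MHZ, SCHED_CAPACITY_SCALE)

print(util, next_with_nominal, next_with_boost)  # prints: 602 1469 2498
```

In this toy model the nominal-based request (1469 MHz) falls below the 2000 MHz the saturated CPU is already running at, so schedutil would not ramp it up. Using max boost as freq_max requests about 2498 MHz instead.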

Comments

Michael Larabel Jan. 24, 2021, 10:30 p.m. UTC | #1
From ongoing tests of this patch, it still clearly addresses most of
the Linux 5.11 performance regression previously encountered when using
Schedutil. Additionally, for a number of workloads that did not regress
from 5.10 to 5.11 Git, the patch still shows even better performance.
The power monitoring on the AMD EPYC server shows higher power spikes,
but the average power consumption is roughly comparable to that of
Linux 5.11 Git, which is only about 3% higher than 5.10.

So this patch still seems to be working out well, and it does take care
of some wide losses otherwise seen on Linux 5.11 when using Schedutil on
AMD Zen2/Zen3. I still have some other tests running, but so far there
are no unexpected results.

Michael


AMD EPYC 7F72 2P

On an EPYC 7F72 2P server[1], across 147 test cases, I am finding the
patched Linux 5.11 kernel to be just over 1% faster than 5.10 stable,
while the unpatched 5.11 Git is just behind 5.10. For the workloads on
that server where Linux 5.11 is slower with Schedutil, the patch largely
addresses that regression and also provides other improvements.

During that testing, the amd_energy interface was monitored. Linux 5.11
with Schedutil AMD freq invariance showed on average 10 Watts (~3.7%)
higher power consumption than Linux 5.10 with Schedutil. With this
patch, that average stays basically the same. Peak power consumption
during the tests was higher, at 530 to 549 Watts, compared to 501 Watts
with Linux 5.10. Overall the performance is looking good, but since
amd_energy still does not work on consumer models, I don't have much
power data to share at the moment.

Ryzen 9 5950X

Expanding on the prior testing with the 5950X, I ran some follow-up
tests[2]. Of 221 test cases there, the current Linux 5.11 Git
performance came in around 2% slower than Linux 5.10 on a geo mean
basis, while the patched kernel pulls it to ~2.5% faster than 5.10.
There are still some cases where Linux 5.10 is faster, but overall the
patched Linux 5.11 doesn't show nearly as many regressions. In a number
of test cases, the patched Linux 5.11 performance is outright better
than Linux 5.10, even where unpatched 5.11 hadn't regressed, or hadn't
regressed by much.
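The per-system summaries in this thread are geometric means over many test cases. As a hedged illustration only, here is one way such a comparison can be computed. The three results and the helper names are hypothetical, and OpenBenchmarking.org's exact normalization is not described in this thread. Each result is first turned into a speed ratio where a value above 1.0 means faster, and the logarithms of those ratios are then averaged:

```python
import math


def relative_speed(new: float, base: float, higher_is_better: bool) -> float:
    # Ratio > 1.0 means the new kernel is faster on this test case;
    # lower-is-better metrics (e.g. runtimes) are inverted.
    return new / base if higher_is_better else base / new


def geo_mean(ratios: list[float]) -> float:
    # Geometric mean via the mean of logarithms, which avoids overflow or
    # underflow when many ratios would otherwise be multiplied together.
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-test results (not from the benchmarks in this thread):
# (new kernel, baseline kernel, higher_is_better)
results = [
    (110.0, 100.0, True),  # throughput, 10% higher
    (9.0, 10.0, False),    # runtime in seconds, shorter is better
    (95.0, 100.0, True),   # throughput, 5% lower
]
ratios = [relative_speed(n, b, hib) for n, b, hib in results]
print(f"geo mean change: {(geo_mean(ratios) - 1) * 100:+.1f}%")
```

Inverting the lower-is-better metrics keeps every ratio pointing the same way, so wins and losses from different kinds of tests can be combined into a single "X% faster" figure.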

Ryzen 5 4500U

For something at the lower end of the spectrum I also ran a number of 
tests on a Ryzen 5 4500U notebook[3]. Linux 5.11 (unpatched) doesn't see
as many regressions as on the larger systems, but the patch still helped
in a number of tests, particularly around graphics/gaming. In the
heavier multi-core tests there are still a number of cases where Linux
5.10 is faster than patched/unpatched 5.11. Across those 90 tests, the
patched kernel came out about 4% faster than 5.10.

(Result highlights below; results with little change are hidden by
default.)
[1] https://openbenchmarking.org/result/2101248-HA-AMDEPYC7F52&grs&hlc=1&hnr=1&hlc=1
[2] https://openbenchmarking.org/result/2101242-HA-RYZEN959530&grs&hlc=1
[3] https://openbenchmarking.org/result/2101232-PTS-RENOIRLI89&grs&hnr=1&hlc=1


On 1/22/21 2:40 PM, Giovanni Gherdovich wrote:
> Michael Larabel from Phoronix.com graciously tested v1, see results at:
>
> AMD EPYC 7702 -
> https://openbenchmarking.org/result/2101210-PTS-LINUX51178
>
> AMD Ryzen 9 5950X -
> https://openbenchmarking.org/result/2101212-HA-RYZEN959566
>
> The reported regression is recovered, and some workloads even report an
> improvement over the v5.10 results.
>
> Thanks Michael for the feedback!
>
>
> v1 at https://lore.kernel.org/lkml/20210121003223.20257-1-ggherdovich@suse.cz/
>
> Changes wrt v1:
>
> - move code around so that it builds for non-x86 architectures too
>
> Giovanni Gherdovich (1):
>    x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant
>      formula
>
>   drivers/cpufreq/acpi-cpufreq.c   | 64 +++++++++++++++++++++++++++++++-
>   drivers/cpufreq/cpufreq.c        |  3 ++
>   include/linux/cpufreq.h          |  5 +++
>   kernel/sched/cpufreq_schedutil.c |  8 +++-
>   4 files changed, 76 insertions(+), 4 deletions(-)
>
Peter Zijlstra Jan. 25, 2021, 8:34 a.m. UTC | #2
On Sun, Jan 24, 2021 at 04:30:57PM -0600, Michael Larabel wrote:
> From ongoing tests of this patch, it still certainly shows to address most
> of the Linux 5.11 performance regression previously encountered when using
> Schedutil. Additionally, for a number of workloads where not seeing a
> regression from 5.10 to 5.11 Git is still showing even better performance
> with this patch. The power monitoring on the AMD EPYC server is showing
> higher power spikes but the average power consumption rate is roughly
> comparable to that of Linux 5.11 Git, which is higher than 5.10 by just
> about 3%.
> 
> So this patch still seems to be working out well and indeed taking care of
> some wide losses seen otherwise on Linux 5.11 when using Schedutil on AMD
> Zen2/Zen3. Still have some other tests running but so far no unexpected
> results.
> 

Did you do all this writing and forget to add:

Tested-by: Michael Larabel <Michael@phoronix.com>

?
Giovanni Gherdovich Jan. 26, 2021, 9:01 a.m. UTC | #3
On Mon, 2021-01-25 at 09:34 +0100, Peter Zijlstra wrote:
> On Sun, Jan 24, 2021 at 04:30:57PM -0600, Michael Larabel wrote:
> > From ongoing tests of this patch, it still certainly shows to address most
> > of the Linux 5.11 performance regression previously encountered when using
> > Schedutil. Additionally, for a number of workloads where not seeing a
> > regression from 5.10 to 5.11 Git is still showing even better performance
> > with this patch. The power monitoring on the AMD EPYC server is showing
> > higher power spikes but the average power consumption rate is roughly
> > comparable to that of Linux 5.11 Git, which is higher than 5.10 by just
> > about 3%.
> > 
> > So this patch still seems to be working out well and indeed taking care of
> > some wide losses seen otherwise on Linux 5.11 when using Schedutil on AMD
> > Zen2/Zen3. Still have some other tests running but so far no unexpected
> > results.
> > 
> 
> Did you do all this writing and forget to add:
> 
> Tested-by: Michael Larabel <Michael@phoronix.com>
> 
> ?

Michael confirmed to me off-list that, yes, the patch should carry the
"Tested-by" tag with his name.


Giovanni