linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Jon Grimm <Jon.Grimm@amd.com>,
	Nathan Fontenot <Nathan.Fontenot@amd.com>,
	Yazen Ghannam <Yazen.Ghannam@amd.com>,
	Thomas Lendacky <Thomas.Lendacky@amd.com>,
	Suthikulpanit Suravee <Suravee.Suthikulpanit@amd.com>,
	Mel Gorman <mgorman@techsingularity.net>, Pu Wen <puwen@hygon.cn>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Michael Larabel <Michael@phoronix.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>
Subject: Re: [PATCH v2 1/1] x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant formula
Date: Wed, 3 Feb 2021 14:40:01 +0100	[thread overview]
Message-ID: <CAJZ5v0jdcxFBxqYjvTfO64v3Vij1ATp4_GMJKdQCvAysM4gbjw@mail.gmail.com> (raw)
In-Reply-To: <1612341564.3640.14.camel@suse.cz>

On Wed, Feb 3, 2021 at 9:39 AM Giovanni Gherdovich <ggherdovich@suse.cz> wrote:
>
> Hello,
>
> both Rafael and Viresh make a similar remark: why adding a new "max_boost"
> variable, since "max_freq" is already available and could be used instead.
>
> Replying here to both.
>
> On Tue, 2021-02-02 at 20:26 +0100, Rafael J. Wysocki wrote:
> > On Tue, Feb 2, 2021 at 7:59 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Fri, Jan 22, 2021 at 9:47 PM Giovanni Gherdovich <ggherdovich@suse.cz> wrote:
> > >
> > > [cut]
> > > > @@ -779,15 +829,25 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > > >                     freq_table[valid_states-1].frequency / 1000)
> > > >                         continue;
> > > >
> > > > +               freq = perf->states[i].core_frequency * 1000;
> > > >                 freq_table[valid_states].driver_data = i;
> > > > -               freq_table[valid_states].frequency =
> > > > -                   perf->states[i].core_frequency * 1000;
> > > > +               freq_table[valid_states].frequency = freq;
> > > > +
> > > > +               if (freq > max_freq)
> > > > +                       max_freq = freq;
> > > > +
> > > >                 valid_states++;
> > > >         }
> > > >         freq_table[valid_states].frequency = CPUFREQ_TABLE_END;
> > > >         policy->freq_table = freq_table;
> > > >         perf->state = 0;
> > > >
> > > > +       if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> > > > +           amd_max_boost(max_freq, &max_boost)) {
> > > > +               policy->cpuinfo.max_boost = max_boost;
> > >
> > > Why not to set max_freq to max_boost instead?
> >
> > I mean, would setting the frequency in the last table entry to max_boost work?
> >
> > Alternatively, one more (artificial) entry with the frequency equal to
> > max_boost could be added.
>
> On Wed, 2021-02-03 at 11:34 +0530, Viresh Kumar wrote:
> > [cut]
> >
> > On 22-01-21, 21:40, Giovanni Gherdovich wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 6931f0cdeb80..541f3db3f576 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -159,8 +159,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> > >                               unsigned long util, unsigned long max)
> > >  {
> > >     struct cpufreq_policy *policy = sg_policy->policy;
> > > -   unsigned int freq = arch_scale_freq_invariant() ?
> > > -                           policy->cpuinfo.max_freq : policy->cur;
> > > +   unsigned int freq, max_freq;
> > > +
> > > +   max_freq = cpufreq_driver_has_max_boost() ?
> > > +                   policy->cpuinfo.max_boost : policy->cpuinfo.max_freq;
> >
> > Also, can't we update max_freq itself from the cpufreq driver? What
> > troubles will it cost ?
>
> I could add the max boost frequency to the frequency table (and
> policy->cpuinfo.max_freq would follow), yes, but that would trigger the
> following warning from acpi-cpufreq.c:
>
> static void acpi_cpufreq_cpu_ready(struct cpufreq_policy *policy)
> {
>         struct acpi_processor_performance *perf = per_cpu_ptr(acpi_perf_data,
>                                                               policy->cpu);
>
>         if (perf->states[0].core_frequency * 1000 != policy->cpuinfo.max_freq)
>                 pr_warn(FW_WARN "P-state 0 is not max freq\n");
> }

This check can be changed, though.

> so I thought that to stay out of troubles I'd supply a different variable,
> max_boost, to be used only in the schedutil formula.

Which is not necessary and the extra static branch is not necessary.

Moreover, there is no reason whatsoever to believe that EPYC is the
only affected processor model.  If I'm not mistaken, the regression
will be visible on every CPU where the scale invariance algorithm uses
the max frequency greater than the max frequency used acpi_cpufreq.

Also, AFAICS, it should be sufficient to modify acpi_cpufreq to remedy
this for all of the affected CPUs, not just EPYC.

> After schedutil figures out a desired next_freq then the usual comparison with the
> firmware-supplied frequency table could take place.
>
> Altering the frequency table seemed more invasive because once a freq value is
> in there, it's going to be actually requested by the driver to the platform.

This need not be the case if the control structure for the new entry
is copied from an existing one.

> I only want my max_boost to stretch the range of schedutil's next_freq.

Right, but that can be done in a different way which would be cleaner too IMO.

I'm going to send an alternative patch to fix this problem.

> On Tue, 2021-02-02 at 19:59 +0100, Rafael J. Wysocki wrote:
> >
> > [cut]
> > Also notice that the static branch is global and the max_boost value
> > for different CPUs may be different, at least in theory.
>
> In theory yes, but I'm guarding the code with two conditions:
>
> * vendor is X86_VENDOR_AMD
> * cppc_get_perf_caps() returns success
>
> this identifies AMD EPYC cpus with model 7xx2 and later, where max_boost is
> the same on all cores. I may have added synchronization so that only one cpu
> sets the value, but I didn't in the interest of simplicity for an -rc patch
> (I'd have to consider hotplug, the maxcpus= command line param, ecc).

And what about the other potentially affected processors?

I wouldn't worry about the -rc time frame too much.  If we can do a
better fix now, let's do it.

  reply	other threads:[~2021-02-03 13:42 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-22 20:40 [PATCH v2 0/1] AMD EPYC: fix schedutil perf regression (freq-invariance) Giovanni Gherdovich
2021-01-22 20:40 ` [PATCH v2 1/1] x86,sched: On AMD EPYC set freq_max = max_boost in schedutil invariant formula Giovanni Gherdovich
2021-01-25 10:04   ` Peter Zijlstra
2021-01-26  9:28     ` Giovanni Gherdovich
2021-01-26 10:02       ` Peter Zijlstra
2021-02-02 18:45       ` Rafael J. Wysocki
2021-02-02 19:11         ` Rafael J. Wysocki
2021-02-03  9:56           ` Giovanni Gherdovich
2021-02-02 18:40     ` Rafael J. Wysocki
2021-01-25 10:06   ` Peter Zijlstra
2021-01-26  9:09     ` Giovanni Gherdovich
2021-01-26  9:31       ` Mel Gorman
2021-01-26 10:05         ` Peter Zijlstra
     [not found]           ` <1611933781.15858.48.camel@suse.cz>
2021-02-02 14:17             ` Giovanni Gherdovich
2021-02-02 18:21               ` Peter Zijlstra
2021-02-02 18:29                 ` Rafael J. Wysocki
2021-02-02 19:00                   ` Rafael J. Wysocki
2021-02-02 18:59   ` Rafael J. Wysocki
2021-02-02 19:26     ` Rafael J. Wysocki
2021-02-03  8:39       ` Giovanni Gherdovich
2021-02-03 13:40         ` Rafael J. Wysocki [this message]
2021-02-03  9:12     ` Giovanni Gherdovich
2021-02-03  6:04   ` Viresh Kumar
2021-01-24 22:30 ` [PATCH v2 0/1] AMD EPYC: fix schedutil perf regression (freq-invariance) Michael Larabel
2021-01-25  8:34   ` Peter Zijlstra
2021-01-26  9:01     ` Giovanni Gherdovich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJZ5v0jdcxFBxqYjvTfO64v3Vij1ATp4_GMJKdQCvAysM4gbjw@mail.gmail.com \
    --to=rafael@kernel.org \
    --cc=Jon.Grimm@amd.com \
    --cc=Michael@phoronix.com \
    --cc=Nathan.Fontenot@amd.com \
    --cc=Suravee.Suthikulpanit@amd.com \
    --cc=Thomas.Lendacky@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=bp@alien8.de \
    --cc=dietmar.eggemann@arm.com \
    --cc=ggherdovich@suse.cz \
    --cc=juri.lelli@redhat.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=puwen@hygon.cn \
    --cc=rjw@rjwysocki.net \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).