From: Beata Michalska <beata.michalska@arm.com>
To: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>,
	viresh.kumar@linaro.org
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, ionela.voinescu@arm.com,
	sudeep.holla@arm.com, will@kernel.org, catalin.marinas@arm.com,
	vincent.guittot@linaro.org, sumitg@nvidia.com,
	yang@os.amperecomputing.com, lihuisong@huawei.com
Subject: Re: Re: [PATCH v4 4/4] cpufreq: Use arch specific feedback for cpuinfo_cur_freq
Date: Fri, 26 Apr 2024 12:45:39 +0200
Message-ID: <ZiuF0zgqkMlmkEZz@arm.com>
In-Reply-To: <s2bel7fzwpkyfyfkhod4xaihuklsaum75ycbcgmcanqaezxdu7@uxvqdqt3yo7l>

On Wed, Apr 17, 2024 at 02:38:58PM -0700, Vanshidhar Konda wrote:
> On Tue, Apr 16, 2024 at 05:46:18PM +0200, Beata Michalska wrote:
> > On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote:
> > > On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
> > > > Some architectures provide a way to determine an average frequency over
> > > > a certain period of time based on available performance monitors (AMU on
> > > > ARM or APERF/MPERF on x86). With those at hand, enroll arch_freq_get_on_cpu
> > > > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
> > > > represent the current frequency of a given CPU, as obtained by the hardware.
> > > > This is the type of feedback that counters do provide.
> > > >
> > > 
> > > --- snip ---
> > > 
> > > While testing this patch series on AmpereOne system, I simulated CPU
> > > frequency throttling when the system is under power or thermal
> > > constraints.
> > > 
> > > In this scenario, based on the user guide, I expect scaling_cur_freq
> > > to be the frequency the kernel requests from the hardware, and
> > > cpuinfo_cur_freq to be the actual frequency that the hardware is able
> > > to run at during the power or thermal constraints.
> > There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1].
> > The guidelines you are referring to here (assuming you mean [2]) are kinda
> > out-of-sync already, as scaling_cur_freq was wired up earlier to use arch
> > specific feedback. As there was no Arm dedicated implementation of
> > arch_freq_get_on_cpu, this went kinda unnoticed.
> > The conclusion of the above mentioned discussion (though never stated
> > explicitly) was to keep the current behaviour of scaling_cur_freq and align
> > both across different archs: so with the patches, both attributes will provide
> > hw feedback on current frequency, when available.
> > Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use
> > the counters really.
> > 
> > That change was also requested through [3]
> > 
> > Adding @Viresh in case there was any shift in the tides ....
> > 
> 
> Thank you for the pointer to the discussion in [1]. It makes sense to
> bring arm64 behavior in line with x86. The question about whether
> modifying the behavior of scaling_cur_freq was a good idea did not get
> any response.
> 
> > > 
> > > The AmpereOne system I'm testing on has the following configuration:
> > > - Max frequency is 3000000
> > > - Support for AMU registers
> > > - ACPI CPPC feedback counters use PCC register space
> > > - Fedora 39 with 6.7.5 kernel
> > > - Fedora 39 with 6.9.0-rc3 + this patch series
> > > 
> > > With 6.7.5 kernel:
> > > Core        scaling_cur_freq        cpuinfo_cur_freq
> > > ----        ----------------        ----------------
> > > 0             3000000                 2593000
> > > 1             3000000                 2613000
> > > 2             3000000                 2625000
> > > 3             3000000                 2632000
> > > 
> > So if I got it right from the info you have provided the numbers above are
> > obtained without applying the patches. In that case, scaling_cur_freq will
> > use policy->cur (in your case) showing last frequency set, not necessarily
> > the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters.
> > 
> > 
> > > With 6.9.0-rc3 + this patch series:
> > > Core        scaling_cur_freq        cpuinfo_cur_freq
> > > ----        ----------------        ----------------
> > > 0             2671875                 2671875
> > > 1             2589632                 2589632
> > > 2             2648437                 2648437
> > > 3             2698242                 2698242
> > > 
> > With the patches applied, both scaling_cur_freq and cpuinfo_cur_freq will use
> > AMU counters or, to be more precise, the FIE (frequency invariance engine)
> > scale factor derived from the AMU counters: both should now show similar/same
> > frequency (as discussed in [1]).
> > I'd say due to existing implementation for scaling_cur_freq (which we cannot
> > change at this point) this is unavoidable.
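
To make the "scale factor" part concrete, here is a rough sketch of the
arithmetic. It assumes the reported value is a fixed-point scale factor
(0..1024) multiplied by a reference frequency and shifted down by
SCHED_CAPACITY_SHIFT. The helper name and the example factors are made up for
illustration; this is a model of the idea, not the kernel code. Under that
assumption, factors of 912, 904 and 921 against a 3000000 kHz reference
reproduce the values reported for cores 0, 2 and 3. Core 1's 2589632 does not
map to an integer factor, so treat the model as approximate.

```python
# Illustrative model only: names mirror kernel concepts, but this is not kernel code.
SCHED_CAPACITY_SHIFT = 10  # fixed-point shift; full scale is 1 << 10 = 1024


def freq_from_scale(scale: int, ref_freq_khz: int) -> int:
    """Convert a fixed-point frequency scale factor into kHz (assumed formula)."""
    return (scale * ref_freq_khz) >> SCHED_CAPACITY_SHIFT


if __name__ == "__main__":
    ref_khz = 3_000_000  # max frequency reported for the AmpereOne system
    # Hypothetical scale factors, chosen because they match the reported table.
    for scale in (912, 904, 921):
        print(scale, freq_from_scale(scale, ref_khz))
```

A factor of 1024 would give back the reference frequency itself, so anything
below 1024 shows up as a frequency below 3000000 in both attributes.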
> > 
> > > In the second case we can't identify that the CPU frequency is
> > > being throttled by the hardware. I noticed this behavior with
> > > or without this patch.
> > > 
> > I am not entirely sure that comparing the two is the way to go about
> > detecting throttling (whether w/ or w/o the changes).
> > It would probably be best to refer to thermal sysfs and get a hold of cur_state
> 
> Throttling could happen due to non-thermal reasons. Or a system may not
> even support thermal zones. So on those systems we wouldn't be able to
> identify/debug the behavior of the hardware providing less than maximum
> performance. The discussion around scaling_cur_freq should probably be
> re-visited in a targeted manner I think.
> 

@Viresh:

It seems that we might need to revisit the discussion we've had around
scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
As Vanshi has pointed out, having both use arch-specific feedback for
getting the current frequency is a bit problematic and confusing at best.
As arch_freq_get_on_cpu is already used by show_scaling_cur_freq, we are not
left with many options if we want to keep all archs aligned:
we can either try to rework show_scaling_cur_freq and its use of
arch_freq_get_on_cpu, moving it to cpuinfo_cur_freq, which would align with
the relevant docs, though that will not work for x86; or we keep it only in
show_scaling_cur_freq and skip updating cpuinfo_cur_freq, going against the
guidelines. Other options, purely theoretical, would involve making
arch_freq_get_on_cpu aware of the type of info requested (hw vs sw) or adding
yet another arch-specific implementation, and those are not really appealing
alternatives, to say the least.
What's your opinion on this one?
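
To illustrate why having both attributes report hw feedback hides the
throttling, here is a small sketch that takes the numbers from the two tables
quoted in this thread and computes how far cpuinfo_cur_freq trails
scaling_cur_freq. The function names are made up for the example, and the
comparison itself is only the heuristic Vanshi described, not an established
way of detecting throttling.

```python
# Values in kHz, copied from the two tables quoted in this thread.
# Each entry maps core -> (scaling_cur_freq, cpuinfo_cur_freq).
BEFORE = {0: (3000000, 2593000), 1: (3000000, 2613000),
          2: (3000000, 2625000), 3: (3000000, 2632000)}
AFTER = {0: (2671875, 2671875), 1: (2589632, 2589632),
         2: (2648437, 2648437), 3: (2698242, 2698242)}


def shortfall_pct(scaling_cur: int, cpuinfo_cur: int) -> float:
    """Percentage by which cpuinfo_cur_freq trails scaling_cur_freq."""
    if scaling_cur <= 0:
        return 0.0
    return 100.0 * (scaling_cur - cpuinfo_cur) / scaling_cur


def report(samples: dict) -> dict:
    """Map each core to its shortfall, rounded to one decimal place."""
    return {cpu: round(shortfall_pct(*pair), 1) for cpu, pair in samples.items()}


if __name__ == "__main__":
    print("6.7.5:           ", report(BEFORE))
    print("6.9.0-rc3 + v4:  ", report(AFTER))
```

With the 6.7.5 numbers the gap is roughly 12-14% per core; with the series
applied it is 0 everywhere, since both attributes read the same source.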

---
BR
Beata


> I'll test v5 of the series on AmpereOne and report back on that thread.
> 
> Thanks,
> Vanshi
> 
> > which should indicate current throttle state:
> > 
> > /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state
> > 
> > with values above '0' implying ongoing throttling.
> > 
> > The appropriate thermal_zone can be identified through 'type' attribute.
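
The cur_state check described above can be sketched roughly as follows. This
is a minimal example that assumes the usual thermal sysfs layout (a 'type'
file per thermal zone and a 'cur_state' file under each cdev link); the
function names are made up for illustration. As noted earlier in the thread,
it says nothing about throttling on systems without thermal zones or about
throttling for non-thermal reasons.

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a small sysfs-style text file."""
    with open(path) as f:
        return f.read().strip()


def throttling_cooling_devices(root="/sys/class/thermal"):
    """List (zone, zone_type, cdev, cur_state) for cooling devices with cur_state > 0."""
    hits = []
    pattern = os.path.join(root, "thermal_zone*", "cdev*", "cur_state")
    for state_path in sorted(glob.glob(pattern)):
        try:
            state = int(read_text(state_path))
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip it
        if state <= 0:
            continue  # 0 means this cooling device is not throttling
        cdev_dir = os.path.dirname(state_path)
        zone_dir = os.path.dirname(cdev_dir)
        try:
            zone_type = read_text(os.path.join(zone_dir, "type"))
        except OSError:
            zone_type = "unknown"
        hits.append((os.path.basename(zone_dir), zone_type,
                     os.path.basename(cdev_dir), state))
    return hits


if __name__ == "__main__":
    for zone, zone_type, cdev, state in throttling_cooling_devices():
        print(f"{zone} ({zone_type}): {cdev} cur_state={state}")
```

The root directory is a parameter only so that the function can be pointed at
a test tree; on a real system the default path applies.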
> > 
> > 
> > Thank you for giving those patches a spin.
> > 
> > ---
> > BR
> > Beata
> > ---
> > [1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/
> > [2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197
> > [3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/
> > ---
> > 
> > 
> > > Thanks,
> > > Vanshi
