linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Ionela Voinescu <ionela.voinescu@arm.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Hanjun Guo <guohanjun@huawei.com>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state
Date: Fri, 12 Jun 2020 13:55:20 +0200	[thread overview]
Message-ID: <CAJZ5v0h1-AE31axGJXmDgc22-jMZ6vWTc1K7b2BmbO8JTwX7tg@mail.gmail.com> (raw)
In-Reply-To: <9d715cb0-0a94-0008-5653-e8d9ad1c5332@huawei.com>

On Thu, Jun 11, 2020 at 3:52 AM Xiongfeng Wang
<wangxiongfeng2@huawei.com> wrote:
>
> Hi Ionela,
>
> Thanks for your reply !
>
> On 2020/6/10 17:40, Ionela Voinescu wrote:
> > Hi guys,
> >
> > Sorry for showing up late to the party, I was on holiday last week.
> >
> > On Thursday 04 Jun 2020 at 13:58:22 (+0100), Sudeep Holla wrote:
> >> On Thu, Jun 04, 2020 at 12:42:06PM +0200, Rafael J. Wysocki wrote:
> >>> On Thu, Jun 4, 2020 at 6:41 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>>
> >>>> On 04-06-20, 09:32, Xiongfeng Wang wrote:
> >>>>> On 2020/6/3 21:39, Rafael J. Wysocki wrote:
> >>>>>> The frequency value obtained by kicking the CPU out of idle
> >>>>>> artificially is bogus, though.  You may as well return a random number
> >>>>>> instead.
> >>>>>
> >>>>> Yes, it may return a randowm number as well.
> >>>>>
> >>>>>>
> >>>>>> The frequency of a CPU in an idle state is in fact unknown in the case
> >>>>>> at hand, so returning 0 looks like the cleanest option to me.
> >>>>>
> >>>>> I am not sure about how the user will use 'cpuinfo_cur_freq' in sysfs. If I
> >>>>> return 0 when the CPU is idle, when I run a light load on the CPU, I will get a
> >>>>> zero value for 'cpuinfo_cur_freq' when the CPU is idle. When the CPU is not
> >>>>> idle, I will get a non-zero value. The user may feel odd about
> >>>>> 'cpuinfo_cur_frreq' switching between a zero value and a non-zero value. They
> >>>>> may hope it can return the frequency when the CPU execute instructions, namely
> >>>>> in C0 state. I am not so sure about the user will look at 'cpuinfo_cur_freq'.
> >>>>
> >>>> This is what I was worried about as well. The interface to sysfs needs
> >>>> to be robust. Returning frequency on some readings and 0 on others
> >>>> doesn't look right to me as well. This will break scripts (I am not
> >>>> sure if some scripts are there to look for these values) with the
> >>>> randomness of values returned by it.
> >>>
> >>> The only thing the scripts need to do is to skip zeros (or anything
> >>> less than the minimum hw frequency for that matter) coming from that
> >>> attribute.
> >>>
> >>>> On reading values locally from the CPU, I thought about the case where
> >>>> userspace can prevent a CPU going into idle just by reading its
> >>>> frequency from sysfs (and so waste power), but the same can be done by
> >>>> userspace to run arbitrary load on the CPUs.
> >>>>
> >>>> Can we do some sort of caching of the last frequency the CPU was
> >>>> running at before going into idle ? Then we can just check if cpu is
> >>>> idle and so return cached value.
> >>>
> >>> That is an option, but it looks like in this case the cpuinfo_cur_freq
> >>> attribute should not be present at all, as per the documentation.
> >>>
> >>
> >> +1 for dropping the attribute.
> >>
> >
> > I've been experimenting with some code quite recently that uses the
> > scheduler frequency scale factor to compute this hardware current rate
> > for CPPC.
> >
> > On the scheduler tick, the scale factor is computed in
> > arch_scale_freq_tick() to give an indication on delivered performance,
> > using AMUs on arm64 [1] and APERF/MPERF on x86 [2]. Basically, this scale
> > factor has the cached value of the average delivered performance between
> > the last two scheduler ticks, on a capacity scale: 0-1024. All that would
> > be needed is to convert from the scheduler frequency scale to the CPPC
> > expected performance scale.
> >
> > The gist of the code would be:
> >
> > delivered_perf = topology_get_freq_scale(cpu);
> > delivered_perf *= fb_ctrs.reference_perf;
> > delivered_perf = div64_u64(delivered_perf << SCHED_CAPACITY_SHIFT,
> >                          per_cpu(arch_max_freq_scale, cpu));
> >
> > While this solution is not perfect, it would provide the best view of
> > the hardware "current" rate without the cost of waking up the CPU when
> > idle, scheduling additional work on the CPU, doing checks on whether
> > the CPU is idle and/or providing other caching mechanisms.
>
> I think it's a good idea. It's just that the value is a average frequency in the
> last two scheduler ticks, not an instantaneous frequency. What
> 'cppc_cpufreq_get_rate()' get is also not an  instantaneous frequency. It's a
> average frequency in 2us. I check the interval between two frequency updates on
> my machine. It's about 10ms. So the frequency will change at least one time in
> two scheduler ticks if HZ is 1000.
>
> I am not sure how people will use 'cpuinfo_cur_freq'. They may not expect a very
> accurate frequency. How about we return this value when CPU is idle? Or just
> return 0 in idle is better ?

According to the documentation, if this attribute is present, it
should return the exact frequency of the CPU (with a caveat that it
may change already when user space is able to consume the value), and
which is what the users may expect.

As I said before, IMO the CPPC driver should not cause the core to
expose cpuinfo_cur_freq at all.

Thanks!

      reply	other threads:[~2020-06-12 11:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  3:34 [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state Xiongfeng Wang
2020-06-03  2:05 ` Hanjun Guo
2020-06-03  7:52 ` Viresh Kumar
2020-06-03 10:07   ` Sudeep Holla
2020-06-03 10:10     ` Viresh Kumar
2020-06-03 10:17       ` Sudeep Holla
2020-06-03 10:21         ` Viresh Kumar
2020-06-03 10:40           ` Sudeep Holla
2020-06-03 13:39   ` Rafael J. Wysocki
2020-06-04  1:32     ` Xiongfeng Wang
2020-06-04  4:41       ` Viresh Kumar
2020-06-04 10:42         ` Rafael J. Wysocki
2020-06-04 12:58           ` Sudeep Holla
2020-06-10  9:40             ` Ionela Voinescu
2020-06-11  1:52               ` Xiongfeng Wang
2020-06-12 11:55                 ` Rafael J. Wysocki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJZ5v0h1-AE31axGJXmDgc22-jMZ6vWTc1K7b2BmbO8JTwX7tg@mail.gmail.com \
    --to=rafael@kernel.org \
    --cc=guohanjun@huawei.com \
    --cc=ionela.voinescu@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=sudeep.holla@arm.com \
    --cc=viresh.kumar@linaro.org \
    --cc=wangxiongfeng2@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).