Linux-PM Archive on lore.kernel.org
From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Francisco Jerez <currojerez@riseup.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Amit Kucheria <amit.kucheria@linaro.org>,
	"Pandruvada, Srinivas" <srinivas.pandruvada@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 00/28] PM: QoS: Get rid of unuseful code and rework CPU latency QoS interface
Date: Mon, 24 Feb 2020 11:39:23 +0100
Message-ID: <CAJZ5v0iz5e6GhpJcphKtyzS=MeteuQeSVOVkL-9YjeQ3OWO-Jw@mail.gmail.com>
In-Reply-To: <87imk8hpud.fsf@riseup.net>

Sorry for the late response; I was offline for most of the previous
week.

On Fri, Feb 14, 2020 at 9:31 PM Francisco Jerez <currojerez@riseup.net> wrote:
>
> "Rafael J. Wysocki" <rafael@kernel.org> writes:
>
> > On Fri, Feb 14, 2020 at 1:14 AM Francisco Jerez <currojerez@riseup.net> wrote:
> >>
> >> "Rafael J. Wysocki" <rafael@kernel.org> writes:
> >>
> >> > On Thu, Feb 13, 2020 at 12:34 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > [cut]
> >
> >> >
> >> > I think that your use case is almost equivalent to the thermal
> >> > pressure one, so you'd want to limit the max and so that would be
> >> > something similar to store_max_perf_pct() with its input side hooked
> >> > up to a QoS list.
> >> >
> >> > But it looks like that QoS list would rather be of a "reservation"
> >> > type, so a request added to it would mean something like "leave this
> >> > fraction of power that appears to be available to the CPU subsystem
> >> > unused, because I need it for a different purpose".  And in principle
> >> > there might be multiple requests in there at the same time and those
> >> > "reservations" would add up.  So that would be a kind of "limited sum"
> >> > QoS type which wasn't even there before my changes.
> >> >
> >> > A user of that QoS list might then do something like
> >> >
> >> > ret = cpu_power_reserve_add(1, 4);
> >> >
> >> > meaning that it wants 25% of the "potential" CPU power to be not
> >> > utilized by CPU performance scaling and that could affect the
> >> > scheduler through load modifications (kind of along the thermal
> >> > pressure patchset discussed some time ago) and HWP (as well as the
> >> > non-HWP intel_pstate by preventing turbo frequencies from being used
> >> > etc).
> >>
> >> The problems with this are the same as with the per-CPU frequency QoS
> >> approach: How does the device driver know what the appropriate fraction
> >> of CPU power is?
> >
> > Of course it doesn't know and it may never know exactly, but it may guess.
> >
> > Also, it may set up a feedback loop: request an aggressive
> > reservation, run for a while, measure something and refine if there's
> > headroom.  Then repeat.
> >
>
> Yeah, of course, but that's obviously more computationally intensive and
> less accurate than computing an approximately optimal constraint in a
> single iteration (based on knowledge from performance counters and a
> notion of the latency requirements of the application), since such a
> feedback loop relies on repeatedly overshooting and undershooting the
> optimal value (the latter causes an artificial CPU bottleneck, possibly
> slowing down other applications too) in order to converge to and remain
> in a neighborhood of the optimal value.

I'm not saying that feedback loops are the way to go in general, but
that in some cases they are applicable and this particular case looks
like it may be one of them.
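
To make the loop quoted above a bit more concrete ("request an
aggressive reservation, run for a while, measure something and refine if
there's headroom"), here is a minimal user-space sketch of one iteration.
It is not kernel code and nothing like it exists in the tree: struct
reserve_loop, reserve_loop_step(), the permille units and the fixed step
size are all assumptions made up for illustration only.

```c
#include <assert.h>

/*
 * Hypothetical controller state.  All values are in permille of the
 * "potential" CPU power (an assumed unit, chosen for illustration).
 */
struct reserve_loop {
	int reserved;	/* fraction currently reserved, min..max */
	int step;	/* adjustment applied per iteration */
	int min;	/* never reserve less than this */
	int max;	/* never reserve more than this */
};

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * One iteration of the loop: 'measured' is whatever signal the device
 * driver observes (for example, the power it actually got) and 'target'
 * is what it needs.  With headroom (measured > target) some of the
 * reservation is handed back to the CPUs; with a shortfall more is
 * reserved.  Returns the new reservation.
 */
static int reserve_loop_step(struct reserve_loop *rl, int measured, int target)
{
	assert(rl->min <= rl->max && rl->step > 0);

	if (measured > target)
		rl->reserved -= rl->step;
	else if (measured < target)
		rl->reserved += rl->step;

	rl->reserved = clamp_int(rl->reserved, rl->min, rl->max);
	return rl->reserved;
}
```

A caller would start from a deliberately large 'reserved' value and
invoke reserve_loop_step() once per measurement period.  The fixed step
is the simplest possible choice and is exactly what produces the
overshooting and undershooting mentioned above.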

> Incidentally people tested a power balancing solution with a feedback
> loop very similar to the one you're describing side by side to the RFC
> patch series I provided a link to earlier (which targeted Gen9 LP
> parts), and the energy efficiency improvements they observed were
> roughly half of the improvement obtained with my series unsurprisingly.
>
> Not to speak about generalizing such a feedback loop to bottlenecks on
> multiple I/O devices.

I'm totally unconvinced about the generalizing part.

> >> Depending on the instantaneous behavior of the
> >> workload it might take 1% or 95% of the CPU power in order to keep the
> >> IO device busy.  Each user of this would need to monitor the performance
> >> of every CPU in the system and update the constraints on each of them
> >> periodically (whether or not they're talking to that IO device, which
> >> would possibly negatively impact the latency of unrelated applications
> >> running on other CPUs, unless we're willing to race with the task
> >> scheduler).
> >
> > No, it just needs to measure a signal representing how much power *it*
> > gets and decide whether or not it can let the CPU subsystem use more
> > power.
> >
>
> Well yes it's technically possible to set frequency constraints based on
> trial-and-error without sampling utilization information from the CPU
> cores, but don't we agree that this kind of information can be highly
> valuable?

OK, so there are three things: frequency constraints (meaning HWP min
and max limits, for example), frequency requests (this is what cpufreq
does) and power limits.

If the processor has at least some autonomy in driving the frequency,
using frequency requests (i.e. cpufreq governors) for limiting power
is inefficient in general, because the processor is not required to
grant those requests at all.

Using frequency limits may be good enough, but it generally limits the
processor's ability to respond at short time scales (for example,
setting the max frequency limit will prevent the processor from using
frequencies above that limit even temporarily, although doing so might
be the most energy-efficient option in some cases).

Using power limits (which is what RAPL does) doesn't have these
shortcomings.
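
The difference between a hard per-sample limit and a limit on the
average can be illustrated with a toy model.  This is only a sketch
under stated assumptions: the power trace is in arbitrary non-negative
units, a max frequency limit is approximated by clipping every sample,
and the power limit is approximated by a token bucket.  It does not
describe how RAPL or HWP are implemented, and grant_capped() and
grant_averaged() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hard cap: every sample of the requested power trace is clipped to
 * 'cap'.  Returns the total amount granted.
 */
static long grant_capped(const int *req, size_t n, int cap, int *out)
{
	long total = 0;

	for (size_t i = 0; i < n; i++) {
		out[i] = req[i] < cap ? req[i] : cap;
		total += out[i];
	}
	return total;
}

/*
 * Average limit: a token bucket refilled with 'limit' units per interval
 * and holding at most 'limit * window' units.  A burst above 'limit' is
 * granted for as long as the saved-up budget lasts.  Requests are
 * assumed to be non-negative.
 */
static long grant_averaged(const int *req, size_t n, int limit, int window,
			   int *out)
{
	long budget = 0, total = 0;

	assert(limit >= 0 && window > 0);

	for (size_t i = 0; i < n; i++) {
		budget += limit;
		if (budget > (long)limit * window)
			budget = (long)limit * window;

		out[i] = req[i] < budget ? req[i] : (int)budget;
		budget -= out[i];
		total += out[i];
	}
	return total;
}
```

For a mostly idle trace with one short burst, grant_capped() clips the
burst while grant_averaged() lets it through without exceeding the limit
on average, which is the short-time-scale flexibility a frequency limit
takes away.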

> >> A solution based on utilization clamps (with some
> >> extensions) sounds more future-proof to me honestly.
> >
> > Except that it would be rather hard to connect it to something like
> > RAPL, which should be quite straightforward with the approach I'm
> > talking about.
> >
>
> I think using RAPL as additional control variable would be useful, but
> fully orthogonal to the cap being set by some global mechanism or being
> derived from the aggregation of a number of per-process power caps based
> on the scheduler behavior.

I'm not sure what you mean by "the cap" here.  A maximum frequency
limit or something else?

> The latter sounds like the more reasonable
> fit for a multi-tasking, possibly virtualized environment honestly.
> Either way RAPL is neither necessary nor sufficient in order to achieve
> the energy efficiency improvement I'm working on.

The "not necessary" I can agree with, but I don't see any arguments
for the "not sufficient" statement.

> > The problem with all scheduler-based ways, again, is that there is no
> > direct connection between the scheduler and HWP,
>
> I was planning to introduce such a connection in RFC part 2.  I have a
> prototype for that based on a not particularly pretty custom interface,
> I wouldn't mind trying to get it to use utilization clamps if you think
> that's the way forward.

Well, I may think so, but that's just thinking at this point.  I have
no real numbers to support that theory.

> > or even with whatever the processor does with the P-states in the
> > turbo range.  If any P-state in the turbo range is requested, the
> > processor has a license to use whatever P-state it wants, so this
> > pretty much means allowing it to use as much power as it can.
> >
> > So in the first place, if you want to limit the use of power in the
> > CPU subsystem through frequency control alone, you need to prevent it
> > from using turbo P-states at all.  However, with RAPL you can just
> > limit power which may still allow some (but not all) turbo P-states to
> > be used.
>
> My goal is not to limit the use of power of the CPU (if it has enough
> load to utilize 100% of the cycles at turbo frequency so be it), but to
> get it to use it more efficiently.  If you are constrained by a given
> power budget (e.g. the TDP or the one you want set via RAPL) you can do
> more with it if you set a stable frequency rather than if you let the
> CPU bounce back and forth between turbo and idle.

Well, this basically means driving the CPU frequency by hand with the
assumption that the processor cannot do the right thing in this
respect, while in theory the HWP algorithm should be able to produce
the desired result.

IOW, your argumentation seems to go in the "HWP is useless" direction,
more or less, and while there are people who will agree with such a
statement, others won't.

> This can only be
> achieved effectively if the frequency governor has a rough idea of the
> latency requirements of the workload, since it involves a
> latency/energy-efficiency trade-off.

Let me state this again (and this will be the last time, because I
don't really like to repeat points): the frequency governor can only
*request* the processor to do something in general and the request may
or may not be granted, for various reasons.  If it is not granted, the
whole "control" mechanism fails.
