LKML Archive on
 help / color / Atom feed
From: Valentin Schneider <>
To: Parth Shah <>,
	Patrick Bellasi <>
Cc: Peter Zijlstra <>,
	Subhra Mazumdar <>,,,,,,,,,,
Subject: Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice
Date: Fri, 6 Sep 2019 23:50:58 +0100
Message-ID: <> (raw)
In-Reply-To: <>

On 06/09/2019 18:10, Parth Shah wrote:
> Right, CPU capacity can solve the problem of indicating the thermal throttle to the scheduler.
> AFAIU, the patchset from Thara changes CPU capacity to reflect Thermal headroom of the CPU.
> This is a nice mitigation but,
> 1. Sometimes a single task is responsible for the Thermal heatup of the core, reducing the
>    CPU capacity of all the CPUs in the core is not optimal when just moving such single
>    task to other core can allow us to remain within thermal headroom. This is important
>    for the servers especially where there are upto 8 threads.> 2. Given the implementation in the patches and its integration with EAS, it seems difficult
>    to adapt to servers, where CPU capacity itself is in doubt.

I'd nuance this to *SMT* capacity (which isn't just servers). The thing is
that it's difficult to come up with a sensible scheme to describe the base
capacity of a single logical CPU. But yeah, valid point.

>> For active balance, we actually already have a condition that moves a task
>> to a less capacity-pressured CPU (although it is somewhat specific). So if
>> thermal pressure follows that task (e.g. it's doing tons of vector/float),
>> it will be rotated around.
> Agree. But this should break in certain conditions like when we have multiple tasks
> in a core with almost equal utilization among which one is just doing vector operations.
> LB can pick and move any task with equal probability if the capacity is reduced here.

Right, if/when we get things like per-unit signals (wasn't there something
about tracking AVX a few months back?) then we'll be able to make
more informed decisions, for now we'll need some handholding (read: task

>> However there should be a point made on latency vs throughput. If you
>> care about latency you probably do not want to active balance your task. If
> Can you please elaborate on why not to consider active balance for latency sensitive tasks?
> Because, sometimes finding a thermally cool core is beneficial when Turbo frequency
> range is around 20% above rated ones.

This goes back to my reply to Patrick further up the thread.

Right now active balance can happen just because we've been imbalanced for
some time and repeatedly failed to migrate anything. After 3 (IIRC) successive
failed attempts, we'll active balance the running task of the remote rq we
decided was busiest.

If that happens to be a latency sensitive task, that's not great - active
balancing means stopping that task's execution, so we're going to add some
latency to this latency-sensitive task. My proposal was to further ratelimit
active balance (e.g. require more failed attempts) when the task that would be
preempted is latency-sensitive.

My point is: if that task is doing fine where it is, why preempt it? That's
just introducing latency IMO (keeping in mind that those balance attempts
could happen despite not having any thermal pressure).

If you care about performance (e.g. a minimum level of throughput), to me
that is a separate (though perhaps not entirely distinct) property.

>> you care about throughput, it should be specified in some way (util-clamp
>> says hello!).
> yes I do care for latency and throughput both. :-)

Don't we all!

> but I'm wondering how uclamp can solve the problem for throughput.
> If I make the thermally hot tasks to appear bigger than other tasks then reducing
> CPU capacity can allow such tasks to move around the chip.
> But this will require the utilization value to be relatively large compared to the other
> tasks in the core. Or other task's uclamp.max can be lowered to make such task rotate.
> If I got it right, then this will be a difficult UCLAMP usecase from user perspective, right?
> I feel like I'm missing something here.

Hmm perhaps I was jumping the gun here. What I was getting to is if you have
something like misfit that migrates tasks to CPUs of higher capacity than the
one they are on, you could use uclamp to flag them.

You could translate your throughput requirement as a uclamp.min of e.g. 80%,
and if the CPU capacity goes below that (or close within a margin) then you'd
try to migrate the task to a CPU of higher capacity (i.e. not or less 
thermally pressured).

This doesn't have to involve your less throughput-sensitive tasks, since you
would only tag and take action for your throughput-sensitive tasks.

  reply index

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 17:49 [RFC PATCH 0/9] Task latency-nice subhra mazumdar
2019-08-30 17:49 ` [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice subhra mazumdar
2019-09-04 17:32   ` Tim Chen
2019-09-05  6:15     ` Parth Shah
2019-09-05 10:11       ` Patrick Bellasi
2019-09-06 12:22         ` Parth Shah
2019-09-05  8:31   ` Peter Zijlstra
2019-09-05  9:45     ` Patrick Bellasi
2019-09-05 10:46       ` Peter Zijlstra
2019-09-05 11:13         ` Qais Yousef
2019-09-05 11:30           ` Peter Zijlstra
2019-09-05 11:40             ` Patrick Bellasi
2019-09-05 11:48               ` Peter Zijlstra
2019-09-05 13:32                 ` Qais Yousef
2019-09-05 11:47             ` Qais Yousef
2020-04-16  0:02               ` Joel Fernandes
2020-04-16 17:23                 ` Dietmar Eggemann
2020-04-18 16:01                   ` Joel Fernandes
2020-04-20 11:26                     ` Parth Shah
2020-04-20 19:14                       ` Joel Fernandes
2020-04-20 11:47                     ` Qais Yousef
2020-04-20 19:10                       ` Joel Fernandes
2019-09-05 11:30           ` Patrick Bellasi
2019-09-05 11:47             ` Peter Zijlstra
2019-09-05 11:18         ` Patrick Bellasi
2019-09-05 11:40           ` Peter Zijlstra
2019-09-05 11:46             ` Patrick Bellasi
2019-09-05 11:46           ` Valentin Schneider
2019-09-05 13:07             ` Patrick Bellasi
2019-09-05 14:48               ` Valentin Schneider
2019-09-06 12:45               ` Parth Shah
2019-09-06 14:13                 ` Valentin Schneider
2019-09-06 14:32                   ` Vincent Guittot
2019-09-06 17:10                   ` Parth Shah
2019-09-06 22:50                     ` Valentin Schneider [this message]
2019-09-06 12:31       ` Parth Shah
2019-09-05 10:05   ` Patrick Bellasi
2019-09-05 10:48     ` Peter Zijlstra
2019-08-30 17:49 ` [RFC PATCH 2/9] sched: add search limit as per latency-nice subhra mazumdar
2019-09-05  6:22   ` Parth Shah
2019-08-30 17:49 ` [RFC PATCH 3/9] sched: add sched feature to disable idle core search subhra mazumdar
2019-09-05 10:17   ` Patrick Bellasi
2019-09-05 22:02     ` Subhra Mazumdar
2019-08-30 17:49 ` [RFC PATCH 4/9] sched: SIS_CORE " subhra mazumdar
2019-09-05 10:19   ` Patrick Bellasi
2019-08-30 17:49 ` [RFC PATCH 5/9] sched: Define macro for number of CPUs in core subhra mazumdar
2019-08-30 17:49 ` [RFC PATCH 6/9] x86/smpboot: Optimize cpumask_weight_sibling macro for x86 subhra mazumdar
2019-08-30 17:49 ` [RFC PATCH 7/9] sched: search SMT before LLC domain subhra mazumdar
2019-09-05  9:31   ` Peter Zijlstra
2019-09-05 20:40     ` Subhra Mazumdar
2019-08-30 17:49 ` [RFC PATCH 8/9] sched: introduce per-cpu var next_cpu to track search limit subhra mazumdar
2019-08-30 17:49 ` [RFC PATCH 9/9] sched: rotate the cpu search window for better spread subhra mazumdar
2019-09-05  6:37   ` Parth Shah
2019-09-05  5:55 ` [RFC PATCH 0/9] Task latency-nice Parth Shah
2019-09-05 10:31 ` Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git
	git clone --mirror lkml/git/8.git
	git clone --mirror lkml/git/9.git
	git clone --mirror lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone