From: Valentin Schneider <valentin.schneider@arm.com>
To: Steven Sistare <steven.sistare@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, subhra.mazumdar@oracle.com,
	dhaval.giani@oracle.com, daniel.m.jordan@oracle.com,
	pavel.tatashin@microsoft.com, matt@codeblueprint.co.uk,
	umgwanakikbuti@gmail.com, riel@redhat.com, jbacik@fb.com,
	juri.lelli@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/10] steal tasks to improve CPU utilization
Date: Wed, 24 Oct 2018 16:34:34 +0100
Message-ID: <a43db228-ddd0-c30f-6ba0-8d54f17f57c7@arm.com>
In-Reply-To: <8e38ce84-ec1a-aef7-4784-462ef754f62a@oracle.com>

Hi,

On 22/10/2018 20:07, Steven Sistare wrote:
> On 10/22/2018 1:04 PM, Peter Zijlstra wrote:
[...]
> 
> We could delete idle_balance() and use stealing exclusively for handling
> new idle.  For each sd level, stealing would look for an overloaded CPU
> in the overloaded bitmap(s) that overlap that level.  I played with that
> a little but it is not ready for prime time, and I did not want to hold
> the patch series for it.  Also, I would like folks to get some production
> experience with stealing on a variety of architectures before considering
> a radical step like replacing idle_balance().
> 

I think this could work fine for standard symmetrical systems, but I have
some concerns for asymmetric systems (Arm big.LITTLE & co). One thing that
should show up in 4.20-rc1 is the misfit logic, which caters to those
asymmetric systems.

If you look at 757ffdd705ee ("sched/fair: Set rq->rd->overload when
misfit") on Linus' tree, we can set rq->rd->overload even if
(rq->nr_running == 1). This is because we do want to do an idle_balance()
when we have misfit tasks, which should lead to active balancing one of
those CPU-hungry tasks to move it to a more powerful CPU.

With a pure try_steal() approach, we won't do any active balancing - we
could steal some task from a cfs_overload_cpu but that's not what the
load balancer would have done. The load balancer would only do such a thing
if the imbalance type is group_overloaded, which means:

  sum_nr_running > group_weight &&
  group_util * sd->imbalance_pct > group_capacity * 100

(IOW the number of tasks running on the CPU is not the sole deciding
factor)

Otherwise, misfit tasks (group_misfit_task imbalance type) would have
priority.
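
To make the ordering above concrete, here is a minimal userspace sketch of
that classification. The struct, the enum and the function names are
hypothetical simplifications for illustration, not the kernel's definitions;
only the overloaded condition itself is taken from the text above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-group load-balance stats. */
struct group_stats {
	unsigned int sum_nr_running;    /* runnable tasks in the group */
	unsigned int group_weight;      /* number of CPUs in the group */
	unsigned long group_util;       /* summed utilization of the group */
	unsigned long group_capacity;   /* summed capacity of the group */
	unsigned long misfit_task_load; /* non-zero if a misfit task is present */
};

/* Hypothetical names, loosely mirroring the imbalance types discussed. */
enum imbalance_kind { KIND_OTHER, KIND_MISFIT, KIND_OVERLOADED };

/* The overloaded condition: more tasks than CPUs AND utilization over the margin. */
static bool group_is_overloaded(const struct group_stats *gs,
				unsigned int imbalance_pct)
{
	if (gs->sum_nr_running <= gs->group_weight)
		return false;

	return gs->group_util * imbalance_pct > gs->group_capacity * 100;
}

/* Overloaded is checked first; otherwise a misfit task takes priority. */
static enum imbalance_kind classify(const struct group_stats *gs,
				    unsigned int imbalance_pct)
{
	if (group_is_overloaded(gs, imbalance_pct))
		return KIND_OVERLOADED;

	if (gs->misfit_task_load)
		return KIND_MISFIT;

	return KIND_OTHER;
}
```

Note that a group with a single running task and a non-zero misfit load is
never classified as overloaded here, which is exactly the case a task-count
based steal would not see.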

Perhaps we could decorate the cfs_overload_cpus with some more information
(e.g. misfit task presence), but then we'd have to add some logic to decide
when to steal what.


We'd also lose the NOHZ update done in idle_balance(), though I think it's
not such a big deal - we were piggy-backing this on idle_balance() just
because it happened to be convenient, and we still have NOHZ_STATS_KICK
anyway.


Another thing - in your test cases, what is the most prevalent cause of
failure to pull a task in idle_balance()? Is it the load_balance() itself
that fails to find a task (e.g. because the imbalance is not deemed big
enough), or is it the idle migration cost logic that prevents
load_balance() from running to completion?

In the first case, try_steal() makes perfect sense to me. In the second
case, I'm not sure if we really want to pull something if we know (well,
we *think*) we're about to resume the execution of some other task.
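
For the second case, a rough sketch of the cost cutoff I have in mind is
below. It is a userspace approximation of my understanding of the checks in
idle_balance(): the function name and parameters are hypothetical, and it
assumes each domain level costs exactly its recorded maximum, whereas the
real code accumulates the measured cost of each load_balance() call.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns how many sched domain levels would be
 * balanced before the idle migration cost logic cuts the walk short.
 *
 * avg_idle_ns       - expected idle duration of this CPU
 * max_lb_cost_ns    - per-level maximum newidle balance cost, lowest level first
 * nr_levels         - number of entries in max_lb_cost_ns
 * migration_cost_ns - global migration cost threshold
 */
static size_t levels_attempted(uint64_t avg_idle_ns,
			       const uint64_t *max_lb_cost_ns,
			       size_t nr_levels,
			       uint64_t migration_cost_ns)
{
	uint64_t curr_cost = 0;
	size_t i;

	/* Expected idle time too short: skip balancing altogether. */
	if (avg_idle_ns < migration_cost_ns)
		return 0;

	for (i = 0; i < nr_levels; i++) {
		/* Balancing this level would outlast the expected idle time. */
		if (avg_idle_ns < curr_cost + max_lb_cost_ns[i])
			break;

		/* Assumption: the level costs its recorded maximum. */
		curr_cost += max_lb_cost_ns[i];
	}

	return i;
}
```

A return value smaller than nr_levels corresponds to the second case: the
walk stopped because of cost, not because load_balance() found nothing to
pull.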

> We could merge the stealing code into the idle_balance() code to get a
> union of the two, but IMO that would be less readable.
> 
> We could remove the core and socket levels from idle_balance()

I read that as only running load_balance() at DIE level in idle_balance(),
which is what makes the most sense to me (with big.LITTLE, those misfit
migrations are done at DIE level). Is that correct?

Also, with DynamIQ (next gen big.LITTLE) we could have asymmetry at MC
level, which could cause issues there.

> and let
> stealing handle those levels.  I think that makes sense after stealing
> performance is validated on more architectures, but we would still have
> two different mechanisms.
> 
> - Steve
> 

I'll try out those patches on top of the misfit series to see how the
whole thing behaves.
