linux-kernel.vger.kernel.org archive mirror
From: Valentin Schneider <valentin.schneider@arm.com>
To: Steven Sistare <steven.sistare@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, subhra.mazumdar@oracle.com,
	dhaval.giani@oracle.com, daniel.m.jordan@oracle.com,
	pavel.tatashin@microsoft.com, matt@codeblueprint.co.uk,
	umgwanakikbuti@gmail.com, riel@redhat.com, jbacik@fb.com,
	juri.lelli@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/10] steal tasks to improve CPU utilization
Date: Thu, 25 Oct 2018 12:31:12 +0100
Message-ID: <09b10abc-8357-2db3-3d30-8aa9e95e8655@arm.com>
In-Reply-To: <abf3ae2a-a7f4-2524-0da6-09599928b47a@oracle.com>


On 24/10/2018 20:27, Steven Sistare wrote:
[...]
> Hi Valentin,
> 
> Asymmetric systems could maintain a separate bitmap for misfits; set a bit
> when a misfit task goes on CPU, clear it going off.  When a fast CPU goes new idle,
> it would first search the misfits mask, then search cfs_overload_cpus.
> The misfits logic would be conditionalized with CONFIG or sched feat static 
> branches so symmetric systems do not incur extra overhead.
> 

That sounds reasonable - besides, misfit already introduces a
sched_asym_cpucapacity static key. I'll try to play around with that.
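
A minimal sketch of the misfit-bitmap idea, written as userspace C rather than kernel code. It rests on several assumptions. The per-LLC masks are modelled as 64-bit words, so at most 64 CPUs are supported. asym_cpucapacity_enabled stands in for the sched_asym_cpucapacity static key. Every struct and function name here (struct llc_masks, pick_steal_victim(), and so on) is hypothetical and does not come from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-LLC state, one bit per CPU (max 64 CPUs). */
struct llc_masks {
	uint64_t misfit;       /* CPUs currently running a misfit task */
	uint64_t cfs_overload; /* CPUs with more than one runnable CFS task */
};

/* Stand-in for a static key such as sched_asym_cpucapacity. */
static bool asym_cpucapacity_enabled;

/* Set the bit when a misfit task goes on CPU... */
static void misfit_task_on_cpu(struct llc_masks *m, int cpu)
{
	if (asym_cpucapacity_enabled)
		m->misfit |= UINT64_C(1) << cpu;
}

/* ...and clear it when that task goes off CPU. */
static void misfit_task_off_cpu(struct llc_masks *m, int cpu)
{
	if (asym_cpucapacity_enabled)
		m->misfit &= ~(UINT64_C(1) << cpu);
}

/* Lowest-numbered CPU set in mask, excluding self; -1 if none. */
static int first_cpu_except(uint64_t mask, int self)
{
	mask &= ~(UINT64_C(1) << self);
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * A newly idle CPU picks a CPU to steal from: a fast CPU on an
 * asymmetric system searches the misfits mask first, then every CPU
 * falls back to cfs_overload_cpus.
 */
static int pick_steal_victim(const struct llc_masks *m, int this_cpu,
			     bool this_cpu_fast)
{
	if (asym_cpucapacity_enabled && this_cpu_fast) {
		int cpu = first_cpu_except(m->misfit, this_cpu);

		if (cpu >= 0)
			return cpu;
	}
	return first_cpu_except(m->cfs_overload, this_cpu);
}
```

With the flag off, the misfit bookkeeping and search are skipped entirely. That is the reason for gating them behind a static branch on symmetric systems.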

>> We'd also lose the NOHZ update done in idle_balance(), though I think it's
not such a big deal - we were piggy-backing this on idle_balance() just
>> because it happened to be convenient, and we still have NOHZ_STATS_KICK
>> anyway.
> 
> Agreed.
>  
>> Another thing - in your test cases, what is the most prevalent cause of
>> failure to pull a task in idle_balance()? Is it the load_balance() itself
>> that fails to find a task (e.g. because the imbalance is not deemed big
>> enough), or is it the idle migration cost logic that prevents
>> load_balance() from running to completion?
> 
> The latter.  E.g., for the test "X6-2, 40 CPUs, hackbench 3 process 50000",
> CPU avg_idle is 355566 nsec, and sched_migration_cost_ns = 500000,
> so idle_balance bails at the top:
>           if (this_rq->avg_idle < sysctl_sched_migration_cost ||
>               ...
>                   goto out;
> 
> For other tests, we get past that clause but bail from a domain:
>       if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
>            ...
>            break;
> 
>> In the first case, try_steal() makes perfect sense to me. In the second
>> case, I'm not sure if we really want to pull something if we know (well,
>> we *think*) we're about to resume the execution of some other task.
> 
> 355.566 microsec is enough time to steal, go on CPU, do useful work, and go 
> off CPU, particularly for chatty workloads like hackbench.  The performance
> data bear this out.  For the higher loads, the average timeslice for 
> hackbench 
> 

Thanks for the explanation. AIUI the big difference here is that try_steal()
is considerably cheaper than load_balance(), so the rq->avg_idle concerns
matter less (or at least, on a considerably smaller scale).
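
The two bail-out conditions Steve quotes can be modelled as plain predicates. This is a simplified sketch with hypothetical helper names. It leaves out idle_balance()'s other early exits, such as the rq->rd->overload check. With the X6-2 numbers, avg_idle = 355566 ns is below sched_migration_cost_ns = 500000 ns, so the first check fires before any sched_domain is visited.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default quoted in the thread: sched_migration_cost_ns = 500000. */
#define SYSCTL_SCHED_MIGRATION_COST_NS 500000ULL

/* Models the check at the top of idle_balance(). */
static bool idle_balance_bails_at_top(uint64_t avg_idle_ns)
{
	return avg_idle_ns < SYSCTL_SCHED_MIGRATION_COST_NS;
}

/*
 * Models the per-domain check: stop walking domains once the time
 * already spent balancing in this pass (curr_cost) plus this domain's
 * largest recent newidle balance cost would exceed the expected idle
 * time.
 */
static bool idle_balance_stops_at_domain(uint64_t avg_idle_ns,
					 uint64_t curr_cost_ns,
					 uint64_t max_newidle_lb_cost_ns)
{
	return avg_idle_ns < curr_cost_ns + max_newidle_lb_cost_ns;
}
```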

> Perhaps I could skip try_steal() if avg_idle is very small, although with
> hackbench I have seen average time slice as small as 10 microsec under 
> high load and preemptions.  I'll run some experiments.
> 

That might be a safe thing to do. In the same department, maybe we could
skip try_steal() if we bail out of idle_balance() because
!(this_rq->rd->overload). Although rq->rd->overload and cfs_overload_cpus
are decoupled, they should express the same thing here.
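
The sketch below shows one possible shape for this gating. try_steal() is skipped when the root domain is not overloaded, or when avg_idle falls below a threshold. STEAL_MIN_AVG_IDLE_NS and its 10 µs value are hypothetical placeholders pending the experiments Steve mentions. The 10 µs only echoes the smallest hackbench timeslice he reports and is not a recommended setting. struct newidle_state is likewise a hypothetical model of the relevant rq fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; not taken from the patch series. */
#define STEAL_MIN_AVG_IDLE_NS 10000ULL

/* Hypothetical model of the fields consulted on the newidle path. */
struct newidle_state {
	uint64_t avg_idle_ns; /* models this_rq->avg_idle */
	bool rd_overload;     /* models this_rq->rd->overload */
};

static bool should_try_steal(const struct newidle_state *s)
{
	/* idle_balance() bailed on !rd->overload: nothing worth stealing. */
	if (!s->rd_overload)
		return false;
	/* Expected idle period too short to pay for a steal. */
	if (s->avg_idle_ns < STEAL_MIN_AVG_IDLE_NS)
		return false;
	return true;
}
```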

>>> We could merge the stealing code into the idle_balance() code to get a
>>> union of the two, but IMO that would be less readable.
>>>
>>> We could remove the core and socket levels from idle_balance()
>>
>> I understand that as only doing load_balance() at DIE level in
>> idle_balance(), as that is what makes most sense to me (with big.LITTLE
>> those misfit migrations are done at DIE level), is that correct?
> 
> Correct. 
>> Also, with DynamIQ (next gen big.LITTLE) we could have asymmetry at MC
>> level, which could cause issues there.
> 
> We could keep idle_balance for this level and fall back to stealing as in
> my patch, or you could extend the misfits bitmap to also include CPUs 
> with reduced memory bandwidth and active tasks (if I understand the asymmetry
> correctly).
> 

It's mostly µarch asymmetry, so by "asymmetry at MC level" I meant "we'll
see the SD_ASYM_CPUCAPACITY flag at MC level". But if we tweak stealing
to take misfit tasks into account (so we'd rely on SD_ASYM_CPUCAPACITY
in some way or another), that could work.
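
The sketch below makes the proposed split concrete. It shows which sched_domain levels would still run newidle load_balance(): DIE always, plus any lower level that carries asymmetric CPU capacity (the DynamIQ case at MC). The remaining levels would be left to stealing. The level enum, struct sd_model and the flag value are hypothetical stand-ins. The kernel's SD_ASYM_CPUCAPACITY has a different value, and the real code walks the sched_domain hierarchy rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; not the kernel's SD_ASYM_CPUCAPACITY. */
#define SD_ASYM_CPUCAPACITY_FLAG 0x1u

enum sd_level { LVL_SMT, LVL_MC, LVL_DIE };

struct sd_model {
	enum sd_level level;
	unsigned int flags;
};

/* Does newidle load_balance() still run at this level? */
static bool newidle_lb_at(const struct sd_model *sd)
{
	return sd->level == LVL_DIE ||
	       (sd->flags & SD_ASYM_CPUCAPACITY_FLAG);
}

/* Number of levels still handled by idle_balance(). */
static size_t count_newidle_lb_levels(const struct sd_model *sds, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (newidle_lb_at(&sds[i]))
			count++;
	return count;
}
```

On a symmetric SMT/MC/DIE hierarchy only DIE remains. With capacity asymmetry flagged at MC, that level keeps idle_balance() as well.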

>>> and let
>>> stealing handle those levels.  I think that makes sense after stealing
>>> performance is validated on more architectures, but we would still have
>>> two different mechanisms.
>>>
>>> - Steve
>>
>> I'll try out those patches on top of the misfit series to see how the
>> whole thing behaves.
> 
> Very good, thanks.
> 
> - Steve
> 

Thread overview: 45+ messages
2018-10-22 14:59 [PATCH 00/10] steal tasks to improve CPU utilization Steve Sistare
2018-10-22 14:59 ` [PATCH 01/10] sched: Provide sparsemask, a reduced contention bitmap Steve Sistare
2018-10-22 14:59 ` [PATCH 02/10] sched/topology: Provide hooks to allocate data shared per LLC Steve Sistare
2018-10-22 14:59 ` [PATCH 03/10] sched/topology: Provide cfs_overload_cpus bitmap Steve Sistare
2018-10-22 14:59 ` [PATCH 04/10] sched/fair: Dynamically update cfs_overload_cpus Steve Sistare
2018-10-22 16:56   ` Peter Zijlstra
2018-10-22 18:43     ` Steven Sistare
2018-10-22 14:59 ` [PATCH 05/10] sched/fair: Hoist idle_stamp up from idle_balance Steve Sistare
2018-10-25 13:47   ` Valentin Schneider
2018-10-25 14:04     ` Steven Sistare
2018-10-22 14:59 ` [PATCH 06/10] sched/fair: Generalize the detach_task interface Steve Sistare
2018-10-22 14:59 ` [PATCH 07/10] sched/fair: Provide can_migrate_task_llc Steve Sistare
2018-10-26 18:04   ` Valentin Schneider
2018-10-26 18:28     ` Steven Sistare
2018-10-29 19:34       ` Valentin Schneider
2018-10-31 15:43         ` Steven Sistare
2018-10-31 18:48           ` Valentin Schneider
2018-10-31 19:14         ` Peter Zijlstra
2018-11-01 11:16           ` Valentin Schneider
2018-10-22 14:59 ` [PATCH 08/10] sched/fair: Steal work from an overloaded CPU when CPU goes idle Steve Sistare
2018-10-25 13:48   ` Valentin Schneider
2018-10-25 14:07     ` Steven Sistare
2018-10-22 14:59 ` [PATCH 09/10] sched/fair: disable stealing if too many NUMA nodes Steve Sistare
2018-10-22 17:06   ` Peter Zijlstra
2018-10-22 18:47     ` Steven Sistare
2018-10-22 19:21       ` Steven Sistare
2018-10-22 22:05         ` Peter Zijlstra
2018-10-23 13:18   ` Steven Sistare
2018-10-22 14:59 ` [PATCH 10/10] sched/fair: Provide idle search schedstats Steve Sistare
2018-10-22 17:04 ` [PATCH 00/10] steal tasks to improve CPU utilization Peter Zijlstra
2018-10-22 19:07   ` Steven Sistare
2018-10-22 22:09     ` Peter Zijlstra
2018-10-24 15:34     ` Valentin Schneider
2018-10-24 19:27       ` Steven Sistare
2018-10-25 11:31         ` Valentin Schneider [this message]
2018-10-25 12:21           ` Steven Sistare
2018-10-25  7:50 ` Vincent Guittot
2018-10-25 11:28   ` Steven Sistare
2018-10-25 12:43     ` Vincent Guittot
2018-10-25 14:19       ` Steven Sistare
2018-10-31 19:35 ` Steven Sistare
2018-11-01 11:56 ` Steven Sistare
2018-11-02 23:39 ` Subhra Mazumdar
2018-11-05 20:08   ` Steven Sistare
2019-01-04 13:37 ` Shijith Thotton
