From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Yuyang Du <yuyang.du@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [rfc patch] sched/fair: Use instantaneous load for fork/exec balancing
Date: Mon, 11 Jul 2016 09:58:52 +0100	[thread overview]
Message-ID: <57835FCC.70205@arm.com> (raw)
In-Reply-To: <20160704150452.GP8415@codeblueprint.co.uk>

On 04/07/16 16:04, Matt Fleming wrote:
> On Wed, 15 Jun, at 04:32:58PM, Dietmar Eggemann wrote:
>> On 14/06/16 17:40, Mike Galbraith wrote:
>>> On Tue, 2016-06-14 at 15:14 +0100, Dietmar Eggemann wrote:
>>>
>>>> IMHO, the hackbench performance "boost" w/o 0905f04eb21f is due to the
>>>> fact that a new task gets all its load decayed (making it a small task)
>>>> in the __update_load_avg() call in remove_entity_load_avg() because its
>>>> se->avg.last_update_time value is 0 which creates a huge time difference
>>>> comparing it to cfs_rq->avg.last_update_time. The patch 0905f04eb21f
>>>> avoids this and thus the task stays big se->avg.load_avg = 1024.
>>>
>>> I don't care much at all about the hackbench "regression" in its own
>>> right, and what causes it, for me, bottom line is that there are cases
>>> where we need to be able to resolve, and can't, simply because we're
>>> looking at a fuzzy (rippling) reflection.
>>
>> Understood. I just thought it would be nice to know why 0905f04eb21f
>> makes this problem even more visible. But so far I haven't been able
>> to figure out why this diff in se->avg.load_avg [1024 versus 0] has
>> this effect on cfs_rq->runnable_load_avg, making it even less suitable
>> in the find idlest* path. enqueue_entity_load_avg()'s
>> cfs_rq->runnable_load_* += sa->load_* looks suspicious though.
> 
> In my testing without 0905f04eb21f I saw that se->avg.load_avg
> actually managed to skip being decayed at all before the task was
> dequeued, which meant that cfs_rq->runnable_load_avg was more likely
> to be zero after dequeue, for those workloads like hackbench that
> essentially are just a fork bomb.

Do you mean the first dequeue when the task is forked?

These are the pelt related functions which are called when the task is
forked:

detach_entity_load_avg
attach_entity_load_avg
remove_entity_load_avg <-- se->avg.load_avg is set to 0 w/o 0905f04eb21f
                           se->avg.load_avg stays 1024 w/ 0905f04eb21f
enqueue_entity_load_avg
attach_entity_load_avg (double attach is fixed on tip/sched/core)
dequeue_entity_load_avg

> se->avg.load_avg evaded decay because se->avg.period_contrib was being
> zero'd in __update_load_avg().

I don't see the relation to se->avg.period_contrib here. IMHO,
se->avg.period_contrib is purely there to manage the 3 different update
phases in __update_load_avg().

This difference in the initial se->avg.load_avg value [0 or 1024] has an
influence on wake_affine() [weight = p->se.avg.load_avg;] for the wakeup
handling of the hackbench tasks in the 'send/receive data' phase.

There are a couple of patches on tip/sched/core which might change the
behaviour of this: fork path, no double attach_entity_load_avg for new
task, no remove_entity_load_avg for new task, changes in effective_load ...

> With 0905f04eb21f applied, it's less likely (though not impossible)
> that ->period_contrib will be zero'd and so we usually end up with
> some residual load in cfs_rq->runnable_load_avg on dequeue, and hence,
> 
> 	cfs_rq->runnable_load_avg > se->avg.load_avg
> 
> even if 'se' is the only task on the runqueue.
> 
> FYI, below is my quick and dirty hack that restored hackbench
> performance for the few machines I checked. I didn't try schbench with
> it.
> 
> ---
> 
> From 4e9856ea3dc56e356195ca035dab7302754ce59b Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt@codeblueprint.co.uk>
> Date: Thu, 9 Jun 2016 19:48:14 +0100
> Subject: [PATCH] sched/fair: Reset ::runnable_load_avg when dequeueing last
>  entity
> 
> The task and runqueue load averages, maintained in p->se.avg.load_avg
> and cfs_rq->runnable_load_avg respectively, can decay at different
> wall clock rates, which means that enqueueing and then dequeueing a
> task on an otherwise empty runqueue doesn't always leave
> ::runnable_load_avg with its initial value.
> 
> This can lead to the situation where cfs_rq->runnable_load_avg has a
> non-zero value even though there are no runnable entities on the
> runqueue. Assuming no entity is enqueued on this runqueue for some
> time this residual load average will decay gradually as the load
> averages are updated.
> 
> But we can optimise the special case of dequeueing the last entity and
> reset ::runnable_load_avg early, which gives a performance improvement
> to workloads that trigger the load balancer, such as fork-heavy
> applications when SD_BALANCE_FORK is set, because it gives a more
> up-to-date view of how busy the cpu is.
> 
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> ---
>  kernel/sched/fair.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c6dd8bab010c..408ee90c7ea8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3007,10 +3007,20 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  static inline void
>  dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> +	unsigned long load_avg = 0;
> +
>  	update_load_avg(se, 1);
>  
> -	cfs_rq->runnable_load_avg =
> -		max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
> +	/*
> +	 * If we're about to dequeue the last runnable entity we can
> +	 * reset the runnable load average to zero instead of waiting
> +	 * for it to decay naturally. This gives the load balancer a
> +	 * more timely and accurate view of how busy this cpu is.
> +	 */
> +	if (cfs_rq->nr_running > 1)
> +		load_avg = max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
> +
> +	cfs_rq->runnable_load_avg = load_avg;
>  	cfs_rq->runnable_load_sum =
>  		max_t(s64,  cfs_rq->runnable_load_sum - se->avg.load_sum, 0);
>  }
> 

