Date: Wed, 1 Jun 2016 15:36:32 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, pjt@google.com,
	yuyang.du@intel.com, dietmar.eggemann@arm.com, bsegall@google.com,
	Morten.Rasmussen@arm.com
Subject: Re: [RFC PATCH v2] sched: reflect sched_entity movement into task_group's utilization
Message-ID: <20160601133632.GB28447@twins.programming.kicks-ass.net>
References: <1464080252-17209-1-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1464080252-17209-1-git-send-email-vincent.guittot@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 24, 2016 at 10:57:32AM +0200, Vincent Guittot wrote:
> +/*
> + * Save how much utilization has just been added/removed on cfs rq so we can
> + * propagate it across the whole tg tree
> + */
> +static void set_tg_cfs_rq_util(struct cfs_rq *cfs_rq, int delta)
> +{
> +        if (cfs_rq->tg == &root_task_group)
> +                return;
> +
> +        cfs_rq->diff_util_avg += delta;
> +}

The function name doesn't really reflect its purpose; it accumulates a
delta rather than 'setting' anything.

> +
> +/* Take into account the change of the utilization of a child task group */

This comment could be so much better... :-)

> +static void update_tg_cfs_util(struct sched_entity *se, int blocked)
> +{
> +        int delta;
> +        struct cfs_rq *cfs_rq;
> +        long update_util_avg;
> +        long last_update_time;
> +        long old_util_avg;
> +
> +
> +        /*
> +         * update_blocked_averages() will call this function for the root
> +         * cfs_rq, whose se is NULL. In this case just return.
> +         */
> +        if (!se)
> +                return;
> +
> +        if (entity_is_task(se))
> +                return;
> +
> +        /* Get the cfs_rq owned by this group sched_entity */
> +        cfs_rq = group_cfs_rq(se);
> +
> +        update_util_avg = cfs_rq->diff_util_avg;
> +
> +        if (!update_util_avg)
> +                return;
> +
> +        /* Clear pending changes */
> +        cfs_rq->diff_util_avg = 0;
> +
> +        /* Add changes in sched_entity utilization */
> +        old_util_avg = se->avg.util_avg;
> +        se->avg.util_avg = max_t(long, se->avg.util_avg + update_util_avg, 0);
> +        se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> +
> +        /* Get parent cfs_rq */
> +        cfs_rq = cfs_rq_of(se);
> +
> +        if (blocked) {
> +                /*
> +                 * blocked utilization has to be synchronized with its parent
> +                 * cfs_rq's timestamp
> +                 */
> +                last_update_time = cfs_rq_last_update_time(cfs_rq);
> +
> +                __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)),
> +                                  &se->avg,
> +                                  se->on_rq * scale_load_down(se->load.weight),
> +                                  cfs_rq->curr == se, NULL);
> +        }
> +
> +        delta = se->avg.util_avg - old_util_avg;
> +
> +        cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg + delta, 0);
> +        cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
> +
> +        set_tg_cfs_rq_util(cfs_rq, delta);
> +}

So if I read this right, it computes the utilization delta for group
se's and propagates it into the corresponding parent group cfs_rq?
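IOW, ignoring the blocked-case decay and throttling, I read it as
something like the stand-alone toy below (made-up names, user-space
sketch, not kernel code; just to check my understanding):

/* toy_prop.c: hypothetical model of the delta propagation */
#include <stdio.h>

struct toy_cfs_rq {
        long util_avg;                  /* stands in for cfs_rq->avg.util_avg */
        long diff_util_avg;             /* pending delta queued by the level below */
        struct toy_cfs_rq *parent;      /* NULL for the root cfs_rq */
};

/*
 * One propagation step for the group se owning @cfs_rq: consume the
 * pending delta, apply it to the parent cfs_rq and queue it there for
 * the next level up.  (The group se's own util_avg/util_sum update and
 * the blocked-case decay are skipped in this toy.)
 */
static void toy_propagate(struct toy_cfs_rq *cfs_rq)
{
        long delta = cfs_rq->diff_util_avg;

        if (!delta || !cfs_rq->parent)
                return;

        cfs_rq->diff_util_avg = 0;
        cfs_rq->parent->util_avg += delta;
        if (cfs_rq->parent->parent)     /* the root never propagates further */
                cfs_rq->parent->diff_util_avg += delta;
}

int main(void)
{
        struct toy_cfs_rq root = { .util_avg = 300 };
        struct toy_cfs_rq a    = { .util_avg = 200, .parent = &root };
        struct toy_cfs_rq b    = { .util_avg = 100, .parent = &a };

        /* a task with util_avg == 50 gets detached from b's cfs_rq */
        b.util_avg -= 50;
        b.diff_util_avg -= 50;

        /* bottom-up walk: children before parents, so one pass reaches the root */
        toy_propagate(&b);
        toy_propagate(&a);

        /* prints: root=250 a=150 b=50 */
        printf("root=%ld a=%ld b=%ld\n", root.util_avg, a.util_avg, b.util_avg);
        return 0;
}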
> @@ -6276,6 +6368,8 @@ static void update_blocked_averages(int cpu)
> 
>                  if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>                          update_tg_load_avg(cfs_rq, 0);
> +                /* Propagate pending util changes to the parent */
> +                update_tg_cfs_util(cfs_rq->tg->se[cpu], true);

And this is why you need the strict bottom-up ordering (children
updated before their parents) fixed..

> @@ -8370,6 +8464,20 @@ static void detach_task_cfs_rq(struct task_struct *p)
> 
>          /* Catch up with the cfs_rq and remove our load when we leave */
>          detach_entity_load_avg(cfs_rq, se);
> +
> +        /*
> +         * Propagate the detach across the tg tree to make it visible to the
> +         * root
> +         */
> +        for_each_sched_entity(se) {
> +                cfs_rq = cfs_rq_of(se);
> +
> +                if (cfs_rq_throttled(cfs_rq))
> +                        break;
> +
> +                update_load_avg(se, 1);
> +                update_tg_cfs_util(se, false);
> +        }
>  }
> 
>  static void attach_task_cfs_rq(struct task_struct *p)
> @@ -8399,8 +8507,21 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
> 
>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  {
> +        struct sched_entity *se = &p->se;
> +        struct cfs_rq *cfs_rq;
> +
>          attach_task_cfs_rq(p);
> 
> +        for_each_sched_entity(se) {
> +                cfs_rq = cfs_rq_of(se);
> +
> +                if (cfs_rq_throttled(cfs_rq))
> +                        break;
> +
> +                update_load_avg(se, 1);
> +                update_tg_cfs_util(se, false);
> +        }
> +
>          if (task_on_rq_queued(p)) {
>                  /*
>                   * We were most likely switched from sched_rt, so

So I would expect this to be in attach_task_cfs_rq() to mirror
detach_task_cfs_rq().

Also, this change is somewhat independent but required for the 'flat'
util measure, right?
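To be concrete about the attach_task_cfs_rq() suggestion above,
roughly the below (untested sketch; the existing body is elided and
the helpers are the ones from your patch):

static void attach_task_cfs_rq(struct task_struct *p)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

        /* ... existing attach_task_cfs_rq() body ... */

        /*
         * Propagate the attach up the tg tree, mirroring what
         * detach_task_cfs_rq() does on the detach side.
         */
        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                if (cfs_rq_throttled(cfs_rq))
                        break;

                update_load_avg(se, 1);
                update_tg_cfs_util(se, false);
        }
}

That would also cover the group-move path, which goes through
attach_task_cfs_rq() as well, not just the switched_to case.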