Date: Mon, 10 Apr 2017 19:38:02 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, dietmar.eggemann@arm.com,
	Morten.Rasmussen@arm.com, yuyang.du@intel.com, pjt@google.com,
	bsegall@google.com
Subject: Re: [PATCH v2] sched/fair: update scale invariance of PELT
Message-ID: <20170410173802.orygigjbcpefqtdv@hirez.programming.kicks-ass.net>
In-Reply-To: <1491815909-13345-1-git-send-email-vincent.guittot@linaro.org>

Thanks for the rebase.

On Mon, Apr 10, 2017 at 11:18:29AM +0200, Vincent Guittot wrote:

Ok, so let me try and paraphrase what this patch does.

So consider a task that runs 16 out of our 32ms window:

   running     idle
  |---------|---------|

You're saying that when we scale running with the frequency, suppose we
were at 50% freq, we'll end up with:

   run    idle
  |----|---------|

Which is obviously a shorter total than before; so what you do is add
back the lost idle time, like:

   run  lost    idle
  |----|----|---------|

to arrive at the same total time. Which seems to make sense.

Now I have vague memories of Morten having issues with your previous
patches, so I'll wait for him to chime in as well.

On to the implementation:

> /*
> + * Scale the time to reflect the effective amount of computation done during
> + * this delta time.
> + */
> +static __always_inline u64
> +scale_time(u64 delta, int cpu, struct sched_avg *sa,
> +	   unsigned long weight, int running)
> +{
> +	if (running) {
> +		sa->stolen_idle_time += delta;
> +		/*
> +		 * scale the elapsed time to reflect the real amount of
> +		 * computation
> +		 */
> +		delta = cap_scale(delta, arch_scale_freq_capacity(NULL, cpu));
> +		delta = cap_scale(delta, arch_scale_cpu_capacity(NULL, cpu));
> +
> +		/*
> +		 * Track the amount of stolen idle time due to running at
> +		 * lower capacity
> +		 */
> +		sa->stolen_idle_time -= delta;

OK, so far so good; this tracks, in stolen_idle_time, the 'lost' bit
from above.

> +	} else if (!weight) {
> +		if (sa->util_sum < (LOAD_AVG_MAX * 1000)) {

But here I'm completely lost. WTF just happened ;-)

Firstly, I think we want a comment on why we care about the !weight
case. Why isn't !running sufficient?

Secondly, what's up with the util_sum < LOAD_AVG_MAX * 1000 thing? Is
that to deal with cpu_capacity?

> +			/*
> +			 * Add the idle time stolen by running at lower compute
> +			 * capacity
> +			 */
> +			delta += sa->stolen_idle_time;
> +		}
> +		sa->stolen_idle_time = 0;
> +	}
> +
> +	return delta;
> +}

Thirdly, I'm thinking this isn't quite right. Imagine a task that's
running across a decay window; then we'll only add back the
stolen_idle_time in the next window, even though it should've been in
this one, right?
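
FWIW, below is a stand-alone user-space toy I used to check the
arithmetic in the picture above. cap_scale() and SCHED_CAPACITY_* are
modelled on the kernel definitions; everything else (toy_avg, the
scale_running()/scale_idle() split, the millisecond numbers) is made
up for illustration and is not the patch itself:

/*
 * Toy model of the scale/add-back idea: contract running time by
 * capacity, stash the difference, and pay it back as idle time so
 * the window length is preserved.
 */
#include <stdio.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)	/* 1024 */

/* Scale val by cap/1024, same shape as the kernel's cap_scale(). */
static inline uint64_t cap_scale(uint64_t val, unsigned long cap)
{
	return (val * cap) >> SCHED_CAPACITY_SHIFT;
}

struct toy_avg {
	uint64_t stolen_idle_time;
};

/* Running phase: contract delta by capacity, remember what got lost. */
static uint64_t scale_running(struct toy_avg *sa, uint64_t delta,
			      unsigned long cap)
{
	sa->stolen_idle_time += delta;
	delta = cap_scale(delta, cap);
	sa->stolen_idle_time -= delta;	/* the 'lost' bit from the picture */
	return delta;
}

/* Idle phase: pay the stolen time back so the window total matches. */
static uint64_t scale_idle(struct toy_avg *sa, uint64_t delta)
{
	delta += sa->stolen_idle_time;
	sa->stolen_idle_time = 0;
	return delta;
}

int main(void)
{
	struct toy_avg sa = { 0 };
	unsigned long cap = SCHED_CAPACITY_SCALE / 2;	/* 50% freq */

	/* 16ms running + 16ms idle out of a 32ms window, in us. */
	uint64_t run  = scale_running(&sa, 16000, cap);
	uint64_t idle = scale_idle(&sa, 16000);

	printf("run=%llu idle=%llu total=%llu (us)\n",
	       (unsigned long long)run,
	       (unsigned long long)idle,
	       (unsigned long long)(run + idle));
	return 0;
}

This prints run=8000 idle=24000 total=32000, i.e. the 32ms window is
reconstructed. Note the toy pays the stolen time back unconditionally
on the first idle delta; it deliberately does not model the !weight /
util_sum guards, nor the decay-window issue from point three.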