From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754136AbdJIMPJ (ORCPT <rfc822;w@1wt.eu>);
        Mon, 9 Oct 2017 08:15:09 -0400
Received: from foss.arm.com ([217.140.101.70]:55572 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754079AbdJIMPH (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 9 Oct 2017 08:15:07 -0400
Subject: Re: [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq
 and its se
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
        josef@toxicpanda.com, torvalds@linux-foundation.org,
        vincent.guittot@linaro.org, efault@gmx.de, pjt@google.com, clm@fb.com,
        morten.rasmussen@arm.com, bsegall@google.com, yuyang.du@intel.com
References: <20170901132059.342024223@infradead.org>
 <20170901132748.738108335@infradead.org>
 <fae8858f-13ce-b753-7bce-cfb6a1cba92d@arm.com>
 <20171006130257.l4jekk5bvme3pcma@hirez.programming.kicks-ass.net>
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
Message-ID: <37c7fe49-1d1f-2930-c90f-88645db398ce@arm.com>
Date: Mon, 9 Oct 2017 13:15:04 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <20171006130257.l4jekk5bvme3pcma@hirez.programming.kicks-ass.net>
Content-Type: text/plain; charset=utf-8
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/10/17 14:02, Peter Zijlstra wrote:
> On Wed, Oct 04, 2017 at 08:27:01PM +0100, Dietmar Eggemann wrote:
>> On 01/09/17 14:21, Peter Zijlstra wrote:
>>> The PELT _sum values are a saw-tooth function, dropping on the decay
>>> edge and then growing back up again during the window.
>>>
>>> When these window-edges are not aligned between cfs_rq and se, we can
>>> have the situation where, for example, on dequeue, the se decays
>>> first.
>>>
>>> Its _sum values will be small(er), while the cfs_rq _sum values will
>>> still be on their way up. Because of this, the subtraction:
>>> cfs_rq->avg._sum -= se->avg._sum will result in a positive value. This
>>> will then, once the cfs_rq reaches an edge, translate into its _avg
>>> value jumping up.
>>>
>>> This is especially visible with the runnable_load bits, since they get
>>> added/subtracted a lot.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> ---
>>>  kernel/sched/fair.c |   45 +++++++++++++++++++++++++++++++--------------
>>>  1 file changed, 31 insertions(+), 14 deletions(-)
>>
>> [...]
>>
>>> @@ -3644,7 +3634,34 @@ update_cfs_rq_load_avg(u64 now, struct c
>>>   */
>>>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>  {
>>> +	u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib;
>>> +
>>> +	/*
>>> +	 * When we attach the @se to the @cfs_rq, we must align the decay
>>> +	 * window because without that, really weird and wonderful things can
>>> +	 * happen.
>>> +	 *
>>> +	 * XXX illustrate
>>> +	 */
>>>  	se->avg.last_update_time = cfs_rq->avg.last_update_time;
>>> +	se->avg.period_contrib = cfs_rq->avg.period_contrib;
>>> +
>>> +	/*
>>> +	 * Hell(o) Nasty stuff.. we need to recompute _sum based on the new
>>> +	 * period_contrib. This isn't strictly correct, but since we're
>>> +	 * entirely outside of the PELT hierarchy, nobody cares if we truncate
>>> +	 * _sum a little.
>>> +	 */
>>> +	se->avg.util_sum = se->avg.util_avg * divider;
>>> +
>>> +	se->avg.load_sum = divider;
>>> +	if (se_weight(se)) {
>>> +		se->avg.load_sum =
>>> +			div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se));
>>> +	}
>>
>> Can scale_load_down(se->load.weight) ever become 0 here?
> 
> Yeah, don't see why not.

Tasks should be safe since for them scale_load_down(se->load.weight)
should be >= 15 when they get attached.

task groups get attached via attach_entity_cfs_rq()

# mkdir /sys/fs/cgroup/foo

[   63.885333] [<ffff00000811a558>] attach_entity_load_avg+0x138/0x558
[   63.885345] [<ffff0000081208e0>] attach_entity_cfs_rq+0x298/0x998
[   63.885357] [<ffff00000812d660>] online_fair_sched_group+0x70/0xb0
[   63.885369] [<ffff0000081182d4>] sched_online_group+0x94/0xb0
[   63.885381] [<ffff000008118318>] cpu_cgroup_css_online+0x28/0x38
[   63.885393] [<ffff0000081a5e80>] online_css+0x38/0xd0
[   63.885406] [<ffff0000081acae0>] cgroup_apply_control_enable+0x260
[   63.885418] [<ffff0000081afc54>] cgroup_mkdir+0x314/0x4e8

mkdir-2501  [004]    63.689455: bprint:
attach_entity_load_avg: cpu=1 se=0xffff800072fffc00 load.weight=1048576

If I apply the smallest shares possible to that task group, it is
already attched.

# echo 2 > /sys/fs/cgroup/foo/cpu.shares

create_cpu_cgro-2500  [003]    64.591878: bprint:
update_cfs_group: cpu=1 se=0xffff800072fffc00 tg->shares=2048 shares=0
load=0 tg_weight=0
 create_cpu_cgro-2500  [003]    64.591904: bprint:
reweight_entity: cpu=1 se=0xffff800072fffc00 se->load.weight=2

I can't see right now how a task group can get attached with
se->load.weight < 1024 (32 bit)/1048576 (64 bit)? Do I miss something?