From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754136AbdJIMPJ (ORCPT ); Mon, 9 Oct 2017 08:15:09 -0400 Received: from foss.arm.com ([217.140.101.70]:55572 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754079AbdJIMPH (ORCPT ); Mon, 9 Oct 2017 08:15:07 -0400 Subject: Re: [PATCH -v2 15/18] sched/fair: Align PELT windows between cfs_rq and its se To: Peter Zijlstra Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org, josef@toxicpanda.com, torvalds@linux-foundation.org, vincent.guittot@linaro.org, efault@gmx.de, pjt@google.com, clm@fb.com, morten.rasmussen@arm.com, bsegall@google.com, yuyang.du@intel.com References: <20170901132059.342024223@infradead.org> <20170901132748.738108335@infradead.org> <20171006130257.l4jekk5bvme3pcma@hirez.programming.kicks-ass.net> From: Dietmar Eggemann Message-ID: <37c7fe49-1d1f-2930-c90f-88645db398ce@arm.com> Date: Mon, 9 Oct 2017 13:15:04 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 In-Reply-To: <20171006130257.l4jekk5bvme3pcma@hirez.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/10/17 14:02, Peter Zijlstra wrote: > On Wed, Oct 04, 2017 at 08:27:01PM +0100, Dietmar Eggemann wrote: >> On 01/09/17 14:21, Peter Zijlstra wrote: >>> The PELT _sum values are a saw-tooth function, dropping on the decay >>> edge and then growing back up again during the window. >>> >>> When these window-edges are not aligned between cfs_rq and se, we can >>> have the situation where, for example, on dequeue, the se decays >>> first. >>> >>> Its _sum values will be small(er), while the cfs_rq _sum values will >>> still be on their way up. Because of this, the subtraction: >>> cfs_rq->avg._sum -= se->avg._sum will result in a positive value. This >>> will then, once the cfs_rq reaches an edge, translate into its _avg >>> value jumping up. >>> >>> This is especially visible with the runnable_load bits, since they get >>> added/subtracted a lot. >>> >>> Signed-off-by: Peter Zijlstra (Intel) >>> --- >>> kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++-------------- >>> 1 file changed, 31 insertions(+), 14 deletions(-) >> >> [...] >> >>> @@ -3644,7 +3634,34 @@ update_cfs_rq_load_avg(u64 now, struct c >>> */ >>> static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) >>> { >>> + u32 divider = LOAD_AVG_MAX - 1024 + cfs_rq->avg.period_contrib; >>> + >>> + /* >>> + * When we attach the @se to the @cfs_rq, we must align the decay >>> + * window because without that, really weird and wonderful things can >>> + * happen. >>> + * >>> + * XXX illustrate >>> + */ >>> se->avg.last_update_time = cfs_rq->avg.last_update_time; >>> + se->avg.period_contrib = cfs_rq->avg.period_contrib; >>> + >>> + /* >>> + * Hell(o) Nasty stuff.. we need to recompute _sum based on the new >>> + * period_contrib. This isn't strictly correct, but since we're >>> + * entirely outside of the PELT hierarchy, nobody cares if we truncate >>> + * _sum a little. >>> + */ >>> + se->avg.util_sum = se->avg.util_avg * divider; >>> + >>> + se->avg.load_sum = divider; >>> + if (se_weight(se)) { >>> + se->avg.load_sum = >>> + div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); >>> + } >> >> Can scale_load_down(se->load.weight) ever become 0 here? > > Yeah, don't see why not. Tasks should be safe since for them scale_load_down(se->load.weight) should be >= 15 when they get attached. task groups get attached via attach_entity_cfs_rq() # mkdir /sys/fs/cgroup/foo [ 63.885333] [] attach_entity_load_avg+0x138/0x558 [ 63.885345] [] attach_entity_cfs_rq+0x298/0x998 [ 63.885357] [] online_fair_sched_group+0x70/0xb0 [ 63.885369] [] sched_online_group+0x94/0xb0 [ 63.885381] [] cpu_cgroup_css_online+0x28/0x38 [ 63.885393] [] online_css+0x38/0xd0 [ 63.885406] [] cgroup_apply_control_enable+0x260 [ 63.885418] [] cgroup_mkdir+0x314/0x4e8 mkdir-2501 [004] 63.689455: bprint: attach_entity_load_avg: cpu=1 se=0xffff800072fffc00 load.weight=1048576 If I apply the smallest shares possible to that task group, it is already attched. # echo 2 > /sys/fs/cgroup/foo/cpu.shares create_cpu_cgro-2500 [003] 64.591878: bprint: update_cfs_group: cpu=1 se=0xffff800072fffc00 tg->shares=2048 shares=0 load=0 tg_weight=0 create_cpu_cgro-2500 [003] 64.591904: bprint: reweight_entity: cpu=1 se=0xffff800072fffc00 se->load.weight=2 I can't see right now how a task group can get attached with se->load.weight < 1024 (32 bit)/1048576 (64 bit)? Do I miss something?