From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1761179AbdADRUS (ORCPT ); Wed, 4 Jan 2017 12:20:18 -0500 Received: from foss.arm.com ([217.140.101.70]:54846 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756249AbdADRUG (ORCPT ); Wed, 4 Jan 2017 12:20:06 -0500 Subject: Re: [PATCH v2] sched: fix group_entity's share update To: Vincent Guittot , peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org References: <1482335426-7664-1-git-send-email-vincent.guittot@linaro.org> Cc: stable@vger.kernel.org, pjt@google.com From: Dietmar Eggemann Message-ID: Date: Wed, 4 Jan 2017 17:20:01 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <1482335426-7664-1-git-send-email-vincent.guittot@linaro.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 21/12/16 15:50, Vincent Guittot wrote: IMHO, the overall idea makes sense to me. Just a couple of small questions ... > The update of the share of a cfs_rq is done when its load_avg is updated > but before the group_entity's load_avg has been updated for the past time > slot. This generates wrong load_avg accounting which can be significant > when small tasks are involved in the scheduling. Why for small tasks? Is it because we use load = scale_load_down(cfs_rq->load.weight) and not load = cfs_rq->avg.load_avg in calc_cfs_shares()? > Let take the example of a task a that is dequeued of its task group A: > root > (cfs_rq) > \ > (se) > A > (cfs_rq) > \ > (se) > a > > Task "a" was the only task in task group A which becomes idle when a is > dequeued. > > We have the sequence: > > - dequeue_entity a->se > - update_load_avg(a->se) > - dequeue_entity_load_avg(A->cfs_rq, a->se) > - update_cfs_shares(A->cfs_rq) > A->cfs_rq->load.weight == 0 You mean A->cfs_rq->load.weight = 0 ? > A->se->load.weight is updated with the new share (0 in this case) Shouldn't this be MIN_SHARES (2) instead of 0? > - dequeue_entity A->se > - update_load_avg(A->se) but its weight is now null so the last time > slot (up to a tick) will be accounted with a weight of 0 instead of > its real weight during the time slot. The last time slot will be > accounted as an idle one whereas it was a running one. > > If the running time of task a is short enough that no tick happens when it > runs, all running time of group entity A->se will be accounted as idle > time. > > Instead, we should update the share of a cfs_rq (in fact the weight of its > group entity) only after having updated the load_avg of the group_entity. This is because we use 'se->on_rq * scale_load_down(se->load.weight)' in __update_load_avg() as weight parameter for PELT load_avg update? > update_cfs_shares() now take the sched_entity as parameter instead of the > cfs_rq and the weight of the group_entity is updated only once its load_avg > has been synced with current time. [...]