From: Vincent Guittot
Date: Wed, 1 Jun 2016 17:26:41 +0200
Subject: Re: [RFC PATCH v2] sched: reflect sched_entity movement into task_group's utilization
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Paul Turner, Yuyang Du, Dietmar Eggemann, Benjamin Segall, Morten Rasmussen
In-Reply-To: <20160601133632.GB28447@twins.programming.kicks-ass.net>
References: <1464080252-17209-1-git-send-email-vincent.guittot@linaro.org> <20160601133632.GB28447@twins.programming.kicks-ass.net>

On 1 June 2016 at 15:36, Peter Zijlstra wrote:
> On Tue, May 24, 2016 at 10:57:32AM +0200, Vincent Guittot wrote:
>
>> +/*
>> + * Save how much utilization has just been added/removed on cfs rq so we can
>> + * propagate it across the whole tg tree
>> + */
>> +static void set_tg_cfs_rq_util(struct cfs_rq *cfs_rq, int delta)
>> +{
>> +	if (cfs_rq->tg == &root_task_group)
>> +		return;
>> +
>> +	cfs_rq->diff_util_avg += delta;
>> +}
>
> function name doesn't really reflect its purpose..

OK, I will change it.

>
>> +
>> +/* Take into account the change of the utilization of a child task group */
>
> This comment could be so much better... :-)

Yes, I'm going to add more details about what the function does.

>
>> +static void update_tg_cfs_util(struct sched_entity *se, int blocked)
>> +{
>> +	int delta;
>> +	struct cfs_rq *cfs_rq;
>> +	long update_util_avg;
>> +	long last_update_time;
>> +	long old_util_avg;
>> +
>> +	/*
>> +	 * update_blocked_averages() will call this function for the root
>> +	 * cfs_rq, whose se is NULL. In this case just return.
>> +	 */
>> +	if (!se)
>> +		return;
>> +
>> +	if (entity_is_task(se))
>> +		return;
>> +
>> +	/* Get the cfs_rq owned by this group se */
>> +	cfs_rq = group_cfs_rq(se);
>> +
>> +	update_util_avg = cfs_rq->diff_util_avg;
>> +
>> +	if (!update_util_avg)
>> +		return;
>> +
>> +	/* Clear pending changes */
>> +	cfs_rq->diff_util_avg = 0;
>> +
>> +	/* Add changes in sched_entity utilization */
>> +	old_util_avg = se->avg.util_avg;
>> +	se->avg.util_avg = max_t(long, se->avg.util_avg + update_util_avg, 0);
>> +	se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
>> +
>> +	/* Get parent cfs_rq */
>> +	cfs_rq = cfs_rq_of(se);
>> +
>> +	if (blocked) {
>> +		/*
>> +		 * blocked utilization has to be synchronized with its parent
>> +		 * cfs_rq's timestamp
>> +		 */
>> +		last_update_time = cfs_rq_last_update_time(cfs_rq);
>> +
>> +		__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)),
>> +				  &se->avg,
>> +				  se->on_rq * scale_load_down(se->load.weight),
>> +				  cfs_rq->curr == se, NULL);
>> +	}
>> +
>> +	delta = se->avg.util_avg - old_util_avg;
>> +
>> +	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg + delta, 0);
>> +	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
>> +
>> +	set_tg_cfs_rq_util(cfs_rq, delta);
>> +}
>
> So if I read this right it computes the utilization delta for group se's
> and propagates it into the corresponding parent group cfs_rq ?

Yes, exactly.
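
To make the mechanics above easier to follow, here is a small
standalone model of the propagation (plain userspace C, not kernel
code: the struct layouts are simplified stand-ins, util_sum and the
blocked-decay path are left out, and only the diff_util_avg buffering
plus the clamp at zero mirror the patch):

/*
 * Userspace model of the propagation: each cfs_rq buffers a pending
 * delta (diff_util_avg); the group se consumes it, clamps its own
 * util_avg at zero, and pushes the resulting (possibly truncated)
 * delta into the parent cfs_rq, which buffers it for the next level.
 */
#include <stdio.h>

struct cfs_rq;

struct sched_avg {
	long util_avg;
};

struct sched_entity {
	struct sched_avg avg;
	struct cfs_rq *my_q;		/* group cfs_rq owned by this se */
	struct cfs_rq *parent_q;	/* cfs_rq this se is queued on */
};

struct cfs_rq {
	struct sched_avg avg;
	long diff_util_avg;		/* pending, not yet propagated */
};

static long max0(long v) { return v > 0 ? v : 0; }

/* Mirrors update_tg_cfs_util() for one level of the tree */
static void propagate_one_level(struct sched_entity *se)
{
	struct cfs_rq *child = se->my_q, *parent = se->parent_q;
	long pending = child->diff_util_avg, old, delta;

	if (!pending)
		return;
	child->diff_util_avg = 0;	/* clear pending changes */

	old = se->avg.util_avg;
	se->avg.util_avg = max0(old + pending);
	delta = se->avg.util_avg - old;	/* delta after clamping */

	parent->avg.util_avg = max0(parent->avg.util_avg + delta);
	parent->diff_util_avg += delta;	/* buffer for next level up */
}

int main(void)
{
	struct cfs_rq leaf = { { 300 }, 0 }, mid = { { 500 }, 0 },
		      root = { { 800 }, 0 };
	struct sched_entity se_leaf = { { 300 }, &leaf, &mid };
	struct sched_entity se_mid  = { { 500 }, &mid,  &root };

	leaf.diff_util_avg = -120;	/* a task worth 120 was detached */
	propagate_one_level(&se_leaf);	/* leaf -> mid */
	propagate_one_level(&se_mid);	/* mid  -> root */

	/* prints: mid util 380, root util 680 */
	printf("mid util %ld, root util %ld\n",
	       mid.avg.util_avg, root.avg.util_avg);
	return 0;
}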
>
>> @@ -6276,6 +6368,8 @@ static void update_blocked_averages(int cpu)
>>
>>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>  			update_tg_load_avg(cfs_rq, 0);
>> +		/* Propagate pending util changes to the parent */
>> +		update_tg_cfs_util(cfs_rq->tg->se[cpu], true);
>
> And this is why you need the strict bottom-up thing fixed..

Yes.

>
>> @@ -8370,6 +8464,20 @@ static void detach_task_cfs_rq(struct task_struct *p)
>>
>>  	/* Catch up with the cfs_rq and remove our load when we leave */
>>  	detach_entity_load_avg(cfs_rq, se);
>> +
>> +	/*
>> +	 * Propagate the detach across the tg tree to make it visible to the
>> +	 * root
>> +	 */
>> +	for_each_sched_entity(se) {
>> +		cfs_rq = cfs_rq_of(se);
>> +
>> +		if (cfs_rq_throttled(cfs_rq))
>> +			break;
>> +
>> +		update_load_avg(se, 1);
>> +		update_tg_cfs_util(se, false);
>> +	}
>>  }
>>
>>  static void attach_task_cfs_rq(struct task_struct *p)
>> @@ -8399,8 +8507,21 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>>
>>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>>  {
>> +	struct sched_entity *se = &p->se;
>> +	struct cfs_rq *cfs_rq;
>> +
>>  	attach_task_cfs_rq(p);
>>
>> +	for_each_sched_entity(se) {
>> +		cfs_rq = cfs_rq_of(se);
>> +
>> +		if (cfs_rq_throttled(cfs_rq))
>> +			break;
>> +
>> +		update_load_avg(se, 1);
>> +		update_tg_cfs_util(se, false);
>> +	}
>> +
>>  	if (task_on_rq_queued(p)) {
>>  		/*
>>  		 * We were most likely switched from sched_rt, so
>
> So I would expect this to be in attach_task_cfs_rq() to mirror the
> detach_task_cfs_rq().

Yes. My goal was to rely on the enqueue to propagate the move during a
task_move_group, but it's better to place it in attach_task_cfs_rq() so
that we have the same sequence for attaching and detaching (see the
sketch at the end of this mail).

>
> Also, this change is somewhat independent but required for the 'flat'
> util measure, right?

I don't know whether it's needed for the flat util measure: the main
point of a flat util measure is precisely to not go through the tg
tree, so there would be nothing to propagate level by level.
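
For reference, the attach side I have in mind looks like this (untested
sketch: it just moves the loop quoted above from switched_to_fair()
into attach_task_cfs_rq(), mirroring detach_task_cfs_rq(); the "..."
elides the unchanged parts of the function):

static void attach_task_cfs_rq(struct task_struct *p)
{
	struct sched_entity *se = &p->se;
	struct cfs_rq *cfs_rq = cfs_rq_of(se);

	...

	/* Synchronize task with its cfs_rq */
	attach_entity_load_avg(cfs_rq, se);

	/*
	 * Propagate the attach across the tg tree to make it visible to
	 * the root, mirroring detach_task_cfs_rq()
	 */
	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);

		if (cfs_rq_throttled(cfs_rq))
			break;

		update_load_avg(se, 1);
		update_tg_cfs_util(se, false);
	}
}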