Date: Wed, 1 Jun 2016 15:36:32 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, pjt@google.com,
	yuyang.du@intel.com, dietmar.eggemann@arm.com, bsegall@google.com,
	Morten.Rasmussen@arm.com
Subject: Re: [RFC PATCH v2] sched: reflect sched_entity movement into task_group's utilization
Message-ID: <20160601133632.GB28447@twins.programming.kicks-ass.net>
References: <1464080252-17209-1-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1464080252-17209-1-git-send-email-vincent.guittot@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 24, 2016 at 10:57:32AM +0200, Vincent Guittot wrote:
> +/*
> + * Save how much utilization has just been added/removed on cfs rq so we can
> + * propagate it across the whole tg tree
> + */
> +static void set_tg_cfs_rq_util(struct cfs_rq *cfs_rq, int delta)
> +{
> +        if (cfs_rq->tg == &root_task_group)
> +                return;
> +
> +        cfs_rq->diff_util_avg += delta;
> +}

The function name doesn't really reflect its purpose; it accumulates a
delta rather than 'setting' anything.

> +
> +/* Take into account the change of the utilization of a child task group */

This comment could be so much better... :-)

> +static void update_tg_cfs_util(struct sched_entity *se, int blocked)
> +{
> +        int delta;
> +        struct cfs_rq *cfs_rq;
> +        long update_util_avg;
> +        long last_update_time;
> +        long old_util_avg;
> +
> +
> +        /*
> +         * update_blocked_averages() will call this function for the root
> +         * cfs_rq, whose se is NULL. In this case just return.
> +         */
> +        if (!se)
> +                return;
> +
> +        if (entity_is_task(se))
> +                return;
> +
> +        /* Get the cfs_rq owned by this group sched_entity */
> +        cfs_rq = group_cfs_rq(se);
> +
> +        update_util_avg = cfs_rq->diff_util_avg;
> +
> +        if (!update_util_avg)
> +                return;
> +
> +        /* Clear pending changes */
> +        cfs_rq->diff_util_avg = 0;
> +
> +        /* Add changes in sched_entity utilization */
> +        old_util_avg = se->avg.util_avg;
> +        se->avg.util_avg = max_t(long, se->avg.util_avg + update_util_avg, 0);
> +        se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
> +
> +        /* Get parent cfs_rq */
> +        cfs_rq = cfs_rq_of(se);
> +
> +        if (blocked) {
> +                /*
> +                 * blocked utilization has to be synchronized with its parent
> +                 * cfs_rq's timestamp
> +                 */
> +                last_update_time = cfs_rq_last_update_time(cfs_rq);
> +
> +                __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)),
> +                                  &se->avg,
> +                                  se->on_rq * scale_load_down(se->load.weight),
> +                                  cfs_rq->curr == se, NULL);
> +        }
> +
> +        delta = se->avg.util_avg - old_util_avg;
> +
> +        cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg + delta, 0);
> +        cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
> +
> +        set_tg_cfs_rq_util(cfs_rq, delta);
> +}

So if I read this right, it computes the utilization delta for group
se's and propagates it into the corresponding parent group cfs_rq?
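IOW, ignoring the blocked-case decay and throttling, I read it as
something like the stand-alone toy below (made-up names, user-space
sketch, not kernel code; just to check my understanding):

/* toy_prop.c: hypothetical model of the delta propagation */
#include <stdio.h>

struct toy_cfs_rq {
        long util_avg;                  /* stands in for cfs_rq->avg.util_avg */
        long diff_util_avg;             /* pending delta queued by the level below */
        struct toy_cfs_rq *parent;      /* NULL for the root cfs_rq */
};

/*
 * One propagation step for the group se owning @cfs_rq: consume the
 * pending delta, apply it to the parent cfs_rq and queue it there for
 * the next level up.  (The group se's own util_avg/util_sum update and
 * the blocked-case decay are skipped in this toy.)
 */
static void toy_propagate(struct toy_cfs_rq *cfs_rq)
{
        long delta = cfs_rq->diff_util_avg;

        if (!delta || !cfs_rq->parent)
                return;

        cfs_rq->diff_util_avg = 0;
        cfs_rq->parent->util_avg += delta;
        if (cfs_rq->parent->parent)     /* the root never propagates further */
                cfs_rq->parent->diff_util_avg += delta;
}

int main(void)
{
        struct toy_cfs_rq root = { .util_avg = 300 };
        struct toy_cfs_rq a    = { .util_avg = 200, .parent = &root };
        struct toy_cfs_rq b    = { .util_avg = 100, .parent = &a };

        /* a task with util_avg == 50 gets detached from b's cfs_rq */
        b.util_avg -= 50;
        b.diff_util_avg -= 50;

        /* bottom-up walk: children before parents, so one pass reaches the root */
        toy_propagate(&b);
        toy_propagate(&a);

        /* prints: root=250 a=150 b=50 */
        printf("root=%ld a=%ld b=%ld\n", root.util_avg, a.util_avg, b.util_avg);
        return 0;
}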
> @@ -6276,6 +6368,8 @@ static void update_blocked_averages(int cpu)
> 
>                  if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>                          update_tg_load_avg(cfs_rq, 0);
> +                /* Propagate pending util changes to the parent */
> +                update_tg_cfs_util(cfs_rq->tg->se[cpu], true);

And this is why you need the strict bottom-up ordering (children
updated before their parents) fixed..

> @@ -8370,6 +8464,20 @@ static void detach_task_cfs_rq(struct task_struct *p)
> 
>          /* Catch up with the cfs_rq and remove our load when we leave */
>          detach_entity_load_avg(cfs_rq, se);
> +
> +        /*
> +         * Propagate the detach across the tg tree to make it visible to the
> +         * root
> +         */
> +        for_each_sched_entity(se) {
> +                cfs_rq = cfs_rq_of(se);
> +
> +                if (cfs_rq_throttled(cfs_rq))
> +                        break;
> +
> +                update_load_avg(se, 1);
> +                update_tg_cfs_util(se, false);
> +        }
>  }
> 
>  static void attach_task_cfs_rq(struct task_struct *p)
> @@ -8399,8 +8507,21 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
> 
>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  {
> +        struct sched_entity *se = &p->se;
> +        struct cfs_rq *cfs_rq;
> +
>          attach_task_cfs_rq(p);
> 
> +        for_each_sched_entity(se) {
> +                cfs_rq = cfs_rq_of(se);
> +
> +                if (cfs_rq_throttled(cfs_rq))
> +                        break;
> +
> +                update_load_avg(se, 1);
> +                update_tg_cfs_util(se, false);
> +        }
> +
>          if (task_on_rq_queued(p)) {
>                  /*
>                   * We were most likely switched from sched_rt, so

So I would expect this to be in attach_task_cfs_rq() to mirror
detach_task_cfs_rq().

Also, this change is somewhat independent but required for the 'flat'
util measure, right?
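To be concrete about the attach_task_cfs_rq() suggestion above,
roughly the below (untested sketch; the existing body is elided and
the helpers are the ones from your patch):

static void attach_task_cfs_rq(struct task_struct *p)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);

        /* ... existing attach_task_cfs_rq() body ... */

        /*
         * Propagate the attach up the tg tree, mirroring what
         * detach_task_cfs_rq() does on the detach side.
         */
        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);

                if (cfs_rq_throttled(cfs_rq))
                        break;

                update_load_avg(se, 1);
                update_tg_cfs_util(se, false);
        }
}

That would also cover the group-move path, which goes through
attach_task_cfs_rq() as well, not just the switched_to case.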