Date: Tue, 21 Jun 2016 15:17:46 +0200
From: Peter Zijlstra
To: Dietmar Eggemann
Cc: Vincent Guittot, Yuyang Du, Ingo Molnar, linux-kernel, Mike Galbraith,
    Benjamin Segall, Paul Turner, Morten Rasmussen, Matt Fleming
Subject: Re: [PATCH 4/4] sched,fair: Fix PELT integrity for new tasks
Message-ID: <20160621131746.GR30927@twins.programming.kicks-ass.net>
References: <20160617120136.064100812@infradead.org>
 <20160617120454.150630859@infradead.org>
 <20160617142814.GT30154@twins.programming.kicks-ass.net>
 <20160617160239.GL30927@twins.programming.kicks-ass.net>
 <20160617161831.GM30927@twins.programming.kicks-ass.net>
 <5767D51F.3080600@arm.com>
 <5768027E.1090408@arm.com>
 <20160621084119.GN30154@twins.programming.kicks-ass.net>
In-Reply-To: <20160621084119.GN30154@twins.programming.kicks-ass.net>

On Tue, Jun 21, 2016 at 10:41:19AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 20, 2016 at 03:49:34PM +0100, Dietmar Eggemann wrote:
> > On 20/06/16 13:35, Vincent Guittot wrote:
> 
> > > It will go through wake_up_new_task() and post_init_entity_util_avg()
> > > during its fork, which is enough to set last_update_time. Then it will
> > > use switched_to_fair() if the task becomes a fair one.
> > 
> > Oh I see. We want to make sure that every task (even when forked as
> > !fair) has a last_update_time value != 0 when it becomes fair one day.
> 
> Right, see 2 below. I need to write a bunch of comments explaining PELT
> proper, as well as document these things.
> 
> The things we ran into with these patches were:
> 
>  1) You need to update the cfs_rq _before_ any entity attach/detach
>     (and might need to call update_tg_load_avg() when
>     update_cfs_rq_load_avg() returns true).
> 
>  2) (fair) entities are always attached; switched_from/to deal with !fair.
> 
>  3) cpu migration is the only exception and uses the last_update_time=0
>     thing -- because of the refusal to take the second rq->lock.
> 
> Which is why I dislike Yuyang's patches: they create more exceptions
> instead of applying the existing (albeit undocumented) rules.
> 
> Esp. 1 is important: for mathematical consistency alone you don't
> actually need to do this, you only need the entities to be up to date
> with the cfs_rq when you attach/detach, but that view forgets the
> temporal aspect of _when_ you do it.

I have the below for now; I'll continue poking at this for a bit.
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -692,6 +692,7 @@ void init_entity_runnable_average(struct
 
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq);
+static void update_tg_load_avg(struct cfs_rq *cfs_rq, int force);
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
 /*
@@ -757,7 +758,8 @@ void post_init_entity_util_avg(struct sc
 		}
 	}
 
-	update_cfs_rq_load_avg(now, cfs_rq, false);
+	if (update_cfs_rq_load_avg(now, cfs_rq, false))
+		update_tg_load_avg(cfs_rq, false);
 	attach_entity_load_avg(cfs_rq, se);
 }
 
@@ -2919,7 +2921,21 @@ static inline void cfs_rq_util_change(st
 	WRITE_ONCE(*ptr, res);					\
 } while (0)
 
-/* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
+/**
+ * update_cfs_rq_load_avg - update the cfs_rq's load/util averages
+ * @now: current time, as per cfs_rq_clock_task()
+ * @cfs_rq: cfs_rq to update
+ * @update_freq: should we call cfs_rq_util_change() or will the call do so
+ *
+ * The cfs_rq avg is the direct sum of all its entities (blocked and runnable)
+ * avg. The immediate corollary is that all (fair) tasks must be attached, see
+ * post_init_entity_util_avg().
+ *
+ * cfs_rq->avg is used for task_h_load() and update_cfs_share() for example.
+ *
+ * Returns true if the load decayed or we removed utilization. It is expected
+ * that one calls update_tg_load_avg() on this condition.
+ */
 static inline int
 update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 {
@@ -2974,6 +2990,14 @@ static inline void update_load_avg(struc
 		update_tg_load_avg(cfs_rq, 0);
 }
 
+/**
+ * attach_entity_load_avg - attach this entity to its cfs_rq load avg
+ * @cfs_rq: cfs_rq to attach to
+ * @se: sched_entity to attach
+ *
+ * Must call update_cfs_rq_load_avg() before this, since we rely on
+ * cfs_rq->avg.last_update_time being current.
+ */
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if (!sched_feat(ATTACH_AGE_LOAD))
@@ -3005,6 +3029,14 @@ static void attach_entity_load_avg(struc
 	cfs_rq_util_change(cfs_rq);
 }
 
+/**
+ * detach_entity_load_avg - detach this entity from its cfs_rq load avg
+ * @cfs_rq: cfs_rq to detach from
+ * @se: sched_entity to detach
+ *
+ * Must call update_cfs_rq_load_avg() before this, since we rely on
+ * cfs_rq->avg.last_update_time being current.
+ */
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
@@ -8392,7 +8424,8 @@ static void detach_task_cfs_rq(struct ta
 	}
 
 	/* Catch up with the cfs_rq and remove our load when we leave */
-	update_cfs_rq_load_avg(now, cfs_rq, false);
+	if (update_cfs_rq_load_avg(now, cfs_rq, false))
+		update_tg_load_avg(cfs_rq, false);
 	detach_entity_load_avg(cfs_rq, se);
 }
 
@@ -8411,7 +8444,8 @@ static void attach_task_cfs_rq(struct ta
 #endif
 
 	/* Synchronize task with its cfs_rq */
-	update_cfs_rq_load_avg(now, cfs_rq, false);
+	if (update_cfs_rq_load_avg(now, cfs_rq, false))
+		update_tg_load_avg(cfs_rq, false);
 	attach_entity_load_avg(cfs_rq, se);
 
 	if (!vruntime_normalized(p))
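
For reference, the calling convention the patch establishes at every
attach/detach site (rule 1 above) can be distilled into a tiny standalone
model: bring the cfs_rq up to date first, propagate to the task-group level
when the update reports a change, and only then attach or detach the entity.
The sketch below is a simplified userspace mock-up, not kernel code; the
struct fields, function bodies, and signatures are invented stand-ins chosen
only to illustrate the ordering, and the assert mirrors the
"cfs_rq->avg.last_update_time must be current" requirement stated in the new
kernel-doc comments.

/* Standalone mock-up of the update-before-attach rule; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cfs_rq_model {                   /* stand-in for struct cfs_rq */
        uint64_t last_update_time;
        long     load_avg;              /* sum of attached entities' load */
        long     tg_load_avg_contrib;   /* what the task group last saw */
};

struct se_model {                       /* stand-in for struct sched_entity */
        long load_avg;
};

/*
 * Age the cfs_rq averages up to 'now'; returns true when something changed
 * and the task-group level sum should be refreshed as well.
 */
static bool update_cfs_rq_load_avg(uint64_t now, struct cfs_rq_model *cfs_rq)
{
        bool decayed = cfs_rq->last_update_time != now;

        cfs_rq->last_update_time = now;
        return decayed;
}

static void update_tg_load_avg(struct cfs_rq_model *cfs_rq)
{
        cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg;
}

/* Rule 1: callers must have made cfs_rq->last_update_time current already. */
static void attach_entity_load_avg(uint64_t now, struct cfs_rq_model *cfs_rq,
                                   struct se_model *se)
{
        assert(cfs_rq->last_update_time == now);
        cfs_rq->load_avg += se->load_avg;
}

int main(void)
{
        struct cfs_rq_model cfs_rq = { 0 };
        struct se_model se = { .load_avg = 1024 };
        uint64_t now = 42;

        /* The pattern the patch repeats at every attach/detach site: */
        if (update_cfs_rq_load_avg(now, &cfs_rq))
                update_tg_load_avg(&cfs_rq);
        attach_entity_load_avg(now, &cfs_rq, &se);

        printf("cfs_rq load_avg=%ld tg contrib=%ld\n",
               cfs_rq.load_avg, cfs_rq.tg_load_avg_contrib);
        return 0;
}

The point of the ordering is temporal: the attach adds the entity's
contribution to averages that have just been aged to 'now', so nothing gets
double-decayed or credited to the wrong time window.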