linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
@ 2016-06-01 19:39 Dietmar Eggemann
  2016-06-01 19:39 ` [RFC PATCH 1/3] sched/fair: " Dietmar Eggemann
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel
  Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

This is an alternative approach to '[RFC PATCH v2] sched: reflect
sched_entity movement into task_group's utilization' which requires
'[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'

The patch-set doesn't currently handle the impact of cfs throttling on
utilization.

[patch 1] sched/fair: Aggregate task utilization only on root cfs_rq

 Stop maintaining the utilization signal for se/cfs_rq entities
 representing task groups other than the root task group.

[patch 2] sched/fair: Sync se with root cfs_rq

 Make sure that the se (of a task) is in sync with the root cfs_rq.

[patch 3] sched/fair: Change @running of __update_load_avg() to
          @update_util

 Pass the information whether utilization (besides load) has to be
 maintained for the container element of @sa.

Dietmar Eggemann (3):
  sched/fair: Aggregate task utilization only on root cfs_rq
  sched/fair: Sync se with root cfs_rq
  sched/fair: Change @running of __update_load_avg() to @update_util

 kernel/sched/fair.c | 81 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 61 insertions(+), 20 deletions(-)

-- 
1.9.1


* [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
  2016-06-01 19:39 [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Dietmar Eggemann
@ 2016-06-01 19:39 ` Dietmar Eggemann
  2016-06-02  9:23   ` Juri Lelli
  2016-06-01 19:39 ` [RFC PATCH 2/3] sched/fair: Sync se with " Dietmar Eggemann
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel
  Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

cpu utilization (cpu_util()) is defined as the cpu (original) capacity
capped cfs_rq->avg->util_avg signal of the root cfs_rq.
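
For reference, cpu_util() at this point is roughly (simplified sketch;
see kernel/sched/fair.c for the real implementation):

	static int cpu_util(int cpu)
	{
		unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
		unsigned long capacity = capacity_orig_of(cpu);

		return (util >= capacity) ? capacity : util;
	}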

With the current pelt version, the utilization of a task [en|de]queued
on/from a cfs_rq, representing a task group other than the root task group
on a cpu, is not immediately propagated down to the root cfs_rq.

This makes decisions based on cpu_util() for scheduling or cpu frequency
settings less accurate in case tasks are running in task groups.

This patch aggregates the task utilization only on the root cfs_rq,
essentially avoiding maintaining utilization for a se/cfs_rq representing
task groups other than the root task group (!entity_is_task(se) and
&rq_of(cfs_rq)->cfs != cfs_rq).

The additional if/else condition to set @update_util in
__update_load_avg() is replaced in 'sched/fair: Change @running of
__update_load_avg() to @update_util' by providing the information whether
utilization has to be maintained via an argument to this function.

The additional requirements for the alignment of the last_update_time of a
se and the root cfs_rq are handled by the patch 'sched/fair: Sync se with
root cfs_rq'.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..212becd3708f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	u32 contrib;
 	unsigned int delta_w, scaled_delta_w, decayed = 0;
 	unsigned long scale_freq, scale_cpu;
+	int update_util = 0;
 
 	delta = now - sa->last_update_time;
 	/*
@@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
+	if (cfs_rq) {
+		if (&rq_of(cfs_rq)->cfs == cfs_rq)
+			update_util = 1;
+	} else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
+			update_util = 1;
+
 	scale_freq = arch_scale_freq_capacity(NULL, cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
@@ -2750,7 +2757,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 						weight * scaled_delta_w;
 			}
 		}
-		if (running)
+		if (update_util && running)
 			sa->util_sum += scaled_delta_w * scale_cpu;
 
 		delta -= delta_w;
@@ -2774,7 +2781,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 			if (cfs_rq)
 				cfs_rq->runnable_load_sum += weight * contrib;
 		}
-		if (running)
+		if (update_util && running)
 			sa->util_sum += contrib * scale_cpu;
 	}
 
@@ -2785,7 +2792,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		if (cfs_rq)
 			cfs_rq->runnable_load_sum += weight * scaled_delta;
 	}
-	if (running)
+	if (update_util && running)
 		sa->util_sum += scaled_delta * scale_cpu;
 
 	sa->period_contrib += delta;
@@ -2796,7 +2803,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 			cfs_rq->runnable_load_avg =
 				div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
 		}
-		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+		if (update_util)
+			sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
 	}
 
 	return decayed;
@@ -2918,7 +2926,8 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 		removed_load = 1;
 	}
 
-	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
+	if ((&rq_of(cfs_rq)->cfs == cfs_rq) &&
+	    atomic_long_read(&cfs_rq->removed_util_avg)) {
 		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
 		sa->util_avg = max_t(long, sa->util_avg - r, 0);
 		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
@@ -2982,8 +2991,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	se->avg.last_update_time = cfs_rq->avg.last_update_time;
 	cfs_rq->avg.load_avg += se->avg.load_avg;
 	cfs_rq->avg.load_sum += se->avg.load_sum;
-	cfs_rq->avg.util_avg += se->avg.util_avg;
-	cfs_rq->avg.util_sum += se->avg.util_sum;
+
+	if (!entity_is_task(se))
+		return;
+
+	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
+	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
 
 	cfs_rq_util_change(cfs_rq);
 }
@@ -2996,8 +3009,14 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 
 	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
 	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
-	cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
-	cfs_rq->avg.util_sum = max_t(s32,  cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+
+	if (!entity_is_task(se))
+		return;
+
+	rq_of(cfs_rq)->cfs.avg.util_avg =
+	    max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
+	rq_of(cfs_rq)->cfs.avg.util_sum =
+	    max_t(s32, rq_of(cfs_rq)->cfs.avg.util_sum - se->avg.util_sum, 0);
 
 	cfs_rq_util_change(cfs_rq);
 }
@@ -3082,7 +3101,11 @@ void remove_entity_load_avg(struct sched_entity *se)
 
 	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
-	atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
+
+	if (!entity_is_task(se))
+		return;
+
+	atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
 }
 
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
@@ -8460,7 +8483,9 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #endif
 #ifdef CONFIG_SMP
 	atomic_long_set(&cfs_rq->removed_load_avg, 0);
-	atomic_long_set(&cfs_rq->removed_util_avg, 0);
+
+	if (&rq_of(cfs_rq)->cfs == cfs_rq)
+		atomic_long_set(&cfs_rq->removed_util_avg, 0);
 #endif
 }
 
@@ -8525,7 +8550,6 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
 		init_entity_runnable_average(se);
-		post_init_entity_util_avg(se);
 	}
 
 	return 1;
-- 
1.9.1


* [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
  2016-06-01 19:39 [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Dietmar Eggemann
  2016-06-01 19:39 ` [RFC PATCH 1/3] sched/fair: " Dietmar Eggemann
@ 2016-06-01 19:39 ` Dietmar Eggemann
  2016-06-06  2:59   ` Leo Yan
  2016-06-06 12:11   ` Vincent Guittot
  2016-06-01 19:39 ` [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util Dietmar Eggemann
  2016-06-01 20:10 ` [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Peter Zijlstra
  3 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel
  Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

Since task utilization is accrued only on the root cfs_rq, there are a
couple of places where the se has to be synced with the root cfs_rq:

(1) The root cfs_rq has to be updated in attach_entity_load_avg() for
    an se representing a task in a tg other than the root tg before
    the se utilization can be added to it.

(2) The last_update_time value of the root cfs_rq can be higher
    than the one of the cfs_rq the se is enqueued in. Call
    __update_load_avg() on the se with the last_update_time value of
    the root cfs_rq before removing se's utilization from the root
    cfs_rq in [remove|detach]_entity_load_avg().

In case the difference between the last_update_time value of the cfs_rq
and the root cfs_rq is smaller than 1024ns, the additional calls to
__update_load_avg() will bail early.
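
The early bail-out referred to here is the existing check at the top of
__update_load_avg(), roughly:

	delta = now - sa->last_update_time;
	...
	delta >>= 10;	/* work in 1024ns (~1us) units */
	if (!delta)	/* less than 1024ns since the last update: bail */
		return 0;
	sa->last_update_time = now;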

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 212becd3708f..3ae8e79fb687 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 
 static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct cfs_rq* root_cfs_rq;
+
 	if (!sched_feat(ATTACH_AGE_LOAD))
 		goto skip_aging;
 
@@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	if (!entity_is_task(se))
 		return;
 
-	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
-	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+	root_cfs_rq = &rq_of(cfs_rq)->cfs;
+
+	if (parent_entity(se))
+		__update_load_avg(cfs_rq_clock_task(root_cfs_rq),
+				  cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
+				  scale_load_down(root_cfs_rq->load.weight),
+				  upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
+
+	root_cfs_rq->avg.util_avg += se->avg.util_avg;
+	root_cfs_rq->avg.util_sum += se->avg.util_sum;
 
 	cfs_rq_util_change(cfs_rq);
 }
@@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	if (!entity_is_task(se))
 		return;
 
+	__update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
+			  cfs_rq->curr == se, NULL);
+
 	rq_of(cfs_rq)->cfs.avg.util_avg =
 	    max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
 	rq_of(cfs_rq)->cfs.avg.util_sum =
@@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
 	if (!entity_is_task(se))
 		return;
 
+	last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
+
+	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
 	atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
 }
 
-- 
1.9.1


* [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-01 19:39 [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Dietmar Eggemann
  2016-06-01 19:39 ` [RFC PATCH 1/3] sched/fair: " Dietmar Eggemann
  2016-06-01 19:39 ` [RFC PATCH 2/3] sched/fair: Sync se with " Dietmar Eggemann
@ 2016-06-01 19:39 ` Dietmar Eggemann
  2016-06-01 20:11   ` Peter Zijlstra
  2016-06-02  9:25   ` Juri Lelli
  2016-06-01 20:10 ` [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Peter Zijlstra
  3 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC (permalink / raw)
  To: Peter Zijlstra, linux-kernel
  Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

The information whether a se/cfs_rq should get its load and
utilization (se representing a task and root cfs_rq) or only its load
(se representing a task group and cfs_rq owned by this se) updated can
be passed into __update_load_avg() to avoid the additional if/else
condition to set update_util.

@running is changed to @update_util which now carries the information if
the utilization of the se/cfs_rq should be updated and if the se/cfs_rq
is running or not.
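
The resulting encoding (bit 1: maintain utilization, bit 0: running):

	0x0 - group se / group cfs_rq, not running
	0x1 - group se / group cfs_rq, running
	0x2 - task se  / root cfs_rq,  not running
	0x3 - task se  / root cfs_rq,  running

Utilization is only accrued for 0x3 and only recomputed when bit 1 is
set, hence the '(update_util == 0x3)' and '(update_util & 0x2)' checks
below.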

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ae8e79fb687..a1c13975cf56 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
+#define upd_util_cfs_rq(cfs_rq) \
+		(((&rq_of(cfs_rq)->cfs == cfs_rq) << 1) | !!cfs_rq->curr)
+
 /*
  * We can represent the historical contribution to runnable average as the
  * coefficients of a geometric series.  To do this we sub-divide our runnable
@@ -2699,13 +2703,12 @@ static u32 __compute_runnable_contrib(u64 n)
  */
 static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
-		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
+		  unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
 {
 	u64 delta, scaled_delta, periods;
 	u32 contrib;
 	unsigned int delta_w, scaled_delta_w, decayed = 0;
 	unsigned long scale_freq, scale_cpu;
-	int update_util = 0;
 
 	delta = now - sa->last_update_time;
 	/*
@@ -2726,12 +2729,6 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
-	if (cfs_rq) {
-		if (&rq_of(cfs_rq)->cfs == cfs_rq)
-			update_util = 1;
-	} else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
-			update_util = 1;
-
 	scale_freq = arch_scale_freq_capacity(NULL, cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
@@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 						weight * scaled_delta_w;
 			}
 		}
-		if (update_util && running)
+		if (update_util == 0x3)
 			sa->util_sum += scaled_delta_w * scale_cpu;
 
 		delta -= delta_w;
@@ -2781,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 			if (cfs_rq)
 				cfs_rq->runnable_load_sum += weight * contrib;
 		}
-		if (update_util && running)
+		if (update_util == 0x3)
 			sa->util_sum += contrib * scale_cpu;
 	}
 
@@ -2792,7 +2789,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		if (cfs_rq)
 			cfs_rq->runnable_load_sum += weight * scaled_delta;
 	}
-	if (update_util && running)
+	if (update_util == 0x3)
 		sa->util_sum += scaled_delta * scale_cpu;
 
 	sa->period_contrib += delta;
@@ -2803,7 +2800,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 			cfs_rq->runnable_load_avg =
 				div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
 		}
-		if (update_util)
+		if (update_util & 0x2)
 			sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
 	}
 
@@ -2873,7 +2870,7 @@ void set_task_rq_fair(struct sched_entity *se,
 		n_last_update_time = next->avg.last_update_time;
 #endif
 		__update_load_avg(p_last_update_time, cpu_of(rq_of(prev)),
-				  &se->avg, 0, 0, NULL);
+				  &se->avg, 0, upd_util_se(se, 0), NULL);
 		se->avg.last_update_time = n_last_update_time;
 	}
 }
@@ -2935,7 +2932,8 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
+		scale_load_down(cfs_rq->load.weight), upd_util_cfs_rq(cfs_rq),
+		cfs_rq);
 
 #ifndef CONFIG_64BIT
 	smp_wmb();
@@ -2962,7 +2960,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 	 */
 	__update_load_avg(now, cpu, &se->avg,
 			  se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
+			  upd_util_se(se, cfs_rq->curr == se), NULL);
 
 	if (update_cfs_rq_load_avg(now, cfs_rq, true) && update_tg)
 		update_tg_load_avg(cfs_rq, 0);
@@ -2981,7 +2979,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	 */
 	if (se->avg.last_update_time) {
 		__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-				  &se->avg, 0, 0, NULL);
+				  &se->avg, 0, upd_util_se(se, 0), NULL);
 
 		/*
 		 * XXX: we could have just aged the entire load away if we've been
@@ -3015,7 +3013,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
+			  upd_util_se(se, cfs_rq->curr == se), NULL);
 
 	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
 	cfs_rq->avg.load_sum = max_t(s64,  cfs_rq->avg.load_sum - se->avg.load_sum, 0);
@@ -3025,7 +3023,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 
 	__update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
 			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
-			  cfs_rq->curr == se, NULL);
+			  upd_util_se(se, cfs_rq->curr == se), NULL);
 
 	rq_of(cfs_rq)->cfs.avg.util_avg =
 	    max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
@@ -3047,7 +3045,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	if (!migrated) {
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 			se->on_rq * scale_load_down(se->load.weight),
-			cfs_rq->curr == se, NULL);
+			upd_util_se(se, cfs_rq->curr == se), NULL);
 	}
 
 	decayed = update_cfs_rq_load_avg(now, cfs_rq, !migrated);
@@ -3113,7 +3111,8 @@ void remove_entity_load_avg(struct sched_entity *se)
 
 	last_update_time = cfs_rq_last_update_time(cfs_rq);
 
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0,
+			  upd_util_se(se, 0), NULL);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
 
 	if (!entity_is_task(se))
@@ -3121,7 +3120,8 @@ void remove_entity_load_avg(struct sched_entity *se)
 
 	last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
 
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0,
+			  upd_util_se(se, 0), NULL);
 	atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
  2016-06-01 19:39 [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Dietmar Eggemann
                   ` (2 preceding siblings ...)
  2016-06-01 19:39 ` [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util Dietmar Eggemann
@ 2016-06-01 20:10 ` Peter Zijlstra
  2016-06-02 15:40   ` Dietmar Eggemann
  3 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2016-06-01 20:10 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

On Wed, Jun 01, 2016 at 08:39:19PM +0100, Dietmar Eggemann wrote:
> This is an alternative approach to '[RFC PATCH v2] sched: reflect
> sched_entity movement into task_group's utilization' which requires
> '[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'

Awesome, a 3rd alternative :-)


* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-01 19:39 ` [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util Dietmar Eggemann
@ 2016-06-01 20:11   ` Peter Zijlstra
  2016-06-02 15:59     ` Dietmar Eggemann
  2016-06-02  9:25   ` Juri Lelli
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2016-06-01 20:11 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

On Wed, Jun 01, 2016 at 08:39:22PM +0100, Dietmar Eggemann wrote:
> The information whether a se/cfs_rq should get its load and
> utilization (se representing a task and root cfs_rq) or only its load
> (se representing a task group and cfs_rq owned by this se) updated can
> be passed into __update_load_avg() to avoid the additional if/else
> condition to set update_util.
> 
> @running is changed to @update_util which now carries the information if
> the utilization of the se/cfs_rq should be updated and if the se/cfs_rq
> is running or not.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3ae8e79fb687..a1c13975cf56 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>  
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>  
> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))

Just saying that on first reading that went: Random Number Generator, uh
what?!

So maybe pick better names?


* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
  2016-06-01 19:39 ` [RFC PATCH 1/3] sched/fair: " Dietmar Eggemann
@ 2016-06-02  9:23   ` Juri Lelli
  2016-06-02 15:53     ` Dietmar Eggemann
  0 siblings, 1 reply; 17+ messages in thread
From: Juri Lelli @ 2016-06-02  9:23 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

Hi,

minor comment below.

On 01/06/16 20:39, Dietmar Eggemann wrote:
> cpu utilization (cpu_util()) is defined as the cpu (original) capacity
> capped cfs_rq->avg->util_avg signal of the root cfs_rq.
> 
> With the current pelt version, the utilization of a task [en|de]queued
> on/from a cfs_rq, representing a task group other than the root task group
> on a cpu, is not immediately propagated down to the root cfs_rq.
> 
> This makes decisions based on cpu_util() for scheduling or cpu frequency
> settings less accurate in case tasks are running in task groups.
> 
> This patch aggregates the task utilization only on the root cfs_rq,
> essentially avoiding maintaining utilization for a se/cfs_rq representing
> task groups other than the root task group (!entity_is_task(se) and
> &rq_of(cfs_rq)->cfs != cfs_rq).
> 
> The additional if/else condition to set @update_util in
> __update_load_avg() is replaced in 'sched/fair: Change @running of
> __update_load_avg() to @update_util' by providing the information whether
> utilization has to be maintained via an argument to this function.
> 
> The additional requirements for the alignment of the last_update_time of a
> se and the root cfs_rq are handled by the patch 'sched/fair: Sync se with
> root cfs_rq'.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 48 ++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..212becd3708f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	u32 contrib;
>  	unsigned int delta_w, scaled_delta_w, decayed = 0;
>  	unsigned long scale_freq, scale_cpu;
> +	int update_util = 0;
>  
>  	delta = now - sa->last_update_time;
>  	/*
> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		return 0;
>  	sa->last_update_time = now;
>  
> +	if (cfs_rq) {
> +		if (&rq_of(cfs_rq)->cfs == cfs_rq)

Maybe we can wrap this sort of checks in a static inline improving
readability?

Best,

- Juri


* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-01 19:39 ` [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util Dietmar Eggemann
  2016-06-01 20:11   ` Peter Zijlstra
@ 2016-06-02  9:25   ` Juri Lelli
  2016-06-02 17:27     ` Dietmar Eggemann
  1 sibling, 1 reply; 17+ messages in thread
From: Juri Lelli @ 2016-06-02  9:25 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

Hi,

another minor comment below. :-)

On 01/06/16 20:39, Dietmar Eggemann wrote:
> The information whether a se/cfs_rq should get its load and
> utilization (se representing a task and root cfs_rq) or only its load
> (se representing a task group and cfs_rq owned by this se) updated can
> be passed into __update_load_avg() to avoid the additional if/else
> condition to set update_util.
> 
> @running is changed to @update_util which now carries the information if
> the utilization of the se/cfs_rq should be updated and if the se/cfs_rq
> is running or not.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3ae8e79fb687..a1c13975cf56 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>  
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>  
> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
> +#define upd_util_cfs_rq(cfs_rq) \
> +		(((&rq_of(cfs_rq)->cfs == cfs_rq) << 1) | !!cfs_rq->curr)
> +
>  /*
>   * We can represent the historical contribution to runnable average as the
>   * coefficients of a geometric series.  To do this we sub-divide our runnable
> @@ -2699,13 +2703,12 @@ static u32 __compute_runnable_contrib(u64 n)
>   */
>  static __always_inline int
>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> -		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
> +		  unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
>  {
>  	u64 delta, scaled_delta, periods;
>  	u32 contrib;
>  	unsigned int delta_w, scaled_delta_w, decayed = 0;
>  	unsigned long scale_freq, scale_cpu;
> -	int update_util = 0;
>  
>  	delta = now - sa->last_update_time;
>  	/*
> @@ -2726,12 +2729,6 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  		return 0;
>  	sa->last_update_time = now;
>  
> -	if (cfs_rq) {
> -		if (&rq_of(cfs_rq)->cfs == cfs_rq)
> -			update_util = 1;
> -	} else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
> -			update_util = 1;
> -
>  	scale_freq = arch_scale_freq_capacity(NULL, cpu);
>  	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
>  
> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  						weight * scaled_delta_w;
>  			}
>  		}
> -		if (update_util && running)
> +		if (update_util == 0x3)

How about a define for these masks?

Best,

- Juri


* Re: [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
  2016-06-01 20:10 ` [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq Peter Zijlstra
@ 2016-06-02 15:40   ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-02 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

On 01/06/16 21:10, Peter Zijlstra wrote:
> On Wed, Jun 01, 2016 at 08:39:19PM +0100, Dietmar Eggemann wrote:
>> This is an alternative approach to '[RFC PATCH v2] sched: reflect
>> sched_entity movement into task_group's utilization' which requires
>> '[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'
> 
> Awesome, a 3rd alternative :-)

Yeah, sorry, but at least it runs on tip/sched/core w/o any other
PELT-related changes.

Moreover, it's fewer code changes compared to the reflect-thing.

And Yuyang's implementation depended on the optimization of
__update_load_avg() (1us->1ms) which IMHO turned out not to work.


* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
  2016-06-02  9:23   ` Juri Lelli
@ 2016-06-02 15:53     ` Dietmar Eggemann
  2016-06-02 16:11       ` Juri Lelli
  0 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-02 15:53 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

On 02/06/16 10:23, Juri Lelli wrote:

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e83db73..212becd3708f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  	u32 contrib;
>>  	unsigned int delta_w, scaled_delta_w, decayed = 0;
>>  	unsigned long scale_freq, scale_cpu;
>> +	int update_util = 0;
>>  
>>  	delta = now - sa->last_update_time;
>>  	/*
>> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  		return 0;
>>  	sa->last_update_time = now;
>>  
>> +	if (cfs_rq) {
>> +		if (&rq_of(cfs_rq)->cfs == cfs_rq)
> 
> Maybe we can wrap this sort of checks in a static inline improving
> readability?

Something like this?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..01b0fa689ef9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -251,6 +251,18 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
        return cfs_rq->rq;
 }
 
+/* cfs_rq "owned" by the root task group */
+static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
+{
+       return &rq_of(cfs_rq)->cfs;
+}
+
+/* Is this cfs_rq "owned" by the root task group ? */
+static inline bool rq_is_root(struct cfs_rq *cfs_rq)
+{
+       return root_rq_of(cfs_rq) == cfs_rq;
+}
+
 /* An entity is a task if it doesn't "own" a runqueue */
 #define entity_is_task(se)     (!se->my_q)
 
@@ -376,6 +388,16 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
        return container_of(cfs_rq, struct rq, cfs);
 }
 
+static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
+{
+       return cfs_rq;
+}
+
+static inline bool rq_is_root(struct cfs_rq *cfs_rq)
+{
+       return true;
+}
+
 #define entity_is_task(se)     1


* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-01 20:11   ` Peter Zijlstra
@ 2016-06-02 15:59     ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-02 15:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du

On 01/06/16 21:11, Peter Zijlstra wrote:
> On Wed, Jun 01, 2016 at 08:39:22PM +0100, Dietmar Eggemann wrote:
>> The information whether a se/cfs_rq should get its load and
>> utilization (se representing a task and root cfs_rq) or only its load
>> (se representing a task group and cfs_rq owned by this se) updated can
>> be passed into __update_load_avg() to avoid the additional if/else
>> condition to set update_util.
>>
>> @running is changed to @update_util which now carries the information if
>> the utilization of the se/cfs_rq should be updated and if the se/cfs_rq
>> is running or not.
>>
>> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> ---
>>  kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
>>  1 file changed, 21 insertions(+), 21 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3ae8e79fb687..a1c13975cf56 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>>  
>>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>>  
>> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
> 
> Just saying that on first reading that went: Random Number Generator, uh
> what?!
> 
> So maybe pick better names?

Yeah, can do. What about?

#define update_util_se(se, running) ((entity_is_task(se) << 1) | (running))
#define update_util_rq(cfs_rq) ((rq_is_root(cfs_rq) << 1) | !!(cfs_rq)->curr)
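
so e.g. the existing call in update_load_avg() would then read:

	__update_load_avg(now, cpu, &se->avg,
			  se->on_rq * scale_load_down(se->load.weight),
			  update_util_se(se, cfs_rq->curr == se), NULL);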

 


* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
  2016-06-02 15:53     ` Dietmar Eggemann
@ 2016-06-02 16:11       ` Juri Lelli
  0 siblings, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2016-06-02 16:11 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

On 02/06/16 16:53, Dietmar Eggemann wrote:
> On 02/06/16 10:23, Juri Lelli wrote:
> 
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 218f8e83db73..212becd3708f 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >>  	u32 contrib;
> >>  	unsigned int delta_w, scaled_delta_w, decayed = 0;
> >>  	unsigned long scale_freq, scale_cpu;
> >> +	int update_util = 0;
> >>  
> >>  	delta = now - sa->last_update_time;
> >>  	/*
> >> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >>  		return 0;
> >>  	sa->last_update_time = now;
> >>  
> >> +	if (cfs_rq) {
> >> +		if (&rq_of(cfs_rq)->cfs == cfs_rq)
> > 
> > Maybe we can wrap this sort of checks in a static inline improving
> > readability?
> 
> Something like this?
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..01b0fa689ef9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -251,6 +251,18 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
>         return cfs_rq->rq;
>  }
>  
> +/* cfs_rq "owned" by the root task group */
> +static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
> +{
> +       return &rq_of(cfs_rq)->cfs;
> +}
> +
> +/* Is this cfs_rq "owned" by the root task group ? */
> +static inline bool rq_is_root(struct cfs_rq *cfs_rq)
> +{
> +       return root_rq_of(cfs_rq) == cfs_rq;
> +}
> +
>  /* An entity is a task if it doesn't "own" a runqueue */
>  #define entity_is_task(se)     (!se->my_q)
>  
> @@ -376,6 +388,16 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
>         return container_of(cfs_rq, struct rq, cfs);
>  }
>  
> +static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
> +{
> +       return cfs_rq;
> +}
> +
> +static inline bool rq_is_root(struct cfs_rq *cfs_rq)
> +{
> +       return true;
> +}
> +
>  #define entity_is_task(se)     1
> 

Looks good to me.

Thanks,

- Juri


* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-02  9:25   ` Juri Lelli
@ 2016-06-02 17:27     ` Dietmar Eggemann
  2016-06-03 10:56       ` Juri Lelli
  0 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-02 17:27 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

On 02/06/16 10:25, Juri Lelli wrote:

[...]

>> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  						weight * scaled_delta_w;
>>  			}
>>  		}
>> -		if (update_util && running)
>> +		if (update_util == 0x3)
> 
> How about a define for these masks?

Something like this?

+#define UTIL_RUNNING   1
+#define UTIL_UPDATE    2
+
 /*
  * We can represent the historical contribution to runnable average as the
  * coefficients of a geometric series.  To do this we sub-divide our runnable
@@ -2724,7 +2727,7 @@ static u32 __compute_runnable_contrib(u64 n)
  */
 static __always_inline int
 __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
-                 unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
+                 unsigned long weight, int util_flags, struct cfs_rq *cfs_rq)
 {
        u64 delta, scaled_delta, periods;
        u32 contrib;
@@ -2775,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
                                                weight * scaled_delta_w;
                        }
                }
-               if (update_util == 0x3)
+               if (util_flags == (UTIL_UPDATE | UTIL_RUNNING))
                        sa->util_sum += scaled_delta_w * scale_cpu;
...


* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
  2016-06-02 17:27     ` Dietmar Eggemann
@ 2016-06-03 10:56       ` Juri Lelli
  0 siblings, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2016-06-03 10:56 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

On 02/06/16 18:27, Dietmar Eggemann wrote:
> On 02/06/16 10:25, Juri Lelli wrote:
> 
> [...]
> 
> >> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >>  						weight * scaled_delta_w;
> >>  			}
> >>  		}
> >> -		if (update_util && running)
> >> +		if (update_util == 0x3)
> > 
> > How about a define for these masks?
> 
> Something like this?
> 
> +#define UTIL_RUNNING   1
> +#define UTIL_UPDATE    2

Make these 0x01 and 0x02, I'd say.

> +
>  /*
>   * We can represent the historical contribution to runnable average as the
>   * coefficients of a geometric series.  To do this we sub-divide our runnable
> @@ -2724,7 +2727,7 @@ static u32 __compute_runnable_contrib(u64 n)
>   */
>  static __always_inline int
>  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> -                 unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
> +                 unsigned long weight, int util_flags, struct cfs_rq *cfs_rq)
>  {
>         u64 delta, scaled_delta, periods;
>         u32 contrib;
> @@ -2775,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>                                                 weight * scaled_delta_w;
>                         }
>                 }
> -               if (update_util == 0x3)
> +               if (util_flags == (UTIL_UPDATE | UTIL_RUNNING))

Looks more readable to me. :-)

Best,

- Juri


* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
  2016-06-01 19:39 ` [RFC PATCH 2/3] sched/fair: Sync se with " Dietmar Eggemann
@ 2016-06-06  2:59   ` Leo Yan
  2016-06-06  8:45     ` Dietmar Eggemann
  2016-06-06 12:11   ` Vincent Guittot
  1 sibling, 1 reply; 17+ messages in thread
From: Leo Yan @ 2016-06-06  2:59 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du

On Wed, Jun 01, 2016 at 08:39:21PM +0100, Dietmar Eggemann wrote:
> Since task utilization is accrued only on the root cfs_rq, there are a
> couple of places where the se has to be synced with the root cfs_rq:
> 
> (1) The root cfs_rq has to be updated in attach_entity_load_avg() for
>     an se representing a task in a tg other than the root tg before
>     the se utilization can be added to it.
> 
> (2) The last_update_time value of the root cfs_rq can be higher
>     than the one of the cfs_rq the se is enqueued in. Call
>     __update_load_avg() on the se with the last_update_time value of
>     the root cfs_rq before removing se's utilization from the root
>     cfs_rq in [remove|detach]_entity_load_avg().
> 
> In case the difference between the last_update_time value of the cfs_rq
> and the root cfs_rq is smaller than 1024ns, the additional calls to
> __update_load_avg() will bail early.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 212becd3708f..3ae8e79fb687 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>  
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> +	struct cfs_rq* root_cfs_rq;
> +
>  	if (!sched_feat(ATTACH_AGE_LOAD))
>  		goto skip_aging;
>  
> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  	if (!entity_is_task(se))
>  		return;
>  
> -	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> -	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> +	root_cfs_rq = &rq_of(cfs_rq)->cfs;
> +
> +	if (parent_entity(se))
> +		__update_load_avg(cfs_rq_clock_task(root_cfs_rq),
> +				  cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
> +				  scale_load_down(root_cfs_rq->load.weight),
> +				  upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);

Maybe add the below macro already in this patch, to make it more readable:
#define upd_util_cfs_rq(cfs_rq) XXX

> +
> +	root_cfs_rq->avg.util_avg += se->avg.util_avg;
> +	root_cfs_rq->avg.util_sum += se->avg.util_sum;
>  
>  	cfs_rq_util_change(cfs_rq);
>  }
> @@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  	if (!entity_is_task(se))
>  		return;
>  
> +	__update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> +			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
> +			  cfs_rq->curr == se, NULL);
> +
>  	rq_of(cfs_rq)->cfs.avg.util_avg =
>  	    max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
>  	rq_of(cfs_rq)->cfs.avg.util_sum =
> @@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
>  	if (!entity_is_task(se))
>  		return;
>  
> +	last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
> +
> +	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
>  	atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
>  }
>  
> -- 
> 1.9.1
> 


* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
  2016-06-06  2:59   ` Leo Yan
@ 2016-06-06  8:45     ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2016-06-06  8:45 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
	Morten Rasmussen, Yuyang Du



On 06/06/16 03:59, Leo Yan wrote:
> On Wed, Jun 01, 2016 at 08:39:21PM +0100, Dietmar Eggemann wrote:

[...]

>> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>  	if (!entity_is_task(se))
>>  		return;
>>  
>> -	rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
>> -	rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
>> +	root_cfs_rq = &rq_of(cfs_rq)->cfs;
>> +
>> +	if (parent_entity(se))
>> +		__update_load_avg(cfs_rq_clock_task(root_cfs_rq),
>> +				  cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
>> +				  scale_load_down(root_cfs_rq->load.weight),
>> +				  upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
> 
> Maybe add below macro in this patch for more readable:
> #define upd_util_cfs_rq(cfs_rq) XXX

Sorry, this doesn't belong here since the macro is only introduced in
3/3. Should be 'root_cfs_rq->curr != NULL' here instead.
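
I.e. taken standalone, the hunk in 2/3 would then be something like:

	if (parent_entity(se))
		__update_load_avg(cfs_rq_clock_task(root_cfs_rq),
				  cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
				  scale_load_down(root_cfs_rq->load.weight),
				  root_cfs_rq->curr != NULL, root_cfs_rq);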

[...]


* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
  2016-06-01 19:39 ` [RFC PATCH 2/3] sched/fair: Sync se with " Dietmar Eggemann
  2016-06-06  2:59   ` Leo Yan
@ 2016-06-06 12:11   ` Vincent Guittot
  1 sibling, 0 replies; 17+ messages in thread
From: Vincent Guittot @ 2016-06-06 12:11 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, linux-kernel, Ben Segall, Morten Rasmussen, Yuyang Du

Hi Dietmar,

On 1 June 2016 at 21:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> Since task utilization is accrued only on the root cfs_rq, there are a
> couple of places where the se has to be synced with the root cfs_rq:
>
> (1) The root cfs_rq has to be updated in attach_entity_load_avg() for
>     an se representing a task in a tg other than the root tg before
>     the se utilization can be added to it.
>
> (2) The last_update_time value of the root cfs_rq can be higher
>     than the one of the cfs_rq the se is enqueued in. Call
>     __update_load_avg() on the se with the last_update_time value of
>     the root cfs_rq before removing se's utilization from the root
>     cfs_rq in [remove|detach]_entity_load_avg().
>
> In case the difference between the last_update_time value of the cfs_rq
> and the root cfs_rq is smaller than 1024ns, the additional calls to
> __update_load_avg() will bail early.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/fair.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 212becd3708f..3ae8e79fb687 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>
>  static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> +       struct cfs_rq* root_cfs_rq;
> +
>         if (!sched_feat(ATTACH_AGE_LOAD))
>                 goto skip_aging;
>
> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>         if (!entity_is_task(se))
>                 return;
>
> -       rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> -       rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> +       root_cfs_rq = &rq_of(cfs_rq)->cfs;
> +
> +       if (parent_entity(se))
> +               __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
> +                                 cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
> +                                 scale_load_down(root_cfs_rq->load.weight),
> +                                 upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
> +
> +       root_cfs_rq->avg.util_avg += se->avg.util_avg;
> +       root_cfs_rq->avg.util_sum += se->avg.util_sum;

The main issue with flat utilization is that we can't keep the
sched_avg of a sched_entity synced (from a last_update_time pov) with
both the cfs_rq on which its load is attached and the root cfs_rq on
which its utilization is attached.

With this additional sync to the root cfs_rq in
attach/detach_entity_load_avg and in remove_entity_load_avg, the load
of a sched_entity is no longer synced to the time stamp of the cfs_rq
onto which it is attached. This can generate several wrong updates of
the load of the latter.
As an example, let's take a task TA that sleeps and move it onto TGB,
which has not run recently, so TGB.avg.last_update_time << root
cfs_rq.avg.last_update_time (a decay of 20ms removes ~35% of the load).
When we attach TA to TGB, TA is synced with TGB for attaching it and
then decayed to be synced with the root cfs_rq.
If TA is then moved to another task group, we try to sync TA to TGB,
but TA is in the future, so TA.avg.last_update_time is set to TGB's.
Then TA's load is removed from TGB, but TA's load has been decayed, so
only a part will be effectively subtracted. Then TA's load is synced
with the root cfs_rq, which means it is decayed one more time for the
same time slot, because TA.avg.last_update_time has been reset to
TGB.avg.last_update_time, so we will subtract less utilization than
we should from the root cfs_rq.
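
(For reference: PELT decays by y per 1024us period with y^32 = 1/2, so a
20ms gap scales a contribution by roughly y^20 = 2^(-20/32) ~= 0.65,
i.e. ~35% of the load is decayed away, which is where the figure above
comes from.)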

I think that similar behavior can apply with the removed load.


>
>         cfs_rq_util_change(cfs_rq);
>  }
> @@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>         if (!entity_is_task(se))
>                 return;
>
> +       __update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> +                         &se->avg, se->on_rq * scale_load_down(se->load.weight),
> +                         cfs_rq->curr == se, NULL);
> +
>         rq_of(cfs_rq)->cfs.avg.util_avg =
>             max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
>         rq_of(cfs_rq)->cfs.avg.util_sum =
> @@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
>         if (!entity_is_task(se))
>                 return;
>
> +       last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
> +
> +       __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
>         atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
>  }
>
> --
> 1.9.1
>

