* [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC
To: Peter Zijlstra, linux-kernel
Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
This is an alternative approach to '[RFC PATCH v2] sched: reflect
sched_entity movement into task_group's utilization', which requires
'[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'.
The patch-set doesn't currently handle the impact of cfs throttling on
utilization.
[patch 1] sched/fair: Aggregate task utilization only on root cfs_rq
Avoid maintaining the utilization signal for a se/cfs_rq representing
a task group other than the root task group.
[patch 2] sched/fair: Sync se with root cfs_rq
Make sure that the se (of a task) is in sync with the root cfs_rq.
[patch 3] sched/fair: Change @running of __update_load_avg() to
@update_util
Pass in the information about whether utilization (besides load) has to
be maintained for the container element of @sa.
Dietmar Eggemann (3):
sched/fair: Aggregate task utilization only on root cfs_rq
sched/fair: Sync se with root cfs_rq
sched/fair: Change @running of __update_load_avg() to @update_util
kernel/sched/fair.c | 81 ++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 61 insertions(+), 20 deletions(-)
--
1.9.1
* [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC
To: Peter Zijlstra, linux-kernel
Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
cpu utilization (cpu_util()) is defined as the root cfs_rq's
cfs_rq->avg.util_avg signal, capped by the cpu's (original) capacity.
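For reference, cpu_util() is roughly (a sketch; details may differ
between kernel versions):

	static int cpu_util(int cpu)
	{
		unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
		unsigned long capacity = capacity_orig_of(cpu);

		/* Utilization can transiently exceed capacity; clamp it. */
		return (util >= capacity) ? capacity : util;
	}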
With the current PELT version, the utilization of a task [en|de]queued
on/from a cfs_rq representing a task group other than the root task group
on a cpu is not immediately propagated down to the root cfs_rq.
This makes decisions based on cpu_util() for scheduling or cpu frequency
settings less accurate when tasks are running in task groups.
This patch aggregates the task utilization only on the root cfs_rq,
essentially avoiding maintaining utilization for a se/cfs_rq representing
task groups other than the root task group (!entity_is_task(se) and
&rq_of(cfs_rq)->cfs != cfs_rq).
The additional if/else condition to set @update_util in
__update_load_avg() is replaced in 'sched/fair: Change @running of
__update_load_avg() to @update_util' by providing the information whether
utilization has to be maintained via an argument to this function.
The additional requirements for the alignment of the last_update_time of a
se and the root cfs_rq are handled by the patch 'sched/fair: Sync se with
root cfs_rq'.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
kernel/sched/fair.c | 48 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 36 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..212becd3708f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
u32 contrib;
unsigned int delta_w, scaled_delta_w, decayed = 0;
unsigned long scale_freq, scale_cpu;
+ int update_util = 0;
delta = now - sa->last_update_time;
/*
@@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
+ if (cfs_rq) {
+ if (&rq_of(cfs_rq)->cfs == cfs_rq)
+ update_util = 1;
+ } else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
+ update_util = 1;
+
scale_freq = arch_scale_freq_capacity(NULL, cpu);
scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -2750,7 +2757,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
weight * scaled_delta_w;
}
}
- if (running)
+ if (update_util && running)
sa->util_sum += scaled_delta_w * scale_cpu;
delta -= delta_w;
@@ -2774,7 +2781,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * contrib;
}
- if (running)
+ if (update_util && running)
sa->util_sum += contrib * scale_cpu;
}
@@ -2785,7 +2792,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * scaled_delta;
}
- if (running)
+ if (update_util && running)
sa->util_sum += scaled_delta * scale_cpu;
sa->period_contrib += delta;
@@ -2796,7 +2803,8 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
cfs_rq->runnable_load_avg =
div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
}
- sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+ if (update_util)
+ sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
}
return decayed;
@@ -2918,7 +2926,8 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
removed_load = 1;
}
- if (atomic_long_read(&cfs_rq->removed_util_avg)) {
+ if ((&rq_of(cfs_rq)->cfs == cfs_rq) &&
+ atomic_long_read(&cfs_rq->removed_util_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
sa->util_avg = max_t(long, sa->util_avg - r, 0);
sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
@@ -2982,8 +2991,12 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
se->avg.last_update_time = cfs_rq->avg.last_update_time;
cfs_rq->avg.load_avg += se->avg.load_avg;
cfs_rq->avg.load_sum += se->avg.load_sum;
- cfs_rq->avg.util_avg += se->avg.util_avg;
- cfs_rq->avg.util_sum += se->avg.util_sum;
+
+ if (!entity_is_task(se))
+ return;
+
+ rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
+ rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
cfs_rq_util_change(cfs_rq);
}
@@ -2996,8 +3009,14 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
- cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg - se->avg.util_avg, 0);
- cfs_rq->avg.util_sum = max_t(s32, cfs_rq->avg.util_sum - se->avg.util_sum, 0);
+
+ if (!entity_is_task(se))
+ return;
+
+ rq_of(cfs_rq)->cfs.avg.util_avg =
+ max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
+ rq_of(cfs_rq)->cfs.avg.util_sum =
+ max_t(s32, rq_of(cfs_rq)->cfs.avg.util_sum - se->avg.util_sum, 0);
cfs_rq_util_change(cfs_rq);
}
@@ -3082,7 +3101,11 @@ void remove_entity_load_avg(struct sched_entity *se)
__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
- atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
+
+ if (!entity_is_task(se))
+ return;
+
+ atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
}
static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
@@ -8460,7 +8483,9 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
#endif
#ifdef CONFIG_SMP
atomic_long_set(&cfs_rq->removed_load_avg, 0);
- atomic_long_set(&cfs_rq->removed_util_avg, 0);
+
+ if (&rq_of(cfs_rq)->cfs == cfs_rq)
+ atomic_long_set(&cfs_rq->removed_util_avg, 0);
#endif
}
@@ -8525,7 +8550,6 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
init_cfs_rq(cfs_rq);
init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
init_entity_runnable_average(se);
- post_init_entity_util_avg(se);
}
return 1;
--
1.9.1
* [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC
To: Peter Zijlstra, linux-kernel
Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
Since task utilization is accrued only on the root cfs_rq, there are a
couple of places where the se has to be synced with the root cfs_rq:
(1) The root cfs_rq has to be updated in attach_entity_load_avg() for
an se representing a task in a tg other than the root tg before
the se utilization can be added to it.
(2) The last_update_time value of the root cfs_rq can be higher
than the one of the cfs_rq the se is enqueued in. Call
__update_load_avg() on the se with the last_update_time value of
the root cfs_rq before removing se's utilization from the root
cfs_rq in [remove|detach]_entity_load_avg().
In case the difference between the last_update_time value of the cfs_rq
and the root cfs_rq is smaller than 1024ns, the additional calls to
__update_load_avg() will bail early.
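For reference, the early bail is the standard PELT granularity check at
the top of __update_load_avg(), roughly:

	delta = now - sa->last_update_time;

	/*
	 * Use 1024ns as the unit of measurement since it's a reasonable
	 * approximation of 1us and fast to compute.
	 */
	delta >>= 10;
	if (!delta)
		return 0;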
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
kernel/sched/fair.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 212becd3708f..3ae8e79fb687 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+ struct cfs_rq *root_cfs_rq;
+
if (!sched_feat(ATTACH_AGE_LOAD))
goto skip_aging;
@@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
if (!entity_is_task(se))
return;
- rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
- rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
+ root_cfs_rq = &rq_of(cfs_rq)->cfs;
+
+ if (parent_entity(se))
+ __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
+ cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
+ scale_load_down(root_cfs_rq->load.weight),
+ upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
+
+ root_cfs_rq->avg.util_avg += se->avg.util_avg;
+ root_cfs_rq->avg.util_sum += se->avg.util_sum;
cfs_rq_util_change(cfs_rq);
}
@@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
if (!entity_is_task(se))
return;
+ __update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
+ &se->avg, se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL);
+
rq_of(cfs_rq)->cfs.avg.util_avg =
max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
rq_of(cfs_rq)->cfs.avg.util_sum =
@@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
if (!entity_is_task(se))
return;
+ last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
+
+ __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
}
--
1.9.1
* [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Dietmar Eggemann @ 2016-06-01 19:39 UTC
To: Peter Zijlstra, linux-kernel
Cc: Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
The information about whether a se/cfs_rq should get its load and
utilization (se representing a task and root cfs_rq) or only its load
(se representing a task group and cfs_rq owned by this se) updated can
be passed into __update_load_avg() to avoid the additional if/else
condition to set update_util.
@running is changed to @update_util, which now carries the information
whether the utilization of the se/cfs_rq should be updated and whether
the se/cfs_rq is running or not.
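Concretely, @update_util packs two bits (see the upd_util_se() and
upd_util_cfs_rq() macros below):

	/*
	 * @update_util bit encoding:
	 *   bit 1 - utilization has to be maintained for this se/cfs_rq
	 *   bit 0 - the se/cfs_rq is currently running
	 *
	 * So (update_util == 0x3) means: maintain utilization && running.
	 */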
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ae8e79fb687..a1c13975cf56 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
+#define upd_util_cfs_rq(cfs_rq) \
+ (((&rq_of(cfs_rq)->cfs == cfs_rq) << 1) | !!cfs_rq->curr)
+
/*
* We can represent the historical contribution to runnable average as the
* coefficients of a geometric series. To do this we sub-divide our runnable
@@ -2699,13 +2703,12 @@ static u32 __compute_runnable_contrib(u64 n)
*/
static __always_inline int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
- unsigned long weight, int running, struct cfs_rq *cfs_rq)
+ unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
{
u64 delta, scaled_delta, periods;
u32 contrib;
unsigned int delta_w, scaled_delta_w, decayed = 0;
unsigned long scale_freq, scale_cpu;
- int update_util = 0;
delta = now - sa->last_update_time;
/*
@@ -2726,12 +2729,6 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
return 0;
sa->last_update_time = now;
- if (cfs_rq) {
- if (&rq_of(cfs_rq)->cfs == cfs_rq)
- update_util = 1;
- } else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
- update_util = 1;
-
scale_freq = arch_scale_freq_capacity(NULL, cpu);
scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
weight * scaled_delta_w;
}
}
- if (update_util && running)
+ if (update_util == 0x3)
sa->util_sum += scaled_delta_w * scale_cpu;
delta -= delta_w;
@@ -2781,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * contrib;
}
- if (update_util && running)
+ if (update_util == 0x3)
sa->util_sum += contrib * scale_cpu;
}
@@ -2792,7 +2789,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
if (cfs_rq)
cfs_rq->runnable_load_sum += weight * scaled_delta;
}
- if (update_util && running)
+ if (update_util == 0x3)
sa->util_sum += scaled_delta * scale_cpu;
sa->period_contrib += delta;
@@ -2803,7 +2800,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
cfs_rq->runnable_load_avg =
div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
}
- if (update_util)
+ if (update_util & 0x2)
sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
}
@@ -2873,7 +2870,7 @@ void set_task_rq_fair(struct sched_entity *se,
n_last_update_time = next->avg.last_update_time;
#endif
__update_load_avg(p_last_update_time, cpu_of(rq_of(prev)),
- &se->avg, 0, 0, NULL);
+ &se->avg, 0, upd_util_se(se, 0), NULL);
se->avg.last_update_time = n_last_update_time;
}
}
@@ -2935,7 +2932,8 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
}
decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
- scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
+ scale_load_down(cfs_rq->load.weight), upd_util_cfs_rq(cfs_rq),
+ cfs_rq);
#ifndef CONFIG_64BIT
smp_wmb();
@@ -2962,7 +2960,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
*/
__update_load_avg(now, cpu, &se->avg,
se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ upd_util_se(se, cfs_rq->curr == se), NULL);
if (update_cfs_rq_load_avg(now, cfs_rq, true) && update_tg)
update_tg_load_avg(cfs_rq, 0);
@@ -2981,7 +2979,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
*/
if (se->avg.last_update_time) {
__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
- &se->avg, 0, 0, NULL);
+ &se->avg, 0, upd_util_se(se, 0), NULL);
/*
* XXX: we could have just aged the entire load away if we've been
@@ -3015,7 +3013,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
{
__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
&se->avg, se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ upd_util_se(se, cfs_rq->curr == se), NULL);
cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
cfs_rq->avg.load_sum = max_t(s64, cfs_rq->avg.load_sum - se->avg.load_sum, 0);
@@ -3025,7 +3023,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
__update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
&se->avg, se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ upd_util_se(se, cfs_rq->curr == se), NULL);
rq_of(cfs_rq)->cfs.avg.util_avg =
max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
@@ -3047,7 +3045,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
if (!migrated) {
__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ upd_util_se(se, cfs_rq->curr == se), NULL);
}
decayed = update_cfs_rq_load_avg(now, cfs_rq, !migrated);
@@ -3113,7 +3111,8 @@ void remove_entity_load_avg(struct sched_entity *se)
last_update_time = cfs_rq_last_update_time(cfs_rq);
- __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+ __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0,
+ upd_util_se(se, 0), NULL);
atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
if (!entity_is_task(se))
@@ -3121,7 +3120,8 @@ void remove_entity_load_avg(struct sched_entity *se)
last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
- __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+ __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0,
+ upd_util_se(se, 0), NULL);
atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
}
--
1.9.1
* Re: [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
From: Peter Zijlstra @ 2016-06-01 20:10 UTC
To: Dietmar Eggemann
Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
On Wed, Jun 01, 2016 at 08:39:19PM +0100, Dietmar Eggemann wrote:
> This is an alternative approach to '[RFC PATCH v2] sched: reflect
> sched_entity movement into task_group's utilization' which requires
> '[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'
Awesome, a 3rd alternative :-)
* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Peter Zijlstra @ 2016-06-01 20:11 UTC
To: Dietmar Eggemann
Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
On Wed, Jun 01, 2016 at 08:39:22PM +0100, Dietmar Eggemann wrote:
> The information about whether a se/cfs_rq should get its load and
> utilization (se representing a task and root cfs_rq) or only its load
> (se representing a task group and cfs_rq owned by this se) updated can
> be passed into __update_load_avg() to avoid the additional if/else
> condition to set update_util.
>
> @running is changed to @update_util, which now carries the information
> whether the utilization of the se/cfs_rq should be updated and whether
> the se/cfs_rq is running or not.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
> 1 file changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3ae8e79fb687..a1c13975cf56 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>
> #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
Just saying that on first reading that went: Random Number Generator, uh
what?!
So maybe pick better names?
* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
From: Juri Lelli @ 2016-06-02 9:23 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
Hi,
minor comment below.
On 01/06/16 20:39, Dietmar Eggemann wrote:
> cpu utilization (cpu_util()) is defined as the root cfs_rq's
> cfs_rq->avg.util_avg signal, capped by the cpu's (original) capacity.
>
> With the current PELT version, the utilization of a task [en|de]queued
> on/from a cfs_rq representing a task group other than the root task group
> on a cpu is not immediately propagated down to the root cfs_rq.
>
> This makes decisions based on cpu_util() for scheduling or cpu frequency
> settings less accurate when tasks are running in task groups.
>
> This patch aggregates the task utilization only on the root cfs_rq,
> essentially avoiding maintaining utilization for a se/cfs_rq representing
> task groups other than the root task group (!entity_is_task(se) and
> &rq_of(cfs_rq)->cfs != cfs_rq).
>
> The additional if/else condition to set @update_util in
> __update_load_avg() is replaced in 'sched/fair: Change @running of
> __update_load_avg() to @update_util' by providing the information whether
> utilization has to be maintained via an argument to this function.
>
> The additional requirements for the alignment of the last_update_time of a
> se and the root cfs_rq are handled by the patch 'sched/fair: Sync se with
> root cfs_rq'.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 48 ++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 36 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..212becd3708f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> u32 contrib;
> unsigned int delta_w, scaled_delta_w, decayed = 0;
> unsigned long scale_freq, scale_cpu;
> + int update_util = 0;
>
> delta = now - sa->last_update_time;
> /*
> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> return 0;
> sa->last_update_time = now;
>
> + if (cfs_rq) {
> + if (&rq_of(cfs_rq)->cfs == cfs_rq)
Maybe we can wrap this sort of check in a static inline to improve
readability?
Best,
- Juri
* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Juri Lelli @ 2016-06-02 9:25 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
Hi,
another minor comment below. :-)
On 01/06/16 20:39, Dietmar Eggemann wrote:
> The information about whether a se/cfs_rq should get its load and
> utilization (se representing a task and root cfs_rq) or only its load
> (se representing a task group and cfs_rq owned by this se) updated can
> be passed into __update_load_avg() to avoid the additional if/else
> condition to set update_util.
>
> @running is changed to @update_util, which now carries the information
> whether the utilization of the se/cfs_rq should be updated and whether
> the se/cfs_rq is running or not.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
> 1 file changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3ae8e79fb687..a1c13975cf56 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>
> #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
> +#define upd_util_cfs_rq(cfs_rq) \
> + (((&rq_of(cfs_rq)->cfs == cfs_rq) << 1) | !!cfs_rq->curr)
> +
> /*
> * We can represent the historical contribution to runnable average as the
> * coefficients of a geometric series. To do this we sub-divide our runnable
> @@ -2699,13 +2703,12 @@ static u32 __compute_runnable_contrib(u64 n)
> */
> static __always_inline int
> __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> - unsigned long weight, int running, struct cfs_rq *cfs_rq)
> + unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
> {
> u64 delta, scaled_delta, periods;
> u32 contrib;
> unsigned int delta_w, scaled_delta_w, decayed = 0;
> unsigned long scale_freq, scale_cpu;
> - int update_util = 0;
>
> delta = now - sa->last_update_time;
> /*
> @@ -2726,12 +2729,6 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> return 0;
> sa->last_update_time = now;
>
> - if (cfs_rq) {
> - if (&rq_of(cfs_rq)->cfs == cfs_rq)
> - update_util = 1;
> - } else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
> - update_util = 1;
> -
> scale_freq = arch_scale_freq_capacity(NULL, cpu);
> scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
>
> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> weight * scaled_delta_w;
> }
> }
> - if (update_util && running)
> + if (update_util == 0x3)
How about a define for these masks?
Best,
- Juri
* Re: [RFC PATCH 0/3] Aggregate task utilization only on root cfs_rq
From: Dietmar Eggemann @ 2016-06-02 15:40 UTC
To: Peter Zijlstra
Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
On 01/06/16 21:10, Peter Zijlstra wrote:
> On Wed, Jun 01, 2016 at 08:39:19PM +0100, Dietmar Eggemann wrote:
>> This is an alternative approach to '[RFC PATCH v2] sched: reflect
>> sched_entity movement into task_group's utilization' which requires
>> '[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list'
>
> Awesome, a 3rd alternative :-)
Yeah, sorry, but at least it runs on tip/sched/core w/o any other PELT
related changes.
Moreover, it needs fewer code changes compared to the reflect-thing.
And Yuyang's implementation depended on the optimization of
__update_load_avg() (1us->1ms) which IMHO turned out not to work.
* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
From: Dietmar Eggemann @ 2016-06-02 15:53 UTC
To: Juri Lelli
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On 02/06/16 10:23, Juri Lelli wrote:
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e83db73..212becd3708f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> u32 contrib;
>> unsigned int delta_w, scaled_delta_w, decayed = 0;
>> unsigned long scale_freq, scale_cpu;
>> + int update_util = 0;
>>
>> delta = now - sa->last_update_time;
>> /*
>> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> return 0;
>> sa->last_update_time = now;
>>
>> + if (cfs_rq) {
>> + if (&rq_of(cfs_rq)->cfs == cfs_rq)
>
> Maybe we can wrap this sort of checks in a static inline improving
> readability?
Something like this?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..01b0fa689ef9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -251,6 +251,18 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
return cfs_rq->rq;
}
+/* cfs_rq "owned" by the root task group */
+static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
+{
+ return &rq_of(cfs_rq)->cfs;
+}
+
+/* Is this cfs_rq "owned" by the root task group ? */
+static inline bool rq_is_root(struct cfs_rq *cfs_rq)
+{
+ return root_rq_of(cfs_rq) == cfs_rq;
+}
+
/* An entity is a task if it doesn't "own" a runqueue */
#define entity_is_task(se) (!se->my_q)
@@ -376,6 +388,16 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
return container_of(cfs_rq, struct rq, cfs);
}
+static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq;
+}
+
+static inline bool rq_is_root(struct cfs_rq *cfs_rq)
+{
+ return true;
+}
+
#define entity_is_task(se) 1
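With these helpers, the check in __update_load_avg() from patch 1 could
then read (untested sketch):

	if (cfs_rq) {
		if (rq_is_root(cfs_rq))
			update_util = 1;
	} else if (entity_is_task(container_of(sa, struct sched_entity, avg)))
		update_util = 1;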
* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Dietmar Eggemann @ 2016-06-02 15:59 UTC
To: Peter Zijlstra
Cc: linux-kernel, Vincent Guittot, Ben Segall, Morten Rasmussen, Yuyang Du
On 01/06/16 21:11, Peter Zijlstra wrote:
> On Wed, Jun 01, 2016 at 08:39:22PM +0100, Dietmar Eggemann wrote:
>> The information about whether a se/cfs_rq should get its load and
>> utilization (se representing a task and root cfs_rq) or only its load
>> (se representing a task group and cfs_rq owned by this se) updated can
>> be passed into __update_load_avg() to avoid the additional if/else
>> condition to set update_util.
>>
>> @running is changed to @update_util, which now carries the information
>> whether the utilization of the se/cfs_rq should be updated and whether
>> the se/cfs_rq is running or not.
>>
>> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
>> ---
>> kernel/sched/fair.c | 42 +++++++++++++++++++++---------------------
>> 1 file changed, 21 insertions(+), 21 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3ae8e79fb687..a1c13975cf56 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2669,6 +2669,10 @@ static u32 __compute_runnable_contrib(u64 n)
>>
>> #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>>
>> +#define upd_util_se(se, rng) ((entity_is_task(se) << 1) | (rng))
>
> Just saying that on first reading that went: Random Number Generator, uh
> what?!
>
> So maybe pick better names?
Yeah, can do. What about?
#define update_util_se(se, running) ((entity_is_task(se) << 1) | (running))
#define update_util_rq(cfs_rq) ((rq_is_root(cfs_rq) << 1) | !!(cfs_rq)->curr)
* Re: [RFC PATCH 1/3] sched/fair: Aggregate task utilization only on root cfs_rq
From: Juri Lelli @ 2016-06-02 16:11 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On 02/06/16 16:53, Dietmar Eggemann wrote:
> On 02/06/16 10:23, Juri Lelli wrote:
>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 218f8e83db73..212becd3708f 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -2705,6 +2705,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >> u32 contrib;
> >> unsigned int delta_w, scaled_delta_w, decayed = 0;
> >> unsigned long scale_freq, scale_cpu;
> >> + int update_util = 0;
> >>
> >> delta = now - sa->last_update_time;
> >> /*
> >> @@ -2725,6 +2726,12 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >> return 0;
> >> sa->last_update_time = now;
> >>
> >> + if (cfs_rq) {
> >> + if (&rq_of(cfs_rq)->cfs == cfs_rq)
> >
> > Maybe we can wrap this sort of checks in a static inline improving
> > readability?
>
> Something like this?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..01b0fa689ef9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -251,6 +251,18 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
> return cfs_rq->rq;
> }
>
> +/* cfs_rq "owned" by the root task group */
> +static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
> +{
> + return &rq_of(cfs_rq)->cfs;
> +}
> +
> +/* Is this cfs_rq "owned" by the root task group ? */
> +static inline bool rq_is_root(struct cfs_rq *cfs_rq)
> +{
> + return root_rq_of(cfs_rq) == cfs_rq;
> +}
> +
> /* An entity is a task if it doesn't "own" a runqueue */
> #define entity_is_task(se) (!se->my_q)
>
> @@ -376,6 +388,16 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
> return container_of(cfs_rq, struct rq, cfs);
> }
>
> +static inline struct cfs_rq *root_rq_of(struct cfs_rq *cfs_rq)
> +{
> + return cfs_rq;
> +}
> +
> +static inline bool rq_is_root(struct cfs_rq *cfs_rq)
> +{
> + return true;
> +}
> +
> #define entity_is_task(se) 1
>
Looks good to me.
Thanks,
- Juri
* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Dietmar Eggemann @ 2016-06-02 17:27 UTC
To: Juri Lelli
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On 02/06/16 10:25, Juri Lelli wrote:
[...]
>> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>> weight * scaled_delta_w;
>> }
>> }
>> - if (update_util && running)
>> + if (update_util == 0x3)
>
> How about a define for these masks?
Something like this?
+#define UTIL_RUNNING 1
+#define UTIL_UPDATE 2
+
/*
* We can represent the historical contribution to runnable average as the
* coefficients of a geometric series. To do this we sub-divide our runnable
@@ -2724,7 +2727,7 @@ static u32 __compute_runnable_contrib(u64 n)
*/
static __always_inline int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
- unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
+ unsigned long weight, int util_flags, struct cfs_rq *cfs_rq)
{
u64 delta, scaled_delta, periods;
u32 contrib;
@@ -2775,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
weight * scaled_delta_w;
}
}
- if (update_util == 0x3)
+ if (util_flags == (UTIL_UPDATE | UTIL_RUNNING))
sa->util_sum += scaled_delta_w * scale_cpu;
...
* Re: [RFC PATCH 3/3] sched/fair: Change @running of __update_load_avg() to @update_util
From: Juri Lelli @ 2016-06-03 10:56 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On 02/06/16 18:27, Dietmar Eggemann wrote:
> On 02/06/16 10:25, Juri Lelli wrote:
>
> [...]
>
> >> @@ -2757,7 +2754,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> >> weight * scaled_delta_w;
> >> }
> >> }
> >> - if (update_util && running)
> >> + if (update_util == 0x3)
> >
> > How about a define for these masks?
>
> Something like this?
>
> +#define UTIL_RUNNING 1
> +#define UTIL_UPDATE 2
Make these 0x01 and 0x02, I'd say.
> +
> /*
> * We can represent the historical contribution to runnable average as the
> * coefficients of a geometric series. To do this we sub-divide our runnable
> @@ -2724,7 +2727,7 @@ static u32 __compute_runnable_contrib(u64 n)
> */
> static __always_inline int
> __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> - unsigned long weight, int update_util, struct cfs_rq *cfs_rq)
> + unsigned long weight, int util_flags, struct cfs_rq *cfs_rq)
> {
> u64 delta, scaled_delta, periods;
> u32 contrib;
> @@ -2775,7 +2778,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> weight * scaled_delta_w;
> }
> }
> - if (update_util == 0x3)
> + if (util_flags == (UTIL_UPDATE | UTIL_RUNNING))
Looks more readable to me. :-)
Best,
- Juri
* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
From: Leo Yan @ 2016-06-06 2:59 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On Wed, Jun 01, 2016 at 08:39:21PM +0100, Dietmar Eggemann wrote:
> Since task utilization is accrued only on the root cfs_rq, there are a
> couple of places where the se has to be synced with the root cfs_rq:
>
> (1) The root cfs_rq has to be updated in attach_entity_load_avg() for
> an se representing a task in a tg other than the root tg before
> the se utilization can be added to it.
>
> (2) The last_update_time value of the root cfs_rq can be higher
> than the one of the cfs_rq the se is enqueued in. Call
> __update_load_avg() on the se with the last_update_time value of
> the root cfs_rq before removing se's utilization from the root
> cfs_rq in [remove|detach]_entity_load_avg().
>
> In case the difference between the last_update_time value of the cfs_rq
> and the root cfs_rq is smaller than 1024ns, the additional calls to
> __update_load_avg() will bail early.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 212becd3708f..3ae8e79fb687 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>
> static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + struct cfs_rq *root_cfs_rq;
> +
> if (!sched_feat(ATTACH_AGE_LOAD))
> goto skip_aging;
>
> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> - rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> - rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> + root_cfs_rq = &rq_of(cfs_rq)->cfs;
> +
> + if (parent_entity(se))
> + __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
> + cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
> + scale_load_down(root_cfs_rq->load.weight),
> + upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
Maybe add the below macro in this patch to make it more readable:
#define upd_util_cfs_rq(cfs_rq) XXX
> +
> + root_cfs_rq->avg.util_avg += se->avg.util_avg;
> + root_cfs_rq->avg.util_sum += se->avg.util_sum;
>
> cfs_rq_util_change(cfs_rq);
> }
> @@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> + __update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + &se->avg, se->on_rq * scale_load_down(se->load.weight),
> + cfs_rq->curr == se, NULL);
> +
> rq_of(cfs_rq)->cfs.avg.util_avg =
> max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
> rq_of(cfs_rq)->cfs.avg.util_sum =
> @@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
> if (!entity_is_task(se))
> return;
>
> + last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
> +
> + __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
> atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
> }
>
> --
> 1.9.1
>
* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
From: Dietmar Eggemann @ 2016-06-06 8:45 UTC
To: Leo Yan
Cc: Peter Zijlstra, linux-kernel, Vincent Guittot, Ben Segall,
Morten Rasmussen, Yuyang Du
On 06/06/16 03:59, Leo Yan wrote:
> On Wed, Jun 01, 2016 at 08:39:21PM +0100, Dietmar Eggemann wrote:
[...]
>> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>> if (!entity_is_task(se))
>> return;
>>
>> - rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
>> - rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
>> + root_cfs_rq = &rq_of(cfs_rq)->cfs;
>> +
>> + if (parent_entity(se))
>> + __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
>> + cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
>> + scale_load_down(root_cfs_rq->load.weight),
>> + upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
>
> Maybe add below macro in this patch for more readable:
> #define upd_util_cfs_rq(cfs_rq) XXX
Sorry, this doesn't belong here since the macro is only introduced in
3/3. Should be 'root_cfs_rq->curr != NULL' here instead.
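I.e. the call would become (sketch):

	if (parent_entity(se))
		__update_load_avg(cfs_rq_clock_task(root_cfs_rq),
				  cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
				  scale_load_down(root_cfs_rq->load.weight),
				  root_cfs_rq->curr != NULL, root_cfs_rq);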
[...]
* Re: [RFC PATCH 2/3] sched/fair: Sync se with root cfs_rq
From: Vincent Guittot @ 2016-06-06 12:11 UTC
To: Dietmar Eggemann
Cc: Peter Zijlstra, linux-kernel, Ben Segall, Morten Rasmussen, Yuyang Du
Hi Dietmar,
On 1 June 2016 at 21:39, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> Since task utilization is accrued only on the root cfs_rq, there are a
> couple of places where the se has to be synced with the root cfs_rq:
>
> (1) The root cfs_rq has to be updated in attach_entity_load_avg() for
> an se representing a task in a tg other than the root tg before
> the se utilization can be added to it.
>
> (2) The last_update_time value of the root cfs_rq can be higher
> than the one of the cfs_rq the se is enqueued in. Call
> __update_load_avg() on the se with the last_update_time value of
> the root cfs_rq before removing se's utilization from the root
> cfs_rq in [remove|detach]_entity_load_avg().
>
> In case the difference between the last_update_time value of the cfs_rq
> and the root cfs_rq is smaller than 1024ns, the additional calls to
> __update_load_avg() will bail early.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
> kernel/sched/fair.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 212becd3708f..3ae8e79fb687 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2970,6 +2970,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
>
> static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> + struct cfs_rq *root_cfs_rq;
> +
> if (!sched_feat(ATTACH_AGE_LOAD))
> goto skip_aging;
>
> @@ -2995,8 +2997,16 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> - rq_of(cfs_rq)->cfs.avg.util_avg += se->avg.util_avg;
> - rq_of(cfs_rq)->cfs.avg.util_sum += se->avg.util_sum;
> + root_cfs_rq = &rq_of(cfs_rq)->cfs;
> +
> + if (parent_entity(se))
> + __update_load_avg(cfs_rq_clock_task(root_cfs_rq),
> + cpu_of(rq_of(root_cfs_rq)), &root_cfs_rq->avg,
> + scale_load_down(root_cfs_rq->load.weight),
> + upd_util_cfs_rq(root_cfs_rq), root_cfs_rq);
> +
> + root_cfs_rq->avg.util_avg += se->avg.util_avg;
> + root_cfs_rq->avg.util_sum += se->avg.util_sum;
The main issue with flat utilization is that we can't keep the
sched_avg of a sched_entity synced (from a last_update_time pov) with
both the cfs_rq on which its load is attached and the root cfs_rq on
which its utilization is attached.
With this additional sync to the root cfs_rq in
attach/detach_entity_load_avg and in remove_entity_load_avg, the load
of a sched_entity is no longer synced to the time stamp of the cfs_rq
onto which it is attached. This can generate several wrong updates of
the load of the latter.
As an example, let's take a task TA that sleeps and move it onto a task
group TGB which has not run recently, so TGB.avg.last_update_time <<
root cfs_rq.avg.last_update_time (with the 32ms PELT half-life, a decay
of 20ms removes 1 - 0.5^(20/32), i.e. ~35%, of the load).
When we attach TA to TGB, TA is synced with TGB for attaching it and
then decayed to be synced with the root cfs_rq.
If TA is then moved to another task group, we try to sync TA to TGB,
but TA is in the future, so TA.avg.last_update_time is set to TGB's.
Then, TA's load is removed from TGB, but TA's load has been decayed, so
only a part of it will effectively be subtracted. Then, TA's load is
synced with the root cfs_rq, which means it is decayed one more time
for the same time slot, because TA.avg.last_update_time has been reset
to TGB.avg.last_update_time, so we will subtract less utilization than
we should from the root cfs_rq.
I think that similar behavior can apply to the removed load.
>
> cfs_rq_util_change(cfs_rq);
> }
> @@ -3013,6 +3023,10 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> if (!entity_is_task(se))
> return;
>
> + __update_load_avg(rq_of(cfs_rq)->cfs.avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> + &se->avg, se->on_rq * scale_load_down(se->load.weight),
> + cfs_rq->curr == se, NULL);
> +
> rq_of(cfs_rq)->cfs.avg.util_avg =
> max_t(long, rq_of(cfs_rq)->cfs.avg.util_avg - se->avg.util_avg, 0);
> rq_of(cfs_rq)->cfs.avg.util_sum =
> @@ -3105,6 +3119,9 @@ void remove_entity_load_avg(struct sched_entity *se)
> if (!entity_is_task(se))
> return;
>
> + last_update_time = cfs_rq_last_update_time(&rq_of(cfs_rq)->cfs);
> +
> + __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
> atomic_long_add(se->avg.util_avg, &rq_of(cfs_rq)->cfs.removed_util_avg);
> }
>
> --
> 1.9.1
>