[PATCH] sched/pelt: fix false running accounting in PELT

* [PATCH] sched/pelt: fix false running accounting in PELT
@ 2017-06-30 13:58 Vincent Guittot
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vincent Guittot @ 2017-06-30 13:58 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, Vincent Guittot

The running state is a subset of runnable state which means that running
can't be set if runnable (weight) is cleared. There are corner cases
where the current sched_entity has been already dequeued but cfs_rq->curr
has not been updated yet and still points to the dequeued sched_entity.
If ___update_load_avg is called at that time, weight will be 0 and running
will be set which is not possible.

This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
The current sched_entity has been dequeued so se->on_rq is cleared and
cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
cleared when picking the idle thread). Because the cfs_rq becomes idle,
idle_balance() is called and ends up to call update_blocked_averages()
with these wrong running and runnable states.

Add a test in ___update_load_avg to correct the running state in this case.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 008c514..5fdcb42 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	sa->last_update_time += delta << 10;
 
 	/*
+	 * running is a subset of runnable (weight) so running can't be set if
+	 * runnable is clear. But there are some corner cases where the current
+	 * se has been already dequeued but cfs_rq->curr still points to it.
+	 * This means that weight will be 0 but not running for a sched_entity
+	 * but also for a cfs_rq if the latter becomes idle. As an example,
+	 * this happens during idle_balance() which calls
+	 * update_blocked_averages()
+	 */
+	if (weight)
+		running = 1;
+
+	/*
+	 * Scale time to reflect the amount a computation effectively done
+	 * during the time slot at current capacity
+	 */
+	delta = scale_time(delta, cpu, sa, weight, running);
+
+	/*
 	 * Now we know we crossed measurement unit boundaries. The *_avg
 	 * accrues by two steps:
 	 *
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread