linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/pelt: fix false running accounting in PELT
@ 2017-06-30 13:58 Vincent Guittot
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vincent Guittot @ 2017-06-30 13:58 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, Vincent Guittot

The running state is a subset of runnable state which means that running
can't be set if runnable (weight) is cleared. There are corner cases
where the current sched_entity has been already dequeued but cfs_rq->curr
has not been updated yet and still points to the dequeued sched_entity.
If ___update_load_avg is called at that time, weight will be 0 and running
will be set which is not possible.

This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
The current sched_entity has been dequeued so se->on_rq is cleared and
cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
cleared when picking the idle thread). Because the cfs_rq becomes idle,
idle_balance() is called and ends up to call update_blocked_averages()
with these wrong running and runnable states.

Add a test in ___update_load_avg to correct the running state in this case.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 008c514..5fdcb42 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	sa->last_update_time += delta << 10;
 
 	/*
+	 * running is a subset of runnable (weight) so running can't be set if
+	 * runnable is clear. But there are some corner cases where the current
+	 * se has been already dequeued but cfs_rq->curr still points to it.
+	 * This means that weight will be 0 but not running for a sched_entity
+	 * but also for a cfs_rq if the latter becomes idle. As an example,
+	 * this happens during idle_balance() which calls
+	 * update_blocked_averages()
+	 */
+	if (weight)
+		running = 1;
+
+	/*
+	 * Scale time to reflect the amount a computation effectively done
+	 * during the time slot at current capacity
+	 */
+	delta = scale_time(delta, cpu, sa, weight, running);
+
+	/*
 	 * Now we know we crossed measurement unit boundaries. The *_avg
 	 * accrues by two steps:
 	 *
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2] sched/pelt: fix false running accounting
  2017-06-30 13:58 [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
@ 2017-07-01  5:06 ` Vincent Guittot
  2017-07-04  7:27   ` Peter Zijlstra
  2017-08-10 12:07   ` [tip:sched/core] sched/pelt: Fix " tip-bot for Vincent Guittot
  2017-07-01  5:09 ` [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
  2017-07-01 16:57 ` kbuild test robot
  2 siblings, 2 replies; 12+ messages in thread
From: Vincent Guittot @ 2017-07-01  5:06 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, Vincent Guittot

The running state is a subset of runnable state which means that running
can't be set if runnable (weight) is cleared. There are corner cases
where the current sched_entity has been already dequeued but cfs_rq->curr
has not been updated yet and still points to the dequeued sched_entity.
If ___update_load_avg is called at that time, weight will be 0 and running
will be set which is not possible.

This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
The current sched_entity has been dequeued so se->on_rq is cleared and
cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
cleared when picking the idle thread). Because the cfs_rq becomes idle,
idle_balance() is called and ends up to call update_blocked_averages()
with these wrong running and runnable states.

Add a test in ___update_load_avg to correct the running state in this case.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 008c514..bc36a75 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2968,6 +2968,18 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	sa->last_update_time += delta << 10;
 
 	/*
+	 * running is a subset of runnable (weight) so running can't be set if
+	 * runnable is clear. But there are some corner cases where the current
+	 * se has been already dequeued but cfs_rq->curr still points to it.
+	 * This means that weight will be 0 but not running for a sched_entity
+	 * but also for a cfs_rq if the latter becomes idle. As an example,
+	 * this happens during idle_balance() which calls
+	 * update_blocked_averages()
+	 */
+	if (!weight)
+		running = 0;
+
+	/*
 	 * Now we know we crossed measurement unit boundaries. The *_avg
 	 * accrues by two steps:
 	 *
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] sched/pelt: fix false running accounting in PELT
  2017-06-30 13:58 [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
@ 2017-07-01  5:09 ` Vincent Guittot
  2017-07-01 16:57 ` kbuild test robot
  2 siblings, 0 replies; 12+ messages in thread
From: Vincent Guittot @ 2017-07-01  5:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel
  Cc: Dietmar Eggemann, Morten Rasmussen, Vincent Guittot

On 30 June 2017 at 15:58, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> The running state is a subset of runnable state which means that running
> can't be set if runnable (weight) is cleared. There are corner cases
> where the current sched_entity has been already dequeued but cfs_rq->curr
> has not been updated yet and still points to the dequeued sched_entity.
> If ___update_load_avg is called at that time, weight will be 0 and running
> will be set which is not possible.
>
> This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
> The current sched_entity has been dequeued so se->on_rq is cleared and
> cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
> cleared when picking the idle thread). Because the cfs_rq becomes idle,
> idle_balance() is called and ends up to call update_blocked_averages()
> with these wrong running and runnable states.
>
> Add a test in ___update_load_avg to correct the running state in this case.*
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

The v1 is just wrong. I have sent a v2 with correct patch

sorry for the disturb
Vincent

> ---
>  kernel/sched/fair.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 008c514..5fdcb42 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>         sa->last_update_time += delta << 10;
>
>         /*
> +        * running is a subset of runnable (weight) so running can't be set if
> +        * runnable is clear. But there are some corner cases where the current
> +        * se has been already dequeued but cfs_rq->curr still points to it.
> +        * This means that weight will be 0 but not running for a sched_entity
> +        * but also for a cfs_rq if the latter becomes idle. As an example,
> +        * this happens during idle_balance() which calls
> +        * update_blocked_averages()
> +        */
> +       if (weight)
> +               running = 1;
> +
> +       /*
> +        * Scale time to reflect the amount a computation effectively done
> +        * during the time slot at current capacity
> +        */
> +       delta = scale_time(delta, cpu, sa, weight, running);
> +
> +       /*
>          * Now we know we crossed measurement unit boundaries. The *_avg
>          * accrues by two steps:
>          *
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] sched/pelt: fix false running accounting in PELT
  2017-06-30 13:58 [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
  2017-07-01  5:09 ` [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
@ 2017-07-01 16:57 ` kbuild test robot
  2 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2017-07-01 16:57 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: kbuild-all, peterz, mingo, linux-kernel, dietmar.eggemann,
	Morten.Rasmussen, Vincent Guittot

[-- Attachment #1: Type: text/plain, Size: 1424 bytes --]

Hi Vincent,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.12-rc7 next-20170630]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vincent-Guittot/sched-pelt-fix-false-running-accounting-in-PELT/20170702-002907
config: i386-randconfig-x078-07012120 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/sched/fair.c: In function '___update_load_avg':
>> kernel/sched/fair.c:2986:10: error: implicit declaration of function 'scale_time' [-Werror=implicit-function-declaration]
     delta = scale_time(delta, cpu, sa, weight, running);
             ^~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/scale_time +2986 kernel/sched/fair.c

  2980			running = 1;
  2981	
  2982		/*
  2983		 * Scale time to reflect the amount a computation effectively done
  2984		 * during the time slot at current capacity
  2985		 */
> 2986		delta = scale_time(delta, cpu, sa, weight, running);
  2987	
  2988		/*
  2989		 * Now we know we crossed measurement unit boundaries. The *_avg

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22764 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
@ 2017-07-04  7:27   ` Peter Zijlstra
  2017-07-04  7:37     ` Vincent Guittot
  2017-07-04  8:34     ` Peter Zijlstra
  2017-08-10 12:07   ` [tip:sched/core] sched/pelt: Fix " tip-bot for Vincent Guittot
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-07-04  7:27 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: mingo, linux-kernel, dietmar.eggemann, Morten.Rasmussen

On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
> The running state is a subset of runnable state which means that running
> can't be set if runnable (weight) is cleared. There are corner cases
> where the current sched_entity has been already dequeued but cfs_rq->curr
> has not been updated yet and still points to the dequeued sched_entity.
> If ___update_load_avg is called at that time, weight will be 0 and running
> will be set which is not possible.
> 
> This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
> The current sched_entity has been dequeued so se->on_rq is cleared and
> cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
> cleared when picking the idle thread). Because the cfs_rq becomes idle,
> idle_balance() is called and ends up to call update_blocked_averages()
> with these wrong running and runnable states.
> 
> Add a test in ___update_load_avg to correct the running state in this case.

Cute, however did you find that ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  7:27   ` Peter Zijlstra
@ 2017-07-04  7:37     ` Vincent Guittot
  2017-07-04  8:34     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Vincent Guittot @ 2017-07-04  7:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Dietmar Eggemann, Morten Rasmussen

On 4 July 2017 at 09:27, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
>> The running state is a subset of runnable state which means that running
>> can't be set if runnable (weight) is cleared. There are corner cases
>> where the current sched_entity has been already dequeued but cfs_rq->curr
>> has not been updated yet and still points to the dequeued sched_entity.
>> If ___update_load_avg is called at that time, weight will be 0 and running
>> will be set which is not possible.
>>
>> This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
>> The current sched_entity has been dequeued so se->on_rq is cleared and
>> cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
>> cleared when picking the idle thread). Because the cfs_rq becomes idle,
>> idle_balance() is called and ends up to call update_blocked_averages()
>> with these wrong running and runnable states.
>>
>> Add a test in ___update_load_avg to correct the running state in this case.
>
> Cute, however did you find that ?

In fact, while rebasing and running more tests on my patch "update
scale invariance of PELT" that changes how to scale the load and
utilization, I have seen that sometimes the utilization was increasing
but not the load when CPU was going into idle state because the
stolen_idle time was applied as idle time for load but running time
for utilization. This patch has highlighted the problem.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  7:27   ` Peter Zijlstra
  2017-07-04  7:37     ` Vincent Guittot
@ 2017-07-04  8:34     ` Peter Zijlstra
  2017-07-04  9:12       ` Vincent Guittot
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2017-07-04  8:34 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: mingo, linux-kernel, dietmar.eggemann, Morten.Rasmussen

On Tue, Jul 04, 2017 at 09:27:07AM +0200, Peter Zijlstra wrote:
> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
> > The running state is a subset of runnable state which means that running
> > can't be set if runnable (weight) is cleared. There are corner cases
> > where the current sched_entity has been already dequeued but cfs_rq->curr
> > has not been updated yet and still points to the dequeued sched_entity.
> > If ___update_load_avg is called at that time, weight will be 0 and running
> > will be set which is not possible.
> > 
> > This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
> > The current sched_entity has been dequeued so se->on_rq is cleared and
> > cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
> > cleared when picking the idle thread). Because the cfs_rq becomes idle,
> > idle_balance() is called and ends up to call update_blocked_averages()
> > with these wrong running and runnable states.
> > 
> > Add a test in ___update_load_avg to correct the running state in this case.
> 
> Cute, however did you find that ?

Hmm,.. could you give a little more detail?

Because if ->on_rq=0, we'll have done dequeue_task() which will have
done update_curr() with ->on_rq, weight and ->running consistently.

Then the above, inconsistent update should not happen, because delta=0.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  8:34     ` Peter Zijlstra
@ 2017-07-04  9:12       ` Vincent Guittot
  2017-07-04  9:44         ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Vincent Guittot @ 2017-07-04  9:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Dietmar Eggemann, Morten Rasmussen

On 4 July 2017 at 10:34, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jul 04, 2017 at 09:27:07AM +0200, Peter Zijlstra wrote:
>> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
>> > The running state is a subset of runnable state which means that running
>> > can't be set if runnable (weight) is cleared. There are corner cases
>> > where the current sched_entity has been already dequeued but cfs_rq->curr
>> > has not been updated yet and still points to the dequeued sched_entity.
>> > If ___update_load_avg is called at that time, weight will be 0 and running
>> > will be set which is not possible.
>> >
>> > This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
>> > The current sched_entity has been dequeued so se->on_rq is cleared and
>> > cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
>> > cleared when picking the idle thread). Because the cfs_rq becomes idle,
>> > idle_balance() is called and ends up to call update_blocked_averages()
>> > with these wrong running and runnable states.
>> >
>> > Add a test in ___update_load_avg to correct the running state in this case.
>>
>> Cute, however did you find that ?
>
> Hmm,.. could you give a little more detail?
>
> Because if ->on_rq=0, we'll have done dequeue_task() which will have
> done update_curr() with ->on_rq, weight and ->running consistently.
>
> Then the above, inconsistent update should not happen, because delta=0.

In fact, the delta between dequeue_entity_load_avg() and
update_blocked_averages() is not 0 on my platform (hikey) but can be
longer than 60us (at lowest frequency with only 1 task group level)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  9:12       ` Vincent Guittot
@ 2017-07-04  9:44         ` Peter Zijlstra
  2017-07-04  9:57           ` Vincent Guittot
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2017-07-04  9:44 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, linux-kernel, Dietmar Eggemann, Morten Rasmussen

On Tue, Jul 04, 2017 at 11:12:34AM +0200, Vincent Guittot wrote:
> On 4 July 2017 at 10:34, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Jul 04, 2017 at 09:27:07AM +0200, Peter Zijlstra wrote:
> >> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
> >> > The running state is a subset of runnable state which means that running
> >> > can't be set if runnable (weight) is cleared. There are corner cases
> >> > where the current sched_entity has been already dequeued but cfs_rq->curr
> >> > has not been updated yet and still points to the dequeued sched_entity.
> >> > If ___update_load_avg is called at that time, weight will be 0 and running
> >> > will be set which is not possible.
> >> >
> >> > This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
> >> > The current sched_entity has been dequeued so se->on_rq is cleared and
> >> > cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
> >> > cleared when picking the idle thread). Because the cfs_rq becomes idle,
> >> > idle_balance() is called and ends up to call update_blocked_averages()
> >> > with these wrong running and runnable states.
> >> >
> >> > Add a test in ___update_load_avg to correct the running state in this case.
> >>
> >> Cute, however did you find that ?
> >
> > Hmm,.. could you give a little more detail?
> >
> > Because if ->on_rq=0, we'll have done dequeue_task() which will have
> > done update_curr() with ->on_rq, weight and ->running consistently.
> >
> > Then the above, inconsistent update should not happen, because delta=0.
> 
> In fact, the delta between dequeue_entity_load_avg() and
> update_blocked_averages() is not 0 on my platform (hikey) but can be
> longer than 60us (at lowest frequency with only 1 task group level)

But but but, how can that happen? Should it not all be under the same
rq->lock and thus have only a single update_rq_clock() and thus be at
the same 'instant' ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  9:44         ` Peter Zijlstra
@ 2017-07-04  9:57           ` Vincent Guittot
  2017-07-04 11:07             ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Vincent Guittot @ 2017-07-04  9:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Dietmar Eggemann, Morten Rasmussen

On 4 July 2017 at 11:44, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jul 04, 2017 at 11:12:34AM +0200, Vincent Guittot wrote:
>> On 4 July 2017 at 10:34, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Jul 04, 2017 at 09:27:07AM +0200, Peter Zijlstra wrote:
>> >> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
>> >> > The running state is a subset of runnable state which means that running
>> >> > can't be set if runnable (weight) is cleared. There are corner cases
>> >> > where the current sched_entity has been already dequeued but cfs_rq->curr
>> >> > has not been updated yet and still points to the dequeued sched_entity.
>> >> > If ___update_load_avg is called at that time, weight will be 0 and running
>> >> > will be set which is not possible.
>> >> >
>> >> > This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
>> >> > The current sched_entity has been dequeued so se->on_rq is cleared and
>> >> > cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
>> >> > cleared when picking the idle thread). Because the cfs_rq becomes idle,
>> >> > idle_balance() is called and ends up to call update_blocked_averages()
>> >> > with these wrong running and runnable states.
>> >> >
>> >> > Add a test in ___update_load_avg to correct the running state in this case.
>> >>
>> >> Cute, however did you find that ?
>> >
>> > Hmm,.. could you give a little more detail?
>> >
>> > Because if ->on_rq=0, we'll have done dequeue_task() which will have
>> > done update_curr() with ->on_rq, weight and ->running consistently.
>> >
>> > Then the above, inconsistent update should not happen, because delta=0.
>>
>> In fact, the delta between dequeue_entity_load_avg() and
>> update_blocked_averages() is not 0 on my platform (hikey) but can be
>> longer than 60us (at lowest frequency with only 1 task group level)
>
> But but but, how can that happen? Should it not all be under the same
> rq->lock and thus have only a single update_rq_clock() and thus be at
> the same 'instant' ?

idle_balance() unlock rq->lock before calling  update_blocked_averages
And update_blocked_averages() starts by calling update_rq_clock()

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] sched/pelt: fix false running accounting
  2017-07-04  9:57           ` Vincent Guittot
@ 2017-07-04 11:07             ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2017-07-04 11:07 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, linux-kernel, Dietmar Eggemann, Morten Rasmussen

On Tue, Jul 04, 2017 at 11:57:12AM +0200, Vincent Guittot wrote:
> On 4 July 2017 at 11:44, Peter Zijlstra <peterz@infradead.org> wrote:

> > But but but, how can that happen? Should it not all be under the same
> > rq->lock and thus have only a single update_rq_clock() and thus be at
> > the same 'instant' ?
> 
> idle_balance() unlock rq->lock before calling  update_blocked_averages
> And update_blocked_averages() starts by calling update_rq_clock()

Ah indeed. Might want to clarify that point.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tip:sched/core] sched/pelt: Fix false running accounting
  2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
  2017-07-04  7:27   ` Peter Zijlstra
@ 2017-08-10 12:07   ` tip-bot for Vincent Guittot
  1 sibling, 0 replies; 12+ messages in thread
From: tip-bot for Vincent Guittot @ 2017-08-10 12:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, tglx, mingo, torvalds, hpa, vincent.guittot, linux-kernel

Commit-ID:  f235a54f00449c611f85173fe8a66c4d189c5ce1
Gitweb:     http://git.kernel.org/tip/f235a54f00449c611f85173fe8a66c4d189c5ce1
Author:     Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Sat, 1 Jul 2017 07:06:13 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Aug 2017 12:18:15 +0200

sched/pelt: Fix false running accounting

The running state is a subset of runnable state which means that running
can't be set if runnable (weight) is cleared. There are corner cases
where the current sched_entity has been already dequeued but cfs_rq->curr
has not been updated yet and still points to the dequeued sched_entity.
If ___update_load_avg() is called at that time, weight will be 0 and running
will be set which is not possible.

This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
The current sched_entity has been dequeued so se->on_rq is cleared and
cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
cleared when picking the idle thread). Because the cfs_rq becomes idle,
idle_balance() is called and ends up to call update_blocked_averages()
with these wrong running and runnable states.

Add a test in ___update_load_avg() to correct the running state in this case.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten.Rasmussen@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Link: http://lkml.kernel.org/r/1498885573-18984-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 75c58c7..ef5b66b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2991,6 +2991,18 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	sa->last_update_time += delta << 10;
 
 	/*
+	 * running is a subset of runnable (weight) so running can't be set if
+	 * runnable is clear. But there are some corner cases where the current
+	 * se has been already dequeued but cfs_rq->curr still points to it.
+	 * This means that weight will be 0 but not running for a sched_entity
+	 * but also for a cfs_rq if the latter becomes idle. As an example,
+	 * this happens during idle_balance() which calls
+	 * update_blocked_averages()
+	 */
+	if (!weight)
+		running = 0;
+
+	/*
 	 * Now we know we crossed measurement unit boundaries. The *_avg
 	 * accrues by two steps:
 	 *

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-08-10 12:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-30 13:58 [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
2017-07-01  5:06 ` [PATCH v2] sched/pelt: fix false running accounting Vincent Guittot
2017-07-04  7:27   ` Peter Zijlstra
2017-07-04  7:37     ` Vincent Guittot
2017-07-04  8:34     ` Peter Zijlstra
2017-07-04  9:12       ` Vincent Guittot
2017-07-04  9:44         ` Peter Zijlstra
2017-07-04  9:57           ` Vincent Guittot
2017-07-04 11:07             ` Peter Zijlstra
2017-08-10 12:07   ` [tip:sched/core] sched/pelt: Fix " tip-bot for Vincent Guittot
2017-07-01  5:09 ` [PATCH] sched/pelt: fix false running accounting in PELT Vincent Guittot
2017-07-01 16:57 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).