All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
@ 2016-04-01 19:42 Chris Metcalf
  2016-04-04 19:12 ` Rik van Riel
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Metcalf @ 2016-04-01 19:42 UTC (permalink / raw)
  To: Frederic Weisbecker, Christoph Lameter, Ingo Molnar,
	Luiz Capitulino, Peter Zijlstra, Rik van Riel, Thomas Gleixner,
	Viresh Kumar, linux-kernel
  Cc: Chris Metcalf

On arm64, when calling enqueue_task_fair() from migration_cpu_stop(),
we find the nr_running value updated by add_nr_running(), but the
cfs.nr_running value has not always yet been updated.  Accordingly,
the sched_can_stop_tick() false returns true when we are migrating a
second task onto a core.

Correct this by using rq->nr_running instead of rq->cfs.nr_running.
This should always be more conservative, and reverts the test to the
form it had before commit 76d92ac305f2 ("sched: Migrate sched to use
new tick dependency mask model").

Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
---
I found this bug because I had a program running in nohz_full
on a core, and from a different core I called sched_setaffinity()
to force that task onto the nohz_full core, but I did not end up with
a kick to the nohz_full core, so tick-based scheduling did not start.
This is probably bad enough that we should fix it for 4.6.

Strangely, for some reason, the existing code worked correctly for me
for tilegx, but not for arm64.  I see that the enqueue_task_fair()
code calls enqueue_entity(), which calls account_entity_enqueue() to
adjust cfs.nr_running.  That seemed to happen on tilegx, but not arm64.
Perhaps there is some difference in how the sched_entity stuff is done,
but frankly that took me a little deeper into the CFS stuff than I was
willing to dive in this moment.

I could also argue that sched/core.c shouldn't have a lot of CFS
stuff in it anyway, and if we view the FIFO/RR stuff as handling the
real special cases in sched_can_stop_tick() anyway, then just checking
the core nr_running feels like the right thing to do regardless.

 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 00649f7ad567..1737d63c65fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -599,7 +599,7 @@ bool sched_can_stop_tick(struct rq *rq)
 	}
 
 	/* Normal multitasking need periodic preemption checks */
-	if (rq->cfs.nr_running > 1)
+	if (rq->nr_running > 1)
 		return false;
 
 	return true;
-- 
2.7.2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-01 19:42 [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
@ 2016-04-04 19:12 ` Rik van Riel
  2016-04-04 19:23   ` Peter Zijlstra
  2016-04-04 19:31   ` [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
  0 siblings, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2016-04-04 19:12 UTC (permalink / raw)
  To: Chris Metcalf, Frederic Weisbecker, Christoph Lameter,
	Ingo Molnar, Luiz Capitulino, Peter Zijlstra, Thomas Gleixner,
	Viresh Kumar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3100 bytes --]

On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote:
> On arm64, when calling enqueue_task_fair() from migration_cpu_stop(),
> we find the nr_running value updated by add_nr_running(), but the
> cfs.nr_running value has not always yet been updated.  Accordingly,
> the sched_can_stop_tick() false returns true when we are migrating a
> second task onto a core.

I don't get it.

Looking at the enqueue_task_fair(), I see this:

        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);
                cfs_rq->h_nr_running++;
		...
	}

        if (!se)
                add_nr_running(rq, 1);

What is the difference between cfs_rq->h_nr_running,
and rq->cfs.nr_running?

Why do we have two?

Are we simply testing against the wrong one in
sched_can_stop_tick?

> Correct this by using rq->nr_running instead of rq->cfs.nr_running.
> This should always be more conservative, and reverts the test to the
> form it had before commit 76d92ac305f2 ("sched: Migrate sched to use
> new tick dependency mask model").

That would cause us to run the timer tick while running
a single SCHED_RR real time task, with a single
SCHED_OTHER task sitting in the background (which will
not get run until the SCHED_RR task is done).

I don't think that is the quite behaviour we want.

> Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
> ---
> I found this bug because I had a program running in nohz_full
> on a core, and from a different core I called sched_setaffinity()
> to force that task onto the nohz_full core, but I did not end up with
> a kick to the nohz_full core, so tick-based scheduling did not start.
> This is probably bad enough that we should fix it for 4.6.
> 
> Strangely, for some reason, the existing code worked correctly for me
> for tilegx, but not for arm64.  I see that the enqueue_task_fair()
> code calls enqueue_entity(), which calls account_entity_enqueue() to
> adjust cfs.nr_running.  That seemed to happen on tilegx, but not
> arm64.
> Perhaps there is some difference in how the sched_entity stuff is
> done,
> but frankly that took me a little deeper into the CFS stuff than I
> was
> willing to dive in this moment.
> 
> I could also argue that sched/core.c shouldn't have a lot of CFS
> stuff in it anyway, and if we view the FIFO/RR stuff as handling the
> real special cases in sched_can_stop_tick() anyway, then just
> checking
> the core nr_running feels like the right thing to do regardless.
> 
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 00649f7ad567..1737d63c65fa 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -599,7 +599,7 @@ bool sched_can_stop_tick(struct rq *rq)
>  	}
>  
>  	/* Normal multitasking need periodic preemption checks */
> -	if (rq->cfs.nr_running > 1)
> +	if (rq->nr_running > 1)
>  		return false;
>  
>  	return true;
-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-04 19:12 ` Rik van Riel
@ 2016-04-04 19:23   ` Peter Zijlstra
  2016-04-18  2:00     ` Wanpeng Li
  2016-04-04 19:31   ` [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-04 19:23 UTC (permalink / raw)
  To: Rik van Riel, Chris Metcalf, Frederic Weisbecker,
	Christoph Lameter, Ingo Molnar, Luiz Capitulino, Thomas Gleixner,
	Viresh Kumar, linux-kernel



On 4 April 2016 21:12:23 CEST, Rik van Riel <riel@redhat.com> wrote:

>What is the difference between cfs_rq->h_nr_running,
>and rq->cfs.nr_running?
>
>Why do we have two?


H is for hierarchy. That counts the total of runnable tasks in the entire child hierarchy. Nr_running is the number of se  entities in the current tree.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-04 19:12 ` Rik van Riel
  2016-04-04 19:23   ` Peter Zijlstra
@ 2016-04-04 19:31   ` Chris Metcalf
  2016-04-04 19:36     ` Rik van Riel
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Metcalf @ 2016-04-04 19:31 UTC (permalink / raw)
  To: Rik van Riel, Frederic Weisbecker, Christoph Lameter,
	Ingo Molnar, Luiz Capitulino, Peter Zijlstra, Thomas Gleixner,
	Viresh Kumar, linux-kernel

On 4/4/2016 3:12 PM, Rik van Riel wrote:
> On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote:
>> On arm64, when calling enqueue_task_fair() from migration_cpu_stop(),
>> we find the nr_running value updated by add_nr_running(), but the
>> cfs.nr_running value has not always yet been updated.  Accordingly,
>> the sched_can_stop_tick() false returns true when we are migrating a
>> second task onto a core.
> I don't get it.
>
> Looking at the enqueue_task_fair(), I see this:
>
>          for_each_sched_entity(se) {
>                  cfs_rq = cfs_rq_of(se);
>                  cfs_rq->h_nr_running++;
> 		...
> 	}
>
>          if (!se)
>                  add_nr_running(rq, 1);
>
> What is the difference between cfs_rq->h_nr_running,
> and rq->cfs.nr_running?
>
> Why do we have two?
> Are we simply testing against the wrong one in
> sched_can_stop_tick?

It seems that using the non-CFS one is what we want.  I don't know whether
using a different CFS count instead might be more correct.

Since I'm not sure what causes the difference I see between tile (correct)
and arm64 (incorrect) it's hard for me to speculate.

>> Correct this by using rq->nr_running instead of rq->cfs.nr_running.
>> This should always be more conservative, and reverts the test to the
>> form it had before commit 76d92ac305f2 ("sched: Migrate sched to use
>> new tick dependency mask model").
> That would cause us to run the timer tick while running
> a single SCHED_RR real time task, with a single
> SCHED_OTHER task sitting in the background (which will
> not get run until the SCHED_RR task is done).

No, because in sched_can_stop_tick(), we first handle the special
cases of RR or FIFO tasks present.  For example, RR:

         if (rq->rt.rr_nr_running) {
                 if (rq->rt.rr_nr_running == 1)
                         return true;
                 else
                         return false;
         }

Once we see there's any RR tasks running, the return value
ignores any possible SCHED_OTHER tasks.  Only after the code
concludes there are no RR/FIFO tasks do we even look at
the over nr_running value.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-04 19:31   ` [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
@ 2016-04-04 19:36     ` Rik van Riel
  2016-04-05  0:27       ` Chris Metcalf
  0 siblings, 1 reply; 12+ messages in thread
From: Rik van Riel @ 2016-04-04 19:36 UTC (permalink / raw)
  To: Chris Metcalf, Frederic Weisbecker, Christoph Lameter,
	Ingo Molnar, Luiz Capitulino, Peter Zijlstra, Thomas Gleixner,
	Viresh Kumar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]

On Mon, 2016-04-04 at 15:31 -0400, Chris Metcalf wrote:
> On 4/4/2016 3:12 PM, Rik van Riel wrote:
> > 
> > On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote:
> > > 
> > > On arm64, when calling enqueue_task_fair() from
> > > migration_cpu_stop(),
> > > we find the nr_running value updated by add_nr_running(), but the
> > > cfs.nr_running value has not always yet been
> > > updated.  Accordingly,
> > > the sched_can_stop_tick() false returns true when we are
> > > migrating a
> > > second task onto a core.
> > I don't get it.
> > 
> > Looking at the enqueue_task_fair(), I see this:
> > 
> >          for_each_sched_entity(se) {
> >                  cfs_rq = cfs_rq_of(se);
> >                  cfs_rq->h_nr_running++;
> > 		...
> > 	}
> > 
> >          if (!se)
> >                  add_nr_running(rq, 1);
> > 
> > What is the difference between cfs_rq->h_nr_running,
> > and rq->cfs.nr_running?
> > 
> > Why do we have two?
> > Are we simply testing against the wrong one in
> > sched_can_stop_tick?
> It seems that using the non-CFS one is what we want.  I don't know
> whether
> using a different CFS count instead might be more correct.
> 
> Since I'm not sure what causes the difference I see between tile
> (correct)
> and arm64 (incorrect) it's hard for me to speculate.
> 
> > 
> > > 
> > > Correct this by using rq->nr_running instead of rq-
> > > >cfs.nr_running.
> > > This should always be more conservative, and reverts the test to
> > > the
> > > form it had before commit 76d92ac305f2 ("sched: Migrate sched to
> > > use
> > > new tick dependency mask model").
> > That would cause us to run the timer tick while running
> > a single SCHED_RR real time task, with a single
> > SCHED_OTHER task sitting in the background (which will
> > not get run until the SCHED_RR task is done).
> No, because in sched_can_stop_tick(), we first handle the special
> cases of RR or FIFO tasks present.  For example, RR:
> 
>          if (rq->rt.rr_nr_running) {
>                  if (rq->rt.rr_nr_running == 1)
>                          return true;
>                  else
>                          return false;
>          }
> 
> Once we see there's any RR tasks running, the return value
> ignores any possible SCHED_OTHER tasks.  Only after the code
> concludes there are no RR/FIFO tasks do we even look at
> the over nr_running value.

OK, fair enough. I guess both of the RT cases are
covered already.

Patch gets my:

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All Rights Reversed.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-04 19:36     ` Rik van Riel
@ 2016-04-05  0:27       ` Chris Metcalf
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Metcalf @ 2016-04-05  0:27 UTC (permalink / raw)
  To: Rik van Riel, Frederic Weisbecker, Christoph Lameter,
	Ingo Molnar, Luiz Capitulino, Peter Zijlstra, Thomas Gleixner,
	Viresh Kumar, linux-kernel

On 4/4/2016 3:36 PM, Rik van Riel wrote:
>>> On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote:
>>>> On arm64, when calling enqueue_task_fair() from
>>>> migration_cpu_stop(),
>>>> we find the nr_running value updated by add_nr_running(), but the
>>>> cfs.nr_running value has not always yet been
>>>> updated.  Accordingly,
>>>> the sched_can_stop_tick() false returns true when we are
>>>> migrating a
>>>> second task onto a core.
>>>> Correct this by using rq->nr_running instead of rq-
>>>>> cfs.nr_running.
>>>> This should always be more conservative, and reverts the test to
>>>> the
>>>> form it had before commit 76d92ac305f2 ("sched: Migrate sched to
>>>> use
>>>> new tick dependency mask model").
>>>>
>>>>
> [...]
>
> Patch gets my:
>
> Acked-by: Rik van Riel <riel@redhat.com>

Thanks!  Whose tree should this go through: Frederic, PeterZ, Ingo?
Do any of you have any concerns with it?

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-04 19:23   ` Peter Zijlstra
@ 2016-04-18  2:00     ` Wanpeng Li
  2016-04-21 14:42       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2016-04-18  2:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rik van Riel, Chris Metcalf, Frederic Weisbecker,
	Christoph Lameter, Ingo Molnar, Luiz Capitulino, Thomas Gleixner,
	Viresh Kumar, linux-kernel

Hi Peterz,
2016-04-05 3:23 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
>
>
> On 4 April 2016 21:12:23 CEST, Rik van Riel <riel@redhat.com> wrote:
>
>>What is the difference between cfs_rq->h_nr_running,
>>and rq->cfs.nr_running?
>>
>>Why do we have two?
>
>
> H is for hierarchy. That counts the total of runnable tasks in the entire child hierarchy. Nr_running is the number of se  entities in the current tree.

So I think we should at least change cfs_rq->nr_running to
cfs->h_nr_running, I can send a formal patch if you think it makes
sense. :-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1159423..79197df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -616,7 +616,7 @@ bool sched_can_stop_tick(struct rq *rq)
        }

        /* Normal multitasking need periodic preemption checks */
-       if (rq->cfs.nr_running > 1)
+       if (rq->cfs.h_nr_running > 1)
                return false;

        return true;

Regards,
Wanpeng Li

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-18  2:00     ` Wanpeng Li
@ 2016-04-21 14:42       ` Peter Zijlstra
  2016-04-21 16:03         ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-21 14:42 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Rik van Riel, Chris Metcalf, Frederic Weisbecker,
	Christoph Lameter, Ingo Molnar, Luiz Capitulino, Thomas Gleixner,
	Viresh Kumar, linux-kernel

On Mon, Apr 18, 2016 at 10:00:42AM +0800, Wanpeng Li wrote:
> > H is for hierarchy. That counts the total of runnable tasks in the
> > entire child hierarchy. Nr_running is the number of se  entities in
> > the current tree.
> 
> So I think we should at least change cfs_rq->nr_running to
> cfs->h_nr_running, I can send a formal patch if you think it makes
> sense. :-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1159423..79197df 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -616,7 +616,7 @@ bool sched_can_stop_tick(struct rq *rq)
>         }
> 
>         /* Normal multitasking need periodic preemption checks */
> -       if (rq->cfs.nr_running > 1)
> +       if (rq->cfs.h_nr_running > 1)
>                 return false;
> 
>         return true;

So I think that is indeed the right thing here. But looking at this
function I think there's more problems with it.

It seems to assume that if there's FIFO tasks, those will run. This is
incorrect. The FIFO task can have a lower prio than an RR task, in which
case the RR task will run.

So the whole fifo_nr_running test seems misplaced, it should go after
the rr_nr_running tests. That is, only if !rr_nr_running, can we use
fifo_nr_running like this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-21 14:42       ` Peter Zijlstra
@ 2016-04-21 16:03         ` Peter Zijlstra
  2016-04-25 21:30           ` Chris Metcalf
  2016-04-28 10:24           ` [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick() tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-21 16:03 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Rik van Riel, Chris Metcalf, Frederic Weisbecker,
	Christoph Lameter, Ingo Molnar, Luiz Capitulino, Thomas Gleixner,
	Viresh Kumar, linux-kernel

On Thu, Apr 21, 2016 at 04:42:13PM +0200, Peter Zijlstra wrote:

> So I think that is indeed the right thing here. But looking at this
> function I think there's more problems with it.
> 
> It seems to assume that if there's FIFO tasks, those will run. This is
> incorrect. The FIFO task can have a lower prio than an RR task, in which
> case the RR task will run.
> 
> So the whole fifo_nr_running test seems misplaced, it should go after
> the rr_nr_running tests. That is, only if !rr_nr_running, can we use
> fifo_nr_running like this.

A little something like so perhaps; can anybody test?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffec7d9e7763..4240686f6857 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -596,17 +596,8 @@ bool sched_can_stop_tick(struct rq *rq)
 		return false;
 
 	/*
-	 * FIFO realtime policy runs the highest priority task (after DEADLINE).
-	 * Other runnable tasks are of a lower priority. The scheduler tick
-	 * isn't needed.
-	 */
-	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
-	if (fifo_nr_running)
-		return true;
-
-	/*
-	 * Round-robin realtime tasks time slice with other tasks at the same
-	 * realtime priority.
+	 * If there are more than one RR tasks, we need the tick to effect the
+	 * actual RR behaviour.
 	 */
 	if (rq->rt.rr_nr_running) {
 		if (rq->rt.rr_nr_running == 1)
@@ -615,8 +606,20 @@ bool sched_can_stop_tick(struct rq *rq)
 			return false;
 	}
 
-	/* Normal multitasking need periodic preemption checks */
-	if (rq->cfs.nr_running > 1)
+	/*
+	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
+	 * forced preemption between FIFO tasks.
+	 */
+	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
+	if (fifo_nr_running)
+		return true;
+
+	/*
+	 * If there are no DL,RR/FIFO tasks, there must only be CFS tasks left;
+	 * if there's more than one we need the tick for involuntary
+	 * preemption.
+	 */
+	if (rq->nr_running > 1)
 		return false;
 
 	return true;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative
  2016-04-21 16:03         ` Peter Zijlstra
@ 2016-04-25 21:30           ` Chris Metcalf
  2016-04-28 10:24           ` [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick() tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Metcalf @ 2016-04-25 21:30 UTC (permalink / raw)
  To: Peter Zijlstra, Wanpeng Li
  Cc: Rik van Riel, Frederic Weisbecker, Christoph Lameter,
	Ingo Molnar, Luiz Capitulino, Thomas Gleixner, Viresh Kumar,
	linux-kernel

On 4/21/2016 12:03 PM, Peter Zijlstra wrote:
> On Thu, Apr 21, 2016 at 04:42:13PM +0200, Peter Zijlstra wrote:
>
>> So I think that is indeed the right thing here. But looking at this
>> function I think there's more problems with it.
>>
>> It seems to assume that if there's FIFO tasks, those will run. This is
>> incorrect. The FIFO task can have a lower prio than an RR task, in which
>> case the RR task will run.
>>
>> So the whole fifo_nr_running test seems misplaced, it should go after
>> the rr_nr_running tests. That is, only if !rr_nr_running, can we use
>> fifo_nr_running like this.
> A little something like so perhaps; can anybody test?

Tested-by: Chris Metcalf <cmetcalf@mellanox.com>

To be clear, I only tested that it fixed my original bug, where we weren't
kicking a remote cpu when we should have been; I have not tested that
it works properly in the presence of RR or FIFO scheduled tasks.

But this or something like it should definitely go into 4.6 before it's done.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
  2016-04-21 16:03         ` Peter Zijlstra
  2016-04-25 21:30           ` Chris Metcalf
@ 2016-04-28 10:24           ` tip-bot for Peter Zijlstra
  2016-04-28 13:30             ` Frederic Weisbecker
  1 sibling, 1 reply; 12+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-04-28 10:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, riel, vincent.weaver, kernellwp, lcapitulino, peterz,
	efault, acme, cl, alexander.shishkin, mingo, tglx, viresh.kumar,
	cmetcalf, fweisbec, linux-kernel, hpa, eranian

Commit-ID:  2548d546d40c0014efdde88a53bf7896e917dcce
Gitweb:     http://git.kernel.org/tip/2548d546d40c0014efdde88a53bf7896e917dcce
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 21 Apr 2016 18:03:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 28 Apr 2016 10:28:55 +0200

nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()

Chris Metcalf reported a that sched_can_stop_tick() sometimes fails to
re-enable the tick.

His observed problem is that rq->cfs.nr_running can be 1 even though
there are multiple runnable CFS tasks. This happens in the cgroup
case, in which case cfs.nr_running is the number of runnable entities
for that level.

If there is a single runnable cgroup (which can have an arbitrary
number of runnable child entries itself) rq->cfs.nr_running will be 1.

However, looking at that function I think there's more problems with it.

It seems to assume that if there's FIFO tasks, those will run. This is
incorrect. The FIFO task can have a lower prio than an RR task, in which
case the RR task will run.

So the whole fifo_nr_running test seems misplaced, it should go after
the rr_nr_running tests. That is, only if !rr_nr_running, can we use
fifo_nr_running like this.

Reported-by: Chris Metcalf <cmetcalf@mellanox.com>
Tested-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Wanpeng Li <kernellwp@gmail.com>
Fixes: 76d92ac305f2 ("sched: Migrate sched to use new tick dependency mask model")
Link: http://lkml.kernel.org/r/20160421160315.GK24771@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..d1f7149 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -596,17 +596,8 @@ bool sched_can_stop_tick(struct rq *rq)
 		return false;
 
 	/*
-	 * FIFO realtime policy runs the highest priority task (after DEADLINE).
-	 * Other runnable tasks are of a lower priority. The scheduler tick
-	 * isn't needed.
-	 */
-	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
-	if (fifo_nr_running)
-		return true;
-
-	/*
-	 * Round-robin realtime tasks time slice with other tasks at the same
-	 * realtime priority.
+	 * If there are more than one RR tasks, we need the tick to effect the
+	 * actual RR behaviour.
 	 */
 	if (rq->rt.rr_nr_running) {
 		if (rq->rt.rr_nr_running == 1)
@@ -615,8 +606,20 @@ bool sched_can_stop_tick(struct rq *rq)
 			return false;
 	}
 
-	/* Normal multitasking need periodic preemption checks */
-	if (rq->cfs.nr_running > 1)
+	/*
+	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
+	 * forced preemption between FIFO tasks.
+	 */
+	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
+	if (fifo_nr_running)
+		return true;
+
+	/*
+	 * If there are no DL,RR/FIFO tasks, there must only be CFS tasks left;
+	 * if there's more than one we need the tick for involuntary
+	 * preemption.
+	 */
+	if (rq->nr_running > 1)
 		return false;
 
 	return true;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
  2016-04-28 10:24           ` [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick() tip-bot for Peter Zijlstra
@ 2016-04-28 13:30             ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-28 13:30 UTC (permalink / raw)
  To: cl, alexander.shishkin, mingo, tglx, viresh.kumar, cmetcalf,
	linux-kernel, eranian, hpa, riel, jolsa, vincent.weaver,
	kernellwp, lcapitulino, peterz, acme, efault
  Cc: linux-tip-commits

On Thu, Apr 28, 2016 at 03:24:43AM -0700, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  2548d546d40c0014efdde88a53bf7896e917dcce
> Gitweb:     http://git.kernel.org/tip/2548d546d40c0014efdde88a53bf7896e917dcce
> Author:     Peter Zijlstra <peterz@infradead.org>
> AuthorDate: Thu, 21 Apr 2016 18:03:15 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Thu, 28 Apr 2016 10:28:55 +0200
> 
> nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
> 
> Chris Metcalf reported a that sched_can_stop_tick() sometimes fails to
> re-enable the tick.
> 
> His observed problem is that rq->cfs.nr_running can be 1 even though
> there are multiple runnable CFS tasks. This happens in the cgroup
> case, in which case cfs.nr_running is the number of runnable entities
> for that level.
> 
> If there is a single runnable cgroup (which can have an arbitrary
> number of runnable child entries itself) rq->cfs.nr_running will be 1.
> 
> However, looking at that function I think there's more problems with it.
> 
> It seems to assume that if there's FIFO tasks, those will run. This is
> incorrect. The FIFO task can have a lower prio than an RR task, in which
> case the RR task will run.
> 
> So the whole fifo_nr_running test seems misplaced, it should go after
> the rr_nr_running tests. That is, only if !rr_nr_running, can we use
> fifo_nr_running like this.

Thanks for this patch. I indeed made confusions around SCHED_RR and SCHED_FIFO priorities.

Too late for me to ACK but I would have. Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-04-28 13:30 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-01 19:42 [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
2016-04-04 19:12 ` Rik van Riel
2016-04-04 19:23   ` Peter Zijlstra
2016-04-18  2:00     ` Wanpeng Li
2016-04-21 14:42       ` Peter Zijlstra
2016-04-21 16:03         ` Peter Zijlstra
2016-04-25 21:30           ` Chris Metcalf
2016-04-28 10:24           ` [tip:sched/urgent] nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick() tip-bot for Peter Zijlstra
2016-04-28 13:30             ` Frederic Weisbecker
2016-04-04 19:31   ` [PATCH] nohz_full: Make sched_should_stop_tick() more conservative Chris Metcalf
2016-04-04 19:36     ` Rik van Riel
2016-04-05  0:27       ` Chris Metcalf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.