* [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix
@ 2013-02-12 22:54 Steven Rostedt
  2013-02-12 22:54 ` [PATCH 1/3] sched/rt: Fix push_rt_task() to have the same checks as the caller did Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Steven Rostedt @ 2013-02-12 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Vincent Guittot, Frederic Weisbecker

Ingo,

The first patch is a minor fix: when a woken RT task is about to
preempt a pinned RT task, push_rt_task() is called to try to migrate
the woken task if possible (to avoid preempting a pinned RT task that
may be the second highest priority task in the system).

The issue is that push_rt_task() won't push the woken task if it is
higher priority than current, even if the task about to be preempted
is pinned.

The other two patches are more of a clean up: they remove the idle
hooks from the scheduler proper and use the pre/post schedule methods
instead.

This allows interrupts to be enabled in the idle balance, which slightly
helps latencies, especially for the -rt kernel. Other patches can be
added on top. Maybe in the future preemption could be enabled during
the idle balance as well. But that remains to be seen.

Please pull the latest tip/sched/core tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/sched/core

Head SHA1: 1db13ecf89054d39922e7b3323d124c7e2921560


Steven Rostedt (Red Hat) (3):
      sched/rt: Fix push_rt_task() to have the same checks as the caller did
      sched: Move idle_balance() to post_schedule
      sched: Enable interrupts in idle_balance()

----
 kernel/sched/core.c      |    3 ---
 kernel/sched/fair.c      |    8 +++++---
 kernel/sched/idle_task.c |   10 ++++++++++
 kernel/sched/rt.c        |   15 ++++++++++-----
 4 files changed, 25 insertions(+), 11 deletions(-)


* [PATCH 1/3] sched/rt: Fix push_rt_task() to have the same checks as the caller did
  2013-02-12 22:54 [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Steven Rostedt
@ 2013-02-12 22:54 ` Steven Rostedt
  2013-02-12 22:54 ` [PATCH 2/3] sched: Move idle_balance() to post_schedule Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2013-02-12 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Vincent Guittot, Frederic Weisbecker

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Currently, push_rt_task() only pushes the task if it is lower
priority than the currently running task.

But it can be called for other reasons; namely, when the current task
that is about to be preempted is a real-time task that is also pinned
to its CPU. This happens on wake up of a high priority task: a check
is made in task_woken_rt() to see if the woken task can preempt the
task running on its CPU and whether that running task is a pinned RT
task. If the running task is pinned and the woken task can migrate,
the code tries to migrate the woken task instead, by calling
push_rt_tasks(), which ends up in push_rt_task().

Now in push_rt_task(), a check is made whether the woken task can
preempt the current task, but not whether the current task is pinned.
If the woken task is of higher priority, push_rt_task() won't try to
migrate it, even if the current task is pinned. It simply does not
push the task, which is inconsistent with the reason push_rt_task()
was called in the first place. Even for other callers of
push_rt_task(), preempting pinned RT tasks should be avoided.

Create a helper routine called ok_to_push_task() that is now used by
both the task_woken_rt() and push_rt_task() code.
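
To make the pinned-current case concrete, here is a small userspace
mock (editorial illustration only -- this is not kernel code; the
struct, the priority values and old_check_would_push() are invented,
is_rt stands in for rt_task(curr), and only ok_to_push_task()'s
condition mirrors the patch below):

#include <stdio.h>

/* Minimal stand-ins for the fields the checks look at. */
struct task { int prio; int nr_cpus_allowed; int is_rt; };

/* Old bail-out in push_rt_task(): give up (and resched curr) whenever
 * the task to push has strictly higher priority (lower ->prio value). */
static int old_check_would_push(const struct task *p, const struct task *curr)
{
	return !(p->prio < curr->prio);
}

/* New helper: also push when curr is pinned, even if p could preempt. */
static int ok_to_push_task(const struct task *p, const struct task *curr)
{
	return p->nr_cpus_allowed > 1 && curr->is_rt &&
	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
}

int main(void)
{
	/* Woken RT task: higher priority (lower value), free to migrate. */
	struct task woken = { .prio = 5,  .nr_cpus_allowed = 8, .is_rt = 1 };
	/* Current RT task: lower priority, but pinned to this CPU. */
	struct task curr  = { .prio = 10, .nr_cpus_allowed = 1, .is_rt = 1 };

	printf("old check would push the woken task: %d\n",
	       old_check_would_push(&woken, &curr));
	printf("ok_to_push_task() would push it:     %d\n",
	       ok_to_push_task(&woken, &curr));
	return 0;
}

With a pinned, lower priority RT task as current and a migratable,
higher priority woken task, the old check refuses to push (prints 0)
while the new helper does push (prints 1), which is the behaviour the
changelog argues for.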

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/rt.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c25de14..6f3108e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1613,6 +1613,14 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
 	return p;
 }
 
+static int ok_to_push_task(struct task_struct *p, struct task_struct *curr)
+{
+	return p->nr_cpus_allowed > 1 &&
+		rt_task(curr) &&
+		(curr->nr_cpus_allowed < 2 ||
+		 curr->prio <= p->prio);
+}
+
 /*
  * If the current CPU has more than one RT task, see if the non
  * running task can migrate over to a CPU that is running a task
@@ -1642,7 +1650,7 @@ retry:
 	 * higher priority than current. If that's the case
 	 * just reschedule current.
 	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
+	if (!ok_to_push_task(next_task, rq->curr)) {
 		resched_task(rq->curr);
 		return 0;
 	}
@@ -1807,10 +1815,7 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 	if (!task_running(rq, p) &&
 	    !test_tsk_need_resched(rq->curr) &&
 	    has_pushable_tasks(rq) &&
-	    p->nr_cpus_allowed > 1 &&
-	    rt_task(rq->curr) &&
-	    (rq->curr->nr_cpus_allowed < 2 ||
-	     rq->curr->prio <= p->prio))
+	    ok_to_push_task(p, rq->curr))
 		push_rt_tasks(rq);
 }
 
-- 
1.7.10.4




* [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-12 22:54 [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Steven Rostedt
  2013-02-12 22:54 ` [PATCH 1/3] sched/rt: Fix push_rt_task() to have the same checks as the caller did Steven Rostedt
@ 2013-02-12 22:54 ` Steven Rostedt
  2013-02-13 18:43   ` Peter Zijlstra
  2013-02-12 22:54 ` [PATCH 3/3] sched: Enable interrupts in idle_balance() Steven Rostedt
  2013-02-13  8:33 ` [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Ingo Molnar
  3 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2013-02-12 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Vincent Guittot, Frederic Weisbecker

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The idle_balance() code is called to do task load balancing just before
going to idle. This makes sense as the CPU is about to sleep anyway.
But currently it's called in the middle of the scheduler and in a place
that must have interrupts disabled. That means that while the load
balancing is going on, if a task wakes up on this CPU, it won't get to
run while interrupts are disabled, and the idle task doing the
balancing will be clueless about it.

There's no real reason that idle_balance() needs to be called in the
middle of schedule() anyway. The only benefit is that if a task is
pulled to this CPU, it can be scheduled without first switching to the
idle task. But compared to the cost of load balancing and migrating a
task, that extra switch to idle and back is negligible.

By using the post_schedule function pointer of the sched class, the
unlikely branch in the hot path of the scheduler can be removed, and
the idle task itself can do the load balancing.

Another advantage is that by moving idle_balance() to the post_schedule
routine, interrupts can now be enabled during the load balance, allowing
interrupts and wakeups to still occur on that CPU while a balance is
taking place. The enabling of interrupts will come as a separate patch.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/core.c      |    3 ---
 kernel/sched/idle_task.c |   10 ++++++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dff78a..a9317b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2927,9 +2927,6 @@ need_resched:
 
 	pre_schedule(rq, prev);
 
-	if (unlikely(!rq->nr_running))
-		idle_balance(cpu, rq);
-
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
 	clear_tsk_need_resched(prev);
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..66b5220 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,11 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void post_schedule_idle(struct rq *rq)
+{
+	idle_balance(smp_processor_id(), rq);
+}
 #endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +30,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_balance */
+	rq->post_schedule = 1;
+#endif
 	return rq->idle;
 }
 
@@ -86,6 +95,7 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.post_schedule		= post_schedule_idle,
 #endif
 
 	.set_curr_task          = set_curr_task_idle,
-- 
1.7.10.4




* [PATCH 3/3] sched: Enable interrupts in idle_balance()
  2013-02-12 22:54 [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Steven Rostedt
  2013-02-12 22:54 ` [PATCH 1/3] sched/rt: Fix push_rt_task() to have the same checks as the caller did Steven Rostedt
  2013-02-12 22:54 ` [PATCH 2/3] sched: Move idle_balance() to post_schedule Steven Rostedt
@ 2013-02-12 22:54 ` Steven Rostedt
  2013-02-13  8:33 ` [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Ingo Molnar
  3 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2013-02-12 22:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Vincent Guittot, Frederic Weisbecker

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Now that idle_balance() is called from the post_schedule routine of the
idle task's sched class, it is safe to enable interrupts. This allows
wakeups and other interrupts triggered while the idle balance is in
progress to be serviced promptly.

Preemption is still disabled. It may be safe to enable that as well,
but that needs more investigation and is left for another time.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/fair.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ed18c74..0fcdbff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5222,9 +5222,10 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	update_rq_runnable_avg(this_rq, 1);
 
 	/*
-	 * Drop the rq->lock, but keep IRQ/preempt disabled.
+	 * Drop the rq->lock, but keep preempt disabled.
 	 */
-	raw_spin_unlock(&this_rq->lock);
+	preempt_disable();
+	raw_spin_unlock_irq(&this_rq->lock);
 
 	update_blocked_averages(this_cpu);
 	rcu_read_lock();
@@ -5251,7 +5252,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	}
 	rcu_read_unlock();
 
-	raw_spin_lock(&this_rq->lock);
+	raw_spin_lock_irq(&this_rq->lock);
+	preempt_enable();
 
 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
 		/*
-- 
1.7.10.4




* Re: [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix
  2013-02-12 22:54 [PATCH 0/3] [GIT PULL] sched: clean ups and a minor fix Steven Rostedt
                   ` (2 preceding siblings ...)
  2013-02-12 22:54 ` [PATCH 3/3] sched: Enable interrupts in idle_balance() Steven Rostedt
@ 2013-02-13  8:33 ` Ingo Molnar
  3 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2013-02-13  8:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Vincent Guittot, Frederic Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> Ingo,
> 
> The first patch is a minor fix: when a woken RT task is about to
> preempt a pinned RT task, push_rt_task() is called to try to migrate
> the woken task if possible (to avoid preempting a pinned RT task that
> may be the second highest priority task in the system).
> 
> The issue is that push_rt_task() won't push the woken task if it is
> higher priority than current, even if the task about to be preempted
> is pinned.
> 
> The other two patches are more of a clean up: they remove the idle
> hooks from the scheduler proper and use the pre/post schedule methods
> instead.
> 
> This allows interrupts to be enabled in the idle balance, which slightly
> helps latencies, especially for the -rt kernel. Other patches can be
> added on top. Maybe in the future preemption could be enabled during
> the idle balance as well. But that remains to be seen.
> 
> Please pull the latest tip/sched/core tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> tip/sched/core
> 
> Head SHA1: 1db13ecf89054d39922e7b3323d124c7e2921560
> 
> 
> Steven Rostedt (Red Hat) (3):
>       sched/rt: Fix push_rt_task() to have the same checks as the caller did
>       sched: Move idle_balance() to post_schedule
>       sched: Enable interrupts in idle_balance()
> 
> ----
>  kernel/sched/core.c      |    3 ---
>  kernel/sched/fair.c      |    8 +++++---
>  kernel/sched/idle_task.c |   10 ++++++++++
>  kernel/sched/rt.c        |   15 ++++++++++-----
>  4 files changed, 25 insertions(+), 11 deletions(-)

Pulled, thanks a lot Steve!

	Ingo


* Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-12 22:54 ` [PATCH 2/3] sched: Move idle_balance() to post_schedule Steven Rostedt
@ 2013-02-13 18:43   ` Peter Zijlstra
  2013-02-13 19:05     ` Steven Rostedt
  2013-02-14 14:25     ` Steven Rostedt
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2013-02-13 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Vincent Guittot, Frederic Weisbecker, Mike Galbraith

On Tue, 2013-02-12 at 17:54 -0500, Steven Rostedt wrote:
> There's no real reason that idle_balance() needs to be called in the
> middle of schedule() anyway. The only benefit is that if a task is
> pulled to this CPU, it can be scheduled without first switching to
> the idle task.

Uhm, istr that extra schedule being an issue somewhere.. Make very sure
you don't regress anything silly like sysbench or hackbench. Maybe ask
Mike, he seems to have a better retention for benchmark weirdness than
me.

> But compared to the cost of load balancing and migrating a task, that
> extra switch to idle and back is negligible.

How does that follow? We can have to-idle switches _far_ more often than
we balance.



* Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-13 18:43   ` Peter Zijlstra
@ 2013-02-13 19:05     ` Steven Rostedt
  2013-02-15 11:51       ` Peter Zijlstra
  2013-02-14 14:25     ` Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2013-02-13 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Vincent Guittot, Frederic Weisbecker, Mike Galbraith

On Wed, 2013-02-13 at 19:43 +0100, Peter Zijlstra wrote:
> On Tue, 2013-02-12 at 17:54 -0500, Steven Rostedt wrote:
> > There's no real reason that idle_balance() needs to be called in
> > the middle of schedule() anyway. The only benefit is that if a task
> > is pulled to this CPU, it can be scheduled without first switching
> > to the idle task.
> 
> Uhm, istr that extra schedule being an issue somewhere.. Make very sure
> you don't regress anything silly like sysbench or hackbench. Maybe ask
> Mike, he seems to have a better retention for benchmark weirdness than
> me.

I'm all for benchmarks :-)

> 
> > But compared to the cost of load balancing and migrating a task,
> > that extra switch to idle and back is negligible.
> 
> How does that follow? We can have to-idle switches _far_ more often than
> we balance.

But we have the to-idle switch regardless in that case, don't we?

That is, the CPU is about to go idle, thus a load balance is done, and
perhaps a task is pulled to the current queue. To do this, rq locks and
such need to be grabbed across CPUs.

If it didn't balance, the switch would happen anyway.

The current method is:

	if (!rq->nr_running)
		idle_balance();

	/* if something pulled, then run that instead,
	   if not, continue to switch to idle. */

What this change did was:

	pick_next_task_idle()
		rq->post_schedule = 1;

	post_schedule()
		idle_balance();

Now if a balance occurred, we would have to switch back from idle to
what we just pulled.

Hence the change is that a switch to and from idle is done only when
load balancing actually pulled something; otherwise it just goes to
idle like it always has.

Or perhaps I need to add another check to make sure rq->nr_running is
still 0 before doing the balance, because the rq lock was released.
That is:

	post_schedule()
		if (!rq->nr_running)
			idle_balance();

needs to be added.

-- Steve





* Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-13 18:43   ` Peter Zijlstra
  2013-02-13 19:05     ` Steven Rostedt
@ 2013-02-14 14:25     ` Steven Rostedt
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2013-02-14 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Vincent Guittot, Frederic Weisbecker, Mike Galbraith

On Wed, 2013-02-13 at 19:43 +0100, Peter Zijlstra wrote:

> How does that follow? We can have to-idle switches _far_ more often than
> we balance.

I ran "perf stat -a -r 100 hackbench 500" on an i7 4 core hyperthreaded
box, and got the following results:

With no patches applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     199820.045583 task-clock                #    8.016 CPUs utilized            ( +-  5.29% ) [100.00%]
         3,594,264 context-switches          #    0.018 M/sec                    ( +-  5.94% ) [100.00%]
           352,240 cpu-migrations            #    0.002 M/sec                    ( +-  3.31% ) [100.00%]
         1,006,732 page-faults               #    0.005 M/sec                    ( +-  0.56% )
   293,801,912,874 cycles                    #    1.470 GHz                      ( +-  4.20% ) [100.00%]
   261,808,125,109 stalled-cycles-frontend   #   89.11% frontend cycles idle     ( +-  4.38% ) [100.00%]
   <not supported> stalled-cycles-backend  
   135,521,344,089 instructions              #    0.46  insns per cycle        
                                             #    1.93  stalled cycles per insn  ( +-  4.37% ) [100.00%]
    26,198,116,586 branches                  #  131.109 M/sec                    ( +-  4.59% ) [100.00%]
       115,326,812 branch-misses             #    0.44% of all branches          ( +-  4.12% )

      24.929136087 seconds time elapsed                                          ( +-  5.31% )


With the two idle_balance patches applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     178619.929457 task-clock                #    8.011 CPUs utilized            ( +-  4.16% ) [100.00%]
         3,171,229 context-switches          #    0.018 M/sec                    ( +-  3.30% ) [100.00%]
           323,115 cpu-migrations            #    0.002 M/sec                    ( +-  2.47% ) [100.00%]
           994,506 page-faults               #    0.006 M/sec                    ( +-  0.52% )
   269,744,391,573 cycles                    #    1.510 GHz                      ( +-  2.12% ) [100.00%]
   238,589,242,461 stalled-cycles-frontend   #   88.45% frontend cycles idle     ( +-  2.26% ) [100.00%]
   <not supported> stalled-cycles-backend  
   124,298,251,712 instructions              #    0.46  insns per cycle        
                                             #    1.92  stalled cycles per insn  ( +-  1.99% ) [100.00%]
    23,918,305,712 branches                  #  133.906 M/sec                    ( +-  2.13% ) [100.00%]
       105,863,415 branch-misses             #    0.44% of all branches          ( +-  2.14% )

      22.296151996 seconds time elapsed                                          ( +-  4.18% )


And finally with the patch at the bottom of this email applied:

 Performance counter stats for '/work/c/hackbench 500' (100 runs):

     170170.547682 task-clock                #    8.021 CPUs utilized            ( +-  5.59% ) [100.00%]
         3,118,923 context-switches          #    0.018 M/sec                    ( +-  4.82% ) [100.00%]
           318,479 cpu-migrations            #    0.002 M/sec                    ( +-  2.58% ) [100.00%]
           988,187 page-faults               #    0.006 M/sec                    ( +-  0.62% )
   271,343,352,987 cycles                    #    1.595 GHz                      ( +-  3.84% ) [100.00%]
   240,599,089,430 stalled-cycles-frontend   #   88.67% frontend cycles idle     ( +-  4.24% ) [100.00%]
   <not supported> stalled-cycles-backend  
   125,888,645,748 instructions              #    0.46  insns per cycle        
                                             #    1.91  stalled cycles per insn  ( +-  3.97% ) [100.00%]
    24,219,147,811 branches                  #  142.323 M/sec                    ( +-  4.22% ) [100.00%]
       105,077,636 branch-misses             #    0.43% of all branches          ( +-  3.70% )

      21.214840224 seconds time elapsed                                          ( +-  5.61% )

Yeah, it seems the extra check for rq empty helps. But even without
that, the current idle patches still seem pretty good.
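
(For reference, working from the elapsed-time figures above: the two
idle_balance patches take the mean from roughly 24.9s to 22.3s, about a
10% improvement, and the extra rq->nr_running check brings it to about
21.2s, roughly 15% below the baseline -- though with the 4-6% run-to-run
variation the last two results overlap.)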

-- Steve

diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 66b5220..025350b 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -16,7 +16,9 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 
 static void post_schedule_idle(struct rq *rq)
 {
-	idle_balance(smp_processor_id(), rq);
+	/* rq lock was released, make sure system is still idle */
+	if (likely(!rq->nr_running))
+		idle_balance(smp_processor_id(), rq);
 }
 #endif /* CONFIG_SMP */
 /*




* Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-13 19:05     ` Steven Rostedt
@ 2013-02-15 11:51       ` Peter Zijlstra
  2013-02-15 13:37         ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-02-15 11:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Vincent Guittot, Frederic Weisbecker, Mike Galbraith

On Wed, 2013-02-13 at 14:05 -0500, Steven Rostedt wrote:
> That is, the CPU is about to go idle, thus a load balance is done, and
> perhaps a task is pulled to the current queue. To do this, rq locks and
> such need to be grabbed across CPUs.

Right, grabbing the rq locks and all isn't my main worry, we do that in
either case, but my worry was the two extra switches we do for no good
reason at all.

Now it's not as if we'll actually run the idle thread, that would be very
expensive indeed, so it's just the two context_switch() calls. But still,
I somehow remember us spending quite a lot of effort to keep
idle_balance() where it is now, if only I could remember the benchmark we
had for it :/

Can't you do the opposite and fold post_schedule() into idle_balance()?

/me goes stare at code to help remember what the -rt requirements were
again..

Ah, so that's push_rt_task() which wants to move extra rt tasks to other
cpus. Doing that from where we have idle_balance() won't actually work, I
think, since we might need to move current, which we cannot do at that
point -- I'm thinking a higher prio task (than current) waking to this
cpu and then cascading current to another cpu, can that happen?

If we never need to migrate current, because we don't do the cascade by
ensuring we wake the higher prio task to the appropriate cpu, we might
just get away with it.



* Re: [PATCH 2/3] sched: Move idle_balance() to post_schedule
  2013-02-15 11:51       ` Peter Zijlstra
@ 2013-02-15 13:37         ` Steven Rostedt
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2013-02-15 13:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Vincent Guittot, Frederic Weisbecker, Mike Galbraith

On Fri, 2013-02-15 at 12:51 +0100, Peter Zijlstra wrote:
> On Wed, 2013-02-13 at 14:05 -0500, Steven Rostedt wrote:

> Ah, so that's push_rt_task() which wants to move extra rt tasks to other
> cpus. Doing that from where we have idle_balance() won't actually work, I
> think, since we might need to move current, which we cannot do at that
> point -- I'm thinking a higher prio task (than current) waking to this
> cpu and then cascading current to another cpu, can that happen?

Yep, that's exactly why we do the push after the switch. It also catches
a bunch of races that can happen if in the middle of the switch another
CPU drops in priority. A drop in priority can cause that CPU to search
for pending rt tasks on other CPUs, but because current hasn't been
taken off the queue yet, it can miss trying to pull it, and current
could get stuck on its CPU even though there's another CPU it could run
on. Having the higher priority task do the check solves that race.

> 
> If we never need to migrate current, because we don't do the cascade by
> ensuring we wake the higher prio task to the appropriate cpu, we might
> just get away with it.

I'm not exactly sure what you mean by this. Migrate the higher priority
task? But what if it's pinned? And we used to do that, until we found
out that higher priority tasks were ping-ponging around like crazy.

-- Steve



