* [PATCH 0/3] sched: newidle and RT wake-buddy fixes
@ 2008-06-27 20:29 Gregory Haskins
  2008-06-27 20:29 ` [PATCH 1/3] sched: enable interrupts and drop rq-lock during newidle balancing Gregory Haskins
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-06-27 20:29 UTC (permalink / raw)
  To: mingo, peterz; +Cc: npiggin, ghaskins, rostedt, linux-kernel, linux-rt-users

Hi Ingo,
  The following patches apply to linux-tip/sched/devel and enhance the
performance of the kernel (specifically in PREEMPT_RT, though they do
not regress mainline performance as far as I can tell).  They offer
somewhere between 50-100% speedups in netperf performance, depending
on the test.

Should you be interested in these patches, you can optionally pull them
from git at the following URL:

git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git sched-devel

Comments/feedback/bugfixes welcome

Regards,
-Greg


* [PATCH 1/3] sched: enable interrupts and drop rq-lock during newidle balancing
  2008-06-27 20:29 [PATCH 0/3] sched: newidle and RT wake-buddy fixes Gregory Haskins
@ 2008-06-27 20:29 ` Gregory Haskins
  2008-06-27 20:29 ` [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over Gregory Haskins
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-06-27 20:29 UTC (permalink / raw)
  To: mingo, peterz; +Cc: npiggin, ghaskins, rostedt, linux-kernel, linux-rt-users

Oprofile data shows that the system may spend a significant amount of
time (60%+) in find_busiest_group() as a result of newidle balancing.
This entire operation is a critical section, since it occurs inline with
a schedule().  Since we do find_busiest_group() et al. without locks
held for normal balancing, let's do it for newidle as well.  It will at
least allow other CPUs and interrupts to make forward progress (against
our RQ) while we try to balance.

Additionally, we abort the newidle processing if we are preempted.

This patch should both improve latency response and increase throughput.
It has been shown to contribute significantly to a 6-12% increase in
network performance.
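
In outline, the reworked load_balance_newidle() path looks roughly like
this (a simplified sketch; the redo and abort paths are omitted):

    /* we enter in a preempt-disabled section with this_rq locked */
    spin_unlock_irq(&this_rq->lock); /* others may take the lock now */

    group = find_busiest_group(...); /* the expensive part, unlocked */

    if (!need_resched() && busiest->nr_running > 1) {
            local_irq_disable();
            double_rq_lock(this_rq, busiest);
            if (!this_rq->nr_running) /* nothing pushed to us meanwhile */
                    ld_moved = move_tasks(this_rq, this_cpu, busiest,
                                          imbalance, sd, CPU_NEWLY_IDLE,
                                          &all_pinned);
            spin_unlock(&busiest->lock);
    } else
            spin_lock_irq(&this_rq->lock); /* caller expects rq locked */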

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 kernel/sched.c |   40 +++++++++++++++++++++++++++++++++-------
 1 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index c51d9fa..56722b1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3426,6 +3426,15 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
 
 	cpus_setall(*cpus);
 
+	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
+
+	/*
+	 * We are in a preempt-disabled section, so dropping the lock/irq
+	 * here simply means that other cores may acquire the lock,
+	 * and interrupts may occur.
+	 */
+	spin_unlock_irq(&this_rq->lock);
+
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,
@@ -3436,7 +3445,6 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
 		sd_idle = 1;
 
-	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
 redo:
 	group = find_busiest_group(sd, this_cpu, &imbalance, CPU_NEWLY_IDLE,
 				   &sd_idle, cpus, NULL);
@@ -3456,22 +3464,37 @@ redo:
 	schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance);
 
 	ld_moved = 0;
-	if (busiest->nr_running > 1) {
+	if (!need_resched() && busiest->nr_running > 1) {
 		/* Attempt to move tasks */
-		double_lock_balance(this_rq, busiest);
-		/* this_rq->clock is already updated */
-		update_rq_clock(busiest);
+		local_irq_disable();
+		double_rq_lock(this_rq, busiest);
+
+		BUG_ON(this_cpu != smp_processor_id());
+
+		/*
+		 * Checking rq->nr_running covers both the case where
+		 * newidle-balancing pulls a task, as well as if something
+		 * else issued a NEEDS_RESCHED (since we would only need
+		 * a reschedule if something was moved to us)
+		 */
+		if (this_rq->nr_running) {
+			spin_unlock(&busiest->lock);
+			goto out_balanced_locked;
+		}
+
 		ld_moved = move_tasks(this_rq, this_cpu, busiest,
 					imbalance, sd, CPU_NEWLY_IDLE,
 					&all_pinned);
 		spin_unlock(&busiest->lock);
 
-		if (unlikely(all_pinned)) {
+		if (unlikely(all_pinned && !this_rq->nr_running)) {
+			spin_unlock_irq(&this_rq->lock);
 			cpu_clear(cpu_of(busiest), *cpus);
 			if (!cpus_empty(*cpus))
 				goto redo;
 		}
-	}
+	} else
+		spin_lock_irq(&this_rq->lock);
 
 	if (!ld_moved) {
 		schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
@@ -3484,6 +3507,9 @@ redo:
 	return ld_moved;
 
 out_balanced:
+	spin_lock_irq(&this_rq->lock);
+
+out_balanced_locked:
 	schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]);
 	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
 	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))



* [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-06-27 20:29 [PATCH 0/3] sched: newidle and RT wake-buddy fixes Gregory Haskins
  2008-06-27 20:29 ` [PATCH 1/3] sched: enable interrupts and drop rq-lock during newidle balancing Gregory Haskins
@ 2008-06-27 20:29 ` Gregory Haskins
  2008-07-08  5:00   ` Nick Piggin
  2008-06-27 20:30 ` [PATCH 3/3] sched: add avg-overlap support to RT tasks Gregory Haskins
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2008-06-27 20:29 UTC (permalink / raw)
  To: mingo, peterz; +Cc: npiggin, ghaskins, rostedt, linux-kernel, linux-rt-users

Inspired by Peter Zijlstra.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 kernel/sched.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 56722b1..0b9f90e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2863,6 +2863,10 @@ static int move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
 				max_load_move - total_load_moved,
 				sd, idle, all_pinned, &this_best_prio);
 		class = class->next;
+
+		if (idle == CPU_NEWLY_IDLE && this_rq->nr_running)
+			break;
+
 	} while (class && max_load_move > total_load_moved);
 
 	return total_load_moved > 0;



* [PATCH 3/3] sched: add avg-overlap support to RT tasks
  2008-06-27 20:29 [PATCH 0/3] sched: newidle and RT wake-buddy fixes Gregory Haskins
  2008-06-27 20:29 ` [PATCH 1/3] sched: enable interrupts and drop rq-lock during newidle balancing Gregory Haskins
  2008-06-27 20:29 ` [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over Gregory Haskins
@ 2008-06-27 20:30 ` Gregory Haskins
  2008-06-27 20:51 ` [PATCH 0/3] sched: newidle and RT wake-buddy fixes Peter Zijlstra
  2008-06-30 13:15 ` Ingo Molnar
  4 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-06-27 20:30 UTC (permalink / raw)
  To: mingo, peterz; +Cc: npiggin, ghaskins, rostedt, linux-kernel, linux-rt-users

We have the notion of tracking process coupling (a.k.a. buddy-wake) via
the p->se.last_wakeup / p->se.avg_overlap facilities, but it is only
used for CFS-to-CFS interactions.  There is no reason why an RT-to-CFS
interaction cannot share in establishing a relationship in a similar
manner.

Because PREEMPT_RT runs many kernel threads at FIFO priority, we often
have heavy interaction between RT threads waking CFS applications.  This
patch offers a substantial boost (50-60%+) in performance under those
circumstances.
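
The mechanism itself is unchanged: the waker stamps itself at wake-up
time, and the delta observed when the task later dequeues to sleep feeds
an EWMA with a 1/8 weight.  The hunks below simply hoist that logic out
of sched_fair.c into the generic wakeup/dequeue paths; in outline:

    /* at wake-up (try_to_wake_up), the waker stamps itself: */
    current->se.last_wakeup = current->se.sum_exec_runtime;

    /* at dequeue-for-sleep, the overlap sample updates the average: */
    if (sleep && p->se.last_wakeup) {
            update_avg(&p->se.avg_overlap,
                       p->se.sum_exec_runtime - p->se.last_wakeup);
            p->se.last_wakeup = 0;
    }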

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 kernel/sched.c      |   14 ++++++++++++++
 kernel/sched_fair.c |   21 ++-------------------
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0b9f90e..af9a68b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1544,6 +1544,12 @@ static void set_load_weight(struct task_struct *p)
 	p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO];
 }
 
+static void update_avg(u64 *avg, u64 sample)
+{
+	s64 diff = sample - *avg;
+	*avg += diff >> 3;
+}
+
 static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 {
 	sched_info_queued(p);
@@ -1553,6 +1559,12 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	if (sleep && p->se.last_wakeup) {
+		update_avg(&p->se.avg_overlap,
+			   p->se.sum_exec_runtime - p->se.last_wakeup);
+		p->se.last_wakeup = 0;
+	}
+
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
 }
@@ -2157,6 +2169,8 @@ out_running:
 		p->sched_class->task_wake_up(rq, p);
 #endif
 out:
+	current->se.last_wakeup = current->se.sum_exec_runtime;
+
 	task_rq_unlock(rq, &flags);
 
 	return success;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8e77d67..3c06027 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -639,21 +639,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
 		__enqueue_entity(cfs_rq, se);
 }
 
-static void update_avg(u64 *avg, u64 sample)
-{
-	s64 diff = sample - *avg;
-	*avg += diff >> 3;
-}
-
-static void update_avg_stats(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	if (!se->last_wakeup)
-		return;
-
-	update_avg(&se->avg_overlap, se->sum_exec_runtime - se->last_wakeup);
-	se->last_wakeup = 0;
-}
-
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
 {
@@ -664,7 +649,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
 
 	update_stats_dequeue(cfs_rq, se);
 	if (sleep) {
-		update_avg_stats(cfs_rq, se);
 #ifdef CONFIG_SCHEDSTATS
 		if (entity_is_task(se)) {
 			struct task_struct *tsk = task_of(se);
@@ -1016,9 +1000,9 @@ wake_affine(struct rq *rq, struct sched_domain *this_sd, struct rq *this_rq,
 	 * a reasonable amount of time then attract this newly
 	 * woken task:
 	 */
-	if (sync && balanced && curr->sched_class == &fair_sched_class) {
+	if (sync && balanced) {
 		if (curr->se.avg_overlap < sysctl_sched_migration_cost &&
-				p->se.avg_overlap < sysctl_sched_migration_cost)
+		    p->se.avg_overlap < sysctl_sched_migration_cost)
 			return 1;
 	}
 
@@ -1177,7 +1161,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 		return;
 	}
 
-	se->last_wakeup = se->sum_exec_runtime;
 	if (unlikely(se == pse))
 		return;
 



* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-27 20:29 [PATCH 0/3] sched: newidle and RT wake-buddy fixes Gregory Haskins
                   ` (2 preceding siblings ...)
  2008-06-27 20:30 ` [PATCH 3/3] sched: add avg-overlap support to RT tasks Gregory Haskins
@ 2008-06-27 20:51 ` Peter Zijlstra
  2008-06-30 12:56   ` Ingo Molnar
  2008-06-30 13:15 ` Ingo Molnar
  4 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2008-06-27 20:51 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: mingo, npiggin, rostedt, linux-kernel, linux-rt-users

On Fri, 2008-06-27 at 14:29 -0600, Gregory Haskins wrote:
> Hi Ingo,
>   The following patches apply to linux-tip/sched/devel and enhance the
> performance of the kernel (specifically in PREEMPT_RT, though they do
> not regress mainline performance as far as I can tell).  They offer
> somewhere between 50-100% speedups in netperf performance, depending
> on the test.
> 
> Should you be interested in these patches, you can optionally pull them
> from git at the following URL:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git sched-devel
> 
> Comments/feedback/bugfixes welcome

Seem reasonable to me,

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>



* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 13:15 ` Ingo Molnar
@ 2008-06-30 11:20   ` Gregory Haskins
  2008-06-30 14:41   ` Gregory Haskins
  2008-06-30 17:16   ` Gregory Haskins
  2 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-06-30 11:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Mon, Jun 30, 2008 at  9:15 AM, in message <20080630131511.GA7506@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
>> Hi Ingo,
>>   The following patches apply to linux-tip/sched/devel and enhance the
>> performance of the kernel (specifically in PREEMPT_RT, though they do
>> not regress mainline performance as far as I can tell).  They offer
>> somewhere between 50-100% speedups in netperf performance, depending
>> on the test.
> 
> -tip testing found this boot hang:

I'll take a look, Ingo.  Sorry for the inconvenience, and thanks for the bisect.

-Greg

> 
>  Linux version 2.6.26-rc8-tip (mingo@dione) (gcc version 4.2.3) #12917 
>  SMP Mon Jun 30 15:06:32 CEST 2008
>  [...]
>  CPU 1/1 -> Node 0
>  CPU: Physical Processor ID: 0
>  CPU: Processor Core ID: 1
>  CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
>  Brought up 2 CPUs
>  Total of 2 processors activated (8041.15 BogoMIPS).
>  [ hard hang ]
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Mon_Jun_30_14_54_19_CEST_2008.bad
> 
> full bootlog:
> 
>   http://redhat.com/~mingo/misc/hang-Mon_Jun_30_14_54_19_CEST_2008.bad
> 
> it should continue with this bootup sequence:
> 
>   calling  net_ns_init+0x0/0x1a0
>   net_namespace: 376 bytes
>   initcall net_ns_init+0x0/0x1a0 returned 0 after 0 msecs
>   calling  init_smp_flush+0x0/0x60
> 
> i've bisected it down to:
> 
> --------------
> | commit cc8160c56843201891766660e3816d2e546c1b17
> | Author: Gregory Haskins <ghaskins@novell.com>
> | Date:   Fri Jun 27 14:29:50 2008 -0600
> |
> |     sched: enable interrupts and drop rq-lock during newidle balancing
> --------------
> 
> so i've reverted that change for now.
> 
> 	Ingo




* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-27 20:51 ` [PATCH 0/3] sched: newidle and RT wake-buddy fixes Peter Zijlstra
@ 2008-06-30 12:56   ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-06-30 12:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gregory Haskins, npiggin, rostedt, linux-kernel, linux-rt-users


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2008-06-27 at 14:29 -0600, Gregory Haskins wrote:
> > Hi Ingo,
> >   The following patches apply to linux-tip/sched/devel and enhance the
> > performance of the kernel (specifically in PREEMPT_RT, though they do
> > not regress mainline performance as far as I can tell).  They offer
> > somewhere between 50-100% speedups in netperf performance, depending
> > on the test.
> > 
> > Should you be interested in these patches, you can optionally pull them
> > from git at the following URL:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git sched-devel
> > 
> > Comments/feedback/bugfixes welcome
> 
> Seem reasonable to me,
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

applied to tip/sched/devel - thanks guys.

I have also merged tip/sched/devel.smp-group-balance into
tip/sched/devel; those changes have held up fine in testing, with few
problems.

	Ingo


* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-27 20:29 [PATCH 0/3] sched: newidle and RT wake-buddy fixes Gregory Haskins
                   ` (3 preceding siblings ...)
  2008-06-27 20:51 ` [PATCH 0/3] sched: newidle and RT wake-buddy fixes Peter Zijlstra
@ 2008-06-30 13:15 ` Ingo Molnar
  2008-06-30 11:20   ` Gregory Haskins
                     ` (2 more replies)
  4 siblings, 3 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-06-30 13:15 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: peterz, npiggin, rostedt, linux-kernel, linux-rt-users


* Gregory Haskins <ghaskins@novell.com> wrote:

> Hi Ingo,
>   The following patches apply to linux-tip/sched/devel and enhance the
> performance of the kernel (specifically in PREEMPT_RT, though they do
> not regress mainline performance as far as I can tell).  They offer
> somewhere between 50-100% speedups in netperf performance, depending
> on the test.

-tip testing found this boot hang:

 Linux version 2.6.26-rc8-tip (mingo@dione) (gcc version 4.2.3) #12917 
 SMP Mon Jun 30 15:06:32 CEST 2008
 [...]
 CPU 1/1 -> Node 0
 CPU: Physical Processor ID: 0
 CPU: Processor Core ID: 1
 CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
 Brought up 2 CPUs
 Total of 2 processors activated (8041.15 BogoMIPS).
 [ hard hang ]

with this config:

  http://redhat.com/~mingo/misc/config-Mon_Jun_30_14_54_19_CEST_2008.bad

full bootlog:

  http://redhat.com/~mingo/misc/hang-Mon_Jun_30_14_54_19_CEST_2008.bad

it should continue with this bootup sequence:

  calling  net_ns_init+0x0/0x1a0
  net_namespace: 376 bytes
  initcall net_ns_init+0x0/0x1a0 returned 0 after 0 msecs
  calling  init_smp_flush+0x0/0x60

i've bisected it down to:

--------------
| commit cc8160c56843201891766660e3816d2e546c1b17
| Author: Gregory Haskins <ghaskins@novell.com>
| Date:   Fri Jun 27 14:29:50 2008 -0600
|
|     sched: enable interrupts and drop rq-lock during newidle balancing
--------------

so i've reverted that change for now.

	Ingo


* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 13:15 ` Ingo Molnar
  2008-06-30 11:20   ` Gregory Haskins
@ 2008-06-30 14:41   ` Gregory Haskins
  2008-06-30 15:01     ` Steven Rostedt
  2008-06-30 17:16   ` Gregory Haskins
  2 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2008-06-30 14:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Mon, Jun 30, 2008 at  9:15 AM, in message <20080630131511.GA7506@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
>> Hi Ingo,
>>   The following patches apply to linux-tip/sched/devel and enhance the
>> performance of the kernel (specifically in PREEMPT_RT, though they do
>> not regress mainline performance as far as I can tell).  They offer
>> somewhere between 50-100% speedups in netperf performance, depending
>> on the test.
> 
> -tip testing found this boot hang:

Ok, I dug in a little bit here.  I haven't verified this yet, but I
think the problem is that your config is PREEMPT_VOLUNTARY, which NOPs
the preempt_disable() in schedule() that I rely on to allow the lock to
be dropped.  (Doh!)

One way I can fix this is to fix up the newidle code to only play the
irq-dropping tricks when CONFIG_PREEMPT is enabled.  Does this sound
reasonable, or is there a better way to address this?
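
Something like this, perhaps (an untested sketch; the matching relock
and double_rq_lock() sites would need the same guard):

    #ifdef CONFIG_PREEMPT
            /*
             * Only safe because schedule() holds preempt_disable()
             * across us; with PREEMPT_VOLUNTARY that guarantee is gone.
             */
            spin_unlock_irq(&this_rq->lock);
    #endif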

-Greg

> 
>  Linux version 2.6.26-rc8-tip (mingo@dione) (gcc version 4.2.3) #12917 
>  SMP Mon Jun 30 15:06:32 CEST 2008
>  [...]
>  CPU 1/1 -> Node 0
>  CPU: Physical Processor ID: 0
>  CPU: Processor Core ID: 1
>  CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
>  Brought up 2 CPUs
>  Total of 2 processors activated (8041.15 BogoMIPS).
>  [ hard hang ]
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Mon_Jun_30_14_54_19_CEST_2008.bad
> 
> full bootlog:
> 
>   http://redhat.com/~mingo/misc/hang-Mon_Jun_30_14_54_19_CEST_2008.bad
> 
> it should continue with this bootup sequence:
> 
>   calling  net_ns_init+0x0/0x1a0
>   net_namespace: 376 bytes
>   initcall net_ns_init+0x0/0x1a0 returned 0 after 0 msecs
>   calling  init_smp_flush+0x0/0x60
> 
> i've bisected it down to:
> 
> --------------
> | commit cc8160c56843201891766660e3816d2e546c1b17
> | Author: Gregory Haskins <ghaskins@novell.com>
> | Date:   Fri Jun 27 14:29:50 2008 -0600
> |
> |     sched: enable interrupts and drop rq-lock during newidle balancing
> --------------
> 
> so i've reverted that change for now.
> 
> 	Ingo




* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 14:41   ` Gregory Haskins
@ 2008-06-30 15:01     ` Steven Rostedt
  0 siblings, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2008-06-30 15:01 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ingo Molnar, peterz, npiggin, linux-kernel, linux-rt-users


On Mon, 30 Jun 2008, Gregory Haskins wrote:

>
> >>> On Mon, Jun 30, 2008 at  9:15 AM, in message <20080630131511.GA7506@elte.hu>,
> Ingo Molnar <mingo@elte.hu> wrote:
>
> > * Gregory Haskins <ghaskins@novell.com> wrote:
> >
> >> Hi Ingo,
> >>   The following patches apply to linux-tip/sched/devel and enhance the
> >> performance of the kernel (specifically in PREEMPT_RT, though they do
> >> not regress mainline performance as far as I can tell).  They offer
> >> somewhere between 50-100% speedups in netperf performance, depending
> >> on the test.
> >
> > -tip testing found this boot hang:
>
> Ok, I dug in a little bit here.  I haven't verified this yet, but I
> think the problem is that your config is PREEMPT_VOLUNTARY, which NOPs
> the preempt_disable() in schedule() that I rely on to allow the lock to
> be dropped.  (Doh!)

This means that there's a location somewhere that does a cond_resched()?
Perhaps we should look to see what does that.  And perhaps
preempt_disable() should not allow PREEMPT_VOLUNTARY to schedule via
cond_resched().

-- Steve

>
> One way I can fix this is to fix up the newidle code to only play the
> irq-dropping tricks when CONFIG_PREEMPT is enabled.  Does this sound
> reasonable, or is there a better way to address this?



* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 13:15 ` Ingo Molnar
  2008-06-30 11:20   ` Gregory Haskins
  2008-06-30 14:41   ` Gregory Haskins
@ 2008-06-30 17:16   ` Gregory Haskins
  2008-06-30 18:10     ` Ingo Molnar
  2008-07-03 14:41     ` Ingo Molnar
  2 siblings, 2 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-06-30 17:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Mon, Jun 30, 2008 at  9:15 AM, in message <20080630131511.GA7506@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
>> Hi Ingo,
>>   The following patches apply to linux-tip/sched/devel and enhance the
>> performance of the kernel (specifically in PREEMPT_RT, though they do
>> not regress mainline performance as far as I can tell).  They offer
>> somewhere between 50-100% speedups in netperf performance, depending
>> on the test.
> 
> -tip testing found this boot hang:

I may have found the issue:  It looks like the hunk that initially
disables interrupts in load_balance_newidle() was inadvertently applied
to load_balance() instead during the merge to linux-tip.  If you fold
the following patch into my original patch, it should set things right
again.

-----

sched: fix merge problem with newidle enhancement patch

From: Gregory Haskins <ghaskins@novell.com>

commit cc8160c56843201891766660e3816d2e546c1b17 introduces a locking
enhancement for newidle.  However, one hunk was misapplied to
load_balance() instead of load_balance_newidle().  This fixes the issue.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 kernel/sched.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f35d73c..f36406f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3459,15 +3459,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	cpus_setall(*cpus);
 
-	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
-
-	/*
-	 * We are in a preempt-disabled section, so dropping the lock/irq
-	 * here simply means that other cores may acquire the lock,
-	 * and interrupts may occur.
-	 */
-	spin_unlock_irq(&this_rq->lock);
-
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,
@@ -3630,6 +3621,15 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd,
 
 	cpus_setall(*cpus);
 
+	schedstat_inc(sd, lb_count[CPU_NEWLY_IDLE]);
+
+	/*
+	 * We are in a preempt-disabled section, so dropping the lock/irq
+	 * here simply means that other cores may acquire the lock,
+	 * and interrupts may occur.
+	 */
+	spin_unlock_irq(&this_rq->lock);
+
 	/*
 	 * When power savings policy is enabled for the parent domain, idle
 	 * sibling can pick up load irrespective of busy siblings. In this case,



* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 17:16   ` Gregory Haskins
@ 2008-06-30 18:10     ` Ingo Molnar
  2008-07-03 14:41     ` Ingo Molnar
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-06-30 18:10 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users


* Gregory Haskins <ghaskins@novell.com> wrote:

> > -tip testing found this boot hang:
> 
> I may have found the issue: It looks like the hunk that initially 
> disables interrupts in load_balance_newidle() was inadvertently 
> applied to load_balance() instead during the merge to linux-tip.  If 
> you fold the following patch into my original patch, it should set 
> things right again.

ah, that might have indeed happened. Good catch, i'll test it tomorrow.

	Ingo


* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-06-30 17:16   ` Gregory Haskins
  2008-06-30 18:10     ` Ingo Molnar
@ 2008-07-03 14:41     ` Ingo Molnar
  2008-07-03 15:12       ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2008-07-03 14:41 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users


* Gregory Haskins <ghaskins@novell.com> wrote:

> I may have found the issue: It looks like the hunk that initially 
> disables interrupts in load_balance_newidle() was inadvertently 
> applied to load_balance() instead during the merge to linux-tip.  If 
> you fold the following patch into my original patch, it should set 
> things right again.

ah, sorry - indeed! I've reactivated your patch and i'm testing it now.

	Ingo


* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-07-03 14:41     ` Ingo Molnar
@ 2008-07-03 15:12       ` Ingo Molnar
  2008-07-08 12:38         ` Gregory Haskins
  2008-07-08 16:45         ` Gregory Haskins
  0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2008-07-03 15:12 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
> > I may have found the issue: It looks like the hunk that initially 
> > disables interrupts in load_balance_newidle() was inadvertently 
> > applied to load_balance() instead during the merge to linux-tip.  If 
> > you fold the following patch into my original patch, it should set 
> > things right again.
> 
> ah, sorry - indeed! I've reactivated your patch and i'm testing it 
> now.

-tip testing found that it still hangs at:

 Intel machine check architecture supported.
 Intel machine check reporting enabled on CPU#1.
 x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
 CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
 Brought up 2 CPUs
 Total of 2 processors activated (8044.97 BogoMIPS).
 [...]

with this config:

  http://redhat.com/~mingo/misc/config-Thu_Jul__3_16_49_53_CEST_2008.bad

on 3 separate test-systems.

I've again reverted both your fixup and the original commit. I've pushed 
out the failing tree to the tmp.sched/devel.fe6149f5e82 -tip branch, you 
should be able to reproduce it.

	Ingo


* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-06-27 20:29 ` [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over Gregory Haskins
@ 2008-07-08  5:00   ` Nick Piggin
  2008-07-08 12:37     ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Piggin @ 2008-07-08  5:00 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: mingo, peterz, npiggin, rostedt, linux-kernel, linux-rt-users

On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
> Inspired by Peter Zijlstra.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>

What happened to the feedback I sent about this?

It is still nack from me.


> ---
>
>  kernel/sched.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 56722b1..0b9f90e 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2863,6 +2863,10 @@ static int move_tasks(struct rq *this_rq, int
> this_cpu, struct rq *busiest, max_load_move - total_load_moved,
>  				sd, idle, all_pinned, &this_best_prio);
>  		class = class->next;
> +
> +		if (idle == CPU_NEWLY_IDLE && this_rq->nr_running)
> +			break;
> +
>  	} while (class && max_load_move > total_load_moved);
>
>  	return total_load_moved > 0;
>
> --


* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-07-08  5:00   ` Nick Piggin
@ 2008-07-08 12:37     ` Gregory Haskins
  2008-07-09  8:09       ` Nick Piggin
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2008-07-08 12:37 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mingo, rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Tue, Jul 8, 2008 at  1:00 AM, in message
<200807081500.18245.nickpiggin@yahoo.com.au>, Nick Piggin
<nickpiggin@yahoo.com.au> wrote: 
> On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
>> Inspired by Peter Zijlstra.
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> 
> What happened to the feedback I sent about this?
> 
> It is still nack from me.

Ah yes.  Slipped through the cracks...sorry about that.

What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?

-Greg

> 
> 
>> ---
>>
>>  kernel/sched.c |    4 ++++
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 56722b1..0b9f90e 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -2863,6 +2863,10 @@ static int move_tasks(struct rq *this_rq, int
>> this_cpu, struct rq *busiest, max_load_move - total_load_moved,
>>  				sd, idle, all_pinned, &this_best_prio);
>>  		class = class->next;
>> +
>> +		if (idle == CPU_NEWLY_IDLE && this_rq->nr_running)
>> +			break;
>> +
>>  	} while (class && max_load_move > total_load_moved);
>>
>>  	return total_load_moved > 0;
>>
>> --





* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-07-03 15:12       ` Ingo Molnar
@ 2008-07-08 12:38         ` Gregory Haskins
  2008-07-08 16:45         ` Gregory Haskins
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-07-08 12:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Thu, Jul 3, 2008 at 11:12 AM, in message <20080703151209.GA17059@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> 
>> * Gregory Haskins <ghaskins@novell.com> wrote:
>> 
>> > I may have found the issue: It looks like the hunk that initially 
>> > disables interrupts in load_balance_newidle() was inadvertently 
>> > applied to load_balance() instead during the merge to linux-tip.  If 
>> > you fold the following patch into my original patch, it should set 
>> > things right again.
>> 
>> ah, sorry - indeed! I've reactivated your patch and i'm testing it 
>> now.
> 
> -tip testing found that it still hangs at:

Hmm.. my fix + your config booted for me at the time I submitted it.  I
will try to reproduce this issue with the new info.

Regards,
-Greg

> 
>  Intel machine check architecture supported.
>  Intel machine check reporting enabled on CPU#1.
>  x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
>  CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
>  Brought up 2 CPUs
>  Total of 2 processors activated (8044.97 BogoMIPS).
>  [...]
> 
> with this config:
> 
>   http://redhat.com/~mingo/misc/config-Thu_Jul__3_16_49_53_CEST_2008.bad
> 
> on 3 separate test-systems.
> 
> I've again reverted both your fixup and the original commit. I've pushed 
> out the failing tree to the tmp.sched/devel.fe6149f5e82 -tip branch, you 
> should be able to reproduce it.
> 
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html




* Re: [PATCH 0/3] sched: newidle and RT wake-buddy fixes
  2008-07-03 15:12       ` Ingo Molnar
  2008-07-08 12:38         ` Gregory Haskins
@ 2008-07-08 16:45         ` Gregory Haskins
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-07-08 16:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: rostedt, peterz, npiggin, linux-kernel, linux-rt-users

Hi Ingo,

>>> On Thu, Jul 3, 2008 at 11:12 AM, in message <20080703151209.GA17059@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> 
>> * Gregory Haskins <ghaskins@novell.com> wrote:
>> 
>> > I may have found the issue: It looks like the hunk that initially 
>> > disables interrupts in load_balance_newidle() was inadvertently 
>> > applied to load_balance() instead during the merge to linux-tip.  If 
>> > you fold the following patch into my original patch, it should set 
>> > things right again.
>> 
>> ah, sorry - indeed! I've reactivated your patch and i'm testing it 
>> now.
> 
> -tip testing found that it still hangs at:

I was not able to reproduce the hang by taking sched/devel/HEAD, my
patch, and your config (or any config, for that matter).  We will have
to back-burner this patch for now, until I can think of a way to figure
out what is wrong here.  Sorry for the troubles.

-Greg





* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-07-08 12:37     ` Gregory Haskins
@ 2008-07-09  8:09       ` Nick Piggin
  2008-07-09 10:53         ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Piggin @ 2008-07-09  8:09 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: mingo, rostedt, peterz, npiggin, linux-kernel, linux-rt-users

On Tuesday 08 July 2008 22:37, Gregory Haskins wrote:
> >>> On Tue, Jul 8, 2008 at  1:00 AM, in message
>
> <200807081500.18245.nickpiggin@yahoo.com.au>, Nick Piggin
>
> <nickpiggin@yahoo.com.au> wrote:
> > On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
> >> Inspired by Peter Zijlstra.
> >>
> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> >
> > What happened to the feedback I sent about this?
> >
> > It is still nack from me.
>
> Ah yes.  Slipped through the cracks...sorry about that.
>
> What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?

Isn't that exactly the same thing because any task will preempt the idle
thread?


* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-07-09  8:09       ` Nick Piggin
@ 2008-07-09 10:53         ` Gregory Haskins
  2008-07-09 11:17           ` Nick Piggin
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Haskins @ 2008-07-09 10:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mingo, rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Wed, Jul 9, 2008 at  4:09 AM, in message
<200807091809.52293.nickpiggin@yahoo.com.au>, Nick Piggin
<nickpiggin@yahoo.com.au> wrote: 
> On Tuesday 08 July 2008 22:37, Gregory Haskins wrote:
>> >>> On Tue, Jul 8, 2008 at  1:00 AM, in message
>>
>> <200807081500.18245.nickpiggin@yahoo.com.au>, Nick Piggin
>>
>> <nickpiggin@yahoo.com.au> wrote:
>> > On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
>> >> Inspired by Peter Zijlstra.
>> >>
>> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> >
>> > What happened to the feedback I sent about this?
>> >
>> > It is still nack from me.
>>
>> Ah yes.  Slipped through the cracks...sorry about that.
>>
>> What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?
> 
> Isn't that exactly the same thing

Not quite.  The former version would break on *any* successful enqueue
(as a result of a local move_tasks() as well as a remote
wake-up/migration).  The latter version will only break on the remote
variety.  You were concerned about stopping a move_tasks() operation
early because it would reduce efficiency, and I do not entirely
disagree.  However, this really only concerns the local type (which has
now been removed).

Remote preemptions should (IMO) always break immediately, because they
would likely have invalidated the f_b_g() calculation anyway, and
low-latency requirements dictate it's the right thing to do.

> because any task will preempt the idle thread?

During NEWIDLE this is a preempt-disabled section because we are still in the middle
of a schedule(). Therefore there will be no involuntary preemption and that is why we
are concerned with making sure we check for voluntary preemption.  The move_tasks()
will run to completion without this patch.  With this patch it will break the operation if
someone tries to preempt us.

I'll keep an open mind but I am pretty sure this is something we should be doing. As far
as I can tell, there should be no downside with this second version.  As a compromise
we could put an #ifdef CONFIG_PREEMPT around this new logic, but I don't think it
is strictly necessary.

Regards,
-Greg


* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-07-09 10:53         ` Gregory Haskins
@ 2008-07-09 11:17           ` Nick Piggin
  2008-07-09 11:53             ` Gregory Haskins
  0 siblings, 1 reply; 22+ messages in thread
From: Nick Piggin @ 2008-07-09 11:17 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: mingo, rostedt, peterz, npiggin, linux-kernel, linux-rt-users

On Wednesday 09 July 2008 20:53, Gregory Haskins wrote:
> >>> On Wed, Jul 9, 2008 at  4:09 AM, in message
>
> <200807091809.52293.nickpiggin@yahoo.com.au>, Nick Piggin
>
> <nickpiggin@yahoo.com.au> wrote:
> > On Tuesday 08 July 2008 22:37, Gregory Haskins wrote:
> >> >>> On Tue, Jul 8, 2008 at  1:00 AM, in message
> >>
> >> <200807081500.18245.nickpiggin@yahoo.com.au>, Nick Piggin
> >>
> >> <nickpiggin@yahoo.com.au> wrote:
> >> > On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
> >> >> Inspired by Peter Zijlstra.
> >> >>
> >> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> >> >
> >> > What happened to the feedback I sent about this?
> >> >
> >> > It is still nack from me.
> >>
> >> Ah yes.  Slipped through the cracks...sorry about that.
> >>
> >> What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?
> >
> > Isn't that exactly the same thing
>
> Not quite.  The former version would break on *any* successful enqueue
> (as a result of a local move_tasks() as well as a remote
> wake-up/migration).  The latter version will only break on the remote
> variety.  You were concerned about stopping a move_tasks() operation
> early because it would reduce efficiency, and I do not entirely
> disagree.  However, this really only concerns the local type (which
> has now been removed).
>
> Remote preemptions should (IMO) always break immediately, because they
> would likely have invalidated the f_b_g() calculation anyway, and
> low-latency requirements dictate it's the right thing to do.

I thought this was about newidle balancing? Tasks are always going to
be coming from remote runqueues, aren't they?


> > because any task will preempt the idle thread?
>
> During NEWIDLE this is a preempt-disabled section because we are still in
> the middle of a schedule(). Therefore there will be no involuntary
> preemption and that is why we are concerned with making sure we check for
> voluntary preemption.  The move_tasks() will run to completion without this
> patch.  With this patch it will break the operation if someone tries to
> preempt us.

Firstly, won't the act of pulling tasks set the need_resched condition?

Secondly, even if it does what you say, what exactly would be the difference
between blocking a newly moved task from running and blocking a newly woken
task from running? Either way you introduce the same worst case latency
condition.


> I'll keep an open mind but I am pretty sure this is something we should be
> doing. As far as I can tell, there should be no downside with this second
> version. 

I don't think it has really been thought through that well. So I'm still
against it.

> As a compromise we could put an #ifdef CONFIG_PREEMPT around this 
> new logic, but I don't think it is strictly necessary.

That's not very nice. It's reasonable to run with CONFIG_PREEMPT but not
blindly want to trade latency for throughput.


* Re: [PATCH 2/3] sched: terminate newidle balancing once at least one task has moved over
  2008-07-09 11:17           ` Nick Piggin
@ 2008-07-09 11:53             ` Gregory Haskins
  0 siblings, 0 replies; 22+ messages in thread
From: Gregory Haskins @ 2008-07-09 11:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mingo, rostedt, peterz, npiggin, linux-kernel, linux-rt-users

>>> On Wed, Jul 9, 2008 at  7:17 AM, in message
<200807092117.01669.nickpiggin@yahoo.com.au>, Nick Piggin
<nickpiggin@yahoo.com.au> wrote: 
> On Wednesday 09 July 2008 20:53, Gregory Haskins wrote:
>> >>> On Wed, Jul 9, 2008 at  4:09 AM, in message
>>
>> <200807091809.52293.nickpiggin@yahoo.com.au>, Nick Piggin
>>
>> <nickpiggin@yahoo.com.au> wrote:
>> > On Tuesday 08 July 2008 22:37, Gregory Haskins wrote:
>> >> >>> On Tue, Jul 8, 2008 at  1:00 AM, in message
>> >>
>> >> <200807081500.18245.nickpiggin@yahoo.com.au>, Nick Piggin
>> >>
>> >> <nickpiggin@yahoo.com.au> wrote:
>> >> > On Saturday 28 June 2008 06:29, Gregory Haskins wrote:
>> >> >> Inspired by Peter Zijlstra.
>> >> >>
>> >> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> >> >
>> >> > What happened to the feedback I sent about this?
>> >> >
>> >> > It is still nack from me.
>> >>
>> >> Ah yes.  Slipped through the cracks...sorry about that.
>> >>
>> >> What if we did "if (idle == CPU_NEWLY_IDLE && need_resched())" instead?
>> >
>> > Isn't that exactly the same thing
>>
>> Not quite.  The former version would break on *any* successful enqueue
>> (as a result of a local move_tasks() as well as a remote
>> wake-up/migration).  The latter version will only break on the remote
>> variety.  You were concerned about stopping a move_tasks() operation
>> early because it would reduce efficiency, and I do not entirely
>> disagree.  However, this really only concerns the local type (which
>> has now been removed).
>>
>> Remote preemptions should (IMO) always break immediately, because they
>> would likely have invalidated the f_b_g() calculation anyway, and
>> low-latency requirements dictate it's the right thing to do.
> 
> I thought this was about newidle balancing? Tasks are always going to
> be coming from remote runqueues, aren't they?

Yes, but you misunderstand me.  I am referring to "push" (remote moves
to us) versus "pull" (we move from remote).  During a move_tasks() we
sometimes have to drop the RQ lock in double_lock_balance().  This gives
a remote CPU a chance to grab the lock and potentially move tasks to us
as part of either a migration operation or a wake-up.

When this happens, several things should be noted: 1) it will change the
load "landscape" such that any previous computation in f_b_g() is
potentially invalid.  2) The task that was moved may be higher priority
and therefore should not have to wait for move_tasks() to finish moving
some arbitrary number of lower-priority tasks (and note that "lower
prio" is a high probability, since NEWIDLE only pulls CFS tasks, and
only RT tasks typically migrate like this).

Therefore, IMO it doesn't make sense to continue moving more load.  Just
stop and let the scheduler sort it out.  At the very least it needs to
recompute how much load to move.
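
(For reference, the lock-drop window comes from the ordered-locking
fallback in double_lock_balance(); roughly, from kernel/sched.c of that
era:

    if (unlikely(!spin_trylock(&busiest->lock))) {
            if (busiest < this_rq) {
                    /* the window: while this_rq->lock is released,
                     * a remote cpu can push tasks to us */
                    spin_unlock(&this_rq->lock);
                    spin_lock(&busiest->lock);
                    spin_lock(&this_rq->lock);
            } else
                    spin_lock(&busiest->lock);
    }

The lock ordering by rq address is what forces the release.)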

> 
> 
>> > because any task will preempt the idle thread?
>>
>> During NEWIDLE this is a preempt-disabled section because we are still in
>> the middle of a schedule(). Therefore there will be no involuntary
>> preemption and that is why we are concerned with making sure we check for
>> voluntary preemption.  The move_tasks() will run to completion without this
>> patch.  With this patch it will break the operation if someone tries to
>> preempt us.
> 
> Firstly, won't the act of pulling tasks set the need_resched condition?

Hmm.. indeed.  You are probably right about that, and I need some other
way to distinguish a task that was pushed to us from anything we might
have pulled.
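
(The flag comes from the pull path itself: the tail of pull_task() does
roughly

    check_preempt_curr(this_rq, p); /* pulled task preempts the idle
                                     * curr, raising need_resched */

so need_resched() cannot distinguish the two cases.)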

> 
> Secondly, even if it does what you say, what exactly would be the difference
> between blocking a newly moved task from running and blocking a newly woken
> task from running? Either way you introduce the same worst case latency
> condition.

Tasks that are pushed to us have a good chance of being RT (since RT is
a heavy user of "push" methods, while CFS is mostly pull).  Conversely,
tasks that are pulled to us by newidle are guaranteed *not* to be RT
(since newidle balancing will only pull CFS tasks).

Perhaps that is the answer: terminate on s/need_resched()/rq->rt.nr_running/.
It's not exactly scalable to an arbitrary arrangement of future
sched_classes, but that could be addressed when those sched_classes
become available.
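
Concretely, the termination check in move_tasks() would become something
like this (untested; field name as written above):

    /* an RT task was pushed to us while the lock was dropped;
     * stop pulling CFS load and let the scheduler sort it out */
    if (idle == CPU_NEWLY_IDLE && this_rq->rt.nr_running)
            break;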

To be fair, I think this is what Peter was trying to do with his more elaborate version of
patches that I based this one on.

> 
> 
>> I'll keep an open mind but I am pretty sure this is something we should be
>> doing. As far as I can tell, there should be no downside with this second
>> version. 
> 
> I don't think it has really been thought through that well. So I'm still
> against it.
> 
>> As a compromise we could put an #ifdef CONFIG_PREEMPT around this 
>> new logic, but I don't think it is strictly necessary.
> 
> That's not very nice. It's reasonable to run with CONFIG_PREEMPT but not
> blindly want to trade latency for throughput.

How do you come to this conclusion?  Continuing to perform a move under
these circumstances (or at least, my intended circumstances) operates on
stale data and could just as easily hurt throughput as help it.  Since
the move is essentially arbitrary once this threshold is crossed, even
the throughput becomes non-deterministic ;)

Regards,
-Greg



