* [PATCH] sched: Skip double execution of pick_next_task_fair
@ 2014-04-23 22:31 Tim Chen
  2014-04-24 10:00 ` Peter Zijlstra
From: Tim Chen @ 2014-04-23 22:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel, Andi Kleen, Len Brown

The current code will call pick_next_task_fair a second time
in the slow path if we did not pull any task in our first try.
This is really unnecessary as we already know no task can
be pulled and it doubles the delay for the cpu to enter idle.

We instrumented some network workloads and saw that
pick_next_task_fair is frequently called twice before a cpu enters idle.
The call to pick_next_task_fair can add non-trivial
latency as it calls load_balance, which runs find_busiest_group
on a hierarchy of sched domains spanning the cpus of a large system.  For
some 4-socket systems, we saw almost 0.25 msec spent per call
of pick_next_task_fair before a cpu can be idled.
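
Roughly, the redundant slow-path work looks like this (a simplified
call graph for this era's code, not an exact trace):

  pick_next_task()                 <- second, redundant invocation
    pick_next_task_fair()
      idle_balance()               <- cfs runqueue already known empty
        load_balance()             <- once per sched-domain level
          find_busiest_group()     <- scans every cpu the domain spans

At almost 0.25 msec per call, the two back-to-back invocations can
cost close to 0.5 msec before the cpu finally idles.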

This patch skips pick_next_task_fair in the slow path if it
has already been invoked.

Tim

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1d1b87b..4053437 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2583,6 +2583,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 {
 	const struct sched_class *class = &fair_sched_class;
 	struct task_struct *p;
+	int skip_fair = 0;
 
 	/*
 	 * Optimization: we know that if all tasks are in
@@ -2591,12 +2592,17 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 		p = fair_sched_class.pick_next_task(rq, prev);
-		if (likely(p && p != RETRY_TASK))
+		if (!p)
+			skip_fair = 1;
+		else if (likely(p != RETRY_TASK))
 			return p;
 	}
 
 again:
 	for_each_class(class) {
+		/* if we have already failed to pull fair task, skip */
+		if (class == &fair_sched_class && skip_fair)
+			continue;
 		p = class->pick_next_task(rq, prev);
 		if (p) {
 			if (unlikely(p == RETRY_TASK))
-- 
1.7.11.7




* Re: [PATCH] sched: Skip double execution of pick_next_task_fair
  2014-04-23 22:31 [PATCH] sched: Skip double execution of pick_next_task_fair Tim Chen
@ 2014-04-24 10:00 ` Peter Zijlstra
  2014-04-24 17:06   ` Tim Chen
  2014-05-08 10:42   ` [tip:sched/core] sched: Skip double execution of pick_next_task_fair() tip-bot for Peter Zijlstra
From: Peter Zijlstra @ 2014-04-24 10:00 UTC (permalink / raw)
  To: Tim Chen; +Cc: Ingo Molnar, linux-kernel, Andi Kleen, Len Brown

On Wed, Apr 23, 2014 at 03:31:57PM -0700, Tim Chen wrote:
> The current code will call pick_next_task_fair a second time
> in the slow path if we did not pull any task in our first try.
> This is really unnecessary as we already know no task can
> be pulled and it doubles the delay for the cpu to enter idle.
> 
> We instrumented some network workloads and saw that
> pick_next_task_fair is frequently called twice before a cpu enters idle.
> The call to pick_next_task_fair can add non-trivial
> latency as it calls load_balance, which runs find_busiest_group
> on a hierarchy of sched domains spanning the cpus of a large system.  For
> some 4-socket systems, we saw almost 0.25 msec spent per call
> of pick_next_task_fair before a cpu can be idled.
> 
> This patch skips pick_next_task_fair in the slow path if it
> has already been invoked.

How about something like so?

It's a little more contained.

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2636,8 +2636,14 @@ pick_next_task(struct rq *rq, struct tas
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 		p = fair_sched_class.pick_next_task(rq, prev);
-		if (likely(p && p != RETRY_TASK))
-			return p;
+		if (unlikely(p == RETRY_TASK))
+			goto again;
+
+		/* assumes fair_sched_class->next == idle_sched_class */
+		if (unlikely(!p))
+			p = pick_next_task_idle(rq, prev);
+
+		return p;
 	}
 
 again:
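
For reference, the "assumes fair_sched_class->next == idle_sched_class"
comment leans on how the scheduling classes are chained in this era's
kernel/sched/sched.h; a sketch (stop_sched_class heads the list only
with CONFIG_SMP):

  /*
   * Highest to lowest priority, linked through ->next:
   *   stop_sched_class -> dl_sched_class -> rt_sched_class
   *     -> fair_sched_class -> idle_sched_class -> NULL
   */
  #define sched_class_highest (&stop_sched_class)

  #define for_each_class(class) \
  	for (class = sched_class_highest; class; class = class->next)

So when the fast path gets NULL back from the fair class, idle is the
only class left below it, and picking from it directly is safe.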


* Re: [PATCH] sched: Skip double execution of pick_next_task_fair
  2014-04-24 10:00 ` Peter Zijlstra
@ 2014-04-24 17:06   ` Tim Chen
  2014-04-24 17:16     ` Peter Zijlstra
  2014-05-08 10:42   ` [tip:sched/core] sched: Skip double execution of pick_next_task_fair() tip-bot for Peter Zijlstra
From: Tim Chen @ 2014-04-24 17:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Andi Kleen, Len Brown

On Thu, 2014-04-24 at 12:00 +0200, Peter Zijlstra wrote:
> On Wed, Apr 23, 2014 at 03:31:57PM -0700, Tim Chen wrote:
> > The current code will call pick_next_task_fair a second time
> > in the slow path if we did not pull any task in our first try.
> > This is really unnecessary as we already know no task can
> > be pulled and it doubles the delay for the cpu to enter idle.
> > 
> > We instrumented some network workloads and saw that
> > pick_next_task_fair is frequently called twice before a cpu enters idle.
> > The call to pick_next_task_fair can add non-trivial
> > latency as it calls load_balance, which runs find_busiest_group
> > on a hierarchy of sched domains spanning the cpus of a large system.  For
> > some 4-socket systems, we saw almost 0.25 msec spent per call
> > of pick_next_task_fair before a cpu can be idled.
> > 
> > This patch skips pick_next_task_fair in the slow path if it
> > has already been invoked.
> 
> How about something like so?

Yes, this version is more concise.

> 
> It's a little more contained.
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2636,8 +2636,14 @@ pick_next_task(struct rq *rq, struct tas
>  	if (likely(prev->sched_class == class &&
>  		   rq->nr_running == rq->cfs.h_nr_running)) {
>  		p = fair_sched_class.pick_next_task(rq, prev);
> -		if (likely(p && p != RETRY_TASK))
> -			return p;
> +		if (unlikely(p == RETRY_TASK))
> +			goto again;
> +
> +		/* assumes fair_sched_class->next == idle_sched_class */
> +		if (unlikely(!p))
> +			p = pick_next_task_idle(rq, prev);
Should be:
			  p = idle_sched_class.pick_next_task(rq, prev);
> +
> +		return p;
>  	}
>  
>  again:

I'll respin the patch with these changes.

Thanks.

Tim




* Re: [PATCH] sched: Skip double execution of pick_next_task_fair
  2014-04-24 17:06   ` Tim Chen
@ 2014-04-24 17:16     ` Peter Zijlstra
From: Peter Zijlstra @ 2014-04-24 17:16 UTC (permalink / raw)
  To: Tim Chen; +Cc: Ingo Molnar, linux-kernel, Andi Kleen, Len Brown

On Thu, Apr 24, 2014 at 10:06:07AM -0700, Tim Chen wrote:
> Yes, this version is more concise.
> 
> > 
> > It's a little more contained.
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2636,8 +2636,14 @@ pick_next_task(struct rq *rq, struct tas
> >  	if (likely(prev->sched_class == class &&
> >  		   rq->nr_running == rq->cfs.h_nr_running)) {
> >  		p = fair_sched_class.pick_next_task(rq, prev);
> > -		if (likely(p && p != RETRY_TASK))
> > -			return p;
> > +		if (unlikely(p == RETRY_TASK))
> > +			goto again;
> > +
> > +		/* assumes fair_sched_class->next == idle_sched_class */
> > +		if (unlikely(!p))
> > +			p = pick_next_task_idle(rq, prev);
> Should be 
> 			  p = idle_sched_class.pick_next_task(rq, prev);

Indeed, already fixed that when my compiler complained.

> > +
> > +		return p;
> >  	}
> >  
> >  again:
> 
> I'll respin the patch with these changes.

No need; I have the following queued:

---
Subject: sched: Skip double execution of pick_next_task_fair
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 24 Apr 2014 12:00:47 +0200

Tim wrote:

The current code will call pick_next_task_fair a second time in the
slow path if we did not pull any task in our first try.  This is
really unnecessary as we already know no task can be pulled and it
doubles the delay for the cpu to enter idle.

We instrumented some network workloads and saw that
pick_next_task_fair is frequently called twice before a cpu enters
idle.  The call to pick_next_task_fair can add non-trivial latency as
it calls load_balance, which runs find_busiest_group on a hierarchy of
sched domains spanning the cpus of a large system.  For some 4-socket
systems, we saw almost 0.25 msec spent per call of pick_next_task_fair
before a cpu can be idled.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140424100047.GP11096@twins.programming.kicks-ass.net
---
---
 kernel/sched/core.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2605,8 +2605,14 @@ pick_next_task(struct rq *rq, struct tas
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 		p = fair_sched_class.pick_next_task(rq, prev);
-		if (likely(p && p != RETRY_TASK))
-			return p;
+		if (unlikely(p == RETRY_TASK))
+			goto again;
+
+		/* assumes fair_sched_class->next == idle_sched_class */
+		if (unlikely(!p))
+			p = idle_sched_class.pick_next_task(rq, prev);
+
+		return p;
 	}
 
 again:


* [tip:sched/core] sched: Skip double execution of pick_next_task_fair()
  2014-04-24 10:00 ` Peter Zijlstra
  2014-04-24 17:06   ` Tim Chen
@ 2014-05-08 10:42   ` tip-bot for Peter Zijlstra
From: tip-bot for Peter Zijlstra @ 2014-05-08 10:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, tim.c.chen, tglx, len.brown

Commit-ID:  6ccdc84b81a0a6c09a7f0427761d2f8cecfc2218
Gitweb:     http://git.kernel.org/tip/6ccdc84b81a0a6c09a7f0427761d2f8cecfc2218
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 24 Apr 2014 12:00:47 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 7 May 2014 11:51:35 +0200

sched: Skip double execution of pick_next_task_fair()

Tim wrote:

 "The current code will call pick_next_task_fair a second time in the
  slow path if we did not pull any task in our first try.  This is
  really unnecessary as we already know no task can be pulled and it
  doubles the delay for the cpu to enter idle.

  We instrumented some network workloads and saw that
  pick_next_task_fair is frequently called twice before a cpu enters
  idle.  The call to pick_next_task_fair can add non-trivial latency as
  it calls load_balance, which runs find_busiest_group on a hierarchy of
  sched domains spanning the cpus of a large system.  For some 4-socket
  systems, we saw almost 0.25 msec spent per call of pick_next_task_fair
  before a cpu can be idled."

Optimize the second call away for the common case and document the
dependency.

Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/20140424100047.GP11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e62c65a..28921ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2592,8 +2592,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
 		p = fair_sched_class.pick_next_task(rq, prev);
-		if (likely(p && p != RETRY_TASK))
-			return p;
+		if (unlikely(p == RETRY_TASK))
+			goto again;
+
+		/* assumes fair_sched_class->next == idle_sched_class */
+		if (unlikely(!p))
+			p = idle_sched_class.pick_next_task(rq, prev);
+
+		return p;
 	}
 
 again:
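
Had one wanted to enforce the documented dependency rather than just
comment it, an assertion along these lines would have done (an
illustrative sketch only, not part of this commit):

	/* assumes fair_sched_class->next == idle_sched_class */
	if (unlikely(!p)) {
		/* Fail loudly if a class ever lands between
		 * fair and idle. */
		BUG_ON(fair_sched_class.next != &idle_sched_class);
		p = idle_sched_class.pick_next_task(rq, prev);
	}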

