* [patch 0/12] sched: fastpath cycle recovery
@ 2010-03-11  9:49 Mike Galbraith
  2010-03-11  9:50 ` [patch 1/12] sched: ratelimit nohz Mike Galbraith
                   ` (11 more replies)
  0 siblings, 12 replies; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML

Hi Peter,

The following patchlets take a pinned pipe-test context switch frequency
in tip from 663KHz to 694KHz, and an unpinned instance from 450KHz to
540KHz.  With these applied to tip.today, I have zero 31-12->today
regressions, and even some modest progressions.
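
For reference, pipe-test is a simple two-task ping-pong over a pair of
pipes; a minimal sketch in that spirit (not the exact benchmark source,
loop count and timing left to the reader) looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	int ab[2], ba[2];		/* parent -> child, child -> parent */
	char c = 0;
	unsigned long i, loops = 1000000;

	if (pipe(ab) || pipe(ba))
		return 1;

	if (!fork()) {			/* child: echo every byte straight back */
		close(ab[1]);
		close(ba[0]);
		while (read(ab[0], &c, 1) == 1)
			write(ba[1], &c, 1);
		exit(0);
	}

	close(ab[0]);
	close(ba[1]);
	for (i = 0; i < loops; i++) {	/* each round trip is two context switches */
		write(ab[1], &c, 1);
		read(ba[0], &c, 1);
	}
	printf("%lu round trips done, time this to get switches/sec\n", loops);
	return 0;
}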

The biggest difference is made by the first patch.  We have a problem
with nohz when waking cross-cpu, which, given select_idle_sibling(), we
do quite a bit.  In testing netperf TCP_RR, hitting the nohz code on every
micro-idle was eating ~10% of throughput, making cross-cpu wakeups a
loser.  Combined, these patchlets turned netperf TCP_RR cross-cpu vs
affine from a big loser into a winner.

All of these are trivial, mostly axe murder, but cycles add up.

	-Mike



* Re: [patch 1/12] sched: ratelimit nohz
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
@ 2010-03-11  9:50 ` Mike Galbraith
  2010-03-11 18:30   ` [tip:sched/core] sched: Rate-limit nohz tip-bot for Mike Galbraith
  2010-03-11  9:51 ` [patch 2/12] sched: remove avg_wakeup Mike Galbraith
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:50 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: ratelimit nohz

Entering nohz code on every micro-idle is costing ~10% throughput for netperf
TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.

The higher the context switch rate, the more nohz entry costs.  With this patch
and some cycle recovery patches in my tree, the max cross-cpu context switch rate
is improved by ~16%, a large portion of which is this ratelimiting.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 include/linux/sched.h    |    6 ++++++
 kernel/sched.c           |   12 ++++++++++++
 kernel/time/tick-sched.c |    3 +++
 3 files changed, 21 insertions(+)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -271,11 +271,17 @@ extern cpumask_var_t nohz_cpu_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern int select_nohz_load_balancer(int cpu);
 extern int get_nohz_load_balancer(void);
+extern int nohz_ratelimit(int cpu);
 #else
 static inline int select_nohz_load_balancer(int cpu)
 {
 	return 0;
 }
+
+static inline int nohz_ratelimit(int cpu)
+{
+	return 0;
+}
 #endif
 
 /*
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -492,6 +492,7 @@ struct rq {
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
 #ifdef CONFIG_NO_HZ
+	u64 nohz_stamp;
 	unsigned char in_nohz_recently;
 #endif
 	/* capture load from *all* tasks on this cpu: */
@@ -1228,6 +1229,17 @@ void wake_up_idle_cpu(int cpu)
 	if (!tsk_is_polling(rq->idle))
 		smp_send_reschedule(cpu);
 }
+
+int nohz_ratelimit(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u64 diff = rq->clock - rq->nohz_stamp;
+
+	rq->nohz_stamp = rq->clock;
+
+	return diff < (NSEC_PER_SEC / HZ) >> 1;
+}
+
 #endif /* CONFIG_NO_HZ */
 
 static u64 sched_avg_period(void)
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -262,6 +262,9 @@ void tick_nohz_stop_sched_tick(int inidl
 		goto end;
 	}
 
+	if (nohz_ratelimit(cpu))
+		goto end;
+
 	ts->idle_calls++;
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
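
For a rough feel of the threshold in nohz_ratelimit() above: nohz entry is
skipped when the previous attempt was less than half a tick ago.  The HZ
values below are just common config choices, and the kernel of course uses
rq->clock rather than this userspace arithmetic (illustration only):

#include <stdio.h>

int main(void)
{
	const unsigned long nsec_per_sec = 1000000000UL;
	const unsigned long hz[] = { 100, 250, 1000 };
	unsigned int i;

	for (i = 0; i < 3; i++)
		printf("HZ=%-4lu tick=%8lu ns  ratelimit threshold=%8lu ns\n",
		       hz[i], nsec_per_sec / hz[i],
		       (nsec_per_sec / hz[i]) >> 1);
	return 0;
}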




* Re: [patch 2/12] sched: remove avg_wakeup
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
  2010-03-11  9:50 ` [patch 1/12] sched: ratelimit nohz Mike Galbraith
@ 2010-03-11  9:51 ` Mike Galbraith
  2010-03-11 18:30   ` [tip:sched/core] sched: Remove avg_wakeup tip-bot for Mike Galbraith
  2010-03-11  9:52 ` [patch 3/12] sched: remove avg_overlap Mike Galbraith
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:51 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove avg_wakeup

Testing the load which led to this heuristic (nfs4 kbuild) shows that it has
outlived its usefulness.  With intervening load balancing changes, I cannot
see any difference with/without, so recover those fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 include/linux/sched.h   |    3 ---
 kernel/sched.c          |   26 ++++----------------------
 kernel/sched_debug.c    |    1 -
 kernel/sched_fair.c     |   31 -------------------------------
 kernel/sched_features.h |    6 ------
 5 files changed, 4 insertions(+), 63 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1105,9 +1105,6 @@ struct sched_entity {
 
 	u64			nr_migrations;
 
-	u64			start_runtime;
-	u64			avg_wakeup;
-
 #ifdef CONFIG_SCHEDSTATS
 	u64			wait_start;
 	u64			wait_max;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1880,9 +1880,6 @@ static void update_avg(u64 *avg, u64 sam
 static void
 enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 {
-	if (wakeup)
-		p->se.start_runtime = p->se.sum_exec_runtime;
-
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, wakeup, head);
 	p->se.on_rq = 1;
@@ -1890,17 +1887,11 @@ enqueue_task(struct rq *rq, struct task_
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
-	if (sleep) {
-		if (p->se.last_wakeup) {
-			update_avg(&p->se.avg_overlap,
-				p->se.sum_exec_runtime - p->se.last_wakeup);
-			p->se.last_wakeup = 0;
-		} else {
-			update_avg(&p->se.avg_wakeup,
-				sysctl_sched_wakeup_granularity);
-		}
+	if (sleep && p->se.last_wakeup) {
+		update_avg(&p->se.avg_overlap,
+			p->se.sum_exec_runtime - p->se.last_wakeup);
+		p->se.last_wakeup = 0;
 	}
-
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2466,13 +2457,6 @@ out_activate:
 	 */
 	if (!in_interrupt()) {
 		struct sched_entity *se = &current->se;
-		u64 sample = se->sum_exec_runtime;
-
-		if (se->last_wakeup)
-			sample -= se->last_wakeup;
-		else
-			sample -= se->start_runtime;
-		update_avg(&se->avg_wakeup, sample);
 
 		se->last_wakeup = se->sum_exec_runtime;
 	}
@@ -2540,8 +2524,6 @@ static void __sched_fork(struct task_str
 	p->se.nr_migrations		= 0;
 	p->se.last_wakeup		= 0;
 	p->se.avg_overlap		= 0;
-	p->se.start_runtime		= 0;
-	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;
 
 #ifdef CONFIG_SCHEDSTATS
 	p->se.wait_start			= 0;
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -408,7 +408,6 @@ void proc_sched_show_task(struct task_st
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
 	PN(se.avg_overlap);
-	PN(se.avg_wakeup);
 
 	nr_switches = p->nvcsw + p->nivcsw;
 
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1591,42 +1591,11 @@ static int select_task_rq_fair(struct ta
 }
 #endif /* CONFIG_SMP */
 
-/*
- * Adaptive granularity
- *
- * se->avg_wakeup gives the average time a task runs until it does a wakeup,
- * with the limit of wakeup_gran -- when it never does a wakeup.
- *
- * So the smaller avg_wakeup is the faster we want this task to preempt,
- * but we don't want to treat the preemptee unfairly and therefore allow it
- * to run for at least the amount of time we'd like to run.
- *
- * NOTE: we use 2*avg_wakeup to increase the probability of actually doing one
- *
- * NOTE: we use *nr_running to scale with load, this nicely matches the
- *       degrading latency on load.
- */
-static unsigned long
-adaptive_gran(struct sched_entity *curr, struct sched_entity *se)
-{
-	u64 this_run = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
-	u64 expected_wakeup = 2*se->avg_wakeup * cfs_rq_of(se)->nr_running;
-	u64 gran = 0;
-
-	if (this_run < expected_wakeup)
-		gran = expected_wakeup - this_run;
-
-	return min_t(s64, gran, sysctl_sched_wakeup_granularity);
-}
-
 static unsigned long
 wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
 
-	if (cfs_rq_of(curr)->curr && sched_feat(ADAPTIVE_GRAN))
-		gran = adaptive_gran(curr, se);
-
 	/*
 	 * Since its curr running now, convert the gran from real-time
 	 * to virtual-time in his units.
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -31,12 +31,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * Compute wakeup_gran based on task behaviour, clipped to
- *  [0, sched_wakeup_gran_ns]
- */
-SCHED_FEAT(ADAPTIVE_GRAN, 1)
-
-/*
  * When converting the wakeup granularity to virtual time, do it such
  * that heavier tasks preempting a lighter task have an edge.
  */
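
For context on the heuristic being removed: avg_wakeup (like avg_overlap)
was folded with the 1/8-weight moving average of update_avg(), visible in
the hunk header above; the sched.c helper is essentially the averaging
sketched below (userspace toy, sample values invented):

#include <stdio.h>

typedef unsigned long long u64;
typedef long long s64;

static void update_avg(u64 *avg, u64 sample)
{
	s64 diff = sample - *avg;

	*avg += diff >> 3;		/* avg += (sample - avg) / 8 */
}

int main(void)
{
	u64 avg = 0;
	u64 samples[] = { 800000, 900000, 100000, 120000, 110000 };
	unsigned int i;

	for (i = 0; i < 5; i++) {
		update_avg(&avg, samples[i]);
		printf("sample %7llu ns -> avg %7llu ns\n", samples[i], avg);
	}
	return 0;
}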




* Re: [patch 3/12] sched: remove avg_overlap
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
  2010-03-11  9:50 ` [patch 1/12] sched: ratelimit nohz Mike Galbraith
  2010-03-11  9:51 ` [patch 2/12] sched: remove avg_wakeup Mike Galbraith
@ 2010-03-11  9:52 ` Mike Galbraith
  2010-03-11 18:31   ` [tip:sched/core] sched: Remove avg_overlap tip-bot for Mike Galbraith
  2010-03-11  9:53 ` [patch 4/12] sched: cleanup/optimize clock updates Mike Galbraith
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove avg_overlap

Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy
was detrimentally affected by cross-cpu wakeups, because on those wakeups we are
missing the necessary call to update_curr().  This can't be fixed without
increasing overhead in our already too fat fastpath.

Additionally, with recent load balancing changes making us prefer to place tasks
in an idle cache domain (which is good for compute bound loads), communicating
tasks suffer when a sync wakeup, which would enable affine placement, is turned
into a non-sync wakeup by SYNC_LESS.  With one task on the runqueue, wake_affine()
rejects the affine wakeup request, leaving the unfortunate task where it was
placed, taking frequent cache misses.

Remove it, and recover some fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 include/linux/sched.h   |    3 ---
 kernel/sched.c          |   33 ---------------------------------
 kernel/sched_debug.c    |    1 -
 kernel/sched_fair.c     |   18 ------------------
 kernel/sched_features.h |   16 ----------------
 5 files changed, 71 deletions(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1100,9 +1100,6 @@ struct sched_entity {
 	u64			vruntime;
 	u64			prev_sum_exec_runtime;
 
-	u64			last_wakeup;
-	u64			avg_overlap;
-
 	u64			nr_migrations;
 
 #ifdef CONFIG_SCHEDSTATS
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1887,11 +1887,6 @@ enqueue_task(struct rq *rq, struct task_
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
-	if (sleep && p->se.last_wakeup) {
-		update_avg(&p->se.avg_overlap,
-			p->se.sum_exec_runtime - p->se.last_wakeup);
-		p->se.last_wakeup = 0;
-	}
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2452,15 +2447,6 @@ out_activate:
 	activate_task(rq, p, 1);
 	success = 1;
 
-	/*
-	 * Only attribute actual wakeups done by this task.
-	 */
-	if (!in_interrupt()) {
-		struct sched_entity *se = &current->se;
-
-		se->last_wakeup = se->sum_exec_runtime;
-	}
-
 out_running:
 	trace_sched_wakeup(rq, p, success);
 	check_preempt_curr(rq, p, wake_flags);
@@ -2522,8 +2508,6 @@ static void __sched_fork(struct task_str
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
-	p->se.last_wakeup		= 0;
-	p->se.avg_overlap		= 0;
 
 #ifdef CONFIG_SCHEDSTATS
 	p->se.wait_start			= 0;
@@ -3623,23 +3607,6 @@ static inline void schedule_debug(struct
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->state == TASK_RUNNING) {
-		u64 runtime = prev->se.sum_exec_runtime;
-
-		runtime -= prev->se.prev_sum_exec_runtime;
-		runtime = min_t(u64, runtime, 2*sysctl_sched_migration_cost);
-
-		/*
-		 * In order to avoid avg_overlap growing stale when we are
-		 * indeed overlapping and hence not getting put to sleep, grow
-		 * the avg_overlap on preemption.
-		 *
-		 * We use the average preemption runtime because that
-		 * correlates to the amount of cache footprint a task can
-		 * build up.
-		 */
-		update_avg(&prev->se.avg_overlap, runtime);
-	}
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -407,7 +407,6 @@ void proc_sched_show_task(struct task_st
 	PN(se.exec_start);
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
-	PN(se.avg_overlap);
 
 	nr_switches = p->nvcsw + p->nivcsw;
 
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1240,7 +1240,6 @@ static inline unsigned long effective_lo
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
-	struct task_struct *curr = current;
 	unsigned long this_load, load;
 	int idx, this_cpu, prev_cpu;
 	unsigned long tl_per_task;
@@ -1255,18 +1254,6 @@ static int wake_affine(struct sched_doma
 	load	  = source_load(prev_cpu, idx);
 	this_load = target_load(this_cpu, idx);
 
-	if (sync) {
-	       if (sched_feat(SYNC_LESS) &&
-		   (curr->se.avg_overlap > sysctl_sched_migration_cost ||
-		    p->se.avg_overlap > sysctl_sched_migration_cost))
-		       sync = 0;
-	} else {
-		if (sched_feat(SYNC_MORE) &&
-		    (curr->se.avg_overlap < sysctl_sched_migration_cost &&
-		     p->se.avg_overlap < sysctl_sched_migration_cost))
-			sync = 1;
-	}
-
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
 	 * effect of the currently running task from the load
@@ -1710,11 +1697,6 @@ static void check_preempt_wakeup(struct
 	if (sched_feat(WAKEUP_SYNC) && sync)
 		goto preempt;
 
-	if (sched_feat(WAKEUP_OVERLAP) &&
-			se->avg_overlap < sysctl_sched_migration_cost &&
-			pse->avg_overlap < sysctl_sched_migration_cost)
-		goto preempt;
-
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -42,12 +42,6 @@ SCHED_FEAT(ASYM_GRAN, 1)
 SCHED_FEAT(WAKEUP_SYNC, 0)
 
 /*
- * Wakeup preempt based on task behaviour. Tasks that do not overlap
- * don't get preempted.
- */
-SCHED_FEAT(WAKEUP_OVERLAP, 0)
-
-/*
  * Use the SYNC wakeup hint, pipes and the likes use this to indicate
  * the remote end is likely to consume the data we just wrote, and
  * therefore has cache benefit from being placed on the same cpu, see
@@ -64,16 +58,6 @@ SCHED_FEAT(SYNC_WAKEUPS, 1)
 SCHED_FEAT(AFFINE_WAKEUPS, 1)
 
 /*
- * Weaken SYNC hint based on overlap
- */
-SCHED_FEAT(SYNC_LESS, 1)
-
-/*
- * Add SYNC hint based on overlap
- */
-SCHED_FEAT(SYNC_MORE, 0)
-
-/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.




* Re: [patch 4/12] sched: cleanup/optimize clock updates
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (2 preceding siblings ...)
  2010-03-11  9:52 ` [patch 3/12] sched: remove avg_overlap Mike Galbraith
@ 2010-03-11  9:53 ` Mike Galbraith
  2010-03-11 18:31   ` [tip:sched/core] sched: Cleanup/optimize " tip-bot for Mike Galbraith
  2010-03-11  9:54 ` [patch 5/12] sched: tweak sched_latency and min_granularity Mike Galbraith
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: cleanup and optimize clock updates

Now that we no longer depend on the clock being updated prior to enqueueing
on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock()
exactly where they are needed, i.e. on enqueue, dequeue and schedule events.

In the case of a freshly enqueued task immediately preempting, we can skip the
update during preemption, as the clock was just updated by the enqueue event.
We also save an unneeded call during a migratory wakeup by not updating the
previous runqueue, where update_curr() won't be invoked.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched.c      |   32 ++++++++++++++++----------------
 kernel/sched_fair.c |    2 --
 2 files changed, 16 insertions(+), 18 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -495,6 +495,8 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char in_nohz_recently;
 #endif
+	unsigned int skip_clock_update;
+
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
 	unsigned long nr_load_updates;
@@ -592,6 +594,13 @@ static inline
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 {
 	rq->curr->sched_class->check_preempt_curr(rq, p, flags);
+
+	/*
+	 * A queue event has occurred, and we're going to schedule.  In
+	 * this case, we can save a useless back to back clock update.
+	 */
+	if (test_tsk_need_resched(p))
+		rq->skip_clock_update = 1;
 }
 
 static inline int cpu_of(struct rq *rq)
@@ -626,7 +635,8 @@ static inline int cpu_of(struct rq *rq)
 
 inline void update_rq_clock(struct rq *rq)
 {
-	rq->clock = sched_clock_cpu(cpu_of(rq));
+	if (!rq->skip_clock_update)
+		rq->clock = sched_clock_cpu(cpu_of(rq));
 }
 
 /*
@@ -1782,8 +1792,6 @@ static void double_rq_lock(struct rq *rq
 			raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
 		}
 	}
-	update_rq_clock(rq1);
-	update_rq_clock(rq2);
 }
 
 /*
@@ -1880,6 +1888,7 @@ static void update_avg(u64 *avg, u64 sam
 static void
 enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 {
+	update_rq_clock(rq);
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, wakeup, head);
 	p->se.on_rq = 1;
@@ -1887,6 +1896,7 @@ enqueue_task(struct rq *rq, struct task_
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	update_rq_clock(rq);
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2366,7 +2376,6 @@ static int try_to_wake_up(struct task_st
 
 	smp_wmb();
 	rq = orig_rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 	if (!(p->state & state))
 		goto out;
 
@@ -2407,7 +2416,6 @@ static int try_to_wake_up(struct task_st
 
 	rq = cpu_rq(cpu);
 	raw_spin_lock(&rq->lock);
-	update_rq_clock(rq);
 
 	/*
 	 * We migrated the task without holding either rq->lock, however
@@ -2653,7 +2661,6 @@ void wake_up_new_task(struct task_struct
 
 	BUG_ON(p->state != TASK_WAKING);
 	p->state = TASK_RUNNING;
-	update_rq_clock(rq);
 	activate_task(rq, p, 0);
 	trace_sched_wakeup_new(rq, p, 1);
 	check_preempt_curr(rq, p, WF_FORK);
@@ -3607,6 +3614,9 @@ static inline void schedule_debug(struct
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
+	if (prev->se.on_rq)
+		update_rq_clock(rq);
+	rq->skip_clock_update = 0;
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
@@ -3669,7 +3679,6 @@ need_resched_nonpreemptible:
 		hrtick_clear(rq);
 
 	raw_spin_lock_irq(&rq->lock);
-	update_rq_clock(rq);
 	clear_tsk_need_resched(prev);
 
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
@@ -4226,7 +4235,6 @@ void rt_mutex_setprio(struct task_struct
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
 	rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 
 	oldprio = p->prio;
 	prev_class = p->sched_class;
@@ -4269,7 +4277,6 @@ void set_user_nice(struct task_struct *p
 	 * the task might be in the middle of scheduling on another CPU.
 	 */
 	rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 	/*
 	 * The RT priorities are set via sched_setscheduler(), but we still
 	 * allow the 'normal' nice value to be set - but as expected
@@ -4552,7 +4559,6 @@ recheck:
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		goto recheck;
 	}
-	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
@@ -5559,7 +5565,6 @@ void sched_idle_next(void)
 
 	__setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
 
-	update_rq_clock(rq);
 	activate_task(rq, p, 0);
 
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
@@ -5614,7 +5619,6 @@ static void migrate_dead_tasks(unsigned
 	for ( ; ; ) {
 		if (!rq->nr_running)
 			break;
-		update_rq_clock(rq);
 		next = pick_next_task(rq);
 		if (!next)
 			break;
@@ -5898,7 +5902,6 @@ migration_call(struct notifier_block *nf
 		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
-		update_rq_clock(rq);
 		deactivate_task(rq, rq->idle, 0);
 		__setscheduler(rq, rq->idle, SCHED_NORMAL, 0);
 		rq->idle->sched_class = &idle_sched_class;
@@ -7848,7 +7851,6 @@ static void normalize_task(struct rq *rq
 {
 	int on_rq;
 
-	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	if (on_rq)
 		deactivate_task(rq, p, 0);
@@ -8210,8 +8212,6 @@ void sched_move_task(struct task_struct
 
 	rq = task_rq_lock(tsk, &flags);
 
-	update_rq_clock(rq);
-
 	running = task_current(rq, tsk);
 	on_rq = tsk->se.on_rq;
 
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -3063,8 +3063,6 @@ static void active_load_balance(struct r
 
 	/* move a task from busiest_rq to target_rq */
 	double_lock_balance(busiest_rq, target_rq);
-	update_rq_clock(busiest_rq);
-	update_rq_clock(target_rq);
 
 	/* Search for an sd spanning us and the target CPU. */
 	for_each_domain(target_cpu, sd) {
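
A toy model of the skip_clock_update flag introduced above, showing the
back-to-back-update suppression in isolation (userspace sketch, the clock
source is obviously faked):

#include <stdio.h>

struct rq {
	unsigned long long clock;
	int skip_clock_update;
};

static unsigned long long fake_sched_clock(void)
{
	static unsigned long long now;

	return now += 1000;		/* pretend 1us passes per query */
}

static void update_rq_clock(struct rq *rq)
{
	if (!rq->skip_clock_update)
		rq->clock = fake_sched_clock();
}

int main(void)
{
	struct rq rq = { 0, 0 };

	update_rq_clock(&rq);		/* queue event updates the clock */
	printf("after enqueue:     clock=%llu\n", rq.clock);

	rq.skip_clock_update = 1;	/* check_preempt_curr() saw need_resched set */
	update_rq_clock(&rq);		/* the immediately following schedule() skips it */
	printf("entering schedule: clock=%llu (update skipped)\n", rq.clock);

	rq.skip_clock_update = 0;	/* put_prev_task() re-arms clock updates */
	update_rq_clock(&rq);
	printf("next event:        clock=%llu\n", rq.clock);
	return 0;
}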




* Re: [patch 5/12] sched: tweak sched_latency and min_granularity
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (3 preceding siblings ...)
  2010-03-11  9:53 ` [patch 4/12] sched: cleanup/optimize clock updates Mike Galbraith
@ 2010-03-11  9:54 ` Mike Galbraith
  2010-03-11 18:31   ` [tip:sched/core] sched: Tweak " tip-bot for Mike Galbraith
  2010-03-11  9:56 ` [patch 6/12] sched: fix select_idle_sibling() Mike Galbraith
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:54 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: tweak sched_latency and min_granularity

Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as
a second buddy pair arrives on the scene.  The cost is latency starting to climb
sooner; the benefit for tbench 8 on my Q6600 box is ~2%.  No detrimental
effects noted in normal desktop usage.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -35,8 +35,8 @@
  * (to see the precise effective timeslice length of your workload,
  *  run vmstat and monitor the context-switches (cs) field)
  */
-unsigned int sysctl_sched_latency = 5000000ULL;
-unsigned int normalized_sysctl_sched_latency = 5000000ULL;
+unsigned int sysctl_sched_latency = 6000000ULL;
+unsigned int normalized_sysctl_sched_latency = 6000000ULL;
 
 /*
  * The initial- and re-scaling of tunables is configurable
@@ -52,15 +52,15 @@ enum sched_tunable_scaling sysctl_sched_
 
 /*
  * Minimal preemption granularity for CPU-bound tasks:
- * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 2 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_min_granularity = 1000000ULL;
-unsigned int normalized_sysctl_sched_min_granularity = 1000000ULL;
+unsigned int sysctl_sched_min_granularity = 2000000ULL;
+unsigned int normalized_sysctl_sched_min_granularity = 2000000ULL;
 
 /*
  * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
  */
-static unsigned int sched_nr_latency = 5;
+static unsigned int sched_nr_latency = 3;
 
 /*
  * After fork, child runs first. If set to 0 (default) then
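
Rough effect of the retuning (a sketch of __sched_period()'s logic with the
old and new defaults, ignoring the (1 + ilog(ncpus)) scaling of the values):
the period stays at sched_latency until more than sched_nr_latency tasks are
runnable, then grows by min_granularity per runnable task.

#include <stdio.h>

#define NSEC_PER_MSEC 1000000ULL

static unsigned long long period(unsigned int nr_running,
				 unsigned long long latency,
				 unsigned long long min_gran,
				 unsigned int nr_latency)
{
	if (nr_running > nr_latency)
		return nr_running * min_gran;
	return latency;
}

int main(void)
{
	unsigned int nr;

	for (nr = 1; nr <= 6; nr++)
		printf("nr_running=%u  old period=%llums  new period=%llums\n", nr,
		       period(nr, 5 * NSEC_PER_MSEC, 1 * NSEC_PER_MSEC, 5) / NSEC_PER_MSEC,
		       period(nr, 6 * NSEC_PER_MSEC, 2 * NSEC_PER_MSEC, 3) / NSEC_PER_MSEC);
	return 0;
}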




* Re: [patch 6/12] sched: fix select_idle_sibling()
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (4 preceding siblings ...)
  2010-03-11  9:54 ` [patch 5/12] sched: tweak sched_latency and min_granularity Mike Galbraith
@ 2010-03-11  9:56 ` Mike Galbraith
  2010-03-11 18:32   ` [tip:sched/core] sched: Fix select_idle_sibling() tip-bot for Mike Galbraith
  2010-03-11  9:57 ` [patch 7/12] sched: remove NORMALIZED_SLEEPER Mike Galbraith
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: fix select_idle_sibling()

Don't bother with selection when the current cpu is idle.  Recent load
balancing changes also make it no longer necessary to check wake_affine()
success before returning the selected sibling, so we now always use it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1438,7 +1438,7 @@ static int select_task_rq_fair(struct ta
 	int cpu = smp_processor_id();
 	int prev_cpu = task_cpu(p);
 	int new_cpu = cpu;
-	int want_affine = 0;
+	int want_affine = 0, cpu_idle = !current->pid;
 	int want_sd = 1;
 	int sync = wake_flags & WF_SYNC;
 
@@ -1496,13 +1496,15 @@ static int select_task_rq_fair(struct ta
 			 * If there's an idle sibling in this domain, make that
 			 * the wake_affine target instead of the current cpu.
 			 */
-			if (tmp->flags & SD_SHARE_PKG_RESOURCES)
+			if (!cpu_idle && tmp->flags & SD_SHARE_PKG_RESOURCES)
 				target = select_idle_sibling(p, tmp, target);
 
 			if (target >= 0) {
 				if (tmp->flags & SD_WAKE_AFFINE) {
 					affine_sd = tmp;
 					want_affine = 0;
+					if (target != cpu)
+						cpu_idle = 1;
 				}
 				cpu = target;
 			}
@@ -1518,6 +1520,7 @@ static int select_task_rq_fair(struct ta
 			sd = tmp;
 	}
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
 	if (sched_feat(LB_SHARES_UPDATE)) {
 		/*
 		 * Pick the largest domain to update shares over
@@ -1531,9 +1534,12 @@ static int select_task_rq_fair(struct ta
 		if (tmp)
 			update_shares(tmp);
 	}
+#endif
 
-	if (affine_sd && wake_affine(affine_sd, p, sync))
-		return cpu;
+	if (affine_sd) {
+		if (cpu_idle || cpu == prev_cpu || wake_affine(affine_sd, p, sync))
+			return cpu;
+	}
 
 	while (sd) {
 		int load_idx = sd->forkexec_idx;




* Re: [patch 7/12] sched: remove NORMALIZED_SLEEPER
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (5 preceding siblings ...)
  2010-03-11  9:56 ` [patch 6/12] sched: fix select_idle_sibling() Mike Galbraith
@ 2010-03-11  9:57 ` Mike Galbraith
  2010-03-11 18:32   ` [tip:sched/core] sched: Remove NORMALIZED_SLEEPER tip-bot for Mike Galbraith
  2010-03-11  9:58 ` [patch 8/12] sched: remove FAIR_SLEEPERS feature Mike Galbraith
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove NORMALIZED_SLEEPER

This feature hasn't been enabled in a long time; remove the effectively dead code.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c     |   10 ----------
 kernel/sched_features.h |    7 -------
 2 files changed, 17 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -741,16 +741,6 @@ place_entity(struct cfs_rq *cfs_rq, stru
 		unsigned long thresh = sysctl_sched_latency;
 
 		/*
-		 * Convert the sleeper threshold into virtual time.
-		 * SCHED_IDLE is a special sub-class.  We care about
-		 * fairness only relative to other SCHED_IDLE tasks,
-		 * all of which have the same weight.
-		 */
-		if (sched_feat(NORMALIZED_SLEEPER) && (!entity_is_task(se) ||
-				 task_of(se)->policy != SCHED_IDLE))
-			thresh = calc_delta_fair(thresh, se);
-
-		/*
 		 * Halve their sleep time's effect, to allow
 		 * for a gentler effect of sleepers:
 		 */
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -13,13 +13,6 @@ SCHED_FEAT(FAIR_SLEEPERS, 1)
 SCHED_FEAT(GENTLE_FAIR_SLEEPERS, 1)
 
 /*
- * By not normalizing the sleep time, heavy tasks get an effective
- * longer period, and lighter task an effective shorter period they
- * are considered running.
- */
-SCHED_FEAT(NORMALIZED_SLEEPER, 0)
-
-/*
  * Place new tasks ahead so that they do not starve already running
  * tasks
  */




* Re: [patch 8/12] sched: remove FAIR_SLEEPERS feature
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (6 preceding siblings ...)
  2010-03-11  9:57 ` [patch 7/12] sched: remove NORMALIZED_SLEEPER Mike Galbraith
@ 2010-03-11  9:58 ` Mike Galbraith
  2010-03-11 18:32   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
  2010-03-11  9:59 ` [patch 9/12] sched: remove WAKEUP_SYNC feature Mike Galbraith
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove FAIR_SLEEPERS feature

Our preemption model relies too heavily on sleeper fairness to disable it
without dire consequences.  Remove the feature, and save a branch or two.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c     |    2 +-
 kernel/sched_features.h |    7 -------
 2 files changed, 1 insertion(+), 8 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -737,7 +737,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 		vruntime += sched_vslice(cfs_rq, se);
 
 	/* sleeps up to a single latency don't count. */
-	if (!initial && sched_feat(FAIR_SLEEPERS)) {
+	if (!initial) {
 		unsigned long thresh = sysctl_sched_latency;
 
 		/*
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -1,11 +1,4 @@
 /*
- * Disregards a certain amount of sleep time (sched_latency_ns) and
- * considers the task to be running during that period. This gives it
- * a service deficit on wakeup, allowing it to run sooner.
- */
-SCHED_FEAT(FAIR_SLEEPERS, 1)
-
-/*
  * Only give sleepers 50% of their service deficit. This allows
  * them to run sooner, but does not allow tons of sleepers to
  * rip the spread apart.
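
What the now-unconditional sleeper path above boils down to (a userspace
sketch of place_entity()'s sleeper branch, numbers invented): a waking task
is credited at most half a latency period behind min_vruntime (with
GENTLE_FAIR_SLEEPERS), and is never moved backwards from where it already was.

#include <stdio.h>

typedef unsigned long long u64;

static u64 max_vruntime(u64 a, u64 b)
{
	return a > b ? a : b;
}

int main(void)
{
	u64 min_vruntime = 100000000ULL;		/* 100ms of vruntime, made up */
	u64 thresh = 6000000ULL >> 1;			/* sched_latency, halved for gentle sleepers */
	u64 sleepers[] = { 40000000ULL, 99500000ULL };	/* long sleeper, short sleeper */
	unsigned int i;

	for (i = 0; i < 2; i++) {
		u64 place = max_vruntime(sleepers[i], min_vruntime - thresh);

		printf("se->vruntime %llu -> placed at %llu\n", sleepers[i], place);
	}
	return 0;
}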




* Re: [patch 9/12] sched: remove WAKEUP_SYNC feature
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (7 preceding siblings ...)
  2010-03-11  9:58 ` [patch 8/12] sched: remove FAIR_SLEEPERS feature Mike Galbraith
@ 2010-03-11  9:59 ` Mike Galbraith
  2010-03-11 18:32   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
  2010-03-11 10:01 ` [patch 11/12] sched: remove ASYM_GRAN feature Mike Galbraith
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11  9:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove WAKEUP_SYNC feature

This feature never earned its keep; remove it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c     |    4 ----
 kernel/sched_features.h |    5 -----
 2 files changed, 9 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1657,7 +1657,6 @@ static void check_preempt_wakeup(struct
 	struct task_struct *curr = rq->curr;
 	struct sched_entity *se = &curr->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
-	int sync = wake_flags & WF_SYNC;
 	int scale = cfs_rq->nr_running >= sched_nr_latency;
 
 	if (unlikely(rt_prio(p->prio)))
@@ -1690,9 +1689,6 @@ static void check_preempt_wakeup(struct
 	if (unlikely(curr->policy == SCHED_IDLE))
 		goto preempt;
 
-	if (sched_feat(WAKEUP_SYNC) && sync)
-		goto preempt;
-
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -23,11 +23,6 @@ SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(ASYM_GRAN, 1)
 
 /*
- * Always wakeup-preempt SYNC wakeups, see SYNC_WAKEUPS.
- */
-SCHED_FEAT(WAKEUP_SYNC, 0)
-
-/*
  * Use the SYNC wakeup hint, pipes and the likes use this to indicate
  * the remote end is likely to consume the data we just wrote, and
  * therefore has cache benefit from being placed on the same cpu, see




* Re: [patch 11/12] sched: remove ASYM_GRAN feature
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (8 preceding siblings ...)
  2010-03-11  9:59 ` [patch 9/12] sched: remove WAKEUP_SYNC feature Mike Galbraith
@ 2010-03-11 10:01 ` Mike Galbraith
  2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
  2010-03-11 10:03 ` [patch 10/12] sched: remove SYNC_WAKEUPS feature Mike Galbraith
  2010-03-11 10:04 ` [patch 12/12] sched: remove AFFINE_WAKEUPS feature Mike Galbraith
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11 10:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove ASYM_GRAN feature

This feature has been enabled for quite a while, ever since testing showed that
easing preemption for light tasks was harmful to high priority threads.

Remove the feature flag, keeping its behavior.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c     |   28 +++++++++++-----------------
 kernel/sched_features.h |    6 ------
 2 files changed, 11 insertions(+), 23 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1582,24 +1582,18 @@ wakeup_gran(struct sched_entity *curr, s
 	/*
 	 * Since its curr running now, convert the gran from real-time
 	 * to virtual-time in his units.
+	 *
+	 * By using 'se' instead of 'curr' we penalize light tasks, so
+	 * they get preempted easier. That is, if 'se' < 'curr' then
+	 * the resulting gran will be larger, therefore penalizing the
+	 * lighter, if otoh 'se' > 'curr' then the resulting gran will
+	 * be smaller, again penalizing the lighter task.
+	 *
+	 * This is especially important for buddies when the leftmost
+	 * task is higher priority than the buddy.
 	 */
-	if (sched_feat(ASYM_GRAN)) {
-		/*
-		 * By using 'se' instead of 'curr' we penalize light tasks, so
-		 * they get preempted easier. That is, if 'se' < 'curr' then
-		 * the resulting gran will be larger, therefore penalizing the
-		 * lighter, if otoh 'se' > 'curr' then the resulting gran will
-		 * be smaller, again penalizing the lighter task.
-		 *
-		 * This is especially important for buddies when the leftmost
-		 * task is higher priority than the buddy.
-		 */
-		if (unlikely(se->load.weight != NICE_0_LOAD))
-			gran = calc_delta_fair(gran, se);
-	} else {
-		if (unlikely(curr->load.weight != NICE_0_LOAD))
-			gran = calc_delta_fair(gran, curr);
-	}
+	if (unlikely(se->load.weight != NICE_0_LOAD))
+		gran = calc_delta_fair(gran, se);
 
 	return gran;
 }
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -17,12 +17,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * When converting the wakeup granularity to virtual time, do it such
- * that heavier tasks preempting a lighter task have an edge.
- */
-SCHED_FEAT(ASYM_GRAN, 1)
-
-/*
  * Use the SYNC wakeup hint, pipes and the likes use this to indicate
  * the remote end is likely to consume the data we just wrote, and
  * therefore has cache benefit from being placed on the same cpu, see
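
Numeric feel for the asymmetry being hard-coded above: the granularity is
scaled by NICE_0_LOAD over the wakee's weight (calc_delta_fair(), simplified
here without the kernel's fixed-point helpers), so a light wakee faces a
larger preemption bar and a heavy wakee a smaller one.  The weights below are
the usual nice-level load weights from the prio_to_weight table.

#include <stdio.h>

#define NICE_0_LOAD 1024UL

static unsigned long scale_gran(unsigned long gran, unsigned long weight)
{
	return gran * NICE_0_LOAD / weight;	/* simplified calc_delta_fair() */
}

int main(void)
{
	unsigned long gran = 1000000UL;			/* 1ms wakeup granularity */
	unsigned long weights[] = { 335, 1024, 3121 };	/* nice +5, nice 0, nice -5 */
	const char *name[] = { "nice +5", "nice  0", "nice -5" };
	unsigned int i;

	for (i = 0; i < 3; i++)
		printf("%s wakee: effective gran %lu ns\n",
		       name[i], scale_gran(gran, weights[i]));
	return 0;
}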




* Re: [patch 10/12] sched: remove SYNC_WAKEUPS feature
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (9 preceding siblings ...)
  2010-03-11 10:01 ` [patch 11/12] sched: remove ASYM_GRAN feature Mike Galbraith
@ 2010-03-11 10:03 ` Mike Galbraith
  2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
  2010-03-11 10:04 ` [patch 12/12] sched: remove AFFINE_WAKEUPS feature Mike Galbraith
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11 10:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove SYNC_WAKEUPS feature

Sync wakeups are critical functionality with a long history.  Remove the feature
flag; we don't need the branch or icache footprint.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched.c          |    3 ---
 kernel/sched_features.h |    8 --------
 2 files changed, 11 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2369,9 +2369,6 @@ static int try_to_wake_up(struct task_st
 	unsigned long flags;
 	struct rq *rq, *orig_rq;
 
-	if (!sched_feat(SYNC_WAKEUPS))
-		wake_flags &= ~WF_SYNC;
-
 	this_cpu = get_cpu();
 
 	smp_wmb();
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -17,14 +17,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * Use the SYNC wakeup hint, pipes and the likes use this to indicate
- * the remote end is likely to consume the data we just wrote, and
- * therefore has cache benefit from being placed on the same cpu, see
- * also AFFINE_WAKEUPS.
- */
-SCHED_FEAT(SYNC_WAKEUPS, 1)
-
-/*
  * Based on load and program behaviour, see if it makes sense to place
  * a newly woken task on the same cpu as the task that woke it --
  * improve cache locality. Typically used with SYNC wakeups as




* Re: [patch 12/12] sched: remove AFFINE_WAKEUPS feature
  2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
                   ` (10 preceding siblings ...)
  2010-03-11 10:03 ` [patch 10/12] sched: remove SYNC_WAKEUPS feature Mike Galbraith
@ 2010-03-11 10:04 ` Mike Galbraith
  2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
  11 siblings, 1 reply; 27+ messages in thread
From: Mike Galbraith @ 2010-03-11 10:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, LKML


sched: remove AFFINE_WAKEUPS feature

Disabling affine wakeups is too horrible to contemplate.  Remove the feature flag.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched_fair.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -1433,8 +1433,7 @@ static int select_task_rq_fair(struct ta
 	int sync = wake_flags & WF_SYNC;
 
 	if (sd_flag & SD_BALANCE_WAKE) {
-		if (sched_feat(AFFINE_WAKEUPS) &&
-		    cpumask_test_cpu(cpu, &p->cpus_allowed))
+		if (cpumask_test_cpu(cpu, &p->cpus_allowed))
 			want_affine = 1;
 		new_cpu = prev_cpu;
 	}




* [tip:sched/core] sched: Rate-limit nohz
  2010-03-11  9:50 ` [patch 1/12] sched: ratelimit nohz Mike Galbraith
@ 2010-03-11 18:30   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  39c0cbe2150cbd848a25ba6cdb271d1ad46818ad
Gitweb:     http://git.kernel.org/tip/39c0cbe2150cbd848a25ba6cdb271d1ad46818ad
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:13 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:49 +0100

sched: Rate-limit nohz

Entering nohz code on every micro-idle is costing ~10% throughput for netperf
TCP_RR when scheduling cross-cpu.  Rate limiting entry fixes this, but raises
ticks a bit.  On my Q6600, an idle box goes from ~85 interrupts/sec to 128.

The higher the context switch rate, the more nohz entry costs.  With this patch
and some cycle recovery patches in my tree, the max cross-cpu context switch rate
is improved by ~16%, a large portion of which is this ratelimiting.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301003.6785.28.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h    |    6 ++++++
 kernel/sched.c           |   12 ++++++++++++
 kernel/time/tick-sched.c |    3 +++
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8cc863d..13efe7d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -271,11 +271,17 @@ extern cpumask_var_t nohz_cpu_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
 extern int select_nohz_load_balancer(int cpu);
 extern int get_nohz_load_balancer(void);
+extern int nohz_ratelimit(int cpu);
 #else
 static inline int select_nohz_load_balancer(int cpu)
 {
 	return 0;
 }
+
+static inline int nohz_ratelimit(int cpu)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/kernel/sched.c b/kernel/sched.c
index a4aa071..60b1bbe 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -492,6 +492,7 @@ struct rq {
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
 #ifdef CONFIG_NO_HZ
+	u64 nohz_stamp;
 	unsigned char in_nohz_recently;
 #endif
 	/* capture load from *all* tasks on this cpu: */
@@ -1228,6 +1229,17 @@ void wake_up_idle_cpu(int cpu)
 	if (!tsk_is_polling(rq->idle))
 		smp_send_reschedule(cpu);
 }
+
+int nohz_ratelimit(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u64 diff = rq->clock - rq->nohz_stamp;
+
+	rq->nohz_stamp = rq->clock;
+
+	return diff < (NSEC_PER_SEC / HZ) >> 1;
+}
+
 #endif /* CONFIG_NO_HZ */
 
 static u64 sched_avg_period(void)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f992762..f25735a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -262,6 +262,9 @@ void tick_nohz_stop_sched_tick(int inidle)
 		goto end;
 	}
 
+	if (nohz_ratelimit(cpu))
+		goto end;
+
 	ts->idle_calls++;
 	/* Read jiffies and the time when jiffies were updated last */
 	do {


* [tip:sched/core] sched: Remove avg_wakeup
  2010-03-11  9:51 ` [patch 2/12] sched: remove avg_wakeup Mike Galbraith
@ 2010-03-11 18:30   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  b42e0c41a422a212ddea0666d5a3a0e3c35206db
Gitweb:     http://git.kernel.org/tip/b42e0c41a422a212ddea0666d5a3a0e3c35206db
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:15:38 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:50 +0100

sched: Remove avg_wakeup

Testing the load which led to this heuristic (nfs4 kbuild) shows that it has
outlived its usefulness.  With intervening load balancing changes, I cannot
see any difference with/without, so recover those fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301062.6785.29.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h   |    3 ---
 kernel/sched.c          |   26 ++++----------------------
 kernel/sched_debug.c    |    1 -
 kernel/sched_fair.c     |   31 -------------------------------
 kernel/sched_features.h |    6 ------
 5 files changed, 4 insertions(+), 63 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 13efe7d..70c560f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1185,9 +1185,6 @@ struct sched_entity {
 
 	u64			nr_migrations;
 
-	u64			start_runtime;
-	u64			avg_wakeup;
-
 #ifdef CONFIG_SCHEDSTATS
 	struct sched_statistics statistics;
 #endif
diff --git a/kernel/sched.c b/kernel/sched.c
index 60b1bbe..35a8626 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1880,9 +1880,6 @@ static void update_avg(u64 *avg, u64 sample)
 static void
 enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 {
-	if (wakeup)
-		p->se.start_runtime = p->se.sum_exec_runtime;
-
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, wakeup, head);
 	p->se.on_rq = 1;
@@ -1890,17 +1887,11 @@ enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
-	if (sleep) {
-		if (p->se.last_wakeup) {
-			update_avg(&p->se.avg_overlap,
-				p->se.sum_exec_runtime - p->se.last_wakeup);
-			p->se.last_wakeup = 0;
-		} else {
-			update_avg(&p->se.avg_wakeup,
-				sysctl_sched_wakeup_granularity);
-		}
+	if (sleep && p->se.last_wakeup) {
+		update_avg(&p->se.avg_overlap,
+			p->se.sum_exec_runtime - p->se.last_wakeup);
+		p->se.last_wakeup = 0;
 	}
-
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2466,13 +2457,6 @@ out_activate:
 	 */
 	if (!in_interrupt()) {
 		struct sched_entity *se = &current->se;
-		u64 sample = se->sum_exec_runtime;
-
-		if (se->last_wakeup)
-			sample -= se->last_wakeup;
-		else
-			sample -= se->start_runtime;
-		update_avg(&se->avg_wakeup, sample);
 
 		se->last_wakeup = se->sum_exec_runtime;
 	}
@@ -2540,8 +2524,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.nr_migrations		= 0;
 	p->se.last_wakeup		= 0;
 	p->se.avg_overlap		= 0;
-	p->se.start_runtime		= 0;
-	p->se.avg_wakeup		= sysctl_sched_wakeup_granularity;
 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index ad9df44..20b95a4 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -408,7 +408,6 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
 	PN(se.avg_overlap);
-	PN(se.avg_wakeup);
 
 	nr_switches = p->nvcsw + p->nivcsw;
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8ad164b..6fc6285 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1592,42 +1592,11 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 }
 #endif /* CONFIG_SMP */
 
-/*
- * Adaptive granularity
- *
- * se->avg_wakeup gives the average time a task runs until it does a wakeup,
- * with the limit of wakeup_gran -- when it never does a wakeup.
- *
- * So the smaller avg_wakeup is the faster we want this task to preempt,
- * but we don't want to treat the preemptee unfairly and therefore allow it
- * to run for at least the amount of time we'd like to run.
- *
- * NOTE: we use 2*avg_wakeup to increase the probability of actually doing one
- *
- * NOTE: we use *nr_running to scale with load, this nicely matches the
- *       degrading latency on load.
- */
-static unsigned long
-adaptive_gran(struct sched_entity *curr, struct sched_entity *se)
-{
-	u64 this_run = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
-	u64 expected_wakeup = 2*se->avg_wakeup * cfs_rq_of(se)->nr_running;
-	u64 gran = 0;
-
-	if (this_run < expected_wakeup)
-		gran = expected_wakeup - this_run;
-
-	return min_t(s64, gran, sysctl_sched_wakeup_granularity);
-}
-
 static unsigned long
 wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 {
 	unsigned long gran = sysctl_sched_wakeup_granularity;
 
-	if (cfs_rq_of(curr)->curr && sched_feat(ADAPTIVE_GRAN))
-		gran = adaptive_gran(curr, se);
-
 	/*
 	 * Since its curr running now, convert the gran from real-time
 	 * to virtual-time in his units.
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index d5059fd..96ef5db 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -31,12 +31,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * Compute wakeup_gran based on task behaviour, clipped to
- *  [0, sched_wakeup_gran_ns]
- */
-SCHED_FEAT(ADAPTIVE_GRAN, 1)
-
-/*
  * When converting the wakeup granularity to virtual time, do it such
  * that heavier tasks preempting a lighter task have an edge.
  */


* [tip:sched/core] sched: Remove avg_overlap
  2010-03-11  9:52 ` [patch 3/12] sched: remove avg_overlap Mike Galbraith
@ 2010-03-11 18:31   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  e12f31d3e5d36328c7fbd0fce40a95e70b59152c
Gitweb:     http://git.kernel.org/tip/e12f31d3e5d36328c7fbd0fce40a95e70b59152c
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:15:51 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:50 +0100

sched: Remove avg_overlap

Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy
was detrimentally affected by cross-cpu wakeups, because on those wakeups we are
missing the necessary call to update_curr().  This can't be fixed without
increasing overhead in our already too fat fastpath.

Additionally, with recent load balancing changes making us prefer to place tasks
in an idle cache domain (which is good for compute bound loads), communicating
tasks suffer when a sync wakeup, which would enable affine placement, is turned
into a non-sync wakeup by SYNC_LESS.  With one task on the runqueue, wake_affine()
rejects the affine wakeup request, leaving the unfortunate task where it was
placed, taking frequent cache misses.

Remove it, and recover some fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301121.6785.30.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/sched.h   |    3 ---
 kernel/sched.c          |   33 ---------------------------------
 kernel/sched_debug.c    |    1 -
 kernel/sched_fair.c     |   18 ------------------
 kernel/sched_features.h |   16 ----------------
 5 files changed, 0 insertions(+), 71 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 70c560f..8604884 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1180,9 +1180,6 @@ struct sched_entity {
 	u64			vruntime;
 	u64			prev_sum_exec_runtime;
 
-	u64			last_wakeup;
-	u64			avg_overlap;
-
 	u64			nr_migrations;
 
 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched.c b/kernel/sched.c
index 35a8626..68ed6f4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1887,11 +1887,6 @@ enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
-	if (sleep && p->se.last_wakeup) {
-		update_avg(&p->se.avg_overlap,
-			p->se.sum_exec_runtime - p->se.last_wakeup);
-		p->se.last_wakeup = 0;
-	}
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2452,15 +2447,6 @@ out_activate:
 	activate_task(rq, p, 1);
 	success = 1;
 
-	/*
-	 * Only attribute actual wakeups done by this task.
-	 */
-	if (!in_interrupt()) {
-		struct sched_entity *se = &current->se;
-
-		se->last_wakeup = se->sum_exec_runtime;
-	}
-
 out_running:
 	trace_sched_wakeup(rq, p, success);
 	check_preempt_curr(rq, p, wake_flags);
@@ -2522,8 +2508,6 @@ static void __sched_fork(struct task_struct *p)
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
-	p->se.last_wakeup		= 0;
-	p->se.avg_overlap		= 0;
 
 #ifdef CONFIG_SCHEDSTATS
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
@@ -3594,23 +3578,6 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
-	if (prev->state == TASK_RUNNING) {
-		u64 runtime = prev->se.sum_exec_runtime;
-
-		runtime -= prev->se.prev_sum_exec_runtime;
-		runtime = min_t(u64, runtime, 2*sysctl_sched_migration_cost);
-
-		/*
-		 * In order to avoid avg_overlap growing stale when we are
-		 * indeed overlapping and hence not getting put to sleep, grow
-		 * the avg_overlap on preemption.
-		 *
-		 * We use the average preemption runtime because that
-		 * correlates to the amount of cache footprint a task can
-		 * build up.
-		 */
-		update_avg(&prev->se.avg_overlap, runtime);
-	}
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 20b95a4..8a46a71 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -407,7 +407,6 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
 	PN(se.exec_start);
 	PN(se.vruntime);
 	PN(se.sum_exec_runtime);
-	PN(se.avg_overlap);
 
 	nr_switches = p->nvcsw + p->nivcsw;
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 6fc6285..c3b69d4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1241,7 +1241,6 @@ static inline unsigned long effective_load(struct task_group *tg, int cpu,
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 {
-	struct task_struct *curr = current;
 	unsigned long this_load, load;
 	int idx, this_cpu, prev_cpu;
 	unsigned long tl_per_task;
@@ -1256,18 +1255,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
 	load	  = source_load(prev_cpu, idx);
 	this_load = target_load(this_cpu, idx);
 
-	if (sync) {
-	       if (sched_feat(SYNC_LESS) &&
-		   (curr->se.avg_overlap > sysctl_sched_migration_cost ||
-		    p->se.avg_overlap > sysctl_sched_migration_cost))
-		       sync = 0;
-	} else {
-		if (sched_feat(SYNC_MORE) &&
-		    (curr->se.avg_overlap < sysctl_sched_migration_cost &&
-		     p->se.avg_overlap < sysctl_sched_migration_cost))
-			sync = 1;
-	}
-
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
 	 * effect of the currently running task from the load
@@ -1711,11 +1698,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	if (sched_feat(WAKEUP_SYNC) && sync)
 		goto preempt;
 
-	if (sched_feat(WAKEUP_OVERLAP) &&
-			se->avg_overlap < sysctl_sched_migration_cost &&
-			pse->avg_overlap < sysctl_sched_migration_cost)
-		goto preempt;
-
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 96ef5db..c545e04 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -42,12 +42,6 @@ SCHED_FEAT(ASYM_GRAN, 1)
 SCHED_FEAT(WAKEUP_SYNC, 0)
 
 /*
- * Wakeup preempt based on task behaviour. Tasks that do not overlap
- * don't get preempted.
- */
-SCHED_FEAT(WAKEUP_OVERLAP, 0)
-
-/*
  * Use the SYNC wakeup hint, pipes and the likes use this to indicate
  * the remote end is likely to consume the data we just wrote, and
  * therefore has cache benefit from being placed on the same cpu, see
@@ -64,16 +58,6 @@ SCHED_FEAT(SYNC_WAKEUPS, 1)
 SCHED_FEAT(AFFINE_WAKEUPS, 1)
 
 /*
- * Weaken SYNC hint based on overlap
- */
-SCHED_FEAT(SYNC_LESS, 1)
-
-/*
- * Add SYNC hint based on overlap
- */
-SCHED_FEAT(SYNC_MORE, 0)
-
-/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.
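
For reference, the bookkeeping being removed boils down to a simple running
average of "how long did I run after waking someone before sleeping myself".
A minimal userspace sketch of that accounting follows; it assumes update_avg()
is the kernel's usual "avg += (sample - avg) / 8" running average (its body is
not shown in this patch), and the 50us samples are made up purely for
illustration.

	#include <stdio.h>

	typedef unsigned long long u64;

	/* same shape as the kernel's update_avg(): avg += (sample - avg) / 8 */
	static void update_avg(u64 *avg, u64 sample)
	{
		long long diff = (long long)sample - (long long)*avg;
		*avg += diff >> 3;
	}

	int main(void)
	{
		u64 avg_overlap = 0, last_wakeup, sum_exec_runtime = 0;
		int i;

		for (i = 0; i < 16; i++) {
			last_wakeup = sum_exec_runtime;	/* se->last_wakeup = se->sum_exec_runtime */
			sum_exec_runtime += 50000;	/* ran 50us (in ns) before sleeping */
			update_avg(&avg_overlap, sum_exec_runtime - last_wakeup);
		}
		printf("avg_overlap converges toward %llu ns\n", avg_overlap);
		return 0;
	}

With affine wakeups, the waker never runs long enough between waking the
partner and being switched away for that sample to be taken at all, which is
the accuracy problem the changelog describes.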

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Cleanup/optimize clock updates
  2010-03-11  9:53 ` [patch 4/12] sched: cleanup/optimize clock updates Mike Galbraith
@ 2010-03-11 18:31   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  a64692a3afd85fe048551ab89142fd5ca99a0dbd
Gitweb:     http://git.kernel.org/tip/a64692a3afd85fe048551ab89142fd5ca99a0dbd
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:16:20 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:50 +0100

sched: Cleanup/optimize clock updates

Now that we no longer depend on the clock being updated prior to enqueueing
on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock()
exactly where they are needed, ie on enqueue, dequeue and schedule events.

In the case of a freshly enqueued task immediately preempting, we can skip the
update during preemption, as the clock was just updated by the enqueue event.
We also save an unneeded call during a migratory wakeup by not updating the
previous runqueue, where update_curr() won't be invoked.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301199.6785.32.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c      |   32 ++++++++++++++++----------------
 kernel/sched_fair.c |    2 --
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 68ed6f4..16559de 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -495,6 +495,8 @@ struct rq {
 	u64 nohz_stamp;
 	unsigned char in_nohz_recently;
 #endif
+	unsigned int skip_clock_update;
+
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
 	unsigned long nr_load_updates;
@@ -592,6 +594,13 @@ static inline
 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 {
 	rq->curr->sched_class->check_preempt_curr(rq, p, flags);
+
+	/*
+	 * A queue event has occurred, and we're going to schedule.  In
+	 * this case, we can save a useless back to back clock update.
+	 */
+	if (test_tsk_need_resched(p))
+		rq->skip_clock_update = 1;
 }
 
 static inline int cpu_of(struct rq *rq)
@@ -626,7 +635,8 @@ static inline int cpu_of(struct rq *rq)
 
 inline void update_rq_clock(struct rq *rq)
 {
-	rq->clock = sched_clock_cpu(cpu_of(rq));
+	if (!rq->skip_clock_update)
+		rq->clock = sched_clock_cpu(cpu_of(rq));
 }
 
 /*
@@ -1782,8 +1792,6 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
 			raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
 		}
 	}
-	update_rq_clock(rq1);
-	update_rq_clock(rq2);
 }
 
 /*
@@ -1880,6 +1888,7 @@ static void update_avg(u64 *avg, u64 sample)
 static void
 enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 {
+	update_rq_clock(rq);
 	sched_info_queued(p);
 	p->sched_class->enqueue_task(rq, p, wakeup, head);
 	p->se.on_rq = 1;
@@ -1887,6 +1896,7 @@ enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, bool head)
 
 static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
 {
+	update_rq_clock(rq);
 	sched_info_dequeued(p);
 	p->sched_class->dequeue_task(rq, p, sleep);
 	p->se.on_rq = 0;
@@ -2366,7 +2376,6 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 
 	smp_wmb();
 	rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 	if (!(p->state & state))
 		goto out;
 
@@ -2407,7 +2416,6 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 
 	rq = cpu_rq(cpu);
 	raw_spin_lock(&rq->lock);
-	update_rq_clock(rq);
 
 	/*
 	 * We migrated the task without holding either rq->lock, however
@@ -2624,7 +2632,6 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
 
 	BUG_ON(p->state != TASK_WAKING);
 	p->state = TASK_RUNNING;
-	update_rq_clock(rq);
 	activate_task(rq, p, 0);
 	trace_sched_wakeup_new(rq, p, 1);
 	check_preempt_curr(rq, p, WF_FORK);
@@ -3578,6 +3585,9 @@ static inline void schedule_debug(struct task_struct *prev)
 
 static void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
+	if (prev->se.on_rq)
+		update_rq_clock(rq);
+	rq->skip_clock_update = 0;
 	prev->sched_class->put_prev_task(rq, prev);
 }
 
@@ -3640,7 +3650,6 @@ need_resched_nonpreemptible:
 		hrtick_clear(rq);
 
 	raw_spin_lock_irq(&rq->lock);
-	update_rq_clock(rq);
 	clear_tsk_need_resched(prev);
 
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
@@ -4197,7 +4206,6 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
 	BUG_ON(prio < 0 || prio > MAX_PRIO);
 
 	rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 
 	oldprio = p->prio;
 	prev_class = p->sched_class;
@@ -4240,7 +4248,6 @@ void set_user_nice(struct task_struct *p, long nice)
 	 * the task might be in the middle of scheduling on another CPU.
 	 */
 	rq = task_rq_lock(p, &flags);
-	update_rq_clock(rq);
 	/*
 	 * The RT priorities are set via sched_setscheduler(), but we still
 	 * allow the 'normal' nice value to be set - but as expected
@@ -4523,7 +4530,6 @@ recheck:
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 		goto recheck;
 	}
-	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	running = task_current(rq, p);
 	if (on_rq)
@@ -5530,7 +5536,6 @@ void sched_idle_next(void)
 
 	__setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
 
-	update_rq_clock(rq);
 	activate_task(rq, p, 0);
 
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
@@ -5585,7 +5590,6 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
 	for ( ; ; ) {
 		if (!rq->nr_running)
 			break;
-		update_rq_clock(rq);
 		next = pick_next_task(rq);
 		if (!next)
 			break;
@@ -5869,7 +5873,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		rq->migration_thread = NULL;
 		/* Idle task back to normal (off runqueue, low prio) */
 		raw_spin_lock_irq(&rq->lock);
-		update_rq_clock(rq);
 		deactivate_task(rq, rq->idle, 0);
 		__setscheduler(rq, rq->idle, SCHED_NORMAL, 0);
 		rq->idle->sched_class = &idle_sched_class;
@@ -7815,7 +7818,6 @@ static void normalize_task(struct rq *rq, struct task_struct *p)
 {
 	int on_rq;
 
-	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	if (on_rq)
 		deactivate_task(rq, p, 0);
@@ -8177,8 +8179,6 @@ void sched_move_task(struct task_struct *tsk)
 
 	rq = task_rq_lock(tsk, &flags);
 
-	update_rq_clock(rq);
-
 	running = task_current(rq, tsk);
 	on_rq = tsk->se.on_rq;
 
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index c3b69d4..69e5820 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3064,8 +3064,6 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
 
 	/* move a task from busiest_rq to target_rq */
 	double_lock_balance(busiest_rq, target_rq);
-	update_rq_clock(busiest_rq);
-	update_rq_clock(target_rq);
 
 	/* Search for an sd spanning us and the target CPU. */
 	for_each_domain(target_cpu, sd) {
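
The core of the change is a one-bit "the clock is already fresh" hint: set it
when a queue event has already decided we are about to schedule, honor it in
update_rq_clock(), clear it once the next task is picked.  The toy program
below only mirrors that flag handling; struct rq and fake_sched_clock() are
stand-ins, not kernel code.

	#include <stdio.h>

	/* toy stand-ins: the real rq and sched_clock_cpu() live in kernel/sched.c */
	struct rq {
		unsigned long long clock;
		unsigned int skip_clock_update;
	};

	static unsigned long long fake_sched_clock(void)
	{
		static unsigned long long now;
		return now += 1000;			/* pretend 1us passes per query */
	}

	static void update_rq_clock(struct rq *rq)
	{
		if (!rq->skip_clock_update)		/* same test the patch adds */
			rq->clock = fake_sched_clock();
	}

	int main(void)
	{
		struct rq rq = { 0, 0 };

		update_rq_clock(&rq);			/* enqueue event freshens the clock */
		rq.skip_clock_update = 1;		/* queue event set need_resched: schedule() is imminent */
		update_rq_clock(&rq);			/* the back-to-back update is skipped */
		printf("clock while skipping: %llu\n", rq.clock);
		rq.skip_clock_update = 0;		/* cleared again, as in put_prev_task() */
		update_rq_clock(&rq);
		printf("clock after schedule: %llu\n", rq.clock);
		return 0;
	}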

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Tweak sched_latency and min_granularity
  2010-03-11  9:54 ` [patch 5/12] sched: tweak sched_latency and min_granularity Mike Galbraith
@ 2010-03-11 18:31   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  21406928afe43f1db6acab4931bb8c886f4d04ce
Gitweb:     http://git.kernel.org/tip/21406928afe43f1db6acab4931bb8c886f4d04ce
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:15 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:51 +0100

sched: Tweak sched_latency and min_granularity

Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as
a second buddy pair arrives on the scene.  The cost is that latency starts to
climb sooner; the benefit for tbench 8 on my Q6600 box is ~2%.  No detrimental
effects noted in normal desktop usage.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301285.6785.34.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 69e5820..d19df5b 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -35,8 +35,8 @@
  * (to see the precise effective timeslice length of your workload,
  *  run vmstat and monitor the context-switches (cs) field)
  */
-unsigned int sysctl_sched_latency = 5000000ULL;
-unsigned int normalized_sysctl_sched_latency = 5000000ULL;
+unsigned int sysctl_sched_latency = 6000000ULL;
+unsigned int normalized_sysctl_sched_latency = 6000000ULL;
 
 /*
  * The initial- and re-scaling of tunables is configurable
@@ -52,15 +52,15 @@ enum sched_tunable_scaling sysctl_sched_tunable_scaling
 
 /*
  * Minimal preemption granularity for CPU-bound tasks:
- * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 2 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_min_granularity = 1000000ULL;
-unsigned int normalized_sysctl_sched_min_granularity = 1000000ULL;
+unsigned int sysctl_sched_min_granularity = 2000000ULL;
+unsigned int normalized_sysctl_sched_min_granularity = 2000000ULL;
 
 /*
  * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
  */
-static unsigned int sched_nr_latency = 5;
+static unsigned int sched_nr_latency = 3;
 
 /*
  * After fork, child runs first. If set to 0 (default) then
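
The knob doing the work here is sched_nr_latency = sched_latency /
min_granularity, which is what gates the buddy logic (see the
"scale = cfs_rq->nr_running >= sched_nr_latency" test in check_preempt_wakeup()
later in this series).  A quick sanity check of the defaults, base values only
(the boot-time scaling by 1 + ilog(ncpus) is ignored here):

	#include <stdio.h>

	int main(void)
	{
		unsigned int old_latency = 5000000, old_gran = 1000000;	/* ns, old defaults */
		unsigned int new_latency = 6000000, new_gran = 2000000;	/* ns, new defaults */

		/* sched_nr_latency is kept at sched_latency / min_granularity */
		printf("old: LAST_BUDDY needs >= %u runnable tasks\n", old_latency / old_gran);
		printf("new: LAST_BUDDY needs >= %u runnable tasks\n", new_latency / new_gran);
		return 0;
	}

That is, buddies now engage with three runnable tasks on the queue instead of
five, which is where the "as soon as a second buddy pair arrives" claim comes
from.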

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Fix select_idle_sibling()
  2010-03-11  9:56 ` [patch 6/12] sched: fix select_idle_sibling() Mike Galbraith
@ 2010-03-11 18:32   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  8b911acdf08477c059d1c36c21113ab1696c612b
Gitweb:     http://git.kernel.org/tip/8b911acdf08477c059d1c36c21113ab1696c612b
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:16 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:51 +0100

sched: Fix select_idle_sibling()

Don't bother with selection when the current cpu is idle.  Recent load
balancing changes also make it no longer necessary to check wake_affine()
success before returning the selected sibling, so we now always use it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301369.6785.36.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index d19df5b..0008cc4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1439,7 +1439,7 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 	int cpu = smp_processor_id();
 	int prev_cpu = task_cpu(p);
 	int new_cpu = cpu;
-	int want_affine = 0;
+	int want_affine = 0, cpu_idle = !current->pid;
 	int want_sd = 1;
 	int sync = wake_flags & WF_SYNC;
 
@@ -1497,13 +1497,15 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 			 * If there's an idle sibling in this domain, make that
 			 * the wake_affine target instead of the current cpu.
 			 */
-			if (tmp->flags & SD_SHARE_PKG_RESOURCES)
+			if (!cpu_idle && tmp->flags & SD_SHARE_PKG_RESOURCES)
 				target = select_idle_sibling(p, tmp, target);
 
 			if (target >= 0) {
 				if (tmp->flags & SD_WAKE_AFFINE) {
 					affine_sd = tmp;
 					want_affine = 0;
+					if (target != cpu)
+						cpu_idle = 1;
 				}
 				cpu = target;
 			}
@@ -1519,6 +1521,7 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 			sd = tmp;
 	}
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
 	if (sched_feat(LB_SHARES_UPDATE)) {
 		/*
 		 * Pick the largest domain to update shares over
@@ -1532,9 +1535,12 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 		if (tmp)
 			update_shares(tmp);
 	}
+#endif
 
-	if (affine_sd && wake_affine(affine_sd, p, sync))
-		return cpu;
+	if (affine_sd) {
+		if (cpu_idle || cpu == prev_cpu || wake_affine(affine_sd, p, sync))
+			return cpu;
+	}
 
 	while (sd) {
 		int load_idx = sd->forkexec_idx;
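
The "cpu_idle = !current->pid" test leans on the fact that the per-cpu idle
task is the only task with pid 0, so a wakeup issued from the idle loop means
the waking cpu is already idle and scanning for an idle sibling is pointless.
The helper below is purely illustrative (take_waking_cpu() is not a kernel
function); it just restates the affine_sd decision in the hunk above.

	#include <stdio.h>

	/* illustration only: mirrors the affine_sd decision in the hunk above */
	static int take_waking_cpu(int cpu_idle, int cpu, int prev_cpu, int wake_affine_ok)
	{
		return cpu_idle || cpu == prev_cpu || wake_affine_ok;
	}

	int main(void)
	{
		/* waker is the idle task (pid 0) or an idle sibling was found: skip wake_affine() */
		printf("%d\n", take_waking_cpu(1, 2, 5, 0));
		/* otherwise fall back to the wake_affine() load comparison */
		printf("%d\n", take_waking_cpu(0, 2, 5, 0));
		return 0;
	}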

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove NORMALIZED_SLEEPER
  2010-03-11  9:57 ` [patch 7/12] sched: remove NORMALIZED_SLEEPER Mike Galbraith
@ 2010-03-11 18:32   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
Gitweb:     http://git.kernel.org/tip/6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:17 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:52 +0100

sched: Remove NORMALIZED_SLEEPER

This feature hasn't been enabled in a long time; remove the effectively dead code.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301447.6785.38.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c     |   10 ----------
 kernel/sched_features.h |    7 -------
 2 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0008cc4..de98e2e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -742,16 +742,6 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		unsigned long thresh = sysctl_sched_latency;
 
 		/*
-		 * Convert the sleeper threshold into virtual time.
-		 * SCHED_IDLE is a special sub-class.  We care about
-		 * fairness only relative to other SCHED_IDLE tasks,
-		 * all of which have the same weight.
-		 */
-		if (sched_feat(NORMALIZED_SLEEPER) && (!entity_is_task(se) ||
-				 task_of(se)->policy != SCHED_IDLE))
-			thresh = calc_delta_fair(thresh, se);
-
-		/*
 		 * Halve their sleep time's effect, to allow
 		 * for a gentler effect of sleepers:
 		 */
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index c545e04..4042883 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -13,13 +13,6 @@ SCHED_FEAT(FAIR_SLEEPERS, 1)
 SCHED_FEAT(GENTLE_FAIR_SLEEPERS, 1)
 
 /*
- * By not normalizing the sleep time, heavy tasks get an effective
- * longer period, and lighter task an effective shorter period they
- * are considered running.
- */
-SCHED_FEAT(NORMALIZED_SLEEPER, 0)
-
-/*
  * Place new tasks ahead so that they do not starve already running
  * tasks
  */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove FAIR_SLEEPERS feature
  2010-03-11  9:58 ` [patch 8/12] sched: remove FAIR_SLEEPERS feature Mike Galbraith
@ 2010-03-11 18:32   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  5ca9880c6f4ba4c84b517bc2fed5366adf63d191
Gitweb:     http://git.kernel.org/tip/5ca9880c6f4ba4c84b517bc2fed5366adf63d191
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:17 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:52 +0100

sched: Remove FAIR_SLEEPERS feature

Our preemption model relies too heavily on sleeper fairness to disable it
without dire consequences.  Remove the feature, and save a branch or two.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301520.6785.40.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c     |    2 +-
 kernel/sched_features.h |    7 -------
 2 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index de98e2e..97682f9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -738,7 +738,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime += sched_vslice(cfs_rq, se);
 
 	/* sleeps up to a single latency don't count. */
-	if (!initial && sched_feat(FAIR_SLEEPERS)) {
+	if (!initial) {
 		unsigned long thresh = sysctl_sched_latency;
 
 		/*
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 4042883..850f980 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -1,11 +1,4 @@
 /*
- * Disregards a certain amount of sleep time (sched_latency_ns) and
- * considers the task to be running during that period. This gives it
- * a service deficit on wakeup, allowing it to run sooner.
- */
-SCHED_FEAT(FAIR_SLEEPERS, 1)
-
-/*
  * Only give sleepers 50% of their service deficit. This allows
  * them to run sooner, but does not allow tons of sleepers to
  * rip the spread apart.
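
With FAIR_SLEEPERS hardwired on, the sleeper credit in place_entity() is
unconditional: a waking task is placed up to thresh = sysctl_sched_latency of
vruntime behind min_vruntime, halved while GENTLE_FAIR_SLEEPERS is set (see the
context kept in the NORMALIZED_SLEEPER patch above).  A rough worked example
using the new 6ms default latency; the numbers are illustrative only, and the
real code also clamps so a task's vruntime never moves backwards.

	#include <stdio.h>

	int main(void)
	{
		unsigned long long min_vruntime = 100000000ULL;	/* arbitrary, ns */
		unsigned long long thresh = 6000000ULL;		/* sysctl_sched_latency (new default) */
		int gentle_fair_sleepers = 1;

		if (gentle_fair_sleepers)			/* "halve their sleep time's effect" */
			thresh >>= 1;
		printf("sleeper placed %llu ns of vruntime behind min_vruntime (%llu)\n",
		       thresh, min_vruntime - thresh);
		return 0;
	}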

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove WAKEUP_SYNC feature
  2010-03-11  9:59 ` [patch 9/12] sched: remove WAKEUP_SYNC feature Mike Galbraith
@ 2010-03-11 18:32   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  f2e74eeac03ffb779d64b66a643c5e598145a28b
Gitweb:     http://git.kernel.org/tip/f2e74eeac03ffb779d64b66a643c5e598145a28b
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:18 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:52 +0100

sched: Remove WAKEUP_SYNC feature

This feature never earned its keep, remove it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301591.6785.42.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c     |    4 ----
 kernel/sched_features.h |    5 -----
 2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 97682f9..1d99535 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1658,7 +1658,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	struct task_struct *curr = rq->curr;
 	struct sched_entity *se = &curr->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
-	int sync = wake_flags & WF_SYNC;
 	int scale = cfs_rq->nr_running >= sched_nr_latency;
 
 	if (unlikely(rt_prio(p->prio)))
@@ -1691,9 +1690,6 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
 	if (unlikely(curr->policy == SCHED_IDLE))
 		goto preempt;
 
-	if (sched_feat(WAKEUP_SYNC) && sync)
-		goto preempt;
-
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 850f980..1cb7c47 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,11 +23,6 @@ SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(ASYM_GRAN, 1)
 
 /*
- * Always wakeup-preempt SYNC wakeups, see SYNC_WAKEUPS.
- */
-SCHED_FEAT(WAKEUP_SYNC, 0)
-
-/*
  * Use the SYNC wakeup hint, pipes and the likes use this to indicate
  * the remote end is likely to consume the data we just wrote, and
  * therefore has cache benefit from being placed on the same cpu, see

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove SYNC_WAKEUPS feature
  2010-03-11 10:03 ` [patch 10/12] sched: remove SYNC_WAKEUPS feature Mike Galbraith
@ 2010-03-11 18:33   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
Gitweb:     http://git.kernel.org/tip/c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:16:43 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:53 +0100

sched: Remove SYNC_WAKEUPS feature

Sync wakeups are critical functionality with a long history, and are not about
to be disabled.  Remove the feature flag; we don't need the branch or icache
footprint.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301817.6785.47.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c          |    3 ---
 kernel/sched_features.h |    8 --------
 2 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 16559de..cc6dc8c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2369,9 +2369,6 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state,
 	unsigned long flags;
 	struct rq *rq;
 
-	if (!sched_feat(SYNC_WAKEUPS))
-		wake_flags &= ~WF_SYNC;
-
 	this_cpu = get_cpu();
 
 	smp_wmb();
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 1cb7c47..f54b6f9 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -23,14 +23,6 @@ SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(ASYM_GRAN, 1)
 
 /*
- * Use the SYNC wakeup hint, pipes and the likes use this to indicate
- * the remote end is likely to consume the data we just wrote, and
- * therefore has cache benefit from being placed on the same cpu, see
- * also AFFINE_WAKEUPS.
- */
-SCHED_FEAT(SYNC_WAKEUPS, 1)
-
-/*
  * Based on load and program behaviour, see if it makes sense to place
  * a newly woken task on the same cpu as the task that woke it --
  * improve cache locality. Typically used with SYNC wakeups as

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove ASYM_GRAN feature
  2010-03-11 10:01 ` [patch 11/12] sched: remove ASYM_GRAN feature Mike Galbraith
@ 2010-03-11 18:33   ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  13814d42e45dfbe845a0bbe5184565d9236896ae
Gitweb:     http://git.kernel.org/tip/13814d42e45dfbe845a0bbe5184565d9236896ae
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:04 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:53 +0100

sched: Remove ASYM_GRAN feature

This feature has been enabled for quite a while, after testing showed that
easing preemption for light tasks was harmful to high-priority threads.

Remove the feature flag.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301675.6785.44.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c     |   28 +++++++++++-----------------
 kernel/sched_features.h |    6 ------
 2 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 1d99535..9357ecd 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1583,24 +1583,18 @@ wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
 	/*
 	 * Since its curr running now, convert the gran from real-time
 	 * to virtual-time in his units.
+	 *
+	 * By using 'se' instead of 'curr' we penalize light tasks, so
+	 * they get preempted easier. That is, if 'se' < 'curr' then
+	 * the resulting gran will be larger, therefore penalizing the
+	 * lighter, if otoh 'se' > 'curr' then the resulting gran will
+	 * be smaller, again penalizing the lighter task.
+	 *
+	 * This is especially important for buddies when the leftmost
+	 * task is higher priority than the buddy.
 	 */
-	if (sched_feat(ASYM_GRAN)) {
-		/*
-		 * By using 'se' instead of 'curr' we penalize light tasks, so
-		 * they get preempted easier. That is, if 'se' < 'curr' then
-		 * the resulting gran will be larger, therefore penalizing the
-		 * lighter, if otoh 'se' > 'curr' then the resulting gran will
-		 * be smaller, again penalizing the lighter task.
-		 *
-		 * This is especially important for buddies when the leftmost
-		 * task is higher priority than the buddy.
-		 */
-		if (unlikely(se->load.weight != NICE_0_LOAD))
-			gran = calc_delta_fair(gran, se);
-	} else {
-		if (unlikely(curr->load.weight != NICE_0_LOAD))
-			gran = calc_delta_fair(gran, curr);
-	}
+	if (unlikely(se->load.weight != NICE_0_LOAD))
+		gran = calc_delta_fair(gran, se);
 
 	return gran;
 }
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index f54b6f9..83c66e8 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -17,12 +17,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * When converting the wakeup granularity to virtual time, do it such
- * that heavier tasks preempting a lighter task have an edge.
- */
-SCHED_FEAT(ASYM_GRAN, 1)
-
-/*
  * Based on load and program behaviour, see if it makes sense to place
  * a newly woken task on the same cpu as the task that woke it --
  * improve cache locality. Typically used with SYNC wakeups as
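
The now-unconditional branch converts the wakeup granularity into the wakee
se's virtual time, which calc_delta_fair() does roughly as
gran * NICE_0_LOAD / se->load.weight.  The sketch below shows the resulting
numbers; virtual_gran() is a simplification of calc_delta_fair(), and the
weights quoted are from the kernel's standard nice-to-weight table.  A lighter
se therefore sees a larger virtual gran (harder for it to preempt curr), a
heavier se a smaller one, matching the comment the patch keeps.

	#include <stdio.h>

	#define NICE_0_LOAD 1024UL

	/* rough stand-in for calc_delta_fair(): gran scaled into the wakee's virtual time */
	static unsigned long virtual_gran(unsigned long gran_ns, unsigned long weight)
	{
		return gran_ns * NICE_0_LOAD / weight;
	}

	int main(void)
	{
		unsigned long gran = 1000000UL;		/* wakeup granularity, ns */

		printf("light  (nice +5, weight  335): %lu ns\n", virtual_gran(gran, 335));
		printf("normal (nice  0, weight 1024): %lu ns\n", virtual_gran(gran, 1024));
		printf("heavy  (nice -5, weight 3121): %lu ns\n", virtual_gran(gran, 3121));
		return 0;
	}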

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:sched/core] sched: Remove AFFINE_WAKEUPS feature
  2010-03-11 10:04 ` [patch 12/12] sched: remove AFFINE_WAKEUPS feature Mike Galbraith
@ 2010-03-11 18:33   ` tip-bot for Mike Galbraith
  2010-03-12  3:23     ` Yong Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: tip-bot for Mike Galbraith @ 2010-03-11 18:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo

Commit-ID:  beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
Gitweb:     http://git.kernel.org/tip/beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Thu, 11 Mar 2010 17:17:20 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 18:32:53 +0100

sched: Remove AFFINE_WAKEUPS feature

Disabling affine wakeups is too horrible to contemplate.  Remove the feature flag.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301890.6785.50.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9357ecd..35a5c64 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1434,8 +1434,7 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 	int sync = wake_flags & WF_SYNC;
 
 	if (sd_flag & SD_BALANCE_WAKE) {
-		if (sched_feat(AFFINE_WAKEUPS) &&
-		    cpumask_test_cpu(cpu, &p->cpus_allowed))
+		if (cpumask_test_cpu(cpu, &p->cpus_allowed))
 			want_affine = 1;
 		new_cpu = prev_cpu;
 	}

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [tip:sched/core] sched: Remove AFFINE_WAKEUPS feature
  2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
@ 2010-03-12  3:23     ` Yong Zhang
  2010-03-12  4:37       ` Mike Galbraith
  0 siblings, 1 reply; 27+ messages in thread
From: Yong Zhang @ 2010-03-12  3:23 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, a.p.zijlstra, efault, tglx, mingo
  Cc: linux-tip-commits

On Thu, Mar 11, 2010 at 06:33:38PM +0000, tip-bot for Mike Galbraith wrote:
> Commit-ID:  beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> Gitweb:     http://git.kernel.org/tip/beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> Author:     Mike Galbraith <efault@gmx.de>
> AuthorDate: Thu, 11 Mar 2010 17:17:20 +0100
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Thu, 11 Mar 2010 18:32:53 +0100
> 
> sched: Remove AFFINE_WAKEUPS feature
> 
> Disabling affine wakeups is too horrible to contemplate.  Remove the feature flag.

AFFINE_WAKEUPS is still left in sched_feature.h

From: Yong Zhang <yong.zhang@windriver.com>
Date: Fri, 12 Mar 2010 11:14:26 +0800
Subject: [PATCH] sched: clean AFFINE_WAKEUPS feature

complementary work to commit beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8

Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
---
 kernel/sched_features.h |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 83c66e8..2137ac0 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -17,14 +17,6 @@ SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(WAKEUP_PREEMPT, 1)
 
 /*
- * Based on load and program behaviour, see if it makes sense to place
- * a newly woken task on the same cpu as the task that woke it --
- * improve cache locality. Typically used with SYNC wakeups as
- * generated by pipes and the like, see also SYNC_WAKEUPS.
- */
-SCHED_FEAT(AFFINE_WAKEUPS, 1)
-
-/*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.
-- 
1.6.3.3



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [tip:sched/core] sched: Remove AFFINE_WAKEUPS feature
  2010-03-12  3:23     ` Yong Zhang
@ 2010-03-12  4:37       ` Mike Galbraith
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Galbraith @ 2010-03-12  4:37 UTC (permalink / raw)
  To: Yong Zhang
  Cc: mingo, hpa, linux-kernel, a.p.zijlstra, tglx, mingo, linux-tip-commits

On Fri, 2010-03-12 at 11:23 +0800, Yong Zhang wrote:
> On Thu, Mar 11, 2010 at 06:33:38PM +0000, tip-bot for Mike Galbraith wrote:
> > Commit-ID:  beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> > Gitweb:     http://git.kernel.org/tip/beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> > Author:     Mike Galbraith <efault@gmx.de>
> > AuthorDate: Thu, 11 Mar 2010 17:17:20 +0100
> > Committer:  Ingo Molnar <mingo@elte.hu>
> > CommitDate: Thu, 11 Mar 2010 18:32:53 +0100
> > 
> > sched: Remove AFFINE_WAKEUPS feature
> > 
> > Disabling affine wakeups is too horrible to contemplate.  Remove the feature flag.
> 
> AFFINE_WAKEUPS is still left in sched_feature.h

Oops, axe got dull.  Thanks for checking.

	-Mike

> From: Yong Zhang <yong.zhang@windriver.com>
> Date: Fri, 12 Mar 2010 11:14:26 +0800
> Subject: [PATCH] sched: clean AFFINE_WAKEUPS feature
> 
> complementary work to commit beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> 
> Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
> ---
>  kernel/sched_features.h |    8 --------
>  1 files changed, 0 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched_features.h b/kernel/sched_features.h
> index 83c66e8..2137ac0 100644
> --- a/kernel/sched_features.h
> +++ b/kernel/sched_features.h
> @@ -17,14 +17,6 @@ SCHED_FEAT(START_DEBIT, 1)
>  SCHED_FEAT(WAKEUP_PREEMPT, 1)
>  
>  /*
> - * Based on load and program behaviour, see if it makes sense to place
> - * a newly woken task on the same cpu as the task that woke it --
> - * improve cache locality. Typically used with SYNC wakeups as
> - * generated by pipes and the like, see also SYNC_WAKEUPS.
> - */
> -SCHED_FEAT(AFFINE_WAKEUPS, 1)
> -
> -/*
>   * Prefer to schedule the task we woke last (assuming it failed
>   * wakeup-preemption), since its likely going to consume data we
>   * touched, increases cache locality.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-03-12  4:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-11  9:49 [patch 0/12] sched: fastpath cycle recovery Mike Galbraith
2010-03-11  9:50 ` [patch 1/12] sched: ratelimit nohz Mike Galbraith
2010-03-11 18:30   ` [tip:sched/core] sched: Rate-limit nohz tip-bot for Mike Galbraith
2010-03-11  9:51 ` [patch 2/12] sched: remove avg_wakeup Mike Galbraith
2010-03-11 18:30   ` [tip:sched/core] sched: Remove avg_wakeup tip-bot for Mike Galbraith
2010-03-11  9:52 ` [patch 3/12] sched: remove avg_overlap Mike Galbraith
2010-03-11 18:31   ` [tip:sched/core] sched: Remove avg_overlap tip-bot for Mike Galbraith
2010-03-11  9:53 ` [patch 4/12] sched: cleanup/optimize clock updates Mike Galbraith
2010-03-11 18:31   ` [tip:sched/core] sched: Cleanup/optimize " tip-bot for Mike Galbraith
2010-03-11  9:54 ` [patch 5/12] sched: tweak sched_latency and min_granularity Mike Galbraith
2010-03-11 18:31   ` [tip:sched/core] sched: Tweak " tip-bot for Mike Galbraith
2010-03-11  9:56 ` [patch 6/12] sched: fix select_idle_sibling() Mike Galbraith
2010-03-11 18:32   ` [tip:sched/core] sched: Fix select_idle_sibling() tip-bot for Mike Galbraith
2010-03-11  9:57 ` [patch 7/12] sched: remove NORMALIZED_SLEEPER Mike Galbraith
2010-03-11 18:32   ` [tip:sched/core] sched: Remove NORMALIZED_SLEEPER tip-bot for Mike Galbraith
2010-03-11  9:58 ` [patch 8/12] sched: remove FAIR_SLEEPERS feature Mike Galbraith
2010-03-11 18:32   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
2010-03-11  9:59 ` [patch 9/12] sched: remove WAKEUP_SYNC feature Mike Galbraith
2010-03-11 18:32   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
2010-03-11 10:01 ` [patch 11/12] sched: remove ASYM_GRAN feature Mike Galbraith
2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
2010-03-11 10:03 ` [patch 10/12] sched: remove SYNC_WAKEUPS feature Mike Galbraith
2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
2010-03-11 10:04 ` [patch 12/12] sched: remove AFFINE_WAKEUPS feature Mike Galbraith
2010-03-11 18:33   ` [tip:sched/core] sched: Remove " tip-bot for Mike Galbraith
2010-03-12  3:23     ` Yong Zhang
2010-03-12  4:37       ` Mike Galbraith
