All of lore.kernel.org
* [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance()
@ 2014-01-17  9:04 Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17  9:04 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, linaro-kernel, alex.shi

The 'cpu' parameter passed to idle_balance() is not needed, as it can be
retrieved from the struct rq.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/core.c  |    2 +-
 kernel/sched/fair.c  |    3 ++-
 kernel/sched/sched.h |    2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36c951b..cf51601 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2680,7 +2680,7 @@ need_resched:
 	pre_schedule(rq, prev);
 
 	if (unlikely(!rq->nr_running))
-		idle_balance(cpu, rq);
+		idle_balance(rq);
 
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b24b6cf..d601df3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6359,12 +6359,13 @@ out:
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-void idle_balance(int this_cpu, struct rq *this_rq)
+void idle_balance(struct rq *this_rq)
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 	u64 curr_cost = 0;
+	int this_cpu = this_rq->cpu;
 
 	this_rq->idle_stamp = rq_clock(this_rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c2119fd..b0be4c6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1176,7 +1176,7 @@ extern const struct sched_class idle_sched_class;
 extern void update_group_power(struct sched_domain *sd, int cpu);
 
 extern void trigger_load_balance(struct rq *rq);
-extern void idle_balance(int this_cpu, struct rq *this_rq);
+extern void idle_balance(struct rq *this_rq);
 
 extern void idle_enter_fair(struct rq *this_rq);
 extern void idle_exit_fair(struct rq *this_rq);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] sched: Fix race in idle_balance()
  2014-01-17  9:04 [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance() Daniel Lezcano
@ 2014-01-17  9:04 ` Daniel Lezcano
  2014-01-17 13:33   ` Peter Zijlstra
  2014-02-11 12:15   ` [tip:sched/core] " tip-bot for Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 3/4] sched: Move idle_stamp up to the core Daniel Lezcano
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17  9:04 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, linaro-kernel, alex.shi

The scheduler's main function, schedule(), checks whether any tasks are
left on the runqueue. If not, it calls idle_balance() to try to pull a
task onto the current runqueue, assuming the cpu will otherwise go idle.

But idle_balance() releases rq->lock in order to look up the sched
domains, and takes the lock again right afterwards. That opens a window
during which another cpu may put a task on our runqueue, so we won't go
idle even though we have set idle_stamp, thinking we would.

This patch closes the window by checking, after retaking the lock,
whether the runqueue was modified without us pulling a task; in that
case we return early, so we won't go idle right after in __schedule().

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/fair.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d601df3..502c51c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6417,6 +6417,13 @@ void idle_balance(struct rq *this_rq)
 
 	raw_spin_lock(&this_rq->lock);
 
+	/*
+	 * While browsing the domains, we released the rq lock.
+	 * A task could have been enqueued in the meantime.
+	 */
+	if (this_rq->nr_running && !pulled_task)
+		return;
+
 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
 		/*
 		 * We are going idle. next_balance may be set based on
-- 
1.7.9.5



* [PATCH 3/4] sched: Move idle_stamp up to the core
  2014-01-17  9:04 [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance() Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
@ 2014-01-17  9:04 ` Daniel Lezcano
  2014-02-11 12:16   ` [tip:sched/core] sched: Move rq->idle_stamp " tip-bot for Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
  2014-02-11 12:15 ` [tip:sched/core] sched: Remove 'cpu' parameter from idle_balance() tip-bot for Daniel Lezcano
  3 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17  9:04 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, linaro-kernel, alex.shi

idle_balance() modifies the idle_stamp field of the rq, forcing this
information to be shared between core.c and fair.c. Since the previous
patch lets us know whether the cpu is going idle or not, let's
encapsulate the idle_stamp handling in core.c by moving it up to the
caller. idle_balance() now returns true when balancing occurred and the
cpu won't go idle, and false when no balancing happened and the cpu is
going idle.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/core.c  |    2 +-
 kernel/sched/fair.c  |   14 ++++++--------
 kernel/sched/sched.h |    2 +-
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cf51601..b7e3635 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2680,7 +2680,7 @@ need_resched:
 	pre_schedule(rq, prev);
 
 	if (unlikely(!rq->nr_running))
-		idle_balance(rq);
+		rq->idle_stamp = idle_balance(rq) ? 0 : rq_clock(rq);
 
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 502c51c..49dfedb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6359,7 +6359,7 @@ out:
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-void idle_balance(struct rq *this_rq)
+int idle_balance(struct rq *this_rq)
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
@@ -6367,10 +6367,8 @@ void idle_balance(struct rq *this_rq)
 	u64 curr_cost = 0;
 	int this_cpu = this_rq->cpu;
 
-	this_rq->idle_stamp = rq_clock(this_rq);
-
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
-		return;
+		return 0;
 
 	/*
 	 * Drop the rq->lock, but keep IRQ/preempt disabled.
@@ -6408,10 +6406,8 @@ void idle_balance(struct rq *this_rq)
 		interval = msecs_to_jiffies(sd->balance_interval);
 		if (time_after(next_balance, sd->last_balance + interval))
 			next_balance = sd->last_balance + interval;
-		if (pulled_task) {
-			this_rq->idle_stamp = 0;
+		if (pulled_task)
 			break;
-		}
 	}
 	rcu_read_unlock();
 
@@ -6422,7 +6418,7 @@ void idle_balance(struct rq *this_rq)
 	 * A task could have been enqueued in the meantime.
 	 */
 	if (this_rq->nr_running && !pulled_task)
-		return;
+		return 1;
 
 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
 		/*
@@ -6434,6 +6430,8 @@ void idle_balance(struct rq *this_rq)
 
 	if (curr_cost > this_rq->max_idle_balance_cost)
 		this_rq->max_idle_balance_cost = curr_cost;
+
+	return pulled_task;
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b0be4c6..970dd44 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1176,7 +1176,7 @@ extern const struct sched_class idle_sched_class;
 extern void update_group_power(struct sched_domain *sd, int cpu);
 
 extern void trigger_load_balance(struct rq *rq);
-extern void idle_balance(struct rq *this_rq);
+extern int idle_balance(struct rq *this_rq);
 
 extern void idle_enter_fair(struct rq *this_rq);
 extern void idle_exit_fair(struct rq *this_rq);
-- 
1.7.9.5



* [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17  9:04 [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance() Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
  2014-01-17  9:04 ` [PATCH 3/4] sched: Move idle_stamp up to the core Daniel Lezcano
@ 2014-01-17  9:04 ` Daniel Lezcano
  2014-01-17 14:09   ` Peter Zijlstra
  2014-01-17 14:26   ` Peter Zijlstra
  2014-02-11 12:15 ` [tip:sched/core] sched: Remove 'cpu' parameter from idle_balance() tip-bot for Daniel Lezcano
  3 siblings, 2 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17  9:04 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, linaro-kernel, alex.shi

With the previous patch, there is no longer any ambiguity about going
idle, so we can pick the idle task directly instead of walking through
all the sched classes, which would in any case return no task except the
idle task.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/core.c      |   43 ++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/idle_task.c |   15 +++++----------
 2 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b7e3635..c7a8f4e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2586,7 +2586,43 @@ pick_next_task(struct rq *rq)
 			return p;
 	}
 
-	BUG(); /* the idle class will always have a runnable task */
+	/*
+	 * We must have at least one task to run; the idle task is
+	 * returned by the shortcut in pick_next_task_or_idle(). If we
+	 * get here, there is no task to run, so something went wrong
+	 * with our earlier conclusion that the cpu was not going idle.
+	 */
+	BUG();
+}
+
+static inline struct task_struct *pick_next_task_or_idle(struct rq *rq)
+{
+	if (likely(rq->nr_running))
+		return pick_next_task(rq);
+
+	rq->idle_stamp = 0;
+
+	/*
+	 * If a task was balanced onto this cpu, pick the next task;
+	 * otherwise fall through to the optimization of picking the
+	 * idle task directly.
+	 */
+	if (idle_balance(rq))
+		return pick_next_task(rq);
+
+	rq->idle_stamp = rq_clock(rq);
+
+	/*
+	 * Optimization: pick_next_task() would return rq->idle in any case,
+	 * but only after walking through the different sched classes. Add
+	 * a shortcut to return the idle task directly.
+	 */
+	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_enter for CFS */
+	rq->post_schedule = 1;
+#endif
+	return rq->idle;
 }
 
 /*
@@ -2679,11 +2715,8 @@ need_resched:
 
 	pre_schedule(rq, prev);
 
-	if (unlikely(!rq->nr_running))
-		rq->idle_stamp = idle_balance(rq) ? 0 : rq_clock(rq);
-
 	put_prev_task(rq, prev);
-	next = pick_next_task(rq);
+	next = pick_next_task_or_idle(rq);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 	rq->skip_clock_update = 0;
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 516c3d9..0b4c94b 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -33,16 +33,6 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 	resched_task(rq->idle);
 }
 
-static struct task_struct *pick_next_task_idle(struct rq *rq)
-{
-	schedstat_inc(rq, sched_goidle);
-#ifdef CONFIG_SMP
-	/* Trigger the post schedule to do an idle_enter for CFS */
-	rq->post_schedule = 1;
-#endif
-	return rq->idle;
-}
-
 /*
  * It is not legal to sleep in the idle task - print a warning
  * message if some code attempts to do it:
@@ -60,6 +50,11 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
 }
 
+static struct task_struct *pick_next_task_idle(struct rq *rq)
+{
+	return NULL;
+}
+
 static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
 {
 }
-- 
1.7.9.5



* Re: [PATCH 2/4] sched: Fix race in idle_balance()
  2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
@ 2014-01-17 13:33   ` Peter Zijlstra
  2014-01-17 13:44     ` Daniel Lezcano
  2014-02-11 12:15   ` [tip:sched/core] " tip-bot for Daniel Lezcano
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 13:33 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 10:04:02AM +0100, Daniel Lezcano wrote:
> The scheduler's main function, schedule(), checks whether any tasks are
> left on the runqueue. If not, it calls idle_balance() to try to pull a
> task onto the current runqueue, assuming the cpu will otherwise go idle.
> 
> But idle_balance() releases rq->lock in order to look up the sched
> domains, and takes the lock again right afterwards. That opens a window
> during which another cpu may put a task on our runqueue, so we won't go
> idle even though we have set idle_stamp, thinking we would.
> 
> This patch closes the window by checking, after retaking the lock,
> whether the runqueue was modified without us pulling a task; in that
> case we return early, so we won't go idle right after in __schedule().

Did you actually observe this or was it found by reading the code?


* Re: [PATCH 2/4] sched: Fix race in idle_balance()
  2014-01-17 13:33   ` Peter Zijlstra
@ 2014-01-17 13:44     ` Daniel Lezcano
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 13:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 02:33 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 10:04:02AM +0100, Daniel Lezcano wrote:
>> The scheduler's main function, schedule(), checks whether any tasks are
>> left on the runqueue. If not, it calls idle_balance() to try to pull a
>> task onto the current runqueue, assuming the cpu will otherwise go idle.
>>
>> But idle_balance() releases rq->lock in order to look up the sched
>> domains, and takes the lock again right afterwards. That opens a window
>> during which another cpu may put a task on our runqueue, so we won't go
>> idle even though we have set idle_stamp, thinking we would.
>>
>> This patch closes the window by checking, after retaking the lock,
>> whether the runqueue was modified without us pulling a task; in that
>> case we return early, so we won't go idle right after in __schedule().
>
> Did you actually observe this or was it found by reading the code?

When I tried to achieve what patch 4/4 does, I was hitting the BUG()
(see the comment in patch 4/4). So I ran some tests and confirmed that
we enter idle_balance() with nr_running == 0 but exit with
nr_running > 0 and pulled_task == 0.


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17  9:04 ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
@ 2014-01-17 14:09   ` Peter Zijlstra
  2014-01-17 15:09     ` Daniel Lezcano
  2014-01-17 14:26   ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 14:09 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 10:04:04AM +0100, Daniel Lezcano wrote:
> +	schedstat_inc(rq, sched_goidle);
> +#ifdef CONFIG_SMP
> +	/* Trigger the post schedule to do an idle_enter for CFS */
> +	rq->post_schedule = 1;
> +#endif
> +	return rq->idle;

Urgh, that retains the stupid idle crap like it is.

I've not yet tested this, but is there a reason something like the below
couldn't work?

---
Subject: sched: Clean up idle task SMP logic
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Jan 17 14:54:02 CET 2014

The idle post_schedule hook is just a vile waste of time, fix it proper.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c      |    5 +++--
 kernel/sched/idle_task.c |   21 ++++++---------------
 2 files changed, 9 insertions(+), 17 deletions(-)

Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -2416,7 +2416,8 @@ void idle_exit_fair(struct rq *this_rq)
 	update_rq_runnable_avg(this_rq, 0);
 }
 
-#else
+#else /* CONFIG_SMP */
+
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
 static inline void update_rq_runnable_avg(struct rq *rq, int runnable) {}
@@ -2428,7 +2429,7 @@ static inline void dequeue_entity_load_a
 					   int sleep) {}
 static inline void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
 					      int force_update) {}
-#endif
+#endif /* CONFIG_SMP */
 
 static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
Index: linux-2.6/kernel/sched/idle_task.c
===================================================================
--- linux-2.6.orig/kernel/sched/idle_task.c
+++ linux-2.6/kernel/sched/idle_task.c
@@ -13,18 +13,8 @@ select_task_rq_idle(struct task_struct *
 {
  	return task_cpu(p); /* IDLE tasks are never migrated */
 }
-
-static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
-{
-	idle_exit_fair(rq);
-	rq_last_tick_reset(rq);
-}
-
-static void post_schedule_idle(struct rq *rq)
-{
-	idle_enter_fair(rq);
-}
 #endif /* CONFIG_SMP */
+
 /*
  * Idle tasks are unconditionally rescheduled:
  */
@@ -37,8 +27,7 @@ static struct task_struct *pick_next_tas
 {
 	schedstat_inc(rq, sched_goidle);
 #ifdef CONFIG_SMP
-	/* Trigger the post schedule to do an idle_enter for CFS */
-	rq->post_schedule = 1;
+	idle_enter_fair(rq);
 #endif
 	return rq->idle;
 }
@@ -58,6 +47,10 @@ dequeue_task_idle(struct rq *rq, struct
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_SMP
+	idle_exit_fair(rq);
+	rq_last_tick_reset(rq);
+#endif
 }
 
 static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
@@ -101,8 +94,6 @@ const struct sched_class idle_sched_clas
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
-	.pre_schedule		= pre_schedule_idle,
-	.post_schedule		= post_schedule_idle,
 #endif
 
 	.set_curr_task          = set_curr_task_idle,


* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17  9:04 ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
  2014-01-17 14:09   ` Peter Zijlstra
@ 2014-01-17 14:26   ` Peter Zijlstra
  2014-01-17 15:06     ` Daniel Lezcano
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 14:26 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 10:04:04AM +0100, Daniel Lezcano wrote:
> @@ -2679,11 +2715,8 @@ need_resched:
>  
>  	pre_schedule(rq, prev);
>  
> -	if (unlikely(!rq->nr_running))
> -		rq->idle_stamp = idle_balance(rq) ? 0 : rq_clock(rq);
> -
>  	put_prev_task(rq, prev);
> -	next = pick_next_task(rq);
> +	next = pick_next_task_or_idle(rq);
>  	clear_tsk_need_resched(prev);
>  	clear_preempt_need_resched();
>  	rq->skip_clock_update = 0;

I have vague memories that we need to have idle_balance() before
put_prev_task(), but I can't recollect why this would be so.

That said, if I resurrect these patches:

  https://lkml.org/lkml/2012/6/14/271

I suppose we could write something like:

struct task_struct *pick_next_task(struct rq *rq, struct task_struct *prev)
{
	const struct sched_class *class;
	struct task_struct *p;

again:
	if (likely(rq->nr_running)) {

		if (likely(rq->nr_running == rq->cfs.h_nr_running))
			return fair_sched_class.pick_next_task(rq, prev);

		for_each_class(class) {
			p = class->pick_next_task(rq, prev);
			if (p)
				return p;
		}
	}

	if (idle_balance(rq))
		goto again;

	rq->idle_stamp = rq_clock(rq);

	return idle_sched_class.pick_next_task(rq, prev);
}

Which keeps idle_balance() before put_prev_task(), and by using
idle_sched_class.pick_next_task() doesn't rape the idle class interface
like you did :-)






* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 14:26   ` Peter Zijlstra
@ 2014-01-17 15:06     ` Daniel Lezcano
  2014-01-17 15:23       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 15:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 03:26 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 10:04:04AM +0100, Daniel Lezcano wrote:
>> @@ -2679,11 +2715,8 @@ need_resched:
>>
>>   	pre_schedule(rq, prev);
>>
>> -	if (unlikely(!rq->nr_running))
>> -		rq->idle_stamp = idle_balance(rq) ? 0 : rq_clock(rq);
>> -
>>   	put_prev_task(rq, prev);
>> -	next = pick_next_task(rq);
>> +	next = pick_next_task_or_idle(rq);
>>   	clear_tsk_need_resched(prev);
>>   	clear_preempt_need_resched();
>>   	rq->skip_clock_update = 0;
>
> I have vague memories that we need to have idle_balance() before
> put_prev_task(), but I can't recollect why this would be so.
>
> That said, if I resurrect these patches:
>
>    https://lkml.org/lkml/2012/6/14/271
>
> I suppose we could write something like:
>
> struct task_struct *pick_next_task(struct rq *rq, struct task_struct *prev)
> {
> 	const struct sched_class *class;
> 	struct task_struct *p;
>
> again:
> 	if (likely(rq->nr_running)) {
>
> 		if (likely(rq->nr_running == rq->cfs.h_nr_running))
> 			return fair_sched_class.pick_next_task(rq, prev);
>
> 		for_each_class(class) {
> 			p = class->pick_next_task(rq, prev);
> 			if (p)
> 				return p;
> 		}
> 	}
>
> 	if (idle_balance(rq))
> 		goto again;
>
> 	rq->idle_stamp = rq_clock(rq);
>
> 	return idle_sched_class.pick_next_task(rq, prev);
> }
>
> Which keeps idle_balance() before put_prev_task(), and by using
> idle_sched_class.pick_next_task() doesn't rape the idle class interface
> like you did :-)

But put_prev_task() is called before pick_next_task(), so idle_balance()
is now called after it, no?





* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 14:09   ` Peter Zijlstra
@ 2014-01-17 15:09     ` Daniel Lezcano
  2014-01-17 15:23       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 15:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 03:09 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 10:04:04AM +0100, Daniel Lezcano wrote:
>> +	schedstat_inc(rq, sched_goidle);
>> +#ifdef CONFIG_SMP
>> +	/* Trigger the post schedule to do an idle_enter for CFS */
>> +	rq->post_schedule = 1;
>> +#endif
>> +	return rq->idle;
>
> Urgh, that retains the stupid idle crap like it is.
>
> I've not yet tested this, but is there a reason something like the below
> couldn't work?

I tested it by running hackbench and it seems to work as expected.

> ---
> Subject: sched: Clean up idle task SMP logic
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Jan 17 14:54:02 CET 2014
>
> The idle post_schedule hook is just a vile waste of time, fix it proper.
>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
>   kernel/sched/fair.c      |    5 +++--
>   kernel/sched/idle_task.c |   21 ++++++---------------
>   2 files changed, 9 insertions(+), 17 deletions(-)
>
> Index: linux-2.6/kernel/sched/fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/fair.c
> +++ linux-2.6/kernel/sched/fair.c
> @@ -2416,7 +2416,8 @@ void idle_exit_fair(struct rq *this_rq)
>   	update_rq_runnable_avg(this_rq, 0);
>   }
>
> -#else
> +#else /* CONFIG_SMP */
> +
>   static inline void update_entity_load_avg(struct sched_entity *se,
>   					  int update_cfs_rq) {}
>   static inline void update_rq_runnable_avg(struct rq *rq, int runnable) {}
> @@ -2428,7 +2429,7 @@ static inline void dequeue_entity_load_a
>   					   int sleep) {}
>   static inline void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq,
>   					      int force_update) {}
> -#endif
> +#endif /* CONFIG_SMP */
>
>   static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
>   {
> Index: linux-2.6/kernel/sched/idle_task.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/idle_task.c
> +++ linux-2.6/kernel/sched/idle_task.c
> @@ -13,18 +13,8 @@ select_task_rq_idle(struct task_struct *
>   {
>   	return task_cpu(p); /* IDLE tasks are never migrated */
>   }
> -
> -static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
> -{
> -	idle_exit_fair(rq);
> -	rq_last_tick_reset(rq);
> -}
> -
> -static void post_schedule_idle(struct rq *rq)
> -{
> -	idle_enter_fair(rq);
> -}
>   #endif /* CONFIG_SMP */
> +
>   /*
>    * Idle tasks are unconditionally rescheduled:
>    */
> @@ -37,8 +27,7 @@ static struct task_struct *pick_next_tas
>   {
>   	schedstat_inc(rq, sched_goidle);
>   #ifdef CONFIG_SMP
> -	/* Trigger the post schedule to do an idle_enter for CFS */
> -	rq->post_schedule = 1;
> +	idle_enter_fair(rq);
>   #endif
>   	return rq->idle;
>   }
> @@ -58,6 +47,10 @@ dequeue_task_idle(struct rq *rq, struct
>
>   static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
>   {
> +#ifdef CONFIG_SMP
> +	idle_exit_fair(rq);
> +	rq_last_tick_reset(rq);
> +#endif
>   }
>
>   static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
> @@ -101,8 +94,6 @@ const struct sched_class idle_sched_clas
>
>   #ifdef CONFIG_SMP
>   	.select_task_rq		= select_task_rq_idle,
> -	.pre_schedule		= pre_schedule_idle,
> -	.post_schedule		= post_schedule_idle,
>   #endif
>
>   	.set_curr_task          = set_curr_task_idle,
>





* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:06     ` Daniel Lezcano
@ 2014-01-17 15:23       ` Peter Zijlstra
  2014-01-17 15:26         ` Daniel Lezcano
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 15:23 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 04:06:45PM +0100, Daniel Lezcano wrote:
> >I suppose we could write something like:
> >
> >struct task_struct *pick_next_task(struct rq *rq, struct task_struct *prev)
> >{
> >	const struct sched_class *class;
> >	struct task_struct *p;
> >
> >again:
> >	if (likely(rq->nr_running)) {
> >
> >		if (likely(rq->nr_running == rq->cfs.h_nr_running))
> >			return fair_sched_class.pick_next_task(rq, prev);
> >
> >		for_each_class(class) {
> >			p = class->pick_next_task(rq, prev);
> >			if (p)
> >				return p;
> >		}
> >	}
> >
> >	if (idle_balance(rq))
> >		goto again;
> >
> >	rq->idle_stamp = rq_clock(rq);
> >
> >	return idle_sched_class.pick_next_task(rq, prev);
> >}
> >
> >Which keeps idle_balance() before put_prev_task(), and by using
> >idle_sched_class.pick_next_task() doesn't rape the idle class interface
> >like you did :-)
> 
> But put_prev_task() is called before pick_next_task(), so idle_balance()
> is now called after it, no?

No, put_prev_task() is called by the pick_next_task() that returns a
task. So in the idle case above, the idle_sched_class.pick_next_task()
will do the required put_prev_task().


* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:09     ` Daniel Lezcano
@ 2014-01-17 15:23       ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 15:23 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 04:09:39PM +0100, Daniel Lezcano wrote:
> On 01/17/2014 03:09 PM, Peter Zijlstra wrote:
> >On Fri, Jan 17, 2014 at 10:04:04AM +0100, Daniel Lezcano wrote:
> >>+	schedstat_inc(rq, sched_goidle);
> >>+#ifdef CONFIG_SMP
> >>+	/* Trigger the post schedule to do an idle_enter for CFS */
> >>+	rq->post_schedule = 1;
> >>+#endif
> >>+	return rq->idle;
> >
> >Urgh, that retains the stupid idle crap like it is.
> >
> >I've not yet tested this, but is there a reason something like the below
> >couldn't work?
> 
> I tested it by running hackbench and it seems to work as expected.

Thanks, I was still looking through the throttling. I'll keep it then
:-)


* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:23       ` Peter Zijlstra
@ 2014-01-17 15:26         ` Daniel Lezcano
  2014-01-17 15:33           ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 15:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 04:23 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 04:06:45PM +0100, Daniel Lezcano wrote:
>>> I suppose we could write something like:
>>>
>>> struct task_struct *pick_next_task(struct rq *rq, struct task_struct *prev)
>>> {
>>> 	const struct sched_class *class;
>>> 	struct task_struct *p;
>>>
>>> again:
>>> 	if (likely(rq->nr_running)) {
>>>
>>> 		if (likely(rq->nr_running == rq->cfs.h_nr_running))
>>> 			return fair_sched_class.pick_next_task(rq, prev);
>>>
>>> 		for_each_class(class) {
>>> 			p = class->pick_next_task(rq, prev);
>>> 			if (p)
>>> 				return p;
>>> 		}
>>> 	}
>>>
>>> 	if (idle_balance(rq))
>>> 		goto again;
>>>
>>> 	rq->idle_stamp = rq_clock(rq);
>>>
>>> 	return idle_sched_class.pick_next_task(rq, prev);
>>> }
>>>
>>> Which keeps idle_balance() before put_prev_task(), and by using
>>> idle_sched_class.pick_next_task() doesn't rape the idle class interface
>>> like you did :-)
>>
>> But put_prev_task is called before pick_next_task, so idle_balance() is
>> called after now, no  ?
>
> No, put_prev_task() is called by the pick_next_task() that returns a
> task. So in the idle case above, the idle_sched_class.pick_next_task()
> will do the required put_prev_task().

Ah, ok. Let me try it.






* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:26         ` Daniel Lezcano
@ 2014-01-17 15:33           ` Peter Zijlstra
  2014-01-17 16:06             ` Daniel Lezcano
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-17 15:33 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
> 
> Ah, ok. Let me try it.
> 

http://programming.kicks-ass.net/sekrit/patches.tar.bz2

has a queue that applies to tip/master.

The patches as on lkml need a little help in applying.

They've not been near a compiler yet though :/


* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:33           ` Peter Zijlstra
@ 2014-01-17 16:06             ` Daniel Lezcano
  2014-01-17 16:37             ` Daniel Lezcano
  2014-01-21  8:41             ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
  2 siblings, 0 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 16:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 04:33 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
>>
>> Ah, ok. Let me try it.
>>
>
> http://programming.kicks-ass.net/sekrit/patches.tar.bz2
>
> has a queue that applies to tip/master.
>
> The patches as on lkml need a little help in applying.
>
> They've not been near a compiler yet though :/

Thanks a lot for the series. That makes my life easier :)

I did a small compilation fix to the pick_next_task() function of the
deadline scheduler, and hackbench is currently running. I will send you
the results in a moment.



* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:33           ` Peter Zijlstra
  2014-01-17 16:06             ` Daniel Lezcano
@ 2014-01-17 16:37             ` Daniel Lezcano
  2014-01-17 16:45               ` [PATCH 1/2] sched/deadline: Fix compilation error Daniel Lezcano
  2014-01-21  8:41             ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
  2 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 16:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 04:33 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
>>
>> Ah, ok. Let me try it.
>>
>
> http://programming.kicks-ass.net/sekrit/patches.tar.bz2
>
> has a queue that applies to tip/master.
>
> The patches as on lkml need a little help in applying.
>
> They've not been near a compiler yet though :/
>

Here are the results:

Col1 : tip/sched/core
Col2 : the patchset above
Col3 : the patchset above + idle task shortcut

hackbench -s 4096 -l 1000 -g 10 -f 40 -T
  33.306 32.720 31.902
  32.344 32.139 32.214
  33.342 33.281 33.056
  33.319 33.421 31.789
  32.325 32.540 31.941
  33.701 32.978 32.229
  34.981 32.418 30.218
  32.379 31.656 31.717
  32.135 32.241 32.812
  32.531 32.790 31.967

avg:
  33.036 32.618 31.984

hackbench -p -s 4096 -l 1000 -g 10 -f 40
  27.595 27.601 26.331
  25.192 29.336 30.222
  26.057 28.579 28.351
  28.397 29.419 28.554
  27.628 25.045 30.470
  28.976 28.027 29.823
  28.764 29.361 26.902
  27.632 30.101 26.285
  29.129 29.248 26.975
  26.020 25.047 29.324

avg:
  27.539 28.176 28.324

hackbench -P -s 4096 -l 1000 -g 10 -f 40
  32.788 32.057 30.691
  33.158 33.251 34.315
  32.713 33.076 32.880
  32.488 32.878 32.792
  31.240 33.003 32.958
  32.493 33.096 32.244
  33.940 33.660 31.550
  34.733 33.777 31.829
  32.598 34.117 33.714
  34.091 33.196 31.336

avg:
  33.024 33.211 32.431




* [PATCH 1/2] sched/deadline: Fix compilation error
  2014-01-17 16:37             ` Daniel Lezcano
@ 2014-01-17 16:45               ` Daniel Lezcano
  2014-01-17 16:45                 ` [PATCH 2/2] sched: Use idle task shortcut Daniel Lezcano
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 16:45 UTC (permalink / raw)
  To: peterz; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

The pick_next_task function prototype changed with the previous patches.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/deadline.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0de2482..db4fb9c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -989,7 +989,7 @@ static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 	return rb_entry(left, struct sched_dl_entity, rb_node);
 }
 
-struct task_struct *pick_next_task_dl(struct rq *rq)
+struct task_struct *pick_next_task_dl(struct rq *rq, struct task_struct *prev)
 {
 	struct sched_dl_entity *dl_se;
 	struct task_struct *p;
-- 
1.7.9.5



* [PATCH 2/2] sched: Use idle task shortcut
  2014-01-17 16:45               ` [PATCH 1/2] sched/deadline: Fix compilation error Daniel Lezcano
@ 2014-01-17 16:45                 ` Daniel Lezcano
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-17 16:45 UTC (permalink / raw)
  To: peterz; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

From: Peter Zijlstra <peterz@infradead.org>

With the previous patches, there is no ambiguity about going idle, so we can
return the idle task directly instead of iterating over all the sched classes,
which would return the idle task anyway.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/core.c |   39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 76f0075..b5237dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2563,23 +2563,33 @@ pick_next_task(struct rq *rq, struct task_struct *prev)
 	const struct sched_class *class;
 	struct task_struct *p;
 
+again:
+	if (likely(rq->nr_running)) {
+		/*
+		 * Optimization: we know that if all tasks are in
+		 * the fair class we can call that function directly:
+		 */
+		if (likely(rq->nr_running == rq->cfs.h_nr_running))
+			return fair_sched_class.pick_next_task(rq, prev);
+
+		for_each_class(class) {
+			p = class->pick_next_task(rq, prev);
+			if (p)
+				return p;
+		}
+	}
+
 	/*
-	 * Optimization: we know that if all tasks are in
-	 * the fair class we can call that function directly:
+	 * If a task was pulled onto this cpu, pick the next task;
+	 * otherwise take the shortcut and pick the idle task
+	 * directly.
 	 */
-	if (likely(rq->nr_running == rq->cfs.h_nr_running)) {
-		p = fair_sched_class.pick_next_task(rq, prev);
-		if (likely(p))
-			return p;
-	}
+	if (idle_balance(rq))
+		goto again;
 
-	for_each_class(class) {
-		p = class->pick_next_task(rq, prev);
-		if (p)
-			return p;
-	}
+	rq->idle_stamp = rq_clock(rq);
 
-	BUG(); /* the idle class will always have a runnable task */
+	return idle_sched_class.pick_next_task(rq, prev);
 }
 
 /*
@@ -2672,9 +2682,6 @@ need_resched:
 
 	pre_schedule(rq, prev);
 
-	if (unlikely(!rq->nr_running))
-		rq->idle_stamp = idle_balance(rq) ? 0 : rq_clock(rq);
-
 	if (prev->on_rq || rq->skip_clock_update < 0)
 		update_rq_clock(rq);
 
-- 
1.7.9.5



* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-17 15:33           ` Peter Zijlstra
  2014-01-17 16:06             ` Daniel Lezcano
  2014-01-17 16:37             ` Daniel Lezcano
@ 2014-01-21  8:41             ` Daniel Lezcano
  2014-01-21  9:06               ` Peter Zijlstra
  2 siblings, 1 reply; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-21  8:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/17/2014 04:33 PM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
>>
>> Ah, ok. Let me try it.
>>
>
> http://programming.kicks-ass.net/sekrit/patches.tar.bz2
>
> has a queue that applies to tip/master.
>
> The patches as on lkml need a little help in applying.
>
> They've not been near a compiler yet though :/

Hi Peter,

do you want me to resend the full series with your patchset included?

Thanks
   -- Daniel



* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-21  8:41             ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
@ 2014-01-21  9:06               ` Peter Zijlstra
  2014-01-21  9:28                 ` Daniel Lezcano
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2014-01-21  9:06 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On Tue, Jan 21, 2014 at 09:41:31AM +0100, Daniel Lezcano wrote:
> On 01/17/2014 04:33 PM, Peter Zijlstra wrote:
> >On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
> >>
> >>Ah, ok. Let me try it.
> >>
> >
> >http://programming.kicks-ass.net/sekrit/patches.tar.bz2
> >
> >has a queue that applies to tip/master.
> >
> >The patches as on lkml need a little help in applying.
> >
> >They've not been near a compiler yet though :/
> 
> Hi Peter,
> 
> do you want me to resend the full series with your patchset included?

Nah, I'll send it out, I wanted PJT or Ben to look over one more patch I
did, meant to do that yesterday but you know how those things go :-)


* Re: [PATCH 4/4] sched: Idle task shortcut optimization
  2014-01-21  9:06               ` Peter Zijlstra
@ 2014-01-21  9:28                 ` Daniel Lezcano
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Lezcano @ 2014-01-21  9:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, linaro-kernel, alex.shi

On 01/21/2014 10:06 AM, Peter Zijlstra wrote:
> On Tue, Jan 21, 2014 at 09:41:31AM +0100, Daniel Lezcano wrote:
>> On 01/17/2014 04:33 PM, Peter Zijlstra wrote:
>>> On Fri, Jan 17, 2014 at 04:26:13PM +0100, Daniel Lezcano wrote:
>>>>
>>>> Ah, ok. Let me try it.
>>>>
>>>
>>> http://programming.kicks-ass.net/sekrit/patches.tar.bz2
>>>
>>> has a queue that applies to tip/master.
>>>
>>> The patches as on lkml need a little help in applying.
>>>
>>> They've not been near a compiler yet though :/
>>
>> Hi Peter,
>>
>> do you want me to resend the full series with your patchset included?
>
> Nah, I'll send it out, I wanted PJT or Ben to look over one more patch I
> did, meant to do that yesterday but you know how those things go :-)

Yeah :)

Thanks !

   -- Daniel



* [tip:sched/core] sched: Remove 'cpu' parameter from idle_balance( )
  2014-01-17  9:04 [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance() Daniel Lezcano
                   ` (2 preceding siblings ...)
  2014-01-17  9:04 ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
@ 2014-02-11 12:15 ` tip-bot for Daniel Lezcano
  3 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Daniel Lezcano @ 2014-02-11 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, daniel.lezcano

Commit-ID:  b4f2ab43615e5b36c48fffa99f26aca381839ac6
Gitweb:     http://git.kernel.org/tip/b4f2ab43615e5b36c48fffa99f26aca381839ac6
Author:     Daniel Lezcano <daniel.lezcano@linaro.org>
AuthorDate: Fri, 17 Jan 2014 10:04:01 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 10 Feb 2014 16:17:01 +0100

sched: Remove 'cpu' parameter from idle_balance()

The cpu parameter passed to idle_balance() is not needed, as it can be
retrieved from the struct rq.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: alex.shi@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389949444-14821-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  | 2 +-
 kernel/sched/fair.c  | 3 ++-
 kernel/sched/sched.h | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 104c816..74dd565 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2705,7 +2705,7 @@ need_resched:
 	pre_schedule(rq, prev);
 
 	if (unlikely(!rq->nr_running))
-		idle_balance(cpu, rq);
+		idle_balance(rq);
 
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4caa803..428bc9d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6531,12 +6531,13 @@ out:
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-void idle_balance(int this_cpu, struct rq *this_rq)
+void idle_balance(struct rq *this_rq)
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 	u64 curr_cost = 0;
+	int this_cpu = this_rq->cpu;
 
 	this_rq->idle_stamp = rq_clock(this_rq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b44720d..82c0e02 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1158,7 +1158,7 @@ extern const struct sched_class idle_sched_class;
 extern void update_group_power(struct sched_domain *sd, int cpu);
 
 extern void trigger_load_balance(struct rq *rq);
-extern void idle_balance(int this_cpu, struct rq *this_rq);
+extern void idle_balance(struct rq *this_rq);
 
 extern void idle_enter_fair(struct rq *this_rq);
 extern void idle_exit_fair(struct rq *this_rq);


* [tip:sched/core] sched: Fix race in idle_balance()
  2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
  2014-01-17 13:33   ` Peter Zijlstra
@ 2014-02-11 12:15   ` tip-bot for Daniel Lezcano
  1 sibling, 0 replies; 24+ messages in thread
From: tip-bot for Daniel Lezcano @ 2014-02-11 12:15 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, daniel.lezcano

Commit-ID:  e5fc66119ec97054eefc83f173a7ee9e133c3c3a
Gitweb:     http://git.kernel.org/tip/e5fc66119ec97054eefc83f173a7ee9e133c3c3a
Author:     Daniel Lezcano <daniel.lezcano@linaro.org>
AuthorDate: Fri, 17 Jan 2014 10:04:02 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 10 Feb 2014 16:17:04 +0100

sched: Fix race in idle_balance()

The scheduler's main function, schedule(), checks whether there are any
tasks left on the runqueue. If there are none, it calls idle_balance() to
see whether a task should be pulled onto the current runqueue, assuming
the cpu will go idle otherwise.

But idle_balance() releases rq->lock in order to walk the sched domains
and takes the lock again right afterwards. That opens a window during
which another cpu may put a task on our runqueue, so we won't go idle,
yet idle_stamp has been filled in as if we would.

This patch closes the window by checking, after retaking the lock, whether
the runqueue was modified without a task having been pulled, so that we
don't go idle right afterwards in __schedule().

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: alex.shi@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389949444-14821-2-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 428bc9d..5ebc681 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6589,6 +6589,13 @@ void idle_balance(struct rq *this_rq)
 
 	raw_spin_lock(&this_rq->lock);
 
+	/*
+	 * While browsing the domains, we released the rq lock.
+	 * A task could have been enqueued in the meantime
+	 */
+	if (this_rq->nr_running && !pulled_task)
+		return;
+
 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
 		/*
 		 * We are going idle. next_balance may be set based on


* [tip:sched/core] sched: Move rq->idle_stamp up to the core
  2014-01-17  9:04 ` [PATCH 3/4] sched: Move idle_stamp up to the core Daniel Lezcano
@ 2014-02-11 12:16   ` tip-bot for Daniel Lezcano
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Daniel Lezcano @ 2014-02-11 12:16 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, daniel.lezcano

Commit-ID:  3c4017c13f91069194fce3160944efec50f15a6e
Gitweb:     http://git.kernel.org/tip/3c4017c13f91069194fce3160944efec50f15a6e
Author:     Daniel Lezcano <daniel.lezcano@linaro.org>
AuthorDate: Fri, 17 Jan 2014 10:04:03 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 10 Feb 2014 16:17:07 +0100

sched: Move rq->idle_stamp up to the core

idle_balance() modifies the rq->idle_stamp field, making this information
shared across core.c and fair.c.

Since the previous patch lets us know whether the cpu is going to idle or
not, let's encapsulate rq->idle_stamp handling in core.c by moving it up
to the caller.

idle_balance() now returns true if balancing occurred and the cpu won't
be idle, and false if no balancing happened and the cpu is going idle.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: alex.shi@linaro.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389949444-14821-3-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  | 11 +++++++++--
 kernel/sched/fair.c  | 14 ++++++--------
 kernel/sched/sched.h |  2 +-
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 74dd565..417cf65 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2704,8 +2704,15 @@ need_resched:
 
 	pre_schedule(rq, prev);
 
-	if (unlikely(!rq->nr_running))
-		idle_balance(rq);
+	if (unlikely(!rq->nr_running)) {
+		/*
+		 * We must set idle_stamp _before_ calling idle_balance(), such
+		 * that we measure the duration of idle_balance() as idle time.
+		 */
+		rq->idle_stamp = rq_clock(rq);
+		if (idle_balance(rq))
+			rq->idle_stamp = 0;
+	}
 
 	put_prev_task(rq, prev);
 	next = pick_next_task(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5ebc681..04fea77 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6531,7 +6531,7 @@ out:
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-void idle_balance(struct rq *this_rq)
+int idle_balance(struct rq *this_rq)
 {
 	struct sched_domain *sd;
 	int pulled_task = 0;
@@ -6539,10 +6539,8 @@ void idle_balance(struct rq *this_rq)
 	u64 curr_cost = 0;
 	int this_cpu = this_rq->cpu;
 
-	this_rq->idle_stamp = rq_clock(this_rq);
-
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
-		return;
+		return 0;
 
 	/*
 	 * Drop the rq->lock, but keep IRQ/preempt disabled.
@@ -6580,10 +6578,8 @@ void idle_balance(struct rq *this_rq)
 		interval = msecs_to_jiffies(sd->balance_interval);
 		if (time_after(next_balance, sd->last_balance + interval))
 			next_balance = sd->last_balance + interval;
-		if (pulled_task) {
-			this_rq->idle_stamp = 0;
+		if (pulled_task)
 			break;
-		}
 	}
 	rcu_read_unlock();
 
@@ -6594,7 +6590,7 @@ void idle_balance(struct rq *this_rq)
 	 * A task could have been enqueued in the meantime
 	 */
 	if (this_rq->nr_running && !pulled_task)
-		return;
+		return 1;
 
 	if (pulled_task || time_after(jiffies, this_rq->next_balance)) {
 		/*
@@ -6606,6 +6602,8 @@ void idle_balance(struct rq *this_rq)
 
 	if (curr_cost > this_rq->max_idle_balance_cost)
 		this_rq->max_idle_balance_cost = curr_cost;
+
+	return pulled_task;
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 82c0e02..bb89991 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1158,7 +1158,7 @@ extern const struct sched_class idle_sched_class;
 extern void update_group_power(struct sched_domain *sd, int cpu);
 
 extern void trigger_load_balance(struct rq *rq);
-extern void idle_balance(struct rq *this_rq);
+extern int idle_balance(struct rq *this_rq);
 
 extern void idle_enter_fair(struct rq *this_rq);
 extern void idle_exit_fair(struct rq *this_rq);


end of thread, other threads:[~2014-02-11 12:16 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-17  9:04 [PATCH 1/4] sched: Remove 'cpu' parameter for idle_balance() Daniel Lezcano
2014-01-17  9:04 ` [PATCH 2/4] sched: Fix race in idle_balance() Daniel Lezcano
2014-01-17 13:33   ` Peter Zijlstra
2014-01-17 13:44     ` Daniel Lezcano
2014-02-11 12:15   ` [tip:sched/core] " tip-bot for Daniel Lezcano
2014-01-17  9:04 ` [PATCH 3/4] sched: Move idle_stamp up to the core Daniel Lezcano
2014-02-11 12:16   ` [tip:sched/core] sched: Move rq->idle_stamp " tip-bot for Daniel Lezcano
2014-01-17  9:04 ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
2014-01-17 14:09   ` Peter Zijlstra
2014-01-17 15:09     ` Daniel Lezcano
2014-01-17 15:23       ` Peter Zijlstra
2014-01-17 14:26   ` Peter Zijlstra
2014-01-17 15:06     ` Daniel Lezcano
2014-01-17 15:23       ` Peter Zijlstra
2014-01-17 15:26         ` Daniel Lezcano
2014-01-17 15:33           ` Peter Zijlstra
2014-01-17 16:06             ` Daniel Lezcano
2014-01-17 16:37             ` Daniel Lezcano
2014-01-17 16:45               ` [PATCH 1/2] sched/deadline: Fix compilation error Daniel Lezcano
2014-01-17 16:45                 ` [PATCH 2/2] sched: Use idle task shortcut Daniel Lezcano
2014-01-21  8:41             ` [PATCH 4/4] sched: Idle task shortcut optimization Daniel Lezcano
2014-01-21  9:06               ` Peter Zijlstra
2014-01-21  9:28                 ` Daniel Lezcano
2014-02-11 12:15 ` [tip:sched/core] sched: Remove 'cpu' parameter from idle_balance( ) tip-bot for Daniel Lezcano
