* [PATCH 0/2] scheduler fixes
@ 2009-07-30 14:57 Gregory Haskins
  2009-07-30 14:57 ` [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code Gregory Haskins
  2009-07-30 14:57 ` [PATCH 2/2] sched: fix race in cpupri introduced by cpumask_var changes Gregory Haskins
  0 siblings, 2 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-07-30 14:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, rostedt, rusty, maxk, mingo

The following changes since commit 1314bb58bbe2a2c146490c59986c511687543d3c:
  Gregory Haskins (1):
        sched: enhance the pre/post scheduling logic

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks.git sched/updates

Gregory Haskins (2):
      [RESEND] sched: Fully integrate cpus_active_map and root-domain code
      sched: fix race in cpupri introduced by cpumask_var changes

 kernel/sched.c        |    2 +-
 kernel/sched_cpupri.c |   15 ++++++++++++++-
 kernel/sched_fair.c   |   10 +++++++---
 kernel/sched_rt.c     |    7 -------
 4 files changed, 22 insertions(+), 12 deletions(-)


Regards,
-Greg


* [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code
  2009-07-30 14:57 [PATCH 0/2] scheduler fixes Gregory Haskins
@ 2009-07-30 14:57 ` Gregory Haskins
  2009-07-30 15:01   ` Gregory Haskins
  2009-08-02 13:13   ` [tip:sched/core] " tip-bot for Gregory Haskins
  2009-07-30 14:57 ` [PATCH 2/2] sched: fix race in cpupri introduced by cpumask_var changes Gregory Haskins
  1 sibling, 2 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-07-30 14:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, rostedt, rusty, maxk, mingo

(Applies to 2.6.31-rc4)

[
	This patch was originally sent about a year ago, but got dropped
	presumably by accident.  Here is the original thread.

	http://lkml.org/lkml/2008/7/22/281

	At that time, Peter and Max acked it.  It has now been forward
	ported to the new cpumask interface.  I will be so bold as to
	carry their acks forward since the basic logic is the same.
	However, a new acknowledgement, if they have the time to review,
	would be ideal.

	I have tested this patch on a 4-way system using Max's recommended
	"echo 0|1 > cpu1/online" technique and it appears to work properly
]

What: Reflect "active" cpus in the rq->rd->online field, instead of the
online_map.

Motivation:  Things that use the root-domain code (such as cpupri) only
care about cpus classified as "active" anyway.  By synchronizing the
root-domain state with the active map, we allow several optimizations.

For instance, we can remove an extra cpumask_and from the scheduler
hotpath by utilizing rq->rd->online (since it is now a cached version
of cpu_active_map & rq->rd->span).
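
As a rough illustration of the caching idea (a toy user-space model, not
kernel code: cpumasks are collapsed to single words and every name below
is invented for the example), the attach/hotplug path keeps the cached
"online" mask in sync with the active map, so the hotpath only has to
test one mask instead of and-ing two:

#include <stdio.h>

/* Toy model: each mask is one word, bit N == CPU N. */
struct root_domain_model {
	unsigned long span;    /* CPUs covered by this root domain      */
	unsigned long online;  /* cached: cpu_active & span (the patch) */
};

/* Attach/hotplug path keeps the cache in sync with the active map. */
static void model_rq_attach(struct root_domain_model *rd,
			    unsigned long cpu_active, int cpu)
{
	rd->span |= 1UL << cpu;
	if (cpu_active & (1UL << cpu))
		rd->online |= 1UL << cpu;     /* analogous to set_rq_online()  */
	else
		rd->online &= ~(1UL << cpu);  /* analogous to set_rq_offline() */
}

/* Hotpath: one mask test replaces re-computing active & span here. */
static int model_cpu_usable(const struct root_domain_model *rd, int cpu)
{
	return !!(rd->online & (1UL << cpu));
}

int main(void)
{
	unsigned long cpu_active = 0xb; /* CPUs 0, 1 and 3 are active */
	struct root_domain_model rd = { 0, 0 };
	int cpu;

	for (cpu = 0; cpu < 4; cpu++)
		model_rq_attach(&rd, cpu_active, cpu);

	printf("cpu2 usable? %d\n", model_cpu_usable(&rd, 2)); /* prints 0 */
	printf("cpu3 usable? %d\n", model_cpu_usable(&rd, 3)); /* prints 1 */
	return 0;
}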

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
---

 kernel/sched.c      |    2 +-
 kernel/sched_fair.c |   10 +++++++---
 kernel/sched_rt.c   |    7 -------
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 1a104e6..38a1526 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7874,7 +7874,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	rq->rd = rd;
 
 	cpumask_set_cpu(rq->cpu, rd->span);
-	if (cpumask_test_cpu(rq->cpu, cpu_online_mask))
+	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
 		set_rq_online(rq);
 
 	spin_unlock_irqrestore(&rq->lock, flags);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9ffb2b2..2b9cae6 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1040,17 +1040,21 @@ static void yield_task_fair(struct rq *rq)
  * search starts with cpus closest then further out as needed,
  * so we always favor a closer, idle cpu.
  * Domains may include CPUs that are not usable for migration,
- * hence we need to mask them out (cpu_active_mask)
+ * hence we need to mask them out (rq->rd->online)
  *
  * Returns the CPU we should wake onto.
  */
 #if defined(ARCH_HAS_SCHED_WAKE_IDLE)
+
+#define cpu_rd_active(cpu, rq) cpumask_test_cpu(cpu, rq->rd->online)
+
 static int wake_idle(int cpu, struct task_struct *p)
 {
 	struct sched_domain *sd;
 	int i;
 	unsigned int chosen_wakeup_cpu;
 	int this_cpu;
+	struct rq *task_rq = task_rq(p);
 
 	/*
 	 * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
@@ -1083,10 +1087,10 @@ static int wake_idle(int cpu, struct task_struct *p)
 	for_each_domain(cpu, sd) {
 		if ((sd->flags & SD_WAKE_IDLE)
 		    || ((sd->flags & SD_WAKE_IDLE_FAR)
-			&& !task_hot(p, task_rq(p)->clock, sd))) {
+			&& !task_hot(p, task_rq->clock, sd))) {
 			for_each_cpu_and(i, sched_domain_span(sd),
 					 &p->cpus_allowed) {
-				if (cpu_active(i) && idle_cpu(i)) {
+				if (cpu_rd_active(i, task_rq) && idle_cpu(i)) {
 					if (i != task_cpu(p)) {
 						schedstat_inc(p,
 						       se.nr_wakeups_idle);
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index a8f89bc..13f728e 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1173,13 +1173,6 @@ static int find_lowest_rq(struct task_struct *task)
 		return -1; /* No targets found */
 
 	/*
-	 * Only consider CPUs that are usable for migration.
-	 * I guess we might want to change cpupri_find() to ignore those
-	 * in the first place.
-	 */
-	cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
-
-	/*
 	 * At this point we have built a mask of cpus representing the
 	 * lowest priority tasks in the system.  Now we want to elect
 	 * the best one based on our affinity and topology.



* [PATCH 2/2] sched: fix race in cpupri introduced by cpumask_var changes
  2009-07-30 14:57 [PATCH 0/2] scheduler fixes Gregory Haskins
  2009-07-30 14:57 ` [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code Gregory Haskins
@ 2009-07-30 14:57 ` Gregory Haskins
  2009-08-02 13:12   ` [tip:sched/core] sched: Fix " tip-bot for Gregory Haskins
  1 sibling, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2009-07-30 14:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, rostedt, rusty, maxk, mingo

[
	Note: obsoletes c8cab1ddfafd112c82a69a6cc4ec608984eeac97,
	though it should be harmless to leave the old patch applied.
]

Background:

Several race conditions in the scheduler have cropped up recently,
which Steven and I have tracked down using ftrace.  The most recent
one turns out to be a race in how the scheduler determines a
suitable migration target for RT tasks, introduced recently with commit:

    commit 68e74568fbe5854952355e942acca51f138096d9
    Author: Rusty Russell <rusty@rustcorp.com.au>
    Date:   Tue Nov 25 02:35:13 2008 +1030

        sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to quickly
determine a best-estimate target.  Races between the pri_active
bitmap and the vec->mask were handled in the original code because
we would detect and return "0" when this occurred.  The design was
predicated on the *effective* atomicity (*) of caching the result
of cpus_and() between the cpus_allowed and the vec->mask.

Commit 68e74568 changed the behavior such that vec->mask is accessed
multiple times.  This introduces a subtle race: the function can now
return "1" even though the bitmap it hands back is empty.

*) yes, we know cpus_and() is not a locked operator across the entire
   composite array, but it is implicitly atomic on a per-word basis
   which is all the design required to work.

Implementation:

Rather than forgoing the lockless design, or reverting to a stack-based
cpumask_t, we simply detect when the race has been hit and continue
processing in that case.  This renders the
removal race as if the priority bit had been atomically cleared as well,
and allows the algorithm to execute correctly.
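
To make the window concrete, here is a rough sketch of the race and the
guard (again a toy user-space model with single-word masks and invented
names, not the cpupri code itself): the first test of vec->mask sees an
overlap, the vector is emptied concurrently, and the cpumask_and() on the
second read then yields an empty result; the re-check added by the patch,
mirrored at the end of the sketch, treats that as if this priority level
was never hit.

#include <stdio.h>

static unsigned long vec_mask = 0x4;           /* CPU 2 sits at this prio level */
static const unsigned long cpus_allowed = 0x6; /* task may run on CPUs 1 and 2  */

static int toy_cpupri_find(unsigned long *lowest_mask)
{
	/* First read: is there any overlap at this priority level? */
	if ((cpus_allowed & vec_mask) == 0)
		return 0;                       /* no: skip this level */

	/* A concurrent update empties the vector here (simulated). */
	vec_mask = 0;

	/* Second read: build the result mask. */
	*lowest_mask = cpus_allowed & vec_mask;

	/*
	 * Guard corresponding to the patch: the vector may have been
	 * emptied between the two reads, so re-check and treat an empty
	 * result as "level not hit" rather than returning success.
	 */
	if (*lowest_mask == 0)
		return 0;

	return 1;
}

int main(void)
{
	unsigned long lowest = 0;

	/* Without the guard this would report hit=1 with an empty mask. */
	printf("hit=%d mask=%#lx\n", toy_cpupri_find(&lowest), lowest);
	return 0;
}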

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Steven Rostedt <srostedt@redhat.com>
---

 kernel/sched_cpupri.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
index aef68b6..0f052fc 100644
--- a/kernel/sched_cpupri.c
+++ b/kernel/sched_cpupri.c
@@ -81,8 +81,21 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
 			continue;
 
-		if (lowest_mask)
+		if (lowest_mask) {
 			cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+
+			/*
+			 * We have to ensure that we have at least one bit
+			 * still set in the array, since the map could have
+			 * been concurrently emptied between the first and
+			 * second reads of vec->mask.  If we hit this
+			 * condition, simply act as though we never hit this
+			 * priority level and continue on.
+			 */
+			if (cpumask_any(lowest_mask) >= nr_cpu_ids)
+				continue;
+		}
+
 		return 1;
 	}
 



* Re: [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code
  2009-07-30 14:57 ` [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code Gregory Haskins
@ 2009-07-30 15:01   ` Gregory Haskins
  2009-07-30 15:10     ` Gregory Haskins
  2009-08-02 13:13   ` [tip:sched/core] " tip-bot for Gregory Haskins
  1 sibling, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2009-07-30 15:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, rostedt, rusty, maxk, mingo


Gregory Haskins wrote:
> (Applies to 2.6.31-rc4)
> 
> [
> 	This patch was originally sent about a year ago, but got dropped
> 	presumably by accident.  Here is the original thread.
> 
> 	http://lkml.org/lkml/2008/7/22/281
> 
> 	At that time, Peter and Max acked it.  It has now been forward
> 	ported to the new cpumask interface.  I will be so bold as to
> 	carry their acks forward since the basic logic is the same.
> 	However, a new acknowledgement, if they have the time to review,
> 	would be ideal.
> 
> 	I have tested this patch on a 4-way system using Max's recommended
> 	"echo 0|1 > cpu1/online" technique and it appears to work properly
> ]
> 
> What: Reflect "active" cpus in the rq->rd->online field, instead of the
> online_map.
> 
> Motivation:  Things that use the root-domain code (such as cpupri) only
> care about cpus classified as "active" anyway.  By synchronizing the
> root-domain state with the active map, we allow several optimizations.
> 
> For instance, we can remove an extra cpumask_and from the scheduler
> hotpath by utilizing rq->rd->online (since it is now a cached version
> of cpu_active_map & rq->rd->span).
> 
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Max Krasnyansky <maxk@qualcomm.com>
> ---
> 
>  kernel/sched.c      |    2 +-
>  kernel/sched_fair.c |   10 +++++++---
>  kernel/sched_rt.c   |    7 -------
>  3 files changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 1a104e6..38a1526 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -7874,7 +7874,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
>  	rq->rd = rd;
>  
>  	cpumask_set_cpu(rq->cpu, rd->span);
> -	if (cpumask_test_cpu(rq->cpu, cpu_online_mask))
> +	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
>  		set_rq_online(rq);
>  
>  	spin_unlock_irqrestore(&rq->lock, flags);
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9ffb2b2..2b9cae6 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1040,17 +1040,21 @@ static void yield_task_fair(struct rq *rq)
>   * search starts with cpus closest then further out as needed,
>   * so we always favor a closer, idle cpu.
>   * Domains may include CPUs that are not usable for migration,
> - * hence we need to mask them out (cpu_active_mask)
> + * hence we need to mask them out (rq->rd->online)
>   *
>   * Returns the CPU we should wake onto.
>   */
>  #if defined(ARCH_HAS_SCHED_WAKE_IDLE)
> +
> +#define cpu_rd_active(cpu, rq) cpumask_test_cpu(cpu, rq->rd->online)
> +
>  static int wake_idle(int cpu, struct task_struct *p)
>  {
>  	struct sched_domain *sd;
>  	int i;
>  	unsigned int chosen_wakeup_cpu;
>  	int this_cpu;
> +	struct rq *task_rq = task_rq(p);
>  
>  	/*
>  	 * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
> @@ -1083,10 +1087,10 @@ static int wake_idle(int cpu, struct task_struct *p)
>  	for_each_domain(cpu, sd) {
>  		if ((sd->flags & SD_WAKE_IDLE)
>  		    || ((sd->flags & SD_WAKE_IDLE_FAR)
> -			&& !task_hot(p, task_rq(p)->clock, sd))) {
> +			&& !task_hot(p, task_rq->clock, sd))) {
>  			for_each_cpu_and(i, sched_domain_span(sd),
>  					 &p->cpus_allowed) {

Hmm, something got suboptimal in translation from the original patch.

This would be better expressed as:

for_each_cpu_and(i, rq->rd->online, &p->cpus_allowed) {
	if (idle_cpu(i)...
}

> -				if (cpu_active(i) && idle_cpu(i)) {
> +				if (cpu_rd_active(i, task_rq) && idle_cpu(i)) {
>  					if (i != task_cpu(p)) {
>  						schedstat_inc(p,
>  						       se.nr_wakeups_idle);
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index a8f89bc..13f728e 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -1173,13 +1173,6 @@ static int find_lowest_rq(struct task_struct *task)
>  		return -1; /* No targets found */
>  
>  	/*
> -	 * Only consider CPUs that are usable for migration.
> -	 * I guess we might want to change cpupri_find() to ignore those
> -	 * in the first place.
> -	 */
> -	cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
> -
> -	/*
>  	 * At this point we have built a mask of cpus representing the
>  	 * lowest priority tasks in the system.  Now we want to elect
>  	 * the best one based on our affinity and topology.
> 





* Re: [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code
  2009-07-30 15:01   ` Gregory Haskins
@ 2009-07-30 15:10     ` Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-07-30 15:10 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: linux-kernel, peterz, rostedt, rusty, maxk, mingo


Gregory Haskins wrote:
> Gregory Haskins wrote:
>> (Applies to 2.6.31-rc4)
>>
>> [
>> 	This patch was originally sent about a year ago, but got dropped
>> 	presumably by accident.  Here is the original thread.
>>
>> 	http://lkml.org/lkml/2008/7/22/281
>>
>> 	At that time, Peter and Max acked it.  It has now been forward
>> 	ported to the new cpumask interface.  I will be so bold as to
>> 	carry their acks forward since the basic logic is the same.
>> 	However, a new acknowledgement, if they have the time to review,
>> 	would be ideal.
>>
>> 	I have tested this patch on a 4-way system using Max's recommended
>> 	"echo 0|1 > cpu1/online" technique and it appears to work properly
>> ]
>>
>> What: Reflect "active" cpus in the rq->rd->online field, instead of the
>> online_map.
>>
>> Motivation:  Things that use the root-domain code (such as cpupri) only
>> care about cpus classified as "active" anyway.  By synchronizing the
>> root-domain state with the active map, we allow several optimizations.
>>
>> For instance, we can remove an extra cpumask_and from the scheduler
>> hotpath by utilizing rq->rd->online (since it is now a cached version
>> of cpu_active_map & rq->rd->span).
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> Acked-by: Peter Zijlstra <peterz@infradead.org>
>> Acked-by: Max Krasnyansky <maxk@qualcomm.com>
>> ---
>>
>>  kernel/sched.c      |    2 +-
>>  kernel/sched_fair.c |   10 +++++++---
>>  kernel/sched_rt.c   |    7 -------
>>  3 files changed, 8 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 1a104e6..38a1526 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -7874,7 +7874,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
>>  	rq->rd = rd;
>>  
>>  	cpumask_set_cpu(rq->cpu, rd->span);
>> -	if (cpumask_test_cpu(rq->cpu, cpu_online_mask))
>> +	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
>>  		set_rq_online(rq);
>>  
>>  	spin_unlock_irqrestore(&rq->lock, flags);
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index 9ffb2b2..2b9cae6 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -1040,17 +1040,21 @@ static void yield_task_fair(struct rq *rq)
>>   * search starts with cpus closest then further out as needed,
>>   * so we always favor a closer, idle cpu.
>>   * Domains may include CPUs that are not usable for migration,
>> - * hence we need to mask them out (cpu_active_mask)
>> + * hence we need to mask them out (rq->rd->online)
>>   *
>>   * Returns the CPU we should wake onto.
>>   */
>>  #if defined(ARCH_HAS_SCHED_WAKE_IDLE)
>> +
>> +#define cpu_rd_active(cpu, rq) cpumask_test_cpu(cpu, rq->rd->online)
>> +
>>  static int wake_idle(int cpu, struct task_struct *p)
>>  {
>>  	struct sched_domain *sd;
>>  	int i;
>>  	unsigned int chosen_wakeup_cpu;
>>  	int this_cpu;
>> +	struct rq *task_rq = task_rq(p);
>>  
>>  	/*
>>  	 * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
>> @@ -1083,10 +1087,10 @@ static int wake_idle(int cpu, struct task_struct *p)
>>  	for_each_domain(cpu, sd) {
>>  		if ((sd->flags & SD_WAKE_IDLE)
>>  		    || ((sd->flags & SD_WAKE_IDLE_FAR)
>> -			&& !task_hot(p, task_rq(p)->clock, sd))) {
>> +			&& !task_hot(p, task_rq->clock, sd))) {
>>  			for_each_cpu_and(i, sched_domain_span(sd),
>>  					 &p->cpus_allowed) {
> 
> Hmm, something got suboptimal in translation from the original patch.
> 
> This would be better expressed as:
> 
> for_each_cpu_and(i, rq->rd->online, &p->cpus_allowed) {
> 	if (idle_cpu(i)...
> }


NM.  My first instinct was correct.

We need the intersection of sd->span, cpus_allowed, and the active map.
The submitted logic (or the original) is correct, and my comment above is
wrong.
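
For a concrete (entirely made-up) set of single-word masks showing why all
three terms matter, i.e. why the domain span still restricts the search
even when a CPU is both allowed and active:

#include <stdio.h>

int main(void)
{
	/* Toy single-word masks; the values are invented for illustration. */
	unsigned long sd_span      = 0x3; /* this domain covers CPUs 0 and 1 */
	unsigned long cpus_allowed = 0xf; /* task may run on CPUs 0-3        */
	unsigned long rd_online    = 0xf; /* all four CPUs are active        */

	/* What wake_idle() actually needs: all three masks intersected. */
	printf("span & allowed & online = %#lx\n",
	       sd_span & cpus_allowed & rd_online);  /* prints 0x3 */

	/* Dropping the span (the simplification suggested above) would
	 * widen the candidate set to CPUs outside the current domain.  */
	printf("allowed & online        = %#lx\n",
	       cpus_allowed & rd_online);            /* prints 0xf */
	return 0;
}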

Sorry for the noise,
-Greg




* [tip:sched/core] sched: Fix race in cpupri introduced by cpumask_var changes
  2009-07-30 14:57 ` [PATCH 2/2] sched: fix race in cpupri introduced by cpumask_var changes Gregory Haskins
@ 2009-08-02 13:12   ` tip-bot for Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Gregory Haskins @ 2009-08-02 13:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, rusty, srostedt, tglx,
	ghaskins, mingo

Commit-ID:  07903af152b0597d94e9b0030746b63c4664e787
Gitweb:     http://git.kernel.org/tip/07903af152b0597d94e9b0030746b63c4664e787
Author:     Gregory Haskins <ghaskins@novell.com>
AuthorDate: Thu, 30 Jul 2009 10:57:28 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 2 Aug 2009 14:23:29 +0200

sched: Fix race in cpupri introduced by cpumask_var changes

Background:

Several race conditions in the scheduler have cropped up
recently, which Steven and I have tracked down using ftrace.
The most recent one turns out to be a race in how the scheduler
determines a suitable migration target for RT tasks, introduced
recently with commit:

    commit 68e74568fbe5854952355e942acca51f138096d9
    Date:   Tue Nov 25 02:35:13 2008 +1030

        sched: convert struct cpupri_vec cpumask_var_t.

The original design of cpupri allowed lockless readers to
quickly determine a best-estimate target.  Races between the
pri_active bitmap and the vec->mask were handled in the
original code because we would detect and return "0" when this
occurred.  The design was predicated on the *effective*
atomicity (*) of caching the result of cpus_and() between the
cpus_allowed and the vec->mask.

Commit 68e74568 changed the behavior such that vec->mask is
accessed multiple times.  This introduces a subtle race: the
function can now return "1" even though the bitmap it hands
back is empty.

*) yes, we know cpus_and() is not a locked operator across the
   entire composite array, but it is implicitly atomic on a
   per-word basis which is all the design required to work.

Implementation:

Rather than forgoing the lockless design, or reverting to a
stack-based cpumask_t, we simply detect when the race has been
hit and continue processing in that case.  This renders the
removal race as if the priority
bit had been atomically cleared as well, and allows the
algorithm to execute correctly.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched_cpupri.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_cpupri.c b/kernel/sched_cpupri.c
index e6c2517..d014efb 100644
--- a/kernel/sched_cpupri.c
+++ b/kernel/sched_cpupri.c
@@ -81,8 +81,21 @@ int cpupri_find(struct cpupri *cp, struct task_struct *p,
 		if (cpumask_any_and(&p->cpus_allowed, vec->mask) >= nr_cpu_ids)
 			continue;
 
-		if (lowest_mask)
+		if (lowest_mask) {
 			cpumask_and(lowest_mask, &p->cpus_allowed, vec->mask);
+
+			/*
+			 * We have to ensure that we have at least one bit
+			 * still set in the array, since the map could have
+			 * been concurrently emptied between the first and
+			 * second reads of vec->mask.  If we hit this
+			 * condition, simply act as though we never hit this
+			 * priority level and continue on.
+			 */
+			if (cpumask_any(lowest_mask) >= nr_cpu_ids)
+				continue;
+		}
+
 		return 1;
 	}
 


* [tip:sched/core] sched: Fully integrate cpus_active_map and root-domain code
  2009-07-30 14:57 ` [PATCH 1/2] [RESEND] sched: Fully integrate cpus_active_map and root-domain code Gregory Haskins
  2009-07-30 15:01   ` Gregory Haskins
@ 2009-08-02 13:13   ` tip-bot for Gregory Haskins
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Gregory Haskins @ 2009-08-02 13:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, maxk, tglx,
	ghaskins, mingo

Commit-ID:  00aec93d10a051ea64f83eff75d4065a19508ea6
Gitweb:     http://git.kernel.org/tip/00aec93d10a051ea64f83eff75d4065a19508ea6
Author:     Gregory Haskins <ghaskins@novell.com>
AuthorDate: Thu, 30 Jul 2009 10:57:23 -0400
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 2 Aug 2009 14:26:12 +0200

sched: Fully integrate cpus_active_map and root-domain code

Reflect "active" cpus in the rq->rd->online field, instead of
the online_map.

The motivation is that things that use the root-domain code
(such as cpupri) only care about cpus classified as "active"
anyway. By synchronizing the root-domain state with the active
map, we allow several optimizations.

For instance, we can remove an extra cpumask_and from the
scheduler hotpath by utilizing rq->rd->online (since it is now
a cached version of cpu_active_map & rq->rd->span).

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 kernel/sched.c      |    2 +-
 kernel/sched_fair.c |   10 +++++++---
 kernel/sched_rt.c   |    7 -------
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 613fee5..475138c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7927,7 +7927,7 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	rq->rd = rd;
 
 	cpumask_set_cpu(rq->cpu, rd->span);
-	if (cpumask_test_cpu(rq->cpu, cpu_online_mask))
+	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
 		set_rq_online(rq);
 
 	spin_unlock_irqrestore(&rq->lock, flags);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 652e8bd..4934729 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1046,17 +1046,21 @@ static void yield_task_fair(struct rq *rq)
  * search starts with cpus closest then further out as needed,
  * so we always favor a closer, idle cpu.
  * Domains may include CPUs that are not usable for migration,
- * hence we need to mask them out (cpu_active_mask)
+ * hence we need to mask them out (rq->rd->online)
  *
  * Returns the CPU we should wake onto.
  */
 #if defined(ARCH_HAS_SCHED_WAKE_IDLE)
+
+#define cpu_rd_active(cpu, rq) cpumask_test_cpu(cpu, rq->rd->online)
+
 static int wake_idle(int cpu, struct task_struct *p)
 {
 	struct sched_domain *sd;
 	int i;
 	unsigned int chosen_wakeup_cpu;
 	int this_cpu;
+	struct rq *task_rq = task_rq(p);
 
 	/*
 	 * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
@@ -1089,10 +1093,10 @@ static int wake_idle(int cpu, struct task_struct *p)
 	for_each_domain(cpu, sd) {
 		if ((sd->flags & SD_WAKE_IDLE)
 		    || ((sd->flags & SD_WAKE_IDLE_FAR)
-			&& !task_hot(p, task_rq(p)->clock, sd))) {
+			&& !task_hot(p, task_rq->clock, sd))) {
 			for_each_cpu_and(i, sched_domain_span(sd),
 					 &p->cpus_allowed) {
-				if (cpu_active(i) && idle_cpu(i)) {
+				if (cpu_rd_active(i, task_rq) && idle_cpu(i)) {
 					if (i != task_cpu(p)) {
 						schedstat_inc(p,
 						       se.nr_wakeups_idle);
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index a8f89bc..13f728e 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1173,13 +1173,6 @@ static int find_lowest_rq(struct task_struct *task)
 		return -1; /* No targets found */
 
 	/*
-	 * Only consider CPUs that are usable for migration.
-	 * I guess we might want to change cpupri_find() to ignore those
-	 * in the first place.
-	 */
-	cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
-
-	/*
 	 * At this point we have built a mask of cpus representing the
 	 * lowest priority tasks in the system.  Now we want to elect
 	 * the best one based on our affinity and topology.

