linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] sched: fork/exec/wake clean up
@ 2012-12-03 13:54 Alex Shi
  2012-12-03 13:54 ` [PATCH 01/4] sched: select_task_rq_fair " Alex Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-03 13:54 UTC (permalink / raw)
  To: npiggin, mingo, peterz; +Cc: linux-kernel, akpm

This patchset tries to clean up select_task_rq_fair(), which is used
for fork/exec/wake scheduling.

With this patchset, our NHM-EX and SNB-EP 4-socket machines see a 10%
hackbench performance increase.

Regards
Alex

[PATCH 01/4] sched: select_task_rq_fair clean up
[PATCH 02/4] sched: fix find_idlest_group messy logic
[PATCH 03/4] sched: don't need to go to smaller sched domain
[PATCH 04/4] sched: remove domain iterations in fork/exec/wake


* [PATCH 01/4] sched: select_task_rq_fair clean up
  2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
@ 2012-12-03 13:54 ` Alex Shi
  2012-12-06 17:50   ` Frederic Weisbecker
  2012-12-03 13:54 ` [PATCH 02/4] sched: fix find_idlest_group messy logic Alex Shi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2012-12-03 13:54 UTC (permalink / raw)
  To: npiggin, mingo, peterz; +Cc: linux-kernel, akpm

It is impossible to miss a task-allowed cpu in an eligible group.

And since find_idlest_group() only returns a different group, one that
excludes the old cpu, it's also impossible for the new cpu to be the
same as the old cpu.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59e072b..df99456 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3150,11 +3150,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		}
 
 		new_cpu = find_idlest_cpu(group, p, cpu);
-		if (new_cpu == -1 || new_cpu == cpu) {
-			/* Now try balancing at a lower domain level of cpu */
-			sd = sd->child;
-			continue;
-		}
 
 		/* Now try balancing at a lower domain level of new_cpu */
 		cpu = new_cpu;
-- 
1.7.5.4
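
To illustrate why the deleted branch is dead code, here is a minimal
user-space sketch of the invariant (toy types and hypothetical loads,
not the kernel implementation): the group handed to find_idlest_cpu()
excludes the old cpu by construction, and as long as every load is
below ULONG_MAX the scan always picks some cpu, so neither
new_cpu == -1 nor new_cpu == cpu can occur.

#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Toy model of a sched group: a set of cpu ids with per-cpu loads. */
struct toy_group {
	int cpus[4];
	unsigned long load[4];
	int ncpus;
};

/* Mirrors the intent of find_idlest_cpu(): least-loaded cpu in the group. */
static int toy_find_idlest_cpu(const struct toy_group *g)
{
	unsigned long min_load = ULONG_MAX;
	int idlest = -1, i;

	for (i = 0; i < g->ncpus; i++) {
		if (g->load[i] < min_load) {
			min_load = g->load[i];
			idlest = g->cpus[i];
		}
	}
	return idlest;	/* -1 only if every load is ULONG_MAX */
}

int main(void)
{
	/* The returned group excludes the old cpu (0) by construction. */
	struct toy_group remote = { .cpus = {2, 3}, .load = {7, 5}, .ncpus = 2 };
	int old_cpu = 0;
	int new_cpu = toy_find_idlest_cpu(&remote);

	assert(new_cpu != -1);		/* loads are bounded, so a cpu is found */
	assert(new_cpu != old_cpu);	/* the old cpu is not in the group */
	printf("new_cpu = %d\n", new_cpu);
	return 0;
}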



* [PATCH 02/4] sched: fix find_idlest_group messy logic
  2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
  2012-12-03 13:54 ` [PATCH 01/4] sched: select_task_rq_fair " Alex Shi
@ 2012-12-03 13:54 ` Alex Shi
  2012-12-07  0:56   ` Frederic Weisbecker
  2012-12-03 13:54 ` [PATCH 03/4] sched: don't need to go to smaller sched domain Alex Shi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2012-12-03 13:54 UTC (permalink / raw)
  To: npiggin, mingo, peterz; +Cc: linux-kernel, akpm

There are 4 situations in the function:
1, no group allows the task;
	so min_load = ULONG_MAX, this_load = 0, idlest = NULL
2, only the local group allows the task;
	so min_load = ULONG_MAX, this_load assigned, idlest = NULL
3, only non-local groups allow the task;
	so min_load assigned, this_load = 0, idlest != NULL
4, the local group plus another group allow the task;
	so min_load assigned, this_load assigned, idlest != NULL

The current logic returns NULL in the first 3 scenarios, and still
returns NULL in the 4th situation if the idlest group is heavier than
the local group.

Actually, I think the groups in situations 2 and 3 are also eligible to
host the task. And in the 4th situation, I agree with biasing toward the
local group. Hence this patch.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df99456..b40bc2b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2953,6 +2953,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int load_idx)
 {
 	struct sched_group *idlest = NULL, *group = sd->groups;
+	struct sched_group *this_group = NULL;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
 
@@ -2987,14 +2988,19 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 
 		if (local_group) {
 			this_load = avg_load;
-		} else if (avg_load < min_load) {
+			this_group = group;
+		}
+		if (avg_load < min_load) {
 			min_load = avg_load;
 			idlest = group;
 		}
 	} while (group = group->next, group != sd->groups);
 
-	if (!idlest || 100*this_load < imbalance*min_load)
-		return NULL;
+	if (this_group && idlest != this_group)
+		/* Bias toward our group again */
+		if (100*this_load < imbalance*min_load)
+			idlest = this_group;
+
 	return idlest;
 }
 
-- 
1.7.5.4
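
To make the new return policy concrete, here is a small user-space
sketch of the selection logic after this patch (the toy_group type and
the loads are hypothetical; the imbalance arithmetic follows the
kernel's 100-scaled convention): the minimum is now tracked over all
eligible groups including the local one, and the result is biased back
to the local group only when the idlest group is not sufficiently idler.

#include <limits.h>
#include <stdio.h>

struct toy_group {
	const char *name;
	unsigned long avg_load;
	int local;
};

/* Sketch of the patched policy: track the idlest of *all* eligible
 * groups (situations 2 and 3 included), then bias toward the local
 * group unless the idlest group beats it by the imbalance margin. */
static const struct toy_group *
toy_find_idlest_group(const struct toy_group *g, int n, int imbalance_pct)
{
	const struct toy_group *idlest = NULL, *this_group = NULL;
	unsigned long min_load = ULONG_MAX, this_load = 0;
	unsigned long imbalance = 100 + (imbalance_pct - 100) / 2;
	int i;

	for (i = 0; i < n; i++) {
		if (g[i].local) {
			this_load = g[i].avg_load;
			this_group = &g[i];
		}
		if (g[i].avg_load < min_load) {
			min_load = g[i].avg_load;
			idlest = &g[i];
		}
	}

	if (this_group && idlest != this_group &&
	    100 * this_load < imbalance * min_load)
		idlest = this_group;	/* bias toward our group again */

	return idlest;	/* non-NULL whenever any group was eligible */
}

int main(void)
{
	/* The remote group is idler, but within the margin (pct 125). */
	struct toy_group g[] = {
		{ "local",  100, 1 },
		{ "remote",  95, 0 },
	};

	printf("picked: %s\n", toy_find_idlest_group(g, 2, 125)->name);
	return 0;
}

With loads 100 vs 95 and imbalance_pct = 125, the local group still
wins; drop the remote load to 80 and the remote group is returned
instead.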



* [PATCH 03/4] sched: don't need to go to smaller sched domain
  2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
  2012-12-03 13:54 ` [PATCH 01/4] sched: select_task_rq_fair " Alex Shi
  2012-12-03 13:54 ` [PATCH 02/4] sched: fix find_idlest_group messy logic Alex Shi
@ 2012-12-03 13:54 ` Alex Shi
  2012-12-03 13:54 ` [PATCH 04/4] sched: remove domain iterations in fork/exec/wake Alex Shi
  2012-12-05  7:09 ` [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
  4 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-03 13:54 UTC (permalink / raw)
  To: npiggin, mingo, peterz; +Cc: linux-kernel, akpm

If the parent sched domain has no cpu allowed for the task, none will
be found in its child either. So bail out to save useless checking.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b40bc2b..05ee54e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3150,10 +3150,8 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 			load_idx = sd->wake_idx;
 
 		group = find_idlest_group(sd, p, cpu, load_idx);
-		if (!group) {
-			sd = sd->child;
-			continue;
-		}
+		if (!group)
+			goto unlock;
 
 		new_cpu = find_idlest_cpu(group, p, cpu);
 
-- 
1.7.5.4
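
The justification is a plain subset argument: a child domain's span is
contained in its parent's span, so an empty intersection with the
task's allowed mask at the parent level implies an empty intersection
at every level below. A minimal sketch with hypothetical bitmasks
standing in for cpumasks:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t parent_span = 0x0f;	/* cpus 0-3 */
	uint64_t child_span  = 0x03;	/* cpus 0-1, a subset of the parent */
	uint64_t allowed     = 0xf0;	/* task may only run on cpus 4-7 */

	assert((child_span & ~parent_span) == 0);	/* child within parent */

	if ((parent_span & allowed) == 0) {
		/* No allowed cpu in the parent, hence none in any child. */
		assert((child_span & allowed) == 0);
		printf("bail out: no allowed cpu at this level or below\n");
	}
	return 0;
}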



* [PATCH 04/4] sched: remove domain iterations in fork/exec/wake
  2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
                   ` (2 preceding siblings ...)
  2012-12-03 13:54 ` [PATCH 03/4] sched: don't need to go to smaller sched domain Alex Shi
@ 2012-12-03 13:54 ` Alex Shi
  2012-12-05  7:09 ` [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
  4 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-03 13:54 UTC (permalink / raw)
  To: npiggin, mingo, peterz; +Cc: linux-kernel, akpm

The bottom-up cpu search in the domain tree seems to come from commit
3dbd5342074a1e (sched: multilevel sbe sbf); its purpose is to balance
tasks over domains at every level.

This balancing costs a lot if there are many domains/groups in a large
system. And forcibly spreading tasks among different domains may cause
performance issues due to bad locality.

If we remove this code, we get faster fork/exec/wake, plus better
balancing across the whole system, which also reduces migrations in
future load balancing.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/fair.c |   20 +-------------------
 1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 05ee54e..1faf89f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3136,15 +3136,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		goto unlock;
 	}
 
-	while (sd) {
+	if (sd) {
 		int load_idx = sd->forkexec_idx;
 		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}
 
 		if (sd_flag & SD_BALANCE_WAKE)
 			load_idx = sd->wake_idx;
@@ -3154,18 +3148,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 			goto unlock;
 
 		new_cpu = find_idlest_cpu(group, p, cpu);
-
-		/* Now try balancing at a lower domain level of new_cpu */
-		cpu = new_cpu;
-		weight = sd->span_weight;
-		sd = NULL;
-		for_each_domain(cpu, tmp) {
-			if (weight <= tmp->span_weight)
-				break;
-			if (tmp->flags & sd_flag)
-				sd = tmp;
-		}
-		/* while loop will break here if sd == NULL */
 	}
 unlock:
 	rcu_read_unlock();
-- 
1.7.5.4
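
A rough way to see the saving: before this patch the group search
reruns once per domain level on the way down, while after it there is a
single pass at the first qualifying level. A toy cost model (the cpu
count and fan-out are hypothetical, meant only to show the shape of the
saving):

#include <stdio.h>

/* Count domain levels in a balanced tree over n cpus with fan-out b. */
static long levels(long n, long b)
{
	long l = 0;

	while (n > 1) {
		n /= b;
		l++;
	}
	return l;
}

int main(void)
{
	long n = 64, b = 2;		/* 64 cpus, binary domain tree */
	long groups_per_level = b;
	long old_cost = groups_per_level * levels(n, b);	/* walk all levels */
	long new_cost = groups_per_level;			/* single pass */

	printf("group scans: before ~%ld, after ~%ld\n", old_cost, new_cost);
	return 0;
}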



* Re: [PATCH 0/4] sched: fork/exec/wake clean up
  2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
                   ` (3 preceding siblings ...)
  2012-12-03 13:54 ` [PATCH 04/4] sched: remove domain iterations in fork/exec/wake Alex Shi
@ 2012-12-05  7:09 ` Alex Shi
  2012-12-07  0:33   ` Alex Shi
  4 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2012-12-05  7:09 UTC (permalink / raw)
  To: Alex Shi
  Cc: npiggin, mingo, peterz, linux-kernel, akpm, Vincent Guittot,
	Preeti U Murthy

On Mon, Dec 3, 2012 at 9:54 PM, Alex Shi <alex.shi@intel.com> wrote:
> This patchset tries to clean up select_task_rq_fair(), which is used
> for fork/exec/wake scheduling.
>
> With this patchset, our NHM-EX and SNB-EP 4-socket machines see a 10%
> hackbench performance increase.

Any comments on this patchset?
>
> Regards
> Alex
>
> [PATCH 01/4] sched: select_task_rq_fair clean up
> [PATCH 02/4] sched: fix find_idlest_group messy logic
> [PATCH 03/4] sched: don't need to go to smaller sched domain
> [PATCH 04/4] sched: remove domain iterations in fork/exec/wake



-- 
Thanks
    Alex


* Re: [PATCH 01/4] sched: select_task_rq_fair clean up
  2012-12-03 13:54 ` [PATCH 01/4] sched: select_task_rq_fair " Alex Shi
@ 2012-12-06 17:50   ` Frederic Weisbecker
  2012-12-07  0:31     ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2012-12-06 17:50 UTC (permalink / raw)
  To: Alex Shi; +Cc: npiggin, mingo, peterz, linux-kernel, akpm

2012/12/3 Alex Shi <alex.shi@intel.com>:
> It is impossible to miss a task-allowed cpu in an eligible group.
>
> And since find_idlest_group() only returns a different group, one that
> excludes the old cpu, it's also impossible for the new cpu to be the
> same as the old cpu.

Is it possible for weighted_cpuload() to return ULONG_MAX? If so,
find_idlest_cpu() can return -1.


* Re: [PATCH 01/4] sched: select_task_rq_fair clean up
  2012-12-06 17:50   ` Frederic Weisbecker
@ 2012-12-07  0:31     ` Alex Shi
  2012-12-07  1:02       ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2012-12-07  0:31 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: npiggin, mingo, peterz, linux-kernel, akpm

On 12/07/2012 01:50 AM, Frederic Weisbecker wrote:
> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>> It is impossible to miss a task-allowed cpu in an eligible group.
>>
>> And since find_idlest_group() only returns a different group, one that
>> excludes the old cpu, it's also impossible for the new cpu to be the
>> same as the old cpu.
> 
> Is it possible for weighted_cpuload() to return ULONG_MAX? If so,
> find_idlest_cpu() can return -1.
> 

No, no sched entity can have a ULONG_MAX weight.


* Re: [PATCH 0/4] sched: fork/exec/wake clean up
  2012-12-05  7:09 ` [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
@ 2012-12-07  0:33   ` Alex Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-07  0:33 UTC (permalink / raw)
  To: Alex Shi
  Cc: npiggin, mingo, peterz, linux-kernel, akpm, Vincent Guittot,
	Preeti U Murthy, Van De Ven, Arjan, Mike Galbraith

On 12/05/2012 03:09 PM, Alex Shi wrote:
> On Mon, Dec 3, 2012 at 9:54 PM, Alex Shi <alex.shi@intel.com> wrote:
>> This patchset tries to clean up select_task_rq_fair(), which is used
>> for fork/exec/wake scheduling.
>>
>> With this patchset, our NHM-EX and SNB-EP 4-socket machines see a 10%
>> hackbench performance increase.
> 
> Any comments on this patchset?

Nick & Ingo:
Would you like to give some comments on this?


CCing more experts :)

Thanks!
Alex
>>
>> Regards
>> Alex
>>
>> [PATCH 01/4] sched: select_task_rq_fair clean up
>> [PATCH 02/4] sched: fix find_idlest_group messy logic
>> [PATCH 03/4] sched: don't need to go to smaller sched domain
>> [PATCH 04/4] sched: remove domain iterations in fork/exec/wake
> 
> 
> 



* Re: [PATCH 02/4] sched: fix find_idlest_group messy logic
  2012-12-03 13:54 ` [PATCH 02/4] sched: fix find_idlest_group messy logic Alex Shi
@ 2012-12-07  0:56   ` Frederic Weisbecker
  2012-12-07  1:32     ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2012-12-07  0:56 UTC (permalink / raw)
  To: Alex Shi; +Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

2012/12/3 Alex Shi <alex.shi@intel.com>:
> There are 4 situations in the function:
> 1, no group allows the task;
>         so min_load = ULONG_MAX, this_load = 0, idlest = NULL
> 2, only the local group allows the task;
>         so min_load = ULONG_MAX, this_load assigned, idlest = NULL
> 3, only non-local groups allow the task;
>         so min_load assigned, this_load = 0, idlest != NULL
> 4, the local group plus another group allow the task;
>         so min_load assigned, this_load assigned, idlest != NULL
>
> The current logic returns NULL in the first 3 scenarios, and still
> returns NULL in the 4th situation if the idlest group is heavier than
> the local group.
>
> Actually, I think the groups in situations 2 and 3 are also eligible to
> host the task. And in the 4th situation, I agree with biasing toward the
> local group. Hence this patch.

The way I understand the loop that uses this in select_task_rq_fair() is:

a) start from the highest domain level we are allowed to migrate the
task in
b) from that top-level domain, find the idlest group. If the idlest
group contains the current CPU, zoom into the child domain and repeat
b). If the idlest group doesn't contain the current CPU, pick the
idlest CPU from that group.
c) In the end, if we found no idler target than the current CPU, take it.

So if you also return a group that contains the current CPU from
find_idlest_group(), you don't recursively zoom into the child domain
anymore. find_idlest_cpu() will fix that for you, but it may come at
some cost, because now it iterates through every CPU, or maybe half
of them.

The advantage of the recursive zooming through find_idlest_group() is
that it scales better with the number of CPUs. It's probably like
O(log n) instead of O(n).

But it's possible I misunderstood something.
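
That complexity estimate can be made concrete with a toy count,
assuming a balanced domain tree with fan-out b (all numbers here are
hypothetical): the recursive zoom examines about b groups per level
over log_b(n) levels, while a flat scan examines every cpu.

#include <stdio.h>

/* Loads examined by the recursive zoom: b groups per level, then the
 * cpus of the final bottom-level group. */
static long zoom_cost(long n, long b)
{
	long cost = 0;

	while (n > b) {
		cost += b;	/* scan the groups of this level */
		n /= b;		/* descend into the idlest one */
	}
	return cost + n;	/* find_idlest_cpu() on the last group */
}

int main(void)
{
	long n = 1024, b = 2;

	printf("recursive zoom: ~%ld loads\n", zoom_cost(n, b));	/* ~2*log2(n) */
	printf("flat scan:      ~%ld loads\n", n);			/* O(n) */
	return 0;
}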


* Re: [PATCH 01/4] sched: select_task_rq_fair clean up
  2012-12-07  0:31     ` Alex Shi
@ 2012-12-07  1:02       ` Frederic Weisbecker
  2012-12-07  1:22         ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2012-12-07  1:02 UTC (permalink / raw)
  To: Alex Shi; +Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

2012/12/7 Alex Shi <alex.shi@intel.com>:
> On 12/07/2012 01:50 AM, Frederic Weisbecker wrote:
>> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>>> It is impossible to miss a task-allowed cpu in an eligible group.
>>>
>>> And since find_idlest_group() only returns a different group, one that
>>> excludes the old cpu, it's also impossible for the new cpu to be the
>>> same as the old cpu.
>>
>> Is it possible for weighted_cpuload() to return ULONG_MAX? If so,
>> find_idlest_cpu() can return -1.
>>
>
> No, no sched entity can have a ULONG_MAX weight.

Ok. find_idlest_cpu() can still return -1, but select_task_rq_fair() is
the only caller. Presumably safe, but code evolves. Maybe add a comment
to explain why what you're doing is safe? Maybe a WARN_ON_ONCE() would
be good to add?
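
A sketch of the kind of guard being suggested, modelled in user space
with a stand-in for the kernel's WARN_ON_ONCE() (the placement and the
fallback are assumptions, not a tested patch):

#include <stdio.h>

/* User-space stand-in for the kernel's WARN_ON_ONCE(): warn the first
 * time the condition holds, and evaluate to the condition. */
#define TOY_WARN_ON_ONCE(cond) ({					\
	static int warned;						\
	int c = !!(cond);						\
	if (c && !warned) {						\
		warned = 1;						\
		fprintf(stderr, "warning: %s\n", #cond);		\
	}								\
	c;								\
})

int main(void)
{
	int cpu = 3;
	int new_cpu = -1;	/* pretend find_idlest_cpu() found nothing */

	if (TOY_WARN_ON_ONCE(new_cpu == -1))
		new_cpu = cpu;	/* fall back to the previous choice */

	printf("new_cpu = %d\n", new_cpu);
	return 0;
}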


* Re: [PATCH 01/4] sched: select_task_rq_fair clean up
  2012-12-07  1:02       ` Frederic Weisbecker
@ 2012-12-07  1:22         ` Alex Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-07  1:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

On 12/07/2012 09:02 AM, Frederic Weisbecker wrote:
> 2012/12/7 Alex Shi <alex.shi@intel.com>:
>> On 12/07/2012 01:50 AM, Frederic Weisbecker wrote:
>>> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>>>> It is impossible to miss a task-allowed cpu in an eligible group.
>>>>
>>>> And since find_idlest_group() only returns a different group, one that
>>>> excludes the old cpu, it's also impossible for the new cpu to be the
>>>> same as the old cpu.
>>>
>>> Is it possible for weighted_cpuload() to return ULONG_MAX? If so,
>>> find_idlest_cpu() can return -1.
>>>
>>
>> No, no sched entity can have a ULONG_MAX weight.
> 
> Ok. find_idlest_cpu() can still return -1, but select_task_rq_fair() is
> the only caller. Presumably safe, but code evolves. Maybe add a comment
> to explain why what you're doing is safe? Maybe a WARN_ON_ONCE() would
> be good to add?
> 

Why do you think it can be -1?
And there is a WARN_ON_ONCE for cpu = -1 in find_idlest_group() when
checking local_group.

Thanks!



* Re: [PATCH 02/4] sched: fix find_idlest_group messy logic
  2012-12-07  0:56   ` Frederic Weisbecker
@ 2012-12-07  1:32     ` Alex Shi
  2012-12-07  8:33       ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Shi @ 2012-12-07  1:32 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

On 12/07/2012 08:56 AM, Frederic Weisbecker wrote:
> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>> There are 4 situations in the function:
>> 1, no group allows the task;
>>         so min_load = ULONG_MAX, this_load = 0, idlest = NULL
>> 2, only the local group allows the task;
>>         so min_load = ULONG_MAX, this_load assigned, idlest = NULL
>> 3, only non-local groups allow the task;
>>         so min_load assigned, this_load = 0, idlest != NULL
>> 4, the local group plus another group allow the task;
>>         so min_load assigned, this_load assigned, idlest != NULL
>>
>> The current logic returns NULL in the first 3 scenarios, and still
>> returns NULL in the 4th situation if the idlest group is heavier than
>> the local group.
>>
>> Actually, I think the groups in situations 2 and 3 are also eligible to
>> host the task. And in the 4th situation, I agree with biasing toward the
>> local group. Hence this patch.
> 
> The way I understand the loop that uses this in select_task_rq_fair() is:
>
> a) start from the highest domain level we are allowed to migrate the
> task in
> b) from that top-level domain, find the idlest group. If the idlest
> group contains the current CPU, zoom into the child domain and repeat
> b). If the idlest group doesn't contain the current CPU, pick the
> idlest CPU from that group.
> c) In the end, if we found no idler target than the current CPU, take it.
>
> So if you also return a group that contains the current CPU from
> find_idlest_group(), you don't recursively zoom into the child domain
> anymore. find_idlest_cpu() will fix that for you, but it may come at
> some cost, because now it iterates through every CPU, or maybe half
> of them.

Not exactly: the old logic won't select a cpu from the groups in
situations 2 and 3. That is wrong, and may cause the task to stay on
prev_cpu even when other idler, allowed cpus exist.

The groups in situations 2 and 3 are also eligible for the task, and
may have idler, eligible cpus.

> 
> The advantage of the recursive zooming through find_idlest_group() is
> that it scales better with the number of CPUs. It's probably like
> O(log n) instead of O(n).
> 
> But it's possible I misunderstood something.
> 



* Re: [PATCH 02/4] sched: fix find_idlest_group messy logic
  2012-12-07  1:32     ` Alex Shi
@ 2012-12-07  8:33       ` Frederic Weisbecker
  2012-12-08 12:12         ` Alex Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2012-12-07  8:33 UTC (permalink / raw)
  To: Alex Shi; +Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

2012/12/7 Alex Shi <alex.shi@intel.com>:
> On 12/07/2012 08:56 AM, Frederic Weisbecker wrote:
>> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>>> There are 4 situations in the function:
>>> 1, no group allows the task;
>>>         so min_load = ULONG_MAX, this_load = 0, idlest = NULL
>>> 2, only the local group allows the task;
>>>         so min_load = ULONG_MAX, this_load assigned, idlest = NULL
>>> 3, only non-local groups allow the task;
>>>         so min_load assigned, this_load = 0, idlest != NULL
>>> 4, the local group plus another group allow the task;
>>>         so min_load assigned, this_load assigned, idlest != NULL
>>>
>>> The current logic returns NULL in the first 3 scenarios, and still
>>> returns NULL in the 4th situation if the idlest group is heavier than
>>> the local group.
>>>
>>> Actually, I think the groups in situations 2 and 3 are also eligible to
>>> host the task. And in the 4th situation, I agree with biasing toward the
>>> local group. Hence this patch.
>>
>> The way I understand the loop that uses this in select_task_rq_fair() is:
>>
>> a) start from the highest domain level we are allowed to migrate the
>> task in
>> b) from that top-level domain, find the idlest group. If the idlest
>> group contains the current CPU, zoom into the child domain and repeat
>> b). If the idlest group doesn't contain the current CPU, pick the
>> idlest CPU from that group.
>> c) In the end, if we found no idler target than the current CPU, take it.
>>
>> So if you also return a group that contains the current CPU from
>> find_idlest_group(), you don't recursively zoom into the child domain
>> anymore. find_idlest_cpu() will fix that for you, but it may come at
>> some cost, because now it iterates through every CPU, or maybe half
>> of them.
>
> Not exactly: the old logic won't select a cpu from the groups in
> situations 2 and 3. That is wrong, and may cause the task to stay on
> prev_cpu even when other idler, allowed cpus exist.

For situation 2, I don't understand the issue. If the current CPU
belongs to the idlest group, we want to zoom in our lookup until we
find an idler group than the current CPU's? If we eventually don't
find one, then we fall back to the current CPU, don't we?

I just have a doubt to express. What does the final leaf child domain
look like? Is it made of the current CPU only, or can it contain other
siblings? In the first case we are fine. In the second one, if this
domain is made of only one group of several CPUs, we are skipping the
find_idlest_cpu() call for that group and choosing this_cpu by default.
Which is probably suboptimal?

Concerning situation 3, if this_cpu is not a CPU allowed for the task,
we may indeed have an issue, because find_idlest_group() doesn't seem
to select non-local groups in this case. But your current fix still
breaks the recursive find_idlest_group() in other cases and may not
scale to a big number of CPUs.


* Re: [PATCH 02/4] sched: fix find_idlest_group messy logic
  2012-12-07  8:33       ` Frederic Weisbecker
@ 2012-12-08 12:12         ` Alex Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Alex Shi @ 2012-12-08 12:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: npiggin, mingo, peterz, linux-kernel, akpm, Mike Galbraith

On 12/07/2012 04:33 PM, Frederic Weisbecker wrote:
> 2012/12/7 Alex Shi <alex.shi@intel.com>:
>> On 12/07/2012 08:56 AM, Frederic Weisbecker wrote:
>>> 2012/12/3 Alex Shi <alex.shi@intel.com>:
>>>> There are 4 situations in the function:
>>>> 1, no group allows the task;
>>>>         so min_load = ULONG_MAX, this_load = 0, idlest = NULL
>>>> 2, only the local group allows the task;
>>>>         so min_load = ULONG_MAX, this_load assigned, idlest = NULL
>>>> 3, only non-local groups allow the task;
>>>>         so min_load assigned, this_load = 0, idlest != NULL
>>>> 4, the local group plus another group allow the task;
>>>>         so min_load assigned, this_load assigned, idlest != NULL
>>>>
>>>> The current logic returns NULL in the first 3 scenarios, and still
>>>> returns NULL in the 4th situation if the idlest group is heavier than
>>>> the local group.
>>>>
>>>> Actually, I think the groups in situations 2 and 3 are also eligible to
>>>> host the task. And in the 4th situation, I agree with biasing toward the
>>>> local group. Hence this patch.
>>>
>>> The way I understand the loop that uses this in select_task_rq_fair() is:
>>>
>>> a) start from the highest domain level we are allowed to migrate the
>>> task in
>>> b) from that top-level domain, find the idlest group. If the idlest
>>> group contains the current CPU, zoom into the child domain and repeat
>>> b). If the idlest group doesn't contain the current CPU, pick the
>>> idlest CPU from that group.
>>> c) In the end, if we found no idler target than the current CPU, take it.
>>>
>>> So if you also return a group that contains the current CPU from
>>> find_idlest_group(), you don't recursively zoom into the child domain
>>> anymore. find_idlest_cpu() will fix that for you, but it may come at
>>> some cost, because now it iterates through every CPU, or maybe half
>>> of them.
>>
>> Not exactly: the old logic won't select a cpu from the groups in
>> situations 2 and 3. That is wrong, and may cause the task to stay on
>> prev_cpu even when other idler, allowed cpus exist.
> 
> For situation 2, I don't understand the issue. If the current CPU
> belongs to the idlest group, we want to zoom in our lookup until we
> find an idler group than the current CPU's? If we eventually don't
> find one, then we fall back to the current CPU, don't we?

Falling back to the current CPU is not the best choice here. :)
The point of allowing situation 2 is that this_cpu may not be the
idlest cpu even in the local group; there may be other idlers in the
local group. But the old logic refuses to select the idlest cpu from
the local group just because no other group is eligible. That is the
problem.

> 
> I just have a doubt to express. What does the final leaf child domain
> look like? Is it made of the current CPU only, or can it contain other
> siblings? In the first case we are fine. In the second one, if this
> domain is made of only one group of several CPUs, we are skipping the
> find_idlest_cpu() call for that group and choosing this_cpu by default.
> Which is probably suboptimal?

Most of the time the domain is not a leaf child domain. And even with a
leaf child, single-group domain, find_idlest_cpu() will return quickly,
so that should not cause much trouble.
> 
> Concerning situation 3, if this_cpu is not a CPU allowed for the task,
> we may indeed have an issue, because find_idlest_group() doesn't seem
> to select non-local groups in this case.

thanks!

> But your current fix still breaks the recursive find_idlest_group()
> in other cases and may not scale to a big number of CPUs.
> 

I don't see how the recursive find_idlest_group() can scale with a big
number of CPUs. :)
Actually, this patch set shows a 10+% performance gain on hackbench on
big machines, but only a 0~2% gain on 2-socket machines.


-- 
Thanks
    Alex


Thread overview: 15+ messages

2012-12-03 13:54 [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
2012-12-03 13:54 ` [PATCH 01/4] sched: select_task_rq_fair " Alex Shi
2012-12-06 17:50   ` Frederic Weisbecker
2012-12-07  0:31     ` Alex Shi
2012-12-07  1:02       ` Frederic Weisbecker
2012-12-07  1:22         ` Alex Shi
2012-12-03 13:54 ` [PATCH 02/4] sched: fix find_idlest_group messy logic Alex Shi
2012-12-07  0:56   ` Frederic Weisbecker
2012-12-07  1:32     ` Alex Shi
2012-12-07  8:33       ` Frederic Weisbecker
2012-12-08 12:12         ` Alex Shi
2012-12-03 13:54 ` [PATCH 03/4] sched: don't need to go to smaller sched domain Alex Shi
2012-12-03 13:54 ` [PATCH 04/4] sched: remove domain iterations in fork/exec/wake Alex Shi
2012-12-05  7:09 ` [PATCH 0/4] sched: fork/exec/wake clean up Alex Shi
2012-12-07  0:33   ` Alex Shi
