linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 0/2] sched: Cleanups,fixes in nohz_kick_needed()
@ 2013-10-30  3:12 Preeti U Murthy
  2013-10-30  3:12 ` [PATCH V2 1/2] sched: Fix asymmetric scheduling for POWER7 Preeti U Murthy
  2013-10-30  3:12 ` [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus Preeti U Murthy
  0 siblings, 2 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-10-30  3:12 UTC (permalink / raw)
  To: peterz, mikey, svaidy, mingo
  Cc: vincent.guittot, bitbucket, linux-kernel, anton, linuxppc-dev,
	Morten.Rasmussen, pjt

Changes from V1:https://lkml.org/lkml/2013/10/21/248

1. Swapped the order of PATCH1 and PATCH2 in V1 so as to not mess with the
nr_busy_cpus parameter computation during asymmetric balancing, while fixing
it.

2. nohz_busy_cpus parameter is to be updated and queried at only one level of
the sched domain-sd_busy where it is relevant.

3. Introduce sd_asym to represent the sched domain where asymmetric load
balancing has to be done.
---

Preeti U Murthy (1):
      sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus

Vaidyanathan Srinivasan (1):
      sched: Fix asymmetric scheduling for POWER7


 kernel/sched/core.c  |    6 ++++++
 kernel/sched/fair.c  |   38 ++++++++++++++++++++------------------
 kernel/sched/sched.h |    2 ++
 3 files changed, 28 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V2 1/2] sched: Fix asymmetric scheduling for POWER7
  2013-10-30  3:12 [PATCH V2 0/2] sched: Cleanups,fixes in nohz_kick_needed() Preeti U Murthy
@ 2013-10-30  3:12 ` Preeti U Murthy
  2013-10-30  3:12 ` [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus Preeti U Murthy
  1 sibling, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-10-30  3:12 UTC (permalink / raw)
  To: peterz, mikey, svaidy, mingo
  Cc: vincent.guittot, bitbucket, linux-kernel, anton, linuxppc-dev,
	Morten.Rasmussen, pjt

From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Asymmetric scheduling within a core is a scheduler loadbalancing
feature that is triggered when SD_ASYM_PACKING flag is set.  The goal
for the load balancer is to move tasks to lower order idle SMT threads
within a core on a POWER7 system.

In nohz_kick_needed(), we intend to check if our sched domain (core)
is completely busy or we have idle cpu.

The following check for SD_ASYM_PACKING:

    (cpumask_first_and(nohz.idle_cpus_mask, sched_domain_span(sd)) < cpu)

already covers the case of checking if the domain has an idle cpu,
because cpumask_first_and() will not yield any set bits if this domain
has no idle cpu.

Hence, nr_busy check against group weight can be removed.

Reported-by: Michael Neuling <michael.neuling@au1.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
---

 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 813dd61..e9c9549 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6781,7 +6781,7 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 		if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
 			goto need_kick_unlock;
 
-		if (sd->flags & SD_ASYM_PACKING && nr_busy != sg->group_weight
+		if (sd->flags & SD_ASYM_PACKING
 		    && (cpumask_first_and(nohz.idle_cpus_mask,
 					  sched_domain_span(sd)) < cpu))
 			goto need_kick_unlock;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus
  2013-10-30  3:12 [PATCH V2 0/2] sched: Cleanups,fixes in nohz_kick_needed() Preeti U Murthy
  2013-10-30  3:12 ` [PATCH V2 1/2] sched: Fix asymmetric scheduling for POWER7 Preeti U Murthy
@ 2013-10-30  3:12 ` Preeti U Murthy
  2013-10-30  3:20   ` Preeti U Murthy
  1 sibling, 1 reply; 6+ messages in thread
From: Preeti U Murthy @ 2013-10-30  3:12 UTC (permalink / raw)
  To: peterz, mikey, svaidy, mingo
  Cc: vincent.guittot, bitbucket, linux-kernel, anton, linuxppc-dev,
	Morten.Rasmussen, pjt

nr_busy_cpus parameter is used by nohz_kick_needed() to find out the number
of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES flag set.
Therefore instead of updating nr_busy_cpus at every level of sched domain,
since it is irrelevant, we can update this parameter only at the parent
domain of the sd which has this flag set. Introduce a per-cpu parameter
sd_busy which represents this parent domain.

In nohz_kick_needed() we directly query the nr_busy_cpus parameter
associated with the groups of sd_busy.

By associating sd_busy with the highest domain which has
SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains which could
have this flag set and trigger nohz_idle_balancing if any of the levels have
more than one busy cpu.

sd_busy is irrelevant for asymmetric load balancing.

While we are at it, we might as well change the nohz_idle parameter to be
updated at the sd_busy domain level alone and not the base domain level of a CPU.
This will unify the concept of busy cpus at just one level of sched domain
where it is currently used.

Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
---

 kernel/sched/core.c  |    6 ++++++
 kernel/sched/fair.c  |   38 ++++++++++++++++++++------------------
 kernel/sched/sched.h |    2 ++
 3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c06b8d3..e6a6244 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5271,6 +5271,8 @@ DEFINE_PER_CPU(struct sched_domain *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(struct sched_domain *, sd_numa);
+DEFINE_PER_CPU(struct sched_domain *, sd_busy);
+DEFINE_PER_CPU(struct sched_domain *, sd_asym);
 
 static void update_top_cache_domain(int cpu)
 {
@@ -5282,6 +5284,7 @@ static void update_top_cache_domain(int cpu)
 	if (sd) {
 		id = cpumask_first(sched_domain_span(sd));
 		size = cpumask_weight(sched_domain_span(sd));
+		rcu_assign_pointer(per_cpu(sd_busy, cpu), sd->parent);
 	}
 
 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
@@ -5290,6 +5293,9 @@ static void update_top_cache_domain(int cpu)
 
 	sd = lowest_flag_domain(cpu, SD_NUMA);
 	rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
+
+	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
+	rcu_assign_pointer(per_cpu(sd_asym, cpu), sd);
 }
 
 /*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9c9549..8602b2c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6515,16 +6515,16 @@ static inline void nohz_balance_exit_idle(int cpu)
 static inline void set_cpu_sd_state_busy(void)
 {
 	struct sched_domain *sd;
+	int cpu = smp_processor_id();
 
 	rcu_read_lock();
-	sd = rcu_dereference_check_sched_domain(this_rq()->sd);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
 	if (!sd || !sd->nohz_idle)
 		goto unlock;
 	sd->nohz_idle = 0;
 
-	for (; sd; sd = sd->parent)
-		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+	atomic_inc(&sd->groups->sgp->nr_busy_cpus);
 unlock:
 	rcu_read_unlock();
 }
@@ -6532,16 +6532,16 @@ unlock:
 void set_cpu_sd_state_idle(void)
 {
 	struct sched_domain *sd;
+	int cpu = smp_processor_id();
 
 	rcu_read_lock();
-	sd = rcu_dereference_check_sched_domain(this_rq()->sd);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
 	if (!sd || sd->nohz_idle)
 		goto unlock;
 	sd->nohz_idle = 1;
 
-	for (; sd; sd = sd->parent)
-		atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+	atomic_dec(&sd->groups->sgp->nr_busy_cpus);
 unlock:
 	rcu_read_unlock();
 }
@@ -6748,6 +6748,8 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 {
 	unsigned long now = jiffies;
 	struct sched_domain *sd;
+	struct sched_group_power *sgp;
+	int nr_busy;
 
 	if (unlikely(idle_cpu(cpu)))
 		return 0;
@@ -6773,22 +6775,22 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 		goto need_kick;
 
 	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		struct sched_group *sg = sd->groups;
-		struct sched_group_power *sgp = sg->sgp;
-		int nr_busy = atomic_read(&sgp->nr_busy_cpus);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
-		if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
-			goto need_kick_unlock;
+	if (sd) {
+		sgp = sd->groups->sgp;
+		nr_busy = atomic_read(&sgp->nr_busy_cpus);
 
-		if (sd->flags & SD_ASYM_PACKING
-		    && (cpumask_first_and(nohz.idle_cpus_mask,
-					  sched_domain_span(sd)) < cpu))
+		if (nr_busy > 1)
 			goto need_kick_unlock;
-
-		if (!(sd->flags & (SD_SHARE_PKG_RESOURCES | SD_ASYM_PACKING)))
-			break;
 	}
+
+	sd = rcu_dereference(per_cpu(sd_asym, cpu));
+
+	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
+				  sched_domain_span(sd)) < cpu))
+		goto need_kick_unlock;
+
 	rcu_read_unlock();
 	return 0;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ffc7087..c8cb145 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -623,6 +623,8 @@ DECLARE_PER_CPU(struct sched_domain *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(struct sched_domain *, sd_numa);
+DECLARE_PER_CPU(struct sched_domain *, sd_busy);
+DECLARE_PER_CPU(struct sched_domain *, sd_asym);
 
 struct sched_group_power {
 	atomic_t ref;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus
  2013-10-30  3:12 ` [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus Preeti U Murthy
@ 2013-10-30  3:20   ` Preeti U Murthy
  2013-10-30  9:23     ` Kamalesh Babulal
  0 siblings, 1 reply; 6+ messages in thread
From: Preeti U Murthy @ 2013-10-30  3:20 UTC (permalink / raw)
  To: peterz, mikey, svaidy, mingo
  Cc: vincent.guittot, bitbucket, linux-kernel, anton, linuxppc-dev,
	Morten.Rasmussen, pjt

The changelog has missed mentioning the introduction of sd_asym per_cpu sched domain.
Apologies for this. The patch with the changelog including mention of sd_asym is
pasted below.

Regards
Preeti U Murthy

---------------

sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus

From: Preeti U Murthy <preeti@linux.vnet.ibm.com>

nr_busy_cpus parameter is used by nohz_kick_needed() to find out the number
of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES flag set.
Therefore instead of updating nr_busy_cpus at every level of sched domain,
since it is irrelevant, we can update this parameter only at the parent
domain of the sd which has this flag set. Introduce a per-cpu parameter
sd_busy which represents this parent domain.

In nohz_kick_needed() we directly query the nr_busy_cpus parameter
associated with the groups of sd_busy.

By associating sd_busy with the highest domain which has
SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains which could
have this flag set and trigger nohz_idle_balancing if any of the levels have
more than one busy cpu.

sd_busy is irrelevant for asymmetric load balancing. However sd_asym has been
introduced to represent the highest sched domain which has SD_ASYM_PACKING flag set
so that it can be queried directly when required.

While we are at it, we might as well change the nohz_idle parameter to be
updated at the sd_busy domain level alone and not the base domain level of a CPU.
This will unify the concept of busy cpus at just one level of sched domain
where it is currently used.

Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
---
 kernel/sched/core.c  |    6 ++++++
 kernel/sched/fair.c  |   38 ++++++++++++++++++++------------------
 kernel/sched/sched.h |    2 ++
 3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c06b8d3..e6a6244 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5271,6 +5271,8 @@ DEFINE_PER_CPU(struct sched_domain *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(struct sched_domain *, sd_numa);
+DEFINE_PER_CPU(struct sched_domain *, sd_busy);
+DEFINE_PER_CPU(struct sched_domain *, sd_asym);
 
 static void update_top_cache_domain(int cpu)
 {
@@ -5282,6 +5284,7 @@ static void update_top_cache_domain(int cpu)
 	if (sd) {
 		id = cpumask_first(sched_domain_span(sd));
 		size = cpumask_weight(sched_domain_span(sd));
+		rcu_assign_pointer(per_cpu(sd_busy, cpu), sd->parent);
 	}
 
 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
@@ -5290,6 +5293,9 @@ static void update_top_cache_domain(int cpu)
 
 	sd = lowest_flag_domain(cpu, SD_NUMA);
 	rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
+
+	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
+	rcu_assign_pointer(per_cpu(sd_asym, cpu), sd);
 }
 
 /*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9c9549..8602b2c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6515,16 +6515,16 @@ static inline void nohz_balance_exit_idle(int cpu)
 static inline void set_cpu_sd_state_busy(void)
 {
 	struct sched_domain *sd;
+	int cpu = smp_processor_id();
 
 	rcu_read_lock();
-	sd = rcu_dereference_check_sched_domain(this_rq()->sd);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
 	if (!sd || !sd->nohz_idle)
 		goto unlock;
 	sd->nohz_idle = 0;
 
-	for (; sd; sd = sd->parent)
-		atomic_inc(&sd->groups->sgp->nr_busy_cpus);
+	atomic_inc(&sd->groups->sgp->nr_busy_cpus);
 unlock:
 	rcu_read_unlock();
 }
@@ -6532,16 +6532,16 @@ unlock:
 void set_cpu_sd_state_idle(void)
 {
 	struct sched_domain *sd;
+	int cpu = smp_processor_id();
 
 	rcu_read_lock();
-	sd = rcu_dereference_check_sched_domain(this_rq()->sd);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
 	if (!sd || sd->nohz_idle)
 		goto unlock;
 	sd->nohz_idle = 1;
 
-	for (; sd; sd = sd->parent)
-		atomic_dec(&sd->groups->sgp->nr_busy_cpus);
+	atomic_dec(&sd->groups->sgp->nr_busy_cpus);
 unlock:
 	rcu_read_unlock();
 }
@@ -6748,6 +6748,8 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 {
 	unsigned long now = jiffies;
 	struct sched_domain *sd;
+	struct sched_group_power *sgp;
+	int nr_busy;
 
 	if (unlikely(idle_cpu(cpu)))
 		return 0;
@@ -6773,22 +6775,22 @@ static inline int nohz_kick_needed(struct rq *rq, int cpu)
 		goto need_kick;
 
 	rcu_read_lock();
-	for_each_domain(cpu, sd) {
-		struct sched_group *sg = sd->groups;
-		struct sched_group_power *sgp = sg->sgp;
-		int nr_busy = atomic_read(&sgp->nr_busy_cpus);
+	sd = rcu_dereference(per_cpu(sd_busy, cpu));
 
-		if (sd->flags & SD_SHARE_PKG_RESOURCES && nr_busy > 1)
-			goto need_kick_unlock;
+	if (sd) {
+		sgp = sd->groups->sgp;
+		nr_busy = atomic_read(&sgp->nr_busy_cpus);
 
-		if (sd->flags & SD_ASYM_PACKING
-		    && (cpumask_first_and(nohz.idle_cpus_mask,
-					  sched_domain_span(sd)) < cpu))
+		if (nr_busy > 1)
 			goto need_kick_unlock;
-
-		if (!(sd->flags & (SD_SHARE_PKG_RESOURCES | SD_ASYM_PACKING)))
-			break;
 	}
+
+	sd = rcu_dereference(per_cpu(sd_asym, cpu));
+
+	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
+				  sched_domain_span(sd)) < cpu))
+		goto need_kick_unlock;
+
 	rcu_read_unlock();
 	return 0;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ffc7087..c8cb145 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -623,6 +623,8 @@ DECLARE_PER_CPU(struct sched_domain *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(struct sched_domain *, sd_numa);
+DECLARE_PER_CPU(struct sched_domain *, sd_busy);
+DECLARE_PER_CPU(struct sched_domain *, sd_asym);
 
 struct sched_group_power {
 	atomic_t ref;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus
  2013-10-30  3:20   ` Preeti U Murthy
@ 2013-10-30  9:23     ` Kamalesh Babulal
  2013-10-30 10:03       ` Preeti U Murthy
  0 siblings, 1 reply; 6+ messages in thread
From: Kamalesh Babulal @ 2013-10-30  9:23 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: mikey, vincent.guittot, peterz, linux-kernel, Morten.Rasmussen,
	bitbucket, anton, linuxppc-dev, mingo, pjt

Hi Preeti,

> nr_busy_cpus parameter is used by nohz_kick_needed() to find out the number
> of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES flag set.
> Therefore instead of updating nr_busy_cpus at every level of sched domain,
> since it is irrelevant, we can update this parameter only at the parent
> domain of the sd which has this flag set. Introduce a per-cpu parameter
> sd_busy which represents this parent domain.
> 
> In nohz_kick_needed() we directly query the nr_busy_cpus parameter
> associated with the groups of sd_busy.
> 
> By associating sd_busy with the highest domain which has
> SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains which could
> have this flag set and trigger nohz_idle_balancing if any of the levels have
> more than one busy cpu.
> 
> sd_busy is irrelevant for asymmetric load balancing. However sd_asym has been
> introduced to represent the highest sched domain which has SD_ASYM_PACKING flag set
> so that it can be queried directly when required.
> 
> While we are at it, we might as well change the nohz_idle parameter to be
> updated at the sd_busy domain level alone and not the base domain level of a CPU.
> This will unify the concept of busy cpus at just one level of sched domain
> where it is currently used.
> 
> Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
> ---
>  kernel/sched/core.c  |    6 ++++++
>  kernel/sched/fair.c  |   38 ++++++++++++++++++++------------------
>  kernel/sched/sched.h |    2 ++
>  3 files changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c06b8d3..e6a6244 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5271,6 +5271,8 @@ DEFINE_PER_CPU(struct sched_domain *, sd_llc);
>  DEFINE_PER_CPU(int, sd_llc_size);
>  DEFINE_PER_CPU(int, sd_llc_id);
>  DEFINE_PER_CPU(struct sched_domain *, sd_numa);
> +DEFINE_PER_CPU(struct sched_domain *, sd_busy);
> +DEFINE_PER_CPU(struct sched_domain *, sd_asym);
> 
>  static void update_top_cache_domain(int cpu)
>  {
> @@ -5282,6 +5284,7 @@ static void update_top_cache_domain(int cpu)
>  	if (sd) {
>  		id = cpumask_first(sched_domain_span(sd));
>  		size = cpumask_weight(sched_domain_span(sd));
> +		rcu_assign_pointer(per_cpu(sd_busy, cpu), sd->parent);
>  	}


consider a machine with single socket, dual core with HT enabled. The top most
domain is also the highest domain with SD_SHARE_PKG_RESOURCES flag set,
i.e MC domain (the machine toplogy consist of SIBLING and MC domain).

# lstopo-no-graphics --no-bridges --no-io
Machine (7869MB) + Socket L#0 + L3 L#0 (3072KB)
  L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
    PU L#0 (P#0)
    PU L#1 (P#1)
  L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
    PU L#2 (P#2)
    PU L#3 (P#3)

With this approach parent of MC domain is NULL and given that sd_busy is NULL,
nr_busy_cpus of sched domain sd_busy will never be incremented/decremented.
Resulting is nohz_kick_needed returning 0.

Thanks,
Kamalesh.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus
  2013-10-30  9:23     ` Kamalesh Babulal
@ 2013-10-30 10:03       ` Preeti U Murthy
  0 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-10-30 10:03 UTC (permalink / raw)
  To: Kamalesh Babulal
  Cc: mikey, vincent.guittot, peterz, linux-kernel, Morten.Rasmussen,
	bitbucket, anton, linuxppc-dev, mingo, pjt

Hi Kamalesh,

On 10/30/2013 02:53 PM, Kamalesh Babulal wrote:
> Hi Preeti,
> 
>> nr_busy_cpus parameter is used by nohz_kick_needed() to find out the number
>> of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES flag set.
>> Therefore instead of updating nr_busy_cpus at every level of sched domain,
>> since it is irrelevant, we can update this parameter only at the parent
>> domain of the sd which has this flag set. Introduce a per-cpu parameter
>> sd_busy which represents this parent domain.
>>
>> In nohz_kick_needed() we directly query the nr_busy_cpus parameter
>> associated with the groups of sd_busy.
>>
>> By associating sd_busy with the highest domain which has
>> SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains which could
>> have this flag set and trigger nohz_idle_balancing if any of the levels have
>> more than one busy cpu.
>>
>> sd_busy is irrelevant for asymmetric load balancing. However sd_asym has been
>> introduced to represent the highest sched domain which has SD_ASYM_PACKING flag set
>> so that it can be queried directly when required.
>>
>> While we are at it, we might as well change the nohz_idle parameter to be
>> updated at the sd_busy domain level alone and not the base domain level of a CPU.
>> This will unify the concept of busy cpus at just one level of sched domain
>> where it is currently used.
>>
>> Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
>> ---
>>  kernel/sched/core.c  |    6 ++++++
>>  kernel/sched/fair.c  |   38 ++++++++++++++++++++------------------
>>  kernel/sched/sched.h |    2 ++
>>  3 files changed, 28 insertions(+), 18 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index c06b8d3..e6a6244 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5271,6 +5271,8 @@ DEFINE_PER_CPU(struct sched_domain *, sd_llc);
>>  DEFINE_PER_CPU(int, sd_llc_size);
>>  DEFINE_PER_CPU(int, sd_llc_id);
>>  DEFINE_PER_CPU(struct sched_domain *, sd_numa);
>> +DEFINE_PER_CPU(struct sched_domain *, sd_busy);
>> +DEFINE_PER_CPU(struct sched_domain *, sd_asym);
>>
>>  static void update_top_cache_domain(int cpu)
>>  {
>> @@ -5282,6 +5284,7 @@ static void update_top_cache_domain(int cpu)
>>  	if (sd) {
>>  		id = cpumask_first(sched_domain_span(sd));
>>  		size = cpumask_weight(sched_domain_span(sd));
>> +		rcu_assign_pointer(per_cpu(sd_busy, cpu), sd->parent);
>>  	}
> 
> 
> consider a machine with single socket, dual core with HT enabled. The top most
> domain is also the highest domain with SD_SHARE_PKG_RESOURCES flag set,
> i.e MC domain (the machine toplogy consist of SIBLING and MC domain).
> 
> # lstopo-no-graphics --no-bridges --no-io
> Machine (7869MB) + Socket L#0 + L3 L#0 (3072KB)
>   L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
>     PU L#0 (P#0)
>     PU L#1 (P#1)
>   L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
>     PU L#2 (P#2)
>     PU L#3 (P#3)
> 
> With this approach parent of MC domain is NULL and given that sd_busy is NULL,
> nr_busy_cpus of sched domain sd_busy will never be incremented/decremented.
> Resulting is nohz_kick_needed returning 0.

Right and it *should* return 0. There is no sibling domain that can
offload tasks from it. Therefore there is no point kicking nohz idle
balance.

Regards
Preeti U Murthy
> 
> Thanks,
> Kamalesh.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-10-30 10:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-30  3:12 [PATCH V2 0/2] sched: Cleanups,fixes in nohz_kick_needed() Preeti U Murthy
2013-10-30  3:12 ` [PATCH V2 1/2] sched: Fix asymmetric scheduling for POWER7 Preeti U Murthy
2013-10-30  3:12 ` [PATCH V2 2/2] sched: Remove un-necessary iteration over sched domains to update nr_busy_cpus Preeti U Murthy
2013-10-30  3:20   ` Preeti U Murthy
2013-10-30  9:23     ` Kamalesh Babulal
2013-10-30 10:03       ` Preeti U Murthy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).