linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] optimization and clean-up of fair.c
@ 2013-08-02  1:50 Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-02  1:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Patch 1 removes one division operation by replacing it with a
multiplication. Patches 2 and 3 are clean-ups related to load_balance();
they reduce code size, and readability may also be improved.

Feel free to comment on this patchset.

It was tested on v3.10.
On v3.11-rc3, it compiles properly.

* Change from v1
Removed 2 patches, because I cannot be sure they are correct.

Joonsoo Kim (3):
  sched: remove one division operation in find_busiest_queue()
  sched: factor out code to should_we_balance()
  sched: clean-up struct sd_lb_stat

 kernel/sched/fair.c |  326 +++++++++++++++++++++++++--------------------------
 1 file changed, 161 insertions(+), 165 deletions(-)

-- 
1.7.9.5


* [PATCH v2 1/3] sched: remove one division operation in find_busiest_queue()
  2013-08-02  1:50 [PATCH v2 0/3] optimization and clean-up of fair.c Joonsoo Kim
@ 2013-08-02  1:50 ` Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2 siblings, 0 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-02  1:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Remove one division operation in find_busiest_queue() by comparing the
per-cpu weighted loads with a cross-multiplication instead of normalizing
each one by its cpu power.
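
The two forms are equivalent: wl/power > busiest_load/busiest_power
holds exactly when wl * busiest_power > busiest_load * power, since both
power values are positive. A minimal sketch of the idea (illustrative
only; the real change is in the hunk below):

        /* track the best (load, power) pair instead of a scaled maximum */
        unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;

        /* old: wl = (wl * SCHED_POWER_SCALE) / power; if (wl > max_load) ... */
        if (wl * busiest_power > busiest_load * power) {
                busiest_load = wl;
                busiest_power = power;
                busiest = rq;
        }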

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61a614..eaae77e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4931,7 +4931,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 				     struct sched_group *group)
 {
 	struct rq *busiest = NULL, *rq;
-	unsigned long max_load = 0;
+	unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
 	int i;
 
 	for_each_cpu(i, sched_group_cpus(group)) {
@@ -4962,10 +4962,9 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * the load can be moved away from the cpu that is potentially
 		 * running at a lower capacity.
 		 */
-		wl = (wl * SCHED_POWER_SCALE) / power;
-
-		if (wl > max_load) {
-			max_load = wl;
+		if (wl * busiest_power > busiest_load * power) {
+			busiest_load = wl;
+			busiest_power = power;
 			busiest = rq;
 		}
 	}
-- 
1.7.9.5


* [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  1:50 [PATCH v2 0/3] optimization and clean-up of fair.c Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
@ 2013-08-02  1:50 ` Joonsoo Kim
  2013-08-02  4:22   ` Preeti U Murthy
  2013-08-02  7:51   ` Vincent Guittot
  2013-08-02  1:50 ` [PATCH v2 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
  2 siblings, 2 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-02  1:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

Currently, the check of whether this cpu is an appropriate one to
balance is embedded in update_sg_lb_stats(), although it has no direct
relationship to that function. There is not enough reason to place this
check in update_sg_lb_stats(), except that it saves one iteration over
sched_group_cpus.

This patch factors the check out into a new should_we_balance()
function. Before doing the actual work of load balancing, we check via
should_we_balance() whether this cpu is an appropriate one to balance.
If this cpu is not a candidate for balancing, we quit the work
immediately.

With this change, when this cpu should not balance, we skip two memsets
(of sd_lb_stats and sg_lb_stats) and can expect better compiler
optimization.

Below is the result of this patch.

* Vanilla *
   text	   data	    bss	    dec	    hex	filename
  34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o

* Patched *
   text	   data	    bss	    dec	    hex	filename
  34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o

In addition, rename @balance to @should_balance in order to represent
its purpose more clearly.
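
For reference, the resulting flow in load_balance() is roughly the
following (sketch only; the exact change is in the hunks below):

        if (!should_we_balance(&env)) {
                /* another cpu of our sched group will do the balancing */
                *should_balance = 0;
                goto out_balanced;
        }
        *should_balance = 1;

redo:
        group = find_busiest_group(&env);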

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eaae77e..7f51b8c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
  * @group: sched_group whose statistics are to be updated.
  * @load_idx: Load index of sched_domain of this_cpu for load calc.
  * @local_group: Does group contain this_cpu.
- * @balance: Should we balance.
  * @sgs: variable to hold the statistics for this group.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
-			int local_group, int *balance, struct sg_lb_stats *sgs)
+			int local_group, struct sg_lb_stats *sgs)
 {
 	unsigned long nr_running, max_nr_running, min_nr_running;
 	unsigned long load, max_cpu_load, min_cpu_load;
-	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
 	int i;
 
-	if (local_group)
-		balance_cpu = group_balance_cpu(group);
-
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
 	min_cpu_load = ~0UL;
@@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		nr_running = rq->nr_running;
 
 		/* Bias balancing toward cpus of our domain */
-		if (local_group) {
-			if (idle_cpu(i) && !first_idle_cpu &&
-					cpumask_test_cpu(i, sched_group_mask(group))) {
-				first_idle_cpu = 1;
-				balance_cpu = i;
-			}
-
+		if (local_group)
 			load = target_load(i, load_idx);
-		} else {
+		else {
 			load = source_load(i, load_idx);
 			if (load > max_cpu_load)
 				max_cpu_load = load;
@@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
-	/*
-	 * First idle cpu or the first cpu(busiest) in this sched group
-	 * is eligible for doing load balancing at this and above
-	 * domains. In the newly idle case, we will allow all the cpu's
-	 * to do the newly idle load balance.
-	 */
-	if (local_group) {
-		if (env->idle != CPU_NEWLY_IDLE) {
-			if (balance_cpu != env->dst_cpu) {
-				*balance = 0;
-				return;
-			}
-			update_group_power(env->sd, env->dst_cpu);
-		} else if (time_after_eq(jiffies, group->sgp->next_update))
-			update_group_power(env->sd, env->dst_cpu);
-	}
+	if (local_group && (env->idle != CPU_NEWLY_IDLE ||
+			time_after_eq(jiffies, group->sgp->next_update)))
+		update_group_power(env->sd, env->dst_cpu);
 
 	/* Adjust by relative CPU power of the group */
 	sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
@@ -4576,7 +4552,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
  * @sds: variable to hold the statistics for this sched_domain.
  */
 static inline void update_sd_lb_stats(struct lb_env *env,
-					int *balance, struct sd_lb_stats *sds)
+					struct sd_lb_stats *sds)
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
@@ -4593,10 +4569,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
 		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
-
-		if (local_group && !(*balance))
-			return;
+		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
 
 		sds->total_load += sgs.group_load;
 		sds->total_pwr += sg->sgp->power;
@@ -4829,8 +4802,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  * to restore balance.
  *
  * @env: The load balancing environment.
- * @balance: Pointer to a variable indicating if this_cpu
- *	is the appropriate cpu to perform load balancing at this_level.
  *
  * Returns:	- the busiest group if imbalance exists.
  *		- If no imbalance and user has opted for power-savings balance,
@@ -4838,7 +4809,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  *		   put to idle by rebalancing its tasks onto our group.
  */
 static struct sched_group *
-find_busiest_group(struct lb_env *env, int *balance)
+find_busiest_group(struct lb_env *env)
 {
 	struct sd_lb_stats sds;
 
@@ -4848,14 +4819,7 @@ find_busiest_group(struct lb_env *env, int *balance)
 	 * Compute the various statistics relavent for load balancing at
 	 * this level.
 	 */
-	update_sd_lb_stats(env, balance, &sds);
-
-	/*
-	 * this_cpu is not the appropriate cpu to perform load balancing at
-	 * this level.
-	 */
-	if (!(*balance))
-		goto ret;
+	update_sd_lb_stats(env, &sds);
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
@@ -4919,7 +4883,6 @@ force_balance:
 	return sds.busiest;
 
 out_balanced:
-ret:
 	env->imbalance = 0;
 	return NULL;
 }
@@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env *env)
 
 static int active_load_balance_cpu_stop(void *data);
 
+static int should_we_balance(struct lb_env *env)
+{
+	struct sched_group *sg = env->sd->groups;
+	struct cpumask *sg_cpus, *sg_mask;
+	int cpu, balance_cpu = -1;
+
+	/*
+	 * In the newly idle case, we will allow all the cpu's
+	 * to do the newly idle load balance.
+	 */
+	if (env->idle == CPU_NEWLY_IDLE)
+		return 1;
+
+	sg_cpus = sched_group_cpus(sg);
+	sg_mask = sched_group_mask(sg);
+	/* Try to find first idle cpu */
+	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
+		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
+			continue;
+
+		balance_cpu = cpu;
+		break;
+	}
+
+	if (balance_cpu == -1)
+		balance_cpu = group_balance_cpu(sg);
+
+	/*
+	 * First idle cpu or the first cpu(busiest) in this sched group
+	 * is eligible for doing load balancing at this and above domains.
+	 */
+	return balance_cpu != env->dst_cpu;
+}
+
 /*
  * Check this_cpu to ensure it is balanced within domain. Attempt to move
  * tasks if there is an imbalance.
  */
 static int load_balance(int this_cpu, struct rq *this_rq,
 			struct sched_domain *sd, enum cpu_idle_type idle,
-			int *balance)
+			int *should_balance)
 {
 	int ld_moved, cur_ld_moved, active_balance = 0;
 	struct sched_group *group;
@@ -5036,12 +5033,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 
 	schedstat_inc(sd, lb_count[idle]);
 
-redo:
-	group = find_busiest_group(&env, balance);
-
-	if (*balance == 0)
+	if (!should_we_balance(&env)) {
+		*should_balance = 0;
 		goto out_balanced;
+	} else
+		*should_balance = 1;
 
+redo:
+	group = find_busiest_group(&env);
 	if (!group) {
 		schedstat_inc(sd, lb_nobusyg[idle]);
 		goto out_balanced;
@@ -5254,7 +5253,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	rcu_read_lock();
 	for_each_domain(this_cpu, sd) {
 		unsigned long interval;
-		int balance = 1;
+		int should_balance;
 
 		if (!(sd->flags & SD_LOAD_BALANCE))
 			continue;
@@ -5262,7 +5261,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance(this_cpu, this_rq,
-						   sd, CPU_NEWLY_IDLE, &balance);
+						   sd, CPU_NEWLY_IDLE,
+						   &should_balance);
 		}
 
 		interval = msecs_to_jiffies(sd->balance_interval);
@@ -5502,7 +5502,7 @@ void update_max_interval(void)
  */
 static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 {
-	int balance = 1;
+	int should_balance;
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long interval;
 	struct sched_domain *sd;
@@ -5534,7 +5534,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 		}
 
 		if (time_after_eq(jiffies, sd->last_balance + interval)) {
-			if (load_balance(cpu, rq, sd, idle, &balance)) {
+			if (load_balance(cpu, rq, sd, idle, &should_balance)) {
 				/*
 				 * The LBF_SOME_PINNED logic could have changed
 				 * env->dst_cpu, so we can't know our idle
@@ -5557,7 +5557,7 @@ out:
 		 * CPU in our sched group which is doing load balancing more
 		 * actively.
 		 */
-		if (!balance)
+		if (!should_balance)
 			break;
 	}
 	rcu_read_unlock();
-- 
1.7.9.5


* [PATCH v2 3/3] sched: clean-up struct sd_lb_stat
  2013-08-02  1:50 [PATCH v2 0/3] optimization and clean-up of fair.c Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 1/3] sched: remove one division operation in find_busiest_queue() Joonsoo Kim
  2013-08-02  1:50 ` [PATCH v2 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
@ 2013-08-02  1:50 ` Joonsoo Kim
  2013-08-02  4:40   ` Preeti U Murthy
  2 siblings, 1 reply; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-02  1:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, Mike Galbraith, Paul Turner, Alex Shi,
	Preeti U Murthy, Vincent Guittot, Morten Rasmussen, Namhyung Kim,
	Joonsoo Kim, Joonsoo Kim

There is no reason to maintain separate sets of variables for this_group
and busiest_group in sd_lb_stats, except saving some space.
But this structure is always allocated on the stack, so this saving
isn't really beneficial.

This patch unifies these variables into two embedded sg_lb_stats
structures, so IMO readability may be improved.
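
For example, a check that used to read across flat fields now reads
through the two embedded per-group stats, roughly (illustrative, taken
from the hunks below):

        struct sg_lb_stats *this = &sds.this_stat;
        struct sg_lb_stats *busiest = &sds.busiest_stat;

        /* before: if (sds.this_load >= sds.max_load) */
        if (this->avg_load >= busiest->avg_load)
                goto out_balanced;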

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f51b8c..f72ee7d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4195,36 +4195,6 @@ static unsigned long task_h_load(struct task_struct *p)
 
 /********** Helpers for find_busiest_group ************************/
 /*
- * sd_lb_stats - Structure to store the statistics of a sched_domain
- * 		during load balancing.
- */
-struct sd_lb_stats {
-	struct sched_group *busiest; /* Busiest group in this sd */
-	struct sched_group *this;  /* Local group in this sd */
-	unsigned long total_load;  /* Total load of all groups in sd */
-	unsigned long total_pwr;   /*	Total power of all groups in sd */
-	unsigned long avg_load;	   /* Average load across all groups in sd */
-
-	/** Statistics of this group */
-	unsigned long this_load;
-	unsigned long this_load_per_task;
-	unsigned long this_nr_running;
-	unsigned long this_has_capacity;
-	unsigned int  this_idle_cpus;
-
-	/* Statistics of the busiest group */
-	unsigned int  busiest_idle_cpus;
-	unsigned long max_load;
-	unsigned long busiest_load_per_task;
-	unsigned long busiest_nr_running;
-	unsigned long busiest_group_capacity;
-	unsigned long busiest_has_capacity;
-	unsigned int  busiest_group_weight;
-
-	int group_imb; /* Is there imbalance in this sd */
-};
-
-/*
  * sg_lb_stats - stats of a sched_group required for load_balancing
  */
 struct sg_lb_stats {
@@ -4239,6 +4209,24 @@ struct sg_lb_stats {
 	int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
+/*
+ * sd_lb_stats - Structure to store the statistics of a sched_domain
+ *		during load balancing.
+ */
+struct sd_lb_stats {
+	struct sched_group *busiest; /* Busiest group in this sd */
+	struct sched_group *this;  /* Local group in this sd */
+	unsigned long total_load;  /* Total load of all groups in sd */
+	unsigned long total_pwr;   /*	Total power of all groups in sd */
+	unsigned long sd_avg_load; /* Average load across all groups in sd */
+
+	/* Statistics of this group */
+	struct sg_lb_stats this_stat;
+
+	/* Statistics of the busiest group */
+	struct sg_lb_stats busiest_stat;
+};
+
 /**
  * get_sd_load_idx - Obtain the load index for a given sched domain.
  * @sd: The sched_domain whose load_idx is to be obtained.
@@ -4519,7 +4507,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
+	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
@@ -4556,7 +4544,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 {
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
-	struct sg_lb_stats sgs;
+	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
@@ -4566,13 +4554,16 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 
 	do {
 		int local_group;
+		struct sg_lb_stats *sgs;
 
 		local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
-		memset(&sgs, 0, sizeof(sgs));
-		update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
-
-		sds->total_load += sgs.group_load;
-		sds->total_pwr += sg->sgp->power;
+		if (local_group)
+			sgs = &sds->this_stat;
+		else {
+			sgs = &tmp_sgs;
+			memset(sgs, 0, sizeof(*sgs));
+		}
+		update_sg_lb_stats(env, sg, load_idx, local_group, sgs);
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
@@ -4584,26 +4575,19 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
-		if (prefer_sibling && !local_group && sds->this_has_capacity)
-			sgs.group_capacity = min(sgs.group_capacity, 1UL);
+		if (prefer_sibling && !local_group &&
+				sds->this && sds->this_stat.group_has_capacity)
+			sgs->group_capacity = min(sgs->group_capacity, 1UL);
 
-		if (local_group) {
-			sds->this_load = sgs.avg_load;
+		/* Now, start updating sd_lb_stats */
+		sds->total_load += sgs->group_load;
+		sds->total_pwr += sg->sgp->power;
+
+		if (local_group)
 			sds->this = sg;
-			sds->this_nr_running = sgs.sum_nr_running;
-			sds->this_load_per_task = sgs.sum_weighted_load;
-			sds->this_has_capacity = sgs.group_has_capacity;
-			sds->this_idle_cpus = sgs.idle_cpus;
-		} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
-			sds->max_load = sgs.avg_load;
+		else if (update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
-			sds->busiest_nr_running = sgs.sum_nr_running;
-			sds->busiest_idle_cpus = sgs.idle_cpus;
-			sds->busiest_group_capacity = sgs.group_capacity;
-			sds->busiest_load_per_task = sgs.sum_weighted_load;
-			sds->busiest_has_capacity = sgs.group_has_capacity;
-			sds->busiest_group_weight = sgs.group_weight;
-			sds->group_imb = sgs.group_imb;
+			sds->busiest_stat = *sgs;
 		}
 
 		sg = sg->next;
@@ -4647,8 +4631,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (env->dst_cpu > busiest_cpu)
 		return 0;
 
-	env->imbalance = DIV_ROUND_CLOSEST(
-		sds->max_load * sds->busiest->sgp->power, SCHED_POWER_SCALE);
+	env->imbalance = DIV_ROUND_CLOSEST(sds->busiest_stat.avg_load *
+				sds->busiest->sgp->power, SCHED_POWER_SCALE);
 
 	return 1;
 }
@@ -4666,24 +4650,22 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	unsigned long tmp, pwr_now = 0, pwr_move = 0;
 	unsigned int imbn = 2;
 	unsigned long scaled_busy_load_per_task;
+	struct sg_lb_stats *this, *busiest;
 
-	if (sds->this_nr_running) {
-		sds->this_load_per_task /= sds->this_nr_running;
-		if (sds->busiest_load_per_task >
-				sds->this_load_per_task)
+	this = &sds->this_stat;
+	busiest = &sds->busiest_stat;
+
+	if (!this->sum_nr_running)
+		this->sum_weighted_load = cpu_avg_load_per_task(env->dst_cpu);
+	else if (busiest->sum_weighted_load > this->sum_weighted_load)
 			imbn = 1;
-	} else {
-		sds->this_load_per_task =
-			cpu_avg_load_per_task(env->dst_cpu);
-	}
 
-	scaled_busy_load_per_task = sds->busiest_load_per_task
-					 * SCHED_POWER_SCALE;
-	scaled_busy_load_per_task /= sds->busiest->sgp->power;
+	scaled_busy_load_per_task = (busiest->sum_weighted_load *
+			SCHED_POWER_SCALE) / sds->busiest->sgp->power;
 
-	if (sds->max_load - sds->this_load + scaled_busy_load_per_task >=
-			(scaled_busy_load_per_task * imbn)) {
-		env->imbalance = sds->busiest_load_per_task;
+	if (busiest->avg_load - this->avg_load + scaled_busy_load_per_task >=
+		(scaled_busy_load_per_task * imbn)) {
+		env->imbalance = busiest->sum_weighted_load;
 		return;
 	}
 
@@ -4694,33 +4676,34 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	 */
 
 	pwr_now += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load);
+			min(busiest->sum_weighted_load, busiest->avg_load);
 	pwr_now += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load);
+			min(this->sum_weighted_load, this->avg_load);
 	pwr_now /= SCHED_POWER_SCALE;
 
 	/* Amount of load we'd subtract */
-	tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
+	tmp = (busiest->sum_weighted_load * SCHED_POWER_SCALE) /
 		sds->busiest->sgp->power;
-	if (sds->max_load > tmp)
+	if (busiest->avg_load > tmp)
 		pwr_move += sds->busiest->sgp->power *
-			min(sds->busiest_load_per_task, sds->max_load - tmp);
+			min(busiest->sum_weighted_load,
+				busiest->avg_load - tmp);
 
 	/* Amount of load we'd add */
-	if (sds->max_load * sds->busiest->sgp->power <
-		sds->busiest_load_per_task * SCHED_POWER_SCALE)
-		tmp = (sds->max_load * sds->busiest->sgp->power) /
+	if (busiest->avg_load * sds->busiest->sgp->power <
+		busiest->sum_weighted_load * SCHED_POWER_SCALE)
+		tmp = (busiest->avg_load * sds->busiest->sgp->power) /
 			sds->this->sgp->power;
 	else
-		tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
+		tmp = (busiest->sum_weighted_load * SCHED_POWER_SCALE) /
 			sds->this->sgp->power;
 	pwr_move += sds->this->sgp->power *
-			min(sds->this_load_per_task, sds->this_load + tmp);
+			min(this->sum_weighted_load, this->avg_load + tmp);
 	pwr_move /= SCHED_POWER_SCALE;
 
 	/* Move if we gain throughput */
 	if (pwr_move > pwr_now)
-		env->imbalance = sds->busiest_load_per_task;
+		env->imbalance = busiest->sum_weighted_load;
 }
 
 /**
@@ -4732,11 +4715,20 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 {
 	unsigned long max_pull, load_above_capacity = ~0UL;
-
-	sds->busiest_load_per_task /= sds->busiest_nr_running;
-	if (sds->group_imb) {
-		sds->busiest_load_per_task =
-			min(sds->busiest_load_per_task, sds->avg_load);
+	struct sg_lb_stats *this, *busiest;
+
+	/* After below code, we use sum_weighted_load of
+	 * this_stat and busiest_stat as load_per_task */
+	this = &sds->this_stat;
+	if (this->sum_nr_running)
+		this->sum_weighted_load /= this->sum_nr_running;
+
+	busiest = &sds->busiest_stat;
+	/* busiest must have some tasks */
+	busiest->sum_weighted_load /= busiest->sum_nr_running;
+	if (busiest->group_imb) {
+		busiest->sum_weighted_load =
+			min(busiest->sum_weighted_load, sds->sd_avg_load);
 	}
 
 	/*
@@ -4744,17 +4736,17 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * max load less than avg load(as we skip the groups at or below
 	 * its cpu_power, while calculating max_load..)
 	 */
-	if (sds->max_load < sds->avg_load) {
+	if (busiest->avg_load < this->avg_load) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}
 
-	if (!sds->group_imb) {
+	if (!busiest->group_imb) {
 		/*
 		 * Don't want to pull so many tasks that a group would go idle.
 		 */
-		load_above_capacity = (sds->busiest_nr_running -
-						sds->busiest_group_capacity);
+		load_above_capacity = (busiest->sum_nr_running -
+						busiest->group_capacity);
 
 		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
 
@@ -4771,12 +4763,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * Be careful of negative numbers as they'll appear as very large values
 	 * with unsigned longs.
 	 */
-	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
+	max_pull = min(busiest->avg_load - sds->sd_avg_load,
+						load_above_capacity);
 
 	/* How much load to actually move to equalise the imbalance */
 	env->imbalance = min(max_pull * sds->busiest->sgp->power,
-		(sds->avg_load - sds->this_load) * sds->this->sgp->power)
-			/ SCHED_POWER_SCALE;
+		(sds->sd_avg_load - this->avg_load) *
+			sds->this->sgp->power) / SCHED_POWER_SCALE;
 
 	/*
 	 * if *imbalance is less than the average load per runnable task
@@ -4784,7 +4777,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * a think about bumping its value to force at least one task to be
 	 * moved
 	 */
-	if (env->imbalance < sds->busiest_load_per_task)
+	if (env->imbalance < busiest->sum_weighted_load)
 		return fix_small_imbalance(env, sds);
 
 }
@@ -4812,6 +4805,7 @@ static struct sched_group *
 find_busiest_group(struct lb_env *env)
 {
 	struct sd_lb_stats sds;
+	struct sg_lb_stats *this, *busiest;
 
 	memset(&sds, 0, sizeof(sds));
 
@@ -4820,42 +4814,44 @@ find_busiest_group(struct lb_env *env)
 	 * this level.
 	 */
 	update_sd_lb_stats(env, &sds);
+	this = &sds.this_stat;
+	busiest = &sds.busiest_stat;
 
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
 
 	/* There is no busy sibling group to pull tasks from */
-	if (!sds.busiest || sds.busiest_nr_running == 0)
+	if (!sds.busiest || busiest->sum_nr_running == 0)
 		goto out_balanced;
 
-	sds.avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
+	sds.sd_avg_load = (SCHED_POWER_SCALE * sds.total_load) / sds.total_pwr;
 
 	/*
 	 * If the busiest group is imbalanced the below checks don't
 	 * work because they assumes all things are equal, which typically
 	 * isn't true due to cpus_allowed constraints and the like.
 	 */
-	if (sds.group_imb)
+	if (busiest->group_imb)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
-			!sds.busiest_has_capacity)
+	if (env->idle == CPU_NEWLY_IDLE &&
+		this->group_has_capacity && !busiest->group_has_capacity)
 		goto force_balance;
 
 	/*
 	 * If the local group is more busy than the selected busiest group
 	 * don't try and pull any tasks.
 	 */
-	if (sds.this_load >= sds.max_load)
+	if (this->avg_load >= busiest->avg_load)
 		goto out_balanced;
 
 	/*
 	 * Don't pull any tasks if this group is already above the domain
 	 * average load.
 	 */
-	if (sds.this_load >= sds.avg_load)
+	if (this->avg_load >= sds.sd_avg_load)
 		goto out_balanced;
 
 	if (env->idle == CPU_IDLE) {
@@ -4865,15 +4861,16 @@ find_busiest_group(struct lb_env *env)
 		 * there is no imbalance between this and busiest group
 		 * wrt to idle cpu's, it is balanced.
 		 */
-		if ((sds.this_idle_cpus <= sds.busiest_idle_cpus + 1) &&
-		    sds.busiest_nr_running <= sds.busiest_group_weight)
+		if ((this->idle_cpus <= busiest->idle_cpus + 1) &&
+			busiest->sum_nr_running <= busiest->group_weight)
 			goto out_balanced;
 	} else {
 		/*
 		 * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use
 		 * imbalance_pct to be conservative.
 		 */
-		if (100 * sds.max_load <= env->sd->imbalance_pct * sds.this_load)
+		if (100 * busiest->avg_load <=
+				env->sd->imbalance_pct * this->avg_load)
 			goto out_balanced;
 	}
 
-- 
1.7.9.5


* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  1:50 ` [PATCH v2 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
@ 2013-08-02  4:22   ` Preeti U Murthy
  2013-08-02  9:05     ` 김준수
  2013-08-02  7:51   ` Vincent Guittot
  1 sibling, 1 reply; 17+ messages in thread
From: Preeti U Murthy @ 2013-08-02  4:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim, Joonsoo Kim

Hi Joonsoo,

On 08/02/2013 07:20 AM, Joonsoo Kim wrote:
> Now checking whether this cpu is appropriate to balance or not
> is embedded into update_sg_lb_stats() and this checking has no direct
> relationship to this function. There is not enough reason to place
> this checking at update_sg_lb_stats(), except saving one iteration
> for sched_group_cpus.
> 
> In this patch, I factor out this checking to should_we_balance() function.
> And before doing actual work for load_balancing, check whether this cpu is
> appropriate to balance via should_we_balance(). If this cpu is not
> a candidate for balancing, it quit the work immediately.
> 
> With this change, we can save two memset cost and can expect better
> compiler optimization.
> 
> Below is result of this patch.
> 
> * Vanilla *
>    text	   data	    bss	    dec	    hex	filename
>   34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o
> 
> * Patched *
>    text	   data	    bss	    dec	    hex	filename
>   34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o
> 
> In addition, rename @balance to @should_balance in order to represent
> its purpose more clearly.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index eaae77e..7f51b8c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   * @group: sched_group whose statistics are to be updated.
>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
>   * @local_group: Does group contain this_cpu.
> - * @balance: Should we balance.
>   * @sgs: variable to hold the statistics for this group.
>   */
>  static inline void update_sg_lb_stats(struct lb_env *env,
>  			struct sched_group *group, int load_idx,
> -			int local_group, int *balance, struct sg_lb_stats *sgs)
> +			int local_group, struct sg_lb_stats *sgs)
>  {
>  	unsigned long nr_running, max_nr_running, min_nr_running;
>  	unsigned long load, max_cpu_load, min_cpu_load;
> -	unsigned int balance_cpu = -1, first_idle_cpu = 0;
>  	unsigned long avg_load_per_task = 0;
>  	int i;
> 
> -	if (local_group)
> -		balance_cpu = group_balance_cpu(group);
> -
>  	/* Tally up the load of all CPUs in the group */
>  	max_cpu_load = 0;
>  	min_cpu_load = ~0UL;
> @@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  		nr_running = rq->nr_running;
> 
>  		/* Bias balancing toward cpus of our domain */
> -		if (local_group) {
> -			if (idle_cpu(i) && !first_idle_cpu &&
> -					cpumask_test_cpu(i, sched_group_mask(group))) {
> -				first_idle_cpu = 1;
> -				balance_cpu = i;
> -			}
> -
> +		if (local_group)
>  			load = target_load(i, load_idx);
> -		} else {
> +		else {
>  			load = source_load(i, load_idx);
>  			if (load > max_cpu_load)
>  				max_cpu_load = load;
> @@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  			sgs->idle_cpus++;
>  	}
> 
> -	/*
> -	 * First idle cpu or the first cpu(busiest) in this sched group
> -	 * is eligible for doing load balancing at this and above
> -	 * domains. In the newly idle case, we will allow all the cpu's
> -	 * to do the newly idle load balance.
> -	 */
> -	if (local_group) {
> -		if (env->idle != CPU_NEWLY_IDLE) {
> -			if (balance_cpu != env->dst_cpu) {
> -				*balance = 0;
> -				return;
> -			}
> -			update_group_power(env->sd, env->dst_cpu);
> -		} else if (time_after_eq(jiffies, group->sgp->next_update))
> -			update_group_power(env->sd, env->dst_cpu);
> -	}

Observe what is happening in the above code which checks if we should
balance on the env->dst_cpu.

Only if the env->dst_cpu "belongs" to the group considered in
update_sg_lb_stats(), which means local_group = 1, should the above
checks be carried out.

Meaning, if there is a better CPU (in the above case, balance_cpu) in
the same group to which env->dst_cpu belongs for carrying out load
balancing for the system, cancel the current iteration of load
balancing on env->dst_cpu. Wait for the right cpu in this group to do
the load balancing.
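
Put as pseudocode, the check being discussed (with balance_cpu being the
first idle cpu of the local group, or group_balance_cpu(group) if none
is idle) amounts to something like this, illustration only:

        /*
         * only the designated balance_cpu of the local group may balance
         * at this level, except in the newly idle case
         */
        if (local_group && env->idle != CPU_NEWLY_IDLE &&
            balance_cpu != env->dst_cpu) {
                *balance = 0;   /* wait for balance_cpu to do the balancing */
                return;
        }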

Keeping this in mind, see the below comments around should_we_balance().

> @@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env *env)
> 
>  static int active_load_balance_cpu_stop(void *data);
> 
> +static int should_we_balance(struct lb_env *env)
> +{
> +	struct sched_group *sg = env->sd->groups;
> +	struct cpumask *sg_cpus, *sg_mask;
> +	int cpu, balance_cpu = -1;
> +
> +	/*
> +	 * In the newly idle case, we will allow all the cpu's
> +	 * to do the newly idle load balance.
> +	 */
> +	if (env->idle == CPU_NEWLY_IDLE)
> +		return 1;
> +
> +	sg_cpus = sched_group_cpus(sg);
> +	sg_mask = sched_group_mask(sg);
> +	/* Try to find first idle cpu */
> +	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
> +		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
> +			continue;
> +
> +		balance_cpu = cpu;
> +		break;
> +	}

You need to iterate over all the groups of the sched domain env->sd and
not just the first group of env->sd like you are doing above. This is to
check which group the env->dst_cpu belongs to. Only for that group
should you do the above check and setting of balance_cpu, and the below
check of balance_cpu != env->dst_cpu.

> +
> +	if (balance_cpu == -1)
> +		balance_cpu = group_balance_cpu(sg);
> +
> +	/*
> +	 * First idle cpu or the first cpu(busiest) in this sched group
> +	 * is eligible for doing load balancing at this and above domains.
> +	 */
> +	return balance_cpu != env->dst_cpu;

Regards
Preeti U Murthy


* Re: [PATCH v2 3/3] sched: clean-up struct sd_lb_stat
  2013-08-02  1:50 ` [PATCH v2 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
@ 2013-08-02  4:40   ` Preeti U Murthy
  2013-08-05  7:32     ` Joonsoo Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Preeti U Murthy @ 2013-08-02  4:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim, Joonsoo Kim

Hi Joonsoo,

On 08/02/2013 07:20 AM, Joonsoo Kim wrote:
> There is no reason to maintain separate variables for this_group
> and busiest_group in sd_lb_stat, except saving some space.
> But this structure is always allocated in stack, so this saving
> isn't really benificial.
> 
> This patch unify these variables, so IMO, readability may be improved.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7f51b8c..f72ee7d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4195,36 +4195,6 @@ static unsigned long task_h_load(struct task_struct *p)
> 
>  /********** Helpers for find_busiest_group ************************/
>  /*
> - * sd_lb_stats - Structure to store the statistics of a sched_domain
> - * 		during load balancing.
> - */
> -struct sd_lb_stats {
> -	struct sched_group *busiest; /* Busiest group in this sd */
> -	struct sched_group *this;  /* Local group in this sd */
> -	unsigned long total_load;  /* Total load of all groups in sd */
> -	unsigned long total_pwr;   /*	Total power of all groups in sd */
> -	unsigned long avg_load;	   /* Average load across all groups in sd */
> -
> -	/** Statistics of this group */
> -	unsigned long this_load;
> -	unsigned long this_load_per_task;
> -	unsigned long this_nr_running;
> -	unsigned long this_has_capacity;
> -	unsigned int  this_idle_cpus;
> -
> -	/* Statistics of the busiest group */
> -	unsigned int  busiest_idle_cpus;
> -	unsigned long max_load;
> -	unsigned long busiest_load_per_task;
> -	unsigned long busiest_nr_running;
> -	unsigned long busiest_group_capacity;
> -	unsigned long busiest_has_capacity;
> -	unsigned int  busiest_group_weight;
> -
> -	int group_imb; /* Is there imbalance in this sd */
> -};

There are a few parameters that you should not be removing:
busiest_load_per_task and max_load. busiest_load_per_task is a metric
that is not present in sg_lb_stats. And max_load, going by its usage in
load balancing, is best placed under sd_lb_stats. Removing it could
cause some confusion. See below.
>  /**
> @@ -4732,11 +4715,20 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>  {
>  	unsigned long max_pull, load_above_capacity = ~0UL;
> -
> -	sds->busiest_load_per_task /= sds->busiest_nr_running;
> -	if (sds->group_imb) {
> -		sds->busiest_load_per_task =
> -			min(sds->busiest_load_per_task, sds->avg_load);
> +	struct sg_lb_stats *this, *busiest;
> +
> +	/* After below code, we use sum_weighted_load of
> +	 * this_stat and busiest_stat as load_per_task */
> +	this = &sds->this_stat;
> +	if (this->sum_nr_running)
> +		this->sum_weighted_load /= this->sum_nr_running;
> +
> +	busiest = &sds->busiest_stat;
> +	/* busiest must have some tasks */
> +	busiest->sum_weighted_load /= busiest->sum_nr_running;

You cannot change sum_weighted_load of this_grp and busiest_grp like
this. They are meant to carry the total load of the group and not the
load per task in that group. This is why there are separate metrics,
busiest_load_per_task and this_load_per_task, under sd_lb_stats to
convey exactly what is being done. Doing the above will confuse the
readers of this code.
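
One way to keep that distinction explicit while still embedding the
per-group stats would be a dedicated field, e.g. (just a sketch of the
suggestion, not part of the posted patch):

        struct sg_lb_stats {
                /* ... other fields as in the patch ... */
                unsigned long sum_weighted_load; /* total weighted load of the group */
                unsigned long load_per_task;     /* sum_weighted_load / sum_nr_running */
        };

        /* in calculate_imbalance(), derive instead of overwriting: */
        busiest->load_per_task =
                busiest->sum_weighted_load / busiest->sum_nr_running;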

> +	if (busiest->group_imb) {
> +		busiest->sum_weighted_load =
> +			min(busiest->sum_weighted_load, sds->sd_avg_load);

Right here we get confused as to why the total load is being compared
against load per task (although you are changing it to load per task above).

>  	}
> 
>  	/*
> @@ -4744,17 +4736,17 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  	 * max load less than avg load(as we skip the groups at or below
>  	 * its cpu_power, while calculating max_load..)
>  	 */
> -	if (sds->max_load < sds->avg_load) {
> +	if (busiest->avg_load < this->avg_load) {
>  		env->imbalance = 0;
>  		return fix_small_imbalance(env, sds);
>  	}
> 
> -	if (!sds->group_imb) {
> +	if (!busiest->group_imb) {
>  		/*
>  		 * Don't want to pull so many tasks that a group would go idle.
>  		 */
> -		load_above_capacity = (sds->busiest_nr_running -
> -						sds->busiest_group_capacity);
> +		load_above_capacity = (busiest->sum_nr_running -
> +						busiest->group_capacity);
> 
>  		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
> 
> @@ -4771,12 +4763,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  	 * Be careful of negative numbers as they'll appear as very large values
>  	 * with unsigned longs.
>  	 */
> -	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
> +	max_pull = min(busiest->avg_load - sds->sd_avg_load,
> +						load_above_capacity);

This is ok, but readability-wise max_load is much better. max_load
signifies the maximum load per task on a group in this sd, and avg_load
signifies the total load per task across the sd. You are checking if
there is an imbalance in the total load in the sd and trying to even it
out across the sd. Here "busiest" does not convey this immediately.

Regards
Preeti U Murthy


* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  1:50 ` [PATCH v2 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
  2013-08-02  4:22   ` Preeti U Murthy
@ 2013-08-02  7:51   ` Vincent Guittot
  2013-08-02  9:08     ` Joonsoo Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Vincent Guittot @ 2013-08-02  7:51 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Preeti U Murthy, Morten Rasmussen,
	Namhyung Kim, Joonsoo Kim

On 2 August 2013 03:50, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Now checking whether this cpu is appropriate to balance or not
> is embedded into update_sg_lb_stats() and this checking has no direct
> relationship to this function. There is not enough reason to place
> this checking at update_sg_lb_stats(), except saving one iteration
> for sched_group_cpus.
>
> In this patch, I factor out this checking to should_we_balance() function.
> And before doing actual work for load_balancing, check whether this cpu is
> appropriate to balance via should_we_balance(). If this cpu is not
> a candidate for balancing, it quit the work immediately.
>
> With this change, we can save two memset cost and can expect better
> compiler optimization.
>
> Below is result of this patch.
>
> * Vanilla *
>    text    data     bss     dec     hex filename
>   34499    1136     116   35751    8ba7 kernel/sched/fair.o
>
> * Patched *
>    text    data     bss     dec     hex filename
>   34243    1136     116   35495    8aa7 kernel/sched/fair.o
>
> In addition, rename @balance to @should_balance in order to represent
> its purpose more clearly.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index eaae77e..7f51b8c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   * @group: sched_group whose statistics are to be updated.
>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
>   * @local_group: Does group contain this_cpu.
> - * @balance: Should we balance.
>   * @sgs: variable to hold the statistics for this group.
>   */
>  static inline void update_sg_lb_stats(struct lb_env *env,
>                         struct sched_group *group, int load_idx,
> -                       int local_group, int *balance, struct sg_lb_stats *sgs)
> +                       int local_group, struct sg_lb_stats *sgs)
>  {
>         unsigned long nr_running, max_nr_running, min_nr_running;
>         unsigned long load, max_cpu_load, min_cpu_load;
> -       unsigned int balance_cpu = -1, first_idle_cpu = 0;
>         unsigned long avg_load_per_task = 0;
>         int i;
>
> -       if (local_group)
> -               balance_cpu = group_balance_cpu(group);
> -
>         /* Tally up the load of all CPUs in the group */
>         max_cpu_load = 0;
>         min_cpu_load = ~0UL;
> @@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                 nr_running = rq->nr_running;
>
>                 /* Bias balancing toward cpus of our domain */
> -               if (local_group) {
> -                       if (idle_cpu(i) && !first_idle_cpu &&
> -                                       cpumask_test_cpu(i, sched_group_mask(group))) {
> -                               first_idle_cpu = 1;
> -                               balance_cpu = i;
> -                       }
> -
> +               if (local_group)
>                         load = target_load(i, load_idx);
> -               } else {
> +               else {
>                         load = source_load(i, load_idx);
>                         if (load > max_cpu_load)
>                                 max_cpu_load = load;
> @@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                         sgs->idle_cpus++;
>         }
>
> -       /*
> -        * First idle cpu or the first cpu(busiest) in this sched group
> -        * is eligible for doing load balancing at this and above
> -        * domains. In the newly idle case, we will allow all the cpu's
> -        * to do the newly idle load balance.
> -        */
> -       if (local_group) {
> -               if (env->idle != CPU_NEWLY_IDLE) {
> -                       if (balance_cpu != env->dst_cpu) {
> -                               *balance = 0;
> -                               return;
> -                       }
> -                       update_group_power(env->sd, env->dst_cpu);
> -               } else if (time_after_eq(jiffies, group->sgp->next_update))
> -                       update_group_power(env->sd, env->dst_cpu);
> -       }
> +       if (local_group && (env->idle != CPU_NEWLY_IDLE ||
> +                       time_after_eq(jiffies, group->sgp->next_update)))
> +               update_group_power(env->sd, env->dst_cpu);
>
>         /* Adjust by relative CPU power of the group */
>         sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
> @@ -4576,7 +4552,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>   * @sds: variable to hold the statistics for this sched_domain.
>   */
>  static inline void update_sd_lb_stats(struct lb_env *env,
> -                                       int *balance, struct sd_lb_stats *sds)
> +                                       struct sd_lb_stats *sds)
>  {
>         struct sched_domain *child = env->sd->child;
>         struct sched_group *sg = env->sd->groups;
> @@ -4593,10 +4569,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
>
>                 local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
>                 memset(&sgs, 0, sizeof(sgs));
> -               update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
> -
> -               if (local_group && !(*balance))
> -                       return;
> +               update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
>
>                 sds->total_load += sgs.group_load;
>                 sds->total_pwr += sg->sgp->power;
> @@ -4829,8 +4802,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>   * to restore balance.
>   *
>   * @env: The load balancing environment.
> - * @balance: Pointer to a variable indicating if this_cpu
> - *     is the appropriate cpu to perform load balancing at this_level.
>   *
>   * Returns:    - the busiest group if imbalance exists.
>   *             - If no imbalance and user has opted for power-savings balance,
> @@ -4838,7 +4809,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>   *                put to idle by rebalancing its tasks onto our group.
>   */
>  static struct sched_group *
> -find_busiest_group(struct lb_env *env, int *balance)
> +find_busiest_group(struct lb_env *env)
>  {
>         struct sd_lb_stats sds;
>
> @@ -4848,14 +4819,7 @@ find_busiest_group(struct lb_env *env, int *balance)
>          * Compute the various statistics relavent for load balancing at
>          * this level.
>          */
> -       update_sd_lb_stats(env, balance, &sds);
> -
> -       /*
> -        * this_cpu is not the appropriate cpu to perform load balancing at
> -        * this level.
> -        */
> -       if (!(*balance))
> -               goto ret;
> +       update_sd_lb_stats(env, &sds);
>
>         if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
>             check_asym_packing(env, &sds))
> @@ -4919,7 +4883,6 @@ force_balance:
>         return sds.busiest;
>
>  out_balanced:
> -ret:
>         env->imbalance = 0;
>         return NULL;
>  }
> @@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env *env)
>
>  static int active_load_balance_cpu_stop(void *data);
>
> +static int should_we_balance(struct lb_env *env)
> +{
> +       struct sched_group *sg = env->sd->groups;
> +       struct cpumask *sg_cpus, *sg_mask;
> +       int cpu, balance_cpu = -1;
> +
> +       /*
> +        * In the newly idle case, we will allow all the cpu's
> +        * to do the newly idle load balance.
> +        */
> +       if (env->idle == CPU_NEWLY_IDLE)
> +               return 1;
> +
> +       sg_cpus = sched_group_cpus(sg);
> +       sg_mask = sched_group_mask(sg);
> +       /* Try to find first idle cpu */
> +       for_each_cpu_and(cpu, sg_cpus, env->cpus) {
> +               if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
> +                       continue;
> +
> +               balance_cpu = cpu;
> +               break;
> +       }
> +
> +       if (balance_cpu == -1)
> +               balance_cpu = group_balance_cpu(sg);
> +
> +       /*
> +        * First idle cpu or the first cpu(busiest) in this sched group
> +        * is eligible for doing load balancing at this and above domains.
> +        */
> +       return balance_cpu != env->dst_cpu;
> +}
> +
>  /*
>   * Check this_cpu to ensure it is balanced within domain. Attempt to move
>   * tasks if there is an imbalance.
>   */
>  static int load_balance(int this_cpu, struct rq *this_rq,
>                         struct sched_domain *sd, enum cpu_idle_type idle,
> -                       int *balance)
> +                       int *should_balance)
>  {
>         int ld_moved, cur_ld_moved, active_balance = 0;
>         struct sched_group *group;
> @@ -5036,12 +5033,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>
>         schedstat_inc(sd, lb_count[idle]);
>
> -redo:
> -       group = find_busiest_group(&env, balance);
> -
> -       if (*balance == 0)
> +       if (!should_we_balance(&env)) {
> +               *should_balance = 0;
>                 goto out_balanced;
> +       } else
> +               *should_balance = 1;
>
> +redo:
> +       group = find_busiest_group(&env);
>         if (!group) {
>                 schedstat_inc(sd, lb_nobusyg[idle]);
>                 goto out_balanced;
> @@ -5254,7 +5253,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
>         rcu_read_lock();
>         for_each_domain(this_cpu, sd) {
>                 unsigned long interval;
> -               int balance = 1;
> +               int should_balance;
>
>                 if (!(sd->flags & SD_LOAD_BALANCE))
>                         continue;
> @@ -5262,7 +5261,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
>                 if (sd->flags & SD_BALANCE_NEWIDLE) {
>                         /* If we've pulled tasks over stop searching: */
>                         pulled_task = load_balance(this_cpu, this_rq,
> -                                                  sd, CPU_NEWLY_IDLE, &balance);
> +                                                  sd, CPU_NEWLY_IDLE,
> +                                                  &should_balance);
>                 }
>
>                 interval = msecs_to_jiffies(sd->balance_interval);
> @@ -5502,7 +5502,7 @@ void update_max_interval(void)
>   */
>  static void rebalance_domains(int cpu, enum cpu_idle_type idle)
>  {
> -       int balance = 1;
> +       int should_balance;

This should be int should_balance = 1;
because the if (time_after_eq(jiffies, sd->last_balance + interval))
check can skip the 1st load_balance() and you will then test
should_balance with its initial value.
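
That is, a sketch of the suggested one-line fix in rebalance_domains()
(illustrative only):

        /*
         * Default to 1 so that, if the time_after_eq() check skips
         * load_balance() for this domain, the later !should_balance test
         * still reads a defined value and keeps iterating.
         */
        int should_balance = 1;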

Vincent

>         struct rq *rq = cpu_rq(cpu);
>         unsigned long interval;
>         struct sched_domain *sd;
> @@ -5534,7 +5534,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
>                 }
>
>                 if (time_after_eq(jiffies, sd->last_balance + interval)) {
> -                       if (load_balance(cpu, rq, sd, idle, &balance)) {
> +                       if (load_balance(cpu, rq, sd, idle, &should_balance)) {
>                                 /*
>                                  * The LBF_SOME_PINNED logic could have changed
>                                  * env->dst_cpu, so we can't know our idle
> @@ -5557,7 +5557,7 @@ out:
>                  * CPU in our sched group which is doing load balancing more
>                  * actively.
>                  */
> -               if (!balance)
> +               if (!should_balance)
>                         break;
>         }
>         rcu_read_unlock();
> --
> 1.7.9.5
>

* RE: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  4:22   ` Preeti U Murthy
@ 2013-08-02  9:05     ` 김준수
  2013-08-02  9:26       ` Preeti U Murthy
  2013-08-02 10:20       ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: 김준수 @ 2013-08-02  9:05 UTC (permalink / raw)
  To: 'Preeti U Murthy'
  Cc: 'Ingo Molnar', 'Peter Zijlstra',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim',
	'Joonsoo Kim'



> -----Original Message-----
> From: Preeti U Murthy [mailto:preeti@linux.vnet.ibm.com]
> Sent: Friday, August 02, 2013 1:23 PM
> To: Joonsoo Kim
> Cc: Ingo Molnar; Peter Zijlstra; linux-kernel@vger.kernel.org; Mike
> Galbraith; Paul Turner; Alex Shi; Vincent Guittot; Morten Rasmussen;
> Namhyung Kim; Joonsoo Kim
> Subject: Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
> 
> Hi Joonsoo,

Hello, Preeti.

> 
> On 08/02/2013 07:20 AM, Joonsoo Kim wrote:
> > Now checking whether this cpu is appropriate to balance or not is
> > embedded into update_sg_lb_stats() and this checking has no direct
> > relationship to this function. There is not enough reason to place
> > this checking at update_sg_lb_stats(), except saving one iteration for
> > sched_group_cpus.
> >
> > In this patch, I factor out this checking to should_we_balance()
> function.
> > And before doing actual work for load_balancing, check whether this
> > cpu is appropriate to balance via should_we_balance(). If this cpu is
> > not a candidate for balancing, it quit the work immediately.
> >
> > With this change, we can save two memset cost and can expect better
> > compiler optimization.
> >
> > Below is result of this patch.
> >
> > * Vanilla *
> >    text	   data	    bss	    dec	    hex	filename
> >   34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o
> >
> > * Patched *
> >    text	   data	    bss	    dec	    hex	filename
> >   34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o
> >
> > In addition, rename @balance to @should_balance in order to represent
> > its purpose more clearly.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
> > eaae77e..7f51b8c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd,
> struct sched_group *group)
> >   * @group: sched_group whose statistics are to be updated.
> >   * @load_idx: Load index of sched_domain of this_cpu for load calc.
> >   * @local_group: Does group contain this_cpu.
> > - * @balance: Should we balance.
> >   * @sgs: variable to hold the statistics for this group.
> >   */
> >  static inline void update_sg_lb_stats(struct lb_env *env,
> >  			struct sched_group *group, int load_idx,
> > -			int local_group, int *balance, struct sg_lb_stats
*sgs)
> > +			int local_group, struct sg_lb_stats *sgs)
> >  {
> >  	unsigned long nr_running, max_nr_running, min_nr_running;
> >  	unsigned long load, max_cpu_load, min_cpu_load;
> > -	unsigned int balance_cpu = -1, first_idle_cpu = 0;
> >  	unsigned long avg_load_per_task = 0;
> >  	int i;
> >
> > -	if (local_group)
> > -		balance_cpu = group_balance_cpu(group);
> > -
> >  	/* Tally up the load of all CPUs in the group */
> >  	max_cpu_load = 0;
> >  	min_cpu_load = ~0UL;
> > @@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct
> lb_env *env,
> >  		nr_running = rq->nr_running;
> >
> >  		/* Bias balancing toward cpus of our domain */
> > -		if (local_group) {
> > -			if (idle_cpu(i) && !first_idle_cpu &&
> > -					cpumask_test_cpu(i,
> sched_group_mask(group))) {
> > -				first_idle_cpu = 1;
> > -				balance_cpu = i;
> > -			}
> > -
> > +		if (local_group)
> >  			load = target_load(i, load_idx);
> > -		} else {
> > +		else {
> >  			load = source_load(i, load_idx);
> >  			if (load > max_cpu_load)
> >  				max_cpu_load = load;
> > @@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct
> lb_env *env,
> >  			sgs->idle_cpus++;
> >  	}
> >
> > -	/*
> > -	 * First idle cpu or the first cpu(busiest) in this sched group
> > -	 * is eligible for doing load balancing at this and above
> > -	 * domains. In the newly idle case, we will allow all the cpu's
> > -	 * to do the newly idle load balance.
> > -	 */
> > -	if (local_group) {
> > -		if (env->idle != CPU_NEWLY_IDLE) {
> > -			if (balance_cpu != env->dst_cpu) {
> > -				*balance = 0;
> > -				return;
> > -			}
> > -			update_group_power(env->sd, env->dst_cpu);
> > -		} else if (time_after_eq(jiffies, group->sgp->next_update))
> > -			update_group_power(env->sd, env->dst_cpu);
> > -	}
> 
> Observe what is happening in the above code which checks if we should
> balance on the env->dst_cpu.
> 
> Only if the env->dst_cpu "belongs" to the group considered in
> update_sg_lb_stats(), which means local_group = 1,should the above checks
> be carried out.
> 
> Meaning, if there is a better CPU in the same group as env->dst_cpu
> (in the above case, balance_cpu) to carry out load balancing for the
> system, cancel the current iteration of load balancing on env->dst_cpu
> and wait for the right cpu in this group to do the load balancing.
>
> Keeping this in mind, see the below comments around should_we_balance().

Okay.

> 
> > @@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env
> > *env)
> >
> >  static int active_load_balance_cpu_stop(void *data);
> >
> > +static int should_we_balance(struct lb_env *env) {
> > +	struct sched_group *sg = env->sd->groups;
> > +	struct cpumask *sg_cpus, *sg_mask;
> > +	int cpu, balance_cpu = -1;
> > +
> > +	/*
> > +	 * In the newly idle case, we will allow all the cpu's
> > +	 * to do the newly idle load balance.
> > +	 */
> > +	if (env->idle == CPU_NEWLY_IDLE)
> > +		return 1;
> > +
> > +	sg_cpus = sched_group_cpus(sg);
> > +	sg_mask = sched_group_mask(sg);
> > +	/* Try to find first idle cpu */
> > +	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
> > +		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
> > +			continue;
> > +
> > +		balance_cpu = cpu;
> > +		break;
> > +	}
> 
> You need to iterate over all the groups of the sched domain env->sd and
> not just the first group of env->sd like you are doing above. This is to

I don't think so.
IIRC, env->sd->groups always means local group,
so we don't need to find our group by iterating over all the groups.

Thanks.

> check which group env->dst_cpu belongs to. Only for that group
> should you do the above check and setting of balance_cpu, and the below
> check of balance_cpu != env->dst_cpu.
> 
> > +
> > +	if (balance_cpu == -1)
> > +		balance_cpu = group_balance_cpu(sg);
> > +
> > +	/*
> > +	 * First idle cpu or the first cpu(busiest) in this sched group
> > +	 * is eligible for doing load balancing at this and above domains.
> > +	 */
> > +	return balance_cpu != env->dst_cpu;
> 
> Regards
> Preeti U Murthy


^ permalink raw reply	[flat|nested] 17+ messages in thread
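The balance-cpu selection being discussed here boils down to: in the non-newly-idle case, pick the first idle cpu of the local group, falling back to the group's designated first cpu when none is idle, and only that cpu goes on to do load_balance(). A minimal user-space sketch of that policy follows; struct group, its fixed-size arrays, and pick_balance_cpu() are stand-ins invented for illustration, not kernel types or APIs.

#include <stdio.h>

#define NR_CPUS 4

struct group {
	int cpus[NR_CPUS];	/* 1 if the cpu belongs to this sched group */
	int idle[NR_CPUS];	/* 1 if that cpu is currently idle */
};

/* First idle cpu of the group wins; otherwise fall back to the first cpu. */
static int pick_balance_cpu(const struct group *g)
{
	int cpu, first = -1;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!g->cpus[cpu])
			continue;
		if (g->idle[cpu])
			return cpu;
		if (first < 0)
			first = cpu;
	}
	return first;
}

int main(void)
{
	struct group g = {
		.cpus = { 1, 1, 1, 1 },
		.idle = { 0, 0, 1, 0 },	/* only cpu2 is idle */
	};
	int dst_cpu, balance_cpu = pick_balance_cpu(&g);

	/* Only the chosen balance cpu should proceed with load_balance(). */
	for (dst_cpu = 0; dst_cpu < NR_CPUS; dst_cpu++)
		printf("cpu%d should balance: %s\n", dst_cpu,
		       dst_cpu == balance_cpu ? "yes" : "no");
	return 0;
}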

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  7:51   ` Vincent Guittot
@ 2013-08-02  9:08     ` Joonsoo Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-02  9:08 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Preeti U Murthy, Morten Rasmussen,
	Namhyung Kim

On Fri, Aug 02, 2013 at 09:51:45AM +0200, Vincent Guittot wrote:
> On 2 August 2013 03:50, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > Now checking whether this cpu is appropriate to balance or not
> > is embedded into update_sg_lb_stats() and this checking has no direct
> > relationship to this function. There is not enough reason to place
> > this checking at update_sg_lb_stats(), except saving one iteration
> > for sched_group_cpus.
> >
> > In this patch, I factor out this check into a should_we_balance() function.
> > Before doing the actual work of load balancing, check whether this cpu is
> > appropriate to balance via should_we_balance(). If this cpu is not
> > a candidate for balancing, it quits the work immediately.
> >
> > With this change, we can save the cost of two memsets and can expect
> > better compiler optimization.
> >
> > Below is the result of this patch.
> >
> > * Vanilla *
> >    text    data     bss     dec     hex filename
> >   34499    1136     116   35751    8ba7 kernel/sched/fair.o
> >
> > * Patched *
> >    text    data     bss     dec     hex filename
> >   34243    1136     116   35495    8aa7 kernel/sched/fair.o
> >
> > In addition, rename @balance to @should_balance in order to represent
> > its purpose more clearly.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index eaae77e..7f51b8c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
> >   * @group: sched_group whose statistics are to be updated.
> >   * @load_idx: Load index of sched_domain of this_cpu for load calc.
> >   * @local_group: Does group contain this_cpu.
> > - * @balance: Should we balance.
> >   * @sgs: variable to hold the statistics for this group.
> >   */
> >  static inline void update_sg_lb_stats(struct lb_env *env,
> >                         struct sched_group *group, int load_idx,
> > -                       int local_group, int *balance, struct sg_lb_stats *sgs)
> > +                       int local_group, struct sg_lb_stats *sgs)
> >  {
> >         unsigned long nr_running, max_nr_running, min_nr_running;
> >         unsigned long load, max_cpu_load, min_cpu_load;
> > -       unsigned int balance_cpu = -1, first_idle_cpu = 0;
> >         unsigned long avg_load_per_task = 0;
> >         int i;
> >
> > -       if (local_group)
> > -               balance_cpu = group_balance_cpu(group);
> > -
> >         /* Tally up the load of all CPUs in the group */
> >         max_cpu_load = 0;
> >         min_cpu_load = ~0UL;
> > @@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >                 nr_running = rq->nr_running;
> >
> >                 /* Bias balancing toward cpus of our domain */
> > -               if (local_group) {
> > -                       if (idle_cpu(i) && !first_idle_cpu &&
> > -                                       cpumask_test_cpu(i, sched_group_mask(group))) {
> > -                               first_idle_cpu = 1;
> > -                               balance_cpu = i;
> > -                       }
> > -
> > +               if (local_group)
> >                         load = target_load(i, load_idx);
> > -               } else {
> > +               else {
> >                         load = source_load(i, load_idx);
> >                         if (load > max_cpu_load)
> >                                 max_cpu_load = load;
> > @@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >                         sgs->idle_cpus++;
> >         }
> >
> > -       /*
> > -        * First idle cpu or the first cpu(busiest) in this sched group
> > -        * is eligible for doing load balancing at this and above
> > -        * domains. In the newly idle case, we will allow all the cpu's
> > -        * to do the newly idle load balance.
> > -        */
> > -       if (local_group) {
> > -               if (env->idle != CPU_NEWLY_IDLE) {
> > -                       if (balance_cpu != env->dst_cpu) {
> > -                               *balance = 0;
> > -                               return;
> > -                       }
> > -                       update_group_power(env->sd, env->dst_cpu);
> > -               } else if (time_after_eq(jiffies, group->sgp->next_update))
> > -                       update_group_power(env->sd, env->dst_cpu);
> > -       }
> > +       if (local_group && (env->idle != CPU_NEWLY_IDLE ||
> > +                       time_after_eq(jiffies, group->sgp->next_update)))
> > +               update_group_power(env->sd, env->dst_cpu);
> >
> >         /* Adjust by relative CPU power of the group */
> >         sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
> > @@ -4576,7 +4552,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >   * @sds: variable to hold the statistics for this sched_domain.
> >   */
> >  static inline void update_sd_lb_stats(struct lb_env *env,
> > -                                       int *balance, struct sd_lb_stats *sds)
> > +                                       struct sd_lb_stats *sds)
> >  {
> >         struct sched_domain *child = env->sd->child;
> >         struct sched_group *sg = env->sd->groups;
> > @@ -4593,10 +4569,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
> >
> >                 local_group = cpumask_test_cpu(env->dst_cpu, sched_group_cpus(sg));
> >                 memset(&sgs, 0, sizeof(sgs));
> > -               update_sg_lb_stats(env, sg, load_idx, local_group, balance, &sgs);
> > -
> > -               if (local_group && !(*balance))
> > -                       return;
> > +               update_sg_lb_stats(env, sg, load_idx, local_group, &sgs);
> >
> >                 sds->total_load += sgs.group_load;
> >                 sds->total_pwr += sg->sgp->power;
> > @@ -4829,8 +4802,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> >   * to restore balance.
> >   *
> >   * @env: The load balancing environment.
> > - * @balance: Pointer to a variable indicating if this_cpu
> > - *     is the appropriate cpu to perform load balancing at this_level.
> >   *
> >   * Returns:    - the busiest group if imbalance exists.
> >   *             - If no imbalance and user has opted for power-savings balance,
> > @@ -4838,7 +4809,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> >   *                put to idle by rebalancing its tasks onto our group.
> >   */
> >  static struct sched_group *
> > -find_busiest_group(struct lb_env *env, int *balance)
> > +find_busiest_group(struct lb_env *env)
> >  {
> >         struct sd_lb_stats sds;
> >
> > @@ -4848,14 +4819,7 @@ find_busiest_group(struct lb_env *env, int *balance)
> >          * Compute the various statistics relavent for load balancing at
> >          * this level.
> >          */
> > -       update_sd_lb_stats(env, balance, &sds);
> > -
> > -       /*
> > -        * this_cpu is not the appropriate cpu to perform load balancing at
> > -        * this level.
> > -        */
> > -       if (!(*balance))
> > -               goto ret;
> > +       update_sd_lb_stats(env, &sds);
> >
> >         if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
> >             check_asym_packing(env, &sds))
> > @@ -4919,7 +4883,6 @@ force_balance:
> >         return sds.busiest;
> >
> >  out_balanced:
> > -ret:
> >         env->imbalance = 0;
> >         return NULL;
> >  }
> > @@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env *env)
> >
> >  static int active_load_balance_cpu_stop(void *data);
> >
> > +static int should_we_balance(struct lb_env *env)
> > +{
> > +       struct sched_group *sg = env->sd->groups;
> > +       struct cpumask *sg_cpus, *sg_mask;
> > +       int cpu, balance_cpu = -1;
> > +
> > +       /*
> > +        * In the newly idle case, we will allow all the cpu's
> > +        * to do the newly idle load balance.
> > +        */
> > +       if (env->idle == CPU_NEWLY_IDLE)
> > +               return 1;
> > +
> > +       sg_cpus = sched_group_cpus(sg);
> > +       sg_mask = sched_group_mask(sg);
> > +       /* Try to find first idle cpu */
> > +       for_each_cpu_and(cpu, sg_cpus, env->cpus) {
> > +               if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
> > +                       continue;
> > +
> > +               balance_cpu = cpu;
> > +               break;
> > +       }
> > +
> > +       if (balance_cpu == -1)
> > +               balance_cpu = group_balance_cpu(sg);
> > +
> > +       /*
> > +        * First idle cpu or the first cpu(busiest) in this sched group
> > +        * is eligible for doing load balancing at this and above domains.
> > +        */
> > +       return balance_cpu != env->dst_cpu;
> > +}
> > +
> >  /*
> >   * Check this_cpu to ensure it is balanced within domain. Attempt to move
> >   * tasks if there is an imbalance.
> >   */
> >  static int load_balance(int this_cpu, struct rq *this_rq,
> >                         struct sched_domain *sd, enum cpu_idle_type idle,
> > -                       int *balance)
> > +                       int *should_balance)
> >  {
> >         int ld_moved, cur_ld_moved, active_balance = 0;
> >         struct sched_group *group;
> > @@ -5036,12 +5033,14 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >
> >         schedstat_inc(sd, lb_count[idle]);
> >
> > -redo:
> > -       group = find_busiest_group(&env, balance);
> > -
> > -       if (*balance == 0)
> > +       if (!should_we_balance(&env)) {
> > +               *should_balance = 0;
> >                 goto out_balanced;
> > +       } else
> > +               *should_balance = 1;
> >
> > +redo:
> > +       group = find_busiest_group(&env);
> >         if (!group) {
> >                 schedstat_inc(sd, lb_nobusyg[idle]);
> >                 goto out_balanced;
> > @@ -5254,7 +5253,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
> >         rcu_read_lock();
> >         for_each_domain(this_cpu, sd) {
> >                 unsigned long interval;
> > -               int balance = 1;
> > +               int should_balance;
> >
> >                 if (!(sd->flags & SD_LOAD_BALANCE))
> >                         continue;
> > @@ -5262,7 +5261,8 @@ void idle_balance(int this_cpu, struct rq *this_rq)
> >                 if (sd->flags & SD_BALANCE_NEWIDLE) {
> >                         /* If we've pulled tasks over stop searching: */
> >                         pulled_task = load_balance(this_cpu, this_rq,
> > -                                                  sd, CPU_NEWLY_IDLE, &balance);
> > +                                                  sd, CPU_NEWLY_IDLE,
> > +                                                  &should_balance);
> >                 }
> >
> >                 interval = msecs_to_jiffies(sd->balance_interval);
> > @@ -5502,7 +5502,7 @@ void update_max_interval(void)
> >   */
> >  static void rebalance_domains(int cpu, enum cpu_idle_type idle)
> >  {
> > -       int balance = 1;
> > +       int should_balance;
> 
> int should_balance = 1;
> because if (time_after_eq(jiffies, sd->last_balance + interval)) can
> skip the 1st load_balance and you will test should_balance with its
> initial value.

Hello, Vincent.

Yes, you are right.
I will fix it.

Thanks.

> 
> Vincent
> 
> >         struct rq *rq = cpu_rq(cpu);
> >         unsigned long interval;
> >         struct sched_domain *sd;
> > @@ -5534,7 +5534,7 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
> >                 }
> >
> >                 if (time_after_eq(jiffies, sd->last_balance + interval)) {
> > -                       if (load_balance(cpu, rq, sd, idle, &balance)) {
> > +                       if (load_balance(cpu, rq, sd, idle, &should_balance)) {
> >                                 /*
> >                                  * The LBF_SOME_PINNED logic could have changed
> >                                  * env->dst_cpu, so we can't know our idle
> > @@ -5557,7 +5557,7 @@ out:
> >                  * CPU in our sched group which is doing load balancing more
> >                  * actively.
> >                  */
> > -               if (!balance)
> > +               if (!should_balance)
> >                         break;
> >         }
> >         rcu_read_unlock();
> > --
> > 1.7.9.5
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread
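Vincent's point is easiest to see in isolation: if should_balance starts out uninitialized and the interval check skips load_balance(), the later !should_balance test reads an indeterminate value. Below is a minimal user-space sketch of that control flow; due_for_balance() and the numeric level loop are invented stand-ins for the real interval check and sched_domain walk, not kernel code.

#include <stdio.h>

#define NR_LEVELS 3

/* Stand-in for time_after_eq(jiffies, sd->last_balance + interval). */
static int due_for_balance(int level)
{
	return level == 1;	/* pretend only level 1 is due this time */
}

/* Stand-in for load_balance(); the real one fills *should_balance. */
static int load_balance(int level, int *should_balance)
{
	(void)level;		/* unused in this sketch */
	/* Pretend another cpu in our group is the right one to balance. */
	*should_balance = 0;
	return 0;		/* and that no tasks were pulled */
}

static void rebalance_domains(void)
{
	int level;

	for (level = 0; level < NR_LEVELS; level++) {
		/*
		 * Must start at 1: when due_for_balance() fails below,
		 * load_balance() never runs, yet we still test the flag.
		 */
		int should_balance = 1;

		if (due_for_balance(level))
			load_balance(level, &should_balance);

		printf("level %d: should_balance=%d\n", level, should_balance);
		if (!should_balance)
			break;	/* a better cpu in our group will balance */
	}
}

int main(void)
{
	rebalance_domains();
	return 0;
}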

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  9:05     ` 김준수
@ 2013-08-02  9:26       ` Preeti U Murthy
  2013-08-02 10:32         ` Peter Zijlstra
  2013-08-02 10:20       ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Preeti U Murthy @ 2013-08-02  9:26 UTC (permalink / raw)
  To: 김준수
  Cc: 'Ingo Molnar', 'Peter Zijlstra',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim',
	'Joonsoo Kim'

On 08/02/2013 02:35 PM, 김준수 wrote:
> 
> 
>> -----Original Message-----
>> From: Preeti U Murthy [mailto:preeti@linux.vnet.ibm.com]
>> Sent: Friday, August 02, 2013 1:23 PM
>> To: Joonsoo Kim
>> Cc: Ingo Molnar; Peter Zijlstra; linux-kernel@vger.kernel.org; Mike
>> Galbraith; Paul Turner; Alex Shi; Vincent Guittot; Morten Rasmussen;
>> Namhyung Kim; Joonsoo Kim
>> Subject: Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
>>
>> Hi Joonsoo,
> 
> Hello, Preeti.
> 
>>
>> On 08/02/2013 07:20 AM, Joonsoo Kim wrote:
>>> Now checking whether this cpu is appropriate to balance or not is
>>> embedded into update_sg_lb_stats() and this checking has no direct
>>> relationship to this function. There is not enough reason to place
>>> this checking at update_sg_lb_stats(), except saving one iteration for
>>> sched_group_cpus.
>>>
>>> In this patch, I factor out this check into a should_we_balance() function.
>>> Before doing the actual work of load balancing, check whether this
>>> cpu is appropriate to balance via should_we_balance(). If this cpu is
>>> not a candidate for balancing, it quits the work immediately.
>>>
>>> With this change, we can save the cost of two memsets and can expect
>>> better compiler optimization.
>>>
>>> Below is the result of this patch.
>>>
>>> * Vanilla *
>>>    text	   data	    bss	    dec	    hex	filename
>>>   34499	   1136	    116	  35751	   8ba7	kernel/sched/fair.o
>>>
>>> * Patched *
>>>    text	   data	    bss	    dec	    hex	filename
>>>   34243	   1136	    116	  35495	   8aa7	kernel/sched/fair.o
>>>
>>> In addition, rename @balance to @should_balance in order to represent
>>> its purpose more clearly.
>>>
>>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index
>>> eaae77e..7f51b8c 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4426,22 +4426,17 @@ fix_small_capacity(struct sched_domain *sd,
>> struct sched_group *group)
>>>   * @group: sched_group whose statistics are to be updated.
>>>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
>>>   * @local_group: Does group contain this_cpu.
>>> - * @balance: Should we balance.
>>>   * @sgs: variable to hold the statistics for this group.
>>>   */
>>>  static inline void update_sg_lb_stats(struct lb_env *env,
>>>  			struct sched_group *group, int load_idx,
>>> -			int local_group, int *balance, struct sg_lb_stats
> *sgs)
>>> +			int local_group, struct sg_lb_stats *sgs)
>>>  {
>>>  	unsigned long nr_running, max_nr_running, min_nr_running;
>>>  	unsigned long load, max_cpu_load, min_cpu_load;
>>> -	unsigned int balance_cpu = -1, first_idle_cpu = 0;
>>>  	unsigned long avg_load_per_task = 0;
>>>  	int i;
>>>
>>> -	if (local_group)
>>> -		balance_cpu = group_balance_cpu(group);
>>> -
>>>  	/* Tally up the load of all CPUs in the group */
>>>  	max_cpu_load = 0;
>>>  	min_cpu_load = ~0UL;
>>> @@ -4454,15 +4449,9 @@ static inline void update_sg_lb_stats(struct
>> lb_env *env,
>>>  		nr_running = rq->nr_running;
>>>
>>>  		/* Bias balancing toward cpus of our domain */
>>> -		if (local_group) {
>>> -			if (idle_cpu(i) && !first_idle_cpu &&
>>> -					cpumask_test_cpu(i,
>> sched_group_mask(group))) {
>>> -				first_idle_cpu = 1;
>>> -				balance_cpu = i;
>>> -			}
>>> -
>>> +		if (local_group)
>>>  			load = target_load(i, load_idx);
>>> -		} else {
>>> +		else {
>>>  			load = source_load(i, load_idx);
>>>  			if (load > max_cpu_load)
>>>  				max_cpu_load = load;
>>> @@ -4482,22 +4471,9 @@ static inline void update_sg_lb_stats(struct
>> lb_env *env,
>>>  			sgs->idle_cpus++;
>>>  	}
>>>
>>> -	/*
>>> -	 * First idle cpu or the first cpu(busiest) in this sched group
>>> -	 * is eligible for doing load balancing at this and above
>>> -	 * domains. In the newly idle case, we will allow all the cpu's
>>> -	 * to do the newly idle load balance.
>>> -	 */
>>> -	if (local_group) {
>>> -		if (env->idle != CPU_NEWLY_IDLE) {
>>> -			if (balance_cpu != env->dst_cpu) {
>>> -				*balance = 0;
>>> -				return;
>>> -			}
>>> -			update_group_power(env->sd, env->dst_cpu);
>>> -		} else if (time_after_eq(jiffies, group->sgp->next_update))
>>> -			update_group_power(env->sd, env->dst_cpu);
>>> -	}
>>
>> Observe what is happening in the above code which checks if we should
>> balance on the env->dst_cpu.
>>
>> Only if the env->dst_cpu "belongs" to the group considered in
>> update_sg_lb_stats(), which means local_group = 1, should the above
>> checks be carried out.
>>
>> Meaning, if there is a better CPU in the same group as env->dst_cpu
>> (in the above case, balance_cpu) to carry out load balancing for the
>> system, cancel the current iteration of load balancing on env->dst_cpu
>> and wait for the right cpu in this group to do the load balancing.
>>
>> Keeping this in mind, see the below comments around should_we_balance().
> 
> Okay.
> 
>>
>>> @@ -5001,13 +4964,47 @@ static int need_active_balance(struct lb_env
>>> *env)
>>>
>>>  static int active_load_balance_cpu_stop(void *data);
>>>
>>> +static int should_we_balance(struct lb_env *env) {
>>> +	struct sched_group *sg = env->sd->groups;
>>> +	struct cpumask *sg_cpus, *sg_mask;
>>> +	int cpu, balance_cpu = -1;
>>> +
>>> +	/*
>>> +	 * In the newly idle case, we will allow all the cpu's
>>> +	 * to do the newly idle load balance.
>>> +	 */
>>> +	if (env->idle == CPU_NEWLY_IDLE)
>>> +		return 1;
>>> +
>>> +	sg_cpus = sched_group_cpus(sg);
>>> +	sg_mask = sched_group_mask(sg);
>>> +	/* Try to find first idle cpu */
>>> +	for_each_cpu_and(cpu, sg_cpus, env->cpus) {
>>> +		if (!cpumask_test_cpu(cpu, sg_mask) || idle_cpu(cpu))
>>> +			continue;
>>> +
>>> +		balance_cpu = cpu;
>>> +		break;
>>> +	}
>>
>> You need to iterate over all the groups of the sched domain env->sd and
>> not just the first group of env->sd like you are doing above. This is to
> 
> I don't think so.
> IIRC, env->sd->groups always means local group,
> so we don't need to find our group by iterating over all the groups.

Take a look at update_sd_lb_stats(). That should clarify this. There is
an exclusive local_group check there.

sd->groups points to the first group in the list of groups under this sd.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 17+ messages in thread
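The exclusive local_group check referred to above is the one visible in the quoted update_sd_lb_stats() hunk: the walk covers every group of the domain, and only the group whose cpumask contains env->dst_cpu is treated as local. A toy user-space version of just that test, with made-up groups and a made-up dst_cpu standing in for sched_group cpumasks:

#include <stdio.h>

#define NR_CPUS   4
#define NR_GROUPS 2

struct group {
	int cpus[NR_CPUS];	/* crude stand-in for sched_group_cpus() */
};

int main(void)
{
	struct group groups[NR_GROUPS] = {
		{ .cpus = { 1, 1, 0, 0 } },	/* group 0: cpu0, cpu1 */
		{ .cpus = { 0, 0, 1, 1 } },	/* group 1: cpu2, cpu3 */
	};
	int dst_cpu = 1;
	int i;

	/* Walk all groups of the domain; exactly one is local to dst_cpu. */
	for (i = 0; i < NR_GROUPS; i++) {
		int local_group = groups[i].cpus[dst_cpu];

		printf("group %d: local_group=%d\n", i, local_group);
	}
	return 0;
}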

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  9:05     ` 김준수
  2013-08-02  9:26       ` Preeti U Murthy
@ 2013-08-02 10:20       ` Peter Zijlstra
  2013-08-05  7:22         ` Joonsoo Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-08-02 10:20 UTC (permalink / raw)
  To: 김준수
  Cc: 'Preeti U Murthy', 'Ingo Molnar',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim',
	'Joonsoo Kim'

On Fri, Aug 02, 2013 at 06:05:51PM +0900, 김준수 wrote:

What is with you people; have you never learned to trim emails?

Seriously, I'm going to write a script which tests for too many quoted
lines, too many nested quotes and other such nonsense and drops your
emails into /dev/null.

*shees*

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02  9:26       ` Preeti U Murthy
@ 2013-08-02 10:32         ` Peter Zijlstra
  2013-08-05  4:22           ` Preeti U Murthy
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-08-02 10:32 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: 김준수, 'Ingo Molnar',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim',
	'Joonsoo Kim'

On Fri, Aug 02, 2013 at 02:56:14PM +0530, Preeti U Murthy wrote:
> >> You need to iterate over all the groups of the sched domain env->sd and
> >> not just the first group of env->sd like you are doing above. This is to
> > 
> > I don't think so.
> > IIRC, env->sd->groups always means local group,
> > so we don't need to find our group by iterating over all the groups.
> 
> Take a look at update_sd_lb_stats(). That should clarify this. There is
> an exclusive local_group check there.
> 
> sd->groups points to the first group in the list of groups under this sd.

Take a look at: 88b8dac0a

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02 10:32         ` Peter Zijlstra
@ 2013-08-05  4:22           ` Preeti U Murthy
  2013-08-05  7:21             ` Joonsoo Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Preeti U Murthy @ 2013-08-05  4:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: 김준수, 'Ingo Molnar',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim',
	'Joonsoo Kim'

On 08/02/2013 04:02 PM, Peter Zijlstra wrote:
> On Fri, Aug 02, 2013 at 02:56:14PM +0530, Preeti U Murthy wrote:
>>>> You need to iterate over all the groups of the sched domain env->sd and
>>>> not just the first group of env->sd like you are doing above. This is to
>>>
>>> I don't think so.
>>> IIRC, env->sd->groups always means local group,
>>> so we don't need to find our group by iterating over all the groups.
>>
>> Take a look at update_sd_lb_stats(). That should clarify this. There is
>> an exclusive local_group check there.
>>
>> sd->groups points to the first group in the list of groups under this sd.
> 
> Take a look at: 88b8dac0a
> 
Ah ok! Thanks for this pointer.

Apologies for having overlooked the fact that the sd->groups always
points to the group to which the balance_cpu belongs. And subsequent
dst_cpus for retries of load balancing also belong to the same group as
the balance_cpu.

This patch thus looks fine to me.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-05  4:22           ` Preeti U Murthy
@ 2013-08-05  7:21             ` Joonsoo Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-05  7:21 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Peter Zijlstra, 'Ingo Molnar',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim'

On Mon, Aug 05, 2013 at 09:52:28AM +0530, Preeti U Murthy wrote:
> On 08/02/2013 04:02 PM, Peter Zijlstra wrote:
> > On Fri, Aug 02, 2013 at 02:56:14PM +0530, Preeti U Murthy wrote:
> >>>> You need to iterate over all the groups of the sched domain env->sd and
> >>>> not just the first group of env->sd like you are doing above. This is to
> >>>
> >>> I don't think so.
> >>> IIRC, env->sd->groups always means local group,
> >>> so we don't need to find our group by iterating over all the groups.
> >>
> >> Take a look at update_sd_lb_stats(). That should clarify this. There is
> >> an exclusive local_group check there.
> >>
> >> sd->groups points to the first group in the list of groups under this sd.
> > 
> > Take a look at: 88b8dac0a
> > 
> Ah ok! Thanks for this pointer.
> 
> Apologies for having overlooked the fact that the sd->groups always
> points to the group to which the balance_cpu belongs. And subsequent
> dst_cpus for retries of load balancing also belong to the same group as
> the balance_cpu.
> 
> This patch thus looks fine to me.

Okay!

> 
> Regards
> Preeti U Murthy
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/3] sched: factor out code to should_we_balance()
  2013-08-02 10:20       ` Peter Zijlstra
@ 2013-08-05  7:22         ` Joonsoo Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-05  7:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: 'Preeti U Murthy', 'Ingo Molnar',
	linux-kernel, 'Mike Galbraith', 'Paul Turner',
	'Alex Shi', 'Vincent Guittot',
	'Morten Rasmussen', 'Namhyung Kim'

On Fri, Aug 02, 2013 at 12:20:40PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 02, 2013 at 06:05:51PM +0900, 김준수 wrote:
> 
> What is with you people; have you never learned to trim emails?
> 
> Seriously, I'm going to write a script which tests for too many quoted
> lines, too many nested quotes and other such nonsense and drops your
> emails into /dev/null.

Hello, Peter.

Sorry about that.
I will be careful about this later.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/3] sched: clean-up struct sd_lb_stat
  2013-08-02  4:40   ` Preeti U Murthy
@ 2013-08-05  7:32     ` Joonsoo Kim
  2013-08-06  9:22       ` Preeti U Murthy
  0 siblings, 1 reply; 17+ messages in thread
From: Joonsoo Kim @ 2013-08-05  7:32 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

> > +	if (busiest->group_imb) {
> > +		busiest->sum_weighted_load =
> > +			min(busiest->sum_weighted_load, sds->sd_avg_load);
> 
> Right here we get confused as to why the total load is being compared
> against load per task (although you are changing it to load per task above).

Yes, you are right. I will add load_per_task to struct sg_lb_stats.


> > @@ -4771,12 +4763,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> >  	 * Be careful of negative numbers as they'll appear as very large values
> >  	 * with unsigned longs.
> >  	 */
> > -	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
> > +	max_pull = min(busiest->avg_load - sds->sd_avg_load,
> > +						load_above_capacity);
> 
> This is ok, but readability-wise max_load is much better. max_load
> signifies the maximum load per task on a group in this sd, and avg_load
> signifies the total load per task across the sd. You are checking if
> there is an imbalance in the total load in the sd and trying to even it
> out across the sd. Here "busiest" does not convey this immediately.

You may be confused. max_load doesn't mean the maximum load per task.
It means the busiest group's load per cpu power. And here max doesn't
mean the maximum load in this sd; instead, it is the load of the busiest
sg. IMO, busiest->avg_load conveys the proper meaning.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread
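The "load per cpu power" wording above follows the avg_load computation visible earlier in the quoted diff: group_load scaled by SCHED_POWER_SCALE and divided by the group's power. A small sketch of that normalization, with made-up sample numbers and an invented group_avg_load() helper rather than the real kernel code:

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

/* avg_load = group_load * SCHED_POWER_SCALE / group_power */
static unsigned long group_avg_load(unsigned long group_load,
				    unsigned long group_power)
{
	return group_load * SCHED_POWER_SCALE / group_power;
}

int main(void)
{
	/* Same raw load, but the second group has half the cpu power,
	 * so it looks twice as loaded per unit of power. */
	printf("full-power group: %lu\n",
	       group_avg_load(2048, 2 * SCHED_POWER_SCALE));
	printf("half-power group: %lu\n",
	       group_avg_load(2048, SCHED_POWER_SCALE));
	return 0;
}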

* Re: [PATCH v2 3/3] sched: clean-up struct sd_lb_stat
  2013-08-05  7:32     ` Joonsoo Kim
@ 2013-08-06  9:22       ` Preeti U Murthy
  0 siblings, 0 replies; 17+ messages in thread
From: Preeti U Murthy @ 2013-08-06  9:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Mike Galbraith,
	Paul Turner, Alex Shi, Vincent Guittot, Morten Rasmussen,
	Namhyung Kim

Hi Joonsoo,

On 08/05/2013 01:02 PM, Joonsoo Kim wrote:
>>> +	if (busiest->group_imb) {
>>> +		busiest->sum_weighted_load =
>>> +			min(busiest->sum_weighted_load, sds->sd_avg_load);
>>
>> Right here we get confused as to why the total load is being compared
>> against load per task (although you are changing it to load per task above).
> 
> Yes, you are right. I will add load_per_task to struct sg_lb_stats.
> 
> 
>>> @@ -4771,12 +4763,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>>>  	 * Be careful of negative numbers as they'll appear as very large values
>>>  	 * with unsigned longs.
>>>  	 */
>>> -	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
>>> +	max_pull = min(busiest->avg_load - sds->sd_avg_load,
>>> +						load_above_capacity);
>>
>> This is ok, but readability-wise max_load is much better. max_load
>> signifies the maximum load per task on a group in this sd, and avg_load
>> signifies the total load per task across the sd. You are checking if
>> there is an imbalance in the total load in the sd and trying to even it
>> out across the sd. Here "busiest" does not convey this immediately.
> 
> You may be confused. max_load doesn't mean the maximum load per task.
> It means the busiest group's load per cpu power. And here max doesn't
> mean the maximum load in this sd; instead, it is the load of the busiest
> sg. IMO, busiest->avg_load conveys the proper meaning.

Yes, sorry about that. It is load per cpu power. My point was that
max_load helped readability. Look at these two lines in your patch.

-	max_pull = min(sds->max_load - sds->avg_load, load_above_capacity);
+	max_pull = min(busiest->avg_load - sds->sd_avg_load,
+						load_above_capacity);

The deleted line says that there is an imbalance in the load/cpu in some
part of the sd (max_load) that is above the average load of the sd, which
we are trying to even out. Something like the below diagram for the sd load:

  max_load
_/\________ avg_load.

I believe the line you have added makes it hard to understand this.

Anyway this is a minor issue, you can ignore it.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 17+ messages in thread
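The max_pull line under discussion caps the amount to pull at the busiest group's excess over the domain-wide average (the bump above avg_load in the diagram), further limited by load_above_capacity. Below is a sketch with invented sample values and an invented min_ul() helper; only the min() relationship comes from the quoted code:

#include <stdio.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned long busiest_avg_load    = 1800;	/* busiest group, load/power */
	unsigned long sd_avg_load         = 1200;	/* domain-wide average */
	unsigned long load_above_capacity = 400;

	/* Pull no more than the excess over the average, capped by capacity. */
	unsigned long max_pull = min_ul(busiest_avg_load - sd_avg_load,
					load_above_capacity);

	printf("max_pull = %lu\n", max_pull);	/* 400 with these numbers */
	return 0;
}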

end of thread, other threads:[~2013-08-06  9:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02  1:50 [PATCH v2 0/3] optimization, clean-up about fair.c Joonsoo Kim
2013-08-02  1:50 ` [PATCH v2 1/3] sched: remove one division operation in find_buiest_queue() Joonsoo Kim
2013-08-02  1:50 ` [PATCH v2 2/3] sched: factor out code to should_we_balance() Joonsoo Kim
2013-08-02  4:22   ` Preeti U Murthy
2013-08-02  9:05     ` 김준수
2013-08-02  9:26       ` Preeti U Murthy
2013-08-02 10:32         ` Peter Zijlstra
2013-08-05  4:22           ` Preeti U Murthy
2013-08-05  7:21             ` Joonsoo Kim
2013-08-02 10:20       ` Peter Zijlstra
2013-08-05  7:22         ` Joonsoo Kim
2013-08-02  7:51   ` Vincent Guittot
2013-08-02  9:08     ` Joonsoo Kim
2013-08-02  1:50 ` [PATCH v2 3/3] sched: clean-up struct sd_lb_stat Joonsoo Kim
2013-08-02  4:40   ` Preeti U Murthy
2013-08-05  7:32     ` Joonsoo Kim
2013-08-06  9:22       ` Preeti U Murthy
