* [PATCH v6 0/7] Tunable sched_mc_power_savings=n
@ 2008-12-17 17:26 Vaidyanathan Srinivasan
  2008-12-17 17:26 ` [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N Vaidyanathan Srinivasan
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:26 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

Hi,

The existing power-saving load balancer (CONFIG_SCHED_MC) attempts to
run the system's workload on the minimum number of CPU packages and
tries to keep the remaining packages idle for longer durations.
Consolidating the workload onto fewer packages thus helps the other
packages stay in idle states and save power.  The current
implementation is very conservative and does not work effectively
across different workloads.  The tunable sched_mc_power_savings=n was
proposed to enable tuning of the power-saving load balancer based on
the system configuration, workload characteristics and end-user
requirements.

The power savings and performance of a given workload on an
under-utilised system can be controlled by writing 0, 1 or 2 to
/sys/devices/system/cpu/sched_mc_power_savings: 0 selects highest
performance with least power savings, while level 2 selects maximum
power savings even at the cost of slight performance degradation.
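
For example, on a kernel built with CONFIG_SCHED_MC, the tunable is an
ordinary sysfs file and can be set from a shell:

	# echo 2 > /sys/devices/system/cpu/sched_mc_power_savings
	# cat /sys/devices/system/cpu/sched_mc_power_savings
	2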

Please refer to the following discussions and article for details.

[1]Making power policy just work
http://lwn.net/Articles/287924/ 

[2][RFC v1] Tunable sched_mc_power_savings=n
http://lwn.net/Articles/287882/

[3][RFC PATCH v2 0/7] Tunable sched_mc_power_savings=n
http://lwn.net/Articles/297306/

[4][RFC PATCH v3 0/5] Tunable sched_mc_power_savings=n
http://lkml.org/lkml/2008/11/10/260

[5][RFC PATCH v4 0/7] Tunable sched_mc_power_savings=n
http://lkml.org/lkml/2008/11/21/47

[6][RFC PATCH v5 0/7] Tunable sched_mc_power_savings=n
http://lkml.org/lkml/2008/12/11/178

The following patch series demonstrates the basic framework for
tunable sched_mc_power_savings.

This version of the patch incorporates comments and feedback received
on the previous post.  Thanks to Balbir, Peter Zijlstra and others for
the review and comments.

Changes from v5:
---------------
* Fixed the sscanf bug and the check for (p->flags & PF_KTHREAD)
* Dropped the RFC prefix to indicate that the patch is ready for
  testing and inclusion  
* Patch series against 2.6.28-rc8 kernel

Changes from v4:
----------------

* Conditionally added the SD_BALANCE_NEWIDLE flag to the MC- and
  CPU-level domains to help task consolidation when sched_mc > 0.
  Removing this flag significantly reduces the number of load_balance
  calls and considerably slows down task consolidation, and in effect
  power savings.

  Ingo and Mike Galbraith removed this flag to improve performance for
  select benchmarks.  I add the flag only when power savings is
  requested and restore full-performance mode if sched_mc=0

* Fixed few whitespace issues
* Patch series against 2.6.28-rc8 kernel

Changes from v3:
----------------

* Fixed the locking code with double_lock_balance() in
  active-balance-newidle.patch

* Moved sched_mc_preferred_wakeup_cpu to the root_domain structure so
  that each partitioned sched domain gets an independent nominated cpu

* More comments in active-balance-newidle.patch

* Reverted the sched MC-level and CPU-level fine tuning done in
  v2.6.28-rc4 for now.  These changes affect consolidation since
  SD_BALANCE_NEWIDLE is removed.  I will rework the tuning in the next
  iteration to selectively enable it at sched_mc=2

* Patch series on 2.6.28-rc6 kernel  

Changes from v2:
----------------

* Fixed a locking order issue in active-balance new-idle
* Moved the wakeup biasing code to the wake_idle() function and
  preserved the wake_affine function.  The previous version would
  break wake-affine in order to aggressively consolidate tasks
* Removed the sched_mc_preferred_wakeup_cpu global variable, moved it
  to doms_cur/dattr_cur, and added a per_cpu pointer to the
  appropriate storage in the partitioned sched domain.  This change is
  needed to preserve functionality in the case of partitioned sched
  domains
* Patch on 2.6.28-rc3 kernel

Results:
--------

Basic functionality of the code has not changed and the power vs
performance benefits for kernbench are similar to the ones posted
earlier.

KERNBENCH Runs: make -j4 on an x86 8-core, dual-socket (quad-core per
package) system

SchedMC  Run Time  Package Idle        Energy   Power
0        81.69     52.67%   53.94%     1.00x J  1.00y W
1        81.16     35.33%   71.01%     0.96x J  0.97y W
2        75.28     21.65%   84.79%     0.91x J  0.98y W

(The two Package Idle columns are the idle percentages of the two cpu
packages; Energy and Power are relative to baseline values of x joules
and y watts at sched_mc=0.)

Please feel free to test, and let me know your comments and feedback.
I will post more experimental results with various benchmarks.

Thanks,
Vaidy

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

---

Gautham R Shenoy (1):
      sched: Framework for sched_mc/smt_power_savings=N

Vaidyanathan Srinivasan (6):
      sched: idle_balance() does not call load_balance_newidle()
      sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
      sched: activate active load balancing in new idle cpus
      sched: bias task wakeups to preferred semi-idle packages
      sched: nominate preferred wakeup cpu
      sched: favour lower logical cpu number for sched_mc balance


 include/linux/sched.h    |   21 +++++++++++
 include/linux/topology.h |    6 ++-
 kernel/sched.c           |   89 +++++++++++++++++++++++++++++++++++++++++++---
 kernel/sched_fair.c      |   18 +++++++++
 4 files changed, 126 insertions(+), 8 deletions(-)

-- 


* [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
@ 2008-12-17 17:26 ` Vaidyanathan Srinivasan
  2008-12-17 18:10   ` Balbir Singh
  2008-12-17 17:26 ` [PATCH v6 2/7] sched: favour lower logical cpu number for sched_mc balance Vaidyanathan Srinivasan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:26 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

From: Gautham R Shenoy <ego@in.ibm.com>

Currently the sched_mc/smt_power_savings variable is a boolean, which
either enables or disables topology-based power savings.
This patch extends the variable from boolean to multivalued, such that
the value decides how aggressively we want to perform power-savings
balance at the appropriate sched domain based on topology.

Variable levels of the power-savings tunable let the end user match
the required power savings vs performance trade-off to the system
configuration and workloads.

This version makes the sched_mc_power_savings global variable take
more values (0,1,2).  Later versions can have a single tunable called
sched_power_savings instead of sched_{mc,smt}_power_savings.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 include/linux/sched.h |   11 +++++++++++
 kernel/sched.c        |   17 ++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 55e30d1..888f2b2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -764,6 +764,17 @@ enum cpu_idle_type {
 #define SD_SERIALIZE		1024	/* Only a single load balancing instance */
 #define SD_WAKE_IDLE_FAR	2048	/* Gain latency sacrificing cache hit */
 
+enum powersavings_balance_level {
+	POWERSAVINGS_BALANCE_NONE = 0,  /* No power saving load balance */
+	POWERSAVINGS_BALANCE_BASIC,	/* Fill one thread/core/package
+					 * first for long running threads
+					 */
+	POWERSAVINGS_BALANCE_WAKEUP,	/* Also bias task wakeups to semi-idle
+					 * cpu package for power savings
+					 */
+	MAX_POWERSAVINGS_BALANCE_LEVELS
+};
+
 #define BALANCE_FOR_MC_POWER	\
 	(sched_smt_power_savings ? SD_POWERSAVINGS_BALANCE : 0)
 
diff --git a/kernel/sched.c b/kernel/sched.c
index e4bb1dd..16897ab 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -7879,14 +7879,25 @@ int arch_reinit_sched_domains(void)
 static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
 {
 	int ret;
+	unsigned int level = 0;
 
-	if (buf[0] != '0' && buf[0] != '1')
+	if (sscanf(buf, "%u", &level) != 1)
+		return -EINVAL;
+
+	/*
+	 * level is always positive, so there is no need to check for
+	 * level < POWERSAVINGS_BALANCE_NONE, which is 0.
+	 * TODO: what happens on a 0- or 1-byte write; should count be
+	 * checked as well?
+	 */
+
+	if (level >= MAX_POWERSAVINGS_BALANCE_LEVELS)
 		return -EINVAL;
 
 	if (smt)
-		sched_smt_power_savings = (buf[0] == '1');
+		sched_smt_power_savings = level;
 	else
-		sched_mc_power_savings = (buf[0] == '1');
+		sched_mc_power_savings = level;
 
 	ret = arch_reinit_sched_domains();
 



* [PATCH v6 2/7] sched: favour lower logical cpu number for sched_mc balance
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
  2008-12-17 17:26 ` [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N Vaidyanathan Srinivasan
@ 2008-12-17 17:26 ` Vaidyanathan Srinivasan
  2008-12-17 18:09   ` Balbir Singh
  2008-12-17 17:26 ` [PATCH v6 3/7] sched: nominate preferred wakeup cpu Vaidyanathan Srinivasan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:26 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

In case two groups have identical load, prefer to move load to the
lower logical cpu number rather than, as the present logic does, to
the higher logical number.

find_busiest_group() tries to look for a group_leader that has spare
capacity to take more tasks and free up an appropriate least loaded
group.  When there is a tie and the load is equal, the group with the
higher logical number is currently favoured.  This conflicts with the
user space irqbalance daemon, which moves interrupts to the lower
logical numbers when system utilisation is very low.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 kernel/sched.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 16897ab..0b9bbbd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3264,7 +3264,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 		 */
 		if ((sum_nr_running < min_nr_running) ||
 		    (sum_nr_running == min_nr_running &&
-		     first_cpu(group->cpumask) <
+		     first_cpu(group->cpumask) >
 		     first_cpu(group_min->cpumask))) {
 			group_min = group;
 			min_nr_running = sum_nr_running;
@@ -3280,7 +3280,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 		if (sum_nr_running <= group_capacity - 1) {
 			if (sum_nr_running > leader_nr_running ||
 			    (sum_nr_running == leader_nr_running &&
-			     first_cpu(group->cpumask) >
+			     first_cpu(group->cpumask) <
 			      first_cpu(group_leader->cpumask))) {
 				group_leader = group;
 				leader_nr_running = sum_nr_running;
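
To illustrate with a hypothetical topology: on a dual-package system
whose sched groups span cpus {0-3} and {4-7}, a tie in load previously
left the group containing cpu 4 as group_leader; with this change the
first_cpu() comparison selects the group containing cpu 0, matching
irqbalance's preference for lower logical cpu numbers.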



* [PATCH v6 3/7] sched: nominate preferred wakeup cpu
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
  2008-12-17 17:26 ` [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N Vaidyanathan Srinivasan
  2008-12-17 17:26 ` [PATCH v6 2/7] sched: favour lower logical cpu number for sched_mc balance Vaidyanathan Srinivasan
@ 2008-12-17 17:26 ` Vaidyanathan Srinivasan
  2008-12-17 17:27 ` [PATCH v6 4/7] sched: bias task wakeups to preferred semi-idle packages Vaidyanathan Srinivasan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:26 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

When system utilisation is low and more cpus are idle, a
process waking up from sleep should prefer to wake up on an
idle cpu in a semi-idle cpu package (multi-core package)
rather than in a completely idle cpu package, which would
waste power.

Use the sched_mc balance logic in find_busiest_group() to
nominate a preferred wakeup cpu.

This info could be stored in the appropriate sched_domain,
but updating it in all copies of sched_domain is not
practical.  Hence the information is stored in the
root_domain struct, of which there is one copy per
partitioned sched domain, and which can be accessed from
each cpu's runqueue.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 kernel/sched.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0b9bbbd..3415fa3 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -493,6 +493,14 @@ struct root_domain {
 #ifdef CONFIG_SMP
 	struct cpupri cpupri;
 #endif
+#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
+	/*
+	 * Preferred wake up cpu nominated by sched_mc balance that will be
+	 * used when most cpus are idle in the system indicating overall very
+	 * low system utilisation. Triggered at POWERSAVINGS_BALANCE_WAKEUP(2)
+	 */
+	unsigned int sched_mc_preferred_wakeup_cpu;
+#endif
 };
 
 /*
@@ -3407,6 +3415,10 @@ out_balanced:
 
 	if (this == group_leader && group_leader != group_min) {
 		*imbalance = min_load_per_task;
+		if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP) {
+			cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu =
+					first_cpu(group_leader->cpumask);
+		}
 		return group_min;
 	}
 #endif
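
The nomination is consumed on the wakeup path in patch 4/7 of this
series; in sketch form, the reader side looks like this (mirroring the
code that patch adds to wake_idle()):

	chosen_wakeup_cpu =
		cpu_rq(smp_processor_id())->rd->sched_mc_preferred_wakeup_cpu;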



* [PATCH v6 4/7] sched: bias task wakeups to preferred semi-idle packages
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
                   ` (2 preceding siblings ...)
  2008-12-17 17:26 ` [PATCH v6 3/7] sched: nominate preferred wakeup cpu Vaidyanathan Srinivasan
@ 2008-12-17 17:27 ` Vaidyanathan Srinivasan
  2008-12-17 17:27 ` [PATCH v6 5/7] sched: activate active load balancing in new idle cpus Vaidyanathan Srinivasan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:27 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

The preferred wakeup cpu (from a semi-idle package) has been
nominated in find_busiest_group() in the previous patch.  Use
this information, stored in sched_mc_preferred_wakeup_cpu, in
wake_idle() to bias task wakeups if the following conditions
are satisfied:
        - The present cpu that is trying to wake up the process
          is idle, and waking the target process on this cpu
          would potentially wake up a completely idle package
        - The previous cpu on which the target process ran is
          also idle, and hence selecting that cpu may wake up a
          semi-idle cpu package
        - The task being woken up is allowed to run on the
          nominated cpu (cpu affinity and restrictions)

Basically, if both the current cpu and the previous cpu on
which the task ran are idle, select the nominated cpu from a
semi-idle cpu package to run the newly woken task.

Cache hotness is considered, since the actual biasing happens
in wake_idle() only if the application is cache cold.

This technique will effectively consolidate short-running
bursty jobs on a mostly idle system.

Wakeup biasing for power savings gets automatically disabled
as system utilisation increases, because the probability of
finding both this_cpu and prev_cpu idle decreases.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 kernel/sched_fair.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 98345e4..c82386c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1027,6 +1027,24 @@ static int wake_idle(int cpu, struct task_struct *p)
 	cpumask_t tmp;
 	struct sched_domain *sd;
 	int i;
+	unsigned int chosen_wakeup_cpu;
+	int this_cpu;
+
+	/*
+	 * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu
+	 * are idle, this is not a kernel thread, and the task's affinity
+	 * allows it to be moved to the preferred cpu, then just move!
+	 */
+
+	this_cpu = smp_processor_id();
+	chosen_wakeup_cpu =
+		cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu;
+
+	if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP &&
+		idle_cpu(cpu) && idle_cpu(this_cpu) &&
+		p->mm && !(p->flags & PF_KTHREAD) &&
+		cpu_isset(chosen_wakeup_cpu, p->cpus_allowed))
+		return chosen_wakeup_cpu;
 
 	/*
 	 * If it is idle, then it is the best cpu to run this task.



* [PATCH v6 5/7] sched: activate active load balancing in new idle cpus
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
                   ` (3 preceding siblings ...)
  2008-12-17 17:27 ` [PATCH v6 4/7] sched: bias task wakeups to preferred semi-idle packages Vaidyanathan Srinivasan
@ 2008-12-17 17:27 ` Vaidyanathan Srinivasan
  2008-12-17 17:27 ` [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 Vaidyanathan Srinivasan
  2008-12-17 17:27 ` [PATCH v6 7/7] sched: idle_balance() does not call load_balance_newidle() Vaidyanathan Srinivasan
  6 siblings, 0 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:27 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

Active load balancing is a process by which the migration
thread is woken up on the target CPU in order to pull a
currently running task on another package into this newly
idle package.

This method is already in use with normal load_balance();
this patch extends it to newly idle cpus when sched_mc is set
to POWERSAVINGS_BALANCE_WAKEUP.

This logic provides effective consolidation of short-running
daemon jobs on an almost idle system.

A side effect of this patch may be ping-ponging of tasks if
the system is moderately utilised.  The number of failed
iterations before triggering may need to be adjusted.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 kernel/sched.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3415fa3..f82a7a0 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3692,10 +3692,64 @@ redo:
 	}
 
 	if (!ld_moved) {
+		int active_balance = 0;
+
 		schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
 		if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
 		    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
 			return -1;
+
+		if (sched_mc_power_savings < POWERSAVINGS_BALANCE_WAKEUP)
+			return -1;
+
+		if (sd->nr_balance_failed++ < 2)
+			return -1;
+
+		/*
+		 * The only task running on a non-idle cpu can be moved to this
+		 * cpu in an attempt to completely free up the other CPU
+		 * package.  The same method used to move a task in load_balance()
+		 * has been extended for load_balance_newidle() to speed up
+		 * consolidation at sched_mc=POWERSAVINGS_BALANCE_WAKEUP (2)
+		 *
+		 * The package power saving logic comes from
+		 * find_busiest_group().  If there is no imbalance, then
+		 * f_b_g() will return NULL.  However when sched_mc={1,2} then
+		 * f_b_g() will select a group from which a running task may be
+		 * pulled to this cpu in order to make the other package idle.
+		 * If there is no opportunity to make a package idle and if
+		 * there is no imbalance, then f_b_g() will return NULL and no
+		 * action will be taken in load_balance_newidle().
+		 *
+		 * Under normal task pull operation due to imbalance, there
+		 * will be more than one task in the source run queue and
+		 * move_tasks() will succeed.  ld_moved will be true and this
+		 * active balance code will not be triggered.
+		 */
+
+		/* Lock busiest in correct order while this_rq is held */
+		double_lock_balance(this_rq, busiest);
+
+		/*
+		 * don't kick the migration_thread, if the curr
+		 * task on busiest cpu can't be moved to this_cpu
+		 */
+		if (!cpu_isset(this_cpu, busiest->curr->cpus_allowed)) {
+			double_unlock_balance(this_rq, busiest);
+			all_pinned = 1;
+			return ld_moved;
+		}
+
+		if (!busiest->active_balance) {
+			busiest->active_balance = 1;
+			busiest->push_cpu = this_cpu;
+			active_balance = 1;
+		}
+
+		double_unlock_balance(this_rq, busiest);
+		if (active_balance)
+			wake_up_process(busiest->migration_thread);
+
 	} else
 		sd->nr_balance_failed = 0;
 



* [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
                   ` (4 preceding siblings ...)
  2008-12-17 17:27 ` [PATCH v6 5/7] sched: activate active load balancing in new idle cpus Vaidyanathan Srinivasan
@ 2008-12-17 17:27 ` Vaidyanathan Srinivasan
  2008-12-18  1:42   ` Andrew Morton
  2008-12-17 17:27 ` [PATCH v6 7/7] sched: idle_balance() does not call load_balance_newidle() Vaidyanathan Srinivasan
  6 siblings, 1 reply; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:27 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

Add the SD_BALANCE_NEWIDLE flag at the MC and CPU levels
if sched_mc is set.  This helps power savings and does not
affect performance when sched_mc=0.

Ingo and Mike Galbraith optimised the SD flags by removing
SD_BALANCE_NEWIDLE at the MC and CPU levels.  This helps
performance but hurts power savings, since it slows down
task consolidation by reducing the number of times
load_balance is run.

    sched: fine-tune SD_MC_INIT
        commit 14800984706bf6936bbec5187f736e928be5c218
        Author: Mike Galbraith <efault@gmx.de>
        Date:   Fri Nov 7 15:26:50 2008 +0100

    sched: re-tune balancing -- revert
        commit 9fcd18c9e63e325dbd2b4c726623f760788d5aa8
        Author: Ingo Molnar <mingo@elte.hu>
        Date:   Wed Nov 5 16:52:08 2008 +0100

This patch selectively enables the SD_BALANCE_NEWIDLE flag
only when sched_mc is set to 1 or 2.  This helps power savings
through task consolidation and does not hurt performance at
sched_mc=0, where all power saving optimisations are turned off.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 include/linux/sched.h    |   10 ++++++++++
 include/linux/topology.h |    6 ++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 888f2b2..e894892 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -782,6 +782,16 @@ enum powersavings_balance_level {
 	((sched_mc_power_savings || sched_smt_power_savings) ?	\
 	 SD_POWERSAVINGS_BALANCE : 0)
 
+/*
+ * Optimise SD flags for power savings:
+ * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
+ * Keep default SD flags if sched_{smt,mc}_power_savings=0
+ */
+
+#define POWERSAVING_SD_FLAGS	\
+	((sched_mc_power_savings || sched_smt_power_savings) ?	\
+	  SD_BALANCE_NEWIDLE : 0)
+
 #define test_sd_parent(sd, flag)	((sd->parent &&		\
 					 (sd->parent->flags & flag)) ? 1 : 0)
 
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 117f1b7..26c608c 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -125,7 +125,8 @@ void arch_update_cpu_topology(void);
 				| SD_WAKE_AFFINE	\
 				| SD_WAKE_BALANCE	\
 				| SD_SHARE_PKG_RESOURCES\
-				| BALANCE_FOR_MC_POWER,	\
+				| BALANCE_FOR_MC_POWER	\
+				| POWERSAVING_SD_FLAGS, \
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 }
@@ -150,7 +151,8 @@ void arch_update_cpu_topology(void);
 				| SD_BALANCE_FORK	\
 				| SD_WAKE_AFFINE	\
 				| SD_WAKE_BALANCE	\
-				| BALANCE_FOR_PKG_POWER,\
+				| BALANCE_FOR_PKG_POWER \
+				| POWERSAVING_SD_FLAGS, \
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 }
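
Since these flags are folded into the SD_*_INIT domain initialisers,
toggling the tunable takes effect when the domains are rebuilt via
arch_reinit_sched_domains().  An illustrative check (assuming a kernel
built with CONFIG_SCHED_DEBUG, which exports per-domain flags):

	# echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
	# cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags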



* [PATCH v6 7/7] sched: idle_balance() does not call load_balance_newidle()
  2008-12-17 17:26 [PATCH v6 0/7] Tunable sched_mc_power_savings=n Vaidyanathan Srinivasan
                   ` (5 preceding siblings ...)
  2008-12-17 17:27 ` [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 Vaidyanathan Srinivasan
@ 2008-12-17 17:27 ` Vaidyanathan Srinivasan
  6 siblings, 0 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-17 17:27 UTC (permalink / raw)
  To: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi, Peter Zijlstra
  Cc: Ingo Molnar, Dipankar Sarma, Balbir Singh, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek,
	Vaidyanathan Srinivasan

load_balance_newidle() does not get called if SD_BALANCE_NEWIDLE
is set at a higher-level domain (3-CPU) but not at the lower-level
domain (2-MC).

pulled_task is initialised to -1 and checked for non-zero, which
is always true if the lowest-level sched_domain does not have the
SD_BALANCE_NEWIDLE flag set.

Trivial fix: initialise pulled_task to zero.

This patch has been queued for 2.6.29
http://lkml.org/lkml/2008/12/8/213

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---

 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index f82a7a0..e6a88bf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3773,7 +3773,7 @@ out_balanced:
 static void idle_balance(int this_cpu, struct rq *this_rq)
 {
 	struct sched_domain *sd;
-	int pulled_task = -1;
+	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 	cpumask_t tmpmask;
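
For reference, the affected loop in idle_balance() looks roughly like
this (a simplified sketch, not the verbatim source):

	for_each_domain(this_cpu, sd) {
		if (!(sd->flags & SD_LOAD_BALANCE))
			continue;
		if (sd->flags & SD_BALANCE_NEWIDLE)
			pulled_task = load_balance_newidle(this_cpu,
						this_rq, sd, &tmpmask);
		/* with pulled_task = -1, this break was taken on the
		 * first domain even when nothing was pulled, so
		 * higher-level domains with SD_BALANCE_NEWIDLE were
		 * never reached */
		if (pulled_task)
			break;
	}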
 



* Re: [PATCH v6 2/7] sched: favour lower logical cpu number for sched_mc balance
  2008-12-17 17:26 ` [PATCH v6 2/7] sched: favour lower logical cpu number for sched_mc balance Vaidyanathan Srinivasan
@ 2008-12-17 18:09   ` Balbir Singh
  0 siblings, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2008-12-17 18:09 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan
  Cc: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi,
	Peter Zijlstra, Ingo Molnar, Dipankar Sarma, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek

* Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> [2008-12-17 22:56:42]:

> In case two groups have identical load, prefer to move load to the
> lower logical cpu number rather than, as the present logic does, to
> the higher logical number.
> 
> find_busiest_group() tries to look for a group_leader that has spare
> capacity to take more tasks and free up an appropriate least loaded
> group.  When there is a tie and the load is equal, the group with the
> higher logical number is currently favoured.  This conflicts with the
> user space irqbalance daemon, which moves interrupts to the lower
> logical numbers when system utilisation is very low.
> 
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> ---
> 
>  kernel/sched.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 16897ab..0b9bbbd 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3264,7 +3264,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
>  		 */
>  		if ((sum_nr_running < min_nr_running) ||
>  		    (sum_nr_running == min_nr_running &&
> -		     first_cpu(group->cpumask) <
> +		     first_cpu(group->cpumask) >
>  		     first_cpu(group_min->cpumask))) {
>  			group_min = group;
>  			min_nr_running = sum_nr_running;
> @@ -3280,7 +3280,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
>  		if (sum_nr_running <= group_capacity - 1) {
>  			if (sum_nr_running > leader_nr_running ||
>  			    (sum_nr_running == leader_nr_running &&
> -			     first_cpu(group->cpumask) >
> +			     first_cpu(group->cpumask) <
>  			      first_cpu(group_leader->cpumask))) {
>  				group_leader = group;
>  				leader_nr_running = sum_nr_running;
> 
>

OK, so group_min is pushed to higher cpu numbers and group_leader is
pulled to lower cpu numbers.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> 

-- 
	Balbir


* Re: [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N
  2008-12-17 17:26 ` [PATCH v6 1/7] sched: Framework for sched_mc/smt_power_savings=N Vaidyanathan Srinivasan
@ 2008-12-17 18:10   ` Balbir Singh
  0 siblings, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2008-12-17 18:10 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan
  Cc: Linux Kernel, Suresh B Siddha, Venkatesh Pallipadi,
	Peter Zijlstra, Ingo Molnar, Dipankar Sarma, Vatsa,
	Gautham R Shenoy, Andi Kleen, David Collier-Brown, Tim Connors,
	Max Krasnyansky, Gregory Haskins, Pavel Machek

* Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> [2008-12-17 22:56:35]:

> From: Gautham R Shenoy <ego@in.ibm.com>
> 
> Currently the sched_mc/smt_power_savings variable is a boolean, which
> either enables or disables topology-based power savings.
> This patch extends the variable from boolean to multivalued, such that
> the value decides how aggressively we want to perform power-savings
> balance at the appropriate sched domain based on topology.
> 
> Variable levels of the power-savings tunable let the end user match
> the required power savings vs performance trade-off to the system
> configuration and workloads.
> 
> This version makes the sched_mc_power_savings global variable take
> more values (0,1,2).  Later versions can have a single tunable called
> sched_power_savings instead of sched_{mc,smt}_power_savings.
> 
> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
> ---
> 
>  include/linux/sched.h |   11 +++++++++++
>  kernel/sched.c        |   17 ++++++++++++++---
>  2 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 55e30d1..888f2b2 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -764,6 +764,17 @@ enum cpu_idle_type {
>  #define SD_SERIALIZE		1024	/* Only a single load balancing instance */
>  #define SD_WAKE_IDLE_FAR	2048	/* Gain latency sacrificing cache hit */
> 
> +enum powersavings_balance_level {
> +	POWERSAVINGS_BALANCE_NONE = 0,  /* No power saving load balance */
> +	POWERSAVINGS_BALANCE_BASIC,	/* Fill one thread/core/package
> +					 * first for long running threads
> +					 */
> +	POWERSAVINGS_BALANCE_WAKEUP,	/* Also bias task wakeups to semi-idle
> +					 * cpu package for power savings
> +					 */
> +	MAX_POWERSAVINGS_BALANCE_LEVELS
> +};
> +
>  #define BALANCE_FOR_MC_POWER	\
>  	(sched_smt_power_savings ? SD_POWERSAVINGS_BALANCE : 0)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index e4bb1dd..16897ab 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -7879,14 +7879,25 @@ int arch_reinit_sched_domains(void)
>  static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
>  {
>  	int ret;
> +	unsigned int level = 0;
> 
> -	if (buf[0] != '0' && buf[0] != '1')
> +	if (sscanf(buf, "%u", &level) != 1)
> +		return -EINVAL;
> +
> +	/*
> +	 * level is always positive, so there is no need to check for
> +	 * level < POWERSAVINGS_BALANCE_NONE, which is 0.
> +	 * TODO: what happens on a 0- or 1-byte write; should count be
> +	 * checked as well?
> +	 */
> +
> +	if (level >= MAX_POWERSAVINGS_BALANCE_LEVELS)
>  		return -EINVAL;
> 
>  	if (smt)
> -		sched_smt_power_savings = (buf[0] == '1');
> +		sched_smt_power_savings = level;
>  	else
> -		sched_mc_power_savings = (buf[0] == '1');
> +		sched_mc_power_savings = level;
> 
>  	ret = arch_reinit_sched_domains();
>

Looks good to me

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> 

-- 
	Balbir


* Re: [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  2008-12-17 17:27 ` [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 Vaidyanathan Srinivasan
@ 2008-12-18  1:42   ` Andrew Morton
  2008-12-18  9:35     ` Vaidyanathan Srinivasan
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2008-12-18  1:42 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan
  Cc: linux-kernel, suresh.b.siddha, venkatesh.pallipadi, a.p.zijlstra,
	mingo, dipankar, balbir, vatsa, ego, andi, davecb, tconnors,
	maxk, gregory.haskins, pavel, svaidy

On Wed, 17 Dec 2008 22:57:38 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -782,6 +782,16 @@ enum powersavings_balance_level {
>  	((sched_mc_power_savings || sched_smt_power_savings) ?	\
>  	 SD_POWERSAVINGS_BALANCE : 0)

What's with all the crappy macros in here?

> +/*
> + * Optimise SD flags for power savings:
> > + * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
> > + * Keep default SD flags if sched_{smt,mc}_power_savings=0
> + */
> +
> +#define POWERSAVING_SD_FLAGS	\
> +	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> +	  SD_BALANCE_NEWIDLE : 0)

This one purports to be a constant, but it isn't - it's code.

It would be cleaner, clearer and more idiomatic to do

static inline int powersaving_sd_flags(void)
{
	...
}

Also, doing (sched_mc_power_savings | sched_smt_power_savings) might
save a branch.

>  #define test_sd_parent(sd, flag)	((sd->parent &&		\
>  					 (sd->parent->flags & flag)) ? 1 : 0)

buggy when passed an expression with side-effects.  Doesn't need to be
implemented as a macro.


Sigh.




* Re: [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  2008-12-18  1:42   ` Andrew Morton
@ 2008-12-18  9:35     ` Vaidyanathan Srinivasan
  2008-12-18 12:46       ` Ingo Molnar
  0 siblings, 1 reply; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-18  9:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, suresh.b.siddha, venkatesh.pallipadi, a.p.zijlstra,
	mingo, dipankar, balbir, vatsa, ego, andi, davecb, tconnors,
	maxk, gregory.haskins, pavel

* Andrew Morton <akpm@linux-foundation.org> [2008-12-17 17:42:54]:

> On Wed, 17 Dec 2008 22:57:38 +0530
> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:
> 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -782,6 +782,16 @@ enum powersavings_balance_level {
> >  	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> >  	 SD_POWERSAVINGS_BALANCE : 0)
> 
> What's with all the crappy macros in here?

Hi Andrew,

These macros set the SD_POWERSAVINGS_BALANCE flag based on the
sysfs tunable.
 
> > +/*
> > + * Optimise SD flags for power savings:
> > > + * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
> > > + * Keep default SD flags if sched_{smt,mc}_power_savings=0
> > + */
> > +
> > +#define POWERSAVING_SD_FLAGS	\
> > +	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> > +	  SD_BALANCE_NEWIDLE : 0)
> 
> This one purports to be a constant, but it isn't - it's code.
> 
> It would be cleaner, clearer and more idiomatic to do
> 
> static inline int powersaving_sd_flags(void)
> {
> 	...
> }

You are suggesting to move these to inline functions.   I will write
a patch and post for review.
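
Something along these lines, perhaps (an untested sketch, not the
final patch; it also folds in the branch-saving OR suggested below):

	static inline int powersaving_sd_flags(void)
	{
		/* same semantics as the POWERSAVING_SD_FLAGS macro */
		return (sched_mc_power_savings | sched_smt_power_savings) ?
			SD_BALANCE_NEWIDLE : 0;
	}

	static inline int test_sd_parent(struct sched_domain *sd, int flag)
	{
		/* evaluates sd only once, unlike the macro */
		return sd->parent && (sd->parent->flags & flag);
	}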
 
> Also, doing (sched_mc_power_savings | sched_smt_power_savings) might
> save a branch.
> 
> >  #define test_sd_parent(sd, flag)	((sd->parent &&		\
> >  					 (sd->parent->flags & flag)) ? 1 : 0)
> 
> buggy when passed an expression with side-effects.  Doesn't need to be
> implemented as a macro.

Agreed, but these macros are used throughout sched.c and are
performance sensitive.  Inline functions are a close enough
replacement for the macros; let me look for any performance penalty as
well and report.

Thanks for the review and comments.

--Vaidy


* Re: [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  2008-12-18  9:35     ` Vaidyanathan Srinivasan
@ 2008-12-18 12:46       ` Ingo Molnar
  2008-12-18 15:22         ` Vaidyanathan Srinivasan
  0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2008-12-18 12:46 UTC (permalink / raw)
  To: Vaidyanathan Srinivasan
  Cc: Andrew Morton, linux-kernel, suresh.b.siddha,
	venkatesh.pallipadi, a.p.zijlstra, dipankar, balbir, vatsa, ego,
	andi, davecb, tconnors, maxk, gregory.haskins, pavel,
	Vaidyanathan Srinivasan


* Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> * Andrew Morton <akpm@linux-foundation.org> [2008-12-17 17:42:54]:
> 
> > On Wed, 17 Dec 2008 22:57:38 +0530
> > Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:
> > 
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -782,6 +782,16 @@ enum powersavings_balance_level {
> > >  	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> > >  	 SD_POWERSAVINGS_BALANCE : 0)
> > 
> > What's with all the crappy macros in here?
> 
> Hi Andrew,
> 
> These macros set the SD_POWERSAVINGS_BALANCE flag based on the
> sysfs tunable.
>  
> > > +/*
> > > + * Optimise SD flags for power savings:
> > > > + * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
> > > > + * Keep default SD flags if sched_{smt,mc}_power_savings=0
> > > + */
> > > +
> > > +#define POWERSAVING_SD_FLAGS	\
> > > +	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> > > +	  SD_BALANCE_NEWIDLE : 0)
> > 
> > This one purports to be a constant, but it isn't - it's code.
> > 
> > It would be cleaner, clearer and more idiomatic to do
> > 
> > static inline int powersaving_sd_flags(void)
> > {
> > 	...
> > }
> 
> > You are suggesting to move these to inline functions.   I will write
> a patch and post for review.
>  
> > > Also, doing (sched_mc_power_savings | sched_smt_power_savings) might
> > save a branch.
> > 
> > >  #define test_sd_parent(sd, flag)	((sd->parent &&		\
> > >  					 (sd->parent->flags & flag)) ? 1 : 0)
> > 
> > buggy when passed an expression with side-effects.  Doesn't need to be
> > implemented as a macro.
> 
> Agreed, but these macros are used throughout sched.c and are performance 
> sensitive.  Inline functions are a close enough replacement for the 
> > macros; let me look for any performance penalty as well and report.

those macros are historic so feel free to convert them to inlines without 
re-measuring performance impact.

The patchset looks pretty good in principle otherwise, so if you could 
address Andrew's comments and clean up those bits in the next iteration we 
could start testing it in the scheduler tree. (Please also add Balbir 
Singh's acks to the next iteration.)

and please fix your mailer to not inject stuff like this:

 Mail-Followup-To: Andrew Morton <akpm@linux-foundation.org>,
        linux-kernel@vger.kernel.org, suresh.b.siddha@intel.com,
        venkatesh.pallipadi@intel.com, a.p.zijlstra@chello.nl,
        mingo@elte.hu, dipankar@in.ibm.com, balbir@linux.vnet.ibm.com,
        vatsa@linux.vnet.ibm.com, ego@in.ibm.com, andi@firstfloor.org,
        davecb@sun.com, tconnors@astro.swin.edu.au, maxk@qualcomm.com,
        gregory.haskins@gmail.com, pavel@suse.cz

It utterly messed up the addressing mode of my reply here and i had to 
edit the raw email headers manually to fix it up ;-)

	Ingo


* Re: [PATCH v6 6/7] sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  2008-12-18 12:46       ` Ingo Molnar
@ 2008-12-18 15:22         ` Vaidyanathan Srinivasan
  0 siblings, 0 replies; 14+ messages in thread
From: Vaidyanathan Srinivasan @ 2008-12-18 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-kernel, suresh.b.siddha,
	venkatesh.pallipadi, a.p.zijlstra, dipankar, balbir, vatsa, ego,
	andi, davecb, tconnors, maxk, gregory.haskins, pavel

* Ingo Molnar <mingo@elte.hu> [2008-12-18 13:46:44]:

> 
> * Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:
> 
> > * Andrew Morton <akpm@linux-foundation.org> [2008-12-17 17:42:54]:
> > 
> > > On Wed, 17 Dec 2008 22:57:38 +0530
> > > Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:
> > > 
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -782,6 +782,16 @@ enum powersavings_balance_level {
> > > >  	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> > > >  	 SD_POWERSAVINGS_BALANCE : 0)
> > > 
> > > What's with all the crappy macros in here?
> > 
> > Hi Andrew,
> > 
> > These macros set the SD_POWERSAVINGS_BALANCE flag based on the
> > sysfs tunable.
> >  
> > > > +/*
> > > > + * Optimise SD flags for power savings:
> > > > > + * SD_BALANCE_NEWIDLE helps aggressive task consolidation and power savings.
> > > > > + * Keep default SD flags if sched_{smt,mc}_power_savings=0
> > > > + */
> > > > +
> > > > +#define POWERSAVING_SD_FLAGS	\
> > > > +	((sched_mc_power_savings || sched_smt_power_savings) ?	\
> > > > +	  SD_BALANCE_NEWIDLE : 0)
> > > 
> > > This one purports to be a constant, but it isn't - it's code.
> > > 
> > > It would be cleaner, clearer and more idiomatic to do
> > > 
> > > static inline int powersaving_sd_flags(void)
> > > {
> > > 	...
> > > }
> > 
> > You are suggesting to move these to inline functions.   I will write
> > a patch and post for review.
> >  
> > > Also, doing (sched_mc_power_savings | sched_smt_power_savings) might
> > > save a branch.
> > > 
> > > >  #define test_sd_parent(sd, flag)	((sd->parent &&		\
> > > >  					 (sd->parent->flags & flag)) ? 1 : 0)
> > > 
> > > buggy when passed an expression with side-effects.  Doesn't need to be
> > > implemented as a macro.
> > 
> > Agreed, but these macros are used throughout sched.c and are performance 
> > sensitive.  Inline functions are a close enough replacement for the 
> > macros; let me look for any performance penalty as well and report.
> 
> those macros are historic so feel free to convert them to inlines without 
> re-measuring performance impact.

Sure Ingo.  I will go ahead and change them in my next iteration.
 
> The patchset looks pretty good in principle otherwise, so if you could 
> address Andrew's comments and clean up those bits in the next iteration we 
> could start testing it in the scheduler tree. (Please also add Balbir 
> Singh's acks to the next iteration.)

Thank you for acking the patch.  I will address Andrew's comments and
post the next iteration along with Balbir's acks.

> and please fix your mailer to not inject stuff like this:
> 
>  Mail-Followup-To: Andrew Morton <akpm@linux-foundation.org>,
>         linux-kernel@vger.kernel.org, suresh.b.siddha@intel.com,
>         venkatesh.pallipadi@intel.com, a.p.zijlstra@chello.nl,
>         mingo@elte.hu, dipankar@in.ibm.com, balbir@linux.vnet.ibm.com,
>         vatsa@linux.vnet.ibm.com, ego@in.ibm.com, andi@firstfloor.org,
>         davecb@sun.com, tconnors@astro.swin.edu.au, maxk@qualcomm.com,
>         gregory.haskins@gmail.com, pavel@suse.cz
> 
> It utterly messed up the addressing mode of my reply here and i had to 
> edit the raw email headers manually to fix it up ;-)

OOPS! My bad mutt config!  I have tried to fix this.  Hopefully this
will not cause trouble anymore.

Thanks,
Vaidy

