* [RFC 0/4] sched: consolidation of cpu_power
@ 2014-03-28 13:22 ` Vincent Guittot
  0 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-03-28 13:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, linux, linux-arm-kernel
  Cc: preeti, Morten.Rasmussen, efault, linaro-kernel, Vincent Guittot

Part of this patchset was previously part of the larger tasks packing patchset
[1]. I have split the latter into 3 different patchsets (at least) to make
things easier.
-configuration of sched_domain topology [2]
-update and consolidation of cpu_power (this patchset)
-tasks packing algorithm

SMT systems are no longer the only systems that can have CPUs with a capacity
that differs from the default value. We need to extend the use of
cpu_power_orig to all kinds of platforms so the scheduler will have both the
maximum capacity (cpu_power_orig/power_orig) and the current capacity
(cpu_power/power) of the CPUs and groups of the sched_domains.
During load balance, the scheduler evaluates the number of tasks that a group
of CPUs can handle. The current method ensures that we will not return more
capacity than the number of real cores, but it returns a wrong value for groups
of LITTLE cores and, in some situations, for SMT systems. The proposed solution
computes the ratio between CPUs and cores for a group during the init sequence
and uses it with power and power_orig to return the current capacity of a
group.
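For illustration only (this helper is not part of the patches), the proposed
group-capacity computation could be sketched as follows, assuming the
CPUs-per-core ratio has been determined during the init sequence as described
above:

```c
#define SCHED_POWER_SCALE 1024UL

/* Round a/b to the closest integer, like the kernel's DIV_ROUND_CLOSEST(). */
unsigned long div_round_closest(unsigned long a, unsigned long b)
{
	return (a + b / 2) / b;
}

/*
 * Hypothetical sketch of the proposed computation: scale the number of
 * CPUs in the group by the current/original power ratio, then divide by
 * the CPUs-per-core ratio (e.g. 2 for SMT-2, 1 otherwise) that was
 * computed when the sched_domain topology was built.
 */
unsigned long group_capacity(unsigned long power, unsigned long power_orig,
			     unsigned long cpus, unsigned long cpus_per_core)
{
	return div_round_closest(power * cpus, power_orig * cpus_per_core);
}
```

With power == power_orig, a 16-CPU SMT-2 group reports a capacity of 8 cores,
and a 4-CPU LITTLE group with a per-CPU power below SCHED_POWER_SCALE still
reports 4, since the power ratio cancels out.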

[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2014/3/19/377

Vincent Guittot (4):
  sched: extend the usage of cpu_power_orig
  ARM: topology: use new cpu_power interface
  sched: fix computed capacity for HMP
  sched: add per group cpu_power_orig

 arch/arm/kernel/topology.c |  4 ++--
 kernel/sched/core.c        |  9 ++++++++-
 kernel/sched/fair.c        | 31 +++++++++++++++++++------------
 kernel/sched/sched.h       |  3 ++-
 4 files changed, 31 insertions(+), 16 deletions(-)

-- 
1.9.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC 1/4] sched: extend the usage of cpu_power_orig
  2014-03-28 13:22 ` Vincent Guittot
@ 2014-03-28 13:22   ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-03-28 13:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, linux, linux-arm-kernel
  Cc: preeti, Morten.Rasmussen, efault, linaro-kernel, Vincent Guittot

cpu_power_orig is only changed for SMT systems in order to reflect the lower
capacity of their CPUs. Heterogeneous systems also have to reflect an original
capacity that differs from the default value.

Create a more generic function, arch_scale_cpu_power, that can also be used by
non-SMT platforms to set cpu_power_orig.

The weak implementation of arch_scale_cpu_power keeps the previous SMT behavior
in order to preserve backward compatibility in the use of cpu_power_orig.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e9bd0b..ed42061 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5559,6 +5559,20 @@ unsigned long __weak arch_scale_smt_power(struct sched_domain *sd, int cpu)
 	return default_scale_smt_power(sd, cpu);
 }
 
+unsigned long __weak arch_scale_cpu_power(struct sched_domain *sd, int cpu)
+{
+	unsigned long weight = sd->span_weight;
+
+	if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
+		if (sched_feat(ARCH_POWER))
+			return arch_scale_smt_power(sd, cpu);
+		else
+			return default_scale_smt_power(sd, cpu);
+	}
+
+	return SCHED_POWER_SCALE;
+}
+
 static unsigned long scale_rt_power(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -5590,18 +5604,12 @@ static unsigned long scale_rt_power(int cpu)
 
 static void update_cpu_power(struct sched_domain *sd, int cpu)
 {
-	unsigned long weight = sd->span_weight;
 	unsigned long power = SCHED_POWER_SCALE;
 	struct sched_group *sdg = sd->groups;
 
-	if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
-		if (sched_feat(ARCH_POWER))
-			power *= arch_scale_smt_power(sd, cpu);
-		else
-			power *= default_scale_smt_power(sd, cpu);
+	power *= arch_scale_cpu_power(sd, cpu);
 
-		power >>= SCHED_POWER_SHIFT;
-	}
+	power >>= SCHED_POWER_SHIFT;
 
 	sdg->sgp->power_orig = power;
 
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC 2/4] ARM: topology: use new cpu_power interface
  2014-03-28 13:22 ` Vincent Guittot
@ 2014-03-28 13:22   ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-03-28 13:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, linux, linux-arm-kernel
  Cc: preeti, Morten.Rasmussen, efault, linaro-kernel, Vincent Guittot

Use the new arch_scale_cpu_power function to reflect the original capacity of
a CPU, instead of arch_scale_freq_power, which is more related to scaling the
capacity with frequency.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/arm/kernel/topology.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 0bc94b1..55a7d90 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -42,7 +42,7 @@
  */
 static DEFINE_PER_CPU(unsigned long, cpu_scale);
 
-unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
+unsigned long arch_scale_cpu_power(struct sched_domain *sd, int cpu)
 {
 	return per_cpu(cpu_scale, cpu);
 }
@@ -166,7 +166,7 @@ static void update_cpu_power(unsigned int cpu)
 	set_power_scale(cpu, cpu_capacity(cpu) / middle_capacity);
 
 	printk(KERN_INFO "CPU%u: update cpu_power %lu\n",
-		cpu, arch_scale_freq_power(NULL, cpu));
+		cpu, arch_scale_cpu_power(NULL, cpu));
 }
 
 #else
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC 3/4] sched: fix computed capacity for HMP
  2014-03-28 13:22 ` Vincent Guittot
@ 2014-03-28 13:22   ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-03-28 13:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, linux, linux-arm-kernel
  Cc: preeti, Morten.Rasmussen, efault, linaro-kernel, Vincent Guittot

The current sg_capacity solves the ghost-cores issue for SMT systems and for
clusters made of big cores which have a cpu_power above SCHED_POWER_SCALE at
core level. But it still removes some real cores from a cluster made of LITTLE
cores which have a cpu_power below SCHED_POWER_SCALE.

Instead of using power_orig to detect an SMT system and compute an smt factor
that is then used to calculate the real number of cores, we set a core_fct
field when building the sched_domain topology. We can detect an SMT system
thanks to the SD_SHARE_CPUPOWER flag and set core_fct to the number of CPUs
per core. The core_fct ensures that sg_capacity will return the core capacity
of an SMT system and will not remove any real core from a LITTLE cluster.

This method also fixes a use case where the capacity of an SMT system was
overrated.
Take the example of a system made of 8 HT cores (16 CPUs):
At CPU level, sg_capacity is capped to a maximum capacity of 8, whereas
DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) returns 9:
((589*16) / 1024) = 9.2
Now if 2 CPUs (1 core) are fully loaded by rt tasks, sg_capacity still returns
a capacity of 8 whereas it should return a capacity of 7. This happens because
DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) is still above 7.5:
((589*14) / 1024) = 8.05
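The arithmetic above can be reproduced with the following sketch (illustrative
only, not kernel code), which compares the current capping with the proposed
core_fct-based computation, using the numbers from this commit message
(per-CPU power 589, 16 CPUs, 2 CPUs per core):

```c
#define SCHED_POWER_SCALE 1024UL
#define DIV_ROUND_UP(a, b)      (((a) + (b) - 1) / (b))
#define DIV_ROUND_CLOSEST(a, b) (((a) + (b) / 2) / (b))

/* Current method: derive an smt factor from power_orig, then cap the result. */
unsigned long sg_capacity_old(unsigned long power, unsigned long power_orig,
			      unsigned long cpus)
{
	unsigned long smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
	unsigned long capacity = cpus / smt; /* cores */
	unsigned long cap = DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE);

	return capacity < cap ? capacity : cap;
}

/* Proposed method: use the core_fct ratio computed at topology-init time. */
unsigned long sg_capacity_new(unsigned long power, unsigned long power_orig,
			      unsigned long cpus, unsigned long core_fct)
{
	return DIV_ROUND_CLOSEST(power * cpus, power_orig * core_fct);
}
```

When the group is idle (power == power_orig == 589*16), both methods return 8.
When one core is fully loaded by rt tasks (power == 589*14), the old method
still returns 8 while the proposed one returns 7, as described above.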

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c  | 7 +++++++
 kernel/sched/fair.c  | 6 ++----
 kernel/sched/sched.h | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9d9776..5b20b27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5844,6 +5844,13 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 
 	WARN_ON(!sg);
 
+	if (!sd->child)
+		sg->core_fct = 1;
+	else if (sd->child->flags & SD_SHARE_CPUPOWER)
+		sg->core_fct = cpumask_weight(sched_group_cpus(sg));
+	else
+		sg->core_fct = sd->child->groups->core_fct;
+
 	do {
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
 		sg = sg->next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ed42061..7387c05 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5773,12 +5773,10 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
 	power = group->sgp->power;
 	power_orig = group->sgp->power_orig;
 	cpus = group->group_weight;
+	smt = group->core_fct;
 
-	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
-	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
-	capacity = cpus / smt; /* cores */
+	capacity = DIV_ROUND_CLOSEST(power * cpus, power_orig * smt);
 
-	capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
 	if (!capacity)
 		capacity = fix_small_capacity(env->sd, group);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c9007f2..46c3784 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -759,7 +759,7 @@ struct sched_group {
 	struct sched_group *next;	/* Must be a circular list */
 	atomic_t ref;
 
-	unsigned int group_weight;
+	unsigned int group_weight, core_fct;
 	struct sched_group_power *sgp;
 
 	/*
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC 0/4] sched: consolidation of cpu_power
  2014-03-28 13:22 ` Vincent Guittot
@ 2014-03-29  2:27   ` Nicolas Pitre
  -1 siblings, 0 replies; 22+ messages in thread
From: Nicolas Pitre @ 2014-03-29  2:27 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, Russell King - ARM Linux,
	linux-arm-kernel, preeti, efault, linaro-kernel

On Fri, 28 Mar 2014, Vincent Guittot wrote:

> Part of this patchset was previously part of the larger tasks packing patchset
> [1]. I have splitted the latter in 3 different patchsets (at least) to make the
> thing easier.
> -configuration of sched_domain topology [2]
> -update and consolidation of cpu_power (this patchset)
> -tasks packing algorithm
> 
> SMT system is no more the only system that can have a CPUs with capacity that
> is different from the default value. We need to extend the use of
> cpu_power_orig to all kind of platform so the scheduler will have both the
> maximum capacity (cpu_power_orig/power_orig) and the current capacity
> (cpu_power/power) of CPUs and groups of the sched_domains.
[...]

Wouldn't it make sense to have a patch in this series that performs some 
s/power/capacity/ throughout the scheduler code before the other patches 
are applied?  That would help getting consistent terminology and avoid 
confusion before real power related metrics are eventually brought into 
the scheduler.


Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC 0/4] sched: consolidation of cpu_power
  2014-03-29  2:27   ` Nicolas Pitre
@ 2014-03-31  7:31     ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-03-31  7:31 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Russell King - ARM Linux, LAK, Preeti U Murthy, Mike Galbraith,
	linaro-kernel

On 29 March 2014 03:27, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Fri, 28 Mar 2014, Vincent Guittot wrote:
>
>> Part of this patchset was previously part of the larger tasks packing patchset
>> [1]. I have splitted the latter in 3 different patchsets (at least) to make the
>> thing easier.
>> -configuration of sched_domain topology [2]
>> -update and consolidation of cpu_power (this patchset)
>> -tasks packing algorithm
>>
>> SMT system is no more the only system that can have a CPUs with capacity that
>> is different from the default value. We need to extend the use of
>> cpu_power_orig to all kind of platform so the scheduler will have both the
>> maximum capacity (cpu_power_orig/power_orig) and the current capacity
>> (cpu_power/power) of CPUs and groups of the sched_domains.
> [...]
>
> Wouldn't it make sense to have a patch in this series that performs some
> s/power/capacity/ throughout the scheduler code before the other patches
> are applied?  That would help getting consistent terminology and avoid
> confusion before real power related metrics are eventually brought into
> the scheduler.

Fair point. I will add the changes in the next version.

Thanks
Vincent

>
>
> Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC 1/4] sched: extend the usage of cpu_power_orig
  2014-03-28 13:22   ` Vincent Guittot
@ 2014-04-01 10:39     ` Preeti U Murthy
  -1 siblings, 0 replies; 22+ messages in thread
From: Preeti U Murthy @ 2014-04-01 10:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, linux, linux-arm-kernel,
	Morten.Rasmussen, efault, linaro-kernel

Hi Vincent,

On 03/28/2014 06:52 PM, Vincent Guittot wrote:
> cpu_power_orig is only changed for SMT system in order to reflect the lower
> capacity of CPUs. Heterogenous system also have to reflect an original
> capacity that is different from the default value.

There is no parameter 'cpu_power_orig' till your fourth patch right?
Why is this term being used in this patch?

Besides, both parameters power and power_orig are changed for SMT
systems to reflect the lower capacity of the CPUs. Why is there a mention
of only power_orig?

IMO, the subject of the patch does not clearly reflect its main
intention. Nothing is done to change the way cpu_power is used;
rather, you are changing the way cpu_power is set to be more
flexible, thus allowing the right power value to be set on
heterogeneous systems.

'Allow archs to set the cpu_power instead of falling to default value'
or something similar would be more appropriate. What do you think?

Regards
Preeti U Murthy
> 
> Create a more generic function arch_scale_cpu_power that can be also used by
> non SMT platform to set cpu_power_orig.
> 
> The weak behavior of arch_scale_cpu_power is the previous SMT one in order to
> keep backward compatibility in the use of cpu_power_orig.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7e9bd0b..ed42061 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5559,6 +5559,20 @@ unsigned long __weak arch_scale_smt_power(struct sched_domain *sd, int cpu)
>  	return default_scale_smt_power(sd, cpu);
>  }
> 
> +unsigned long __weak arch_scale_cpu_power(struct sched_domain *sd, int cpu)
> +{
> +	unsigned long weight = sd->span_weight;
> +
> +	if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
> +		if (sched_feat(ARCH_POWER))
> +			return arch_scale_smt_power(sd, cpu);
> +		else
> +			return default_scale_smt_power(sd, cpu);
> +	}
> +
> +	return SCHED_POWER_SCALE;
> +}
> +
>  static unsigned long scale_rt_power(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> @@ -5590,18 +5604,12 @@ static unsigned long scale_rt_power(int cpu)
> 
>  static void update_cpu_power(struct sched_domain *sd, int cpu)
>  {
> -	unsigned long weight = sd->span_weight;
>  	unsigned long power = SCHED_POWER_SCALE;
>  	struct sched_group *sdg = sd->groups;
> 
> -	if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
> -		if (sched_feat(ARCH_POWER))
> -			power *= arch_scale_smt_power(sd, cpu);
> -		else
> -			power *= default_scale_smt_power(sd, cpu);
> +	power *= arch_scale_cpu_power(sd, cpu);
> 
> -		power >>= SCHED_POWER_SHIFT;
> -	}
> +	power >>= SCHED_POWER_SHIFT;
> 
>  	sdg->sgp->power_orig = power;
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC 1/4] sched: extend the usage of cpu_power_orig
  2014-04-01 10:39     ` Preeti U Murthy
@ 2014-04-01 11:20       ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-04-01 11:20 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Russell King - ARM Linux, LAK, Morten Rasmussen, Mike Galbraith,
	linaro-kernel

On 1 April 2014 12:39, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> Hi Vincent,
>
> On 03/28/2014 06:52 PM, Vincent Guittot wrote:
>> cpu_power_orig is only changed for SMT system in order to reflect the lower
>> capacity of CPUs. Heterogenous system also have to reflect an original
>> capacity that is different from the default value.
>
> There is no parameter 'cpu_power_orig' till your fourth patch right?
> Why is this term being used in this patch?

It looks like I have mixed up power_orig and cpu_power_orig in my
commit messages.

>
> Besides, both parameters power and power_orig are changed for SMT
> systems to reflect the lower capacity of the CPUs.Why is there a mention
> of only power_orig?

Only SMT systems are able to change power_orig from the default value,
whereas all systems can already change the power field with the
arch_scale_freq_power function. The goal of this patch is to change
the name of the function that is used to set the power_orig value to a
more generic one, and to extend the possibility of setting power_orig
to any kind of system. The behavior of the power field is not changed
by this patchset.

>
> IMO, the subject of the patch is not clearly reflecting the main
> intention of the patch. There is nothing done to change the way
> cpu_power is used; rather you are changing the way the cpu_power is
> being set to be flexible, thus allowing for the right power value to be
> set on heterogeneous systems.
>
> 'Allow archs to set the cpu_power instead of falling to default value'
> or something similar would be more appropriate. What do you think?

I can change it to: "Allow all archs to set power_orig"

Thanks

>
> Regards
> Preeti U Murthy
>>
>> Create a more generic function, arch_scale_cpu_power, that can also be used by
>> non-SMT platforms to set cpu_power_orig.
>>
>> The weak implementation of arch_scale_cpu_power keeps the previous SMT behavior
>> in order to preserve backward compatibility in the use of cpu_power_orig.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 24 ++++++++++++++++--------
>>  1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7e9bd0b..ed42061 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5559,6 +5559,20 @@ unsigned long __weak arch_scale_smt_power(struct sched_domain *sd, int cpu)
>>       return default_scale_smt_power(sd, cpu);
>>  }
>>
>> +unsigned long __weak arch_scale_cpu_power(struct sched_domain *sd, int cpu)
>> +{
>> +     unsigned long weight = sd->span_weight;
>> +
>> +     if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
>> +             if (sched_feat(ARCH_POWER))
>> +                     return arch_scale_smt_power(sd, cpu);
>> +             else
>> +                     return default_scale_smt_power(sd, cpu);
>> +     }
>> +
>> +     return SCHED_POWER_SCALE;
>> +}
>> +
>>  static unsigned long scale_rt_power(int cpu)
>>  {
>>       struct rq *rq = cpu_rq(cpu);
>> @@ -5590,18 +5604,12 @@ static unsigned long scale_rt_power(int cpu)
>>
>>  static void update_cpu_power(struct sched_domain *sd, int cpu)
>>  {
>> -     unsigned long weight = sd->span_weight;
>>       unsigned long power = SCHED_POWER_SCALE;
>>       struct sched_group *sdg = sd->groups;
>>
>> -     if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
>> -             if (sched_feat(ARCH_POWER))
>> -                     power *= arch_scale_smt_power(sd, cpu);
>> -             else
>> -                     power *= default_scale_smt_power(sd, cpu);
>> +     power *= arch_scale_cpu_power(sd, cpu);
>>
>> -             power >>= SCHED_POWER_SHIFT;
>> -     }
>> +     power >>= SCHED_POWER_SHIFT;
>>
>>       sdg->sgp->power_orig = power;
>>
>


* Re: [RFC 1/4] sched: extend the usage of cpu_power_orig
  2014-04-01 11:20       ` Vincent Guittot
@ 2014-04-03 10:40         ` Preeti U Murthy
  -1 siblings, 0 replies; 22+ messages in thread
From: Preeti U Murthy @ 2014-04-03 10:40 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel,
	Russell King - ARM Linux, LAK, Morten Rasmussen, Mike Galbraith,
	linaro-kernel

On 04/01/2014 04:50 PM, Vincent Guittot wrote:
> On 1 April 2014 12:39, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> Hi Vincent,
>>
>> On 03/28/2014 06:52 PM, Vincent Guittot wrote:
>>> cpu_power_orig is only changed for SMT systems, in order to reflect the lower
>>> capacity of their CPUs. Heterogeneous systems also have to reflect an original
>>> capacity that is different from the default value.
>>
>> There is no parameter 'cpu_power_orig' till your fourth patch right?
>> Why is this term being used in this patch?
> 
> It looks like I have mixed up power_orig and cpu_power_orig in my
> commit messages.
> 
>>
>> Besides, both parameters power and power_orig are changed for SMT
>> systems to reflect the lower capacity of the CPUs. Why is there a mention
>> of only power_orig?
> 
> Only SMT systems are able to change power_orig from the default value,
> whereas all systems can already change the power field with the
> arch_scale_freq_power function. The goal of this patch is to change

I am unable to understand this. arch_scale_freq_power() is not doing the
same job as arch_scale_smt_power(), right? While the former allows the
arch to set a per-CPU power value different from the default value, the
latter allows the arch to adjust the power for hyper-threads.

For example in big.LITTLE, the big cores would be required to have a
higher cpu power than what is currently being used as the default. This is
done by arch_scale_freq_power(). Whereas for an SMT system, the power of
a core could perhaps be the same as the default value, but needs to be
divided among the hyper-threads. This is done by arch_scale_smt_power().
They have different purposes, right?

So I do not understand why you mention "all systems can already change
power field with arch_scale_freq_power()."

Actually I was assuming this patch would introduce
arch_scale_smt_power() and arch_scale_freq_power(), two functions that
would enable *all archs, and not just ARCH_POWER as is the case today*, to
adjust their smt power and non-smt power respectively.

For the same reasons, I am also unclear as to why power_orig is not
initialized *after* default/arch_scale_freq_power(). For big cores, for
instance, you would want power_orig to be set not to a default value
but to what the arch reports as the frequency of a core, scaled over the
default value.

> the function name that is used to set power_orig value to a more
> generic one and to extend the possibility of setting power_orig for
> any kind of system. The behavior of the power field is not changed
> with this patchset.
> 
>>
>> IMO, the subject of the patch is not clearly reflecting the main
>> intention of the patch. There is nothing done to change the way
>> cpu_power is used; rather you are changing the way the cpu_power is
>> being set to be flexible, thus allowing for the right power value to be
>> set on heterogeneous systems.
>>
>> 'Allow archs to set the cpu_power instead of falling to default value'
>> or something similar would be more appropriate. What do you think?
> 
> I can change with : "Allow all archs to set the power_orig"

Ok, but for the reasons mentioned above I was thinking power_orig
should be avoided, since all archs, and not just ARCH_POWER, should be
able to scale both power and power_orig per thread.

Regards
Preeti U Murthy


> 
> Thanks
> 
>>
>> Regards
>> Preeti U Murthy



* Re: [RFC 3/4] sched: fix computed capacity for HMP
  2014-03-28 13:22   ` Vincent Guittot
@ 2014-04-15 17:22     ` Peter Zijlstra
  -1 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2014-04-15 17:22 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, linux, linux-arm-kernel, preeti,
	Morten.Rasmussen, efault, linaro-kernel

On Fri, Mar 28, 2014 at 02:22:28PM +0100, Vincent Guittot wrote:
> The current sg_capacity solves the ghost cores issue for SMT systems and
> clusters made of big cores, which have a cpu_power above SCHED_POWER_SCALE at
> core level. But it still removes some real cores from a cluster made of LITTLE
> cores, which have a cpu_power below SCHED_POWER_SCALE.
> 
> Instead of using power_orig to detect an SMT system and computing a smt factor
> that is then used to calculate the real number of cores, we set a core_fct
> field when building the sched_domain topology. We can detect an SMT system
> thanks to the SD_SHARE_CPUPOWER flag and set core_fct to record how many CPUs
> per core we have. core_fct ensures that sg_capacity will return the core
> capacity of an SMT system and will not remove any real core from a LITTLE cluster.
> 
> This method also fixes a use case where the capacity of an SMT system was
> overestimated.
> Let's take the example of an 8-core HT system:
> At the CPU level, sg_capacity is capped at a maximum capacity of 8, whereas
> DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) returns 9:
> ((589*16) / 1024) = 9.2
> Now if 2 CPUs (1 core) are fully loaded by rt tasks, sg_capacity still returns
> a capacity of 8 whereas it should return a capacity of 7. This happens because
> DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) is still above 7.5:
> ((589*14) / 1024) = 8.05
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/core.c  | 7 +++++++
>  kernel/sched/fair.c  | 6 ++----
>  kernel/sched/sched.h | 2 +-
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f9d9776..5b20b27 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5844,6 +5844,13 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>  
>  	WARN_ON(!sg);
>  
> +	if (!sd->child)
> +		sg->core_fct = 1;
> +	else if (sd->child->flags & SD_SHARE_CPUPOWER)
> +		sg->core_fct = cpumask_weight(sched_group_cpus(sg));
> +	else
> +		sg->core_fct = sd->child->groups->core_fct;
> +
>  	do {
>  		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
>  		sg = sg->next;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ed42061..7387c05 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5773,12 +5773,10 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
>  	power = group->sgp->power;
>  	power_orig = group->sgp->power_orig;
>  	cpus = group->group_weight;
> +	smt = group->core_fct;
>  
> -	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
> -	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
> -	capacity = cpus / smt; /* cores */
> +	capacity = DIV_ROUND_CLOSEST(power * cpus, power_orig * smt);
>  
> -	capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
>  	if (!capacity)
>  		capacity = fix_small_capacity(env->sd, group);
>  

So this patch only cures a little problem; the much bigger problem is
that capacity as it exists is completely wrong.

We really should be using utilization here. Until a CPU is fully
utilized we shouldn't be moving tasks around (unless packing, but we're
not there yet, and in that case you want to stop where this starts,
namely at full utilization).

So while I appreciate what you're trying to 'fix' here, it's really just
trying to dress a monkey.


* Re: [RFC 3/4] sched: fix computed capacity for HMP
  2014-04-15 17:22     ` Peter Zijlstra
@ 2014-04-16  6:54       ` Vincent Guittot
  -1 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2014-04-16  6:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Russell King - ARM Linux, LAK,
	Preeti U Murthy, Morten Rasmussen, Mike Galbraith, linaro-kernel

On 15 April 2014 19:22, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Mar 28, 2014 at 02:22:28PM +0100, Vincent Guittot wrote:
>> The current sg_capacity solves the ghost cores issue for SMT systems and
>> clusters made of big cores, which have a cpu_power above SCHED_POWER_SCALE at
>> core level. But it still removes some real cores from a cluster made of LITTLE
>> cores, which have a cpu_power below SCHED_POWER_SCALE.
>>
>> Instead of using power_orig to detect an SMT system and computing a smt factor
>> that is then used to calculate the real number of cores, we set a core_fct
>> field when building the sched_domain topology. We can detect an SMT system
>> thanks to the SD_SHARE_CPUPOWER flag and set core_fct to record how many CPUs
>> per core we have. core_fct ensures that sg_capacity will return the core
>> capacity of an SMT system and will not remove any real core from a LITTLE cluster.
>>
>> This method also fixes a use case where the capacity of an SMT system was
>> overestimated.
>> Let's take the example of an 8-core HT system:
>> At the CPU level, sg_capacity is capped at a maximum capacity of 8, whereas
>> DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) returns 9:
>> ((589*16) / 1024) = 9.2
>> Now if 2 CPUs (1 core) are fully loaded by rt tasks, sg_capacity still returns
>> a capacity of 8 whereas it should return a capacity of 7. This happens because
>> DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE) is still above 7.5:
>> ((589*14) / 1024) = 8.05
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/core.c  | 7 +++++++
>>  kernel/sched/fair.c  | 6 ++----
>>  kernel/sched/sched.h | 2 +-
>>  3 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f9d9776..5b20b27 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5844,6 +5844,13 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>>
>>       WARN_ON(!sg);
>>
>> +     if (!sd->child)
>> +             sg->core_fct = 1;
>> +     else if (sd->child->flags & SD_SHARE_CPUPOWER)
>> +             sg->core_fct = cpumask_weight(sched_group_cpus(sg));
>> +     else
>> +             sg->core_fct = sd->child->groups->core_fct;
>> +
>>       do {
>>               sg->group_weight = cpumask_weight(sched_group_cpus(sg));
>>               sg = sg->next;
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ed42061..7387c05 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5773,12 +5773,10 @@ static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
>>       power = group->sgp->power;
>>       power_orig = group->sgp->power_orig;
>>       cpus = group->group_weight;
>> +     smt = group->core_fct;
>>
>> -     /* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
>> -     smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
>> -     capacity = cpus / smt; /* cores */
>> +     capacity = DIV_ROUND_CLOSEST(power * cpus, power_orig * smt);
>>
>> -     capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
>>       if (!capacity)
>>               capacity = fix_small_capacity(env->sd, group);
>>
>
> So this patch only cures a little problem; the much bigger problem is
> that capacity as it exists is completely wrong.
>
> We really should be using utilization here. Until a CPU is fully

ok, I'm going to see how to replace capacity with utilization

Thanks

> utilized we shouldn't be moving tasks around (unless packing, but we're
> not there yet, and in that case you want to stop where this starts,
> namely at full utilization).
>
> So while I appreciate what you're trying to 'fix' here, it's really just
> trying to dress a monkey.


end of thread, other threads:[~2014-04-16  6:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-28 13:22 [RFC 0/4] sched: consolidation of cpu_power Vincent Guittot
2014-03-28 13:22 ` [RFC 1/4] sched: extend the usage of cpu_power_orig Vincent Guittot
2014-04-01 10:39   ` Preeti U Murthy
2014-04-01 11:20     ` Vincent Guittot
2014-04-03 10:40       ` Preeti U Murthy
2014-03-28 13:22 ` [RFC 2/4] ARM: topology: use new cpu_power interface Vincent Guittot
2014-03-28 13:22 ` [RFC 3/4] sched: fix computed capacity for HMP Vincent Guittot
2014-04-15 17:22   ` Peter Zijlstra
2014-04-16  6:54     ` Vincent Guittot
2014-03-29  2:27 ` [RFC 0/4] sched: consolidation of cpu_power Nicolas Pitre
2014-03-31  7:31   ` Vincent Guittot
