linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
@ 2018-11-27 13:24 Daniel Lezcano
  2018-11-27 13:24 ` [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT Daniel Lezcano
  2018-11-28 11:44 ` [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Juri Lelli
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Lezcano @ 2018-11-27 13:24 UTC (permalink / raw)
  To: rjw
  Cc: linux-kernel, viresh.kumar, Sudeep Holla, Greg Kroah-Hartman,
	Rafael J. Wysocki, Ingo Molnar, Peter Zijlstra (Intel),
	Morten Rasmussen, Juri Lelli

The mutex protects a per_cpu variable access. The potential race can
happen only when the cpufreq governor module is loaded and at the same
time the cpu capacity is changed in the sysfs.

There is no real interest of using a mutex to protect a variable
assignation when there is no situation where a task can take the lock
and block.

Replace the mutex by READ_ONCE / WRITE_ONCE.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/base/arch_topology.c  | 7 +------
 include/linux/arch_topology.h | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..fd5325b 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
 		per_cpu(freq_scale, i) = scale;
 }
 
-static DEFINE_MUTEX(cpu_scale_mutex);
 DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 {
-	per_cpu(cpu_scale, cpu) = capacity;
+	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
 }
 
 static ssize_t cpu_capacity_show(struct device *dev,
@@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev,
 	if (new_capacity > SCHED_CAPACITY_SCALE)
 		return -EINVAL;
 
-	mutex_lock(&cpu_scale_mutex);
 	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
 		topology_set_cpu_scale(i, new_capacity);
-	mutex_unlock(&cpu_scale_mutex);
 
 	schedule_work(&update_topology_flags_work);
 
@@ -141,7 +138,6 @@ void topology_normalize_cpu_scale(void)
 		return;
 
 	pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
-	mutex_lock(&cpu_scale_mutex);
 	for_each_possible_cpu(cpu) {
 		pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
 			 cpu, raw_capacity[cpu]);
@@ -151,7 +147,6 @@ void topology_normalize_cpu_scale(void)
 		pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
 			cpu, topology_get_cpu_scale(NULL, cpu));
 	}
-	mutex_unlock(&cpu_scale_mutex);
 }
 
 bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index d9bdc1a..12c439f 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -20,7 +20,7 @@ struct sched_domain;
 static inline
 unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
 {
-	return per_cpu(cpu_scale, cpu);
+	return READ_ONCE(per_cpu(cpu_scale, cpu));
 }
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT
  2018-11-27 13:24 [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Daniel Lezcano
@ 2018-11-27 13:24 ` Daniel Lezcano
  2018-12-03 13:46   ` Dietmar Eggemann
  2018-11-28 11:44 ` [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Juri Lelli
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Lezcano @ 2018-11-27 13:24 UTC (permalink / raw)
  To: rjw
  Cc: linux-kernel, viresh.kumar, Chris Redpath, Quentin Perret,
	Amit Kucheria, Nicolas Dechesne, Niklas Cassel, Rob Herring,
	Quentin Perret, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, Li Yang, Olof Johansson,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS

In the case of asymmetric SoC with the same micro-architecture, we
have a group of CPUs with smaller OPPs than the other group. One
example is the 96boards dragonboard 820c. There is no dmips/MHz
difference between both groups, so no need to specify the values in
the DT. Unfortunately, without these defined, there is no scaling
capacity computation triggered, so we need to write
'capacity-dmips-mhz' for each CPU with the same value in order to
force the scaled capacity computation.

In order to fix this situation, allocate 'raw_capacity' so the pointer
is set and the init_cpu_capacity_callback() function can be called.

This was tested on db820c:
 - specified values in the DT (correct results)
 - partial values defined in the DT (error + fallback to defaults)
 - no specified values in the DT (correct results)

correct results are:
  cat /sys/devices/system/cpu/cpu*/cpu_capacity
   758
   758
  1024
  1024

  ... respectively for CPU0, CPU1, CPU2 and CPU3.

That reflects the capacity for the max frequencies 1593600 and 2150400.

Cc: Chris Redpath <chris.redpath@linaro.org>
Cc: Quentin Perret <quentin.perret@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Quentin Perret <quentin.perret@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 Documentation/devicetree/bindings/arm/cpu-capacity.txt | 6 ++++++
 drivers/base/arch_topology.c                           | 9 ++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/arm/cpu-capacity.txt b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
index 84262cd..f53a3c9 100644
--- a/Documentation/devicetree/bindings/arm/cpu-capacity.txt
+++ b/Documentation/devicetree/bindings/arm/cpu-capacity.txt
@@ -54,6 +54,12 @@ fall back to the default capacity value for every CPU. If cpufreq is not
 available, final capacities are calculated by directly using capacity-dmips-
 mhz values (normalized w.r.t. the highest value found while parsing the DT).
 
+If capacity-dmips-mhz is not specified or if the parsing fails, the
+default capacity value will be computed against the highest frequency.
+When all CPUs have the same OPP, they will have the same capacity
+value otherwise the capacity will be scaled down for CPUs having lower
+frequencies.
+
 ===========================================
 4 - Examples
 ===========================================
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index fd5325b..696cea5 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -243,9 +243,16 @@ static int __init register_cpufreq_notifier(void)
 	 * until we have the necessary code to parse the cpu capacity, so
 	 * skip registering cpufreq notifier.
 	 */
-	if (!acpi_disabled || !raw_capacity)
+	if (!acpi_disabled)
 		return -EINVAL;
 
+	if (!raw_capacity) {
+		raw_capacity = kmalloc_array(num_possible_cpus(),
+					     sizeof(*raw_capacity), GFP_KERNEL);
+		if (!raw_capacity)
+			return -ENOMEM;
+	}
+
 	if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL)) {
 		pr_err("cpu_capacity: failed to allocate memory for cpus_to_visit\n");
 		return -ENOMEM;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-27 13:24 [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Daniel Lezcano
  2018-11-27 13:24 ` [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT Daniel Lezcano
@ 2018-11-28 11:44 ` Juri Lelli
  2018-11-28 17:54   ` Daniel Lezcano
  1 sibling, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2018-11-28 11:44 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen, Juri Lelli

Hi Daniel,

On 27/11/18 14:24, Daniel Lezcano wrote:
> The mutex protects a per_cpu variable access. The potential race can
> happen only when the cpufreq governor module is loaded and at the same
> time the cpu capacity is changed in the sysfs.
> 
> There is no real interest of using a mutex to protect a variable
> assignation when there is no situation where a task can take the lock
> and block.
> 
> Replace the mutex by READ_ONCE / WRITE_ONCE.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/base/arch_topology.c  | 7 +------
>  include/linux/arch_topology.h | 2 +-
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index edfcf8d..fd5325b 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
>  		per_cpu(freq_scale, i) = scale;
>  }
>  
> -static DEFINE_MUTEX(cpu_scale_mutex);
>  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
>  
>  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
>  {
> -	per_cpu(cpu_scale, cpu) = capacity;
> +	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
>  }
>  
>  static ssize_t cpu_capacity_show(struct device *dev,
> @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev,
>  	if (new_capacity > SCHED_CAPACITY_SCALE)
>  		return -EINVAL;
>  
> -	mutex_lock(&cpu_scale_mutex);
>  	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
>  		topology_set_cpu_scale(i, new_capacity);
> -	mutex_unlock(&cpu_scale_mutex);

IIRC this was meant to ensure atomic updates of all siblings with the new
capacity value. I actually now wonder if readers should not grab the
mutex as well (cpu_capacity_show()). Can't we get into a situation where
a reader might see siblings with intermediate values (while the loop
above is performing an update)?

BTW, please update my email address. :-)

Best,

- Juri

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-28 11:44 ` [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Juri Lelli
@ 2018-11-28 17:54   ` Daniel Lezcano
  2018-11-29  7:04     ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Lezcano @ 2018-11-28 17:54 UTC (permalink / raw)
  To: Juri Lelli
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 28/11/2018 12:44, Juri Lelli wrote:
> Hi Daniel,
> 
> On 27/11/18 14:24, Daniel Lezcano wrote:
>> The mutex protects a per_cpu variable access. The potential race can
>> happen only when the cpufreq governor module is loaded and at the same
>> time the cpu capacity is changed in the sysfs.
>>
>> There is no real interest of using a mutex to protect a variable
>> assignation when there is no situation where a task can take the lock
>> and block.
>>
>> Replace the mutex by READ_ONCE / WRITE_ONCE.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> Cc: Sudeep Holla <sudeep.holla@arm.com>
>> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>  drivers/base/arch_topology.c  | 7 +------
>>  include/linux/arch_topology.h | 2 +-
>>  2 files changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index edfcf8d..fd5325b 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
>>  		per_cpu(freq_scale, i) = scale;
>>  }
>>  
>> -static DEFINE_MUTEX(cpu_scale_mutex);
>>  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
>>  
>>  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
>>  {
>> -	per_cpu(cpu_scale, cpu) = capacity;
>> +	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
>>  }
>>  
>>  static ssize_t cpu_capacity_show(struct device *dev,
>> @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev,
>>  	if (new_capacity > SCHED_CAPACITY_SCALE)
>>  		return -EINVAL;
>>  
>> -	mutex_lock(&cpu_scale_mutex);
>>  	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
>>  		topology_set_cpu_scale(i, new_capacity);
>> -	mutex_unlock(&cpu_scale_mutex);
> 
> IIRC this was meant to ensure atomic updates of all siblings with the new
> capacity value. I actually now wonder if readers should not grab the
> mutex as well (cpu_capacity_show()). Can't we get into a situation where
> a reader might see siblings with intermediate values (while the loop
> above is performing an update)?

With or without this patch, it is the case:

                task1                      task2
                  |                          |
  read("/sys/.../cpu1/cpu_capacity)          |
                  |                  write("/sys/.../cpu1/cpu_capacity")
  read("/sys/.../cpu2/cpu_capacity)          |


There is no guarantee userspace can have a consistent view of the
capacity. As soon as it reads a capacity, it can be changed in its back.


> BTW, please update my email address. :-)

Sure.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-28 17:54   ` Daniel Lezcano
@ 2018-11-29  7:04     ` Juri Lelli
  2018-11-29  9:18       ` Daniel Lezcano
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2018-11-29  7:04 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 28/11/18 18:54, Daniel Lezcano wrote:
> On 28/11/2018 12:44, Juri Lelli wrote:
> > Hi Daniel,
> > 
> > On 27/11/18 14:24, Daniel Lezcano wrote:
> >> The mutex protects a per_cpu variable access. The potential race can
> >> happen only when the cpufreq governor module is loaded and at the same
> >> time the cpu capacity is changed in the sysfs.
> >>
> >> There is no real interest of using a mutex to protect a variable
> >> assignation when there is no situation where a task can take the lock
> >> and block.
> >>
> >> Replace the mutex by READ_ONCE / WRITE_ONCE.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> Cc: Sudeep Holla <sudeep.holla@arm.com>
> >> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
> >> ---
> >>  drivers/base/arch_topology.c  | 7 +------
> >>  include/linux/arch_topology.h | 2 +-
> >>  2 files changed, 2 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> >> index edfcf8d..fd5325b 100644
> >> --- a/drivers/base/arch_topology.c
> >> +++ b/drivers/base/arch_topology.c
> >> @@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
> >>  		per_cpu(freq_scale, i) = scale;
> >>  }
> >>  
> >> -static DEFINE_MUTEX(cpu_scale_mutex);
> >>  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
> >>  
> >>  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
> >>  {
> >> -	per_cpu(cpu_scale, cpu) = capacity;
> >> +	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
> >>  }
> >>  
> >>  static ssize_t cpu_capacity_show(struct device *dev,
> >> @@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev,
> >>  	if (new_capacity > SCHED_CAPACITY_SCALE)
> >>  		return -EINVAL;
> >>  
> >> -	mutex_lock(&cpu_scale_mutex);
> >>  	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> >>  		topology_set_cpu_scale(i, new_capacity);
> >> -	mutex_unlock(&cpu_scale_mutex);
> > 
> > IIRC this was meant to ensure atomic updates of all siblings with the new
> > capacity value. I actually now wonder if readers should not grab the
> > mutex as well (cpu_capacity_show()). Can't we get into a situation where
> > a reader might see siblings with intermediate values (while the loop
> > above is performing an update)?
> 
> With or without this patch, it is the case:
> 
>                 task1                      task2
>                   |                          |
>   read("/sys/.../cpu1/cpu_capacity)          |
>                   |                  write("/sys/.../cpu1/cpu_capacity")
>   read("/sys/.../cpu2/cpu_capacity)          |
> 
> 
> There is no guarantee userspace can have a consistent view of the
> capacity. As soon as it reads a capacity, it can be changed in its back.

True, but w/o the mutex task1 could read different cpu_capacity values
for a cluster (it actually can also with current implementation, we
should grab the mutex in the read path as well if we want to avoid
this). Just pointing this out, not sure it is a problem, though, as even
the notion of strictly equal capacities clusters is going to disappear
AFAIK.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-29  7:04     ` Juri Lelli
@ 2018-11-29  9:18       ` Daniel Lezcano
  2018-11-29  9:58         ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Lezcano @ 2018-11-29  9:18 UTC (permalink / raw)
  To: Juri Lelli
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 29/11/2018 08:04, Juri Lelli wrote:

[ ... ]

>> With or without this patch, it is the case:
>>
>>                 task1                      task2
>>                   |                          |
>>   read("/sys/.../cpu1/cpu_capacity)          |
>>                   |                  write("/sys/.../cpu1/cpu_capacity")
>>   read("/sys/.../cpu2/cpu_capacity)          |
>>
>>
>> There is no guarantee userspace can have a consistent view of the
>> capacity. As soon as it reads a capacity, it can be changed in its back.
> 
> True, but w/o the mutex task1 could read different cpu_capacity values
> for a cluster (it actually can also with current implementation, we
> should grab the mutex in the read path as well if we want to avoid
> this). 

Even if the mutex is on the read path, the userspace can see different
capacities because it will read the cpu_capacity per cpu directory.

The mutex will be take when reading cpu0/cpu_capacity, not for
cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the
lock is released in between.

Do you agree with the patch ? Or do you want me to drop it ?

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-29  9:18       ` Daniel Lezcano
@ 2018-11-29  9:58         ` Juri Lelli
  2018-11-29 10:02           ` Daniel Lezcano
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2018-11-29  9:58 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 29/11/18 10:18, Daniel Lezcano wrote:
> On 29/11/2018 08:04, Juri Lelli wrote:
> 
> [ ... ]
> 
> >> With or without this patch, it is the case:
> >>
> >>                 task1                      task2
> >>                   |                          |
> >>   read("/sys/.../cpu1/cpu_capacity)          |
> >>                   |                  write("/sys/.../cpu1/cpu_capacity")
> >>   read("/sys/.../cpu2/cpu_capacity)          |
> >>
> >>
> >> There is no guarantee userspace can have a consistent view of the
> >> capacity. As soon as it reads a capacity, it can be changed in its back.
> > 
> > True, but w/o the mutex task1 could read different cpu_capacity values
> > for a cluster (it actually can also with current implementation, we
> > should grab the mutex in the read path as well if we want to avoid
> > this). 
> 
> Even if the mutex is on the read path, the userspace can see different
> capacities because it will read the cpu_capacity per cpu directory.
> 
> The mutex will be take when reading cpu0/cpu_capacity, not for
> cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the
> lock is released in between.
> 
> Do you agree with the patch ? Or do you want me to drop it ?

I don't actually have cases at hand that are showing regression with it,
I was just trying to understand if we might potentially hit problems in
the future. So, I'm not against this patch. :-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-29  9:58         ` Juri Lelli
@ 2018-11-29 10:02           ` Daniel Lezcano
  2018-11-29 12:40             ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Lezcano @ 2018-11-29 10:02 UTC (permalink / raw)
  To: Juri Lelli
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 29/11/2018 10:58, Juri Lelli wrote:
> On 29/11/18 10:18, Daniel Lezcano wrote:
>> On 29/11/2018 08:04, Juri Lelli wrote:
>>
>> [ ... ]
>>
>>>> With or without this patch, it is the case:
>>>>
>>>>                 task1                      task2
>>>>                   |                          |
>>>>   read("/sys/.../cpu1/cpu_capacity)          |
>>>>                   |                  write("/sys/.../cpu1/cpu_capacity")
>>>>   read("/sys/.../cpu2/cpu_capacity)          |
>>>>
>>>>
>>>> There is no guarantee userspace can have a consistent view of the
>>>> capacity. As soon as it reads a capacity, it can be changed in its back.
>>>
>>> True, but w/o the mutex task1 could read different cpu_capacity values
>>> for a cluster (it actually can also with current implementation, we
>>> should grab the mutex in the read path as well if we want to avoid
>>> this). 
>>
>> Even if the mutex is on the read path, the userspace can see different
>> capacities because it will read the cpu_capacity per cpu directory.
>>
>> The mutex will be take when reading cpu0/cpu_capacity, not for
>> cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the
>> lock is released in between.
>>
>> Do you agree with the patch ? Or do you want me to drop it ?
> 
> I don't actually have cases at hand that are showing regression with it,
> I was just trying to understand if we might potentially hit problems in
> the future. So, I'm not against this patch. :-)

not-not-acked-by ? :)

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE
  2018-11-29 10:02           ` Daniel Lezcano
@ 2018-11-29 12:40             ` Juri Lelli
  0 siblings, 0 replies; 11+ messages in thread
From: Juri Lelli @ 2018-11-29 12:40 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, linux-kernel, viresh.kumar, Sudeep Holla,
	Greg Kroah-Hartman, Rafael J. Wysocki, Ingo Molnar,
	Peter Zijlstra (Intel),
	Morten Rasmussen

On 29/11/18 11:02, Daniel Lezcano wrote:
> On 29/11/2018 10:58, Juri Lelli wrote:
> > On 29/11/18 10:18, Daniel Lezcano wrote:
> >> On 29/11/2018 08:04, Juri Lelli wrote:
> >>
> >> [ ... ]
> >>
> >>>> With or without this patch, it is the case:
> >>>>
> >>>>                 task1                      task2
> >>>>                   |                          |
> >>>>   read("/sys/.../cpu1/cpu_capacity)          |
> >>>>                   |                  write("/sys/.../cpu1/cpu_capacity")
> >>>>   read("/sys/.../cpu2/cpu_capacity)          |
> >>>>
> >>>>
> >>>> There is no guarantee userspace can have a consistent view of the
> >>>> capacity. As soon as it reads a capacity, it can be changed in its back.
> >>>
> >>> True, but w/o the mutex task1 could read different cpu_capacity values
> >>> for a cluster (it actually can also with current implementation, we
> >>> should grab the mutex in the read path as well if we want to avoid
> >>> this). 
> >>
> >> Even if the mutex is on the read path, the userspace can see different
> >> capacities because it will read the cpu_capacity per cpu directory.
> >>
> >> The mutex will be take when reading cpu0/cpu_capacity, not for
> >> cpu[0-9]/cpu_capacity. Between two reads, a write can happen because the
> >> lock is released in between.
> >>
> >> Do you agree with the patch ? Or do you want me to drop it ?
> > 
> > I don't actually have cases at hand that are showing regression with it,
> > I was just trying to understand if we might potentially hit problems in
> > the future. So, I'm not against this patch. :-)
> 
> not-not-acked-by ? :)

:-)

I'm not maintaining this, so,

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT
  2018-11-27 13:24 ` [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT Daniel Lezcano
@ 2018-12-03 13:46   ` Dietmar Eggemann
  2018-12-04 10:02     ` Daniel Lezcano
  0 siblings, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2018-12-03 13:46 UTC (permalink / raw)
  To: Daniel Lezcano, rjw
  Cc: linux-kernel, viresh.kumar, Chris Redpath, Quentin Perret,
	Amit Kucheria, Nicolas Dechesne, Niklas Cassel, Rob Herring,
	Quentin Perret, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, Li Yang, Olof Johansson,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Russell King

Hi Daniel,

+cc: Russell King <linux@armlinux.org.uk>

On 11/27/18 2:24 PM, Daniel Lezcano wrote:
> In the case of asymmetric SoC with the same micro-architecture, we
> have a group of CPUs with smaller OPPs than the other group. One
> example is the 96boards dragonboard 820c. There is no dmips/MHz
> difference between both groups, so no need to specify the values in
> the DT. Unfortunately, without these defined, there is no scaling
> capacity computation triggered, so we need to write
> 'capacity-dmips-mhz' for each CPU with the same value in order to
> force the scaled capacity computation.
> 
> In order to fix this situation, allocate 'raw_capacity' so the pointer
> is set and the init_cpu_capacity_callback() function can be called.
> 
> This was tested on db820c:
>   - specified values in the DT (correct results)
>   - partial values defined in the DT (error + fallback to defaults)
>   - no specified values in the DT (correct results)
> 
> correct results are:
>    cat /sys/devices/system/cpu/cpu*/cpu_capacity
>     758
>     758
>    1024
>    1024
> 
>    ... respectively for CPU0, CPU1, CPU2 and CPU3.
> 
> That reflects the capacity for the max frequencies 1593600 and 2150400.

[...]

I'm afraid that this change is incompatible with the still existing 
cpu_efficiency interface we have in Arm32 for A15/A7 systems like Arm TC2:

In case you specify clock-frequency dt properties per cpu for such a 
system, the cpu_capacity values are determined via the cpu_efficiency 
code in arch/arm/kernel/topology.c.

So on Arm TC2 with clock-frequency = <1000000000> [A15] and <800000000> 
[A7] you get:

root@linaro-nano:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
606
1441
1441
606
606

With your patches on top (cpu_capacity functionality in 
drivers/base/arch_topology.c does not have to be switched on by 
specifying capacity-dmips-mhz dt properties anymore) we end up scaling 
by max frequency again:

root@linaro-nano:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
358
1024
1024
358
358

I tried to remove the cpu_efficiency based API a year ago but Russell 
pointed out that the compatibility has to be maintained for longer:

https://lore.kernel.org/lkml/20171024102718.16113-1-dietmar.eggemann@arm.com/

I assume that the capacity-dmips-mhz dt property is like a switch to 
turn this functionality on for big.Little and so called gold/silver 
platforms, which have cores with the same uArch but in frequency domains 
with different max frequency values.

So what's wrong with specifying capacity-dmips-mhz = <1024> for all 
cores for those gold/silver platforms? I don't expect that there will be 
so many of them. And normal SMP platforms (w/o frequency domains w/o 
different max frequency values) don't have to execute this code.

IMHO, at least we should remove the cpu_efficiency bits before we do 
this change.

[...]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT
  2018-12-03 13:46   ` Dietmar Eggemann
@ 2018-12-04 10:02     ` Daniel Lezcano
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Lezcano @ 2018-12-04 10:02 UTC (permalink / raw)
  To: Dietmar Eggemann, rjw
  Cc: linux-kernel, viresh.kumar, Chris Redpath, Quentin Perret,
	Amit Kucheria, Nicolas Dechesne, Niklas Cassel, Rob Herring,
	Quentin Perret, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, Li Yang, Olof Johansson,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Russell King


Hi Dietmar,

thanks for the review and spotting this.

On 03/12/2018 14:46, Dietmar Eggemann wrote:
> Hi Daniel,
> 
> +cc: Russell King <linux@armlinux.org.uk>
> 
> On 11/27/18 2:24 PM, Daniel Lezcano wrote:
>> In the case of asymmetric SoC with the same micro-architecture, we
>> have a group of CPUs with smaller OPPs than the other group. One
>> example is the 96boards dragonboard 820c. There is no dmips/MHz
>> difference between both groups, so no need to specify the values in
>> the DT. Unfortunately, without these defined, there is no scaling
>> capacity computation triggered, so we need to write
>> 'capacity-dmips-mhz' for each CPU with the same value in order to
>> force the scaled capacity computation.
>>
>> In order to fix this situation, allocate 'raw_capacity' so the pointer
>> is set and the init_cpu_capacity_callback() function can be called.
>>
>> This was tested on db820c:
>>   - specified values in the DT (correct results)
>>   - partial values defined in the DT (error + fallback to defaults)
>>   - no specified values in the DT (correct results)
>>
>> correct results are:
>>    cat /sys/devices/system/cpu/cpu*/cpu_capacity
>>     758
>>     758
>>    1024
>>    1024
>>
>>    ... respectively for CPU0, CPU1, CPU2 and CPU3.
>>
>> That reflects the capacity for the max frequencies 1593600 and 2150400.
> 
> [...]
> 
> I'm afraid that this change is incompatible with the still existing
> cpu_efficiency interface we have in Arm32 for A15/A7 systems like Arm TC2:
> 
> In case you specify clock-frequency dt properties per cpu for such a
> system, the cpu_capacity values are determined via the cpu_efficiency
> code in arch/arm/kernel/topology.c.
> 
> So on Arm TC2 with clock-frequency = <1000000000> [A15] and <800000000>
> [A7] you get:
> 
> root@linaro-nano:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 606
> 1441
> 1441
> 606
> 606
> 
> With your patches on top (cpu_capacity functionality in
> drivers/base/arch_topology.c does not have to be switched on by
> specifying capacity-dmips-mhz dt properties anymore) we end up scaling
> by max frequency again:
> 
> root@linaro-nano:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 358
> 1024
> 1024
> 358
> 358
> 
> I tried to remove the cpu_efficiency based API a year ago but Russell
> pointed out that the compatibility has to be maintained for longer:
> 
> https://lore.kernel.org/lkml/20171024102718.16113-1-dietmar.eggemann@arm.com/
> 
> 
> I assume that the capacity-dmips-mhz dt property is like a switch to
> turn this functionality on for big.Little and so called gold/silver
> platforms, which have cores with the same uArch but in frequency domains
> with different max frequency values.
> 
> So what's wrong with specifying capacity-dmips-mhz = <1024> for all
> cores for those gold/silver platforms? 

There is nothing wrong, I just don't like to specify in a DT a default
values.

> I don't expect that there will be
> so many of them. And normal SMP platforms (w/o frequency domains w/o
> different max frequency values) don't have to execute this code.

Fair enough, I will send a DT change, I'm tired of playing mikado with
this code.

Thanks again for the review.

  -- Daniel


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-12-04 10:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-27 13:24 [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Daniel Lezcano
2018-11-27 13:24 ` [PATCH V5 2/2] base/drivers/arch_topology: Default dmips-mhz if they are not set in DT Daniel Lezcano
2018-12-03 13:46   ` Dietmar Eggemann
2018-12-04 10:02     ` Daniel Lezcano
2018-11-28 11:44 ` [PATCH V5 1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE Juri Lelli
2018-11-28 17:54   ` Daniel Lezcano
2018-11-29  7:04     ` Juri Lelli
2018-11-29  9:18       ` Daniel Lezcano
2018-11-29  9:58         ` Juri Lelli
2018-11-29 10:02           ` Daniel Lezcano
2018-11-29 12:40             ` Juri Lelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).