All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arch_topology: Trace the update thermal pressure
@ 2022-04-19 16:48 Lukasz Luba
  2022-04-20  8:34 ` kernel test robot
  2022-04-20 23:44 ` Steven Rostedt
  0 siblings, 2 replies; 6+ messages in thread
From: Lukasz Luba @ 2022-04-19 16:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: sudeep.holla, dietmar.eggemann, vincent.guittot, gregkh, rafael,
	rostedt, mingo, lukasz.luba

Add trace event to capture the moment of the call for updating the thermal
pressure value. It's helpful to investigate how often those events occurs
in a system dealing with throttling. This trace event is needed since the
old 'cdev_update' might not be used by some drivers. Also, the new trace
event shows capacity value, not a cooling state.

We already have a trace point 'pelt_thermal_tp' which after a change to
trace event can be paired with this new 'thermal_pressure_update' and
derive more insight what is going on in the system under pressure
(and why).

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/base/arch_topology.c   |  4 ++++
 include/trace/events/thermal.h | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1d6636ebaac5..4f0392de3081 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -19,6 +19,8 @@
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 
+#include <trace/events/thermal.h>
+
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
@@ -195,6 +197,8 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
 
 	th_pressure = max_capacity - capacity;
 
+	trace_thermal_pressure_update(cpu, th_pressure);
+
 	for_each_cpu(cpu, cpus)
 		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
 }
diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
index 8a5f04888abd..1bf08ee1a25b 100644
--- a/include/trace/events/thermal.h
+++ b/include/trace/events/thermal.h
@@ -65,6 +65,25 @@ TRACE_EVENT(cdev_update,
 	TP_printk("type=%s target=%lu", __get_str(type), __entry->target)
 );
 
+TRACE_EVENT(thermal_pressure_update,
+
+	TP_PROTO(int cpu, unsigned long thermal_pressure),
+
+	TP_ARGS(cpu, thermal_pressure),
+
+	TP_STRUCT__entry(
+		__field(int, cpu)
+		__field(unsigned long, thermal_pressure)
+	),
+
+	TP_fast_assign(
+		__entry->cpu = cpu;
+		__entry->thermal_pressure = thermal_pressure;
+	),
+
+	TP_printk("cpu=%d thermal_pressure=%lu", __entry->cpu, __entry->thermal_pressure)
+);
+
 TRACE_EVENT(thermal_zone_trip,
 
 	TP_PROTO(struct thermal_zone_device *tz, int trip,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] arch_topology: Trace the update thermal pressure
  2022-04-19 16:48 [PATCH] arch_topology: Trace the update thermal pressure Lukasz Luba
@ 2022-04-20  8:34 ` kernel test robot
  2022-04-25 10:05     ` Lukasz Luba
  2022-04-20 23:44 ` Steven Rostedt
  1 sibling, 1 reply; 6+ messages in thread
From: kernel test robot @ 2022-04-20  8:34 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel
  Cc: kbuild-all, sudeep.holla, dietmar.eggemann, vincent.guittot,
	gregkh, rafael, rostedt, mingo, lukasz.luba

Hi Lukasz,

I love your patch! Yet something to improve:

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on rostedt-trace/for-next linus/master linux/master v5.18-rc3 next-20220419]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 3123109284176b1532874591f7c81f3837bbdc17
config: riscv-randconfig-c024-20220420 (https://download.01.org/0day-ci/archive/20220420/202204201654.vcszVDGb-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/aeff700a88be6a7469acfc312155bd213f76de95
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
        git checkout aeff700a88be6a7469acfc312155bd213f76de95
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L18':
>> arch_topology.c:(.text+0x136): undefined reference to `__traceiter_thermal_pressure_update'
   riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L20':
>> arch_topology.c:(.text+0x168): undefined reference to `__tracepoint_thermal_pressure_update'
>> riscv64-linux-ld: arch_topology.c:(.text+0x16c): undefined reference to `__tracepoint_thermal_pressure_update'
   riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L0 ':
>> arch_topology.c:(__jump_table+0x8): undefined reference to `__tracepoint_thermal_pressure_update'

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arch_topology: Trace the update thermal pressure
  2022-04-19 16:48 [PATCH] arch_topology: Trace the update thermal pressure Lukasz Luba
  2022-04-20  8:34 ` kernel test robot
@ 2022-04-20 23:44 ` Steven Rostedt
  2022-04-21  6:26   ` Lukasz Luba
  1 sibling, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2022-04-20 23:44 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	gregkh, rafael, mingo

On Tue, 19 Apr 2022 17:48:01 +0100
Lukasz Luba <lukasz.luba@arm.com> wrote:

> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 1d6636ebaac5..4f0392de3081 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -19,6 +19,8 @@
>  #include <linux/rcupdate.h>
>  #include <linux/sched.h>
>  
> +#include <trace/events/thermal.h>
> +
>  static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
>  static struct cpumask scale_freq_counters_mask;
>  static bool scale_freq_invariant;
> @@ -195,6 +197,8 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>  
>  	th_pressure = max_capacity - capacity;
>  
> +	trace_thermal_pressure_update(cpu, th_pressure);
> +
>  	for_each_cpu(cpu, cpus)
>  		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
>  }
> diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
> index 8a5f04888abd..1bf08ee1a25b 100644
> --- a/include/trace/events/thermal.h
> +++ b/include/trace/events/thermal.h
> @@ -65,6 +65,25 @@ TRACE_EVENT(cdev_update,
>  	TP_printk("type=%s target=%lu", __get_str(type), __entry->target)
>  );
>  
> +TRACE_EVENT(thermal_pressure_update,
> +
> +	TP_PROTO(int cpu, unsigned long thermal_pressure),
> +
> +	TP_ARGS(cpu, thermal_pressure),
> +
> +	TP_STRUCT__entry(
> +		__field(int, cpu)
> +		__field(unsigned long, thermal_pressure)

Note, it is always best to place the bigger object before the smaller one
(when not properly aligned), as that will help to prevent structure
"holes". That is:

		__field(unsigned long, thermal_pressure)
		__field(int, cpu)


Otherwise, you are pretty much guaranteed to have a 4 byte hole between cpu
and thermal_pressure on 64 bit machines.

Also, for the warning you got from the test robot, if you are using this in
a module and defining it in the core kernel, you need to add:

EXPORT_TRACEPOINT_SYMBOL_GPL(thermal_pressure_update);

Somewhere in the C file that includes this file and defines
CREATE_TRACE_POINTS.

-- Steve



> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu = cpu;
> +		__entry->thermal_pressure = thermal_pressure;
> +	),
> +
> +	TP_printk("cpu=%d thermal_pressure=%lu", __entry->cpu, __entry->thermal_pressure)
> +);
> +
>  TRACE_EVENT(thermal_zone_trip,
>  
>  	TP_PROTO(struct thermal_zone_device *tz, int trip,


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arch_topology: Trace the update thermal pressure
  2022-04-20 23:44 ` Steven Rostedt
@ 2022-04-21  6:26   ` Lukasz Luba
  0 siblings, 0 replies; 6+ messages in thread
From: Lukasz Luba @ 2022-04-21  6:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	gregkh, rafael, mingo



On 4/21/22 00:44, Steven Rostedt wrote:
> On Tue, 19 Apr 2022 17:48:01 +0100
> Lukasz Luba <lukasz.luba@arm.com> wrote:
> 
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 1d6636ebaac5..4f0392de3081 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -19,6 +19,8 @@
>>   #include <linux/rcupdate.h>
>>   #include <linux/sched.h>
>>   
>> +#include <trace/events/thermal.h>
>> +
>>   static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
>>   static struct cpumask scale_freq_counters_mask;
>>   static bool scale_freq_invariant;
>> @@ -195,6 +197,8 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
>>   
>>   	th_pressure = max_capacity - capacity;
>>   
>> +	trace_thermal_pressure_update(cpu, th_pressure);
>> +
>>   	for_each_cpu(cpu, cpus)
>>   		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
>>   }
>> diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
>> index 8a5f04888abd..1bf08ee1a25b 100644
>> --- a/include/trace/events/thermal.h
>> +++ b/include/trace/events/thermal.h
>> @@ -65,6 +65,25 @@ TRACE_EVENT(cdev_update,
>>   	TP_printk("type=%s target=%lu", __get_str(type), __entry->target)
>>   );
>>   
>> +TRACE_EVENT(thermal_pressure_update,
>> +
>> +	TP_PROTO(int cpu, unsigned long thermal_pressure),
>> +
>> +	TP_ARGS(cpu, thermal_pressure),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(int, cpu)
>> +		__field(unsigned long, thermal_pressure)
> 
> Note, it is always best to place the bigger object before the smaller one
> (when not properly aligned), as that will help to prevent structure
> "holes". That is:
> 
> 		__field(unsigned long, thermal_pressure)
> 		__field(int, cpu)
> 
> 
> Otherwise, you are pretty much guaranteed to have a 4 byte hole between cpu
> and thermal_pressure on 64 bit machines.
> 
> Also, for the warning you got from the test robot, if you are using this in
> a module and defining it in the core kernel, you need to add:
> 
> EXPORT_TRACEPOINT_SYMBOL_GPL(thermal_pressure_update);
> 
> Somewhere in the C file that includes this file and defines
> CREATE_TRACE_POINTS.
> 
> -- Steve
> 
> 

Thank you Steve, I'll swap those fields and add the needed export.
I have to go through those older trace events in that file and
understand if they are correct, since I just followed the pattern
in 'cdev_update'.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arch_topology: Trace the update thermal pressure
  2022-04-20  8:34 ` kernel test robot
@ 2022-04-25 10:05     ` Lukasz Luba
  0 siblings, 0 replies; 6+ messages in thread
From: Lukasz Luba @ 2022-04-25 10:05 UTC (permalink / raw)
  To: kernel test robot, linux-kernel
  Cc: kbuild-all, sudeep.holla, dietmar.eggemann, vincent.guittot,
	gregkh, rafael, rostedt, mingo



On 4/20/22 09:34, kernel test robot wrote:
> Hi Lukasz,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on driver-core/driver-core-testing]
> [also build test ERROR on rostedt-trace/for-next linus/master linux/master v5.18-rc3 next-20220419]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 3123109284176b1532874591f7c81f3837bbdc17
> config: riscv-randconfig-c024-20220420 (https://download.01.org/0day-ci/archive/20220420/202204201654.vcszVDGb-lkp@intel.com/config)
> compiler: riscv64-linux-gcc (GCC) 11.2.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/intel-lab-lkp/linux/commit/aeff700a88be6a7469acfc312155bd213f76de95
>          git remote add linux-review https://github.com/intel-lab-lkp/linux
>          git fetch --no-tags linux-review Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
>          git checkout aeff700a88be6a7469acfc312155bd213f76de95
>          # save the config file
>          mkdir build_dir && cp config build_dir/.config
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L18':
>>> arch_topology.c:(.text+0x136): undefined reference to `__traceiter_thermal_pressure_update'
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L20':
>>> arch_topology.c:(.text+0x168): undefined reference to `__tracepoint_thermal_pressure_update'
>>> riscv64-linux-ld: arch_topology.c:(.text+0x16c): undefined reference to `__tracepoint_thermal_pressure_update'
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L0 ':
>>> arch_topology.c:(__jump_table+0x8): undefined reference to `__tracepoint_thermal_pressure_update'
> 

I've reproduced this issue and experimented with a few solutions.
that config file doesn't set CONFIG_THERMAL, where normally we have
in thermal_core.c the:
   24 #define CREATE_TRACE_POINTS
   25 #include <trace/events/thermal.h>

(similar mechanism we have for thermal_power_allocator.h and IPA.)

We normally test w/ this config, but I'll also start checking the build
w/o thermal subsystem.

I cannot add those two lines in the arch_topology.c since it complains
about multiple definitions for many entries.

Thus, I'll create a separate header thermal_pressure.h to put it in the
arch_topology.c.

I'll also export the symbol as suggested by Steve, so some potential
cpufreq modules could just us it from there.

I'll also re-visit the trace events that we have in thermal subsystem,
since they are not exported but some modules might would like
to use (some of) them...

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arch_topology: Trace the update thermal pressure
@ 2022-04-25 10:05     ` Lukasz Luba
  0 siblings, 0 replies; 6+ messages in thread
From: Lukasz Luba @ 2022-04-25 10:05 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3415 bytes --]



On 4/20/22 09:34, kernel test robot wrote:
> Hi Lukasz,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on driver-core/driver-core-testing]
> [also build test ERROR on rostedt-trace/for-next linus/master linux/master v5.18-rc3 next-20220419]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 3123109284176b1532874591f7c81f3837bbdc17
> config: riscv-randconfig-c024-20220420 (https://download.01.org/0day-ci/archive/20220420/202204201654.vcszVDGb-lkp(a)intel.com/config)
> compiler: riscv64-linux-gcc (GCC) 11.2.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # https://github.com/intel-lab-lkp/linux/commit/aeff700a88be6a7469acfc312155bd213f76de95
>          git remote add linux-review https://github.com/intel-lab-lkp/linux
>          git fetch --no-tags linux-review Lukasz-Luba/arch_topology-Trace-the-update-thermal-pressure/20220420-005845
>          git checkout aeff700a88be6a7469acfc312155bd213f76de95
>          # save the config file
>          mkdir build_dir && cp config build_dir/.config
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L18':
>>> arch_topology.c:(.text+0x136): undefined reference to `__traceiter_thermal_pressure_update'
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L20':
>>> arch_topology.c:(.text+0x168): undefined reference to `__tracepoint_thermal_pressure_update'
>>> riscv64-linux-ld: arch_topology.c:(.text+0x16c): undefined reference to `__tracepoint_thermal_pressure_update'
>     riscv64-linux-ld: drivers/base/arch_topology.o: in function `.L0 ':
>>> arch_topology.c:(__jump_table+0x8): undefined reference to `__tracepoint_thermal_pressure_update'
> 

I've reproduced this issue and experimented with a few solutions.
that config file doesn't set CONFIG_THERMAL, where normally we have
in thermal_core.c the:
   24 #define CREATE_TRACE_POINTS
   25 #include <trace/events/thermal.h>

(similar mechanism we have for thermal_power_allocator.h and IPA.)

We normally test w/ this config, but I'll also start checking the build
w/o thermal subsystem.

I cannot add those two lines in the arch_topology.c since it complains
about multiple definitions for many entries.

Thus, I'll create a separate header thermal_pressure.h to put it in the
arch_topology.c.

I'll also export the symbol as suggested by Steve, so some potential
cpufreq modules could just us it from there.

I'll also re-visit the trace events that we have in thermal subsystem,
since they are not exported but some modules might would like
to use (some of) them...

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-25 10:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-19 16:48 [PATCH] arch_topology: Trace the update thermal pressure Lukasz Luba
2022-04-20  8:34 ` kernel test robot
2022-04-25 10:05   ` Lukasz Luba
2022-04-25 10:05     ` Lukasz Luba
2022-04-20 23:44 ` Steven Rostedt
2022-04-21  6:26   ` Lukasz Luba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.