All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] arch_topology: Trace the update thermal pressure
@ 2022-04-27  7:35 Lukasz Luba
  2022-04-27  7:44 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Lukasz Luba @ 2022-04-27  7:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: sudeep.holla, dietmar.eggemann, vincent.guittot, gregkh, rafael,
	rostedt, mingo, lukasz.luba

Add trace event to capture the moment of the call for updating the thermal
pressure value. It's helpful to investigate how often those events occur
in a system dealing with throttling. This trace event is needed since the
old 'cdev_update' might not be used by some drivers.

The old 'cdev_update' trace event only provides a cooling state
value: [0, n]. That state value then needs additional tools to translate
it: state -> freq -> capacity -> thermal pressure. This new trace event
just stores proper thermal pressure value in the trace buffer, no need
for additional logic. This is helpful for cooperation when someone can
simply sends to the list the trace buffer output from the platform (no
need from additional information from other subsystems).

There are also platforms which due to some design reasons don't use
cooling devices and thus don't trigger old 'cdev_update' trace event.
They are also important and measuring latency for the thermal signal
raising/decaying characteristics is in scope. This new trace event
would cover them as well.

We already have a trace point 'pelt_thermal_tp' which after a change to
trace event can be paired with this new 'thermal_pressure_update' and
derive more insight what is going on in the system under thermal pressure
(and why).

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---

Changes in v3:
- extended explanation in the changelog following Greg's comment

The v2 is here:
https://lore.kernel.org/lkml/20220425135317.5880-1-lukasz.luba@arm.com/

The v1 and discussion can be found here at:
https://lore.kernel.org/lkml/20220419164801.29078-1-lukasz.luba@arm.com/

Regards,
Lukasz


 drivers/base/arch_topology.c            |  5 +++++
 include/trace/events/thermal_pressure.h | 29 +++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 include/trace/events/thermal_pressure.h

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index f73b836047cf..579c851a2bd7 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -19,6 +19,9 @@
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/thermal_pressure.h>
+
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
@@ -195,6 +198,8 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
 
 	th_pressure = max_capacity - capacity;
 
+	trace_thermal_pressure_update(cpu, th_pressure);
+
 	for_each_cpu(cpu, cpus)
 		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
 }
diff --git a/include/trace/events/thermal_pressure.h b/include/trace/events/thermal_pressure.h
new file mode 100644
index 000000000000..b68680201360
--- /dev/null
+++ b/include/trace/events/thermal_pressure.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM thermal_pressure
+
+#if !defined(_TRACE_THERMAL_PRESSURE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_THERMAL_PRESSURE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(thermal_pressure_update,
+	TP_PROTO(int cpu, unsigned long thermal_pressure),
+	TP_ARGS(cpu, thermal_pressure),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, thermal_pressure)
+		__field(int, cpu)
+	),
+
+	TP_fast_assign(
+		__entry->thermal_pressure = thermal_pressure;
+		__entry->cpu = cpu;
+	),
+
+	TP_printk("cpu=%d thermal_pressure=%lu", __entry->cpu, __entry->thermal_pressure)
+);
+#endif /* _TRACE_THERMAL_PRESSURE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] arch_topology: Trace the update thermal pressure
  2022-04-27  7:35 [PATCH v3] arch_topology: Trace the update thermal pressure Lukasz Luba
@ 2022-04-27  7:44 ` Greg KH
  2022-04-27  7:52   ` Lukasz Luba
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2022-04-27  7:44 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	rafael, rostedt, mingo

On Wed, Apr 27, 2022 at 08:35:51AM +0100, Lukasz Luba wrote:
> Add trace event to capture the moment of the call for updating the thermal
> pressure value. It's helpful to investigate how often those events occur
> in a system dealing with throttling. This trace event is needed since the
> old 'cdev_update' might not be used by some drivers.
> 
> The old 'cdev_update' trace event only provides a cooling state
> value: [0, n]. That state value then needs additional tools to translate
> it: state -> freq -> capacity -> thermal pressure. This new trace event
> just stores proper thermal pressure value in the trace buffer, no need
> for additional logic. This is helpful for cooperation when someone can
> simply sends to the list the trace buffer output from the platform (no
> need from additional information from other subsystems).
> 
> There are also platforms which due to some design reasons don't use
> cooling devices and thus don't trigger old 'cdev_update' trace event.
> They are also important and measuring latency for the thermal signal
> raising/decaying characteristics is in scope. This new trace event
> would cover them as well.
> 
> We already have a trace point 'pelt_thermal_tp' which after a change to
> trace event can be paired with this new 'thermal_pressure_update' and
> derive more insight what is going on in the system under thermal pressure
> (and why).
> 
> Reported-by: kernel test robot <lkp@intel.com>

The kernel test robot did not report that you needed to add a new trace
event :(


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] arch_topology: Trace the update thermal pressure
  2022-04-27  7:44 ` Greg KH
@ 2022-04-27  7:52   ` Lukasz Luba
  2022-04-27  7:58     ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Lukasz Luba @ 2022-04-27  7:52 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	rafael, rostedt, mingo



On 4/27/22 08:44, Greg KH wrote:
> On Wed, Apr 27, 2022 at 08:35:51AM +0100, Lukasz Luba wrote:
>> Add trace event to capture the moment of the call for updating the thermal
>> pressure value. It's helpful to investigate how often those events occur
>> in a system dealing with throttling. This trace event is needed since the
>> old 'cdev_update' might not be used by some drivers.
>>
>> The old 'cdev_update' trace event only provides a cooling state
>> value: [0, n]. That state value then needs additional tools to translate
>> it: state -> freq -> capacity -> thermal pressure. This new trace event
>> just stores proper thermal pressure value in the trace buffer, no need
>> for additional logic. This is helpful for cooperation when someone can
>> simply sends to the list the trace buffer output from the platform (no
>> need from additional information from other subsystems).
>>
>> There are also platforms which due to some design reasons don't use
>> cooling devices and thus don't trigger old 'cdev_update' trace event.
>> They are also important and measuring latency for the thermal signal
>> raising/decaying characteristics is in scope. This new trace event
>> would cover them as well.
>>
>> We already have a trace point 'pelt_thermal_tp' which after a change to
>> trace event can be paired with this new 'thermal_pressure_update' and
>> derive more insight what is going on in the system under thermal pressure
>> (and why).
>>
>> Reported-by: kernel test robot <lkp@intel.com>
> 
> The kernel test robot did not report that you needed to add a new trace
> event :(
> 

I got feedback from the test robot for v1, which figured out that
the riscv configuration is broken. You can find it here
https://lore.kernel.org/lkml/202204201654.vcszVDGb-lkp@intel.com/

So, I've added that tag following:
"If you fix the issue, kindly add following tag as appropriate"

Should this only be honored when a patch actually got into next
and then following patch with a fix would have that tag?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] arch_topology: Trace the update thermal pressure
  2022-04-27  7:52   ` Lukasz Luba
@ 2022-04-27  7:58     ` Greg KH
  2022-04-27  8:01       ` Lukasz Luba
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2022-04-27  7:58 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	rafael, rostedt, mingo

On Wed, Apr 27, 2022 at 08:52:50AM +0100, Lukasz Luba wrote:
> 
> 
> On 4/27/22 08:44, Greg KH wrote:
> > On Wed, Apr 27, 2022 at 08:35:51AM +0100, Lukasz Luba wrote:
> > > Add trace event to capture the moment of the call for updating the thermal
> > > pressure value. It's helpful to investigate how often those events occur
> > > in a system dealing with throttling. This trace event is needed since the
> > > old 'cdev_update' might not be used by some drivers.
> > > 
> > > The old 'cdev_update' trace event only provides a cooling state
> > > value: [0, n]. That state value then needs additional tools to translate
> > > it: state -> freq -> capacity -> thermal pressure. This new trace event
> > > just stores proper thermal pressure value in the trace buffer, no need
> > > for additional logic. This is helpful for cooperation when someone can
> > > simply sends to the list the trace buffer output from the platform (no
> > > need from additional information from other subsystems).
> > > 
> > > There are also platforms which due to some design reasons don't use
> > > cooling devices and thus don't trigger old 'cdev_update' trace event.
> > > They are also important and measuring latency for the thermal signal
> > > raising/decaying characteristics is in scope. This new trace event
> > > would cover them as well.
> > > 
> > > We already have a trace point 'pelt_thermal_tp' which after a change to
> > > trace event can be paired with this new 'thermal_pressure_update' and
> > > derive more insight what is going on in the system under thermal pressure
> > > (and why).
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > 
> > The kernel test robot did not report that you needed to add a new trace
> > event :(
> > 
> 
> I got feedback from the test robot for v1, which figured out that
> the riscv configuration is broken. You can find it here
> https://lore.kernel.org/lkml/202204201654.vcszVDGb-lkp@intel.com/
> 
> So, I've added that tag following:
> "If you fix the issue, kindly add following tag as appropriate"
> 
> Should this only be honored when a patch actually got into next
> and then following patch with a fix would have that tag?

Yes.  And you can mention it in the version information about what
changed between each patch version below the --- line, but as is, you
can see how this does not make sense.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] arch_topology: Trace the update thermal pressure
  2022-04-27  7:58     ` Greg KH
@ 2022-04-27  8:01       ` Lukasz Luba
  0 siblings, 0 replies; 5+ messages in thread
From: Lukasz Luba @ 2022-04-27  8:01 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, sudeep.holla, dietmar.eggemann, vincent.guittot,
	rafael, rostedt, mingo



On 4/27/22 08:58, Greg KH wrote:
> On Wed, Apr 27, 2022 at 08:52:50AM +0100, Lukasz Luba wrote:
>>
>>
>> On 4/27/22 08:44, Greg KH wrote:
>>> On Wed, Apr 27, 2022 at 08:35:51AM +0100, Lukasz Luba wrote:
>>>> Add trace event to capture the moment of the call for updating the thermal
>>>> pressure value. It's helpful to investigate how often those events occur
>>>> in a system dealing with throttling. This trace event is needed since the
>>>> old 'cdev_update' might not be used by some drivers.
>>>>
>>>> The old 'cdev_update' trace event only provides a cooling state
>>>> value: [0, n]. That state value then needs additional tools to translate
>>>> it: state -> freq -> capacity -> thermal pressure. This new trace event
>>>> just stores proper thermal pressure value in the trace buffer, no need
>>>> for additional logic. This is helpful for cooperation when someone can
>>>> simply sends to the list the trace buffer output from the platform (no
>>>> need from additional information from other subsystems).
>>>>
>>>> There are also platforms which due to some design reasons don't use
>>>> cooling devices and thus don't trigger old 'cdev_update' trace event.
>>>> They are also important and measuring latency for the thermal signal
>>>> raising/decaying characteristics is in scope. This new trace event
>>>> would cover them as well.
>>>>
>>>> We already have a trace point 'pelt_thermal_tp' which after a change to
>>>> trace event can be paired with this new 'thermal_pressure_update' and
>>>> derive more insight what is going on in the system under thermal pressure
>>>> (and why).
>>>>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>
>>> The kernel test robot did not report that you needed to add a new trace
>>> event :(
>>>
>>
>> I got feedback from the test robot for v1, which figured out that
>> the riscv configuration is broken. You can find it here
>> https://lore.kernel.org/lkml/202204201654.vcszVDGb-lkp@intel.com/
>>
>> So, I've added that tag following:
>> "If you fix the issue, kindly add following tag as appropriate"
>>
>> Should this only be honored when a patch actually got into next
>> and then following patch with a fix would have that tag?
> 
> Yes.  And you can mention it in the version information about what
> changed between each patch version below the --- line, but as is, you
> can see how this does not make sense.
> 

Thank you Greg for the explanation. I'll remove the tag in v4.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-04-27  8:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27  7:35 [PATCH v3] arch_topology: Trace the update thermal pressure Lukasz Luba
2022-04-27  7:44 ` Greg KH
2022-04-27  7:52   ` Lukasz Luba
2022-04-27  7:58     ` Greg KH
2022-04-27  8:01       ` Lukasz Luba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.