All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
@ 2023-12-23 18:06 kernel test robot
  0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2023-12-23 18:06 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20231219092539.3655172-2-daniel.lezcano@linaro.org>
References: <20231219092539.3655172-2-daniel.lezcano@linaro.org>
TO: Daniel Lezcano <daniel.lezcano@linaro.org>
TO: rjw@rjwysocki.net
TO: daniel.lezcano@linaro.org
TO: lukasz.luba@arm.com
CC: rui.zhang@intel.com
CC: linux-kernel@vger.kernel.org
CC: linux-pm@vger.kernel.org
CC: "Rafael J. Wysocki" <rafael@kernel.org>

Hi Daniel,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/thermal]
[also build test WARNING on next-20231222]
[cannot apply to linus/master v6.7-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Lezcano/thermal-debugfs-Add-thermal-debugfs-information-for-mitigation-episodes/20231219-172732
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git thermal
patch link:    https://lore.kernel.org/r/20231219092539.3655172-2-daniel.lezcano%40linaro.org
patch subject: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
:::::: branch date: 4 days ago
:::::: commit date: 4 days ago
config: x86_64-randconfig-161-20231222 (https://download.01.org/0day-ci/archive/20231224/202312240142.RgAg0VLT-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Closes: https://lore.kernel.org/r/202312240142.RgAg0VLT-lkp@intel.com/

smatch warnings:
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xc2'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xb0'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xc2'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xb0'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xc2'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xb0'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xc2'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xb0'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xc2'
drivers/thermal/thermal_debugfs.c:700 tze_seq_show() warn: format string contains non-ascii character '\xb0'

vim +700 drivers/thermal/thermal_debugfs.c

7fb73c6c804962 Daniel Lezcano 2023-12-19  687  
7fb73c6c804962 Daniel Lezcano 2023-12-19  688  static int tze_seq_show(struct seq_file *s, void *v)
7fb73c6c804962 Daniel Lezcano 2023-12-19  689  {
7fb73c6c804962 Daniel Lezcano 2023-12-19  690  	struct thermal_zone_device *tz = s->private;
7fb73c6c804962 Daniel Lezcano 2023-12-19  691  	struct tz_events *tze;
7fb73c6c804962 Daniel Lezcano 2023-12-19  692  	int i;
7fb73c6c804962 Daniel Lezcano 2023-12-19  693  
7fb73c6c804962 Daniel Lezcano 2023-12-19  694  	tze = list_entry((struct list_head *)v, struct tz_events, node);
7fb73c6c804962 Daniel Lezcano 2023-12-19  695  
7fb73c6c804962 Daniel Lezcano 2023-12-19  696  	seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
7fb73c6c804962 Daniel Lezcano 2023-12-19  697  		   ktime_to_us(tze->timestamp),
7fb73c6c804962 Daniel Lezcano 2023-12-19  698  		   ktime_to_ms(tze->duration));
7fb73c6c804962 Daniel Lezcano 2023-12-19  699  
7fb73c6c804962 Daniel Lezcano 2023-12-19 @700  	seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |\n");
7fb73c6c804962 Daniel Lezcano 2023-12-19  701  	
7fb73c6c804962 Daniel Lezcano 2023-12-19  702  	for (i = 0; i < tz->num_trips; i++) {
7fb73c6c804962 Daniel Lezcano 2023-12-19  703  
7fb73c6c804962 Daniel Lezcano 2023-12-19  704  		struct thermal_trip trip;
7fb73c6c804962 Daniel Lezcano 2023-12-19  705  		const char *type;
7fb73c6c804962 Daniel Lezcano 2023-12-19  706  		
7fb73c6c804962 Daniel Lezcano 2023-12-19  707  		if (__thermal_zone_get_trip(tz, i, &trip))
7fb73c6c804962 Daniel Lezcano 2023-12-19  708  			continue;
7fb73c6c804962 Daniel Lezcano 2023-12-19  709  
7fb73c6c804962 Daniel Lezcano 2023-12-19  710  		/*
7fb73c6c804962 Daniel Lezcano 2023-12-19  711  		 * There is no possible mitigation happening at the
7fb73c6c804962 Daniel Lezcano 2023-12-19  712  		 * critical trip point, so the stats will be always
7fb73c6c804962 Daniel Lezcano 2023-12-19  713  		 * zero, skip this trip point
7fb73c6c804962 Daniel Lezcano 2023-12-19  714  		 */
7fb73c6c804962 Daniel Lezcano 2023-12-19  715  		if (trip.type == THERMAL_TRIP_CRITICAL)
7fb73c6c804962 Daniel Lezcano 2023-12-19  716  			continue;
7fb73c6c804962 Daniel Lezcano 2023-12-19  717  
7fb73c6c804962 Daniel Lezcano 2023-12-19  718  		if (trip.type == THERMAL_TRIP_PASSIVE)
7fb73c6c804962 Daniel Lezcano 2023-12-19  719  			type = "passive";
7fb73c6c804962 Daniel Lezcano 2023-12-19  720  		else if (trip.type == THERMAL_TRIP_ACTIVE)
7fb73c6c804962 Daniel Lezcano 2023-12-19  721  			type = "active";
7fb73c6c804962 Daniel Lezcano 2023-12-19  722  		else
7fb73c6c804962 Daniel Lezcano 2023-12-19  723  			type = "hot";
7fb73c6c804962 Daniel Lezcano 2023-12-19  724  
7fb73c6c804962 Daniel Lezcano 2023-12-19  725  		seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
7fb73c6c804962 Daniel Lezcano 2023-12-19  726  			   4 , i,
7fb73c6c804962 Daniel Lezcano 2023-12-19  727  			   8, type,
7fb73c6c804962 Daniel Lezcano 2023-12-19  728  			   9, trip.temperature,
7fb73c6c804962 Daniel Lezcano 2023-12-19  729  			   9, trip.hysteresis,
7fb73c6c804962 Daniel Lezcano 2023-12-19  730  			   10, ktime_to_ms(tze->trip_stats[i].duration),
7fb73c6c804962 Daniel Lezcano 2023-12-19  731  			   9, tze->trip_stats[i].avg,
7fb73c6c804962 Daniel Lezcano 2023-12-19  732  			   9, tze->trip_stats[i].min,
7fb73c6c804962 Daniel Lezcano 2023-12-19  733  			   9, tze->trip_stats[i].max);
7fb73c6c804962 Daniel Lezcano 2023-12-19  734  	}
7fb73c6c804962 Daniel Lezcano 2023-12-19  735  
7fb73c6c804962 Daniel Lezcano 2023-12-19  736  	return 0;
7fb73c6c804962 Daniel Lezcano 2023-12-19  737  }
7fb73c6c804962 Daniel Lezcano 2023-12-19  738  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
  2023-12-23 11:41     ` Daniel Lezcano
@ 2023-12-27 19:53       ` Rafael J. Wysocki
  0 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-27 19:53 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, rjw, lukasz.luba, rui.zhang, linux-kernel, linux-pm

Hi Daniel,

On Sat, Dec 23, 2023 at 12:41 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Rafael,
>
> On 21/12/2023 20:26, Rafael J. Wysocki wrote:
>
> [ ... ]
>
>
> >> +/**
> >> + * struct tz_events - Store all events related to a mitigation episode
> >> + *
> >> + * The tz_events structure describes a mitigation episode.
> >
> > So why not call it tz_mitigation?
>
> A mitigation episode = N x tz_events
>
> eg.
> trip A = passive cooling - cpufreq cluster0
> trip B = passive cooling - cpufreq cluster0 + cluster1
> trip C = active cooling + fan
>
> temperature trip A < trip B < trip C
>
> The mitigation episode, as defined, begins at trip A, and we can have
> multiple events (eg. trip B crossed several times, trip C, then trip B
> again etc ...).

I understand this, but I thought that tz_events represented the entire episode.

> [ ... ]
>
> >> +       if (dfs->tz.trip_index < 0) {
> >> +               tze = thermal_debugfs_tz_event_alloc(tz, now);
> >> +               if (!tze)
> >> +                       return;
> >> +
> >> +               list_add(&tze->node, &dfs->tz.tz_events);
> >> +       }
> >> +
> >> +       dfs->tz.trip_index++;
> >> +       dfs->tz.trips_crossed[dfs->tz.trip_index] = trip_id;
> >
> > So trip_index is an index into trips_crossed[] and the value is the ID
> > of the trip passed by thermal_debug_tz_trip_up() IIUC, so the trip IDs
> > in trips_crossed[] are always sorted by the trip temperature, in the
> > ascending order.
> >
> > It would be good to write this down somewhere in a comment.
> >
> > And what if trip temperatures change during a mitigation episode such
> > that the order by the trip temperature changes?
>
> Changing a trip point temperature during a mitigation is a general
> question about the thermal framework.
>
> How the governors will behave with such a change on the fly while they
> are in action?
>
> IMO, we should prevent to change a trip point temperature when this one
> is crossed and has a cooling device bound to it.

Well, it's not that simple.

There are legitimate cases in which this can happen on purpose.  For
example, the ACPI spec does not provide a way to get a trip hysteresis
from the platform firmware and instead it recommends the firmware to
update the trip temperature every time it is crossed on the way up in
order to implement hysteresis.

> >> +
> >> +       tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
> >> +       tze->trip_stats[trip_id].timestamp = now;
> >> +       tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
> >> +       tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
> >> +       tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
> >> +               (temperature - tze->trip_stats[trip_id].avg) /
> >> +               tze->trip_stats[trip_id].count;
> >> +
> >> +       mutex_unlock(&dfs->lock);
> >> +}
>
>
>
> --

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
  2023-12-21 19:26   ` Rafael J. Wysocki
@ 2023-12-23 11:41     ` Daniel Lezcano
  2023-12-27 19:53       ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Lezcano @ 2023-12-23 11:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: rjw, lukasz.luba, rui.zhang, linux-kernel, linux-pm


Hi Rafael,

On 21/12/2023 20:26, Rafael J. Wysocki wrote:

[ ... ]


>> +/**
>> + * struct tz_events - Store all events related to a mitigation episode
>> + *
>> + * The tz_events structure describes a mitigation episode.
> 
> So why not call it tz_mitigation?

A mitigation episode = N x tz_events

eg.
trip A = passive cooling - cpufreq cluster0
trip B = passive cooling - cpufreq cluster0 + cluster1
trip C = active cooling + fan

temperature trip A < trip B < trip C

The mitigation episode, as defined, begins at trip A, and we can have 
multiple events (eg. trip B crossed several times, trip C, then trip B 
again etc ...).

[ ... ]

>> +       if (dfs->tz.trip_index < 0) {
>> +               tze = thermal_debugfs_tz_event_alloc(tz, now);
>> +               if (!tze)
>> +                       return;
>> +
>> +               list_add(&tze->node, &dfs->tz.tz_events);
>> +       }
>> +
>> +       dfs->tz.trip_index++;
>> +       dfs->tz.trips_crossed[dfs->tz.trip_index] = trip_id;
> 
> So trip_index is an index into trips_crossed[] and the value is the ID
> of the trip passed by thermal_debug_tz_trip_up() IIUC, so the trip IDs
> in trips_crossed[] are always sorted by the trip temperature, in the
> ascending order.
> 
> It would be good to write this down somewhere in a comment.
> 
> And what if trip temperatures change during a mitigation episode such
> that the order by the trip temperature changes?

Changing a trip point temperature during a mitigation is a general 
question about the thermal framework.

How the governors will behave with such a change on the fly while they 
are in action?

IMO, we should prevent to change a trip point temperature when this one 
is crossed and has a cooling device bound to it.

>> +
>> +       tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
>> +       tze->trip_stats[trip_id].timestamp = now;
>> +       tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
>> +       tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
>> +       tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
>> +               (temperature - tze->trip_stats[trip_id].avg) /
>> +               tze->trip_stats[trip_id].count;
>> +
>> +       mutex_unlock(&dfs->lock);
>> +}



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
  2023-12-19  9:25 ` [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes Daniel Lezcano
@ 2023-12-21 19:26   ` Rafael J. Wysocki
  2023-12-23 11:41     ` Daniel Lezcano
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-12-21 19:26 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: rjw, lukasz.luba, rui.zhang, linux-kernel, linux-pm, Rafael J. Wysocki

On Tue, Dec 19, 2023 at 10:25 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> The mitigation episodes are recorded. A mitigation episode happens
> when the first trip point is crossed the way up and then the way
> down. During this episode other trip points can be crossed also and
> are accounted for this mitigation episode. The interesting information
> is the average temperature at the trip point, the undershot and the
> overshot. The standard deviation of the mitigated temperature will be
> added later.
>
> The thermal debugfs directory structure tries to stay consistent with
> the sysfs one but in a very simplified way:
>
> thermal/
>  `-- thermal_zones
>      |-- 0
>      |   `-- mitigations
>      `-- 1
>          `-- mitigations
>
> The content of the mitigations file has the following format:
>
> ,-Mitigation at 349988258us, duration=130136ms
> | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
> |    0 |  passive |     65000 |      2000 |     130136 |     68227 |     62500 |     75625 |
> |    1 |  passive |     75000 |      2000 |     104209 |     74857 |     71666 |     77500 |
> ,-Mitigation at 272451637us, duration=75000ms
> | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
> |    0 |  passive |     65000 |      2000 |      75000 |     68561 |     62500 |     75000 |
> |    1 |  passive |     75000 |      2000 |      60714 |     74820 |     70555 |     77500 |
> ,-Mitigation at 238184119us, duration=27316ms
> | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
> |    0 |  passive |     65000 |      2000 |      27316 |     73377 |     62500 |     75000 |
> |    1 |  passive |     75000 |      2000 |      19468 |     75284 |     69444 |     77500 |
> ,-Mitigation at 39863713us, duration=136196ms
> | trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
> |    0 |  passive |     65000 |      2000 |     136196 |     73922 |     62500 |     75000 |
> |    1 |  passive |     75000 |      2000 |      91721 |     74386 |     69444 |     78125 |
>
> More information for a better understanding of the thermal behavior
> will be added after. The idea is to give detailed statistics
> information about the undershots and overshots, the temperature speed,
> etc... As all the information in a single file is too much, the idea
> would be to create a directory named with the mitigation timestamp
> where all data could be added.
>
> Please note this code is immune against trip ordering.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
> Changelog:
>   - v3
>     - Fixed kerneldoc (kbuild)
>     - Fixed wrong indentation s/<space>/<tab>/
>   - v2
>     - Applied changes based on comments from patch 1/2
>     - Constified structure in function parameters
>   - v1 (from RFC):
>     - Replaced exported function name s/debugfs/debug/
>     - Used "struct thermal_trip" parameter instead of "trip_id"
>     - Renamed handle_way_[up|down] by tz_trip_[up|down]
>     - Replaced thermal_debug_tz_[unregister|register] by [add|remove]
> ---
>  drivers/thermal/thermal_core.c    |   7 +
>  drivers/thermal/thermal_debugfs.c | 367 +++++++++++++++++++++++++++++-
>  drivers/thermal/thermal_debugfs.h |  14 ++
>  3 files changed, 387 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 33332d401b13..a0cbe8d7b945 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c

[cut]

The changes above LGTM.

> diff --git a/drivers/thermal/thermal_debugfs.c b/drivers/thermal/thermal_debugfs.c
> index 8fceddb5f6d2..5fd2553260b2 100644
> --- a/drivers/thermal/thermal_debugfs.c
> +++ b/drivers/thermal/thermal_debugfs.c
> @@ -14,8 +14,11 @@
>  #include <linux/mutex.h>
>  #include <linux/thermal.h>
>
> +#include "thermal_core.h"
> +
>  static struct dentry *d_root;
>  static struct dentry *d_cdev;
> +static struct dentry *d_tz;
>
>  /*
>   * Length of the string containing the thermal zone id or the cooling
> @@ -77,22 +80,84 @@ struct cdev_value {
>         u64 value;
>  };
>
> +/**
> + * struct trip_stats - Thermal trip statistics
> + *
> + * The trip_stats structure has the relevant information to show the
> + * statistics related to a trip point violation during a mitigation
> + * episode.

I wouldn't use the term "violation" here, it feels  too strong.

I looked for a replacement word, but I couldn't find a suitable one.
In the sentence above I would just say "related to going above a trip
point".

> + *
> + * @timestamp: the trip crossing timestamp
> + * @duration: the total duration of trip point violation

"total time when the zone temperature was above the trip point"

> + * @count: the number of occurrences of the trip point violation

"the number of times the zone temperature was above the trip point"

> + * @max: maximum temperature during the trip point violation

"maximum recorded temperature above the trip point"

> + * @min: min temperature during the trip point violation

"minimum recorded temperature above the trip point"

> + * @avg: average temperature during the trip point violation

"average temperature above the trip point"

> + */
> +struct trip_stats {
> +       ktime_t timestamp;
> +       ktime_t duration;
> +       int count;
> +       int max;
> +       int min;
> +       int avg;
> +};
> +
> +/**
> + * struct tz_events - Store all events related to a mitigation episode
> + *
> + * The tz_events structure describes a mitigation episode.

So why not call it tz_mitigation?

> A
> + * mitigation episode is when the mitigation begins and ends. During
> + * this episode we can have multiple trip points crossed the way up
> + * and down if there are multiple trip describes in the firmware.
> + *
> + * @node: a list element to be added to the list of tz events
> + * @trip_stats: per trip point statistics
> + * @timestamp: First trip point crossed the way up
> + * @duration: total duration of the mitigation episode
> + */
> +struct tz_events {
> +       struct list_head node;
> +       struct trip_stats *trip_stats;
> +       ktime_t timestamp;
> +       ktime_t duration;
> +};
> +
> +/**
> + * struct tz_debugfs - Store all mitigations episodes for a thermal zone

"Collection of all mitigation episodes for a thermal zone"

> + *
> + * The tz_debugfs structure contains the list of the mitigation
> + * episodes and has to track which trip point has been crossed in
> + * order to handle correctly nested trip point mitigation episodes.
> + *
> + * @tz_events: a list of thermal mitigation episodes
> + * @trips_crossed: an array of trip point crossed by id

This requires a bit more explanation IMV.

From the patch reverse-engineering I gather that this is used for
working around a possible issue with the trips being not sorted.  It
is also unclear what id means at this point.

> + * @trip_index: the current trip point crossed

I'm not sure what this means.

> + */
> +struct tz_debugfs {
> +       struct list_head tz_events;
> +       int *trips_crossed;
> +       int trip_index;
> +};
> +
>  /**
>   * struct thermal_debugfs - High level structure for a thermal object in debugfs
>   *
>   * The thermal_debugfs structure is the common structure used by the
> - * cooling device to compute the statistics.
> + * cooling device and the thermal zone to store the statistics.

s/and/or/ ?  Because it can't be both at the same time.

>   *
>   * @d_top: top directory of the thermal object directory
>   * @lock: per object lock to protect the internals
>   *
>   * @cdev: a cooling device debug structure
> + * @tz: a thermal zone debug structure
>   */
>  struct thermal_debugfs {
>         struct dentry *d_top;
>         struct mutex lock;
>         union {
>                 struct cdev_debugfs cdev;
> +               struct tz_debugfs tz;

tz_dbg so as to avoid confusing this with a thermal zone object?

>         };
>  };
>
> @@ -103,6 +168,10 @@ void thermal_debug_init(void)
>                 return;
>
>         d_cdev = debugfs_create_dir("cooling_devices", d_root);
> +       if (!d_cdev)
> +               return;
> +
> +       d_tz = debugfs_create_dir("thermal_zones", d_root);
>  }
>
>  static struct thermal_debugfs *thermal_debugfs_add_id(struct dentry *d, int id)
> @@ -422,3 +491,299 @@ void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev)
>
>         mutex_unlock(&dfs->lock);
>  }
> +
> +static struct tz_events *thermal_debugfs_tz_event_alloc(struct thermal_zone_device *tz,

thermal_debugfs_tz_mitigation_alloc() ?

> +                                                       ktime_t now)
> +{
> +       struct tz_events *tze;
> +       struct trip_stats *trip_stats;
> +       int i;
> +
> +       tze = kzalloc(sizeof(*tze), GFP_KERNEL);
> +       if (!tze)
> +               return NULL;
> +
> +       INIT_LIST_HEAD(&tze->node);
> +       tze->timestamp = now;
> +
> +       trip_stats = kzalloc(sizeof(struct trip_stats) * tz->num_trips, GFP_KERNEL);
> +       if (!trip_stats) {
> +               kfree(tze);
> +               return NULL;
> +       }

The memory allocations can be combined into one:

tze = kzalloc(sizeof(*tze) + sizeof(*trip_stats) * tz->num_trips, GFP_KERNEL);
if (!tze)
        return NULL;

trip_stats = (struct trip_stats *)(tze + 1);

Or if you defined trip_stats[] as a flexible array, you could use
struct_size() (see overflow.h).

> +
> +       for (i = 0; i < tz->num_trips; i++) {
> +               trip_stats[i].min = INT_MAX;
> +               trip_stats[i].max = INT_MIN;
> +       }
> +
> +       tze->trip_stats = trip_stats;
> +
> +       return tze;
> +}
> +
> +void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
> +                             const struct thermal_trip *trip)
> +{
> +       struct tz_events *tze;
> +       struct thermal_debugfs *dfs = tz->debugfs;

As per the comments on the previous patch this would be

struct thermal_debugfs *thermdbg = tz->debugfs;

I would add

struct tz_debugfs *tz_dbg =&thermdbg->tz_dbg;

and use it instead of dereferencing thermdbg every time to get to tz_dbg.

> +       int temperature = tz->temperature;
> +       int trip_id = thermal_zone_trip_id(tz, trip);
> +       ktime_t now = ktime_get();
> +
> +       if (!dfs)
> +               return;
> +
> +       mutex_lock(&dfs->lock);
> +
> +       /*
> +        * The mitigation is starting. A mitigation can contain
> +        * several episodes where each of them is related to a
> +        * temperature crossing a trip point. The episodes are
> +        * nested. That means when the temperature is crossing the
> +        * first trip point, the duration begins to be measured. If
> +        * the temperature continues to increase and reaches the
> +        * second trip point, the duration of the first trip must be
> +        * also accumulated.
> +        *
> +        * eg.
> +        *
> +        * temp
> +        *   ^
> +        *   |             --------
> +        * trip 2         /        \         ------
> +        *   |           /|        |\      /|      |\
> +        * trip 1       / |        | `----  |      | \
> +        *   |         /| |        |        |      | |\
> +        * trip 0     / | |        |        |      | | \
> +        *   |       /| | |        |        |      | | |\
> +        *   |      / | | |        |        |      | | | `--
> +        *   |     /  | | |        |        |      | | |
> +        *   |-----   | | |        |        |      | | |
> +        *   |        | | |        |        |      | | |
> +        *    --------|-|-|--------|--------|------|-|-|------------------> time
> +        *            | | |<--t2-->|        |<-t2'>| | |
> +        *            | |                            | |
> +        *            | |<------------t1------------>| |
> +        *            |                                |
> +        *            |<-------------t0--------------->|
> +        *
> +        */
> +       if (dfs->tz.trip_index < 0) {
> +               tze = thermal_debugfs_tz_event_alloc(tz, now);
> +               if (!tze)
> +                       return;
> +
> +               list_add(&tze->node, &dfs->tz.tz_events);
> +       }
> +
> +       dfs->tz.trip_index++;
> +       dfs->tz.trips_crossed[dfs->tz.trip_index] = trip_id;

So trip_index is an index into trips_crossed[] and the value is the ID
of the trip passed by thermal_debug_tz_trip_up() IIUC, so the trip IDs
in trips_crossed[] are always sorted by the trip temperature, in the
ascending order.

It would be good to write this down somewhere in a comment.

And what if trip temperatures change during a mitigation episode such
that the order by the trip temperature changes?

> +
> +       tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
> +       tze->trip_stats[trip_id].timestamp = now;
> +       tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
> +       tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
> +       tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
> +               (temperature - tze->trip_stats[trip_id].avg) /
> +               tze->trip_stats[trip_id].count;
> +
> +       mutex_unlock(&dfs->lock);
> +}
> +
> +void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
> +                               const struct thermal_trip *trip)
> +{
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +       struct tz_events *tze;
> +       int trip_id = thermal_zone_trip_id(tz, trip);
> +       ktime_t delta, now = ktime_get();
> +
> +       if (!dfs)
> +               return;
> +
> +       /*
> +        * The temperature crosses the way down but there was not
> +        * mitigation detected before. That may happen when the
> +        * temperature is greater than a trip point when registering a
> +        * thermal zone, which is a common use case as the kernel has
> +        * no mitigation mechanism yet at boot time.
> +        */
> +       if (dfs->tz.trip_index < 0)
> +               return;
> +
> +       mutex_lock(&dfs->lock);
> +
> +       tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
> +
> +       delta = ktime_sub(now, tze->trip_stats[trip_id].timestamp);
> +       tze->trip_stats[trip_id].duration = ktime_add(delta,
> +                                                     tze->trip_stats[trip_id].duration);
> +
> +       dfs->tz.trip_index--;
> +
> +       /*
> +        * This event closes the mitigation as we are crossing the
> +        * last trip point the way down.
> +        */
> +       if (dfs->tz.trip_index < 0)
> +               tze->duration = ktime_sub(now, tze->timestamp);
> +
> +       mutex_unlock(&dfs->lock);
> +}
> +
> +void thermal_debug_update_temp(struct thermal_zone_device *tz)
> +{
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +       struct tz_events *tze;
> +       int trip;

trip_id, please.

> +
> +       if (!dfs)
> +               return;
> +
> +       mutex_lock(&dfs->lock);
> +
> +       if (dfs->tz.trip_index >= 0) {
> +               trip = dfs->tz.trip_index;
> +               tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
> +               tze->trip_stats[trip].count++;
> +               tze->trip_stats[trip].max = max(tze->trip_stats[trip].max, tz->temperature);
> +               tze->trip_stats[trip].min = min(tze->trip_stats[trip].min, tz->temperature);
> +               tze->trip_stats[trip].avg = tze->trip_stats[trip].avg +
> +                       (tz->temperature - tze->trip_stats[trip].avg) /
> +                       tze->trip_stats[trip].count;
> +       }
> +
> +       mutex_unlock(&dfs->lock);
> +}
> +
> +static void *tze_seq_start(struct seq_file *s, loff_t *pos)
> +{
> +       struct thermal_zone_device *tz = s->private;
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +       struct tz_debugfs *tzd = &dfs->tz;
> +
> +       mutex_lock(&dfs->lock);
> +
> +       return seq_list_start(&tzd->tz_events, *pos);
> +}
> +
> +static void *tze_seq_next(struct seq_file *s, void *v, loff_t *pos)
> +{
> +       struct thermal_zone_device *tz = s->private;
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +       struct tz_debugfs *tzd = &dfs->tz;
> +
> +       return seq_list_next(v, &tzd->tz_events, pos);
> +}
> +
> +static void tze_seq_stop(struct seq_file *s, void *v)
> +{
> +       struct thermal_zone_device *tz = s->private;
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +
> +       mutex_unlock(&dfs->lock);
> +}
> +
> +static int tze_seq_show(struct seq_file *s, void *v)
> +{
> +       struct thermal_zone_device *tz = s->private;
> +       struct tz_events *tze;
> +       int i;
> +
> +       tze = list_entry((struct list_head *)v, struct tz_events, node);
> +
> +       seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
> +                  ktime_to_us(tze->timestamp),
> +                  ktime_to_ms(tze->duration));
> +
> +       seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |\n");
> +
> +       for (i = 0; i < tz->num_trips; i++) {

Please use for_each_trip(), here and thermal_zone_trip_id() to get to the IDs.

> +
> +               struct thermal_trip trip;
> +               const char *type;
> +
> +               if (__thermal_zone_get_trip(tz, i, &trip))
> +                       continue;
> +
> +               /*
> +                * There is no possible mitigation happening at the
> +                * critical trip point, so the stats will be always
> +                * zero, skip this trip point
> +                */
> +               if (trip.type == THERMAL_TRIP_CRITICAL)
> +                       continue;
> +
> +               if (trip.type == THERMAL_TRIP_PASSIVE)
> +                       type = "passive";
> +               else if (trip.type == THERMAL_TRIP_ACTIVE)
> +                       type = "active";
> +               else
> +                       type = "hot";
> +
> +               seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
> +                          4 , i,
> +                          8, type,
> +                          9, trip.temperature,
> +                          9, trip.hysteresis,
> +                          10, ktime_to_ms(tze->trip_stats[i].duration),
> +                          9, tze->trip_stats[i].avg,
> +                          9, tze->trip_stats[i].min,
> +                          9, tze->trip_stats[i].max);
> +       }
> +
> +       return 0;
> +}
> +
> +static const struct seq_operations tze_sops = {
> +       .start = tze_seq_start,
> +       .next = tze_seq_next,
> +       .stop = tze_seq_stop,
> +       .show = tze_seq_show,
> +};
> +
> +DEFINE_SEQ_ATTRIBUTE(tze);
> +
> +void thermal_debug_tz_add(struct thermal_zone_device *tz)
> +{
> +       struct thermal_debugfs *dfs;
> +       struct tz_debugfs *tzd;
> +
> +       dfs = thermal_debugfs_add_id(d_tz, tz->id);
> +       if (!dfs)
> +               return;
> +
> +       tzd = &dfs->tz;
> +
> +       tzd->trips_crossed = kzalloc(sizeof(int) * tz->num_trips, GFP_KERNEL);
> +       if (!tzd->trips_crossed) {
> +               thermal_debugfs_remove_id(dfs);
> +               return;
> +       }
> +
> +       /*
> +        * Trip index '-1' means no mitigation
> +        */
> +       tzd->trip_index = -1;
> +       INIT_LIST_HEAD(&tzd->tz_events);
> +
> +       debugfs_create_file("mitigations", 0400, dfs->d_top, tz, &tze_fops);
> +
> +       tz->debugfs = dfs;
> +}
> +
> +void thermal_debug_tz_remove(struct thermal_zone_device *tz)
> +{
> +       struct thermal_debugfs *dfs = tz->debugfs;
> +
> +       if (!dfs)
> +               return;
> +
> +       mutex_lock(&dfs->lock);
> +
> +       tz->debugfs = NULL;
> +       thermal_debugfs_remove_id(dfs);
> +
> +       mutex_unlock(&dfs->lock);
> +}
> diff --git a/drivers/thermal/thermal_debugfs.h b/drivers/thermal/thermal_debugfs.h
> index 341499388448..155b9af5fe87 100644
> --- a/drivers/thermal/thermal_debugfs.h
> +++ b/drivers/thermal/thermal_debugfs.h
> @@ -5,10 +5,24 @@ void thermal_debug_init(void);
>  void thermal_debug_cdev_add(struct thermal_cooling_device *cdev);
>  void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev);
>  void thermal_debug_cdev_state_update(const struct thermal_cooling_device *cdev, int state);
> +void thermal_debug_tz_add(struct thermal_zone_device *tz);
> +void thermal_debug_tz_remove(struct thermal_zone_device *tz);
> +void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
> +                             const struct thermal_trip *trip);
> +void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
> +                               const struct thermal_trip *trip);
> +void thermal_debug_update_temp(struct thermal_zone_device *tz);
>  #else
>  static inline void thermal_debug_init(void) {}
>  static inline void thermal_debug_cdev_add(struct thermal_cooling_device *cdev) {}
>  static inline void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev) {}
>  static inline void thermal_debug_cdev_state_update(const struct thermal_cooling_device *cdev,
>                                                    int state) {}
> +static inline void thermal_debug_tz_add(struct thermal_zone_device *tz) {}
> +static inline void thermal_debug_tz_remove(struct thermal_zone_device *tz) {}
> +static inline void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
> +                                           const struct thermal_trip *trip) {};
> +static inline void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
> +                                             const struct thermal_trip *trip) {}
> +static inline void thermal_debug_update_temp(struct thermal_zone_device *tz) {}
>  #endif /* CONFIG_THERMAL_DEBUGFS */
> --

It overall seems to have a problem with a possible change of trip
temperatures during a mitigation episode.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes
  2023-12-19  9:25 [PATCH v3 1/2] thermal/debugfs: Add thermal cooling device debugfs information Daniel Lezcano
@ 2023-12-19  9:25 ` Daniel Lezcano
  2023-12-21 19:26   ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Lezcano @ 2023-12-19  9:25 UTC (permalink / raw)
  To: rjw, daniel.lezcano, lukasz.luba
  Cc: rui.zhang, linux-kernel, linux-pm, Rafael J. Wysocki

The mitigation episodes are recorded. A mitigation episode happens
when the first trip point is crossed the way up and then the way
down. During this episode other trip points can be crossed also and
are accounted for this mitigation episode. The interesting information
is the average temperature at the trip point, the undershot and the
overshot. The standard deviation of the mitigated temperature will be
added later.

The thermal debugfs directory structure tries to stay consistent with
the sysfs one but in a very simplified way:

thermal/
 `-- thermal_zones
     |-- 0
     |   `-- mitigations
     `-- 1
         `-- mitigations

The content of the mitigations file has the following format:

,-Mitigation at 349988258us, duration=130136ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |     130136 |     68227 |     62500 |     75625 |
|    1 |  passive |     75000 |      2000 |     104209 |     74857 |     71666 |     77500 |
,-Mitigation at 272451637us, duration=75000ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |      75000 |     68561 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      60714 |     74820 |     70555 |     77500 |
,-Mitigation at 238184119us, duration=27316ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |      27316 |     73377 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      19468 |     75284 |     69444 |     77500 |
,-Mitigation at 39863713us, duration=136196ms
| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |
|    0 |  passive |     65000 |      2000 |     136196 |     73922 |     62500 |     75000 |
|    1 |  passive |     75000 |      2000 |      91721 |     74386 |     69444 |     78125 |

More information for a better understanding of the thermal behavior
will be added after. The idea is to give detailed statistics
information about the undershots and overshots, the temperature speed,
etc... As all the information in a single file is too much, the idea
would be to create a directory named with the mitigation timestamp
where all data could be added.

Please note this code is immune against trip ordering.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
Changelog:
  - v3
    - Fixed kerneldoc (kbuild)
    - Fixed wrong indentation s/<space>/<tab>/
  - v2
    - Applied changes based on comments from patch 1/2
    - Constified structure in function parameters
  - v1 (from RFC):
    - Replaced exported function name s/debugfs/debug/
    - Used "struct thermal_trip" parameter instead of "trip_id"
    - Renamed handle_way_[up|down] by tz_trip_[up|down]
    - Replaced thermal_debug_tz_[unregister|register] by [add|remove]
---
 drivers/thermal/thermal_core.c    |   7 +
 drivers/thermal/thermal_debugfs.c | 367 +++++++++++++++++++++++++++++-
 drivers/thermal/thermal_debugfs.h |  14 ++
 3 files changed, 387 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 33332d401b13..a0cbe8d7b945 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -367,6 +367,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz,
 			thermal_notify_tz_trip_up(tz->id,
 						  thermal_zone_trip_id(tz, trip),
 						  tz->temperature);
+			thermal_debug_tz_trip_up(tz, trip);
 			trip->threshold = trip->temperature - trip->hysteresis;
 		} else {
 			trip->threshold = trip->temperature;
@@ -386,6 +387,7 @@ static void handle_thermal_trip(struct thermal_zone_device *tz,
 			thermal_notify_tz_trip_down(tz->id,
 						    thermal_zone_trip_id(tz, trip),
 						    tz->temperature);
+			thermal_debug_tz_trip_down(tz, trip);
 			trip->threshold = trip->temperature;
 		} else {
 			trip->threshold = trip->temperature - trip->hysteresis;
@@ -417,6 +419,7 @@ static void update_temperature(struct thermal_zone_device *tz)
 	trace_thermal_temperature(tz);
 
 	thermal_genl_sampling_temp(tz->id, temp);
+	thermal_debug_update_temp(tz);
 }
 
 static void thermal_zone_device_init(struct thermal_zone_device *tz)
@@ -1396,6 +1399,8 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
 
 	thermal_notify_tz_create(tz->id, tz->type);
 
+	thermal_debug_tz_add(tz);
+
 	return tz;
 
 unregister:
@@ -1461,6 +1466,8 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 	if (!tz)
 		return;
 
+	thermal_debug_tz_remove(tz);
+
 	tz_id = tz->id;
 
 	mutex_lock(&thermal_list_lock);
diff --git a/drivers/thermal/thermal_debugfs.c b/drivers/thermal/thermal_debugfs.c
index 8fceddb5f6d2..5fd2553260b2 100644
--- a/drivers/thermal/thermal_debugfs.c
+++ b/drivers/thermal/thermal_debugfs.c
@@ -14,8 +14,11 @@
 #include <linux/mutex.h>
 #include <linux/thermal.h>
 
+#include "thermal_core.h"
+
 static struct dentry *d_root;
 static struct dentry *d_cdev;
+static struct dentry *d_tz;
 
 /*
  * Length of the string containing the thermal zone id or the cooling
@@ -77,22 +80,84 @@ struct cdev_value {
 	u64 value;
 };
 
+/**
+ * struct trip_stats - Thermal trip statistics
+ *
+ * The trip_stats structure has the relevant information to show the
+ * statistics related to a trip point violation during a mitigation
+ * episode.
+ *
+ * @timestamp: the trip crossing timestamp
+ * @duration: the total duration of trip point violation
+ * @count: the number of occurrences of the trip point violation
+ * @max: maximum temperature during the trip point violation
+ * @min: min temperature during the trip point violation
+ * @avg: average temperature during the trip point violation
+ */
+struct trip_stats {
+	ktime_t timestamp;
+	ktime_t duration;
+	int count;
+	int max;
+	int min;
+	int avg;
+};
+
+/**
+ * struct tz_events - Store all events related to a mitigation episode
+ *
+ * The tz_events structure describes a mitigation episode. A
+ * mitigation episode is when the mitigation begins and ends. During
+ * this episode we can have multiple trip points crossed the way up
+ * and down if there are multiple trip describes in the firmware.
+ *
+ * @node: a list element to be added to the list of tz events
+ * @trip_stats: per trip point statistics
+ * @timestamp: First trip point crossed the way up
+ * @duration: total duration of the mitigation episode
+ */
+struct tz_events {
+	struct list_head node;
+	struct trip_stats *trip_stats;
+	ktime_t timestamp;
+	ktime_t duration;
+};
+
+/**
+ * struct tz_debugfs - Store all mitigations episodes for a thermal zone
+ *
+ * The tz_debugfs structure contains the list of the mitigation
+ * episodes and has to track which trip point has been crossed in
+ * order to handle correctly nested trip point mitigation episodes.
+ *
+ * @tz_events: a list of thermal mitigation episodes
+ * @trips_crossed: an array of trip point crossed by id
+ * @trip_index: the current trip point crossed
+ */
+struct tz_debugfs {
+	struct list_head tz_events;
+	int *trips_crossed;
+	int trip_index;
+};
+
 /**
  * struct thermal_debugfs - High level structure for a thermal object in debugfs
  *
  * The thermal_debugfs structure is the common structure used by the
- * cooling device to compute the statistics.
+ * cooling device and the thermal zone to store the statistics.
  *
  * @d_top: top directory of the thermal object directory
  * @lock: per object lock to protect the internals
  *
  * @cdev: a cooling device debug structure
+ * @tz: a thermal zone debug structure
  */
 struct thermal_debugfs {
 	struct dentry *d_top;
 	struct mutex lock;
 	union {
 		struct cdev_debugfs cdev;
+		struct tz_debugfs tz;
 	};
 };
 
@@ -103,6 +168,10 @@ void thermal_debug_init(void)
 		return;
 
 	d_cdev = debugfs_create_dir("cooling_devices", d_root);
+	if (!d_cdev)
+		return;
+
+	d_tz = debugfs_create_dir("thermal_zones", d_root);
 }
 
 static struct thermal_debugfs *thermal_debugfs_add_id(struct dentry *d, int id)
@@ -422,3 +491,299 @@ void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev)
 
 	mutex_unlock(&dfs->lock);
 }
+
+static struct tz_events *thermal_debugfs_tz_event_alloc(struct thermal_zone_device *tz,
+							ktime_t now)
+{
+	struct tz_events *tze;
+	struct trip_stats *trip_stats;
+	int i;
+
+	tze = kzalloc(sizeof(*tze), GFP_KERNEL);
+	if (!tze)
+		return NULL;
+
+	INIT_LIST_HEAD(&tze->node);
+	tze->timestamp = now;
+
+	trip_stats = kzalloc(sizeof(struct trip_stats) * tz->num_trips, GFP_KERNEL);
+	if (!trip_stats) {
+		kfree(tze);
+		return NULL;
+	}
+
+	for (i = 0; i < tz->num_trips; i++) {
+		trip_stats[i].min = INT_MAX;
+		trip_stats[i].max = INT_MIN;
+	}
+	
+	tze->trip_stats = trip_stats;
+
+	return tze;
+}
+
+void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
+			      const struct thermal_trip *trip)
+{
+	struct tz_events *tze;
+	struct thermal_debugfs *dfs = tz->debugfs;
+	int temperature = tz->temperature;
+	int trip_id = thermal_zone_trip_id(tz, trip);
+	ktime_t now = ktime_get();
+
+	if (!dfs)
+		return;
+
+	mutex_lock(&dfs->lock);
+
+	/*
+	 * The mitigation is starting. A mitigation can contain
+	 * several episodes where each of them is related to a
+	 * temperature crossing a trip point. The episodes are
+	 * nested. That means when the temperature is crossing the
+	 * first trip point, the duration begins to be measured. If
+	 * the temperature continues to increase and reaches the
+	 * second trip point, the duration of the first trip must be
+	 * also accumulated.
+	 *
+	 * eg.
+	 *
+	 * temp
+	 *   ^
+	 *   |             --------
+	 * trip 2         /        \         ------
+	 *   |           /|        |\      /|      |\
+	 * trip 1       / |        | `----  |      | \
+	 *   |         /| |        |        |      | |\
+	 * trip 0     / | |        |        |      | | \
+	 *   |       /| | |        |        |      | | |\
+	 *   |      / | | |        |        |      | | | `--
+	 *   |     /  | | |        |        |      | | |     
+	 *   |-----   | | |        |        |      | | |      
+	 *   |        | | |        |        |      | | |
+	 *    --------|-|-|--------|--------|------|-|-|------------------> time
+	 *            | | |<--t2-->|        |<-t2'>| | |
+	 *            | |                            | |
+	 *            | |<------------t1------------>| |
+	 *            |                                |
+	 *            |<-------------t0--------------->|
+	 *
+	 */
+	if (dfs->tz.trip_index < 0) {
+		tze = thermal_debugfs_tz_event_alloc(tz, now);
+		if (!tze)
+			return;
+
+		list_add(&tze->node, &dfs->tz.tz_events);
+	}
+
+	dfs->tz.trip_index++;
+	dfs->tz.trips_crossed[dfs->tz.trip_index] = trip_id;
+
+	tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
+	tze->trip_stats[trip_id].timestamp = now;
+	tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
+	tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
+	tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
+		(temperature - tze->trip_stats[trip_id].avg) /
+		tze->trip_stats[trip_id].count;
+
+	mutex_unlock(&dfs->lock);
+}
+
+void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
+				const struct thermal_trip *trip)
+{
+	struct thermal_debugfs *dfs = tz->debugfs;
+	struct tz_events *tze;
+	int trip_id = thermal_zone_trip_id(tz, trip);
+	ktime_t delta, now = ktime_get();
+
+	if (!dfs)
+		return;
+
+	/*
+	 * The temperature crosses the way down but there was not
+	 * mitigation detected before. That may happen when the
+	 * temperature is greater than a trip point when registering a
+	 * thermal zone, which is a common use case as the kernel has
+	 * no mitigation mechanism yet at boot time.
+	 */
+	if (dfs->tz.trip_index < 0)
+		return;
+	
+	mutex_lock(&dfs->lock);
+	
+	tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
+
+	delta = ktime_sub(now, tze->trip_stats[trip_id].timestamp);
+	tze->trip_stats[trip_id].duration = ktime_add(delta,
+						      tze->trip_stats[trip_id].duration);
+
+	dfs->tz.trip_index--;
+
+	/*
+	 * This event closes the mitigation as we are crossing the
+	 * last trip point the way down.
+	 */
+	if (dfs->tz.trip_index < 0)
+		tze->duration = ktime_sub(now, tze->timestamp);
+
+	mutex_unlock(&dfs->lock);
+}
+
+void thermal_debug_update_temp(struct thermal_zone_device *tz)
+{
+	struct thermal_debugfs *dfs = tz->debugfs;
+	struct tz_events *tze;
+	int trip;
+
+	if (!dfs)
+		return;
+
+	mutex_lock(&dfs->lock);
+
+	if (dfs->tz.trip_index >= 0) {
+		trip = dfs->tz.trip_index;
+		tze = list_first_entry(&dfs->tz.tz_events, struct tz_events, node);
+		tze->trip_stats[trip].count++;
+		tze->trip_stats[trip].max = max(tze->trip_stats[trip].max, tz->temperature);
+		tze->trip_stats[trip].min = min(tze->trip_stats[trip].min, tz->temperature);
+		tze->trip_stats[trip].avg = tze->trip_stats[trip].avg +
+			(tz->temperature - tze->trip_stats[trip].avg) /
+			tze->trip_stats[trip].count;
+	}
+
+	mutex_unlock(&dfs->lock);
+}
+
+static void *tze_seq_start(struct seq_file *s, loff_t *pos)
+{
+	struct thermal_zone_device *tz = s->private;
+	struct thermal_debugfs *dfs = tz->debugfs;
+	struct tz_debugfs *tzd = &dfs->tz;
+
+	mutex_lock(&dfs->lock);
+
+	return seq_list_start(&tzd->tz_events, *pos);
+}
+
+static void *tze_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct thermal_zone_device *tz = s->private;
+	struct thermal_debugfs *dfs = tz->debugfs;
+	struct tz_debugfs *tzd = &dfs->tz;
+
+	return seq_list_next(v, &tzd->tz_events, pos);
+}
+
+static void tze_seq_stop(struct seq_file *s, void *v)
+{
+	struct thermal_zone_device *tz = s->private;
+	struct thermal_debugfs *dfs = tz->debugfs;
+
+	mutex_unlock(&dfs->lock);
+}
+
+static int tze_seq_show(struct seq_file *s, void *v)
+{
+	struct thermal_zone_device *tz = s->private;
+	struct tz_events *tze;
+	int i;
+
+	tze = list_entry((struct list_head *)v, struct tz_events, node);
+
+	seq_printf(s, ",-Mitigation at %lluus, duration=%llums\n",
+		   ktime_to_us(tze->timestamp),
+		   ktime_to_ms(tze->duration));
+
+	seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |\n");
+	
+	for (i = 0; i < tz->num_trips; i++) {
+
+		struct thermal_trip trip;
+		const char *type;
+		
+		if (__thermal_zone_get_trip(tz, i, &trip))
+			continue;
+
+		/*
+		 * There is no possible mitigation happening at the
+		 * critical trip point, so the stats will be always
+		 * zero, skip this trip point
+		 */
+		if (trip.type == THERMAL_TRIP_CRITICAL)
+			continue;
+
+		if (trip.type == THERMAL_TRIP_PASSIVE)
+			type = "passive";
+		else if (trip.type == THERMAL_TRIP_ACTIVE)
+			type = "active";
+		else
+			type = "hot";
+
+		seq_printf(s, "| %*d | %*s | %*d | %*d | %*lld | %*d | %*d | %*d |\n",
+			   4 , i,
+			   8, type,
+			   9, trip.temperature,
+			   9, trip.hysteresis,
+			   10, ktime_to_ms(tze->trip_stats[i].duration),
+			   9, tze->trip_stats[i].avg,
+			   9, tze->trip_stats[i].min,
+			   9, tze->trip_stats[i].max);
+	}
+
+	return 0;
+}
+
+static const struct seq_operations tze_sops = {
+	.start = tze_seq_start,
+	.next = tze_seq_next,
+	.stop = tze_seq_stop,
+	.show = tze_seq_show,
+};
+
+DEFINE_SEQ_ATTRIBUTE(tze);
+
+void thermal_debug_tz_add(struct thermal_zone_device *tz)
+{
+	struct thermal_debugfs *dfs;
+	struct tz_debugfs *tzd;
+
+	dfs = thermal_debugfs_add_id(d_tz, tz->id);
+	if (!dfs)
+		return;
+
+	tzd = &dfs->tz;
+
+	tzd->trips_crossed = kzalloc(sizeof(int) * tz->num_trips, GFP_KERNEL);
+	if (!tzd->trips_crossed) {
+		thermal_debugfs_remove_id(dfs);
+		return;
+	}
+
+	/*
+	 * Trip index '-1' means no mitigation
+	 */
+	tzd->trip_index = -1;
+	INIT_LIST_HEAD(&tzd->tz_events);
+
+	debugfs_create_file("mitigations", 0400, dfs->d_top, tz, &tze_fops);
+	
+	tz->debugfs = dfs;
+}
+
+void thermal_debug_tz_remove(struct thermal_zone_device *tz)
+{
+	struct thermal_debugfs *dfs = tz->debugfs;
+
+	if (!dfs)
+		return;
+
+	mutex_lock(&dfs->lock);
+
+	tz->debugfs = NULL;
+	thermal_debugfs_remove_id(dfs);
+
+	mutex_unlock(&dfs->lock);
+}
diff --git a/drivers/thermal/thermal_debugfs.h b/drivers/thermal/thermal_debugfs.h
index 341499388448..155b9af5fe87 100644
--- a/drivers/thermal/thermal_debugfs.h
+++ b/drivers/thermal/thermal_debugfs.h
@@ -5,10 +5,24 @@ void thermal_debug_init(void);
 void thermal_debug_cdev_add(struct thermal_cooling_device *cdev);
 void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev);
 void thermal_debug_cdev_state_update(const struct thermal_cooling_device *cdev, int state);
+void thermal_debug_tz_add(struct thermal_zone_device *tz);
+void thermal_debug_tz_remove(struct thermal_zone_device *tz);
+void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
+			      const struct thermal_trip *trip);
+void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
+				const struct thermal_trip *trip);
+void thermal_debug_update_temp(struct thermal_zone_device *tz);
 #else
 static inline void thermal_debug_init(void) {}
 static inline void thermal_debug_cdev_add(struct thermal_cooling_device *cdev) {}
 static inline void thermal_debug_cdev_remove(struct thermal_cooling_device *cdev) {}
 static inline void thermal_debug_cdev_state_update(const struct thermal_cooling_device *cdev,
 						   int state) {}
+static inline void thermal_debug_tz_add(struct thermal_zone_device *tz) {}
+static inline void thermal_debug_tz_remove(struct thermal_zone_device *tz) {}
+static inline void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
+					    const struct thermal_trip *trip) {};
+static inline void thermal_debug_tz_trip_down(struct thermal_zone_device *tz,
+					      const struct thermal_trip *trip) {}
+static inline void thermal_debug_update_temp(struct thermal_zone_device *tz) {}
 #endif /* CONFIG_THERMAL_DEBUGFS */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-12-27 19:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-23 18:06 [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2023-12-19  9:25 [PATCH v3 1/2] thermal/debugfs: Add thermal cooling device debugfs information Daniel Lezcano
2023-12-19  9:25 ` [PATCH v3 2/2] thermal/debugfs: Add thermal debugfs information for mitigation episodes Daniel Lezcano
2023-12-21 19:26   ` Rafael J. Wysocki
2023-12-23 11:41     ` Daniel Lezcano
2023-12-27 19:53       ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.