linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vadim Pasternak <vadimp@nvidia.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	"rui.zhang@intel.com" <rui.zhang@intel.com>
Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	Ido Schimmel <idosch@nvidia.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: RE: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation
Date: Mon, 27 Sep 2021 17:52:53 +0000	[thread overview]
Message-ID: <BN9PR12MB538128B5C2A0B04531E5775CAFA79@BN9PR12MB5381.namprd12.prod.outlook.com> (raw)
In-Reply-To: <d87227b8-57b9-fdb9-bb87-f01c6d0e23cc@linaro.org>



> -----Original Message-----
> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> Sent: Monday, September 27, 2021 4:56 PM
> To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
> Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>; Rafael J.
> Wysocki <rjw@rjwysocki.net>
> Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics
> update for configuration operation
> 
> On 27/09/2021 15:29, Vadim Pasternak wrote:
> >
> >
> >> -----Original Message-----
> >> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> >> Sent: Monday, September 27, 2021 3:32 PM
> >> To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
> >> Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>;
> Rafael J.
> >> Wysocki <rjw@rjwysocki.net>
> >> Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device
> >> statistics update for configuration operation
> >>
> >>
> >>
> >>
> >> On 27/09/2021 13:22, Vadim Pasternak wrote:
> >>> Hi Daniel,
> >>>
> >>> Thank you for quick reply.
> >>>
> >>>> -----Original Message-----
> >>>> From: Daniel Lezcano <daniel.lezcano@linaro.org>
> >>>> Sent: Monday, September 27, 2021 1:42 PM
> >>>> To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
> >>>> Cc: =idosch@nvidia.com; linux-pm@vger.kernel.org
> >>>> Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device
> >>>> statistics update for configuration operation
> >>>>
> >>>>
> >>>> Hi Vadim,
> >>>>
> >>>>
> >>>> On 27/09/2021 10:24, Vadim Pasternak wrote:
> >>>>> The thermal subsystem maintains a transition table between states
> >>>>> that is allocated according to the maximum state supported by the
> >>>>> cooling device.
> >>>>>
> >>>>> When the table needs to be updated, the thermal subsystem does
> not
> >>>>> validate that the new state does not exceed the maximum state,
> >>>>> leading to out-of-bounds memory accesses [1].
> >>>>
> >>>> Actually, thermal_cooling_device_stats_update() is called if the
> >>>> set_cur_state is successful.
> >>>>
> >>>> With a state greater than the max state, the set_cur_state should
> >>>> fail and
> >>>> thermal_cooling_device_stats_update() is not called.
> >>>>
> >>>> Perhaps the problem is in mlxsw_thermal_set_cur_state() ?
> >>>
> >>> "mlxsw" thermal drivers has additional use of 'sysfs' 'cur_state'
> >>> for configuration purpose to limit minimum fan speed.
> >>> Fan speed minimum is enforced by setting 'cur_state' with value
> >>> exceeding actual fan speed maximum.
> >>
> >> Yes, and that is the problem because the driver is doing weird things
> >> with the cooling device state resulting in an abuse of the sysfs API
> >> and conflicting with the thermal internals.
> >>
> >>
> >>> This feature provides ability to limit fan speed according to some
> >>> system wise considerations, like absence of some replaceable units
> >>> or high system ambient temperature, or some other factors which
> >>> indirectly impacts system airflow.
> >>
> >> Is that a static thermal profile depending on the platform set by
> >> userspace or something which can be changed dynamically at runtime via
> eg. a daemon ?
> >
> > Yes, this is some profiles/rules, which are system specific and
> > according to these rules userspace can limit fan speed. Like, for example:
> > - if one of power supplies is removed, system fan should be enforced to
> full
> >   speed, because it makes a hole in a box, and it has hard impact on airflow.
> > - If port side ambient temperature reaches some threshold X1, fan speed
> should
> >   be limited by Y1%, X2 - Y2%, etcetera.
> > - if temperature fault is detected for any optical transceivers - some limit is
> >   required.
> 
> I see, thanks for the information.
> 
> >>> For example, if cooling devices operates at cooling levels from 1 to
> >>> 10
> >>> (1 for 10% fan speed, 10 for 100% fan speed), cooling device minimal
> >>> speed can be limited by setting 'cur_state' attribute through 'sysfs'
> >>> to the values from 'max_state' + 1 to 'max_state * 2' (from 11 to 20).
> >>> Following this example if value is set to 14 (40%) cooling levels
> >>> vector will be set to 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10 for setting
> >>> device speed cooling states respectively in 40, 40, 40, 40, 40, 50,
> >>> 60. 70, 80, 90,
> >>> 100 percent. And it limits cooling device to operate only at 40%
> >>> speed and above.
> >>>
> >>> Maybe it would be worth adding earlier some dedicated 'cur_state_limit'
> >>> attribute for this feature, but it was not done.
> >>>
> >>> We have another driver required this feature and one new we are
> >>> developing now, which require fan minim speed limit as well.
> >>
> >> The use case is valid but I think the approach is wrong. Probably the
> >> simplest thing to do is to set a low trip point with a minimal fan speed.
> >
> > For "trip_point_0_temp" there is the below definition:
> > 	{	/* In range - 0-40% PWM */
> > 		.type		= THERMAL_TRIP_ACTIVE,
> > 		.temp		= MLXSW_THERMAL_ASIC_TEMP_NORM,
> > 		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
> > 		.min_state	= 0,
> > 		.max_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
> (40%)
> > 	},
> > For "trip_point_1_temp":
> > 	{
> > 		/* In range - 40-100% PWM */
> > 		.type		= THERMAL_TRIP_ACTIVE,
> > 		.temp		= MLXSW_THERMAL_ASIC_TEMP_HIGH,
> > 		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
> > 		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
> (100%)
> > 		.max_state	= MLXSW_THERMAL_MAX_STATE,
> > 	},
> >
> > To limit cooling device by f.e 70%, I should change dynamically 'min_state'
> and 'max_state'
> > for both trips to (70%, 70%) and (70%, 100%). I am not sure I can do it?
> >
> > And we have many customers, using this user space interface, it would
> > be not so good to change it.
> 
> The issue is the driver is wrong here. The change you referred in the past
> should have not been merged. I note there is no thermal maintainer blessing
> in the tags.
> 
> > I understand it is possible to handle this issue inside driver's
> > set_cur_stat() callback by returning positive value for configuration
> request.
> > But maybe this feature could you useful for other developers and it
> > could be some common interface to support it?
> 
> I suggest to revert a421ce088ac8 and switch to hwmon to set the fan speed.
> At the first glance, it is already supported by the mlx driver, no ?
> 

Yes, hwmon is supported.
But setting fan trough hwmon means we'll have two owners controlling same device.
If speed is set through hwmon for example to 100%, thermal could decide to decrease
it, since it does not know what the reason was for setting full speed.
And it could make a completion between hwmon and thermal for cooling control.

Maybe adding new set_min_state()/get_min_state() callback could work?
In such case user space can limit speed through "min_state" attrbbute.

Maybe some other approach, but within thermal subsystem?





> 
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-
> blog/> Blog

  reply	other threads:[~2021-09-27 17:52 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27  8:24 [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation Vadim Pasternak
2021-09-27 10:41 ` Daniel Lezcano
2021-09-27 11:22   ` Vadim Pasternak
2021-09-27 12:31     ` Daniel Lezcano
2021-09-27 13:29       ` Vadim Pasternak
2021-09-27 13:55         ` Daniel Lezcano
2021-09-27 17:52           ` Vadim Pasternak [this message]
2021-09-27 21:12             ` Daniel Lezcano
2021-09-28 11:35               ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BN9PR12MB538128B5C2A0B04531E5775CAFA79@BN9PR12MB5381.namprd12.prod.outlook.com \
    --to=vadimp@nvidia.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=idosch@nvidia.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=rui.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).