From: Thara Gopinath <thara.gopinath@linaro.org>
To: Dmitry Osipenko <digetx@gmail.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>
Cc: Thierry Reding <thierry.reding@gmail.com>,
	Jonathan Hunter <jonathanh@nvidia.com>,
	Zhang Rui <rui.zhang@intel.com>, Amit Kucheria <amitk@kernel.org>,
	Andreas Westman Dorcsak <hedmoo@yahoo.com>,
	Maxim Schwalm <maxim.schwalm@gmail.com>,
	Svyatoslav Ryhel <clamor95@gmail.com>,
	Ihor Didenko <tailormoon@rambler.ru>,
	Ion Agorria <ion@agorria.com>,
	Matt Merhar <mattmerhar@protonmail.com>,
	Peter Geis <pgwipeout@gmail.com>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-tegra@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
Date: Tue, 15 Jun 2021 22:50:43 -0400
Message-ID: <545974aa-bb0f-169b-6f31-6e8c2461343f@linaro.org>
In-Reply-To: <4c7b23c4-cf6a-0942-5250-63515be4a219@gmail.com>



On 6/15/21 3:32 PM, Dmitry Osipenko wrote:
> On 15.06.2021 19:18, Daniel Lezcano wrote:
>> On 15/06/2021 15:01, Dmitry Osipenko wrote:
>>> On 15.06.2021 13:26, Viresh Kumar wrote:
>>>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>>>
>>>>> [Cc Viresh]
>>>>>
>>>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>>>> monitors the temperature and voltage of the SoC. The sensors control CPU
>>>>>> frequency throttling, which is activated by hardware once a preprogrammed
>>>>>> temperature level is breached; they also signal the Power Management
>>>>>> controller to perform an emergency shutdown on a critical overheat of the
>>>>>> SoC die. Add a driver for the Tegra30 TSENSOR module, exposing it as a
>>>>>> thermal sensor and a cooling device.
>>>>>
>>>>> IMO it does not make sense to expose the hardware throttling mechanism
>>>>> as a cooling device, because it is not usable from anywhere in the
>>>>> thermal framework.
>>>>>
>>>>> Moreover, that will collide with the thermal / cpufreq framework
>>>>> mitigation (hardware sets the frequency but the software thinks the
>>>>> frequency is different), right?
>>>
>>> H/w mitigation is additional and should be transparent to the software
>>> mitigation. The software mitigation is much more flexible, but it has
>>> latency. Software could also crash or hang.
>>>
>>> Hardware mitigation has no such latency, and it continues to work
>>> regardless of the software state.
>>
>> Yes, I agree. Both solutions have their pros and cons. However, I don't
>> think they can co-exist sanely.
>>
>>> The CCF driver is aware of the h/w cooling status [1], hence the
>>> software sees the actual frequency.
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660
>>
>> Ah interesting, thanks for the pointer.
>>
>> What I'm worried about is the consistency with cpufreq.
>>
>> Probably cpufreq_update_limits() should be called from the interrupt
>> handler.
> 
> IIUC, cpufreq should already be prepared for the case where firmware
> may override the frequency. Viresh, could you please clarify what the
> possible implications of the frequency override are?
> 
>>>> I am not even sure what the cooling device is doing here:
>>>>
>>>> tegra_tsensor_set_cur_state() is not implemented, and it says the hardware
>>>> changes it by itself. What is the benefit you are getting out of the
>>>> cooling device here?
>>>
>>> It allows userspace to check whether hardware cooling is active via the
>>> cooling_device sysfs. Otherwise we have no way to check whether h/w
>>> cooling is active, and I think that's useful information. It's also
>>> interesting to see the cooling_device stats, which show how many times
>>> h/w mitigation was active.
>>
>> Actually, the stats are for the software mitigation. For the driver,
>> create a debugfs entry like the other drivers do, or a module parameter
>> with the stats.
> 
> Okay
> 
>>>>> The hardware limiter should let the cpufreq framework know about the
>>>>> frequency change.
>>>>>
>>>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>>>
>>>>> Maybe post the sensor without the h/w limiter for now and address that
>>>>> in a separate series?
>>>>
>>>
>>> I wasn't aware of the existence of thermal pressure, thank you for
>>> pointing it out. At a quick glance it should be possible to benefit from
>>> the information about the additional pressure.
>>>
>>> It seems the current thermal pressure API assumes that there is only one
>>> user of the API, so it's impossible to aggregate the pressure from
>>> different sources, like software cpufreq pressure + h/w frequency
>>> pressure. Correct? If yes, then please let me know your thoughts on the
>>> best approach for supporting the aggregation.

Hi,

Thermal pressure lets the scheduler know that the max capacity
available for a CPU to schedule tasks on is reduced due to a thermal
event. So you cannot have separate h/w thermal pressure and s/w thermal
pressure: in the end only one cap is applied at the h/w level, and the
frequency corresponding to that cap is what should be used for thermal
pressure.
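
For reference, the conversion from a capped frequency to the thermal
pressure value handed to the scheduler is roughly the following. This
is only a simplified sketch of the arithmetic cpufreq_cooling.c does
today; the function name and the capped_freq_khz parameter are made up
for illustration:

/*
 * Illustrative sketch, not actual driver code: convert a hardware-capped
 * frequency into the capacity delta the scheduler consumes as thermal
 * pressure, following the arithmetic used by cpufreq_cooling.c.
 * Needs <linux/cpufreq.h>, <linux/kernel.h> and <linux/sched/topology.h>.
 */
static void example_update_thermal_pressure(struct cpufreq_policy *policy,
					    unsigned int capped_freq_khz)
{
	unsigned long max_capacity, capacity;

	max_capacity = arch_scale_cpu_capacity(policy->cpu);
	capacity = mult_frac(capped_freq_khz, max_capacity,
			     policy->cpuinfo.max_freq);

	/* Don't pass a boosted capacity to the scheduler. */
	if (capacity > max_capacity)
		capacity = max_capacity;

	arch_set_thermal_pressure(policy->related_cpus,
				  max_capacity - capacity);
}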

Ideally you should not have both s/w and h/w trying to throttle at the
same time. Why is this a scenario, and what prevents you from disabling
s/w throttling when h/w throttling is enabled? Now, if there has to be
aggregation for whatever reason, it should be done at the thermal
driver level and passed to the scheduler.

>>
>> That is a good question. IMO, the first step would be to call
>> cpufreq_update_limits().
> 
> Right
> 
>> [ Cc Thara, who implemented the thermal pressure ]
>>
>> Maybe Thara has an idea about how to aggregate both? There is another
>> series floating around with a hardware limiter [1] and the same problem.
>>
>>   [1] https://lkml.org/lkml/2021/6/8/1791
> 
> Thanks, it indeed looks similar.
> 
> I guess the common thermal pressure update code could be moved out into
> a new special cpufreq thermal QoS handler (policy->thermal_constraints),
> where the handler will select the frequency constraint and set up the
> pressure accordingly, so there won't be any races in the code.
> 
It was a conscious decision to keep the thermal pressure update out of
the QoS max-frequency update, because there are platforms that don't
use the QoS framework. For example, ACPI uses cpufreq_update_policy().
But you are right: we now have two platforms applying h/w throttling
and cpufreq_cooling applying s/w throttling, so it does make sense to
have one API doing all the computation to update thermal pressure. I am
not sure how exactly, or where exactly, this would reside.

So for starters, I think you should replicate the thermal pressure
update in your h/w driver when you know that the h/w is throttling (or
has throttled) the frequency. You can refer to cpufreq_cooling.c to see
how it is done.
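
Something along these lines, purely as an illustration: the tegra_*
handler and the tegra_tsensor_hw_throttle_freq() helper are made up, it
reuses the example_update_thermal_pressure() sketch from above, and a
threaded IRQ handler (or a workqueue) is assumed because
cpufreq_update_limits() takes the policy lock and may sleep:

/*
 * Hypothetical throttle-interrupt path for the Tegra30 driver, for
 * illustration only. tegra_tsensor_hw_throttle_freq() is an imaginary
 * helper returning the frequency the hardware currently enforces, in kHz.
 */
static irqreturn_t tegra_tsensor_throttle_thread(int irq, void *data)
{
	struct cpufreq_policy *policy = cpufreq_cpu_get(0);
	unsigned int capped_khz;

	if (!policy)
		return IRQ_HANDLED;

	capped_khz = tegra_tsensor_hw_throttle_freq(data);

	/* Reflect the h/w cap in the scheduler's thermal pressure... */
	example_update_thermal_pressure(policy, capped_khz);

	/* ...and let cpufreq re-evaluate its limits, as Daniel suggested. */
	cpufreq_update_limits(policy->cpu);

	cpufreq_cpu_put(policy);

	return IRQ_HANDLED;
}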

Moving to a common API can be done as a separate patch series.

-- 
Warm Regards
Thara (She/Her/Hers)
