All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukasz Luba <lukasz.luba@arm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	dri-devel@lists.freedesktop.org, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, orjan.eide@arm.com, robh@kernel.org,
	alyssa.rosenzweig@collabora.com, steven.price@arm.com,
	airlied@linux.ie, daniel@ffwll.ch, ionela.voinescu@arm.com
Subject: Re: [PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status
Date: Thu, 3 Dec 2020 15:38:03 +0000	[thread overview]
Message-ID: <d9906ed8-e3bf-5e42-2e43-09071848ae48@arm.com> (raw)
In-Reply-To: <5d4743b9-5b2f-8494-8d10-6a5fd2c0fdfd@linaro.org>



On 12/3/20 1:09 PM, Daniel Lezcano wrote:
> On 18/11/2020 13:03, Lukasz Luba wrote:
>> Devfreq cooling needs to now the correct status of the device in order
>> to operate. Do not rely on Devfreq last_status which might be a stale data
>> and get more up-to-date values of the load.
>>
>> Devfreq framework can change the device status in the background. To
>> mitigate this situation make a copy of the status structure and use it
>> for internal calculations.
>>
>> In addition this patch adds normalization function, which also makes sure
>> that whatever data comes from the device, it is in a sane range.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   drivers/thermal/devfreq_cooling.c | 52 +++++++++++++++++++++++++------
>>   1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
>> index 659c0143c9f0..925523694462 100644
>> --- a/drivers/thermal/devfreq_cooling.c
>> +++ b/drivers/thermal/devfreq_cooling.c
>> @@ -227,20 +227,46 @@ static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
>>   							       voltage);
>>   }
>>   
>> +static void _normalize_load(struct devfreq_dev_status *status)
>> +{
>> +	/* Make some space if needed */
>> +	if (status->busy_time > 0xffff) {
>> +		status->busy_time >>= 10;
>> +		status->total_time >>= 10;
>> +	}
>> +
>> +	if (status->busy_time > status->total_time)
>> +		status->busy_time = status->total_time;
> 
> How the condition above is possible?

They should, be checked by the driver, but I cannot trust
and have to check for all corner cases: (div by 0, overflow
one of them, etc). The busy_time and total_time are unsigned long,
which means 4B on 32bit machines.
If these values are coming from device counters, which count every
busy cycle and total cycles of a clock of a device running at e.g.
1GHz they would overflow every ~4s.

Normally IPA polling are 1s and 100ms, it's platform specific. But there
are also 'empty' periods when IPA sees temperature very low and does not
even call the .get_requested_power() callbacks for the cooling devices,
just grants max freq to all. This is problematic. I am investigating it
and will propose a solution for IPA soon.

I would avoid all of this if devfreq core would have default for all
devices a reliable polling timer... Let me check some possibilities also
for this case.

> 
>> +	status->busy_time *= 100;
>> +	status->busy_time /= status->total_time ? : 1;
>> +
>> +	/* Avoid division by 0 */
>> +	status->busy_time = status->busy_time ? : 1;
>> +	status->total_time = 100;
> 
> Why not base the normalization on 1024? and use an intermediate u64.

You are the 2nd reviewer who is asking this. I tried to keep 'load' as
in range [0, 100] since we also have 'load' in cpufreq cooling in this
range. Maybe I should switch to 1024 (Ionela was also asking for this).

> 
> For example:
> 
> static u32 _normalize_load(struct devfreq_dev_status *status)
> {
> 	u64 load = 0;
> 
> 	/* Prevent divison by zero */
> 	if (!status->busy_time)
> 		return 0;
> 
> 	/*
> 	 * Assuming status->total_time is always greater or equal
> 	 * to status->busy_time, it can not be equal to zero because
> 	 * of the test above
> 	 */
> 	load = status->busy_time * 1024;
> 	load /= status->total_time;

I wanted to avoid any divisions which involve 64bit var on 32bit
machine.

> 
> 	/*
> 	 * load is always [1..1024[, so it can not be truncated by a
> 	 * u64 -> u32 coercive cast
> 	 */
> 	return (u32)load;
> }
> 
> 
>> +}
>>   
>>   static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cdev,
>>   					       u32 *power)
>>   {
>>   	struct devfreq_cooling_device *dfc = cdev->devdata;
>>   	struct devfreq *df = dfc->devfreq;
>> -	struct devfreq_dev_status *status = &df->last_status;
>> +	struct devfreq_dev_status status;
>>   	unsigned long state;
>> -	unsigned long freq = status->current_frequency;
>> +	unsigned long freq;
>>   	unsigned long voltage;
>>   	u32 dyn_power = 0;
>>   	u32 static_power = 0;
>>   	int res;
>>   
>> +	mutex_lock(&df->lock);
>> +	res = df->profile->get_dev_status(df->dev.parent, &status);
>> +	mutex_unlock(&df->lock);
>> +	if (res)
>> +		return res;
>> +
>> +	freq = status.current_frequency;
>> +
>>   	state = freq_get_state(dfc, freq);
>>   	if (state == THERMAL_CSTATE_INVALID) {
>>   		res = -EAGAIN;
>> @@ -268,16 +294,18 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
>>   	} else {
>>   		dyn_power = dfc->power_table[state];
>>   
>> +		_normalize_load(&status);
> 
> 		load = _normalize_load(&status);
> 
>> +
>>   		/* Scale dynamic power for utilization */
>> -		dyn_power *= status->busy_time;
>> -		dyn_power /= status->total_time;
>> +		dyn_power *= status.busy_time;
>> +		dyn_power /= status.total_time;
> 
> 		/*
> 		 * May be change dyn_power to a u64 to prevent overflow
> 		 * when multiplied by 'load'
> 		 */
> 		dyn_power = (dyn_power * load) / 1024;

dyn_power value from EM should fit in 16bit [1], so we should be safe.

I will experiment with the 1024 code and check some corner cases.

Thank you Daniel for the review!

Regards,
Lukasz

[1] 
https://elixir.bootlin.com/linux/v5.10-rc5/source/kernel/power/energy_model.c#L135

WARNING: multiple messages have this Message-ID (diff)
From: Lukasz Luba <lukasz.luba@arm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: amit.kucheria@verdurent.com, linux-pm@vger.kernel.org,
	airlied@linux.ie, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org, steven.price@arm.com,
	alyssa.rosenzweig@collabora.com, rui.zhang@intel.com,
	ionela.voinescu@arm.com, orjan.eide@arm.com
Subject: Re: [PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status
Date: Thu, 3 Dec 2020 15:38:03 +0000	[thread overview]
Message-ID: <d9906ed8-e3bf-5e42-2e43-09071848ae48@arm.com> (raw)
In-Reply-To: <5d4743b9-5b2f-8494-8d10-6a5fd2c0fdfd@linaro.org>



On 12/3/20 1:09 PM, Daniel Lezcano wrote:
> On 18/11/2020 13:03, Lukasz Luba wrote:
>> Devfreq cooling needs to now the correct status of the device in order
>> to operate. Do not rely on Devfreq last_status which might be a stale data
>> and get more up-to-date values of the load.
>>
>> Devfreq framework can change the device status in the background. To
>> mitigate this situation make a copy of the status structure and use it
>> for internal calculations.
>>
>> In addition this patch adds normalization function, which also makes sure
>> that whatever data comes from the device, it is in a sane range.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
>> ---
>>   drivers/thermal/devfreq_cooling.c | 52 +++++++++++++++++++++++++------
>>   1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
>> index 659c0143c9f0..925523694462 100644
>> --- a/drivers/thermal/devfreq_cooling.c
>> +++ b/drivers/thermal/devfreq_cooling.c
>> @@ -227,20 +227,46 @@ static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
>>   							       voltage);
>>   }
>>   
>> +static void _normalize_load(struct devfreq_dev_status *status)
>> +{
>> +	/* Make some space if needed */
>> +	if (status->busy_time > 0xffff) {
>> +		status->busy_time >>= 10;
>> +		status->total_time >>= 10;
>> +	}
>> +
>> +	if (status->busy_time > status->total_time)
>> +		status->busy_time = status->total_time;
> 
> How the condition above is possible?

They should, be checked by the driver, but I cannot trust
and have to check for all corner cases: (div by 0, overflow
one of them, etc). The busy_time and total_time are unsigned long,
which means 4B on 32bit machines.
If these values are coming from device counters, which count every
busy cycle and total cycles of a clock of a device running at e.g.
1GHz they would overflow every ~4s.

Normally IPA polling are 1s and 100ms, it's platform specific. But there
are also 'empty' periods when IPA sees temperature very low and does not
even call the .get_requested_power() callbacks for the cooling devices,
just grants max freq to all. This is problematic. I am investigating it
and will propose a solution for IPA soon.

I would avoid all of this if devfreq core would have default for all
devices a reliable polling timer... Let me check some possibilities also
for this case.

> 
>> +	status->busy_time *= 100;
>> +	status->busy_time /= status->total_time ? : 1;
>> +
>> +	/* Avoid division by 0 */
>> +	status->busy_time = status->busy_time ? : 1;
>> +	status->total_time = 100;
> 
> Why not base the normalization on 1024? and use an intermediate u64.

You are the 2nd reviewer who is asking this. I tried to keep 'load' as
in range [0, 100] since we also have 'load' in cpufreq cooling in this
range. Maybe I should switch to 1024 (Ionela was also asking for this).

> 
> For example:
> 
> static u32 _normalize_load(struct devfreq_dev_status *status)
> {
> 	u64 load = 0;
> 
> 	/* Prevent divison by zero */
> 	if (!status->busy_time)
> 		return 0;
> 
> 	/*
> 	 * Assuming status->total_time is always greater or equal
> 	 * to status->busy_time, it can not be equal to zero because
> 	 * of the test above
> 	 */
> 	load = status->busy_time * 1024;
> 	load /= status->total_time;

I wanted to avoid any divisions which involve 64bit var on 32bit
machine.

> 
> 	/*
> 	 * load is always [1..1024[, so it can not be truncated by a
> 	 * u64 -> u32 coercive cast
> 	 */
> 	return (u32)load;
> }
> 
> 
>> +}
>>   
>>   static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cdev,
>>   					       u32 *power)
>>   {
>>   	struct devfreq_cooling_device *dfc = cdev->devdata;
>>   	struct devfreq *df = dfc->devfreq;
>> -	struct devfreq_dev_status *status = &df->last_status;
>> +	struct devfreq_dev_status status;
>>   	unsigned long state;
>> -	unsigned long freq = status->current_frequency;
>> +	unsigned long freq;
>>   	unsigned long voltage;
>>   	u32 dyn_power = 0;
>>   	u32 static_power = 0;
>>   	int res;
>>   
>> +	mutex_lock(&df->lock);
>> +	res = df->profile->get_dev_status(df->dev.parent, &status);
>> +	mutex_unlock(&df->lock);
>> +	if (res)
>> +		return res;
>> +
>> +	freq = status.current_frequency;
>> +
>>   	state = freq_get_state(dfc, freq);
>>   	if (state == THERMAL_CSTATE_INVALID) {
>>   		res = -EAGAIN;
>> @@ -268,16 +294,18 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
>>   	} else {
>>   		dyn_power = dfc->power_table[state];
>>   
>> +		_normalize_load(&status);
> 
> 		load = _normalize_load(&status);
> 
>> +
>>   		/* Scale dynamic power for utilization */
>> -		dyn_power *= status->busy_time;
>> -		dyn_power /= status->total_time;
>> +		dyn_power *= status.busy_time;
>> +		dyn_power /= status.total_time;
> 
> 		/*
> 		 * May be change dyn_power to a u64 to prevent overflow
> 		 * when multiplied by 'load'
> 		 */
> 		dyn_power = (dyn_power * load) / 1024;

dyn_power value from EM should fit in 16bit [1], so we should be safe.

I will experiment with the 1024 code and check some corner cases.

Thank you Daniel for the review!

Regards,
Lukasz

[1] 
https://elixir.bootlin.com/linux/v5.10-rc5/source/kernel/power/energy_model.c#L135
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-12-03 15:39 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-18 12:03 [PATCH v2 0/5] Thermal devfreq cooling improvements with Energy Model Lukasz Luba
2020-11-18 12:03 ` Lukasz Luba
2020-11-18 12:03 ` [PATCH v2 1/5] thermal: devfreq_cooling: change tracing function and arguments Lukasz Luba
2020-11-18 12:03   ` Lukasz Luba
2020-12-02 10:23   ` Ionela Voinescu
2020-12-02 10:23     ` Ionela Voinescu
2020-11-18 12:03 ` [PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status Lukasz Luba
2020-11-18 12:03   ` Lukasz Luba
2020-12-02 10:23   ` Ionela Voinescu
2020-12-02 10:23     ` Ionela Voinescu
2020-12-03 13:09   ` Daniel Lezcano
2020-12-03 13:09     ` Daniel Lezcano
2020-12-03 15:38     ` Lukasz Luba [this message]
2020-12-03 15:38       ` Lukasz Luba
2020-12-03 16:09       ` Daniel Lezcano
2020-12-03 16:09         ` Daniel Lezcano
2020-12-07 12:41         ` Lukasz Luba
2020-12-07 12:41           ` Lukasz Luba
2020-12-08 14:20           ` Lukasz Luba
2020-12-08 14:20             ` Lukasz Luba
2020-11-18 12:03 ` [PATCH v2 3/5] thermal: devfreq_cooling: add new registration functions with Energy Model Lukasz Luba
2020-11-18 12:03   ` Lukasz Luba
2020-12-02 10:24   ` Ionela Voinescu
2020-12-02 10:24     ` Ionela Voinescu
2020-12-02 11:14     ` Lukasz Luba
2020-12-02 11:14       ` Lukasz Luba
2020-12-02 11:49       ` Ionela Voinescu
2020-12-02 11:49         ` Ionela Voinescu
2020-12-02 11:54         ` Lukasz Luba
2020-12-02 11:54           ` Lukasz Luba
2020-12-03 15:40   ` Daniel Lezcano
2020-12-03 15:40     ` Daniel Lezcano
2020-12-07  9:46     ` Lukasz Luba
2020-12-07  9:46       ` Lukasz Luba
2020-11-18 12:03 ` [PATCH v2 4/5] thermal: devfreq_cooling: remove old power model and use EM Lukasz Luba
2020-11-18 12:03   ` Lukasz Luba
2020-12-02 10:26   ` Ionela Voinescu
2020-12-02 10:26     ` Ionela Voinescu
2020-11-18 12:03 ` [PATCH v2 5/5] drm/panfrost: Register devfreq cooling and attempt to add Energy Model Lukasz Luba
2020-11-18 12:03   ` Lukasz Luba
2020-12-02 15:01 ` [PATCH v2 0/5] Thermal devfreq cooling improvements with " Daniel Lezcano
2020-12-02 15:01   ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9906ed8-e3bf-5e42-2e43-09071848ae48@arm.com \
    --to=lukasz.luba@arm.com \
    --cc=airlied@linux.ie \
    --cc=alyssa.rosenzweig@collabora.com \
    --cc=amit.kucheria@verdurent.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ionela.voinescu@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=orjan.eide@arm.com \
    --cc=robh@kernel.org \
    --cc=rui.zhang@intel.com \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.