linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Regression] cached max_state breaks ACPI processor cooling device
@ 2023-02-25  5:29 Zhang, Rui
  2023-02-25  5:34 ` Zhang, Rui
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Zhang, Rui @ 2023-02-25  5:29 UTC (permalink / raw)
  To: viresh.kumar, Wysocki, Rafael J, daniel.lezcano
  Cc: srinivas.pandruvada, linux-pm, linux-kernel

Hi, All,

Starting from commit c408b3d1d9bb("thermal: Validate new state in
cur_state_store()") and commit a365105c685c("thermal: sysfs: Reuse
cdev->max_state"), the cdev->get_max_state() is evaluated only once
during cooling device registration.

This is done to fix the below Smatch static checker warning:
	drivers/thermal/thermal_sysfs.c:656 thermal_cooling_device_stats_update()
	warn: potential integer overflow from user 'stats->state *
stats->max_states + new_state'
reported here https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/.

But this actually breaks cooling devices which could have dynamic max
cooling state, like ACPI processor cooling device.

acpi_processor_driver_init
	driver_register(&acpi_processor_driver);
		acpi_processor_start
			acpi_processor_thermal_init
				thermal_cooling_device_register
					processor_get_max_state
	acpi_processor_cpufreq_init = true
The driver doesn't count cpufreq as cooling state until
acpi_processor_cpufreq_init is set later.

As a result, without the commits above,
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:3
/sys/class/thermal/cooling_device10/type:Processor
after the commits above, 
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:0
/sys/class/thermal/cooling_device10/type:Processor

In order to fix this, there are something worth clarification IMO.
1. should we support dynamic max_state or not?
   ACPI processor cooling state is a combination of processor p-states
   and t-states.
   t-states are static, but p-states can vary depends on processor 
   frequency driver loaded or not.
   I'm not sure if there is any other user like this, but still this
   is a valid use case of dynamic max_state.
2. how to handle dynamic max_state for cooling device statistics in
   sysfs?
   IMO, when max_state changes, the definition of previous cooling
   device is changed as well, thus we should abandon the previous
   statistics and restart counting.
3. anything else needs to be handled when max_state changes?
   Say, as the definition of each cooling state is changed when
   max_state changes, should we invalidate and re-update the
   thermal instances of this cdev in each thermal zone device?
4. how to detect/handle max cooling states changes?
   Should we do this as before, which invokes .get_max_state()
   everywhere and do updates when necessary , or
   a. cache max_state like we do today
   b. introduce a new API for max_state change
   c. invoke the new API in the cooling device driver explicitly when
      max_state changes
   d. update cached max_state value, statistics sysfs and
      thermal_instances in the API
   e. remove .get_max_state() callback as we register and update the
      max_state with a fixed value each time.

thanks,
rui




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Regression] cached max_state breaks ACPI processor cooling device
  2023-02-25  5:29 [Regression] cached max_state breaks ACPI processor cooling device Zhang, Rui
@ 2023-02-25  5:34 ` Zhang, Rui
  2023-02-27  6:49   ` srinivas pandruvada
  2023-02-27 12:03 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-03-02 19:29 ` Rafael J. Wysocki
  2 siblings, 1 reply; 5+ messages in thread
From: Zhang, Rui @ 2023-02-25  5:34 UTC (permalink / raw)
  To: viresh.kumar, Wysocki, Rafael J, daniel.lezcano
  Cc: srinivas.pandruvada, Wang, Quanxian, linux-pm, linux-kernel

On Sat, 2023-02-25 at 13:29 +0800, Zhang Rui wrote:
> Hi, All,
> 
> Starting from commit c408b3d1d9bb("thermal: Validate new state in
> cur_state_store()") and commit a365105c685c("thermal: sysfs: Reuse
> cdev->max_state"), the cdev->get_max_state() is evaluated only once
> during cooling device registration.
> 
> This is done to fix the below Smatch static checker warning:
> 	drivers/thermal/thermal_sysfs.c:656
> thermal_cooling_device_stats_update()
> 	warn: potential integer overflow from user 'stats->state *
> stats->max_states + new_state'
> reported here https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/.
> 
> But this actually breaks cooling devices which could have dynamic max
> cooling state, like ACPI processor cooling device.
> 
> acpi_processor_driver_init
> 	driver_register(&acpi_processor_driver);
> 		acpi_processor_start
> 			acpi_processor_thermal_init
> 				thermal_cooling_device_register
> 					processor_get_max_state
> 	acpi_processor_cpufreq_init = true
> The driver doesn't count cpufreq as cooling state until
> acpi_processor_cpufreq_init is set later.
> 
> As a result, without the commits above,
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/max_state:3
> /sys/class/thermal/cooling_device10/type:Processor
> after the commits above, 
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/max_state:0
> /sys/class/thermal/cooling_device10/type:Processor

Forgot to mention that this problem is

Reported-by: Wang, Quanxian <quanxian.wang@intel.com>

thanks,
rui
> 
> In order to fix this, there are something worth clarification IMO.
> 1. should we support dynamic max_state or not?
>    ACPI processor cooling state is a combination of processor p-
> states
>    and t-states.
>    t-states are static, but p-states can vary depends on processor 
>    frequency driver loaded or not.
>    I'm not sure if there is any other user like this, but still this
>    is a valid use case of dynamic max_state.
> 2. how to handle dynamic max_state for cooling device statistics in
>    sysfs?
>    IMO, when max_state changes, the definition of previous cooling
>    device is changed as well, thus we should abandon the previous
>    statistics and restart counting.
> 3. anything else needs to be handled when max_state changes?
>    Say, as the definition of each cooling state is changed when
>    max_state changes, should we invalidate and re-update the
>    thermal instances of this cdev in each thermal zone device?
> 4. how to detect/handle max cooling states changes?
>    Should we do this as before, which invokes .get_max_state()
>    everywhere and do updates when necessary , or
>    a. cache max_state like we do today
>    b. introduce a new API for max_state change
>    c. invoke the new API in the cooling device driver explicitly when
>       max_state changes
>    d. update cached max_state value, statistics sysfs and
>       thermal_instances in the API
>    e. remove .get_max_state() callback as we register and update the
>       max_state with a fixed value each time.
> 
> thanks,
> rui
> 
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Regression] cached max_state breaks ACPI processor cooling device
  2023-02-25  5:34 ` Zhang, Rui
@ 2023-02-27  6:49   ` srinivas pandruvada
  0 siblings, 0 replies; 5+ messages in thread
From: srinivas pandruvada @ 2023-02-27  6:49 UTC (permalink / raw)
  To: Zhang, Rui, viresh.kumar, Wysocki, Rafael J, daniel.lezcano
  Cc: Wang, Quanxian, linux-pm, linux-kernel

On Sat, 2023-02-25 at 05:34 +0000, Zhang, Rui wrote:
> On Sat, 2023-02-25 at 13:29 +0800, Zhang Rui wrote:
> > Hi, All,
> > 
> > Starting from commit c408b3d1d9bb("thermal: Validate new state in
> > cur_state_store()") and commit a365105c685c("thermal: sysfs: Reuse
> > cdev->max_state"), the cdev->get_max_state() is evaluated only once
> > during cooling device registration.
> > 
> > This is done to fix the below Smatch static checker warning:
> >         drivers/thermal/thermal_sysfs.c:656
> > thermal_cooling_device_stats_update()
> >         warn: potential integer overflow from user 'stats->state *
> > stats->max_states + new_state'
> > reported here https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/.
> > 
> > But this actually breaks cooling devices which could have dynamic
> > max
> > cooling state, like ACPI processor cooling device.
> > 
> > acpi_processor_driver_init
> >         driver_register(&acpi_processor_driver);
> >                 acpi_processor_start
> >                         acpi_processor_thermal_init
> >                                 thermal_cooling_device_register
> >                                         processor_get_max_state
> >         acpi_processor_cpufreq_init = true
> > The driver doesn't count cpufreq as cooling state until
> > acpi_processor_cpufreq_init is set later.
> > 
> > As a result, without the commits above,
> > /sys/class/thermal/cooling_device10/cur_state:0
> > /sys/class/thermal/cooling_device10/max_state:3
> > /sys/class/thermal/cooling_device10/type:Processor
> > after the commits above, 
> > /sys/class/thermal/cooling_device10/cur_state:0
> > /sys/class/thermal/cooling_device10/max_state:0
> > /sys/class/thermal/cooling_device10/type:Processor
> 
You really need a core API, which updates max_state. When I was trying
to update max_state for powerclamp driver, I thought of a function to
do this.
This basically works under  cdev->lock, something similar to
__thermal_cdev_update. For each thermal instances if the shallowest
state is more than the new max_state, fail the request, otherwise set.


Thanks,
Srinivas


> Forgot to mention that this problem is
> 
> Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> 
> thanks,
> rui
> > 
> > In order to fix this, there are something worth clarification IMO.
> > 1. should we support dynamic max_state or not?
> >    ACPI processor cooling state is a combination of processor p-
> > states
> >    and t-states.
> >    t-states are static, but p-states can vary depends on processor 
> >    frequency driver loaded or not.
> >    I'm not sure if there is any other user like this, but still
> > this
> >    is a valid use case of dynamic max_state.
> > 2. how to handle dynamic max_state for cooling device statistics in
> >    sysfs?
> >    IMO, when max_state changes, the definition of previous cooling
> >    device is changed as well, thus we should abandon the previous
> >    statistics and restart counting.
> > 3. anything else needs to be handled when max_state changes?
> >    Say, as the definition of each cooling state is changed when
> >    max_state changes, should we invalidate and re-update the
> >    thermal instances of this cdev in each thermal zone device?
> > 4. how to detect/handle max cooling states changes?
> >    Should we do this as before, which invokes .get_max_state()
> >    everywhere and do updates when necessary , or
> >    a. cache max_state like we do today
> >    b. introduce a new API for max_state change
> >    c. invoke the new API in the cooling device driver explicitly
> > when
> >       max_state changes
> >    d. update cached max_state value, statistics sysfs and
> >       thermal_instances in the API
> >    e. remove .get_max_state() callback as we register and update
> > the
> >       max_state with a fixed value each time.
> > 
> > thanks,
> > rui
> > 
> > 
> > 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Regression] cached max_state breaks ACPI processor cooling device
  2023-02-25  5:29 [Regression] cached max_state breaks ACPI processor cooling device Zhang, Rui
  2023-02-25  5:34 ` Zhang, Rui
@ 2023-02-27 12:03 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-03-02 19:29 ` Rafael J. Wysocki
  2 siblings, 0 replies; 5+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-02-27 12:03 UTC (permalink / raw)
  To: Zhang, Rui, viresh.kumar, Wysocki, Rafael J, daniel.lezcano
  Cc: srinivas.pandruvada, linux-pm, linux-kernel,
	Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 25.02.23 06:29, Zhang, Rui wrote:
>
> Starting from commit c408b3d1d9bb("thermal: Validate new state in
> cur_state_store()") and commit a365105c685c("thermal: sysfs: Reuse
> cdev->max_state"), the cdev->get_max_state() is evaluated only once
> during cooling device registration.
> 
> This is done to fix the below Smatch static checker warning:
> 	drivers/thermal/thermal_sysfs.c:656 thermal_cooling_device_stats_update()
> 	warn: potential integer overflow from user 'stats->state *
> stats->max_states + new_state'
> reported here https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/.
> 
> But this actually breaks cooling devices which could have dynamic max
> cooling state, like ACPI processor cooling device.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced c408b3d1d9bb
#regzbot title thermal: cached max_state breaks ACPI processor cooling
device
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Regression] cached max_state breaks ACPI processor cooling device
  2023-02-25  5:29 [Regression] cached max_state breaks ACPI processor cooling device Zhang, Rui
  2023-02-25  5:34 ` Zhang, Rui
  2023-02-27 12:03 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-03-02 19:29 ` Rafael J. Wysocki
  2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2023-03-02 19:29 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: viresh.kumar, Wysocki, Rafael J, daniel.lezcano,
	srinivas.pandruvada, linux-pm, linux-kernel

On Sat, Feb 25, 2023 at 6:30 AM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> Hi, All,
>
> Starting from commit c408b3d1d9bb("thermal: Validate new state in
> cur_state_store()") and commit a365105c685c("thermal: sysfs: Reuse
> cdev->max_state"), the cdev->get_max_state() is evaluated only once
> during cooling device registration.
>
> This is done to fix the below Smatch static checker warning:
>         drivers/thermal/thermal_sysfs.c:656 thermal_cooling_device_stats_update()
>         warn: potential integer overflow from user 'stats->state *
> stats->max_states + new_state'
> reported here https://lore.kernel.org/all/Y0ltRJRjO7AkawvE@kili/.
>
> But this actually breaks cooling devices which could have dynamic max
> cooling state, like ACPI processor cooling device.
>
> acpi_processor_driver_init
>         driver_register(&acpi_processor_driver);
>                 acpi_processor_start
>                         acpi_processor_thermal_init
>                                 thermal_cooling_device_register
>                                         processor_get_max_state
>         acpi_processor_cpufreq_init = true
> The driver doesn't count cpufreq as cooling state until
> acpi_processor_cpufreq_init is set later.
>
> As a result, without the commits above,
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/max_state:3
> /sys/class/thermal/cooling_device10/type:Processor
> after the commits above,
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/max_state:0
> /sys/class/thermal/cooling_device10/type:Processor
>
> In order to fix this, there are something worth clarification IMO.
> 1. should we support dynamic max_state or not?
>    ACPI processor cooling state is a combination of processor p-states
>    and t-states.
>    t-states are static, but p-states can vary depends on processor
>    frequency driver loaded or not.
>    I'm not sure if there is any other user like this, but still this
>    is a valid use case of dynamic max_state.
> 2. how to handle dynamic max_state for cooling device statistics in
>    sysfs?
>    IMO, when max_state changes, the definition of previous cooling
>    device is changed as well, thus we should abandon the previous
>    statistics and restart counting.
> 3. anything else needs to be handled when max_state changes?
>    Say, as the definition of each cooling state is changed when
>    max_state changes, should we invalidate and re-update the
>    thermal instances of this cdev in each thermal zone device?
> 4. how to detect/handle max cooling states changes?
>    Should we do this as before, which invokes .get_max_state()
>    everywhere and do updates when necessary ,

No.

>  or
>    a. cache max_state like we do today
>    b. introduce a new API for max_state change
>    c. invoke the new API in the cooling device driver explicitly when
>       max_state changes
>    d. update cached max_state value, statistics sysfs and
>       thermal_instances in the API
>    e. remove .get_max_state() callback as we register and update the
>       max_state with a fixed value each time.

Yes, IMO.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-02 19:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-25  5:29 [Regression] cached max_state breaks ACPI processor cooling device Zhang, Rui
2023-02-25  5:34 ` Zhang, Rui
2023-02-27  6:49   ` srinivas pandruvada
2023-02-27 12:03 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-03-02 19:29 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).