linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression
@ 2023-03-03 19:18 Rafael J. Wysocki
  2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-03 19:18 UTC (permalink / raw)
  To: Linux PM
  Cc: Zhang Rui, Linux ACPI, LKML, Daniel Lezcano, Srinivas Pandruvada,
	Viresh Kumar, Quanxian Wang

Hi All,

As reported by Rui in this thread:

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/

some recent changes in the thermal core cause the CPU cooling devices
registered by the ACPI processor driver to become unusable in some cases
and somewhat crippled in general.

The problem is that the ACPI processor driver changes its ->get_max_state()
callback return value depending on whether or not cpufreq is available and
there is a cpufreq policy for a given CPU.  However, the thermal core has
always assumed that the return value of that callback will not change, which
in fact is relied on by the cooling device statistics code.  In particular,
when the ->get_max_state() grows, the memory buffer allocated for storing the
statistics will be too small and corruption may ensue as a result.

For this reason, the issue needs to be addressed in the ACPI processor driver
and not in the thermal core, but the core needs to help somewhat too.  Namely,
it needs to provide a helper allowing an interested driver to update the
max_state value for an already registered cooling device in certain situations
which will also cause the statistics to be rebuilt.

This series implements the above and for details please refer to the individual
patch chagelogs.

Please also note that it has only been lightly tested, so more testing and of
course review of it is welcome.

Thanks!




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-03 19:18 [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression Rafael J. Wysocki
@ 2023-03-03 19:19 ` Rafael J. Wysocki
  2023-03-07 16:51   ` Zhang, Rui
  2023-03-12 16:08   ` Zhang, Rui
  2023-03-03 19:21 ` [PATCH v1 2/4] thermal: core: Introduce thermal_cooling_device_present() Rafael J. Wysocki
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-03 19:19 UTC (permalink / raw)
  To: Linux PM
  Cc: Zhang Rui, Linux ACPI, LKML, Daniel Lezcano, Srinivas Pandruvada,
	Viresh Kumar, Quanxian Wang

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The cpufreq policy notifier in the ACPI processor driver may as
well be registered before the driver itself, which causes
acpi_processor_cpufreq_init to be true (unless the notifier
registration fails, which is unlikely at that point) when the
ACPI CPU thermal cooling devices are registered, so the
processor_get_max_state() result does not change while
acpi_processor_driver_init() is running.

Change the ordering in acpi_processor_driver_init() accordingly
to prevent the max_state value from remaining 0 permanently for all
ACPI CPU cooling devices.

Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/processor_driver.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/acpi/processor_driver.c
===================================================================
--- linux-pm.orig/drivers/acpi/processor_driver.c
+++ linux-pm/drivers/acpi/processor_driver.c
@@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
 	if (acpi_disabled)
 		return 0;
 
+	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
+				       CPUFREQ_POLICY_NOTIFIER)) {
+		acpi_processor_cpufreq_init = true;
+		acpi_processor_ignore_ppc_init();
+	}
+
 	result = driver_register(&acpi_processor_driver);
 	if (result < 0)
 		return result;
@@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
 	cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-drv:dead",
 				  NULL, acpi_soft_cpu_dead);
 
-	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
-				       CPUFREQ_POLICY_NOTIFIER)) {
-		acpi_processor_cpufreq_init = true;
-		acpi_processor_ignore_ppc_init();
-	}
-
 	acpi_processor_throttling_init();
 	return 0;
 err:




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 2/4] thermal: core: Introduce thermal_cooling_device_present()
  2023-03-03 19:18 [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression Rafael J. Wysocki
  2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
@ 2023-03-03 19:21 ` Rafael J. Wysocki
  2023-03-03 19:23 ` [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update() Rafael J. Wysocki
  2023-03-03 19:23 ` [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes Rafael J. Wysocki
  3 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-03 19:21 UTC (permalink / raw)
  To: Linux PM
  Cc: Zhang Rui, Linux ACPI, LKML, Daniel Lezcano, Srinivas Pandruvada,
	Viresh Kumar, Quanxian Wang

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Introduce a helper function, thermal_cooling_device_present(), for
checking if the given cooling device is in the list of registered
cooling devices to avoid some code duplication in a subsequent
patch.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1045,6 +1045,18 @@ devm_thermal_of_cooling_device_register(
 }
 EXPORT_SYMBOL_GPL(devm_thermal_of_cooling_device_register);
 
+static bool thermal_cooling_device_present(struct thermal_cooling_device *cdev)
+{
+	struct thermal_cooling_device *pos;
+
+	list_for_each_entry(pos, &thermal_cdev_list, node) {
+		if (pos == cdev)
+			return true;
+	}
+
+	return false;
+}
+
 static void __unbind(struct thermal_zone_device *tz, int mask,
 		     struct thermal_cooling_device *cdev)
 {
@@ -1067,20 +1079,17 @@ void thermal_cooling_device_unregister(s
 	int i;
 	const struct thermal_zone_params *tzp;
 	struct thermal_zone_device *tz;
-	struct thermal_cooling_device *pos = NULL;
 
 	if (!cdev)
 		return;
 
 	mutex_lock(&thermal_list_lock);
-	list_for_each_entry(pos, &thermal_cdev_list, node)
-		if (pos == cdev)
-			break;
-	if (pos != cdev) {
-		/* thermal cooling device not found */
+
+	if (!thermal_cooling_device_present(cdev)) {
 		mutex_unlock(&thermal_list_lock);
 		return;
 	}
+
 	list_del(&cdev->node);
 
 	/* Unbind all thermal zones associated with 'this' cdev */




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update()
  2023-03-03 19:18 [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression Rafael J. Wysocki
  2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
  2023-03-03 19:21 ` [PATCH v1 2/4] thermal: core: Introduce thermal_cooling_device_present() Rafael J. Wysocki
@ 2023-03-03 19:23 ` Rafael J. Wysocki
  2023-03-07 16:40   ` Zhang, Rui
  2023-03-27 20:57   ` Imre Deak
  2023-03-03 19:23 ` [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes Rafael J. Wysocki
  3 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-03 19:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Zhang Rui, Linux ACPI, LKML, Daniel Lezcano, Srinivas Pandruvada,
	Viresh Kumar, Quanxian Wang

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.

That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state.  Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).

The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.

Note that the actual user of the new function will be added separately.

Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.c  |   47 ++++++++++++++++++++++++++
 drivers/thermal/thermal_core.h  |    1 
 drivers/thermal/thermal_sysfs.c |   72 +++++++++++++++++++++++++++++++++++-----
 include/linux/thermal.h         |    1 
 4 files changed, 113 insertions(+), 8 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -1057,6 +1057,53 @@ static bool thermal_cooling_device_prese
 	return false;
 }
 
+void thermal_cooling_device_update(struct thermal_cooling_device *cdev)
+{
+	unsigned long state;
+
+	if (!cdev)
+		return;
+
+	/*
+	 * Hold thermal_list_lock throughout the update to prevent the device
+	 * from going away while being updated.
+	 */
+	mutex_lock(&thermal_list_lock);
+
+	if (!thermal_cooling_device_present(cdev))
+		goto unlock_list;
+
+	/*
+	 * Update under the cdev lock to prevent the state from being set beyond
+	 * the new limit concurrently.
+	 */
+	mutex_lock(&cdev->lock);
+
+	if (cdev->ops->get_max_state(cdev, &cdev->max_state))
+		goto unlock;
+
+	thermal_cooling_device_stats_reinit(cdev);
+
+	if (cdev->ops->get_cur_state(cdev, &state))
+		goto unlock;
+
+	if (state <= cdev->max_state)
+		goto update_stats;
+
+	if (cdev->ops->set_cur_state(cdev, state))
+		goto unlock;
+
+update_stats:
+	thermal_cooling_device_stats_update(cdev, state);
+
+unlock:
+	mutex_unlock(&cdev->lock);
+
+unlock_list:
+	mutex_unlock(&thermal_list_lock);
+}
+EXPORT_SYMBOL_GPL(thermal_cooling_device_update);
+
 static void __unbind(struct thermal_zone_device *tz, int mask,
 		     struct thermal_cooling_device *cdev)
 {
Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -685,6 +685,8 @@ void thermal_cooling_device_stats_update
 {
 	struct cooling_dev_stats *stats = cdev->stats;
 
+	lockdep_assert_held(&cdev->lock);
+
 	if (!stats)
 		return;
 
@@ -706,13 +708,22 @@ static ssize_t total_trans_show(struct d
 				struct device_attribute *attr, char *buf)
 {
 	struct thermal_cooling_device *cdev = to_cooling_device(dev);
-	struct cooling_dev_stats *stats = cdev->stats;
+	struct cooling_dev_stats *stats;
 	int ret;
 
+	mutex_lock(&cdev->lock);
+
+	stats = cdev->stats;
+	if (!stats)
+		goto unlock;
+
 	spin_lock(&stats->lock);
 	ret = sprintf(buf, "%u\n", stats->total_trans);
 	spin_unlock(&stats->lock);
 
+unlock:
+	mutex_unlock(&cdev->lock);
+
 	return ret;
 }
 
@@ -721,11 +732,18 @@ time_in_state_ms_show(struct device *dev
 		      char *buf)
 {
 	struct thermal_cooling_device *cdev = to_cooling_device(dev);
-	struct cooling_dev_stats *stats = cdev->stats;
+	struct cooling_dev_stats *stats;
 	ssize_t len = 0;
 	int i;
 
+	mutex_lock(&cdev->lock);
+
+	stats = cdev->stats;
+	if (!stats)
+		goto unlock;
+
 	spin_lock(&stats->lock);
+
 	update_time_in_state(stats);
 
 	for (i = 0; i <= cdev->max_state; i++) {
@@ -734,6 +752,9 @@ time_in_state_ms_show(struct device *dev
 	}
 	spin_unlock(&stats->lock);
 
+unlock:
+	mutex_unlock(&cdev->lock);
+
 	return len;
 }
 
@@ -742,8 +763,16 @@ reset_store(struct device *dev, struct d
 	    size_t count)
 {
 	struct thermal_cooling_device *cdev = to_cooling_device(dev);
-	struct cooling_dev_stats *stats = cdev->stats;
-	int i, states = cdev->max_state + 1;
+	struct cooling_dev_stats *stats;
+	int i, states;
+
+	mutex_lock(&cdev->lock);
+
+	stats = cdev->stats;
+	if (!stats)
+		goto unlock;
+
+	states = cdev->max_state + 1;
 
 	spin_lock(&stats->lock);
 
@@ -757,6 +786,9 @@ reset_store(struct device *dev, struct d
 
 	spin_unlock(&stats->lock);
 
+unlock:
+	mutex_unlock(&cdev->lock);
+
 	return count;
 }
 
@@ -764,10 +796,18 @@ static ssize_t trans_table_show(struct d
 				struct device_attribute *attr, char *buf)
 {
 	struct thermal_cooling_device *cdev = to_cooling_device(dev);
-	struct cooling_dev_stats *stats = cdev->stats;
+	struct cooling_dev_stats *stats;
 	ssize_t len = 0;
 	int i, j;
 
+	mutex_lock(&cdev->lock);
+
+	stats = cdev->stats;
+	if (!stats) {
+		len = -ENODATA;
+		goto unlock;
+	}
+
 	len += snprintf(buf + len, PAGE_SIZE - len, " From  :    To\n");
 	len += snprintf(buf + len, PAGE_SIZE - len, "       : ");
 	for (i = 0; i <= cdev->max_state; i++) {
@@ -775,8 +815,10 @@ static ssize_t trans_table_show(struct d
 			break;
 		len += snprintf(buf + len, PAGE_SIZE - len, "state%2u  ", i);
 	}
-	if (len >= PAGE_SIZE)
-		return PAGE_SIZE;
+	if (len >= PAGE_SIZE) {
+		len = PAGE_SIZE;
+		goto unlock;
+	}
 
 	len += snprintf(buf + len, PAGE_SIZE - len, "\n");
 
@@ -799,8 +841,12 @@ static ssize_t trans_table_show(struct d
 
 	if (len >= PAGE_SIZE) {
 		pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n");
-		return -EFBIG;
+		len = -EFBIG;
 	}
+
+unlock:
+	mutex_unlock(&cdev->lock);
+
 	return len;
 }
 
@@ -830,6 +876,8 @@ static void cooling_device_stats_setup(s
 	unsigned long states = cdev->max_state + 1;
 	int var;
 
+	lockdep_assert_held(&cdev->lock);
+
 	var = sizeof(*stats);
 	var += sizeof(*stats->time_in_state) * states;
 	var += sizeof(*stats->trans_table) * states * states;
@@ -855,6 +903,8 @@ out:
 
 static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
 {
+	lockdep_assert_held(&cdev->lock);
+
 	kfree(cdev->stats);
 	cdev->stats = NULL;
 }
@@ -879,6 +929,12 @@ void thermal_cooling_device_destroy_sysf
 	cooling_device_stats_destroy(cdev);
 }
 
+void thermal_cooling_device_stats_reinit(struct thermal_cooling_device *cdev)
+{
+	cooling_device_stats_destroy(cdev);
+	cooling_device_stats_setup(cdev);
+}
+
 /* these helper will be used only at the time of bindig */
 ssize_t
 trip_point_show(struct device *dev, struct device_attribute *attr, char *buf)
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -127,6 +127,7 @@ int thermal_zone_create_device_groups(st
 void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
 void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
 void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev);
+void thermal_cooling_device_stats_reinit(struct thermal_cooling_device *cdev);
 /* used only at binding time */
 ssize_t trip_point_show(struct device *, struct device_attribute *, char *);
 ssize_t weight_show(struct device *, struct device_attribute *, char *);
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -384,6 +384,7 @@ devm_thermal_of_cooling_device_register(
 				struct device_node *np,
 				char *type, void *devdata,
 				const struct thermal_cooling_device_ops *ops);
+void thermal_cooling_device_update(struct thermal_cooling_device *);
 void thermal_cooling_device_unregister(struct thermal_cooling_device *);
 struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name);
 int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp);




^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  2023-03-03 19:18 [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2023-03-03 19:23 ` [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update() Rafael J. Wysocki
@ 2023-03-03 19:23 ` Rafael J. Wysocki
  2023-03-07 16:43   ` Zhang, Rui
  3 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-03 19:23 UTC (permalink / raw)
  To: Linux PM
  Cc: Zhang Rui, Linux ACPI, LKML, Daniel Lezcano, Srinivas Pandruvada,
	Viresh Kumar, Quanxian Wang

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

When a cpufreq policy appears or goes away, the CPU cooling devices for
the CPUs covered by that policy need to be updated so that the new
processor_get_max_state() value is stored as max_state and the
statistics in sysfs are rearranged for each of them.

Do that accordingly in acpi_thermal_cpufreq_init() and
acpi_thermal_cpufreq_exit().

Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/processor_thermal.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/acpi/processor_thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/processor_thermal.c
+++ linux-pm/drivers/acpi/processor_thermal.c
@@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
 		ret = freq_qos_add_request(&policy->constraints,
 					   &pr->thermal_req,
 					   FREQ_QOS_MAX, INT_MAX);
-		if (ret < 0)
+		if (ret < 0) {
 			pr_err("Failed to add freq constraint for CPU%d (%d)\n",
 			       cpu, ret);
+			continue;
+		}
+
+		if (!IS_ERR(pr->cdev))
+			thermal_cooling_device_update(pr->cdev);
 	}
 }
 
@@ -153,8 +158,13 @@ void acpi_thermal_cpufreq_exit(struct cp
 	for_each_cpu(cpu, policy->related_cpus) {
 		struct acpi_processor *pr = per_cpu(processors, cpu);
 
-		if (pr)
-			freq_qos_remove_request(&pr->thermal_req);
+		if (!pr)
+			continue;
+
+		freq_qos_remove_request(&pr->thermal_req);
+
+		if (!IS_ERR(pr->cdev))
+			thermal_cooling_device_update(pr->cdev);
 	}
 }
 #else				/* ! CONFIG_CPU_FREQ */




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update()
  2023-03-03 19:23 ` [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update() Rafael J. Wysocki
@ 2023-03-07 16:40   ` Zhang, Rui
  2023-03-10 18:25     ` Rafael J. Wysocki
  2023-03-27 20:57   ` Imre Deak
  1 sibling, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-07 16:40 UTC (permalink / raw)
  To: linux-pm, rjw
  Cc: srinivas.pandruvada, viresh.kumar, Wang, Quanxian, linux-kernel,
	daniel.lezcano, linux-acpi

On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Introduce a core thermal API function,
> thermal_cooling_device_update(),
> for updating the max_state value for a cooling device and rearranging
> its statistics in sysfs after a possible change of its
> ->get_max_state()
> callback return value.
> 
> That callback is now invoked only once, during cooling device
> registration, to populate the max_state field in the cooling device
> object, so if its return value changes, it needs to be invoked again
> and the new return value needs to be stored as max_state.  Moreover,
> the statistics presented in sysfs need to be rearranged in general,
> because there may not be enough room in them to store data for all
> of the possible states (in the case when max_state grows).
> 
> The new function takes care of that (and some other minor things
> related to it), but some extra locking and lockdep annotations are
> added in several places too to protect against crashes in the cases
> when the statistics are not present or when a stale max_state value
> might be used by sysfs attributes.
> 
> Note that the actual user of the new function will be added
> separately.
> 
> Link: 
> https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/thermal/thermal_core.c  |   47 ++++++++++++++++++++++++++
>  drivers/thermal/thermal_core.h  |    1 
>  drivers/thermal/thermal_sysfs.c |   72
> +++++++++++++++++++++++++++++++++++-----
>  include/linux/thermal.h         |    1 
>  4 files changed, 113 insertions(+), 8 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -1057,6 +1057,53 @@ static bool thermal_cooling_device_prese
>  	return false;
>  }
>  
> +void thermal_cooling_device_update(struct thermal_cooling_device
> *cdev)
> +{
> +	unsigned long state;
> +
> +	if (!cdev)
> +		return;
> +
> +	/*
> +	 * Hold thermal_list_lock throughout the update to prevent the
> device
> +	 * from going away while being updated.
> +	 */
> +	mutex_lock(&thermal_list_lock);
> +
> +	if (!thermal_cooling_device_present(cdev))
> +		goto unlock_list;
> +
> +	/*
> +	 * Update under the cdev lock to prevent the state from being
> set beyond
> +	 * the new limit concurrently.
> +	 */
> +	mutex_lock(&cdev->lock);
> +
> +	if (cdev->ops->get_max_state(cdev, &cdev->max_state))
> +		goto unlock;
> +
> +	thermal_cooling_device_stats_reinit(cdev);
> +
> +	if (cdev->ops->get_cur_state(cdev, &state))
> +		goto unlock;
> +
> +	if (state <= cdev->max_state)
> +		goto update_stats;
> +
how could the .get_cur_state() callback returns a value higher than
.get_max_state()? Isn't this a driver problem?

> +	if (cdev->ops->set_cur_state(cdev, state))
> +		goto unlock;

even if we don't error out, should we reevaluate .get_max_state() and
update cdev->max_state?

thanks,
rui
> +
> +update_stats:
> +	thermal_cooling_device_stats_update(cdev, state);
> +
> +unlock:
> +	mutex_unlock(&cdev->lock);
> +
> +unlock_list:
> +	mutex_unlock(&thermal_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(thermal_cooling_device_update);
> +
>  static void __unbind(struct thermal_zone_device *tz, int mask,
>  		     struct thermal_cooling_device *cdev)
>  {
> Index: linux-pm/drivers/thermal/thermal_sysfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> +++ linux-pm/drivers/thermal/thermal_sysfs.c
> @@ -685,6 +685,8 @@ void thermal_cooling_device_stats_update
>  {
>  	struct cooling_dev_stats *stats = cdev->stats;
>  
> +	lockdep_assert_held(&cdev->lock);
> +
>  	if (!stats)
>  		return;
>  
> @@ -706,13 +708,22 @@ static ssize_t total_trans_show(struct d
>  				struct device_attribute *attr, char
> *buf)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	struct cooling_dev_stats *stats = cdev->stats;
> +	struct cooling_dev_stats *stats;
>  	int ret;
>  
> +	mutex_lock(&cdev->lock);
> +
> +	stats = cdev->stats;
> +	if (!stats)
> +		goto unlock;
> +
>  	spin_lock(&stats->lock);
>  	ret = sprintf(buf, "%u\n", stats->total_trans);
>  	spin_unlock(&stats->lock);
>  
> +unlock:
> +	mutex_unlock(&cdev->lock);
> +
>  	return ret;
>  }
>  
> @@ -721,11 +732,18 @@ time_in_state_ms_show(struct device *dev
>  		      char *buf)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	struct cooling_dev_stats *stats = cdev->stats;
> +	struct cooling_dev_stats *stats;
>  	ssize_t len = 0;
>  	int i;
>  
> +	mutex_lock(&cdev->lock);
> +
> +	stats = cdev->stats;
> +	if (!stats)
> +		goto unlock;
> +
>  	spin_lock(&stats->lock);
> +
>  	update_time_in_state(stats);
>  
>  	for (i = 0; i <= cdev->max_state; i++) {
> @@ -734,6 +752,9 @@ time_in_state_ms_show(struct device *dev
>  	}
>  	spin_unlock(&stats->lock);
>  
> +unlock:
> +	mutex_unlock(&cdev->lock);
> +
>  	return len;
>  }
>  
> @@ -742,8 +763,16 @@ reset_store(struct device *dev, struct d
>  	    size_t count)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	struct cooling_dev_stats *stats = cdev->stats;
> -	int i, states = cdev->max_state + 1;
> +	struct cooling_dev_stats *stats;
> +	int i, states;
> +
> +	mutex_lock(&cdev->lock);
> +
> +	stats = cdev->stats;
> +	if (!stats)
> +		goto unlock;
> +
> +	states = cdev->max_state + 1;
>  
>  	spin_lock(&stats->lock);
>  
> @@ -757,6 +786,9 @@ reset_store(struct device *dev, struct d
>  
>  	spin_unlock(&stats->lock);
>  
> +unlock:
> +	mutex_unlock(&cdev->lock);
> +
>  	return count;
>  }
>  
> @@ -764,10 +796,18 @@ static ssize_t trans_table_show(struct d
>  				struct device_attribute *attr, char
> *buf)
>  {
>  	struct thermal_cooling_device *cdev = to_cooling_device(dev);
> -	struct cooling_dev_stats *stats = cdev->stats;
> +	struct cooling_dev_stats *stats;
>  	ssize_t len = 0;
>  	int i, j;
>  
> +	mutex_lock(&cdev->lock);
> +
> +	stats = cdev->stats;
> +	if (!stats) {
> +		len = -ENODATA;
> +		goto unlock;
> +	}
> +
>  	len += snprintf(buf + len, PAGE_SIZE - len, "
> From  :    To\n");
>  	len += snprintf(buf + len, PAGE_SIZE - len, "       : ");
>  	for (i = 0; i <= cdev->max_state; i++) {
> @@ -775,8 +815,10 @@ static ssize_t trans_table_show(struct d
>  			break;
>  		len += snprintf(buf + len, PAGE_SIZE - len,
> "state%2u  ", i);
>  	}
> -	if (len >= PAGE_SIZE)
> -		return PAGE_SIZE;
> +	if (len >= PAGE_SIZE) {
> +		len = PAGE_SIZE;
> +		goto unlock;
> +	}
>  
>  	len += snprintf(buf + len, PAGE_SIZE - len, "\n");
>  
> @@ -799,8 +841,12 @@ static ssize_t trans_table_show(struct d
>  
>  	if (len >= PAGE_SIZE) {
>  		pr_warn_once("Thermal transition table exceeds
> PAGE_SIZE. Disabling\n");
> -		return -EFBIG;
> +		len = -EFBIG;
>  	}
> +
> +unlock:
> +	mutex_unlock(&cdev->lock);
> +
>  	return len;
>  }
>  
> @@ -830,6 +876,8 @@ static void cooling_device_stats_setup(s
>  	unsigned long states = cdev->max_state + 1;
>  	int var;
>  
> +	lockdep_assert_held(&cdev->lock);
> +
>  	var = sizeof(*stats);
>  	var += sizeof(*stats->time_in_state) * states;
>  	var += sizeof(*stats->trans_table) * states * states;
> @@ -855,6 +903,8 @@ out:
>  
>  static void cooling_device_stats_destroy(struct
> thermal_cooling_device *cdev)
>  {
> +	lockdep_assert_held(&cdev->lock);
> +
>  	kfree(cdev->stats);
>  	cdev->stats = NULL;
>  }
> @@ -879,6 +929,12 @@ void thermal_cooling_device_destroy_sysf
>  	cooling_device_stats_destroy(cdev);
>  }
>  
> +void thermal_cooling_device_stats_reinit(struct
> thermal_cooling_device *cdev)
> +{
> +	cooling_device_stats_destroy(cdev);
> +	cooling_device_stats_setup(cdev);
> +}
> +
>  /* these helper will be used only at the time of bindig */
>  ssize_t
>  trip_point_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -127,6 +127,7 @@ int thermal_zone_create_device_groups(st
>  void thermal_zone_destroy_device_groups(struct thermal_zone_device
> *);
>  void thermal_cooling_device_setup_sysfs(struct
> thermal_cooling_device *);
>  void thermal_cooling_device_destroy_sysfs(struct
> thermal_cooling_device *cdev);
> +void thermal_cooling_device_stats_reinit(struct
> thermal_cooling_device *cdev);
>  /* used only at binding time */
>  ssize_t trip_point_show(struct device *, struct device_attribute *,
> char *);
>  ssize_t weight_show(struct device *, struct device_attribute *, char
> *);
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -384,6 +384,7 @@ devm_thermal_of_cooling_device_register(
>  				struct device_node *np,
>  				char *type, void *devdata,
>  				const struct thermal_cooling_device_ops
> *ops);
> +void thermal_cooling_device_update(struct thermal_cooling_device *);
>  void thermal_cooling_device_unregister(struct thermal_cooling_device
> *);
>  struct thermal_zone_device *thermal_zone_get_zone_by_name(const char
> *name);
>  int thermal_zone_get_temp(struct thermal_zone_device *tz, int
> *temp);
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  2023-03-03 19:23 ` [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes Rafael J. Wysocki
@ 2023-03-07 16:43   ` Zhang, Rui
  2023-03-10 18:29     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-07 16:43 UTC (permalink / raw)
  To: linux-pm, rjw
  Cc: srinivas.pandruvada, viresh.kumar, Wang, Quanxian, linux-kernel,
	daniel.lezcano, linux-acpi

On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> When a cpufreq policy appears or goes away, the CPU cooling devices
> for
> the CPUs covered by that policy need to be updated so that the new
> processor_get_max_state() value is stored as max_state and the
> statistics in sysfs are rearranged for each of them.
> 
> Do that accordingly in acpi_thermal_cpufreq_init() and
> acpi_thermal_cpufreq_exit().
> 
> Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> Link: 
> https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/drivers/acpi/processor_thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/processor_thermal.c
> +++ linux-pm/drivers/acpi/processor_thermal.c
> @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
>  		ret = freq_qos_add_request(&policy->constraints,
>  					   &pr->thermal_req,
>  					   FREQ_QOS_MAX, INT_MAX);
> -		if (ret < 0)
> +		if (ret < 0) {
>  			pr_err("Failed to add freq constraint for CPU%d
> (%d)\n",
>  			       cpu, ret);
> +			continue;
> +		}
> +
> +		if (!IS_ERR(pr->cdev))
> +			thermal_cooling_device_update(pr->cdev);

Although thermal_cooling_device_update() handles "pr->cdev == NULL"
case, I think it is better to use !IS_ERR_OR_NULL() here.

thanks,
rui

>  	}
>  }
>  
> @@ -153,8 +158,13 @@ void acpi_thermal_cpufreq_exit(struct cp
>  	for_each_cpu(cpu, policy->related_cpus) {
>  		struct acpi_processor *pr = per_cpu(processors, cpu);
>  
> -		if (pr)
> -			freq_qos_remove_request(&pr->thermal_req);
> +		if (!pr)
> +			continue;
> +
> +		freq_qos_remove_request(&pr->thermal_req);
> +
> +		if (!IS_ERR(pr->cdev))
> +			thermal_cooling_device_update(pr->cdev);
>  	}
>  }
>  #else				/* ! CONFIG_CPU_FREQ */
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
@ 2023-03-07 16:51   ` Zhang, Rui
  2023-03-10 18:31     ` Rafael J. Wysocki
  2023-03-12 16:08   ` Zhang, Rui
  1 sibling, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-07 16:51 UTC (permalink / raw)
  To: linux-pm, rjw
  Cc: srinivas.pandruvada, viresh.kumar, Wang, Quanxian, linux-kernel,
	daniel.lezcano, linux-acpi

On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The cpufreq policy notifier in the ACPI processor driver may as
> well be registered before the driver itself, which causes
> acpi_processor_cpufreq_init to be true (unless the notifier
> registration fails, which is unlikely at that point) when the
> ACPI CPU thermal cooling devices are registered, so the
> processor_get_max_state() result does not change while
> acpi_processor_driver_init() is running.
> 
> Change the ordering in acpi_processor_driver_init() accordingly
> to prevent the max_state value from remaining 0 permanently for all
> ACPI CPU cooling devices.
> 
> Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> Link: 
> https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/acpi/processor_driver.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/acpi/processor_driver.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/processor_driver.c
> +++ linux-pm/drivers/acpi/processor_driver.c
> @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
>  	if (acpi_disabled)
>  		return 0;
>  
> +	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> +				       CPUFREQ_POLICY_NOTIFIER)) {
> +		acpi_processor_cpufreq_init = true;
> +		acpi_processor_ignore_ppc_init();
> +	}
> +
>  	result = driver_register(&acpi_processor_driver);
>  	if (result < 0)
>  		return result;
> @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
>  	cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-
> drv:dead",
>  				  NULL, acpi_soft_cpu_dead);
>  
> -	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> -				       CPUFREQ_POLICY_NOTIFIER)) {
> -		acpi_processor_cpufreq_init = true;
> -		acpi_processor_ignore_ppc_init();
> -	}
> -
>  	acpi_processor_throttling_init();
>  	return 0;
>  err:
> 
Just FYI.
I need some time to ramp up on the ordering here to double confirm this
does not break any dependency, too many things are involved here :p.

I will test the whole patch series later this week.

thanks,
rui

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update()
  2023-03-07 16:40   ` Zhang, Rui
@ 2023-03-10 18:25     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-10 18:25 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: linux-pm, rjw, srinivas.pandruvada, viresh.kumar, Wang, Quanxian,
	linux-kernel, daniel.lezcano, linux-acpi

On Tue, Mar 7, 2023 at 5:44 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Introduce a core thermal API function,
> > thermal_cooling_device_update(),
> > for updating the max_state value for a cooling device and rearranging
> > its statistics in sysfs after a possible change of its
> > ->get_max_state()
> > callback return value.
> >
> > That callback is now invoked only once, during cooling device
> > registration, to populate the max_state field in the cooling device
> > object, so if its return value changes, it needs to be invoked again
> > and the new return value needs to be stored as max_state.  Moreover,
> > the statistics presented in sysfs need to be rearranged in general,
> > because there may not be enough room in them to store data for all
> > of the possible states (in the case when max_state grows).
> >
> > The new function takes care of that (and some other minor things
> > related to it), but some extra locking and lockdep annotations are
> > added in several places too to protect against crashes in the cases
> > when the statistics are not present or when a stale max_state value
> > might be used by sysfs attributes.
> >
> > Note that the actual user of the new function will be added
> > separately.
> >
> > Link:
> > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/thermal/thermal_core.c  |   47 ++++++++++++++++++++++++++
> >  drivers/thermal/thermal_core.h  |    1
> >  drivers/thermal/thermal_sysfs.c |   72
> > +++++++++++++++++++++++++++++++++++-----
> >  include/linux/thermal.h         |    1
> >  4 files changed, 113 insertions(+), 8 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -1057,6 +1057,53 @@ static bool thermal_cooling_device_prese
> >       return false;
> >  }
> >
> > +void thermal_cooling_device_update(struct thermal_cooling_device
> > *cdev)
> > +{
> > +     unsigned long state;
> > +
> > +     if (!cdev)
> > +             return;
> > +
> > +     /*
> > +      * Hold thermal_list_lock throughout the update to prevent the
> > device
> > +      * from going away while being updated.
> > +      */
> > +     mutex_lock(&thermal_list_lock);
> > +
> > +     if (!thermal_cooling_device_present(cdev))
> > +             goto unlock_list;
> > +
> > +     /*
> > +      * Update under the cdev lock to prevent the state from being
> > set beyond
> > +      * the new limit concurrently.
> > +      */
> > +     mutex_lock(&cdev->lock);
> > +
> > +     if (cdev->ops->get_max_state(cdev, &cdev->max_state))
> > +             goto unlock;
> > +
> > +     thermal_cooling_device_stats_reinit(cdev);
> > +
> > +     if (cdev->ops->get_cur_state(cdev, &state))
> > +             goto unlock;
> > +
> > +     if (state <= cdev->max_state)
> > +             goto update_stats;
> > +
> how could the .get_cur_state() callback returns a value higher than
> .get_max_state()? Isn't this a driver problem?

It is a driver problem, but the check needs to be done here in case
the driver has this problem, because
thermal_cooling_device_stats_update() below might write out of array
bounds otherwise.

> > +     if (cdev->ops->set_cur_state(cdev, state))
> > +             goto unlock;
>
> even if we don't error out, should we reevaluate .get_max_state() and
> update cdev->max_state?

This has just been done and in fact it is not necessary to call the
driver to set the state to the value that has just been returned by
it.  Actually, this particular piece is broken, because passing a
value above max_state to the "set" callback here doesn't make sense.
I'll fix this in v2.

> > +
> > +update_stats:
> > +     thermal_cooling_device_stats_update(cdev, state);
> > +
> > +unlock:
> > +     mutex_unlock(&cdev->lock);
> > +
> > +unlock_list:
> > +     mutex_unlock(&thermal_list_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(thermal_cooling_device_update);
> > +
> >  static void __unbind(struct thermal_zone_device *tz, int mask,
> >                    struct thermal_cooling_device *cdev)
> >  {
> > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > @@ -685,6 +685,8 @@ void thermal_cooling_device_stats_update
> >  {
> >       struct cooling_dev_stats *stats = cdev->stats;
> >
> > +     lockdep_assert_held(&cdev->lock);
> > +
> >       if (!stats)
> >               return;
> >
> > @@ -706,13 +708,22 @@ static ssize_t total_trans_show(struct d
> >                               struct device_attribute *attr, char
> > *buf)
> >  {
> >       struct thermal_cooling_device *cdev = to_cooling_device(dev);
> > -     struct cooling_dev_stats *stats = cdev->stats;
> > +     struct cooling_dev_stats *stats;
> >       int ret;
> >
> > +     mutex_lock(&cdev->lock);
> > +
> > +     stats = cdev->stats;
> > +     if (!stats)
> > +             goto unlock;
> > +
> >       spin_lock(&stats->lock);
> >       ret = sprintf(buf, "%u\n", stats->total_trans);
> >       spin_unlock(&stats->lock);
> >
> > +unlock:
> > +     mutex_unlock(&cdev->lock);
> > +
> >       return ret;
> >  }
> >
> > @@ -721,11 +732,18 @@ time_in_state_ms_show(struct device *dev
> >                     char *buf)
> >  {
> >       struct thermal_cooling_device *cdev = to_cooling_device(dev);
> > -     struct cooling_dev_stats *stats = cdev->stats;
> > +     struct cooling_dev_stats *stats;
> >       ssize_t len = 0;
> >       int i;
> >
> > +     mutex_lock(&cdev->lock);
> > +
> > +     stats = cdev->stats;
> > +     if (!stats)
> > +             goto unlock;
> > +
> >       spin_lock(&stats->lock);
> > +
> >       update_time_in_state(stats);
> >
> >       for (i = 0; i <= cdev->max_state; i++) {
> > @@ -734,6 +752,9 @@ time_in_state_ms_show(struct device *dev
> >       }
> >       spin_unlock(&stats->lock);
> >
> > +unlock:
> > +     mutex_unlock(&cdev->lock);
> > +
> >       return len;
> >  }
> >
> > @@ -742,8 +763,16 @@ reset_store(struct device *dev, struct d
> >           size_t count)
> >  {
> >       struct thermal_cooling_device *cdev = to_cooling_device(dev);
> > -     struct cooling_dev_stats *stats = cdev->stats;
> > -     int i, states = cdev->max_state + 1;
> > +     struct cooling_dev_stats *stats;
> > +     int i, states;
> > +
> > +     mutex_lock(&cdev->lock);
> > +
> > +     stats = cdev->stats;
> > +     if (!stats)
> > +             goto unlock;
> > +
> > +     states = cdev->max_state + 1;
> >
> >       spin_lock(&stats->lock);
> >
> > @@ -757,6 +786,9 @@ reset_store(struct device *dev, struct d
> >
> >       spin_unlock(&stats->lock);
> >
> > +unlock:
> > +     mutex_unlock(&cdev->lock);
> > +
> >       return count;
> >  }
> >
> > @@ -764,10 +796,18 @@ static ssize_t trans_table_show(struct d
> >                               struct device_attribute *attr, char
> > *buf)
> >  {
> >       struct thermal_cooling_device *cdev = to_cooling_device(dev);
> > -     struct cooling_dev_stats *stats = cdev->stats;
> > +     struct cooling_dev_stats *stats;
> >       ssize_t len = 0;
> >       int i, j;
> >
> > +     mutex_lock(&cdev->lock);
> > +
> > +     stats = cdev->stats;
> > +     if (!stats) {
> > +             len = -ENODATA;
> > +             goto unlock;
> > +     }
> > +
> >       len += snprintf(buf + len, PAGE_SIZE - len, "
> > From  :    To\n");
> >       len += snprintf(buf + len, PAGE_SIZE - len, "       : ");
> >       for (i = 0; i <= cdev->max_state; i++) {
> > @@ -775,8 +815,10 @@ static ssize_t trans_table_show(struct d
> >                       break;
> >               len += snprintf(buf + len, PAGE_SIZE - len,
> > "state%2u  ", i);
> >       }
> > -     if (len >= PAGE_SIZE)
> > -             return PAGE_SIZE;
> > +     if (len >= PAGE_SIZE) {
> > +             len = PAGE_SIZE;
> > +             goto unlock;
> > +     }
> >
> >       len += snprintf(buf + len, PAGE_SIZE - len, "\n");
> >
> > @@ -799,8 +841,12 @@ static ssize_t trans_table_show(struct d
> >
> >       if (len >= PAGE_SIZE) {
> >               pr_warn_once("Thermal transition table exceeds
> > PAGE_SIZE. Disabling\n");
> > -             return -EFBIG;
> > +             len = -EFBIG;
> >       }
> > +
> > +unlock:
> > +     mutex_unlock(&cdev->lock);
> > +
> >       return len;
> >  }
> >
> > @@ -830,6 +876,8 @@ static void cooling_device_stats_setup(s
> >       unsigned long states = cdev->max_state + 1;
> >       int var;
> >
> > +     lockdep_assert_held(&cdev->lock);
> > +
> >       var = sizeof(*stats);
> >       var += sizeof(*stats->time_in_state) * states;
> >       var += sizeof(*stats->trans_table) * states * states;
> > @@ -855,6 +903,8 @@ out:
> >
> >  static void cooling_device_stats_destroy(struct
> > thermal_cooling_device *cdev)
> >  {
> > +     lockdep_assert_held(&cdev->lock);
> > +
> >       kfree(cdev->stats);
> >       cdev->stats = NULL;
> >  }
> > @@ -879,6 +929,12 @@ void thermal_cooling_device_destroy_sysf
> >       cooling_device_stats_destroy(cdev);
> >  }
> >
> > +void thermal_cooling_device_stats_reinit(struct
> > thermal_cooling_device *cdev)
> > +{
> > +     cooling_device_stats_destroy(cdev);
> > +     cooling_device_stats_setup(cdev);
> > +}
> > +
> >  /* these helper will be used only at the time of bindig */
> >  ssize_t
> >  trip_point_show(struct device *dev, struct device_attribute *attr,
> > char *buf)
> > Index: linux-pm/drivers/thermal/thermal_core.h
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.h
> > +++ linux-pm/drivers/thermal/thermal_core.h
> > @@ -127,6 +127,7 @@ int thermal_zone_create_device_groups(st
> >  void thermal_zone_destroy_device_groups(struct thermal_zone_device
> > *);
> >  void thermal_cooling_device_setup_sysfs(struct
> > thermal_cooling_device *);
> >  void thermal_cooling_device_destroy_sysfs(struct
> > thermal_cooling_device *cdev);
> > +void thermal_cooling_device_stats_reinit(struct
> > thermal_cooling_device *cdev);
> >  /* used only at binding time */
> >  ssize_t trip_point_show(struct device *, struct device_attribute *,
> > char *);
> >  ssize_t weight_show(struct device *, struct device_attribute *, char
> > *);
> > Index: linux-pm/include/linux/thermal.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/thermal.h
> > +++ linux-pm/include/linux/thermal.h
> > @@ -384,6 +384,7 @@ devm_thermal_of_cooling_device_register(
> >                               struct device_node *np,
> >                               char *type, void *devdata,
> >                               const struct thermal_cooling_device_ops
> > *ops);
> > +void thermal_cooling_device_update(struct thermal_cooling_device *);
> >  void thermal_cooling_device_unregister(struct thermal_cooling_device
> > *);
> >  struct thermal_zone_device *thermal_zone_get_zone_by_name(const char
> > *name);
> >  int thermal_zone_get_temp(struct thermal_zone_device *tz, int
> > *temp);
> >
> >
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  2023-03-07 16:43   ` Zhang, Rui
@ 2023-03-10 18:29     ` Rafael J. Wysocki
  2023-03-12 14:44       ` Zhang, Rui
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-10 18:29 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: linux-pm, rjw, srinivas.pandruvada, viresh.kumar, Wang, Quanxian,
	linux-kernel, daniel.lezcano, linux-acpi

On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > When a cpufreq policy appears or goes away, the CPU cooling devices
> > for
> > the CPUs covered by that policy need to be updated so that the new
> > processor_get_max_state() value is stored as max_state and the
> > statistics in sysfs are rearranged for each of them.
> >
> > Do that accordingly in acpi_thermal_cpufreq_init() and
> > acpi_thermal_cpufreq_exit().
> >
> > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > Link:
> > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/processor_thermal.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > +++ linux-pm/drivers/acpi/processor_thermal.c
> > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> >               ret = freq_qos_add_request(&policy->constraints,
> >                                          &pr->thermal_req,
> >                                          FREQ_QOS_MAX, INT_MAX);
> > -             if (ret < 0)
> > +             if (ret < 0) {
> >                       pr_err("Failed to add freq constraint for CPU%d
> > (%d)\n",
> >                              cpu, ret);
> > +                     continue;
> > +             }
> > +
> > +             if (!IS_ERR(pr->cdev))
> > +                     thermal_cooling_device_update(pr->cdev);
>
> Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> case, I think it is better to use !IS_ERR_OR_NULL() here.

Why is it?

I was thinking about doing that, but then I realized that the NULL
case had been covered and that's why I went for the change above.  If
there is a particular reason to check for NULL here, I can do that,
but I'm not sure what it is.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-07 16:51   ` Zhang, Rui
@ 2023-03-10 18:31     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-10 18:31 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: linux-pm, rjw, srinivas.pandruvada, viresh.kumar, Wang, Quanxian,
	linux-kernel, daniel.lezcano, linux-acpi

On Tue, Mar 7, 2023 at 5:55 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The cpufreq policy notifier in the ACPI processor driver may as
> > well be registered before the driver itself, which causes
> > acpi_processor_cpufreq_init to be true (unless the notifier
> > registration fails, which is unlikely at that point) when the
> > ACPI CPU thermal cooling devices are registered, so the
> > processor_get_max_state() result does not change while
> > acpi_processor_driver_init() is running.
> >
> > Change the ordering in acpi_processor_driver_init() accordingly
> > to prevent the max_state value from remaining 0 permanently for all
> > ACPI CPU cooling devices.
> >
> > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > Link:
> > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/acpi/processor_driver.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/processor_driver.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/processor_driver.c
> > +++ linux-pm/drivers/acpi/processor_driver.c
> > @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
> >       if (acpi_disabled)
> >               return 0;
> >
> > +     if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > +                                    CPUFREQ_POLICY_NOTIFIER)) {
> > +             acpi_processor_cpufreq_init = true;
> > +             acpi_processor_ignore_ppc_init();
> > +     }
> > +
> >       result = driver_register(&acpi_processor_driver);
> >       if (result < 0)
> >               return result;
> > @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
> >       cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-
> > drv:dead",
> >                                 NULL, acpi_soft_cpu_dead);
> >
> > -     if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > -                                    CPUFREQ_POLICY_NOTIFIER)) {
> > -             acpi_processor_cpufreq_init = true;
> > -             acpi_processor_ignore_ppc_init();
> > -     }
> > -
> >       acpi_processor_throttling_init();
> >       return 0;
> >  err:
> >
> Just FYI.
> I need some time to ramp up on the ordering here to double confirm this
> does not break any dependency, too many things are involved here :p.

Unless I've overlooked something tricky, it should be fine, but of
course verifying this independently won't hurt.

> I will test the whole patch series later this week.

Thank you!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  2023-03-10 18:29     ` Rafael J. Wysocki
@ 2023-03-12 14:44       ` Zhang, Rui
  2023-03-13 13:50         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-12 14:44 UTC (permalink / raw)
  To: rafael
  Cc: viresh.kumar, daniel.lezcano, Wang, Quanxian, rjw,
	srinivas.pandruvada, linux-kernel, linux-acpi, linux-pm

On Fri, 2023-03-10 at 19:29 +0100, Rafael J. Wysocki wrote:
> On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com>
> wrote:
> > On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > When a cpufreq policy appears or goes away, the CPU cooling
> > > devices
> > > for
> > > the CPUs covered by that policy need to be updated so that the
> > > new
> > > processor_get_max_state() value is stored as max_state and the
> > > statistics in sysfs are rearranged for each of them.
> > > 
> > > Do that accordingly in acpi_thermal_cpufreq_init() and
> > > acpi_thermal_cpufreq_exit().
> > > 
> > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > Link:
> > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux-pm/drivers/acpi/processor_thermal.c
> > > =================================================================
> > > ==
> > > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > > +++ linux-pm/drivers/acpi/processor_thermal.c
> > > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> > >               ret = freq_qos_add_request(&policy->constraints,
> > >                                          &pr->thermal_req,
> > >                                          FREQ_QOS_MAX, INT_MAX);
> > > -             if (ret < 0)
> > > +             if (ret < 0) {
> > >                       pr_err("Failed to add freq constraint for
> > > CPU%d
> > > (%d)\n",
> > >                              cpu, ret);
> > > +                     continue;
> > > +             }
> > > +
> > > +             if (!IS_ERR(pr->cdev))
> > > +                     thermal_cooling_device_update(pr->cdev);
> > 
> > Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> > case, I think it is better to use !IS_ERR_OR_NULL() here.
> 
> Why is it?
> 
> I was thinking about doing that, but then I realized that the NULL
> case had been covered and that's why I went for the change above.  If
> there is a particular reason to check for NULL here, I can do that,
> but I'm not sure what it is.

I don't have a strong objection here.

I thought this was a code bug at first glance, until I double checked t
hermal_cooling_device_update().

So I think the latter would be more straight forward without
introducing code complexity.

thanks,
rui


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
  2023-03-07 16:51   ` Zhang, Rui
@ 2023-03-12 16:08   ` Zhang, Rui
  2023-03-13 13:48     ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-12 16:08 UTC (permalink / raw)
  To: linux-pm, rjw
  Cc: srinivas.pandruvada, viresh.kumar, Wang, Quanxian, linux-kernel,
	daniel.lezcano, linux-acpi

On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The cpufreq policy notifier in the ACPI processor driver may as
> well be registered before the driver itself, which causes
> acpi_processor_cpufreq_init to be true (unless the notifier
> registration fails, which is unlikely at that point) when the
> ACPI CPU thermal cooling devices are registered, so the
> processor_get_max_state() result does not change while
> acpi_processor_driver_init() is running.
> 
> Change the ordering in acpi_processor_driver_init() accordingly
> to prevent the max_state value from remaining 0 permanently for all
> ACPI CPU cooling devices.
> 
> Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> Link: 
> https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The full patch series fixes the problem but this one does not.

This is because,

static int cpu_has_cpufreq(unsigned int cpu)
{
        struct
cpufreq_policy *policy;

        if (!acpi_processor_cpufreq_init)
                return 0;

        policy = cpufreq_cpu_get(cpu);
        if (policy) {
                cpufreq_cpu_put(policy);
                return 1;
        }
        return 0;
}

Although acpi_processor_cpufreq_init is set to true with patch 1/4, but
we don't have cpufreq driver registered, thus cpufreq_cpu_get() return
NULL.

so acpi_processor_cpufreq_init is not the only dependency here. :(

thanks,
rui

> ---
>  drivers/acpi/processor_driver.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/acpi/processor_driver.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/processor_driver.c
> +++ linux-pm/drivers/acpi/processor_driver.c
> @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
>  	if (acpi_disabled)
>  		return 0;
>  
> +	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> +				       CPUFREQ_POLICY_NOTIFIER)) {
> +		acpi_processor_cpufreq_init = true;
> +		acpi_processor_ignore_ppc_init();
> +	}
> +
>  	result = driver_register(&acpi_processor_driver);
>  	if (result < 0)
>  		return result;
> @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
>  	cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-
> drv:dead",
>  				  NULL, acpi_soft_cpu_dead);
>  
> -	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> -				       CPUFREQ_POLICY_NOTIFIER)) {
> -		acpi_processor_cpufreq_init = true;
> -		acpi_processor_ignore_ppc_init();
> -	}
> -
>  	acpi_processor_throttling_init();
>  	return 0;
>  err:
> 
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-12 16:08   ` Zhang, Rui
@ 2023-03-13 13:48     ` Rafael J. Wysocki
  2023-03-13 14:54       ` Zhang, Rui
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-13 13:48 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: linux-pm, rjw, srinivas.pandruvada, viresh.kumar, Wang, Quanxian,
	linux-kernel, daniel.lezcano, linux-acpi

On Sun, Mar 12, 2023 at 5:09 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The cpufreq policy notifier in the ACPI processor driver may as
> > well be registered before the driver itself, which causes
> > acpi_processor_cpufreq_init to be true (unless the notifier
> > registration fails, which is unlikely at that point) when the
> > ACPI CPU thermal cooling devices are registered, so the
> > processor_get_max_state() result does not change while
> > acpi_processor_driver_init() is running.
> >
> > Change the ordering in acpi_processor_driver_init() accordingly
> > to prevent the max_state value from remaining 0 permanently for all
> > ACPI CPU cooling devices.
> >
> > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > Link:
> > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The full patch series fixes the problem but this one does not.

That is a correct observation, but the $subject patch fixes part of
the problem (which is not addressed by the rest of the series AFAICS)
and so it deserves a Fixes tag of its own IMO.

I guess I should clarify that in the changelog.

> This is because,
>
> static int cpu_has_cpufreq(unsigned int cpu)
> {
>         struct
> cpufreq_policy *policy;
>
>         if (!acpi_processor_cpufreq_init)
>                 return 0;
>
>         policy = cpufreq_cpu_get(cpu);
>         if (policy) {
>                 cpufreq_cpu_put(policy);
>                 return 1;
>         }
>         return 0;
> }
>
> Although acpi_processor_cpufreq_init is set to true with patch 1/4, but
> we don't have cpufreq driver registered, thus cpufreq_cpu_get() return
> NULL.
> so acpi_processor_cpufreq_init is not the only dependency here. :(

Right.  That's why the other patches in the series are needed too.

> > ---
> >  drivers/acpi/processor_driver.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/processor_driver.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/processor_driver.c
> > +++ linux-pm/drivers/acpi/processor_driver.c
> > @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
> >       if (acpi_disabled)
> >               return 0;
> >
> > +     if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > +                                    CPUFREQ_POLICY_NOTIFIER)) {
> > +             acpi_processor_cpufreq_init = true;
> > +             acpi_processor_ignore_ppc_init();
> > +     }
> > +
> >       result = driver_register(&acpi_processor_driver);
> >       if (result < 0)
> >               return result;
> > @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
> >       cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-
> > drv:dead",
> >                                 NULL, acpi_soft_cpu_dead);
> >
> > -     if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > -                                    CPUFREQ_POLICY_NOTIFIER)) {
> > -             acpi_processor_cpufreq_init = true;
> > -             acpi_processor_ignore_ppc_init();
> > -     }
> > -
> >       acpi_processor_throttling_init();
> >       return 0;
> >  err:
> >
> >
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes
  2023-03-12 14:44       ` Zhang, Rui
@ 2023-03-13 13:50         ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-13 13:50 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: rafael, viresh.kumar, daniel.lezcano, Wang, Quanxian, rjw,
	srinivas.pandruvada, linux-kernel, linux-acpi, linux-pm

On Sun, Mar 12, 2023 at 3:44 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-10 at 19:29 +0100, Rafael J. Wysocki wrote:
> > On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com>
> > wrote:
> > > On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > When a cpufreq policy appears or goes away, the CPU cooling
> > > > devices
> > > > for
> > > > the CPUs covered by that policy need to be updated so that the
> > > > new
> > > > processor_get_max_state() value is stored as max_state and the
> > > > statistics in sysfs are rearranged for each of them.
> > > >
> > > > Do that accordingly in acpi_thermal_cpufreq_init() and
> > > > acpi_thermal_cpufreq_exit().
> > > >
> > > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > > Link:
> > > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/acpi/processor_thermal.c
> > > > =================================================================
> > > > ==
> > > > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > > > +++ linux-pm/drivers/acpi/processor_thermal.c
> > > > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> > > >               ret = freq_qos_add_request(&policy->constraints,
> > > >                                          &pr->thermal_req,
> > > >                                          FREQ_QOS_MAX, INT_MAX);
> > > > -             if (ret < 0)
> > > > +             if (ret < 0) {
> > > >                       pr_err("Failed to add freq constraint for
> > > > CPU%d
> > > > (%d)\n",
> > > >                              cpu, ret);
> > > > +                     continue;
> > > > +             }
> > > > +
> > > > +             if (!IS_ERR(pr->cdev))
> > > > +                     thermal_cooling_device_update(pr->cdev);
> > >
> > > Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> > > case, I think it is better to use !IS_ERR_OR_NULL() here.
> >
> > Why is it?
> >
> > I was thinking about doing that, but then I realized that the NULL
> > case had been covered and that's why I went for the change above.  If
> > there is a particular reason to check for NULL here, I can do that,
> > but I'm not sure what it is.
>
> I don't have a strong objection here.
>
> I thought this was a code bug at first glance, until I double checked t
> hermal_cooling_device_update().
>
> So I think the latter would be more straight forward without
> introducing code complexity.

Alternatively, thermal_cooling_device_update() can be made to do the
full check instead.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-13 13:48     ` Rafael J. Wysocki
@ 2023-03-13 14:54       ` Zhang, Rui
  2023-03-13 14:58         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Zhang, Rui @ 2023-03-13 14:54 UTC (permalink / raw)
  To: rafael
  Cc: viresh.kumar, daniel.lezcano, Wang, Quanxian, rjw,
	srinivas.pandruvada, linux-kernel, linux-acpi, linux-pm

On Mon, 2023-03-13 at 14:48 +0100, Rafael J. Wysocki wrote:
> On Sun, Mar 12, 2023 at 5:09 PM Zhang, Rui <rui.zhang@intel.com>
> wrote:
> > On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > The cpufreq policy notifier in the ACPI processor driver may as
> > > well be registered before the driver itself, which causes
> > > acpi_processor_cpufreq_init to be true (unless the notifier
> > > registration fails, which is unlikely at that point) when the
> > > ACPI CPU thermal cooling devices are registered, so the
> > > processor_get_max_state() result does not change while
> > > acpi_processor_driver_init() is running.
> > > 
> > > Change the ordering in acpi_processor_driver_init() accordingly
> > > to prevent the max_state value from remaining 0 permanently for
> > > all
> > > ACPI CPU cooling devices.
> > > 
> > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > Link:
> > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The full patch series fixes the problem but this one does not.
> 
> That is a correct observation, but the $subject patch fixes part of
> the problem (which is not addressed by the rest of the series AFAICS)
> and so it deserves a Fixes tag of its own IMO.

Am I understanding this correctly that this patch helps in below case?

cpufreq driver like intel_pstate is registered before we register the
notifier callback in processor_driver. In this case, we are not able to
catch the CPUFREQ_CREATE_POLICY notification and cpufreq should be
counted as part of cooling states when registering the ACPI CPU cooling
device. (acpi_processor_cpufreq_init must be set at this time)

Or else, in normal case, the ACPI CPU cdev->max_state always return 0
(when t-state not available) until we receive the CPUFREQ_CREATE_POLICY
notification and call thermal_cooling_device_update(), both with and
without this patch.

Patch 2,3,4 works on my test platform, without applying patch 1/4.

thanks,
rui

> 
> I guess I should clarify that in the changelog.
> 
> > This is because,
> > 
> > static int cpu_has_cpufreq(unsigned int cpu)
> > {
> >         struct
> > cpufreq_policy *policy;
> > 
> >         if (!acpi_processor_cpufreq_init)
> >                 return 0;
> > 
> >         policy = cpufreq_cpu_get(cpu);
> >         if (policy) {
> >                 cpufreq_cpu_put(policy);
> >                 return 1;
> >         }
> >         return 0;
> > }
> > 
> > Although acpi_processor_cpufreq_init is set to true with patch 1/4,
> > but
> > we don't have cpufreq driver registered, thus cpufreq_cpu_get()
> > return
> > NULL.
> > so acpi_processor_cpufreq_init is not the only dependency here. :(
> 
> Right.  That's why the other patches in the series are needed too.
> 
> > > ---
> > >  drivers/acpi/processor_driver.c |   12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > Index: linux-pm/drivers/acpi/processor_driver.c
> > > =================================================================
> > > ==
> > > --- linux-pm.orig/drivers/acpi/processor_driver.c
> > > +++ linux-pm/drivers/acpi/processor_driver.c
> > > @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
> > >       if (acpi_disabled)
> > >               return 0;
> > > 
> > > +     if
> > > (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > > +                                    CPUFREQ_POLICY_NOTIFIER)) {
> > > +             acpi_processor_cpufreq_init = true;
> > > +             acpi_processor_ignore_ppc_init();
> > > +     }
> > > +
> > >       result = driver_register(&acpi_processor_driver);
> > >       if (result < 0)
> > >               return result;
> > > @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
> > >       cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD,
> > > "acpi/cpu-
> > > drv:dead",
> > >                                 NULL, acpi_soft_cpu_dead);
> > > 
> > > -     if
> > > (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > > -                                    CPUFREQ_POLICY_NOTIFIER)) {
> > > -             acpi_processor_cpufreq_init = true;
> > > -             acpi_processor_ignore_ppc_init();
> > > -     }
> > > -
> > >       acpi_processor_throttling_init();
> > >       return 0;
> > >  err:
> > > 
> > > 
> > > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init()
  2023-03-13 14:54       ` Zhang, Rui
@ 2023-03-13 14:58         ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-13 14:58 UTC (permalink / raw)
  To: Zhang, Rui
  Cc: rafael, viresh.kumar, daniel.lezcano, Wang, Quanxian, rjw,
	srinivas.pandruvada, linux-kernel, linux-acpi, linux-pm

On Mon, Mar 13, 2023 at 3:54 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Mon, 2023-03-13 at 14:48 +0100, Rafael J. Wysocki wrote:
> > On Sun, Mar 12, 2023 at 5:09 PM Zhang, Rui <rui.zhang@intel.com>
> > wrote:
> > > On Fri, 2023-03-03 at 20:19 +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > The cpufreq policy notifier in the ACPI processor driver may as
> > > > well be registered before the driver itself, which causes
> > > > acpi_processor_cpufreq_init to be true (unless the notifier
> > > > registration fails, which is unlikely at that point) when the
> > > > ACPI CPU thermal cooling devices are registered, so the
> > > > processor_get_max_state() result does not change while
> > > > acpi_processor_driver_init() is running.
> > > >
> > > > Change the ordering in acpi_processor_driver_init() accordingly
> > > > to prevent the max_state value from remaining 0 permanently for
> > > > all
> > > > ACPI CPU cooling devices.
> > > >
> > > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > > Link:
> > > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The full patch series fixes the problem but this one does not.
> >
> > That is a correct observation, but the $subject patch fixes part of
> > the problem (which is not addressed by the rest of the series AFAICS)
> > and so it deserves a Fixes tag of its own IMO.
>
> Am I understanding this correctly that this patch helps in below case?
>
> cpufreq driver like intel_pstate is registered before we register the
> notifier callback in processor_driver. In this case, we are not able to
> catch the CPUFREQ_CREATE_POLICY notification and cpufreq should be
> counted as part of cooling states when registering the ACPI CPU cooling
> device. (acpi_processor_cpufreq_init must be set at this time)

Yes.

> Or else, in normal case, the ACPI CPU cdev->max_state always return 0
> (when t-state not available) until we receive the CPUFREQ_CREATE_POLICY
> notification and call thermal_cooling_device_update(), both with and
> without this patch.
>
> Patch 2,3,4 works on my test platform, without applying patch 1/4.

OK

> > I guess I should clarify that in the changelog.
> >
> > > This is because,
> > >
> > > static int cpu_has_cpufreq(unsigned int cpu)
> > > {
> > >         struct
> > > cpufreq_policy *policy;
> > >
> > >         if (!acpi_processor_cpufreq_init)
> > >                 return 0;
> > >
> > >         policy = cpufreq_cpu_get(cpu);
> > >         if (policy) {
> > >                 cpufreq_cpu_put(policy);
> > >                 return 1;
> > >         }
> > >         return 0;
> > > }
> > >
> > > Although acpi_processor_cpufreq_init is set to true with patch 1/4,
> > > but
> > > we don't have cpufreq driver registered, thus cpufreq_cpu_get()
> > > return
> > > NULL.
> > > so acpi_processor_cpufreq_init is not the only dependency here. :(
> >
> > Right.  That's why the other patches in the series are needed too.
> >
> > > > ---
> > > >  drivers/acpi/processor_driver.c |   12 ++++++------
> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/acpi/processor_driver.c
> > > > =================================================================
> > > > ==
> > > > --- linux-pm.orig/drivers/acpi/processor_driver.c
> > > > +++ linux-pm/drivers/acpi/processor_driver.c
> > > > @@ -263,6 +263,12 @@ static int __init acpi_processor_driver_
> > > >       if (acpi_disabled)
> > > >               return 0;
> > > >
> > > > +     if
> > > > (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > > > +                                    CPUFREQ_POLICY_NOTIFIER)) {
> > > > +             acpi_processor_cpufreq_init = true;
> > > > +             acpi_processor_ignore_ppc_init();
> > > > +     }
> > > > +
> > > >       result = driver_register(&acpi_processor_driver);
> > > >       if (result < 0)
> > > >               return result;
> > > > @@ -276,12 +282,6 @@ static int __init acpi_processor_driver_
> > > >       cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD,
> > > > "acpi/cpu-
> > > > drv:dead",
> > > >                                 NULL, acpi_soft_cpu_dead);
> > > >
> > > > -     if
> > > > (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > > > -                                    CPUFREQ_POLICY_NOTIFIER)) {
> > > > -             acpi_processor_cpufreq_init = true;
> > > > -             acpi_processor_ignore_ppc_init();
> > > > -     }
> > > > -
> > > >       acpi_processor_throttling_init();
> > > >       return 0;
> > > >  err:
> > > >
> > > >
> > > >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update()
  2023-03-03 19:23 ` [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update() Rafael J. Wysocki
  2023-03-07 16:40   ` Zhang, Rui
@ 2023-03-27 20:57   ` Imre Deak
  2023-03-28 15:35     ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Imre Deak @ 2023-03-27 20:57 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Linux ACPI, LKML, Zhang Rui

Hi,

this leads to the stacktrace below triggered by
lockdep_assert_held(&cdev->lock) in cooling_device_stats_setup(), and

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 566df4522b885..132175b14814f 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -918,7 +918,9 @@ __thermal_cooling_device_register(struct device_node *np,
 	if (ret)
 		goto out_cdev_type;
 
+	mutex_lock(&cdev->lock);
 	thermal_cooling_device_setup_sysfs(cdev);
+	mutex_unlock(&cdev->lock);
 
 	ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
 	if (ret)

fixed it up for me, but not sure if it's the correct fix.

--Imre

[    4.662358] ------------[ cut here ]------------
[    4.662361] WARNING: CPU: 3 PID: 1 at drivers/thermal/thermal_sysfs.c:879 cooling_device_stats_setup+0xb4/0xc0
[    4.662370] Modules linked in:
[    4.662375] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G          I        6.3.0-rc4-imre+ #771
[    4.662379] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[    4.662382] RIP: 0010:cooling_device_stats_setup+0xb4/0xc0
[    4.662387] Code: 89 1d 58 52 36 01 5b 41 5c 41 5d 5d c3 cc cc cc cc 48 8d bf 18 05 00 00 be ff ff ff ff e8 f4 d2 3e 00 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90
[    4.662390] RSP: 0000:ffff9f48c0057b30 EFLAGS: 00010246
[    4.662395] RAX: 0000000000000000 RBX: ffff8fc381ca9800 RCX: 0000000000000000
[    4.662398] RDX: 0000000000000000 RSI: ffffffff94ad1d28 RDI: ffffffff94b58cc6
[    4.662401] RBP: ffff9f48c0057b48 R08: 0000000000000004 R09: 0000000000000000
[    4.662404] R10: ffff8fc381c77cd0 R11: 0000000000000000 R12: 0000000000000002
[    4.662406] R13: ffff8fc381ca9800 R14: ffff8fc381b0a000 R15: 0000000000000000
[    4.662409] FS:  0000000000000000(0000) GS:ffff8fc6b5580000(0000) knlGS:0000000000000000
[    4.662412] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.662415] CR2: 0000000000000000 CR3: 0000000283856001 CR4: 00000000001706e0
[    4.662418] Call Trace:
[    4.662421]  <TASK>
[    4.662427]  thermal_cooling_device_setup_sysfs+0x12/0x30
[    4.662433]  __thermal_cooling_device_register+0x195/0x410
[    4.662442]  thermal_cooling_device_register+0x19/0x20
[    4.662446]  acpi_fan_probe+0xd7/0x5a0
[    4.662458]  ? acpi_match_device_ids+0x12/0x20
[    4.662464]  ? acpi_dev_pm_attach+0x41/0x110
[    4.662473]  platform_probe+0x48/0xc0
[    4.662481]  really_probe+0x1be/0x420
[    4.662487]  __driver_probe_device+0x8c/0x190
[    4.662493]  driver_probe_device+0x24/0x90
[    4.662498]  __driver_attach+0xf7/0x200
[    4.662503]  ? __pfx___driver_attach+0x10/0x10
[    4.662507]  bus_for_each_dev+0x80/0xd0
[    4.662516]  driver_attach+0x1e/0x30
[    4.662522]  bus_add_driver+0x11f/0x230
[    4.662530]  driver_register+0x5e/0x120
[    4.662534]  ? __pfx_acpi_fan_driver_init+0x10/0x10
[    4.662540]  __platform_driver_register+0x1e/0x30
[    4.662545]  acpi_fan_driver_init+0x17/0x20
[    4.662549]  do_one_initcall+0x61/0x280
[    4.662559]  ? debug_smp_processor_id+0x17/0x20
[    4.662568]  kernel_init_freeable+0x411/0x640
[    4.662582]  ? __pfx_kernel_init+0x10/0x10
[    4.662589]  kernel_init+0x1b/0x1f0
[    4.662594]  ? __pfx_kernel_init+0x10/0x10
[    4.662599]  ret_from_fork+0x2c/0x50
[    4.662615]  </TASK>
[    4.662618] irq event stamp: 506869
[    4.662620] hardirqs last  enabled at (506875): [<ffffffff9338e2d8>] __up_console_sem+0x68/0x80
[    4.662625] hardirqs last disabled at (506880): [<ffffffff9338e2bd>] __up_console_sem+0x4d/0x80
[    4.662628] softirqs last  enabled at (504698): [<ffffffff932de49f>] __irq_exit_rcu+0xbf/0x140
[    4.662633] softirqs last disabled at (504689): [<ffffffff932de49f>] __irq_exit_rcu+0xbf/0x140
[    4.662636] ---[ end trace 0000000000000000 ]---
[    4.662779] ------------[ cut here ]------------

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update()
  2023-03-27 20:57   ` Imre Deak
@ 2023-03-28 15:35     ` Rafael J. Wysocki
  0 siblings, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2023-03-28 15:35 UTC (permalink / raw)
  To: imre.deak; +Cc: Rafael J. Wysocki, Linux PM, Linux ACPI, LKML, Zhang Rui

On Mon, Mar 27, 2023 at 10:58 PM Imre Deak <imre.deak@intel.com> wrote:
>
> Hi,
>
> this leads to the stacktrace below triggered by
> lockdep_assert_held(&cdev->lock) in cooling_device_stats_setup(),

Thanks for the report!

> and
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 566df4522b885..132175b14814f 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -918,7 +918,9 @@ __thermal_cooling_device_register(struct device_node *np,
>         if (ret)
>                 goto out_cdev_type;
>
> +       mutex_lock(&cdev->lock);
>         thermal_cooling_device_setup_sysfs(cdev);
> +       mutex_unlock(&cdev->lock);
>
>         ret = dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
>         if (ret)
>
> fixed it up for me, but not sure if it's the correct fix.

There are other cases when the lockdep_assert_held() annotations may
trigger, so it is better to remove them from
cooling_device_stats_setup() and cooling_device_stats_destroy() and to
put one into thermal_cooling_device_stats_reinit().

I'll send a patch to do that later today.

> [    4.662358] ------------[ cut here ]------------
> [    4.662361] WARNING: CPU: 3 PID: 1 at drivers/thermal/thermal_sysfs.c:879 cooling_device_stats_setup+0xb4/0xc0
> [    4.662370] Modules linked in:
> [    4.662375] CPU: 3 PID: 1 Comm: swapper/0 Tainted: G          I        6.3.0-rc4-imre+ #771
> [    4.662379] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
> [    4.662382] RIP: 0010:cooling_device_stats_setup+0xb4/0xc0
> [    4.662387] Code: 89 1d 58 52 36 01 5b 41 5c 41 5d 5d c3 cc cc cc cc 48 8d bf 18 05 00 00 be ff ff ff ff e8 f4 d2 3e 00 85 c0 0f 85 6f ff ff ff <0f> 0b e9 68 ff ff ff 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90
> [    4.662390] RSP: 0000:ffff9f48c0057b30 EFLAGS: 00010246
> [    4.662395] RAX: 0000000000000000 RBX: ffff8fc381ca9800 RCX: 0000000000000000
> [    4.662398] RDX: 0000000000000000 RSI: ffffffff94ad1d28 RDI: ffffffff94b58cc6
> [    4.662401] RBP: ffff9f48c0057b48 R08: 0000000000000004 R09: 0000000000000000
> [    4.662404] R10: ffff8fc381c77cd0 R11: 0000000000000000 R12: 0000000000000002
> [    4.662406] R13: ffff8fc381ca9800 R14: ffff8fc381b0a000 R15: 0000000000000000
> [    4.662409] FS:  0000000000000000(0000) GS:ffff8fc6b5580000(0000) knlGS:0000000000000000
> [    4.662412] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.662415] CR2: 0000000000000000 CR3: 0000000283856001 CR4: 00000000001706e0
> [    4.662418] Call Trace:
> [    4.662421]  <TASK>
> [    4.662427]  thermal_cooling_device_setup_sysfs+0x12/0x30
> [    4.662433]  __thermal_cooling_device_register+0x195/0x410
> [    4.662442]  thermal_cooling_device_register+0x19/0x20
> [    4.662446]  acpi_fan_probe+0xd7/0x5a0
> [    4.662458]  ? acpi_match_device_ids+0x12/0x20
> [    4.662464]  ? acpi_dev_pm_attach+0x41/0x110
> [    4.662473]  platform_probe+0x48/0xc0
> [    4.662481]  really_probe+0x1be/0x420
> [    4.662487]  __driver_probe_device+0x8c/0x190
> [    4.662493]  driver_probe_device+0x24/0x90
> [    4.662498]  __driver_attach+0xf7/0x200
> [    4.662503]  ? __pfx___driver_attach+0x10/0x10
> [    4.662507]  bus_for_each_dev+0x80/0xd0
> [    4.662516]  driver_attach+0x1e/0x30
> [    4.662522]  bus_add_driver+0x11f/0x230
> [    4.662530]  driver_register+0x5e/0x120
> [    4.662534]  ? __pfx_acpi_fan_driver_init+0x10/0x10
> [    4.662540]  __platform_driver_register+0x1e/0x30
> [    4.662545]  acpi_fan_driver_init+0x17/0x20
> [    4.662549]  do_one_initcall+0x61/0x280
> [    4.662559]  ? debug_smp_processor_id+0x17/0x20
> [    4.662568]  kernel_init_freeable+0x411/0x640
> [    4.662582]  ? __pfx_kernel_init+0x10/0x10
> [    4.662589]  kernel_init+0x1b/0x1f0
> [    4.662594]  ? __pfx_kernel_init+0x10/0x10
> [    4.662599]  ret_from_fork+0x2c/0x50
> [    4.662615]  </TASK>
> [    4.662618] irq event stamp: 506869
> [    4.662620] hardirqs last  enabled at (506875): [<ffffffff9338e2d8>] __up_console_sem+0x68/0x80
> [    4.662625] hardirqs last disabled at (506880): [<ffffffff9338e2bd>] __up_console_sem+0x4d/0x80
> [    4.662628] softirqs last  enabled at (504698): [<ffffffff932de49f>] __irq_exit_rcu+0xbf/0x140
> [    4.662633] softirqs last disabled at (504689): [<ffffffff932de49f>] __irq_exit_rcu+0xbf/0x140
> [    4.662636] ---[ end trace 0000000000000000 ]---
> [    4.662779] ------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-03-28 15:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-03 19:18 [PATCH v1 0/4] thermal: core/ACPI: Fix processor cooling device regression Rafael J. Wysocki
2023-03-03 19:19 ` [PATCH v1 1/4] ACPI: processor: Reorder acpi_processor_driver_init() Rafael J. Wysocki
2023-03-07 16:51   ` Zhang, Rui
2023-03-10 18:31     ` Rafael J. Wysocki
2023-03-12 16:08   ` Zhang, Rui
2023-03-13 13:48     ` Rafael J. Wysocki
2023-03-13 14:54       ` Zhang, Rui
2023-03-13 14:58         ` Rafael J. Wysocki
2023-03-03 19:21 ` [PATCH v1 2/4] thermal: core: Introduce thermal_cooling_device_present() Rafael J. Wysocki
2023-03-03 19:23 ` [PATCH v1 3/4] thermal: core: Introduce thermal_cooling_device_update() Rafael J. Wysocki
2023-03-07 16:40   ` Zhang, Rui
2023-03-10 18:25     ` Rafael J. Wysocki
2023-03-27 20:57   ` Imre Deak
2023-03-28 15:35     ` Rafael J. Wysocki
2023-03-03 19:23 ` [PATCH v1 4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes Rafael J. Wysocki
2023-03-07 16:43   ` Zhang, Rui
2023-03-10 18:29     ` Rafael J. Wysocki
2023-03-12 14:44       ` Zhang, Rui
2023-03-13 13:50         ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).