All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal
@ 2022-11-10 15:24 Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 1/9] thermal/core: Destroy thermal zone device mutex in release function Guenter Roeck
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Accesses to thermal zones, and with it to thermal zone device operations,
are still possible after the thermal zone device has been unregistered.
For example, thermal_zone_get_temp() can be called from temp_show()
in thermal_sysfs.c if the sysfs attribute was opened before the thermal
device was unregistered. This is problematic and may result in crashes
since the operations data structure and the underlying code may be gone
when the calls are made.

The following series solves the problem by protecting accesses to thermal
device operations with the thermal device mutex, and by verifying that the
thermal device is still registered after the mutex has been acquired.

This was previously sent as RFC/RFT as single patch [1]. The code was reworked
to match thermal subsystem changes made between v6.0 and v6.1, and it was
split into several patches to simplify review.

[1] https://lore.kernel.org/linux-pm/20221004033936.1047691-1-linux@roeck-us.net/

v2: Improved documentation, rearranged code.
    No functional changes. See individual patches for details.

----------------------------------------------------------------
Guenter Roeck (9):
      thermal/core: Destroy thermal zone device mutex in release function
      thermal/core: Delete device under thermal device zone lock
      thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
      thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
      thermal/core: Introduce locked version of thermal_zone_device_update
      thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
      thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
      thermal/core: Remove thermal_zone_set_trips()
      thermal/core: Protect thermal device operations against thermal device removal

 drivers/thermal/thermal_core.c    | 76 +++++++++++++++++++++++--------------
 drivers/thermal/thermal_core.h    |  3 +-
 drivers/thermal/thermal_helpers.c | 67 +++++++++++++++++++++------------
 drivers/thermal/thermal_hwmon.c   | 10 ++++-
 drivers/thermal/thermal_sysfs.c   | 79 ++++++++++++++++++++++++++++++++-------
 5 files changed, 168 insertions(+), 67 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/9] thermal/core: Destroy thermal zone device mutex in release function
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 2/9] thermal/core: Delete device under thermal device zone lock Guenter Roeck
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Accesses to thermal zones, and with it the thermal zone device mutex,
are still possible after the thermal zone device has been unregistered.
For example, thermal_zone_get_temp() can be called from temp_show()
in thermal_sysfs.c if the sysfs attribute was opened before the thermal
device was unregistered.

Move the call to mutex_destroy from thermal_zone_device_unregister()
to thermal_release() to ensure that it is only destroyed after it is
guaranteed to be no longer accessed.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: No change

 drivers/thermal/thermal_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 117eeaf7dd24..f548875a016d 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -759,6 +759,7 @@ static void thermal_release(struct device *dev)
 		     sizeof("thermal_zone") - 1)) {
 		tz = to_thermal_zone(dev);
 		thermal_zone_destroy_device_groups(tz);
+		mutex_destroy(&tz->lock);
 		kfree(tz);
 	} else if (!strncmp(dev_name(dev), "cooling_device",
 			    sizeof("cooling_device") - 1)) {
@@ -1390,7 +1391,6 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 	thermal_remove_hwmon_sysfs(tz);
 	ida_free(&thermal_tz_ida, tz->id);
 	ida_destroy(&tz->ida);
-	mutex_destroy(&tz->lock);
 	device_unregister(&tz->device);
 
 	thermal_notify_tz_delete(tz_id);
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/9] thermal/core: Delete device under thermal device zone lock
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 1/9] thermal/core: Destroy thermal zone device mutex in release function Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 3/9] thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp Guenter Roeck
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Thermal device attributes may still be opened after unregistering
the thermal zone and deleting the thermal device.

Currently there is no protection against accessing thermal device
operations after unregistering a thermal zone. To enable adding
such protection, protect the device delete operation with the
thermal zone device mutex. This requires splitting the call to
device_unregister() into its components, device_del() and put_device().
Only the first call can be executed under mutex protection, since
put_device() may result in releasing the thermal zone device memory.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: No change

 drivers/thermal/thermal_core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index f548875a016d..562ece8d16aa 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1391,7 +1391,12 @@ void thermal_zone_device_unregister(struct thermal_zone_device *tz)
 	thermal_remove_hwmon_sysfs(tz);
 	ida_free(&thermal_tz_ida, tz->id);
 	ida_destroy(&tz->ida);
-	device_unregister(&tz->device);
+
+	mutex_lock(&tz->lock);
+	device_del(&tz->device);
+	mutex_unlock(&tz->lock);
+
+	put_device(&tz->device);
 
 	thermal_notify_tz_delete(tz_id);
 }
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/9] thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 1/9] thermal/core: Destroy thermal zone device mutex in release function Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 2/9] thermal/core: Delete device under thermal device zone lock Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 4/9] thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp Guenter Roeck
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Calls to thermal_zone_get_temp() are not protected against thermal zone
device removal. As result, it is possible that the thermal zone operations
callbacks are no longer valid when thermal_zone_get_temp() is called.
This may result in crashes such as

BUG: unable to handle page fault for address: ffffffffc04ef420
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3209 Comm: cat Tainted: G        W         5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
RIP: 0010:thermal_zone_get_temp+0x26/0x73
Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
FS:  00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
Call Trace:
 temp_show+0x31/0x68
 dev_attr_show+0x1d/0x4f
 sysfs_kf_seq_show+0x92/0x107
 seq_read_iter+0xf5/0x3f2
 vfs_read+0x205/0x379
 __x64_sys_read+0x7c/0xe2
 do_syscall_64+0x43/0x55
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

if a thermal device is removed while accesses to its device attributes
are ongoing.

The problem is exposed by code in iwl_op_mode_mvm_start(), which registers
a thermal zone device only to unregister it shortly afterwards if an
unrelated failure is encountered while accessing the hardware.

Check if the thermal zone device is registered after acquiring the
thermal zone device mutex to ensure this does not happen.

The code was tested by triggering the failure in iwl_op_mode_mvm_start()
on purpose. Without this patch, the kernel crashes reliably. The crash
is no longer observed after applying this and the preceding patches.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Simplify error handling 

 drivers/thermal/thermal_helpers.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index c65cdce8f856..fca0b23570f9 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -115,7 +115,12 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 	int ret;
 
 	mutex_lock(&tz->lock);
-	ret = __thermal_zone_get_temp(tz, temp);
+
+	if (device_is_registered(&tz->device))
+		ret = __thermal_zone_get_temp(tz, temp);
+	else
+		ret = -ENODEV;
+
 	mutex_unlock(&tz->lock);
 
 	return ret;
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/9] thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (2 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 3/9] thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 5/9] thermal/core: Introduce locked version of thermal_zone_device_update Guenter Roeck
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

All callers of __thermal_zone_get_temp() already validated the
thermal zone parameters. Move validation to thermal_zone_get_temp()
where it is actually needed. Also add kernel documentation for
__thermal_zone_get_temp(), listing the requirement that the
function must be called with validated parameters and with thermal
device mutex held.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Add note about added kernel documentation to description .
    Use IS_ERR_OR_NULL instead of manually coding it.

 drivers/thermal/thermal_helpers.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index fca0b23570f9..321f8020082d 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -64,6 +64,20 @@ get_thermal_instance(struct thermal_zone_device *tz,
 }
 EXPORT_SYMBOL(get_thermal_instance);
 
+/**
+ * __thermal_zone_get_temp() - returns the temperature of a thermal zone
+ * @tz: a valid pointer to a struct thermal_zone_device
+ * @temp: a valid pointer to where to store the resulting temperature.
+ *
+ * When a valid thermal zone reference is passed, it will fetch its
+ * temperature and fill @temp.
+ *
+ * Both tz and tz->ops must be valid pointers when calling this function,
+ * and the tz->ops->get_temp callback must be provided.
+ * The function must be called under tz->lock.
+ *
+ * Return: On success returns 0, an error code otherwise
+ */
 int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 {
 	int ret = -EINVAL;
@@ -73,9 +87,6 @@ int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 
 	lockdep_assert_held(&tz->lock);
 
-	if (!tz || IS_ERR(tz) || !tz->ops->get_temp)
-		return -EINVAL;
-
 	ret = tz->ops->get_temp(tz, temp);
 
 	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
@@ -114,13 +125,22 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 {
 	int ret;
 
+	if (IS_ERR_OR_NULL(tz))
+		return -EINVAL;
+
 	mutex_lock(&tz->lock);
 
+	if (!tz->ops->get_temp) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
 	if (device_is_registered(&tz->device))
 		ret = __thermal_zone_get_temp(tz, temp);
 	else
 		ret = -ENODEV;
 
+unlock:
 	mutex_unlock(&tz->lock);
 
 	return ret;
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/9] thermal/core: Introduce locked version of thermal_zone_device_update
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (3 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 4/9] thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 6/9] thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex Guenter Roeck
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

In thermal_zone_device_set_mode(), the thermal zone mutex is released only
to be reacquired in the subsequent call to thermal_zone_device_update().

Introduce __thermal_zone_device_update(), which is similar to
thermal_zone_device_update() but has to be called with the thermal device
mutex held. Call the new function from thermal_zone_device_set_mode()
to avoid the extra thermal device mutex release/acquire sequence in that
function.

With the new function in place, re-implement thermal_zone_device_update()
as wrapper around __thermal_zone_device_update() to acquire and release
the thermal device mutex.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Reword description to clarify that the new function must be called with
    thermal device mutex held.

 drivers/thermal/thermal_core.c | 57 ++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 562ece8d16aa..9facd9c5b70f 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -403,6 +403,34 @@ static void thermal_zone_device_init(struct thermal_zone_device *tz)
 		pos->initialized = false;
 }
 
+static void __thermal_zone_device_update(struct thermal_zone_device *tz,
+					 enum thermal_notify_event event)
+{
+	int count;
+
+	if (atomic_read(&in_suspend))
+		return;
+
+	if (WARN_ONCE(!tz->ops->get_temp,
+		      "'%s' must not be called without 'get_temp' ops set\n",
+		      __func__))
+		return;
+
+	if (!thermal_zone_device_is_enabled(tz))
+		return;
+
+	update_temperature(tz);
+
+	__thermal_zone_set_trips(tz);
+
+	tz->notify_event = event;
+
+	for (count = 0; count < tz->num_trips; count++)
+		handle_thermal_trip(tz, count);
+
+	monitor_thermal_zone(tz);
+}
+
 static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
 					enum thermal_device_mode mode)
 {
@@ -423,9 +451,9 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
 	if (!ret)
 		tz->mode = mode;
 
-	mutex_unlock(&tz->lock);
+	__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
 
-	thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+	mutex_unlock(&tz->lock);
 
 	if (mode == THERMAL_DEVICE_ENABLED)
 		thermal_notify_tz_enable(tz->id);
@@ -457,31 +485,8 @@ int thermal_zone_device_is_enabled(struct thermal_zone_device *tz)
 void thermal_zone_device_update(struct thermal_zone_device *tz,
 				enum thermal_notify_event event)
 {
-	int count;
-
-	if (atomic_read(&in_suspend))
-		return;
-
-	if (WARN_ONCE(!tz->ops->get_temp, "'%s' must not be called without "
-		      "'get_temp' ops set\n", __func__))
-		return;
-
 	mutex_lock(&tz->lock);
-
-	if (!thermal_zone_device_is_enabled(tz))
-		goto out;
-
-	update_temperature(tz);
-
-	__thermal_zone_set_trips(tz);
-
-	tz->notify_event = event;
-
-	for (count = 0; count < tz->num_trips; count++)
-		handle_thermal_trip(tz, count);
-
-	monitor_thermal_zone(tz);
-out:
+	__thermal_zone_device_update(tz, event);
 	mutex_unlock(&tz->lock);
 }
 EXPORT_SYMBOL_GPL(thermal_zone_device_update);
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 6/9] thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (4 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 5/9] thermal/core: Introduce locked version of thermal_zone_device_update Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 7/9] thermal/core: Protect sysfs " Guenter Roeck
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

In preparation to protecting access to thermal operations against thermal
zone device removal, protect hwmon accesses to thermal zone operations
with the thermal zone mutex. After acquiring the mutex, ensure that the
thermal zone device is registered before proceeding.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Simplify error handling to avoid additional goto.
    Do not use ternary operator in modified code.

 drivers/thermal/thermal_hwmon.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index f53f4ceb6a5d..c594c42bea6d 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -77,7 +77,15 @@ temp_crit_show(struct device *dev, struct device_attribute *attr, char *buf)
 	int temperature;
 	int ret;
 
-	ret = tz->ops->get_crit_temp(tz, &temperature);
+	mutex_lock(&tz->lock);
+
+	if (device_is_registered(&tz->device))
+		ret = tz->ops->get_crit_temp(tz, &temperature);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&tz->lock);
+
 	if (ret)
 		return ret;
 
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 7/9] thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (5 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 6/9] thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:24 ` [PATCH v2 8/9] thermal/core: Remove thermal_zone_set_trips() Guenter Roeck
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Protect access to thermal operations against thermal zone removal by
acquiring the thermal zone device mutex. After acquiring the mutex, check
if the thermal zone device is registered and abort the operation if not.

With this change, we can call __thermal_zone_device_update() instead of
thermal_zone_device_update() from trip_point_temp_store() and from
emul_temp_store(). Similar, we can call __thermal_zone_set_trips() instead
of thermal_zone_set_trips() from trip_point_hyst_store().

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Simplify error handling to avoid additional goto.
    Do not use ternary operator in modified code.
    Remove now unnecessary { } in emul_temp_store().

 drivers/thermal/thermal_core.c  |  4 +-
 drivers/thermal/thermal_core.h  |  2 +
 drivers/thermal/thermal_sysfs.c | 79 +++++++++++++++++++++++++++------
 3 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9facd9c5b70f..b8e3b262b2bd 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -403,8 +403,8 @@ static void thermal_zone_device_init(struct thermal_zone_device *tz)
 		pos->initialized = false;
 }
 
-static void __thermal_zone_device_update(struct thermal_zone_device *tz,
-					 enum thermal_notify_event event)
+void __thermal_zone_device_update(struct thermal_zone_device *tz,
+				  enum thermal_notify_event event)
 {
 	int count;
 
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 1571917bd3c8..7b51b9a22e7e 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -109,6 +109,8 @@ int thermal_register_governor(struct thermal_governor *);
 void thermal_unregister_governor(struct thermal_governor *);
 int thermal_zone_device_set_policy(struct thermal_zone_device *, char *);
 int thermal_build_list_of_policies(char *buf);
+void __thermal_zone_device_update(struct thermal_zone_device *tz,
+				  enum thermal_notify_event event);
 
 /* Helpers */
 void thermal_zone_set_trips(struct thermal_zone_device *tz);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index ec495c7dff03..e2e94294fbb2 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -92,7 +92,14 @@ trip_point_type_show(struct device *dev, struct device_attribute *attr,
 	if (sscanf(attr->attr.name, "trip_point_%d_type", &trip) != 1)
 		return -EINVAL;
 
-	result = tz->ops->get_trip_type(tz, trip, &type);
+	mutex_lock(&tz->lock);
+
+	if (device_is_registered(dev))
+		result = tz->ops->get_trip_type(tz, trip, &type);
+	else
+		result = -ENODEV;
+
+	mutex_unlock(&tz->lock);
 	if (result)
 		return result;
 
@@ -128,10 +135,17 @@ trip_point_temp_store(struct device *dev, struct device_attribute *attr,
 	if (kstrtoint(buf, 10, &temperature))
 		return -EINVAL;
 
+	mutex_lock(&tz->lock);
+
+	if (!device_is_registered(dev)) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
 	if (tz->ops->set_trip_temp) {
 		ret = tz->ops->set_trip_temp(tz, trip, temperature);
 		if (ret)
-			return ret;
+			goto unlock;
 	}
 
 	if (tz->trips)
@@ -140,16 +154,22 @@ trip_point_temp_store(struct device *dev, struct device_attribute *attr,
 	if (tz->ops->get_trip_hyst) {
 		ret = tz->ops->get_trip_hyst(tz, trip, &hyst);
 		if (ret)
-			return ret;
+			goto unlock;
 	}
 
 	ret = tz->ops->get_trip_type(tz, trip, &type);
 	if (ret)
-		return ret;
+		goto unlock;
 
 	thermal_notify_tz_trip_change(tz->id, trip, type, temperature, hyst);
 
-	thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+	__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+unlock:
+	mutex_unlock(&tz->lock);
+
+	if (ret)
+		return ret;
 
 	return count;
 }
@@ -168,7 +188,14 @@ trip_point_temp_show(struct device *dev, struct device_attribute *attr,
 	if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip) != 1)
 		return -EINVAL;
 
-	ret = tz->ops->get_trip_temp(tz, trip, &temperature);
+	mutex_lock(&tz->lock);
+
+	if (device_is_registered(dev))
+		ret = tz->ops->get_trip_temp(tz, trip, &temperature);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&tz->lock);
 
 	if (ret)
 		return ret;
@@ -193,6 +220,13 @@ trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
 	if (kstrtoint(buf, 10, &temperature))
 		return -EINVAL;
 
+	mutex_lock(&tz->lock);
+
+	if (!device_is_registered(dev)) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
 	/*
 	 * We are not doing any check on the 'temperature' value
 	 * here. The driver implementing 'set_trip_hyst' has to
@@ -201,7 +235,10 @@ trip_point_hyst_store(struct device *dev, struct device_attribute *attr,
 	ret = tz->ops->set_trip_hyst(tz, trip, temperature);
 
 	if (!ret)
-		thermal_zone_set_trips(tz);
+		__thermal_zone_set_trips(tz);
+
+unlock:
+	mutex_unlock(&tz->lock);
 
 	return ret ? ret : count;
 }
@@ -220,7 +257,14 @@ trip_point_hyst_show(struct device *dev, struct device_attribute *attr,
 	if (sscanf(attr->attr.name, "trip_point_%d_hyst", &trip) != 1)
 		return -EINVAL;
 
-	ret = tz->ops->get_trip_hyst(tz, trip, &temperature);
+	mutex_lock(&tz->lock);
+
+	if (device_is_registered(dev))
+		ret = tz->ops->get_trip_hyst(tz, trip, &temperature);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&tz->lock);
 
 	return ret ? ret : sprintf(buf, "%d\n", temperature);
 }
@@ -269,16 +313,23 @@ emul_temp_store(struct device *dev, struct device_attribute *attr,
 	if (kstrtoint(buf, 10, &temperature))
 		return -EINVAL;
 
-	if (!tz->ops->set_emul_temp) {
-		mutex_lock(&tz->lock);
+	mutex_lock(&tz->lock);
+
+	if (!device_is_registered(dev)) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	if (!tz->ops->set_emul_temp)
 		tz->emul_temperature = temperature;
-		mutex_unlock(&tz->lock);
-	} else {
+	else
 		ret = tz->ops->set_emul_temp(tz, temperature);
-	}
 
 	if (!ret)
-		thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+		__thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+
+unlock:
+	mutex_unlock(&tz->lock);
 
 	return ret ? ret : count;
 }
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 8/9] thermal/core: Remove thermal_zone_set_trips()
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (6 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 7/9] thermal/core: Protect sysfs " Guenter Roeck
@ 2022-11-10 15:24 ` Guenter Roeck
  2022-11-10 15:25 ` [PATCH v2 9/9] thermal/core: Protect thermal device operations against thermal device removal Guenter Roeck
  2022-11-14 18:06 ` [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Rafael J. Wysocki
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:24 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Since no callers of thermal_zone_set_trips() are left, remove the function.
Document __thermal_zone_set_trips() instead. Explicitly state that the
thermal zone lock must be held when calling the function, and that the
pointer to the thermal zone must be valid.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: No change

 drivers/thermal/thermal_core.h    |  1 -
 drivers/thermal/thermal_helpers.c | 34 ++++++++++++++-----------------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 7b51b9a22e7e..b834cb273429 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -113,7 +113,6 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz,
 				  enum thermal_notify_event event);
 
 /* Helpers */
-void thermal_zone_set_trips(struct thermal_zone_device *tz);
 void __thermal_zone_set_trips(struct thermal_zone_device *tz);
 int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp);
 
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index 321f8020082d..56aa2e88f34f 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -147,6 +147,21 @@ int thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
 }
 EXPORT_SYMBOL_GPL(thermal_zone_get_temp);
 
+/**
+ * __thermal_zone_set_trips - Computes the next trip points for the driver
+ * @tz: a pointer to a thermal zone device structure
+ *
+ * The function computes the next temperature boundaries by browsing
+ * the trip points. The result is the closer low and high trip points
+ * to the current temperature. These values are passed to the backend
+ * driver to let it set its own notification mechanism (usually an
+ * interrupt).
+ *
+ * This function must be called with tz->lock held. Both tz and tz->ops
+ * must be valid pointers.
+ *
+ * It does not return a value
+ */
 void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 {
 	int low = -INT_MAX;
@@ -193,25 +208,6 @@ void __thermal_zone_set_trips(struct thermal_zone_device *tz)
 		dev_err(&tz->device, "Failed to set trips: %d\n", ret);
 }
 
-/**
- * thermal_zone_set_trips - Computes the next trip points for the driver
- * @tz: a pointer to a thermal zone device structure
- *
- * The function computes the next temperature boundaries by browsing
- * the trip points. The result is the closer low and high trip points
- * to the current temperature. These values are passed to the backend
- * driver to let it set its own notification mechanism (usually an
- * interrupt).
- *
- * It does not return a value
- */
-void thermal_zone_set_trips(struct thermal_zone_device *tz)
-{
-	mutex_lock(&tz->lock);
-	__thermal_zone_set_trips(tz);
-	mutex_unlock(&tz->lock);
-}
-
 static void thermal_cdev_set_cur_state(struct thermal_cooling_device *cdev,
 				       int target)
 {
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 9/9] thermal/core: Protect thermal device operations against thermal device removal
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (7 preceding siblings ...)
  2022-11-10 15:24 ` [PATCH v2 8/9] thermal/core: Remove thermal_zone_set_trips() Guenter Roeck
@ 2022-11-10 15:25 ` Guenter Roeck
  2022-11-14 18:06 ` [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Rafael J. Wysocki
  9 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-10 15:25 UTC (permalink / raw)
  To: Rafael J . Wysocki, Daniel Lezcano
  Cc: Amit Kucheria, Zhang Rui, linux-pm, linux-kernel, Guenter Roeck

Thermal device operations may be called after thermal zone device removal.
After thermal zone device removal, thermal zone device operations must
no longer be called. To prevent such calls from happening, ensure that
the thermal device is registered before executing any thermal device
operations.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Fixed spelling error in description

 drivers/thermal/thermal_core.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index b8e3b262b2bd..aa0107f11c98 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -203,6 +203,9 @@ int thermal_zone_device_set_policy(struct thermal_zone_device *tz,
 	mutex_lock(&thermal_governor_lock);
 	mutex_lock(&tz->lock);
 
+	if (!device_is_registered(&tz->device))
+		goto exit;
+
 	gov = __find_governor(strim(policy));
 	if (!gov)
 		goto exit;
@@ -445,6 +448,12 @@ static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
 		return ret;
 	}
 
+	if (!device_is_registered(&tz->device)) {
+		mutex_unlock(&tz->lock);
+
+		return -ENODEV;
+	}
+
 	if (tz->ops->change_mode)
 		ret = tz->ops->change_mode(tz, mode);
 
@@ -486,7 +495,8 @@ void thermal_zone_device_update(struct thermal_zone_device *tz,
 				enum thermal_notify_event event)
 {
 	mutex_lock(&tz->lock);
-	__thermal_zone_device_update(tz, event);
+	if (device_is_registered(&tz->device))
+		__thermal_zone_device_update(tz, event);
 	mutex_unlock(&tz->lock);
 }
 EXPORT_SYMBOL_GPL(thermal_zone_device_update);
-- 
2.36.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal
  2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
                   ` (8 preceding siblings ...)
  2022-11-10 15:25 ` [PATCH v2 9/9] thermal/core: Protect thermal device operations against thermal device removal Guenter Roeck
@ 2022-11-14 18:06 ` Rafael J. Wysocki
  2022-11-15 13:31   ` Guenter Roeck
  9 siblings, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2022-11-14 18:06 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rafael J . Wysocki, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	linux-pm, linux-kernel

On Thu, Nov 10, 2022 at 4:25 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Accesses to thermal zones, and with it to thermal zone device operations,
> are still possible after the thermal zone device has been unregistered.
> For example, thermal_zone_get_temp() can be called from temp_show()
> in thermal_sysfs.c if the sysfs attribute was opened before the thermal
> device was unregistered. This is problematic and may result in crashes
> since the operations data structure and the underlying code may be gone
> when the calls are made.
>
> The following series solves the problem by protecting accesses to thermal
> device operations with the thermal device mutex, and by verifying that the
> thermal device is still registered after the mutex has been acquired.
>
> This was previously sent as RFC/RFT as single patch [1]. The code was reworked
> to match thermal subsystem changes made between v6.0 and v6.1, and it was
> split into several patches to simplify review.
>
> [1] https://lore.kernel.org/linux-pm/20221004033936.1047691-1-linux@roeck-us.net/
>
> v2: Improved documentation, rearranged code.
>     No functional changes. See individual patches for details.
>
> ----------------------------------------------------------------
> Guenter Roeck (9):
>       thermal/core: Destroy thermal zone device mutex in release function
>       thermal/core: Delete device under thermal device zone lock
>       thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
>       thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
>       thermal/core: Introduce locked version of thermal_zone_device_update
>       thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
>       thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
>       thermal/core: Remove thermal_zone_set_trips()
>       thermal/core: Protect thermal device operations against thermal device removal
>
>  drivers/thermal/thermal_core.c    | 76 +++++++++++++++++++++++--------------
>  drivers/thermal/thermal_core.h    |  3 +-
>  drivers/thermal/thermal_helpers.c | 67 +++++++++++++++++++++------------
>  drivers/thermal/thermal_hwmon.c   | 10 ++++-
>  drivers/thermal/thermal_sysfs.c   | 79 ++++++++++++++++++++++++++++++++-------
>  5 files changed, 168 insertions(+), 67 deletions(-)

All applied as 6.2 material, thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal
  2022-11-14 18:06 ` [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Rafael J. Wysocki
@ 2022-11-15 13:31   ` Guenter Roeck
  0 siblings, 0 replies; 12+ messages in thread
From: Guenter Roeck @ 2022-11-15 13:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Lezcano, Amit Kucheria, Zhang Rui, linux-pm, linux-kernel

On Mon, Nov 14, 2022 at 07:06:57PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 10, 2022 at 4:25 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > Accesses to thermal zones, and with it to thermal zone device operations,
> > are still possible after the thermal zone device has been unregistered.
> > For example, thermal_zone_get_temp() can be called from temp_show()
> > in thermal_sysfs.c if the sysfs attribute was opened before the thermal
> > device was unregistered. This is problematic and may result in crashes
> > since the operations data structure and the underlying code may be gone
> > when the calls are made.
> >
> > The following series solves the problem by protecting accesses to thermal
> > device operations with the thermal device mutex, and by verifying that the
> > thermal device is still registered after the mutex has been acquired.
> >
> > This was previously sent as RFC/RFT as single patch [1]. The code was reworked
> > to match thermal subsystem changes made between v6.0 and v6.1, and it was
> > split into several patches to simplify review.
> >
> > [1] https://lore.kernel.org/linux-pm/20221004033936.1047691-1-linux@roeck-us.net/
> >
> > v2: Improved documentation, rearranged code.
> >     No functional changes. See individual patches for details.
> >
> > ----------------------------------------------------------------
> > Guenter Roeck (9):
> >       thermal/core: Destroy thermal zone device mutex in release function
> >       thermal/core: Delete device under thermal device zone lock
> >       thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
> >       thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
> >       thermal/core: Introduce locked version of thermal_zone_device_update
> >       thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
> >       thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
> >       thermal/core: Remove thermal_zone_set_trips()
> >       thermal/core: Protect thermal device operations against thermal device removal
> >
> >  drivers/thermal/thermal_core.c    | 76 +++++++++++++++++++++++--------------
> >  drivers/thermal/thermal_core.h    |  3 +-
> >  drivers/thermal/thermal_helpers.c | 67 +++++++++++++++++++++------------
> >  drivers/thermal/thermal_hwmon.c   | 10 ++++-
> >  drivers/thermal/thermal_sysfs.c   | 79 ++++++++++++++++++++++++++++++++-------
> >  5 files changed, 168 insertions(+), 67 deletions(-)
> 
> All applied as 6.2 material, thanks!

Thanks a lot!

Guenter

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-11-15 13:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-10 15:24 [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 1/9] thermal/core: Destroy thermal zone device mutex in release function Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 2/9] thermal/core: Delete device under thermal device zone lock Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 3/9] thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 4/9] thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 5/9] thermal/core: Introduce locked version of thermal_zone_device_update Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 6/9] thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 7/9] thermal/core: Protect sysfs " Guenter Roeck
2022-11-10 15:24 ` [PATCH v2 8/9] thermal/core: Remove thermal_zone_set_trips() Guenter Roeck
2022-11-10 15:25 ` [PATCH v2 9/9] thermal/core: Protect thermal device operations against thermal device removal Guenter Roeck
2022-11-14 18:06 ` [PATCH v2 0/9] thermal/core: Protect thermal device operations against removal Rafael J. Wysocki
2022-11-15 13:31   ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.