* [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly
@ 2022-02-21 16:44 Guenter Roeck
2022-02-21 16:56 ` Dmitry Osipenko
0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2022-02-21 16:44 UTC (permalink / raw)
To: Hardware Monitoring
Cc: Jean Delvare, Guenter Roeck, Jon Hunter, Dmitry Osipenko
If an attempt is made to a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.
Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd
...
Internal error: Oops: 96000021 [#1] PREEMPT SMP
...
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mutex_lock+0x18/0x60
lr : thermal_zone_device_update+0x40/0x2e0
sp : ffff800014c4fc60
x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790
x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000
x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd
x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000
x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040
x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd
x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd
Call trace:
mutex_lock+0x18/0x60
hwmon_notify_event+0xfc/0x110
0xffffdde1cb7a0a90
0xffffdde1cb7a0b7c
irq_thread_fn+0x2c/0xa0
irq_thread+0x134/0x240
kthread+0x178/0x190
ret_from_fork+0x10/0x20
Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02)
Jon Hunter reports that the exact call sequence is:
hwmon_notify_event()
--> hwmon_thermal_notify()
--> thermal_zone_device_update()
--> update_temperature()
--> mutex_lock()
The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report the problem as warning but continue to
register the hwmon device.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Fixes: 1597b374af222 ("hwmon: Add notification support")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
drivers/hwmon/hwmon.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
index 3501a3ead4ba..4bfe3791a5ba 100644
--- a/drivers/hwmon/hwmon.c
+++ b/drivers/hwmon/hwmon.c
@@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index)
tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata,
&hwmon_thermal_ops);
- /*
- * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
- * so ignore that error but forward any other error.
- */
- if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
- return PTR_ERR(tzd);
+ if (IS_ERR(tzd)) {
+ if (PTR_ERR(tzd) != -ENODEV)
+ return PTR_ERR(tzd);
+ dev_warn(dev, "Failed to register temp%d_input with thermal zone\n",
+ index + 1);
+ devm_kfree(dev, tdata);
+ return 0;
+ }
err = devm_add_action(dev, hwmon_thermal_remove_sensor, &tdata->node);
if (err)
--
2.35.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly
2022-02-21 16:44 [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly Guenter Roeck
@ 2022-02-21 16:56 ` Dmitry Osipenko
2022-02-21 17:12 ` Guenter Roeck
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Osipenko @ 2022-02-21 16:56 UTC (permalink / raw)
To: Guenter Roeck, Hardware Monitoring; +Cc: Jean Delvare, Jon Hunter
21.02.2022 19:44, Guenter Roeck пишет:
> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
> index 3501a3ead4ba..4bfe3791a5ba 100644
> --- a/drivers/hwmon/hwmon.c
> +++ b/drivers/hwmon/hwmon.c
> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index)
>
> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata,
> &hwmon_thermal_ops);
> - /*
> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
> - * so ignore that error but forward any other error.
> - */
> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
> - return PTR_ERR(tzd);
> + if (IS_ERR(tzd)) {
> + if (PTR_ERR(tzd) != -ENODEV)
> + return PTR_ERR(tzd);
> + dev_warn(dev, "Failed to register temp%d_input with thermal zone\n",
> + index + 1);
Do we really need this warning? I suppose it should be okay if sensor
isn't attached to any device in a device-tree and just reports temperature.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly
2022-02-21 16:56 ` Dmitry Osipenko
@ 2022-02-21 17:12 ` Guenter Roeck
2022-02-21 17:59 ` Dmitry Osipenko
0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2022-02-21 17:12 UTC (permalink / raw)
To: Dmitry Osipenko, Hardware Monitoring; +Cc: Jean Delvare, Jon Hunter
On 2/21/22 08:56, Dmitry Osipenko wrote:
> 21.02.2022 19:44, Guenter Roeck пишет:
>> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
>> index 3501a3ead4ba..4bfe3791a5ba 100644
>> --- a/drivers/hwmon/hwmon.c
>> +++ b/drivers/hwmon/hwmon.c
>> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct device *dev, int index)
>>
>> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata,
>> &hwmon_thermal_ops);
>> - /*
>> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
>> - * so ignore that error but forward any other error.
>> - */
>> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
>> - return PTR_ERR(tzd);
>> + if (IS_ERR(tzd)) {
>> + if (PTR_ERR(tzd) != -ENODEV)
>> + return PTR_ERR(tzd);
>> + dev_warn(dev, "Failed to register temp%d_input with thermal zone\n",
>> + index + 1);
>
> Do we really need this warning? I suppose it should be okay if sensor
> isn't attached to any device in a device-tree and just reports temperature.
I'd rather leave it there for the time being. It will only affect devicetree
systems (turns out there is already a check for of_node elsewhere). Thermal
zone specification is not always easy and there may be a mismatch between
what is reported by the driver and what the user (programmer) expects to
see (which I think is what happens here). I don't want to silently
ignore such problems without any notification.
We could make it dev_notice and/or change the message (instead of "Failed to
..." just say "temp%d_input not registered with thermal zone" , maybe ?).
Guenter
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly
2022-02-21 17:12 ` Guenter Roeck
@ 2022-02-21 17:59 ` Dmitry Osipenko
2022-02-21 18:23 ` Guenter Roeck
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Osipenko @ 2022-02-21 17:59 UTC (permalink / raw)
To: Guenter Roeck, Hardware Monitoring; +Cc: Jean Delvare, Jon Hunter
21.02.2022 20:12, Guenter Roeck пишет:
> On 2/21/22 08:56, Dmitry Osipenko wrote:
>> 21.02.2022 19:44, Guenter Roeck пишет:
>>> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
>>> index 3501a3ead4ba..4bfe3791a5ba 100644
>>> --- a/drivers/hwmon/hwmon.c
>>> +++ b/drivers/hwmon/hwmon.c
>>> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct
>>> device *dev, int index)
>>> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata,
>>> &hwmon_thermal_ops);
>>> - /*
>>> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
>>> - * so ignore that error but forward any other error.
>>> - */
>>> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
>>> - return PTR_ERR(tzd);
>>> + if (IS_ERR(tzd)) {
>>> + if (PTR_ERR(tzd) != -ENODEV)
>>> + return PTR_ERR(tzd);
>>> + dev_warn(dev, "Failed to register temp%d_input with thermal
>>> zone\n",
>>> + index + 1);
>>
>> Do we really need this warning? I suppose it should be okay if sensor
>> isn't attached to any device in a device-tree and just reports
>> temperature.
>
> I'd rather leave it there for the time being. It will only affect
> devicetree
> systems (turns out there is already a check for of_node elsewhere). Thermal
> zone specification is not always easy and there may be a mismatch between
> what is reported by the driver and what the user (programmer) expects to
> see (which I think is what happens here). I don't want to silently
> ignore such problems without any notification.
>
> We could make it dev_notice and/or change the message (instead of
> "Failed to
> ..." just say "temp%d_input not registered with thermal zone" , maybe ?).
I'd change it to:
dev_info(dev, "temp%d_input not attached to any thermal zone\n", index + 1);
I'd also add an info message to print out to which tzd attachment happened.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly
2022-02-21 17:59 ` Dmitry Osipenko
@ 2022-02-21 18:23 ` Guenter Roeck
0 siblings, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2022-02-21 18:23 UTC (permalink / raw)
To: Dmitry Osipenko, Hardware Monitoring; +Cc: Jean Delvare, Jon Hunter
On 2/21/22 09:59, Dmitry Osipenko wrote:
> 21.02.2022 20:12, Guenter Roeck пишет:
>> On 2/21/22 08:56, Dmitry Osipenko wrote:
>>> 21.02.2022 19:44, Guenter Roeck пишет:
>>>> diff --git a/drivers/hwmon/hwmon.c b/drivers/hwmon/hwmon.c
>>>> index 3501a3ead4ba..4bfe3791a5ba 100644
>>>> --- a/drivers/hwmon/hwmon.c
>>>> +++ b/drivers/hwmon/hwmon.c
>>>> @@ -214,12 +214,14 @@ static int hwmon_thermal_add_sensor(struct
>>>> device *dev, int index)
>>>> tzd = devm_thermal_zone_of_sensor_register(dev, index, tdata,
>>>> &hwmon_thermal_ops);
>>>> - /*
>>>> - * If CONFIG_THERMAL_OF is disabled, this returns -ENODEV,
>>>> - * so ignore that error but forward any other error.
>>>> - */
>>>> - if (IS_ERR(tzd) && (PTR_ERR(tzd) != -ENODEV))
>>>> - return PTR_ERR(tzd);
>>>> + if (IS_ERR(tzd)) {
>>>> + if (PTR_ERR(tzd) != -ENODEV)
>>>> + return PTR_ERR(tzd);
>>>> + dev_warn(dev, "Failed to register temp%d_input with thermal
>>>> zone\n",
>>>> + index + 1);
>>>
>>> Do we really need this warning? I suppose it should be okay if sensor
>>> isn't attached to any device in a device-tree and just reports
>>> temperature.
>>
>> I'd rather leave it there for the time being. It will only affect
>> devicetree
>> systems (turns out there is already a check for of_node elsewhere). Thermal
>> zone specification is not always easy and there may be a mismatch between
>> what is reported by the driver and what the user (programmer) expects to
>> see (which I think is what happens here). I don't want to silently
>> ignore such problems without any notification.
>>
>> We could make it dev_notice and/or change the message (instead of
>> "Failed to
>> ..." just say "temp%d_input not registered with thermal zone" , maybe ?).
>
> I'd change it to:
>
> dev_info(dev, "temp%d_input not attached to any thermal zone\n", index + 1);
>
Makes sense. I just sent v2 with that change.
> I'd also add an info message to print out to which tzd attachment happened.
Oh, I don't know. Kernel logs are already way too noisy. Either case that would
not be a bug fix, so such a change should not be part of this patch.
Thanks,
Guenter
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-02-21 18:27 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-21 16:44 [PATCH] hwmon: Handle failure to register sensor with thermal zone correctly Guenter Roeck
2022-02-21 16:56 ` Dmitry Osipenko
2022-02-21 17:12 ` Guenter Roeck
2022-02-21 17:59 ` Dmitry Osipenko
2022-02-21 18:23 ` Guenter Roeck
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).