linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation
@ 2021-09-27  8:24 Vadim Pasternak
  2021-09-27 10:41 ` Daniel Lezcano
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Pasternak @ 2021-09-27  8:24 UTC (permalink / raw)
  To: rui.zhang, daniel.lezcano; +Cc: =idosch, linux-pm, Vadim Pasternak

The thermal subsystem maintains a transition table between states that
is allocated according to the maximum state supported by the cooling
device.

When the table needs to be updated, the thermal subsystem does not
validate that the new state does not exceed the maximum state,
leading to out-of-bounds memory accesses [1].

Fix this by validating the new state before updating the table.

Some thermal drivers allow user space to configure such states to limit
the minimum fan speed. See, for example:
commit a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels")

In any case, the thermal subsystem needs to validate the state before
using it as an index into the transition table.

[1]
[  156.703153] ==================================================================
[  156.710613] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x7d/0xb0
[  156.719227] Read of size 4 at addr ffff88811f63e840 by task hw-management.s/2300
[  156.726816]
[  156.728369] CPU: 1 PID: 2300 Comm: hw-management.s Not tainted 5.15.0-rc2-dvs-00320-ga3b397b4fffb #1
[  156.737766] Hardware name: Mellanox Technologies Ltd. MSN2410/VMOD0001, BIOS 4.6.5 09/13/2018
[  156.746561] Call Trace:
[  156.749093]  dump_stack_lvl+0x44/0x57
[  156.752890]  print_address_description.constprop.9+0x21/0x150
[  156.758824]  ? thermal_cooling_device_stats_update+0x7d/0xb0
[  156.764675]  ? thermal_cooling_device_stats_update+0x7d/0xb0
[  156.770521]  kasan_report.cold.14+0x83/0xdf
[  156.774835]  ? thermal_cooling_device_stats_update+0x7d/0xb0
[  156.780654]  thermal_cooling_device_stats_update+0x7d/0xb0
[  156.786314]  __thermal_cdev_update+0xc0/0x200
[  156.790818]  thermal_cdev_update+0x4e/0x70
[  156.795064]  step_wise_throttle+0x426/0x6f0
[  156.799408]  ? devm_thermal_add_hwmon_sysfs+0x80/0x80
[  156.804628]  ? thermal_zone_get_temp+0xa0/0xa0
[  156.809185]  ? netlink_broadcast+0xa/0x10
[  156.813353]  ? thermal_genl_sampling_temp+0x16a/0x200
[  156.818563]  ? thermal_genl_cmd_doit+0x290/0x290
[  156.823332]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  156.827900]  thermal_zone_device_update+0x39e/0x850
[  156.832954]  ? trace_event_raw_event_thermal_zone_trip+0x1f0/0x1f0
[  156.839349]  ? wait_for_completion+0x190/0x190
[  156.843944]  ? devm_thermal_of_cooling_device_register+0xa0/0xa0
[  156.850148]  ? snprintf+0x91/0xc0
[  156.853594]  ? vsprintf+0x10/0x10
[  156.857039]  thermal_zone_device_set_mode+0x81/0xf0
[  156.862079]  mlxsw_thermal_modules_init+0x48f/0x590 [mlxsw_core]
[  156.868392]  ? mlxsw_thermal_set_cur_state+0x590/0x590 [mlxsw_core]
[  156.874954]  ? netlink_broadcast+0xa/0x10
[  156.879090]  ? thermal_genl_send_event+0x117/0x1a0
[  156.884039]  ? thermal_notify_tz_create+0x7d/0xb0
[  156.888903]  ? thermal_genl_sampling_temp+0x200/0x200
[  156.894147]  ? do_init_timer+0x6c/0x80
[  156.898009]  ? thermal_zone_device_register+0x8af/0x9a0
[  156.903438]  ? __thermal_cooling_device_register+0x4f2/0x500
[  156.909272]  mlxsw_thermal_init+0x763/0x880 [mlxsw_core]
[  156.914882]  ? mlxsw_thermal_gearboxes_init.isra.8+0x460/0x460 [mlxsw_core]
[  156.922214]  __mlxsw_core_bus_device_register+0xa0c/0xca0 [mlxsw_core]
[  156.929093]  ? dev_printk_emit+0x90/0xb6
[  156.933136]  ? dev_vprintk_emit+0x208/0x208
[  156.937462]  ? mlxsw_devlink_info_get+0x490/0x490 [mlxsw_core]
[  156.943580]  ? do_raw_spin_lock+0x1d0/0x1d0
[  156.947906]  ? lockdep_hardirqs_on_prepare+0xe/0x230
[  156.953041]  ? __dev_printk+0x9e/0xd6
[  156.956818]  ? _dev_info+0xc8/0xf6
[  156.960339]  ? _dev_notice+0x84/0xf6
[  156.964047]  ? mark_wakeup_next_waiter+0x1b0/0x1b0
[  156.968988]  mlxsw_core_bus_device_register+0x3e/0x60 [mlxsw_core]
[  156.975492]  mlxsw_i2c_probe.cold.8+0x159/0x283 [mlxsw_i2c]
[  156.981259]  ? mlxsw_i2c_wait_go_bit.isra.3+0x1c0/0x1c0 [mlxsw_i2c]
[  156.987701]  ? lock_release+0x50/0x6c0
[  156.991555]  ? devres_open_group+0x13d/0x180
[  156.995936]  ? lock_downgrade+0x3a0/0x3a0
[  157.000108]  ? lock_contended+0x710/0x710
[  157.004229]  ? devres_open_group+0x59/0x180
[  157.008539]  ? devres_log+0x11a/0x180
[  157.012309]  ? trace_hardirqs_on+0x1c/0x110
[  157.016630]  ? preempt_count_sub+0xf/0xb0
[  157.020781]  ? mlxsw_i2c_wait_go_bit.isra.3+0x1c0/0x1c0 [mlxsw_i2c]
[  157.027255]  ? i2c_device_probe+0x30e/0x370
[  157.031573]  i2c_device_probe+0x30e/0x370
[  157.035746]  really_probe+0x149/0x3c0
[  157.039535]  ? driver_allows_async_probing+0x80/0x80
[  157.044629]  __driver_probe_device+0xc3/0x130
[  157.049127]  driver_probe_device+0x45/0x100
[  157.053486]  __device_attach_driver+0xd6/0x100
[  157.058069]  bus_for_each_drv+0xe7/0x150
[  157.062122]  ? bus_rescan_devices+0x10/0x10
[  157.066444]  ? lockdep_hardirqs_on_prepare+0xe/0x230
[  157.071587]  ? trace_hardirqs_on+0x1c/0x110
[  157.075891]  ? preempt_count_sub+0xf/0xb0
[  157.080050]  ? _raw_spin_unlock_irqrestore+0x36/0x50
[  157.085193]  __device_attach+0x185/0x210
[  157.089231]  ? device_bind_driver+0x70/0x70
[  157.093569]  ? kobject_uevent_env+0x287/0x940
[  157.098091]  bus_probe_device+0xf9/0x120
[  157.102159]  device_add+0x623/0xeb0
[  157.105746]  ? static_obj+0x32/0x80
[  157.109348]  ? lockdep_init_map_type+0xd9/0x360
[  157.114062]  ? __fw_devlink_link_to_suppliers+0x270/0x270
[  157.119619]  ? __raw_spin_lock_init+0x71/0x80
[  157.124134]  i2c_new_client_device+0x277/0x3a0
[  157.128715]  new_device_store+0x13c/0x270
[  157.132847]  ? copyin+0x6b/0x80
[  157.136090]  ? i2c_new_ancillary_device+0x20/0x20
[  157.140946]  ? lock_acquire+0xc0/0x3f0
[  157.144830]  ? lock_release+0x6c0/0x6c0
[  157.148833]  ? sysfs_file_ops+0x6b/0x90
[  157.152808]  ? sysfs_file_ops+0x90/0x90
[  157.156774]  kernfs_fop_write_iter+0x1af/0x250
[  157.161346]  new_sync_write+0x25a/0x380
[  157.165327]  ? new_sync_read+0x370/0x370
[  157.169384]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  157.173973]  ? rcu_read_lock_held_common+0x12/0x50
[  157.178911]  ? irq_migrate_all_off_this_cpu+0xf0/0x300
[  157.184208]  ? lock_release+0x6c0/0x6c0
[  157.188149]  ? rcu_read_lock_sched_held+0x5a/0xd0
[  157.192995]  ? rcu_read_lock_held+0xb0/0xb0
[  157.197310]  vfs_write+0x33d/0x530
[  157.200839]  ksys_write+0xbb/0x150
[  157.204359]  ? __ia32_sys_read+0x40/0x40
[  157.208410]  ? set_load_weight+0xd1/0x110
[  157.212557]  do_syscall_64+0x3a/0x80
[  157.216256]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  157.221457] RIP: 0033:0x7fa80bd47970
[  157.225131] Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e 9b 01 00 48 89 04 24
[  157.244407] RSP: 002b:00007fff14b2c648 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  157.252196] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007fa80bd47970
[  157.259528] RDX: 0000000000000013 RSI: 0000000000ede408 RDI: 0000000000000001
[  157.266868] RBP: 0000000000ede408 R08: 00007fa80c007760 R09: 00007fa80c653700
[  157.274219] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000013
[  157.281564] R13: 0000000000000001 R14: 00007fa80c006600 R15: 0000000000000013
[  157.288933]
[  157.290484] Allocated by task 2300:
[  157.294054]  kasan_save_stack+0x19/0x40
[  157.298024]  __kasan_kmalloc+0x7f/0xa0
[  157.301914]  __kmalloc+0x18f/0x2c0
[  157.305417]  thermal_cooling_device_setup_sysfs+0xf9/0x1a0
[  157.311075]  __thermal_cooling_device_register+0x1b5/0x500
[  157.316734]  mlxsw_thermal_init+0x7e4/0x880 [mlxsw_core]
[  157.322325]  __mlxsw_core_bus_device_register+0xa0c/0xca0 [mlxsw_core]
[  157.329154]  mlxsw_core_bus_device_register+0x3e/0x60 [mlxsw_core]
[  157.335656]  mlxsw_i2c_probe.cold.8+0x159/0x283 [mlxsw_i2c]
[  157.341373]  i2c_device_probe+0x30e/0x370
[  157.345524]  really_probe+0x149/0x3c0
[  157.349330]  __driver_probe_device+0xc3/0x130
[  157.353815]  driver_probe_device+0x45/0x100
[  157.358117]  __device_attach_driver+0xd6/0x100
[  157.362712]  bus_for_each_drv+0xe7/0x150
[  157.366768]  __device_attach+0x185/0x210
[  157.370823]  bus_probe_device+0xf9/0x120
[  157.374897]  device_add+0x623/0xeb0
[  157.378512]  i2c_new_client_device+0x277/0x3a0
[  157.383094]  new_device_store+0x13c/0x270
[  157.387222]  kernfs_fop_write_iter+0x1af/0x250
[  157.391797]  new_sync_write+0x25a/0x380
[  157.395765]  vfs_write+0x33d/0x530
[  157.399267]  ksys_write+0xbb/0x150
[  157.402795]  do_syscall_64+0x3a/0x80
[  157.406496]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  157.411686]
[  157.413236] The buggy address belongs to the object at ffff88811f63e400
[  157.413236]  which belongs to the cache kmalloc-1k of size 1024
[  157.426135] The buggy address is located 64 bytes to the right of
[  157.426135]  1024-byte region [ffff88811f63e400, ffff88811f63e800)
[  157.438710] The buggy address belongs to the page:
[  157.443640] page:ffffea00047d8e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88811f63c000 pfn:0x11f638
[  157.454579] head:ffffea00047d8e00 order:3 compound_mapcount:0 compound_pincount:0
[  157.462281] flags: 0x200000000010200(slab|head|node=0|zone=2)
[  157.468220] raw: 0200000000010200 ffffea000863a808 ffffea00045ab208 ffff888100043bc0
[  157.476205] raw: ffff88811f63c000 00000000000a0002 00000001ffffffff 0000000000000000
[  157.484185] page dumped because: kasan: bad access detected
[  157.489895]
[  157.491445] Memory state around the buggy address:
[  157.496373]  ffff88811f63e700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  157.503809]  ffff88811f63e780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  157.511235] >ffff88811f63e800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  157.518641]                                            ^
[  157.524113]  ffff88811f63e880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  157.531571]  ffff88811f63e900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  157.538980] ==================================================================

Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
---
 drivers/thermal/thermal_sysfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..80b38b180140 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -673,10 +673,14 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
 					 unsigned long new_state)
 {
 	struct cooling_dev_stats *stats = cdev->stats;
+	unsigned long max_state;
 
 	if (!stats)
 		return;
 
+	if (cdev->ops->get_max_state(cdev, &max_state) || new_state > max_state)
+		return;
+
 	spin_lock(&stats->lock);
 
 	if (stats->state == new_state)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-28 11:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-27  8:24 [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation Vadim Pasternak
2021-09-27 10:41 ` Daniel Lezcano
2021-09-27 11:22   ` Vadim Pasternak
2021-09-27 12:31     ` Daniel Lezcano
2021-09-27 13:29       ` Vadim Pasternak
2021-09-27 13:55         ` Daniel Lezcano
2021-09-27 17:52           ` Vadim Pasternak
2021-09-27 21:12             ` Daniel Lezcano
2021-09-28 11:35               ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).