Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 4971d1200e1f46625fde6db421961ba1cb3a511a ("[RFC/RFT PATCH resend] thermal: Protect thermal device operations against thermal device removal")
url: https://github.com/intel-lab-lkp/linux/commits/Guenter-Roeck/thermal-Protect-thermal-device-operations-against-thermal-device-removal/20221004-114107
patch link: https://lore.kernel.org/linux-pm/20221004033936.1047691-1-linux@roeck-us.net

in testcase: pm-qa
version: pm-qa-x86_64-5ead848-1_20220523
with the following parameters:

        test: thermal

on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz (Haswell) with 8G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):

If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/r/202210072346.aaf911d-oliver.sang@intel.com

[ 38.916500][ T50] BUG: KASAN: use-after-free in mutex_lock (kbuild/src/x86_64-3/include/linux/instrumented.h:101 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:1780 kbuild/src/x86_64-3/kernel/locking/mutex.c:171 kbuild/src/x86_64-3/kernel/locking/mutex.c:285)
[ 38.923152][ T50] Write of size 8 at addr ffff8881404a03d8 by task cpuhp/7/50
[ 38.930487][ T50]
[ 38.932702][ T50] CPU: 7 PID: 50 Comm: cpuhp/7 Tainted: G I 6.0.0-00001-g4971d1200e1f #35
[ 38.942471][ T50] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD5H/Z87X-UD5H-CF, BIOS F9 03/18/2014
[ 38.952230][ T50] Call Trace:
[ 38.955383][ T50]
[ 38.958192][ T50] dump_stack_lvl (kbuild/src/x86_64-3/lib/dump_stack.c:107 (discriminator 1))
[ 38.962570][ T50] print_address_description+0x1f/0x200
[ 38.969032][ T50] print_report.cold (kbuild/src/x86_64-3/mm/kasan/report.c:434)
[ 38.973749][ T50] ? _raw_spin_lock_irqsave (kbuild/src/x86_64-3/arch/x86/include/asm/atomic.h:202 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:543 kbuild/src/x86_64-3/include/asm-generic/qspinlock.h:111 kbuild/src/x86_64-3/include/linux/spinlock.h:185 kbuild/src/x86_64-3/include/linux/spinlock_api_smp.h:111 kbuild/src/x86_64-3/kernel/locking/spinlock.c:162)
[ 38.979082][ T50] ? mutex_lock (kbuild/src/x86_64-3/include/linux/instrumented.h:101 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:1780 kbuild/src/x86_64-3/kernel/locking/mutex.c:171 kbuild/src/x86_64-3/kernel/locking/mutex.c:285)
[ 38.983372][ T50] kasan_report (kbuild/src/x86_64-3/mm/kasan/report.c:162 kbuild/src/x86_64-3/mm/kasan/report.c:497)
[ 38.987663][ T50] ? mutex_lock (kbuild/src/x86_64-3/include/linux/instrumented.h:101 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:1780 kbuild/src/x86_64-3/kernel/locking/mutex.c:171 kbuild/src/x86_64-3/kernel/locking/mutex.c:285)
[ 38.991952][ T50] kasan_check_range (kbuild/src/x86_64-3/mm/kasan/generic.c:190)
[ 38.996675][ T50] mutex_lock (kbuild/src/x86_64-3/include/linux/instrumented.h:101 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:1780 kbuild/src/x86_64-3/kernel/locking/mutex.c:171 kbuild/src/x86_64-3/kernel/locking/mutex.c:285)
[ 39.000791][ T50] ? __mutex_lock_slowpath (kbuild/src/x86_64-3/kernel/locking/mutex.c:282)
[ 39.005949][ T50] ? kobject_cleanup (kbuild/src/x86_64-3/lib/kobject.c:683)
[ 39.010759][ T50] thermal_zone_device_unregister (kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:436 kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:425)
[ 39.017303][ T50] ? mutex_unlock (kbuild/src/x86_64-3/arch/x86/include/asm/atomic64_64.h:190 kbuild/src/x86_64-3/include/linux/atomic/atomic-long.h:449 kbuild/src/x86_64-3/include/linux/atomic/atomic-instrumented.h:1790 kbuild/src/x86_64-3/kernel/locking/mutex.c:181 kbuild/src/x86_64-3/kernel/locking/mutex.c:540)
[ 39.021764][ T50] ? __mutex_unlock_slowpath+0x2c0/0x2c0
[ 39.028311][ T50] pkg_thermal_cpu_offline (kbuild/src/x86_64-3/drivers/thermal/intel/x86_pkg_temp_thermal.c:418) x86_pkg_temp_thermal
[ 39.035635][ T50] ? pkg_thermal_notify (kbuild/src/x86_64-3/drivers/thermal/intel/x86_pkg_temp_thermal.c:386) x86_pkg_temp_thermal
[ 39.042696][ T50] cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:192)
[ 39.047853][ T50] ? __schedule (kbuild/src/x86_64-3/kernel/sched/core.c:6376)
[ 39.052316][ T50] cpuhp_thread_fun (kbuild/src/x86_64-3/kernel/cpu.c:785)
[ 39.057039][ T50] ? smpboot_thread_fn (kbuild/src/x86_64-3/kernel/smpboot.c:112)
[ 39.061937][ T50] ? cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:742)
[ 39.067264][ T50] ? cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:742)
[ 39.072595][ T50] ? cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:742)
[ 39.077927][ T50] ? smpboot_thread_fn (kbuild/src/x86_64-3/kernel/smpboot.c:112)
[ 39.082823][ T50] smpboot_thread_fn (kbuild/src/x86_64-3/kernel/smpboot.c:164 (discriminator 4))
[ 39.087631][ T50] ? find_next_bit (kbuild/src/x86_64-3/arch/x86/events/intel/core.c:4961)
[ 39.092095][ T50] ? find_next_bit (kbuild/src/x86_64-3/arch/x86/events/intel/core.c:4961)
[ 39.096559][ T50] kthread (kbuild/src/x86_64-3/kernel/kthread.c:376)
[ 39.100502][ T50] ? kthread_complete_and_exit (kbuild/src/x86_64-3/kernel/kthread.c:331)
[ 39.106006][ T50] ret_from_fork (kbuild/src/x86_64-3/arch/x86/entry/entry_64.S:312)
[ 39.110295][ T50]
[ 39.113197][ T50]
[ 39.115399][ T50] Allocated by task 19:
[ 39.119428][ T50] kasan_save_stack (kbuild/src/x86_64-3/mm/kasan/common.c:39)
[ 39.123978][ T50] __kasan_kmalloc (kbuild/src/x86_64-3/mm/kasan/common.c:45 kbuild/src/x86_64-3/mm/kasan/common.c:437 kbuild/src/x86_64-3/mm/kasan/common.c:516 kbuild/src/x86_64-3/mm/kasan/common.c:525)
[ 39.128443][ T50] thermal_zone_device_register_with_trips (kbuild/src/x86_64-3/include/linux/slab.h:600 kbuild/src/x86_64-3/include/linux/slab.h:733 kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:1236)
[ 39.135161][ T50] thermal_zone_device_register (kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:1347)
[ 39.140751][ T50] pkg_temp_thermal_device_add (kbuild/src/x86_64-3/drivers/thermal/intel/x86_pkg_temp_thermal.c:359) x86_pkg_temp_thermal
[ 39.148421][ T50] cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:192)
[ 39.153577][ T50] cpuhp_thread_fun (kbuild/src/x86_64-3/kernel/cpu.c:785)
[ 39.158300][ T50] smpboot_thread_fn (kbuild/src/x86_64-3/kernel/smpboot.c:164 (discriminator 4))
[ 39.163108][ T50] kthread (kbuild/src/x86_64-3/kernel/kthread.c:376)
[ 39.167051][ T50] ret_from_fork (kbuild/src/x86_64-3/arch/x86/entry/entry_64.S:312)
[ 39.171342][ T50]
[ 39.173541][ T50] Freed by task 50:
[ 39.177218][ T50] kasan_save_stack (kbuild/src/x86_64-3/mm/kasan/common.c:39)
[ 39.181768][ T50] kasan_set_track (kbuild/src/x86_64-3/mm/kasan/common.c:45)
[ 39.186231][ T50] kasan_set_free_info (kbuild/src/x86_64-3/mm/kasan/generic.c:372)
[ 39.191042][ T50] __kasan_slab_free (kbuild/src/x86_64-3/mm/kasan/common.c:369 kbuild/src/x86_64-3/mm/kasan/common.c:329 kbuild/src/x86_64-3/mm/kasan/common.c:375)
[ 39.195852][ T50] kfree (kbuild/src/x86_64-3/mm/slub.c:1785 kbuild/src/x86_64-3/mm/slub.c:3539 kbuild/src/x86_64-3/mm/slub.c:4567)
[ 39.197982][ T401] X.Org X Server 1.20.11
[ 39.199605][ T50] device_release (kbuild/src/x86_64-3/drivers/base/core.c:2335)
[ 39.199610][ T50] kobject_cleanup (kbuild/src/x86_64-3/lib/kobject.c:677)
[ 39.199633][ T401]
[ 39.203721][ T50] thermal_zone_device_unregister (kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:436 kbuild/src/x86_64-3/drivers/thermal/thermal_core.c:425)
[ 39.203726][ T50] pkg_thermal_cpu_offline (kbuild/src/x86_64-3/drivers/thermal/intel/x86_pkg_temp_thermal.c:418) x86_pkg_temp_thermal
[ 39.203730][ T50] cpuhp_invoke_callback (kbuild/src/x86_64-3/kernel/cpu.c:192)
[ 39.203733][ T50] cpuhp_thread_fun (kbuild/src/x86_64-3/kernel/cpu.c:785)
[ 39.203735][ T50] smpboot_thread_fn (kbuild/src/x86_64-3/kernel/smpboot.c:164 (discriminator 4))
[ 39.243555][ T50] kthread (kbuild/src/x86_64-3/kernel/kthread.c:376)
[ 39.247503][ T50] ret_from_fork (kbuild/src/x86_64-3/arch/x86/entry/entry_64.S:312)
[ 39.251795][ T50]
[ 39.253995][ T50] The buggy address belongs to the object at ffff8881404a0000
[ 39.253995][ T50] which belongs to the cache kmalloc-2k of size 2048
[ 39.267912][ T50] The buggy address is located 984 bytes inside of
[ 39.267912][ T50] 2048-byte region [ffff8881404a0000, ffff8881404a0800)
[ 39.281137][ T50]
[ 39.283341][ T50] The buggy address belongs to the physical page:
[ 39.289615][ T50] page:000000009883a4a4 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881404a1000 pfn:0x1404a0
[ 39.301020][ T50] head:000000009883a4a4 order:3 compound_mapcount:0 compound_pincount:0
[ 39.309213][ T50] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 39.317321][ T50] raw: 0017ffffc0010200 ffffea0005049808 ffffea0005042a08 ffff888100042f00
[ 39.325772][ T50] raw: ffff8881404a1000 0000000000080004 00000001ffffffff 0000000000000000
[ 39.334220][ T50] page dumped because: kasan: bad access detected
[ 39.340500][ T50]
[ 39.342707][ T50] Memory state around the buggy address:
[ 39.348206][ T50]  ffff8881404a0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 39.356137][ T50]  ffff8881404a0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 39.364067][ T50] >ffff8881404a0380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 39.371997][ T50]                                                      ^
[ 39.378805][ T50]  ffff8881404a0400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 39.386737][ T50]  ffff8881404a0480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 39.394667][ T50] ==================================================================
[ 39.402641][ T50] Disabling lock debugging due to kernel taint
[ 39.624286][ T399] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-hsw-d04/pm-qa-thermal-debian-11.1-x86_64-20220510.cgz-4971d1200e1f46625fde6db421961ba1cb3a511a-20221005-50436-146a5zr-4.yaml&job_state=running -O /dev/null
[ 39.624303][ T399]
[ 39.656811][ T399] target ucode: 0x28
[ 39.656819][ T399]
[ 39.661957][ T1113] Consider using thermal netlink events interface
[ 39.663530][ T399] current_version: 28, target_version: 28
[ 39.669136][ T399]
[ 39.677836][ T399] 2022-10-05 05:00:38 make -C thermal run_tests
[ 39.677843][ T399]
[ 39.687286][ T399] make: Entering directory '/lkp/benchmarks/pm-qa/thermal'
[ 39.687294][ T399]
[ 39.696670][ T399] ###
[ 39.696676][ T399]
[ 39.701716][ T399] ### thermal_00:
[ 39.701722][ T399]
[ 39.710232][ T399] ### list existing thermal-zones and cooling-devices in the system
[ 39.710247][ T399]
[ 39.722272][ T399] ### https://wiki.linaro.org/WorkingGroups/PowerManagement/Doc/QA/Scripts#thermal_00
[ 39.722281][ T399]
[ 39.724588][ T401] X Protocol Version 11, Revision 0
[ 39.731763][ T399] ###
[ 39.733877][ T401]
[ 39.743767][ T399]
[ 39.746579][ T399] Thermal Zone list
[ 39.746585][ T399]
[ 39.752858][ T399] -----------------
[ 39.752864][ T399]
[ 39.759088][ T399] thermal_zone0
[ 39.759094][ T399]
[ 39.764876][ T399] - acpitz
[ 39.764882][ T399]
[ 39.770261][ T399] thermal_zone1
[ 39.770267][ T399]
[ 39.776014][ T399] - acpitz
[ 39.776020][ T399]
[ 39.781173][ T399]
[ 39.781179][ T399]
[ 39.785630][ T399]
[ 39.785642][ T399]
[ 39.791059][ T399] Cooling Device list
[ 39.791074][ T399]
[ 39.797515][ T399] -------------------
[ 39.797521][ T399]
[ 39.803923][ T399] cooling_device0
[ 39.803929][ T399]
[ 39.809780][ T399] - Fan
[ 39.809807][ T399]
[ 39.814977][ T399] cooling_device1
[ 39.814984][ T399]
[ 39.820857][ T399] - Fan
[ 39.820862][ T399]
[ 39.826066][ T399] cooling_device10
[ 39.826074][ T399]
[ 39.832125][ T399] - Processor
[ 39.832134][ T399]
[ 39.842028][ T399] cooling_device11
[ 39.842038][ T399]
[ 39.848197][ T399] - Processor
[ 39.848205][ T399]
[ 39.854084][ T399] cooling_device12


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp