Greetings,

FYI, we noticed the following commit (built with gcc-9):

commit: c928e9b1439de4d74b942abd30d5c838a40af777 ("[PATCH v2 7/7] Test softlockup")
url: https://github.com/0day-ci/linux/commits/Petr-Mladek/watchdog-softlockup-Report-overall-time-and-some-cleanup/20210311-205501
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git a74e6a014c9d4d4161061f770c9b4f98372ac778

in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20210314
with the following parameters:

	nr_threads: 10%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	class: filesystem
	test: binderfs
	cpufreq_governor: performance
	ucode: 0x42e

on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):

If you fix the issue, kindly add the following tag
Reported-by: kernel test robot

[   70.666742] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]
[   70.675062] Modules linked in: dm_mod xfs libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul crc32_pclmul crc32c_intel drm_kms_helper ghash_clmulni_intel isci syscopyarea sysfillrect rapl sysimgblt libsas fb_sys_fops ahci intel_cstate ipmi_si scsi_transport_sas libahci mei_me ipmi_devintf ioatdma drm intel_uncore ipmi_msghandler libata mei joydev dca wmi ip_tables
[   70.725024] CPU: 8 PID: 1794 Comm: perf Not tainted 5.12.0-rc2-00303-gc928e9b1439d #1
[   70.734501] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   70.746688] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15)
[   70.752690] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
   0:   c3                      retq
   1:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   6:   55                      push   %rbp
   7:   48 c7 c6 00 dc 24 82    mov    $0xffffffff8224dc00,%rsi
   e:   48 89 fd                mov    %rdi,%rbp
  11:   48 c7 c7 a8 ed 57 82    mov    $0xffffffff8257eda8,%rdi
  18:   e8 af 5d ff ff          callq  0xffffffffffff5dcc
  1d:   c6 05 90 60 ba 01 01    movb   $0x1,0x1ba6090(%rip)        # 0x1ba60b4
  24:   8a 05 8a 60 ba 01       mov    0x1ba608a(%rip),%al        # 0x1ba60b4
  2a:*  84 c0                   test   %al,%al          <-- trapping instruction
  2c:   74 04                   je     0x32
  2e:   f3 90                   pause
  30:   eb f2                   jmp    0x24
  32:   65 48 8b 04 25 00 6f    mov    %gs:0x16f00,%rax
  39:   01 00
  3b:   48                      rex.W
  3c:   8b                      .byte 0x8b
  3d:   80                      .byte 0x80
  3e:   98                      cwtl
  3f:   0b                      .byte 0xb

Code starting with the faulting instruction
===========================================
   0:   84 c0                   test   %al,%al
   2:   74 04                   je     0x8
   4:   f3 90                   pause
   6:   eb f2                   jmp    0xfffffffffffffffa
   8:   65 48 8b 04 25 00 6f    mov    %gs:0x16f00,%rax
   f:   01 00
  11:   48                      rex.W
  12:   8b                      .byte 0x8b
  13:   80                      .byte 0x80
  14:   98                      cwtl
  15:   0b                      .byte 0xb
[   70.775248] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[   70.781821] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[   70.790549] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[   70.799294] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[   70.808019] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[   70.816751] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[   70.825493] FS: 00007f3b1f0397c0(0000) GS:ffff888f02a00000(0000) knlGS:0000000000000000
[   70.835310] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.842505] CR2: 00005650280be178 CR3: 0000000194ba2004 CR4: 00000000001706e0
[   70.851261] Call Trace:
[   70.854739] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227)
[   70.859674] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311)
[   70.864901] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1))
[   70.869838] vfs_read (kbuild/src/consumer/fs/read_write.c:496)
[   70.874326] ksys_read (kbuild/src/consumer/fs/read_write.c:634)
[   70.878653] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46)
[   70.883389] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112)
[   70.889715] RIP: 0033:0x7f3b1fcd5461
[   70.894432] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
   0:   fe                      (bad)
   1:   ff                      (bad)
   2:   ff 50 48                callq  *0x48(%rax)
   5:   8d 3d fe d0 09 00       lea    0x9d0fe(%rip),%edi        # 0x9d109
   b:   e8 e9 03 02 00          callq  0x203f9
  10:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  17:   00 00
  19:   48 8d 05 99 62 0d 00    lea    0xd6299(%rip),%rax        # 0xd62b9
  20:   8b 00                   mov    (%rax),%eax
  22:   85 c0                   test   %eax,%eax
  24:   75 13                   jne    0x39
  26:   31 c0                   xor    %eax,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax          <-- trapping instruction
  30:   77 57                   ja     0x89
  32:   c3                      retq
  33:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  39:   41 54                   push   %r12
  3b:   49 89 d4                mov    %rdx,%r12
  3e:   55                      push   %rbp
  3f:   48                      rex.W

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   6:   77 57                   ja     0x5f
   8:   c3                      retq
   9:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
   f:   41 54                   push   %r12
  11:   49 89 d4                mov    %rdx,%r12
  14:   55                      push   %rbp
  15:   48                      rex.W
[   70.916824] RSP: 002b:00007fffa9898db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   70.922741] watchdog: BUG: soft lockup - CPU#27 stuck for 27s! [perf:1882]
[   70.925999] RAX: ffffffffffffffda RBX: 0000565027ffc970 RCX: 00007f3b1fcd5461
[   70.934426] Modules linked in: dm_mod
[   70.943145] RDX: 0000000000000400 RSI: 0000565027ffcc20 RDI: 0000000000000004
[   70.943147] RBP: 0000000000000d68 R08: 0000000000000001 R09: 0000000000000000
[   70.947963] xfs
[   70.956663] R10: 00007f3b1f0397c0 R11: 0000000000000246 R12: 00007f3b1fda2760
[   70.965389] libcrc32c
[   70.968210] R13: 00007f3b1fda32a0 R14: 0000000000000fff R15: 0000565027ffc970
[   70.976908] sd_mod
[   70.980313] Kernel panic - not syncing: softlockup: hung tasks
[   70.989033] t10_pi
[   70.992181] CPU: 8 PID: 1794 Comm: perf Tainted: G L 5.12.0-rc2-00303-gc928e9b1439d #1
[   70.999457] sg
[   71.002521] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[   71.013538] intel_rapl_msr
[   71.016248] Call Trace:
[   71.028495] intel_rapl_common
[   71.032402]
[   71.035857] sb_edac
[   71.039990] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
[   71.042930] x86_pkg_temp_thermal
[   71.046073] panic (kbuild/src/consumer/kernel/panic.c:249)
[   71.050431] intel_powerclamp
[   71.054752] watchdog_timer_fn.cold (kbuild/src/consumer/kernel/watchdog.c:433)
[   71.058764] coretemp
[   71.062645] ? softlockup_fn (kbuild/src/consumer/kernel/watchdog.c:354)
[   71.068018] kvm_intel
[   71.071163] __hrtimer_run_queues (kbuild/src/consumer/kernel/time/hrtimer.c:1519 kbuild/src/consumer/kernel/time/hrtimer.c:1583)
[   71.075919] kvm
[   71.079161] hrtimer_interrupt (kbuild/src/consumer/kernel/time/hrtimer.c:1648)
[   71.084497] irqbypass
[   71.087153] __sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/jump_label.h:25 kbuild/src/consumer/include/linux/jump_label.h:200 kbuild/src/consumer/arch/x86/include/asm/trace/irq_vectors.h:41 kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1107)
[   71.092326] mgag200
[   71.095499] sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/kernel/apic/apic.c:1100 (discriminator 14))
[   71.101652] crct10dif_pclmul
[   71.104609]
[   71.110501] crc32_pclmul
[   71.114365] asm_sysvec_apic_timer_interrupt (kbuild/src/consumer/arch/x86/include/asm/idtentry.h:632)
[   71.117264] crc32c_intel
[   71.120677] RIP: 0010:version_proc_show (kbuild/src/consumer/fs/proc/version.c:15)
[   71.126951] drm_kms_helper
[   71.130406] Code: c3 0f 1f 44 00 00 55 48 c7 c6 00 dc 24 82 48 89 fd 48 c7 c7 a8 ed 57 82 e8 af 5d ff ff c6 05 90 60 ba 01 01 8a 05 8a 60 ba 01 <84> c0 74 04 f3 90 eb f2 65 48 8b 04 25 00 6f 01 00 48 8b 80 98 0b
All code
========
   0:   c3                      retq
   1:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   6:   55                      push   %rbp
   7:   48 c7 c6 00 dc 24 82    mov    $0xffffffff8224dc00,%rsi
   e:   48 89 fd                mov    %rdi,%rbp
  11:   48 c7 c7 a8 ed 57 82    mov    $0xffffffff8257eda8,%rdi
  18:   e8 af 5d ff ff          callq  0xffffffffffff5dcc
  1d:   c6 05 90 60 ba 01 01    movb   $0x1,0x1ba6090(%rip)        # 0x1ba60b4
  24:   8a 05 8a 60 ba 01       mov    0x1ba608a(%rip),%al        # 0x1ba60b4
  2a:*  84 c0                   test   %al,%al          <-- trapping instruction
  2c:   74 04                   je     0x32
  2e:   f3 90                   pause
  30:   eb f2                   jmp    0x24
  32:   65 48 8b 04 25 00 6f    mov    %gs:0x16f00,%rax
  39:   01 00
  3b:   48                      rex.W
  3c:   8b                      .byte 0x8b
  3d:   80                      .byte 0x80
  3e:   98                      cwtl
  3f:   0b                      .byte 0xb

Code starting with the faulting instruction
===========================================
   0:   84 c0                   test   %al,%al
   2:   74 04                   je     0x8
   4:   f3 90                   pause
   6:   eb f2                   jmp    0xfffffffffffffffa
   8:   65 48 8b 04 25 00 6f    mov    %gs:0x16f00,%rax
   f:   01 00
  11:   48                      rex.W
  12:   8b                      .byte 0x8b
  13:   80                      .byte 0x80
  14:   98                      cwtl
  15:   0b                      .byte 0xb
[   71.136243] ghash_clmulni_intel
[   71.139898] RSP: 0018:ffffc9000b84bdd0 EFLAGS: 00000202
[   71.162013] isci
[   71.166220]
[   71.172580] syscopyarea
[   71.175291] RAX: 0000000000000001 RBX: ffff888111af7ca8 RCX: 0000000000000000
[   71.177465] sysfillrect
[   71.180792] RDX: 0000000000000000 RSI: ffff888f02a177f0 RDI: ffff888f02a177f0
[   71.189291] rapl
[   71.192595] RBP: ffff888111af7ca8 R08: ffff888f02a177f0 R09: ffffc9000b84bbf0
[   71.201111] sysimgblt
[   71.203764] R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000b84be88
[   71.212307] libsas
[   71.215453] R13: ffffc9000b84be60 R14: ffff888111af7cd0 R15: 0000000000000001
[   71.223956] fb_sys_fops
[   71.226799] seq_read_iter (kbuild/src/consumer/fs/seq_file.c:227)
[   71.235309] ahci
[   71.238628] proc_reg_read_iter (kbuild/src/consumer/fs/proc/inode.c:311)
[   71.243337] intel_cstate
[   71.246008] new_sync_read (kbuild/src/consumer/fs/read_write.c:416 (discriminator 1))
[   71.250990] ipmi_si
[   71.254390] vfs_read (kbuild/src/consumer/fs/read_write.c:496)
[   71.259089] scsi_transport_sas
[   71.262024] ksys_read (kbuild/src/consumer/fs/read_write.c:634)
[   71.266253] libahci
[   71.270260] do_syscall_64 (kbuild/src/consumer/arch/x86/entry/common.c:46)
[   71.274349] mei_me
[   71.277282] entry_SYSCALL_64_after_hwframe (kbuild/src/consumer/arch/x86/entry/entry_64.S:112)
[   71.281743] ipmi_devintf
[   71.284552] RIP: 0033:0x7f3b1fcd5461
[   71.290647] ioatdma
[   71.294094] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
All code
========
   0:   fe                      (bad)
   1:   ff                      (bad)
   2:   ff 50 48                callq  *0x48(%rax)
   5:   8d 3d fe d0 09 00       lea    0x9d0fe(%rip),%edi        # 0x9d109
   b:   e8 e9 03 02 00          callq  0x203f9
  10:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  17:   00 00
  19:   48 8d 05 99 62 0d 00    lea    0xd6299(%rip),%rax        # 0xd62b9
  20:   8b 00                   mov    (%rax),%eax
  22:   85 c0                   test   %eax,%eax
  24:   75 13                   jne    0x39
  26:   31 c0                   xor    %eax,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax          <-- trapping instruction
  30:   77 57                   ja     0x89
  32:   c3                      retq
  33:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  39:   41 54                   push   %r12
  3b:   49 89 d4                mov    %rdx,%r12
  3e:   55                      push   %rbp
  3f:   48                      rex.W

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   6:   77 57                   ja     0x5f
   8:   c3                      retq
   9:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
   f:   41 54                   push   %r12
  11:   49 89 d4                mov    %rdx,%r12
  14:   55                      push   %rbp
  15:   48                      rex.W


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run compatible-job.yaml


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang
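
For triage, the "soft lockup" report lines in the log above (CPU#8 and CPU#27) can be pulled out of a saved dmesg with a small script. The following is only a minimal sketch, not part of the LKP tooling: the names `SoftLockup` and `parse_soft_lockups` are hypothetical, and the pattern assumes the message layout seen in this report ("watchdog: BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]"), which may differ on other kernel versions.

```python
import re
from typing import List, NamedTuple


class SoftLockup(NamedTuple):
    """One soft lockup report line (hypothetical helper type)."""
    timestamp: float  # seconds since boot, from the printk prefix
    cpu: int          # CPU number reported as stuck
    seconds: int      # how long the watchdog says the CPU was stuck
    comm: str         # task name running on that CPU
    pid: int          # PID of that task


# Assumed layout, taken from the lines in this report:
#   [   70.666742] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [perf:1794]
_PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log: str) -> List[SoftLockup]:
    """Return every soft lockup report found in a dmesg text dump."""
    return [
        SoftLockup(
            timestamp=float(m["ts"]),
            cpu=int(m["cpu"]),
            seconds=int(m["secs"]),
            comm=m["comm"],
            pid=int(m["pid"]),
        )
        for m in _PATTERN.finditer(log)
    ]
```

Running it over the attached dmesg, for example with `parse_soft_lockups(open("dmesg").read())` (the file name is an assumption), should list both stuck CPUs together with the `perf` tasks that were reading /proc/version.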