* PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
@ 2022-03-21 23:29 Jirka Hladky
  2022-03-21 23:37 ` Jirka Hladky
  2022-03-24 11:49 ` Thorsten Leemhuis
  0 siblings, 2 replies; 21+ messages in thread
From: Jirka Hladky @ 2022-03-21 23:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Philip Auld, Donald Zickus

Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
experience a kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
Gold 6126 CPU).

Below is a backtrace and the dmesg log.

I have trouble creating a simple reproducer - it happens at random
places when preparing the NAS benchmark to be run. The script creates
a bunch of directories, compiles the benchmark, and starts trial runs.

Could you please help to narrow down the problem?

The reports below were created with kernel 5.17 rc8 and with the
echo 1 > /proc/sys/kernel/panic_on_oops
setting.

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Mar 17 02:49:40 CET 2022
      UPTIME: 00:02:50
LOAD AVERAGE: 0.32, 0.10, 0.03
       TASKS: 608
    NODENAME: gold-2s-c
     RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
     VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
     MACHINE: x86_64  (2600 Mhz)
      MEMORY: 94.7 GB
       PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)


crash> bt
PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
#0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
#1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
#2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
#3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
#4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
#5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
#6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
   [exception RIP: kernfs_remove+7]
   RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
   RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
   RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
   RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
   R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
   R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
#8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
#9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
#10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
#11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
#12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
#13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
#14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
   RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
   RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
   RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
   RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
   R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
   R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
   ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b

[2] dmesg
[  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  172.783513] #PF: supervisor read access in kernel mode
[  172.788652] #PF: error_code(0x0000) - not-present page
[  172.793793] PGD 0 P4D 0
[  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
[  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
[  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.0b 03/07/2018
[  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
[  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3 f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83 c5 60
[  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
[  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
[  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
[  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
[  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
[  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
[  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000) knlGS:0000000000000000
[  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
[  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  172.916261] PKRU: 55555554
[  172.918974] Call Trace:
[  172.921427]  <TASK>
[  172.923533]  rdt_kill_sb+0x29b/0x350
[  172.927112]  deactivate_locked_super+0x2f/0xa0
[  172.931559]  cleanup_mnt+0xee/0x180
[  172.935051]  task_work_run+0x5c/0x90
[  172.938629]  exit_to_user_mode_prepare+0x229/0x230
[  172.943424]  syscall_exit_to_user_mode+0x18/0x40
[  172.948043]  do_syscall_64+0x46/0x80
[  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  172.956675] RIP: 0033:0x7f442c75126b
[  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00 f7 d8
[  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
[  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
[  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
[  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
[  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
[  173.022249]  </TASK>
[  173.024440] Modules linked in: rfkill intel_rapl_msr intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse zram xfs crct10dif_pclmul ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper ttm wmi ghash_clmulni_intel
[  173.073900] CR2: 0000000000000008
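
The fault address in the log above can be cross-checked by hand. The following is a minimal sketch, not part of the original report: the helper name `decode_mov_disp8` is hypothetical, and it only understands the single instruction form seen here (a REX.W `8b` load with an 8-bit displacement). It decodes the bytes marked `<48> 8b 47 08` on the kernel `Code:` line and adds the displacement to the RDI value from the register dump.

```python
# Sketch: cross-check the faulting address reported in the oops.
# Only "REX.W 8b /r" with mod=01 (base register + disp8, no SIB) is handled.

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(code: bytes):
    """Return (dest_reg, base_reg, disp) for 'mov disp8(%base),%dest'."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only base+disp8 without a SIB byte is supported")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS64[reg], REGS64[rm], disp


faulting = bytes.fromhex("488b4708")  # bytes starting at the <48> marker
dest, base, disp = decode_mov_disp8(faulting)
registers = {"rdi": 0x0}  # RDI as shown in the register dump
fault_addr = registers[base] + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> fault address {fault_addr:#018x}")
```

Under these assumptions the computed address agrees with the CR2 value in the log: a read at offset 8 from a NULL pointer passed in RDI, the first argument register.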

-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-21 23:29 PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers Jirka Hladky
@ 2022-03-21 23:37 ` Jirka Hladky
  2022-03-22  7:12   ` Greg KH
  2022-03-24 11:49 ` Thorsten Leemhuis
  1 sibling, 1 reply; 21+ messages in thread
From: Jirka Hladky @ 2022-03-21 23:37 UTC (permalink / raw)
  To: stable, linux-kernel; +Cc: regressions

Cc: regressions@lists.linux.dev stable@vger.kernel.org

-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-21 23:37 ` Jirka Hladky
@ 2022-03-22  7:12   ` Greg KH
  2022-03-22 10:19     ` Jirka Hladky
  0 siblings, 1 reply; 21+ messages in thread
From: Greg KH @ 2022-03-22  7:12 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: stable, linux-kernel, regressions

On Tue, Mar 22, 2022 at 12:37:37AM +0100, Jirka Hladky wrote:
> Cc: regressions@lists.linux.dev stable@vger.kernel.org
> 
> On Tue, Mar 22, 2022 at 12:29 AM Jirka Hladky <jhladky@redhat.com> wrote:
> >
> > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > Gold 6126 CPU)
> >
> > Below is a backtrace and the dmesg log.
> >
> > I have trouble creating a simple reproducer - it happens at random
> > places when preparing the NAS benchmark to be run. The script creates
> > a bunch of directories, compiles the benchmark, and starts trial runs.
> >
> > Could you please help to narrow down the problem?

Can you use 'git bisect' to track down the kernel commit that caused
this problem?
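
As an illustration of what such a bisection does, here is a minimal sketch in Python. It is not the `git bisect` implementation: it assumes a linear history, and the `is_bad` predicate is a hypothetical stand-in for "build, boot, run the benchmark setup, and check for the oops". The commit names are made up.

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the first bad commit is at mid or earlier
        else:
            good = mid  # the first bad commit is after mid
    return commits[bad]


history = [f"c{i}" for i in range(16)]  # made-up linear history
first_bad = bisect_first_bad(history, lambda c: int(c[1:]) >= 11)
print(first_bad)
```

With an intermittent failure like this one, a single clean run is weak evidence for "good", so the predicate would likely need to repeat the test several times before answering.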

thanks,

greg k-h


* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-22  7:12   ` Greg KH
@ 2022-03-22 10:19     ` Jirka Hladky
  0 siblings, 0 replies; 21+ messages in thread
From: Jirka Hladky @ 2022-03-22 10:19 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, linux-kernel, regressions

> Can you use 'git bisect' to track down the kernel commit that caused
> this problem?

Yes, I will try that and report back the findings.

-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-21 23:29 PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers Jirka Hladky
  2022-03-21 23:37 ` Jirka Hladky
@ 2022-03-24 11:49 ` Thorsten Leemhuis
  2022-03-30 22:16   ` Jirka Hladky
  1 sibling, 1 reply; 21+ messages in thread
From: Thorsten Leemhuis @ 2022-03-24 11:49 UTC (permalink / raw)
  To: linux-kernel, regressions

[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; all text you find below is compiled from
a few template paragraphs you might have encountered already
in similar mails.]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.16..v5.17
#regzbot ignore-activity
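
The two `#regzbot` lines above are plain lines in the mail body. A minimal sketch of pulling such commands out of a message, assuming only that convention; the function name is hypothetical and this is not regzbot's actual parser:

```python
def extract_regzbot_commands(body: str):
    """Return (command, argument) pairs for unquoted lines starting with '#regzbot'."""
    commands = []
    for line in body.splitlines():
        line = line.strip()
        if not line.startswith("#regzbot"):
            continue  # this also skips quoted lines, which start with '>'
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # a bare '#regzbot' carries no command
        argument = parts[2] if len(parts) > 2 else ""
        commands.append((parts[1], argument))
    return commands


mail = """Hi, this is your Linux kernel regression tracker.

#regzbot ^introduced v5.16..v5.17
#regzbot ignore-activity
> #regzbot quoted-line-is-ignored
"""
commands = extract_regzbot_commands(mail)
print(commands)
```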

If it turns out this isn't a regression, feel free to remove it from the
tracking by sending a reply to this thread containing a paragraph like
"#regzbot invalid: reason why this is invalid" (without the quotes).

Reminder for developers: when fixing the issue, please add 'Link:'
tags pointing to the report (the mail quoted below) using
lore.kernel.org/r/, as explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'. Regzbot needs them to
automatically connect reports with fixes, but they are useful in
general, too.

I'm sending this to everyone that got the initial report, to make
everyone aware of the tracking. I also hope that messages like this
motivate people to directly get at least the regression mailing list and
ideally even regzbot involved when dealing with regressions, as messages
like this wouldn't be needed then. And don't worry, if I need to send
other mails regarding this regression only relevant for regzbot I'll
send them to the regressions lists only (with a tag in the subject so
people can filter them away). With a bit of luck no such messages will
be needed anyway.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.


On 22.03.22 00:29, Jirka Hladky wrote:
> Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> Gold 6126 CPU)
> 
> Below is a backtrace and the dmesg log.
> 
> I have trouble creating a simple reproducer - it happens at random
> places when preparing the NAS benchmark to be run. The script creates
> a bunch of directories, compiles the benchmark, and starts trial runs.
> 
> Could you please help to narrow down the problem?

-- 
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker and your report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.


* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-24 11:49 ` Thorsten Leemhuis
@ 2022-03-30 22:16   ` Jirka Hladky
  2022-03-30 22:24     ` Jirka Hladky
  0 siblings, 1 reply; 21+ messages in thread
From: Jirka Hladky @ 2022-03-30 22:16 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: linux-kernel, regressions

Hi Thorsten,

thanks for adding this to regzbot.

Hi Greg and all,

I bisected the issue and found the commit that causes it [1].
Could you please have a look at the code and suggest how to fix it?

Thanks a lot
Jirka

[1]
=========================================================
$ git bisect visualize
commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
Author: Minchan Kim <minchan@kernel.org>
Date:   Thu Nov 18 15:00:08 2021 -0800

   kernfs: switch global kernfs_rwsem lock to per-fs lock

   The kernfs implementation has big lock granularity(kernfs_rwsem) so
   every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
   lock. It makes trouble for some cases to wait the global lock
   for a long time even though they are totally independent contexts
   each other.

   A general example is process A goes under direct reclaim with holding
   the lock when it accessed the file in sysfs and process B is waiting
   the lock with exclusive mode and then process C is waiting the lock
   until process B could finish the job after it gets the lock from
   process A.

   This patch switches the global kernfs_rwsem to per-fs lock, which
   put the rwsem into kernfs_root.

   Suggested-by: Tejun Heo <tj@kernel.org>
   Acked-by: Tejun Heo <tj@kernel.org>
   Signed-off-by: Minchan Kim <minchan@kernel.org>
   Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
   Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
=========================================================
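
To see how a locking change of this kind could turn a harmless call into a NULL dereference, here is a toy Python model. It is an assumption-laden sketch, not the kernel code: all class and function names are hypothetical. It only illustrates the general hazard that a per-object lock has to be looked up through the object before any "is it NULL?" check can run, whereas a global lock needs no such lookup. Whether this is exactly what happens in `kernfs_remove` is for the maintainers to confirm; the register dumps in this thread (RDI = 0, CR2 = 8) are at least consistent with it.

```python
import threading

GLOBAL_LOCK = threading.Lock()  # stands in for one global lock


class Root:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for a per-filesystem lock


class Node:
    def __init__(self, root):
        self.root = root
        self.removed = False


def _remove_locked(node):
    if node is None:  # the NULL check only runs once the lock is held
        return
    node.removed = True


def remove_with_global_lock(node):
    with GLOBAL_LOCK:  # the lock is found without touching `node`
        _remove_locked(node)


def remove_with_per_root_lock(node):
    with node.root.lock:  # `node` is dereferenced first: fails for None
        _remove_locked(node)


remove_with_global_lock(None)  # returns quietly
try:
    remove_with_per_root_lock(None)
    per_root_result = "ok"
except AttributeError:
    per_root_result = "dereferenced None"
print(per_root_result)
```

In this model, an early `if node is None: return` at the top of `remove_with_per_root_lock` restores the old behaviour for callers that pass nothing.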

The bug is triggered by running the NAS Parallel Benchmarks suite on
SuperMicro servers with 2x Xeon(R) Gold 6126 CPUs. Here is the error
log:

[  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  247.036009] #PF: supervisor read access in kernel mode
[  247.036009] #PF: error_code(0x0000) - not-present page
[  247.036009] PGD 0 P4D 0
[  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
[  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.0b 03/07/2018
[  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4 ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83 c4 60
[  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000) knlGS:0000000000000000
[  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  247.122048] PKRU: 55555554
[  247.122048] Call Trace:
[  247.122048]  <TASK>
[  247.122048]  rdt_kill_sb+0x29d/0x350
[  247.122048]  deactivate_locked_super+0x36/0xa0
[  247.122048]  cleanup_mnt+0x131/0x190
[  247.122048]  task_work_run+0x5c/0x90
[  247.122048]  exit_to_user_mode_prepare+0x229/0x230
[  247.122048]  syscall_exit_to_user_mode+0x18/0x40
[  247.122048]  do_syscall_64+0x48/0x90
[  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  247.122048] RIP: 0033:0x7f01be2d735b
[  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00 f7 d8
[  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
[  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
[  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
[  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
[  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
[  247.279066]  </TASK>
[  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
intel_rapl_common isst_if_common skx_edac nfit libnvdimm
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass ib_uverbs
ipmi_ssif rapl intel_cstate ib_core mei_me joydev intel_uncore
i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper cec drm_ttm_helper
ttm drm i40e crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel wmi fuse ipmi_devintf ipmi_msghandler
[  247.335054] CR2: 0000000000000008
[  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
[  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
[  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
c4 60
[  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
knlGS:0000000000000000
[  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  247.455060] PKRU: 55555554
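
For readers following the dump above, here is a minimal Python sketch
(not part of the original report) of how the faulting instruction can
be read off the "Code:" line. The helper name split_code_line is made
up for illustration; the byte string, RDI, and CR2 values are copied
from the log. The byte marked <48> starts the instruction that
faulted, and 48 8b 47 08 is the x86-64 encoding of
"mov 0x8(%rdi),%rax". With RDI = 0 that load touches address 0x8,
which matches CR2.

```python
# Bytes from the "Code:" line of the oops above; <48> marks the faulting byte.
CODE_LINE = (
    "4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4 "
    "ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 "
    "41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83 "
    "c4 60"
)
RDI = 0x0000000000000000  # from the register dump
CR2 = 0x0000000000000008  # faulting address reported in the oops


def split_code_line(line):
    """Return (bytes before the fault, bytes from the faulting byte on)."""
    tokens = line.split()
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return data[:fault_index], data[fault_index:]


before, at_fault = split_code_line(CODE_LINE)

# Decode just enough of "REX.W 8B /r disp8" to find base register and offset.
rex, opcode, modrm, disp8 = at_fault[:4]
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
is_load_from_rdi = (
    rex == 0x48         # REX.W: 64-bit operand, no extended registers
    and opcode == 0x8B  # MOV r64, r/m64
    and mod == 0b01     # memory operand with an 8-bit displacement
    and rm == 0b111     # base register is RDI
)
fault_address = RDI + disp8

print(f"faulting bytes: {at_fault[:4].hex(' ')}")
print(f"load from RDI+{disp8:#x} -> address {fault_address:#x}, CR2 {CR2:#x}")
```

This is consistent with kernfs_remove() having been entered with a
NULL pointer as its first argument from rdt_kill_sb(), although the
sketch only decodes this single instruction and says nothing about why
the pointer was NULL.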

On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> [TLDR: I'm adding the regression report below to regzbot, the Linux
> kernel regression tracking bot; all text you find below is compiled from
> a few template paragraphs you might have encountered already
> from similar mails.]
>
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> To be sure below issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced v5.16..v5.17
> #regzbot ignore-activity
>
> If it turns out this isn't a regression, feel free to remove it from the
> tracking by sending a reply to this thread containing a paragraph like
> "#regzbot invalid: reason why this is invalid" (without the quotes).
>
> Reminder for developers: when fixing the issue, please add 'Link:'
> tags pointing to the report (the mail quoted above) using
> lore.kernel.org/r/, as explained in
> 'Documentation/process/submitting-patches.rst' and
> 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> automatically connect reports with fixes, but they are useful in
> general, too.
>
> I'm sending this to everyone that got the initial report, to make
> everyone aware of the tracking. I also hope that messages like this
> motivate people to directly get at least the regression mailing list and
> ideally even regzbot involved when dealing with regressions, as messages
> like this wouldn't be needed then. And don't worry, if I need to send
> other mails regarding this regression only relevant for regzbot I'll
> send them to the regressions lists only (with a tag in the subject so
> people can filter them away). With a bit of luck no such messages will
> be needed anyway.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>
>
> On 22.03.22 00:29, Jirka Hladky wrote:
> > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > Gold 6126 CPU)
> >
> > Below is a backtrace and the dmesg log.
> >
> > I have trouble creating a simple reproducer - it happens at random
> > places when preparing the NAS benchmark to be run. The script creates
> > a bunch of directories, compiles the benchmark, and starts trial runs.
> >
> > Could you please help to narrow down the problem?
> >
> > The reports below were created with kernel 5.17 rc8 and with the
> > echo 1 > /proc/sys/kernel/panic_on_oops
> > setting.
> >
> > crash> sys
> >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> >     DUMPFILE: vmcore  [PARTIAL DUMP]
> >         CPUS: 48
> >         DATE: Thu Mar 17 02:49:40 CET 2022
> >       UPTIME: 00:02:50
> > LOAD AVERAGE: 0.32, 0.10, 0.03
> >        TASKS: 608
> >     NODENAME: gold-2s-c
> >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> >      MACHINE: x86_64  (2600 Mhz)
> >       MEMORY: 94.7 GB
> >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> >
> >
> > crash> bt
> > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> >    [exception RIP: kernfs_remove+7]
> >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> >
> > [2] dmesg
> > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [  172.783513] #PF: supervisor read access in kernel mode
> > [  172.788652] #PF: error_code(0x0000) - not-present page
> > [  172.793793] PGD 0 P4D 0
> > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > 2.0b 03/07/2018
> > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > c5 60
> > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > knlGS:0000000000000000
> > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  172.916261] PKRU: 55555554
> > [  172.918974] Call Trace:
> > [  172.921427]  <TASK>
> > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > [  172.931559]  cleanup_mnt+0xee/0x180
> > [  172.935051]  task_work_run+0x5c/0x90
> > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > [  172.948043]  do_syscall_64+0x46/0x80
> > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  172.956675] RIP: 0033:0x7f442c75126b
> > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > f7 d8
> > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > 00000000000000a6
> > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > [  173.022249]  </TASK>
> > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel ipmi_ssif kvm
> > irqbypass rapl acpi_ipmi intel_cstate i40e joydev mei_me ioatdma
> > i2c_i801 intel_uncore lpc_ich i2c_smbus mei intel_pch_thermal dca
> > ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse
> > zram xfs crct10dif_pclmul ast crc32_pclmul crc32c_intel
> > drm_vram_helper drm_ttm_helper ttm wmi ghash_clmulni_intel
> > [  173.073900] CR2: 0000000000000008
> >
>
> --
> Additional information about regzbot:
>
> If you want to know more about regzbot, check out its web-interface, the
> getting started guide, and the reference documentation:
>
> https://linux-regtracking.leemhuis.info/regzbot/
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
>
> The last two documents will explain how you can interact with regzbot
> yourself if you want to.
>
> Hint for reporters: when reporting a regression it's in your interest to
> CC the regression list and tell regzbot about the issue, as that ensures
> the regression makes it onto the radar of the Linux kernel's regression
> tracker -- that's in your interest, as it ensures your report won't fall
> through the cracks unnoticed.
>
> Hint for developers: you normally don't need to care about regzbot once
> it's involved. Fix the issue as you normally would, just remember to
> include a 'Link:' tag in the patch descriptions pointing to all reports
> about the issue. This has been expected from developers even before
> regzbot showed up for reasons explained in
> 'Documentation/process/submitting-patches.rst' and
> 'Documentation/process/5.Posting.rst'.
>


-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-30 22:16   ` Jirka Hladky
@ 2022-03-30 22:24     ` Jirka Hladky
  2022-03-31  0:11       ` Minchan Kim
  2022-04-04  6:37       ` PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers #forregzbot Thorsten Leemhuis
  0 siblings, 2 replies; 21+ messages in thread
From: Jirka Hladky @ 2022-03-30 22:24 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-kernel, regressions, Thorsten Leemhuis

Adding Minchan Kim on Cc.

@Minchan - bisection points to commit
393c3714081a53795bbff0e985d24146def6f57f, which you authored, as the
cause of "BUG: kernel NULL pointer dereference, address:
0000000000000008".

Could you please have a look at what might be wrong?

Thank you!
Jirka

On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
>
> Hi Thorsten,
>
> thanks for adding this to regzbot.
>
> Hi Greg and all,
>
> I bisected the issue and found the commit that causes it [1].
> Could you please have a look at how to fix it?
>
> Thanks a lot
> Jirka
>
> [1]
> =========================================================
> $ git bisect visualize
> commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> Author: Minchan Kim <minchan@kernel.org>
> Date:   Thu Nov 18 15:00:08 2021 -0800
>
>    kernfs: switch global kernfs_rwsem lock to per-fs lock
>
>    The kernfs implementation has big lock granularity(kernfs_rwsem) so
>    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
>    lock. It makes trouble for some cases to wait the global lock
>    for a long time even though they are totally independent contexts
>    each other.
>
>    A general example is process A goes under direct reclaim with holding
>    the lock when it accessed the file in sysfs and process B is waiting
>    the lock with exclusive mode and then process C is waiting the lock
>    until process B could finish the job after it gets the lock from
>    process A.
>
>    This patch switches the global kernfs_rwsem to per-fs lock, which
>    put the rwsem into kernfs_root.
>
>    Suggested-by: Tejun Heo <tj@kernel.org>
>    Acked-by: Tejun Heo <tj@kernel.org>
>    Signed-off-by: Minchan Kim <minchan@kernel.org>
>    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
>    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> =========================================================
>
> The bug is triggered by running NAS Parallel benchmark suite on
> SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> log:
>
> [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  247.036009] #PF: supervisor read access in kernel mode
> [  247.036009] #PF: error_code(0x0000) - not-present page
> [  247.036009] PGD 0 P4D 0
> [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> 2.0b 03/07/2018
> [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> c4 60
> [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> knlGS:0000000000000000
> [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  247.122048] PKRU: 55555554
> [  247.122048] Call Trace:
> [  247.122048]  <TASK>
> [  247.122048]  rdt_kill_sb+0x29d/0x350
> [  247.122048]  deactivate_locked_super+0x36/0xa0
> [  247.122048]  cleanup_mnt+0x131/0x190
> [  247.122048]  task_work_run+0x5c/0x90
> [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> [  247.122048]  do_syscall_64+0x48/0x90
> [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  247.122048] RIP: 0033:0x7f01be2d735b
> [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> f7 d8
> [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> 00000000000000a6
> [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> [  247.279066]  </TASK>
> [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass ib_uverbs
> ipmi_ssif rapl intel_cstate ib_core mei_me joydev intel_uncore
> i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper cec drm_ttm_helper
> ttm drm i40e crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel wmi fuse ipmi_devintf ipmi_msghandler
> [  247.335054] CR2: 0000000000000008
> [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> c4 60
> [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> knlGS:0000000000000000
> [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  247.455060] PKRU: 55555554
>
> On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
> >
> > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > kernel regression tracking bot; all text you find below is compiled from
> > a few template paragraphs you might have encountered already
> > from similar mails.]
> >
> > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > to make this easily accessible to everyone.
> >
> > To be sure below issue doesn't fall through the cracks unnoticed, I'm
> > adding it to regzbot, my Linux kernel regression tracking bot:
> >
> > #regzbot ^introduced v5.16..v5.17
> > #regzbot ignore-activity
> >
> > If it turns out this isn't a regression, feel free to remove it from the
> > tracking by sending a reply to this thread containing a paragraph like
> > "#regzbot invalid: reason why this is invalid" (without the quotes).
> >
> > Reminder for developers: when fixing the issue, please add 'Link:'
> > tags pointing to the report (the mail quoted above) using
> > lore.kernel.org/r/, as explained in
> > 'Documentation/process/submitting-patches.rst' and
> > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > automatically connect reports with fixes, but they are useful in
> > general, too.
> >
> > I'm sending this to everyone that got the initial report, to make
> > everyone aware of the tracking. I also hope that messages like this
> > motivate people to directly get at least the regression mailing list and
> > ideally even regzbot involved when dealing with regressions, as messages
> > like this wouldn't be needed then. And don't worry, if I need to send
> > other mails regarding this regression only relevant for regzbot I'll
> > send them to the regressions lists only (with a tag in the subject so
> > people can filter them away). With a bit of luck no such messages will
> > be needed anyway.
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
> > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > reports on my table. I can only look briefly into most of them and lack
> > knowledge about most of the areas they concern. I thus unfortunately
> > will sometimes get things wrong or miss something important. I hope
> > that's not the case here; if you think it is, don't hesitate to tell me
> > in a public reply, it's in everyone's interest to set the public record
> > straight.
> >
> >
> > On 22.03.22 00:29, Jirka Hladky wrote:
> > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > Gold 6126 CPU)
> > >
> > > Below is a backtrace and the dmesg log.
> > >
> > > I have trouble creating a simple reproducer - it happens at random
> > > places when preparing the NAS benchmark to be run. The script creates
> > > a bunch of directories, compiles the benchmark, and starts trial runs.
> > >
> > > Could you please help to narrow down the problem?
> > >
> > > The reports below were created with kernel 5.17 rc8 and with the
> > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > setting.
> > >
> > > crash> sys
> > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > >         CPUS: 48
> > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > >       UPTIME: 00:02:50
> > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > >        TASKS: 608
> > >     NODENAME: gold-2s-c
> > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > >      MACHINE: x86_64  (2600 Mhz)
> > >       MEMORY: 94.7 GB
> > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > >
> > >
> > > crash> bt
> > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > >    [exception RIP: kernfs_remove+7]
> > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > >
> > > [2] dmesg
> > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [  172.783513] #PF: supervisor read access in kernel mode
> > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > [  172.793793] PGD 0 P4D 0
> > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > 2.0b 03/07/2018
> > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > c5 60
> > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > knlGS:0000000000000000
> > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  172.916261] PKRU: 55555554
> > > [  172.918974] Call Trace:
> > > [  172.921427]  <TASK>
> > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > [  172.935051]  task_work_run+0x5c/0x90
> > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > [  172.948043]  do_syscall_64+0x46/0x80
> > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > f7 d8
> > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > 00000000000000a6
> > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > [  173.022249]  </TASK>
> > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel ipmi_ssif kvm
> > > irqbypass rapl acpi_ipmi intel_cstate i40e joydev mei_me ioatdma
> > > i2c_i801 intel_uncore lpc_ich i2c_smbus mei intel_pch_thermal dca
> > > ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter fuse
> > > zram xfs crct10dif_pclmul ast crc32_pclmul crc32c_intel
> > > drm_vram_helper drm_ttm_helper ttm wmi ghash_clmulni_intel
> > > [  173.073900] CR2: 0000000000000008
> > >
> >
> > --
> > Additional information about regzbot:
> >
> > If you want to know more about regzbot, check out its web-interface, the
> > getting started guide, and the reference documentation:
> >
> > https://linux-regtracking.leemhuis.info/regzbot/
> > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> >
> > The last two documents will explain how you can interact with regzbot
> > yourself if you want to.
> >
> > Hint for reporters: when reporting a regression it's in your interest to
> > CC the regression list and tell regzbot about the issue, as that ensures
> > the regression makes it onto the radar of the Linux kernel's regression
> > tracker -- that's in your interest, as it ensures your report won't fall
> > through the cracks unnoticed.
> >
> > Hint for developers: you normally don't need to care about regzbot once
> > it's involved. Fix the issue as you normally would, just remember to
> > include a 'Link:' tag in the patch descriptions pointing to all reports
> > about the issue. This has been expected from developers even before
> > regzbot showed up for reasons explained in
> > 'Documentation/process/submitting-patches.rst' and
> > 'Documentation/process/5.Posting.rst'.
> >
>
>
> --
> -Jirka



-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-30 22:24     ` Jirka Hladky
@ 2022-03-31  0:11       ` Minchan Kim
  2022-03-31 14:54         ` Justin Forbes
  2022-04-04  6:37       ` PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2022-03-31  0:11 UTC (permalink / raw)
  To: Jirka Hladky, tj; +Cc: linux-kernel, regressions, Thorsten Leemhuis

On Thu, Mar 31, 2022 at 12:24:12AM +0200, Jirka Hladky wrote:
> Adding Minchan Kim on Cc.
> 
> @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f authored by
> you is causing BUG: kernel NULL pointer dereference, address:
> 0000000000000008
> 
> Could you please have a look at what might be wrong?

There was one follow-up patch to fix some issue at that time.

555a0ce4558d kernfs: prevent early freeing of root node

So, do you mean you hit the bug with the additional fix?
Do you have any reproducer?

Ccing Tejun to borrow kernfs expertise.
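
For reference, the register dump in the quoted oops is consistent with
kernfs_remove() being entered with a NULL pointer in its first argument:
the bytes at the faulting RIP ("<48> 8b 47 08") decode as a load from
8(%rdi), RDI is 0, and CR2 is 8. Below is a minimal sketch in Python that
checks only this arithmetic. The helper name is made up for illustration,
it handles only this one instruction encoding, and which struct member
lives at offset 8 is not something this sketch can tell.

```python
# Values copied from the oops report quoted in this thread.
INSN_AT_RIP = bytes.fromhex("488b4708")  # "<48> 8b 47 08" from the Code: line
RDI = 0x0000000000000000
CR2 = 0x0000000000000008


def disp8_of_mov_from_rdi(insn: bytes) -> int:
    """Return the signed 8-bit displacement of `mov disp8(%rdi),%rax`.

    Hypothetical helper: it recognises exactly one encoding and rejects
    everything else, it is not a general x86 decoder.
    """
    rex, opcode, modrm, disp = insn[:4]
    # 0x48 = REX.W prefix, 0x8b = MOV r64, r/m64,
    # ModRM 0x47 = mod 01 (disp8), reg 000 (rax), rm 111 (rdi)
    if (rex, opcode, modrm) != (0x48, 0x8B, 0x47):
        raise ValueError("not `mov disp8(%rdi),%rax`")
    # disp8 is sign-extended by the CPU
    return disp - 0x100 if disp >= 0x80 else disp


# Effective address of the load, truncated to 64 bits.
fault_address = (RDI + disp8_of_mov_from_rdi(INSN_AT_RIP)) & 0xFFFFFFFFFFFFFFFF

print(hex(fault_address), fault_address == CR2)  # prints "0x8 True"
```

This only shows that the fault address matches a NULL base plus 8; it
says nothing about why the caller passed NULL.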

> 
> Thank you!
> Jirka
> 
> On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
> >
> > Hi Thorsten,
> >
> > thanks for adding this to the regzbot bot.
> >
> > Hi Greg and all,
> >
> > I bisected the issue and found the commit that causes it [1].
> > Could you please look at the code and see how to fix it?
> >
> > Thanks a lot
> > Jirka
> >
> > [1]
> > =========================================================
> > $ git bisect visualize
> > commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> > Author: Minchan Kim <minchan@kernel.org>
> > Date:   Thu Nov 18 15:00:08 2021 -0800
> >
> >    kernfs: switch global kernfs_rwsem lock to per-fs lock
> >
> >    The kernfs implementation has big lock granularity(kernfs_rwsem) so
> >    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
> >    lock. It makes trouble for some cases to wait the global lock
> >    for a long time even though they are totally independent contexts
> >    each other.
> >
> >    A general example is process A goes under direct reclaim with holding
> >    the lock when it accessed the file in sysfs and process B is waiting
> >    the lock with exclusive mode and then process C is waiting the lock
> >    until process B could finish the job after it gets the lock from
> >    process A.
> >
> >    This patch switches the global kernfs_rwsem to per-fs lock, which
> >    put the rwsem into kernfs_root.
> >
> >    Suggested-by: Tejun Heo <tj@kernel.org>
> >    Acked-by: Tejun Heo <tj@kernel.org>
> >    Signed-off-by: Minchan Kim <minchan@kernel.org>
> >    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
> >    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > =========================================================
> >
> > The bug is triggered by running NAS Parallel benchmark suite on
> > SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> > log:
> >
> > [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > [  247.036009] #PF: supervisor read access in kernel mode
> > [  247.036009] #PF: error_code(0x0000) - not-present page
> > [  247.036009] PGD 0 P4D 0
> > [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> > [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> > 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> > [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > 2.0b 03/07/2018
> > [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> > [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > c4 60
> > [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > knlGS:0000000000000000
> > [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  247.122048] PKRU: 55555554
> > [  247.122048] Call Trace:
> > [  247.122048]  <TASK>
> > [  247.122048]  rdt_kill_sb+0x29d/0x350
> > [  247.122048]  deactivate_locked_super+0x36/0xa0
> > [  247.122048]  cleanup_mnt+0x131/0x190
> > [  247.122048]  task_work_run+0x5c/0x90
> > [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> > [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> > [  247.122048]  do_syscall_64+0x48/0x90
> > [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  247.122048] RIP: 0033:0x7f01be2d735b
> > [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> > f7 d8
> > [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> > 00000000000000a6
> > [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> > [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> > [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> > [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> > [  247.279066]  </TASK>
> > [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> > intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> > iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass
> > ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
> > intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> > intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> > xfs ast i2c_algo_bit drm_vram_helper
> > drm_kms_helper cec drm_ttm_helper ttm drm i40e
> > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
> > fuse ipmi_devintf ipmi_msghandler
> > [  247.335054] CR2: 0000000000000008
> > [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> > [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> > [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > c4 60
> > [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > knlGS:0000000000000000
> > [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  247.455060] PKRU: 55555554
> >
> > On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> > <regressions@leemhuis.info> wrote:
> > >
> > > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > > kernel regression tracking bot; all text you find below is compiled from
> > > a few template paragraphs you might have encountered already
> > > from similar mails.]
> > >
> > > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > > to make this easily accessible to everyone.
> > >
> > > To be sure below issue doesn't fall through the cracks unnoticed, I'm
> > > adding it to regzbot, my Linux kernel regression tracking bot:
> > >
> > > #regzbot ^introduced v5.16..v5.17
> > > #regzbot ignore-activity
> > >
> > > If it turns out this isn't a regression, feel free to remove it from the
> > > tracking by sending a reply to this thread containing a paragraph like
> > > "#regzbot invalid: reason why this is invalid" (without the quotes).
> > >
> > > Reminder for developers: when fixing the issue, please add a 'Link:'
> > > tags pointing to the report (the mail quoted above) using
> > > lore.kernel.org/r/, as explained in
> > > 'Documentation/process/submitting-patches.rst' and
> > > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > > automatically connect reports with fixes, but they are useful in
> > > general, too.
> > >
> > > I'm sending this to everyone that got the initial report, to make
> > > everyone aware of the tracking. I also hope that messages like this
> > > motivate people to directly get at least the regression mailing list and
> > > ideally even regzbot involved when dealing with regressions, as messages
> > > like this wouldn't be needed then. And don't worry, if I need to send
> > > other mails regarding this regression only relevant for regzbot I'll
> > > send them to the regressions lists only (with a tag in the subject so
> > > people can filter them away). With a bit of luck no such messages will
> > > be needed anyway.
> > >
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > >
> > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > > reports on my table. I can only look briefly into most of them and lack
> > > knowledge about most of the areas they concern. I thus unfortunately
> > > will sometimes get things wrong or miss something important. I hope
> > > that's not the case here; if you think it is, don't hesitate to tell me
> > > in a public reply, it's in everyone's interest to set the public record
> > > straight.
> > >
> > >
> > > On 22.03.22 00:29, Jirka Hladky wrote:
> > > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > > Gold 6126 CPU)
> > > >
> > > > Below is a backtrace and the dmesg log.
> > > >
> > > > I have trouble creating a simple reproducer - it happens at random
> > > > places when preparing the NAS benchmark to be run. The script creates
> > > > a bunch of directories, compiles the benchmark, and starts trial runs.
> > > >
> > > > Could you please help to narrow down the problem?
> > > >
> > > > Reports below were created with kernel 5.17 rc8 and with
> > > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > > setting.
> > > >
> > > > crash> sys
> > > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > > >         CPUS: 48
> > > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > > >       UPTIME: 00:02:50
> > > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > > >        TASKS: 608
> > > >     NODENAME: gold-2s-c
> > > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > > >      MACHINE: x86_64  (2600 Mhz)
> > > >       MEMORY: 94.7 GB
> > > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > > >
> > > >
> > > > crash> bt
> > > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > > >    [exception RIP: kernfs_remove+7]
> > > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > > >
> > > > [2] dmesg
> > > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > [  172.783513] #PF: supervisor read access in kernel mode
> > > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > > [  172.793793] PGD 0 P4D 0
> > > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > 2.0b 03/07/2018
> > > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > > c5 60
> > > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > > knlGS:0000000000000000
> > > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  172.916261] PKRU: 55555554
> > > > [  172.918974] Call Trace:
> > > > [  172.921427]  <TASK>
> > > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > > [  172.935051]  task_work_run+0x5c/0x90
> > > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > > [  172.948043]  do_syscall_64+0x46/0x80
> > > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > > f7 d8
> > > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > > 00000000000000a6
> > > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > > [  173.022249]  </TASK>
> > > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel
> > > > ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
> > > > mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
> > > > intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> > > > acpi_power_meter fuse zram xfs crct10dif_pclmul
> > > > ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
> > > > ttm wmi ghash_clmulni_intel
> > > > [  173.073900] CR2: 0000000000000008
> > > >
> > >
> > > --
> > > Additional information about regzbot:
> > >
> > > If you want to know more about regzbot, check out its web-interface, the
> > > getting start guide, and the references documentation:
> > >
> > > https://linux-regtracking.leemhuis.info/regzbot/
> > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> > >
> > > The last two documents will explain how you can interact with regzbot
> > > > yourself if you want to.
> > >
> > > Hint for reporters: when reporting a regression it's in your interest to
> > > CC the regression list and tell regzbot about the issue, as that ensures
> > > the regression makes it onto the radar of the Linux kernel's regression
> > > tracker -- that's in your interest, as it ensures your report won't fall
> > > through the cracks unnoticed.
> > >
> > > Hint for developers: you normally don't need to care about regzbot once
> > > it's involved. Fix the issue as you normally would, just remember to
> > > > include a 'Link:' tag in the patch descriptions pointing to all reports
> > > about the issue. This has been expected from developers even before
> > > regzbot showed up for reasons explained in
> > > 'Documentation/process/submitting-patches.rst' and
> > > 'Documentation/process/5.Posting.rst'.
> > >
> >
> >
> > --
> > -Jirka
> 
> 
> 
> -- 
> -Jirka
> 


* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-31  0:11       ` Minchan Kim
@ 2022-03-31 14:54         ` Justin Forbes
  2022-03-31 16:18           ` Jirka Hladky
  0 siblings, 1 reply; 21+ messages in thread
From: Justin Forbes @ 2022-03-31 14:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jirka Hladky, tj, linux-kernel, regressions, Thorsten Leemhuis

On Wed, Mar 30, 2022 at 7:11 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Thu, Mar 31, 2022 at 12:24:12AM +0200, Jirka Hladky wrote:
> > Adding Minchan Kim on Cc.
> >
> > @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f authored by
> > you is causing BUG: kernel NULL pointer dereference, address:
> > 0000000000000008
> >
> > Could you please have a look at what might be wrong?
>
> There was one follow-up patch to fix some issue at that time.
>
> 555a0ce4558d kernfs: prevent early freeing of root node
>
> So, do you mean you hit the bug with the additional fix?
> Do you have any reproducer?
>
> Ccing Tejun to borrow kernfs expertise.

That patch was included in v5.17-rc1, so yes, it does reproduce with
that patch included.

Justin

> >
> > Thank you!
> > Jirka
> >
> > On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
> > >
> > > Hi Thorsten,
> > >
> > > thanks for adding this to the regzbot bot.
> > >
> > > Hi Greg and all,
> > >
> > > > I bisected the issue and found the commit that causes it [1].
> > > > Could you please look at the code and see how to fix it?
> > >
> > > Thanks a lot
> > > Jirka
> > >
> > > [1]
> > > =========================================================
> > > $ git bisect visualize
> > > commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> > > Author: Minchan Kim <minchan@kernel.org>
> > > Date:   Thu Nov 18 15:00:08 2021 -0800
> > >
> > >    kernfs: switch global kernfs_rwsem lock to per-fs lock
> > >
> > >    The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > >    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
> > >    lock. It makes trouble for some cases to wait the global lock
> > >    for a long time even though they are totally independent contexts
> > >    each other.
> > >
> > >    A general example is process A goes under direct reclaim with holding
> > >    the lock when it accessed the file in sysfs and process B is waiting
> > >    the lock with exclusive mode and then process C is waiting the lock
> > >    until process B could finish the job after it gets the lock from
> > >    process A.
> > >
> > >    This patch switches the global kernfs_rwsem to per-fs lock, which
> > >    put the rwsem into kernfs_root.
> > >
> > >    Suggested-by: Tejun Heo <tj@kernel.org>
> > >    Acked-by: Tejun Heo <tj@kernel.org>
> > >    Signed-off-by: Minchan Kim <minchan@kernel.org>
> > >    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
> > >    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > =========================================================
> > >
> > > The bug is triggered by running NAS Parallel benchmark suite on
> > > SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> > > log:
> > >
> > > [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > [  247.036009] #PF: supervisor read access in kernel mode
> > > [  247.036009] #PF: error_code(0x0000) - not-present page
> > > [  247.036009] PGD 0 P4D 0
> > > [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> > > 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> > > [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > 2.0b 03/07/2018
> > > [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> > > [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > c4 60
> > > [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > knlGS:0000000000000000
> > > [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  247.122048] PKRU: 55555554
> > > [  247.122048] Call Trace:
> > > [  247.122048]  <TASK>
> > > [  247.122048]  rdt_kill_sb+0x29d/0x350
> > > [  247.122048]  deactivate_locked_super+0x36/0xa0
> > > [  247.122048]  cleanup_mnt+0x131/0x190
> > > [  247.122048]  task_work_run+0x5c/0x90
> > > [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> > > [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> > > [  247.122048]  do_syscall_64+0x48/0x90
> > > [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [  247.122048] RIP: 0033:0x7f01be2d735b
> > > [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> > > f7 d8
> > > [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> > > 00000000000000a6
> > > [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> > > [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> > > [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> > > [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> > > [  247.279066]  </TASK>
> > > [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> > > intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> > > iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass
> > > ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
> > > intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> > > intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> > > xfs ast i2c_algo_bit drm_vram_helper
> > > drm_kms_helper cec drm_ttm_helper ttm drm i40e
> > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
> > > fuse ipmi_devintf ipmi_msghandler
> > > [  247.335054] CR2: 0000000000000008
> > > [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> > > [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> > > [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > c4 60
> > > [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > knlGS:0000000000000000
> > > [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [  247.455060] PKRU: 55555554
> > >
> > > On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> > > <regressions@leemhuis.info> wrote:
> > > >
> > > > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > > > kernel regression tracking bot; all text you find below is compiled from
> > > > a few template paragraphs you might have encountered already
> > > > from similar mails.]
> > > >
> > > > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > > > to make this easily accessible to everyone.
> > > >
> > > > To be sure below issue doesn't fall through the cracks unnoticed, I'm
> > > > adding it to regzbot, my Linux kernel regression tracking bot:
> > > >
> > > > #regzbot ^introduced v5.16..v5.17
> > > > #regzbot ignore-activity
> > > >
> > > > If it turns out this isn't a regression, feel free to remove it from the
> > > > tracking by sending a reply to this thread containing a paragraph like
> > > > "#regzbot invalid: reason why this is invalid" (without the quotes).
> > > >
> > > > Reminder for developers: when fixing the issue, please add a 'Link:'
> > > > tags pointing to the report (the mail quoted above) using
> > > > lore.kernel.org/r/, as explained in
> > > > 'Documentation/process/submitting-patches.rst' and
> > > > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > > > automatically connect reports with fixes, but they are useful in
> > > > general, too.
> > > >
> > > > I'm sending this to everyone that got the initial report, to make
> > > > everyone aware of the tracking. I also hope that messages like this
> > > > motivate people to directly get at least the regression mailing list and
> > > > ideally even regzbot involved when dealing with regressions, as messages
> > > > like this wouldn't be needed then. And don't worry, if I need to send
> > > > other mails regarding this regression only relevant for regzbot I'll
> > > > send them to the regressions lists only (with a tag in the subject so
> > > > people can filter them away). With a bit of luck no such messages will
> > > > be needed anyway.
> > > >
> > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > >
> > > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > > > reports on my table. I can only look briefly into most of them and lack
> > > > knowledge about most of the areas they concern. I thus unfortunately
> > > > will sometimes get things wrong or miss something important. I hope
> > > > that's not the case here; if you think it is, don't hesitate to tell me
> > > > in a public reply, it's in everyone's interest to set the public record
> > > > straight.
> > > >
> > > >
> > > > On 22.03.22 00:29, Jirka Hladky wrote:
> > > > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > > > Gold 6126 CPU)
> > > > >
> > > > > Below is a backtrace and the dmesg log.
> > > > >
> > > > > I have trouble creating a simple reproducer - it happens at random
> > > > > places when preparing the NAS benchmark to be run. The script creates
> > > > > a bunch of directories, compiles the benchmark, and starts trial runs.
> > > > >
> > > > > Could you please help to narrow down the problem?
> > > > >
> > > > > Reports below were created with kernel 5.17 rc8 and with
> > > > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > > > setting.
> > > > >
> > > > > crash> sys
> > > > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > > > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > > > >         CPUS: 48
> > > > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > > > >       UPTIME: 00:02:50
> > > > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > > > >        TASKS: 608
> > > > >     NODENAME: gold-2s-c
> > > > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > > > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > > > >      MACHINE: x86_64  (2600 Mhz)
> > > > >       MEMORY: 94.7 GB
> > > > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > > > >
> > > > >
> > > > > crash> bt
> > > > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > > > >    [exception RIP: kernfs_remove+7]
> > > > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > > > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > > > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > > > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > > > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > > > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > > > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > > > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > > > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > > > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > > > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > > > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > > > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > > > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > > > >
> > > > > [2] dmesg
> > > > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > [  172.783513] #PF: supervisor read access in kernel mode
> > > > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > > > [  172.793793] PGD 0 P4D 0
> > > > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > 2.0b 03/07/2018
> > > > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > > > c5 60
> > > > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [  172.916261] PKRU: 55555554
> > > > > [  172.918974] Call Trace:
> > > > > [  172.921427]  <TASK>
> > > > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > > > [  172.935051]  task_work_run+0x5c/0x90
> > > > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > > > [  172.948043]  do_syscall_64+0x46/0x80
> > > > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > > > f7 d8
> > > > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > > > 00000000000000a6
> > > > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > > > [  173.022249]  </TASK>
> > > > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_
> > > > > intel ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
> > > > > mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
> > > > > intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> > > > > acpi_power_meter fuse zram xfs crct10d
> > > > > if_pclmul ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
> > > > > ttm wmi ghash_clmulni_intel
> > > > > [  173.073900] CR2: 0000000000000008
> > > > >
> > > >
> > > > --
> > > > Additional information about regzbot:
> > > >
> > > > If you want to know more about regzbot, check out its web-interface, the
> > > > getting start guide, and the references documentation:
> > > >
> > > > https://linux-regtracking.leemhuis.info/regzbot/
> > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> > > >
> > > > The last two documents will explain how you can interact with regzbot
> > > > yourself if your want to.
> > > >
> > > > Hint for reporters: when reporting a regression it's in your interest to
> > > > CC the regression list and tell regzbot about the issue, as that ensures
> > > > the regression makes it onto the radar of the Linux kernel's regression
> > > > tracker -- that's in your interest, as it ensures your report won't fall
> > > > through the cracks unnoticed.
> > > >
> > > > Hint for developers: you normally don't need to care about regzbot once
> > > > it's involved. Fix the issue as you normally would, just remember to
> > > > include 'Link:' tag in the patch descriptions pointing to all reports
> > > > about the issue. This has been expected from developers even before
> > > > regzbot showed up for reasons explained in
> > > > 'Documentation/process/submitting-patches.rst' and
> > > > 'Documentation/process/5.Posting.rst'.
> > > >
> > >
> > >
> > > --
> > > -Jirka
> >
> >
> >
> > --
> > -Jirka
> >


* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-31 14:54         ` Justin Forbes
@ 2022-03-31 16:18           ` Jirka Hladky
  2022-03-31 23:33             ` Minchan Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Jirka Hladky @ 2022-03-31 16:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: tj, linux-kernel, regressions, Thorsten Leemhuis, Justin Forbes

> So, do you mean you hit the bug with the additional fix?

Yes, exactly. We have been hitting this issue since v5.17-rc1. I have
now specifically tested the "555a0ce4558d kernfs: prevent early
freeing of root node" commit and it does not resolve the issue.

> Do you have any reproducer?
Yes. It happens at various points while preparing the NAS Parallel
Benchmarks for execution: sometimes during compilation, sometimes
during the first trial run. It takes 1 or 2 minutes to hit the issue.

@Minchan - the tarball with the reproducer is ~170 kB. How can I send
it to you? (I have been trying to create a simple reproducer, but
without success.)

Thanks
Jirka


On Thu, Mar 31, 2022 at 4:55 PM Justin Forbes <jforbes@fedoraproject.org> wrote:
>
> On Wed, Mar 30, 2022 at 7:11 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Thu, Mar 31, 2022 at 12:24:12AM +0200, Jirka Hladky wrote:
> > > Adding Minchan Kim on Cc.
> > >
> > > @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f authored by
> > > you is causing BUG: kernel NULL pointer dereference, address:
> > > 0000000000000008
> > >
> > > Could you please have a look at what might be wrong?
> >
> > There was one follow-up patch to fix some issue at that time.
> >
> > 555a0ce4558d kernfs: prevent early freeing of root node
> >
> > So, do you mean you hit the bug with the additional fix?
> > Do you have any reproducer?
> >
> > Ccing Tejun to borrow kernfs expertise.
>
> That patch was included in v5.17-rc1, so yes, it does reproduce with
> that patch included.
>
> Justin
>
> > >
> > > Thank you!
> > > Jirka
> > >
> > > On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
> > > >
> > > > Hi Thorsten,
> > > >
> > > > thanks for adding this to the regzbot bot.
> > > >
> > > > Hi Greg and all,
> > > >
> > > > I did bisecting and I have found the commit causing this issue [1].
> > > > Could you please have a look at the code how to fix it?
> > > >
> > > > Thanks a lot
> > > > Jirka
> > > >
> > > > [1]
> > > > =========================================================
> > > > $ git bisect visualize
> > > > commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> > > > Author: Minchan Kim <minchan@kernel.org>
> > > > Date:   Thu Nov 18 15:00:08 2021 -0800
> > > >
> > > >    kernfs: switch global kernfs_rwsem lock to per-fs lock
> > > >
> > > >    The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > > >    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
> > > >    lock. It makes trouble for some cases to wait the global lock
> > > >    for a long time even though they are totally independent contexts
> > > >    each other.
> > > >
> > > >    A general example is process A goes under direct reclaim with holding
> > > >    the lock when it accessed the file in sysfs and process B is waiting
> > > >    the lock with exclusive mode and then process C is waiting the lock
> > > >    until process B could finish the job after it gets the lock from
> > > >    process A.
> > > >
> > > >    This patch switches the global kernfs_rwsem to per-fs lock, which
> > > >    put the rwsem into kernfs_root.
> > > >
> > > >    Suggested-by: Tejun Heo <tj@kernel.org>
> > > >    Acked-by: Tejun Heo <tj@kernel.org>
> > > >    Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > >    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
> > > >    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > =========================================================
> > > >
> > > > The bug is triggered by running NAS Parallel benchmark suite on
> > > > SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> > > > log:
> > > >
> > > > [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > [  247.036009] #PF: supervisor read access in kernel mode
> > > > [  247.036009] #PF: error_code(0x0000) - not-present page
> > > > [  247.036009] PGD 0 P4D 0
> > > > [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> > > > 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> > > > [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > 2.0b 03/07/2018
> > > > [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> > > > [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > c4 60
> > > > [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > knlGS:0000000000000000
> > > > [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  247.122048] PKRU: 55555554
> > > > [  247.122048] Call Trace:
> > > > [  247.122048]  <TASK>
> > > > [  247.122048]  rdt_kill_sb+0x29d/0x350
> > > > [  247.122048]  deactivate_locked_super+0x36/0xa0
> > > > [  247.122048]  cleanup_mnt+0x131/0x190
> > > > [  247.122048]  task_work_run+0x5c/0x90
> > > > [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> > > > [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> > > > [  247.122048]  do_syscall_64+0x48/0x90
> > > > [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > [  247.122048] RIP: 0033:0x7f01be2d735b
> > > > [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> > > > f7 d8
> > > > [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> > > > 00000000000000a6
> > > > [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> > > > [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> > > > [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> > > > [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> > > > [  247.279066]  </TASK>
> > > > [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> > > > intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> > > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> > > > iTCO_wdt intel_pmc_bxt iTCO_vendor_support i
> > > > rqbypass ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
> > > > intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> > > > intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> > > > xfs ast i2c_algo_bit drm_v
> > > > ram_helper drm_kms_helper cec drm_ttm_helper ttm drm i40e
> > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
> > > > fuse ipmi_devintf ipmi_msghandler
> > > > [  247.335054] CR2: 0000000000000008
> > > > [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> > > > [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> > > > [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > c4 60
> > > > [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > knlGS:0000000000000000
> > > > [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  247.455060] PKRU: 55555554
> > > >
> > > > On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> > > > <regressions@leemhuis.info> wrote:
> > > > >
> > > > > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > > > > kernel regression tracking bot; all text you find below is compiled from
> > > > > a few templates paragraphs you might have encountered already already
> > > > > from similar mails.]
> > > > >
> > > > > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > > > > to make this easily accessible to everyone.
> > > > >
> > > > > To be sure below issue doesn't fall through the cracks unnoticed, I'm
> > > > > adding it to regzbot, my Linux kernel regression tracking bot:
> > > > >
> > > > > #regzbot ^introduced v5.16..v5.17
> > > > > #regzbot ignore-activity
> > > > >
> > > > > If it turns out this isn't a regression, free free to remove it from the
> > > > > tracking by sending a reply to this thread containing a paragraph like
> > > > > "#regzbot invalid: reason why this is invalid" (without the quotes).
> > > > >
> > > > > Reminder for developers: when fixing the issue, please add a 'Link:'
> > > > > tags pointing to the report (the mail quoted above) using
> > > > > lore.kernel.org/r/, as explained in
> > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > > > > automatically connect reports with fixes, but they are useful in
> > > > > general, too.
> > > > >
> > > > > I'm sending this to everyone that got the initial report, to make
> > > > > everyone aware of the tracking. I also hope that messages like this
> > > > > motivate people to directly get at least the regression mailing list and
> > > > > ideally even regzbot involved when dealing with regressions, as messages
> > > > > like this wouldn't be needed then. And don't worry, if I need to send
> > > > > other mails regarding this regression only relevant for regzbot I'll
> > > > > send them to the regressions lists only (with a tag in the subject so
> > > > > people can filter them away). With a bit of luck no such messages will
> > > > > be needed anyway.
> > > > >
> > > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > > >
> > > > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > > > > reports on my table. I can only look briefly into most of them and lack
> > > > > knowledge about most of the areas they concern. I thus unfortunately
> > > > > will sometimes get things wrong or miss something important. I hope
> > > > > that's not the case here; if you think it is, don't hesitate to tell me
> > > > > in a public reply, it's in everyone's interest to set the public record
> > > > > straight.
> > > > >
> > > > >
> > > > > On 22.03.22 00:29, Jirka Hladky wrote:
> > > > > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > > > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > > > > Gold 6126 CPU)
> > > > > >
> > > > > > Bellow is a backtrace and the dmesg log.
> > > > > >
> > > > > > I have trouble creating a simple reproducer - it happens at random
> > > > > > places when preparing the NAS benchmark to be run. The script creates
> > > > > > a bunch of directories, compiles the benchmark a start trial runs.
> > > > > >
> > > > > > Could you please help to narrow down the problem?
> > > > > >
> > > > > > Reports bellow were created with kernel 5.17 rc8 and with
> > > > > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > > > > setting.
> > > > > >
> > > > > > crash> sys
> > > > > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > > > > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > > > > >         CPUS: 48
> > > > > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > > > > >       UPTIME: 00:02:50
> > > > > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > > > > >        TASKS: 608
> > > > > >     NODENAME: gold-2s-c
> > > > > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > > > > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > > > > >      MACHINE: x86_64  (2600 Mhz)
> > > > > >       MEMORY: 94.7 GB
> > > > > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > > > > >
> > > > > >
> > > > > > crash> bt
> > > > > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > > > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > > > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > > > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > > > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > > > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > > > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > > > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > > > > >    [exception RIP: kernfs_remove+7]
> > > > > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > > > > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > > > > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > > > > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > > > > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > > > > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > > > > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > > > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > > > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > > > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > > > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > > > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > > > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > > > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > > > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > > > > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > > > > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > > > > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > > > > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > > > > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > > > > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > > > > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > > > > >
> > > > > > [2] dmesg
> > > > > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > > [  172.783513] #PF: supervisor read access in kernel mode
> > > > > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > > > > [  172.793793] PGD 0 P4D 0
> > > > > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > > > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > > > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > > 2.0b 03/07/2018
> > > > > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > > > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > > > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > > > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > > > > c5 60
> > > > > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > > > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > > > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > > > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > > > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > > > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > > > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > > > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [  172.916261] PKRU: 55555554
> > > > > > [  172.918974] Call Trace:
> > > > > > [  172.921427]  <TASK>
> > > > > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > > > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > > > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > > > > [  172.935051]  task_work_run+0x5c/0x90
> > > > > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > > > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > > > > [  172.948043]  do_syscall_64+0x46/0x80
> > > > > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > > > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > > > > f7 d8
> > > > > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > > > > 00000000000000a6
> > > > > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > > > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > > > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > > > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > > > > [  173.022249]  </TASK>
> > > > > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > > > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > > > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > > > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_
> > > > > > intel ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
> > > > > > mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
> > > > > > intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> > > > > > acpi_power_meter fuse zram xfs crct10d
> > > > > > if_pclmul ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
> > > > > > ttm wmi ghash_clmulni_intel
> > > > > > [  173.073900] CR2: 0000000000000008
> > > > > >
> > > > >
> > > > > --
> > > > > Additional information about regzbot:
> > > > >
> > > > > If you want to know more about regzbot, check out its web-interface, the
> > > > > getting start guide, and the references documentation:
> > > > >
> > > > > https://linux-regtracking.leemhuis.info/regzbot/
> > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> > > > >
> > > > > The last two documents will explain how you can interact with regzbot
> > > > > yourself if your want to.
> > > > >
> > > > > Hint for reporters: when reporting a regression it's in your interest to
> > > > > CC the regression list and tell regzbot about the issue, as that ensures
> > > > > the regression makes it onto the radar of the Linux kernel's regression
> > > > > tracker -- that's in your interest, as it ensures your report won't fall
> > > > > through the cracks unnoticed.
> > > > >
> > > > > Hint for developers: you normally don't need to care about regzbot once
> > > > > it's involved. Fix the issue as you normally would, just remember to
> > > > > include 'Link:' tag in the patch descriptions pointing to all reports
> > > > > about the issue. This has been expected from developers even before
> > > > > regzbot showed up for reasons explained in
> > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > 'Documentation/process/5.Posting.rst'.
> > > > >
> > > >
> > > >
> > > > --
> > > > -Jirka
> > >
> > >
> > >
> > > --
> > > -Jirka
> > >
>


-- 
-Jirka



* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-03-31 16:18           ` Jirka Hladky
@ 2022-03-31 23:33             ` Minchan Kim
  2022-04-01 12:04               ` Jirka Hladky
  0 siblings, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2022-03-31 23:33 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: tj, linux-kernel, regressions, Thorsten Leemhuis, Justin Forbes

On Thu, Mar 31, 2022 at 06:18:28PM +0200, Jirka Hladky wrote:
> > So, do you mean you hit the bug with the additional fix?
> 
> Yes, exactly. We have been hitting this issue since v5.17-rc1. I have
> now specifically tested the "555a0ce4558d kernfs: prevent early
> freeing of root node" commit and it does not resolve the issue.

Could you decode the exact source code line from the oops?
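
As a rough sketch of one way to do that: the Python below pulls the
symbol+offset from the kernel-mode RIP line and the instruction bytes at
the faulting address from the Code: line, then prints a
scripts/faddr2line command to run from the kernel source tree. The helper
names (parse_oops, faddr2line_command), the embedded sample, and the
"vmlinux" path are illustrative assumptions; the vmlinux has to be the
debug-info build of the exact kernel that crashed.

```python
import re

# Sample taken from the dmesg output in the report; mail clients wrapped
# the Code: dump over several lines, so the parser has to tolerate that.
SAMPLE = """\
[  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
[  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
c5 60
[  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
"""

# Kernel-mode RIP lines carry the 0010 code segment and a func+off/size.
RIP_RE = re.compile(r"RIP: 0010:([\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+)")
# A byte is two hex digits; the byte at RIP is wrapped in <>.
BYTE = r"(?:<[0-9a-f]{2}>|[0-9a-f]{2}\b)"
CODE_RE = re.compile(r"Code: (" + BYTE + r"(?:\s+" + BYTE + r")*)")


def parse_oops(text):
    """Return (func+off/size, bytes starting at the faulting RIP)."""
    rip = RIP_RE.search(text)
    code = CODE_RE.search(text)
    if not rip or not code:
        raise ValueError("no kernel RIP/Code lines found")
    tokens = code.group(1).split()
    start = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
    if start is None:
        raise ValueError("Code: line has no <xx> marker")
    faulting = [t.strip("<>") for t in tokens[start:]]
    return rip.group(1), faulting


def faddr2line_command(location, vmlinux="vmlinux"):
    """Build the command line; the vmlinux path is a placeholder."""
    return f"./scripts/faddr2line {vmlinux} {location}"


if __name__ == "__main__":
    location, faulting = parse_oops(SAMPLE)
    print(faddr2line_command(location))
    print("bytes at RIP:", " ".join(faulting[:4]))
```

For the sample this prints "./scripts/faddr2line vmlinux
kernfs_remove+0x7/0x50" followed by "bytes at RIP: 48 8b 47 08".
Alternatively, scripts/decode_stacktrace.sh can take the whole log on
stdin with the vmlinux as its argument.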

> 
> > Do you have any reproducer?
> Yes. It happens at various points while preparing the NAS Parallel
> Benchmarks for execution: sometimes during compilation, sometimes
> during the first trial run. It takes 1 or 2 minutes to hit the issue.
> 
> @Minchan - the tarball with the reproducer is ~170 kB. How can I send
> it to you? (I have been trying to create a simple reproducer, but
> without success.)

I think it's fine to attach it to the reply, because the kernel test bot
usually attaches bigger files when reporting bugs, and I have not
seen anyone complaining about it.

Thanks!
> 
> Thanks
> Jirka
> 
> 
> On Thu, Mar 31, 2022 at 4:55 PM Justin Forbes <jforbes@fedoraproject.org> wrote:
> >
> > On Wed, Mar 30, 2022 at 7:11 PM Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > On Thu, Mar 31, 2022 at 12:24:12AM +0200, Jirka Hladky wrote:
> > > > Adding Minchan Kim on Cc.
> > > >
> > > > @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f authored by
> > > > you is causing BUG: kernel NULL pointer dereference, address:
> > > > 0000000000000008
> > > >
> > > > Could you please have a look at what might be wrong?
> > >
> > > There was one follow-up patch to fix some issue at that time.
> > >
> > > 555a0ce4558d kernfs: prevent early freeing of root node
> > >
> > > So, do you mean you hit the bug with the additional fix?
> > > Do you have any reproducer?
> > >
> > > Ccing Tejun to borrow kernfs expertise.
> >
> > That patch was included in v5.17-rc1, so yes, it does reproduce with
> > that patch included.
> >
> > Justin
> >
> > > >
> > > > Thank you!
> > > > Jirka
> > > >
> > > > On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
> > > > >
> > > > > Hi Thorsten,
> > > > >
> > > > > thanks for adding this to the regzbot bot.
> > > > >
> > > > > Hi Greg and all,
> > > > >
> > > > > I did bisecting and I have found the commit causing this issue [1].
> > > > > Could you please have a look at the code how to fix it?
> > > > >
> > > > > Thanks a lot
> > > > > Jirka
> > > > >
> > > > > [1]
> > > > > =========================================================
> > > > > $ git bisect visualize
> > > > > commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> > > > > Author: Minchan Kim <minchan@kernel.org>
> > > > > Date:   Thu Nov 18 15:00:08 2021 -0800
> > > > >
> > > > >    kernfs: switch global kernfs_rwsem lock to per-fs lock
> > > > >
> > > > >    The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > > > >    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
> > > > >    lock. It makes trouble for some cases to wait the global lock
> > > > >    for a long time even though they are totally independent contexts
> > > > >    each other.
> > > > >
> > > > >    A general example is process A goes under direct reclaim with holding
> > > > >    the lock when it accessed the file in sysfs and process B is waiting
> > > > >    the lock with exclusive mode and then process C is waiting the lock
> > > > >    until process B could finish the job after it gets the lock from
> > > > >    process A.
> > > > >
> > > > >    This patch switches the global kernfs_rwsem to per-fs lock, which
> > > > >    put the rwsem into kernfs_root.
> > > > >
> > > > >    Suggested-by: Tejun Heo <tj@kernel.org>
> > > > >    Acked-by: Tejun Heo <tj@kernel.org>
> > > > >    Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > >    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
> > > > >    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > =========================================================
> > > > >
> > > > > The bug is triggered by running NAS Parallel benchmark suite on
> > > > > SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> > > > > log:
> > > > >
> > > > > [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > [  247.036009] #PF: supervisor read access in kernel mode
> > > > > [  247.036009] #PF: error_code(0x0000) - not-present page
> > > > > [  247.036009] PGD 0 P4D 0
> > > > > [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> > > > > 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> > > > > [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > 2.0b 03/07/2018
> > > > > [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> > > > > [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > > c4 60
> > > > > [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > > [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > > [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > > [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > > [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > > [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > > [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > > [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [  247.122048] PKRU: 55555554
> > > > > [  247.122048] Call Trace:
> > > > > [  247.122048]  <TASK>
> > > > > [  247.122048]  rdt_kill_sb+0x29d/0x350
> > > > > [  247.122048]  deactivate_locked_super+0x36/0xa0
> > > > > [  247.122048]  cleanup_mnt+0x131/0x190
> > > > > [  247.122048]  task_work_run+0x5c/0x90
> > > > > [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> > > > > [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> > > > > [  247.122048]  do_syscall_64+0x48/0x90
> > > > > [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > [  247.122048] RIP: 0033:0x7f01be2d735b
> > > > > [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> > > > > f7 d8
> > > > > [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> > > > > 00000000000000a6
> > > > > [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> > > > > [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> > > > > [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> > > > > [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> > > > > [  247.279066]  </TASK>
> > > > > [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> > > > > intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> > > > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> > > > > > iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass
> > > > > > ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
> > > > > intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> > > > > intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> > > > > > xfs ast i2c_algo_bit drm_vram_helper
> > > > > > drm_kms_helper cec drm_ttm_helper ttm drm i40e
> > > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
> > > > > fuse ipmi_devintf ipmi_msghandler
> > > > > [  247.335054] CR2: 0000000000000008
> > > > > [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> > > > > [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> > > > > [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > > c4 60
> > > > > [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > > [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > > [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > > [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > > [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > > [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > > [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > > knlGS:0000000000000000
> > > > > [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > > [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [  247.455060] PKRU: 55555554
> > > > >
> > > > > On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> > > > > <regressions@leemhuis.info> wrote:
> > > > > >
> > > > > > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > > > > > kernel regression tracking bot; all text you find below is compiled from
> > > > > > > a few template paragraphs you might have encountered already
> > > > > > from similar mails.]
> > > > > >
> > > > > > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > > > > > to make this easily accessible to everyone.
> > > > > >
> > > > > > To be sure below issue doesn't fall through the cracks unnoticed, I'm
> > > > > > adding it to regzbot, my Linux kernel regression tracking bot:
> > > > > >
> > > > > > #regzbot ^introduced v5.16..v5.17
> > > > > > #regzbot ignore-activity
> > > > > >
> > > > > > > If it turns out this isn't a regression, feel free to remove it from the
> > > > > > tracking by sending a reply to this thread containing a paragraph like
> > > > > > "#regzbot invalid: reason why this is invalid" (without the quotes).
> > > > > >
> > > > > > > Reminder for developers: when fixing the issue, please add 'Link:'
> > > > > > tags pointing to the report (the mail quoted above) using
> > > > > > lore.kernel.org/r/, as explained in
> > > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > > > > > automatically connect reports with fixes, but they are useful in
> > > > > > general, too.
> > > > > >
> > > > > > I'm sending this to everyone that got the initial report, to make
> > > > > > everyone aware of the tracking. I also hope that messages like this
> > > > > > motivate people to directly get at least the regression mailing list and
> > > > > > ideally even regzbot involved when dealing with regressions, as messages
> > > > > > like this wouldn't be needed then. And don't worry, if I need to send
> > > > > > other mails regarding this regression only relevant for regzbot I'll
> > > > > > send them to the regressions lists only (with a tag in the subject so
> > > > > > people can filter them away). With a bit of luck no such messages will
> > > > > > be needed anyway.
> > > > > >
> > > > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > > > >
> > > > > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > > > > > reports on my table. I can only look briefly into most of them and lack
> > > > > > knowledge about most of the areas they concern. I thus unfortunately
> > > > > > will sometimes get things wrong or miss something important. I hope
> > > > > > that's not the case here; if you think it is, don't hesitate to tell me
> > > > > > in a public reply, it's in everyone's interest to set the public record
> > > > > > straight.
> > > > > >
> > > > > >
> > > > > > On 22.03.22 00:29, Jirka Hladky wrote:
> > > > > > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > > > > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > > > > > Gold 6126 CPU)
> > > > > > >
> > > > > > > > Below is a backtrace and the dmesg log.
> > > > > > >
> > > > > > > I have trouble creating a simple reproducer - it happens at random
> > > > > > > places when preparing the NAS benchmark to be run. The script creates
> > > > > > > > a bunch of directories, compiles the benchmark and starts trial runs.
> > > > > > >
> > > > > > > Could you please help to narrow down the problem?
> > > > > > >
> > > > > > > > Reports below were created with kernel 5.17 rc8 and with
> > > > > > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > > > > > setting.
> > > > > > >
> > > > > > > crash> sys
> > > > > > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > > > > > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > > > > > >         CPUS: 48
> > > > > > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > > > > > >       UPTIME: 00:02:50
> > > > > > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > > > > > >        TASKS: 608
> > > > > > >     NODENAME: gold-2s-c
> > > > > > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > > > > > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > > > > > >      MACHINE: x86_64  (2600 Mhz)
> > > > > > >       MEMORY: 94.7 GB
> > > > > > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > > > > > >
> > > > > > >
> > > > > > > crash> bt
> > > > > > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > > > > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > > > > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > > > > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > > > > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > > > > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > > > > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > > > > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > > > > > >    [exception RIP: kernfs_remove+7]
> > > > > > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > > > > > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > > > > > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > > > > > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > > > > > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > > > > > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > > > > > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > > > > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > > > > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > > > > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > > > > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > > > > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > > > > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > > > > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > > > > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > > > > > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > > > > > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > > > > > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > > > > > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > > > > > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > > > > > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > > > > > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > > > > > >
> > > > > > > [2] dmesg
> > > > > > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > > > [  172.783513] #PF: supervisor read access in kernel mode
> > > > > > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > > > > > [  172.793793] PGD 0 P4D 0
> > > > > > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > > > > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > > > > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > > > 2.0b 03/07/2018
> > > > > > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > > > > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > > > > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > > > > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > > > > > c5 60
> > > > > > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > > > > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > > > > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > > > > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > > > > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > > > > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > > > > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > > > > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > [  172.916261] PKRU: 55555554
> > > > > > > [  172.918974] Call Trace:
> > > > > > > [  172.921427]  <TASK>
> > > > > > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > > > > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > > > > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > > > > > [  172.935051]  task_work_run+0x5c/0x90
> > > > > > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > > > > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > > > > > [  172.948043]  do_syscall_64+0x46/0x80
> > > > > > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > > > > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > > > > > f7 d8
> > > > > > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > > > > > 00000000000000a6
> > > > > > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > > > > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > > > > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > > > > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > > > > > [  173.022249]  </TASK>
> > > > > > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > > > > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > > > > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > > > > > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel
> > > > > > > > ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
> > > > > > > mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
> > > > > > > intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> > > > > > > > acpi_power_meter fuse zram xfs crct10dif_pclmul
> > > > > > > > ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
> > > > > > > ttm wmi ghash_clmulni_intel
> > > > > > > [  173.073900] CR2: 0000000000000008
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Additional information about regzbot:
> > > > > >
> > > > > > If you want to know more about regzbot, check out its web-interface, the
> > > > > > > getting started guide, and the reference documentation:
> > > > > >
> > > > > > https://linux-regtracking.leemhuis.info/regzbot/
> > > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> > > > > >
> > > > > > The last two documents will explain how you can interact with regzbot
> > > > > > > yourself if you want to.
> > > > > >
> > > > > > Hint for reporters: when reporting a regression it's in your interest to
> > > > > > CC the regression list and tell regzbot about the issue, as that ensures
> > > > > > the regression makes it onto the radar of the Linux kernel's regression
> > > > > > tracker -- that's in your interest, as it ensures your report won't fall
> > > > > > through the cracks unnoticed.
> > > > > >
> > > > > > Hint for developers: you normally don't need to care about regzbot once
> > > > > > it's involved. Fix the issue as you normally would, just remember to
> > > > > > > include a 'Link:' tag in the patch descriptions pointing to all reports
> > > > > > about the issue. This has been expected from developers even before
> > > > > > regzbot showed up for reasons explained in
> > > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > > 'Documentation/process/5.Posting.rst'.
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Jirka
> > > >
> > > >
> > > >
> > > > --
> > > > -Jirka
> > > >
> >
> 
> 
> -- 
> -Jirka
> 


* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
From: Jirka Hladky @ 2022-04-01 12:04 UTC
  To: Minchan Kim
  Cc: tj, linux-kernel, regressions, Thorsten Leemhuis, Justin Forbes

> Could you decode the exact source code line from the oops?

Yes - please see below [1].

> I think it's fine to attach it to the reply, because the kernel test bot

OK. The reproducer is attached. Please unpack it and follow the
instructions in the README file. [2]

Thanks a lot for looking into it!
Jirka

[1]
=============================================
Source code line numbers for the Oops message
=============================================

1) RIP: 0010:kernfs_remove+0x8/0x50:
(gdb) l *kernfs_remove+0x8
0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
43       * Return the kernfs_root @kn belongs to.
44       */
45      static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
46      {
47              /* if parent exists, it's always a dir; otherwise, @sd is a dir */
48              if (kn->parent)
49                      kn = kn->parent;
50              return kn->dir.root;
51      }
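
The faulting address fits this line: RDI (the kn argument) is 0 and
CR2 is 0x8, which is what a read of kn->parent through a NULL kn would
produce if parent sits 8 bytes into the structure. A minimal sketch of
that arithmetic, assuming a simplified stand-in layout (struct
mock_kernfs_node and parent_field_address() are hypothetical names, not
the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in, NOT the kernel's struct kernfs_node: two 4-byte
 * counters followed by the parent pointer (assumed layout).
 */
struct mock_kernfs_node {
	int32_t count;                    /* stand-in for a 4-byte counter */
	int32_t active;                   /* stand-in for a 4-byte counter */
	struct mock_kernfs_node *parent;  /* the field read at the faulting line */
};

/* Documents the layout assumption at compile time. */
static_assert(offsetof(struct mock_kernfs_node, parent) == 8,
	      "parent is assumed to sit at offset 8");

/* Address that a load of kn->parent would touch, given the value of kn. */
static uintptr_t parent_field_address(uintptr_t kn_address)
{
	return kn_address + offsetof(struct mock_kernfs_node, parent);
}
```

With kn_address set to 0 the helper returns 8, which matches CR2 in the
oops.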

And here are the source code lines for the first five functions in the call trace:
[ 8563.366280] Call Trace:
[ 8563.366280]  <TASK>
[ 8563.366280]  rdt_kill_sb+0x29d/0x350
[ 8563.366280]  deactivate_locked_super+0x36/0xa0
[ 8563.366280]  cleanup_mnt+0x131/0x190
[ 8563.366280]  task_work_run+0x5c/0x90
[ 8563.366280]  exit_to_user_mode_prepare+0x229/0x230
[ 8563.366280]  syscall_exit_to_user_mode+0x18/0x40
[ 8563.366280]  do_syscall_64+0x48/0x90
[ 8563.366280]  entry_SYSCALL_64_after_hwframe+0x44/0xae
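
Each entry above has the form symbol+0xOFFSET/0xSIZE: the offset of the
address inside the function, followed by the total size of the
function. A small sketch for splitting such an entry before passing
symbol+offset to gdb (struct trace_entry and parse_trace_entry() are
hypothetical helpers, not part of any kernel tool):

```c
#include <assert.h>
#include <stdio.h>

/* One decoded call-trace entry such as "rdt_kill_sb+0x29d/0x350". */
struct trace_entry {
	char symbol[64];        /* function name */
	unsigned long offset;   /* byte offset of the address inside the function */
	unsigned long size;     /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if text is not in symbol+0xOFF/0xSIZE form. */
static int parse_trace_entry(const char *text, struct trace_entry *out)
{
	int fields;

	assert(text != NULL && out != NULL);

	/* %lx accepts an optional 0x prefix, as strtoul() does with base 16. */
	fields = sscanf(text, " %63[^+ ]+%lx/%lx",
			out->symbol, &out->offset, &out->size);

	return fields == 3 ? 0 : -1;
}
```

The symbol and offset fields are what the "l *symbol+0xOFFSET" commands
below take as input.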

2) (gdb) l *rdt_kill_sb+0x29d
0xffffffff810506bd is in rdt_kill_sb (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
2437            /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
2438            update_closid_rmid(cpu_online_mask, &rdtgroup_default);
2439
2440            kernfs_remove(kn_info);
2441            kernfs_remove(kn_mongrp);
2442            kernfs_remove(kn_mondata);
2443    }
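
Note that +0x29d is a return address, so gdb may attribute it to the
statement following the call that faulted; this listing alone does not
settle which of the three kernfs_remove() calls received the NULL
pointer. As an illustration only (not the kernel code and not a
proposed patch), the sketch below shows why a NULL check has to come
before any helper that dereferences the node. All mock_* names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct mock_root {
	int removals;               /* counts successful removals */
};

struct mock_node {
	struct mock_node *parent;   /* NULL for a top-level node */
	struct mock_root *root;
};

/* Same shape as the kernfs_root() helper quoted in 1); kn must not be NULL. */
static struct mock_root *mock_root_of(struct mock_node *kn)
{
	assert(kn != NULL);
	if (kn->parent)
		kn = kn->parent;
	return kn->root;
}

/* Returns 0 after a removal, -1 when there was nothing to remove. */
static int mock_remove(struct mock_node *kn)
{
	struct mock_root *root;

	if (!kn)                    /* the guard must precede mock_root_of() */
		return -1;

	root = mock_root_of(kn);
	root->removals++;
	return 0;
}
```

If the root lookup ran before the guard, a NULL kn would be
dereferenced inside the helper, which is the access pattern the oops
shows.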

3) (gdb) l *deactivate_locked_super+0x36
0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
337                     /*
338                      * Since list_lru_destroy() may sleep, we cannot call it from
339                      * put_super(), where we hold the sb_lock. Therefore we destroy
340                      * the lru lists right now.
341                      */
342                     list_lru_destroy(&s->s_dentry_lru);
343                     list_lru_destroy(&s->s_inode_lru);
344
345                     put_filesystem(fs);
346                     put_super(s);

4) (gdb) l *cleanup_mnt+0x131
0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
132             return 0;
133     }
134
135     static void mnt_free_id(struct mount *mnt)
136     {
137             ida_free(&mnt_id_ida, mnt->mnt_id);
138     }

5) (gdb) l *task_work_run+0x5c
0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
2012
2013    DECLARE_STATIC_CALL(cond_resched, __cond_resched);
2014
2015    static __always_inline int _cond_resched(void)
2016    {
2017            return static_call_mod(cond_resched)();
2018    }

6) (gdb) l *exit_to_user_mode_prepare+0x229
0xffffffff81176d19 is in exit_to_user_mode_prepare (./include/linux/tracehook.h:189).
184              * This barrier pairs with task_work_add()->set_notify_resume() after
185              * hlist_add_head(task->task_works);
186              */
187             smp_mb__after_atomic();
188             if (unlikely(current->task_works))
189                     task_work_run();
190
191     #ifdef CONFIG_KEYS_REQUEST_CACHE
192             if (unlikely(current->cached_requested_key)) {
193                     key_put(current->cached_requested_key);

[2]
=============================================
Reproducer - README
=============================================

1) HW
This issue seems to be platform-specific. I was not able to reproduce
it on AMD Zen or on the Intel Ice Lake platform.
I see the issue on dual-socket Intel Skylake systems. Reproduced on a
Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPUs.

2) Preparation
You will need these packages (tested on Fedora):
gcc
gcc-gfortran
libgomp-devel
sysstat
hwloc
hwloc-gui
util-linux
time

3) Running the reproducer
Please check /proc/sys/kernel/panic_on_oops:
0 => you will get an Oops and the system will continue to run
1 => you will get a Panic (this is the default on RHEL)

cd  Reproducer
./reproducer.sh

See ./reproducer.sh for the options to get more detailed logs and
to send the logs to /dev/kmsg (dmesg):
# Verbose logs from the script:
#./runtest.sh --iterations 1 --list_of_threads 24 --skip_system_info --verbose

# Copy logs to /dev/kmsg:
#stdbuf -oL ./runtest.sh --iterations 1 --list_of_threads 24 --skip_system_info | stdbuf -oL tee /dev/kmsg


On Fri, Apr 1, 2022 at 1:33 AM Minchan Kim <minchan@kernel.org> wrote:
>
> On Thu, Mar 31, 2022 at 06:18:28PM +0200, Jirka Hladky wrote:
> > > So, do you mean you hit the bug with the additional fix?
> >
> > Yes, exactly. We have been hitting this issue since v5.17-rc1. I have
> > now specifically tested the "555a0ce4558d kernfs: prevent early
> > freeing of root node" commit and it does not resolve the issue.
>
> Could you decode the exact source code line from the oops?
>
> >
> > > Do you have any reproducer?
> > Yes. It happens in various places when preparing a NAS parallel
> > benchmark for the execution. Sometimes during compilation, sometimes
> > with the first trial run. It takes 1 or 2 minutes to hit that issue.
> >
> > @Minchan - the tarball with the reproducer is ~170kB. How can I send
> > it to you? (I have been trying to create a simple reproducer but
> > without success).
>
> I think it's fine to attach it to the reply, because the kernel test bot
> usually attaches bigger files when reporting bugs and I have not
> seen anyone complaining about it.
>
> Thanks!
> >
> > Thanks
> > Jirka
> >
> >
> > On Thu, Mar 31, 2022 at 4:55 PM Justin Forbes <jforbes@fedoraproject.org> wrote:
> > >
> > > On Wed, Mar 30, 2022 at 7:11 PM Minchan Kim <minchan@kernel.org> wrote:
> > > >
> > > > On Thu, Mar 31, 2022 at 12:24:12AM +0200, Jirka Hladky wrote:
> > > > > Adding Minchan Kim on Cc.
> > > > >
> > > > > @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f authored by
> > > > > you is causing BUG: kernel NULL pointer dereference, address:
> > > > > 0000000000000008
> > > > >
> > > > > Could you please have a look at what might be wrong?
> > > >
> > > > There was one follow-up patch to fix some issue at that time.
> > > >
> > > > 555a0ce4558d kernfs: prevent early freeing of root node
> > > >
> > > > So, do you mean you hit the bug with the additional fix?
> > > > Do you have any reproducer?
> > > >
> > > > Ccing Tejun to borrow kernfs expertise.
> > >
> > > That patch was included in v5.17-rc1, so yes, it does reproduce with
> > > that patch included.
> > >
> > > Justin
> > >
> > > > >
> > > > > Thank you!
> > > > > Jirka
> > > > >
> > > > > On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
> > > > > >
> > > > > > Hi Thorsten,
> > > > > >
> > > > > > thanks for adding this to the regzbot bot.
> > > > > >
> > > > > > Hi Greg and all,
> > > > > >
> > > > > > I did bisecting and I have found the commit causing this issue [1].
> > > > > > Could you please have a look at the code how to fix it?
> > > > > >
> > > > > > Thanks a lot
> > > > > > Jirka
> > > > > >
> > > > > > [1]
> > > > > > =========================================================
> > > > > > $ git bisect visualize
> > > > > > commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
> > > > > > Author: Minchan Kim <minchan@kernel.org>
> > > > > > Date:   Thu Nov 18 15:00:08 2021 -0800
> > > > > >
> > > > > >    kernfs: switch global kernfs_rwsem lock to per-fs lock
> > > > > >
> > > > > >    The kernfs implementation has big lock granularity(kernfs_rwsem) so
> > > > > >    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
> > > > > >    lock. It makes trouble for some cases to wait the global lock
> > > > > >    for a long time even though they are totally independent contexts
> > > > > >    each other.
> > > > > >
> > > > > >    A general example is process A goes under direct reclaim with holding
> > > > > >    the lock when it accessed the file in sysfs and process B is waiting
> > > > > >    the lock with exclusive mode and then process C is waiting the lock
> > > > > >    until process B could finish the job after it gets the lock from
> > > > > >    process A.
> > > > > >
> > > > > >    This patch switches the global kernfs_rwsem to per-fs lock, which
> > > > > >    put the rwsem into kernfs_root.
> > > > > >
> > > > > >    Suggested-by: Tejun Heo <tj@kernel.org>
> > > > > >    Acked-by: Tejun Heo <tj@kernel.org>
> > > > > >    Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > > > >    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
> > > > > >    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > > =========================================================
> > > > > >
> > > > > > The bug is triggered by running NAS Parallel benchmark suite on
> > > > > > SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
> > > > > > log:
> > > > > >
> > > > > > [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > > [  247.036009] #PF: supervisor read access in kernel mode
> > > > > > [  247.036009] #PF: error_code(0x0000) - not-present page
> > > > > > [  247.036009] PGD 0 P4D 0
> > > > > > [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
> > > > > > 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
> > > > > > [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > > 2.0b 03/07/2018
> > > > > > [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
> > > > > > [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > > > c4 60
> > > > > > [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > > > [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > > > [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > > > [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > > > [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > > > [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > > > [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > > > [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [  247.122048] PKRU: 55555554
> > > > > > [  247.122048] Call Trace:
> > > > > > [  247.122048]  <TASK>
> > > > > > [  247.122048]  rdt_kill_sb+0x29d/0x350
> > > > > > [  247.122048]  deactivate_locked_super+0x36/0xa0
> > > > > > [  247.122048]  cleanup_mnt+0x131/0x190
> > > > > > [  247.122048]  task_work_run+0x5c/0x90
> > > > > > [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
> > > > > > [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
> > > > > > [  247.122048]  do_syscall_64+0x48/0x90
> > > > > > [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > > [  247.122048] RIP: 0033:0x7f01be2d735b
> > > > > > [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
> > > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
> > > > > > f7 d8
> > > > > > [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
> > > > > > 00000000000000a6
> > > > > > [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
> > > > > > [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
> > > > > > [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
> > > > > > [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > > [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
> > > > > > [  247.279066]  </TASK>
> > > > > > [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
> > > > > > intel_rapl_common isst_if_common skx_edac nfit libnvdimm
> > > > > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
> > > > > > iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass
> > > > > > ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
> > > > > > intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
> > > > > > intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
> > > > > > xfs ast i2c_algo_bit drm_vram_helper
> > > > > > drm_kms_helper cec drm_ttm_helper ttm drm i40e
> > > > > > crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
> > > > > > fuse ipmi_devintf ipmi_msghandler
> > > > > > [  247.335054] CR2: 0000000000000008
> > > > > > [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
> > > > > > [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
> > > > > > [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
> > > > > > ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
> > > > > > 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
> > > > > > c4 60
> > > > > > [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
> > > > > > [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
> > > > > > [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
> > > > > > [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
> > > > > > [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
> > > > > > [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
> > > > > > [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
> > > > > > [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [  247.455060] PKRU: 55555554
> > > > > >
> > > > > > On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
> > > > > > <regressions@leemhuis.info> wrote:
> > > > > > >
> > > > > > > [TLDR: I'm adding the regression report below to regzbot, the Linux
> > > > > > > kernel regression tracking bot; all text you find below is compiled from
> > > > > > > > a few template paragraphs you might have encountered already
> > > > > > > > in similar mails.]
> > > > > > >
> > > > > > > Hi, this is your Linux kernel regression tracker. Top-posting for once,
> > > > > > > to make this easily accessible to everyone.
> > > > > > >
> > > > > > > > To be sure the issue below doesn't fall through the cracks unnoticed, I'm
> > > > > > > adding it to regzbot, my Linux kernel regression tracking bot:
> > > > > > >
> > > > > > > #regzbot ^introduced v5.16..v5.17
> > > > > > > #regzbot ignore-activity
> > > > > > >
> > > > > > > > If it turns out this isn't a regression, feel free to remove it from the
> > > > > > > tracking by sending a reply to this thread containing a paragraph like
> > > > > > > "#regzbot invalid: reason why this is invalid" (without the quotes).
> > > > > > >
> > > > > > > > Reminder for developers: when fixing the issue, please add 'Link:'
> > > > > > > tags pointing to the report (the mail quoted above) using
> > > > > > > lore.kernel.org/r/, as explained in
> > > > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > > > 'Documentation/process/5.Posting.rst'. Regzbot needs them to
> > > > > > > automatically connect reports with fixes, but they are useful in
> > > > > > > general, too.
> > > > > > >
> > > > > > > I'm sending this to everyone that got the initial report, to make
> > > > > > > everyone aware of the tracking. I also hope that messages like this
> > > > > > > motivate people to directly get at least the regression mailing list and
> > > > > > > ideally even regzbot involved when dealing with regressions, as messages
> > > > > > > like this wouldn't be needed then. And don't worry, if I need to send
> > > > > > > other mails regarding this regression only relevant for regzbot I'll
> > > > > > > send them to the regressions lists only (with a tag in the subject so
> > > > > > > people can filter them away). With a bit of luck no such messages will
> > > > > > > be needed anyway.
> > > > > > >
> > > > > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > > > > >
> > > > > > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > > > > > > reports on my table. I can only look briefly into most of them and lack
> > > > > > > knowledge about most of the areas they concern. I thus unfortunately
> > > > > > > will sometimes get things wrong or miss something important. I hope
> > > > > > > that's not the case here; if you think it is, don't hesitate to tell me
> > > > > > > in a public reply, it's in everyone's interest to set the public record
> > > > > > > straight.
> > > > > > >
> > > > > > >
> > > > > > > On 22.03.22 00:29, Jirka Hladky wrote:
> > > > > > > > Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
> > > > > > > > experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
> > > > > > > > Gold 6126 CPU)
> > > > > > > >
> > > > > > > > > Below is a backtrace and the dmesg log.
> > > > > > > >
> > > > > > > > I have trouble creating a simple reproducer - it happens at random
> > > > > > > > places when preparing the NAS benchmark to be run. The script creates
> > > > > > > > > a bunch of directories, compiles the benchmark, and starts trial runs.
> > > > > > > >
> > > > > > > > Could you please help to narrow down the problem?
> > > > > > > >
> > > > > > > > > The reports below were created with kernel 5.17 rc8 and with
> > > > > > > > echo 1 > /proc/sys/kernel/panic_on_oops
> > > > > > > > setting.
> > > > > > > >
> > > > > > > > crash> sys
> > > > > > > >       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
> > > > > > > >     DUMPFILE: vmcore  [PARTIAL DUMP]
> > > > > > > >         CPUS: 48
> > > > > > > >         DATE: Thu Mar 17 02:49:40 CET 2022
> > > > > > > >       UPTIME: 00:02:50
> > > > > > > > LOAD AVERAGE: 0.32, 0.10, 0.03
> > > > > > > >        TASKS: 608
> > > > > > > >     NODENAME: gold-2s-c
> > > > > > > >      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
> > > > > > > >      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
> > > > > > > >      MACHINE: x86_64  (2600 Mhz)
> > > > > > > >       MEMORY: 94.7 GB
> > > > > > > >        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
> > > > > > > >
> > > > > > > >
> > > > > > > > crash> bt
> > > > > > > > PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
> > > > > > > > #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
> > > > > > > > #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
> > > > > > > > #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
> > > > > > > > #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
> > > > > > > > #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
> > > > > > > > #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
> > > > > > > > #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
> > > > > > > >    [exception RIP: kernfs_remove+7]
> > > > > > > >    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
> > > > > > > >    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
> > > > > > > >    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
> > > > > > > >    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
> > > > > > > >    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
> > > > > > > >    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
> > > > > > > >    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > > > > > > > #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
> > > > > > > > #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
> > > > > > > > #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
> > > > > > > > #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
> > > > > > > > #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
> > > > > > > > #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
> > > > > > > > #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
> > > > > > > > #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
> > > > > > > >    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
> > > > > > > >    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
> > > > > > > >    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
> > > > > > > >    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
> > > > > > > >    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
> > > > > > > >    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
> > > > > > > >    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
> > > > > > > >
> > > > > > > > [2] dmesg
> > > > > > > > [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
> > > > > > > > [  172.783513] #PF: supervisor read access in kernel mode
> > > > > > > > [  172.788652] #PF: error_code(0x0000) - not-present page
> > > > > > > > [  172.793793] PGD 0 P4D 0
> > > > > > > > [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > > > [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
> > > > > > > > tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
> > > > > > > > [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
> > > > > > > > 2.0b 03/07/2018
> > > > > > > > [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
> > > > > > > > [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
> > > > > > > > f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
> > > > > > > > 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
> > > > > > > > c5 60
> > > > > > > > [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
> > > > > > > > [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
> > > > > > > > [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
> > > > > > > > [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
> > > > > > > > [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
> > > > > > > > [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
> > > > > > > > [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
> > > > > > > > [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > [  172.916261] PKRU: 55555554
> > > > > > > > [  172.918974] Call Trace:
> > > > > > > > [  172.921427]  <TASK>
> > > > > > > > [  172.923533]  rdt_kill_sb+0x29b/0x350
> > > > > > > > [  172.927112]  deactivate_locked_super+0x2f/0xa0
> > > > > > > > [  172.931559]  cleanup_mnt+0xee/0x180
> > > > > > > > [  172.935051]  task_work_run+0x5c/0x90
> > > > > > > > [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
> > > > > > > > [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
> > > > > > > > [  172.948043]  do_syscall_64+0x46/0x80
> > > > > > > > [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > > > > [  172.956675] RIP: 0033:0x7f442c75126b
> > > > > > > > [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
> > > > > > > > 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
> > > > > > > > 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
> > > > > > > > f7 d8
> > > > > > > > [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
> > > > > > > > 00000000000000a6
> > > > > > > > [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
> > > > > > > > [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
> > > > > > > > [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
> > > > > > > > [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
> > > > > > > > [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
> > > > > > > > [  173.022249]  </TASK>
> > > > > > > > [  173.024440] Modules linked in: rfkill intel_rapl_msr
> > > > > > > > intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
> > > > > > > > x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
> > > > > > > > > intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel
> > > > > > > > > ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
> > > > > > > > > mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
> > > > > > > > > intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
> > > > > > > > > acpi_power_meter fuse zram xfs crct10dif_pclmul
> > > > > > > > > ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
> > > > > > > > ttm wmi ghash_clmulni_intel
> > > > > > > > [  173.073900] CR2: 0000000000000008
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Additional information about regzbot:
> > > > > > >
> > > > > > > If you want to know more about regzbot, check out its web-interface, the
> > > > > > > > getting started guide, and the reference documentation:
> > > > > > >
> > > > > > > https://linux-regtracking.leemhuis.info/regzbot/
> > > > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> > > > > > > https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> > > > > > >
> > > > > > > The last two documents will explain how you can interact with regzbot
> > > > > > > > yourself if you want to.
> > > > > > >
> > > > > > > Hint for reporters: when reporting a regression it's in your interest to
> > > > > > > CC the regression list and tell regzbot about the issue, as that ensures
> > > > > > > the regression makes it onto the radar of the Linux kernel's regression
> > > > > > > tracker -- that's in your interest, as it ensures your report won't fall
> > > > > > > through the cracks unnoticed.
> > > > > > >
> > > > > > > Hint for developers: you normally don't need to care about regzbot once
> > > > > > > it's involved. Fix the issue as you normally would, just remember to
> > > > > > > > include a 'Link:' tag in the patch description pointing to all reports
> > > > > > > about the issue. This has been expected from developers even before
> > > > > > > regzbot showed up for reasons explained in
> > > > > > > 'Documentation/process/submitting-patches.rst' and
> > > > > > > 'Documentation/process/5.Posting.rst'.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -Jirka
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Jirka
> > > > >
> > >
> >
> >
> > --
> > -Jirka
> >
>


-- 
-Jirka

[-- Attachment #2: Reproducer.tar.xz --]
[-- Type: application/x-xz, Size: 202600 bytes --]

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers #forregzbot
  2022-03-30 22:24     ` Jirka Hladky
  2022-03-31  0:11       ` Minchan Kim
@ 2022-04-04  6:37       ` Thorsten Leemhuis
  1 sibling, 0 replies; 21+ messages in thread
From: Thorsten Leemhuis @ 2022-04-04  6:37 UTC (permalink / raw)
  To: regressions; +Cc: linux-kernel

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

#regzbot introduced: 393c3714081a53795bbff0e985d24146def6f57

On 31.03.22 00:24, Jirka Hladky wrote:
> Adding Minchan Kim on Cc.
> 
> @Minchan - commit 393c3714081a53795bbff0e985d24146def6f57f, authored by
> you, is causing a "BUG: kernel NULL pointer dereference, address:
> 0000000000000008" oops.
> 
> Could you please have a look at what might be wrong?
> 
> Thank you!
> Jirka
> 
> On Thu, Mar 31, 2022 at 12:16 AM Jirka Hladky <jhladky@redhat.com> wrote:
>>
>> Hi Thorsten,
>>
>> thanks for adding this to regzbot.
>>
>> Hi Greg and all,
>>
>> I bisected the problem and found the commit causing this issue [1].
>> Could you please have a look at the code and suggest how to fix it?
>>
>> Thanks a lot
>> Jirka
>>
>> [1]
>> =========================================================
>> $ git bisect visualize
>> commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
>> Author: Minchan Kim <minchan@kernel.org>
>> Date:   Thu Nov 18 15:00:08 2021 -0800
>>
>>    kernfs: switch global kernfs_rwsem lock to per-fs lock
>>
>>    The kernfs implementation has big lock granularity(kernfs_rwsem) so
>>    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
>>    lock. It makes trouble for some cases to wait the global lock
>>    for a long time even though they are totally independent contexts
>>    each other.
>>
>>    A general example is process A goes under direct reclaim with holding
>>    the lock when it accessed the file in sysfs and process B is waiting
>>    the lock with exclusive mode and then process C is waiting the lock
>>    until process B could finish the job after it gets the lock from
>>    process A.
>>
>>    This patch switches the global kernfs_rwsem to per-fs lock, which
>>    put the rwsem into kernfs_root.
>>
>>    Suggested-by: Tejun Heo <tj@kernel.org>
>>    Acked-by: Tejun Heo <tj@kernel.org>
>>    Signed-off-by: Minchan Kim <minchan@kernel.org>
>>    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
>>    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> =========================================================
>>
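
One way to read the commit description together with the trace further
down: once the lock lives in kernfs_root, a function has to find the root
from the node before it can take the lock, so a NULL node gets dereferenced
earlier than it would with a global lock. The sketch below is a toy Python
model of that ordering, not kernel code. All class and function names are
hypothetical, and whether this is the actual cause is an assumption that
the thread has not confirmed at this point.

```python
import threading


class Root:
    """Toy stand-in for a per-filesystem root that owns its own lock."""

    def __init__(self):
        self.rwsem = threading.Lock()


class Node:
    """Toy stand-in for a node that can reach its root directly or via its parent."""

    def __init__(self, root, parent=None):
        self.root = root
        self.parent = parent


GLOBAL_LOCK = threading.Lock()  # stands in for the old global lock


def remove_global_lock(node):
    # The global lock is taken without touching the node,
    # so a missing node can still be filtered out afterwards.
    with GLOBAL_LOCK:
        if node is None:
            return "nothing to remove"
        return "removed"


def remove_per_root_lock(node):
    # The lock lookup reads node.parent first; with node=None this fails
    # before any "is there a node at all?" check can run.
    top = node.parent if node.parent else node
    with top.root.rwsem:
        return "removed"


def remove_per_root_lock_guarded(node):
    # Same lookup, with an explicit guard placed before the dereference.
    if node is None:
        return "nothing to remove"
    return remove_per_root_lock(node)
```

In this model `remove_per_root_lock(None)` raises `AttributeError`, which is
the Python analogue of the NULL pointer read, while the guarded variant
returns early.
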
>> The bug is triggered by running the NAS Parallel Benchmarks suite on
>> Supermicro servers with 2x Xeon(R) Gold 6126 CPUs. Here is the error
>> log:
>>
>> [  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> [  247.036009] #PF: supervisor read access in kernel mode
>> [  247.036009] #PF: error_code(0x0000) - not-present page
>> [  247.036009] PGD 0 P4D 0
>> [  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
>> [  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
>> 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
>> [  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
>> 2.0b 03/07/2018
>> [  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
>> [  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
>> ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
>> 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
>> c4 60
>> [  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
>> [  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
>> [  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
>> [  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
>> [  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
>> [  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
>> [  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
>> knlGS:0000000000000000
>> [  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
>> [  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  247.122048] PKRU: 55555554
>> [  247.122048] Call Trace:
>> [  247.122048]  <TASK>
>> [  247.122048]  rdt_kill_sb+0x29d/0x350
>> [  247.122048]  deactivate_locked_super+0x36/0xa0
>> [  247.122048]  cleanup_mnt+0x131/0x190
>> [  247.122048]  task_work_run+0x5c/0x90
>> [  247.122048]  exit_to_user_mode_prepare+0x229/0x230
>> [  247.122048]  syscall_exit_to_user_mode+0x18/0x40
>> [  247.122048]  do_syscall_64+0x48/0x90
>> [  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [  247.122048] RIP: 0033:0x7f01be2d735b
>> [  247.122048] Code: 2b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3
>> 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
>> 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 2a 0c 00
>> f7 d8
>> [  247.122048] RSP: 002b:00007ffde1021e08 EFLAGS: 00000202 ORIG_RAX:
>> 00000000000000a6
>> [  247.122048] RAX: 0000000000000000 RBX: 0000560c012bf5a0 RCX: 00007f01be2d735b
>> [  247.122048] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000560c012c33a0
>> [  247.259079] RBP: 0000560c012bf370 R08: 0000000000000001 R09: 00007ffde1020b90
>> [  247.267058] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
>> [  247.271055] R13: 0000560c012c33a0 R14: 0000560c012bf480 R15: 0000560c012bf370
>> [  247.279066]  </TASK>
>> [  247.283054] Modules linked in: rfkill sunrpc intel_rapl_msr
>> intel_rapl_common isst_if_common skx_edac nfit libnvdimm
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm ice
>> iTCO_wdt intel_pmc_bxt iTCO_vendor_support irqbypass
>> ib_uverbs ipmi_ssif rapl intel_cstate ib_core mei_me joydev
>> intel_uncore i2c_i801 ioatdma acpi_ipmi lpc_ich mei pcspkr i2c_smbus
>> intel_pch_thermal dca ipmi_si acpi_power_meter acpi_pad zram ip_tables
>> xfs ast i2c_algo_bit drm_vram_helper
>> drm_kms_helper cec drm_ttm_helper ttm drm i40e
>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel wmi
>> fuse ipmi_devintf ipmi_msghandler
>> [  247.335054] CR2: 0000000000000008
>> [  247.339041] ---[ end trace d8ccdb6c2d272688 ]---
>> [  247.355057] RIP: 0010:kernfs_remove+0x8/0x50
>> [  247.359059] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
>> ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
>> 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
>> c4 60
>> [  247.379054] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
>> [  247.383056] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
>> [  247.391053] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
>> [  247.395047] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
>> [  247.403055] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
>> [  247.411046] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
>> [  247.419055] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
>> knlGS:0000000000000000
>> [  247.427055] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  247.431055] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
>> [  247.439055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  247.443055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  247.455060] PKRU: 55555554
>>
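
A note for readers decoding the trace above: the bytes starting at the
`<48>` marker in the `Code:` line are the faulting instruction. The sketch
below hand-decodes only the single encoding that appears here (REX.W + 0x8B
MOV with a register base and an 8-bit displacement). It is not a general
disassembler, and the helper names are made up for illustration.

```python
# Register names indexed by the 3-bit ModRM fields (REX.R/REX.B extensions not handled).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the byte values from the '<xx>' marker to the end of the line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_disp8(code_bytes):
    """Decode 'REX.W 8B /r' with mod=01: mov r64, [base + disp8]."""
    rex, opcode, modrm, disp = code_bytes[:4]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is sign-extended by the CPU
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# Tail of the 'Code:' line from the oops, starting shortly before the marker.
code_line = "41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0"
dest, base, disp = decode_mov_disp8(faulting_bytes(code_line))

rdi = 0x0  # RDI value taken from the register dump in the oops
fault_address = rdi + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> fault at {fault_address:#x}")
```

With RDI = 0 this yields address 0x8, which matches CR2 in the log. That
suggests kernfs_remove() was entered with a NULL pointer and faulted while
reading a field at offset 8 of the structure it expected.
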
>> On Thu, Mar 24, 2022 at 12:49 PM Thorsten Leemhuis
>> <regressions@leemhuis.info> wrote:
>>>
>>> [TLDR: I'm adding the regression report below to regzbot, the Linux
>>> kernel regression tracking bot; all text you find below is compiled from
>>> a few template paragraphs you might have encountered already
>>> in similar mails.]
>>>
>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>> to make this easily accessible to everyone.
>>>
>>> To be sure the issue below doesn't fall through the cracks unnoticed, I'm
>>> adding it to regzbot, my Linux kernel regression tracking bot:
>>>
>>> #regzbot ^introduced v5.16..v5.17
>>> #regzbot ignore-activity
>>>
>>> If it turns out this isn't a regression, feel free to remove it from the
>>> tracking by sending a reply to this thread containing a paragraph like
>>> "#regzbot invalid: reason why this is invalid" (without the quotes).
>>>
>>> Reminder for developers: when fixing the issue, please add 'Link:'
>>> tags pointing to the report (the mail quoted above) using
>>> lore.kernel.org/r/, as explained in
>>> 'Documentation/process/submitting-patches.rst' and
>>> 'Documentation/process/5.Posting.rst'. Regzbot needs them to
>>> automatically connect reports with fixes, but they are useful in
>>> general, too.
>>>
>>> I'm sending this to everyone that got the initial report, to make
>>> everyone aware of the tracking. I also hope that messages like this
>>> motivate people to directly get at least the regression mailing list and
>>> ideally even regzbot involved when dealing with regressions, as messages
>>> like this wouldn't be needed then. And don't worry, if I need to send
>>> other mails regarding this regression only relevant for regzbot I'll
>>> send them to the regressions lists only (with a tag in the subject so
>>> people can filter them away). With a bit of luck no such messages will
>>> be needed anyway.
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>> reports on my table. I can only look briefly into most of them and lack
>>> knowledge about most of the areas they concern. I thus unfortunately
>>> will sometimes get things wrong or miss something important. I hope
>>> that's not the case here; if you think it is, don't hesitate to tell me
>>> in a public reply, it's in everyone's interest to set the public record
>>> straight.
>>>
>>>
>>> On 22.03.22 00:29, Jirka Hladky wrote:
>>>> Starting from kernel 5.17 (tested with rc2, rc4, rc7, rc8) we
>>>> experience kernel oops on Intel Xeon Gold dual-socket servers (2x Xeon
>>>> Gold 6126 CPU)
>>>>
>>>> Below is a backtrace and the dmesg log.
>>>>
>>>> I have trouble creating a simple reproducer - it happens at random
>>>> places when preparing the NAS benchmark to be run. The script creates
>>>> a bunch of directories, compiles the benchmark, and starts trial runs.
>>>>
>>>> Could you please help to narrow down the problem?
>>>>
>>>> The reports below were created with kernel 5.17 rc8 and with
>>>> echo 1 > /proc/sys/kernel/panic_on_oops
>>>> setting.
>>>>
>>>> crash> sys
>>>>       KERNEL: /usr/lib/debug/lib/modules/5.17.0-0.rc8.123.fc37.x86_64/vmlinux
>>>>     DUMPFILE: vmcore  [PARTIAL DUMP]
>>>>         CPUS: 48
>>>>         DATE: Thu Mar 17 02:49:40 CET 2022
>>>>       UPTIME: 00:02:50
>>>> LOAD AVERAGE: 0.32, 0.10, 0.03
>>>>        TASKS: 608
>>>>     NODENAME: gold-2s-c
>>>>      RELEASE: 5.17.0-0.rc8.123.fc37.x86_64
>>>>      VERSION: #1 SMP PREEMPT Mon Mar 14 18:11:49 UTC 2022
>>>>      MACHINE: x86_64  (2600 Mhz)
>>>>       MEMORY: 94.7 GB
>>>>        PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" (check log for details)
>>>>
>>>>
>>>> crash> bt
>>>> PID: 2480   TASK: ffff9e8f76cb8000  CPU: 26  COMMAND: "umount"
>>>> #0 [ffffae00cacbfbb8] machine_kexec at ffffffffbb068980
>>>> #1 [ffffae00cacbfc08] __crash_kexec at ffffffffbb1a300a
>>>> #2 [ffffae00cacbfcc8] crash_kexec at ffffffffbb1a4045
>>>> #3 [ffffae00cacbfcd0] oops_end at ffffffffbb02c410
>>>> #4 [ffffae00cacbfcf0] page_fault_oops at ffffffffbb076a38
>>>> #5 [ffffae00cacbfd68] exc_page_fault at ffffffffbbd0b7c1
>>>> #6 [ffffae00cacbfd90] asm_exc_page_fault at ffffffffbbe00ace
>>>>    [exception RIP: kernfs_remove+7]
>>>>    RIP: ffffffffbb421f67  RSP: ffffae00cacbfe48  RFLAGS: 00010246
>>>>    RAX: 0000000000000001  RBX: ffffffffbce31e58  RCX: 0000000080200018
>>>>    RDX: 0000000080200019  RSI: ffffdfbd44161640  RDI: 0000000000000000
>>>>    RBP: ffffffffbce31e58   R8: 0000000000000000   R9: 0000000080200018
>>>>    R10: ffff9e8f05859e80  R11: ffff9e9443b1bd98  R12: ffff9ea057f1d000
>>>>    R13: ffffffffbce31e60  R14: dead000000000122  R15: dead000000000100
>>>>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>> #7 [ffffae00cacbfe58] rdt_kill_sb at ffffffffbb05074b
>>>> #8 [ffffae00cacbfea8] deactivate_locked_super at ffffffffbb36ce1f
>>>> #9 [ffffae00cacbfec0] cleanup_mnt at ffffffffbb39176e
>>>> #10 [ffffae00cacbfee8] task_work_run at ffffffffbb10703c
>>>> #11 [ffffae00cacbff08] exit_to_user_mode_prepare at ffffffffbb17a399
>>>> #12 [ffffae00cacbff28] syscall_exit_to_user_mode at ffffffffbbd0bde8
>>>> #13 [ffffae00cacbff38] do_syscall_64 at ffffffffbbd071a6
>>>> #14 [ffffae00cacbff50] entry_SYSCALL_64_after_hwframe at ffffffffbbe0007c
>>>>    RIP: 00007f442c75126b  RSP: 00007ffc82d66fe8  RFLAGS: 00000202
>>>>    RAX: 0000000000000000  RBX: 000055bd4cc37090  RCX: 00007f442c75126b
>>>>    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 000055bd4cc3b950
>>>>    RBP: 000055bd4cc371a8   R8: 0000000000000000   R9: 0000000000000073
>>>>    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000001
>>>>    R13: 000055bd4cc3b950  R14: 000055bd4cc372c0  R15: 000055bd4cc37090
>>>>    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
>>>>
>>>> [2] dmesg
>>>> [  172.776553] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>> [  172.783513] #PF: supervisor read access in kernel mode
>>>> [  172.788652] #PF: error_code(0x0000) - not-present page
>>>> [  172.793793] PGD 0 P4D 0
>>>> [  172.796330] Oops: 0000 [#1] PREEMPT SMP PTI
>>>> [  172.800519] CPU: 26 PID: 2480 Comm: umount Kdump: loaded Not
>>>> tainted 5.17.0-0.rc8.123.fc37.x86_64 #1
>>>> [  172.809645] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
>>>> 2.0b 03/07/2018
>>>> [  172.817123] RIP: 0010:kernfs_remove+0x7/0x50
>>>> [  172.821397] Code: e8 be e7 2c 00 48 89 df e8 b6 8c f0 ff 48 c7 c3
>>>> f4 ff ff ff 48 89 d8 5b 5d 41 5c 41 5d 41 5e c3 cc 66 90 0f 1f 44 00
>>>> 00 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83
>>>> c5 60
>>>> [  172.840141] RSP: 0018:ffffae00cacbfe48 EFLAGS: 00010246
>>>> [  172.845367] RAX: 0000000000000001 RBX: ffffffffbce31e58 RCX: 0000000080200018
>>>> [  172.852501] RDX: 0000000080200019 RSI: ffffdfbd44161640 RDI: 0000000000000000
>>>> [  172.859632] RBP: ffffffffbce31e58 R08: 0000000000000000 R09: 0000000080200018
>>>> [  172.866764] R10: ffff9e8f05859e80 R11: ffff9e9443b1bd98 R12: ffff9ea057f1d000
>>>> [  172.873899] R13: ffffffffbce31e60 R14: dead000000000122 R15: dead000000000100
>>>> [  172.881033] FS:  00007f442c53c800(0000) GS:ffff9e9429000000(0000)
>>>> knlGS:0000000000000000
>>>> [  172.889117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  172.894861] CR2: 0000000000000008 CR3: 000000010ba96006 CR4: 00000000007706e0
>>>> [  172.901997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [  172.909127] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [  172.916261] PKRU: 55555554
>>>> [  172.918974] Call Trace:
>>>> [  172.921427]  <TASK>
>>>> [  172.923533]  rdt_kill_sb+0x29b/0x350
>>>> [  172.927112]  deactivate_locked_super+0x2f/0xa0
>>>> [  172.931559]  cleanup_mnt+0xee/0x180
>>>> [  172.935051]  task_work_run+0x5c/0x90
>>>> [  172.938629]  exit_to_user_mode_prepare+0x229/0x230
>>>> [  172.943424]  syscall_exit_to_user_mode+0x18/0x40
>>>> [  172.948043]  do_syscall_64+0x46/0x80
>>>> [  172.951623]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>> [  172.956675] RIP: 0033:0x7f442c75126b
>>>> [  172.960271] Code: cb 1b 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 90 f3
>>>> 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00
>>>> 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 91 1b 0e 00
>>>> f7 d8
>>>> [  172.979017] RSP: 002b:00007ffc82d66fe8 EFLAGS: 00000202 ORIG_RAX:
>>>> 00000000000000a6
>>>> [  172.986584] RAX: 0000000000000000 RBX: 000055bd4cc37090 RCX: 00007f442c75126b
>>>> [  172.993715] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000055bd4cc3b950
>>>> [  173.000849] RBP: 000055bd4cc371a8 R08: 0000000000000000 R09: 0000000000000073
>>>> [  173.007980] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
>>>> [  173.015115] R13: 000055bd4cc3b950 R14: 000055bd4cc372c0 R15: 000055bd4cc37090
>>>> [  173.022249]  </TASK>
>>>> [  173.024440] Modules linked in: rfkill intel_rapl_msr
>>>> intel_rapl_common isst_if_common irdma skx_edac nfit libnvdimm ice
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp ib_uverbs iTCO_wdt
>>>> intel_pmc_bxt ib_core iTCO_vendor_support kvm_intel
>>>> ipmi_ssif kvm irqbypass rapl acpi_ipmi intel_cstate i40e joydev
>>>> mei_me ioatdma i2c_i801 intel_uncore lpc_ich i2c_smbus mei
>>>> intel_pch_thermal dca ipmi_si ipmi_devintf ipmi_msghandler acpi_pad
>>>> acpi_power_meter fuse zram xfs crct10dif_pclmul
>>>> ast crc32_pclmul crc32c_intel drm_vram_helper drm_ttm_helper
>>>> ttm wmi ghash_clmulni_intel
>>>> [  173.073900] CR2: 0000000000000008
>>>>
>>>
>>> --
>>> Additional information about regzbot:
>>>
>>> If you want to know more about regzbot, check out its web interface, the
>>> getting started guide, and the reference documentation:
>>>
>>> https://linux-regtracking.leemhuis.info/regzbot/
>>> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
>>> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
>>>
>>> The last two documents will explain how you can interact with regzbot
>>> yourself if you want to.
>>>
>>> Hint for reporters: when reporting a regression it's in your interest to
>>> CC the regression list and tell regzbot about the issue, as that ensures
>>> the regression makes it onto the radar of the Linux kernel's regression
>>> tracker -- that's in your interest, as it ensures your report won't fall
>>> through the cracks unnoticed.
>>>
>>> Hint for developers: you normally don't need to care about regzbot once
>>> it's involved. Fix the issue as you normally would, just remember to
>>> include 'Link:' tag in the patch descriptions pointing to all reports
>>> about the issue. This has been expected from developers even before
>>> regzbot showed up for reasons explained in
>>> 'Documentation/process/submitting-patches.rst' and
>>> 'Documentation/process/5.Posting.rst'.
>>>
>>
>>
>> --
>> -Jirka
> 
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-01 12:04               ` Jirka Hladky
@ 2022-04-04 17:41                 ` Minchan Kim
  2022-04-20  8:02                   ` Jirka Hladky
  0 siblings, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2022-04-04 17:41 UTC (permalink / raw)
  To: Jirka Hladky, tj, gregkh
  Cc: tj, linux-kernel, regressions, Thorsten Leemhuis, Justin Forbes

On Fri, Apr 01, 2022 at 02:04:03PM +0200, Jirka Hladky wrote:
> > Could you decode exact source code line from the oops?
> 
> Yes - please see below [1].

Thanks.

> 
> > I think it's fine to attach in the reply because kernel test bot
> 
> OK. The reproducer is attached. Please unpack it and follow the
> instructions in the README file. [2]

Unfortunately, I failed to run the script on my machine.

> 
> Thanks a lot for looking into it!
> Jirka
> 
> [1]
> =============================================
> Source code line numbers for the Oops message
> =============================================
> 
> 1) RIP: 0010:kernfs_remove+0x8/0x50:
> (gdb) l *kernfs_remove+0x8
> 0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
> 43       * Return the kernfs_root @kn belongs to.
> 44       */
> 45      static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> 46      {
> 47              /* if parent exists, it's always a dir; otherwise, @sd
> is a dir */
> 48              if (kn->parent)
> 49                      kn = kn->parent;
> 50              return kn->dir.root;
> 51      }
> 
> And here are source code lines from the 5 first functions in call trace:
> [ 8563.366280] Call Trace:
> [ 8563.366280]  <TASK>
> [ 8563.366280]  rdt_kill_sb+0x29d/0x350
> [ 8563.366280]  deactivate_locked_super+0x36/0xa0
> [ 8563.366280]  cleanup_mnt+0x131/0x190
> [ 8563.366280]  task_work_run+0x5c/0x90
> [ 8563.366280]  exit_to_user_mode_prepare+0x229/0x230
> [ 8563.366280]  syscall_exit_to_user_mode+0x18/0x40
> [ 8563.366280]  do_syscall_64+0x48/0x90
> [ 8563.366280]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> 2)(gdb) l *rdt_kill_sb+0x29d
> 0xffffffff810506bd is in rdt_kill_sb
> (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
> 2437            /* Notify online CPUs to update per cpu storage and
> PQR_ASSOC MSR */
> 2438            update_closid_rmid(cpu_online_mask, &rdtgroup_default);
> 2439
> 2440            kernfs_remove(kn_info);
> 2441            kernfs_remove(kn_mongrp);
> 2442            kernfs_remove(kn_mondata);
> 2443    }
> 
> 3)(gdb) l *deactivate_locked_super+0x36
> 0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
> 337                     /*
> 338                      * Since list_lru_destroy() may sleep, we
> cannot call it from
> 339                      * put_super(), where we hold the sb_lock.
> Therefore we destroy
> 340                      * the lru lists right now.
> 341                      */
> 342                     list_lru_destroy(&s->s_dentry_lru);
> 343                     list_lru_destroy(&s->s_inode_lru);
> 344
> 345                     put_filesystem(fs);
> 346                     put_super(s);
> 
> 4) (gdb) l *cleanup_mnt+0x131
> 0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
> 132             return 0;
> 133     }
> 134
> 135     static void mnt_free_id(struct mount *mnt)
> 136     {
> 137             ida_free(&mnt_id_ida, mnt->mnt_id);
> 138     }
> 
> 5) (gdb) l *task_work_run+0x5c
> 0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
> 2012
> 2013    DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> 2014
> 2015    static __always_inline int _cond_resched(void)
> 2016    {
> 2017            return static_call_mod(cond_resched)();
> 2018    }
> 
> 6) (gdb) l *exit_to_user_mode_prepare+0x229
> 0xffffffff81176d19 is in exit_to_user_mode_prepare
> (./include/linux/tracehook.h:189).
> 184              * This barrier pairs with
> task_work_add()->set_notify_resume() after
> 185              * hlist_add_head(task->task_works);
> 186              */
> 187             smp_mb__after_atomic();
> 188             if (unlikely(current->task_works))
> 189                     task_work_run();
> 190
> 191     #ifdef CONFIG_KEYS_REQUEST_CACHE
> 192             if (unlikely(current->cached_requested_key)) {
> 193                     key_put(current->cached_requested_key);
> 
> [2]
> =============================================
> Reproducer - README
> =============================================
> 
> 1) HW
> This issue seems to be platform specific. I was not able to reproduce
> it on AMD Zen and also not on Intel Ice Lake platform.
> I see the issue on dual socket Intel Skylake systems. Reproduced on a
> Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPU.

Based on your report, the kernel crashed because kn_mondata was NULL:

  rdt_kill_sb
    rmdir_all_sub
      ..
      kernfs_remove(kn_mondata);
        struct kernfs_root *root = kernfs_root(kn); <-- crashed
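
As an illustrative aside: the oops reported earlier in this thread shows a
read fault at address 0x8 with RDI = 0. That is what one would expect when a
field located 8 bytes into a structure is read through a NULL pointer, which
fits the kn->parent load in kernfs_root(). The sketch below only demonstrates
the arithmetic. The struct is a hypothetical stand-in (two 32-bit counters
ahead of the parent pointer), not the real kernel layout, and the names
node_sketch and parent_field_address are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the start of a kernfs-like node: two 32-bit
 * counters followed by the parent pointer. Assumption for illustration,
 * not the real struct kernfs_node.
 */
struct node_sketch {
	int32_t count;
	int32_t active;
	struct node_sketch *parent;
};

/*
 * Address that a load of kn->parent would touch for a node located at
 * kn_addr. Computed with offsetof(), so nothing is dereferenced.
 */
static uintptr_t parent_field_address(uintptr_t kn_addr)
{
	return kn_addr + offsetof(struct node_sketch, parent);
}
```

With kn_addr = 0 (a NULL node) the computed address is 8, matching the CR2
value in the report.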


Before my patch [1], it worked like this:

  rdt_kill_sb
    rmdir_all_sub
      ..
      kernfs_remove(kn_mondata);
        down_write(&kernfs_rwsem);
          if (!kn)
            return;
        up_write(&kernfs_rwsem);

IOW, kernfs_remove used to accept a NULL argument and simply bail out,
but with my patch [1] it no longer does.

This raises a question for the kernfs maintainers:

Should the kernfs_remove API support a NULL parameter? If so, can we
support it atomically without the old global kernfs_rwsem?

[1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-04 17:41                 ` Minchan Kim
@ 2022-04-20  8:02                   ` Jirka Hladky
  2022-04-21 16:47                     ` Tejun Heo
  0 siblings, 1 reply; 21+ messages in thread
From: Jirka Hladky @ 2022-04-20  8:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: tj, Greg Kroah-Hartman, linux-kernel, regressions,
	Thorsten Leemhuis, Justin Forbes

Hi Minchan,

have you heard back from the kernfs maintainers?

Thank you!
Jirka


On Mon, Apr 4, 2022 at 7:41 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Apr 01, 2022 at 02:04:03PM +0200, Jirka Hladky wrote:
> > > Could you decode exact source code line from the oops?
> >
> > Yes - please see below [1].
>
> Thanks.
>
> >
> > > I think it's fine to attach in the reply because kernel test bot
> >
> > OK. The reproducer is attached. Please unpack it and follow the
> > instructions in the README file. [2]
>
> Unfortunately, I failed to run the script on my machine.
>
> >
> > Thanks a lot for looking into it!
> > Jirka
> >
> > [1]
> > =============================================
> > Source code line numbers for the Oops message
> > =============================================
> >
> > 1) RIP: 0010:kernfs_remove+0x8/0x50:
> > (gdb) l *kernfs_remove+0x8
> > 0xffffffff81418588 is in kernfs_remove (fs/kernfs/kernfs-internal.h:48).
> > 43       * Return the kernfs_root @kn belongs to.
> > 44       */
> > 45      static inline struct kernfs_root *kernfs_root(struct kernfs_node *kn)
> > 46      {
> > 47              /* if parent exists, it's always a dir; otherwise, @sd
> > is a dir */
> > 48              if (kn->parent)
> > 49                      kn = kn->parent;
> > 50              return kn->dir.root;
> > 51      }
> >
> > And here are source code lines from the 5 first functions in call trace:
> > [ 8563.366280] Call Trace:
> > [ 8563.366280]  <TASK>
> > [ 8563.366280]  rdt_kill_sb+0x29d/0x350
> > [ 8563.366280]  deactivate_locked_super+0x36/0xa0
> > [ 8563.366280]  cleanup_mnt+0x131/0x190
> > [ 8563.366280]  task_work_run+0x5c/0x90
> > [ 8563.366280]  exit_to_user_mode_prepare+0x229/0x230
> > [ 8563.366280]  syscall_exit_to_user_mode+0x18/0x40
> > [ 8563.366280]  do_syscall_64+0x48/0x90
> > [ 8563.366280]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > 2)(gdb) l *rdt_kill_sb+0x29d
> > 0xffffffff810506bd is in rdt_kill_sb
> > (arch/x86/kernel/cpu/resctrl/rdtgroup.c:2442).
> > 2437            /* Notify online CPUs to update per cpu storage and
> > PQR_ASSOC MSR */
> > 2438            update_closid_rmid(cpu_online_mask, &rdtgroup_default);
> > 2439
> > 2440            kernfs_remove(kn_info);
> > 2441            kernfs_remove(kn_mongrp);
> > 2442            kernfs_remove(kn_mondata);
> > 2443    }
> >
> > 3)(gdb) l *deactivate_locked_super+0x36
> > 0xffffffff813650f6 is in deactivate_locked_super (fs/super.c:342).
> > 337                     /*
> > 338                      * Since list_lru_destroy() may sleep, we
> > cannot call it from
> > 339                      * put_super(), where we hold the sb_lock.
> > Therefore we destroy
> > 340                      * the lru lists right now.
> > 341                      */
> > 342                     list_lru_destroy(&s->s_dentry_lru);
> > 343                     list_lru_destroy(&s->s_inode_lru);
> > 344
> > 345                     put_filesystem(fs);
> > 346                     put_super(s);
> >
> > 4) (gdb) l *cleanup_mnt+0x131
> > 0xffffffff813890a1 is in cleanup_mnt (fs/namespace.c:137).
> > 132             return 0;
> > 133     }
> > 134
> > 135     static void mnt_free_id(struct mount *mnt)
> > 136     {
> > 137             ida_free(&mnt_id_ida, mnt->mnt_id);
> > 138     }
> >
> > 5) (gdb) l *task_work_run+0x5c
> > 0xffffffff8110620c is in task_work_run (./include/linux/sched.h:2017).
> > 2012
> > 2013    DECLARE_STATIC_CALL(cond_resched, __cond_resched);
> > 2014
> > 2015    static __always_inline int _cond_resched(void)
> > 2016    {
> > 2017            return static_call_mod(cond_resched)();
> > 2018    }
> >
> > 6) (gdb) l *exit_to_user_mode_prepare+0x229
> > 0xffffffff81176d19 is in exit_to_user_mode_prepare
> > (./include/linux/tracehook.h:189).
> > 184              * This barrier pairs with
> > task_work_add()->set_notify_resume() after
> > 185              * hlist_add_head(task->task_works);
> > 186              */
> > 187             smp_mb__after_atomic();
> > 188             if (unlikely(current->task_works))
> > 189                     task_work_run();
> > 190
> > 191     #ifdef CONFIG_KEYS_REQUEST_CACHE
> > 192             if (unlikely(current->cached_requested_key)) {
> > 193                     key_put(current->cached_requested_key);
> >
> > [2]
> > =============================================
> > Reproducer - README
> > =============================================
> >
> > 1) HW
> > This issue seems to be platform specific. I was not able to reproduce
> > it on AMD Zen and also not on Intel Ice Lake platform.
> > I see the issue on dual socket Intel Skylake systems. Reproduced on a
> > Supermicro Super Server/X11DDW-L with 2x Xeon Gold 6126 CPU.
>
> Based on your report, the kernel crashed because kn_mondata was NULL:
>
>   rdt_kill_sb
>     rmdir_all_sub
>       ..
>       kernfs_remove(kn_mondata);
>         struct kernfs_root *root = kernfs_root(kn); <-- crashed
>
>
> Before my patch [1], it worked like this:
>
>   rdt_kill_sb
>     rmdir_all_sub
>       ..
>       kernfs_remove(kn_mondata);
>         down_write(&kernfs_rwsem);
>           if (!kn)
>             return;
>         up_write(&kernfs_rwsem);
>
> IOW, kernfs_remove used to accept a NULL argument and simply bail out,
> but with my patch [1] it no longer does.
>
> This raises a question for the kernfs maintainers:
>
> Should the kernfs_remove API support a NULL parameter? If so, can we
> support it atomically without the old global kernfs_rwsem?
>
> [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-20  8:02                   ` Jirka Hladky
@ 2022-04-21 16:47                     ` Tejun Heo
  2022-04-22 18:27                       ` Minchan Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2022-04-21 16:47 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: Minchan Kim, Greg Kroah-Hartman, linux-kernel, regressions,
	Thorsten Leemhuis, Justin Forbes

Sorry about the late reply.

On Wed, Apr 20, 2022 at 10:02:20AM +0200, Jirka Hladky wrote:
> > Based on your report, the kernel crashed because kn_mondata was NULL:
> >
> >   rdt_kill_sb
> >     rmdir_all_sub
> >       ..
> >       kernfs_remove(kn_mondata);
> >         struct kernfs_root *root = kernfs_root(kn); <-- crashed
> >
> >
> > Before my patch [1], it worked like this:
> >
> >   rdt_kill_sb
> >     rmdir_all_sub
> >       ..
> >       kernfs_remove(kn_mondata);
> >         down_write(&kernfs_rwsem);
> >           if (!kn)
> >             return;
> >         up_write(&kernfs_rwsem);
> >
> > IOW, kernfs_remove used to accept a NULL argument and simply bail out,
> > but with my patch [1] it no longer does.
> >
> > This raises a question for the kernfs maintainers:
> >
> > Should the kernfs_remove API support a NULL parameter? If so, can we
> > support it atomically without the old global kernfs_rwsem?
> >
> > [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock

Yes, I mean, kernfs_remove() used to support NULL arg, so it should do the
same after the locking change too. Can you send a patch?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-21 16:47                     ` Tejun Heo
@ 2022-04-22 18:27                       ` Minchan Kim
  2022-04-22 18:44                         ` Thorsten Leemhuis
  0 siblings, 1 reply; 21+ messages in thread
From: Minchan Kim @ 2022-04-22 18:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jirka Hladky, Greg Kroah-Hartman, linux-kernel, regressions,
	Thorsten Leemhuis, Justin Forbes

On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> Sorry about the late reply.
> 
> On Wed, Apr 20, 2022 at 10:02:20AM +0200, Jirka Hladky wrote:
> > > Based on your report, the kernel crashed because kn_mondata was NULL:
> > >
> > >   rdt_kill_sb
> > >     rmdir_all_sub
> > >       ..
> > >       kernfs_remove(kn_mondata);
> > >         struct kernfs_root *root = kernfs_root(kn); <-- crashed
> > >
> > >
> > > Before my patch [1], it worked like this:
> > >
> > >   rdt_kill_sb
> > >     rmdir_all_sub
> > >       ..
> > >       kernfs_remove(kn_mondata);
> > >         down_write(&kernfs_rwsem);
> > >           if (!kn)
> > >             return;
> > >         up_write(&kernfs_rwsem);
> > >
> > > IOW, kernfs_remove used to accept a NULL argument and simply bail out,
> > > but with my patch [1] it no longer does.
> > >
> > > This raises a question for the kernfs maintainers:
> > >
> > > Should the kernfs_remove API support a NULL parameter? If so, can we
> > > support it atomically without the old global kernfs_rwsem?
> > >
> > > [1] 393c3714081a, kernfs: switch global kernfs_rwsem lock to per-fs lock
> 
> Yes, I mean, kernfs_remove() used to support NULL arg, so it should do the
> same after the locking change too. Can you send a patch?

Thanks for checking, Tejun.

Jirka, Could you test the patch? Once it's confirmed, I need to resend
it with Ccing stable.

Thanks.

From c7441bc659d2869f2d751b43f27356156e028513 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Fri, 22 Apr 2022 11:16:45 -0700
Subject: [PATCH] kernfs: fix NULL dereferencing in kernfs_remove

kernfs_remove used to accept a NULL kernfs_node param and bail out, but
the recent per-fs lock change introduced a regression: the param is
dereferenced without a NULL check, so the kernel crashes.

This patch checks for a NULL kernfs_node in kernfs_remove and, if so,
just returns.

Quote from bug report by Jirka

```
The bug is triggered by running NAS Parallel benchmark suite on
SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
log:

[  247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  247.036009] #PF: supervisor read access in kernel mode
[  247.036009] #PF: error_code(0x0000) - not-present page
[  247.036009] PGD 0 P4D 0
[  247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[  247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
[  247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
2.0b 03/07/2018
[  247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[  247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
c4 60
[  247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[  247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[  247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[  247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[  247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[  247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[  247.122048] FS:  00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
knlGS:0000000000000000
[  247.122048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[  247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  247.122048] PKRU: 55555554
[  247.122048] Call Trace:
[  247.122048]  <TASK>
[  247.122048]  rdt_kill_sb+0x29d/0x350
[  247.122048]  deactivate_locked_super+0x36/0xa0
[  247.122048]  cleanup_mnt+0x131/0x190
[  247.122048]  task_work_run+0x5c/0x90
[  247.122048]  exit_to_user_mode_prepare+0x229/0x230
[  247.122048]  syscall_exit_to_user_mode+0x18/0x40
[  247.122048]  do_syscall_64+0x48/0x90
[  247.122048]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  247.122048] RIP: 0033:0x7f01be2d735b
```

Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 fs/kernfs/dir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 61a8edc4ba8b..e205fde7163a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1406,7 +1406,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
  */
 void kernfs_remove(struct kernfs_node *kn)
 {
-	struct kernfs_root *root = kernfs_root(kn);
+	struct kernfs_root *root;
+
+	if (!kn)
+		return;
+
+	root = kernfs_root(kn);
 
 	down_write(&root->kernfs_rwsem);
 	__kernfs_remove(kn);
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog
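
For illustration only, the ordering used by the patch above (check for NULL
first, then look up the root and take its lock) can be tried in user space.
This is a minimal sketch under stated assumptions: mini_root, mini_node,
mini_node_root and mini_node_remove are hypothetical stand-ins, and a plain
integer counter stands in for root->kernfs_rwsem. None of it is kernel code.

```c
#include <stddef.h>

/* Hypothetical miniature types; the names only echo the kernfs ones. */
struct mini_root {
	int lock_depth;	/* stands in for root->kernfs_rwsem */
	int removed;	/* number of removals done under this root */
};

struct mini_node {
	struct mini_node *parent;
	struct mini_root *root;	/* only meaningful on a parentless node */
};

/* Same shape as kernfs_root(): must never be called with NULL. */
static struct mini_root *mini_node_root(struct mini_node *kn)
{
	if (kn->parent)
		kn = kn->parent;
	return kn->root;
}

/* Returns 1 if a node was removed, 0 if kn was NULL. */
static int mini_node_remove(struct mini_node *kn)
{
	struct mini_root *root;

	/* Bail out before anything dereferences kn. */
	if (!kn)
		return 0;

	root = mini_node_root(kn);

	root->lock_depth++;	/* down_write() in the real code */
	root->removed++;	/* placeholder for the actual removal */
	root->lock_depth--;	/* up_write() in the real code */
	return 1;
}
```

Because the lock now lives in the root, the NULL check has to come before the
root lookup. mini_node_remove(NULL) returns 0 without touching any root,
which is the behaviour callers relied on before the regression.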


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-22 18:27                       ` Minchan Kim
@ 2022-04-22 18:44                         ` Thorsten Leemhuis
  2022-04-22 20:09                           ` Minchan Kim
  0 siblings, 1 reply; 21+ messages in thread
From: Thorsten Leemhuis @ 2022-04-22 18:44 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Jirka Hladky, Greg Kroah-Hartman, linux-kernel, regressions,
	Justin Forbes, Tejun Heo

On 22.04.22 20:27, Minchan Kim wrote:
> On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> [...]

Many thx for looking into this.

> Jirka, Could you test the patch? Once it's confirmed, I need to resend
> it with Ccing stable.

When you do so, could you please include proper "Link:" tags pointing
to all reports of the regression, as explained in the Linux kernel's
documentation (see 'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst')? E.g. in this case:

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
Link:
https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/

This concept is not new (Linus and quite a few other developers have used
them like this for a long time); I just recently improved those documents
to clarify things, as my regression tracking efforts rely on this (and
there might be other people and software out there that do) -- that's
why it makes my work a lot harder if such tags are missing. :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

> [...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-22 18:44                         ` Thorsten Leemhuis
@ 2022-04-22 20:09                           ` Minchan Kim
  2022-04-25 21:34                             ` Jirka Hladky
  2022-04-26  9:43                             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 21+ messages in thread
From: Minchan Kim @ 2022-04-22 20:09 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jirka Hladky, Greg Kroah-Hartman, linux-kernel, regressions,
	Justin Forbes, Tejun Heo

On Fri, Apr 22, 2022 at 08:44:17PM +0200, Thorsten Leemhuis wrote:
> On 22.04.22 20:27, Minchan Kim wrote:
> > On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> > [...]
> 
> Many thx for looking into this.
> 
> > Jirka, Could you test the patch? Once it's confirmed, I need to resend
> > it with Ccing stable.
> 
> When you do so, could you please include proper "Link:" tags pointing
> to all reports of the regression, as explained in the Linux kernel's
> documentation (see 'Documentation/process/submitting-patches.rst' and
> 'Documentation/process/5.Posting.rst')? E.g. in this case:
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
> Link:
> https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/

Sure. Will do that.

> 
> This concept is not new (Linus and quite a few other developers have used
> them like this for a long time); I just recently improved those documents
> to clarify things, as my regression tracking efforts rely on this (and
> there might be other people and software out there that do) -- that's
> why it makes my work a lot harder if such tags are missing. :-/
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
> 
> > [...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-22 20:09                           ` Minchan Kim
@ 2022-04-25 21:34                             ` Jirka Hladky
  2022-04-26  9:43                             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 21+ messages in thread
From: Jirka Hladky @ 2022-04-25 21:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Thorsten Leemhuis, Greg Kroah-Hartman, linux-kernel, regressions,
	Justin Forbes, Tejun Heo

Hi Minchan,

I have tested the proposed patch and it fixes the issue!

Thanks a lot for your help!
Jirka


On Fri, Apr 22, 2022 at 10:09 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Fri, Apr 22, 2022 at 08:44:17PM +0200, Thorsten Leemhuis wrote:
> > On 22.04.22 20:27, Minchan Kim wrote:
> > > On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> > > [...]
> >
> > Many thx for looking into this.
> >
> > > Jirka, Could you test the patch? Once it's confirmed, I need to resend
> > > it with Ccing stable.
> >
> > When you do so, could you please include proper "Link:" tags pointing
> > to all reports of the regression, as explained in the Linux kernel's
> > documentation (see 'Documentation/process/submitting-patches.rst' and
> > 'Documentation/process/5.Posting.rst')? E.g. in this case:
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
> > Link:
> > https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
>
> Sure. Will do that.
>
> >
> > This concept is not new (Linus and quite a few other developers have used
> > them like this for a long time); I just recently improved those documents
> > to clarify things, as my regression tracking efforts rely on this (and
> > there might be other people and software out there that do) -- that's
> > why it makes my work a lot harder if such tags are missing. :-/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >
> > P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> > reports on my table. I can only look briefly into most of them and lack
> > knowledge about most of the areas they concern. I thus unfortunately
> > will sometimes get things wrong or miss something important. I hope
> > that's not the case here; if you think it is, don't hesitate to tell me
> > in a public reply, it's in everyone's interest to set the public record
> > straight.
> >
> > > [...]
>


-- 
-Jirka


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers
  2022-04-22 20:09                           ` Minchan Kim
  2022-04-25 21:34                             ` Jirka Hladky
@ 2022-04-26  9:43                             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2022-04-26  9:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Thorsten Leemhuis, Jirka Hladky, linux-kernel, regressions,
	Justin Forbes, Tejun Heo

On Fri, Apr 22, 2022 at 01:09:36PM -0700, Minchan Kim wrote:
> On Fri, Apr 22, 2022 at 08:44:17PM +0200, Thorsten Leemhuis wrote:
> > On 22.04.22 20:27, Minchan Kim wrote:
> > > On Thu, Apr 21, 2022 at 06:47:41AM -1000, Tejun Heo wrote:
> > > [...]
> > 
> > Many thx for looking into this.
> > 
> > > Jirka, Could you test the patch? Once it's confirmed, I need to resend
> > > it with Ccing stable.
> > 
> > > When you do so, could you please include proper "Link:" tags pointing
> > > to all reports of the regression, as explained in the Linux kernel's
> > > documentation (see 'Documentation/process/submitting-patches.rst' and
> > > 'Documentation/process/5.Posting.rst')? E.g. in this case:
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
> > Link:
> > https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
> 
> Sure. Will do that.

Did this ever get sent? I can't find it in my queue anywhere...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-04-26  9:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 23:29 PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers Jirka Hladky
2022-03-21 23:37 ` Jirka Hladky
2022-03-22  7:12   ` Greg KH
2022-03-22 10:19     ` Jirka Hladky
2022-03-24 11:49 ` Thorsten Leemhuis
2022-03-30 22:16   ` Jirka Hladky
2022-03-30 22:24     ` Jirka Hladky
2022-03-31  0:11       ` Minchan Kim
2022-03-31 14:54         ` Justin Forbes
2022-03-31 16:18           ` Jirka Hladky
2022-03-31 23:33             ` Minchan Kim
2022-04-01 12:04               ` Jirka Hladky
2022-04-04 17:41                 ` Minchan Kim
2022-04-20  8:02                   ` Jirka Hladky
2022-04-21 16:47                     ` Tejun Heo
2022-04-22 18:27                       ` Minchan Kim
2022-04-22 18:44                         ` Thorsten Leemhuis
2022-04-22 20:09                           ` Minchan Kim
2022-04-25 21:34                             ` Jirka Hladky
2022-04-26  9:43                             ` Greg Kroah-Hartman
2022-04-04  6:37       ` PANIC: "Oops: 0000 [#1] PREEMPT SMP PTI" starting from 5.17 on dual socket Intel Xeon Gold servers #forregzbot Thorsten Leemhuis
