linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
@ 2023-08-23  4:06 Ming Lei
  2023-08-23  8:47 ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2023-08-23  4:06 UTC (permalink / raw)
  To: linux-fsdevel, Christian Brauner, Alexander Viro, linux-kernel
  Cc: ming.lei, linux-scsi, Changhui Zhong


Looks the issue is more related with vfs, so forward to vfs list.

----- Forwarded message from Changhui Zhong <czhong@redhat.com> -----

Date: Wed, 23 Aug 2023 11:17:55 +0800
From: Changhui Zhong <czhong@redhat.com>
To: linux-scsi@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278

Hello,

triggered below warning issue with branch
"
Tree: mainline.kernel.org-clang
Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
@ master
Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
Commit Name: v6.5-rc7-18-g89bf6209cad6
Kernel information:
Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
"
for more detail,please check
https://datawarehouse.cki-project.org/kcidb/tests/9232643

#modprobe scsi_debug virtual_gb=128
#echo none > /sys/block/sdb/queue/scheduler
#fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
--name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
--runtime=60

[ 3056.092761] Device: sdb  Engine: libaio Sched: none Pattern: randrw
 Direct: 1 Depth: 1  Block size: 4K Size: 60G
[ 3117.055168] ------------[ cut here ]------------
[ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365
__dentry_kill+0x214/0x278
[ 3117.067601] Modules linked in: scsi_debug nvme nvme_core
nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc
vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf
ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu
acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic
ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last
unloaded: scsi_debug]
[ 3117.103572] CPU: 121 PID: 93233 Comm: bash Not tainted 6.5.0-rc7 #1
[ 3117.109827] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[ 3117.119119] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3117.126068] pc : __dentry_kill+0x214/0x278
[ 3117.130152] lr : __dentry_kill+0x194/0x278
[ 3117.134236] sp : ffff800084993870
[ 3117.137537] x29: ffff800084993870 x28: ffff0800830e2200 x27: 0000000000080400
[ 3117.144661] x26: 00000000fff7fbff x25: 0000000000000001 x24: ffff07ff88473198
[ 3117.151783] x23: ffff07ff884731c0 x22: ffff07ff9d033c80 x21: ffff07ff884731d0
[ 3117.158906] x20: ffff07ff88473198 x19: ffff07ff88473140 x18: ffffbc0d0739ceb4
[ 3117.166028] x17: 0000000000000000 x16: ffffbc0d073528c0 x15: ffff07ff89c761f8
[ 3117.173151] x14: 0000000000000002 x13: 0000000000000000 x12: 0000000000000001
[ 3117.180273] x11: ffff080f33fe0850 x10: ffffffffffffffff x9 : 0000000100000000
[ 3117.187395] x8 : ffffbc0d08aa1e98 x7 : 0000000000000000 x6 : 000000000000003f
[ 3117.194518] x5 : ffff800084993a30 x4 : ffff800084993908 x3 : ffff07ff9cb7e210
[ 3117.201640] x2 : ffff07ff884731d0 x1 : 0000000000000000 x0 : ffff07ff884731f0
[ 3117.208763] Call trace:
[ 3117.211197]  __dentry_kill+0x214/0x278
[ 3117.214934]  shrink_dentry_list+0x134/0x2b0
[ 3117.219105]  prune_dcache_sb+0x64/0xa0
[ 3117.222842]  super_cache_scan+0x144/0x198
[ 3117.226841]  do_shrink_slab+0x1dc/0x420
[ 3117.230666]  shrink_slab+0x114/0x388
[ 3117.234229]  drop_slab+0xb0/0x118
[ 3117.237532]  drop_caches_sysctl_handler+0xac/0x170
[ 3117.242313]  proc_sys_call_handler+0x184/0x2d0
[ 3117.246745]  proc_sys_write+0x20/0x38
[ 3117.250396]  vfs_write+0x24c/0x368
[ 3117.253786]  ksys_write+0x84/0xf8
[ 3117.257089]  __arm64_sys_write+0x28/0x40
[ 3117.261000]  invoke_syscall+0x78/0x110
[ 3117.264739]  el0_svc_common+0xc0/0xf8
[ 3117.268390]  do_el0_svc+0x3c/0xb8
[ 3117.271693]  el0_svc+0x34/0x110
[ 3117.274825]  el0t_64_sync_handler+0x84/0x100
[ 3117.279083]  el0t_64_sync+0x194/0x198
[ 3117.282733] ---[ end trace 0000000000000000 ]---
[ 3119.372909] Device: sdb  Engine: libaio Sched: none Pattern: randrw
 Direct: 1 Depth: 1  Block size: 16K Size: 60G
[ 3144.068984] watchdog: BUG: soft lockup - CPU#97 stuck for 26s! [fio:93777]
[ 3144.069984] watchdog: BUG: soft lockup - CPU#99 stuck for 26s!
[systemd-udevd:1680]
[ 3144.075849] Modules linked in: scsi_debug nvme
[ 3144.083493] Modules linked in:
[ 3144.087924]  nvme_core nvme_common null_blk pktcdvd
[ 3144.090967]  scsi_debug
[ 3144.090968]  ipmi_watchdog ipmi_poweroff rfkill
[ 3144.098267]  nvme
[ 3144.098267]  sunrpc vfat fat
[ 3144.102785]  nvme_core
[ 3144.104698]  acpi_ipmi
[ 3144.107566]  nvme_common
[ 3144.109912]  ipmi_ssif
[ 3144.112259]  null_blk
[ 3144.114779]  arm_spe_pmu
[ 3144.117126]  pktcdvd
[ 3144.119385]  igb ipmi_devintf
[ 3144.121905]  ipmi_watchdog
[ 3144.124078]  ipmi_msghandler
[ 3144.127033]  ipmi_poweroff
[ 3144.129728]  arm_cmn
[ 3144.132596]  rfkill
[ 3144.135289]  arm_dmc620_pmu
[ 3144.137462]  sunrpc
[ 3144.139548]  cppc_cpufreq
[ 3144.142329]  vfat
[ 3144.144415]  arm_dsu_pmu
[ 3144.147022]  fat
[ 3144.148934]  acpi_tad loop
[ 3144.151455]  acpi_ipmi
[ 3144.153280]  fuse
[ 3144.155974]  ipmi_ssif
[ 3144.158320]  zram
[ 3144.160233]  arm_spe_pmu
[ 3144.162579]  xfs
[ 3144.164493]  igb
[ 3144.167012]  crct10dif_ce
[ 3144.168838]  ipmi_devintf
[ 3144.170664]  polyval_ce
[ 3144.173271]  ipmi_msghandler
[ 3144.175877]  polyval_generic
[ 3144.178312]  arm_cmn
[ 3144.181178]  ghash_ce
[ 3144.184047]  arm_dmc620_pmu
[ 3144.186219]  sbsa_gwdt
[ 3144.188479]  cppc_cpufreq
[ 3144.191259]  ast
[ 3144.193606]  arm_dsu_pmu
[ 3144.196213]  onboard_usb_hub
[ 3144.198039]  acpi_tad
[ 3144.200559]  i2c_algo_bit
[ 3144.203427]  loop
[ 3144.205686]  xgene_hwmon
[ 3144.208294]  fuse
[ 3144.210206]  [last unloaded: scsi_debug]
[ 3144.212726]  zram
[ 3144.214638]
[ 3144.214640] CPU: 97 PID: 93777 Comm: fio Tainted: G        W
  6.5.0-rc7 #1
[ 3144.218548]  xfs
[ 3144.220459] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[ 3144.221939]  crct10dif_ce
[ 3144.229493] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3144.231320]  polyval_ce
[ 3144.240610] pc : d_alloc_parallel+0x204/0x520
[ 3144.243218]  polyval_generic
[ 3144.250164] lr : d_alloc_parallel+0x128/0x520
[ 3144.252598]  ghash_ce
[ 3144.256940] sp : ffff800083ecb860
[ 3144.259809]  sbsa_gwdt
[ 3144.264151] x29: ffff800083ecb8b0
[ 3144.266411]  ast
[ 3144.269712]  x28: 00000000e28e44d4
[ 3144.272059]  onboard_usb_hub
[ 3144.275360]  x27: ffff0800c7e6c400
[ 3144.277186]  i2c_algo_bit
[ 3144.280574]
[ 3144.280575] x26: ffff07ff9dfb48d0
[ 3144.283442]  xgene_hwmon
[ 3144.286830]  x25: 0000000000011b30
[ 3144.289437]  [last unloaded: scsi_debug]
[ 3144.290915]  x24: ffff07ffe89be298
[ 3144.294217]
[ 3144.296737]
[ 3144.296738] x23: ffff07ffe89be240
[ 3144.300126] CPU: 99 PID: 1680 Comm: systemd-udevd Tainted: G
W          6.5.0-rc7 #1
[ 3144.304034]  x22: ffff800083ecba20 x21: ffffbc0d084e9000
[ 3144.307424] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[ 3144.308902]
[ 3144.308903] x20: ffffbc0d08aa1e98
[ 3144.310381] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3144.313682]  x19: ffff07ff9eeac918 x18: 00000000fffffffb
[ 3144.322019] pc : d_alloc_parallel+0x17c/0x520
[ 3144.327316]
[ 3144.327317] x17: ffff07ff93f1c021
[ 3144.336608] lr : d_alloc_parallel+0x128/0x520
[ 3144.338086]  x16: 636f6c622f000000 x15: fefefefeff727372
[ 3144.341388] sp : ffff8000844ab9b0
[ 3144.348335]
[ 3144.348336] x14: fefefefeff7b7b7b
[ 3144.353634] x29: ffff8000844aba00
[ 3144.357976]  x13: 0000000000000000 x12: 00000000e28e44d4
[ 3144.359456]  x28: 00000000e777e234
[ 3144.362757]
[ 3144.362757] x11: ffff080f32b00000
[ 3144.367100]  x27: ffff07ff96888000
[ 3144.372398]  x10: 0000000000000000 x9 : ffff07ff93f1c037
[ 3144.375700]
[ 3144.377178]
[ 3144.377179] x8 : 0000000000000000
[ 3144.380480] x26: ffff07ff9de47890
[ 3144.383781]  x7 : 7374736575716572
[ 3144.389080]  x25: 0000000000011b30
[ 3144.392467]  x6 : 65757165725f726e
[ 3144.393947]  x24: ffff07ff88faf618
[ 3144.397248]
[ 3144.397249] x5 : ffff07ff9dfb4883
[ 3144.400637]
[ 3144.405934]  x4 : ffff07ff93f1c042
[ 3144.407414] x23: ffff07ff88faf5c0
[ 3144.408892]  x3 : ffff07ff9dfb4bd0
[ 3144.412193]  x22: ffff8000844abb30
[ 3144.415494]
[ 3144.415495] x2 : ffff800083ecb87c
[ 3144.418883]  x21: ffffbc0d084e9000
[ 3144.422271]  x1 : ffff800083ecba20 x0 : 0000000000000000
[ 3144.425660]
[ 3144.429048]
[ 3144.429049] Call trace:
[ 3144.430527] x20: ffffbc0d08aa1e98
[ 3144.433828]  d_alloc_parallel+0x204/0x520
[ 3144.435307]  x19: 00000000000000e8
[ 3144.438695]  __lookup_slow+0x6c/0x158
[ 3144.441997]  x18: 00000000fffffffb
[ 3144.445385]  lookup_slow+0x4c/0x78
[ 3144.448774]
[ 3144.450252]  walk_component+0x10c/0x128
[ 3144.453553] x17: ffff07ff9400a021
[ 3144.456941]  path_lookupat+0x60/0x140
[ 3144.462240]  x16: 697665642f000000
[ 3144.463718]  filename_lookup+0xd8/0x1d0
[ 3144.465197]  x15: 722e6a626e6b612e
[ 3144.467629]  vfs_statx+0x90/0x220
[ 3144.470932]
[ 3144.474927]  __arm64_sys_newfstatat+0xa0/0x100
[ 3144.478316] x14: 7aff6b6b7f6b6bff
[ 3144.481964]  invoke_syscall+0x78/0x110
[ 3144.485353]  x13: 0000000000000000
[ 3144.488741]  el0_svc_common+0xc0/0xf8
[ 3144.490219]  x12: 00000000e777e234
[ 3144.494041]  do_el0_svc+0x3c/0xb8
[ 3144.497343]
[ 3144.500991]  el0_svc+0x34/0x110
[ 3144.504380] x11: ffff080f32b00000
[ 3144.508201]  el0t_64_sync_handler+0x84/0x100
[ 3144.511590]  x10: 0000000000000000
[ 3144.514891]  el0t_64_sync+0x194/0x198
[ 3144.516370]  x9 : ffff07ff9400a06b
[ 3144.564153] x8 : ffff07ff884731f1 x7 : 25732500716d016b x6 : 0000000032757063
[ 3144.571280] x5 : ffff07ff9de4783d x4 : ffff07ff9400a070 x3 : ffff07ff9de46c90
[ 3144.578407] x2 : ffff8000844ab9cc x1 : ffff8000844abb30 x0 : 0000000000000000
[ 3144.585534] Call trace:
[ 3144.587969]  d_alloc_parallel+0x17c/0x520
[ 3144.591971]  path_openat+0x238/0xc70
[ 3144.595539]  do_filp_open+0xc4/0x178
[ 3144.599107]  do_sys_openat2+0x90/0x100
[ 3144.602847]  __arm64_sys_openat+0x7c/0xb0
[ 3144.606848]  invoke_syscall+0x78/0x110
[ 3144.610591]  el0_svc_common+0x94/0xf8
[ 3144.614247]  do_el0_svc+0x3c/0xb8
[ 3144.617555]  el0_svc+0x34/0x110
[ 3144.620690]  el0t_64_sync_handler+0x84/0x100
[ 3144.624954]  el0t_64_sync+0x194/0x198
[ 3168.068867] watchdog: BUG: soft lockup - CPU#97 stuck for 48s! [fio:93777]
[ 3168.069867] watchdog: BUG: soft lockup - CPU#99 stuck for 48s!
[systemd-udevd:1680]

Thanks,


----- End forwarded message -----

-- 
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-08-23  4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei
@ 2023-08-23  8:47 ` Christian Brauner
  2023-08-28 10:43   ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2023-08-23  8:47 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong

On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote:
> 
> Looks the issue is more related with vfs, so forward to vfs list.
> 
> ----- Forwarded message from Changhui Zhong <czhong@redhat.com> -----
> 
> Date: Wed, 23 Aug 2023 11:17:55 +0800
> From: Changhui Zhong <czhong@redhat.com>
> To: linux-scsi@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> 
> Hello,
> 
> triggered below warning issue with branch
> "
> Tree: mainline.kernel.org-clang
> Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> @ master
> Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
> Commit Name: v6.5-rc7-18-g89bf6209cad6
> Kernel information:
> Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> "
> for more detail,please check
> https://datawarehouse.cki-project.org/kcidb/tests/9232643
> 
> #modprobe scsi_debug virtual_gb=128
> #echo none > /sys/block/sdb/queue/scheduler
> #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
> --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
> --runtime=60

Looking at this issue it seems unlikely that this is a vfs bug.
We should see this all over the place and specifically not just on arm64.

The sequence here seems to be:

echo 4 > /proc/sys/vm/drop_caches
rmmod scsi_debug > /dev/null 3>&1

[ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278 
[ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]

So my money is on some device that gets removed still having an
increased refcount and pinning the dentry. Immediate suspects would be:

7882541ca06d ("of/platform: increase refcount of fwnode")

but that part is complete speculation on my part.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-08-23  8:47 ` Christian Brauner
@ 2023-08-28 10:43   ` Ming Lei
  2023-09-13  8:59     ` Yi Zhang
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2023-08-28 10:43 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, Yi Zhang

On Wed, Aug 23, 2023 at 4:47 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote:
> >
> > Looks the issue is more related with vfs, so forward to vfs list.
> >
> > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> -----
> >
> > Date: Wed, 23 Aug 2023 11:17:55 +0800
> > From: Changhui Zhong <czhong@redhat.com>
> > To: linux-scsi@vger.kernel.org
> > Cc: Ming Lei <ming.lei@redhat.com>
> > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> >
> > Hello,
> >
> > triggered below warning issue with branch
> > "
> > Tree: mainline.kernel.org-clang
> > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > @ master
> > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
> > Commit Name: v6.5-rc7-18-g89bf6209cad6
> > Kernel information:
> > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> > "
> > for more detail,please check
> > https://datawarehouse.cki-project.org/kcidb/tests/9232643
> >
> > #modprobe scsi_debug virtual_gb=128
> > #echo none > /sys/block/sdb/queue/scheduler
> > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
> > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
> > --runtime=60
>
> Looking at this issue it seems unlikely that this is a vfs bug.
> We should see this all over the place and specifically not just on arm64.
>
> The sequence here seems to be:
>
> echo 4 > /proc/sys/vm/drop_caches
> rmmod scsi_debug > /dev/null 3>&1
>
> [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
>
> So my money is on some device that gets removed still having an
> increased refcount and pinning the dentry. Immediate suspects would be:
>
> 7882541ca06d ("of/platform: increase refcount of fwnode")
>
> but that part is complete speculation on my part.

BTW, just saw another panic on 6.5-rc7, still scsi_debug test on arm64:

[  959.371726] sr 50:0:0:0: Attached scsi generic sg1 type 5
[  959.603145] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[  959.603155] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[  959.603950] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[  959.604052] scsi 49:0:0:0: Power-on or device reset occurred
[  959.609336] sr 49:0:0:0: [sr1] scsi-1 drive
[  959.611360] scsi 48:0:0:0: Power-on or device reset occurred
[  959.614540] Unable to handle kernel paging request at virtual
address 65888c2e6fe694d5
[  959.614544] Mem abort info:
[  959.614545]   ESR = 0x0000000096000004
[  959.614547]   EC = 0x25: DABT (current EL), IL = 32 bits
[  959.614550]   SET = 0, FnV = 0
[  959.614552]   EA = 0, S1PTW = 0
[  959.614553]   FSC = 0x04: level 0 translation fault
[  959.614555] Data abort info:
[  959.614556]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  959.614559]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  959.614561]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  959.614563] [65888c2e6fe694d5] address between user and kernel address ranges
[  959.614566] Internal error: Oops: 0000000096000004 [#1] SMP
[  959.614570] Modules linked in: pktcdvd scsi_debug ipmi_watchdog
ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu
igb ipmi_devintf arm_cmn ipmi_msghandler arm_dmc620_pmu arm_dsu_pmu
cppc_cpufreq acpi_tad loop fuse zram xfs nvme crct10dif_ce polyval_ce
nvme_core polyval_generic ghash_ce sbsa_gwdt nvme_common ast
onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
[  959.614620] CPU: 108 PID: 19529 Comm: check Not tainted 6.5.0-rc7 #1
[  959.614625] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[  959.614627] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  959.614632] pc : d_alloc_parallel+0x140/0x440
[  959.614641] lr : d_alloc_parallel+0xcc/0x440
[  959.614646] sp : ffff80008a7d3290
[  959.614647] x29: ffff80008a7d3290 x28: ffff07ff8230c530 x27: 65888c2e6fe69565
[  959.614654] x26: ffffcb72eac9e1d0 x25: ffff80008a7d33d8 x24: ffff07ffa0fd3800
[  959.614659] x23: 00000000000003c0 x22: 000000007a701548 x21: ffffcb72eac9ffd0
[  959.614664] x20: ffff07ffa0fd35c0 x19: ffffcb72ea6e9600 x18: ffffffffffffffff
[  959.614670] x17: 00000000440fd8e0 x16: 00000000b6431329 x15: ffff80008a7d3360
[  959.614675] x14: ffff80008a7d3508 x13: ffffcb72e94da6d0 x12: ffff80008a7d334c
[  959.614680] x11: 0000000c7a701548 x10: ffff3c8cb54590b8 x9 : ffffcb72e80832f4
[  959.614685] x8 : ffff07ffa0fd35c0 x7 : 7473696c5f71725f x6 : 0000000000200008
[  959.614690] x5 : ffffcb72ea6f4000 x4 : 00000000003d380a x3 : 0000000000000004
[  959.614696] x2 : ffff80008a7d331c x1 : ffff07ffe1760000 x0 : 0000000000005000
[  959.614701] Call trace:
[  959.614703]  d_alloc_parallel+0x140/0x440
[  959.614708]  __lookup_slow+0x64/0x158
[  959.614714]  lookup_one_len+0xac/0xc8
[  959.614719]  start_creating.part.0+0x88/0x198
[  959.614725]  __debugfs_create_file+0x70/0x230
[  959.614730]  debugfs_create_file+0x34/0x48
[  959.614734]  blk_mq_debugfs_register_hctx+0x154/0x1d0
[  959.614740]  blk_mq_debugfs_register+0xfc/0x1e0
[  959.614745]  blk_register_queue+0xc0/0x1f0
[  959.614750]  device_add_disk+0x1dc/0x3e0
[  959.614754]  sr_probe+0x2c0/0x368
[  959.614760]  really_probe+0x190/0x3d8
[  959.614766]  __driver_probe_device+0x84/0x180
[  959.614771]  driver_probe_device+0x44/0x120
[  959.614776]  __device_attach_driver+0xc4/0x168
[  959.614781]  bus_for_each_drv+0x8c/0xf0
[  959.614785]  __device_attach+0xa4/0x1c0
[  959.614790]  device_initial_probe+0x1c/0x30
[  959.614795]  bus_probe_device+0xb4/0xc0
[  959.614799]  device_add+0x508/0x6f8
[  959.614803]  scsi_sysfs_add_sdev+0x8c/0x250
[  959.614809]  scsi_add_lun+0x424/0x558
[  959.614813]  scsi_probe_and_add_lun+0x11c/0x430
[  959.614817]  __scsi_scan_target+0xb8/0x258
[  959.614821]  scsi_scan_channel+0xa0/0xb8
[  959.614825]  scsi_scan_host_selected+0x170/0x188
[  959.614830]  store_scan+0x194/0x1a8
[  959.614835]  dev_attr_store+0x20/0x40
[  959.614840]  sysfs_kf_write+0x4c/0x68
[  959.614845]  kernfs_fop_write_iter+0x13c/0x1d8
[  959.614849]  vfs_write+0x1c0/0x310
[  959.614855]  ksys_write+0x78/0x118
[  959.614859]  __arm64_sys_write+0x24/0x38
[  959.614864]  invoke_syscall+0x78/0x100
[  959.614868]  el0_svc_common.constprop.0+0x4c/0xf8
[  959.614871]  do_el0_svc+0x34/0x50
[  959.614874]  el0_svc+0x34/0x108
[  959.614879]  el0t_64_sync_handler+0x120/0x130
[  959.614884]  el0t_64_sync+0x194/0x198
[  959.614889] Code: 54000088 14000067 f940037b b4000cbb (b8570360)
[  959.614892] ---[ end trace 0000000000000000 ]---
[  959.614895] Kernel panic - not syncing: Oops: Fatal exception
[  959.614897] SMP: stopping secondary CPUs
[  959.619492] Kernel Offset: 0x4b7267c40000 from 0xffff800080000000
[  959.619494] PHYS_OFFSET: 0x80000000
[  959.619496] CPU features: 0x00000010,b80140a1,8841720b
[  959.619498] Memory Limit: none
[  960.040819] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-08-28 10:43   ` Ming Lei
@ 2023-09-13  8:59     ` Yi Zhang
  2023-09-16  6:55       ` Baokun Li
  2023-09-17  0:35       ` Bagas Sanjaya
  0 siblings, 2 replies; 14+ messages in thread
From: Yi Zhang @ 2023-09-13  8:59 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel,
	linux-scsi, Changhui Zhong, mark.rutland

The issue still can be reproduced on the latest linux tree[2].
To reproduce I need to run about 1000 times blktests block/001, and
bisect shows it was introduced with commit[1], as it was not 100%
reproduced, not sure if it's the culprit?


[1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
[2]
[ 2304.536339] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.540805] sr 50:0:0:0: Attached scsi CD-ROM sr3
[ 2304.544574] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.600645] sr 48:0:0:0: [sr1] scsi-1 drive
[ 2304.616364] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.624639] scsi 51:0:0:0: Power-on or device reset occurred
[ 2304.626634] sr 48:0:0:0: Attached scsi CD-ROM sr1
[ 2304.680537] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2304.706394] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2304.746329] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.754569] scsi 49:0:0:0: Power-on or device reset occurred
[ 2304.756302] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.768483] scsi 50:0:0:0: Power-on or device reset occurred
[ 2304.806321] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.810587] sr 49:0:0:0: [sr0] scsi-1 drive
[ 2304.814561] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.824475] sr 50:0:0:0: [sr3] scsi-1 drive
[ 2304.836384] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.840364] sr 49:0:0:0: Attached scsi CD-ROM sr0
[ 2304.844619] scsi 51:0:0:0: Power-on or device reset occurred
[ 2304.850444] sr 50:0:0:0: Attached scsi CD-ROM sr3
[ 2304.874563] sr 48:0:0:0: [sr1] scsi-1 drive
[ 2304.900660] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2304.901506] sr 48:0:0:0: Attached scsi CD-ROM sr1
[ 2304.926306] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2305.056432] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.056572] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.064635] scsi 50:0:0:0: Power-on or device reset occurred
[ 2305.072821] scsi 49:0:0:0: Power-on or device reset occurred
[ 2305.086286] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.086357] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.094521] scsi 51:0:0:0: Power-on or device reset occurred
[ 2305.102693] scsi 48:0:0:0: Power-on or device reset occurred
[ 2305.128785] sr 50:0:0:0: [sr0] scsi-1 drive
[ 2305.134445] sr 49:0:0:0: [sr1] scsi-1 drive
[ 2305.154728] sr 50:0:0:0: Attached scsi CD-ROM sr0
[ 2305.158607] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2305.160392] sr 49:0:0:0: Attached scsi CD-ROM sr1
[ 2305.164254] sr 48:0:0:0: [sr3] scsi-1 drive
[ 2305.184185] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2305.190086] sr 48:0:0:0: Attached scsi CD-ROM sr3
[ 2305.555658] Unable to handle kernel execute from non-executable
memory at virtual address ffffc61b656052e8
[ 2305.565301] Mem abort info:
[ 2305.568086]   ESR = 0x000000008600000e
[ 2305.571822]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 2305.577123]   SET = 0, FnV = 0
[ 2305.580164]   EA = 0, S1PTW = 0
[ 2305.583292]   FSC = 0x0e: level 2 permission fault
[ 2305.588074] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080731fa0000
[ 2305.594761] [ffffc61b656052e8] pgd=1000080ffffff003,
p4d=1000080ffffff003, pud=1000080fffffe003, pmd=0068080732e00f01
[ 2305.605362] Internal error: Oops: 000000008600000e [#1] SMP
[ 2305.610922] Modules linked in: scsi_debug sr_mod pktcdvd cdrom
rfkill sunrpc vfat fat acpi_ipmi arm_spe_pmu ipmi_ssif ipmi_devintf
ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq loop
fuse zram xfs crct10dif_ce ghash_ce nvme sha2_ce nvme_core
sha256_arm64 igb sha1_ce ast sbsa_gwdt nvme_common
i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon
dm_mod [last unloaded: scsi_debug]
[ 2305.647236] CPU: 85 PID: 1 Comm: systemd Kdump: loaded Not tainted
6.6.0-rc1+ #13
[ 2305.654706] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[ 2305.663997] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2305.670946] pc : in_lookup_hashtable+0x1138/0x2000
[ 2305.675728] lr : rcu_do_batch+0x194/0x488
[ 2305.679727] sp : ffff8000802abe60
[ 2305.683029] x29: ffff8000802abe60 x28: ffffc61b6524c7c0 x27: ffffc61b63452f40
[ 2305.690152] x26: ffff080f37ab6438 x25: 000000000000000a x24: 0000000000000000
[ 2305.697274] x23: 0000000000000002 x22: ffff8000802abec0 x21: ffff080f37ab63c0
[ 2305.704396] x20: ffff07ff8136a580 x19: 0000000000000003 x18: 0000000000000000
[ 2305.711519] x17: ffff41f3d3161000 x16: ffff8000802a8000 x15: 0000000000000000
[ 2305.718641] x14: 0000000000000000 x13: ffff07ffa131802d x12: ffff80008041bb94
[ 2305.725764] x11: 0000000000000040 x10: ffff07ff802622e8 x9 : ffffc61b63452e30
[ 2305.732887] x8 : 000002189dce1780 x7 : ffff07ff8d5c1000 x6 : ffff41f3d3161000
[ 2305.740009] x5 : ffff07ff8136a580 x4 : ffff080f37aba960 x3 : 000000001550a055
[ 2305.747131] x2 : 0000000000000000 x1 : ffffc61b656052e8 x0 : ffff080184c565f0
[ 2305.754254] Call trace:
[ 2305.756687]  in_lookup_hashtable+0x1138/0x2000
[ 2305.761119]  rcu_core+0x268/0x350
[ 2305.764422]  rcu_core_si+0x18/0x30
[ 2305.767812]  __do_softirq+0x120/0x3d4
[ 2305.771462]  ____do_softirq+0x18/0x30
[ 2305.775112]  call_on_irq_stack+0x24/0x30
[ 2305.779022]  do_softirq_own_stack+0x24/0x38
[ 2305.783192]  __irq_exit_rcu+0xfc/0x130
[ 2305.786929]  irq_exit_rcu+0x18/0x30
[ 2305.790404]  el1_interrupt+0x4c/0xe8
[ 2305.793969]  el1h_64_irq_handler+0x18/0x28
[ 2305.798052]  el1h_64_irq+0x78/0x80
[ 2305.801441]  d_same_name+0x50/0xd0
[ 2305.804832]  __lookup_slow+0x64/0x158
[ 2305.808482]  walk_component+0xe0/0x1a0
[ 2305.812219]  path_lookupat+0x7c/0x1b8
[ 2305.815869]  filename_lookup+0xb4/0x1b8
[ 2305.819692]  vfs_statx+0x94/0x1a8
[ 2305.822995]  vfs_fstatat+0xd4/0x110
[ 2305.826471]  __do_sys_newfstatat+0x58/0xa8
[ 2305.830556]  __arm64_sys_newfstatat+0x28/0x40
[ 2305.834901]  invoke_syscall.constprop.0+0x80/0xd8
[ 2305.839592]  do_el0_svc+0x48/0xd0
[ 2305.842894]  el0_svc+0x4c/0x1c0
[ 2305.846023]  el0t_64_sync_handler+0x120/0x130
[ 2305.850367]  el0t_64_sync+0x1a4/0x1a8
[ 2305.854017] Code: 00000000 00000000 00000000 00000000 (84c565f1)
[ 2305.860098] SMP: stopping secondary CPUs
[ 2305.865048] Starting crashdump kernel...
[ 2305.868958] Bye!


On Mon, Aug 28, 2023 at 6:43 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Aug 23, 2023 at 4:47 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote:
> > >
> > > Looks the issue is more related with vfs, so forward to vfs list.
> > >
> > > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> -----
> > >
> > > Date: Wed, 23 Aug 2023 11:17:55 +0800
> > > From: Changhui Zhong <czhong@redhat.com>
> > > To: linux-scsi@vger.kernel.org
> > > Cc: Ming Lei <ming.lei@redhat.com>
> > > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> > >
> > > Hello,
> > >
> > > triggered below warning issue with branch
> > > "
> > > Tree: mainline.kernel.org-clang
> > > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > @ master
> > > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
> > > Commit Name: v6.5-rc7-18-g89bf6209cad6
> > > Kernel information:
> > > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> > > "
> > > for more detail,please check
> > > https://datawarehouse.cki-project.org/kcidb/tests/9232643
> > >
> > > #modprobe scsi_debug virtual_gb=128
> > > #echo none > /sys/block/sdb/queue/scheduler
> > > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
> > > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
> > > --runtime=60
> >
> > Looking at this issue it seems unlikely that this is a vfs bug.
> > We should see this all over the place and specifically not just on arm64.
> >
> > The sequence here seems to be:
> >
> > echo 4 > /proc/sys/vm/drop_caches
> > rmmod scsi_debug > /dev/null 3>&1
> >
> > [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> > [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
> >
> > So my money is on some device that gets removed still having an
> > increased refcount and pinning the dentry. Immediate suspects would be:
> >
> > 7882541ca06d ("of/platform: increase refcount of fwnode")
> >
> > but that part is complete speculation on my part.
>
> BTW, just saw another panic on 6.5-rc7, still scsi_debug test on arm64:
>
> [  959.371726] sr 50:0:0:0: Attached scsi generic sg1 type 5
> [  959.603145] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.603155] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.603950] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.604052] scsi 49:0:0:0: Power-on or device reset occurred
> [  959.609336] sr 49:0:0:0: [sr1] scsi-1 drive
> [  959.611360] scsi 48:0:0:0: Power-on or device reset occurred
> [  959.614540] Unable to handle kernel paging request at virtual
> address 65888c2e6fe694d5
> [  959.614544] Mem abort info:
> [  959.614545]   ESR = 0x0000000096000004
> [  959.614547]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  959.614550]   SET = 0, FnV = 0
> [  959.614552]   EA = 0, S1PTW = 0
> [  959.614553]   FSC = 0x04: level 0 translation fault
> [  959.614555] Data abort info:
> [  959.614556]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [  959.614559]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  959.614561]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  959.614563] [65888c2e6fe694d5] address between user and kernel address ranges
> [  959.614566] Internal error: Oops: 0000000096000004 [#1] SMP
> [  959.614570] Modules linked in: pktcdvd scsi_debug ipmi_watchdog
> ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu
> igb ipmi_devintf arm_cmn ipmi_msghandler arm_dmc620_pmu arm_dsu_pmu
> cppc_cpufreq acpi_tad loop fuse zram xfs nvme crct10dif_ce polyval_ce
> nvme_core polyval_generic ghash_ce sbsa_gwdt nvme_common ast
> onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
> [  959.614620] CPU: 108 PID: 19529 Comm: check Not tainted 6.5.0-rc7 #1
> [  959.614625] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
> F31n (SCP: 2.10.20220810) 09/30/2022
> [  959.614627] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  959.614632] pc : d_alloc_parallel+0x140/0x440
> [  959.614641] lr : d_alloc_parallel+0xcc/0x440
> [  959.614646] sp : ffff80008a7d3290
> [  959.614647] x29: ffff80008a7d3290 x28: ffff07ff8230c530 x27: 65888c2e6fe69565
> [  959.614654] x26: ffffcb72eac9e1d0 x25: ffff80008a7d33d8 x24: ffff07ffa0fd3800
> [  959.614659] x23: 00000000000003c0 x22: 000000007a701548 x21: ffffcb72eac9ffd0
> [  959.614664] x20: ffff07ffa0fd35c0 x19: ffffcb72ea6e9600 x18: ffffffffffffffff
> [  959.614670] x17: 00000000440fd8e0 x16: 00000000b6431329 x15: ffff80008a7d3360
> [  959.614675] x14: ffff80008a7d3508 x13: ffffcb72e94da6d0 x12: ffff80008a7d334c
> [  959.614680] x11: 0000000c7a701548 x10: ffff3c8cb54590b8 x9 : ffffcb72e80832f4
> [  959.614685] x8 : ffff07ffa0fd35c0 x7 : 7473696c5f71725f x6 : 0000000000200008
> [  959.614690] x5 : ffffcb72ea6f4000 x4 : 00000000003d380a x3 : 0000000000000004
> [  959.614696] x2 : ffff80008a7d331c x1 : ffff07ffe1760000 x0 : 0000000000005000
> [  959.614701] Call trace:
> [  959.614703]  d_alloc_parallel+0x140/0x440
> [  959.614708]  __lookup_slow+0x64/0x158
> [  959.614714]  lookup_one_len+0xac/0xc8
> [  959.614719]  start_creating.part.0+0x88/0x198
> [  959.614725]  __debugfs_create_file+0x70/0x230
> [  959.614730]  debugfs_create_file+0x34/0x48
> [  959.614734]  blk_mq_debugfs_register_hctx+0x154/0x1d0
> [  959.614740]  blk_mq_debugfs_register+0xfc/0x1e0
> [  959.614745]  blk_register_queue+0xc0/0x1f0
> [  959.614750]  device_add_disk+0x1dc/0x3e0
> [  959.614754]  sr_probe+0x2c0/0x368
> [  959.614760]  really_probe+0x190/0x3d8
> [  959.614766]  __driver_probe_device+0x84/0x180
> [  959.614771]  driver_probe_device+0x44/0x120
> [  959.614776]  __device_attach_driver+0xc4/0x168
> [  959.614781]  bus_for_each_drv+0x8c/0xf0
> [  959.614785]  __device_attach+0xa4/0x1c0
> [  959.614790]  device_initial_probe+0x1c/0x30
> [  959.614795]  bus_probe_device+0xb4/0xc0
> [  959.614799]  device_add+0x508/0x6f8
> [  959.614803]  scsi_sysfs_add_sdev+0x8c/0x250
> [  959.614809]  scsi_add_lun+0x424/0x558
> [  959.614813]  scsi_probe_and_add_lun+0x11c/0x430
> [  959.614817]  __scsi_scan_target+0xb8/0x258
> [  959.614821]  scsi_scan_channel+0xa0/0xb8
> [  959.614825]  scsi_scan_host_selected+0x170/0x188
> [  959.614830]  store_scan+0x194/0x1a8
> [  959.614835]  dev_attr_store+0x20/0x40
> [  959.614840]  sysfs_kf_write+0x4c/0x68
> [  959.614845]  kernfs_fop_write_iter+0x13c/0x1d8
> [  959.614849]  vfs_write+0x1c0/0x310
> [  959.614855]  ksys_write+0x78/0x118
> [  959.614859]  __arm64_sys_write+0x24/0x38
> [  959.614864]  invoke_syscall+0x78/0x100
> [  959.614868]  el0_svc_common.constprop.0+0x4c/0xf8
> [  959.614871]  do_el0_svc+0x34/0x50
> [  959.614874]  el0_svc+0x34/0x108
> [  959.614879]  el0t_64_sync_handler+0x120/0x130
> [  959.614884]  el0t_64_sync+0x194/0x198
> [  959.614889] Code: 54000088 14000067 f940037b b4000cbb (b8570360)
> [  959.614892] ---[ end trace 0000000000000000 ]---
> [  959.614895] Kernel panic - not syncing: Oops: Fatal exception
> [  959.614897] SMP: stopping secondary CPUs
> [  959.619492] Kernel Offset: 0x4b7267c40000 from 0xffff800080000000
> [  959.619494] PHYS_OFFSET: 0x80000000
> [  959.619496] CPU features: 0x00000010,b80140a1,8841720b
> [  959.619498] Memory Limit: none
> [  960.040819] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-13  8:59     ` Yi Zhang
@ 2023-09-16  6:55       ` Baokun Li
  2023-09-17  9:10         ` Peter Zijlstra
  2023-09-19 15:10         ` Mark Rutland
  2023-09-17  0:35       ` Bagas Sanjaya
  1 sibling, 2 replies; 14+ messages in thread
From: Baokun Li @ 2023-09-16  6:55 UTC (permalink / raw)
  To: Yi Zhang, Ming Lei, mark.rutland
  Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel,
	linux-scsi, Changhui Zhong, yangerkun, zhangyi (F),
	Baokun Li, peterz, Kees Cook, chengzhihao

On 2023/9/13 16:59, Yi Zhang wrote:
> The issue still can be reproduced on the latest linux tree[2].
> To reproduce I need to run about 1000 times blktests block/001, and
> bisect shows it was introduced with commit[1], as it was not 100%
> reproduced, not sure if it's the culprit?
>
>
> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
Hello, everyone!

We have confirmed that the merge-in of this patch caused hlist_bl_lock
(aka, bit_spin_lock) to fail, which in turn triggered the issue above.


The process in which VFS issue arise is as follows:
1.  bl_head >>> first==dentry2 >>> dentry1
dentry2->next = dentry1
dentry2->pprev = head
dentry1->next = NULL
dentry1->pprev = dentry2

2. Concurrent deletion of dentry, hlist_bl_lock lock protection failure
```
__hlist_bl_del(dentry2)
                                __hlist_bl_del(dentry1)
                                dentry2->next = NULL;
                                dentry1->next = NULL;
                                dentry1->pprev = NULL;
head->first = dentry1
dentry1->pprev = head
dentry2->next = NULL;
dentry2->pprev = NULL;
```
3. WARN_ON/BUG_ON is triggered because dentry1 is still on the
  hlist after being deleted.

dentry1->next = NULL
dentry1->pprev = head


Verify that hlist_bl_lock is not working with the following mod:
mymod.c
```
#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/moduleloader.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/cpu.h>
#include <linux/delay.h>
#include <linux/percpu.h>
#include <linux/threads.h>
#include <linux/kthread.h>
#include <linux/kernel_stat.h>
#include <linux/version.h>
#include <linux/slab.h>
#include <linux/smpboot.h>
#include <linux/pagemap.h>
#include <linux/notifier.h>
#include <linux/syscalls.h>
#include <linux/namei.h>

#include <asm/atomic.h>
#include <asm/bitops.h>

static unsigned long long a = 0, b = 0;
static struct hlist_bl_head bl_head;

struct task_struct *Thread1;
struct task_struct *Thread2;
struct task_struct *Thread3;
struct task_struct *Thread4;
struct task_struct *Thread5;
struct task_struct *Thread6;
int increase_ab(void *arg);

int increase_ab(void *arg)
{
     while (1) {
         hlist_bl_lock(&bl_head);
         if (a != b) {
             pr_err(">>> a = %llu, b = %llu \n", a, b);
             BUG();
             return -1;
         }
         if (a > (ULLONG_MAX - 4096)) {
             a = 0;
             b = 0;
         }
         a++;
         b++;
         hlist_bl_unlock(&bl_head);
         schedule();
     }
     return 0;
}

static int mymod_init(void)
{
     INIT_HLIST_BL_HEAD(&bl_head);

     Thread1 = kthread_create(increase_ab, NULL, "bl_lock_thread1");
     wake_up_process(Thread1);

     Thread2 = kthread_create(increase_ab, NULL, "bl_lock_thread2");
     wake_up_process(Thread2);

     Thread3 = kthread_create(increase_ab, NULL, "bl_lock_thread3");
     wake_up_process(Thread3);

     Thread4 = kthread_create(increase_ab, NULL, "bl_lock_thread4");
     wake_up_process(Thread4);

     Thread5 = kthread_create(increase_ab, NULL, "bl_lock_thread5");
     wake_up_process(Thread5);

     Thread6 = kthread_create(increase_ab, NULL, "bl_lock_thread6");
     wake_up_process(Thread6);

         return 0;
}

static void mymod_exit(void)
{
     if (Thread1)
         kthread_stop(Thread1);
         if (Thread2)
                 kthread_stop(Thread2);
         if (Thread3)
                 kthread_stop(Thread3);
         if (Thread4)
                 kthread_stop(Thread4);
         if (Thread5)
                 kthread_stop(Thread5);
         if (Thread6)
                 kthread_stop(Thread6);
}

module_init(mymod_init);
module_exit(mymod_exit);

MODULE_LICENSE("Dual BSD/GPL");

```


After 9257959a6e5b ("locking/atomic: scripts: restructure fallback 
ifdeffery") is
merged in, we can see the problem when inserting the ko:
```
[root@localhost ~]# insmod mymod.ko
[   37.994787][  T621] >>> a = 725, b = 724
[   37.995313][  T621] ------------[ cut here ]------------
[   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
[r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG: 
00000000f2000800 [#1] SMP
[   37.997420][  T621] Modules linked in: mymod(E)
[   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted: 
G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
[   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
[   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT 
-SSBS BTYPE=--)
[   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
[   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
[   38.001416][  T621] sp : ffff800008b4be40
[   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27: 
0000000000000000
[   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24: 
0000000000000000
[   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21: 
0000000000000001
[   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18: 
0000000000000000
[   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15: 
ffffffffffffffff
[   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12: 
ffffd99332175b80
[   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 : 
ffffd9933022a9d8
[   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 
ffffd993320b5b40
[   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 : 
0000000000000000
[   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 
0000000000000015
[   38.009709][  T621] Call trace:
[   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
[   38.010539][  T621]  kthread+0xdc/0xf0
[   38.010927][  T621]  ret_from_fork+0x10/0x20
[   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
[   38.012067][  T621] ---[ end trace 0000000000000000 ]---
[   38.012603][  T621] Kernel panic - not syncing: Oops - BUG: Fatal 
exception
[   38.013311][  T621] SMP: stopping secondary CPUs
[   38.013818][  T621] Kernel Offset: 0x599328000000 from 0xffff800008000000
[   38.014508][  T621] PHYS_OFFSET: 0x40000000
[   38.014933][  T621] CPU features: 0x000000,0220080c,44016203
[   38.015510][  T621] Memory Limit: none
[   38.015950][  T621] ---[ end Kernel panic - not syncing: Oops - BUG: 
Fatal exception ]---
```

























^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-13  8:59     ` Yi Zhang
  2023-09-16  6:55       ` Baokun Li
@ 2023-09-17  0:35       ` Bagas Sanjaya
  2023-09-29 13:24         ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 14+ messages in thread
From: Bagas Sanjaya @ 2023-09-17  0:35 UTC (permalink / raw)
  To: Yi Zhang, Ming Lei
  Cc: Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel,
	linux-scsi, Changhui Zhong, Mark Rutland, Peter Zijlstra,
	Kees Cook, Linux Regressions

[-- Attachment #1: Type: text/plain, Size: 7266 bytes --]

On Wed, Sep 13, 2023 at 04:59:31PM +0800, Yi Zhang wrote:
> The issue still can be reproduced on the latest linux tree[2].
> To reproduce I need to run about 1000 times blktests block/001, and
> bisect shows it was introduced with commit[1], as it was not 100%
> reproduced, not sure if it's the culprit?
> 
> 
> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> [2]
> [ 2304.536339] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.540805] sr 50:0:0:0: Attached scsi CD-ROM sr3
> [ 2304.544574] scsi 48:0:0:0: Power-on or device reset occurred
> [ 2304.600645] sr 48:0:0:0: [sr1] scsi-1 drive
> [ 2304.616364] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.624639] scsi 51:0:0:0: Power-on or device reset occurred
> [ 2304.626634] sr 48:0:0:0: Attached scsi CD-ROM sr1
> [ 2304.680537] sr 51:0:0:0: [sr2] scsi-1 drive
> [ 2304.706394] sr 51:0:0:0: Attached scsi CD-ROM sr2
> [ 2304.746329] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.754569] scsi 49:0:0:0: Power-on or device reset occurred
> [ 2304.756302] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.768483] scsi 50:0:0:0: Power-on or device reset occurred
> [ 2304.806321] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.810587] sr 49:0:0:0: [sr0] scsi-1 drive
> [ 2304.814561] scsi 48:0:0:0: Power-on or device reset occurred
> [ 2304.824475] sr 50:0:0:0: [sr3] scsi-1 drive
> [ 2304.836384] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2304.840364] sr 49:0:0:0: Attached scsi CD-ROM sr0
> [ 2304.844619] scsi 51:0:0:0: Power-on or device reset occurred
> [ 2304.850444] sr 50:0:0:0: Attached scsi CD-ROM sr3
> [ 2304.874563] sr 48:0:0:0: [sr1] scsi-1 drive
> [ 2304.900660] sr 51:0:0:0: [sr2] scsi-1 drive
> [ 2304.901506] sr 48:0:0:0: Attached scsi CD-ROM sr1
> [ 2304.926306] sr 51:0:0:0: Attached scsi CD-ROM sr2
> [ 2305.056432] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2305.056572] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2305.064635] scsi 50:0:0:0: Power-on or device reset occurred
> [ 2305.072821] scsi 49:0:0:0: Power-on or device reset occurred
> [ 2305.086286] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2305.086357] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [ 2305.094521] scsi 51:0:0:0: Power-on or device reset occurred
> [ 2305.102693] scsi 48:0:0:0: Power-on or device reset occurred
> [ 2305.128785] sr 50:0:0:0: [sr0] scsi-1 drive
> [ 2305.134445] sr 49:0:0:0: [sr1] scsi-1 drive
> [ 2305.154728] sr 50:0:0:0: Attached scsi CD-ROM sr0
> [ 2305.158607] sr 51:0:0:0: [sr2] scsi-1 drive
> [ 2305.160392] sr 49:0:0:0: Attached scsi CD-ROM sr1
> [ 2305.164254] sr 48:0:0:0: [sr3] scsi-1 drive
> [ 2305.184185] sr 51:0:0:0: Attached scsi CD-ROM sr2
> [ 2305.190086] sr 48:0:0:0: Attached scsi CD-ROM sr3
> [ 2305.555658] Unable to handle kernel execute from non-executable
> memory at virtual address ffffc61b656052e8
> [ 2305.565301] Mem abort info:
> [ 2305.568086]   ESR = 0x000000008600000e
> [ 2305.571822]   EC = 0x21: IABT (current EL), IL = 32 bits
> [ 2305.577123]   SET = 0, FnV = 0
> [ 2305.580164]   EA = 0, S1PTW = 0
> [ 2305.583292]   FSC = 0x0e: level 2 permission fault
> [ 2305.588074] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080731fa0000
> [ 2305.594761] [ffffc61b656052e8] pgd=1000080ffffff003,
> p4d=1000080ffffff003, pud=1000080fffffe003, pmd=0068080732e00f01
> [ 2305.605362] Internal error: Oops: 000000008600000e [#1] SMP
> [ 2305.610922] Modules linked in: scsi_debug sr_mod pktcdvd cdrom
> rfkill sunrpc vfat fat acpi_ipmi arm_spe_pmu ipmi_ssif ipmi_devintf
> ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq loop
> fuse zram xfs crct10dif_ce ghash_ce nvme sha2_ce nvme_core
> sha256_arm64 igb sha1_ce ast sbsa_gwdt nvme_common
> i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon
> dm_mod [last unloaded: scsi_debug]
> [ 2305.647236] CPU: 85 PID: 1 Comm: systemd Kdump: loaded Not tainted
> 6.6.0-rc1+ #13
> [ 2305.654706] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
> F31n (SCP: 2.10.20220810) 09/30/2022
> [ 2305.663997] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 2305.670946] pc : in_lookup_hashtable+0x1138/0x2000
> [ 2305.675728] lr : rcu_do_batch+0x194/0x488
> [ 2305.679727] sp : ffff8000802abe60
> [ 2305.683029] x29: ffff8000802abe60 x28: ffffc61b6524c7c0 x27: ffffc61b63452f40
> [ 2305.690152] x26: ffff080f37ab6438 x25: 000000000000000a x24: 0000000000000000
> [ 2305.697274] x23: 0000000000000002 x22: ffff8000802abec0 x21: ffff080f37ab63c0
> [ 2305.704396] x20: ffff07ff8136a580 x19: 0000000000000003 x18: 0000000000000000
> [ 2305.711519] x17: ffff41f3d3161000 x16: ffff8000802a8000 x15: 0000000000000000
> [ 2305.718641] x14: 0000000000000000 x13: ffff07ffa131802d x12: ffff80008041bb94
> [ 2305.725764] x11: 0000000000000040 x10: ffff07ff802622e8 x9 : ffffc61b63452e30
> [ 2305.732887] x8 : 000002189dce1780 x7 : ffff07ff8d5c1000 x6 : ffff41f3d3161000
> [ 2305.740009] x5 : ffff07ff8136a580 x4 : ffff080f37aba960 x3 : 000000001550a055
> [ 2305.747131] x2 : 0000000000000000 x1 : ffffc61b656052e8 x0 : ffff080184c565f0
> [ 2305.754254] Call trace:
> [ 2305.756687]  in_lookup_hashtable+0x1138/0x2000
> [ 2305.761119]  rcu_core+0x268/0x350
> [ 2305.764422]  rcu_core_si+0x18/0x30
> [ 2305.767812]  __do_softirq+0x120/0x3d4
> [ 2305.771462]  ____do_softirq+0x18/0x30
> [ 2305.775112]  call_on_irq_stack+0x24/0x30
> [ 2305.779022]  do_softirq_own_stack+0x24/0x38
> [ 2305.783192]  __irq_exit_rcu+0xfc/0x130
> [ 2305.786929]  irq_exit_rcu+0x18/0x30
> [ 2305.790404]  el1_interrupt+0x4c/0xe8
> [ 2305.793969]  el1h_64_irq_handler+0x18/0x28
> [ 2305.798052]  el1h_64_irq+0x78/0x80
> [ 2305.801441]  d_same_name+0x50/0xd0
> [ 2305.804832]  __lookup_slow+0x64/0x158
> [ 2305.808482]  walk_component+0xe0/0x1a0
> [ 2305.812219]  path_lookupat+0x7c/0x1b8
> [ 2305.815869]  filename_lookup+0xb4/0x1b8
> [ 2305.819692]  vfs_statx+0x94/0x1a8
> [ 2305.822995]  vfs_fstatat+0xd4/0x110
> [ 2305.826471]  __do_sys_newfstatat+0x58/0xa8
> [ 2305.830556]  __arm64_sys_newfstatat+0x28/0x40
> [ 2305.834901]  invoke_syscall.constprop.0+0x80/0xd8
> [ 2305.839592]  do_el0_svc+0x48/0xd0
> [ 2305.842894]  el0_svc+0x4c/0x1c0
> [ 2305.846023]  el0t_64_sync_handler+0x120/0x130
> [ 2305.850367]  el0t_64_sync+0x1a4/0x1a8
> [ 2305.854017] Code: 00000000 00000000 00000000 00000000 (84c565f1)
> [ 2305.860098] SMP: stopping secondary CPUs
> [ 2305.865048] Starting crashdump kernel...
> [ 2305.868958] Bye!
> 
> 

Please don't top-post; reply inline with appropriate context instead.

Anyway, thanks for bisecting this regression. I'm adding it to regzbot:

#regzbot ^introduced: 9257959a6e5b4f
#regzbot title: restructuring atomic locking conditionals causes vfs dentry lock protection failure

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-16  6:55       ` Baokun Li
@ 2023-09-17  9:10         ` Peter Zijlstra
  2023-09-17  9:26           ` Peter Zijlstra
  2023-09-18  1:10           ` Baokun Li
  2023-09-19 15:10         ` Mark Rutland
  1 sibling, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2023-09-17  9:10 UTC (permalink / raw)
  To: Baokun Li
  Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner,
	linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao

On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> On 2023/9/13 16:59, Yi Zhang wrote:
> > The issue still can be reproduced on the latest linux tree[2].
> > To reproduce I need to run about 1000 times blktests block/001, and
> > bisect shows it was introduced with commit[1], as it was not 100%
> > reproduced, not sure if it's the culprit?
> > 
> > 
> > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> Hello, everyone!
> 
> We have confirmed that the merge-in of this patch caused hlist_bl_lock
> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.

> [root@localhost ~]# insmod mymod.ko
> [   37.994787][  T621] >>> a = 725, b = 724
> [   37.995313][  T621] ------------[ cut here ]------------
> [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
> 00000000f2000800 [#1] SMP
> [   37.997420][  T621] Modules linked in: mymod(E)
> [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> [   38.001416][  T621] sp : ffff800008b4be40
> [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> 0000000000000000
> [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> 0000000000000000
> [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> 0000000000000001
> [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> 0000000000000000
> [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> ffffffffffffffff
> [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> ffffd99332175b80
> [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> ffffd9933022a9d8
> [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> ffffd993320b5b40
> [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> 0000000000000000
> [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> 0000000000000015
> [   38.009709][  T621] Call trace:
> [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> [   38.010539][  T621]  kthread+0xdc/0xf0
> [   38.010927][  T621]  ret_from_fork+0x10/0x20
> [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> [   38.012067][  T621] ---[ end trace 0000000000000000 ]---

Is this arm64 or something? You seem to have forgotten to mention what
platform you're using.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-17  9:10         ` Peter Zijlstra
@ 2023-09-17  9:26           ` Peter Zijlstra
  2023-09-18  1:52             ` Baokun Li
  2023-09-18  1:10           ` Baokun Li
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2023-09-17  9:26 UTC (permalink / raw)
  To: Baokun Li
  Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner,
	linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao

On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
> On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> > On 2023/9/13 16:59, Yi Zhang wrote:
> > > The issue still can be reproduced on the latest linux tree[2].
> > > To reproduce I need to run about 1000 times blktests block/001, and
> > > bisect shows it was introduced with commit[1], as it was not 100%
> > > reproduced, not sure if it's the culprit?
> > > 
> > > 
> > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> > Hello, everyone!
> > 
> > We have confirmed that the merge-in of this patch caused hlist_bl_lock
> > (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
> 
> > [root@localhost ~]# insmod mymod.ko
> > [   37.994787][  T621] >>> a = 725, b = 724
> > [   37.995313][  T621] ------------[ cut here ]------------
> > [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> > [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
> > 00000000f2000800 [#1] SMP
> > [   37.997420][  T621] Modules linked in: mymod(E)
> > [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> > G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> > [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> > [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> > BTYPE=--)
> > [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> > [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> > [   38.001416][  T621] sp : ffff800008b4be40
> > [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> > 0000000000000000
> > [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> > 0000000000000000
> > [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> > 0000000000000001
> > [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> > 0000000000000000
> > [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> > ffffffffffffffff
> > [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> > ffffd99332175b80
> > [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> > ffffd9933022a9d8
> > [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> > ffffd993320b5b40
> > [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> > 0000000000000000
> > [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> > 0000000000000015
> > [   38.009709][  T621] Call trace:
> > [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> > [   38.010539][  T621]  kthread+0xdc/0xf0
> > [   38.010927][  T621]  ret_from_fork+0x10/0x20
> > [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> > [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> 
> Is this arm64 or something? You seem to have forgotten to mention what
> platform you're using.

Is that an LSE or LLSC arm64 ?

Anyway, it seems that ARM64 shouldn't be using the fallback as it does
everything itself.

Mark, can you have a look please? At first glance the
atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
seems in order..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-17  9:10         ` Peter Zijlstra
  2023-09-17  9:26           ` Peter Zijlstra
@ 2023-09-18  1:10           ` Baokun Li
  2023-09-18 10:20             ` Yi Zhang
  1 sibling, 1 reply; 14+ messages in thread
From: Baokun Li @ 2023-09-18  1:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner,
	linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao, Baokun Li

On 2023/9/17 17:10, Peter Zijlstra wrote:
> On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
>> On 2023/9/13 16:59, Yi Zhang wrote:
>>> The issue still can be reproduced on the latest linux tree[2].
>>> To reproduce I need to run about 1000 times blktests block/001, and
>>> bisect shows it was introduced with commit[1], as it was not 100%
>>> reproduced, not sure if it's the culprit?
>>>
>>>
>>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
>> Hello, everyone!
>>
>> We have confirmed that the merge-in of this patch caused hlist_bl_lock
>> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
>> [root@localhost ~]# insmod mymod.ko
>> [   37.994787][  T621] >>> a = 725, b = 724
>> [   37.995313][  T621] ------------[ cut here ]------------
>> [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
>> [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
>> 00000000f2000800 [#1] SMP
>> [   37.997420][  T621] Modules linked in: mymod(E)
>> [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
>> G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
>> [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
>> [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
>> [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
>> [   38.001416][  T621] sp : ffff800008b4be40
>> [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
>> 0000000000000000
>> [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
>> 0000000000000000
>> [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
>> 0000000000000001
>> [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
>> 0000000000000000
>> [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
>> ffffffffffffffff
>> [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
>> ffffd99332175b80
>> [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
>> ffffd9933022a9d8
>> [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
>> ffffd993320b5b40
>> [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
>> 0000000000000000
>> [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>> 0000000000000015
>> [   38.009709][  T621] Call trace:
>> [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
>> [   38.010539][  T621]  kthread+0xdc/0xf0
>> [   38.010927][  T621]  ret_from_fork+0x10/0x20
>> [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
>> [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> Is this arm64 or something? You seem to have forgotten to mention what
> platform you're using.
>
Sorry for the late reply.
We tested both x86 and arm64, and the problem is only encountered under 
arm64.

-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-17  9:26           ` Peter Zijlstra
@ 2023-09-18  1:52             ` Baokun Li
  2023-09-18 18:42               ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Baokun Li @ 2023-09-18  1:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yi Zhang, Ming Lei, mark.rutland, Christian Brauner,
	linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao, Baokun Li

On 2023/9/17 17:26, Peter Zijlstra wrote:
> On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
>> On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
>>> On 2023/9/13 16:59, Yi Zhang wrote:
>>>> The issue still can be reproduced on the latest linux tree[2].
>>>> To reproduce I need to run about 1000 times blktests block/001, and
>>>> bisect shows it was introduced with commit[1], as it was not 100%
>>>> reproduced, not sure if it's the culprit?
>>>>
>>>>
>>>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
>>> Hello, everyone!
>>>
>>> We have confirmed that the merge-in of this patch caused hlist_bl_lock
>>> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
>>> [root@localhost ~]# insmod mymod.ko
>>> [   37.994787][  T621] >>> a = 725, b = 724
>>> [   37.995313][  T621] ------------[ cut here ]------------
>>> [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
>>> [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
>>> 00000000f2000800 [#1] SMP
>>> [   37.997420][  T621] Modules linked in: mymod(E)
>>> [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
>>> G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
>>> [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
>>> [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>>> BTYPE=--)
>>> [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
>>> [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
>>> [   38.001416][  T621] sp : ffff800008b4be40
>>> [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
>>> 0000000000000000
>>> [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
>>> 0000000000000000
>>> [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
>>> 0000000000000001
>>> [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
>>> 0000000000000000
>>> [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
>>> ffffffffffffffff
>>> [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
>>> ffffd99332175b80
>>> [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
>>> ffffd9933022a9d8
>>> [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
>>> ffffd993320b5b40
>>> [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
>>> 0000000000000000
>>> [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
>>> 0000000000000015
>>> [   38.009709][  T621] Call trace:
>>> [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
>>> [   38.010539][  T621]  kthread+0xdc/0xf0
>>> [   38.010927][  T621]  ret_from_fork+0x10/0x20
>>> [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
>>> [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
>> Is this arm64 or something? You seem to have forgotten to mention what
>> platform you're using.
> Is that an LSE or LLSC arm64 ?

I'm not sure how to distinguish if it's LSE or LLSC, here's some info on 
the cpu:

$ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
0x00000000481fd010

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        4
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     Kunpeng 920-4826
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            49152K
NUMA node0 CPU(s):   0-23
NUMA node1 CPU(s):   24-47
NUMA node2 CPU(s):   48-71
NUMA node3 CPU(s):   72-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics 
fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

> Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> everything itself.
>
> Mark, can you have a look please? At first glance the
> atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> seems in order..
>
We also suspect some implicit mechanism change in
raw_atomic64_fetch_or_acquire. You can reproduce the problem with the
above mod that can reproduce the problem to make it easier to locate.
I can help reproduce it and grab some information if you can't reproduce
it on your end.

-- 
With Best Regards,
Baokun Li
.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-18  1:10           ` Baokun Li
@ 2023-09-18 10:20             ` Yi Zhang
  0 siblings, 0 replies; 14+ messages in thread
From: Yi Zhang @ 2023-09-18 10:20 UTC (permalink / raw)
  To: Baokun Li
  Cc: Peter Zijlstra, Ming Lei, mark.rutland, Christian Brauner,
	linux-fsdevel, Alexander Viro, linux-kernel, linux-scsi,
	Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao

On Mon, Sep 18, 2023 at 9:10 AM Baokun Li <libaokun1@huawei.com> wrote:
>
> On 2023/9/17 17:10, Peter Zijlstra wrote:
> > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> >> On 2023/9/13 16:59, Yi Zhang wrote:
> >>> The issue still can be reproduced on the latest linux tree[2].
> >>> To reproduce I need to run about 1000 times blktests block/001, and
> >>> bisect shows it was introduced with commit[1], as it was not 100%
> >>> reproduced, not sure if it's the culprit?
> >>>
> >>>
> >>> [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> >> Hello, everyone!
> >>
> >> We have confirmed that the merge-in of this patch caused hlist_bl_lock
> >> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
> >> [root@localhost ~]# insmod mymod.ko
> >> [   37.994787][  T621] >>> a = 725, b = 724
> >> [   37.995313][  T621] ------------[ cut here ]------------
> >> [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> >> [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
> >> 00000000f2000800 [#1] SMP
> >> [   37.997420][  T621] Modules linked in: mymod(E)
> >> [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> >> G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> >> [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> >> [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> >> BTYPE=--)
> >> [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> >> [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> >> [   38.001416][  T621] sp : ffff800008b4be40
> >> [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> >> 0000000000000000
> >> [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> >> 0000000000000000
> >> [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> >> 0000000000000001
> >> [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> >> 0000000000000000
> >> [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> >> ffffffffffffffff
> >> [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> >> ffffd99332175b80
> >> [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> >> ffffd9933022a9d8
> >> [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> >> ffffd993320b5b40
> >> [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> >> 0000000000000000
> >> [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> >> 0000000000000015
> >> [   38.009709][  T621] Call trace:
> >> [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> >> [   38.010539][  T621]  kthread+0xdc/0xf0
> >> [   38.010927][  T621]  ret_from_fork+0x10/0x20
> >> [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> >> [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> > Is this arm64 or something? You seem to have forgotten to mention what
> > platform you're using.
> >
> Sorry for the late reply.
> We tested both x86 and arm64, and the problem is only encountered under
> arm64.

Yeah, my reproduced environment is also aarch64.



>
> --
> With Best Regards,
> Baokun Li
> .
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-18  1:52             ` Baokun Li
@ 2023-09-18 18:42               ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2023-09-18 18:42 UTC (permalink / raw)
  To: Baokun Li
  Cc: Peter Zijlstra, Yi Zhang, Ming Lei, mark.rutland,
	Christian Brauner, linux-fsdevel, Alexander Viro, linux-kernel,
	linux-scsi, Changhui Zhong, yangerkun, zhangyi (F),
	Kees Cook, chengzhihao

On Mon, Sep 18, 2023 at 09:52:28AM +0800, Baokun Li wrote:
> On 2023/9/17 17:26, Peter Zijlstra wrote:
> > On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
> > > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> > > > On 2023/9/13 16:59, Yi Zhang wrote:
> > > > > The issue still can be reproduced on the latest linux tree[2].
> > > > > To reproduce I need to run about 1000 times blktests block/001, and
> > > > > bisect shows it was introduced with commit[1], as it was not 100%
> > > > > reproduced, not sure if it's the culprit?
> > > > > 
> > > > > 
> > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> > > > Hello, everyone!
> > > > 
> > > > We have confirmed that the merge-in of this patch caused hlist_bl_lock
> > > > (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
> > > > [root@localhost ~]# insmod mymod.ko
> > > > [   37.994787][  T621] >>> a = 725, b = 724
> > > > [   37.995313][  T621] ------------[ cut here ]------------
> > > > [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> > > > [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
> > > > 00000000f2000800 [#1] SMP
> > > > [   37.997420][  T621] Modules linked in: mymod(E)
> > > > [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> > > > G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> > > > [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> > > > [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> > > > BTYPE=--)
> > > > [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.001416][  T621] sp : ffff800008b4be40
> > > > [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> > > > 0000000000000000
> > > > [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> > > > 0000000000000000
> > > > [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> > > > 0000000000000001
> > > > [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> > > > 0000000000000000
> > > > [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> > > > ffffffffffffffff
> > > > [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> > > > ffffd99332175b80
> > > > [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> > > > ffffd9933022a9d8
> > > > [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> > > > ffffd993320b5b40
> > > > [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> > > > 0000000000000000
> > > > [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> > > > 0000000000000015
> > > > [   38.009709][  T621] Call trace:
> > > > [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.010539][  T621]  kthread+0xdc/0xf0
> > > > [   38.010927][  T621]  ret_from_fork+0x10/0x20
> > > > [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> > > > [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> > > Is this arm64 or something? You seem to have forgotten to mention what
> > > platform you're using.
> > Is that an LSE or LLSC arm64 ?
> 
> I'm not sure how to distinguish if it's LSE or LLSC, here's some info on the
> cpu:
> 
> $ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
> 0x00000000481fd010
> 
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        4
> Vendor ID:           HiSilicon
> BIOS Vendor ID:      HiSilicon
> Model:               0
> Model name:          Kunpeng-920
> BIOS Model name:     Kunpeng 920-4826
> Stepping:            0x1
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            49152K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
> 
> > Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> > everything itself.
> > 
> > Mark, can you have a look please? At first glance the
> > atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> > seems in order..
> > 
> We also suspect some implicit mechanism change in
> raw_atomic64_fetch_or_acquire. You can reproduce the problem with the
> above mod that can reproduce the problem to make it easier to locate.
> I can help reproduce it and grab some information if you can't reproduce
> it on your end.

FWIW this looks a lot like the crash I reported last week:
https://lore.kernel.org/linux-fsdevel/ZQep0OR0uMmR%2Fwg3@dread.disaster.area/T/#t

Also arm64, but virtualized.  I /think/ the host is some Ampere box,
though I have no idea what kind since it's just some Oracle Cloud A1
instance.  The internet claims "Ampere Altra" processors[1].

# lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p1
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
NUMA:                   
  NUMA node(s):         1
  NUMA node0 CPU(s):    0,1
Vulnerabilities:        
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, but not BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

[1] https://www.oracle.com/cloud/compute/arm/ 

--D

> -- 
> With Best Regards,
> Baokun Li
> .

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-16  6:55       ` Baokun Li
  2023-09-17  9:10         ` Peter Zijlstra
@ 2023-09-19 15:10         ` Mark Rutland
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2023-09-19 15:10 UTC (permalink / raw)
  To: Baokun Li
  Cc: Yi Zhang, Ming Lei, Christian Brauner, linux-fsdevel,
	Alexander Viro, linux-kernel, linux-scsi, Changhui Zhong,
	yangerkun, zhangyi (F),
	peterz, Kees Cook, chengzhihao, Will Deacon

On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> On 2023/9/13 16:59, Yi Zhang wrote:
> > The issue still can be reproduced on the latest linux tree[2].
> > To reproduce I need to run about 1000 times blktests block/001, and
> > bisect shows it was introduced with commit[1], as it was not 100%
> > reproduced, not sure if it's the culprit?
> > 
> > 
> > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> Hello, everyone!
> 
> We have confirmed that the merge-in of this patch caused hlist_bl_lock
> (aka, bit_spin_lock) to fail, which in turn triggered the issue above.

Thanks for this!

I believe I know what the issue is.

I took a look at the generated assembly for hlist_bl_lock() and
hlist_bl_unlock(), and for the latter I see a plain store rather than a
store-release as was intended.

I believe that in 9257959a6e5b, I messed up the fallback logic for
atomic*_set_release():

| static __always_inline void 
| raw_atomic64_set_release(atomic64_t *v, s64 i)
| {
| #if defined(arch_atomic64_set_release)
|         arch_atomic64_set_release(v, i);
| #elif defined(arch_atomic64_set)
|         arch_atomic64_set(v, i);
| #else
|         if (__native_word(atomic64_t)) {
|                 smp_store_release(&(v)->counter, i);
|         } else {
|                 __atomic_release_fence();
|                 raw_atomic64_set(v, i);
|         }    
| #endif
| }

On arm64 we want to use smp_store_release(), and don't provide
arch_atomic64_set_release(). Unfortunately we *do* provide arch_atomic64_set(),
and the ifdeffery above will choose that in preference.

Prior to that commit, the ifdeffery would do what we want:

| #ifndef arch_atomic64_set_release
| static __always_inline void
| arch_atomic64_set_release(atomic64_t *v, s64 i)
| {
|         if (__native_word(atomic64_t)) {
|                 smp_store_release(&(v)->counter, i);
|         } else {
|                 __atomic_release_fence();
|                 arch_atomic64_set(v, i);
|         }
| }
| #define arch_atomic64_set_release arch_atomic64_set_release
| #endif

That explains the lock going wrong -- we lose the RELEASE semantic on
hlist_bl_unlock(), and so loads and stores within the critical section aren't
guaranteed to be visible to the next hlist_bl_lock(). On x86 this happens to
work becauase of TSO.

I'm working on fixing that now; I'll try to have a patch shortly.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
  2023-09-17  0:35       ` Bagas Sanjaya
@ 2023-09-29 13:24         ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 14+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-29 13:24 UTC (permalink / raw)
  To: Linux Regressions; +Cc: linux-fsdevel, linux-kernel, linux-scsi

On 17.09.23 02:35, Bagas Sanjaya wrote:
> On Wed, Sep 13, 2023 at 04:59:31PM +0800, Yi Zhang wrote:
>> The issue still can be reproduced on the latest linux tree[2].
>> To reproduce I need to run about 1000 times blktests block/001, and
>> bisect shows it was introduced with commit[1], as it was not 100%
>> reproduced, not sure if it's the culprit?
> 
> Anyway, thanks for bisecting this regression. I'm adding it to regzbot:
> 
> #regzbot ^introduced: 9257959a6e5b4f
> #regzbot title: restructuring atomic locking conditionals causes vfs dentry lock protection failure

#regzbot fix: 6d2779ecaeb56f
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-29 13:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-23  4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei
2023-08-23  8:47 ` Christian Brauner
2023-08-28 10:43   ` Ming Lei
2023-09-13  8:59     ` Yi Zhang
2023-09-16  6:55       ` Baokun Li
2023-09-17  9:10         ` Peter Zijlstra
2023-09-17  9:26           ` Peter Zijlstra
2023-09-18  1:52             ` Baokun Li
2023-09-18 18:42               ` Darrick J. Wong
2023-09-18  1:10           ` Baokun Li
2023-09-18 10:20             ` Yi Zhang
2023-09-19 15:10         ` Mark Rutland
2023-09-17  0:35       ` Bagas Sanjaya
2023-09-29 13:24         ` Linux regression tracking #update (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).