linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yi Zhang <yi.zhang@redhat.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	Changhui Zhong <czhong@redhat.com>,
	mark.rutland@arm.com
Subject: Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
Date: Wed, 13 Sep 2023 16:59:31 +0800	[thread overview]
Message-ID: <CAHj4cs_MqqWYy+pKrNrLqTb=eoSOXcZdjPXy44x-aA1WvdVv0w@mail.gmail.com> (raw)
In-Reply-To: <CAFj5m9KiBDzNHCsTjwUevZh3E3RRda2ypj9+QcRrqEsJnf9rXQ@mail.gmail.com>

The issue still can be reproduced on the latest linux tree[2].
To reproduce I need to run about 1000 times blktests block/001, and
bisect shows it was introduced with commit[1], as it was not 100%
reproduced, not sure if it's the culprit?


[1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
[2]
[ 2304.536339] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.540805] sr 50:0:0:0: Attached scsi CD-ROM sr3
[ 2304.544574] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.600645] sr 48:0:0:0: [sr1] scsi-1 drive
[ 2304.616364] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.624639] scsi 51:0:0:0: Power-on or device reset occurred
[ 2304.626634] sr 48:0:0:0: Attached scsi CD-ROM sr1
[ 2304.680537] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2304.706394] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2304.746329] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.754569] scsi 49:0:0:0: Power-on or device reset occurred
[ 2304.756302] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.768483] scsi 50:0:0:0: Power-on or device reset occurred
[ 2304.806321] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.810587] sr 49:0:0:0: [sr0] scsi-1 drive
[ 2304.814561] scsi 48:0:0:0: Power-on or device reset occurred
[ 2304.824475] sr 50:0:0:0: [sr3] scsi-1 drive
[ 2304.836384] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2304.840364] sr 49:0:0:0: Attached scsi CD-ROM sr0
[ 2304.844619] scsi 51:0:0:0: Power-on or device reset occurred
[ 2304.850444] sr 50:0:0:0: Attached scsi CD-ROM sr3
[ 2304.874563] sr 48:0:0:0: [sr1] scsi-1 drive
[ 2304.900660] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2304.901506] sr 48:0:0:0: Attached scsi CD-ROM sr1
[ 2304.926306] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2305.056432] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.056572] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.064635] scsi 50:0:0:0: Power-on or device reset occurred
[ 2305.072821] scsi 49:0:0:0: Power-on or device reset occurred
[ 2305.086286] scsi 51:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.086357] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
   0191 PQ: 0 ANSI: 7
[ 2305.094521] scsi 51:0:0:0: Power-on or device reset occurred
[ 2305.102693] scsi 48:0:0:0: Power-on or device reset occurred
[ 2305.128785] sr 50:0:0:0: [sr0] scsi-1 drive
[ 2305.134445] sr 49:0:0:0: [sr1] scsi-1 drive
[ 2305.154728] sr 50:0:0:0: Attached scsi CD-ROM sr0
[ 2305.158607] sr 51:0:0:0: [sr2] scsi-1 drive
[ 2305.160392] sr 49:0:0:0: Attached scsi CD-ROM sr1
[ 2305.164254] sr 48:0:0:0: [sr3] scsi-1 drive
[ 2305.184185] sr 51:0:0:0: Attached scsi CD-ROM sr2
[ 2305.190086] sr 48:0:0:0: Attached scsi CD-ROM sr3
[ 2305.555658] Unable to handle kernel execute from non-executable
memory at virtual address ffffc61b656052e8
[ 2305.565301] Mem abort info:
[ 2305.568086]   ESR = 0x000000008600000e
[ 2305.571822]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 2305.577123]   SET = 0, FnV = 0
[ 2305.580164]   EA = 0, S1PTW = 0
[ 2305.583292]   FSC = 0x0e: level 2 permission fault
[ 2305.588074] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080731fa0000
[ 2305.594761] [ffffc61b656052e8] pgd=1000080ffffff003,
p4d=1000080ffffff003, pud=1000080fffffe003, pmd=0068080732e00f01
[ 2305.605362] Internal error: Oops: 000000008600000e [#1] SMP
[ 2305.610922] Modules linked in: scsi_debug sr_mod pktcdvd cdrom
rfkill sunrpc vfat fat acpi_ipmi arm_spe_pmu ipmi_ssif ipmi_devintf
ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq loop
fuse zram xfs crct10dif_ce ghash_ce nvme sha2_ce nvme_core
sha256_arm64 igb sha1_ce ast sbsa_gwdt nvme_common
i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon
dm_mod [last unloaded: scsi_debug]
[ 2305.647236] CPU: 85 PID: 1 Comm: systemd Kdump: loaded Not tainted
6.6.0-rc1+ #13
[ 2305.654706] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31n (SCP: 2.10.20220810) 09/30/2022
[ 2305.663997] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2305.670946] pc : in_lookup_hashtable+0x1138/0x2000
[ 2305.675728] lr : rcu_do_batch+0x194/0x488
[ 2305.679727] sp : ffff8000802abe60
[ 2305.683029] x29: ffff8000802abe60 x28: ffffc61b6524c7c0 x27: ffffc61b63452f40
[ 2305.690152] x26: ffff080f37ab6438 x25: 000000000000000a x24: 0000000000000000
[ 2305.697274] x23: 0000000000000002 x22: ffff8000802abec0 x21: ffff080f37ab63c0
[ 2305.704396] x20: ffff07ff8136a580 x19: 0000000000000003 x18: 0000000000000000
[ 2305.711519] x17: ffff41f3d3161000 x16: ffff8000802a8000 x15: 0000000000000000
[ 2305.718641] x14: 0000000000000000 x13: ffff07ffa131802d x12: ffff80008041bb94
[ 2305.725764] x11: 0000000000000040 x10: ffff07ff802622e8 x9 : ffffc61b63452e30
[ 2305.732887] x8 : 000002189dce1780 x7 : ffff07ff8d5c1000 x6 : ffff41f3d3161000
[ 2305.740009] x5 : ffff07ff8136a580 x4 : ffff080f37aba960 x3 : 000000001550a055
[ 2305.747131] x2 : 0000000000000000 x1 : ffffc61b656052e8 x0 : ffff080184c565f0
[ 2305.754254] Call trace:
[ 2305.756687]  in_lookup_hashtable+0x1138/0x2000
[ 2305.761119]  rcu_core+0x268/0x350
[ 2305.764422]  rcu_core_si+0x18/0x30
[ 2305.767812]  __do_softirq+0x120/0x3d4
[ 2305.771462]  ____do_softirq+0x18/0x30
[ 2305.775112]  call_on_irq_stack+0x24/0x30
[ 2305.779022]  do_softirq_own_stack+0x24/0x38
[ 2305.783192]  __irq_exit_rcu+0xfc/0x130
[ 2305.786929]  irq_exit_rcu+0x18/0x30
[ 2305.790404]  el1_interrupt+0x4c/0xe8
[ 2305.793969]  el1h_64_irq_handler+0x18/0x28
[ 2305.798052]  el1h_64_irq+0x78/0x80
[ 2305.801441]  d_same_name+0x50/0xd0
[ 2305.804832]  __lookup_slow+0x64/0x158
[ 2305.808482]  walk_component+0xe0/0x1a0
[ 2305.812219]  path_lookupat+0x7c/0x1b8
[ 2305.815869]  filename_lookup+0xb4/0x1b8
[ 2305.819692]  vfs_statx+0x94/0x1a8
[ 2305.822995]  vfs_fstatat+0xd4/0x110
[ 2305.826471]  __do_sys_newfstatat+0x58/0xa8
[ 2305.830556]  __arm64_sys_newfstatat+0x28/0x40
[ 2305.834901]  invoke_syscall.constprop.0+0x80/0xd8
[ 2305.839592]  do_el0_svc+0x48/0xd0
[ 2305.842894]  el0_svc+0x4c/0x1c0
[ 2305.846023]  el0t_64_sync_handler+0x120/0x130
[ 2305.850367]  el0t_64_sync+0x1a4/0x1a8
[ 2305.854017] Code: 00000000 00000000 00000000 00000000 (84c565f1)
[ 2305.860098] SMP: stopping secondary CPUs
[ 2305.865048] Starting crashdump kernel...
[ 2305.868958] Bye!


On Mon, Aug 28, 2023 at 6:43 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Aug 23, 2023 at 4:47 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Aug 23, 2023 at 12:06:14PM +0800, Ming Lei wrote:
> > >
> > > Looks the issue is more related with vfs, so forward to vfs list.
> > >
> > > ----- Forwarded message from Changhui Zhong <czhong@redhat.com> -----
> > >
> > > Date: Wed, 23 Aug 2023 11:17:55 +0800
> > > From: Changhui Zhong <czhong@redhat.com>
> > > To: linux-scsi@vger.kernel.org
> > > Cc: Ming Lei <ming.lei@redhat.com>
> > > Subject: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> > >
> > > Hello,
> > >
> > > triggered below warning issue with branch
> > > "
> > > Tree: mainline.kernel.org-clang
> > > Repository: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > @ master
> > > Commit Hash: 89bf6209cad66214d3774dac86b6bbf2aec6a30d
> > > Commit Name: v6.5-rc7-18-g89bf6209cad6
> > > Kernel information:
> > > Commit message: Merge tag 'devicetree-fixes-for-6.5-2' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
> > > "
> > > for more detail,please check
> > > https://datawarehouse.cki-project.org/kcidb/tests/9232643
> > >
> > > #modprobe scsi_debug virtual_gb=128
> > > #echo none > /sys/block/sdb/queue/scheduler
> > > #fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=4 --rw=randrw
> > > --name=sdb-libaio-randrw-4k --filename=/dev/sdb --direct=1 --size=60G
> > > --runtime=60
> >
> > Looking at this issue it seems unlikely that this is a vfs bug.
> > We should see this all over the place and specifically not just on arm64.
> >
> > The sequence here seems to be:
> >
> > echo 4 > /proc/sys/vm/drop_caches
> > rmmod scsi_debug > /dev/null 3>&1
> >
> > [ 3117.059778] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278
> > [ 3117.067601] Modules linked in: scsi_debug nvme nvme_core nvme_common null_blk pktcdvd ipmi_watchdog ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu igb ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu cppc_cpufreq arm_dsu_pmu acpi_tad loop fuse zram xfs crct10dif_ce polyval_ce polyval_generic ghash_ce sbsa_gwdt ast onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
> >
> > So my money is on some device that gets removed still having an
> > increased refcount and pinning the dentry. Immediate suspects would be:
> >
> > 7882541ca06d ("of/platform: increase refcount of fwnode")
> >
> > but that part is complete speculation on my part.
>
> BTW, just saw another panic on 6.5-rc7, still scsi_debug test on arm64:
>
> [  959.371726] sr 50:0:0:0: Attached scsi generic sg1 type 5
> [  959.603145] scsi 48:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.603155] scsi 50:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.603950] scsi 49:0:0:0: CD-ROM            Linux    scsi_debug
>    0191 PQ: 0 ANSI: 7
> [  959.604052] scsi 49:0:0:0: Power-on or device reset occurred
> [  959.609336] sr 49:0:0:0: [sr1] scsi-1 drive
> [  959.611360] scsi 48:0:0:0: Power-on or device reset occurred
> [  959.614540] Unable to handle kernel paging request at virtual
> address 65888c2e6fe694d5
> [  959.614544] Mem abort info:
> [  959.614545]   ESR = 0x0000000096000004
> [  959.614547]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  959.614550]   SET = 0, FnV = 0
> [  959.614552]   EA = 0, S1PTW = 0
> [  959.614553]   FSC = 0x04: level 0 translation fault
> [  959.614555] Data abort info:
> [  959.614556]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [  959.614559]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [  959.614561]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [  959.614563] [65888c2e6fe694d5] address between user and kernel address ranges
> [  959.614566] Internal error: Oops: 0000000096000004 [#1] SMP
> [  959.614570] Modules linked in: pktcdvd scsi_debug ipmi_watchdog
> ipmi_poweroff rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu
> igb ipmi_devintf arm_cmn ipmi_msghandler arm_dmc620_pmu arm_dsu_pmu
> cppc_cpufreq acpi_tad loop fuse zram xfs nvme crct10dif_ce polyval_ce
> nvme_core polyval_generic ghash_ce sbsa_gwdt nvme_common ast
> onboard_usb_hub i2c_algo_bit xgene_hwmon [last unloaded: scsi_debug]
> [  959.614620] CPU: 108 PID: 19529 Comm: check Not tainted 6.5.0-rc7 #1
> [  959.614625] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
> F31n (SCP: 2.10.20220810) 09/30/2022
> [  959.614627] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  959.614632] pc : d_alloc_parallel+0x140/0x440
> [  959.614641] lr : d_alloc_parallel+0xcc/0x440
> [  959.614646] sp : ffff80008a7d3290
> [  959.614647] x29: ffff80008a7d3290 x28: ffff07ff8230c530 x27: 65888c2e6fe69565
> [  959.614654] x26: ffffcb72eac9e1d0 x25: ffff80008a7d33d8 x24: ffff07ffa0fd3800
> [  959.614659] x23: 00000000000003c0 x22: 000000007a701548 x21: ffffcb72eac9ffd0
> [  959.614664] x20: ffff07ffa0fd35c0 x19: ffffcb72ea6e9600 x18: ffffffffffffffff
> [  959.614670] x17: 00000000440fd8e0 x16: 00000000b6431329 x15: ffff80008a7d3360
> [  959.614675] x14: ffff80008a7d3508 x13: ffffcb72e94da6d0 x12: ffff80008a7d334c
> [  959.614680] x11: 0000000c7a701548 x10: ffff3c8cb54590b8 x9 : ffffcb72e80832f4
> [  959.614685] x8 : ffff07ffa0fd35c0 x7 : 7473696c5f71725f x6 : 0000000000200008
> [  959.614690] x5 : ffffcb72ea6f4000 x4 : 00000000003d380a x3 : 0000000000000004
> [  959.614696] x2 : ffff80008a7d331c x1 : ffff07ffe1760000 x0 : 0000000000005000
> [  959.614701] Call trace:
> [  959.614703]  d_alloc_parallel+0x140/0x440
> [  959.614708]  __lookup_slow+0x64/0x158
> [  959.614714]  lookup_one_len+0xac/0xc8
> [  959.614719]  start_creating.part.0+0x88/0x198
> [  959.614725]  __debugfs_create_file+0x70/0x230
> [  959.614730]  debugfs_create_file+0x34/0x48
> [  959.614734]  blk_mq_debugfs_register_hctx+0x154/0x1d0
> [  959.614740]  blk_mq_debugfs_register+0xfc/0x1e0
> [  959.614745]  blk_register_queue+0xc0/0x1f0
> [  959.614750]  device_add_disk+0x1dc/0x3e0
> [  959.614754]  sr_probe+0x2c0/0x368
> [  959.614760]  really_probe+0x190/0x3d8
> [  959.614766]  __driver_probe_device+0x84/0x180
> [  959.614771]  driver_probe_device+0x44/0x120
> [  959.614776]  __device_attach_driver+0xc4/0x168
> [  959.614781]  bus_for_each_drv+0x8c/0xf0
> [  959.614785]  __device_attach+0xa4/0x1c0
> [  959.614790]  device_initial_probe+0x1c/0x30
> [  959.614795]  bus_probe_device+0xb4/0xc0
> [  959.614799]  device_add+0x508/0x6f8
> [  959.614803]  scsi_sysfs_add_sdev+0x8c/0x250
> [  959.614809]  scsi_add_lun+0x424/0x558
> [  959.614813]  scsi_probe_and_add_lun+0x11c/0x430
> [  959.614817]  __scsi_scan_target+0xb8/0x258
> [  959.614821]  scsi_scan_channel+0xa0/0xb8
> [  959.614825]  scsi_scan_host_selected+0x170/0x188
> [  959.614830]  store_scan+0x194/0x1a8
> [  959.614835]  dev_attr_store+0x20/0x40
> [  959.614840]  sysfs_kf_write+0x4c/0x68
> [  959.614845]  kernfs_fop_write_iter+0x13c/0x1d8
> [  959.614849]  vfs_write+0x1c0/0x310
> [  959.614855]  ksys_write+0x78/0x118
> [  959.614859]  __arm64_sys_write+0x24/0x38
> [  959.614864]  invoke_syscall+0x78/0x100
> [  959.614868]  el0_svc_common.constprop.0+0x4c/0xf8
> [  959.614871]  do_el0_svc+0x34/0x50
> [  959.614874]  el0_svc+0x34/0x108
> [  959.614879]  el0t_64_sync_handler+0x120/0x130
> [  959.614884]  el0t_64_sync+0x194/0x198
> [  959.614889] Code: 54000088 14000067 f940037b b4000cbb (b8570360)
> [  959.614892] ---[ end trace 0000000000000000 ]---
> [  959.614895] Kernel panic - not syncing: Oops: Fatal exception
> [  959.614897] SMP: stopping secondary CPUs
> [  959.619492] Kernel Offset: 0x4b7267c40000 from 0xffff800080000000
> [  959.619494] PHYS_OFFSET: 0x80000000
> [  959.619496] CPU features: 0x00000010,b80140a1,8841720b
> [  959.619498] Memory Limit: none
> [  960.040819] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
>


-- 
Best Regards,
  Yi Zhang


  reply	other threads:[~2023-09-13  9:00 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-23  4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei
2023-08-23  8:47 ` Christian Brauner
2023-08-28 10:43   ` Ming Lei
2023-09-13  8:59     ` Yi Zhang [this message]
2023-09-16  6:55       ` Baokun Li
2023-09-17  9:10         ` Peter Zijlstra
2023-09-17  9:26           ` Peter Zijlstra
2023-09-18  1:52             ` Baokun Li
2023-09-18 18:42               ` Darrick J. Wong
2023-09-18  1:10           ` Baokun Li
2023-09-18 10:20             ` Yi Zhang
2023-09-19 15:10         ` Mark Rutland
2023-09-17  0:35       ` Bagas Sanjaya
2023-09-29 13:24         ` Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHj4cs_MqqWYy+pKrNrLqTb=eoSOXcZdjPXy44x-aA1WvdVv0w@mail.gmail.com' \
    --to=yi.zhang@redhat.com \
    --cc=brauner@kernel.org \
    --cc=czhong@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=ming.lei@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).