* [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
@ 2023-12-18 14:01 Zorro Lang
  2023-12-18 17:48 ` Darrick J. Wong
  2024-01-18  4:20 ` Dave Chinner
  0 siblings, 2 replies; 18+ messages in thread
From: Zorro Lang @ 2023-12-18 14:01 UTC (permalink / raw)
  To: linux-xfs; +Cc: fstests

Hi,

Recently I hit a crash [1] on s390x with a 64k directory block size XFS
(-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1);
even when it doesn't panic, an assertion failure still happens.

I found it on an old downstream kernel at first, then reproduced it
on the latest upstream mainline Linux (v6.7-rc6). I can't be sure how long
this issue has been there, so I'm just reporting it first.

I hit and reproduced it several times by running generic/648 repeatedly
on s390x with an "-n size=65536" XFS. I'm not sure whether s390x is
required; I'll try to check other arches.
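
To be concrete about the setup, here is a rough sketch of the fstests
config I mean (the device paths below are placeholders, not my exact
environment):

    # local.config fragment (sketch)
    export TEST_DEV=/dev/vdb1
    export TEST_DIR=/mnt/test
    export SCRATCH_DEV=/dev/vdb2
    export SCRATCH_MNT=/mnt/scratch
    export MKFS_OPTIONS="-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1"

    # the failure is intermittent, so loop the test until log recovery
    # trips over the bad dir block
    for i in $(seq 1 50); do ./check generic/648 || break; done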

Thanks,
Zorro

[1]
 [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
 [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
 [  899.455844] XFS (loop1): Ending clean mount
 [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
 [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  899.588501] XFS (dm-3): Ending clean mount
 ...
 ...
 [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
 [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
 [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
 [  976.831942] lo_write_bvec: 15 callbacks suppressed
 [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
 [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
 [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
 [  976.831960] XFS (loop3): log I/O error -5
 [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
 [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
 [  976.839603] buffer_io_error: 6 callbacks suppressed
 [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
 [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
 [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
 [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
 [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
 [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
 [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
 [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
 [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
 [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
 [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
 [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  978.449156] XFS (dm-3): log I/O error -5
 [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
 [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
 [  978.449163] XFS (dm-3): log I/O error -5
 [  978.449164] XFS (dm-3): log I/O error -5
 [  978.449165] XFS (dm-3): log I/O error -5
 [  978.449166] XFS (dm-3): log I/O error -5
 [  978.449167] XFS (dm-3): log I/O error -5
 [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
 [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
 [  978.587276] loop3: detected capacity change from 0 to 6876346
 [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
 [  979.225078] XFS (loop3): Bad dir block magic!
 [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 [  979.225120] ------------[ cut here ]------------
 [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
 [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
 [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
 [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
 [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
 [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
 [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
 [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
 [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
 [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
 [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
 [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
 [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
 [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
 [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
 [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
 [  979.225476]            000003ff7ffead86: af000000		mc	0,0
 [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
 [  979.225487] Call Trace:
 [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
 [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
 [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
 [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
 [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
 [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
 [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
 [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
 [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
 [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
 [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
 [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
 [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
 [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
 [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
 [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
 [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
 [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
 [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
 [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
 [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
 [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
 [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
 [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
 [  979.226868] Last Breaking-Event-Address:
 [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
 [  979.226949] ---[ end trace 0000000000000000 ]---
 [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
 [  979.227732] XFS (loop3): Unmount and run xfs_repair
 [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
 [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
 [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
 [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
 [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
 [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
 [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
 [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
 [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
 [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
 [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
 [  979.227912] XFS (loop3): log mount/recovery failed: error -117
 [  979.228061] XFS (loop3): log mount failed
 [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
 [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  980.355207] XFS (loop0): Ending clean mount
 [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
 [  982.586881] XFS (loop1): Ending clean mount
 [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
 [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
 [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
 [  982.637231] XFS (loop0): Ending clean mount
 



* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
@ 2023-12-18 17:48 ` Darrick J. Wong
  2023-12-19  6:34   ` Zorro Lang
  2024-01-18  4:20 ` Dave Chinner
  1 sibling, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2023-12-18 17:48 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with a 64k directory block size XFS
> (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1);
> even when it doesn't panic, an assertion failure still happens.
> 
> I found it on an old downstream kernel at first, then reproduced it
> on the latest upstream mainline Linux (v6.7-rc6). I can't be sure how long
> this issue has been there, so I'm just reporting it first.
> 
> I hit and reproduced it several times by running generic/648 repeatedly
> on s390x with an "-n size=65536" XFS. I'm not sure whether s390x is
> required; I'll try to check other arches.

XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020

Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?

Also, does "xfs: update dir3 leaf block metadata after swap" fix it?

(I suspect it won't because that's a dirent block, but for-next has a
few directory fixes in it.)

--D

> 
> Thanks,
> Zorro
> 
> [1]
>  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
>  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
>  [  899.455844] XFS (loop1): Ending clean mount
>  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
>  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  899.588501] XFS (dm-3): Ending clean mount
>  ...
>  ...
>  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
>  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
>  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
>  [  976.831942] lo_write_bvec: 15 callbacks suppressed
>  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
>  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
>  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
>  [  976.831960] XFS (loop3): log I/O error -5
>  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
>  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
>  [  976.839603] buffer_io_error: 6 callbacks suppressed
>  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
>  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
>  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
>  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
>  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
>  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
>  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
>  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
>  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
>  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
>  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
>  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  978.449156] XFS (dm-3): log I/O error -5
>  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
>  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
>  [  978.449163] XFS (dm-3): log I/O error -5
>  [  978.449164] XFS (dm-3): log I/O error -5
>  [  978.449165] XFS (dm-3): log I/O error -5
>  [  978.449166] XFS (dm-3): log I/O error -5
>  [  978.449167] XFS (dm-3): log I/O error -5
>  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
>  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
>  [  978.587276] loop3: detected capacity change from 0 to 6876346
>  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
>  [  979.225078] XFS (loop3): Bad dir block magic!
>  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
>  [  979.225120] ------------[ cut here ]------------
>  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
>  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
>  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
>  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
>  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
>  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
>  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
>  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
>  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
>  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
>  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
>  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
>  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
>  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
>  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
>  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
>  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
>  [  979.225487] Call Trace:
>  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
>  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
>  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
>  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
>  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
>  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
>  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
>  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
>  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
>  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
>  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
>  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
>  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
>  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
>  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
>  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
>  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
>  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
>  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
>  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
>  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
>  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
>  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
>  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
>  [  979.226868] Last Breaking-Event-Address:
>  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
>  [  979.226949] ---[ end trace 0000000000000000 ]---
>  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
>  [  979.227732] XFS (loop3): Unmount and run xfs_repair
>  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
>  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
>  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
>  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
>  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
>  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
>  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
>  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
>  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
>  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
>  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
>  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
>  [  979.228061] XFS (loop3): log mount failed
>  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
>  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  980.355207] XFS (loop0): Ending clean mount
>  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
>  [  982.586881] XFS (loop1): Ending clean mount
>  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
>  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
>  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
>  [  982.637231] XFS (loop0): Ending clean mount
>  
> 
> 


* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 17:48 ` Darrick J. Wong
@ 2023-12-19  6:34   ` Zorro Lang
  2023-12-25 13:38     ` Zorro Lang
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2023-12-19  6:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > Recently I hit a crash [1] on s390x with a 64k directory block size XFS
> > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1);
> > even when it doesn't panic, an assertion failure still happens.
> > 
> > I found it on an old downstream kernel at first, then reproduced it
> > on the latest upstream mainline Linux (v6.7-rc6). I can't be sure how long
> > this issue has been there, so I'm just reporting it first.
> > 
> > I hit and reproduced it several times by running generic/648 repeatedly
> > on s390x with an "-n size=65536" XFS. I'm not sure whether s390x is
> > required; I'll try to check other arches.
> 
> XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> 
> Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?

The source is based on HEAD="v6.7-rc6"; the decode_stacktrace.sh output is:
  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
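
FWIW, the decoding was roughly this invocation from the kernel build
tree (the file names here are illustrative, nothing special):

    # base path and modules path both point at the build tree so that
    # the [xfs] module symbols get resolved as well
    ./scripts/decode_stacktrace.sh vmlinux "$PWD" "$PWD" < dmesg.txt > dmesg-decoded.txt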

The code is as below:

        switch (hdr->magic) {
        case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
        case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
                btp = xfs_dir2_block_tail_p(geo, hdr);
                lep = xfs_dir2_block_leaf_p(btp);

                if (be32_to_cpu(btp->count) >=
                    xfs_dir2_data_max_leaf_entries(geo))
137: ==>                 return __this_address;
                break;


More output as below:

[  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
[  979.225120] ------------[ cut here ]------------
[  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
[  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
+fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
+dm_mod pkey zcrypt aes_s390
[  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
[  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
[  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
[  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
[  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
[  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
[ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
All code
========

Code starting with the faulting instruction
===========================================
[  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
[  979.225476]           #000003ff7ffead76: af000000           mc      0,0
[  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
[  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
[  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
[  979.225476]            000003ff7ffead86: af000000           mc      0,0
[  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
[  979.225487] Call Trace:
[  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
[  979.225568] assfail+0x40/0x68 xfs])
[  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
[  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
[  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
[  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
[  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
[  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
[  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
[  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
[  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
[  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
[  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
[  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
[  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
[  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
[  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
[  979.226844] get_tree_bdev (fs/super.c:1598) 
[  979.226850] vfs_get_tree (fs/super.c:1771) 
[  979.226852] do_new_mount (fs/namespace.c:3337) 
[  979.226855] path_mount (fs/namespace.c:3664) 
[  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
[  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
[  979.226864] system_call (arch/s390/kernel/entry.S:309) 
[  979.226868] Last Breaking-Event-Address:
[  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
[  979.226949] ---[ end trace 0000000000000000 ]---
[  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
[  979.227732] XFS (loop3): Unmount and run xfs_repair
[  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
[  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
[  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
[  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
[  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
[  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
[  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
[  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
[  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
[  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.




> 
> Also, does "xfs: update dir3 leaf block metadata after swap" fix it?

OK, I'll merge and give it a try.

Thanks,
Zorro

> 
> (I suspect it won't because that's a dirent block, but for-next has a
> few directory fixes in it.)
> 
> --D
> 
> > 
> > Thanks,
> > Zorro
> > 
> > [1]
> >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> >  [  899.455844] XFS (loop1): Ending clean mount
> >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  899.588501] XFS (dm-3): Ending clean mount
> >  ...
> >  ...
> >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> >  [  976.831960] XFS (loop3): log I/O error -5
> >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  978.449156] XFS (dm-3): log I/O error -5
> >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> >  [  978.449163] XFS (dm-3): log I/O error -5
> >  [  978.449164] XFS (dm-3): log I/O error -5
> >  [  978.449165] XFS (dm-3): log I/O error -5
> >  [  978.449166] XFS (dm-3): log I/O error -5
> >  [  978.449167] XFS (dm-3): log I/O error -5
> >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> >  [  979.225078] XFS (loop3): Bad dir block magic!
> >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> >  [  979.225120] ------------[ cut here ]------------
> >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> >  [  979.225487] Call Trace:
> >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> >  [  979.226868] Last Breaking-Event-Address:
> >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> >  [  979.226949] ---[ end trace 0000000000000000 ]---
> >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> >  [  979.228061] XFS (loop3): log mount failed
> >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  980.355207] XFS (loop0): Ending clean mount
> >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> >  [  982.586881] XFS (loop1): Ending clean mount
> >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> >  [  982.637231] XFS (loop0): Ending clean mount
> >  
> > 
> > 
> 



* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-19  6:34   ` Zorro Lang
@ 2023-12-25 13:38     ` Zorro Lang
  2024-01-04  4:35       ` Darrick J. Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2023-12-25 13:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > Hi,
> > > 
> > > Recently I hit a crash [1] on s390x with a 64k directory block size XFS
> > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1);
> > > even when it doesn't panic, an assertion failure still happens.
> > > 
> > > I found it on an old downstream kernel at first, then reproduced it
> > > on the latest upstream mainline Linux (v6.7-rc6). I can't be sure how long
> > > this issue has been there, so I'm just reporting it first.
> > > 
> > > I hit and reproduced it several times by running generic/648 repeatedly
> > > on s390x with an "-n size=65536" XFS. I'm not sure whether s390x is
> > > required; I'll try to check other arches.
> > 
> > XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> > 
> > Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?
> 
> The source is based on HEAD="v6.7-rc6"; the decode_stacktrace.sh output is:
>   [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> 
> The code is as below:
> 
>         switch (hdr->magic) {
>         case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
>         case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
>                 btp = xfs_dir2_block_tail_p(geo, hdr);
>                 lep = xfs_dir2_block_leaf_p(btp);
> 
>                 if (be32_to_cpu(btp->count) >=
>                     xfs_dir2_data_max_leaf_entries(geo))
> 137: ==>                 return __this_address;
>                 break;
> 
> 
> More output as below:
> 
> [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> [  979.225120] ------------[ cut here ]------------
> [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
> [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
> +fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
> +dm_mod pkey zcrypt aes_s390
> [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
> [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> [ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
> All code
> ========
> 
> Code starting with the faulting instruction
> ===========================================
> [  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
> [  979.225476]           #000003ff7ffead76: af000000           mc      0,0
> [  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
> [  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
> [  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
> [  979.225476]            000003ff7ffead86: af000000           mc      0,0
> [  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
> [  979.225487] Call Trace:
> [  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
> [  979.225568] assfail+0x40/0x68 xfs])
> [  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
> [  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
> [  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
> [  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
> [  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
> [  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
> [  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
> [  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
> [  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
> [  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
> [  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
> [  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
> [  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
> [  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
> [  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
> [  979.226844] get_tree_bdev (fs/super.c:1598) 
> [  979.226850] vfs_get_tree (fs/super.c:1771) 
> [  979.226852] do_new_mount (fs/namespace.c:3337) 
> [  979.226855] path_mount (fs/namespace.c:3664) 
> [  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
> [  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
> [  979.226864] system_call (arch/s390/kernel/entry.S:309) 
> [  979.226868] Last Breaking-Event-Address:
> [  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
> [  979.226949] ---[ end trace 0000000000000000 ]---
> [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> [  979.227732] XFS (loop3): Unmount and run xfs_repair
> [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
> [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.
> 
> 
> 
> 
> > 
> > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> 
> OK, I'll merge and give it a try.

It's still reproducible on the xfs-linux for-next branch at xfs-6.8-merge-2,
which contains commit 5759aa4f9560 ("xfs: update dir3 leaf block metadata
after swap").
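
In case anyone wants to double check that the branch under test really has
that fix, this is roughly how I'd verify it (assuming a local xfs-linux
clone whose remote is called "origin"):

    git fetch origin
    # exits 0 if the fix is an ancestor of the branch being tested
    git merge-base --is-ancestor 5759aa4f9560 origin/for-next && echo "fix present"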

Thanks,
Zorro

> 
> Thanks,
> Zorro
> 
> > 
> > (I suspect it won't because that's a dirent block, but for-next has a
> > few directory fixes in it.)
> > 
> > --D
> > 
> > > 
> > > Thanks,
> > > Zorro
> > > 
> > > [1]
> > >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> > >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > >  [  899.455844] XFS (loop1): Ending clean mount
> > >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  899.588501] XFS (dm-3): Ending clean mount
> > >  ...
> > >  ...
> > >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> > >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> > >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> > >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> > >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> > >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > >  [  976.831960] XFS (loop3): log I/O error -5
> > >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> > >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> > >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> > >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> > >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> > >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> > >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> > >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> > >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> > >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> > >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> > >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> > >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> > >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> > >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  978.449156] XFS (dm-3): log I/O error -5
> > >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> > >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> > >  [  978.449163] XFS (dm-3): log I/O error -5
> > >  [  978.449164] XFS (dm-3): log I/O error -5
> > >  [  978.449165] XFS (dm-3): log I/O error -5
> > >  [  978.449166] XFS (dm-3): log I/O error -5
> > >  [  978.449167] XFS (dm-3): log I/O error -5
> > >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> > >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> > >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > >  [  979.225120] ------------[ cut here ]------------
> > >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> > >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> > >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> > >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> > >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> > >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> > >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> > >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> > >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> > >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> > >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> > >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> > >  [  979.225487] Call Trace:
> > >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> > >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> > >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> > >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> > >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> > >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> > >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> > >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> > >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> > >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> > >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> > >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> > >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> > >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> > >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> > >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> > >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> > >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> > >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> > >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> > >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> > >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> > >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> > >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> > >  [  979.226868] Last Breaking-Event-Address:
> > >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> > >  [  979.226949] ---[ end trace 0000000000000000 ]---
> > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> > >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> > >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> > >  [  979.228061] XFS (loop3): log mount failed
> > >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  980.355207] XFS (loop0): Ending clean mount
> > >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > >  [  982.586881] XFS (loop1): Ending clean mount
> > >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> > >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > >  [  982.637231] XFS (loop0): Ending clean mount
> > >  
> > > 
> > > 
> > 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-25 13:38     ` Zorro Lang
@ 2024-01-04  4:35       ` Darrick J. Wong
  2024-01-18  1:59         ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2024-01-04  4:35 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 25, 2023 at 09:38:54PM +0800, Zorro Lang wrote:
> On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> > On Mon, Dec 18, 2023 at 09:48:36AM -0800, Darrick J. Wong wrote:
> > > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > > Hi,
> > > > 
> > > > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > > > even not panic, a assertion failure will happen.
> > > > 
> > > > I found it from an old downstream kernel at first, then reproduced it
> > > > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > > > time this issue be there, just reported it at first.
> > > > 
> > > > I hit and reproduced it several times by running generic/648 on s390x
> > > > with "-n size=65536" xfs many times. Not sure if s390x is needed, I'll
> > > > try to check other arches.
> > > 
> > > XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> > > 
> > > Where in the source code is this             ^^^^^^^^^^^^^^^^^^^^^^^^^^^?
> > 
> > The source is just based on HEAD="v6.7-rc6", the decode_stacktrace.sh output is:
> >   [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> > 
> > The code is as below:
> > 
> >         switch (hdr->magic) {
> >         case cpu_to_be32(XFS_DIR3_BLOCK_MAGIC):
> >         case cpu_to_be32(XFS_DIR2_BLOCK_MAGIC):
> >                 btp = xfs_dir2_block_tail_p(geo, hdr);
> >                 lep = xfs_dir2_block_leaf_p(btp);
> > 
> >                 if (be32_to_cpu(btp->count) >=
> >                     xfs_dir2_data_max_leaf_entries(geo))
> > 137: ==>                 return __this_address;
> >                 break;
> > 
> > 
> > More output as below:
> > 
> > [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > [  979.225120] ------------[ cut here ]------------
> > [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail (fs/xfs/xfs_message.c:104 (discriminator 3)) xfs
> > [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm
> > +fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log
> > +dm_mod pkey zcrypt aes_s390
> > [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 xfs])
> > [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > [ 979.225476] Krnl Code: 000003ff7ffead6e: 95001010 cli 16(%r1),0
> > All code
> > ========
> > 
> > Code starting with the faulting instruction
> > ===========================================
> > [  979.225476]            000003ff7ffead72: a774000a           brc     7,000003ff7ffead86
> > [  979.225476]           #000003ff7ffead76: af000000           mc      0,0
> > [  979.225476]           >000003ff7ffead7a: eb6ff0a80004       lmg     %r6,%r15,168(%r15)
> > [  979.225476]            000003ff7ffead80: 07fe               bcr     15,%r14
> > [  979.225476]            000003ff7ffead82: 47000700           bc      0,1792
> > [  979.225476]            000003ff7ffead86: af000000           mc      0,0
> > [  979.225476]            000003ff7ffead8a: 0707               bcr     0,%r7
> > [  979.225487] Call Trace:
> > [  979.225488] assfail (fs/xfs/xfs_message.c:105 (discriminator 3)) xfs
> > [  979.225568] assfail+0x40/0x68 xfs])
> > [  979.225648] xlog_recover_validate_buf_type (fs/xfs/xfs_buf_item_recover.c:414 (discriminator 1)) xfs
> > [  979.225727] xlog_recover_buf_commit_pass2 (fs/xfs/xfs_buf_item_recover.c:971) xfs
> > [  979.225807] xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1957) xfs
> > [  979.225886] xlog_recover_commit_trans (./include/linux/list.h:373 ./include/linux/list.h:588 fs/xfs/xfs_log_recover.c:2028) xfs
> > [  979.225965] xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2257) xfs
> > [  979.226045] xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2405) xfs
> > [  979.226124] xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2447) xfs
> > [  979.226204] xlog_recover_process (fs/xfs/xfs_log_recover.c:2901) xfs
> > [  979.226285] xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3153) xfs
> > [  979.226366] xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3265) xfs
> > [  979.226446] xlog_do_recover (fs/xfs/xfs_log_recover.c:3293) xfs
> > [  979.226526] xlog_recover (./arch/s390/include/asm/atomic_ops.h:71 (discriminator 1) ./arch/s390/include/asm/bitops.h:63 (discriminator 1) ./include/asm-generic/bitops/instrumented-atomic.h:29 (discriminator 1) fs/xfs/xfs_log_recover.c:3427 (discriminator 1)) xfs
> > [  979.226607] xfs_log_mount (fs/xfs/xfs_log.c:718) xfs
> > [  979.226686] xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs
> > [  979.226765] xfs_fs_fill_super (fs/xfs/xfs_super.c:1734) xfs
> > [  979.226844] get_tree_bdev (fs/super.c:1598) 
> > [  979.226850] vfs_get_tree (fs/super.c:1771) 
> > [  979.226852] do_new_mount (fs/namespace.c:3337) 
> > [  979.226855] path_mount (fs/namespace.c:3664) 
> > [  979.226857] __s390x_sys_mount (fs/namespace.c:3677 fs/namespace.c:3886 fs/namespace.c:3863 fs/namespace.c:3863) 
> > [  979.226860] __do_syscall (arch/s390/kernel/syscall.c:144 arch/s390/kernel/syscall.c:168) 
> > [  979.226864] system_call (arch/s390/kernel/entry.S:309) 
> > [  979.226868] Last Breaking-Event-Address:
> > [  979.226868] xfs_printk_level (fs/xfs/xfs_message.c:52) xfs
> > [  979.226949] ---[ end trace 0000000000000000 ]---
> > [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check (fs/xfs/libxfs/xfs_dir2_data.c:137) xfs, xfs_dir3_block block 0x1020
> > [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
> > [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558). Shutting down filesystem.
> > 
> > 
> > 
> > 
> > > 
> > > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> > 
> > OK, I'll merge and give it a try.
> 
> It's still reproducible on xfs-linux for-next branch xfs-6.8-merge-2, which
> contains the 5759aa4f9560 ("xfs: update dir3 leaf block metadata after swap")

DOH.  Got a metadump?  I wonder if s390x is more fubar than we used to
think it was...

--D

> Thanks,
> Zorro
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > 
> > > (I suspect it won't because that's a dirent block, but for-next has a
> > > few directory fixes in it.)
> > > 
> > > --D
> > > 
> > > > 
> > > > Thanks,
> > > > Zorro
> > > > 
> > > > [1]
> > > >  [  899.271221] run fstests generic/648 at 2023-12-18 01:18:03
> > > >  [  899.454444] XFS (loop1): Mounting V5 Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > > >  [  899.455844] XFS (loop1): Ending clean mount
> > > >  [  899.463623] XFS (loop1): Unmounting Filesystem 9be840e1-9bd1-468e-81c5-179f2ebdce56
> > > >  [  899.587365] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  899.588501] XFS (dm-3): Ending clean mount
> > > >  ...
> > > >  ...
> > > >  [  974.217814] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  974.803146] XFS (loop3): Starting recovery (logdev: internal)
> > > >  [  974.804634] XFS (loop3): Ending recovery (logdev: internal)
> > > >  [  975.782734] Direct I/O collision with buffered writes! File: /p2/df/f4f Comm: fsstress
> > > >  [  976.831942] lo_write_bvec: 15 callbacks suppressed
> > > >  [  976.831946] loop: Write error at byte offset 1766129664, length 4096.
> > > >  [  976.831953] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > > >  [  976.831957] I/O error, dev loop3, sector 3449468 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
> > > >  [  976.831960] XFS (loop3): log I/O error -5
> > > >  [  976.831963] XFS (loop3): Filesystem has been shut down due to log error (0x2).
> > > >  [  976.831964] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
> > > >  [  976.839603] buffer_io_error: 6 callbacks suppressed
> > > >  [  976.839604] Buffer I/O error on dev dm-3, logical block 20971392, async page read
> > > >  [  976.839607] Buffer I/O error on dev dm-3, logical block 20971393, async page read
> > > >  [  976.839609] Buffer I/O error on dev dm-3, logical block 20971394, async page read
> > > >  [  976.839611] Buffer I/O error on dev dm-3, logical block 20971395, async page read
> > > >  [  976.839612] Buffer I/O error on dev dm-3, logical block 20971396, async page read
> > > >  [  976.839613] Buffer I/O error on dev dm-3, logical block 20971397, async page read
> > > >  [  976.839614] Buffer I/O error on dev dm-3, logical block 20971398, async page read
> > > >  [  976.839616] Buffer I/O error on dev dm-3, logical block 20971399, async page read
> > > >  [  977.419266] XFS (loop3): Unmounting Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  977.423981] iomap_finish_ioend: 15 callbacks suppressed
> > > >  [  977.423982] dm-3: writeback error on inode 131, offset 1766125568, sector 4059696
> > > >  [  977.423988] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
> > > >  [  978.434124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  978.449156] XFS (dm-3): log I/O error -5
> > > >  [  978.449159] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
> > > >  [  978.449160] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
> > > >  [  978.449163] XFS (dm-3): log I/O error -5
> > > >  [  978.449164] XFS (dm-3): log I/O error -5
> > > >  [  978.449165] XFS (dm-3): log I/O error -5
> > > >  [  978.449166] XFS (dm-3): log I/O error -5
> > > >  [  978.449167] XFS (dm-3): log I/O error -5
> > > >  [  978.498053] XFS (dm-3): Mounting V5 Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  978.513331] XFS (dm-3): Starting recovery (logdev: internal)
> > > >  [  978.584227] XFS (dm-3): Ending recovery (logdev: internal)
> > > >  [  978.587276] loop3: detected capacity change from 0 to 6876346
> > > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > > >  [  979.225120] ------------[ cut here ]------------
> > > >  [  979.225122] WARNING: CPU: 1 PID: 157902 at fs/xfs/xfs_message.c:104 assfail+0x4e/0x68 [xfs]
> > > >  [  979.225349] Modules linked in: tls loop lcs ctcm fsm qeth ccwgroup zfcp qdio scsi_transport_fc dasd_fba_mod dasd_eckd_mod dasd_mod rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 vfio drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng des_s390 virtio_net net_failover sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> > > >  [  979.225373] CPU: 1 PID: 157902 Comm: mount Kdump: loaded Not tainted 6.7.0-rc6 #1
> > > >  [  979.225376] Hardware name: IBM 8561 LT1 400 (KVM/Linux)
> > > >  [  979.225377] Krnl PSW : 0704c00180000000 000003ff7ffead7a (assfail+0x52/0x68 [xfs])
> > > >  [  979.225457]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> > > >  [  979.225460] Krnl GPRS: c000000000000021 000003ff800549b0 ffffffffffffffea 000000000000000a
> > > >  [  979.225462]            0000038000363418 0000000000000000 000003ff80162b74 000000030001c127
> > > >  [  979.225464]            00000000ccc51d40 000000030001c127 00000000dfa03000 0000000163d19600
> > > >  [  979.225465]            000003ffa1f2ef68 000003ff80147c28 000003ff7ffead68 00000380003634c0
> > > >  [  979.225476] Krnl Code: 000003ff7ffead6e: 95001010		cli	16(%r1),0
> > > >  [  979.225476]            000003ff7ffead72: a774000a		brc	7,000003ff7ffead86
> > > >  [  979.225476]           #000003ff7ffead76: af000000		mc	0,0
> > > >  [  979.225476]           >000003ff7ffead7a: eb6ff0a80004	lmg	%r6,%r15,168(%r15)
> > > >  [  979.225476]            000003ff7ffead80: 07fe		bcr	15,%r14
> > > >  [  979.225476]            000003ff7ffead82: 47000700		bc	0,1792
> > > >  [  979.225476]            000003ff7ffead86: af000000		mc	0,0
> > > >  [  979.225476]            000003ff7ffead8a: 0707		bcr	0,%r7
> > > >  [  979.225487] Call Trace:
> > > >  [  979.225488]  [<000003ff7ffead7a>] assfail+0x52/0x68 [xfs] 
> > > >  [  979.225568] ([<000003ff7ffead68>] assfail+0x40/0x68 [xfs])
> > > >  [  979.225648]  [<000003ff80004b4e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> > > >  [  979.225727]  [<000003ff80005eca>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> > > >  [  979.225807]  [<000003ff8001077a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> > > >  [  979.225886]  [<000003ff80011436>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> > > >  [  979.225965]  [<000003ff80011598>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> > > >  [  979.226045]  [<000003ff80011660>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> > > >  [  979.226124]  [<000003ff80012126>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> > > >  [  979.226204]  [<000003ff800122dc>] xlog_recover_process+0x104/0x150 [xfs] 
> > > >  [  979.226285]  [<000003ff800125a4>] xlog_do_recovery_pass+0x27c/0x748 [xfs] 
> > > >  [  979.226366]  [<000003ff80012ec8>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> > > >  [  979.226446]  [<000003ff80012f64>] xlog_do_recover+0x4c/0x218 [xfs] 
> > > >  [  979.226526]  [<000003ff8001446a>] xlog_recover+0xda/0x1a0 [xfs] 
> > > >  [  979.226607]  [<000003ff7fffa51e>] xfs_log_mount+0x11e/0x280 [xfs] 
> > > >  [  979.226686]  [<000003ff7ffec226>] xfs_mountfs+0x3e6/0x928 [xfs] 
> > > >  [  979.226765]  [<000003ff7fff3b4c>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> > > >  [  979.226844]  [<00000001ef53bfd4>] get_tree_bdev+0x144/0x1d0 
> > > >  [  979.226850]  [<00000001ef539a08>] vfs_get_tree+0x38/0x110 
> > > >  [  979.226852]  [<00000001ef56c432>] do_new_mount+0x17a/0x2d0 
> > > >  [  979.226855]  [<00000001ef56d0ac>] path_mount+0x1ac/0x818 
> > > >  [  979.226857]  [<00000001ef56d81c>] __s390x_sys_mount+0x104/0x148 
> > > >  [  979.226860]  [<00000001efbb6e00>] __do_syscall+0x1d0/0x1f8 
> > > >  [  979.226864]  [<00000001efbc7088>] system_call+0x70/0x98 
> > > >  [  979.226868] Last Breaking-Event-Address:
> > > >  [  979.226868]  [<000003ff7ffeabc4>] xfs_printk_level+0xac/0xd8 [xfs]
> > > >  [  979.226949] ---[ end trace 0000000000000000 ]---
> > > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > > >  [  979.227738] 00000010: 00 00 00 00 00 00 00 00 c1 95 44 38 a1 8d 4b 4a  ..........D8..KJ
> > > >  [  979.227739] 00000020: ad 32 0e 29 c4 07 13 ed 00 00 00 00 00 00 00 80  .2.)............
> > > >  [  979.227740] 00000030: 02 68 fc c8 00 00 00 00 00 00 00 00 00 00 00 00  .h..............
> > > >  [  979.227741] 00000040: 00 00 00 00 00 00 00 80 01 2e 02 00 00 00 00 40  ...............@
> > > >  [  979.227742] 00000050: 00 00 00 00 00 00 00 80 02 2e 2e 02 00 00 00 50  ...............P
> > > >  [  979.227743] 00000060: 00 00 00 00 00 00 00 83 08 66 73 73 32 36 30 38  .........fss2608
> > > >  [  979.227745] 00000070: 36 01 00 00 00 00 00 60 00 00 00 00 00 00 00 84  6......`........
> > > >  [  979.227748] XFS (loop3): Corruption of in-memory data (0x8) detected at __xfs_buf_submit+0x78/0x230 [xfs] (fs/xfs/xfs_buf.c:1558).  Shutting down filesystem.
> > > >  [  979.227826] XFS (loop3): Please unmount the filesystem and rectify the problem(s)
> > > >  [  979.227912] XFS (loop3): log mount/recovery failed: error -117
> > > >  [  979.228061] XFS (loop3): log mount failed
> > > >  [  980.224124] XFS (dm-3): Unmounting Filesystem d636e675-dfc9-4e5d-b777-ff856af136a9
> > > >  [  980.318601] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  980.353946] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  980.355207] XFS (loop0): Ending clean mount
> > > >  [  982.585362] XFS (loop1): Mounting V5 Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > > >  [  982.586881] XFS (loop1): Ending clean mount
> > > >  [  982.588944] XFS (loop1): Unmounting Filesystem ec19bbbd-e925-41c9-befe-28218e74a23b
> > > >  [  982.603160] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk!
> > > >  [  982.620838] XFS (loop0): Unmounting Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  982.636206] XFS (loop0): Mounting V5 Filesystem c8777546-cb21-46f2-9963-456588f60b85
> > > >  [  982.637231] XFS (loop0): Ending clean mount
> > > >  
> > > > 
> > > > 
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-04  4:35       ` Darrick J. Wong
@ 2024-01-18  1:59         ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2024-01-18  1:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Zorro Lang, linux-xfs, fstests

[ FYI, I missed this because I got the fstests list copy first, not
the linux-xfs list copy, so it got filtered into fstests, not XFS.
Please just send test failures like this to the linux-xfs list -
there is no value in sending them to fstests as well, and it can
cause bug reports to "go missing". ]

On Wed, Jan 03, 2024 at 08:35:52PM -0800, Darrick J. Wong wrote:
> On Mon, Dec 25, 2023 at 09:38:54PM +0800, Zorro Lang wrote:
> > On Tue, Dec 19, 2023 at 02:34:20PM +0800, Zorro Lang wrote:
> > > > Also, does "xfs: update dir3 leaf block metadata after swap" fix it?
> > > 
> > > OK, I'll merge and give it a try.
> > 
> > It's still reproducible on xfs-linux for-next branch xfs-6.8-merge-2, which
> > contains the 5759aa4f9560 ("xfs: update dir3 leaf block metadata after swap")
> 
> DOH.  Got a metadump?  I wonder if s390x is more fubar than we used to
> think it was...

I'm betting that the directory corruption is being reported because
the directory block change was not replayed by recovery due to the
bad magic number error. The kernel was configured with
XFS_ASSERT_FATAL=n, so it continued on after the recovery failure
and things went bad when that unrecovered metadata was accessed.
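
For reference, the assert path that lets recovery continue looks
roughly like this (a from-memory sketch of assfail() in
fs/xfs/xfs_message.c, not the exact source):

void
assfail(struct xfs_mount *mp, char *expr, char *file, int line)
{
	/* log the failed assertion */
	xfs_emerg(mp, "Assertion failed: %s, file: %s, line: %d",
		  expr, file, line);

	/*
	 * CONFIG_XFS_ASSERT_FATAL=y (xfs_globals.bug_on_assert) makes
	 * this fatal; otherwise it is only a warning and recovery
	 * carries on with the unreplayed buffer still in memory.
	 */
	if (xfs_globals.bug_on_assert)
		BUG();
	else
		WARN_ON(1);
}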

Two things: a magic number mismatch between the buffer and the log
item should cause recovery to fail on production kernels, and what
we need to work out is how the buffer being recovered had a magic
number mismatch with the buf log item.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
  2023-12-18 17:48 ` Darrick J. Wong
@ 2024-01-18  4:20 ` Dave Chinner
  2024-01-19  1:38   ` Zorro Lang
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2024-01-18  4:20 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with 64k directory block size xfs
> (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> even not panic, a assertion failure will happen.
> 
> I found it from an old downstream kernel at first, then reproduced it
> on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> time this issue be there, just reported it at first.
>  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
>  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
>  [  979.225078] XFS (loop3): Bad dir block magic!
>  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414

Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
that we recovered into the buffer did not have a
XFS_DIR3_BLOCK_MAGIC type.

Perhaps the buf log item didn't contain the first 128 bytes of the
buffer (or maybe any of it), and so didn't recover the magic number?

Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
preserves the journal contents when the issue triggers, then get a
metadump of the filesystem so I can dig into the contents of the
journal?  I really want to see what is in the buf log item we fail
to recover.

We don't want recovery to continue here because that will result in
the journal being fully recovered and updated and so we won't be
able to replay the recovery failure from it. 

i.e. if we leave the buffer we recovered in memory without failure
because the ASSERT is just a warn, we continue onwards and likely
then recover newer changes over the top of it. This may or may
not result in a correctly recovered buffer, depending on what parts
of the buffer got relogged.

IOWs, we should be expecting corruption to be detected somewhere
further down the track once we've seen this warning, and really we
should be aborting journal recovery if we see a mismatch like this.
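
To make that concrete, the check that fired here does roughly the
following (a simplified, from-memory sketch of the
XFS_BLFT_DIR_BLOCK_BUF case in xlog_recover_validate_buf_type();
names are approximate and this is not a patch):

	case XFS_BLFT_DIR_BLOCK_BUF:
		/* recovered buffer must carry a dir data block magic */
		if (magic32 != XFS_DIR2_BLOCK_MAGIC &&
		    magic32 != XFS_DIR3_BLOCK_MAGIC)
			warnmsg = "Bad dir block magic!";
		break;
	...
	if (warnmsg) {
		xfs_warn(mp, "%s", warnmsg);
		/*
		 * Warn-only on production kernels; returning
		 * -EFSCORRUPTED here instead would abort recovery as
		 * suggested above.
		 */
		ASSERT(0);
	}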

.....

>  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
>  [  979.227732] XFS (loop3): Unmount and run xfs_repair
>  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
>  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 

XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
the tail pointer (btp->count) that is bad, this indicates that maybe
the tail didn't get written correctly by subsequent checkpoint
recoveries. We don't know, because that isn't in the output below.

It likely doesn't matter, because I think the problem is either a
runtime problem writing bad stuff into the journal, or a recovery
problem failing to handle the contents correctly. Hence the need for
a metadump.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-18  4:20 ` Dave Chinner
@ 2024-01-19  1:38   ` Zorro Lang
  2024-01-19  7:17     ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2024-01-19  1:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 18, 2024 at 03:20:21PM +1100, Dave Chinner wrote:
> On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > even not panic, a assertion failure will happen.
> > 
> > I found it from an old downstream kernel at first, then reproduced it
> > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > time this issue be there, just reported it at first.
> >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> >  [  979.225078] XFS (loop3): Bad dir block magic!
> >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> 
> Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
> that we recovered into the buffer did not have a
> XFS_DIR3_BLOCK_MAGIC type.
> 
> Perhaps the buf log item didn't contain the first 128 bytes of the
> buffer (or maybe any of it), and so didn't recovery the magic number?
> 
> Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
> preserves the journal contents when the issue triggers, then get a
> metadump of the filesystem so I can dig into the contents of the
> journal?  I really want to see what is in the buf log item we fail
> to recover.
> 
> We don't want recovery to continue here because that will result in
> the journal being fully recovered and updated and so we won't be
> able to replay the recovery failure from it. 
> 
> i.e. if we leave the buffer we recovered in memory without failure
> because the ASSERT is just a warn, we continue onwards and likely
> then recover newer changes over the top of it. This may or may
> not result in a correctly recovered buffer, depending on what parts
> of the buffer got relogged.
> 
> IOWs, we should be expecting corruption to be detected somewhere
> further down the track once we've seen this warning, and really we
> should be aborting journal recovery if we see a mismatch like this.
> 
> .....
> 
> >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> 
> XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
> the tail pointer (btp->count) that is bad, this indicates that maybe
> the tail didn't get written correctly by subsequent checkpoint
> recoveries. We don't know, because that isn't in the output below.
> 
> It likely doesn't matter, because I think the problem is either a
> runtime problem writing bad stuff into the journal, or a recovery
> problem failing to handle the contents correctly. Hence the need for
> a metadump.

Hi Dave,

Thanks for your reply. It's been a month since I reported this bug.
Now I can't reproduce this issue on the latest upstream mainline
linux or the xfs-linux for-next branch. I've run the same testing
~1000 times and still can't reproduce it...

If you think it might be hidden rather than fixed, I can try it on the
older kernel that reproduced this bug last time, to get a metadump.
What do you think?

Thanks,
Zorro

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-19  1:38   ` Zorro Lang
@ 2024-01-19  7:17     ` Dave Chinner
  2024-01-20 11:26       ` Zorro Lang
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2024-01-19  7:17 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Fri, Jan 19, 2024 at 09:38:07AM +0800, Zorro Lang wrote:
> On Thu, Jan 18, 2024 at 03:20:21PM +1100, Dave Chinner wrote:
> > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > Hi,
> > > 
> > > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > > even not panic, a assertion failure will happen.
> > > 
> > > I found it from an old downstream kernel at first, then reproduced it
> > > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > > time this issue be there, just reported it at first.
> > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > 
> > Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
> > that we recovered into the buffer did not have a
> > XFS_DIR3_BLOCK_MAGIC type.
> > 
> > Perhaps the buf log item didn't contain the first 128 bytes of the
> > buffer (or maybe any of it), and so didn't recovery the magic number?
> > 
> > Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
> > preserves the journal contents when the issue triggers, then get a
> > metadump of the filesystem so I can dig into the contents of the
> > journal?  I really want to see what is in the buf log item we fail
> > to recover.
> > 
> > We don't want recovery to continue here because that will result in
> > the journal being fully recovered and updated and so we won't be
> > able to replay the recovery failure from it. 
> > 
> > i.e. if we leave the buffer we recovered in memory without failure
> > because the ASSERT is just a warn, we continue onwards and likely
> > then recover newer changes over the top of it. This may or may
> > not result in a correctly recovered buffer, depending on what parts
> > of the buffer got relogged.
> > 
> > IOWs, we should be expecting corruption to be detected somewhere
> > further down the track once we've seen this warning, and really we
> > should be aborting journal recovery if we see a mismatch like this.
> > 
> > .....
> > 
> > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > 
> > XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
> > the tail pointer (btp->count) that is bad, this indicates that maybe
> > the tail didn't get written correctly by subsequent checkpoint
> > recoveries. We don't know, because that isn't in the output below.
> > 
> > It likely doesn't matter, because I think the problem is either a
> > runtime problem writing bad stuff into the journal, or a recovery
> > problem failing to handle the contents correctly. Hence the need for
> > a metadump.
> 
> Hi Dave,
> 
> Thanks for your reply. It's been a month passed, since I reported this
> bug last time. Now I can't reproduce this issue on latest upstream
> mainline linux and xfs-linux for-next branch. I've tried to do the
> same testing ~1000 times, still can't reproduce it...
> 
> If you think it might not be fixed but be hided, I can try it on older
> kernel which can reproduce this bug last time, to get a metadump. What
> do you think?

Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
fixed it? Nothing in the for-next branch really looks relevant to
the problem to me....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-19  7:17     ` Dave Chinner
@ 2024-01-20 11:26       ` Zorro Lang
  2024-01-20 23:58         ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2024-01-20 11:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> On Fri, Jan 19, 2024 at 09:38:07AM +0800, Zorro Lang wrote:
> > On Thu, Jan 18, 2024 at 03:20:21PM +1100, Dave Chinner wrote:
> > > On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > > > Hi,
> > > > 
> > > > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > > > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > > > even not panic, a assertion failure will happen.
> > > > 
> > > > I found it from an old downstream kernel at first, then reproduced it
> > > > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > > > time this issue be there, just reported it at first.
> > > >  [  978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > > >  [  979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > > >  [  979.225078] XFS (loop3): Bad dir block magic!
> > > >  [  979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > > 
> > > Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
> > > that we recovered into the buffer did not have a
> > > XFS_DIR3_BLOCK_MAGIC type.
> > > 
> > > Perhaps the buf log item didn't contain the first 128 bytes of the
> > > buffer (or maybe any of it), and so didn't recovery the magic number?
> > > 
> > > Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
> > > preserves the journal contents when the issue triggers, then get a
> > > metadump of the filesystem so I can dig into the contents of the
> > > journal?  I really want to see what is in the buf log item we fail
> > > to recover.
> > > 
> > > We don't want recovery to continue here because that will result in
> > > the journal being fully recovered and updated and so we won't be
> > > able to replay the recovery failure from it. 
> > > 
> > > i.e. if we leave the buffer we recovered in memory without failure
> > > because the ASSERT is just a warn, we continue onwards and likely
> > > then recover newer changes over the top of it. This may or may
> > > not result in a correctly recovered buffer, depending on what parts
> > > of the buffer got relogged.
> > > 
> > > IOWs, we should be expecting corruption to be detected somewhere
> > > further down the track once we've seen this warning, and really we
> > > should be aborting journal recovery if we see a mismatch like this.
> > > 
> > > .....
> > > 
> > > >  [  979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020 
> > > >  [  979.227732] XFS (loop3): Unmount and run xfs_repair
> > > >  [  979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > > >  [  979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
> > > 
> > > XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
> > > the tail pointer (btp->count) that is bad, this indicates that maybe
> > > the tail didn't get written correctly by subsequent checkpoint
> > > recoveries. We don't know, because that isn't in the output below.
> > > 
> > > It likely doesn't matter, because I think the problem is either a
> > > runtime problem writing bad stuff into the journal, or a recovery
> > > problem failing to handle the contents correctly. Hence the need for
> > > a metadump.
> > 
> > Hi Dave,
> > 
> > Thanks for your reply. It's been a month passed, since I reported this
> > bug last time. Now I can't reproduce this issue on latest upstream
> > mainline linux and xfs-linux for-next branch. I've tried to do the
> > same testing ~1000 times, still can't reproduce it...
> > 
> > If you think it might not be fixed but be hided, I can try it on older
> > kernel which can reproduce this bug last time, to get a metadump. What
> > do you think?
> 
> Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> fixed it? Nothing in the for-next branch really looks relevant to
> the problem to me....

Hi Dave,

Finally, I got a chance to reproduce this issue on latest upstream mainline
linux (HEAD=9d64bf433c53) (and linux-xfs) again.

It looks like some userspace updates hide the issue, but I haven't found out
which change does that, because it was a big jump across a whole system version.
I reproduced this issue again by using an old RHEL distro (but with the newest
kernel). (I'll try to find out which change causes that later if necessary.)

Anyway, I enabled CONFIG_XFS_ASSERT_FATAL=y and CONFIG_XFS_DEBUG=y as you
suggested, and got the xfs metadump file after it crashed [1] and rebooted.

Because g/648 tests on a loop image in SCRATCH_MNT, I didn't dump SCRATCH_DEV
but dumped the $SCRATCH_MNT/testfs file instead. You can get the metadump file from:

https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing

I can reproduce this issue by restoring and mounting this xfs image:
  # xfs_mdrestore testfs.metadump testfs.img
  # mount testfs.img /mnt/tmp
The system then crashes [2] directly.

Thanks,
Zorro



[1]
...
 [ 1700.892578] blk_print_req_error: 7 callbacks suppressed
 [ 1700.892580] I/O error, dev loop3, sector 3518365 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
 [ 1700.892583] I/O error, dev loop3, sector 3518365 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
 [ 1700.892587] XFS (loop3): log I/O error -5
 [ 1700.892590] XFS (loop3): Filesystem has been shut down due to log error (0x2).
 [ 1700.892591] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
 [ 1700.895245] Buffer I/O error on dev dm-3, logical block 20971392, async page read
 [ 1700.895251] Buffer I/O error on dev dm-3, logical block 20971393, async page read
 [ 1701.499437] XFS (loop3): Unmounting Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
 [ 1701.504307] iomap_finish_ioend: 47 callbacks suppressed
 [ 1701.504308] dm-3: writeback error on inode 131, offset 1801400320, sector 4585184
 [ 1701.504315] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 [ 1702.507024] XFS (dm-3): log I/O error -5
 [ 1702.507034] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
 [ 1702.507037] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
 [ 1702.522361] XFS (dm-3): Unmounting Filesystem cd0c2e0e-6afe-4460-ab7a-186f4cf3d3c5
 [ 1702.532965] XFS (dm-3): Mounting V5 Filesystem cd0c2e0e-6afe-4460-ab7a-186f4cf3d3c5
 [ 1702.548545] XFS (dm-3): Starting recovery (logdev: internal)
 [ 1702.606253] XFS (dm-3): Ending recovery (logdev: internal)
 [ 1702.608865] loop3: detected capacity change from 0 to 6876346
 [ 1702.622241] XFS (loop3): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
 [ 1703.228237] XFS (loop3): Starting recovery (logdev: internal)
 [ 1703.321680] XFS (loop3): Ending recovery (logdev: internal)
 [ 1705.345881] XFS (dm-3): log I/O error -5
 [ 1705.345890] XFS (dm-3): Filesystem has been shut down due to log error (0x2).
 [ 1705.345891] XFS (dm-3): Please unmount the filesystem and rectify the problem(s).
 [ 1705.345905] I/O error, dev loop3, sector 3549974 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
 [ 1705.345913] XFS (loop3): log I/O error -5
 [ 1705.345916] XFS (loop3): Filesystem has been shut down due to log error (0x2).
 [ 1705.345917] XFS (loop3): Please unmount the filesystem and rectify the problem(s).
 [ 1705.346697] loop3: writeback error on inode 8003302, offset 4964352, sector 3208280
 [ 1705.346698] loop3: writeback error on inode 8003302, offset 4968448, sector 3208688
 [ 1705.346700] loop3: writeback error on inode 8003302, offset 4972544, sector 3207016
 [ 1705.346702] loop3: writeback error on inode 8003302, offset 4976640, sector 3205616
 [ 1705.346704] loop3: writeback error on inode 8003302, offset 4988928, sector 3205472
 [ 1705.346706] loop3: writeback error on inode 8003302, offset 5005312, sector 3205344
 [ 1705.348167] buffer_io_error: 6 callbacks suppressed
 [ 1705.348169] Buffer I/O error on dev dm-3, logical block 20971392, async page read
 [ 1705.348175] Buffer I/O error on dev dm-3, logical block 20971393, async page read
 [ 1705.348177] Buffer I/O error on dev dm-3, logical block 20971394, async page read
 [ 1705.348178] Buffer I/O error on dev dm-3, logical block 20971395, async page read
 [ 1705.348180] Buffer I/O error on dev dm-3, logical block 20971396, async page read
 [ 1705.348182] Buffer I/O error on dev dm-3, logical block 20971397, async page read
 [ 1705.348184] Buffer I/O error on dev dm-3, logical block 20971398, async page read
 [ 1705.348185] Buffer I/O error on dev dm-3, logical block 20971399, async page read
 [ 1705.892860] XFS (loop3): Unmounting Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
 [ 1705.898719] I/O error, dev loop3, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
 [ 1706.916137] XFS (dm-3): Unmounting Filesystem cd0c2e0e-6afe-4460-ab7a-186f4cf3d3c5
 [ 1706.957851] XFS (dm-3): Mounting V5 Filesystem cd0c2e0e-6afe-4460-ab7a-186f4cf3d3c5
 [ 1706.972954] XFS (dm-3): Starting recovery (logdev: internal)
 [ 1707.037394] XFS (dm-3): Ending recovery (logdev: internal)
 [ 1707.038974] loop3: detected capacity change from 0 to 6876346
 [ 1707.044730] XFS (loop3): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
 [ 1707.061925] XFS (loop3): Starting recovery (logdev: internal)
 [ 1707.079549] XFS (loop3): Bad dir block magic!
 [ 1707.079552] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 [ 1707.079593] ------------[ cut here ]------------
 [ 1707.079594] kernel BUG at fs/xfs/xfs_message.c:102!
 [ 1707.079680] monitor event: 0040 ilc:2 [#1] SMP 
 [ 1707.079685] Modules linked in: tls loop rfkill sunrpc vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 virtio_net prng des_s390 net_failover sha3_512_s390 failover virtio_blk sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
 [ 1707.079707] CPU: 0 PID: 139643 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
 [ 1707.079710] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
 [ 1707.079712] Krnl PSW : 0704e00180000000 000003ff7ff1b2ca (assfail+0x62/0x68 [xfs])
 [ 1707.079981]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
 [ 1707.079984] Krnl GPRS: c000000100000021 000003ff7ff95a40 ffffffffffffffea 000000000000000a
 [ 1707.079986]            00000380004133f0 0000000000000000 000003ff800b24c4 0000000300014178
 [ 1707.079988]            000000009580b840 0000000300014178 00000000816ae000 0000000094190a00
 [ 1707.079989]            000003ff8a92ef68 000003ff80096c20 000003ff7ff1b2a8 0000038000413498
 [ 1707.079998] Krnl Code: 000003ff7ff1b2bc: f0a8000407fe	srp	4(11,%r0),2046,8
 [ 1707.079998]            000003ff7ff1b2c2: 47000700		bc	0,1792
 [ 1707.079998]           #000003ff7ff1b2c6: af000000		mc	0,0
 [ 1707.079998]           >000003ff7ff1b2ca: 0707		bcr	0,%r7
 [ 1707.079998]            000003ff7ff1b2cc: 0707		bcr	0,%r7
 [ 1707.079998]            000003ff7ff1b2ce: 0707		bcr	0,%r7
 [ 1707.079998]            000003ff7ff1b2d0: c00400133e0c	brcl	0,000003ff80182ee8
 [ 1707.079998]            000003ff7ff1b2d6: eb6ff0480024	stmg	%r6,%r15,72(%r15)
 [ 1707.080011] Call Trace:
 [ 1707.080012]  [<000003ff7ff1b2ca>] assfail+0x62/0x68 [xfs] 
 [ 1707.080112] ([<000003ff7ff1b2a8>] assfail+0x40/0x68 [xfs])
 [ 1707.080209]  [<000003ff7ff3523e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
 [ 1707.080307]  [<000003ff7ff365ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
 [ 1707.080405]  [<000003ff7ff4089a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
 [ 1707.080504]  [<000003ff7ff417ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
 [ 1707.080605]  [<000003ff7ff41930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
 [ 1707.080703]  [<000003ff7ff419f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
 [ 1707.080802]  [<000003ff7ff4202e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
 [ 1707.080901]  [<000003ff7ff421e4>] xlog_recover_process+0x104/0x150 [xfs] 
 [ 1707.080999]  [<000003ff7ff425e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
 [ 1707.081098]  [<000003ff7ff42dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
 [ 1707.081196]  [<000003ff7ff42e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
 [ 1707.081294]  [<000003ff7ff4447a>] xlog_recover+0xda/0x1a0 [xfs] 
 [ 1707.081392]  [<000003ff7ff2ad36>] xfs_log_mount+0x11e/0x280 [xfs] 
 [ 1707.081489]  [<000003ff7ff1c756>] xfs_mountfs+0x3e6/0x920 [xfs] 
 [ 1707.081586]  [<000003ff7ff23ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
 [ 1707.081682]  [<00000000a3d99f88>] get_tree_bdev+0x120/0x1a8 
 [ 1707.081690]  [<00000000a3d97690>] vfs_get_tree+0x38/0x110 
 [ 1707.081693]  [<00000000a3dcae28>] do_new_mount+0x188/0x2e0 
 [ 1707.081697]  [<00000000a3dcbaa4>] path_mount+0x1ac/0x818 
 [ 1707.081700]  [<00000000a3dcc214>] __s390x_sys_mount+0x104/0x148 
 [ 1707.081703]  [<00000000a4420796>] __do_syscall+0x21e/0x2b0 
 [ 1707.081707]  [<00000000a4430d60>] system_call+0x70/0x98 
 [ 1707.081711] Last Breaking-Event-Address:
 [ 1707.081712]  [<000003ff7ff1b2b2>] assfail+0x4a/0x68 [xfs]
 [ 1707.081810] Kernel panic - not syncing: Fatal exception: pan[    0.119599] Linux version 6.7.0+ (root@s390x-kvm-069.lab.eng.rdu2.redhat.com) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU ld version 2.35.2-42.el9) #1 SMP Sat Jan 20 01:55:04 EST 2024
 [    0.119601] setup: Linux is running under KVM in 64-bit mode
 [    0.122163] setup: The maximum memory size is 256MB
 [    0.122165] setup: Relocating AMODE31 section of size 0x00003000
 [    0.133428] cpu: 4 configured CPUs, 0 standby CPUs
 [    0.133481] Write protected kernel read-only data: 14604k
 [    0.133490] Zone ranges:
 [    0.133491]   DMA      [mem 0x0000000000000000-0x000000007fffffff]
 [    0.133493]   Normal   empty
 [    0.133494] Movable zone start for each node
...
(rebooting)
...

[2]
...
 [11866.409382] XFS (loop1): Mounting V5 Filesystem cd0c2e0e-6afe-4460-ab7a-186f4cf3d3c5
 [11866.430464] XFS (loop1): Starting recovery (logdev: internal)
 [11866.438192] XFS (loop1): Ending recovery (logdev: internal)
 [-- MARK -- Sat Jan 20 10:55:00 2024] 
[12116.079456] loop2: detected capacity change from 0 to 6876344
 [12116.085727] XFS (loop2): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
 [12116.097020] XFS (loop2): Starting recovery (logdev: internal)
 [12116.113494] XFS (loop2): Bad dir block magic!
 [12116.113498] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 [12116.113544] ------------[ cut here ]------------
 [12116.113546] kernel BUG at fs/xfs/xfs_message.c:102!
 [12116.113637] monitor event: 0040 ilc:2 [#1] SMP 
 [12116.113642] Modules linked in: loop rfkill sunrpc vfio_ccw zcrypt_cex4 mdev vfio_iommu_type1 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c virtio_net ghash_s390 prng net_failover des_s390 sha3_512_s390 virtio_blk failover sha3_256_s390 dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
 [12116.113673] CPU: 0 PID: 1538 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
 [12116.113676] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
 [12116.113677] Krnl PSW : 0704e00180000000 000003ff7fe9d2ca (assfail+0x62/0x68 [xfs])
 [12116.114117]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
 [12116.114120] Krnl GPRS: c000000000000021 000003ff7ff17a40 ffffffffffffffea 000000000000000a
 [12116.114123]            0000038000417438 0000000000000000 000003ff800344c4 0000000300014178
 [12116.114124]            000000008e50a480 0000000300014178 000000008144e000 000000008648be00
 [12116.114126]            000003ff8f52ef68 000003ff80018c20 000003ff7fe9d2a8 00000380004174e0
 [12116.114135] Krnl Code: 000003ff7fe9d2bc: f0a8000407fe	srp	4(11,%r0),2046,8
 [12116.114135]            000003ff7fe9d2c2: 47000700		bc	0,1792
 [12116.114135]           #000003ff7fe9d2c6: af000000		mc	0,0
 [12116.114135]           >000003ff7fe9d2ca: 0707		bcr	0,%r7
 [12116.114135]            000003ff7fe9d2cc: 0707		bcr	0,%r7
 [12116.114135]            000003ff7fe9d2ce: 0707		bcr	0,%r7
 [12116.114135]            000003ff7fe9d2d0: c00400133e0c	brcl	0,000003ff80104ee8
 [12116.114135]            000003ff7fe9d2d6: eb6ff0480024	stmg	%r6,%r15,72(%r15)
 [12116.114148] Call Trace:
 [12116.114149]  [<000003ff7fe9d2ca>] assfail+0x62/0x68 [xfs] 
 [12116.114274] ([<000003ff7fe9d2a8>] assfail+0x40/0x68 [xfs])
 [12116.114374]  [<000003ff7feb723e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
 [12116.114475]  [<000003ff7feb85ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
 [12116.114577]  [<000003ff7fec289a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
 [12116.114679]  [<000003ff7fec37ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
 [12116.114782]  [<000003ff7fec3930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
 [12116.114883]  [<000003ff7fec39f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
 [12116.114987]  [<000003ff7fec402e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
 [12116.115089]  [<000003ff7fec41e4>] xlog_recover_process+0x104/0x150 [xfs] 
 [12116.115192]  [<000003ff7fec45e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
 [12116.115295]  [<000003ff7fec4dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
 [12116.115397]  [<000003ff7fec4e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
 [12116.115500]  [<000003ff7fec647a>] xlog_recover+0xda/0x1a0 [xfs] 
 [12116.115602]  [<000003ff7feacd36>] xfs_log_mount+0x11e/0x280 [xfs] 
 [12116.115706]  [<000003ff7fe9e756>] xfs_mountfs+0x3e6/0x920 [xfs] 
 [12116.115809]  [<000003ff7fea5ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
 [12116.115911]  [<00000000ad9f1f88>] get_tree_bdev+0x120/0x1a8 
 [12116.115918]  [<00000000ad9ef690>] vfs_get_tree+0x38/0x110 
 [12116.115921]  [<00000000ada22e28>] do_new_mount+0x188/0x2e0 
 [12116.115925]  [<00000000ada23aa4>] path_mount+0x1ac/0x818 
 [12116.115928]  [<00000000ada24214>] __s390x_sys_mount+0x104/0x148 
 [12116.115931]  [<00000000ae078796>] __do_syscall+0x21e/0x2b0 
 [12116.115936]  [<00000000ae088d60>] system_call+0x70/0x98 
 [12116.115940] Last Breaking-Event-Address:
 [12116.115940]  [<000003ff7fe9d2b2>] assfail+0x4a/0x68 [xfs]
 [12[    0.117676] Linux version 6.7.0+ (root@s390x-kvm-069.lab.eng.rdu2.redhat.com) (gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), GNU ld version 2.35.2-42.el9) #1 SMP Sat Jan 20 01:55:04 EST 2024
 [    0.117678] setup: Linux is running under KVM in 64-bit mode
 [    0.120239] setup: The maximum memory size is 256MB
 [    0.120241] setup: Relocating AMODE31 section of size 0x00003000
...

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-20 11:26       ` Zorro Lang
@ 2024-01-20 23:58         ` Dave Chinner
  2024-01-22  7:23           ` Zorro Lang
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2024-01-20 23:58 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > fixed it? Nothing in the for-next branch really looks relevant to
> > the problem to me....
> 
> Hi Dave,
> 
> Finally, I got a chance to reproduce this issue on latest upstream mainline
> linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> 
> Looks like some userspace updates hide the issue, but I haven't found out what
> change does that, due to it's a big change about a whole system version. I
> reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> (I'll try to find out what changes cause that later if it's necessary)
> 
> Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> 
> Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> 
> https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing

Ok, I forgot the log on s390 is in big-endian format. I don't have a
big-endian machine here, so I can't replay the log to trace it or
find out what disk address the buffer belongs to. I can't even use
xfs_logprint to dump the log.

Can you take that metadump, restore it on the s390 machine, and
trace a mount attempt? i.e. in one shell run 'trace-cmd record -e
xfs\*', then in another shell run 'mount testfs.img /mnt/test',
and after the assert failure terminate the tracing and run
'trace-cmd report > testfs.trace.txt'?

The trace will tell me what buffer was being replayed when the
failure occurred, and from there I can look at the raw dump of the
log and the buffer on disk and go from there...

>  [ 1707.044730] XFS (loop3): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
>  [ 1707.061925] XFS (loop3): Starting recovery (logdev: internal)
>  [ 1707.079549] XFS (loop3): Bad dir block magic!

At minimum, this error message will need to be improved to tell us
what buffer failed this check....
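
Something along these lines, perhaps (illustrative sketch only; the
helper and field names - xfs_buf_daddr(), b_length, blf_flags - are
assumptions, not a tested patch):

	if (warnmsg) {
		/* say which buffer tripped the check, not just that one did */
		xfs_warn(mp,
	"%s: daddr 0x%llx, len 0x%x, blf flags 0x%x, magic 0x%x",
			warnmsg, (unsigned long long)xfs_buf_daddr(bp),
			bp->b_length, buf_f->blf_flags, magic32);
		ASSERT(0);
	}

That would at least identify which buffer tripped the check so it can
be matched up against the on-disk metadata and the log contents.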

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-20 23:58         ` Dave Chinner
@ 2024-01-22  7:23           ` Zorro Lang
  2024-01-22 11:21             ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2024-01-22  7:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > fixed it? Nothing in the for-next branch really looks relevant to
> > > the problem to me....
> > 
> > Hi Dave,
> > 
> > Finally, I got a chance to reproduce this issue on latest upstream mainline
> > linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> > 
> > Looks like some userspace updates hide the issue, but I haven't found out what
> > change does that, due to it's a big change about a whole system version. I
> > reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> > (I'll try to find out what changes cause that later if it's necessary)
> > 
> > Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> > 
> > Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> > but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> > 
> > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> 
> Ok, I forgot the log on s390 is in big endian format. I don't have a
> bigendian machine here, so I can't replay the log to trace it or
> find out what disk address the buffer belongs. I can't even use
> xfs_logprint to dump the log.
> 
> Can you take that metadump, restore it on the s390 machine, and
> trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> xfs\*' and then in another shell run 'mount testfs.img /mnt/test'

The 'mount testfs.img /mnt/test' will crash the kernel and reboot
the system directly ...

> and then after the assert fail terminate the tracing and run
> 'trace-cmd report > testfs.trace.txt'?

... Can I still get the trace report after rebooting?

Thanks,
Zorro

> 
> The trace will tell me what buffer was being replayed when the
> failure occurred, and from there I can look at the raw dump of the
> log and the buffer on disk and go from there...
> 
> >  [ 1707.044730] XFS (loop3): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
> >  [ 1707.061925] XFS (loop3): Starting recovery (logdev: internal)
> >  [ 1707.079549] XFS (loop3): Bad dir block magic!
> 
> At minimum, this error message will need to be improved to tell us
> what buffer failed this check....
> 
> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-22  7:23           ` Zorro Lang
@ 2024-01-22 11:21             ` Dave Chinner
  2024-01-22 13:18               ` Zorro Lang
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2024-01-22 11:21 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Mon, Jan 22, 2024 at 03:23:12PM +0800, Zorro Lang wrote:
> On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> > On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > > fixed it? Nothing in the for-next branch really looks relevant to
> > > > the problem to me....
> > > 
> > > Hi Dave,
> > > 
> > > Finally, I got a chance to reproduce this issue on latest upstream mainline
> > > linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> > > 
> > > Looks like some userspace updates hide the issue, but I haven't found out what
> > > change does that, due to it's a big change about a whole system version. I
> > > reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> > > (I'll try to find out what changes cause that later if it's necessary)
> > > 
> > > Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > > you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> > > 
> > > Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> > > but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> > > 
> > > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> > 
> > Ok, I forgot the log on s390 is in big endian format. I don't have a
> > bigendian machine here, so I can't replay the log to trace it or
> > find out what disk address the buffer belongs. I can't even use
> > xfs_logprint to dump the log.
> > 
> > Can you take that metadump, restore it on the s390 machine, and
> > trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> > xfs\*' and then in another shell run 'mount testfs.img /mnt/test'
> 
> The 'mount testfs.img /mnt/test' will crash the kernel and reboot
> the system directly ...

Turn off panic-on-oops. Something like 'echo 0 >
/proc/sys/kernel/panic_on_oops' will do that, I think.
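
The equivalent sysctl form, if that's easier (the echoed line is what it
should report back):

# sysctl -w kernel.panic_on_oops=0
kernel.panic_on_oops = 0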


> > and then after the assert fail terminate the tracing and run
> > 'trace-cmd report > testfs.trace.txt'?
> 
> ... Can I still get the trace report after rebooting?

Not that I know of. But, then again, I don't reboot test machines
when an oops or assert fail occurs - I like to have a warm corpse
left behind that I can poke around in with various blunt instruments
to see what went wrong....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-22 11:21             ` Dave Chinner
@ 2024-01-22 13:18               ` Zorro Lang
  2024-01-22 22:09                 ` Dave Chinner
  2024-01-22 22:49                 ` Dave Chinner
  0 siblings, 2 replies; 18+ messages in thread
From: Zorro Lang @ 2024-01-22 13:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Jan 22, 2024 at 10:21:07PM +1100, Dave Chinner wrote:
> On Mon, Jan 22, 2024 at 03:23:12PM +0800, Zorro Lang wrote:
> > On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> > > On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > > > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > > > fixed it? Nothing in the for-next branch really looks relevant to
> > > > > the problem to me....
> > > > 
> > > > Hi Dave,
> > > > 
> > > > Finally, I got a chance to reproduce this issue on latest upstream mainline
> > > > linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> > > > 
> > > > Looks like some userspace updates hide the issue, but I haven't found out what
> > > > change does that, due to it's a big change about a whole system version. I
> > > > reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> > > > (I'll try to find out what changes cause that later if it's necessary)
> > > > 
> > > > Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > > > you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> > > > 
> > > > Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> > > > but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> > > > 
> > > > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> > > 
> > > Ok, I forgot the log on s390 is in big endian format. I don't have a
> > > bigendian machine here, so I can't replay the log to trace it or
> > > find out what disk address the buffer belongs. I can't even use
> > > xfs_logprint to dump the log.
> > > 
> > > Can you take that metadump, restore it on the s390 machine, and
> > > trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> > > xfs\*' and then in another shell run 'mount testfs.img /mnt/test'
> > 
> > The 'mount testfs.img /mnt/test' will crash the kernel and reboot
> > the system directly ...
> 
> Turn off panic-on-oops. Some thing like 'echo 0 >
> /proc/sys/kernel/panic_on_oops' will do that, I think.

Thanks, it helps. I did the following steps:

# trace-cmd record -e xfs\*
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x5b7000
    90112 bytes in size
CPU1 data recorded at offset=0x5cd000
    57344 bytes in size
CPU2 data recorded at offset=0x5db000
    9945088 bytes in size
CPU3 data recorded at offset=0xf57000
    786432 bytes in size
# mount testfs.img /mnt/tmp
Segmentation fault
# (Ctrl^C the trace-cmd record process)
# dmesg
[180724.293443] loop: module loaded
[180724.294001] loop0: detected capacity change from 0 to 6876344
[180724.296987] XFS (loop0): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
[180724.309088] XFS (loop0): Starting recovery (logdev: internal)
[180724.335207] XFS (loop0): Bad dir block magic!
[180724.335210] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
[180724.335264] ------------[ cut here ]------------
[180724.335265] kernel BUG at fs/xfs/xfs_message.c:102!
[180724.335356] monitor event: 0040 ilc:2 [#1] SMP 
[180724.335362] Modules linked in: loop sunrpc rfkill vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng virt
io_net des_s390 sha3_512_s390 net_failover sha3_256_s390 failover virtio_blk dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
[180724.335379] CPU: 2 PID: 6449 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
[180724.335382] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
[180724.335384] Krnl PSW : 0704e00180000000 000003ff7fe692ca (assfail+0x62/0x68 [xfs])
[180724.335727]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[180724.335729] Krnl GPRS: c000000000000021 000003ff7fee3a40 ffffffffffffffea 000000000000000a
[180724.335731]            000003800005b3c0 0000000000000000 000003ff800004c4 0000000300014178
[180724.335732]            0000000090a87e80 0000000300014178 000000008bcf6000 00000000924a5000
[180724.335734]            000003ffbe72ef68 000003ff7ffe4c20 000003ff7fe692a8 000003800005b468
[180724.335742] Krnl Code: 000003ff7fe692bc: f0a8000407fe       srp     4(11,%r0),2046,8
                           000003ff7fe692c2: 47000700           bc      0,1792
                          #000003ff7fe692c6: af000000           mc      0,0
                          >000003ff7fe692ca: 0707               bcr     0,%r7
                           000003ff7fe692cc: 0707               bcr     0,%r7
                           000003ff7fe692ce: 0707               bcr     0,%r7
                           000003ff7fe692d0: c00400133e0c       brcl    0,000003ff800d0ee8
                           000003ff7fe692d6: eb6ff0480024       stmg    %r6,%r15,72(%r15)
[180724.335753] Call Trace:
[180724.335754]  [<000003ff7fe692ca>] assfail+0x62/0x68 [xfs] 
[180724.335835] ([<000003ff7fe692a8>] assfail+0x40/0x68 [xfs])
[180724.335915]  [<000003ff7fe8323e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
[180724.335997]  [<000003ff7fe845ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
[180724.336078]  [<000003ff7fe8e89a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
[180724.336159]  [<000003ff7fe8f7ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
[180724.336240]  [<000003ff7fe8f930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
[180724.336321]  [<000003ff7fe8f9f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
[180724.336402]  [<000003ff7fe9002e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
[180724.336482]  [<000003ff7fe901e4>] xlog_recover_process+0x104/0x150 [xfs] 
[180724.336563]  [<000003ff7fe905e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
[180724.336643]  [<000003ff7fe90dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
[180724.336727]  [<000003ff7fe90e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
[180724.336808]  [<000003ff7fe9247a>] xlog_recover+0xda/0x1a0 [xfs] 
[180724.336888]  [<000003ff7fe78d36>] xfs_log_mount+0x11e/0x280 [xfs] 
[180724.336967]  [<000003ff7fe6a756>] xfs_mountfs+0x3e6/0x920 [xfs] 
[180724.337047]  [<000003ff7fe71ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
[180724.337127]  [<00000000552adf88>] get_tree_bdev+0x120/0x1a8 
[180724.337142]  [<00000000552ab690>] vfs_get_tree+0x38/0x110 
[180724.337146]  [<00000000552dee28>] do_new_mount+0x188/0x2e0 
[180724.337150]  [<00000000552dfaa4>] path_mount+0x1ac/0x818 
[180724.337153]  [<00000000552e0214>] __s390x_sys_mount+0x104/0x148 
[180724.337156]  [<0000000055934796>] __do_syscall+0x21e/0x2b0 
[180724.337163]  [<0000000055944d60>] system_call+0x70/0x98 
[180724.337170] Last Breaking-Event-Address:
[180724.337221]  [<000003ff7fe692b2>] assfail+0x4a/0x68 [xfs]
[180724.337301] ---[ end trace 0000000000000000 ]---

# trace-cmd report > testfs.trace.txt
# bzip2 testfs.trace.txt

Please download it from:
https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing

Hope it's got what you need :)

Thanks,
Zorro

> 
> 
> > > and then after the assert fail terminate the tracing and run
> > > 'trace-cmd report > testfs.trace.txt'?
> > 
> > ... Can I still get the trace report after rebooting?
> 
> Not that I know of. But, then again, I don't reboot test machines
> when an oops or assert fail occurs - I like to have a warm corpse
> left behind that I can poke around in with various blunt instruments
> to see what went wrong....
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-22 13:18               ` Zorro Lang
@ 2024-01-22 22:09                 ` Dave Chinner
  2024-01-22 22:49                 ` Dave Chinner
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2024-01-22 22:09 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Mon, Jan 22, 2024 at 09:18:56PM +0800, Zorro Lang wrote:
> On Mon, Jan 22, 2024 at 10:21:07PM +1100, Dave Chinner wrote:
> > On Mon, Jan 22, 2024 at 03:23:12PM +0800, Zorro Lang wrote:
> > > On Sun, Jan 21, 2024 at 10:58:49AM +1100, Dave Chinner wrote:
> > > > On Sat, Jan 20, 2024 at 07:26:00PM +0800, Zorro Lang wrote:
> > > > > On Fri, Jan 19, 2024 at 06:17:24PM +1100, Dave Chinner wrote:
> > > > > > Perhaps a bisect from 6.7 to 6.7+linux-xfs/for-next to identify what
> > > > > > fixed it? Nothing in the for-next branch really looks relevant to
> > > > > > the problem to me....
> > > > > 
> > > > > Hi Dave,
> > > > > 
> > > > > Finally, I got a chance to reproduce this issue on latest upstream mainline
> > > > > linux (HEAD=9d64bf433c53) (and linux-xfs) again.
> > > > > 
> > > > > Looks like some userspace updates hide the issue, but I haven't found out what
> > > > > change does that, due to it's a big change about a whole system version. I
> > > > > reproduced this issue again by using an old RHEL distro (but the kernel is the newest).
> > > > > (I'll try to find out what changes cause that later if it's necessary)
> > > > > 
> > > > > Anyway, I enabled the "CONFIG_XFS_ASSERT_FATAL=y" and "CONFIG_XFS_DEBUG=y" as
> > > > > you suggested. And got the xfs metadump file after it crashed [1] and rebooted.
> > > > > 
> > > > > Due to g/648 tests on a loopimg in SCRATCH_MNT, so I didn't dump the SCRATCH_DEV,
> > > > > but dumped the $SCRATCH_MNT/testfs file, you can get the metadump file from:
> > > > > 
> > > > > https://drive.google.com/file/d/14q7iRl7vFyrEKvv_Wqqwlue6vHGdIFO1/view?usp=sharing
> > > > 
> > > > Ok, I forgot the log on s390 is in big endian format. I don't have a
> > > > bigendian machine here, so I can't replay the log to trace it or
> > > > find out what disk address the buffer belongs. I can't even use
> > > > xfs_logprint to dump the log.
> > > > 
> > > > Can you take that metadump, restore it on the s390 machine, and
> > > > trace a mount attempt? i.e in one shell run 'trace-cmd record -e
> > > > xfs\*' and then in another shell run 'mount testfs.img /mnt/test'
> > > 
> > > The 'mount testfs.img /mnt/test' will crash the kernel and reboot
> > > the system directly ...
> > 
> > Turn off panic-on-oops. Some thing like 'echo 0 >
> > /proc/sys/kernel/panic_on_oops' will do that, I think.
> 
> Thanks, it helps. I did below steps:

Thanks!

> 
> # trace-cmd record -e xfs\*

One modification to this:

# trace-cmd record -e xfs\* -e printk

So it captures the console output, too.


> Hit Ctrl^C to stop recording
> ^CCPU0 data recorded at offset=0x5b7000
>     90112 bytes in size
> CPU1 data recorded at offset=0x5cd000
>     57344 bytes in size
> CPU2 data recorded at offset=0x5db000
>     9945088 bytes in size
> CPU3 data recorded at offset=0xf57000
>     786432 bytes in size
> # mount testfs.img /mnt/tmp
> Segmentation fault
> # (Ctrl^C the trace-cmd record process)
....
> 
> # trace-cmd report > testfs.trace.txt
> # bzip2 testfs.trace.txt
> 
> Please download it from:
> https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing

Excellent, but I also need the metadump to go with the trace. My
fault, I should have made that clear.
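
For instance, something like this would do (just a sketch; the metadump
needs to be taken from the unrecovered image, i.e. before the traced
mount replays the log, and -g/-o only add progress output and
unobfuscated names):

# xfs_metadump -g -o testfs.img testfs.metadump
# bzip2 testfs.metadump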

My initial scan of the trace indicates that there is something
whacky about the buffer that failed:

mount-6449  [002] 180724.335208: xfs_log_recover_buf_reg_buf: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11

It's got a size of 0x58 BBs, or 44kB. That's not a complete
directory buffer, the directory buffer should be 0x80 BBs (64kB) in
size.
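
In 512 byte basic blocks that works out as:

# echo $((0x58 * 512)) $((0x80 * 512))
45056 65536

i.e. 44kB was logged for a buffer that should be a full 64kB directory
block.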

I see that buf log format item in the journal over and over again at
that same size and that is how the buffer is initialised and read
from disk during recovery. So it looks like the buf log item
written to the journal for this directory block is bad in a way I've
never seen before.

At this point I suspect that something has gone wrong at runtime,
maybe to do with logging a compound buffer, but my initial thought
is that this isn't a recovery bug at all. However, I'll need a
matching trace and metadump to confirm that.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-22 13:18               ` Zorro Lang
  2024-01-22 22:09                 ` Dave Chinner
@ 2024-01-22 22:49                 ` Dave Chinner
  2024-01-23  7:02                   ` Zorro Lang
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2024-01-22 22:49 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Mon, Jan 22, 2024 at 09:18:56PM +0800, Zorro Lang wrote:
> # dmesg
> [180724.293443] loop: module loaded
> [180724.294001] loop0: detected capacity change from 0 to 6876344
> [180724.296987] XFS (loop0): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
> [180724.309088] XFS (loop0): Starting recovery (logdev: internal)
> [180724.335207] XFS (loop0): Bad dir block magic!
> [180724.335210] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> [180724.335264] ------------[ cut here ]------------
> [180724.335265] kernel BUG at fs/xfs/xfs_message.c:102!
> [180724.335356] monitor event: 0040 ilc:2 [#1] SMP 
> [180724.335362] Modules linked in: loop sunrpc rfkill vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng virt
> io_net des_s390 sha3_512_s390 net_failover sha3_256_s390 failover virtio_blk dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> [180724.335379] CPU: 2 PID: 6449 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
> [180724.335382] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> [180724.335384] Krnl PSW : 0704e00180000000 000003ff7fe692ca (assfail+0x62/0x68 [xfs])
> [180724.335727]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> [180724.335729] Krnl GPRS: c000000000000021 000003ff7fee3a40 ffffffffffffffea 000000000000000a
> [180724.335731]            000003800005b3c0 0000000000000000 000003ff800004c4 0000000300014178
> [180724.335732]            0000000090a87e80 0000000300014178 000000008bcf6000 00000000924a5000
> [180724.335734]            000003ffbe72ef68 000003ff7ffe4c20 000003ff7fe692a8 000003800005b468
> [180724.335742] Krnl Code: 000003ff7fe692bc: f0a8000407fe       srp     4(11,%r0),2046,8
>                            000003ff7fe692c2: 47000700           bc      0,1792
>                           #000003ff7fe692c6: af000000           mc      0,0
>                           >000003ff7fe692ca: 0707               bcr     0,%r7
>                            000003ff7fe692cc: 0707               bcr     0,%r7
>                            000003ff7fe692ce: 0707               bcr     0,%r7
>                            000003ff7fe692d0: c00400133e0c       brcl    0,000003ff800d0ee8
>                            000003ff7fe692d6: eb6ff0480024       stmg    %r6,%r15,72(%r15)
> [180724.335753] Call Trace:
> [180724.335754]  [<000003ff7fe692ca>] assfail+0x62/0x68 [xfs] 
> [180724.335835] ([<000003ff7fe692a8>] assfail+0x40/0x68 [xfs])
> [180724.335915]  [<000003ff7fe8323e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> [180724.335997]  [<000003ff7fe845ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> [180724.336078]  [<000003ff7fe8e89a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> [180724.336159]  [<000003ff7fe8f7ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> [180724.336240]  [<000003ff7fe8f930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> [180724.336321]  [<000003ff7fe8f9f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> [180724.336402]  [<000003ff7fe9002e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> [180724.336482]  [<000003ff7fe901e4>] xlog_recover_process+0x104/0x150 [xfs] 
> [180724.336563]  [<000003ff7fe905e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
> [180724.336643]  [<000003ff7fe90dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> [180724.336727]  [<000003ff7fe90e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
> [180724.336808]  [<000003ff7fe9247a>] xlog_recover+0xda/0x1a0 [xfs] 
> [180724.336888]  [<000003ff7fe78d36>] xfs_log_mount+0x11e/0x280 [xfs] 
> [180724.336967]  [<000003ff7fe6a756>] xfs_mountfs+0x3e6/0x920 [xfs] 
> [180724.337047]  [<000003ff7fe71ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> [180724.337127]  [<00000000552adf88>] get_tree_bdev+0x120/0x1a8 
> [180724.337142]  [<00000000552ab690>] vfs_get_tree+0x38/0x110 
> [180724.337146]  [<00000000552dee28>] do_new_mount+0x188/0x2e0 
> [180724.337150]  [<00000000552dfaa4>] path_mount+0x1ac/0x818 
> [180724.337153]  [<00000000552e0214>] __s390x_sys_mount+0x104/0x148 
> [180724.337156]  [<0000000055934796>] __do_syscall+0x21e/0x2b0 
> [180724.337163]  [<0000000055944d60>] system_call+0x70/0x98 
> [180724.337170] Last Breaking-Event-Address:
> [180724.337221]  [<000003ff7fe692b2>] assfail+0x4a/0x68 [xfs]
> [180724.337301] ---[ end trace 0000000000000000 ]---
> 
> # trace-cmd report > testfs.trace.txt
> # bzip2 testfs.trace.txt
> 
> Please download it from:
> https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing

Actually, looking at the compound buffer logging code and reminding
myself of how it works, I think this might actually be a false
positive.

What I see in the trace is this pattern of buffer log items being
processed through every phase of recovery:

 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x2c2ce0, bbcount 0x10, flags 0x5000, size 2, map_size 2
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8ea70fc0, item type XFS_LI_BUF item region count/total 2/2
 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8f36c040, item type XFS_LI_BUF item region count/total 2/2

The item addresses, tid and LSN change, but the order of the two
buf log items does not.

These are both "flags 0x5000" which means both log items are
XFS_BLFT_DIR_BLOCK_BUF types, and they are both partial directory
block buffers, and they are discontiguous. They also have different
types of log items both before and after them, so it is likely these
are two extents within the same compound buffer.

The way we log compound buffers is that we create a buf log format
item for each extent in the buffer, and then we log each range as a
separate buf log format item. IOWs, to recovery, these fragments of
the directory block appear just like complete regular buffers that
need to be recovered.

Hence what we see above is the first buffer (daddr 0x2c2ce0, bbcount
0x10) is the first extent in the directory block that contains the
header and magic number, so it recovers and verifies just fine. The
second buffer is the tail of the directory block, and it does not
contain a magic number because it starts mid-way through the
directory block. Hence the magic number verification fails and the
buffer is not recovered.

Compound buffers were logged this way so that they didn't require a
change of log format to recover. Prior to compound buffers, the
directory code kept its own dabuf structure to map multiple extents
in a single directory block, and they got logged as separate buffer
log format items as well.

So the problem isn't directly related to the conversion of dabufs to
compound buffers - the problem is related to the buffer recovery
verification code not knowing that directory buffer fragments are
valid recovery targets.

IOWs, this appears to be a log recovery problem and not a runtime
issue. I think the fix will be to allow directory blocks to fail the
magic number check if and only if the buffer length does not match
the directory block size (i.e. it is a partial directory block
fragment being recovered). 

This makes the verification a bit more tricky and perhaps a little
less robust, but we do know that we are supposed to be recovering a
directory block from the buf log format flags and so this additional
check can be constrained.

I suspect that we also need to verify that all other buffers that
are supposed to log and recover entire objects have the correct
size, too.

I'll start working on a fix now - I'm not sure what form that will
take yet as I need to stare at the code for a while longer yet.

Zorro, in the mean time, can you write up an xfstest that creates a
small XFS filesystem with "-n size=64k" and a large log, sets it up
with single block fragmentation (falloc, punch alternate), then
creates a bunch of files (a few thousand) to create a set of
fragmented directory blocks, then runs 'shutdown -f' to force the
log and prevent metadata writeback, then unmounts and mounts the
filesystem. The mount of the filesystem should then trigger this
directory fragment recovery issue on any platform, not just s390.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-22 22:49                 ` Dave Chinner
@ 2024-01-23  7:02                   ` Zorro Lang
  2024-01-23 20:52                     ` Dave Chinner
  0 siblings, 1 reply; 18+ messages in thread
From: Zorro Lang @ 2024-01-23  7:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Jan 23, 2024 at 09:49:37AM +1100, Dave Chinner wrote:
> On Mon, Jan 22, 2024 at 09:18:56PM +0800, Zorro Lang wrote:
> > # dmesg
> > [180724.293443] loop: module loaded
> > [180724.294001] loop0: detected capacity change from 0 to 6876344
> > [180724.296987] XFS (loop0): Mounting V5 Filesystem 59e2f6ae-ceab-4232-9531-a85417847238
> > [180724.309088] XFS (loop0): Starting recovery (logdev: internal)
> > [180724.335207] XFS (loop0): Bad dir block magic!
> > [180724.335210] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
> > [180724.335264] ------------[ cut here ]------------
> > [180724.335265] kernel BUG at fs/xfs/xfs_message.c:102!
> > [180724.335356] monitor event: 0040 ilc:2 [#1] SMP 
> > [180724.335362] Modules linked in: loop sunrpc rfkill vfio_ccw mdev vfio_iommu_type1 zcrypt_cex4 vfio iommufd drm fuse i2c_core drm_panel_orientation_quirks xfs libcrc32c ghash_s390 prng virt
> > io_net des_s390 sha3_512_s390 net_failover sha3_256_s390 failover virtio_blk dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt aes_s390
> > [180724.335379] CPU: 2 PID: 6449 Comm: mount Kdump: loaded Not tainted 6.7.0+ #1
> > [180724.335382] Hardware name: IBM 3931 LA1 400 (KVM/Linux)
> > [180724.335384] Krnl PSW : 0704e00180000000 000003ff7fe692ca (assfail+0x62/0x68 [xfs])
> > [180724.335727]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> > [180724.335729] Krnl GPRS: c000000000000021 000003ff7fee3a40 ffffffffffffffea 000000000000000a
> > [180724.335731]            000003800005b3c0 0000000000000000 000003ff800004c4 0000000300014178
> > [180724.335732]            0000000090a87e80 0000000300014178 000000008bcf6000 00000000924a5000
> > [180724.335734]            000003ffbe72ef68 000003ff7ffe4c20 000003ff7fe692a8 000003800005b468
> > [180724.335742] Krnl Code: 000003ff7fe692bc: f0a8000407fe       srp     4(11,%r0),2046,8
> >                            000003ff7fe692c2: 47000700           bc      0,1792
> >                           #000003ff7fe692c6: af000000           mc      0,0
> >                           >000003ff7fe692ca: 0707               bcr     0,%r7
> >                            000003ff7fe692cc: 0707               bcr     0,%r7
> >                            000003ff7fe692ce: 0707               bcr     0,%r7
> >                            000003ff7fe692d0: c00400133e0c       brcl    0,000003ff800d0ee8
> >                            000003ff7fe692d6: eb6ff0480024       stmg    %r6,%r15,72(%r15)
> > [180724.335753] Call Trace:
> > [180724.335754]  [<000003ff7fe692ca>] assfail+0x62/0x68 [xfs] 
> > [180724.335835] ([<000003ff7fe692a8>] assfail+0x40/0x68 [xfs])
> > [180724.335915]  [<000003ff7fe8323e>] xlog_recover_validate_buf_type+0x2a6/0x5c8 [xfs] 
> > [180724.335997]  [<000003ff7fe845ba>] xlog_recover_buf_commit_pass2+0x382/0x448 [xfs] 
> > [180724.336078]  [<000003ff7fe8e89a>] xlog_recover_items_pass2+0x72/0xf0 [xfs] 
> > [180724.336159]  [<000003ff7fe8f7ce>] xlog_recover_commit_trans+0x39e/0x3c0 [xfs] 
> > [180724.336240]  [<000003ff7fe8f930>] xlog_recovery_process_trans+0x140/0x148 [xfs] 
> > [180724.336321]  [<000003ff7fe8f9f8>] xlog_recover_process_ophdr+0xc0/0x180 [xfs] 
> > [180724.336402]  [<000003ff7fe9002e>] xlog_recover_process_data+0xb6/0x168 [xfs] 
> > [180724.336482]  [<000003ff7fe901e4>] xlog_recover_process+0x104/0x150 [xfs] 
> > [180724.336563]  [<000003ff7fe905e2>] xlog_do_recovery_pass+0x3b2/0x748 [xfs] 
> > [180724.336643]  [<000003ff7fe90dd0>] xlog_do_log_recovery+0x88/0xd8 [xfs] 
> > [180724.336727]  [<000003ff7fe90e6c>] xlog_do_recover+0x4c/0x218 [xfs] 
> > [180724.336808]  [<000003ff7fe9247a>] xlog_recover+0xda/0x1a0 [xfs] 
> > [180724.336888]  [<000003ff7fe78d36>] xfs_log_mount+0x11e/0x280 [xfs] 
> > [180724.336967]  [<000003ff7fe6a756>] xfs_mountfs+0x3e6/0x920 [xfs] 
> > [180724.337047]  [<000003ff7fe71ffc>] xfs_fs_fill_super+0x40c/0x7d8 [xfs] 
> > [180724.337127]  [<00000000552adf88>] get_tree_bdev+0x120/0x1a8 
> > [180724.337142]  [<00000000552ab690>] vfs_get_tree+0x38/0x110 
> > [180724.337146]  [<00000000552dee28>] do_new_mount+0x188/0x2e0 
> > [180724.337150]  [<00000000552dfaa4>] path_mount+0x1ac/0x818 
> > [180724.337153]  [<00000000552e0214>] __s390x_sys_mount+0x104/0x148 
> > [180724.337156]  [<0000000055934796>] __do_syscall+0x21e/0x2b0 
> > [180724.337163]  [<0000000055944d60>] system_call+0x70/0x98 
> > [180724.337170] Last Breaking-Event-Address:
> > [180724.337221]  [<000003ff7fe692b2>] assfail+0x4a/0x68 [xfs]
> > [180724.337301] ---[ end trace 0000000000000000 ]---
> > 
> > # trace-cmd report > testfs.trace.txt
> > # bzip2 testfs.trace.txt
> > 
> > Please download it from:
> > https://drive.google.com/file/d/1FgpPidbMZHSjZinyc_WbVGfvwp2btA86/view?usp=sharing
> 
> Actually, looking at the compound buffer logging code and reminding
> myself of how it works, I think this might actually be a false
> positive.
> 
> What I see in the trace is this pattern of buffer log items being
> processed through every phase of recovery:
> 
>  xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x2c2ce0, bbcount 0x10, flags 0x5000, size 2, map_size 2
>  xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8ea70fc0, item type XFS_LI_BUF item region count/total 2/2
>  xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11
>  xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8f36c040, item type XFS_LI_BUF item region count/total 2/2
> 
> The item addresses, tid and LSN change, but the order of the two
> buf log items does not.
> 
> These are both "flags 0x5000" which means both log items are
> XFS_BLFT_DIR_BLOCK_BUF types, and they are both partial directory
> block buffers, and they are discontiguous. They also have different
> types of log items both before and after them, so it is likely these
> are two extents within the same compound buffer.
> 
> The way we log compound buffers is that we create a buf log format
> item for each extent in the buffer, and then we log each range as a
> separate buf log format item. IOWs, to recovery these fragments of
> the directory block appear just like complete regular buffers that
> need to be recovered.
> 
> Hence what we see above is the first buffer (daddr 0x2c2ce0, bbcount
> 0x10) is the first extent in the directory block that contains the
> header and magic number, so it recovers and verifies just fine. The
> second buffer is the tail of the directory block, and it does not
> contain a magic number because it starts mid-way through the
> directory block. Hence the magic number verification fails and the
> buffer is not recovered.
> 
> Compound buffers were logged this way so that they didn't require a
> change of log format to recover. Prior to compound buffers, the
> directory code kept its own dabuf structure to map multiple extents
> in a single directory block, and they got logged as separate buffer
> log format items as well.
> 
> So the problem isn't directly related to the conversion of dabufs to
> compound buffers - the problem is related to the buffer recovery
> verification code not knowing that directory buffer fragments are
> valid recovery targets.
> 
> IOWs, this appears to be a log recovery problem and not a runtime
> issue. I think the fix will be to allow directory blocks to fail the
> magic number check if and only if the buffer length does not match
> the directory block size (i.e. it is a partial directory block
> fragment being recovered). 
> 
> This makes the verification a bit more tricky and perhaps a little
> less robust, but we do know that we are supposed to be recovering a
> directory block from the buf log format flags and so this additional
> check can be constrained.
> 
> I suspect that we also need to verify that all other buffers that
> are supposed to log and recover entire objects have the correct
> size, too.
> 
> I'll start working on a fix now - I'm not sure what form that will
> take yet as I need to stare at the code for a while longer yet.
> 
> Zorro, in the mean time, can you write up an xfstest that creates a
> small XFS filesystem with "-n size=64k" and a large log, sets it up
> with single block fragmentation (falloc, punch alternate), then
> creates a bunch of files (a few thousand) to create a set of
> fragmented directory blocks, then runs 'shutdown -f' to force the
> log and prevent metadata writeback, then unmounts and mounts the
> filesystem. The mount of the filesystem should then trigger this
> directory fragment recovery issue on any platform, not just s390.

Sure Dave, do you mean something like this:

# mkfs.xfs -f -d size=1g -n size=64k -l size=200M /dev/loop0
# mount /dev/loop0 /mnt/test
# for ((i=0; i<10000; i++));do echo > /mnt/test/dir/loooooooooooooooooooooooogfile$i;done && xfs_io -xc 'shutdown -f' /mnt/test
# umount /mnt/test
# mount /mnt/test

I think it might not be that simple, as it only creates a directory with
several dir blocks and then shuts down the fs. Are there any detailed &
necessary steps I missed?

Thanks,
Zorro

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply
  2024-01-23  7:02                   ` Zorro Lang
@ 2024-01-23 20:52                     ` Dave Chinner
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2024-01-23 20:52 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs

On Tue, Jan 23, 2024 at 03:02:03PM +0800, Zorro Lang wrote:
> On Tue, Jan 23, 2024 at 09:49:37AM +1100, Dave Chinner wrote:
> > On Mon, Jan 22, 2024 at 09:18:56PM +0800, Zorro Lang wrote:
> > Zorro, in the mean time, can you write up an xfstest that creates a
> > small XFS filesystem with "-n size=64k" and a large log, sets it up
> > with single block fragmentation (falloc, punch alternate), then
> > creates a bunch of files (a few thousand) to create a set of
> > fragmented directory blocks, then runs 'shutdown -f' to force the
> > log and prevent metadata writeback, then unmounts and mounts the
> > filesystem. The mount of the filesystem should then trigger this
> > directory fragment recovery issue on any platform, not just s390.
> 
> Sure Dave, do you mean something likes this:

Almost. Free space needs to be fragmented before creating the
directory structure so that the directory blocks are fragmented.

> # mkfs.xfs -f -d size=1g -n size=64k -l size=200M /dev/loop0
> # mount /dev/loop0 /mnt/test

# fallocate -l 1g /mnt/test/file
# punch_alternate /mnt/test/file

> # for ((i=0; i<10000; i++));do echo > /mnt/test/dir/loooooooooooooooooooooooogfile$i;done && xfs_io -xc 'shutdown -f' /mnt/test
> # umount /mnt/test
> # mount /mnt/test

Otherwise this should trigger recovery of the directory blocks and
hopefully trip over it.
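
Putting those pieces together, a rough and untested sketch of the whole
sequence (assuming a 4k filesystem block size and a loop device; the
sizes are ballpark and the xfs_io fpunch loop is only the simplest way
to punch out alternating blocks, an fstests helper like punch-alternating
does the same job much faster):

# mkfs.xfs -f -d size=1g -n size=65536 -l size=200M /dev/loop0
# mount /dev/loop0 /mnt/test
# xfs_io -f -c "falloc 0 750m" /mnt/test/file
# for ((off = 0; off < 750 * 1024 * 1024; off += 8192)); do xfs_io -c "fpunch $off 4096" /mnt/test/file; done
# mkdir /mnt/test/dir
# for ((i = 0; i < 10000; i++)); do echo > /mnt/test/dir/really_loooooong_filename_to_fill_dir_blocks_$i; done
# xfs_io -xc "shutdown -f" /mnt/test
# umount /mnt/test
# mount /dev/loop0 /mnt/test
# (the mount should now replay the fragmented directory buffers and trip
#  the "Bad dir block magic!" assert)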

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-01-23 20:52 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-18 14:01 [xfstests generic/648] 64k directory block size (-n size=65536) crash on _xfs_buf_ioapply Zorro Lang
2023-12-18 17:48 ` Darrick J. Wong
2023-12-19  6:34   ` Zorro Lang
2023-12-25 13:38     ` Zorro Lang
2024-01-04  4:35       ` Darrick J. Wong
2024-01-18  1:59         ` Dave Chinner
2024-01-18  4:20 ` Dave Chinner
2024-01-19  1:38   ` Zorro Lang
2024-01-19  7:17     ` Dave Chinner
2024-01-20 11:26       ` Zorro Lang
2024-01-20 23:58         ` Dave Chinner
2024-01-22  7:23           ` Zorro Lang
2024-01-22 11:21             ` Dave Chinner
2024-01-22 13:18               ` Zorro Lang
2024-01-22 22:09                 ` Dave Chinner
2024-01-22 22:49                 ` Dave Chinner
2024-01-23  7:02                   ` Zorro Lang
2024-01-23 20:52                     ` Dave Chinner
