From: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Allison Henderson <allison.henderson@oracle.com>,
	Zhang Tianci <zhangtianci.1997@bytedance.com>,
	Brian Foster <bfoster@redhat.com>, Ben Myers <bpm@sgi.com>,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	xieyongji@bytedance.com, me@jcix.top
Subject: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap
Date: Tue, 28 Nov 2023 17:39:50 +0800
Message-ID: <39b76473-fe00-0f1b-62e3-ae349a9f80d3@bytedance.com>
In-Reply-To: <ZWWnQYo73yHnctvi@infradead.org>

On 2023/11/28 16:39, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
>> From: Zhang Tianci <zhangtianci.1997@bytedance.com>
>>
>> xfs_da3_swap_lastblock() copies the last block's content to the dead block,
>> but does not update the metadata in it. Some block types need their
>> metadata updated: for example, a dir3 leafn block records its own blkno,
>> which must be changed to the dead block's blkno. Otherwise, before the
>> xfs_buf is written to disk, verify_write() fails because
>> blk_hdr->blkno != xfs_buf->b_bn, and XFS shuts down.
> 
> Do you have a reproducer for this?  It would be very helpful to add it
> to xfstests.

Hi Christoph,

Thanks for the review!

The issue is hard to reproduce. Currently we can only reproduce it with
a modified kernel: we force xfs_remove to reserve 0 blocks (t_blk_res)
on kernel version 4.19:

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f2d06e1e4906..c8f84b95a0ec 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2551,13 +2551,8 @@ xfs_remove(
          * insert tries to happen, instead trimming the LAST
          * block from the directory.
          */
-       resblks = XFS_REMOVE_SPACE_RES(mp);
-       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0, &tp);
-       if (error == -ENOSPC) {
-               resblks = 0;
-               error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
-                               &tp);
-       }
+       resblks = 0;
+       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
         if (error) {
                 ASSERT(error != -ENOSPC);
                 goto std_return;


After loading the modified xfs.ko with insmod, run the following script; it
reproduces the problem consistently at the final `umount mnt`:

fallocate -l 1G xfs.img
mkfs.xfs -f xfs.img
mkdir -p mnt
losetup /dev/loop0 xfs.img
mount -t xfs /dev/loop0 mnt
pushd mnt
mkdir dir3
prefix="a_"
for j in $(seq 0 13); do
     for i in $(seq 0 2800); do
             touch dir3/${prefix}_${i}_${j}
     done
     for i in $(seq 0 2500); do
             rm -f dir3/${prefix}_${i}_${j}
             if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
                     echo "should reproduce now, so break here!"
                     break;
             fi
     done
done
popd
umount mnt


We are still trying to make a reproducer without any kernel changes. Do
you have any suggestions on this?


> 
>>
>> We will get this warning:
>>
>>    XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
>>    XFS (dm-0): Unmount and run xfs_repair
>>    XFS (dm-0): First 128 bytes of corrupted metadata buffer:
>>    00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00  ........=.......
>>    000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00  ................
>>    000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf  .D...dDA..`.PC..
>>    00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00  .........s......
>>    00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00  .).8.....).@....
>>    000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00  .).H.....I......
>>    00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe  .I....E%.I....H.
>>    0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97  .I....Lk.I. ..M.
>>    XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c.  Return address = 00000000c0ff63c1
>>    XFS (dm-0): Corruption of in-memory data detected.  Shutting down filesystem
>>    XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>>
>> From the log above, we know xfs_buf->b_bn is 0x178, but the block's header
>> records its blkno as 0x1a0.
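
For reference, the blkno can be read straight out of the dump. Below is a
small userspace sketch that decodes the first 32 bytes, assuming the
struct xfs_da3_blkinfo layout (forw, back, magic, pad, crc, then blkno at
byte offset 16). be_read(), dump_magic() and dump_blkno() are names made
up for this illustration, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 32 bytes of the corrupted buffer, copied from the dump above. */
static const uint8_t dump[32] = {
	0x00, 0x80, 0x00, 0x0b, 0x00, 0x80, 0x00, 0x07,
	0x3d, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0xa0,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Read an n-byte big-endian integer starting at p (n <= 8). */
static uint64_t be_read(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	assert(n <= 8);
	for (size_t i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* magic is the 16-bit field at offset 8 (after forw and back). */
static uint16_t dump_magic(void)
{
	return (uint16_t)be_read(dump + 8, 2);
}

/* blkno is the 64-bit field at offset 16 (after pad and crc). */
static uint64_t dump_blkno(void)
{
	return be_read(dump + 16, 8);
}
```

dump_magic() gives 0x3dff (XFS_DIR3_LEAFN_MAGIC) and dump_blkno() gives
0x1a0, while the log line reports the buffer at block 0x178.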
>>
>> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
>> Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
>> ---
>>   fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> index e576560b46e9..35f70e4c6447 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.c
>> +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
>>   	 * Copy the last block into the dead buffer and log it.
>>   	 */
>>   	memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
>> -	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>>   	dead_info = dead_buf->b_addr;
>> +	/*
>> +	 * Update the moved block's blkno if it's a dir3 leaf block
>> +	 */
>> +	if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
>> +	    dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
>> +	    dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>> +		struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
>> +
>> +		dap->blkno = cpu_to_be64(dead_buf->b_bn);
>> +	}
>> +	xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> 
> The fix here looks correct to me, but also a little ugly and ad-hoc.
> 
> At least we should be using container_of and not casts for getting from a
> xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedent
> for the cast in existing code).
> 

Thanks, we will optimize the code in the next version of the patchset.
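
To make sure we understand the suggestion, here is a userspace sketch of
the container_of approach. The structs are simplified stand-ins for
xfs_da_blkinfo and xfs_da3_blkinfo (host-endian fields, trailing members
omitted), the container_of macro is a reduced version without the kernel's
type checking, and to_da3() and set_blkno() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct xfs_da_blkinfo. */
struct da_blkinfo {
	uint32_t forw;
	uint32_t back;
	uint16_t magic;
	uint16_t pad;
};

/* Simplified stand-in for struct xfs_da3_blkinfo (lsn, uuid, owner omitted). */
struct da3_blkinfo {
	struct da_blkinfo hdr;	/* embedded common header */
	uint32_t crc;
	uint64_t blkno;
};

/* Reduced container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The cast only works because hdr is the first member; make that explicit. */
static_assert(offsetof(struct da3_blkinfo, hdr) == 0, "hdr must come first");

/* Get the enclosing v3 header from a pointer to the embedded header. */
static struct da3_blkinfo *to_da3(struct da_blkinfo *info)
{
	assert(info != NULL);
	return container_of(info, struct da3_blkinfo, hdr);
}

/* Store a new block number through the common-header pointer. */
static void set_blkno(struct da_blkinfo *info, uint64_t blkno)
{
	to_da3(info)->blkno = blkno;
}
```

Unlike the cast, this names the member that links the two structs, so it
would keep working if hdr were ever moved within the v3 header.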

> But I think it would be useful to add a helper that stamps the blkno
> in for a caller that only has an xfs_da_blkinfo but no xfs_da3_blkinfo,
> and use it in all the places that currently do it in an open-coded fashion,
> e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
> 
> That should probably be done on top of the small backportable fix.
> 

I think adding a helper is a great idea, and we can do it after this
fix is merged.
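
A rough userspace sketch of what such a helper might look like, only to
check our understanding. da3_stamp_blkno() and swap_copy() are
hypothetical names, the structs are simplified host-endian stand-ins for
the on-disk headers, and the magic check mirrors the three types handled
in this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIR3_LEAF1_MAGIC	0x3df1
#define DIR3_LEAFN_MAGIC	0x3dff
#define ATTR3_LEAF_MAGIC	0x3bee
#define DIR2_LEAFN_MAGIC	0xd2ff	/* non-CRC format, header has no blkno */

struct da_blkinfo {
	uint32_t forw, back;
	uint16_t magic, pad;
};

struct da3_blkinfo {
	struct da_blkinfo hdr;
	uint32_t crc;
	uint64_t blkno;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical helper: stamp blkno into the header if the block is one
 * of the v3 types checked in the patch. Returns true if it was stamped.
 */
static bool da3_stamp_blkno(struct da_blkinfo *info, uint64_t blkno)
{
	switch (info->magic) {
	case DIR3_LEAF1_MAGIC:
	case DIR3_LEAFN_MAGIC:
	case ATTR3_LEAF_MAGIC:
		container_of(info, struct da3_blkinfo, hdr)->blkno = blkno;
		return true;
	default:
		return false;
	}
}

/* Model of the copy step in xfs_da3_swap_lastblock(): copy, then restamp. */
static void swap_copy(void *dead, uint64_t dead_blkno,
		      const void *last, size_t blksize)
{
	assert(blksize >= sizeof(struct da3_blkinfo));
	memcpy(dead, last, blksize);
	da3_stamp_blkno(dead, dead_blkno);
}
```

With the helper in place, the copied block carries the dead block's
number instead of the last block's, which is the condition the write
verifier checks.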


Thanks,
Jiachen


