linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: yu kuai <yukuai3@huawei.com>
Cc: bfoster@redhat.com, dchinner@redhat.com, sandeen@sandeen.net,
	cmaiolino@redhat.com, hch@lst.de, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, zhengbin13@huawei.com,
	yi.zhang@huawei.com, houtao1@huawei.com
Subject: Re: [PATCH 2/2] xfs: fix stale data exposure problem when punch hole, collapse range or zero range across a delalloc extent
Date: Mon, 6 Jan 2020 13:57:55 -0800	[thread overview]
Message-ID: <20200106215755.GB472651@magnolia> (raw)
In-Reply-To: <20191226134721.43797-3-yukuai3@huawei.com>

On Thu, Dec 26, 2019 at 09:47:21PM +0800, yu kuai wrote:
> In xfs_file_fallocate, when punch hole, zero range or collapse range is
> performed, xfs_flush_unmap_range() needs to be called first. However,
> xfs_map_blocks will convert the whole extent to real, even if some of
> its blocks are unrelated to the operation. Furthermore, the unrelated
> blocks will hold stale data since xfs_flush_unmap_range didn't flush
> the corresponding dirty pages to disk.
> 
> In this case, if the user shuts down the file system through xfsioctl
> with cmd 'XFS_IOC_GOINGDOWN' and arg 'XFS_FSOP_GOING_FLAGS_LOGFLUSH',
> all the completed transactions will be flushed to disk, while dirty
> pages will never be flushed to disk. After remount, the file will hold
> stale data.

Waitaminute, what problem are you trying to solve?

You have a file with a huge delalloc extent because we just wrote a
bunch of 'X' characters to part of a file:

---dddddddddddddddd

Then you want to fallocate or something in the middle of that:

---dddddddddddddddd
           ^^^^------ collapse range these blocks

So we xfs_flush_unmap_range to kill the pagecache on that range:

---dddddddddddddddd
           ^^^^------ xfs_flush_unmap_range()

This triggers writeback, which can convert the entire delalloc range to
a single extent:

---rrrrrrrrrrrrrrrr
           ^^^^^^^^-- This is the range we are writing back
   ^^^^^^^^---------- This range doesn't undergo writeback, but we wrote
                      the extent tree anyway

After committing that update to the log, the fs goes down, which leaves
us with the following after we reboot, mount, and recover the fs:

---rrrrrrrrrrrrrrrr
           ^^^^^^^^-- This part contains 'X'
   ^^^^^^^^---------- This range never underwent writeback, so it's full
                      of junk from the previous owner of the space

So your solution is to split the delalloc reservation to constrain the
allocation to the range that's being operated on?

If so, I think a better solution (at least from the perspective of
reducing fragmentation) would be to map the extent unwritten and force a
post-writeback conversion[1] but I got shot down for performance reasons
the last time I suggested that.

--D

[1] https://lore.kernel.org/linux-xfs/155259894630.30230.10064390935593758177.stgit@magnolia/

> Fix the problem by splitting the delalloc extent before
> xfs_flush_unmap_range is called.
> 
> Signed-off-by: yu kuai <yukuai3@huawei.com>
> ---
>  fs/xfs/xfs_file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index c93250108952..5398102feec9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -786,6 +786,50 @@ xfs_break_layouts(
>  
>  	return error;
>  }
> +int
> +try_split_da_extent(
> +	struct xfs_inode	*ip,
> +	loff_t			offset,
> +	loff_t			len)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fileoff_t		start = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end = XFS_B_TO_FSBT(mp, offset + len - 1);
> +	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> +	struct xfs_iext_cursor	cur;
> +	struct xfs_bmbt_irec	imap;
> +	int error;
> +
> +	/*
> +	 * If start belongs to a delalloc extent and is not its first block,
> +	 * split the extent at start.
> +	 */
> +	if (xfs_iext_lookup_extent(ip, ifp, start, &cur, &imap) &&
> +	    imap.br_startblock != HOLESTARTBLOCK &&
> +	    isnullstartblock(imap.br_startblock) &&
> +	    start > imap.br_startoff) {
> +		error = xfs_bmap_split_da_extent(ip, start);
> +		if (error)
> +			return error;
> +		ip->i_d.di_nextents--;
> +	}
> +
> +	/*
> +	 * If end + 1 belongs to a delalloc extent and is not its first block,
> +	 * split the extent at end + 1.
> +	 */
> +	if (xfs_iext_lookup_extent(ip, ifp, end + 1, &cur, &imap) &&
> +	    imap.br_startblock != HOLESTARTBLOCK &&
> +	    isnullstartblock(imap.br_startblock) &&
> +	    end + 1 > imap.br_startoff) {
> +		error = xfs_bmap_split_da_extent(ip, end + 1);
> +		if (error)
> +			return error;
> +		ip->i_d.di_nextents--;
> +	}
> +
> +	return 0;
> +}
>  
>  #define	XFS_FALLOC_FL_SUPPORTED						\
>  		(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |		\
> @@ -842,6 +886,9 @@ xfs_file_fallocate(
>  	 */
>  	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE |
>  		    FALLOC_FL_COLLAPSE_RANGE)) {
> +		error = try_split_da_extent(ip, offset, len);
> +		if (error)
> +			goto out_unlock;
>  		error = xfs_flush_unmap_range(ip, offset, len);
>  		if (error)
>  			goto out_unlock;
> -- 
> 2.17.2
> 
