All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
Date: Wed, 30 May 2018 13:39:56 -0400	[thread overview]
Message-ID: <20180530173955.GF112411@bfoster.bfoster> (raw)
In-Reply-To: <20180530100013.31358-7-hch@lst.de>

On Wed, May 30, 2018 at 12:00:01PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_writepage_map() iterates over the bufferheads on a page to decide
> what sort of IO to do and what actions to take.  However, when it comes
> to reflink and deciding when it needs to execute a COW operation, we no
> longer look at the bufferhead state but instead we ignore than and look
> up internal state held in teh COW fork extent list.
> 
...
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> [hch: forward port + slight refactoring]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

I believe Dave originally intended to split this up into multiple
patches. Dave, did you happen to get anywhere with that before Christoph
pulled this in?

If not, could we at least split off some of the behavior changes into
separate patches? For example, dropping the !mapped && uptodate check
that causes us to writeback zeroed blocks over unwritten extents is a
behavior change that warrants a separate patch. I also think the
associated text above doesn't quite describe the details of the behavior
change. Writeback only converts those extents if they've been previously
read somehow or another, otherwise those blocks are still skipped.

So for example, (on a bsize=1k fs) this:

	xfs_io -fc "falloc 0 4k" -c "pread 3k 1k" -c "pwrite 3k 1k" \
		-c fsync -c "fiemap -v" /mnt/file

... now results in different writeback behavior from this:

	xfs_io -fc "falloc 0 4k" -c "pwrite 3k 1k" \
		-c fsync -c "fiemap -v" /mnt/file

That may be fine, but it does leave me thinking whether we should
consider ways to provide more consistent behavior here (such as zeroing
blocks at writeback time, for example).

>  fs/xfs/xfs_aops.c | 273 ++++++++++++++++++++--------------------------
>  fs/xfs/xfs_aops.h |   4 +-
>  2 files changed, 124 insertions(+), 153 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 5dd09e83c81c..8cc41a786b5e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -845,85 +818,81 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh, *head;
> +	struct buffer_head	*bh;
>  	ssize_t			len = i_blocksize(inode);
> -	uint64_t		offset;
>  	int			error = 0;
>  	int			count = 0;
> -	int			uptodate = 1;
> -	unsigned int		new_type;
> +	bool			uptodate = true;
> +	loff_t			file_offset;	/* file offset of page */
> +	unsigned		poffset;	/* offset into page */
>  
> -	bh = head = page_buffers(page);
> -	offset = page_offset(page);
> -	do {
> -		if (offset >= end_offset)
> +	/*
> +	 * Walk the blocks on the page, and we we run off then end of the
> +	 * current map or find the current map invalid, grab a new one.
> +	 * We only use bufferheads here to check per-block state - they no
> +	 * longer control the iteration through the page. This allows us to
> +	 * replace the bufferhead with some other state tracking mechanism in
> +	 * future.
> +	 */
> +	file_offset = page_offset(page);
> +	bh = page_buffers(page);
> +	for (poffset = 0;
> +	     poffset < PAGE_SIZE;
> +	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +		/* past the range we are writing, so nothing more to write. */
> +		if (file_offset >= end_offset)
>  			break;
> -		if (!buffer_uptodate(bh))
> -			uptodate = 0;
>  
>  		/*
> -		 * set_page_dirty dirties all buffers in a page, independent
> -		 * of their state.  The dirty state however is entirely
> -		 * meaningless for holes (!mapped && uptodate), so skip
> -		 * buffers covering holes here.
> +		 * Block does not contain valid data, skip it, mark the current
> +		 * map as invalid because we have a discontiguity. This ensures
> +		 * we put subsequent writeable buffers into a new ioend.
>  		 */
> -		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			wpc->imap_valid = false;
> -			continue;
> -		}
> -
> -		if (buffer_unwritten(bh))
> -			new_type = XFS_IO_UNWRITTEN;
> -		else if (buffer_delay(bh))
> -			new_type = XFS_IO_DELALLOC;
> -		else if (buffer_uptodate(bh))
> -			new_type = XFS_IO_OVERWRITE;
> -		else {
> +		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			/*
> -			 * This buffer is not uptodate and will not be
> -			 * written to disk.  Ensure that we will put any
> -			 * subsequent writeable buffers into a new
> -			 * ioend.
> -			 */
> +			uptodate = false;
>  			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -		if (xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_cow(wpc, inode, offset, &new_type);
> -			if (error)
> -				goto out;
> -		}
> -
> -		if (wpc->io_type != new_type) {
> -			wpc->io_type = new_type;
> -			wpc->imap_valid = false;
> -		}
> -
> +		/* Check to see if current map spans this file offset */
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);

What if the file is reflinked and the current page covers a non-shared
block but has an overlapping cow mapping due to cowextsize? The current
logic unconditionally uses the COW mapping for writeback. The updated
logic doesn't appear to do that in all cases. Consider if the current
imap was delalloc (and so not trimmed) or the cow mapping was introduced
after the current imap was mapped. This logic appears to prioritize the
current mapping so long as it is valid. Doesn't that break the
cowextsize hint?

Brian

> +		/*
> +		 * If we don't have a valid map, now it's time to get a new one
> +		 * for this offset.  This will convert delayed allocations
> +		 * (including COW ones) into real extents.  If we return without
> +		 * a valid map, it means we landed in a hole and we skip the
> +		 * block.
> +		 */
>  		if (!wpc->imap_valid) {
> -			error = xfs_map_blocks(inode, offset, &wpc->imap,
> -					     wpc->io_type);
> +			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
> +					     &wpc->io_type);
>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		}
> -		if (wpc->imap_valid) {
> -			lock_buffer(bh);
> -			if (wpc->io_type != XFS_IO_OVERWRITE)
> -				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> -			count++;
> +
> +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +			/*
> +			 * set_page_dirty dirties all buffers in a page, independent
> +			 * of their state.  The dirty state however is entirely
> +			 * meaningless for holes (!mapped && uptodate), so check we did
> +			 * have a buffer covering a hole here and continue.
> +			 */
> +			continue;
>  		}
>  
> -	} while (offset += len, ((bh = bh->b_this_page) != head));
> +		lock_buffer(bh);
> +		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		count++;
> +	}
>  
> -	if (uptodate && bh == head)
> +	if (uptodate && poffset == PAGE_SIZE)
>  		SetPageUptodate(page);
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 69346d460dfa..b2ef5b661761 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -29,6 +29,7 @@ enum {
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> @@ -36,7 +37,8 @@ enum {
>  	{ XFS_IO_DELALLOC,		"delalloc" }, \
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }
> +	{ XFS_IO_COW,			"CoW" }, \
> +	{ XFS_IO_HOLE,			"hole" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2018-05-30 17:39 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
2018-05-30 13:34   ` Brian Foster
2018-05-30 16:26   ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
2018-05-30 13:34   ` Brian Foster
2018-05-30 16:33   ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:55     ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:40   ` Darrick J. Wong
2018-05-30 10:00 ` [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:58     ` Darrick J. Wong
2018-05-30 17:40       ` Brian Foster
2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
2018-05-30 17:19   ` Darrick J. Wong
2018-05-30 17:39   ` Brian Foster [this message]
2018-05-31 16:11     ` Christoph Hellwig
2018-05-31 16:12     ` Christoph Hellwig
2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
2018-05-30 17:19   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-31 16:11     ` Christoph Hellwig
2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
2018-05-30 17:23   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
2018-05-30 17:27   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
2018-05-30 17:32   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
2018-05-30 17:34   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
2018-05-30 17:44   ` Darrick J. Wong
2018-05-31 13:47   ` Brian Foster
2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
2018-05-30 17:55   ` Darrick J. Wong
2018-05-31  7:03     ` Christoph Hellwig
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
2018-05-30 17:56   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
2018-05-30 17:57   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
2018-05-30 18:00   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
2018-05-30 18:08   ` Darrick J. Wong
2018-05-31  7:04     ` Christoph Hellwig
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
2018-05-30 18:08   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180530173955.GF112411@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.