linux-fsdevel.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 22/34] xfs: make xfs_writepage_map extent map centric
Date: Thu, 24 May 2018 10:59:36 -0400
Message-ID: <20180524145935.GA84959@bfoster.bfoster>
In-Reply-To: <20180523144357.18985-23-hch@lst.de>

On Wed, May 23, 2018 at 04:43:45PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
...
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 5dd09e83c81c..a50f69c2c602 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -845,85 +826,81 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh, *head;
> +	struct buffer_head	*bh;
>  	ssize_t			len = i_blocksize(inode);
> -	uint64_t		offset;
>  	int			error = 0;
>  	int			count = 0;
> -	int			uptodate = 1;
> -	unsigned int		new_type;
> +	bool			uptodate = true;
> +	loff_t			file_offset;	/* file offset of page */
> +	unsigned		poffset;	/* offset into page */
>  
> -	bh = head = page_buffers(page);
> -	offset = page_offset(page);
> -	do {
> -		if (offset >= end_offset)
> +	/*
> +	 * Walk the blocks on the page, and when we run off the end of the
> +	 * current map or find the current map invalid, grab a new one.
> +	 * We only use bufferheads here to check per-block state - they no
> +	 * longer control the iteration through the page. This allows us to
> +	 * replace the bufferhead with some other state tracking mechanism in
> +	 * future.
> +	 */
> +	file_offset = page_offset(page);
> +	bh = page_buffers(page);
> +	for (poffset = 0;
> +	     poffset < PAGE_SIZE;
> +	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +		/* past the range we are writing, so nothing more to write. */
> +		if (file_offset >= end_offset)
>  			break;
> -		if (!buffer_uptodate(bh))
> -			uptodate = 0;
>  
>  		/*
> -		 * set_page_dirty dirties all buffers in a page, independent
> -		 * of their state.  The dirty state however is entirely
> -		 * meaningless for holes (!mapped && uptodate), so skip
> -		 * buffers covering holes here.
> +		 * Block does not contain valid data, skip it, mark the current
> +		 * map as invalid because we have a discontiguity. This ensures
> +		 * we put subsequent writeable buffers into a new ioend.
>  		 */
> -		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			wpc->imap_valid = false;
> -			continue;
> -		}
> -
> -		if (buffer_unwritten(bh))
> -			new_type = XFS_IO_UNWRITTEN;
> -		else if (buffer_delay(bh))
> -			new_type = XFS_IO_DELALLOC;
> -		else if (buffer_uptodate(bh))
> -			new_type = XFS_IO_OVERWRITE;
> -		else {
> +		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			/*
> -			 * This buffer is not uptodate and will not be
> -			 * written to disk.  Ensure that we will put any
> -			 * subsequent writeable buffers into a new
> -			 * ioend.
> -			 */
> +			uptodate = false;
>  			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -		if (xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_cow(wpc, inode, offset, &new_type);
> -			if (error)
> -				goto out;
> -		}
> -
> -		if (wpc->io_type != new_type) {
> -			wpc->io_type = new_type;
> -			wpc->imap_valid = false;
> -		}
> -
> +		/* Check to see if current map spans this file offset */
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
> +		/*
> +		 * If we don't have a valid map, now it's time to get a new one
> +		 * for this offset.  This will convert delayed allocations
> +		 * (including COW ones) into real extents.  If we return without
> +		 * a valid map, it means we landed in a hole and we skip the
> +		 * block.
> +		 */
>  		if (!wpc->imap_valid) {
> -			error = xfs_map_blocks(inode, offset, &wpc->imap,
> -					     wpc->io_type);
> +			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
> +					     &wpc->io_type);
>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		}
> -		if (wpc->imap_valid) {
> -			lock_buffer(bh);
> -			if (wpc->io_type != XFS_IO_OVERWRITE)
> -				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> -			count++;
> +
> +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +			/*
> +			 * set_page_dirty dirties all buffers in a page, independent
> +			 * of their state.  The dirty state however is entirely
> +			 * meaningless for holes (!mapped && uptodate), so check we did
> +			 * have a buffer covering a hole here and continue.
> +			 */

The comment above doesn't make much sense given that we don't check for
anything here and just continue the loop.

That aside, the concern I had with this patch when it was last posted is
that it indirectly dropped the error/consistency check between page
state and extent state provided by the XFS_BMAPI_DELALLOC flag. What was
historically an accounting/reservation issue was turned into something
like this by XFS_BMAPI_DELALLOC:

# xfs_io -c "pwrite 0 4k" -c fsync /mnt/file
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0041 sec (974.184 KiB/sec and 243.5460 ops/sec)
fsync: Input/output error

As of this patch, that same error condition now behaves something like
this:

[root@localhost ~]# xfs_io -c "pwrite 0 4k" -c fsync /mnt/file
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0029 sec (1.325 MiB/sec and 339.2130 ops/sec)
[root@localhost ~]# ls -al /mnt/file
-rw-r--r--. 1 root root 4096 May 24 08:27 /mnt/file
[root@localhost ~]# umount  /mnt ; mount /dev/test/scratch /mnt/
[root@localhost ~]# ls -al /mnt/file
-rw-r--r--. 1 root root 0 May 24 08:27 /mnt/file
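
For anyone who wants to script the check rather than eyeball xfs_io
output, here is a rough userspace sketch of the same write + fsync +
stat sequence. The helper name write_and_sync() is made up for
illustration, and the sketch does not induce the error condition or do
the remount; it only shows that the fsync() return value is the one
place the failure used to be reported.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write @len bytes at offset 0 of @path, fsync the
 * file and report the size seen by fstat() through @size_out.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_and_sync(const char *path, size_t len, off_t *size_out)
{
	struct stat st;
	char *buf;
	ssize_t n;
	int fd, ret = 0;

	buf = malloc(len);
	if (!buf)
		return -ENOMEM;
	memset(buf, 0xcd, len);		/* arbitrary fill byte */

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0) {
		ret = -errno;
		goto out_free;
	}

	n = pwrite(fd, buf, len, 0);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;	/* treat a short write as a failure here */
	else if (fsync(fd) < 0)
		ret = -errno;	/* writeback errors are reported here */
	else if (fstat(fd, &st) < 0)
		ret = -errno;
	else
		*size_out = st.st_size;

	if (close(fd) < 0 && !ret)
		ret = -errno;
out_free:
	free(buf);
	return ret;
}
```

With the old behavior, a call like write_and_sync("/mnt/file", 4096,
&size) would be expected to return -EIO in the failing case. With this
patch it returns 0 and reports a 4096 byte file, and the data loss only
shows up after the remount.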

So our behavior has changed from forcing block allocation (violating the
reservation) and writing the data, to returning an error, and now to
silently skipping the page. I suppose there are situations (e.g., races
with truncate) where a hole is valid and the correct behavior is to skip
the page, and this is admittedly an error condition that "should never
happen," but can we at least add an assert somewhere in this series that
ensures that if uptodate data maps over a hole, the associated block
offset is beyond EOF (or something of that nature)?
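
To make the suggestion concrete, here is a minimal userspace model of
the condition I have in mind. It is only a sketch: struct block_state
and hole_skip_is_sane() are hypothetical names, not anything in the
patch, and the in-kernel form would presumably be a one-line ASSERT()
comparing file_offset against the inode size in the hole branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-block state the writeback loop sees. */
struct block_state {
	bool	uptodate;	/* models buffer_uptodate(bh) */
	bool	is_hole;	/* models !imap_valid || io_type == XFS_IO_HOLE */
	int64_t	file_offset;	/* byte offset of the block in the file */
};

/*
 * Return true if skipping this block is consistent with the extent
 * state. An uptodate block that maps to a hole is only expected when
 * the block starts at or beyond EOF (e.g. after racing with truncate).
 * Anything else means dirty data would be dropped silently.
 */
static bool hole_skip_is_sane(const struct block_state *b, int64_t isize)
{
	if (!b->uptodate || !b->is_hole)
		return true;	/* nothing is being skipped over a hole */
	return b->file_offset >= isize;
}

/* Assert-style wrapper, analogous to what the loop could check. */
static void check_hole_skip(const struct block_state *b, int64_t isize)
{
	assert(hole_skip_is_sane(b, isize));
}
```

In the reproducer above, the block at offset 0 of a 4096 byte file is
uptodate but maps to a hole, so this check would fire instead of the
page being skipped quietly.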

Brian

> +			continue;
>  		}
>  
> -	} while (offset += len, ((bh = bh->b_this_page) != head));
> +		lock_buffer(bh);
> +		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		count++;
> +	}
>  
> -	if (uptodate && bh == head)
> +	if (uptodate && poffset == PAGE_SIZE)
>  		SetPageUptodate(page);
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 69346d460dfa..b2ef5b661761 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -29,6 +29,7 @@ enum {
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> @@ -36,7 +37,8 @@ enum {
>  	{ XFS_IO_DELALLOC,		"delalloc" }, \
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }
> +	{ XFS_IO_COW,			"CoW" }, \
> +	{ XFS_IO_HOLE,			"hole" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.17.0
