linux-fsdevel.vger.kernel.org archive mirror
* stop using buffer heads in xfs v7
@ 2018-07-02 14:57 Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 01/22] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
                   ` (21 more replies)
  0 siblings, 22 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Hi all,

this series adds support for buffered writes without buffer heads to
the iomap and XFS code.

A git tree is available at:

    git://git.infradead.org/users/hch/xfs.git xfs-iomap-write.7

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-iomap-write.7

Changes since v6:
 - fix EOF zeroing for small blocksizes
 - various comment updates
 - new patch to update copyrights

Changes since v5:
 - merged the leftovers of the previous series into one
 - rebased on top of the gfs2 iomap prep series
 - fixed the map_blocks map conditions
 - clean up a loop in __iomap_write_begin

Changes since v4:
 - make the logic more clear in xfs_bmap_punch_delalloc_range
 - fix a missing unlock in xfs_bmap_punch_delalloc_range
 - always look up COW extents to check for overlaps
 - lots of patch reshuffling and splitting without impact to the final
   result

Changes since v3:
 - iterate backwards in xfs_bmap_punch_delalloc_range
 - remove the cow_valid variable in xfs_reflink_trim_irec_to_next_cow
 - additional trivial xfs_map_blocks simplifications
 - split the read side into a separate prep series
 - moved the SEEK_HOLE/DATA patches that are not strictly required out of
   the series

Changes since v2:
 - minor page_seek_hole_data tweaks
 - don't read data entirely covered by the write operation in write_begin
 - fix zeroing on write_begin I/O failure
 - remove iomap_block_needs_zeroing to make the code more clear
 - update comments on __do_page_cache_readahead

Changes since v1:
 - fix the iomap_readpages error handling
 - use unsigned file offsets in a few places to avoid arithmetic overflows
 - allocate an iomap_page in iomap_page_mkwrite to fix generic/095
 - improve a few comments
 - add more asserts
 - warn about truncated block numbers from ->bmap
 - new patch to change the __do_page_cache_readahead return value to
   unsigned int
 - remove an incorrectly added empty line
 - make inline data an explicit iomap type instead of a flag
 - add an IOMAP_F_BUFFER_HEAD flag to force use of buffer heads for gfs2,
   and keep the basic buffer head infrastructure around for now (see the
   sketch below).
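
As a rough illustration of how that flag is meant to be used (this sketch is
not part of the series' diffs; the helper name is made up, and only the
IOMAP_F_BUFFER_HEAD flag and the iomap fields are taken from the code this
series touches), a filesystem opts back into buffer heads from its
->iomap_begin implementation:

	/* sketch only: a filesystem that still wants buffer heads */
	static int example_iomap_begin(struct inode *inode, loff_t pos,
			loff_t length, unsigned flags, struct iomap *iomap)
	{
		/* ... fill out iomap->offset, ->length, ->type, ->addr ... */
		iomap->flags |= IOMAP_F_BUFFER_HEAD;
		return 0;
	}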


* [PATCH 01/22] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 02/22] xfs: simplify xfs_aops_discard_page Christoph Hellwig
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

For file systems with a block size that equals the page size we never do
partial reads, so we can use the buffer_head-less iomap versions of
readpage and readpages without conflicting with the buffer_head structures
created later in write_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8eb3ba3d4d00..85e1a625d42a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1401,6 +1401,8 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
+	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
+		return iomap_readpage(page, &xfs_iomap_ops);
 	return mpage_readpage(page, xfs_get_blocks);
 }
 
@@ -1412,6 +1414,8 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
+	if (i_blocksize(mapping->host) == PAGE_SIZE)
+		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
 	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
 }
 
-- 
2.18.0


* [PATCH 02/22] xfs: simplify xfs_aops_discard_page
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 01/22] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 03/22] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Instead of looking at the buffer heads to see if a block is delalloc, just
call xfs_bmap_punch_delalloc_range on the whole page - this will leave
any non-delalloc block intact and handle the iteration for us.  As a side
effect, one more place stops caring about buffer heads and we can remove
the xfs_check_page_type function entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 85 +++++------------------------------------------
 1 file changed, 9 insertions(+), 76 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 85e1a625d42a..9b1a17b4cacb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -710,49 +710,6 @@ xfs_map_at_offset(
 	clear_buffer_unwritten(bh);
 }
 
-/*
- * Test if a given page contains at least one buffer of a given @type.
- * If @check_all_buffers is true, then we walk all the buffers in the page to
- * try to find one of the type passed in. If it is not set, then the caller only
- * needs to check the first buffer on the page for a match.
- */
-STATIC bool
-xfs_check_page_type(
-	struct page		*page,
-	unsigned int		type,
-	bool			check_all_buffers)
-{
-	struct buffer_head	*bh;
-	struct buffer_head	*head;
-
-	if (PageWriteback(page))
-		return false;
-	if (!page->mapping)
-		return false;
-	if (!page_has_buffers(page))
-		return false;
-
-	bh = head = page_buffers(page);
-	do {
-		if (buffer_unwritten(bh)) {
-			if (type == XFS_IO_UNWRITTEN)
-				return true;
-		} else if (buffer_delay(bh)) {
-			if (type == XFS_IO_DELALLOC)
-				return true;
-		} else if (buffer_dirty(bh) && buffer_mapped(bh)) {
-			if (type == XFS_IO_OVERWRITE)
-				return true;
-		}
-
-		/* If we are only checking the first buffer, we are done now. */
-		if (!check_all_buffers)
-			break;
-	} while ((bh = bh->b_this_page) != head);
-
-	return false;
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -784,9 +741,6 @@ xfs_vm_invalidatepage(
  * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
  * truncation without a transaction as there is no space left for block
  * reservation (typically why we see a ENOSPC in writeback).
- *
- * This is not a performance critical path, so for now just do the punching a
- * buffer head at a time.
  */
 STATIC void
 xfs_aops_discard_page(
@@ -794,47 +748,26 @@ xfs_aops_discard_page(
 {
 	struct inode		*inode = page->mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	struct buffer_head	*bh, *head;
+	struct xfs_mount	*mp = ip->i_mount;
 	loff_t			offset = page_offset(page);
+	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, offset);
+	int			error;
 
-	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
-		goto out_invalidate;
-
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+	if (XFS_FORCED_SHUTDOWN(mp))
 		goto out_invalidate;
 
-	xfs_alert(ip->i_mount,
+	xfs_alert(mp,
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	bh = head = page_buffers(page);
-	do {
-		int		error;
-		xfs_fileoff_t	start_fsb;
-
-		if (!buffer_delay(bh))
-			goto next_buffer;
-
-		start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
-		if (error) {
-			/* something screwed, just bail */
-			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-				xfs_alert(ip->i_mount,
-			"page discard unable to remove delalloc mapping.");
-			}
-			break;
-		}
-next_buffer:
-		offset += i_blocksize(inode);
-
-	} while ((bh = bh->b_this_page) != head);
-
+	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+			PAGE_SIZE / i_blocksize(inode));
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error && !XFS_FORCED_SHUTDOWN(mp))
+		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
-	return;
 }
 
 static int
-- 
2.18.0


* [PATCH 03/22] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 01/22] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 02/22] xfs: simplify xfs_aops_discard_page Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 04/22] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Both callers want the same locking, so do it only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c      | 2 --
 fs/xfs/xfs_bmap_util.c | 9 +++++----
 fs/xfs/xfs_iomap.c     | 3 ---
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 9b1a17b4cacb..5c549e983d69 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -760,10 +760,8 @@ xfs_aops_discard_page(
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 			PAGE_SIZE / i_blocksize(inode));
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 83b1e8c6c18f..da561882c349 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -702,16 +702,15 @@ xfs_bmap_punch_delalloc_range(
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
-		return 0;
+		goto out_unlock;
 
 	while (got.br_startoff + got.br_blockcount > start_fsb) {
 		del = got;
@@ -735,6 +734,8 @@ xfs_bmap_punch_delalloc_range(
 			break;
 	}
 
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e08a84d9ee72..10c54fc7d1b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1204,11 +1204,8 @@ xfs_file_iomap_end_delalloc(
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 					       end_fsb - start_fsb);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
 		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_alert(mp, "%s: unable to clean up ino %lld",
 				__func__, ip->i_ino);
-- 
2.18.0


* [PATCH 04/22] xfs: do not set the page uptodate in xfs_writepage_map
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 03/22] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 05/22] xfs: don't clear imap_valid for non-uptodate buffers Christoph Hellwig
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We already track the page uptodate status based on the buffer uptodate
status, which is updated whenever reading or zeroing blocks.

This code has been there since a ptool commit in 2002, which
claims to:

    "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
    major changes to actually fit.

and isn't present in other writepage implementations.
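
For reference, the propagation this patch relies on looks roughly like the
sketch below - a condensed version of the generic buffer head pattern in
fs/buffer.c, with an illustrative function name.  Once every buffer on the
page is uptodate the page flag gets set, so setting it again in
xfs_writepage_map() added nothing:

	/* sketch: buffer state already drives the page uptodate flag */
	static void example_check_page_uptodate(struct page *page)
	{
		struct buffer_head *bh, *head;

		bh = head = page_buffers(page);
		do {
			if (!buffer_uptodate(bh))
				return;	/* a block is not read/zeroed yet */
		} while ((bh = bh->b_this_page) != head);
		SetPageUptodate(page);
	}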

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5c549e983d69..df80a383ccd8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -849,7 +849,6 @@ xfs_writepage_map(
 	uint64_t		offset;
 	int			error = 0;
 	int			count = 0;
-	int			uptodate = 1;
 	unsigned int		new_type;
 
 	bh = head = page_buffers(page);
@@ -857,8 +856,6 @@ xfs_writepage_map(
 	do {
 		if (offset >= end_offset)
 			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
 
 		/*
 		 * set_page_dirty dirties all buffers in a page, independent
@@ -922,9 +919,6 @@ xfs_writepage_map(
 
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
 out:
-- 
2.18.0


* [PATCH 05/22] xfs: don't clear imap_valid for non-uptodate buffers
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 04/22] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 06/22] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Finding a buffer that isn't uptodate doesn't invalidate the mapping for
any given block.  The last_sector check will already take care of starting
another ioend as soon as we find any non-uptodate buffer, and if the current
mapping doesn't include the next uptodate buffer the xfs_imap_valid check
will take care of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index df80a383ccd8..1d1cb917cc6e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -863,10 +863,8 @@ xfs_writepage_map(
 		 * meaningless for holes (!mapped && uptodate), so skip
 		 * buffers covering holes here.
 		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
+		if (!buffer_mapped(bh) && buffer_uptodate(bh))
 			continue;
-		}
 
 		if (buffer_unwritten(bh))
 			new_type = XFS_IO_UNWRITTEN;
@@ -879,11 +877,8 @@ xfs_writepage_map(
 				ASSERT(buffer_mapped(bh));
 			/*
 			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
+			 * written to disk.
 			 */
-			wpc->imap_valid = false;
 			continue;
 		}
 
-- 
2.18.0


* [PATCH 06/22] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 05/22] xfs: don't clear imap_valid for non-uptodate buffers Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 07/22] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We want to be able to use the extent state as a reliable indicator of
the type of I/O, and stop using the buffer head state.  For this we
need to stop using the XFS_BMAPI_IGSTATE flag so that we don't see merged
extents of different types.

Based on a patch from Dave Chinner.
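
To see what the flag changed, the merge test in xfs_bmapi_update_map()
(whose IGSTATE branch is deleted for good later in this series) can be
condensed to roughly the following, where "contiguous" stands in for the
block and offset adjacency checks:

	/* sketch: IGSTATE made the extent state check optional when merging */
	if (contiguous &&
	    ((flags & XFS_BMAPI_IGSTATE) ||
	     mval[-1].br_state == mval->br_state))
		mval[-1].br_blockcount += mval->br_blockcount;	/* merged */

With the flag set, a single returned mapping could thus span both written
and unwritten blocks, hiding exactly the state we now want to rely on.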

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1d1cb917cc6e..6b6150683343 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -373,7 +373,6 @@ xfs_map_blocks(
 	ssize_t			count = i_blocksize(inode);
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			error = 0;
-	int			bmapi_flags = XFS_BMAPI_ENTIRE;
 	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -393,8 +392,6 @@ xfs_map_blocks(
 		return 0;
 
 	ASSERT(type != XFS_IO_COW);
-	if (type == XFS_IO_UNWRITTEN)
-		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
@@ -406,7 +403,7 @@ xfs_map_blocks(
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, bmapi_flags);
+				imap, &nimaps, XFS_BMAPI_ENTIRE);
 	/*
 	 * Truncate an overwrite extent if there's a pending CoW
 	 * reservation before the end of this extent.  This forces us
-- 
2.18.0


* [PATCH 07/22] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 06/22] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:57 ` [PATCH 08/22] xfs: remove xfs_map_cow Christoph Hellwig
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We already have to check for overlapping COW extents every time we
come back to a page in xfs_writepage_map / xfs_map_cow, so this
additional trim is not required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    |  7 -------
 fs/xfs/xfs_reflink.c | 33 ---------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 fs/xfs/xfs_trace.h   |  1 -
 4 files changed, 43 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6b6150683343..08605432c497 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -404,13 +404,6 @@ xfs_map_blocks(
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				imap, &nimaps, XFS_BMAPI_ENTIRE);
-	/*
-	 * Truncate an overwrite extent if there's a pending CoW
-	 * reservation before the end of this extent.  This forces us
-	 * to come back to writepage to take care of the CoW.
-	 */
-	if (nimaps && type == XFS_IO_OVERWRITE)
-		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 592fb2071a03..22c11b98ab26 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -500,39 +500,6 @@ xfs_reflink_find_cow_mapping(
 	return true;
 }
 
-/*
- * Trim an extent to end at the next CoW reservation past offset_fsb.
- */
-void
-xfs_reflink_trim_irec_to_next_cow(
-	struct xfs_inode		*ip,
-	xfs_fileoff_t			offset_fsb,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	if (!xfs_is_reflink_inode(ip))
-		return;
-
-	/* Find the extent in the CoW fork. */
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return;
-
-	/* This is the extent before; try sliding up one. */
-	if (got.br_startoff < offset_fsb) {
-		if (!xfs_iext_next_extent(ifp, &icur, &got))
-			return;
-	}
-
-	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
-		return;
-
-	imap->br_blockcount = got.br_startoff - imap->br_startoff;
-	trace_xfs_reflink_trim_irec(ip, imap);
-}
-
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 1532827ba911..6f9f98894abc 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -20,8 +20,6 @@ extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
 		struct xfs_bmbt_irec *imap);
-extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
-		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 972d45d28097..a5b01529ecf6 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3216,7 +3216,6 @@ DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
 DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
-DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
-- 
2.18.0


* [PATCH 08/22] xfs: remove xfs_map_cow
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 07/22] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
@ 2018-07-02 14:57 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 09/22] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:57 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We can handle the existing cow mapping case as a special case directly
in xfs_writepage_map, and share code for allocating delalloc blocks
with regular I/O in xfs_map_blocks.  This means we need to always
call xfs_map_blocks for reflink inodes, but we can still skip most of
the work if it turns out that there is no COW mapping overlapping the
current block.

As a subtle detail we need to start caching holes in the wpc to deal
with the case of COW reservations beyond EOF.  But we'll need that
infrastructure later anyway, so this is no big deal.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 195 +++++++++++++++++++++++-----------------------
 fs/xfs/xfs_aops.h |   4 +-
 2 files changed, 100 insertions(+), 99 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 08605432c497..65454a4f4d93 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -363,70 +363,107 @@ xfs_end_bio(
 
 STATIC int
 xfs_map_blocks(
+	struct xfs_writepage_ctx *wpc,
 	struct inode		*inode,
-	loff_t			offset,
-	struct xfs_bmbt_irec	*imap,
-	int			type)
+	loff_t			offset)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
+	struct xfs_bmbt_irec	imap;
+	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * Truncate can race with writeback since writeback doesn't take the
-	 * iolock and truncate decreases the file size before it starts
-	 * truncating the pages between new_size and old_size.  Therefore, we
-	 * can end up in the situation where writeback gets a CoW fork mapping
-	 * but the truncate makes the mapping invalid and we end up in here
-	 * trying to get a new mapping.  Bail out here so that we simply never
-	 * get a valid mapping and so we drop the write altogether.  The page
-	 * truncation will kill the contents anyway.
-	 */
-	if (type == XFS_IO_COW && offset > i_size_read(inode))
-		return 0;
-
-	ASSERT(type != XFS_IO_COW);
-
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (xfs_is_reflink_inode(ip) &&
+	    xfs_reflink_find_cow_mapping(ip, offset, &imap)) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		/*
+		 * Truncate can race with writeback since writeback doesn't
+		 * take the iolock and truncate decreases the file size before
+		 * it starts truncating the pages between new_size and old_size.
+		 * Therefore, we can end up in the situation where writeback
+		 * gets a CoW fork mapping but the truncate makes the mapping
+		 * invalid and we end up in here trying to get a new mapping.
+		 * Bail out here so that we simply never get a valid mapping
+		 * and so we drop the write altogether.  The page truncation
+		 * will kill the contents anyway.
+		 */
+		if (offset > i_size_read(inode)) {
+			wpc->io_type = XFS_IO_HOLE;
+			return 0;
+		}
+		whichfork = XFS_COW_FORK;
+		wpc->io_type = XFS_IO_COW;
+		goto allocate_blocks;
+	}
+
+	/*
+	 * Map valid and no COW extent in the way?  We're done.
+	 */
+	if (wpc->imap_valid) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		return 0;
+	}
+
+	/*
+	 * If we don't have a valid map, now it's time to get a new one for this
+	 * offset.  This will convert delayed allocations (including COW ones)
+	 * into real extents.
+	 */
 	if (offset > mp->m_super->s_maxbytes - count)
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, XFS_BMAPI_ENTIRE);
+				&imap, &nimaps, XFS_BMAPI_ENTIRE);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
 	if (error)
 		return error;
 
-	if (type == XFS_IO_DELALLOC &&
-	    (!nimaps || isnullstartblock(imap->br_startblock))) {
-		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
-				imap);
-		if (!error)
-			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
-		return error;
+	if (!nimaps) {
+		/*
+		 * Lookup returns no match? Beyond EOF? Regardless,
+		 * return it as a hole so we don't write it
+		 */
+		imap.br_startoff = offset_fsb;
+		imap.br_blockcount = end_fsb - offset_fsb;
+		imap.br_startblock = HOLESTARTBLOCK;
+		wpc->io_type = XFS_IO_HOLE;
+	} else if (imap.br_startblock == HOLESTARTBLOCK) {
+		/* landed in a hole */
+		wpc->io_type = XFS_IO_HOLE;
 	}
 
+	if (wpc->io_type == XFS_IO_DELALLOC &&
+	    (!nimaps || isnullstartblock(imap.br_startblock)))
+		goto allocate_blocks;
+
 #ifdef DEBUG
-	if (type == XFS_IO_UNWRITTEN) {
+	if (wpc->io_type == XFS_IO_UNWRITTEN) {
 		ASSERT(nimaps);
-		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
+		ASSERT(imap.br_startblock != HOLESTARTBLOCK);
+		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
 	}
 #endif
-	if (nimaps)
-		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
+	wpc->imap = imap;
+	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
+	return 0;
+allocate_blocks:
+	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap);
+	if (error)
+		return error;
+	wpc->imap = imap;
+	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 }
 
@@ -758,56 +795,6 @@ xfs_aops_discard_page(
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
 }
 
-static int
-xfs_map_cow(
-	struct xfs_writepage_ctx *wpc,
-	struct inode		*inode,
-	loff_t			offset,
-	unsigned int		*new_type)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_bmbt_irec	imap;
-	bool			is_cow = false;
-	int			error;
-
-	/*
-	 * If we already have a valid COW mapping keep using it.
-	 */
-	if (wpc->io_type == XFS_IO_COW) {
-		wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
-		if (wpc->imap_valid) {
-			*new_type = XFS_IO_COW;
-			return 0;
-		}
-	}
-
-	/*
-	 * Else we need to check if there is a COW mapping at this offset.
-	 */
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	is_cow = xfs_reflink_find_cow_mapping(ip, offset, &imap);
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-	if (!is_cow)
-		return 0;
-
-	/*
-	 * And if the COW mapping has a delayed extent here we need to
-	 * allocate real space for it now.
-	 */
-	if (isnullstartblock(imap.br_startblock)) {
-		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK, offset,
-				&imap);
-		if (error)
-			return error;
-	}
-
-	wpc->io_type = *new_type = XFS_IO_COW;
-	wpc->imap_valid = true;
-	wpc->imap = imap;
-	return 0;
-}
-
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -836,7 +823,7 @@ xfs_writepage_map(
 	struct xfs_ioend	*ioend, *next;
 	struct buffer_head	*bh, *head;
 	ssize_t			len = i_blocksize(inode);
-	uint64_t		offset;
+	uint64_t		offset;	/* file offset of page */
 	int			error = 0;
 	int			count = 0;
 	unsigned int		new_type;
@@ -872,10 +859,13 @@ xfs_writepage_map(
 			continue;
 		}
 
-		if (xfs_is_reflink_inode(XFS_I(inode))) {
-			error = xfs_map_cow(wpc, inode, offset, &new_type);
-			if (error)
-				goto out;
+		/*
+		 * If we already have a valid COW mapping keep using it.
+		 */
+		if (wpc->io_type == XFS_IO_COW &&
+		    xfs_imap_valid(inode, &wpc->imap, offset)) {
+			wpc->imap_valid = true;
+			new_type = XFS_IO_COW;
 		}
 
 		if (wpc->io_type != new_type) {
@@ -886,22 +876,31 @@ xfs_writepage_map(
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 offset);
-		if (!wpc->imap_valid) {
-			error = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
+
+		/*
+		 * COW fork blocks can overlap data fork blocks even if the
+		 * blocks aren't shared. COW I/O always takes precedence, so we
+		 * must always check for overlap on reflink inodes unless the
+		 * mapping is already a COW one.
+		 */
+		if (!wpc->imap_valid ||
+		    (xfs_is_reflink_inode(XFS_I(inode)) &&
+		     wpc->io_type != XFS_IO_COW)) {
+			error = xfs_map_blocks(wpc, inode, offset);
 			if (error)
 				goto out;
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 offset);
 		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
-			count++;
-		}
 
+		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
+			continue;
+
+		lock_buffer(bh);
+		if (wpc->io_type != XFS_IO_OVERWRITE)
+			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
+		count++;
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 25bc6d4a1231..9af867951a10 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -17,6 +17,7 @@ enum {
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
+	XFS_IO_HOLE,		/* covers region without any block allocation */
 };
 
 #define XFS_IO_TYPES \
@@ -24,7 +25,8 @@ enum {
 	{ XFS_IO_DELALLOC,		"delalloc" }, \
 	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
 	{ XFS_IO_OVERWRITE,		"overwrite" }, \
-	{ XFS_IO_COW,			"CoW" }
+	{ XFS_IO_COW,			"CoW" }, \
+	{ XFS_IO_HOLE,			"hole" }
 
 /*
  * Structure for buffered I/O completions.
-- 
2.18.0


* [PATCH 09/22] xfs: rename the offset variable in xfs_writepage_map
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-07-02 14:57 ` [PATCH 08/22] xfs: remove xfs_map_cow Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 10/22] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Calling it file_offset makes the usage more clear, especially with
a new poffset variable that will be added soon for the offset inside
the page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 65454a4f4d93..4dc5fcff226e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -823,15 +823,15 @@ xfs_writepage_map(
 	struct xfs_ioend	*ioend, *next;
 	struct buffer_head	*bh, *head;
 	ssize_t			len = i_blocksize(inode);
-	uint64_t		offset;	/* file offset of page */
+	uint64_t		file_offset;	/* file offset of page */
 	int			error = 0;
 	int			count = 0;
 	unsigned int		new_type;
 
 	bh = head = page_buffers(page);
-	offset = page_offset(page);
+	file_offset = page_offset(page);
 	do {
-		if (offset >= end_offset)
+		if (file_offset >= end_offset)
 			break;
 
 		/*
@@ -863,7 +863,7 @@ xfs_writepage_map(
 		 * If we already have a valid COW mapping keep using it.
 		 */
 		if (wpc->io_type == XFS_IO_COW &&
-		    xfs_imap_valid(inode, &wpc->imap, offset)) {
+		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
 			wpc->imap_valid = true;
 			new_type = XFS_IO_COW;
 		}
@@ -875,7 +875,7 @@ xfs_writepage_map(
 
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
 
 		/*
 		 * COW fork blocks can overlap data fork blocks even if the
@@ -886,11 +886,11 @@ xfs_writepage_map(
 		if (!wpc->imap_valid ||
 		    (xfs_is_reflink_inode(XFS_I(inode)) &&
 		     wpc->io_type != XFS_IO_COW)) {
-			error = xfs_map_blocks(wpc, inode, offset);
+			error = xfs_map_blocks(wpc, inode, file_offset);
 			if (error)
 				goto out;
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
 		}
 
 		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
@@ -898,10 +898,10 @@ xfs_writepage_map(
 
 		lock_buffer(bh);
 		if (wpc->io_type != XFS_IO_OVERWRITE)
-			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
+			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
 		count++;
-	} while (offset += len, ((bh = bh->b_this_page) != head));
+	} while (file_offset += len, ((bh = bh->b_this_page) != head));
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
-- 
2.18.0


* [PATCH 10/22] xfs: make xfs_writepage_map extent map centric
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 09/22] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 11/22] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

xfs_writepage_map() iterates over the bufferheads on a page to decide
what sort of IO to do and what actions to take.  However, when it comes
to reflink and deciding when it needs to execute a COW operation, we no
longer look at the bufferhead state but instead ignore it and look
up internal state held in the COW fork extent list.

This means xfs_writepage_map() is somewhat confused. It does stuff, then
ignores it, then tries to handle the impedance mismatch by shovelling the
results inside the existing mapping code.  It works, but it's a bit of a
mess and it makes it hard to fix the cached map bug that the writepage
code currently has.

To unify the two different mechanisms, we first have to choose a direction.
That's already been set - we're de-emphasising bufferheads so they are no
longer a control structure as we need to do that to allow for eventual
removal.  Hence we need to move away from looking at bufferhead state to
determine what operations we need to perform.

We can't completely get rid of bufferheads yet - they do contain some
state that is absolutely necessary, such as whether that part of the page
contains valid data or not (buffer_uptodate()).  Other state in the
bufferhead is redundant:

	BH_dirty - the page is dirty, so we can ignore this and just
		write it
	BH_delay - we have delalloc extent info in the DATA fork extent
		tree
	BH_unwritten - same as BH_delay
	BH_mapped - indicates we've already used it once for IO and it is
		mapped to a disk address. Needs to be ignored for COW
		blocks.

The BH_mapped flag is an interesting case - it's supposed to indicate that
it's already mapped to disk and so we can just use it "as is".  In theory,
we don't even have to do an extent lookup to find where to write it to,
but we have to do that anyway to determine that we are actually writing over
a valid extent. Hence it's not even serving the purpose of avoiding an
extent lookup during writeback, and so we can pretty much ignore it.
Especially as we have to ignore it for COW operations...

Therefore, use the extent map as the source of information to tell us
what actions we need to take and what sort of IO we should perform.  The
first step is to have xfs_map_blocks() set the io type according to what
it looks up.  This means it can easily handle both normal overwrite and
COW cases.  The only thing we also need to add is the ability to return
hole mappings.

We need to return and cache hole mappings now for the case of multiple
blocks per page.  We no longer use the BH_mapped to indicate a block over
a hole, so we have to get that info from xfs_map_blocks().  We cache it so
that holes that span two pages don't need separate lookups.  This allows us
to avoid ever doing write IO over a hole, too.
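
A cached hole is represented like any other mapping, just with a sentinel
start block - distilled here from the xfs_map_blocks() changes later in the
series (next_extent_startoff is an illustrative name for where the following
extent, or EOF, begins):

	/* sketch: cache the hole so it stays valid across page boundaries */
	imap.br_startoff   = offset_fsb;
	imap.br_blockcount = next_extent_startoff - offset_fsb;
	imap.br_startblock = HOLESTARTBLOCK;
	wpc->io_type       = XFS_IO_HOLE;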

Now that we have xfs_map_blocks() returning both a cached map and the type
of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
the bufferhead control. It's also much simplified because it doesn't need
to explicitly handle COW operations.  Instead of iterating bufferheads, it
iterates blocks within the page and then looks up what per-block state is
required from the appropriate bufferhead.  It then validates the cached
map, and if it's not valid, we get a new map.  If we don't get a valid map
or it's over a hole, we skip the block.

At this point, we have to remap the bufferhead via xfs_map_at_offset().
As previously noted, we had to do this even if the buffer was already
mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-
written delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
Bufferheads that span such regions still need their BH_Delay flags cleared
and their block numbers calculated, so we now unconditionally map each
bufferhead before submission.

But wait! There's more - remember the old "treat unwritten extents as
holes on read" hack?  Yeah, that means we can have a dirty page with
unmapped, unwritten bufferheads that contain data!  What makes these so
special is that the unwritten "hole" bufferheads do not have a valid block
device pointer, so if we attempt to write them xfs_add_to_ioend() blows
up. So we make xfs_map_at_offset() do the "realtime or data device"
lookup from the inode and ignore what was or wasn't put into the
bufferhead when the buffer was instantiated.

The astute reader will have realised by now that this code treats
unwritten extents in multiple-blocks-per-page situations differently.
If we get any combination of unwritten blocks on a dirty page that contain
valid data in the page, we're going to convert them to real extents.  This
can actually be a win, because it means that pages with interleaving
unwritten and written blocks will get converted to a single written extent
with zeros replacing the interspersed unwritten blocks.  This is actually
good for reducing extent list and conversion overhead, and it means we
issue a contiguous IO instead of lots of little ones.  The downside is
that we use up a little extra IO bandwidth.  Neither of these seem like a
bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
IOs will result in better performance on them...

As a result of all this, the only state we actually care about from the
bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
pass some information to the bio via xfs_add_to_ioend(), but that is
trivial to separate and pass explicitly.  This means we really only need
1 bit of state per block per page from the buffered write path in the
writeback path.  Everything else we do with the bufferhead is purely to
make the buffered IO front end continue to work correctly. i.e we've
pretty much marginalised bufferheads in the writeback path completely.
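
That single bit of state is small enough that a minimal sketch of a
bufferhead replacement for the writeback side is little more than a bitmap.
The structure below is illustrative only; it mirrors the shape of the
iomap_page structure introduced by the iomap buffered write code rather than
anything in this patch:

	/* sketch: one uptodate bit per block per page */
	struct example_block_state {
		unsigned long	uptodate[BITS_TO_LONGS(PAGE_SIZE / 512)];
	};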

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: forward port, refactor and split off bits into other commits]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 88 +++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4dc5fcff226e..815b0b29438b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -442,19 +442,19 @@ xfs_map_blocks(
 	} else if (imap.br_startblock == HOLESTARTBLOCK) {
 		/* landed in a hole */
 		wpc->io_type = XFS_IO_HOLE;
-	}
-
-	if (wpc->io_type == XFS_IO_DELALLOC &&
-	    (!nimaps || isnullstartblock(imap.br_startblock)))
-		goto allocate_blocks;
+	} else {
+		if (isnullstartblock(imap.br_startblock)) {
+			/* got a delalloc extent */
+			wpc->io_type = XFS_IO_DELALLOC;
+			goto allocate_blocks;
+		}
 
-#ifdef DEBUG
-	if (wpc->io_type == XFS_IO_UNWRITTEN) {
-		ASSERT(nimaps);
-		ASSERT(imap.br_startblock != HOLESTARTBLOCK);
-		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+		if (imap.br_state == XFS_EXT_UNWRITTEN)
+			wpc->io_type = XFS_IO_UNWRITTEN;
+		else
+			wpc->io_type = XFS_IO_OVERWRITE;
 	}
-#endif
+
 	wpc->imap = imap;
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
 	return 0;
@@ -735,6 +735,14 @@ xfs_map_at_offset(
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
 	clear_buffer_unwritten(bh);
+
+	/*
+	 * If this is a realtime file, data may be on a different device
+	 * to that pointed to from the buffer_head b_bdev currently. We can't
+	 * trust that the bufferhead has already been mapped correctly, so
+	 * set the bdev now.
+	 */
+	bh->b_bdev = xfs_find_bdev_for_inode(inode);
 }
 
 STATIC void
@@ -821,58 +829,35 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh, *head;
+	struct buffer_head	*bh;
 	ssize_t			len = i_blocksize(inode);
 	uint64_t		file_offset;	/* file offset of page */
+	unsigned		poffset;	/* offset into page */
 	int			error = 0;
 	int			count = 0;
-	unsigned int		new_type;
 
-	bh = head = page_buffers(page);
+	/*
+	 * Walk the blocks on the page, and if we run off the end of the current
+	 * map or find the current map invalid, grab a new one.  We only use
+	 * bufferheads here to check per-block state - they no longer control
+	 * the iteration through the page. This allows us to replace the
+	 * bufferhead with some other state tracking mechanism in future.
+	 */
 	file_offset = page_offset(page);
-	do {
+	bh = page_buffers(page);
+	for (poffset = 0;
+	     poffset < PAGE_SIZE;
+	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+		/* past the range we are writing, so nothing more to write. */
 		if (file_offset >= end_offset)
 			break;
 
-		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
-		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh))
-			continue;
-
-		if (buffer_unwritten(bh))
-			new_type = XFS_IO_UNWRITTEN;
-		else if (buffer_delay(bh))
-			new_type = XFS_IO_DELALLOC;
-		else if (buffer_uptodate(bh))
-			new_type = XFS_IO_OVERWRITE;
-		else {
+		if (!buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.
-			 */
 			continue;
 		}
 
-		/*
-		 * If we already have a valid COW mapping keep using it.
-		 */
-		if (wpc->io_type == XFS_IO_COW &&
-		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
-			wpc->imap_valid = true;
-			new_type = XFS_IO_COW;
-		}
-
-		if (wpc->io_type != new_type) {
-			wpc->io_type = new_type;
-			wpc->imap_valid = false;
-		}
-
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 file_offset);
@@ -897,11 +882,10 @@ xfs_writepage_map(
 			continue;
 
 		lock_buffer(bh);
-		if (wpc->io_type != XFS_IO_OVERWRITE)
-			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
 		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
 		count++;
-	} while (file_offset += len, ((bh = bh->b_this_page) != head));
+	}
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
-- 
2.18.0


* [PATCH 11/22] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 10/22] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 12/22] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 6 ++----
 fs/xfs/libxfs/xfs_bmap.h | 3 ---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7205268b30bc..68ea1f4b9c3f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3788,8 +3788,7 @@ xfs_bmapi_update_map(
 		   mval[-1].br_startblock != HOLESTARTBLOCK &&
 		   mval->br_startblock == mval[-1].br_startblock +
 					  mval[-1].br_blockcount &&
-		   ((flags & XFS_BMAPI_IGSTATE) ||
-			mval[-1].br_state == mval->br_state)) {
+		   mval[-1].br_state == mval->br_state) {
 		ASSERT(mval->br_startoff ==
 		       mval[-1].br_startoff + mval[-1].br_blockcount);
 		mval[-1].br_blockcount += mval->br_blockcount;
@@ -3834,7 +3833,7 @@ xfs_bmapi_read(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
-			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
+			   XFS_BMAPI_COWFORK)));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
@@ -4279,7 +4278,6 @@ xfs_bmapi_write(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
-	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
 	ASSERT(tp != NULL ||
 	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
 			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 9b49ddf99c41..44639588d1c7 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -68,8 +68,6 @@ struct xfs_extent_free_item
 #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
 #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
 #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
-#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
-					/* combine contig. space */
 #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
 /*
  * unwritten extent conversion - this needs write cache flushing and no additional
@@ -116,7 +114,6 @@ struct xfs_extent_free_item
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
 	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
 	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
-	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
-- 
2.18.0


* [PATCH 12/22] xfs: remove xfs_reflink_find_cow_mapping
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 11/22] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 13/22] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We only have one caller left, and open coding the simple extent list
lookup in it allows us both to make the code more understandable and to
reuse calculations and variables already present.
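
The shape of the open-coded lookup is worth spelling out:
xfs_iext_lookup_extent() returns the first extent at or after the queried
offset, so the extra start-offset comparison distinguishes "offset lies
inside a COW extent" from "the next COW extent starts further out".
Condensed from the hunk below:

	/* sketch: covered only if the extent starts at or before the offset */
	covered = xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
				&icur, &imap) &&
		  imap.br_startoff <= offset_fsb;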

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    | 19 +++++++++++++------
 fs/xfs/xfs_reflink.c | 30 ------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 fs/xfs/xfs_trace.h   |  1 -
 4 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 815b0b29438b..5c5d8c832dcc 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -370,9 +370,10 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
+	xfs_fileoff_t		offset_fsb, end_fsb;
 	struct xfs_bmbt_irec	imap;
 	int			whichfork = XFS_DATA_FORK;
+	struct xfs_iext_cursor	icur;
 	int			error = 0;
 	int			nimaps = 1;
 
@@ -384,8 +385,18 @@ xfs_map_blocks(
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (offset > mp->m_super->s_maxbytes - count)
+		count = mp->m_super->s_maxbytes - offset;
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
+	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+
+	/*
+	 * Check if this offset is covered by a COW extent, and if so use
+	 * it directly instead of looking up anything in the data fork.
+	 */
 	if (xfs_is_reflink_inode(ip) &&
-	    xfs_reflink_find_cow_mapping(ip, offset, &imap)) {
+	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &imap) &&
+	    imap.br_startoff <= offset_fsb) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		/*
 		 * Truncate can race with writeback since writeback doesn't
@@ -420,10 +431,6 @@ xfs_map_blocks(
 	 * offset.  This will convert delayed allocations (including COW ones)
 	 * into real extents.
 	 */
-	if (offset > mp->m_super->s_maxbytes - count)
-		count = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				&imap, &nimaps, XFS_BMAPI_ENTIRE);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 22c11b98ab26..49e4913fa779 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -470,36 +470,6 @@ xfs_reflink_allocate_cow(
 	return error;
 }
 
-/*
- * Find the CoW reservation for a given byte offset of a file.
- */
-bool
-xfs_reflink_find_cow_mapping(
-	struct xfs_inode		*ip,
-	xfs_off_t			offset,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	xfs_fileoff_t			offset_fsb;
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
-
-	if (!xfs_is_reflink_inode(ip))
-		return false;
-	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return false;
-	if (got.br_startoff > offset_fsb)
-		return false;
-
-	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
-			&got);
-	*imap = got;
-	return true;
-}
-
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 6f9f98894abc..c585ad9552b2 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -18,8 +18,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
-		struct xfs_bmbt_irec *imap);
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index a5b01529ecf6..1af123df19b5 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3215,7 +3215,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
-DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
-- 
2.18.0


* [PATCH 13/22] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 12/22] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 14/22] xfs: remove the imap_valid flag Christoph Hellwig
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

xfs_bmapi_read adds no value in xfs_map_blocks.  Replace it with a
direct call to the low-level extent lookup function.

Note that we now always pass a 0 length to the trace points as we ask
for an unspecified len.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5c5d8c832dcc..0bfcc2d06658 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -375,7 +375,6 @@ xfs_map_blocks(
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
-	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
@@ -431,24 +430,16 @@ xfs_map_blocks(
 	 * offset.  This will convert delayed allocations (including COW ones)
 	 * into real extents.
 	 */
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				&imap, &nimaps, XFS_BMAPI_ENTIRE);
+	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap))
+		imap.br_startoff = end_fsb;	/* fake a hole past EOF */
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-	if (error)
-		return error;
 
-	if (!nimaps) {
-		/*
-		 * Lookup returns no match? Beyond eof? regardless,
-		 * return it as a hole so we don't write it
-		 */
+	if (imap.br_startoff > offset_fsb) {
+		/* landed in a hole or beyond EOF */
+		imap.br_blockcount = imap.br_startoff - offset_fsb;
 		imap.br_startoff = offset_fsb;
-		imap.br_blockcount = end_fsb - offset_fsb;
 		imap.br_startblock = HOLESTARTBLOCK;
 		wpc->io_type = XFS_IO_HOLE;
-	} else if (imap.br_startblock == HOLESTARTBLOCK) {
-		/* landed in a hole */
-		wpc->io_type = XFS_IO_HOLE;
 	} else {
 		if (isnullstartblock(imap.br_startblock)) {
 			/* got a delalloc extent */
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 14/22] xfs: remove the imap_valid flag
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 13/22] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 15/22] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Simplify the way we check for a valid imap - we know we have a valid
mapping after xfs_map_blocks returned successfully, and we know we can
call xfs_imap_valid on any imap, as it will always fail on a
zero-initialized map.

We can also remove the xfs_imap_valid function and fold it into
xfs_map_blocks now.
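
Condensed from the diff below, the check folded into the top of
xfs_map_blocks boils down to:

	imap_valid = offset_fsb >= wpc->imap.br_startoff &&
		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
	if (imap_valid &&
	    (!xfs_is_reflink_inode(ip) || wpc->io_type == XFS_IO_COW))
		return 0;	/* cached mapping still covers this offset */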

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 89 ++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0bfcc2d06658..09092f10cff3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -30,7 +30,6 @@
  */
 struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
-	bool			imap_valid;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
 	sector_t		last_block;
@@ -370,15 +369,47 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
 	struct xfs_bmbt_irec	imap;
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
+	bool			imap_valid;
 	int			error = 0;
 
+	/*
+	 * We have to make sure the cached mapping is within EOF to protect
+	 * against eofblocks trimming on file release leaving us with a stale
+	 * mapping. Otherwise, a page for a subsequent file extending buffered
+	 * write could get picked up by this writeback cycle and written to the
+	 * wrong blocks.
+	 *
+	 * Note that what we really want here is a generic mapping invalidation
+	 * mechanism to protect us from arbitrary extent modifying contexts, not
+	 * just eofblocks.
+	 */
+	xfs_trim_extent_eof(&wpc->imap, ip);
+
+	/*
+	 * COW fork blocks can overlap data fork blocks even if the blocks
+	 * aren't shared.  COW I/O always takes precedent, so we must always
+	 * check for overlap on reflink inodes unless the mapping is already a
+	 * COW one.
+	 */
+	imap_valid = offset_fsb >= wpc->imap.br_startoff &&
+		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
+	if (imap_valid &&
+	    (!xfs_is_reflink_inode(ip) || wpc->io_type == XFS_IO_COW))
+		return 0;
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/*
+	 * If we don't have a valid map, now it's time to get a new one for this
+	 * offset.  This will convert delayed allocations (including COW ones)
+	 * into real extents.  If we return without a valid map, it means we
+	 * landed in a hole and we skip the block.
+	 */
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
@@ -387,7 +418,6 @@ xfs_map_blocks(
 	if (offset > mp->m_super->s_maxbytes - count)
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
 	/*
 	 * Check if this offset is covered by a COW extent, and if yes use
@@ -420,7 +450,7 @@ xfs_map_blocks(
 	/*
 	 * Map valid and no COW extent in the way?  We're done.
 	 */
-	if (wpc->imap_valid) {
+	if (imap_valid) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		return 0;
 	}
@@ -465,31 +495,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC bool
-xfs_imap_valid(
-	struct inode		*inode,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	offset >>= inode->i_blkbits;
-
-	/*
-	 * We have to make sure the cached mapping is within EOF to protect
-	 * against eofblocks trimming on file release leaving us with a stale
-	 * mapping. Otherwise, a page for a subsequent file extending buffered
-	 * write could get picked up by this writeback cycle and written to the
-	 * wrong blocks.
-	 *
-	 * Note that what we really want here is a generic mapping invalidation
-	 * mechanism to protect us from arbitrary extent modifying contexts, not
-	 * just eofblocks.
-	 */
-	xfs_trim_extent_eof(imap, XFS_I(inode));
-
-	return offset >= imap->br_startoff &&
-		offset < imap->br_startoff + imap->br_blockcount;
-}
-
 STATIC void
 xfs_start_buffer_writeback(
 	struct buffer_head	*bh)
@@ -856,27 +861,10 @@ xfs_writepage_map(
 			continue;
 		}
 
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
-
-		/*
-		 * COW fork blocks can overlap data fork blocks even if the
-		 * blocks aren't shared. COW I/O always takes precedent, so we
-		 * must always check for overlap on reflink inodes unless the
-		 * mapping is already a COW one.
-		 */
-		if (!wpc->imap_valid ||
-		    (xfs_is_reflink_inode(XFS_I(inode)) &&
-		     wpc->io_type != XFS_IO_COW)) {
-			error = xfs_map_blocks(wpc, inode, file_offset);
-			if (error)
-				goto out;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
-		}
-
-		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
+		error = xfs_map_blocks(wpc, inode, file_offset);
+		if (error)
+			break;
+		if (wpc->io_type == XFS_IO_HOLE)
 			continue;
 
 		lock_buffer(bh);
@@ -887,7 +875,6 @@ xfs_writepage_map(
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
-out:
 	/*
 	 * On error, we have to fail the ioend here because we have locked
 	 * buffers in the ioend. If we don't do this, we'll deadlock
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 15/22] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 14/22] xfs: remove the imap_valid flag Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 16/22] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Calculate all information for the bio based on the passed-in arguments,
without requiring a buffer_head structure.
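
The heart of the change is computing the disk sector straight from the
cached mapping instead of reading it from bh->b_blocknr, and testing
ioend contiguity against the bio itself (taken from the diff below):

	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);

	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
	    sector != bio_end_sector(wpc->ioend->io_bio) ||
	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
		/* can't append: finish this ioend and start a new one */
		...
	}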

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 68 ++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09092f10cff3..6402e4323031 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -32,7 +32,6 @@ struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
-	sector_t		last_block;
 };
 
 void
@@ -534,11 +533,6 @@ xfs_start_page_writeback(
 	unlock_page(page);
 }
 
-static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
-{
-	return bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -604,27 +598,20 @@ xfs_submit_ioend(
 	return 0;
 }
 
-static void
-xfs_init_bio_from_bh(
-	struct bio		*bio,
-	struct buffer_head	*bh)
-{
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
-}
-
 static struct xfs_ioend *
 xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type,
 	xfs_off_t		offset,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct xfs_ioend	*ioend;
 	struct bio		*bio;
 
 	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, &xfs_ioend_bioset);
-	xfs_init_bio_from_bh(bio, bh);
+	bio_set_dev(bio, bdev);
+	bio->bi_iter.bi_sector = sector;
 
 	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
 	INIT_LIST_HEAD(&ioend->io_list);
@@ -649,13 +636,14 @@ static void
 xfs_chain_bio(
 	struct xfs_ioend	*ioend,
 	struct writeback_control *wbc,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct bio *new;
 
 	new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
-	xfs_init_bio_from_bh(new, bh);
-
+	bio_set_dev(new, bdev);
+	new->bi_iter.bi_sector = sector;
 	bio_chain(ioend->io_bio, new);
 	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
 	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
@@ -665,39 +653,45 @@ xfs_chain_bio(
 }
 
 /*
- * Test to see if we've been building up a completion structure for
- * earlier buffers -- if so, we try to append to this ioend if we
- * can, otherwise we finish off any current ioend and start another.
- * Return the ioend we finished off so that the caller can submit it
- * once it has finished processing the dirty page.
+ * Test to see if we have an existing ioend structure that we could append to
+ * first, otherwise finish off the current ioend and start another.
  */
 STATIC void
 xfs_add_to_ioend(
 	struct inode		*inode,
-	struct buffer_head	*bh,
 	xfs_off_t		offset,
+	struct page		*page,
 	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	struct list_head	*iolist)
 {
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
+	unsigned		len = i_blocksize(inode);
+	unsigned		poff = offset & (PAGE_SIZE - 1);
+	sector_t		sector;
+
+	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
+		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
+
 	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
-	    bh->b_blocknr != wpc->last_block + 1 ||
+	    sector != bio_end_sector(wpc->ioend->io_bio) ||
 	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
-		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset, bh);
+		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
+				bdev, sector);
 	}
 
 	/*
-	 * If the buffer doesn't fit into the bio we need to allocate a new
-	 * one.  This shouldn't happen more than once for a given buffer.
+	 * If the block doesn't fit into the bio we need to allocate a new
+	 * one.  This shouldn't happen more than once for a given block.
 	 */
-	while (xfs_bio_add_buffer(wpc->ioend->io_bio, bh) != bh->b_size)
-		xfs_chain_bio(wpc->ioend, wbc, bh);
+	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
+		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
 
-	wpc->ioend->io_size += bh->b_size;
-	wpc->last_block = bh->b_blocknr;
-	xfs_start_buffer_writeback(bh);
+	wpc->ioend->io_size += len;
 }
 
 STATIC void
@@ -869,7 +863,9 @@ xfs_writepage_map(
 
 		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
-		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
+		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
+				&submit_list);
+		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 16/22] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 15/22] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 17/22] xfs: remove xfs_start_page_writeback Christoph Hellwig
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

This keeps it in a single place so it can be made optional more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6402e4323031..71b4ca60ff40 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -494,21 +494,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC void
-xfs_start_buffer_writeback(
-	struct buffer_head	*bh)
-{
-	ASSERT(buffer_mapped(bh));
-	ASSERT(buffer_locked(bh));
-	ASSERT(!buffer_delay(bh));
-	ASSERT(!buffer_unwritten(bh));
-
-	bh->b_end_io = NULL;
-	set_buffer_async_write(bh);
-	set_buffer_uptodate(bh);
-	clear_buffer_dirty(bh);
-}
-
 STATIC void
 xfs_start_page_writeback(
 	struct page		*page,
@@ -728,6 +713,7 @@ xfs_map_at_offset(
 	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
 	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
 
+	lock_buffer(bh);
 	xfs_map_buffer(inode, bh, imap, offset);
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
@@ -740,6 +726,10 @@ xfs_map_at_offset(
 	 * set the bdev now.
 	 */
 	bh->b_bdev = xfs_find_bdev_for_inode(inode);
+	bh->b_end_io = NULL;
+	set_buffer_async_write(bh);
+	set_buffer_uptodate(bh);
+	clear_buffer_dirty(bh);
 }
 
 STATIC void
@@ -861,11 +851,9 @@ xfs_writepage_map(
 		if (wpc->io_type == XFS_IO_HOLE)
 			continue;
 
-		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
-		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 17/22] xfs: remove xfs_start_page_writeback
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 16/22] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 18/22] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

This helper only has two callers, one of them with a constant error
argument.  Remove it to make pending changes to the code a little easier.
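
Open-coded at the main call site, the helper body becomes (as in the
diff below):

	if (error) {
		/* keep the page dirty so this writeback sweep retries it */
		set_page_writeback_keepwrite(page);
	} else {
		clear_page_dirty_for_io(page);
		set_page_writeback(page);
	}
	unlock_page(page);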

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 71b4ca60ff40..af9224ea4ebf 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -494,30 +494,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC void
-xfs_start_page_writeback(
-	struct page		*page,
-	int			clear_dirty)
-{
-	ASSERT(PageLocked(page));
-	ASSERT(!PageWriteback(page));
-
-	/*
-	 * if the page was not fully cleaned, we need to ensure that the higher
-	 * layers come back to it correctly. That means we need to keep the page
-	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
-	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
-	 * write this page in this writeback sweep will be made.
-	 */
-	if (clear_dirty) {
-		clear_page_dirty_for_io(page);
-		set_page_writeback(page);
-	} else
-		set_page_writeback_keepwrite(page);
-
-	unlock_page(page);
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -858,6 +834,8 @@ xfs_writepage_map(
 	}
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
+	ASSERT(PageLocked(page));
+	ASSERT(!PageWriteback(page));
 
 	/*
 	 * On error, we have to fail the ioend here because we have locked
@@ -877,7 +855,21 @@ xfs_writepage_map(
 	 * treated correctly on error.
 	 */
 	if (count) {
-		xfs_start_page_writeback(page, !error);
+		/*
+		 * If the page was not fully cleaned, we need to ensure that the
+		 * higher layers come back to it correctly.  That means we need
+		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
+		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
+		 * so another attempt to write this page in this writeback sweep
+		 * will be made.
+		 */
+		if (error) {
+			set_page_writeback_keepwrite(page);
+		} else {
+			clear_page_dirty_for_io(page);
+			set_page_writeback(page);
+		}
+		unlock_page(page);
 
 		/*
 		 * Preserve the original error if there was one, otherwise catch
@@ -902,7 +894,9 @@ xfs_writepage_map(
 		 * race with a partial page truncate on a sub-page block sized
 		 * filesystem. In that case we need to mark the page clean.
 		 */
-		xfs_start_page_writeback(page, 1);
+		clear_page_dirty_for_io(page);
+		set_page_writeback(page);
+		unlock_page(page);
 		end_page_writeback(page);
 	}
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 18/22] xfs: refactor the tail of xfs_writepage_map
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 17/22] xfs: remove xfs_start_page_writeback Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 19/22] xfs: allow writeback on pages without buffer heads Christoph Hellwig
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Rejuggle how we deal with the error vs non-error and have-ioends vs
no-ioends cases to keep the fast path streamlined and the duplicated
code to a minimum.
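
The restructured tail, condensed from the diff below, handles the one
truly exceptional case up front and shares everything else:

	if (unlikely(error)) {
		if (!count) {
			/* nothing made it into an ioend: discard the page */
			xfs_aops_discard_page(page);
			ClearPageUptodate(page);
			unlock_page(page);
			goto done;
		}
		set_page_writeback_keepwrite(page);
	} else {
		clear_page_dirty_for_io(page);
		set_page_writeback(page);
	}
	unlock_page(page);

	/* one submission loop now serves both the error and success paths */
	list_for_each_entry_safe(ioend, next, &submit_list, io_list)
		...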

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 65 +++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index af9224ea4ebf..c8e0d3055153 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -854,7 +854,14 @@ xfs_writepage_map(
 	 * submission of outstanding ioends on the writepage context so they are
 	 * treated correctly on error.
 	 */
-	if (count) {
+	if (unlikely(error)) {
+		if (!count) {
+			xfs_aops_discard_page(page);
+			ClearPageUptodate(page);
+			unlock_page(page);
+			goto done;
+		}
+
 		/*
 		 * If the page was not fully cleaned, we need to ensure that the
 		 * higher layers come back to it correctly.  That means we need
@@ -863,43 +870,35 @@ xfs_writepage_map(
 		 * so another attempt to write this page in this writeback sweep
 		 * will be made.
 		 */
-		if (error) {
-			set_page_writeback_keepwrite(page);
-		} else {
-			clear_page_dirty_for_io(page);
-			set_page_writeback(page);
-		}
-		unlock_page(page);
-
-		/*
-		 * Preserve the original error if there was one, otherwise catch
-		 * submission errors here and propagate into subsequent ioend
-		 * submissions.
-		 */
-		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
-			int error2;
-
-			list_del_init(&ioend->io_list);
-			error2 = xfs_submit_ioend(wbc, ioend, error);
-			if (error2 && !error)
-				error = error2;
-		}
-	} else if (error) {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
+		set_page_writeback_keepwrite(page);
 	} else {
-		/*
-		 * We can end up here with no error and nothing to write if we
-		 * race with a partial page truncate on a sub-page block sized
-		 * filesystem. In that case we need to mark the page clean.
-		 */
 		clear_page_dirty_for_io(page);
 		set_page_writeback(page);
-		unlock_page(page);
-		end_page_writeback(page);
 	}
 
+	unlock_page(page);
+
+	/*
+	 * Preserve the original error if there was one, otherwise catch
+	 * submission errors here and propagate into subsequent ioend
+	 * submissions.
+	 */
+	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
+		int error2;
+
+		list_del_init(&ioend->io_list);
+		error2 = xfs_submit_ioend(wbc, ioend, error);
+		if (error2 && !error)
+			error = error2;
+	}
+
+	/*
+	 * We can end up here with no error and nothing to write if we race with
+	 * a partial page truncate on a sub-page block sized filesystem.
+	 */
+	if (!count)
+		end_page_writeback(page);
+done:
 	mapping_set_error(page->mapping, error);
 	return error;
 }
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 19/22] xfs: allow writeback on pages without buffer heads
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 18/22] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O " Christoph Hellwig
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
equal to the page size, and deal with pages without buffer heads in
writeback.  Thanks to the previous refactoring this is basically trivial
now.
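
Two small pivots, both visible in the diff below, carry the change:
iomap_begin only asks for buffer heads on sub-page block size file
systems, and I/O completion picks the finishing helper per page:

	/* in xfs_file_iomap_begin() */
	if (i_blocksize(inode) < PAGE_SIZE)
		iomap->flags |= IOMAP_F_BUFFER_HEAD;

	/* in xfs_destroy_ioend() */
	bio_for_each_segment_all(bvec, bio, i) {
		if (page_has_buffers(bvec->bv_page))
			xfs_finish_buffer_writeback(inode, bvec, error);
		else
			xfs_finish_page_writeback(inode, bvec, error);
	}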

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c  | 51 +++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c |  3 ++-
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c8e0d3055153..0058f9893705 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -79,6 +79,19 @@ xfs_find_daxdev_for_inode(
 		return mp->m_ddev_targp->bt_daxdev;
 }
 
+static void
+xfs_finish_page_writeback(
+	struct inode		*inode,
+	struct bio_vec		*bvec,
+	int			error)
+{
+	if (error) {
+		SetPageError(bvec->bv_page);
+		mapping_set_error(inode->i_mapping, -EIO);
+	}
+	end_page_writeback(bvec->bv_page);
+}
+
 /*
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
@@ -91,7 +104,7 @@ xfs_find_daxdev_for_inode(
  * and buffers potentially freed after every call to end_buffer_async_write.
  */
 static void
-xfs_finish_page_writeback(
+xfs_finish_buffer_writeback(
 	struct inode		*inode,
 	struct bio_vec		*bvec,
 	int			error)
@@ -166,9 +179,12 @@ xfs_destroy_ioend(
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i)
-			xfs_finish_page_writeback(inode, bvec, error);
-
+		bio_for_each_segment_all(bvec, bio, i) {
+			if (page_has_buffers(bvec->bv_page))
+				xfs_finish_buffer_writeback(inode, bvec, error);
+			else
+				xfs_finish_page_writeback(inode, bvec, error);
+		}
 		bio_put(bio);
 	}
 
@@ -792,13 +808,16 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh;
+	struct buffer_head	*bh = NULL;
 	ssize_t			len = i_blocksize(inode);
 	uint64_t		file_offset;	/* file offset of page */
 	unsigned		poffset;	/* offset into page */
 	int			error = 0;
 	int			count = 0;
 
+	if (page_has_buffers(page))
+		bh = page_buffers(page);
+
 	/*
 	 * Walk the blocks on the page, and if we run off the end of the current
 	 * map or find the current map invalid, grab a new one.  We only use
@@ -806,28 +825,34 @@ xfs_writepage_map(
 	 * the iteration through the page. This allows us to replace the
 	 * bufferhead with some other state tracking mechanism in future.
 	 */
-	file_offset = page_offset(page);
-	bh = page_buffers(page);
-	for (poffset = 0;
+	for (poffset = 0, file_offset = page_offset(page);
 	     poffset < PAGE_SIZE;
-	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+	     poffset += len, file_offset += len) {
 		/* past the range we are writing, so nothing more to write. */
 		if (file_offset >= end_offset)
 			break;
 
-		if (!buffer_uptodate(bh)) {
+		if (bh && !buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
+			bh = bh->b_this_page;
 			continue;
 		}
 
 		error = xfs_map_blocks(wpc, inode, file_offset);
 		if (error)
 			break;
-		if (wpc->io_type == XFS_IO_HOLE)
+
+		if (wpc->io_type == XFS_IO_HOLE) {
+			if (bh)
+				bh = bh->b_this_page;
 			continue;
+		}
 
-		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		if (bh) {
+			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+			bh = bh->b_this_page;
+		}
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
 		count++;
@@ -925,8 +950,6 @@ xfs_do_writepage(
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
-	ASSERT(page_has_buffers(page));
-
 	/*
 	 * Refuse to write the page out if we are called from reclaim context.
 	 *
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 10c54fc7d1b4..7fe42a126ec1 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1032,7 +1032,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+	if (i_blocksize(inode) < PAGE_SIZE)
+		iomap->flags |= IOMAP_F_BUFFER_HEAD;
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O without buffer heads
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 19/22] xfs: allow writeback on pages without buffer heads Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-03 12:31   ` Brian Foster
  2018-07-02 14:58 ` [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads Christoph Hellwig
  2018-07-02 14:58 ` [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code Christoph Hellwig
  21 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

After adding a simple implementation of buffered writes for the
blocksize == PAGE_SIZE case in the last commit, this adds full support
even for smaller block sizes.  There are three bits of per-block
information in the buffer_head structure that really matter for the
iomap read and write path:

 - uptodate status (BH_uptodate)
 - marked as currently under read I/O (BH_Async_Read)
 - marked as currently under write I/O (BH_Async_Write)

Instead of having new per-block structures, this now adds a per-page
structure called struct iomap_page to track this information in a slightly
different form:

 - a bitmap for the per-block uptodate status.  For the worst case of a
   64k page size system this bitmap needs to contain 128 bits.  For the
   typical 4k page size case it only needs 8 bits, although we still
   need a full unsigned long due to the way the atomic bitmap API works.
 - two atomic_t counters are used to track the outstanding read and write
   counts

There is quite a bit of boilerplate code as the buffered I/O path uses
various helper methods, but the actual code is very straightforward.
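
The structure itself is tiny (from the diff below):

	struct iomap_page {
		atomic_t	read_count;	/* read bios in flight */
		atomic_t	write_count;	/* write bios in flight */
		DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
	};

As a rough worked example of iomap_adjust_read_range, assume 1k blocks
on a 4k page with blocks 0 and 3 already uptodate: the leading loop
advances pos/poff past block 0, the trailing loop trims block 3 off the
length, and the resulting read covers just the 2k spanning blocks 1-2.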

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 279 ++++++++++++++++++++++++++++++++++++++----
 include/linux/iomap.h |  31 +++++
 2 files changed, 289 insertions(+), 21 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 13cdcf33e6c0..ea1b1ba61ba3 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -17,6 +17,7 @@
 #include <linux/iomap.h>
 #include <linux/uaccess.h>
 #include <linux/gfp.h>
+#include <linux/migrate.h>
 #include <linux/mm.h>
 #include <linux/mm_inline.h>
 #include <linux/swap.h>
@@ -104,6 +105,138 @@ iomap_sector(struct iomap *iomap, loff_t pos)
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
 
+static struct iomap_page *
+iomap_page_create(struct inode *inode, struct page *page)
+{
+	struct iomap_page *iop = to_iomap_page(page);
+
+	if (iop || i_blocksize(inode) == PAGE_SIZE)
+		return iop;
+
+	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
+	atomic_set(&iop->read_count, 0);
+	atomic_set(&iop->write_count, 0);
+	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+	set_page_private(page, (unsigned long)iop);
+	SetPagePrivate(page);
+	return iop;
+}
+
+static void
+iomap_page_release(struct page *page)
+{
+	struct iomap_page *iop = to_iomap_page(page);
+
+	if (!iop)
+		return;
+	WARN_ON_ONCE(atomic_read(&iop->read_count));
+	WARN_ON_ONCE(atomic_read(&iop->write_count));
+	ClearPagePrivate(page);
+	set_page_private(page, 0);
+	kfree(iop);
+}
+
+/*
+ * Calculate the range inside the page that we actually need to read.
+ */
+static void
+iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
+		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
+{
+	unsigned block_bits = inode->i_blkbits;
+	unsigned block_size = (1 << block_bits);
+	unsigned poff = *pos & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	unsigned first = poff >> block_bits;
+	unsigned last = (poff + plen - 1) >> block_bits;
+	unsigned end = (i_size_read(inode) & (PAGE_SIZE - 1)) >> block_bits;
+
+	/*
+	 * If the block size is smaller than the page size we need to check the
+	 * per-block uptodate status and adjust the offset and length if needed
+	 * to avoid reading in already uptodate ranges.
+	 */
+	if (iop) {
+		unsigned int i;
+
+		/* move forward for each leading block marked uptodate */
+		for (i = first; i <= last; i++) {
+			if (!test_bit(i, iop->uptodate))
+				break;
+			*pos += block_size;
+			poff += block_size;
+			plen -= block_size;
+			first++;
+		}
+
+		/* truncate len if we find any trailing uptodate block(s) */
+		for ( ; i <= last; i++) {
+			if (test_bit(i, iop->uptodate)) {
+				plen -= (last - i + 1) * block_size;
+				last = i - 1;
+				break;
+			}
+		}
+	}
+
+	/*
+	 * If the extent spans the block that contains the i_size we need to
+	 * handle both halves separately so that we properly zero data in the
+	 * page cache for blocks that are entirely outside of i_size.
+	 */
+	if (first <= end && last > end)
+		plen -= (last - end) * block_size;
+
+	*offp = poff;
+	*lenp = plen;
+}
+
+static void
+iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
+{
+	struct iomap_page *iop = to_iomap_page(page);
+	struct inode *inode = page->mapping->host;
+	unsigned first = off >> inode->i_blkbits;
+	unsigned last = (off + len - 1) >> inode->i_blkbits;
+	unsigned int i;
+	bool uptodate = true;
+
+	if (iop) {
+		for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+			if (i >= first && i <= last)
+				set_bit(i, iop->uptodate);
+			else if (!test_bit(i, iop->uptodate))
+				uptodate = false;
+		}
+	}
+
+	if (uptodate && !PageError(page))
+		SetPageUptodate(page);
+}
+
+static void
+iomap_read_finish(struct iomap_page *iop, struct page *page)
+{
+	if (!iop || atomic_dec_and_test(&iop->read_count))
+		unlock_page(page);
+}
+
+static void
+iomap_read_page_end_io(struct bio_vec *bvec, int error)
+{
+	struct page *page = bvec->bv_page;
+	struct iomap_page *iop = to_iomap_page(page);
+
+	if (unlikely(error)) {
+		ClearPageUptodate(page);
+		SetPageError(page);
+	} else {
+		iomap_set_range_uptodate(page, bvec->bv_offset, bvec->bv_len);
+	}
+
+	iomap_read_finish(iop, page);
+}
+
 static void
 iomap_read_inline_data(struct inode *inode, struct page *page,
 		struct iomap *iomap)
@@ -132,7 +265,7 @@ iomap_read_end_io(struct bio *bio)
 	int i;
 
 	bio_for_each_segment_all(bvec, bio, i)
-		page_endio(bvec->bv_page, false, error);
+		iomap_read_page_end_io(bvec, error);
 	bio_put(bio);
 }
 
@@ -150,9 +283,10 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 {
 	struct iomap_readpage_ctx *ctx = data;
 	struct page *page = ctx->cur_page;
-	unsigned poff = pos & (PAGE_SIZE - 1);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	struct iomap_page *iop = iomap_page_create(inode, page);
 	bool is_contig = false;
+	loff_t orig_pos = pos;
+	unsigned poff, plen;
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
@@ -161,13 +295,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return PAGE_SIZE;
 	}
 
-	/* we don't support blocksize < PAGE_SIZE quite yet. */
-	WARN_ON_ONCE(pos != page_offset(page));
-	WARN_ON_ONCE(plen != PAGE_SIZE);
+	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	if (plen == 0)
+		goto done;
 
 	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
 		zero_user(page, poff, plen);
-		SetPageUptodate(page);
+		iomap_set_range_uptodate(page, poff, plen);
 		goto done;
 	}
 
@@ -183,6 +317,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		is_contig = true;
 	}
 
+	/*
+	 * If we start a new segment we need to increase the read count, and we
+	 * need to do so before submitting any previous full bio to make sure
+	 * that we don't prematurely unlock the page.
+	 */
+	if (iop)
+		atomic_inc(&iop->read_count);
+
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -203,7 +345,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 	__bio_add_page(ctx->bio, page, plen, poff);
 done:
-	return plen;
+	/*
+	 * Move the caller beyond our range so that it keeps making progress.
+	 * For that we have to include any leading non-uptodate ranges, but
+	 * we can skip trailing ones as they will be handled in the next
+	 * iteration.
+	 */
+	return pos - orig_pos + plen;
 }
 
 int
@@ -214,8 +362,6 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 	unsigned poff;
 	loff_t ret;
 
-	WARN_ON_ONCE(page_has_buffers(page));
-
 	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
 				PAGE_SIZE - poff, 0, ops, &ctx,
@@ -341,6 +487,84 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
 }
 EXPORT_SYMBOL_GPL(iomap_readpages);
 
+int
+iomap_is_partially_uptodate(struct page *page, unsigned long from,
+		unsigned long count)
+{
+	struct iomap_page *iop = to_iomap_page(page);
+	struct inode *inode = page->mapping->host;
+	unsigned first = from >> inode->i_blkbits;
+	unsigned last = (from + count - 1) >> inode->i_blkbits;
+	unsigned i;
+
+	if (iop) {
+		for (i = first; i <= last; i++)
+			if (!test_bit(i, iop->uptodate))
+				return 0;
+		return 1;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
+
+int
+iomap_releasepage(struct page *page, gfp_t gfp_mask)
+{
+	/*
+	 * mm accommodates an old ext3 case where clean pages might not have had
+	 * the dirty bit cleared. Thus, it can send actual dirty pages to
+	 * ->releasepage() via shrink_active_list(), skip those here.
+	 */
+	if (PageDirty(page) || PageWriteback(page))
+		return 0;
+	iomap_page_release(page);
+	return 1;
+}
+EXPORT_SYMBOL_GPL(iomap_releasepage);
+
+void
+iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
+{
+	/*
+	 * If we are invalidating the entire page, clear the dirty state from it
+	 * and release it to avoid unnecessary buildup of the LRU.
+	 */
+	if (offset == 0 && len == PAGE_SIZE) {
+		WARN_ON_ONCE(PageWriteback(page));
+		cancel_dirty_page(page);
+		iomap_page_release(page);
+	}
+}
+EXPORT_SYMBOL_GPL(iomap_invalidatepage);
+
+#ifdef CONFIG_MIGRATION
+int
+iomap_migrate_page(struct address_space *mapping, struct page *newpage,
+		struct page *page, enum migrate_mode mode)
+{
+	int ret;
+
+	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
+	if (ret != MIGRATEPAGE_SUCCESS)
+		return ret;
+
+	if (page_has_private(page)) {
+		ClearPagePrivate(page);
+		set_page_private(newpage, page_private(page));
+		set_page_private(page, 0);
+		SetPagePrivate(newpage);
+	}
+
+	if (mode != MIGRATE_SYNC_NO_COPY)
+		migrate_page_copy(newpage, page);
+	else
+		migrate_page_states(newpage, page);
+	return MIGRATEPAGE_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(iomap_migrate_page);
+#endif /* CONFIG_MIGRATION */
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -364,6 +588,7 @@ iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
 
 	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
 		zero_user_segments(page, poff, from, to, poff + plen);
+		iomap_set_range_uptodate(page, poff, plen);
 		return 0;
 	}
 
@@ -379,21 +604,33 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		struct page *page, struct iomap *iomap)
 {
+	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned poff = block_start & (PAGE_SIZE - 1);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
-	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
-
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+	unsigned from = pos & (PAGE_SIZE - 1), to = from + len, poff, plen;
+	int status = 0;
 
 	if (PageUptodate(page))
 		return 0;
-	if (from <= poff && to >= poff + plen)
-		return 0;
-	return iomap_read_page_sync(inode, block_start, page,
-			poff, plen, from, to, iomap);
+
+	do {
+		iomap_adjust_read_range(inode, iop, &block_start,
+				block_end - block_start, &poff, &plen);
+		if (plen == 0)
+			break;
+
+		if ((from > poff && from < poff + plen) ||
+		    (to > poff && to < poff + plen)) {
+			status = iomap_read_page_sync(inode, block_start, page,
+					poff, plen, from, to, iomap);
+			if (status)
+				break;
+		}
+
+	} while ((block_start += plen) < block_end);
+
+	return status;
 }
 
 static int
@@ -476,7 +713,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	if (unlikely(copied < len && !PageUptodate(page))) {
 		copied = 0;
 	} else {
-		SetPageUptodate(page);
+		iomap_set_range_uptodate(page, pos & (PAGE_SIZE - 1), len);
 		iomap_set_page_dirty(page);
 	}
 	return __generic_write_end(inode, pos, copied, page);
@@ -812,7 +1049,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 		block_commit_write(page, 0, length);
 	} else {
 		WARN_ON_ONCE(!PageUptodate(page));
-		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+		iomap_page_create(inode, page);
 	}
 
 	return length;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5eb9ca8d7ce5..3555d54bf79a 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -2,6 +2,9 @@
 #ifndef LINUX_IOMAP_H
 #define LINUX_IOMAP_H 1
 
+#include <linux/atomic.h>
+#include <linux/bitmap.h>
+#include <linux/mm.h>
 #include <linux/types.h>
 
 struct address_space;
@@ -98,12 +101,40 @@ struct iomap_ops {
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
+/*
+ * Structure allocated for each page when block size < PAGE_SIZE to track
+ * sub-page uptodate status and I/O completions.
+ */
+struct iomap_page {
+	atomic_t		read_count;
+	atomic_t		write_count;
+	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+};
+
+static inline struct iomap_page *to_iomap_page(struct page *page)
+{
+	if (page_has_private(page))
+		return (struct iomap_page *)page_private(page);
+	return NULL;
+}
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
 int iomap_set_page_dirty(struct page *page);
+int iomap_is_partially_uptodate(struct page *page, unsigned long from,
+		unsigned long count);
+int iomap_releasepage(struct page *page, gfp_t gfp_mask);
+void iomap_invalidatepage(struct page *page, unsigned int offset,
+		unsigned int len);
+#ifdef CONFIG_MIGRATION
+int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
+		struct page *page, enum migrate_mode mode);
+#else
+#define iomap_migrate_page NULL
+#endif
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O " Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-03 12:36   ` Brian Foster
  2018-07-02 14:58 ` [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code Christoph Hellwig
  21 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Switch to using the iomap_page structure for checking sub-page uptodate
status and tracking sub-page I/O completion status, and remove large
quantities of boilerplate code working around buffer heads.
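
With iomap_page in place, per-page write completion reduces to a single
counter decrement (from the diff below), replacing the buffer_head walk
that used to derive the same answer via a 'busy' flag:

	struct iomap_page *iop = to_iomap_page(bvec->bv_page);

	if (error) {
		SetPageError(bvec->bv_page);
		mapping_set_error(inode->i_mapping, -EIO);
	}
	/* the bio that completes the last outstanding write ends the page */
	if (!iop || atomic_dec_and_test(&iop->write_count))
		end_page_writeback(bvec->bv_page);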

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 492 ++++++---------------------------------------
 fs/xfs/xfs_buf.h   |   1 -
 fs/xfs/xfs_iomap.c |   3 -
 fs/xfs/xfs_super.c |   2 +-
 fs/xfs/xfs_trace.h |  18 +-
 5 files changed, 61 insertions(+), 455 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0058f9893705..bae88ac1101d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -20,9 +20,6 @@
 #include "xfs_bmap_util.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
-#include <linux/gfp.h>
-#include <linux/mpage.h>
-#include <linux/pagevec.h>
 #include <linux/writeback.h>
 
 /*
@@ -34,25 +31,6 @@ struct xfs_writepage_ctx {
 	struct xfs_ioend	*ioend;
 };
 
-void
-xfs_count_page_state(
-	struct page		*page,
-	int			*delalloc,
-	int			*unwritten)
-{
-	struct buffer_head	*bh, *head;
-
-	*delalloc = *unwritten = 0;
-
-	bh = head = page_buffers(page);
-	do {
-		if (buffer_unwritten(bh))
-			(*unwritten) = 1;
-		else if (buffer_delay(bh))
-			(*delalloc) = 1;
-	} while ((bh = bh->b_this_page) != head);
-}
-
 struct block_device *
 xfs_find_bdev_for_inode(
 	struct inode		*inode)
@@ -85,67 +63,17 @@ xfs_finish_page_writeback(
 	struct bio_vec		*bvec,
 	int			error)
 {
+	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
+
 	if (error) {
 		SetPageError(bvec->bv_page);
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
-	end_page_writeback(bvec->bv_page);
-}
 
-/*
- * We're now finished for good with this page.  Update the page state via the
- * associated buffer_heads, paying attention to the start and end offsets that
- * we need to process on the page.
- *
- * Note that we open code the action in end_buffer_async_write here so that we
- * only have to iterate over the buffers attached to the page once.  This is not
- * only more efficient, but also ensures that we only call end_page_writeback
- * at the end of the iteration, and thus avoids the pitfall of having the page
- * and buffers potentially freed after every call to end_buffer_async_write.
- */
-static void
-xfs_finish_buffer_writeback(
-	struct inode		*inode,
-	struct bio_vec		*bvec,
-	int			error)
-{
-	struct buffer_head	*head = page_buffers(bvec->bv_page), *bh = head;
-	bool			busy = false;
-	unsigned int		off = 0;
-	unsigned long		flags;
-
-	ASSERT(bvec->bv_offset < PAGE_SIZE);
-	ASSERT((bvec->bv_offset & (i_blocksize(inode) - 1)) == 0);
-	ASSERT(bvec->bv_offset + bvec->bv_len <= PAGE_SIZE);
-	ASSERT((bvec->bv_len & (i_blocksize(inode) - 1)) == 0);
-
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
-	do {
-		if (off >= bvec->bv_offset &&
-		    off < bvec->bv_offset + bvec->bv_len) {
-			ASSERT(buffer_async_write(bh));
-			ASSERT(bh->b_end_io == NULL);
-
-			if (error) {
-				mark_buffer_write_io_error(bh);
-				clear_buffer_uptodate(bh);
-				SetPageError(bvec->bv_page);
-			} else {
-				set_buffer_uptodate(bh);
-			}
-			clear_buffer_async_write(bh);
-			unlock_buffer(bh);
-		} else if (buffer_async_write(bh)) {
-			ASSERT(buffer_locked(bh));
-			busy = true;
-		}
-		off += bh->b_size;
-	} while ((bh = bh->b_this_page) != head);
-	bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
-	local_irq_restore(flags);
+	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
-	if (!busy)
+	if (!iop || atomic_dec_and_test(&iop->write_count))
 		end_page_writeback(bvec->bv_page);
 }
 
@@ -179,12 +107,8 @@ xfs_destroy_ioend(
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i) {
-			if (page_has_buffers(bvec->bv_page))
-				xfs_finish_buffer_writeback(inode, bvec, error);
-			else
-				xfs_finish_page_writeback(inode, bvec, error);
-		}
+		bio_for_each_segment_all(bvec, bio, i)
+			xfs_finish_page_writeback(inode, bvec, error);
 		bio_put(bio);
 	}
 
@@ -638,6 +562,7 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	xfs_off_t		offset,
 	struct page		*page,
+	struct iomap_page	*iop,
 	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	struct list_head	*iolist)
@@ -661,100 +586,37 @@ xfs_add_to_ioend(
 				bdev, sector);
 	}
 
-	/*
-	 * If the block doesn't fit into the bio we need to allocate a new
-	 * one.  This shouldn't happen more than once for a given block.
-	 */
-	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
-		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
+	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
+		if (iop)
+			atomic_inc(&iop->write_count);
+		if (bio_full(wpc->ioend->io_bio))
+			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
+		__bio_add_page(wpc->ioend->io_bio, page, len, poff);
+	}
 
 	wpc->ioend->io_size += len;
 }
 
-STATIC void
-xfs_map_buffer(
-	struct inode		*inode,
-	struct buffer_head	*bh,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	sector_t		bn;
-	struct xfs_mount	*m = XFS_I(inode)->i_mount;
-	xfs_off_t		iomap_offset = XFS_FSB_TO_B(m, imap->br_startoff);
-	xfs_daddr_t		iomap_bn = xfs_fsb_to_db(XFS_I(inode), imap->br_startblock);
-
-	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
-
-	bn = (iomap_bn >> (inode->i_blkbits - BBSHIFT)) +
-	      ((offset - iomap_offset) >> inode->i_blkbits);
-
-	ASSERT(bn || XFS_IS_REALTIME_INODE(XFS_I(inode)));
-
-	bh->b_blocknr = bn;
-	set_buffer_mapped(bh);
-}
-
-STATIC void
-xfs_map_at_offset(
-	struct inode		*inode,
-	struct buffer_head	*bh,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
-
-	lock_buffer(bh);
-	xfs_map_buffer(inode, bh, imap, offset);
-	set_buffer_mapped(bh);
-	clear_buffer_delay(bh);
-	clear_buffer_unwritten(bh);
-
-	/*
-	 * If this is a realtime file, data may be on a different device
-	 * to that pointed to from the buffer_head b_bdev currently. We can't
-	 * trust that the bufferhead has already been mapped correctly, so
-	 * set the bdev now.
-	 */
-	bh->b_bdev = xfs_find_bdev_for_inode(inode);
-	bh->b_end_io = NULL;
-	set_buffer_async_write(bh);
-	set_buffer_uptodate(bh);
-	clear_buffer_dirty(bh);
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
 	unsigned int		offset,
 	unsigned int		length)
 {
-	trace_xfs_invalidatepage(page->mapping->host, page, offset,
-				 length);
-
-	/*
-	 * If we are invalidating the entire page, clear the dirty state from it
-	 * so that we can check for attempts to release dirty cached pages in
-	 * xfs_vm_releasepage().
-	 */
-	if (offset == 0 && length >= PAGE_SIZE)
-		cancel_dirty_page(page);
-	block_invalidatepage(page, offset, length);
+	trace_xfs_invalidatepage(page->mapping->host, page, offset, length);
+	iomap_invalidatepage(page, offset, length);
 }
 
 /*
- * If the page has delalloc buffers on it, we need to punch them out before we
- * invalidate the page. If we don't, we leave a stale delalloc mapping on the
- * inode that can trip a BUG() in xfs_get_blocks() later on if a direct IO read
- * is done on that same region - the delalloc extent is returned when none is
- * supposed to be there.
+ * If the page has delalloc blocks on it, we need to punch them out before we
+ * invalidate the page.  If we don't, we leave a stale delalloc mapping on the
+ * inode that can trip up a later direct I/O read operation on the same region.
  *
- * We prevent this by truncating away the delalloc regions on the page before
- * invalidating it. Because they are delalloc, we can do this without needing a
- * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
- * truncation without a transaction as there is no space left for block
- * reservation (typically why we see a ENOSPC in writeback).
+ * We prevent this by truncating away the delalloc regions on the page.  Because
+ * they are delalloc, we can do this without needing a transaction. Indeed - if
+ * we get ENOSPC errors, we have to be able to do this truncation without a
+ * transaction as there is no space left for block reservation (typically why we
+ * see an ENOSPC in writeback).
  */
 STATIC void
 xfs_aops_discard_page(
@@ -786,7 +648,7 @@ xfs_aops_discard_page(
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
  * forward progress guarantees we need to provide. The current ioend we are
- * adding buffers to is cached on the writepage context, and if the new buffer
+ * adding blocks to is cached on the writepage context, and if the new block
  * does not append to the cached ioend it will create a new ioend and cache that
  * instead.
  *
@@ -807,54 +669,33 @@ xfs_writepage_map(
 	uint64_t		end_offset)
 {
 	LIST_HEAD(submit_list);
+	struct iomap_page	*iop = to_iomap_page(page);
+	unsigned		len = i_blocksize(inode);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh = NULL;
-	ssize_t			len = i_blocksize(inode);
 	uint64_t		file_offset;	/* file offset of page */
-	unsigned		poffset;	/* offset into page */
-	int			error = 0;
-	int			count = 0;
+	int			error = 0, count = 0, i;
 
-	if (page_has_buffers(page))
-		bh = page_buffers(page);
+	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
 
 	/*
-	 * Walk the blocks on the page, and if we run off the end of the current
-	 * map or find the current map invalid, grab a new one.  We only use
-	 * bufferheads here to check per-block state - they no longer control
-	 * the iteration through the page. This allows us to replace the
-	 * bufferhead with some other state tracking mechanism in future.
+	 * Walk through the page to find areas to write back. If we run off the
+	 * end of the current map or find the current map invalid, grab a new
+	 * one.
 	 */
-	for (poffset = 0, file_offset = page_offset(page);
-	     poffset < PAGE_SIZE;
-	     poffset += len, file_offset += len) {
-		/* past the range we are writing, so nothing more to write. */
-		if (file_offset >= end_offset)
-			break;
-
-		if (bh && !buffer_uptodate(bh)) {
-			if (PageUptodate(page))
-				ASSERT(buffer_mapped(bh));
-			bh = bh->b_this_page;
+	for (i = 0, file_offset = page_offset(page);
+	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
+	     i++, file_offset += len) {
+		if (iop && !test_bit(i, iop->uptodate))
 			continue;
-		}
 
 		error = xfs_map_blocks(wpc, inode, file_offset);
 		if (error)
 			break;
-
-		if (wpc->io_type == XFS_IO_HOLE) {
-			if (bh)
-				bh = bh->b_this_page;
+		if (wpc->io_type == XFS_IO_HOLE)
 			continue;
-		}
-
-		if (bh) {
-			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
-			bh = bh->b_this_page;
-		}
-		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
-				&submit_list);
+		xfs_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
+				 &submit_list);
 		count++;
 	}
 
@@ -863,21 +704,18 @@ xfs_writepage_map(
 	ASSERT(!PageWriteback(page));
 
 	/*
-	 * On error, we have to fail the ioend here because we have locked
-	 * buffers in the ioend. If we don't do this, we'll deadlock
-	 * invalidating the page as that tries to lock the buffers on the page.
-	 * Also, because we may have set pages under writeback, we have to make
-	 * sure we run IO completion to mark the error state of the IO
-	 * appropriately, so we can't cancel the ioend directly here. That means
-	 * we have to mark this page as under writeback if we included any
-	 * buffers from it in the ioend chain so that completion treats it
-	 * correctly.
+	 * On error, we have to fail the ioend here because we may have set
+	 * pages under writeback, we have to make sure we run IO completion to
+	 * mark the error state of the IO appropriately, so we can't cancel the
+	 * ioend directly here.  That means we have to mark this page as under
+	 * writeback if we included any blocks from it in the ioend chain so
+	 * that completion treats it correctly.
 	 *
 	 * If we didn't include the page in the ioend, the on error we can
 	 * simply discard and unlock it as there are no other users of the page
-	 * or it's buffers right now. The caller will still need to trigger
-	 * submission of outstanding ioends on the writepage context so they are
-	 * treated correctly on error.
+	 * now.  The caller will still need to trigger submission of outstanding
+	 * ioends on the writepage context so they are treated correctly on
+	 * error.
 	 */
 	if (unlikely(error)) {
 		if (!count) {
@@ -918,8 +756,8 @@ xfs_writepage_map(
 	}
 
 	/*
-	 * We can end up here with no error and nothing to write if we race with
-	 * a partial page truncate on a sub-page block sized filesystem.
+	 * We can end up here with no error and nothing to write only if we race
+	 * with a partial page truncate on a sub-page block sized filesystem.
 	 */
 	if (!count)
 		end_page_writeback(page);
@@ -934,7 +772,6 @@ xfs_writepage_map(
  * For delalloc space on the page we need to allocate space and flush it.
  * For unwritten space on the page we need to start the conversion to
  * regular allocated space.
- * For any other dirty buffer heads on the page we should flush them.
  */
 STATIC int
 xfs_do_writepage(
@@ -1088,166 +925,13 @@ xfs_dax_writepages(
 			xfs_find_bdev_for_inode(mapping->host), wbc);
 }
 
-/*
- * Called to move a page into cleanable state - and from there
- * to be released. The page should already be clean. We always
- * have buffer heads in this call.
- *
- * Returns 1 if the page is ok to release, 0 otherwise.
- */
 STATIC int
 xfs_vm_releasepage(
 	struct page		*page,
 	gfp_t			gfp_mask)
 {
-	int			delalloc, unwritten;
-
 	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
-
-	/*
-	 * mm accommodates an old ext3 case where clean pages might not have had
-	 * the dirty bit cleared. Thus, it can send actual dirty pages to
-	 * ->releasepage() via shrink_active_list(). Conversely,
-	 * block_invalidatepage() can send pages that are still marked dirty but
-	 * otherwise have invalidated buffers.
-	 *
-	 * We want to release the latter to avoid unnecessary buildup of the
-	 * LRU, so xfs_vm_invalidatepage() clears the page dirty flag on pages
-	 * that are entirely invalidated and need to be released.  Hence the
-	 * only time we should get dirty pages here is through
-	 * shrink_active_list() and so we can simply skip those now.
-	 *
-	 * warn if we've left any lingering delalloc/unwritten buffers on clean
-	 * or invalidated pages we are about to release.
-	 */
-	if (PageDirty(page))
-		return 0;
-
-	xfs_count_page_state(page, &delalloc, &unwritten);
-
-	if (WARN_ON_ONCE(delalloc))
-		return 0;
-	if (WARN_ON_ONCE(unwritten))
-		return 0;
-
-	return try_to_free_buffers(page);
-}
-
-/*
- * If this is O_DIRECT or the mpage code calling tell them how large the mapping
- * is, so that we can avoid repeated get_blocks calls.
- *
- * If the mapping spans EOF, then we have to break the mapping up as the mapping
- * for blocks beyond EOF must be marked new so that sub block regions can be
- * correctly zeroed. We can't do this for mappings within EOF unless the mapping
- * was just allocated or is unwritten, otherwise the callers would overwrite
- * existing data with zeros. Hence we have to split the mapping into a range up
- * to and including EOF, and a second mapping for beyond EOF.
- */
-static void
-xfs_map_trim_size(
-	struct inode		*inode,
-	sector_t		iblock,
-	struct buffer_head	*bh_result,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset,
-	ssize_t			size)
-{
-	xfs_off_t		mapping_size;
-
-	mapping_size = imap->br_startoff + imap->br_blockcount - iblock;
-	mapping_size <<= inode->i_blkbits;
-
-	ASSERT(mapping_size > 0);
-	if (mapping_size > size)
-		mapping_size = size;
-	if (offset < i_size_read(inode) &&
-	    (xfs_ufsize_t)offset + mapping_size >= i_size_read(inode)) {
-		/* limit mapping to block that spans EOF */
-		mapping_size = roundup_64(i_size_read(inode) - offset,
-					  i_blocksize(inode));
-	}
-	if (mapping_size > LONG_MAX)
-		mapping_size = LONG_MAX;
-
-	bh_result->b_size = mapping_size;
-}
-
-static int
-xfs_get_blocks(
-	struct inode		*inode,
-	sector_t		iblock,
-	struct buffer_head	*bh_result,
-	int			create)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb, end_fsb;
-	int			error = 0;
-	int			lockmode = 0;
-	struct xfs_bmbt_irec	imap;
-	int			nimaps = 1;
-	xfs_off_t		offset;
-	ssize_t			size;
-
-	BUG_ON(create);
-
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return -EIO;
-
-	offset = (xfs_off_t)iblock << inode->i_blkbits;
-	ASSERT(bh_result->b_size >= i_blocksize(inode));
-	size = bh_result->b_size;
-
-	if (offset >= i_size_read(inode))
-		return 0;
-
-	/*
-	 * Direct I/O is usually done on preallocated files, so try getting
-	 * a block mapping without an exclusive lock first.
-	 */
-	lockmode = xfs_ilock_data_map_shared(ip);
-
-	ASSERT(offset <= mp->m_super->s_maxbytes);
-	if (offset > mp->m_super->s_maxbytes - size)
-		size = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
-			&nimaps, 0);
-	if (error)
-		goto out_unlock;
-	if (!nimaps) {
-		trace_xfs_get_blocks_notfound(ip, offset, size);
-		goto out_unlock;
-	}
-
-	trace_xfs_get_blocks_found(ip, offset, size,
-		imap.br_state == XFS_EXT_UNWRITTEN ?
-			XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, &imap);
-	xfs_iunlock(ip, lockmode);
-
-	/* trim mapping down to size requested */
-	xfs_map_trim_size(inode, iblock, bh_result, &imap, offset, size);
-
-	/*
-	 * For unwritten extents do not report a disk address in the buffered
-	 * read case (treat as if we're reading into a hole).
-	 */
-	if (xfs_bmap_is_real_extent(&imap))
-		xfs_map_buffer(inode, bh_result, &imap, offset);
-
-	/*
-	 * If this is a realtime file, data may be on a different device.
-	 * to that pointed to from the buffer_head b_bdev currently.
-	 */
-	bh_result->b_bdev = xfs_find_bdev_for_inode(inode);
-	return 0;
-
-out_unlock:
-	xfs_iunlock(ip, lockmode);
-	return error;
+	return iomap_releasepage(page, gfp_mask);
 }
 
 STATIC sector_t
@@ -1279,9 +963,7 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
-	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
-		return iomap_readpage(page, &xfs_iomap_ops);
-	return mpage_readpage(page, xfs_get_blocks);
+	return iomap_readpage(page, &xfs_iomap_ops);
 }
 
 STATIC int
@@ -1292,65 +974,7 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
-	if (i_blocksize(mapping->host) == PAGE_SIZE)
-		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
-	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
-}
-
-/*
- * This is basically a copy of __set_page_dirty_buffers() with one
- * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
- * dirty, we'll never be able to clean them because we don't write buffers
- * beyond EOF, and that means we can't invalidate pages that span EOF
- * that have been marked dirty. Further, the dirty state can leak into
- * the file interior if the file is extended, resulting in all sorts of
- * bad things happening as the state does not match the underlying data.
- *
- * XXX: this really indicates that bufferheads in XFS need to die. Warts like
- * this only exist because of bufferheads and how the generic code manages them.
- */
-STATIC int
-xfs_vm_set_page_dirty(
-	struct page		*page)
-{
-	struct address_space	*mapping = page->mapping;
-	struct inode		*inode = mapping->host;
-	loff_t			end_offset;
-	loff_t			offset;
-	int			newly_dirty;
-
-	if (unlikely(!mapping))
-		return !TestSetPageDirty(page);
-
-	end_offset = i_size_read(inode);
-	offset = page_offset(page);
-
-	spin_lock(&mapping->private_lock);
-	if (page_has_buffers(page)) {
-		struct buffer_head *head = page_buffers(page);
-		struct buffer_head *bh = head;
-
-		do {
-			if (offset < end_offset)
-				set_buffer_dirty(bh);
-			bh = bh->b_this_page;
-			offset += i_blocksize(inode);
-		} while (bh != head);
-	}
-	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
-	 * synchronized with per-memcg dirty page counters.
-	 */
-	lock_page_memcg(page);
-	newly_dirty = !TestSetPageDirty(page);
-	spin_unlock(&mapping->private_lock);
-
-	if (newly_dirty)
-		__set_page_dirty(page, mapping, 1);
-	unlock_page_memcg(page);
-	if (newly_dirty)
-		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-	return newly_dirty;
+	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
 }
 
 static int
@@ -1368,13 +992,13 @@ const struct address_space_operations xfs_address_space_operations = {
 	.readpages		= xfs_vm_readpages,
 	.writepage		= xfs_vm_writepage,
 	.writepages		= xfs_vm_writepages,
-	.set_page_dirty		= xfs_vm_set_page_dirty,
+	.set_page_dirty		= iomap_set_page_dirty,
 	.releasepage		= xfs_vm_releasepage,
 	.invalidatepage		= xfs_vm_invalidatepage,
 	.bmap			= xfs_vm_bmap,
 	.direct_IO		= noop_direct_IO,
-	.migratepage		= buffer_migrate_page,
-	.is_partially_uptodate  = block_is_partially_uptodate,
+	.migratepage		= iomap_migrate_page,
+	.is_partially_uptodate  = iomap_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 	.swap_activate		= xfs_iomap_swapfile_activate,
 };
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index d24dbd4dac39..6ddf1907fc7a 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -12,7 +12,6 @@
 #include <linux/mm.h>
 #include <linux/fs.h>
 #include <linux/dax.h>
-#include <linux/buffer_head.h>
 #include <linux/uio.h>
 #include <linux/list_lru.h>
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7fe42a126ec1..778b8c850de3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1032,9 +1032,6 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (i_blocksize(inode) < PAGE_SIZE)
-		iomap->flags |= IOMAP_F_BUFFER_HEAD;
-
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9d791f158dfe..f9f8dc490d3d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1860,7 +1860,7 @@ MODULE_ALIAS_FS("xfs");
 STATIC int __init
 xfs_init_zones(void)
 {
-	if (bioset_init(&xfs_ioend_bioset, 4 * MAX_BUF_PER_PAGE,
+	if (bioset_init(&xfs_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
 			offsetof(struct xfs_ioend, io_inline_bio),
 			BIOSET_NEED_BVECS))
 		goto out;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 1af123df19b5..7f4c7071e7ed 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1153,33 +1153,23 @@ DECLARE_EVENT_CLASS(xfs_page_class,
 		__field(loff_t, size)
 		__field(unsigned long, offset)
 		__field(unsigned int, length)
-		__field(int, delalloc)
-		__field(int, unwritten)
 	),
 	TP_fast_assign(
-		int delalloc = -1, unwritten = -1;
-
-		if (page_has_buffers(page))
-			xfs_count_page_state(page, &delalloc, &unwritten);
 		__entry->dev = inode->i_sb->s_dev;
 		__entry->ino = XFS_I(inode)->i_ino;
 		__entry->pgoff = page_offset(page);
 		__entry->size = i_size_read(inode);
 		__entry->offset = off;
 		__entry->length = len;
-		__entry->delalloc = delalloc;
-		__entry->unwritten = unwritten;
 	),
 	TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
-		  "length %x delalloc %d unwritten %d",
+		  "length %x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->pgoff,
 		  __entry->size,
 		  __entry->offset,
-		  __entry->length,
-		  __entry->delalloc,
-		  __entry->unwritten)
+		  __entry->length)
 )
 
 #define DEFINE_PAGE_EVENT(name)		\
@@ -1263,9 +1253,6 @@ DEFINE_EVENT(xfs_imap_class, name,	\
 	TP_ARGS(ip, offset, count, type, irec))
 DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
 DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
 DEFINE_IOMAP_EVENT(xfs_iomap_found);
 
@@ -1304,7 +1291,6 @@ DEFINE_EVENT(xfs_simple_io_class, name,	\
 	TP_ARGS(ip, offset, count))
 DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
-DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
 DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
 DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code
  2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
                   ` (20 preceding siblings ...)
  2018-07-02 14:58 ` [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads Christoph Hellwig
@ 2018-07-02 14:58 ` Christoph Hellwig
  2018-07-03 12:36   ` Brian Foster
  2018-07-03 21:51   ` Darrick J. Wong
  21 siblings, 2 replies; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-02 14:58 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 1 +
 fs/xfs/xfs_iomap.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index bae88ac1101d..f4d3252236c1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2016-2018 Christoph Hellwig.
  * All Rights Reserved.
  */
 #include "xfs.h"
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 778b8c850de3..fb9746cc7338 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (c) 2000-2006 Silicon Graphics, Inc.
- * Copyright (c) 2016 Christoph Hellwig.
+ * Copyright (c) 2016-2018 Christoph Hellwig.
  * All Rights Reserved.
  */
 #include <linux/iomap.h>
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O without buffer heads
  2018-07-02 14:58 ` [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O " Christoph Hellwig
@ 2018-07-03 12:31   ` Brian Foster
  2018-07-03 21:52     ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2018-07-03 12:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Mon, Jul 02, 2018 at 08:58:11AM -0600, Christoph Hellwig wrote:
> After already supporting a simple implementation of buffered writes for
> the blocksize == PAGE_SIZE case in the last commit, this adds full support
> even for smaller block sizes.   There are three bits of per-block
> information in the buffer_head structure that really matter for the iomap
> read and write path:
> 
>  - uptodate status (BH_uptodate)
>  - marked as currently under read I/O (BH_Async_Read)
>  - marked as currently under write I/O (BH_Async_Write)
> 
> Instead of having new per-block structures this now adds a per-page
> structure called struct iomap_page to track this information in a slightly
> different form:
> 
>  - a bitmap for the per-block uptodate status.  For the worst case of a 64k
>    page size system this bitmap needs to contain 128 bits.  For the
>    typical 4k page size case it only needs 8 bits, although we still
>    need a full unsigned long due to the way the atomic bitmap API works.
>  - two atomic_t counters are used to track the outstanding read and write
>    counts
> 
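
(A quick sanity check of the sizing above: a 64k page with the minimum
512 byte block size gives 65536 / 512 = 128 bits, i.e. two unsigned
longs on a 64-bit system, which matches the DECLARE_BITMAP(uptodate,
PAGE_SIZE / 512) in the patch; the typical 4k case uses only 8 of the
64 bits in its single word.)
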
> There is quite a bit of boilerplate code as the buffered I/O path uses
> various helper methods, but the actual code is very straightforward.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap.c            | 279 ++++++++++++++++++++++++++++++++++++++----
>  include/linux/iomap.h |  31 +++++
>  2 files changed, 289 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 13cdcf33e6c0..ea1b1ba61ba3 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
...
> @@ -161,13 +295,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		return PAGE_SIZE;
>  	}
>  
> -	/* we don't support blocksize < PAGE_SIZE quite yet. */
> -	WARN_ON_ONCE(pos != page_offset(page));
> -	WARN_ON_ONCE(plen != PAGE_SIZE);
> +	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
> +	if (plen == 0)
> +		goto done;
>  

I think the i_size check confused me in the previous go around. It's
obviously clear to me now after the iomap zeroing issue and fix, but a
one-liner comment couldn't hurt for future reference:

	/* zero post-eof blocks as the page may be mapped */
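
i.e. sitting directly above the IOMAP_MAPPED / i_size_read() check, so
that the spot would read something like (a sketch of the placement only):

	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
	if (plen == 0)
		goto done;

	/* zero post-eof blocks as the page may be mapped */
	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
		zero_user(page, poff, plen);
		iomap_set_range_uptodate(page, poff, plen);
		goto done;
	}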

That nit aside, this looks good to me and survives my tests:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
>  		zero_user(page, poff, plen);
> -		SetPageUptodate(page);
> +		iomap_set_range_uptodate(page, poff, plen);
>  		goto done;
>  	}
>  
> @@ -183,6 +317,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		is_contig = true;
>  	}
>  
> +	/*
> +	 * If we start a new segment we need to increase the read count, and we
> +	 * need to do so before submitting any previous full bio to make sure
> +	 * that we don't prematurely unlock the page.
> +	 */
> +	if (iop)
> +		atomic_inc(&iop->read_count);
> +
>  	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
>  		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
>  		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
> @@ -203,7 +345,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  	__bio_add_page(ctx->bio, page, plen, poff);
>  done:
> -	return plen;
> +	/*
> +	 * Move the caller beyond our range so that it keeps making progress.
> +	 * For that we have to include any leading non-uptodate ranges, but
> +	 * we can skip trailing ones as they will be handled in the next
> +	 * iteration.
> +	 */
> +	return pos - orig_pos + plen;
>  }
>  
>  int
> @@ -214,8 +362,6 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  	unsigned poff;
>  	loff_t ret;
>  
> -	WARN_ON_ONCE(page_has_buffers(page));
> -
>  	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
>  		ret = iomap_apply(inode, page_offset(page) + poff,
>  				PAGE_SIZE - poff, 0, ops, &ctx,
> @@ -341,6 +487,84 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpages);
>  
> +int
> +iomap_is_partially_uptodate(struct page *page, unsigned long from,
> +		unsigned long count)
> +{
> +	struct iomap_page *iop = to_iomap_page(page);
> +	struct inode *inode = page->mapping->host;
> +	unsigned first = from >> inode->i_blkbits;
> +	unsigned last = (from + count - 1) >> inode->i_blkbits;
> +	unsigned i;
> +
> +	if (iop) {
> +		for (i = first; i <= last; i++)
> +			if (!test_bit(i, iop->uptodate))
> +				return 0;
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
> +
> +int
> +iomap_releasepage(struct page *page, gfp_t gfp_mask)
> +{
> +	/*
> +	 * mm accommodates an old ext3 case where clean pages might not have had
> +	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> +	 * ->releasepage() via shrink_active_list(); skip those here.
> +	 */
> +	if (PageDirty(page) || PageWriteback(page))
> +		return 0;
> +	iomap_page_release(page);
> +	return 1;
> +}
> +EXPORT_SYMBOL_GPL(iomap_releasepage);
> +
> +void
> +iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
> +{
> +	/*
> +	 * If we are invalidating the entire page, clear the dirty state from it
> +	 * and release it to avoid unnecessary buildup of the LRU.
> +	 */
> +	if (offset == 0 && len == PAGE_SIZE) {
> +		WARN_ON_ONCE(PageWriteback(page));
> +		cancel_dirty_page(page);
> +		iomap_page_release(page);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(iomap_invalidatepage);
> +
> +#ifdef CONFIG_MIGRATION
> +int
> +iomap_migrate_page(struct address_space *mapping, struct page *newpage,
> +		struct page *page, enum migrate_mode mode)
> +{
> +	int ret;
> +
> +	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
> +	if (ret != MIGRATEPAGE_SUCCESS)
> +		return ret;
> +
> +	if (page_has_private(page)) {
> +		ClearPagePrivate(page);
> +		set_page_private(newpage, page_private(page));
> +		set_page_private(page, 0);
> +		SetPagePrivate(newpage);
> +	}
> +
> +	if (mode != MIGRATE_SYNC_NO_COPY)
> +		migrate_page_copy(newpage, page);
> +	else
> +		migrate_page_states(newpage, page);
> +	return MIGRATEPAGE_SUCCESS;
> +}
> +EXPORT_SYMBOL_GPL(iomap_migrate_page);
> +#endif /* CONFIG_MIGRATION */
> +
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -364,6 +588,7 @@ iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
>  
>  	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
>  		zero_user_segments(page, poff, from, to, poff + plen);
> +		iomap_set_range_uptodate(page, poff, plen);
>  		return 0;
>  	}
>  
> @@ -379,21 +604,33 @@ static int
>  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  		struct page *page, struct iomap *iomap)
>  {
> +	struct iomap_page *iop = iomap_page_create(inode, page);
>  	loff_t block_size = i_blocksize(inode);
>  	loff_t block_start = pos & ~(block_size - 1);
>  	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
> -	unsigned poff = block_start & (PAGE_SIZE - 1);
> -	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
> -	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
> -
> -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +	unsigned from = pos & (PAGE_SIZE - 1), to = from + len, poff, plen;
> +	int status = 0;
>  
>  	if (PageUptodate(page))
>  		return 0;
> -	if (from <= poff && to >= poff + plen)
> -		return 0;
> -	return iomap_read_page_sync(inode, block_start, page,
> -			poff, plen, from, to, iomap);
> +
> +	do {
> +		iomap_adjust_read_range(inode, iop, &block_start,
> +				block_end - block_start, &poff, &plen);
> +		if (plen == 0)
> +			break;
> +
> +		if ((from > poff && from < poff + plen) ||
> +		    (to > poff && to < poff + plen)) {
> +			status = iomap_read_page_sync(inode, block_start, page,
> +					poff, plen, from, to, iomap);
> +			if (status)
> +				break;
> +		}
> +
> +	} while ((block_start += plen) < block_end);
> +
> +	return status;
>  }
>  
>  static int
> @@ -476,7 +713,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  	if (unlikely(copied < len && !PageUptodate(page))) {
>  		copied = 0;
>  	} else {
> -		SetPageUptodate(page);
> +		iomap_set_range_uptodate(page, pos & (PAGE_SIZE - 1), len);
>  		iomap_set_page_dirty(page);
>  	}
>  	return __generic_write_end(inode, pos, copied, page);
> @@ -812,7 +1049,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
>  		block_commit_write(page, 0, length);
>  	} else {
>  		WARN_ON_ONCE(!PageUptodate(page));
> -		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +		iomap_page_create(inode, page);
>  	}
>  
>  	return length;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 5eb9ca8d7ce5..3555d54bf79a 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -2,6 +2,9 @@
>  #ifndef LINUX_IOMAP_H
>  #define LINUX_IOMAP_H 1
>  
> +#include <linux/atomic.h>
> +#include <linux/bitmap.h>
> +#include <linux/mm.h>
>  #include <linux/types.h>
>  
>  struct address_space;
> @@ -98,12 +101,40 @@ struct iomap_ops {
>  			ssize_t written, unsigned flags, struct iomap *iomap);
>  };
>  
> +/*
> + * Structure allocated for each page when block size < PAGE_SIZE to track
> + * sub-page uptodate status and I/O completions.
> + */
> +struct iomap_page {
> +	atomic_t		read_count;
> +	atomic_t		write_count;
> +	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> +};
> +
> +static inline struct iomap_page *to_iomap_page(struct page *page)
> +{
> +	if (page_has_private(page))
> +		return (struct iomap_page *)page_private(page);
> +	return NULL;
> +}
> +
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>  int iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  		unsigned nr_pages, const struct iomap_ops *ops);
>  int iomap_set_page_dirty(struct page *page);
> +int iomap_is_partially_uptodate(struct page *page, unsigned long from,
> +		unsigned long count);
> +int iomap_releasepage(struct page *page, gfp_t gfp_mask);
> +void iomap_invalidatepage(struct page *page, unsigned int offset,
> +		unsigned int len);
> +#ifdef CONFIG_MIGRATION
> +int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
> +		struct page *page, enum migrate_mode mode);
> +#else
> +#define iomap_migrate_page NULL
> +#endif
>  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-02 14:58 ` [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads Christoph Hellwig
@ 2018-07-03 12:36   ` Brian Foster
  2018-07-03 22:05     ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2018-07-03 12:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Mon, Jul 02, 2018 at 08:58:12AM -0600, Christoph Hellwig wrote:
> Switch to using the iomap_page structure for checking sub-page uptodate
> status and tracking sub-page I/O completion status, and remove large
> quantities of boilerplate code working around buffer heads.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c  | 492 ++++++---------------------------------------
>  fs/xfs/xfs_buf.h   |   1 -
>  fs/xfs/xfs_iomap.c |   3 -
>  fs/xfs/xfs_super.c |   2 +-
>  fs/xfs/xfs_trace.h |  18 +-
>  5 files changed, 61 insertions(+), 455 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0058f9893705..bae88ac1101d 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -85,67 +63,17 @@ xfs_finish_page_writeback(
>  	struct bio_vec		*bvec,
>  	int			error)
>  {
> +	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
> +
>  	if (error) {
>  		SetPageError(bvec->bv_page);
>  		mapping_set_error(inode->i_mapping, -EIO);
>  	}
> -	end_page_writeback(bvec->bv_page);
> -}
>  
> -/*
> - * We're now finished for good with this page.  Update the page state via the
> - * associated buffer_heads, paying attention to the start and end offsets that
> - * we need to process on the page.
> - *
> - * Note that we open code the action in end_buffer_async_write here so that we
> - * only have to iterate over the buffers attached to the page once.  This is not
> - * only more efficient, but also ensures that we only calls end_page_writeback
> - * at the end of the iteration, and thus avoids the pitfall of having the page
> - * and buffers potentially freed after every call to end_buffer_async_write.
> - */
> -static void
> -xfs_finish_buffer_writeback(
> -	struct inode		*inode,
> -	struct bio_vec		*bvec,
> -	int			error)
> -{
> -	struct buffer_head	*head = page_buffers(bvec->bv_page), *bh = head;
> -	bool			busy = false;
> -	unsigned int		off = 0;
> -	unsigned long		flags;
> -
> -	ASSERT(bvec->bv_offset < PAGE_SIZE);
> -	ASSERT((bvec->bv_offset & (i_blocksize(inode) - 1)) == 0);
> -	ASSERT(bvec->bv_offset + bvec->bv_len <= PAGE_SIZE);
> -	ASSERT((bvec->bv_len & (i_blocksize(inode) - 1)) == 0);
> -
> -	local_irq_save(flags);
> -	bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
> -	do {
> -		if (off >= bvec->bv_offset &&
> -		    off < bvec->bv_offset + bvec->bv_len) {
> -			ASSERT(buffer_async_write(bh));
> -			ASSERT(bh->b_end_io == NULL);
> -
> -			if (error) {
> -				mark_buffer_write_io_error(bh);
> -				clear_buffer_uptodate(bh);

So the buffer completion code clears the uptodate status of the buffer
on error. I assume that means the next read would replace the data we
failed to write with whatever was previously on disk. I guess it's
debatable whether that is the right thing to do in general, but that
seems like a higher-level issue nonetheless (i.e., I don't think we'd
ever retry the writepage either?). So is there any reason not to do the
analogous thing in the iomap completion code?
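
To make the question concrete, the analogous change in
xfs_finish_page_writeback() might look something like this (an untested
sketch; whether we'd want it at all is exactly the question above):

	if (error) {
		SetPageError(bvec->bv_page);
		mapping_set_error(inode->i_mapping, -EIO);
		/*
		 * Hypothetical: drop the per-block uptodate state the way
		 * the bufferhead path did with clear_buffer_uptodate().
		 */
		if (iop)
			bitmap_clear(iop->uptodate,
				     bvec->bv_offset >> inode->i_blkbits,
				     bvec->bv_len >> inode->i_blkbits);
	}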

Otherwise the rest looks fine to me.

Brian

> -				SetPageError(bvec->bv_page);
> -			} else {
> -				set_buffer_uptodate(bh);
> -			}
> -			clear_buffer_async_write(bh);
> -			unlock_buffer(bh);
> -		} else if (buffer_async_write(bh)) {
> -			ASSERT(buffer_locked(bh));
> -			busy = true;
> -		}
> -		off += bh->b_size;
> -	} while ((bh = bh->b_this_page) != head);
> -	bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
> -	local_irq_restore(flags);
> +	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
> +	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
>  
> -	if (!busy)
> +	if (!iop || atomic_dec_and_test(&iop->write_count))
>  		end_page_writeback(bvec->bv_page);
>  }
>  
> @@ -179,12 +107,8 @@ xfs_destroy_ioend(
>  			next = bio->bi_private;
>  
>  		/* walk each page on bio, ending page IO on them */
> -		bio_for_each_segment_all(bvec, bio, i) {
> -			if (page_has_buffers(bvec->bv_page))
> -				xfs_finish_buffer_writeback(inode, bvec, error);
> -			else
> -				xfs_finish_page_writeback(inode, bvec, error);
> -		}
> +		bio_for_each_segment_all(bvec, bio, i)
> +			xfs_finish_page_writeback(inode, bvec, error);
>  		bio_put(bio);
>  	}
>  
> @@ -638,6 +562,7 @@ xfs_add_to_ioend(
>  	struct inode		*inode,
>  	xfs_off_t		offset,
>  	struct page		*page,
> +	struct iomap_page	*iop,
>  	struct xfs_writepage_ctx *wpc,
>  	struct writeback_control *wbc,
>  	struct list_head	*iolist)
> @@ -661,100 +586,37 @@ xfs_add_to_ioend(
>  				bdev, sector);
>  	}
>  
> -	/*
> -	 * If the block doesn't fit into the bio we need to allocate a new
> -	 * one.  This shouldn't happen more than once for a given block.
> -	 */
> -	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
> -		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> +	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
> +		if (iop)
> +			atomic_inc(&iop->write_count);
> +		if (bio_full(wpc->ioend->io_bio))
> +			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> +		__bio_add_page(wpc->ioend->io_bio, page, len, poff);
> +	}
>  
>  	wpc->ioend->io_size += len;
>  }
>  
> -STATIC void
> -xfs_map_buffer(
> -	struct inode		*inode,
> -	struct buffer_head	*bh,
> -	struct xfs_bmbt_irec	*imap,
> -	xfs_off_t		offset)
> -{
> -	sector_t		bn;
> -	struct xfs_mount	*m = XFS_I(inode)->i_mount;
> -	xfs_off_t		iomap_offset = XFS_FSB_TO_B(m, imap->br_startoff);
> -	xfs_daddr_t		iomap_bn = xfs_fsb_to_db(XFS_I(inode), imap->br_startblock);
> -
> -	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> -	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> -
> -	bn = (iomap_bn >> (inode->i_blkbits - BBSHIFT)) +
> -	      ((offset - iomap_offset) >> inode->i_blkbits);
> -
> -	ASSERT(bn || XFS_IS_REALTIME_INODE(XFS_I(inode)));
> -
> -	bh->b_blocknr = bn;
> -	set_buffer_mapped(bh);
> -}
> -
> -STATIC void
> -xfs_map_at_offset(
> -	struct inode		*inode,
> -	struct buffer_head	*bh,
> -	struct xfs_bmbt_irec	*imap,
> -	xfs_off_t		offset)
> -{
> -	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> -	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> -
> -	lock_buffer(bh);
> -	xfs_map_buffer(inode, bh, imap, offset);
> -	set_buffer_mapped(bh);
> -	clear_buffer_delay(bh);
> -	clear_buffer_unwritten(bh);
> -
> -	/*
> -	 * If this is a realtime file, data may be on a different device.
> -	 * to that pointed to from the buffer_head b_bdev currently. We can't
> -	 * trust that the bufferhead has a already been mapped correctly, so
> -	 * set the bdev now.
> -	 */
> -	bh->b_bdev = xfs_find_bdev_for_inode(inode);
> -	bh->b_end_io = NULL;
> -	set_buffer_async_write(bh);
> -	set_buffer_uptodate(bh);
> -	clear_buffer_dirty(bh);
> -}
> -
>  STATIC void
>  xfs_vm_invalidatepage(
>  	struct page		*page,
>  	unsigned int		offset,
>  	unsigned int		length)
>  {
> -	trace_xfs_invalidatepage(page->mapping->host, page, offset,
> -				 length);
> -
> -	/*
> -	 * If we are invalidating the entire page, clear the dirty state from it
> -	 * so that we can check for attempts to release dirty cached pages in
> -	 * xfs_vm_releasepage().
> -	 */
> -	if (offset == 0 && length >= PAGE_SIZE)
> -		cancel_dirty_page(page);
> -	block_invalidatepage(page, offset, length);
> +	trace_xfs_invalidatepage(page->mapping->host, page, offset, length);
> +	iomap_invalidatepage(page, offset, length);
>  }
>  
>  /*
> - * If the page has delalloc buffers on it, we need to punch them out before we
> - * invalidate the page. If we don't, we leave a stale delalloc mapping on the
> - * inode that can trip a BUG() in xfs_get_blocks() later on if a direct IO read
> - * is done on that same region - the delalloc extent is returned when none is
> - * supposed to be there.
> + * If the page has delalloc blocks on it, we need to punch them out before we
> + * invalidate the page.  If we don't, we leave a stale delalloc mapping on the
> + * inode that can trip up a later direct I/O read operation on the same region.
>   *
> - * We prevent this by truncating away the delalloc regions on the page before
> - * invalidating it. Because they are delalloc, we can do this without needing a
> - * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
> - * truncation without a transaction as there is no space left for block
> - * reservation (typically why we see a ENOSPC in writeback).
> + * We prevent this by truncating away the delalloc regions on the page.  Because
> + * they are delalloc, we can do this without needing a transaction. Indeed - if
> + * we get ENOSPC errors, we have to be able to do this truncation without a
> + * transaction as there is no space left for block reservation (typically why we
> + * see an ENOSPC in writeback).
>   */
>  STATIC void
>  xfs_aops_discard_page(
> @@ -786,7 +648,7 @@ xfs_aops_discard_page(
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
>   * forward progress guarantees we need to provide. The current ioend we are
> - * adding buffers to is cached on the writepage context, and if the new buffer
> + * adding blocks to is cached on the writepage context, and if the new block
>   * does not append to the cached ioend it will create a new ioend and cache that
>   * instead.
>   *
> @@ -807,54 +669,33 @@ xfs_writepage_map(
>  	uint64_t		end_offset)
>  {
>  	LIST_HEAD(submit_list);
> +	struct iomap_page	*iop = to_iomap_page(page);
> +	unsigned		len = i_blocksize(inode);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh = NULL;
> -	ssize_t			len = i_blocksize(inode);
>  	uint64_t		file_offset;	/* file offset of page */
> -	unsigned		poffset;	/* offset into page */
> -	int			error = 0;
> -	int			count = 0;
> +	int			error = 0, count = 0, i;
>  
> -	if (page_has_buffers(page))
> -		bh = page_buffers(page);
> +	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
> +	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
>  
>  	/*
> -	 * Walk the blocks on the page, and if we run off the end of the current
> -	 * map or find the current map invalid, grab a new one.  We only use
> -	 * bufferheads here to check per-block state - they no longer control
> -	 * the iteration through the page. This allows us to replace the
> -	 * bufferhead with some other state tracking mechanism in future.
> +	 * Walk through the page to find areas to write back. If we run off the
> +	 * end of the current map or find the current map invalid, grab a new
> +	 * one.
>  	 */
> -	for (poffset = 0, file_offset = page_offset(page);
> -	     poffset < PAGE_SIZE;
> -	     poffset += len, file_offset += len) {
> -		/* past the range we are writing, so nothing more to write. */
> -		if (file_offset >= end_offset)
> -			break;
> -
> -		if (bh && !buffer_uptodate(bh)) {
> -			if (PageUptodate(page))
> -				ASSERT(buffer_mapped(bh));
> -			bh = bh->b_this_page;
> +	for (i = 0, file_offset = page_offset(page);
> +	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
> +	     i++, file_offset += len) {
> +		if (iop && !test_bit(i, iop->uptodate))
>  			continue;
> -		}
>  
>  		error = xfs_map_blocks(wpc, inode, file_offset);
>  		if (error)
>  			break;
> -
> -		if (wpc->io_type == XFS_IO_HOLE) {
> -			if (bh)
> -				bh = bh->b_this_page;
> +		if (wpc->io_type == XFS_IO_HOLE)
>  			continue;
> -		}
> -
> -		if (bh) {
> -			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> -			bh = bh->b_this_page;
> -		}
> -		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
> -				&submit_list);
> +		xfs_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
> +				 &submit_list);
>  		count++;
>  	}
>  
> @@ -863,21 +704,18 @@ xfs_writepage_map(
>  	ASSERT(!PageWriteback(page));
>  
>  	/*
> -	 * On error, we have to fail the ioend here because we have locked
> -	 * buffers in the ioend. If we don't do this, we'll deadlock
> -	 * invalidating the page as that tries to lock the buffers on the page.
> -	 * Also, because we may have set pages under writeback, we have to make
> -	 * sure we run IO completion to mark the error state of the IO
> -	 * appropriately, so we can't cancel the ioend directly here. That means
> -	 * we have to mark this page as under writeback if we included any
> -	 * buffers from it in the ioend chain so that completion treats it
> -	 * correctly.
> +	 * On error, we have to fail the ioend here because we may have set
> +	 * pages under writeback.  We have to make sure we run IO completion to
> +	 * mark the error state of the IO appropriately, so we can't cancel the
> +	 * ioend directly here.  That means we have to mark this page as under
> +	 * writeback if we included any blocks from it in the ioend chain so
> +	 * that completion treats it correctly.
>  	 *
> 	 * If we didn't include the page in the ioend, then on error we can
>  	 * simply discard and unlock it as there are no other users of the page
> -	 * or it's buffers right now. The caller will still need to trigger
> -	 * submission of outstanding ioends on the writepage context so they are
> -	 * treated correctly on error.
> +	 * now.  The caller will still need to trigger submission of outstanding
> +	 * ioends on the writepage context so they are treated correctly on
> +	 * error.
>  	 */
>  	if (unlikely(error)) {
>  		if (!count) {
> @@ -918,8 +756,8 @@ xfs_writepage_map(
>  	}
>  
>  	/*
> -	 * We can end up here with no error and nothing to write if we race with
> -	 * a partial page truncate on a sub-page block sized filesystem.
> +	 * We can end up here with no error and nothing to write only if we race
> +	 * with a partial page truncate on a sub-page block sized filesystem.
>  	 */
>  	if (!count)
>  		end_page_writeback(page);
> @@ -934,7 +772,6 @@ xfs_writepage_map(
>   * For delalloc space on the page we need to allocate space and flush it.
>   * For unwritten space on the page we need to start the conversion to
>   * regular allocated space.
> - * For any other dirty buffer heads on the page we should flush them.
>   */
>  STATIC int
>  xfs_do_writepage(
> @@ -1088,166 +925,13 @@ xfs_dax_writepages(
>  			xfs_find_bdev_for_inode(mapping->host), wbc);
>  }
>  
> -/*
> - * Called to move a page into cleanable state - and from there
> - * to be released. The page should already be clean. We always
> - * have buffer heads in this call.
> - *
> - * Returns 1 if the page is ok to release, 0 otherwise.
> - */
>  STATIC int
>  xfs_vm_releasepage(
>  	struct page		*page,
>  	gfp_t			gfp_mask)
>  {
> -	int			delalloc, unwritten;
> -
>  	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
> -
> -	/*
> -	 * mm accommodates an old ext3 case where clean pages might not have had
> -	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> -	 * ->releasepage() via shrink_active_list(). Conversely,
> -	 * block_invalidatepage() can send pages that are still marked dirty but
> -	 * otherwise have invalidated buffers.
> -	 *
> -	 * We want to release the latter to avoid unnecessary buildup of the
> -	 * LRU, so xfs_vm_invalidatepage() clears the page dirty flag on pages
> -	 * that are entirely invalidated and need to be released.  Hence the
> -	 * only time we should get dirty pages here is through
> -	 * shrink_active_list() and so we can simply skip those now.
> -	 *
> -	 * warn if we've left any lingering delalloc/unwritten buffers on clean
> -	 * or invalidated pages we are about to release.
> -	 */
> -	if (PageDirty(page))
> -		return 0;
> -
> -	xfs_count_page_state(page, &delalloc, &unwritten);
> -
> -	if (WARN_ON_ONCE(delalloc))
> -		return 0;
> -	if (WARN_ON_ONCE(unwritten))
> -		return 0;
> -
> -	return try_to_free_buffers(page);
> -}
> -
> -/*
> - * If this is O_DIRECT or the mpage code calling tell them how large the mapping
> - * is, so that we can avoid repeated get_blocks calls.
> - *
> - * If the mapping spans EOF, then we have to break the mapping up as the mapping
> - * for blocks beyond EOF must be marked new so that sub block regions can be
> - * correctly zeroed. We can't do this for mappings within EOF unless the mapping
> - * was just allocated or is unwritten, otherwise the callers would overwrite
> - * existing data with zeros. Hence we have to split the mapping into a range up
> - * to and including EOF, and a second mapping for beyond EOF.
> - */
> -static void
> -xfs_map_trim_size(
> -	struct inode		*inode,
> -	sector_t		iblock,
> -	struct buffer_head	*bh_result,
> -	struct xfs_bmbt_irec	*imap,
> -	xfs_off_t		offset,
> -	ssize_t			size)
> -{
> -	xfs_off_t		mapping_size;
> -
> -	mapping_size = imap->br_startoff + imap->br_blockcount - iblock;
> -	mapping_size <<= inode->i_blkbits;
> -
> -	ASSERT(mapping_size > 0);
> -	if (mapping_size > size)
> -		mapping_size = size;
> -	if (offset < i_size_read(inode) &&
> -	    (xfs_ufsize_t)offset + mapping_size >= i_size_read(inode)) {
> -		/* limit mapping to block that spans EOF */
> -		mapping_size = roundup_64(i_size_read(inode) - offset,
> -					  i_blocksize(inode));
> -	}
> -	if (mapping_size > LONG_MAX)
> -		mapping_size = LONG_MAX;
> -
> -	bh_result->b_size = mapping_size;
> -}
> -
> -static int
> -xfs_get_blocks(
> -	struct inode		*inode,
> -	sector_t		iblock,
> -	struct buffer_head	*bh_result,
> -	int			create)
> -{
> -	struct xfs_inode	*ip = XFS_I(inode);
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> -	int			error = 0;
> -	int			lockmode = 0;
> -	struct xfs_bmbt_irec	imap;
> -	int			nimaps = 1;
> -	xfs_off_t		offset;
> -	ssize_t			size;
> -
> -	BUG_ON(create);
> -
> -	if (XFS_FORCED_SHUTDOWN(mp))
> -		return -EIO;
> -
> -	offset = (xfs_off_t)iblock << inode->i_blkbits;
> -	ASSERT(bh_result->b_size >= i_blocksize(inode));
> -	size = bh_result->b_size;
> -
> -	if (offset >= i_size_read(inode))
> -		return 0;
> -
> -	/*
> -	 * Direct I/O is usually done on preallocated files, so try getting
> -	 * a block mapping without an exclusive lock first.
> -	 */
> -	lockmode = xfs_ilock_data_map_shared(ip);
> -
> -	ASSERT(offset <= mp->m_super->s_maxbytes);
> -	if (offset > mp->m_super->s_maxbytes - size)
> -		size = mp->m_super->s_maxbytes - offset;
> -	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -
> -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> -			&nimaps, 0);
> -	if (error)
> -		goto out_unlock;
> -	if (!nimaps) {
> -		trace_xfs_get_blocks_notfound(ip, offset, size);
> -		goto out_unlock;
> -	}
> -
> -	trace_xfs_get_blocks_found(ip, offset, size,
> -		imap.br_state == XFS_EXT_UNWRITTEN ?
> -			XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, &imap);
> -	xfs_iunlock(ip, lockmode);
> -
> -	/* trim mapping down to size requested */
> -	xfs_map_trim_size(inode, iblock, bh_result, &imap, offset, size);
> -
> -	/*
> -	 * For unwritten extents do not report a disk address in the buffered
> -	 * read case (treat as if we're reading into a hole).
> -	 */
> -	if (xfs_bmap_is_real_extent(&imap))
> -		xfs_map_buffer(inode, bh_result, &imap, offset);
> -
> -	/*
> -	 * If this is a realtime file, data may be on a different device.
> -	 * to that pointed to from the buffer_head b_bdev currently.
> -	 */
> -	bh_result->b_bdev = xfs_find_bdev_for_inode(inode);
> -	return 0;
> -
> -out_unlock:
> -	xfs_iunlock(ip, lockmode);
> -	return error;
> +	return iomap_releasepage(page, gfp_mask);
>  }
>  
>  STATIC sector_t
> @@ -1279,9 +963,7 @@ xfs_vm_readpage(
>  	struct page		*page)
>  {
>  	trace_xfs_vm_readpage(page->mapping->host, 1);
> -	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
> -		return iomap_readpage(page, &xfs_iomap_ops);
> -	return mpage_readpage(page, xfs_get_blocks);
> +	return iomap_readpage(page, &xfs_iomap_ops);
>  }
>  
>  STATIC int
> @@ -1292,65 +974,7 @@ xfs_vm_readpages(
>  	unsigned		nr_pages)
>  {
>  	trace_xfs_vm_readpages(mapping->host, nr_pages);
> -	if (i_blocksize(mapping->host) == PAGE_SIZE)
> -		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> -	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
> -}
> -
> -/*
> - * This is basically a copy of __set_page_dirty_buffers() with one
> - * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
> - * dirty, we'll never be able to clean them because we don't write buffers
> - * beyond EOF, and that means we can't invalidate pages that span EOF
> - * that have been marked dirty. Further, the dirty state can leak into
> - * the file interior if the file is extended, resulting in all sorts of
> - * bad things happening as the state does not match the underlying data.
> - *
> - * XXX: this really indicates that bufferheads in XFS need to die. Warts like
> - * this only exist because of bufferheads and how the generic code manages them.
> - */
> -STATIC int
> -xfs_vm_set_page_dirty(
> -	struct page		*page)
> -{
> -	struct address_space	*mapping = page->mapping;
> -	struct inode		*inode = mapping->host;
> -	loff_t			end_offset;
> -	loff_t			offset;
> -	int			newly_dirty;
> -
> -	if (unlikely(!mapping))
> -		return !TestSetPageDirty(page);
> -
> -	end_offset = i_size_read(inode);
> -	offset = page_offset(page);
> -
> -	spin_lock(&mapping->private_lock);
> -	if (page_has_buffers(page)) {
> -		struct buffer_head *head = page_buffers(page);
> -		struct buffer_head *bh = head;
> -
> -		do {
> -			if (offset < end_offset)
> -				set_buffer_dirty(bh);
> -			bh = bh->b_this_page;
> -			offset += i_blocksize(inode);
> -		} while (bh != head);
> -	}
> -	/*
> -	 * Lock out page->mem_cgroup migration to keep PageDirty
> -	 * synchronized with per-memcg dirty page counters.
> -	 */
> -	lock_page_memcg(page);
> -	newly_dirty = !TestSetPageDirty(page);
> -	spin_unlock(&mapping->private_lock);
> -
> -	if (newly_dirty)
> -		__set_page_dirty(page, mapping, 1);
> -	unlock_page_memcg(page);
> -	if (newly_dirty)
> -		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> -	return newly_dirty;
> +	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
>  }
>  
>  static int
> @@ -1368,13 +992,13 @@ const struct address_space_operations xfs_address_space_operations = {
>  	.readpages		= xfs_vm_readpages,
>  	.writepage		= xfs_vm_writepage,
>  	.writepages		= xfs_vm_writepages,
> -	.set_page_dirty		= xfs_vm_set_page_dirty,
> +	.set_page_dirty		= iomap_set_page_dirty,
>  	.releasepage		= xfs_vm_releasepage,
>  	.invalidatepage		= xfs_vm_invalidatepage,
>  	.bmap			= xfs_vm_bmap,
>  	.direct_IO		= noop_direct_IO,
> -	.migratepage		= buffer_migrate_page,
> -	.is_partially_uptodate  = block_is_partially_uptodate,
> +	.migratepage		= iomap_migrate_page,
> +	.is_partially_uptodate  = iomap_is_partially_uptodate,
>  	.error_remove_page	= generic_error_remove_page,
>  	.swap_activate		= xfs_iomap_swapfile_activate,
>  };
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index d24dbd4dac39..6ddf1907fc7a 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -12,7 +12,6 @@
>  #include <linux/mm.h>
>  #include <linux/fs.h>
>  #include <linux/dax.h>
> -#include <linux/buffer_head.h>
>  #include <linux/uio.h>
>  #include <linux/list_lru.h>
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7fe42a126ec1..778b8c850de3 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1032,9 +1032,6 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if (i_blocksize(inode) < PAGE_SIZE)
> -		iomap->flags |= IOMAP_F_BUFFER_HEAD;
> -
>  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 9d791f158dfe..f9f8dc490d3d 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1860,7 +1860,7 @@ MODULE_ALIAS_FS("xfs");
>  STATIC int __init
>  xfs_init_zones(void)
>  {
> -	if (bioset_init(&xfs_ioend_bioset, 4 * MAX_BUF_PER_PAGE,
> +	if (bioset_init(&xfs_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
>  			offsetof(struct xfs_ioend, io_inline_bio),
>  			BIOSET_NEED_BVECS))
>  		goto out;
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 1af123df19b5..7f4c7071e7ed 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1153,33 +1153,23 @@ DECLARE_EVENT_CLASS(xfs_page_class,
>  		__field(loff_t, size)
>  		__field(unsigned long, offset)
>  		__field(unsigned int, length)
> -		__field(int, delalloc)
> -		__field(int, unwritten)
>  	),
>  	TP_fast_assign(
> -		int delalloc = -1, unwritten = -1;
> -
> -		if (page_has_buffers(page))
> -			xfs_count_page_state(page, &delalloc, &unwritten);
>  		__entry->dev = inode->i_sb->s_dev;
>  		__entry->ino = XFS_I(inode)->i_ino;
>  		__entry->pgoff = page_offset(page);
>  		__entry->size = i_size_read(inode);
>  		__entry->offset = off;
>  		__entry->length = len;
> -		__entry->delalloc = delalloc;
> -		__entry->unwritten = unwritten;
>  	),
>  	TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
> -		  "length %x delalloc %d unwritten %d",
> +		  "length %x",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->pgoff,
>  		  __entry->size,
>  		  __entry->offset,
> -		  __entry->length,
> -		  __entry->delalloc,
> -		  __entry->unwritten)
> +		  __entry->length)
>  )
>  
>  #define DEFINE_PAGE_EVENT(name)		\
> @@ -1263,9 +1253,6 @@ DEFINE_EVENT(xfs_imap_class, name,	\
>  	TP_ARGS(ip, offset, count, type, irec))
>  DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
>  DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
> -DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
> -DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
> -DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
>  DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
>  DEFINE_IOMAP_EVENT(xfs_iomap_found);
>  
> @@ -1304,7 +1291,6 @@ DEFINE_EVENT(xfs_simple_io_class, name,	\
>  	TP_ARGS(ip, offset, count))
>  DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
>  DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
> -DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
>  DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
>  DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
>  DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code
  2018-07-02 14:58 ` [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code Christoph Hellwig
@ 2018-07-03 12:36   ` Brian Foster
  2018-07-03 21:51   ` Darrick J. Wong
  1 sibling, 0 replies; 33+ messages in thread
From: Brian Foster @ 2018-07-03 12:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Mon, Jul 02, 2018 at 08:58:13AM -0600, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c  | 1 +
>  fs/xfs/xfs_iomap.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index bae88ac1101d..f4d3252236c1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   * All Rights Reserved.
>   */
>  #include "xfs.h"
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 778b8c850de3..fb9746cc7338 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2006 Silicon Graphics, Inc.
> - * Copyright (c) 2016 Christoph Hellwig.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   * All Rights Reserved.
>   */
>  #include <linux/iomap.h>
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code
  2018-07-02 14:58 ` [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code Christoph Hellwig
  2018-07-03 12:36   ` Brian Foster
@ 2018-07-03 21:51   ` Darrick J. Wong
  1 sibling, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2018-07-03 21:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Mon, Jul 02, 2018 at 08:58:13AM -0600, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c  | 1 +
>  fs/xfs/xfs_iomap.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index bae88ac1101d..f4d3252236c1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   * All Rights Reserved.
>   */
>  #include "xfs.h"
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 778b8c850de3..fb9746cc7338 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2006 Silicon Graphics, Inc.
> - * Copyright (c) 2016 Christoph Hellwig.
> + * Copyright (c) 2016-2018 Christoph Hellwig.
>   * All Rights Reserved.
>   */
>  #include <linux/iomap.h>
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O without buffer heads
  2018-07-03 12:31   ` Brian Foster
@ 2018-07-03 21:52     ` Darrick J. Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2018-07-03 21:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel

On Tue, Jul 03, 2018 at 08:31:27AM -0400, Brian Foster wrote:
> On Mon, Jul 02, 2018 at 08:58:11AM -0600, Christoph Hellwig wrote:
> > After already supporting a simple implementation of buffered writes for
> > the blocksize == PAGE_SIZE case in the last commit, this adds full support
> > even for smaller block sizes.   There are three bits of per-block
> > information in the buffer_head structure that really matter for the iomap
> > read and write path:
> > 
> >  - uptodate status (BH_uptodate)
> >  - marked as currently under read I/O (BH_Async_Read)
> >  - marked as currently under write I/O (BH_Async_Write)
> > 
> > Instead of having new per-block structures this now adds a per-page
> > structure called struct iomap_page to track this information in a slightly
> > different form:
> > 
> >  - a bitmap for the per-block uptodate status.  For the worst case of a 64k
> >    page size system this bitmap needs to contain 128 bits.  For the
> >    typical 4k page size case it only needs 8 bits, although we still
> >    need a full unsigned long due to the way the atomic bitmap API works.
> >  - two atomic_t counters are used to track the outstanding read and write
> >    counts
> > 
> > There is quite a bit of boilerplate code as the buffered I/O path uses
> > various helper methods, but the actual code is very straightforward.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap.c            | 279 ++++++++++++++++++++++++++++++++++++++----
> >  include/linux/iomap.h |  31 +++++
> >  2 files changed, 289 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 13cdcf33e6c0..ea1b1ba61ba3 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> ...
> > @@ -161,13 +295,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		return PAGE_SIZE;
> >  	}
> >  
> > -	/* we don't support blocksize < PAGE_SIZE quite yet. */
> > -	WARN_ON_ONCE(pos != page_offset(page));
> > -	WARN_ON_ONCE(plen != PAGE_SIZE);
> > +	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
> > +	if (plen == 0)
> > +		goto done;
> >  
> 
> I think the i_size check confused me in the previous go around. It's
> obviously clear to me now after the iomap zeroing issue and fix, but a
> one-liner comment couldn't hurt for future reference:
> 
> 	/* zero post-eof blocks as the page may be mapped */
> 
> That nit aside, this looks good to me and survives my tests:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Looks ok, will insert the comment on its way in.  Will test (again)...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> >  	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
> >  		zero_user(page, poff, plen);
> > -		SetPageUptodate(page);
> > +		iomap_set_range_uptodate(page, poff, plen);
> >  		goto done;
> >  	}
> >  
> > @@ -183,6 +317,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		is_contig = true;
> >  	}
> >  
> > +	/*
> > +	 * If we start a new segment we need to increase the read count, and we
> > +	 * need to do so before submitting any previous full bio to make sure
> > +	 * that we don't prematurely unlock the page.
> > +	 */
> > +	if (iop)
> > +		atomic_inc(&iop->read_count);
> > +
> >  	if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
> >  		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
> >  		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > @@ -203,7 +345,13 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  
> >  	__bio_add_page(ctx->bio, page, plen, poff);
> >  done:
> > -	return plen;
> > +	/*
> > +	 * Move the caller beyond our range so that it keeps making progress.
> > +	 * For that we have to include any leading non-uptodate ranges, but
> > +	 * we can skip trailing ones as they will be handled in the next
> > +	 * iteration.
> > +	 */
> > +	return pos - orig_pos + plen;
> >  }
> >  
> >  int
> > @@ -214,8 +362,6 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
> >  	unsigned poff;
> >  	loff_t ret;
> >  
> > -	WARN_ON_ONCE(page_has_buffers(page));
> > -
> >  	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
> >  		ret = iomap_apply(inode, page_offset(page) + poff,
> >  				PAGE_SIZE - poff, 0, ops, &ctx,
> > @@ -341,6 +487,84 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
> >  }
> >  EXPORT_SYMBOL_GPL(iomap_readpages);
> >  
> > +int
> > +iomap_is_partially_uptodate(struct page *page, unsigned long from,
> > +		unsigned long count)
> > +{
> > +	struct iomap_page *iop = to_iomap_page(page);
> > +	struct inode *inode = page->mapping->host;
> > +	unsigned first = from >> inode->i_blkbits;
> > +	unsigned last = (from + count - 1) >> inode->i_blkbits;
> > +	unsigned i;
> > +
> > +	if (iop) {
> > +		for (i = first; i <= last; i++)
> > +			if (!test_bit(i, iop->uptodate))
> > +				return 0;
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
> > +
> > +int
> > +iomap_releasepage(struct page *page, gfp_t gfp_mask)
> > +{
> > +	/*
> > +	 * mm accommodates an old ext3 case where clean pages might not have had
> > +	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> > +	 * ->releasepage() via shrink_active_list(); skip those here.
> > +	 */
> > +	if (PageDirty(page) || PageWriteback(page))
> > +		return 0;
> > +	iomap_page_release(page);
> > +	return 1;
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_releasepage);
> > +
> > +void
> > +iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
> > +{
> > +	/*
> > +	 * If we are invalidating the entire page, clear the dirty state from it
> > +	 * and release it to avoid unnecessary buildup of the LRU.
> > +	 */
> > +	if (offset == 0 && len == PAGE_SIZE) {
> > +		WARN_ON_ONCE(PageWriteback(page));
> > +		cancel_dirty_page(page);
> > +		iomap_page_release(page);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_invalidatepage);
> > +
> > +#ifdef CONFIG_MIGRATION
> > +int
> > +iomap_migrate_page(struct address_space *mapping, struct page *newpage,
> > +		struct page *page, enum migrate_mode mode)
> > +{
> > +	int ret;
> > +
> > +	ret = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
> > +	if (ret != MIGRATEPAGE_SUCCESS)
> > +		return ret;
> > +
> > +	if (page_has_private(page)) {
> > +		ClearPagePrivate(page);
> > +		set_page_private(newpage, page_private(page));
> > +		set_page_private(page, 0);
> > +		SetPagePrivate(newpage);
> > +	}
> > +
> > +	if (mode != MIGRATE_SYNC_NO_COPY)
> > +		migrate_page_copy(newpage, page);
> > +	else
> > +		migrate_page_states(newpage, page);
> > +	return MIGRATEPAGE_SUCCESS;
> > +}
> > +EXPORT_SYMBOL_GPL(iomap_migrate_page);
> > +#endif /* CONFIG_MIGRATION */
> > +
> >  static void
> >  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
> >  {
> > @@ -364,6 +588,7 @@ iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
> >  
> >  	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
> >  		zero_user_segments(page, poff, from, to, poff + plen);
> > +		iomap_set_range_uptodate(page, poff, plen);
> >  		return 0;
> >  	}
> >  
> > @@ -379,21 +604,33 @@ static int
> >  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> >  		struct page *page, struct iomap *iomap)
> >  {
> > +	struct iomap_page *iop = iomap_page_create(inode, page);
> >  	loff_t block_size = i_blocksize(inode);
> >  	loff_t block_start = pos & ~(block_size - 1);
> >  	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
> > -	unsigned poff = block_start & (PAGE_SIZE - 1);
> > -	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
> > -	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
> > -
> > -	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> > +	unsigned from = pos & (PAGE_SIZE - 1), to = from + len, poff, plen;
> > +	int status = 0;
> >  
> >  	if (PageUptodate(page))
> >  		return 0;
> > -	if (from <= poff && to >= poff + plen)
> > -		return 0;
> > -	return iomap_read_page_sync(inode, block_start, page,
> > -			poff, plen, from, to, iomap);
> > +
> > +	do {
> > +		iomap_adjust_read_range(inode, iop, &block_start,
> > +				block_end - block_start, &poff, &plen);
> > +		if (plen == 0)
> > +			break;
> > +
> > +		if ((from > poff && from < poff + plen) ||
> > +		    (to > poff && to < poff + plen)) {
> > +			status = iomap_read_page_sync(inode, block_start, page,
> > +					poff, plen, from, to, iomap);
> > +			if (status)
> > +				break;
> > +		}
> > +
> > +	} while ((block_start += plen) < block_end);
> > +
> > +	return status;
> >  }
> >  
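As a concrete example of the loop just above: with hypothetical 1k
blocks and a write covering bytes 1536..2559 of a page, only the
partially covered blocks need a pre-read; a block fully covered by the
write is simply overwritten, and iomap_adjust_read_range (not modeled
here) additionally skips blocks that are already uptodate.  A small
user-space sketch of that block selection:

    #include <stdio.h>

    #define BLK_SZ 1024u

    int main(void)
    {
            unsigned from = 1536, to = 2560;   /* write range, end exclusive */
            unsigned block_start = from & ~(BLK_SZ - 1);
            unsigned block_end = (to + BLK_SZ - 1) & ~(BLK_SZ - 1);

            for (unsigned poff = block_start; poff < block_end; poff += BLK_SZ) {
                    unsigned plen = BLK_SZ;
                    /* same test as above: a write boundary inside the block */
                    int need_read = (from > poff && from < poff + plen) ||
                                    (to > poff && to < poff + plen);

                    printf("block [%4u,%4u): %s\n", poff, poff + plen,
                           need_read ? "read before write" : "fully overwritten");
            }
            return 0;
    }
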
> >  static int
> > @@ -476,7 +713,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >  	if (unlikely(copied < len && !PageUptodate(page))) {
> >  		copied = 0;
> >  	} else {
> > -		SetPageUptodate(page);
> > +		iomap_set_range_uptodate(page, pos & (PAGE_SIZE - 1), len);
> >  		iomap_set_page_dirty(page);
> >  	}
> >  	return __generic_write_end(inode, pos, copied, page);
> > @@ -812,7 +1049,7 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> >  		block_commit_write(page, 0, length);
> >  	} else {
> >  		WARN_ON_ONCE(!PageUptodate(page));
> > -		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> > +		iomap_page_create(inode, page);
> >  	}
> >  
> >  	return length;
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 5eb9ca8d7ce5..3555d54bf79a 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -2,6 +2,9 @@
> >  #ifndef LINUX_IOMAP_H
> >  #define LINUX_IOMAP_H 1
> >  
> > +#include <linux/atomic.h>
> > +#include <linux/bitmap.h>
> > +#include <linux/mm.h>
> >  #include <linux/types.h>
> >  
> >  struct address_space;
> > @@ -98,12 +101,40 @@ struct iomap_ops {
> >  			ssize_t written, unsigned flags, struct iomap *iomap);
> >  };
> >  
> > +/*
> > + * Structure allocated for each page when block size < PAGE_SIZE to track
> > + * sub-page uptodate status and I/O completions.
> > + */
> > +struct iomap_page {
> > +	atomic_t		read_count;
> > +	atomic_t		write_count;
> > +	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
> > +};
> > +
> > +static inline struct iomap_page *to_iomap_page(struct page *page)
> > +{
> > +	if (page_has_private(page))
> > +		return (struct iomap_page *)page_private(page);
> > +	return NULL;
> > +}
> > +
> >  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> >  		const struct iomap_ops *ops);
> >  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> >  int iomap_readpages(struct address_space *mapping, struct list_head *pages,
> >  		unsigned nr_pages, const struct iomap_ops *ops);
> >  int iomap_set_page_dirty(struct page *page);
> > +int iomap_is_partially_uptodate(struct page *page, unsigned long from,
> > +		unsigned long count);
> > +int iomap_releasepage(struct page *page, gfp_t gfp_mask);
> > +void iomap_invalidatepage(struct page *page, unsigned int offset,
> > +		unsigned int len);
> > +#ifdef CONFIG_MIGRATION
> > +int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
> > +		struct page *page, enum migrate_mode mode);
> > +#else
> > +#define iomap_migrate_page NULL
> > +#endif
> >  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
> >  		const struct iomap_ops *ops);
> >  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> > -- 
> > 2.18.0
> > 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-03 12:36   ` Brian Foster
@ 2018-07-03 22:05     ` Darrick J. Wong
  2018-07-08 15:16       ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2018-07-03 22:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel

On Tue, Jul 03, 2018 at 08:36:04AM -0400, Brian Foster wrote:
> On Mon, Jul 02, 2018 at 08:58:12AM -0600, Christoph Hellwig wrote:
> > Switch to using the iomap_page structure for checking sub-page uptodate
> > status and tracking sub-page I/O completion status, and remove large
> > quantities of boilerplate code working around buffer heads.
> > 
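The completion tracking can be pictured with a short sketch (names
invented here, C11 atomics standing in for atomic_t): each I/O segment
submitted against a page bumps the page's counter, and only the
completion of the last segment ends the page's writeback state.  The
read_count introduced in the previous patch unlocks the page after
reads in the same way.

    #include <stdatomic.h>
    #include <stdio.h>

    struct page_wb { atomic_int write_count; };

    static void add_segment_to_ioend(struct page_wb *p)
    {
            atomic_fetch_add(&p->write_count, 1);
    }

    static void finish_segment_writeback(struct page_wb *p)
    {
            /* previous value 1 means this was the last pending segment */
            if (atomic_fetch_sub(&p->write_count, 1) == 1)
                    printf("end_page_writeback()\n");
    }

    int main(void)
    {
            struct page_wb p = { 0 };

            add_segment_to_ioend(&p);        /* two bio segments cover */
            add_segment_to_ioend(&p);        /* different parts of the page */

            finish_segment_writeback(&p);    /* still one segment pending */
            finish_segment_writeback(&p);    /* prints end_page_writeback() */
            return 0;
    }
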
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_aops.c  | 492 ++++++---------------------------------------
> >  fs/xfs/xfs_buf.h   |   1 -
> >  fs/xfs/xfs_iomap.c |   3 -
> >  fs/xfs/xfs_super.c |   2 +-
> >  fs/xfs/xfs_trace.h |  18 +-
> >  5 files changed, 61 insertions(+), 455 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 0058f9893705..bae88ac1101d 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> ...
> > @@ -85,67 +63,17 @@ xfs_finish_page_writeback(
> >  	struct bio_vec		*bvec,
> >  	int			error)
> >  {
> > +	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
> > +
> >  	if (error) {
> >  		SetPageError(bvec->bv_page);
> >  		mapping_set_error(inode->i_mapping, -EIO);
> >  	}
> > -	end_page_writeback(bvec->bv_page);
> > -}
> >  
> > -/*
> > - * We're now finished for good with this page.  Update the page state via the
> > - * associated buffer_heads, paying attention to the start and end offsets that
> > - * we need to process on the page.
> > - *
> > - * Note that we open code the action in end_buffer_async_write here so that we
> > - * only have to iterate over the buffers attached to the page once.  This is not
> > - * only more efficient, but also ensures that we only calls end_page_writeback
> > - * at the end of the iteration, and thus avoids the pitfall of having the page
> > - * and buffers potentially freed after every call to end_buffer_async_write.
> > - */
> > -static void
> > -xfs_finish_buffer_writeback(
> > -	struct inode		*inode,
> > -	struct bio_vec		*bvec,
> > -	int			error)
> > -{
> > -	struct buffer_head	*head = page_buffers(bvec->bv_page), *bh = head;
> > -	bool			busy = false;
> > -	unsigned int		off = 0;
> > -	unsigned long		flags;
> > -
> > -	ASSERT(bvec->bv_offset < PAGE_SIZE);
> > -	ASSERT((bvec->bv_offset & (i_blocksize(inode) - 1)) == 0);
> > -	ASSERT(bvec->bv_offset + bvec->bv_len <= PAGE_SIZE);
> > -	ASSERT((bvec->bv_len & (i_blocksize(inode) - 1)) == 0);
> > -
> > -	local_irq_save(flags);
> > -	bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
> > -	do {
> > -		if (off >= bvec->bv_offset &&
> > -		    off < bvec->bv_offset + bvec->bv_len) {
> > -			ASSERT(buffer_async_write(bh));
> > -			ASSERT(bh->b_end_io == NULL);
> > -
> > -			if (error) {
> > -				mark_buffer_write_io_error(bh);
> > -				clear_buffer_uptodate(bh);
> 
> So the buffer completion code clears the uptodate status of the buffer
> on error. I assume that means the next read would replace the data we
> failed to write with whatever was previously on disk.

I've always found it a little weird that we basically throw away the
newer page contents on error, but we shouldn't be changing the behavior
in quite so subtle a way.

Also, since we clear uptodate the next (buffered) write will reread the
page contents.

> I guess it's debatable whether that is the right thing to do in
> general, but that seems like a higher level issue nonetheless (i.e., I
> don't think we'd ever retry the writepage either?).

AFAIK we don't retry failed writes unless userspace dirties the page.

> So is there any reason not to do the analogous in the iomap completion
> code?

Will let Christoph answer that one.

--D

> Otherwise the rest looks fine to me.
> 
> Brian
> 
> > -				SetPageError(bvec->bv_page);
> > -			} else {
> > -				set_buffer_uptodate(bh);
> > -			}
> > -			clear_buffer_async_write(bh);
> > -			unlock_buffer(bh);
> > -		} else if (buffer_async_write(bh)) {
> > -			ASSERT(buffer_locked(bh));
> > -			busy = true;
> > -		}
> > -		off += bh->b_size;
> > -	} while ((bh = bh->b_this_page) != head);
> > -	bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
> > -	local_irq_restore(flags);
> > +	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
> > +	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
> >  
> > -	if (!busy)
> > +	if (!iop || atomic_dec_and_test(&iop->write_count))
> >  		end_page_writeback(bvec->bv_page);
> >  }
> >  
> > @@ -179,12 +107,8 @@ xfs_destroy_ioend(
> >  			next = bio->bi_private;
> >  
> >  		/* walk each page on bio, ending page IO on them */
> > -		bio_for_each_segment_all(bvec, bio, i) {
> > -			if (page_has_buffers(bvec->bv_page))
> > -				xfs_finish_buffer_writeback(inode, bvec, error);
> > -			else
> > -				xfs_finish_page_writeback(inode, bvec, error);
> > -		}
> > +		bio_for_each_segment_all(bvec, bio, i)
> > +			xfs_finish_page_writeback(inode, bvec, error);
> >  		bio_put(bio);
> >  	}
> >  
> > @@ -638,6 +562,7 @@ xfs_add_to_ioend(
> >  	struct inode		*inode,
> >  	xfs_off_t		offset,
> >  	struct page		*page,
> > +	struct iomap_page	*iop,
> >  	struct xfs_writepage_ctx *wpc,
> >  	struct writeback_control *wbc,
> >  	struct list_head	*iolist)
> > @@ -661,100 +586,37 @@ xfs_add_to_ioend(
> >  				bdev, sector);
> >  	}
> >  
> > -	/*
> > -	 * If the block doesn't fit into the bio we need to allocate a new
> > -	 * one.  This shouldn't happen more than once for a given block.
> > -	 */
> > -	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
> > -		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> > +	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
> > +		if (iop)
> > +			atomic_inc(&iop->write_count);
> > +		if (bio_full(wpc->ioend->io_bio))
> > +			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> > +		__bio_add_page(wpc->ioend->io_bio, page, len, poff);
> > +	}
> >  
> >  	wpc->ioend->io_size += len;
> >  }
> >  
> > -STATIC void
> > -xfs_map_buffer(
> > -	struct inode		*inode,
> > -	struct buffer_head	*bh,
> > -	struct xfs_bmbt_irec	*imap,
> > -	xfs_off_t		offset)
> > -{
> > -	sector_t		bn;
> > -	struct xfs_mount	*m = XFS_I(inode)->i_mount;
> > -	xfs_off_t		iomap_offset = XFS_FSB_TO_B(m, imap->br_startoff);
> > -	xfs_daddr_t		iomap_bn = xfs_fsb_to_db(XFS_I(inode), imap->br_startblock);
> > -
> > -	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> > -	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> > -
> > -	bn = (iomap_bn >> (inode->i_blkbits - BBSHIFT)) +
> > -	      ((offset - iomap_offset) >> inode->i_blkbits);
> > -
> > -	ASSERT(bn || XFS_IS_REALTIME_INODE(XFS_I(inode)));
> > -
> > -	bh->b_blocknr = bn;
> > -	set_buffer_mapped(bh);
> > -}
> > -
> > -STATIC void
> > -xfs_map_at_offset(
> > -	struct inode		*inode,
> > -	struct buffer_head	*bh,
> > -	struct xfs_bmbt_irec	*imap,
> > -	xfs_off_t		offset)
> > -{
> > -	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> > -	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> > -
> > -	lock_buffer(bh);
> > -	xfs_map_buffer(inode, bh, imap, offset);
> > -	set_buffer_mapped(bh);
> > -	clear_buffer_delay(bh);
> > -	clear_buffer_unwritten(bh);
> > -
> > -	/*
> > -	 * If this is a realtime file, data may be on a different device.
> > -	 * to that pointed to from the buffer_head b_bdev currently. We can't
> > -	 * trust that the bufferhead has a already been mapped correctly, so
> > -	 * set the bdev now.
> > -	 */
> > -	bh->b_bdev = xfs_find_bdev_for_inode(inode);
> > -	bh->b_end_io = NULL;
> > -	set_buffer_async_write(bh);
> > -	set_buffer_uptodate(bh);
> > -	clear_buffer_dirty(bh);
> > -}
> > -
> >  STATIC void
> >  xfs_vm_invalidatepage(
> >  	struct page		*page,
> >  	unsigned int		offset,
> >  	unsigned int		length)
> >  {
> > -	trace_xfs_invalidatepage(page->mapping->host, page, offset,
> > -				 length);
> > -
> > -	/*
> > -	 * If we are invalidating the entire page, clear the dirty state from it
> > -	 * so that we can check for attempts to release dirty cached pages in
> > -	 * xfs_vm_releasepage().
> > -	 */
> > -	if (offset == 0 && length >= PAGE_SIZE)
> > -		cancel_dirty_page(page);
> > -	block_invalidatepage(page, offset, length);
> > +	trace_xfs_invalidatepage(page->mapping->host, page, offset, length);
> > +	iomap_invalidatepage(page, offset, length);
> >  }
> >  
> >  /*
> > - * If the page has delalloc buffers on it, we need to punch them out before we
> > - * invalidate the page. If we don't, we leave a stale delalloc mapping on the
> > - * inode that can trip a BUG() in xfs_get_blocks() later on if a direct IO read
> > - * is done on that same region - the delalloc extent is returned when none is
> > - * supposed to be there.
> > + * If the page has delalloc blocks on it, we need to punch them out before we
> > + * invalidate the page.  If we don't, we leave a stale delalloc mapping on the
> > + * inode that can trip up a later direct I/O read operation on the same region.
> >   *
> > - * We prevent this by truncating away the delalloc regions on the page before
> > - * invalidating it. Because they are delalloc, we can do this without needing a
> > - * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
> > - * truncation without a transaction as there is no space left for block
> > - * reservation (typically why we see a ENOSPC in writeback).
> > + * We prevent this by truncating away the delalloc regions on the page.  Because
> > + * they are delalloc, we can do this without needing a transaction. Indeed - if
> > + * we get ENOSPC errors, we have to be able to do this truncation without a
> > + * transaction as there is no space left for block reservation (typically why we
> > + * see an ENOSPC in writeback).
> >   */
> >  STATIC void
> >  xfs_aops_discard_page(
> > @@ -786,7 +648,7 @@ xfs_aops_discard_page(
> >   * We implement an immediate ioend submission policy here to avoid needing to
> >   * chain multiple ioends and hence nest mempool allocations which can violate
> >   * forward progress guarantees we need to provide. The current ioend we are
> > - * adding buffers to is cached on the writepage context, and if the new buffer
> > + * adding blocks to is cached on the writepage context, and if the new block
> >   * does not append to the cached ioend it will create a new ioend and cache that
> >   * instead.
> >   *
> > @@ -807,54 +669,33 @@ xfs_writepage_map(
> >  	uint64_t		end_offset)
> >  {
> >  	LIST_HEAD(submit_list);
> > +	struct iomap_page	*iop = to_iomap_page(page);
> > +	unsigned		len = i_blocksize(inode);
> >  	struct xfs_ioend	*ioend, *next;
> > -	struct buffer_head	*bh = NULL;
> > -	ssize_t			len = i_blocksize(inode);
> >  	uint64_t		file_offset;	/* file offset of page */
> > -	unsigned		poffset;	/* offset into page */
> > -	int			error = 0;
> > -	int			count = 0;
> > +	int			error = 0, count = 0, i;
> >  
> > -	if (page_has_buffers(page))
> > -		bh = page_buffers(page);
> > +	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
> > +	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
> >  
> >  	/*
> > -	 * Walk the blocks on the page, and if we run off the end of the current
> > -	 * map or find the current map invalid, grab a new one.  We only use
> > -	 * bufferheads here to check per-block state - they no longer control
> > -	 * the iteration through the page. This allows us to replace the
> > -	 * bufferhead with some other state tracking mechanism in future.
> > +	 * Walk through the page to find areas to write back. If we run off the
> > +	 * end of the current map or find the current map invalid, grab a new
> > +	 * one.
> >  	 */
> > -	for (poffset = 0, file_offset = page_offset(page);
> > -	     poffset < PAGE_SIZE;
> > -	     poffset += len, file_offset += len) {
> > -		/* past the range we are writing, so nothing more to write. */
> > -		if (file_offset >= end_offset)
> > -			break;
> > -
> > -		if (bh && !buffer_uptodate(bh)) {
> > -			if (PageUptodate(page))
> > -				ASSERT(buffer_mapped(bh));
> > -			bh = bh->b_this_page;
> > +	for (i = 0, file_offset = page_offset(page);
> > +	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
> > +	     i++, file_offset += len) {
> > +		if (iop && !test_bit(i, iop->uptodate))
> >  			continue;
> > -		}
> >  
> >  		error = xfs_map_blocks(wpc, inode, file_offset);
> >  		if (error)
> >  			break;
> > -
> > -		if (wpc->io_type == XFS_IO_HOLE) {
> > -			if (bh)
> > -				bh = bh->b_this_page;
> > +		if (wpc->io_type == XFS_IO_HOLE)
> >  			continue;
> > -		}
> > -
> > -		if (bh) {
> > -			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> > -			bh = bh->b_this_page;
> > -		}
> > -		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
> > -				&submit_list);
> > +		xfs_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
> > +				 &submit_list);
> >  		count++;
> >  	}
> >  
> > @@ -863,21 +704,18 @@ xfs_writepage_map(
> >  	ASSERT(!PageWriteback(page));
> >  
> >  	/*
> > -	 * On error, we have to fail the ioend here because we have locked
> > -	 * buffers in the ioend. If we don't do this, we'll deadlock
> > -	 * invalidating the page as that tries to lock the buffers on the page.
> > -	 * Also, because we may have set pages under writeback, we have to make
> > -	 * sure we run IO completion to mark the error state of the IO
> > -	 * appropriately, so we can't cancel the ioend directly here. That means
> > -	 * we have to mark this page as under writeback if we included any
> > -	 * buffers from it in the ioend chain so that completion treats it
> > -	 * correctly.
> > +	 * On error, we have to fail the ioend here because we may have set
> > +	 * pages under writeback, we have to make sure we run IO completion to
> > +	 * mark the error state of the IO appropriately, so we can't cancel the
> > +	 * ioend directly here.  That means we have to mark this page as under
> > +	 * writeback if we included any blocks from it in the ioend chain so
> > +	 * that completion treats it correctly.
> >  	 *
> >  	 * If we didn't include the page in the ioend, then on error we can
> >  	 * simply discard and unlock it as there are no other users of the page
> > -	 * or it's buffers right now. The caller will still need to trigger
> > -	 * submission of outstanding ioends on the writepage context so they are
> > -	 * treated correctly on error.
> > +	 * now.  The caller will still need to trigger submission of outstanding
> > +	 * ioends on the writepage context so they are treated correctly on
> > +	 * error.
> >  	 */
> >  	if (unlikely(error)) {
> >  		if (!count) {
> > @@ -918,8 +756,8 @@ xfs_writepage_map(
> >  	}
> >  
> >  	/*
> > -	 * We can end up here with no error and nothing to write if we race with
> > -	 * a partial page truncate on a sub-page block sized filesystem.
> > +	 * We can end up here with no error and nothing to write only if we race
> > +	 * with a partial page truncate on a sub-page block sized filesystem.
> >  	 */
> >  	if (!count)
> >  		end_page_writeback(page);
> > @@ -934,7 +772,6 @@ xfs_writepage_map(
> >   * For delalloc space on the page we need to allocate space and flush it.
> >   * For unwritten space on the page we need to start the conversion to
> >   * regular allocated space.
> > - * For any other dirty buffer heads on the page we should flush them.
> >   */
> >  STATIC int
> >  xfs_do_writepage(
> > @@ -1088,166 +925,13 @@ xfs_dax_writepages(
> >  			xfs_find_bdev_for_inode(mapping->host), wbc);
> >  }
> >  
> > -/*
> > - * Called to move a page into cleanable state - and from there
> > - * to be released. The page should already be clean. We always
> > - * have buffer heads in this call.
> > - *
> > - * Returns 1 if the page is ok to release, 0 otherwise.
> > - */
> >  STATIC int
> >  xfs_vm_releasepage(
> >  	struct page		*page,
> >  	gfp_t			gfp_mask)
> >  {
> > -	int			delalloc, unwritten;
> > -
> >  	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
> > -
> > -	/*
> > -	 * mm accommodates an old ext3 case where clean pages might not have had
> > -	 * the dirty bit cleared. Thus, it can send actual dirty pages to
> > -	 * ->releasepage() via shrink_active_list(). Conversely,
> > -	 * block_invalidatepage() can send pages that are still marked dirty but
> > -	 * otherwise have invalidated buffers.
> > -	 *
> > -	 * We want to release the latter to avoid unnecessary buildup of the
> > -	 * LRU, so xfs_vm_invalidatepage() clears the page dirty flag on pages
> > -	 * that are entirely invalidated and need to be released.  Hence the
> > -	 * only time we should get dirty pages here is through
> > -	 * shrink_active_list() and so we can simply skip those now.
> > -	 *
> > -	 * warn if we've left any lingering delalloc/unwritten buffers on clean
> > -	 * or invalidated pages we are about to release.
> > -	 */
> > -	if (PageDirty(page))
> > -		return 0;
> > -
> > -	xfs_count_page_state(page, &delalloc, &unwritten);
> > -
> > -	if (WARN_ON_ONCE(delalloc))
> > -		return 0;
> > -	if (WARN_ON_ONCE(unwritten))
> > -		return 0;
> > -
> > -	return try_to_free_buffers(page);
> > -}
> > -
> > -/*
> > - * If this is O_DIRECT or the mpage code calling tell them how large the mapping
> > - * is, so that we can avoid repeated get_blocks calls.
> > - *
> > - * If the mapping spans EOF, then we have to break the mapping up as the mapping
> > - * for blocks beyond EOF must be marked new so that sub block regions can be
> > - * correctly zeroed. We can't do this for mappings within EOF unless the mapping
> > - * was just allocated or is unwritten, otherwise the callers would overwrite
> > - * existing data with zeros. Hence we have to split the mapping into a range up
> > - * to and including EOF, and a second mapping for beyond EOF.
> > - */
> > -static void
> > -xfs_map_trim_size(
> > -	struct inode		*inode,
> > -	sector_t		iblock,
> > -	struct buffer_head	*bh_result,
> > -	struct xfs_bmbt_irec	*imap,
> > -	xfs_off_t		offset,
> > -	ssize_t			size)
> > -{
> > -	xfs_off_t		mapping_size;
> > -
> > -	mapping_size = imap->br_startoff + imap->br_blockcount - iblock;
> > -	mapping_size <<= inode->i_blkbits;
> > -
> > -	ASSERT(mapping_size > 0);
> > -	if (mapping_size > size)
> > -		mapping_size = size;
> > -	if (offset < i_size_read(inode) &&
> > -	    (xfs_ufsize_t)offset + mapping_size >= i_size_read(inode)) {
> > -		/* limit mapping to block that spans EOF */
> > -		mapping_size = roundup_64(i_size_read(inode) - offset,
> > -					  i_blocksize(inode));
> > -	}
> > -	if (mapping_size > LONG_MAX)
> > -		mapping_size = LONG_MAX;
> > -
> > -	bh_result->b_size = mapping_size;
> > -}
> > -
> > -static int
> > -xfs_get_blocks(
> > -	struct inode		*inode,
> > -	sector_t		iblock,
> > -	struct buffer_head	*bh_result,
> > -	int			create)
> > -{
> > -	struct xfs_inode	*ip = XFS_I(inode);
> > -	struct xfs_mount	*mp = ip->i_mount;
> > -	xfs_fileoff_t		offset_fsb, end_fsb;
> > -	int			error = 0;
> > -	int			lockmode = 0;
> > -	struct xfs_bmbt_irec	imap;
> > -	int			nimaps = 1;
> > -	xfs_off_t		offset;
> > -	ssize_t			size;
> > -
> > -	BUG_ON(create);
> > -
> > -	if (XFS_FORCED_SHUTDOWN(mp))
> > -		return -EIO;
> > -
> > -	offset = (xfs_off_t)iblock << inode->i_blkbits;
> > -	ASSERT(bh_result->b_size >= i_blocksize(inode));
> > -	size = bh_result->b_size;
> > -
> > -	if (offset >= i_size_read(inode))
> > -		return 0;
> > -
> > -	/*
> > -	 * Direct I/O is usually done on preallocated files, so try getting
> > -	 * a block mapping without an exclusive lock first.
> > -	 */
> > -	lockmode = xfs_ilock_data_map_shared(ip);
> > -
> > -	ASSERT(offset <= mp->m_super->s_maxbytes);
> > -	if (offset > mp->m_super->s_maxbytes - size)
> > -		size = mp->m_super->s_maxbytes - offset;
> > -	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
> > -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > -
> > -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> > -			&nimaps, 0);
> > -	if (error)
> > -		goto out_unlock;
> > -	if (!nimaps) {
> > -		trace_xfs_get_blocks_notfound(ip, offset, size);
> > -		goto out_unlock;
> > -	}
> > -
> > -	trace_xfs_get_blocks_found(ip, offset, size,
> > -		imap.br_state == XFS_EXT_UNWRITTEN ?
> > -			XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, &imap);
> > -	xfs_iunlock(ip, lockmode);
> > -
> > -	/* trim mapping down to size requested */
> > -	xfs_map_trim_size(inode, iblock, bh_result, &imap, offset, size);
> > -
> > -	/*
> > -	 * For unwritten extents do not report a disk address in the buffered
> > -	 * read case (treat as if we're reading into a hole).
> > -	 */
> > -	if (xfs_bmap_is_real_extent(&imap))
> > -		xfs_map_buffer(inode, bh_result, &imap, offset);
> > -
> > -	/*
> > -	 * If this is a realtime file, data may be on a different device.
> > -	 * to that pointed to from the buffer_head b_bdev currently.
> > -	 */
> > -	bh_result->b_bdev = xfs_find_bdev_for_inode(inode);
> > -	return 0;
> > -
> > -out_unlock:
> > -	xfs_iunlock(ip, lockmode);
> > -	return error;
> > +	return iomap_releasepage(page, gfp_mask);
> >  }
> >  
> >  STATIC sector_t
> > @@ -1279,9 +963,7 @@ xfs_vm_readpage(
> >  	struct page		*page)
> >  {
> >  	trace_xfs_vm_readpage(page->mapping->host, 1);
> > -	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
> > -		return iomap_readpage(page, &xfs_iomap_ops);
> > -	return mpage_readpage(page, xfs_get_blocks);
> > +	return iomap_readpage(page, &xfs_iomap_ops);
> >  }
> >  
> >  STATIC int
> > @@ -1292,65 +974,7 @@ xfs_vm_readpages(
> >  	unsigned		nr_pages)
> >  {
> >  	trace_xfs_vm_readpages(mapping->host, nr_pages);
> > -	if (i_blocksize(mapping->host) == PAGE_SIZE)
> > -		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> > -	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
> > -}
> > -
> > -/*
> > - * This is basically a copy of __set_page_dirty_buffers() with one
> > - * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
> > - * dirty, we'll never be able to clean them because we don't write buffers
> > - * beyond EOF, and that means we can't invalidate pages that span EOF
> > - * that have been marked dirty. Further, the dirty state can leak into
> > - * the file interior if the file is extended, resulting in all sorts of
> > - * bad things happening as the state does not match the underlying data.
> > - *
> > - * XXX: this really indicates that bufferheads in XFS need to die. Warts like
> > - * this only exist because of bufferheads and how the generic code manages them.
> > - */
> > -STATIC int
> > -xfs_vm_set_page_dirty(
> > -	struct page		*page)
> > -{
> > -	struct address_space	*mapping = page->mapping;
> > -	struct inode		*inode = mapping->host;
> > -	loff_t			end_offset;
> > -	loff_t			offset;
> > -	int			newly_dirty;
> > -
> > -	if (unlikely(!mapping))
> > -		return !TestSetPageDirty(page);
> > -
> > -	end_offset = i_size_read(inode);
> > -	offset = page_offset(page);
> > -
> > -	spin_lock(&mapping->private_lock);
> > -	if (page_has_buffers(page)) {
> > -		struct buffer_head *head = page_buffers(page);
> > -		struct buffer_head *bh = head;
> > -
> > -		do {
> > -			if (offset < end_offset)
> > -				set_buffer_dirty(bh);
> > -			bh = bh->b_this_page;
> > -			offset += i_blocksize(inode);
> > -		} while (bh != head);
> > -	}
> > -	/*
> > -	 * Lock out page->mem_cgroup migration to keep PageDirty
> > -	 * synchronized with per-memcg dirty page counters.
> > -	 */
> > -	lock_page_memcg(page);
> > -	newly_dirty = !TestSetPageDirty(page);
> > -	spin_unlock(&mapping->private_lock);
> > -
> > -	if (newly_dirty)
> > -		__set_page_dirty(page, mapping, 1);
> > -	unlock_page_memcg(page);
> > -	if (newly_dirty)
> > -		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> > -	return newly_dirty;
> > +	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> >  }
> >  
> >  static int
> > @@ -1368,13 +992,13 @@ const struct address_space_operations xfs_address_space_operations = {
> >  	.readpages		= xfs_vm_readpages,
> >  	.writepage		= xfs_vm_writepage,
> >  	.writepages		= xfs_vm_writepages,
> > -	.set_page_dirty		= xfs_vm_set_page_dirty,
> > +	.set_page_dirty		= iomap_set_page_dirty,
> >  	.releasepage		= xfs_vm_releasepage,
> >  	.invalidatepage		= xfs_vm_invalidatepage,
> >  	.bmap			= xfs_vm_bmap,
> >  	.direct_IO		= noop_direct_IO,
> > -	.migratepage		= buffer_migrate_page,
> > -	.is_partially_uptodate  = block_is_partially_uptodate,
> > +	.migratepage		= iomap_migrate_page,
> > +	.is_partially_uptodate  = iomap_is_partially_uptodate,
> >  	.error_remove_page	= generic_error_remove_page,
> >  	.swap_activate		= xfs_iomap_swapfile_activate,
> >  };
> > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> > index d24dbd4dac39..6ddf1907fc7a 100644
> > --- a/fs/xfs/xfs_buf.h
> > +++ b/fs/xfs/xfs_buf.h
> > @@ -12,7 +12,6 @@
> >  #include <linux/mm.h>
> >  #include <linux/fs.h>
> >  #include <linux/dax.h>
> > -#include <linux/buffer_head.h>
> >  #include <linux/uio.h>
> >  #include <linux/list_lru.h>
> >  
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 7fe42a126ec1..778b8c850de3 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1032,9 +1032,6 @@ xfs_file_iomap_begin(
> >  	if (XFS_FORCED_SHUTDOWN(mp))
> >  		return -EIO;
> >  
> > -	if (i_blocksize(inode) < PAGE_SIZE)
> > -		iomap->flags |= IOMAP_F_BUFFER_HEAD;
> > -
> >  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
> >  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> >  		/* Reserve delalloc blocks for regular writeback. */
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 9d791f158dfe..f9f8dc490d3d 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1860,7 +1860,7 @@ MODULE_ALIAS_FS("xfs");
> >  STATIC int __init
> >  xfs_init_zones(void)
> >  {
> > -	if (bioset_init(&xfs_ioend_bioset, 4 * MAX_BUF_PER_PAGE,
> > +	if (bioset_init(&xfs_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
> >  			offsetof(struct xfs_ioend, io_inline_bio),
> >  			BIOSET_NEED_BVECS))
> >  		goto out;
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 1af123df19b5..7f4c7071e7ed 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1153,33 +1153,23 @@ DECLARE_EVENT_CLASS(xfs_page_class,
> >  		__field(loff_t, size)
> >  		__field(unsigned long, offset)
> >  		__field(unsigned int, length)
> > -		__field(int, delalloc)
> > -		__field(int, unwritten)
> >  	),
> >  	TP_fast_assign(
> > -		int delalloc = -1, unwritten = -1;
> > -
> > -		if (page_has_buffers(page))
> > -			xfs_count_page_state(page, &delalloc, &unwritten);
> >  		__entry->dev = inode->i_sb->s_dev;
> >  		__entry->ino = XFS_I(inode)->i_ino;
> >  		__entry->pgoff = page_offset(page);
> >  		__entry->size = i_size_read(inode);
> >  		__entry->offset = off;
> >  		__entry->length = len;
> > -		__entry->delalloc = delalloc;
> > -		__entry->unwritten = unwritten;
> >  	),
> >  	TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
> > -		  "length %x delalloc %d unwritten %d",
> > +		  "length %x",
> >  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >  		  __entry->ino,
> >  		  __entry->pgoff,
> >  		  __entry->size,
> >  		  __entry->offset,
> > -		  __entry->length,
> > -		  __entry->delalloc,
> > -		  __entry->unwritten)
> > +		  __entry->length)
> >  )
> >  
> >  #define DEFINE_PAGE_EVENT(name)		\
> > @@ -1263,9 +1253,6 @@ DEFINE_EVENT(xfs_imap_class, name,	\
> >  	TP_ARGS(ip, offset, count, type, irec))
> >  DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
> >  DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
> > -DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
> > -DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
> > -DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
> >  DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
> >  DEFINE_IOMAP_EVENT(xfs_iomap_found);
> >  
> > @@ -1304,7 +1291,6 @@ DEFINE_EVENT(xfs_simple_io_class, name,	\
> >  	TP_ARGS(ip, offset, count))
> >  DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
> >  DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
> > -DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
> >  DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
> >  DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
> >  DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
> > -- 
> > 2.18.0
> > 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-03 22:05     ` Darrick J. Wong
@ 2018-07-08 15:16       ` Christoph Hellwig
  2018-07-10  1:02         ` Brian Foster
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-08 15:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Christoph Hellwig, linux-xfs, linux-fsdevel

On Tue, Jul 03, 2018 at 03:05:01PM -0700, Darrick J. Wong wrote:
> > So the buffer completion code clears the uptodate status of the buffer
> > on error. I assume that means the next read would replace the data we
> > failed to write with whatever was previously on disk.
> 
> I've always found it a little weird that we basically throw away the
> newer page contents on error, but we shouldn't be changing the behavior
> in quite so subtle a way.
> 
> Also, since we clear uptodate the next (buffered) write will reread the
> page contents.
> 
> > I guess it's debatable whether that is the right thing to do in
> > general, but that seems like a higher level issue nonetheless (i.e., I
> > don't think we'd ever retry the writepage either?).
> 
> AFAIK we don't retry failed writes unless userspace dirties the page.
> 
> > So is there any reason not to do the analogous in the iomap completion
> > code?
> 
> Will let Christoph answer that one.

As far as I can tell the write path should never even touch the
uptodate bit, and the buffer head path is only doing so for very
old legacy reasons.  I'd rather keep the iomap write path out of
the uptodate bit manipulation business from the very beginning instead
of carrying junk like this over.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-08 15:16       ` Christoph Hellwig
@ 2018-07-10  1:02         ` Brian Foster
  2018-07-10 12:15           ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2018-07-10  1:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs, linux-fsdevel

On Sun, Jul 08, 2018 at 05:16:15PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 03, 2018 at 03:05:01PM -0700, Darrick J. Wong wrote:
> > > So the buffer completion code clears the uptodate status of the buffer
> > > on error. I assume that means the next read would replace the data we
> > > failed to write with whatever was previously on disk.
> > 
> > I've always found it a little weird that we basically throw away the
> > newer page contents on error, but we shouldn't be changing the behavior
> > in quite so subtle a way.
> > 
> > Also, since we clear uptodate the next (buffered) write will reread the
> > page contents.
> > 
> > > I guess it's debatable whether that is the right thing to do in
> > > general, but that seems like a higher level issue nonetheless (i.e., I
> > > don't think we'd ever retry the writepage either?).
> > 
> > AFAIK we don't retry failed writes unless userspace dirties the page.
> > 
> > > So is there any reason not to do the analogous in the iomap completion
> > > code?
> > 
> > Will let Christoph answer that one.
> 
> As far as I can tell the write path should never even touch the
> uptodate bit, and the buffer head path is only doing so for very
> old legacy reasons.  I'd rather keep the iomap write path out of
> the uptodate bit manipulation business from the very beginning instead
> of carrying junk like this over.

I'm more interested in preserving the expected behavior in the event of
a write error as opposed to just copying whatever state the buffer head
code sets. It looks to me that if the page itself isn't uptodate, we
overwrite a block of that page and then the writepage fails, clearing
the buffer uptodate status means that the next read would return what is
on disk (not what was just written to the page). I'm not sure that's
what happens if the page was already uptodate before the
overwrite/writepage, however; I didn't notice anything that cleared page
uptodate status on a writepage I/O error..?

Brian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-10  1:02         ` Brian Foster
@ 2018-07-10 12:15           ` Christoph Hellwig
  2018-07-11 10:58             ` Brian Foster
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2018-07-10 12:15 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel

On Mon, Jul 09, 2018 at 09:02:02PM -0400, Brian Foster wrote:
> It looks to me that if the page itself isn't uptodate, we
> overwrite a block of that page and then the writepage fails, clearing
> the buffer uptodate status means that the next read would return what is
> on disk (not what was just written to the page).

With iomap we never clear the uptodate bit, and we only set it when
that part of the page contains valid data.  With buffer heads we might
indeed clear the uptodate bit after a write error.  Now if the whole
page is set uptodate we won't re-read it, but if the whole page wasn't
uptodate it seems like the buffer head code will lose data in that
case, which looks wrong to me.
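
To make the difference concrete, a toy model (invented names; only the
cache semantics are modeled, not the kernel code paths):

    #include <stdio.h>
    #include <string.h>

    /* one "page" of two 4-byte blocks; the disk holds the old contents */
    struct blk { char data[5]; int uptodate; };
    static const char *disk[2] = { "old0", "old1" };

    static void read_block(struct blk *b, int nr)
    {
            if (!b->uptodate) {              /* not cached: fetch from disk */
                    strcpy(b->data, disk[nr]);
                    b->uptodate = 1;
            }
    }

    int main(void)
    {
            struct blk page[2] = { { "", 0 }, { "", 0 } };

            /* buffered overwrite of block 0 makes it uptodate in cache */
            strcpy(page[0].data, "new0");
            page[0].uptodate = 1;

            /* writepage fails; the buffer_head path clears uptodate ... */
            page[0].uptodate = 0;

            /* ... so the next read resurrects the on-disk data */
            read_block(&page[0], 0);
            printf("bh-style read after failed write:    %s\n", page[0].data);

            /* the iomap path leaves uptodate alone: the new data survives
             * in cache until the page is reclaimed */
            strcpy(page[0].data, "new0");
            page[0].uptodate = 1;
            read_block(&page[0], 0);
            printf("iomap-style read after failed write: %s\n", page[0].data);
            return 0;
    }

Either way the page is not redirtied after the failure, so the new data
can still be lost to reclaim; the difference is only what a read returns
while the page stays cached.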

> I'm not sure that's
> what happens if the page was already uptodate before the
> overwrite/writepage, however; I didn't notice anything that cleared page
> uptodate status on a writepage I/O error..?

Yes, the buffer head code seems inconsistent in how it treats the buffer
vs page uptodate bits.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads
  2018-07-10 12:15           ` Christoph Hellwig
@ 2018-07-11 10:58             ` Brian Foster
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Foster @ 2018-07-11 10:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs, linux-fsdevel

On Tue, Jul 10, 2018 at 02:15:30PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 09, 2018 at 09:02:02PM -0400, Brian Foster wrote:
> > It looks to me that if the page itself isn't uptodate, we
> > overwrite a block of that page and then the writepage fails, clearing
> > the buffer uptodate status means that the next read would return what is
> > on disk (not what was just written to the page).
> 
> With iomap we never clear the uptodate bit, and we only set it when
> that part of the page contains valid data.  With buffer heads we might
> indeed clear the uptodate bit after a write error.  Now if the whole
> page is set uptodate we won't re-read it, but if the whole page wasn't
> uptodate it seems like the buffer head code will lose data in that
> case, which looks wrong to me.
> 

My understanding is that we could lose data regardless because the page
is not redirtied and thus can be reclaimed. Based on that, I thought
that perhaps the buffer state was cleared to perform that invalidation
explicitly rather than unpredictably based on cache behavior, but it
seems the whole page uptodate thing is completely inconsistent with
that.

> > I'm not sure that's
> > what happens if the page was already uptodate before the
> > overwrite/writepage, however; I didn't notice anything that cleared page
> > uptodate status on a writepage I/O error..?
> 
> Yes, the buffer head code seems inconsistent in how it treats the buffer
> vs page uptodate bits.

Ok. Given the page behavior is what it is, I think it's reasonable to
treat the block state consistently. I mainly wanted to be sure that I
wasn't missing something with regard to the impact of write errors on
PageUptodate(), and that if there was some explicit write error
invalidation going on, we wouldn't lose sight of it and end up with
inconsistent behavior.
It sounds like that is not the case:

Reviewed-by: Brian Foster <bfoster@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-07-11 11:02 UTC | newest]

Thread overview: 33+ messages
2018-07-02 14:57 stop using buffer heads in xfs v7 Christoph Hellwig
2018-07-02 14:57 ` [PATCH 01/22] xfs: use iomap for blocksize == PAGE_SIZE readpage and readpages Christoph Hellwig
2018-07-02 14:57 ` [PATCH 02/22] xfs: simplify xfs_aops_discard_page Christoph Hellwig
2018-07-02 14:57 ` [PATCH 03/22] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-07-02 14:57 ` [PATCH 04/22] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
2018-07-02 14:57 ` [PATCH 05/22] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
2018-07-02 14:57 ` [PATCH 06/22] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
2018-07-02 14:57 ` [PATCH 07/22] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
2018-07-02 14:57 ` [PATCH 08/22] xfs: remove xfs_map_cow Christoph Hellwig
2018-07-02 14:58 ` [PATCH 09/22] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
2018-07-02 14:58 ` [PATCH 10/22] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
2018-07-02 14:58 ` [PATCH 11/22] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
2018-07-02 14:58 ` [PATCH 12/22] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
2018-07-02 14:58 ` [PATCH 13/22] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
2018-07-02 14:58 ` [PATCH 14/22] xfs: remove the imap_valid flag Christoph Hellwig
2018-07-02 14:58 ` [PATCH 15/22] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
2018-07-02 14:58 ` [PATCH 16/22] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
2018-07-02 14:58 ` [PATCH 17/22] xfs: remove xfs_start_page_writeback Christoph Hellwig
2018-07-02 14:58 ` [PATCH 18/22] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
2018-07-02 14:58 ` [PATCH 19/22] xfs: allow writeback on pages without buffer heads Christoph Hellwig
2018-07-02 14:58 ` [PATCH 20/22] iomap: add support for sub-pagesize buffered I/O " Christoph Hellwig
2018-07-03 12:31   ` Brian Foster
2018-07-03 21:52     ` Darrick J. Wong
2018-07-02 14:58 ` [PATCH 21/22] xfs: add support for sub-pagesize writeback without buffer_heads Christoph Hellwig
2018-07-03 12:36   ` Brian Foster
2018-07-03 22:05     ` Darrick J. Wong
2018-07-08 15:16       ` Christoph Hellwig
2018-07-10  1:02         ` Brian Foster
2018-07-10 12:15           ` Christoph Hellwig
2018-07-11 10:58             ` Brian Foster
2018-07-02 14:58 ` [PATCH 22/22] xfs: update my copyrights for the writeback and iomap code Christoph Hellwig
2018-07-03 12:36   ` Brian Foster
2018-07-03 21:51   ` Darrick J. Wong
