linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* buffered writes without buffer heads in xfs and iomap v5
@ 2018-05-31 18:07 Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 01/21] fs: factor out a __generic_write_end helper Christoph Hellwig
                   ` (20 more replies)
  0 siblings, 21 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Hi all,

this series adds support for buffered writes without buffer heads to
the iomap and XFS code.

For now this series only contains support for block size == PAGE_SIZE,
with the 4k support split into a separate series.


A git tree is available at:

    git://git.infradead.org/users/hch/xfs.git xfs-iomap-write.5

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-iomap-write.5

Changes since v4:
 - make the logic more clear in xfs_bmap_punch_delalloc_range
 - fix a missing unlock in xfs_bmap_punch_delalloc_range
 - always look up COW extents to check for overlaps
 - lots of patch reshuffling and splitting without impact to the final result

Changes since v3:
 - iterate backwards in xfs_bmap_punch_delalloc_range
 - remove the cow_valid variable in xfs_reflink_trim_irec_to_next_cow
 - additional trivial xfs_map_blocks simplifications
 - split the read side into a separate prep series
 - moved the SEEK_HOLE/DATA patches not strictly required out of the series

Changes since v2:
 - minor page_seek_hole_data tweaks
 - don't read data entirely covered by the write operation in write_begin
 - fix zeroing on write_begin I/O failure
 - remove iomap_block_needs_zeroing to make the code more clear
 - update comments on __do_page_cache_readahead

Changes since v1:
 - fix the iomap_readpages error handling
 - use unsigned file offsets in a few places to avoid arithmetic overflows
 - allocate a iomap_page in iomap_page_mkwrite to fix generic/095
 - improve a few comments
 - add more asserts
 - warn about truncated block numbers from ->bmap
 - new patch to change the __do_page_cache_readahead return value to
   unsigned int
 - remove an incorrectly added empty line
 - make inline data an explicit iomap type instead of a flag
 - add a IOMAP_F_BUFFER_HEAD flag to force use of buffers heads for gfs2,
   and keep the basic buffer head infrastructure around for now.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/21] fs: factor out a __generic_write_end helper
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 02/21] iomap: add initial support for writes without buffer heads Christoph Hellwig
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/buffer.c   | 67 +++++++++++++++++++++++++++------------------------
 fs/internal.h |  2 ++
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index cabc045f483d..aba2a948b235 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page)
+{
+	loff_t old_size = inode->i_size;
+	bool i_size_changed = false;
+
+	/*
+	 * No need to use i_size_read() here, the i_size cannot change under us
+	 * because we hold i_rwsem.
+	 *
+	 * But it's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	if (pos + copied > inode->i_size) {
+		i_size_write(inode, pos + copied);
+		i_size_changed = true;
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (i_size_changed)
+		mark_inode_dirty(inode);
+	return copied;
+}
+
 int block_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
@@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
 {
-	struct inode *inode = mapping->host;
-	loff_t old_size = inode->i_size;
-	int i_size_changed = 0;
-
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
-	/*
-	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold i_mutex.
-	 *
-	 * But it's important to update i_size while still holding page lock:
-	 * page writeout could otherwise come in and zero beyond i_size.
-	 */
-	if (pos+copied > inode->i_size) {
-		i_size_write(inode, pos+copied);
-		i_size_changed = 1;
-	}
-
-	unlock_page(page);
-	put_page(page);
-
-	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
-	/*
-	 * Don't mark the inode dirty under page lock. First, it unnecessarily
-	 * makes the holding time of page lock longer. Second, it forces lock
-	 * ordering of page lock and transaction start for journaling
-	 * filesystems.
-	 */
-	if (i_size_changed)
-		mark_inode_dirty(inode);
-
-	return copied;
+	return __generic_write_end(mapping->host, pos, copied, page);
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index e08972db0303..b955232d3d49 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page);
 
 /*
  * char_dev.c
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/21] iomap: add initial support for writes without buffer heads
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 01/21] fs: factor out a __generic_write_end helper Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 03/21] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

For now just limited to blocksize == PAGE_SIZE, where we can simply read
in the full page in write begin, and just set the whole page dirty after
copying data into it.  This code is enabled by default and XFS will now
be feed pages without buffer heads in ->writepage and ->writepages.

If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old
path will still be used, this both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c            | 128 ++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iomap.c    |   6 +-
 include/linux/iomap.h |   2 +
 3 files changed, 123 insertions(+), 13 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 49119e64edc3..5b81d4463520 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -328,6 +328,48 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static int
+iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
+		unsigned poff, unsigned plen, unsigned from, unsigned to,
+		struct iomap *iomap)
+{
+	struct bio_vec bvec;
+	struct bio bio;
+
+	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
+		zero_user_segments(page, poff, from, to, poff + plen);
+		return 0;
+	}
+
+	bio_init(&bio, &bvec, 1);
+	bio.bi_opf = REQ_OP_READ;
+	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
+	bio_set_dev(&bio, iomap->bdev);
+	__bio_add_page(&bio, page, plen, poff);
+	return submit_bio_wait(&bio);
+}
+
+static int
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		struct page *page, struct iomap *iomap)
+{
+	loff_t block_size = i_blocksize(inode);
+	loff_t block_start = pos & ~(block_size - 1);
+	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
+	unsigned poff = block_start & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
+	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
+
+	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+
+	if (PageUptodate(page))
+		return 0;
+	if (from <= poff && to >= poff + plen)
+		return 0;
+	return iomap_read_page_sync(inode, block_start, page,
+			poff, plen, from, to, iomap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -345,7 +387,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
-	status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	else
+		status = __iomap_write_begin(inode, pos, len, page, iomap);
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
@@ -358,14 +403,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	return status;
 }
 
+int
+iomap_set_page_dirty(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	int newly_dirty;
+
+	if (unlikely(!mapping))
+		return !TestSetPageDirty(page);
+
+	/*
+	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * synchronized with per-memcg dirty page counters.
+	 */
+	lock_page_memcg(page);
+	newly_dirty = !TestSetPageDirty(page);
+	if (newly_dirty)
+		__set_page_dirty(page, mapping, 0);
+	unlock_page_memcg(page);
+
+	if (newly_dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	return newly_dirty;
+}
+EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
+
+static int
+__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		unsigned copied, struct page *page, struct iomap *iomap)
+{
+	flush_dcache_page(page);
+
+	/*
+	 * The blocks that were entirely written will now be uptodate, so we
+	 * don't have to worry about a readpage reading them and overwriting a
+	 * partial write.  However if we have encountered a short write and only
+	 * partially written into a block, it will not be marked uptodate, so a
+	 * readpage might come in and destroy our partial write.
+	 *
+	 * Do the simplest thing, and just treat any short write to a non
+	 * uptodate page as a zero-length write, and force the caller to redo
+	 * the whole thing.
+	 */
+	if (unlikely(copied < len && !PageUptodate(page))) {
+		copied = 0;
+	} else {
+		SetPageUptodate(page);
+		iomap_set_page_dirty(page);
+	}
+	return __generic_write_end(inode, pos, copied, page);
+}
+
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
-	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-			copied, page, NULL);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+				copied, page, NULL);
+	else
+		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
@@ -420,7 +520,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -514,7 +615,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -566,7 +667,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -652,11 +753,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct page *page = data;
 	int ret;
 
-	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
-	if (ret)
-		return ret;
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+		if (ret)
+			return ret;
+		block_commit_write(page, 0, length);
+	} else {
+		WARN_ON_ONCE(!PageUptodate(page));
+		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+	}
 
-	block_commit_write(page, 0, length);
 	return length;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..da6d1995e460 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
@@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
 
 out_finish:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7300d30ca495..4d3d9d0cd69f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,6 +30,7 @@ struct vm_fault;
  */
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
+#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
@@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
+int iomap_set_page_dirty(struct page *page);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/21] xfs: simplify xfs_bmap_punch_delalloc_range
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 01/21] fs: factor out a __generic_write_end helper Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 02/21] iomap: add initial support for writes without buffer heads Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 04/21] xfs: simplify xfs_aops_discard_page Christoph Hellwig
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Instead of using xfs_bmapi_read to find delalloc extents and then punch
them out using xfs_bunmapi, opencode the loop to iterate over the extents
and call xfs_bmap_del_extent_delay directly.  This both simplifies the
code and reduces the number of extent tree lookups required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c | 85 ++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 53 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 06badcbadeb4..0070b877ed94 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -695,12 +695,10 @@ xfs_getbmap(
 }
 
 /*
- * dead simple method of punching delalyed allocation blocks from a range in
- * the inode. Walks a block at a time so will be slow, but is only executed in
- * rare error cases so the overhead is not critical. This will always punch out
- * both the start and end blocks, even if the ranges only partially overlap
- * them, so it is up to the caller to ensure that partial blocks are not
- * passed in.
+ * Dead simple method of punching delalyed allocation blocks from a range in
+ * the inode.  This will always punch out both the start and end blocks, even
+ * if the ranges only partially overlap them, so it is up to the caller to
+ * ensure that partial blocks are not passed in.
  */
 int
 xfs_bmap_punch_delalloc_range(
@@ -708,63 +706,44 @@ xfs_bmap_punch_delalloc_range(
 	xfs_fileoff_t		start_fsb,
 	xfs_fileoff_t		length)
 {
-	xfs_fileoff_t		remaining = length;
+	struct xfs_ifork	*ifp = &ip->i_df;
+	xfs_fileoff_t		end_fsb = start_fsb + length;
+	struct xfs_bmbt_irec	got, del;
+	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
-	do {
-		int		done;
-		xfs_bmbt_irec_t	imap;
-		int		nimaps = 1;
-		xfs_fsblock_t	firstblock;
-		struct xfs_defer_ops dfops;
+	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+		if (error)
+			return error;
+	}
 
-		/*
-		 * Map the range first and check that it is a delalloc extent
-		 * before trying to unmap the range. Otherwise we will be
-		 * trying to remove a real extent (which requires a
-		 * transaction) or a hole, which is probably a bad idea...
-		 */
-		error = xfs_bmapi_read(ip, start_fsb, 1, &imap, &nimaps,
-				       XFS_BMAPI_ENTIRE);
+	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
+		return 0;
 
-		if (error) {
-			/* something screwed, just bail */
-			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-				xfs_alert(ip->i_mount,
-			"Failed delalloc mapping lookup ino %lld fsb %lld.",
-						ip->i_ino, start_fsb);
-			}
-			break;
-		}
-		if (!nimaps) {
-			/* nothing there */
-			goto next_block;
-		}
-		if (imap.br_startblock != DELAYSTARTBLOCK) {
-			/* been converted, ignore */
-			goto next_block;
-		}
-		WARN_ON(imap.br_blockcount == 0);
+	while (got.br_startoff + got.br_blockcount > start_fsb) {
+		del = got;
+		xfs_trim_extent(&del, start_fsb, length);
 
 		/*
-		 * Note: while we initialise the firstblock/dfops pair, they
-		 * should never be used because blocks should never be
-		 * allocated or freed for a delalloc extent and hence we need
-		 * don't cancel or finish them after the xfs_bunmapi() call.
+		 * A delete can push the cursor forward. Step back to the
+		 * previous extent on non-delalloc or extents outside the
+		 * target range.
 		 */
-		xfs_defer_init(&dfops, &firstblock);
-		error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
-					&dfops, &done);
-		if (error)
-			break;
+		if (!del.br_blockcount ||
+		    !isnullstartblock(del.br_startblock)) {
+			if (!xfs_iext_prev_extent(ifp, &icur, &got))
+				break;
+			continue;
+		}
 
-		ASSERT(!xfs_defer_has_unfinished_work(&dfops));
-next_block:
-		start_fsb++;
-		remaining--;
-	} while(remaining > 0);
+		error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
+						  &got, &del);
+		if (error || !xfs_iext_get_extent(ifp, &icur, &got))
+			break;
+	}
 
 	return error;
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/21] xfs: simplify xfs_aops_discard_page
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 03/21] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 05/21] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Instead of looking at the buffer heads to see if a block is delalloc just
call xfs_bmap_punch_delalloc_range on the whole page - this will leave
any non-delalloc block intact and handle the iteration for us.  As a side
effect one more place stops caring about buffer heads and we can remove the
xfs_check_page_type function entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 85 +++++------------------------------------------
 1 file changed, 9 insertions(+), 76 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c631c457b444..f2333e351e07 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -711,49 +711,6 @@ xfs_map_at_offset(
 	clear_buffer_unwritten(bh);
 }
 
-/*
- * Test if a given page contains at least one buffer of a given @type.
- * If @check_all_buffers is true, then we walk all the buffers in the page to
- * try to find one of the type passed in. If it is not set, then the caller only
- * needs to check the first buffer on the page for a match.
- */
-STATIC bool
-xfs_check_page_type(
-	struct page		*page,
-	unsigned int		type,
-	bool			check_all_buffers)
-{
-	struct buffer_head	*bh;
-	struct buffer_head	*head;
-
-	if (PageWriteback(page))
-		return false;
-	if (!page->mapping)
-		return false;
-	if (!page_has_buffers(page))
-		return false;
-
-	bh = head = page_buffers(page);
-	do {
-		if (buffer_unwritten(bh)) {
-			if (type == XFS_IO_UNWRITTEN)
-				return true;
-		} else if (buffer_delay(bh)) {
-			if (type == XFS_IO_DELALLOC)
-				return true;
-		} else if (buffer_dirty(bh) && buffer_mapped(bh)) {
-			if (type == XFS_IO_OVERWRITE)
-				return true;
-		}
-
-		/* If we are only checking the first buffer, we are done now. */
-		if (!check_all_buffers)
-			break;
-	} while ((bh = bh->b_this_page) != head);
-
-	return false;
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -785,9 +742,6 @@ xfs_vm_invalidatepage(
  * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
  * truncation without a transaction as there is no space left for block
  * reservation (typically why we see a ENOSPC in writeback).
- *
- * This is not a performance critical path, so for now just do the punching a
- * buffer head at a time.
  */
 STATIC void
 xfs_aops_discard_page(
@@ -795,47 +749,26 @@ xfs_aops_discard_page(
 {
 	struct inode		*inode = page->mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	struct buffer_head	*bh, *head;
+	struct xfs_mount	*mp = ip->i_mount;
 	loff_t			offset = page_offset(page);
+	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, offset);
+	int			error;
 
-	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
-		goto out_invalidate;
-
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+	if (XFS_FORCED_SHUTDOWN(mp))
 		goto out_invalidate;
 
-	xfs_alert(ip->i_mount,
+	xfs_alert(mp,
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	bh = head = page_buffers(page);
-	do {
-		int		error;
-		xfs_fileoff_t	start_fsb;
-
-		if (!buffer_delay(bh))
-			goto next_buffer;
-
-		start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
-		if (error) {
-			/* something screwed, just bail */
-			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-				xfs_alert(ip->i_mount,
-			"page discard unable to remove delalloc mapping.");
-			}
-			break;
-		}
-next_buffer:
-		offset += i_blocksize(inode);
-
-	} while ((bh = bh->b_this_page) != head);
-
+	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+			PAGE_SIZE / i_blocksize(inode));
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error && !XFS_FORCED_SHUTDOWN(mp))
+		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
-	return;
 }
 
 static int
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/21] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 04/21] xfs: simplify xfs_aops_discard_page Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 06/21] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Both callers want the same looking, so do it only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c      | 2 --
 fs/xfs/xfs_bmap_util.c | 9 +++++----
 fs/xfs/xfs_iomap.c     | 3 ---
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f2333e351e07..5dd09e83c81c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -761,10 +761,8 @@ xfs_aops_discard_page(
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 			PAGE_SIZE / i_blocksize(inode));
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0070b877ed94..6690a038892d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -712,16 +712,15 @@ xfs_bmap_punch_delalloc_range(
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
-		return 0;
+		goto out_unlock;
 
 	while (got.br_startoff + got.br_blockcount > start_fsb) {
 		del = got;
@@ -745,6 +744,8 @@ xfs_bmap_punch_delalloc_range(
 			break;
 	}
 
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index da6d1995e460..f949f0dd7382 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1203,11 +1203,8 @@ xfs_file_iomap_end_delalloc(
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 					       end_fsb - start_fsb);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
 		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_alert(mp, "%s: unable to clean up ino %lld",
 				__func__, ip->i_ino);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/21] xfs: do not set the page uptodate in xfs_writepage_map
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 05/21] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 07/21] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We already track the page uptodate status based on the buffer uptodate
status, which is updated whenever reading or zeroing blocks.

This code has been there since commit a ptool commit in 2002, which
claims to:

    "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
    major changes to actually fit.

and isn't present in other writepage implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5dd09e83c81c..236154d2572f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -850,7 +850,6 @@ xfs_writepage_map(
 	uint64_t		offset;
 	int			error = 0;
 	int			count = 0;
-	int			uptodate = 1;
 	unsigned int		new_type;
 
 	bh = head = page_buffers(page);
@@ -858,8 +857,6 @@ xfs_writepage_map(
 	do {
 		if (offset >= end_offset)
 			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
 
 		/*
 		 * set_page_dirty dirties all buffers in a page, independent
@@ -923,9 +920,6 @@ xfs_writepage_map(
 
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
-	if (uptodate && bh == head)
-		SetPageUptodate(page);
-
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
 out:
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/21] xfs: don't clear imap_valid for a non-uptodate buffers
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 06/21] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Finding a buffer that isn't uptodate doesn't invalidate the mapping for
any given block.  The last_sector check will already take care of starting
another ioend as soon as we find any non-update buffer, and if the current
mapping doesn't include the next uptodate buffer the xfs_imap_valid check
will take care of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 236154d2572f..25d112a4bcf5 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -864,10 +864,8 @@ xfs_writepage_map(
 		 * meaningless for holes (!mapped && uptodate), so skip
 		 * buffers covering holes here.
 		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
+		if (!buffer_mapped(bh) && buffer_uptodate(bh))
 			continue;
-		}
 
 		if (buffer_unwritten(bh))
 			new_type = XFS_IO_UNWRITTEN;
@@ -880,11 +878,8 @@ xfs_writepage_map(
 				ASSERT(buffer_mapped(bh));
 			/*
 			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
+			 * written to disk.
 			 */
-			wpc->imap_valid = false;
 			continue;
 		}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 07/21] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:25   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We want to be able to use the extent state as a reliably indicator for
the type of I/O, and stop using the buffer head state.  For this we
need to stop using the XFS_BMAPI_IGSTATE so that we don't see merged
extents of different types.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 25d112a4bcf5..2966d62b0594 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -385,7 +385,6 @@ xfs_map_blocks(
 	ssize_t			count = i_blocksize(inode);
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			error = 0;
-	int			bmapi_flags = XFS_BMAPI_ENTIRE;
 	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
@@ -405,8 +404,6 @@ xfs_map_blocks(
 		return 0;
 
 	ASSERT(type != XFS_IO_COW);
-	if (type == XFS_IO_UNWRITTEN)
-		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
@@ -418,7 +415,7 @@ xfs_map_blocks(
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, bmapi_flags);
+				imap, &nimaps, XFS_BMAPI_ENTIRE);
 	/*
 	 * Truncate an overwrite extent if there's a pending CoW
 	 * reservation before the end of this extent.  This forces us
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:25   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 10/21] xfs: remove xfs_map_cow Christoph Hellwig
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We already have to check for overlapping COW extents everytime we
come back to a page in xfs_writepage_map / xfs_map_cow, so this
additional trim is not required.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c    |  7 -------
 fs/xfs/xfs_reflink.c | 33 ---------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 fs/xfs/xfs_trace.h   |  1 -
 4 files changed, 43 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2966d62b0594..9b5a4b6e268c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -416,13 +416,6 @@ xfs_map_blocks(
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				imap, &nimaps, XFS_BMAPI_ENTIRE);
-	/*
-	 * Truncate an overwrite extent if there's a pending CoW
-	 * reservation before the end of this extent.  This forces us
-	 * to come back to writepage to take care of the CoW.
-	 */
-	if (nimaps && type == XFS_IO_OVERWRITE)
-		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 713e857d9ffa..d0092c864458 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -514,39 +514,6 @@ xfs_reflink_find_cow_mapping(
 	return true;
 }
 
-/*
- * Trim an extent to end at the next CoW reservation past offset_fsb.
- */
-void
-xfs_reflink_trim_irec_to_next_cow(
-	struct xfs_inode		*ip,
-	xfs_fileoff_t			offset_fsb,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	if (!xfs_is_reflink_inode(ip))
-		return;
-
-	/* Find the extent in the CoW fork. */
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return;
-
-	/* This is the extent before; try sliding up one. */
-	if (got.br_startoff < offset_fsb) {
-		if (!xfs_iext_next_extent(ifp, &icur, &got))
-			return;
-	}
-
-	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
-		return;
-
-	imap->br_blockcount = got.br_startoff - imap->br_startoff;
-	trace_xfs_reflink_trim_irec(ip, imap);
-}
-
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 701487bab468..c0a25609007b 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -34,8 +34,6 @@ extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
 extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
 		struct xfs_bmbt_irec *imap);
-extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
-		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 9d4c4ca24fe6..041045a8d65c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3228,7 +3228,6 @@ DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
 DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
-DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/21] xfs: remove xfs_map_cow
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:25   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We can handle the existing cow mapping case as a special case directly
in xfs_writepage_map, and share code for allocating delalloc blocks
with regular I/O in xfs_map_blocks.  This means we need to always
call xfs_map_blocks for reflink inodes, but we can still skip most of
the work if it turns out that there is no COW mapping overlapping the
current block.

As a subtle detail we need to start caching holes in the wpc to deal
with the case of COW reservations between EOF.  But we'll need that
infrastructure later anyway, so this is no big deal.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 184 ++++++++++++++++++++++------------------------
 fs/xfs/xfs_aops.h |   4 +-
 2 files changed, 90 insertions(+), 98 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 9b5a4b6e268c..d536509ca05b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -375,70 +375,107 @@ xfs_end_bio(
 
 STATIC int
 xfs_map_blocks(
+	struct xfs_writepage_ctx *wpc,
 	struct inode		*inode,
-	loff_t			offset,
-	struct xfs_bmbt_irec	*imap,
-	int			type)
+	loff_t			offset)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
+	struct xfs_bmbt_irec	imap;
+	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * Truncate can race with writeback since writeback doesn't take the
-	 * iolock and truncate decreases the file size before it starts
-	 * truncating the pages between new_size and old_size.  Therefore, we
-	 * can end up in the situation where writeback gets a CoW fork mapping
-	 * but the truncate makes the mapping invalid and we end up in here
-	 * trying to get a new mapping.  Bail out here so that we simply never
-	 * get a valid mapping and so we drop the write altogether.  The page
-	 * truncation will kill the contents anyway.
-	 */
-	if (type == XFS_IO_COW && offset > i_size_read(inode))
-		return 0;
-
-	ASSERT(type != XFS_IO_COW);
-
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (xfs_is_reflink_inode(ip) &&
+	    xfs_reflink_find_cow_mapping(ip, offset, &imap)) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		/*
+		 * Truncate can race with writeback since writeback doesn't
+		 * take the iolock and truncate decreases the file size before
+		 * it starts truncating the pages between new_size and old_size.
+		 * Therefore, we can end up in the situation where writeback
+		 * gets a CoW fork mapping but the truncate makes the mapping
+		 * invalid and we end up in here trying to get a new mapping.
+		 * bail out here so that we simply never get a valid mapping
+		 * and so we drop the write altogether.  The page truncation
+		 * will kill the contents anyway.
+		 */
+		if (offset > i_size_read(inode)) {
+			whichfork = XFS_IO_HOLE;
+			return 0;
+		}
+		whichfork = XFS_COW_FORK;
+		wpc->io_type = XFS_IO_COW;
+		goto allocate_blocks;
+	}
+
+	/*
+	 * Map valid and no COW extent in the way?  We're done.
+	 */
+	if (wpc->imap_valid) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		return 0;
+	}
+
+	/*
+	 * If we don't have a valid map, now it's time to get a new one for this
+	 * offset.  This will convert delayed allocations (including COW ones)
+	 * into real extents.
+	 */
 	if (offset > mp->m_super->s_maxbytes - count)
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, XFS_BMAPI_ENTIRE);
+				&imap, &nimaps, XFS_BMAPI_ENTIRE);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
 	if (error)
 		return error;
 
-	if (type == XFS_IO_DELALLOC &&
-	    (!nimaps || isnullstartblock(imap->br_startblock))) {
-		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
-				imap);
-		if (!error)
-			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
-		return error;
+	if (!nimaps) {
+		/*
+		 * Lookup returns no match? Beyond eof? regardless,
+		 * return it as a hole so we don't write it
+		 */
+		imap.br_startoff = offset_fsb;
+		imap.br_blockcount = end_fsb - offset_fsb;
+		imap.br_startblock = HOLESTARTBLOCK;
+		wpc->io_type = XFS_IO_HOLE;
+	} else if (imap.br_startblock == HOLESTARTBLOCK) {
+		/* landed in a hole */
+		wpc->io_type = XFS_IO_HOLE;
 	}
 
+	if (wpc->io_type == XFS_IO_DELALLOC &&
+	    (!nimaps || isnullstartblock(imap.br_startblock)))
+		goto allocate_blocks;
+
 #ifdef DEBUG
-	if (type == XFS_IO_UNWRITTEN) {
+	if (wpc->io_type == XFS_IO_UNWRITTEN) {
 		ASSERT(nimaps);
-		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
+		ASSERT(imap.br_startblock != HOLESTARTBLOCK);
+		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
 	}
 #endif
-	if (nimaps)
-		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
+	wpc->imap = imap;
+	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
+	return 0;
+allocate_blocks:
+	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap);
+	if (error)
+		return error;
+	wpc->imap = imap;
+	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
 	return 0;
 }
 
@@ -759,56 +796,6 @@ xfs_aops_discard_page(
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
 }
 
-static int
-xfs_map_cow(
-	struct xfs_writepage_ctx *wpc,
-	struct inode		*inode,
-	loff_t			offset,
-	unsigned int		*new_type)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_bmbt_irec	imap;
-	bool			is_cow = false;
-	int			error;
-
-	/*
-	 * If we already have a valid COW mapping keep using it.
-	 */
-	if (wpc->io_type == XFS_IO_COW) {
-		wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
-		if (wpc->imap_valid) {
-			*new_type = XFS_IO_COW;
-			return 0;
-		}
-	}
-
-	/*
-	 * Else we need to check if there is a COW mapping at this offset.
-	 */
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	is_cow = xfs_reflink_find_cow_mapping(ip, offset, &imap);
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-	if (!is_cow)
-		return 0;
-
-	/*
-	 * And if the COW mapping has a delayed extent here we need to
-	 * allocate real space for it now.
-	 */
-	if (isnullstartblock(imap.br_startblock)) {
-		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK, offset,
-				&imap);
-		if (error)
-			return error;
-	}
-
-	wpc->io_type = *new_type = XFS_IO_COW;
-	wpc->imap_valid = true;
-	wpc->imap = imap;
-	return 0;
-}
-
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -873,10 +860,13 @@ xfs_writepage_map(
 			continue;
 		}
 
-		if (xfs_is_reflink_inode(XFS_I(inode))) {
-			error = xfs_map_cow(wpc, inode, offset, &new_type);
-			if (error)
-				goto out;
+		/*
+		 * If we already have a valid COW mapping keep using it.
+		 */
+		if (wpc->io_type == XFS_IO_COW &&
+		    xfs_imap_valid(inode, &wpc->imap, offset)) {
+			wpc->imap_valid = true;
+			new_type = XFS_IO_COW;
 		}
 
 		if (wpc->io_type != new_type) {
@@ -887,22 +877,22 @@ xfs_writepage_map(
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 offset);
-		if (!wpc->imap_valid) {
-			error = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
+		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
+			error = xfs_map_blocks(wpc, inode, offset);
 			if (error)
 				goto out;
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 offset);
 		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
-			count++;
-		}
 
+		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
+			continue;
+
+		lock_buffer(bh);
+		if (wpc->io_type != XFS_IO_OVERWRITE)
+			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
+		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
+		count++;
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 69346d460dfa..b2ef5b661761 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -29,6 +29,7 @@ enum {
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
+	XFS_IO_HOLE,		/* covers region without any block allocation */
 };
 
 #define XFS_IO_TYPES \
@@ -36,7 +37,8 @@ enum {
 	{ XFS_IO_DELALLOC,		"delalloc" }, \
 	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
 	{ XFS_IO_OVERWRITE,		"overwrite" }, \
-	{ XFS_IO_COW,			"CoW" }
+	{ XFS_IO_COW,			"CoW" }, \
+	{ XFS_IO_HOLE,			"hole" }
 
 /*
  * Structure for buffered I/O completions.
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 10/21] xfs: remove xfs_map_cow Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:25   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 12/21] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Calling it file_offset makes the usage more clear, especially with
a new poffset variable that will be added soon for the offset inside
the page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index d536509ca05b..6f57a785f4d3 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -824,15 +824,15 @@ xfs_writepage_map(
 	struct xfs_ioend	*ioend, *next;
 	struct buffer_head	*bh, *head;
 	ssize_t			len = i_blocksize(inode);
-	uint64_t		offset;
+	uint64_t		file_offset;	/* file offset of page */
 	int			error = 0;
 	int			count = 0;
 	unsigned int		new_type;
 
 	bh = head = page_buffers(page);
-	offset = page_offset(page);
+	file_offset = page_offset(page);
 	do {
-		if (offset >= end_offset)
+		if (file_offset >= end_offset)
 			break;
 
 		/*
@@ -864,7 +864,7 @@ xfs_writepage_map(
 		 * If we already have a valid COW mapping keep using it.
 		 */
 		if (wpc->io_type == XFS_IO_COW &&
-		    xfs_imap_valid(inode, &wpc->imap, offset)) {
+		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
 			wpc->imap_valid = true;
 			new_type = XFS_IO_COW;
 		}
@@ -876,13 +876,13 @@ xfs_writepage_map(
 
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
 		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
-			error = xfs_map_blocks(wpc, inode, offset);
+			error = xfs_map_blocks(wpc, inode, file_offset);
 			if (error)
 				goto out;
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
 		}
 
 		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
@@ -890,10 +890,10 @@ xfs_writepage_map(
 
 		lock_buffer(bh);
 		if (wpc->io_type != XFS_IO_OVERWRITE)
-			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
+			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
 		count++;
-	} while (offset += len, ((bh = bh->b_this_page) != head));
+	} while (file_offset += len, ((bh = bh->b_this_page) != head));
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/21] xfs: make xfs_writepage_map extent map centric
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:26   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

xfs_writepage_map() iterates over the bufferheads on a page to decide
what sort of IO to do and what actions to take.  However, when it comes
to reflink and deciding when it needs to execute a COW operation, we no
longer look at the bufferhead state but instead we ignore than and look
up internal state held in the COW fork extent list.

This means xfs_writepage_map() is somewhat confused. It does stuff, then
ignores it, then tries to handle the impedence mismatch by shovelling the
results inside the existing mapping code.  It works, but it's a bit of a
mess and it makes it hard to fix the cached map bug that the writepage
code currently has.

To unify the two different mechanisms, we first have to choose a direction.
That's already been set - we're de-emphasising bufferheads so they are no
longer a control structure as we need to do taht to allow for eventual
removal.  Hence we need to move away from looking at bufferhead state to
determine what operations we need to perform.

We can't completely get rid of bufferheads yet - they do contain some
state that is absolutely necessary, such as whether that part of the page
contains valid data or not (buffer_uptodate()).  Other state in the
bufferhead is redundant:

	BH_dirty - the page is dirty, so we can ignore this and just
		write it
	BH_delay - we have delalloc extent info in the DATA fork extent
		tree
	BH_unwritten - same as BH_delay
	BH_mapped - indicates we've already used it once for IO and it is
		mapped to a disk address. Needs to be ignored for COW
		blocks.

The BH_mapped flag is an interesting case - it's supposed to indicate that
it's already mapped to disk and so we can just use it "as is".  In theory,
we don't even have to do an extent lookup to find where to write it too,
but we have to do that anyway to determine we are actually writing over a
valid extent.  Hence it's not even serving the purpose of avoiding a an
extent lookup during writeback, and so we can pretty much ignore it.
Especially as we have to ignore it for COW operations...

Therefore, use the extent map as the source of information to tell us
what actions we need to take and what sort of IO we should perform.  The
first step is to have xfs_map_blocks() set the io type according to what
it looks up.  This means it can easily handle both normal overwrite and
COW cases.  The only thing we also need to add is the ability to return
hole mappings.

We need to return and cache hole mappings now for the case of multiple
blocks per page.  We no longer use the BH_mapped to indicate a block over
a hole, so we have to get that info from xfs_map_blocks().  We cache it so
that holes that span two pages don't need separate lookups.  This allows us
to avoid ever doing write IO over a hole, too.

Now that we have xfs_map_blocks() returning both a cached map and the type
of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
the bufferhead control. It's also much simplified because it doesn't need
to explicitly handle COW operations.  Instead of iterating bufferheads, it
iterates blocks within the page and then looks up what per-block state is
required from the appropriate bufferhead.  It then validates the cached
map, and if it's not valid, we get a new map.  If we don't get a valid map
or it's over a hole, we skip the block.

At this point, we have to remap the bufferhead via xfs_map_at_offset().
As previously noted, we had to do this even if the buffer was already
mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-
written delalloc extents beyond EOF can be reported at XFS_IO_OVERWRITE.
Bufferheads that span such regions still need their BH_Delay flags cleared
and their block numbers calculated, so we now unconditionally map each
bufferhead before submission.

But wait! There's more - remember the old "treat unwritten extents as
holes on read" hack?  Yeah, that means we can have a dirty page with
unmapped, unwritten bufferheads that contain data!  What makes these so
special is that the unwritten "hole" bufferheads do not have a valid block
device pointer, so if we attempt to write them xfs_add_to_ioend() blows
up. So we make xfs_map_at_offset() do the "realtime or data device"
lookup from the inode and ignore what was or wasn't put into the
bufferhead when the buffer was instantiated.

The astute reader will have realised by now that this code treats
unwritten extents in multiple-blocks-per-page situations differently.
If we get any combination of unwritten blocks on a dirty page that contain
valid data in the page, we're going to convert them to real extents.  This
can actually be a win, because it means that pages with interleaving
unwritten and written blocks will get converted to a single written extent
with zeros replacing the interspersed unwritten blocks.  This is actually
good for reducing extent list and conversion overhead, and it means we
issue a contiguous IO instead of lots of little ones.  The downside is
that we use up a little extra IO bandwidth.  Neither of these seem like a
bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
bandwidth to burn and the lower Io latency/CPU overhead of fewer, larger
IOs will result in better performance on them...

As a result of all this, the only state we actually care about from the
bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
pass some information to the bio via xfs_add_to_ioend(), but that is
trivial to separate and pass explicitly.  This means we really only need
1 bit of state per block per page from the buffered write path in the
writeback path.  Everything else we do with the bufferhead is purely to
make the buffered IO front end continue to work correctly. i.e we've
pretty much marginalised bufferheads in the writeback path completely.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
[hch: forward port + slight refactoring]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 96 ++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6f57a785f4d3..a0ecfd63b858 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -454,19 +454,19 @@ xfs_map_blocks(
 	} else if (imap.br_startblock == HOLESTARTBLOCK) {
 		/* landed in a hole */
 		wpc->io_type = XFS_IO_HOLE;
-	}
-
-	if (wpc->io_type == XFS_IO_DELALLOC &&
-	    (!nimaps || isnullstartblock(imap.br_startblock)))
-		goto allocate_blocks;
+	} else {
+		if (isnullstartblock(imap.br_startblock)) {
+			/* got a delalloc extent */
+			wpc->io_type = XFS_IO_DELALLOC;
+			goto allocate_blocks;
+		}
 
-#ifdef DEBUG
-	if (wpc->io_type == XFS_IO_UNWRITTEN) {
-		ASSERT(nimaps);
-		ASSERT(imap.br_startblock != HOLESTARTBLOCK);
-		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
+		if (imap.br_state == XFS_EXT_UNWRITTEN)
+			wpc->io_type = XFS_IO_UNWRITTEN;
+		else
+			wpc->io_type = XFS_IO_OVERWRITE;
 	}
-#endif
+
 	wpc->imap = imap;
 	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
 	return 0;
@@ -736,6 +736,14 @@ xfs_map_at_offset(
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
 	clear_buffer_unwritten(bh);
+
+	/*
+	 * If this is a realtime file, data may be on a different device.
+	 * to that pointed to from the buffer_head b_bdev currently. We can't
+	 * trust that the bufferhead has a already been mapped correctly, so
+	 * set the bdev now.
+	 */
+	bh->b_bdev = xfs_find_bdev_for_inode(inode);
 }
 
 STATIC void
@@ -822,61 +830,46 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh, *head;
+	struct buffer_head	*bh;
 	ssize_t			len = i_blocksize(inode);
 	uint64_t		file_offset;	/* file offset of page */
+	unsigned		poffset;	/* offset into page */
 	int			error = 0;
 	int			count = 0;
-	unsigned int		new_type;
 
-	bh = head = page_buffers(page);
+	/*
+	 * Walk the blocks on the page, and we we run off then end of the
+	 * current map or find the current map invalid, grab a new one.
+	 * We only use bufferheads here to check per-block state - they no
+	 * longer control the iteration through the page. This allows us to
+	 * replace the bufferhead with some other state tracking mechanism in
+	 * future.
+	 */
 	file_offset = page_offset(page);
-	do {
+	bh = page_buffers(page);
+	for (poffset = 0;
+	     poffset < PAGE_SIZE;
+	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+		/* past the range we are writing, so nothing more to write. */
 		if (file_offset >= end_offset)
 			break;
 
-		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
-		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh))
-			continue;
-
-		if (buffer_unwritten(bh))
-			new_type = XFS_IO_UNWRITTEN;
-		else if (buffer_delay(bh))
-			new_type = XFS_IO_DELALLOC;
-		else if (buffer_uptodate(bh))
-			new_type = XFS_IO_OVERWRITE;
-		else {
+		if (!buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.
-			 */
 			continue;
 		}
 
-		/*
-		 * If we already have a valid COW mapping keep using it.
-		 */
-		if (wpc->io_type == XFS_IO_COW &&
-		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
-			wpc->imap_valid = true;
-			new_type = XFS_IO_COW;
-		}
-
-		if (wpc->io_type != new_type) {
-			wpc->io_type = new_type;
-			wpc->imap_valid = false;
-		}
-
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
 							 file_offset);
+		/*
+		 * If we don't have a valid map, now it's time to get a new one
+		 * for this offset.  This will convert delayed allocations
+		 * (including COW ones) into real extents.  If we return without
+		 * a valid map, it means we landed in a hole and we skip the
+		 * block.
+		 */
 		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
 			error = xfs_map_blocks(wpc, inode, file_offset);
 			if (error)
@@ -889,11 +882,10 @@ xfs_writepage_map(
 			continue;
 
 		lock_buffer(bh);
-		if (wpc->io_type != XFS_IO_OVERWRITE)
-			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
 		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
 		count++;
-	} while (file_offset += len, ((bh = bh->b_this_page) != head));
+	}
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 12/21] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-06-04 12:27   ` Brian Foster
  2018-05-31 18:07 ` [PATCH 14/21] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 6 ++----
 fs/xfs/libxfs/xfs_bmap.h | 3 ---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7b0e2b551e23..4b5e014417d2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3799,8 +3799,7 @@ xfs_bmapi_update_map(
 		   mval[-1].br_startblock != HOLESTARTBLOCK &&
 		   mval->br_startblock == mval[-1].br_startblock +
 					  mval[-1].br_blockcount &&
-		   ((flags & XFS_BMAPI_IGSTATE) ||
-			mval[-1].br_state == mval->br_state)) {
+		   mval[-1].br_state == mval->br_state) {
 		ASSERT(mval->br_startoff ==
 		       mval[-1].br_startoff + mval[-1].br_blockcount);
 		mval[-1].br_blockcount += mval->br_blockcount;
@@ -3845,7 +3844,7 @@ xfs_bmapi_read(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
-			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
+			   XFS_BMAPI_COWFORK)));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
@@ -4290,7 +4289,6 @@ xfs_bmapi_write(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
-	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
 	ASSERT(tp != NULL ||
 	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
 			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 2c233f9f1a26..a845fe57d1b5 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -80,8 +80,6 @@ struct xfs_extent_free_item
 #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
 #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
 #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
-#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
-					/* combine contig. space */
 #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
 /*
  * unwritten extent conversion - this needs write cache flushing and no additional
@@ -128,7 +126,6 @@ struct xfs_extent_free_item
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
 	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
 	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
-	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/21] xfs: remove xfs_reflink_find_cow_mapping
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 15/21] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

We only have one caller left, and open coding the simple extent list
lookup in it allows us to make the code both more understandable and
reuse calculations and variables already present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    | 19 +++++++++++++------
 fs/xfs/xfs_reflink.c | 30 ------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 fs/xfs/xfs_trace.h   |  1 -
 4 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a0ecfd63b858..d49307312ca6 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -382,9 +382,10 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
+	xfs_fileoff_t		offset_fsb, end_fsb;
 	struct xfs_bmbt_irec	imap;
 	int			whichfork = XFS_DATA_FORK;
+	struct xfs_iext_cursor	icur;
 	int			error = 0;
 	int			nimaps = 1;
 
@@ -396,8 +397,18 @@ xfs_map_blocks(
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (offset > mp->m_super->s_maxbytes - count)
+		count = mp->m_super->s_maxbytes - offset;
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
+	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+
+	/*
+	 * Check if this is offset is covered by a COW extents, and if yes use
+	 * it directly instead of looking up anything in the data fork.
+	 */
 	if (xfs_is_reflink_inode(ip) &&
-	    xfs_reflink_find_cow_mapping(ip, offset, &imap)) {
+	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &imap) &&
+	    imap.br_startoff <= offset_fsb) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		/*
 		 * Truncate can race with writeback since writeback doesn't
@@ -432,10 +443,6 @@ xfs_map_blocks(
 	 * offset.  This will convert delayed allocations (including COW ones)
 	 * into real extents.
 	 */
-	if (offset > mp->m_super->s_maxbytes - count)
-		count = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				&imap, &nimaps, XFS_BMAPI_ENTIRE);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d0092c864458..ff76bc56ff3d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -484,36 +484,6 @@ xfs_reflink_allocate_cow(
 	return error;
 }
 
-/*
- * Find the CoW reservation for a given byte offset of a file.
- */
-bool
-xfs_reflink_find_cow_mapping(
-	struct xfs_inode		*ip,
-	xfs_off_t			offset,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	xfs_fileoff_t			offset_fsb;
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
-
-	if (!xfs_is_reflink_inode(ip))
-		return false;
-	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return false;
-	if (got.br_startoff > offset_fsb)
-		return false;
-
-	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
-			&got);
-	*imap = got;
-	return true;
-}
-
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index c0a25609007b..e8d4d50c629f 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
-		struct xfs_bmbt_irec *imap);
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 041045a8d65c..666844532030 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3227,7 +3227,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
-DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 15/21] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 14/21] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 16/21] xfs: remove the imap_valid flag Christoph Hellwig
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

xfs_bmapi_read adds zero value in xfs_map_blocks.  Replace it with a
direct call to the low-level extent lookup function.

Note that we now always pass a 0 length to the trace points as we ask
for an unspecified len.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index d49307312ca6..5c4cc713edba 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -387,7 +387,6 @@ xfs_map_blocks(
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
-	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
@@ -443,24 +442,16 @@ xfs_map_blocks(
 	 * offset.  This will convert delayed allocations (including COW ones)
 	 * into real extents.
 	 */
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				&imap, &nimaps, XFS_BMAPI_ENTIRE);
+	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap))
+		imap.br_startoff = end_fsb;	/* fake a hole past EOF */
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-	if (error)
-		return error;
 
-	if (!nimaps) {
-		/*
-		 * Lookup returns no match? Beyond eof? regardless,
-		 * return it as a hole so we don't write it
-		 */
+	if (imap.br_startoff > offset_fsb) {
+		/* landed in a hole or beyond EOF */
+		imap.br_blockcount = imap.br_startoff - offset_fsb;
 		imap.br_startoff = offset_fsb;
-		imap.br_blockcount = end_fsb - offset_fsb;
 		imap.br_startblock = HOLESTARTBLOCK;
 		wpc->io_type = XFS_IO_HOLE;
-	} else if (imap.br_startblock == HOLESTARTBLOCK) {
-		/* landed in a hole */
-		wpc->io_type = XFS_IO_HOLE;
 	} else {
 		if (isnullstartblock(imap.br_startblock)) {
 			/* got a delalloc extent */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 16/21] xfs: remove the imap_valid flag
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 15/21] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 17/21] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Simplify the way we check for a valid imap - we know we have a valid
mapping after xfs_map_blocks returned successfully, and we know we can
call xfs_imap_valid on any imap, as it will always fail on a
zero-initialized map.

We can also remove the xfs_imap_valid function and fold it into
xfs_map_blocks now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 78 +++++++++++++++++++----------------------------
 1 file changed, 31 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5c4cc713edba..5f8e40a032b1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -42,7 +42,6 @@
  */
 struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
-	bool			imap_valid;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
 	sector_t		last_block;
@@ -382,15 +381,40 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
 	struct xfs_bmbt_irec	imap;
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
+	bool			imap_valid;
 	int			error = 0;
 
+	/*
+	 * We have to make sure the cached mapping is within EOF to protect
+	 * against eofblocks trimming on file release leaving us with a stale
+	 * mapping. Otherwise, a page for a subsequent file extending buffered
+	 * write could get picked up by this writeback cycle and written to the
+	 * wrong blocks.
+	 *
+	 * Note that what we really want here is a generic mapping invalidation
+	 * mechanism to protect us from arbitrary extent modifying contexts, not
+	 * just eofblocks.
+	 */
+	xfs_trim_extent_eof(&wpc->imap, ip);
+
+	imap_valid = offset_fsb >= wpc->imap.br_startoff &&
+		     offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
+	if (imap_valid && !xfs_is_reflink_inode(ip))
+		return 0;
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	/*
+	 * If we don't have a valid map, now it's time to get a new one for this
+	 * offset.  This will convert delayed allocations (including COW ones)
+	 * into real extents.  If we return without a valid map, it means we
+	 * landed in a hole and we skip the block.
+	 */
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
@@ -399,7 +423,6 @@ xfs_map_blocks(
 	if (offset > mp->m_super->s_maxbytes - count)
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
 	/*
 	 * Check if this is offset is covered by a COW extents, and if yes use
@@ -432,7 +455,7 @@ xfs_map_blocks(
 	/*
 	 * Map valid and no COW extent in the way?  We're done.
 	 */
-	if (wpc->imap_valid) {
+	if (imap_valid) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		return 0;
 	}
@@ -477,31 +500,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC bool
-xfs_imap_valid(
-	struct inode		*inode,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	offset >>= inode->i_blkbits;
-
-	/*
-	 * We have to make sure the cached mapping is within EOF to protect
-	 * against eofblocks trimming on file release leaving us with a stale
-	 * mapping. Otherwise, a page for a subsequent file extending buffered
-	 * write could get picked up by this writeback cycle and written to the
-	 * wrong blocks.
-	 *
-	 * Note that what we really want here is a generic mapping invalidation
-	 * mechanism to protect us from arbitrary extent modifying contexts, not
-	 * just eofblocks.
-	 */
-	xfs_trim_extent_eof(imap, XFS_I(inode));
-
-	return offset >= imap->br_startoff &&
-		offset < imap->br_startoff + imap->br_blockcount;
-}
-
 STATIC void
 xfs_start_buffer_writeback(
 	struct buffer_head	*bh)
@@ -858,25 +856,11 @@ xfs_writepage_map(
 			continue;
 		}
 
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
-		/*
-		 * If we don't have a valid map, now it's time to get a new one
-		 * for this offset.  This will convert delayed allocations
-		 * (including COW ones) into real extents.  If we return without
-		 * a valid map, it means we landed in a hole and we skip the
-		 * block.
-		 */
-		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
-			error = xfs_map_blocks(wpc, inode, file_offset);
-			if (error)
-				goto out;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
-		}
+		error = xfs_map_blocks(wpc, inode, file_offset);
+		if (error)
+			goto out;
 
-		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
+		if (wpc->io_type == XFS_IO_HOLE)
 			continue;
 
 		lock_buffer(bh);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 17/21] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 16/21] xfs: remove the imap_valid flag Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 18/21] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Calculate all information for the bio based on the passed in information
without requiring a buffer_head structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 68 ++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5f8e40a032b1..5950a3627c00 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -44,7 +44,6 @@ struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
-	sector_t		last_block;
 };
 
 void
@@ -539,11 +538,6 @@ xfs_start_page_writeback(
 	unlock_page(page);
 }
 
-static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
-{
-	return bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -598,27 +592,20 @@ xfs_submit_ioend(
 	return 0;
 }
 
-static void
-xfs_init_bio_from_bh(
-	struct bio		*bio,
-	struct buffer_head	*bh)
-{
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
-}
-
 static struct xfs_ioend *
 xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type,
 	xfs_off_t		offset,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct xfs_ioend	*ioend;
 	struct bio		*bio;
 
 	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, xfs_ioend_bioset);
-	xfs_init_bio_from_bh(bio, bh);
+	bio_set_dev(bio, bdev);
+	bio->bi_iter.bi_sector = sector;
 
 	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
 	INIT_LIST_HEAD(&ioend->io_list);
@@ -643,13 +630,14 @@ static void
 xfs_chain_bio(
 	struct xfs_ioend	*ioend,
 	struct writeback_control *wbc,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct bio *new;
 
 	new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
-	xfs_init_bio_from_bh(new, bh);
-
+	bio_set_dev(new, bdev);
+	new->bi_iter.bi_sector = sector;
 	bio_chain(ioend->io_bio, new);
 	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
 	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
@@ -659,39 +647,45 @@ xfs_chain_bio(
 }
 
 /*
- * Test to see if we've been building up a completion structure for
- * earlier buffers -- if so, we try to append to this ioend if we
- * can, otherwise we finish off any current ioend and start another.
- * Return the ioend we finished off so that the caller can submit it
- * once it has finished processing the dirty page.
+ * Test to see if we have an existing ioend structure that we could append to
+ * first, otherwise finish off the current ioend and start another.
  */
 STATIC void
 xfs_add_to_ioend(
 	struct inode		*inode,
-	struct buffer_head	*bh,
 	xfs_off_t		offset,
+	struct page		*page,
 	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	struct list_head	*iolist)
 {
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
+	unsigned		len = i_blocksize(inode);
+	unsigned		poff = offset & (PAGE_SIZE - 1);
+	sector_t		sector;
+
+	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
+		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
+
 	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
-	    bh->b_blocknr != wpc->last_block + 1 ||
+	    sector != bio_end_sector(wpc->ioend->io_bio) ||
 	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
-		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset, bh);
+		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
+				bdev, sector);
 	}
 
 	/*
-	 * If the buffer doesn't fit into the bio we need to allocate a new
-	 * one.  This shouldn't happen more than once for a given buffer.
+	 * If the block doesn't fit into the bio we need to allocate a new
+	 * one.  This shouldn't happen more than once for a given block.
 	 */
-	while (xfs_bio_add_buffer(wpc->ioend->io_bio, bh) != bh->b_size)
-		xfs_chain_bio(wpc->ioend, wbc, bh);
+	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
+		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
 
-	wpc->ioend->io_size += bh->b_size;
-	wpc->last_block = bh->b_blocknr;
-	xfs_start_buffer_writeback(bh);
+	wpc->ioend->io_size += len;
 }
 
 STATIC void
@@ -865,7 +859,9 @@ xfs_writepage_map(
 
 		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
-		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
+		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
+				&submit_list);
+		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 18/21] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 17/21] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 19/21] xfs: remove xfs_start_page_writeback Christoph Hellwig
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

This keeps it in a single place so it can be made otional more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5950a3627c00..8328f55b7ea7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -499,21 +499,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC void
-xfs_start_buffer_writeback(
-	struct buffer_head	*bh)
-{
-	ASSERT(buffer_mapped(bh));
-	ASSERT(buffer_locked(bh));
-	ASSERT(!buffer_delay(bh));
-	ASSERT(!buffer_unwritten(bh));
-
-	bh->b_end_io = NULL;
-	set_buffer_async_write(bh);
-	set_buffer_uptodate(bh);
-	clear_buffer_dirty(bh);
-}
-
 STATIC void
 xfs_start_page_writeback(
 	struct page		*page,
@@ -722,6 +707,7 @@ xfs_map_at_offset(
 	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
 	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
 
+	lock_buffer(bh);
 	xfs_map_buffer(inode, bh, imap, offset);
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
@@ -734,6 +720,10 @@ xfs_map_at_offset(
 	 * set the bdev now.
 	 */
 	bh->b_bdev = xfs_find_bdev_for_inode(inode);
+	bh->b_end_io = NULL;
+	set_buffer_async_write(bh);
+	set_buffer_uptodate(bh);
+	clear_buffer_dirty(bh);
 }
 
 STATIC void
@@ -857,11 +847,9 @@ xfs_writepage_map(
 		if (wpc->io_type == XFS_IO_HOLE)
 			continue;
 
-		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
-		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 19/21] xfs: remove xfs_start_page_writeback
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 18/21] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 20/21] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 21/21] xfs: allow writeback on pages without buffer heads Christoph Hellwig
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

This helper only has two callers, one of them with a constant error
argument.  Remove it to make pending changes to the code a little easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 47 +++++++++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8328f55b7ea7..e017e7ea3a51 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -499,30 +499,6 @@ xfs_map_blocks(
 	return 0;
 }
 
-STATIC void
-xfs_start_page_writeback(
-	struct page		*page,
-	int			clear_dirty)
-{
-	ASSERT(PageLocked(page));
-	ASSERT(!PageWriteback(page));
-
-	/*
-	 * if the page was not fully cleaned, we need to ensure that the higher
-	 * layers come back to it correctly. That means we need to keep the page
-	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
-	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
-	 * write this page in this writeback sweep will be made.
-	 */
-	if (clear_dirty) {
-		clear_page_dirty_for_io(page);
-		set_page_writeback(page);
-	} else
-		set_page_writeback_keepwrite(page);
-
-	unlock_page(page);
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -856,6 +832,9 @@ xfs_writepage_map(
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
 out:
+	ASSERT(PageLocked(page));
+	ASSERT(!PageWriteback(page));
+
 	/*
 	 * On error, we have to fail the ioend here because we have locked
 	 * buffers in the ioend. If we don't do this, we'll deadlock
@@ -874,7 +853,21 @@ xfs_writepage_map(
 	 * treated correctly on error.
 	 */
 	if (count) {
-		xfs_start_page_writeback(page, !error);
+		/*
+		 * If the page was not fully cleaned, we need to ensure that the
+		 * higher layers come back to it correctly.  That means we need
+		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
+		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
+		 * so another attempt to write this page in this writeback sweep
+		 * will be made.
+		 */
+		if (error) {
+			set_page_writeback_keepwrite(page);
+		} else {
+			clear_page_dirty_for_io(page);
+			set_page_writeback(page);
+		}
+		unlock_page(page);
 
 		/*
 		 * Preserve the original error if there was one, otherwise catch
@@ -899,7 +892,9 @@ xfs_writepage_map(
 		 * race with a partial page truncate on a sub-page block sized
 		 * filesystem. In that case we need to mark the page clean.
 		 */
-		xfs_start_page_writeback(page, 1);
+		clear_page_dirty_for_io(page);
+		set_page_writeback(page);
+		unlock_page(page);
 		end_page_writeback(page);
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 20/21] xfs: refactor the tail of xfs_writepage_map
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 19/21] xfs: remove xfs_start_page_writeback Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  2018-05-31 18:07 ` [PATCH 21/21] xfs: allow writeback on pages without buffer heads Christoph Hellwig
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Rejuggle how we deal with the different error vs non-error and have
ioends vs not have ioend cases to keep the fast path streamlined, and
the duplicate code at a minimum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c | 65 +++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e017e7ea3a51..1988835390ab 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -852,7 +852,14 @@ xfs_writepage_map(
 	 * submission of outstanding ioends on the writepage context so they are
 	 * treated correctly on error.
 	 */
-	if (count) {
+	if (unlikely(error)) {
+		if (!count) {
+			xfs_aops_discard_page(page);
+			ClearPageUptodate(page);
+			unlock_page(page);
+			goto done;
+		}
+
 		/*
 		 * If the page was not fully cleaned, we need to ensure that the
 		 * higher layers come back to it correctly.  That means we need
@@ -861,43 +868,35 @@ xfs_writepage_map(
 		 * so another attempt to write this page in this writeback sweep
 		 * will be made.
 		 */
-		if (error) {
-			set_page_writeback_keepwrite(page);
-		} else {
-			clear_page_dirty_for_io(page);
-			set_page_writeback(page);
-		}
-		unlock_page(page);
-
-		/*
-		 * Preserve the original error if there was one, otherwise catch
-		 * submission errors here and propagate into subsequent ioend
-		 * submissions.
-		 */
-		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
-			int error2;
-
-			list_del_init(&ioend->io_list);
-			error2 = xfs_submit_ioend(wbc, ioend, error);
-			if (error2 && !error)
-				error = error2;
-		}
-	} else if (error) {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
+		set_page_writeback_keepwrite(page);
 	} else {
-		/*
-		 * We can end up here with no error and nothing to write if we
-		 * race with a partial page truncate on a sub-page block sized
-		 * filesystem. In that case we need to mark the page clean.
-		 */
 		clear_page_dirty_for_io(page);
 		set_page_writeback(page);
-		unlock_page(page);
-		end_page_writeback(page);
 	}
 
+	unlock_page(page);
+
+	/*
+	 * Preserve the original error if there was one, otherwise catch
+	 * submission errors here and propagate into subsequent ioend
+	 * submissions.
+	 */
+	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
+		int error2;
+
+		list_del_init(&ioend->io_list);
+		error2 = xfs_submit_ioend(wbc, ioend, error);
+		if (error2 && !error)
+			error = error2;
+	}
+
+	/*
+	 * We can end up here with no error and nothing to write if we race with
+	 * a partial page truncate on a sub-page block sized filesystem.
+	 */
+	if (!count)
+		end_page_writeback(page);
+done:
 	mapping_set_error(page->mapping, error);
 	return error;
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 21/21] xfs: allow writeback on pages without buffer heads
  2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2018-05-31 18:07 ` [PATCH 20/21] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
@ 2018-05-31 18:07 ` Christoph Hellwig
  20 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-05-31 18:07 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel

Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
equal to the page size, and deal with pages without buffer heads in
writeback.  Thanks to the previous refactoring this is basically trivial
now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c  | 50 +++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c |  3 ++-
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1988835390ab..ff100706ae28 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -91,6 +91,19 @@ xfs_find_daxdev_for_inode(
 		return mp->m_ddev_targp->bt_daxdev;
 }
 
+static void
+xfs_finish_page_writeback(
+	struct inode		*inode,
+	struct bio_vec		*bvec,
+	int			error)
+{
+	if (error) {
+		SetPageError(bvec->bv_page);
+		mapping_set_error(inode->i_mapping, -EIO);
+	}
+	end_page_writeback(bvec->bv_page);
+}
+
 /*
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
@@ -103,7 +116,7 @@ xfs_find_daxdev_for_inode(
  * and buffers potentially freed after every call to end_buffer_async_write.
  */
 static void
-xfs_finish_page_writeback(
+xfs_finish_buffer_writeback(
 	struct inode		*inode,
 	struct bio_vec		*bvec,
 	int			error)
@@ -178,9 +191,12 @@ xfs_destroy_ioend(
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i)
-			xfs_finish_page_writeback(inode, bvec, error);
-
+		bio_for_each_segment_all(bvec, bio, i) {
+			if (page_has_buffers(bvec->bv_page))
+				xfs_finish_buffer_writeback(inode, bvec, error);
+			else
+				xfs_finish_page_writeback(inode, bvec, error);
+		}
 		bio_put(bio);
 	}
 
@@ -786,13 +802,16 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh;
+	struct buffer_head	*bh = NULL;
 	ssize_t			len = i_blocksize(inode);
 	uint64_t		file_offset;	/* file offset of page */
 	unsigned		poffset;	/* offset into page */
 	int			error = 0;
 	int			count = 0;
 
+	if (page_has_buffers(page))
+		bh = page_buffers(page);
+
 	/*
 	 * Walk the blocks on the page, and we we run off then end of the
 	 * current map or find the current map invalid, grab a new one.
@@ -801,18 +820,17 @@ xfs_writepage_map(
 	 * replace the bufferhead with some other state tracking mechanism in
 	 * future.
 	 */
-	file_offset = page_offset(page);
-	bh = page_buffers(page);
-	for (poffset = 0;
+	for (poffset = 0, file_offset = page_offset(page);
 	     poffset < PAGE_SIZE;
-	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+	     poffset += len, file_offset += len) {
 		/* past the range we are writing, so nothing more to write. */
 		if (file_offset >= end_offset)
 			break;
 
-		if (!buffer_uptodate(bh)) {
+		if (bh && !buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
+			bh = bh->b_this_page;
 			continue;
 		}
 
@@ -820,10 +838,16 @@ xfs_writepage_map(
 		if (error)
 			goto out;
 
-		if (wpc->io_type == XFS_IO_HOLE)
+		if (wpc->io_type == XFS_IO_HOLE) {
+			if (bh)
+				bh = bh->b_this_page;
 			continue;
+		}
 
-		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		if (bh) {
+			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+			bh = bh->b_this_page;
+		}
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
 		count++;
@@ -923,8 +947,6 @@ xfs_do_writepage(
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
-	ASSERT(page_has_buffers(page));
-
 	/*
 	 * Refuse to write the page out if we are called from reclaim context.
 	 *
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f949f0dd7382..93c40da3378a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1031,7 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+	if (i_blocksize(inode) < PAGE_SIZE)
+		iomap->flags |= IOMAP_F_BUFFER_HEAD;
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks
  2018-05-31 18:07 ` [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
@ 2018-06-04 12:25   ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Thu, May 31, 2018 at 08:07:46PM +0200, Christoph Hellwig wrote:
> We want to be able to use the extent state as a reliably indicator for
> the type of I/O, and stop using the buffer head state.  For this we
> need to stop using the XFS_BMAPI_IGSTATE so that we don't see merged
> extents of different types.
> 
> Based on a patch from Dave Chinner.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Thanks for splitting these up. Looks good to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 25d112a4bcf5..2966d62b0594 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -385,7 +385,6 @@ xfs_map_blocks(
>  	ssize_t			count = i_blocksize(inode);
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	int			error = 0;
> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
>  	int			nimaps = 1;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
> @@ -405,8 +404,6 @@ xfs_map_blocks(
>  		return 0;
>  
>  	ASSERT(type != XFS_IO_COW);
> -	if (type == XFS_IO_UNWRITTEN)
> -		bmapi_flags |= XFS_BMAPI_IGSTATE;
>  
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
> @@ -418,7 +415,7 @@ xfs_map_blocks(
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, bmapi_flags);
> +				imap, &nimaps, XFS_BMAPI_ENTIRE);
>  	/*
>  	 * Truncate an overwrite extent if there's a pending CoW
>  	 * reservation before the end of this extent.  This forces us
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-05-31 18:07 ` [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
@ 2018-06-04 12:25   ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Thu, May 31, 2018 at 08:07:47PM +0200, Christoph Hellwig wrote:
> We already have to check for overlapping COW extents everytime we
> come back to a page in xfs_writepage_map / xfs_map_cow, so this
> additional trim is not required.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c    |  7 -------
>  fs/xfs/xfs_reflink.c | 33 ---------------------------------
>  fs/xfs/xfs_reflink.h |  2 --
>  fs/xfs/xfs_trace.h   |  1 -
>  4 files changed, 43 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2966d62b0594..9b5a4b6e268c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -416,13 +416,6 @@ xfs_map_blocks(
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				imap, &nimaps, XFS_BMAPI_ENTIRE);
> -	/*
> -	 * Truncate an overwrite extent if there's a pending CoW
> -	 * reservation before the end of this extent.  This forces us
> -	 * to come back to writepage to take care of the CoW.
> -	 */
> -	if (nimaps && type == XFS_IO_OVERWRITE)
> -		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  	if (error)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 713e857d9ffa..d0092c864458 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -514,39 +514,6 @@ xfs_reflink_find_cow_mapping(
>  	return true;
>  }
>  
> -/*
> - * Trim an extent to end at the next CoW reservation past offset_fsb.
> - */
> -void
> -xfs_reflink_trim_irec_to_next_cow(
> -	struct xfs_inode		*ip,
> -	xfs_fileoff_t			offset_fsb,
> -	struct xfs_bmbt_irec		*imap)
> -{
> -	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	struct xfs_bmbt_irec		got;
> -	struct xfs_iext_cursor		icur;
> -
> -	if (!xfs_is_reflink_inode(ip))
> -		return;
> -
> -	/* Find the extent in the CoW fork. */
> -	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
> -		return;
> -
> -	/* This is the extent before; try sliding up one. */
> -	if (got.br_startoff < offset_fsb) {
> -		if (!xfs_iext_next_extent(ifp, &icur, &got))
> -			return;
> -	}
> -
> -	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
> -		return;
> -
> -	imap->br_blockcount = got.br_startoff - imap->br_startoff;
> -	trace_xfs_reflink_trim_irec(ip, imap);
> -}
> -
>  /*
>   * Cancel CoW reservations for some block range of an inode.
>   *
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 701487bab468..c0a25609007b 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -34,8 +34,6 @@ extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
>  extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
>  		struct xfs_bmbt_irec *imap);
> -extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
> -		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
>  
>  extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
>  		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 9d4c4ca24fe6..041045a8d65c 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3228,7 +3228,6 @@ DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
>  DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
> -DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/21] xfs: remove xfs_map_cow
  2018-05-31 18:07 ` [PATCH 10/21] xfs: remove xfs_map_cow Christoph Hellwig
@ 2018-06-04 12:25   ` Brian Foster
  2018-06-13 15:30     ` Christoph Hellwig
  0 siblings, 1 reply; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Thu, May 31, 2018 at 08:07:48PM +0200, Christoph Hellwig wrote:
> We can handle the existing cow mapping case as a special case directly
> in xfs_writepage_map, and share code for allocating delalloc blocks
> with regular I/O in xfs_map_blocks.  This means we need to always
> call xfs_map_blocks for reflink inodes, but we can still skip most of
> the work if it turns out that there is no COW mapping overlapping the
> current block.
> 
> As a subtle detail we need to start caching holes in the wpc to deal
> with the case of COW reservations between EOF.  But we'll need that
> infrastructure later anyway, so this is no big deal.
> 
> Based on a patch from Dave Chinner.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c | 184 ++++++++++++++++++++++------------------------
>  fs/xfs/xfs_aops.h |   4 +-
>  2 files changed, 90 insertions(+), 98 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 9b5a4b6e268c..d536509ca05b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -375,70 +375,107 @@ xfs_end_bio(
>  
>  STATIC int
>  xfs_map_blocks(
> +	struct xfs_writepage_ctx *wpc,
>  	struct inode		*inode,
> -	loff_t			offset,
> -	struct xfs_bmbt_irec	*imap,
> -	int			type)
> +	loff_t			offset)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			count = i_blocksize(inode);
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset), end_fsb;
> +	struct xfs_bmbt_irec	imap;
> +	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
>  	int			nimaps = 1;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	/*
> -	 * Truncate can race with writeback since writeback doesn't take the
> -	 * iolock and truncate decreases the file size before it starts
> -	 * truncating the pages between new_size and old_size.  Therefore, we
> -	 * can end up in the situation where writeback gets a CoW fork mapping
> -	 * but the truncate makes the mapping invalid and we end up in here
> -	 * trying to get a new mapping.  Bail out here so that we simply never
> -	 * get a valid mapping and so we drop the write altogether.  The page
> -	 * truncation will kill the contents anyway.
> -	 */
> -	if (type == XFS_IO_COW && offset > i_size_read(inode))
> -		return 0;
> -
> -	ASSERT(type != XFS_IO_COW);
> -
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
>  	ASSERT(offset <= mp->m_super->s_maxbytes);
>  
> +	if (xfs_is_reflink_inode(ip) &&
> +	    xfs_reflink_find_cow_mapping(ip, offset, &imap)) {
> +		xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +		/*
> +		 * Truncate can race with writeback since writeback doesn't
> +		 * take the iolock and truncate decreases the file size before
> +		 * it starts truncating the pages between new_size and old_size.
> +		 * Therefore, we can end up in the situation where writeback
> +		 * gets a CoW fork mapping but the truncate makes the mapping
> +		 * invalid and we end up in here trying to get a new mapping.
> +		 * bail out here so that we simply never get a valid mapping
> +		 * and so we drop the write altogether.  The page truncation
> +		 * will kill the contents anyway.
> +		 */
> +		if (offset > i_size_read(inode)) {
> +			whichfork = XFS_IO_HOLE;

Not sure what the point of this assignment is if we're going to return,
but that's not a valid fork either way. Did you mean to assign
wpc->io_type?

> +			return 0;
> +		}
> +		whichfork = XFS_COW_FORK;
> +		wpc->io_type = XFS_IO_COW;
> +		goto allocate_blocks;
> +	}
> +
> +	/*
> +	 * Map valid and no COW extent in the way?  We're done.
> +	 */
> +	if (wpc->imap_valid) {
> +		xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +		return 0;
> +	}
> +
> +	/*
> +	 * If we don't have a valid map, now it's time to get a new one for this
> +	 * offset.  This will convert delayed allocations (including COW ones)
> +	 * into real extents.
> +	 */
>  	if (offset > mp->m_super->s_maxbytes - count)
>  		count = mp->m_super->s_maxbytes - offset;
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +				&imap, &nimaps, XFS_BMAPI_ENTIRE);
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -
>  	if (error)
>  		return error;
>  
> -	if (type == XFS_IO_DELALLOC &&
> -	    (!nimaps || isnullstartblock(imap->br_startblock))) {
> -		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
> -				imap);
> -		if (!error)
> -			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
> -		return error;
> +	if (!nimaps) {
> +		/*
> +		 * Lookup returns no match? Beyond eof? regardless,
> +		 * return it as a hole so we don't write it
> +		 */
> +		imap.br_startoff = offset_fsb;
> +		imap.br_blockcount = end_fsb - offset_fsb;
> +		imap.br_startblock = HOLESTARTBLOCK;
> +		wpc->io_type = XFS_IO_HOLE;
> +	} else if (imap.br_startblock == HOLESTARTBLOCK) {
> +		/* landed in a hole */
> +		wpc->io_type = XFS_IO_HOLE;
>  	}
>  
> +	if (wpc->io_type == XFS_IO_DELALLOC &&
> +	    (!nimaps || isnullstartblock(imap.br_startblock)))
> +		goto allocate_blocks;
> +
>  #ifdef DEBUG
> -	if (type == XFS_IO_UNWRITTEN) {
> +	if (wpc->io_type == XFS_IO_UNWRITTEN) {
>  		ASSERT(nimaps);
> -		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> -		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> +		ASSERT(imap.br_startblock != HOLESTARTBLOCK);
> +		ASSERT(imap.br_startblock != DELAYSTARTBLOCK);
>  	}
>  #endif
> -	if (nimaps)
> -		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
> +	wpc->imap = imap;
> +	trace_xfs_map_blocks_found(ip, offset, count, wpc->io_type, &imap);
> +	return 0;
> +allocate_blocks:
> +	error = xfs_iomap_write_allocate(ip, whichfork, offset, &imap);
> +	if (error)
> +		return error;
> +	wpc->imap = imap;
> +	trace_xfs_map_blocks_alloc(ip, offset, count, wpc->io_type, &imap);
>  	return 0;
>  }
>  
> @@ -759,56 +796,6 @@ xfs_aops_discard_page(
>  	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
>  }
>  
> -static int
> -xfs_map_cow(
> -	struct xfs_writepage_ctx *wpc,
> -	struct inode		*inode,
> -	loff_t			offset,
> -	unsigned int		*new_type)
> -{
> -	struct xfs_inode	*ip = XFS_I(inode);
> -	struct xfs_bmbt_irec	imap;
> -	bool			is_cow = false;
> -	int			error;
> -
> -	/*
> -	 * If we already have a valid COW mapping keep using it.
> -	 */
> -	if (wpc->io_type == XFS_IO_COW) {
> -		wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
> -		if (wpc->imap_valid) {
> -			*new_type = XFS_IO_COW;
> -			return 0;
> -		}
> -	}
> -
> -	/*
> -	 * Else we need to check if there is a COW mapping at this offset.
> -	 */
> -	xfs_ilock(ip, XFS_ILOCK_SHARED);
> -	is_cow = xfs_reflink_find_cow_mapping(ip, offset, &imap);
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -
> -	if (!is_cow)
> -		return 0;
> -
> -	/*
> -	 * And if the COW mapping has a delayed extent here we need to
> -	 * allocate real space for it now.
> -	 */
> -	if (isnullstartblock(imap.br_startblock)) {
> -		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK, offset,
> -				&imap);
> -		if (error)
> -			return error;
> -	}
> -
> -	wpc->io_type = *new_type = XFS_IO_COW;
> -	wpc->imap_valid = true;
> -	wpc->imap = imap;
> -	return 0;
> -}
> -
>  /*
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
> @@ -873,10 +860,13 @@ xfs_writepage_map(
>  			continue;
>  		}
>  
> -		if (xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_cow(wpc, inode, offset, &new_type);
> -			if (error)
> -				goto out;
> +		/*
> +		 * If we already have a valid COW mapping keep using it.
> +		 */
> +		if (wpc->io_type == XFS_IO_COW &&
> +		    xfs_imap_valid(inode, &wpc->imap, offset)) {
> +			wpc->imap_valid = true;
> +			new_type = XFS_IO_COW;
>  		}

So this lifts the cow I/O type check from xfs_map_cow(). That path would
set imap_valid and new_type and return, which essentially looks like a
"blocks found" case where we go ahead and add the buffer to the ioend.
Now we set the same values, but we have to call xfs_map_blocks() to make
sure we cover the cow blocks overlap case that xfs_map_cow() used to
handle.

Shouldn't passing this check effectively jump straight to to the buffer
add? Otherwise it looks like we unconditionally attempt to allocate
blocks in the cow fork.

Alternatively...

>  
>  		if (wpc->io_type != new_type) {
> @@ -887,22 +877,22 @@ xfs_writepage_map(
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
>  							 offset);
> -		if (!wpc->imap_valid) {
> -			error = xfs_map_blocks(inode, offset, &wpc->imap,
> -					     wpc->io_type);
> +		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
> +			error = xfs_map_blocks(wpc, inode, offset);

... perhaps this could change to something like:

		if (!wpc->imap_valid ||
		    (xfs_is_reflink_inode(XFS_I(inode)) &&
		     wpc->io_type != XFS_IO_COW))
			error = ...

... so the logic is a bit more clear. E.g., we always need to look up in
the COW fork for reflink inodes unless we're already doing COW I/O. Hm?

Brian

>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
>  							 offset);
>  		}
> -		if (wpc->imap_valid) {
> -			lock_buffer(bh);
> -			if (wpc->io_type != XFS_IO_OVERWRITE)
> -				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> -			count++;
> -		}
>  
> +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
> +			continue;
> +
> +		lock_buffer(bh);
> +		if (wpc->io_type != XFS_IO_OVERWRITE)
> +			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> +		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> +		count++;
>  	} while (offset += len, ((bh = bh->b_this_page) != head));
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 69346d460dfa..b2ef5b661761 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -29,6 +29,7 @@ enum {
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> @@ -36,7 +37,8 @@ enum {
>  	{ XFS_IO_DELALLOC,		"delalloc" }, \
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }
> +	{ XFS_IO_COW,			"CoW" }, \
> +	{ XFS_IO_HOLE,			"hole" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map
  2018-05-31 18:07 ` [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
@ 2018-06-04 12:25   ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Thu, May 31, 2018 at 08:07:49PM +0200, Christoph Hellwig wrote:
> Calling it file_offset makes the usage more clear, especially with
> a new poffset variable that will be added soon for the offset inside
> the page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index d536509ca05b..6f57a785f4d3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -824,15 +824,15 @@ xfs_writepage_map(
>  	struct xfs_ioend	*ioend, *next;
>  	struct buffer_head	*bh, *head;
>  	ssize_t			len = i_blocksize(inode);
> -	uint64_t		offset;
> +	uint64_t		file_offset;	/* file offset of page */
>  	int			error = 0;
>  	int			count = 0;
>  	unsigned int		new_type;
>  
>  	bh = head = page_buffers(page);
> -	offset = page_offset(page);
> +	file_offset = page_offset(page);
>  	do {
> -		if (offset >= end_offset)
> +		if (file_offset >= end_offset)
>  			break;
>  
>  		/*
> @@ -864,7 +864,7 @@ xfs_writepage_map(
>  		 * If we already have a valid COW mapping keep using it.
>  		 */
>  		if (wpc->io_type == XFS_IO_COW &&
> -		    xfs_imap_valid(inode, &wpc->imap, offset)) {
> +		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
>  			wpc->imap_valid = true;
>  			new_type = XFS_IO_COW;
>  		}
> @@ -876,13 +876,13 @@ xfs_writepage_map(
>  
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_blocks(wpc, inode, offset);
> +			error = xfs_map_blocks(wpc, inode, file_offset);
>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		}
>  
>  		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE)
> @@ -890,10 +890,10 @@ xfs_writepage_map(
>  
>  		lock_buffer(bh);
>  		if (wpc->io_type != XFS_IO_OVERWRITE)
> -			xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -		xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> +			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
>  		count++;
> -	} while (offset += len, ((bh = bh->b_this_page) != head));
> +	} while (file_offset += len, ((bh = bh->b_this_page) != head));
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 12/21] xfs: make xfs_writepage_map extent map centric
  2018-05-31 18:07 ` [PATCH 12/21] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
@ 2018-06-04 12:26   ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, Dave Chinner

On Thu, May 31, 2018 at 08:07:50PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
...
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> [hch: forward port + slight refactoring]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c | 96 ++++++++++++++++++++++-------------------------
>  1 file changed, 44 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 6f57a785f4d3..a0ecfd63b858 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -822,61 +830,46 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh, *head;
> +	struct buffer_head	*bh;
>  	ssize_t			len = i_blocksize(inode);
>  	uint64_t		file_offset;	/* file offset of page */
> +	unsigned		poffset;	/* offset into page */
>  	int			error = 0;
>  	int			count = 0;
> -	unsigned int		new_type;
>  
> -	bh = head = page_buffers(page);
> +	/*
> +	 * Walk the blocks on the page, and we we run off then end of the
> +	 * current map or find the current map invalid, grab a new one.
> +	 * We only use bufferheads here to check per-block state - they no
> +	 * longer control the iteration through the page. This allows us to
> +	 * replace the bufferhead with some other state tracking mechanism in
> +	 * future.
> +	 */
>  	file_offset = page_offset(page);
> -	do {
> +	bh = page_buffers(page);
> +	for (poffset = 0;
> +	     poffset < PAGE_SIZE;
> +	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +		/* past the range we are writing, so nothing more to write. */
>  		if (file_offset >= end_offset)
>  			break;
>  
> -		/*
> -		 * set_page_dirty dirties all buffers in a page, independent
> -		 * of their state.  The dirty state however is entirely
> -		 * meaningless for holes (!mapped && uptodate), so skip
> -		 * buffers covering holes here.
> -		 */
> -		if (!buffer_mapped(bh) && buffer_uptodate(bh))
> -			continue;
> -
> -		if (buffer_unwritten(bh))
> -			new_type = XFS_IO_UNWRITTEN;
> -		else if (buffer_delay(bh))
> -			new_type = XFS_IO_DELALLOC;
> -		else if (buffer_uptodate(bh))
> -			new_type = XFS_IO_OVERWRITE;
> -		else {
> +		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			/*
> -			 * This buffer is not uptodate and will not be
> -			 * written to disk.
> -			 */
>  			continue;
>  		}
>  
> -		/*
> -		 * If we already have a valid COW mapping keep using it.
> -		 */
> -		if (wpc->io_type == XFS_IO_COW &&
> -		    xfs_imap_valid(inode, &wpc->imap, file_offset)) {
> -			wpc->imap_valid = true;
> -			new_type = XFS_IO_COW;
> -		}
> -
> -		if (wpc->io_type != new_type) {
> -			wpc->io_type = new_type;
> -			wpc->imap_valid = false;
> -		}
> -

We drop the previously lifted check but unless I'm missing something wrt
my previous comments on the imap_valid && io_type == COW case, we still
have that unconditional allocation behavior. I suspect if the previous
patch updated the logic below (and preserved it moving forward while
also perhaps adding to the comment to explain why we must call
xfs_map_blocks() in certain cases for reflink inodes), that would fix up
both of these patches.

Otherwise the rest of the changes here look fine to me.

Brian

>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
>  							 file_offset);
> +		/*
> +		 * If we don't have a valid map, now it's time to get a new one
> +		 * for this offset.  This will convert delayed allocations
> +		 * (including COW ones) into real extents.  If we return without
> +		 * a valid map, it means we landed in a hole and we skip the
> +		 * block.
> +		 */
>  		if (!wpc->imap_valid || xfs_is_reflink_inode(XFS_I(inode))) {
>  			error = xfs_map_blocks(wpc, inode, file_offset);
>  			if (error)
> @@ -889,11 +882,10 @@ xfs_writepage_map(
>  			continue;
>  
>  		lock_buffer(bh);
> -		if (wpc->io_type != XFS_IO_OVERWRITE)
> -			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
>  		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
>  		count++;
> -	} while (file_offset += len, ((bh = bh->b_this_page) != head));
> +	}
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-31 18:07 ` [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
@ 2018-06-04 12:27   ` Brian Foster
  0 siblings, 0 replies; 29+ messages in thread
From: Brian Foster @ 2018-06-04 12:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel

On Thu, May 31, 2018 at 08:07:51PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_bmap.c | 6 ++----
>  fs/xfs/libxfs/xfs_bmap.h | 3 ---
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7b0e2b551e23..4b5e014417d2 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3799,8 +3799,7 @@ xfs_bmapi_update_map(
>  		   mval[-1].br_startblock != HOLESTARTBLOCK &&
>  		   mval->br_startblock == mval[-1].br_startblock +
>  					  mval[-1].br_blockcount &&
> -		   ((flags & XFS_BMAPI_IGSTATE) ||
> -			mval[-1].br_state == mval->br_state)) {
> +		   mval[-1].br_state == mval->br_state) {
>  		ASSERT(mval->br_startoff ==
>  		       mval[-1].br_startoff + mval[-1].br_blockcount);
>  		mval[-1].br_blockcount += mval->br_blockcount;
> @@ -3845,7 +3844,7 @@ xfs_bmapi_read(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
> -			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
> +			   XFS_BMAPI_COWFORK)));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
>  
>  	if (unlikely(XFS_TEST_ERROR(
> @@ -4290,7 +4289,6 @@ xfs_bmapi_write(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
> -	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
>  	ASSERT(tp != NULL ||
>  	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
>  			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 2c233f9f1a26..a845fe57d1b5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -80,8 +80,6 @@ struct xfs_extent_free_item
>  #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
>  #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
>  #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
> -#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
> -					/* combine contig. space */
>  #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
>  /*
>   * unwritten extent conversion - this needs write cache flushing and no additional
> @@ -128,7 +126,6 @@ struct xfs_extent_free_item
>  	{ XFS_BMAPI_METADATA,	"METADATA" }, \
>  	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
>  	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
> -	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
>  	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
>  	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
>  	{ XFS_BMAPI_ZERO,	"ZERO" }, \
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/21] xfs: remove xfs_map_cow
  2018-06-04 12:25   ` Brian Foster
@ 2018-06-13 15:30     ` Christoph Hellwig
  0 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2018-06-13 15:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel

On Mon, Jun 04, 2018 at 08:25:33AM -0400, Brian Foster wrote:
> > +		if (offset > i_size_read(inode)) {
> > +			whichfork = XFS_IO_HOLE;
> 
> Not sure what the point of this assignment is if we're going to return,
> but that's not a valid fork either way. Did you mean to assign
> wpc->io_type?

Yes, I did.  And this bug actually caused my unexplainable test
failures with 1k file systems..

> > +		/*
> > +		 * If we already have a valid COW mapping keep using it.
> > +		 */
> > +		if (wpc->io_type == XFS_IO_COW &&
> > +		    xfs_imap_valid(inode, &wpc->imap, offset)) {
> > +			wpc->imap_valid = true;
> > +			new_type = XFS_IO_COW;
> >  		}
> 
> So this lifts the cow I/O type check from xfs_map_cow(). That path would
> set imap_valid and new_type and return, which essentially looks like a
> "blocks found" case where we go ahead and add the buffer to the ioend.
> Now we set the same values, but we have to call xfs_map_blocks() to make
> sure we cover the cow blocks overlap case that xfs_map_cow() used to
> handle.
> 
> Shouldn't passing this check effectively jump straight to to the buffer
> add? Otherwise it looks like we unconditionally attempt to allocate
> blocks in the cow fork.
> 
> Alternatively...

It's all a bit of a mess.  I've now bitten the bullet and added a
modification counter to the ifork and use that instead of our
previous hacks.  Still work in progress, though.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-06-13 15:22 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31 18:07 buffered writes without buffer heads in xfs and iomap v5 Christoph Hellwig
2018-05-31 18:07 ` [PATCH 01/21] fs: factor out a __generic_write_end helper Christoph Hellwig
2018-05-31 18:07 ` [PATCH 02/21] iomap: add initial support for writes without buffer heads Christoph Hellwig
2018-05-31 18:07 ` [PATCH 03/21] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-31 18:07 ` [PATCH 04/21] xfs: simplify xfs_aops_discard_page Christoph Hellwig
2018-05-31 18:07 ` [PATCH 05/21] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-31 18:07 ` [PATCH 06/21] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
2018-05-31 18:07 ` [PATCH 07/21] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
2018-05-31 18:07 ` [PATCH 08/21] xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocks Christoph Hellwig
2018-06-04 12:25   ` Brian Foster
2018-05-31 18:07 ` [PATCH 09/21] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
2018-06-04 12:25   ` Brian Foster
2018-05-31 18:07 ` [PATCH 10/21] xfs: remove xfs_map_cow Christoph Hellwig
2018-06-04 12:25   ` Brian Foster
2018-06-13 15:30     ` Christoph Hellwig
2018-05-31 18:07 ` [PATCH 11/21] xfs: rename the offset variable in xfs_writepage_map Christoph Hellwig
2018-06-04 12:25   ` Brian Foster
2018-05-31 18:07 ` [PATCH 12/21] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
2018-06-04 12:26   ` Brian Foster
2018-05-31 18:07 ` [PATCH 13/21] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
2018-06-04 12:27   ` Brian Foster
2018-05-31 18:07 ` [PATCH 14/21] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
2018-05-31 18:07 ` [PATCH 15/21] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
2018-05-31 18:07 ` [PATCH 16/21] xfs: remove the imap_valid flag Christoph Hellwig
2018-05-31 18:07 ` [PATCH 17/21] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
2018-05-31 18:07 ` [PATCH 18/21] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
2018-05-31 18:07 ` [PATCH 19/21] xfs: remove xfs_start_page_writeback Christoph Hellwig
2018-05-31 18:07 ` [PATCH 20/21] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
2018-05-31 18:07 ` [PATCH 21/21] xfs: allow writeback on pages without buffer heads Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).