linux-fsdevel.vger.kernel.org archive mirror
* buffered writes without buffer heads in xfs and iomap v4
@ 2018-05-30  9:59 Christoph Hellwig
  2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Hi all,

this series adds support for buffered writes without buffer heads to
the iomap and XFS code.

For now this series only contains support for block size == PAGE_SIZE,
with the 4k support split into a separate series.


A git tree is available at:

    git://git.infradead.org/users/hch/xfs.git xfs-iomap-write.4

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-iomap-write.4

Changes since v3:
 - iterate backwards in xfs_bmap_punch_delalloc_range
 - remove the cow_valid variable in xfs_reflink_trim_irec_to_next_cow
 - additional trivial xfs_map_blocks simplifications
 - split the read side into a separate prep series
 - moved the SEEK_HOLE/DATA patches not strictly required out of the series

Changes since v2:
 - minor page_seek_hole_data tweaks
 - don't read data entirely covered by the write operation in write_begin
 - fix zeroing on write_begin I/O failure
 - remove iomap_block_needs_zeroing to make the code more clear
 - update comments on __do_page_cache_readahead

Changes since v1:
 - fix the iomap_readpages error handling
 - use unsigned file offsets in a few places to avoid arithmetic overflows
 - allocate an iomap_page in iomap_page_mkwrite to fix generic/095
 - improve a few comments
 - add more asserts
 - warn about truncated block numbers from ->bmap
 - new patch to change the __do_page_cache_readahead return value to
   unsigned int
 - remove an incorrectly added empty line
 - make inline data an explicit iomap type instead of a flag
 - add an IOMAP_F_BUFFER_HEAD flag to force use of buffer heads for gfs2,
   and keep the basic buffer head infrastructure around for now.


* [PATCH 01/18] fs: factor out a __generic_write_end helper
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
@ 2018-05-30  9:59 ` Christoph Hellwig
  2018-05-30 13:34   ` Brian Foster
  2018-05-30 16:26   ` Darrick J. Wong
  2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Factor out the bits of the buffer.c based write_end implementations that
don't know about buffer_heads so that they can be reused by other
implementations.
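
As a rough user-space illustration of what the factored-out helper is
responsible for, the sketch below models only the i_size handling.  The
struct model_inode type and model_write_end() are hypothetical stand-ins,
not kernel interfaces, and page locking is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode fields the helper touches. */
struct model_inode {
	int64_t	i_size;	/* current file size in bytes */
	bool	dirty;	/* set when the size update must be written back */
};

/*
 * Model of the size handling in __generic_write_end(): extend i_size if
 * the copied bytes reach past it, and mark the inode dirty only in that
 * case.  Returns the number of bytes copied, like the kernel helper.
 */
static unsigned model_write_end(struct model_inode *inode, int64_t pos,
		unsigned copied)
{
	bool i_size_changed = false;

	if (pos + copied > inode->i_size) {
		inode->i_size = pos + copied;
		i_size_changed = true;
	}

	/* The kernel helper unlocks and releases the page at this point. */

	if (i_size_changed)
		inode->dirty = true;
	return copied;
}
```

A write that stays inside the file leaves the model inode clean, while a
write that crosses the old end of file grows i_size and dirties it.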

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c   | 67 +++++++++++++++++++++++++++------------------------
 fs/internal.h |  2 ++
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..bd964b2ad99a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
 }
 EXPORT_SYMBOL(block_write_begin);
 
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page)
+{
+	loff_t old_size = inode->i_size;
+	bool i_size_changed = false;
+
+	/*
+	 * No need to use i_size_read() here, the i_size cannot change under us
+	 * because we hold i_rwsem.
+	 *
+	 * But it's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	if (pos + copied > inode->i_size) {
+		i_size_write(inode, pos + copied);
+		i_size_changed = true;
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (i_size_changed)
+		mark_inode_dirty(inode);
+	return copied;
+}
+
 int block_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
@@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
 {
-	struct inode *inode = mapping->host;
-	loff_t old_size = inode->i_size;
-	int i_size_changed = 0;
-
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
-	/*
-	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold i_mutex.
-	 *
-	 * But it's important to update i_size while still holding page lock:
-	 * page writeout could otherwise come in and zero beyond i_size.
-	 */
-	if (pos+copied > inode->i_size) {
-		i_size_write(inode, pos+copied);
-		i_size_changed = 1;
-	}
-
-	unlock_page(page);
-	put_page(page);
-
-	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
-	/*
-	 * Don't mark the inode dirty under page lock. First, it unnecessarily
-	 * makes the holding time of page lock longer. Second, it forces lock
-	 * ordering of page lock and transaction start for journaling
-	 * filesystems.
-	 */
-	if (i_size_changed)
-		mark_inode_dirty(inode);
-
-	return copied;
+	return __generic_write_end(mapping->host, pos, copied, page);
 }
 EXPORT_SYMBOL(generic_write_end);
 
diff --git a/fs/internal.h b/fs/internal.h
index e08972db0303..b955232d3d49 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 extern void guard_bio_eod(int rw, struct bio *bio);
 extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 		get_block_t *get_block, struct iomap *iomap);
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+		struct page *page);
 
 /*
  * char_dev.c
-- 
2.17.0


* [PATCH 02/18] iomap: add initial support for writes without buffer heads
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
  2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
@ 2018-05-30  9:59 ` Christoph Hellwig
  2018-05-30 13:34   ` Brian Foster
  2018-05-30 16:33   ` Darrick J. Wong
  2018-05-30  9:59 ` [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

For now just limited to blocksize == PAGE_SIZE, where we can simply read
in the full page in write begin, and just set the whole page dirty after
copying data into it.  This code is enabled by default and XFS will now
be fed pages without buffer heads in ->writepage and ->writepages.

If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old
path will still be used.  This both helps the transition in XFS and
prepares for the gfs2 migration to the iomap infrastructure.
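
The decision whether write begin has to read (or zero) the page at all
can be tried out in user space.  The sketch below mirrors the range
arithmetic of __iomap_write_begin() from this patch; MODEL_PAGE_SIZE and
model_needs_read() are hypothetical names, and a 4096 byte page is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/*
 * Returns true if the page has to be read or zeroed before the copy, and
 * false if it is already uptodate or the write covers the whole block
 * range inside the page.  block_size must be a power of two.
 */
static bool model_needs_read(int64_t pos, unsigned len, int64_t block_size,
		bool page_uptodate)
{
	int64_t block_start = pos & ~(block_size - 1);
	int64_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
	unsigned poff = block_start & (MODEL_PAGE_SIZE - 1);
	int64_t span = block_end - block_start;
	unsigned plen = span < MODEL_PAGE_SIZE - poff ?
			span : MODEL_PAGE_SIZE - poff;
	unsigned from = pos & (MODEL_PAGE_SIZE - 1), to = from + len;

	if (page_uptodate)
		return false;
	if (from <= poff && to >= poff + plen)
		return false;
	return true;
}
```

With a 4096 byte block size, a full-page write skips the read, while a
200 byte write at offset 100 into a page that is not uptodate needs it.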

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c            | 128 ++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iomap.c    |   6 +-
 include/linux/iomap.h |   2 +
 3 files changed, 123 insertions(+), 13 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 5e5a266e3325..0c9d9be59184 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -316,6 +316,48 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
+static int
+iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
+		unsigned poff, unsigned plen, unsigned from, unsigned to,
+		struct iomap *iomap)
+{
+	struct bio_vec bvec;
+	struct bio bio;
+
+	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
+		zero_user_segments(page, poff, from, to, poff + plen);
+		return 0;
+	}
+
+	bio_init(&bio, &bvec, 1);
+	bio.bi_opf = REQ_OP_READ;
+	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
+	bio_set_dev(&bio, iomap->bdev);
+	__bio_add_page(&bio, page, plen, poff);
+	return submit_bio_wait(&bio);
+}
+
+static int
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+		struct page *page, struct iomap *iomap)
+{
+	loff_t block_size = i_blocksize(inode);
+	loff_t block_start = pos & ~(block_size - 1);
+	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
+	unsigned poff = block_start & (PAGE_SIZE - 1);
+	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
+	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
+
+	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+
+	if (PageUptodate(page))
+		return 0;
+	if (from <= poff && to >= poff + plen)
+		return 0;
+	return iomap_read_page_sync(inode, block_start, page,
+			poff, plen, from, to, iomap);
+}
+
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
@@ -333,7 +375,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	if (!page)
 		return -ENOMEM;
 
-	status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+	else
+		status = __iomap_write_begin(inode, pos, len, page, iomap);
 	if (unlikely(status)) {
 		unlock_page(page);
 		put_page(page);
@@ -346,14 +391,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	return status;
 }
 
+int
+iomap_set_page_dirty(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	int newly_dirty;
+
+	if (unlikely(!mapping))
+		return !TestSetPageDirty(page);
+
+	/*
+	 * Lock out page->mem_cgroup migration to keep PageDirty
+	 * synchronized with per-memcg dirty page counters.
+	 */
+	lock_page_memcg(page);
+	newly_dirty = !TestSetPageDirty(page);
+	if (newly_dirty)
+		__set_page_dirty(page, mapping, 0);
+	unlock_page_memcg(page);
+
+	if (newly_dirty)
+		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+	return newly_dirty;
+}
+EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
+
+static int
+__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+		unsigned copied, struct page *page, struct iomap *iomap)
+{
+	flush_dcache_page(page);
+
+	/*
+	 * The blocks that were entirely written will now be uptodate, so we
+	 * don't have to worry about a readpage reading them and overwriting a
+	 * partial write.  However if we have encountered a short write and only
+	 * partially written into a block, it will not be marked uptodate, so a
+	 * readpage might come in and destroy our partial write.
+	 *
+	 * Do the simplest thing, and just treat any short write to a non
+	 * uptodate page as a zero-length write, and force the caller to redo
+	 * the whole thing.
+	 */
+	if (unlikely(copied < len && !PageUptodate(page))) {
+		copied = 0;
+	} else {
+		SetPageUptodate(page);
+		iomap_set_page_dirty(page);
+	}
+	return __generic_write_end(inode, pos, copied, page);
+}
+
 static int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page)
+		unsigned copied, struct page *page, struct iomap *iomap)
 {
 	int ret;
 
-	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
-			copied, page, NULL);
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+				copied, page, NULL);
+	else
+		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+
 	if (ret < len)
 		iomap_write_failed(inode, pos, len);
 	return ret;
@@ -408,7 +508,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page);
+		status = iomap_write_end(inode, pos, bytes, copied, page,
+				iomap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -502,7 +603,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		WARN_ON_ONCE(!PageUptodate(page));
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -554,7 +655,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -640,11 +741,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct page *page = data;
 	int ret;
 
-	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
-	if (ret)
-		return ret;
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+		if (ret)
+			return ret;
+		block_commit_write(page, 0, length);
+	} else {
+		WARN_ON_ONCE(!PageUptodate(page));
+		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
+	}
 
-	block_commit_write(page, 0, length);
 	return length;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..da6d1995e460 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
@@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	iomap->flags = IOMAP_F_NEW;
+	iomap->flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
 
 out_finish:
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 7300d30ca495..4d3d9d0cd69f 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,6 +30,7 @@ struct vm_fault;
  */
 #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
+#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
@@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
 int iomap_readpages(struct address_space *mapping, struct list_head *pages,
 		unsigned nr_pages, const struct iomap_ops *ops);
+int iomap_set_page_dirty(struct page *page);
 int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
-- 
2.17.0


* [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
  2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
  2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
@ 2018-05-30  9:59 ` Christoph Hellwig
  2018-05-30 13:35   ` Brian Foster
  2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Instead of using xfs_bmapi_read to find delalloc extents and then punch
them out using xfs_bunmapi, opencode the loop to iterate over the extents
and call xfs_bmap_del_extent_delay directly.  This both simplifies the
code and reduces the number of extent tree lookups required.
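
The shape of the new loop can be sketched in user space over a sorted
array of extents.  Everything here (struct model_extent, model_trim(),
model_punch_delalloc()) is a hypothetical model and not the XFS extent
tree API; instead of deleting extents it only counts the delalloc blocks
that would be punched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_extent {
	uint64_t	startoff;	/* first file block of the extent */
	uint64_t	blockcount;	/* length in blocks */
	bool		delalloc;	/* true for a delayed allocation */
};

/* Trim *e to the range [bno, bno + len); blockcount 0 means no overlap. */
static void model_trim(struct model_extent *e, uint64_t bno, uint64_t len)
{
	uint64_t end = e->startoff + e->blockcount;
	uint64_t range_end = bno + len;

	if (end <= bno || e->startoff >= range_end) {
		e->blockcount = 0;
		return;
	}
	if (e->startoff < bno)
		e->startoff = bno;
	if (end > range_end)
		end = range_end;
	e->blockcount = end - e->startoff;
}

/* Walk the extents backwards and count the delalloc blocks in the range. */
static uint64_t model_punch_delalloc(const struct model_extent *ext, int n,
		uint64_t start, uint64_t length)
{
	uint64_t end = start + length, punched = 0;
	int i = n - 1;

	/* Position on the last extent that starts before the end. */
	while (i >= 0 && ext[i].startoff >= end)
		i--;

	for (; i >= 0 && ext[i].startoff + ext[i].blockcount > start; i--) {
		struct model_extent del = ext[i];

		model_trim(&del, start, length);
		if (del.blockcount && del.delalloc)
			punched += del.blockcount;
	}
	return punched;
}
```

Each extent is visited at most once, and the walk stops as soon as an
extent ends at or before the start of the range.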

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 84 ++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 56 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 06badcbadeb4..f2b87873612d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -695,12 +695,10 @@ xfs_getbmap(
 }
 
 /*
- * dead simple method of punching delalyed allocation blocks from a range in
- * the inode. Walks a block at a time so will be slow, but is only executed in
- * rare error cases so the overhead is not critical. This will always punch out
- * both the start and end blocks, even if the ranges only partially overlap
- * them, so it is up to the caller to ensure that partial blocks are not
- * passed in.
+ * Dead simple method of punching delayed allocation blocks from a range in
+ * the inode.  This will always punch out both the start and end blocks, even
+ * if the ranges only partially overlap them, so it is up to the caller to
+ * ensure that partial blocks are not passed in.
  */
 int
 xfs_bmap_punch_delalloc_range(
@@ -708,63 +706,37 @@ xfs_bmap_punch_delalloc_range(
 	xfs_fileoff_t		start_fsb,
 	xfs_fileoff_t		length)
 {
-	xfs_fileoff_t		remaining = length;
+	struct xfs_ifork	*ifp = &ip->i_df;
+	xfs_fileoff_t		end_fsb = start_fsb + length;
+	struct xfs_bmbt_irec	got, del;
+	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
-	do {
-		int		done;
-		xfs_bmbt_irec_t	imap;
-		int		nimaps = 1;
-		xfs_fsblock_t	firstblock;
-		struct xfs_defer_ops dfops;
-
-		/*
-		 * Map the range first and check that it is a delalloc extent
-		 * before trying to unmap the range. Otherwise we will be
-		 * trying to remove a real extent (which requires a
-		 * transaction) or a hole, which is probably a bad idea...
-		 */
-		error = xfs_bmapi_read(ip, start_fsb, 1, &imap, &nimaps,
-				       XFS_BMAPI_ENTIRE);
+	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+		if (error)
+			return error;
+	}
 
-		if (error) {
-			/* something screwed, just bail */
-			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-				xfs_alert(ip->i_mount,
-			"Failed delalloc mapping lookup ino %lld fsb %lld.",
-						ip->i_ino, start_fsb);
-			}
-			break;
-		}
-		if (!nimaps) {
-			/* nothing there */
-			goto next_block;
-		}
-		if (imap.br_startblock != DELAYSTARTBLOCK) {
-			/* been converted, ignore */
-			goto next_block;
-		}
-		WARN_ON(imap.br_blockcount == 0);
+	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
+		return 0;
 
-		/*
-		 * Note: while we initialise the firstblock/dfops pair, they
-		 * should never be used because blocks should never be
-		 * allocated or freed for a delalloc extent and hence we need
-		 * don't cancel or finish them after the xfs_bunmapi() call.
-		 */
-		xfs_defer_init(&dfops, &firstblock);
-		error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
-					&dfops, &done);
-		if (error)
-			break;
+	while (got.br_startoff + got.br_blockcount > start_fsb) {
+		del = got;
+		xfs_trim_extent(&del, start_fsb, length);
 
-		ASSERT(!xfs_defer_has_unfinished_work(&dfops));
-next_block:
-		start_fsb++;
-		remaining--;
-	} while(remaining > 0);
+		if (del.br_blockcount && isnullstartblock(del.br_startblock)) {
+			error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK,
+					&icur, &got, &del);
+			if (error || !xfs_iext_get_extent(ifp, &icur, &got))
+				break;
+		} else {
+			if (!xfs_iext_prev_extent(ifp, &icur, &got))
+				break;
+		}
+	}
 
 	return error;
 }
-- 
2.17.0


* [PATCH 04/18] xfs: simplify xfs_aops_discard_page
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-05-30  9:59 ` [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-30  9:59 ` Christoph Hellwig
  2018-05-30 13:35   ` Brian Foster
  2018-05-30 16:40   ` Darrick J. Wong
  2018-05-30 10:00 ` [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30  9:59 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Instead of looking at the buffer heads to see if a block is delalloc just
call xfs_bmap_punch_delalloc_range on the whole page - this will leave
any non-delalloc block intact and handle the iteration for us.  As a side
effect one more place stops caring about buffer heads and we can remove the
xfs_check_page_type function entirely.
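
For reference, the range handed to xfs_bmap_punch_delalloc_range for one
page works out as in the following user-space sketch.  The names are
hypothetical and a 4096 byte page is assumed; the division stands in for
the shift that XFS_B_TO_FSBT performs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	4096u	/* assumed page size for this sketch */

struct model_fsb_range {
	uint64_t	start_fsb;	/* first file system block of the page */
	uint64_t	nr_blocks;	/* file system blocks per page */
};

/* block_size must be a power of two no larger than MODEL_PAGE_SIZE. */
static struct model_fsb_range model_page_to_fsb_range(uint64_t page_index,
		unsigned block_size)
{
	uint64_t offset = page_index * MODEL_PAGE_SIZE;	/* byte offset */
	struct model_fsb_range r = {
		.start_fsb = offset / block_size,
		.nr_blocks = MODEL_PAGE_SIZE / block_size,
	};

	return r;
}
```

With 1024 byte blocks, page index 3 maps to four blocks starting at file
system block 12.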

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 85 +++++------------------------------------------
 1 file changed, 9 insertions(+), 76 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index c631c457b444..f2333e351e07 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -711,49 +711,6 @@ xfs_map_at_offset(
 	clear_buffer_unwritten(bh);
 }
 
-/*
- * Test if a given page contains at least one buffer of a given @type.
- * If @check_all_buffers is true, then we walk all the buffers in the page to
- * try to find one of the type passed in. If it is not set, then the caller only
- * needs to check the first buffer on the page for a match.
- */
-STATIC bool
-xfs_check_page_type(
-	struct page		*page,
-	unsigned int		type,
-	bool			check_all_buffers)
-{
-	struct buffer_head	*bh;
-	struct buffer_head	*head;
-
-	if (PageWriteback(page))
-		return false;
-	if (!page->mapping)
-		return false;
-	if (!page_has_buffers(page))
-		return false;
-
-	bh = head = page_buffers(page);
-	do {
-		if (buffer_unwritten(bh)) {
-			if (type == XFS_IO_UNWRITTEN)
-				return true;
-		} else if (buffer_delay(bh)) {
-			if (type == XFS_IO_DELALLOC)
-				return true;
-		} else if (buffer_dirty(bh) && buffer_mapped(bh)) {
-			if (type == XFS_IO_OVERWRITE)
-				return true;
-		}
-
-		/* If we are only checking the first buffer, we are done now. */
-		if (!check_all_buffers)
-			break;
-	} while ((bh = bh->b_this_page) != head);
-
-	return false;
-}
-
 STATIC void
 xfs_vm_invalidatepage(
 	struct page		*page,
@@ -785,9 +742,6 @@ xfs_vm_invalidatepage(
  * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
  * truncation without a transaction as there is no space left for block
  * reservation (typically why we see a ENOSPC in writeback).
- *
- * This is not a performance critical path, so for now just do the punching a
- * buffer head at a time.
  */
 STATIC void
 xfs_aops_discard_page(
@@ -795,47 +749,26 @@ xfs_aops_discard_page(
 {
 	struct inode		*inode = page->mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	struct buffer_head	*bh, *head;
+	struct xfs_mount	*mp = ip->i_mount;
 	loff_t			offset = page_offset(page);
+	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, offset);
+	int			error;
 
-	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
-		goto out_invalidate;
-
-	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+	if (XFS_FORCED_SHUTDOWN(mp))
 		goto out_invalidate;
 
-	xfs_alert(ip->i_mount,
+	xfs_alert(mp,
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	bh = head = page_buffers(page);
-	do {
-		int		error;
-		xfs_fileoff_t	start_fsb;
-
-		if (!buffer_delay(bh))
-			goto next_buffer;
-
-		start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
-		if (error) {
-			/* something screwed, just bail */
-			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-				xfs_alert(ip->i_mount,
-			"page discard unable to remove delalloc mapping.");
-			}
-			break;
-		}
-next_buffer:
-		offset += i_blocksize(inode);
-
-	} while ((bh = bh->b_this_page) != head);
-
+	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+			PAGE_SIZE / i_blocksize(inode));
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error && !XFS_FORCED_SHUTDOWN(mp))
+		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
-	return;
 }
 
 static int
-- 
2.17.0


* [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 13:35   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Both callers want the same locking, so do it only once.
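
The pattern is the usual one for a callee that owns its locking: take
the lock on entry and route every exit through a single unlock label.
The user-space sketch below uses a hypothetical struct model_lock rather
than the XFS ilock, and -5 is only a stand-in for an error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model that trips an assert on misuse. */
struct model_lock {
	bool	held;
};

static void model_lock_take(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_drop(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* fail_early simulates an error before the main work is done. */
static int model_punch_range(struct model_lock *l, bool fail_early)
{
	int error = 0;

	model_lock_take(l);
	if (fail_early) {
		error = -5;	/* stand-in for an errno value */
		goto out_unlock;
	}

	/* ... the extent walk would run here, under the lock ... */

out_unlock:
	model_lock_drop(l);
	return error;
}
```

Callers no longer touch the lock, and the lock is released on both the
success and the error path.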

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c      | 2 --
 fs/xfs/xfs_bmap_util.c | 7 ++++---
 fs/xfs/xfs_iomap.c     | 3 ---
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index f2333e351e07..5dd09e83c81c 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -761,10 +761,8 @@ xfs_aops_discard_page(
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
 			page, ip->i_ino, offset);
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 			PAGE_SIZE / i_blocksize(inode));
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f2b87873612d..86a7ee425bfc 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -712,12 +712,11 @@ xfs_bmap_punch_delalloc_range(
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
 
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
-
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
@@ -738,6 +737,8 @@ xfs_bmap_punch_delalloc_range(
 		}
 	}
 
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index da6d1995e460..f949f0dd7382 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1203,11 +1203,8 @@ xfs_file_iomap_end_delalloc(
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
 		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
 					       end_fsb - start_fsb);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
 		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
 			xfs_alert(mp, "%s: unable to clean up ino %lld",
 				__func__, ip->i_ino);
-- 
2.17.0


* [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:19   ` Darrick J. Wong
  2018-05-30 17:39   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

xfs_writepage_map() iterates over the bufferheads on a page to decide
what sort of IO to do and what actions to take.  However, when it comes
to reflink and deciding when it needs to execute a COW operation, we no
longer look at the bufferhead state but instead we ignore that and look
up internal state held in the COW fork extent list.

This means xfs_writepage_map() is somewhat confused. It does stuff, then
ignores it, then tries to handle the impedance mismatch by shovelling the
results inside the existing mapping code.  It works, but it's a bit of a
mess and it makes it hard to fix the cached map bug that the writepage
code currently has.

To unify the two different mechanisms, we first have to choose a direction.
That's already been set - we're de-emphasising bufferheads so they are no
longer a control structure as we need to do that to allow for eventual
removal.  Hence we need to move away from looking at bufferhead state to
determine what operations we need to perform.

We can't completely get rid of bufferheads yet - they do contain some
state that is absolutely necessary, such as whether that part of the page
contains valid data or not (buffer_uptodate()).  Other state in the
bufferhead is redundant:

	BH_dirty - the page is dirty, so we can ignore this and just
		write it
	BH_delay - we have delalloc extent info in the DATA fork extent
		tree
	BH_unwritten - same as BH_delay
	BH_mapped - indicates we've already used it once for IO and it is
		mapped to a disk address. Needs to be ignored for COW
		blocks.

The BH_mapped flag is an interesting case - it's supposed to indicate that
it's already mapped to disk and so we can just use it "as is".  In theory,
we don't even have to do an extent lookup to find where to write it to,
but we have to do that anyway to determine we are actually writing over a
valid extent.  Hence it's not even serving the purpose of avoiding an
extent lookup during writeback, and so we can pretty much ignore it.
Especially as we have to ignore it for COW operations...

Therefore, use the extent map as the source of information to tell us
what actions we need to take and what sort of IO we should perform.  The
first step is to integrate xfs_map_blocks() and xfs_map_cow() and have
xfs_map_blocks() set the io type according to what it looks up.  This
means it can easily handle both normal overwrite and COW cases.  The
only thing we also need to add is the ability to return hole mappings.
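
As a sketch of what "set the io type according to what it looks up"
amounts to, the following user-space model derives the type from the
result of an extent lookup.  The enum, the struct and the order of the
checks are assumptions made for illustration and are not taken from the
patch itself.

```c
#include <assert.h>
#include <stdbool.h>

enum model_io_type {
	MODEL_IO_HOLE,
	MODEL_IO_DELALLOC,
	MODEL_IO_UNWRITTEN,
	MODEL_IO_OVERWRITE,
	MODEL_IO_COW,
};

/* Hypothetical summary of what the extent lookups found for one block. */
struct model_lookup {
	bool	in_cow_fork;		/* block has a COW fork mapping */
	bool	data_is_hole;		/* no data fork extent covers it */
	bool	data_is_delalloc;	/* data fork extent is delalloc */
	bool	data_is_unwritten;	/* data fork extent is unwritten */
};

/* Derive the io type from extent state alone, without bufferhead flags. */
static enum model_io_type model_io_type_for(const struct model_lookup *l)
{
	if (l->in_cow_fork)
		return MODEL_IO_COW;
	if (l->data_is_hole)
		return MODEL_IO_HOLE;
	if (l->data_is_delalloc)
		return MODEL_IO_DELALLOC;
	if (l->data_is_unwritten)
		return MODEL_IO_UNWRITTEN;
	return MODEL_IO_OVERWRITE;
}
```

A plain written extent falls through to the overwrite case, and a hole
is reported as its own type so the caller can skip it.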

We need to return and cache hole mappings now for the case of multiple
blocks per page.  We no longer use BH_mapped to indicate a block over
a hole, so we have to get that info from xfs_map_blocks().  We cache it so
that holes that span two pages don't need separate lookups.  This allows us
to avoid ever doing write IO over a hole, too.

Further, we need to drop the XFS_BMAPI_IGSTATE flag so that we don't
combine contiguous written and unwritten extents into a single map.  The
io type needs to match the extent type we are writing to so that we run the
correct IO completion routine for the IO. There is scope for optimisation
that would allow us to re-instate the XFS_BMAPI_IGSTATE flag, but this
requires tweaks to code outside the scope of this change.

Now that we have xfs_map_blocks() returning both a cached map and the type
of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
the bufferhead control. It's also much simplified because it doesn't need
to explicitly handle COW operations.  Instead of iterating bufferheads, it
iterates blocks within the page and then looks up what per-block state is
required from the appropriate bufferhead.  It then validates the cached
map, and if it's not valid, we get a new map.  If we don't get a valid map
or it's over a hole, we skip the block.
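
The walk described above can be modelled in userspace roughly as follows.
This is only a sketch under simplifying assumptions: the extent list is a
sorted array, per-block state is a bitmask instead of bufferheads, and all
toy_* names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached mapping: a block range that is data or a hole. */
struct toy_map {
	uint64_t	start;
	uint64_t	len;
	bool		hole;
};

struct toy_result {
	unsigned int	lookups;	/* extent lookups performed */
	unsigned int	queued;		/* blocks that would be written */
};

/* A zero-length (e.g. zero-initialized) map never covers any block. */
static bool toy_map_valid(const struct toy_map *m, uint64_t blk)
{
	return blk >= m->start && blk < m->start + m->len;
}

/* ext must be sorted and non-overlapping; gaps are reported as holes. */
static struct toy_map
toy_lookup(const struct toy_map *ext, size_t n, uint64_t blk)
{
	for (size_t i = 0; i < n; i++) {
		if (blk < ext[i].start)
			return (struct toy_map){ blk, ext[i].start - blk, true };
		if (blk < ext[i].start + ext[i].len)
			return ext[i];
	}
	return (struct toy_map){ blk, 1, true };	/* past the last extent */
}

/* Walk one page: iterate blocks, consult per-block state, reuse the map. */
static struct toy_result
toy_walk_page(const struct toy_map *ext, size_t n, uint64_t first_blk,
	      unsigned int blocks_per_page, uint32_t uptodate_mask,
	      struct toy_map *cached)
{
	struct toy_result r = { 0, 0 };

	assert(blocks_per_page <= 32);
	for (unsigned int i = 0; i < blocks_per_page; i++) {
		uint64_t blk = first_blk + i;

		if (!(uptodate_mask & (1u << i)))
			continue;		/* no valid data, skip */
		if (!toy_map_valid(cached, blk)) {
			*cached = toy_lookup(ext, n, blk);
			r.lookups++;
		}
		if (cached->hole)
			continue;		/* never write over a hole */
		r.queued++;
	}
	return r;
}
```

Because the cached map is carried from one page to the next, a hole that
spans a page boundary is looked up once, on the first page that touches
it.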

At this point, we have to remap the bufferhead via xfs_map_at_offset().
As previously noted, we had to do this even if the buffer was already
mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-
written delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
Bufferheads that span such regions still need their BH_Delay flags cleared
and their block numbers calculated, so we now unconditionally map each
bufferhead before submission.

But wait! There's more - remember the old "treat unwritten extents as
holes on read" hack?  Yeah, that means we can have a dirty page with
unmapped, unwritten bufferheads that contain data!  What makes these so
special is that the unwritten "hole" bufferheads do not have a valid block
device pointer, so if we attempt to write them xfs_add_to_ioend() blows
up. So we make xfs_map_at_offset() do the "realtime or data device"
lookup from the inode and ignore what was or wasn't put into the
bufferhead when the buffer was instantiated.

The astute reader will have realised by now that this code treats
unwritten extents in multiple-blocks-per-page situations differently.
If we get any combination of unwritten blocks on a dirty page that contain
valid data in the page, we're going to convert them to real extents.  This
can actually be a win, because it means that pages with interleaving
unwritten and written blocks will get converted to a single written extent
with zeros replacing the interspersed unwritten blocks.  This is actually
good for reducing extent list and conversion overhead, and it means we
issue a contiguous IO instead of lots of little ones.  The downside is
that we use up a little extra IO bandwidth.  Neither of these seems like a
bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
IOs will result in better performance on them...

As a result of all this, the only state we actually care about from the
bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
pass some information to the bio via xfs_add_to_ioend(), but that is
trivial to separate and pass explicitly.  This means we really only need
1 bit of state per block per page from the buffered write path in the
writeback path.  Everything else we do with the bufferhead is purely to
make the buffered IO front end continue to work correctly. i.e. we've
pretty much marginalised bufferheads in the writeback path completely.
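
For illustration, "1 bit of state per block per page" could be tracked
with something as small as the sketch below.  The structure, the 16
blocks per page and the helper names are assumptions made up for this
example; this patch itself still reads the bit from the bufferhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: e.g. a 64k page with 4k blocks. */
#define TOY_BLOCKS_PER_PAGE	16u
#define TOY_ALL_UPTODATE	((1u << TOY_BLOCKS_PER_PAGE) - 1)

/* Hypothetical per-page state: one uptodate bit per block. */
struct toy_page_state {
	uint32_t	uptodate;
};

static void toy_set_uptodate(struct toy_page_state *ps, unsigned int blk)
{
	assert(blk < TOY_BLOCKS_PER_PAGE);
	ps->uptodate |= 1u << blk;
}

static bool
toy_block_uptodate(const struct toy_page_state *ps, unsigned int blk)
{
	assert(blk < TOY_BLOCKS_PER_PAGE);
	return (ps->uptodate & (1u << blk)) != 0;
}

/* The page as a whole is uptodate only once every block is. */
static bool toy_page_uptodate(const struct toy_page_state *ps)
{
	return ps->uptodate == TOY_ALL_UPTODATE;
}
```

The writeback walk would then only need toy_block_uptodate() per block
and toy_page_uptodate() at the end of the page.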

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
[hch: forward port + slight refactoring]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 273 ++++++++++++++++++++--------------------------
 fs/xfs/xfs_aops.h |   4 +-
 2 files changed, 124 insertions(+), 153 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5dd09e83c81c..8cc41a786b5e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -378,78 +378,93 @@ xfs_map_blocks(
 	struct inode		*inode,
 	loff_t			offset,
 	struct xfs_bmbt_irec	*imap,
-	int			type)
+	int			*type)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
 	xfs_fileoff_t		offset_fsb, end_fsb;
+	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
-	int			bmapi_flags = XFS_BMAPI_ENTIRE;
 	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * Truncate can race with writeback since writeback doesn't take the
-	 * iolock and truncate decreases the file size before it starts
-	 * truncating the pages between new_size and old_size.  Therefore, we
-	 * can end up in the situation where writeback gets a CoW fork mapping
-	 * but the truncate makes the mapping invalid and we end up in here
-	 * trying to get a new mapping.  Bail out here so that we simply never
-	 * get a valid mapping and so we drop the write altogether.  The page
-	 * truncation will kill the contents anyway.
-	 */
-	if (type == XFS_IO_COW && offset > i_size_read(inode))
-		return 0;
-
-	ASSERT(type != XFS_IO_COW);
-	if (type == XFS_IO_UNWRITTEN)
-		bmapi_flags |= XFS_BMAPI_IGSTATE;
-
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (xfs_is_reflink_inode(ip) &&
+	    xfs_reflink_find_cow_mapping(ip, offset, imap)) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		/*
+		 * Truncate can race with writeback since writeback doesn't
+		 * take the iolock and truncate decreases the file size before
+		 * it starts truncating the pages between new_size and old_size.
+		 * Therefore, we can end up in the situation where writeback
+		 * gets a CoW fork mapping but the truncate makes the mapping
+		 * invalid and we end up in here trying to get a new mapping.
+		 * Bail out here so that we simply never get a valid mapping
+		 * and so we drop the write altogether.  The page truncation
+		 * will kill the contents anyway.
+		 */
+		if (offset > i_size_read(inode))
+			return 0;
+		whichfork = XFS_COW_FORK;
+		*type = XFS_IO_COW;
+		goto allocate_blocks;
+	}
+
 	if (offset > mp->m_super->s_maxbytes - count)
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, bmapi_flags);
-	/*
-	 * Truncate an overwrite extent if there's a pending CoW
-	 * reservation before the end of this extent.  This forces us
-	 * to come back to writepage to take care of the CoW.
-	 */
-	if (nimaps && type == XFS_IO_OVERWRITE)
+				imap, &nimaps, XFS_BMAPI_ENTIRE);
+	if (!nimaps) {
+		/*
+		 * Lookup returns no match? Beyond eof? regardless,
+		 * return it as a hole so we don't write it
+		 */
+		imap->br_startoff = offset_fsb;
+		imap->br_blockcount = end_fsb - offset_fsb;
+		imap->br_startblock = HOLESTARTBLOCK;
+		*type = XFS_IO_HOLE;
+	} else if (imap->br_startblock == HOLESTARTBLOCK) {
+		/* landed in a hole */
+		*type = XFS_IO_HOLE;
+	} else {
+		if (isnullstartblock(imap->br_startblock)) {
+			/* got a delalloc extent */
+			*type = XFS_IO_DELALLOC;
+			goto allocate_blocks;
+		}
+
+		/*
+		 * Got an existing extent for overwrite.  Truncate it if there
+		 * is a pending CoW reservation before the end of this extent,
+		 * so that we pick up the COW extents in the next iteration.
+		 */
 		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
+		if (imap->br_state == XFS_EXT_UNWRITTEN)
+			*type = XFS_IO_UNWRITTEN;
+		else
+			*type = XFS_IO_OVERWRITE;
+	}
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
-	if (error)
-		return error;
-
-	if (type == XFS_IO_DELALLOC &&
-	    (!nimaps || isnullstartblock(imap->br_startblock))) {
-		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
-				imap);
-		if (!error)
-			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
-		return error;
-	}
+	trace_xfs_map_blocks_found(ip, offset, count, *type, imap);
+	return error;
 
-#ifdef DEBUG
-	if (type == XFS_IO_UNWRITTEN) {
-		ASSERT(nimaps);
-		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
-	}
-#endif
-	if (nimaps)
-		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
-	return 0;
+allocate_blocks:
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	if (!error)
+		error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
+	if (!error)
+		trace_xfs_map_blocks_alloc(ip, offset, count, *type, imap);
+	return error;
 }
 
 STATIC bool
@@ -709,6 +724,14 @@ xfs_map_at_offset(
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
 	clear_buffer_unwritten(bh);
+
+	/*
+	 * If this is a realtime file, data may be on a different device
+	 * to that pointed to from the buffer_head b_bdev currently. We can't
+	 * trust that the bufferhead has already been mapped correctly, so
+	 * set the bdev now.
+	 */
+	bh->b_bdev = xfs_find_bdev_for_inode(inode);
 }
 
 STATIC void
@@ -769,56 +792,6 @@ xfs_aops_discard_page(
 	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
 }
 
-static int
-xfs_map_cow(
-	struct xfs_writepage_ctx *wpc,
-	struct inode		*inode,
-	loff_t			offset,
-	unsigned int		*new_type)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_bmbt_irec	imap;
-	bool			is_cow = false;
-	int			error;
-
-	/*
-	 * If we already have a valid COW mapping keep using it.
-	 */
-	if (wpc->io_type == XFS_IO_COW) {
-		wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
-		if (wpc->imap_valid) {
-			*new_type = XFS_IO_COW;
-			return 0;
-		}
-	}
-
-	/*
-	 * Else we need to check if there is a COW mapping at this offset.
-	 */
-	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	is_cow = xfs_reflink_find_cow_mapping(ip, offset, &imap);
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-	if (!is_cow)
-		return 0;
-
-	/*
-	 * And if the COW mapping has a delayed extent here we need to
-	 * allocate real space for it now.
-	 */
-	if (isnullstartblock(imap.br_startblock)) {
-		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK, offset,
-				&imap);
-		if (error)
-			return error;
-	}
-
-	wpc->io_type = *new_type = XFS_IO_COW;
-	wpc->imap_valid = true;
-	wpc->imap = imap;
-	return 0;
-}
-
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -845,85 +818,81 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh, *head;
+	struct buffer_head	*bh;
 	ssize_t			len = i_blocksize(inode);
-	uint64_t		offset;
 	int			error = 0;
 	int			count = 0;
-	int			uptodate = 1;
-	unsigned int		new_type;
+	bool			uptodate = true;
+	loff_t			file_offset;	/* file offset of page */
+	unsigned		poffset;	/* offset into page */
 
-	bh = head = page_buffers(page);
-	offset = page_offset(page);
-	do {
-		if (offset >= end_offset)
+	/*
+	 * Walk the blocks on the page, and when we run off the end of the
+	 * current map or find the current map invalid, grab a new one.
+	 * We only use bufferheads here to check per-block state - they no
+	 * longer control the iteration through the page. This allows us to
+	 * replace the bufferhead with some other state tracking mechanism in
+	 * future.
+	 */
+	file_offset = page_offset(page);
+	bh = page_buffers(page);
+	for (poffset = 0;
+	     poffset < PAGE_SIZE;
+	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+		/* past the range we are writing, so nothing more to write. */
+		if (file_offset >= end_offset)
 			break;
-		if (!buffer_uptodate(bh))
-			uptodate = 0;
 
 		/*
-		 * set_page_dirty dirties all buffers in a page, independent
-		 * of their state.  The dirty state however is entirely
-		 * meaningless for holes (!mapped && uptodate), so skip
-		 * buffers covering holes here.
+		 * Block does not contain valid data, skip it, mark the current
+		 * map as invalid because we have a discontiguity. This ensures
+		 * we put subsequent writeable buffers into a new ioend.
 		 */
-		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
-			wpc->imap_valid = false;
-			continue;
-		}
-
-		if (buffer_unwritten(bh))
-			new_type = XFS_IO_UNWRITTEN;
-		else if (buffer_delay(bh))
-			new_type = XFS_IO_DELALLOC;
-		else if (buffer_uptodate(bh))
-			new_type = XFS_IO_OVERWRITE;
-		else {
+		if (!buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
-			/*
-			 * This buffer is not uptodate and will not be
-			 * written to disk.  Ensure that we will put any
-			 * subsequent writeable buffers into a new
-			 * ioend.
-			 */
+			uptodate = false;
 			wpc->imap_valid = false;
 			continue;
 		}
 
-		if (xfs_is_reflink_inode(XFS_I(inode))) {
-			error = xfs_map_cow(wpc, inode, offset, &new_type);
-			if (error)
-				goto out;
-		}
-
-		if (wpc->io_type != new_type) {
-			wpc->io_type = new_type;
-			wpc->imap_valid = false;
-		}
-
+		/* Check to see if current map spans this file offset */
 		if (wpc->imap_valid)
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
+		/*
+		 * If we don't have a valid map, now it's time to get a new one
+		 * for this offset.  This will convert delayed allocations
+		 * (including COW ones) into real extents.  If we return without
+		 * a valid map, it means we landed in a hole and we skip the
+		 * block.
+		 */
 		if (!wpc->imap_valid) {
-			error = xfs_map_blocks(inode, offset, &wpc->imap,
-					     wpc->io_type);
+			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
+					     &wpc->io_type);
 			if (error)
 				goto out;
 			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 offset);
+							 file_offset);
 		}
-		if (wpc->imap_valid) {
-			lock_buffer(bh);
-			if (wpc->io_type != XFS_IO_OVERWRITE)
-				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
-			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
-			count++;
+
+		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
+			/*
+			 * set_page_dirty dirties all buffers in a page, independent
+			 * of their state.  The dirty state however is entirely
+			 * meaningless for holes (!mapped && uptodate), so check we did
+			 * have a buffer covering a hole here and continue.
+			 */
+			continue;
 		}
 
-	} while (offset += len, ((bh = bh->b_this_page) != head));
+		lock_buffer(bh);
+		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
+		count++;
+	}
 
-	if (uptodate && bh == head)
+	if (uptodate && poffset == PAGE_SIZE)
 		SetPageUptodate(page);
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 69346d460dfa..b2ef5b661761 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -29,6 +29,7 @@ enum {
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
+	XFS_IO_HOLE,		/* covers region without any block allocation */
 };
 
 #define XFS_IO_TYPES \
@@ -36,7 +37,8 @@ enum {
 	{ XFS_IO_DELALLOC,		"delalloc" }, \
 	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
 	{ XFS_IO_OVERWRITE,		"overwrite" }, \
-	{ XFS_IO_COW,			"CoW" }
+	{ XFS_IO_COW,			"CoW" }, \
+	{ XFS_IO_HOLE,			"hole" }
 
 /*
  * Structure for buffered I/O completions.
-- 
2.17.0

* [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:19   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 6 ++----
 fs/xfs/libxfs/xfs_bmap.h | 3 ---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7b0e2b551e23..4b5e014417d2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3799,8 +3799,7 @@ xfs_bmapi_update_map(
 		   mval[-1].br_startblock != HOLESTARTBLOCK &&
 		   mval->br_startblock == mval[-1].br_startblock +
 					  mval[-1].br_blockcount &&
-		   ((flags & XFS_BMAPI_IGSTATE) ||
-			mval[-1].br_state == mval->br_state)) {
+		   mval[-1].br_state == mval->br_state) {
 		ASSERT(mval->br_startoff ==
 		       mval[-1].br_startoff + mval[-1].br_blockcount);
 		mval[-1].br_blockcount += mval->br_blockcount;
@@ -3845,7 +3844,7 @@ xfs_bmapi_read(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
-			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
+			   XFS_BMAPI_COWFORK)));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
@@ -4290,7 +4289,6 @@ xfs_bmapi_write(
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
-	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
 	ASSERT(tp != NULL ||
 	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
 			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 2c233f9f1a26..a845fe57d1b5 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -80,8 +80,6 @@ struct xfs_extent_free_item
 #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
 #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
 #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
-#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
-					/* combine contig. space */
 #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
 /*
  * unwritten extent conversion - this needs write cache flushing and no additional
@@ -128,7 +126,6 @@ struct xfs_extent_free_item
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
 	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
 	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
-	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
-- 
2.17.0

* [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:23   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

We only have one caller left, and open coding the simple extent list
lookup in it allows us to both make the code more understandable and
reuse calculations and variables already present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c    | 17 ++++++++++++-----
 fs/xfs/xfs_reflink.c | 30 ------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 fs/xfs/xfs_trace.h   |  1 -
 4 files changed, 12 insertions(+), 38 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8cc41a786b5e..e2671b223409 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -385,6 +385,7 @@ xfs_map_blocks(
 	ssize_t			count = i_blocksize(inode);
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			whichfork = XFS_DATA_FORK;
+	struct xfs_iext_cursor	icur;
 	int			error = 0;
 	int			nimaps = 1;
 
@@ -396,8 +397,18 @@ xfs_map_blocks(
 	       (ip->i_df.if_flags & XFS_IFEXTENTS));
 	ASSERT(offset <= mp->m_super->s_maxbytes);
 
+	if (offset > mp->m_super->s_maxbytes - count)
+		count = mp->m_super->s_maxbytes - offset;
+	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
+	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+
+	/*
+	 * Check if this offset is covered by a COW extent, and if yes use
+	 * it directly instead of looking up anything in the data fork.
+	 */
 	if (xfs_is_reflink_inode(ip) &&
-	    xfs_reflink_find_cow_mapping(ip, offset, imap)) {
+	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
+	    imap->br_startoff <= offset_fsb) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		/*
 		 * Truncate can race with writeback since writeback doesn't
@@ -417,10 +428,6 @@ xfs_map_blocks(
 		goto allocate_blocks;
 	}
 
-	if (offset > mp->m_super->s_maxbytes - count)
-		count = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				imap, &nimaps, XFS_BMAPI_ENTIRE);
 	if (!nimaps) {
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 713e857d9ffa..8e5eb8e70c89 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -484,36 +484,6 @@ xfs_reflink_allocate_cow(
 	return error;
 }
 
-/*
- * Find the CoW reservation for a given byte offset of a file.
- */
-bool
-xfs_reflink_find_cow_mapping(
-	struct xfs_inode		*ip,
-	xfs_off_t			offset,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	xfs_fileoff_t			offset_fsb;
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
-
-	if (!xfs_is_reflink_inode(ip))
-		return false;
-	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return false;
-	if (got.br_startoff > offset_fsb)
-		return false;
-
-	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
-			&got);
-	*imap = got;
-	return true;
-}
-
 /*
  * Trim an extent to end at the next CoW reservation past offset_fsb.
  */
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 701487bab468..15a456492667 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
-		struct xfs_bmbt_irec *imap);
 extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
 		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 9d4c4ca24fe6..ed8f774944ba 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3227,7 +3227,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
-DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
 
 DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
-- 
2.17.0

* [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:27   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

In the only caller we just did a lookup in the COW extent tree for
the same offset.  Reuse that result and save a lookup, as well as
shortening the ilock hold time.
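
The reuse boils down to remembering where the next COW extent starts and
clamping the data fork mapping against it.  A minimal sketch, assuming a
simplified record and a made-up sentinel (toy_irec, TOY_NULLFILEOFF and
toy_trim_to_cow() are names for this example only):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NULLFILEOFF	UINT64_MAX	/* "no COW extent found" sentinel */

/* Hypothetical, simplified mapping: a range of file blocks. */
struct toy_irec {
	uint64_t	startoff;
	uint64_t	blockcount;
};

/*
 * Shorten imap so that it ends where the next COW extent begins.
 * cow_fsb is the start of the COW extent found by the earlier lookup.
 * The caller is assumed to have handled cow_fsb <= current offset
 * already, so cow_fsb is strictly greater than imap->startoff here.
 */
static void toy_trim_to_cow(struct toy_irec *imap, uint64_t cow_fsb)
{
	if (cow_fsb == TOY_NULLFILEOFF)
		return;
	assert(cow_fsb > imap->startoff);
	if (cow_fsb < imap->startoff + imap->blockcount)
		imap->blockcount = cow_fsb - imap->startoff;
}
```

A mapping covering blocks 10..29 with a COW extent starting at block 15
is cut down to 10..14, so the next iteration starts at block 15 and picks
up the COW extent.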

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c    | 25 ++++++++++++++++---------
 fs/xfs/xfs_reflink.c | 33 ---------------------------------
 fs/xfs/xfs_reflink.h |  2 --
 3 files changed, 16 insertions(+), 44 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e2671b223409..587493e9c8a1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -383,7 +383,7 @@ xfs_map_blocks(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			count = i_blocksize(inode);
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb, end_fsb, cow_fsb = NULLFILEOFF;
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
@@ -407,8 +407,9 @@ xfs_map_blocks(
 	 * it directly instead of looking up anything in the data fork.
 	 */
 	if (xfs_is_reflink_inode(ip) &&
-	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
-	    imap->br_startoff <= offset_fsb) {
+	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap))
+		cow_fsb = imap->br_startoff;
+	if (cow_fsb != NULLFILEOFF && cow_fsb <= offset_fsb) {
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		/*
 		 * Truncate can race with writeback since writeback doesn't
@@ -430,6 +431,10 @@ xfs_map_blocks(
 
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				imap, &nimaps, XFS_BMAPI_ENTIRE);
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	if (error)
+		return error;
+
 	if (!nimaps) {
 		/*
 		 * Lookup returns no match? Beyond eof? regardless,
@@ -454,21 +459,23 @@ xfs_map_blocks(
 		 * is a pending CoW reservation before the end of this extent,
 		 * so that we pick up the COW extents in the next iteration.
 		 */
-		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
+		if (cow_fsb != NULLFILEOFF &&
+		    cow_fsb < imap->br_startoff + imap->br_blockcount) {
+			imap->br_blockcount = cow_fsb - imap->br_startoff;
+			trace_xfs_reflink_trim_irec(ip, imap);
+		}
+
 		if (imap->br_state == XFS_EXT_UNWRITTEN)
 			*type = XFS_IO_UNWRITTEN;
 		else
 			*type = XFS_IO_OVERWRITE;
 	}
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	trace_xfs_map_blocks_found(ip, offset, count, *type, imap);
-	return error;
+	return 0;
 
 allocate_blocks:
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-	if (!error)
-		error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
+	error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
 	if (!error)
 		trace_xfs_map_blocks_alloc(ip, offset, count, *type, imap);
 	return error;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 8e5eb8e70c89..ff76bc56ff3d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -484,39 +484,6 @@ xfs_reflink_allocate_cow(
 	return error;
 }
 
-/*
- * Trim an extent to end at the next CoW reservation past offset_fsb.
- */
-void
-xfs_reflink_trim_irec_to_next_cow(
-	struct xfs_inode		*ip,
-	xfs_fileoff_t			offset_fsb,
-	struct xfs_bmbt_irec		*imap)
-{
-	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	struct xfs_bmbt_irec		got;
-	struct xfs_iext_cursor		icur;
-
-	if (!xfs_is_reflink_inode(ip))
-		return;
-
-	/* Find the extent in the CoW fork. */
-	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
-		return;
-
-	/* This is the extent before; try sliding up one. */
-	if (got.br_startoff < offset_fsb) {
-		if (!xfs_iext_next_extent(ifp, &icur, &got))
-			return;
-	}
-
-	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
-		return;
-
-	imap->br_blockcount = got.br_startoff - imap->br_startoff;
-	trace_xfs_reflink_trim_irec(ip, imap);
-}
-
 /*
  * Cancel CoW reservations for some block range of an inode.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 15a456492667..e8d4d50c629f 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
-		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
 
 extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
 		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
-- 
2.17.0

* [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:32   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

xfs_bmapi_read adds zero value in xfs_map_blocks.  Replace it with a
direct call to the low-level extent lookup function.

Note that we now always pass a 0 length to the trace points as we ask
for an unspecified len.
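
The hole handling after the direct lookup can be sketched in userspace as
below.  It assumes the lookup returns the extent containing the offset
or, failing that, the next extent after it; the array-based extent list
and all toy_* names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping for illustration only. */
struct toy_irec {
	uint64_t	startoff;
	uint64_t	blockcount;
	bool		hole;
};

/*
 * Return the first extent that ends after off, i.e. the extent that
 * contains off or the next one following it.  ext must be sorted.
 */
static bool
toy_lookup_extent(const struct toy_irec *ext, size_t n, uint64_t off,
		  struct toy_irec *got)
{
	for (size_t i = 0; i < n; i++) {
		if (off < ext[i].startoff + ext[i].blockcount) {
			*got = ext[i];
			return true;
		}
	}
	return false;
}

/* Map the block at off; end is the first block past the range we want. */
static void
toy_map_block(const struct toy_irec *ext, size_t n, uint64_t off,
	      uint64_t end, struct toy_irec *imap)
{
	assert(end > off);
	if (!toy_lookup_extent(ext, n, off, imap))
		imap->startoff = end;	/* fake a hole past EOF */

	if (imap->startoff > off) {
		/* landed in a hole or beyond EOF */
		imap->blockcount = imap->startoff - off;
		imap->startoff = off;
		imap->hole = true;
	}
}
```

Both "no extent at all" and "extent starts later" end up in the same
branch, which is why the separate !nimaps and HOLESTARTBLOCK cases can be
folded into one.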

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 587493e9c8a1..cef2bc3cf98b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -387,7 +387,6 @@ xfs_map_blocks(
 	int			whichfork = XFS_DATA_FORK;
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
-	int			nimaps = 1;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
@@ -429,24 +428,16 @@ xfs_map_blocks(
 		goto allocate_blocks;
 	}
 
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, XFS_BMAPI_ENTIRE);
+	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, imap))
+		imap->br_startoff = end_fsb;	/* fake a hole past EOF */
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-	if (error)
-		return error;
 
-	if (!nimaps) {
-		/*
-		 * Lookup returns no match? Beyond eof? regardless,
-		 * return it as a hole so we don't write it
-		 */
+	if (imap->br_startoff > offset_fsb) {
+		/* landed in a hole or beyond EOF */
+		imap->br_blockcount = imap->br_startoff - offset_fsb;
 		imap->br_startoff = offset_fsb;
-		imap->br_blockcount = end_fsb - offset_fsb;
 		imap->br_startblock = HOLESTARTBLOCK;
 		*type = XFS_IO_HOLE;
-	} else if (imap->br_startblock == HOLESTARTBLOCK) {
-		/* landed in a hole */
-		*type = XFS_IO_HOLE;
 	} else {
 		if (isnullstartblock(imap->br_startblock)) {
 			/* got a delalloc extent */
-- 
2.17.0

* [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:34   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Finding a buffer that isn't uptodate doesn't invalidate the mapping for
any given block.  The last_sector check will already take care of starting
another ioend as soon as we find any non-uptodate buffer, and if the current
mapping doesn't include the next uptodate buffer the xfs_imap_valid check
will take care of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index cef2bc3cf98b..7dc13b0aae60 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -849,15 +849,12 @@ xfs_writepage_map(
 			break;
 
 		/*
-		 * Block does not contain valid data, skip it, mark the current
-		 * map as invalid because we have a discontiguity. This ensures
-		 * we put subsequent writeable buffers into a new ioend.
+		 * Block does not contain valid data, skip it.
 		 */
 		if (!buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
 			uptodate = false;
-			wpc->imap_valid = false;
 			continue;
 		}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 12/18] xfs: remove the imap_valid flag
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:44   ` Darrick J. Wong
  2018-05-31 13:47   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Simplify the way we check for a valid imap - we know we have a valid
mapping after xfs_map_blocks returned successfully, and we know we can
call xfs_imap_valid on any imap, as it will always fail on a
zero-initialized map.
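
To illustrate why a zero-initialized map can never pass the check, here is a
minimal user-space sketch of the half-open range test.  It models only the
comparison visible in the xfs_imap_valid diff context; the struct and function
names (extent_model, extent_model_covers) are made up for this example and are
not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct xfs_bmbt_irec, reduced to the two
 * fields the range test reads.
 */
struct extent_model {
	uint64_t startoff;	/* first file block covered by the mapping */
	uint64_t blockcount;	/* number of file blocks covered */
};

/*
 * Half-open range test: [startoff, startoff + blockcount).
 * With blockcount == 0 the range is empty, so no block can match.
 */
static bool extent_model_covers(const struct extent_model *e, uint64_t fsb)
{
	return fsb >= e->startoff && fsb < e->startoff + e->blockcount;
}
```

Because an all-zero map describes an empty range, the first block of the first
page always falls through to xfs_map_blocks without needing a separate flag.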

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7dc13b0aae60..910b410e5a90 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -42,7 +42,6 @@
  */
 struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
-	bool			imap_valid;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
 	sector_t		last_block;
@@ -858,10 +857,6 @@ xfs_writepage_map(
 			continue;
 		}
 
-		/* Check to see if current map spans this file offset */
-		if (wpc->imap_valid)
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
 		/*
 		 * If we don't have a valid map, now it's time to get a new one
 		 * for this offset.  This will convert delayed allocations
@@ -869,16 +864,14 @@ xfs_writepage_map(
 		 * a valid map, it means we landed in a hole and we skip the
 		 * block.
 		 */
-		if (!wpc->imap_valid) {
+		if (!xfs_imap_valid(inode, &wpc->imap, file_offset)) {
 			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
 					     &wpc->io_type);
 			if (error)
 				goto out;
-			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
-							 file_offset);
 		}
 
-		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
+		if (wpc->io_type == XFS_IO_HOLE) {
 			/*
 			 * set_page_dirty dirties all buffers in a page, independent
 			 * of their state.  The dirty state however is entirely
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:55   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Calculate all information for the bio based on the passed in information
without requiring a buffer_head structure.
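
As a rough user-space sketch of the arithmetic involved: the start sector of a
block is derived from the cached mapping and the file offset, and a block can
be appended to the current ioend only if it is contiguous both on disk and in
the file.  The names below (mapping_model, block_sector, can_append) are
hypothetical, and the linear block-to-sector conversion is a simplifying
assumption; the real xfs_fsb_to_db is not a plain shift.  The io_type and
"no ioend yet" checks from the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical reduction of the cached extent mapping. */
struct mapping_model {
	uint64_t startoff;	/* file offset of the mapping, in fs blocks */
	uint64_t startblock;	/* disk address of the mapping, in fs blocks */
};

/*
 * Sector of the block at byte 'offset' in the file.  Assumes a linear
 * fs-block to sector conversion, which is a simplification.
 */
static uint64_t block_sector(const struct mapping_model *m, unsigned blkbits,
			     uint64_t offset)
{
	uint64_t map_start = m->startoff << blkbits;	/* bytes */

	assert(offset >= map_start);
	return (m->startblock << (blkbits - MODEL_SECTOR_SHIFT)) +
		((offset - map_start) >> MODEL_SECTOR_SHIFT);
}

/* Contiguity part of the "can we append to the current ioend" test. */
static int can_append(uint64_t sector, uint64_t bio_end_sector,
		      uint64_t offset, uint64_t io_offset, uint64_t io_size)
{
	return sector == bio_end_sector && offset == io_offset + io_size;
}
```

Comparing against the end sector of the bio replaces the old last_block
bookkeeping, since the bio already records where the next contiguous sector
would start.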

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 68 ++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 910b410e5a90..7d02d04d5a5b 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -44,7 +44,6 @@ struct xfs_writepage_ctx {
 	struct xfs_bmbt_irec    imap;
 	unsigned int		io_type;
 	struct xfs_ioend	*ioend;
-	sector_t		last_block;
 };
 
 void
@@ -535,11 +534,6 @@ xfs_start_page_writeback(
 	unlock_page(page);
 }
 
-static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
-{
-	return bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -594,27 +588,20 @@ xfs_submit_ioend(
 	return 0;
 }
 
-static void
-xfs_init_bio_from_bh(
-	struct bio		*bio,
-	struct buffer_head	*bh)
-{
-	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
-}
-
 static struct xfs_ioend *
 xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type,
 	xfs_off_t		offset,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct xfs_ioend	*ioend;
 	struct bio		*bio;
 
 	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, xfs_ioend_bioset);
-	xfs_init_bio_from_bh(bio, bh);
+	bio_set_dev(bio, bdev);
+	bio->bi_iter.bi_sector = sector;
 
 	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
 	INIT_LIST_HEAD(&ioend->io_list);
@@ -639,13 +626,14 @@ static void
 xfs_chain_bio(
 	struct xfs_ioend	*ioend,
 	struct writeback_control *wbc,
-	struct buffer_head	*bh)
+	struct block_device	*bdev,
+	sector_t		sector)
 {
 	struct bio *new;
 
 	new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
-	xfs_init_bio_from_bh(new, bh);
-
+	bio_set_dev(new, bdev);
+	new->bi_iter.bi_sector = sector;
 	bio_chain(ioend->io_bio, new);
 	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
 	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
@@ -655,39 +643,45 @@ xfs_chain_bio(
 }
 
 /*
- * Test to see if we've been building up a completion structure for
- * earlier buffers -- if so, we try to append to this ioend if we
- * can, otherwise we finish off any current ioend and start another.
- * Return the ioend we finished off so that the caller can submit it
- * once it has finished processing the dirty page.
+ * Test to see if we have an existing ioend structure that we could append to
+ * first, otherwise finish off the current ioend and start another.
  */
 STATIC void
 xfs_add_to_ioend(
 	struct inode		*inode,
-	struct buffer_head	*bh,
 	xfs_off_t		offset,
+	struct page		*page,
 	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	struct list_head	*iolist)
 {
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
+	unsigned		len = i_blocksize(inode);
+	unsigned		poff = offset & (PAGE_SIZE - 1);
+	sector_t		sector;
+
+	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
+		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
+
 	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
-	    bh->b_blocknr != wpc->last_block + 1 ||
+	    sector != bio_end_sector(wpc->ioend->io_bio) ||
 	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
-		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset, bh);
+		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
+				bdev, sector);
 	}
 
 	/*
-	 * If the buffer doesn't fit into the bio we need to allocate a new
-	 * one.  This shouldn't happen more than once for a given buffer.
+	 * If the block doesn't fit into the bio we need to allocate a new
+	 * one.  This shouldn't happen more than once for a given block.
 	 */
-	while (xfs_bio_add_buffer(wpc->ioend->io_bio, bh) != bh->b_size)
-		xfs_chain_bio(wpc->ioend, wbc, bh);
+	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
+		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
 
-	wpc->ioend->io_size += bh->b_size;
-	wpc->last_block = bh->b_blocknr;
-	xfs_start_buffer_writeback(bh);
+	wpc->ioend->io_size += len;
 }
 
 STATIC void
@@ -883,7 +877,9 @@ xfs_writepage_map(
 
 		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
-		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
+		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
+				&submit_list);
+		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:56   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

This keeps it in a single place so it can be made optional more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7d02d04d5a5b..025f2acac100 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -495,21 +495,6 @@ xfs_imap_valid(
 		offset < imap->br_startoff + imap->br_blockcount;
 }
 
-STATIC void
-xfs_start_buffer_writeback(
-	struct buffer_head	*bh)
-{
-	ASSERT(buffer_mapped(bh));
-	ASSERT(buffer_locked(bh));
-	ASSERT(!buffer_delay(bh));
-	ASSERT(!buffer_unwritten(bh));
-
-	bh->b_end_io = NULL;
-	set_buffer_async_write(bh);
-	set_buffer_uptodate(bh);
-	clear_buffer_dirty(bh);
-}
-
 STATIC void
 xfs_start_page_writeback(
 	struct page		*page,
@@ -718,6 +703,7 @@ xfs_map_at_offset(
 	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
 	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
 
+	lock_buffer(bh);
 	xfs_map_buffer(inode, bh, imap, offset);
 	set_buffer_mapped(bh);
 	clear_buffer_delay(bh);
@@ -730,6 +716,10 @@ xfs_map_at_offset(
 	 * set the bdev now.
 	 */
 	bh->b_bdev = xfs_find_bdev_for_inode(inode);
+	bh->b_end_io = NULL;
+	set_buffer_async_write(bh);
+	set_buffer_uptodate(bh);
+	clear_buffer_dirty(bh);
 }
 
 STATIC void
@@ -875,11 +865,9 @@ xfs_writepage_map(
 			continue;
 		}
 
-		lock_buffer(bh);
 		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
-		xfs_start_buffer_writeback(bh);
 		count++;
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 15/18] xfs: remove xfs_start_page_writeback
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 17:57   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

This helper only has two callers, one of them with a constant error
argument.  Remove it to make pending changes to the code a little easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 47 +++++++++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 025f2acac100..38021023131e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -495,30 +495,6 @@ xfs_imap_valid(
 		offset < imap->br_startoff + imap->br_blockcount;
 }
 
-STATIC void
-xfs_start_page_writeback(
-	struct page		*page,
-	int			clear_dirty)
-{
-	ASSERT(PageLocked(page));
-	ASSERT(!PageWriteback(page));
-
-	/*
-	 * if the page was not fully cleaned, we need to ensure that the higher
-	 * layers come back to it correctly. That means we need to keep the page
-	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
-	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
-	 * write this page in this writeback sweep will be made.
-	 */
-	if (clear_dirty) {
-		clear_page_dirty_for_io(page);
-		set_page_writeback(page);
-	} else
-		set_page_writeback_keepwrite(page);
-
-	unlock_page(page);
-}
-
 /*
  * Submit the bio for an ioend. We are passed an ioend with a bio attached to
  * it, and we submit that bio. The ioend may be used for multiple bio
@@ -877,6 +853,9 @@ xfs_writepage_map(
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
 out:
+	ASSERT(PageLocked(page));
+	ASSERT(!PageWriteback(page));
+
 	/*
 	 * On error, we have to fail the ioend here because we have locked
 	 * buffers in the ioend. If we don't do this, we'll deadlock
@@ -895,7 +874,21 @@ xfs_writepage_map(
 	 * treated correctly on error.
 	 */
 	if (count) {
-		xfs_start_page_writeback(page, !error);
+		/*
+		 * If the page was not fully cleaned, we need to ensure that the
+		 * higher layers come back to it correctly.  That means we need
+		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
+		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
+		 * so another attempt to write this page in this writeback sweep
+		 * will be made.
+		 */
+		if (error) {
+			set_page_writeback_keepwrite(page);
+		} else {
+			clear_page_dirty_for_io(page);
+			set_page_writeback(page);
+		}
+		unlock_page(page);
 
 		/*
 		 * Preserve the original error if there was one, otherwise catch
@@ -920,7 +913,9 @@ xfs_writepage_map(
 		 * race with a partial page truncate on a sub-page block sized
 		 * filesystem. In that case we need to mark the page clean.
 		 */
-		xfs_start_page_writeback(page, 1);
+		clear_page_dirty_for_io(page);
+		set_page_writeback(page);
+		unlock_page(page);
 		end_page_writeback(page);
 	}
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 18:00   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
  2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Rework how we handle the error vs non-error and ioend vs no-ioend cases
to keep the fast path streamlined and the duplicated code to a minimum.
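
The resulting control flow can be summarized with a small decision table.  The
sketch below is a user-space model, not kernel code: tail_actions and
writepage_tail are hypothetical names, and each flag stands for the kernel
calls named in its comment.

```c
#include <stdbool.h>

/* Which steps the tail of xfs_writepage_map takes (model only). */
struct tail_actions {
	bool discard_page;	/* xfs_aops_discard_page + ClearPageUptodate */
	bool keep_dirty;	/* set_page_writeback_keepwrite */
	bool start_writeback;	/* clear_page_dirty_for_io + set_page_writeback */
	bool submit_ioends;	/* walk submit_list, xfs_submit_ioend */
	bool end_writeback_now;	/* end_page_writeback right away */
};

static struct tail_actions writepage_tail(int error, int count)
{
	struct tail_actions a = {0};

	if (error) {
		if (!count) {
			/* nothing was added to an ioend: drop the page */
			a.discard_page = true;
			return a;
		}
		a.keep_dirty = true;
	} else {
		a.start_writeback = true;
	}

	a.submit_ioends = true;
	/* no error and nothing to write, e.g. a truncate race */
	a.end_writeback_now = (count == 0);
	return a;
}
```

Only the error case with no blocks queued skips ioend submission; every other
combination shares the unlock and submit path.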

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 65 +++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 38021023131e..ac417ef326a9 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -873,7 +873,14 @@ xfs_writepage_map(
 	 * submission of outstanding ioends on the writepage context so they are
 	 * treated correctly on error.
 	 */
-	if (count) {
+	if (unlikely(error)) {
+		if (!count) {
+			xfs_aops_discard_page(page);
+			ClearPageUptodate(page);
+			unlock_page(page);
+			goto done;
+		}
+
 		/*
 		 * If the page was not fully cleaned, we need to ensure that the
 		 * higher layers come back to it correctly.  That means we need
@@ -882,43 +889,35 @@ xfs_writepage_map(
 		 * so another attempt to write this page in this writeback sweep
 		 * will be made.
 		 */
-		if (error) {
-			set_page_writeback_keepwrite(page);
-		} else {
-			clear_page_dirty_for_io(page);
-			set_page_writeback(page);
-		}
-		unlock_page(page);
-
-		/*
-		 * Preserve the original error if there was one, otherwise catch
-		 * submission errors here and propagate into subsequent ioend
-		 * submissions.
-		 */
-		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
-			int error2;
-
-			list_del_init(&ioend->io_list);
-			error2 = xfs_submit_ioend(wbc, ioend, error);
-			if (error2 && !error)
-				error = error2;
-		}
-	} else if (error) {
-		xfs_aops_discard_page(page);
-		ClearPageUptodate(page);
-		unlock_page(page);
+		set_page_writeback_keepwrite(page);
 	} else {
-		/*
-		 * We can end up here with no error and nothing to write if we
-		 * race with a partial page truncate on a sub-page block sized
-		 * filesystem. In that case we need to mark the page clean.
-		 */
 		clear_page_dirty_for_io(page);
 		set_page_writeback(page);
-		unlock_page(page);
-		end_page_writeback(page);
 	}
 
+	unlock_page(page);
+
+	/*
+	 * Preserve the original error if there was one, otherwise catch
+	 * submission errors here and propagate into subsequent ioend
+	 * submissions.
+	 */
+	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
+		int error2;
+
+		list_del_init(&ioend->io_list);
+		error2 = xfs_submit_ioend(wbc, ioend, error);
+		if (error2 && !error)
+			error = error2;
+	}
+
+	/*
+	 * We can end up here with no error and nothing to write if we race with
+	 * a partial page truncate on a sub-page block sized filesystem.
+	 */
+	if (!count)
+		end_page_writeback(page);
+done:
 	mapping_set_error(page->mapping, error);
 	return error;
 }
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 18:08   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

We already track the page uptodate status based on the buffer uptodate
status, which is updated whenever reading or zeroing blocks.

This code has been there since a ptool commit in 2002, which
claims to:

    "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
    major changes to actually fit.

and isn't present in other writepage implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index ac417ef326a9..84f88cecd2f1 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -786,7 +786,6 @@ xfs_writepage_map(
 	ssize_t			len = i_blocksize(inode);
 	int			error = 0;
 	int			count = 0;
-	bool			uptodate = true;
 	loff_t			file_offset;	/* file offset of page */
 	unsigned		poffset;	/* offset into page */
 
@@ -813,7 +812,6 @@ xfs_writepage_map(
 		if (!buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
-			uptodate = false;
 			continue;
 		}
 
@@ -847,9 +845,6 @@ xfs_writepage_map(
 		count++;
 	}
 
-	if (uptodate && poffset == PAGE_SIZE)
-		SetPageUptodate(page);
-
 	ASSERT(wpc->ioend || list_empty(&submit_list));
 
 out:
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 18/18] xfs: allow writeback on pages without buffer heads
  2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
@ 2018-05-30 10:00 ` Christoph Hellwig
  2018-05-30 18:08   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  17 siblings, 2 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-30 10:00 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-fsdevel, linux-mm

Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
equal to the page size, and deal with pages without buffer heads in
writeback.  Thanks to the previous refactoring this is basically trivial
now.
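
A minimal sketch of the resulting policy, assuming a placeholder flag value:
MODEL_F_BUFFER_HEAD and model_iomap_flags are invented for this example, and
the numeric value is not taken from include/linux/iomap.h.

```c
/* Placeholder bit; the real IOMAP_F_BUFFER_HEAD value is not shown here. */
#define MODEL_F_BUFFER_HEAD 0x1u

/*
 * Request the buffer_head path only when a page holds more than one
 * block, i.e. when the block size is smaller than the page size.
 */
static unsigned model_iomap_flags(unsigned blocksize, unsigned page_size)
{
	unsigned flags = 0;

	if (blocksize < page_size)
		flags |= MODEL_F_BUFFER_HEAD;
	return flags;
}
```

With 4k blocks on a 4k-page system no flag is set, so writeback sees pages
without buffer heads; 1k blocks on the same system keep using the old path.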

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 47 +++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_iomap.c |  3 ++-
 2 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 84f88cecd2f1..6640377b6eae 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -91,6 +91,19 @@ xfs_find_daxdev_for_inode(
 		return mp->m_ddev_targp->bt_daxdev;
 }
 
+static void
+xfs_finish_page_writeback(
+	struct inode		*inode,
+	struct bio_vec		*bvec,
+	int			error)
+{
+	if (error) {
+		SetPageError(bvec->bv_page);
+		mapping_set_error(inode->i_mapping, -EIO);
+	}
+	end_page_writeback(bvec->bv_page);
+}
+
 /*
  * We're now finished for good with this page.  Update the page state via the
  * associated buffer_heads, paying attention to the start and end offsets that
@@ -103,7 +116,7 @@ xfs_find_daxdev_for_inode(
  * and buffers potentially freed after every call to end_buffer_async_write.
  */
 static void
-xfs_finish_page_writeback(
+xfs_finish_buffer_writeback(
 	struct inode		*inode,
 	struct bio_vec		*bvec,
 	int			error)
@@ -178,9 +191,12 @@ xfs_destroy_ioend(
 			next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i)
-			xfs_finish_page_writeback(inode, bvec, error);
-
+		bio_for_each_segment_all(bvec, bio, i) {
+			if (page_has_buffers(bvec->bv_page))
+				xfs_finish_buffer_writeback(inode, bvec, error);
+			else
+				xfs_finish_page_writeback(inode, bvec, error);
+		}
 		bio_put(bio);
 	}
 
@@ -782,13 +798,16 @@ xfs_writepage_map(
 {
 	LIST_HEAD(submit_list);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh;
+	struct buffer_head	*bh = NULL;
 	ssize_t			len = i_blocksize(inode);
 	int			error = 0;
 	int			count = 0;
 	loff_t			file_offset;	/* file offset of page */
 	unsigned		poffset;	/* offset into page */
 
+	if (page_has_buffers(page))
+		bh = page_buffers(page);
+
 	/*
 	 * Walk the blocks on the page, and we we run off then end of the
 	 * current map or find the current map invalid, grab a new one.
@@ -797,11 +816,9 @@ xfs_writepage_map(
 	 * replace the bufferhead with some other state tracking mechanism in
 	 * future.
 	 */
-	file_offset = page_offset(page);
-	bh = page_buffers(page);
-	for (poffset = 0;
+	for (poffset = 0, file_offset = page_offset(page);
 	     poffset < PAGE_SIZE;
-	     poffset += len, file_offset += len, bh = bh->b_this_page) {
+	     poffset += len, file_offset += len) {
 		/* past the range we are writing, so nothing more to write. */
 		if (file_offset >= end_offset)
 			break;
@@ -809,9 +826,10 @@ xfs_writepage_map(
 		/*
 		 * Block does not contain valid data, skip it.
 		 */
-		if (!buffer_uptodate(bh)) {
+		if (bh && !buffer_uptodate(bh)) {
 			if (PageUptodate(page))
 				ASSERT(buffer_mapped(bh));
+			bh = bh->b_this_page;
 			continue;
 		}
 
@@ -836,10 +854,15 @@ xfs_writepage_map(
 			 * meaningless for holes (!mapped && uptodate), so check we did
 			 * have a buffer covering a hole here and continue.
 			 */
+			if (bh)
+				bh = bh->b_this_page;
 			continue;
 		}
 
-		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+		if (bh) {
+			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
+			bh = bh->b_this_page;
+		}
 		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
 				&submit_list);
 		count++;
@@ -939,8 +962,6 @@ xfs_do_writepage(
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
-	ASSERT(page_has_buffers(page));
-
 	/*
 	 * Refuse to write the page out if we are called from reclaim context.
 	 *
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f949f0dd7382..93c40da3378a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1031,7 +1031,8 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	iomap->flags |= IOMAP_F_BUFFER_HEAD;
+	if (i_blocksize(inode) < PAGE_SIZE)
+		iomap->flags |= IOMAP_F_BUFFER_HEAD;
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/18] fs: factor out a __generic_write_end helper
  2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
@ 2018-05-30 13:34   ` Brian Foster
  2018-05-30 16:26   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-30 13:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:56AM +0200, Christoph Hellwig wrote:
> Bits of the buffer.c based write_end implementations that don't know
> about buffer_heads and can be reused by other implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
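
For reference, the size handling in the quoted helper boils down to two
independent checks, sketched here as a user-space model.  write_end_model and
model_write_end are hypothetical names; page locking and the actual inode
updates are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of the size checks in the helper (model only). */
struct write_end_model {
	int64_t new_size;	/* i_size after the write */
	bool mark_dirty;	/* mark_inode_dirty would be called */
	bool extend_pagecache;	/* pagecache_isize_extended would be called */
};

static struct write_end_model model_write_end(int64_t i_size, int64_t pos,
					       unsigned copied)
{
	struct write_end_model r = { .new_size = i_size };

	/* the write ends past the old EOF: grow the file */
	if (pos + copied > i_size) {
		r.new_size = pos + copied;
		r.mark_dirty = true;
	}
	/* the write starts past the old EOF: a gap was created */
	r.extend_pagecache = i_size < pos;
	return r;
}
```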

>  fs/buffer.c   | 67 +++++++++++++++++++++++++++------------------------
>  fs/internal.h |  2 ++
>  2 files changed, 37 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 249b83fafe48..bd964b2ad99a 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
>  }
>  EXPORT_SYMBOL(block_write_begin);
>  
> +int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +		struct page *page)
> +{
> +	loff_t old_size = inode->i_size;
> +	bool i_size_changed = false;
> +
> +	/*
> +	 * No need to use i_size_read() here, the i_size cannot change under us
> +	 * because we hold i_rwsem.
> +	 *
> +	 * But it's important to update i_size while still holding page lock:
> +	 * page writeout could otherwise come in and zero beyond i_size.
> +	 */
> +	if (pos + copied > inode->i_size) {
> +		i_size_write(inode, pos + copied);
> +		i_size_changed = true;
> +	}
> +
> +	unlock_page(page);
> +	put_page(page);
> +
> +	if (old_size < pos)
> +		pagecache_isize_extended(inode, old_size, pos);
> +	/*
> +	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> +	 * makes the holding time of page lock longer. Second, it forces lock
> +	 * ordering of page lock and transaction start for journaling
> +	 * filesystems.
> +	 */
> +	if (i_size_changed)
> +		mark_inode_dirty(inode);
> +	return copied;
> +}
> +
>  int block_write_end(struct file *file, struct address_space *mapping,
>  			loff_t pos, unsigned len, unsigned copied,
>  			struct page *page, void *fsdata)
> @@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  			loff_t pos, unsigned len, unsigned copied,
>  			struct page *page, void *fsdata)
>  {
> -	struct inode *inode = mapping->host;
> -	loff_t old_size = inode->i_size;
> -	int i_size_changed = 0;
> -
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> -
> -	/*
> -	 * No need to use i_size_read() here, the i_size
> -	 * cannot change under us because we hold i_mutex.
> -	 *
> -	 * But it's important to update i_size while still holding page lock:
> -	 * page writeout could otherwise come in and zero beyond i_size.
> -	 */
> -	if (pos+copied > inode->i_size) {
> -		i_size_write(inode, pos+copied);
> -		i_size_changed = 1;
> -	}
> -
> -	unlock_page(page);
> -	put_page(page);
> -
> -	if (old_size < pos)
> -		pagecache_isize_extended(inode, old_size, pos);
> -	/*
> -	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> -	 * makes the holding time of page lock longer. Second, it forces lock
> -	 * ordering of page lock and transaction start for journaling
> -	 * filesystems.
> -	 */
> -	if (i_size_changed)
> -		mark_inode_dirty(inode);
> -
> -	return copied;
> +	return __generic_write_end(mapping->host, pos, copied, page);
>  }
>  EXPORT_SYMBOL(generic_write_end);
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index e08972db0303..b955232d3d49 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
>  extern void guard_bio_eod(int rw, struct bio *bio);
>  extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  		get_block_t *get_block, struct iomap *iomap);
> +int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +		struct page *page);
>  
>  /*
>   * char_dev.c
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/18] iomap: add initial support for writes without buffer heads
  2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
@ 2018-05-30 13:34   ` Brian Foster
  2018-05-30 16:33   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-30 13:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:57AM +0200, Christoph Hellwig wrote:
> For now just limited to blocksize == PAGE_SIZE, where we can simply read
> in the full page in write begin, and just set the whole page dirty after
> copying data into it.  This code is enabled by default and XFS will now
> be fed pages without buffer heads in ->writepage and ->writepages.
> 
> If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap the old
> path will still be used, this both helps the transition in XFS and
> prepares for the gfs2 migration to the iomap infrastructure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
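
As an aside, the "skip the read if the write covers the whole block" logic in
the quoted __iomap_write_begin can be modeled in user space as below.  The
function name write_needs_fill is made up, block and page sizes are passed in
explicitly and assumed to be powers of two, and "fill" covers both outcomes of
iomap_read_page_sync (reading from disk or zeroing).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the part of the page around the write must be read
 * or zeroed before data is copied in.  Mirrors the offset arithmetic of
 * the quoted patch; sizes must be powers of two.
 */
static bool write_needs_fill(uint64_t pos, unsigned len, unsigned block_size,
			     unsigned page_size, bool page_uptodate)
{
	uint64_t bs = block_size;
	uint64_t block_start = pos & ~(bs - 1);
	uint64_t block_end = (pos + len + bs - 1) & ~(bs - 1);
	unsigned poff = (unsigned)(block_start & (page_size - 1));
	uint64_t span = block_end - block_start;
	unsigned plen = span < page_size - poff ?
			(unsigned)span : page_size - poff;
	unsigned from = (unsigned)(pos & (page_size - 1));
	unsigned to = from + len;

	if (page_uptodate)
		return false;	/* page contents are already valid */
	if (from <= poff && to >= poff + plen)
		return false;	/* write overwrites the whole range */
	return true;
}
```

A full-page write therefore never touches the disk on the read side, while a
partial write into a page that is not uptodate does.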

>  fs/iomap.c            | 128 ++++++++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_iomap.c    |   6 +-
>  include/linux/iomap.h |   2 +
>  3 files changed, 123 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 5e5a266e3325..0c9d9be59184 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -316,6 +316,48 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
>  }
>  
> +static int
> +iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
> +		unsigned poff, unsigned plen, unsigned from, unsigned to,
> +		struct iomap *iomap)
> +{
> +	struct bio_vec bvec;
> +	struct bio bio;
> +
> +	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
> +		zero_user_segments(page, poff, from, to, poff + plen);
> +		return 0;
> +	}
> +
> +	bio_init(&bio, &bvec, 1);
> +	bio.bi_opf = REQ_OP_READ;
> +	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
> +	bio_set_dev(&bio, iomap->bdev);
> +	__bio_add_page(&bio, page, plen, poff);
> +	return submit_bio_wait(&bio);
> +}
> +
> +static int
> +__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> +		struct page *page, struct iomap *iomap)
> +{
> +	loff_t block_size = i_blocksize(inode);
> +	loff_t block_start = pos & ~(block_size - 1);
> +	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
> +	unsigned poff = block_start & (PAGE_SIZE - 1);
> +	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
> +	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +
> +	if (PageUptodate(page))
> +		return 0;
> +	if (from <= poff && to >= poff + plen)
> +		return 0;
> +	return iomap_read_page_sync(inode, block_start, page,
> +			poff, plen, from, to, iomap);
> +}
> +
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap)
> @@ -333,7 +375,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	if (!page)
>  		return -ENOMEM;
>  
> -	status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +	else
> +		status = __iomap_write_begin(inode, pos, len, page, iomap);
>  	if (unlikely(status)) {
>  		unlock_page(page);
>  		put_page(page);
> @@ -346,14 +391,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	return status;
>  }
>  
> +int
> +iomap_set_page_dirty(struct page *page)
> +{
> +	struct address_space *mapping = page_mapping(page);
> +	int newly_dirty;
> +
> +	if (unlikely(!mapping))
> +		return !TestSetPageDirty(page);
> +
> +	/*
> +	 * Lock out page->mem_cgroup migration to keep PageDirty
> +	 * synchronized with per-memcg dirty page counters.
> +	 */
> +	lock_page_memcg(page);
> +	newly_dirty = !TestSetPageDirty(page);
> +	if (newly_dirty)
> +		__set_page_dirty(page, mapping, 0);
> +	unlock_page_memcg(page);
> +
> +	if (newly_dirty)
> +		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> +	return newly_dirty;
> +}
> +EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
> +
> +static int
> +__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> +		unsigned copied, struct page *page, struct iomap *iomap)
> +{
> +	flush_dcache_page(page);
> +
> +	/*
> +	 * The blocks that were entirely written will now be uptodate, so we
> +	 * don't have to worry about a readpage reading them and overwriting a
> +	 * partial write.  However if we have encountered a short write and only
> +	 * partially written into a block, it will not be marked uptodate, so a
> +	 * readpage might come in and destroy our partial write.
> +	 *
> +	 * Do the simplest thing, and just treat any short write to a non
> +	 * uptodate page as a zero-length write, and force the caller to redo
> +	 * the whole thing.
> +	 */
> +	if (unlikely(copied < len && !PageUptodate(page))) {
> +		copied = 0;
> +	} else {
> +		SetPageUptodate(page);
> +		iomap_set_page_dirty(page);
> +	}
> +	return __generic_write_end(inode, pos, copied, page);
> +}
> +
>  static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page)
> +		unsigned copied, struct page *page, struct iomap *iomap)
>  {
>  	int ret;
>  
> -	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> -			copied, page, NULL);
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> +				copied, page, NULL);
> +	else
> +		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> +
>  	if (ret < len)
>  		iomap_write_failed(inode, pos, len);
>  	return ret;
> @@ -408,7 +508,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		flush_dcache_page(page);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page);
> +		status = iomap_write_end(inode, pos, bytes, copied, page,
> +				iomap);
>  		if (unlikely(status < 0))
>  			break;
>  		copied = status;
> @@ -502,7 +603,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		WARN_ON_ONCE(!PageUptodate(page));
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page);
> +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
>  				return -EIO;
> @@ -554,7 +655,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page);
> +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  }
>  
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -640,11 +741,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
>  	struct page *page = data;
>  	int ret;
>  
> -	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
> -	if (ret)
> -		return ret;
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> +		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
> +		if (ret)
> +			return ret;
> +		block_commit_write(page, 0, length);
> +	} else {
> +		WARN_ON_ONCE(!PageUptodate(page));
> +		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +	}
>  
> -	block_commit_write(page, 0, length);
>  	return length;
>  }
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c6ce6f9335b6..da6d1995e460 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
>  	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
>  	 * them out if the write happens to fail.
>  	 */
> -	iomap->flags = IOMAP_F_NEW;
> +	iomap->flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
>  done:
>  	if (isnullstartblock(got.br_startblock))
> @@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> +	iomap->flags |= IOMAP_F_BUFFER_HEAD;
> +
>  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
> @@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
>  	if (error)
>  		return error;
>  
> -	iomap->flags = IOMAP_F_NEW;
> +	iomap->flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
>  
>  out_finish:
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 7300d30ca495..4d3d9d0cd69f 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -30,6 +30,7 @@ struct vm_fault;
>   */
>  #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
>  #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
> +#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
>  
>  /*
>   * Flags that only need to be reported for IOMAP_REPORT requests:
> @@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>  int iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  		unsigned nr_pages, const struct iomap_ops *ops);
> +int iomap_set_page_dirty(struct page *page);
>  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
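
A side note on the xfs_iomap.c hunks quoted above: once
xfs_file_iomap_begin sets IOMAP_F_BUFFER_HEAD up front, a later plain
assignment of IOMAP_F_NEW would throw that bit away, which is presumably
why both assignments become "|=".  A small sketch of the difference,
using the flag values from the iomap.h hunk; struct sketch_iomap and the
three helper names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the include/linux/iomap.h hunk. */
#define IOMAP_F_NEW		0x01
#define IOMAP_F_DIRTY		0x02
#define IOMAP_F_BUFFER_HEAD	0x04

/* Hypothetical stand-in for the one struct iomap field used here. */
struct sketch_iomap {
	unsigned flags;
};

/* Old form: assignment drops any flag that was set earlier. */
static void mark_new_by_assign(struct sketch_iomap *iomap)
{
	iomap->flags = IOMAP_F_NEW;
}

/* New form: OR-ing in the bit leaves IOMAP_F_BUFFER_HEAD intact. */
static void mark_new_by_or(struct sketch_iomap *iomap)
{
	iomap->flags |= IOMAP_F_NEW;
}

/* Same test the patch uses to choose between the two write paths. */
static bool uses_buffer_heads(const struct sketch_iomap *iomap)
{
	return (iomap->flags & IOMAP_F_BUFFER_HEAD) != 0;
}
```

With the assignment form, a mapping that started out flagged for buffer
heads would be routed to the new path as soon as it is marked new.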

* Re: [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range
  2018-05-30  9:59 ` [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-30 13:35   ` Brian Foster
  2018-05-30 16:55     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Brian Foster @ 2018-05-30 13:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:58AM +0200, Christoph Hellwig wrote:
> Instead of using xfs_bmapi_read to find delalloc extents and then punch
> them out using xfs_bunmapi, opencode the loop to iterate over the extents
> and call xfs_bmap_del_extent_delay directly.  This both simplifies the
> code and reduces the number of extent tree lookups required.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_bmap_util.c | 84 ++++++++++++++----------------------------
>  1 file changed, 28 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 06badcbadeb4..f2b87873612d 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -695,12 +695,10 @@ xfs_getbmap(
>  }
>  
>  /*
> - * dead simple method of punching delalyed allocation blocks from a range in
> - * the inode. Walks a block at a time so will be slow, but is only executed in
> - * rare error cases so the overhead is not critical. This will always punch out
> - * both the start and end blocks, even if the ranges only partially overlap
> - * them, so it is up to the caller to ensure that partial blocks are not
> - * passed in.
> + * Dead simple method of punching delalyed allocation blocks from a range in
> + * the inode.  This will always punch out both the start and end blocks, even
> + * if the ranges only partially overlap them, so it is up to the caller to
> + * ensure that partial blocks are not passed in.
>   */
>  int
>  xfs_bmap_punch_delalloc_range(
> @@ -708,63 +706,37 @@ xfs_bmap_punch_delalloc_range(
>  	xfs_fileoff_t		start_fsb,
>  	xfs_fileoff_t		length)
>  {
...
> +	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
> +		return 0;
>  
> -		/*
> -		 * Note: while we initialise the firstblock/dfops pair, they
> -		 * should never be used because blocks should never be
> -		 * allocated or freed for a delalloc extent and hence we need
> -		 * don't cancel or finish them after the xfs_bunmapi() call.
> -		 */
> -		xfs_defer_init(&dfops, &firstblock);
> -		error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
> -					&dfops, &done);
> -		if (error)
> -			break;
> +	while (got.br_startoff + got.br_blockcount > start_fsb) {
> +		del = got;
> +		xfs_trim_extent(&del, start_fsb, length);
>  
> -		ASSERT(!xfs_defer_has_unfinished_work(&dfops));
> -next_block:
> -		start_fsb++;
> -		remaining--;
> -	} while(remaining > 0);
> +		if (del.br_blockcount && isnullstartblock(del.br_startblock)) {

I think there's subtle behavior here that warrants a comment (one that
also describes the somewhat funky logic). E.g., something like:

/*
 * got might point to the extent after del in some cases. The next
 * iteration will detect this and step back to the previous extent.
 */

Alternatively, I find separating the if/else a bit more readable (see
the appended hunk). But otherwise looks fine:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +			error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK,
> +					&icur, &got, &del);
> +			if (error || !xfs_iext_get_extent(ifp, &icur, &got))
> +				break;
> +		} else {
> +			if (!xfs_iext_prev_extent(ifp, &icur, &got))
> +				break;
> +		}
> +	}
>  
>  	return error;
>  }
> -- 
> 2.17.0

--- 8< ---

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f2b87873612d..0070b877ed94 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -727,15 +727,22 @@ xfs_bmap_punch_delalloc_range(
 		del = got;
 		xfs_trim_extent(&del, start_fsb, length);
 
-		if (del.br_blockcount && isnullstartblock(del.br_startblock)) {
-			error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK,
-					&icur, &got, &del);
-			if (error || !xfs_iext_get_extent(ifp, &icur, &got))
-				break;
-		} else {
+		/*
+		 * A delete can push the cursor forward. Step back to the
+		 * previous extent on non-delalloc or extents outside the
+		 * target range.
+		 */
+		if (!del.br_blockcount ||
+		    !isnullstartblock(del.br_startblock)) {
 			if (!xfs_iext_prev_extent(ifp, &icur, &got))
 				break;
+			continue;
 		}
+
+		error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
+						  &got, &del);
+		if (error || !xfs_iext_get_extent(ifp, &icur, &got))
+			break;
 	}
 
 	return error;
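
To make the "cursor may point past the deleted range, next iteration
steps back" behavior concrete, here is a self-contained userspace model
of the backwards walk.  It is a sketch, not the XFS code: the extent
list is a plain sorted array, and struct ext, struct ext_list, trim(),
del_extent() and punch_delalloc() are stand-ins whose cursor semantics
are chosen for this example rather than taken from the xfs_iext
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_EXTS 16	/* assumption: small fixed table for the sketch */

struct ext {
	unsigned long start;	/* first file block */
	unsigned long count;	/* number of blocks */
	bool delalloc;		/* true for a delayed allocation */
};

struct ext_list {
	struct ext e[MAX_EXTS];	/* sorted by start, non-overlapping */
	int n;
};

/* Clamp *e to [bno, bno + len); count becomes 0 if there is no overlap. */
static void trim(struct ext *e, unsigned long bno, unsigned long len)
{
	unsigned long end = bno + len;

	if (e->start >= end || e->start + e->count <= bno) {
		e->count = 0;
		return;
	}
	if (e->start < bno) {
		e->count -= bno - e->start;
		e->start = bno;
	}
	if (e->start + e->count > end)
		e->count = end - e->start;
}

/*
 * Remove del (a sub-range of l->e[i]) and return the new cursor.  The
 * cursor may end up on the piece *after* the hole; the caller's next
 * iteration sees that it lies outside the range and steps back.
 */
static int del_extent(struct ext_list *l, int i, const struct ext *del)
{
	struct ext *got = &l->e[i];
	unsigned long got_end = got->start + got->count;
	unsigned long del_end = del->start + del->count;

	if (del->start == got->start && del_end == got_end) {
		/* whole extent: drop the entry, cursor moves to previous */
		memmove(got, got + 1, (l->n - i - 1) * sizeof(*got));
		l->n--;
		return i - 1;
	}
	if (del->start == got->start) {		/* front removed */
		got->start = del_end;
		got->count = got_end - del_end;
		return i;
	}
	if (del_end == got_end) {		/* tail removed */
		got->count = del->start - got->start;
		return i;
	}
	/* middle removed: split, cursor lands on the right-hand piece */
	assert(l->n < MAX_EXTS);
	memmove(got + 2, got + 1, (l->n - i - 1) * sizeof(*got));
	got[1] = *got;
	got[1].start = del_end;
	got[1].count = got_end - del_end;
	got->count = del->start - got->start;
	l->n++;
	return i + 1;
}

/* Punch delalloc blocks out of [start, start + len), walking backwards. */
static void punch_delalloc(struct ext_list *l, unsigned long start,
		unsigned long len)
{
	int i = l->n - 1;

	/* begin at the last extent that starts before the range end */
	while (i >= 0 && l->e[i].start >= start + len)
		i--;

	while (i >= 0 && l->e[i].start + l->e[i].count > start) {
		struct ext del = l->e[i];

		trim(&del, start, len);
		if (!del.count || !del.delalloc) {
			i--;		/* nothing to punch here, step back */
			continue;
		}
		i = del_extent(l, i, &del);
	}
}
```

The loop has the same shape as the restructured hunk above: the "skip and
step back" case is handled first with a continue, and the delete is the
fall-through case.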

* Re: [PATCH 04/18] xfs: simplify xfs_aops_discard_page
  2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
@ 2018-05-30 13:35   ` Brian Foster
  2018-05-30 16:40   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-30 13:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:59AM +0200, Christoph Hellwig wrote:
> Instead of looking at the buffer heads to see if a block is delalloc just
> call xfs_bmap_punch_delalloc_range on the whole page - this will leave
> any non-delalloc block intact and handle the iteration for us.  As a side
> effect one more place stops caring about buffer heads and we can remove the
> xfs_check_page_type function entirely.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
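
As a quick illustration of what the single xfs_bmap_punch_delalloc_range
call in the quoted patch is handed, the sketch below converts a page's
byte offset into a starting file system block and a per-page block
count.  It assumes 4k pages and that the truncating byte-to-block
conversion is a right shift by the block size log2; struct fsb_range and
page_to_fsb_range() are hypothetical names, not XFS ones.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4k pages */

/* Hypothetical pair of values passed to the punch helper. */
struct fsb_range {
	uint64_t start_fsb;	/* first file system block of the page */
	uint64_t nr_blocks;	/* file system blocks per page */
};

static struct fsb_range
page_to_fsb_range(uint64_t page_offset, unsigned blocklog)
{
	struct fsb_range r;

	/* page offsets are page aligned, and blocks never exceed a page */
	assert(page_offset % SKETCH_PAGE_SIZE == 0);
	assert((1u << blocklog) <= SKETCH_PAGE_SIZE);

	r.start_fsb = page_offset >> blocklog;		/* truncating */
	r.nr_blocks = SKETCH_PAGE_SIZE >> blocklog;	/* PAGE_SIZE / blocksize */
	return r;
}
```

For a 1k block size the third page of a file maps to blocks 8 through 11,
so one call replaces four per-buffer-head iterations of the old loop.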

>  fs/xfs/xfs_aops.c | 85 +++++------------------------------------------
>  1 file changed, 9 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index c631c457b444..f2333e351e07 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -711,49 +711,6 @@ xfs_map_at_offset(
>  	clear_buffer_unwritten(bh);
>  }
>  
> -/*
> - * Test if a given page contains at least one buffer of a given @type.
> - * If @check_all_buffers is true, then we walk all the buffers in the page to
> - * try to find one of the type passed in. If it is not set, then the caller only
> - * needs to check the first buffer on the page for a match.
> - */
> -STATIC bool
> -xfs_check_page_type(
> -	struct page		*page,
> -	unsigned int		type,
> -	bool			check_all_buffers)
> -{
> -	struct buffer_head	*bh;
> -	struct buffer_head	*head;
> -
> -	if (PageWriteback(page))
> -		return false;
> -	if (!page->mapping)
> -		return false;
> -	if (!page_has_buffers(page))
> -		return false;
> -
> -	bh = head = page_buffers(page);
> -	do {
> -		if (buffer_unwritten(bh)) {
> -			if (type == XFS_IO_UNWRITTEN)
> -				return true;
> -		} else if (buffer_delay(bh)) {
> -			if (type == XFS_IO_DELALLOC)
> -				return true;
> -		} else if (buffer_dirty(bh) && buffer_mapped(bh)) {
> -			if (type == XFS_IO_OVERWRITE)
> -				return true;
> -		}
> -
> -		/* If we are only checking the first buffer, we are done now. */
> -		if (!check_all_buffers)
> -			break;
> -	} while ((bh = bh->b_this_page) != head);
> -
> -	return false;
> -}
> -
>  STATIC void
>  xfs_vm_invalidatepage(
>  	struct page		*page,
> @@ -785,9 +742,6 @@ xfs_vm_invalidatepage(
>   * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
>   * truncation without a transaction as there is no space left for block
>   * reservation (typically why we see a ENOSPC in writeback).
> - *
> - * This is not a performance critical path, so for now just do the punching a
> - * buffer head at a time.
>   */
>  STATIC void
>  xfs_aops_discard_page(
> @@ -795,47 +749,26 @@ xfs_aops_discard_page(
>  {
>  	struct inode		*inode = page->mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
> -	struct buffer_head	*bh, *head;
> +	struct xfs_mount	*mp = ip->i_mount;
>  	loff_t			offset = page_offset(page);
> +	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, offset);
> +	int			error;
>  
> -	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
> -		goto out_invalidate;
> -
> -	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> +	if (XFS_FORCED_SHUTDOWN(mp))
>  		goto out_invalidate;
>  
> -	xfs_alert(ip->i_mount,
> +	xfs_alert(mp,
>  		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
>  			page, ip->i_ino, offset);
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	bh = head = page_buffers(page);
> -	do {
> -		int		error;
> -		xfs_fileoff_t	start_fsb;
> -
> -		if (!buffer_delay(bh))
> -			goto next_buffer;
> -
> -		start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> -		error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
> -		if (error) {
> -			/* something screwed, just bail */
> -			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> -				xfs_alert(ip->i_mount,
> -			"page discard unable to remove delalloc mapping.");
> -			}
> -			break;
> -		}
> -next_buffer:
> -		offset += i_blocksize(inode);
> -
> -	} while ((bh = bh->b_this_page) != head);
> -
> +	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> +			PAGE_SIZE / i_blocksize(inode));
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	if (error && !XFS_FORCED_SHUTDOWN(mp))
> +		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
>  	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
> -	return;
>  }
>  
>  static int
> -- 
> 2.17.0
> 

* Re: [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-05-30 10:00 ` [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2018-05-30 13:35   ` Brian Foster
  2018-05-30 16:58     ` Darrick J. Wong
  0 siblings, 1 reply; 61+ messages in thread
From: Brian Foster @ 2018-05-30 13:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:00PM +0200, Christoph Hellwig wrote:
> Both callers want the same locking, so do it only once.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c      | 2 --
>  fs/xfs/xfs_bmap_util.c | 7 ++++---
>  fs/xfs/xfs_iomap.c     | 3 ---
>  3 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index f2333e351e07..5dd09e83c81c 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -761,10 +761,8 @@ xfs_aops_discard_page(
>  		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
>  			page, ip->i_ino, offset);
>  
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
>  			PAGE_SIZE / i_blocksize(inode));
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (error && !XFS_FORCED_SHUTDOWN(mp))
>  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index f2b87873612d..86a7ee425bfc 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -712,12 +712,11 @@ xfs_bmap_punch_delalloc_range(
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
>  
> -	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> -
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
>  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
>  		if (error)
> -			return error;
> +			goto out_unlock;
>  	}
>  
>  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))

There's a return 0 just below here that needs the exit label treatment.
Otherwise looks Ok.

Brian
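
To spell out the point about the early return: once the lock is taken
inside the function, every exit has to pass through the unlock label.  A
minimal userspace sketch of that pattern follows; the counting lock and
the names struct sketch_lock, lock_acquire(), lock_release(), enum
punch_path and punch_locked() are invented for the example and only
mimic the control flow, not the XFS code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock that just counts, so a leaked lock is observable. */
struct sketch_lock {
	int held;
};

static void lock_acquire(struct sketch_lock *l)
{
	assert(l->held == 0);
	l->held++;
}

static void lock_release(struct sketch_lock *l)
{
	assert(l->held == 1);
	l->held--;
}

/* The three ways through the function that matter for the review. */
enum punch_path {
	PATH_READ_ERROR,	/* reading in the extent list failed */
	PATH_NO_EXTENT,		/* lookup found nothing before the range end */
	PATH_PUNCHED,		/* normal case, the loop ran */
};

static int punch_locked(struct sketch_lock *lock, enum punch_path path)
{
	int error = 0;

	lock_acquire(lock);
	if (path == PATH_READ_ERROR) {
		error = -EIO;
		goto out_unlock;
	}
	if (path == PATH_NO_EXTENT)
		goto out_unlock;	/* a bare "return 0" here would leak the lock */

	/* ... the punching loop would run here ... */

out_unlock:
	lock_release(lock);
	return error;
}
```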

> @@ -738,6 +737,8 @@ xfs_bmap_punch_delalloc_range(
>  		}
>  	}
>  
> +out_unlock:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index da6d1995e460..f949f0dd7382 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1203,11 +1203,8 @@ xfs_file_iomap_end_delalloc(
>  		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
>  					 XFS_FSB_TO_B(mp, end_fsb) - 1);
>  
> -		xfs_ilock(ip, XFS_ILOCK_EXCL);
>  		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
>  					       end_fsb - start_fsb);
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -
>  		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
>  			xfs_alert(mp, "%s: unable to clean up ino %lld",
>  				__func__, ip->i_ino);
> -- 
> 2.17.0
> 

* Re: [PATCH 01/18] fs: factor out a __generic_write_end helper
  2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
  2018-05-30 13:34   ` Brian Foster
@ 2018-05-30 16:26   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:56AM +0200, Christoph Hellwig wrote:
> Bits of the buffer.c based write_end implementations that don't know
> about buffer_heads and can be reused by other implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
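
For reference, the ordering that the factored-out helper preserves can
be modelled in userspace as below.  This is only a sketch of the control
flow in the quoted __generic_write_end: struct sketch_inode and
sketch_write_end() are hypothetical, the page lock is represented by
comments, and the two booleans stand in for the calls to
pagecache_isize_extended and mark_inode_dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode model with just the state the helper touches. */
struct sketch_inode {
	int64_t i_size;
	bool dirty;		/* set where mark_inode_dirty would run */
	bool isize_extended;	/* set where pagecache_isize_extended would run */
};

static unsigned
sketch_write_end(struct sketch_inode *inode, int64_t pos, unsigned copied)
{
	int64_t old_size = inode->i_size;
	bool i_size_changed = false;

	/* 1. page still locked: grow i_size if the write went past EOF */
	if (pos + copied > inode->i_size) {
		inode->i_size = pos + copied;
		i_size_changed = true;
	}

	/* 2. the page would be unlocked and released here */

	/* 3. the write started beyond the old EOF */
	if (old_size < pos)
		inode->isize_extended = true;

	/* 4. dirty the inode only after the page lock has been dropped */
	if (i_size_changed)
		inode->dirty = true;
	return copied;
}
```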

> ---
>  fs/buffer.c   | 67 +++++++++++++++++++++++++++------------------------
>  fs/internal.h |  2 ++
>  2 files changed, 37 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 249b83fafe48..bd964b2ad99a 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
>  }
>  EXPORT_SYMBOL(block_write_begin);
>  
> +int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +		struct page *page)
> +{
> +	loff_t old_size = inode->i_size;
> +	bool i_size_changed = false;
> +
> +	/*
> +	 * No need to use i_size_read() here, the i_size cannot change under us
> +	 * because we hold i_rwsem.
> +	 *
> +	 * But it's important to update i_size while still holding page lock:
> +	 * page writeout could otherwise come in and zero beyond i_size.
> +	 */
> +	if (pos + copied > inode->i_size) {
> +		i_size_write(inode, pos + copied);
> +		i_size_changed = true;
> +	}
> +
> +	unlock_page(page);
> +	put_page(page);
> +
> +	if (old_size < pos)
> +		pagecache_isize_extended(inode, old_size, pos);
> +	/*
> +	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> +	 * makes the holding time of page lock longer. Second, it forces lock
> +	 * ordering of page lock and transaction start for journaling
> +	 * filesystems.
> +	 */
> +	if (i_size_changed)
> +		mark_inode_dirty(inode);
> +	return copied;
> +}
> +
>  int block_write_end(struct file *file, struct address_space *mapping,
>  			loff_t pos, unsigned len, unsigned copied,
>  			struct page *page, void *fsdata)
> @@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
>  			loff_t pos, unsigned len, unsigned copied,
>  			struct page *page, void *fsdata)
>  {
> -	struct inode *inode = mapping->host;
> -	loff_t old_size = inode->i_size;
> -	int i_size_changed = 0;
> -
>  	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> -
> -	/*
> -	 * No need to use i_size_read() here, the i_size
> -	 * cannot change under us because we hold i_mutex.
> -	 *
> -	 * But it's important to update i_size while still holding page lock:
> -	 * page writeout could otherwise come in and zero beyond i_size.
> -	 */
> -	if (pos+copied > inode->i_size) {
> -		i_size_write(inode, pos+copied);
> -		i_size_changed = 1;
> -	}
> -
> -	unlock_page(page);
> -	put_page(page);
> -
> -	if (old_size < pos)
> -		pagecache_isize_extended(inode, old_size, pos);
> -	/*
> -	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> -	 * makes the holding time of page lock longer. Second, it forces lock
> -	 * ordering of page lock and transaction start for journaling
> -	 * filesystems.
> -	 */
> -	if (i_size_changed)
> -		mark_inode_dirty(inode);
> -
> -	return copied;
> +	return __generic_write_end(mapping->host, pos, copied, page);
>  }
>  EXPORT_SYMBOL(generic_write_end);
>  
> diff --git a/fs/internal.h b/fs/internal.h
> index e08972db0303..b955232d3d49 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
>  extern void guard_bio_eod(int rw, struct bio *bio);
>  extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  		get_block_t *get_block, struct iomap *iomap);
> +int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
> +		struct page *page);
>  
>  /*
>   * char_dev.c
> -- 
> 2.17.0
> 

* Re: [PATCH 02/18] iomap: add initial support for writes without buffer heads
  2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
  2018-05-30 13:34   ` Brian Foster
@ 2018-05-30 16:33   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:57AM +0200, Christoph Hellwig wrote:
> For now just limited to blocksize == PAGE_SIZE, where we can simply read
> in the full page in write begin, and just set the whole page dirty after
> copying data into it.  This code is enabled by default and XFS will now
> be fed pages without buffer heads in ->writepage and ->writepages.
> 
> If a file system sets the IOMAP_F_BUFFER_HEAD flag on the iomap, the old
> path will still be used; this both helps the transition in XFS and
> prepares for the gfs2 migration to the iomap infrastructure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
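
The short-write rule in the quoted __iomap_write_end comment boils down
to a small decision, sketched here in userspace.  struct sketch_page and
sketch_iomap_write_end() are made-up names; the two booleans stand in
for the PageUptodate and dirty state, and the i_size handling done by
__generic_write_end is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model with only the two state bits involved. */
struct sketch_page {
	bool uptodate;
	bool dirty;
};

/* Returns how many bytes the caller may treat as written. */
static unsigned
sketch_iomap_write_end(struct sketch_page *page, unsigned len,
		unsigned copied)
{
	/*
	 * A short copy into a page that is not uptodate cannot be kept:
	 * a later read could overwrite it.  Report zero bytes so the
	 * caller redoes the whole write.
	 */
	if (copied < len && !page->uptodate)
		return 0;

	/* Otherwise the whole page is valid and needs writeback. */
	page->uptodate = true;
	page->dirty = true;
	return copied;
}
```

A short write into a page that was already uptodate is still accepted,
since the bytes not copied keep their previous valid contents.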

> ---
>  fs/iomap.c            | 128 ++++++++++++++++++++++++++++++++++++++----
>  fs/xfs/xfs_iomap.c    |   6 +-
>  include/linux/iomap.h |   2 +
>  3 files changed, 123 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 5e5a266e3325..0c9d9be59184 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -316,6 +316,48 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  		truncate_pagecache_range(inode, max(pos, i_size), pos + len);
>  }
>  
> +static int
> +iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
> +		unsigned poff, unsigned plen, unsigned from, unsigned to,
> +		struct iomap *iomap)
> +{
> +	struct bio_vec bvec;
> +	struct bio bio;
> +
> +	if (iomap->type != IOMAP_MAPPED || block_start >= i_size_read(inode)) {
> +		zero_user_segments(page, poff, from, to, poff + plen);
> +		return 0;
> +	}
> +
> +	bio_init(&bio, &bvec, 1);
> +	bio.bi_opf = REQ_OP_READ;
> +	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
> +	bio_set_dev(&bio, iomap->bdev);
> +	__bio_add_page(&bio, page, plen, poff);
> +	return submit_bio_wait(&bio);
> +}
> +
> +static int
> +__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> +		struct page *page, struct iomap *iomap)
> +{
> +	loff_t block_size = i_blocksize(inode);
> +	loff_t block_start = pos & ~(block_size - 1);
> +	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
> +	unsigned poff = block_start & (PAGE_SIZE - 1);
> +	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - block_start);
> +	unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
> +
> +	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +
> +	if (PageUptodate(page))
> +		return 0;
> +	if (from <= poff && to >= poff + plen)
> +		return 0;
> +	return iomap_read_page_sync(inode, block_start, page,
> +			poff, plen, from, to, iomap);
> +}
> +
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap)
> @@ -333,7 +375,10 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	if (!page)
>  		return -ENOMEM;
>  
> -	status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +	else
> +		status = __iomap_write_begin(inode, pos, len, page, iomap);
>  	if (unlikely(status)) {
>  		unlock_page(page);
>  		put_page(page);
> @@ -346,14 +391,69 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	return status;
>  }
>  
> +int
> +iomap_set_page_dirty(struct page *page)
> +{
> +	struct address_space *mapping = page_mapping(page);
> +	int newly_dirty;
> +
> +	if (unlikely(!mapping))
> +		return !TestSetPageDirty(page);
> +
> +	/*
> +	 * Lock out page->mem_cgroup migration to keep PageDirty
> +	 * synchronized with per-memcg dirty page counters.
> +	 */
> +	lock_page_memcg(page);
> +	newly_dirty = !TestSetPageDirty(page);
> +	if (newly_dirty)
> +		__set_page_dirty(page, mapping, 0);
> +	unlock_page_memcg(page);
> +
> +	if (newly_dirty)
> +		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> +	return newly_dirty;
> +}
> +EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
> +
> +static int
> +__iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> +		unsigned copied, struct page *page, struct iomap *iomap)
> +{
> +	flush_dcache_page(page);
> +
> +	/*
> +	 * The blocks that were entirely written will now be uptodate, so we
> +	 * don't have to worry about a readpage reading them and overwriting a
> +	 * partial write.  However if we have encountered a short write and only
> +	 * partially written into a block, it will not be marked uptodate, so a
> +	 * readpage might come in and destroy our partial write.
> +	 *
> +	 * Do the simplest thing, and just treat any short write to a non
> +	 * uptodate page as a zero-length write, and force the caller to redo
> +	 * the whole thing.
> +	 */
> +	if (unlikely(copied < len && !PageUptodate(page))) {
> +		copied = 0;
> +	} else {
> +		SetPageUptodate(page);
> +		iomap_set_page_dirty(page);
> +	}
> +	return __generic_write_end(inode, pos, copied, page);
> +}
> +
>  static int
>  iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page)
> +		unsigned copied, struct page *page, struct iomap *iomap)
>  {
>  	int ret;
>  
> -	ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> -			copied, page, NULL);
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +		ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> +				copied, page, NULL);
> +	else
> +		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> +
>  	if (ret < len)
>  		iomap_write_failed(inode, pos, len);
>  	return ret;
> @@ -408,7 +508,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		flush_dcache_page(page);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page);
> +		status = iomap_write_end(inode, pos, bytes, copied, page,
> +				iomap);
>  		if (unlikely(status < 0))
>  			break;
>  		copied = status;
> @@ -502,7 +603,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		WARN_ON_ONCE(!PageUptodate(page));
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page);
> +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
>  				return -EIO;
> @@ -554,7 +655,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page);
> +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  }
>  
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -640,11 +741,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
>  	struct page *page = data;
>  	int ret;
>  
> -	ret = __block_write_begin_int(page, pos, length, NULL, iomap);
> -	if (ret)
> -		return ret;
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> +		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
> +		if (ret)
> +			return ret;
> +		block_commit_write(page, 0, length);
> +	} else {
> +		WARN_ON_ONCE(!PageUptodate(page));
> +		WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
> +	}
>  
> -	block_commit_write(page, 0, length);
>  	return length;
>  }
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c6ce6f9335b6..da6d1995e460 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -638,7 +638,7 @@ xfs_file_iomap_begin_delay(
>  	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
>  	 * them out if the write happens to fail.
>  	 */
> -	iomap->flags = IOMAP_F_NEW;
> +	iomap->flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
>  done:
>  	if (isnullstartblock(got.br_startblock))
> @@ -1031,6 +1031,8 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> +	iomap->flags |= IOMAP_F_BUFFER_HEAD;
> +
>  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
> @@ -1131,7 +1133,7 @@ xfs_file_iomap_begin(
>  	if (error)
>  		return error;
>  
> -	iomap->flags = IOMAP_F_NEW;
> +	iomap->flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
>  
>  out_finish:
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 7300d30ca495..4d3d9d0cd69f 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -30,6 +30,7 @@ struct vm_fault;
>   */
>  #define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
>  #define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
> +#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
>  
>  /*
>   * Flags that only need to be reported for IOMAP_REPORT requests:
> @@ -92,6 +93,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>  int iomap_readpages(struct address_space *mapping, struct list_head *pages,
>  		unsigned nr_pages, const struct iomap_ops *ops);
> +int iomap_set_page_dirty(struct page *page);
>  int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread
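
The short-copy rule in the quoted __iomap_write_end() hunk can be
illustrated in isolation.  This is only a minimal userspace sketch, not
the kernel code: sketch_write_end_copied() is a hypothetical name, and
the page state is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the decision made in the quoted hunk:
 * returns the number of bytes the caller may treat as written.
 */
static unsigned
sketch_write_end_copied(unsigned len, unsigned copied, bool page_uptodate)
{
	assert(copied <= len);

	/* Short copy into a page that was not uptodate: report nothing. */
	if (copied < len && !page_uptodate)
		return 0;

	/* Otherwise the page could be marked uptodate and dirty. */
	return copied;
}
```

A short copy into a page that is already uptodate keeps its byte count;
only the non-uptodate case forces the caller to redo the whole write.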

* Re: [PATCH 04/18] xfs: simplify xfs_aops_discard_page
  2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
  2018-05-30 13:35   ` Brian Foster
@ 2018-05-30 16:40   ` Darrick J. Wong
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:59:59AM +0200, Christoph Hellwig wrote:
> Instead of looking at the buffer heads to see if a block is delalloc just
> call xfs_bmap_punch_delalloc_range on the whole page - this will leave
> any non-delalloc block intact and handle the iteration for us.  As a side
> effect one more place stops caring about buffer heads and we can remove the
> xfs_check_page_type function entirely.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 85 +++++------------------------------------------
>  1 file changed, 9 insertions(+), 76 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index c631c457b444..f2333e351e07 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -711,49 +711,6 @@ xfs_map_at_offset(
>  	clear_buffer_unwritten(bh);
>  }
>  
> -/*
> - * Test if a given page contains at least one buffer of a given @type.
> - * If @check_all_buffers is true, then we walk all the buffers in the page to
> - * try to find one of the type passed in. If it is not set, then the caller only
> - * needs to check the first buffer on the page for a match.
> - */
> -STATIC bool
> -xfs_check_page_type(
> -	struct page		*page,
> -	unsigned int		type,
> -	bool			check_all_buffers)
> -{
> -	struct buffer_head	*bh;
> -	struct buffer_head	*head;
> -
> -	if (PageWriteback(page))
> -		return false;
> -	if (!page->mapping)
> -		return false;
> -	if (!page_has_buffers(page))
> -		return false;
> -
> -	bh = head = page_buffers(page);
> -	do {
> -		if (buffer_unwritten(bh)) {
> -			if (type == XFS_IO_UNWRITTEN)
> -				return true;
> -		} else if (buffer_delay(bh)) {
> -			if (type == XFS_IO_DELALLOC)
> -				return true;
> -		} else if (buffer_dirty(bh) && buffer_mapped(bh)) {
> -			if (type == XFS_IO_OVERWRITE)
> -				return true;
> -		}
> -
> -		/* If we are only checking the first buffer, we are done now. */
> -		if (!check_all_buffers)
> -			break;
> -	} while ((bh = bh->b_this_page) != head);
> -
> -	return false;
> -}
> -
>  STATIC void
>  xfs_vm_invalidatepage(
>  	struct page		*page,
> @@ -785,9 +742,6 @@ xfs_vm_invalidatepage(
>   * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
>   * truncation without a transaction as there is no space left for block
>   * reservation (typically why we see a ENOSPC in writeback).
> - *
> - * This is not a performance critical path, so for now just do the punching a
> - * buffer head at a time.
>   */
>  STATIC void
>  xfs_aops_discard_page(
> @@ -795,47 +749,26 @@ xfs_aops_discard_page(
>  {
>  	struct inode		*inode = page->mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
> -	struct buffer_head	*bh, *head;
> +	struct xfs_mount	*mp = ip->i_mount;
>  	loff_t			offset = page_offset(page);
> +	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, offset);
> +	int			error;
>  
> -	if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
> -		goto out_invalidate;
> -
> -	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> +	if (XFS_FORCED_SHUTDOWN(mp))
>  		goto out_invalidate;
>  
> -	xfs_alert(ip->i_mount,
> +	xfs_alert(mp,
>  		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
>  			page, ip->i_ino, offset);
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	bh = head = page_buffers(page);
> -	do {
> -		int		error;
> -		xfs_fileoff_t	start_fsb;
> -
> -		if (!buffer_delay(bh))
> -			goto next_buffer;
> -
> -		start_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> -		error = xfs_bmap_punch_delalloc_range(ip, start_fsb, 1);
> -		if (error) {
> -			/* something screwed, just bail */
> -			if (!XFS_FORCED_SHUTDOWN(ip->i_mount)) {
> -				xfs_alert(ip->i_mount,
> -			"page discard unable to remove delalloc mapping.");
> -			}
> -			break;
> -		}
> -next_buffer:
> -		offset += i_blocksize(inode);
> -
> -	} while ((bh = bh->b_this_page) != head);
> -
> +	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> +			PAGE_SIZE / i_blocksize(inode));
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	if (error && !XFS_FORCED_SHUTDOWN(mp))
> +		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
>  out_invalidate:
>  	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
> -	return;
>  }
>  
>  static int
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread
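
The range handed to xfs_bmap_punch_delalloc_range in the patch above is
"the whole page, expressed in filesystem blocks".  A minimal sketch of
that arithmetic follows, assuming a 4096-byte page and a power-of-two
block size given as its log2; SKETCH_PAGE_SIZE, struct sketch_fsb_range
and sketch_page_to_fsb_range() are hypothetical names, not kernel ones.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

struct sketch_fsb_range {
	uint64_t start_fsb;	/* first filesystem block of the page */
	uint64_t nr_blocks;	/* number of blocks covering the page */
};

/* Convert a page-aligned byte offset into a block range. */
static struct sketch_fsb_range
sketch_page_to_fsb_range(uint64_t page_offset, unsigned blocklog)
{
	struct sketch_fsb_range r;
	uint64_t blocksize = 1ull << blocklog;

	assert(blocksize <= SKETCH_PAGE_SIZE);
	assert(page_offset % SKETCH_PAGE_SIZE == 0);

	/* Truncating byte-to-block conversion, then blocks per page. */
	r.start_fsb = page_offset >> blocklog;
	r.nr_blocks = SKETCH_PAGE_SIZE / blocksize;
	return r;
}
```

With 1k blocks (blocklog 10), the page at byte offset 8192 maps to four
blocks starting at block 8.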

* Re: [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range
  2018-05-30 13:35   ` Brian Foster
@ 2018-05-30 16:55     ` Darrick J. Wong
  0 siblings, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 09:35:39AM -0400, Brian Foster wrote:
> On Wed, May 30, 2018 at 11:59:58AM +0200, Christoph Hellwig wrote:
> > Instead of using xfs_bmapi_read to find delalloc extents and then punch
> > them out using xfs_bunmapi, opencode the loop to iterate over the extents
> > and call xfs_bmap_del_extent_delay directly.  This both simplifies the
> > code and reduces the number of extent tree lookups required.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_bmap_util.c | 84 ++++++++++++++----------------------------
> >  1 file changed, 28 insertions(+), 56 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 06badcbadeb4..f2b87873612d 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -695,12 +695,10 @@ xfs_getbmap(
> >  }
> >  
> >  /*
> > - * dead simple method of punching delalyed allocation blocks from a range in
> > - * the inode. Walks a block at a time so will be slow, but is only executed in
> > - * rare error cases so the overhead is not critical. This will always punch out
> > - * both the start and end blocks, even if the ranges only partially overlap
> > - * them, so it is up to the caller to ensure that partial blocks are not
> > - * passed in.
> > + * Dead simple method of punching delalyed allocation blocks from a range in
> > + * the inode.  This will always punch out both the start and end blocks, even
> > + * if the ranges only partially overlap them, so it is up to the caller to
> > + * ensure that partial blocks are not passed in.
> >   */
> >  int
> >  xfs_bmap_punch_delalloc_range(
> > @@ -708,63 +706,37 @@ xfs_bmap_punch_delalloc_range(
> >  	xfs_fileoff_t		start_fsb,
> >  	xfs_fileoff_t		length)
> >  {
> ...
> > +	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
> > +		return 0;
> >  
> > -		/*
> > -		 * Note: while we initialise the firstblock/dfops pair, they
> > -		 * should never be used because blocks should never be
> > -		 * allocated or freed for a delalloc extent and hence we need
> > -		 * don't cancel or finish them after the xfs_bunmapi() call.
> > -		 */
> > -		xfs_defer_init(&dfops, &firstblock);
> > -		error = xfs_bunmapi(NULL, ip, start_fsb, 1, 0, 1, &firstblock,
> > -					&dfops, &done);
> > -		if (error)
> > -			break;
> > +	while (got.br_startoff + got.br_blockcount > start_fsb) {
> > +		del = got;
> > +		xfs_trim_extent(&del, start_fsb, length);
> >  
> > -		ASSERT(!xfs_defer_has_unfinished_work(&dfops));
> > -next_block:
> > -		start_fsb++;
> > -		remaining--;
> > -	} while(remaining > 0);
> > +		if (del.br_blockcount && isnullstartblock(del.br_startblock)) {
> 
> I think there's subtle behavior here that warrants a comment (and
> describes the somewhat funky logic). E.g., something like:
> 
> /*
>  * got might point to the extent after del in some cases. The next
>  * iteration will detect this and step back to the previous extent.
>  */
> 
> Alternatively, I find separating the if/else a bit more readable (see
> the appended hunk). But otherwise looks fine:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>

/me agrees (and adds the hunk for testing),

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> > +			error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK,
> > +					&icur, &got, &del);
> > +			if (error || !xfs_iext_get_extent(ifp, &icur, &got))
> > +				break;
> > +		} else {
> > +			if (!xfs_iext_prev_extent(ifp, &icur, &got))
> > +				break;
> > +		}
> > +	}
> >  
> >  	return error;
> >  }
> > -- 
> > 2.17.0
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index f2b87873612d..0070b877ed94 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -727,15 +727,22 @@ xfs_bmap_punch_delalloc_range(
>  		del = got;
>  		xfs_trim_extent(&del, start_fsb, length);
>  
> -		if (del.br_blockcount && isnullstartblock(del.br_startblock)) {
> -			error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK,
> -					&icur, &got, &del);
> -			if (error || !xfs_iext_get_extent(ifp, &icur, &got))
> -				break;
> -		} else {
> +		/*
> +		 * A delete can push the cursor forward. Step back to the
> +		 * previous extent on non-delalloc or extents outside the
> +		 * target range.
> +		 */
> +		if (!del.br_blockcount ||
> +		    !isnullstartblock(del.br_startblock)) {
>  			if (!xfs_iext_prev_extent(ifp, &icur, &got))
>  				break;
> +			continue;
>  		}
> +
> +		error = xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur,
> +						  &got, &del);
> +		if (error || !xfs_iext_get_extent(ifp, &icur, &got))
> +			break;
>  	}
>  
>  	return error;

^ permalink raw reply	[flat|nested] 61+ messages in thread
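
The backwards walk discussed in this subthread can be sketched over a
plain sorted array.  This is a simplified model under stated
assumptions: it only counts the delalloc blocks that would be punched
and does not delete anything, so the cursor subtlety Brian describes
(a delete moving the cursor forward) is not modelled.  All names
(struct sketch_extent, sketch_count_punchable) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_extent {
	uint64_t startoff;	/* first block of the extent */
	uint64_t blockcount;	/* length in blocks */
	bool     delalloc;	/* true for a delayed allocation extent */
};

/*
 * Count the delalloc blocks inside [start, start + length), walking a
 * sorted, non-overlapping extent array from the last entry backwards.
 */
static uint64_t
sketch_count_punchable(const struct sketch_extent *ext, size_t nr,
		uint64_t start, uint64_t length)
{
	uint64_t end = start + length;
	uint64_t punched = 0;
	size_t i = nr;

	while (i-- > 0) {
		uint64_t lo = ext[i].startoff;
		uint64_t hi = lo + ext[i].blockcount;

		if (lo >= end)
			continue;	/* entirely after the range */
		if (hi <= start)
			break;		/* sorted: nothing earlier overlaps */

		/* Trim the extent to the target range. */
		if (lo < start)
			lo = start;
		if (hi > end)
			hi = end;

		/* Non-delalloc extents are stepped over untouched. */
		if (ext[i].delalloc)
			punched += hi - lo;
	}
	return punched;
}
```

For extents [0,4) real, [4,8) delalloc and [10,16) delalloc, a punch of
blocks [2,12) would cover six delalloc blocks: two from the last extent
after trimming and four from the middle one.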

* Re: [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-05-30 13:35   ` Brian Foster
@ 2018-05-30 16:58     ` Darrick J. Wong
  2018-05-30 17:40       ` Brian Foster
  0 siblings, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 16:58 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 09:35:52AM -0400, Brian Foster wrote:
> On Wed, May 30, 2018 at 12:00:00PM +0200, Christoph Hellwig wrote:
> > > Both callers want the same locking, so do it only once.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_aops.c      | 2 --
> >  fs/xfs/xfs_bmap_util.c | 7 ++++---
> >  fs/xfs/xfs_iomap.c     | 3 ---
> >  3 files changed, 4 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index f2333e351e07..5dd09e83c81c 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -761,10 +761,8 @@ xfs_aops_discard_page(
> >  		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
> >  			page, ip->i_ino, offset);
> >  
> > -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> >  			PAGE_SIZE / i_blocksize(inode));
> > -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	if (error && !XFS_FORCED_SHUTDOWN(mp))
> >  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
> >  out_invalidate:
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index f2b87873612d..86a7ee425bfc 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -712,12 +712,11 @@ xfs_bmap_punch_delalloc_range(
> >  	struct xfs_iext_cursor	icur;
> >  	int			error = 0;
> >  
> > -	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > -
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> >  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> >  		if (error)
> > -			return error;
> > +			goto out_unlock;
> >  	}
> >  
> >  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
> 
> There's a return 0 just below here that needs the exit label treatment.
> Otherwise looks Ok.

Will fix that in my tree for testing.  Brian, will you RVB the fixed up
patch?

--D

> Brian
> 
> > @@ -738,6 +737,8 @@ xfs_bmap_punch_delalloc_range(
> >  		}
> >  	}
> >  
> > +out_unlock:
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	return error;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index da6d1995e460..f949f0dd7382 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1203,11 +1203,8 @@ xfs_file_iomap_end_delalloc(
> >  		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> >  					 XFS_FSB_TO_B(mp, end_fsb) - 1);
> >  
> > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> >  					       end_fsb - start_fsb);
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > -
> >  		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
> >  			xfs_alert(mp, "%s: unable to clean up ino %lld",
> >  				__func__, ip->i_ino);
> > -- 
> > 2.17.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread
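
Brian's point about the remaining "return 0" is the usual hazard when a
lock moves from the callers into the callee: every exit has to pass
through the unlock label.  A minimal userspace sketch of the pattern
follows; the lock is a plain flag and all names (struct sketch_ilock,
sketch_punch_range, sketch_run_and_check) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_ilock {
	bool held;
};

static void sketch_ilock_take(struct sketch_ilock *l)
{
	assert(!l->held);
	l->held = true;
}

static void sketch_ilock_drop(struct sketch_ilock *l)
{
	assert(l->held);
	l->held = false;
}

/* The callee owns the locking, so every exit goes through out_unlock. */
static int
sketch_punch_range(struct sketch_ilock *l, int read_error, bool found_extent)
{
	int error = 0;

	sketch_ilock_take(l);
	if (read_error) {
		error = read_error;
		goto out_unlock;
	}
	if (!found_extent)
		goto out_unlock;	/* a bare "return 0" would leak the lock */

	/* ... the actual punching would happen here ... */

out_unlock:
	sketch_ilock_drop(l);
	return error;
}

/* Run one scenario and check that the lock was released afterwards. */
static int
sketch_run_and_check(int read_error, bool found_extent)
{
	struct sketch_ilock l = { .held = false };
	int error = sketch_punch_range(&l, read_error, found_extent);

	assert(!l.held);
	return error;
}
```

The "no extent found" case returns 0 like before, but only after the
lock has been dropped.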

* Re: [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
  2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
@ 2018-05-30 17:19   ` Darrick J. Wong
  2018-05-30 17:39   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm, Dave Chinner

On Wed, May 30, 2018 at 12:00:01PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_writepage_map() iterates over the bufferheads on a page to decide
> what sort of IO to do and what actions to take.  However, when it comes
> to reflink and deciding when it needs to execute a COW operation, we no
> longer look at the bufferhead state but instead we ignore that and look
> up internal state held in the COW fork extent list.
> 
> This means xfs_writepage_map() is somewhat confused. It does stuff, then
> ignores it, then tries to handle the impedance mismatch by shovelling the
> results inside the existing mapping code.  It works, but it's a bit of a
> mess and it makes it hard to fix the cached map bug that the writepage
> code currently has.
> 
> To unify the two different mechanisms, we first have to choose a direction.
> That's already been set - we're de-emphasising bufferheads so they are no
> longer a control structure as we need to do that to allow for eventual
> removal.  Hence we need to move away from looking at bufferhead state to
> determine what operations we need to perform.
> 
> We can't completely get rid of bufferheads yet - they do contain some
> state that is absolutely necessary, such as whether that part of the page
> contains valid data or not (buffer_uptodate()).  Other state in the
> bufferhead is redundant:
> 
> 	BH_dirty - the page is dirty, so we can ignore this and just
> 		write it
> 	BH_delay - we have delalloc extent info in the DATA fork extent
> 		tree
> 	BH_unwritten - same as BH_delay
> 	BH_mapped - indicates we've already used it once for IO and it is
> 		mapped to a disk address. Needs to be ignored for COW
> 		blocks.
> 
> The BH_mapped flag is an interesting case - it's supposed to indicate that
> it's already mapped to disk and so we can just use it "as is".  In theory,
> we don't even have to do an extent lookup to find where to write it to,
> but we have to do that anyway to determine we are actually writing over a
> valid extent.  Hence it's not even serving the purpose of avoiding an
> extent lookup during writeback, and so we can pretty much ignore it.
> Especially as we have to ignore it for COW operations...
> 
> Therefore, use the extent map as the source of information to tell us
> what actions we need to take and what sort of IO we should perform.  The
> first step is to integrate xfs_map_blocks() and xfs_map_cow() and have
> xfs_map_blocks() set the io type according to what it looks up.  This
> means it can easily handle both normal overwrite and COW cases.  The
> only thing we also need to add is the ability to return hole mappings.
> 
> We need to return and cache hole mappings now for the case of multiple
> blocks per page.  We no longer use the BH_mapped to indicate a block over
> a hole, so we have to get that info from xfs_map_blocks().  We cache it so
> that holes that span two pages don't need separate lookups.  This allows us
> to avoid ever doing write IO over a hole, too.
> 
> Further, we need to drop the XFS_BMAPI_IGSTATE flag so that we don't
> combine contiguous written and unwritten extents into a single map.  The
> io type needs to match the extent type we are writing to so that we run the
> correct IO completion routine for the IO. There is scope for optimisation
> that would allow us to re-instate the XFS_BMAPI_IGSTATE flag, but this
> requires tweaks to code outside the scope of this change.
> 
> Now that we have xfs_map_blocks() returning both a cached map and the type
> of IO we need to perform, we can rewrite xfs_writepage_map() to drop all
> the bufferhead control. It's also much simplified because it doesn't need
> to explicitly handle COW operations.  Instead of iterating bufferheads, it
> iterates blocks within the page and then looks up what per-block state is
> required from the appropriate bufferhead.  It then validates the cached
> map, and if it's not valid, we get a new map.  If we don't get a valid map
> or it's over a hole, we skip the block.
> 
> At this point, we have to remap the bufferhead via xfs_map_at_offset().
> As previously noted, we had to do this even if the buffer was already
> mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN
> and XFS_IO_COW IO types.  With xfs_map_blocks() now controlling the type,
> even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet-
> written delalloc extents beyond EOF can be reported as XFS_IO_OVERWRITE.
> Bufferheads that span such regions still need their BH_Delay flags cleared
> and their block numbers calculated, so we now unconditionally map each
> bufferhead before submission.
> 
> But wait! There's more - remember the old "treat unwritten extents as
> holes on read" hack?  Yeah, that means we can have a dirty page with
> unmapped, unwritten bufferheads that contain data!  What makes these so
> special is that the unwritten "hole" bufferheads do not have a valid block
> device pointer, so if we attempt to write them xfs_add_to_ioend() blows
> up. So we make xfs_map_at_offset() do the "realtime or data device"
> lookup from the inode and ignore what was or wasn't put into the
> bufferhead when the buffer was instantiated.
> 
> The astute reader will have realised by now that this code treats
> unwritten extents in multiple-blocks-per-page situations differently.
> If we get any combination of unwritten blocks on a dirty page that contain
> valid data in the page, we're going to convert them to real extents.  This
> can actually be a win, because it means that pages with interleaving
> unwritten and written blocks will get converted to a single written extent
> with zeros replacing the interspersed unwritten blocks.  This is actually
> good for reducing extent list and conversion overhead, and it means we
> issue a contiguous IO instead of lots of little ones.  The downside is
> that we use up a little extra IO bandwidth.  Neither of these seems like a
> bad thing given that spinning disks are seek sensitive, and SSDs/pmem have
> bandwidth to burn and the lower IO latency/CPU overhead of fewer, larger
> IOs will result in better performance on them...
> 
> As a result of all this, the only state we actually care about from the
> bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to
> pass some information to the bio via xfs_add_to_ioend(), but that is
> trivial to separate and pass explicitly.  This means we really only need
> 1 bit of state per block per page from the buffered write path in the
> writeback path.  Everything else we do with the bufferhead is purely to
> make the buffered IO front end continue to work correctly. i.e. we've
> pretty much marginalised bufferheads in the writeback path completely.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> [hch: forward port + slight refactoring]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
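
The way the commit message derives the IO type from the extent lookup
rather than from bufferhead flags can be sketched as a pure function.
This is only an illustration under simplifying assumptions: the lookup
result is reduced to a few scalars, and the enum and function names
(sketch_io_type, sketch_block_kind, sketch_classify) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_io_type {
	SKETCH_IO_HOLE,
	SKETCH_IO_DELALLOC,
	SKETCH_IO_UNWRITTEN,
	SKETCH_IO_OVERWRITE,
	SKETCH_IO_COW
};

enum sketch_block_kind {
	SKETCH_BLK_HOLE,	/* mapping starts at a hole */
	SKETCH_BLK_DELALLOC,	/* delayed allocation, no real blocks yet */
	SKETCH_BLK_REAL		/* real blocks on disk */
};

/*
 * Pick the IO type from what the extent lookup reported:
 *  - a COW fork mapping wins over everything else,
 *  - no mapping at all is reported as a hole,
 *  - real extents are split into unwritten and plain overwrite.
 */
static enum sketch_io_type
sketch_classify(bool has_cow_mapping, bool found,
		enum sketch_block_kind kind, bool unwritten)
{
	if (has_cow_mapping)
		return SKETCH_IO_COW;
	if (!found || kind == SKETCH_BLK_HOLE)
		return SKETCH_IO_HOLE;
	if (kind == SKETCH_BLK_DELALLOC)
		return SKETCH_IO_DELALLOC;
	return unwritten ? SKETCH_IO_UNWRITTEN : SKETCH_IO_OVERWRITE;
}
```

In the patch itself the COW and delalloc cases additionally go on to
allocate real blocks; the sketch stops at the classification step.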

> ---
>  fs/xfs/xfs_aops.c | 273 ++++++++++++++++++++--------------------------
>  fs/xfs/xfs_aops.h |   4 +-
>  2 files changed, 124 insertions(+), 153 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 5dd09e83c81c..8cc41a786b5e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -378,78 +378,93 @@ xfs_map_blocks(
>  	struct inode		*inode,
>  	loff_t			offset,
>  	struct xfs_bmbt_irec	*imap,
> -	int			type)
> +	int			*type)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			count = i_blocksize(inode);
>  	xfs_fileoff_t		offset_fsb, end_fsb;
> +	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
>  	int			nimaps = 1;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	/*
> -	 * Truncate can race with writeback since writeback doesn't take the
> -	 * iolock and truncate decreases the file size before it starts
> -	 * truncating the pages between new_size and old_size.  Therefore, we
> -	 * can end up in the situation where writeback gets a CoW fork mapping
> -	 * but the truncate makes the mapping invalid and we end up in here
> -	 * trying to get a new mapping.  Bail out here so that we simply never
> -	 * get a valid mapping and so we drop the write altogether.  The page
> -	 * truncation will kill the contents anyway.
> -	 */
> -	if (type == XFS_IO_COW && offset > i_size_read(inode))
> -		return 0;
> -
> -	ASSERT(type != XFS_IO_COW);
> -	if (type == XFS_IO_UNWRITTEN)
> -		bmapi_flags |= XFS_BMAPI_IGSTATE;
> -
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
>  	ASSERT(ip->i_d.di_format != XFS_DINODE_FMT_BTREE ||
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
>  	ASSERT(offset <= mp->m_super->s_maxbytes);
>  
> +	if (xfs_is_reflink_inode(ip) &&
> +	    xfs_reflink_find_cow_mapping(ip, offset, imap)) {
> +		xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +		/*
> +		 * Truncate can race with writeback since writeback doesn't
> +		 * take the iolock and truncate decreases the file size before
> +		 * it starts truncating the pages between new_size and old_size.
> +		 * Therefore, we can end up in the situation where writeback
> +		 * gets a CoW fork mapping but the truncate makes the mapping
> +		 * invalid and we end up in here trying to get a new mapping.
> +		 * bail out here so that we simply never get a valid mapping
> +		 * and so we drop the write altogether.  The page truncation
> +		 * will kill the contents anyway.
> +		 */
> +		if (offset > i_size_read(inode))
> +			return 0;
> +		whichfork = XFS_COW_FORK;
> +		*type = XFS_IO_COW;
> +		goto allocate_blocks;
> +	}
> +
>  	if (offset > mp->m_super->s_maxbytes - count)
>  		count = mp->m_super->s_maxbytes - offset;
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, bmapi_flags);
> -	/*
> -	 * Truncate an overwrite extent if there's a pending CoW
> -	 * reservation before the end of this extent.  This forces us
> -	 * to come back to writepage to take care of the CoW.
> -	 */
> -	if (nimaps && type == XFS_IO_OVERWRITE)
> +				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	if (!nimaps) {
> +		/*
> +		 * Lookup returns no match? Beyond eof? regardless,
> +		 * return it as a hole so we don't write it
> +		 */
> +		imap->br_startoff = offset_fsb;
> +		imap->br_blockcount = end_fsb - offset_fsb;
> +		imap->br_startblock = HOLESTARTBLOCK;
> +		*type = XFS_IO_HOLE;
> +	} else if (imap->br_startblock == HOLESTARTBLOCK) {
> +		/* landed in a hole */
> +		*type = XFS_IO_HOLE;
> +	} else {
> +		if (isnullstartblock(imap->br_startblock)) {
> +			/* got a delalloc extent */
> +			*type = XFS_IO_DELALLOC;
> +			goto allocate_blocks;
> +		}
> +
> +		/*
> +		 * Got an existing extent for overwrite.  Truncate it if there
> +		 * is a pending CoW reservation before the end of this extent,
> +		 * so that we pick up the COW extents in the next iteration.
> +		 */
>  		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
> +		if (imap->br_state == XFS_EXT_UNWRITTEN)
> +			*type = XFS_IO_UNWRITTEN;
> +		else
> +			*type = XFS_IO_OVERWRITE;
> +	}
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
> -	if (error)
> -		return error;
> -
> -	if (type == XFS_IO_DELALLOC &&
> -	    (!nimaps || isnullstartblock(imap->br_startblock))) {
> -		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
> -				imap);
> -		if (!error)
> -			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
> -		return error;
> -	}
> +	trace_xfs_map_blocks_found(ip, offset, count, *type, imap);
> +	return error;
>  
> -#ifdef DEBUG
> -	if (type == XFS_IO_UNWRITTEN) {
> -		ASSERT(nimaps);
> -		ASSERT(imap->br_startblock != HOLESTARTBLOCK);
> -		ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
> -	}
> -#endif
> -	if (nimaps)
> -		trace_xfs_map_blocks_found(ip, offset, count, type, imap);
> -	return 0;
> +allocate_blocks:
> +	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +	if (!error)
> +		error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
> +	if (!error)
> +		trace_xfs_map_blocks_alloc(ip, offset, count, *type, imap);
> +	return error;
>  }
>  
>  STATIC bool
> @@ -709,6 +724,14 @@ xfs_map_at_offset(
>  	set_buffer_mapped(bh);
>  	clear_buffer_delay(bh);
>  	clear_buffer_unwritten(bh);
> +
> +	/*
> +	 * If this is a realtime file, data may be on a different device.
> +	 * to that pointed to from the buffer_head b_bdev currently. We can't
> +	 * trust that the bufferhead has a already been mapped correctly, so
> +	 * set the bdev now.
> +	 */
> +	bh->b_bdev = xfs_find_bdev_for_inode(inode);
>  }
>  
>  STATIC void
> @@ -769,56 +792,6 @@ xfs_aops_discard_page(
>  	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
>  }
>  
> -static int
> -xfs_map_cow(
> -	struct xfs_writepage_ctx *wpc,
> -	struct inode		*inode,
> -	loff_t			offset,
> -	unsigned int		*new_type)
> -{
> -	struct xfs_inode	*ip = XFS_I(inode);
> -	struct xfs_bmbt_irec	imap;
> -	bool			is_cow = false;
> -	int			error;
> -
> -	/*
> -	 * If we already have a valid COW mapping keep using it.
> -	 */
> -	if (wpc->io_type == XFS_IO_COW) {
> -		wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap, offset);
> -		if (wpc->imap_valid) {
> -			*new_type = XFS_IO_COW;
> -			return 0;
> -		}
> -	}
> -
> -	/*
> -	 * Else we need to check if there is a COW mapping at this offset.
> -	 */
> -	xfs_ilock(ip, XFS_ILOCK_SHARED);
> -	is_cow = xfs_reflink_find_cow_mapping(ip, offset, &imap);
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -
> -	if (!is_cow)
> -		return 0;
> -
> -	/*
> -	 * And if the COW mapping has a delayed extent here we need to
> -	 * allocate real space for it now.
> -	 */
> -	if (isnullstartblock(imap.br_startblock)) {
> -		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK, offset,
> -				&imap);
> -		if (error)
> -			return error;
> -	}
> -
> -	wpc->io_type = *new_type = XFS_IO_COW;
> -	wpc->imap_valid = true;
> -	wpc->imap = imap;
> -	return 0;
> -}
> -
>  /*
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
> @@ -845,85 +818,81 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh, *head;
> +	struct buffer_head	*bh;
>  	ssize_t			len = i_blocksize(inode);
> -	uint64_t		offset;
>  	int			error = 0;
>  	int			count = 0;
> -	int			uptodate = 1;
> -	unsigned int		new_type;
> +	bool			uptodate = true;
> +	loff_t			file_offset;	/* file offset of page */
> +	unsigned		poffset;	/* offset into page */
>  
> -	bh = head = page_buffers(page);
> -	offset = page_offset(page);
> -	do {
> -		if (offset >= end_offset)
> +	/*
> +	 * Walk the blocks on the page, and we we run off then end of the
> +	 * current map or find the current map invalid, grab a new one.
> +	 * We only use bufferheads here to check per-block state - they no
> +	 * longer control the iteration through the page. This allows us to
> +	 * replace the bufferhead with some other state tracking mechanism in
> +	 * future.
> +	 */
> +	file_offset = page_offset(page);
> +	bh = page_buffers(page);
> +	for (poffset = 0;
> +	     poffset < PAGE_SIZE;
> +	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +		/* past the range we are writing, so nothing more to write. */
> +		if (file_offset >= end_offset)
>  			break;
> -		if (!buffer_uptodate(bh))
> -			uptodate = 0;
>  
>  		/*
> -		 * set_page_dirty dirties all buffers in a page, independent
> -		 * of their state.  The dirty state however is entirely
> -		 * meaningless for holes (!mapped && uptodate), so skip
> -		 * buffers covering holes here.
> +		 * Block does not contain valid data, skip it, mark the current
> +		 * map as invalid because we have a discontiguity. This ensures
> +		 * we put subsequent writeable buffers into a new ioend.
>  		 */
> -		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			wpc->imap_valid = false;
> -			continue;
> -		}
> -
> -		if (buffer_unwritten(bh))
> -			new_type = XFS_IO_UNWRITTEN;
> -		else if (buffer_delay(bh))
> -			new_type = XFS_IO_DELALLOC;
> -		else if (buffer_uptodate(bh))
> -			new_type = XFS_IO_OVERWRITE;
> -		else {
> +		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			/*
> -			 * This buffer is not uptodate and will not be
> -			 * written to disk.  Ensure that we will put any
> -			 * subsequent writeable buffers into a new
> -			 * ioend.
> -			 */
> +			uptodate = false;
>  			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -		if (xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_cow(wpc, inode, offset, &new_type);
> -			if (error)
> -				goto out;
> -		}
> -
> -		if (wpc->io_type != new_type) {
> -			wpc->io_type = new_type;
> -			wpc->imap_valid = false;
> -		}
> -
> +		/* Check to see if current map spans this file offset */
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
> +		/*
> +		 * If we don't have a valid map, now it's time to get a new one
> +		 * for this offset.  This will convert delayed allocations
> +		 * (including COW ones) into real extents.  If we return without
> +		 * a valid map, it means we landed in a hole and we skip the
> +		 * block.
> +		 */
>  		if (!wpc->imap_valid) {
> -			error = xfs_map_blocks(inode, offset, &wpc->imap,
> -					     wpc->io_type);
> +			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
> +					     &wpc->io_type);
>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		}
> -		if (wpc->imap_valid) {
> -			lock_buffer(bh);
> -			if (wpc->io_type != XFS_IO_OVERWRITE)
> -				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> -			count++;
> +
> +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +			/*
> +			 * set_page_dirty dirties all buffers in a page, independent
> +			 * of their state.  The dirty state however is entirely
> +			 * meaningless for holes (!mapped && uptodate), so check we did
> +			 * have a buffer covering a hole here and continue.
> +			 */
> +			continue;
>  		}
>  
> -	} while (offset += len, ((bh = bh->b_this_page) != head));
> +		lock_buffer(bh);
> +		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		count++;
> +	}
>  
> -	if (uptodate && bh == head)
> +	if (uptodate && poffset == PAGE_SIZE)
>  		SetPageUptodate(page);
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 69346d460dfa..b2ef5b661761 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -29,6 +29,7 @@ enum {
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> @@ -36,7 +37,8 @@ enum {
>  	{ XFS_IO_DELALLOC,		"delalloc" }, \
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }
> +	{ XFS_IO_COW,			"CoW" }, \
> +	{ XFS_IO_HOLE,			"hole" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
@ 2018-05-30 17:19   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:02PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 6 ++----
>  fs/xfs/libxfs/xfs_bmap.h | 3 ---
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7b0e2b551e23..4b5e014417d2 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3799,8 +3799,7 @@ xfs_bmapi_update_map(
>  		   mval[-1].br_startblock != HOLESTARTBLOCK &&
>  		   mval->br_startblock == mval[-1].br_startblock +
>  					  mval[-1].br_blockcount &&
> -		   ((flags & XFS_BMAPI_IGSTATE) ||
> -			mval[-1].br_state == mval->br_state)) {
> +		   mval[-1].br_state == mval->br_state) {
>  		ASSERT(mval->br_startoff ==
>  		       mval[-1].br_startoff + mval[-1].br_blockcount);
>  		mval[-1].br_blockcount += mval->br_blockcount;
> @@ -3845,7 +3844,7 @@ xfs_bmapi_read(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
> -			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
> +			   XFS_BMAPI_COWFORK)));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
>  
>  	if (unlikely(XFS_TEST_ERROR(
> @@ -4290,7 +4289,6 @@ xfs_bmapi_write(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
> -	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
>  	ASSERT(tp != NULL ||
>  	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
>  			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 2c233f9f1a26..a845fe57d1b5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -80,8 +80,6 @@ struct xfs_extent_free_item
>  #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
>  #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
>  #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
> -#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
> -					/* combine contig. space */
>  #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
>  /*
>   * unwritten extent conversion - this needs write cache flushing and no additional
> @@ -128,7 +126,6 @@ struct xfs_extent_free_item
>  	{ XFS_BMAPI_METADATA,	"METADATA" }, \
>  	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
>  	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
> -	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
>  	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
>  	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
>  	{ XFS_BMAPI_ZERO,	"ZERO" }, \
> -- 
> 2.17.0
> 

* Re: [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping
  2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
@ 2018-05-30 17:23   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:03PM +0200, Christoph Hellwig wrote:
> We only have one caller left, and open coding the simple extent list
> lookup in it allows us to make the code both more understandable and
> reuse calculations and variables already present.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c    | 17 ++++++++++++-----
>  fs/xfs/xfs_reflink.c | 30 ------------------------------
>  fs/xfs/xfs_reflink.h |  2 --
>  fs/xfs/xfs_trace.h   |  1 -
>  4 files changed, 12 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8cc41a786b5e..e2671b223409 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -385,6 +385,7 @@ xfs_map_blocks(
>  	ssize_t			count = i_blocksize(inode);
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	int			whichfork = XFS_DATA_FORK;
> +	struct xfs_iext_cursor	icur;
>  	int			error = 0;
>  	int			nimaps = 1;
>  
> @@ -396,8 +397,18 @@ xfs_map_blocks(
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
>  	ASSERT(offset <= mp->m_super->s_maxbytes);
>  
> +	if (offset > mp->m_super->s_maxbytes - count)
> +		count = mp->m_super->s_maxbytes - offset;
> +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
> +	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +
> +	/*
> +	 * Check if this offset is covered by a COW extent, and if yes use
> +	 * it directly instead of looking up anything in the data fork.
> +	 */
>  	if (xfs_is_reflink_inode(ip) &&
> -	    xfs_reflink_find_cow_mapping(ip, offset, imap)) {
> +	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
> +	    imap->br_startoff <= offset_fsb) {
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  		/*
>  		 * Truncate can race with writeback since writeback doesn't
> @@ -417,10 +428,6 @@ xfs_map_blocks(
>  		goto allocate_blocks;
>  	}
>  
> -	if (offset > mp->m_super->s_maxbytes - count)
> -		count = mp->m_super->s_maxbytes - offset;
> -	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				imap, &nimaps, XFS_BMAPI_ENTIRE);
>  	if (!nimaps) {
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 713e857d9ffa..8e5eb8e70c89 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -484,36 +484,6 @@ xfs_reflink_allocate_cow(
>  	return error;
>  }
>  
> -/*
> - * Find the CoW reservation for a given byte offset of a file.
> - */
> -bool
> -xfs_reflink_find_cow_mapping(
> -	struct xfs_inode		*ip,
> -	xfs_off_t			offset,
> -	struct xfs_bmbt_irec		*imap)
> -{
> -	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	xfs_fileoff_t			offset_fsb;
> -	struct xfs_bmbt_irec		got;
> -	struct xfs_iext_cursor		icur;
> -
> -	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
> -
> -	if (!xfs_is_reflink_inode(ip))
> -		return false;
> -	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> -	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
> -		return false;
> -	if (got.br_startoff > offset_fsb)
> -		return false;
> -
> -	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
> -			&got);
> -	*imap = got;
> -	return true;
> -}
> -
>  /*
>   * Trim an extent to end at the next CoW reservation past offset_fsb.
>   */
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 701487bab468..15a456492667 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
> -extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
> -		struct xfs_bmbt_irec *imap);
>  extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
>  		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 9d4c4ca24fe6..ed8f774944ba 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3227,7 +3227,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
>  DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
> -DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
> -- 
> 2.17.0
> 

* Re: [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
@ 2018-05-30 17:27   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:04PM +0200, Christoph Hellwig wrote:
> In the only caller we just did a lookup in the COW extent tree for
> the same offset.  Reuse that result and save a lookup, as well as
> shortening the ilock hold time.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
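
For readers following along, the trimming that this patch open-codes can be modeled in isolation. This is only a minimal sketch, not the kernel code: `struct irec` and `trim_to_next_cow()` are hypothetical stand-ins for `struct xfs_bmbt_irec` and the inlined logic, and the `NULLFILEOFF` value here is a local placeholder. It assumes, as in the patch, that the COW extent starts after the offset being mapped, so `cow_fsb` is never below `imap->startoff`.

```c
#include <assert.h>
#include <stdint.h>

/* Local placeholder for the kernel's "no file offset" marker. */
#define NULLFILEOFF ((uint64_t)-1)

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct irec {
	uint64_t startoff;	/* first file block covered by the mapping */
	uint64_t blockcount;	/* number of file blocks covered */
};

/*
 * Trim a data fork mapping so that it ends where the next COW extent
 * begins.  cow_fsb is the start of the COW extent found by the earlier
 * lookup, or NULLFILEOFF if the lookup found nothing.
 * Precondition (assumed): cow_fsb > imap->startoff when it is set.
 * Returns 1 if the mapping was shortened, 0 otherwise.
 */
static int trim_to_next_cow(struct irec *imap, uint64_t cow_fsb)
{
	if (cow_fsb != NULLFILEOFF &&
	    cow_fsb < imap->startoff + imap->blockcount) {
		imap->blockcount = cow_fsb - imap->startoff;
		return 1;
	}
	return 0;
}
```

With a mapping covering blocks 10..29 and a COW extent starting at block 15, the mapping shrinks to 5 blocks, so the next iteration starts at block 15 and picks up the COW extent.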

> ---
>  fs/xfs/xfs_aops.c    | 25 ++++++++++++++++---------
>  fs/xfs/xfs_reflink.c | 33 ---------------------------------
>  fs/xfs/xfs_reflink.h |  2 --
>  3 files changed, 16 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e2671b223409..587493e9c8a1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -383,7 +383,7 @@ xfs_map_blocks(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			count = i_blocksize(inode);
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> +	xfs_fileoff_t		offset_fsb, end_fsb, cow_fsb = NULLFILEOFF;
>  	int			whichfork = XFS_DATA_FORK;
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
> @@ -407,8 +407,9 @@ xfs_map_blocks(
>  	 * it directly instead of looking up anything in the data fork.
>  	 */
>  	if (xfs_is_reflink_inode(ip) &&
> -	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
> -	    imap->br_startoff <= offset_fsb) {
> +	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap))
> +		cow_fsb = imap->br_startoff;
> +	if (cow_fsb != NULLFILEOFF && cow_fsb <= offset_fsb) {
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  		/*
>  		 * Truncate can race with writeback since writeback doesn't
> @@ -430,6 +431,10 @@ xfs_map_blocks(
>  
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +	if (error)
> +		return error;
> +
>  	if (!nimaps) {
>  		/*
>  		 * Lookup returns no match? Beyond eof? regardless,
> @@ -454,21 +459,23 @@ xfs_map_blocks(
>  		 * is a pending CoW reservation before the end of this extent,
>  		 * so that we pick up the COW extents in the next iteration.
>  		 */
> -		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
> +		if (cow_fsb != NULLFILEOFF &&
> +		    cow_fsb < imap->br_startoff + imap->br_blockcount) {
> +			imap->br_blockcount = cow_fsb - imap->br_startoff;
> +			trace_xfs_reflink_trim_irec(ip, imap);
> +		}
> +
>  		if (imap->br_state == XFS_EXT_UNWRITTEN)
>  			*type = XFS_IO_UNWRITTEN;
>  		else
>  			*type = XFS_IO_OVERWRITE;
>  	}
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  	trace_xfs_map_blocks_found(ip, offset, count, *type, imap);
> -	return error;
> +	return 0;
>  
>  allocate_blocks:
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -	if (!error)
> -		error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
> +	error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
>  	if (!error)
>  		trace_xfs_map_blocks_alloc(ip, offset, count, *type, imap);
>  	return error;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 8e5eb8e70c89..ff76bc56ff3d 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -484,39 +484,6 @@ xfs_reflink_allocate_cow(
>  	return error;
>  }
>  
> -/*
> - * Trim an extent to end at the next CoW reservation past offset_fsb.
> - */
> -void
> -xfs_reflink_trim_irec_to_next_cow(
> -	struct xfs_inode		*ip,
> -	xfs_fileoff_t			offset_fsb,
> -	struct xfs_bmbt_irec		*imap)
> -{
> -	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	struct xfs_bmbt_irec		got;
> -	struct xfs_iext_cursor		icur;
> -
> -	if (!xfs_is_reflink_inode(ip))
> -		return;
> -
> -	/* Find the extent in the CoW fork. */
> -	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
> -		return;
> -
> -	/* This is the extent before; try sliding up one. */
> -	if (got.br_startoff < offset_fsb) {
> -		if (!xfs_iext_next_extent(ifp, &icur, &got))
> -			return;
> -	}
> -
> -	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
> -		return;
> -
> -	imap->br_blockcount = got.br_startoff - imap->br_startoff;
> -	trace_xfs_reflink_trim_irec(ip, imap);
> -}
> -
>  /*
>   * Cancel CoW reservations for some block range of an inode.
>   *
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 15a456492667..e8d4d50c629f 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
> -extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
> -		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
>  
>  extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
>  		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
> -- 
> 2.17.0
> 

* Re: [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
  2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
@ 2018-05-30 17:32   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:05PM +0200, Christoph Hellwig wrote:
> xfs_bmapi_read adds zero value in xfs_map_blocks.  Replace it with a
> direct call to the low-level extent lookup function.
> 
> Note that we now always pass a 0 length to the trace points as we ask
> for an unspecified len.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
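
The hole handling in this patch relies on the lookup returning either the extent containing the offset or the next extent after it. A rough userspace model of that, under stated assumptions: `struct irec`, `lookup_extent()` and `map_block()` are hypothetical names, the extent list is a plain sorted array rather than the in-core extent tree, and the `hole` flag stands in for `HOLESTARTBLOCK` / `XFS_IO_HOLE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct irec {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* number of file blocks covered */
	bool	 hole;		/* models HOLESTARTBLOCK / XFS_IO_HOLE */
};

/*
 * Model of the lookup: return the first extent (in a sorted array) that
 * ends after 'off', i.e. the extent containing 'off' or the next one.
 * Returns false if no extent lies at or beyond 'off'.
 */
static bool lookup_extent(const struct irec *ext, size_t n, uint64_t off,
			  struct irec *got)
{
	for (size_t i = 0; i < n; i++) {
		if (ext[i].startoff + ext[i].blockcount > off) {
			*got = ext[i];
			return true;
		}
	}
	return false;
}

/* Map one block, turning "no extent" or "later extent" into a hole. */
static void map_block(const struct irec *ext, size_t n, uint64_t offset_fsb,
		      uint64_t end_fsb, struct irec *imap)
{
	if (!lookup_extent(ext, n, offset_fsb, imap))
		imap->startoff = end_fsb;	/* fake a hole past EOF */

	if (imap->startoff > offset_fsb) {
		/* landed in a hole or beyond EOF */
		imap->blockcount = imap->startoff - offset_fsb;
		imap->startoff = offset_fsb;
		imap->hole = true;
	} else {
		imap->hole = false;
	}
}
```

The point of the design is that a single lookup covers all three cases (real extent, hole between extents, beyond EOF), and the hole length falls out of the start of the next extent.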

> ---
>  fs/xfs/xfs_aops.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 587493e9c8a1..cef2bc3cf98b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -387,7 +387,6 @@ xfs_map_blocks(
>  	int			whichfork = XFS_DATA_FORK;
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
> -	int			nimaps = 1;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
> @@ -429,24 +428,16 @@ xfs_map_blocks(
>  		goto allocate_blocks;
>  	}
>  
> -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, imap))
> +		imap->br_startoff = end_fsb;	/* fake a hole past EOF */
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -	if (error)
> -		return error;
>  
> -	if (!nimaps) {
> -		/*
> -		 * Lookup returns no match? Beyond eof? regardless,
> -		 * return it as a hole so we don't write it
> -		 */
> +	if (imap->br_startoff > offset_fsb) {
> +		/* landed in a hole or beyond EOF */
> +		imap->br_blockcount = imap->br_startoff - offset_fsb;
>  		imap->br_startoff = offset_fsb;
> -		imap->br_blockcount = end_fsb - offset_fsb;
>  		imap->br_startblock = HOLESTARTBLOCK;
>  		*type = XFS_IO_HOLE;
> -	} else if (imap->br_startblock == HOLESTARTBLOCK) {
> -		/* landed in a hole */
> -		*type = XFS_IO_HOLE;
>  	} else {
>  		if (isnullstartblock(imap->br_startblock)) {
>  			/* got a delalloc extent */
> -- 
> 2.17.0
> 

* Re: [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers
  2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
@ 2018-05-30 17:34   ` Darrick J. Wong
  2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:06PM +0200, Christoph Hellwig wrote:
> Finding a buffer that isn't uptodate doesn't invalidate the mapping for
> any given block.  The last_sector check will already take care of starting
> another ioend as soon as we find any non-uptodate buffer, and if the current
> mapping doesn't include the next uptodate buffer the xfs_imap_valid check
> will take care of it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
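
The argument in the commit message can be illustrated with a toy model. This is a sketch under loud assumptions: `struct wb_ctx`, `add_block()` and `write_page()` are hypothetical, and the real ioend code checks more than sector contiguity. All the model shows is that skipping a block leaves a gap in the sector sequence, so the next block written starts a new ioend even though the cached mapping was never invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical writeback context; not the kernel's xfs_writepage_ctx. */
struct wb_ctx {
	bool	 have_ioend;	/* an ioend is currently open */
	uint64_t next_sector;	/* sector right after the last block added */
	int	 nr_ioends;	/* number of ioends started so far */
};

/* Add one block; start a new ioend if it is not sector-contiguous. */
static void add_block(struct wb_ctx *c, uint64_t sector, uint64_t spb)
{
	if (!c->have_ioend || sector != c->next_sector) {
		c->nr_ioends++;
		c->have_ioend = true;
	}
	c->next_sector = sector + spb;
}

/*
 * Walk the blocks of a page backed by one contiguous extent starting at
 * base_sector, with spb sectors per block.  Blocks that are not uptodate
 * are skipped without touching any mapping state.  Returns the number of
 * ioends that were started.
 */
static int write_page(const bool *uptodate, int nblocks,
		      uint64_t base_sector, uint64_t spb)
{
	struct wb_ctx c = { false, 0, 0 };

	for (int i = 0; i < nblocks; i++) {
		if (!uptodate[i])
			continue;	/* skipped: mapping stays cached */
		add_block(&c, base_sector + (uint64_t)i * spb, spb);
	}
	return c.nr_ioends;
}
```

In this model a page with blocks {uptodate, not uptodate, uptodate, uptodate} produces two ioends, which is the split the removed `imap_valid = false` assignment was meant to force.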

> ---
>  fs/xfs/xfs_aops.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index cef2bc3cf98b..7dc13b0aae60 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -849,15 +849,12 @@ xfs_writepage_map(
>  			break;
>  
>  		/*
> -		 * Block does not contain valid data, skip it, mark the current
> -		 * map as invalid because we have a discontiguity. This ensures
> -		 * we put subsequent writeable buffers into a new ioend.
> +		 * Block does not contain valid data, skip it.
>  		 */
>  		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
>  			uptodate = false;
> -			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -- 
> 2.17.0
> 

* Re: [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
  2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
  2018-05-30 17:19   ` Darrick J. Wong
@ 2018-05-30 17:39   ` Brian Foster
  2018-05-31 16:11     ` Christoph Hellwig
  2018-05-31 16:12     ` Christoph Hellwig
  1 sibling, 2 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-30 17:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm, Dave Chinner

On Wed, May 30, 2018 at 12:00:01PM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_writepage_map() iterates over the bufferheads on a page to decide
> what sort of IO to do and what actions to take.  However, when it comes
> to reflink and deciding when it needs to execute a COW operation, we no
> longer look at the bufferhead state but instead we ignore that and look
> up internal state held in the COW fork extent list.
> 
...
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> [hch: forward port + slight refactoring]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

I believe Dave originally intended to split this up into multiple
patches. Dave, did you happen to get anywhere with that before Christoph
pulled this in?

If not, could we at least split off some of the behavior changes into
separate patches? For example, dropping the !mapped && uptodate check,
which causes us to write back zeroed blocks over unwritten extents, is a
behavior change that warrants a separate patch. I also think the
associated text above doesn't quite describe the details of the behavior
change. Writeback only converts those extents if they've been previously
read in one way or another; otherwise those blocks are still skipped.

So for example, (on a bsize=1k fs) this:

	xfs_io -fc "falloc 0 4k" -c "pread 3k 1k" -c "pwrite 3k 1k" \
		-c fsync -c "fiemap -v" /mnt/file

... now results in different writeback behavior from this:

	xfs_io -fc "falloc 0 4k" -c "pwrite 3k 1k" \
		-c fsync -c "fiemap -v" /mnt/file

That may be fine, but it does leave me wondering whether we should
consider ways to provide more consistent behavior here (such as zeroing
blocks at writeback time, for example).

>  fs/xfs/xfs_aops.c | 273 ++++++++++++++++++++--------------------------
>  fs/xfs/xfs_aops.h |   4 +-
>  2 files changed, 124 insertions(+), 153 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 5dd09e83c81c..8cc41a786b5e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
...
> @@ -845,85 +818,81 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh, *head;
> +	struct buffer_head	*bh;
>  	ssize_t			len = i_blocksize(inode);
> -	uint64_t		offset;
>  	int			error = 0;
>  	int			count = 0;
> -	int			uptodate = 1;
> -	unsigned int		new_type;
> +	bool			uptodate = true;
> +	loff_t			file_offset;	/* file offset of page */
> +	unsigned		poffset;	/* offset into page */
>  
> -	bh = head = page_buffers(page);
> -	offset = page_offset(page);
> -	do {
> -		if (offset >= end_offset)
> +	/*
> +	 * Walk the blocks on the page, and when we run off the end of the
> +	 * current map or find the current map invalid, grab a new one.
> +	 * We only use bufferheads here to check per-block state - they no
> +	 * longer control the iteration through the page. This allows us to
> +	 * replace the bufferhead with some other state tracking mechanism in
> +	 * future.
> +	 */
> +	file_offset = page_offset(page);
> +	bh = page_buffers(page);
> +	for (poffset = 0;
> +	     poffset < PAGE_SIZE;
> +	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +		/* past the range we are writing, so nothing more to write. */
> +		if (file_offset >= end_offset)
>  			break;
> -		if (!buffer_uptodate(bh))
> -			uptodate = 0;
>  
>  		/*
> -		 * set_page_dirty dirties all buffers in a page, independent
> -		 * of their state.  The dirty state however is entirely
> -		 * meaningless for holes (!mapped && uptodate), so skip
> -		 * buffers covering holes here.
> +		 * Block does not contain valid data, skip it, mark the current
> +		 * map as invalid because we have a discontiguity. This ensures
> +		 * we put subsequent writeable buffers into a new ioend.
>  		 */
> -		if (!buffer_mapped(bh) && buffer_uptodate(bh)) {
> -			wpc->imap_valid = false;
> -			continue;
> -		}
> -
> -		if (buffer_unwritten(bh))
> -			new_type = XFS_IO_UNWRITTEN;
> -		else if (buffer_delay(bh))
> -			new_type = XFS_IO_DELALLOC;
> -		else if (buffer_uptodate(bh))
> -			new_type = XFS_IO_OVERWRITE;
> -		else {
> +		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			/*
> -			 * This buffer is not uptodate and will not be
> -			 * written to disk.  Ensure that we will put any
> -			 * subsequent writeable buffers into a new
> -			 * ioend.
> -			 */
> +			uptodate = false;
>  			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -		if (xfs_is_reflink_inode(XFS_I(inode))) {
> -			error = xfs_map_cow(wpc, inode, offset, &new_type);
> -			if (error)
> -				goto out;
> -		}
> -
> -		if (wpc->io_type != new_type) {
> -			wpc->io_type = new_type;
> -			wpc->imap_valid = false;
> -		}
> -
> +		/* Check to see if current map spans this file offset */
>  		if (wpc->imap_valid)
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);

What if the file is reflinked and the current page covers a non-shared
block but has an overlapping cow mapping due to cowextsize? The current
logic unconditionally uses the COW mapping for writeback. The updated
logic doesn't appear to do that in all cases. Consider if the current
imap was delalloc (and so not trimmed) or the cow mapping was introduced
after the current imap was mapped. This logic appears to prioritize the
current mapping so long as it is valid. Doesn't that break the
cowextsize hint?
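
To make the question concrete, here is a toy model of the two lookup orders. It is a sketch only: `struct irec`, `imap_valid()`, `pick_cached_first()` and `pick_cow_first()` are hypothetical names working in block units, and the model ignores any trimming done when the cached mapping was set up, so it illustrates the concern rather than demonstrating that the case is reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct irec {
	uint64_t startoff;	/* first file block covered */
	uint64_t blockcount;	/* number of file blocks covered */
};

enum fork { DATA_FORK, COW_FORK };

/* Does the mapping cover file block 'off'? */
static bool imap_valid(const struct irec *m, uint64_t off)
{
	return off >= m->startoff && off < m->startoff + m->blockcount;
}

/* Keep using the cached data fork mapping for as long as it is valid. */
static enum fork pick_cached_first(const struct irec *cached,
				   const struct irec *cow, uint64_t off)
{
	if (imap_valid(cached, off))
		return DATA_FORK;
	return imap_valid(cow, off) ? COW_FORK : DATA_FORK;
}

/* Always consult the COW fork first, whatever is cached. */
static enum fork pick_cow_first(const struct irec *cow, uint64_t off)
{
	return imap_valid(cow, off) ? COW_FORK : DATA_FORK;
}
```

With a cached mapping covering blocks 0..7 and a COW extent covering blocks 4..5 that appeared after the mapping was cached, the two policies disagree for block 4: the cached-first order writes through the data fork mapping, the COW-first order uses the COW extent.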

Brian

> +		/*
> +		 * If we don't have a valid map, now it's time to get a new one
> +		 * for this offset.  This will convert delayed allocations
> +		 * (including COW ones) into real extents.  If we return without
> +		 * a valid map, it means we landed in a hole and we skip the
> +		 * block.
> +		 */
>  		if (!wpc->imap_valid) {
> -			error = xfs_map_blocks(inode, offset, &wpc->imap,
> -					     wpc->io_type);
> +			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
> +					     &wpc->io_type);
>  			if (error)
>  				goto out;
>  			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 offset);
> +							 file_offset);
>  		}
> -		if (wpc->imap_valid) {
> -			lock_buffer(bh);
> -			if (wpc->io_type != XFS_IO_OVERWRITE)
> -				xfs_map_at_offset(inode, bh, &wpc->imap, offset);
> -			xfs_add_to_ioend(inode, bh, offset, wpc, wbc, &submit_list);
> -			count++;
> +
> +		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +			/*
> +			 * set_page_dirty dirties all buffers in a page, independent
> +			 * of their state.  The dirty state however is entirely
> +			 * meaningless for holes (!mapped && uptodate), so check we did
> +			 * have a buffer covering a hole here and continue.
> +			 */
> +			continue;
>  		}
>  
> -	} while (offset += len, ((bh = bh->b_this_page) != head));
> +		lock_buffer(bh);
> +		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		count++;
> +	}
>  
> -	if (uptodate && bh == head)
> +	if (uptodate && poffset == PAGE_SIZE)
>  		SetPageUptodate(page);
>  
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index 69346d460dfa..b2ef5b661761 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -29,6 +29,7 @@ enum {
>  	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
>  	XFS_IO_OVERWRITE,	/* covers already allocated extent */
>  	XFS_IO_COW,		/* covers copy-on-write extent */
> +	XFS_IO_HOLE,		/* covers region without any block allocation */
>  };
>  
>  #define XFS_IO_TYPES \
> @@ -36,7 +37,8 @@ enum {
>  	{ XFS_IO_DELALLOC,		"delalloc" }, \
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }, \
> -	{ XFS_IO_COW,			"CoW" }
> +	{ XFS_IO_COW,			"CoW" }, \
> +	{ XFS_IO_HOLE,			"hole" }
>  
>  /*
>   * Structure for buffered I/O completions.
> -- 
> 2.17.0
> 

* Re: [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range
  2018-05-30 16:58     ` Darrick J. Wong
@ 2018-05-30 17:40       ` Brian Foster
  0 siblings, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-30 17:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 09:58:04AM -0700, Darrick J. Wong wrote:
> On Wed, May 30, 2018 at 09:35:52AM -0400, Brian Foster wrote:
> > On Wed, May 30, 2018 at 12:00:00PM +0200, Christoph Hellwig wrote:
> > > Both callers want the same locking, so do it only once.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_aops.c      | 2 --
> > >  fs/xfs/xfs_bmap_util.c | 7 ++++---
> > >  fs/xfs/xfs_iomap.c     | 3 ---
> > >  3 files changed, 4 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index f2333e351e07..5dd09e83c81c 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -761,10 +761,8 @@ xfs_aops_discard_page(
> > >  		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
> > >  			page, ip->i_ino, offset);
> > >  
> > > -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > >  	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> > >  			PAGE_SIZE / i_blocksize(inode));
> > > -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > >  	if (error && !XFS_FORCED_SHUTDOWN(mp))
> > >  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
> > >  out_invalidate:
> > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > > index f2b87873612d..86a7ee425bfc 100644
> > > --- a/fs/xfs/xfs_bmap_util.c
> > > +++ b/fs/xfs/xfs_bmap_util.c
> > > @@ -712,12 +712,11 @@ xfs_bmap_punch_delalloc_range(
> > >  	struct xfs_iext_cursor	icur;
> > >  	int			error = 0;
> > >  
> > > -	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > > -
> > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > >  	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > >  		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> > >  		if (error)
> > > -			return error;
> > > +			goto out_unlock;
> > >  	}
> > >  
> > >  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got))
> > 
> > There's a return 0 just below here that needs the exit label treatment.
> > Otherwise looks Ok.
> 
> Will fix that in my tree for testing.  Brian, will you RVB the fixed up
> patch?
> 

Yep. With the locking fix:

Reviewed-by: Brian Foster <bfoster@redhat.com>
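
For readers following along, the point about the early "return 0" can be sketched outside the kernel. This is only an illustration under stated assumptions: demo_lock, demo_punch_range and its two boolean parameters are hypothetical stand-ins, not XFS code. It shows that once the lock is taken inside the function, every later exit has to leave through the unlock label.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode lock: counts how often it is held. */
struct demo_lock {
	int	held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	l->held++;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held > 0);	/* releasing an unheld lock is a bug */
	l->held--;
}

/*
 * Same shape as a function that takes its own lock: every exit after
 * the acquire goes through out_unlock, including the "nothing to do"
 * case that used to be a bare return.
 */
static int
demo_punch_range(
	struct demo_lock	*lock,
	bool			read_fails,
	bool			have_extent)
{
	int			error = 0;

	demo_lock_acquire(lock);

	if (read_fails) {
		error = -5;	/* stands in for -EIO */
		goto out_unlock;
	}

	/* The early exit must drop the lock as well. */
	if (!have_extent)
		goto out_unlock;

	/* ... the actual range punching would go here ... */

out_unlock:
	demo_lock_release(lock);
	return error;
}
```

With a plain "return 0" in the no-extent branch, the held counter in this sketch would stay at 1 after the call, which is the leak the review comment is about.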

> --D
> 
> > Brian
> > 
> > > @@ -738,6 +737,8 @@ xfs_bmap_punch_delalloc_range(
> > >  		}
> > >  	}
> > >  
> > > +out_unlock:
> > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > >  	return error;
> > >  }
> > >  
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index da6d1995e460..f949f0dd7382 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -1203,11 +1203,8 @@ xfs_file_iomap_end_delalloc(
> > >  		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> > >  					 XFS_FSB_TO_B(mp, end_fsb) - 1);
> > >  
> > > -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> > >  		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> > >  					       end_fsb - start_fsb);
> > > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > -
> > >  		if (error && !XFS_FORCED_SHUTDOWN(mp)) {
> > >  			xfs_alert(mp, "%s: unable to clean up ino %lld",
> > >  				__func__, ip->i_ino);
> > > -- 
> > > 2.17.0
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] xfs: remove the imap_valid flag
  2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
@ 2018-05-30 17:44   ` Darrick J. Wong
  2018-05-31 13:47   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:07PM +0200, Christoph Hellwig wrote:
> Simplify the way we check for a valid imap - we know we have a valid
> mapping after xfs_map_blocks returned successfully, and we know we can
> call xfs_imap_valid on any imap, as it will always fail on a
> zero-initialized map.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
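
A small user-space sketch of why the check is safe on a zero-initialized map, assuming the range test has the form shown in the diff context elsewhere in this series (offset >= br_startoff && offset < br_startoff + br_blockcount). The struct and function names below are hypothetical and heavily cut down; the real helper also takes the inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct xfs_bmbt_irec. */
struct demo_irec {
	uint64_t	br_startoff;	/* first file block covered */
	uint64_t	br_blockcount;	/* number of blocks covered */
};

/*
 * Range check in the style of xfs_imap_valid.  offset is a byte offset,
 * blkbits is log2 of the block size.
 */
static bool
demo_imap_valid(
	const struct demo_irec	*imap,
	uint64_t		offset,
	unsigned int		blkbits)
{
	assert(blkbits < 64);
	offset >>= blkbits;

	return offset >= imap->br_startoff &&
		offset < imap->br_startoff + imap->br_blockcount;
}
```

For a zeroed map both fields are 0, so the second comparison becomes "offset < 0" on an unsigned value and can never be true. That is why a separate imap_valid flag is not needed to guard the first lookup.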

> ---
>  fs/xfs/xfs_aops.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7dc13b0aae60..910b410e5a90 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -42,7 +42,6 @@
>   */
>  struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec    imap;
> -	bool			imap_valid;
>  	unsigned int		io_type;
>  	struct xfs_ioend	*ioend;
>  	sector_t		last_block;
> @@ -858,10 +857,6 @@ xfs_writepage_map(
>  			continue;
>  		}
>  
> -		/* Check to see if current map spans this file offset */
> -		if (wpc->imap_valid)
> -			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 file_offset);
>  		/*
>  		 * If we don't have a valid map, now it's time to get a new one
>  		 * for this offset.  This will convert delayed allocations
> @@ -869,16 +864,14 @@ xfs_writepage_map(
>  		 * a valid map, it means we landed in a hole and we skip the
>  		 * block.
>  		 */
> -		if (!wpc->imap_valid) {
> +		if (!xfs_imap_valid(inode, &wpc->imap, file_offset)) {
>  			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
>  					     &wpc->io_type);
>  			if (error)
>  				goto out;
> -			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 file_offset);
>  		}
>  
> -		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +		if (wpc->io_type == XFS_IO_HOLE) {
>  			/*
>  			 * set_page_dirty dirties all buffers in a page, independent
>  			 * of their state.  The dirty state however is entirely
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
@ 2018-05-30 17:55   ` Darrick J. Wong
  2018-05-31  7:03     ` Christoph Hellwig
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:08PM +0200, Christoph Hellwig wrote:
> Calculate all information for the bio based on the passed in information
> without requiring a buffer_head structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_aops.c | 68 ++++++++++++++++++++++-------------------------
>  1 file changed, 32 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 910b410e5a90..7d02d04d5a5b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -44,7 +44,6 @@ struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec    imap;
>  	unsigned int		io_type;
>  	struct xfs_ioend	*ioend;
> -	sector_t		last_block;
>  };
>  
>  void
> @@ -535,11 +534,6 @@ xfs_start_page_writeback(
>  	unlock_page(page);
>  }
>  
> -static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
> -{
> -	return bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> -}
> -
>  /*
>   * Submit the bio for an ioend. We are passed an ioend with a bio attached to
>   * it, and we submit that bio. The ioend may be used for multiple bio
> @@ -594,27 +588,20 @@ xfs_submit_ioend(
>  	return 0;
>  }
>  
> -static void
> -xfs_init_bio_from_bh(
> -	struct bio		*bio,
> -	struct buffer_head	*bh)
> -{
> -	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> -	bio_set_dev(bio, bh->b_bdev);
> -}
> -
>  static struct xfs_ioend *
>  xfs_alloc_ioend(
>  	struct inode		*inode,
>  	unsigned int		type,
>  	xfs_off_t		offset,
> -	struct buffer_head	*bh)
> +	struct block_device	*bdev,
> +	sector_t		sector)
>  {
>  	struct xfs_ioend	*ioend;
>  	struct bio		*bio;
>  
>  	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, xfs_ioend_bioset);
> -	xfs_init_bio_from_bh(bio, bh);
> +	bio_set_dev(bio, bdev);
> +	bio->bi_iter.bi_sector = sector;
>  
>  	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
>  	INIT_LIST_HEAD(&ioend->io_list);
> @@ -639,13 +626,14 @@ static void
>  xfs_chain_bio(
>  	struct xfs_ioend	*ioend,
>  	struct writeback_control *wbc,
> -	struct buffer_head	*bh)
> +	struct block_device	*bdev,
> +	sector_t		sector)
>  {
>  	struct bio *new;
>  
>  	new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
> -	xfs_init_bio_from_bh(new, bh);
> -
> +	bio_set_dev(new, bdev);
> +	new->bi_iter.bi_sector = sector;
>  	bio_chain(ioend->io_bio, new);
>  	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
>  	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
> @@ -655,39 +643,45 @@ xfs_chain_bio(
>  }
>  
>  /*
> - * Test to see if we've been building up a completion structure for
> - * earlier buffers -- if so, we try to append to this ioend if we
> - * can, otherwise we finish off any current ioend and start another.
> - * Return the ioend we finished off so that the caller can submit it
> - * once it has finished processing the dirty page.
> + * Test to see if we have an existing ioend structure that we could append to
> + * first, otherwise finish off the current ioend and start another.
>   */
>  STATIC void
>  xfs_add_to_ioend(
>  	struct inode		*inode,
> -	struct buffer_head	*bh,
>  	xfs_off_t		offset,
> +	struct page		*page,
>  	struct xfs_writepage_ctx *wpc,
>  	struct writeback_control *wbc,
>  	struct list_head	*iolist)
>  {
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
> +	unsigned		len = i_blocksize(inode);
> +	unsigned		poff = offset & (PAGE_SIZE - 1);
> +	sector_t		sector;
> +
> +	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
> +		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);

" >> SECTOR_SHIFT" here?  If so, I can fix this on its way in.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +
>  	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> -	    bh->b_blocknr != wpc->last_block + 1 ||
> +	    sector != bio_end_sector(wpc->ioend->io_bio) ||
>  	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
>  		if (wpc->ioend)
>  			list_add(&wpc->ioend->io_list, iolist);
> -		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset, bh);
> +		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
> +				bdev, sector);
>  	}
>  
>  	/*
> -	 * If the buffer doesn't fit into the bio we need to allocate a new
> -	 * one.  This shouldn't happen more than once for a given buffer.
> +	 * If the block doesn't fit into the bio we need to allocate a new
> +	 * one.  This shouldn't happen more than once for a given block.
>  	 */
> -	while (xfs_bio_add_buffer(wpc->ioend->io_bio, bh) != bh->b_size)
> -		xfs_chain_bio(wpc->ioend, wbc, bh);
> +	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
> +		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
>  
> -	wpc->ioend->io_size += bh->b_size;
> -	wpc->last_block = bh->b_blocknr;
> -	xfs_start_buffer_writeback(bh);
> +	wpc->ioend->io_size += len;
>  }
>  
>  STATIC void
> @@ -883,7 +877,9 @@ xfs_writepage_map(
>  
>  		lock_buffer(bh);
>  		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> -		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
> +				&submit_list);
> +		xfs_start_buffer_writeback(bh);
>  		count++;
>  	}
>  
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
  2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
@ 2018-05-30 17:56   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:09PM +0200, Christoph Hellwig wrote:
> This keeps it in a single place so it can be made optional more easily.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7d02d04d5a5b..025f2acac100 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -495,21 +495,6 @@ xfs_imap_valid(
>  		offset < imap->br_startoff + imap->br_blockcount;
>  }
>  
> -STATIC void
> -xfs_start_buffer_writeback(
> -	struct buffer_head	*bh)
> -{
> -	ASSERT(buffer_mapped(bh));
> -	ASSERT(buffer_locked(bh));
> -	ASSERT(!buffer_delay(bh));
> -	ASSERT(!buffer_unwritten(bh));
> -
> -	bh->b_end_io = NULL;
> -	set_buffer_async_write(bh);
> -	set_buffer_uptodate(bh);
> -	clear_buffer_dirty(bh);
> -}
> -
>  STATIC void
>  xfs_start_page_writeback(
>  	struct page		*page,
> @@ -718,6 +703,7 @@ xfs_map_at_offset(
>  	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
>  	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
>  
> +	lock_buffer(bh);
>  	xfs_map_buffer(inode, bh, imap, offset);
>  	set_buffer_mapped(bh);
>  	clear_buffer_delay(bh);
> @@ -730,6 +716,10 @@ xfs_map_at_offset(
>  	 * set the bdev now.
>  	 */
>  	bh->b_bdev = xfs_find_bdev_for_inode(inode);
> +	bh->b_end_io = NULL;
> +	set_buffer_async_write(bh);
> +	set_buffer_uptodate(bh);
> +	clear_buffer_dirty(bh);
>  }
>  
>  STATIC void
> @@ -875,11 +865,9 @@ xfs_writepage_map(
>  			continue;
>  		}
>  
> -		lock_buffer(bh);
>  		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
>  		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
>  				&submit_list);
> -		xfs_start_buffer_writeback(bh);
>  		count++;
>  	}
>  
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/18] xfs: remove xfs_start_page_writeback
  2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
@ 2018-05-30 17:57   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 17:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:10PM +0200, Christoph Hellwig wrote:
> This helper only has two callers, one of them with a constant error
> argument.  Remove it to make pending changes to the code a little easier.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 47 +++++++++++++++++++++--------------------------
>  1 file changed, 21 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 025f2acac100..38021023131e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -495,30 +495,6 @@ xfs_imap_valid(
>  		offset < imap->br_startoff + imap->br_blockcount;
>  }
>  
> -STATIC void
> -xfs_start_page_writeback(
> -	struct page		*page,
> -	int			clear_dirty)
> -{
> -	ASSERT(PageLocked(page));
> -	ASSERT(!PageWriteback(page));
> -
> -	/*
> -	 * if the page was not fully cleaned, we need to ensure that the higher
> -	 * layers come back to it correctly. That means we need to keep the page
> -	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
> -	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
> -	 * write this page in this writeback sweep will be made.
> -	 */
> -	if (clear_dirty) {
> -		clear_page_dirty_for_io(page);
> -		set_page_writeback(page);
> -	} else
> -		set_page_writeback_keepwrite(page);
> -
> -	unlock_page(page);
> -}
> -
>  /*
>   * Submit the bio for an ioend. We are passed an ioend with a bio attached to
>   * it, and we submit that bio. The ioend may be used for multiple bio
> @@ -877,6 +853,9 @@ xfs_writepage_map(
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
>  out:
> +	ASSERT(PageLocked(page));
> +	ASSERT(!PageWriteback(page));
> +
>  	/*
>  	 * On error, we have to fail the ioend here because we have locked
>  	 * buffers in the ioend. If we don't do this, we'll deadlock
> @@ -895,7 +874,21 @@ xfs_writepage_map(
>  	 * treated correctly on error.
>  	 */
>  	if (count) {
> -		xfs_start_page_writeback(page, !error);
> +		/*
> +		 * If the page was not fully cleaned, we need to ensure that the
> +		 * higher layers come back to it correctly.  That means we need
> +		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
> +		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
> +		 * so another attempt to write this page in this writeback sweep
> +		 * will be made.
> +		 */
> +		if (error) {
> +			set_page_writeback_keepwrite(page);
> +		} else {
> +			clear_page_dirty_for_io(page);
> +			set_page_writeback(page);
> +		}
> +		unlock_page(page);
>  
>  		/*
>  		 * Preserve the original error if there was one, otherwise catch
> @@ -920,7 +913,9 @@ xfs_writepage_map(
>  		 * race with a partial page truncate on a sub-page block sized
>  		 * filesystem. In that case we need to mark the page clean.
>  		 */
> -		xfs_start_page_writeback(page, 1);
> +		clear_page_dirty_for_io(page);
> +		set_page_writeback(page);
> +		unlock_page(page);
>  		end_page_writeback(page);
>  	}
>  
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map
  2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
@ 2018-05-30 18:00   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 18:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:11PM +0200, Christoph Hellwig wrote:
> Rejuggle how we deal with the error vs non-error and ioend vs no-ioend
> cases to keep the fast path streamlined and the duplicate code to a
> minimum.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
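
The resulting control flow at the tail can be summarized as a decision over (error, count). The sketch below is an assumption-laden model, not kernel code: the ACT_* flags and demo_writepage_tail are hypothetical names that only record which of the calls from the patch would run for a given combination.

```c
#include <assert.h>

/* Hypothetical action flags, named after the calls in the patch. */
enum {
	ACT_DISCARD		= 1 << 0, /* xfs_aops_discard_page + ClearPageUptodate */
	ACT_KEEPWRITE		= 1 << 1, /* set_page_writeback_keepwrite */
	ACT_WRITEBACK		= 1 << 2, /* clear_page_dirty_for_io + set_page_writeback */
	ACT_SUBMIT		= 1 << 3, /* walk submit_list, xfs_submit_ioend */
	ACT_END_WRITEBACK	= 1 << 4, /* end_page_writeback */
};

/* Returns the set of actions the refactored tail would take. */
static unsigned int
demo_writepage_tail(int error, int count)
{
	unsigned int	actions = 0;

	assert(count >= 0);

	if (error) {
		/* Nothing was added to an ioend: discard and skip to done. */
		if (!count)
			return ACT_DISCARD;
		actions |= ACT_KEEPWRITE;
	} else {
		actions |= ACT_WRITEBACK;
	}

	/* Common path: the page is unlocked and queued ioends are submitted. */
	actions |= ACT_SUBMIT;

	/* No error and nothing to write: end writeback right away. */
	if (!count)
		actions |= ACT_END_WRITEBACK;
	return actions;
}
```

Read this way, the error branch is the only place that diverges, and the non-error case falls straight through to submission, which appears to be the fast path the commit message refers to.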

> ---
>  fs/xfs/xfs_aops.c | 65 +++++++++++++++++++++++------------------------
>  1 file changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 38021023131e..ac417ef326a9 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -873,7 +873,14 @@ xfs_writepage_map(
>  	 * submission of outstanding ioends on the writepage context so they are
>  	 * treated correctly on error.
>  	 */
> -	if (count) {
> +	if (unlikely(error)) {
> +		if (!count) {
> +			xfs_aops_discard_page(page);
> +			ClearPageUptodate(page);
> +			unlock_page(page);
> +			goto done;
> +		}
> +
>  		/*
>  		 * If the page was not fully cleaned, we need to ensure that the
>  		 * higher layers come back to it correctly.  That means we need
> @@ -882,43 +889,35 @@ xfs_writepage_map(
>  		 * so another attempt to write this page in this writeback sweep
>  		 * will be made.
>  		 */
> -		if (error) {
> -			set_page_writeback_keepwrite(page);
> -		} else {
> -			clear_page_dirty_for_io(page);
> -			set_page_writeback(page);
> -		}
> -		unlock_page(page);
> -
> -		/*
> -		 * Preserve the original error if there was one, otherwise catch
> -		 * submission errors here and propagate into subsequent ioend
> -		 * submissions.
> -		 */
> -		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
> -			int error2;
> -
> -			list_del_init(&ioend->io_list);
> -			error2 = xfs_submit_ioend(wbc, ioend, error);
> -			if (error2 && !error)
> -				error = error2;
> -		}
> -	} else if (error) {
> -		xfs_aops_discard_page(page);
> -		ClearPageUptodate(page);
> -		unlock_page(page);
> +		set_page_writeback_keepwrite(page);
>  	} else {
> -		/*
> -		 * We can end up here with no error and nothing to write if we
> -		 * race with a partial page truncate on a sub-page block sized
> -		 * filesystem. In that case we need to mark the page clean.
> -		 */
>  		clear_page_dirty_for_io(page);
>  		set_page_writeback(page);
> -		unlock_page(page);
> -		end_page_writeback(page);
>  	}
>  
> +	unlock_page(page);
> +
> +	/*
> +	 * Preserve the original error if there was one, otherwise catch
> +	 * submission errors here and propagate into subsequent ioend
> +	 * submissions.
> +	 */
> +	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
> +		int error2;
> +
> +		list_del_init(&ioend->io_list);
> +		error2 = xfs_submit_ioend(wbc, ioend, error);
> +		if (error2 && !error)
> +			error = error2;
> +	}
> +
> +	/*
> +	 * We can end up here with no error and nothing to write if we race with
> +	 * a partial page truncate on a sub-page block sized filesystem.
> +	 */
> +	if (!count)
> +		end_page_writeback(page);
> +done:
>  	mapping_set_error(page->mapping, error);
>  	return error;
>  }
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map
  2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
@ 2018-05-30 18:08   ` Darrick J. Wong
  2018-05-31  7:04     ` Christoph Hellwig
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 1 reply; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:12PM +0200, Christoph Hellwig wrote:
> We already track the page uptodate status based on the buffer uptodate
> status, which is updated whenever reading or zeroing blocks.
> 
> This code has been there since a ptool commit in 2002, which
> claims to:
> 
>     "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
>     major changes to actually fit.
> 
> and isn't present in other writepage implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok, assuming that reads or buffered writes set the page
uptodate...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ac417ef326a9..84f88cecd2f1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -786,7 +786,6 @@ xfs_writepage_map(
>  	ssize_t			len = i_blocksize(inode);
>  	int			error = 0;
>  	int			count = 0;
> -	bool			uptodate = true;
>  	loff_t			file_offset;	/* file offset of page */
>  	unsigned		poffset;	/* offset into page */
>  
> @@ -813,7 +812,6 @@ xfs_writepage_map(
>  		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			uptodate = false;
>  			continue;
>  		}
>  
> @@ -847,9 +845,6 @@ xfs_writepage_map(
>  		count++;
>  	}
>  
> -	if (uptodate && poffset == PAGE_SIZE)
> -		SetPageUptodate(page);
> -
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
>  out:
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 18/18] xfs: allow writeback on pages without buffer heads
  2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
@ 2018-05-30 18:08   ` Darrick J. Wong
  2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Darrick J. Wong @ 2018-05-30 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:13PM +0200, Christoph Hellwig wrote:
> Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
> equal to the page size, and deal with pages without buffer heads in
> writeback.  Thanks to the previous refactoring this is basically trivial
> now.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c  | 47 +++++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_iomap.c |  3 ++-
>  2 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 84f88cecd2f1..6640377b6eae 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -91,6 +91,19 @@ xfs_find_daxdev_for_inode(
>  		return mp->m_ddev_targp->bt_daxdev;
>  }
>  
> +static void
> +xfs_finish_page_writeback(
> +	struct inode		*inode,
> +	struct bio_vec		*bvec,
> +	int			error)
> +{
> +	if (error) {
> +		SetPageError(bvec->bv_page);
> +		mapping_set_error(inode->i_mapping, -EIO);
> +	}
> +	end_page_writeback(bvec->bv_page);
> +}
> +
>  /*
>   * We're now finished for good with this page.  Update the page state via the
>   * associated buffer_heads, paying attention to the start and end offsets that
> @@ -103,7 +116,7 @@ xfs_find_daxdev_for_inode(
>   * and buffers potentially freed after every call to end_buffer_async_write.
>   */
>  static void
> -xfs_finish_page_writeback(
> +xfs_finish_buffer_writeback(
>  	struct inode		*inode,
>  	struct bio_vec		*bvec,
>  	int			error)
> @@ -178,9 +191,12 @@ xfs_destroy_ioend(
>  			next = bio->bi_private;
>  
>  		/* walk each page on bio, ending page IO on them */
> -		bio_for_each_segment_all(bvec, bio, i)
> -			xfs_finish_page_writeback(inode, bvec, error);
> -
> +		bio_for_each_segment_all(bvec, bio, i) {
> +			if (page_has_buffers(bvec->bv_page))
> +				xfs_finish_buffer_writeback(inode, bvec, error);
> +			else
> +				xfs_finish_page_writeback(inode, bvec, error);
> +		}
>  		bio_put(bio);
>  	}
>  
> @@ -782,13 +798,16 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh;
> +	struct buffer_head	*bh = NULL;
>  	ssize_t			len = i_blocksize(inode);
>  	int			error = 0;
>  	int			count = 0;
>  	loff_t			file_offset;	/* file offset of page */
>  	unsigned		poffset;	/* offset into page */
>  
> +	if (page_has_buffers(page))
> +		bh = page_buffers(page);
> +
>  	/*
>  	 * Walk the blocks on the page, and we we run off then end of the
>  	 * current map or find the current map invalid, grab a new one.
> @@ -797,11 +816,9 @@ xfs_writepage_map(
>  	 * replace the bufferhead with some other state tracking mechanism in
>  	 * future.
>  	 */
> -	file_offset = page_offset(page);
> -	bh = page_buffers(page);
> -	for (poffset = 0;
> +	for (poffset = 0, file_offset = page_offset(page);
>  	     poffset < PAGE_SIZE;
> -	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +	     poffset += len, file_offset += len) {
>  		/* past the range we are writing, so nothing more to write. */
>  		if (file_offset >= end_offset)
>  			break;
> @@ -809,9 +826,10 @@ xfs_writepage_map(
>  		/*
>  		 * Block does not contain valid data, skip it.
>  		 */
> -		if (!buffer_uptodate(bh)) {
> +		if (bh && !buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> +			bh = bh->b_this_page;
>  			continue;
>  		}
>  
> @@ -836,10 +854,15 @@ xfs_writepage_map(
>  			 * meaningless for holes (!mapped && uptodate), so check we did
>  			 * have a buffer covering a hole here and continue.
>  			 */
> +			if (bh)
> +				bh = bh->b_this_page;
>  			continue;
>  		}
>  
> -		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		if (bh) {
> +			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +			bh = bh->b_this_page;
> +		}
>  		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
>  				&submit_list);
>  		count++;
> @@ -939,8 +962,6 @@ xfs_do_writepage(
>  
>  	trace_xfs_writepage(inode, page, 0, 0);
>  
> -	ASSERT(page_has_buffers(page));
> -
>  	/*
>  	 * Refuse to write the page out if we are called from reclaim context.
>  	 *
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f949f0dd7382..93c40da3378a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1031,7 +1031,8 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	iomap->flags |= IOMAP_F_BUFFER_HEAD;
> +	if (i_blocksize(inode) < PAGE_SIZE)
> +		iomap->flags |= IOMAP_F_BUFFER_HEAD;
>  
>  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-05-30 17:55   ` Darrick J. Wong
@ 2018-05-31  7:03     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-31  7:03 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 10:55:29AM -0700, Darrick J. Wong wrote:
> > +	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
> > +		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
> 
> " >> SECTOR_SHIFT" here?  If so, I can fix this on its way in.

The >> 9 that until very recently was used everywhere makes it nicely
fit on two lines.  But the fixup is ok with me, too.
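
For reference, the arithmetic under discussion can be sketched in user space. This is a minimal illustration, assuming the extent's starting device sector and starting file offset in bytes have already been computed (the patch derives them with xfs_fsb_to_db and XFS_FSB_TO_B); demo_block_sector and DEMO_SECTOR_SHIFT are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 512-byte sectors; same value as the kernel's SECTOR_SHIFT. */
#define DEMO_SECTOR_SHIFT	9

/*
 * extent_start_sector: device sector of the first block of the mapping
 * extent_start_byte:   file offset in bytes where the mapping starts
 * offset:              file offset in bytes of the block being written
 */
static uint64_t
demo_block_sector(
	uint64_t	extent_start_sector,
	uint64_t	extent_start_byte,
	uint64_t	offset)
{
	assert(offset >= extent_start_byte);

	return extent_start_sector +
		((offset - extent_start_byte) >> DEMO_SECTOR_SHIFT);
}
```

For example, a block 4096 bytes into an extent that starts at sector 1000 lands at sector 1008. Spelling the shift as a named constant does not change the result, only the readability of the expression.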

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map
  2018-05-30 18:08   ` Darrick J. Wong
@ 2018-05-31  7:04     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-31  7:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 11:08:39AM -0700, Darrick J. Wong wrote:
> > and isn't present in other writepage implementations.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks ok, assuming that reads or buffered writes set the page
> uptodate...

Reads have to by definition, as do buffered writes that bring in
data / overwrite data.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
  2018-05-30 17:19   ` Darrick J. Wong
@ 2018-05-31 13:46   ` Brian Foster
  2018-05-31 16:11     ` Christoph Hellwig
  1 sibling, 1 reply; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:02PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

The change looks Ok... It's clearly reasonable to remove a flag that is
no longer used, but why is it no longer used? The previous patch drops
it to "make xfs_writepage_map() extent map centric," but the description
doesn't exactly explain why (and it's not immediately clear to me
amongst all the other code changes).

My understanding of the purpose of IGSTATE here is that if we already
have an iotype == unwritten ioend, it makes sense to combine contiguous
mappings since we'd convert the unwritten portions on I/O completion.
Now that we look up extent first and establish the ioend type from that
rather than set the ioend type based on the buffer state, I suppose it
is possible for IGSTATE to lose the fact that a contiguous unwritten
extent ends up being merged with a normal extent before the ioend type
is established..? Then again, was IGSTATE even effective in this context
with nimaps == 1?
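
To make the question concrete, here is a simplified userspace model of
the merge test in the quoted xfs_bmapi_update_map hunk.  The struct and
function names are hypothetical, and the hole/delalloc checks from the
real condition are left out:

```c
#include <stdbool.h>
#include <stdint.h>

enum ext_state { EXT_NORM, EXT_UNWRITTEN };

/* Cut-down stand-in for struct xfs_bmbt_irec. */
struct irec {
	uint64_t	startoff;	/* file offset, in blocks */
	uint64_t	startblock;	/* disk block */
	uint64_t	blockcount;
	enum ext_state	state;
};

/*
 * Can cur be folded into prev?  ignore_state models the removed
 * XFS_BMAPI_IGSTATE flag: when set, a written and an unwritten extent
 * that are contiguous in file and on disk are treated as one mapping.
 */
static inline bool can_merge(const struct irec *prev,
		const struct irec *cur, bool ignore_state)
{
	return cur->startoff == prev->startoff + prev->blockcount &&
	       cur->startblock == prev->startblock + prev->blockcount &&
	       (ignore_state || prev->state == cur->state);
}
```

With the flag gone, only the state-equal branch remains, so a normal
extent followed by a contiguous unwritten one is never reported as a
single mapping.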

This change may very well be fine in the end, but it's made
unnecessarily difficult to review by the nature of the previous patch.
ISTM that if this is a dependency of the broader change, it should be
split off into a separate patch that drops the usage and the flag
together and explains why.

Brian

>  fs/xfs/libxfs/xfs_bmap.c | 6 ++----
>  fs/xfs/libxfs/xfs_bmap.h | 3 ---
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7b0e2b551e23..4b5e014417d2 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3799,8 +3799,7 @@ xfs_bmapi_update_map(
>  		   mval[-1].br_startblock != HOLESTARTBLOCK &&
>  		   mval->br_startblock == mval[-1].br_startblock +
>  					  mval[-1].br_blockcount &&
> -		   ((flags & XFS_BMAPI_IGSTATE) ||
> -			mval[-1].br_state == mval->br_state)) {
> +		   mval[-1].br_state == mval->br_state) {
>  		ASSERT(mval->br_startoff ==
>  		       mval[-1].br_startoff + mval[-1].br_blockcount);
>  		mval[-1].br_blockcount += mval->br_blockcount;
> @@ -3845,7 +3844,7 @@ xfs_bmapi_read(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
> -			   XFS_BMAPI_IGSTATE|XFS_BMAPI_COWFORK)));
> +			   XFS_BMAPI_COWFORK)));
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED|XFS_ILOCK_EXCL));
>  
>  	if (unlikely(XFS_TEST_ERROR(
> @@ -4290,7 +4289,6 @@ xfs_bmapi_write(
>  
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
> -	ASSERT(!(flags & XFS_BMAPI_IGSTATE));
>  	ASSERT(tp != NULL ||
>  	       (flags & (XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK)) ==
>  			(XFS_BMAPI_CONVERT | XFS_BMAPI_COWFORK));
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 2c233f9f1a26..a845fe57d1b5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -80,8 +80,6 @@ struct xfs_extent_free_item
>  #define XFS_BMAPI_METADATA	0x002	/* mapping metadata not user data */
>  #define XFS_BMAPI_ATTRFORK	0x004	/* use attribute fork not data */
>  #define XFS_BMAPI_PREALLOC	0x008	/* preallocation op: unwritten space */
> -#define XFS_BMAPI_IGSTATE	0x010	/* Ignore state - */
> -					/* combine contig. space */
>  #define XFS_BMAPI_CONTIG	0x020	/* must allocate only one extent */
>  /*
>   * unwritten extent conversion - this needs write cache flushing and no additional
> @@ -128,7 +126,6 @@ struct xfs_extent_free_item
>  	{ XFS_BMAPI_METADATA,	"METADATA" }, \
>  	{ XFS_BMAPI_ATTRFORK,	"ATTRFORK" }, \
>  	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
> -	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
>  	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
>  	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
>  	{ XFS_BMAPI_ZERO,	"ZERO" }, \
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping
  2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
  2018-05-30 17:23   ` Darrick J. Wong
@ 2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:03PM +0200, Christoph Hellwig wrote:
> We only have one caller left, and open coding the simple extent list
> lookup in it allows us to make the code both more understandable and
> reuse calculations and variables already present.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c    | 17 ++++++++++++-----
>  fs/xfs/xfs_reflink.c | 30 ------------------------------
>  fs/xfs/xfs_reflink.h |  2 --
>  fs/xfs/xfs_trace.h   |  1 -
>  4 files changed, 12 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8cc41a786b5e..e2671b223409 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -385,6 +385,7 @@ xfs_map_blocks(
>  	ssize_t			count = i_blocksize(inode);
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	int			whichfork = XFS_DATA_FORK;
> +	struct xfs_iext_cursor	icur;
>  	int			error = 0;
>  	int			nimaps = 1;
>  
> @@ -396,8 +397,18 @@ xfs_map_blocks(
>  	       (ip->i_df.if_flags & XFS_IFEXTENTS));
>  	ASSERT(offset <= mp->m_super->s_maxbytes);
>  
> +	if (offset > mp->m_super->s_maxbytes - count)
> +		count = mp->m_super->s_maxbytes - offset;
> +	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
> +	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +
> +	/*
> +	 * Check if this offset is covered by a COW extent, and if yes use
> +	 * it directly instead of looking up anything in the data fork.
> +	 */
>  	if (xfs_is_reflink_inode(ip) &&
> -	    xfs_reflink_find_cow_mapping(ip, offset, imap)) {
> +	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
> +	    imap->br_startoff <= offset_fsb) {
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  		/*
>  		 * Truncate can race with writeback since writeback doesn't
> @@ -417,10 +428,6 @@ xfs_map_blocks(
>  		goto allocate_blocks;
>  	}
>  
> -	if (offset > mp->m_super->s_maxbytes - count)
> -		count = mp->m_super->s_maxbytes - offset;
> -	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				imap, &nimaps, XFS_BMAPI_ENTIRE);
>  	if (!nimaps) {
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 713e857d9ffa..8e5eb8e70c89 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -484,36 +484,6 @@ xfs_reflink_allocate_cow(
>  	return error;
>  }
>  
> -/*
> - * Find the CoW reservation for a given byte offset of a file.
> - */
> -bool
> -xfs_reflink_find_cow_mapping(
> -	struct xfs_inode		*ip,
> -	xfs_off_t			offset,
> -	struct xfs_bmbt_irec		*imap)
> -{
> -	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	xfs_fileoff_t			offset_fsb;
> -	struct xfs_bmbt_irec		got;
> -	struct xfs_iext_cursor		icur;
> -
> -	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
> -
> -	if (!xfs_is_reflink_inode(ip))
> -		return false;
> -	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> -	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
> -		return false;
> -	if (got.br_startoff > offset_fsb)
> -		return false;
> -
> -	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
> -			&got);
> -	*imap = got;
> -	return true;
> -}
> -
>  /*
>   * Trim an extent to end at the next CoW reservation past offset_fsb.
>   */
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 701487bab468..15a456492667 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
> -extern bool xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
> -		struct xfs_bmbt_irec *imap);
>  extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
>  		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 9d4c4ca24fe6..ed8f774944ba 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3227,7 +3227,6 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
>  DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
> -DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
>  DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
>  
>  DEFINE_SIMPLE_IO_EVENT(xfs_reflink_cancel_cow_range);
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow
  2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
  2018-05-30 17:27   ` Darrick J. Wong
@ 2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:04PM +0200, Christoph Hellwig wrote:
> In the only caller we just did a lookup in the COW extent tree for
> the same offset.  Reuse that result and save a lookup, as well as
> shortening the ilock hold time.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
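
The open-coded trim in the xfs_map_blocks hunk below can be sketched in
userspace as follows.  This is only an illustration: the function name
is hypothetical and NULLFILEOFF is redefined locally as an all-ones
sentinel.

```c
#include <stdint.h>

/* Local stand-in for the XFS "no file offset" sentinel. */
#define NULLFILEOFF ((uint64_t)-1)

/*
 * Return the block count of a data fork mapping after trimming it so
 * that it ends where the next COW extent begins.  cow_fsb is the start
 * of the COW extent found by the earlier lookup, or NULLFILEOFF if the
 * lookup found nothing.  The caller is assumed to have already handled
 * the case where cow_fsb is at or before the offset being mapped, so
 * cow_fsb > startoff holds whenever the trim applies.
 */
static inline uint64_t trim_to_next_cow(uint64_t startoff,
		uint64_t blockcount, uint64_t cow_fsb)
{
	if (cow_fsb != NULLFILEOFF && cow_fsb < startoff + blockcount)
		return cow_fsb - startoff;
	return blockcount;
}
```

The point of the patch is that cow_fsb is remembered from the first
lookup, so no second walk of the COW fork is needed.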

>  fs/xfs/xfs_aops.c    | 25 ++++++++++++++++---------
>  fs/xfs/xfs_reflink.c | 33 ---------------------------------
>  fs/xfs/xfs_reflink.h |  2 --
>  3 files changed, 16 insertions(+), 44 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e2671b223409..587493e9c8a1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -383,7 +383,7 @@ xfs_map_blocks(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	ssize_t			count = i_blocksize(inode);
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> +	xfs_fileoff_t		offset_fsb, end_fsb, cow_fsb = NULLFILEOFF;
>  	int			whichfork = XFS_DATA_FORK;
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
> @@ -407,8 +407,9 @@ xfs_map_blocks(
>  	 * it directly instead of looking up anything in the data fork.
>  	 */
>  	if (xfs_is_reflink_inode(ip) &&
> -	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap) &&
> -	    imap->br_startoff <= offset_fsb) {
> +	    xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, imap))
> +		cow_fsb = imap->br_startoff;
> +	if (cow_fsb != NULLFILEOFF && cow_fsb <= offset_fsb) {
>  		xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  		/*
>  		 * Truncate can race with writeback since writeback doesn't
> @@ -430,6 +431,10 @@ xfs_map_blocks(
>  
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +	if (error)
> +		return error;
> +
>  	if (!nimaps) {
>  		/*
>  		 * Lookup returns no match? Beyond eof? regardless,
> @@ -454,21 +459,23 @@ xfs_map_blocks(
>  		 * is a pending CoW reservation before the end of this extent,
>  		 * so that we pick up the COW extents in the next iteration.
>  		 */
> -		xfs_reflink_trim_irec_to_next_cow(ip, offset_fsb, imap);
> +		if (cow_fsb != NULLFILEOFF &&
> +		    cow_fsb < imap->br_startoff + imap->br_blockcount) {
> +			imap->br_blockcount = cow_fsb - imap->br_startoff;
> +			trace_xfs_reflink_trim_irec(ip, imap);
> +		}
> +
>  		if (imap->br_state == XFS_EXT_UNWRITTEN)
>  			*type = XFS_IO_UNWRITTEN;
>  		else
>  			*type = XFS_IO_OVERWRITE;
>  	}
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  	trace_xfs_map_blocks_found(ip, offset, count, *type, imap);
> -	return error;
> +	return 0;
>  
>  allocate_blocks:
> -	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -	if (!error)
> -		error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
> +	error = xfs_iomap_write_allocate(ip, whichfork, offset, imap);
>  	if (!error)
>  		trace_xfs_map_blocks_alloc(ip, offset, count, *type, imap);
>  	return error;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 8e5eb8e70c89..ff76bc56ff3d 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -484,39 +484,6 @@ xfs_reflink_allocate_cow(
>  	return error;
>  }
>  
> -/*
> - * Trim an extent to end at the next CoW reservation past offset_fsb.
> - */
> -void
> -xfs_reflink_trim_irec_to_next_cow(
> -	struct xfs_inode		*ip,
> -	xfs_fileoff_t			offset_fsb,
> -	struct xfs_bmbt_irec		*imap)
> -{
> -	struct xfs_ifork		*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> -	struct xfs_bmbt_irec		got;
> -	struct xfs_iext_cursor		icur;
> -
> -	if (!xfs_is_reflink_inode(ip))
> -		return;
> -
> -	/* Find the extent in the CoW fork. */
> -	if (!xfs_iext_lookup_extent(ip, ifp, offset_fsb, &icur, &got))
> -		return;
> -
> -	/* This is the extent before; try sliding up one. */
> -	if (got.br_startoff < offset_fsb) {
> -		if (!xfs_iext_next_extent(ifp, &icur, &got))
> -			return;
> -	}
> -
> -	if (got.br_startoff >= imap->br_startoff + imap->br_blockcount)
> -		return;
> -
> -	imap->br_blockcount = got.br_startoff - imap->br_startoff;
> -	trace_xfs_reflink_trim_irec(ip, imap);
> -}
> -
>  /*
>   * Cancel CoW reservations for some block range of an inode.
>   *
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 15a456492667..e8d4d50c629f 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -32,8 +32,6 @@ extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
> -extern void xfs_reflink_trim_irec_to_next_cow(struct xfs_inode *ip,
> -		xfs_fileoff_t offset_fsb, struct xfs_bmbt_irec *imap);
>  
>  extern int xfs_reflink_cancel_cow_blocks(struct xfs_inode *ip,
>  		struct xfs_trans **tpp, xfs_fileoff_t offset_fsb,
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly
  2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
  2018-05-30 17:32   ` Darrick J. Wong
@ 2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:05PM +0200, Christoph Hellwig wrote:
> xfs_bmapi_read adds zero value in xfs_map_blocks.  Replace it with a
> direct call to the low-level extent lookup function.
> 
> Note that we now always pass a 0 length to the trace points as we ask
> for an unspecified len.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
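
A small userspace model of the new hole handling, for illustration.
All names are hypothetical, and the model assumes the extent lookup
returns the extent containing the offset or, failing that, the next
extent after it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct irec {
	uint64_t	startoff;
	uint64_t	blockcount;
	bool		hole;
};

/* Model lookup over a sorted array: first extent that ends past fsb. */
static inline bool lookup_extent(const struct irec *ext, size_t n,
		uint64_t fsb, struct irec *got)
{
	for (size_t i = 0; i < n; i++) {
		if (ext[i].startoff + ext[i].blockcount > fsb) {
			*got = ext[i];
			return true;
		}
	}
	return false;
}

static inline struct irec map_block(const struct irec *ext, size_t n,
		uint64_t offset_fsb, uint64_t end_fsb)
{
	struct irec imap = { 0, 0, false };

	assert(offset_fsb < end_fsb);
	if (!lookup_extent(ext, n, offset_fsb, &imap))
		imap.startoff = end_fsb;	/* fake a hole past EOF */

	if (imap.startoff > offset_fsb) {
		/* landed in a hole or beyond EOF */
		imap.blockcount = imap.startoff - offset_fsb;
		imap.startoff = offset_fsb;
		imap.hole = true;
	}
	return imap;
}
```

Both the "no extent found" and the "next extent starts later" cases
fall into the same branch, which is what lets the patch drop the
separate nimaps check.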

>  fs/xfs/xfs_aops.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 587493e9c8a1..cef2bc3cf98b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -387,7 +387,6 @@ xfs_map_blocks(
>  	int			whichfork = XFS_DATA_FORK;
>  	struct xfs_iext_cursor	icur;
>  	int			error = 0;
> -	int			nimaps = 1;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
> @@ -429,24 +428,16 @@ xfs_map_blocks(
>  		goto allocate_blocks;
>  	}
>  
> -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	if (!xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, imap))
> +		imap->br_startoff = end_fsb;	/* fake a hole past EOF */
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -	if (error)
> -		return error;
>  
> -	if (!nimaps) {
> -		/*
> -		 * Lookup returns no match? Beyond eof? regardless,
> -		 * return it as a hole so we don't write it
> -		 */
> +	if (imap->br_startoff > offset_fsb) {
> +		/* landed in a hole or beyond EOF */
> +		imap->br_blockcount = imap->br_startoff - offset_fsb;
>  		imap->br_startoff = offset_fsb;
> -		imap->br_blockcount = end_fsb - offset_fsb;
>  		imap->br_startblock = HOLESTARTBLOCK;
>  		*type = XFS_IO_HOLE;
> -	} else if (imap->br_startblock == HOLESTARTBLOCK) {
> -		/* landed in a hole */
> -		*type = XFS_IO_HOLE;
>  	} else {
>  		if (isnullstartblock(imap->br_startblock)) {
>  			/* got a delalloc extent */
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers
  2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
  2018-05-30 17:34   ` Darrick J. Wong
@ 2018-05-31 13:46   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:06PM +0200, Christoph Hellwig wrote:
> Finding a buffer that isn't uptodate doesn't invalidate the mapping for
> any given block.  The last_sector check will already take care of starting
> another ioend as soon as we find any non-uptodate buffer, and if the current
> mapping doesn't include the next uptodate buffer the xfs_imap_valid check
> will take care of it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index cef2bc3cf98b..7dc13b0aae60 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -849,15 +849,12 @@ xfs_writepage_map(
>  			break;
>  
>  		/*
> -		 * Block does not contain valid data, skip it, mark the current
> -		 * map as invalid because we have a discontiguity. This ensures
> -		 * we put subsequent writeable buffers into a new ioend.
> +		 * Block does not contain valid data, skip it.
>  		 */
>  		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
>  			uptodate = false;
> -			wpc->imap_valid = false;
>  			continue;
>  		}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 12/18] xfs: remove the imap_valid flag
  2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
  2018-05-30 17:44   ` Darrick J. Wong
@ 2018-05-31 13:47   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:07PM +0200, Christoph Hellwig wrote:
> Simplify the way we check for a valid imap - we know we have a valid
> mapping after xfs_map_blocks returned successfully, and we know we can
> call xfs_imap_valid on any imap, as it will always fail on a
> zero-initialized map.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
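
The claim about zero-initialized maps can be checked with a userspace
sketch of the range test.  The struct and function names are
hypothetical; the comparison mirrors the tail of xfs_imap_valid visible
in the quoted context, and the byte-to-block shift is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for struct xfs_bmbt_irec. */
struct irec {
	uint64_t	startoff;	/* in filesystem blocks */
	uint64_t	blockcount;
};

/*
 * Does the mapping cover the block containing byte offset?
 * blkbits is log2 of the block size (12 for 4k blocks).
 */
static inline bool imap_valid(const struct irec *imap,
		unsigned int blkbits, uint64_t offset)
{
	uint64_t fsb = offset >> blkbits;

	return fsb >= imap->startoff &&
	       fsb < imap->startoff + imap->blockcount;
}
```

For a zeroed map the upper bound is startoff + blockcount == 0, so the
second comparison can never be true and the caller always falls through
to xfs_map_blocks on the first block.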

>  fs/xfs/xfs_aops.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7dc13b0aae60..910b410e5a90 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -42,7 +42,6 @@
>   */
>  struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec    imap;
> -	bool			imap_valid;
>  	unsigned int		io_type;
>  	struct xfs_ioend	*ioend;
>  	sector_t		last_block;
> @@ -858,10 +857,6 @@ xfs_writepage_map(
>  			continue;
>  		}
>  
> -		/* Check to see if current map spans this file offset */
> -		if (wpc->imap_valid)
> -			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 file_offset);
>  		/*
>  		 * If we don't have a valid map, now it's time to get a new one
>  		 * for this offset.  This will convert delayed allocations
> @@ -869,16 +864,14 @@ xfs_writepage_map(
>  		 * a valid map, it means we landed in a hole and we skip the
>  		 * block.
>  		 */
> -		if (!wpc->imap_valid) {
> +		if (!xfs_imap_valid(inode, &wpc->imap, file_offset)) {
>  			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
>  					     &wpc->io_type);
>  			if (error)
>  				goto out;
> -			wpc->imap_valid = xfs_imap_valid(inode, &wpc->imap,
> -							 file_offset);
>  		}
>  
> -		if (!wpc->imap_valid || wpc->io_type == XFS_IO_HOLE) {
> +		if (wpc->io_type == XFS_IO_HOLE) {
>  			/*
>  			 * set_page_dirty dirties all buffers in a page, independent
>  			 * of their state.  The dirty state however is entirely
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend
  2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
  2018-05-30 17:55   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:08PM +0200, Christoph Hellwig wrote:
> Calculate all information for the bio based on the passed in information
> without requiring a buffer_head structure.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
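
As an illustration of the new append test in xfs_add_to_ioend, here is
a userspace model.  The struct and function names are hypothetical, and
end_sector stands in for bio_end_sector(ioend->io_bio):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ioend_model {
	unsigned int	io_type;
	uint64_t	io_offset;	/* file offset of the first byte */
	uint64_t	io_size;	/* bytes queued so far */
	uint64_t	end_sector;	/* first sector past the queued I/O */
};

/*
 * A block can be appended only if there is an ioend, the I/O type
 * matches, and the block is contiguous both on disk (sector) and in
 * the file (offset).  Otherwise a new ioend has to be started.
 */
static inline bool can_append(const struct ioend_model *ioend,
		unsigned int io_type, uint64_t sector, uint64_t offset)
{
	return ioend != NULL &&
	       ioend->io_type == io_type &&
	       sector == ioend->end_sector &&
	       offset == ioend->io_offset + ioend->io_size;
}
```

Comparing against the end sector of the bio replaces the old
last_block + 1 bookkeeping, which is why the last_block field can go
away.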

>  fs/xfs/xfs_aops.c | 68 ++++++++++++++++++++++-------------------------
>  1 file changed, 32 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 910b410e5a90..7d02d04d5a5b 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -44,7 +44,6 @@ struct xfs_writepage_ctx {
>  	struct xfs_bmbt_irec    imap;
>  	unsigned int		io_type;
>  	struct xfs_ioend	*ioend;
> -	sector_t		last_block;
>  };
>  
>  void
> @@ -535,11 +534,6 @@ xfs_start_page_writeback(
>  	unlock_page(page);
>  }
>  
> -static inline int xfs_bio_add_buffer(struct bio *bio, struct buffer_head *bh)
> -{
> -	return bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> -}
> -
>  /*
>   * Submit the bio for an ioend. We are passed an ioend with a bio attached to
>   * it, and we submit that bio. The ioend may be used for multiple bio
> @@ -594,27 +588,20 @@ xfs_submit_ioend(
>  	return 0;
>  }
>  
> -static void
> -xfs_init_bio_from_bh(
> -	struct bio		*bio,
> -	struct buffer_head	*bh)
> -{
> -	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> -	bio_set_dev(bio, bh->b_bdev);
> -}
> -
>  static struct xfs_ioend *
>  xfs_alloc_ioend(
>  	struct inode		*inode,
>  	unsigned int		type,
>  	xfs_off_t		offset,
> -	struct buffer_head	*bh)
> +	struct block_device	*bdev,
> +	sector_t		sector)
>  {
>  	struct xfs_ioend	*ioend;
>  	struct bio		*bio;
>  
>  	bio = bio_alloc_bioset(GFP_NOFS, BIO_MAX_PAGES, xfs_ioend_bioset);
> -	xfs_init_bio_from_bh(bio, bh);
> +	bio_set_dev(bio, bdev);
> +	bio->bi_iter.bi_sector = sector;
>  
>  	ioend = container_of(bio, struct xfs_ioend, io_inline_bio);
>  	INIT_LIST_HEAD(&ioend->io_list);
> @@ -639,13 +626,14 @@ static void
>  xfs_chain_bio(
>  	struct xfs_ioend	*ioend,
>  	struct writeback_control *wbc,
> -	struct buffer_head	*bh)
> +	struct block_device	*bdev,
> +	sector_t		sector)
>  {
>  	struct bio *new;
>  
>  	new = bio_alloc(GFP_NOFS, BIO_MAX_PAGES);
> -	xfs_init_bio_from_bh(new, bh);
> -
> +	bio_set_dev(new, bdev);
> +	new->bi_iter.bi_sector = sector;
>  	bio_chain(ioend->io_bio, new);
>  	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
>  	ioend->io_bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);
> @@ -655,39 +643,45 @@ xfs_chain_bio(
>  }
>  
>  /*
> - * Test to see if we've been building up a completion structure for
> - * earlier buffers -- if so, we try to append to this ioend if we
> - * can, otherwise we finish off any current ioend and start another.
> - * Return the ioend we finished off so that the caller can submit it
> - * once it has finished processing the dirty page.
> + * Test to see if we have an existing ioend structure that we could append to
> + * first, otherwise finish off the current ioend and start another.
>   */
>  STATIC void
>  xfs_add_to_ioend(
>  	struct inode		*inode,
> -	struct buffer_head	*bh,
>  	xfs_off_t		offset,
> +	struct page		*page,
>  	struct xfs_writepage_ctx *wpc,
>  	struct writeback_control *wbc,
>  	struct list_head	*iolist)
>  {
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct block_device	*bdev = xfs_find_bdev_for_inode(inode);
> +	unsigned		len = i_blocksize(inode);
> +	unsigned		poff = offset & (PAGE_SIZE - 1);
> +	sector_t		sector;
> +
> +	sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
> +		((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
> +
>  	if (!wpc->ioend || wpc->io_type != wpc->ioend->io_type ||
> -	    bh->b_blocknr != wpc->last_block + 1 ||
> +	    sector != bio_end_sector(wpc->ioend->io_bio) ||
>  	    offset != wpc->ioend->io_offset + wpc->ioend->io_size) {
>  		if (wpc->ioend)
>  			list_add(&wpc->ioend->io_list, iolist);
> -		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset, bh);
> +		wpc->ioend = xfs_alloc_ioend(inode, wpc->io_type, offset,
> +				bdev, sector);
>  	}
>  
>  	/*
> -	 * If the buffer doesn't fit into the bio we need to allocate a new
> -	 * one.  This shouldn't happen more than once for a given buffer.
> +	 * If the block doesn't fit into the bio we need to allocate a new
> +	 * one.  This shouldn't happen more than once for a given block.
>  	 */
> -	while (xfs_bio_add_buffer(wpc->ioend->io_bio, bh) != bh->b_size)
> -		xfs_chain_bio(wpc->ioend, wbc, bh);
> +	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
> +		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
>  
> -	wpc->ioend->io_size += bh->b_size;
> -	wpc->last_block = bh->b_blocknr;
> -	xfs_start_buffer_writeback(bh);
> +	wpc->ioend->io_size += len;
>  }
>  
>  STATIC void
> @@ -883,7 +877,9 @@ xfs_writepage_map(
>  
>  		lock_buffer(bh);
>  		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> -		xfs_add_to_ioend(inode, bh, file_offset, wpc, wbc, &submit_list);
> +		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
> +				&submit_list);
> +		xfs_start_buffer_writeback(bh);
>  		count++;
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset
  2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
  2018-05-30 17:56   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:09PM +0200, Christoph Hellwig wrote:
> This keeps it in a single place so it can be made optional more easily.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 22 +++++-----------------
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 7d02d04d5a5b..025f2acac100 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -495,21 +495,6 @@ xfs_imap_valid(
>  		offset < imap->br_startoff + imap->br_blockcount;
>  }
>  
> -STATIC void
> -xfs_start_buffer_writeback(
> -	struct buffer_head	*bh)
> -{
> -	ASSERT(buffer_mapped(bh));
> -	ASSERT(buffer_locked(bh));
> -	ASSERT(!buffer_delay(bh));
> -	ASSERT(!buffer_unwritten(bh));
> -
> -	bh->b_end_io = NULL;
> -	set_buffer_async_write(bh);
> -	set_buffer_uptodate(bh);
> -	clear_buffer_dirty(bh);
> -}
> -
>  STATIC void
>  xfs_start_page_writeback(
>  	struct page		*page,
> @@ -718,6 +703,7 @@ xfs_map_at_offset(
>  	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
>  	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
>  
> +	lock_buffer(bh);
>  	xfs_map_buffer(inode, bh, imap, offset);
>  	set_buffer_mapped(bh);
>  	clear_buffer_delay(bh);
> @@ -730,6 +716,10 @@ xfs_map_at_offset(
>  	 * set the bdev now.
>  	 */
>  	bh->b_bdev = xfs_find_bdev_for_inode(inode);
> +	bh->b_end_io = NULL;
> +	set_buffer_async_write(bh);
> +	set_buffer_uptodate(bh);
> +	clear_buffer_dirty(bh);
>  }
>  
>  STATIC void
> @@ -875,11 +865,9 @@ xfs_writepage_map(
>  			continue;
>  		}
>  
> -		lock_buffer(bh);
>  		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
>  		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
>  				&submit_list);
> -		xfs_start_buffer_writeback(bh);
>  		count++;
>  	}
>  
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 15/18] xfs: remove xfs_start_page_writeback
  2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
  2018-05-30 17:57   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:10PM +0200, Christoph Hellwig wrote:
> This helper only has two callers, one of them with a constant error
> argument.  Remove it to make pending changes to the code a little easier.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 47 +++++++++++++++++++++--------------------------
>  1 file changed, 21 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 025f2acac100..38021023131e 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -495,30 +495,6 @@ xfs_imap_valid(
>  		offset < imap->br_startoff + imap->br_blockcount;
>  }
>  
> -STATIC void
> -xfs_start_page_writeback(
> -	struct page		*page,
> -	int			clear_dirty)
> -{
> -	ASSERT(PageLocked(page));
> -	ASSERT(!PageWriteback(page));
> -
> -	/*
> -	 * if the page was not fully cleaned, we need to ensure that the higher
> -	 * layers come back to it correctly. That means we need to keep the page
> -	 * dirty, and for WB_SYNC_ALL writeback we need to ensure the
> -	 * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to
> -	 * write this page in this writeback sweep will be made.
> -	 */
> -	if (clear_dirty) {
> -		clear_page_dirty_for_io(page);
> -		set_page_writeback(page);
> -	} else
> -		set_page_writeback_keepwrite(page);
> -
> -	unlock_page(page);
> -}
> -
>  /*
>   * Submit the bio for an ioend. We are passed an ioend with a bio attached to
>   * it, and we submit that bio. The ioend may be used for multiple bio
> @@ -877,6 +853,9 @@ xfs_writepage_map(
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
>  out:
> +	ASSERT(PageLocked(page));
> +	ASSERT(!PageWriteback(page));
> +
>  	/*
>  	 * On error, we have to fail the ioend here because we have locked
>  	 * buffers in the ioend. If we don't do this, we'll deadlock
> @@ -895,7 +874,21 @@ xfs_writepage_map(
>  	 * treated correctly on error.
>  	 */
>  	if (count) {
> -		xfs_start_page_writeback(page, !error);
> +		/*
> +		 * If the page was not fully cleaned, we need to ensure that the
> +		 * higher layers come back to it correctly.  That means we need
> +		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
> +		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
> +		 * so another attempt to write this page in this writeback sweep
> +		 * will be made.
> +		 */
> +		if (error) {
> +			set_page_writeback_keepwrite(page);
> +		} else {
> +			clear_page_dirty_for_io(page);
> +			set_page_writeback(page);
> +		}
> +		unlock_page(page);
>  
>  		/*
>  		 * Preserve the original error if there was one, otherwise catch
> @@ -920,7 +913,9 @@ xfs_writepage_map(
>  		 * race with a partial page truncate on a sub-page block sized
>  		 * filesystem. In that case we need to mark the page clean.
>  		 */
> -		xfs_start_page_writeback(page, 1);
> +		clear_page_dirty_for_io(page);
> +		set_page_writeback(page);
> +		unlock_page(page);
>  		end_page_writeback(page);
>  	}
>  
> -- 
> 2.17.0
> 

* Re: [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map
  2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
  2018-05-30 18:00   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:11PM +0200, Christoph Hellwig wrote:
> Rejuggle how we deal with the error vs non-error and ioend vs no-ioend
> cases to keep the fast path streamlined and the duplicate code to a
> minimum.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 65 +++++++++++++++++++++++------------------------
>  1 file changed, 32 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 38021023131e..ac417ef326a9 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -873,7 +873,14 @@ xfs_writepage_map(
>  	 * submission of outstanding ioends on the writepage context so they are
>  	 * treated correctly on error.
>  	 */
> -	if (count) {
> +	if (unlikely(error)) {
> +		if (!count) {
> +			xfs_aops_discard_page(page);
> +			ClearPageUptodate(page);
> +			unlock_page(page);
> +			goto done;
> +		}
> +
>  		/*
>  		 * If the page was not fully cleaned, we need to ensure that the
>  		 * higher layers come back to it correctly.  That means we need
> @@ -882,43 +889,35 @@ xfs_writepage_map(
>  		 * so another attempt to write this page in this writeback sweep
>  		 * will be made.
>  		 */
> -		if (error) {
> -			set_page_writeback_keepwrite(page);
> -		} else {
> -			clear_page_dirty_for_io(page);
> -			set_page_writeback(page);
> -		}
> -		unlock_page(page);
> -
> -		/*
> -		 * Preserve the original error if there was one, otherwise catch
> -		 * submission errors here and propagate into subsequent ioend
> -		 * submissions.
> -		 */
> -		list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
> -			int error2;
> -
> -			list_del_init(&ioend->io_list);
> -			error2 = xfs_submit_ioend(wbc, ioend, error);
> -			if (error2 && !error)
> -				error = error2;
> -		}
> -	} else if (error) {
> -		xfs_aops_discard_page(page);
> -		ClearPageUptodate(page);
> -		unlock_page(page);
> +		set_page_writeback_keepwrite(page);
>  	} else {
> -		/*
> -		 * We can end up here with no error and nothing to write if we
> -		 * race with a partial page truncate on a sub-page block sized
> -		 * filesystem. In that case we need to mark the page clean.
> -		 */
>  		clear_page_dirty_for_io(page);
>  		set_page_writeback(page);
> -		unlock_page(page);
> -		end_page_writeback(page);
>  	}
>  
> +	unlock_page(page);
> +
> +	/*
> +	 * Preserve the original error if there was one, otherwise catch
> +	 * submission errors here and propagate into subsequent ioend
> +	 * submissions.
> +	 */
> +	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
> +		int error2;
> +
> +		list_del_init(&ioend->io_list);
> +		error2 = xfs_submit_ioend(wbc, ioend, error);
> +		if (error2 && !error)
> +			error = error2;
> +	}
> +
> +	/*
> +	 * We can end up here with no error and nothing to write if we race with
> +	 * a partial page truncate on a sub-page block sized filesystem.
> +	 */
> +	if (!count)
> +		end_page_writeback(page);
> +done:
>  	mapping_set_error(page->mapping, error);
>  	return error;
>  }
> -- 
> 2.17.0
> 

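The control flow that the refactored tail of xfs_writepage_map ends up with
can be modelled in user space roughly as follows.  This is only a sketch read
off the diff above: the enum, the flag names and writepage_tail_actions() are
hypothetical and do not exist in the kernel.  Each flag stands for the kernel
calls named in the comment next to it.

```c
#include <assert.h>

/* Hypothetical model of the actions taken at the tail of xfs_writepage_map. */
enum tail_action {
	TAIL_DISCARD		= 1 << 0, /* xfs_aops_discard_page + ClearPageUptodate */
	TAIL_KEEPWRITE		= 1 << 1, /* set_page_writeback_keepwrite */
	TAIL_CLEAR_DIRTY_WB	= 1 << 2, /* clear_page_dirty_for_io + set_page_writeback */
	TAIL_SUBMIT_IOENDS	= 1 << 3, /* walk submit_list, xfs_submit_ioend */
	TAIL_END_WRITEBACK	= 1 << 4, /* end_page_writeback */
};

/*
 * error: non-zero if mapping a block failed.
 * count: number of blocks that were added to an ioend for this page.
 */
static unsigned int writepage_tail_actions(int error, int count)
{
	unsigned int actions = 0;

	if (error) {
		/* Nothing was queued: discard the page and skip submission. */
		if (!count)
			return TAIL_DISCARD;
		/* Partially queued: keep the page dirty and tagged TOWRITE. */
		actions |= TAIL_KEEPWRITE;
	} else {
		actions |= TAIL_CLEAR_DIRTY_WB;
	}

	/* The submit list is walked in both remaining cases; it may be empty. */
	actions |= TAIL_SUBMIT_IOENDS;

	/* No error and nothing to write: end writeback right away. */
	if (!count)
		actions |= TAIL_END_WRITEBACK;

	return actions;
}
```

In this model the common case (no error, count > 0) takes only the else
branch and the ioend submission, which matches the stated goal of keeping the
fast path streamlined.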
* Re: [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map
  2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
  2018-05-30 18:08   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:12PM +0200, Christoph Hellwig wrote:
> We already track the page uptodate status based on the buffer uptodate
> status, which is updated whenever reading or zeroing blocks.
> 
> This code has been there since a ptool commit in 2002, which
> claims to:
> 
>     "merge" the 2.4 fsx fix for block size < page size to 2.5.  This needed
>     major changes to actually fit.
> 
> and isn't present in other writepage implementations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ac417ef326a9..84f88cecd2f1 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -786,7 +786,6 @@ xfs_writepage_map(
>  	ssize_t			len = i_blocksize(inode);
>  	int			error = 0;
>  	int			count = 0;
> -	bool			uptodate = true;
>  	loff_t			file_offset;	/* file offset of page */
>  	unsigned		poffset;	/* offset into page */
>  
> @@ -813,7 +812,6 @@ xfs_writepage_map(
>  		if (!buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> -			uptodate = false;
>  			continue;
>  		}
>  
> @@ -847,9 +845,6 @@ xfs_writepage_map(
>  		count++;
>  	}
>  
> -	if (uptodate && poffset == PAGE_SIZE)
> -		SetPageUptodate(page);
> -
>  	ASSERT(wpc->ioend || list_empty(&submit_list));
>  
>  out:
> -- 
> 2.17.0
> 

* Re: [PATCH 18/18] xfs: allow writeback on pages without buffer heads
  2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
  2018-05-30 18:08   ` Darrick J. Wong
@ 2018-05-31 13:49   ` Brian Foster
  1 sibling, 0 replies; 61+ messages in thread
From: Brian Foster @ 2018-05-31 13:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, linux-fsdevel, linux-mm

On Wed, May 30, 2018 at 12:00:13PM +0200, Christoph Hellwig wrote:
> Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size
> equal to the page size, and deal with pages without buffer heads in
> writeback.  Thanks to the previous refactoring this is basically trivial
> now.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c  | 47 +++++++++++++++++++++++++++++++++-------------
>  fs/xfs/xfs_iomap.c |  3 ++-
>  2 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 84f88cecd2f1..6640377b6eae 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -91,6 +91,19 @@ xfs_find_daxdev_for_inode(
>  		return mp->m_ddev_targp->bt_daxdev;
>  }
>  
> +static void
> +xfs_finish_page_writeback(
> +	struct inode		*inode,
> +	struct bio_vec		*bvec,
> +	int			error)
> +{
> +	if (error) {
> +		SetPageError(bvec->bv_page);
> +		mapping_set_error(inode->i_mapping, -EIO);
> +	}
> +	end_page_writeback(bvec->bv_page);
> +}
> +
>  /*
>   * We're now finished for good with this page.  Update the page state via the
>   * associated buffer_heads, paying attention to the start and end offsets that
> @@ -103,7 +116,7 @@ xfs_find_daxdev_for_inode(
>   * and buffers potentially freed after every call to end_buffer_async_write.
>   */
>  static void
> -xfs_finish_page_writeback(
> +xfs_finish_buffer_writeback(
>  	struct inode		*inode,
>  	struct bio_vec		*bvec,
>  	int			error)
> @@ -178,9 +191,12 @@ xfs_destroy_ioend(
>  			next = bio->bi_private;
>  
>  		/* walk each page on bio, ending page IO on them */
> -		bio_for_each_segment_all(bvec, bio, i)
> -			xfs_finish_page_writeback(inode, bvec, error);
> -
> +		bio_for_each_segment_all(bvec, bio, i) {
> +			if (page_has_buffers(bvec->bv_page))
> +				xfs_finish_buffer_writeback(inode, bvec, error);
> +			else
> +				xfs_finish_page_writeback(inode, bvec, error);
> +		}
>  		bio_put(bio);
>  	}
>  
> @@ -782,13 +798,16 @@ xfs_writepage_map(
>  {
>  	LIST_HEAD(submit_list);
>  	struct xfs_ioend	*ioend, *next;
> -	struct buffer_head	*bh;
> +	struct buffer_head	*bh = NULL;
>  	ssize_t			len = i_blocksize(inode);
>  	int			error = 0;
>  	int			count = 0;
>  	loff_t			file_offset;	/* file offset of page */
>  	unsigned		poffset;	/* offset into page */
>  
> +	if (page_has_buffers(page))
> +		bh = page_buffers(page);
> +
>  	/*
>  	 * Walk the blocks on the page, and we we run off then end of the
>  	 * current map or find the current map invalid, grab a new one.
> @@ -797,11 +816,9 @@ xfs_writepage_map(
>  	 * replace the bufferhead with some other state tracking mechanism in
>  	 * future.
>  	 */
> -	file_offset = page_offset(page);
> -	bh = page_buffers(page);
> -	for (poffset = 0;
> +	for (poffset = 0, file_offset = page_offset(page);
>  	     poffset < PAGE_SIZE;
> -	     poffset += len, file_offset += len, bh = bh->b_this_page) {
> +	     poffset += len, file_offset += len) {
>  		/* past the range we are writing, so nothing more to write. */
>  		if (file_offset >= end_offset)
>  			break;
> @@ -809,9 +826,10 @@ xfs_writepage_map(
>  		/*
>  		 * Block does not contain valid data, skip it.
>  		 */
> -		if (!buffer_uptodate(bh)) {
> +		if (bh && !buffer_uptodate(bh)) {
>  			if (PageUptodate(page))
>  				ASSERT(buffer_mapped(bh));
> +			bh = bh->b_this_page;
>  			continue;
>  		}
>  
> @@ -836,10 +854,15 @@ xfs_writepage_map(
>  			 * meaningless for holes (!mapped && uptodate), so check we did
>  			 * have a buffer covering a hole here and continue.
>  			 */
> +			if (bh)
> +				bh = bh->b_this_page;
>  			continue;
>  		}
>  
> -		xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +		if (bh) {
> +			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
> +			bh = bh->b_this_page;
> +		}
>  		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
>  				&submit_list);
>  		count++;
> @@ -939,8 +962,6 @@ xfs_do_writepage(
>  
>  	trace_xfs_writepage(inode, page, 0, 0);
>  
> -	ASSERT(page_has_buffers(page));
> -
>  	/*
>  	 * Refuse to write the page out if we are called from reclaim context.
>  	 *
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f949f0dd7382..93c40da3378a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1031,7 +1031,8 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	iomap->flags |= IOMAP_F_BUFFER_HEAD;
> +	if (i_blocksize(inode) < PAGE_SIZE)
> +		iomap->flags |= IOMAP_F_BUFFER_HEAD;
>  
>  	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> -- 
> 2.17.0
> 

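The two ideas in this patch, requesting buffer heads only when the block size
is smaller than the page size, and walking the blocks of a page with an
optional buffer head, can be sketched in user space as below.  This is a
simplified model under stated assumptions: MODEL_PAGE_SIZE, struct model_bh and
both functions are hypothetical names, the page size is fixed at 4096 bytes,
and the extent mapping lookup and hole handling that the real loop performs
are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4k pages */

/* Hypothetical stand-in for struct buffer_head: a ring of per-block state. */
struct model_bh {
	bool		uptodate;
	struct model_bh	*b_this_page;	/* next buffer on the same page */
};

/* Mirrors the xfs_file_iomap_begin hunk: buffer heads only for sub-page blocks. */
static bool model_needs_buffer_heads(unsigned int block_size)
{
	return block_size < MODEL_PAGE_SIZE;
}

/*
 * Count the blocks of one page that would be queued for writeback.
 * bh may be NULL, in which case every block inside the range is queued.
 */
static int model_count_writeback_blocks(struct model_bh *bh,
		unsigned int block_size, unsigned long long page_off,
		unsigned long long end_offset)
{
	unsigned long long file_offset = page_off;
	unsigned int poffset;
	int count = 0;

	for (poffset = 0; poffset < MODEL_PAGE_SIZE;
	     poffset += block_size, file_offset += block_size) {
		/* Past the range being written, nothing more to do. */
		if (file_offset >= end_offset)
			break;

		/* With buffer heads, skip blocks that hold no valid data. */
		if (bh && !bh->uptodate) {
			bh = bh->b_this_page;
			continue;
		}

		if (bh)
			bh = bh->b_this_page;
		count++;
	}
	return count;
}
```

With bh == NULL and a block size equal to the page size the loop body runs at
most once, which is why the patch description can call the change basically
trivial.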
* Re: [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
  2018-05-30 17:39   ` Brian Foster
@ 2018-05-31 16:11     ` Christoph Hellwig
  2018-05-31 16:12     ` Christoph Hellwig
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-31 16:11 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm, Dave Chinner

On Wed, May 30, 2018 at 01:39:56PM -0400, Brian Foster wrote:
> I believe Dave originally intended to split this up into multiple
> patches. Dave, did you happen to get anywhere with that before Christoph
> pulled this in?

I've split a few bits off.

> 
> If not, could we at least split off some of the behavior changes into
> separate patches? For example, dropping the !mapped && uptodate check
> that causes us to writeback zeroed blocks over unwritten extents is a
> behavior change that warrants a separate patch.

But that is the one part I can't easily split off.  It would require
tons of spurious changes to the old system of buffer flags, which
might be doable but would be removed in the next patch.

* Re: [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag
  2018-05-31 13:46   ` Brian Foster
@ 2018-05-31 16:11     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-31 16:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm

On Thu, May 31, 2018 at 09:46:38AM -0400, Brian Foster wrote:
> On Wed, May 30, 2018 at 12:00:02PM +0200, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> 
> The change looks Ok... It's clearly reasonable to remove a flag that is
> no longer used, but why is it no longer used? The previous patch drops
> it to "make xfs_writepage_map() extent map centric," but the description
> doesn't exactly explain why (and it's not immediately clear to me
> amongst all the other code changes).

My refactoring moves this into a separate patch with a proper changelog.
I'll send it out in a bit.

* Re: [PATCH 06/18] xfs: make xfs_writepage_map extent map centric
  2018-05-30 17:39   ` Brian Foster
  2018-05-31 16:11     ` Christoph Hellwig
@ 2018-05-31 16:12     ` Christoph Hellwig
  1 sibling, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2018-05-31 16:12 UTC (permalink / raw)
  To: Brian Foster
  Cc: Christoph Hellwig, linux-xfs, linux-fsdevel, linux-mm, Dave Chinner

On Wed, May 30, 2018 at 01:39:56PM -0400, Brian Foster wrote:
> What if the file is reflinked and the current page covers a non-shared
> block but has an overlapping cow mapping due to cowextsize? The current
> logic unconditionally uses the COW mapping for writeback. The updated
> logic doesn't appear to do that in all cases. Consider if the current
> imap was delalloc (and so not trimmed) or the cow mapping was introduced
> after the current imap was mapped. This logic appears to prioritize the
> current mapping so long as it is valid. Doesn't that break the
> cowextsize hint?

It does.  I've fixed it for the next version.

end of thread, other threads:[~2018-05-31 16:12 UTC | newest]

Thread overview: 61+ messages
2018-05-30  9:59 buffered writes without buffer heads in xfs and iomap v4 Christoph Hellwig
2018-05-30  9:59 ` [PATCH 01/18] fs: factor out a __generic_write_end helper Christoph Hellwig
2018-05-30 13:34   ` Brian Foster
2018-05-30 16:26   ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 02/18] iomap: add initial support for writes without buffer heads Christoph Hellwig
2018-05-30 13:34   ` Brian Foster
2018-05-30 16:33   ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 03/18] xfs: simplify xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:55     ` Darrick J. Wong
2018-05-30  9:59 ` [PATCH 04/18] xfs: simplify xfs_aops_discard_page Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:40   ` Darrick J. Wong
2018-05-30 10:00 ` [PATCH 05/18] xfs: move locking into xfs_bmap_punch_delalloc_range Christoph Hellwig
2018-05-30 13:35   ` Brian Foster
2018-05-30 16:58     ` Darrick J. Wong
2018-05-30 17:40       ` Brian Foster
2018-05-30 10:00 ` [PATCH 06/18] xfs: make xfs_writepage_map extent map centric Christoph Hellwig
2018-05-30 17:19   ` Darrick J. Wong
2018-05-30 17:39   ` Brian Foster
2018-05-31 16:11     ` Christoph Hellwig
2018-05-31 16:12     ` Christoph Hellwig
2018-05-30 10:00 ` [PATCH 07/18] xfs: remove the now unused XFS_BMAPI_IGSTATE flag Christoph Hellwig
2018-05-30 17:19   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-31 16:11     ` Christoph Hellwig
2018-05-30 10:00 ` [PATCH 08/18] xfs: remove xfs_reflink_find_cow_mapping Christoph Hellwig
2018-05-30 17:23   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 09/18] xfs: remove xfs_reflink_trim_irec_to_next_cow Christoph Hellwig
2018-05-30 17:27   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 10/18] xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directly Christoph Hellwig
2018-05-30 17:32   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 11/18] xfs: don't clear imap_valid for a non-uptodate buffers Christoph Hellwig
2018-05-30 17:34   ` Darrick J. Wong
2018-05-31 13:46   ` Brian Foster
2018-05-30 10:00 ` [PATCH 12/18] xfs: remove the imap_valid flag Christoph Hellwig
2018-05-30 17:44   ` Darrick J. Wong
2018-05-31 13:47   ` Brian Foster
2018-05-30 10:00 ` [PATCH 13/18] xfs: don't look at buffer heads in xfs_add_to_ioend Christoph Hellwig
2018-05-30 17:55   ` Darrick J. Wong
2018-05-31  7:03     ` Christoph Hellwig
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 14/18] xfs: move all writeback buffer_head manipulation into xfs_map_at_offset Christoph Hellwig
2018-05-30 17:56   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 15/18] xfs: remove xfs_start_page_writeback Christoph Hellwig
2018-05-30 17:57   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 16/18] xfs: refactor the tail of xfs_writepage_map Christoph Hellwig
2018-05-30 18:00   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 17/18] xfs: do not set the page uptodate in xfs_writepage_map Christoph Hellwig
2018-05-30 18:08   ` Darrick J. Wong
2018-05-31  7:04     ` Christoph Hellwig
2018-05-31 13:49   ` Brian Foster
2018-05-30 10:00 ` [PATCH 18/18] xfs: allow writeback on pages without buffer heads Christoph Hellwig
2018-05-30 18:08   ` Darrick J. Wong
2018-05-31 13:49   ` Brian Foster
