linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* iomap and xfs COW cleanups
@ 2019-09-09 18:27 Christoph Hellwig
  2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
                   ` (18 more replies)
  0 siblings, 19 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Hi all,

this series started out based on a review of the btrfs iomap series
from Goldwyn.  I've taken his main srcmap patch and modified it a bit
based on my experience of converting xfs over to use the feature.
That led to comments that the xfs code is a mess, so I resurrected
an old series to clean that up and merged it in.  That series also
happens to massively clean up the unshare path in the iomap and xfs
code as well.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 01/19] iomap: better document the IOMAP_F_* flags
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
  2019-09-16 18:08   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/iomap.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e79af6b28410..8adcc8dd4498 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,21 +30,36 @@ struct vm_fault;
 #define IOMAP_INLINE	0x05	/* data inline in the inode */
 
 /*
- * Flags for all iomap mappings:
+ * Flags reported by the file system from iomap_begin:
+ *
+ * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
+ * zeroing for areas that no data is copied to.
  *
  * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
  * written data and requires fdatasync to commit them to persistent storage.
+ *
+ * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
+ * unshared as part a write.
+ *
+ * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
+ * mappings.
+ *
+ * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
+ * buffer heads for this mapping.
  */
-#define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
-#define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
-#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
-#define IOMAP_F_SIZE_CHANGED	0x08	/* file size has changed */
+#define IOMAP_F_NEW		0x01
+#define IOMAP_F_DIRTY		0x02
+#define IOMAP_F_SHARED		0x04
+#define IOMAP_F_MERGED		0x08
+#define IOMAP_F_BUFFER_HEAD	0x10
 
 /*
- * Flags that only need to be reported for IOMAP_REPORT requests:
+ * Flags set by the core iomap code during operations:
+ *
+ * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
+ * has changed as the result of this write operation.
  */
-#define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
-#define IOMAP_F_SHARED		0x20	/* block shared with another file */
+#define IOMAP_F_SIZE_CHANGED	0x100
 
 /*
  * Flags from 0x1000 up are for file system specific usage:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
  2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
  2019-09-16 18:10   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 051b8ec326ba..2a9b41352495 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -678,7 +678,7 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
 static int
 __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page, struct iomap *iomap)
+		unsigned copied, struct page *page)
 {
 	flush_dcache_page(page);
 
@@ -731,7 +731,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
-		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+		ret = __iomap_write_end(inode, pos, len, copied, page);
 	}
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
  2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
  2019-09-09 18:27 ` [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-16 18:11   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon.  Also remove the local index variable that is only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2a9b41352495..33e03992d8a7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -606,7 +606,6 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
-	pgoff_t index = pos >> PAGE_SHIFT;
 	struct page *page;
 	int status = 0;
 
@@ -621,7 +620,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 			return status;
 	}
 
-	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
+	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
+			AOP_FLAG_NOFS);
 	if (!page) {
 		status = -ENOMEM;
 		goto out_no_page;
@@ -763,7 +763,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iov_iter *i = data;
 	long status = 0;
 	ssize_t written = 0;
-	unsigned int flags = AOP_FLAG_NOFS;
 
 	do {
 		struct page *page;
@@ -793,8 +792,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, flags, &page,
-				iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 		if (unlikely(status))
 			break;
 
@@ -892,8 +890,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		if (IS_ERR(rpage))
 			return PTR_ERR(rpage);
 
-		status = iomap_write_begin(inode, pos, bytes,
-					   AOP_FLAG_NOFS, &page, iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 		put_page(rpage);
 		if (unlikely(status))
 			return status;
@@ -944,8 +941,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
-				   iomap);
+	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 	if (status)
 		return status;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-16 18:12   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared are don't need zeroing.  This will allow to simplify
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 15 +++++++++++----
 fs/xfs/xfs_reflink.c   |  2 +-
 include/linux/iomap.h  |  2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 33e03992d8a7..0b05551d9b5a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -872,12 +872,19 @@ __iomap_read_page(struct inode *inode, loff_t offset)
 }
 
 static loff_t
-iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
 	long status = 0;
 	ssize_t written = 0;
 
+	/* don't bother with blocks that are not shared to start with */
+	if (!(iomap->flags & IOMAP_F_SHARED))
+		return length;
+	/* don't bother with holes or unwritten extents */
+	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+		return length;
+
 	do {
 		struct page *page, *rpage;
 		unsigned long offset;	/* Offset into pagecache page */
@@ -917,14 +924,14 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 }
 
 int
-iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops)
 {
 	loff_t ret;
 
 	while (len) {
 		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
-				iomap_dirty_actor);
+				iomap_unshare_actor);
 		if (ret <= 0)
 			return ret;
 		pos += ret;
@@ -933,7 +940,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(iomap_file_dirty);
+EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 		unsigned bytes, struct iomap *iomap)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index c4ec7afd1170..cadc0456804d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1442,7 +1442,7 @@ xfs_reflink_dirty_extents(
 			flen = XFS_FSB_TO_B(mp, rlen);
 			if (fpos + flen > isize)
 				flen = isize - fpos;
-			error = iomap_file_dirty(VFS_I(ip), fpos, flen,
+			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
 					&xfs_iomap_ops);
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 			if (error)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8adcc8dd4498..3a0f0975a57e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -183,7 +183,7 @@ int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 #else
 #define iomap_migrate_page NULL
 #endif
-int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-16 18:17   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 06/19] iomap: use write_begin to read pages to unshare Christoph Hellwig
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0b05551d9b5a..fe099faf540f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -547,19 +547,12 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 }
 
 static int
-iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
-		unsigned poff, unsigned plen, unsigned from, unsigned to,
-		struct iomap *iomap)
+iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
+		unsigned plen, struct iomap *iomap)
 {
 	struct bio_vec bvec;
 	struct bio bio;
 
-	if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
-		zero_user_segments(page, poff, from, to, poff + plen);
-		iomap_set_range_uptodate(page, poff, plen);
-		return 0;
-	}
-
 	bio_init(&bio, &bvec, 1);
 	bio.bi_opf = REQ_OP_READ;
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
@@ -577,7 +570,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
 	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
-	int status = 0;
+	int status;
 
 	if (PageUptodate(page))
 		return 0;
@@ -588,17 +581,23 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		if (plen == 0)
 			break;
 
-		if ((from > poff && from < poff + plen) ||
-		    (to > poff && to < poff + plen)) {
-			status = iomap_read_page_sync(inode, block_start, page,
-					poff, plen, from, to, iomap);
-			if (status)
-				break;
+		if ((from <= poff || from >= poff + plen) &&
+		    (to <= poff || to >= poff + plen))
+			continue;
+
+		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+			zero_user_segments(page, poff, from, to, poff + plen);
+			iomap_set_range_uptodate(page, poff, plen);
+			continue;
 		}
 
+		status = iomap_read_page_sync(block_start, page, poff, plen,
+				iomap);
+		if (status)
+			return status;
 	} while ((block_start += plen) < block_end);
 
-	return status;
+	return 0;
 }
 
 static int
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 06/19] iomap: use write_begin to read pages to unshare
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-16 18:34   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare.  That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 49 ++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index fe099faf540f..a421977a9496 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -533,6 +533,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 EXPORT_SYMBOL_GPL(iomap_migrate_page);
 #endif /* CONFIG_MIGRATION */
 
+enum {
+	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
+};
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -562,7 +566,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 }
 
 static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *iomap)
 {
 	struct iomap_page *iop = iomap_page_create(inode, page);
@@ -581,11 +585,14 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		if (plen == 0)
 			break;
 
-		if ((from <= poff || from >= poff + plen) &&
+		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
+		    (from <= poff || from >= poff + plen) &&
 		    (to <= poff || to >= poff + plen))
 			continue;
 
 		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
+				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
 			iomap_set_range_uptodate(page, poff, plen);
 			continue;
@@ -631,7 +638,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
-		status = __iomap_write_begin(inode, pos, len, page, iomap);
+		status = __iomap_write_begin(inode, pos, len, flags, page,
+				iomap);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -854,22 +862,6 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
-static struct page *
-__iomap_read_page(struct inode *inode, loff_t offset)
-{
-	struct address_space *mapping = inode->i_mapping;
-	struct page *page;
-
-	page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
-	if (IS_ERR(page))
-		return page;
-	if (!PageUptodate(page)) {
-		put_page(page);
-		return ERR_PTR(-EIO);
-	}
-	return page;
-}
-
 static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
@@ -885,24 +877,15 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return length;
 
 	do {
-		struct page *page, *rpage;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
-
-		offset = offset_in_page(pos);
-		bytes = min_t(loff_t, PAGE_SIZE - offset, length);
-
-		rpage = __iomap_read_page(inode, pos);
-		if (IS_ERR(rpage))
-			return PTR_ERR(rpage);
+		unsigned long offset = offset_in_page(pos);
+		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
+		struct page *page;
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
-		put_page(rpage);
+		status = iomap_write_begin(inode, pos, bytes,
+				IOMAP_WRITE_F_UNSHARE, &page, iomap);
 		if (unlikely(status))
 			return status;
 
-		WARN_ON_ONCE(!PageUptodate(page));
-
 		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 06/19] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-10 12:48   ` Goldwyn Rodrigues
  2019-09-16 18:42   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target.  The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
      srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c               |  9 ++++--
 fs/ext2/inode.c        |  2 +-
 fs/ext4/inode.c        |  2 +-
 fs/gfs2/bmap.c         |  3 +-
 fs/iomap/apply.c       | 23 +++++++++++----
 fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
 fs/iomap/direct-io.c   |  2 +-
 fs/iomap/fiemap.c      |  4 +--
 fs/iomap/seek.c        |  4 +--
 fs/iomap/swapfile.c    |  3 +-
 fs/xfs/xfs_iomap.c     |  9 ++++--
 include/linux/iomap.h  |  5 ++--
 12 files changed, 79 insertions(+), 52 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index a237141d8787..8016be24bbd9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static loff_t
 dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct block_device *bdev = iomap->bdev;
 	struct dax_device *dax_dev = iomap->dax_dev;
@@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
 	 */
-	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
 	if (iomap_errp)
 		*iomap_errp = error;
 	if (error) {
@@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	struct inode *inode = mapping->host;
 	vm_fault_t result = VM_FAULT_FALLBACK;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	pgoff_t max_pgoff;
 	void *entry;
 	loff_t pos;
@@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * to look up our filesystem block.
 	 */
 	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
+			&srcmap);
 	if (error)
 		goto unlock_entry;
 
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 7004ce581a32..467c13ff6b40 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
 
 #ifdef CONFIG_FS_DAX
 static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-		unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block = offset >> blkbits;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 420fe3deed39..e2116e8228d2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3453,7 +3453,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 }
 
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-			    unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned int blkbits = inode->i_blkbits;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 79581b9bdebb..0bf8e8fa82bd 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1123,7 +1123,8 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
 }
 
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-			    unsigned flags, struct iomap *iomap)
+			    unsigned flags, struct iomap *iomap,
+			    struct iomap *srcmap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct metapath mp = { .mp_aheight = 1, };
diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 54c02aecf3cd..67efd86675e6 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -24,7 +24,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
 {
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { 0 };
 	loff_t written = 0, ret;
+	u64 end;
 
 	/*
 	 * Need to map a range from start position for length bytes. This can
@@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * expose transient stale data. If the reserve fails, we can safely
 	 * back out at this point as there is nothing to undo.
 	 */
-	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
+	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
 	if (ret)
 		return ret;
 	if (WARN_ON(iomap.offset > pos))
@@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * Cut down the length to the one actually provided by the filesystem,
 	 * as it might not be able to give us the whole size that we requested.
 	 */
-	if (iomap.offset + iomap.length < pos + length)
-		length = iomap.offset + iomap.length - pos;
+	end = iomap.offset + iomap.length;
+	if (srcmap.type)
+		end = min(end, srcmap.offset + srcmap.length);
+	if (pos + length > end)
+		length = end - pos;
 
 	/*
-	 * Now that we have guaranteed that the space allocation will succeed.
+	 * Now that we have guaranteed that the space allocation will succeed,
 	 * we can do the copy-in page by page without having to worry about
 	 * failures exposing transient data.
+	 *
+	 * To support COW operations, we read in data for partially blocks from
+	 * the srcmap if the file system filled it in.  In that case we the
+	 * length needs to be limited to the earlier of the ends of the iomaps.
+	 * If the file system did not provide a srcmap we pass in the normal
+	 * iomap into the actors so that they don't need to have special
+	 * handling for the two cases.
 	 */
-	written = actor(inode, pos, length, data, &iomap);
+	written = actor(inode, pos, length, data, &iomap,
+			srcmap.type ? &srcmap : &iomap);
 
 	/*
 	 * Now the data has been copied, commit the range we've copied.  This
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a421977a9496..58fb7a894ad4 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -219,7 +219,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	struct page *page = ctx->cur_page;
@@ -367,7 +367,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 
 static loff_t
 iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	loff_t done, ret;
@@ -387,7 +387,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 			ctx->cur_page_in_bio = false;
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap);
+				ctx, iomap, srcmap);
 	}
 
 	return done;
@@ -567,7 +567,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 
 static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *iomap)
+		struct page *page, struct iomap *srcmap)
 {
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
@@ -590,7 +590,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		    (to <= poff || to >= poff + plen))
 			continue;
 
-		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
 			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
 				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
@@ -599,7 +599,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		}
 
 		status = iomap_read_page_sync(block_start, page, poff, plen,
-				iomap);
+				srcmap);
 		if (status)
 			return status;
 	} while ((block_start += plen) < block_end);
@@ -609,13 +609,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap)
+		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	struct page *page;
 	int status = 0;
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
+	if (srcmap != iomap)
+		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -633,13 +635,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		goto out_no_page;
 	}
 
-	if (iomap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, iomap);
+	if (srcmap->type == IOMAP_INLINE)
+		iomap_read_inline_data(inode, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
-		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
 		status = __iomap_write_begin(inode, pos, len, flags, page,
-				iomap);
+				srcmap);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -725,16 +727,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 }
 
 static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page, struct iomap *iomap)
+iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
+		struct page *page, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	loff_t old_size = inode->i_size;
 	int ret;
 
-	if (iomap->type == IOMAP_INLINE) {
+	if (srcmap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
-	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
@@ -765,7 +767,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iov_iter *i = data;
 	long status = 0;
@@ -799,7 +801,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
+				srcmap);
 		if (unlikely(status))
 			break;
 
@@ -810,8 +813,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page,
-				iomap);
+		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+				srcmap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -864,7 +867,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
 static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	long status = 0;
 	ssize_t written = 0;
@@ -873,7 +876,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!(iomap->flags & IOMAP_F_SHARED))
 		return length;
 	/* don't bother with holes or unwritten extents */
-	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
 		return length;
 
 	do {
@@ -882,11 +885,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct page *page;
 
 		status = iomap_write_begin(inode, pos, bytes,
-				IOMAP_WRITE_F_UNSHARE, &page, iomap);
+				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
 		if (unlikely(status))
 			return status;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+				srcmap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -925,19 +929,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
 	if (status)
 		return status;
 
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -949,14 +953,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
 	int status;
 
 	/* already zeroed?  we're done. */
-	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
 		return count;
 
 	do {
@@ -968,7 +972,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		if (IS_DAX(inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap);
+			status = iomap_zero(inode, pos, offset, bytes, iomap,
+					srcmap);
 		if (status < 0)
 			return status;
 
@@ -1018,7 +1023,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
 static loff_t
 iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page = data;
 	int ret;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1fc28c2da279..e3ccbf7daaae 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 
 static loff_t
 iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_dio *dio = data;
 
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index f26fdd36e383..690ef2d7c6c8 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 
 static loff_t
 iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct fiemap_ctx *ctx = data;
 	loff_t ret = length;
@@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
 
 static loff_t
 iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	sector_t *bno = data, addr;
 
diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index c04bad4b2b43..89f61d93c0bc 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
 
 static loff_t
 iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap)
+		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_UNWRITTEN:
@@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
 static loff_t
 iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap)
+		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_HOLE:
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 152a230f668d..a648dbf6991e 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
  * distinction between written and unwritten extents.
  */
 static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
-		loff_t count, void *data, struct iomap *iomap)
+		loff_t count, void *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct iomap_swapfile_info *isi = data;
 	int error;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d340db666c06..f1c3ff27c666 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -928,7 +928,8 @@ xfs_file_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 3a0f0975a57e..6d795924baa7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -127,7 +127,8 @@ struct iomap_ops {
 	 * The actual length is returned in iomap->length.
 	 */
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
-			unsigned flags, struct iomap *iomap);
+			unsigned flags, struct iomap *iomap,
+			struct iomap *srcmap);
 
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.
@@ -143,7 +144,7 @@ struct iomap_ops {
  * Main iomap iterator function.
  */
 typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
-		void *data, struct iomap *iomap);
+		void *data, struct iomap *iomap, struct iomap *srcmap);
 
 loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 		unsigned flags, const struct iomap_ops *ops, void *data,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:09   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

There is no reason not to punch out stale delalloc blocks for zeroing
operations, as they otherwise behave exactly like normal writes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f1c3ff27c666..db4764c16142 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1138,7 +1138,8 @@ xfs_file_iomap_end(
 	unsigned		flags,
 	struct iomap		*iomap)
 {
-	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
+	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
+	    iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
 				length, written, iomap);
 	return 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (7 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:17   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Now that xfs_file_unshare is not completely dumb we can just call it
directly without iterating the extent and reflink btrees ourselves.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 108 ++++---------------------------------------
 1 file changed, 10 insertions(+), 98 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index cadc0456804d..73f8cce4722d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1381,85 +1381,6 @@ xfs_reflink_remap_prep(
 	return ret;
 }
 
-/*
- * The user wants to preemptively CoW all shared blocks in this file,
- * which enables us to turn off the reflink flag.  Iterate all
- * extents which are not prealloc/delalloc to see which ranges are
- * mentioned in the refcount tree, then read those blocks into the
- * pagecache, dirty them, fsync them back out, and then we can update
- * the inode flag.  What happens if we run out of memory? :)
- */
-STATIC int
-xfs_reflink_dirty_extents(
-	struct xfs_inode	*ip,
-	xfs_fileoff_t		fbno,
-	xfs_filblks_t		end,
-	xfs_off_t		isize)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_agnumber_t		agno;
-	xfs_agblock_t		agbno;
-	xfs_extlen_t		aglen;
-	xfs_agblock_t		rbno;
-	xfs_extlen_t		rlen;
-	xfs_off_t		fpos;
-	xfs_off_t		flen;
-	struct xfs_bmbt_irec	map[2];
-	int			nmaps;
-	int			error = 0;
-
-	while (end - fbno > 0) {
-		nmaps = 1;
-		/*
-		 * Look for extents in the file.  Skip holes, delalloc, or
-		 * unwritten extents; they can't be reflinked.
-		 */
-		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
-		if (error)
-			goto out;
-		if (nmaps == 0)
-			break;
-		if (!xfs_bmap_is_real_extent(&map[0]))
-			goto next;
-
-		map[1] = map[0];
-		while (map[1].br_blockcount) {
-			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
-			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
-			aglen = map[1].br_blockcount;
-
-			error = xfs_reflink_find_shared(mp, NULL, agno, agbno,
-					aglen, &rbno, &rlen, true);
-			if (error)
-				goto out;
-			if (rbno == NULLAGBLOCK)
-				break;
-
-			/* Dirty the pages */
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-			fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
-					(rbno - agbno));
-			flen = XFS_FSB_TO_B(mp, rlen);
-			if (fpos + flen > isize)
-				flen = isize - fpos;
-			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
-					&xfs_iomap_ops);
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
-			if (error)
-				goto out;
-
-			map[1].br_blockcount -= (rbno - agbno + rlen);
-			map[1].br_startoff += (rbno - agbno + rlen);
-			map[1].br_startblock += (rbno - agbno + rlen);
-		}
-
-next:
-		fbno = map[0].br_startoff + map[0].br_blockcount;
-	}
-out:
-	return error;
-}
-
 /* Does this inode need the reflink flag? */
 int
 xfs_reflink_inode_has_shared_extents(
@@ -1589,6 +1510,11 @@ xfs_reflink_try_clear_inode_flag(
 /*
  * Pre-COW all shared blocks within a given byte range of a file and turn off
  * the reflink flag if we unshare all of the file's blocks.
+ *
+ * Let iomap iterate all extents to see which are shared and not unwritten or
+ * delalloc and read them into the page cache, dirty them, fsync them back out,
+ * and then we can update the inode flag.  What happens if we run out of
+ * memory? :)
  */
 int
 xfs_reflink_unshare(
@@ -1596,10 +1522,7 @@ xfs_reflink_unshare(
 	xfs_off_t		offset,
 	xfs_off_t		len)
 {
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		fbno;
-	xfs_filblks_t		end;
-	xfs_off_t		isize;
+	struct inode		*inode = VFS_I(ip);
 	int			error;
 
 	if (!xfs_is_reflink_inode(ip))
@@ -1607,20 +1530,12 @@ xfs_reflink_unshare(
 
 	trace_xfs_reflink_unshare(ip, offset, len);
 
-	inode_dio_wait(VFS_I(ip));
+	inode_dio_wait(inode);
 
-	/* Try to CoW the selected ranges */
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	fbno = XFS_B_TO_FSBT(mp, offset);
-	isize = i_size_read(VFS_I(ip));
-	end = XFS_B_TO_FSB(mp, offset + len);
-	error = xfs_reflink_dirty_extents(ip, fbno, end, isize);
+	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
 	if (error)
-		goto out_unlock;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
-	/* Wait for the IO to finish */
-	error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
+		goto out;
+	error = filemap_write_and_wait(inode->i_mapping);
 	if (error)
 		goto out;
 
@@ -1628,11 +1543,8 @@ xfs_reflink_unshare(
 	error = xfs_reflink_try_clear_inode_flag(ip);
 	if (error)
 		goto out;
-
 	return 0;
 
-out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out:
 	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
 	return error;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (8 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:26   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap.  Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c   |  8 ++++----
 fs/xfs/xfs_reflink.c | 30 +++++++++++++++---------------
 fs/xfs/xfs_reflink.h |  4 ++--
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index db4764c16142..b228d1dbf59f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -996,9 +996,8 @@ xfs_file_iomap_begin(
 			goto out_found;
 
 		/* may drop and re-acquire the ilock */
-		cmap = imap;
-		error = xfs_reflink_allocate_cow(ip, &cmap, &shared, &lockmode,
-				directio);
+		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
+				&lockmode, directio);
 		if (error)
 			goto out_unlock;
 
@@ -1011,7 +1010,8 @@ xfs_file_iomap_begin(
 		 * newly allocated address.  If the data fork has a hole, copy
 		 * the COW fork mapping to avoid allocating to the data fork.
 		 */
-		if (directio || imap.br_startblock == HOLESTARTBLOCK)
+		if (shared &&
+		    (directio || imap.br_startblock == HOLESTARTBLOCK))
 			imap = cmap;
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 73f8cce4722d..673018a618f0 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -308,13 +308,13 @@ static int
 xfs_find_trim_cow_extent(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
 	bool			*shared,
 	bool			*found)
 {
 	xfs_fileoff_t		offset_fsb = imap->br_startoff;
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_iext_cursor	icur;
-	struct xfs_bmbt_irec	got;
 
 	*found = false;
 
@@ -322,23 +322,22 @@ xfs_find_trim_cow_extent(
 	 * If we don't find an overlapping extent, trim the range we need to
 	 * allocate to fit the hole we found.
 	 */
-	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
-		got.br_startoff = offset_fsb + count_fsb;
-	if (got.br_startoff > offset_fsb) {
+	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, cmap))
+		cmap->br_startoff = offset_fsb + count_fsb;
+	if (cmap->br_startoff > offset_fsb) {
 		xfs_trim_extent(imap, imap->br_startoff,
-				got.br_startoff - imap->br_startoff);
+				cmap->br_startoff - imap->br_startoff);
 		return xfs_inode_need_cow(ip, imap, shared);
 	}
 
 	*shared = true;
-	if (isnullstartblock(got.br_startblock)) {
-		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
+	if (isnullstartblock(cmap->br_startblock)) {
+		xfs_trim_extent(imap, cmap->br_startoff, cmap->br_blockcount);
 		return 0;
 	}
 
 	/* real extent found - no need to allocate */
-	xfs_trim_extent(&got, offset_fsb, count_fsb);
-	*imap = got;
+	xfs_trim_extent(cmap, offset_fsb, count_fsb);
 	*found = true;
 	return 0;
 }
@@ -348,6 +347,7 @@ int
 xfs_reflink_allocate_cow(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
 	bool			*shared,
 	uint			*lockmode,
 	bool			convert_now)
@@ -367,7 +367,7 @@ xfs_reflink_allocate_cow(
 		xfs_ifork_init_cow(ip);
 	}
 
-	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
 	if (error || !*shared)
 		return error;
 	if (found)
@@ -392,7 +392,7 @@ xfs_reflink_allocate_cow(
 	/*
 	 * Check for an overlapping extent again now that we dropped the ilock.
 	 */
-	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
 	if (error || !*shared)
 		goto out_trans_cancel;
 	if (found) {
@@ -411,7 +411,7 @@ xfs_reflink_allocate_cow(
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
 			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
-			resblks, imap, &nimaps);
+			resblks, cmap, &nimaps);
 	if (error)
 		goto out_unreserve;
 
@@ -427,15 +427,15 @@ xfs_reflink_allocate_cow(
 	if (nimaps == 0)
 		return -ENOSPC;
 convert:
-	xfs_trim_extent(imap, offset_fsb, count_fsb);
+	xfs_trim_extent(cmap, offset_fsb, count_fsb);
 	/*
 	 * COW fork extents are supposed to remain unwritten until we're ready
 	 * to initiate a disk write.  For direct I/O we are going to write the
 	 * data and need the conversion, but for buffered writes we're done.
 	 */
-	if (!convert_now || imap->br_state == XFS_EXT_NORM)
+	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
 		return 0;
-	trace_xfs_reflink_convert_cow(ip, imap);
+	trace_xfs_reflink_convert_cow(ip, cmap);
 	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
 
 out_unreserve:
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 28a43b7f581d..d18ad7f4fb64 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -25,8 +25,8 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
 		bool *shared);
 
-extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
+int xfs_reflink_allocate_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
+		struct xfs_bmbt_irec *cmap, bool *shared, uint *lockmode,
 		bool convert_now);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (9 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:30   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 12/19] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Rejuggle the return path to prepare for filling out a source iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 48 +++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index b228d1dbf59f..18a0f8a5d8c9 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -539,7 +539,6 @@ xfs_file_iomap_begin_delay(
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
 	bool			eof = false, cow_eof = false, shared = false;
-	u16			iomap_flags = 0;
 	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 
@@ -600,8 +599,7 @@ xfs_file_iomap_begin_delay(
 				&ccur, &cmap);
 		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
 			trace_xfs_reflink_cow_found(ip, &cmap);
-			whichfork = XFS_COW_FORK;
-			goto done;
+			goto found_cow;
 		}
 	}
 
@@ -615,7 +613,7 @@ xfs_file_iomap_begin_delay(
 		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
 			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
 					&imap);
-			goto done;
+			goto found_imap;
 		}
 
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
@@ -629,7 +627,7 @@ xfs_file_iomap_begin_delay(
 		if (!shared) {
 			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
 					&imap);
-			goto done;
+			goto found_imap;
 		}
 
 		/*
@@ -703,35 +701,37 @@ xfs_file_iomap_begin_delay(
 		goto out_unlock;
 	}
 
+	if (whichfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+		goto found_cow;
+	}
+
 	/*
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
-	if (whichfork == XFS_DATA_FORK) {
-		iomap_flags |= IOMAP_F_NEW;
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
-	} else {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
-	}
-done:
-	if (whichfork == XFS_COW_FORK) {
-		if (imap.br_startoff > offset_fsb) {
-			xfs_trim_extent(&cmap, offset_fsb,
-					imap.br_startoff - offset_fsb);
-			error = xfs_bmbt_to_iomap(ip, iomap, &cmap,
-					IOMAP_F_SHARED);
-			goto out_unlock;
-		}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (imap.br_startoff <= offset_fsb) {
 		/* ensure we only report blocks we have a reservation for */
 		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
-		shared = true;
+		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
 	}
-	if (shared)
-		iomap_flags |= IOMAP_F_SHARED;
-	error = xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
+	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
+
 }
 
 int
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 12/19] xfs: fill out the srcmap in iomap_begin
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (10 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:52   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Replace our local hacks to report the source block in the main iomap
with the proper scrmap reporting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 18a0f8a5d8c9..d12eacdc9bba 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
 	loff_t			offset,
 	loff_t			count,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
 found_cow:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (imap.br_startoff <= offset_fsb) {
-		/* ensure we only report blocks we have a reservation for */
-		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
-		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb,
+				imap.br_startoff - offset_fsb);
 	}
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
 
 out_unlock:
@@ -933,7 +936,7 @@ xfs_file_iomap_begin(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_bmbt_irec	imap;
+	struct xfs_bmbt_irec	imap, cmap;
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
@@ -947,7 +950,7 @@ xfs_file_iomap_begin(
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
-				iomap);
+				iomap, srcmap);
 	}
 
 	/*
@@ -987,9 +990,6 @@ xfs_file_iomap_begin(
 	 * been done up front, so we don't need to do them here.
 	 */
 	if (xfs_is_cow_inode(ip)) {
-		struct xfs_bmbt_irec	cmap;
-		bool			directio = (flags & IOMAP_DIRECT);
-
 		/* if zeroing doesn't need COW allocation, then we are done. */
 		if ((flags & IOMAP_ZERO) &&
 		    !needs_cow_for_zeroing(&imap, nimaps))
@@ -997,23 +997,11 @@ xfs_file_iomap_begin(
 
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
-				&lockmode, directio);
+				&lockmode, flags & IOMAP_DIRECT);
 		if (error)
 			goto out_unlock;
-
-		/*
-		 * For buffered writes we need to report the address of the
-		 * previous block (if there was any) so that the higher level
-		 * write code can perform read-modify-write operations; we
-		 * won't need the CoW fork mapping until writeback.  For direct
-		 * I/O, which must be block aligned, we need to report the
-		 * newly allocated address.  If the data fork has a hole, copy
-		 * the COW fork mapping to avoid allocating to the data fork.
-		 */
-		if (shared &&
-		    (directio || imap.br_startblock == HOLESTARTBLOCK))
-			imap = cmap;
-
+		if (shared)
+			goto out_found_cow;
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
@@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
 	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
 	goto out_finish;
 
+out_found_cow:
+	xfs_iunlock(ip, lockmode);
+	length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
+	trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
+	if (imap.br_startblock != HOLESTARTBLOCK) {
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	}
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
 out_unlock:
 	xfs_iunlock(ip, lockmode);
 	return error;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (11 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 12/19] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
  2019-09-18 17:55   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 14/19] xfs: split out a new set of read-only iomap ops Christoph Hellwig
                   ` (5 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

We have lots of places that want to calculate the final fsb for
a offset + count in bytes and check that the result fits into
s_maxbytes.  Factor out a helper for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d12eacdc9bba..0ba67a8d8169 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -102,6 +102,17 @@ xfs_hole_to_iomap(
 	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
 }
 
+static inline xfs_fileoff_t
+xfs_iomap_end_fsb(
+	struct xfs_mount	*mp,
+	loff_t			offset,
+	loff_t			count)
+{
+	ASSERT(offset <= mp->m_super->s_maxbytes);
+	return min(XFS_B_TO_FSB(mp, offset + count),
+		   XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+}
+
 xfs_extlen_t
 xfs_eof_alignment(
 	struct xfs_inode	*ip,
@@ -172,8 +183,8 @@ xfs_iomap_write_direct(
 	int		nmaps)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	xfs_fileoff_t	offset_fsb;
-	xfs_fileoff_t	last_fsb;
+	xfs_fileoff_t	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t	last_fsb = xfs_iomap_end_fsb(mp, offset, count);
 	xfs_filblks_t	count_fsb, resaligned;
 	xfs_extlen_t	extsz;
 	int		nimaps;
@@ -192,8 +203,6 @@ xfs_iomap_write_direct(
 
 	ASSERT(xfs_isilocked(ip, lockmode));
 
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
 	if ((offset + count) > XFS_ISIZE(ip)) {
 		/*
 		 * Assert that the in-core extent list is present since this can
@@ -533,9 +542,7 @@ xfs_file_iomap_begin_delay(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	xfs_fileoff_t		maxbytes_fsb =
-		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
-	xfs_fileoff_t		end_fsb;
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 	struct xfs_bmbt_irec	imap, cmap;
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
@@ -565,8 +572,6 @@ xfs_file_iomap_begin_delay(
 			goto out_unlock;
 	}
 
-	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
-
 	/*
 	 * Search the data fork fork first to look up our source mapping.  We
 	 * always need the data fork map, as we have to return it to the
@@ -648,7 +653,7 @@ xfs_file_iomap_begin_delay(
 		 * the lower level functions are updated.
 		 */
 		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
-		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
+		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 
 		if (xfs_is_always_cow_inode(ip))
 			whichfork = XFS_COW_FORK;
@@ -674,7 +679,8 @@ xfs_file_iomap_begin_delay(
 			if (align)
 				p_end_fsb = roundup_64(p_end_fsb, align);
 
-			p_end_fsb = min(p_end_fsb, maxbytes_fsb);
+			p_end_fsb = min(p_end_fsb,
+				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
 			ASSERT(p_end_fsb > offset_fsb);
 			prealloc_blocks = p_end_fsb - end_fsb;
 		}
@@ -937,7 +943,8 @@ xfs_file_iomap_begin(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_bmbt_irec	imap, cmap;
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
 	u16			iomap_flags = 0;
@@ -963,12 +970,6 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	ASSERT(offset <= mp->m_super->s_maxbytes);
-	if (offset > mp->m_super->s_maxbytes - length)
-		length = mp->m_super->s_maxbytes - offset;
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	end_fsb = XFS_B_TO_FSB(mp, offset + length);
-
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
 			       &nimaps, 0);
 	if (error)
@@ -1189,8 +1190,7 @@ xfs_seek_iomap_begin(
 		/*
 		 * Fake a hole until the end of the file.
 		 */
-		data_fsb = min(XFS_B_TO_FSB(mp, offset + length),
-			       XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+		data_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	}
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 14/19] xfs: split out a new set of read-only iomap ops
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (12 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:56   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Start untangling xfs_file_iomap_begin by splitting out the read-only
case into its own set of iomap_ops with a very simply iomap_begin
helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  |  9 ++++---
 fs/xfs/xfs_file.c  |  8 +++---
 fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
 fs/xfs/xfs_iomap.h |  1 +
 fs/xfs/xfs_iops.c  |  2 +-
 5 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4e4a4d7df5ac..fe47a4d52247 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1158,7 +1158,7 @@ xfs_vm_bmap(
 	 */
 	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
 		return 0;
-	return iomap_bmap(mapping, block, &xfs_iomap_ops);
+	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
 }
 
 STATIC int
@@ -1167,7 +1167,7 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
-	return iomap_readpage(page, &xfs_iomap_ops);
+	return iomap_readpage(page, &xfs_read_iomap_ops);
 }
 
 STATIC int
@@ -1178,7 +1178,7 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
-	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
+	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
 }
 
 static int
@@ -1188,7 +1188,8 @@ xfs_iomap_swapfile_activate(
 	sector_t			*span)
 {
 	sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
-	return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
+	return iomap_swapfile_activate(sis, swap_file, span,
+			&xfs_read_iomap_ops);
 }
 
 const struct address_space_operations xfs_address_space_operations = {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 21bd3d575aaa..d8e2e36b84f7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -187,7 +187,7 @@ xfs_file_dio_aio_read(
 	file_accessed(iocb->ki_filp);
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
-	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -214,7 +214,7 @@ xfs_file_dax_read(
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
 
-	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
+	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	file_accessed(iocb->ki_filp);
@@ -1131,7 +1131,9 @@ __xfs_filemap_fault(
 	if (IS_DAX(inode)) {
 		pfn_t pfn;
 
-		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
+				(write_fault && !vmf->cow_page) ?
+				 &xfs_iomap_ops : &xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 	} else {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0ba67a8d8169..d0ed0cba9041 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -950,11 +950,13 @@ xfs_file_iomap_begin(
 	u16			iomap_flags = 0;
 	unsigned		lockmode;
 
+	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
-			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
+	    !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
 				iomap, srcmap);
@@ -975,17 +977,6 @@ xfs_file_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	if (flags & IOMAP_REPORT) {
-		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
-		if (error)
-			goto out_unlock;
-	}
-
-	/* Non-modifying mapping requested, so we are done */
-	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
-		goto out_found;
-
 	/*
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
@@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
 
 out_finish:
-	if (shared)
-		iomap_flags |= IOMAP_F_SHARED;
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
 
 out_found:
@@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
 	.iomap_end		= xfs_file_iomap_end,
 };
 
+static int
+xfs_read_iomap_begin(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			length,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_bmbt_irec	imap;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
+	int			nimaps = 1, error = 0;
+	bool			shared = false;
+	unsigned		lockmode;
+
+	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
+	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+	if (error)
+		return error;
+	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
+			       &nimaps, 0);
+	if (!error && (flags & IOMAP_REPORT))
+		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+	xfs_iunlock(ip, lockmode);
+
+	if (error)
+		return error;
+	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
+}
+
+const struct iomap_ops xfs_read_iomap_ops = {
+	.iomap_begin		= xfs_read_iomap_begin,
+};
+
 static int
 xfs_seek_iomap_begin(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 71d0ae460c44..61b1fc3e5143 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
 }
 
 extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ff3c1fae5357..f0c7df82e18b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1113,7 +1113,7 @@ xfs_vn_fiemap(
 				&xfs_xattr_iomap_ops);
 	} else {
 		error = iomap_fiemap(inode, fieinfo, start, length,
-				&xfs_iomap_ops);
+				&xfs_read_iomap_ops);
 	}
 	xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (13 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 14/19] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 17:59   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Move xfs_file_iomap_begin_delay near the end of the file next to the
other iomap functions to prepare for additional refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 434 +++++++++++++++++++++++----------------------
 1 file changed, 221 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index d0ed0cba9041..7ae55503ee27 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -530,219 +530,6 @@ xfs_iomap_prealloc_size(
 	return alloc_blocks;
 }
 
-static int
-xfs_file_iomap_begin_delay(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			count,
-	unsigned		flags,
-	struct iomap		*iomap,
-	struct iomap		*srcmap)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
-	struct xfs_bmbt_irec	imap, cmap;
-	struct xfs_iext_cursor	icur, ccur;
-	xfs_fsblock_t		prealloc_blocks = 0;
-	bool			eof = false, cow_eof = false, shared = false;
-	int			whichfork = XFS_DATA_FORK;
-	int			error = 0;
-
-	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-	ASSERT(!xfs_get_extsz_hint(ip));
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-	if (unlikely(XFS_TEST_ERROR(
-	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
-	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
-		error = -EFSCORRUPTED;
-		goto out_unlock;
-	}
-
-	XFS_STATS_INC(mp, xs_blk_mapw);
-
-	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
-		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
-		if (error)
-			goto out_unlock;
-	}
-
-	/*
-	 * Search the data fork fork first to look up our source mapping.  We
-	 * always need the data fork map, as we have to return it to the
-	 * iomap code so that the higher level write code can read data in to
-	 * perform read-modify-write cycles for unaligned writes.
-	 */
-	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
-	if (eof)
-		imap.br_startoff = end_fsb; /* fake hole until the end */
-
-	/* We never need to allocate blocks for zeroing a hole. */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
-		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
-		goto out_unlock;
-	}
-
-	/*
-	 * Search the COW fork extent list even if we did not find a data fork
-	 * extent.  This serves two purposes: first this implements the
-	 * speculative preallocation using cowextsize, so that we also unshare
-	 * block adjacent to shared blocks instead of just the shared blocks
-	 * themselves.  Second the lookup in the extent list is generally faster
-	 * than going out to the shared extent tree.
-	 */
-	if (xfs_is_cow_inode(ip)) {
-		if (!ip->i_cowfp) {
-			ASSERT(!xfs_is_reflink_inode(ip));
-			xfs_ifork_init_cow(ip);
-		}
-		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
-				&ccur, &cmap);
-		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
-			trace_xfs_reflink_cow_found(ip, &cmap);
-			goto found_cow;
-		}
-	}
-
-	if (imap.br_startoff <= offset_fsb) {
-		/*
-		 * For reflink files we may need a delalloc reservation when
-		 * overwriting shared extents.   This includes zeroing of
-		 * existing extents that contain data.
-		 */
-		if (!xfs_is_cow_inode(ip) ||
-		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
-			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
-					&imap);
-			goto found_imap;
-		}
-
-		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
-
-		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_inode_need_cow(ip, &imap, &shared);
-		if (error)
-			goto out_unlock;
-
-		/* Not shared?  Just report the (potentially capped) extent. */
-		if (!shared) {
-			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
-					&imap);
-			goto found_imap;
-		}
-
-		/*
-		 * Fork all the shared blocks from our write offset until the
-		 * end of the extent.
-		 */
-		whichfork = XFS_COW_FORK;
-		end_fsb = imap.br_startoff + imap.br_blockcount;
-	} else {
-		/*
-		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
-		 * pages to keep the chunks of work done where somewhat
-		 * symmetric with the work writeback does.  This is a completely
-		 * arbitrary number pulled out of thin air.
-		 *
-		 * Note that the values needs to be less than 32-bits wide until
-		 * the lower level functions are updated.
-		 */
-		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
-		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
-
-		if (xfs_is_always_cow_inode(ip))
-			whichfork = XFS_COW_FORK;
-	}
-
-	error = xfs_qm_dqattach_locked(ip, false);
-	if (error)
-		goto out_unlock;
-
-	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
-				count, &icur);
-		if (prealloc_blocks) {
-			xfs_extlen_t	align;
-			xfs_off_t	end_offset;
-			xfs_fileoff_t	p_end_fsb;
-
-			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
-			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
-					prealloc_blocks;
-
-			align = xfs_eof_alignment(ip, 0);
-			if (align)
-				p_end_fsb = roundup_64(p_end_fsb, align);
-
-			p_end_fsb = min(p_end_fsb,
-				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
-			ASSERT(p_end_fsb > offset_fsb);
-			prealloc_blocks = p_end_fsb - end_fsb;
-		}
-	}
-
-retry:
-	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks,
-			whichfork == XFS_DATA_FORK ? &imap : &cmap,
-			whichfork == XFS_DATA_FORK ? &icur : &ccur,
-			whichfork == XFS_DATA_FORK ? eof : cow_eof);
-	switch (error) {
-	case 0:
-		break;
-	case -ENOSPC:
-	case -EDQUOT:
-		/* retry without any preallocation */
-		trace_xfs_delalloc_enospc(ip, offset, count);
-		if (prealloc_blocks) {
-			prealloc_blocks = 0;
-			goto retry;
-		}
-		/*FALLTHRU*/
-	default:
-		goto out_unlock;
-	}
-
-	if (whichfork == XFS_COW_FORK) {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
-		goto found_cow;
-	}
-
-	/*
-	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
-	 * them out if the write happens to fail.
-	 */
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
-
-found_imap:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
-
-found_cow:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
-		if (error)
-			return error;
-	} else {
-		xfs_trim_extent(&cmap, offset_fsb,
-				imap.br_startoff - offset_fsb);
-	}
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
-
-out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return error;
-
-}
-
 int
 xfs_iomap_write_unwritten(
 	xfs_inode_t	*ip,
@@ -931,6 +718,15 @@ xfs_ilock_for_iomap(
 	return 0;
 }
 
+static int
+xfs_file_iomap_begin_delay(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			count,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap);
+
 static int
 xfs_file_iomap_begin(
 	struct inode		*inode,
@@ -1061,6 +857,218 @@ xfs_file_iomap_begin(
 	return error;
 }
 
+static int
+xfs_file_iomap_begin_delay(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			count,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+	struct xfs_bmbt_irec	imap, cmap;
+	struct xfs_iext_cursor	icur, ccur;
+	xfs_fsblock_t		prealloc_blocks = 0;
+	bool			eof = false, cow_eof = false, shared = false;
+	int			whichfork = XFS_DATA_FORK;
+	int			error = 0;
+
+	ASSERT(!XFS_IS_REALTIME_INODE(ip));
+	ASSERT(!xfs_get_extsz_hint(ip));
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	if (unlikely(XFS_TEST_ERROR(
+	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
+	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+		error = -EFSCORRUPTED;
+		goto out_unlock;
+	}
+
+	XFS_STATS_INC(mp, xs_blk_mapw);
+
+	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+		if (error)
+			goto out_unlock;
+	}
+
+	/*
+	 * Search the data fork fork first to look up our source mapping.  We
+	 * always need the data fork map, as we have to return it to the
+	 * iomap code so that the higher level write code can read data in to
+	 * perform read-modify-write cycles for unaligned writes.
+	 */
+	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
+	if (eof)
+		imap.br_startoff = end_fsb; /* fake hole until the end */
+
+	/* We never need to allocate blocks for zeroing a hole. */
+	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
+		goto out_unlock;
+	}
+
+	/*
+	 * Search the COW fork extent list even if we did not find a data fork
+	 * extent.  This serves two purposes: first this implements the
+	 * speculative preallocation using cowextsize, so that we also unshare
+	 * block adjacent to shared blocks instead of just the shared blocks
+	 * themselves.  Second the lookup in the extent list is generally faster
+	 * than going out to the shared extent tree.
+	 */
+	if (xfs_is_cow_inode(ip)) {
+		if (!ip->i_cowfp) {
+			ASSERT(!xfs_is_reflink_inode(ip));
+			xfs_ifork_init_cow(ip);
+		}
+		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
+				&ccur, &cmap);
+		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
+			trace_xfs_reflink_cow_found(ip, &cmap);
+			goto found_cow;
+		}
+	}
+
+	if (imap.br_startoff <= offset_fsb) {
+		/*
+		 * For reflink files we may need a delalloc reservation when
+		 * overwriting shared extents.   This includes zeroing of
+		 * existing extents that contain data.
+		 */
+		if (!xfs_is_cow_inode(ip) ||
+		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto found_imap;
+		}
+
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
+
+		/* Trim the mapping to the nearest shared extent boundary. */
+		error = xfs_inode_need_cow(ip, &imap, &shared);
+		if (error)
+			goto out_unlock;
+
+		/* Not shared?  Just report the (potentially capped) extent. */
+		if (!shared) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto found_imap;
+		}
+
+		/*
+		 * Fork all the shared blocks from our write offset until the
+		 * end of the extent.
+		 */
+		whichfork = XFS_COW_FORK;
+		end_fsb = imap.br_startoff + imap.br_blockcount;
+	} else {
+		/*
+		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
+		 * pages to keep the chunks of work done where somewhat
+		 * symmetric with the work writeback does.  This is a completely
+		 * arbitrary number pulled out of thin air.
+		 *
+		 * Note that the values needs to be less than 32-bits wide until
+		 * the lower level functions are updated.
+		 */
+		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
+		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+
+		if (xfs_is_always_cow_inode(ip))
+			whichfork = XFS_COW_FORK;
+	}
+
+	error = xfs_qm_dqattach_locked(ip, false);
+	if (error)
+		goto out_unlock;
+
+	if (eof) {
+		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+				count, &icur);
+		if (prealloc_blocks) {
+			xfs_extlen_t	align;
+			xfs_off_t	end_offset;
+			xfs_fileoff_t	p_end_fsb;
+
+			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
+			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
+					prealloc_blocks;
+
+			align = xfs_eof_alignment(ip, 0);
+			if (align)
+				p_end_fsb = roundup_64(p_end_fsb, align);
+
+			p_end_fsb = min(p_end_fsb,
+		   		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+			ASSERT(p_end_fsb > offset_fsb);
+			prealloc_blocks = p_end_fsb - end_fsb;
+		}
+	}
+
+retry:
+	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+			end_fsb - offset_fsb, prealloc_blocks,
+			whichfork == XFS_DATA_FORK ? &imap : &cmap,
+			whichfork == XFS_DATA_FORK ? &icur : &ccur,
+			whichfork == XFS_DATA_FORK ? eof : cow_eof);
+	switch (error) {
+	case 0:
+		break;
+	case -ENOSPC:
+	case -EDQUOT:
+		/* retry without any preallocation */
+		trace_xfs_delalloc_enospc(ip, offset, count);
+		if (prealloc_blocks) {
+			prealloc_blocks = 0;
+			goto retry;
+		}
+		/*FALLTHRU*/
+	default:
+		goto out_unlock;
+	}
+
+	if (whichfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+		goto found_cow;
+	}
+
+	/*
+	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+	 * them out if the write happens to fail.
+	 */
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (imap.br_startoff <= offset_fsb) {
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb,
+				imap.br_startoff - offset_fsb);
+	}
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
 static int
 xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (14 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 18:00   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Instead of lots of magic conditionals in the main write_begin
handler this make the intent very clear.  Thing will become even
better once we support delayed allocations for extent size hints
and realtime allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c |  3 ++-
 fs/xfs/xfs_file.c      | 16 ++++++-----
 fs/xfs/xfs_iomap.c     | 61 +++++++++++++++---------------------------
 fs/xfs/xfs_iomap.h     |  3 ++-
 fs/xfs/xfs_iops.c      |  4 +--
 fs/xfs/xfs_reflink.c   |  5 ++--
 6 files changed, 40 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 98c6a7a71427..a4a4813e9cc6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1111,7 +1111,8 @@ xfs_free_file_space(
 		return 0;
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
-	error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
+	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
+			&xfs_buffered_write_iomap_ops);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d8e2e36b84f7..b2ea030485e9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -350,7 +350,7 @@ xfs_file_aio_write_checks(
 	
 		trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
 		error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-				NULL, &xfs_iomap_ops);
+				NULL, &xfs_buffered_write_iomap_ops);
 		if (error)
 			return error;
 	} else
@@ -546,7 +546,8 @@ xfs_file_dio_aio_write(
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
-	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, &xfs_dio_write_ops);
+	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
+			   &xfs_dio_write_ops);
 
 	/*
 	 * If unaligned, this is the only IO in-flight. If it has not yet
@@ -593,7 +594,7 @@ xfs_file_dax_write(
 	count = iov_iter_count(from);
 
 	trace_xfs_file_dax_write(ip, count, pos);
-	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
+	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
 	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
 		i_size_write(inode, iocb->ki_pos);
 		error = xfs_setfilesize(ip, pos, ret);
@@ -640,7 +641,8 @@ xfs_file_buffered_aio_write(
 	current->backing_dev_info = inode_to_bdi(inode);
 
 	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
-	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
+	ret = iomap_file_buffered_write(iocb, from,
+			&xfs_buffered_write_iomap_ops);
 	if (likely(ret >= 0))
 		iocb->ki_pos += ret;
 
@@ -1133,12 +1135,14 @@ __xfs_filemap_fault(
 
 		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
 				(write_fault && !vmf->cow_page) ?
-				 &xfs_iomap_ops : &xfs_read_iomap_ops);
+				 &xfs_direct_write_iomap_ops :
+				 &xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 	} else {
 		if (write_fault)
-			ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
+			ret = iomap_page_mkwrite(vmf,
+					&xfs_buffered_write_iomap_ops);
 		else
 			ret = filemap_fault(vmf);
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7ae55503ee27..6dd143374d75 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -719,16 +719,7 @@ xfs_ilock_for_iomap(
 }
 
 static int
-xfs_file_iomap_begin_delay(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			count,
-	unsigned		flags,
-	struct iomap		*iomap,
-	struct iomap		*srcmap);
-
-static int
-xfs_file_iomap_begin(
+xfs_direct_write_iomap_begin(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			length,
@@ -751,13 +742,6 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
-	    !xfs_get_extsz_hint(ip)) {
-		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
-				iomap, srcmap);
-	}
-
 	/*
 	 * Lock the inode in the manner required for the specified operation and
 	 * check for as many conditions that would result in blocking as
@@ -857,8 +841,12 @@ xfs_file_iomap_begin(
 	return error;
 }
 
+const struct iomap_ops xfs_direct_write_iomap_ops = {
+	.iomap_begin		= xfs_direct_write_iomap_begin,
+};
+
 static int
-xfs_file_iomap_begin_delay(
+xfs_buffered_write_iomap_begin(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			count,
@@ -877,8 +865,12 @@ xfs_file_iomap_begin_delay(
 	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 
+	/* we can't use delayed allocations when using extent size hints */
+	if (xfs_get_extsz_hint(ip))
+		return xfs_direct_write_iomap_begin(inode, offset, count,
+				flags, iomap, srcmap);
+
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-	ASSERT(!xfs_get_extsz_hint(ip));
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
@@ -1070,18 +1062,23 @@ xfs_file_iomap_begin_delay(
 }
 
 static int
-xfs_file_iomap_end_delalloc(
-	struct xfs_inode	*ip,
+xfs_buffered_write_iomap_end(
+	struct inode		*inode,
 	loff_t			offset,
 	loff_t			length,
 	ssize_t			written,
+	unsigned		flags,
 	struct iomap		*iomap)
 {
+	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
 	xfs_fileoff_t		end_fsb;
 	int			error = 0;
 
+	if (iomap->type != IOMAP_DELALLOC)
+		return 0;
+
 	/*
 	 * Behave as if the write failed if drop writes is enabled. Set the NEW
 	 * flag to force delalloc cleanup.
@@ -1126,25 +1123,9 @@ xfs_file_iomap_end_delalloc(
 	return 0;
 }
 
-static int
-xfs_file_iomap_end(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			length,
-	ssize_t			written,
-	unsigned		flags,
-	struct iomap		*iomap)
-{
-	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
-	    iomap->type == IOMAP_DELALLOC)
-		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written, iomap);
-	return 0;
-}
-
-const struct iomap_ops xfs_iomap_ops = {
-	.iomap_begin		= xfs_file_iomap_begin,
-	.iomap_end		= xfs_file_iomap_end,
+const struct iomap_ops xfs_buffered_write_iomap_ops = {
+	.iomap_begin		= xfs_buffered_write_iomap_begin,
+	.iomap_end		= xfs_buffered_write_iomap_end,
 };
 
 static int
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 61b1fc3e5143..7aed28275089 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -39,7 +39,8 @@ xfs_aligned_fsb_count(
 	return count_fsb;
 }
 
-extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_buffered_write_iomap_ops;
+extern const struct iomap_ops xfs_direct_write_iomap_ops;
 extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f0c7df82e18b..5c1154e761ca 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -882,10 +882,10 @@ xfs_setattr_size(
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_iomap_ops);
+				&did_zeroing, &xfs_buffered_write_iomap_ops);
 	} else {
 		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_iomap_ops);
+				&xfs_buffered_write_iomap_ops);
 	}
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 673018a618f0..c68700c3a64a 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1270,7 +1270,7 @@ xfs_reflink_zero_posteof(
 
 	trace_xfs_zero_eof(ip, isize, pos - isize);
 	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
-			&xfs_iomap_ops);
+			&xfs_buffered_write_iomap_ops);
 }
 
 /*
@@ -1532,7 +1532,8 @@ xfs_reflink_unshare(
 
 	inode_dio_wait(inode);
 
-	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
+	error = iomap_file_unshare(inode, offset, len,
+			&xfs_buffered_write_iomap_ops);
 	if (error)
 		goto out;
 	error = filemap_write_and_wait(inode->i_mapping);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (15 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
  2019-09-18 18:00   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
  2019-09-09 18:27 ` [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
  18 siblings, 2 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
the usage of this variable a little more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6dd143374d75..0e575ca1e3fc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -862,7 +862,7 @@ xfs_buffered_write_iomap_begin(
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
 	bool			eof = false, cow_eof = false, shared = false;
-	int			whichfork = XFS_DATA_FORK;
+	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 
 	/* we can't use delayed allocations when using extent size hints */
@@ -959,7 +959,7 @@ xfs_buffered_write_iomap_begin(
 		 * Fork all the shared blocks from our write offset until the
 		 * end of the extent.
 		 */
-		whichfork = XFS_COW_FORK;
+		allocfork = XFS_COW_FORK;
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 	} else {
 		/*
@@ -975,7 +975,7 @@ xfs_buffered_write_iomap_begin(
 		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 
 		if (xfs_is_always_cow_inode(ip))
-			whichfork = XFS_COW_FORK;
+			allocfork = XFS_COW_FORK;
 	}
 
 	error = xfs_qm_dqattach_locked(ip, false);
@@ -983,7 +983,7 @@ xfs_buffered_write_iomap_begin(
 		goto out_unlock;
 
 	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
 				count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
@@ -1006,11 +1006,11 @@ xfs_buffered_write_iomap_begin(
 	}
 
 retry:
-	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
 			end_fsb - offset_fsb, prealloc_blocks,
-			whichfork == XFS_DATA_FORK ? &imap : &cmap,
-			whichfork == XFS_DATA_FORK ? &icur : &ccur,
-			whichfork == XFS_DATA_FORK ? eof : cow_eof);
+			allocfork == XFS_DATA_FORK ? &imap : &cmap,
+			allocfork == XFS_DATA_FORK ? &icur : &ccur,
+			allocfork == XFS_DATA_FORK ? eof : cow_eof);
 	switch (error) {
 	case 0:
 		break;
@@ -1027,8 +1027,8 @@ xfs_buffered_write_iomap_begin(
 		goto out_unlock;
 	}
 
-	if (whichfork == XFS_COW_FORK) {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+	if (allocfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
 		goto found_cow;
 	}
 
@@ -1037,7 +1037,7 @@ xfs_buffered_write_iomap_begin(
 	 * them out if the write happens to fail.
 	 */
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
 
 found_imap:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (16 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 18:06   ` Darrick J. Wong
  2019-09-09 18:27 ` [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Move more checks into the helpers that determine if we need a COW
operation or allocation and split the return path for when an existing
data for allocation has been found versus a new allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 74 ++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0e575ca1e3fc..e4e79aa5b695 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -642,23 +642,42 @@ xfs_iomap_write_unwritten(
 static inline bool
 imap_needs_alloc(
 	struct inode		*inode,
+	unsigned		flags,
 	struct xfs_bmbt_irec	*imap,
 	int			nimaps)
 {
-	return !nimaps ||
-		imap->br_startblock == HOLESTARTBLOCK ||
-		imap->br_startblock == DELAYSTARTBLOCK ||
-		(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
+	/* don't allocate blocks when just zeroing */
+	if (flags & IOMAP_ZERO)
+		return false;
+	if (!nimaps ||
+	    imap->br_startblock == HOLESTARTBLOCK ||
+	    imap->br_startblock == DELAYSTARTBLOCK)
+		return true;
+	/* we convert unwritten extents before copying the data for DAX */
+	if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+		return true;
+	return false;
 }
 
 static inline bool
-needs_cow_for_zeroing(
+imap_needs_cow(
+	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
+	unsigned int		flags,
 	int			nimaps)
 {
-	return nimaps &&
-		imap->br_startblock != HOLESTARTBLOCK &&
-		imap->br_state != XFS_EXT_UNWRITTEN;
+	if (!xfs_is_cow_inode(ip))
+		return false;
+
+	/* when zeroing we don't have to COW holes or unwritten extents */
+	if (flags & IOMAP_ZERO) {
+		if (!nimaps ||
+		    imap->br_startblock == HOLESTARTBLOCK ||
+		    imap->br_state == XFS_EXT_UNWRITTEN)
+			return false;
+	}
+
+	return true;
 }
 
 static int
@@ -734,7 +753,6 @@ xfs_direct_write_iomap_begin(
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
-	u16			iomap_flags = 0;
 	unsigned		lockmode;
 
 	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
@@ -761,12 +779,7 @@ xfs_direct_write_iomap_begin(
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
 	 */
-	if (xfs_is_cow_inode(ip)) {
-		/* if zeroing doesn't need COW allocation, then we are done. */
-		if ((flags & IOMAP_ZERO) &&
-		    !needs_cow_for_zeroing(&imap, nimaps))
-			goto out_found;
-
+	if (imap_needs_cow(ip, &imap, flags, nimaps)) {
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
 				&lockmode, flags & IOMAP_DIRECT);
@@ -778,18 +791,17 @@ xfs_direct_write_iomap_begin(
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
 
-	/* Don't need to allocate over holes when doing zeroing operations. */
-	if (flags & IOMAP_ZERO)
-		goto out_found;
+	if (imap_needs_alloc(inode, flags, &imap, nimaps))
+		goto allocate_blocks;
 
-	if (!imap_needs_alloc(inode, &imap, nimaps))
-		goto out_found;
+	xfs_iunlock(ip, lockmode);
+	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
 
-	/* If nowait is set bail since we are going to make allocations. */
-	if (flags & IOMAP_NOWAIT) {
-		error = -EAGAIN;
+allocate_blocks:
+	error = -EAGAIN;
+	if (flags & IOMAP_NOWAIT)
 		goto out_unlock;
-	}
 
 	/*
 	 * We cap the maximum length we map to a sane size  to keep the chunks
@@ -808,22 +820,12 @@ xfs_direct_write_iomap_begin(
 	 */
 	if (lockmode == XFS_ILOCK_EXCL)
 		xfs_ilock_demote(ip, lockmode);
-	error = xfs_iomap_write_direct(ip, offset, length, &imap,
-			nimaps);
+	error = xfs_iomap_write_direct(ip, offset, length, &imap, nimaps);
 	if (error)
 		return error;
 
-	iomap_flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
-
-out_finish:
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
-
-out_found:
-	ASSERT(nimaps);
-	xfs_iunlock(ip, lockmode);
-	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
-	goto out_finish;
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
 
 out_found_cow:
 	xfs_iunlock(ip, lockmode);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes
  2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
                   ` (17 preceding siblings ...)
  2019-09-09 18:27 ` [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
@ 2019-09-09 18:27 ` Christoph Hellwig
  2019-09-18 18:09   ` Darrick J. Wong
  18 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-09 18:27 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Only bail out once we know that a COW allocation is actually required,
similar to how we handle normal data fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e4e79aa5b695..9e1b4b94acac 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -693,15 +693,8 @@ xfs_ilock_for_iomap(
 	 * COW writes may allocate delalloc space or convert unwritten COW
 	 * extents, so we need to make sure to take the lock exclusively here.
 	 */
-	if (xfs_is_cow_inode(ip) && is_write) {
-		/*
-		 * FIXME: It could still overwrite on unshared extents and not
-		 * need allocation.
-		 */
-		if (flags & IOMAP_NOWAIT)
-			return -EAGAIN;
+	if (xfs_is_cow_inode(ip) && is_write)
 		mode = XFS_ILOCK_EXCL;
-	}
 
 	/*
 	 * Extents not yet cached requires exclusive access, don't block.  This
@@ -760,12 +753,6 @@ xfs_direct_write_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * Lock the inode in the manner required for the specified operation and
-	 * check for as many conditions that would result in blocking as
-	 * possible. This removes most of the non-blocking checks from the
-	 * mapping code below.
-	 */
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -775,11 +762,11 @@ xfs_direct_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * Break shared extents if necessary. Checks for non-blocking IO have
-	 * been done up front, so we don't need to do them here.
-	 */
 	if (imap_needs_cow(ip, &imap, flags, nimaps)) {
+		error = -EAGAIN;
+		if (flags & IOMAP_NOWAIT)
+			goto out_unlock;
+
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
 				&lockmode, flags & IOMAP_DIRECT);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-09 18:27 ` [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-09-10 12:48   ` Goldwyn Rodrigues
  2019-09-10 14:39     ` hch
  2019-09-16 18:42   ` Darrick J. Wong
  1 sibling, 1 reply; 54+ messages in thread
From: Goldwyn Rodrigues @ 2019-09-10 12:48 UTC (permalink / raw)
  To: hch, linux-xfs, linux-fsdevel

On Mon, 2019-09-09 at 20:27 +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The srcmap is used to identify where the read is to be performed
> from.
> It is passed to ->iomap_begin, which can fill it in if we need to
> read
> data for partially written blocks from a different location than the
> write target.  The srcmap is only supported for buffered writes so
> far.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
>       srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c               |  9 ++++--
>  fs/ext2/inode.c        |  2 +-
>  fs/ext4/inode.c        |  2 +-
>  fs/gfs2/bmap.c         |  3 +-
>  fs/iomap/apply.c       | 23 +++++++++++----
>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-----------------
> --
>  fs/iomap/direct-io.c   |  2 +-
>  fs/iomap/fiemap.c      |  4 +--
>  fs/iomap/seek.c        |  4 +--
>  fs/iomap/swapfile.c    |  3 +-
>  fs/xfs/xfs_iomap.c     |  9 ++++--
>  include/linux/iomap.h  |  5 ++--
>  12 files changed, 79 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index a237141d8787..8016be24bbd9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>  
>  static loff_t
>  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void
> *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct block_device *bdev = iomap->bdev;
>  	struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> vm_fault *vmf, pfn_t *pfnp,
>  	unsigned long vaddr = vmf->address;
>  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	unsigned flags = IOMAP_FAULT;
>  	int error, major = 0;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> vm_fault *vmf, pfn_t *pfnp,
>  	 * the file system block size to be equal the page size,
> which means
>  	 * that we never have to deal with more than a single extent
> here.
>  	 */
> -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags,
> &iomap);
> +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags,
> &iomap, &srcmap);
>  	if (iomap_errp)
>  		*iomap_errp = error;
>  	if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> vm_fault *vmf, pfn_t *pfnp,
>  	struct inode *inode = mapping->host;
>  	vm_fault_t result = VM_FAULT_FALLBACK;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	pgoff_t max_pgoff;
>  	void *entry;
>  	loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> vm_fault *vmf, pfn_t *pfnp,
>  	 * to look up our filesystem block.
>  	 */
>  	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags,
> &iomap);
> +	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags,
> &iomap,
> +			&srcmap);
>  	if (error)
>  		goto unlock_entry;
>  
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t
> iblock,
>  
>  #ifdef CONFIG_FS_DAX
>  static int ext2_iomap_begin(struct inode *inode, loff_t offset,
> loff_t length,
> -		unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap
> *srcmap)
>  {
>  	unsigned int blkbits = inode->i_blkbits;
>  	unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 420fe3deed39..e2116e8228d2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3453,7 +3453,7 @@ static bool ext4_inode_datasync_dirty(struct
> inode *inode)
>  }
>  
>  static int ext4_iomap_begin(struct inode *inode, loff_t offset,
> loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap
> *srcmap)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 79581b9bdebb..0bf8e8fa82bd 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1123,7 +1123,8 @@ static int gfs2_iomap_begin_write(struct inode
> *inode, loff_t pos,
>  }
>  
>  static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t
> length,
> -			    unsigned flags, struct iomap *iomap)
> +			    unsigned flags, struct iomap *iomap,
> +			    struct iomap *srcmap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..67efd86675e6 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -24,7 +24,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t
> length, unsigned flags,
>  		const struct iomap_ops *ops, void *data,
> iomap_actor_t actor)
>  {
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	loff_t written = 0, ret;
> +	u64 end;
>  
>  	/*
>  	 * Need to map a range from start position for length bytes.
> This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t
> length, unsigned flags,
>  	 * expose transient stale data. If the reserve fails, we can
> safely
>  	 * back out at this point as there is nothing to undo.
>  	 */
> -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap,
> &srcmap);
>  	if (ret)
>  		return ret;
>  	if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos,
> loff_t length, unsigned flags,
>  	 * Cut down the length to the one actually provided by the
> filesystem,
>  	 * as it might not be able to give us the whole size that we
> requested.
>  	 */
> -	if (iomap.offset + iomap.length < pos + length)
> -		length = iomap.offset + iomap.length - pos;
> +	end = iomap.offset + iomap.length;
> +	if (srcmap.type)
> +		end = min(end, srcmap.offset + srcmap.length);
> +	if (pos + length > end)
> +		length = end - pos;
>  


Yes, that looks more correct. However, can we be smart and not bother
setting the minimum of end and srcmap.offset + srcmap.length if it is
not required? ie in situations where end coincides with block boundary.
 Or if srcmap.length falls short, until the last block boundary of
iomap? 

I did think about this scenario. This case is specific to CoW and
thought this is best handled by filesystem's iomap_begin(). If this
goes in, the filesystems would have to "falsify" srcmap length
information to maximize the amount of I/O that goes in one iteration.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-10 12:48   ` Goldwyn Rodrigues
@ 2019-09-10 14:39     ` hch
  2019-09-16 17:57       ` Darrick J. Wong
  0 siblings, 1 reply; 54+ messages in thread
From: hch @ 2019-09-10 14:39 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: hch, linux-xfs, linux-fsdevel

On Tue, Sep 10, 2019 at 12:48:14PM +0000, Goldwyn Rodrigues wrote:
> > +	if (srcmap.type)
> > +		end = min(end, srcmap.offset + srcmap.length);
> > +	if (pos + length > end)
> > +		length = end - pos;
> >  
> 
> 
> Yes, that looks more correct. However, can we be smart and not bother
> setting the minimum of end and srcmap.offset + srcmap.length if it is
> not required? ie in situations where end coincides with block boundary.
>  Or if srcmap.length falls short, until the last block boundary of
> iomap? 
> 
> I did think about this scenario. This case is specific to CoW and
> thought this is best handled by filesystem's iomap_begin(). If this
> goes in, the filesystems would have to "falsify" srcmap length
> information to maximize the amount of I/O that goes in one iteration.

The problem is really that we can easily run over the srcmap (that's
what happened to me with XFS..)  One thing you've done in btrfs that
I haven't done yet in XFS is to simply not bother with filling out
the srcmap if we don't need to (that is if the iteration is fully
page aligned in your patch set - the unshare op in this series will
complicate things a little).  With that optimization the only case
where the shortening of the iteration that matters is if the start
is unaligned and needs a read-modify-write cycle, but the end is
aligned and beyond the end of the srcmap.  Is that such an important
case?

> 
> -- 
> Goldwyn---end quoted text---

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 01/19] iomap: better document the IOMAP_F_* flags
  2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
@ 2019-09-14  0:42   ` Allison Collins
  2019-09-16 18:08   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Allison Collins @ 2019-09-14  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Looks good to me, I think the new comments are a lot more helpful
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

On 9/9/19 11:27 AM, Christoph Hellwig wrote:
> The documentation for IOMAP_F_* is a bit disorganized, and doesn't
> mention the fact that most flags are set by the file system and consumed
> by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
> consumed by the file system.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/iomap.h | 31 +++++++++++++++++++++++--------
>   1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e79af6b28410..8adcc8dd4498 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -30,21 +30,36 @@ struct vm_fault;
>   #define IOMAP_INLINE	0x05	/* data inline in the inode */
>   
>   /*
> - * Flags for all iomap mappings:
> + * Flags reported by the file system from iomap_begin:
> + *
> + * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
> + * zeroing for areas that no data is copied to.
>    *
>    * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
>    * written data and requires fdatasync to commit them to persistent storage.
> + *
> + * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
> + * unshared as part a write.
> + *
> + * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
> + * mappings.
> + *
> + * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
> + * buffer heads for this mapping.
>    */
> -#define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
> -#define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
> -#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
> -#define IOMAP_F_SIZE_CHANGED	0x08	/* file size has changed */
> +#define IOMAP_F_NEW		0x01
> +#define IOMAP_F_DIRTY		0x02
> +#define IOMAP_F_SHARED		0x04
> +#define IOMAP_F_MERGED		0x08
> +#define IOMAP_F_BUFFER_HEAD	0x10
>   
>   /*
> - * Flags that only need to be reported for IOMAP_REPORT requests:
> + * Flags set by the core iomap code during operations:
> + *
> + * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> + * has changed as the result of this write operation.
>    */
> -#define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
> -#define IOMAP_F_SHARED		0x20	/* block shared with another file */
> +#define IOMAP_F_SIZE_CHANGED	0x100
>   
>   /*
>    * Flags from 0x1000 up are for file system specific usage:
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end
  2019-09-09 18:27 ` [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
@ 2019-09-14  0:42   ` Allison Collins
  2019-09-16 18:10   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Allison Collins @ 2019-09-14  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Looks fine:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

On 9/9/19 11:27 AM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/iomap/buffered-io.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 051b8ec326ba..2a9b41352495 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -678,7 +678,7 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
>   
>   static int
>   __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +		unsigned copied, struct page *page)
>   {
>   	flush_dcache_page(page);
>   
> @@ -731,7 +731,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>   		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>   				page, NULL);
>   	} else {
> -		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> +		ret = __iomap_write_end(inode, pos, len, copied, page);
>   	}
>   
>   	/*
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb
  2019-09-09 18:27 ` [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
@ 2019-09-14  0:42   ` Allison Collins
  2019-09-18 17:55   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Allison Collins @ 2019-09-14  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Looks Ok to me:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

On 9/9/19 11:27 AM, Christoph Hellwig wrote:
> We have lots of places that want to calculate the final fsb for
> a offset + count in bytes and check that the result fits into
> s_maxbytes.  Factor out a helper for that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/xfs/xfs_iomap.c | 40 ++++++++++++++++++++--------------------
>   1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d12eacdc9bba..0ba67a8d8169 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -102,6 +102,17 @@ xfs_hole_to_iomap(
>   	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
>   }
>   
> +static inline xfs_fileoff_t
> +xfs_iomap_end_fsb(
> +	struct xfs_mount	*mp,
> +	loff_t			offset,
> +	loff_t			count)
> +{
> +	ASSERT(offset <= mp->m_super->s_maxbytes);
> +	return min(XFS_B_TO_FSB(mp, offset + count),
> +		   XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> +}
> +
>   xfs_extlen_t
>   xfs_eof_alignment(
>   	struct xfs_inode	*ip,
> @@ -172,8 +183,8 @@ xfs_iomap_write_direct(
>   	int		nmaps)
>   {
>   	xfs_mount_t	*mp = ip->i_mount;
> -	xfs_fileoff_t	offset_fsb;
> -	xfs_fileoff_t	last_fsb;
> +	xfs_fileoff_t	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t	last_fsb = xfs_iomap_end_fsb(mp, offset, count);
>   	xfs_filblks_t	count_fsb, resaligned;
>   	xfs_extlen_t	extsz;
>   	int		nimaps;
> @@ -192,8 +203,6 @@ xfs_iomap_write_direct(
>   
>   	ASSERT(xfs_isilocked(ip, lockmode));
>   
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
>   	if ((offset + count) > XFS_ISIZE(ip)) {
>   		/*
>   		 * Assert that the in-core extent list is present since this can
> @@ -533,9 +542,7 @@ xfs_file_iomap_begin_delay(
>   	struct xfs_inode	*ip = XFS_I(inode);
>   	struct xfs_mount	*mp = ip->i_mount;
>   	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	xfs_fileoff_t		maxbytes_fsb =
> -		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> -	xfs_fileoff_t		end_fsb;
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>   	struct xfs_bmbt_irec	imap, cmap;
>   	struct xfs_iext_cursor	icur, ccur;
>   	xfs_fsblock_t		prealloc_blocks = 0;
> @@ -565,8 +572,6 @@ xfs_file_iomap_begin_delay(
>   			goto out_unlock;
>   	}
>   
> -	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> -
>   	/*
>   	 * Search the data fork fork first to look up our source mapping.  We
>   	 * always need the data fork map, as we have to return it to the
> @@ -648,7 +653,7 @@ xfs_file_iomap_begin_delay(
>   		 * the lower level functions are updated.
>   		 */
>   		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> -		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> +		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>   
>   		if (xfs_is_always_cow_inode(ip))
>   			whichfork = XFS_COW_FORK;
> @@ -674,7 +679,8 @@ xfs_file_iomap_begin_delay(
>   			if (align)
>   				p_end_fsb = roundup_64(p_end_fsb, align);
>   
> -			p_end_fsb = min(p_end_fsb, maxbytes_fsb);
> +			p_end_fsb = min(p_end_fsb,
> +				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
>   			ASSERT(p_end_fsb > offset_fsb);
>   			prealloc_blocks = p_end_fsb - end_fsb;
>   		}
> @@ -937,7 +943,8 @@ xfs_file_iomap_begin(
>   	struct xfs_inode	*ip = XFS_I(inode);
>   	struct xfs_mount	*mp = ip->i_mount;
>   	struct xfs_bmbt_irec	imap, cmap;
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
>   	int			nimaps = 1, error = 0;
>   	bool			shared = false;
>   	u16			iomap_flags = 0;
> @@ -963,12 +970,6 @@ xfs_file_iomap_begin(
>   	if (error)
>   		return error;
>   
> -	ASSERT(offset <= mp->m_super->s_maxbytes);
> -	if (offset > mp->m_super->s_maxbytes - length)
> -		length = mp->m_super->s_maxbytes - offset;
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> -
>   	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>   			       &nimaps, 0);
>   	if (error)
> @@ -1189,8 +1190,7 @@ xfs_seek_iomap_begin(
>   		/*
>   		 * Fake a hole until the end of the file.
>   		 */
> -		data_fsb = min(XFS_B_TO_FSB(mp, offset + length),
> -			       XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> +		data_fsb = xfs_iomap_end_fsb(mp, offset, length);
>   	}
>   
>   	/*
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
  2019-09-09 18:27 ` [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2019-09-14  0:42   ` Allison Collins
  2019-09-18 18:00   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Allison Collins @ 2019-09-14  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Looks fine:
Reviewed-by: Allison Collins <allison.henderson@oracle.com>

On 9/9/19 11:27 AM, Christoph Hellwig wrote:
> Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
> the usage of this variable a little more clear.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/xfs/xfs_iomap.c | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6dd143374d75..0e575ca1e3fc 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -862,7 +862,7 @@ xfs_buffered_write_iomap_begin(
>   	struct xfs_iext_cursor	icur, ccur;
>   	xfs_fsblock_t		prealloc_blocks = 0;
>   	bool			eof = false, cow_eof = false, shared = false;
> -	int			whichfork = XFS_DATA_FORK;
> +	int			allocfork = XFS_DATA_FORK;
>   	int			error = 0;
>   
>   	/* we can't use delayed allocations when using extent size hints */
> @@ -959,7 +959,7 @@ xfs_buffered_write_iomap_begin(
>   		 * Fork all the shared blocks from our write offset until the
>   		 * end of the extent.
>   		 */
> -		whichfork = XFS_COW_FORK;
> +		allocfork = XFS_COW_FORK;
>   		end_fsb = imap.br_startoff + imap.br_blockcount;
>   	} else {
>   		/*
> @@ -975,7 +975,7 @@ xfs_buffered_write_iomap_begin(
>   		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>   
>   		if (xfs_is_always_cow_inode(ip))
> -			whichfork = XFS_COW_FORK;
> +			allocfork = XFS_COW_FORK;
>   	}
>   
>   	error = xfs_qm_dqattach_locked(ip, false);
> @@ -983,7 +983,7 @@ xfs_buffered_write_iomap_begin(
>   		goto out_unlock;
>   
>   	if (eof) {
> -		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
> +		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
>   				count, &icur);
>   		if (prealloc_blocks) {
>   			xfs_extlen_t	align;
> @@ -1006,11 +1006,11 @@ xfs_buffered_write_iomap_begin(
>   	}
>   
>   retry:
> -	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> +	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
>   			end_fsb - offset_fsb, prealloc_blocks,
> -			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> -			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> -			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> +			allocfork == XFS_DATA_FORK ? &imap : &cmap,
> +			allocfork == XFS_DATA_FORK ? &icur : &ccur,
> +			allocfork == XFS_DATA_FORK ? eof : cow_eof);
>   	switch (error) {
>   	case 0:
>   		break;
> @@ -1027,8 +1027,8 @@ xfs_buffered_write_iomap_begin(
>   		goto out_unlock;
>   	}
>   
> -	if (whichfork == XFS_COW_FORK) {
> -		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> +	if (allocfork == XFS_COW_FORK) {
> +		trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
>   		goto found_cow;
>   	}
>   
> @@ -1037,7 +1037,7 @@ xfs_buffered_write_iomap_begin(
>   	 * them out if the write happens to fail.
>   	 */
>   	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> +	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
>   	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
>   
>   found_imap:
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-10 14:39     ` hch
@ 2019-09-16 17:57       ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 17:57 UTC (permalink / raw)
  To: hch; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Sep 10, 2019 at 04:39:59PM +0200, hch@lst.de wrote:
> On Tue, Sep 10, 2019 at 12:48:14PM +0000, Goldwyn Rodrigues wrote:
> > > +	if (srcmap.type)
> > > +		end = min(end, srcmap.offset + srcmap.length);
> > > +	if (pos + length > end)
> > > +		length = end - pos;
> > >  
> > 
> > 
> > Yes, that looks more correct. However, can we be smart and not bother
> > setting the minimum of end and srcmap.offset + srcmap.length if it is
> > not required? ie in situations where end coincides with block boundary.
> >  Or if srcmap.length falls short, until the last block boundary of
> > iomap? 
> > 
> > I did think about this scenario. This case is specific to CoW and
> > thought this is best handled by filesystem's iomap_begin(). If this
> > goes in, the filesystems would have to "falsify" srcmap length
> > information to maximize the amount of I/O that goes in one iteration.
> 
> The problem is really that we can easily run over the srcmap (that's
> what happened to me with XFS..)  One thing you've done in btrfs that
> I haven't done yet in XFS is to simply not bother with filling out
> the srcmap if we don't need to (that is if the iteration is fully
> page aligned in your patch set - the unshare op in this series will
> complicate things a little).  With that optimization the only case
> where the shortening of the iteration that matters is if the start
> is unaligned and needs a read-modify-write cycle, but the end is
> aligned and beyond the end of the srcmap.  Is that such an important
> case?

<shrug> You might as well, since it's still going to be a few weeks
until I carve off a iomap work branch for 5.5... :)

(He says stumbling in from vacation with half a brain)

--D

> > 
> > -- 
> > Goldwyn---end quoted text---

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 01/19] iomap: better document the IOMAP_F_* flags
  2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
@ 2019-09-16 18:08   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:04PM +0200, Christoph Hellwig wrote:
> The documentation for IOMAP_F_* is a bit disorganized, and doesn't
> mention the fact that most flags are set by the file system and consumed
> by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
> consumed by the file system.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Wooooooo,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  include/linux/iomap.h | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e79af6b28410..8adcc8dd4498 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -30,21 +30,36 @@ struct vm_fault;
>  #define IOMAP_INLINE	0x05	/* data inline in the inode */
>  
>  /*
> - * Flags for all iomap mappings:
> + * Flags reported by the file system from iomap_begin:
> + *
> + * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
> + * zeroing for areas that no data is copied to.
>   *
>   * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
>   * written data and requires fdatasync to commit them to persistent storage.
> + *
> + * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
> + * unshared as part a write.
> + *
> + * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
> + * mappings.
> + *
> + * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
> + * buffer heads for this mapping.
>   */
> -#define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
> -#define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
> -#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
> -#define IOMAP_F_SIZE_CHANGED	0x08	/* file size has changed */
> +#define IOMAP_F_NEW		0x01
> +#define IOMAP_F_DIRTY		0x02
> +#define IOMAP_F_SHARED		0x04
> +#define IOMAP_F_MERGED		0x08
> +#define IOMAP_F_BUFFER_HEAD	0x10
>  
>  /*
> - * Flags that only need to be reported for IOMAP_REPORT requests:
> + * Flags set by the core iomap code during operations:
> + *
> + * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
> + * has changed as the result of this write operation.
>   */
> -#define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
> -#define IOMAP_F_SHARED		0x20	/* block shared with another file */
> +#define IOMAP_F_SIZE_CHANGED	0x100
>  
>  /*
>   * Flags from 0x1000 up are for file system specific usage:
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end
  2019-09-09 18:27 ` [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
@ 2019-09-16 18:10   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:05PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 051b8ec326ba..2a9b41352495 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -678,7 +678,7 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
>  
>  static int
>  __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +		unsigned copied, struct page *page)
>  {
>  	flush_dcache_page(page);
>  
> @@ -731,7 +731,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>  				page, NULL);
>  	} else {
> -		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> +		ret = __iomap_write_end(inode, pos, len, copied, page);
>  	}
>  
>  	/*
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin
  2019-09-09 18:27 ` [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
@ 2019-09-16 18:11   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:06PM +0200, Christoph Hellwig wrote:
> All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
> to allow reusing the flags arguments for an internal flags namespace
> soon.  Also remove the local index variable that is only used once.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 2a9b41352495..33e03992d8a7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -606,7 +606,6 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		struct page **pagep, struct iomap *iomap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> -	pgoff_t index = pos >> PAGE_SHIFT;
>  	struct page *page;
>  	int status = 0;
>  
> @@ -621,7 +620,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  			return status;
>  	}
>  
> -	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> +	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
> +			AOP_FLAG_NOFS);
>  	if (!page) {
>  		status = -ENOMEM;
>  		goto out_no_page;
> @@ -763,7 +763,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	struct iov_iter *i = data;
>  	long status = 0;
>  	ssize_t written = 0;
> -	unsigned int flags = AOP_FLAG_NOFS;
>  
>  	do {
>  		struct page *page;
> @@ -793,8 +792,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			break;
>  		}
>  
> -		status = iomap_write_begin(inode, pos, bytes, flags, &page,
> -				iomap);
> +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
>  		if (unlikely(status))
>  			break;
>  
> @@ -892,8 +890,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		if (IS_ERR(rpage))
>  			return PTR_ERR(rpage);
>  
> -		status = iomap_write_begin(inode, pos, bytes,
> -					   AOP_FLAG_NOFS, &page, iomap);
> +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
>  		put_page(rpage);
>  		if (unlikely(status))
>  			return status;
> @@ -944,8 +941,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  	struct page *page;
>  	int status;
>  
> -	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
> -				   iomap);
> +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
>  	if (status)
>  		return status;
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty
  2019-09-09 18:27 ` [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
@ 2019-09-16 18:12   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:07PM +0200, Christoph Hellwig wrote:
> xfs_file_dirty is used to unshare reflink blocks.  Rename the function
> to xfs_file_unshare to better document that purpose, and skip iomaps
> that are not shared are don't need zeroing.  This will allow to simplify

                      ^^^ "and"

> the caller.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Otherwise this looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 15 +++++++++++----
>  fs/xfs/xfs_reflink.c   |  2 +-
>  include/linux/iomap.h  |  2 +-
>  3 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 33e03992d8a7..0b05551d9b5a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -872,12 +872,19 @@ __iomap_read_page(struct inode *inode, loff_t offset)
>  }
>  
>  static loff_t
> -iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> +iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct iomap *iomap)
>  {
>  	long status = 0;
>  	ssize_t written = 0;
>  
> +	/* don't bother with blocks that are not shared to start with */
> +	if (!(iomap->flags & IOMAP_F_SHARED))
> +		return length;
> +	/* don't bother with holes or unwritten extents */
> +	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +		return length;
> +
>  	do {
>  		struct page *page, *rpage;
>  		unsigned long offset;	/* Offset into pagecache page */
> @@ -917,14 +924,14 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  }
>  
>  int
> -iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
> +iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops)
>  {
>  	loff_t ret;
>  
>  	while (len) {
>  		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
> -				iomap_dirty_actor);
> +				iomap_unshare_actor);
>  		if (ret <= 0)
>  			return ret;
>  		pos += ret;
> @@ -933,7 +940,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL_GPL(iomap_file_dirty);
> +EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
>  		unsigned bytes, struct iomap *iomap)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index c4ec7afd1170..cadc0456804d 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1442,7 +1442,7 @@ xfs_reflink_dirty_extents(
>  			flen = XFS_FSB_TO_B(mp, rlen);
>  			if (fpos + flen > isize)
>  				flen = isize - fpos;
> -			error = iomap_file_dirty(VFS_I(ip), fpos, flen,
> +			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
>  					&xfs_iomap_ops);
>  			xfs_ilock(ip, XFS_ILOCK_EXCL);
>  			if (error)
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8adcc8dd4498..3a0f0975a57e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -183,7 +183,7 @@ int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  #else
>  #define iomap_migrate_page NULL
>  #endif
> -int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
> +int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  		const struct iomap_ops *ops);
>  int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
>  		bool *did_zero, const struct iomap_ops *ops);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync
  2019-09-09 18:27 ` [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
@ 2019-09-16 18:17   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:08PM +0200, Christoph Hellwig wrote:
> That keeps the function a little easier to understand, and easier to
> modify for pending enhancements.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 0b05551d9b5a..fe099faf540f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -547,19 +547,12 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  }
>  
>  static int
> -iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
> -		unsigned poff, unsigned plen, unsigned from, unsigned to,
> -		struct iomap *iomap)
> +iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> +		unsigned plen, struct iomap *iomap)
>  {
>  	struct bio_vec bvec;
>  	struct bio bio;
>  
> -	if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> -		zero_user_segments(page, poff, from, to, poff + plen);
> -		iomap_set_range_uptodate(page, poff, plen);
> -		return 0;
> -	}
> -
>  	bio_init(&bio, &bvec, 1);
>  	bio.bi_opf = REQ_OP_READ;
>  	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
> @@ -577,7 +570,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  	loff_t block_start = pos & ~(block_size - 1);
>  	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
>  	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
> -	int status = 0;
> +	int status;
>  
>  	if (PageUptodate(page))
>  		return 0;
> @@ -588,17 +581,23 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  		if (plen == 0)
>  			break;
>  
> -		if ((from > poff && from < poff + plen) ||
> -		    (to > poff && to < poff + plen)) {
> -			status = iomap_read_page_sync(inode, block_start, page,
> -					poff, plen, from, to, iomap);
> -			if (status)
> -				break;
> +		if ((from <= poff || from >= poff + plen) &&
> +		    (to <= poff || to >= poff + plen))
> +			continue;
> +
> +		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +			zero_user_segments(page, poff, from, to, poff + plen);
> +			iomap_set_range_uptodate(page, poff, plen);
> +			continue;
>  		}
>  
> +		status = iomap_read_page_sync(block_start, page, poff, plen,
> +				iomap);
> +		if (status)
> +			return status;
>  	} while ((block_start += plen) < block_end);
>  
> -	return status;
> +	return 0;
>  }
>  
>  static int
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/19] iomap: use write_begin to read pages to unshare
  2019-09-09 18:27 ` [PATCH 06/19] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-09-16 18:34   ` Darrick J. Wong
  2019-09-30 11:07     ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:09PM +0200, Christoph Hellwig wrote:
> Use the existing iomap write_begin code to read the pages unshared
> by iomap_file_unshare.  That avoids the extra ->readpage call and
> extent tree lookup currently done by read_mapping_page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 49 ++++++++++++++----------------------------
>  1 file changed, 16 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index fe099faf540f..a421977a9496 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -533,6 +533,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  EXPORT_SYMBOL_GPL(iomap_migrate_page);
>  #endif /* CONFIG_MIGRATION */
>  
> +enum {
> +	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
> +};
> +
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -562,7 +566,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  }
>  
>  static int
> -__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> +__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		struct page *page, struct iomap *iomap)
>  {
>  	struct iomap_page *iop = iomap_page_create(inode, page);
> @@ -581,11 +585,14 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  		if (plen == 0)
>  			break;
>  
> -		if ((from <= poff || from >= poff + plen) &&
> +		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&

Mmm, archeology of code that I wrote originally and have forgotten
already... :)

I think the purpose of F_UNSHARE is to mimic the behavior of the code
that's being removed, and the old behavior is that if a user asks to
unshare a page backed by shared extents we'll read in all the blocks
backing the page, even if that means reading in blocks that weren't part
of the original unshare request, right?

The only reason I can think of (or remember) for doing it that way is
that read_mapping_page does its IO one page at a time, i.e. sheer
laziness on my part.

So do we actually need F_UNSHARE to read in the whole page or can we get
by with reading only the blocks that are included in the range that's
being unshared, just like how write only reads in the blocks that are
included in the range being written to?

--D

> +		    (from <= poff || from >= poff + plen) &&
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
>  		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> +				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
>  			iomap_set_range_uptodate(page, poff, plen);
>  			continue;
> @@ -631,7 +638,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
> -		status = __iomap_write_begin(inode, pos, len, page, iomap);
> +		status = __iomap_write_begin(inode, pos, len, flags, page,
> +				iomap);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -854,22 +862,6 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
> -static struct page *
> -__iomap_read_page(struct inode *inode, loff_t offset)
> -{
> -	struct address_space *mapping = inode->i_mapping;
> -	struct page *page;
> -
> -	page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
> -	if (IS_ERR(page))
> -		return page;
> -	if (!PageUptodate(page)) {
> -		put_page(page);
> -		return ERR_PTR(-EIO);
> -	}
> -	return page;
> -}
> -
>  static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct iomap *iomap)
> @@ -885,24 +877,15 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		return length;
>  
>  	do {
> -		struct page *page, *rpage;
> -		unsigned long offset;	/* Offset into pagecache page */
> -		unsigned long bytes;	/* Bytes to write to page */
> -
> -		offset = offset_in_page(pos);
> -		bytes = min_t(loff_t, PAGE_SIZE - offset, length);
> -
> -		rpage = __iomap_read_page(inode, pos);
> -		if (IS_ERR(rpage))
> -			return PTR_ERR(rpage);
> +		unsigned long offset = offset_in_page(pos);
> +		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
> +		struct page *page;
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> -		put_page(rpage);
> +		status = iomap_write_begin(inode, pos, bytes,
> +				IOMAP_WRITE_F_UNSHARE, &page, iomap);
>  		if (unlikely(status))
>  			return status;
>  
> -		WARN_ON_ONCE(!PageUptodate(page));
> -
>  		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-09 18:27 ` [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
  2019-09-10 12:48   ` Goldwyn Rodrigues
@ 2019-09-16 18:42   ` Darrick J. Wong
  2019-09-18 18:15     ` Darrick J. Wong
  1 sibling, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-16 18:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:10PM +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The srcmap is used to identify where the read is to be performed from.
> It is passed to ->iomap_begin, which can fill it in if we need to read
> data for partially written blocks from a different location than the
> write target.  The srcmap is only supported for buffered writes so far.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
>       srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c               |  9 ++++--
>  fs/ext2/inode.c        |  2 +-
>  fs/ext4/inode.c        |  2 +-
>  fs/gfs2/bmap.c         |  3 +-
>  fs/iomap/apply.c       | 23 +++++++++++----
>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
>  fs/iomap/direct-io.c   |  2 +-
>  fs/iomap/fiemap.c      |  4 +--
>  fs/iomap/seek.c        |  4 +--
>  fs/iomap/swapfile.c    |  3 +-
>  fs/xfs/xfs_iomap.c     |  9 ++++--
>  include/linux/iomap.h  |  5 ++--
>  12 files changed, 79 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index a237141d8787..8016be24bbd9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>  
>  static loff_t
>  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct block_device *bdev = iomap->bdev;
>  	struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	unsigned long vaddr = vmf->address;
>  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	unsigned flags = IOMAP_FAULT;
>  	int error, major = 0;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * the file system block size to be equal the page size, which means
>  	 * that we never have to deal with more than a single extent here.
>  	 */
> -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
>  	if (iomap_errp)
>  		*iomap_errp = error;
>  	if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	struct inode *inode = mapping->host;
>  	vm_fault_t result = VM_FAULT_FALLBACK;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	pgoff_t max_pgoff;
>  	void *entry;
>  	loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * to look up our filesystem block.
>  	 */
>  	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> +			&srcmap);
>  	if (error)
>  		goto unlock_entry;
>  
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
>  
>  #ifdef CONFIG_FS_DAX
>  static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -		unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	unsigned int blkbits = inode->i_blkbits;
>  	unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 420fe3deed39..e2116e8228d2 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3453,7 +3453,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>  }
>  
>  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index 79581b9bdebb..0bf8e8fa82bd 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1123,7 +1123,8 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
>  }
>  
>  static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +			    unsigned flags, struct iomap *iomap,
> +			    struct iomap *srcmap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..67efd86675e6 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -24,7 +24,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
>  {
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { 0 };
>  	loff_t written = 0, ret;
> +	u64 end;
>  
>  	/*
>  	 * Need to map a range from start position for length bytes. This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * expose transient stale data. If the reserve fails, we can safely
>  	 * back out at this point as there is nothing to undo.
>  	 */
> -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
>  	if (ret)
>  		return ret;
>  	if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * Cut down the length to the one actually provided by the filesystem,
>  	 * as it might not be able to give us the whole size that we requested.
>  	 */
> -	if (iomap.offset + iomap.length < pos + length)
> -		length = iomap.offset + iomap.length - pos;
> +	end = iomap.offset + iomap.length;
> +	if (srcmap.type)

Do we need a "#define IOMAP_NULL 0" type to make this more explicit?

Or at least a comment in iomap.h above the IOMAP_HOLE definition
pointing out that type == 0 is valid for srcmap and not valid for iomap?

Other than that the rest looks ok.

--D

(This reminds me that at some point I ought to work on iomap debugging
checks for the mappings returned by ->iomap_begin ...)

> +		end = min(end, srcmap.offset + srcmap.length);
> +	if (pos + length > end)
> +		length = end - pos;
>  
>  	/*
> -	 * Now that we have guaranteed that the space allocation will succeed.
> +	 * Now that we have guaranteed that the space allocation will succeed,
>  	 * we can do the copy-in page by page without having to worry about
>  	 * failures exposing transient data.
> +	 *
> +	 * To support COW operations, we read in data for partially blocks from
> +	 * the srcmap if the file system filled it in.  In that case we the
> +	 * length needs to be limited to the earlier of the ends of the iomaps.
> +	 * If the file system did not provide a srcmap we pass in the normal
> +	 * iomap into the actors so that they don't need to have special
> +	 * handling for the two cases.
>  	 */
> -	written = actor(inode, pos, length, data, &iomap);
> +	written = actor(inode, pos, length, data, &iomap,
> +			srcmap.type ? &srcmap : &iomap);
>  
>  	/*
>  	 * Now the data has been copied, commit the range we've copied.  This
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a421977a9496..58fb7a894ad4 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -219,7 +219,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
>  
>  static loff_t
>  iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	struct page *page = ctx->cur_page;
> @@ -367,7 +367,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>  
>  static loff_t
>  iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	loff_t done, ret;
> @@ -387,7 +387,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  			ctx->cur_page_in_bio = false;
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> -				ctx, iomap);
> +				ctx, iomap, srcmap);
>  	}
>  
>  	return done;
> @@ -567,7 +567,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  
>  static int
>  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> -		struct page *page, struct iomap *iomap)
> +		struct page *page, struct iomap *srcmap)
>  {
>  	struct iomap_page *iop = iomap_page_create(inode, page);
>  	loff_t block_size = i_blocksize(inode);
> @@ -590,7 +590,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
> -		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
>  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
>  				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
> @@ -599,7 +599,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		}
>  
>  		status = iomap_read_page_sync(block_start, page, poff, plen,
> -				iomap);
> +				srcmap);
>  		if (status)
>  			return status;
>  	} while ((block_start += plen) < block_end);
> @@ -609,13 +609,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> -		struct page **pagep, struct iomap *iomap)
> +		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	struct page *page;
>  	int status = 0;
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
> +	if (srcmap != iomap)
> +		BUG_ON(pos + len > srcmap->offset + srcmap->length);
>  
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> @@ -633,13 +635,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		goto out_no_page;
>  	}
>  
> -	if (iomap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, iomap);
> +	if (srcmap->type == IOMAP_INLINE)
> +		iomap_read_inline_data(inode, page, srcmap);
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> -		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, flags, page,
> -				iomap);
> +				srcmap);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -725,16 +727,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  }
>  
>  static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> +		struct page *page, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	loff_t old_size = inode->i_size;
>  	int ret;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> +	if (srcmap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> -	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> +	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
>  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>  				page, NULL);
>  	} else {
> @@ -765,7 +767,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  
>  static loff_t
>  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iov_iter *i = data;
>  	long status = 0;
> @@ -799,7 +801,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			break;
>  		}
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> +				srcmap);
>  		if (unlikely(status))
>  			break;
>  
> @@ -810,8 +813,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		flush_dcache_page(page);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page,
> -				iomap);
> +		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> +				srcmap);
>  		if (unlikely(status < 0))
>  			break;
>  		copied = status;
> @@ -864,7 +867,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
>  static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	long status = 0;
>  	ssize_t written = 0;
> @@ -873,7 +876,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	if (!(iomap->flags & IOMAP_F_SHARED))
>  		return length;
>  	/* don't bother with holes or unwritten extents */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return length;
>  
>  	do {
> @@ -882,11 +885,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct page *page;
>  
>  		status = iomap_write_begin(inode, pos, bytes,
> -				IOMAP_WRITE_F_UNSHARE, &page, iomap);
> +				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
>  		if (unlikely(status))
>  			return status;
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +				srcmap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
>  				return -EIO;
> @@ -925,19 +929,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap)
> +		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
>  	int status;
>  
> -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
>  		return status;
>  
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -949,14 +953,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>  
>  static loff_t
>  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
>  	int status;
>  
>  	/* already zeroed?  we're done. */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return count;
>  
>  	do {
> @@ -968,7 +972,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
>  		if (IS_DAX(inode))
>  			status = iomap_dax_zero(pos, offset, bytes, iomap);
>  		else
> -			status = iomap_zero(inode, pos, offset, bytes, iomap);
> +			status = iomap_zero(inode, pos, offset, bytes, iomap,
> +					srcmap);
>  		if (status < 0)
>  			return status;
>  
> @@ -1018,7 +1023,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
>  
>  static loff_t
>  iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page = data;
>  	int ret;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1fc28c2da279..e3ccbf7daaae 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  
>  static loff_t
>  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_dio *dio = data;
>  
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index f26fdd36e383..690ef2d7c6c8 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>  
>  static loff_t
>  iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct fiemap_ctx *ctx = data;
>  	loff_t ret = length;
> @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
>  static loff_t
>  iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	sector_t *bno = data, addr;
>  
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index c04bad4b2b43..89f61d93c0bc 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
>  
>  static loff_t
>  iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_UNWRITTEN:
> @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
>  static loff_t
>  iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_HOLE:
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 152a230f668d..a648dbf6991e 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>   * distinction between written and unwritten extents.
>   */
>  static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> -		loff_t count, void *data, struct iomap *iomap)
> +		loff_t count, void *data, struct iomap *iomap,
> +		struct iomap *srcmap)
>  {
>  	struct iomap_swapfile_info *isi = data;
>  	int error;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d340db666c06..f1c3ff27c666 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 3a0f0975a57e..6d795924baa7 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -127,7 +127,8 @@ struct iomap_ops {
>  	 * The actual length is returned in iomap->length.
>  	 */
>  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> -			unsigned flags, struct iomap *iomap);
> +			unsigned flags, struct iomap *iomap,
> +			struct iomap *srcmap);
>  
>  	/*
>  	 * Commit and/or unreserve space previous allocated using iomap_begin.
> @@ -143,7 +144,7 @@ struct iomap_ops {
>   * Main iomap iterator function.
>   */
>  typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> -		void *data, struct iomap *iomap);
> +		void *data, struct iomap *iomap, struct iomap *srcmap);
>  
>  loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
>  		unsigned flags, const struct iomap_ops *ops, void *data,
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations
  2019-09-09 18:27 ` [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
@ 2019-09-18 17:09   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:11PM +0200, Christoph Hellwig wrote:
> There is no reason not to punch out stale delalloc blocks for zeroing
> operations, as they otherwise behave exactly like normal writes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f1c3ff27c666..db4764c16142 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1138,7 +1138,8 @@ xfs_file_iomap_end(
>  	unsigned		flags,
>  	struct iomap		*iomap)
>  {
> -	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
> +	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
> +	    iomap->type == IOMAP_DELALLOC)
>  		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
>  				length, written, iomap);
>  	return 0;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents
  2019-09-09 18:27 ` [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-09-18 17:17   ` Darrick J. Wong
  2019-09-18 17:25     ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:12PM +0200, Christoph Hellwig wrote:
> Now that xfs_file_unshare is not completely dumb we can just call it
> directly without iterating the extent and reflink btrees ourselves.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_reflink.c | 108 ++++---------------------------------------
>  1 file changed, 10 insertions(+), 98 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index cadc0456804d..73f8cce4722d 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1381,85 +1381,6 @@ xfs_reflink_remap_prep(
>  	return ret;
>  }
>  
> -/*
> - * The user wants to preemptively CoW all shared blocks in this file,
> - * which enables us to turn off the reflink flag.  Iterate all
> - * extents which are not prealloc/delalloc to see which ranges are
> - * mentioned in the refcount tree, then read those blocks into the
> - * pagecache, dirty them, fsync them back out, and then we can update
> - * the inode flag.  What happens if we run out of memory? :)
> - */
> -STATIC int
> -xfs_reflink_dirty_extents(
> -	struct xfs_inode	*ip,
> -	xfs_fileoff_t		fbno,
> -	xfs_filblks_t		end,
> -	xfs_off_t		isize)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_agnumber_t		agno;
> -	xfs_agblock_t		agbno;
> -	xfs_extlen_t		aglen;
> -	xfs_agblock_t		rbno;
> -	xfs_extlen_t		rlen;
> -	xfs_off_t		fpos;
> -	xfs_off_t		flen;
> -	struct xfs_bmbt_irec	map[2];
> -	int			nmaps;
> -	int			error = 0;
> -
> -	while (end - fbno > 0) {
> -		nmaps = 1;
> -		/*
> -		 * Look for extents in the file.  Skip holes, delalloc, or
> -		 * unwritten extents; they can't be reflinked.
> -		 */
> -		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
> -		if (error)
> -			goto out;
> -		if (nmaps == 0)
> -			break;
> -		if (!xfs_bmap_is_real_extent(&map[0]))
> -			goto next;
> -
> -		map[1] = map[0];
> -		while (map[1].br_blockcount) {
> -			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
> -			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
> -			aglen = map[1].br_blockcount;
> -
> -			error = xfs_reflink_find_shared(mp, NULL, agno, agbno,
> -					aglen, &rbno, &rlen, true);
> -			if (error)
> -				goto out;
> -			if (rbno == NULLAGBLOCK)
> -				break;
> -
> -			/* Dirty the pages */
> -			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -			fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
> -					(rbno - agbno));
> -			flen = XFS_FSB_TO_B(mp, rlen);
> -			if (fpos + flen > isize)
> -				flen = isize - fpos;
> -			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
> -					&xfs_iomap_ops);
> -			xfs_ilock(ip, XFS_ILOCK_EXCL);
> -			if (error)
> -				goto out;
> -
> -			map[1].br_blockcount -= (rbno - agbno + rlen);
> -			map[1].br_startoff += (rbno - agbno + rlen);
> -			map[1].br_startblock += (rbno - agbno + rlen);
> -		}
> -
> -next:
> -		fbno = map[0].br_startoff + map[0].br_blockcount;
> -	}
> -out:
> -	return error;
> -}
> -
>  /* Does this inode need the reflink flag? */
>  int
>  xfs_reflink_inode_has_shared_extents(
> @@ -1589,6 +1510,11 @@ xfs_reflink_try_clear_inode_flag(
>  /*
>   * Pre-COW all shared blocks within a given byte range of a file and turn off
>   * the reflink flag if we unshare all of the file's blocks.
> + *
> + * Let iomap iterate all extents to see which are shared and not unwritten or
> + * delalloc and read them into the page cache, dirty them, fsync them back out,
> + * and then we can update the inode flag.  What happens if we run out of
> + * memory? :)

I don't know, what /does/ happen? :)

It /should/ be fine, right?  Writeback will start pushing the dirty
cache pages to disk, and since writeback only takes the ILOCK, it should
be able to perform the COW even while the unshare process sits on the
IOLOCK/MMAPLOCK.  True, the unshare process and writeback will both be
contending on the ILOCK, but that shouldn't be a problem...

...unless I'm missing something?  It sure does look nice to drain all
this other code out.

--D

>   */
>  int
>  xfs_reflink_unshare(
> @@ -1596,10 +1522,7 @@ xfs_reflink_unshare(
>  	xfs_off_t		offset,
>  	xfs_off_t		len)
>  {
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		fbno;
> -	xfs_filblks_t		end;
> -	xfs_off_t		isize;
> +	struct inode		*inode = VFS_I(ip);
>  	int			error;
>  
>  	if (!xfs_is_reflink_inode(ip))
> @@ -1607,20 +1530,12 @@ xfs_reflink_unshare(
>  
>  	trace_xfs_reflink_unshare(ip, offset, len);
>  
> -	inode_dio_wait(VFS_I(ip));
> +	inode_dio_wait(inode);
>  
> -	/* Try to CoW the selected ranges */
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	fbno = XFS_B_TO_FSBT(mp, offset);
> -	isize = i_size_read(VFS_I(ip));
> -	end = XFS_B_TO_FSB(mp, offset + len);
> -	error = xfs_reflink_dirty_extents(ip, fbno, end, isize);
> +	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
>  	if (error)
> -		goto out_unlock;
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -
> -	/* Wait for the IO to finish */
> -	error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
> +		goto out;
> +	error = filemap_write_and_wait(inode->i_mapping);
>  	if (error)
>  		goto out;
>  
> @@ -1628,11 +1543,8 @@ xfs_reflink_unshare(
>  	error = xfs_reflink_try_clear_inode_flag(ip);
>  	if (error)
>  		goto out;
> -
>  	return 0;
>  
> -out_unlock:
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  out:
>  	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
>  	return error;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents
  2019-09-18 17:17   ` Darrick J. Wong
@ 2019-09-18 17:25     ` Christoph Hellwig
  2019-09-18 17:31       ` Darrick J. Wong
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-18 17:25 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Sep 18, 2019 at 10:17:33AM -0700, Darrick J. Wong wrote:
> >  /*
> >   * Pre-COW all shared blocks within a given byte range of a file and turn off
> >   * the reflink flag if we unshare all of the file's blocks.
> > + *
> > + * Let iomap iterate all extents to see which are shared and not unwritten or
> > + * delalloc and read them into the page cache, dirty them, fsync them back out,
> > + * and then we can update the inode flag.  What happens if we run out of
> > + * memory? :)
> 
> I don't know, what /does/ happen? :)
> 
> It /should/ be fine, right?  Writeback will start pushing the dirty
> cache pages to disk, and since writeback only takes the ILOCK, it should
> be able to perform the COW even while the unshare process sits on the
> IOLOCK/MMAPLOCK.  True, the unshare process and writeback will both be
> contending on the ILOCK, but that shouldn't be a problem...
> 
> ...unless I'm missing something?  It sure does look nice to drain all
> this other code out.

No idea.  This is your old code just moved down to this function.  But
yes, the comment looks rather confusing and maybe we should just remove
it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow
  2019-09-09 18:27 ` [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
@ 2019-09-18 17:26   ` Darrick J. Wong
  2019-09-30 11:10     ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:13PM +0200, Christoph Hellwig wrote:
> xfs_reflink_allocate_cow consumes the source data fork imap, and
> potentially returns the COW fork imap.  Split the arguments in two
> to clear up the calling conventions and to prepare for returning
> a source iomap from ->iomap_begin.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c   |  8 ++++----
>  fs/xfs/xfs_reflink.c | 30 +++++++++++++++---------------
>  fs/xfs/xfs_reflink.h |  4 ++--
>  3 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index db4764c16142..b228d1dbf59f 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -996,9 +996,8 @@ xfs_file_iomap_begin(
>  			goto out_found;
>  
>  		/* may drop and re-acquire the ilock */
> -		cmap = imap;
> -		error = xfs_reflink_allocate_cow(ip, &cmap, &shared, &lockmode,
> -				directio);
> +		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> +				&lockmode, directio);
>  		if (error)
>  			goto out_unlock;
>  
> @@ -1011,7 +1010,8 @@ xfs_file_iomap_begin(
>  		 * newly allocated address.  If the data fork has a hole, copy
>  		 * the COW fork mapping to avoid allocating to the data fork.
>  		 */
> -		if (directio || imap.br_startblock == HOLESTARTBLOCK)
> +		if (shared &&

Hmm.  At first I thought this was a behavior change but I think it isn't
because prior to this patch we'd set cmap = imap and if _allocate_cow
didn't find a shared extent then it would just return without doing
anything or touching cmap.  In the !shared case this would just set imap
to itself pointlessly.

Now that we pass both imap and cmap to _allocate_cow, in the !shared
case we don't initialized cmap at all, so adding the @shared check is
required for correct operation, right?

If so,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thanks for cleaning that up, the "mapping could be from another fork on
exit" behavior always bothered me.

--D

> +		    (directio || imap.br_startblock == HOLESTARTBLOCK))
>  			imap = cmap;
>  
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 73f8cce4722d..673018a618f0 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -308,13 +308,13 @@ static int
>  xfs_find_trim_cow_extent(
>  	struct xfs_inode	*ip,
>  	struct xfs_bmbt_irec	*imap,
> +	struct xfs_bmbt_irec	*cmap,
>  	bool			*shared,
>  	bool			*found)
>  {
>  	xfs_fileoff_t		offset_fsb = imap->br_startoff;
>  	xfs_filblks_t		count_fsb = imap->br_blockcount;
>  	struct xfs_iext_cursor	icur;
> -	struct xfs_bmbt_irec	got;
>  
>  	*found = false;
>  
> @@ -322,23 +322,22 @@ xfs_find_trim_cow_extent(
>  	 * If we don't find an overlapping extent, trim the range we need to
>  	 * allocate to fit the hole we found.
>  	 */
> -	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
> -		got.br_startoff = offset_fsb + count_fsb;
> -	if (got.br_startoff > offset_fsb) {
> +	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, cmap))
> +		cmap->br_startoff = offset_fsb + count_fsb;
> +	if (cmap->br_startoff > offset_fsb) {
>  		xfs_trim_extent(imap, imap->br_startoff,
> -				got.br_startoff - imap->br_startoff);
> +				cmap->br_startoff - imap->br_startoff);
>  		return xfs_inode_need_cow(ip, imap, shared);
>  	}
>  
>  	*shared = true;
> -	if (isnullstartblock(got.br_startblock)) {
> -		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> +	if (isnullstartblock(cmap->br_startblock)) {
> +		xfs_trim_extent(imap, cmap->br_startoff, cmap->br_blockcount);
>  		return 0;
>  	}
>  
>  	/* real extent found - no need to allocate */
> -	xfs_trim_extent(&got, offset_fsb, count_fsb);
> -	*imap = got;
> +	xfs_trim_extent(cmap, offset_fsb, count_fsb);
>  	*found = true;
>  	return 0;
>  }
> @@ -348,6 +347,7 @@ int
>  xfs_reflink_allocate_cow(
>  	struct xfs_inode	*ip,
>  	struct xfs_bmbt_irec	*imap,
> +	struct xfs_bmbt_irec	*cmap,
>  	bool			*shared,
>  	uint			*lockmode,
>  	bool			convert_now)
> @@ -367,7 +367,7 @@ xfs_reflink_allocate_cow(
>  		xfs_ifork_init_cow(ip);
>  	}
>  
> -	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
>  	if (error || !*shared)
>  		return error;
>  	if (found)
> @@ -392,7 +392,7 @@ xfs_reflink_allocate_cow(
>  	/*
>  	 * Check for an overlapping extent again now that we dropped the ilock.
>  	 */
> -	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
> +	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
>  	if (error || !*shared)
>  		goto out_trans_cancel;
>  	if (found) {
> @@ -411,7 +411,7 @@ xfs_reflink_allocate_cow(
>  	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
>  			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> -			resblks, imap, &nimaps);
> +			resblks, cmap, &nimaps);
>  	if (error)
>  		goto out_unreserve;
>  
> @@ -427,15 +427,15 @@ xfs_reflink_allocate_cow(
>  	if (nimaps == 0)
>  		return -ENOSPC;
>  convert:
> -	xfs_trim_extent(imap, offset_fsb, count_fsb);
> +	xfs_trim_extent(cmap, offset_fsb, count_fsb);
>  	/*
>  	 * COW fork extents are supposed to remain unwritten until we're ready
>  	 * to initiate a disk write.  For direct I/O we are going to write the
>  	 * data and need the conversion, but for buffered writes we're done.
>  	 */
> -	if (!convert_now || imap->br_state == XFS_EXT_NORM)
> +	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
>  		return 0;
> -	trace_xfs_reflink_convert_cow(ip, imap);
> +	trace_xfs_reflink_convert_cow(ip, cmap);
>  	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
>  
>  out_unreserve:
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 28a43b7f581d..d18ad7f4fb64 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -25,8 +25,8 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
>  		bool *shared);
>  
> -extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
> -		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
> +int xfs_reflink_allocate_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> +		struct xfs_bmbt_irec *cmap, bool *shared, uint *lockmode,
>  		bool convert_now);
>  extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
>  		xfs_off_t count);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay
  2019-09-09 18:27 ` [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
@ 2019-09-18 17:30   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:14PM +0200, Christoph Hellwig wrote:
> Rejuggle the return path to prepare for filling out a source iomap.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 48 +++++++++++++++++++++++-----------------------
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index b228d1dbf59f..18a0f8a5d8c9 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -539,7 +539,6 @@ xfs_file_iomap_begin_delay(
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
>  	bool			eof = false, cow_eof = false, shared = false;
> -	u16			iomap_flags = 0;
>  	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
> @@ -600,8 +599,7 @@ xfs_file_iomap_begin_delay(
>  				&ccur, &cmap);
>  		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
>  			trace_xfs_reflink_cow_found(ip, &cmap);
> -			whichfork = XFS_COW_FORK;
> -			goto done;
> +			goto found_cow;
>  		}
>  	}
>  
> @@ -615,7 +613,7 @@ xfs_file_iomap_begin_delay(
>  		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
>  			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
>  					&imap);
> -			goto done;
> +			goto found_imap;
>  		}
>  
>  		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> @@ -629,7 +627,7 @@ xfs_file_iomap_begin_delay(
>  		if (!shared) {
>  			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
>  					&imap);
> -			goto done;
> +			goto found_imap;
>  		}
>  
>  		/*
> @@ -703,35 +701,37 @@ xfs_file_iomap_begin_delay(
>  		goto out_unlock;
>  	}
>  
> +	if (whichfork == XFS_COW_FORK) {
> +		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> +		goto found_cow;
> +	}
> +
>  	/*
>  	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
>  	 * them out if the write happens to fail.
>  	 */
> -	if (whichfork == XFS_DATA_FORK) {
> -		iomap_flags |= IOMAP_F_NEW;
> -		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> -	} else {
> -		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> -	}
> -done:
> -	if (whichfork == XFS_COW_FORK) {
> -		if (imap.br_startoff > offset_fsb) {
> -			xfs_trim_extent(&cmap, offset_fsb,
> -					imap.br_startoff - offset_fsb);
> -			error = xfs_bmbt_to_iomap(ip, iomap, &cmap,
> -					IOMAP_F_SHARED);
> -			goto out_unlock;
> -		}
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
> +
> +found_imap:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
> +
> +found_cow:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	if (imap.br_startoff <= offset_fsb) {
>  		/* ensure we only report blocks we have a reservation for */
>  		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> -		shared = true;
> +		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
>  	}
> -	if (shared)
> -		iomap_flags |= IOMAP_F_SHARED;
> -	error = xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
> +	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
> +	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> +
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return error;
> +
>  }
>  
>  int
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents
  2019-09-18 17:25     ` Christoph Hellwig
@ 2019-09-18 17:31       ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Sep 18, 2019 at 07:25:45PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 18, 2019 at 10:17:33AM -0700, Darrick J. Wong wrote:
> > >  /*
> > >   * Pre-COW all shared blocks within a given byte range of a file and turn off
> > >   * the reflink flag if we unshare all of the file's blocks.
> > > + *
> > > + * Let iomap iterate all extents to see which are shared and not unwritten or
> > > + * delalloc and read them into the page cache, dirty them, fsync them back out,
> > > + * and then we can update the inode flag.  What happens if we run out of
> > > + * memory? :)
> > 
> > I don't know, what /does/ happen? :)
> > 
> > It /should/ be fine, right?  Writeback will start pushing the dirty
> > cache pages to disk, and since writeback only takes the ILOCK, it should
> > be able to perform the COW even while the unshare process sits on the
> > IOLOCK/MMAPLOCK.  True, the unshare process and writeback will both be
> > contending on the ILOCK, but that shouldn't be a problem...
> > 
> > ...unless I'm missing something?  It sure does look nice to drain all
> > this other code out.
> 
> No idea.  This is your old code just moved down to this function.  But
> yes, the comment looks rather confusing and maybe we should just remove
> it.

I know.  I remember testing it way back in 4.9 when we first merged it
to make sure it really did work that way, and I think the locking model
hasn't changed so it should be fine.  But just wanted another set of
eyes to make sure things haven't subtlely shifted since then. :)

--D

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 12/19] xfs: fill out the srcmap in iomap_begin
  2019-09-09 18:27 ` [PATCH 12/19] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-09-18 17:52   ` Darrick J. Wong
  2019-10-01  6:26     ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:15PM +0200, Christoph Hellwig wrote:
> Replace our local hacks to report the source block in the main iomap
> with the proper scrmap reporting.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
>  1 file changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 18a0f8a5d8c9..d12eacdc9bba 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
>  	loff_t			offset,
>  	loff_t			count,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
>  found_cow:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (imap.br_startoff <= offset_fsb) {
> -		/* ensure we only report blocks we have a reservation for */
> -		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> -		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> +		if (error)
> +			return error;
> +	} else {
> +		xfs_trim_extent(&cmap, offset_fsb,
> +				imap.br_startoff - offset_fsb);
>  	}
> -	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
>  	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
>  
>  out_unlock:
> @@ -933,7 +936,7 @@ xfs_file_iomap_begin(
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_bmbt_irec	imap;
> +	struct xfs_bmbt_irec	imap, cmap;
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false;
> @@ -947,7 +950,7 @@ xfs_file_iomap_begin(
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> -				iomap);
> +				iomap, srcmap);
>  	}
>  
>  	/*
> @@ -987,9 +990,6 @@ xfs_file_iomap_begin(
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_cow_inode(ip)) {
> -		struct xfs_bmbt_irec	cmap;
> -		bool			directio = (flags & IOMAP_DIRECT);
> -
>  		/* if zeroing doesn't need COW allocation, then we are done. */
>  		if ((flags & IOMAP_ZERO) &&
>  		    !needs_cow_for_zeroing(&imap, nimaps))
> @@ -997,23 +997,11 @@ xfs_file_iomap_begin(
>  
>  		/* may drop and re-acquire the ilock */
>  		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> -				&lockmode, directio);
> +				&lockmode, flags & IOMAP_DIRECT);
>  		if (error)
>  			goto out_unlock;
> -
> -		/*
> -		 * For buffered writes we need to report the address of the
> -		 * previous block (if there was any) so that the higher level
> -		 * write code can perform read-modify-write operations; we
> -		 * won't need the CoW fork mapping until writeback.  For direct
> -		 * I/O, which must be block aligned, we need to report the
> -		 * newly allocated address.  If the data fork has a hole, copy
> -		 * the COW fork mapping to avoid allocating to the data fork.
> -		 */
> -		if (shared &&
> -		    (directio || imap.br_startblock == HOLESTARTBLOCK))
> -			imap = cmap;
> -
> +		if (shared)
> +			goto out_found_cow;
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
> @@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
>  	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
>  	goto out_finish;
>  
> +out_found_cow:
> +	xfs_iunlock(ip, lockmode);
> +	length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
> +	trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
> +	if (imap.br_startblock != HOLESTARTBLOCK) {

TBH I've been wondering for a while now if it would make more sense to
do this in iomap_apply (and the open-coded versions in dax.c):

	struct iomap srcmap = { .type = IOMAP_HOLE };

in the iomap_apply function (and change the "if (!srcmap.type)" checks
to "if (srcmap.type != IOMAP_HOLE)").  That would get rid of the weird
situation where iomap.h doesn't define an iomap type name corresponding
to 0 but clearly it has some special meaning because the iomap code
changes behavior based on that.

It also strikes me as weird that for the @imap parameter, type == 0
would be considered a coding error but for @srcmap, we use type == 0 to
mean "no mapping" but we don't do that for @srcmap.type == IOMAP_HOLE.

I mention that because, if some ->iomap_begin function returns
IOMAP_HOLE then iomap_apply will pass the (hole) srcmap as the second
parameter to the ->actor function.  When that happens, iomap_write_begin
call will try to fill in the rest of the page from @srcmap (which is
hole), not the @iomap (which might not be a hole) which seems wrong.

As for this function, if we made the above change, then the conditional
becomes unneccessary -- we know this is a COW write, so we call
xfs_bmbt_to_iomap on both mappings and exit.  No need for special
casing.

--D

> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> +		if (error)
> +			return error;
> +	}
> +	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> +
>  out_unlock:
>  	xfs_iunlock(ip, lockmode);
>  	return error;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb
  2019-09-09 18:27 ` [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
@ 2019-09-18 17:55   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:16PM +0200, Christoph Hellwig wrote:
> We have lots of places that want to calculate the final fsb for
> a offset + count in bytes and check that the result fits into
> s_maxbytes.  Factor out a helper for that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d12eacdc9bba..0ba67a8d8169 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -102,6 +102,17 @@ xfs_hole_to_iomap(
>  	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
>  }
>  
> +static inline xfs_fileoff_t
> +xfs_iomap_end_fsb(

I wonder if this function ought to have a comment that it returns the
offset block number of offset + count clamped to the highest offset
supported by the page cache, but ... eh.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +	struct xfs_mount	*mp,
> +	loff_t			offset,
> +	loff_t			count)
> +{
> +	ASSERT(offset <= mp->m_super->s_maxbytes);
> +	return min(XFS_B_TO_FSB(mp, offset + count),
> +		   XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> +}
> +
>  xfs_extlen_t
>  xfs_eof_alignment(
>  	struct xfs_inode	*ip,
> @@ -172,8 +183,8 @@ xfs_iomap_write_direct(
>  	int		nmaps)
>  {
>  	xfs_mount_t	*mp = ip->i_mount;
> -	xfs_fileoff_t	offset_fsb;
> -	xfs_fileoff_t	last_fsb;
> +	xfs_fileoff_t	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t	last_fsb = xfs_iomap_end_fsb(mp, offset, count);
>  	xfs_filblks_t	count_fsb, resaligned;
>  	xfs_extlen_t	extsz;
>  	int		nimaps;
> @@ -192,8 +203,6 @@ xfs_iomap_write_direct(
>  
>  	ASSERT(xfs_isilocked(ip, lockmode));
>  
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
>  	if ((offset + count) > XFS_ISIZE(ip)) {
>  		/*
>  		 * Assert that the in-core extent list is present since this can
> @@ -533,9 +542,7 @@ xfs_file_iomap_begin_delay(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	xfs_fileoff_t		maxbytes_fsb =
> -		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> -	xfs_fileoff_t		end_fsb;
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>  	struct xfs_bmbt_irec	imap, cmap;
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
> @@ -565,8 +572,6 @@ xfs_file_iomap_begin_delay(
>  			goto out_unlock;
>  	}
>  
> -	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> -
>  	/*
>  	 * Search the data fork fork first to look up our source mapping.  We
>  	 * always need the data fork map, as we have to return it to the
> @@ -648,7 +653,7 @@ xfs_file_iomap_begin_delay(
>  		 * the lower level functions are updated.
>  		 */
>  		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> -		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
> +		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>  
>  		if (xfs_is_always_cow_inode(ip))
>  			whichfork = XFS_COW_FORK;
> @@ -674,7 +679,8 @@ xfs_file_iomap_begin_delay(
>  			if (align)
>  				p_end_fsb = roundup_64(p_end_fsb, align);
>  
> -			p_end_fsb = min(p_end_fsb, maxbytes_fsb);
> +			p_end_fsb = min(p_end_fsb,
> +				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
>  			ASSERT(p_end_fsb > offset_fsb);
>  			prealloc_blocks = p_end_fsb - end_fsb;
>  		}
> @@ -937,7 +943,8 @@ xfs_file_iomap_begin(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_bmbt_irec	imap, cmap;
> -	xfs_fileoff_t		offset_fsb, end_fsb;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false;
>  	u16			iomap_flags = 0;
> @@ -963,12 +970,6 @@ xfs_file_iomap_begin(
>  	if (error)
>  		return error;
>  
> -	ASSERT(offset <= mp->m_super->s_maxbytes);
> -	if (offset > mp->m_super->s_maxbytes - length)
> -		length = mp->m_super->s_maxbytes - offset;
> -	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> -
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
>  			       &nimaps, 0);
>  	if (error)
> @@ -1189,8 +1190,7 @@ xfs_seek_iomap_begin(
>  		/*
>  		 * Fake a hole until the end of the file.
>  		 */
> -		data_fsb = min(XFS_B_TO_FSB(mp, offset + length),
> -			       XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> +		data_fsb = xfs_iomap_end_fsb(mp, offset, length);
>  	}
>  
>  	/*
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 14/19] xfs: split out a new set of read-only iomap ops
  2019-09-09 18:27 ` [PATCH 14/19] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-09-18 17:56   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:17PM +0200, Christoph Hellwig wrote:
> Start untangling xfs_file_iomap_begin by splitting out the read-only
> case into its own set of iomap_ops with a very simply iomap_begin
> helper.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c  |  9 ++++---
>  fs/xfs/xfs_file.c  |  8 +++---
>  fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
>  fs/xfs/xfs_iomap.h |  1 +
>  fs/xfs/xfs_iops.c  |  2 +-
>  5 files changed, 58 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 4e4a4d7df5ac..fe47a4d52247 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1158,7 +1158,7 @@ xfs_vm_bmap(
>  	 */
>  	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
>  		return 0;
> -	return iomap_bmap(mapping, block, &xfs_iomap_ops);
> +	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
>  }
>  
>  STATIC int
> @@ -1167,7 +1167,7 @@ xfs_vm_readpage(
>  	struct page		*page)
>  {
>  	trace_xfs_vm_readpage(page->mapping->host, 1);
> -	return iomap_readpage(page, &xfs_iomap_ops);
> +	return iomap_readpage(page, &xfs_read_iomap_ops);
>  }
>  
>  STATIC int
> @@ -1178,7 +1178,7 @@ xfs_vm_readpages(
>  	unsigned		nr_pages)
>  {
>  	trace_xfs_vm_readpages(mapping->host, nr_pages);
> -	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> +	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
>  }
>  
>  static int
> @@ -1188,7 +1188,8 @@ xfs_iomap_swapfile_activate(
>  	sector_t			*span)
>  {
>  	sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
> -	return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
> +	return iomap_swapfile_activate(sis, swap_file, span,
> +			&xfs_read_iomap_ops);
>  }
>  
>  const struct address_space_operations xfs_address_space_operations = {
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 21bd3d575aaa..d8e2e36b84f7 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -187,7 +187,7 @@ xfs_file_dio_aio_read(
>  	file_accessed(iocb->ki_filp);
>  
>  	xfs_ilock(ip, XFS_IOLOCK_SHARED);
> -	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> +	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	return ret;
> @@ -214,7 +214,7 @@ xfs_file_dax_read(
>  		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>  	}
>  
> -	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
> +	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	file_accessed(iocb->ki_filp);
> @@ -1131,7 +1131,9 @@ __xfs_filemap_fault(
>  	if (IS_DAX(inode)) {
>  		pfn_t pfn;
>  
> -		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
> +		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
> +				(write_fault && !vmf->cow_page) ?
> +				 &xfs_iomap_ops : &xfs_read_iomap_ops);
>  		if (ret & VM_FAULT_NEEDDSYNC)
>  			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
>  	} else {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 0ba67a8d8169..d0ed0cba9041 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -950,11 +950,13 @@ xfs_file_iomap_begin(
>  	u16			iomap_flags = 0;
>  	unsigned		lockmode;
>  
> +	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
> +
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> +	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
> +	    !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
>  				iomap, srcmap);
> @@ -975,17 +977,6 @@ xfs_file_iomap_begin(
>  	if (error)
>  		goto out_unlock;
>  
> -	if (flags & IOMAP_REPORT) {
> -		/* Trim the mapping to the nearest shared extent boundary. */
> -		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> -		if (error)
> -			goto out_unlock;
> -	}
> -
> -	/* Non-modifying mapping requested, so we are done */
> -	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
> -		goto out_found;
> -
>  	/*
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
> @@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
>  	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
>  
>  out_finish:
> -	if (shared)
> -		iomap_flags |= IOMAP_F_SHARED;
>  	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
>  
>  out_found:
> @@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
>  	.iomap_end		= xfs_file_iomap_end,
>  };
>  
> +static int
> +xfs_read_iomap_begin(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	loff_t			length,
> +	unsigned		flags,
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_bmbt_irec	imap;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> +	int			nimaps = 1, error = 0;
> +	bool			shared = false;
> +	unsigned		lockmode;
> +
> +	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return -EIO;
> +
> +	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> +	if (error)
> +		return error;
> +	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> +			       &nimaps, 0);
> +	if (!error && (flags & IOMAP_REPORT))
> +		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> +	xfs_iunlock(ip, lockmode);
> +
> +	if (error)
> +		return error;
> +	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
> +}
> +
> +const struct iomap_ops xfs_read_iomap_ops = {
> +	.iomap_begin		= xfs_read_iomap_begin,
> +};
> +
>  static int
>  xfs_seek_iomap_begin(
>  	struct inode		*inode,
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 71d0ae460c44..61b1fc3e5143 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
>  }
>  
>  extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_read_iomap_ops;
>  extern const struct iomap_ops xfs_seek_iomap_ops;
>  extern const struct iomap_ops xfs_xattr_iomap_ops;
>  
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index ff3c1fae5357..f0c7df82e18b 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1113,7 +1113,7 @@ xfs_vn_fiemap(
>  				&xfs_xattr_iomap_ops);
>  	} else {
>  		error = iomap_fiemap(inode, fieinfo, start, length,
> -				&xfs_iomap_ops);
> +				&xfs_read_iomap_ops);
>  	}
>  	xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around
  2019-09-09 18:27 ` [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
@ 2019-09-18 17:59   ` Darrick J. Wong
  2019-09-30 11:14     ` Christoph Hellwig
  0 siblings, 1 reply; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 17:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:18PM +0200, Christoph Hellwig wrote:
> Move xfs_file_iomap_begin_delay near the end of the file next to the
> other iomap functions to prepare for additional refactoring.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iomap.c | 434 +++++++++++++++++++++++----------------------
>  1 file changed, 221 insertions(+), 213 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index d0ed0cba9041..7ae55503ee27 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -530,219 +530,6 @@ xfs_iomap_prealloc_size(
>  	return alloc_blocks;
>  }
>  
> -static int
> -xfs_file_iomap_begin_delay(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			count,
> -	unsigned		flags,
> -	struct iomap		*iomap,
> -	struct iomap		*srcmap)
> -{
> -	struct xfs_inode	*ip = XFS_I(inode);
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> -	struct xfs_bmbt_irec	imap, cmap;
> -	struct xfs_iext_cursor	icur, ccur;
> -	xfs_fsblock_t		prealloc_blocks = 0;
> -	bool			eof = false, cow_eof = false, shared = false;
> -	int			whichfork = XFS_DATA_FORK;
> -	int			error = 0;
> -
> -	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> -	ASSERT(!xfs_get_extsz_hint(ip));
> -
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -
> -	if (unlikely(XFS_TEST_ERROR(
> -	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
> -	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
> -	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
> -		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> -		error = -EFSCORRUPTED;
> -		goto out_unlock;
> -	}
> -
> -	XFS_STATS_INC(mp, xs_blk_mapw);
> -
> -	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> -		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> -		if (error)
> -			goto out_unlock;
> -	}
> -
> -	/*
> -	 * Search the data fork fork first to look up our source mapping.  We
> -	 * always need the data fork map, as we have to return it to the
> -	 * iomap code so that the higher level write code can read data in to
> -	 * perform read-modify-write cycles for unaligned writes.
> -	 */
> -	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
> -	if (eof)
> -		imap.br_startoff = end_fsb; /* fake hole until the end */
> -
> -	/* We never need to allocate blocks for zeroing a hole. */
> -	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> -		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> -		goto out_unlock;
> -	}
> -
> -	/*
> -	 * Search the COW fork extent list even if we did not find a data fork
> -	 * extent.  This serves two purposes: first this implements the
> -	 * speculative preallocation using cowextsize, so that we also unshare
> -	 * block adjacent to shared blocks instead of just the shared blocks
> -	 * themselves.  Second the lookup in the extent list is generally faster
> -	 * than going out to the shared extent tree.
> -	 */
> -	if (xfs_is_cow_inode(ip)) {
> -		if (!ip->i_cowfp) {
> -			ASSERT(!xfs_is_reflink_inode(ip));
> -			xfs_ifork_init_cow(ip);
> -		}
> -		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> -				&ccur, &cmap);
> -		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> -			trace_xfs_reflink_cow_found(ip, &cmap);
> -			goto found_cow;
> -		}
> -	}
> -
> -	if (imap.br_startoff <= offset_fsb) {
> -		/*
> -		 * For reflink files we may need a delalloc reservation when
> -		 * overwriting shared extents.   This includes zeroing of
> -		 * existing extents that contain data.
> -		 */
> -		if (!xfs_is_cow_inode(ip) ||
> -		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
> -			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> -					&imap);
> -			goto found_imap;
> -		}
> -
> -		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> -
> -		/* Trim the mapping to the nearest shared extent boundary. */
> -		error = xfs_inode_need_cow(ip, &imap, &shared);
> -		if (error)
> -			goto out_unlock;
> -
> -		/* Not shared?  Just report the (potentially capped) extent. */
> -		if (!shared) {
> -			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> -					&imap);
> -			goto found_imap;
> -		}
> -
> -		/*
> -		 * Fork all the shared blocks from our write offset until the
> -		 * end of the extent.
> -		 */
> -		whichfork = XFS_COW_FORK;
> -		end_fsb = imap.br_startoff + imap.br_blockcount;
> -	} else {
> -		/*
> -		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
> -		 * pages to keep the chunks of work done where somewhat
> -		 * symmetric with the work writeback does.  This is a completely
> -		 * arbitrary number pulled out of thin air.
> -		 *
> -		 * Note that the values needs to be less than 32-bits wide until
> -		 * the lower level functions are updated.
> -		 */
> -		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> -		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> -
> -		if (xfs_is_always_cow_inode(ip))
> -			whichfork = XFS_COW_FORK;
> -	}
> -
> -	error = xfs_qm_dqattach_locked(ip, false);
> -	if (error)
> -		goto out_unlock;
> -
> -	if (eof) {
> -		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
> -				count, &icur);
> -		if (prealloc_blocks) {
> -			xfs_extlen_t	align;
> -			xfs_off_t	end_offset;
> -			xfs_fileoff_t	p_end_fsb;
> -
> -			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
> -			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
> -					prealloc_blocks;
> -
> -			align = xfs_eof_alignment(ip, 0);
> -			if (align)
> -				p_end_fsb = roundup_64(p_end_fsb, align);
> -
> -			p_end_fsb = min(p_end_fsb,
> -				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> -			ASSERT(p_end_fsb > offset_fsb);
> -			prealloc_blocks = p_end_fsb - end_fsb;
> -		}
> -	}
> -
> -retry:
> -	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> -			end_fsb - offset_fsb, prealloc_blocks,
> -			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> -			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> -			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> -	switch (error) {
> -	case 0:
> -		break;
> -	case -ENOSPC:
> -	case -EDQUOT:
> -		/* retry without any preallocation */
> -		trace_xfs_delalloc_enospc(ip, offset, count);
> -		if (prealloc_blocks) {
> -			prealloc_blocks = 0;
> -			goto retry;
> -		}
> -		/*FALLTHRU*/
> -	default:
> -		goto out_unlock;
> -	}
> -
> -	if (whichfork == XFS_COW_FORK) {
> -		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> -		goto found_cow;
> -	}
> -
> -	/*
> -	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
> -	 * them out if the write happens to fail.
> -	 */
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> -	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
> -
> -found_imap:
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
> -
> -found_cow:
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	if (imap.br_startoff <= offset_fsb) {
> -		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> -		if (error)
> -			return error;
> -	} else {
> -		xfs_trim_extent(&cmap, offset_fsb,
> -				imap.br_startoff - offset_fsb);
> -	}
> -	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> -
> -out_unlock:
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	return error;
> -
> -}
> -
>  int
>  xfs_iomap_write_unwritten(
>  	xfs_inode_t	*ip,
> @@ -931,6 +718,15 @@ xfs_ilock_for_iomap(
>  	return 0;
>  }
>  
> +static int
> +xfs_file_iomap_begin_delay(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	loff_t			count,
> +	unsigned		flags,
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap);
> +
>  static int
>  xfs_file_iomap_begin(
>  	struct inode		*inode,
> @@ -1061,6 +857,218 @@ xfs_file_iomap_begin(
>  	return error;
>  }
>  
> +static int
> +xfs_file_iomap_begin_delay(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	loff_t			count,
> +	unsigned		flags,
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> +	struct xfs_bmbt_irec	imap, cmap;
> +	struct xfs_iext_cursor	icur, ccur;
> +	xfs_fsblock_t		prealloc_blocks = 0;
> +	bool			eof = false, cow_eof = false, shared = false;
> +	int			whichfork = XFS_DATA_FORK;
> +	int			error = 0;
> +
> +	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> +	ASSERT(!xfs_get_extsz_hint(ip));
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +
> +	if (unlikely(XFS_TEST_ERROR(
> +	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
> +	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
> +	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> +		error = -EFSCORRUPTED;
> +		goto out_unlock;
> +	}
> +
> +	XFS_STATS_INC(mp, xs_blk_mapw);
> +
> +	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
> +		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
> +		if (error)
> +			goto out_unlock;
> +	}
> +
> +	/*
> +	 * Search the data fork fork first to look up our source mapping.  We
> +	 * always need the data fork map, as we have to return it to the
> +	 * iomap code so that the higher level write code can read data in to
> +	 * perform read-modify-write cycles for unaligned writes.
> +	 */
> +	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
> +	if (eof)
> +		imap.br_startoff = end_fsb; /* fake hole until the end */
> +
> +	/* We never need to allocate blocks for zeroing a hole. */
> +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Search the COW fork extent list even if we did not find a data fork
> +	 * extent.  This serves two purposes: first this implements the
> +	 * speculative preallocation using cowextsize, so that we also unshare
> +	 * block adjacent to shared blocks instead of just the shared blocks
> +	 * themselves.  Second the lookup in the extent list is generally faster
> +	 * than going out to the shared extent tree.
> +	 */
> +	if (xfs_is_cow_inode(ip)) {
> +		if (!ip->i_cowfp) {
> +			ASSERT(!xfs_is_reflink_inode(ip));
> +			xfs_ifork_init_cow(ip);
> +		}
> +		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
> +				&ccur, &cmap);
> +		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
> +			trace_xfs_reflink_cow_found(ip, &cmap);
> +			goto found_cow;
> +		}
> +	}
> +
> +	if (imap.br_startoff <= offset_fsb) {
> +		/*
> +		 * For reflink files we may need a delalloc reservation when
> +		 * overwriting shared extents.   This includes zeroing of
> +		 * existing extents that contain data.
> +		 */
> +		if (!xfs_is_cow_inode(ip) ||
> +		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
> +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> +					&imap);
> +			goto found_imap;
> +		}
> +
> +		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> +
> +		/* Trim the mapping to the nearest shared extent boundary. */
> +		error = xfs_inode_need_cow(ip, &imap, &shared);
> +		if (error)
> +			goto out_unlock;
> +
> +		/* Not shared?  Just report the (potentially capped) extent. */
> +		if (!shared) {
> +			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
> +					&imap);
> +			goto found_imap;
> +		}
> +
> +		/*
> +		 * Fork all the shared blocks from our write offset until the
> +		 * end of the extent.
> +		 */
> +		whichfork = XFS_COW_FORK;
> +		end_fsb = imap.br_startoff + imap.br_blockcount;
> +	} else {
> +		/*
> +		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
> +		 * pages to keep the chunks of work done where somewhat
> +		 * symmetric with the work writeback does.  This is a completely
> +		 * arbitrary number pulled out of thin air.
> +		 *
> +		 * Note that the values needs to be less than 32-bits wide until
> +		 * the lower level functions are updated.
> +		 */
> +		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
> +		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
> +
> +		if (xfs_is_always_cow_inode(ip))
> +			whichfork = XFS_COW_FORK;
> +	}
> +
> +	error = xfs_qm_dqattach_locked(ip, false);
> +	if (error)
> +		goto out_unlock;
> +
> +	if (eof) {
> +		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
> +				count, &icur);
> +		if (prealloc_blocks) {
> +			xfs_extlen_t	align;
> +			xfs_off_t	end_offset;
> +			xfs_fileoff_t	p_end_fsb;
> +
> +			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
> +			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
> +					prealloc_blocks;
> +
> +			align = xfs_eof_alignment(ip, 0);
> +			if (align)
> +				p_end_fsb = roundup_64(p_end_fsb, align);
> +
> +			p_end_fsb = min(p_end_fsb,
> +		   		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));

Might as well fix the indenting damage here?

With that fixed,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +			ASSERT(p_end_fsb > offset_fsb);
> +			prealloc_blocks = p_end_fsb - end_fsb;
> +		}
> +	}
> +
> +retry:
> +	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> +			end_fsb - offset_fsb, prealloc_blocks,
> +			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> +			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> +			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> +	switch (error) {
> +	case 0:
> +		break;
> +	case -ENOSPC:
> +	case -EDQUOT:
> +		/* retry without any preallocation */
> +		trace_xfs_delalloc_enospc(ip, offset, count);
> +		if (prealloc_blocks) {
> +			prealloc_blocks = 0;
> +			goto retry;
> +		}
> +		/*FALLTHRU*/
> +	default:
> +		goto out_unlock;
> +	}
> +
> +	if (whichfork == XFS_COW_FORK) {
> +		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> +		goto found_cow;
> +	}
> +
> +	/*
> +	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
> +	 * them out if the write happens to fail.
> +	 */
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
> +
> +found_imap:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
> +
> +found_cow:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	if (imap.br_startoff <= offset_fsb) {
> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> +		if (error)
> +			return error;
> +	} else {
> +		xfs_trim_extent(&cmap, offset_fsb,
> +				imap.br_startoff - offset_fsb);
> +	}
> +	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> +
> +out_unlock:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	return error;
> +}
> +
>  static int
>  xfs_file_iomap_end_delalloc(
>  	struct xfs_inode	*ip,
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes
  2019-09-09 18:27 ` [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-09-18 18:00   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:19PM +0200, Christoph Hellwig wrote:
> Instead of lots of magic conditionals in the main write_begin
> handler this make the intent very clear.  Thing will become even
> better once we support delayed allocations for extent size hints
> and realtime allocations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_bmap_util.c |  3 ++-
>  fs/xfs/xfs_file.c      | 16 ++++++-----
>  fs/xfs/xfs_iomap.c     | 61 +++++++++++++++---------------------------
>  fs/xfs/xfs_iomap.h     |  3 ++-
>  fs/xfs/xfs_iops.c      |  4 +--
>  fs/xfs/xfs_reflink.c   |  5 ++--
>  6 files changed, 40 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 98c6a7a71427..a4a4813e9cc6 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1111,7 +1111,8 @@ xfs_free_file_space(
>  		return 0;
>  	if (offset + len > XFS_ISIZE(ip))
>  		len = XFS_ISIZE(ip) - offset;
> -	error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
> +	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> +			&xfs_buffered_write_iomap_ops);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index d8e2e36b84f7..b2ea030485e9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -350,7 +350,7 @@ xfs_file_aio_write_checks(
>  	
>  		trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
>  		error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
> -				NULL, &xfs_iomap_ops);
> +				NULL, &xfs_buffered_write_iomap_ops);
>  		if (error)
>  			return error;
>  	} else
> @@ -546,7 +546,8 @@ xfs_file_dio_aio_write(
>  	}
>  
>  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> -	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, &xfs_dio_write_ops);
> +	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> +			   &xfs_dio_write_ops);
>  
>  	/*
>  	 * If unaligned, this is the only IO in-flight. If it has not yet
> @@ -593,7 +594,7 @@ xfs_file_dax_write(
>  	count = iov_iter_count(from);
>  
>  	trace_xfs_file_dax_write(ip, count, pos);
> -	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
> +	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
>  	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
>  		i_size_write(inode, iocb->ki_pos);
>  		error = xfs_setfilesize(ip, pos, ret);
> @@ -640,7 +641,8 @@ xfs_file_buffered_aio_write(
>  	current->backing_dev_info = inode_to_bdi(inode);
>  
>  	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
> -	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
> +	ret = iomap_file_buffered_write(iocb, from,
> +			&xfs_buffered_write_iomap_ops);
>  	if (likely(ret >= 0))
>  		iocb->ki_pos += ret;
>  
> @@ -1133,12 +1135,14 @@ __xfs_filemap_fault(
>  
>  		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
>  				(write_fault && !vmf->cow_page) ?
> -				 &xfs_iomap_ops : &xfs_read_iomap_ops);
> +				 &xfs_direct_write_iomap_ops :
> +				 &xfs_read_iomap_ops);
>  		if (ret & VM_FAULT_NEEDDSYNC)
>  			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
>  	} else {
>  		if (write_fault)
> -			ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
> +			ret = iomap_page_mkwrite(vmf,
> +					&xfs_buffered_write_iomap_ops);
>  		else
>  			ret = filemap_fault(vmf);
>  	}
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7ae55503ee27..6dd143374d75 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -719,16 +719,7 @@ xfs_ilock_for_iomap(
>  }
>  
>  static int
> -xfs_file_iomap_begin_delay(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			count,
> -	unsigned		flags,
> -	struct iomap		*iomap,
> -	struct iomap		*srcmap);
> -
> -static int
> -xfs_file_iomap_begin(
> +xfs_direct_write_iomap_begin(
>  	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			length,
> @@ -751,13 +742,6 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
> -	    !xfs_get_extsz_hint(ip)) {
> -		/* Reserve delalloc blocks for regular writeback. */
> -		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> -				iomap, srcmap);
> -	}
> -
>  	/*
>  	 * Lock the inode in the manner required for the specified operation and
>  	 * check for as many conditions that would result in blocking as
> @@ -857,8 +841,12 @@ xfs_file_iomap_begin(
>  	return error;
>  }
>  
> +const struct iomap_ops xfs_direct_write_iomap_ops = {
> +	.iomap_begin		= xfs_direct_write_iomap_begin,
> +};
> +
>  static int
> -xfs_file_iomap_begin_delay(
> +xfs_buffered_write_iomap_begin(
>  	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			count,
> @@ -877,8 +865,12 @@ xfs_file_iomap_begin_delay(
>  	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
> +	/* we can't use delayed allocations when using extent size hints */
> +	if (xfs_get_extsz_hint(ip))
> +		return xfs_direct_write_iomap_begin(inode, offset, count,
> +				flags, iomap, srcmap);
> +
>  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> -	ASSERT(!xfs_get_extsz_hint(ip));
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  
> @@ -1070,18 +1062,23 @@ xfs_file_iomap_begin_delay(
>  }
>  
>  static int
> -xfs_file_iomap_end_delalloc(
> -	struct xfs_inode	*ip,
> +xfs_buffered_write_iomap_end(
> +	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			length,
>  	ssize_t			written,
> +	unsigned		flags,
>  	struct iomap		*iomap)
>  {
> +	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		start_fsb;
>  	xfs_fileoff_t		end_fsb;
>  	int			error = 0;
>  
> +	if (iomap->type != IOMAP_DELALLOC)
> +		return 0;
> +
>  	/*
>  	 * Behave as if the write failed if drop writes is enabled. Set the NEW
>  	 * flag to force delalloc cleanup.
> @@ -1126,25 +1123,9 @@ xfs_file_iomap_end_delalloc(
>  	return 0;
>  }
>  
> -static int
> -xfs_file_iomap_end(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			length,
> -	ssize_t			written,
> -	unsigned		flags,
> -	struct iomap		*iomap)
> -{
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
> -	    iomap->type == IOMAP_DELALLOC)
> -		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> -				length, written, iomap);
> -	return 0;
> -}
> -
> -const struct iomap_ops xfs_iomap_ops = {
> -	.iomap_begin		= xfs_file_iomap_begin,
> -	.iomap_end		= xfs_file_iomap_end,
> +const struct iomap_ops xfs_buffered_write_iomap_ops = {
> +	.iomap_begin		= xfs_buffered_write_iomap_begin,
> +	.iomap_end		= xfs_buffered_write_iomap_end,
>  };
>  
>  static int
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 61b1fc3e5143..7aed28275089 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -39,7 +39,8 @@ xfs_aligned_fsb_count(
>  	return count_fsb;
>  }
>  
> -extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_buffered_write_iomap_ops;
> +extern const struct iomap_ops xfs_direct_write_iomap_ops;
>  extern const struct iomap_ops xfs_read_iomap_ops;
>  extern const struct iomap_ops xfs_seek_iomap_ops;
>  extern const struct iomap_ops xfs_xattr_iomap_ops;
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index f0c7df82e18b..5c1154e761ca 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -882,10 +882,10 @@ xfs_setattr_size(
>  	if (newsize > oldsize) {
>  		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
>  		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
> -				&did_zeroing, &xfs_iomap_ops);
> +				&did_zeroing, &xfs_buffered_write_iomap_ops);
>  	} else {
>  		error = iomap_truncate_page(inode, newsize, &did_zeroing,
> -				&xfs_iomap_ops);
> +				&xfs_buffered_write_iomap_ops);
>  	}
>  
>  	if (error)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 673018a618f0..c68700c3a64a 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1270,7 +1270,7 @@ xfs_reflink_zero_posteof(
>  
>  	trace_xfs_zero_eof(ip, isize, pos - isize);
>  	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
> -			&xfs_iomap_ops);
> +			&xfs_buffered_write_iomap_ops);
>  }
>  
>  /*
> @@ -1532,7 +1532,8 @@ xfs_reflink_unshare(
>  
>  	inode_dio_wait(inode);
>  
> -	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
> +	error = iomap_file_unshare(inode, offset, len,
> +			&xfs_buffered_write_iomap_ops);
>  	if (error)
>  		goto out;
>  	error = filemap_write_and_wait(inode->i_mapping);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
  2019-09-09 18:27 ` [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
  2019-09-14  0:42   ` Allison Collins
@ 2019-09-18 18:00   ` Darrick J. Wong
  1 sibling, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:20PM +0200, Christoph Hellwig wrote:
> Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
> the usage of this variable a little more clear.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6dd143374d75..0e575ca1e3fc 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -862,7 +862,7 @@ xfs_buffered_write_iomap_begin(
>  	struct xfs_iext_cursor	icur, ccur;
>  	xfs_fsblock_t		prealloc_blocks = 0;
>  	bool			eof = false, cow_eof = false, shared = false;
> -	int			whichfork = XFS_DATA_FORK;
> +	int			allocfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
>  	/* we can't use delayed allocations when using extent size hints */
> @@ -959,7 +959,7 @@ xfs_buffered_write_iomap_begin(
>  		 * Fork all the shared blocks from our write offset until the
>  		 * end of the extent.
>  		 */
> -		whichfork = XFS_COW_FORK;
> +		allocfork = XFS_COW_FORK;
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  	} else {
>  		/*
> @@ -975,7 +975,7 @@ xfs_buffered_write_iomap_begin(
>  		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
>  
>  		if (xfs_is_always_cow_inode(ip))
> -			whichfork = XFS_COW_FORK;
> +			allocfork = XFS_COW_FORK;
>  	}
>  
>  	error = xfs_qm_dqattach_locked(ip, false);
> @@ -983,7 +983,7 @@ xfs_buffered_write_iomap_begin(
>  		goto out_unlock;
>  
>  	if (eof) {
> -		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
> +		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
>  				count, &icur);
>  		if (prealloc_blocks) {
>  			xfs_extlen_t	align;
> @@ -1006,11 +1006,11 @@ xfs_buffered_write_iomap_begin(
>  	}
>  
>  retry:
> -	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
> +	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
>  			end_fsb - offset_fsb, prealloc_blocks,
> -			whichfork == XFS_DATA_FORK ? &imap : &cmap,
> -			whichfork == XFS_DATA_FORK ? &icur : &ccur,
> -			whichfork == XFS_DATA_FORK ? eof : cow_eof);
> +			allocfork == XFS_DATA_FORK ? &imap : &cmap,
> +			allocfork == XFS_DATA_FORK ? &icur : &ccur,
> +			allocfork == XFS_DATA_FORK ? eof : cow_eof);
>  	switch (error) {
>  	case 0:
>  		break;
> @@ -1027,8 +1027,8 @@ xfs_buffered_write_iomap_begin(
>  		goto out_unlock;
>  	}
>  
> -	if (whichfork == XFS_COW_FORK) {
> -		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
> +	if (allocfork == XFS_COW_FORK) {
> +		trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
>  		goto found_cow;
>  	}
>  
> @@ -1037,7 +1037,7 @@ xfs_buffered_write_iomap_begin(
>  	 * them out if the write happens to fail.
>  	 */
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
> +	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
>  	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
>  
>  found_imap:
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten
  2019-09-09 18:27 ` [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
@ 2019-09-18 18:06   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:21PM +0200, Christoph Hellwig wrote:
> Move more checks into the helpers that determine if we need a COW
> operation or allocation and split the return path for when an existing
> data for allocation has been found versus a new allocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 74 ++++++++++++++++++++++++----------------------
>  1 file changed, 38 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 0e575ca1e3fc..e4e79aa5b695 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -642,23 +642,42 @@ xfs_iomap_write_unwritten(
>  static inline bool
>  imap_needs_alloc(
>  	struct inode		*inode,
> +	unsigned		flags,
>  	struct xfs_bmbt_irec	*imap,
>  	int			nimaps)
>  {
> -	return !nimaps ||
> -		imap->br_startblock == HOLESTARTBLOCK ||
> -		imap->br_startblock == DELAYSTARTBLOCK ||
> -		(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
> +	/* don't allocate blocks when just zeroing */
> +	if (flags & IOMAP_ZERO)
> +		return false;
> +	if (!nimaps ||
> +	    imap->br_startblock == HOLESTARTBLOCK ||
> +	    imap->br_startblock == DELAYSTARTBLOCK)
> +		return true;
> +	/* we convert unwritten extents before copying the data for DAX */
> +	if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
> +		return true;
> +	return false;
>  }
>  
>  static inline bool
> -needs_cow_for_zeroing(
> +imap_needs_cow(
> +	struct xfs_inode	*ip,
>  	struct xfs_bmbt_irec	*imap,
> +	unsigned int		flags,
>  	int			nimaps)
>  {
> -	return nimaps &&
> -		imap->br_startblock != HOLESTARTBLOCK &&
> -		imap->br_state != XFS_EXT_UNWRITTEN;
> +	if (!xfs_is_cow_inode(ip))
> +		return false;
> +
> +	/* when zeroing we don't have to COW holes or unwritten extents */
> +	if (flags & IOMAP_ZERO) {
> +		if (!nimaps ||
> +		    imap->br_startblock == HOLESTARTBLOCK ||
> +		    imap->br_state == XFS_EXT_UNWRITTEN)
> +			return false;
> +	}
> +
> +	return true;
>  }
>  
>  static int
> @@ -734,7 +753,6 @@ xfs_direct_write_iomap_begin(
>  	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false;
> -	u16			iomap_flags = 0;
>  	unsigned		lockmode;
>  
>  	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
> @@ -761,12 +779,7 @@ xfs_direct_write_iomap_begin(
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
>  	 */
> -	if (xfs_is_cow_inode(ip)) {
> -		/* if zeroing doesn't need COW allocation, then we are done. */
> -		if ((flags & IOMAP_ZERO) &&
> -		    !needs_cow_for_zeroing(&imap, nimaps))
> -			goto out_found;
> -
> +	if (imap_needs_cow(ip, &imap, flags, nimaps)) {
>  		/* may drop and re-acquire the ilock */
>  		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
>  				&lockmode, flags & IOMAP_DIRECT);
> @@ -778,18 +791,17 @@ xfs_direct_write_iomap_begin(
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
>  
> -	/* Don't need to allocate over holes when doing zeroing operations. */
> -	if (flags & IOMAP_ZERO)
> -		goto out_found;
> +	if (imap_needs_alloc(inode, flags, &imap, nimaps))
> +		goto allocate_blocks;
>  
> -	if (!imap_needs_alloc(inode, &imap, nimaps))
> -		goto out_found;
> +	xfs_iunlock(ip, lockmode);
> +	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
>  
> -	/* If nowait is set bail since we are going to make allocations. */
> -	if (flags & IOMAP_NOWAIT) {
> -		error = -EAGAIN;
> +allocate_blocks:
> +	error = -EAGAIN;
> +	if (flags & IOMAP_NOWAIT)
>  		goto out_unlock;
> -	}
>  
>  	/*
>  	 * We cap the maximum length we map to a sane size  to keep the chunks
> @@ -808,22 +820,12 @@ xfs_direct_write_iomap_begin(
>  	 */
>  	if (lockmode == XFS_ILOCK_EXCL)
>  		xfs_ilock_demote(ip, lockmode);
> -	error = xfs_iomap_write_direct(ip, offset, length, &imap,
> -			nimaps);
> +	error = xfs_iomap_write_direct(ip, offset, length, &imap, nimaps);
>  	if (error)
>  		return error;
>  
> -	iomap_flags |= IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
> -
> -out_finish:
> -	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
> -
> -out_found:
> -	ASSERT(nimaps);
> -	xfs_iunlock(ip, lockmode);
> -	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> -	goto out_finish;
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
>  
>  out_found_cow:
>  	xfs_iunlock(ip, lockmode);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes
  2019-09-09 18:27 ` [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
@ 2019-09-18 18:09   ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 09, 2019 at 08:27:22PM +0200, Christoph Hellwig wrote:
> Only bail out once we know that a COW allocation is actually required,
> similar to how we handle normal data fork allocations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

/me is still a little fuzzy about what NOWAIT is actually supposed to
mean (no waiting for fs metadata to load?  no waiting on other io to the
same device?  no waiting for anything, period?) but afaict this doesn't
seem to change the behavior much...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 23 +++++------------------
>  1 file changed, 5 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e4e79aa5b695..9e1b4b94acac 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -693,15 +693,8 @@ xfs_ilock_for_iomap(
>  	 * COW writes may allocate delalloc space or convert unwritten COW
>  	 * extents, so we need to make sure to take the lock exclusively here.
>  	 */
> -	if (xfs_is_cow_inode(ip) && is_write) {
> -		/*
> -		 * FIXME: It could still overwrite on unshared extents and not
> -		 * need allocation.
> -		 */
> -		if (flags & IOMAP_NOWAIT)
> -			return -EAGAIN;
> +	if (xfs_is_cow_inode(ip) && is_write)
>  		mode = XFS_ILOCK_EXCL;
> -	}
>  
>  	/*
>  	 * Extents not yet cached requires exclusive access, don't block.  This
> @@ -760,12 +753,6 @@ xfs_direct_write_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	/*
> -	 * Lock the inode in the manner required for the specified operation and
> -	 * check for as many conditions that would result in blocking as
> -	 * possible. This removes most of the non-blocking checks from the
> -	 * mapping code below.
> -	 */
>  	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
>  	if (error)
>  		return error;
> @@ -775,11 +762,11 @@ xfs_direct_write_iomap_begin(
>  	if (error)
>  		goto out_unlock;
>  
> -	/*
> -	 * Break shared extents if necessary. Checks for non-blocking IO have
> -	 * been done up front, so we don't need to do them here.
> -	 */
>  	if (imap_needs_cow(ip, &imap, flags, nimaps)) {
> +		error = -EAGAIN;
> +		if (flags & IOMAP_NOWAIT)
> +			goto out_unlock;
> +
>  		/* may drop and re-acquire the ilock */
>  		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
>  				&lockmode, flags & IOMAP_DIRECT);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O
  2019-09-16 18:42   ` Darrick J. Wong
@ 2019-09-18 18:15     ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 16, 2019 at 11:42:37AM -0700, Darrick J. Wong wrote:
> On Mon, Sep 09, 2019 at 08:27:10PM +0200, Christoph Hellwig wrote:

<snip>

> > diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> > index 54c02aecf3cd..67efd86675e6 100644
> > --- a/fs/iomap/apply.c
> > +++ b/fs/iomap/apply.c
> > @@ -24,7 +24,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> >  {
> >  	struct iomap iomap = { 0 };
> > +	struct iomap srcmap = { 0 };
> >  	loff_t written = 0, ret;
> > +	u64 end;
> >  
> >  	/*
> >  	 * Need to map a range from start position for length bytes. This can
> > @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  	 * expose transient stale data. If the reserve fails, we can safely
> >  	 * back out at this point as there is nothing to undo.
> >  	 */
> > -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> > +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> >  	if (ret)
> >  		return ret;
> >  	if (WARN_ON(iomap.offset > pos))
> > @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  	 * Cut down the length to the one actually provided by the filesystem,
> >  	 * as it might not be able to give us the whole size that we requested.
> >  	 */
> > -	if (iomap.offset + iomap.length < pos + length)
> > -		length = iomap.offset + iomap.length - pos;
> > +	end = iomap.offset + iomap.length;
> > +	if (srcmap.type)
> 
> Do we need a "#define IOMAP_NULL 0" type to make this more explicit?
> 
> Or at least a comment in iomap.h above the IOMAP_HOLE definition
> pointing out that type == 0 is valid for srcmap and not valid for iomap?
> 
> Other than that the rest looks ok.

FWIW, my opinions about this have evolved since two days ago; please see
my reply to patch #12 about initializing @srcmap.type = IOMAP_HOLE and
using "type == IOMAP_HOLE" to decide if we're going to use the srcmap
for anything.

--D

> --D
> 
> (This reminds me that at some point I ought to work on iomap debugging
> checks for the mappings returned by ->iomap_begin ...)
> 
> > +		end = min(end, srcmap.offset + srcmap.length);
> > +	if (pos + length > end)
> > +		length = end - pos;
> >  
> >  	/*
> > -	 * Now that we have guaranteed that the space allocation will succeed.
> > +	 * Now that we have guaranteed that the space allocation will succeed,
> >  	 * we can do the copy-in page by page without having to worry about
> >  	 * failures exposing transient data.
> > +	 *
> > +	 * To support COW operations, we read in data for partially blocks from
> > +	 * the srcmap if the file system filled it in.  In that case we the
> > +	 * length needs to be limited to the earlier of the ends of the iomaps.
> > +	 * If the file system did not provide a srcmap we pass in the normal
> > +	 * iomap into the actors so that they don't need to have special
> > +	 * handling for the two cases.
> >  	 */
> > -	written = actor(inode, pos, length, data, &iomap);
> > +	written = actor(inode, pos, length, data, &iomap,
> > +			srcmap.type ? &srcmap : &iomap);
> >  
> >  	/*
> >  	 * Now the data has been copied, commit the range we've copied.  This
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index a421977a9496..58fb7a894ad4 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -219,7 +219,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
> >  
> >  static loff_t
> >  iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_readpage_ctx *ctx = data;
> >  	struct page *page = ctx->cur_page;
> > @@ -367,7 +367,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> >  
> >  static loff_t
> >  iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_readpage_ctx *ctx = data;
> >  	loff_t done, ret;
> > @@ -387,7 +387,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> >  			ctx->cur_page_in_bio = false;
> >  		}
> >  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> > -				ctx, iomap);
> > +				ctx, iomap, srcmap);
> >  	}
> >  
> >  	return done;
> > @@ -567,7 +567,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> >  
> >  static int
> >  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> > -		struct page *page, struct iomap *iomap)
> > +		struct page *page, struct iomap *srcmap)
> >  {
> >  	struct iomap_page *iop = iomap_page_create(inode, page);
> >  	loff_t block_size = i_blocksize(inode);
> > @@ -590,7 +590,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  		    (to <= poff || to >= poff + plen))
> >  			continue;
> >  
> > -		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> > +		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> >  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> >  				return -EIO;
> >  			zero_user_segments(page, poff, from, to, poff + plen);
> > @@ -599,7 +599,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  		}
> >  
> >  		status = iomap_read_page_sync(block_start, page, poff, plen,
> > -				iomap);
> > +				srcmap);
> >  		if (status)
> >  			return status;
> >  	} while ((block_start += plen) < block_end);
> > @@ -609,13 +609,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  
> >  static int
> >  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > -		struct page **pagep, struct iomap *iomap)
> > +		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> >  	struct page *page;
> >  	int status = 0;
> >  
> >  	BUG_ON(pos + len > iomap->offset + iomap->length);
> > +	if (srcmap != iomap)
> > +		BUG_ON(pos + len > srcmap->offset + srcmap->length);
> >  
> >  	if (fatal_signal_pending(current))
> >  		return -EINTR;
> > @@ -633,13 +635,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >  		goto out_no_page;
> >  	}
> >  
> > -	if (iomap->type == IOMAP_INLINE)
> > -		iomap_read_inline_data(inode, page, iomap);
> > +	if (srcmap->type == IOMAP_INLINE)
> > +		iomap_read_inline_data(inode, page, srcmap);
> >  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> > -		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> > +		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> >  	else
> >  		status = __iomap_write_begin(inode, pos, len, flags, page,
> > -				iomap);
> > +				srcmap);
> >  
> >  	if (unlikely(status))
> >  		goto out_unlock;
> > @@ -725,16 +727,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> >  }
> >  
> >  static int
> > -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > -		unsigned copied, struct page *page, struct iomap *iomap)
> > +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> > +		struct page *page, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> >  	loff_t old_size = inode->i_size;
> >  	int ret;
> >  
> > -	if (iomap->type == IOMAP_INLINE) {
> > +	if (srcmap->type == IOMAP_INLINE) {
> >  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> > -	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> > +	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> >  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> >  				page, NULL);
> >  	} else {
> > @@ -765,7 +767,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >  
> >  static loff_t
> >  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iov_iter *i = data;
> >  	long status = 0;
> > @@ -799,7 +801,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  			break;
> >  		}
> >  
> > -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> > +				srcmap);
> >  		if (unlikely(status))
> >  			break;
> >  
> > @@ -810,8 +813,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  
> >  		flush_dcache_page(page);
> >  
> > -		status = iomap_write_end(inode, pos, bytes, copied, page,
> > -				iomap);
> > +		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> > +				srcmap);
> >  		if (unlikely(status < 0))
> >  			break;
> >  		copied = status;
> > @@ -864,7 +867,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> >  
> >  static loff_t
> >  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	long status = 0;
> >  	ssize_t written = 0;
> > @@ -873,7 +876,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  	if (!(iomap->flags & IOMAP_F_SHARED))
> >  		return length;
> >  	/* don't bother with holes or unwritten extents */
> > -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >  		return length;
> >  
> >  	do {
> > @@ -882,11 +885,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		struct page *page;
> >  
> >  		status = iomap_write_begin(inode, pos, bytes,
> > -				IOMAP_WRITE_F_UNSHARE, &page, iomap);
> > +				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> >  		if (unlikely(status))
> >  			return status;
> >  
> > -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> > +				srcmap);
> >  		if (unlikely(status <= 0)) {
> >  			if (WARN_ON_ONCE(status == 0))
> >  				return -EIO;
> > @@ -925,19 +929,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> >  EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >  
> >  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> > -		unsigned bytes, struct iomap *iomap)
> > +		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct page *page;
> >  	int status;
> >  
> > -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> >  	if (status)
> >  		return status;
> >  
> >  	zero_user(page, offset, bytes);
> >  	mark_page_accessed(page);
> >  
> > -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> >  }
> >  
> >  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> > @@ -949,14 +953,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> >  
> >  static loff_t
> >  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	bool *did_zero = data;
> >  	loff_t written = 0;
> >  	int status;
> >  
> >  	/* already zeroed?  we're done. */
> > -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >  		return count;
> >  
> >  	do {
> > @@ -968,7 +972,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> >  		if (IS_DAX(inode))
> >  			status = iomap_dax_zero(pos, offset, bytes, iomap);
> >  		else
> > -			status = iomap_zero(inode, pos, offset, bytes, iomap);
> > +			status = iomap_zero(inode, pos, offset, bytes, iomap,
> > +					srcmap);
> >  		if (status < 0)
> >  			return status;
> >  
> > @@ -1018,7 +1023,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
> >  
> >  static loff_t
> >  iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct page *page = data;
> >  	int ret;
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 1fc28c2da279..e3ccbf7daaae 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
> >  
> >  static loff_t
> >  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_dio *dio = data;
> >  
> > diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> > index f26fdd36e383..690ef2d7c6c8 100644
> > --- a/fs/iomap/fiemap.c
> > +++ b/fs/iomap/fiemap.c
> > @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
> >  
> >  static loff_t
> >  iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct fiemap_ctx *ctx = data;
> >  	loff_t ret = length;
> > @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
> >  
> >  static loff_t
> >  iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	sector_t *bno = data, addr;
> >  
> > diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> > index c04bad4b2b43..89f61d93c0bc 100644
> > --- a/fs/iomap/seek.c
> > +++ b/fs/iomap/seek.c
> > @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
> >  
> >  static loff_t
> >  iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> > -		      void *data, struct iomap *iomap)
> > +		      void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	switch (iomap->type) {
> >  	case IOMAP_UNWRITTEN:
> > @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
> >  
> >  static loff_t
> >  iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> > -		      void *data, struct iomap *iomap)
> > +		      void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	switch (iomap->type) {
> >  	case IOMAP_HOLE:
> > diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> > index 152a230f668d..a648dbf6991e 100644
> > --- a/fs/iomap/swapfile.c
> > +++ b/fs/iomap/swapfile.c
> > @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> >   * distinction between written and unwritten extents.
> >   */
> >  static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> > -		loff_t count, void *data, struct iomap *iomap)
> > +		loff_t count, void *data, struct iomap *iomap,
> > +		struct iomap *srcmap)
> >  {
> >  	struct iomap_swapfile_info *isi = data;
> >  	int error;
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index d340db666c06..f1c3ff27c666 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 3a0f0975a57e..6d795924baa7 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -127,7 +127,8 @@ struct iomap_ops {
> >  	 * The actual length is returned in iomap->length.
> >  	 */
> >  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> > -			unsigned flags, struct iomap *iomap);
> > +			unsigned flags, struct iomap *iomap,
> > +			struct iomap *srcmap);
> >  
> >  	/*
> >  	 * Commit and/or unreserve space previous allocated using iomap_begin.
> > @@ -143,7 +144,7 @@ struct iomap_ops {
> >   * Main iomap iterator function.
> >   */
> >  typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> > -		void *data, struct iomap *iomap);
> > +		void *data, struct iomap *iomap, struct iomap *srcmap);
> >  
> >  loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> >  		unsigned flags, const struct iomap_ops *ops, void *data,
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/19] iomap: use write_begin to read pages to unshare
  2019-09-16 18:34   ` Darrick J. Wong
@ 2019-09-30 11:07     ` Christoph Hellwig
  2019-10-08 15:12       ` Darrick J. Wong
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-30 11:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 16, 2019 at 11:34:28AM -0700, Darrick J. Wong wrote:
> > -		if ((from <= poff || from >= poff + plen) &&
> > +		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
> 
> Mmm, archeology of code that I wrote originally and have forgotten
> already... :)
> 
> I think the purpose of F_UNSHARE is to mimic the behavior of the code
> that's being removed, and the old behavior is that if a user asks to
> unshare a page backed by shared extents we'll read in all the blocks
> backing the page, even if that means reading in blocks that weren't part
> of the original unshare request, right?

No.  The flag causes the code to always read the page, even if the iomap
range covers the whole block.  For normal writes that means we don't to
read the block in at all, but for unshare we absolutely must do so.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow
  2019-09-18 17:26   ` Darrick J. Wong
@ 2019-09-30 11:10     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-30 11:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Sep 18, 2019 at 10:26:17AM -0700, Darrick J. Wong wrote:
> Hmm.  At first I thought this was a behavior change but I think it isn't
> because prior to this patch we'd set cmap = imap and if _allocate_cow
> didn't find a shared extent then it would just return without doing
> anything or touching cmap.  In the !shared case this would just set imap
> to itself pointlessly.
> 
> Now that we pass both imap and cmap to _allocate_cow, in the !shared
> case we don't initialized cmap at all, so adding the @shared check is
> required for correct operation, right?

Exactly.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around
  2019-09-18 17:59   ` Darrick J. Wong
@ 2019-09-30 11:14     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-09-30 11:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Sep 18, 2019 at 10:59:02AM -0700, Darrick J. Wong wrote:
> > +			p_end_fsb = min(p_end_fsb,
> > +		   		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
> 
> Might as well fix the indenting damage here?

Done.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 12/19] xfs: fill out the srcmap in iomap_begin
  2019-09-18 17:52   ` Darrick J. Wong
@ 2019-10-01  6:26     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2019-10-01  6:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Sep 18, 2019 at 10:52:28AM -0700, Darrick J. Wong wrote:
> TBH I've been wondering for a while now if it would make more sense to
> do this in iomap_apply (and the open-coded versions in dax.c):
> 
> 	struct iomap srcmap = { .type = IOMAP_HOLE };
> 
> in the iomap_apply function (and change the "if (!srcmap.type)" checks
> to "if (srcmap.type != IOMAP_HOLE)").  That would get rid of the weird
> situation where iomap.h doesn't define an iomap type name corresponding
> to 0 but clearly it has some special meaning because the iomap code
> changes behavior based on that.
> 
> It also strikes me as weird that for the @imap parameter, type == 0
> would be considered a coding error but for @srcmap, we use type == 0 to
> mean "no mapping" but we don't do that for @srcmap.type == IOMAP_HOLE.
> 
> I mention that because, if some ->iomap_begin function returns
> IOMAP_HOLE then iomap_apply will pass the (hole) srcmap as the second
> parameter to the ->actor function.  When that happens, iomap_write_begin
> call will try to fill in the rest of the page from @srcmap (which is
> hole), not the @iomap (which might not be a hole) which seems wrong.

I've renumber IOMAP_HOLE and initialized all the maps to it, that seems
like a nice improvement.

> As for this function, if we made the above change, then the conditional
> becomes unneccessary -- we know this is a COW write, so we call
> xfs_bmbt_to_iomap on both mappings and exit.  No need for special
> casing.

OTOH I can't really agree to this.  We now do pointless extra work
for a common case, which also seems a little confusing.  It also goes
again the future direction where at least for some cases I want to
avoid the imap lookup entirely if we don't need it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/19] iomap: use write_begin to read pages to unshare
  2019-09-30 11:07     ` Christoph Hellwig
@ 2019-10-08 15:12       ` Darrick J. Wong
  0 siblings, 0 replies; 54+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Mon, Sep 30, 2019 at 01:07:31PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 16, 2019 at 11:34:28AM -0700, Darrick J. Wong wrote:
> > > -		if ((from <= poff || from >= poff + plen) &&
> > > +		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
> > 
> > Mmm, archeology of code that I wrote originally and have forgotten
> > already... :)
> > 
> > I think the purpose of F_UNSHARE is to mimic the behavior of the code
> > that's being removed, and the old behavior is that if a user asks to
> > unshare a page backed by shared extents we'll read in all the blocks
> > backing the page, even if that means reading in blocks that weren't part
> > of the original unshare request, right?
> 
> No.  The flag causes the code to always read the page, even if the iomap
> range covers the whole block.  For normal writes that means we don't to
> read the block in at all, but for unshare we absolutely must do so.

Ahhh, right.  Missed that.  I'll go RVB the v2 patch now.

--D

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2019-10-08 15:13 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09 18:27 iomap and xfs COW cleanups Christoph Hellwig
2019-09-09 18:27 ` [PATCH 01/19] iomap: better document the IOMAP_F_* flags Christoph Hellwig
2019-09-14  0:42   ` Allison Collins
2019-09-16 18:08   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 02/19] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
2019-09-14  0:42   ` Allison Collins
2019-09-16 18:10   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 03/19] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
2019-09-16 18:11   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 04/19] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
2019-09-16 18:12   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 05/19] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
2019-09-16 18:17   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 06/19] iomap: use write_begin to read pages to unshare Christoph Hellwig
2019-09-16 18:34   ` Darrick J. Wong
2019-09-30 11:07     ` Christoph Hellwig
2019-10-08 15:12       ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 07/19] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
2019-09-10 12:48   ` Goldwyn Rodrigues
2019-09-10 14:39     ` hch
2019-09-16 17:57       ` Darrick J. Wong
2019-09-16 18:42   ` Darrick J. Wong
2019-09-18 18:15     ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 08/19] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
2019-09-18 17:09   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 09/19] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
2019-09-18 17:17   ` Darrick J. Wong
2019-09-18 17:25     ` Christoph Hellwig
2019-09-18 17:31       ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 10/19] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
2019-09-18 17:26   ` Darrick J. Wong
2019-09-30 11:10     ` Christoph Hellwig
2019-09-09 18:27 ` [PATCH 11/19] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
2019-09-18 17:30   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 12/19] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
2019-09-18 17:52   ` Darrick J. Wong
2019-10-01  6:26     ` Christoph Hellwig
2019-09-09 18:27 ` [PATCH 13/19] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
2019-09-14  0:42   ` Allison Collins
2019-09-18 17:55   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 14/19] xfs: split out a new set of read-only iomap ops Christoph Hellwig
2019-09-18 17:56   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 15/19] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
2019-09-18 17:59   ` Darrick J. Wong
2019-09-30 11:14     ` Christoph Hellwig
2019-09-09 18:27 ` [PATCH 16/19] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
2019-09-18 18:00   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 17/19] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
2019-09-14  0:42   ` Allison Collins
2019-09-18 18:00   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 18/19] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
2019-09-18 18:06   ` Darrick J. Wong
2019-09-09 18:27 ` [PATCH 19/19] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
2019-09-18 18:09   ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).