linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* iomap and xfs COW cleanups v2
@ 2019-10-08  7:15 Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
                   ` (19 more replies)
  0 siblings, 20 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Hi all,

this series started out based on a review of the btrfs iomap series
from Goldwyn.  I've taken his main srcmap patch and modified it a bit
based on my experience of converting xfs over to use the feature.
That led to comments that the xfs code is a mess, so I resurrected
an old series to clean that up and merged it in.  That series also
happens to massively clean up the unshare path in the iomap and xfs
code as well.  The series is on top of the move of the xfs writeback
code to iomap.

Changes since v1:
 - renumber IOMAP_HOLE to 0 and avoid the reserved 0 value
 - fix minor typos and update comments

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/20] iomap: better document the IOMAP_F_* flags
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
  Cc: Allison Collins, Darrick J . Wong

The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/linux/iomap.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 46ce730b1590..17cf63717681 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -30,21 +30,36 @@ struct vm_fault;
 #define IOMAP_INLINE	0x05	/* data inline in the inode */
 
 /*
- * Flags for all iomap mappings:
+ * Flags reported by the file system from iomap_begin:
+ *
+ * IOMAP_F_NEW indicates that the blocks have been newly allocated and need
+ * zeroing for areas that no data is copied to.
  *
  * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access
  * written data and requires fdatasync to commit them to persistent storage.
+ *
+ * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be
+ * unshared as part a write.
+ *
+ * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block
+ * mappings.
+ *
+ * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
+ * buffer heads for this mapping.
  */
-#define IOMAP_F_NEW		0x01	/* blocks have been newly allocated */
-#define IOMAP_F_DIRTY		0x02	/* uncommitted metadata */
-#define IOMAP_F_BUFFER_HEAD	0x04	/* file system requires buffer heads */
-#define IOMAP_F_SIZE_CHANGED	0x08	/* file size has changed */
+#define IOMAP_F_NEW		0x01
+#define IOMAP_F_DIRTY		0x02
+#define IOMAP_F_SHARED		0x04
+#define IOMAP_F_MERGED		0x08
+#define IOMAP_F_BUFFER_HEAD	0x10
 
 /*
- * Flags that only need to be reported for IOMAP_REPORT requests:
+ * Flags set by the core iomap code during operations:
+ *
+ * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
+ * has changed as the result of this write operation.
  */
-#define IOMAP_F_MERGED		0x10	/* contains multiple blocks/extents */
-#define IOMAP_F_SHARED		0x20	/* block shared with another file */
+#define IOMAP_F_SIZE_CHANGED	0x100
 
 /*
  * Flags from 0x1000 up are for file system specific usage:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
  Cc: Allison Collins, Darrick J . Wong

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 35c93f72c172..672f5d8efdbe 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -693,7 +693,7 @@ EXPORT_SYMBOL_GPL(iomap_set_page_dirty);
 
 static int
 __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page, struct iomap *iomap)
+		unsigned copied, struct page *page)
 {
 	flush_dcache_page(page);
 
@@ -746,7 +746,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
-		ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
+		ret = __iomap_write_end(inode, pos, len, copied, page);
 	}
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon.  Also remove the local index variable that is only used once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 672f5d8efdbe..bf6a0e0b92a5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -621,7 +621,6 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, struct iomap *iomap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
-	pgoff_t index = pos >> PAGE_SHIFT;
 	struct page *page;
 	int status = 0;
 
@@ -636,7 +635,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 			return status;
 	}
 
-	page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
+	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
+			AOP_FLAG_NOFS);
 	if (!page) {
 		status = -ENOMEM;
 		goto out_no_page;
@@ -778,7 +778,6 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct iov_iter *i = data;
 	long status = 0;
 	ssize_t written = 0;
-	unsigned int flags = AOP_FLAG_NOFS;
 
 	do {
 		struct page *page;
@@ -808,8 +807,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, flags, &page,
-				iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 		if (unlikely(status))
 			break;
 
@@ -907,8 +905,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		if (IS_ERR(rpage))
 			return PTR_ERR(rpage);
 
-		status = iomap_write_begin(inode, pos, bytes,
-					   AOP_FLAG_NOFS, &page, iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 		put_page(rpage);
 		if (unlikely(status))
 			return status;
@@ -959,8 +956,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
-				   iomap);
+	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
 	if (status)
 		return status;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 15 +++++++++++----
 fs/xfs/xfs_reflink.c   |  2 +-
 include/linux/iomap.h  |  2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bf6a0e0b92a5..59751835f172 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -887,12 +887,19 @@ __iomap_read_page(struct inode *inode, loff_t offset)
 }
 
 static loff_t
-iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
 {
 	long status = 0;
 	ssize_t written = 0;
 
+	/* don't bother with blocks that are not shared to start with */
+	if (!(iomap->flags & IOMAP_F_SHARED))
+		return length;
+	/* don't bother with holes or unwritten extents */
+	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+		return length;
+
 	do {
 		struct page *page, *rpage;
 		unsigned long offset;	/* Offset into pagecache page */
@@ -932,14 +939,14 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 }
 
 int
-iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops)
 {
 	loff_t ret;
 
 	while (len) {
 		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
-				iomap_dirty_actor);
+				iomap_unshare_actor);
 		if (ret <= 0)
 			return ret;
 		pos += ret;
@@ -948,7 +955,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(iomap_file_dirty);
+EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
 		unsigned bytes, struct iomap *iomap)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0f08153b4994..a9634110c783 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1442,7 +1442,7 @@ xfs_reflink_dirty_extents(
 			flen = XFS_FSB_TO_B(mp, rlen);
 			if (fpos + flen > isize)
 				flen = isize - fpos;
-			error = iomap_file_dirty(VFS_I(ip), fpos, flen,
+			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
 					&xfs_iomap_ops);
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 			if (error)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 17cf63717681..220f6b17a1a7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -166,7 +166,7 @@ int iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 #else
 #define iomap_migrate_page NULL
 #endif
-int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
+int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap/buffered-io.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 59751835f172..d5abd8e5dca7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -562,19 +562,12 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 }
 
 static int
-iomap_read_page_sync(struct inode *inode, loff_t block_start, struct page *page,
-		unsigned poff, unsigned plen, unsigned from, unsigned to,
-		struct iomap *iomap)
+iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
+		unsigned plen, struct iomap *iomap)
 {
 	struct bio_vec bvec;
 	struct bio bio;
 
-	if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
-		zero_user_segments(page, poff, from, to, poff + plen);
-		iomap_set_range_uptodate(page, poff, plen);
-		return 0;
-	}
-
 	bio_init(&bio, &bvec, 1);
 	bio.bi_opf = REQ_OP_READ;
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
@@ -592,7 +585,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
 	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
-	int status = 0;
+	int status;
 
 	if (PageUptodate(page))
 		return 0;
@@ -603,17 +596,23 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		if (plen == 0)
 			break;
 
-		if ((from > poff && from < poff + plen) ||
-		    (to > poff && to < poff + plen)) {
-			status = iomap_read_page_sync(inode, block_start, page,
-					poff, plen, from, to, iomap);
-			if (status)
-				break;
+		if ((from <= poff || from >= poff + plen) &&
+		    (to <= poff || to >= poff + plen))
+			continue;
+
+		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+			zero_user_segments(page, poff, from, to, poff + plen);
+			iomap_set_range_uptodate(page, poff, plen);
+			continue;
 		}
 
+		status = iomap_read_page_sync(block_start, page, poff, plen,
+				iomap);
+		if (status)
+			return status;
 	} while ((block_start += plen) < block_end);
 
-	return status;
+	return 0;
 }
 
 static int
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/20] iomap: use write_begin to read pages to unshare
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:12   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare.  That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 49 ++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d5abd8e5dca7..ac1bbed71a9b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -548,6 +548,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 EXPORT_SYMBOL_GPL(iomap_migrate_page);
 #endif /* CONFIG_MIGRATION */
 
+enum {
+	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
+};
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -577,7 +581,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 }
 
 static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
+__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *iomap)
 {
 	struct iomap_page *iop = iomap_page_create(inode, page);
@@ -596,11 +600,14 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 		if (plen == 0)
 			break;
 
-		if ((from <= poff || from >= poff + plen) &&
+		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
+		    (from <= poff || from >= poff + plen) &&
 		    (to <= poff || to >= poff + plen))
 			continue;
 
 		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
+				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
 			iomap_set_range_uptodate(page, poff, plen);
 			continue;
@@ -646,7 +653,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
-		status = __iomap_write_begin(inode, pos, len, page, iomap);
+		status = __iomap_write_begin(inode, pos, len, flags, page,
+				iomap);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -869,22 +877,6 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
-static struct page *
-__iomap_read_page(struct inode *inode, loff_t offset)
-{
-	struct address_space *mapping = inode->i_mapping;
-	struct page *page;
-
-	page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
-	if (IS_ERR(page))
-		return page;
-	if (!PageUptodate(page)) {
-		put_page(page);
-		return ERR_PTR(-EIO);
-	}
-	return page;
-}
-
 static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap)
@@ -900,24 +892,15 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return length;
 
 	do {
-		struct page *page, *rpage;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
-
-		offset = offset_in_page(pos);
-		bytes = min_t(loff_t, PAGE_SIZE - offset, length);
-
-		rpage = __iomap_read_page(inode, pos);
-		if (IS_ERR(rpage))
-			return PTR_ERR(rpage);
+		unsigned long offset = offset_in_page(pos);
+		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
+		struct page *page;
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
-		put_page(rpage);
+		status = iomap_write_begin(inode, pos, bytes,
+				IOMAP_WRITE_F_UNSHARE, &page, iomap);
 		if (unlikely(status))
 			return status;
 
-		WARN_ON_ONCE(!PageUptodate(page));
-
 		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:01   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

Instead of keeping a separate unnamed state for uninitialized iomaps,
renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
as a hole.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/iomap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 220f6b17a1a7..24c784e44274 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -23,11 +23,11 @@ struct vm_fault;
 /*
  * Types of block ranges for iomap mappings:
  */
-#define IOMAP_HOLE	0x01	/* no blocks allocated, need allocation */
-#define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
-#define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
-#define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
-#define IOMAP_INLINE	0x05	/* data inline in the inode */
+#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
+#define IOMAP_DELALLOC	1	/* delayed allocation blocks */
+#define IOMAP_MAPPED	2	/* blocks allocated at @addr */
+#define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE	4	/* data inline in the inode */
 
 /*
  * Flags reported by the file system from iomap_begin:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:00   ` Darrick J. Wong
  2019-10-14 23:27   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
                   ` (11 subsequent siblings)
  19 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

From: Goldwyn Rodrigues <rgoldwyn@suse.com>

The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target.  The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
      srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c               |  9 ++++--
 fs/ext2/inode.c        |  2 +-
 fs/ext4/inode.c        |  2 +-
 fs/gfs2/bmap.c         |  3 +-
 fs/iomap/apply.c       | 25 ++++++++++++----
 fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
 fs/iomap/direct-io.c   |  2 +-
 fs/iomap/fiemap.c      |  4 +--
 fs/iomap/seek.c        |  4 +--
 fs/iomap/swapfile.c    |  3 +-
 fs/xfs/xfs_iomap.c     |  9 ++++--
 include/linux/iomap.h  |  5 ++--
 12 files changed, 80 insertions(+), 53 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6bf81f931de3..920105457c2c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
 
 static loff_t
 dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct block_device *bdev = iomap->bdev;
 	struct dax_device *dax_dev = iomap->dax_dev;
@@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { .type = IOMAP_HOLE };
 	unsigned flags = IOMAP_FAULT;
 	int error, major = 0;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
 	 */
-	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
 	if (iomap_errp)
 		*iomap_errp = error;
 	if (error) {
@@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	struct inode *inode = mapping->host;
 	vm_fault_t result = VM_FAULT_FALLBACK;
 	struct iomap iomap = { 0 };
+	struct iomap srcmap = { .type = IOMAP_HOLE };
 	pgoff_t max_pgoff;
 	void *entry;
 	loff_t pos;
@@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * to look up our filesystem block.
 	 */
 	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
+	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
+			&srcmap);
 	if (error)
 		goto unlock_entry;
 
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 7004ce581a32..467c13ff6b40 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
 
 #ifdef CONFIG_FS_DAX
 static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-		unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	unsigned int blkbits = inode->i_blkbits;
 	unsigned long first_block = offset >> blkbits;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa280ced..abaaf7d96ca4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 }
 
 static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
-			    unsigned flags, struct iomap *iomap)
+		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned int blkbits = inode->i_blkbits;
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f63df54a08c6..516103248272 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
 }
 
 static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
-			    unsigned flags, struct iomap *iomap)
+			    unsigned flags, struct iomap *iomap,
+			    struct iomap *srcmap)
 {
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct metapath mp = { .mp_aheight = 1, };
diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 54c02aecf3cd..484dd8eda861 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -23,8 +23,10 @@ loff_t
 iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
 {
-	struct iomap iomap = { 0 };
+	struct iomap iomap = { .type = IOMAP_HOLE };
+	struct iomap srcmap = { .type = IOMAP_HOLE };
 	loff_t written = 0, ret;
+	u64 end;
 
 	/*
 	 * Need to map a range from start position for length bytes. This can
@@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * expose transient stale data. If the reserve fails, we can safely
 	 * back out at this point as there is nothing to undo.
 	 */
-	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
+	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
 	if (ret)
 		return ret;
 	if (WARN_ON(iomap.offset > pos))
@@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	 * Cut down the length to the one actually provided by the filesystem,
 	 * as it might not be able to give us the whole size that we requested.
 	 */
-	if (iomap.offset + iomap.length < pos + length)
-		length = iomap.offset + iomap.length - pos;
+	end = iomap.offset + iomap.length;
+	if (srcmap.type != IOMAP_HOLE)
+		end = min(end, srcmap.offset + srcmap.length);
+	if (pos + length > end)
+		length = end - pos;
 
 	/*
-	 * Now that we have guaranteed that the space allocation will succeed.
+	 * Now that we have guaranteed that the space allocation will succeed,
 	 * we can do the copy-in page by page without having to worry about
 	 * failures exposing transient data.
+	 *
+	 * To support COW operations, we read in data for partially blocks from
+	 * the srcmap if the file system filled it in.  In that case we the
+	 * length needs to be limited to the earlier of the ends of the iomaps.
+	 * If the file system did not provide a srcmap we pass in the normal
+	 * iomap into the actors so that they don't need to have special
+	 * handling for the two cases.
 	 */
-	written = actor(inode, pos, length, data, &iomap);
+	written = actor(inode, pos, length, data, &iomap,
+			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
 
 	/*
 	 * Now the data has been copied, commit the range we've copied.  This
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ac1bbed71a9b..eb2c6d73a837 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	struct page *page = ctx->cur_page;
@@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 
 static loff_t
 iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_readpage_ctx *ctx = data;
 	loff_t done, ret;
@@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
 			ctx->cur_page_in_bio = false;
 		}
 		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap);
+				ctx, iomap, srcmap);
 	}
 
 	return done;
@@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 
 static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *iomap)
+		struct page *page, struct iomap *srcmap)
 {
 	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
@@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		    (to <= poff || to >= poff + plen))
 			continue;
 
-		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
+		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
 			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
 				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
@@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		}
 
 		status = iomap_read_page_sync(block_start, page, poff, plen,
-				iomap);
+				srcmap);
 		if (status)
 			return status;
 	} while ((block_start += plen) < block_end);
@@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 
 static int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap)
+		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	struct page *page;
 	int status = 0;
 
 	BUG_ON(pos + len > iomap->offset + iomap->length);
+	if (srcmap != iomap)
+		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
@@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		goto out_no_page;
 	}
 
-	if (iomap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, iomap);
+	if (srcmap->type == IOMAP_INLINE)
+		iomap_read_inline_data(inode, page, srcmap);
 	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
-		status = __block_write_begin_int(page, pos, len, NULL, iomap);
+		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
 		status = __iomap_write_begin(inode, pos, len, flags, page,
-				iomap);
+				srcmap);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 }
 
 static int
-iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
-		unsigned copied, struct page *page, struct iomap *iomap)
+iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
+		struct page *page, struct iomap *iomap, struct iomap *srcmap)
 {
 	const struct iomap_page_ops *page_ops = iomap->page_ops;
 	loff_t old_size = inode->i_size;
 	int ret;
 
-	if (iomap->type == IOMAP_INLINE) {
+	if (srcmap->type == IOMAP_INLINE) {
 		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
-	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
+	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
 		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
 				page, NULL);
 	} else {
@@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iov_iter *i = data;
 	long status = 0;
@@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
+				srcmap);
 		if (unlikely(status))
 			break;
 
@@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 		flush_dcache_page(page);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page,
-				iomap);
+		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
+				srcmap);
 		if (unlikely(status < 0))
 			break;
 		copied = status;
@@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
 static loff_t
 iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	long status = 0;
 	ssize_t written = 0;
@@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!(iomap->flags & IOMAP_F_SHARED))
 		return length;
 	/* don't bother with holes or unwritten extents */
-	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
 		return length;
 
 	do {
@@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct page *page;
 
 		status = iomap_write_begin(inode, pos, bytes,
-				IOMAP_WRITE_F_UNSHARE, &page, iomap);
+				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
 		if (unlikely(status))
 			return status;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+				srcmap);
 		if (unlikely(status <= 0)) {
 			if (WARN_ON_ONCE(status == 0))
 				return -EIO;
@@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
-		unsigned bytes, struct iomap *iomap)
+		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page;
 	int status;
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
+	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
 	if (status)
 		return status;
 
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
 static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
 
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	bool *did_zero = data;
 	loff_t written = 0;
 	int status;
 
 	/* already zeroed?  we're done. */
-	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
+	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
 		return count;
 
 	do {
@@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
 		if (IS_DAX(inode))
 			status = iomap_dax_zero(pos, offset, bytes, iomap);
 		else
-			status = iomap_zero(inode, pos, offset, bytes, iomap);
+			status = iomap_zero(inode, pos, offset, bytes, iomap,
+					srcmap);
 		if (status < 0)
 			return status;
 
@@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
 static loff_t
 iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct page *page = data;
 	int ret;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1fc28c2da279..e3ccbf7daaae 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 
 static loff_t
 iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	struct iomap_dio *dio = data;
 
diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index f26fdd36e383..690ef2d7c6c8 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 
 static loff_t
 iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap)
+		struct iomap *iomap, struct iomap *srcmap)
 {
 	struct fiemap_ctx *ctx = data;
 	loff_t ret = length;
@@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
 
 static loff_t
 iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap)
+		void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	sector_t *bno = data, addr;
 
diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index c04bad4b2b43..89f61d93c0bc 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
 
 static loff_t
 iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap)
+		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_UNWRITTEN:
@@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
 static loff_t
 iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
-		      void *data, struct iomap *iomap)
+		      void *data, struct iomap *iomap, struct iomap *srcmap)
 {
 	switch (iomap->type) {
 	case IOMAP_HOLE:
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 152a230f668d..a648dbf6991e 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
  * distinction between written and unwritten extents.
  */
 static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
-		loff_t count, void *data, struct iomap *iomap)
+		loff_t count, void *data, struct iomap *iomap,
+		struct iomap *srcmap)
 {
 	struct iomap_swapfile_info *isi = data;
 	int error;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c0a492353826..016adcd7dd66 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -928,7 +928,8 @@ xfs_file_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
 	loff_t			offset,
 	loff_t			length,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 24c784e44274..37af5f9dc722 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -127,7 +127,8 @@ struct iomap_ops {
 	 * The actual length is returned in iomap->length.
 	 */
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
-			unsigned flags, struct iomap *iomap);
+			unsigned flags, struct iomap *iomap,
+			struct iomap *srcmap);
 
 	/*
 	 * Commit and/or unreserve space previous allocated using iomap_begin.
@@ -143,7 +144,7 @@ struct iomap_ops {
  * Main iomap iterator function.
  */
 typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
-		void *data, struct iomap *iomap);
+		void *data, struct iomap *iomap, struct iomap *srcmap);
 
 loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
 		unsigned flags, const struct iomap_ops *ops, void *data,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

There is no reason not to punch out stale delalloc blocks for zeroing
operations, as they otherwise behave exactly like normal writes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 016adcd7dd66..7031547e25ad 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1138,7 +1138,8 @@ xfs_file_iomap_end(
 	unsigned		flags,
 	struct iomap		*iomap)
 {
-	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
+	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
+	    iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
 				length, written, iomap);
 	return 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:20   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Now that xfs_file_unshare is not completely dumb we can just call it
directly without iterating the extent and reflink btrees ourselves.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_reflink.c | 103 +++----------------------------------------
 1 file changed, 5 insertions(+), 98 deletions(-)

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index a9634110c783..7fc728a8852b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1381,85 +1381,6 @@ xfs_reflink_remap_prep(
 	return ret;
 }
 
-/*
- * The user wants to preemptively CoW all shared blocks in this file,
- * which enables us to turn off the reflink flag.  Iterate all
- * extents which are not prealloc/delalloc to see which ranges are
- * mentioned in the refcount tree, then read those blocks into the
- * pagecache, dirty them, fsync them back out, and then we can update
- * the inode flag.  What happens if we run out of memory? :)
- */
-STATIC int
-xfs_reflink_dirty_extents(
-	struct xfs_inode	*ip,
-	xfs_fileoff_t		fbno,
-	xfs_filblks_t		end,
-	xfs_off_t		isize)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_agnumber_t		agno;
-	xfs_agblock_t		agbno;
-	xfs_extlen_t		aglen;
-	xfs_agblock_t		rbno;
-	xfs_extlen_t		rlen;
-	xfs_off_t		fpos;
-	xfs_off_t		flen;
-	struct xfs_bmbt_irec	map[2];
-	int			nmaps;
-	int			error = 0;
-
-	while (end - fbno > 0) {
-		nmaps = 1;
-		/*
-		 * Look for extents in the file.  Skip holes, delalloc, or
-		 * unwritten extents; they can't be reflinked.
-		 */
-		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
-		if (error)
-			goto out;
-		if (nmaps == 0)
-			break;
-		if (!xfs_bmap_is_real_extent(&map[0]))
-			goto next;
-
-		map[1] = map[0];
-		while (map[1].br_blockcount) {
-			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
-			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
-			aglen = map[1].br_blockcount;
-
-			error = xfs_reflink_find_shared(mp, NULL, agno, agbno,
-					aglen, &rbno, &rlen, true);
-			if (error)
-				goto out;
-			if (rbno == NULLAGBLOCK)
-				break;
-
-			/* Dirty the pages */
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-			fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
-					(rbno - agbno));
-			flen = XFS_FSB_TO_B(mp, rlen);
-			if (fpos + flen > isize)
-				flen = isize - fpos;
-			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
-					&xfs_iomap_ops);
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
-			if (error)
-				goto out;
-
-			map[1].br_blockcount -= (rbno - agbno + rlen);
-			map[1].br_startoff += (rbno - agbno + rlen);
-			map[1].br_startblock += (rbno - agbno + rlen);
-		}
-
-next:
-		fbno = map[0].br_startoff + map[0].br_blockcount;
-	}
-out:
-	return error;
-}
-
 /* Does this inode need the reflink flag? */
 int
 xfs_reflink_inode_has_shared_extents(
@@ -1596,10 +1517,7 @@ xfs_reflink_unshare(
 	xfs_off_t		offset,
 	xfs_off_t		len)
 {
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		fbno;
-	xfs_filblks_t		end;
-	xfs_off_t		isize;
+	struct inode		*inode = VFS_I(ip);
 	int			error;
 
 	if (!xfs_is_reflink_inode(ip))
@@ -1607,20 +1525,12 @@ xfs_reflink_unshare(
 
 	trace_xfs_reflink_unshare(ip, offset, len);
 
-	inode_dio_wait(VFS_I(ip));
+	inode_dio_wait(inode);
 
-	/* Try to CoW the selected ranges */
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	fbno = XFS_B_TO_FSBT(mp, offset);
-	isize = i_size_read(VFS_I(ip));
-	end = XFS_B_TO_FSB(mp, offset + len);
-	error = xfs_reflink_dirty_extents(ip, fbno, end, isize);
+	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
 	if (error)
-		goto out_unlock;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
-	/* Wait for the IO to finish */
-	error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
+		goto out;
+	error = filemap_write_and_wait(inode->i_mapping);
 	if (error)
 		goto out;
 
@@ -1628,11 +1538,8 @@ xfs_reflink_unshare(
 	error = xfs_reflink_try_clear_inode_flag(ip);
 	if (error)
 		goto out;
-
 	return 0;
 
-out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out:
 	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
 	return error;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap.  Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c   |  8 ++++----
 fs/xfs/xfs_reflink.c | 30 +++++++++++++++---------------
 fs/xfs/xfs_reflink.h |  4 ++--
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7031547e25ad..44ae6de02a19 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -996,9 +996,8 @@ xfs_file_iomap_begin(
 			goto out_found;
 
 		/* may drop and re-acquire the ilock */
-		cmap = imap;
-		error = xfs_reflink_allocate_cow(ip, &cmap, &shared, &lockmode,
-				directio);
+		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
+				&lockmode, directio);
 		if (error)
 			goto out_unlock;
 
@@ -1011,7 +1010,8 @@ xfs_file_iomap_begin(
 		 * newly allocated address.  If the data fork has a hole, copy
 		 * the COW fork mapping to avoid allocating to the data fork.
 		 */
-		if (directio || imap.br_startblock == HOLESTARTBLOCK)
+		if (shared &&
+		    (directio || imap.br_startblock == HOLESTARTBLOCK))
 			imap = cmap;
 
 		end_fsb = imap.br_startoff + imap.br_blockcount;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7fc728a8852b..19a6e4644123 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -308,13 +308,13 @@ static int
 xfs_find_trim_cow_extent(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
 	bool			*shared,
 	bool			*found)
 {
 	xfs_fileoff_t		offset_fsb = imap->br_startoff;
 	xfs_filblks_t		count_fsb = imap->br_blockcount;
 	struct xfs_iext_cursor	icur;
-	struct xfs_bmbt_irec	got;
 
 	*found = false;
 
@@ -322,23 +322,22 @@ xfs_find_trim_cow_extent(
 	 * If we don't find an overlapping extent, trim the range we need to
 	 * allocate to fit the hole we found.
 	 */
-	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got))
-		got.br_startoff = offset_fsb + count_fsb;
-	if (got.br_startoff > offset_fsb) {
+	if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, cmap))
+		cmap->br_startoff = offset_fsb + count_fsb;
+	if (cmap->br_startoff > offset_fsb) {
 		xfs_trim_extent(imap, imap->br_startoff,
-				got.br_startoff - imap->br_startoff);
+				cmap->br_startoff - imap->br_startoff);
 		return xfs_inode_need_cow(ip, imap, shared);
 	}
 
 	*shared = true;
-	if (isnullstartblock(got.br_startblock)) {
-		xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
+	if (isnullstartblock(cmap->br_startblock)) {
+		xfs_trim_extent(imap, cmap->br_startoff, cmap->br_blockcount);
 		return 0;
 	}
 
 	/* real extent found - no need to allocate */
-	xfs_trim_extent(&got, offset_fsb, count_fsb);
-	*imap = got;
+	xfs_trim_extent(cmap, offset_fsb, count_fsb);
 	*found = true;
 	return 0;
 }
@@ -348,6 +347,7 @@ int
 xfs_reflink_allocate_cow(
 	struct xfs_inode	*ip,
 	struct xfs_bmbt_irec	*imap,
+	struct xfs_bmbt_irec	*cmap,
 	bool			*shared,
 	uint			*lockmode,
 	bool			convert_now)
@@ -367,7 +367,7 @@ xfs_reflink_allocate_cow(
 		xfs_ifork_init_cow(ip);
 	}
 
-	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
 	if (error || !*shared)
 		return error;
 	if (found)
@@ -392,7 +392,7 @@ xfs_reflink_allocate_cow(
 	/*
 	 * Check for an overlapping extent again now that we dropped the ilock.
 	 */
-	error = xfs_find_trim_cow_extent(ip, imap, shared, &found);
+	error = xfs_find_trim_cow_extent(ip, imap, cmap, shared, &found);
 	if (error || !*shared)
 		goto out_trans_cancel;
 	if (found) {
@@ -411,7 +411,7 @@ xfs_reflink_allocate_cow(
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
 			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
-			resblks, imap, &nimaps);
+			resblks, cmap, &nimaps);
 	if (error)
 		goto out_unreserve;
 
@@ -427,15 +427,15 @@ xfs_reflink_allocate_cow(
 	if (nimaps == 0)
 		return -ENOSPC;
 convert:
-	xfs_trim_extent(imap, offset_fsb, count_fsb);
+	xfs_trim_extent(cmap, offset_fsb, count_fsb);
 	/*
 	 * COW fork extents are supposed to remain unwritten until we're ready
 	 * to initiate a disk write.  For direct I/O we are going to write the
 	 * data and need the conversion, but for buffered writes we're done.
 	 */
-	if (!convert_now || imap->br_state == XFS_EXT_NORM)
+	if (!convert_now || cmap->br_state == XFS_EXT_NORM)
 		return 0;
-	trace_xfs_reflink_convert_cow(ip, imap);
+	trace_xfs_reflink_convert_cow(ip, cmap);
 	return xfs_reflink_convert_cow_locked(ip, offset_fsb, count_fsb);
 
 out_unreserve:
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 28a43b7f581d..d18ad7f4fb64 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -25,8 +25,8 @@ extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 bool xfs_inode_need_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
 		bool *shared);
 
-extern int xfs_reflink_allocate_cow(struct xfs_inode *ip,
-		struct xfs_bmbt_irec *imap, bool *shared, uint *lockmode,
+int xfs_reflink_allocate_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
+		struct xfs_bmbt_irec *cmap, bool *shared, uint *lockmode,
 		bool convert_now);
 extern int xfs_reflink_convert_cow(struct xfs_inode *ip, xfs_off_t offset,
 		xfs_off_t count);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

Rejuggle the return path to prepare for filling out a source iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 48 +++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 44ae6de02a19..3292dfc8030a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -539,7 +539,6 @@ xfs_file_iomap_begin_delay(
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
 	bool			eof = false, cow_eof = false, shared = false;
-	u16			iomap_flags = 0;
 	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 
@@ -600,8 +599,7 @@ xfs_file_iomap_begin_delay(
 				&ccur, &cmap);
 		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
 			trace_xfs_reflink_cow_found(ip, &cmap);
-			whichfork = XFS_COW_FORK;
-			goto done;
+			goto found_cow;
 		}
 	}
 
@@ -615,7 +613,7 @@ xfs_file_iomap_begin_delay(
 		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
 			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
 					&imap);
-			goto done;
+			goto found_imap;
 		}
 
 		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
@@ -629,7 +627,7 @@ xfs_file_iomap_begin_delay(
 		if (!shared) {
 			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
 					&imap);
-			goto done;
+			goto found_imap;
 		}
 
 		/*
@@ -703,35 +701,37 @@ xfs_file_iomap_begin_delay(
 		goto out_unlock;
 	}
 
+	if (whichfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+		goto found_cow;
+	}
+
 	/*
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
-	if (whichfork == XFS_DATA_FORK) {
-		iomap_flags |= IOMAP_F_NEW;
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
-	} else {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
-	}
-done:
-	if (whichfork == XFS_COW_FORK) {
-		if (imap.br_startoff > offset_fsb) {
-			xfs_trim_extent(&cmap, offset_fsb,
-					imap.br_startoff - offset_fsb);
-			error = xfs_bmbt_to_iomap(ip, iomap, &cmap,
-					IOMAP_F_SHARED);
-			goto out_unlock;
-		}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (imap.br_startoff <= offset_fsb) {
 		/* ensure we only report blocks we have a reservation for */
 		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
-		shared = true;
+		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
 	}
-	if (shared)
-		iomap_flags |= IOMAP_F_SHARED;
-	error = xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
+	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
+
 }
 
 int
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/20] xfs: fill out the srcmap in iomap_begin
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:26   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Replace our local hacks to report the source block in the main iomap
with the proper scrmap reporting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 3292dfc8030a..ab2482a573d8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
 	loff_t			offset,
 	loff_t			count,
 	unsigned		flags,
-	struct iomap		*iomap)
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
 found_cow:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	if (imap.br_startoff <= offset_fsb) {
-		/* ensure we only report blocks we have a reservation for */
-		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
-		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb,
+				imap.br_startoff - offset_fsb);
 	}
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
 
 out_unlock:
@@ -933,7 +936,7 @@ xfs_file_iomap_begin(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_bmbt_irec	imap;
+	struct xfs_bmbt_irec	imap, cmap;
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
@@ -947,7 +950,7 @@ xfs_file_iomap_begin(
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
-				iomap);
+				iomap, srcmap);
 	}
 
 	/*
@@ -987,9 +990,6 @@ xfs_file_iomap_begin(
 	 * been done up front, so we don't need to do them here.
 	 */
 	if (xfs_is_cow_inode(ip)) {
-		struct xfs_bmbt_irec	cmap;
-		bool			directio = (flags & IOMAP_DIRECT);
-
 		/* if zeroing doesn't need COW allocation, then we are done. */
 		if ((flags & IOMAP_ZERO) &&
 		    !needs_cow_for_zeroing(&imap, nimaps))
@@ -997,23 +997,11 @@ xfs_file_iomap_begin(
 
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
-				&lockmode, directio);
+				&lockmode, flags & IOMAP_DIRECT);
 		if (error)
 			goto out_unlock;
-
-		/*
-		 * For buffered writes we need to report the address of the
-		 * previous block (if there was any) so that the higher level
-		 * write code can perform read-modify-write operations; we
-		 * won't need the CoW fork mapping until writeback.  For direct
-		 * I/O, which must be block aligned, we need to report the
-		 * newly allocated address.  If the data fork has a hole, copy
-		 * the COW fork mapping to avoid allocating to the data fork.
-		 */
-		if (shared &&
-		    (directio || imap.br_startblock == HOLESTARTBLOCK))
-			imap = cmap;
-
+		if (shared)
+			goto out_found_cow;
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
@@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
 	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
 	goto out_finish;
 
+out_found_cow:
+	xfs_iunlock(ip, lockmode);
+	length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
+	trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
+	if (imap.br_startblock != HOLESTARTBLOCK) {
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	}
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
 out_unlock:
 	xfs_iunlock(ip, lockmode);
 	return error;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
  Cc: Allison Collins, Darrick J . Wong

We have lots of places that want to calculate the final fsb for
a offset + count in bytes and check that the result fits into
s_maxbytes.  Factor out a helper for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ab2482a573d8..a9f9e8c9034a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -102,6 +102,17 @@ xfs_hole_to_iomap(
 	iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
 }
 
+static inline xfs_fileoff_t
+xfs_iomap_end_fsb(
+	struct xfs_mount	*mp,
+	loff_t			offset,
+	loff_t			count)
+{
+	ASSERT(offset <= mp->m_super->s_maxbytes);
+	return min(XFS_B_TO_FSB(mp, offset + count),
+		   XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+}
+
 xfs_extlen_t
 xfs_eof_alignment(
 	struct xfs_inode	*ip,
@@ -172,8 +183,8 @@ xfs_iomap_write_direct(
 	int		nmaps)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	xfs_fileoff_t	offset_fsb;
-	xfs_fileoff_t	last_fsb;
+	xfs_fileoff_t	offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t	last_fsb = xfs_iomap_end_fsb(mp, offset, count);
 	xfs_filblks_t	count_fsb, resaligned;
 	xfs_extlen_t	extsz;
 	int		nimaps;
@@ -192,8 +203,6 @@ xfs_iomap_write_direct(
 
 	ASSERT(xfs_isilocked(ip, lockmode));
 
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
 	if ((offset + count) > XFS_ISIZE(ip)) {
 		/*
 		 * Assert that the in-core extent list is present since this can
@@ -533,9 +542,7 @@ xfs_file_iomap_begin_delay(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	xfs_fileoff_t		maxbytes_fsb =
-		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
-	xfs_fileoff_t		end_fsb;
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 	struct xfs_bmbt_irec	imap, cmap;
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
@@ -565,8 +572,6 @@ xfs_file_iomap_begin_delay(
 			goto out_unlock;
 	}
 
-	end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
-
 	/*
 	 * Search the data fork fork first to look up our source mapping.  We
 	 * always need the data fork map, as we have to return it to the
@@ -648,7 +653,7 @@ xfs_file_iomap_begin_delay(
 		 * the lower level functions are updated.
 		 */
 		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
-		end_fsb = min(XFS_B_TO_FSB(mp, offset + count), maxbytes_fsb);
+		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 
 		if (xfs_is_always_cow_inode(ip))
 			whichfork = XFS_COW_FORK;
@@ -674,7 +679,8 @@ xfs_file_iomap_begin_delay(
 			if (align)
 				p_end_fsb = roundup_64(p_end_fsb, align);
 
-			p_end_fsb = min(p_end_fsb, maxbytes_fsb);
+			p_end_fsb = min(p_end_fsb,
+				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
 			ASSERT(p_end_fsb > offset_fsb);
 			prealloc_blocks = p_end_fsb - end_fsb;
 		}
@@ -937,7 +943,8 @@ xfs_file_iomap_begin(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_bmbt_irec	imap, cmap;
-	xfs_fileoff_t		offset_fsb, end_fsb;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
 	u16			iomap_flags = 0;
@@ -963,12 +970,6 @@ xfs_file_iomap_begin(
 	if (error)
 		return error;
 
-	ASSERT(offset <= mp->m_super->s_maxbytes);
-	if (offset > mp->m_super->s_maxbytes - length)
-		length = mp->m_super->s_maxbytes - offset;
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	end_fsb = XFS_B_TO_FSB(mp, offset + length);
-
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
 			       &nimaps, 0);
 	if (error)
@@ -1189,8 +1190,7 @@ xfs_seek_iomap_begin(
 		/*
 		 * Fake a hole until the end of the file.
 		 */
-		data_fsb = min(XFS_B_TO_FSB(mp, offset + length),
-			       XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+		data_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	}
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 15/20] xfs: split out a new set of read-only iomap ops
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:27   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Start untangling xfs_file_iomap_begin by splitting out the read-only
case into its own set of iomap_ops with a very simply iomap_begin
helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  |  9 ++++---
 fs/xfs/xfs_file.c  |  8 +++---
 fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
 fs/xfs/xfs_iomap.h |  1 +
 fs/xfs/xfs_iops.c  |  2 +-
 5 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 807af5c3c347..f708a2831d2f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -635,7 +635,7 @@ xfs_vm_bmap(
 	 */
 	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
 		return 0;
-	return iomap_bmap(mapping, block, &xfs_iomap_ops);
+	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
 }
 
 STATIC int
@@ -643,7 +643,7 @@ xfs_vm_readpage(
 	struct file		*unused,
 	struct page		*page)
 {
-	return iomap_readpage(page, &xfs_iomap_ops);
+	return iomap_readpage(page, &xfs_read_iomap_ops);
 }
 
 STATIC int
@@ -653,7 +653,7 @@ xfs_vm_readpages(
 	struct list_head	*pages,
 	unsigned		nr_pages)
 {
-	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
+	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
 }
 
 static int
@@ -663,7 +663,8 @@ xfs_iomap_swapfile_activate(
 	sector_t			*span)
 {
 	sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
-	return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
+	return iomap_swapfile_activate(sis, swap_file, span,
+			&xfs_read_iomap_ops);
 }
 
 const struct address_space_operations xfs_address_space_operations = {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1ffb179f35d2..f9814306ed8e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -188,7 +188,7 @@ xfs_file_dio_aio_read(
 	file_accessed(iocb->ki_filp);
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
-	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -215,7 +215,7 @@ xfs_file_dax_read(
 		xfs_ilock(ip, XFS_IOLOCK_SHARED);
 	}
 
-	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
+	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	file_accessed(iocb->ki_filp);
@@ -1156,7 +1156,9 @@ __xfs_filemap_fault(
 	if (IS_DAX(inode)) {
 		pfn_t pfn;
 
-		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
+		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
+				(write_fault && !vmf->cow_page) ?
+				 &xfs_iomap_ops : &xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 	} else {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a9f9e8c9034a..0cfd973fd192 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -950,11 +950,13 @@ xfs_file_iomap_begin(
 	u16			iomap_flags = 0;
 	unsigned		lockmode;
 
+	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
+
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
-			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
+	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
+	    !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
 				iomap, srcmap);
@@ -975,17 +977,6 @@ xfs_file_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	if (flags & IOMAP_REPORT) {
-		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
-		if (error)
-			goto out_unlock;
-	}
-
-	/* Non-modifying mapping requested, so we are done */
-	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
-		goto out_found;
-
 	/*
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
@@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
 
 out_finish:
-	if (shared)
-		iomap_flags |= IOMAP_F_SHARED;
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
 
 out_found:
@@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
 	.iomap_end		= xfs_file_iomap_end,
 };
 
+static int
+xfs_read_iomap_begin(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			length,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_bmbt_irec	imap;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
+	int			nimaps = 1, error = 0;
+	bool			shared = false;
+	unsigned		lockmode;
+
+	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
+	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+	if (error)
+		return error;
+	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
+			       &nimaps, 0);
+	if (!error && (flags & IOMAP_REPORT))
+		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+	xfs_iunlock(ip, lockmode);
+
+	if (error)
+		return error;
+	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
+}
+
+const struct iomap_ops xfs_read_iomap_ops = {
+	.iomap_begin		= xfs_read_iomap_begin,
+};
+
 static int
 xfs_seek_iomap_begin(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 71d0ae460c44..61b1fc3e5143 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
 }
 
 extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index fe285d123d69..9c448a54a951 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1114,7 +1114,7 @@ xfs_vn_fiemap(
 				&xfs_xattr_iomap_ops);
 	} else {
 		error = iomap_fiemap(inode, fieinfo, start, length,
-				&xfs_iomap_ops);
+				&xfs_read_iomap_ops);
 	}
 	xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

Move xfs_file_iomap_begin_delay near the end of the file next to the
other iomap functions to prepare for additional refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 434 +++++++++++++++++++++++----------------------
 1 file changed, 221 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0cfd973fd192..a6a03b65c4e7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -530,219 +530,6 @@ xfs_iomap_prealloc_size(
 	return alloc_blocks;
 }
 
-static int
-xfs_file_iomap_begin_delay(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			count,
-	unsigned		flags,
-	struct iomap		*iomap,
-	struct iomap		*srcmap)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
-	struct xfs_bmbt_irec	imap, cmap;
-	struct xfs_iext_cursor	icur, ccur;
-	xfs_fsblock_t		prealloc_blocks = 0;
-	bool			eof = false, cow_eof = false, shared = false;
-	int			whichfork = XFS_DATA_FORK;
-	int			error = 0;
-
-	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-	ASSERT(!xfs_get_extsz_hint(ip));
-
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-
-	if (unlikely(XFS_TEST_ERROR(
-	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
-	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
-		error = -EFSCORRUPTED;
-		goto out_unlock;
-	}
-
-	XFS_STATS_INC(mp, xs_blk_mapw);
-
-	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
-		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
-		if (error)
-			goto out_unlock;
-	}
-
-	/*
-	 * Search the data fork fork first to look up our source mapping.  We
-	 * always need the data fork map, as we have to return it to the
-	 * iomap code so that the higher level write code can read data in to
-	 * perform read-modify-write cycles for unaligned writes.
-	 */
-	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
-	if (eof)
-		imap.br_startoff = end_fsb; /* fake hole until the end */
-
-	/* We never need to allocate blocks for zeroing a hole. */
-	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
-		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
-		goto out_unlock;
-	}
-
-	/*
-	 * Search the COW fork extent list even if we did not find a data fork
-	 * extent.  This serves two purposes: first this implements the
-	 * speculative preallocation using cowextsize, so that we also unshare
-	 * block adjacent to shared blocks instead of just the shared blocks
-	 * themselves.  Second the lookup in the extent list is generally faster
-	 * than going out to the shared extent tree.
-	 */
-	if (xfs_is_cow_inode(ip)) {
-		if (!ip->i_cowfp) {
-			ASSERT(!xfs_is_reflink_inode(ip));
-			xfs_ifork_init_cow(ip);
-		}
-		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
-				&ccur, &cmap);
-		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
-			trace_xfs_reflink_cow_found(ip, &cmap);
-			goto found_cow;
-		}
-	}
-
-	if (imap.br_startoff <= offset_fsb) {
-		/*
-		 * For reflink files we may need a delalloc reservation when
-		 * overwriting shared extents.   This includes zeroing of
-		 * existing extents that contain data.
-		 */
-		if (!xfs_is_cow_inode(ip) ||
-		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
-			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
-					&imap);
-			goto found_imap;
-		}
-
-		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
-
-		/* Trim the mapping to the nearest shared extent boundary. */
-		error = xfs_inode_need_cow(ip, &imap, &shared);
-		if (error)
-			goto out_unlock;
-
-		/* Not shared?  Just report the (potentially capped) extent. */
-		if (!shared) {
-			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
-					&imap);
-			goto found_imap;
-		}
-
-		/*
-		 * Fork all the shared blocks from our write offset until the
-		 * end of the extent.
-		 */
-		whichfork = XFS_COW_FORK;
-		end_fsb = imap.br_startoff + imap.br_blockcount;
-	} else {
-		/*
-		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
-		 * pages to keep the chunks of work done where somewhat
-		 * symmetric with the work writeback does.  This is a completely
-		 * arbitrary number pulled out of thin air.
-		 *
-		 * Note that the values needs to be less than 32-bits wide until
-		 * the lower level functions are updated.
-		 */
-		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
-		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
-
-		if (xfs_is_always_cow_inode(ip))
-			whichfork = XFS_COW_FORK;
-	}
-
-	error = xfs_qm_dqattach_locked(ip, false);
-	if (error)
-		goto out_unlock;
-
-	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
-				count, &icur);
-		if (prealloc_blocks) {
-			xfs_extlen_t	align;
-			xfs_off_t	end_offset;
-			xfs_fileoff_t	p_end_fsb;
-
-			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
-			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
-					prealloc_blocks;
-
-			align = xfs_eof_alignment(ip, 0);
-			if (align)
-				p_end_fsb = roundup_64(p_end_fsb, align);
-
-			p_end_fsb = min(p_end_fsb,
-				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
-			ASSERT(p_end_fsb > offset_fsb);
-			prealloc_blocks = p_end_fsb - end_fsb;
-		}
-	}
-
-retry:
-	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks,
-			whichfork == XFS_DATA_FORK ? &imap : &cmap,
-			whichfork == XFS_DATA_FORK ? &icur : &ccur,
-			whichfork == XFS_DATA_FORK ? eof : cow_eof);
-	switch (error) {
-	case 0:
-		break;
-	case -ENOSPC:
-	case -EDQUOT:
-		/* retry without any preallocation */
-		trace_xfs_delalloc_enospc(ip, offset, count);
-		if (prealloc_blocks) {
-			prealloc_blocks = 0;
-			goto retry;
-		}
-		/*FALLTHRU*/
-	default:
-		goto out_unlock;
-	}
-
-	if (whichfork == XFS_COW_FORK) {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
-		goto found_cow;
-	}
-
-	/*
-	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
-	 * them out if the write happens to fail.
-	 */
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
-
-found_imap:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
-
-found_cow:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
-		if (error)
-			return error;
-	} else {
-		xfs_trim_extent(&cmap, offset_fsb,
-				imap.br_startoff - offset_fsb);
-	}
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
-
-out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return error;
-
-}
-
 int
 xfs_iomap_write_unwritten(
 	xfs_inode_t	*ip,
@@ -931,6 +718,15 @@ xfs_ilock_for_iomap(
 	return 0;
 }
 
+static int
+xfs_file_iomap_begin_delay(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			count,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap);
+
 static int
 xfs_file_iomap_begin(
 	struct inode		*inode,
@@ -1061,6 +857,218 @@ xfs_file_iomap_begin(
 	return error;
 }
 
+static int
+xfs_file_iomap_begin_delay(
+	struct inode		*inode,
+	loff_t			offset,
+	loff_t			count,
+	unsigned		flags,
+	struct iomap		*iomap,
+	struct iomap		*srcmap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
+	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+	struct xfs_bmbt_irec	imap, cmap;
+	struct xfs_iext_cursor	icur, ccur;
+	xfs_fsblock_t		prealloc_blocks = 0;
+	bool			eof = false, cow_eof = false, shared = false;
+	int			whichfork = XFS_DATA_FORK;
+	int			error = 0;
+
+	ASSERT(!XFS_IS_REALTIME_INODE(ip));
+	ASSERT(!xfs_get_extsz_hint(ip));
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	if (unlikely(XFS_TEST_ERROR(
+	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
+	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+	     mp, XFS_ERRTAG_BMAPIFORMAT))) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+		error = -EFSCORRUPTED;
+		goto out_unlock;
+	}
+
+	XFS_STATS_INC(mp, xs_blk_mapw);
+
+	if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+		if (error)
+			goto out_unlock;
+	}
+
+	/*
+	 * Search the data fork fork first to look up our source mapping.  We
+	 * always need the data fork map, as we have to return it to the
+	 * iomap code so that the higher level write code can read data in to
+	 * perform read-modify-write cycles for unaligned writes.
+	 */
+	eof = !xfs_iext_lookup_extent(ip, &ip->i_df, offset_fsb, &icur, &imap);
+	if (eof)
+		imap.br_startoff = end_fsb; /* fake hole until the end */
+
+	/* We never need to allocate blocks for zeroing a hole. */
+	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
+		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
+		goto out_unlock;
+	}
+
+	/*
+	 * Search the COW fork extent list even if we did not find a data fork
+	 * extent.  This serves two purposes: first this implements the
+	 * speculative preallocation using cowextsize, so that we also unshare
+	 * block adjacent to shared blocks instead of just the shared blocks
+	 * themselves.  Second the lookup in the extent list is generally faster
+	 * than going out to the shared extent tree.
+	 */
+	if (xfs_is_cow_inode(ip)) {
+		if (!ip->i_cowfp) {
+			ASSERT(!xfs_is_reflink_inode(ip));
+			xfs_ifork_init_cow(ip);
+		}
+		cow_eof = !xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb,
+				&ccur, &cmap);
+		if (!cow_eof && cmap.br_startoff <= offset_fsb) {
+			trace_xfs_reflink_cow_found(ip, &cmap);
+			goto found_cow;
+		}
+	}
+
+	if (imap.br_startoff <= offset_fsb) {
+		/*
+		 * For reflink files we may need a delalloc reservation when
+		 * overwriting shared extents.   This includes zeroing of
+		 * existing extents that contain data.
+		 */
+		if (!xfs_is_cow_inode(ip) ||
+		    ((flags & IOMAP_ZERO) && imap.br_state != XFS_EXT_NORM)) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto found_imap;
+		}
+
+		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
+
+		/* Trim the mapping to the nearest shared extent boundary. */
+		error = xfs_inode_need_cow(ip, &imap, &shared);
+		if (error)
+			goto out_unlock;
+
+		/* Not shared?  Just report the (potentially capped) extent. */
+		if (!shared) {
+			trace_xfs_iomap_found(ip, offset, count, XFS_DATA_FORK,
+					&imap);
+			goto found_imap;
+		}
+
+		/*
+		 * Fork all the shared blocks from our write offset until the
+		 * end of the extent.
+		 */
+		whichfork = XFS_COW_FORK;
+		end_fsb = imap.br_startoff + imap.br_blockcount;
+	} else {
+		/*
+		 * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
+		 * pages to keep the chunks of work done where somewhat
+		 * symmetric with the work writeback does.  This is a completely
+		 * arbitrary number pulled out of thin air.
+		 *
+		 * Note that the values needs to be less than 32-bits wide until
+		 * the lower level functions are updated.
+		 */
+		count = min_t(loff_t, count, 1024 * PAGE_SIZE);
+		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
+
+		if (xfs_is_always_cow_inode(ip))
+			whichfork = XFS_COW_FORK;
+	}
+
+	error = xfs_qm_dqattach_locked(ip, false);
+	if (error)
+		goto out_unlock;
+
+	if (eof) {
+		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+				count, &icur);
+		if (prealloc_blocks) {
+			xfs_extlen_t	align;
+			xfs_off_t	end_offset;
+			xfs_fileoff_t	p_end_fsb;
+
+			end_offset = XFS_WRITEIO_ALIGN(mp, offset + count - 1);
+			p_end_fsb = XFS_B_TO_FSBT(mp, end_offset) +
+					prealloc_blocks;
+
+			align = xfs_eof_alignment(ip, 0);
+			if (align)
+				p_end_fsb = roundup_64(p_end_fsb, align);
+
+			p_end_fsb = min(p_end_fsb,
+				XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes));
+			ASSERT(p_end_fsb > offset_fsb);
+			prealloc_blocks = p_end_fsb - end_fsb;
+		}
+	}
+
+retry:
+	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+			end_fsb - offset_fsb, prealloc_blocks,
+			whichfork == XFS_DATA_FORK ? &imap : &cmap,
+			whichfork == XFS_DATA_FORK ? &icur : &ccur,
+			whichfork == XFS_DATA_FORK ? eof : cow_eof);
+	switch (error) {
+	case 0:
+		break;
+	case -ENOSPC:
+	case -EDQUOT:
+		/* retry without any preallocation */
+		trace_xfs_delalloc_enospc(ip, offset, count);
+		if (prealloc_blocks) {
+			prealloc_blocks = 0;
+			goto retry;
+		}
+		/*FALLTHRU*/
+	default:
+		goto out_unlock;
+	}
+
+	if (whichfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+		goto found_cow;
+	}
+
+	/*
+	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+	 * them out if the write happens to fail.
+	 */
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
+
+found_imap:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
+
+found_cow:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (imap.br_startoff <= offset_fsb) {
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
+		if (error)
+			return error;
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb,
+				imap.br_startoff - offset_fsb);
+	}
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
 static int
 xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08 15:28   ` Darrick J. Wong
  2019-10-08  7:15 ` [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

Instead of lots of magic conditionals in the main write_begin
handler this make the intent very clear.  Thing will become even
better once we support delayed allocations for extent size hints
and realtime allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c |  3 ++-
 fs/xfs/xfs_file.c      | 16 ++++++-----
 fs/xfs/xfs_iomap.c     | 61 +++++++++++++++---------------------------
 fs/xfs/xfs_iomap.h     |  3 ++-
 fs/xfs/xfs_iops.c      |  4 +--
 fs/xfs/xfs_reflink.c   |  5 ++--
 6 files changed, 40 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0910cb75b65d..a6831b7bdc18 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1111,7 +1111,8 @@ xfs_free_file_space(
 		return 0;
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
-	error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
+	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
+			&xfs_buffered_write_iomap_ops);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index f9814306ed8e..71ffab53a0fc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -351,7 +351,7 @@ xfs_file_aio_write_checks(
 	
 		trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
 		error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-				NULL, &xfs_iomap_ops);
+				NULL, &xfs_buffered_write_iomap_ops);
 		if (error)
 			return error;
 	} else
@@ -547,7 +547,8 @@ xfs_file_dio_aio_write(
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
-	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, &xfs_dio_write_ops);
+	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
+			   &xfs_dio_write_ops);
 
 	/*
 	 * If unaligned, this is the only IO in-flight. If it has not yet
@@ -594,7 +595,7 @@ xfs_file_dax_write(
 	count = iov_iter_count(from);
 
 	trace_xfs_file_dax_write(ip, count, pos);
-	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
+	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
 	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
 		i_size_write(inode, iocb->ki_pos);
 		error = xfs_setfilesize(ip, pos, ret);
@@ -641,7 +642,8 @@ xfs_file_buffered_aio_write(
 	current->backing_dev_info = inode_to_bdi(inode);
 
 	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
-	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
+	ret = iomap_file_buffered_write(iocb, from,
+			&xfs_buffered_write_iomap_ops);
 	if (likely(ret >= 0))
 		iocb->ki_pos += ret;
 
@@ -1158,12 +1160,14 @@ __xfs_filemap_fault(
 
 		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
 				(write_fault && !vmf->cow_page) ?
-				 &xfs_iomap_ops : &xfs_read_iomap_ops);
+				 &xfs_direct_write_iomap_ops :
+				 &xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 	} else {
 		if (write_fault)
-			ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
+			ret = iomap_page_mkwrite(vmf,
+					&xfs_buffered_write_iomap_ops);
 		else
 			ret = filemap_fault(vmf);
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a6a03b65c4e7..5a7499f88673 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -719,16 +719,7 @@ xfs_ilock_for_iomap(
 }
 
 static int
-xfs_file_iomap_begin_delay(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			count,
-	unsigned		flags,
-	struct iomap		*iomap,
-	struct iomap		*srcmap);
-
-static int
-xfs_file_iomap_begin(
+xfs_direct_write_iomap_begin(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			length,
@@ -751,13 +742,6 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
-	    !xfs_get_extsz_hint(ip)) {
-		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
-				iomap, srcmap);
-	}
-
 	/*
 	 * Lock the inode in the manner required for the specified operation and
 	 * check for as many conditions that would result in blocking as
@@ -857,8 +841,12 @@ xfs_file_iomap_begin(
 	return error;
 }
 
+const struct iomap_ops xfs_direct_write_iomap_ops = {
+	.iomap_begin		= xfs_direct_write_iomap_begin,
+};
+
 static int
-xfs_file_iomap_begin_delay(
+xfs_buffered_write_iomap_begin(
 	struct inode		*inode,
 	loff_t			offset,
 	loff_t			count,
@@ -877,8 +865,12 @@ xfs_file_iomap_begin_delay(
 	int			whichfork = XFS_DATA_FORK;
 	int			error = 0;
 
+	/* we can't use delayed allocations when using extent size hints */
+	if (xfs_get_extsz_hint(ip))
+		return xfs_direct_write_iomap_begin(inode, offset, count,
+				flags, iomap, srcmap);
+
 	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-	ASSERT(!xfs_get_extsz_hint(ip));
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 
@@ -1070,18 +1062,23 @@ xfs_file_iomap_begin_delay(
 }
 
 static int
-xfs_file_iomap_end_delalloc(
-	struct xfs_inode	*ip,
+xfs_buffered_write_iomap_end(
+	struct inode		*inode,
 	loff_t			offset,
 	loff_t			length,
 	ssize_t			written,
+	unsigned		flags,
 	struct iomap		*iomap)
 {
+	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
 	xfs_fileoff_t		end_fsb;
 	int			error = 0;
 
+	if (iomap->type != IOMAP_DELALLOC)
+		return 0;
+
 	/*
 	 * Behave as if the write failed if drop writes is enabled. Set the NEW
 	 * flag to force delalloc cleanup.
@@ -1126,25 +1123,9 @@ xfs_file_iomap_end_delalloc(
 	return 0;
 }
 
-static int
-xfs_file_iomap_end(
-	struct inode		*inode,
-	loff_t			offset,
-	loff_t			length,
-	ssize_t			written,
-	unsigned		flags,
-	struct iomap		*iomap)
-{
-	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
-	    iomap->type == IOMAP_DELALLOC)
-		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written, iomap);
-	return 0;
-}
-
-const struct iomap_ops xfs_iomap_ops = {
-	.iomap_begin		= xfs_file_iomap_begin,
-	.iomap_end		= xfs_file_iomap_end,
+const struct iomap_ops xfs_buffered_write_iomap_ops = {
+	.iomap_begin		= xfs_buffered_write_iomap_begin,
+	.iomap_end		= xfs_buffered_write_iomap_end,
 };
 
 static int
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 61b1fc3e5143..7aed28275089 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -39,7 +39,8 @@ xfs_aligned_fsb_count(
 	return count_fsb;
 }
 
-extern const struct iomap_ops xfs_iomap_ops;
+extern const struct iomap_ops xfs_buffered_write_iomap_ops;
+extern const struct iomap_ops xfs_direct_write_iomap_ops;
 extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 9c448a54a951..329a34af8e79 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -883,10 +883,10 @@ xfs_setattr_size(
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_iomap_ops);
+				&did_zeroing, &xfs_buffered_write_iomap_ops);
 	} else {
 		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_iomap_ops);
+				&xfs_buffered_write_iomap_ops);
 	}
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 19a6e4644123..1e18b4024b82 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1270,7 +1270,7 @@ xfs_reflink_zero_posteof(
 
 	trace_xfs_zero_eof(ip, isize, pos - isize);
 	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
-			&xfs_iomap_ops);
+			&xfs_buffered_write_iomap_ops);
 }
 
 /*
@@ -1527,7 +1527,8 @@ xfs_reflink_unshare(
 
 	inode_dio_wait(inode);
 
-	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
+	error = iomap_file_unshare(inode, offset, len,
+			&xfs_buffered_write_iomap_ops);
 	if (error)
 		goto out;
 	error = filemap_write_and_wait(inode->i_mapping);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel
  Cc: Allison Collins, Darrick J . Wong

Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes
the usage of this variable a little more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 5a7499f88673..7a3fcfe5662c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -862,7 +862,7 @@ xfs_buffered_write_iomap_begin(
 	struct xfs_iext_cursor	icur, ccur;
 	xfs_fsblock_t		prealloc_blocks = 0;
 	bool			eof = false, cow_eof = false, shared = false;
-	int			whichfork = XFS_DATA_FORK;
+	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 
 	/* we can't use delayed allocations when using extent size hints */
@@ -959,7 +959,7 @@ xfs_buffered_write_iomap_begin(
 		 * Fork all the shared blocks from our write offset until the
 		 * end of the extent.
 		 */
-		whichfork = XFS_COW_FORK;
+		allocfork = XFS_COW_FORK;
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 	} else {
 		/*
@@ -975,7 +975,7 @@ xfs_buffered_write_iomap_begin(
 		end_fsb = xfs_iomap_end_fsb(mp, offset, count);
 
 		if (xfs_is_always_cow_inode(ip))
-			whichfork = XFS_COW_FORK;
+			allocfork = XFS_COW_FORK;
 	}
 
 	error = xfs_qm_dqattach_locked(ip, false);
@@ -983,7 +983,7 @@ xfs_buffered_write_iomap_begin(
 		goto out_unlock;
 
 	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, whichfork, offset,
+		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
 				count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
@@ -1006,11 +1006,11 @@ xfs_buffered_write_iomap_begin(
 	}
 
 retry:
-	error = xfs_bmapi_reserve_delalloc(ip, whichfork, offset_fsb,
+	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
 			end_fsb - offset_fsb, prealloc_blocks,
-			whichfork == XFS_DATA_FORK ? &imap : &cmap,
-			whichfork == XFS_DATA_FORK ? &icur : &ccur,
-			whichfork == XFS_DATA_FORK ? eof : cow_eof);
+			allocfork == XFS_DATA_FORK ? &imap : &cmap,
+			allocfork == XFS_DATA_FORK ? &icur : &ccur,
+			allocfork == XFS_DATA_FORK ? eof : cow_eof);
 	switch (error) {
 	case 0:
 		break;
@@ -1027,8 +1027,8 @@ xfs_buffered_write_iomap_begin(
 		goto out_unlock;
 	}
 
-	if (whichfork == XFS_COW_FORK) {
-		trace_xfs_iomap_alloc(ip, offset, count, whichfork, &cmap);
+	if (allocfork == XFS_COW_FORK) {
+		trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
 		goto found_cow;
 	}
 
@@ -1037,7 +1037,7 @@ xfs_buffered_write_iomap_begin(
 	 * them out if the write happens to fail.
 	 */
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	trace_xfs_iomap_alloc(ip, offset, count, whichfork, &imap);
+	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
 
 found_imap:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  2019-10-08  7:15 ` [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

Move more checks into the helpers that determine if we need a COW
operation or allocation and split the return path for when an existing
data for allocation has been found versus a new allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 74 ++++++++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7a3fcfe5662c..cc3d8fa011fc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -642,23 +642,42 @@ xfs_iomap_write_unwritten(
 static inline bool
 imap_needs_alloc(
 	struct inode		*inode,
+	unsigned		flags,
 	struct xfs_bmbt_irec	*imap,
 	int			nimaps)
 {
-	return !nimaps ||
-		imap->br_startblock == HOLESTARTBLOCK ||
-		imap->br_startblock == DELAYSTARTBLOCK ||
-		(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
+	/* don't allocate blocks when just zeroing */
+	if (flags & IOMAP_ZERO)
+		return false;
+	if (!nimaps ||
+	    imap->br_startblock == HOLESTARTBLOCK ||
+	    imap->br_startblock == DELAYSTARTBLOCK)
+		return true;
+	/* we convert unwritten extents before copying the data for DAX */
+	if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+		return true;
+	return false;
 }
 
 static inline bool
-needs_cow_for_zeroing(
+imap_needs_cow(
+	struct xfs_inode	*ip,
+	unsigned int		flags,
 	struct xfs_bmbt_irec	*imap,
 	int			nimaps)
 {
-	return nimaps &&
-		imap->br_startblock != HOLESTARTBLOCK &&
-		imap->br_state != XFS_EXT_UNWRITTEN;
+	if (!xfs_is_cow_inode(ip))
+		return false;
+
+	/* when zeroing we don't have to COW holes or unwritten extents */
+	if (flags & IOMAP_ZERO) {
+		if (!nimaps ||
+		    imap->br_startblock == HOLESTARTBLOCK ||
+		    imap->br_state == XFS_EXT_UNWRITTEN)
+			return false;
+	}
+
+	return true;
 }
 
 static int
@@ -734,7 +753,6 @@ xfs_direct_write_iomap_begin(
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
-	u16			iomap_flags = 0;
 	unsigned		lockmode;
 
 	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
@@ -761,12 +779,7 @@ xfs_direct_write_iomap_begin(
 	 * Break shared extents if necessary. Checks for non-blocking IO have
 	 * been done up front, so we don't need to do them here.
 	 */
-	if (xfs_is_cow_inode(ip)) {
-		/* if zeroing doesn't need COW allocation, then we are done. */
-		if ((flags & IOMAP_ZERO) &&
-		    !needs_cow_for_zeroing(&imap, nimaps))
-			goto out_found;
-
+	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
 				&lockmode, flags & IOMAP_DIRECT);
@@ -778,18 +791,17 @@ xfs_direct_write_iomap_begin(
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
 
-	/* Don't need to allocate over holes when doing zeroing operations. */
-	if (flags & IOMAP_ZERO)
-		goto out_found;
+	if (imap_needs_alloc(inode, flags, &imap, nimaps))
+		goto allocate_blocks;
 
-	if (!imap_needs_alloc(inode, &imap, nimaps))
-		goto out_found;
+	xfs_iunlock(ip, lockmode);
+	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, 0);
 
-	/* If nowait is set bail since we are going to make allocations. */
-	if (flags & IOMAP_NOWAIT) {
-		error = -EAGAIN;
+allocate_blocks:
+	error = -EAGAIN;
+	if (flags & IOMAP_NOWAIT)
 		goto out_unlock;
-	}
 
 	/*
 	 * We cap the maximum length we map to a sane size  to keep the chunks
@@ -808,22 +820,12 @@ xfs_direct_write_iomap_begin(
 	 */
 	if (lockmode == XFS_ILOCK_EXCL)
 		xfs_ilock_demote(ip, lockmode);
-	error = xfs_iomap_write_direct(ip, offset, length, &imap,
-			nimaps);
+	error = xfs_iomap_write_direct(ip, offset, length, &imap, nimaps);
 	if (error)
 		return error;
 
-	iomap_flags |= IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
-
-out_finish:
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
-
-out_found:
-	ASSERT(nimaps);
-	xfs_iunlock(ip, lockmode);
-	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
-	goto out_finish;
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_NEW);
 
 out_found_cow:
 	xfs_iunlock(ip, lockmode);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes
  2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2019-10-08  7:15 ` [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
@ 2019-10-08  7:15 ` Christoph Hellwig
  19 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-08  7:15 UTC (permalink / raw)
  To: Goldwyn Rodrigues, linux-xfs, linux-fsdevel; +Cc: Darrick J . Wong

Only bail out once we know that a COW allocation is actually required,
similar to how we handle normal data fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iomap.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index cc3d8fa011fc..46dee9c9e6ef 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -693,15 +693,8 @@ xfs_ilock_for_iomap(
 	 * COW writes may allocate delalloc space or convert unwritten COW
 	 * extents, so we need to make sure to take the lock exclusively here.
 	 */
-	if (xfs_is_cow_inode(ip) && is_write) {
-		/*
-		 * FIXME: It could still overwrite on unshared extents and not
-		 * need allocation.
-		 */
-		if (flags & IOMAP_NOWAIT)
-			return -EAGAIN;
+	if (xfs_is_cow_inode(ip) && is_write)
 		mode = XFS_ILOCK_EXCL;
-	}
 
 	/*
 	 * Extents not yet cached requires exclusive access, don't block.  This
@@ -760,12 +753,6 @@ xfs_direct_write_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	/*
-	 * Lock the inode in the manner required for the specified operation and
-	 * check for as many conditions that would result in blocking as
-	 * possible. This removes most of the non-blocking checks from the
-	 * mapping code below.
-	 */
 	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
 	if (error)
 		return error;
@@ -775,11 +762,11 @@ xfs_direct_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * Break shared extents if necessary. Checks for non-blocking IO have
-	 * been done up front, so we don't need to do them here.
-	 */
 	if (imap_needs_cow(ip, flags, &imap, nimaps)) {
+		error = -EAGAIN;
+		if (flags & IOMAP_NOWAIT)
+			goto out_unlock;
+
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
 				&lockmode, flags & IOMAP_DIRECT);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-08  7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
@ 2019-10-08 15:00   ` Darrick J. Wong
  2019-10-09  6:28     ` Christoph Hellwig
  2019-10-14 23:27   ` Darrick J. Wong
  1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The srcmap is used to identify where the read is to be performed from.
> It is passed to ->iomap_begin, which can fill it in if we need to read
> data for partially written blocks from a different location than the
> write target.  The srcmap is only supported for buffered writes so far.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
>       srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c               |  9 ++++--
>  fs/ext2/inode.c        |  2 +-
>  fs/ext4/inode.c        |  2 +-
>  fs/gfs2/bmap.c         |  3 +-
>  fs/iomap/apply.c       | 25 ++++++++++++----
>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
>  fs/iomap/direct-io.c   |  2 +-
>  fs/iomap/fiemap.c      |  4 +--
>  fs/iomap/seek.c        |  4 +--
>  fs/iomap/swapfile.c    |  3 +-
>  fs/xfs/xfs_iomap.c     |  9 ++++--
>  include/linux/iomap.h  |  5 ++--
>  12 files changed, 80 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 6bf81f931de3..920105457c2c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>  
>  static loff_t
>  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct block_device *bdev = iomap->bdev;
>  	struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	unsigned long vaddr = vmf->address;
>  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
>  	struct iomap iomap = { 0 };

Does this definition ^^^^^ need to be converted too?  You convert the
one in iomap_apply()...

> +	struct iomap srcmap = { .type = IOMAP_HOLE };

...and at the same time I wonder if we ought to have:

	/*
	 * The @iomap and @srcmap parameters should be set to a hole
	 * prior to calling ->iomap_begin.
	 */
	#define IOMAP_EMPTY_RECORD	{ .type = IOMAP_HOLE }

...and later...

	struct iomap srcmap = IOMAP_EMPTY_RECORD;

..but meh, I'm not sure that adds much.

>  	unsigned flags = IOMAP_FAULT;
>  	int error, major = 0;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * the file system block size to be equal the page size, which means
>  	 * that we never have to deal with more than a single extent here.
>  	 */
> -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);

->iomap_begin callers are never supposed to touch srcmap, right?
Maybe we ought to check that srcmap.io_type == HOLE, at least until
someone fixes this code to dax-copy the data from srcmap to iomap?

(I don't like this open-coded iomap_apply here, but fixing that is for
another day because I once tried to extract the iteration pieces and
yeurghck...)

The rest of the patch looks ok.

--D

>  	if (iomap_errp)
>  		*iomap_errp = error;
>  	if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	struct inode *inode = mapping->host;
>  	vm_fault_t result = VM_FAULT_FALLBACK;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	pgoff_t max_pgoff;
>  	void *entry;
>  	loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * to look up our filesystem block.
>  	 */
>  	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> +			&srcmap);
>  	if (error)
>  		goto unlock_entry;
>  
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
>  
>  #ifdef CONFIG_FS_DAX
>  static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -		unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	unsigned int blkbits = inode->i_blkbits;
>  	unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..abaaf7d96ca4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>  }
>  
>  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index f63df54a08c6..516103248272 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
>  }
>  
>  static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +			    unsigned flags, struct iomap *iomap,
> +			    struct iomap *srcmap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..484dd8eda861 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -23,8 +23,10 @@ loff_t
>  iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
>  {
> -	struct iomap iomap = { 0 };
> +	struct iomap iomap = { .type = IOMAP_HOLE };
> +	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	loff_t written = 0, ret;
> +	u64 end;
>  
>  	/*
>  	 * Need to map a range from start position for length bytes. This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * expose transient stale data. If the reserve fails, we can safely
>  	 * back out at this point as there is nothing to undo.
>  	 */
> -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
>  	if (ret)
>  		return ret;
>  	if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * Cut down the length to the one actually provided by the filesystem,
>  	 * as it might not be able to give us the whole size that we requested.
>  	 */
> -	if (iomap.offset + iomap.length < pos + length)
> -		length = iomap.offset + iomap.length - pos;
> +	end = iomap.offset + iomap.length;
> +	if (srcmap.type != IOMAP_HOLE)
> +		end = min(end, srcmap.offset + srcmap.length);
> +	if (pos + length > end)
> +		length = end - pos;
>  
>  	/*
> -	 * Now that we have guaranteed that the space allocation will succeed.
> +	 * Now that we have guaranteed that the space allocation will succeed,
>  	 * we can do the copy-in page by page without having to worry about
>  	 * failures exposing transient data.
> +	 *
> +	 * To support COW operations, we read in data for partially blocks from
> +	 * the srcmap if the file system filled it in.  In that case we the
> +	 * length needs to be limited to the earlier of the ends of the iomaps.
> +	 * If the file system did not provide a srcmap we pass in the normal
> +	 * iomap into the actors so that they don't need to have special
> +	 * handling for the two cases.
>  	 */
> -	written = actor(inode, pos, length, data, &iomap);
> +	written = actor(inode, pos, length, data, &iomap,
> +			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
>  
>  	/*
>  	 * Now the data has been copied, commit the range we've copied.  This
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ac1bbed71a9b..eb2c6d73a837 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
>  
>  static loff_t
>  iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	struct page *page = ctx->cur_page;
> @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>  
>  static loff_t
>  iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	loff_t done, ret;
> @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  			ctx->cur_page_in_bio = false;
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> -				ctx, iomap);
> +				ctx, iomap, srcmap);
>  	}
>  
>  	return done;
> @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  
>  static int
>  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> -		struct page *page, struct iomap *iomap)
> +		struct page *page, struct iomap *srcmap)
>  {
>  	struct iomap_page *iop = iomap_page_create(inode, page);
>  	loff_t block_size = i_blocksize(inode);
> @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
> -		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
>  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
>  				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
> @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		}
>  
>  		status = iomap_read_page_sync(block_start, page, poff, plen,
> -				iomap);
> +				srcmap);
>  		if (status)
>  			return status;
>  	} while ((block_start += plen) < block_end);
> @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> -		struct page **pagep, struct iomap *iomap)
> +		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	struct page *page;
>  	int status = 0;
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
> +	if (srcmap != iomap)
> +		BUG_ON(pos + len > srcmap->offset + srcmap->length);

This should be a WARN_ON(...) followed by return -EIO, right?

>  
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		goto out_no_page;
>  	}
>  
> -	if (iomap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, iomap);
> +	if (srcmap->type == IOMAP_INLINE)
> +		iomap_read_inline_data(inode, page, srcmap);
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> -		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, flags, page,
> -				iomap);
> +				srcmap);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  }
>  
>  static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> +		struct page *page, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	loff_t old_size = inode->i_size;
>  	int ret;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> +	if (srcmap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> -	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> +	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
>  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>  				page, NULL);
>  	} else {
> @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  
>  static loff_t
>  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iov_iter *i = data;
>  	long status = 0;
> @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			break;
>  		}
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> +				srcmap);
>  		if (unlikely(status))
>  			break;
>  
> @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		flush_dcache_page(page);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page,
> -				iomap);
> +		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> +				srcmap);
>  		if (unlikely(status < 0))
>  			break;
>  		copied = status;
> @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
>  static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	long status = 0;
>  	ssize_t written = 0;
> @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	if (!(iomap->flags & IOMAP_F_SHARED))
>  		return length;
>  	/* don't bother with holes or unwritten extents */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return length;
>  
>  	do {
> @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct page *page;
>  
>  		status = iomap_write_begin(inode, pos, bytes,
> -				IOMAP_WRITE_F_UNSHARE, &page, iomap);
> +				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
>  		if (unlikely(status))
>  			return status;
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +				srcmap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
>  				return -EIO;
> @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap)
> +		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
>  	int status;
>  
> -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
>  		return status;
>  
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>  
>  static loff_t
>  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
>  	int status;
>  
>  	/* already zeroed?  we're done. */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return count;
>  
>  	do {
> @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
>  		if (IS_DAX(inode))
>  			status = iomap_dax_zero(pos, offset, bytes, iomap);
>  		else
> -			status = iomap_zero(inode, pos, offset, bytes, iomap);
> +			status = iomap_zero(inode, pos, offset, bytes, iomap,
> +					srcmap);
>  		if (status < 0)
>  			return status;
>  
> @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
>  
>  static loff_t
>  iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page = data;
>  	int ret;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1fc28c2da279..e3ccbf7daaae 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  
>  static loff_t
>  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_dio *dio = data;
>  
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index f26fdd36e383..690ef2d7c6c8 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>  
>  static loff_t
>  iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct fiemap_ctx *ctx = data;
>  	loff_t ret = length;
> @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
>  static loff_t
>  iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	sector_t *bno = data, addr;
>  
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index c04bad4b2b43..89f61d93c0bc 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
>  
>  static loff_t
>  iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_UNWRITTEN:
> @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
>  static loff_t
>  iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_HOLE:
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 152a230f668d..a648dbf6991e 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>   * distinction between written and unwritten extents.
>   */
>  static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> -		loff_t count, void *data, struct iomap *iomap)
> +		loff_t count, void *data, struct iomap *iomap,
> +		struct iomap *srcmap)
>  {
>  	struct iomap_swapfile_info *isi = data;
>  	int error;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c0a492353826..016adcd7dd66 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 24c784e44274..37af5f9dc722 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -127,7 +127,8 @@ struct iomap_ops {
>  	 * The actual length is returned in iomap->length.
>  	 */
>  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> -			unsigned flags, struct iomap *iomap);
> +			unsigned flags, struct iomap *iomap,
> +			struct iomap *srcmap);
>  
>  	/*
>  	 * Commit and/or unreserve space previous allocated using iomap_begin.
> @@ -143,7 +144,7 @@ struct iomap_ops {
>   * Main iomap iterator function.
>   */
>  typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> -		void *data, struct iomap *iomap);
> +		void *data, struct iomap *iomap, struct iomap *srcmap);
>  
>  loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
>  		unsigned flags, const struct iomap_ops *ops, void *data,
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0
  2019-10-08  7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
@ 2019-10-08 15:01   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:14AM +0200, Christoph Hellwig wrote:
> Instead of keeping a separate unnamed state for uninitialized iomaps,
> renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
> as a hole.
> 
> Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  include/linux/iomap.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 220f6b17a1a7..24c784e44274 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -23,11 +23,11 @@ struct vm_fault;
>  /*
>   * Types of block ranges for iomap mappings:
>   */
> -#define IOMAP_HOLE	0x01	/* no blocks allocated, need allocation */
> -#define IOMAP_DELALLOC	0x02	/* delayed allocation blocks */
> -#define IOMAP_MAPPED	0x03	/* blocks allocated at @addr */
> -#define IOMAP_UNWRITTEN	0x04	/* blocks allocated at @addr in unwritten state */
> -#define IOMAP_INLINE	0x05	/* data inline in the inode */
> +#define IOMAP_HOLE	0	/* no blocks allocated, need allocation */
> +#define IOMAP_DELALLOC	1	/* delayed allocation blocks */
> +#define IOMAP_MAPPED	2	/* blocks allocated at @addr */
> +#define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
> +#define IOMAP_INLINE	4	/* data inline in the inode */
>  
>  /*
>   * Flags reported by the file system from iomap_begin:
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/20] iomap: use write_begin to read pages to unshare
  2019-10-08  7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
@ 2019-10-08 15:12   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:13AM +0200, Christoph Hellwig wrote:
> Use the existing iomap write_begin code to read the pages unshared
> by iomap_file_unshare.  That avoids the extra ->readpage call and
> extent tree lookup currently done by read_mapping_page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 49 ++++++++++++++----------------------------
>  1 file changed, 16 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d5abd8e5dca7..ac1bbed71a9b 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -548,6 +548,10 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  EXPORT_SYMBOL_GPL(iomap_migrate_page);
>  #endif /* CONFIG_MIGRATION */
>  
> +enum {
> +	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
> +};
> +
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -577,7 +581,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  }
>  
>  static int
> -__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> +__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		struct page *page, struct iomap *iomap)
>  {
>  	struct iomap_page *iop = iomap_page_create(inode, page);
> @@ -596,11 +600,14 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  		if (plen == 0)
>  			break;
>  
> -		if ((from <= poff || from >= poff + plen) &&
> +		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
> +		    (from <= poff || from >= poff + plen) &&
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
>  		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> +				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
>  			iomap_set_range_uptodate(page, poff, plen);
>  			continue;
> @@ -646,7 +653,8 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
> -		status = __iomap_write_begin(inode, pos, len, page, iomap);
> +		status = __iomap_write_begin(inode, pos, len, flags, page,
> +				iomap);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -869,22 +877,6 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
> -static struct page *
> -__iomap_read_page(struct inode *inode, loff_t offset)
> -{
> -	struct address_space *mapping = inode->i_mapping;
> -	struct page *page;
> -
> -	page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
> -	if (IS_ERR(page))
> -		return page;
> -	if (!PageUptodate(page)) {
> -		put_page(page);
> -		return ERR_PTR(-EIO);
> -	}
> -	return page;
> -}
> -
>  static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct iomap *iomap)
> @@ -900,24 +892,15 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		return length;
>  
>  	do {
> -		struct page *page, *rpage;
> -		unsigned long offset;	/* Offset into pagecache page */
> -		unsigned long bytes;	/* Bytes to write to page */
> -
> -		offset = offset_in_page(pos);
> -		bytes = min_t(loff_t, PAGE_SIZE - offset, length);
> -
> -		rpage = __iomap_read_page(inode, pos);
> -		if (IS_ERR(rpage))
> -			return PTR_ERR(rpage);
> +		unsigned long offset = offset_in_page(pos);
> +		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
> +		struct page *page;
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> -		put_page(rpage);
> +		status = iomap_write_begin(inode, pos, bytes,
> +				IOMAP_WRITE_F_UNSHARE, &page, iomap);
>  		if (unlikely(status))
>  			return status;
>  
> -		WARN_ON_ONCE(!PageUptodate(page));
> -
>  		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents
  2019-10-08  7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
@ 2019-10-08 15:20   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:17AM +0200, Christoph Hellwig wrote:
> Now that xfs_file_unshare is not completely dumb we can just call it
> directly without iterating the extent and reflink btrees ourselves.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_reflink.c | 103 +++----------------------------------------
>  1 file changed, 5 insertions(+), 98 deletions(-)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index a9634110c783..7fc728a8852b 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1381,85 +1381,6 @@ xfs_reflink_remap_prep(
>  	return ret;
>  }
>  
> -/*
> - * The user wants to preemptively CoW all shared blocks in this file,
> - * which enables us to turn off the reflink flag.  Iterate all
> - * extents which are not prealloc/delalloc to see which ranges are
> - * mentioned in the refcount tree, then read those blocks into the
> - * pagecache, dirty them, fsync them back out, and then we can update
> - * the inode flag.  What happens if we run out of memory? :)
> - */
> -STATIC int
> -xfs_reflink_dirty_extents(
> -	struct xfs_inode	*ip,
> -	xfs_fileoff_t		fbno,
> -	xfs_filblks_t		end,
> -	xfs_off_t		isize)
> -{
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_agnumber_t		agno;
> -	xfs_agblock_t		agbno;
> -	xfs_extlen_t		aglen;
> -	xfs_agblock_t		rbno;
> -	xfs_extlen_t		rlen;
> -	xfs_off_t		fpos;
> -	xfs_off_t		flen;
> -	struct xfs_bmbt_irec	map[2];
> -	int			nmaps;
> -	int			error = 0;
> -
> -	while (end - fbno > 0) {
> -		nmaps = 1;
> -		/*
> -		 * Look for extents in the file.  Skip holes, delalloc, or
> -		 * unwritten extents; they can't be reflinked.
> -		 */
> -		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
> -		if (error)
> -			goto out;
> -		if (nmaps == 0)
> -			break;
> -		if (!xfs_bmap_is_real_extent(&map[0]))
> -			goto next;
> -
> -		map[1] = map[0];
> -		while (map[1].br_blockcount) {
> -			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
> -			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
> -			aglen = map[1].br_blockcount;
> -
> -			error = xfs_reflink_find_shared(mp, NULL, agno, agbno,
> -					aglen, &rbno, &rlen, true);
> -			if (error)
> -				goto out;
> -			if (rbno == NULLAGBLOCK)
> -				break;
> -
> -			/* Dirty the pages */
> -			xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -			fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
> -					(rbno - agbno));
> -			flen = XFS_FSB_TO_B(mp, rlen);
> -			if (fpos + flen > isize)
> -				flen = isize - fpos;
> -			error = iomap_file_unshare(VFS_I(ip), fpos, flen,
> -					&xfs_iomap_ops);
> -			xfs_ilock(ip, XFS_ILOCK_EXCL);
> -			if (error)
> -				goto out;
> -
> -			map[1].br_blockcount -= (rbno - agbno + rlen);
> -			map[1].br_startoff += (rbno - agbno + rlen);
> -			map[1].br_startblock += (rbno - agbno + rlen);
> -		}
> -
> -next:
> -		fbno = map[0].br_startoff + map[0].br_blockcount;
> -	}
> -out:
> -	return error;
> -}
> -
>  /* Does this inode need the reflink flag? */
>  int
>  xfs_reflink_inode_has_shared_extents(
> @@ -1596,10 +1517,7 @@ xfs_reflink_unshare(
>  	xfs_off_t		offset,
>  	xfs_off_t		len)
>  {
> -	struct xfs_mount	*mp = ip->i_mount;
> -	xfs_fileoff_t		fbno;
> -	xfs_filblks_t		end;
> -	xfs_off_t		isize;
> +	struct inode		*inode = VFS_I(ip);
>  	int			error;
>  
>  	if (!xfs_is_reflink_inode(ip))
> @@ -1607,20 +1525,12 @@ xfs_reflink_unshare(
>  
>  	trace_xfs_reflink_unshare(ip, offset, len);
>  
> -	inode_dio_wait(VFS_I(ip));
> +	inode_dio_wait(inode);
>  
> -	/* Try to CoW the selected ranges */
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	fbno = XFS_B_TO_FSBT(mp, offset);
> -	isize = i_size_read(VFS_I(ip));
> -	end = XFS_B_TO_FSB(mp, offset + len);
> -	error = xfs_reflink_dirty_extents(ip, fbno, end, isize);
> +	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
>  	if (error)
> -		goto out_unlock;
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -
> -	/* Wait for the IO to finish */
> -	error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
> +		goto out;
> +	error = filemap_write_and_wait(inode->i_mapping);
>  	if (error)
>  		goto out;
>  
> @@ -1628,11 +1538,8 @@ xfs_reflink_unshare(
>  	error = xfs_reflink_try_clear_inode_flag(ip);
>  	if (error)
>  		goto out;
> -
>  	return 0;
>  
> -out_unlock:
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  out:
>  	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
>  	return error;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/20] xfs: fill out the srcmap in iomap_begin
  2019-10-08  7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
@ 2019-10-08 15:26   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:20AM +0200, Christoph Hellwig wrote:
> Replace our local hacks to report the source block in the main iomap
> with the proper scrmap reporting.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_iomap.c | 49 +++++++++++++++++++++++-----------------------
>  1 file changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 3292dfc8030a..ab2482a573d8 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -527,7 +527,8 @@ xfs_file_iomap_begin_delay(
>  	loff_t			offset,
>  	loff_t			count,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -721,11 +722,13 @@ xfs_file_iomap_begin_delay(
>  found_cow:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	if (imap.br_startoff <= offset_fsb) {
> -		/* ensure we only report blocks we have a reservation for */
> -		xfs_trim_extent(&imap, cmap.br_startoff, cmap.br_blockcount);
> -		return xfs_bmbt_to_iomap(ip, iomap, &imap, IOMAP_F_SHARED);
> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> +		if (error)
> +			return error;
> +	} else {
> +		xfs_trim_extent(&cmap, offset_fsb,
> +				imap.br_startoff - offset_fsb);
>  	}
> -	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
>  	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
>  
>  out_unlock:
> @@ -933,7 +936,7 @@ xfs_file_iomap_begin(
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_bmbt_irec	imap;
> +	struct xfs_bmbt_irec	imap, cmap;
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	int			nimaps = 1, error = 0;
>  	bool			shared = false;
> @@ -947,7 +950,7 @@ xfs_file_iomap_begin(
>  			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> -				iomap);
> +				iomap, srcmap);
>  	}
>  
>  	/*
> @@ -987,9 +990,6 @@ xfs_file_iomap_begin(
>  	 * been done up front, so we don't need to do them here.
>  	 */
>  	if (xfs_is_cow_inode(ip)) {
> -		struct xfs_bmbt_irec	cmap;
> -		bool			directio = (flags & IOMAP_DIRECT);
> -
>  		/* if zeroing doesn't need COW allocation, then we are done. */
>  		if ((flags & IOMAP_ZERO) &&
>  		    !needs_cow_for_zeroing(&imap, nimaps))
> @@ -997,23 +997,11 @@ xfs_file_iomap_begin(
>  
>  		/* may drop and re-acquire the ilock */
>  		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
> -				&lockmode, directio);
> +				&lockmode, flags & IOMAP_DIRECT);
>  		if (error)
>  			goto out_unlock;
> -
> -		/*
> -		 * For buffered writes we need to report the address of the
> -		 * previous block (if there was any) so that the higher level
> -		 * write code can perform read-modify-write operations; we
> -		 * won't need the CoW fork mapping until writeback.  For direct
> -		 * I/O, which must be block aligned, we need to report the
> -		 * newly allocated address.  If the data fork has a hole, copy
> -		 * the COW fork mapping to avoid allocating to the data fork.
> -		 */
> -		if (shared &&
> -		    (directio || imap.br_startblock == HOLESTARTBLOCK))
> -			imap = cmap;
> -
> +		if (shared)
> +			goto out_found_cow;
>  		end_fsb = imap.br_startoff + imap.br_blockcount;
>  		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
>  	}
> @@ -1067,6 +1055,17 @@ xfs_file_iomap_begin(
>  	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
>  	goto out_finish;
>  
> +out_found_cow:
> +	xfs_iunlock(ip, lockmode);
> +	length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
> +	trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
> +	if (imap.br_startblock != HOLESTARTBLOCK) {
> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, 0);
> +		if (error)
> +			return error;
> +	}
> +	return xfs_bmbt_to_iomap(ip, iomap, &cmap, IOMAP_F_SHARED);
> +
>  out_unlock:
>  	xfs_iunlock(ip, lockmode);
>  	return error;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 15/20] xfs: split out a new set of read-only iomap ops
  2019-10-08  7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
@ 2019-10-08 15:27   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:22AM +0200, Christoph Hellwig wrote:
> Start untangling xfs_file_iomap_begin by splitting out the read-only
> case into its own set of iomap_ops with a very simply iomap_begin
> helper.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks the same as last time...
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_aops.c  |  9 ++++---
>  fs/xfs/xfs_file.c  |  8 +++---
>  fs/xfs/xfs_iomap.c | 61 ++++++++++++++++++++++++++++++++++------------
>  fs/xfs/xfs_iomap.h |  1 +
>  fs/xfs/xfs_iops.c  |  2 +-
>  5 files changed, 58 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 807af5c3c347..f708a2831d2f 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -635,7 +635,7 @@ xfs_vm_bmap(
>  	 */
>  	if (xfs_is_cow_inode(ip) || XFS_IS_REALTIME_INODE(ip))
>  		return 0;
> -	return iomap_bmap(mapping, block, &xfs_iomap_ops);
> +	return iomap_bmap(mapping, block, &xfs_read_iomap_ops);
>  }
>  
>  STATIC int
> @@ -643,7 +643,7 @@ xfs_vm_readpage(
>  	struct file		*unused,
>  	struct page		*page)
>  {
> -	return iomap_readpage(page, &xfs_iomap_ops);
> +	return iomap_readpage(page, &xfs_read_iomap_ops);
>  }
>  
>  STATIC int
> @@ -653,7 +653,7 @@ xfs_vm_readpages(
>  	struct list_head	*pages,
>  	unsigned		nr_pages)
>  {
> -	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
> +	return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
>  }
>  
>  static int
> @@ -663,7 +663,8 @@ xfs_iomap_swapfile_activate(
>  	sector_t			*span)
>  {
>  	sis->bdev = xfs_find_bdev_for_inode(file_inode(swap_file));
> -	return iomap_swapfile_activate(sis, swap_file, span, &xfs_iomap_ops);
> +	return iomap_swapfile_activate(sis, swap_file, span,
> +			&xfs_read_iomap_ops);
>  }
>  
>  const struct address_space_operations xfs_address_space_operations = {
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1ffb179f35d2..f9814306ed8e 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -188,7 +188,7 @@ xfs_file_dio_aio_read(
>  	file_accessed(iocb->ki_filp);
>  
>  	xfs_ilock(ip, XFS_IOLOCK_SHARED);
> -	ret = iomap_dio_rw(iocb, to, &xfs_iomap_ops, NULL);
> +	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	return ret;
> @@ -215,7 +215,7 @@ xfs_file_dax_read(
>  		xfs_ilock(ip, XFS_IOLOCK_SHARED);
>  	}
>  
> -	ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
> +	ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops);
>  	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
>  
>  	file_accessed(iocb->ki_filp);
> @@ -1156,7 +1156,9 @@ __xfs_filemap_fault(
>  	if (IS_DAX(inode)) {
>  		pfn_t pfn;
>  
> -		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &xfs_iomap_ops);
> +		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
> +				(write_fault && !vmf->cow_page) ?
> +				 &xfs_iomap_ops : &xfs_read_iomap_ops);
>  		if (ret & VM_FAULT_NEEDDSYNC)
>  			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
>  	} else {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a9f9e8c9034a..0cfd973fd192 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -950,11 +950,13 @@ xfs_file_iomap_begin(
>  	u16			iomap_flags = 0;
>  	unsigned		lockmode;
>  
> +	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
> +
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> +	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
> +	    !xfs_get_extsz_hint(ip)) {
>  		/* Reserve delalloc blocks for regular writeback. */
>  		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
>  				iomap, srcmap);
> @@ -975,17 +977,6 @@ xfs_file_iomap_begin(
>  	if (error)
>  		goto out_unlock;
>  
> -	if (flags & IOMAP_REPORT) {
> -		/* Trim the mapping to the nearest shared extent boundary. */
> -		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> -		if (error)
> -			goto out_unlock;
> -	}
> -
> -	/* Non-modifying mapping requested, so we are done */
> -	if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
> -		goto out_found;
> -
>  	/*
>  	 * Break shared extents if necessary. Checks for non-blocking IO have
>  	 * been done up front, so we don't need to do them here.
> @@ -1046,8 +1037,6 @@ xfs_file_iomap_begin(
>  	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
>  
>  out_finish:
> -	if (shared)
> -		iomap_flags |= IOMAP_F_SHARED;
>  	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
>  
>  out_found:
> @@ -1150,6 +1139,48 @@ const struct iomap_ops xfs_iomap_ops = {
>  	.iomap_end		= xfs_file_iomap_end,
>  };
>  
> +static int
> +xfs_read_iomap_begin(
> +	struct inode		*inode,
> +	loff_t			offset,
> +	loff_t			length,
> +	unsigned		flags,
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
> +{
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	struct xfs_bmbt_irec	imap;
> +	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
> +	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
> +	int			nimaps = 1, error = 0;
> +	bool			shared = false;
> +	unsigned		lockmode;
> +
> +	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return -EIO;
> +
> +	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> +	if (error)
> +		return error;
> +	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
> +			       &nimaps, 0);
> +	if (!error && (flags & IOMAP_REPORT))
> +		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
> +	xfs_iunlock(ip, lockmode);
> +
> +	if (error)
> +		return error;
> +	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, shared ? IOMAP_F_SHARED : 0);
> +}
> +
> +const struct iomap_ops xfs_read_iomap_ops = {
> +	.iomap_begin		= xfs_read_iomap_begin,
> +};
> +
>  static int
>  xfs_seek_iomap_begin(
>  	struct inode		*inode,
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 71d0ae460c44..61b1fc3e5143 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -40,6 +40,7 @@ xfs_aligned_fsb_count(
>  }
>  
>  extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_read_iomap_ops;
>  extern const struct iomap_ops xfs_seek_iomap_ops;
>  extern const struct iomap_ops xfs_xattr_iomap_ops;
>  
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index fe285d123d69..9c448a54a951 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -1114,7 +1114,7 @@ xfs_vn_fiemap(
>  				&xfs_xattr_iomap_ops);
>  	} else {
>  		error = iomap_fiemap(inode, fieinfo, start, length,
> -				&xfs_iomap_ops);
> +				&xfs_read_iomap_ops);
>  	}
>  	xfs_iunlock(XFS_I(inode), XFS_IOLOCK_SHARED);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes
  2019-10-08  7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
@ 2019-10-08 15:28   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-08 15:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:24AM +0200, Christoph Hellwig wrote:
> Instead of lots of magic conditionals in the main write_begin
> handler this make the intent very clear.  Thing will become even
> better once we support delayed allocations for extent size hints
> and realtime allocations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks the same as last time...right?
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/xfs_bmap_util.c |  3 ++-
>  fs/xfs/xfs_file.c      | 16 ++++++-----
>  fs/xfs/xfs_iomap.c     | 61 +++++++++++++++---------------------------
>  fs/xfs/xfs_iomap.h     |  3 ++-
>  fs/xfs/xfs_iops.c      |  4 +--
>  fs/xfs/xfs_reflink.c   |  5 ++--
>  6 files changed, 40 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 0910cb75b65d..a6831b7bdc18 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1111,7 +1111,8 @@ xfs_free_file_space(
>  		return 0;
>  	if (offset + len > XFS_ISIZE(ip))
>  		len = XFS_ISIZE(ip) - offset;
> -	error = iomap_zero_range(VFS_I(ip), offset, len, NULL, &xfs_iomap_ops);
> +	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> +			&xfs_buffered_write_iomap_ops);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index f9814306ed8e..71ffab53a0fc 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -351,7 +351,7 @@ xfs_file_aio_write_checks(
>  	
>  		trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
>  		error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
> -				NULL, &xfs_iomap_ops);
> +				NULL, &xfs_buffered_write_iomap_ops);
>  		if (error)
>  			return error;
>  	} else
> @@ -547,7 +547,8 @@ xfs_file_dio_aio_write(
>  	}
>  
>  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> -	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, &xfs_dio_write_ops);
> +	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> +			   &xfs_dio_write_ops);
>  
>  	/*
>  	 * If unaligned, this is the only IO in-flight. If it has not yet
> @@ -594,7 +595,7 @@ xfs_file_dax_write(
>  	count = iov_iter_count(from);
>  
>  	trace_xfs_file_dax_write(ip, count, pos);
> -	ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
> +	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
>  	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
>  		i_size_write(inode, iocb->ki_pos);
>  		error = xfs_setfilesize(ip, pos, ret);
> @@ -641,7 +642,8 @@ xfs_file_buffered_aio_write(
>  	current->backing_dev_info = inode_to_bdi(inode);
>  
>  	trace_xfs_file_buffered_write(ip, iov_iter_count(from), iocb->ki_pos);
> -	ret = iomap_file_buffered_write(iocb, from, &xfs_iomap_ops);
> +	ret = iomap_file_buffered_write(iocb, from,
> +			&xfs_buffered_write_iomap_ops);
>  	if (likely(ret >= 0))
>  		iocb->ki_pos += ret;
>  
> @@ -1158,12 +1160,14 @@ __xfs_filemap_fault(
>  
>  		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
>  				(write_fault && !vmf->cow_page) ?
> -				 &xfs_iomap_ops : &xfs_read_iomap_ops);
> +				 &xfs_direct_write_iomap_ops :
> +				 &xfs_read_iomap_ops);
>  		if (ret & VM_FAULT_NEEDDSYNC)
>  			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
>  	} else {
>  		if (write_fault)
> -			ret = iomap_page_mkwrite(vmf, &xfs_iomap_ops);
> +			ret = iomap_page_mkwrite(vmf,
> +					&xfs_buffered_write_iomap_ops);
>  		else
>  			ret = filemap_fault(vmf);
>  	}
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a6a03b65c4e7..5a7499f88673 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -719,16 +719,7 @@ xfs_ilock_for_iomap(
>  }
>  
>  static int
> -xfs_file_iomap_begin_delay(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			count,
> -	unsigned		flags,
> -	struct iomap		*iomap,
> -	struct iomap		*srcmap);
> -
> -static int
> -xfs_file_iomap_begin(
> +xfs_direct_write_iomap_begin(
>  	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			length,
> @@ -751,13 +742,6 @@ xfs_file_iomap_begin(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if (!(flags & IOMAP_DIRECT) && !IS_DAX(inode) &&
> -	    !xfs_get_extsz_hint(ip)) {
> -		/* Reserve delalloc blocks for regular writeback. */
> -		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> -				iomap, srcmap);
> -	}
> -
>  	/*
>  	 * Lock the inode in the manner required for the specified operation and
>  	 * check for as many conditions that would result in blocking as
> @@ -857,8 +841,12 @@ xfs_file_iomap_begin(
>  	return error;
>  }
>  
> +const struct iomap_ops xfs_direct_write_iomap_ops = {
> +	.iomap_begin		= xfs_direct_write_iomap_begin,
> +};
> +
>  static int
> -xfs_file_iomap_begin_delay(
> +xfs_buffered_write_iomap_begin(
>  	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			count,
> @@ -877,8 +865,12 @@ xfs_file_iomap_begin_delay(
>  	int			whichfork = XFS_DATA_FORK;
>  	int			error = 0;
>  
> +	/* we can't use delayed allocations when using extent size hints */
> +	if (xfs_get_extsz_hint(ip))
> +		return xfs_direct_write_iomap_begin(inode, offset, count,
> +				flags, iomap, srcmap);
> +
>  	ASSERT(!XFS_IS_REALTIME_INODE(ip));
> -	ASSERT(!xfs_get_extsz_hint(ip));
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  
> @@ -1070,18 +1062,23 @@ xfs_file_iomap_begin_delay(
>  }
>  
>  static int
> -xfs_file_iomap_end_delalloc(
> -	struct xfs_inode	*ip,
> +xfs_buffered_write_iomap_end(
> +	struct inode		*inode,
>  	loff_t			offset,
>  	loff_t			length,
>  	ssize_t			written,
> +	unsigned		flags,
>  	struct iomap		*iomap)
>  {
> +	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		start_fsb;
>  	xfs_fileoff_t		end_fsb;
>  	int			error = 0;
>  
> +	if (iomap->type != IOMAP_DELALLOC)
> +		return 0;
> +
>  	/*
>  	 * Behave as if the write failed if drop writes is enabled. Set the NEW
>  	 * flag to force delalloc cleanup.
> @@ -1126,25 +1123,9 @@ xfs_file_iomap_end_delalloc(
>  	return 0;
>  }
>  
> -static int
> -xfs_file_iomap_end(
> -	struct inode		*inode,
> -	loff_t			offset,
> -	loff_t			length,
> -	ssize_t			written,
> -	unsigned		flags,
> -	struct iomap		*iomap)
> -{
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
> -	    iomap->type == IOMAP_DELALLOC)
> -		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> -				length, written, iomap);
> -	return 0;
> -}
> -
> -const struct iomap_ops xfs_iomap_ops = {
> -	.iomap_begin		= xfs_file_iomap_begin,
> -	.iomap_end		= xfs_file_iomap_end,
> +const struct iomap_ops xfs_buffered_write_iomap_ops = {
> +	.iomap_begin		= xfs_buffered_write_iomap_begin,
> +	.iomap_end		= xfs_buffered_write_iomap_end,
>  };
>  
>  static int
> diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
> index 61b1fc3e5143..7aed28275089 100644
> --- a/fs/xfs/xfs_iomap.h
> +++ b/fs/xfs/xfs_iomap.h
> @@ -39,7 +39,8 @@ xfs_aligned_fsb_count(
>  	return count_fsb;
>  }
>  
> -extern const struct iomap_ops xfs_iomap_ops;
> +extern const struct iomap_ops xfs_buffered_write_iomap_ops;
> +extern const struct iomap_ops xfs_direct_write_iomap_ops;
>  extern const struct iomap_ops xfs_read_iomap_ops;
>  extern const struct iomap_ops xfs_seek_iomap_ops;
>  extern const struct iomap_ops xfs_xattr_iomap_ops;
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 9c448a54a951..329a34af8e79 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -883,10 +883,10 @@ xfs_setattr_size(
>  	if (newsize > oldsize) {
>  		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
>  		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
> -				&did_zeroing, &xfs_iomap_ops);
> +				&did_zeroing, &xfs_buffered_write_iomap_ops);
>  	} else {
>  		error = iomap_truncate_page(inode, newsize, &did_zeroing,
> -				&xfs_iomap_ops);
> +				&xfs_buffered_write_iomap_ops);
>  	}
>  
>  	if (error)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 19a6e4644123..1e18b4024b82 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1270,7 +1270,7 @@ xfs_reflink_zero_posteof(
>  
>  	trace_xfs_zero_eof(ip, isize, pos - isize);
>  	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
> -			&xfs_iomap_ops);
> +			&xfs_buffered_write_iomap_ops);
>  }
>  
>  /*
> @@ -1527,7 +1527,8 @@ xfs_reflink_unshare(
>  
>  	inode_dio_wait(inode);
>  
> -	error = iomap_file_unshare(inode, offset, len, &xfs_iomap_ops);
> +	error = iomap_file_unshare(inode, offset, len,
> +			&xfs_buffered_write_iomap_ops);
>  	if (error)
>  		goto out;
>  	error = filemap_write_and_wait(inode->i_mapping);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-08 15:00   ` Darrick J. Wong
@ 2019-10-09  6:28     ` Christoph Hellwig
  2019-10-09 17:16       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2019-10-09  6:28 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 08:00:44AM -0700, Darrick J. Wong wrote:
> >  	unsigned long vaddr = vmf->address;
> >  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> >  	struct iomap iomap = { 0 };
> 
> Does this definition ^^^^^ need to be converted too?  You convert the
> one in iomap_apply()...

Doesn't strictly need to, but it sure would look nicer and fit the theme.

> 	/*
> 	 * The @iomap and @srcmap parameters should be set to a hole
> 	 * prior to calling ->iomap_begin.
> 	 */
> 	#define IOMAP_EMPTY_RECORD	{ .type = IOMAP_HOLE }
> 
> ...and later...
> 
> 	struct iomap srcmap = IOMAP_EMPTY_RECORD;
> 
> ..but meh, I'm not sure that adds much.

I don't really see the point.

> >  	unsigned flags = IOMAP_FAULT;
> >  	int error, major = 0;
> >  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  	 * the file system block size to be equal the page size, which means
> >  	 * that we never have to deal with more than a single extent here.
> >  	 */
> > -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> 
> ->iomap_begin callers are never supposed to touch srcmap, right?
> Maybe we ought to check that srcmap.io_type == HOLE, at least until
> someone fixes this code to dax-copy the data from srcmap to iomap?

What do you mean with touch?  ->iomap_begin fills it out and then the
caller looks at it, at least for places that can deal with
read-modify-write operations (DAX currently can't).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-09  6:28     ` Christoph Hellwig
@ 2019-10-09 17:16       ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-09 17:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Wed, Oct 09, 2019 at 08:28:24AM +0200, Christoph Hellwig wrote:
> On Tue, Oct 08, 2019 at 08:00:44AM -0700, Darrick J. Wong wrote:
> > >  	unsigned long vaddr = vmf->address;
> > >  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> > >  	struct iomap iomap = { 0 };
> > 
> > Does this definition ^^^^^ need to be converted too?  You convert the
> > one in iomap_apply()...
> 
> Doesn't strictly need to, but it sure would look nicer and fit the theme.
> 
> > 	/*
> > 	 * The @iomap and @srcmap parameters should be set to a hole
> > 	 * prior to calling ->iomap_begin.
> > 	 */
> > 	#define IOMAP_EMPTY_RECORD	{ .type = IOMAP_HOLE }
> > 
> > ...and later...
> > 
> > 	struct iomap srcmap = IOMAP_EMPTY_RECORD;
> > 
> > ..but meh, I'm not sure that adds much.
> 
> I don't really see the point.

Yeah.  Agreed.

> > >  	unsigned flags = IOMAP_FAULT;
> > >  	int error, major = 0;
> > >  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> > > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > >  	 * the file system block size to be equal the page size, which means
> > >  	 * that we never have to deal with more than a single extent here.
> > >  	 */
> > > -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > > +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> > 
> > ->iomap_begin callers are never supposed to touch srcmap, right?
> > Maybe we ought to check that srcmap.io_type == HOLE, at least until
> > someone fixes this code to dax-copy the data from srcmap to iomap?
> 
> What do you mean with touch?  ->iomap_begin fills it out and then the
> caller looks at it, at least for places that can deal with
> read-modify-write operations (DAX currently can't).

Yes, I grok that the DAX code should never get fed a shared mapping, but
maybe we ought to have a WARN_ON_ONCE just in case some filesystem AI
programmer decides to backport a fs patch that results in sending a
non-hole srcmap back to the dax iomap callers.  /We/ know that you
should never do this, but does the AI know? <grumble>

(Yeah, pure paranoia on my part :P)

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-08  7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
  2019-10-08 15:00   ` Darrick J. Wong
@ 2019-10-14 23:27   ` Darrick J. Wong
  2019-10-15 13:00     ` Goldwyn Rodrigues
  1 sibling, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-14 23:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Goldwyn Rodrigues, linux-xfs, linux-fsdevel

On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> The srcmap is used to identify where the read is to be performed from.
> It is passed to ->iomap_begin, which can fill it in if we need to read
> data for partially written blocks from a different location than the
> write target.  The srcmap is only supported for buffered writes so far.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Goldwyn,

Since we've reworked your original patch quite extensively, could you
please have a look at (and if you approve, add an Acked-by) this new(er)
version so we can get this series moving for 5.5?

--D

> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
>       srcmap if not set, adjust length down to srcmap end as well]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c               |  9 ++++--
>  fs/ext2/inode.c        |  2 +-
>  fs/ext4/inode.c        |  2 +-
>  fs/gfs2/bmap.c         |  3 +-
>  fs/iomap/apply.c       | 25 ++++++++++++----
>  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
>  fs/iomap/direct-io.c   |  2 +-
>  fs/iomap/fiemap.c      |  4 +--
>  fs/iomap/seek.c        |  4 +--
>  fs/iomap/swapfile.c    |  3 +-
>  fs/xfs/xfs_iomap.c     |  9 ++++--
>  include/linux/iomap.h  |  5 ++--
>  12 files changed, 80 insertions(+), 53 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 6bf81f931de3..920105457c2c 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
>  
>  static loff_t
>  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct block_device *bdev = iomap->bdev;
>  	struct dax_device *dax_dev = iomap->dax_dev;
> @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	unsigned long vaddr = vmf->address;
>  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	unsigned flags = IOMAP_FAULT;
>  	int error, major = 0;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * the file system block size to be equal the page size, which means
>  	 * that we never have to deal with more than a single extent here.
>  	 */
> -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
>  	if (iomap_errp)
>  		*iomap_errp = error;
>  	if (error) {
> @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	struct inode *inode = mapping->host;
>  	vm_fault_t result = VM_FAULT_FALLBACK;
>  	struct iomap iomap = { 0 };
> +	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	pgoff_t max_pgoff;
>  	void *entry;
>  	loff_t pos;
> @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  	 * to look up our filesystem block.
>  	 */
>  	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> +	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> +			&srcmap);
>  	if (error)
>  		goto unlock_entry;
>  
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 7004ce581a32..467c13ff6b40 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
>  
>  #ifdef CONFIG_FS_DAX
>  static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -		unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	unsigned int blkbits = inode->i_blkbits;
>  	unsigned long first_block = offset >> blkbits;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..abaaf7d96ca4 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>  }
>  
>  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	unsigned int blkbits = inode->i_blkbits;
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index f63df54a08c6..516103248272 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
>  }
>  
>  static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> -			    unsigned flags, struct iomap *iomap)
> +			    unsigned flags, struct iomap *iomap,
> +			    struct iomap *srcmap)
>  {
>  	struct gfs2_inode *ip = GFS2_I(inode);
>  	struct metapath mp = { .mp_aheight = 1, };
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd..484dd8eda861 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -23,8 +23,10 @@ loff_t
>  iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
>  {
> -	struct iomap iomap = { 0 };
> +	struct iomap iomap = { .type = IOMAP_HOLE };
> +	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	loff_t written = 0, ret;
> +	u64 end;
>  
>  	/*
>  	 * Need to map a range from start position for length bytes. This can
> @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * expose transient stale data. If the reserve fails, we can safely
>  	 * back out at this point as there is nothing to undo.
>  	 */
> -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
>  	if (ret)
>  		return ret;
>  	if (WARN_ON(iomap.offset > pos))
> @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>  	 * Cut down the length to the one actually provided by the filesystem,
>  	 * as it might not be able to give us the whole size that we requested.
>  	 */
> -	if (iomap.offset + iomap.length < pos + length)
> -		length = iomap.offset + iomap.length - pos;
> +	end = iomap.offset + iomap.length;
> +	if (srcmap.type != IOMAP_HOLE)
> +		end = min(end, srcmap.offset + srcmap.length);
> +	if (pos + length > end)
> +		length = end - pos;
>  
>  	/*
> -	 * Now that we have guaranteed that the space allocation will succeed.
> +	 * Now that we have guaranteed that the space allocation will succeed,
>  	 * we can do the copy-in page by page without having to worry about
>  	 * failures exposing transient data.
> +	 *
> +	 * To support COW operations, we read in data for partially blocks from
> +	 * the srcmap if the file system filled it in.  In that case we the
> +	 * length needs to be limited to the earlier of the ends of the iomaps.
> +	 * If the file system did not provide a srcmap we pass in the normal
> +	 * iomap into the actors so that they don't need to have special
> +	 * handling for the two cases.
>  	 */
> -	written = actor(inode, pos, length, data, &iomap);
> +	written = actor(inode, pos, length, data, &iomap,
> +			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
>  
>  	/*
>  	 * Now the data has been copied, commit the range we've copied.  This
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ac1bbed71a9b..eb2c6d73a837 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
>  
>  static loff_t
>  iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	struct page *page = ctx->cur_page;
> @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
>  
>  static loff_t
>  iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_readpage_ctx *ctx = data;
>  	loff_t done, ret;
> @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
>  			ctx->cur_page_in_bio = false;
>  		}
>  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> -				ctx, iomap);
> +				ctx, iomap, srcmap);
>  	}
>  
>  	return done;
> @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  
>  static int
>  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> -		struct page *page, struct iomap *iomap)
> +		struct page *page, struct iomap *srcmap)
>  {
>  	struct iomap_page *iop = iomap_page_create(inode, page);
>  	loff_t block_size = i_blocksize(inode);
> @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
> -		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> +		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
>  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
>  				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
> @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		}
>  
>  		status = iomap_read_page_sync(block_start, page, poff, plen,
> -				iomap);
> +				srcmap);
>  		if (status)
>  			return status;
>  	} while ((block_start += plen) < block_end);
> @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  
>  static int
>  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> -		struct page **pagep, struct iomap *iomap)
> +		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	struct page *page;
>  	int status = 0;
>  
>  	BUG_ON(pos + len > iomap->offset + iomap->length);
> +	if (srcmap != iomap)
> +		BUG_ON(pos + len > srcmap->offset + srcmap->length);
>  
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		goto out_no_page;
>  	}
>  
> -	if (iomap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, iomap);
> +	if (srcmap->type == IOMAP_INLINE)
> +		iomap_read_inline_data(inode, page, srcmap);
>  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> -		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, flags, page,
> -				iomap);
> +				srcmap);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  }
>  
>  static int
> -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> -		unsigned copied, struct page *page, struct iomap *iomap)
> +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> +		struct page *page, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	const struct iomap_page_ops *page_ops = iomap->page_ops;
>  	loff_t old_size = inode->i_size;
>  	int ret;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> +	if (srcmap->type == IOMAP_INLINE) {
>  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> -	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> +	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
>  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
>  				page, NULL);
>  	} else {
> @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>  
>  static loff_t
>  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iov_iter *i = data;
>  	long status = 0;
> @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			break;
>  		}
>  
> -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> +				srcmap);
>  		if (unlikely(status))
>  			break;
>  
> @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  
>  		flush_dcache_page(page);
>  
> -		status = iomap_write_end(inode, pos, bytes, copied, page,
> -				iomap);
> +		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> +				srcmap);
>  		if (unlikely(status < 0))
>  			break;
>  		copied = status;
> @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
>  static loff_t
>  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	long status = 0;
>  	ssize_t written = 0;
> @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	if (!(iomap->flags & IOMAP_F_SHARED))
>  		return length;
>  	/* don't bother with holes or unwritten extents */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return length;
>  
>  	do {
> @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		struct page *page;
>  
>  		status = iomap_write_begin(inode, pos, bytes,
> -				IOMAP_WRITE_F_UNSHARE, &page, iomap);
> +				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
>  		if (unlikely(status))
>  			return status;
>  
> -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> +				srcmap);
>  		if (unlikely(status <= 0)) {
>  			if (WARN_ON_ONCE(status == 0))
>  				return -EIO;
> @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
>  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> -		unsigned bytes, struct iomap *iomap)
> +		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page;
>  	int status;
>  
> -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
>  	if (status)
>  		return status;
>  
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
>  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
>  
>  static loff_t
>  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	bool *did_zero = data;
>  	loff_t written = 0;
>  	int status;
>  
>  	/* already zeroed?  we're done. */
> -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
>  		return count;
>  
>  	do {
> @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
>  		if (IS_DAX(inode))
>  			status = iomap_dax_zero(pos, offset, bytes, iomap);
>  		else
> -			status = iomap_zero(inode, pos, offset, bytes, iomap);
> +			status = iomap_zero(inode, pos, offset, bytes, iomap,
> +					srcmap);
>  		if (status < 0)
>  			return status;
>  
> @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
>  
>  static loff_t
>  iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct page *page = data;
>  	int ret;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 1fc28c2da279..e3ccbf7daaae 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  
>  static loff_t
>  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct iomap_dio *dio = data;
>  
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index f26fdd36e383..690ef2d7c6c8 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>  
>  static loff_t
>  iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap)
> +		struct iomap *iomap, struct iomap *srcmap)
>  {
>  	struct fiemap_ctx *ctx = data;
>  	loff_t ret = length;
> @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
>  static loff_t
>  iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap)
> +		void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	sector_t *bno = data, addr;
>  
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index c04bad4b2b43..89f61d93c0bc 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
>  
>  static loff_t
>  iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_UNWRITTEN:
> @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
>  static loff_t
>  iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> -		      void *data, struct iomap *iomap)
> +		      void *data, struct iomap *iomap, struct iomap *srcmap)
>  {
>  	switch (iomap->type) {
>  	case IOMAP_HOLE:
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 152a230f668d..a648dbf6991e 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>   * distinction between written and unwritten extents.
>   */
>  static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> -		loff_t count, void *data, struct iomap *iomap)
> +		loff_t count, void *data, struct iomap *iomap,
> +		struct iomap *srcmap)
>  {
>  	struct iomap_swapfile_info *isi = data;
>  	int error;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c0a492353826..016adcd7dd66 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
>  	loff_t			offset,
>  	loff_t			length,
>  	unsigned		flags,
> -	struct iomap		*iomap)
> +	struct iomap		*iomap,
> +	struct iomap		*srcmap)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 24c784e44274..37af5f9dc722 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -127,7 +127,8 @@ struct iomap_ops {
>  	 * The actual length is returned in iomap->length.
>  	 */
>  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> -			unsigned flags, struct iomap *iomap);
> +			unsigned flags, struct iomap *iomap,
> +			struct iomap *srcmap);
>  
>  	/*
>  	 * Commit and/or unreserve space previous allocated using iomap_begin.
> @@ -143,7 +144,7 @@ struct iomap_ops {
>   * Main iomap iterator function.
>   */
>  typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> -		void *data, struct iomap *iomap);
> +		void *data, struct iomap *iomap, struct iomap *srcmap);
>  
>  loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
>  		unsigned flags, const struct iomap_ops *ops, void *data,
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O
  2019-10-14 23:27   ` Darrick J. Wong
@ 2019-10-15 13:00     ` Goldwyn Rodrigues
  0 siblings, 0 replies; 32+ messages in thread
From: Goldwyn Rodrigues @ 2019-10-15 13:00 UTC (permalink / raw)
  To:  Darrick J. Wong ; +Cc: Christoph Hellwig, linux-fsdevel, linux-xfs

On 16:27 14/10,  Darrick J. Wong  wrote:
> On Tue, Oct 08, 2019 at 09:15:15AM +0200, Christoph Hellwig wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > The srcmap is used to identify where the read is to be performed from.
> > It is passed to ->iomap_begin, which can fill it in if we need to read
> > data for partially written blocks from a different location than the
> > write target.  The srcmap is only supported for buffered writes so far.
> > 
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Goldwyn,
> 
> Since we've reworked your original patch quite extensively, could you
> please have a look at (and if you approve, add an Acked-by) this new(er)
> version so we can get this series moving for 5.5?
> 
> 

Sorry, was on vacation until yesterday.
Yes, I have used this version in my series and I can confirm it works
correctly with btrfs. Thanks!

> > [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
> >       srcmap if not set, adjust length down to srcmap end as well]

Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/dax.c               |  9 ++++--
> >  fs/ext2/inode.c        |  2 +-
> >  fs/ext4/inode.c        |  2 +-
> >  fs/gfs2/bmap.c         |  3 +-
> >  fs/iomap/apply.c       | 25 ++++++++++++----
> >  fs/iomap/buffered-io.c | 65 +++++++++++++++++++++++-------------------
> >  fs/iomap/direct-io.c   |  2 +-
> >  fs/iomap/fiemap.c      |  4 +--
> >  fs/iomap/seek.c        |  4 +--
> >  fs/iomap/swapfile.c    |  3 +-
> >  fs/xfs/xfs_iomap.c     |  9 ++++--
> >  include/linux/iomap.h  |  5 ++--
> >  12 files changed, 80 insertions(+), 53 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 6bf81f931de3..920105457c2c 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range);
> >  
> >  static loff_t
> >  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct block_device *bdev = iomap->bdev;
> >  	struct dax_device *dax_dev = iomap->dax_dev;
> > @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  	unsigned long vaddr = vmf->address;
> >  	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> >  	struct iomap iomap = { 0 };
> > +	struct iomap srcmap = { .type = IOMAP_HOLE };
> >  	unsigned flags = IOMAP_FAULT;
> >  	int error, major = 0;
> >  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> > @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  	 * the file system block size to be equal the page size, which means
> >  	 * that we never have to deal with more than a single extent here.
> >  	 */
> > -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
> > +	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> >  	if (iomap_errp)
> >  		*iomap_errp = error;
> >  	if (error) {
> > @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  	struct inode *inode = mapping->host;
> >  	vm_fault_t result = VM_FAULT_FALLBACK;
> >  	struct iomap iomap = { 0 };
> > +	struct iomap srcmap = { .type = IOMAP_HOLE };
> >  	pgoff_t max_pgoff;
> >  	void *entry;
> >  	loff_t pos;
> > @@ -1546,7 +1548,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
> >  	 * to look up our filesystem block.
> >  	 */
> >  	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> > -	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap);
> > +	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
> > +			&srcmap);
> >  	if (error)
> >  		goto unlock_entry;
> >  
> > diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> > index 7004ce581a32..467c13ff6b40 100644
> > --- a/fs/ext2/inode.c
> > +++ b/fs/ext2/inode.c
> > @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock,
> >  
> >  #ifdef CONFIG_FS_DAX
> >  static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > -		unsigned flags, struct iomap *iomap)
> > +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	unsigned int blkbits = inode->i_blkbits;
> >  	unsigned long first_block = offset >> blkbits;
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 516faa280ced..abaaf7d96ca4 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3407,7 +3407,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> >  }
> >  
> >  static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> > -			    unsigned flags, struct iomap *iomap)
> > +		unsigned flags, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> >  	unsigned int blkbits = inode->i_blkbits;
> > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > index f63df54a08c6..516103248272 100644
> > --- a/fs/gfs2/bmap.c
> > +++ b/fs/gfs2/bmap.c
> > @@ -1149,7 +1149,8 @@ static inline bool gfs2_iomap_need_write_lock(unsigned flags)
> >  }
> >  
> >  static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
> > -			    unsigned flags, struct iomap *iomap)
> > +			    unsigned flags, struct iomap *iomap,
> > +			    struct iomap *srcmap)
> >  {
> >  	struct gfs2_inode *ip = GFS2_I(inode);
> >  	struct metapath mp = { .mp_aheight = 1, };
> > diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> > index 54c02aecf3cd..484dd8eda861 100644
> > --- a/fs/iomap/apply.c
> > +++ b/fs/iomap/apply.c
> > @@ -23,8 +23,10 @@ loff_t
> >  iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> >  {
> > -	struct iomap iomap = { 0 };
> > +	struct iomap iomap = { .type = IOMAP_HOLE };
> > +	struct iomap srcmap = { .type = IOMAP_HOLE };
> >  	loff_t written = 0, ret;
> > +	u64 end;
> >  
> >  	/*
> >  	 * Need to map a range from start position for length bytes. This can
> > @@ -38,7 +40,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  	 * expose transient stale data. If the reserve fails, we can safely
> >  	 * back out at this point as there is nothing to undo.
> >  	 */
> > -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
> > +	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> >  	if (ret)
> >  		return ret;
> >  	if (WARN_ON(iomap.offset > pos))
> > @@ -50,15 +52,26 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> >  	 * Cut down the length to the one actually provided by the filesystem,
> >  	 * as it might not be able to give us the whole size that we requested.
> >  	 */
> > -	if (iomap.offset + iomap.length < pos + length)
> > -		length = iomap.offset + iomap.length - pos;
> > +	end = iomap.offset + iomap.length;
> > +	if (srcmap.type != IOMAP_HOLE)
> > +		end = min(end, srcmap.offset + srcmap.length);
> > +	if (pos + length > end)
> > +		length = end - pos;
> >  
> >  	/*
> > -	 * Now that we have guaranteed that the space allocation will succeed.
> > +	 * Now that we have guaranteed that the space allocation will succeed,
> >  	 * we can do the copy-in page by page without having to worry about
> >  	 * failures exposing transient data.
> > +	 *
> > +	 * To support COW operations, we read in data for partially blocks from
> > +	 * the srcmap if the file system filled it in.  In that case we the
> > +	 * length needs to be limited to the earlier of the ends of the iomaps.
> > +	 * If the file system did not provide a srcmap we pass in the normal
> > +	 * iomap into the actors so that they don't need to have special
> > +	 * handling for the two cases.
> >  	 */
> > -	written = actor(inode, pos, length, data, &iomap);
> > +	written = actor(inode, pos, length, data, &iomap,
> > +			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
> >  
> >  	/*
> >  	 * Now the data has been copied, commit the range we've copied.  This
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ac1bbed71a9b..eb2c6d73a837 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -234,7 +234,7 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
> >  
> >  static loff_t
> >  iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_readpage_ctx *ctx = data;
> >  	struct page *page = ctx->cur_page;
> > @@ -382,7 +382,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
> >  
> >  static loff_t
> >  iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_readpage_ctx *ctx = data;
> >  	loff_t done, ret;
> > @@ -402,7 +402,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
> >  			ctx->cur_page_in_bio = false;
> >  		}
> >  		ret = iomap_readpage_actor(inode, pos + done, length - done,
> > -				ctx, iomap);
> > +				ctx, iomap, srcmap);
> >  	}
> >  
> >  	return done;
> > @@ -582,7 +582,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> >  
> >  static int
> >  __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> > -		struct page *page, struct iomap *iomap)
> > +		struct page *page, struct iomap *srcmap)
> >  {
> >  	struct iomap_page *iop = iomap_page_create(inode, page);
> >  	loff_t block_size = i_blocksize(inode);
> > @@ -605,7 +605,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  		    (to <= poff || to >= poff + plen))
> >  			continue;
> >  
> > -		if (iomap_block_needs_zeroing(inode, iomap, block_start)) {
> > +		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> >  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> >  				return -EIO;
> >  			zero_user_segments(page, poff, from, to, poff + plen);
> > @@ -614,7 +614,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  		}
> >  
> >  		status = iomap_read_page_sync(block_start, page, poff, plen,
> > -				iomap);
> > +				srcmap);
> >  		if (status)
> >  			return status;
> >  	} while ((block_start += plen) < block_end);
> > @@ -624,13 +624,15 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> >  
> >  static int
> >  iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > -		struct page **pagep, struct iomap *iomap)
> > +		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> >  	struct page *page;
> >  	int status = 0;
> >  
> >  	BUG_ON(pos + len > iomap->offset + iomap->length);
> > +	if (srcmap != iomap)
> > +		BUG_ON(pos + len > srcmap->offset + srcmap->length);
> >  
> >  	if (fatal_signal_pending(current))
> >  		return -EINTR;
> > @@ -648,13 +650,13 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >  		goto out_no_page;
> >  	}
> >  
> > -	if (iomap->type == IOMAP_INLINE)
> > -		iomap_read_inline_data(inode, page, iomap);
> > +	if (srcmap->type == IOMAP_INLINE)
> > +		iomap_read_inline_data(inode, page, srcmap);
> >  	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> > -		status = __block_write_begin_int(page, pos, len, NULL, iomap);
> > +		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
> >  	else
> >  		status = __iomap_write_begin(inode, pos, len, flags, page,
> > -				iomap);
> > +				srcmap);
> >  
> >  	if (unlikely(status))
> >  		goto out_unlock;
> > @@ -740,16 +742,16 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> >  }
> >  
> >  static int
> > -iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> > -		unsigned copied, struct page *page, struct iomap *iomap)
> > +iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied,
> > +		struct page *page, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	const struct iomap_page_ops *page_ops = iomap->page_ops;
> >  	loff_t old_size = inode->i_size;
> >  	int ret;
> >  
> > -	if (iomap->type == IOMAP_INLINE) {
> > +	if (srcmap->type == IOMAP_INLINE) {
> >  		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> > -	} else if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> > +	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> >  		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> >  				page, NULL);
> >  	} else {
> > @@ -780,7 +782,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> >  
> >  static loff_t
> >  iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iov_iter *i = data;
> >  	long status = 0;
> > @@ -814,7 +816,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  			break;
> >  		}
> >  
> > -		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > +		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
> > +				srcmap);
> >  		if (unlikely(status))
> >  			break;
> >  
> > @@ -825,8 +828,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  
> >  		flush_dcache_page(page);
> >  
> > -		status = iomap_write_end(inode, pos, bytes, copied, page,
> > -				iomap);
> > +		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> > +				srcmap);
> >  		if (unlikely(status < 0))
> >  			break;
> >  		copied = status;
> > @@ -879,7 +882,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> >  
> >  static loff_t
> >  iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	long status = 0;
> >  	ssize_t written = 0;
> > @@ -888,7 +891,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  	if (!(iomap->flags & IOMAP_F_SHARED))
> >  		return length;
> >  	/* don't bother with holes or unwritten extents */
> > -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >  		return length;
> >  
> >  	do {
> > @@ -897,11 +900,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		struct page *page;
> >  
> >  		status = iomap_write_begin(inode, pos, bytes,
> > -				IOMAP_WRITE_F_UNSHARE, &page, iomap);
> > +				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> >  		if (unlikely(status))
> >  			return status;
> >  
> > -		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > +		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> > +				srcmap);
> >  		if (unlikely(status <= 0)) {
> >  			if (WARN_ON_ONCE(status == 0))
> >  				return -EIO;
> > @@ -940,19 +944,19 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
> >  EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >  
> >  static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> > -		unsigned bytes, struct iomap *iomap)
> > +		unsigned bytes, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct page *page;
> >  	int status;
> >  
> > -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap);
> > +	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> >  	if (status)
> >  		return status;
> >  
> >  	zero_user(page, offset, bytes);
> >  	mark_page_accessed(page);
> >  
> > -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> > +	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> >  }
> >  
> >  static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> > @@ -964,14 +968,14 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> >  
> >  static loff_t
> >  iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	bool *did_zero = data;
> >  	loff_t written = 0;
> >  	int status;
> >  
> >  	/* already zeroed?  we're done. */
> > -	if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
> > +	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> >  		return count;
> >  
> >  	do {
> > @@ -983,7 +987,8 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
> >  		if (IS_DAX(inode))
> >  			status = iomap_dax_zero(pos, offset, bytes, iomap);
> >  		else
> > -			status = iomap_zero(inode, pos, offset, bytes, iomap);
> > +			status = iomap_zero(inode, pos, offset, bytes, iomap,
> > +					srcmap);
> >  		if (status < 0)
> >  			return status;
> >  
> > @@ -1033,7 +1038,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page);
> >  
> >  static loff_t
> >  iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct page *page = data;
> >  	int ret;
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 1fc28c2da279..e3ccbf7daaae 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
> >  
> >  static loff_t
> >  iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct iomap_dio *dio = data;
> >  
> > diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> > index f26fdd36e383..690ef2d7c6c8 100644
> > --- a/fs/iomap/fiemap.c
> > +++ b/fs/iomap/fiemap.c
> > @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
> >  
> >  static loff_t
> >  iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> > -		struct iomap *iomap)
> > +		struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	struct fiemap_ctx *ctx = data;
> >  	loff_t ret = length;
> > @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap);
> >  
> >  static loff_t
> >  iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> > -		void *data, struct iomap *iomap)
> > +		void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	sector_t *bno = data, addr;
> >  
> > diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> > index c04bad4b2b43..89f61d93c0bc 100644
> > --- a/fs/iomap/seek.c
> > +++ b/fs/iomap/seek.c
> > @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
> >  
> >  static loff_t
> >  iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length,
> > -		      void *data, struct iomap *iomap)
> > +		      void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	switch (iomap->type) {
> >  	case IOMAP_UNWRITTEN:
> > @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole);
> >  
> >  static loff_t
> >  iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length,
> > -		      void *data, struct iomap *iomap)
> > +		      void *data, struct iomap *iomap, struct iomap *srcmap)
> >  {
> >  	switch (iomap->type) {
> >  	case IOMAP_HOLE:
> > diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> > index 152a230f668d..a648dbf6991e 100644
> > --- a/fs/iomap/swapfile.c
> > +++ b/fs/iomap/swapfile.c
> > @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
> >   * distinction between written and unwritten extents.
> >   */
> >  static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> > -		loff_t count, void *data, struct iomap *iomap)
> > +		loff_t count, void *data, struct iomap *iomap,
> > +		struct iomap *srcmap)
> >  {
> >  	struct iomap_swapfile_info *isi = data;
> >  	int error;
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index c0a492353826..016adcd7dd66 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -928,7 +928,8 @@ xfs_file_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > @@ -1154,7 +1155,8 @@ xfs_seek_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > @@ -1240,7 +1242,8 @@ xfs_xattr_iomap_begin(
> >  	loff_t			offset,
> >  	loff_t			length,
> >  	unsigned		flags,
> > -	struct iomap		*iomap)
> > +	struct iomap		*iomap,
> > +	struct iomap		*srcmap)
> >  {
> >  	struct xfs_inode	*ip = XFS_I(inode);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 24c784e44274..37af5f9dc722 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -127,7 +127,8 @@ struct iomap_ops {
> >  	 * The actual length is returned in iomap->length.
> >  	 */
> >  	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
> > -			unsigned flags, struct iomap *iomap);
> > +			unsigned flags, struct iomap *iomap,
> > +			struct iomap *srcmap);
> >  
> >  	/*
> >  	 * Commit and/or unreserve space previous allocated using iomap_begin.
> > @@ -143,7 +144,7 @@ struct iomap_ops {
> >   * Main iomap iterator function.
> >   */
> >  typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> > -		void *data, struct iomap *iomap);
> > +		void *data, struct iomap *iomap, struct iomap *srcmap);
> >  
> >  loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> >  		unsigned flags, const struct iomap_ops *ops, void *data,
> > -- 
> > 2.20.1
> > 
> 

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-10-15 13:01 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08  7:15 iomap and xfs COW cleanups v2 Christoph Hellwig
2019-10-08  7:15 ` [PATCH 01/20] iomap: better document the IOMAP_F_* flags Christoph Hellwig
2019-10-08  7:15 ` [PATCH 02/20] iomap: remove the unused iomap argument to __iomap_write_end Christoph Hellwig
2019-10-08  7:15 ` [PATCH 03/20] iomap: always use AOP_FLAG_NOFS in iomap_write_begin Christoph Hellwig
2019-10-08  7:15 ` [PATCH 04/20] iomap: ignore non-shared or non-data blocks in xfs_file_dirty Christoph Hellwig
2019-10-08  7:15 ` [PATCH 05/20] iomap: move the zeroing case out of iomap_read_page_sync Christoph Hellwig
2019-10-08  7:15 ` [PATCH 06/20] iomap: use write_begin to read pages to unshare Christoph Hellwig
2019-10-08 15:12   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 07/20] iomap: renumber IOMAP_HOLE to 0 Christoph Hellwig
2019-10-08 15:01   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 08/20] iomap: use a srcmap for a read-modify-write I/O Christoph Hellwig
2019-10-08 15:00   ` Darrick J. Wong
2019-10-09  6:28     ` Christoph Hellwig
2019-10-09 17:16       ` Darrick J. Wong
2019-10-14 23:27   ` Darrick J. Wong
2019-10-15 13:00     ` Goldwyn Rodrigues
2019-10-08  7:15 ` [PATCH 09/20] xfs: also call xfs_file_iomap_end_delalloc for zeroing operations Christoph Hellwig
2019-10-08  7:15 ` [PATCH 10/20] xfs: remove xfs_reflink_dirty_extents Christoph Hellwig
2019-10-08 15:20   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 11/20] xfs: pass two imaps to xfs_reflink_allocate_cow Christoph Hellwig
2019-10-08  7:15 ` [PATCH 12/20] xfs: refactor xfs_file_iomap_begin_delay Christoph Hellwig
2019-10-08  7:15 ` [PATCH 13/20] xfs: fill out the srcmap in iomap_begin Christoph Hellwig
2019-10-08 15:26   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 14/20] xfs: factor out a helper to calculate the end_fsb Christoph Hellwig
2019-10-08  7:15 ` [PATCH 15/20] xfs: split out a new set of read-only iomap ops Christoph Hellwig
2019-10-08 15:27   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 16/20] xfs: move xfs_file_iomap_begin_delay around Christoph Hellwig
2019-10-08  7:15 ` [PATCH 17/20] xfs: split the iomap ops for buffered vs direct writes Christoph Hellwig
2019-10-08 15:28   ` Darrick J. Wong
2019-10-08  7:15 ` [PATCH 18/20] xfs: rename the whichfork variable in xfs_buffered_write_iomap_begin Christoph Hellwig
2019-10-08  7:15 ` [PATCH 19/20] xfs: cleanup xfs_iomap_write_unwritten Christoph Hellwig
2019-10-08  7:15 ` [PATCH 20/20] xfs: improve the IOMAP_NOWAIT check for COW inodes Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).