* RFC: switch iomap to an iterator model
@ 2021-07-19 10:34 Christoph Hellwig
  2021-07-19 10:34 ` [PATCH 01/27] iomap: fix a trivial comment typo in trace.h Christoph Hellwig
                   ` (28 more replies)
  0 siblings, 29 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Hi all,

this series replaces the existing callback-based iomap_apply with an
iterator-based model.  The prime aim here is to simplify the DAX reflink
support, which requires iterating through two inodes, something that is
rather painful with the apply model.  It also helps to kill an indirect
call per segment as-is.  Compared to the earlier patchset from Matthew
Wilcox that this series is based upon it does not eliminate all indirect
calls, but on the upside it does not change the file systems at all
(except for the btrfs and gfs2 hooks which have slight prototype changes).
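
As a rough user-space illustration of the difference in shape between the
two models (this is not code from the series; struct extent, apply_range(),
count_actor(), struct range_iter and next_extent() are hypothetical names,
and the fixed extent table stands in for whatever a file system would
report):

```c
#include <stddef.h>

/* Hypothetical stand-in for a mapped extent; not the kernel's struct iomap. */
struct extent {
	long long offset;
	long long length;
};

static const struct extent extents[] = {
	{ 0, 4096 }, { 4096, 8192 }, { 12288, 4096 },
};
#define NR_EXTENTS (sizeof(extents) / sizeof(extents[0]))

/* Callback style: the helper owns the loop and calls the actor indirectly. */
typedef long long (*actor_t)(const struct extent *ext, void *data);

static long long apply_range(actor_t actor, void *data)
{
	long long total = 0;
	size_t i;

	for (i = 0; i < NR_EXTENTS; i++)
		total += actor(&extents[i], data);	/* one indirect call per extent */
	return total;
}

/* Example actor: counts extents through the untyped data pointer. */
static long long count_actor(const struct extent *ext, void *data)
{
	int *nr = data;

	(*nr)++;
	return ext->length;
}

/* Iterator style: the caller owns the loop and the body is plain inline code. */
struct range_iter {
	size_t next;			/* index of the next extent to hand out */
	const struct extent *cur;	/* extent for the current iteration */
};

/* Returns 1 while there is another extent to process, 0 when done. */
static int next_extent(struct range_iter *iter)
{
	if (iter->next >= NR_EXTENTS)
		return 0;
	iter->cur = &extents[iter->next++];
	return 1;
}
```

With the iterator, a caller writes "while (next_extent(&iter)) total +=
iter.cur->length;" and keeps its state in ordinary local variables, where
the callback version has to pass it through a void pointer.  Nothing in the
iterator form ties the loop to a single object either, which is the
property that walking two inodes in lockstep would need.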

This passes basic testing on XFS for block based file systems.  The DAX
changes are entirely untested as I haven't managed to get pmem to work in
recent qemu.

Diffstat:
 b/fs/btrfs/inode.c       |    5 
 b/fs/buffer.c            |    4 
 b/fs/dax.c               |  578 ++++++++++++++++++++++-------------------------
 b/fs/gfs2/bmap.c         |    5 
 b/fs/internal.h          |    4 
 b/fs/iomap/Makefile      |    2 
 b/fs/iomap/buffered-io.c |  344 +++++++++++++--------------
 b/fs/iomap/direct-io.c   |  162 ++++++-------
 b/fs/iomap/fiemap.c      |  101 +++-----
 b/fs/iomap/iter.c        |   74 ++++++
 b/fs/iomap/seek.c        |   88 +++----
 b/fs/iomap/swapfile.c    |   38 +--
 b/fs/iomap/trace.h       |   35 +-
 b/include/linux/iomap.h  |   73 ++++-
 fs/iomap/apply.c         |   99 --------
 15 files changed, 777 insertions(+), 835 deletions(-)


* [PATCH 01/27] iomap: fix a trivial comment typo in trace.h
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 16:00   ` Darrick J. Wong
  2021-07-19 10:34 ` [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done} Christoph Hellwig
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index fdc7ae388476f5..e9cd5cc0d6ba40 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -2,7 +2,7 @@
 /*
  * Copyright (c) 2009-2019 Christoph Hellwig
  *
- * NOTE: none of these tracepoints shall be consider a stable kernel ABI
+ * NOTE: none of these tracepoints shall be considered a stable kernel ABI
  * as they can change at any time.
  */
 #undef TRACE_SYSTEM
-- 
2.30.2



* [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done}
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
  2021-07-19 10:34 ` [PATCH 01/27] iomap: fix a trivial comment typo in trace.h Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 16:04   ` Darrick J. Wong
  2021-07-19 10:34 ` [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const Christoph Hellwig
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

These aren't actually used by the only instance implementing the methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/gfs2/bmap.c         | 5 ++---
 fs/iomap/buffered-io.c | 6 +++---
 include/linux/iomap.h  | 5 ++---
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index ed8b67b2171817..5414c2c3358092 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1002,7 +1002,7 @@ static void gfs2_write_unlock(struct inode *inode)
 }
 
 static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
-				   unsigned len, struct iomap *iomap)
+				   unsigned len)
 {
 	unsigned int blockmask = i_blocksize(inode) - 1;
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -1013,8 +1013,7 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
 }
 
 static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
-				 unsigned copied, struct page *page,
-				 struct iomap *iomap)
+				 unsigned copied, struct page *page)
 {
 	struct gfs2_trans *tr = current->journal_info;
 	struct gfs2_inode *ip = GFS2_I(inode);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 87ccb3438becd9..75310f6fcf8401 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -605,7 +605,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		return -EINTR;
 
 	if (page_ops && page_ops->page_prepare) {
-		status = page_ops->page_prepare(inode, pos, len, iomap);
+		status = page_ops->page_prepare(inode, pos, len);
 		if (status)
 			return status;
 	}
@@ -638,7 +638,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 
 out_no_page:
 	if (page_ops && page_ops->page_done)
-		page_ops->page_done(inode, pos, 0, NULL, iomap);
+		page_ops->page_done(inode, pos, 0, NULL);
 	return status;
 }
 
@@ -714,7 +714,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	if (old_size < pos)
 		pagecache_isize_extended(inode, old_size, pos);
 	if (page_ops && page_ops->page_done)
-		page_ops->page_done(inode, pos, ret, page, iomap);
+		page_ops->page_done(inode, pos, ret, page);
 	put_page(page);
 
 	if (ret < len)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 479c1da3e2211e..093519d91cc9cc 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -108,10 +108,9 @@ iomap_sector(struct iomap *iomap, loff_t pos)
  * associated page could not be obtained.
  */
 struct iomap_page_ops {
-	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
-			struct iomap *iomap);
+	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
-			struct page *page, struct iomap *iomap);
+			struct page *page);
 };
 
 /*
-- 
2.30.2



* [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
  2021-07-19 10:34 ` [PATCH 01/27] iomap: fix a trivial comment typo in trace.h Christoph Hellwig
  2021-07-19 10:34 ` [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done} Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 16:08   ` Darrick J. Wong
  2021-07-19 10:34 ` [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const Christoph Hellwig
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/iomap.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 093519d91cc9cc..f9c36df6a3061b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -91,8 +91,7 @@ struct iomap {
 	const struct iomap_page_ops *page_ops;
 };
 
-static inline sector_t
-iomap_sector(struct iomap *iomap, loff_t pos)
+static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
 {
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
 }
-- 
2.30.2



* [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-07-19 10:34 ` [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 17:35   ` Darrick J. Wong
  2021-07-19 10:34 ` [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const Christoph Hellwig
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

__block_write_begin_int never modifies the passed in iomap, so mark it
const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/buffer.c   | 4 ++--
 fs/internal.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 6290c3afdba488..bd6a9e9fbd64c9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1912,7 +1912,7 @@ EXPORT_SYMBOL(page_zero_new_buffers);
 
 static void
 iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
-		struct iomap *iomap)
+		const struct iomap *iomap)
 {
 	loff_t offset = block << inode->i_blkbits;
 
@@ -1966,7 +1966,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 }
 
 int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
-		get_block_t *get_block, struct iomap *iomap)
+		get_block_t *get_block, const struct iomap *iomap)
 {
 	unsigned from = pos & (PAGE_SIZE - 1);
 	unsigned to = from + len;
diff --git a/fs/internal.h b/fs/internal.h
index 3ce8edbaa3ca2f..9ad6b5157584b8 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -48,8 +48,8 @@ static inline int emergency_thaw_bdev(struct super_block *sb)
 /*
  * buffer.c
  */
-extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
-		get_block_t *get_block, struct iomap *iomap);
+int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
+		get_block_t *get_block, const struct iomap *iomap);
 
 /*
  * char_dev.c
-- 
2.30.2



* [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-07-19 10:34 ` [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 17:35   ` Darrick J. Wong
  2021-07-19 10:34 ` [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const Christoph Hellwig
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index da41f9363568e0..4d63040fd71f56 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1005,7 +1005,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
-static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
+static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
 {
 	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
 }
-- 
2.30.2



* [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (4 preceding siblings ...)
  2021-07-19 10:34 ` [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const Christoph Hellwig
@ 2021-07-19 10:34 ` Christoph Hellwig
  2021-07-19 17:35   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const Christoph Hellwig
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

iomap_read_inline_data never modifies the passed in iomap, so mark
it const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 75310f6fcf8401..e47380259cf7e1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -207,7 +207,7 @@ struct iomap_readpage_ctx {
 
 static void
 iomap_read_inline_data(struct inode *inode, struct page *page,
-		struct iomap *iomap)
+		const struct iomap *iomap)
 {
 	size_t size = i_size_read(inode);
 	void *addr;
-- 
2.30.2



* [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (5 preceding siblings ...)
  2021-07-19 10:34 ` [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:35   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 08/27] iomap: add the new iomap_iter model Christoph Hellwig
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

iomap_read_page_sync never modifies the passed in iomap, so mark
it const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e47380259cf7e1..8c26cf7cbd72b0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -535,7 +535,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 
 static int
 iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
-		unsigned plen, struct iomap *iomap)
+		unsigned plen, const struct iomap *iomap)
 {
 	struct bio_vec bvec;
 	struct bio bio;
-- 
2.30.2



* [PATCH 08/27] iomap: add the new iomap_iter model
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (6 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 16:56   ` Darrick J. Wong
  2021-07-19 21:48   ` Dave Chinner
  2021-07-19 10:35 ` [PATCH 09/27] iomap: switch readahead and readpage to use iomap_iter Christoph Hellwig
                   ` (20 subsequent siblings)
  28 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

The iomap_iter struct provides a convenient way to package up and
maintain all the arguments to the various mapping and operation
functions.  It is operated on using the iomap_iter() function that
is called in a loop until the whole range has been processed.  Compared
to the existing iomap_apply() function this avoids an indirect call
for each iteration.

For now iomap_iter() calls back into the existing ->iomap_begin and
->iomap_end methods, but in the future this could be further optimized
to avoid indirect calls entirely.
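
The contract between the loop body and the iterator can be sketched in
user space as follows.  This is a simplified model under stated
assumptions, not the code added by this patch: struct sketch_map, struct
sketch_iter, sketch_begin(), sketch_length(), sketch_iter_next() and
sketch_walk() are hypothetical names, the mapping callback always reports
aligned 4096-byte extents, and there is no source map and no ->iomap_end
step.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iomap. */
struct sketch_map {
	long long offset;
	long long length;	/* 0 means no mapping has been looked up yet */
};

/* Hypothetical, simplified stand-in for struct iomap_iter. */
struct sketch_iter {
	long long pos;		/* current position, advanced by the iterator */
	long long len;		/* bytes remaining in the whole range */
	long long processed;	/* set by the loop body: bytes done or -errno */
	struct sketch_map map;
};

/* Stand-in for ->iomap_begin: reports the aligned 4096-byte extent at pos. */
static int sketch_begin(long long pos, long long len, struct sketch_map *map)
{
	(void)len;
	map->offset = pos - pos % 4096;
	map->length = 4096;
	return 0;
}

/* Bytes the current iteration may touch: clamp to the mapping and the range. */
static long long sketch_length(const struct sketch_iter *iter)
{
	long long left = iter->map.offset + iter->map.length - iter->pos;

	return iter->len < left ? iter->len : left;
}

/* Returns 1 to run the body again, 0 when finished, negative on error. */
static int sketch_iter_next(struct sketch_iter *iter)
{
	int ret;

	/* Account for the previous iteration, if there was one. */
	if (iter->map.length) {
		if (iter->processed <= 0)
			return (int)iter->processed;
		iter->pos += iter->processed;
		iter->len -= iter->processed;
		if (!iter->len)
			return 0;
	}

	/* Reset per-iteration state and look up the next mapping. */
	iter->processed = 0;
	iter->map.offset = 0;
	iter->map.length = 0;
	ret = sketch_begin(iter->pos, iter->len, &iter->map);
	if (ret < 0)
		return ret;
	return 1;
}

/* Example caller: consume every mapping in full and count the mappings. */
static long long sketch_walk(long long pos, long long len, int *nr_maps)
{
	struct sketch_iter iter = { .pos = pos, .len = len };
	int ret;

	*nr_maps = 0;
	while ((ret = sketch_iter_next(&iter)) > 0) {
		(*nr_maps)++;
		iter.processed = sketch_length(&iter);
	}
	if (ret < 0)
		return ret;
	return iter.pos - pos;
}
```

The body only ever writes iter.processed.  Position and remaining length
are advanced in one place by the iterator, so a body that reports 0 or a
negative errno ends the loop on the next call without any extra
bookkeeping in the caller.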

Based on an earlier patch from Matthew Wilcox <willy@infradead.org>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/Makefile     |  1 +
 fs/iomap/iter.c       | 74 +++++++++++++++++++++++++++++++++++++++++++
 fs/iomap/trace.h      | 37 +++++++++++++++++++++-
 include/linux/iomap.h | 56 ++++++++++++++++++++++++++++++++
 4 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 fs/iomap/iter.c

diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index eef2722d93a183..85034deb5a2f19 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)		+= iomap.o
 
 iomap-y				+= trace.o \
 				   apply.o \
+				   iter.o \
 				   buffered-io.o \
 				   direct-io.o \
 				   fiemap.o \
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
new file mode 100644
index 00000000000000..b21e2489700b7c
--- /dev/null
+++ b/fs/iomap/iter.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Christoph Hellwig.
+ */
+#include <linux/fs.h>
+#include <linux/iomap.h>
+#include "trace.h"
+
+static inline int iomap_iter_advance(struct iomap_iter *iter)
+{
+	/* handle the previous iteration (if any) */
+	if (iter->iomap.length) {
+		if (iter->processed <= 0)
+			return iter->processed;
+		WARN_ON_ONCE(iter->processed > iomap_length(iter));
+		iter->pos += iter->processed;
+		iter->len -= iter->processed;
+		if (!iter->len)
+			return 0;
+	}
+
+	/* clear the state for the next iteration */
+	iter->processed = 0;
+	memset(&iter->iomap, 0, sizeof(iter->iomap));
+	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
+	return 1;
+}
+
+static inline void iomap_iter_done(struct iomap_iter *iter)
+{
+	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
+	WARN_ON_ONCE(iter->iomap.length == 0);
+	WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
+
+	trace_iomap_iter_dstmap(iter->inode, &iter->iomap);
+	if (iter->srcmap.type != IOMAP_HOLE)
+		trace_iomap_iter_srcmap(iter->inode, &iter->srcmap);
+}
+
+/**
+ * iomap_iter - iterate over ranges in a file
+ * @iter: iteration structure
+ * @ops: iomap ops provided by the file system
+ *
+ * Iterate over file system provided contiguous ranges of blocks with the same
+ * state.  Should be called in a loop that continues as long as this function
+ * returns a positive value.  If 0 or a negative value is returned the caller
+ * should break out of the loop - a negative value is an error either from the
+ * file system or from the last iteration stored in @iter.processed.
+ */
+int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
+{
+	int ret;
+
+	if (iter->iomap.length && ops->iomap_end) {
+		ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
+				iter->processed > 0 ? iter->processed : 0,
+				iter->flags, &iter->iomap);
+		if (ret < 0 && !iter->processed)
+			return ret;
+	}
+
+	trace_iomap_iter(iter, ops, _RET_IP_);
+	ret = iomap_iter_advance(iter);
+	if (ret <= 0)
+		return ret;
+
+	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
+			       &iter->iomap, &iter->srcmap);
+	if (ret < 0)
+		return ret;
+	iomap_iter_done(iter);
+	return 1;
+}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index e9cd5cc0d6ba40..1012d7af6b689b 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Copyright (c) 2009-2019 Christoph Hellwig
+ * Copyright (c) 2009-2021 Christoph Hellwig
  *
  * NOTE: none of these tracepoints shall be considered a stable kernel ABI
  * as they can change at any time.
@@ -140,6 +140,8 @@ DEFINE_EVENT(iomap_class, name,	\
 	TP_ARGS(inode, iomap))
 DEFINE_IOMAP_EVENT(iomap_apply_dstmap);
 DEFINE_IOMAP_EVENT(iomap_apply_srcmap);
+DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
+DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
 
 TRACE_EVENT(iomap_apply,
 	TP_PROTO(struct inode *inode, loff_t pos, loff_t length,
@@ -179,6 +181,39 @@ TRACE_EVENT(iomap_apply,
 		   __entry->actor)
 );
 
+TRACE_EVENT(iomap_iter,
+	TP_PROTO(struct iomap_iter *iter, const void *ops,
+		 unsigned long caller),
+	TP_ARGS(iter, ops, caller),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(u64, ino)
+		__field(loff_t, pos)
+		__field(loff_t, length)
+		__field(unsigned int, flags)
+		__field(const void *, ops)
+		__field(unsigned long, caller)
+	),
+	TP_fast_assign(
+		__entry->dev = iter->inode->i_sb->s_dev;
+		__entry->ino = iter->inode->i_ino;
+		__entry->pos = iter->pos;
+		__entry->length = iomap_length(iter);
+		__entry->flags = iter->flags;
+		__entry->ops = ops;
+		__entry->caller = caller;
+	),
+	TP_printk("dev %d:%d ino 0x%llx pos %lld length %lld flags %s (0x%x) ops %ps caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		   __entry->ino,
+		   __entry->pos,
+		   __entry->length,
+		   __print_flags(__entry->flags, "|", IOMAP_FLAGS_STRINGS),
+		   __entry->flags,
+		   __entry->ops,
+		   (void *)__entry->caller)
+);
+
 #endif /* _IOMAP_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index f9c36df6a3061b..a9f3f736017989 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -143,6 +143,62 @@ struct iomap_ops {
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
+/**
+ * struct iomap_iter - Iterate through a range of a file
+ * @inode: Set at the start of the iteration and should not change.
+ * @pos: The current file position we are operating on.  It is updated by
+ *	calls to iomap_iter().  Treat as read-only in the body.
+ * @len: The remaining length of the file segment we're operating on.
+ *	It is updated at the same time as @pos.
+ * @processed: The number of bytes processed by the body in the most recent
+ *	iteration, or a negative errno. 0 causes the iteration to stop.
+ * @flags: Zero or more of the iomap_begin flags above.
+ * @iomap: Map describing the I/O iteration
+ * @srcmap: Source map for COW operations
+ */
+struct iomap_iter {
+	struct inode *inode;
+	loff_t pos;
+	u64 len;
+	ssize_t processed;
+	unsigned flags;
+	struct iomap iomap;
+	struct iomap srcmap;
+};
+
+int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops);
+
+/**
+ * iomap_length - length of the current iomap iteration
+ * @iter: iteration structure
+ *
+ * Returns the length that the operation applies to for the current iteration.
+ */
+static inline u64 iomap_length(const struct iomap_iter *iter)
+{
+	u64 end = iter->iomap.offset + iter->iomap.length;
+
+	if (iter->srcmap.type != IOMAP_HOLE)
+		end = min(end, iter->srcmap.offset + iter->srcmap.length);
+	return min(iter->len, end - iter->pos);
+}
+
+/**
+ * iomap_iter_srcmap - return the source map for the current iomap iteration
+ * @i: iteration structure
+ *
+ * Write operations on file systems with reflink support might require a
+ * source and a destination map.  This function returns the source map
+ * for a given operation, which may or may not be identical to the destination
+ * map in &i->iomap.
+ */
+static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
+{
+	if (i->srcmap.type != IOMAP_HOLE)
+		return &i->srcmap;
+	return &i->iomap;
+}
+
 /*
  * Main iomap iterator function.
  */
-- 
2.30.2



* [PATCH 09/27] iomap: switch readahead and readpage to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (7 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 08/27] iomap: add the new iomap_iter model Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 10/27] iomap: switch iomap_file_buffered_write " Christoph Hellwig
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch the page cache read functions to use iomap_iter instead of
iomap_apply.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 79 +++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 43 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8c26cf7cbd72b0..3b18cafa72bec6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -234,11 +234,12 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 		pos >= i_size_read(inode);
 }
 
-static loff_t
-iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_readpage_iter(struct iomap_iter *iter,
+		struct iomap_readpage_ctx *ctx, loff_t offset)
 {
-	struct iomap_readpage_ctx *ctx = data;
+	struct iomap *iomap = &iter->iomap;
+	loff_t pos = iter->pos + offset;
+	loff_t length = iomap_length(iter) - offset;
 	struct page *page = ctx->cur_page;
 	struct iomap_page *iop;
 	bool same_page = false, is_contig = false;
@@ -248,17 +249,17 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
 	if (iomap->type == IOMAP_INLINE) {
 		WARN_ON_ONCE(pos);
-		iomap_read_inline_data(inode, page, iomap);
+		iomap_read_inline_data(iter->inode, page, iomap);
 		return PAGE_SIZE;
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iop = iomap_page_create(inode, page);
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iop = iomap_page_create(iter->inode, page);
+	iomap_adjust_read_range(iter->inode, iop, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
-	if (iomap_block_needs_zeroing(inode, iomap, pos)) {
+	if (iomap_block_needs_zeroing(iter->inode, iomap, pos)) {
 		zero_user(page, poff, plen);
 		iomap_set_range_uptodate(page, poff, plen);
 		goto done;
@@ -317,23 +318,23 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 int
 iomap_readpage(struct page *page, const struct iomap_ops *ops)
 {
-	struct iomap_readpage_ctx ctx = { .cur_page = page };
-	struct inode *inode = page->mapping->host;
-	unsigned poff;
-	loff_t ret;
+	struct iomap_iter iter = {
+		.inode		= page->mapping->host,
+		.pos		= page_offset(page),
+		.len		= PAGE_SIZE,
+	};
+	struct iomap_readpage_ctx ctx = {
+		.cur_page	= page,
+	};
+	int ret;
 
 	trace_iomap_readpage(page->mapping->host, 1);
 
-	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
-		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
-				iomap_readpage_actor);
-		if (ret <= 0) {
-			WARN_ON_ONCE(ret == 0);
-			SetPageError(page);
-			break;
-		}
-	}
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
+
+	if (ret < 0)
+		SetPageError(page);
 
 	if (ctx.bio) {
 		submit_bio(ctx.bio);
@@ -352,15 +353,14 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 }
 EXPORT_SYMBOL_GPL(iomap_readpage);
 
-static loff_t
-iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_readahead_iter(struct iomap_iter *iter,
+		struct iomap_readpage_ctx *ctx)
 {
-	struct iomap_readpage_ctx *ctx = data;
+	loff_t length = iomap_length(iter);
 	loff_t done, ret;
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page && offset_in_page(iter->pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
@@ -370,8 +370,7 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 			ctx->cur_page = readahead_page(ctx->rac);
 			ctx->cur_page_in_bio = false;
 		}
-		ret = iomap_readpage_actor(inode, pos + done, length - done,
-				ctx, iomap, srcmap);
+		ret = iomap_readpage_iter(iter, ctx, done);
 	}
 
 	return done;
@@ -394,25 +393,19 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
  */
 void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
 {
-	struct inode *inode = rac->mapping->host;
-	loff_t pos = readahead_pos(rac);
-	size_t length = readahead_length(rac);
+	struct iomap_iter iter = {
+		.inode	= rac->mapping->host,
+		.pos	= readahead_pos(rac),
+		.len	= readahead_length(rac),
+	};
 	struct iomap_readpage_ctx ctx = {
 		.rac	= rac,
 	};
 
-	trace_iomap_readahead(inode, readahead_count(rac));
+	trace_iomap_readahead(rac->mapping->host, readahead_count(rac));
 
-	while (length > 0) {
-		ssize_t ret = iomap_apply(inode, pos, length, 0, ops,
-				&ctx, iomap_readahead_actor);
-		if (ret <= 0) {
-			WARN_ON_ONCE(ret == 0);
-			break;
-		}
-		pos += ret;
-		length -= ret;
-	}
+	while (iomap_iter(&iter, ops) > 0)
+		iter.processed = iomap_readahead_iter(&iter, &ctx);
 
 	if (ctx.bio)
 		submit_bio(ctx.bio);
-- 
2.30.2



* [PATCH 10/27] iomap: switch iomap_file_buffered_write to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (8 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 09/27] iomap: switch readahead and readpage to use iomap_iter Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 11/27] iomap: switch iomap_file_unshare " Christoph Hellwig
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch iomap_file_buffered_write to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 49 +++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 3b18cafa72bec6..7195e82d15775e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -715,13 +715,14 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	return ret;
 }
 
-static loff_t
-iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
-	struct iov_iter *i = data;
-	long status = 0;
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	struct iomap *iomap = &iter->iomap;
+	loff_t length = iomap_length(iter);
+	loff_t pos = iter->pos;
 	ssize_t written = 0;
+	long status = 0;
 
 	do {
 		struct page *page;
@@ -747,18 +748,18 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
-		status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap,
-				srcmap);
+		status = iomap_write_begin(iter->inode, pos, bytes, 0, &page,
+					   iomap, srcmap);
 		if (unlikely(status))
 			break;
 
-		if (mapping_writably_mapped(inode->i_mapping))
+		if (mapping_writably_mapped(iter->inode->i_mapping))
 			flush_dcache_page(page);
 
 		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
 
-		status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
-				srcmap);
+		status = iomap_write_end(iter->inode, pos, bytes, copied, page,
+					 iomap, srcmap);
 
 		if (unlikely(copied != status))
 			iov_iter_revert(i, copied - status);
@@ -779,29 +780,29 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		written += status;
 		length -= status;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
 	} while (iov_iter_count(i) && length);
 
 	return written ? written : status;
 }
 
 ssize_t
-iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
+iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 		const struct iomap_ops *ops)
 {
-	struct inode *inode = iocb->ki_filp->f_mapping->host;
-	loff_t pos = iocb->ki_pos, ret = 0, written = 0;
-
-	while (iov_iter_count(iter)) {
-		ret = iomap_apply(inode, pos, iov_iter_count(iter),
-				IOMAP_WRITE, ops, iter, iomap_write_actor);
-		if (ret <= 0)
-			break;
-		pos += ret;
-		written += ret;
-	}
+	struct iomap_iter iter = {
+		.inode		= iocb->ki_filp->f_mapping->host,
+		.pos		= iocb->ki_pos,
+		.len		= iov_iter_count(i),
+		.flags		= IOMAP_WRITE,
+	};
+	int ret;
 
-	return written ? written : ret;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_write_iter(&iter, i);
+	if (iter.pos == iocb->ki_pos)
+		return ret;
+	return iter.pos - iocb->ki_pos;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
-- 
2.30.2
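
For readers without the rest of the series at hand, the caller-side pattern adopted above can be modelled in userspace. This is only a sketch: the real iomap_iter() lives in fs/iomap/iter.c, which is added elsewhere in the series and is not reproduced here. The names toy_iter, toy_iter_next and toy_write, and the fixed 4 KiB extent size, are hypothetical stand-ins, and the stop/advance rules are assumptions inferred from how the callers in this patch use the return value and ->processed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iomap_iter; not the kernel definition. */
struct toy_iter {
	int64_t pos;		/* current file position */
	uint64_t len;		/* bytes still to cover */
	uint64_t map_len;	/* length of the current mapping, 0 before the first */
	int64_t processed;	/* bytes handled by the last loop body, or -errno */
};

#define TOY_EXTENT	4096	/* assumed: mappings end on 4 KiB boundaries */

/*
 * Assumed semantics: > 0 means a mapping is ready and the loop body should
 * run, 0 means the iteration is finished, < 0 is an error.
 */
static int toy_iter_next(struct toy_iter *it)
{
	if (it->map_len) {			/* not the first call */
		if (it->processed < 0)
			return (int)it->processed;
		if (it->processed == 0)
			return 0;		/* no progress: stop */
		assert((uint64_t)it->processed <= it->len);
		it->pos += it->processed;
		it->len -= it->processed;
	}
	if (!it->len)
		return 0;

	/* "map" up to the next extent boundary, capped by the request */
	it->map_len = TOY_EXTENT - (it->pos % TOY_EXTENT);
	if (it->map_len > it->len)
		it->map_len = it->len;
	it->processed = 0;
	return 1;
}

/* Same shape as the new iomap_file_buffered_write() body. */
static int64_t toy_write(int64_t start, uint64_t count, int fail_at_pass)
{
	struct toy_iter it = { .pos = start, .len = count };
	int ret, pass = 0;

	while ((ret = toy_iter_next(&it)) > 0) {
		if (++pass == fail_at_pass)
			it.processed = -5;	/* pretend the body hit -EIO */
		else
			it.processed = it.map_len;
	}
	if (it.pos == start)
		return ret;		/* nothing written: 0 or the error */
	return it.pos - start;		/* partial progress wins over the error */
}
```

In this model toy_write(0, 10000, 2) reports the 4096 bytes written before the simulated failure, while toy_write(0, 10000, 1) reports the error itself, which is the distinction the `iter.pos == iocb->ki_pos` check draws.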



* [PATCH 11/27] iomap: switch iomap_file_unshare to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (9 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 10/27] iomap: switch iomap_file_buffered_write " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 12/27] iomap: switch iomap_zero_range " Christoph Hellwig
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch iomap_file_unshare to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7195e82d15775e..59781c72c278e5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -806,10 +806,12 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
-static loff_t
-iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 {
+	struct iomap *iomap = &iter->iomap;
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	loff_t pos = iter->pos;
+	loff_t length = iomap_length(iter);
 	long status = 0;
 	loff_t written = 0;
 
@@ -825,12 +827,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
 
-		status = iomap_write_begin(inode, pos, bytes,
+		status = iomap_write_begin(iter->inode, pos, bytes,
 				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
 		if (unlikely(status))
 			return status;
 
-		status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
+		status = iomap_write_end(iter->inode, pos, bytes, bytes, page, iomap,
 				srcmap);
 		if (WARN_ON_ONCE(status == 0))
 			return -EIO;
@@ -841,7 +843,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		written += status;
 		length -= status;
 
-		balance_dirty_pages_ratelimited(inode->i_mapping);
+		balance_dirty_pages_ratelimited(iter->inode->i_mapping);
 	} while (length);
 
 	return written;
@@ -851,18 +853,17 @@ int
 iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops)
 {
-	loff_t ret;
-
-	while (len) {
-		ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
-				iomap_unshare_actor);
-		if (ret <= 0)
-			return ret;
-		pos += ret;
-		len -= ret;
-	}
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= len,
+		.flags		= IOMAP_WRITE,
+	};
+	int ret;
 
-	return 0;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_unshare_iter(&iter);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-- 
2.30.2



* [PATCH 12/27] iomap: switch iomap_zero_range to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (10 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 11/27] iomap: switch iomap_file_unshare " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 13/27] iomap: switch iomap_page_mkwrite " Christoph Hellwig
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch iomap_zero_range to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 59781c72c278e5..e5832ffb413cb6 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -885,11 +885,12 @@ static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
 	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }
 
-static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
-		loff_t length, void *data, struct iomap *iomap,
-		struct iomap *srcmap)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-	bool *did_zero = data;
+	struct iomap *iomap = &iter->iomap;
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	loff_t pos = iter->pos;
+	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
 	/* already zeroed?  we're done. */
@@ -899,10 +900,11 @@ static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
 	do {
 		s64 bytes;
 
-		if (IS_DAX(inode))
+		if (IS_DAX(iter->inode))
 			bytes = dax_iomap_zero(pos, length, iomap);
 		else
-			bytes = iomap_zero(inode, pos, length, iomap, srcmap);
+			bytes = iomap_zero(iter->inode, pos, length, iomap,
+					   srcmap);
 		if (bytes < 0)
 			return bytes;
 
@@ -920,19 +922,17 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops)
 {
-	loff_t ret;
-
-	while (len > 0) {
-		ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
-				ops, did_zero, iomap_zero_range_actor);
-		if (ret <= 0)
-			return ret;
-
-		pos += ret;
-		len -= ret;
-	}
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= pos,
+		.len		= len,
+		.flags		= IOMAP_ZERO,
+	};
+	int ret;
 
-	return 0;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_zero_iter(&iter, did_zero);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
-- 
2.30.2



* [PATCH 13/27] iomap: switch iomap_page_mkwrite to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (11 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 12/27] iomap: switch iomap_zero_range " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 14/27] iomap: switch __iomap_dio_rw " Christoph Hellwig
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch iomap_page_mkwrite to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e5832ffb413cb6..c273b5d88dd8a8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -950,15 +950,15 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
-static loff_t
-iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_page_mkwrite_iter(struct iomap_iter *iter,
+		struct page *page)
 {
-	struct page *page = data;
+	loff_t length = iomap_length(iter);
 	int ret;
 
-	if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = __block_write_begin_int(page, pos, length, NULL, iomap);
+	if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD) {
+		ret = __block_write_begin_int(page, iter->pos, length, NULL,
+					      &iter->iomap);
 		if (ret)
 			return ret;
 		block_commit_write(page, 0, length);
@@ -972,29 +972,24 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
 
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 {
+	struct iomap_iter iter = {
+		.inode		= file_inode(vmf->vma->vm_file),
+		.flags		= IOMAP_WRITE | IOMAP_FAULT,
+	};
 	struct page *page = vmf->page;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
-	unsigned long length;
-	loff_t offset;
 	ssize_t ret;
 
 	lock_page(page);
-	ret = page_mkwrite_check_truncate(page, inode);
+	ret = page_mkwrite_check_truncate(page, iter.inode);
 	if (ret < 0)
 		goto out_unlock;
-	length = ret;
-
-	offset = page_offset(page);
-	while (length > 0) {
-		ret = iomap_apply(inode, offset, length,
-				IOMAP_WRITE | IOMAP_FAULT, ops, page,
-				iomap_page_mkwrite_actor);
-		if (unlikely(ret <= 0))
-			goto out_unlock;
-		offset += ret;
-		length -= ret;
-	}
+	iter.pos = page_offset(page);
+	iter.len = ret;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_page_mkwrite_iter(&iter, page);
 
+	if (ret < 0)
+		goto out_unlock;
 	wait_for_stable_page(page);
 	return VM_FAULT_LOCKED;
 out_unlock:
-- 
2.30.2



* [PATCH 14/27] iomap: switch __iomap_dio_rw to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (12 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 13/27] iomap: switch iomap_page_mkwrite " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 15/27] iomap: switch iomap_fiemap " Christoph Hellwig
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

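Switch __iomap_dio_rw to use iomap_iter.  The ->submit_io method in struct
iomap_dio_ops now takes the iomap_iter instead of the inode and iomap, so
btrfs_submit_direct is updated to match.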

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c      |   5 +-
 fs/iomap/direct-io.c  | 162 +++++++++++++++++++++---------------------
 include/linux/iomap.h |   4 +-
 3 files changed, 85 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8f60314c36c55e..12b18fdf86dcfa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8194,9 +8194,10 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	return dip;
 }
 
-static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
+static blk_qc_t btrfs_submit_direct(const struct iomap_iter *iter,
 		struct bio *dio_bio, loff_t file_offset)
 {
+	struct inode *inode = iter->inode;
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
@@ -8212,7 +8213,7 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
 	int ret;
 	blk_status_t status;
 	struct btrfs_io_geometry geom;
-	struct btrfs_dio_data *dio_data = iomap->private;
+	struct btrfs_dio_data *dio_data = iter->iomap.private;
 	struct extent_map *em = NULL;
 
 	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9398b8c31323b3..b77e4416527c7b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (c) 2016-2018 Christoph Hellwig.
+ * Copyright (c) 2016-2021 Christoph Hellwig.
  */
 #include <linux/module.h>
 #include <linux/compiler.h>
@@ -59,19 +59,17 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
 
-static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
-		struct bio *bio, loff_t pos)
+static void iomap_dio_submit_bio(const struct iomap_iter *iter,
+		struct iomap_dio *dio, struct bio *bio, loff_t pos)
 {
 	atomic_inc(&dio->ref);
 
 	if (dio->iocb->ki_flags & IOCB_HIPRI)
 		bio_set_polled(bio, dio->iocb);
 
-	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
+	dio->submit.last_queue = bdev_get_queue(iter->iomap.bdev);
 	if (dio->dops && dio->dops->submit_io)
-		dio->submit.cookie = dio->dops->submit_io(
-				file_inode(dio->iocb->ki_filp),
-				iomap, bio, pos);
+		dio->submit.cookie = dio->dops->submit_io(iter, bio, pos);
 	else
 		dio->submit.cookie = submit_bio(bio);
 }
@@ -181,24 +179,23 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 	}
 }
 
-static void
-iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
-		unsigned len)
+static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
+		loff_t pos, unsigned len)
 {
 	struct page *page = ZERO_PAGE(0);
 	int flags = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 
 	bio = bio_alloc(GFP_KERNEL, 1);
-	bio_set_dev(bio, iomap->bdev);
-	bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+	bio_set_dev(bio, iter->iomap.bdev);
+	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
 	bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
-	iomap_dio_submit_bio(dio, iomap, bio, pos);
+	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
 
 /*
@@ -206,8 +203,8 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
  * mapping, and whether or not we want FUA.  Note that we can end up
  * clearing the WRITE_FUA flag in the dio request.
  */
-static inline unsigned int
-iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
+static inline unsigned int iomap_dio_bio_opflags(struct iomap_dio *dio,
+		const struct iomap *iomap, bool use_fua)
 {
 	unsigned int opflags = REQ_SYNC | REQ_IDLE;
 
@@ -229,13 +226,16 @@ iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool use_fua)
 	return opflags;
 }
 
-static loff_t
-iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
-		struct iomap_dio *dio, struct iomap *iomap)
+static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
+		struct iomap_dio *dio)
 {
+	const struct iomap *iomap = &iter->iomap;
+	struct inode *inode = iter->inode;
 	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
 	unsigned int fs_block_size = i_blocksize(inode), pad;
 	unsigned int align = iov_iter_alignment(dio->submit.iter);
+	loff_t length = iomap_length(iter);
+	loff_t pos = iter->pos;
 	unsigned int bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
@@ -286,7 +286,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		/* zero out from the start of the block to the write offset */
 		pad = pos & (fs_block_size - 1);
 		if (pad)
-			iomap_dio_zero(dio, iomap, pos - pad, pad);
+			iomap_dio_zero(iter, dio, pos - pad, pad);
 	}
 
 	/*
@@ -339,7 +339,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 
 		nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter,
 						 BIO_MAX_VECS);
-		iomap_dio_submit_bio(dio, iomap, bio, pos);
+		iomap_dio_submit_bio(iter, dio, bio, pos);
 		pos += n;
 	} while (nr_pages);
 
@@ -355,7 +355,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 		/* zero out from the end of the write to the end of the block */
 		pad = pos & (fs_block_size - 1);
 		if (pad)
-			iomap_dio_zero(dio, iomap, pos, fs_block_size - pad);
+			iomap_dio_zero(iter, dio, pos, fs_block_size - pad);
 	}
 out:
 	/* Undo iter limitation to current extent */
@@ -365,33 +365,36 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static loff_t
-iomap_dio_hole_actor(loff_t length, struct iomap_dio *dio)
+static loff_t iomap_dio_hole_iter(const struct iomap_iter *iter,
+		struct iomap_dio *dio)
 {
-	length = iov_iter_zero(length, dio->submit.iter);
+	loff_t length = iov_iter_zero(iomap_length(iter), dio->submit.iter);
+
 	dio->size += length;
 	return length;
 }
 
-static loff_t
-iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
-		struct iomap_dio *dio, struct iomap *iomap)
+static loff_t iomap_dio_inline_iter(const struct iomap_iter *iomi,
+		struct iomap_dio *dio)
 {
+	const struct iomap *iomap = &iomi->iomap;
 	struct iov_iter *iter = dio->submit.iter;
+	loff_t length = iomap_length(iomi);
+	loff_t pos = iomi->pos;
 	size_t copied;
 
 	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
-		loff_t size = inode->i_size;
+		loff_t size = iomi->inode->i_size;
 
 		if (pos > size)
 			memset(iomap->inline_data + size, 0, pos - size);
 		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
 		if (copied) {
 			if (pos + copied > size)
-				i_size_write(inode, pos + copied);
-			mark_inode_dirty(inode);
+				i_size_write(iomi->inode, pos + copied);
+			mark_inode_dirty(iomi->inode);
 		}
 	} else {
 		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
@@ -400,30 +403,27 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 	return copied;
 }
 
-static loff_t
-iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_dio_iter(const struct iomap_iter *iter,
+		struct iomap_dio *dio)
 {
-	struct iomap_dio *dio = data;
-
-	switch (iomap->type) {
+	switch (iter->iomap.type) {
 	case IOMAP_HOLE:
 		if (WARN_ON_ONCE(dio->flags & IOMAP_DIO_WRITE))
 			return -EIO;
-		return iomap_dio_hole_actor(length, dio);
+		return iomap_dio_hole_iter(iter, dio);
 	case IOMAP_UNWRITTEN:
 		if (!(dio->flags & IOMAP_DIO_WRITE))
-			return iomap_dio_hole_actor(length, dio);
-		return iomap_dio_bio_actor(inode, pos, length, dio, iomap);
+			return iomap_dio_hole_iter(iter, dio);
+		return iomap_dio_bio_iter(iter, dio);
 	case IOMAP_MAPPED:
-		return iomap_dio_bio_actor(inode, pos, length, dio, iomap);
+		return iomap_dio_bio_iter(iter, dio);
 	case IOMAP_INLINE:
-		return iomap_dio_inline_actor(inode, pos, length, dio, iomap);
+		return iomap_dio_inline_iter(iter, dio);
 	case IOMAP_DELALLOC:
 		/*
 		 * DIO is not serialised against mmap() access at all, and so
 		 * if the page_mkwrite occurs between the writeback and the
-		 * iomap_apply() call in the DIO path, then it will see the
+		 * iomap_iter() call in the DIO path, then it will see the
 		 * DELALLOC block that the page-mkwrite allocated.
 		 */
 		pr_warn_ratelimited("Direct I/O collision with buffered writes! File: %pD4 Comm: %.20s\n",
@@ -454,16 +454,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 {
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = file_inode(iocb->ki_filp);
-	size_t count = iov_iter_count(iter);
-	loff_t pos = iocb->ki_pos;
-	loff_t end = iocb->ki_pos + count - 1, ret = 0;
+	struct iomap_iter iomi = {
+		.inode		= inode,
+		.pos		= iocb->ki_pos,
+		.len		= iov_iter_count(iter),
+		.flags		= IOMAP_DIRECT,
+	};
+	loff_t end = iomi.pos + iomi.len - 1, ret = 0;
 	bool wait_for_completion =
 		is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
-	unsigned int iomap_flags = IOMAP_DIRECT;
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
-	if (!count)
+	if (!iomi.len)
 		return NULL;
 
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
@@ -484,29 +487,30 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
-		if (pos >= dio->i_size)
+		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;
 
 		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_needs_writeback(mapping, pos, end)) {
+			if (filemap_range_needs_writeback(mapping, iomi.pos,
+					end)) {
 				ret = -EAGAIN;
 				goto out_free_dio;
 			}
-			iomap_flags |= IOMAP_NOWAIT;
+			iomi.flags |= IOMAP_NOWAIT;
 		}
 
 		if (iter_is_iovec(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
-		iomap_flags |= IOMAP_WRITE;
+		iomi.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
 		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_has_page(mapping, pos, end)) {
+			if (filemap_range_has_page(mapping, iomi.pos, end)) {
 				ret = -EAGAIN;
 				goto out_free_dio;
 			}
-			iomap_flags |= IOMAP_NOWAIT;
+			iomi.flags |= IOMAP_NOWAIT;
 		}
 
 		/* for data sync or sync, we need sync completion processing */
@@ -525,12 +529,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 
 	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
 		ret = -EAGAIN;
-		if (pos >= dio->i_size || pos + count > dio->i_size)
+		if (iomi.pos >= dio->i_size ||
+		    iomi.pos + iomi.len > dio->i_size)
 			goto out_free_dio;
-		iomap_flags |= IOMAP_OVERWRITE_ONLY;
+		iomi.flags |= IOMAP_OVERWRITE_ONLY;
 	}
 
-	ret = filemap_write_and_wait_range(mapping, pos, end);
+	ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
 	if (ret)
 		goto out_free_dio;
 
@@ -540,9 +545,10 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		 * If this invalidation fails, let the caller fall back to
 		 * buffered I/O.
 		 */
-		if (invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
-				end >> PAGE_SHIFT)) {
-			trace_iomap_dio_invalidate_fail(inode, pos, count);
+		if (invalidate_inode_pages2_range(mapping,
+				iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
+			trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+							iomi.len);
 			ret = -ENOTBLK;
 			goto out_free_dio;
 		}
@@ -557,31 +563,23 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	inode_dio_begin(inode);
 
 	blk_start_plug(&plug);
-	do {
-		ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio,
-				iomap_dio_actor);
-		if (ret <= 0) {
-			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK) {
-				wait_for_completion = true;
-				ret = 0;
-			}
-			break;
-		}
-		pos += ret;
-
-		if (iov_iter_rw(iter) == READ && pos >= dio->i_size) {
-			/*
-			 * We only report that we've read data up to i_size.
-			 * Revert iter to a state corresponding to that as
-			 * some callers (such as splice code) rely on it.
-			 */
-			iov_iter_revert(iter, pos - dio->i_size);
-			break;
-		}
-	} while ((count = iov_iter_count(iter)) > 0);
+	while ((ret = iomap_iter(&iomi, ops)) > 0)
+		iomi.processed = iomap_dio_iter(&iomi, dio);
 	blk_finish_plug(&plug);
 
+	/*
+	 * We only report that we've read data up to i_size.
+	 * Revert iter to a state corresponding to that as some callers (such
+	 * as the splice code) rely on it.
+	 */
+	if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
+		iov_iter_revert(iter, iomi.pos - dio->i_size);
+
+	/* magic error code to fall back to buffered I/O */
+	if (ret == -ENOTBLK) {
+		wait_for_completion = true;
+		ret = 0;
+	}
 	if (ret < 0)
 		iomap_dio_set_error(dio, ret);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a9f3f736017989..da01226886eca4 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -304,8 +304,8 @@ int iomap_writepages(struct address_space *mapping,
 struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
 		      unsigned flags);
-	blk_qc_t (*submit_io)(struct inode *inode, struct iomap *iomap,
-			struct bio *bio, loff_t file_offset);
+	blk_qc_t (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
+			      loff_t file_offset);
 };
 
 /*
-- 
2.30.2
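
The read-side revert that this patch moves out of the loop is easiest to follow with numbers. The sketch below is a userspace model under stated assumptions: toy_iov is a hypothetical stand-in for struct iov_iter that tracks only the remaining byte count, and final_pos stands for wherever the iteration happened to stop. Only the condition and the revert amount are taken from the hunk in __iomap_dio_rw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iov_iter: tracks only the bytes left. */
struct toy_iov {
	size_t count;
};

static void toy_advance(struct toy_iov *i, size_t n)
{
	assert(n <= i->count);
	i->count -= n;
}

static void toy_revert(struct toy_iov *i, size_t n)
{
	i->count += n;
}

/*
 * A read of `count` bytes starts at `start`; the iteration loop stops at
 * `final_pos`.  Returns how many bytes the caller sees as consumed.
 */
static size_t toy_read_consumed(int64_t start, size_t count,
				int64_t final_pos, int64_t i_size)
{
	struct toy_iov iov = { .count = count };

	assert(start < i_size && final_pos >= start);
	toy_advance(&iov, (size_t)(final_pos - start));	/* what the loop did */

	/* same condition and amount as the hunk in __iomap_dio_rw() */
	if (final_pos >= i_size)
		toy_revert(&iov, (size_t)(final_pos - i_size));

	return count - iov.count;
}
```

For a 5000-byte file, an 8192-byte read from offset 0 that iterates all the way to 8192 is reverted by 3192 bytes, so the caller sees exactly 5000 bytes consumed.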



* [PATCH 15/27] iomap: switch iomap_fiemap to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (13 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 14/27] iomap: switch __iomap_dio_rw " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 16/27] iomap: switch iomap_bmap " Christoph Hellwig
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Rewrite the ->fiemap implementation based on iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/fiemap.c | 70 ++++++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 41 deletions(-)

diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index aab070df4a2175..acad09a8c188df 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- * Copyright (c) 2016-2018 Christoph Hellwig.
+ * Copyright (c) 2016-2021 Christoph Hellwig.
  */
 #include <linux/module.h>
 #include <linux/compiler.h>
@@ -8,13 +8,8 @@
 #include <linux/iomap.h>
 #include <linux/fiemap.h>
 
-struct fiemap_ctx {
-	struct fiemap_extent_info *fi;
-	struct iomap prev;
-};
-
 static int iomap_to_fiemap(struct fiemap_extent_info *fi,
-		struct iomap *iomap, u32 flags)
+		const struct iomap *iomap, u32 flags)
 {
 	switch (iomap->type) {
 	case IOMAP_HOLE:
@@ -43,24 +38,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
 			iomap->length, flags);
 }
 
-static loff_t
-iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_fiemap_iter(const struct iomap_iter *iter,
+		struct fiemap_extent_info *fi, struct iomap *prev)
 {
-	struct fiemap_ctx *ctx = data;
-	loff_t ret = length;
+	int ret;
 
-	if (iomap->type == IOMAP_HOLE)
-		return length;
+	if (iter->iomap.type == IOMAP_HOLE)
+		return iomap_length(iter);
 
-	ret = iomap_to_fiemap(ctx->fi, &ctx->prev, 0);
-	ctx->prev = *iomap;
+	ret = iomap_to_fiemap(fi, prev, 0);
+	*prev = iter->iomap;
 	switch (ret) {
 	case 0:		/* success */
-		return length;
+		return iomap_length(iter);
 	case 1:		/* extent array full */
 		return 0;
-	default:
+	default:	/* error */
 		return ret;
 	}
 }
@@ -68,38 +61,33 @@ iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 		u64 start, u64 len, const struct iomap_ops *ops)
 {
-	struct fiemap_ctx ctx;
-	loff_t ret;
-
-	memset(&ctx, 0, sizeof(ctx));
-	ctx.fi = fi;
-	ctx.prev.type = IOMAP_HOLE;
+	struct iomap_iter iter = {
+		.inode		= inode,
+		.pos		= start,
+		.len		= len,
+		.flags		= IOMAP_REPORT,
+	};
+	struct iomap prev = {
+		.type		= IOMAP_HOLE,
+	};
+	int ret;
 
-	ret = fiemap_prep(inode, fi, start, &len, 0);
+	ret = fiemap_prep(inode, fi, start, &iter.len, 0);
 	if (ret)
 		return ret;
 
-	while (len > 0) {
-		ret = iomap_apply(inode, start, len, IOMAP_REPORT, ops, &ctx,
-				iomap_fiemap_actor);
-		/* inode with no (attribute) mapping will give ENOENT */
-		if (ret == -ENOENT)
-			break;
-		if (ret < 0)
-			return ret;
-		if (ret == 0)
-			break;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_fiemap_iter(&iter, fi, &prev);
 
-		start += ret;
-		len -= ret;
-	}
-
-	if (ctx.prev.type != IOMAP_HOLE) {
-		ret = iomap_to_fiemap(fi, &ctx.prev, FIEMAP_EXTENT_LAST);
+	if (prev.type != IOMAP_HOLE) {
+		ret = iomap_to_fiemap(fi, &prev, FIEMAP_EXTENT_LAST);
 		if (ret < 0)
 			return ret;
 	}
 
+	/* inode with no (attribute) mapping will give ENOENT */
+	if (ret < 0 && ret != -ENOENT)
+		return ret;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_fiemap);
-- 
2.30.2
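
The `prev` handling kept by this patch reports every extent one step late, so that the final one can carry FIEMAP_EXTENT_LAST. A minimal userspace sketch of that pattern follows. All toy_* names are hypothetical, the output array size is arbitrary, and toy_emit() assumes that iomap_to_fiemap() silently skips holes, which is what makes the initial `prev.type = IOMAP_HOLE` harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of IOMAP_HOLE / IOMAP_MAPPED. */
enum toy_type { TOY_HOLE, TOY_MAPPED };

struct toy_map {
	enum toy_type type;
	long offset;
	long length;
};

#define TOY_LAST	0x1	/* stand-in for FIEMAP_EXTENT_LAST */
#define TOY_MAX		8	/* arbitrary capacity for the sketch */

struct toy_out {
	struct toy_map ext[TOY_MAX];
	unsigned int flags[TOY_MAX];
	size_t n;
};

/* Stand-in for iomap_to_fiemap(): holes are assumed never to be reported. */
static void toy_emit(struct toy_out *out, const struct toy_map *m,
		     unsigned int flags)
{
	if (m->type == TOY_HOLE)
		return;
	assert(out->n < TOY_MAX);
	out->ext[out->n] = *m;
	out->flags[out->n] = flags;
	out->n++;
}

static void toy_fiemap(const struct toy_map *maps, size_t nmaps,
		       struct toy_out *out)
{
	struct toy_map prev = { .type = TOY_HOLE };
	size_t i;

	out->n = 0;
	for (i = 0; i < nmaps; i++) {
		if (maps[i].type == TOY_HOLE)
			continue;		/* holes do not displace prev */
		toy_emit(out, &prev, 0);	/* report the extent seen one step ago */
		prev = maps[i];
	}

	/* whatever is still pending is by definition the last extent */
	if (prev.type != TOY_HOLE)
		toy_emit(out, &prev, TOY_LAST);
}
```

With the input mapped/hole/mapped, the model emits two extents and only the second carries the last-extent flag; an all-hole input emits nothing.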



* [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (14 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 15/27] iomap: switch iomap_fiemap " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:05   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 17/27] iomap: switch iomap_seek_hole " Christoph Hellwig
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Rewrite the ->bmap implementation based on iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/fiemap.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index acad09a8c188df..60daadba16c149 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -92,35 +92,30 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
 }
 EXPORT_SYMBOL_GPL(iomap_fiemap);
 
-static loff_t
-iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-		void *data, struct iomap *iomap, struct iomap *srcmap)
-{
-	sector_t *bno = data, addr;
-
-	if (iomap->type == IOMAP_MAPPED) {
-		addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
-		*bno = addr;
-	}
-	return 0;
-}
-
 /* legacy ->bmap interface.  0 is the error return (!) */
 sector_t
 iomap_bmap(struct address_space *mapping, sector_t bno,
 		const struct iomap_ops *ops)
 {
-	struct inode *inode = mapping->host;
-	loff_t pos = bno << inode->i_blkbits;
-	unsigned blocksize = i_blocksize(inode);
+	struct iomap_iter iter = {
+		.inode	= mapping->host,
+		.pos	= (loff_t)bno << mapping->host->i_blkbits,
+		.len	= i_blocksize(mapping->host),
+		.flags	= IOMAP_REPORT,
+	};
 	int ret;
 
 	if (filemap_write_and_wait(mapping))
 		return 0;
 
 	bno = 0;
-	ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno,
-			  iomap_bmap_actor);
+	while ((ret = iomap_iter(&iter, ops)) > 0) {
+		if (iter.iomap.type != IOMAP_MAPPED)
+			continue;
+		bno = (iter.pos - iter.iomap.offset + iter.iomap.addr) >>
+				mapping->host->i_blkbits;
+	}
+
 	if (ret)
 		return 0;
 	return bno;
-- 
2.30.2
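
The block-number arithmetic that is now open-coded in iomap_bmap can be checked with a small worked example. This is a userspace sketch, not kernel code: toy_map is a hypothetical struct carrying only the fields the expression uses, and the `mapped` flag stands in for the `iomap.type == IOMAP_MAPPED` test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct iomap fields used by ->bmap. */
struct toy_map {
	int mapped;		/* non-zero for an IOMAP_MAPPED-like extent */
	int64_t offset;		/* file offset of the mapping, in bytes */
	uint64_t addr;		/* disk address of the mapping, in bytes */
};

/* Returns the disk block for file block `bno`, or 0 if it is not mapped. */
static uint64_t toy_bmap(const struct toy_map *m, uint64_t bno,
			 unsigned int blkbits)
{
	int64_t pos = (int64_t)(bno << blkbits);	/* file block -> byte position */

	if (!m->mapped)
		return 0;	/* legacy interface: 0 doubles as "no mapping" */

	assert(pos >= m->offset);
	/* byte distance into the mapping, plus its disk address, as a block */
	return ((uint64_t)(pos - m->offset) + m->addr) >> blkbits;
}
```

With 4 KiB blocks (blkbits = 12), file block 5 sits at byte 20480. For a mapping that starts at file offset 16384 and disk address 1048576, the result is (4096 + 1048576) >> 12 = 257.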



* [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (15 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 16/27] iomap: switch iomap_bmap " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:22   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 18/27] iomap: switch iomap_seek_data " Christoph Hellwig
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Rewrite iomap_seek_hole to use iomap_iter.
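
For readers unfamiliar with the iterator pattern, here is a rough userspace
sketch of the loop shape this patch moves to.  It is not kernel code: the
toy_* types and functions are hypothetical stand-ins, a static extent table
replaces ->iomap_begin, and the page cache probe for unwritten extents is
left out.  It assumes the iterator stops as soon as the loop body reports
zero bytes processed and leaves len nonzero in that case, which is what the
final "if (iter.len)" check in the patch relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for illustration; none of these names exist in the kernel. */
enum toy_type { TOY_HOLE, TOY_MAPPED };

struct toy_extent {
	long long offset;	/* start of the extent in the file */
	long long length;	/* extent length in bytes */
	enum toy_type type;
};

struct toy_iter {
	const struct toy_extent *map;	/* sorted, gap-free extent table */
	size_t nr;			/* number of entries in map */
	long long pos;			/* current file position */
	long long len;			/* bytes left to walk */
	long long processed;		/* bytes consumed by the last loop body */
	const struct toy_extent *cur;	/* extent covering pos, NULL at start */
};

/* Bytes of the current extent that fall inside the remaining range. */
static long long toy_length(const struct toy_iter *it)
{
	long long end = it->cur->offset + it->cur->length;
	long long want = it->pos + it->len;

	return (end < want ? end : want) - it->pos;
}

/* Returns 1 with a fresh extent, 0 when finished, negative on error. */
static int toy_iter_next(struct toy_iter *it)
{
	size_t i;

	if (it->cur) {
		/* Account for the previous iteration first. */
		if (it->processed <= 0)
			return (int)it->processed;
		assert(it->processed <= toy_length(it));
		it->pos += it->processed;
		it->len -= it->processed;
		if (!it->len)
			return 0;
	}
	it->processed = 0;
	for (i = 0; i < it->nr; i++) {
		const struct toy_extent *e = &it->map[i];

		if (it->pos >= e->offset && it->pos < e->offset + e->length) {
			it->cur = e;
			return 1;
		}
	}
	return -EIO;	/* no extent covers pos */
}

static long long toy_seek_hole(const struct toy_extent *map, size_t nr,
		long long offset, long long size)
{
	struct toy_iter it = { .map = map, .nr = nr, .pos = offset };
	int ret;

	if (offset < 0 || offset >= size)
		return -ENXIO;

	it.len = size - offset;
	while ((ret = toy_iter_next(&it)) > 0) {
		if (it.cur->type == TOY_HOLE) {
			offset = it.pos;
			it.processed = 0;	/* found: stop iterating */
		} else {
			it.processed = toy_length(&it);
		}
	}
	if (ret < 0)
		return ret;
	if (it.len)		/* stopped early, so a hole was found */
		return offset;
	return size;		/* implicit hole at end of file */
}
```

With a table of mapped [0, 4096), hole [4096, 8192), mapped [8192, 12288)
and a file size of 12288, seeking from 0 stops at the hole, while seeking
from 8192 walks to the end and reports the file size.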

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/seek.c | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index ce6fb810854fec..7d6ed9af925e96 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2017 Red Hat, Inc.
- * Copyright (c) 2018 Christoph Hellwig.
+ * Copyright (c) 2018-2021 Christoph Hellwig.
  */
 #include <linux/module.h>
 #include <linux/compiler.h>
@@ -10,21 +10,19 @@
 #include <linux/pagemap.h>
 #include <linux/pagevec.h>
 
-static loff_t
-iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
-		      void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t *pos)
 {
-	loff_t offset = start;
+	loff_t length = iomap_length(iter);
 
-	switch (iomap->type) {
+	switch (iter->iomap.type) {
 	case IOMAP_UNWRITTEN:
-		offset = mapping_seek_hole_data(inode->i_mapping, start,
-				start + length, SEEK_HOLE);
-		if (offset == start + length)
+		*pos = mapping_seek_hole_data(iter->inode->i_mapping,
+				iter->pos, iter->pos + length, SEEK_HOLE);
+		if (*pos == iter->pos + length)
 			return length;
-		fallthrough;
+		return 0;
 	case IOMAP_HOLE:
-		*(loff_t *)data = offset;
+		*pos = iter->pos;
 		return 0;
 	default:
 		return length;
@@ -35,23 +33,25 @@ loff_t
 iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 {
 	loff_t size = i_size_read(inode);
-	loff_t ret;
+	struct iomap_iter iter = {
+		.inode	= inode,
+		.pos	= offset,
+		.flags	= IOMAP_REPORT,
+	};
+	int ret;
 
 	/* Nothing to be found before or beyond the end of the file. */
 	if (offset < 0 || offset >= size)
 		return -ENXIO;
 
-	while (offset < size) {
-		ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
-				  ops, &offset, iomap_seek_hole_actor);
-		if (ret < 0)
-			return ret;
-		if (ret == 0)
-			break;
-		offset += ret;
-	}
-
-	return offset;
+	iter.len = size - offset;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_seek_hole_iter(&iter, &offset);
+	if (ret < 0)
+		return ret;
+	if (iter.len)
+		return offset;
+	return size;
 }
 EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
-- 
2.30.2



* [PATCH 18/27] iomap: switch iomap_seek_data to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (16 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 17/27] iomap: switch iomap_seek_hole " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 19/27] iomap: switch iomap_swapfile_activate " Christoph Hellwig
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Rewrite iomap_seek_data to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/seek.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index 7d6ed9af925e96..0a758e3851fcb7 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -55,23 +55,21 @@ iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 }
 EXPORT_SYMBOL_GPL(iomap_seek_hole);
 
-static loff_t
-iomap_seek_data_actor(struct inode *inode, loff_t start, loff_t length,
-		      void *data, struct iomap *iomap, struct iomap *srcmap)
+static loff_t iomap_seek_data_iter(const struct iomap_iter *iter, loff_t *pos)
 {
-	loff_t offset = start;
+	loff_t length = iomap_length(iter);
 
-	switch (iomap->type) {
+	switch (iter->iomap.type) {
 	case IOMAP_HOLE:
 		return length;
 	case IOMAP_UNWRITTEN:
-		offset = mapping_seek_hole_data(inode->i_mapping, start,
-				start + length, SEEK_DATA);
-		if (offset < 0)
+		*pos = mapping_seek_hole_data(iter->inode->i_mapping,
+				iter->pos, iter->pos + length, SEEK_DATA);
+		if (*pos < 0)
 			return length;
-		fallthrough;
+		return 0;
 	default:
-		*(loff_t *)data = offset;
+		*pos = iter->pos;
 		return 0;
 	}
 }
@@ -80,22 +78,24 @@ loff_t
 iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
 {
 	loff_t size = i_size_read(inode);
-	loff_t ret;
+	struct iomap_iter iter = {
+		.inode	= inode,
+		.pos	= offset,
+		.flags	= IOMAP_REPORT,
+	};
+	int ret;
 
 	/* Nothing to be found before or beyond the end of the file. */
 	if (offset < 0 || offset >= size)
 		return -ENXIO;
 
-	while (offset < size) {
-		ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
-				  ops, &offset, iomap_seek_data_actor);
-		if (ret < 0)
-			return ret;
-		if (ret == 0)
-			return offset;
-		offset += ret;
-	}
-
+	iter.len = size - offset;
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_seek_data_iter(&iter, &offset);
+	if (ret < 0)
+		return ret;
+	if (iter.len)
+		return offset;
 	/* We've reached the end of the file without finding data */
 	return -ENXIO;
 }
-- 
2.30.2



* [PATCH 19/27] iomap: switch iomap_swapfile_activate to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (17 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 18/27] iomap: switch iomap_seek_data " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 20/27] fsdax: switch dax_iomap_rw " Christoph Hellwig
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch iomap_swapfile_activate to use iomap_iter.
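
The per-mapping body keeps the same accumulate-then-flush shape as before.
A minimal userspace sketch of that shape follows, with hypothetical toy_*
names throughout.  It assumes that two mappings are merged when the second
one starts at the physical address where the first one ends; alignment
handling and the mapping type checks are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
struct toy_extent {
	unsigned long long addr;	/* physical start, in bytes */
	unsigned long long length;	/* length in bytes; 0 means empty */
};

struct toy_swap_info {
	struct toy_extent pending;	/* extent being accumulated */
	unsigned int nr_extents;	/* extents flushed so far */
	unsigned long long bytes;	/* total bytes flushed so far */
};

/* Plays the role of iomap_swapfile_add_extent(): record the pending extent. */
static void toy_flush(struct toy_swap_info *si)
{
	si->nr_extents++;
	si->bytes += si->pending.length;
}

/* Called once per mapping reported by the iterator. */
static void toy_accumulate(struct toy_swap_info *si, const struct toy_extent *e)
{
	assert(e->length > 0);

	if (si->pending.length == 0) {
		si->pending = *e;			/* first mapping */
	} else if (si->pending.addr + si->pending.length == e->addr) {
		si->pending.length += e->length;	/* physically adjacent */
	} else {
		toy_flush(si);				/* gap: emit and restart */
		si->pending = *e;
	}
}

/* Walk all mappings, then flush whatever is still pending. */
static unsigned int toy_activate(struct toy_swap_info *si,
		const struct toy_extent *maps, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		toy_accumulate(si, &maps[i]);
	if (si->pending.length)
		toy_flush(si);
	return si->nr_extents;
}
```

The trailing flush mirrors the "if (isi.iomap.length)" block that follows
the loop in iomap_swapfile_activate.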

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/swapfile.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 6250ca6a1f851d..7069606eca85b2 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -88,13 +88,9 @@ static int iomap_swapfile_fail(struct iomap_swapfile_info *isi, const char *str)
  * swap only cares about contiguous page-aligned physical extents and makes no
  * distinction between written and unwritten extents.
  */
-static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
-		loff_t count, void *data, struct iomap *iomap,
-		struct iomap *srcmap)
+static loff_t iomap_swapfile_iter(const struct iomap_iter *iter,
+		struct iomap *iomap, struct iomap_swapfile_info *isi)
 {
-	struct iomap_swapfile_info *isi = data;
-	int error;
-
 	switch (iomap->type) {
 	case IOMAP_MAPPED:
 	case IOMAP_UNWRITTEN:
@@ -125,12 +121,12 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
 		isi->iomap.length += iomap->length;
 	} else {
 		/* Otherwise, add the retained iomap and store this one. */
-		error = iomap_swapfile_add_extent(isi);
+		int error = iomap_swapfile_add_extent(isi);
 		if (error)
 			return error;
 		memcpy(&isi->iomap, iomap, sizeof(isi->iomap));
 	}
-	return count;
+	return iomap_length(iter);
 }
 
 /*
@@ -141,16 +137,19 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 		struct file *swap_file, sector_t *pagespan,
 		const struct iomap_ops *ops)
 {
+	struct inode *inode = swap_file->f_mapping->host;
+	struct iomap_iter iter = {
+		.inode	= inode,
+		.pos	= 0,
+		.len	= ALIGN_DOWN(i_size_read(inode), PAGE_SIZE),
+		.flags	= IOMAP_REPORT,
+	};
 	struct iomap_swapfile_info isi = {
 		.sis = sis,
 		.lowest_ppage = (sector_t)-1ULL,
 		.file = swap_file,
 	};
-	struct address_space *mapping = swap_file->f_mapping;
-	struct inode *inode = mapping->host;
-	loff_t pos = 0;
-	loff_t len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE);
-	loff_t ret;
+	int ret;
 
 	/*
 	 * Persist all file mapping metadata so that we won't have any
@@ -160,15 +159,10 @@ int iomap_swapfile_activate(struct swap_info_struct *sis,
 	if (ret)
 		return ret;
 
-	while (len > 0) {
-		ret = iomap_apply(inode, pos, len, IOMAP_REPORT,
-				ops, &isi, iomap_swapfile_activate_actor);
-		if (ret <= 0)
-			return ret;
-
-		pos += ret;
-		len -= ret;
-	}
+	while ((ret = iomap_iter(&iter, ops)) > 0)
+		iter.processed = iomap_swapfile_iter(&iter, &iter.iomap, &isi);
+	if (ret < 0)
+		return ret;
 
 	if (isi.iomap.length) {
 		ret = iomap_swapfile_add_extent(&isi);
-- 
2.30.2



* [PATCH 20/27] fsdax: switch dax_iomap_rw to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (18 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 19/27] iomap: switch iomap_swapfile_activate " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 22:10   ` Dave Chinner
  2021-07-19 10:35 ` [PATCH 21/27] iomap: remove iomap_apply Christoph Hellwig
                   ` (8 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Switch the dax_iomap_rw implementation to use iomap_iter.
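
One visible change is that the byte count is no longer summed up in the
loop but derived from how far the iterator position moved.  The userspace
sketch below illustrates that accounting under simplifying assumptions:
the toy_* names are hypothetical, each step transfers at most "chunk"
bytes, and "fail_at" injects an error once the position reaches it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the iterator state; not a kernel structure. */
struct toy_iter {
	long long pos;		/* current file position */
	long long len;		/* bytes left to transfer */
	long long processed;	/* result of the last step */
	int started;		/* nonzero once the first step has run */
};

/* Returns 1 to run another step, 0 when done, negative on error. */
static int toy_iter_next(struct toy_iter *it)
{
	if (it->started) {
		if (it->processed <= 0)
			return (int)it->processed;
		it->pos += it->processed;
		it->len -= it->processed;
		if (!it->len)
			return 0;
	}
	it->started = 1;
	it->processed = 0;
	return 1;
}

/* One transfer step: up to chunk bytes, or -EIO at or past fail_at. */
static long long toy_step(const struct toy_iter *it, long long chunk,
		long long fail_at)
{
	if (fail_at >= 0 && it->pos >= fail_at)
		return -EIO;
	return it->len < chunk ? it->len : chunk;
}

static long long toy_rw(long long *ki_pos, long long count, long long chunk,
		long long fail_at)
{
	struct toy_iter it = { .pos = *ki_pos, .len = count };
	long long done;
	int ret;

	assert(count > 0 && chunk > 0);

	while ((ret = toy_iter_next(&it)) > 0)
		it.processed = toy_step(&it, chunk, fail_at);

	/* Bytes transferred is simply how far the position advanced. */
	done = it.pos - *ki_pos;
	*ki_pos = it.pos;
	return done ? done : ret;
}
```

A failure after some progress yields a short count, and only a failure on
the very first step surfaces the error code, matching the
"return done ? done : ret" convention kept by the patch.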

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 49 ++++++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4d63040fd71f56..51da45301350a6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1103,20 +1103,21 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 	return size;
 }
 
-static loff_t
-dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
-		struct iomap *iomap, struct iomap *srcmap)
+static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
+		struct iov_iter *iter)
 {
+	const struct iomap *iomap = &iomi->iomap;
+	loff_t length = iomap_length(iomi);
+	loff_t pos = iomi->pos;
 	struct block_device *bdev = iomap->bdev;
 	struct dax_device *dax_dev = iomap->dax_dev;
-	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
 	ssize_t ret = 0;
 	size_t xfer;
 	int id;
 
 	if (iov_iter_rw(iter) == READ) {
-		end = min(end, i_size_read(inode));
+		end = min(end, i_size_read(iomi->inode));
 		if (pos >= end)
 			return 0;
 
@@ -1133,7 +1134,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	 * written by write(2) is visible in mmap.
 	 */
 	if (iomap->flags & IOMAP_F_NEW) {
-		invalidate_inode_pages2_range(inode->i_mapping,
+		invalidate_inode_pages2_range(iomi->inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);
 	}
@@ -1209,31 +1210,29 @@ ssize_t
 dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops)
 {
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
-	struct inode *inode = mapping->host;
-	loff_t pos = iocb->ki_pos, ret = 0, done = 0;
-	unsigned flags = 0;
+	struct iomap_iter iomi = {
+		.inode		= iocb->ki_filp->f_mapping->host,
+		.pos		= iocb->ki_pos,
+		.len		= iov_iter_count(iter),
+	};
+	loff_t done = 0;
+	int ret;
 
 	if (iov_iter_rw(iter) == WRITE) {
-		lockdep_assert_held_write(&inode->i_rwsem);
-		flags |= IOMAP_WRITE;
+		lockdep_assert_held_write(&iomi.inode->i_rwsem);
+		iomi.flags |= IOMAP_WRITE;
 	} else {
-		lockdep_assert_held(&inode->i_rwsem);
+		lockdep_assert_held(&iomi.inode->i_rwsem);
 	}
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
-		flags |= IOMAP_NOWAIT;
+		iomi.flags |= IOMAP_NOWAIT;
 
-	while (iov_iter_count(iter)) {
-		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
-				iter, dax_iomap_actor);
-		if (ret <= 0)
-			break;
-		pos += ret;
-		done += ret;
-	}
+	while ((ret = iomap_iter(&iomi, ops)) > 0)
+		iomi.processed = dax_iomap_iter(&iomi, iter);
 
-	iocb->ki_pos += done;
+	done = iomi.pos - iocb->ki_pos;
+	iocb->ki_pos = iomi.pos;
 	return done ? done : ret;
 }
 EXPORT_SYMBOL_GPL(dax_iomap_rw);
@@ -1307,7 +1306,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	}
 
 	/*
-	 * Note that we don't bother to use iomap_apply here: DAX required
+	 * Note that we don't bother to use iomap_iter here: DAX required
 	 * the file system block size to be equal the page size, which means
 	 * that we never have to deal with more than a single extent here.
 	 */
@@ -1561,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	}
 
 	/*
-	 * Note that we don't use iomap_apply here.  We aren't doing I/O, only
+	 * Note that we don't use iomap_iter here.  We aren't doing I/O, only
 	 * setting up a mapping, so really we're using iomap_begin() as a way
 	 * to look up our filesystem block.
 	 */
-- 
2.30.2



* [PATCH 21/27] iomap: remove iomap_apply
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (19 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 20/27] fsdax: switch dax_iomap_rw " Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:48   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers Christoph Hellwig
                   ` (7 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

iomap_apply is unused now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/Makefile     |  1 -
 fs/iomap/apply.c      | 99 -------------------------------------------
 fs/iomap/trace.h      | 40 -----------------
 include/linux/iomap.h | 10 -----
 4 files changed, 150 deletions(-)
 delete mode 100644 fs/iomap/apply.c

diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 85034deb5a2f19..ebd9866d80ae90 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,7 +9,6 @@ ccflags-y += -I $(srctree)/$(src)		# needed for trace events
 obj-$(CONFIG_FS_IOMAP)		+= iomap.o
 
 iomap-y				+= trace.o \
-				   apply.o \
 				   iter.o \
 				   buffered-io.o \
 				   direct-io.o \
diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
deleted file mode 100644
index 26ab6563181fc6..00000000000000
--- a/fs/iomap/apply.c
+++ /dev/null
@@ -1,99 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (c) 2016-2018 Christoph Hellwig.
- */
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
-#include <linux/iomap.h>
-#include "trace.h"
-
-/*
- * Execute a iomap write on a segment of the mapping that spans a
- * contiguous range of pages that have identical block mapping state.
- *
- * This avoids the need to map pages individually, do individual allocations
- * for each page and most importantly avoid the need for filesystem specific
- * locking per page. Instead, all the operations are amortised over the entire
- * range of pages. It is assumed that the filesystems will lock whatever
- * resources they require in the iomap_begin call, and release them in the
- * iomap_end call.
- */
-loff_t
-iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
-		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
-{
-	struct iomap iomap = { .type = IOMAP_HOLE };
-	struct iomap srcmap = { .type = IOMAP_HOLE };
-	loff_t written = 0, ret;
-	u64 end;
-
-	trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_);
-
-	/*
-	 * Need to map a range from start position for length bytes. This can
-	 * span multiple pages - it is only guaranteed to return a range of a
-	 * single type of pages (e.g. all into a hole, all mapped or all
-	 * unwritten). Failure at this point has nothing to undo.
-	 *
-	 * If allocation is required for this range, reserve the space now so
-	 * that the allocation is guaranteed to succeed later on. Once we copy
-	 * the data into the page cache pages, then we cannot fail otherwise we
-	 * expose transient stale data. If the reserve fails, we can safely
-	 * back out at this point as there is nothing to undo.
-	 */
-	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
-	if (ret)
-		return ret;
-	if (WARN_ON(iomap.offset > pos)) {
-		written = -EIO;
-		goto out;
-	}
-	if (WARN_ON(iomap.length == 0)) {
-		written = -EIO;
-		goto out;
-	}
-
-	trace_iomap_apply_dstmap(inode, &iomap);
-	if (srcmap.type != IOMAP_HOLE)
-		trace_iomap_apply_srcmap(inode, &srcmap);
-
-	/*
-	 * Cut down the length to the one actually provided by the filesystem,
-	 * as it might not be able to give us the whole size that we requested.
-	 */
-	end = iomap.offset + iomap.length;
-	if (srcmap.type != IOMAP_HOLE)
-		end = min(end, srcmap.offset + srcmap.length);
-	if (pos + length > end)
-		length = end - pos;
-
-	/*
-	 * Now that we have guaranteed that the space allocation will succeed,
-	 * we can do the copy-in page by page without having to worry about
-	 * failures exposing transient data.
-	 *
-	 * To support COW operations, we read in data for partially blocks from
-	 * the srcmap if the file system filled it in.  In that case we the
-	 * length needs to be limited to the earlier of the ends of the iomaps.
-	 * If the file system did not provide a srcmap we pass in the normal
-	 * iomap into the actors so that they don't need to have special
-	 * handling for the two cases.
-	 */
-	written = actor(inode, pos, length, data, &iomap,
-			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
-
-out:
-	/*
-	 * Now the data has been copied, commit the range we've copied.  This
-	 * should not fail unless the filesystem has had a fatal error.
-	 */
-	if (ops->iomap_end) {
-		ret = ops->iomap_end(inode, pos, length,
-				     written > 0 ? written : 0,
-				     flags, &iomap);
-	}
-
-	return written ? written : ret;
-}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 1012d7af6b689b..f1519f9a140320 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -138,49 +138,9 @@ DECLARE_EVENT_CLASS(iomap_class,
 DEFINE_EVENT(iomap_class, name,	\
 	TP_PROTO(struct inode *inode, struct iomap *iomap), \
 	TP_ARGS(inode, iomap))
-DEFINE_IOMAP_EVENT(iomap_apply_dstmap);
-DEFINE_IOMAP_EVENT(iomap_apply_srcmap);
 DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
 DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
 
-TRACE_EVENT(iomap_apply,
-	TP_PROTO(struct inode *inode, loff_t pos, loff_t length,
-		unsigned int flags, const void *ops, void *actor,
-		unsigned long caller),
-	TP_ARGS(inode, pos, length, flags, ops, actor, caller),
-	TP_STRUCT__entry(
-		__field(dev_t, dev)
-		__field(u64, ino)
-		__field(loff_t, pos)
-		__field(loff_t, length)
-		__field(unsigned int, flags)
-		__field(const void *, ops)
-		__field(void *, actor)
-		__field(unsigned long, caller)
-	),
-	TP_fast_assign(
-		__entry->dev = inode->i_sb->s_dev;
-		__entry->ino = inode->i_ino;
-		__entry->pos = pos;
-		__entry->length = length;
-		__entry->flags = flags;
-		__entry->ops = ops;
-		__entry->actor = actor;
-		__entry->caller = caller;
-	),
-	TP_printk("dev %d:%d ino 0x%llx pos %lld length %lld flags %s (0x%x) "
-		  "ops %ps caller %pS actor %ps",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		   __entry->ino,
-		   __entry->pos,
-		   __entry->length,
-		   __print_flags(__entry->flags, "|", IOMAP_FLAGS_STRINGS),
-		   __entry->flags,
-		   __entry->ops,
-		   (void *)__entry->caller,
-		   __entry->actor)
-);
-
 TRACE_EVENT(iomap_iter,
 	TP_PROTO(struct iomap_iter *iter, const void *ops,
 		 unsigned long caller),
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index da01226886eca4..2f13e34c2c0b0b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -199,16 +199,6 @@ static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
 	return &i->iomap;
 }
 
-/*
- * Main iomap iterator function.
- */
-typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
-		void *data, struct iomap *iomap, struct iomap *srcmap);
-
-loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
-		unsigned flags, const struct iomap_ops *ops, void *data,
-		iomap_actor_t actor);
-
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-- 
2.30.2



* [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (20 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 21/27] iomap: remove iomap_apply Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:48   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 23/27] iomap: rework unshare flag Christoph Hellwig
                   ` (6 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Pass the iomap_iter structure instead of individual parameters to
various internal helpers for buffered I/O.
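
The effect on a helper can be shown with a small userspace sketch.  All
toy_* names are hypothetical; the source-map selection assumes the same
rule the removed iomap_apply used, namely that the srcmap is consulted
only when the file system filled it in (its type is not a hole).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only. */
enum toy_type { TOY_HOLE = 0, TOY_MAPPED, TOY_UNWRITTEN };
#define TOY_F_NEW	0x1u

struct toy_map {
	enum toy_type type;
	unsigned int flags;
};

struct toy_iter {
	long long i_size;	/* stands in for i_size_read(iter->inode) */
	struct toy_map iomap;	/* destination mapping */
	struct toy_map srcmap;	/* source mapping, TOY_HOLE if unused */
};

/* Pick the mapping to read from: srcmap if provided, else iomap. */
static const struct toy_map *toy_iter_srcmap(const struct toy_iter *it)
{
	if (it->srcmap.type != TOY_HOLE)
		return &it->srcmap;
	return &it->iomap;
}

/* The helper needs only the iterator and a position. */
static bool toy_block_needs_zeroing(const struct toy_iter *it, long long pos)
{
	const struct toy_map *src = toy_iter_srcmap(it);

	assert(pos >= 0);
	return src->type != TOY_MAPPED ||
		(src->flags & TOY_F_NEW) ||
		pos >= it->i_size;
}
```

Callers no longer have to decide which mapping to hand in, so they cannot
pass the wrong one by accident.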

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 117 ++++++++++++++++++++---------------------
 1 file changed, 56 insertions(+), 61 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index c273b5d88dd8a8..daabbe8d7edfb5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -226,12 +226,14 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 	SetPageUptodate(page);
 }
 
-static inline bool iomap_block_needs_zeroing(struct inode *inode,
-		struct iomap *iomap, loff_t pos)
+static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
+		loff_t pos)
 {
-	return iomap->type != IOMAP_MAPPED ||
-		(iomap->flags & IOMAP_F_NEW) ||
-		pos >= i_size_read(inode);
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+
+	return srcmap->type != IOMAP_MAPPED ||
+		(srcmap->flags & IOMAP_F_NEW) ||
+		pos >= i_size_read(iter->inode);
 }
 
 static loff_t iomap_readpage_iter(struct iomap_iter *iter,
@@ -259,7 +261,7 @@ static loff_t iomap_readpage_iter(struct iomap_iter *iter,
 	if (plen == 0)
 		goto done;
 
-	if (iomap_block_needs_zeroing(iter->inode, iomap, pos)) {
+	if (iomap_block_needs_zeroing(iter, pos)) {
 		zero_user(page, poff, plen);
 		iomap_set_range_uptodate(page, poff, plen);
 		goto done;
@@ -541,12 +543,12 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 	return submit_bio_wait(&bio);
 }
 
-static int
-__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
-		struct page *page, struct iomap *srcmap)
+static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
+		unsigned len, int flags, struct page *page)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
-	loff_t block_size = i_blocksize(inode);
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	struct iomap_page *iop = iomap_page_create(iter->inode, page);
+	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
 	loff_t block_end = round_up(pos + len, block_size);
 	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
@@ -556,7 +558,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	ClearPageError(page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(iter->inode, iop, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
@@ -566,7 +568,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		    (to <= poff || to >= poff + plen))
 			continue;
 
-		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
+		if (iomap_block_needs_zeroing(iter, block_start)) {
 			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
 				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
@@ -582,41 +584,40 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	return 0;
 }
 
-static int
-iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
-		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
+static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
+		unsigned flags, struct page **pagep)
 {
-	const struct iomap_page_ops *page_ops = iomap->page_ops;
+	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
 	struct page *page;
 	int status = 0;
 
-	BUG_ON(pos + len > iomap->offset + iomap->length);
-	if (srcmap != iomap)
+	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
+	if (srcmap != &iter->iomap)
 		BUG_ON(pos + len > srcmap->offset + srcmap->length);
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
 	if (page_ops && page_ops->page_prepare) {
-		status = page_ops->page_prepare(inode, pos, len);
+		status = page_ops->page_prepare(iter->inode, pos, len);
 		if (status)
 			return status;
 	}
 
-	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
-			AOP_FLAG_NOFS);
+	page = grab_cache_page_write_begin(iter->inode->i_mapping,
+				pos >> PAGE_SHIFT, AOP_FLAG_NOFS);
 	if (!page) {
 		status = -ENOMEM;
 		goto out_no_page;
 	}
 
 	if (srcmap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, srcmap);
-	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+		iomap_read_inline_data(iter->inode, page, srcmap);
+	else if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
-		status = __iomap_write_begin(inode, pos, len, flags, page,
-				srcmap);
+		status = __iomap_write_begin(iter, pos, len, flags, page);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -627,11 +628,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 out_unlock:
 	unlock_page(page);
 	put_page(page);
-	iomap_write_failed(inode, pos, len);
+	iomap_write_failed(iter->inode, pos, len);
 
 out_no_page:
 	if (page_ops && page_ops->page_done)
-		page_ops->page_done(inode, pos, 0, NULL);
+		page_ops->page_done(iter->inode, pos, 0, NULL);
 	return status;
 }
 
@@ -658,9 +659,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	return copied;
 }
 
-static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
-		struct iomap *iomap, loff_t pos, size_t copied)
+static size_t iomap_write_end_inline(struct iomap_iter *iter, struct page *page,
+		loff_t pos, size_t copied)
 {
+	struct iomap *iomap = &iter->iomap;
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
@@ -671,26 +673,26 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
 	memcpy(iomap->inline_data + pos, addr + pos, copied);
 	kunmap_atomic(addr);
 
-	mark_inode_dirty(inode);
+	mark_inode_dirty(iter->inode);
 	return copied;
 }
 
 /* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
-static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
-		size_t copied, struct page *page, struct iomap *iomap,
-		struct iomap *srcmap)
+static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
+		size_t copied, struct page *page)
 {
-	const struct iomap_page_ops *page_ops = iomap->page_ops;
-	loff_t old_size = inode->i_size;
+	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	loff_t old_size = iter->inode->i_size;
 	size_t ret;
 
 	if (srcmap->type == IOMAP_INLINE) {
-		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
+		ret = iomap_write_end_inline(iter, page, pos, copied);
 	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
-				page, NULL);
+		ret = block_write_end(NULL, iter->inode->i_mapping, pos, len,
+				copied, page, NULL);
 	} else {
-		ret = __iomap_write_end(inode, pos, len, copied, page);
+		ret = __iomap_write_end(iter->inode, pos, len, copied, page);
 	}
 
 	/*
@@ -699,26 +701,24 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 * preferably after I/O completion so that no stale data is exposed.
 	 */
 	if (pos + ret > old_size) {
-		i_size_write(inode, pos + ret);
-		iomap->flags |= IOMAP_F_SIZE_CHANGED;
+		i_size_write(iter->inode, pos + ret);
+		iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
 	}
 	unlock_page(page);
 
 	if (old_size < pos)
-		pagecache_isize_extended(inode, old_size, pos);
+		pagecache_isize_extended(iter->inode, old_size, pos);
 	if (page_ops && page_ops->page_done)
-		page_ops->page_done(inode, pos, ret, page);
+		page_ops->page_done(iter->inode, pos, ret, page);
 	put_page(page);
 
 	if (ret < len)
-		iomap_write_failed(inode, pos, len);
+		iomap_write_failed(iter->inode, pos, len);
 	return ret;
 }
 
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
-	struct iomap *iomap = &iter->iomap;
 	loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
@@ -748,8 +748,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			break;
 		}
 
-		status = iomap_write_begin(iter->inode, pos, bytes, 0, &page,
-					   iomap, srcmap);
+		status = iomap_write_begin(iter, pos, bytes, 0, &page);
 		if (unlikely(status))
 			break;
 
@@ -758,8 +757,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
 
-		status = iomap_write_end(iter->inode, pos, bytes, copied, page,
-					 iomap, srcmap);
+		status = iomap_write_end(iter, pos, bytes, copied, page);
 
 		if (unlikely(copied != status))
 			iov_iter_revert(i, copied - status);
@@ -827,13 +825,12 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
 
-		status = iomap_write_begin(iter->inode, pos, bytes,
-				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
+		status = iomap_write_begin(iter, pos, bytes,
+				IOMAP_WRITE_F_UNSHARE, &page);
 		if (unlikely(status))
 			return status;
 
-		status = iomap_write_end(iter->inode, pos, bytes, bytes, page, iomap,
-				srcmap);
+		status = iomap_write_end(iter, pos, bytes, bytes, page);
 		if (WARN_ON_ONCE(status == 0))
 			return -EIO;
 
@@ -867,22 +864,21 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 }
 EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
-static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
-		struct iomap *iomap, struct iomap *srcmap)
+static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
 	struct page *page;
 	int status;
 	unsigned offset = offset_in_page(pos);
 	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
 
-	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
+	status = iomap_write_begin(iter, pos, bytes, 0, &page);
 	if (status)
 		return status;
 
 	zero_user(page, offset, bytes);
 	mark_page_accessed(page);
 
-	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
+	return iomap_write_end(iter, pos, bytes, bytes, page);
 }
 
 static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
@@ -903,8 +899,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		if (IS_DAX(iter->inode))
 			bytes = dax_iomap_zero(pos, length, iomap);
 		else
-			bytes = iomap_zero(iter->inode, pos, length, iomap,
-					   srcmap);
+			bytes = __iomap_zero_iter(iter, pos, length);
 		if (bytes < 0)
 			return bytes;
 
-- 
2.30.2



* [PATCH 23/27] iomap: rework unshare flag
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (21 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:44   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 24/27] fsdax: factor out helpers to simplify the dax fault code Christoph Hellwig
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Instead of keeping another internal flags namespace inside buffered-io.c,
just pass an UNSHARE hint in the main iomap flags field.
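
As a rough userspace illustration of the idea (not the kernel code itself),
the sketch below models an operation-wide flags word carried by the
iterator, so a helper tests one bit instead of taking a separate flags
argument. struct demo_iter, DEMO_WRITE, DEMO_UNSHARE and
demo_must_read_block() are hypothetical names made up for this example;
only the bit position of the unshare hint and the shape of the range check
are taken from the diff below.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the operation-wide iomap flags. */
#define DEMO_WRITE	(1u << 0)
#define DEMO_UNSHARE	(1u << 7)	/* same bit position as IOMAP_UNSHARE */

/* Hypothetical, heavily reduced model of an iterator carrying its flags. */
struct demo_iter {
	unsigned int flags;
};

/*
 * Decide whether a block has to be read in before a write to a page.
 * [from, to) is the byte range being written, [poff, poff + plen) is the
 * block.  A normal write can skip a block when neither end of the write
 * range falls strictly inside it; an unshare operation never skips.
 */
static bool demo_must_read_block(const struct demo_iter *iter,
		unsigned int from, unsigned int to,
		unsigned int poff, unsigned int plen)
{
	if (!(iter->flags & DEMO_UNSHARE) &&
	    (from <= poff || from >= poff + plen) &&
	    (to <= poff || to >= poff + plen))
		return false;
	return true;
}
```

With this layout the caller sets the hint once when it builds the iterator,
and every helper further down sees it without any prototype change.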

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 23 +++++++++--------------
 include/linux/iomap.h  |  1 +
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index daabbe8d7edfb5..eb5d742b5bf8b7 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -511,10 +511,6 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
 EXPORT_SYMBOL_GPL(iomap_migrate_page);
 #endif /* CONFIG_MIGRATION */
 
-enum {
-	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
-};
-
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -544,7 +540,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 }
 
 static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
-		unsigned len, int flags, struct page *page)
+		unsigned len, struct page *page)
 {
 	struct iomap *srcmap = iomap_iter_srcmap(iter);
 	struct iomap_page *iop = iomap_page_create(iter->inode, page);
@@ -563,13 +559,13 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 		if (plen == 0)
 			break;
 
-		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
+		if (!(iter->flags & IOMAP_UNSHARE) &&
 		    (from <= poff || from >= poff + plen) &&
 		    (to <= poff || to >= poff + plen))
 			continue;
 
 		if (iomap_block_needs_zeroing(iter, block_start)) {
-			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
+			if (WARN_ON_ONCE(iter->flags & IOMAP_UNSHARE))
 				return -EIO;
 			zero_user_segments(page, poff, from, to, poff + plen);
 		} else {
@@ -585,7 +581,7 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 }
 
 static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
-		unsigned flags, struct page **pagep)
+		struct page **pagep)
 {
 	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
 	struct iomap *srcmap = iomap_iter_srcmap(iter);
@@ -617,7 +613,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
 	else if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
 	else
-		status = __iomap_write_begin(iter, pos, len, flags, page);
+		status = __iomap_write_begin(iter, pos, len, page);
 
 	if (unlikely(status))
 		goto out_unlock;
@@ -748,7 +744,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			break;
 		}
 
-		status = iomap_write_begin(iter, pos, bytes, 0, &page);
+		status = iomap_write_begin(iter, pos, bytes, &page);
 		if (unlikely(status))
 			break;
 
@@ -825,8 +821,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 		struct page *page;
 
-		status = iomap_write_begin(iter, pos, bytes,
-				IOMAP_WRITE_F_UNSHARE, &page);
+		status = iomap_write_begin(iter, pos, bytes, &page);
 		if (unlikely(status))
 			return status;
 
@@ -854,7 +849,7 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		.inode		= inode,
 		.pos		= pos,
 		.len		= len,
-		.flags		= IOMAP_WRITE,
+		.flags		= IOMAP_WRITE | IOMAP_UNSHARE,
 	};
 	int ret;
 
@@ -871,7 +866,7 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 	unsigned offset = offset_in_page(pos);
 	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
 
-	status = iomap_write_begin(iter, pos, bytes, 0, &page);
+	status = iomap_write_begin(iter, pos, bytes, &page);
 	if (status)
 		return status;
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2f13e34c2c0b0b..719798814bdfdb 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -122,6 +122,7 @@ struct iomap_page_ops {
 #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
 #define IOMAP_NOWAIT		(1 << 5) /* do not block */
 #define IOMAP_OVERWRITE_ONLY	(1 << 6) /* only pure overwrites allowed */
+#define IOMAP_UNSHARE		(1 << 7) /* unshare_file_range */
 
 struct iomap_ops {
 	/*
-- 
2.30.2



* [PATCH 24/27] fsdax: factor out helpers to simplify the dax fault code
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (22 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 23/27] iomap: rework unshare flag Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 25/27] fsdax: factor out a dax_fault_actor() helper Christoph Hellwig
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel,
	Ritesh Harjani

From: Shiyang Ruan <ruansy.fnst@fujitsu.com>

The dax page fault code is too long and a bit difficult to read, which
makes it hard to follow when adding new features. Some of the PTE and PMD
code paths share similar logic, so factor out helper functions to simplify
the code.
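
To make the PMD fallback conditions easier to reason about in isolation,
here is a small userspace model of the checks that the diff below gathers
into dax_fault_check_fallback(). It is only a sketch: struct demo_fault,
the DEMO_* constants and demo_check_fallback() are hypothetical names, and
4 KiB pages with 2 MiB PMD mappings are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12			/* assumes 4 KiB pages */
#define DEMO_PMD_SIZE	(1ULL << 21)		/* assumes 2 MiB PMD mappings */
#define DEMO_PMD_MASK	(~(DEMO_PMD_SIZE - 1))
#define DEMO_PMD_COLOUR	((DEMO_PMD_SIZE >> DEMO_PAGE_SHIFT) - 1)

/* Hypothetical, reduced description of a fault and its mapping. */
struct demo_fault {
	uint64_t address;	/* faulting virtual address */
	uint64_t pgoff;		/* file offset of the fault, in pages */
	uint64_t vm_start;	/* start of the mapping */
	uint64_t vm_end;	/* end of the mapping (exclusive) */
	bool write;		/* write fault? */
	bool shared;		/* shared mapping? */
};

/* Return true if the fault must fall back to PTE-sized mappings. */
static bool demo_check_fallback(const struct demo_fault *f, uint64_t max_pgoff)
{
	uint64_t pmd_addr = f->address & DEMO_PMD_MASK;

	/* Offset within the PMD must be the same in memory and in the file. */
	if ((f->pgoff & DEMO_PMD_COLOUR) !=
	    ((f->address >> DEMO_PAGE_SHIFT) & DEMO_PMD_COLOUR))
		return true;

	/* A write to a private mapping is going to COW. */
	if (f->write && !f->shared)
		return true;

	/* The PMD would extend outside the mapping. */
	if (pmd_addr < f->vm_start)
		return true;
	if (pmd_addr + DEMO_PMD_SIZE > f->vm_end)
		return true;

	/* The PMD would extend beyond the end of the file. */
	if ((f->pgoff | DEMO_PMD_COLOUR) >= max_pgoff)
		return true;

	return false;
}
```

Keeping the checks in one predicate lets the fault handler express the
whole decision as a single "if (...) goto fallback;".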

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[hch: minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 153 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 84 insertions(+), 69 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 51da45301350a6..c09d721629d167 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1255,6 +1255,53 @@ static bool dax_fault_is_synchronous(unsigned long flags,
 		&& (iomap->flags & IOMAP_F_DIRTY);
 }
 
+/*
+ * When handling a synchronous page fault and the inode need a fsync, we can
+ * insert the PTE/PMD into page tables only after that fsync happened. Skip
+ * insertion for now and return the pfn so that caller can insert it after the
+ * fsync is done.
+ */
+static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
+{
+	if (WARN_ON_ONCE(!pfnp))
+		return VM_FAULT_SIGBUS;
+	*pfnp = pfn;
+	return VM_FAULT_NEEDDSYNC;
+}
+
+static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
+		loff_t pos)
+{
+	sector_t sector = dax_iomap_sector(iomap, pos);
+	unsigned long vaddr = vmf->address;
+	vm_fault_t ret;
+	int error = 0;
+
+	switch (iomap->type) {
+	case IOMAP_HOLE:
+	case IOMAP_UNWRITTEN:
+		clear_user_highpage(vmf->cow_page, vaddr);
+		break;
+	case IOMAP_MAPPED:
+		error = copy_cow_page_dax(iomap->bdev, iomap->dax_dev, sector,
+					  vmf->cow_page, vaddr);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		error = -EIO;
+		break;
+	}
+
+	if (error)
+		return dax_fault_return(error);
+
+	__SetPageUptodate(vmf->cow_page);
+	ret = finish_fault(vmf);
+	if (!ret)
+		return VM_FAULT_DONE_COW;
+	return ret;
+}
+
 static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			       int *iomap_errp, const struct iomap_ops *ops)
 {
@@ -1323,30 +1370,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	}
 
 	if (vmf->cow_page) {
-		sector_t sector = dax_iomap_sector(&iomap, pos);
-
-		switch (iomap.type) {
-		case IOMAP_HOLE:
-		case IOMAP_UNWRITTEN:
-			clear_user_highpage(vmf->cow_page, vaddr);
-			break;
-		case IOMAP_MAPPED:
-			error = copy_cow_page_dax(iomap.bdev, iomap.dax_dev,
-						  sector, vmf->cow_page, vaddr);
-			break;
-		default:
-			WARN_ON_ONCE(1);
-			error = -EIO;
-			break;
-		}
-
-		if (error)
-			goto error_finish_iomap;
-
-		__SetPageUptodate(vmf->cow_page);
-		ret = finish_fault(vmf);
-		if (!ret)
-			ret = VM_FAULT_DONE_COW;
+		ret = dax_fault_cow_page(vmf, &iomap, pos);
 		goto finish_iomap;
 	}
 
@@ -1366,19 +1390,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
 						 0, write && !sync);
 
-		/*
-		 * If we are doing synchronous page fault and inode needs fsync,
-		 * we can insert PTE into page tables only after that happens.
-		 * Skip insertion for now and return the pfn so that caller can
-		 * insert it after fsync is done.
-		 */
 		if (sync) {
-			if (WARN_ON_ONCE(!pfnp)) {
-				error = -EIO;
-				goto error_finish_iomap;
-			}
-			*pfnp = pfn;
-			ret = VM_FAULT_NEEDDSYNC | major;
+			ret = dax_fault_synchronous_pfnp(pfnp, pfn);
 			goto finish_iomap;
 		}
 		trace_dax_insert_mapping(inode, vmf, entry);
@@ -1477,13 +1490,45 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 	return VM_FAULT_FALLBACK;
 }
 
+static bool dax_fault_check_fallback(struct vm_fault *vmf, struct xa_state *xas,
+		pgoff_t max_pgoff)
+{
+	unsigned long pmd_addr = vmf->address & PMD_MASK;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+
+	/*
+	 * Make sure that the faulting address's PMD offset (color) matches
+	 * the PMD offset from the start of the file.  This is necessary so
+	 * that a PMD range in the page table overlaps exactly with a PMD
+	 * range in the page cache.
+	 */
+	if ((vmf->pgoff & PG_PMD_COLOUR) !=
+	    ((vmf->address >> PAGE_SHIFT) & PG_PMD_COLOUR))
+		return true;
+
+	/* Fall back to PTEs if we're going to COW */
+	if (write && !(vmf->vma->vm_flags & VM_SHARED))
+		return true;
+
+	/* If the PMD would extend outside the VMA */
+	if (pmd_addr < vmf->vma->vm_start)
+		return true;
+	if ((pmd_addr + PMD_SIZE) > vmf->vma->vm_end)
+		return true;
+
+	/* If the PMD would extend beyond the file size */
+	if ((xas->xa_index | PG_PMD_COLOUR) >= max_pgoff)
+		return true;
+
+	return false;
+}
+
 static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			       const struct iomap_ops *ops)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
-	unsigned long pmd_addr = vmf->address & PMD_MASK;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool sync;
 	unsigned int iomap_flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
@@ -1506,33 +1551,12 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 
 	trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
 
-	/*
-	 * Make sure that the faulting address's PMD offset (color) matches
-	 * the PMD offset from the start of the file.  This is necessary so
-	 * that a PMD range in the page table overlaps exactly with a PMD
-	 * range in the page cache.
-	 */
-	if ((vmf->pgoff & PG_PMD_COLOUR) !=
-	    ((vmf->address >> PAGE_SHIFT) & PG_PMD_COLOUR))
-		goto fallback;
-
-	/* Fall back to PTEs if we're going to COW */
-	if (write && !(vma->vm_flags & VM_SHARED))
-		goto fallback;
-
-	/* If the PMD would extend outside the VMA */
-	if (pmd_addr < vma->vm_start)
-		goto fallback;
-	if ((pmd_addr + PMD_SIZE) > vma->vm_end)
-		goto fallback;
-
 	if (xas.xa_index >= max_pgoff) {
 		result = VM_FAULT_SIGBUS;
 		goto out;
 	}
 
-	/* If the PMD would extend beyond the file size */
-	if ((xas.xa_index | PG_PMD_COLOUR) >= max_pgoff)
+	if (dax_fault_check_fallback(vmf, &xas, max_pgoff))
 		goto fallback;
 
 	/*
@@ -1584,17 +1608,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
 						DAX_PMD, write && !sync);
 
-		/*
-		 * If we are doing synchronous page fault and inode needs fsync,
-		 * we can insert PMD into page tables only after that happens.
-		 * Skip insertion for now and return the pfn so that caller can
-		 * insert it after fsync is done.
-		 */
 		if (sync) {
-			if (WARN_ON_ONCE(!pfnp))
-				goto finish_iomap;
-			*pfnp = pfn;
-			result = VM_FAULT_NEEDDSYNC;
+			result = dax_fault_synchronous_pfnp(pfnp, pfn);
 			goto finish_iomap;
 		}
 
-- 
2.30.2



* [PATCH 25/27] fsdax: factor out a dax_fault_actor() helper
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (23 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 24/27] fsdax: factor out helpers to simplify the dax fault code Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 10:35 ` [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter Christoph Hellwig
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel,
	Ritesh Harjani

From: Shiyang Ruan <ruansy.fnst@fujitsu.com>

The core logic in the two dax page fault functions is similar, so move it
into a common helper function. Also, to make it easier to add new features
such as CoW, stop using a switch statement to handle the different iomap
types.
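
The decision order of the common helper can be summarised with a tiny
userspace model. This is a sketch under assumptions, not the kernel code:
the enum names and demo_fault_action() are hypothetical, and the real
helper in the diff below additionally looks up the pfn, inserts the dax
entry and handles synchronous faults.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the iomap mapping types. */
enum demo_type { DEMO_HOLE, DEMO_MAPPED, DEMO_UNWRITTEN, DEMO_OTHER };

/* What the shared fault helper would do next. */
enum demo_action {
	DEMO_LOAD_HOLE,		/* read fault on a hole or unwritten extent */
	DEMO_FAIL_SIGBUS,	/* unexpected type on a PTE fault */
	DEMO_FAIL_FALLBACK,	/* unexpected type on a PMD fault */
	DEMO_INSERT_PFN,	/* mapped extent: look up and insert the pfn */
};

static enum demo_action demo_fault_action(enum demo_type type, bool write,
		bool pmd)
{
	/* Reading UNWRITTEN or HOLE maps a zero page. */
	if (!write && (type == DEMO_UNWRITTEN || type == DEMO_HOLE))
		return DEMO_LOAD_HOLE;

	/* Anything else that is not MAPPED is an error. */
	if (type != DEMO_MAPPED)
		return pmd ? DEMO_FAIL_FALLBACK : DEMO_FAIL_SIGBUS;

	return DEMO_INSERT_PFN;
}
```

Only the failure value differs between the PTE and PMD callers, which is
why a single bool is enough to share the rest of the logic.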

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 297 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 149 insertions(+), 148 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index c09d721629d167..6d0c6d28be83b1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1066,6 +1066,66 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 	return ret;
 }
 
+#ifdef CONFIG_FS_DAX_PMD
+static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
+		struct iomap *iomap, void **entry)
+{
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	unsigned long pmd_addr = vmf->address & PMD_MASK;
+	struct vm_area_struct *vma = vmf->vma;
+	struct inode *inode = mapping->host;
+	pgtable_t pgtable = NULL;
+	struct page *zero_page;
+	spinlock_t *ptl;
+	pmd_t pmd_entry;
+	pfn_t pfn;
+
+	zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm);
+
+	if (unlikely(!zero_page))
+		goto fallback;
+
+	pfn = page_to_pfn_t(zero_page);
+	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
+			DAX_PMD | DAX_ZERO_PAGE, false);
+
+	if (arch_needs_pgtable_deposit()) {
+		pgtable = pte_alloc_one(vma->vm_mm);
+		if (!pgtable)
+			return VM_FAULT_OOM;
+	}
+
+	ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
+	if (!pmd_none(*(vmf->pmd))) {
+		spin_unlock(ptl);
+		goto fallback;
+	}
+
+	if (pgtable) {
+		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
+		mm_inc_nr_ptes(vma->vm_mm);
+	}
+	pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
+	pmd_entry = pmd_mkhuge(pmd_entry);
+	set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
+	spin_unlock(ptl);
+	trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
+	return VM_FAULT_NOPAGE;
+
+fallback:
+	if (pgtable)
+		pte_free(vma->vm_mm, pgtable);
+	trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
+	return VM_FAULT_FALLBACK;
+}
+#else
+static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
+		struct iomap *iomap, void **entry)
+{
+	return VM_FAULT_FALLBACK;
+}
+#endif /* CONFIG_FS_DAX_PMD */
+
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
 	sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
@@ -1302,6 +1362,63 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
 	return ret;
 }
 
+/**
+ * dax_fault_actor - Common actor to handle pfn insertion in PTE/PMD fault.
+ * @vmf:	vm fault instance
+ * @pfnp:	pfn to be returned
+ * @xas:	the dax mapping tree of a file
+ * @entry:	an unlocked dax entry to be inserted
+ * @pmd:	distinguish whether it is a pmd fault
+ * @flags:	iomap flags
+ * @iomap:	from iomap_begin()
+ * @srcmap:	from iomap_begin(), not equal to iomap if it is a CoW
+ */
+static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
+		struct xa_state *xas, void **entry, bool pmd,
+		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
+	loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap);
+	unsigned long entry_flags = pmd ? DAX_PMD : 0;
+	int err = 0;
+	pfn_t pfn;
+
+	/* if we are reading UNWRITTEN and HOLE, return a hole. */
+	if (!write &&
+	    (iomap->type == IOMAP_UNWRITTEN || iomap->type == IOMAP_HOLE)) {
+		if (!pmd)
+			return dax_load_hole(xas, mapping, entry, vmf);
+		return dax_pmd_load_hole(xas, vmf, iomap, entry);
+	}
+
+	if (iomap->type != IOMAP_MAPPED) {
+		WARN_ON_ONCE(1);
+		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
+	}
+
+	err = dax_iomap_pfn(iomap, pos, size, &pfn);
+	if (err)
+		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
+
+	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
+				  write && !sync);
+
+	if (sync)
+		return dax_fault_synchronous_pfnp(pfnp, pfn);
+
+	/* insert PMD pfn */
+	if (pmd)
+		return vmf_insert_pfn_pmd(vmf, pfn, write);
+
+	/* insert PTE pfn */
+	if (write)
+		return vmf_insert_mixed_mkwrite(vmf->vma, vmf->address, pfn);
+	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
+}
+
 static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			       int *iomap_errp, const struct iomap_ops *ops)
 {
@@ -1309,17 +1426,14 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, vmf->pgoff);
 	struct inode *inode = mapping->host;
-	unsigned long vaddr = vmf->address;
 	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
 	struct iomap iomap = { .type = IOMAP_HOLE };
 	struct iomap srcmap = { .type = IOMAP_HOLE };
 	unsigned flags = IOMAP_FAULT;
-	int error, major = 0;
+	int error;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	bool sync;
-	vm_fault_t ret = 0;
+	vm_fault_t ret = 0, major = 0;
 	void *entry;
-	pfn_t pfn;
 
 	trace_dax_pte_fault(inode, vmf, ret);
 	/*
@@ -1365,8 +1479,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		goto unlock_entry;
 	}
 	if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
-		error = -EIO;	/* fs corruption? */
-		goto error_finish_iomap;
+		ret = VM_FAULT_SIGBUS;	/* fs corruption? */
+		goto finish_iomap;
 	}
 
 	if (vmf->cow_page) {
@@ -1374,49 +1488,19 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		goto finish_iomap;
 	}
 
-	sync = dax_fault_is_synchronous(flags, vma, &iomap);
-
-	switch (iomap.type) {
-	case IOMAP_MAPPED:
-		if (iomap.flags & IOMAP_F_NEW) {
-			count_vm_event(PGMAJFAULT);
-			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
-			major = VM_FAULT_MAJOR;
-		}
-		error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn);
-		if (error < 0)
-			goto error_finish_iomap;
-
-		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
-						 0, write && !sync);
-
-		if (sync) {
-			ret = dax_fault_synchronous_pfnp(pfnp, pfn);
-			goto finish_iomap;
-		}
-		trace_dax_insert_mapping(inode, vmf, entry);
-		if (write)
-			ret = vmf_insert_mixed_mkwrite(vma, vaddr, pfn);
-		else
-			ret = vmf_insert_mixed(vma, vaddr, pfn);
-
+	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, false, flags,
+			      &iomap, &srcmap);
+	if (ret == VM_FAULT_SIGBUS)
 		goto finish_iomap;
-	case IOMAP_UNWRITTEN:
-	case IOMAP_HOLE:
-		if (!write) {
-			ret = dax_load_hole(&xas, mapping, &entry, vmf);
-			goto finish_iomap;
-		}
-		fallthrough;
-	default:
-		WARN_ON_ONCE(1);
-		error = -EIO;
-		break;
+
+	/* read/write MAPPED, CoW UNWRITTEN */
+	if (iomap.flags & IOMAP_F_NEW) {
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
+		major = VM_FAULT_MAJOR;
 	}
 
- error_finish_iomap:
-	ret = dax_fault_return(error);
- finish_iomap:
+finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PAGE_SIZE;
 
@@ -1430,66 +1514,14 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		 */
 		ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap);
 	}
- unlock_entry:
+unlock_entry:
 	dax_unlock_entry(&xas, entry);
- out:
+out:
 	trace_dax_pte_fault_done(inode, vmf, ret);
 	return ret | major;
 }
 
 #ifdef CONFIG_FS_DAX_PMD
-static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
-		struct iomap *iomap, void **entry)
-{
-	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
-	unsigned long pmd_addr = vmf->address & PMD_MASK;
-	struct vm_area_struct *vma = vmf->vma;
-	struct inode *inode = mapping->host;
-	pgtable_t pgtable = NULL;
-	struct page *zero_page;
-	spinlock_t *ptl;
-	pmd_t pmd_entry;
-	pfn_t pfn;
-
-	zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm);
-
-	if (unlikely(!zero_page))
-		goto fallback;
-
-	pfn = page_to_pfn_t(zero_page);
-	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
-			DAX_PMD | DAX_ZERO_PAGE, false);
-
-	if (arch_needs_pgtable_deposit()) {
-		pgtable = pte_alloc_one(vma->vm_mm);
-		if (!pgtable)
-			return VM_FAULT_OOM;
-	}
-
-	ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
-	if (!pmd_none(*(vmf->pmd))) {
-		spin_unlock(ptl);
-		goto fallback;
-	}
-
-	if (pgtable) {
-		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
-		mm_inc_nr_ptes(vma->vm_mm);
-	}
-	pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
-	pmd_entry = pmd_mkhuge(pmd_entry);
-	set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
-	spin_unlock(ptl);
-	trace_dax_pmd_load_hole(inode, vmf, zero_page, *entry);
-	return VM_FAULT_NOPAGE;
-
-fallback:
-	if (pgtable)
-		pte_free(vma->vm_mm, pgtable);
-	trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
-	return VM_FAULT_FALLBACK;
-}
-
 static bool dax_fault_check_fallback(struct vm_fault *vmf, struct xa_state *xas,
 		pgoff_t max_pgoff)
 {
@@ -1530,17 +1562,15 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	bool sync;
-	unsigned int iomap_flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
+	unsigned int flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
 	struct inode *inode = mapping->host;
-	vm_fault_t result = VM_FAULT_FALLBACK;
+	vm_fault_t ret = VM_FAULT_FALLBACK;
 	struct iomap iomap = { .type = IOMAP_HOLE };
 	struct iomap srcmap = { .type = IOMAP_HOLE };
 	pgoff_t max_pgoff;
 	void *entry;
 	loff_t pos;
 	int error;
-	pfn_t pfn;
 
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is
@@ -1552,7 +1582,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
 
 	if (xas.xa_index >= max_pgoff) {
-		result = VM_FAULT_SIGBUS;
+		ret = VM_FAULT_SIGBUS;
 		goto out;
 	}
 
@@ -1567,7 +1597,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 */
 	entry = grab_mapping_entry(&xas, mapping, PMD_ORDER);
 	if (xa_is_internal(entry)) {
-		result = xa_to_internal(entry);
+		ret = xa_to_internal(entry);
 		goto fallback;
 	}
 
@@ -1579,7 +1609,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 */
 	if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd) &&
 			!pmd_devmap(*vmf->pmd)) {
-		result = 0;
+		ret = 0;
 		goto unlock_entry;
 	}
 
@@ -1589,49 +1619,21 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * to look up our filesystem block.
 	 */
 	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap,
-			&srcmap);
+	error = ops->iomap_begin(inode, pos, PMD_SIZE, flags, &iomap, &srcmap);
 	if (error)
 		goto unlock_entry;
 
 	if (iomap.offset + iomap.length < pos + PMD_SIZE)
 		goto finish_iomap;
 
-	sync = dax_fault_is_synchronous(iomap_flags, vma, &iomap);
-
-	switch (iomap.type) {
-	case IOMAP_MAPPED:
-		error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn);
-		if (error < 0)
-			goto finish_iomap;
+	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, true, flags,
+			      &iomap, &srcmap);
 
-		entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn,
-						DAX_PMD, write && !sync);
-
-		if (sync) {
-			result = dax_fault_synchronous_pfnp(pfnp, pfn);
-			goto finish_iomap;
-		}
-
-		trace_dax_pmd_insert_mapping(inode, vmf, PMD_SIZE, pfn, entry);
-		result = vmf_insert_pfn_pmd(vmf, pfn, write);
-		break;
-	case IOMAP_UNWRITTEN:
-	case IOMAP_HOLE:
-		if (WARN_ON_ONCE(write))
-			break;
-		result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry);
-		break;
-	default:
-		WARN_ON_ONCE(1);
-		break;
-	}
-
- finish_iomap:
+finish_iomap:
 	if (ops->iomap_end) {
 		int copied = PMD_SIZE;
 
-		if (result == VM_FAULT_FALLBACK)
+		if (ret == VM_FAULT_FALLBACK)
 			copied = 0;
 		/*
 		 * The fault is done by now and there's no way back (other
@@ -1639,19 +1641,18 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		 * Just ignore error from ->iomap_end since we cannot do much
 		 * with it.
 		 */
-		ops->iomap_end(inode, pos, PMD_SIZE, copied, iomap_flags,
-				&iomap);
+		ops->iomap_end(inode, pos, PMD_SIZE, copied, flags, &iomap);
 	}
- unlock_entry:
+unlock_entry:
 	dax_unlock_entry(&xas, entry);
- fallback:
-	if (result == VM_FAULT_FALLBACK) {
+fallback:
+	if (ret == VM_FAULT_FALLBACK) {
 		split_huge_pmd(vma, vmf->pmd, vmf->address);
 		count_vm_event(THP_FAULT_FALLBACK);
 	}
 out:
-	trace_dax_pmd_fault_done(inode, vmf, max_pgoff, result);
-	return result;
+	trace_dax_pmd_fault_done(inode, vmf, max_pgoff, ret);
+	return ret;
 }
 #else
 static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
-- 
2.30.2



* [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (24 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 25/27] fsdax: factor out a dax_fault_actor() helper Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:35   ` Darrick J. Wong
  2021-07-19 10:35 ` [PATCH 27/27] iomap: constify iomap_iter_srcmap Christoph Hellwig
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

Avoid the open coded calls to ->iomap_begin and ->iomap_end and call
iomap_iter instead.
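
For readers new to the iterator model, the following userspace sketch shows
the general shape of such a loop: the caller asks the iterator for the next
mapping, handles part of the range, and reports how much it processed.
Everything here is hypothetical (struct demo_iter, demo_begin(),
demo_iter_next(), the fixed 4096-byte extents); it is loosely modelled on
the loop this series introduces, and the real iomap_iter() in
fs/iomap/iter.c may differ in detail.

```c
#include <stdint.h>

/* Hypothetical mapping: one extent of the file. */
struct demo_map {
	int64_t offset;
	int64_t length;
};

/* Hypothetical iterator state for one operation over [pos, pos + len). */
struct demo_iter {
	int64_t pos;
	int64_t len;
	int64_t processed;	/* bytes handled for this mapping, or -errno */
	struct demo_map map;
};

/* Toy mapping callback: the file consists of 4096-byte extents. */
static int demo_begin(int64_t pos, struct demo_map *map)
{
	map->offset = pos - pos % 4096;
	map->length = 4096;
	return 0;
}

/* Returns 1 if a new mapping is ready, 0 when done, negative on error. */
static int demo_iter_next(struct demo_iter *iter)
{
	if (iter->map.length) {
		/* Finish the previous mapping before fetching the next. */
		if (iter->processed < 0)
			return (int)iter->processed;
		if (iter->processed == 0)
			return 0;	/* no progress: stop */
		iter->pos += iter->processed;
		iter->len -= iter->processed;
		iter->processed = 0;
		iter->map.length = 0;
	}
	if (iter->len <= 0)
		return 0;
	if (demo_begin(iter->pos, &iter->map) < 0)
		return -1;
	return 1;
}

/* Example caller: count how many mappings cover [pos, pos + len). */
static int64_t demo_count_mappings(int64_t pos, int64_t len)
{
	struct demo_iter iter = { .pos = pos, .len = len };
	int64_t n = 0;
	int ret;

	while ((ret = demo_iter_next(&iter)) > 0) {
		int64_t avail = iter.map.offset + iter.map.length - iter.pos;

		iter.processed = avail < iter.len ? avail : iter.len;
		n++;
	}
	return ret < 0 ? ret : n;
}
```

The point of the pattern is that the begin/end bookkeeping lives in one
place, so a caller such as a fault handler only supplies the body of the
loop.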

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 193 +++++++++++++++++++++----------------------------------
 1 file changed, 75 insertions(+), 118 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 6d0c6d28be83b1..118c9e2923f5f8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1010,7 +1010,7 @@ static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
 	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
 }
 
-static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
+static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 			 pfn_t *pfnp)
 {
 	const sector_t sector = dax_iomap_sector(iomap, pos);
@@ -1068,7 +1068,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
 
 #ifdef CONFIG_FS_DAX_PMD
 static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
-		struct iomap *iomap, void **entry)
+		const struct iomap *iomap, void **entry)
 {
 	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	unsigned long pmd_addr = vmf->address & PMD_MASK;
@@ -1120,7 +1120,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 }
 #else
 static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
-		struct iomap *iomap, void **entry)
+		const struct iomap *iomap, void **entry)
 {
 	return VM_FAULT_FALLBACK;
 }
@@ -1309,7 +1309,7 @@ static vm_fault_t dax_fault_return(int error)
  * flushed on write-faults (non-cow), but not read-faults.
  */
 static bool dax_fault_is_synchronous(unsigned long flags,
-		struct vm_area_struct *vma, struct iomap *iomap)
+		struct vm_area_struct *vma, const struct iomap *iomap)
 {
 	return (flags & IOMAP_WRITE) && (vma->vm_flags & VM_SYNC)
 		&& (iomap->flags & IOMAP_F_DIRTY);
@@ -1329,22 +1329,22 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
 	return VM_FAULT_NEEDDSYNC;
 }
 
-static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
-		loff_t pos)
+static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
+		const struct iomap_iter *iter)
 {
-	sector_t sector = dax_iomap_sector(iomap, pos);
+	sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
 	unsigned long vaddr = vmf->address;
 	vm_fault_t ret;
 	int error = 0;
 
-	switch (iomap->type) {
+	switch (iter->iomap.type) {
 	case IOMAP_HOLE:
 	case IOMAP_UNWRITTEN:
 		clear_user_highpage(vmf->cow_page, vaddr);
 		break;
 	case IOMAP_MAPPED:
-		error = copy_cow_page_dax(iomap->bdev, iomap->dax_dev, sector,
-					  vmf->cow_page, vaddr);
+		error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
+					  sector, vmf->cow_page, vaddr);
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -1363,29 +1363,31 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
 }
 
 /**
- * dax_fault_actor - Common actor to handle pfn insertion in PTE/PMD fault.
+ * dax_fault_iter - Common actor to handle pfn insertion in PTE/PMD fault.
  * @vmf:	vm fault instance
+ * @iter:	iomap iter
  * @pfnp:	pfn to be returned
  * @xas:	the dax mapping tree of a file
  * @entry:	an unlocked dax entry to be inserted
  * @pmd:	distinguish whether it is a pmd fault
- * @flags:	iomap flags
- * @iomap:	from iomap_begin()
- * @srcmap:	from iomap_begin(), not equal to iomap if it is a CoW
  */
-static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
-		struct xa_state *xas, void **entry, bool pmd,
-		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
+static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
+		const struct iomap_iter *iter, pfn_t *pfnp,
+		struct xa_state *xas, void **entry, bool pmd)
 {
 	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
+	const struct iomap *iomap = &iter->iomap;
 	size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
 	loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap);
+	bool sync = dax_fault_is_synchronous(iter->flags, vmf->vma, iomap);
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
 	int err = 0;
 	pfn_t pfn;
 
+	if (!pmd && vmf->cow_page)
+		return dax_fault_cow_page(vmf, iter);
+
 	/* if we are reading UNWRITTEN and HOLE, return a hole. */
 	if (!write &&
 	    (iomap->type == IOMAP_UNWRITTEN || iomap->type == IOMAP_HOLE)) {
@@ -1399,7 +1401,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
 	}
 
-	err = dax_iomap_pfn(iomap, pos, size, &pfn);
+	err = dax_iomap_pfn(&iter->iomap, pos, size, &pfn);
 	if (err)
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
 
@@ -1422,32 +1424,31 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			       int *iomap_errp, const struct iomap_ops *ops)
 {
-	struct vm_area_struct *vma = vmf->vma;
-	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, vmf->pgoff);
-	struct inode *inode = mapping->host;
-	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
-	struct iomap iomap = { .type = IOMAP_HOLE };
-	struct iomap srcmap = { .type = IOMAP_HOLE };
-	unsigned flags = IOMAP_FAULT;
-	int error;
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	vm_fault_t ret = 0, major = 0;
+	struct iomap_iter iter = {
+		.inode		= mapping->host,
+		.pos		= (loff_t)vmf->pgoff << PAGE_SHIFT,
+		.len		= PAGE_SIZE,
+		.flags		= IOMAP_FAULT,
+	};
+	vm_fault_t ret = 0;
 	void *entry;
+	int error;
 
-	trace_dax_pte_fault(inode, vmf, ret);
+	trace_dax_pte_fault(iter.inode, vmf, ret);
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is supposed
 	 * to hold locks serializing us with truncate / punch hole so this is
 	 * a reliable test.
 	 */
-	if (pos >= i_size_read(inode)) {
+	if (iter.pos >= i_size_read(iter.inode)) {
 		ret = VM_FAULT_SIGBUS;
 		goto out;
 	}
 
-	if (write && !vmf->cow_page)
-		flags |= IOMAP_WRITE;
+	if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page)
+		iter.flags |= IOMAP_WRITE;
 
 	entry = grab_mapping_entry(&xas, mapping, 0);
 	if (xa_is_internal(entry)) {
@@ -1466,59 +1467,34 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		goto unlock_entry;
 	}
 
-	/*
-	 * Note that we don't bother to use iomap_iter here: DAX required
-	 * the file system block size to be equal the page size, which means
-	 * that we never have to deal with more than a single extent here.
-	 */
-	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
-	if (iomap_errp)
-		*iomap_errp = error;
-	if (error) {
-		ret = dax_fault_return(error);
-		goto unlock_entry;
-	}
-	if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
-		ret = VM_FAULT_SIGBUS;	/* fs corruption? */
-		goto finish_iomap;
-	}
-
-	if (vmf->cow_page) {
-		ret = dax_fault_cow_page(vmf, &iomap, pos);
-		goto finish_iomap;
-	}
+	while ((error = iomap_iter(&iter, ops)) > 0) {
+		if (WARN_ON_ONCE(iomap_length(&iter) < PAGE_SIZE)) {
+			iter.processed = -EIO;	/* fs corruption? */
+			continue;
+		}
 
-	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, false, flags,
-			      &iomap, &srcmap);
-	if (ret == VM_FAULT_SIGBUS)
-		goto finish_iomap;
+		ret = dax_fault_iter(vmf, &iter, pfnp, &xas, &entry, false);
+		if (ret != VM_FAULT_SIGBUS &&
+		    (iter.iomap.flags & IOMAP_F_NEW)) {
+			count_vm_event(PGMAJFAULT);
+			count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
+			ret |= VM_FAULT_MAJOR;
+		}
 
-	/* read/write MAPPED, CoW UNWRITTEN */
-	if (iomap.flags & IOMAP_F_NEW) {
-		count_vm_event(PGMAJFAULT);
-		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
-		major = VM_FAULT_MAJOR;
+		if (!(ret & VM_FAULT_ERROR))
+			iter.processed = PAGE_SIZE;
 	}
 
-finish_iomap:
-	if (ops->iomap_end) {
-		int copied = PAGE_SIZE;
+	if (iomap_errp)
+		*iomap_errp = error;
+	if (!ret && error)
+		ret = dax_fault_return(error);
 
-		if (ret & VM_FAULT_ERROR)
-			copied = 0;
-		/*
-		 * The fault is done by now and there's no way back (other
-		 * thread may be already happily using PTE we have installed).
-		 * Just ignore error from ->iomap_end since we cannot do much
-		 * with it.
-		 */
-		ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap);
-	}
 unlock_entry:
 	dax_unlock_entry(&xas, entry);
 out:
-	trace_dax_pte_fault_done(inode, vmf, ret);
-	return ret | major;
+	trace_dax_pte_fault_done(iter.inode, vmf, ret);
+	return ret;
 }
 
 #ifdef CONFIG_FS_DAX_PMD
@@ -1558,28 +1534,29 @@ static bool dax_fault_check_fallback(struct vm_fault *vmf, struct xa_state *xas,
 static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 			       const struct iomap_ops *ops)
 {
-	struct vm_area_struct *vma = vmf->vma;
-	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
 	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
-	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	unsigned int flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
-	struct inode *inode = mapping->host;
+	struct iomap_iter iter = {
+		.inode		= mapping->host,
+		.len		= PMD_SIZE,
+		.flags		= IOMAP_FAULT,
+	};
 	vm_fault_t ret = VM_FAULT_FALLBACK;
-	struct iomap iomap = { .type = IOMAP_HOLE };
-	struct iomap srcmap = { .type = IOMAP_HOLE };
 	pgoff_t max_pgoff;
 	void *entry;
-	loff_t pos;
 	int error;
 
+	if (vmf->flags & FAULT_FLAG_WRITE)
+		iter.flags |= IOMAP_WRITE;
+
 	/*
 	 * Check whether offset isn't beyond end of file now. Caller is
 	 * supposed to hold locks serializing us with truncate / punch hole so
 	 * this is a reliable test.
 	 */
-	max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+	max_pgoff = DIV_ROUND_UP(i_size_read(iter.inode), PAGE_SIZE);
 
-	trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
+	trace_dax_pmd_fault(iter.inode, vmf, max_pgoff, 0);
 
 	if (xas.xa_index >= max_pgoff) {
 		ret = VM_FAULT_SIGBUS;
@@ -1613,45 +1590,25 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 		goto unlock_entry;
 	}
 
-	/*
-	 * Note that we don't use iomap_iter here.  We aren't doing I/O, only
-	 * setting up a mapping, so really we're using iomap_begin() as a way
-	 * to look up our filesystem block.
-	 */
-	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
-	error = ops->iomap_begin(inode, pos, PMD_SIZE, flags, &iomap, &srcmap);
-	if (error)
-		goto unlock_entry;
-
-	if (iomap.offset + iomap.length < pos + PMD_SIZE)
-		goto finish_iomap;
+	iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
+	while ((error = iomap_iter(&iter, ops)) > 0) {
+		if (iomap_length(&iter) < PMD_SIZE)
+			continue; /* actually breaks out of the loop */
 
-	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, true, flags,
-			      &iomap, &srcmap);
-
-finish_iomap:
-	if (ops->iomap_end) {
-		int copied = PMD_SIZE;
-
-		if (ret == VM_FAULT_FALLBACK)
-			copied = 0;
-		/*
-		 * The fault is done by now and there's no way back (other
-		 * thread may be already happily using PMD we have installed).
-		 * Just ignore error from ->iomap_end since we cannot do much
-		 * with it.
-		 */
-		ops->iomap_end(inode, pos, PMD_SIZE, copied, flags, &iomap);
+		ret = dax_fault_iter(vmf, &iter, pfnp, &xas, &entry, true);
+		if (ret != VM_FAULT_FALLBACK)
+			iter.processed = PMD_SIZE;
 	}
+
 unlock_entry:
 	dax_unlock_entry(&xas, entry);
 fallback:
 	if (ret == VM_FAULT_FALLBACK) {
-		split_huge_pmd(vma, vmf->pmd, vmf->address);
+		split_huge_pmd(vmf->vma, vmf->pmd, vmf->address);
 		count_vm_event(THP_FAULT_FALLBACK);
 	}
 out:
-	trace_dax_pmd_fault_done(inode, vmf, max_pgoff, ret);
+	trace_dax_pmd_fault_done(iter.inode, vmf, max_pgoff, ret);
 	return ret;
 }
 #else
-- 
2.30.2



* [PATCH 27/27] iomap: constify iomap_iter_srcmap
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (25 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter Christoph Hellwig
@ 2021-07-19 10:35 ` Christoph Hellwig
  2021-07-19 17:44   ` Darrick J. Wong
  2021-07-19 17:57 ` RFC: switch iomap to an iterator model Darrick J. Wong
  2021-07-29 20:33 ` Darrick J. Wong
  28 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-19 10:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

The srcmap returned from iomap_iter_srcmap is never modified, so mark
the iomap returned from it const and constify a lot of code that never
modifies the iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 32 ++++++++++++++++----------------
 include/linux/iomap.h  |  2 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index eb5d742b5bf8b7..a2dd42f3115cfa 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -226,20 +226,20 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 	SetPageUptodate(page);
 }
 
-static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
+static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
 		loff_t pos)
 {
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 
 	return srcmap->type != IOMAP_MAPPED ||
 		(srcmap->flags & IOMAP_F_NEW) ||
 		pos >= i_size_read(iter->inode);
 }
 
-static loff_t iomap_readpage_iter(struct iomap_iter *iter,
+static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		struct iomap_readpage_ctx *ctx, loff_t offset)
 {
-	struct iomap *iomap = &iter->iomap;
+	const struct iomap *iomap = &iter->iomap;
 	loff_t pos = iter->pos + offset;
 	loff_t length = iomap_length(iter) - offset;
 	struct page *page = ctx->cur_page;
@@ -355,7 +355,7 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 }
 EXPORT_SYMBOL_GPL(iomap_readpage);
 
-static loff_t iomap_readahead_iter(struct iomap_iter *iter,
+static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
 		struct iomap_readpage_ctx *ctx)
 {
 	loff_t length = iomap_length(iter);
@@ -539,10 +539,10 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
 	return submit_bio_wait(&bio);
 }
 
-static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
+static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		unsigned len, struct page *page)
 {
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	struct iomap_page *iop = iomap_page_create(iter->inode, page);
 	loff_t block_size = i_blocksize(iter->inode);
 	loff_t block_start = round_down(pos, block_size);
@@ -580,11 +580,11 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 	return 0;
 }
 
-static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
-		struct page **pagep)
+static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
+		unsigned len, struct page **pagep)
 {
 	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	struct page *page;
 	int status = 0;
 
@@ -655,10 +655,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	return copied;
 }
 
-static size_t iomap_write_end_inline(struct iomap_iter *iter, struct page *page,
-		loff_t pos, size_t copied)
+static size_t iomap_write_end_inline(const struct iomap_iter *iter,
+		struct page *page, loff_t pos, size_t copied)
 {
-	struct iomap *iomap = &iter->iomap;
+	const struct iomap *iomap = &iter->iomap;
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
@@ -678,7 +678,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 		size_t copied, struct page *page)
 {
 	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t old_size = iter->inode->i_size;
 	size_t ret;
 
@@ -803,7 +803,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 {
 	struct iomap *iomap = &iter->iomap;
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	long status = 0;
@@ -879,7 +879,7 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
 	struct iomap *iomap = &iter->iomap;
-	struct iomap *srcmap = iomap_iter_srcmap(iter);
+	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 719798814bdfdb..a1fb0d22efbd40 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -193,7 +193,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
  * for a given operation, which may or may no be identical to the destination
  * map in &i->iomap.
  */
-static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
+static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
 {
 	if (i->srcmap.type != IOMAP_HOLE)
 		return &i->srcmap;
-- 
2.30.2
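
A minimal userspace sketch of the accessor this patch constifies, assuming
cut-down stand-in types (struct map_sketch, struct iter_sketch and the
*_SKETCH constants are hypothetical, not the kernel definitions).  It only
illustrates the selection rule visible in the hunk above: return the source
map unless it is a hole, and hand it back through a const pointer so callers
cannot modify it.

```c
/* Hypothetical type tags; the sketch assumes a hole is encoded as 0. */
enum { TYPE_HOLE_SKETCH = 0, TYPE_MAPPED_SKETCH = 1 };

/* Cut-down stand-in for struct iomap: only the field the accessor reads. */
struct map_sketch {
	int type;
};

/* Cut-down stand-in for struct iomap_iter. */
struct iter_sketch {
	struct map_sketch iomap;	/* destination map */
	struct map_sketch srcmap;	/* source map, a hole when unused */
};

/*
 * Same shape as the constified helper: a const iterator in, a const map
 * out.  Writing through the returned pointer would not compile.
 */
static inline const struct map_sketch *
srcmap_sketch(const struct iter_sketch *i)
{
	if (i->srcmap.type != TYPE_HOLE_SKETCH)
		return &i->srcmap;
	return &i->iomap;
}
```

With srcmap left as a hole the helper falls back to the destination map;
once srcmap is filled in, that is what the caller gets.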



* Re: [PATCH 01/27] iomap: fix a trivial comment typo in trace.h
  2021-07-19 10:34 ` [PATCH 01/27] iomap: fix a trivial comment typo in trace.h Christoph Hellwig
@ 2021-07-19 16:00   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 16:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:54PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index fdc7ae388476f5..e9cd5cc0d6ba40 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -2,7 +2,7 @@
>  /*
>   * Copyright (c) 2009-2019 Christoph Hellwig
>   *
> - * NOTE: none of these tracepoints shall be consider a stable kernel ABI
> + * NOTE: none of these tracepoints shall be considered a stable kernel ABI
>   * as they can change at any time.
>   */
>  #undef TRACE_SYSTEM
> -- 
> 2.30.2
> 


* Re: [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done}
  2021-07-19 10:34 ` [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done} Christoph Hellwig
@ 2021-07-19 16:04   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 16:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:55PM +0200, Christoph Hellwig wrote:
> These aren't actually used by the only instance implementing the methods.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

/me finds it kind of amusing that we still don't have any ->page_prepare
use cases for actually passing the page in, but if nobody /else/ has any
objection or imminently wants to use the iomap argument, then...

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/gfs2/bmap.c         | 5 ++---
>  fs/iomap/buffered-io.c | 6 +++---
>  include/linux/iomap.h  | 5 ++---
>  3 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index ed8b67b2171817..5414c2c3358092 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1002,7 +1002,7 @@ static void gfs2_write_unlock(struct inode *inode)
>  }
>  
>  static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
> -				   unsigned len, struct iomap *iomap)
> +				   unsigned len)
>  {
>  	unsigned int blockmask = i_blocksize(inode) - 1;
>  	struct gfs2_sbd *sdp = GFS2_SB(inode);
> @@ -1013,8 +1013,7 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
>  }
>  
>  static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
> -				 unsigned copied, struct page *page,
> -				 struct iomap *iomap)
> +				 unsigned copied, struct page *page)
>  {
>  	struct gfs2_trans *tr = current->journal_info;
>  	struct gfs2_inode *ip = GFS2_I(inode);
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438becd9..75310f6fcf8401 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -605,7 +605,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		return -EINTR;
>  
>  	if (page_ops && page_ops->page_prepare) {
> -		status = page_ops->page_prepare(inode, pos, len, iomap);
> +		status = page_ops->page_prepare(inode, pos, len);
>  		if (status)
>  			return status;
>  	}
> @@ -638,7 +638,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  
>  out_no_page:
>  	if (page_ops && page_ops->page_done)
> -		page_ops->page_done(inode, pos, 0, NULL, iomap);
> +		page_ops->page_done(inode, pos, 0, NULL);
>  	return status;
>  }
>  
> @@ -714,7 +714,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>  	if (old_size < pos)
>  		pagecache_isize_extended(inode, old_size, pos);
>  	if (page_ops && page_ops->page_done)
> -		page_ops->page_done(inode, pos, ret, page, iomap);
> +		page_ops->page_done(inode, pos, ret, page);
>  	put_page(page);
>  
>  	if (ret < len)
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 479c1da3e2211e..093519d91cc9cc 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -108,10 +108,9 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>   * associated page could not be obtained.
>   */
>  struct iomap_page_ops {
> -	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
> -			struct iomap *iomap);
> +	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
>  	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
> -			struct page *page, struct iomap *iomap);
> +			struct page *page);
>  };
>  
>  /*
> -- 
> 2.30.2
> 


* Re: [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const
  2021-07-19 10:34 ` [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const Christoph Hellwig
@ 2021-07-19 16:08   ` Darrick J. Wong
  2021-07-20  9:52     ` Nikolay Borisov
  2021-07-26  8:12     ` Christoph Hellwig
  0 siblings, 2 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:56PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

/me wonders, does this have any significant effect on the generated
code?

It's probably a good idea to feed the optimizer as much usage info as we
can, though I would imagine that for such a simple function it can
probably tell that we don't change *iomap.

IMHO, constifying functions is a good way to signal to /programmers/
that they're not intended to touch the arguments, so

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
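
For illustration, a userspace sketch of the helper under discussion,
assuming a cut-down stand-in struct (struct iomap_sketch and
iomap_sector_sketch are hypothetical names, and SECTOR_SHIFT is defined
locally as 9 for 512-byte sectors).  The arithmetic is the one in the quoted
hunk below; the const qualifier documents that the function only reads the
mapping.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumption: 512-byte sectors */

/* Hypothetical stand-in carrying only the fields the helper reads. */
struct iomap_sketch {
	uint64_t addr;		/* disk offset of the mapping, in bytes */
	int64_t offset;		/* file offset of the mapping, in bytes */
	uint64_t length;	/* length of the mapping, in bytes */
};

static inline uint64_t
iomap_sector_sketch(const struct iomap_sketch *iomap, int64_t pos)
{
	/* The sketch expects pos to lie at or after the mapping start. */
	assert(pos >= iomap->offset);

	/* An assignment such as iomap->addr = 0 would not compile here. */
	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
}
```

For a mapping at disk byte 1048576 covering file offset 4096, position 4096
lands on sector 2048 and position 8192 on sector 2056.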

> ---
>  include/linux/iomap.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 093519d91cc9cc..f9c36df6a3061b 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -91,8 +91,7 @@ struct iomap {
>  	const struct iomap_page_ops *page_ops;
>  };
>  
> -static inline sector_t
> -iomap_sector(struct iomap *iomap, loff_t pos)
> +static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
>  {
>  	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
> -- 
> 2.30.2
> 


* Re: [PATCH 08/27] iomap: add the new iomap_iter model
  2021-07-19 10:35 ` [PATCH 08/27] iomap: add the new iomap_iter model Christoph Hellwig
@ 2021-07-19 16:56   ` Darrick J. Wong
  2021-07-26  8:15     ` Christoph Hellwig
  2021-07-19 21:48   ` Dave Chinner
  1 sibling, 1 reply; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 16:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:01PM +0200, Christoph Hellwig wrote:
> The iomap_iter struct provides a convenient way to package up and
> maintain all the arguments to the various mapping and operation
> functions.  It is operated on using the iomap_iter() function that
> is called in a loop until the whole range has been processed.  Compared
> to the existing iomap_apply() function this avoids an indirect call
> for each iteration.
> 
> For now iomap_iter() calls back into the existing ->iomap_begin and
> ->iomap_end methods, but in the future this could be further optimized
> to avoid indirect calls entirely.
> 
> Based on an earlier patch from Matthew Wilcox <willy@infradead.org>.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/Makefile     |  1 +
>  fs/iomap/iter.c       | 74 +++++++++++++++++++++++++++++++++++++++++++
>  fs/iomap/trace.h      | 37 +++++++++++++++++++++-
>  include/linux/iomap.h | 56 ++++++++++++++++++++++++++++++++
>  4 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 fs/iomap/iter.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index eef2722d93a183..85034deb5a2f19 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)		+= iomap.o
>  
>  iomap-y				+= trace.o \
>  				   apply.o \
> +				   iter.o \
>  				   buffered-io.o \
>  				   direct-io.o \
>  				   fiemap.o \
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> new file mode 100644
> index 00000000000000..b21e2489700b7c
> --- /dev/null
> +++ b/fs/iomap/iter.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 Christoph Hellwig.
> + */
> +#include <linux/fs.h>
> +#include <linux/iomap.h>
> +#include "trace.h"
> +
> +static inline int iomap_iter_advance(struct iomap_iter *iter)
> +{
> +	/* handle the previous iteration (if any) */
> +	if (iter->iomap.length) {
> +		if (iter->processed <= 0)
> +			return iter->processed;

Hmm, converting ssize_t to int here... I suppose that's fine since we're
merely returning "the usual negative errno code", but read on.

> +		WARN_ON_ONCE(iter->processed > iomap_length(iter));
> +		iter->pos += iter->processed;
> +		iter->len -= iter->processed;
> +		if (!iter->len)
> +			return 0;
> +	}
> +
> +	/* clear the state for the next iteration */
> +	iter->processed = 0;
> +	memset(&iter->iomap, 0, sizeof(iter->iomap));
> +	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
> +	return 1;
> +}
> +
> +static inline void iomap_iter_done(struct iomap_iter *iter)
> +{
> +	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
> +	WARN_ON_ONCE(iter->iomap.length == 0);
> +	WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
> +
> +	trace_iomap_iter_dstmap(iter->inode, &iter->iomap);
> +	if (iter->srcmap.type != IOMAP_HOLE)
> +		trace_iomap_iter_srcmap(iter->inode, &iter->srcmap);
> +}
> +
> +/**
> + * iomap_iter - iterate over ranges in a file
> + * @iter: iteration structure
> + * @ops: iomap ops provided by the file system
> + *
> + * Iterate over file system provided contiguous ranges of blocks with the same
> + * state.  Should be called in a loop that continues as long as this function
> + * returns a positive value.  If 0 or a negative value is returned the caller
> + * should break out of the loop - a negative value is an error either from the
> + * file system or from the last iteration stored in @iter.processed.
> + */
> +int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> +{
> +	int ret;
> +
> +	if (iter->iomap.length && ops->iomap_end) {
> +		ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
> +				iter->processed > 0 ? iter->processed : 0,
> +				iter->flags, &iter->iomap);
> +		if (ret < 0 && !iter->processed)
> +			return ret;
> +	}
> +
> +	trace_iomap_iter(iter, ops, _RET_IP_);
> +	ret = iomap_iter_advance(iter);
> +	if (ret <= 0)
> +		return ret;
> +
> +	ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
> +			       &iter->iomap, &iter->srcmap);
> +	if (ret < 0)
> +		return ret;
> +	iomap_iter_done(iter);
> +	return 1;
> +}

<snip out macro hell>

> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index f9c36df6a3061b..a9f3f736017989 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -143,6 +143,62 @@ struct iomap_ops {
>  			ssize_t written, unsigned flags, struct iomap *iomap);
>  };
>  
> +/**
> + * struct iomap_iter - Iterate through a range of a file
> + * @inode: Set at the start of the iteration and should not change.
> + * @pos: The current file position we are operating on.  It is updated by
> + *	calls to iomap_iter().  Treat as read-only in the body.
> + * @len: The remaining length of the file segment we're operating on.
> + *	It is updated at the same time as @pos.
> + * @processed: The number of bytes processed by the body in the most recent
> + *	iteration, or a negative errno. 0 causes the iteration to stop.
> + * @flags: Zero or more of the iomap_begin flags above.
> + * @iomap: Map describing the I/O iteration
> + * @srcmap: Source map for COW operations
> + */
> +struct iomap_iter {
> +	struct inode *inode;
> +	loff_t pos;
> +	u64 len;
> +	ssize_t processed;

I looked at the SEEK_HOLE/SEEK_DATA conversion a few patches ahead, and
noticed that it does things like:

	iter.processed = iomap_seek_hole_iter(&iter, &offset);

where iomap_seek_hole_iter returns a loff_t.  This will not do the right
thing handling large extents on 32-bit architectures because ssize_t
will be a 32-bit signed int whereas loff_t is always a 64-bit signed int.

Linus previously complained to me about filesystem code (especially
iomap since it was "newer") (ab)using loff_t variables to store the
lengths of byte ranges.  It was "loff_t length;" (or so willy
recollects) that tripped him up.

ISTR he also said we should use size_t for all lengths because nobody
should do operations larger than ~2G, but I reject that because iomap
has users that iterate large ranges of data without generating any IO
(e.g. fiemap, seek, swapfile activation).

So... rather than confusing things even more by mixing u64 and ssize_t
for lengths, can we introduce a new 64-bit length typedef for iomap?
Last summer, Dave suggested[1] something like:

	typedef long long lsize_t;

That would enable cleanup of all the count/size/length parameters in
fs/remap_range.c and fs/xfs/xfs_reflink.c to use the new 64-bit length
type, since those operations have never been limited to 32-bit sizes.

--D

[1] https://lore.kernel.org/linux-xfs/20200825042711.GL12131@dread.disaster.area/
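
A small userspace sketch of the width mismatch described above, assuming
fixed-width stand-ins rather than the kernel types (ssize32_sketch_t models
ssize_t on a 32-bit architecture, loff_sketch_t models loff_t; both names
and the helpers are hypothetical).  It checks whether a length survives
being stored in the narrower field.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not kernel types. */
typedef int32_t ssize32_sketch_t;	/* ssize_t on a 32-bit arch */
typedef int64_t loff_sketch_t;		/* loff_t, always 64 bits */

/* Model "iter.processed = <loff_t value>" with a 32-bit field. */
static ssize32_sketch_t store_processed_sketch(loff_sketch_t len)
{
	/* Out-of-range conversion is implementation-defined in C. */
	return (ssize32_sketch_t)len;
}

/* Return 1 if the length reads back unchanged, 0 if it was mangled. */
static int length_survives_sketch(loff_sketch_t len)
{
	return (loff_sketch_t)store_processed_sketch(len) == len;
}
```

Any extent length up to INT32_MAX reads back intact; anything larger, such
as a 4 GiB hole reported by a seek, cannot be represented in the 32-bit
field and comes back as a different value.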

> +	unsigned flags;
> +	struct iomap iomap;
> +	struct iomap srcmap;
> +};
> +
> +int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops);
> +
> +/**
> + * iomap_length - length of the current iomap iteration
> + * @iter: iteration structure
> + *
> + * Returns the length that the operation applies to for the current iteration.
> + */
> +static inline u64 iomap_length(const struct iomap_iter *iter)
> +{
> +	u64 end = iter->iomap.offset + iter->iomap.length;
> +
> +	if (iter->srcmap.type != IOMAP_HOLE)
> +		end = min(end, iter->srcmap.offset + iter->srcmap.length);
> +	return min(iter->len, end - iter->pos);
> +}
> +
> +/**
> + * iomap_iter_srcmap - return the source map for the current iomap iteration
> + * @i: iteration structure
> + *
> + * Write operations on file systems with reflink support might require a
> + * source and a destination map.  This function returns the source map
> + * for a given operation, which may or may no be identical to the destination
> + * map in &i->iomap.
> + */
> +static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
> +{
> +	if (i->srcmap.type != IOMAP_HOLE)
> +		return &i->srcmap;
> +	return &i->iomap;
> +}
> +
>  /*
>   * Main iomap iterator function.
>   */
> -- 
> 2.30.2
> 
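
A userspace sketch of the iomap_length() clamping quoted above, assuming
cut-down stand-in structs (struct map_sketch, struct iter_sketch and the
*_SKETCH names are hypothetical, and the sketch assumes a hole is encoded
as type 0).  The usable length of an iteration is bounded by the end of the
destination map, by the end of the source map when one is present, and by
the remaining request length.

```c
#include <assert.h>
#include <stdint.h>

enum { TYPE_HOLE_SKETCH = 0, TYPE_MAPPED_SKETCH = 1 };

/* Hypothetical stand-in for struct iomap. */
struct map_sketch {
	int type;
	uint64_t offset;	/* file offset of the mapping */
	uint64_t length;	/* length of the mapping */
};

/* Hypothetical stand-in for struct iomap_iter. */
struct iter_sketch {
	uint64_t pos;		/* current file position */
	uint64_t len;		/* remaining request length */
	struct map_sketch iomap;
	struct map_sketch srcmap;
};

static uint64_t min_u64_sketch(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

static uint64_t length_sketch(const struct iter_sketch *iter)
{
	uint64_t end = iter->iomap.offset + iter->iomap.length;

	/* The sketch expects the mapping to cover the current position. */
	assert(iter->pos >= iter->iomap.offset && iter->pos < end);

	if (iter->srcmap.type != TYPE_HOLE_SKETCH)
		end = min_u64_sketch(end,
				iter->srcmap.offset + iter->srcmap.length);
	return min_u64_sketch(iter->len, end - iter->pos);
}
```

With pos = 4096 inside a 16384-byte destination map and a large request, the
result is 12288; a source map ending at 8192 shrinks it to 4096, and a
1000-byte request shrinks it to 1000.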


* Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-19 10:35 ` [PATCH 16/27] iomap: switch iomap_bmap " Christoph Hellwig
@ 2021-07-19 17:05   ` Darrick J. Wong
  2021-07-26  8:19     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:09PM +0200, Christoph Hellwig wrote:
> Rewrite the ->bmap implementation based on iomap_iter.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/fiemap.c | 31 +++++++++++++------------------
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index acad09a8c188df..60daadba16c149 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -92,35 +92,30 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
>  }
>  EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
> -static loff_t
> -iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> -		void *data, struct iomap *iomap, struct iomap *srcmap)
> -{
> -	sector_t *bno = data, addr;
> -
> -	if (iomap->type == IOMAP_MAPPED) {
> -		addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
> -		*bno = addr;
> -	}
> -	return 0;
> -}
> -
>  /* legacy ->bmap interface.  0 is the error return (!) */
>  sector_t
>  iomap_bmap(struct address_space *mapping, sector_t bno,
>  		const struct iomap_ops *ops)
>  {
> -	struct inode *inode = mapping->host;
> -	loff_t pos = bno << inode->i_blkbits;
> -	unsigned blocksize = i_blocksize(inode);
> +	struct iomap_iter iter = {
> +		.inode	= mapping->host,
> +		.pos	= (loff_t)bno << mapping->host->i_blkbits,
> +		.len	= i_blocksize(mapping->host),
> +		.flags	= IOMAP_REPORT,
> +	};
>  	int ret;
>  
>  	if (filemap_write_and_wait(mapping))
>  		return 0;
>  
>  	bno = 0;
> -	ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno,
> -			  iomap_bmap_actor);
> +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +		if (iter.iomap.type != IOMAP_MAPPED)
> +			continue;

There isn't a mapped extent, so return 0 here, right?

--D
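
A userspace sketch of what that continue does, assuming a simplified model
of iomap_iter_advance() from patch 08 (struct iter_sketch,
iter_next_sketch() and the fixed 4096-byte extent are hypothetical; the
real code calls ->iomap_begin and ->iomap_end).  If the quoted advance
logic is read correctly, a body that continues without setting
iter.processed leaves it at 0, so the next iomap_iter() call returns 0 and
the loop ends with bno still 0.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SKETCH 4096	/* hypothetical fixed extent size */

struct iter_sketch {
	int64_t pos;
	uint64_t len;
	int64_t processed;
	uint64_t map_length;	/* stands in for iter->iomap.length */
};

/* Simplified model of iomap_iter(): >0 continue, 0 done, <0 error. */
static int iter_next_sketch(struct iter_sketch *iter)
{
	/* Handle the previous iteration, if there was one. */
	if (iter->map_length) {
		if (iter->processed <= 0)
			return (int)iter->processed;
		assert((uint64_t)iter->processed <= iter->map_length);
		iter->pos += iter->processed;
		iter->len -= (uint64_t)iter->processed;
		if (!iter->len)
			return 0;
	}

	/* Reset state and pretend ->iomap_begin mapped one extent. */
	iter->processed = 0;
	iter->map_length = BLOCK_SKETCH;
	return 1;
}

/* Run the loop; if skip is set the body continues without progress. */
static int run_loop_sketch(uint64_t len, int skip, int *bodies)
{
	struct iter_sketch iter = { .pos = 0, .len = len };
	int ret;

	*bodies = 0;
	while ((ret = iter_next_sketch(&iter)) > 0) {
		(*bodies)++;
		if (skip)
			continue;	/* processed stays 0 */
		iter.processed = BLOCK_SKETCH;
	}
	return ret;
}
```

Over a three-extent range the normal body runs three times, while the
skipping body runs once and the loop then terminates with a return value
of 0.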

> +		bno = (iter.pos - iter.iomap.offset + iter.iomap.addr) >>
> +				mapping->host->i_blkbits;
> +	}
> +
>  	if (ret)
>  		return 0;
>  	return bno;
> -- 
> 2.30.2
> 


* Re: [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter
  2021-07-19 10:35 ` [PATCH 17/27] iomap: switch iomap_seek_hole " Christoph Hellwig
@ 2021-07-19 17:22   ` Darrick J. Wong
  2021-07-26  8:22     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:10PM +0200, Christoph Hellwig wrote:
> Rewrite iomap_seek_hole to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/seek.c | 46 +++++++++++++++++++++++-----------------------
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index ce6fb810854fec..7d6ed9af925e96 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (C) 2017 Red Hat, Inc.
> - * Copyright (c) 2018 Christoph Hellwig.
> + * Copyright (c) 2018-2021 Christoph Hellwig.
>   */
>  #include <linux/module.h>
>  #include <linux/compiler.h>
> @@ -10,21 +10,19 @@
>  #include <linux/pagemap.h>
>  #include <linux/pagevec.h>
>  
> -static loff_t
> -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> -		      void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t *pos)

/me wonders if @pos should be named hole_pos (here and in the caller) to
make it a little easier to read...

>  {
> -	loff_t offset = start;
> +	loff_t length = iomap_length(iter);
>  
> -	switch (iomap->type) {
> +	switch (iter->iomap.type) {
>  	case IOMAP_UNWRITTEN:
> -		offset = mapping_seek_hole_data(inode->i_mapping, start,
> -				start + length, SEEK_HOLE);
> -		if (offset == start + length)
> +		*pos = mapping_seek_hole_data(iter->inode->i_mapping,
> +				iter->pos, iter->pos + length, SEEK_HOLE);
> +		if (*pos == iter->pos + length)
>  			return length;
> -		fallthrough;
> +		return 0;
>  	case IOMAP_HOLE:
> -		*(loff_t *)data = offset;
> +		*pos = iter->pos;
>  		return 0;
>  	default:
>  		return length;
> @@ -35,23 +33,25 @@ loff_t
>  iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops)
>  {
>  	loff_t size = i_size_read(inode);
> -	loff_t ret;
> +	struct iomap_iter iter = {
> +		.inode	= inode,
> +		.pos	= offset,
> +		.flags	= IOMAP_REPORT,
> +	};
> +	int ret;
>  
>  	/* Nothing to be found before or beyond the end of the file. */
>  	if (offset < 0 || offset >= size)
>  		return -ENXIO;
>  
> -	while (offset < size) {
> -		ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
> -				  ops, &offset, iomap_seek_hole_actor);
> -		if (ret < 0)
> -			return ret;
> -		if (ret == 0)
> -			break;
> -		offset += ret;
> -	}
> -
> -	return offset;
> +	iter.len = size - offset;
> +	while ((ret = iomap_iter(&iter, ops)) > 0)
> +		iter.processed = iomap_seek_hole_iter(&iter, &offset);
> +	if (ret < 0)
> +		return ret;
> +	if (iter.len)
> +		return offset;

...because what we're really saying here is that if seek_hole_iter found
a hole (and returned zero, thereby terminating the loop before iter.len
could reach zero), we want to return the position of the hole.
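
A simplified userspace model of that termination logic, as I read it.  This
is only a sketch: demo_extent and demo_seek_hole are hypothetical names, the
extent list is assumed sorted and to cover the whole file, and the unwritten
case is left out entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_type { DEMO_HOLE, DEMO_MAPPED };

struct demo_extent {
	enum demo_type type;
	int64_t offset;	/* start of the extent in the file */
	int64_t length;	/* extent length in bytes */
};

/*
 * Walk [offset, size) one extent at a time.  A "processed" value of 0
 * stops the walk early and leaves len non-zero, which is how the caller
 * can tell that a hole was found before the end of the range.
 */
static int64_t demo_seek_hole(const struct demo_extent *map, size_t nr,
			      int64_t offset, int64_t size)
{
	int64_t pos = offset, len = size - offset, hole_pos = offset;
	size_t i = 0;

	assert(offset >= 0 && offset < size);

	while (len > 0) {
		int64_t avail, processed;

		/* Find the extent containing pos; the map must cover it. */
		while (i < nr && map[i].offset + map[i].length <= pos)
			i++;
		assert(i < nr && map[i].offset <= pos);

		/* Bytes usable from this extent, clamped to what is left. */
		avail = map[i].offset + map[i].length - pos;
		if (avail > len)
			avail = len;

		if (map[i].type == DEMO_HOLE) {
			hole_pos = pos;
			processed = 0;	/* found it: stop iterating */
		} else {
			processed = avail;
		}

		if (processed == 0)
			break;		/* len stays non-zero */
		pos += processed;
		len -= processed;
	}

	/* Non-zero len: stopped at a hole.  Zero len: implicit hole at EOF. */
	return len ? hole_pos : size;
}
```

In this model len only reaches zero when every extent up to size was data,
so the final test separates "stopped at a hole" from "ran off the end".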

> +	return size;

Not sure why we return size here...?  Oh, because there's an implicit
hole at EOF, so we return i_size.  Uh, does this do the right thing if
->iomap_begin returns posteof mappings?  I don't see anything in
iomap_iter_advance that would stop iteration at EOF.

--D

>  }
>  EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
> -- 
> 2.30.2
> 

* Re: [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter
  2021-07-19 10:35 ` [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter Christoph Hellwig
@ 2021-07-19 17:35   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:19PM +0200, Christoph Hellwig wrote:
> Avoid the open coded calls to ->iomap_begin and ->iomap_end and call
> iomap_iter instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Finally this nightmare is over...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/dax.c | 193 +++++++++++++++++++++----------------------------------
>  1 file changed, 75 insertions(+), 118 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 6d0c6d28be83b1..118c9e2923f5f8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1010,7 +1010,7 @@ static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
>  	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
>  }
>  
> -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> +static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
>  			 pfn_t *pfnp)
>  {
>  	const sector_t sector = dax_iomap_sector(iomap, pos);
> @@ -1068,7 +1068,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
>  
>  #ifdef CONFIG_FS_DAX_PMD
>  static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> -		struct iomap *iomap, void **entry)
> +		const struct iomap *iomap, void **entry)
>  {
>  	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>  	unsigned long pmd_addr = vmf->address & PMD_MASK;
> @@ -1120,7 +1120,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
>  }
>  #else
>  static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
> -		struct iomap *iomap, void **entry)
> +		const struct iomap *iomap, void **entry)
>  {
>  	return VM_FAULT_FALLBACK;
>  }
> @@ -1309,7 +1309,7 @@ static vm_fault_t dax_fault_return(int error)
>   * flushed on write-faults (non-cow), but not read-faults.
>   */
>  static bool dax_fault_is_synchronous(unsigned long flags,
> -		struct vm_area_struct *vma, struct iomap *iomap)
> +		struct vm_area_struct *vma, const struct iomap *iomap)
>  {
>  	return (flags & IOMAP_WRITE) && (vma->vm_flags & VM_SYNC)
>  		&& (iomap->flags & IOMAP_F_DIRTY);
> @@ -1329,22 +1329,22 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
>  	return VM_FAULT_NEEDDSYNC;
>  }
>  
> -static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
> -		loff_t pos)
> +static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
> +		const struct iomap_iter *iter)
>  {
> -	sector_t sector = dax_iomap_sector(iomap, pos);
> +	sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
>  	unsigned long vaddr = vmf->address;
>  	vm_fault_t ret;
>  	int error = 0;
>  
> -	switch (iomap->type) {
> +	switch (iter->iomap.type) {
>  	case IOMAP_HOLE:
>  	case IOMAP_UNWRITTEN:
>  		clear_user_highpage(vmf->cow_page, vaddr);
>  		break;
>  	case IOMAP_MAPPED:
> -		error = copy_cow_page_dax(iomap->bdev, iomap->dax_dev, sector,
> -					  vmf->cow_page, vaddr);
> +		error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
> +					  sector, vmf->cow_page, vaddr);
>  		break;
>  	default:
>  		WARN_ON_ONCE(1);
> @@ -1363,29 +1363,31 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap *iomap,
>  }
>  
>  /**
> - * dax_fault_actor - Common actor to handle pfn insertion in PTE/PMD fault.
> + * dax_fault_iter - Common actor to handle pfn insertion in PTE/PMD fault.
>   * @vmf:	vm fault instance
> + * @iter:	iomap iter
>   * @pfnp:	pfn to be returned
>   * @xas:	the dax mapping tree of a file
>   * @entry:	an unlocked dax entry to be inserted
>   * @pmd:	distinguish whether it is a pmd fault
> - * @flags:	iomap flags
> - * @iomap:	from iomap_begin()
> - * @srcmap:	from iomap_begin(), not equal to iomap if it is a CoW
>   */
> -static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
> -		struct xa_state *xas, void **entry, bool pmd,
> -		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> +static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
> +		const struct iomap_iter *iter, pfn_t *pfnp,
> +		struct xa_state *xas, void **entry, bool pmd)
>  {
>  	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> +	const struct iomap *iomap = &iter->iomap;
>  	size_t size = pmd ? PMD_SIZE : PAGE_SIZE;
>  	loff_t pos = (loff_t)xas->xa_index << PAGE_SHIFT;
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> -	bool sync = dax_fault_is_synchronous(flags, vmf->vma, iomap);
> +	bool sync = dax_fault_is_synchronous(iter->flags, vmf->vma, iomap);
>  	unsigned long entry_flags = pmd ? DAX_PMD : 0;
>  	int err = 0;
>  	pfn_t pfn;
>  
> +	if (!pmd && vmf->cow_page)
> +		return dax_fault_cow_page(vmf, iter);
> +
>  	/* if we are reading UNWRITTEN and HOLE, return a hole. */
>  	if (!write &&
>  	    (iomap->type == IOMAP_UNWRITTEN || iomap->type == IOMAP_HOLE)) {
> @@ -1399,7 +1401,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
>  		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
>  	}
>  
> -	err = dax_iomap_pfn(iomap, pos, size, &pfn);
> +	err = dax_iomap_pfn(&iter->iomap, pos, size, &pfn);
>  	if (err)
>  		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
>  
> @@ -1422,32 +1424,31 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
>  static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  			       int *iomap_errp, const struct iomap_ops *ops)
>  {
> -	struct vm_area_struct *vma = vmf->vma;
> -	struct address_space *mapping = vma->vm_file->f_mapping;
> +	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>  	XA_STATE(xas, &mapping->i_pages, vmf->pgoff);
> -	struct inode *inode = mapping->host;
> -	loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
> -	struct iomap iomap = { .type = IOMAP_HOLE };
> -	struct iomap srcmap = { .type = IOMAP_HOLE };
> -	unsigned flags = IOMAP_FAULT;
> -	int error;
> -	bool write = vmf->flags & FAULT_FLAG_WRITE;
> -	vm_fault_t ret = 0, major = 0;
> +	struct iomap_iter iter = {
> +		.inode		= mapping->host,
> +		.pos		= (loff_t)vmf->pgoff << PAGE_SHIFT,
> +		.len		= PAGE_SIZE,
> +		.flags		= IOMAP_FAULT,
> +	};
> +	vm_fault_t ret = 0;
>  	void *entry;
> +	int error;
>  
> -	trace_dax_pte_fault(inode, vmf, ret);
> +	trace_dax_pte_fault(iter.inode, vmf, ret);
>  	/*
>  	 * Check whether offset isn't beyond end of file now. Caller is supposed
>  	 * to hold locks serializing us with truncate / punch hole so this is
>  	 * a reliable test.
>  	 */
> -	if (pos >= i_size_read(inode)) {
> +	if (iter.pos >= i_size_read(iter.inode)) {
>  		ret = VM_FAULT_SIGBUS;
>  		goto out;
>  	}
>  
> -	if (write && !vmf->cow_page)
> -		flags |= IOMAP_WRITE;
> +	if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page)
> +		iter.flags |= IOMAP_WRITE;
>  
>  	entry = grab_mapping_entry(&xas, mapping, 0);
>  	if (xa_is_internal(entry)) {
> @@ -1466,59 +1467,34 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  		goto unlock_entry;
>  	}
>  
> -	/*
> -	 * Note that we don't bother to use iomap_iter here: DAX required
> -	 * the file system block size to be equal the page size, which means
> -	 * that we never have to deal with more than a single extent here.
> -	 */
> -	error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap);
> -	if (iomap_errp)
> -		*iomap_errp = error;
> -	if (error) {
> -		ret = dax_fault_return(error);
> -		goto unlock_entry;
> -	}
> -	if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
> -		ret = VM_FAULT_SIGBUS;	/* fs corruption? */
> -		goto finish_iomap;
> -	}
> -
> -	if (vmf->cow_page) {
> -		ret = dax_fault_cow_page(vmf, &iomap, pos);
> -		goto finish_iomap;
> -	}
> +	while ((error = iomap_iter(&iter, ops)) > 0) {
> +		if (WARN_ON_ONCE(iomap_length(&iter) < PAGE_SIZE)) {
> +			iter.processed = -EIO;	/* fs corruption? */
> +			continue;
> +		}
>  
> -	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, false, flags,
> -			      &iomap, &srcmap);
> -	if (ret == VM_FAULT_SIGBUS)
> -		goto finish_iomap;
> +		ret = dax_fault_iter(vmf, &iter, pfnp, &xas, &entry, false);
> +		if (ret != VM_FAULT_SIGBUS &&
> +		    (iter.iomap.flags & IOMAP_F_NEW)) {
> +			count_vm_event(PGMAJFAULT);
> +			count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
> +			ret |= VM_FAULT_MAJOR;
> +		}
>  
> -	/* read/write MAPPED, CoW UNWRITTEN */
> -	if (iomap.flags & IOMAP_F_NEW) {
> -		count_vm_event(PGMAJFAULT);
> -		count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> -		major = VM_FAULT_MAJOR;
> +		if (!(ret & VM_FAULT_ERROR))
> +			iter.processed = PAGE_SIZE;
>  	}
>  
> -finish_iomap:
> -	if (ops->iomap_end) {
> -		int copied = PAGE_SIZE;
> +	if (iomap_errp)
> +		*iomap_errp = error;
> +	if (!ret && error)
> +		ret = dax_fault_return(error);
>  
> -		if (ret & VM_FAULT_ERROR)
> -			copied = 0;
> -		/*
> -		 * The fault is done by now and there's no way back (other
> -		 * thread may be already happily using PTE we have installed).
> -		 * Just ignore error from ->iomap_end since we cannot do much
> -		 * with it.
> -		 */
> -		ops->iomap_end(inode, pos, PAGE_SIZE, copied, flags, &iomap);
> -	}
>  unlock_entry:
>  	dax_unlock_entry(&xas, entry);
>  out:
> -	trace_dax_pte_fault_done(inode, vmf, ret);
> -	return ret | major;
> +	trace_dax_pte_fault_done(iter.inode, vmf, ret);
> +	return ret;
>  }
>  
>  #ifdef CONFIG_FS_DAX_PMD
> @@ -1558,28 +1534,29 @@ static bool dax_fault_check_fallback(struct vm_fault *vmf, struct xa_state *xas,
>  static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  			       const struct iomap_ops *ops)
>  {
> -	struct vm_area_struct *vma = vmf->vma;
> -	struct address_space *mapping = vma->vm_file->f_mapping;
> +	struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>  	XA_STATE_ORDER(xas, &mapping->i_pages, vmf->pgoff, PMD_ORDER);
> -	bool write = vmf->flags & FAULT_FLAG_WRITE;
> -	unsigned int flags = (write ? IOMAP_WRITE : 0) | IOMAP_FAULT;
> -	struct inode *inode = mapping->host;
> +	struct iomap_iter iter = {
> +		.inode		= mapping->host,
> +		.len		= PMD_SIZE,
> +		.flags		= IOMAP_FAULT,
> +	};
>  	vm_fault_t ret = VM_FAULT_FALLBACK;
> -	struct iomap iomap = { .type = IOMAP_HOLE };
> -	struct iomap srcmap = { .type = IOMAP_HOLE };
>  	pgoff_t max_pgoff;
>  	void *entry;
> -	loff_t pos;
>  	int error;
>  
> +	if (vmf->flags & FAULT_FLAG_WRITE)
> +		iter.flags |= IOMAP_WRITE;
> +
>  	/*
>  	 * Check whether offset isn't beyond end of file now. Caller is
>  	 * supposed to hold locks serializing us with truncate / punch hole so
>  	 * this is a reliable test.
>  	 */
> -	max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> +	max_pgoff = DIV_ROUND_UP(i_size_read(iter.inode), PAGE_SIZE);
>  
> -	trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);
> +	trace_dax_pmd_fault(iter.inode, vmf, max_pgoff, 0);
>  
>  	if (xas.xa_index >= max_pgoff) {
>  		ret = VM_FAULT_SIGBUS;
> @@ -1613,45 +1590,25 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
>  		goto unlock_entry;
>  	}
>  
> -	/*
> -	 * Note that we don't use iomap_iter here.  We aren't doing I/O, only
> -	 * setting up a mapping, so really we're using iomap_begin() as a way
> -	 * to look up our filesystem block.
> -	 */
> -	pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> -	error = ops->iomap_begin(inode, pos, PMD_SIZE, flags, &iomap, &srcmap);
> -	if (error)
> -		goto unlock_entry;
> -
> -	if (iomap.offset + iomap.length < pos + PMD_SIZE)
> -		goto finish_iomap;
> +	iter.pos = (loff_t)xas.xa_index << PAGE_SHIFT;
> +	while ((error = iomap_iter(&iter, ops)) > 0) {
> +		if (iomap_length(&iter) < PMD_SIZE)
> +			continue; /* actually breaks out of the loop */
>  
> -	ret = dax_fault_actor(vmf, pfnp, &xas, &entry, true, flags,
> -			      &iomap, &srcmap);
> -
> -finish_iomap:
> -	if (ops->iomap_end) {
> -		int copied = PMD_SIZE;
> -
> -		if (ret == VM_FAULT_FALLBACK)
> -			copied = 0;
> -		/*
> -		 * The fault is done by now and there's no way back (other
> -		 * thread may be already happily using PMD we have installed).
> -		 * Just ignore error from ->iomap_end since we cannot do much
> -		 * with it.
> -		 */
> -		ops->iomap_end(inode, pos, PMD_SIZE, copied, flags, &iomap);
> +		ret = dax_fault_iter(vmf, &iter, pfnp, &xas, &entry, true);
> +		if (ret != VM_FAULT_FALLBACK)
> +			iter.processed = PMD_SIZE;
>  	}
> +
>  unlock_entry:
>  	dax_unlock_entry(&xas, entry);
>  fallback:
>  	if (ret == VM_FAULT_FALLBACK) {
> -		split_huge_pmd(vma, vmf->pmd, vmf->address);
> +		split_huge_pmd(vmf->vma, vmf->pmd, vmf->address);
>  		count_vm_event(THP_FAULT_FALLBACK);
>  	}
>  out:
> -	trace_dax_pmd_fault_done(inode, vmf, max_pgoff, ret);
> +	trace_dax_pmd_fault_done(iter.inode, vmf, max_pgoff, ret);
>  	return ret;
>  }
>  #else
> -- 
> 2.30.2
> 

* Re: [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const
  2021-07-19 10:34 ` [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const Christoph Hellwig
@ 2021-07-19 17:35   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:57PM +0200, Christoph Hellwig wrote:
> __block_write_begin_int never modifies the passed in iomap, so mark it
> const.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/buffer.c   | 4 ++--
>  fs/internal.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 6290c3afdba488..bd6a9e9fbd64c9 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1912,7 +1912,7 @@ EXPORT_SYMBOL(page_zero_new_buffers);
>  
>  static void
>  iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> -		struct iomap *iomap)
> +		const struct iomap *iomap)
>  {
>  	loff_t offset = block << inode->i_blkbits;
>  
> @@ -1966,7 +1966,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>  }
>  
>  int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> -		get_block_t *get_block, struct iomap *iomap)
> +		get_block_t *get_block, const struct iomap *iomap)
>  {
>  	unsigned from = pos & (PAGE_SIZE - 1);
>  	unsigned to = from + len;
> diff --git a/fs/internal.h b/fs/internal.h
> index 3ce8edbaa3ca2f..9ad6b5157584b8 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -48,8 +48,8 @@ static inline int emergency_thaw_bdev(struct super_block *sb)
>  /*
>   * buffer.c
>   */
> -extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> -		get_block_t *get_block, struct iomap *iomap);
> +int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> +		get_block_t *get_block, const struct iomap *iomap);
>  
>  /*
>   * char_dev.c
> -- 
> 2.30.2
> 

* Re: [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const
  2021-07-19 10:34 ` [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const Christoph Hellwig
@ 2021-07-19 17:35   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:58PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index da41f9363568e0..4d63040fd71f56 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1005,7 +1005,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
>  
> -static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
> +static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
>  {
>  	return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
>  }
> -- 
> 2.30.2
> 

* Re: [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const
  2021-07-19 10:34 ` [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const Christoph Hellwig
@ 2021-07-19 17:35   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:59PM +0200, Christoph Hellwig wrote:
> iomap_read_inline_data never modifies the passed in iomap, so mark
> it const.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 75310f6fcf8401..e47380259cf7e1 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -207,7 +207,7 @@ struct iomap_readpage_ctx {
>  
>  static void
>  iomap_read_inline_data(struct inode *inode, struct page *page,
> -		struct iomap *iomap)
> +		const struct iomap *iomap)
>  {
>  	size_t size = i_size_read(inode);
>  	void *addr;
> -- 
> 2.30.2
> 

* Re: [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const
  2021-07-19 10:35 ` [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const Christoph Hellwig
@ 2021-07-19 17:35   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:00PM +0200, Christoph Hellwig wrote:
> iomap_read_page_sync never modifies the passed in iomap, so mark
> it const.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e47380259cf7e1..8c26cf7cbd72b0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -535,7 +535,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  
>  static int
>  iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> -		unsigned plen, struct iomap *iomap)
> +		unsigned plen, const struct iomap *iomap)
>  {
>  	struct bio_vec bvec;
>  	struct bio bio;
> -- 
> 2.30.2
> 

* Re: [PATCH 27/27] iomap: constify iomap_iter_srcmap
  2021-07-19 10:35 ` [PATCH 27/27] iomap: constify iomap_iter_srcmap Christoph Hellwig
@ 2021-07-19 17:44   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:20PM +0200, Christoph Hellwig wrote:
> The srcmap returned from iomap_iter_srcmap is never modified, so mark
> the iomap returned from it const and constify a lot of code that never
> modifies the iomap.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 32 ++++++++++++++++----------------
>  include/linux/iomap.h  |  2 +-
>  2 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index eb5d742b5bf8b7..a2dd42f3115cfa 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -226,20 +226,20 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
>  	SetPageUptodate(page);
>  }
>  
> -static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
> +static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
>  		loff_t pos)
>  {
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  
>  	return srcmap->type != IOMAP_MAPPED ||
>  		(srcmap->flags & IOMAP_F_NEW) ||
>  		pos >= i_size_read(iter->inode);
>  }
>  
> -static loff_t iomap_readpage_iter(struct iomap_iter *iter,
> +static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  		struct iomap_readpage_ctx *ctx, loff_t offset)
>  {
> -	struct iomap *iomap = &iter->iomap;
> +	const struct iomap *iomap = &iter->iomap;
>  	loff_t pos = iter->pos + offset;
>  	loff_t length = iomap_length(iter) - offset;
>  	struct page *page = ctx->cur_page;
> @@ -355,7 +355,7 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpage);
>  
> -static loff_t iomap_readahead_iter(struct iomap_iter *iter,
> +static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
>  		struct iomap_readpage_ctx *ctx)
>  {
>  	loff_t length = iomap_length(iter);
> @@ -539,10 +539,10 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  	return submit_bio_wait(&bio);
>  }
>  
> -static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> +static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  		unsigned len, struct page *page)
>  {
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	struct iomap_page *iop = iomap_page_create(iter->inode, page);
>  	loff_t block_size = i_blocksize(iter->inode);
>  	loff_t block_start = round_down(pos, block_size);
> @@ -580,11 +580,11 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
>  	return 0;
>  }
>  
> -static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
> -		struct page **pagep)
> +static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> +		unsigned len, struct page **pagep)
>  {
>  	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	struct page *page;
>  	int status = 0;
>  
> @@ -655,10 +655,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>  	return copied;
>  }
>  
> -static size_t iomap_write_end_inline(struct iomap_iter *iter, struct page *page,
> -		loff_t pos, size_t copied)
> +static size_t iomap_write_end_inline(const struct iomap_iter *iter,
> +		struct page *page, loff_t pos, size_t copied)
>  {
> -	struct iomap *iomap = &iter->iomap;
> +	const struct iomap *iomap = &iter->iomap;
>  	void *addr;
>  
>  	WARN_ON_ONCE(!PageUptodate(page));
> @@ -678,7 +678,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  		size_t copied, struct page *page)
>  {
>  	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t old_size = iter->inode->i_size;
>  	size_t ret;
>  
> @@ -803,7 +803,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  {
>  	struct iomap *iomap = &iter->iomap;
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	long status = 0;
> @@ -879,7 +879,7 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  {
>  	struct iomap *iomap = &iter->iomap;
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	loff_t written = 0;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 719798814bdfdb..a1fb0d22efbd40 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -193,7 +193,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
>   * for a given operation, which may or may no be identical to the destination
>   * map in &i->iomap.
>   */
> -static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
> +static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
>  {
>  	if (i->srcmap.type != IOMAP_HOLE)
>  		return &i->srcmap;
> -- 
> 2.30.2
> 

* Re: [PATCH 23/27] iomap: rework unshare flag
  2021-07-19 10:35 ` [PATCH 23/27] iomap: rework unshare flag Christoph Hellwig
@ 2021-07-19 17:44   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:16PM +0200, Christoph Hellwig wrote:
> Instead of another internal flags namespace inside of buffered-io.c,
> just pass a UNSHARE hint in the main iomap flags field.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 23 +++++++++--------------
>  include/linux/iomap.h  |  1 +
>  2 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index daabbe8d7edfb5..eb5d742b5bf8b7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -511,10 +511,6 @@ iomap_migrate_page(struct address_space *mapping, struct page *newpage,
>  EXPORT_SYMBOL_GPL(iomap_migrate_page);
>  #endif /* CONFIG_MIGRATION */
>  
> -enum {
> -	IOMAP_WRITE_F_UNSHARE		= (1 << 0),
> -};

Oh good, this finally dies.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> -
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -544,7 +540,7 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  }
>  
>  static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> -		unsigned len, int flags, struct page *page)
> +		unsigned len, struct page *page)
>  {
>  	struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	struct iomap_page *iop = iomap_page_create(iter->inode, page);
> @@ -563,13 +559,13 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
>  		if (plen == 0)
>  			break;
>  
> -		if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
> +		if (!(iter->flags & IOMAP_UNSHARE) &&
>  		    (from <= poff || from >= poff + plen) &&
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
>  		if (iomap_block_needs_zeroing(iter, block_start)) {
> -			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> +			if (WARN_ON_ONCE(iter->flags & IOMAP_UNSHARE))
>  				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
>  		} else {
> @@ -585,7 +581,7 @@ static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
>  }
>  
>  static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
> -		unsigned flags, struct page **pagep)
> +		struct page **pagep)
>  {
>  	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
>  	struct iomap *srcmap = iomap_iter_srcmap(iter);
> @@ -617,7 +613,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
>  	else if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
> -		status = __iomap_write_begin(iter, pos, len, flags, page);
> +		status = __iomap_write_begin(iter, pos, len, page);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -748,7 +744,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			break;
>  		}
>  
> -		status = iomap_write_begin(iter, pos, bytes, 0, &page);
> +		status = iomap_write_begin(iter, pos, bytes, &page);
>  		if (unlikely(status))
>  			break;
>  
> @@ -825,8 +821,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
>  		struct page *page;
>  
> -		status = iomap_write_begin(iter, pos, bytes,
> -				IOMAP_WRITE_F_UNSHARE, &page);
> +		status = iomap_write_begin(iter, pos, bytes, &page);
>  		if (unlikely(status))
>  			return status;
>  
> @@ -854,7 +849,7 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  		.inode		= inode,
>  		.pos		= pos,
>  		.len		= len,
> -		.flags		= IOMAP_WRITE,
> +		.flags		= IOMAP_WRITE | IOMAP_UNSHARE,
>  	};
>  	int ret;
>  
> @@ -871,7 +866,7 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  	unsigned offset = offset_in_page(pos);
>  	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
> -	status = iomap_write_begin(iter, pos, bytes, 0, &page);
> +	status = iomap_write_begin(iter, pos, bytes, &page);
>  	if (status)
>  		return status;
>  
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 2f13e34c2c0b0b..719798814bdfdb 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -122,6 +122,7 @@ struct iomap_page_ops {
>  #define IOMAP_DIRECT		(1 << 4) /* direct I/O */
>  #define IOMAP_NOWAIT		(1 << 5) /* do not block */
>  #define IOMAP_OVERWRITE_ONLY	(1 << 6) /* only pure overwrites allowed */
> +#define IOMAP_UNSHARE		(1 << 7) /* unshare_file_range */
>  
>  struct iomap_ops {
>  	/*
> -- 
> 2.30.2
> 


* Re: [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers
  2021-07-19 10:35 ` [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers Christoph Hellwig
@ 2021-07-19 17:48   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:15PM +0200, Christoph Hellwig wrote:
> Pass the iomap_iter structure instead of individual parameters to
> various internal helpers for buffered I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 117 ++++++++++++++++++++---------------------
>  1 file changed, 56 insertions(+), 61 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index c273b5d88dd8a8..daabbe8d7edfb5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -226,12 +226,14 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
>  	SetPageUptodate(page);
>  }
>  
> -static inline bool iomap_block_needs_zeroing(struct inode *inode,
> -		struct iomap *iomap, loff_t pos)
> +static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
> +		loff_t pos)
>  {
> -	return iomap->type != IOMAP_MAPPED ||
> -		(iomap->flags & IOMAP_F_NEW) ||
> -		pos >= i_size_read(inode);
> +	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +
> +	return srcmap->type != IOMAP_MAPPED ||
> +		(srcmap->flags & IOMAP_F_NEW) ||
> +		pos >= i_size_read(iter->inode);
>  }
>  
>  static loff_t iomap_readpage_iter(struct iomap_iter *iter,
> @@ -259,7 +261,7 @@ static loff_t iomap_readpage_iter(struct iomap_iter *iter,
>  	if (plen == 0)
>  		goto done;
>  
> -	if (iomap_block_needs_zeroing(iter->inode, iomap, pos)) {
> +	if (iomap_block_needs_zeroing(iter, pos)) {
>  		zero_user(page, poff, plen);
>  		iomap_set_range_uptodate(page, poff, plen);
>  		goto done;
> @@ -541,12 +543,12 @@ iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
>  	return submit_bio_wait(&bio);
>  }
>  
> -static int
> -__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> -		struct page *page, struct iomap *srcmap)
> +static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> +		unsigned len, int flags, struct page *page)
>  {
> -	struct iomap_page *iop = iomap_page_create(inode, page);
> -	loff_t block_size = i_blocksize(inode);
> +	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	struct iomap_page *iop = iomap_page_create(iter->inode, page);
> +	loff_t block_size = i_blocksize(iter->inode);
>  	loff_t block_start = round_down(pos, block_size);
>  	loff_t block_end = round_up(pos + len, block_size);
>  	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
> @@ -556,7 +558,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  	ClearPageError(page);
>  
>  	do {
> -		iomap_adjust_read_range(inode, iop, &block_start,
> +		iomap_adjust_read_range(iter->inode, iop, &block_start,
>  				block_end - block_start, &poff, &plen);
>  		if (plen == 0)
>  			break;
> @@ -566,7 +568,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  		    (to <= poff || to >= poff + plen))
>  			continue;
>  
> -		if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> +		if (iomap_block_needs_zeroing(iter, block_start)) {
>  			if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
>  				return -EIO;
>  			zero_user_segments(page, poff, from, to, poff + plen);
> @@ -582,41 +584,40 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
>  	return 0;
>  }
>  
> -static int
> -iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> -		struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> +static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned len,
> +		unsigned flags, struct page **pagep)
>  {
> -	const struct iomap_page_ops *page_ops = iomap->page_ops;
> +	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> +	struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	struct page *page;
>  	int status = 0;
>  
> -	BUG_ON(pos + len > iomap->offset + iomap->length);
> -	if (srcmap != iomap)
> +	BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> +	if (srcmap != &iter->iomap)
>  		BUG_ON(pos + len > srcmap->offset + srcmap->length);
>  
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
>  
>  	if (page_ops && page_ops->page_prepare) {
> -		status = page_ops->page_prepare(inode, pos, len);
> +		status = page_ops->page_prepare(iter->inode, pos, len);
>  		if (status)
>  			return status;
>  	}
>  
> -	page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT,
> -			AOP_FLAG_NOFS);
> +	page = grab_cache_page_write_begin(iter->inode->i_mapping,
> +				pos >> PAGE_SHIFT, AOP_FLAG_NOFS);
>  	if (!page) {
>  		status = -ENOMEM;
>  		goto out_no_page;
>  	}
>  
>  	if (srcmap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, srcmap);
> -	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +		iomap_read_inline_data(iter->inode, page, srcmap);
> +	else if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>  	else
> -		status = __iomap_write_begin(inode, pos, len, flags, page,
> -				srcmap);
> +		status = __iomap_write_begin(iter, pos, len, flags, page);
>  
>  	if (unlikely(status))
>  		goto out_unlock;
> @@ -627,11 +628,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  out_unlock:
>  	unlock_page(page);
>  	put_page(page);
> -	iomap_write_failed(inode, pos, len);
> +	iomap_write_failed(iter->inode, pos, len);
>  
>  out_no_page:
>  	if (page_ops && page_ops->page_done)
> -		page_ops->page_done(inode, pos, 0, NULL);
> +		page_ops->page_done(iter->inode, pos, 0, NULL);
>  	return status;
>  }
>  
> @@ -658,9 +659,10 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>  	return copied;
>  }
>  
> -static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
> -		struct iomap *iomap, loff_t pos, size_t copied)
> +static size_t iomap_write_end_inline(struct iomap_iter *iter, struct page *page,
> +		loff_t pos, size_t copied)
>  {
> +	struct iomap *iomap = &iter->iomap;
>  	void *addr;
>  
>  	WARN_ON_ONCE(!PageUptodate(page));
> @@ -671,26 +673,26 @@ static size_t iomap_write_end_inline(struct inode *inode, struct page *page,
>  	memcpy(iomap->inline_data + pos, addr + pos, copied);
>  	kunmap_atomic(addr);
>  
> -	mark_inode_dirty(inode);
> +	mark_inode_dirty(iter->inode);
>  	return copied;
>  }
>  
>  /* Returns the number of bytes copied.  May be 0.  Cannot be an errno. */
> -static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> -		size_t copied, struct page *page, struct iomap *iomap,
> -		struct iomap *srcmap)
> +static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
> +		size_t copied, struct page *page)
>  {
> -	const struct iomap_page_ops *page_ops = iomap->page_ops;
> -	loff_t old_size = inode->i_size;
> +	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> +	struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	loff_t old_size = iter->inode->i_size;
>  	size_t ret;
>  
>  	if (srcmap->type == IOMAP_INLINE) {
> -		ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> +		ret = iomap_write_end_inline(iter, page, pos, copied);
>  	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
> -		ret = block_write_end(NULL, inode->i_mapping, pos, len, copied,
> -				page, NULL);
> +		ret = block_write_end(NULL, iter->inode->i_mapping, pos, len,
> +				copied, page, NULL);
>  	} else {
> -		ret = __iomap_write_end(inode, pos, len, copied, page);
> +		ret = __iomap_write_end(iter->inode, pos, len, copied, page);
>  	}
>  
>  	/*
> @@ -699,26 +701,24 @@ static size_t iomap_write_end(struct inode *inode, loff_t pos, size_t len,
>  	 * preferably after I/O completion so that no stale data is exposed.
>  	 */
>  	if (pos + ret > old_size) {
> -		i_size_write(inode, pos + ret);
> -		iomap->flags |= IOMAP_F_SIZE_CHANGED;
> +		i_size_write(iter->inode, pos + ret);
> +		iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
>  	}
>  	unlock_page(page);
>  
>  	if (old_size < pos)
> -		pagecache_isize_extended(inode, old_size, pos);
> +		pagecache_isize_extended(iter->inode, old_size, pos);
>  	if (page_ops && page_ops->page_done)
> -		page_ops->page_done(inode, pos, ret, page);
> +		page_ops->page_done(iter->inode, pos, ret, page);
>  	put_page(page);
>  
>  	if (ret < len)
> -		iomap_write_failed(inode, pos, len);
> +		iomap_write_failed(iter->inode, pos, len);
>  	return ret;
>  }
>  
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
> -	struct iomap *srcmap = iomap_iter_srcmap(iter);
> -	struct iomap *iomap = &iter->iomap;
>  	loff_t length = iomap_length(iter);
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
> @@ -748,8 +748,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			break;
>  		}
>  
> -		status = iomap_write_begin(iter->inode, pos, bytes, 0, &page,
> -					   iomap, srcmap);
> +		status = iomap_write_begin(iter, pos, bytes, 0, &page);
>  		if (unlikely(status))
>  			break;
>  
> @@ -758,8 +757,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  
>  		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>  
> -		status = iomap_write_end(iter->inode, pos, bytes, copied, page,
> -					 iomap, srcmap);
> +		status = iomap_write_end(iter, pos, bytes, copied, page);
>  
>  		if (unlikely(copied != status))
>  			iov_iter_revert(i, copied - status);
> @@ -827,13 +825,12 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  		unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
>  		struct page *page;
>  
> -		status = iomap_write_begin(iter->inode, pos, bytes,
> -				IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap);
> +		status = iomap_write_begin(iter, pos, bytes,
> +				IOMAP_WRITE_F_UNSHARE, &page);
>  		if (unlikely(status))
>  			return status;
>  
> -		status = iomap_write_end(iter->inode, pos, bytes, bytes, page, iomap,
> -				srcmap);
> +		status = iomap_write_end(iter, pos, bytes, bytes, page);
>  		if (WARN_ON_ONCE(status == 0))
>  			return -EIO;
>  
> @@ -867,22 +864,21 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -static s64 iomap_zero(struct inode *inode, loff_t pos, u64 length,
> -		struct iomap *iomap, struct iomap *srcmap)
> +static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
>  	struct page *page;
>  	int status;
>  	unsigned offset = offset_in_page(pos);
>  	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
> -	status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, srcmap);
> +	status = iomap_write_begin(iter, pos, bytes, 0, &page);
>  	if (status)
>  		return status;
>  
>  	zero_user(page, offset, bytes);
>  	mark_page_accessed(page);
>  
> -	return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
> +	return iomap_write_end(iter, pos, bytes, bytes, page);
>  }
>  
>  static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> @@ -903,8 +899,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  		if (IS_DAX(iter->inode))
>  			bytes = dax_iomap_zero(pos, length, iomap);
>  		else
> -			bytes = iomap_zero(iter->inode, pos, length, iomap,
> -					   srcmap);
> +			bytes = __iomap_zero_iter(iter, pos, length);
>  		if (bytes < 0)
>  			return bytes;
>  
> -- 
> 2.30.2
> 


* Re: [PATCH 21/27] iomap: remove iomap_apply
  2021-07-19 10:35 ` [PATCH 21/27] iomap: remove iomap_apply Christoph Hellwig
@ 2021-07-19 17:48   ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:14PM +0200, Christoph Hellwig wrote:
> iomap_apply is unused now, so remove it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/Makefile     |  1 -
>  fs/iomap/apply.c      | 99 -------------------------------------------
>  fs/iomap/trace.h      | 40 -----------------
>  include/linux/iomap.h | 10 -----
>  4 files changed, 150 deletions(-)

mmm, negative LOC delta ;)
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  delete mode 100644 fs/iomap/apply.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index 85034deb5a2f19..ebd9866d80ae90 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -9,7 +9,6 @@ ccflags-y += -I $(srctree)/$(src)		# needed for trace events
>  obj-$(CONFIG_FS_IOMAP)		+= iomap.o
>  
>  iomap-y				+= trace.o \
> -				   apply.o \
>  				   iter.o \
>  				   buffered-io.o \
>  				   direct-io.o \
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> deleted file mode 100644
> index 26ab6563181fc6..00000000000000
> --- a/fs/iomap/apply.c
> +++ /dev/null
> @@ -1,99 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (c) 2016-2018 Christoph Hellwig.
> - */
> -#include <linux/module.h>
> -#include <linux/compiler.h>
> -#include <linux/fs.h>
> -#include <linux/iomap.h>
> -#include "trace.h"
> -
> -/*
> - * Execute a iomap write on a segment of the mapping that spans a
> - * contiguous range of pages that have identical block mapping state.
> - *
> - * This avoids the need to map pages individually, do individual allocations
> - * for each page and most importantly avoid the need for filesystem specific
> - * locking per page. Instead, all the operations are amortised over the entire
> - * range of pages. It is assumed that the filesystems will lock whatever
> - * resources they require in the iomap_begin call, and release them in the
> - * iomap_end call.
> - */
> -loff_t
> -iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> -		const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> -{
> -	struct iomap iomap = { .type = IOMAP_HOLE };
> -	struct iomap srcmap = { .type = IOMAP_HOLE };
> -	loff_t written = 0, ret;
> -	u64 end;
> -
> -	trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_);
> -
> -	/*
> -	 * Need to map a range from start position for length bytes. This can
> -	 * span multiple pages - it is only guaranteed to return a range of a
> -	 * single type of pages (e.g. all into a hole, all mapped or all
> -	 * unwritten). Failure at this point has nothing to undo.
> -	 *
> -	 * If allocation is required for this range, reserve the space now so
> -	 * that the allocation is guaranteed to succeed later on. Once we copy
> -	 * the data into the page cache pages, then we cannot fail otherwise we
> -	 * expose transient stale data. If the reserve fails, we can safely
> -	 * back out at this point as there is nothing to undo.
> -	 */
> -	ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap);
> -	if (ret)
> -		return ret;
> -	if (WARN_ON(iomap.offset > pos)) {
> -		written = -EIO;
> -		goto out;
> -	}
> -	if (WARN_ON(iomap.length == 0)) {
> -		written = -EIO;
> -		goto out;
> -	}
> -
> -	trace_iomap_apply_dstmap(inode, &iomap);
> -	if (srcmap.type != IOMAP_HOLE)
> -		trace_iomap_apply_srcmap(inode, &srcmap);
> -
> -	/*
> -	 * Cut down the length to the one actually provided by the filesystem,
> -	 * as it might not be able to give us the whole size that we requested.
> -	 */
> -	end = iomap.offset + iomap.length;
> -	if (srcmap.type != IOMAP_HOLE)
> -		end = min(end, srcmap.offset + srcmap.length);
> -	if (pos + length > end)
> -		length = end - pos;
> -
> -	/*
> -	 * Now that we have guaranteed that the space allocation will succeed,
> -	 * we can do the copy-in page by page without having to worry about
> -	 * failures exposing transient data.
> -	 *
> -	 * To support COW operations, we read in data for partially blocks from
> -	 * the srcmap if the file system filled it in.  In that case we the
> -	 * length needs to be limited to the earlier of the ends of the iomaps.
> -	 * If the file system did not provide a srcmap we pass in the normal
> -	 * iomap into the actors so that they don't need to have special
> -	 * handling for the two cases.
> -	 */
> -	written = actor(inode, pos, length, data, &iomap,
> -			srcmap.type != IOMAP_HOLE ? &srcmap : &iomap);
> -
> -out:
> -	/*
> -	 * Now the data has been copied, commit the range we've copied.  This
> -	 * should not fail unless the filesystem has had a fatal error.
> -	 */
> -	if (ops->iomap_end) {
> -		ret = ops->iomap_end(inode, pos, length,
> -				     written > 0 ? written : 0,
> -				     flags, &iomap);
> -	}
> -
> -	return written ? written : ret;
> -}
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 1012d7af6b689b..f1519f9a140320 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -138,49 +138,9 @@ DECLARE_EVENT_CLASS(iomap_class,
>  DEFINE_EVENT(iomap_class, name,	\
>  	TP_PROTO(struct inode *inode, struct iomap *iomap), \
>  	TP_ARGS(inode, iomap))
> -DEFINE_IOMAP_EVENT(iomap_apply_dstmap);
> -DEFINE_IOMAP_EVENT(iomap_apply_srcmap);
>  DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
>  DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
>  
> -TRACE_EVENT(iomap_apply,
> -	TP_PROTO(struct inode *inode, loff_t pos, loff_t length,
> -		unsigned int flags, const void *ops, void *actor,
> -		unsigned long caller),
> -	TP_ARGS(inode, pos, length, flags, ops, actor, caller),
> -	TP_STRUCT__entry(
> -		__field(dev_t, dev)
> -		__field(u64, ino)
> -		__field(loff_t, pos)
> -		__field(loff_t, length)
> -		__field(unsigned int, flags)
> -		__field(const void *, ops)
> -		__field(void *, actor)
> -		__field(unsigned long, caller)
> -	),
> -	TP_fast_assign(
> -		__entry->dev = inode->i_sb->s_dev;
> -		__entry->ino = inode->i_ino;
> -		__entry->pos = pos;
> -		__entry->length = length;
> -		__entry->flags = flags;
> -		__entry->ops = ops;
> -		__entry->actor = actor;
> -		__entry->caller = caller;
> -	),
> -	TP_printk("dev %d:%d ino 0x%llx pos %lld length %lld flags %s (0x%x) "
> -		  "ops %ps caller %pS actor %ps",
> -		  MAJOR(__entry->dev), MINOR(__entry->dev),
> -		   __entry->ino,
> -		   __entry->pos,
> -		   __entry->length,
> -		   __print_flags(__entry->flags, "|", IOMAP_FLAGS_STRINGS),
> -		   __entry->flags,
> -		   __entry->ops,
> -		   (void *)__entry->caller,
> -		   __entry->actor)
> -);
> -
>  TRACE_EVENT(iomap_iter,
>  	TP_PROTO(struct iomap_iter *iter, const void *ops,
>  		 unsigned long caller),
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index da01226886eca4..2f13e34c2c0b0b 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -199,16 +199,6 @@ static inline struct iomap *iomap_iter_srcmap(struct iomap_iter *i)
>  	return &i->iomap;
>  }
>  
> -/*
> - * Main iomap iterator function.
> - */
> -typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
> -		void *data, struct iomap *iomap, struct iomap *srcmap);
> -
> -loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
> -		unsigned flags, const struct iomap_ops *ops, void *data,
> -		iomap_actor_t actor);
> -
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> -- 
> 2.30.2
> 


* Re: RFC: switch iomap to an iterator model
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (26 preceding siblings ...)
  2021-07-19 10:35 ` [PATCH 27/27] iomap: constify iomap_iter_srcmap Christoph Hellwig
@ 2021-07-19 17:57 ` Darrick J. Wong
  2021-07-27  8:07   ` DAX setup pains, was " Christoph Hellwig
  2021-07-29 20:33 ` Darrick J. Wong
  28 siblings, 1 reply; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-19 17:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:34:53PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series replaces the existing callback-based iomap_apply with an iter based
> model.  The prime aim here is to simplify the DAX reflink support, which
> requires iterating through two inodes, something that is rather painful
> with the apply model.  It also helps to kill an indirect call per segment
> as-is.  Compared to the earlier patchset from Matthew Wilcox that this
> series is based upon it does not eliminate all indirect calls, but as the
> upside it does not change the file systems at all (except for the btrfs
> and gfs2 hooks which have slight prototype changes).

FWIW patches 9-20 look ok to me, modulo the discussion I started in
patch 8 about defining a distinct type for iomap byte lengths instead of
the combination of int/ssize_t/u64 that we use now.

> This passes basic testing on XFS for block based file systems.  The DAX
> changes are entirely untested as I haven't managed to get pmem to work in
> recent qemu.

This gets increasingly difficult as time goes by.

Right now I have the following bits of libvirt xml in the vm
definitions:

  <maxMemory slots='32' unit='KiB'>1073741824</maxMemory>
  <devices>
    <memory model='nvdimm' access='shared'>
      <source>
        <path>/run/g.mem</path>
      </source>
      <target>
        <size unit='KiB'>10487808</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>
  </devices>

Which seems to translate to:

-machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm=on
-object memory-backend-file,id=memnvdimm0,prealloc=no,mem-path=/run/g.mem,share=yes,size=10739515392,align=128M
-device nvdimm,memdev=memnvdimm0,id=nvdimm0,slot=0,label-size=2M

Evidently something was added to the pmem code(?) that makes it fussy if
the memory region doesn't align to a 128M boundary or the label isn't
big enough for ... whatever gets written into them.

The file /run/g.mem is intended to provide 10GB of pmem to the VM, with
an additional 2M allocated for the label.
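
As a sanity check on the numbers above, here is a minimal sketch (the helper
names are made up for illustration, they come from neither libvirt nor qemu)
showing that the libvirt size in KiB and the qemu size in bytes both work out
to 10 GiB of pmem plus a 2 MiB label area:

```c
#include <assert.h>
#include <stdint.h>

#define KIB UINT64_C(1024)
#define MIB (KIB * 1024)
#define GIB (MIB * 1024)

/* Hypothetical helper: bytes needed for the pmem data area plus the label. */
static uint64_t nvdimm_backing_bytes(uint64_t data_bytes, uint64_t label_bytes)
{
	return data_bytes + label_bytes;
}

/* Returns 1 if the two sizes quoted in the mail agree with each other. */
static int nvdimm_sizes_consistent(void)
{
	uint64_t bytes = nvdimm_backing_bytes(10 * GIB, 2 * MIB);

	assert(bytes % KIB == 0);
	return bytes == UINT64_C(10739515392) &&	/* qemu size= */
	       bytes / KIB == UINT64_C(10487808);	/* libvirt <size unit='KiB'> */
}
```

So the backing file has to be created with the full 10739515392 bytes, not
just the 10 GiB that the guest ends up seeing as pmem.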

--D

> Diffstat:
>  b/fs/btrfs/inode.c       |    5 
>  b/fs/buffer.c            |    4 
>  b/fs/dax.c               |  578 ++++++++++++++++++++++-------------------------
>  b/fs/gfs2/bmap.c         |    5 
>  b/fs/internal.h          |    4 
>  b/fs/iomap/Makefile      |    2 
>  b/fs/iomap/buffered-io.c |  344 +++++++++++++--------------
>  b/fs/iomap/direct-io.c   |  162 ++++++-------
>  b/fs/iomap/fiemap.c      |  101 +++-----
>  b/fs/iomap/iter.c        |   74 ++++++
>  b/fs/iomap/seek.c        |   88 +++----
>  b/fs/iomap/swapfile.c    |   38 +--
>  b/fs/iomap/trace.h       |   35 +-
>  b/include/linux/iomap.h  |   73 ++++-
>  fs/iomap/apply.c         |   99 --------
>  15 files changed, 777 insertions(+), 835 deletions(-)


* Re: [PATCH 08/27] iomap: add the new iomap_iter model
  2021-07-19 10:35 ` [PATCH 08/27] iomap: add the new iomap_iter model Christoph Hellwig
  2021-07-19 16:56   ` Darrick J. Wong
@ 2021-07-19 21:48   ` Dave Chinner
  2021-07-26  8:17     ` Christoph Hellwig
  1 sibling, 1 reply; 59+ messages in thread
From: Dave Chinner @ 2021-07-19 21:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:01PM +0200, Christoph Hellwig wrote:
> The iomap_iter struct provides a convenient way to package up and
> maintain all the arguments to the various mapping and operation
> functions.  It is operated on using the iomap_iter() function that
> is called in a loop until the whole range has been processed.  Compared
> to the existing iomap_apply() function this avoids an indirect call
> for each iteration.
> 
> For now iomap_iter() calls back into the existing ->iomap_begin and
> ->iomap_end methods, but in the future this could be further optimized
> to avoid indirect calls entirely.
> 
> Based on an earlier patch from Matthew Wilcox <willy@infradead.org>.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/Makefile     |  1 +
>  fs/iomap/iter.c       | 74 +++++++++++++++++++++++++++++++++++++++++++
>  fs/iomap/trace.h      | 37 +++++++++++++++++++++-
>  include/linux/iomap.h | 56 ++++++++++++++++++++++++++++++++
>  4 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 fs/iomap/iter.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index eef2722d93a183..85034deb5a2f19 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)		+= iomap.o
>  
>  iomap-y				+= trace.o \
>  				   apply.o \
> +				   iter.o \

Can we break this cycle of creating new files and removing old files
when changing the iomap core code? It breaks the ability to troll
git history easily through git blame and other techniques that are
file based.

If we are going to create a new file, then the core iomap code that
everything depends on should just be in a neutrally named file such
as "iomap.c" so that we don't need to play these games in future.

....

> +/**
> + * iomap_iter - iterate over ranges in a file
> + * @iter: iteration structure
> + * @ops: iomap ops provided by the file system
> + *
> + * Iterate over file system provided contiguous ranges of blocks with the same
> + * state.  Should be called in a loop that continues as long as this function
> + * returns a positive value.  If 0 or a negative value is returned the caller
> + * should break out of the loop - a negative value is an error either from the
> + * file system or from the last iteration stored in @iter.copied.
> + */
> +int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> +{

We should avoid namespace conflicts where function names shadow
object types. iomap_iterate() is fine as the function name - there's
no need for abbreviation here because it's not an overly long name.
This will make it clearly different to the struct iomap_iter that
is passed to it and it will also make grep, cscope and other
code searching tools much more precise...
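
For anyone skimming the thread, the calling convention described in the
kernel-doc comment above (loop while the return value is positive, stop on 0
or a negative value) can be sketched with a small userspace mock.  Everything
here, including struct mock_iter, mock_iterate() and the fixed 4 KiB extent
size, is invented for illustration and is not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mock, NOT the kernel's struct iomap_iter. */
struct mock_iter {
	int64_t pos;		/* current position in the range */
	int64_t len;		/* bytes left to process */
	int64_t seg_len;	/* length of the segment mapped for this pass */
	int64_t result;		/* bytes handled by the last pass, or -errno */
};

#define MOCK_EXTENT 4096	/* pretend every mapping covers at most 4 KiB */

/*
 * Mock of the calling convention: consume the result of the previous pass,
 * then "map" the next segment.  Returns 1 to continue, 0 when the range is
 * finished, and a negative value on error.
 */
static int mock_iterate(struct mock_iter *it)
{
	if (it->result < 0)
		return (int)it->result;
	it->pos += it->result;
	it->len -= it->result;
	it->result = 0;
	if (it->len <= 0)
		return 0;
	it->seg_len = it->len < MOCK_EXTENT ? it->len : MOCK_EXTENT;
	return 1;
}

/* Walk [pos, pos + len) and count how many segments were visited. */
static int mock_count_segments(int64_t pos, int64_t len)
{
	struct mock_iter it = { .pos = pos, .len = len };
	int ret, segments = 0;

	while ((ret = mock_iterate(&it)) > 0) {
		segments++;
		it.result = it.seg_len;	/* the loop body "processes" it */
	}
	assert(ret < 0 || it.len == 0);
	return ret < 0 ? ret : segments;
}
```

The per-segment work sits directly in the loop body instead of behind an
actor callback, which is where the saved indirect call comes from.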

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 20/27] fsdax: switch dax_iomap_rw to use iomap_iter
  2021-07-19 10:35 ` [PATCH 20/27] fsdax: switch dax_iomap_rw " Christoph Hellwig
@ 2021-07-19 22:10   ` Dave Chinner
  2021-07-26  8:25     ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Dave Chinner @ 2021-07-19 22:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 12:35:13PM +0200, Christoph Hellwig wrote:
> Switch the dax_iomap_rw implementation to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c | 49 ++++++++++++++++++++++++-------------------------
>  1 file changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 4d63040fd71f56..51da45301350a6 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1103,20 +1103,21 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
>  	return size;
>  }
>  
> -static loff_t
> -dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> -		struct iomap *iomap, struct iomap *srcmap)
> +static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> +		struct iov_iter *iter)

At first I wondered "iomi? Strange name, why is this one-off name
used?" and then I realised it's because this function also takes a
struct iov_iter named "iter".

That's going to cause confusion in the long run - iov_iter and
iomap_iter both being generally named "iter", and then one or the
other randomly changing when both are used in the same function.

Would it be better to avoid any possible confusion simply by using
"iomi" for all iomap_iter variables throughout the patchset from the
start? That way nobody is going to confuse iov_iter with iomap_iter
iteration variables and code that uses both types will naturally
have different, well known names...
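
A rough sketch of what that convention looks like in a function that takes
both iterators.  The two structs are cut-down stand-ins and
mock_dax_rw_len() is a made-up helper, so treat this purely as an
illustration of the naming, not as code from the series:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the two kernel types; only the fields this sketch needs. */
struct mock_iomap_iter { long long pos; long long len; };
struct mock_iov_iter { size_t count; };

/*
 * With fixed short names there is no ambiguity about which iterator is
 * meant: "iomi" is always the mapping iterator, "iter" the I/O vector.
 * Returns how many bytes can be transferred in the current mapping.
 */
static long long mock_dax_rw_len(const struct mock_iomap_iter *iomi,
		const struct mock_iov_iter *iter)
{
	long long want = (long long)iter->count;

	assert(iomi->len >= 0);
	return want < iomi->len ? want : iomi->len;
}
```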

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const
  2021-07-19 16:08   ` Darrick J. Wong
@ 2021-07-20  9:52     ` Nikolay Borisov
  2021-07-26  8:12     ` Christoph Hellwig
  1 sibling, 0 replies; 59+ messages in thread
From: Nikolay Borisov @ 2021-07-20  9:52 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel



On 19.07.21 19:08, Darrick J. Wong wrote:
> On Mon, Jul 19, 2021 at 12:34:56PM +0200, Christoph Hellwig wrote:
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> /me wonders, does this have any significant effect on the generated
> code?

https://theartofmachinery.com/2019/08/12/c_const_isnt_for_performance.html

> 
> It's probably a good idea to feed the optimizer as much usage info as we
> can, though I would imagine that for such a simple function it can
> probably tell that we don't change *iomap.
> 
> IMHO, constifying functions is a good way to signal to /programmers/
> that they're not intended to touch the arguments, so
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> --D
> 
>> ---
>>  include/linux/iomap.h | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>> index 093519d91cc9cc..f9c36df6a3061b 100644
>> --- a/include/linux/iomap.h
>> +++ b/include/linux/iomap.h
>> @@ -91,8 +91,7 @@ struct iomap {
>>  	const struct iomap_page_ops *page_ops;
>>  };
>>  
>> -static inline sector_t
>> -iomap_sector(struct iomap *iomap, loff_t pos)
>> +static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
>>  {
>>  	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>>  }
>> -- 
>> 2.30.2
>>
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const
  2021-07-19 16:08   ` Darrick J. Wong
  2021-07-20  9:52     ` Nikolay Borisov
@ 2021-07-26  8:12     ` Christoph Hellwig
  1 sibling, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 09:08:20AM -0700, Darrick J. Wong wrote:
> IMHO, constifying functions is a good way to signal to /programmers/
> that they're not intended to touch the arguments, so

Yes, that is the point here.  Basically the iomap and iter should
be pretty much const, and we almost get there except for the odd
size changed flag for gfs2.
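
As a userspace sketch of what the const qualifier communicates here:
the struct below is a cut-down stand-in rather than the real struct
iomap, the _model/_demo names are made up for illustration, and
SECTOR_SHIFT is assumed to be 9.  The helper only reads through the
pointer, and const lets the compiler reject any accidental write.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumed: 512-byte sectors */

/* Cut-down stand-in for struct iomap; only the fields used below. */
struct iomap_model {
	uint64_t addr;		/* disk offset of the mapping, in bytes */
	int64_t offset;		/* file offset of the mapping, in bytes */
};

/* const documents (and enforces) that the helper only reads *iomap. */
static inline uint64_t
iomap_sector_model(const struct iomap_model *iomap, int64_t pos)
{
	assert(pos >= iomap->offset);
	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
}

/* Mapping at disk byte 1 MiB that starts at file offset 4096. */
uint64_t iomap_sector_demo(void)
{
	const struct iomap_model map = { .addr = 1 << 20, .offset = 4096 };

	return iomap_sector_model(&map, 8192);
}
```

Adding "iomap->addr = 0;" to iomap_sector_model() would fail to build,
which is the signal to programmers mentioned above.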

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/27] iomap: add the new iomap_iter model
  2021-07-19 16:56   ` Darrick J. Wong
@ 2021-07-26  8:15     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 09:56:00AM -0700, Darrick J. Wong wrote:
> Linus previously complained to me about filesystem code (especially
> iomap since it was "newer") (ab)using loff_t variables to store the
> lengths of byte ranges.  It was "loff_t length;" (or so willy
> recollects) that tripped him up.
> 
> ISTR he also said we should use size_t for all lengths because nobody
> should do operations larger than ~2G, but I reject that because iomap
> has users that iterate large ranges of data without generating any IO
> (e.g. fiemap, seek, swapfile activation).
> 
> So... rather than confusing things even more by mixing u64 and ssize_t
> for lengths, can we introduce a new 64-bit length typedef for iomap?
> Last summer, Dave suggested[1] something like:
> 
> 	typedef long long lsize_t;
> 
> That would enable cleanup of all the count/size/length parameters in
> fs/remap_range.c and fs/xfs/xfs_reflink.c to use the new 64-bit length
> type, since those operations have never been limited to 32-bit sizes.

I'd rather avoid playing guinea pig for a somewhat odd new type.  For
now I've switched it to the loff_t as that matches the rest of iomap.
If we switch away either to a new type or s64/u64 we should probably do
it as a big sweep.
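
For reference, a small sketch of why a 32-bit length type is too narrow
for the non-I/O users mentioned above.  It models a 32-bit size_t with
uint32_t so that it behaves the same on any host; the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the length as seen through a 32-bit unsigned length type. */
uint32_t length_as_u32(int64_t length)
{
	assert(length >= 0);
	return (uint32_t)length;	/* silently drops the upper 32 bits */
}

/* Nonzero if a byte count survives a round trip through 32 bits. */
int length_fits_u32(int64_t length)
{
	return (int64_t)length_as_u32(length) == length;
}
```

An 8 GiB range, which is unremarkable for fiemap or seek, comes out as
length 0 here, while a 64-bit type such as loff_t carries it unchanged.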

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/27] iomap: add the new iomap_iter model
  2021-07-19 21:48   ` Dave Chinner
@ 2021-07-26  8:17     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:17 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Darrick J. Wong, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Tue, Jul 20, 2021 at 07:48:38AM +1000, Dave Chinner wrote:
> We should avoid namespace conflicts where function names shadow
> object types. iomap_iterate() is fine as the function name - there's
> no need for abbreviation here because it's not an overly long name.
> This will makes it clearly different to the struct iomap_iter that
> is passed to it and it will also make grep, cscope and other
> code searching tools much more precise...

Well, there isn't really a conflict by definition.  I actually like
this choice of names (stolen from the original patch from willy)
as it clearly indicates they go together.

But I'm happy to collect a few more opinions.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-19 17:05   ` Darrick J. Wong
@ 2021-07-26  8:19     ` Christoph Hellwig
  2021-07-26 16:39       ` Darrick J. Wong
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:19 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 10:05:45AM -0700, Darrick J. Wong wrote:
> >  	bno = 0;
> > -	ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno,
> > -			  iomap_bmap_actor);
> > +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> > +		if (iter.iomap.type != IOMAP_MAPPED)
> > +			continue;
> 
> There isn't a mapped extent, so return 0 here, right?

We can't just return 0; we always need the final iomap_iter() call
to clean up in case a ->iomap_end method is supplied.  Now for bmap
having and needing one is rather theoretical, but people will copy
and paste that once we start breaking the rules.
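
A rough userspace model of the rule, for illustration only.  None of
the model_* or demo_* names exist in the kernel, and the real
iomap_iter() does more than this.  The point is just that the cleanup
for a mapping runs at the start of the next call, so leaving the loop
early skips it.

```c
#include <assert.h>

struct model_iter {
	long len;	/* bytes left in the requested range */
	long mapped;	/* length of the current mapping, 0 before the first */
	long processed;	/* set by the loop body; <= 0 stops the iteration */
	int begun;	/* number of begin calls, for the demo */
	int ended;	/* number of end calls, for the demo */
};

static int model_begin(struct model_iter *iter)
{
	iter->begun++;
	iter->mapped = iter->len;	/* pretend one mapping covers the rest */
	return 0;
}

static void model_end(struct model_iter *iter)
{
	iter->ended++;
}

/* Returns 1 while the body should run, 0 when done, negative on error. */
static int model_iter_next(struct model_iter *iter)
{
	if (iter->mapped) {
		assert(iter->processed <= iter->mapped);
		model_end(iter);	/* cleanup for the previous mapping */
		if (iter->processed <= 0)
			return (int)iter->processed;	/* body asked to stop */
		iter->len -= iter->processed;
		if (!iter->len)
			return 0;
	}
	iter->processed = 0;
	iter->mapped = 0;
	return model_begin(iter) < 0 ? -1 : 1;
}

/* "continue" without setting processed: the next call cleans up and stops. */
int demo_continue(void)
{
	struct model_iter iter = { .len = 4096 };

	while (model_iter_next(&iter) > 0)
		continue;
	return iter.begun - iter.ended;
}

/* A bare "break" skips the final call, so the end hook never runs. */
int demo_break(void)
{
	struct model_iter iter = { .len = 4096 };

	while (model_iter_next(&iter) > 0)
		break;
	return iter.begun - iter.ended;
}
```

demo_continue() ends with begin and end calls paired up, while
demo_break() is left with one unmatched begin.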

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter
  2021-07-19 17:22   ` Darrick J. Wong
@ 2021-07-26  8:22     ` Christoph Hellwig
  2021-07-26 16:41       ` Darrick J. Wong
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 19, 2021 at 10:22:47AM -0700, Darrick J. Wong wrote:
> > -static loff_t
> > -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> > -		      void *data, struct iomap *iomap, struct iomap *srcmap)
> > +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t *pos)
> 
> /me wonders if @pos should be named hole_pos (here and in the caller) to
> make it a little easier to read...

Sure.

> ...because what we're really saying here is that if seek_hole_iter found
> a hole (and returned zero, thereby terminating the loop before iter.len
> could reach zero), we want to return the position of the hole.

Yes.

> > +	return size;
> 
> Not sure why we return size here...?  Oh, because there's an implicit
> hole at EOF, so we return i_size.  Uh, does this do the right thing if
> ->iomap_begin returns posteof mappings?  I don't see anything in
> iomap_iter_advance that would stop iteration at EOF.

Nothing in ->iomap_begin checks that; iomap_seek_hole initializes
iter.len so that it stops at EOF.
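
Sketched in userspace, assuming the length is simply i_size minus the
start offset (the seek_iter struct and function names are made up for
this example):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the iterator fields that matter here. */
struct seek_iter {
	int64_t pos;	/* current file offset */
	int64_t len;	/* bytes left to walk; the loop ends when it hits 0 */
};

/* Bound the walk by i_size so mappings past EOF are never visited. */
static struct seek_iter seek_iter_init(int64_t offset, int64_t i_size)
{
	struct seek_iter iter = { .pos = offset, .len = i_size - offset };

	assert(offset >= 0 && offset < i_size);
	return iter;
}

/*
 * Walk data-only mappings of map_len bytes each.  No hole is ever
 * found, so the result is the implicit hole at EOF.
 */
int64_t seek_hole_model(int64_t offset, int64_t i_size, int64_t map_len)
{
	struct seek_iter iter = seek_iter_init(offset, i_size);

	assert(map_len > 0);
	while (iter.len > 0) {
		/* A mapping may run past EOF; only consume what is in range. */
		int64_t step = map_len < iter.len ? map_len : iter.len;

		iter.pos += step;
		iter.len -= step;
	}
	return iter.pos;
}
```

Even when a single mapping extends well past i_size, the walk ends
exactly at i_size because the remaining length reaches zero there.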

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 20/27] fsdax: switch dax_iomap_rw to use iomap_iter
  2021-07-19 22:10   ` Dave Chinner
@ 2021-07-26  8:25     ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-26  8:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Darrick J. Wong, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Tue, Jul 20, 2021 at 08:10:05AM +1000, Dave Chinner wrote:
> At first I wondered "iomi? Strange name, why is this one-off name
> used?" and then I realised it's because this function also takes a
> struct iov_iter named "iter".
> 
> That's going to cause confusion in the long run - iov_iter and
> iomap_iter both being generally named "iter", and then one or the
> other randomly changing when both are used in the same function.
> 
> Would it be better to avoid any possible confusion simply by using
> "iomi" for all iomap_iter variables throughout the patchset from the
> start? That way nobody is going to confuse iov_iter with iomap_iter
> iteration variables and code that uses both types will naturally
> have different, well known names...

Hmm.  iomi comes from the original patch from willy and I kinda hate
it.  But given that we have this clash here (and in the direct I/O code)
I kept using it.

Does anyone have any strong opinions here?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-26  8:19     ` Christoph Hellwig
@ 2021-07-26 16:39       ` Darrick J. Wong
  2021-07-27  6:31         ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-26 16:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 26, 2021 at 10:19:42AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 19, 2021 at 10:05:45AM -0700, Darrick J. Wong wrote:
> > >  	bno = 0;
> > > -	ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno,
> > > -			  iomap_bmap_actor);
> > > +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> > > +		if (iter.iomap.type != IOMAP_MAPPED)
> > > +			continue;
> > 
> > There isn't a mapped extent, so return 0 here, right?
> 
> We can't just return 0; we always need the final iomap_iter() call
> to clean up in case a ->iomap_end method is supplied.  Now for bmap
> having and needing one is rather theoretical, but people will copy
> and paste that once we start breaking the rules.

Oh, right, I forgot that someone might want to ->iomap_end.  The
"continue" works because we only asked for one block, therefore we know
that we'll never get to the loop body a second time; and we ignore
iter.processed, which also means we never revisit the loop body.

This "continue without setting iter.processed to break out of loop"
pattern is a rather indirect subtlety, since C programmers are taught
that they can break out of a loop using break;.  This new iomap_iter
pattern fubars that longstanding language feature, and the language
around it is soft:

> /**
>  * iomap_iter - iterate over ranges in a file
>  * @iter: iteration structure
>  * @ops: iomap ops provided by the file system
>  *
>  * Iterate over file system provided contiguous ranges of blocks with the same
>  * state.  Should be called in a loop that continues as long as this function
>  * returns a positive value.  If 0 or a negative value is returned the caller
>  * should break out of the loop - a negative value is an error either from the
>  * file system or from the last iteration stored in @iter.copied.
>  */

The documentation needs to be much more explicit about the fact that you
cannot "break;" your way out of an iomap_iter loop.  I think the comment
should be rewritten along these lines:

"Iterate over filesystem-provided space mappings for the provided file
range.  This function handles cleanup of resources acquired for
iteration when the filesystem indicates there are no more space
mappings, which means that this function must be called in a loop that
continues as long as it returns a positive value.  If 0 or a negative value
is returned, the caller must not return to the loop body.  Within a loop
body, there are two ways to break out of the loop body: leave
@iter.processed unchanged, or set it to the usual negative errno."

Hm.

What if we provide an explicit loop break function?  That would be clear
overkill for bmap, but somebody else wanting to break out of a more
complex loop body ought to be able to say "break" to do that, not
"continue with subtleties".

static inline int
iomap_iter_break(struct iomap_iter *iter, const struct iomap_ops *ops, int ret)
{
	int ret2;

	/* nothing was mapped, or the filesystem has no cleanup to run */
	if (!iter->iomap.length || !ops->iomap_end)
		return ret;

	ret2 = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
			0, iter->flags, &iter->iomap);
	/* an error from the loop body takes precedence over ->iomap_end */
	return ret ? ret : ret2;
}

And then the theoretical loop body becomes:

	while ((ret = iomap_iter(&iter, ops)) > 0) {
		if (iter.iomap.type != WHAT_I_WANT) {
			ret = iomap_iter_break(&iter, ops, 0);
			break;
		}

		<large blob of code here>

		ret = vfs_do_some_risky_thing(...);
		if (ret) {
			ret = iomap_iter_break(&iter, ops, ret);
			break;
		}

		<more loop body here>

		iter.processed = iter.iomap.length;
	}
	return ret;

Clunky, for sure, but at least we still get to use break as the language
designers intended.

--D

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter
  2021-07-26  8:22     ` Christoph Hellwig
@ 2021-07-26 16:41       ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-26 16:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 26, 2021 at 10:22:36AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 19, 2021 at 10:22:47AM -0700, Darrick J. Wong wrote:
> > > -static loff_t
> > > -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> > > -		      void *data, struct iomap *iomap, struct iomap *srcmap)
> > > +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t *pos)
> > 
> > /me wonders if @pos should be named hole_pos (here and in the caller) to
> > make it a little easier to read...
> 
> Sure.
> 
> > ...because what we're really saying here is that if seek_hole_iter found
> > a hole (and returned zero, thereby terminating the loop before iter.len
> > could reach zero), we want to return the position of the hole.
> 
> Yes.
> 
> > > +	return size;
> > 
> > Not sure why we return size here...?  Oh, because there's an implicit
> > hole at EOF, so we return i_size.  Uh, does this do the right thing if
> > ->iomap_begin returns posteof mappings?  I don't see anything in
> > iomap_iter_advance that would stop iteration at EOF.
> 
> Nothing in ->iomap_begin checks that, iomap_seek_hole initializes
> iter.len so that it stops at EOF.

Oh, right.  Sorry, I forgot that. :(

--D

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-26 16:39       ` Darrick J. Wong
@ 2021-07-27  6:31         ` Christoph Hellwig
  2021-07-27 14:32           ` Darrick J. Wong
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-27  6:31 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dan Williams, Matthew Wilcox,
	Andreas Gruenbacher, Shiyang Ruan, linux-xfs, linux-fsdevel,
	linux-btrfs, nvdimm, cluster-devel

On Mon, Jul 26, 2021 at 09:39:22AM -0700, Darrick J. Wong wrote:
> The documentation needs to be much more explicit about the fact that you
> cannot "break;" your way out of an iomap_iter loop.  I think the comment
> should be rewritten along these lines:
> 
> "Iterate over filesystem-provided space mappings for the provided file
> range.  This function handles cleanup of resources acquired for
> iteration when the filesystem indicates there are no more space
> mappings, which means that this function must be called in a loop that
> continues as long as it returns a positive value.  If 0 or a negative value
> is returned, the caller must not return to the loop body.  Within a loop
> body, there are two ways to break out of the loop body: leave
> @iter.processed unchanged, or set it to the usual negative errno."
> 
> Hm.

Yes, I'll update the documentation.

> Clunky, for sure, but at least we still get to use break as the language
> designers intended.

I can't see any advantage there over just proper documentation.  If you
are totally attached to a working break we might have to come up with
a nasty for_each macro that ensures we have a final iomap_iter, but I
doubt it is worth the effort.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* DAX setup pains, was Re: RFC: switch iomap to an iterator model
  2021-07-19 17:57 ` RFC: switch iomap to an iterator model Darrick J. Wong
@ 2021-07-27  8:07   ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2021-07-27  8:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dan Williams, nvdimm

On Mon, Jul 19, 2021 at 10:57:56AM -0700, Darrick J. Wong wrote:
> Which seems to translate to:
> 
> -machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm=on
> -object memory-backend-file,id=memnvdimm0,prealloc=no,mem-path=/run/g.mem,share=yes,size=10739515392,align=128M
> -device nvdimm,memdev=memnvdimm0,id=nvdimm0,slot=0,label-size=2M
> 
> Evidently something was added to the pmem code(?) that makes it fussy if
> the memory region doesn't align to a 128M boundary or the label isn't
> big enough for ... whatever gets written into them.
> 
> The file /run/g.mem is intended to provide 10GB of pmem to the VM, with
> an additional 2M allocated for the label.

I managed to get something like this to work, and had two pmem devices
show up.  But of course they don't actually support DAX without a
reconfiguration in the VM, and the #$%$@^$^$ DAX code won't even
tell you why, as the printk for that is a pr_debug (patch to fix
this coming).  After a fair amount of googling I tried to copy this
command line to reconfigure them:

$NDCTL create-namespace --force --reconfig=namespace0.0 --mode=fsdax --map=mem
$NDCTL create-namespace --force --reconfig=namespace1.0 --mode=fsdax --map=mem

Of course that fails with EINVAL.  And after the first run the second
namespace is gone entirely.  The DAX user story is just a trainwreck.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter
  2021-07-27  6:31         ` Christoph Hellwig
@ 2021-07-27 14:32           ` Darrick J. Wong
  0 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-27 14:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel

On Tue, Jul 27, 2021 at 08:31:38AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 26, 2021 at 09:39:22AM -0700, Darrick J. Wong wrote:
> > The documentation needs to be much more explicit about the fact that you
> > cannot "break;" your way out of an iomap_iter loop.  I think the comment
> > should be rewritten along these lines:
> > 
> > "Iterate over filesystem-provided space mappings for the provided file
> > range.  This function handles cleanup of resources acquired for
> > iteration when the filesystem indicates there are no more space
> > mappings, which means that this function must be called in a loop that
> > continues as long as it returns a positive value.  If 0 or a negative value
> > is returned, the caller must not return to the loop body.  Within a loop
> > body, there are two ways to break out of the loop body: leave
> > @iter.processed unchanged, or set it to the usual negative errno."
> > 
> > Hm.
> 
> Yes, I'll update the documentation.

Ok, thanks!

> > Clunky, for sure, but at least we still get to use break as the language
> > designers intended.
> 
> I can't see any advantage there over just proper documentation.  If you
> are totally attached to a working break we might have to come up with
> a nasty for_each macro that ensures we have a final iomap_apply, but I
> doubt it is worth the effort.

I was pushing the explicit _break() function as a means to avoid an even
fuglier loop macro.

--D

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: RFC: switch iomap to an iterator model
  2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
                   ` (27 preceding siblings ...)
  2021-07-19 17:57 ` RFC: switch iomap to an iterator model Darrick J. Wong
@ 2021-07-29 20:33 ` Darrick J. Wong
  28 siblings, 0 replies; 59+ messages in thread
From: Darrick J. Wong @ 2021-07-29 20:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Matthew Wilcox, Andreas Gruenbacher, Shiyang Ruan,
	linux-xfs, linux-fsdevel, linux-btrfs, nvdimm, cluster-devel,
	Jan Kara

On Mon, Jul 19, 2021 at 12:34:53PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series replaces the existing callback-based iomap_apply with an iter based
> model.  The prime aim here is to simplify the DAX reflink support, which

Jan Kara pointed out that recent gcc and clang support a magic attribute
that causes a cleanup function to be called when an automatic variable
goes out of scope.  I've ported the XFS for_each_perag* macros to use
it, but I think this would be roughly (totally untested) what you'd do
for iomap iterators:

/* automatic iteration cleanup via macro hell */
struct iomap_iter_cleanup {
	const struct iomap_ops	*ops;
	struct iomap_iter	*iter;
	loff_t			*ret;
};

static inline void iomap_iter_cleanup(struct iomap_iter_cleanup *ic)
{
	struct iomap_iter *iter = ic->iter;
	int ret2 = 0;

	if (!iter->iomap.length || !ic->ops->iomap_end)
		return;

	ret2 = ic->ops->iomap_end(iter->inode, iter->pos,
			iomap_length(iter), 0, iter->flags,
			&iter->iomap);

	if (ret2 && *ic->ret == 0)
		*ic->ret = ret2;

	iter->iomap.length = 0;
}

/* parameters are underscored so they don't clobber the field designators */
#define IOMAP_ITER_CLEANUP(_iter, _ops, _ret)	\
	struct iomap_iter_cleanup __iomap_iter_cleanup \
			__attribute__((__cleanup__(iomap_iter_cleanup))) = \
			{ .iter = (_iter), .ops = (_ops), .ret = &(_ret) }

#define for_each_iomap(iter, ops, ret) \
	(ret) = iomap_iter((iter), (ops)); \
	for (IOMAP_ITER_CLEANUP(iter, ops, ret); \
		(ret) > 0; \
		(ret) = iomap_iter((iter), (ops)))

Then we actually /can/ write our iteration loops in the normal C style:

	struct iomap_iter iter = {
		.inode = ...,
		.pos = 0,
		.length = 32768,
	};
	loff_t ret = 0;

	for_each_iomap(&iter, ops, ret) {
		if (iter.iomap.type != WHAT_I_WANT)
			break;

		ret = am_i_pissed_off(...);
		if (ret)
			return ret;
	}

	return ret;

and ->iomap_end will always get called.  There are a few sharp edges:

I can't figure out how far back clang and gcc support this attribute.
The gcc docs mention it at least as far back as 3.3.6.  clang (afaict) docs
don't reference it directly, but the clang 4 docs claim that it can be
as pedantic as gcc w.r.t. attribute use.  That's more than new enough
for upstream, which requires gcc 4.9 or clang 10.

The /other/ problem is that gcc gets fussy about defining variables
inside the for loop parentheses, which means that any code using it has
to compile with -std=gnu99, which is /not/ the usual c89 that the kernel
uses.  OTOH, it's been 22 years since C99 was ratified, c'mon...
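
A standalone userspace check of the attribute's behaviour, for anyone
who wants to try it outside the kernel.  The guard struct and function
names are hypothetical, and it needs gcc or clang in a C99-or-later
mode because of the declaration inside the for statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard; stands in for the iomap_iter_cleanup idea above. */
struct guard {
	int *released;	/* counter bumped when the guard goes out of scope */
};

static void guard_release(struct guard *g)
{
	assert(g->released != NULL);
	(*g->released)++;
}

/* Returns how many times the cleanup ran for a loop left via break. */
int cleanup_on_break(void)
{
	int released = 0;

	for (struct guard g __attribute__((__cleanup__(guard_release))) =
			{ .released = &released }; ; ) {
		break;	/* leaving the scope of g triggers guard_release() */
	}
	return released;
}

static int early_return(int *released)
{
	struct guard g __attribute__((__cleanup__(guard_release))) =
			{ .released = released };

	return -5;	/* the return value is computed before cleanup runs */
}

/* Returns how many times the cleanup ran for an early return. */
int cleanup_on_return(void)
{
	int released = 0;
	int ret = early_return(&released);

	return ret == -5 ? released : -1;
}
```

Both exits run the cleanup handler exactly once, which is the property
the for_each_iomap sketch relies on to get ->iomap_end called.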

--D

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2021-07-29 20:33 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
2021-07-19 10:34 RFC: switch iomap to an iterator model Christoph Hellwig
2021-07-19 10:34 ` [PATCH 01/27] iomap: fix a trivial comment typo in trace.h Christoph Hellwig
2021-07-19 16:00   ` Darrick J. Wong
2021-07-19 10:34 ` [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done} Christoph Hellwig
2021-07-19 16:04   ` Darrick J. Wong
2021-07-19 10:34 ` [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const Christoph Hellwig
2021-07-19 16:08   ` Darrick J. Wong
2021-07-20  9:52     ` Nikolay Borisov
2021-07-26  8:12     ` Christoph Hellwig
2021-07-19 10:34 ` [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const Christoph Hellwig
2021-07-19 17:35   ` Darrick J. Wong
2021-07-19 10:34 ` [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const Christoph Hellwig
2021-07-19 17:35   ` Darrick J. Wong
2021-07-19 10:34 ` [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const Christoph Hellwig
2021-07-19 17:35   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const Christoph Hellwig
2021-07-19 17:35   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 08/27] iomap: add the new iomap_iter model Christoph Hellwig
2021-07-19 16:56   ` Darrick J. Wong
2021-07-26  8:15     ` Christoph Hellwig
2021-07-19 21:48   ` Dave Chinner
2021-07-26  8:17     ` Christoph Hellwig
2021-07-19 10:35 ` [PATCH 09/27] iomap: switch readahead and readpage to use iomap_iter Christoph Hellwig
2021-07-19 10:35 ` [PATCH 10/27] iomap: switch iomap_file_buffered_write " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 11/27] iomap: switch iomap_file_unshare " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 12/27] iomap: switch iomap_zero_range " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 13/27] iomap: switch iomap_page_mkwrite " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 14/27] iomap: switch __iomap_dio_rw " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 15/27] iomap: switch iomap_fiemap " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 16/27] iomap: switch iomap_bmap " Christoph Hellwig
2021-07-19 17:05   ` Darrick J. Wong
2021-07-26  8:19     ` Christoph Hellwig
2021-07-26 16:39       ` Darrick J. Wong
2021-07-27  6:31         ` Christoph Hellwig
2021-07-27 14:32           ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 17/27] iomap: switch iomap_seek_hole " Christoph Hellwig
2021-07-19 17:22   ` Darrick J. Wong
2021-07-26  8:22     ` Christoph Hellwig
2021-07-26 16:41       ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 18/27] iomap: switch iomap_seek_data " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 19/27] iomap: switch iomap_swapfile_activate " Christoph Hellwig
2021-07-19 10:35 ` [PATCH 20/27] fsdax: switch dax_iomap_rw " Christoph Hellwig
2021-07-19 22:10   ` Dave Chinner
2021-07-26  8:25     ` Christoph Hellwig
2021-07-19 10:35 ` [PATCH 21/27] iomap: remove iomap_apply Christoph Hellwig
2021-07-19 17:48   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers Christoph Hellwig
2021-07-19 17:48   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 23/27] iomap: rework unshare flag Christoph Hellwig
2021-07-19 17:44   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 24/27] fsdax: factor out helpers to simplify the dax fault code Christoph Hellwig
2021-07-19 10:35 ` [PATCH 25/27] fsdax: factor out a dax_fault_actor() helper Christoph Hellwig
2021-07-19 10:35 ` [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter Christoph Hellwig
2021-07-19 17:35   ` Darrick J. Wong
2021-07-19 10:35 ` [PATCH 27/27] iomap: constify iomap_iter_srcmap Christoph Hellwig
2021-07-19 17:44   ` Darrick J. Wong
2021-07-19 17:57 ` RFC: switch iomap to an iterator model Darrick J. Wong
2021-07-27  8:07   ` DAX setup pains, was " Christoph Hellwig
2021-07-29 20:33 ` Darrick J. Wong
