* RFC: map multiple blocks per ->map_blocks in iomap writeback
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Hi all,

this series overhauls a large chunk of the iomap writeback code, with
the end result that ->map_blocks can now map multiple blocks at a time,
at least as long as they are all inside the same folio.

Variants of this have passed a lot of testing on XFS, but I haven't even
started testing it with other file systems.  In terms of performance
there is a very slight reduction in large write workloads, but the main
point for me was to enable other work anyway.

Diffstat:
 fs/iomap/buffered-io.c |  561 +++++++++++++++++++++++--------------------------
 fs/xfs/xfs_aops.c      |    6 
 include/linux/iomap.h  |   17 +
 3 files changed, 287 insertions(+), 297 deletions(-)
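
To illustrate the idea: with a per-folio dirty bitmap, writeback can hand
one contiguous run of dirty blocks to ->map_blocks instead of one block
per call.  Below is a minimal standalone C sketch of that scan, with
hypothetical names rather than the kernel code; the kernel-side
equivalent added at the end of the series is ifs_find_dirty_range:

#include <stdio.h>

#define BLOCKS_PER_FOLIO 16	/* e.g. a 64k folio with 4k blocks */

static const int dirty[BLOCKS_PER_FOLIO] = {
	0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0,
};

/*
 * Find the next run of contiguous dirty blocks at or after *start.
 * Returns the length of the run in blocks, or 0 if nothing is left.
 */
static unsigned find_dirty_range(unsigned *start, unsigned end)
{
	unsigned nblks = 1;

	if (*start >= end)
		return 0;
	while (!dirty[*start])
		if (++(*start) == end)
			return 0;
	while (*start + nblks < end && dirty[*start + nblks])
		nblks++;
	return nblks;
}

int main(void)
{
	unsigned blk = 0, len;

	/* one mapping call per dirty run instead of one per block */
	while ((len = find_dirty_range(&blk, BLOCKS_PER_FOLIO)) > 0) {
		printf("map blocks [%u, %u)\n", blk, blk + len);
		blk += len;
	}
	return 0;
}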


* [PATCH 01/13] iomap: clear the per-folio dirty bits on all writeback failures
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

write_cache_pages always clears the page dirty bit before calling into the
file system, and leaves folios with a writeback failure without the dirty
bit after return.  We also clear the per-block dirty bits on writeback
failures, unless no I/O has been submitted at all, which leaves the folio
in an inconsistent state where the folio dirty bit is clear but one or
more per-block dirty bits are still set.  This seems to be due to the
place where the iomap_clear_range_dirty call was inserted into the
existing, not very clearly structured code when per-block dirty bit
support was added, and not actually intentional.  Switch to always
clearing the dirty bits on writeback failure.

Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f72df2babe561a..98d52feb220f0a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1849,10 +1849,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		 */
 		if (wpc->ops->discard_folio)
 			wpc->ops->discard_folio(folio, pos);
-		if (!count) {
-			folio_unlock(folio);
-			goto done;
-		}
 	}
 
 	/*
@@ -1861,6 +1857,12 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * all the dirty bits in the folio here.
 	 */
 	iomap_clear_range_dirty(folio, 0, folio_size(folio));
+
+	if (error && !count) {
+		folio_unlock(folio);
+		goto done;
+	}
+
 	folio_start_writeback(folio);
 	folio_unlock(folio);
 
-- 
2.39.2
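
For clarity, the tail of iomap_writepage_map ends up looking roughly like
this after the patch (a reconstruction from the diff above, not the
complete function):

	if (error) {
		/* let the file system discard delalloc blocks etc. */
		if (wpc->ops->discard_folio)
			wpc->ops->discard_folio(folio, pos);
	}

	/*
	 * Now executed on every writeback failure, so the per-block dirty
	 * bits can no longer get out of sync with the folio dirty bit.
	 */
	iomap_clear_range_dirty(folio, 0, folio_size(folio));

	if (error && !count) {
		/* no I/O was submitted, so no completion will unlock */
		folio_unlock(folio);
		goto done;
	}

	folio_start_writeback(folio);
	folio_unlock(folio);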



* [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

iomap_writepage_map already warns about inline data, but then just ignores
it.  Treat it as an error and return -EIO.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 98d52feb220f0a..b1bcc43baf0caf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1818,8 +1818,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		if (error)
 			break;
 		trace_iomap_writepage_map(inode, &wpc->iomap);
-		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
-			continue;
+		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) {
+			error = -EIO;
+			break;
+		}
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
 		iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
-- 
2.39.2



* [PATCH 03/13] iomap: move the io_folios field out of struct iomap_ioend
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

The io_folios member in struct iomap_ioend counts the number of folios
added to an ioend.  It is only used at submission time and can thus be
moved to iomap_writepage_ctx instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 7 ++++---
 include/linux/iomap.h  | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b1bcc43baf0caf..b28c57f8603303 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1685,10 +1685,11 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 	ioend->io_flags = wpc->iomap.flags;
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
-	ioend->io_folios = 0;
 	ioend->io_offset = offset;
 	ioend->io_bio = bio;
 	ioend->io_sector = sector;
+
+	wpc->nr_folios = 0;
 	return ioend;
 }
 
@@ -1732,7 +1733,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
 	 * also prevents long tight loops ending page writeback on all the
 	 * folios in the ioend.
 	 */
-	if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE)
+	if (wpc->nr_folios >= IOEND_BATCH_SIZE)
 		return false;
 	return true;
 }
@@ -1829,7 +1830,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		count++;
 	}
 	if (count)
-		wpc->ioend->io_folios++;
+		wpc->nr_folios++;
 
 	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
 	WARN_ON_ONCE(!folio_test_locked(folio));
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 96dd0acbba44ac..b2a05dff914d0c 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -293,7 +293,6 @@ struct iomap_ioend {
 	struct list_head	io_list;	/* next ioend in chain */
 	u16			io_type;
 	u16			io_flags;	/* IOMAP_F_* */
-	u32			io_folios;	/* folios added to ioend */
 	struct inode		*io_inode;	/* file being written to */
 	size_t			io_size;	/* size of the extent */
 	loff_t			io_offset;	/* offset in the file */
@@ -329,6 +328,7 @@ struct iomap_writepage_ctx {
 	struct iomap		iomap;
 	struct iomap_ioend	*ioend;
 	const struct iomap_writeback_ops *ops;
+	u32			nr_folios;	/* folios added to the ioend */
 };
 
 void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
-- 
2.39.2
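
The counter bounds how many folios a single ioend completion has to end
writeback on.  A toy standalone C model of that batching, with
hypothetical names (IOEND_BATCH_SIZE is the constant from
fs/iomap/buffered-io.c):

#include <stdbool.h>
#include <stdio.h>

#define IOEND_BATCH_SIZE 4096

struct writepage_ctx {
	unsigned nr_folios;	/* folios added to the current ioend */
};

/* mirrors the check in iomap_can_add_to_ioend() after this patch */
static bool can_add_to_ioend(const struct writepage_ctx *wpc)
{
	return wpc->nr_folios < IOEND_BATCH_SIZE;
}

int main(void)
{
	struct writepage_ctx wpc = { .nr_folios = 0 };
	unsigned ioends = 1;

	for (unsigned folio = 0; folio < 10000; folio++) {
		if (!can_add_to_ioend(&wpc)) {
			ioends++;		/* finish it, start a new one */
			wpc.nr_folios = 0;	/* as iomap_alloc_ioend() now does */
		}
		wpc.nr_folios++;
	}
	printf("10000 contiguous folios -> %u ioends\n", ioends);
	return 0;
}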



* [PATCH 04/13] iomap: drop the obsolete PF_MEMALLOC check in iomap_do_writepage
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

The iomap writepage implementation was removed in commit
478af190cb6c ("iomap: remove iomap_writepage"), and this code is now only
called through ->writepages, which is never invoked from memory reclaim.
Remove the canary in the coal mine now that the coal mine has been shut
down.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b28c57f8603303..8148e4c9765dac 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1910,20 +1910,6 @@ static int iomap_do_writepage(struct folio *folio,
 
 	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
 
-	/*
-	 * Refuse to write the folio out if we're called from reclaim context.
-	 *
-	 * This avoids stack overflows when called from deeply used stacks in
-	 * random callers for direct reclaim or memcg reclaim.  We explicitly
-	 * allow reclaim from kswapd as the stack usage there is relatively low.
-	 *
-	 * This should never happen except in the case of a VM regression so
-	 * warn about it.
-	 */
-	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
-			PF_MEMALLOC))
-		goto redirty;
-
 	/*
 	 * Is this folio beyond the end of the file?
 	 *
@@ -1989,8 +1975,6 @@ static int iomap_do_writepage(struct folio *folio,
 
 	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
 
-redirty:
-	folio_redirty_for_writepage(wbc, folio);
 unlock:
 	folio_unlock(folio);
 	return 0;
-- 
2.39.2



* [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Most of iomap_do_writepage is dedicated to handling a folio that crosses
or sits beyond i_size.  Split this into a separate helper, and update the
comments to deal with folios instead of pages and to be more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 66 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8148e4c9765dac..4a5a21809b0182 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
 	wbc_account_cgroup_owner(wbc, &folio->page, len);
 }
 
+/*
+ * Check interaction of the folio with the file end.
+ *
+ * If the folio is entirely beyond i_size, return false.  If it straddles
+ * i_size, adjust end_pos and zero all data beyond i_size.
+ */
+static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
+		u64 *end_pos)
+{
+	u64 isize = i_size_read(inode);
+
+	if (*end_pos > isize) {
+		size_t poff = offset_in_folio(folio, isize);
+		pgoff_t end_index = isize >> PAGE_SHIFT;
+
+		/*
+		 * If the folio is entirely outside of i_size, skip it.
+		 *
+		 * This can happen due to a truncate operation that is in
+		 * progress and in that case truncate will finish it off once
+		 * we've dropped the folio lock.
+		 *
+		 * Note that the pgoff_t used for end_index is an unsigned long.
+		 * If the given offset is greater than 16TB on a 32-bit system,
+		 * then if we checked if the folio is fully outside i_size with
+		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
+		 * overflow and evaluate to 0.  Hence this folio would be
+		 * redirtied and written out repeatedly, which would result in
+		 * an infinite loop; the user program performing this operation
+		 * would hang.  Instead, we can detect this situation by
+		 * checking if the folio is totally beyond i_size or if its
+		 * offset is just equal to the EOF.
+		 */
+		if (folio->index > end_index ||
+		    (folio->index == end_index && poff == 0))
+			return false;
+
+		/*
+		 * The folio straddles i_size.
+		 *
+		 * It must be zeroed out on each and every writepage invocation
+		 * because it may be mmapped:
+		 *
+		 *    A file is mapped in multiples of the page size.  For a
+		 *    file that is not a multiple of the page size, the
+		 *    remaining memory is zeroed when mapped, and writes to that
+		 *    region are not written out to the file.
+		 *
+		 * Also adjust the writeback range to skip all blocks entirely
+		 * beyond i_size.
+		 */
+		folio_zero_segment(folio, poff, folio_size(folio));
+		*end_pos = isize;
+	}
+
+	return true;
+}
+
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio,
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = folio->mapping->host;
-	u64 end_pos, isize;
+	u64 end_pos = folio_pos(folio) + folio_size(folio);
 
 	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
 
-	/*
-	 * Is this folio beyond the end of the file?
-	 *
-	 * The folio index is less than the end_index, adjust the end_pos
-	 * to the highest offset that this folio should represent.
-	 * -----------------------------------------------------
-	 * |			file mapping	       | <EOF> |
-	 * -----------------------------------------------------
-	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
-	 * ^--------------------------------^----------|--------
-	 * |     desired writeback range    |      see else    |
-	 * ---------------------------------^------------------|
-	 */
-	isize = i_size_read(inode);
-	end_pos = folio_pos(folio) + folio_size(folio);
-	if (end_pos > isize) {
-		/*
-		 * Check whether the page to write out is beyond or straddles
-		 * i_size or not.
-		 * -------------------------------------------------------
-		 * |		file mapping		        | <EOF>  |
-		 * -------------------------------------------------------
-		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
-		 * ^--------------------------------^-----------|---------
-		 * |				    |      Straddles     |
-		 * ---------------------------------^-----------|--------|
-		 */
-		size_t poff = offset_in_folio(folio, isize);
-		pgoff_t end_index = isize >> PAGE_SHIFT;
-
-		/*
-		 * Skip the page if it's fully outside i_size, e.g.
-		 * due to a truncate operation that's in progress.  We've
-		 * cleaned this page and truncate will finish things off for
-		 * us.
-		 *
-		 * Note that the end_index is unsigned long.  If the given
-		 * offset is greater than 16TB on a 32-bit system then if we
-		 * checked if the page is fully outside i_size with
-		 * "if (page->index >= end_index + 1)", "end_index + 1" would
-		 * overflow and evaluate to 0.  Hence this page would be
-		 * redirtied and written out repeatedly, which would result in
-		 * an infinite loop; the user program performing this operation
-		 * would hang.  Instead, we can detect this situation by
-		 * checking if the page is totally beyond i_size or if its
-		 * offset is just equal to the EOF.
-		 */
-		if (folio->index > end_index ||
-		    (folio->index == end_index && poff == 0))
-			goto unlock;
-
-		/*
-		 * The page straddles i_size.  It must be zeroed out on each
-		 * and every writepage invocation because it may be mmapped.
-		 * "A file is mapped in multiples of the page size.  For a file
-		 * that is not a multiple of the page size, the remaining
-		 * memory is zeroed when mapped, and writes to that region are
-		 * not written out to the file."
-		 */
-		folio_zero_segment(folio, poff, folio_size(folio));
-		end_pos = isize;
+	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+		folio_unlock(folio);
+		return 0;
 	}
 
 	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
-
-unlock:
-	folio_unlock(folio);
-	return 0;
 }
 
 int
-- 
2.39.2
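
The overflow that the comment warns about is easy to demonstrate.  A
standalone sketch, simplified from the check in
iomap_writepage_handle_eof with pgoff_t modelled as a 32-bit unsigned
type:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/*
	 * A folio straddling i_size just below 16TiB on a 32-bit kernel:
	 * with 4k pages, i_size >> PAGE_SHIFT == UINT32_MAX there.
	 */
	uint32_t end_index = UINT32_MAX;
	uint32_t index = UINT32_MAX;	/* folio->index == end_index */
	unsigned poff = 2048;		/* i_size is mid-folio: must be written */

	/* buggy form: end_index + 1 wraps to 0, so this is always true */
	if (index >= end_index + 1)
		printf("buggy check: folio skipped and redirtied forever\n");

	/* overflow-safe form used above */
	if (index > end_index || (index == end_index && poff == 0))
		printf("safe check: skip\n");
	else
		printf("safe check: zero the tail and write the folio back\n");
	return 0;
}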



* [PATCH 06/13] iomap: move all remaining per-folio logic into iomap_writepage_map
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Move the tracepoint and the iomap check from iomap_do_writepage into
iomap_writepage_map.  This keeps all the logic in one place, and leaves
iomap_do_writepage just as the wrapper for the callback conventions of
write_cache_pages, which will go away when that is converted to an
iterator.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 34 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4a5a21809b0182..5834aa46bdb8cf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1842,19 +1842,25 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
  * At the end of a writeback pass, there will be a cached ioend remaining on the
  * writepage context that the caller will need to submit.
  */
-static int
-iomap_writepage_map(struct iomap_writepage_ctx *wpc,
-		struct writeback_control *wbc, struct inode *inode,
-		struct folio *folio, u64 end_pos)
+static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
+		struct writeback_control *wbc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
+	struct inode *inode = folio->mapping->host;
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
 	u64 pos = folio_pos(folio);
+	u64 end_pos = pos + folio_size(folio);
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
+	trace_iomap_writepage(inode, pos, folio_size(folio));
+
+	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+		folio_unlock(folio);
+		return 0;
+	}
 	WARN_ON_ONCE(end_pos <= pos);
 
 	if (!ifs && nblocks > 1) {
@@ -1952,28 +1958,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	return error;
 }
 
-/*
- * Write out a dirty page.
- *
- * For delalloc space on the page, we need to allocate space and flush it.
- * For unwritten space on the page, we need to start the conversion to
- * regular allocated space.
- */
 static int iomap_do_writepage(struct folio *folio,
 		struct writeback_control *wbc, void *data)
 {
-	struct iomap_writepage_ctx *wpc = data;
-	struct inode *inode = folio->mapping->host;
-	u64 end_pos = folio_pos(folio) + folio_size(folio);
-
-	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
-
-	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
-		folio_unlock(folio);
-		return 0;
-	}
-
-	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
+	return iomap_writepage_map(data, wbc, folio);
 }
 
 int
-- 
2.39.2



* [PATCH 07/13] iomap: clean up the iomap_add_to_ioend calling convention
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Switch to the same argument order as iomap_writepage_map and remove the
ifs argument that can be trivially recalculated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 5834aa46bdb8cf..7f86d2f90e3863 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1742,11 +1742,11 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
  * Test to see if we have an existing ioend structure that we could append to
  * first; otherwise finish off the current ioend and start another.
  */
-static void
-iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
-		struct iomap_folio_state *ifs, struct iomap_writepage_ctx *wpc,
-		struct writeback_control *wbc, struct list_head *iolist)
+static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+		struct writeback_control *wbc, struct folio *folio,
+		struct inode *inode, loff_t pos, struct list_head *iolist)
 {
+	struct iomap_folio_state *ifs = folio->private;
 	sector_t sector = iomap_sector(&wpc->iomap, pos);
 	unsigned len = i_blocksize(inode);
 	size_t poff = offset_in_folio(folio, pos);
@@ -1889,8 +1889,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		}
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
-				 &submit_list);
+		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, &submit_list);
 		count++;
 	}
 	if (count)
-- 
2.39.2



* [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

The calculation in iomap_sector is pretty trivial, and most of the time
a single invocation of iomap_add_to_ioend calls only one of
iomap_can_add_to_ioend and iomap_alloc_ioend.

Calculate the sector in the two lower-level functions, stop passing it
to iomap_add_to_ioend, and update the iomap_alloc_ioend argument passing
order to match that of iomap_add_to_ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7f86d2f90e3863..329e2c342f1c64 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1666,9 +1666,8 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
 	return 0;
 }
 
-static struct iomap_ioend *
-iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
-		loff_t offset, sector_t sector, struct writeback_control *wbc)
+static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
+		struct writeback_control *wbc, struct inode *inode, loff_t pos)
 {
 	struct iomap_ioend *ioend;
 	struct bio *bio;
@@ -1676,7 +1675,7 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 	bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
-	bio->bi_iter.bi_sector = sector;
+	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
 	wbc_init_bio(wbc, bio);
 
 	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
@@ -1685,9 +1684,9 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 	ioend->io_flags = wpc->iomap.flags;
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
-	ioend->io_offset = offset;
+	ioend->io_offset = pos;
 	ioend->io_bio = bio;
-	ioend->io_sector = sector;
+	ioend->io_sector = bio->bi_iter.bi_sector;
 
 	wpc->nr_folios = 0;
 	return ioend;
@@ -1716,8 +1715,7 @@ iomap_chain_bio(struct bio *prev)
 }
 
 static bool
-iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
-		sector_t sector)
+iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
 {
 	if ((wpc->iomap.flags & IOMAP_F_SHARED) !=
 	    (wpc->ioend->io_flags & IOMAP_F_SHARED))
@@ -1726,7 +1724,8 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
 		return false;
 	if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
 		return false;
-	if (sector != bio_end_sector(wpc->ioend->io_bio))
+	if (iomap_sector(&wpc->iomap, offset) !=
+	    bio_end_sector(wpc->ioend->io_bio))
 		return false;
 	/*
 	 * Limit ioend bio chain lengths to minimise IO completion latency. This
@@ -1747,14 +1746,13 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 		struct inode *inode, loff_t pos, struct list_head *iolist)
 {
 	struct iomap_folio_state *ifs = folio->private;
-	sector_t sector = iomap_sector(&wpc->iomap, pos);
 	unsigned len = i_blocksize(inode);
 	size_t poff = offset_in_folio(folio, pos);
 
-	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, sector)) {
+	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
-		wpc->ioend = iomap_alloc_ioend(inode, wpc, pos, sector, wbc);
+		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
 	}
 
 	if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
-- 
2.39.2
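
For reference, iomap_sector() in include/linux/iomap.h is roughly:

static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
{
	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
}

i.e. an addition, a subtraction and a shift, so recomputing it in the
two lower-level functions is cheaper than threading it through every
call.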



* [PATCH 09/13] iomap: don't chain bios
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Back in the days when a single bio could only be filled to the hardware
limits, and we scheduled a work item for each bio completion, chaining
multiple bios for a single ioend made a lot of sense to reduce the number
of completions.  But these days bios can be filled until we reach the
number of vectors or total size limit, which means we can always fit at
least 1 megabyte worth of data in the worst case, but usually a lot more
due to large folios.  The only thing bio chaining buys us now is
reducing the size of each additional allocation from an ioend with an
embedded bio to a plain bio, which is a 52 byte difference on 64-bit
systems.

This is not worth the added complexity, so remove the bio chaining and
only use the bio embedded into the ioend.  This will help to simplify
further changes to the iomap writeback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 90 +++++++++++-------------------------------
 fs/xfs/xfs_aops.c      |  6 +--
 include/linux/iomap.h  |  8 +++-
 3 files changed, 32 insertions(+), 72 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 329e2c342f1c64..17760f3e67c61e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1489,40 +1489,23 @@ static u32
 iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 {
 	struct inode *inode = ioend->io_inode;
-	struct bio *bio = &ioend->io_inline_bio;
-	struct bio *last = ioend->io_bio, *next;
-	u64 start = bio->bi_iter.bi_sector;
-	loff_t offset = ioend->io_offset;
-	bool quiet = bio_flagged(bio, BIO_QUIET);
+	struct bio *bio = &ioend->io_bio;
+	struct folio_iter fi;
 	u32 folio_count = 0;
 
-	for (bio = &ioend->io_inline_bio; bio; bio = next) {
-		struct folio_iter fi;
-
-		/*
-		 * For the last bio, bi_private points to the ioend, so we
-		 * need to explicitly end the iteration here.
-		 */
-		if (bio == last)
-			next = NULL;
-		else
-			next = bio->bi_private;
-
-		/* walk all folios in bio, ending page IO on them */
-		bio_for_each_folio_all(fi, bio) {
-			iomap_finish_folio_write(inode, fi.folio, fi.length,
-					error);
-			folio_count++;
-		}
-		bio_put(bio);
+	/* walk all folios in bio, ending page IO on them */
+	bio_for_each_folio_all(fi, bio) {
+		iomap_finish_folio_write(inode, fi.folio, fi.length, error);
+		folio_count++;
 	}
-	/* The ioend has been freed by bio_put() */
 
-	if (unlikely(error && !quiet)) {
+	if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) {
 		printk_ratelimited(KERN_ERR
 "%s: writeback error on inode %lu, offset %lld, sector %llu",
-			inode->i_sb->s_id, inode->i_ino, offset, start);
+			inode->i_sb->s_id, inode->i_ino,
+			ioend->io_offset, ioend->io_sector);
 	}
+	bio_put(bio);	/* frees the ioend */
 	return folio_count;
 }
 
@@ -1563,7 +1546,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_ioends);
 static bool
 iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
 {
-	if (ioend->io_bio->bi_status != next->io_bio->bi_status)
+	if (ioend->io_bio.bi_status != next->io_bio.bi_status)
 		return false;
 	if ((ioend->io_flags & IOMAP_F_SHARED) ^
 	    (next->io_flags & IOMAP_F_SHARED))
@@ -1628,9 +1611,8 @@ EXPORT_SYMBOL_GPL(iomap_sort_ioends);
 
 static void iomap_writepage_end_bio(struct bio *bio)
 {
-	struct iomap_ioend *ioend = bio->bi_private;
-
-	iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
+	iomap_finish_ioend(iomap_ioend_from_bio(bio),
+			blk_status_to_errno(bio->bi_status));
 }
 
 /*
@@ -1645,9 +1627,6 @@ static int
 iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
 		int error)
 {
-	ioend->io_bio->bi_private = ioend;
-	ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
-
 	if (wpc->ops->prepare_ioend)
 		error = wpc->ops->prepare_ioend(ioend, error);
 	if (error) {
@@ -1657,12 +1636,12 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
 		 * as there is only one reference to the ioend at this point in
 		 * time.
 		 */
-		ioend->io_bio->bi_status = errno_to_blk_status(error);
-		bio_endio(ioend->io_bio);
+		ioend->io_bio.bi_status = errno_to_blk_status(error);
+		bio_endio(&ioend->io_bio);
 		return error;
 	}
 
-	submit_bio(ioend->io_bio);
+	submit_bio(&ioend->io_bio);
 	return 0;
 }
 
@@ -1676,44 +1655,22 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
+	bio->bi_end_io = iomap_writepage_end_bio;
 	wbc_init_bio(wbc, bio);
 
-	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
+	ioend = iomap_ioend_from_bio(bio);
 	INIT_LIST_HEAD(&ioend->io_list);
 	ioend->io_type = wpc->iomap.type;
 	ioend->io_flags = wpc->iomap.flags;
 	ioend->io_inode = inode;
 	ioend->io_size = 0;
 	ioend->io_offset = pos;
-	ioend->io_bio = bio;
 	ioend->io_sector = bio->bi_iter.bi_sector;
 
 	wpc->nr_folios = 0;
 	return ioend;
 }
 
-/*
- * Allocate a new bio, and chain the old bio to the new one.
- *
- * Note that we have to perform the chaining in this unintuitive order
- * so that the bi_private linkage is set up in the right direction for the
- * traversal in iomap_finish_ioend().
- */
-static struct bio *
-iomap_chain_bio(struct bio *prev)
-{
-	struct bio *new;
-
-	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
-	bio_clone_blkg_association(new, prev);
-	new->bi_iter.bi_sector = bio_end_sector(prev);
-
-	bio_chain(prev, new);
-	bio_get(prev);		/* for iomap_finish_ioend */
-	submit_bio(prev);
-	return new;
-}
-
 static bool
 iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
 {
@@ -1725,7 +1682,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
 	if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
 		return false;
 	if (iomap_sector(&wpc->iomap, offset) !=
-	    bio_end_sector(wpc->ioend->io_bio))
+	    bio_end_sector(&wpc->ioend->io_bio))
 		return false;
 	/*
 	 * Limit ioend bio chain lengths to minimise IO completion latency. This
@@ -1750,15 +1707,14 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 	size_t poff = offset_in_folio(folio, pos);
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
+new_ioend:
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
 		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
 	}
 
-	if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
-		wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
-		bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff);
-	}
+	if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
+		goto new_ioend;
 
 	if (ifs)
 		atomic_add(len, &ifs->write_bytes_pending);
@@ -1979,7 +1935,7 @@ EXPORT_SYMBOL_GPL(iomap_writepages);
 static int __init iomap_init(void)
 {
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
-			   offsetof(struct iomap_ioend, io_inline_bio),
+			   offsetof(struct iomap_ioend, io_bio),
 			   BIOSET_NEED_BVECS);
 }
 fs_initcall(iomap_init);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 465d7630bb2185..b45ee6cbbdaab2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -112,7 +112,7 @@ xfs_end_ioend(
 	 * longer dirty. If we don't remove delalloc blocks here, they become
 	 * stale and can corrupt free space accounting on unmount.
 	 */
-	error = blk_status_to_errno(ioend->io_bio->bi_status);
+	error = blk_status_to_errno(ioend->io_bio.bi_status);
 	if (unlikely(error)) {
 		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
@@ -179,7 +179,7 @@ STATIC void
 xfs_end_bio(
 	struct bio		*bio)
 {
-	struct iomap_ioend	*ioend = bio->bi_private;
+	struct iomap_ioend	*ioend = iomap_ioend_from_bio(bio);
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
 	unsigned long		flags;
 
@@ -444,7 +444,7 @@ xfs_prepare_ioend(
 	/* send ioends that might require a transaction to the completion wq */
 	if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
 	    (ioend->io_flags & IOMAP_F_SHARED))
-		ioend->io_bio->bi_end_io = xfs_end_bio;
+		ioend->io_bio.bi_end_io = xfs_end_bio;
 	return status;
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b2a05dff914d0c..b8d3b658ad2b03 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -297,10 +297,14 @@ struct iomap_ioend {
 	size_t			io_size;	/* size of the extent */
 	loff_t			io_offset;	/* offset in the file */
 	sector_t		io_sector;	/* start sector of ioend */
-	struct bio		*io_bio;	/* bio being built */
-	struct bio		io_inline_bio;	/* MUST BE LAST! */
+	struct bio		io_bio;		/* MUST BE LAST! */
 };
 
+static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
+{
+	return container_of(bio, struct iomap_ioend, io_bio);
+}
+
 struct iomap_writeback_ops {
 	/*
 	 * Required, maps the blocks so that writeback can be performed on
-- 
2.39.2
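
iomap_ioend_from_bio() relies on the bio being embedded in the ioend
(and being the last member, so the bioset front padding and the inline
bvecs line up).  A standalone sketch of the container_of pattern with
toy structs:

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct bio { int status; };

/* toy stand-in for struct iomap_ioend: the bio must be last */
struct ioend {
	long io_offset;
	struct bio io_bio;	/* embedded: one allocation serves both */
};

static struct ioend *ioend_from_bio(struct bio *bio)
{
	return container_of(bio, struct ioend, io_bio);
}

int main(void)
{
	struct ioend e = { .io_offset = 4096 };
	struct bio *bio = &e.io_bio;	/* what a completion handler sees */

	printf("offset = %ld\n", ioend_from_bio(bio)->io_offset);
	return 0;
}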



* [PATCH 10/13] iomap: only call mapping_set_error once for each failed bio
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Instead of calling mapping_set_error once per folio, only do that once
per bio, and consolidate all the writeback error handling code in
iomap_finish_ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 17760f3e67c61e..e1d5076251702d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1464,15 +1464,10 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
 static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
-		size_t len, int error)
+		size_t len)
 {
 	struct iomap_folio_state *ifs = folio->private;
 
-	if (error) {
-		folio_set_error(folio);
-		mapping_set_error(inode->i_mapping, error);
-	}
-
 	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
 	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
 
@@ -1493,18 +1488,24 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 	struct folio_iter fi;
 	u32 folio_count = 0;
 
+	if (error) {
+		mapping_set_error(inode->i_mapping, error);
+		if (!bio_flagged(bio, BIO_QUIET)) {
+			pr_err_ratelimited(
+"%s: writeback error on inode %lu, offset %lld, sector %llu",
+				inode->i_sb->s_id, inode->i_ino,
+				ioend->io_offset, ioend->io_sector);
+		}
+	}
+
 	/* walk all folios in bio, ending page IO on them */
 	bio_for_each_folio_all(fi, bio) {
-		iomap_finish_folio_write(inode, fi.folio, fi.length, error);
+		if (error)
+			folio_set_error(fi.folio);
+		iomap_finish_folio_write(inode, fi.folio, fi.length);
 		folio_count++;
 	}
 
-	if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) {
-		printk_ratelimited(KERN_ERR
-"%s: writeback error on inode %lu, offset %lld, sector %llu",
-			inode->i_sb->s_id, inode->i_ino,
-			ioend->io_offset, ioend->io_sector);
-	}
 	bio_put(bio);	/* frees the ioend */
 	return folio_count;
 }
-- 
2.39.2



* [PATCH 11/13] iomap: factor out a iomap_writepage_map_blocks helper
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Split the loop body that calls into the file system to map a block and
add it to the ioend into a separate helper to prepare for refactoring of
the surrounding code.

Note that this was the only place in iomap_writepage_map that could
return an error, so move the call to ->discard_folio into the new helper
as well; that will help to avoid code duplication in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 72 +++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e1d5076251702d..9f223820f60d22 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1723,6 +1723,45 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 	wbc_account_cgroup_owner(wbc, &folio->page, len);
 }
 
+static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
+		struct writeback_control *wbc, struct folio *folio,
+		struct inode *inode, u64 pos, unsigned *count,
+		struct list_head *submit_list)
+{
+	int error;
+
+	error = wpc->ops->map_blocks(wpc, inode, pos);
+	if (error)
+		goto fail;
+	trace_iomap_writepage_map(inode, &wpc->iomap);
+
+	switch (wpc->iomap.type) {
+	case IOMAP_INLINE:
+		WARN_ON_ONCE(1);
+		error = -EIO;
+		break;
+	case IOMAP_HOLE:
+		break;
+	default:
+		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list);
+		(*count)++;
+	}
+
+fail:
+	/*
+	 * We cannot cancel the ioend directly here on error.  We may have
+	 * already set other pages under writeback and hence we have to run I/O
+	 * completion to mark the error state of the pages under writeback
+	 * appropriately.
+	 *
+	 * Just let the file system know what portion of the folio failed to
+	 * map.
+	 */
+	if (error && wpc->ops->discard_folio)
+		wpc->ops->discard_folio(folio, pos);
+	return error;
+}
+
 /*
  * Check interaction of the folio with the file end.
  *
@@ -1807,7 +1846,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
-	int error = 0, count = 0, i;
+	unsigned count = 0;
+	int error = 0, i;
 	LIST_HEAD(submit_list);
 
 	trace_iomap_writepage(inode, pos, folio_size(folio));
@@ -1833,19 +1873,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
 		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
 			continue;
-
-		error = wpc->ops->map_blocks(wpc, inode, pos);
+		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
+				&count, &submit_list);
 		if (error)
 			break;
-		trace_iomap_writepage_map(inode, &wpc->iomap);
-		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) {
-			error = -EIO;
-			break;
-		}
-		if (wpc->iomap.type == IOMAP_HOLE)
-			continue;
-		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, &submit_list);
-		count++;
 	}
 	if (count)
 		wpc->nr_folios++;
@@ -1855,23 +1886,6 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	WARN_ON_ONCE(folio_test_writeback(folio));
 	WARN_ON_ONCE(folio_test_dirty(folio));
 
-	/*
-	 * We cannot cancel the ioend directly here on error.  We may have
-	 * already set other pages under writeback and hence we have to run I/O
-	 * completion to mark the error state of the pages under writeback
-	 * appropriately.
-	 */
-	if (unlikely(error)) {
-		/*
-		 * Let the filesystem know what portion of the current page
-		 * failed to map. If the page hasn't been added to ioend, it
-		 * won't be affected by I/O completion and we must unlock it
-		 * now.
-		 */
-		if (wpc->ops->discard_folio)
-			wpc->ops->discard_folio(folio, pos);
-	}
-
 	/*
 	 * We can have dirty bits set past end of file in page_mkwrite path
 	 * while mapping the last partial folio. Hence it's better to clear
-- 
2.39.2



* [PATCH 12/13] iomap: submit ioends immediately
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

Currently the writeback code delays submitting full ioends until we
reach the end of the folio.  The reason for that is that otherwise
the end I/O handler could clear the writeback bit before we've even
finished submitting all I/O for the folio.

Add a bias to ifs->write_bytes_pending while we are submitting I/O
for a folio so that it cannot reach zero until we are done submitting,
which prevents the writeback bit from being cleared early, and remove
the now superfluous submit_list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 157 ++++++++++++++++++++---------------------
 1 file changed, 75 insertions(+), 82 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9f223820f60d22..a01b0686e7c8a0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1620,30 +1620,34 @@ static void iomap_writepage_end_bio(struct bio *bio)
  * Submit the final bio for an ioend.
  *
  * If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback
- * and unlocked them.  In this situation, we need to fail the bio instead of
- * submitting it.  This typically only happens on a filesystem shutdown.
+ * the submission process has failed after we've marked pages for writeback.
+ * We cannot cancel the ioend directly in that case, so call the bio end I/O
+ * handler with the error status here to run the normal I/O completion handler
+ * to clear the writeback bit and let the file system process the errors.
  */
-static int
-iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
-		int error)
+static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
 {
+	if (!wpc->ioend)
+		return error;
+
+	/*
+	 * Let the file systems prepare the I/O submission and hook in an I/O
+	 * comletion handler.  This also needs to happen in case after a
+	 * completion handler.  This also needs to happen in case a failure
+	 * happened, so that the file system end I/O handler gets called to
+	 * clean up.
 	if (wpc->ops->prepare_ioend)
-		error = wpc->ops->prepare_ioend(ioend, error);
+		error = wpc->ops->prepare_ioend(wpc->ioend, error);
+
 	if (error) {
-		/*
-		 * If we're failing the IO now, just mark the ioend with an
-		 * error and finish it.  This will run IO completion immediately
-		 * as there is only one reference to the ioend at this point in
-		 * time.
-		 */
-		ioend->io_bio.bi_status = errno_to_blk_status(error);
-		bio_endio(&ioend->io_bio);
-		return error;
+		wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
+		bio_endio(&wpc->ioend->io_bio);
+	} else {
+		submit_bio(&wpc->ioend->io_bio);
 	}
 
-	submit_bio(&ioend->io_bio);
-	return 0;
+	wpc->ioend = NULL;
+	return error;
 }
 
 static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
@@ -1698,19 +1702,28 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
 /*
  * Test to see if we have an existing ioend structure that we could append to
  * first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly.  Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
  */
-static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct folio *folio,
-		struct inode *inode, loff_t pos, struct list_head *iolist)
+		struct inode *inode, loff_t pos)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	unsigned len = i_blocksize(inode);
 	size_t poff = offset_in_folio(folio, pos);
+	int error;
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
 new_ioend:
-		if (wpc->ioend)
-			list_add(&wpc->ioend->io_list, iolist);
+		error = iomap_submit_ioend(wpc, 0);
+		if (error)
+			return error;
 		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
 	}
 
@@ -1721,12 +1734,12 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 		atomic_add(len, &ifs->write_bytes_pending);
 	wpc->ioend->io_size += len;
 	wbc_account_cgroup_owner(wbc, &folio->page, len);
+	return 0;
 }
 
 static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct folio *folio,
-		struct inode *inode, u64 pos, unsigned *count,
-		struct list_head *submit_list)
+		struct inode *inode, u64 pos, unsigned *count)
 {
 	int error;
 
@@ -1743,7 +1756,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
 	case IOMAP_HOLE:
 		break;
 	default:
-		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list);
+		iomap_add_to_ioend(wpc, wbc, folio, inode, pos);
 		(*count)++;
 	}
 
@@ -1820,35 +1833,21 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
 	return true;
 }
 
-/*
- * We implement an immediate ioend submission policy here to avoid needing to
- * chain multiple ioends and hence nest mempool allocations which can violate
- * the forward progress guarantees we need to provide. The current ioend we're
- * adding blocks to is cached in the writepage context, and if the new block
- * doesn't append to the cached ioend, it will create a new ioend and cache that
- * instead.
- *
- * If a new ioend is created and cached, the old ioend is returned and queued
- * locally for submission once the entire page is processed or an error has been
- * detected.  While ioends are submitted immediately after they are completed,
- * batching optimisations are provided by higher level block plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
 static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct folio *folio)
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
 	unsigned nblocks = i_blocks_per_folio(inode, folio);
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
 	unsigned count = 0;
 	int error = 0, i;
-	LIST_HEAD(submit_list);
+
+	WARN_ON_ONCE(!folio_test_locked(folio));
+	WARN_ON_ONCE(folio_test_dirty(folio));
+	WARN_ON_ONCE(folio_test_writeback(folio));
 
 	trace_iomap_writepage(inode, pos, folio_size(folio));
 
@@ -1858,12 +1857,27 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	}
 	WARN_ON_ONCE(end_pos <= pos);
 
-	if (!ifs && nblocks > 1) {
-		ifs = ifs_alloc(inode, folio, 0);
-		iomap_set_range_dirty(folio, 0, end_pos - pos);
+	if (nblocks > 1) {
+		if (!ifs) {
+			ifs = ifs_alloc(inode, folio, 0);
+			iomap_set_range_dirty(folio, 0, end_pos - pos);
+		}
+
+		/*
+		 * Keep the I/O completion handler from clearing the writeback
+		 * bit until we have submitted all blocks by adding a bias to
+		 * ifs->write_bytes_pending, which is dropped after submitting
+		 * all blocks.
+		 */
+		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+		atomic_inc(&ifs->write_bytes_pending);
 	}
 
-	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
+	/*
+	 * Set the writeback bit ASAP, as the I/O completion for the single
+	 * block per folio case can happen as soon as we're submitting the bio.
+	 */
+	folio_start_writeback(folio);
 
 	/*
 	 * Walk through the folio to find areas to write back. If we
@@ -1874,18 +1888,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
 			continue;
 		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
-				&count, &submit_list);
+				&count);
 		if (error)
 			break;
 	}
 	if (count)
 		wpc->nr_folios++;
 
-	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
-	WARN_ON_ONCE(!folio_test_locked(folio));
-	WARN_ON_ONCE(folio_test_writeback(folio));
-	WARN_ON_ONCE(folio_test_dirty(folio));
-
 	/*
 	 * We can have dirty bits set past end of file in page_mkwrite path
 	 * while mapping the last partial folio. Hence it's better to clear
@@ -1893,35 +1902,21 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 */
 	iomap_clear_range_dirty(folio, 0, folio_size(folio));
 
-	if (error && !count) {
-		folio_unlock(folio);
-		goto done;
-	}
-
-	folio_start_writeback(folio);
-	folio_unlock(folio);
-
 	/*
-	 * Preserve the original error if there was one; catch
-	 * submission errors here and propagate into subsequent ioend
-	 * submissions.
+	 * Usually the writeback bit is cleared by the I/O completion handler.
+	 * But we may end up either not actually writing any blocks, or (when
+	 * there are multiple blocks in a folio) all I/O might have finished
+	 * already at this point.  In that case we need to clear the writeback
+	 * bit ourselves right after unlocking the page.
 	 */
-	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
-		int error2;
-
-		list_del_init(&ioend->io_list);
-		error2 = iomap_submit_ioend(wpc, ioend, error);
-		if (error2 && !error)
-			error = error2;
+	folio_unlock(folio);
+	if (ifs) {
+		if (atomic_dec_and_test(&ifs->write_bytes_pending))
+			folio_end_writeback(folio);
+	} else {
+		if (!count)
+			folio_end_writeback(folio);
 	}
-
-	/*
-	 * We can end up here with no error and nothing to write only if we race
-	 * with a partial page truncate on a sub-page block sized filesystem.
-	 */
-	if (!count)
-		folio_end_writeback(folio);
-done:
 	mapping_set_error(inode->i_mapping, error);
 	return error;
 }
@@ -1941,9 +1936,7 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 
 	wpc->ops = ops;
 	ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
-	if (!wpc->ioend)
-		return ret;
-	return iomap_submit_ioend(wpc, wpc->ioend, ret);
+	return iomap_submit_ioend(wpc, ret);
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
-- 
2.39.2
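
The bias is a standard refcounting trick.  A standalone sketch with
hypothetical names and C11 atomics, showing how the extra count keeps
the writeback state from being cleared while blocks are still being
submitted:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int write_bytes_pending;

static void end_writeback(void)
{
	printf("writeback bit cleared\n");
}

/* completion handler: may run any time after a block is submitted */
static void io_done(int bytes)
{
	/* fetch_sub returns the old value: old == bytes means now zero */
	if (atomic_fetch_sub(&write_bytes_pending, bytes) == bytes)
		end_writeback();
}

int main(void)
{
	/* bias: one extra count held for the duration of submission */
	atomic_fetch_add(&write_bytes_pending, 1);

	for (int blk = 0; blk < 4; blk++) {
		atomic_fetch_add(&write_bytes_pending, 512);
		io_done(512);	/* completes instantly, can't clear early */
	}

	/* all blocks submitted: drop the bias, clearing happens here */
	io_done(1);
	return 0;
}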



* [PATCH 13/13] iomap: map multiple blocks at a time
From: Christoph Hellwig @ 2023-11-26 12:47 UTC
  To: Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

The ->map_blocks interface returns a valid range for writeback, but we
still call back into it for every block, which is a bit inefficient.

Change iomap_writepage_map to use the valid range in the map until the
end of the folio or the end of the dirty range inside the folio, instead
of calling back into ->map_blocks for every block.

Note that the range is not used over folio boundaries as we need to be
able to check the mapping sequence count under the folio lock.
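
To illustrate the intended reuse pattern (this is only a sketch, not the
XFS implementation; example_mapping_valid and example_new_mapping are
hypothetical helpers that each file system provides in its own way):

	static int example_map_blocks(struct iomap_writepage_ctx *wpc,
			struct inode *inode, loff_t offset)
	{
		/*
		 * Reuse the cached mapping if it still covers this offset
		 * and the sequence count sampled under the folio lock
		 * shows that it has not been invalidated since.
		 */
		if (offset >= wpc->iomap.offset &&
		    offset < wpc->iomap.offset + wpc->iomap.length &&
		    example_mapping_valid(wpc, inode))
			return 0;

		/* Otherwise look up a new, possibly multi-block mapping. */
		return example_new_mapping(wpc, inode, offset);
	}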

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 113 ++++++++++++++++++++++++++++-------------
 include/linux/iomap.h  |   7 +++
 2 files changed, 86 insertions(+), 34 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a01b0686e7c8a0..a98cb38a71ebc8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (C) 2016-2019 Christoph Hellwig.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
  */
 #include <linux/module.h>
 #include <linux/compiler.h>
@@ -95,6 +95,42 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
 	return test_bit(block + blks_per_folio, ifs->state);
 }
 
+static unsigned ifs_find_dirty_range(struct folio *folio,
+		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned start_blk =
+		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
+	unsigned end_blk = min_not_zero(
+		offset_in_folio(folio, range_end) >> inode->i_blkbits,
+		i_blocks_per_folio(inode, folio));
+	unsigned nblks = 1;
+
+	while (!ifs_block_is_dirty(folio, ifs, start_blk))
+		if (++start_blk == end_blk)
+			return 0;
+
+	while (start_blk + nblks < end_blk &&
+	       ifs_block_is_dirty(folio, ifs, start_blk + nblks))
+		nblks++;
+
+	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
+	return nblks << inode->i_blkbits;
+}
+
+static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
+		u64 range_end)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	
+	if (*range_start >= range_end)
+		return 0;
+
+	if (ifs)
+		return ifs_find_dirty_range(folio, ifs, range_start, range_end);
+	return range_end - *range_start;
+}
+
 static void ifs_clear_range_dirty(struct folio *folio,
 		struct iomap_folio_state *ifs, size_t off, size_t len)
 {
@@ -1712,10 +1748,9 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
  */
 static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct folio *folio,
-		struct inode *inode, loff_t pos)
+		struct inode *inode, loff_t pos, unsigned len)
 {
 	struct iomap_folio_state *ifs = folio->private;
-	unsigned len = i_blocksize(inode);
 	size_t poff = offset_in_folio(folio, pos);
 	int error;
 
@@ -1739,28 +1774,41 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
 
 static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
 		struct writeback_control *wbc, struct folio *folio,
-		struct inode *inode, u64 pos, unsigned *count)
+		struct inode *inode, u64 pos, unsigned dirty_len,
+		unsigned *count)
 {
 	int error;
 
-	error = wpc->ops->map_blocks(wpc, inode, pos);
-	if (error)
-		goto fail;
-	trace_iomap_writepage_map(inode, &wpc->iomap);
-
-	switch (wpc->iomap.type) {
-	case IOMAP_INLINE:
-		WARN_ON_ONCE(1);
-		error = -EIO;
-		break;
-	case IOMAP_HOLE:
-		break;
-	default:
-		iomap_add_to_ioend(wpc, wbc, folio, inode, pos);
-		(*count)++;
-	}
+	do {
+		unsigned map_len;
+
+		error = wpc->ops->map_blocks(wpc, inode, pos);
+		if (error)
+			break;
+		trace_iomap_writepage_map(inode, &wpc->iomap);
+
+		map_len = min_t(u64, dirty_len,
+			wpc->iomap.offset + wpc->iomap.length - pos);
+		WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+		switch (wpc->iomap.type) {
+		case IOMAP_INLINE:
+			WARN_ON_ONCE(1);
+			error = -EIO;
+			break;
+		case IOMAP_HOLE:
+			break;
+		default:
+			error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
+					map_len);
+			if (!error)
+				(*count)++;
+			break;
+		}
+		dirty_len -= map_len;
+		pos += map_len;
+	} while (dirty_len && !error);
 
-fail:
 	/*
 	 * We cannot cancel the ioend directly here on error.  We may have
 	 * already set other pages under writeback and hence we have to run I/O
@@ -1827,7 +1875,7 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
 		 * beyond i_size.
 		 */
 		folio_zero_segment(folio, poff, folio_size(folio));
-		*end_pos = isize;
+		*end_pos = round_up(isize, i_blocksize(inode));
 	}
 
 	return true;
@@ -1838,12 +1886,11 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 {
 	struct iomap_folio_state *ifs = folio->private;
 	struct inode *inode = folio->mapping->host;
-	unsigned len = i_blocksize(inode);
-	unsigned nblocks = i_blocks_per_folio(inode, folio);
 	u64 pos = folio_pos(folio);
 	u64 end_pos = pos + folio_size(folio);
 	unsigned count = 0;
-	int error = 0, i;
+	int error = 0;
+	u32 rlen;
 
 	WARN_ON_ONCE(!folio_test_locked(folio));
 	WARN_ON_ONCE(folio_test_dirty(folio));
@@ -1857,7 +1904,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	}
 	WARN_ON_ONCE(end_pos <= pos);
 
-	if (nblocks > 1) {
+	if (i_blocks_per_folio(inode, folio) > 1) {
 		if (!ifs) {
 			ifs = ifs_alloc(inode, folio, 0);
 			iomap_set_range_dirty(folio, 0, end_pos - pos);
@@ -1880,18 +1927,16 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	folio_start_writeback(folio);
 
 	/*
-	 * Walk through the folio to find areas to write back. If we
-	 * run off the end of the current map or find the current map
-	 * invalid, grab a new one.
+	 * Walk through the folio to find dirty areas to write back.
 	 */
-	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
-			continue;
-		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
-				&count);
+	while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
+		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
+				pos, rlen, &count);
 		if (error)
 			break;
+		pos += rlen;
 	}
+
 	if (count)
 		wpc->nr_folios++;
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b8d3b658ad2b03..49d93f53878565 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -309,6 +309,13 @@ struct iomap_writeback_ops {
 	/*
 	 * Required, maps the blocks so that writeback can be performed on
 	 * the range starting at offset.
+	 *
+	 * Can return arbitrarily large regions, but we need to call into it at
+	 * least once per folio to allow the file systems to synchronize with
+	 * the write path that could be invalidating mappings.
+	 *
+	 * An existing mapping from a previous call to this method can be reused
+	 * by the file system if it is still valid.
 	 */
 	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
 				loff_t offset);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] iomap: clear the per-folio dirty bits on all writeback failures
  2023-11-26 12:47 ` [PATCH 01/13] iomap: clear the per-folio dirty bits on all writeback failures Christoph Hellwig
@ 2023-11-27  3:47   ` Ritesh Harjani
  2023-11-27  6:29     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  3:47 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> write_cache_pages always clear the page dirty bit before calling into the
> file systems, and leaves folios with a writeback failure without the
> dirty bit after return.  We also clear the per-block writeback bits for

you mean per-block dirty bits, right?

> writeback failures unless no I/O has submitted, which will leave the
> folio in an inconsistent state where it doesn't have the folio dirty,
> but one or more per-block dirty bits.  This seems to be due the place
> where the iomap_clear_range_dirty call was inserted into the existing
> not very clearly structured code when adding per-block dirty bit support
> and not actually intentional.  Switch to always clearing the dirty on
> writeback failure.
>
> Fixes: 4ce02c679722 ("iomap: Add per-block dirty state tracking to improve performance")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Thanks for catching it. Small nit.

>  fs/iomap/buffered-io.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f72df2babe561a..98d52feb220f0a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1849,10 +1849,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		 */

		/*
		 * Let the filesystem know what portion of the current page
		 * failed to map. If the page hasn't been added to ioend, it
		 * won't be affected by I/O completion and we must unlock it
		 * now.
		 */
The comment to unlock it now becomes stale here.

>  		if (wpc->ops->discard_folio)
>  			wpc->ops->discard_folio(folio, pos);
> -		if (!count) {
> -			folio_unlock(folio);
> -			goto done;
> -		}
>  	}
>  
>  	/*
> @@ -1861,6 +1857,12 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	 * all the dirty bits in the folio here.
>  	 */
>  	iomap_clear_range_dirty(folio, 0, folio_size(folio));

Maybe move iomap_clear_range_dirty() earlier instead?

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 200c26f95893..c875ba632dd8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1842,6 +1842,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
        if (count)
                wpc->ioend->io_folios++;

+       /*
+        * We can have dirty bits set past end of file in page_mkwrite path
+        * while mapping the last partial folio. Hence it's better to clear
+        * all the dirty bits in the folio here.
+        */
+       iomap_clear_range_dirty(folio, 0, folio_size(folio));
+
        WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
        WARN_ON_ONCE(!folio_test_locked(folio));
        WARN_ON_ONCE(folio_test_writeback(folio));
@@ -1867,13 +1874,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
                        goto done;
                }
        }
-
-       /*
-        * We can have dirty bits set past end of file in page_mkwrite path
-        * while mapping the last partial folio. Hence it's better to clear
-        * all the dirty bits in the folio here.
-        */
-       iomap_clear_range_dirty(folio, 0, folio_size(folio));
        folio_start_writeback(folio);
        folio_unlock(folio);

> +
> +	if (error && !count) {
> +		folio_unlock(folio);
> +		goto done;
> +	}
> +
>  	folio_start_writeback(folio);
>  	folio_unlock(folio);
>  

-ritesh

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error
  2023-11-26 12:47 ` [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error Christoph Hellwig
@ 2023-11-27  5:01   ` Ritesh Harjani
  2023-11-27  6:33     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  5:01 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> iomap_writepage_map already warns about inline data, but then just ignores
> it.  Treat it as an error and return -EIO.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

The code change looks very obvious. But sorry that I have some queries
which I would like to clarify - 

The dirty page we are trying to write can always belong to the dirty
inode with inline data in it right? 
So it is then the FS responsibility to un-inline the inode in the
->map_blocks call is it?

Looking at the gfs2 code, it might as well return iomap->type as
IOMAP_INLINE for an IOMAP_WRITE request in its iomap_writeback_ops, no?
    iomap_writeback_ops -> gfs2_map_blocks -> __gfs2_iomap_get

>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 98d52feb220f0a..b1bcc43baf0caf 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1818,8 +1818,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		if (error)
>  			break;
>  		trace_iomap_writepage_map(inode, &wpc->iomap);
> -		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
> -			continue;
> +		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) {
> +			error = -EIO;
> +			break;
> +		}
>  		if (wpc->iomap.type == IOMAP_HOLE)
>  			continue;
>  		iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
> -- 
> 2.39.2

-ritesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/13] iomap: move the io_folios field out of struct iomap_ioend
  2023-11-26 12:47 ` [PATCH 03/13] iomap: move the io_folios field out of struct iomap_ioend Christoph Hellwig
@ 2023-11-27  5:33   ` Ritesh Harjani
  2023-11-29  4:41   ` Darrick J. Wong
  1 sibling, 0 replies; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  5:33 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> The io_folios member in struct iomap_ioend counts the number of folios
> added to an ioend.  It is only used at submission time and can thus be
> moved to iomap_writepage_ctx instead.
>

No objections there. We indeed used io_folios to limit the ioend bio
chain lengths, and we do this only at submission time, which is also
where wpc is used.

Looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/13] iomap: clear the per-folio dirty bits on all writeback failures
  2023-11-27  3:47   ` Ritesh Harjani
@ 2023-11-27  6:29     ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27  6:29 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 09:17:07AM +0530, Ritesh Harjani wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > write_cache_pages always clear the page dirty bit before calling into the
> > file systems, and leaves folios with a writeback failure without the
> > dirty bit after return.  We also clear the per-block writeback bits for
> 
> you mean per-block dirty bits, right?

Yes, sorry.

> 		/*
> 		 * Let the filesystem know what portion of the current page
> 		 * failed to map. If the page hasn't been added to ioend, it
> 		 * won't be affected by I/O completion and we must unlock it
> 		 * now.
> 		 */
> The comment to unlock it now becomes stale here.

Indeed.

> Maybe move iomap_clear_range_dirty() earlier instead?

A few patches down the ->discard_folio call moves into a helper, so
I'd rather avoid the churn here.  I'll move that part of the comment
to the new check below iomap_clear_range_dirty instead.
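
I.e., roughly this (a sketch on top of this patch, wording still to be
polished):

	iomap_clear_range_dirty(folio, 0, folio_size(folio));

	/*
	 * If the folio was not added to any ioend, it won't be affected
	 * by I/O completion and we must unlock it now.
	 */
	if (error && !count) {
		folio_unlock(folio);
		goto done;
	}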

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error
  2023-11-27  5:01   ` Ritesh Harjani
@ 2023-11-27  6:33     ` Christoph Hellwig
  2023-11-29  4:40       ` Darrick J. Wong
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27  6:33 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 10:31:31AM +0530, Ritesh Harjani wrote:
> The code change looks very obvious. But sorry that I have some queries
> which I would like to clarify - 
> 
> The dirty page we are trying to write can always belong to the dirty
> inode with inline data in it right? 

Yes.

> So it is then the FS responsibility to un-inline the inode in the
> ->map_blocks call is it?

I think they way it currently works for gfs2 is that writeback from the
page cache never goes back into the inline area.  

If we ever have a need to actually write back inline data we could
change this code to support it, but right now I just want to make the
assert more consistent.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/13] iomap: drop the obsolete PF_MEMALLOC check in iomap_do_writepage
  2023-11-26 12:47 ` [PATCH 04/13] iomap: drop the obsolete PF_MEMALLOC check in iomap_do_writepage Christoph Hellwig
@ 2023-11-27  6:39   ` Ritesh Harjani
  2023-11-27  6:41     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  6:39 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> The iomap writepage implementation has been removed in commit
> 478af190cb6c ("iomap: remove iomap_writepage") and this code is now only
> called through ->writepages which never happens from memory reclaim.
> Remove the canary in the coal mine now that the coal mine has been shut
> down.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 16 ----------------
>  1 file changed, 16 deletions(-)

Nice cleanup. As you explained, iomap_do_writepage() never gets called
from memory reclaim context. So it is unused code which can be removed. 

However, there was an instance when this WARN was hit by wrong
usage of PF_MEMALLOC flag [1], which was caught due to WARN_ON_ONCE.

[1]: https://lore.kernel.org/linux-xfs/20200309185714.42850-1-ebiggers@kernel.org/

Maybe we can just have a WARN_ON_ONCE() and update the comments?
We anyway don't require "goto redirty" anymore since we will never
actually get called from reclaim context.

Thoughts?

-ritesh


>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index b28c57f8603303..8148e4c9765dac 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1910,20 +1910,6 @@ static int iomap_do_writepage(struct folio *folio,
>  
>  	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
>  
> -	/*
> -	 * Refuse to write the folio out if we're called from reclaim context.
> -	 *
> -	 * This avoids stack overflows when called from deeply used stacks in
> -	 * random callers for direct reclaim or memcg reclaim.  We explicitly
> -	 * allow reclaim from kswapd as the stack usage there is relatively low.
> -	 *
> -	 * This should never happen except in the case of a VM regression so
> -	 * warn about it.
> -	 */
> -	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> -			PF_MEMALLOC))
> -		goto redirty;
> -
>  	/*
>  	 * Is this folio beyond the end of the file?
>  	 *
> @@ -1989,8 +1975,6 @@ static int iomap_do_writepage(struct folio *folio,
>  
>  	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
>  
> -redirty:
> -	folio_redirty_for_writepage(wbc, folio);
>  unlock:
>  	folio_unlock(folio);
>  	return 0;
> -- 
> 2.39.2

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/13] iomap: drop the obsolete PF_MEMALLOC check in iomap_do_writepage
  2023-11-27  6:39   ` Ritesh Harjani
@ 2023-11-27  6:41     ` Christoph Hellwig
  2023-11-29  4:45       ` Darrick J. Wong
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27  6:41 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 12:09:08PM +0530, Ritesh Harjani wrote:
> Nice cleanup. As you explained, iomap_do_writepage() never gets called
> from memory reclaim context. So it is unused code which can be removed. 
> 
> However, there was an instance when this WARN was hit by wrong
> usage of PF_MEMALLOC flag [1], which was caught due to WARN_ON_ONCE.
> 
> [1]: https://lore.kernel.org/linux-xfs/20200309185714.42850-1-ebiggers@kernel.org/
> 
> Maybe we can just have a WARN_ON_ONCE() and update the comments?
> We anyway don't require "goto redirty" anymore since we will never
> actually get called from reclaim context.

Well, xfs/iomap never really cared about the flag, just that it didn't
get called from direct reclaim or kswapd.  If we think such a WARN_ON
is still useful, the only caller of it in do_writepages might be a better
place.
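
Something along these lines, purely as a sketch (placement and return
value still to be decided):

	/* in do_writepages(), before calling into ->writepages: */
	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) ==
			PF_MEMALLOC))
		return 0;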


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper
  2023-11-26 12:47 ` [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper Christoph Hellwig
@ 2023-11-27  6:57   ` Ritesh Harjani
  2023-11-27  7:02     ` Ritesh Harjani
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  6:57 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> Most of iomap_do_writepage is dedicated to handling a folio crossing or
> extending beyond i_size.  Split this into a separate helper and update the
> comments to deal with folios instead of pages and make them more readable.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
>  1 file changed, 62 insertions(+), 66 deletions(-)

Just a small nit below.

But otherwise looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8148e4c9765dac..4a5a21809b0182 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
>  }
>  
> +/*
> + * Check interaction of the folio with the file end.
> + *
> + * If the folio is entirely beyond i_size, return false.  If it straddles
> + * i_size, adjust end_pos and zero all data beyond i_size.
> + */
> +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
> +		u64 *end_pos)
> +{
> +	u64 isize = i_size_read(inode);

i_size_read(inode) returns loff_t type. Can we make end_pos also as
loff_t type. We anyway use loff_t for
folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
would be more consistent with the data type then.

Thoughts?

-ritesh


> +
> +	if (*end_pos > isize) {
> +		size_t poff = offset_in_folio(folio, isize);
> +		pgoff_t end_index = isize >> PAGE_SHIFT;
> +
> +		/*
> +		 * If the folio is entirely outside of i_size, skip it.
> +		 *
> +		 * This can happen due to a truncate operation that is in
> +		 * progress and in that case truncate will finish it off once
> +		 * we've dropped the folio lock.
> +		 *
> +		 * Note that the pgoff_t used for end_index is an unsigned long.
> +		 * If the given offset is greater than 16TB on a 32-bit system,
> +		 * then if we checked if the folio is fully outside i_size with
> +		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
> +		 * overflow and evaluate to 0.  Hence this folio would be
> +		 * redirtied and written out repeatedly, which would result in
> +		 * an infinite loop; the user program performing this operation
> +		 * would hang.  Instead, we can detect this situation by
> +		 * checking if the folio is totally beyond i_size or if its
> +		 * offset is just equal to the EOF.
> +		 */
> +		if (folio->index > end_index ||
> +		    (folio->index == end_index && poff == 0))
> +			return false;
> +
> +		/*
> +		 * The folio straddles i_size.
> +		 *
> +		 * It must be zeroed out on each and every writepage invocation
> +		 * because it may be mmapped:
> +		 *
> +		 *    A file is mapped in multiples of the page size.  For a
> +		 *    file that is not a multiple of the page size, the
> +		 *    remaining memory is zeroed when mapped, and writes to that
> +		 *    region are not written out to the file.
> +		 *
> +		 * Also adjust the writeback range to skip all blocks entirely
> +		 * beyond i_size.
> +		 */
> +		folio_zero_segment(folio, poff, folio_size(folio));
> +		*end_pos = isize;
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
> @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio,
>  {
>  	struct iomap_writepage_ctx *wpc = data;
>  	struct inode *inode = folio->mapping->host;
> -	u64 end_pos, isize;
> +	u64 end_pos = folio_pos(folio) + folio_size(folio);
>  
>  	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
>  
> -	/*
> -	 * Is this folio beyond the end of the file?
> -	 *
> -	 * The folio index is less than the end_index, adjust the end_pos
> -	 * to the highest offset that this folio should represent.
> -	 * -----------------------------------------------------
> -	 * |			file mapping	       | <EOF> |
> -	 * -----------------------------------------------------
> -	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
> -	 * ^--------------------------------^----------|--------
> -	 * |     desired writeback range    |      see else    |
> -	 * ---------------------------------^------------------|
> -	 */
> -	isize = i_size_read(inode);
> -	end_pos = folio_pos(folio) + folio_size(folio);
> -	if (end_pos > isize) {
> -		/*
> -		 * Check whether the page to write out is beyond or straddles
> -		 * i_size or not.
> -		 * -------------------------------------------------------
> -		 * |		file mapping		        | <EOF>  |
> -		 * -------------------------------------------------------
> -		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
> -		 * ^--------------------------------^-----------|---------
> -		 * |				    |      Straddles     |
> -		 * ---------------------------------^-----------|--------|
> -		 */
> -		size_t poff = offset_in_folio(folio, isize);
> -		pgoff_t end_index = isize >> PAGE_SHIFT;
> -
> -		/*
> -		 * Skip the page if it's fully outside i_size, e.g.
> -		 * due to a truncate operation that's in progress.  We've
> -		 * cleaned this page and truncate will finish things off for
> -		 * us.
> -		 *
> -		 * Note that the end_index is unsigned long.  If the given
> -		 * offset is greater than 16TB on a 32-bit system then if we
> -		 * checked if the page is fully outside i_size with
> -		 * "if (page->index >= end_index + 1)", "end_index + 1" would
> -		 * overflow and evaluate to 0.  Hence this page would be
> -		 * redirtied and written out repeatedly, which would result in
> -		 * an infinite loop; the user program performing this operation
> -		 * would hang.  Instead, we can detect this situation by
> -		 * checking if the page is totally beyond i_size or if its
> -		 * offset is just equal to the EOF.
> -		 */
> -		if (folio->index > end_index ||
> -		    (folio->index == end_index && poff == 0))
> -			goto unlock;
> -
> -		/*
> -		 * The page straddles i_size.  It must be zeroed out on each
> -		 * and every writepage invocation because it may be mmapped.
> -		 * "A file is mapped in multiples of the page size.  For a file
> -		 * that is not a multiple of the page size, the remaining
> -		 * memory is zeroed when mapped, and writes to that region are
> -		 * not written out to the file."
> -		 */
> -		folio_zero_segment(folio, poff, folio_size(folio));
> -		end_pos = isize;
> +	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> +		folio_unlock(folio);
> +		return 0;
>  	}
>  
>  	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
> -
> -unlock:
> -	folio_unlock(folio);
> -	return 0;
>  }
>  
>  int
> -- 
> 2.39.2

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper
  2023-11-27  6:57   ` Ritesh Harjani
@ 2023-11-27  7:02     ` Ritesh Harjani
  2023-11-27  7:12       ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  7:02 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Ritesh Harjani (IBM) <ritesh.list@gmail.com> writes:

> Christoph Hellwig <hch@lst.de> writes:
>
>> Most of iomap_do_writepage is dedicated to handling a folio crossing or
>> extending beyond i_size.  Split this into a separate helper and update the
>> comments to deal with folios instead of pages and make them more readable.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
>>  1 file changed, 62 insertions(+), 66 deletions(-)
>
> Just a small nit below.
>
> But otherwise looks good to me. Please feel free to add - 
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 8148e4c9765dac..4a5a21809b0182 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
>>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
>>  }
>>  
>> +/*
>> + * Check interaction of the folio with the file end.
>> + *
>> + * If the folio is entirely beyond i_size, return false.  If it straddles
>> + * i_size, adjust end_pos and zero all data beyond i_size.
>> + */
>> +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
>> +		u64 *end_pos)
>> +{
>> +	u64 isize = i_size_read(inode);
>
> i_size_read(inode) returns loff_t type. Can we make end_pos also as
> loff_t type. We anyway use loff_t for
> folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> would be more consistent with the data type then.
>
> Thoughts?

aah, that might also require changing the types in
iomap_writepage_map(). So I guess the data type consistency change
should be a follow up change as this patch does only the refactoring.

>
> -ritesh
>
>
>> +
>> +	if (*end_pos > isize) {
>> +		size_t poff = offset_in_folio(folio, isize);
>> +		pgoff_t end_index = isize >> PAGE_SHIFT;
>> +
>> +		/*
>> +		 * If the folio is entirely outside of i_size, skip it.
>> +		 *
>> +		 * This can happen due to a truncate operation that is in
>> +		 * progress and in that case truncate will finish it off once
>> +		 * we've dropped the folio lock.
>> +		 *
>> +		 * Note that the pgoff_t used for end_index is an unsigned long.
>> +		 * If the given offset is greater than 16TB on a 32-bit system,
>> +		 * then if we checked if the folio is fully outside i_size with
>> +		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
>> +		 * overflow and evaluate to 0.  Hence this folio would be
>> +		 * redirtied and written out repeatedly, which would result in
>> +		 * an infinite loop; the user program performing this operation
>> +		 * would hang.  Instead, we can detect this situation by
>> +		 * checking if the folio is totally beyond i_size or if its
>> +		 * offset is just equal to the EOF.
>> +		 */
>> +		if (folio->index > end_index ||
>> +		    (folio->index == end_index && poff == 0))
>> +			return false;
>> +
>> +		/*
>> +		 * The folio straddles i_size.
>> +		 *
>> +		 * It must be zeroed out on each and every writepage invocation
>> +		 * because it may be mmapped:
>> +		 *
>> +		 *    A file is mapped in multiples of the page size.  For a
>> +		 *    file that is not a multiple of the page size, the
>> +		 *    remaining memory is zeroed when mapped, and writes to that
>> +		 *    region are not written out to the file.
>> +		 *
>> +		 * Also adjust the writeback range to skip all blocks entirely
>> +		 * beyond i_size.
>> +		 */
>> +		folio_zero_segment(folio, poff, folio_size(folio));
>> +		*end_pos = isize;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>  /*
>>   * We implement an immediate ioend submission policy here to avoid needing to
>>   * chain multiple ioends and hence nest mempool allocations which can violate
>> @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio,
>>  {
>>  	struct iomap_writepage_ctx *wpc = data;
>>  	struct inode *inode = folio->mapping->host;
>> -	u64 end_pos, isize;
>> +	u64 end_pos = folio_pos(folio) + folio_size(folio);
>>  
>>  	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
>>  
>> -	/*
>> -	 * Is this folio beyond the end of the file?
>> -	 *
>> -	 * The folio index is less than the end_index, adjust the end_pos
>> -	 * to the highest offset that this folio should represent.
>> -	 * -----------------------------------------------------
>> -	 * |			file mapping	       | <EOF> |
>> -	 * -----------------------------------------------------
>> -	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
>> -	 * ^--------------------------------^----------|--------
>> -	 * |     desired writeback range    |      see else    |
>> -	 * ---------------------------------^------------------|
>> -	 */
>> -	isize = i_size_read(inode);
>> -	end_pos = folio_pos(folio) + folio_size(folio);
>> -	if (end_pos > isize) {
>> -		/*
>> -		 * Check whether the page to write out is beyond or straddles
>> -		 * i_size or not.
>> -		 * -------------------------------------------------------
>> -		 * |		file mapping		        | <EOF>  |
>> -		 * -------------------------------------------------------
>> -		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
>> -		 * ^--------------------------------^-----------|---------
>> -		 * |				    |      Straddles     |
>> -		 * ---------------------------------^-----------|--------|
>> -		 */
>> -		size_t poff = offset_in_folio(folio, isize);
>> -		pgoff_t end_index = isize >> PAGE_SHIFT;
>> -
>> -		/*
>> -		 * Skip the page if it's fully outside i_size, e.g.
>> -		 * due to a truncate operation that's in progress.  We've
>> -		 * cleaned this page and truncate will finish things off for
>> -		 * us.
>> -		 *
>> -		 * Note that the end_index is unsigned long.  If the given
>> -		 * offset is greater than 16TB on a 32-bit system then if we
>> -		 * checked if the page is fully outside i_size with
>> -		 * "if (page->index >= end_index + 1)", "end_index + 1" would
>> -		 * overflow and evaluate to 0.  Hence this page would be
>> -		 * redirtied and written out repeatedly, which would result in
>> -		 * an infinite loop; the user program performing this operation
>> -		 * would hang.  Instead, we can detect this situation by
>> -		 * checking if the page is totally beyond i_size or if its
>> -		 * offset is just equal to the EOF.
>> -		 */
>> -		if (folio->index > end_index ||
>> -		    (folio->index == end_index && poff == 0))
>> -			goto unlock;
>> -
>> -		/*
>> -		 * The page straddles i_size.  It must be zeroed out on each
>> -		 * and every writepage invocation because it may be mmapped.
>> -		 * "A file is mapped in multiples of the page size.  For a file
>> -		 * that is not a multiple of the page size, the remaining
>> -		 * memory is zeroed when mapped, and writes to that region are
>> -		 * not written out to the file."
>> -		 */
>> -		folio_zero_segment(folio, poff, folio_size(folio));
>> -		end_pos = isize;
>> +	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
>> +		folio_unlock(folio);
>> +		return 0;
>>  	}
>>  
>>  	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
>> -
>> -unlock:
>> -	folio_unlock(folio);
>> -	return 0;
>>  }
>>  
>>  int
>> -- 
>> 2.39.2

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper
  2023-11-27  7:02     ` Ritesh Harjani
@ 2023-11-27  7:12       ` Christoph Hellwig
  2023-11-29  4:48         ` Darrick J. Wong
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27  7:12 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote:
> >
> > i_size_read(inode) returns loff_t type. Can we make end_pos also as
> > loff_t type. We anyway use loff_t for
> > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> > would be more consistent with the data type then.
> >
> > Thoughts?
> 
> aah, that might also require changing the types in
> iomap_writepage_map(). So I guess the data type consistency change
> should be a follow up change as this patch does only the refactoring.

Yes, I'm trying to stay consistent in the writeback code.  IIRC some
of the u64 use was to better deal with overflows, but I'll have to look
up the history.
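
Purely as an illustration of what I mean (not a real patch):

	/* all-u64 arithmetic is well-defined even if it wraps: */
	u64 pos = folio_pos(folio);
	u64 end_pos = pos + folio_size(folio);

	/*
	 * whereas overflowing signed loff_t arithmetic is undefined
	 * behavior, and a mixed signed/unsigned comparison converts a
	 * negative loff_t to a huge unsigned value.
	 */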


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map
  2023-11-26 12:47 ` [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map Christoph Hellwig
@ 2023-11-27  7:36   ` Ritesh Harjani
  2023-11-27 19:20   ` Josef Bacik
  2023-11-29  4:50   ` Darrick J. Wong
  2 siblings, 0 replies; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  7:36 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> Move the tracepoint and the iomap check from iomap_do_writepage into
> iomap_writepage_map.  This keeps all logic in one place, and leaves
> iomap_do_writepage just as the wrapper for the callback conventions of
> write_cache_pages, which will go away when that is convertd to an
                                                     ^^^ converted
> iterator.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 34 +++++++++++-----------------------
>  1 file changed, 11 insertions(+), 23 deletions(-)

Straightforward refactoring. The change looks good to me. Please feel
free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/13] iomap: clean up the iomap_new_ioend calling convention
  2023-11-26 12:47 ` [PATCH 07/13] iomap: clean up the iomap_new_ioend calling convention Christoph Hellwig
@ 2023-11-27  7:43   ` Ritesh Harjani
  2023-11-27  8:13     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  7:43 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> Switch to the same argument order as iomap_writepage_map and remove the
> ifs argument that can be trivially recalculated.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)

Straightforward change. Looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/13] iomap: clean up the iomap_new_ioend calling convention
  2023-11-27  7:43   ` Ritesh Harjani
@ 2023-11-27  8:13     ` Christoph Hellwig
  2023-11-29  4:51       ` Darrick J. Wong
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27  8:13 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 01:13:29PM +0530, Ritesh Harjani wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > Switch to the same argument order as iomap_writepage_map and remove the
> > ifs argument that can be trivially recalculated.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/iomap/buffered-io.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> Straight forward change. Looks good to me. Please feel free to add - 

Also the subject should be changed now that I'm reading your reply.
I dropped the rename of this function that I had in earlier versions.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
  2023-11-26 12:47 ` [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend Christoph Hellwig
@ 2023-11-27  9:54   ` Ritesh Harjani
  2023-11-27 13:54     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Ritesh Harjani @ 2023-11-27  9:54 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

Christoph Hellwig <hch@lst.de> writes:

> The calculation in iomap_sector is pretty trivial and most of the time
> iomap_add_to_ioend only calls either iomap_can_add_to_ioend or
> iomap_alloc_ioend from a single invocation.
>
> Calculate the sector in the two lower level functions and stop passing it
> from iomap_add_to_ioend and update the iomap_alloc_ioend argument passing
> order to match that of iomap_add_to_ioend.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)

Straightforward change. Looks good to me, please feel free to add - 
(small nit below on naming style convention)

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7f86d2f90e3863..329e2c342f1c64 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1666,9 +1666,8 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
>  	return 0;
>  }
>  
> -static struct iomap_ioend *
> -iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
> -		loff_t offset, sector_t sector, struct writeback_control *wbc)
> +static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> +		struct writeback_control *wbc, struct inode *inode, loff_t pos)
>  {
>  	struct iomap_ioend *ioend;
>  	struct bio *bio;
> @@ -1676,7 +1675,7 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
>  	bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
>  			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
>  			       GFP_NOFS, &iomap_ioend_bioset);
> -	bio->bi_iter.bi_sector = sector;
> +	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
>  	wbc_init_bio(wbc, bio);
>  
>  	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
> @@ -1685,9 +1684,9 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
>  	ioend->io_flags = wpc->iomap.flags;
>  	ioend->io_inode = inode;
>  	ioend->io_size = 0;
> -	ioend->io_offset = offset;
> +	ioend->io_offset = pos;
>  	ioend->io_bio = bio;
> -	ioend->io_sector = sector;
> +	ioend->io_sector = bio->bi_iter.bi_sector;
>  
>  	wpc->nr_folios = 0;
>  	return ioend;
> @@ -1716,8 +1715,7 @@ iomap_chain_bio(struct bio *prev)
>  }
>  
>  static bool
> -iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
> -		sector_t sector)
> +iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)

Not sure which style you would like to keep in fs/iomap/.
Should the function name be on the same line as "static bool" or on the next line?
For the previous function you put the function name definition on the same
line. Or is the naming style irrelevant for fs/iomap/?

-ritesh

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/13] iomap: don't chain bios
  2023-11-26 12:47 ` [PATCH 09/13] iomap: don't chain bios Christoph Hellwig
@ 2023-11-27 12:53   ` Zhang Yi
  2023-11-27 13:51     ` Christoph Hellwig
  2023-11-29  4:59   ` Darrick J. Wong
  1 sibling, 1 reply; 49+ messages in thread
From: Zhang Yi @ 2023-11-27 12:53 UTC (permalink / raw)
  To: Christoph Hellwig, Christian Brauner
  Cc: Darrick J. Wong, Chandan Babu R, Ritesh Harjani, linux-xfs,
	linux-fsdevel

Hi, Christoph.

On 2023/11/26 20:47, Christoph Hellwig wrote:
> Back in the days when a single bio could only be filled to the hardware
> limits, and we scheduled a work item for each bio completion, chaining
> multiple bios for a single ioend made a lot of sense to reduce the number
> of completions.  But these days bios can be filled until we reach the
> number of vectors or total size limit, which means we can always fit at
> least 1 megabyte worth of data in the worst case, but usually a lot more
> due to large folios.  The only thing bio chaining is buying us now is
> to reduce the size of the allocation from an ioend with an embedded bio
> into a plain bio, which is a 52 byte difference on 64-bit systems.
> 
> This is not worth the added complexity, so remove the bio chaining and
> only use the bio embedded into the ioend.  This will help to simplify
> further changes to the iomap writeback code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

A nice cleanup! I'm just a little curious about the writeback performance
impact of this patch. Do you have any actual test data on xfs?

Thanks,
Yi.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/13] iomap: don't chain bios
  2023-11-27 12:53   ` Zhang Yi
@ 2023-11-27 13:51     ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27 13:51 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Ritesh Harjani, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 08:53:22PM +0800, Zhang Yi wrote:
> A nice cleanup! I'm just a little curious about the writeback performance
> impact of this patch. Do you have any actual test data on xfs?

I've only tested the entire series, not this patch specifically.  The
throughput doesn't change at all in my testing, and cpu usage goes
down a tiny amount, although it's probably below the measurement
tolerance.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
  2023-11-27  9:54   ` Ritesh Harjani
@ 2023-11-27 13:54     ` Christoph Hellwig
  2023-11-29  4:53       ` Darrick J. Wong
  0 siblings, 1 reply; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-27 13:54 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, Christian Brauner, Darrick J. Wong,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 03:24:49PM +0530, Ritesh Harjani wrote:
> >  static bool
> > -iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
> > -		sector_t sector)
> > +iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
> 
> Not sure which style you would like to keep in fs/iomap/.
> Should the function name be on the same line as "static bool" or on the next line?
> For the previous function you put the function name definition on the same
> line. Or is the naming style irrelevant for fs/iomap/?

The XFS style that iomap started out with has the separate line, and I
actually kinda like it.  But I think willy convinced us a while ago to
move to the single-line form, which is the normal kernel style, and most
new code seems to use it.  And yes, I should probably be consistent and
change it here as well.
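
For reference, the two styles in question:

	/* traditional XFS style, return type on its own line: */
	static bool
	iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)

	/* common kernel style, declaration starting on one line: */
	static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc,
			loff_t offset)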


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map
  2023-11-26 12:47 ` [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map Christoph Hellwig
  2023-11-27  7:36   ` Ritesh Harjani
@ 2023-11-27 19:20   ` Josef Bacik
  2023-11-29  4:50   ` Darrick J. Wong
  2 siblings, 0 replies; 49+ messages in thread
From: Josef Bacik @ 2023-11-27 19:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Darrick J. Wong, Chandan Babu R, Zhang Yi,
	Ritesh Harjani, linux-xfs, linux-fsdevel

The subject should read "iomap: move all remaining per-folio logic into
 iomap_writepage_map".  Thanks,

Josef

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error
  2023-11-27  6:33     ` Christoph Hellwig
@ 2023-11-29  4:40       ` Darrick J. Wong
  2023-11-29  5:39         ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani, Christian Brauner, Chandan Babu R, Zhang Yi,
	linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 07:33:25AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 10:31:31AM +0530, Ritesh Harjani wrote:
> > The code change looks very obvious. But sorry that I have some queries
> > which I would like to clarify - 
> > 
> > The dirty page we are trying to write can always belong to the dirty
> > inode with inline data in it right? 
> 
> Yes.
> 
> > So it is then the FS responsibility to un-inline the inode in the
> > ->map_blocks call is it?
> 
> I think the way it currently works for gfs2 is that writeback from the
> page cache never goes back into the inline area.  
> 
> If we ever have a need to actually write back inline data we could
> change this code to support it, but right now I just want to make the
> assert more consistent.

Question: Do we even /want/ writeback to be initiating transactions
to log the inline data?  I suppose for ext4/jbd2 that would be the least
inefficient time to do that.

--D

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/13] iomap: move the io_folios field out of struct iomap_ioend
  2023-11-26 12:47 ` [PATCH 03/13] iomap: move the io_folios field out of struct iomap_ioend Christoph Hellwig
  2023-11-27  5:33   ` Ritesh Harjani
@ 2023-11-29  4:41   ` Darrick J. Wong
  1 sibling, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:10PM +0100, Christoph Hellwig wrote:
> The io_folios member in struct iomap_ioend counts the number of folios
> added to an ioend.  It is only used at submission time and can thus be
> moved to iomap_writepage_ctx instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 7 ++++---
>  include/linux/iomap.h  | 2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index b1bcc43baf0caf..b28c57f8603303 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1685,10 +1685,11 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
>  	ioend->io_flags = wpc->iomap.flags;
>  	ioend->io_inode = inode;
>  	ioend->io_size = 0;
> -	ioend->io_folios = 0;
>  	ioend->io_offset = offset;
>  	ioend->io_bio = bio;
>  	ioend->io_sector = sector;
> +
> +	wpc->nr_folios = 0;
>  	return ioend;
>  }
>  
> @@ -1732,7 +1733,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
>  	 * also prevents long tight loops ending page writeback on all the
>  	 * folios in the ioend.
>  	 */
> -	if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE)
> +	if (wpc->nr_folios >= IOEND_BATCH_SIZE)
>  		return false;
>  	return true;
>  }
> @@ -1829,7 +1830,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		count++;
>  	}
>  	if (count)
> -		wpc->ioend->io_folios++;
> +		wpc->nr_folios++;
>  
>  	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
>  	WARN_ON_ONCE(!folio_test_locked(folio));
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 96dd0acbba44ac..b2a05dff914d0c 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -293,7 +293,6 @@ struct iomap_ioend {
>  	struct list_head	io_list;	/* next ioend in chain */
>  	u16			io_type;
>  	u16			io_flags;	/* IOMAP_F_* */
> -	u32			io_folios;	/* folios added to ioend */
>  	struct inode		*io_inode;	/* file being written to */
>  	size_t			io_size;	/* size of the extent */
>  	loff_t			io_offset;	/* offset in the file */
> @@ -329,6 +328,7 @@ struct iomap_writepage_ctx {
>  	struct iomap		iomap;
>  	struct iomap_ioend	*ioend;
>  	const struct iomap_writeback_ops *ops;
> +	u32			nr_folios;	/* folios added to the ioend */
>  };
>  
>  void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/13] iomap: drop the obsolete PF_MEMALLOC check in iomap_do_writepage
  2023-11-27  6:41     ` Christoph Hellwig
@ 2023-11-29  4:45       ` Darrick J. Wong
  0 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani, Christian Brauner, Chandan Babu R, Zhang Yi,
	linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 07:41:07AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 12:09:08PM +0530, Ritesh Harjani wrote:
> > Nice cleanup. As you explained, iomap_do_writepage() never gets called
> > from memory reclaim context. So it is unused code which can be removed. 
> > 
> > However, there was an instance when this WARN was hit by wrong
> > usage of PF_MEMALLOC flag [1], which was caught due to WARN_ON_ONCE.
> > 
> > [1]: https://lore.kernel.org/linux-xfs/20200309185714.42850-1-ebiggers@kernel.org/
> > 
> > Maybe we can just have a WARN_ON_ONCE() and update the comments?
> > We anyway don't require "goto redirty" anymore since we will never
> > actually get called from reclaim context.
> 
> Well, xfs/iomap never really cared about the flag, just that it didn't
> get called from direct reclaim or kswapd.  If we think such a WARN_ON
> is still useful, the only caller of it in do_writepages might be a better
> place.

I think that's a good idea, given the enormous complexity of modern day
XFS and the sometimes rough quality of others.  :P

--D

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/13] iomap: factor out a iomap_writepage_handle_eof helper
  2023-11-27  7:12       ` Christoph Hellwig
@ 2023-11-29  4:48         ` Darrick J. Wong
  0 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani, Christian Brauner, Chandan Babu R, Zhang Yi,
	linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 08:12:19AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote:
> > >
> > > i_size_read(inode) returns loff_t type. Can we make end_pos also as
> > > loff_t type. We anyway use loff_t for
> > > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> > > would be more consistent with the data type then.
> > >
> > > Thoughts?
> > 
> > aah, that might also require changing the types in
> > iomap_writepage_map(). So I guess the data type consistency change
> > should be a follow up change as this patch does only the refactoring.

Separate patch for the cleanup, please.  Then we can more easily target
it for bisection in case there are signed comparison bugs.  I hate C.

> Yes, I'm trying to stay consistent in the writeback code.  IIRC some
> of the u64 use was to better deal with overflows, but I'll have to look
> up the history.

For this patch,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map
  2023-11-26 12:47 ` [PATCH 06/13] iomap: move all remaining per-folio logic into xfs_writepage_map Christoph Hellwig
  2023-11-27  7:36   ` Ritesh Harjani
  2023-11-27 19:20   ` Josef Bacik
@ 2023-11-29  4:50   ` Darrick J. Wong
  2 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:13PM +0100, Christoph Hellwig wrote:
> Move the tracepoint and the iomap check from iomap_do_writepage into
> iomap_writepage_map.  This keeps all logic in one place, and leaves
> iomap_do_writepage just as the wrapper for the callback conventions of
> write_cache_pages, which will go away when that is convertd to an
> iterator.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

With the two fixes from Ritesh and Josef added, I think this looks like
a simple enough movement.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 34 +++++++++++-----------------------
>  1 file changed, 11 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4a5a21809b0182..5834aa46bdb8cf 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1842,19 +1842,25 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
>   * At the end of a writeback pass, there will be a cached ioend remaining on the
>   * writepage context that the caller will need to submit.
>   */
> -static int
> -iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> -		struct writeback_control *wbc, struct inode *inode,
> -		struct folio *folio, u64 end_pos)
> +static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> +		struct writeback_control *wbc, struct folio *folio)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
> +	struct inode *inode = folio->mapping->host;
>  	struct iomap_ioend *ioend, *next;
>  	unsigned len = i_blocksize(inode);
>  	unsigned nblocks = i_blocks_per_folio(inode, folio);
>  	u64 pos = folio_pos(folio);
> +	u64 end_pos = pos + folio_size(folio);
>  	int error = 0, count = 0, i;
>  	LIST_HEAD(submit_list);
>  
> +	trace_iomap_writepage(inode, pos, folio_size(folio));
> +
> +	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> +		folio_unlock(folio);
> +		return 0;
> +	}
>  	WARN_ON_ONCE(end_pos <= pos);
>  
>  	if (!ifs && nblocks > 1) {
> @@ -1952,28 +1958,10 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	return error;
>  }
>  
> -/*
> - * Write out a dirty page.
> - *
> - * For delalloc space on the page, we need to allocate space and flush it.
> - * For unwritten space on the page, we need to start the conversion to
> - * regular allocated space.
> - */
>  static int iomap_do_writepage(struct folio *folio,
>  		struct writeback_control *wbc, void *data)
>  {
> -	struct iomap_writepage_ctx *wpc = data;
> -	struct inode *inode = folio->mapping->host;
> -	u64 end_pos = folio_pos(folio) + folio_size(folio);
> -
> -	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
> -
> -	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> -		folio_unlock(folio);
> -		return 0;
> -	}
> -
> -	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
> +	return iomap_writepage_map(data, wbc, folio);
>  }
>  
>  int
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 07/13] iomap: clean up the iomap_new_ioend calling convention
  2023-11-27  8:13     ` Christoph Hellwig
@ 2023-11-29  4:51       ` Darrick J. Wong
  0 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani, Christian Brauner, Chandan Babu R, Zhang Yi,
	linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 09:13:43AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 01:13:29PM +0530, Ritesh Harjani wrote:
> > Christoph Hellwig <hch@lst.de> writes:
> > 
> > > Switch to the same argument order as iomap_writepage_map and remove the
> > > ifs argument that can be trivially recalculated.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/iomap/buffered-io.c | 11 +++++------
> > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > Straightforward change. Looks good to me. Please feel free to add - 
> 
> Also the subject should be changed now that I'm reading your reply.
> I dropped the rename of this function that I had in earlier versions.

With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


* Re: [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
  2023-11-27 13:54     ` Christoph Hellwig
@ 2023-11-29  4:53       ` Darrick J. Wong
  2023-11-29  5:42         ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani, Christian Brauner, Chandan Babu R, Zhang Yi,
	linux-xfs, linux-fsdevel

On Mon, Nov 27, 2023 at 02:54:02PM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 03:24:49PM +0530, Ritesh Harjani wrote:
> > >  static bool
> > > -iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
> > > -		sector_t sector)
> > > +iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)

Can you change @offset to @pos while you're changing the function
signature?

> > Not sure which style you would like to keep in fs/iomap/.
> > Should the function name be on the same line as "static bool" or on the next line?
> > For the previous function you put the function name definition on the same
> > line. Or is the naming style irrelevant for fs/iomap/?
> 
> The XFS style that iomap started out with has the separate line, and I
> actually kinda like it.  But I think willy convinced us a while ago to
> move to the common line, which is the normal kernel style, and most new code
> seems to use this.  And yes, I should probably be consistent and I
> should change it here as well.

I prefer xfs style, but I've been told privately to knock it off outside
xfs.  So.  Fugly kernel style with too much manual whitespace
maintenance it is. :/
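
For anyone following along, the two styles in question, using the
function from this patch:

	/* xfs style: return type on its own line */
	static bool
	iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos)

	/* common kernel style: declaration on one line */
	static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc,
			loff_t pos)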

--D

* Re: [PATCH 09/13] iomap: don't chain bios
  2023-11-26 12:47 ` [PATCH 09/13] iomap: don't chain bios Christoph Hellwig
  2023-11-27 12:53   ` Zhang Yi
@ 2023-11-29  4:59   ` Darrick J. Wong
  2023-11-29  5:40     ` Christoph Hellwig
  1 sibling, 1 reply; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  4:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:16PM +0100, Christoph Hellwig wrote:
> Back in the days when a single bio could only be filled to the hardware
> limits, and we scheduled a work item for each bio completion, chaining
> multiple bios for a single ioend made a lot of sense to reduce the number
> of completions.  But these days bios can be filled until we reach the
> number of vectors or total size limit, which means we can always fit at
> least 1 megabyte worth of data in the worst case, but usually a lot more
> due to large folios.  The only thing bio chaining is buying us now is
> to reduce the size of the allocation from an ioend with an embedded bio
> into a plain bio, which is a 52 byte difference on 64-bit systems.
> 
> This is not worth the added complexity, so remove the bio chaining and
> only use the bio embedded into the ioend.  This will help to simplify
> further changes to the iomap writeback code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Well that's a nice code deflation!  And if I read this correctly,
instead of chaining bios together, now we create a new ioend and add it
to the ioend list, eventually submitting the entire list of them?

If so,

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
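
(For reference, the piece that makes the single allocation work is the
bioset front padding set up below -- each bio is allocated with the rest
of the ioend reserved in front of it:

	bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
		    offsetof(struct iomap_ioend, io_bio),	/* front_pad */
		    BIOSET_NEED_BVECS);

so iomap_ioend_from_bio() can container_of() its way back, and io_bio
has to stay the last member of struct iomap_ioend.)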

> ---
>  fs/iomap/buffered-io.c | 90 +++++++++++-------------------------------
>  fs/xfs/xfs_aops.c      |  6 +--
>  include/linux/iomap.h  |  8 +++-
>  3 files changed, 32 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 329e2c342f1c64..17760f3e67c61e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1489,40 +1489,23 @@ static u32
>  iomap_finish_ioend(struct iomap_ioend *ioend, int error)
>  {
>  	struct inode *inode = ioend->io_inode;
> -	struct bio *bio = &ioend->io_inline_bio;
> -	struct bio *last = ioend->io_bio, *next;
> -	u64 start = bio->bi_iter.bi_sector;
> -	loff_t offset = ioend->io_offset;
> -	bool quiet = bio_flagged(bio, BIO_QUIET);
> +	struct bio *bio = &ioend->io_bio;
> +	struct folio_iter fi;
>  	u32 folio_count = 0;
>  
> -	for (bio = &ioend->io_inline_bio; bio; bio = next) {
> -		struct folio_iter fi;
> -
> -		/*
> -		 * For the last bio, bi_private points to the ioend, so we
> -		 * need to explicitly end the iteration here.
> -		 */
> -		if (bio == last)
> -			next = NULL;
> -		else
> -			next = bio->bi_private;
> -
> -		/* walk all folios in bio, ending page IO on them */
> -		bio_for_each_folio_all(fi, bio) {
> -			iomap_finish_folio_write(inode, fi.folio, fi.length,
> -					error);
> -			folio_count++;
> -		}
> -		bio_put(bio);
> +	/* walk all folios in bio, ending page IO on them */
> +	bio_for_each_folio_all(fi, bio) {
> +		iomap_finish_folio_write(inode, fi.folio, fi.length, error);
> +		folio_count++;
>  	}
> -	/* The ioend has been freed by bio_put() */
>  
> -	if (unlikely(error && !quiet)) {
> +	if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) {
>  		printk_ratelimited(KERN_ERR
>  "%s: writeback error on inode %lu, offset %lld, sector %llu",
> -			inode->i_sb->s_id, inode->i_ino, offset, start);
> +			inode->i_sb->s_id, inode->i_ino,
> +			ioend->io_offset, ioend->io_sector);
>  	}
> +	bio_put(bio);	/* frees the ioend */
>  	return folio_count;
>  }
>  
> @@ -1563,7 +1546,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_ioends);
>  static bool
>  iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
>  {
> -	if (ioend->io_bio->bi_status != next->io_bio->bi_status)
> +	if (ioend->io_bio.bi_status != next->io_bio.bi_status)
>  		return false;
>  	if ((ioend->io_flags & IOMAP_F_SHARED) ^
>  	    (next->io_flags & IOMAP_F_SHARED))
> @@ -1628,9 +1611,8 @@ EXPORT_SYMBOL_GPL(iomap_sort_ioends);
>  
>  static void iomap_writepage_end_bio(struct bio *bio)
>  {
> -	struct iomap_ioend *ioend = bio->bi_private;
> -
> -	iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
> +	iomap_finish_ioend(iomap_ioend_from_bio(bio),
> +			blk_status_to_errno(bio->bi_status));
>  }
>  
>  /*
> @@ -1645,9 +1627,6 @@ static int
>  iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
>  		int error)
>  {
> -	ioend->io_bio->bi_private = ioend;
> -	ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
> -
>  	if (wpc->ops->prepare_ioend)
>  		error = wpc->ops->prepare_ioend(ioend, error);
>  	if (error) {
> @@ -1657,12 +1636,12 @@ iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
>  		 * as there is only one reference to the ioend at this point in
>  		 * time.
>  		 */
> -		ioend->io_bio->bi_status = errno_to_blk_status(error);
> -		bio_endio(ioend->io_bio);
> +		ioend->io_bio.bi_status = errno_to_blk_status(error);
> +		bio_endio(&ioend->io_bio);
>  		return error;
>  	}
>  
> -	submit_bio(ioend->io_bio);
> +	submit_bio(&ioend->io_bio);
>  	return 0;
>  }
>  
> @@ -1676,44 +1655,22 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
>  			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
>  			       GFP_NOFS, &iomap_ioend_bioset);
>  	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
> +	bio->bi_end_io = iomap_writepage_end_bio;
>  	wbc_init_bio(wbc, bio);
>  
> -	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
> +	ioend = iomap_ioend_from_bio(bio);
>  	INIT_LIST_HEAD(&ioend->io_list);
>  	ioend->io_type = wpc->iomap.type;
>  	ioend->io_flags = wpc->iomap.flags;
>  	ioend->io_inode = inode;
>  	ioend->io_size = 0;
>  	ioend->io_offset = pos;
> -	ioend->io_bio = bio;
>  	ioend->io_sector = bio->bi_iter.bi_sector;
>  
>  	wpc->nr_folios = 0;
>  	return ioend;
>  }
>  
> -/*
> - * Allocate a new bio, and chain the old bio to the new one.
> - *
> - * Note that we have to perform the chaining in this unintuitive order
> - * so that the bi_private linkage is set up in the right direction for the
> - * traversal in iomap_finish_ioend().
> - */
> -static struct bio *
> -iomap_chain_bio(struct bio *prev)
> -{
> -	struct bio *new;
> -
> -	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
> -	bio_clone_blkg_association(new, prev);
> -	new->bi_iter.bi_sector = bio_end_sector(prev);
> -
> -	bio_chain(prev, new);
> -	bio_get(prev);		/* for iomap_finish_ioend */
> -	submit_bio(prev);
> -	return new;
> -}
> -
>  static bool
>  iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
>  {
> @@ -1725,7 +1682,7 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
>  	if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
>  		return false;
>  	if (iomap_sector(&wpc->iomap, offset) !=
> -	    bio_end_sector(wpc->ioend->io_bio))
> +	    bio_end_sector(&wpc->ioend->io_bio))
>  		return false;
>  	/*
>  	 * Limit ioend bio chain lengths to minimise IO completion latency. This
> @@ -1750,15 +1707,14 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  	size_t poff = offset_in_folio(folio, pos);
>  
>  	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
> +new_ioend:
>  		if (wpc->ioend)
>  			list_add(&wpc->ioend->io_list, iolist);
>  		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
>  	}
>  
> -	if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
> -		wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
> -		bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff);
> -	}
> +	if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
> +		goto new_ioend;
>  
>  	if (ifs)
>  		atomic_add(len, &ifs->write_bytes_pending);
> @@ -1979,7 +1935,7 @@ EXPORT_SYMBOL_GPL(iomap_writepages);
>  static int __init iomap_init(void)
>  {
>  	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
> -			   offsetof(struct iomap_ioend, io_inline_bio),
> +			   offsetof(struct iomap_ioend, io_bio),
>  			   BIOSET_NEED_BVECS);
>  }
>  fs_initcall(iomap_init);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 465d7630bb2185..b45ee6cbbdaab2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -112,7 +112,7 @@ xfs_end_ioend(
>  	 * longer dirty. If we don't remove delalloc blocks here, they become
>  	 * stale and can corrupt free space accounting on unmount.
>  	 */
> -	error = blk_status_to_errno(ioend->io_bio->bi_status);
> +	error = blk_status_to_errno(ioend->io_bio.bi_status);
>  	if (unlikely(error)) {
>  		if (ioend->io_flags & IOMAP_F_SHARED) {
>  			xfs_reflink_cancel_cow_range(ip, offset, size, true);
> @@ -179,7 +179,7 @@ STATIC void
>  xfs_end_bio(
>  	struct bio		*bio)
>  {
> -	struct iomap_ioend	*ioend = bio->bi_private;
> +	struct iomap_ioend	*ioend = iomap_ioend_from_bio(bio);
>  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
>  	unsigned long		flags;
>  
> @@ -444,7 +444,7 @@ xfs_prepare_ioend(
>  	/* send ioends that might require a transaction to the completion wq */
>  	if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
>  	    (ioend->io_flags & IOMAP_F_SHARED))
> -		ioend->io_bio->bi_end_io = xfs_end_bio;
> +		ioend->io_bio.bi_end_io = xfs_end_bio;
>  	return status;
>  }
>  
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index b2a05dff914d0c..b8d3b658ad2b03 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -297,10 +297,14 @@ struct iomap_ioend {
>  	size_t			io_size;	/* size of the extent */
>  	loff_t			io_offset;	/* offset in the file */
>  	sector_t		io_sector;	/* start sector of ioend */
> -	struct bio		*io_bio;	/* bio being built */
> -	struct bio		io_inline_bio;	/* MUST BE LAST! */
> +	struct bio		io_bio;		/* MUST BE LAST! */
>  };
>  
> +static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
> +{
> +	return container_of(bio, struct iomap_ioend, io_bio);
> +}
> +
>  struct iomap_writeback_ops {
>  	/*
>  	 * Required, maps the blocks so that writeback can be performed on
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 10/13] iomap: only call mapping_set_error once for each failed bio
  2023-11-26 12:47 ` [PATCH 10/13] iomap: only call mapping_set_error once for each failed bio Christoph Hellwig
@ 2023-11-29  5:03   ` Darrick J. Wong
  2023-11-29  5:41     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:17PM +0100, Christoph Hellwig wrote:
> Instead of calling mapping_set_error once per folio, only do that once
> per bio, and consolidate all the writeback error handling code in
> iomap_finish_ioend.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 17760f3e67c61e..e1d5076251702d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1464,15 +1464,10 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
>  EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
>  
>  static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
> -		size_t len, int error)
> +		size_t len)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  
> -	if (error) {
> -		folio_set_error(folio);
> -		mapping_set_error(inode->i_mapping, error);
> -	}
> -
>  	WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
>  	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
>  
> @@ -1493,18 +1488,24 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
>  	struct folio_iter fi;
>  	u32 folio_count = 0;
>  
> +	if (error) {
> +		mapping_set_error(inode->i_mapping, error);
> +		if (!bio_flagged(bio, BIO_QUIET)) {
> +			pr_err_ratelimited(
> +"%s: writeback error on inode %lu, offset %lld, sector %llu",
> +				inode->i_sb->s_id, inode->i_ino,
> +				ioend->io_offset, ioend->io_sector);

Something that's always bothered me: Why don't we log the *amount* of
data that just failed writeback?

"XFS: writeback error on inode 63, offset 4096, length 8192, sector 27"

Now we'd actually have some way to work out that we've lost 8k worth of
data.  OFC maybe the better solution is to wire up that inotify error
reporting interface that ext4 added a while back...?
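
Sketch of what I mean, reusing the length the ioend already tracks in
io_size:

	pr_err_ratelimited(
"%s: writeback error on inode %lu, offset %lld, length %zu, sector %llu",
		inode->i_sb->s_id, inode->i_ino, ioend->io_offset,
		ioend->io_size, ioend->io_sector);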

This patch is pretty straightforward tho
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +		}
> +	}
> +
>  	/* walk all folios in bio, ending page IO on them */
>  	bio_for_each_folio_all(fi, bio) {
> -		iomap_finish_folio_write(inode, fi.folio, fi.length, error);
> +		if (error)
> +			folio_set_error(fi.folio);
> +		iomap_finish_folio_write(inode, fi.folio, fi.length);
>  		folio_count++;
>  	}
>  
> -	if (unlikely(error && !bio_flagged(bio, BIO_QUIET))) {
> -		printk_ratelimited(KERN_ERR
> -"%s: writeback error on inode %lu, offset %lld, sector %llu",
> -			inode->i_sb->s_id, inode->i_ino,
> -			ioend->io_offset, ioend->io_sector);
> -	}
>  	bio_put(bio);	/* frees the ioend */
>  	return folio_count;
>  }
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 11/13] iomap: factor out a iomap_writepage_map_block helper
  2023-11-26 12:47 ` [PATCH 11/13] iomap: factor out a iomap_writepage_map_block helper Christoph Hellwig
@ 2023-11-29  5:06   ` Darrick J. Wong
  0 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  5:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:18PM +0100, Christoph Hellwig wrote:
> Split the loop body that calls into the file system to map a block and
> add it to the ioend into a separate helper to prepare for refactoring of
> the surrounding code.
> 
> Note that this was the only place in iomap_writepage_map that could
> return an error, so include the call to ->discard_folio into the new
> helper as that will help to avoid code duplication in the future.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Simple enough hoist,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 72 +++++++++++++++++++++++++-----------------
>  1 file changed, 43 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e1d5076251702d..9f223820f60d22 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1723,6 +1723,45 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
>  }
>  
> +static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
> +		struct writeback_control *wbc, struct folio *folio,
> +		struct inode *inode, u64 pos, unsigned *count,
> +		struct list_head *submit_list)
> +{
> +	int error;
> +
> +	error = wpc->ops->map_blocks(wpc, inode, pos);
> +	if (error)
> +		goto fail;
> +	trace_iomap_writepage_map(inode, &wpc->iomap);
> +
> +	switch (wpc->iomap.type) {
> +	case IOMAP_INLINE:
> +		WARN_ON_ONCE(1);
> +		error = -EIO;
> +		break;
> +	case IOMAP_HOLE:
> +		break;
> +	default:
> +		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list);
> +		(*count)++;
> +	}
> +
> +fail:
> +	/*
> +	 * We cannot cancel the ioend directly here on error.  We may have
> +	 * already set other pages under writeback and hence we have to run I/O
> +	 * completion to mark the error state of the pages under writeback
> +	 * appropriately.
> +	 *
> +	 * Just let the file system know what portion of the folio failed to
> +	 * map.
> +	 */
> +	if (error && wpc->ops->discard_folio)
> +		wpc->ops->discard_folio(folio, pos);
> +	return error;
> +}
> +
>  /*
>   * Check interaction of the folio with the file end.
>   *
> @@ -1807,7 +1846,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	unsigned nblocks = i_blocks_per_folio(inode, folio);
>  	u64 pos = folio_pos(folio);
>  	u64 end_pos = pos + folio_size(folio);
> -	int error = 0, count = 0, i;
> +	unsigned count = 0;
> +	int error = 0, i;
>  	LIST_HEAD(submit_list);
>  
>  	trace_iomap_writepage(inode, pos, folio_size(folio));
> @@ -1833,19 +1873,10 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
>  		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
>  			continue;
> -
> -		error = wpc->ops->map_blocks(wpc, inode, pos);
> +		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
> +				&count, &submit_list);
>  		if (error)
>  			break;
> -		trace_iomap_writepage_map(inode, &wpc->iomap);
> -		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE)) {
> -			error = -EIO;
> -			break;
> -		}
> -		if (wpc->iomap.type == IOMAP_HOLE)
> -			continue;
> -		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, &submit_list);
> -		count++;
>  	}
>  	if (count)
>  		wpc->nr_folios++;
> @@ -1855,23 +1886,6 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	WARN_ON_ONCE(folio_test_writeback(folio));
>  	WARN_ON_ONCE(folio_test_dirty(folio));
>  
> -	/*
> -	 * We cannot cancel the ioend directly here on error.  We may have
> -	 * already set other pages under writeback and hence we have to run I/O
> -	 * completion to mark the error state of the pages under writeback
> -	 * appropriately.
> -	 */
> -	if (unlikely(error)) {
> -		/*
> -		 * Let the filesystem know what portion of the current page
> -		 * failed to map. If the page hasn't been added to ioend, it
> -		 * won't be affected by I/O completion and we must unlock it
> -		 * now.
> -		 */
> -		if (wpc->ops->discard_folio)
> -			wpc->ops->discard_folio(folio, pos);
> -	}
> -
>  	/*
>  	 * We can have dirty bits set past end of file in page_mkwrite path
>  	 * while mapping the last partial folio. Hence it's better to clear
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 12/13] iomap: submit ioends immediately
  2023-11-26 12:47 ` [PATCH 12/13] iomap: submit ioends immediately Christoph Hellwig
@ 2023-11-29  5:14   ` Darrick J. Wong
  0 siblings, 0 replies; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  5:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:19PM +0100, Christoph Hellwig wrote:
> Currently the writeback code delays submitting filled ioends until we
> reach the end of the folio.  The reason for that is that otherwise
> the end I/O handler could clear the writeback bit before we've even
> finished submitting all I/O for the folio.
> 
> Add a bias to ifs->write_bytes_pending while we are submitting I/O
> for a folio so that it never reaches zero until all I/O is completed
> to prevent the early writeback bit clearing, and remove the now
> superfluous submit_list.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok, we'll see what happens in the last patch...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
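
(The bias pattern, distilled from the diff below for anyone skimming:

	/* keep write_bytes_pending above zero while submitting */
	atomic_inc(&ifs->write_bytes_pending);

	/* ... map and add each dirty block; each add bumps the count ... */

	/* drop the bias; end writeback only if all I/O already completed */
	if (atomic_dec_and_test(&ifs->write_bytes_pending))
		folio_end_writeback(folio);

so the completion handler can never see the count hit zero early.)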

> ---
>  fs/iomap/buffered-io.c | 157 ++++++++++++++++++++---------------------
>  1 file changed, 75 insertions(+), 82 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9f223820f60d22..a01b0686e7c8a0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1620,30 +1620,34 @@ static void iomap_writepage_end_bio(struct bio *bio)
>   * Submit the final bio for an ioend.
>   *
>   * If @error is non-zero, it means that we have a situation where some part of
> - * the submission process has failed after we've marked pages for writeback
> - * and unlocked them.  In this situation, we need to fail the bio instead of
> - * submitting it.  This typically only happens on a filesystem shutdown.
> + * the submission process has failed after we've marked pages for writeback.
> + * We cannot cancel the ioend directly in that case, so call the bio end I/O handler
> + * with the error status here to run the normal I/O completion handler to clear
> + * the writeback bit and let the file system process the errors.
>   */
> -static int
> -iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
> -		int error)
> +static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
>  {
> +	if (!wpc->ioend)
> +		return error;
> +
> +	/*
> +	 * Let the file systems prepare the I/O submission and hook in an I/O
> +	 * completion handler.  This also needs to happen in case a
> +	 * failure happened so that the file system end I/O handler gets called
> +	 * to clean up.
> +	 */
>  	if (wpc->ops->prepare_ioend)
> -		error = wpc->ops->prepare_ioend(ioend, error);
> +		error = wpc->ops->prepare_ioend(wpc->ioend, error);
> +
>  	if (error) {
> -		/*
> -		 * If we're failing the IO now, just mark the ioend with an
> -		 * error and finish it.  This will run IO completion immediately
> -		 * as there is only one reference to the ioend at this point in
> -		 * time.
> -		 */
> -		ioend->io_bio.bi_status = errno_to_blk_status(error);
> -		bio_endio(&ioend->io_bio);
> -		return error;
> +		wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
> +		bio_endio(&wpc->ioend->io_bio);
> +	} else {
> +		submit_bio(&wpc->ioend->io_bio);
>  	}
>  
> -	submit_bio(&ioend->io_bio);
> -	return 0;
> +	wpc->ioend = NULL;
> +	return error;
>  }
>  
>  static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
> @@ -1698,19 +1702,28 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
>  /*
>   * Test to see if we have an existing ioend structure that we could append to
>   * first; otherwise finish off the current ioend and start another.
> + *
> + * If a new ioend is created and cached, the old ioend is submitted to the block
> + * layer instantly.  Batching optimisations are provided by higher level block
> + * plugging.
> + *
> + * At the end of a writeback pass, there will be a cached ioend remaining on the
> + * writepage context that the caller will need to submit.
>   */
> -static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
> +static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct folio *folio,
> -		struct inode *inode, loff_t pos, struct list_head *iolist)
> +		struct inode *inode, loff_t pos)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  	unsigned len = i_blocksize(inode);
>  	size_t poff = offset_in_folio(folio, pos);
> +	int error;
>  
>  	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
>  new_ioend:
> -		if (wpc->ioend)
> -			list_add(&wpc->ioend->io_list, iolist);
> +		error = iomap_submit_ioend(wpc, 0);
> +		if (error)
> +			return error;
>  		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
>  	}
>  
> @@ -1721,12 +1734,12 @@ static void iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  		atomic_add(len, &ifs->write_bytes_pending);
>  	wpc->ioend->io_size += len;
>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
> +	return 0;
>  }
>  
>  static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct folio *folio,
> -		struct inode *inode, u64 pos, unsigned *count,
> -		struct list_head *submit_list)
> +		struct inode *inode, u64 pos, unsigned *count)
>  {
>  	int error;
>  
> @@ -1743,7 +1756,7 @@ static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
>  	case IOMAP_HOLE:
>  		break;
>  	default:
> -		iomap_add_to_ioend(wpc, wbc, folio, inode, pos, submit_list);
> +		iomap_add_to_ioend(wpc, wbc, folio, inode, pos);
>  		(*count)++;
>  	}
>  
> @@ -1820,35 +1833,21 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
>  	return true;
>  }
>  
> -/*
> - * We implement an immediate ioend submission policy here to avoid needing to
> - * chain multiple ioends and hence nest mempool allocations which can violate
> - * the forward progress guarantees we need to provide. The current ioend we're
> - * adding blocks to is cached in the writepage context, and if the new block
> - * doesn't append to the cached ioend, it will create a new ioend and cache that
> - * instead.
> - *
> - * If a new ioend is created and cached, the old ioend is returned and queued
> - * locally for submission once the entire page is processed or an error has been
> - * detected.  While ioends are submitted immediately after they are completed,
> - * batching optimisations are provided by higher level block plugging.
> - *
> - * At the end of a writeback pass, there will be a cached ioend remaining on the
> - * writepage context that the caller will need to submit.
> - */
>  static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct folio *folio)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  	struct inode *inode = folio->mapping->host;
> -	struct iomap_ioend *ioend, *next;
>  	unsigned len = i_blocksize(inode);
>  	unsigned nblocks = i_blocks_per_folio(inode, folio);
>  	u64 pos = folio_pos(folio);
>  	u64 end_pos = pos + folio_size(folio);
>  	unsigned count = 0;
>  	int error = 0, i;
> -	LIST_HEAD(submit_list);
> +
> +	WARN_ON_ONCE(!folio_test_locked(folio));
> +	WARN_ON_ONCE(folio_test_dirty(folio));
> +	WARN_ON_ONCE(folio_test_writeback(folio));
>  
>  	trace_iomap_writepage(inode, pos, folio_size(folio));
>  
> @@ -1858,12 +1857,27 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	}
>  	WARN_ON_ONCE(end_pos <= pos);
>  
> -	if (!ifs && nblocks > 1) {
> -		ifs = ifs_alloc(inode, folio, 0);
> -		iomap_set_range_dirty(folio, 0, end_pos - pos);
> +	if (nblocks > 1) {
> +		if (!ifs) {
> +			ifs = ifs_alloc(inode, folio, 0);
> +			iomap_set_range_dirty(folio, 0, end_pos - pos);
> +		}
> +
> +		/*
> +		 * Keep the I/O completion handler from clearing the writeback
> +		 * bit until we have submitted all blocks by adding a bias to
> +		 * ifs->write_bytes_pending, which is dropped after submitting
> +		 * all blocks.
> +		 */
> +		WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
> +		atomic_inc(&ifs->write_bytes_pending);
>  	}
>  
> -	WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
> +	/*
> +	 * Set the writeback bit ASAP, as the I/O completion for the single
> +	 * block per folio case can hit as soon as we submit the bio.
> +	 */
> +	folio_start_writeback(folio);
>  
>  	/*
>  	 * Walk through the folio to find areas to write back. If we
> @@ -1874,18 +1888,13 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
>  			continue;
>  		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
> -				&count, &submit_list);
> +				&count);
>  		if (error)
>  			break;
>  	}
>  	if (count)
>  		wpc->nr_folios++;
>  
> -	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
> -	WARN_ON_ONCE(!folio_test_locked(folio));
> -	WARN_ON_ONCE(folio_test_writeback(folio));
> -	WARN_ON_ONCE(folio_test_dirty(folio));
> -
>  	/*
>  	 * We can have dirty bits set past end of file in page_mkwrite path
>  	 * while mapping the last partial folio. Hence it's better to clear
> @@ -1893,35 +1902,21 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	 */
>  	iomap_clear_range_dirty(folio, 0, folio_size(folio));
>  
> -	if (error && !count) {
> -		folio_unlock(folio);
> -		goto done;
> -	}
> -
> -	folio_start_writeback(folio);
> -	folio_unlock(folio);
> -
>  	/*
> -	 * Preserve the original error if there was one; catch
> -	 * submission errors here and propagate into subsequent ioend
> -	 * submissions.
> +	 * Usually the writeback bit is cleared by the I/O completion handler.
> +	 * But we may end up either not actually writing any blocks, or (when
> +	 * there are multiple blocks in a folio) all I/O might have finished
> +	 * already at this point.  In that case we need to clear the writeback
> +	 * bit ourselves right after unlocking the page.
>  	 */
> -	list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
> -		int error2;
> -
> -		list_del_init(&ioend->io_list);
> -		error2 = iomap_submit_ioend(wpc, ioend, error);
> -		if (error2 && !error)
> -			error = error2;
> +	folio_unlock(folio);
> +	if (ifs) {
> +		if (atomic_dec_and_test(&ifs->write_bytes_pending))
> +			folio_end_writeback(folio);
> +	} else {
> +		if (!count)
> +			folio_end_writeback(folio);
>  	}
> -
> -	/*
> -	 * We can end up here with no error and nothing to write only if we race
> -	 * with a partial page truncate on a sub-page block sized filesystem.
> -	 */
> -	if (!count)
> -		folio_end_writeback(folio);
> -done:
>  	mapping_set_error(inode->i_mapping, error);
>  	return error;
>  }
> @@ -1941,9 +1936,7 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
>  
>  	wpc->ops = ops;
>  	ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
> -	if (!wpc->ioend)
> -		return ret;
> -	return iomap_submit_ioend(wpc, wpc->ioend, ret);
> +	return iomap_submit_ioend(wpc, ret);
>  }
>  EXPORT_SYMBOL_GPL(iomap_writepages);
>  
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 13/13] iomap: map multiple blocks at a time
  2023-11-26 12:47 ` [PATCH 13/13] iomap: map multiple blocks at a time Christoph Hellwig
@ 2023-11-29  5:25   ` Darrick J. Wong
  2023-11-29  5:44     ` Christoph Hellwig
  0 siblings, 1 reply; 49+ messages in thread
From: Darrick J. Wong @ 2023-11-29  5:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Chandan Babu R, Zhang Yi, Ritesh Harjani,
	linux-xfs, linux-fsdevel

On Sun, Nov 26, 2023 at 01:47:20PM +0100, Christoph Hellwig wrote:
> The ->map_blocks interface returns a valid range for writeback, but we
> still call back into it for every block, which is a bit inefficient.
> 
> Change xfs_writepage_map to use the valid range in the map until the end

iomap_writepage_map?

> of the folio or the dirty range inside the folio instead of calling back
> into every block.
> 
> Note that the range is not used over folio boundaries as we need to be
> able to check the mapping sequence count under the folio lock.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 113 ++++++++++++++++++++++++++++-------------
>  include/linux/iomap.h  |   7 +++
>  2 files changed, 86 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a01b0686e7c8a0..a98cb38a71ebc8 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (C) 2016-2019 Christoph Hellwig.
> + * Copyright (C) 2016-2023 Christoph Hellwig.
>   */
>  #include <linux/module.h>
>  #include <linux/compiler.h>
> @@ -95,6 +95,42 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
>  	return test_bit(block + blks_per_folio, ifs->state);
>  }
>  
> +static unsigned ifs_find_dirty_range(struct folio *folio,
> +		struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
> +{
> +	struct inode *inode = folio->mapping->host;
> +	unsigned start_blk =
> +		offset_in_folio(folio, *range_start) >> inode->i_blkbits;
> +	unsigned end_blk = min_not_zero(
> +		offset_in_folio(folio, range_end) >> inode->i_blkbits,
> +		i_blocks_per_folio(inode, folio));
> +	unsigned nblks = 1;
> +
> +	while (!ifs_block_is_dirty(folio, ifs, start_blk))
> +		if (++start_blk == end_blk)
> +			return 0;
> +
> +	while (start_blk + nblks < end_blk &&
> +	       ifs_block_is_dirty(folio, ifs, start_blk + nblks))
> +		nblks++;

I don't like ^^^ the single-space indentation of the loop body over the
loop test expression because I almost missed that.  Extra braces for
clarity, please?
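
i.e.:

	while (start_blk + nblks < end_blk &&
	       ifs_block_is_dirty(folio, ifs, start_blk + nblks)) {
		nblks++;
	}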

> +
> +	*range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
> +	return nblks << inode->i_blkbits;
> +}
> +
> +static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
> +		u64 range_end)
> +{
> +	struct iomap_folio_state *ifs = folio->private;
> +	

   ^^^^^ extra whitespace?

> +	if (*range_start >= range_end)
> +		return 0;
> +
> +	if (ifs)
> +		return ifs_find_dirty_range(folio, ifs, range_start, range_end);
> +	return range_end - *range_start;
> +}
> +
>  static void ifs_clear_range_dirty(struct folio *folio,
>  		struct iomap_folio_state *ifs, size_t off, size_t len)
>  {
> @@ -1712,10 +1748,9 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset)
>   */
>  static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct folio *folio,
> -		struct inode *inode, loff_t pos)
> +		struct inode *inode, loff_t pos, unsigned len)
>  {
>  	struct iomap_folio_state *ifs = folio->private;
> -	unsigned len = i_blocksize(inode);
>  	size_t poff = offset_in_folio(folio, pos);
>  	int error;
>  
> @@ -1739,28 +1774,41 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  
>  static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
>  		struct writeback_control *wbc, struct folio *folio,
> -		struct inode *inode, u64 pos, unsigned *count)
> +		struct inode *inode, u64 pos, unsigned dirty_len,
> +		unsigned *count)
>  {
>  	int error;
>  
> -	error = wpc->ops->map_blocks(wpc, inode, pos);
> -	if (error)
> -		goto fail;
> -	trace_iomap_writepage_map(inode, &wpc->iomap);
> -
> -	switch (wpc->iomap.type) {
> -	case IOMAP_INLINE:
> -		WARN_ON_ONCE(1);
> -		error = -EIO;
> -		break;
> -	case IOMAP_HOLE:
> -		break;
> -	default:
> -		iomap_add_to_ioend(wpc, wbc, folio, inode, pos);

Hey wait, the previous patch missed the error return here!

> -		(*count)++;
> -	}
> +	do {
> +		unsigned map_len;
> +
> +		error = wpc->ops->map_blocks(wpc, inode, pos);
> +		if (error)
> +			break;
> +		trace_iomap_writepage_map(inode, &wpc->iomap);
> +
> +		map_len = min_t(u64, dirty_len,
> +			wpc->iomap.offset + wpc->iomap.length - pos);
> +		WARN_ON_ONCE(!folio->private && map_len < dirty_len);
> +
> +		switch (wpc->iomap.type) {
> +		case IOMAP_INLINE:
> +			WARN_ON_ONCE(1);
> +			error = -EIO;
> +			break;
> +		case IOMAP_HOLE:
> +			break;
> +		default:
> +			error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
> +					map_len);
> +			if (!error)
> +				(*count)++;
> +			break;
> +		}
> +		dirty_len -= map_len;
> +		pos += map_len;
> +	} while (dirty_len && !error);
>  
> -fail:
>  	/*
>  	 * We cannot cancel the ioend directly here on error.  We may have
>  	 * already set other pages under writeback and hence we have to run I/O
> @@ -1827,7 +1875,7 @@ static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
>  		 * beyond i_size.
>  		 */
>  		folio_zero_segment(folio, poff, folio_size(folio));
> -		*end_pos = isize;
> +		*end_pos = round_up(isize, i_blocksize(inode));
>  	}
>  
>  	return true;
> @@ -1838,12 +1886,11 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  {
>  	struct iomap_folio_state *ifs = folio->private;
>  	struct inode *inode = folio->mapping->host;
> -	unsigned len = i_blocksize(inode);
> -	unsigned nblocks = i_blocks_per_folio(inode, folio);
>  	u64 pos = folio_pos(folio);
>  	u64 end_pos = pos + folio_size(folio);
>  	unsigned count = 0;
> -	int error = 0, i;
> +	int error = 0;
> +	u32 rlen;
>  
>  	WARN_ON_ONCE(!folio_test_locked(folio));
>  	WARN_ON_ONCE(folio_test_dirty(folio));
> @@ -1857,7 +1904,7 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	}
>  	WARN_ON_ONCE(end_pos <= pos);
>  
> -	if (nblocks > 1) {
> +	if (i_blocks_per_folio(inode, folio) > 1) {
>  		if (!ifs) {
>  			ifs = ifs_alloc(inode, folio, 0);
>  			iomap_set_range_dirty(folio, 0, end_pos - pos);
> @@ -1880,18 +1927,16 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	folio_start_writeback(folio);
>  
>  	/*
> -	 * Walk through the folio to find areas to write back. If we
> -	 * run off the end of the current map or find the current map
> -	 * invalid, grab a new one.
> +	 * Walk through the folio to find dirty areas to write back.
>  	 */
> -	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
> -		if (ifs && !ifs_block_is_dirty(folio, ifs, i))
> -			continue;
> -		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode, pos,
> -				&count);
> +	while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
> +		error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
> +				pos, rlen, &count);
>  		if (error)
>  			break;
> +		pos += rlen;
>  	}
> +
>  	if (count)
>  		wpc->nr_folios++;
>  
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index b8d3b658ad2b03..49d93f53878565 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -309,6 +309,13 @@ struct iomap_writeback_ops {
>  	/*
>  	 * Required, maps the blocks so that writeback can be performed on
>  	 * the range starting at offset.
> +	 *
> +	 * Can return arbitrarily large regions, but we need to call into it at
> +	 * least once per folio to allow the file systems to synchronize with
> +	 * the write path that could be invalidating mappings.

Does xfs_map_blocks already return arbitrarily large regions?  I think
it already does since I see it setting imap.br_blockcount in places.

--D

> +	 *
> +	 * An existing mapping from a previous call to this method can be reused
> +	 * by the file system if it is still valid.
>  	 */
>  	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
>  				loff_t offset);
> -- 
> 2.39.2
> 
> 

* Re: [PATCH 02/13] iomap: treat inline data in iomap_writepage_map as an I/O error
  2023-11-29  4:40       ` Darrick J. Wong
@ 2023-11-29  5:39         ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-29  5:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Ritesh Harjani, Christian Brauner,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Tue, Nov 28, 2023 at 08:40:57PM -0800, Darrick J. Wong wrote:
> > I think the way it currently works for gfs2 is that writeback from the
> > page cache never goes back into the inline area.  
> > 
> > If we ever have a need to actually write back inline data we could
> > change this code to support it, but right now I just want to make the
> > assert more consistent.
> 
> Question: Do we even /want/ writeback to be initiating transactions
> to log the inline data?  I suppose for ext4/jbd2 that would be the least
> inefficient time to do that.

I think in general not, but it really depends on the file system
architecture.  In other words:  don't touch the current behavior unless
we have a good reason.  I just want to make the error check a little
saner..

* Re: [PATCH 09/13] iomap: don't chain bios
  2023-11-29  4:59   ` Darrick J. Wong
@ 2023-11-29  5:40     ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-29  5:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Christian Brauner, Chandan Babu R, Zhang Yi,
	Ritesh Harjani, linux-xfs, linux-fsdevel

On Tue, Nov 28, 2023 at 08:59:51PM -0800, Darrick J. Wong wrote:
> Well that's a nice code deflation!  And if I read this correctly,
> instead of chaining bios together, now we create a new ioend and add it
> to the ioend list, eventually submitting the entire list of them?

Yes. (although that changes a bit again later)
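
(For the curious: after patch 12 the list is gone and the cached ioend
is submitted as soon as the next range can't be appended, roughly:

	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
		error = iomap_submit_ioend(wpc, 0);	/* submit the old one */
		if (error)
			return error;
		wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
	}

with block layer plugging in the caller providing the batching.)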


* Re: [PATCH 10/13] iomap: only call mapping_set_error once for each failed bio
  2023-11-29  5:03   ` Darrick J. Wong
@ 2023-11-29  5:41     ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-29  5:41 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Christian Brauner, Chandan Babu R, Zhang Yi,
	Ritesh Harjani, linux-xfs, linux-fsdevel

On Tue, Nov 28, 2023 at 09:03:22PM -0800, Darrick J. Wong wrote:
> > +	if (error) {
> > +		mapping_set_error(inode->i_mapping, error);
> > +		if (!bio_flagged(bio, BIO_QUIET)) {
> > +			pr_err_ratelimited(
> > +"%s: writeback error on inode %lu, offset %lld, sector %llu",
> > +				inode->i_sb->s_id, inode->i_ino,
> > +				ioend->io_offset, ioend->io_sector);
> 
> Something that's always bothered me: Why don't we log the *amount* of
> data that just failed writeback?
> 
> "XFS: writeback error on inode 63, offset 4096, length 8192, sector 27"
> 
> Now we'd actually have some way to work out that we've lost 8k worth of
> data.  OFC maybe the better solution is to wire up that inotify error
> reporting interface that ext4 added a while back...?

Just adding the amount sounds sane to me and I can add a patch to do
that.  I think this message originated in buffer.c, where it was
printed once for each block before rate limiting got added..


* Re: [PATCH 08/13] iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend
  2023-11-29  4:53       ` Darrick J. Wong
@ 2023-11-29  5:42         ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-29  5:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Ritesh Harjani, Christian Brauner,
	Chandan Babu R, Zhang Yi, linux-xfs, linux-fsdevel

On Tue, Nov 28, 2023 at 08:53:57PM -0800, Darrick J. Wong wrote:
> Can you change @offset to @pos while you're changing the function
> signature?

Sure.

* Re: [PATCH 13/13] iomap: map multiple blocks at a time
  2023-11-29  5:25   ` Darrick J. Wong
@ 2023-11-29  5:44     ` Christoph Hellwig
  0 siblings, 0 replies; 49+ messages in thread
From: Christoph Hellwig @ 2023-11-29  5:44 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Christian Brauner, Chandan Babu R, Zhang Yi,
	Ritesh Harjani, linux-xfs, linux-fsdevel

On Tue, Nov 28, 2023 at 09:25:35PM -0800, Darrick J. Wong wrote:
> > -	case IOMAP_HOLE:
> > -		break;
> > -	default:
> > -		iomap_add_to_ioend(wpc, wbc, folio, inode, pos);
> 
> Hey wait, the previous patch missed the error return here!

Oh.  I guess that's what you get for rebasing and reordering a little
too much..

> > index b8d3b658ad2b03..49d93f53878565 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -309,6 +309,13 @@ struct iomap_writeback_ops {
> >  	/*
> >  	 * Required, maps the blocks so that writeback can be performed on
> >  	 * the range starting at offset.
> > +	 *
> > +	 * Can return arbitrarily large regions, but we need to call into it at
> > +	 * least once per folio to allow the file systems to synchronize with
> > +	 * the write path that could be invalidating mappings.
> 
> Does xfs_map_blocks already return arbitrarily large regions?  I think
> it already does since I see it setting imap.br_blockcount in places.

Yes, it does try to convert the entire underlying delalloc extent.
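
Which also means a ->map_blocks implementation can be a cheap
revalidation in the common case.  A minimal sketch, with hypothetical
helpers standing in for the real extent lookup and sequence-count
check:

	static int example_map_blocks(struct iomap_writepage_ctx *wpc,
			struct inode *inode, loff_t pos)
	{
		/* hypothetical: has anything invalidated the cached mapping? */
		if (example_mapping_still_valid(inode, wpc) &&
		    pos >= wpc->iomap.offset &&
		    pos < wpc->iomap.offset + wpc->iomap.length)
			return 0;	/* reuse the mapping from the last call */

		/* hypothetical: map as large an extent as the fs can */
		return example_lookup_extent(wpc, inode, pos);
	}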

