From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Subject: [PATCH RFC v3 3/3] iomap: bound ioend size to 4096 pages
Date: Mon, 17 May 2021 13:17:22 -0400
Message-ID: <20210517171722.1266878-4-bfoster@redhat.com>
In-Reply-To: <20210517171722.1266878-1-bfoster@redhat.com>

The iomap writeback infrastructure is currently able to construct
extremely large bio chains (tens of GBs) associated with a single
ioend. Once a chain grows beyond a modest size, further consolidation
provides no significant benefit. On the other hand, it holds a
significant number of pages in writeback state across an unnecessarily
large number of bios, because the ioend is not processed for
completion until the final bio in the chain completes. Cap an
individual ioend at a reasonable size of 4096 pages (16MB with 4k
pages) to avoid this condition.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c |  6 ++++--
 include/linux/iomap.h  | 26 ++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 642422775e4e..f2890ee434d0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1269,7 +1269,7 @@ iomap_chain_bio(struct bio *prev)
 
 static bool
 iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
-		sector_t sector)
+		unsigned len, sector_t sector)
 {
 	if ((wpc->iomap.flags & IOMAP_F_SHARED) !=
 	    (wpc->ioend->io_flags & IOMAP_F_SHARED))
@@ -1280,6 +1280,8 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
 		return false;
 	if (sector != bio_end_sector(wpc->ioend->io_bio))
 		return false;
+	if (wpc->ioend->io_size + len > IOEND_MAX_IOSIZE)
+		return false;
 	return true;
 }
 
@@ -1297,7 +1299,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 	unsigned poff = offset & (PAGE_SIZE - 1);
 	bool merged, same_page = false;
 
-	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, sector)) {
+	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, len, sector)) {
 		if (wpc->ioend)
 			list_add(&wpc->ioend->io_list, iolist);
 		wpc->ioend = iomap_alloc_ioend(inode, wpc, offset, sector, wbc);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 07f3f4e69084..89b15cc236d5 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -203,6 +203,32 @@ struct iomap_ioend {
 	struct bio		io_inline_bio;	/* MUST BE LAST! */
 };
 
+/*
+ * Maximum ioend IO size is used to prevent ioends from becoming unbound in
+ * size. bios can reach 4GB in size if pages are contiguous, and bio chains are
+ * effectively unbound in length. Hence the only limits on the size of the bio
+ * chain are the contiguity of the extent on disk and the length of the run of
+ * sequential dirty pages in the page cache. This can be tens of GBs of physical
+ * extents and, if memory is large enough, tens of millions of dirty pages.
+ * Holding them all under writeback until the final bio in the chain is
+ * submitted and completed pins all those pages for the length of time it takes
+ * to write those many, many GBs of data to storage.
+ *
+ * Background writeback caps any single writepages call to half the device
+ * bandwidth to ensure fairness and prevent any one dirty inode from causing
+ * writeback starvation. fsync() and other WB_SYNC_ALL writebacks have no such
+ * cap on wbc->nr_pages, and that's where the above massive bio chain lengths
+ * come from. We want large IOs to reach the storage, but we need to limit
+ * completion latencies, hence we need to control the maximum IO size we
+ * dispatch to the storage stack.
+ *
+ * We don't really have to care about the extra IO completion overhead here
+ * because iomap has contiguous IO completion merging. If the device can sustain
+ * high throughput and large bios, the ioends are merged on completion and
+ * processed in large, efficient chunks with no additional IO latency.
+ */
+#define IOEND_MAX_IOSIZE	(4096ULL << PAGE_SHIFT)
+
 struct iomap_writeback_ops {
 	/*
 	 * Required, maps the blocks so that writeback can be performed on
-- 
2.26.3

