All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Mason <clm@fb.com>
To: <djwong@kernel.org>, <hch@infradead.org>,
	<linux-xfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<hannes@cmpxchg.org>
Subject: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()
Date: Tue, 31 May 2022 18:11:17 -0700	[thread overview]
Message-ID: <20220601011116.495988-1-clm@fb.com> (raw)

iomap_do_writepage() sends pages past i_size through
folio_redirty_for_writepage(), which normally isn't a problem because
truncate and friends clean them very quickly.

When the system a variety of cgroups, we can end up in situations where
one cgroup has almost no dirty pages at all.  This is especially common
in our XFS workloads in production because they tend to use O_DIRECT for
everything.

We've hit storms where the redirty path hits millions of times in a few
seconds, on all a single file that's only ~40 pages long.  This ends up
leading to long tail latencies for file writes because the page reclaim
workers are hogging the CPU from some kworkers bound to the same CPU.

That's the theory anyway.  We know the storms exist, but the tail
latencies are so sporadic that it's hard to have any certainty about the
cause without patching a large number of boxes.

There are a few different problems here.  First is just that I don't
understand how invalidating the page instead of redirtying might upset
the rest of the iomap/xfs world.  Btrfs invalidates in this case, which
seems like the right thing to me, but we all have slightly different
sharp corners in the truncate path so I thought I'd ask for comments.

Second is the VM should take wbc->pages_skipped into account, or use
some other way to avoid looping over and over.  I think we actually want
both but I wanted to understand the page invalidation path first.

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Domas Mituzas <domas@fb.com>
---
 fs/iomap/buffered-io.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8ce8720093b9..4a687a2a9ed9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1482,10 +1482,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		pgoff_t end_index = isize >> PAGE_SHIFT;
 
 		/*
-		 * Skip the page if it's fully outside i_size, e.g. due to a
-		 * truncate operation that's in progress. We must redirty the
-		 * page so that reclaim stops reclaiming it. Otherwise
-		 * iomap_vm_releasepage() is called on it and gets confused.
+		 * invalidate the page if it's fully outside i_size, e.g.
+		 * due to a truncate operation that's in progress.
 		 *
 		 * Note that the end_index is unsigned long.  If the given
 		 * offset is greater than 16TB on a 32-bit system then if we
@@ -1499,8 +1497,10 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * offset is just equal to the EOF.
 		 */
 		if (folio->index > end_index ||
-		    (folio->index == end_index && poff == 0))
-			goto redirty;
+		    (folio->index == end_index && poff == 0)) {
+			folio_invalidate(folio, 0, folio_size(folio));
+			goto unlock;
+		}
 
 		/*
 		 * The page straddles i_size.  It must be zeroed out on each
@@ -1518,6 +1518,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 
 redirty:
 	folio_redirty_for_writepage(wbc, folio);
+unlock:
 	folio_unlock(folio);
 	return 0;
 }
-- 
2.30.2


             reply	other threads:[~2022-06-01  1:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-01  1:11 Chris Mason [this message]
2022-06-01 12:18 ` [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage() Christoph Hellwig
2022-06-01 14:13   ` Chris Mason
2022-06-02  6:52     ` Dave Chinner
2022-06-02 15:32       ` Johannes Weiner
2022-06-02 19:41         ` Chris Mason
2022-06-02 19:59           ` Matthew Wilcox
2022-06-02 22:07             ` Dave Chinner
2022-06-02 22:06         ` Dave Chinner
2022-06-03  1:29           ` Chris Mason
2022-06-03  5:20             ` Dave Chinner
2022-06-03 15:06               ` Johannes Weiner
2022-06-03 16:09                 ` Chris Mason
2022-06-05 23:32                   ` Dave Chinner
2022-06-06 14:46                     ` Johannes Weiner
2022-06-06 15:13                       ` Chris Mason
2022-06-07 22:52                       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220601011116.495988-1-clm@fb.com \
    --to=clm@fb.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.