From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Thorsten Leemhuis <linux@leemhuis.info>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Matthew Wilcox <willy@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Gao Xiang <xiang@kernel.org>,
	Chao Yu <chao@kernel.org>,
	linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-mm@kvack.org,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads)
Date: Fri, 4 Nov 2022 08:36:22 -0400
Message-ID: <Y2UHRqthNUwuIQGS@cmpxchg.org>
In-Reply-To: <5f7bac77-c088-6fb7-ccb5-bef9267f7186@leemhuis.info>

On Fri, Nov 04, 2022 at 08:32:22AM +0100, Thorsten Leemhuis wrote:
> On 03.11.22 23:20, Johannes Weiner wrote:
> > Can you try this patch?
> 
> It apparently does the trick -- at least my test setup that usually
> triggers the bug within a minute or two survived for nearly an hour now, so:
> 
> Tested-by: Thorsten Leemhuis <linux@leemhuis.info>

Great, thanks Thorsten.

> Can you please also add this tag to help future archeologists, as
> explained by the kernel docs (for details see
> Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst):
> 
> Link:
> https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
> 
> It also will make my regression tracking bot see further postings of
> this patch and mark the issue as resolved once the patch lands in mainline.

Done.

Looks like erofs has the same issue, I included a fix for that.

Andrew, would you mind picking this up and sending it Linusward? Jens
routed the series originally, but I believe he is out today.

Thanks

From b668b261ed18105e91745f3d7676b6bca968476d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 3 Nov 2022 17:34:31 -0400
Subject: [PATCH] fs: fix leaked psi pressure state

When psi annotations were added to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read
was faulty. A pressure state, once entered, is never left. This
results in incorrectly elevated pressure, which triggers OOM kills.

pflags records the *previous* memstall state when we enter a new
one. The code tried to initialize pflags to 1, and then optimize the
leave call when we either didn't enter a memstall, or were already
inside a nested stall. However, there can be multiple PageWorkingset
pages in the bio, at which point it's that path itself that enters
repeatedly and overwrites pflags. This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

erofs has the same problem, fix that up too. And move the memstall
exit past submit_bio() to restore submit accounting originally added
by b8e24a9300b0 ("block: annotate refault stalls from IO submission").

Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
---
 fs/btrfs/compression.c | 14 ++++++++------
 fs/erofs/zdata.c       | 18 +++++++++++-------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f1f051ad3147..e6635fe70067 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -512,7 +512,7 @@ static u64 bio_end_offset(struct bio *bio)
 static noinline int add_ra_bio_pages(struct inode *inode,
 				     u64 compressed_end,
 				     struct compressed_bio *cb,
-				     unsigned long *pflags)
+				     int *memstall, unsigned long *pflags)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	unsigned long end_index;
@@ -581,8 +581,10 @@ static noinline int add_ra_bio_pages(struct inode *inode,
 			continue;
 		}
 
-		if (PageWorkingset(page))
+		if (!*memstall && PageWorkingset(page)) {
 			psi_memstall_enter(pflags);
+			*memstall = 1;
+		}
 
 		ret = set_page_extent_mapped(page);
 		if (ret < 0) {
@@ -670,8 +672,8 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
-	/* Initialize to 1 to make skip psi_memstall_leave unless needed */
-	unsigned long pflags = 1;
+	unsigned long pflags;
+	int memstall = 0;
 	blk_status_t ret;
 	int ret2;
 	int i;
@@ -727,7 +729,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		goto fail;
 	}
 
-	add_ra_bio_pages(inode, em_start + em_len, cb, &pflags);
+	add_ra_bio_pages(inode, em_start + em_len, cb, &memstall, &pflags);
 
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
@@ -807,7 +809,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		}
 	}
 
-	if (!pflags)
+	if (memstall)
 		psi_memstall_leave(&pflags);
 
 	if (refcount_dec_and_test(&cb->pending_ios))
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index c7f24fc7efd5..064a166324a7 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1412,8 +1412,8 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 	struct block_device *last_bdev;
 	unsigned int nr_bios = 0;
 	struct bio *bio = NULL;
-	/* initialize to 1 to make skip psi_memstall_leave unless needed */
-	unsigned long pflags = 1;
+	unsigned long pflags;
+	int memstall = 0;
 
 	bi_private = jobqueueset_init(sb, q, fgq, force_fg);
 	qtail[JQ_BYPASS] = &q[JQ_BYPASS]->head;
@@ -1463,14 +1463,18 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 			if (bio && (cur != last_index + 1 ||
 				    last_bdev != mdev.m_bdev)) {
 submit_bio_retry:
-				if (!pflags)
-					psi_memstall_leave(&pflags);
 				submit_bio(bio);
+				if (memstall) {
+					psi_memstall_leave(&pflags);
+					memstall = 0;
+				}
 				bio = NULL;
 			}
 
-			if (unlikely(PageWorkingset(page)))
+			if (unlikely(PageWorkingset(page)) && !memstall) {
 				psi_memstall_enter(&pflags);
+				memstall = 1;
+			}
 
 			if (!bio) {
 				bio = bio_alloc(mdev.m_bdev, BIO_MAX_VECS,
@@ -1500,9 +1504,9 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 	} while (owned_head != Z_EROFS_PCLUSTER_TAIL);
 
 	if (bio) {
-		if (!pflags)
-			psi_memstall_leave(&pflags);
 		submit_bio(bio);
+		if (memstall)
+			psi_memstall_leave(&pflags);
 	}
 
 	/*
-- 
2.38.1



Thread overview: 32+ messages
2022-09-15  9:41 improve pagecache PSI annotations v2 Christoph Hellwig
2022-09-15  9:41 ` Christoph Hellwig
2022-09-15  9:41 ` [PATCH 1/5] mm: add PSI accounting around ->read_folio and ->readahead calls Christoph Hellwig
2022-09-15  9:41   ` Christoph Hellwig
2022-09-15  9:41 ` [PATCH 2/5] sched/psi: export psi_memstall_{enter,leave} Christoph Hellwig
2022-09-15  9:41   ` Christoph Hellwig
2022-09-15  9:41 ` [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads Christoph Hellwig
2022-09-15  9:41   ` Christoph Hellwig
2022-11-03 10:46   ` [REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads) Thorsten Leemhuis
2022-11-03 10:46     ` Thorsten Leemhuis
2022-11-03 11:08     ` [REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs #forregzbot Thorsten Leemhuis
2022-11-03 11:08       ` Thorsten Leemhuis
2022-11-03 12:40     ` [REGESSION] systemd-oomd overreacting due to PSI changes for Btrfs (was: Re: [PATCH 3/5] btrfs: add manual PSI accounting for compressed reads) Christoph Hellwig
2022-11-03 12:40       ` Christoph Hellwig
2022-11-03 22:20     ` Johannes Weiner
2022-11-03 22:20       ` Johannes Weiner
2022-11-04  7:32       ` Thorsten Leemhuis
2022-11-04  7:32         ` Thorsten Leemhuis
2022-11-04 12:36         ` Johannes Weiner [this message]
2022-11-04 12:36           ` Johannes Weiner
2022-09-15  9:41 ` [PATCH 4/5] erofs: add manual PSI accounting for the compressed address space Christoph Hellwig
2022-09-15  9:41   ` Christoph Hellwig
2022-09-15  9:42 ` [PATCH 5/5] block: remove PSI accounting from the bio layer Christoph Hellwig
2022-09-15  9:42   ` Christoph Hellwig
2022-09-15 13:01 ` improve pagecache PSI annotations v2 David Sterba
2022-09-15 13:01   ` David Sterba
2022-09-19 15:45   ` Christoph Hellwig
2022-09-19 15:45     ` Christoph Hellwig
2022-09-20 14:24 ` Jens Axboe
2022-09-20 14:24   ` Jens Axboe
2022-09-20 17:21   ` Christoph Hellwig
2022-09-20 17:21     ` Christoph Hellwig
