* [PATCH v2 0/3] Metadata IO error fixes
@ 2021-11-24 19:14 Josef Bacik
  2021-11-24 19:14 ` [PATCH v2 1/3] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Josef Bacik @ 2021-11-24 19:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- I was debugging generic/484 separately because I thought it was data related,
  but it turned out to be metadata related as well, so I've added the patch
  "btrfs: call mapping_set_error() on btree inode with a write error" to the
  series.

--- Original email ---

Hello,

I saw a dmesg failure with generic/281 on our overnight runs.  This turned out
to be because we weren't getting an error back from btrfs_search_slot() even
though we found a metadata block that shouldn't have been uptodate.

The root cause is that write errors on the page clear uptodate on the page, but
not on the extent buffer itself.  Since we rely on that bit to tell whether the
extent buffer is valid or not, we don't notice that the eb is bogus when we find
it in cache in a subsequent write, and eventually trip over
assert_eb_page_uptodate() warnings.

This fixes the problem I was seeing, which I could easily reproduce by running
generic/281 in a loop a few times.  With these patches I haven't reproduced it
in 20 loops.  Thanks,

Josef

Josef Bacik (3):
  btrfs: clear extent buffer uptodate when we fail to write it
  btrfs: check the root node for uptodate before returning it
  btrfs: call mapping_set_error() on btree inode with a write error

 fs/btrfs/ctree.c     | 19 +++++++++++++++----
 fs/btrfs/extent_io.c | 14 ++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.26.3



* [PATCH v2 1/3] btrfs: clear extent buffer uptodate when we fail to write it
  2021-11-24 19:14 [PATCH v2 0/3] Metadata IO error fixes Josef Bacik
@ 2021-11-24 19:14 ` Josef Bacik
  2021-11-24 19:14 ` [PATCH v2 2/3] btrfs: check the root node for uptodate before returning it Josef Bacik
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2021-11-24 19:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got dmesg errors on generic/281 on our overnight xfstests.  Looking at
the history, this happens occasionally, with errors like this:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50
CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G        W         5.16.0-rc2+ #469
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffae598230bc60 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000
RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340
RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000
R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0
R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000
FS:  0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0
Call Trace:

 extent_buffer_test_bit+0x3f/0x70
 free_space_test_bit+0xa6/0xc0
 load_free_space_tree+0x1d6/0x430
 caching_thread+0x454/0x630
 ? rcu_read_lock_sched_held+0x12/0x60
 ? rcu_read_lock_sched_held+0x12/0x60
 ? rcu_read_lock_sched_held+0x12/0x60
 ? lock_release+0x1f0/0x2d0
 btrfs_work_helper+0xf2/0x3e0
 ? lock_release+0x1f0/0x2d0
 ? finish_task_switch.isra.0+0xf9/0x3a0
 process_one_work+0x270/0x5a0
 worker_thread+0x55/0x3c0
 ? process_one_work+0x5a0/0x5a0
 kthread+0x174/0x1a0
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because we're trying to read from an extent buffer page that
is !PageUptodate.  The page is in that state because we clear the page
uptodate when we have an IO error, but we don't clear the extent buffer
uptodate.  If we do a read later and find this extent buffer we'll think
it's valid and not return an error, and then trip over this warning.

Fix this by also clearing uptodate on the extent buffer when this
happens, so that we get an error when we do a btrfs_search_slot() and
find this block later.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b289d26aca0d..3454cac28389 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4308,6 +4308,12 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
 		return;
 
+	/*
+	 * A read may stumble upon this buffer later, make sure that it gets an
+	 * error and knows there was an error.
+	 */
+	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+
 	/*
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
-- 
2.26.3
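
Illustrative sketch, not part of the patch: a small userspace model of
the flag mismatch described in the commit message above.  All names
(model_page, model_eb, model_write_failed, model_lookup_cached) are
hypothetical stand-ins and not kernel APIs; the model assumes that the
failed-write path clears the page flag and then marks the buffer, and
the with_fix switch stands in for the added clear_bit() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; they only mirror the idea of eb->bflags. */
#define MODEL_EB_UPTODATE  (1u << 0)
#define MODEL_EB_WRITE_ERR (1u << 1)

struct model_page {
	bool uptodate;
};

struct model_eb {
	unsigned int bflags;
	struct model_page *page;
};

/*
 * Models a failed metadata write.  The page always loses its uptodate
 * state; the buffer loses it only when with_fix is true.
 */
static void model_write_failed(struct model_eb *eb, bool with_fix)
{
	assert(eb != NULL && eb->page != NULL);

	eb->page->uptodate = false;

	/* Only the first error on a buffer does the bookkeeping. */
	if (eb->bflags & MODEL_EB_WRITE_ERR)
		return;
	eb->bflags |= MODEL_EB_WRITE_ERR;

	if (with_fix)
		eb->bflags &= ~MODEL_EB_UPTODATE;
}

/*
 * Models a later cache hit: the lookup trusts the buffer flag alone.
 * Returns 0 when the buffer is accepted, -EIO when it is rejected.
 */
static int model_lookup_cached(const struct model_eb *eb)
{
	return (eb->bflags & MODEL_EB_UPTODATE) ? 0 : -EIO;
}
```

In this model, a lookup after model_write_failed(eb, false) still
returns 0 even though the page is no longer uptodate, which is the
inconsistency the warning catches; with with_fix set the lookup returns
-EIO instead.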



* [PATCH v2 2/3] btrfs: check the root node for uptodate before returning it
  2021-11-24 19:14 [PATCH v2 0/3] Metadata IO error fixes Josef Bacik
  2021-11-24 19:14 ` [PATCH v2 1/3] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
@ 2021-11-24 19:14 ` Josef Bacik
  2021-11-25  9:07   ` Nikolay Borisov
  2021-11-24 19:14 ` [PATCH v2 3/3] btrfs: call mapping_set_error() on btree inode with a write error Josef Bacik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Josef Bacik @ 2021-11-24 19:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Now that we clear the extent buffer uptodate if we fail to write it out
we need to check to see if our root node is uptodate before we search
down it.  Otherwise we could return stale data (or potentially corrupt
data that was caught by the write verification step) and think that the
path is OK to search down.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 216bf35f6caf..d2297e449072 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1568,12 +1568,9 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 							int write_lock_level)
 {
 	struct extent_buffer *b;
-	int root_lock;
+	int root_lock = 0;
 	int level = 0;
 
-	/* We try very hard to do read locks on the root */
-	root_lock = BTRFS_READ_LOCK;
-
 	if (p->search_commit_root) {
 		b = root->commit_root;
 		atomic_inc(&b->refs);
@@ -1593,6 +1590,9 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 		goto out;
 	}
 
+	/* We try very hard to do read locks on the root */
+	root_lock = BTRFS_READ_LOCK;
+
 	/*
 	 * If the level is set to maximum, we can skip trying to get the read
 	 * lock.
@@ -1619,6 +1619,17 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
 	level = btrfs_header_level(b);
 
 out:
+	/*
+	 * The root may have failed to write out at some point, and thus is no
+	 * longer valid, return an error in this case.
+	 */
+	if (!extent_buffer_uptodate(b)) {
+		if (root_lock)
+			btrfs_tree_unlock_rw(b, root_lock);
+		free_extent_buffer(b);
+		return ERR_PTR(-EIO);
+	}
+
 	p->nodes[level] = b;
 	if (!p->skip_locking)
 		p->locks[level] = root_lock;
-- 
2.26.3
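
Illustrative sketch, not part of the patch: a userspace model of the
cleanup pattern used at the out: label in the diff above.  The names
model_root_eb, model_get_root and MODEL_READ_LOCK are hypothetical, and
the locking is reduced to a single integer field.  The point it tries to
show is why root_lock is initialized to 0: paths that never took a lock
must not unlock on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock type value; any nonzero value works in this model. */
#define MODEL_READ_LOCK 2

struct model_root_eb {
	bool uptodate;
	int refs;
	int lock_held;	/* 0 = unlocked, otherwise the lock type held */
};

/*
 * Models fetching the root node.  Returns the lock type taken (0 when
 * locking was skipped) and stores the node in *out, or returns -EIO
 * with the reference dropped and any lock released.
 */
static int model_get_root(struct model_root_eb *b, bool skip_locking,
			  struct model_root_eb **out)
{
	int root_lock = 0;	/* nothing locked yet */

	assert(b != NULL && out != NULL);
	*out = NULL;

	b->refs++;
	if (!skip_locking) {
		root_lock = MODEL_READ_LOCK;
		b->lock_held = root_lock;
	}

	/* A root that failed writeback earlier is no longer valid. */
	if (!b->uptodate) {
		if (root_lock)
			b->lock_held = 0;
		b->refs--;
		return -EIO;
	}

	*out = b;
	return root_lock;
}
```

Both the locked and the lock-free path leave the node with no extra
reference and no lock held when -EIO is returned.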



* [PATCH v2 3/3] btrfs: call mapping_set_error() on btree inode with a write error
  2021-11-24 19:14 [PATCH v2 0/3] Metadata IO error fixes Josef Bacik
  2021-11-24 19:14 ` [PATCH v2 1/3] btrfs: clear extent buffer uptodate when we fail to write it Josef Bacik
  2021-11-24 19:14 ` [PATCH v2 2/3] btrfs: check the root node for uptodate before returning it Josef Bacik
@ 2021-11-24 19:14 ` Josef Bacik
  2021-11-25  9:12   ` Nikolay Borisov
  2021-11-25  9:12 ` [PATCH v2 0/3] Metadata IO error fixes Nikolay Borisov
  2021-11-29 16:56 ` David Sterba
  4 siblings, 1 reply; 9+ messages in thread
From: Josef Bacik @ 2021-11-24 19:14 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

generic/484 fails sometimes with compression on because the write ends
up small enough that it goes into the btree.  This means that we never
call mapping_set_error() on the inode itself, because the page gets
marked as fine when we inline it into the metadata.  When the metadata
writeback happens we see it and abort the transaction properly and mark
the fs as readonly, however we don't do the mapping_set_error() on
anything.  In syncfs() we will simply return 0 if the sb is marked
read-only, so we can't check for this in our syncfs callback.  The only
way the error gets returned is if we called mapping_set_error() on
something.  Fix this by calling mapping_set_error() on the btree inode
mapping.  This allows us to properly return an error on syncfs and pass
generic/484 with compression on.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3454cac28389..1a67f4b3986b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4314,6 +4314,14 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	 */
 	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
 
+	/*
+	 * We need to set the mapping with the io error as well because a write
+	 * error will flip the file system readonly, and then syncfs() will
+	 * return a 0 because we are readonly if we don't modify the err seq for
+	 * the superblock.
+	 */
+	mapping_set_error(page->mapping, -EIO);
+
 	/*
 	 * If we error out, we should add back the dirty_metadata_bytes
 	 * to make it consistent.
-- 
2.26.3
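
Illustrative sketch, not part of the patch: a loose userspace model of
why syncfs() can report 0 after a metadata write error, as described in
the commit message above.  All names (model_sb, model_file,
model_mapping_set_error, model_metadata_write_failed, model_syncfs) are
hypothetical, and the error sequence is reduced to a plain counter,
which is a simplification of the kernel's error tracking.  The
assumption is that a read-only filesystem has nothing to flush, so the
only remaining error source is a previously recorded error that this
file descriptor has not yet seen.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sb {
	bool rdonly;
	unsigned int err_seq;	/* bumped each time an error is recorded */
	int last_err;
};

struct model_file {
	unsigned int seen_seq;	/* last error sequence this fd reported */
};

/* Records a writeback error so that later sync calls can observe it. */
static void model_mapping_set_error(struct model_sb *sb, int err)
{
	assert(sb != NULL && err < 0);
	sb->last_err = err;
	sb->err_seq++;
}

/*
 * Models a failed metadata write: the transaction abort flips the
 * filesystem read-only; the error is recorded only when with_fix is set.
 */
static void model_metadata_write_failed(struct model_sb *sb, bool with_fix)
{
	sb->rdonly = true;
	if (with_fix)
		model_mapping_set_error(sb, -EIO);
}

/*
 * Models syncfs() on a read-only filesystem: the flush step has nothing
 * to do and reports 0, so only an unseen recorded error is returned.
 */
static int model_syncfs(const struct model_sb *sb, struct model_file *f)
{
	int ret = 0;

	if (f->seen_seq != sb->err_seq) {
		f->seen_seq = sb->err_seq;
		ret = sb->last_err;
	}
	return ret;
}
```

Without the recorded error the model returns 0 from model_syncfs() even
though the write failed; with it, the first call returns -EIO and a
repeated call on the same descriptor returns 0 again.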



* Re: [PATCH v2 2/3] btrfs: check the root node for uptodate before returning it
  2021-11-24 19:14 ` [PATCH v2 2/3] btrfs: check the root node for uptodate before returning it Josef Bacik
@ 2021-11-25  9:07   ` Nikolay Borisov
  0 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2021-11-25  9:07 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 24.11.21 г. 21:14, Josef Bacik wrote:
> Now that we clear the extent buffer uptodate if we fail to write it out
> we need to check to see if our root node is uptodate before we search
> down it.  Otherwise we could return stale data (or potentially corrupt
> data that was caught by the write verification step) and think that the
> path is OK to search down.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


* Re: [PATCH v2 3/3] btrfs: call mapping_set_error() on btree inode with a write error
  2021-11-24 19:14 ` [PATCH v2 3/3] btrfs: call mapping_set_error() on btree inode with a write error Josef Bacik
@ 2021-11-25  9:12   ` Nikolay Borisov
  0 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2021-11-25  9:12 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 24.11.21 г. 21:14, Josef Bacik wrote:
> generic/484 fails sometimes with compression on because the write ends
> up small enough that it goes into the btree.  This means that we never
> call mapping_set_error() on the inode itself, because the page gets
> marked as fine when we inline it into the metadata.  When the metadata
> writeback happens we see it and abort the transaction properly and mark
> the fs as readonly, however we don't do the mapping_set_error() on
> anything.  In syncfs() we will simply return 0 if the sb is marked
> read-only, so we can't check for this in our syncfs callback.  The only
> way the error gets returned is if we called mapping_set_error() on
> something.  Fix this by calling mapping_set_error() on the btree inode
> mapping.  This allows us to properly return an error on syncfs and pass
> generic/484 with compression on.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent_io.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3454cac28389..1a67f4b3986b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4314,6 +4314,14 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
>  	 */
>  	clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
>  
> +	/*
> +	 * We need to set the mapping with the io error as well because a write
> +	 * error will flip the file system readonly, and then syncfs() will
> +	 * return a 0 because we are readonly if we don't modify the err seq for
> +	 * the superblock.
> +	 */
> +	mapping_set_error(page->mapping, -EIO);
> +
>  	/*
>  	 * If we error out, we should add back the dirty_metadata_bytes
>  	 * to make it consistent.
> 


* Re: [PATCH v2 0/3] Metadata IO error fixes
  2021-11-24 19:14 [PATCH v2 0/3] Metadata IO error fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2021-11-24 19:14 ` [PATCH v2 3/3] btrfs: call mapping_set_error() on btree inode with a write error Josef Bacik
@ 2021-11-25  9:12 ` Nikolay Borisov
  2021-11-29 16:56   ` David Sterba
  2021-11-29 16:56 ` David Sterba
  4 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2021-11-25  9:12 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 24.11.21 г. 21:14, Josef Bacik wrote:
> v1->v2:
> - I was debugging generic/484 separately because I thought it was data related,
>   but it turned out to be metadata related as well, so I've added the patch
>   "btrfs: call mapping_set_error() on btree inode with a write error" to the
>   series.
> 
> --- Original email ---
> 
> Hello,
> 
> I saw a dmesg failure with generic/281 on our overnight runs.  This turned out
> to be because we weren't getting an error back from btrfs_search_slot() even
> though we found a metadata block that shouldn't have been uptodate.
> 
> The root cause is that write errors on the page clear uptodate on the page, but
> not on the extent buffer itself.  Since we rely on that bit to tell whether the
> extent buffer is valid or not, we don't notice that the eb is bogus when we find
> it in cache in a subsequent write, and eventually trip over
> assert_eb_page_uptodate() warnings.
> 
> This fixes the problem I was seeing, which I could easily reproduce by running
> generic/281 in a loop a few times.  With these patches I haven't reproduced it
> in 20 loops.  Thanks,
> 
> Josef
> 
> Josef Bacik (3):
>   btrfs: clear extent buffer uptodate when we fail to write it
>   btrfs: check the root node for uptodate before returning it
>   btrfs: call mapping_set_error() on btree inode with a write error
> 
>  fs/btrfs/ctree.c     | 19 +++++++++++++++----
>  fs/btrfs/extent_io.c | 14 ++++++++++++++
>  2 files changed, 29 insertions(+), 4 deletions(-)
> 


This is stable material as well, right?


* Re: [PATCH v2 0/3] Metadata IO error fixes
  2021-11-25  9:12 ` [PATCH v2 0/3] Metadata IO error fixes Nikolay Borisov
@ 2021-11-29 16:56   ` David Sterba
  0 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2021-11-29 16:56 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Thu, Nov 25, 2021 at 11:12:53AM +0200, Nikolay Borisov wrote:
> 
> 
> On 24.11.21 г. 21:14, Josef Bacik wrote:
> > v1->v2:
> > - I was debugging generic/484 separately because I thought it was data related,
> >   but it turned out to be metadata related as well, so I've added the patch
> >   "btrfs: call mapping_set_error() on btree inode with a write error" to the
> >   series.
> > 
> > --- Original email ---
> > 
> > Hello,
> > 
> > I saw a dmesg failure with generic/281 on our overnight runs.  This turned out
> > to be because we weren't getting an error back from btrfs_search_slot() even
> > though we found a metadata block that shouldn't have been uptodate.
> > 
> > The root cause is that write errors on the page clear uptodate on the page, but
> > not on the extent buffer itself.  Since we rely on that bit to tell whether the
> > extent buffer is valid or not, we don't notice that the eb is bogus when we find
> > it in cache in a subsequent write, and eventually trip over
> > assert_eb_page_uptodate() warnings.
> > 
> > This fixes the problem I was seeing, which I could easily reproduce by running
> > generic/281 in a loop a few times.  With these patches I haven't reproduced it
> > in 20 loops.  Thanks,
> > 
> > Josef
> > 
> > Josef Bacik (3):
> >   btrfs: clear extent buffer uptodate when we fail to write it
> >   btrfs: check the root node for uptodate before returning it
> >   btrfs: call mapping_set_error() on btree inode with a write error
> > 
> >  fs/btrfs/ctree.c     | 19 +++++++++++++++----
> >  fs/btrfs/extent_io.c | 14 ++++++++++++++
> >  2 files changed, 29 insertions(+), 4 deletions(-)
> > 
> 
> 
> This is stable material as well, right?

Yes, tags added.


* Re: [PATCH v2 0/3] Metadata IO error fixes
  2021-11-24 19:14 [PATCH v2 0/3] Metadata IO error fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2021-11-25  9:12 ` [PATCH v2 0/3] Metadata IO error fixes Nikolay Borisov
@ 2021-11-29 16:56 ` David Sterba
  4 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2021-11-29 16:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Nov 24, 2021 at 02:14:22PM -0500, Josef Bacik wrote:
> v1->v2:
> - I was debugging generic/484 separately because I thought it was data related,
>   but it turned out to be metadata related as well, so I've added the patch
>   "btrfs: call mapping_set_error() on btree inode with a write error" to the
>   series.
> 
> --- Original email ---
> 
> Hello,
> 
> I saw a dmesg failure with generic/281 on our overnight runs.  This turned out
> to be because we weren't getting an error back from btrfs_search_slot() even
> though we found a metadata block that shouldn't have been uptodate.
> 
> The root cause is that write errors on the page clear uptodate on the page, but
> not on the extent buffer itself.  Since we rely on that bit to tell whether the
> extent buffer is valid or not, we don't notice that the eb is bogus when we find
> it in cache in a subsequent write, and eventually trip over
> assert_eb_page_uptodate() warnings.
> 
> This fixes the problem I was seeing, which I could easily reproduce by running
> generic/281 in a loop a few times.  With these patches I haven't reproduced it
> in 20 loops.  Thanks,
> 
> Josef
> 
> Josef Bacik (3):
>   btrfs: clear extent buffer uptodate when we fail to write it
>   btrfs: check the root node for uptodate before returning it
>   btrfs: call mapping_set_error() on btree inode with a write error

Added to misc-next, thanks.
