All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
	wqu@suse.com, dsterba@suse.com,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes
Date: Mon, 26 Dec 2022 08:14:55 +0800	[thread overview]
Message-ID: <f68699c3-ec5e-d8e8-f101-6e9a7020ac81@gmx.com> (raw)
In-Reply-To: <CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2204 bytes --]



On 2022/12/26 05:32, Mikhail Gavrilov wrote:
> Hi,
> It is curious but it happens only on machine which have BTRFS volume
> combined from two high speed nvme (pcie 4) SSD in RAID 0. On machines
> with BTRFS volume from one HDD the bug does not appear.
> 
> To bisect the problematic commit, I had to sweat a lot. At each step,
> I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
> deleted it. For make sure that the commit previous to
> 947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
> the bug, I downloaded this game 10 times, which should have provided
> more than 1.5 Tb of data writing to the btrfs volume.
> 
> Here is result of my bisection:
> 947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
> commit 947a629988f191807d2d22ba63ae18259bb645c5
> Author: Qu Wenruo <wqu@suse.com>
> Date:   Wed Sep 14 13:32:51 2022 +0800
> 
>      btrfs: move tree block parentness check into validate_extent_buffer()
> 
[...]
>      Signed-off-by: Qu Wenruo <wqu@suse.com>
>      Signed-off-by: David Sterba <dsterba@suse.com>
> 
>   fs/btrfs/disk-io.c   | 73 ++++++++++++++++++++++++++++++++++++++--------------
>   fs/btrfs/extent_io.c | 18 ++++++++++---
>   fs/btrfs/extent_io.h |  5 ++--
>   fs/btrfs/volumes.h   | 25 +++++++++++++++---
>   4 files changed, 93 insertions(+), 28 deletions(-)
> 
> Before going to readonly, the preceding line in kernel log display a message:
> [ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
> btrfs_run_delayed_refs:2147: errno=-5 IO failure
> 
> I also attached a full kernel log.
> 
Thanks a lot for the full kernel log.

It indeed shows something is wrong in the run_one_delayed_ref().
But surprisingly, if there is something wrong, I'd expect more output 
from btrfs, as normally if one tree block failed to pass whatever the 
checks, it should cause an error message at least.

Since you can reproduce the bug (although I don't think this is easy to 
reproduce), mind to apply the extra debug patch and then try to reproduce?

(Part of the patch would go upstreamed soon)

Another thing is, mind to run "btrfs check --readonly" on the fs?
I don't believe that's the case, but just in case.

Thanks,
Qu

[-- Attachment #2: 0001-btrfs-add-extra-debug-for-run_one_delayed_ref.patch --]
[-- Type: text/x-patch, Size: 3096 bytes --]

From 41cf78e56ed2a3391499469855b906d74d03eeca Mon Sep 17 00:00:00 2001
Message-Id: <41cf78e56ed2a3391499469855b906d74d03eeca.1672013469.git.wqu@suse.com>
From: Qu Wenruo <wqu@suse.com>
Date: Mon, 26 Dec 2022 08:10:27 +0800
Subject: [PATCH] btrfs: add extra debug for run_one_delayed_ref()

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c     |  4 ++++
 fs/btrfs/extent-tree.c | 22 ++++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8aa7d79a341a..c67c688d568e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 	}
 
 	if (found_level != check->level) {
+		btrfs_err_rl(eb->fs_info,
+	"level check failed on logical %llu mirror %u wanted %u found %u",
+			     eb->start, eb->read_mirror, check->level,
+			     found_level);
 		ret = -EIO;
 		goto out;
 	}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index eaa1fb2850d7..fe1f31319d58 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1462,8 +1462,13 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	ret = insert_inline_extent_backref(trans, path, bytenr, num_bytes,
 					   parent, root_objectid, owner,
 					   offset, refs_to_add, extent_op);
-	if ((ret < 0 && ret != -EAGAIN) || !ret)
+	if ((ret < 0 && ret != -EAGAIN) || !ret) {
+		if (ret < 0 && ret != -EAGAIN)
+			btrfs_err(trans->fs_info,
+				  "failed to insert inline extent backref: %d",
+				  ret);
 		goto out;
+	}
 
 	/*
 	 * Ok we had -EAGAIN which means we didn't have space to insert and
@@ -1954,8 +1959,8 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			unselect_delayed_ref_head(delayed_refs, locked_ref);
 			btrfs_put_delayed_ref(ref);
-			btrfs_debug(fs_info, "run_one_delayed_ref returned %d",
-				    ret);
+			btrfs_err(fs_info, "run_one_delayed_ref returned %d",
+				  ret);
 			return ret;
 		}
 
@@ -4642,8 +4647,12 @@ static int alloc_reserved_extent(struct btrfs_trans_handle *trans, u64 bytenr,
 	int ret;
 
 	ret = remove_from_free_space_tree(trans, bytenr, num_bytes);
-	if (ret)
+	if (ret) {
+		btrfs_err(fs_info,
+	"failed to remove bytenr %llu num_bytes %llu from free space tree: %d",
+			  bytenr, num_bytes, ret);
 		return ret;
+	}
 
 	ret = btrfs_update_block_group(trans, bytenr, num_bytes, true);
 	if (ret) {
@@ -4686,6 +4695,9 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 	extent_root = btrfs_extent_root(fs_info, ins->objectid);
 	ret = btrfs_insert_empty_item(trans, extent_root, path, ins, size);
 	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to insert empty item for extent root: %d",
+			  ret);
 		btrfs_free_path(path);
 		return ret;
 	}
@@ -4759,6 +4771,8 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
 				      size);
 	if (ret) {
 		btrfs_free_path(path);
+		btrfs_err(fs_info, "failed to insert empty item into extent root: %d",
+			  ret);
 		return ret;
 	}
 
-- 
2.39.0


  reply	other threads:[~2022-12-26  0:20 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-25 21:32 [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes Mikhail Gavrilov
2022-12-26  0:14 ` Qu Wenruo [this message]
2022-12-26  2:47   ` Mikhail Gavrilov
2022-12-26  3:29     ` Qu Wenruo
2022-12-26  7:25       ` David Arendt
2022-12-26  7:31         ` Write time corrupting for log tree Qu Wenruo
2023-01-03 10:41         ` [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes Filipe Manana
2022-12-26  8:15       ` Mikhail Gavrilov
2022-12-26  8:47         ` Qu Wenruo
2022-12-26 11:14           ` Mikhail Gavrilov
2022-12-26 12:11             ` Qu Wenruo
2022-12-26 12:15               ` Mikhail Gavrilov
2022-12-26 23:36                 ` Qu Wenruo
2022-12-26 23:52                   ` Mikhail Gavrilov
2022-12-27  0:18                     ` Qu Wenruo
2022-12-27  1:35                       ` Mikhail Gavrilov
2022-12-27  5:13                         ` Qu Wenruo
2022-12-27 10:19                           ` Mikhail Gavrilov
2022-12-27 11:02                             ` Qu Wenruo
2022-12-27 13:11                               ` Mikhail Gavrilov
2022-12-28  1:08                                 ` Qu Wenruo
2022-12-28 14:12                                   ` Mikhail Gavrilov
2022-12-28 23:24                                     ` Qu Wenruo
2022-12-28 23:42                                       ` Mikhail Gavrilov
2022-12-29  0:00                                         ` Qu Wenruo
2022-12-28 23:31                                     ` Qu Wenruo
2022-12-29  0:08                                       ` Mikhail Gavrilov
2022-12-29  3:05                                         ` Qu Wenruo
2022-12-26 13:37 ` [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes #forregzbot Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f68699c3-ec5e-d8e8-f101-6e9a7020ac81@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikhail.v.gavrilov@gmail.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.