All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref()
@ 2022-12-26  1:00 Qu Wenruo
  2022-12-26  1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo
  2022-12-26  1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo
  0 siblings, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-12-26  1:00 UTC (permalink / raw)
  To: linux-btrfs

There is a bug report that one can cause error in run_one_delayed_ref()
and make the fs RO.

But there is only two lines of dmesg:

 BTRFS: error (device nvme0n1p3: state A) in btrfs_run_delayed_refs:2147: errno=-5 IO failure
 BTRFS info (device nvme0n1p3: state EA): forced readonly

Which is not really helpful.

This patch will add two error reports which should help us to debug:

- Add the missing level check errors in validate_extent_buffer()
  Failed level checks didn't cause any error message, which is very
  different compared to the other checks.

- Add error report for failed run_one_delayed_ref()
  This will include extra info for each node.

The remaining branches are either calling btrfs_abort_transaction(),
which should provide function and line number, or have extra error
message like alloc_reserved_extent(), or should trigger error messages
from validate_extent_buffer() when failed, like btrfs_insert_empty_item().

Thus this two patches should cover most of the error cases.

Qu Wenruo (2):
  btrfs: add extra error message for tree block level mismatch
  btrfs: always report error for run_one_delayed_ref()

 fs/btrfs/disk-io.c     | 4 ++++
 fs/btrfs/extent-tree.c | 7 +++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] btrfs: add extra error message for tree block level mismatch
  2022-12-26  1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo
@ 2022-12-26  1:00 ` Qu Wenruo
  2023-01-03  8:36   ` Anand Jain
  2022-12-26  1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo
  1 sibling, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2022-12-26  1:00 UTC (permalink / raw)
  To: linux-btrfs

There is a bug report that commit 947a629988f1 ("btrfs: move tree block
parentness check into validate_extent_buffer()") caused some fs to go RO
under heavy write workload.

The full dmesg provided very little useful info, but surprisingly if
there is something wrong with the tree block, we should got some error
message in validate_extent_buffer().

However this is not always true, as there is one missing error message
for validate_extent_buffer(), tree level check.

So this patch will add the missing error message for
validate_extent_buffer() to make later debug much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f8b5955f003f..3421f06eade3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 	}
 
 	if (found_level != check->level) {
+		btrfs_err_rl(eb->fs_info,
+	"level verify failed on logical %llu mirror %u wanted %u found %u",
+			     eb->start, eb->read_mirror, check->level,
+			     found_level);
 		ret = -EIO;
 		goto out;
 	}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] btrfs: always report error for run_one_delayed_ref()
  2022-12-26  1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo
  2022-12-26  1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo
@ 2022-12-26  1:00 ` Qu Wenruo
  2023-01-02 16:45   ` David Sterba
  2023-01-03  8:48   ` Anand Jain
  1 sibling, 2 replies; 6+ messages in thread
From: Qu Wenruo @ 2022-12-26  1:00 UTC (permalink / raw)
  To: linux-btrfs

Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
if end users hit such problem, there will be no chance that
btrfs_debug() is enabled.

This can lead to very little useful info for debug.

This patch will:

- Add extra info for error reporting
  Including:
  * logical bytenr
  * num_bytes
  * type
  * action
  * ref_mod

- Replace the btrfs_debug() with btrfs_err()

- Move the error reporting into run_one_delayed_ref()
  This is to avoid use-after-free, the @node can be freed in the caller.

This error should only be triggered at most once.

As if run_one_delayed_ref() failed, we trigger the error message, then
causing the call chain to error out:

btrfs_run_delayed_refs()
`- btrfs_run_delayed_refs()
   `- btrfs_run_delayed_refs_for_head()
      `- run_one_delayed_ref()

And we will abort the current transaction in btrfs_run_delayed_refs().
If we have to run delayed refs for the abort transaction,
run_one_delayed_ref() will just cleanup the refs and do nothing, thus no
new error messages would be output.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index eaa1fb2850d7..d1a4e51f8fbc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1713,6 +1713,11 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 		BUG();
 	if (ret && insert_reserved)
 		btrfs_pin_extent(trans, node->bytenr, node->num_bytes, 1);
+	if (ret < 0)
+		btrfs_err(trans->fs_info,
+"failed to run delayed ref for logical %llu num_bytes %llu type %u action %u ref_mod %d: %d",
+			  node->bytenr, node->num_bytes, node->type,
+			  node->action, node->ref_mod, ret);
 	return ret;
 }
 
@@ -1954,8 +1959,6 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			unselect_delayed_ref_head(delayed_refs, locked_ref);
 			btrfs_put_delayed_ref(ref);
-			btrfs_debug(fs_info, "run_one_delayed_ref returned %d",
-				    ret);
 			return ret;
 		}
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: always report error for run_one_delayed_ref()
  2022-12-26  1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo
@ 2023-01-02 16:45   ` David Sterba
  2023-01-03  8:48   ` Anand Jain
  1 sibling, 0 replies; 6+ messages in thread
From: David Sterba @ 2023-01-02 16:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Dec 26, 2022 at 09:00:40AM +0800, Qu Wenruo wrote:
> Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
> if end users hit such problem, there will be no chance that
> btrfs_debug() is enabled.
> 
> This can lead to very little useful info for debug.
> 
> This patch will:
> 
> - Add extra info for error reporting
>   Including:
>   * logical bytenr
>   * num_bytes
>   * type
>   * action
>   * ref_mod
> 
> - Replace the btrfs_debug() with btrfs_err()
> 
> - Move the error reporting into run_one_delayed_ref()
>   This is to avoid use-after-free, the @node can be freed in the caller.
> 
> This error should only be triggered at most once.
> 
> As if run_one_delayed_ref() failed, we trigger the error message, then
> causing the call chain to error out:
> 
> btrfs_run_delayed_refs()
> `- btrfs_run_delayed_refs()
>    `- btrfs_run_delayed_refs_for_head()
>       `- run_one_delayed_ref()
> 
> And we will abort the current transaction in btrfs_run_delayed_refs().
> If we have to run delayed refs for the abort transaction,
> run_one_delayed_ref() will just cleanup the refs and do nothing, thus no
> new error messages would be output.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] btrfs: add extra error message for tree block level mismatch
  2022-12-26  1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo
@ 2023-01-03  8:36   ` Anand Jain
  0 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2023-01-03  8:36 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/26/22 09:00, Qu Wenruo wrote:
> There is a bug report that commit 947a629988f1 ("btrfs: move tree block
> parentness check into validate_extent_buffer()") caused some fs to go RO
> under heavy write workload.
> 
> The full dmesg provided very little useful info, but surprisingly if
> there is something wrong with the tree block, we should got some error
> message in validate_extent_buffer().
> 
> However this is not always true, as there is one missing error message
> for validate_extent_buffer(), tree level check.
> 
> So this patch will add the missing error message for
> validate_extent_buffer() to make later debug much easier.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>



> ---
>   fs/btrfs/disk-io.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f8b5955f003f..3421f06eade3 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb,
>   	}
>   
>   	if (found_level != check->level) {
> +		btrfs_err_rl(eb->fs_info,
> +	"level verify failed on logical %llu mirror %u wanted %u found %u",
> +			     eb->start, eb->read_mirror, check->level,
> +			     found_level);
>   		ret = -EIO;
>   		goto out;
>   	}


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] btrfs: always report error for run_one_delayed_ref()
  2022-12-26  1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo
  2023-01-02 16:45   ` David Sterba
@ 2023-01-03  8:48   ` Anand Jain
  1 sibling, 0 replies; 6+ messages in thread
From: Anand Jain @ 2023-01-03  8:48 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 12/26/22 09:00, Qu Wenruo wrote:
> Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but
> if end users hit such problem, there will be no chance that
> btrfs_debug() is enabled.
> 
> This can lead to very little useful info for debug.
> 
> This patch will:
> 
> - Add extra info for error reporting
>    Including:
>    * logical bytenr
>    * num_bytes
>    * type
>    * action
>    * ref_mod
> 
> - Replace the btrfs_debug() with btrfs_err()
> 
> - Move the error reporting into run_one_delayed_ref()
>    This is to avoid use-after-free, the @node can be freed in the caller.
> 
> This error should only be triggered at most once.
> 
> As if run_one_delayed_ref() failed, we trigger the error message, then
> causing the call chain to error out:
> 
> btrfs_run_delayed_refs()
> `- btrfs_run_delayed_refs()
>     `- btrfs_run_delayed_refs_for_head()
>        `- run_one_delayed_ref()
> 
> And we will abort the current transaction in btrfs_run_delayed_refs().
> If we have to run delayed refs for the abort transaction,
> run_one_delayed_ref() will just cleanup the refs and do nothing, thus no
> new error messages would be output.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Anand Jain <anand.jain@oracle.com>



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-03  8:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-26  1:00 [PATCH 0/2] btrfs: add error reports for possible run_one_delayed_ref() Qu Wenruo
2022-12-26  1:00 ` [PATCH 1/2] btrfs: add extra error message for tree block level mismatch Qu Wenruo
2023-01-03  8:36   ` Anand Jain
2022-12-26  1:00 ` [PATCH 2/2] btrfs: always report error for run_one_delayed_ref() Qu Wenruo
2023-01-02 16:45   ` David Sterba
2023-01-03  8:48   ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.