linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort
@ 2022-08-25  7:09 Qu Wenruo
  2022-08-25  7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25  7:09 UTC (permalink / raw)
  To: linux-btrfs

[Changelog]
v3:
- Drop the multi-line output change
  The biggest problem is, for the space info dump, it can be ratelimited
  especially for enospc_debug mount option.
  Thus multi-line can be more confusing when it's related-limited or
  interrupted by other lines.

  So this version will reuse the old single line output.

v2:
- Add a new line to show the meaning of the metadata dump.
  Previous output only includes the reserved bytes and size bytes,
  but not showing which is which, thus still need to check the code

We have some internal reports of transaction abort with -ENOSPC.

Unfortunately it's really hard to debug, since we really got nothing
other than the error message for most cases.

Also we have helpers like __btrfs_dump_space_info(), it needs enospc_debug
mount option which only makes sense if the user can reproduce the bug
and retry with that mount option.

Considering ENOSPC should not happen for critical paths which failure
means transaction abort, we should dump all space info for debug purpose
if the transaction arbot is caused by -ENOSPC.

One example of the output at transaction abort time:

 ------------[ cut here ]------------
 BTRFS: Transaction aborted (error -28)
 WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs]
 <call trace skipped>
 ---[ end trace 0000000000000000 ]---
 BTRFS info (device dm-1: state A): dumping space info:
 BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full
 BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
 BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full
 BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0
 BTRFS info (device dm-1: state A): space_info SYS has 8372224 free, is not full
 BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
 BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016
 BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0
 BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0
 BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232
 BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728
 BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left
 BTRFS info (device dm-1: state EA): forced readonly


Qu Wenruo (2):
  btrfs: output human readable space info flag
  btrfs: dump all space infos if we abort transaction due to ENOSPC

 fs/btrfs/ctree.h      |  6 +++--
 fs/btrfs/space-info.c | 52 +++++++++++++++++++++++++++++++++++--------
 fs/btrfs/space-info.h |  2 ++
 fs/btrfs/super.c      |  4 +++-
 4 files changed, 52 insertions(+), 12 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 1/2] btrfs: output human readable space info flag
  2022-08-25  7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
@ 2022-08-25  7:09 ` Qu Wenruo
  2022-08-25  7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
  2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25  7:09 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Johannes Thumshirn

For btrfs_space_info, its flags has only 4 possible values:
- BTRFS_BLOCK_GROUP_SYSTEM
- BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA
- BTRFS_BLOCK_GROUP_METADATA
- BTRFS_BLOCK_GROUP_DATA

Thus do debuggers a favor by output a human readable flags in
__btrfs_dump_space_info().

Now the summary line of __btrfs_dump_space_info() looks like:

 BTRFS info (device dm-1: state A): space_info META has 251494400 free, is not full

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/space-info.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 477e57ace48d..1f524b62313c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -476,14 +476,31 @@ do {									\
 	spin_unlock(&__rsv->lock);					\
 } while (0)
 
+static const char *space_info_flag_to_str(struct btrfs_space_info *space_info)
+{
+	switch (space_info->flags) {
+	case BTRFS_BLOCK_GROUP_SYSTEM:
+		return "SYS";
+	case BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA:
+		return "DATA+META";
+	case BTRFS_BLOCK_GROUP_DATA:
+		return "DATA";
+	case BTRFS_BLOCK_GROUP_METADATA:
+		return "METADATA";
+	default:
+		return "UNKNOWN";
+	}
+}
+
 static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 				    struct btrfs_space_info *info)
 {
+	const char *flag_str = space_info_flag_to_str(info);
 	lockdep_assert_held(&info->lock);
 
 	/* The free space could be negative in case of overcommit */
-	btrfs_info(fs_info, "space_info %llu has %lld free, is %sfull",
-		   info->flags,
+	btrfs_info(fs_info, "space_info %s has %lld free, is %sfull",
+		   flag_str,
 		   (s64)(info->total_bytes - btrfs_space_info_used(info, true)),
 		   info->full ? "" : "not ");
 	btrfs_info(fs_info,
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC
  2022-08-25  7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
  2022-08-25  7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
@ 2022-08-25  7:09 ` Qu Wenruo
  2022-08-25  8:18   ` Johannes Thumshirn
  2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba
  2 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25  7:09 UTC (permalink / raw)
  To: linux-btrfs

We have hit some transaction abort due to -ENOSPC internally.

Normally we should always reserve enough space for metadata for every
transaction, thus hitting -ENOSPC should really indicate some cases we
didn't expect.

But unfrotunately current error reporting will only give a kernel
wanring and backtrace, not really helpful to debug what's causing the
problem.

And debug_enospc can only help when user can reproduce the problem, but
under most cases, such transaction abort by -ENOSPC is really hard to
reproduce.

So this patch will dump all space infos (data, metadata, system) when we
abort the first transaction with -ENOSPC.

This should at least provide some clue to us.

The example of a dump would look like this:

 ------------[ cut here ]------------
 BTRFS: Transaction aborted (error -28)
 WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs]
 <call trace skipped>
 ---[ end trace 0000000000000000 ]---
 BTRFS info (device dm-1: state A): dumping space info:
 BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full
 BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
 BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full
 BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0
 BTRFS info (device dm-1: state A): space_info SYS has 8372224 free, is not full
 BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
 BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016
 BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0
 BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0
 BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232
 BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728
 BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left
 BTRFS info (device dm-1: state EA): forced readonly

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h      |  6 ++++--
 fs/btrfs/space-info.c | 31 ++++++++++++++++++++++++-------
 fs/btrfs/space-info.h |  2 ++
 fs/btrfs/super.c      |  4 +++-
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3dc30f5e6fd0..8781f86aa534 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3830,7 +3830,7 @@ const char * __attribute_const__ btrfs_decode_error(int errno);
 __cold
 void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       const char *function,
-			       unsigned int line, int errno);
+			       unsigned int line, int errno, bool first_hit);
 
 /*
  * Call btrfs_abort_transaction as early as possible when an error condition is
@@ -3838,9 +3838,11 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
  */
 #define btrfs_abort_transaction(trans, errno)		\
 do {								\
+	bool first = false;					\
 	/* Report first abort since mount */			\
 	if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED,	\
 			&((trans)->fs_info->fs_state))) {	\
+		first = true;					\
 		if ((errno) != -EIO && (errno) != -EROFS) {		\
 			WARN(1, KERN_DEBUG				\
 			"BTRFS: Transaction aborted (error %d)\n",	\
@@ -3852,7 +3854,7 @@ do {								\
 		}						\
 	}							\
 	__btrfs_abort_transaction((trans), __func__,		\
-				  __LINE__, (errno));		\
+				  __LINE__, (errno), first);	\
 } while (0)
 
 #ifdef CONFIG_PRINTK_INDEX
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1f524b62313c..b0a9ba34eca2 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -492,6 +492,15 @@ static const char *space_info_flag_to_str(struct btrfs_space_info *space_info)
 	}
 }
 
+static void dump_global_block_rsv(struct btrfs_fs_info *fs_info)
+{
+	DUMP_BLOCK_RSV(fs_info, global_block_rsv);
+	DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
+	DUMP_BLOCK_RSV(fs_info, chunk_block_rsv);
+	DUMP_BLOCK_RSV(fs_info, delayed_block_rsv);
+	DUMP_BLOCK_RSV(fs_info, delayed_refs_rsv);
+}
+
 static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 				    struct btrfs_space_info *info)
 {
@@ -508,13 +517,6 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 		info->total_bytes, info->bytes_used, info->bytes_pinned,
 		info->bytes_reserved, info->bytes_may_use,
 		info->bytes_readonly, info->bytes_zone_unusable);
-
-	DUMP_BLOCK_RSV(fs_info, global_block_rsv);
-	DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
-	DUMP_BLOCK_RSV(fs_info, chunk_block_rsv);
-	DUMP_BLOCK_RSV(fs_info, delayed_block_rsv);
-	DUMP_BLOCK_RSV(fs_info, delayed_refs_rsv);
-
 }
 
 void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
@@ -526,6 +528,7 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 
 	spin_lock(&info->lock);
 	__btrfs_dump_space_info(fs_info, info);
+	dump_global_block_rsv(fs_info);
 	spin_unlock(&info->lock);
 
 	if (!dump_block_groups)
@@ -1770,3 +1773,17 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 	}
 	return ret;
 }
+
+/* Dump all the space infos when we abort a transaction due to ENOSPC. */
+__cold void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_space_info *space_info;
+
+	btrfs_info(fs_info, "dumping space info:");
+	list_for_each_entry(space_info, &fs_info->space_info, list) {
+		spin_lock(&space_info->lock);
+		__btrfs_dump_space_info(fs_info, space_info);
+		spin_unlock(&space_info->lock);
+	}
+	dump_global_block_rsv(fs_info);
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 2039096803ed..8f5948740941 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -157,4 +157,6 @@ static inline void btrfs_space_info_free_bytes_may_use(
 }
 int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
 			     enum btrfs_reserve_flush_enum flush);
+void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
+
 #endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f1c6ca59299e..019141371c14 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -346,12 +346,14 @@ void __cold btrfs_err_32bit_limit(struct btrfs_fs_info *fs_info)
 __cold
 void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 			       const char *function,
-			       unsigned int line, int errno)
+			       unsigned int line, int errno, bool first_hit)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 
 	WRITE_ONCE(trans->aborted, errno);
 	WRITE_ONCE(trans->transaction->aborted, errno);
+	if (first_hit && errno == -ENOSPC)
+		btrfs_dump_space_info_for_trans_abort(fs_info);
 	/* Wake up anybody who may be waiting on this transaction */
 	wake_up(&fs_info->transaction_wait);
 	wake_up(&fs_info->transaction_blocked_wait);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC
  2022-08-25  7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
@ 2022-08-25  8:18   ` Johannes Thumshirn
  0 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2022-08-25  8:18 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort
  2022-08-25  7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
  2022-08-25  7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
  2022-08-25  7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
@ 2022-09-02 13:43 ` David Sterba
  2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2022-09-02 13:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Aug 25, 2022 at 03:09:08PM +0800, Qu Wenruo wrote:
> [Changelog]
> v3:
> - Drop the multi-line output change
>   The biggest problem is, for the space info dump, it can be ratelimited
>   especially for enospc_debug mount option.
>   Thus multi-line can be more confusing when it's related-limited or
>   interrupted by other lines.
> 
>   So this version will reuse the old single line output.

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-09-02 14:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25  7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
2022-08-25  7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
2022-08-25  7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
2022-08-25  8:18   ` Johannes Thumshirn
2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).