* [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort
@ 2022-08-25 7:09 Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25 7:09 UTC (permalink / raw)
To: linux-btrfs
[Changelog]
v3:
- Drop the multi-line output change
The biggest problem is, for the space info dump, it can be ratelimited
especially for enospc_debug mount option.
Thus multi-line can be more confusing when it's related-limited or
interrupted by other lines.
So this version will reuse the old single line output.
v2:
- Add a new line to show the meaning of the metadata dump.
Previous output only includes the reserved bytes and size bytes,
but not showing which is which, thus still need to check the code
We have some internal reports of transaction abort with -ENOSPC.
Unfortunately it's really hard to debug, since we really got nothing
other than the error message for most cases.
Also we have helpers like __btrfs_dump_space_info(), it needs enospc_debug
mount option which only makes sense if the user can reproduce the bug
and retry with that mount option.
Considering ENOSPC should not happen for critical paths which failure
means transaction abort, we should dump all space info for debug purpose
if the transaction arbot is caused by -ENOSPC.
One example of the output at transaction abort time:
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs]
<call trace skipped>
---[ end trace 0000000000000000 ]---
BTRFS info (device dm-1: state A): dumping space info:
BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full
BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full
BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0
BTRFS info (device dm-1: state A): space_info SYS has 8372224 free, is not full
BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016
BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232
BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728
BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left
BTRFS info (device dm-1: state EA): forced readonly
Qu Wenruo (2):
btrfs: output human readable space info flag
btrfs: dump all space infos if we abort transaction due to ENOSPC
fs/btrfs/ctree.h | 6 +++--
fs/btrfs/space-info.c | 52 +++++++++++++++++++++++++++++++++++--------
fs/btrfs/space-info.h | 2 ++
fs/btrfs/super.c | 4 +++-
4 files changed, 52 insertions(+), 12 deletions(-)
--
2.37.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/2] btrfs: output human readable space info flag
2022-08-25 7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
@ 2022-08-25 7:09 ` Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25 7:09 UTC (permalink / raw)
To: linux-btrfs; +Cc: Johannes Thumshirn
For btrfs_space_info, its flags has only 4 possible values:
- BTRFS_BLOCK_GROUP_SYSTEM
- BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA
- BTRFS_BLOCK_GROUP_METADATA
- BTRFS_BLOCK_GROUP_DATA
Thus do debuggers a favor by output a human readable flags in
__btrfs_dump_space_info().
Now the summary line of __btrfs_dump_space_info() looks like:
BTRFS info (device dm-1: state A): space_info META has 251494400 free, is not full
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/space-info.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 477e57ace48d..1f524b62313c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -476,14 +476,31 @@ do { \
spin_unlock(&__rsv->lock); \
} while (0)
+static const char *space_info_flag_to_str(struct btrfs_space_info *space_info)
+{
+ switch (space_info->flags) {
+ case BTRFS_BLOCK_GROUP_SYSTEM:
+ return "SYS";
+ case BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA:
+ return "DATA+META";
+ case BTRFS_BLOCK_GROUP_DATA:
+ return "DATA";
+ case BTRFS_BLOCK_GROUP_METADATA:
+ return "METADATA";
+ default:
+ return "UNKNOWN";
+ }
+}
+
static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *info)
{
+ const char *flag_str = space_info_flag_to_str(info);
lockdep_assert_held(&info->lock);
/* The free space could be negative in case of overcommit */
- btrfs_info(fs_info, "space_info %llu has %lld free, is %sfull",
- info->flags,
+ btrfs_info(fs_info, "space_info %s has %lld free, is %sfull",
+ flag_str,
(s64)(info->total_bytes - btrfs_space_info_used(info, true)),
info->full ? "" : "not ");
btrfs_info(fs_info,
--
2.37.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC
2022-08-25 7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
@ 2022-08-25 7:09 ` Qu Wenruo
2022-08-25 8:18 ` Johannes Thumshirn
2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba
2 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2022-08-25 7:09 UTC (permalink / raw)
To: linux-btrfs
We have hit some transaction abort due to -ENOSPC internally.
Normally we should always reserve enough space for metadata for every
transaction, thus hitting -ENOSPC should really indicate some cases we
didn't expect.
But unfrotunately current error reporting will only give a kernel
wanring and backtrace, not really helpful to debug what's causing the
problem.
And debug_enospc can only help when user can reproduce the problem, but
under most cases, such transaction abort by -ENOSPC is really hard to
reproduce.
So this patch will dump all space infos (data, metadata, system) when we
abort the first transaction with -ENOSPC.
This should at least provide some clue to us.
The example of a dump would look like this:
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: CPU: 8 PID: 3366 at fs/btrfs/transaction.c:2137 btrfs_commit_transaction+0xf81/0xfb0 [btrfs]
<call trace skipped>
---[ end trace 0000000000000000 ]---
BTRFS info (device dm-1: state A): dumping space info:
BTRFS info (device dm-1: state A): space_info DATA has 6791168 free, is not full
BTRFS info (device dm-1: state A): space_info total=8388608, used=1597440, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
BTRFS info (device dm-1: state A): space_info METADATA has 257114112 free, is not full
BTRFS info (device dm-1: state A): space_info total=268435456, used=131072, pinned=180224, reserved=65536, may_use=10878976, readonly=65536 zone_unusable=0
BTRFS info (device dm-1: state A): space_info SYS has 8372224 free, is not full
BTRFS info (device dm-1: state A): space_info total=8388608, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
BTRFS info (device dm-1: state A): global_block_rsv: size 3670016 reserved 3670016
BTRFS info (device dm-1: state A): trans_block_rsv: size 0 reserved 0
BTRFS info (device dm-1: state A): chunk_block_rsv: size 0 reserved 0
BTRFS info (device dm-1: state A): delayed_block_rsv: size 4063232 reserved 4063232
BTRFS info (device dm-1: state A): delayed_refs_rsv: size 3145728 reserved 3145728
BTRFS: error (device dm-1: state A) in btrfs_commit_transaction:2137: errno=-28 No space left
BTRFS info (device dm-1: state EA): forced readonly
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/ctree.h | 6 ++++--
fs/btrfs/space-info.c | 31 ++++++++++++++++++++++++-------
fs/btrfs/space-info.h | 2 ++
fs/btrfs/super.c | 4 +++-
4 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3dc30f5e6fd0..8781f86aa534 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3830,7 +3830,7 @@ const char * __attribute_const__ btrfs_decode_error(int errno);
__cold
void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
const char *function,
- unsigned int line, int errno);
+ unsigned int line, int errno, bool first_hit);
/*
* Call btrfs_abort_transaction as early as possible when an error condition is
@@ -3838,9 +3838,11 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
*/
#define btrfs_abort_transaction(trans, errno) \
do { \
+ bool first = false; \
/* Report first abort since mount */ \
if (!test_and_set_bit(BTRFS_FS_STATE_TRANS_ABORTED, \
&((trans)->fs_info->fs_state))) { \
+ first = true; \
if ((errno) != -EIO && (errno) != -EROFS) { \
WARN(1, KERN_DEBUG \
"BTRFS: Transaction aborted (error %d)\n", \
@@ -3852,7 +3854,7 @@ do { \
} \
} \
__btrfs_abort_transaction((trans), __func__, \
- __LINE__, (errno)); \
+ __LINE__, (errno), first); \
} while (0)
#ifdef CONFIG_PRINTK_INDEX
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 1f524b62313c..b0a9ba34eca2 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -492,6 +492,15 @@ static const char *space_info_flag_to_str(struct btrfs_space_info *space_info)
}
}
+static void dump_global_block_rsv(struct btrfs_fs_info *fs_info)
+{
+ DUMP_BLOCK_RSV(fs_info, global_block_rsv);
+ DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
+ DUMP_BLOCK_RSV(fs_info, chunk_block_rsv);
+ DUMP_BLOCK_RSV(fs_info, delayed_block_rsv);
+ DUMP_BLOCK_RSV(fs_info, delayed_refs_rsv);
+}
+
static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *info)
{
@@ -508,13 +517,6 @@ static void __btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
info->total_bytes, info->bytes_used, info->bytes_pinned,
info->bytes_reserved, info->bytes_may_use,
info->bytes_readonly, info->bytes_zone_unusable);
-
- DUMP_BLOCK_RSV(fs_info, global_block_rsv);
- DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
- DUMP_BLOCK_RSV(fs_info, chunk_block_rsv);
- DUMP_BLOCK_RSV(fs_info, delayed_block_rsv);
- DUMP_BLOCK_RSV(fs_info, delayed_refs_rsv);
-
}
void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
@@ -526,6 +528,7 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
spin_lock(&info->lock);
__btrfs_dump_space_info(fs_info, info);
+ dump_global_block_rsv(fs_info);
spin_unlock(&info->lock);
if (!dump_block_groups)
@@ -1770,3 +1773,17 @@ int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
}
return ret;
}
+
+/* Dump all the space infos when we abort a transaction due to ENOSPC. */
+__cold void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_space_info *space_info;
+
+ btrfs_info(fs_info, "dumping space info:");
+ list_for_each_entry(space_info, &fs_info->space_info, list) {
+ spin_lock(&space_info->lock);
+ __btrfs_dump_space_info(fs_info, space_info);
+ spin_unlock(&space_info->lock);
+ }
+ dump_global_block_rsv(fs_info);
+}
diff --git a/fs/btrfs/space-info.h b/fs/btrfs/space-info.h
index 2039096803ed..8f5948740941 100644
--- a/fs/btrfs/space-info.h
+++ b/fs/btrfs/space-info.h
@@ -157,4 +157,6 @@ static inline void btrfs_space_info_free_bytes_may_use(
}
int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
enum btrfs_reserve_flush_enum flush);
+void btrfs_dump_space_info_for_trans_abort(struct btrfs_fs_info *fs_info);
+
#endif /* BTRFS_SPACE_INFO_H */
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f1c6ca59299e..019141371c14 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -346,12 +346,14 @@ void __cold btrfs_err_32bit_limit(struct btrfs_fs_info *fs_info)
__cold
void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
const char *function,
- unsigned int line, int errno)
+ unsigned int line, int errno, bool first_hit)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
WRITE_ONCE(trans->aborted, errno);
WRITE_ONCE(trans->transaction->aborted, errno);
+ if (first_hit && errno == -ENOSPC)
+ btrfs_dump_space_info_for_trans_abort(fs_info);
/* Wake up anybody who may be waiting on this transaction */
wake_up(&fs_info->transaction_wait);
wake_up(&fs_info->transaction_blocked_wait);
--
2.37.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC
2022-08-25 7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
@ 2022-08-25 8:18 ` Johannes Thumshirn
0 siblings, 0 replies; 5+ messages in thread
From: Johannes Thumshirn @ 2022-08-25 8:18 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort
2022-08-25 7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
@ 2022-09-02 13:43 ` David Sterba
2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2022-09-02 13:43 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Thu, Aug 25, 2022 at 03:09:08PM +0800, Qu Wenruo wrote:
> [Changelog]
> v3:
> - Drop the multi-line output change
> The biggest problem is, for the space info dump, it can be ratelimited
> especially for enospc_debug mount option.
> Thus multi-line can be more confusing when it's related-limited or
> interrupted by other lines.
>
> So this version will reuse the old single line output.
Added to misc-next, thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-09-02 14:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 7:09 [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 1/2] btrfs: output human readable space info flag Qu Wenruo
2022-08-25 7:09 ` [PATCH v3 2/2] btrfs: dump all space infos if we abort transaction due to ENOSPC Qu Wenruo
2022-08-25 8:18 ` Johannes Thumshirn
2022-09-02 13:43 ` [PATCH v3 0/2] btrfs: output more info for -ENOSPC caused transaction abort David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).