From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
Nikolay Borisov <nborisov@suse.com>,
Damien Le Moal <damien.lemoal@wdc.com>,
Matias Bjorling <Matias.Bjorling@wdc.com>,
Johannes Thumshirn <jthumshirn@suse.de>,
Hannes Reinecke <hare@suse.com>,
linux-fsdevel@vger.kernel.org,
Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v3 12/27] btrfs: ensure metadata space available on/after degraded mount in HMZONED
Date: Thu, 8 Aug 2019 18:30:23 +0900
Message-ID: <20190808093038.4163421-13-naohiro.aota@wdc.com>
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>
On or after a degraded mount, we might have no writable metadata block
group because of broken write pointers. If you, for example, balance the
FS before writing any data, alloc_tree_block_no_bg_flush() (called from
insert_balance_item()) fails to allocate a tree block for the balance
item due to a global reservation failure. This situation can be
reproduced with xfstests btrfs/124.

Writing some data first works around the failure, because the write
causes a new metadata block group to be allocated, but that is not a
workaround we should rely on.

This commit avoids such failures by ensuring that a read-write mounted
volume has non-zero metadata space. If the metadata space is empty, it
forces a new metadata block group allocation.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
fs/btrfs/disk-io.c | 9 +++++++++
fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/hmzoned.h | 1 +
3 files changed, 55 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8854ff2e5fa5..65b3198c6e83 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3287,6 +3287,15 @@ int open_ctree(struct super_block *sb,
 		}
 	}
+	ret = btrfs_hmzoned_check_metadata_space(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to allocate metadata space: %d",
+			   ret);
+		btrfs_warn(fs_info, "try remount with readonly");
+		close_ctree(fs_info);
+		return ret;
+	}
+
 	down_read(&fs_info->cleanup_work_sem);
 	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
 	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 89631f5f01f2..38cc1bbfe118 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -13,6 +13,8 @@
 #include "hmzoned.h"
 #include "rcu-string.h"
 #include "disk-io.h"
+#include "space-info.h"
+#include "transaction.h"
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES 4096
@@ -551,3 +553,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache)
 	return ret;
 }
+
+/*
+ * On/After degraded mount, we might have no writable metadata block
+ * group due to broken write pointers. If you e.g. balance the FS
+ * before writing any data, alloc_tree_block_no_bg_flush() (called
+ * from insert_balance_item()) fails to allocate a tree block for
+ * it. To avoid such situations, ensure we have some metadata BG here.
+ */
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_space_info *info;
+	u64 left;
+	int ret;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	spin_lock(&info->lock);
+	left = info->total_bytes - btrfs_space_info_used(info, true);
+	spin_unlock(&info->lock);
+
+	if (left)
+		return 0;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info));
+	if (ret) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return btrfs_commit_transaction(trans);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 399d9e9543aa..e95139d4c072 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -32,6 +32,7 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info);
 bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 				   u64 num_bytes);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache);
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
--
2.22.0
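
The decision made by btrfs_hmzoned_check_metadata_space() can be looked at
in isolation with a small userspace model. What follows is only a sketch
under assumptions, not kernel code: struct space_model,
model_alloc_chunk() and model_check_metadata_space() are hypothetical
names invented for illustration, the chunk size and the failure switch
are made up, and locking and transactions are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the metadata space_info counters. */
struct space_model {
	uint64_t total_bytes;	/* bytes in all metadata block groups */
	uint64_t used_bytes;	/* bytes counted as used or reserved */
};

/*
 * Hypothetical allocator: grows the space by one chunk, or reports
 * -ENOSPC when the caller asks the model to simulate a failure.
 */
static int model_alloc_chunk(struct space_model *info, uint64_t chunk_size,
			     bool alloc_fails)
{
	if (alloc_fails)
		return -ENOSPC;
	info->total_bytes += chunk_size;
	return 0;
}

/*
 * Mirrors the shape of the check in the patch: if any metadata space
 * is left, do nothing; otherwise force one chunk allocation and pass
 * its result back to the caller.
 */
static int model_check_metadata_space(struct space_model *info,
				      uint64_t chunk_size, bool alloc_fails)
{
	uint64_t left;

	assert(info != NULL);
	left = info->total_bytes - info->used_bytes;
	if (left)
		return 0;

	return model_alloc_chunk(info, chunk_size, alloc_fails);
}
```

In this model, a non-zero return corresponds to the case where
open_ctree() warns, calls close_ctree() and fails the mount.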
Thread overview: 39+ messages
2019-08-08 9:30 [PATCH v3 00/27] btrfs zoned block device support Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 01/27] btrfs: introduce HMZONED feature flag Naohiro Aota
2019-08-16 4:49 ` Anand Jain
2019-08-08 9:30 ` [PATCH v3 02/27] btrfs: Get zone information of zoned block devices Naohiro Aota
2019-08-16 4:44 ` Anand Jain
2019-08-16 14:19 ` Damien Le Moal
2019-08-16 23:47 ` Anand Jain
2019-08-16 23:55 ` Damien Le Moal
2019-08-08 9:30 ` [PATCH v3 03/27] btrfs: Check and enable HMZONED mode Naohiro Aota
2019-08-16 5:46 ` Anand Jain
2019-08-16 14:23 ` Damien Le Moal
2019-08-16 23:56 ` Anand Jain
2019-08-17 0:05 ` Damien Le Moal
2019-08-20 5:07 ` Naohiro Aota
2019-08-20 13:05 ` David Sterba
2019-08-08 9:30 ` [PATCH v3 04/27] btrfs: disallow RAID5/6 in " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 05/27] btrfs: disallow space_cache " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 06/27] btrfs: disallow NODATACOW " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 07/27] btrfs: disable tree-log " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 08/27] btrfs: disable fallocate " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 09/27] btrfs: align device extent allocation to zone boundary Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 10/27] btrfs: do sequential extent allocation in HMZONED mode Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 11/27] btrfs: make unmirroed BGs readonly only if we have at least one writable BG Naohiro Aota
2019-08-08 9:30 ` Naohiro Aota [this message]
2019-08-08 9:30 ` [PATCH v3 13/27] btrfs: reset zones of unused block groups Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 14/27] btrfs: limit super block locations in HMZONED mode Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 15/27] btrfs: redirty released extent buffers in sequential BGs Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 16/27] btrfs: serialize data allocation and submit IOs Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 17/27] btrfs: implement atomic compressed IO submission Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 18/27] btrfs: support direct write IO in HMZONED Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 19/27] btrfs: serialize meta IOs on HMZONED mode Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 20/27] btrfs: wait existing extents before truncating Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 21/27] btrfs: avoid async checksum/submit on HMZONED mode Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 22/27] btrfs: disallow mixed-bg in " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 23/27] btrfs: disallow inode_cache " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 24/27] btrfs: support dev-replace " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 25/27] btrfs: enable relocation " Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 26/27] btrfs: relocate block group to repair IO failure in HMZONED Naohiro Aota
2019-08-08 9:30 ` [PATCH v3 27/27] btrfs: enable to mount HMZONED incompat flag Naohiro Aota