From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
Nikolay Borisov <nborisov@suse.com>,
Damien Le Moal <damien.lemoal@wdc.com>,
Johannes Thumshirn <jthumshirn@suse.de>,
Hannes Reinecke <hare@suse.com>,
Anand Jain <anand.jain@oracle.com>,
linux-fsdevel@vger.kernel.org,
Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v5 12/28] btrfs: ensure metadata space available on/after degraded mount in HMZONED
Date: Wed, 4 Dec 2019 17:17:19 +0900
Message-ID: <20191204081735.852438-13-naohiro.aota@wdc.com>
In-Reply-To: <20191204081735.852438-1-naohiro.aota@wdc.com>
On or after a degraded mount, there may be no writable metadata block group
because of broken write pointers. If you, for example, balance the FS before
writing any data, alloc_tree_block_no_bg_flush() (called from
insert_balance_item()) fails to allocate a tree block for the balance item
because the global reservation fails. This situation can be reproduced with
xfstests btrfs/124.

The failure can be worked around by writing some data first, which causes a
new metadata block group to be allocated as a side effect, but relying on
that is bad practice.

This commit avoids such failures by ensuring that a read-write mounted
volume has non-zero metadata space. If the metadata space is empty, it
forces allocation of a new metadata block group.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
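Note (not part of the commit): the check described above can be sketched as
a small user-space model. This is only an illustration under stated
assumptions, not the kernel code: struct space_info_model,
alloc_metadata_chunk_model() and check_metadata_space_model() are
hypothetical names standing in for struct btrfs_space_info,
btrfs_alloc_chunk() and btrfs_hmzoned_check_metadata_space(), and the
locking and transaction handling done by the real function are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_space_info (not the kernel type). */
struct space_info_model {
	uint64_t total_bytes;	/* bytes covered by metadata block groups */
	uint64_t used_bytes;	/* bytes already used, reserved or unusable */
};

/*
 * Hypothetical stand-in for btrfs_alloc_chunk(): grows the space info by
 * one chunk, or fails with -ENOSPC when no chunk can be allocated.
 */
static int alloc_metadata_chunk_model(struct space_info_model *info,
				      uint64_t chunk_size)
{
	if (chunk_size == 0)
		return -ENOSPC;
	info->total_bytes += chunk_size;
	return 0;
}

/*
 * Model of the check: if any metadata space is left, do nothing;
 * otherwise force allocation of a new metadata chunk.
 */
static int check_metadata_space_model(struct space_info_model *info,
				      uint64_t chunk_size)
{
	uint64_t left = info->total_bytes - info->used_bytes;

	if (left)
		return 0;

	return alloc_metadata_chunk_model(info, chunk_size);
}
```

In this model, calling check_metadata_space_model() on a space info whose
used_bytes equals total_bytes grows total_bytes by one chunk, while a space
info that still has free bytes is left untouched.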
fs/btrfs/disk-io.c | 9 +++++++++
fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/hmzoned.h | 6 ++++++
3 files changed, 60 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deca9fd70771..7f4c6a92079a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3246,6 +3246,15 @@ int __cold open_ctree(struct super_block *sb,
}
}
+	ret = btrfs_hmzoned_check_metadata_space(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to allocate metadata space: %d",
+			   ret);
+		btrfs_warn(fs_info, "try remount with readonly");
+		close_ctree(fs_info);
+		return ret;
+	}
+
down_read(&fs_info->cleanup_work_sem);
if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
(ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 83dc2dc22323..4fd96fd43897 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -16,6 +16,8 @@
#include "disk-io.h"
#include "block-group.h"
#include "locking.h"
+#include "space-info.h"
+#include "transaction.h"
/* Maximum number of zones to report per blkdev_report_zones() call */
#define BTRFS_REPORT_NR_ZONES 4096
@@ -1075,3 +1077,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
return ret;
}
+
+/*
+ * On/After degraded mount, we might have no writable metadata block
+ * group due to broken write pointers. If you e.g. balance the FS
+ * before writing any data, alloc_tree_block_no_bg_flush() (called
+ * from insert_balance_item()) fails to allocate a tree block for
+ * it. To avoid such situations, ensure we have some metadata BG here.
+ */
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_space_info *info;
+	u64 left;
+	int ret;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	spin_lock(&info->lock);
+	left = info->total_bytes - btrfs_space_info_used(info, true);
+	spin_unlock(&info->lock);
+
+	if (left)
+		return 0;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info));
+	if (ret) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return btrfs_commit_transaction(trans);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 4ed985d027cc..8ac758074afd 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -42,6 +42,7 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
u64 num_bytes);
void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache);
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
#else /* CONFIG_BLK_DEV_ZONED */
static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
struct blk_zone *zone)
@@ -97,6 +98,11 @@ static inline int btrfs_load_block_group_zone_info(
{
return 0;
}
+static inline int btrfs_hmzoned_check_metadata_space(
+				struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
#endif
static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
--
2.24.0