Linux-BTRFS Archive on lore.kernel.org
From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Matias Bjorling <Matias.Bjorling@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	linux-fsdevel@vger.kernel.org,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v3 12/27] btrfs: ensure metadata space available on/after degraded mount in HMZONED
Date: Thu,  8 Aug 2019 18:30:23 +0900
Message-ID: <20190808093038.4163421-13-naohiro.aota@wdc.com> (raw)
In-Reply-To: <20190808093038.4163421-1-naohiro.aota@wdc.com>

On or after a degraded mount, we might have no writable metadata block
group due to broken write pointers. If you then, for example, balance the
FS before writing any data, alloc_tree_block_no_bg_flush() (called from
insert_balance_item()) fails to allocate a tree block for the balance item
because the global reservation fails. This situation can be reproduced
with xfstests btrfs/124.

While we can work around the failure by writing some data first, which
causes a new metadata block group to be allocated as a side effect,
requiring users to do so is bad practice.

This commit avoids such failures by ensuring that a read-write mounted
volume has non-zero metadata space. If the metadata space is empty, it
forces allocation of a new metadata block group.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
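Note (not part of the commit): the decision made by
btrfs_hmzoned_check_metadata_space() in the diff below can be modelled in
user space. This is only a minimal sketch under stated assumptions:
struct space_info_model, enum md_action and md_space_action() are
hypothetical names that do not exist in the kernel, and the single
used_bytes field stands in for whatever btrfs_space_info_used(info, true)
returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two quantities the kernel check reads. */
struct space_info_model {
	uint64_t total_bytes;	/* models info->total_bytes */
	uint64_t used_bytes;	/* models btrfs_space_info_used(info, true) */
};

/* Hypothetical result type: what the mount path would do next. */
enum md_action {
	MD_SKIP,	/* not an HMZONED filesystem, nothing to check */
	MD_OK,		/* some metadata space is left */
	MD_ALLOC_CHUNK,	/* no space left: force a new metadata block group */
};

static enum md_action md_space_action(bool hmzoned,
				      const struct space_info_model *info)
{
	uint64_t left;

	if (!hmzoned)
		return MD_SKIP;

	/*
	 * Same unsigned subtraction as in the patch: only a result of
	 * exactly zero triggers the allocation.
	 */
	left = info->total_bytes - info->used_bytes;

	return left ? MD_OK : MD_ALLOC_CHUNK;
}
```

In this model, MD_ALLOC_CHUNK corresponds to the path in the patch that
starts a transaction, calls btrfs_alloc_chunk() under chunk_mutex and then
commits; the other two results correspond to the early "return 0" paths.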
 fs/btrfs/disk-io.c |  9 +++++++++
 fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h |  1 +
 3 files changed, 55 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8854ff2e5fa5..65b3198c6e83 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3287,6 +3287,15 @@ int open_ctree(struct super_block *sb,
 		}
 	}
 
+	ret = btrfs_hmzoned_check_metadata_space(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to allocate metadata space: %d",
+			   ret);
+		btrfs_warn(fs_info, "try remount with readonly");
+		close_ctree(fs_info);
+		return ret;
+	}
+
 	down_read(&fs_info->cleanup_work_sem);
 	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
 	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 89631f5f01f2..38cc1bbfe118 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -13,6 +13,8 @@
 #include "hmzoned.h"
 #include "rcu-string.h"
 #include "disk-io.h"
+#include "space-info.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -551,3 +553,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache)
 
 	return ret;
 }
+
+/*
+ * On or after a degraded mount, we might have no writable metadata block
+ * group due to broken write pointers. If you e.g. balance the FS
+ * before writing any data, alloc_tree_block_no_bg_flush() (called
+ * from insert_balance_item()) fails to allocate a tree block for
+ * it. To avoid such situations, ensure we have some metadata BG here.
+ */
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_space_info *info;
+	u64 left;
+	int ret;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	spin_lock(&info->lock);
+	left = info->total_bytes - btrfs_space_info_used(info, true);
+	spin_unlock(&info->lock);
+
+	if (left)
+		return 0;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info));
+	if (ret) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return btrfs_commit_transaction(trans);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 399d9e9543aa..e95139d4c072 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -32,6 +32,7 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info);
 bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 				   u64 num_bytes);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache);
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
-- 
2.22.0

