From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Matias Bjorling <Matias.Bjorling@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-fsdevel@vger.kernel.org,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v4 12/27] btrfs: ensure metadata space available on/after degraded mount in HMZONED
Date: Fri, 23 Aug 2019 19:10:21 +0900
Message-ID: <20190823101036.796932-13-naohiro.aota@wdc.com>
In-Reply-To: <20190823101036.796932-1-naohiro.aota@wdc.com>

On or after a degraded mount, we might have no writable metadata block
group due to broken write pointers. If you, e.g., balance the FS before
writing any data, alloc_tree_block_no_bg_flush() (called from
insert_balance_item()) fails to allocate a tree block for it, due to
global reservation failure. This situation can be reproduced with xfstests
btrfs/124.

While we can work around the failure by writing some data and, as a result
of the write, getting a new metadata block group allocated, relying on
that is bad practice.

This commit avoids such failures by ensuring that a read-write mounted
volume has non-zero metadata space. If the metadata space is empty, it
forces allocation of a new metadata block group.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
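Note for reviewers: the decision logic described in the commit message can
be modelled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the kernel code: struct space_info_model,
model_alloc_chunk() and model_check_metadata_space() are hypothetical
stand-ins for struct btrfs_space_info, btrfs_alloc_chunk() and
btrfs_hmzoned_check_metadata_space(), and locking and transaction handling
are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters kept in struct btrfs_space_info. */
struct space_info_model {
	uint64_t total_bytes;	/* size of all metadata block groups */
	uint64_t bytes_used;	/* used bytes, reservations included */
};

/*
 * Hypothetical stand-in for btrfs_alloc_chunk(): grows the metadata space
 * by one chunk. Failing with -ENOSPC for a zero-sized chunk is an
 * assumption made only so the error path can be exercised.
 */
static int model_alloc_chunk(struct space_info_model *info, uint64_t chunk_size)
{
	if (chunk_size == 0)
		return -ENOSPC;
	info->total_bytes += chunk_size;
	return 0;
}

/*
 * Mirrors the shape of the check in the patch: do nothing unless the
 * filesystem is in HMZONED mode, do nothing if any metadata space is
 * left, otherwise force allocation of a new metadata chunk.
 */
static int model_check_metadata_space(struct space_info_model *info,
				      bool hmzoned, uint64_t chunk_size)
{
	uint64_t left;

	assert(info);

	if (!hmzoned)
		return 0;

	left = info->total_bytes - info->bytes_used;
	if (left)
		return 0;

	return model_alloc_chunk(info, chunk_size);
}
```

In this model a space info with total_bytes == bytes_used gains one chunk,
while one that still has free space, or a non-HMZONED filesystem, is left
untouched.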
 fs/btrfs/disk-io.c |  9 +++++++++
 fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h |  1 +
 3 files changed, 55 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f5ea92f546c..b25cff8af3b7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3285,6 +3285,15 @@ int open_ctree(struct super_block *sb,
 		}
 	}
 
+	ret = btrfs_hmzoned_check_metadata_space(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to allocate metadata space: %d",
+			   ret);
+		btrfs_warn(fs_info, "try remount with readonly");
+		close_ctree(fs_info);
+		return ret;
+	}
+
 	down_read(&fs_info->cleanup_work_sem);
 	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
 	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 55c00410e2f1..b5fd3e280b65 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -13,6 +13,8 @@
 #include "hmzoned.h"
 #include "rcu-string.h"
 #include "disk-io.h"
+#include "space-info.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -548,3 +550,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache)
 
 	return ret;
 }
+
+/*
+ * On/After degraded mount, we might have no writable metadata block
+ * group due to broken write pointers. If you e.g. balance the FS
+ * before writing any data, alloc_tree_block_no_bg_flush() (called
+ * from insert_balance_item()) fails to allocate a tree block for
+ * it. To avoid such situations, ensure we have some metadata BG here.
+ */
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_space_info *info;
+	u64 left;
+	int ret;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	spin_lock(&info->lock);
+	left = info->total_bytes - btrfs_space_info_used(info, true);
+	spin_unlock(&info->lock);
+
+	if (left)
+		return 0;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info));
+	if (ret) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return btrfs_commit_transaction(trans);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 399d9e9543aa..e95139d4c072 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -32,6 +32,7 @@ int btrfs_check_mountopts_hmzoned(struct btrfs_fs_info *info);
 bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 				   u64 num_bytes);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group_cache *cache);
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
-- 
2.23.0