linux-btrfs.vger.kernel.org archive mirror
From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-fsdevel@vger.kernel.org,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v5 12/28] btrfs: ensure metadata space available on/after degraded mount in HMZONED
Date: Wed,  4 Dec 2019 17:17:19 +0900
Message-ID: <20191204081735.852438-13-naohiro.aota@wdc.com>
In-Reply-To: <20191204081735.852438-1-naohiro.aota@wdc.com>

During or after a degraded mount, we might have no writable metadata block
group due to broken write pointers. If you then, e.g., balance the FS before
writing any data, alloc_tree_block_no_bg_flush() (called from
insert_balance_item()) fails to allocate a tree block for the balance item
because the global reservation fails. We can reproduce this situation with
xfstests btrfs/124.

While we can work around the failure by writing some data first, which
causes a new metadata block group to be allocated, requiring that of the
user is bad practice.

This commit avoids such failures by ensuring that a read-write mounted
volume has non-zero metadata space. If the metadata space is empty, it
forces the allocation of a new metadata block group.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/disk-io.c |  9 +++++++++
 fs/btrfs/hmzoned.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h |  6 ++++++
 3 files changed, 60 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deca9fd70771..7f4c6a92079a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3246,6 +3246,15 @@ int __cold open_ctree(struct super_block *sb,
 		}
 	}
 
+	ret = btrfs_hmzoned_check_metadata_space(fs_info);
+	if (ret) {
+		btrfs_warn(fs_info, "failed to allocate metadata space: %d",
+			   ret);
+		btrfs_warn(fs_info, "try remount with readonly");
+		close_ctree(fs_info);
+		return ret;
+	}
+
 	down_read(&fs_info->cleanup_work_sem);
 	if ((ret = btrfs_orphan_cleanup(fs_info->fs_root)) ||
 	    (ret = btrfs_orphan_cleanup(fs_info->tree_root))) {
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 83dc2dc22323..4fd96fd43897 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -16,6 +16,8 @@
 #include "disk-io.h"
 #include "block-group.h"
 #include "locking.h"
+#include "space-info.h"
+#include "transaction.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1075,3 +1077,46 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache)
 
 	return ret;
 }
+
+/*
+ * On/After degraded mount, we might have no writable metadata block
+ * group due to broken write pointers. If you e.g. balance the FS
+ * before writing any data, alloc_tree_block_no_bg_flush() (called
+ * from insert_balance_item()) fails to allocate a tree block for
+ * it. To avoid such situations, ensure we have some metadata BG here.
+ */
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_trans_handle *trans;
+	struct btrfs_space_info *info;
+	u64 left;
+	int ret;
+
+	if (!btrfs_fs_incompat(fs_info, HMZONED))
+		return 0;
+
+	info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
+	spin_lock(&info->lock);
+	left = info->total_bytes - btrfs_space_info_used(info, true);
+	spin_unlock(&info->lock);
+
+	if (left)
+		return 0;
+
+	trans = btrfs_start_transaction(root, 0);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	mutex_lock(&fs_info->chunk_mutex);
+	ret = btrfs_alloc_chunk(trans, btrfs_metadata_alloc_profile(fs_info));
+	if (ret) {
+		mutex_unlock(&fs_info->chunk_mutex);
+		btrfs_abort_transaction(trans, ret);
+		btrfs_end_transaction(trans);
+		return ret;
+	}
+	mutex_unlock(&fs_info->chunk_mutex);
+
+	return btrfs_commit_transaction(trans);
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index 4ed985d027cc..8ac758074afd 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -42,6 +42,7 @@ bool btrfs_check_allocatable_zones(struct btrfs_device *device, u64 pos,
 				   u64 num_bytes);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache);
+int btrfs_hmzoned_check_metadata_space(struct btrfs_fs_info *fs_info);
 #else /* CONFIG_BLK_DEV_ZONED */
 static inline int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 				     struct blk_zone *zone)
@@ -97,6 +98,11 @@ static inline int btrfs_load_block_group_zone_info(
 {
 	return 0;
 }
+static inline int btrfs_hmzoned_check_metadata_space(
+	struct btrfs_fs_info *fs_info)
+{
+	return 0;
+}
 #endif
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
-- 
2.24.0

