From: Naohiro Aota <naohiro.aota@wdc.com>
To: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Matias Bjorling <Matias.Bjorling@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-fsdevel@vger.kernel.org,
	Naohiro Aota <naohiro.aota@wdc.com>
Subject: [PATCH v4 03/27] btrfs: Check and enable HMZONED mode
Date: Fri, 23 Aug 2019 19:10:12 +0900
Message-ID: <20190823101036.796932-4-naohiro.aota@wdc.com>
In-Reply-To: <20190823101036.796932-1-naohiro.aota@wdc.com>

HMZONED mode cannot be used together with the RAID5/6 profiles for now.
Introduce the function btrfs_check_hmzoned_mode() to check this. This
function also checks whether the HMZONED flag is enabled on the file system
and whether the file system consists of zoned devices with equal zone sizes.
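
The device checks described above can be sketched in plain userspace C as
follows. This is only an illustration of the decision logic, not the kernel
code: struct sketch_dev, sketch_check_hmzoned() and SKETCH_STRIPE_LEN are
hypothetical names, the 64 KiB value is an assumed stand-in for
BTRFS_STRIPE_LEN, and the sketch treats every zoned device alike instead of
distinguishing host-managed from host-aware models.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device; not a kernel structure. */
struct sketch_dev {
	bool zoned;		/* device reports a zoned model */
	uint64_t zone_size;	/* zone size in bytes, used only if zoned */
};

/* Assumed stand-in for BTRFS_STRIPE_LEN. */
#define SKETCH_STRIPE_LEN (64 * 1024ULL)

/*
 * Returns 0 and stores the common zone size in *zone_size_out (0 when the
 * file system is not zoned), or -EINVAL when the configuration is rejected.
 */
static int sketch_check_hmzoned(const struct sketch_dev *devs, size_t n,
				bool incompat_hmzoned, uint64_t *zone_size_out)
{
	uint64_t zone_size = 0;
	size_t zoned = 0;

	assert(devs != NULL || n == 0);
	*zone_size_out = 0;

	/* Count zoned devices and require one common zone size. */
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].zoned)
			continue;
		zoned++;
		if (!zone_size)
			zone_size = devs[i].zone_size;
		else if (devs[i].zone_size != zone_size)
			return -EINVAL;	/* unequal zone sizes */
	}

	/* Flag set without any zoned device is an error; else nothing to do. */
	if (!zoned)
		return incompat_hmzoned ? -EINVAL : 0;

	if (zoned != n)
		return -EINVAL;	/* zoned and regular devices are mixed */

	if (zone_size % SKETCH_STRIPE_LEN)
		return -EINVAL;	/* zone size is not stripe aligned */

	*zone_size_out = zone_size;
	return 0;
}
```

The sketch returns the zone size through an out parameter, whereas the patch
stores it in fs_info->zone_size.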

Additionally, as updates to the space cache are done in place, the space
cache cannot be located on sequential zones, and there is no guarantee that
the device will have enough conventional zones to store this cache. Resolve
this problem by disabling the space cache completely. This does not
introduce any problems with sequential block groups: all the free space is
located after the allocation pointer and there is no free space before the
pointer, so there is no need for such a cache.
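
To illustrate why no cache is needed, here is a minimal sketch of a
sequential block group in plain C. The names struct sketch_seq_bg,
sketch_free_bytes() and sketch_alloc() are hypothetical and do not exist in
btrfs; the point is only that the free space is always one range that can be
derived from the allocation pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a sequential block group; not a btrfs type. */
struct sketch_seq_bg {
	uint64_t start;		/* first byte of the block group */
	uint64_t length;	/* size in bytes */
	uint64_t alloc_ptr;	/* next byte to allocate, relative to start */
};

/* All free space is the single range [start + alloc_ptr, start + length). */
static uint64_t sketch_free_bytes(const struct sketch_seq_bg *bg)
{
	assert(bg->alloc_ptr <= bg->length);
	return bg->length - bg->alloc_ptr;
}

/*
 * Sequential allocation: hand out the range at the allocation pointer and
 * advance it. Returns UINT64_MAX when the request does not fit.
 */
static uint64_t sketch_alloc(struct sketch_seq_bg *bg, uint64_t bytes)
{
	uint64_t addr;

	if (bytes > sketch_free_bytes(bg))
		return UINT64_MAX;

	addr = bg->start + bg->alloc_ptr;
	bg->alloc_ptr += bytes;
	return addr;
}
```

Nothing before alloc_ptr is ever handed out again in this model, so there is
no free-space map to persist.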

For the same reason, NODATACOW is also disabled.

INODE_MAP_CACHE is also disabled to avoid preallocation in the
INODE_MAP_CACHE inode.

In summary, HMZONED will disable:

| Disabled features | Reason                                              |
|-------------------+-----------------------------------------------------|
| RAID5/6           | 1) Non-full stripe writes cause overwriting of the  |
|                   | parity block                                        |
|                   | 2) Rebuilding on high capacity volume (usually with |
|                   | SMR) can lead to higher failure rate                |
|-------------------+-----------------------------------------------------|
| space_cache (v1)  | In-place updating                                   |
| NODATACOW         | In-place updating                                   |
|-------------------+-----------------------------------------------------|
| tree-log          | Partial write out of metadata creates write holes   |
|-------------------+-----------------------------------------------------|
| fallocate         | Reserved extent will be a write hole                |
| INODE_MAP_CACHE   | Need pre-allocation. (and will be deprecated?)      |
|-------------------+-----------------------------------------------------|
| MIXED_BG          | Allocated metadata region will be write holes for   |
|                   | data writes                                         |
| async checksum    | Avoid mixing up bios from multiple workers          |

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/ctree.h       |  3 ++
 fs/btrfs/dev-replace.c |  8 +++++
 fs/btrfs/disk-io.c     |  8 +++++
 fs/btrfs/hmzoned.c     | 67 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/hmzoned.h     | 18 ++++++++++++
 fs/btrfs/super.c       |  1 +
 fs/btrfs/volumes.c     |  5 ++++
 7 files changed, 110 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 94660063a162..221259737703 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -712,6 +712,9 @@ struct btrfs_fs_info {
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *free_space_root;
 
+	/* Zone size when in HMZONED mode */
+	u64 zone_size;
+
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 6b2e9aa83ffa..2cc3ac4d101d 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -20,6 +20,7 @@
 #include "rcu-string.h"
 #include "dev-replace.h"
 #include "sysfs.h"
+#include "hmzoned.h"
 
 static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 				       int scrub_ret);
@@ -201,6 +202,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 		return PTR_ERR(bdev);
 	}
 
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		btrfs_err(fs_info,
+			  "zone type of target device does not match the filesystem");
+		ret = -EINVAL;
+		goto error;
+	}
+
 	sync_blockdev(bdev);
 
 	devices = &fs_info->fs_devices->devices;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 97beb351a10c..3f5ea92f546c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -40,6 +40,7 @@
 #include "compression.h"
 #include "tree-checker.h"
 #include "ref-verify.h"
+#include "hmzoned.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -3121,6 +3122,13 @@ int open_ctree(struct super_block *sb,
 
 	btrfs_free_extra_devids(fs_devices, 1);
 
+	ret = btrfs_check_hmzoned_mode(fs_info);
+	if (ret) {
+		btrfs_err(fs_info, "failed to init hmzoned mode: %d",
+				ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_sysfs_add_fsid(fs_devices, NULL);
 	if (ret) {
 		btrfs_err(fs_info, "failed to init sysfs fsid interface: %d",
diff --git a/fs/btrfs/hmzoned.c b/fs/btrfs/hmzoned.c
index 23bf58d3d7bb..ca58eee08a70 100644
--- a/fs/btrfs/hmzoned.c
+++ b/fs/btrfs/hmzoned.c
@@ -157,3 +157,70 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 
 	return 0;
 }
+
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+	struct btrfs_device *device;
+	u64 hmzoned_devices = 0;
+	u64 nr_devices = 0;
+	u64 zone_size = 0;
+	int incompat_hmzoned = btrfs_fs_incompat(fs_info, HMZONED);
+	int ret = 0;
+
+	/* Count zoned devices */
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (!device->bdev)
+			continue;
+		if (bdev_zoned_model(device->bdev) == BLK_ZONED_HM ||
+		    (bdev_zoned_model(device->bdev) == BLK_ZONED_HA &&
+		     incompat_hmzoned)) {
+			hmzoned_devices++;
+			if (!zone_size) {
+				zone_size = device->zone_info->zone_size;
+			} else if (device->zone_info->zone_size != zone_size) {
+				btrfs_err(fs_info,
+					  "Zoned block devices must have equal zone sizes");
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+		nr_devices++;
+	}
+
+	if (!hmzoned_devices && incompat_hmzoned) {
+		/* No zoned block device found on HMZONED FS */
+		btrfs_err(fs_info, "HMZONED enabled file system should have zoned devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!hmzoned_devices && !incompat_hmzoned)
+		goto out;
+
+	fs_info->zone_size = zone_size;
+
+	if (hmzoned_devices != nr_devices) {
+		btrfs_err(fs_info,
+			  "zoned devices cannot be mixed with regular devices");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * stripe_size is always aligned to BTRFS_STRIPE_LEN in
+	 * __btrfs_alloc_chunk(). Since we want stripe_len == zone_size,
+	 * check the alignment here.
+	 */
+	if (!IS_ALIGNED(zone_size, BTRFS_STRIPE_LEN)) {
+		btrfs_err(fs_info,
+			  "zone size is not aligned to BTRFS_STRIPE_LEN");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	btrfs_info(fs_info, "HMZONED mode enabled, zone size %llu B",
+		   fs_info->zone_size);
+out:
+	return ret;
+}
diff --git a/fs/btrfs/hmzoned.h b/fs/btrfs/hmzoned.h
index ffc70842135e..29cfdcabff2f 100644
--- a/fs/btrfs/hmzoned.h
+++ b/fs/btrfs/hmzoned.h
@@ -9,6 +9,8 @@
 #ifndef BTRFS_HMZONED_H
 #define BTRFS_HMZONED_H
 
+#include <linux/blkdev.h>
+
 struct btrfs_zoned_device_info {
 	/*
 	 * Number of zones, zone size and types of zones if bdev is a
@@ -25,6 +27,7 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos,
 		       struct blk_zone *zone, gfp_t gfp_mask);
 int btrfs_get_dev_zone_info(struct btrfs_device *device);
 void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
+int btrfs_check_hmzoned_mode(struct btrfs_fs_info *fs_info);
 
 static inline bool btrfs_dev_is_sequential(struct btrfs_device *device, u64 pos)
 {
@@ -76,4 +79,19 @@ static inline void btrfs_dev_clear_zone_empty(struct btrfs_device *device,
 	btrfs_dev_set_empty_zone_bit(device, pos, false);
 }
 
+static inline bool btrfs_check_device_zone_type(struct btrfs_fs_info *fs_info,
+						struct block_device *bdev)
+{
+	u64 zone_size;
+
+	if (btrfs_fs_incompat(fs_info, HMZONED)) {
+		zone_size = (u64)bdev_zone_sectors(bdev) << SECTOR_SHIFT;
+		/* Do not allow non-zoned device */
+		return bdev_is_zoned(bdev) && fs_info->zone_size == zone_size;
+	}
+
+	/* Do not allow Host Managed zoned device */
+	return bdev_zoned_model(bdev) != BLK_ZONED_HM;
+}
+
 #endif
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 78de9d5d80c6..d7879a5a2536 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -43,6 +43,7 @@
 #include "free-space-cache.h"
 #include "backref.h"
 #include "space-info.h"
+#include "hmzoned.h"
 #include "tests/btrfs-tests.h"
 
 #include "qgroup.h"
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a8c550562057..ffa4de09666d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2572,6 +2572,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
+	if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+		ret = -EINVAL;
+		goto error;
+	}
+
 	if (fs_devices->seeding) {
 		seeding_dev = 1;
 		down_write(&sb->s_umount);
-- 
2.23.0

