* [PATCH 0/1] zoned: moving superblock logging zones
@ 2021-03-15  5:53 Naohiro Aota
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
  2021-03-29  3:33 ` [PATCH 0/1] zoned: moving superblock logging zones Anand Jain
  0 siblings, 2 replies; 10+ messages in thread
From: Naohiro Aota @ 2021-03-15  5:53 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: Naohiro Aota

The following patch changes the location of the superblock logging zones
from fixed zone numbers to fixed LBAs.

Here is some background on how the superblock works on zoned btrfs.

This document will be promoted to btrfs-dev-docs in the future.

# Superblock logging for zoned btrfs

The superblock and its copies are the only data structures in btrfs with a
fixed location on a device. Since we cannot overwrite these blocks if they
are placed in sequential write required zones, we cannot use the regular
method of updating superblocks with zoned btrfs. We also cannot limit the
position of superblocks to conventional zones as that would prevent using
zoned block devices that do not have this zone type (e.g. NVMe ZNS SSDs).

To solve this problem, we use superblock log writing. This method uses two
sequential write required zones as a circular buffer to write updated
superblocks. Once the first zone is filled up, we start writing into the
second zone. When both zones are full, the first zone is reset before
writing continues in the first zone. Once the first zone is full again, the
second zone is reset and the latest superblock is written into the second
zone. With this logging, we can always determine the position of the latest
superblock by inspecting the zones' write pointer information provided by
the device. One corner case is when both zones are full. In this situation,
we read the last superblock of each zone and compare them to determine
which copy is the latest one.
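
The selection rule above can be sketched in plain C. This is a minimal
userspace illustration under stated assumptions, not the kernel code:
`struct log_zone`, its `last_gen` field, and `sb_log_next_write_pos()` are
hypothetical names, and the write pointers are assumed to be byte offsets
already obtained from a zone report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one sequential write required zone. */
struct log_zone {
	uint64_t start;    /* byte offset of the zone start */
	uint64_t len;      /* zone size in bytes */
	uint64_t wp;       /* write pointer, start <= wp <= start + len */
	uint64_t last_gen; /* generation of the last superblock in the zone */
};

static bool zone_empty(const struct log_zone *z)
{
	return z->wp == z->start;
}

static bool zone_full(const struct log_zone *z)
{
	return z->wp == z->start + z->len;
}

/*
 * Store in *pos the byte offset where the next superblock would be written.
 * The latest superblock, if any, is the block just before that position in
 * log order. Returns 0 on success, -1 if the zone states are inconsistent.
 */
static int sb_log_next_write_pos(const struct log_zone z[2], uint64_t *pos)
{
	bool e0, e1, f0, f1;

	assert(z && pos);
	e0 = zone_empty(&z[0]);
	e1 = zone_empty(&z[1]);
	f0 = zone_full(&z[0]);
	f1 = zone_full(&z[1]);

	if (e0 && e1) {
		/* Fresh log: nothing written yet. */
		*pos = z[0].start;
	} else if (f0 && f1) {
		/* Corner case: the zone with the older superblock is reused. */
		*pos = (z[0].last_gen > z[1].last_gen) ? z[1].start : z[0].start;
	} else if (!f0 && (e1 || f1)) {
		/* Currently filling the first zone. */
		*pos = z[0].wp;
	} else if (f0) {
		/* First zone is full: currently filling the second zone. */
		*pos = z[1].wp;
	} else {
		return -1;
	}
	return 0;
}
```

In the both-full case the caller would still have to reset the returned zone
before writing; the sketch only reports where the log continues.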

## Placement of superblock logging zones

We use the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

  - Primary superblock: zone starting at offset 0 and the following zone
  - First copy: zone containing offset 64GB and the following zone
  - Second copy: zone containing offset 256GB and the following zone

These zones are reserved for superblock logging and never used for data or
metadata blocks. Zones containing the offsets used to store superblocks in
a regular btrfs volume (no zoned case) are also reserved to avoid
confusion.

The first copy position is much larger than for a regular btrfs volume
(64M).  This increase is to avoid overlapping with the log zones for the
primary superblock. This higher location is arbitrary but allows supporting
devices with very large zone size, up to 32GB. But we only allow zone sizes
up to 8GB for now.
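
As a worked example of the placement arithmetic, the sketch below maps the
fixed offsets to zone numbers for a given zone size and checks that the first
copy's pair stays clear of the primary pair (zones 0 and 1). The macro and
function names are hypothetical and chosen for illustration only; they are
not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed byte offsets of the three log pairs, as listed above. */
#define SB_LOG_PRIMARY_OFFSET 0ULL
#define SB_LOG_FIRST_OFFSET   (64ULL << 30)  /* 64GB */
#define SB_LOG_SECOND_OFFSET  (256ULL << 30) /* 256GB */

/*
 * Zone number of the first zone of the log pair for the given mirror
 * (0 = primary, 1 = first copy, 2 = second copy).
 * zone_shift is log2 of the zone size in bytes.
 */
static uint64_t sb_log_zone_number(unsigned int zone_shift, int mirror)
{
	static const uint64_t offsets[] = {
		SB_LOG_PRIMARY_OFFSET,
		SB_LOG_FIRST_OFFSET,
		SB_LOG_SECOND_OFFSET,
	};

	assert(mirror >= 0 && mirror < 3);
	assert(zone_shift < 64);
	return offsets[mirror] >> zone_shift;
}

/*
 * Nonzero if the first copy's pair does not overlap the primary pair,
 * which occupies zones 0 and 1.
 */
static int sb_log_pairs_disjoint(unsigned int zone_shift)
{
	return sb_log_zone_number(zone_shift, 1) >= 2;
}
```

With 256M zones (shift 28) the first and second copies start at zones 256 and
1024; with 8G zones (shift 33) they start at zones 8 and 32. The pairs stay
disjoint up to a shift of 35, that is a 32GB zone size, which matches the
limit stated above.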

## Writing superblock in conventional zones

Conventional zones do not have a write pointer. This zone type thus cannot
be used with superblock logging since determining the position of the
latest copy of the superblock in a zone pair would be impossible.

To address this problem, if either of the zones containing the fixed offset
locations for zone logging is a conventional zone, superblock updates are
done in-place using the first block of the conventional zone.

## Reading zoned btrfs dump image without zone information

Reading a zoned btrfs image without zone information is challenging but
possible.

We can always find a superblock copy at or after the fixed offset locations
that determine the position of the logging zones. With such a copy, the
superblock incompatible flags indicate whether the volume is zoned or not.
With a chunk item in the sys_chunk_array, we can determine the zone size
from the size of a device extent, itself determined from the chunk length,
num_stripes, and sub_stripes.  With this information, all blocks within the
2 logging zones containing the fixed locations can be inspected to find the
newest superblock copy.

The first zone of a log pair may be empty and have no superblock copy. This
can happen if a system crashes after resetting the first zone of a pair and
before writing out a new superblock. In this case, a superblock copy can be
found in the second zone of the log pair. The start of this second zone can
be found by inspecting the blocks located at the fixed offset of the log
pair plus each possible zone size (4M [1], 8M, 16M, 32M, 64M, 128M, 256M,
512M, 1G, 2G, 4G, 8G [2])[3]. Once we find a superblock, we can follow the
same procedure as above to find the latest superblock copy within the zone
log pair.

[1] 4M = BTRFS_MKFS_SYSTEM_GROUP_SIZE. We cannot mkfs on a device with a
zone size less than 4MB because we cannot create the initial temporary
system chunk, which has this size.
[2] The maximum size we support for now.
[3] The zone size is limited to these 12 cases, as it must be a power of 2.
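
The probing step for an empty first zone could look like the sketch below.
It is an illustration under assumptions, not code from btrfs-progs: the
`sb_probe_fn` callback, `probe_second_log_zone()`, and the stand-in
`probe_equals()` are hypothetical. A real probe would read the block at the
given offset and validate it as a superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_ZONE_SHIFT 22 /* 4M, smallest zone size mkfs accepts */
#define MAX_ZONE_SHIFT 33 /* 8G, largest zone size supported for now */

/* Hypothetical callback: true if a valid superblock starts at `offset`. */
typedef bool (*sb_probe_fn)(uint64_t offset, void *ctx);

/*
 * Try every power-of-2 zone size from 4M to 8G and look for a superblock at
 * base + zone_size, the would-be start of the second zone of the log pair.
 * Returns the first zone size that hits, or 0 if none does.
 */
static uint64_t probe_second_log_zone(uint64_t base, sb_probe_fn probe,
				      void *ctx)
{
	unsigned int shift;

	assert(probe);
	for (shift = MIN_ZONE_SHIFT; shift <= MAX_ZONE_SHIFT; shift++) {
		uint64_t zone_size = 1ULL << shift;

		if (probe(base + zone_size, ctx))
			return zone_size;
	}
	return 0;
}

/* Stand-in probe for demonstration: "finds" a superblock at *ctx only. */
static bool probe_equals(uint64_t offset, void *ctx)
{
	return offset == *(const uint64_t *)ctx;
}
```

Scanning in ascending order relies on the reset first zone holding no valid
superblock, so smaller candidates that fall inside it do not hit. The result
is only a candidate and should still be confirmed against the device extent
size as described in this section.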

Once we find the latest superblock, it is no different than reading a
regular btrfs image. You can further confirm the determined zone size by
comparing it with the size of a device extent because it is the same as the
zone size.

Also, since the write offset within the logging buffer differs between the
primary and the copies [4], their zones are reset at different times. So,
if the superblock at offset 0 is missing, we can also try reading the head
of the log buffer of a copy.

[4] Because mkfs updates the primary in its initial process, advancing only
the write pointer of the primary log buffer.

## Superblock writing on an emulated zoned device

When a regular device is mounted in zoned mode, btrfs emulates conventional
zones by slicing the device into fixed-size zones. In this case, however,
we do not follow the above rule of writing superblocks at the head of the
logging zones if they are conventional. Doing so would introduce a
chicken-and-egg problem. To know that a given btrfs is zoned, we need to
read a superblock to see the incompatible flags. But to read a superblock
properly from a zoned position, we need to know a priori that the file
system is zoned (e.g. because it resides on a zoned device), leading to a
recursive dependency.

We can use the regular superblock update method on an emulated zoned
device to break the recursion. Since the zones containing the regular
locations are always reserved, it is safe to do so. Then, we can naturally
read a regular superblock on a regular device and determine whether the
file system is zoned or not.

Naohiro Aota (1):
  btrfs: zoned: move superblock logging zone location

 fs/btrfs/zoned.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

-- 
2.30.2



* [PATCH] btrfs: zoned: move superblock logging zone location
  2021-03-15  5:53 [PATCH 0/1] zoned: moving superblock logging zones Naohiro Aota
@ 2021-03-15  5:53 ` Naohiro Aota
  2021-03-19  8:19   ` Johannes Thumshirn
                     ` (3 more replies)
  2021-03-29  3:33 ` [PATCH 0/1] zoned: moving superblock logging zones Anand Jain
  1 sibling, 4 replies; 10+ messages in thread
From: Naohiro Aota @ 2021-03-15  5:53 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: Naohiro Aota

This commit moves the location of superblock logging zones. The location of
the logging zones is determined based on fixed block addresses instead of
on fixed zone numbers.

By locating the superblock zones using fixed addresses, we can scan a
dumped file system image without the zone information. This change has no
drawbacks.

We use the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

  - Primary superblock: zone starting at offset 0 and the following zone
  - First copy: zone containing offset 64GB and the following zone
  - Second copy: zone containing offset 256GB and the following zone

If the location of a zone pair is beyond the end of the disk, we don't
record that superblock copy.

These addresses are arbitrary, but using addresses that are too large
reduces superblock reliability for smaller devices, so we do not want to
exceed 1T to cover all cases nicely.

Also, LBAs are generally distributed initially across one head (platter
side) up to one or more zones, then go on the next head backward (the other
side of the same platter), and on to the following head/platter. Thus using
non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
likely results in each superblock copy being on a different head/platter,
which improves the chances of recovery in case of a superblock read error.

These zones are reserved for superblock logging and never used for data or
metadata blocks. Zones containing the offsets used to store superblocks in
a regular btrfs volume (no zoned case) are also reserved to avoid
confusion.

Note that we only reserve the 2 zones per primary/copy actually used for
superblock logging. We don't reserve the ranges that could contain a
superblock with the largest supported zone size (0-16GB, 64G-80GB,
256G-272GB).

The first copy position is much larger than for a regular btrfs volume
(64M).  This increase is to avoid overlapping with the log zones for the
primary superblock. This higher location is arbitrary but allows supporting
devices with very large zone size, up to 32GB. But we only allow zone sizes
up to 8GB for now.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 43948bd40e02..6a72ca1f7988 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -21,9 +21,24 @@
 /* Pseudo write pointer value for conventional zone */
 #define WP_CONVENTIONAL ((u64)-2)
 
+/*
+ * Location of the first zone of superblock logging zone pairs.
+ * - Primary superblock: the zone containing offset 0 (zone 0)
+ * - First superblock copy: the zone containing offset 64G
+ * - Second superblock copy: the zone containing offset 256G
+ */
+#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
+#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
+#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
+#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
+#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
+
 /* Number of superblock log zones */
 #define BTRFS_NR_SB_LOG_ZONES 2
 
+/* Max size of supported zone size */
+#define BTRFS_MAX_ZONE_SIZE SZ_8G
+
 static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
 {
 	struct blk_zone *zones = data;
@@ -111,11 +126,8 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 }
 
 /*
- * The following zones are reserved as the circular buffer on ZONED btrfs.
- *  - The primary superblock: zones 0 and 1
- *  - The first copy: zones 16 and 17
- *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
- *                     the following one
+ * Get the zone number of the first zone of a pair of contiguous zones used
+ * for superblock logging.
  */
 static inline u32 sb_zone_number(int shift, int mirror)
 {
@@ -123,8 +135,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
 
 	switch (mirror) {
 	case 0: return 0;
-	case 1: return 16;
-	case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
+	case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
+	case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
 	}
 
 	return 0;
@@ -300,10 +312,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 		zone_sectors = bdev_zone_sectors(bdev);
 	}
 
-	nr_sectors = bdev_nr_sectors(bdev);
 	/* Check if it's power of 2 (see is_power_of_2) */
 	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
 	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
+
+	/* We reject devices with a zone size larger than 8GB. */
+	if (zone_info->zone_size > BTRFS_MAX_ZONE_SIZE) {
+		btrfs_err_in_rcu(fs_info,
+				 "zoned: %s: zone size %llu is too large",
+				 rcu_str_deref(device->name),
+				 zone_info->zone_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	nr_sectors = bdev_nr_sectors(bdev);
 	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
 	zone_info->max_zone_append_size =
 		(u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT;
-- 
2.30.2



* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
@ 2021-03-19  8:19   ` Johannes Thumshirn
  2021-03-24  8:42   ` Damien Le Moal
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2021-03-19  8:19 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
  2021-03-19  8:19   ` Johannes Thumshirn
@ 2021-03-24  8:42   ` Damien Le Moal
  2021-03-26 15:56   ` Johannes Thumshirn
  2021-04-07 17:52   ` Josef Bacik
  3 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-03-24  8:42 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba; +Cc: Johannes Thumshirn

On 2021/03/15 14:55, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones. The location of
> the logging zones are determined based on fixed block addresses instead of
> on fixed zone numbers.

David,

Any comment on this? It would be nice to get this settled in this cycle so that
we have a stable on-disk format going forward. btrfs-tools and libblkid zoned
support patches also depend on this.

> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>   - Primary superblock: zone starting at offset 0 and the following zone
>   - First copy: zone containing offset 64GB and the following zone
>   - Second copy: zone containing offset 256GB and the following zone
> 
> If the location of the zones are outside of disk, we don't record the
> superblock copy.
> 
> These addresses are arbitrary, but using addresses that are too large
> reduces superblock reliability for smaller devices, so we do not want to
> exceed 1T to cover all case nicely.
> 
> Also, LBAs are generally distributed initially across one head (platter
> side) up to one or more zones, then go on the next head backward (the other
> side of the same platter), and on to the following head/platter. Thus using
> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
> likely result in each superblock copy being on a different head/platter
> which improves chances of recovery in case of superblock read error.
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (no zoned case) are also reserved to avoid
> confusion.
> 
> Note that we only reserve the 2 zones per primary/copy actually used for
> superblock logging. We don't reserve the ranges possibly containing
> superblock with the largest supported zone size (0-16GB, 64G-80GB,
> 256G-272GB).
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/zoned.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 43948bd40e02..6a72ca1f7988 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -21,9 +21,24 @@
>  /* Pseudo write pointer value for conventional zone */
>  #define WP_CONVENTIONAL ((u64)-2)
>  
> +/*
> + * Location of the first zone of superblock logging zone pairs.
> + * - Primary superblock: the zone containing offset 0 (zone 0)
> + * - First superblock copy: the zone containing offset 64G
> + * - Second superblock copy: the zone containing offset 256G
> + */
> +#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
> +#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
> +#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
> +#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
> +#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
> +
>  /* Number of superblock log zones */
>  #define BTRFS_NR_SB_LOG_ZONES 2
>  
> +/* Max size of supported zone size */
> +#define BTRFS_MAX_ZONE_SIZE SZ_8G
> +
>  static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
>  {
>  	struct blk_zone *zones = data;
> @@ -111,11 +126,8 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>  }
>  
>  /*
> - * The following zones are reserved as the circular buffer on ZONED btrfs.
> - *  - The primary superblock: zones 0 and 1
> - *  - The first copy: zones 16 and 17
> - *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
> - *                     the following one
> + * Get the zone number of the first zone of a pair of contiguous zones used
> + * for superblock logging.
>   */
>  static inline u32 sb_zone_number(int shift, int mirror)
>  {
> @@ -123,8 +135,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
>  
>  	switch (mirror) {
>  	case 0: return 0;
> -	case 1: return 16;
> -	case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
> +	case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
> +	case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
>  	}
>  
>  	return 0;
> @@ -300,10 +312,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
>  		zone_sectors = bdev_zone_sectors(bdev);
>  	}
>  
> -	nr_sectors = bdev_nr_sectors(bdev);
>  	/* Check if it's power of 2 (see is_power_of_2) */
>  	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
>  	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
> +
> +	/* We reject devices with a zone size larger than 8GB. */
> +	if (zone_info->zone_size > BTRFS_MAX_ZONE_SIZE) {
> +		btrfs_err_in_rcu(fs_info,
> +				 "zoned: %s: zone size %llu is too large",
> +				 rcu_str_deref(device->name),
> +				 zone_info->zone_size);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	nr_sectors = bdev_nr_sectors(bdev);
>  	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
>  	zone_info->max_zone_append_size =
>  		(u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT;
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
  2021-03-19  8:19   ` Johannes Thumshirn
  2021-03-24  8:42   ` Damien Le Moal
@ 2021-03-26 15:56   ` Johannes Thumshirn
  2021-04-07 17:52   ` Josef Bacik
  3 siblings, 0 replies; 10+ messages in thread
From: Johannes Thumshirn @ 2021-03-26 15:56 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba

On 15/03/2021 06:55, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones. The location of
> the logging zones are determined based on fixed block addresses instead of
> on fixed zone numbers.
> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>   - Primary superblock: zone starting at offset 0 and the following zone
>   - First copy: zone containing offset 64GB and the following zone
>   - Second copy: zone containing offset 256GB and the following zone
> 
> If the location of the zones are outside of disk, we don't record the
> superblock copy.
> 
> These addresses are arbitrary, but using addresses that are too large
> reduces superblock reliability for smaller devices, so we do not want to
> exceed 1T to cover all case nicely.
> 
> Also, LBAs are generally distributed initially across one head (platter
> side) up to one or more zones, then go on the next head backward (the other
> side of the same platter), and on to the following head/platter. Thus using
> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
> likely result in each superblock copy being on a different head/platter
> which improves chances of recovery in case of superblock read error.
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (no zoned case) are also reserved to avoid
> confusion.
> 
> Note that we only reserve the 2 zones per primary/copy actually used for
> superblock logging. We don't reserve the ranges possibly containing
> superblock with the largest supported zone size (0-16GB, 64G-80GB,
> 256G-272GB).
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

Ping?


* Re: [PATCH 0/1] zoned: moving superblock logging zones
  2021-03-15  5:53 [PATCH 0/1] zoned: moving superblock logging zones Naohiro Aota
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
@ 2021-03-29  3:33 ` Anand Jain
  2021-03-29  7:36   ` Damien Le Moal
  1 sibling, 1 reply; 10+ messages in thread
From: Anand Jain @ 2021-03-29  3:33 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba

On 15/03/2021 13:53, Naohiro Aota wrote:
> The following patch will change the superblock logging zones' location from
> fixed zone number to fixed LBAs.
> 
> Here is a background of how the superblock is working on zoned btrfs.
> 
> This document will be promoted to btrfs-dev-docs in the future.
> 
> # Superblock logging for zoned btrfs
> 
> The superblock and its copies are the only data structures in btrfs with a
> fixed location on a device. Since we cannot overwrite these blocks if they
> are placed in sequential write required zones, we cannot use the regular
> method of updating superblocks with zoned btrfs.

  It looks like a ZBC command that does the write pointer reset and the
write together could have helped here.

> We also cannot limit the
> position of superblocks to conventional zones as that would prevent using
> zoned block devices that do not have this zone type (e.g. NVMe ZNS SSDs).
> 
> To solve this problem, we use superblock log writing. This method uses two
> sequential write required zones as a circular buffer to write updated
> superblocks. Once the first zone is filled up, start writing into the
> second zone. When both zones are filled up and before start writing to the
> first zone again, the first zone is reset and writing continues in the
> first zone. Once the first zone is full, reset the second zone, and write
> the latest superblock in the second zone. With this logging, we can always
> determine the position of the latest superblock by inspecting the zones'
> write pointer information provided by the device. One corner case is when
> both zones are full. For this situation, we read out the last superblock of
> each zone and compare them to determine which copy is the latest one.
> 
> ## Placement of superblock logging zones
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>    - Primary superblock: zone starting at offset 0 and the following zone
>    - First copy: zone containing offset 64GB and the following zone
>    - Second copy: zone containing offset 256GB and the following zone
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (no zoned case) are also reserved to avoid
> confusion.
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 
> ## Writing superblock in conventional zones
> 
> Conventional zones do not have a write pointer. This zone type thus cannot
> be used with superblock logging since determining the position of the
> latest copy of the superblock in a zone pair would be impossible.
> 
> To address this problem, if either of the zones containing the fixed offset
> locations for zone logging is a conventional zone, superblock updates are
> done in-place using the first block of the conventional zone.
> 
> ## Reading zoned btrfs dump image without zone information
> 
> Reading a zoned btrfs image without zone information is challenging but
> possible.
> 
> We can always find a superblock copy at or after the fixed offset locations
> determining the logging zones position. With such copy, the superblock
> incompatible flags indicates if the volume is zoned or not. With a chunk
> item in the sys_chunk_array, we can determine the zone size from the size
> of a device extent, itself determined from the chunk length, num_stripes,
> and sub_stripes.  With this information, all blocks within the 2 logging
> zones containing the fixed locations can be inspected to find the newest
> superblock copy.
> 
> The first zone of a log pair may be empty and have no superblock copy. This
> can happen if a system crashes after resetting the first zone of a pair and
> before writing out a new superblock. In this case, a superblock copy can be
> found in the second zone of a log pair. The start of this second zone can
> be found by inspecting the blocks located at the fixed offset of the log
> pair plus the possible zone size (4M [1], 8M, 16M, 32M, 64M, 128M, 256M,
> 512M, 1G, 2G, 4G, 8G [2])[3]. Once we find a superblock, we can follow the
> same instruction above to find the latest superblock copy within the zone
> log pair.
> 
> [1] 4M = BTRFS_MKFS_SYSTEM_GROUP_SIZE. We cannot mkfs on a device with a
> zone size less than 4MB because we cannot create the initial temporary
> system chunk with the size.
> [2] The maximum size we support for now.
> [3] The zone size is limited to these 11 cases, as it must be a power of 2.
> 
> Once we find the latest superblock, it is no different than reading a
> regular btrfs image. You can further confirm the determined zone size by
> comparing it with the size of a device extent because it is the same as the
> zone size.
> 
> Actually, since the writing offset within the logging buffer is different
> from the primary to copies [4], the timing when resetting the former zone
> will become different. So, we can also try reading the head of the buffer
> of a copy in case of missing superblock at offset 0.
> 
> [4] Because mkfs update the primary in the initial process, advancing only
> the write pointer of the primary log buffer
> 
> ## Superblock writing on an emulated zoned device
> 
> By mounting a regular device in zoned mode, btrfs emulates conventional
> zones by slicing the device with a fixed size. In this case, however, we do
> not follow the above rule of writing superblocks at the head of the logging
> zones if they are conventional. Doing so would introduce a chicken-and-egg
> problem. To know the given btrfs is zoned btrfs, we need to read a
> superblock to see the incompatible flags. But, to read a superblock
> properly from a zoned position, we need to know the file-system is zoned a
> priori (e.g. resided in a zoned device), leading to a recursive dependency.
> 
> We can use the regular super block update method on an emulated zoned
> device to break the recursion. Since the zones containing the regular
> locations are always reserved, it is safe to do so. Then, we can naturally
> read a regular superblock on a regular device and determine the file-system
> is zoned or not.
> 
> Naohiro Aota (1):
>    btrfs: zoned: move superblock logging zone location
> 
>   fs/btrfs/zoned.c | 40 ++++++++++++++++++++++++++++++----------
>   1 file changed, 30 insertions(+), 10 deletions(-)
> 



* Re: [PATCH 0/1] zoned: moving superblock logging zones
  2021-03-29  3:33 ` [PATCH 0/1] zoned: moving superblock logging zones Anand Jain
@ 2021-03-29  7:36   ` Damien Le Moal
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2021-03-29  7:36 UTC (permalink / raw)
  To: Anand Jain, Naohiro Aota, linux-btrfs, dsterba

On 2021/03/29 12:34, Anand Jain wrote:
> On 15/03/2021 13:53, Naohiro Aota wrote:
>> The following patch will change the superblock logging zones' location from
>> fixed zone number to fixed LBAs.
>>
>> Here is a background of how the superblock is working on zoned btrfs.
>>
>> This document will be promoted to btrfs-dev-docs in the future.
>>
>> # Superblock logging for zoned btrfs
>>
>> The superblock and its copies are the only data structures in btrfs with a
>> fixed location on a device. Since we cannot overwrite these blocks if they
>> are placed in sequential write required zones, we cannot use the regular
>> method of updating superblocks with zoned btrfs.
> 
>   Looks like a ZBC which does the write pointer reset and write could 
> have helped here.

Yes and no. A two-part command like this could fail either on the reset part or
the write part (which would leave the zone empty). So in the end, the possible
error patterns are very similar to using 2 commands, and for the SB logging, we
would still need 2 zones to make sure we do not end up with an SB log that is empty.

> 
>> We also cannot limit the
>> position of superblocks to conventional zones as that would prevent using
>> zoned block devices that do not have this zone type (e.g. NVMe ZNS SSDs).
>>
>> To solve this problem, we use superblock log writing. This method uses two
>> sequential write required zones as a circular buffer to write updated
>> superblocks. Once the first zone is filled up, start writing into the
>> second zone. When both zones are filled up and before start writing to the
>> first zone again, the first zone is reset and writing continues in the
>> first zone. Once the first zone is full, reset the second zone, and write
>> the latest superblock in the second zone. With this logging, we can always
>> determine the position of the latest superblock by inspecting the zones'
>> write pointer information provided by the device. One corner case is when
>> both zones are full. For this situation, we read out the last superblock of
>> each zone and compare them to determine which copy is the latest one.
>>
>> ## Placement of superblock logging zones
>>
>> We use the following three pairs of zones containing fixed offset
>> locations, regardless of the device zone size.
>>
>>    - Primary superblock: zone starting at offset 0 and the following zone
>>    - First copy: zone containing offset 64GB and the following zone
>>    - Second copy: zone containing offset 256GB and the following zone
>>
>> These zones are reserved for superblock logging and never used for data or
>> metadata blocks. Zones containing the offsets used to store superblocks in
>> a regular btrfs volume (no zoned case) are also reserved to avoid
>> confusion.
>>
>> The first copy position is much larger than for a regular btrfs volume
>> (64M).  This increase is to avoid overlapping with the log zones for the
>> primary superblock. This higher location is arbitrary but allows supporting
>> devices with very large zone size, up to 32GB. But we only allow zone sizes
>> up to 8GB for now.
>>
>> ## Writing superblock in conventional zones
>>
>> Conventional zones do not have a write pointer. This zone type thus cannot
>> be used with superblock logging since determining the position of the
>> latest copy of the superblock in a zone pair would be impossible.
>>
>> To address this problem, if either of the zones containing the fixed offset
>> locations for zone logging is a conventional zone, superblock updates are
>> done in-place using the first block of the conventional zone.
>>
>> ## Reading zoned btrfs dump image without zone information
>>
>> Reading a zoned btrfs image without zone information is challenging but
>> possible.
>>
>> We can always find a superblock copy at or after the fixed offset locations
>> determining the logging zones position. With such copy, the superblock
>> incompatible flags indicates if the volume is zoned or not. With a chunk
>> item in the sys_chunk_array, we can determine the zone size from the size
>> of a device extent, itself determined from the chunk length, num_stripes,
>> and sub_stripes.  With this information, all blocks within the 2 logging
>> zones containing the fixed locations can be inspected to find the newest
>> superblock copy.
>>
>> The first zone of a log pair may be empty and have no superblock copy. This
>> can happen if a system crashes after resetting the first zone of a pair and
>> before writing out a new superblock. In this case, a superblock copy can be
>> found in the second zone of a log pair. The start of this second zone can
>> be found by inspecting the blocks located at the fixed offset of the log
>> pair plus the possible zone size (4M [1], 8M, 16M, 32M, 64M, 128M, 256M,
>> 512M, 1G, 2G, 4G, 8G [2])[3]. Once we find a superblock, we can follow the
>> same instruction above to find the latest superblock copy within the zone
>> log pair.
>>
>> [1] 4M = BTRFS_MKFS_SYSTEM_GROUP_SIZE. We cannot mkfs on a device with a
>> zone size less than 4MB because we cannot create the initial temporary
>> system chunk with the size.
>> [2] The maximum size we support for now.
>> [3] The zone size is limited to these 11 cases, as it must be a power of 2.
>>
>> Once we find the latest superblock, it is no different than reading a
>> regular btrfs image. You can further confirm the determined zone size by
>> comparing it with the size of a device extent because it is the same as the
>> zone size.
>>
>> Actually, since the writing offset within the logging buffer is different
>> from the primary to copies [4], the timing when resetting the former zone
>> will become different. So, we can also try reading the head of the buffer
>> of a copy in case of missing superblock at offset 0.
>>
>> [4] Because mkfs update the primary in the initial process, advancing only
>> the write pointer of the primary log buffer
>>
>> ## Superblock writing on an emulated zoned device
>>
>> By mounting a regular device in zoned mode, btrfs emulates conventional
>> zones by slicing the device with a fixed size. In this case, however, we do
>> not follow the above rule of writing superblocks at the head of the logging
>> zones if they are conventional. Doing so would introduce a chicken-and-egg
>> problem. To know the given btrfs is zoned btrfs, we need to read a
>> superblock to see the incompatible flags. But, to read a superblock
>> properly from a zoned position, we need to know the file-system is zoned a
>> priori (e.g. resided in a zoned device), leading to a recursive dependency.
>>
>> We can use the regular super block update method on an emulated zoned
>> device to break the recursion. Since the zones containing the regular
>> locations are always reserved, it is safe to do so. Then, we can naturally
>> read a regular superblock on a regular device and determine the file-system
>> is zoned or not.
>>
>> Naohiro Aota (1):
>>    btrfs: zoned: move superblock logging zone location
>>
>>   fs/btrfs/zoned.c | 40 ++++++++++++++++++++++++++++++----------
>>   1 file changed, 30 insertions(+), 10 deletions(-)
>>
> 
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-03-15  5:53 ` [PATCH] btrfs: zoned: move superblock logging zone location Naohiro Aota
                     ` (2 preceding siblings ...)
  2021-03-26 15:56   ` Johannes Thumshirn
@ 2021-04-07 17:52   ` Josef Bacik
  2021-04-07 18:31     ` Johannes Thumshirn
  3 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2021-04-07 17:52 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba

On 3/15/21 1:53 AM, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones. The location of
> the logging zones are determined based on fixed block addresses instead of
> on fixed zone numbers.
> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>    - Primary superblock: zone starting at offset 0 and the following zone
>    - First copy: zone containing offset 64GB and the following zone
>    - Second copy: zone containing offset 256GB and the following zone
> 
> If the location of the zones are outside of disk, we don't record the
> superblock copy.
> 
> These addresses are arbitrary, but using addresses that are too large
> reduces superblock reliability for smaller devices, so we do not want to
> exceed 1T to cover all case nicely.
> 
> Also, LBAs are generally distributed initially across one head (platter
> side) up to one or more zones, then go on the next head backward (the other
> side of the same platter), and on to the following head/platter. Thus using
> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
> likely result in each superblock copy being on a different head/platter
> which improves chances of recovery in case of superblock read error.
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (no zoned case) are also reserved to avoid
> confusion.
> 
> Note that we only reserve the 2 zones per primary/copy actually used for
> superblock logging. We don't reserve the ranges possibly containing
> superblock with the largest supported zone size (0-16GB, 64G-80GB,
> 256G-272GB).
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 

Ok it took me a few reads to figure out what's going on.

The problem is that with large zone sizes, our current choices put the back up 
super blocks waaaayyyyyy out on the disk, correct?  So instead you've picked 
arbitrary byte offsets, hoping that they'll be closer to the front of the disk 
and thus actually be useful.

And then you've introduced the 8gib zone size as a way to avoid problems where 
we get the same zone for the backup supers.

Are these statements correct?  If so the changelog should be updated to make 
this clear up front, because it took me a while to work that out.

Something at the beginning like the following

"With larger zone sizes, for example 8gib, the 3rd backup super would be located 
8tib into the device.  However not all zoned block devices are this large.  In 
order to fix this limitation set the zones to a static byte offset, and 
calculate the zone number from there based on the device's zone size."

So that it's clear from the outset why we're making this change.

And this brings up another problem, in that what happens when we _do_ run into
block devices that have huge zones, like 64gib zones?  We have to change the
disk format to support these devices.  I'm not against that per se, but it seems
like a limitation, even if it's unlikely to ever happen.  With the locations we
currently have, any arbitrary zone size is going to work in the future, and the
only drawback is you need a device of a certain size to take advantage of the
backup super blocks.  I would hope that we don't have 64gib zone size block
devices that are only 128gib in size in the future.
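
The zone size at which the fixed offsets stop working can be checked with a
short calculation. The sketch below is an illustration only, not kernel code;
the helper names (zone_start, sb_log_pairs_disjoint) are made up for the
example. It assumes a power-of-2 zone size and the 0/64G/256G offsets from the
patch, and tests whether the three 2-zone log pairs share a zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Fixed byte offsets from the patch: primary, first copy, second copy. */
static const uint64_t sb_log_offset[3] = { 0, 64 * GiB, 256 * GiB };

/* Start of the zone containing `offset`; zone_size must be a power of 2. */
static uint64_t zone_start(uint64_t offset, uint64_t zone_size)
{
	return offset & ~(zone_size - 1);
}

/* True if the three 2-zone log pairs do not share any zone. */
static bool sb_log_pairs_disjoint(uint64_t zone_size)
{
	assert(zone_size != 0 && (zone_size & (zone_size - 1)) == 0);

	for (int i = 0; i < 2; i++) {
		/* End of pair i: its first zone plus the following zone. */
		uint64_t end = zone_start(sb_log_offset[i], zone_size) + 2 * zone_size;

		if (end > zone_start(sb_log_offset[i + 1], zone_size))
			return false;
	}
	return true;
}
```

Under these assumptions the pairs stay disjoint for zone sizes up to 32GB and
collide at 64GB, where the primary pair (0-128GB) already covers the zone
containing offset 64GB.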


> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/zoned.c | 39 +++++++++++++++++++++++++++++++--------
>   1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 43948bd40e02..6a72ca1f7988 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -21,9 +21,24 @@
>   /* Pseudo write pointer value for conventional zone */
>   #define WP_CONVENTIONAL ((u64)-2)
>   
> +/*
> + * Location of the first zone of superblock logging zone pairs.
> + * - Primary superblock: the zone containing offset 0 (zone 0)
> + * - First superblock copy: the zone containing offset 64G
> + * - Second superblock copy: the zone containing offset 256G
> + */
> +#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
> +#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
> +#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
> +#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
> +#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
> +
>   /* Number of superblock log zones */
>   #define BTRFS_NR_SB_LOG_ZONES 2
>   
> +/* Max size of supported zone size */
> +#define BTRFS_MAX_ZONE_SIZE SZ_8G
> +
>   static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
>   {
>   	struct blk_zone *zones = data;
> @@ -111,11 +126,8 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>   }
>   
>   /*
> - * The following zones are reserved as the circular buffer on ZONED btrfs.
> - *  - The primary superblock: zones 0 and 1
> - *  - The first copy: zones 16 and 17
> - *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
> - *                     the following one
> + * Get the zone number of the first zone of a pair of contiguous zones used
> + * for superblock logging.
>    */
>   static inline u32 sb_zone_number(int shift, int mirror)
>   {
> @@ -123,8 +135,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
>   
>   	switch (mirror) {
>   	case 0: return 0;
> -	case 1: return 16;
> -	case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
> +	case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
> +	case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
>   	}
>   
>   	return 0;
> @@ -300,10 +312,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
>   		zone_sectors = bdev_zone_sectors(bdev);
>   	}
>   
> -	nr_sectors = bdev_nr_sectors(bdev);
>   	/* Check if it's power of 2 (see is_power_of_2) */
>   	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
>   	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
> +
> +	/* We reject devices with a zone size larger than 8GB. */

A longer explanation here of why it's important that we limit it to our 
MAX_ZONE_SIZE, and use MAX_ZONE_SIZE instead of 8gib, in case we increase the 
limit in the future.

For example

We reject devices with a zone size larger than MAX_ZONE_SIZE because we do not 
want the backup super block zone to overlap with the primary super block zone.

Or something along these lines.  Again, I was confused why this was in the patch 
until I spent a lot more time thinking about it.  Thanks,

Josef


* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-04-07 17:52   ` Josef Bacik
@ 2021-04-07 18:31     ` Johannes Thumshirn
  2021-04-07 18:56       ` Josef Bacik
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Thumshirn @ 2021-04-07 18:31 UTC (permalink / raw)
  To: Josef Bacik, Naohiro Aota, linux-btrfs, dsterba

On 07/04/2021 19:54, Josef Bacik wrote:
> On 3/15/21 1:53 AM, Naohiro Aota wrote:
>> This commit moves the location of superblock logging zones. The location of
>> the logging zones are determined based on fixed block addresses instead of
>> on fixed zone numbers.
>>
>> By locating the superblock zones using fixed addresses, we can scan a
>> dumped file system image without the zone information. And, no drawbacks
>> exist.
>>
>> We use the following three pairs of zones containing fixed offset
>> locations, regardless of the device zone size.
>>
>>    - Primary superblock: zone starting at offset 0 and the following zone
>>    - First copy: zone containing offset 64GB and the following zone
>>    - Second copy: zone containing offset 256GB and the following zone
>>
>> If the location of the zones are outside of disk, we don't record the
>> superblock copy.
>>
>> These addresses are arbitrary, but using addresses that are too large
>> reduces superblock reliability for smaller devices, so we do not want to
>> exceed 1T to cover all case nicely.
>>
>> Also, LBAs are generally distributed initially across one head (platter
>> side) up to one or more zones, then go on the next head backward (the other
>> side of the same platter), and on to the following head/platter. Thus using
>> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
>> likely result in each superblock copy being on a different head/platter
>> which improves chances of recovery in case of superblock read error.
>>
>> These zones are reserved for superblock logging and never used for data or
>> metadata blocks. Zones containing the offsets used to store superblocks in
>> a regular btrfs volume (no zoned case) are also reserved to avoid
>> confusion.
>>
>> Note that we only reserve the 2 zones per primary/copy actually used for
>> superblock logging. We don't reserve the ranges possibly containing
>> superblock with the largest supported zone size (0-16GB, 64G-80GB,
>> 256G-272GB).
>>
>> The first copy position is much larger than for a regular btrfs volume
>> (64M).  This increase is to avoid overlapping with the log zones for the
>> primary superblock. This higher location is arbitrary but allows supporting
>> devices with very large zone size, up to 32GB. But we only allow zone sizes
>> up to 8GB for now.
>>
> 
> Ok it took me a few reads to figure out what's going on.
> 
> The problem is that with large zone sizes, our current choices put the back up 
> super blocks waaaayyyyyy out on the disk, correct?  So instead you've picked 
> arbitrary byte offsets, hoping that they'll be closer to the front of the disk 
> and thus actually be useful.
> 
> And then you've introduced the 8gib zone size as a way to avoid problems where 
> we get the same zone for the backup supers.
> 
> Are these statements correct?  If so the changelog should be updated to make 
> this clear up front, because it took me a while to work that out.

No, the problem is that we're placing superblocks into specific zones, regardless
of the zone size. This creates a problem when you need to inspect a file system
but don't have the block device available, because you can't look at the zone
size to calculate where the superblocks are on the device.

With this change we're placing the superblocks not into specific zone numbers,
but into the zones starting at specific offsets. We're taking 8G zone size as
a maximum expected zone size, to make sure we're not overlapping superblock
zones. Currently SMR disks have a zone size of 256MB and we're expecting ZNS
drives to be in the 1-2GB range, so this 8GB gives us room to breathe.
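
As a rough illustration of the offset-to-zone mapping, the following sketch
mirrors the shift arithmetic of sb_zone_number() in the patch. It is a
standalone user-space approximation, not the kernel function: the macro names
and sb_zone_number_sketch() are invented here, and the accepted shift range
(4MB to 8GB zones) is taken from the cover letter.

```c
#include <assert.h>
#include <stdint.h>

#define SB_LOG_FIRST_SHIFT  36 /* log2(64GB), offset of the first copy */
#define SB_LOG_SECOND_SHIFT 38 /* log2(256GB), offset of the second copy */

/*
 * First zone of the log pair for `mirror`, given log2 of the zone size.
 * Because the zone size is a power of 2, the zone containing a fixed
 * offset is simply offset >> zone_size_shift.
 */
static uint32_t sb_zone_number_sketch(int zone_size_shift, int mirror)
{
	/* 4MB (shift 22) up to 8GB (shift 33) zones */
	assert(zone_size_shift >= 22 && zone_size_shift <= 33);

	switch (mirror) {
	case 0: return 0;
	case 1: return 1U << (SB_LOG_FIRST_SHIFT - zone_size_shift);
	case 2: return 1U << (SB_LOG_SECOND_SHIFT - zone_size_shift);
	}
	assert(0 && "mirror must be 0, 1 or 2");
	return 0;
}
```

With 256MB zones (shift 28) this gives zones 256 and 1024 for the two copies;
with 8GB zones (shift 33) it gives zones 8 and 32. The byte offsets stay the
same in both cases, which is what lets a tool find the log zones in a dumped
image without knowing the zone size up front.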

Hope this helps clear up any confusion.

Byte,
Johannes


* Re: [PATCH] btrfs: zoned: move superblock logging zone location
  2021-04-07 18:31     ` Johannes Thumshirn
@ 2021-04-07 18:56       ` Josef Bacik
  0 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-04-07 18:56 UTC (permalink / raw)
  To: Johannes Thumshirn, Naohiro Aota, linux-btrfs, dsterba

On 4/7/21 2:31 PM, Johannes Thumshirn wrote:
> On 07/04/2021 19:54, Josef Bacik wrote:
>> On 3/15/21 1:53 AM, Naohiro Aota wrote:
>>> This commit moves the location of superblock logging zones. The location of
>>> the logging zones are determined based on fixed block addresses instead of
>>> on fixed zone numbers.
>>>
>>> By locating the superblock zones using fixed addresses, we can scan a
>>> dumped file system image without the zone information. And, no drawbacks
>>> exist.
>>>
>>> We use the following three pairs of zones containing fixed offset
>>> locations, regardless of the device zone size.
>>>
>>>     - Primary superblock: zone starting at offset 0 and the following zone
>>>     - First copy: zone containing offset 64GB and the following zone
>>>     - Second copy: zone containing offset 256GB and the following zone
>>>
>>> If the location of the zones are outside of disk, we don't record the
>>> superblock copy.
>>>
>>> These addresses are arbitrary, but using addresses that are too large
>>> reduces superblock reliability for smaller devices, so we do not want to
>>> exceed 1T to cover all case nicely.
>>>
>>> Also, LBAs are generally distributed initially across one head (platter
>>> side) up to one or more zones, then go on the next head backward (the other
>>> side of the same platter), and on to the following head/platter. Thus using
>>> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
>>> likely result in each superblock copy being on a different head/platter
>>> which improves chances of recovery in case of superblock read error.
>>>
>>> These zones are reserved for superblock logging and never used for data or
>>> metadata blocks. Zones containing the offsets used to store superblocks in
>>> a regular btrfs volume (no zoned case) are also reserved to avoid
>>> confusion.
>>>
>>> Note that we only reserve the 2 zones per primary/copy actually used for
>>> superblock logging. We don't reserve the ranges possibly containing
>>> superblock with the largest supported zone size (0-16GB, 64G-80GB,
>>> 256G-272GB).
>>>
>>> The first copy position is much larger than for a regular btrfs volume
>>> (64M).  This increase is to avoid overlapping with the log zones for the
>>> primary superblock. This higher location is arbitrary but allows supporting
>>> devices with very large zone size, up to 32GB. But we only allow zone sizes
>>> up to 8GB for now.
>>>
>>
>> Ok it took me a few reads to figure out what's going on.
>>
>> The problem is that with large zone sizes, our current choices put the back up
>> super blocks waaaayyyyyy out on the disk, correct?  So instead you've picked
>> arbitrary byte offsets, hoping that they'll be closer to the front of the disk
>> and thus actually be useful.
>>
>> And then you've introduced the 8gib zone size as a way to avoid problems where
>> we get the same zone for the backup supers.
>>
>> Are these statements correct?  If so the changelog should be updated to make
>> this clear up front, because it took me a while to work that out.
> 
> No, the problem is that we're placing superblocks into specific zones, regardless
> of the zone size. This creates a problem when you need to inspect a file system
> but don't have the block device available, because you can't look at the zone
> size to calculate where the superblocks are on the device.
> 
> With this change we're placing the superblocks not into specific zone numbers,
> but into the zones starting at specific offsets. We're taking 8G zone size as
> a maximum expected zone size, to make sure we're not overlapping superblock
> zones. Currently SMR disks have a zone size of 256MB and we're expecting ZNS
> drives to be in the 1-2GB range, so this 8GB gives us room to breathe.
> 
> Hope this helps clear up any confusion.
> 

Ok this makes a lot more sense, and should be the first thing in the changelog, 
because I still got it wrong after reading the thing a few times.

And I think it's worth pointing out in the comments that 8gib represents a zone 
size that doesn't exist currently and is likely to never exist.

That will make this much easier to grok and understand in the future.  Thanks,

Josef

