All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Fixes for zoned mode
@ 2021-03-03  8:55 Naohiro Aota
  2021-03-03  8:55 ` [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors Naohiro Aota
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-03-03  8:55 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: linux-fsdevel, Naohiro Aota

These are fixes for zoned btrfs.

Patch 01 fixes the type of a zone sector variable.  

Patch 02 moves the superblock location to address based.

Patch 03 fixes zone_unusable acconting when a block group is read-only.

v1 -> v2:
 - Move the type fix patch to be the first one
 - Fix the type of variable instead of casting
 - Introduce #define's for the superblock zone locations and maximum zone
   size

Naohiro Aota (3):
  btrfs: zoned: use sector_t to get zone sectors
  btrfs: zoned: move superblock logging zone location
  btrfs: zoned: do not account freed region of read-only block group as
    zone_unusable

 fs/btrfs/free-space-cache.c |  7 ++++++-
 fs/btrfs/zoned.c            | 39 +++++++++++++++++++++++++++++--------
 2 files changed, 37 insertions(+), 9 deletions(-)

-- 
2.30.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors
  2021-03-03  8:55 [PATCH v2 0/3] Fixes for zoned mode Naohiro Aota
@ 2021-03-03  8:55 ` Naohiro Aota
  2021-03-04 14:30   ` David Sterba
  2021-03-03  8:55 ` [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location Naohiro Aota
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Naohiro Aota @ 2021-03-03  8:55 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: linux-fsdevel, Naohiro Aota

We need to use sector_t for zone_sectors, or it set the zone size = 0 when
the size >= 4GB (=  2^24 sectors) by shifting the zone_sectors value by
SECTOR_SHIFT.

Fixes: 5b316468983d ("btrfs: get zone information of zoned block devices")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 9a5cf153da89..1324bb6c3946 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -269,7 +269,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 	sector_t sector = 0;
 	struct blk_zone *zones = NULL;
 	unsigned int i, nreported = 0, nr_zones;
-	unsigned int zone_sectors;
+	sector_t zone_sectors;
 	char *model, *emulated;
 	int ret;
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location
  2021-03-03  8:55 [PATCH v2 0/3] Fixes for zoned mode Naohiro Aota
  2021-03-03  8:55 ` [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors Naohiro Aota
@ 2021-03-03  8:55 ` Naohiro Aota
  2021-03-04 15:14   ` David Sterba
  2021-03-03  8:55 ` [PATCH v2 3/3] btrfs: zoned: do not account freed region of read-only block group as zone_unusable Naohiro Aota
  2021-03-04 13:54 ` [PATCH v2 0/3] Fixes for zoned mode Johannes Thumshirn
  3 siblings, 1 reply; 8+ messages in thread
From: Naohiro Aota @ 2021-03-03  8:55 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: linux-fsdevel, Naohiro Aota

This commit moves the location of superblock logging zones basing on the
fixed address instead of the fixed zone number.

By locating the superblock zones using fixed addresses, we can scan a
dumped file system image without the zone information. And, no drawbacks
exist.

The following zones are reserved as the circular buffer on zoned btrfs.
  - The primary superblock: zone at LBA 0 and the next zone
  - The first copy: zone at LBA 16G and the next zone
  - The second copy: zone at LBA 256G and the next zone

If the location of the zones are outside of disk, we don't record the
superblock copy.

The addresses are much larger than the usual superblock copies locations.
The copies' locations are decided to support possible future larger zone
size, not to overlap the log zones. We support zone size up to 8GB.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 1324bb6c3946..b8f50dc9fbb0 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -24,6 +24,15 @@
 /* Number of superblock log zones */
 #define BTRFS_NR_SB_LOG_ZONES 2
 
+/* Location of superblock log zones */
+#define BTRFS_FIRST_SB_LOG_ZONE SZ_16G
+#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
+#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
+#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
+
+/* Max size of supported zone size */
+#define BTRFS_MAX_ZONE_SIZE SZ_8G
+
 static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
 {
 	struct blk_zone *zones = data;
@@ -112,10 +121,9 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 
 /*
  * The following zones are reserved as the circular buffer on ZONED btrfs.
- *  - The primary superblock: zones 0 and 1
- *  - The first copy: zones 16 and 17
- *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
- *                     the following one
+ *  - The primary superblock: zone at LBA 0 and the next zone
+ *  - The first copy: zone at LBA 16G and the next zone
+ *  - The second copy: zone at LBA 256G and the next zone
  */
 static inline u32 sb_zone_number(int shift, int mirror)
 {
@@ -123,8 +131,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
 
 	switch (mirror) {
 	case 0: return 0;
-	case 1: return 16;
-	case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
+	case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
+	case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
 	}
 
 	return 0;
@@ -300,10 +308,25 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
 		zone_sectors = bdev_zone_sectors(bdev);
 	}
 
-	nr_sectors = bdev_nr_sectors(bdev);
 	/* Check if it's power of 2 (see is_power_of_2) */
 	ASSERT(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
 	zone_info->zone_size = zone_sectors << SECTOR_SHIFT;
+
+	/*
+	 * We must reject devices with a zone size larger than 8GB to avoid
+	 * overlap between the super block primary and first copy fixed log
+	 * locations.
+	 */
+	if (zone_info->zone_size > BTRFS_MAX_ZONE_SIZE) {
+		btrfs_err_in_rcu(fs_info,
+				 "zoned: %s: zone size %llu is too large",
+				 rcu_str_deref(device->name),
+				 zone_info->zone_size);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	nr_sectors = bdev_nr_sectors(bdev);
 	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
 	zone_info->max_zone_append_size =
 		(u64)queue_max_zone_append_sectors(queue) << SECTOR_SHIFT;
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/3] btrfs: zoned: do not account freed region of read-only block group as zone_unusable
  2021-03-03  8:55 [PATCH v2 0/3] Fixes for zoned mode Naohiro Aota
  2021-03-03  8:55 ` [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors Naohiro Aota
  2021-03-03  8:55 ` [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location Naohiro Aota
@ 2021-03-03  8:55 ` Naohiro Aota
  2021-03-04 13:54 ` [PATCH v2 0/3] Fixes for zoned mode Johannes Thumshirn
  3 siblings, 0 replies; 8+ messages in thread
From: Naohiro Aota @ 2021-03-03  8:55 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: linux-fsdevel, Naohiro Aota

We migrate zone unusable bytes to read-only bytes when a block group is set
to read-only, and account all the free region as bytes_readonly. Thus, we
should not increase block_group->zone_unusable when the block group is
read-only.

Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/free-space-cache.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 711a6a751ae9..81835153f747 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2555,7 +2555,12 @@ static int __btrfs_add_free_space_zoned(struct btrfs_block_group *block_group,
 	to_unusable = size - to_free;
 
 	ctl->free_space += to_free;
-	block_group->zone_unusable += to_unusable;
+	/*
+	 * If the block group is read-only, we should account freed
+	 * space into bytes_readonly.
+	 */
+	if (!block_group->ro)
+		block_group->zone_unusable += to_unusable;
 	spin_unlock(&ctl->tree_lock);
 	if (!used) {
 		spin_lock(&block_group->lock);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] Fixes for zoned mode
  2021-03-03  8:55 [PATCH v2 0/3] Fixes for zoned mode Naohiro Aota
                   ` (2 preceding siblings ...)
  2021-03-03  8:55 ` [PATCH v2 3/3] btrfs: zoned: do not account freed region of read-only block group as zone_unusable Naohiro Aota
@ 2021-03-04 13:54 ` Johannes Thumshirn
  3 siblings, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2021-03-04 13:54 UTC (permalink / raw)
  To: Naohiro Aota, linux-btrfs, dsterba; +Cc: linux-fsdevel

On 04/03/2021 02:50, Naohiro Aota wrote:
> These are fixes for zoned btrfs.
> 
> Patch 01 fixes the type of a zone sector variable.  
> 
> Patch 02 moves the superblock location to address based.
> 
> Patch 03 fixes zone_unusable acconting when a block group is read-only.
> 
> v1 -> v2:
>  - Move the type fix patch to be the first one
>  - Fix the type of variable instead of casting
>  - Introduce #define's for the superblock zone locations and maximum zone
>    size
> 
> Naohiro Aota (3):
>   btrfs: zoned: use sector_t to get zone sectors
>   btrfs: zoned: move superblock logging zone location
>   btrfs: zoned: do not account freed region of read-only block group as
>     zone_unusable
> 
>  fs/btrfs/free-space-cache.c |  7 ++++++-
>  fs/btrfs/zoned.c            | 39 +++++++++++++++++++++++++++++--------
>  2 files changed, 37 insertions(+), 9 deletions(-)
> 

For the whole series:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors
  2021-03-03  8:55 ` [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors Naohiro Aota
@ 2021-03-04 14:30   ` David Sterba
  0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2021-03-04 14:30 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, dsterba, linux-fsdevel

On Wed, Mar 03, 2021 at 05:55:46PM +0900, Naohiro Aota wrote:
> We need to use sector_t for zone_sectors, or it set the zone size = 0 when
> the size >= 4GB (=  2^24 sectors) by shifting the zone_sectors value by
> SECTOR_SHIFT.

This does not fix the same bug in btrfs_sb_log_location_bdev.

> Fixes: 5b316468983d ("btrfs: get zone information of zoned block devices")
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/zoned.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 9a5cf153da89..1324bb6c3946 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -269,7 +269,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
>  	sector_t sector = 0;
>  	struct blk_zone *zones = NULL;
>  	unsigned int i, nreported = 0, nr_zones;
> -	unsigned int zone_sectors;
> +	sector_t zone_sectors;
>  	char *model, *emulated;
>  	int ret;
>  
> -- 
> 2.30.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location
  2021-03-03  8:55 ` [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location Naohiro Aota
@ 2021-03-04 15:14   ` David Sterba
  2021-03-04 23:00     ` Damien Le Moal
  0 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2021-03-04 15:14 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, dsterba, linux-fsdevel

On Wed, Mar 03, 2021 at 05:55:47PM +0900, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones basing on the
> fixed address instead of the fixed zone number.
> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> The following zones are reserved as the circular buffer on zoned btrfs.
>   - The primary superblock: zone at LBA 0 and the next zone
>   - The first copy: zone at LBA 16G and the next zone
>   - The second copy: zone at LBA 256G and the next zone
> 
> If the location of the zones are outside of disk, we don't record the
> superblock copy.
> 
> The addresses are much larger than the usual superblock copies locations.
> The copies' locations are decided to support possible future larger zone
> size, not to overlap the log zones. We support zone size up to 8GB.

One thing I don't see is that the reserved space for superblock is fixed
regardless of the actual device zone size. In exclude_super_stripes.

0-16G for primary
... and now what, 16G would be the next copy thus reserving 16 up to 32G

So the 64G offset for the 1st copy is more suitable:

0    -  16G primary
64G  -  80G 1st copy
256G - 272G 2nd copy

This still does not sound great because it just builds on the original
offsets from 10 years ago.  The device sizes are expected to be in
terabytes but all the superblocks are in the first terabyte.

What if we do that like

0   -  16G
1T  -  1T+16G
8T  -  8T+16G

The HDD sizes start somewhere at 4T so the first two copies cover the
small sizes, larger have all three copies. But we could go wild even
more, like 0/4T/16T.

I'm not sure if the capacities for non-HDD are going to be also that
large, I could not find anything specific, the only existing ZNS is some
DC ZN540 but no details.

We need to get this right (best effort), so I'll postpone this patch
until it's all sorted.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location
  2021-03-04 15:14   ` David Sterba
@ 2021-03-04 23:00     ` Damien Le Moal
  0 siblings, 0 replies; 8+ messages in thread
From: Damien Le Moal @ 2021-03-04 23:00 UTC (permalink / raw)
  To: dsterba, Naohiro Aota; +Cc: linux-btrfs, dsterba, linux-fsdevel

On 2021/03/05 0:20, David Sterba wrote:
> On Wed, Mar 03, 2021 at 05:55:47PM +0900, Naohiro Aota wrote:
>> This commit moves the location of superblock logging zones basing on the
>> fixed address instead of the fixed zone number.
>>
>> By locating the superblock zones using fixed addresses, we can scan a
>> dumped file system image without the zone information. And, no drawbacks
>> exist.
>>
>> The following zones are reserved as the circular buffer on zoned btrfs.
>>   - The primary superblock: zone at LBA 0 and the next zone
>>   - The first copy: zone at LBA 16G and the next zone
>>   - The second copy: zone at LBA 256G and the next zone
>>
>> If the location of the zones are outside of disk, we don't record the
>> superblock copy.
>>
>> The addresses are much larger than the usual superblock copies locations.
>> The copies' locations are decided to support possible future larger zone
>> size, not to overlap the log zones. We support zone size up to 8GB.
> 
> One thing I don't see is that the reserved space for superblock is fixed
> regardless of the actual device zone size. In exclude_super_stripes.
> 
> 0-16G for primary
> ... and now what, 16G would be the next copy thus reserving 16 up to 32G
> 
> So the 64G offset for the 1st copy is more suitable:
> 
> 0    -  16G primary
> 64G  -  80G 1st copy
> 256G - 272G 2nd copy
> 
> This still does not sound great because it just builds on the original
> offsets from 10 years ago.  The device sizes are expected to be in
> terabytes but all the superblocks are in the first terabyte.

I do not see an issue with that. For HDDs, one would ideally want each copy
under a different head but determining which head serves which LBA is not
possible with standard commands. LBAs are generally distributed initially across
one head (platter side) up to one or more zones, then goes on the next head
backward (other side of the same platter), and on to the following head/platter.
So distribution is first vertical then goes inward (and when reaching middle of
the platter, everything starts again from the spindle outward).

0/64G/256G likely gives you different heads. No way to tell for certain though.

> What if we do that like
> 
> 0   -  16G
> 1T  -  1T+16G
> 8T  -  8T+16G
> 
> The HDD sizes start somewhere at 4T so the first two copies cover the
> small sizes, larger have all three copies. But we could go wild even
> more, like 0/4T/16T.

That would work for HDDs. We are at 20T with SMR now and the lowest SMR capacity
is 14T. For regular disks, yes, 4T is kind of the starting point for enterprise
drives. Consumer/NAS drives can start lower though, at 1T or 2T.
To be able able to cover all cases nicely, I would suggest not exceeding 1T for
the SB copies.

> I'm not sure if the capacities for non-HDD are going to be also that
> large, I could not find anything specific, the only existing ZNS is some
> DC ZN540 but no details.

That one is sampling at 2T capacity now. This likely will be a lower boundary
and higher capacities will be available. Not sure yet up to what point. Likely,
different models at 4T, 6T, 8T, 16T... will be available. So kind of the same
story as for HDDs. Keeping the SB copies within the first TB will allow
supporting all models.

So I kind of like your initial suggestion:

0    -  16G primary
64G  -  80G 1st copy
256G - 272G 2nd copy

And we could even do:

0    -  16G primary
128G - 160G 1st copy
512G - 544G 2nd copy

Which would also safely allow larger zone sizes beyond 8G for ZNS too.
(I do not think this will happen anytime soon though, but with these values, we
are safer).

> 
> We need to get this right (best effort), so I'll postpone this patch
> until it's all sorted.
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-04 23:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-03  8:55 [PATCH v2 0/3] Fixes for zoned mode Naohiro Aota
2021-03-03  8:55 ` [PATCH v2 1/3] btrfs: zoned: use sector_t to get zone sectors Naohiro Aota
2021-03-04 14:30   ` David Sterba
2021-03-03  8:55 ` [PATCH v2 2/3] btrfs: zoned: move superblock logging zone location Naohiro Aota
2021-03-04 15:14   ` David Sterba
2021-03-04 23:00     ` Damien Le Moal
2021-03-03  8:55 ` [PATCH v2 3/3] btrfs: zoned: do not account freed region of read-only block group as zone_unusable Naohiro Aota
2021-03-04 13:54 ` [PATCH v2 0/3] Fixes for zoned mode Johannes Thumshirn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.