linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] md superblock write alignment on 512e devices
@ 2020-10-29 20:13 Christopher Unkel
  2020-10-29 20:13 ` [PATCH v2 1/3] md: factor out repeated sb alignment logic Christopher Unkel
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Christopher Unkel @ 2020-10-29 20:13 UTC (permalink / raw)
  To: linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel, Christopher Unkel

Hello,

Thanks for the feedback on the previous patch series.

A updated patch series with the same function as the first patch
(https://lkml.org/lkml/2020/10/22/1058 "md: align superblock writes to
physical blocks") follows.

As suggested, it introduces a helper function, which can be used to
reduce some code duplication.  It handles the case in super_1_sync()
where the superblock is extended by the addition of new component
devices.

I think it also fixes a bug where the existing code in super_1_load()
ought to be rejecting the array with EINVAL: if the superblock padded
out to the *logical* block length runs into the bitmap.  For example, if
the bitmap offset is 2 (bitmap 1K after superblock) and the logical
block size is 4K, the superblock padded out to 4K runs into the bitmap.
This case may be unusual (perhaps only happens if the array is created
on a 512n device and then raw contents are copied onto a 4kn device) but
I think it is possible.

With respect to the option of simply replacing
queue_logical_block_size() with queue_physical_block_size(), I think
this can result in the code rejecting devices that can be loaded, but
for which the physical block alignment can't be respected--the longer
padded size would trigger the EINVAL cases testing against
data_offset/new_data_offset.  I think it's better to proceed in such
cases, just with unaligned superblock writes as would presently happen.
Also if I'm right about the above bug, then I think this subsitution
would be more likely to trigger it.

Thanks,

  --Chris


Christopher Unkel (3):
  md: factor out repeated sb alignment logic
  md: align superblock writes to physical blocks
  md: reuse sb length-checking logic

 drivers/md/md.c | 69 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 17 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] md: factor out repeated sb alignment logic
  2020-10-29 20:13 [PATCH v2 0/3] md superblock write alignment on 512e devices Christopher Unkel
@ 2020-10-29 20:13 ` Christopher Unkel
  2020-10-29 20:13 ` [PATCH 2/3] md: align superblock writes to physical blocks Christopher Unkel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Christopher Unkel @ 2020-10-29 20:13 UTC (permalink / raw)
  To: linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel, Christopher Unkel

super_1_load() and super_1_sync() both contain a copy of logic to pad
out the superblock size so that it is aligned on a logical block
boundary.  Factor into new function, and use round_up() rather than
explict bitmask-based calculation.

Signed-off-by: Christopher Unkel <cunkel@drivescale.com>
---
This series replaces the first patch of the previous series
(https://lkml.org/lkml/2020/10/22/1058), with the following changes:

1. Creates a helper function super_1_sb_length_ok().
2. Fixes operator placement style violation.
3. Covers case in super_1_sync().
4. Refactors duplicate logic.
5. Covers a case in existing code where aligned superblock could
   run into bitmap.

drivers/md/md.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f304ae..d6a55ca1d52e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1646,6 +1646,17 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb)
 	return cpu_to_le32(csum);
 }
 
+/*
+ * set rdev->sb_size to that required for number of devices in array
+ * with appropriate padding to underlying sectors
+ */
+static void
+super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev)
+{
+	int sb_size = max_dev * 2 + 256;
+	rdev->sb_size = round_up(sb_size, bdev_logical_block_size(rdev->bdev));
+}
+
 static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version)
 {
 	struct mdp_superblock_1 *sb;
@@ -1653,7 +1664,6 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
 	sector_t sb_start;
 	sector_t sectors;
 	char b[BDEVNAME_SIZE], b2[BDEVNAME_SIZE];
-	int bmask;
 	bool spare_disk = true;
 
 	/*
@@ -1720,10 +1730,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
 		rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset);
 	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
 
-	rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
-	bmask = queue_logical_block_size(rdev->bdev->bd_disk->queue)-1;
-	if (rdev->sb_size & bmask)
-		rdev->sb_size = (rdev->sb_size | bmask) + 1;
+	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev));
 
 	if (minor_version
 	    && rdev->data_offset < sb_start + (rdev->sb_size/512))
@@ -2132,12 +2139,8 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
 			max_dev = rdev2->desc_nr+1;
 
 	if (max_dev > le32_to_cpu(sb->max_dev)) {
-		int bmask;
 		sb->max_dev = cpu_to_le32(max_dev);
-		rdev->sb_size = max_dev * 2 + 256;
-		bmask = queue_logical_block_size(rdev->bdev->bd_disk->queue)-1;
-		if (rdev->sb_size & bmask)
-			rdev->sb_size = (rdev->sb_size | bmask) + 1;
+		super_1_set_rdev_sb_size(rdev, max_dev);
 	} else
 		max_dev = le32_to_cpu(sb->max_dev);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] md: align superblock writes to physical blocks
  2020-10-29 20:13 [PATCH v2 0/3] md superblock write alignment on 512e devices Christopher Unkel
  2020-10-29 20:13 ` [PATCH v2 1/3] md: factor out repeated sb alignment logic Christopher Unkel
@ 2020-10-29 20:13 ` Christopher Unkel
  2020-11-02  8:02   ` Xiao Ni
  2020-10-29 20:13 ` [PATCH 3/3] md: reuse sb length-checking logic Christopher Unkel
  2020-11-02  7:42 ` [PATCH v2 0/3] md superblock write alignment on 512e devices Xiao Ni
  3 siblings, 1 reply; 8+ messages in thread
From: Christopher Unkel @ 2020-10-29 20:13 UTC (permalink / raw)
  To: linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel, Christopher Unkel

Writes of the md superblock are aligned to the logical blocks of the
containing device, but no attempt is made to align them to physical
block boundaries.  This means that on a "512e" device (4k physical, 512
logical) every superblock update hits the 512-byte emulation and the
possible associated performance penalty.

Respect the physical block alignment when possible, that is, when the
write padded out to the physical block doesn't run into the data or
bitmap.

Signed-off-by: Christopher Unkel <cunkel@drivescale.com>
---
This series replaces the first patch of the previous series
(https://lkml.org/lkml/2020/10/22/1058), with the following changes:

1. Creates a helper function super_1_sb_length_ok().
2. Fixes operator placement style violation.
3. Covers case in super_1_sync().
4. Refactors duplicate logic.
5. Covers a case in existing code where aligned superblock could
   run into bitmap.

 drivers/md/md.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d6a55ca1d52e..802a9a256fe5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1646,15 +1646,52 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb)
 	return cpu_to_le32(csum);
 }
 
+static int
+super_1_sb_length_ok(struct md_rdev *rdev, int minor_version, int sb_len)
+{
+	int sectors = sb_len / 512;
+	struct mdp_superblock_1 *sb;
+
+	/* superblock is stored in memory as a single page */
+	if (sb_len > PAGE_SIZE)
+		return 0;
+
+	/* check if sb runs into data */
+	if (minor_version) {
+		if (rdev->sb_start + sectors > rdev->data_offset
+		    || rdev->sb_start + sectors > rdev->new_data_offset)
+			return 0;
+	} else if (sb_len > 4096)
+		return 0;
+
+	/* check if sb runs into bitmap */
+	sb = page_address(rdev->sb_page);
+	if (le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) {
+		__s32 bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
+		if (bitmap_offset > 0 && sectors > bitmap_offset)
+			return 0;
+	}
+
+	return 1;
+}
+
 /*
  * set rdev->sb_size to that required for number of devices in array
  * with appropriate padding to underlying sectors
  */
 static void
-super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev)
+super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev, int minor_version)
 {
 	int sb_size = max_dev * 2 + 256;
-	rdev->sb_size = round_up(sb_size, bdev_logical_block_size(rdev->bdev));
+	int pb_aligned_size = round_up(sb_size,
+				       bdev_physical_block_size(rdev->bdev));
+
+	/* generate physical-block aligned writes if legal */
+	if (super_1_sb_length_ok(rdev, minor_version, pb_aligned_size))
+		rdev->sb_size = pb_aligned_size;
+	else
+		rdev->sb_size = round_up(sb_size,
+					 bdev_logical_block_size(rdev->bdev));
 }
 
 static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version)
@@ -1730,7 +1767,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
 		rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset);
 	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
 
-	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev));
+	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev), minor_version);
 
 	if (minor_version
 	    && rdev->data_offset < sb_start + (rdev->sb_size/512))
@@ -2140,7 +2177,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
 
 	if (max_dev > le32_to_cpu(sb->max_dev)) {
 		sb->max_dev = cpu_to_le32(max_dev);
-		super_1_set_rdev_sb_size(rdev, max_dev);
+		super_1_set_rdev_sb_size(rdev, max_dev, mddev->minor_version);
 	} else
 		max_dev = le32_to_cpu(sb->max_dev);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] md: reuse sb length-checking logic
  2020-10-29 20:13 [PATCH v2 0/3] md superblock write alignment on 512e devices Christopher Unkel
  2020-10-29 20:13 ` [PATCH v2 1/3] md: factor out repeated sb alignment logic Christopher Unkel
  2020-10-29 20:13 ` [PATCH 2/3] md: align superblock writes to physical blocks Christopher Unkel
@ 2020-10-29 20:13 ` Christopher Unkel
  2020-11-02  7:42 ` [PATCH v2 0/3] md superblock write alignment on 512e devices Xiao Ni
  3 siblings, 0 replies; 8+ messages in thread
From: Christopher Unkel @ 2020-10-29 20:13 UTC (permalink / raw)
  To: linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel, Christopher Unkel

The logic in super_1_load() to check the length of the superblock
against (new_)data_offset has the same purpose as the newly-created
super_1_sb_length_ok().  The latter is also more complete in that it
check for overlap between the superblock write and the bitmap.

Signed-off-by: Christopher Unkel <cunkel@drivescale.com>
---
This series replaces the first patch of the previous series
(https://lkml.org/lkml/2020/10/22/1058), with the following changes:

1. Creates a helper function super_1_sb_length_ok().
2. Fixes operator placement style violation.
3. Covers case in super_1_sync().
4. Refactors duplicate logic.
5. Covers a case in existing code where aligned superblock could
   run into bitmap.

 drivers/md/md.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 802a9a256fe5..3b7bf14922ac 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1768,13 +1768,8 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
 	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
 
 	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev), minor_version);
-
-	if (minor_version
-	    && rdev->data_offset < sb_start + (rdev->sb_size/512))
-		return -EINVAL;
-	if (minor_version
-	    && rdev->new_data_offset < sb_start + (rdev->sb_size/512))
-		return -EINVAL;
+	if (!super_1_sb_length_ok(rdev, minor_version, rdev->sb_size))
+	    return -EINVAL;
 
 	if (sb->level == cpu_to_le32(LEVEL_MULTIPATH))
 		rdev->desc_nr = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] md superblock write alignment on 512e devices
  2020-10-29 20:13 [PATCH v2 0/3] md superblock write alignment on 512e devices Christopher Unkel
                   ` (2 preceding siblings ...)
  2020-10-29 20:13 ` [PATCH 3/3] md: reuse sb length-checking logic Christopher Unkel
@ 2020-11-02  7:42 ` Xiao Ni
  2020-11-03 20:12   ` Chris Unkel
  3 siblings, 1 reply; 8+ messages in thread
From: Xiao Ni @ 2020-11-02  7:42 UTC (permalink / raw)
  To: Christopher Unkel, linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel



On 10/30/2020 04:13 AM, Christopher Unkel wrote:
> Hello,
>
> Thanks for the feedback on the previous patch series.
>
> A updated patch series with the same function as the first patch
> (https://lkml.org/lkml/2020/10/22/1058 "md: align superblock writes to
> physical blocks") follows.
>
> As suggested, it introduces a helper function, which can be used to
> reduce some code duplication.  It handles the case in super_1_sync()
> where the superblock is extended by the addition of new component
> devices.
>
> I think it also fixes a bug where the existing code in super_1_load()
> ought to be rejecting the array with EINVAL: if the superblock padded
> out to the *logical* block length runs into the bitmap.  For example, if
> the bitmap offset is 2 (bitmap 1K after superblock) and the logical
> block size is 4K, the superblock padded out to 4K runs into the bitmap.
> This case may be unusual (perhaps only happens if the array is created
> on a 512n device and then raw contents are copied onto a 4kn device) but
> I think it is possible.
Hi Chris
For super1.1 and super1.2 bitmap offset is 8. It's a fixed value. So it 
should
not have the risk?

But for future maybe it has this problem. If the disk logical or 
physical block size
is larger than 4K in future, it has data corruption risk.
>
> With respect to the option of simply replacing
> queue_logical_block_size() with queue_physical_block_size(), I think
> this can result in the code rejecting devices that can be loaded, but
In mdadm it defines the max super size of super1 is 4096
#define MAX_SB_SIZE 4096
/* bitmap super size is 256, but we round up to a sector for alignment */
#define BM_SUPER_SIZE 512
#define MAX_DEVS ((int)(MAX_SB_SIZE - sizeof(struct mdp_superblock_1)) / 2)
#define SUPER1_SIZE     (MAX_SB_SIZE + BM_SUPER_SIZE \
                          + sizeof(struct misc_dev_info))

It should be ok to replace queue_logical_block_size with 
queue_physical_block_size?
Now it doesn't check physical block size and super block size. For 
super1, we can add
a check that if physical block size is larger than MAX_SB_SIZE, then we 
reject to create/assmble
the raid device.
> for which the physical block alignment can't be respected--the longer
> padded size would trigger the EINVAL cases testing against
> data_offset/new_data_offset.  I think it's better to proceed in such
> cases, just with unaligned superblock writes as would presently happen.
> Also if I'm right about the above bug, then I think this subsitution
> would be more likely to trigger it.
>
> Thanks,
>
>    --Chris
>
>
> Christopher Unkel (3):
>    md: factor out repeated sb alignment logic
>    md: align superblock writes to physical blocks
>    md: reuse sb length-checking logic
>
>   drivers/md/md.c | 69 +++++++++++++++++++++++++++++++++++++------------
>   1 file changed, 52 insertions(+), 17 deletions(-)
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] md: align superblock writes to physical blocks
  2020-10-29 20:13 ` [PATCH 2/3] md: align superblock writes to physical blocks Christopher Unkel
@ 2020-11-02  8:02   ` Xiao Ni
  0 siblings, 0 replies; 8+ messages in thread
From: Xiao Ni @ 2020-11-02  8:02 UTC (permalink / raw)
  To: Christopher Unkel, linux-raid, Song Liu, Christoph Hellwig; +Cc: linux-kernel



On 10/30/2020 04:13 AM, Christopher Unkel wrote:
> Writes of the md superblock are aligned to the logical blocks of the
> containing device, but no attempt is made to align them to physical
> block boundaries.  This means that on a "512e" device (4k physical, 512
> logical) every superblock update hits the 512-byte emulation and the
> possible associated performance penalty.
>
> Respect the physical block alignment when possible, that is, when the
> write padded out to the physical block doesn't run into the data or
> bitmap.
>
> Signed-off-by: Christopher Unkel <cunkel@drivescale.com>
> ---
> This series replaces the first patch of the previous series
> (https://lkml.org/lkml/2020/10/22/1058), with the following changes:
>
> 1. Creates a helper function super_1_sb_length_ok().
> 2. Fixes operator placement style violation.
> 3. Covers case in super_1_sync().
> 4. Refactors duplicate logic.
> 5. Covers a case in existing code where aligned superblock could
>     run into bitmap.
>
>   drivers/md/md.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d6a55ca1d52e..802a9a256fe5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1646,15 +1646,52 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb)
>   	return cpu_to_le32(csum);
>   }
>   
> +static int
> +super_1_sb_length_ok(struct md_rdev *rdev, int minor_version, int sb_len)
> +{
> +	int sectors = sb_len / 512;
> +	struct mdp_superblock_1 *sb;
> +
> +	/* superblock is stored in memory as a single page */
> +	if (sb_len > PAGE_SIZE)
> +		return 0;
> +
> +	/* check if sb runs into data */
> +	if (minor_version) {
> +		if (rdev->sb_start + sectors > rdev->data_offset
> +		    || rdev->sb_start + sectors > rdev->new_data_offset)
> +			return 0;
> +	} else if (sb_len > 4096)
> +		return 0;
> +
> +	/* check if sb runs into bitmap */
> +	sb = page_address(rdev->sb_page);
> +	if (le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) {
> +		__s32 bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
> +		if (bitmap_offset > 0 && sectors > bitmap_offset)
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +
For super1.0 it doesn't need to consider this. Because the data and 
bitmap is before superblock.
For super1.1 and 1.2 it only needs to check whether it runs into bitmap. 
The data is behind the
bitmap.

Regards
Xiao
>   /*
>    * set rdev->sb_size to that required for number of devices in array
>    * with appropriate padding to underlying sectors
>    */
>   static void
> -super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev)
> +super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev, int minor_version)
>   {
>   	int sb_size = max_dev * 2 + 256;
> -	rdev->sb_size = round_up(sb_size, bdev_logical_block_size(rdev->bdev));
> +	int pb_aligned_size = round_up(sb_size,
> +				       bdev_physical_block_size(rdev->bdev));
> +
> +	/* generate physical-block aligned writes if legal */
> +	if (super_1_sb_length_ok(rdev, minor_version, pb_aligned_size))
> +		rdev->sb_size = pb_aligned_size;
> +	else
> +		rdev->sb_size = round_up(sb_size,
> +					 bdev_logical_block_size(rdev->bdev));
>   }
>   
>   static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version)
> @@ -1730,7 +1767,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_
>   		rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset);
>   	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
>   
> -	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev));
> +	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev), minor_version);
>   
>   	if (minor_version
>   	    && rdev->data_offset < sb_start + (rdev->sb_size/512))
> @@ -2140,7 +2177,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
>   
>   	if (max_dev > le32_to_cpu(sb->max_dev)) {
>   		sb->max_dev = cpu_to_le32(max_dev);
> -		super_1_set_rdev_sb_size(rdev, max_dev);
> +		super_1_set_rdev_sb_size(rdev, max_dev, mddev->minor_version);
>   	} else
>   		max_dev = le32_to_cpu(sb->max_dev);
>   


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] md superblock write alignment on 512e devices
  2020-11-02  7:42 ` [PATCH v2 0/3] md superblock write alignment on 512e devices Xiao Ni
@ 2020-11-03 20:12   ` Chris Unkel
  2020-11-05  3:29     ` Xiao Ni
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Unkel @ 2020-11-03 20:12 UTC (permalink / raw)
  To: Xiao Ni; +Cc: linux-raid, Song Liu, Christoph Hellwig, open list

Hi Xiao,

Thanks for the excellent feedback.  Since bitmap_offset appears to be
a free-form field, it wasn't apparent to me that the bitmap never
starts within 4K of the bitmap.

I don't think it's worth worrying about a logical block size that's
more than 4K here--from what I can see logical block size larger than
the usual 4K page isn't going to happen.

I do think that it makes sense to handle the case where the physical
block size is more than 4K.  I think what you propose works, but I
think in the physical block > MAX_SB_SIZE case it makes more sense to
align the superblock writes to the physical block size (as now) rather
than rejecting the create/assemble.  Mounting with the possible
performance hit seems like a better outcome for the user in that case
than refusing to assemble.
It's the same check that would have to be written to reject the
assembly in that case and so the code shouldn't really be any more
complex.

So basically what I propose is:  if the physical block size is no
larger than MAX_SB_SIZE, pad to that; otherwise pad to to
logical_block_size, that is, replace queue_logical_block_size()
with something equivalent to:

    queue_physical_block_size(...) > MAX_SB_SIZE ?
queue_logical_block_size(...) : queue_physical_block_size(...)

which is simple, safe in all cases, doesn't reject any feasible
assembly, and generates aligned sb writes on all common current
devices (512n,4kn,512e.)

What do you think?

Regards,

  --Chris

On Sun, Nov 1, 2020 at 11:43 PM Xiao Ni <xni@redhat.com> wrote:
>
>
>
> On 10/30/2020 04:13 AM, Christopher Unkel wrote:
> > Hello,
> >
> > Thanks for the feedback on the previous patch series.
> >
> > A updated patch series with the same function as the first patch
> > (https://lkml.org/lkml/2020/10/22/1058 "md: align superblock writes to
> > physical blocks") follows.
> >
> > As suggested, it introduces a helper function, which can be used to
> > reduce some code duplication.  It handles the case in super_1_sync()
> > where the superblock is extended by the addition of new component
> > devices.
> >
> > I think it also fixes a bug where the existing code in super_1_load()
> > ought to be rejecting the array with EINVAL: if the superblock padded
> > out to the *logical* block length runs into the bitmap.  For example, if
> > the bitmap offset is 2 (bitmap 1K after superblock) and the logical
> > block size is 4K, the superblock padded out to 4K runs into the bitmap.
> > This case may be unusual (perhaps only happens if the array is created
> > on a 512n device and then raw contents are copied onto a 4kn device) but
> > I think it is possible.
> Hi Chris
> For super1.1 and super1.2 bitmap offset is 8. It's a fixed value. So it
> should
> not have the risk?
>
> But for future maybe it has this problem. If the disk logical or
> physical block size
> is larger than 4K in future, it has data corruption risk.
> >
> > With respect to the option of simply replacing
> > queue_logical_block_size() with queue_physical_block_size(), I think
> > this can result in the code rejecting devices that can be loaded, but
> In mdadm it defines the max super size of super1 is 4096
> #define MAX_SB_SIZE 4096
> /* bitmap super size is 256, but we round up to a sector for alignment */
> #define BM_SUPER_SIZE 512
> #define MAX_DEVS ((int)(MAX_SB_SIZE - sizeof(struct mdp_superblock_1)) / 2)
> #define SUPER1_SIZE     (MAX_SB_SIZE + BM_SUPER_SIZE \
>                           + sizeof(struct misc_dev_info))
>
> It should be ok to replace queue_logical_block_size with
> queue_physical_block_size?
> Now it doesn't check physical block size and super block size. For
> super1, we can add
> a check that if physical block size is larger than MAX_SB_SIZE, then we
> reject to create/assmble
> the raid device.
> > for which the physical block alignment can't be respected--the longer
> > padded size would trigger the EINVAL cases testing against
> > data_offset/new_data_offset.  I think it's better to proceed in such
> > cases, just with unaligned superblock writes as would presently happen.
> > Also if I'm right about the above bug, then I think this subsitution
> > would be more likely to trigger it.
> >
> > Thanks,
> >
> >    --Chris
> >
> >
> > Christopher Unkel (3):
> >    md: factor out repeated sb alignment logic
> >    md: align superblock writes to physical blocks
> >    md: reuse sb length-checking logic
> >
> >   drivers/md/md.c | 69 +++++++++++++++++++++++++++++++++++++------------
> >   1 file changed, 52 insertions(+), 17 deletions(-)
> >
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/3] md superblock write alignment on 512e devices
  2020-11-03 20:12   ` Chris Unkel
@ 2020-11-05  3:29     ` Xiao Ni
  0 siblings, 0 replies; 8+ messages in thread
From: Xiao Ni @ 2020-11-05  3:29 UTC (permalink / raw)
  To: Chris Unkel; +Cc: linux-raid, Song Liu, Christoph Hellwig, open list



On 11/04/2020 04:12 AM, Chris Unkel wrote:
> Hi Xiao,
>
> Thanks for the excellent feedback.  Since bitmap_offset appears to be
> a free-form field, it wasn't apparent to me that the bitmap never
> starts within 4K of the bitmap.
>
> I don't think it's worth worrying about a logical block size that's
> more than 4K here--from what I can see logical block size larger than
> the usual 4K page isn't going to happen.
>
> I do think that it makes sense to handle the case where the physical
> block size is more than 4K.  I think what you propose works, but I
> think in the physical block > MAX_SB_SIZE case it makes more sense to
> align the superblock writes to the physical block size (as now) rather
Is it a typo error? You want to say if physical block > MAX_SB_SIZE, it 
should align the
superblock writes to logical block size? Because I see the comments 
below, your solution
is to align to logical block size when physical block > MAX_SB_SIZE.
> than rejecting the create/assemble.  Mounting with the possible
> performance hit seems like a better outcome for the user in that case
> than refusing to assemble.
> It's the same check that would have to be written to reject the
> assembly in that case and so the code shouldn't really be any more
> complex.
>
> So basically what I propose is:  if the physical block size is no
> larger than MAX_SB_SIZE, pad to that; otherwise pad to to
> logical_block_size, that is, replace queue_logical_block_size()
> with something equivalent to:
>
>      queue_physical_block_size(...) > MAX_SB_SIZE ?
> queue_logical_block_size(...) : queue_physical_block_size(...)
>
> which is simple, safe in all cases, doesn't reject any feasible
> assembly, and generates aligned sb writes on all common current
> devices (512n,4kn,512e.)
>
> What do you think?
Yes, It's a nice solution :)

Regards
Xiao


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-05  3:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-29 20:13 [PATCH v2 0/3] md superblock write alignment on 512e devices Christopher Unkel
2020-10-29 20:13 ` [PATCH v2 1/3] md: factor out repeated sb alignment logic Christopher Unkel
2020-10-29 20:13 ` [PATCH 2/3] md: align superblock writes to physical blocks Christopher Unkel
2020-11-02  8:02   ` Xiao Ni
2020-10-29 20:13 ` [PATCH 3/3] md: reuse sb length-checking logic Christopher Unkel
2020-11-02  7:42 ` [PATCH v2 0/3] md superblock write alignment on 512e devices Xiao Ni
2020-11-03 20:12   ` Chris Unkel
2020-11-05  3:29     ` Xiao Ni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).