* [PATCH] block: align max append sectors to physical block size
@ 2020-07-16 10:09 Johannes Thumshirn
  2020-07-16 14:34 ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Johannes Thumshirn @ 2020-07-16 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Johannes Thumshirn

Max append sectors needs to be aligned to physical block size, otherwise
we can end up in a situation where it's off by 1-3 sectors which would
cause short writes with asynchronous zone append submissions from an FS.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
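As a quick standalone illustration of the rounding below (the 255-sector and
4 KiB figures are just example numbers, not taken from a specific device):

/*
 * 255 x 512 B sectors is not a multiple of a 4 KiB physical block, so the
 * limit rounds down to 248 sectors. ALIGN_DOWN is mocked here for the
 * power-of-two case, which matches the kernel macro's behaviour there.
 */
#include <stdio.h>

#define ALIGN_DOWN(x, a)	((x) & ~((__typeof__(x))(a) - 1))

int main(void)
{
	unsigned int max_sectors = 255;	/* in 512 B sectors */
	unsigned int phys = 4096;	/* physical block size in bytes */
	unsigned int aligned;

	aligned = ALIGN_DOWN(max_sectors << 9, phys) >> 9;
	printf("max_sectors %u -> %u\n", max_sectors, aligned); /* 255 -> 248 */
	return 0;
}
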
 block/blk-settings.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9a2c23cd9700..d75c4cc34a7a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -231,6 +231,7 @@ EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
 void blk_queue_max_zone_append_sectors(struct request_queue *q,
 		unsigned int max_zone_append_sectors)
 {
+	unsigned int phys = queue_physical_block_size(q);
 	unsigned int max_sectors;
 
 	if (WARN_ON(!blk_queue_is_zoned(q)))
@@ -246,6 +247,13 @@ void blk_queue_max_zone_append_sectors(struct request_queue *q,
 	 */
 	WARN_ON(!max_sectors);
 
+	/*
+	 * Max append sectors needs to be aligned to physical block size,
+	 * otherwise we can end up in a situation where it's off by 1-3 sectors
+	 * which would cause short writes with asynchronous zone append
+	 * submissions from an FS.
+	 */
+	max_sectors = ALIGN_DOWN(max_sectors << 9, phys) >> 9;
 	q->limits.max_zone_append_sectors = max_sectors;
 }
 EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
-- 
2.26.2



* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-16 10:09 [PATCH] block: align max append sectors to physical block size Johannes Thumshirn
@ 2020-07-16 14:34 ` Christoph Hellwig
  2020-07-17  2:45   ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2020-07-16 14:34 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Jens Axboe, linux-block

On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
> Max append sectors needs to be aligned to physical block size, otherwise
> we can end up in a situation where it's off by 1-3 sectors which would
> cause short writes with asynchronous zone append submissions from an FS.

Huh? The physical block size is purely a hint.

> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  block/blk-settings.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9a2c23cd9700..d75c4cc34a7a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -231,6 +231,7 @@ EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>  void blk_queue_max_zone_append_sectors(struct request_queue *q,
>  		unsigned int max_zone_append_sectors)
>  {
> +	unsigned int phys = queue_physical_block_size(q);
>  	unsigned int max_sectors;
>  
>  	if (WARN_ON(!blk_queue_is_zoned(q)))
> @@ -246,6 +247,13 @@ void blk_queue_max_zone_append_sectors(struct request_queue *q,
>  	 */
>  	WARN_ON(!max_sectors);
>  
> +	/*
> +	 * Max append sectors needs to be aligned to physical block size,
> +	 * otherwise we can end up in a situation where it's off by 1-3 sectors
> +	 * which would cause short writes with asynchronous zone append
> +	 * submissions from an FS.
> +	 */
> +	max_sectors = ALIGN_DOWN(max_sectors << 9, phys) >> 9;
>  	q->limits.max_zone_append_sectors = max_sectors;
>  }
>  EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
> -- 
> 2.26.2
> 
---end quoted text---


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-16 14:34 ` Christoph Hellwig
@ 2020-07-17  2:45   ` Damien Le Moal
  2020-07-17  7:50     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2020-07-17  2:45 UTC (permalink / raw)
  To: hch, Johannes Thumshirn; +Cc: Jens Axboe, linux-block

On 2020/07/16 23:35, Christoph Hellwig wrote:
> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
>> Max append sectors needs to be aligned to physical block size, otherwise
>> we can end up in a situation where it's off by 1-3 sectors which would
>> cause short writes with asynchronous zone append submissions from an FS.
> 
> Huh? The physical block size is purely a hint.

For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
on 512e disks. There is also nullblk, which defaults max_hw_sectors_kb to
255 x 512B sectors, which is not 4K aligned if the nullb device is created with
a 4K block size.

So we can either fix each driver separately to ensure that we end up with a
sensible max_hw_sectors_kb, and by extension a sensible value for
max_zone_append_sectors, or force this alignment in blk_queue_max_hw_sectors().

Right now, without any fix, there are setups that end up with weird values that
the user has to sort through and may have a hard time making sense of.

> 
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  block/blk-settings.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 9a2c23cd9700..d75c4cc34a7a 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -231,6 +231,7 @@ EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);
>>  void blk_queue_max_zone_append_sectors(struct request_queue *q,
>>  		unsigned int max_zone_append_sectors)
>>  {
>> +	unsigned int phys = queue_physical_block_size(q);
>>  	unsigned int max_sectors;
>>  
>>  	if (WARN_ON(!blk_queue_is_zoned(q)))
>> @@ -246,6 +247,13 @@ void blk_queue_max_zone_append_sectors(struct request_queue *q,
>>  	 */
>>  	WARN_ON(!max_sectors);
>>  
>> +	/*
>> +	 * Max append sectors needs to be aligned to physical block size,
>> +	 * otherwise we can end up in a situation where it's off by 1-3 sectors
>> +	 * which would cause short writes with asynchronous zone append
>> +	 * submissions from an FS.
>> +	 */
>> +	max_sectors = ALIGN_DOWN(max_sectors << 9, phys) >> 9;
>>  	q->limits.max_zone_append_sectors = max_sectors;
>>  }
>>  EXPORT_SYMBOL_GPL(blk_queue_max_zone_append_sectors);
>> -- 
>> 2.26.2
>>
> ---end quoted text---
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17  2:45   ` Damien Le Moal
@ 2020-07-17  7:50     ` Ming Lei
  2020-07-17  8:22       ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-07-17  7:50 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
> On 2020/07/16 23:35, Christoph Hellwig wrote:
> > On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
> >> Max append sectors needs to be aligned to physical block size, otherwise
> >> we can end up in a situation where it's off by 1-3 sectors which would
> >> cause short writes with asynchronous zone append submissions from an FS.
> > 
> > Huh? The physical block size is purely a hint.
> 
> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.

Then the physical block size should be the same as the logical block size.
The real working limit for an I/O request is aligned to the logical block
size.

> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
> 4K block size.

Actually the real limit is max_sectors_kb, which is <= max_hw_sectors_kb, and
both should be aligned to the logical block size, IMO.


Thanks, 
Ming



* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17  7:50     ` Ming Lei
@ 2020-07-17  8:22       ` Damien Le Moal
  2020-07-17  9:11         ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2020-07-17  8:22 UTC (permalink / raw)
  To: Ming Lei; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On 2020/07/17 16:50, Ming Lei wrote:
> On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
>> On 2020/07/16 23:35, Christoph Hellwig wrote:
>>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
>>>> Max append sectors needs to be aligned to physical block size, otherwise
>>>> we can end up in a situation where it's off by 1-3 sectors which would
>>>> cause short writes with asynchronous zone append submissions from an FS.
>>>
>>> Huh? The physical block size is purely a hint.
>>
>> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
> 
> Then the physical block size should be same with logical block size.
> The real workable limit for io request is aligned with logical block size.

Yes, I know. This T10/T13 design is not the brightest thing they did... on 512e
SMR drives, addressing is in 512B LBA units, but all writes in sequential zones
must be 4K aligned (8 LBAs).
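As a small sketch of that constraint (the helper below is made up purely for
illustration), a write to a sequential zone on such a 512e drive must start
and end on an 8-LBA boundary:

#include <stdbool.h>

/* 512e SMR: 512 B LBAs, but sequential-zone writes need 4 KiB alignment */
bool seq_zone_write_is_aligned(unsigned long long start_lba,
			       unsigned int nr_lbas)
{
	const unsigned int lbas_per_phys = 4096 / 512;	/* 8 LBAs */

	return (start_lba % lbas_per_phys) == 0 &&
	       (nr_lbas % lbas_per_phys) == 0;
}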

> 
>> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
>> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
>> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
>> 4K block size.
> 
> Actually the real limit is from max_sectors_kb which is <= max_hw_sectors_kb, and
> both should be aligned with logical block size, IMO.

Yes, agreed, but for a nullblk device created with block size = 4K it is not. So
that is one driver to patch for sure. However, I thought having some forced
alignment in blk_queue_max_hw_sectors() for limit->max_hw_sectors and
limit->max_sectors would avoid tripping on weird values for weird drives...



-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17  8:22       ` Damien Le Moal
@ 2020-07-17  9:11         ` Ming Lei
  2020-07-17  9:19           ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-07-17  9:11 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On Fri, Jul 17, 2020 at 08:22:45AM +0000, Damien Le Moal wrote:
> On 2020/07/17 16:50, Ming Lei wrote:
> > On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
> >> On 2020/07/16 23:35, Christoph Hellwig wrote:
> >>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
> >>>> Max append sectors needs to be aligned to physical block size, otherwise
> >>>> we can end up in a situation where it's off by 1-3 sectors which would
> >>>> cause short writes with asynchronous zone append submissions from an FS.
> >>>
> >>> Huh? The physical block size is purely a hint.
> >>
> >> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
> > 
> > Then the physical block size should be same with logical block size.
> > The real workable limit for io request is aligned with logical block size.
> 
> Yes, I know. This T10/T13 design is not the brightest thing they did... on 512e
> SMR drives, addressing is LBA=512B unit, but all writes in sequential zones must
> be 4K aligned (8 LBAs).

Then the issue isn't related to the zone append command only. Just wondering how
this special write block size alignment is enforced in sequential zones. So far,
writes from an FS or the raw block device are only logical block size aligned.

> 
> > 
> >> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
> >> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
> >> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
> >> 4K block size.
> > 
> > Actually the real limit is from max_sectors_kb which is <= max_hw_sectors_kb, and
> > both should be aligned with logical block size, IMO.
> 
> Yes, agreed, but for nullblk device created with block size = 4K it is not. So

That is because of the default magic number, BLK_SAFE_MAX_SECTORS.
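For reference, that default comes from the blk_default_limits enum in
include/linux/blkdev.h, which currently looks roughly like:

enum blk_default_limits {
	BLK_MAX_SEGMENTS	= 128,
	BLK_SAFE_MAX_SECTORS	= 255,	/* 255 x 512 B sectors */
	BLK_DEF_MAX_SECTORS	= 2560,
	...
};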

> one driver to patch for sure. However, I though having some forced alignment in
> blk_queue_max_hw_sectors() for limit->max_hw_sectors and limit->max_sectors
> would avoid tripping on weird values for weird drives...

Maybe we can update it once the logical block size is available, such
as:

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 9a2c23cd9700..f9cbaadaa267 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -311,6 +311,14 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 }
 EXPORT_SYMBOL(blk_queue_max_segment_size);
 
+static unsigned blk_queue_round_sectors(struct request_queue *q,
+		unsigned sectors)
+{
+	u64 bytes = (u64)sectors << 9;
+
+	return (unsigned)(round_down(bytes, queue_logical_block_size(q)) >> 9);
+}
+
 /**
  * blk_queue_logical_block_size - set logical block size for the queue
  * @q:  the request queue for the device
@@ -330,6 +338,9 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
 
 	if (q->limits.io_min < q->limits.physical_block_size)
 		q->limits.io_min = q->limits.physical_block_size;
+
+	q->limits.max_sectors = blk_queue_round_sectors(q,
+			q->limits.max_sectors);
 }
 EXPORT_SYMBOL(blk_queue_logical_block_size);
 


Thanks,
Ming



* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17  9:11         ` Ming Lei
@ 2020-07-17  9:19           ` Damien Le Moal
  2020-07-17 10:02             ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2020-07-17  9:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On 2020/07/17 18:12, Ming Lei wrote:
> On Fri, Jul 17, 2020 at 08:22:45AM +0000, Damien Le Moal wrote:
>> On 2020/07/17 16:50, Ming Lei wrote:
>>> On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
>>>> On 2020/07/16 23:35, Christoph Hellwig wrote:
>>>>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
>>>>>> Max append sectors needs to be aligned to physical block size, otherwise
>>>>>> we can end up in a situation where it's off by 1-3 sectors which would
>>>>>> cause short writes with asynchronous zone append submissions from an FS.
>>>>>
>>>>> Huh? The physical block size is purely a hint.
>>>>
>>>> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
>>>
>>> Then the physical block size should be same with logical block size.
>>> The real workable limit for io request is aligned with logical block size.
>>
>> Yes, I know. This T10/T13 design is not the brightest thing they did... on 512e
>> SMR drives, addressing is LBA=512B unit, but all writes in sequential zones must
>> be 4K aligned (8 LBAs).
> 
> Then the issue isn't related with zone append command only. Just wondering how this
> special write block size alignment is enhanced in sequential zones. So far, write
> from FS or raw block size is only logical block size aligned.

This is not enforced in sd/sd_zbc.c. If the user issues a non 4K aligned
request, it will get back an "unaligned write" error from the drive. zonefs and
dm-zoned define a 4K block size to avoid that. For applications doing raw block
device accesses, they have to issue properly aligned writes.

> 
>>
>>>
>>>> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
>>>> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
>>>> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
>>>> 4K block size.
>>>
>>> Actually the real limit is from max_sectors_kb which is <= max_hw_sectors_kb, and
>>> both should be aligned with logical block size, IMO.
>>
>> Yes, agreed, but for nullblk device created with block size = 4K it is not. So
> 
> That is because the default magic number of BLK_SAFE_MAX_SECTORS.
> 
>> one driver to patch for sure. However, I though having some forced alignment in
>> blk_queue_max_hw_sectors() for limit->max_hw_sectors and limit->max_sectors
>> would avoid tripping on weird values for weird drives...
> 
> Maybe we can update it once the logical block size is available, such
> as:
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 9a2c23cd9700..f9cbaadaa267 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -311,6 +311,14 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
>  }
>  EXPORT_SYMBOL(blk_queue_max_segment_size);
>  
> +static unsigned blk_queue_round_sectors(struct request_queue *q,
> +		unsigned sectors)
> +{
> +	u64 bytes = sectors << 9;
> +
> +	return (unsigned)round_down(bytes, queue_logical_block_size(q));
> +}
> +
>  /**
>   * blk_queue_logical_block_size - set logical block size for the queue
>   * @q:  the request queue for the device
> @@ -330,6 +338,9 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
>  
>  	if (q->limits.io_min < q->limits.physical_block_size)
>  		q->limits.io_min = q->limits.physical_block_size;
> +
> +	q->limits.max_sectors = blk_queue_round_sectors(q,
> +			q->limits.max_sectors)
>  }
>  EXPORT_SYMBOL(blk_queue_logical_block_size);

Yes, something like this was what I had in mind so that 4Kn drives get a
sensible value aligned to the 4K LBA, always. However, with the above, there is
no guarantee that max_sectors is already set when the logical block size is set.
I am not sure about the reverse either. So the rounding may need to be in both
blk_queue_logical_block_size() and blk_queue_max_hw_sectors().
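Something along these lines, as a rough sketch only (the helper name is made
up; queue_logical_block_size() and round_down() are the existing helpers):

/*
 * Sketch: round the sector limits down to the logical block size in one
 * place, so it can be called from both blk_queue_logical_block_size() and
 * blk_queue_max_hw_sectors(), whichever the driver happens to set last.
 */
static void blk_round_sector_limits(struct request_queue *q)
{
	unsigned int lbs_sectors = queue_logical_block_size(q) >> 9;

	if (!lbs_sectors)
		return;

	q->limits.max_hw_sectors = round_down(q->limits.max_hw_sectors,
					      lbs_sectors);
	q->limits.max_sectors = round_down(q->limits.max_sectors,
					   lbs_sectors);
}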


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17  9:19           ` Damien Le Moal
@ 2020-07-17 10:02             ` Ming Lei
  2020-07-17 10:55               ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-07-17 10:02 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On Fri, Jul 17, 2020 at 09:19:50AM +0000, Damien Le Moal wrote:
> On 2020/07/17 18:12, Ming Lei wrote:
> > On Fri, Jul 17, 2020 at 08:22:45AM +0000, Damien Le Moal wrote:
> >> On 2020/07/17 16:50, Ming Lei wrote:
> >>> On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
> >>>> On 2020/07/16 23:35, Christoph Hellwig wrote:
> >>>>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
> >>>>>> Max append sectors needs to be aligned to physical block size, otherwise
> >>>>>> we can end up in a situation where it's off by 1-3 sectors which would
> >>>>>> cause short writes with asynchronous zone append submissions from an FS.
> >>>>>
> >>>>> Huh? The physical block size is purely a hint.
> >>>>
> >>>> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
> >>>
> >>> Then the physical block size should be same with logical block size.
> >>> The real workable limit for io request is aligned with logical block size.
> >>
> >> Yes, I know. This T10/T13 design is not the brightest thing they did... on 512e
> >> SMR drives, addressing is LBA=512B unit, but all writes in sequential zones must
> >> be 4K aligned (8 LBAs).
> > 
> > Then the issue isn't related with zone append command only. Just wondering how this
> > special write block size alignment is enhanced in sequential zones. So far, write
> > from FS or raw block size is only logical block size aligned.
> 
> This is not enforced in sd/sd_zbc.c. If the user issues a non 4K aligned
> request, it will get back an "unaligned write" error from the drive. zonefs and
> dm-zoned define a 4K block size to avoid that. For applications doing raw block
> device accesses, they have to issue properly aligned writes.

OK, then I guess either:

1) the same write alignment issue for zone append can be handled in the
same way as a normal write to a sequential zone

OR

2) add one new limit for writes to sequential zones, such as:
zone_write_block_size

Then the two cases can be dealt with in the same way. Also, the physical
block size is usually just a hint, as Christoph mentioned, so it looks a bit
weird to use it this way, or at least the story should be documented.

> 
> > 
> >>
> >>>
> >>>> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
> >>>> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
> >>>> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
> >>>> 4K block size.
> >>>
> >>> Actually the real limit is from max_sectors_kb which is <= max_hw_sectors_kb, and
> >>> both should be aligned with logical block size, IMO.
> >>
> >> Yes, agreed, but for nullblk device created with block size = 4K it is not. So
> > 
> > That is because the default magic number of BLK_SAFE_MAX_SECTORS.
> > 
> >> one driver to patch for sure. However, I though having some forced alignment in
> >> blk_queue_max_hw_sectors() for limit->max_hw_sectors and limit->max_sectors
> >> would avoid tripping on weird values for weird drives...
> > 
> > Maybe we can update it once the logical block size is available, such
> > as:
> > 
> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index 9a2c23cd9700..f9cbaadaa267 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -311,6 +311,14 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
> >  }
> >  EXPORT_SYMBOL(blk_queue_max_segment_size);
> >  
> > +static unsigned blk_queue_round_sectors(struct request_queue *q,
> > +		unsigned sectors)
> > +{
> > +	u64 bytes = sectors << 9;
> > +
> > +	return (unsigned)round_down(bytes, queue_logical_block_size(q));
> > +}
> > +
> >  /**
> >   * blk_queue_logical_block_size - set logical block size for the queue
> >   * @q:  the request queue for the device
> > @@ -330,6 +338,9 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
> >  
> >  	if (q->limits.io_min < q->limits.physical_block_size)
> >  		q->limits.io_min = q->limits.physical_block_size;
> > +
> > +	q->limits.max_sectors = blk_queue_round_sectors(q,
> > +			q->limits.max_sectors)
> >  }
> >  EXPORT_SYMBOL(blk_queue_logical_block_size);
> 
> Yes, something like this was what I had in mind so that 4Kn drives get a
> sensible value aligned to the 4K LBA, always. However, with the above, there is
> no guarantee that max_sectors is already set when the logical block size is set.
> I am not sure about the reverse either. So the rounding may need to be in both
> blk_queue_logical_block_size() and blk_queue_max_hw_sectors().

OK, that looks better.

Thanks,
Ming



* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17 10:02             ` Ming Lei
@ 2020-07-17 10:55               ` Damien Le Moal
  2020-07-20 11:08                 ` hch
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2020-07-17 10:55 UTC (permalink / raw)
  To: Ming Lei; +Cc: hch, Johannes Thumshirn, Jens Axboe, linux-block

On 2020/07/17 19:02, Ming Lei wrote:
> On Fri, Jul 17, 2020 at 09:19:50AM +0000, Damien Le Moal wrote:
>> On 2020/07/17 18:12, Ming Lei wrote:
>>> On Fri, Jul 17, 2020 at 08:22:45AM +0000, Damien Le Moal wrote:
>>>> On 2020/07/17 16:50, Ming Lei wrote:
>>>>> On Fri, Jul 17, 2020 at 02:45:25AM +0000, Damien Le Moal wrote:
>>>>>> On 2020/07/16 23:35, Christoph Hellwig wrote:
>>>>>>> On Thu, Jul 16, 2020 at 07:09:33PM +0900, Johannes Thumshirn wrote:
>>>>>>>> Max append sectors needs to be aligned to physical block size, otherwise
>>>>>>>> we can end up in a situation where it's off by 1-3 sectors which would
>>>>>>>> cause short writes with asynchronous zone append submissions from an FS.
>>>>>>>
>>>>>>> Huh? The physical block size is purely a hint.
>>>>>>
>>>>>> For ZBC/ZAC SMR drives, all writes must be aligned to the physical sector size.
>>>>>
>>>>> Then the physical block size should be same with logical block size.
>>>>> The real workable limit for io request is aligned with logical block size.
>>>>
>>>> Yes, I know. This T10/T13 design is not the brightest thing they did... on 512e
>>>> SMR drives, addressing is LBA=512B unit, but all writes in sequential zones must
>>>> be 4K aligned (8 LBAs).
>>>
>>> Then the issue isn't related with zone append command only. Just wondering how this
>>> special write block size alignment is enhanced in sequential zones. So far, write
>>> from FS or raw block size is only logical block size aligned.
>>
>> This is not enforced in sd/sd_zbc.c. If the user issues a non 4K aligned
>> request, it will get back an "unaligned write" error from the drive. zonefs and
>> dm-zoned define a 4K block size to avoid that. For applications doing raw block
>> device accesses, they have to issue properly aligned writes.
> 
> OK, then I guess either:
> 
> 1) the same write alignment issue for zone append can be handled in
> same way with normal write on seq zone

Yes, absolutely. I think this is fine. The point was that we at least should be
exposing a meaningful maximum command size for that, so that the user does not
have to align that maximum too.

> 
> OR
> 
> 2) add one new limit for write on seq zone, such as: zone_write_block_size
> 
> Then the two cases can be dealt with in same way, and physical block
> size is usually a hint as Christoph mentioned, looks a bit weird to use
> it in this way, or at least the story should be documented.

Yeah, but zone_write_block_size would end up always being equal to the physical
block size for SMR. For ZNS and nullblk, logical block size == physical block
size, always, so it would not be useful. I do not think such a change is necessary.

> 
>>
>>>
>>>>
>>>>>
>>>>>> However, sd/sd_zbc does not change max_hw_sectors_kb to ensure alignment to 4K
>>>>>> on 512e disks. There is also nullblk which uses the default max_hw_sectors_kb to
>>>>>> 255 x 512B sectors, which is not 4K aligned if the nullb device is created with
>>>>>> 4K block size.
>>>>>
>>>>> Actually the real limit is from max_sectors_kb which is <= max_hw_sectors_kb, and
>>>>> both should be aligned with logical block size, IMO.
>>>>
>>>> Yes, agreed, but for nullblk device created with block size = 4K it is not. So
>>>
>>> That is because the default magic number of BLK_SAFE_MAX_SECTORS.
>>>
>>>> one driver to patch for sure. However, I though having some forced alignment in
>>>> blk_queue_max_hw_sectors() for limit->max_hw_sectors and limit->max_sectors
>>>> would avoid tripping on weird values for weird drives...
>>>
>>> Maybe we can update it once the logical block size is available, such
>>> as:
>>>
>>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>>> index 9a2c23cd9700..f9cbaadaa267 100644
>>> --- a/block/blk-settings.c
>>> +++ b/block/blk-settings.c
>>> @@ -311,6 +311,14 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
>>>  }
>>>  EXPORT_SYMBOL(blk_queue_max_segment_size);
>>>  
>>> +static unsigned blk_queue_round_sectors(struct request_queue *q,
>>> +		unsigned sectors)
>>> +{
>>> +	u64 bytes = sectors << 9;
>>> +
>>> +	return (unsigned)round_down(bytes, queue_logical_block_size(q));
>>> +}
>>> +
>>>  /**
>>>   * blk_queue_logical_block_size - set logical block size for the queue
>>>   * @q:  the request queue for the device
>>> @@ -330,6 +338,9 @@ void blk_queue_logical_block_size(struct request_queue *q, unsigned int size)
>>>  
>>>  	if (q->limits.io_min < q->limits.physical_block_size)
>>>  		q->limits.io_min = q->limits.physical_block_size;
>>> +
>>> +	q->limits.max_sectors = blk_queue_round_sectors(q,
>>> +			q->limits.max_sectors)
>>>  }
>>>  EXPORT_SYMBOL(blk_queue_logical_block_size);
>>
>> Yes, something like this was what I had in mind so that 4Kn drives get a
>> sensible value aligned to the 4K LBA, always. However, with the above, there is
>> no guarantee that max_sectors is already set when the logical block size is set.
>> I am not sure about the reverse either. So the rounding may need to be in both
>> blk_queue_logical_block_size() and blk_queue_max_hw_sectors().
> 
> OK, that looks better.

OK.

Johannes, care to resend your patch ?

> 
> Thanks,
> Ming
> 
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-17 10:55               ` Damien Le Moal
@ 2020-07-20 11:08                 ` hch
  2020-07-20 12:32                   ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: hch @ 2020-07-20 11:08 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Ming Lei, hch, Johannes Thumshirn, Jens Axboe, linux-block

On Fri, Jul 17, 2020 at 10:55:49AM +0000, Damien Le Moal wrote:
> > 
> > 2) add one new limit for write on seq zone, such as: zone_write_block_size
> > 
> > Then the two cases can be dealt with in same way, and physical block
> > size is usually a hint as Christoph mentioned, looks a bit weird to use
> > it in this way, or at least the story should be documented.
> 
> Yeah, but zone_write_block_size would end up always being equal to the physical
> block size for SMR. For ZNS and nullblk, logical block size == physical block
> size, always, so it would not be useful. I do not think such change is necessary.

I think we should add a write_block_size (or write_granularity) field.
There have been some early-stage NVMe proposals to add that even for
conventional random-write namespaces.
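Roughly something like this, as a sketch only (field and helper names are
hypothetical, nothing like this exists yet):

/*
 * Hypothetical: a write granularity limit, defaulting to the logical block
 * size, that ZBC/ZAC 512e SMR (and possibly future NVMe) drivers would bump
 * to their actual minimum write size.
 */
static inline void blk_queue_write_granularity(struct request_queue *q,
					       unsigned int size)
{
	q->limits.write_granularity = size;	/* new queue_limits field */
}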


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-20 11:08                 ` hch
@ 2020-07-20 12:32                   ` Damien Le Moal
  2020-07-20 12:34                     ` hch
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2020-07-20 12:32 UTC (permalink / raw)
  To: hch; +Cc: Ming Lei, Johannes Thumshirn, Jens Axboe, linux-block

On 2020/07/20 20:08, hch@infradead.org wrote:
> On Fri, Jul 17, 2020 at 10:55:49AM +0000, Damien Le Moal wrote:
>>>
>>> 2) add one new limit for write on seq zone, such as: zone_write_block_size
>>>
>>> Then the two cases can be dealt with in same way, and physical block
>>> size is usually a hint as Christoph mentioned, looks a bit weird to use
>>> it in this way, or at least the story should be documented.
>>
>> Yeah, but zone_write_block_size would end up always being equal to the physical
>> block size for SMR. For ZNS and nullblk, logical block size == physical block
>> size, always, so it would not be useful. I do not think such change is necessary.
> 
> I think we should add a write_block_size (or write_granularity) field.
> There have been some early stage NVMe proposal to add that even for
> conventional random/write namespaces.

OK. We can add that easily enough. The default value will be the logical block
size, and only ZBC/ZAC SMR drives will need to set it to the physical block size.

But for regular 4Kn drives, including all logical drives like null_blk, I think
it would still be nice to have max_hw_sectors and max_sectors aligned to 4K.
We can enforce that generically in the block layer when setting these limits, or
audit drivers and correct those setting weird values (like null_blk). Which
approach do you think is better?


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-20 12:32                   ` Damien Le Moal
@ 2020-07-20 12:34                     ` hch
  2020-07-20 12:35                       ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: hch @ 2020-07-20 12:34 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: hch, Ming Lei, Johannes Thumshirn, Jens Axboe, linux-block

On Mon, Jul 20, 2020 at 12:32:48PM +0000, Damien Le Moal wrote:
> But for regular 4Kn drives, including all logical drives like null_blk, I think
> it would still be nice to have a max_hw_sectors and max_sectors aligned on 4K.
> We can enforce that generically in the block layer when setting these limits, or
> audit drivers and correct those setting weird values (like null_blk). Which
> approach do you think is better ?

I guess we can just round down in the block layer.


* Re: [PATCH] block: align max append sectors to physical block size
  2020-07-20 12:34                     ` hch
@ 2020-07-20 12:35                       ` Damien Le Moal
  0 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2020-07-20 12:35 UTC (permalink / raw)
  To: hch; +Cc: Ming Lei, Johannes Thumshirn, Jens Axboe, linux-block

On 2020/07/20 21:34, hch@infradead.org wrote:
> On Mon, Jul 20, 2020 at 12:32:48PM +0000, Damien Le Moal wrote:
>> But for regular 4Kn drives, including all logical drives like null_blk, I think
>> it would still be nice to have a max_hw_sectors and max_sectors aligned on 4K.
>> We can enforce that generically in the block layer when setting these limits, or
>> audit drivers and correct those setting weird values (like null_blk). Which
>> approach do you think is better ?
> 
> I guess we can just round down in the block layer.

Works for me.


-- 
Damien Le Moal
Western Digital Research

