linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme: Fix io_opt limit setting
@ 2020-05-14  1:54 Damien Le Moal
  2020-05-14  3:29 ` Martin K. Petersen
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Damien Le Moal @ 2020-05-14  1:54 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch, Sagi Grimberg,
	linux-block, Jens Axboe

Currently, a namespace io_opt queue limit is set by default to the
physical sector size of the namespace, and to the optimal write
size (NOWS) when the namespace reports this value. This causes problems
with block limits stacking in blk_stack_limits() when a namespace block
device is combined with an HDD, which generally does not report any
optimal transfer size (its io_opt limit is 0). The code:

/* Optimal I/O a multiple of the physical block size? */
if (t->io_opt & (t->physical_block_size - 1)) {
	t->io_opt = 0;
	t->misaligned = 1;
	ret = -1;
}

results in blk_stack_limits() returning an error when the combined
devices have different but compatible physical sector sizes (e.g. a 512B
sector SSD with 4KB sector disks).

Fix this by not setting the optimal I/O size limit if the namespace does
not report an optimal write size value.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/nvme/host/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f3c037f5a9ba..0729173053ed 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1809,7 +1809,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 {
 	sector_t capacity = nvme_lba_to_sect(ns, le64_to_cpu(id->nsze));
 	unsigned short bs = 1 << ns->lba_shift;
-	u32 atomic_bs, phys_bs, io_opt;
+	u32 atomic_bs, phys_bs, io_opt = 0;
 
 	if (ns->lba_shift > PAGE_SHIFT) {
 		/* unsupported block size, set capacity to 0 later */
@@ -1832,12 +1832,11 @@ static void nvme_update_disk_info(struct gendisk *disk,
 		atomic_bs = bs;
 	}
 	phys_bs = bs;
-	io_opt = bs;
 	if (id->nsfeat & (1 << 4)) {
 		/* NPWG = Namespace Preferred Write Granularity */
 		phys_bs *= 1 + le16_to_cpu(id->npwg);
 		/* NOWS = Namespace Optimal Write Size */
-		io_opt *= 1 + le16_to_cpu(id->nows);
+		io_opt = bs * (1 + le16_to_cpu(id->nows));
 	}
 
 	blk_queue_logical_block_size(disk->queue, bs);
@@ -1848,7 +1847,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	 */
 	blk_queue_physical_block_size(disk->queue, min(phys_bs, atomic_bs));
 	blk_queue_io_min(disk->queue, phys_bs);
-	blk_queue_io_opt(disk->queue, io_opt);
+	if (io_opt)
+		blk_queue_io_opt(disk->queue, io_opt);
 
 	if (ns->ms && !ns->ext &&
 	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-- 
2.25.4


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  1:54 [PATCH] nvme: Fix io_opt limit setting Damien Le Moal
@ 2020-05-14  3:29 ` Martin K. Petersen
  2020-05-14  3:40 ` Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2020-05-14  3:29 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig


Damien,

> results in blk_stack_limits() to return an error when the combined
> devices have different but compatible physical sector sizes (e.g. 512B
> sector SSD with 4KB sector disks).

We'll need to get that stacking logic fixed up to take io_opt into
account when scaling pbs/min. Just as a safety measure in case we don't
catch devices reporting crazy values in the LLDs.

> Fix this by not setting the optiomal IO size limit if the namespace

optimal

> does not report an optimal write size value.

Setting io_opt to the logical block size in the NVMe driver is
equivalent to telling the filesystems that they should not submit I/Os
larger than one sector. That makes no sense. This change is correct.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  1:54 [PATCH] nvme: Fix io_opt limit setting Damien Le Moal
  2020-05-14  3:29 ` Martin K. Petersen
@ 2020-05-14  3:40 ` Keith Busch
  2020-05-14  3:47   ` Damien Le Moal
  2020-05-14  4:47 ` Bart Van Assche
  2020-05-14  6:11 ` Hannes Reinecke
  3 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2020-05-14  3:40 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Thu, May 14, 2020 at 10:54:52AM +0900, Damien Le Moal wrote:
> Currently, a namespace io_opt queue limit is set by default to the
> physical sector size of the namespace and to the the write optimal
> size (NOWS) when the namespace reports this value. This causes problems
> with block limits stacking in blk_stack_limits() when a namespace block
> device is combined with an HDD which generally do not report any optimal
> transfer size (io_opt limit is 0). The code:
> 
> /* Optimal I/O a multiple of the physical block size? */
> if (t->io_opt & (t->physical_block_size - 1)) {
> 	t->io_opt = 0;
> 	t->misaligned = 1;
> 	ret = -1;
> }
> 
> results in blk_stack_limits() to return an error when the combined
> devices have different but compatible physical sector sizes (e.g. 512B
> sector SSD with 4KB sector disks).
> 
> Fix this by not setting the optiomal IO size limit if the namespace does
> not report an optimal write size value.

Won't this continue to break if a controller does report NOWS that's not
a multiple of the physical block size of the device it's stacking with?


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  3:40 ` Keith Busch
@ 2020-05-14  3:47   ` Damien Le Moal
  2020-05-14  4:12     ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Damien Le Moal @ 2020-05-14  3:47 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, Jens Axboe, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 2020/05/14 12:40, Keith Busch wrote:
> On Thu, May 14, 2020 at 10:54:52AM +0900, Damien Le Moal wrote:
>> Currently, a namespace io_opt queue limit is set by default to the
>> physical sector size of the namespace and to the the write optimal
>> size (NOWS) when the namespace reports this value. This causes problems
>> with block limits stacking in blk_stack_limits() when a namespace block
>> device is combined with an HDD which generally do not report any optimal
>> transfer size (io_opt limit is 0). The code:
>>
>> /* Optimal I/O a multiple of the physical block size? */
>> if (t->io_opt & (t->physical_block_size - 1)) {
>> 	t->io_opt = 0;
>> 	t->misaligned = 1;
>> 	ret = -1;
>> }
>>
>> results in blk_stack_limits() to return an error when the combined
>> devices have different but compatible physical sector sizes (e.g. 512B
>> sector SSD with 4KB sector disks).
>>
>> Fix this by not setting the optiomal IO size limit if the namespace does
>> not report an optimal write size value.
> 
> Won't this continue to break if a controller does report NOWS that's not
> a multiple of the physical block size of the device it's stacking with?

When io_opt stacking is handled, the physical sector size for the stacked device
is already resolved to a common value. If the NOWS value cannot accommodate this
resolved physical sector size, this is an incompatible stacking, so failing is
OK in that case.


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  3:47   ` Damien Le Moal
@ 2020-05-14  4:12     ` Keith Busch
  2020-05-14  4:13       ` Damien Le Moal
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2020-05-14  4:12 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-block, Jens Axboe, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Thu, May 14, 2020 at 03:47:56AM +0000, Damien Le Moal wrote:
> On 2020/05/14 12:40, Keith Busch wrote:
> > On Thu, May 14, 2020 at 10:54:52AM +0900, Damien Le Moal wrote:
> >> Currently, a namespace io_opt queue limit is set by default to the
> >> physical sector size of the namespace and to the the write optimal
> >> size (NOWS) when the namespace reports this value. This causes problems
> >> with block limits stacking in blk_stack_limits() when a namespace block
> >> device is combined with an HDD which generally do not report any optimal
> >> transfer size (io_opt limit is 0). The code:
> >>
> >> /* Optimal I/O a multiple of the physical block size? */
> >> if (t->io_opt & (t->physical_block_size - 1)) {
> >> 	t->io_opt = 0;
> >> 	t->misaligned = 1;
> >> 	ret = -1;
> >> }
> >>
> >> results in blk_stack_limits() to return an error when the combined
> >> devices have different but compatible physical sector sizes (e.g. 512B
> >> sector SSD with 4KB sector disks).
> >>
> >> Fix this by not setting the optiomal IO size limit if the namespace does
> >> not report an optimal write size value.
> > 
> > Won't this continue to break if a controller does report NOWS that's not
> > a multiple of the physical block size of the device it's stacking with?
> 
> When io_opt stacking is handled, the physical sector size for the stacked device
> is already resolved to a common value. If the NOWS value cannot accommodate this
> resolved physical sector size, this is an incompatible stacking, so failing is
> OK in that case.

I see, though it's not strictly incompatible as io_opt is merely a hint
that could continue to work if the stacked limit was recalculated as:

	if (t->io_opt & (t->physical_block_size - 1))
	 	t->io_opt = lcm(t->io_opt, t->physical_block_size);

Regardless, your patch does make sense, but it does have a merge
conflict with nvme-5.8.


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  4:12     ` Keith Busch
@ 2020-05-14  4:13       ` Damien Le Moal
  0 siblings, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2020-05-14  4:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, Jens Axboe, Christoph Hellwig, linux-nvme, Sagi Grimberg

On 2020/05/14 13:12, Keith Busch wrote:
> On Thu, May 14, 2020 at 03:47:56AM +0000, Damien Le Moal wrote:
>> On 2020/05/14 12:40, Keith Busch wrote:
>>> On Thu, May 14, 2020 at 10:54:52AM +0900, Damien Le Moal wrote:
>>>> Currently, a namespace io_opt queue limit is set by default to the
>>>> physical sector size of the namespace and to the the write optimal
>>>> size (NOWS) when the namespace reports this value. This causes problems
>>>> with block limits stacking in blk_stack_limits() when a namespace block
>>>> device is combined with an HDD which generally do not report any optimal
>>>> transfer size (io_opt limit is 0). The code:
>>>>
>>>> /* Optimal I/O a multiple of the physical block size? */
>>>> if (t->io_opt & (t->physical_block_size - 1)) {
>>>> 	t->io_opt = 0;
>>>> 	t->misaligned = 1;
>>>> 	ret = -1;
>>>> }
>>>>
>>>> results in blk_stack_limits() to return an error when the combined
>>>> devices have different but compatible physical sector sizes (e.g. 512B
>>>> sector SSD with 4KB sector disks).
>>>>
>>>> Fix this by not setting the optiomal IO size limit if the namespace does
>>>> not report an optimal write size value.
>>>
>>> Won't this continue to break if a controller does report NOWS that's not
>>> a multiple of the physical block size of the device it's stacking with?
>>
>> When io_opt stacking is handled, the physical sector size for the stacked device
>> is already resolved to a common value. If the NOWS value cannot accommodate this
>> resolved physical sector size, this is an incompatible stacking, so failing is
>> OK in that case.
> 
> I see, though it's not strictly incompatible as io_opt is merely a hint
> that could continue to work if the stacked limit was recalculated as:
> 
> 	if (t->io_opt & (t->physical_block_size - 1))
> 	 	t->io_opt = lcm(t->io_opt, t->physical_block_size);
> 
> Regardless, your patch does make sense, but it does have a merge
> conflict with nvme-5.8.

Ooops. I will rebase and resend.

And maybe we should send your suggestion above as a proper patch?

> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  1:54 [PATCH] nvme: Fix io_opt limit setting Damien Le Moal
  2020-05-14  3:29 ` Martin K. Petersen
  2020-05-14  3:40 ` Keith Busch
@ 2020-05-14  4:47 ` Bart Van Assche
  2020-05-14  4:49   ` Damien Le Moal
  2020-05-14 22:19   ` Martin K. Petersen
  2020-05-14  6:11 ` Hannes Reinecke
  3 siblings, 2 replies; 10+ messages in thread
From: Bart Van Assche @ 2020-05-14  4:47 UTC (permalink / raw)
  To: Damien Le Moal, linux-nvme, Christoph Hellwig, Keith Busch,
	Sagi Grimberg, linux-block, Jens Axboe

On 2020-05-13 18:54, Damien Le Moal wrote:
> @@ -1848,7 +1847,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	 */
>  	blk_queue_physical_block_size(disk->queue, min(phys_bs, atomic_bs));
>  	blk_queue_io_min(disk->queue, phys_bs);
> -	blk_queue_io_opt(disk->queue, io_opt);
> +	if (io_opt)
> +		blk_queue_io_opt(disk->queue, io_opt);

The above change looks confusing to me. We want the NVMe driver to set
io_opt, so why only call blk_queue_io_opt() if io_opt != 0? That means
that the io_opt value will be left to any value set by the block layer
core if io_opt == 0 instead of properly being set to zero.

Thanks,

Bart.


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  4:47 ` Bart Van Assche
@ 2020-05-14  4:49   ` Damien Le Moal
  2020-05-14 22:19   ` Martin K. Petersen
  1 sibling, 0 replies; 10+ messages in thread
From: Damien Le Moal @ 2020-05-14  4:49 UTC (permalink / raw)
  To: Bart Van Assche, linux-nvme, Christoph Hellwig, Keith Busch,
	Sagi Grimberg, linux-block, Jens Axboe

On 2020/05/14 13:47, Bart Van Assche wrote:
> On 2020-05-13 18:54, Damien Le Moal wrote:
>> @@ -1848,7 +1847,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
>>  	 */
>>  	blk_queue_physical_block_size(disk->queue, min(phys_bs, atomic_bs));
>>  	blk_queue_io_min(disk->queue, phys_bs);
>> -	blk_queue_io_opt(disk->queue, io_opt);
>> +	if (io_opt)
>> +		blk_queue_io_opt(disk->queue, io_opt);
> 
> The above change looks confusing to me. We want the NVMe driver to set
> io_opt, so why only call blk_queue_io_opt() if io_opt != 0? That means
> that the io_opt value will be left to any value set by the block layer
> core if io_opt == 0 instead of properly being set to zero.

OK. I will remove the "if".

> 
> Thanks,
> 
> Bart.
> 


-- 
Damien Le Moal
Western Digital Research


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  1:54 [PATCH] nvme: Fix io_opt limit setting Damien Le Moal
                   ` (2 preceding siblings ...)
  2020-05-14  4:47 ` Bart Van Assche
@ 2020-05-14  6:11 ` Hannes Reinecke
  3 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2020-05-14  6:11 UTC (permalink / raw)
  To: Damien Le Moal, linux-nvme, Christoph Hellwig, Keith Busch,
	Sagi Grimberg, linux-block, Jens Axboe

On 5/14/20 3:54 AM, Damien Le Moal wrote:
> Currently, a namespace io_opt queue limit is set by default to the
> physical sector size of the namespace and to the the write optimal
> size (NOWS) when the namespace reports this value. This causes problems
> with block limits stacking in blk_stack_limits() when a namespace block
> device is combined with an HDD which generally do not report any optimal
> transfer size (io_opt limit is 0). The code:
> 
> /* Optimal I/O a multiple of the physical block size? */
> if (t->io_opt & (t->physical_block_size - 1)) {
> 	t->io_opt = 0;
> 	t->misaligned = 1;
> 	ret = -1;
> }
> 
> results in blk_stack_limits() to return an error when the combined
> devices have different but compatible physical sector sizes (e.g. 512B
> sector SSD with 4KB sector disks).
> 
> Fix this by not setting the optiomal IO size limit if the namespace does
> not report an optimal write size value.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>   drivers/nvme/host/core.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
Ah, so you beat me to it :-)

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvme: Fix io_opt limit setting
  2020-05-14  4:47 ` Bart Van Assche
  2020-05-14  4:49   ` Damien Le Moal
@ 2020-05-14 22:19   ` Martin K. Petersen
  1 sibling, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2020-05-14 22:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Damien Le Moal, Sagi Grimberg, linux-nvme,
	linux-block, Keith Busch, Christoph Hellwig


Bart,

> The above change looks confusing to me. We want the NVMe driver to set
> io_opt, so why only call blk_queue_io_opt() if io_opt != 0? That means
> that the io_opt value will be left to any value set by the block layer
> core if io_opt == 0 instead of properly being set to zero.

We do explicitly set it to 0 when allocating a queue. But no biggie.

-- 
Martin K. Petersen	Oracle Linux Engineering


end of thread, other threads:[~2020-05-14 22:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-14  1:54 [PATCH] nvme: Fix io_opt limit setting Damien Le Moal
2020-05-14  3:29 ` Martin K. Petersen
2020-05-14  3:40 ` Keith Busch
2020-05-14  3:47   ` Damien Le Moal
2020-05-14  4:12     ` Keith Busch
2020-05-14  4:13       ` Damien Le Moal
2020-05-14  4:47 ` Bart Van Assche
2020-05-14  4:49   ` Damien Le Moal
2020-05-14 22:19   ` Martin K. Petersen
2020-05-14  6:11 ` Hannes Reinecke
