All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
@ 2024-03-21 21:08 Christoph Hellwig
  2024-03-22  5:15 ` Nilay Shroff
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Christoph Hellwig @ 2024-03-21 21:08 UTC (permalink / raw)
  To: kbusch, sagi; +Cc: linux-nvme, Nilay Shroff

Linux 6.9 made the nvme multipath nodes not properly pick up changes when
the LBA size goes smaller after an nvme format.  This is because we now
try to inherit the queue settings for the multipath node entirely from
the individual paths.  That is the right thing to do for I/O size
limitations, which make up most of the queue limits, but it is wrong for
changes to the namespace configuration, where we do want to pick up the
new format, which will eventually show up on all paths once they are
re-queried.

Fix this by not inheriting the block size and related fields and always
for updating them.

Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
Reported-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 00864a63447099..4bac54d4e0015b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2204,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 	}
 
 	if (!ret && nvme_ns_head_multipath(ns->head)) {
+		struct queue_limits *ns_lim = &ns->disk->queue->limits;
 		struct queue_limits lim;
 
 		blk_mq_freeze_queue(ns->head->disk->queue);
@@ -2215,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
 		nvme_mpath_revalidate_paths(ns);
 
+		/*
+		 * queue_limits mixes values that are the hardware limitations
+		 * for bio splitting with what is the device configuration.
+		 *
+		 * For NVMe the device configuration can change after e.g. a
+		 * Format command, and we really want to pick up the new format
+		 * value here.  But we must still stack the queue limits to the
+		 * least common denominator for multipathing to split the bios
+		 * properly.
+		 *
+		 * To work around this, we explicitly set the device
+		 * configuration to those that we just queried, but only stack
+		 * the splitting limits in to make sure we still obey possibly
+		 * lower limitations of other controllers.
+		 */
 		lim = queue_limits_start_update(ns->head->disk->queue);
+		lim.logical_block_size = ns_lim->logical_block_size;
+		lim.physical_block_size = ns_lim->physical_block_size;
+		lim.io_min = ns_lim->io_min;
+		lim.io_opt = ns_lim->io_opt;
 		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
 					ns->head->disk->disk_name);
 		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
  2024-03-21 21:08 [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node Christoph Hellwig
@ 2024-03-22  5:15 ` Nilay Shroff
  2024-03-22 16:22 ` Keith Busch
  2024-03-23 20:33 ` Sagi Grimberg
  2 siblings, 0 replies; 6+ messages in thread
From: Nilay Shroff @ 2024-03-22  5:15 UTC (permalink / raw)
  To: Christoph Hellwig, kbusch, sagi; +Cc: linux-nvme, gjoyce



On 3/22/24 02:38, Christoph Hellwig wrote:
> Linux 6.9 made the nvme multipath nodes not properly pick up changes when
> the LBA size goes smaller after an nvme format.  This is because we now
> try to inherit the queue settings for the multipath node entirely from
> the individual paths.  That is the right thing to do for I/O size
> limitations, which make up most of the queue limits, but it is wrong for
> changes to the namespace configuration, where we do want to pick up the
> new format, which will eventually show up on all paths once they are
> re-queried.
> 
> Fix this by not inheriting the block size and related fields and always
> for updating them.
> 
> Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
> Reported-by: Nilay Shroff <nilay@linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 00864a63447099..4bac54d4e0015b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2204,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  	}
>  
>  	if (!ret && nvme_ns_head_multipath(ns->head)) {
> +		struct queue_limits *ns_lim = &ns->disk->queue->limits;
>  		struct queue_limits lim;
>  
>  		blk_mq_freeze_queue(ns->head->disk->queue);
> @@ -2215,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
>  		nvme_mpath_revalidate_paths(ns);
>  
> +		/*
> +		 * queue_limits mixes values that are the hardware limitations
> +		 * for bio splitting with what is the device configuration.
> +		 *
> +		 * For NVMe the device configuration can change after e.g. a
> +		 * Format command, and we really want to pick up the new format
> +		 * value here.  But we must still stack the queue limits to the
> +		 * least common denominator for multipathing to split the bios
> +		 * properly.
> +		 *
> +		 * To work around this, we explicitly set the device
> +		 * configuration to those that we just queried, but only stack
> +		 * the splitting limits in to make sure we still obey possibly
> +		 * lower limitations of other controllers.
> +		 */
>  		lim = queue_limits_start_update(ns->head->disk->queue);
> +		lim.logical_block_size = ns_lim->logical_block_size;
> +		lim.physical_block_size = ns_lim->physical_block_size;
> +		lim.io_min = ns_lim->io_min;
> +		lim.io_opt = ns_lim->io_opt;
>  		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
>  					ns->head->disk->disk_name);
>  		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);

I had tested the above patch from Christoph and it looks good.

Test results could be found here: 
https://lore.kernel.org/all/239228ec-6c8d-432c-905d-b477014deee3@linux.ibm.com/

Thanks,
--Nilay



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
  2024-03-21 21:08 [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node Christoph Hellwig
  2024-03-22  5:15 ` Nilay Shroff
@ 2024-03-22 16:22 ` Keith Busch
  2024-04-02 13:21   ` Nilay Shroff
  2024-03-23 20:33 ` Sagi Grimberg
  2 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2024-03-22 16:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sagi, linux-nvme, Nilay Shroff

On Fri, Mar 22, 2024 at 07:08:19AM +1000, Christoph Hellwig wrote:
> Linux 6.9 made the nvme multipath nodes not properly pick up changes when
> the LBA size goes smaller after an nvme format.  This is because we now
> try to inherit the queue settings for the multipath node entirely from
> the individual paths.  That is the right thing to do for I/O size
> limitations, which make up most of the queue limits, but it is wrong for
> changes to the namespace configuration, where we do want to pick up the
> new format, which will eventually show up on all paths once they are
> re-queried.
> 
> Fix this by not inheriting the block size and related fields and always
> for updating them.
> 
> Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")

Applied to nvme-6.9. This just missed yesterday's pull, but at least
it's queued up for the next one.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
  2024-03-21 21:08 [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node Christoph Hellwig
  2024-03-22  5:15 ` Nilay Shroff
  2024-03-22 16:22 ` Keith Busch
@ 2024-03-23 20:33 ` Sagi Grimberg
  2 siblings, 0 replies; 6+ messages in thread
From: Sagi Grimberg @ 2024-03-23 20:33 UTC (permalink / raw)
  To: Christoph Hellwig, kbusch; +Cc: linux-nvme, Nilay Shroff

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
  2024-03-22 16:22 ` Keith Busch
@ 2024-04-02 13:21   ` Nilay Shroff
  2024-04-02 14:15     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Nilay Shroff @ 2024-04-02 13:21 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig; +Cc: sagi, linux-nvme

Hi Keith,

On 3/22/24 21:52, Keith Busch wrote:
> On Fri, Mar 22, 2024 at 07:08:19AM +1000, Christoph Hellwig wrote:
>> Linux 6.9 made the nvme multipath nodes not properly pick up changes when
>> the LBA size goes smaller after an nvme format.  This is because we now
>> try to inherit the queue settings for the multipath node entirely from
>> the individual paths.  That is the right thing to do for I/O size
>> limitations, which make up most of the queue limits, but it is wrong for
>> changes to the namespace configuration, where we do want to pick up the
>> new format, which will eventually show up on all paths once they are
>> re-queried.
>>
>> Fix this by not inheriting the block size and related fields and always
>> for updating them.
>>
>> Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
> 
> Applied to nvme-6.9. This just missed yesterday's pull, but at least
> it's queued up for the next one.

I found this one couldn't made into 6.9-rc2. So are we planning to queue this fix
for the next rc?

Thanks,
--Nilay



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node
  2024-04-02 13:21   ` Nilay Shroff
@ 2024-04-02 14:15     ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2024-04-02 14:15 UTC (permalink / raw)
  To: Nilay Shroff; +Cc: Christoph Hellwig, sagi, linux-nvme

On Tue, Apr 02, 2024 at 06:51:02PM +0530, Nilay Shroff wrote:
> I found this one couldn't made into 6.9-rc2. So are we planning to queue this fix
> for the next rc?

Yes, it'll get in. I send pull requests on Thursday's but some unplanned
travel events got in the way last week.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-02 14:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-21 21:08 [PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node Christoph Hellwig
2024-03-22  5:15 ` Nilay Shroff
2024-03-22 16:22 ` Keith Busch
2024-04-02 13:21   ` Nilay Shroff
2024-04-02 14:15     ` Keith Busch
2024-03-23 20:33 ` Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.