linux-nvme.lists.infradead.org archive mirror
* [PATCH v4] nvmet: force reconnect when number of queue changes
@ 2022-10-25 15:50 Daniel Wagner
  2022-10-25 17:30 ` Chaitanya Kulkarni
  2022-11-01  9:45 ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-10-25 15:50 UTC (permalink / raw)
  To: linux-nvme
  Cc: Christoph Hellwig, Daniel Wagner, Hannes Reinecke, Sagi Grimberg

In order to test queue number changes we need to make sure that the
host reconnects: according to the spec, the number of queues is only
allowed to change while the host is disconnected from the target.

The initial idea was to disable and re-enable the ports and have the
host wait until the KATO timer expires, triggering error
recovery. However, the host would then see a DNR reply when trying to
reconnect, and because of the DNR bit the connection is dropped
completely: per the spec there is no point in retrying to connect
with the same parameters.
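
(Illustrative sketch of the host-side reaction to DNR: the helper name
below is invented for this write-up, but NVME_SC_DNR is the real status
bit from include/linux/nvme.h.)

	/* If the target answers a connect attempt with the DNR bit set,
	 * retrying with the same parameters is pointless, so the host
	 * gives up on the connection entirely. */
	static bool example_should_retry_connect(int status)
	{
		if (status > 0 && (status & NVME_SC_DNR))
			return false;	/* Do Not Retry */
		return true;
	}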

Instead, we can force the host to reconnect by deleting all
controllers. The host will observe newly posted requests failing and
start error recovery, but this time without the DNR bit set.
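
For illustration, the change can be driven from user space by writing
the subsystem's qid_max attribute (a minimal sketch: the configfs
mount point and the subsystem name "testnqn" are assumptions, not
part of this patch):

	/* Hypothetical test helper: lower qid_max on the target. With
	 * this patch applied, the write also deletes all controllers of
	 * the subsystem, forcing the host to reconnect and renegotiate
	 * the number of queues. */
	#include <stdio.h>
	#include <stdlib.h>

	int main(void)
	{
		FILE *f = fopen("/sys/kernel/config/nvmet/subsystems/"
				"testnqn/attr_qid_max", "w");

		if (!f) {
			perror("fopen");
			return EXIT_FAILURE;
		}
		fprintf(f, "%u\n", 4);	/* new maximum queue ID */
		fclose(f);
		return EXIT_SUCCESS;
	}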

Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

v4:
  - Rebased on -rc2
  - the invalid memory reference fix went in separately as
    94f5a0688407 ("nvmet: fix invalid memory reference in
    nvmet_subsys_attr_qid_max_show")

v3:
  - Updated Fixes tag
  - Updated commit message with some additional information
    from the v2 discussion
  - https://lore.kernel.org/linux-nvme/20221007072934.9536-1-dwagner@suse.de/

v2:
  - instead of preventing changes, force a reconnect by deleting ctrls
  - renamed patch
  - https://lore.kernel.org/linux-nvme/20220927143157.3659-1-dwagner@suse.de/

v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20220913064203.133536-1-dwagner@suse.de/


 drivers/nvme/target/configfs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 9443ee1d4ae3..051a420d818e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1290,6 +1290,8 @@ static ssize_t nvmet_subsys_attr_qid_max_show(struct config_item *item,
 static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 					       const char *page, size_t cnt)
 {
+	struct nvmet_subsys *subsys = to_subsys(item);
+	struct nvmet_ctrl *ctrl;
 	u16 qid_max;
 
 	if (sscanf(page, "%hu\n", &qid_max) != 1)
@@ -1299,8 +1301,13 @@ static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 		return -EINVAL;
 
 	down_write(&nvmet_config_sem);
-	to_subsys(item)->max_qid = qid_max;
+	subsys->max_qid = qid_max;
+
+	/* Force reconnect */
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		ctrl->ops->delete_ctrl(ctrl);
 	up_write(&nvmet_config_sem);
+
 	return cnt;
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_qid_max);
-- 
2.38.0




* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
  2022-10-25 15:50 [PATCH v4] nvmet: force reconnect when number of queue changes Daniel Wagner
@ 2022-10-25 17:30 ` Chaitanya Kulkarni
  2022-10-26  5:53   ` Hannes Reinecke
  2022-11-01  9:45 ` Christoph Hellwig
  1 sibling, 1 reply; 5+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25 17:30 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme
  Cc: Christoph Hellwig, Hannes Reinecke, Sagi Grimberg

On 10/25/2022 8:50 AM, Daniel Wagner wrote:
> In order to test queue number changes we need to make sure that the
> host reconnects: according to the spec, the number of queues is only
> allowed to change while the host is disconnected from the target.
> 
> The initial idea was to disable and re-enable the ports and have the
> host wait until the KATO timer expires, triggering error
> recovery. However, the host would then see a DNR reply when trying to
> reconnect, and because of the DNR bit the connection is dropped
> completely: per the spec there is no point in retrying to connect
> with the same parameters.
> 
> Instead, we can force the host to reconnect by deleting all
> controllers. The host will observe newly posted requests failing and
> start error recovery, but this time without the DNR bit set.
> 

Without looking into the spec, shouldn't some sort of AEN be used for
this? Please correct me if I'm wrong, but isn't deleting all the
controllers and relying on a reconnect overkill? If nothing suitable
exists in the NVMe spec, then perhaps we should think about updating
the spec? Is it worth it?

> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---


Reviewed-by: Chaitanya Kulkarni  <kch@nvidia.com>

-ck




* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
  2022-10-25 17:30 ` Chaitanya Kulkarni
@ 2022-10-26  5:53   ` Hannes Reinecke
  2022-10-26  7:22     ` Daniel Wagner
  0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2022-10-26  5:53 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Daniel Wagner, linux-nvme
  Cc: Christoph Hellwig, Sagi Grimberg

On 10/25/22 19:30, Chaitanya Kulkarni wrote:
> On 10/25/2022 8:50 AM, Daniel Wagner wrote:
>> In order to test queue number changes we need to make sure that the
>> host reconnects: according to the spec, the number of queues is only
>> allowed to change while the host is disconnected from the target.
>>
>> The initial idea was to disable and re-enable the ports and have the
>> host wait until the KATO timer expires, triggering error
>> recovery. However, the host would then see a DNR reply when trying to
>> reconnect, and because of the DNR bit the connection is dropped
>> completely: per the spec there is no point in retrying to connect
>> with the same parameters.
>>
>> Instead, we can force the host to reconnect by deleting all
>> controllers. The host will observe newly posted requests failing and
>> start error recovery, but this time without the DNR bit set.
>>
> 
> Without looking into the spec, shouldn't some sort of AEN be used for
> this? Please correct me if I'm wrong, but isn't deleting all the
> controllers and relying on a reconnect overkill? If nothing suitable
> exists in the NVMe spec, then perhaps we should think about updating
> the spec? Is it worth it?
> 
The problem is that changing the number of queues on the fly is really 
awkward, _and_ the number of queues is negotiated during the 'connect' 
call. So changing the number of queues from the controller side really 
requires a reconnect to rectify the situation.

And no, I don't think it's overkill. It actually makes the situation 
easier as you start off from scratch with the correct number of queues.

Plus you have to quiesce all queues anyway when changing the number of 
queues; the tagset will be modified, requiring you to stop all I/O when 
doing so. So from the block layer side there isn't much of a difference 
here.
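
To make that concrete, a rough sketch of the sequence (illustrative
only: the wrapper function is made up and the real nvme host code is
more involved, but the blk-mq calls themselves exist):

	#include <linux/blk-mq.h>

	/* Changing the hardware queue count means resizing the tagset,
	 * which is only safe once I/O on the queue has been stopped. */
	static void example_change_queue_count(struct blk_mq_tag_set *set,
					       struct request_queue *q,
					       int new_nr_hw_queues)
	{
		blk_mq_quiesce_queue(q);	/* stop dispatching I/O */
		blk_mq_update_nr_hw_queues(set, new_nr_hw_queues);
		blk_mq_unquiesce_queue(q);	/* resume I/O */
	}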

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman




* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
  2022-10-26  5:53   ` Hannes Reinecke
@ 2022-10-26  7:22     ` Daniel Wagner
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-10-26  7:22 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Chaitanya Kulkarni, linux-nvme, Christoph Hellwig, Sagi Grimberg

On Wed, Oct 26, 2022 at 07:53:56AM +0200, Hannes Reinecke wrote:
> > Without looking into the spec, shouldn't some sort of AEN be used for
> > this? Please correct me if I'm wrong, but isn't deleting all the
> > controllers and relying on a reconnect overkill? If nothing suitable
> > exists in the NVMe spec, then perhaps we should think about updating
> > the spec? Is it worth it?

As it turns out there is no AEN which really matches this scenario. We
had a short discussion on this topic with Fred during ALPSS and he was
not against adding something if there is a good use case for it.

The issue is that no one really came up with a good general use case
except for testing corner cases.

> Problem is that changing the number of queues on the fly is really awkward,
> _and_ you have the problem that the number of queues is negotiated during
> the 'connect' call. So changing the number of queues from the controller
> side really requires us to reconnect to rectify the situation.
> 
> And no, I don't think it's overkill. It actually makes the situation easier
> as you start off from scratch with the correct number of queues.

There is also the argument that this approach doesn't rely on the host
cooperating correctly. For example, if we had an AEN telling the host
'hey, drop all connections and reconnect', the target still wouldn't
know if and when the host has actually done so.




* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
  2022-10-25 15:50 [PATCH v4] nvmet: force reconnect when number of queue changes Daniel Wagner
  2022-10-25 17:30 ` Chaitanya Kulkarni
@ 2022-11-01  9:45 ` Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2022-11-01  9:45 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme, Christoph Hellwig, Hannes Reinecke, Sagi Grimberg

Thanks,

applied to nvme-6.2.


