* [PATCH v4] nvmet: force reconnect when number of queue changes
@ 2022-10-25 15:50 Daniel Wagner
2022-10-25 17:30 ` Chaitanya Kulkarni
2022-11-01 9:45 ` Christoph Hellwig
0 siblings, 2 replies; 5+ messages in thread
From: Daniel Wagner @ 2022-10-25 15:50 UTC (permalink / raw)
To: linux-nvme
Cc: Christoph Hellwig, Daniel Wagner, Hannes Reinecke, Sagi Grimberg
In order to test queue number changes we need to make sure that the
host reconnects, because the spec only allows the number of queues to
change while the host is disconnected from the target.
The initial idea was to disable and re-enable the ports and have the
host wait until the KATO timer expires, triggering error
recovery. However, the host would see a DNR reply when trying to
reconnect, and because of the DNR bit the connection is dropped
completely: per the spec there is no point in retrying the connect
with the same parameters.
Instead, we can force the host to reconnect by deleting all
controllers. The host will observe any newly posted request failing
and thus start error recovery, but this time without the DNR bit set.
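For reference, the intended test flow can be sketched from the target
side. This is only a sketch; the subsystem name `testnqn` is an
illustrative assumption, not taken from the patch:

```shell
# Sketch (illustrative NQN): change qid_max on a nvmet subsystem via
# configfs. With this patch, the store handler deletes all controllers
# on the subsystem, so a connected host sees its requests fail (without
# the DNR bit) and re-enters error recovery, reconnecting with the new
# queue count.
SUBSYS=/sys/kernel/config/nvmet/subsystems/testnqn

cat "$SUBSYS"/attr_qid_max       # show the current maximum qid
echo 4 > "$SUBSYS"/attr_qid_max  # store triggers the forced reconnect
```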
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
v4:
- Rebased on -rc2
- invalid memory fix went in as separate fix
94f5a0688407 ("nvmet: fix invalid memory reference in
nvmet_subsys_attr_qid_max_show")
v3:
- Updated Fixes tag
- Updated commit message with some additional information
from the v2 discussion
- https://lore.kernel.org/linux-nvme/20221007072934.9536-1-dwagner@suse.de/
v2:
- instead preventing changes, force reconnect by delete ctrls
- renamed patch
- https://lore.kernel.org/linux-nvme/20220927143157.3659-1-dwagner@suse.de/
v1:
- initial version
- https://lore.kernel.org/linux-nvme/20220913064203.133536-1-dwagner@suse.de/
drivers/nvme/target/configfs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 9443ee1d4ae3..051a420d818e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1290,6 +1290,8 @@ static ssize_t nvmet_subsys_attr_qid_max_show(struct config_item *item,
static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
const char *page, size_t cnt)
{
+ struct nvmet_subsys *subsys = to_subsys(item);
+ struct nvmet_ctrl *ctrl;
u16 qid_max;
if (sscanf(page, "%hu\n", &qid_max) != 1)
@@ -1299,8 +1301,13 @@ static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
return -EINVAL;
down_write(&nvmet_config_sem);
- to_subsys(item)->max_qid = qid_max;
+ subsys->max_qid = qid_max;
+
+ /* Force reconnect */
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+ ctrl->ops->delete_ctrl(ctrl);
up_write(&nvmet_config_sem);
+
return cnt;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_qid_max);
--
2.38.0
* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
From: Chaitanya Kulkarni @ 2022-10-25 17:30 UTC (permalink / raw)
To: Daniel Wagner, linux-nvme
Cc: Christoph Hellwig, Hannes Reinecke, Sagi Grimberg
On 10/25/2022 8:50 AM, Daniel Wagner wrote:
> In order to test queue number changes we need to make sure that the
> host reconnects, because the spec only allows the number of queues to
> change while the host is disconnected from the target.
>
> The initial idea was to disable and re-enable the ports and have the
> host wait until the KATO timer expires, triggering error
> recovery. However, the host would see a DNR reply when trying to
> reconnect, and because of the DNR bit the connection is dropped
> completely: per the spec there is no point in retrying the connect
> with the same parameters.
>
> Instead, we can force the host to reconnect by deleting all
> controllers. The host will observe any newly posted request failing
> and thus start error recovery, but this time without the DNR bit set.
>
Without looking into the spec, shouldn't some sort of AEN be used for
this? Please correct me if I'm wrong, but deleting all the controllers
and relying on reconnect may be overkill. If it doesn't exist in
the NVMe spec then perhaps we should think/work on updating the
spec? Is it worth it?
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
From: Hannes Reinecke @ 2022-10-26 5:53 UTC (permalink / raw)
To: Chaitanya Kulkarni, Daniel Wagner, linux-nvme
Cc: Christoph Hellwig, Sagi Grimberg
On 10/25/22 19:30, Chaitanya Kulkarni wrote:
> On 10/25/2022 8:50 AM, Daniel Wagner wrote:
>> In order to test queue number changes we need to make sure that the
>> host reconnects, because the spec only allows the number of queues to
>> change while the host is disconnected from the target.
>>
>> The initial idea was to disable and re-enable the ports and have the
>> host wait until the KATO timer expires, triggering error
>> recovery. However, the host would see a DNR reply when trying to
>> reconnect, and because of the DNR bit the connection is dropped
>> completely: per the spec there is no point in retrying the connect
>> with the same parameters.
>>
>> Instead, we can force the host to reconnect by deleting all
>> controllers. The host will observe any newly posted request failing
>> and thus start error recovery, but this time without the DNR bit set.
>>
>
> Without looking into the spec, shouldn't some sort of AEN be used for
> this? Please correct me if I'm wrong, but deleting all the controllers
> and relying on reconnect may be overkill. If it doesn't exist in
> the NVMe spec then perhaps we should think/work on updating the
> spec? Is it worth it?
>
Problem is that changing the number of queues on the fly is really
awkward, _and_ you have the problem that the number of queues is
negotiated during the 'connect' call. So changing the number of queues
from the controller side really requires us to reconnect to rectify the
situation.
And no, I don't think it's overkill. It actually makes the situation
easier as you start off from scratch with the correct number of queues.
Plus you have to quiesce all queues anyway when changing the number of
queues; the tagset will be modified, requiring you to stop all I/O when
doing so. So from the block layer side there isn't much of a difference
here.
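The negotiation described above happens at connect time; sketched from
the host side with nvme-cli (the address, port, and NQN are
illustrative assumptions):

```shell
# The host requests an I/O queue count during 'connect'; the target
# caps it at its qid_max. The only way to renegotiate the count is to
# disconnect and connect again, which is why the target forces a
# reconnect by deleting the controllers.
nvme connect -t tcp -a 192.168.0.10 -s 4420 -n testnqn --nr-io-queues 8
```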
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
From: Daniel Wagner @ 2022-10-26 7:22 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Chaitanya Kulkarni, linux-nvme, Christoph Hellwig, Sagi Grimberg
On Wed, Oct 26, 2022 at 07:53:56AM +0200, Hannes Reinecke wrote:
> > Without looking into the spec, shouldn't some sort of AEN be used for
> > this? Please correct me if I'm wrong, but deleting all the controllers
> > and relying on reconnect may be overkill. If it doesn't exist in
> > the NVMe spec then perhaps we should think/work on updating the
> > spec? Is it worth it?
As it turns out there is no AEN which really matches this scenario. We
had a short discussion on this topic with Fred during ALPSS and he was
not against adding something if there is a good use case for it.
The issue is that no one really came up with a good general use case
except for testing corner cases.
> Problem is that changing the number of queues on the fly is really awkward,
> _and_ you have the problem that the number of queues is negotiated during
> the 'connect' call. So changing the number of queues from the controller
> side really requires us to reconnect to rectify the situation.
>
> And no, I don't think it's overkill. It actually makes the situation easier
> as you start off from scratch with the correct number of queues.
There is also the argument that this approach doesn't rely on the host
to cooperate correctly. For example, if we had an AEN to tell the host
'hey, drop all connections and reconnect', the target still wouldn't
know when and if the host has actually done so.
* Re: [PATCH v4] nvmet: force reconnect when number of queue changes
From: Christoph Hellwig @ 2022-11-01 9:45 UTC (permalink / raw)
To: Daniel Wagner
Cc: linux-nvme, Christoph Hellwig, Hannes Reinecke, Sagi Grimberg
Thanks,
applied to nvme-6.2.