linux-nvme.lists.infradead.org archive mirror
* [PATCH v2] nvmet: force reconnect when number of queue changes
@ 2022-09-27 14:31 Daniel Wagner
  2022-09-27 15:01 ` Daniel Wagner
  2022-09-28  6:55 ` Sagi Grimberg
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel Wagner @ 2022-09-27 14:31 UTC (permalink / raw)
  To: linux-nvme; +Cc: Shinichiro Kawasaki, Daniel Wagner

In order to be able to test queue number changes we need to make sure
that the host reconnects.

The initial idea was to disable and re-enable the ports and have the
host wait until the KATO timer expires and enter error recovery. But
in this scenario the host could see DNR set for a connection attempt,
which results in the host dropping the connection completely.

We can force the host to reconnect by deleting all controllers
connected to the subsystem, which results in the host observing a
failing command and trying to reconnect.

Also, the item passed into nvmet_subsys_attr_qid_max_store is not a
member of struct nvmet_port, it is part of struct nvmet_subsys. Hence,
don't try to dereference it as a struct nvmet_port pointer.

Fixes: 2c4282742d04 ("nvmet: Expose max queues to configfs")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220913064203.133536-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
v2:
  - instead of preventing changes, force reconnect by deleting ctrls
  - renamed patch

v1:
  - initial version
  - https://lore.kernel.org/linux-nvme/20220913064203.133536-1-dwagner@suse.de/

 drivers/nvme/target/configfs.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index e34a2896fedb..051a420d818e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1290,12 +1290,10 @@ static ssize_t nvmet_subsys_attr_qid_max_show(struct config_item *item,
 static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 					       const char *page, size_t cnt)
 {
-	struct nvmet_port *port = to_nvmet_port(item);
+	struct nvmet_subsys *subsys = to_subsys(item);
+	struct nvmet_ctrl *ctrl;
 	u16 qid_max;
 
-	if (nvmet_is_port_enabled(port, __func__))
-		return -EACCES;
-
 	if (sscanf(page, "%hu\n", &qid_max) != 1)
 		return -EINVAL;
 
@@ -1303,8 +1301,13 @@ static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 		return -EINVAL;
 
 	down_write(&nvmet_config_sem);
-	to_subsys(item)->max_qid = qid_max;
+	subsys->max_qid = qid_max;
+
+	/* Force reconnect */
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		ctrl->ops->delete_ctrl(ctrl);
 	up_write(&nvmet_config_sem);
+
 	return cnt;
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_qid_max);
-- 
2.37.3
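
A rough user-space sketch of how a test could drive the new attribute
(assuming the usual configfs mount point and a placeholder subsystem NQN
"testnqn"; adjust both for the actual test setup):

  /*
   * Minimal sketch: shrink qid_max on a live nvmet subsystem and rely on
   * the forced reconnect introduced above. Error handling kept minimal.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          const char *attr = "/sys/kernel/config/nvmet/subsystems/"
                             "testnqn/attr_qid_max";   /* placeholder NQN */
          const char *val = "2\n";
          int fd = open(attr, O_WRONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          /* The store handler now deletes all controllers of the subsystem,
           * so the host sees failing commands and schedules a reconnect. */
          if (write(fd, val, strlen(val)) != (ssize_t)strlen(val))
                  perror("write");
          close(fd);
          return 0;
  }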



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-27 14:31 [PATCH v2] nvmet: force reconnect when number of queue changes Daniel Wagner
@ 2022-09-27 15:01 ` Daniel Wagner
  2022-09-28  6:55 ` Sagi Grimberg
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Wagner @ 2022-09-27 15:01 UTC (permalink / raw)
  To: linux-nvme; +Cc: Shinichiro Kawasaki

On Tue, Sep 27, 2022 at 04:31:57PM +0200, Daniel Wagner wrote:
> In order to be able to test queue number changes we need to make sure
> that the host reconnects.
> 
> The initial idea was to disable and re-enable the ports and have the
> host wait until the KATO timer expires and enter error recovery. But
> in this scenario the host could see DNR set for a connection attempt,
> which results in the host dropping the connection completely.
> 
> We can force the host to reconnect by deleting all controllers
> connected to the subsystem, which results in the host observing a
> failing command and trying to reconnect.
> 
> Also, the item passed into nvmet_subsys_attr_qid_max_store is not a
> member of struct nvmet_port, it is part of struct nvmet_subsys. Hence,
> don't try to dereference it as a struct nvmet_port pointer.
> 
> Fixes: 2c4282742d04 ("nvmet: Expose max queues to configfs")

The ref is wrong; it should be:

3e980f5995e0 ("nvmet: expose max queues to configfs")


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-27 14:31 [PATCH v2] nvmet: force reconnect when number of queue changes Daniel Wagner
  2022-09-27 15:01 ` Daniel Wagner
@ 2022-09-28  6:55 ` Sagi Grimberg
  2022-09-28  7:48   ` Daniel Wagner
  1 sibling, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2022-09-28  6:55 UTC (permalink / raw)
  To: Daniel Wagner, linux-nvme; +Cc: Shinichiro Kawasaki


> In order to be able to test queue number changes we need to make sure
> that the host reconnects.
> 
> The initial idea was to disable and re-enable the ports and have the
> host wait until the KATO timer expires and enter error recovery. But
> in this scenario the host could see DNR set for a connection attempt,
> which results in the host dropping the connection completely.
> 
> We can force the host to reconnect by deleting all controllers
> connected to the subsystem, which results in the host observing a
> failing command and trying to reconnect.

This looks like a change that attempts to fix a host issue from the
target side... Why do we want to do that?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28  6:55 ` Sagi Grimberg
@ 2022-09-28  7:48   ` Daniel Wagner
  2022-09-28  8:31     ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2022-09-28  7:48 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Shinichiro Kawasaki

On Wed, Sep 28, 2022 at 09:55:11AM +0300, Sagi Grimberg wrote:
> 
> > In order to be able to test queue number changes we need to make sure
> > that the host reconnects.
> > 
> > The initial idea was to disable and re-enable the ports and have the
> > host wait until the KATO timer expires and enter error recovery. But
> > in this scenario the host could see DNR set for a connection attempt,
> > which results in the host dropping the connection completely.
> > 
> > We can force the host to reconnect by deleting all controllers
> > connected to the subsystem, which results in the host observing a
> > failing command and trying to reconnect.
> 
> This looks like a change that attempts to fix a host issue from the
> target side... Why do we want to do that?

It's not a host issue at all. The scenario I'd like to test is when the
target changes this property while the host is connected (e.g. software
update -> new configuration). I haven't found a way to signal the host
to reset/reconnect from the target. Hannes suggested deleting all
controllers from the given subsystem, which will trigger the recovery
process on the host on the next request. This makes the test work.

Though if you have a better idea how to signal the host to reconfigure
itself, I am glad to work on it.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28  7:48   ` Daniel Wagner
@ 2022-09-28  8:31     ` Sagi Grimberg
  2022-09-28  9:02       ` Daniel Wagner
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2022-09-28  8:31 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: linux-nvme, Shinichiro Kawasaki


>>> In order to be able to test queue number changes we need to make sure
>>> that the host reconnects.
>>>
>>> The initial idea was to disable and re-enable the ports and have the
>>> host wait until the KATO timer expires and enter error recovery. But
>>> in this scenario the host could see DNR set for a connection attempt,
>>> which results in the host dropping the connection completely.
>>>
>>> We can force the host to reconnect by deleting all controllers
>>> connected to the subsystem, which results in the host observing a
>>> failing command and trying to reconnect.
>>
>> This looks like a change that attempts to fix a host issue from the
>> target side... Why do we want to do that?
> 
> It's not a host issue at all. The scenario I'd like to test is when the
> target changes this property while the host is connected (e.g. software
> update -> new configuration). I haven't found a way to signal the host
> to reset/reconnect from the target. Hannes suggested deleting all
> controllers from the given subsystem, which will trigger the recovery
> process on the host on the next request. This makes the test work.

But that is exactly like doing:
- remove subsystem from port
- apply q count change
- link subsystem to port
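
(In configfs terms, roughly the sketch below; the port id and the NQN are
placeholders, and configfs is assumed to be mounted at /sys/kernel/config.)

  /* Hypothetical sketch of the three steps above via configfs. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  #define PORT_LINK "/sys/kernel/config/nvmet/ports/1/subsystems/testnqn"
  #define SUBSYS    "/sys/kernel/config/nvmet/subsystems/testnqn"

  static int write_attr(const char *path, const char *val)
  {
          int fd = open(path, O_WRONLY);

          if (fd < 0)
                  return -1;
          if (write(fd, val, strlen(val)) < 0) {
                  close(fd);
                  return -1;
          }
          return close(fd);
  }

  int main(void)
  {
          /* 1. remove subsystem from port */
          if (unlink(PORT_LINK))
                  perror("unlink");
          /* 2. apply q count change */
          if (write_attr(SUBSYS "/attr_qid_max", "2\n"))
                  perror("write attr_qid_max");
          /* 3. link subsystem to port again */
          if (symlink(SUBSYS, PORT_LINK))
                  perror("symlink");
          return 0;
  }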

Your problem is that the target returns an error code that makes the
host never reconnect. That is a host behavior, and that behavior is
different for each transport used.

This is why I'm not clear on whether this is the right place to
address this issue.

I personally do not understand why a DNR completion makes the host
choose to not reconnect. DNR means "do not retry" for the command
itself (which the host adheres to), and it does not have any meaning to
a reset/reconnect logic.
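
(Illustrative only: a tiny sketch of that reading. NVME_SC_DNR below
matches the kernel's value; the reconnect-policy helper and its
parameters are made up.)

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define NVME_SC_DNR 0x4000  /* bit 14 of the status field */

  /* Per-command question: may this particular command be retried? */
  static bool cmd_retryable(uint16_t status)
  {
          return !(status & NVME_SC_DNR);
  }

  /* Controller-level question: keep trying to reconnect? This is policy
   * (e.g. reconnect delay / attempt limits) and is not derived from DNR. */
  static bool keep_reconnecting(int attempts, int max_reconnects)
  {
          return max_reconnects < 0 || attempts < max_reconnects;
  }

  int main(void)
  {
          printf("retry cmd: %d, keep reconnecting: %d\n",
                 cmd_retryable(0x4002), keep_reconnecting(3, 10));
          return 0;
  }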

In my mind, a possible use-case is that a subsystem can be un-exported
from a port for maintenance reasons, and rely on the host to
periodically attempt to reconnect, and this is exactly what your test is
doing.

> Though if you have a better idea how to signal the host to reconfigure
> itself, I am glad to work on it.

I think we should first agree on what the host should/shouldn't do and
make the logic consistent between all transports. Then we can talk about
how to write a test for your test case.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28  8:31     ` Sagi Grimberg
@ 2022-09-28  9:02       ` Daniel Wagner
  2022-09-28 12:39         ` Knight, Frederick
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2022-09-28  9:02 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Shinichiro Kawasaki, hare, Frederick.Knight

On Wed, Sep 28, 2022 at 11:31:43AM +0300, Sagi Grimberg wrote:
> 
> > > > In order to be able to test queue number changes we need to make sure
> > > > that the host reconnects.
> > > > 
> > > > The initial idea was to disable and re-enable the ports and have the
> > > > host wait until the KATO timer expires and enter error recovery. But
> > > > in this scenario the host could see DNR set for a connection attempt,
> > > > which results in the host dropping the connection completely.
> > > > 
> > > > We can force the host to reconnect by deleting all controllers
> > > > connected to the subsystem, which results in the host observing a
> > > > failing command and trying to reconnect.
> > > 
> > > This looks like a change that attempts to fix a host issue from the
> > > target side... Why do we want to do that?
> > 
> > It's not a host issue at all. The scenario I'd like to test is when the
> > target changes this property while the host is connected (e.g. software
> > update -> new configuration). I haven't found a way to signal the host
> > to reset/reconnect from the target. Hannes suggested deleting all
> > controllers from the given subsystem, which will trigger the recovery
> > process on the host on the next request. This makes the test work.
> 
> But that is exactly like doing:
> - remove subsystem from port
> - apply q count change
> - link subsystem to port
> 
> Your problem is that the target returns an error code that makes the
> host never reconnect. That is a host behavior, and that behavior is
> different for each transport used.

Yes, I try to avoid triggering the DNR.

> This is why I'm not clear on whether this is the right place to
> address this issue.
> 
> I personally do not understand why a DNR completion makes the host
> choose to not reconnect. DNR means "do not retry" for the command
> itself (which the host adheres to), and it does not have any meaning to
> a reset/reconnect logic.

I am just the messenger: Besides Hannes' objection in the last mail
thread, I got this private reply from Fred Knight:

   Do Not Retry (DNR): If set to ‘1’, indicates that if the same command is
   re-submitted to any controller in the NVM subsystem, then that
   re-submitted command is expected to fail. If cleared to ‘0’, indicates
   that the same command may succeed if retried. If a command is aborted
   due to time limited error recovery (refer to the Error Recovery section
   in the NVM Command Set Specification), this bit should be cleared to
   ‘0’. If the SCT and SC fields are cleared to 0h, then this bit should be
   cleared to ‘0’.

   It simply makes NO SENSE to retry that command. If the device wants the
   host to retry, then it will clear DNR=0.

> In my mind, a possible use-case is that a subsystem can be un-exported
> from a port for maintenance reasons, and rely on the host to
> periodically attempt to reconnect, and this is exactly what your test is
> doing.

Yes, and that's the intended test case. The number-of-queues change is on
top of this scenario. It's a combined test case.

> > Though if you have a better idea how to signal the host to reconfigure
> > itself, I am glad to work on it.
> 
> I think we should first agree on what the host should/shouldn't do and
> make the logic consistent between all transports. Then we can talk about
> how to write a test for your test case.

Fair enough. This here was just my cheesy attempt to get things
moving.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28  9:02       ` Daniel Wagner
@ 2022-09-28 12:39         ` Knight, Frederick
  2022-09-28 13:50           ` Daniel Wagner
  2022-09-28 15:01           ` John Meneghini
  0 siblings, 2 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-09-28 12:39 UTC (permalink / raw)
  To: Daniel Wagner, Sagi Grimberg; +Cc: linux-nvme, Shinichiro Kawasaki, hare

Would this be a case for a new AEN - controller configuration changed?  I'm also wondering exactly what changed in the controller?  It can't be the "Number of Queues" feature (that can't change - The controller shall not change the value allocated between resets.).  Is it the MQES field in the CAP property that changes (queue size)?

We already have change reporting for: Namespace attribute, Predictable Latency, LBA status, EG aggregate, Zone descriptor, Discovery Log, Reservations. We've questioned whether we need a Controller Attribute Changed.

Would this be a case for an exception?  Does the DNR bit apply only to commands sent on queues that already exist (i.e., NOT the connect command since that command is actually creating the queue)?  FWIW - I don't like exceptions.

Can you elaborate on exactly what is changing?

	Fred


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 12:39         ` Knight, Frederick
@ 2022-09-28 13:50           ` Daniel Wagner
  2022-09-28 14:23             ` Sagi Grimberg
  2022-09-28 16:01             ` Knight, Frederick
  2022-09-28 15:01           ` John Meneghini
  1 sibling, 2 replies; 23+ messages in thread
From: Daniel Wagner @ 2022-09-28 13:50 UTC (permalink / raw)
  To: Knight, Frederick; +Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

On Wed, Sep 28, 2022 at 12:39:44PM +0000, Knight, Frederick wrote:
> Would this be a case for a new AEN - controller configuration changed?
> I'm also wondering exactly what changed in the controller?  It can't
> be the "Number of Queues" feature (that can't change - The controller
> shall not change the value allocated between resets.).  Is it the MQES
> field in the CAP property that changes (queue size)?
>
> We already have change reporting for: Namespace attribute, Predictable
> Latency, LBA status, EG aggregate, Zone descriptor, Discovery Log,
> Reservations. We've questioned whether we need a Controller Attribute
> Changed.
>
> Would this be a case for an exception?  Does the DNR bit apply only to
> commands sent on queues that already exist (i.e., NOT the connect
> command since that command is actually creating the queue)?  FWIW - I
> don't like exceptions.
>
> Can you elaborate on exactly what is changing?

The background story is that we have a setup with two targets (*) and
the host is connected to both of them (HA setup). Both servers run the
same software version. The host is told via Number of Queues (Feature
Identifier 07h) how many queues are supported (N queues).

Now, a software upgrade is started which takes the first server offline,
updates it and brings it online again. Then the same process is repeated
with the second server.

In the meantime the host tries to reconnect. Eventually, the reconnect
is successful but the Number of Queues (Feature Identifier 07h) has
changed to a smaller value, e.g. N-2 queues.

My test case here is trying to replicate this scenario but just with
one target. Hence the problem of how to notify the host that it should
reconnect. As you mentioned, this is not supposed to change as long as a
connection is established.

My understanding is that the current nvme target implementation in Linux
doesn't really support this HA setup scenario hence my attempt to get it
flying with one target. The DNR bit comes into play because I was toying
with removing the subsystem from the port, changing the number of queues
and re-adding the subsystem to the port.

This resulted in any request posted from the host seeing the DNR
bit. The second attempt here was to delete the controller to force a
reconnect. I agree it's also not really the right thing to do.

As far as I can tell, what's missing from a testing point of view is the
ability to fail requests without the DNR bit set or the ability to tell
the host to reconnect. Obviously, an AEN would be nice for this but I
don't know if this is reason enough to extend the spec.

I can't really say if this is a real world scenario or just a result of
trying to cut corners. Anyway, I am glad to do the work if we can agree
on how this test case could be implemented.

Daniel

(*) not sure what to call it properly; is this one target or two targets?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 13:50           ` Daniel Wagner
@ 2022-09-28 14:23             ` Sagi Grimberg
  2022-09-28 15:21               ` Hannes Reinecke
                                 ` (2 more replies)
  2022-09-28 16:01             ` Knight, Frederick
  1 sibling, 3 replies; 23+ messages in thread
From: Sagi Grimberg @ 2022-09-28 14:23 UTC (permalink / raw)
  To: Daniel Wagner, Knight, Frederick; +Cc: linux-nvme, Shinichiro Kawasaki, hare


>> Would this be a case for a new AEN - controller configuration changed?
>> I'm also wondering exactly what changed in the controller?  It can't
>> be the "Number of Queues" feature (that can't change - The controller
>> shall not change the value allocated between resets.).  Is it the MQES
>> field in the CAP property that changes (queue size)?
>>
>> We already have change reporting for: Namespace attribute, Predictable
>> Latency, LBA status, EG aggregate, Zone descriptor, Discovery Log,
>> Reservations. We've questioned whether we need a Controller Attribute
>> Changed.
>>
>> Would this be a case for an exception?  Does the DNR bit apply only to
>> commands sent on queues that already exist (i.e., NOT the connect
>> command since that command is actually creating the queue)?  FWIW - I
>> don't like exceptions.
>>
>> Can you elaborate on exactly what is changing?
> 
> The background story is that we have a setup with two targets (*) and
> the host is connected to both of them (HA setup). Both servers run the
> same software version. The host is told via Number of Queues (Feature
> Identifier 07h) how many queues are supported (N queues).
> 
> Now, a software upgrade is started which takes the first server offline,
> updates it and brings it online again. Then the same process is repeated
> with the second server.
> 
> In the meantime the host tries to reconnect. Eventually, the reconnect
> is successful but the Number of Queues (Feature Identifier 07h) has
> changed to a smaller value, e.g. N-2 queues.
> 
> My test case here is trying to replicate this scenario but just with
> one target. Hence the problem of how to notify the host that it should
> reconnect. As you mentioned, this is not supposed to change as long as a
> connection is established.
> 
> My understanding is that the current nvme target implementation in Linux
> doesn't really support this HA setup scenario hence my attempt to get it
> flying with one target. The DNR bit comes into play because I was toying
> with removing the subsystem from the port, changing the number of queues
> and re-adding the subsystem to the port.
> 
> This resulted in any request posted from the host seeing the DNR
> bit. The second attempt here was to delete the controller to force a
> reconnect. I agree it's also not really the right thing to do.
> 
> As far as I can tell, what's missing from a testing point of view is the
> ability to fail requests without the DNR bit set or the ability to tell
> the host to reconnect. Obviously, an AEN would be nice for this but I
> don't know if this is reason enough to extend the spec.

Looking into the code, it's the connect that fails on invalid parameter
with a DNR, because the host is attempting to connect to a subsystem
that does not exist on the port (because it was taken offline for
maintenance reasons).

So I guess it is valid to allow a queue change without removing the
subsystem from the port, but that does not change the fundamental
question on DNR. If the host sees a DNR error on connect, my
interpretation is that the host should not retry the connect command
itself, but that shouldn't imply anything about tearing down the
controller and giving up on it completely, forever.
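
(To make that concrete, the rejected connect completes with something
like the value below; the defines reflect my reading of
include/linux/nvme.h and the decode is only illustrative.)

  #include <stdint.h>
  #include <stdio.h>

  #define NVME_SC_DNR                   0x4000
  #define NVME_SC_CONNECT_INVALID_PARAM 0x182

  int main(void)
  {
          uint16_t status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;

          /* The DNR bit only says: do not resubmit this connect command. */
          printf("status=0x%04x dnr=%d sc=0x%03x\n",
                 status, !!(status & NVME_SC_DNR), status & ~NVME_SC_DNR);
          return 0;
  }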


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 12:39         ` Knight, Frederick
  2022-09-28 13:50           ` Daniel Wagner
@ 2022-09-28 15:01           ` John Meneghini
  2022-09-28 15:26             ` Daniel Wagner
  1 sibling, 1 reply; 23+ messages in thread
From: John Meneghini @ 2022-09-28 15:01 UTC (permalink / raw)
  To: Knight, Frederick, Daniel Wagner, Sagi Grimberg
  Cc: linux-nvme, Shinichiro Kawasaki, hare

Fred, what's the difference between a "reset" and a "connection loss"?

As Daniel explained, the problem case in question involves a controller that "disconnects" from the network, and then reconnects 
after a HA takeover/giveback event. And the fact is, this controller is changing the number of IOQs after the 
"disconnect/reconnect".

So this is one place where I think we conflate NVMe-oF with NVMe in the specification.  If "reset" == "disconnect", then 
Daniel's patches just implemented support for something which is forbidden by the spec - a controller that changes the number of 
IOQs between resets.

/John

5.27.1.5 Number of Queues (Feature Identifier 07h)

This Feature indicates the number of queues that the host requests for the controller processing the
command. This feature shall only be issued during initialization prior to creation of any I/O Submission
and/or Completion Queues. If a Set Features command is issued for this feature after creation of any I/O
Submission and/or I/O Completion Queues, then the Set Features command shall abort with status code
of Command Sequence Error. The controller shall not change the value allocated between resets. For a
Set Features command, the attributes are specified in Command Dword 11 (refer to Figure 322). For a Get
Features command, Dword 11 is ignored.
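
(For reference, the feature's encoding: the host asks via Set Features
CDW11 and the controller answers in completion Dword 0, both 0's based.
A standalone sketch, not tied to any driver code:)

  #include <stdint.h>
  #include <stdio.h>

  /* Number of Queues (FID 07h): NSQR/NSQA in bits 15:0, NCQR/NCQA in
   * bits 31:16, all 0's based values (counts must be >= 1). */
  static uint32_t nr_queues_cdw11(uint16_t sq_wanted, uint16_t cq_wanted)
  {
          return ((uint32_t)(cq_wanted - 1) << 16) | (uint16_t)(sq_wanted - 1);
  }

  static void decode_result(uint32_t dw0, uint16_t *sq_got, uint16_t *cq_got)
  {
          *sq_got = (dw0 & 0xffff) + 1;
          *cq_got = (dw0 >> 16) + 1;
  }

  int main(void)
  {
          uint16_t sq, cq;

          /* e.g. ask for 8/8; the controller may grant fewer after an
           * upgrade. 0's based 5 in both halves decodes to 6 queues. */
          decode_result(0x00050005, &sq, &cq);
          printf("cdw11=0x%08x granted sq=%u cq=%u\n",
                 nr_queues_cdw11(8, 8), sq, cq);
          return 0;
  }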


On 9/28/22 08:39, Knight, Frederick wrote:
> Would this be a case for a new AEN - controller configuration changed?  I'm also wondering exactly what changed in the controller?  It can't be the "Number of Queues" feature (that can't change - The controller shall not change the value allocated between resets.).  Is it the MQES field in the CAP property that changes (queue size)?
> 
> We already have change reporting for: Namespace attribute, Predictable Latency, LBA status, EG aggregate, Zone descriptor, Discovery Log, Reservations. We've questioned whether we need a Controller Attribute Changed.
> 
> Would this be a case for an exception?  Does the DNR bit apply only to commands sent on queues that already exist (i.e., NOT the connect command since that command is actually creating the queue)?  FWIW - I don't like exceptions.
> 
> Can you elaborate on exactly what is changing?
> 
> 	Fred
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 14:23             ` Sagi Grimberg
@ 2022-09-28 15:21               ` Hannes Reinecke
  2022-09-28 16:01               ` Knight, Frederick
  2022-10-05 18:15               ` Daniel Wagner
  2 siblings, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2022-09-28 15:21 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner, Knight, Frederick
  Cc: linux-nvme, Shinichiro Kawasaki

On 9/28/22 16:23, Sagi Grimberg wrote:
> 
> Looking into the code, it's the connect that fails on invalid parameter
> with a DNR, because the host is attempting to connect to a subsystem
> that does not exist on the port (because it was taken offline for
> maintenance reasons).
> 
> So I guess it is valid to allow a queue change without removing the
> subsystem from the port, but that does not change the fundamental
> question on DNR. If the host sees a DNR error on connect, my
> interpretation is that the host should not retry the connect command
> itself, but that shouldn't imply anything about tearing down the
> controller and giving up on it completely, forever.

Well. And now we're in the nitty-gritty details.
The problem is that 'connect' is tied to the whole 'create association'
stuff from the transport.
I can't really imagine what the result would be of a successful
association creation, a 'connect' failing with the DNR bit set, and the
association _not_ being torn down.
In the end, the association is only valid for this specific 'connect'
command, and as that failed we have to tear down the association.
Am I correct this far?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 15:01           ` John Meneghini
@ 2022-09-28 15:26             ` Daniel Wagner
  2022-09-28 18:02               ` Knight, Frederick
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2022-09-28 15:26 UTC (permalink / raw)
  To: John Meneghini
  Cc: Knight, Frederick, Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

On Wed, Sep 28, 2022 at 11:01:00AM -0400, John Meneghini wrote:
> So this is one place where I think we conflate NVMe-oF with NVMe in the
> specification.  If "reset" == "disconnect", then Daniel's patches just
> implemented support for something which is forbidden by the spec - a
> controller that changes the number of IOQs between resets.

FWIW, the scenario I described is from real-world experience. So even
if it is forbidden by the spec, it exists.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 13:50           ` Daniel Wagner
  2022-09-28 14:23             ` Sagi Grimberg
@ 2022-09-28 16:01             ` Knight, Frederick
  1 sibling, 0 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-09-28 16:01 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

> two targets (*)

Yes, that is what it would be called in SCSI (2 - SCSI Target Devices).  In NVMe, that would more likely be 1 NVM Subsystem with 2 controllers (which could be on the same port - although that would not really be redundant; or on different ports - which would be more likely).  There's also TP4034 with 2 NVM subsystems - but that is a completely different discussion.

There is an AEN for firmware updates - "Firmware Activation Starting", but that one requires just a "pause" in command processing, and not new connections (it is for a different use case than what you're talking about).

If you had static controllers, then the Discovery Controller would report those changes; so, if you had persistent connections to the discovery controller, you'd know when it went away and when it returned.

For dynamic controllers, it's a little harder (since they all get reported under the single ID of FFFFh), so there is no change when a dynamic controller goes away and then comes back on the same port.

If these controllers were on different ports, and the whole port goes away, then the Discovery Controller should be reporting the changes - the removal of a whole port and then the addition of a whole port.  But you'd still need a persistent connection to the Discovery Controller so it can send the AEN to notify the host about those changes.

I agree that adding some new feature just for a test case isn't really a good idea.  For one, it doesn't really test what would happen in the real world.

	Fred




^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 14:23             ` Sagi Grimberg
  2022-09-28 15:21               ` Hannes Reinecke
@ 2022-09-28 16:01               ` Knight, Frederick
  2022-10-05 18:15               ` Daniel Wagner
  2 siblings, 0 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-09-28 16:01 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner; +Cc: linux-nvme, Shinichiro Kawasaki, hare

> So I guess it is valid to allow a queue change without removing the
> subsystem from the port, but that does not change the fundamental
> question on DNR. If the host sees a DNR error on connect, my
> interpretation is that the host should not retry the connect command
> itself, but that shouldn't imply anything about tearing down the
> controller and giving up on it completely, forever.

I agree that DNR is command specific - the problem is when the host sees DNR on a Connect command.  If DNR means "Don't bother to retry the exact same command" - and the command is a Connect command - what does that mean - never try to connect to that NVM subsystem ever again?  That doesn't seem right all the time; there might be cases where it IS the right thing to do.

DNR makes sense for non-connect commands - doing the same thing again has a very high likelihood of failing in exactly the same way it did the first time.  Things like an invalid LBA value in a Read command - no matter how many times you retry - that LBA is still going to be invalid.  But, Connect has the possibility that it could be different - things can change.  Especially, if the device knows it is doing an update, and that it will return - WHY did the device set DNR? Maybe the device should not be setting DNR for this case.  Maybe the device could just hold the Connect command for a short time (say 5-10 seconds), and then FAIL the command (with DNR=0), then the host tries again.  The device just inserts a delay based on how long it might take to finish restarting (and the number of retries the host will do).

Should we talk about the CRD field?  That only works 1) if you have an IDENTIFY CONTROLLER structure to tell you the delay times, and 2) if the host enables it (via Host Behavior Enable).  We never really talk (in the spec) about doing that for a Connect kind of function - we assume there is already a valid connection to use that feature.  Using that (and I do mean writing spec text) would allow a Connect command to suggest a time limit for a retry.
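
(A sketch of how CRD would be consumed, to the best of my understanding:
two bits of the status field select one of the CRDT1..3 fields from
Identify Controller, in 100 ms units, and only if the host enabled ACRE.
The layout and the example values below are assumptions to illustrate
the idea.)

  #include <stdint.h>
  #include <stdio.h>

  /* Status field as stored without the phase bit:
   * DNR=bit 14, More=bit 13, CRD=bits 12:11, SCT=bits 10:8, SC=bits 7:0. */
  #define SF_CRD_SHIFT 11
  #define SF_CRD_MASK  (0x3 << SF_CRD_SHIFT)

  /* crdt[] would come from Identify Controller (CRDT1..CRDT3, 100 ms
   * units) and only applies if the host enabled ACRE beforehand. */
  static unsigned int retry_delay_ms(uint16_t status, const uint16_t crdt[3])
  {
          unsigned int crd = (status & SF_CRD_MASK) >> SF_CRD_SHIFT;

          return crd ? crdt[crd - 1] * 100u : 0;
  }

  int main(void)
  {
          const uint16_t crdt[3] = { 5, 50, 300 };  /* made-up values */

          /* CRD=2 selects CRDT2 -> 50 * 100 ms = 5000 ms */
          printf("delay=%u ms\n", retry_delay_ms(2 << SF_CRD_SHIFT, crdt));
          return 0;
  }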

	Fred


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 15:26             ` Daniel Wagner
@ 2022-09-28 18:02               ` Knight, Frederick
  2022-09-29  2:14                 ` John Meneghini
  2022-09-30  6:32                 ` Hannes Reinecke
  0 siblings, 2 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-09-28 18:02 UTC (permalink / raw)
  To: Daniel Wagner, John Meneghini
  Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

Reset and Disconnect are different.

Reset uses registers (and therefore Get Property and Set Property commands) over a valid connection.

Disconnect causes a reset (you have to reconnect to the reset controller)
Reset does not cause an immediate disconnect.

	Fred



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 18:02               ` Knight, Frederick
@ 2022-09-29  2:14                 ` John Meneghini
  2022-09-29  3:04                   ` Knight, Frederick
  2022-09-30  7:03                   ` Hannes Reinecke
  2022-09-30  6:32                 ` Hannes Reinecke
  1 sibling, 2 replies; 23+ messages in thread
From: John Meneghini @ 2022-09-29  2:14 UTC (permalink / raw)
  To: Knight, Frederick, Daniel Wagner
  Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

On 9/28/22 14:02, Knight, Frederick wrote:
> Reset and Disconnect are different.
> 
> Reset uses registers (and therefore Get Property and Set Property commands) over a valid connection.
> 
> Disconnect causes a reset (you have to reconnect to the reset controller)
> Reset does not cause an immediate disconnect.

OK, so you are saying that the rule "The controller shall not change the value allocated between resets" in NVMe 5.27.1.5 
doesn't apply to fabric controllers?

/John

> 	Fred
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-29  2:14                 ` John Meneghini
@ 2022-09-29  3:04                   ` Knight, Frederick
  2022-09-30  7:03                   ` Hannes Reinecke
  1 sibling, 0 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-09-29  3:04 UTC (permalink / raw)
  To: John Meneghini, Daniel Wagner
  Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki, hare

I sure didn't intend to say that.  Section 3.6.2 talks about shutting down the controller (which leaves the connection intact).

Section 3.7.1 talks about NVM Subsystem resets which causes controller level resets (where again, I believe the intent is that the connection remains intact - so the host can use Get/Set Properties commands to restore operation (if the host so chooses)).

Those are cases where resets happen but the connection remains intact.  The spec suggests a 2 minute window (see section 3.5.2) - last sentence:
The association may be removed if step 5 (i.e., setting CC.EN to ‘1’) is not completed within 2 minutes after establishing the association.
Section 3.6.2 on shutdown says:
After CC.EN transitions to ‘0’ (i.e., due to Controller Level Reset), the association between the host and controller shall be preserved for at least 2 minutes. After this time, the association may be removed if the controller has not been re-enabled.
You've got 2 minutes to reenable the controller before disconnection.

There are also statements in those sections that talk about fatal errors - which do cause the connection to go down (and which therefore, reset the controller).

Section 3.7.2 talks about Controller Level Resets (which uses CC.EN). Although, this section does say - that for message-based transports (fabrics), all internal controller state is reset - no exceptions.  If the host is trying to reenable a controller after a reset on an existing connection, then it must reestablish everything - because everything (all state) has been reset.  The host can't assume anything has been retained.

If the controller doesn't get reenabled, then a disconnect will happen.

But, I’m not sure what that has to do with the Number of Queues feature setting.
The host can do a SET FEATURES to request a Number of Queues - ONCE, and the controller then tells the host how many it really gets.
Then, the host can't do that again (Command Sequence Error if you try); and
The controller can't change its mind about how many it gave to the host.
That value is STUCK - until a RESET happens.
Once a RESET happens, then there are no queues, so everything must be initialized again.
The host goes through the whole controller enabling process again - including a new SET FEATURES to request a number of queues and the controller tells the host how many it gets this time.
And again, that number is stuck until a RESET happens.

If the controller wants to change the number of IO Queues - then it must force a RESET - and cause the host to perform a full reinitialization (and reestablish whatever is the appropriate new number of IO Queues).

Yes, when that RESET happens, the connection could be dropped, but that is not required.

	Fred


> -----Original Message-----
> From: John Meneghini <jmeneghi@redhat.com>
> Sent: Wednesday, September 28, 2022 10:14 PM
> To: Knight, Frederick <Frederick.Knight@netapp.com>; Daniel Wagner
> <dwagner@suse.de>
> Cc: Sagi Grimberg <sagi@grimberg.me>; linux-nvme@lists.infradead.org;
> Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>; hare@suse.de
> Subject: Re: [PATCH v2] nvmet: force reconnect when number of queue
> changes
> 
> On 9/28/22 14:02, Knight, Frederick wrote:
> > Reset and Disconnect are different.
> >
> > Reset uses registers (and therefore Get Property and Set Property
> commands) over a valid connection.
> >
> > Disconnect causes a reset (you have to reconnect to the reset
> > controller) Reset does not cause an immediate disconnect.
> 
> OK, so you are saying that the rule "The controller shall not change the value
> allocated between resets" in NVMe 5.27.1.5 doesn't apply to fabric
> controllers?
> 
> /John
> 
> >       Fred
> >


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 18:02               ` Knight, Frederick
  2022-09-29  2:14                 ` John Meneghini
@ 2022-09-30  6:32                 ` Hannes Reinecke
  1 sibling, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2022-09-30  6:32 UTC (permalink / raw)
  To: Knight, Frederick, Daniel Wagner, John Meneghini
  Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki

On 9/28/22 20:02, Knight, Frederick wrote:
> Reset and Disconnect are different.
> 
> Reset uses registers (and therefore Get Property and Set Property commands) over a valid connection.
> 
> Disconnect causes a reset (you have to reconnect to the reset controller)
> Reset does not cause an immediate disconnect.
> 
This probably is confusion with how the Linux implementation works.
For Linux NVMe-oF a reset is equivalent to a reconnect (i.e. a disconnect 
followed by a reconnect), as on Linux _any_ error will be escalated to 
be equivalent to a connection error.
Hence a pure 'reset' using Get/Set Property is never attempted.

Cheers,

Hannes

>> -----Original Message-----
>> From: Daniel Wagner <dwagner@suse.de>
>> Sent: Wednesday, September 28, 2022 11:26 AM
>> To: John Meneghini <jmeneghi@redhat.com>
>> Cc: Knight, Frederick <Frederick.Knight@netapp.com>; Sagi Grimberg
>> <sagi@grimberg.me>; linux-nvme@lists.infradead.org; Shinichiro Kawasaki
>> <shinichiro.kawasaki@wdc.com>; hare@suse.de
>> Subject: Re: [PATCH v2] nvmet: force reconnect when number of queue
>> changes
>>
>> On Wed, Sep 28, 2022 at 11:01:00AM -0400, John Meneghini wrote:
>>> So this is one place where I think we conflate NVMe-oF with NVMe in
>>> the specification.  If "reset" == "disconnect", then Daniel's patches
>>> just implemented support for something which is forbidden by the spec
>>> - a controller that changes the number of IOQs between resets.
>>
>> FWIW, the scenario I described is from real-world experience. So even if it
>> is forbidden by the spec, it exists.

-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-29  2:14                 ` John Meneghini
  2022-09-29  3:04                   ` Knight, Frederick
@ 2022-09-30  7:03                   ` Hannes Reinecke
  1 sibling, 0 replies; 23+ messages in thread
From: Hannes Reinecke @ 2022-09-30  7:03 UTC (permalink / raw)
  To: John Meneghini, Knight, Frederick, Daniel Wagner
  Cc: Sagi Grimberg, linux-nvme, Shinichiro Kawasaki

On 9/29/22 04:14, John Meneghini wrote:
> On 9/28/22 14:02, Knight, Frederick wrote:
>> Reset and Disconnect are different.
>>
>> Reset uses registers (and therefore Get Property and Set Property 
>> commands) over a valid connection.
>>
>> Disconnect causes a reset (you have to reconnect to the reset controller)
>> Reset does not cause an immediate disconnect.
> 
> OK, so you are saying that the rule "The controller shall not change the 
> value allocated between resets" in NVMe 5.27.1.5 doesn't apply to fabric 
> controllers?
> 
No; the problem is that the Linux usage of the word 'reset' really is a 
disconnect/reconnect (for which a queue change is allowed); we never do 
an NVMe reset via Get/Set Properties for fabrics.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-09-28 14:23             ` Sagi Grimberg
  2022-09-28 15:21               ` Hannes Reinecke
  2022-09-28 16:01               ` Knight, Frederick
@ 2022-10-05 18:15               ` Daniel Wagner
  2022-10-06 11:37                 ` Sagi Grimberg
  2 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2022-10-05 18:15 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Knight, Frederick, linux-nvme, Shinichiro Kawasaki, hare

On Wed, Sep 28, 2022 at 05:23:06PM +0300, Sagi Grimberg wrote:
> > As far as I can tell, what is missing from a testing point of view is the
> > ability to fail requests without the DNR bit set or the ability to tell
> > the host to reconnect. Obviously, an AEN would be nice for this but I
> > don't know if this is reason enough to extend the spec.
> 
> Looking into the code, it's the connect that fails on invalid parameter
> with a DNR, because the host is attempting to connect to a subsystem
> that does not exist on the port (because it was taken offline for
> maintenance reasons).
> 
> So I guess it is valid to allow queue change without removing it from
> the port, but that does not change the fundamental question on DNR.
> If the host sees a DNR error on connect, my interpretation is that the
> host should not retry the connect command itself, but it shouldn't imply
> anything on tearing down the controller and giving up on it completely,
> forever.

Okay, let me try to avoid the DNR discussion for now and propose
something else: what about adding an 'enable' attribute to the subsys?

The snippet below does the trick. There is no explicit synchronization
between host and target, though, so it's possible the host doesn't
notice that the subsystem toggled its enabled state and updated the
number of queues. But I'm not sure it's worth addressing; it feels a bit
over-engineered.


diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 051a420d818e..2438f5e04409 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1294,6 +1294,9 @@ static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 	struct nvmet_ctrl *ctrl;
 	u16 qid_max;
 
+	if (subsys->enabled)
+		return -EACCES;
+
 	if (sscanf(page, "%hu\n", &qid_max) != 1)
 		return -EINVAL;
 
@@ -1302,16 +1305,39 @@ static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 
 	down_write(&nvmet_config_sem);
 	subsys->max_qid = qid_max;
-
-	/* Force reconnect */
-	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		ctrl->ops->delete_ctrl(ctrl);
 	up_write(&nvmet_config_sem);
 
 	return cnt;
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_qid_max);
 
+static ssize_t nvmet_subsys_attr_enable_show(struct config_item *item,
+					     char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n", to_subsys(item)->enabled);
+}
+
+static ssize_t nvmet_subsys_attr_enable_store(struct config_item *item,
+					      const char *page, size_t cnt)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	bool enable;
+
+	if (strtobool(page, &enable))
+		goto inval;
+
+	if (enable)
+		nvmet_subsystem_enable(subsys);
+	else
+		nvmet_subsystem_disable(subsys);
+
+	return cnt;
+inval:
+	pr_err("Invalid value '%s' for enable\n", page);
+	return cnt;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_enable);
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
@@ -1320,6 +1346,7 @@ static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_cntlid_max,
 	&nvmet_subsys_attr_attr_model,
 	&nvmet_subsys_attr_attr_qid_max,
+	&nvmet_subsys_attr_attr_enable,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 	&nvmet_subsys_attr_attr_pi_enable,
 #endif
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 8e3cf0c3588c..487842f16d67 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -546,6 +546,22 @@ bool nvmet_ns_revalidate(struct nvmet_ns *ns)
 	return oldsize != ns->size;
 }
 
+int nvmet_subsystem_enable(struct nvmet_subsys *subsys)
+{
+	mutex_lock(&subsys->lock);
+	subsys->enabled = true;
+	mutex_unlock(&subsys->lock);
+
+	return 0;
+}
+
+void nvmet_subsystem_disable(struct nvmet_subsys *subsys)
+{
+	mutex_lock(&subsys->lock);
+	subsys->enabled = false;
+	mutex_unlock(&subsys->lock);
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
@@ -954,7 +970,10 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	if (unlikely(!req->sq->ctrl))
 		/* will return an error for any non-connect command: */
 		status = nvmet_parse_connect_cmd(req);
-	else if (likely(req->sq->qid != 0))
+	else if (unlikely(!req->sq->ctrl->subsys->enabled)) {
+		req->error_loc = offsetof(struct nvme_common_command, dptr);
+		status = NVME_SC_INTERNAL;
+	} else if (likely(req->sq->qid != 0))
 		status = nvmet_parse_io_cmd(req);
 	else
 		status = nvmet_parse_admin_cmd(req);
@@ -1368,6 +1387,15 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	}
 
 	down_read(&nvmet_config_sem);
+	if (!subsys->enabled) {
+		pr_info("connect by host %s for disabled subsystem %s not possible\n",
+			hostnqn, subsysnqn);
+		req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(subsysnqn);
+		up_read(&nvmet_config_sem);
+		status = NVME_SC_INTERNAL;
+		req->error_loc = offsetof(struct nvme_common_command, dptr);
+		goto out_put_subsystem;
+	}
 	if (!nvmet_host_allowed(subsys, hostnqn)) {
 		pr_info("connect by host %s for subsystem %s not allowed\n",
 			hostnqn, subsysnqn);
@@ -1588,6 +1616,8 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 	INIT_LIST_HEAD(&subsys->ctrls);
 	INIT_LIST_HEAD(&subsys->hosts);
 
+	subsys->enabled = true;
+
 	return subsys;
 
 free_mn:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index dfe3894205aa..269bfc979870 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -235,6 +235,7 @@ struct nvmet_ctrl {
 
 struct nvmet_subsys {
 	enum nvme_subsys_type	type;
+	bool			enabled;
 
 	struct mutex		lock;
 	struct kref		ref;
@@ -486,6 +487,8 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 		enum nvme_subsys_type type);
 void nvmet_subsys_put(struct nvmet_subsys *subsys);
 void nvmet_subsys_del_ctrls(struct nvmet_subsys *subsys);
+int nvmet_subsystem_enable(struct nvmet_subsys *subsys);
+void nvmet_subsystem_disable(struct nvmet_subsys *subsys);
 
 u16 nvmet_req_find_ns(struct nvmet_req *req);
 void nvmet_put_namespace(struct nvmet_ns *ns);
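
Not part of the diff, but a usage sketch from user space (the subsystem NQN below is a placeholder, attr_enable only exists with the snippet above applied, and attr_qid_max is refused with -EACCES while the subsystem is enabled):

/* Sketch: disable the subsystem, bump qid_max, re-enable it. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_attr(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	if (write(fd, val, strlen(val)) < 0) {
		perror(path);
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/* placeholder NQN; adjust to the configured subsystem */
	const char *subsys =
		"/sys/kernel/config/nvmet/subsystems/nqn.2022-09.io.example:testnqn";
	char path[512];

	/* while disabled, the target fails host I/O and refuses new connects */
	snprintf(path, sizeof(path), "%s/attr_enable", subsys);
	if (write_attr(path, "0"))
		return 1;

	snprintf(path, sizeof(path), "%s/attr_qid_max", subsys);
	if (write_attr(path, "4"))
		return 1;

	/* re-enable so hosts can reconnect and renegotiate the queue count */
	snprintf(path, sizeof(path), "%s/attr_enable", subsys);
	return write_attr(path, "1") ? 1 : 0;
}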


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-10-05 18:15               ` Daniel Wagner
@ 2022-10-06 11:37                 ` Sagi Grimberg
  2022-10-06 20:15                   ` James Smart
  0 siblings, 1 reply; 23+ messages in thread
From: Sagi Grimberg @ 2022-10-06 11:37 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: Knight, Frederick, linux-nvme, Shinichiro Kawasaki, hare


>>> As far as I can tell, what is missing from a testing point of view is the
>>> ability to fail requests without the DNR bit set or the ability to tell
>>> the host to reconnect. Obviously, an AEN would be nice for this but I
>>> don't know if this is reason enough to extend the spec.
>>
>> Looking into the code, it's the connect that fails on invalid parameter
>> with a DNR, because the host is attempting to connect to a subsystem
>> that does not exist on the port (because it was taken offline for
>> maintenance reasons).
>>
>> So I guess it is valid to allow queue change without removing it from
>> the port, but that does not change the fundamental question on DNR.
>> If the host sees a DNR error on connect, my interpretation is that the
>> host should not retry the connect command itself, but it shouldn't imply
>> anything on tearing down the controller and giving up on it completely,
>> forever.
> 
> Okay, let me try to avoid the DNR discussion for now and propose
> something else: what about adding an 'enable' attribute to the subsys?
> 
> The snippet below does the trick. There is no explicit synchronization
> between host and target, though, so it's possible the host doesn't
> notice that the subsystem toggled its enabled state and updated the
> number of queues. But I'm not sure it's worth addressing; it feels a
> bit over-engineered.

I think that for the matter of this patch, you can keep force reconnect.

But I still think we need to be consistent across transports in how we
interpret a controller returning DNR...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-10-06 11:37                 ` Sagi Grimberg
@ 2022-10-06 20:15                   ` James Smart
  2022-10-06 20:54                     ` Knight, Frederick
  0 siblings, 1 reply; 23+ messages in thread
From: James Smart @ 2022-10-06 20:15 UTC (permalink / raw)
  To: Sagi Grimberg, Daniel Wagner
  Cc: Knight, Frederick, linux-nvme, Shinichiro Kawasaki, hare

On 10/6/2022 4:37 AM, Sagi Grimberg wrote:
> 
>>>> As far as I can tell, what is missing from a testing point of view is 
>>>> the
>>>> ability to fail requests without the DNR bit set or the ability to tell
>>>> the host to reconnect. Obviously, an AEN would be nice for this but I
>>>> don't know if this is reason enough to extend the spec.
>>>
>>> Looking into the code, it's the connect that fails on invalid parameter
>>> with a DNR, because the host is attempting to connect to a subsystem
>>> that does not exist on the port (because it was taken offline for
>>> maintenance reasons).
>>>
>>> So I guess it is valid to allow queue change without removing it from
>>> the port, but that does not change the fundamental question on DNR.
>>> If the host sees a DNR error on connect, my interpretation is that the
>>> host should not retry the connect command itself, but it shouldn't imply
>>> anything on tearing down the controller and giving up on it completely,
>>> forever.
>>
>> Okay, let me try to avoid the DNR discussion for now and propose
>> something else: what about adding an 'enable' attribute to the subsys?
>>
>> The snippet below does the trick. There is no explicit synchronization
>> between host and target, though, so it's possible the host doesn't
>> notice that the subsystem toggled its enabled state and updated the
>> number of queues. But I'm not sure it's worth addressing; it feels a
>> bit over-engineered.
> 
> I think that for the matter of this patch, you can keep force reconnect.
> 
> But I still think we need to be consistent across transports in how we
> interpret a controller returning DNR...
> 

I agree - behavior should be the same regardless of transport.  DNR is 
pretty specific in its definition - "If the same command is re-submitted 
to any controller in the NVM subsystem, then that re-submitted command 
is expected to fail"

But, what we forget is "the command" is actually "command X with fields 
set this way".  If we change the fields, it may actually succeed.

So if we're re-issuing Connect in the same way without changing any 
values, we're better off not reconnecting.  But if Connect changes, 
we're back to square one.
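
A minimal stand-alone sketch of that policy (NVME_SC_DNR is the real status bit from include/linux/nvme.h; the helper itself is hypothetical and only illustrates the "same command vs. changed command" distinction):

#include <stdbool.h>
#include <stdint.h>

#define NVME_SC_DNR	0x4000	/* mirrors the definition in include/linux/nvme.h */

/*
 * Hypothetical retry policy for a failed fabrics Connect: without DNR the
 * same command may simply be retried; with DNR, only a *changed* Connect
 * (e.g. after the target adjusted qid_max and the host rebuilt its queue
 * layout) is worth sending again.
 */
bool connect_worth_retrying(uint16_t status, bool fields_changed)
{
	if (!(status & NVME_SC_DNR))
		return true;
	return fields_changed;
}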

-- james



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH v2] nvmet: force reconnect when number of queue changes
  2022-10-06 20:15                   ` James Smart
@ 2022-10-06 20:54                     ` Knight, Frederick
  0 siblings, 0 replies; 23+ messages in thread
From: Knight, Frederick @ 2022-10-06 20:54 UTC (permalink / raw)
  To: James Smart, Sagi Grimberg, Daniel Wagner
  Cc: linux-nvme, Shinichiro Kawasaki, hare

YES!  100%

Don't retry (DNR) - IF you are resending exactly the same command with exactly the same parameters, it is likely to fail for the same reason it did the first time. If you change the command's parameters (maybe because you're trying to fix whatever caused the failure), then DNR doesn't apply.

	Fred

> -----Original Message-----
> From: James Smart <jsmart2021@gmail.com>
> Sent: Thursday, October 6, 2022 4:15 PM
> To: Sagi Grimberg <sagi@grimberg.me>; Daniel Wagner
> <dwagner@suse.de>
> Cc: Knight, Frederick <Frederick.Knight@netapp.com>; linux-
> nvme@lists.infradead.org; Shinichiro Kawasaki
> <shinichiro.kawasaki@wdc.com>; hare@suse.de
> Subject: Re: [PATCH v2] nvmet: force reconnect when number of queue
> changes
> 
> On 10/6/2022 4:37 AM, Sagi Grimberg wrote:
> >
> >>>> As far as I can tell, what is missing from a testing point of view
> >>>> is the ability to fail requests without the DNR bit set or the
> >>>> ability to tell the host to reconnect. Obviously, an AEN would be
> >>>> nice for this but I don't know if this is reason enough to extend
> >>>> the spec.
> >>>
> >>> Looking into the code, it's the connect that fails on invalid
> >>> parameter with a DNR, because the host is attempting to connect to a
> >>> subsystem that does not exist on the port (because it was taken
> >>> offline for maintenance reasons).
> >>>
> >>> So I guess it is valid to allow queue change without removing it
> >>> from the port, but that does not change the fundamental question on
> DNR.
> >>> If the host sees a DNR error on connect, my interpretation is that
> >>> the host should not retry the connect command itself, but it
> >>> shouldn't imply anything on tearing down the controller and giving
> >>> up on it completely, forever.
> >>
> >> Okay, let me try to avoid the DNR discussion for now and propose
> >> something else: what about adding an 'enable' attribute to the subsys?
> >>
> >> The snippet below does the trick. There is no explicit synchronization
> >> between host and target, though, so it's possible the host doesn't
> >> notice that the subsystem toggled its enabled state and updated the
> >> number of queues. But I'm not sure it's worth addressing; it feels a
> >> bit over-engineered.
> >
> > I think that for the matter of this patch, you can keep force reconnect.
> >
> > But I still think we need to be consistent across transports in how
> > we interpret a controller returning DNR...
> >
> 
> I agree - behavior should be the same regardless of transport.  DNR is pretty
> specific in its definition - "If the same command is re-submitted to any
> controller in the NVM subsystem, then that re-submitted command is
> expected to fail"
> 
> But, what we forget is "the command" is actually "command X with fields set
> this way".  If we change the fields, it may actually succeed.
> 
> So if we're re-issuing Connect in the same way without changing any
> values, we're better off not reconnecting.  But if Connect changes,
> we're back to square one.
> 
> -- james


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-10-06 20:55 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 14:31 [PATCH v2] nvmet: force reconnect when number of queue changes Daniel Wagner
2022-09-27 15:01 ` Daniel Wagner
2022-09-28  6:55 ` Sagi Grimberg
2022-09-28  7:48   ` Daniel Wagner
2022-09-28  8:31     ` Sagi Grimberg
2022-09-28  9:02       ` Daniel Wagner
2022-09-28 12:39         ` Knight, Frederick
2022-09-28 13:50           ` Daniel Wagner
2022-09-28 14:23             ` Sagi Grimberg
2022-09-28 15:21               ` Hannes Reinecke
2022-09-28 16:01               ` Knight, Frederick
2022-10-05 18:15               ` Daniel Wagner
2022-10-06 11:37                 ` Sagi Grimberg
2022-10-06 20:15                   ` James Smart
2022-10-06 20:54                     ` Knight, Frederick
2022-09-28 16:01             ` Knight, Frederick
2022-09-28 15:01           ` John Meneghini
2022-09-28 15:26             ` Daniel Wagner
2022-09-28 18:02               ` Knight, Frederick
2022-09-29  2:14                 ` John Meneghini
2022-09-29  3:04                   ` Knight, Frederick
2022-09-30  7:03                   ` Hannes Reinecke
2022-09-30  6:32                 ` Hannes Reinecke
