* [RFC] nvme-tcp: Handle number of queue changes
From: Daniel Wagner @ 2022-08-12 14:28 UTC
  To: linux-nvme; +Cc: Daniel Wagner

On reconnect, the number of queues might have changed. In the case
where the old number of queues is lower than the new one, we try to
restart the new number of queues. At this point though, the tagset
has not yet been updated and the operation fails.

Thus, on reconnect start only the old number of queues, then update
the tag set, and then start the rest of the newly added queues.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---

We got a report from a customer that a firmware upgrade on the
storage array is able to 'break' the host. This is caused by a change
in the number of queues the target supports after a reconnect.

Let's assume the number of queues is 8 and all is working fine. Then
the connection is dropped and the host starts trying to
reconnect. Eventually this succeeds, but now the new number of queues
is 10:

nvme0: creating 8 I/O queues.
nvme0: mapped 8/0/0 default/read/poll queues.
nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
nvme0: queue 0: timeout request 0x0 type 4
nvme0: starting error recovery
nvme0: failed nvme_keep_alive_end_io error=10
nvme0: Reconnecting in 10 seconds...
nvme0: failed to connect socket: -110
nvme0: Failed reconnect attempt 1
nvme0: Reconnecting in 10 seconds...
nvme0: creating 10 I/O queues.
nvme0: Connect command failed, error wo/DNR bit: -16389
nvme0: failed to connect queue: 9 ret=-5
nvme0: Failed reconnect attempt 2

As you can see, queue number 9 is not able to connect.

As the order of starting and unfreezing is important, we can't just
move the start of the queues after the tagset update. So my stupid
idea was to start just the old number of queues first and then the
rest, roughly as sketched below.
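
In pseudo-code the two-phase start looks roughly like this. This is a
simplified sketch of the idea only, not the actual patch: the
"_sketch" function name is made up, and the error unwinding plus the
freeze timeout handling of the real nvme_tcp_configure_io_queues()
are left out.

/*
 * Simplified sketch of the two-phase queue start on reconnect.
 * Not the actual patch; error unwinding is omitted.
 */
static int nvme_tcp_configure_io_queues_sketch(struct nvme_ctrl *ctrl, bool new)
{
	int ret;

	ret = nvme_tcp_alloc_io_queues(ctrl);
	if (ret)
		return ret;

	/*
	 * Phase 1: nvme_tcp_start_io_queues() caps its loop at
	 * min(tag_set.nr_hw_queues + 1, queue_count), so it only
	 * connects queues the current tag set already knows about.
	 */
	ret = nvme_tcp_start_io_queues(ctrl);
	if (ret)
		return ret;

	if (!new) {
		/* ... wait for the freeze, then resize the tag set ... */
		blk_mq_update_nr_hw_queues(ctrl->tagset, ctrl->queue_count - 1);
		nvme_unfreeze(ctrl);
	}

	/*
	 * Phase 2: call it again; queues already started in phase 1
	 * are skipped via the NVME_TCP_Q_LIVE check, only the newly
	 * added ones get connected now.
	 */
	return nvme_tcp_start_io_queues(ctrl);
}

E.g. with 8 old and 10 new I/O queues, phase 1 connects queues 1-8,
the tag set is then grown to 10 hw queues, and phase 2 connects
queues 9 and 10.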

This seems to work:

nvme nvme0: creating 4 I/O queues.
nvme nvme0: mapped 4/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
nvme nvme0: queue 0: timeout request 0x0 type 4
nvme nvme0: starting error recovery
nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71) 
nvme nvme0: failed nvme_keep_alive_end_io error=10
nvme nvme0: Reconnecting in 10 seconds...
nvme nvme0: creating 6 I/O queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 7 qcnt 5
nvme nvme0: mapped 6/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 7 qcnt 7
nvme nvme0: Successfully reconnected (1 attempt)
nvme nvme0: starting error recovery
nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71) 
nvme nvme0: failed nvme_keep_alive_end_io error=10
nvme nvme0: Reconnecting in 10 seconds...
nvme nvme0: creating 4 I/O queues.
nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 5 qcnt 5
nvme nvme0: mapped 4/0/0 default/read/poll queues.
nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
nvme nvme0: Successfully reconnected (1 attempt)

And yes, the patch per se is ugly, but I didn't want to reorganize the
code at this point. It is just an illustration of my idea.

Feedback highly appreciated!

Thanks,
Daniel


 drivers/nvme/host/tcp.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e82dcfcda29b..177d2f5d9047 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1671,6 +1671,9 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	int ret;
 
+	if (test_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags))
+		return 0;
+
 	if (idx)
 		ret = nvmf_connect_io_queue(nctrl, idx);
 	else
@@ -1759,12 +1762,18 @@ static void nvme_tcp_stop_io_queues(struct nvme_ctrl *ctrl)
 		nvme_tcp_stop_queue(ctrl, i);
 }
 
-static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
+static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl)
 {
-	int i, ret;
+	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+	int i, ret, qcnt;
 
-	for (i = 1; i < ctrl->queue_count; i++) {
-		ret = nvme_tcp_start_queue(ctrl, i);
+	qcnt = min(ctrl->tag_set.nr_hw_queues + 1, nctrl->queue_count);
+
+	dev_info(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
+	         ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);
+
+	for (i = 1; i < qcnt; i++) {
+		ret = nvme_tcp_start_queue(nctrl, i);
 		if (ret)
 			goto out_stop_queues;
 	}
@@ -1773,7 +1782,7 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
 
 out_stop_queues:
 	for (i--; i >= 1; i--)
-		nvme_tcp_stop_queue(ctrl, i);
+		nvme_tcp_stop_queue(nctrl, i);
 	return ret;
 }
 
@@ -1934,6 +1943,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 		nvme_unfreeze(ctrl);
 	}
 
+	ret = nvme_tcp_start_io_queues(ctrl);
+	if (ret)
+		goto out_cleanup_connect_q;
+
 	return 0;
 
 out_wait_freeze_timed_out:
-- 
2.37.1




* Re: [RFC] nvme-tcp: Handle number of queue changes
From: Sagi Grimberg @ 2022-08-17 11:28 UTC
  To: Daniel Wagner, linux-nvme


> On reconnect, the number of queues might have changed. In the case
> where the old number of queues is lower than the new one, we try to
> restart the new number of queues. At this point though, the tagset
> has not yet been updated and the operation fails.
> 
> Thus, on reconnect start only the old number of queues, then update
> the tag set, and then start the rest of the newly added queues.

I agree this change is needed. The same issue exists in rdma and
possibly others...


> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
> 
> We got a report from a customer that a firmware upgrade on the
> storage array is able to 'break' the host. This is caused by a change
> in the number of queues the target supports after a reconnect.
> 
> Let's assume the number of queues is 8 and all is working fine. Then
> the connection is dropped and the host starts trying to
> reconnect. Eventually this succeeds, but now the new number of queues
> is 10:
> 
> nvme0: creating 8 I/O queues.
> nvme0: mapped 8/0/0 default/read/poll queues.
> nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme0: queue 0: timeout request 0x0 type 4
> nvme0: starting error recovery
> nvme0: failed nvme_keep_alive_end_io error=10
> nvme0: Reconnecting in 10 seconds...
> nvme0: failed to connect socket: -110
> nvme0: Failed reconnect attempt 1
> nvme0: Reconnecting in 10 seconds...
> nvme0: creating 10 I/O queues.
> nvme0: Connect command failed, error wo/DNR bit: -16389
> nvme0: failed to connect queue: 9 ret=-5
> nvme0: Failed reconnect attempt 2
> 
> As you can see, queue number 9 is not able to connect.
> 
> As the order of starting and unfreezing is important, we can't just
> move the start of the queues after the tagset update. So my stupid
> idea was to start just the old number of queues first and then the rest.
> 
> This seems to work:
> 
> nvme nvme0: creating 4 I/O queues.
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: new ctrl: NQN "nvmet-test", addr 10.100.128.29:4420
> nvme nvme0: queue 0: timeout request 0x0 type 4
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 6 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 7 qcnt 5
> nvme nvme0: mapped 6/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 7 qcnt 7
> nvme nvme0: Successfully reconnected (1 attempt)
> nvme nvme0: starting error recovery
> nvme0: Keep Alive(0x18), Unknown (sct 0x3 / sc 0x71)
> nvme nvme0: failed nvme_keep_alive_end_io error=10
> nvme nvme0: Reconnecting in 10 seconds...
> nvme nvme0: creating 4 I/O queues.
> nvme_tcp_start_io_queues nr_hw_queues 6 queue_count 5 qcnt 5
> nvme nvme0: mapped 4/0/0 default/read/poll queues.
> nvme_tcp_start_io_queues nr_hw_queues 4 queue_count 5 qcnt 5
> nvme nvme0: Successfully reconnected (1 attempt)
> 
> And yes, the patch per se is ugly, but I didn't want to reorganize the
> code at this point. It is just an illustration of my idea.
> 
> Feedback highly appreciated!
> 
> Thanks,
> Daniel
> 
> 
>   drivers/nvme/host/tcp.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index e82dcfcda29b..177d2f5d9047 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1671,6 +1671,9 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
>   	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>   	int ret;
>   
> +	if (test_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags))
> +		return 0;
> +

How about we make concise calls to queue starts instead of
making this re-entrant safe... Although this does follow
the same model as pci...

>   	if (idx)
>   		ret = nvmf_connect_io_queue(nctrl, idx);
>   	else
> @@ -1759,12 +1762,18 @@ static void nvme_tcp_stop_io_queues(struct nvme_ctrl *ctrl)
>   		nvme_tcp_stop_queue(ctrl, i);
>   }
>   
> -static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
> +static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl)
>   {
> -	int i, ret;
> +	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> +	int i, ret, qcnt;
>   
> -	for (i = 1; i < ctrl->queue_count; i++) {
> -		ret = nvme_tcp_start_queue(ctrl, i);
> +	qcnt = min(ctrl->tag_set.nr_hw_queues + 1, nctrl->queue_count);
> +
> +	dev_info(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
> +	         ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);

No need for this print as info, it's just spamming...

> +
> +	for (i = 1; i < qcnt; i++) {
> +		ret = nvme_tcp_start_queue(nctrl, i);
>   		if (ret)
>   			goto out_stop_queues;
>   	}
> @@ -1773,7 +1782,7 @@ static int nvme_tcp_start_io_queues(struct nvme_ctrl *ctrl)
>   
>   out_stop_queues:
>   	for (i--; i >= 1; i--)
> -		nvme_tcp_stop_queue(ctrl, i);
> +		nvme_tcp_stop_queue(nctrl, i);
>   	return ret;
>   }
>   
> @@ -1934,6 +1943,10 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
>   		nvme_unfreeze(ctrl);
>   	}
>   
> +	ret = nvme_tcp_start_io_queues(ctrl);
> +	if (ret)
> +		goto out_cleanup_connect_q;
> +

Shouldn't we do the same for the create itself? Is there any chance that
we will leave a tcp connection idle in any situation?



* Re: [RFC] nvme-tcp: Handle number of queue changes
From: Daniel Wagner @ 2022-08-18  7:06 UTC
  To: Sagi Grimberg; +Cc: linux-nvme

On Wed, Aug 17, 2022 at 02:28:12PM +0300, Sagi Grimberg wrote:
> 
> > On reconnect, the number of queues might have changed. In the case
> > where the old number of queues is lower than the new one, we try to
> > restart the new number of queues. At this point though, the tagset
> > has not yet been updated and the operation fails.
> > 
> > Thus, on reconnect start only the old number of queues, then update
> > the tag set, and then start the rest of the newly added queues.
> 
> I agree this change is needed. The same issue exists in rdma and
> possibly others...

Ok, I'll take a look.

> > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > index e82dcfcda29b..177d2f5d9047 100644
> > --- a/drivers/nvme/host/tcp.c
> > +++ b/drivers/nvme/host/tcp.c
> > @@ -1671,6 +1671,9 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
> >   	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> >   	int ret;
> > +	if (test_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags))
> > +		return 0;
> > +
> 
> How about we make concise calls to queue starts instead of
> making this re-entrant safe... Although this does follow
> the same model as pci...

I am also not too fond of this version, but I was at least sure it
works correctly. I'll see what I can do.
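
One option along those lines would be letting the caller pass an
explicit queue range instead of relying on the NVME_TCP_Q_LIVE check.
Just a rough, untested sketch with a hypothetical signature, not
something from this patch:

/* Hypothetical sketch only: start queues in the range [first, last). */
static int nvme_tcp_start_io_queues(struct nvme_ctrl *nctrl,
		int first, int last)
{
	int i, ret;

	for (i = first; i < last; i++) {
		ret = nvme_tcp_start_queue(nctrl, i);
		if (ret)
			goto out_stop_queues;
	}
	return 0;

out_stop_queues:
	/* only unwind what this call started; the caller owns the rest */
	while (--i >= first)
		nvme_tcp_stop_queue(nctrl, i);
	return ret;
}

The reconnect path would then call it once for queues
[1, old nr_hw_queues + 1) before the tag set update and once for the
remaining range afterwards.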

> > +	dev_info(nctrl->device, "nr_hw_queues %d queue_count %d qcnt %d\n",
> > +	         ctrl->tag_set.nr_hw_queues, nctrl->queue_count, qcnt);
> 
> No need for this print as info, its just spamming...

Sure, was just for the RFC here. Will drop it.

> Shouldn't we do the same for the create itself? Is there any chance that
> we will leave a tcp connection idle in any situation?

No idea, though would this not change how we handle the writes to
/dev/fabrics? Currently, we block and return success or failure.



