* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
@ 2018-11-04 16:37 Israel Rukshin
  2018-11-04 17:09 ` [Suspected-Phishing][PATCH] " Max Gurtovoy
  2018-11-05 19:34 ` [PATCH] " James Smart
  0 siblings, 2 replies; 13+ messages in thread
From: Israel Rukshin @ 2018-11-04 16:37 UTC (permalink / raw)


From: Sagi Grimberg <sagi@grimberg.me>

blk-mq tells us to skip unmapped queues but we fail the controller altogether.

Signed-off-by: Israel Rukshin <israelr at mellanox.com>
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6f1bd79..c6dd1efe 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -643,20 +643,33 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
 
 static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
 {
-	int i, ret = 0;
+	int i, ret, count = 0;
 
 	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
 		ret = nvme_rdma_start_queue(ctrl, i);
-		if (ret)
+		if (ret) {
+			if (ret == -EXDEV) {
+				/* unmapped queue, skip ... */
+				nvme_rdma_free_queue(&ctrl->queues[i]);
+				continue;
+			}
 			goto out_stop_queues;
+		}
+		count++;
 	}
 
+	if (!count)
+		/* no started queues, fail */
+		return -EIO;
+
+	dev_info(ctrl->ctrl.device, "connected %d I/O queues.\n", count);
+
 	return 0;
 
 out_stop_queues:
 	for (i--; i >= 1; i--)
 		nvme_rdma_stop_queue(&ctrl->queues[i]);
-	return ret;
+	return -EIO;
 }
 
 static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
-- 
1.8.3.1
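
For context on the -EXDEV special-casing above: the fabrics connect
command for I/O queue N is allocated on that queue's hardware context
via blk_mq_alloc_request_hctx(), and the block layer refuses the
allocation with -EXDEV when that hctx is not mapped to any CPU. A
rough, paraphrased sketch of the check (not the literal source;
details vary between kernel versions):

	/* block/blk-mq.c, paraphrased */
	struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
			unsigned int op, blk_mq_req_flags_t flags,
			unsigned int hctx_idx)
	{
		struct blk_mq_alloc_data alloc_data = { .flags = flags };
		...
		/*
		 * Check if the hardware context is actually mapped to
		 * anything; if not, tell the caller to skip this queue.
		 */
		alloc_data.hctx = q->queue_hw_ctx[hctx_idx];
		if (!blk_mq_hw_queue_mapped(alloc_data.hctx))
			return ERR_PTR(-EXDEV);
		...
	}

nvme_rdma_start_queue() reaches this path (roughly) through
nvmf_connect_io_queue() -> __nvme_submit_sync_cmd() ->
nvme_alloc_request(), which is why an unmapped queue shows up as
-EXDEV rather than as a transport error.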

* [Suspected-Phishing][PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-04 16:37 [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect Israel Rukshin
@ 2018-11-04 17:09 ` Max Gurtovoy
  2018-11-05 19:34 ` [PATCH] " James Smart
  1 sibling, 0 replies; 13+ messages in thread
From: Max Gurtovoy @ 2018-11-04 17:09 UTC (permalink / raw)



> From: Sagi Grimberg <sagi at grimberg.me>
>
> blk-mq tells us to skip unmapped queues but we fail the controller altogether.
>
> Signed-off-by: Israel Rukshin <israelr at mellanox.com>

We need to add Sagi's "Signed-off-by" here.

Sagi,

you proposed this during the CPU-queue affinity debugging. We think 
it's good enough to use a controller that has at least one I/O queue.


> Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
>   1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 6f1bd79..c6dd1efe 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -643,20 +643,33 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
>   
>   static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
>   {
> -	int i, ret = 0;
> +	int i, ret, count = 0;
>   
>   	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
>   		ret = nvme_rdma_start_queue(ctrl, i);
> -		if (ret)
> +		if (ret) {
> +			if (ret == -EXDEV) {
> +				/* unmapped queue, skip ... */
> +				nvme_rdma_free_queue(&ctrl->queues[i]);
> +				continue;
> +			}
>   			goto out_stop_queues;
> +		}
> +		count++;
>   	}
>   
> +	if (!count)
> +		/* no started queues, fail */
> +		return -EIO;
> +
> +	dev_info(ctrl->ctrl.device, "connected %d I/O queues.\n", count);
> +
>   	return 0;
>   
>   out_stop_queues:
>   	for (i--; i >= 1; i--)
>   		nvme_rdma_stop_queue(&ctrl->queues[i]);
> -	return ret;
> +	return -EIO;
>   }
>   
>   static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-04 16:37 [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect Israel Rukshin
  2018-11-04 17:09 ` [Suspected-Phishing][PATCH] " Max Gurtovoy
@ 2018-11-05 19:34 ` James Smart
  2018-11-06 11:10   ` Max Gurtovoy
  1 sibling, 1 reply; 13+ messages in thread
From: James Smart @ 2018-11-05 19:34 UTC (permalink / raw)


On 11/4/2018 8:37 AM, Israel Rukshin wrote:
> From: Sagi Grimberg <sagi at grimberg.me>
>
> blk-mq tells us to skip unmapped queues but we fail the controller altogether.
>
> Signed-off-by: Israel Rukshin <israelr at mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
>   1 file changed, 16 insertions(+), 3 deletions(-)
>

This sounds odd. Why aren't you concerned that I/O queues are not 
connecting? Are there any log messages hinting at the failures? Is 
there any way someone looking at the controller can know how many 
queues were actually created? I would assume any failure is significant 
and should be visible, and it's worthwhile knowing whether this is a 
consistent failure or a random one, and what the failure was.

As far as continuing on in spite of failure - ok, but the target has 
already been told, via SET FEATURES, to prepare (and thus may allocate 
resources) for X number of I/O queues, so minimally it's a condition 
where the target may be severely wasting resources, which may affect 
its scale-out ability. I assume the transport should be resetting the 
controller and self-tuning the max queue count so that it is reduced 
to what is being used. Granted, this assumes the error was consistent 
and for the same reason. If it was an inconsistent error, it's ok to 
continue without resetting, but the transport should be very verbose 
about the state it's in and why.
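
For reference, the negotiation described here is the Set Features
command with the Number of Queues feature; a paraphrased sketch of the
core driver helper that issues it, roughly as it looked in kernels of
this era (exact code may differ):

	int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
	{
		/* 0's-based SQ and CQ counts, packed into cdw11 */
		u32 q_count = (*count - 1) | ((*count - 1) << 16);
		u32 result;
		int status, nr_io_queues;

		status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
				NULL, 0, &result);
		if (status < 0)
			return status;

		/* the controller replies with how many queues it granted */
		nr_io_queues = min(result & 0xffff, result >> 16) + 1;
		*count = min(*count, nr_io_queues);
		return 0;
	}

The transports call this before allocating and connecting the I/O
queues, so by the time nvme_rdma_start_io_queues() runs the target has
already budgeted for the full count, which is the waste being pointed
out above.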

It appears the patch doesn't save the connected queue count, which 
means the blk-mq nr_hw_queues is not in sync with the new count. What 
happens on those CPUs that select an hctx that is on a queue that 
isn't connected? I assume the Q_LIVE bit has the queue_rq routine 
continually bouncing them until they fail?
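
For reference, that bouncing happens in the driver's queue_rq path,
which gates commands on the per-queue live flag; a paraphrased sketch
of the v4.19-era nvme-rdma code (helper names and signatures may
differ slightly between versions):

	static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
			const struct blk_mq_queue_data *bd)
	{
		struct nvme_rdma_queue *queue = hctx->driver_data;
		struct request *rq = bd->rq;
		bool queue_ready = test_bit(NVME_RDMA_Q_LIVE, &queue->flags);

		/*
		 * A queue that was never connected never gets Q_LIVE set,
		 * so every command mapped to it ends up failed or requeued
		 * right here instead of being posted to the QP.
		 */
		if (!nvmf_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
			return nvmf_fail_nonready_command(&queue->ctrl->ctrl, rq);
		...
	}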

Note: For now, where queues are only used for CPU affinity, the choice 
to "not use it" may not be a big deal. But in the future, queues may 
be related to specific activities/features, and this choice can't be 
so arbitrary.

-- james

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-05 19:34 ` [PATCH] " James Smart
@ 2018-11-06 11:10   ` Max Gurtovoy
  2018-11-07  9:07     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Max Gurtovoy @ 2018-11-06 11:10 UTC (permalink / raw)



On 11/5/2018 9:34 PM, James Smart wrote:
> On 11/4/2018 8:37 AM, Israel Rukshin wrote:
>> From: Sagi Grimberg <sagi at grimberg.me>
>>
>> blk-mq tells us to skip unmapped queues but we fail the controller 
>> altogether.
>>
>> Signed-off-by: Israel Rukshin <israelr at mellanox.com>
>> Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/rdma.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>
> This sounds odd. Why aren't you concerned that I/O queues are not 
> connecting? Are there any log messages hinting at the failures? Is 
> there any way someone looking at the controller can know how many 
> queues were actually created? I would assume any failure is significant 
> and should be visible, and it's worthwhile knowing whether this is a 
> consistent failure or a random one, and what the failure was.

This may happen (well, it happened in the past and was fixed in the 
block layer) in case there are offline CPUs or some other reason that 
some queue is unmapped.

I prefer not to rely on the block layer to guarantee us a 100% mapping 
and would rather be safe in our ULP.

>
> As far as continuing on in spite of failure - ok, but the target has 
> already been told, via SET FEATURES, to prepare (and thus may allocate 
> resources) for X number of I/O queues, so minimally it's a condition 
> where the target may be severely wasting resources, which may affect 
> its scale-out ability. I assume the transport should be resetting the 
> controller and self-tuning the max queue count so that it is reduced 
> to what is being used. Granted, this assumes the error was consistent 
> and for the same reason. If it was an inconsistent error, it's ok to 
> continue without resetting, but the transport should be very verbose 
> about the state it's in and why.

You can create I/O queues (connect to I/O queues in the fabrics case) 
according to your needs. I don't think the spec requires creating all 
the queues that the target/drive supports. For scale-out we have 
suggested other solutions (setting max_queues according to target 
resources, SRQ per core, etc.).



>
> It appears the patch doesn't save the connected queue count, which 
> means the blk-mq nr_hw_queues is not in sync with the new count. What 
> happens on those cpu's that select a hctx that is on a queue that 
> isn't connected ??? I assume the Q_LIVE bit has the queue_rq routine 
> continually bouncing them until they fail ?
>
We only allow failures if we get an -EXDEV return code. Usually, in 
case of a transport error, none of the queues will be able to connect. 
Here we protect against "in-house" recoverable errors such as offline 
CPUs or others.

I guess Sagi can comment; it was originally proposed by him.


> Note: For now, where queues are only used for CPU affinity, the choice 
> to "not use it" may not be a big deal. But in the future, queues may 
> be related to specific activities/features, and this choice can't be 
> so arbitrary.

In the future, we'll update the code to suit new features.


>
> -- james
>
>

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-06 11:10   ` Max Gurtovoy
@ 2018-11-07  9:07     ` Christoph Hellwig
  2018-11-07 18:31       ` Sagi Grimberg
  2018-11-08  8:20       ` Max Gurtovoy
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2018-11-07  9:07 UTC (permalink / raw)


On Tue, Nov 06, 2018 at 01:10:27PM +0200, Max Gurtovoy wrote:
>> This sounds odd. Why aren't you concerned that I/O queues are not 
>> connecting? Are there any log messages hinting at the failures? Is 
>> there any way someone looking at the controller can know how many 
>> queues were actually created? I would assume any failure is significant 
>> and should be visible, and it's worthwhile knowing whether this is a 
>> consistent failure or a random one, and what the failure was.
>
> This may happen (well, it happened in the past and was fixed in the 
> block layer) in case there are offline CPUs or some other reason that 
> some queue is unmapped.
>
> I prefer not to rely on the block layer to guarantee us a 100% mapping 
> and would rather be safe in our ULP.

How do we ensure any potential new block layer bug returns
-EXDEV so that your handling kicks in?

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-07  9:07     ` Christoph Hellwig
@ 2018-11-07 18:31       ` Sagi Grimberg
  2018-11-08  8:20       ` Max Gurtovoy
  1 sibling, 0 replies; 13+ messages in thread
From: Sagi Grimberg @ 2018-11-07 18:31 UTC (permalink / raw)



>>> This sounds odd. Why aren't you concerned that I/O queues are not
>>> connecting? Are there any log messages hinting at the failures? Is
>>> there any way someone looking at the controller can know how many
>>> queues were actually created? I would assume any failure is significant
>>> and should be visible, and it's worthwhile knowing whether this is a
>>> consistent failure or a random one, and what the failure was.
>>
>> This may happen (well, it happened in the past and was fixed in the
>> block layer) in case there are offline CPUs or some other reason that
>> some queue is unmapped.
>>
>> I prefer not to rely on the block layer to guarantee us a 100% mapping
>> and would rather be safe in our ULP.
> 
> How do we ensure any potential new block layer bug returns
> -EXDEV so that your handling kicks in?

We can't. I guess that we simply need to warn about it but still allow
degraded operation. But the motivation here was to sustain a failure when
attempting to connect an unmapped queue.
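
To make the "warn but still allow degraded operation" idea concrete, a
hypothetical variant of the loop in the posted patch could skip any
start failure rather than only -EXDEV; an illustrative sketch (not
what the posted patch does):

	for (i = 1; i < ctrl->ctrl.queue_count; i++) {
		ret = nvme_rdma_start_queue(ctrl, i);
		if (ret) {
			/* be loud about it, but keep whatever does connect */
			dev_warn(ctrl->ctrl.device,
				"failed to connect queue %d (error %d), skipping\n",
				i, ret);
			nvme_rdma_free_queue(&ctrl->queues[i]);
			continue;
		}
		count++;
	}

	if (!count)
		/* nothing connected at all, fail the controller */
		return -EIO;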

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-07  9:07     ` Christoph Hellwig
  2018-11-07 18:31       ` Sagi Grimberg
@ 2018-11-08  8:20       ` Max Gurtovoy
  2018-11-08  8:37         ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Max Gurtovoy @ 2018-11-08  8:20 UTC (permalink / raw)



On 11/7/2018 11:07 AM, Christoph Hellwig wrote:
> On Tue, Nov 06, 2018 at 01:10:27PM +0200, Max Gurtovoy wrote:
>>> This sounds odd. Why aren't you concerned that I/O queues are not
>>> connecting? Are there any log messages hinting at the failures? Is
>>> there any way someone looking at the controller can know how many
>>> queues were actually created? I would assume any failure is significant
>>> and should be visible, and it's worthwhile knowing whether this is a
>>> consistent failure or a random one, and what the failure was.
>> This may happen (well, it happened in the past and was fixed in the
>> block layer) in case there are offline CPUs or some other reason that
>> some queue is unmapped.
>>
>> I prefer not to rely on the block layer to guarantee us a 100% mapping
>> and would rather be safe in our ULP.
> How do we ensure any potential new block layer bug returns
> -EXDEV so that your handling kicks in?

Well, we can't ensure that. Are you suggesting we do this handling for 
every failure?

I guess we'll need this anyway for older kernels that don't have the 
fix for offline CPU mapping.

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-08  8:20       ` Max Gurtovoy
@ 2018-11-08  8:37         ` Christoph Hellwig
  2018-11-08 14:46           ` Max Gurtovoy
  2018-11-09  0:00           ` Sagi Grimberg
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2018-11-08  8:37 UTC (permalink / raw)


On Thu, Nov 08, 2018 at 10:20:00AM +0200, Max Gurtovoy wrote:
>
> On 11/7/2018 11:07 AM, Christoph Hellwig wrote:
>> On Tue, Nov 06, 2018 at 01:10:27PM +0200, Max Gurtovoy wrote:
>>>> This sounds odd. Why aren't you concerned that I/O queues are not
>>>> connecting? Are there any log messages hinting at the failures? Is
>>>> there any way someone looking at the controller can know how many
>>>> queues were actually created? I would assume any failure is significant
>>>> and should be visible, and it's worthwhile knowing whether this is a
>>>> consistent failure or a random one, and what the failure was.
>>> This may happen (well, it happened in the past and was fixed in the
>>> block layer) in case there are offline CPUs or some other reason that
>>> some queue is unmapped.
>>>
>>> I prefer not to rely on the block layer to guarantee us a 100% mapping
>>> and would rather be safe in our ULP.
>> How do we ensure any potential new block layer bug returns
>> -EXDEV so that your handling kicks in?
>
> Well, we can't ensure that. Are you suggesting we do this handling for 
> every failure?
>
> I guess we'll need this anyway for older kernels that don't have the 
> fix for offline CPU mapping.

No, I'm arguing that adding this just-in-case code, which doesn't
have a good way to actually catch a typical bug, is not very useful.

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-08  8:37         ` Christoph Hellwig
@ 2018-11-08 14:46           ` Max Gurtovoy
  2018-11-08 14:58             ` Christoph Hellwig
  2018-11-09  0:00           ` Sagi Grimberg
  1 sibling, 1 reply; 13+ messages in thread
From: Max Gurtovoy @ 2018-11-08 14:46 UTC (permalink / raw)



On 11/8/2018 10:37 AM, Christoph Hellwig wrote:
> On Thu, Nov 08, 2018 at 10:20:00AM +0200, Max Gurtovoy wrote:
>> On 11/7/2018 11:07 AM, Christoph Hellwig wrote:
>>> On Tue, Nov 06, 2018 at 01:10:27PM +0200, Max Gurtovoy wrote:
>>>>> This sounds odd. Why aren't you concerned that I/O queues are not
>>>>> connecting? Are there any log messages hinting at the failures? Is
>>>>> there any way someone looking at the controller can know how many
>>>>> queues were actually created? I would assume any failure is significant
>>>>> and should be visible, and it's worthwhile knowing whether this is a
>>>>> consistent failure or a random one, and what the failure was.
>>>> This may happen (well, it happened in the past and was fixed in the
>>>> block layer) in case there are offline CPUs or some other reason that
>>>> some queue is unmapped.
>>>>
>>>> I prefer not to rely on the block layer to guarantee us a 100% mapping
>>>> and would rather be safe in our ULP.
>>> How do we ensure any potential new block layer bug returns
>>> -EXDEV so that your handling kicks in?
>> Well, we can't ensure that. Are you suggesting we do this handling for
>> every failure?
>>
>> I guess we'll need this anyway for older kernels that don't have the
>> fix for offline CPU mapping.
> No, I'm arguing that adding this just-in-case code, which doesn't
> have a good way to actually catch a typical bug, is not very useful.

Well, we can continue with the all-or-nothing approach. But I'm worried 
about older kernels and distros that may not have all the needed fixes 
in the block layer regarding offline CPU mappings.

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-08 14:46           ` Max Gurtovoy
@ 2018-11-08 14:58             ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2018-11-08 14:58 UTC (permalink / raw)


On Thu, Nov 08, 2018 at 04:46:52PM +0200, Max Gurtovoy wrote:
> Well, we can continue with the all-or-nothing approach. But I'm worried 
> about older kernels and distros that may not have all the needed fixes 
> in the block layer regarding offline CPU mappings.

Well, feel free to keep your patch around for backports then.

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-08  8:37         ` Christoph Hellwig
  2018-11-08 14:46           ` Max Gurtovoy
@ 2018-11-09  0:00           ` Sagi Grimberg
  2018-11-09 18:55             ` James Smart
  1 sibling, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2018-11-09  0:00 UTC (permalink / raw)



> No, I'm arguing that adding this just-in-case code, which doesn't
> have a good way to actually catch a typical bug, is not very useful.

Regardless of the specific error code, the question is: do we want
to be able to continue in degraded operation if we happen to fail
queue allocation? A different error could be that we failed to connect
because the target is overloaded...

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-09  0:00           ` Sagi Grimberg
@ 2018-11-09 18:55             ` James Smart
  2018-11-09 21:52               ` Sagi Grimberg
  0 siblings, 1 reply; 13+ messages in thread
From: James Smart @ 2018-11-09 18:55 UTC (permalink / raw)




On 11/8/2018 4:00 PM, Sagi Grimberg wrote:
>
>> No, I'm arguing that adding this just-in-case code, which doesn't
>> have a good way to actually catch a typical bug, is not very useful.
>
> Regardless of the specific error code, the question is: do we want
> to be able to continue in degraded operation if we happen to fail
> queue allocation? A different error could be that we failed to connect
> because the target is overloaded...
>

That shouldn't have happened if you have already done a SET_FEATURES to 
set the I/O queue count.

-- james

* [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect
  2018-11-09 18:55             ` James Smart
@ 2018-11-09 21:52               ` Sagi Grimberg
  0 siblings, 0 replies; 13+ messages in thread
From: Sagi Grimberg @ 2018-11-09 21:52 UTC (permalink / raw)



>>> No, I'm arguing that adding this just-in-case code, which doesn't
>>> have a good way to actually catch a typical bug, is not very useful.
>>
>> Regardless of the specific error code, the question is: do we want
>> to be able to continue in degraded operation if we happen to fail
>> queue allocation? A different error could be that we failed to connect
>> because the target is overloaded...
>>
> 
> That shouldn't have happened if you have already done a SET_FEATURES to 
> set the io queue count.

Shouldn't but could... One can also theoretically exceed the number of
queues supported by its local HCA. The question still holds...

Thread overview: 13+ messages
2018-11-04 16:37 [PATCH] nvme-rdma: Don't fail the controller if only part of the queues fail to connect Israel Rukshin
2018-11-04 17:09 ` [Suspected-Phishing][PATCH] " Max Gurtovoy
2018-11-05 19:34 ` [PATCH] " James Smart
2018-11-06 11:10   ` Max Gurtovoy
2018-11-07  9:07     ` Christoph Hellwig
2018-11-07 18:31       ` Sagi Grimberg
2018-11-08  8:20       ` Max Gurtovoy
2018-11-08  8:37         ` Christoph Hellwig
2018-11-08 14:46           ` Max Gurtovoy
2018-11-08 14:58             ` Christoph Hellwig
2018-11-09  0:00           ` Sagi Grimberg
2018-11-09 18:55             ` James Smart
2018-11-09 21:52               ` Sagi Grimberg
