* [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Hannes Reinecke @ 2021-01-21  9:50 UTC
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, Hannes Reinecke

If the call to nvme_set_queue_count() fails with an NVMe status, we
should not ignore it but rather pass it on to the caller.
It's then up to the transport to decide whether to ignore it
(like PCI does) or to reset the connection (as would be appropriate
for fabrics).

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ce1b61519441..ddf32f5b4534 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1486,7 +1486,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 		*count = min(*count, nr_io_queues);
 	}
 
-	return 0;
+	return status;
 }
 EXPORT_SYMBOL_GPL(nvme_set_queue_count);
 
-- 
2.26.2
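
For context, the full function with this patch applied looks roughly
like this (paraphrased from drivers/nvme/host/core.c of that era;
simplified, not a verbatim copy):

int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
{
	u32 q_count = (*count - 1) | ((*count - 1) << 16);
	u32 result;
	int status, nr_io_queues;

	/* Set Features, Number of Queues: ask for *count I/O queues */
	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
			NULL, 0, &result);
	if (status < 0)
		return status;	/* transport error, always propagated */

	/*
	 * Degraded controllers might return an error when setting the
	 * queue count; still bring them online with 0 I/O queues so
	 * the admin queue remains usable.
	 */
	if (status > 0) {
		dev_err(ctrl->device, "Could not set queue count (%d)\n",
			status);
		*count = 0;
	} else {
		nr_io_queues = min(result & 0xffff, result >> 16) + 1;
		*count = min(*count, nr_io_queues);
	}

	return status;	/* was 'return 0': NVMe status no longer dropped */
}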



* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Chaitanya Kulkarni @ 2021-01-21 20:03 UTC
  To: Hannes Reinecke, Sagi Grimberg; +Cc: Keith Busch, Christoph Hellwig, linux-nvme

On 1/21/21 1:54 AM, Hannes Reinecke wrote:
> If the call to nvme_set_queue_count() fails with a status we should
> not ignore it but rather pass it on to the caller.
> It's then up to the transport to decide whether to ignore it
> (like PCI does) or to reset the connection (as would be appropriate
> for fabrics).
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ce1b61519441..ddf32f5b4534 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1486,7 +1486,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
>  		*count = min(*count, nr_io_queues);
>  	}
>  
> -	return 0;
> +	return status;
>  }
>  EXPORT_SYMBOL_GPL(nvme_set_queue_count);
>  
Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Keith Busch @ 2021-01-21 20:14 UTC
  To: Hannes Reinecke; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Christoph Hellwig

On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
> If the call to nvme_set_queue_count() fails with a status we should
> not ignore it but rather pass it on to the caller.
> It's then up to the transport to decide whether to ignore it
> (like PCI does) or to reset the connection (as would be appropriate
> for fabrics).

Instead of checking the error, wouldn't checking the number of created
queues be sufficient? What handling difference do you expect to occur
between getting a success with 0 queues, vs getting an error?


* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Hannes Reinecke @ 2021-01-22 16:35 UTC
  To: Keith Busch; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Christoph Hellwig

On 1/21/21 9:14 PM, Keith Busch wrote:
> On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
>> If the call to nvme_set_queue_count() fails with a status we should
>> not ignore it but rather pass it on to the caller.
>> It's then up to the transport to decide whether to ignore it
>> (like PCI does) or to reset the connection (as would be appropriate
>> for fabrics).
> 
> Instead of checking the error, wouldn't checking the number of created
> queues be sufficient? What handling difference do you expect to occur
> between getting a success with 0 queues, vs getting an error?
> 
The difference is that an error will (re-)start recovery, while 0
queues won't.
But the problem here is that nvme_set_queue_count() is being called
during reconnection, i.e. during the recovery process itself.
And this command is returned with a timeout, which in any other case
is treated as a fatal error. Plus we have been sending this command on
the admin queue, so a timeout on the admin queue pretty much _is_ a
fatal error. So we should be terminating the current recovery and
reconnecting. None of that will happen if we return '0' queues.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Keith Busch @ 2021-01-22 16:44 UTC
  To: Hannes Reinecke; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, Christoph Hellwig

On Fri, Jan 22, 2021 at 05:35:35PM +0100, Hannes Reinecke wrote:
> On 1/21/21 9:14 PM, Keith Busch wrote:
> > On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
> > > If the call to nvme_set_queue_count() fails with a status we should
> > > not ignore it but rather pass it on to the caller.
> > > It's then up to the transport to decide whether to ignore it
> > > (like PCI does) or to reset the connection (as would be appropriate
> > > for fabrics).
> > 
> > Instead of checking the error, wouldn't checking the number of created
> > queues be sufficient? What handling difference do you expect to occur
> > between getting a success with 0 queues, vs getting an error?
> > 
> The difference is that an error will (re-)start recovery, 0 queues won't.
> But the problem here is that nvme_set_queue_count() is being called during
> reconnection, ie during the recovery process itself.
> And this command is returned with a timeout, which in any other case is
> being treated as a fatal error. Plus we have been sending this command on
> the admin queue, so a timeout on the admin queue pretty much _is_  a fatal
> error. So we should be terminating the current recovery and reconnect. None
> of that will happen if we return '0' queues.

You should already be getting an error return status if a timeout occurs
for nvme_set_queue_count(), specifically -EINTR. Are you getting success
for some reason?


* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Hannes Reinecke @ 2021-01-26 15:25 UTC
  To: Keith Busch; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Christoph Hellwig

On 1/22/21 5:44 PM, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 05:35:35PM +0100, Hannes Reinecke wrote:
>> On 1/21/21 9:14 PM, Keith Busch wrote:
>>> On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
>>>> If the call to nvme_set_queue_count() fails with a status we should
>>>> not ignore it but rather pass it on to the caller.
>>>> It's then up to the transport to decide whether to ignore it
>>>> (like PCI does) or to reset the connection (as would be appropriate
>>>> for fabrics).
>>>
>>> Instead of checking the error, wouldn't checking the number of created
>>> queues be sufficient? What handling difference do you expect to occur
>>> between getting a success with 0 queues, vs getting an error?
>>>
>> The difference is that an error will (re-)start recovery, 0 queues won't.
>> But the problem here is that nvme_set_queue_count() is being called during
>> reconnection, ie during the recovery process itself.
>> And this command is returned with a timeout, which in any other case is
>> being treated as a fatal error. Plus we have been sending this command on
>> the admin queue, so a timeout on the admin queue pretty much _is_  a fatal
>> error. So we should be terminating the current recovery and reconnect. None
>> of that will happen if we return '0' queues.
> 
> You should already be getting an error return status if a timeout occurs
> for nvme_set_queue_count(), specifically -EINTR. Are you getting success
> for some reason?
> 
-EINTR (which corresponds to 'nvme_req(req)->flags & NVME_REQ_CANCELLED'
being set) will only ever be returned on PCI; the fabrics transports
don't set this flag, so we're never getting an -EINTR.
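
For reference, the -EINTR translation in __nvme_submit_sync_cmd()
looks roughly like this (paraphrased from core.c of that era):

	blk_execute_rq(req->q, NULL, req, at_head);
	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
		ret = -EINTR;	/* only the PCI timeout handler sets this */
	else
		ret = nvme_req(req)->status;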

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()
From: Keith Busch @ 2021-01-26 19:06 UTC
  To: Hannes Reinecke; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Christoph Hellwig

On Tue, Jan 26, 2021 at 04:25:11PM +0100, Hannes Reinecke wrote:
> On 1/22/21 5:44 PM, Keith Busch wrote:
> > On Fri, Jan 22, 2021 at 05:35:35PM +0100, Hannes Reinecke wrote:
> > > On 1/21/21 9:14 PM, Keith Busch wrote:
> > > > On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
> > > > > If the call to nvme_set_queue_count() fails with a status we should
> > > > > not ignore it but rather pass it on to the caller.
> > > > > It's then up to the transport to decide whether to ignore it
> > > > > (like PCI does) or to reset the connection (as would be appropriate
> > > > > for fabrics).
> > > > 
> > > > Instead of checking the error, wouldn't checking the number of created
> > > > queues be sufficient? What handling difference do you expect to occur
> > > > between getting a success with 0 queues, vs getting an error?
> > > > 
> > > The difference is that an error will (re-)start recovery, 0 queues won't.
> > > But the problem here is that nvme_set_queue_count() is being called during
> > > reconnection, ie during the recovery process itself.
> > > And this command is returned with a timeout, which in any other case is
> > > being treated as a fatal error. Plus we have been sending this command on
> > > the admin queue, so a timeout on the admin queue pretty much _is_  a fatal
> > > error. So we should be terminating the current recovery and reconnect. None
> > > of that will happen if we return '0' queues.
> > 
> > You should already be getting an error return status if a timeout occurs
> > for nvme_set_queue_count(), specifically -EINTR. Are you getting success
> > for some reason?
> > 
> -EINTR (which translates to 'nvme_req(req)->flags & NVME_REQ_CANCELLED')
> will only ever be returned on pci; fabrics doesn't set this flag, so we're
> never getting an -EINTR.

Sounds like that's the problem that needs to be fixed.
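I.e., have the fabrics timeout handlers mark the timed-out request as
cancelled, the same way the PCI timeout handler does. A hypothetical
sketch (demo_complete_timed_out is a made-up name, not a tested patch):

static void demo_complete_timed_out(struct request *rq)
{
	/*
	 * Mirror what nvme_timeout() does on PCI: flag the request so
	 * that __nvme_submit_sync_cmd() returns -EINTR instead of
	 * success.
	 */
	nvme_req(rq)->flags |= NVME_REQ_CANCELLED;
	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {
		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
		blk_mq_complete_request(rq);
	}
}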

