linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme-rdma: fix crash for no IO queues
@ 2021-02-23  7:26 Chao Leng
  2021-02-23 22:03 ` Chaitanya Kulkarni
  2021-02-23 23:21 ` Keith Busch
  0 siblings, 2 replies; 15+ messages in thread
From: Chao Leng @ 2021-02-23  7:26 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, hch, sagi

A crash happens when a Set Features (NVME_FEAT_NUM_QUEUES) command times
out during an nvme over rdma (RoCE) reconnection: the reconnect is then
treated as successful even though no I/O queues exist, and subsequent
requests are dispatched to queues that were never allocated.

If the controller is not a discovery controller and has no I/O queues,
the connection should fail.

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/rdma.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 53ac4d7442ba..7410972353ae 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -736,8 +736,11 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
 		return ret;
 
 	ctrl->ctrl.queue_count = nr_io_queues + 1;
-	if (ctrl->ctrl.queue_count < 2)
-		return 0;
+	if (ctrl->ctrl.queue_count < 2) {
+		if (opts->discovery_nqn)
+			return 0;
+		return -EAGAIN;
+	}
 
 	dev_info(ctrl->ctrl.device,
 		"creating %d I/O queues.\n", nr_io_queues);
-- 
2.16.4
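
For context, the error return matters because the reconnect worker treats
any nonzero return from the setup path as a failed attempt. Below is a
simplified sketch of that worker, not a verbatim copy of the driver;
sketch_reconnect_ctrl_work is an illustrative name, while
nvme_rdma_setup_ctrl() and nvme_rdma_reconnect_or_remove() are the
existing driver functions as I recall them:

static void sketch_reconnect_ctrl_work(struct work_struct *work)
{
        struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
                        struct nvme_rdma_ctrl, reconnect_work);

        ++ctrl->ctrl.nr_reconnects;

        /*
         * With the change above, nvme_rdma_alloc_io_queues() returning
         * -EAGAIN propagates up through nvme_rdma_setup_ctrl(), so the
         * attempt is retried (or the controller is removed once
         * max_reconnects is exhausted) instead of the controller going
         * live without I/O queues.
         */
        if (nvme_rdma_setup_ctrl(ctrl, false))
                goto requeue;

        ctrl->ctrl.nr_reconnects = 0;
        return;

requeue:
        dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
                        ctrl->ctrl.nr_reconnects);
        nvme_rdma_reconnect_or_remove(ctrl);
}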



* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-23  7:26 [PATCH] nvme-rdma: fix crash for no IO queues Chao Leng
@ 2021-02-23 22:03 ` Chaitanya Kulkarni
  2021-02-24  5:52   ` Chao Leng
  2021-02-23 23:21 ` Keith Busch
  1 sibling, 1 reply; 15+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-23 22:03 UTC (permalink / raw)
  To: Chao Leng, linux-nvme; +Cc: kbusch, axboe, hch, sagi

On 2/22/21 23:30, Chao Leng wrote:
> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
> over rdma(roce) reconnection, the reason is use the queue which is not
> alloced.
>
> If it is not discovery and no io queues, the connection should fail.
>
> Signed-off-by: Chao Leng <lengchao@huawei.com>

Can you please share more information about the
"set feature (NVME_FEAT_NUM_QUEUES) timeout" scenario?



* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-23  7:26 [PATCH] nvme-rdma: fix crash for no IO queues Chao Leng
  2021-02-23 22:03 ` Chaitanya Kulkarni
@ 2021-02-23 23:21 ` Keith Busch
  2021-02-24  5:59   ` Chao Leng
  1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2021-02-23 23:21 UTC (permalink / raw)
  To: Chao Leng; +Cc: axboe, hch, linux-nvme, sagi

On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
> over rdma(roce) reconnection, the reason is use the queue which is not
> alloced.
> 
> If it is not discovery and no io queues, the connection should fail.

If you're getting a timeout, we need to quit initialization. Hannes
attempted making that status visible for fabrics here:

  http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html

There are still some corner cases that need handling, though.


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-23 22:03 ` Chaitanya Kulkarni
@ 2021-02-24  5:52   ` Chao Leng
  0 siblings, 0 replies; 15+ messages in thread
From: Chao Leng @ 2021-02-24  5:52 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: kbusch, axboe, hch, sagi



On 2021/2/24 6:03, Chaitanya Kulkarni wrote:
> On 2/22/21 23:30, Chao Leng wrote:
>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>> over rdma(roce) reconnection, the reason is use the queue which is not
>> alloced.
>>
>> If it is not discovery and no io queues, the connection should fail.
>>
>> Signed-off-by: Chao Leng <lengchao@huawei.com>
> 
> Can you please share more information about
> 
> "when set feature(NVME_FEAT_NUM_QUEUES) timeout" scenario ?
Inject a large number of bit errors intermittently. This causes requests
to time out, and error recovery then triggers a reconnection. The admin
commands issued during the reconnect may time out as well. If the
Set Features (NVME_FEAT_NUM_QUEUES) command sent by nvme_set_queue_count()
times out, the reconnection can still complete "successfully", but with
no I/O queues. The block layer then keeps dispatching requests, which
crash because they are sent to a queue that was never allocated. A rough
sketch of that window is below.
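
For illustration, here is a condensed sketch of the core-driver behaviour
that makes this possible. It is my reading of the code at the time, not a
literal copy, and sketch_set_queue_count is an illustrative name; a
timed-out fabrics admin command completes with a positive NVMe status,
which nvme_set_queue_count() deliberately swallows so that degraded
controllers can still expose their admin queue, so the caller sees
success with zero queues:

static int sketch_set_queue_count(struct nvme_ctrl *ctrl, int *count)
{
        u32 q_count = (*count - 1) | ((*count - 1) << 16);
        u32 result;
        int status, nr_io_queues;

        status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
                        NULL, 0, &result);
        if (status < 0)
                return status;          /* transport error: propagated */

        /*
         * A positive status (e.g. a command aborted by timeout/error
         * recovery) is treated as a "degraded controller": the count is
         * forced to 0 but the call still returns success, so the
         * reconnect carries on with queue_count == 1.
         */
        if (status > 0) {
                dev_err(ctrl->device, "Could not set queue count (%d)\n",
                        status);
                *count = 0;
        } else {
                nr_io_queues = min(result & 0xffff, result >> 16) + 1;
                *count = min(*count, nr_io_queues);
        }
        return 0;
}

With this patch, nvme_rdma_alloc_io_queues() turns that zero-queue
"success" into -EAGAIN for non-discovery controllers instead of letting
the reconnect complete with only the admin queue.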



* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-23 23:21 ` Keith Busch
@ 2021-02-24  5:59   ` Chao Leng
  2021-02-27  9:12     ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-02-24  5:59 UTC (permalink / raw)
  To: Keith Busch; +Cc: axboe, hch, linux-nvme, sagi



On 2021/2/24 7:21, Keith Busch wrote:
> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>> over rdma(roce) reconnection, the reason is use the queue which is not
>> alloced.
>>
>> If it is not discovery and no io queues, the connection should fail.
> 
> If you're getting a timeout, we need to quit initialization. Hannes
> attempted making that status visible for fabrics here:
> 
>    http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
I know that patch. It does not cover this scenario: the target may be an
attacker, or the target may simply misbehave.
If the target returns 0 I/O queues or some other error code, the crash
will still happen. We should not allow that.
> 
> There's still some corner cases that need handling, though.
> .
> 


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-24  5:59   ` Chao Leng
@ 2021-02-27  9:12     ` Hannes Reinecke
  2021-02-27  9:30       ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2021-02-27  9:12 UTC (permalink / raw)
  To: linux-nvme

On 2/24/21 6:59 AM, Chao Leng wrote:
> 
> 
> On 2021/2/24 7:21, Keith Busch wrote:
>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>>> over rdma(roce) reconnection, the reason is use the queue which is not
>>> alloced.
>>>
>>> If it is not discovery and no io queues, the connection should fail.
>>
>> If you're getting a timeout, we need to quit initialization. Hannes
>> attempted making that status visible for fabrics here:
>>
>>    
>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
> I know the patch. It can not solve the scenario: target may be an
> attacker or the target behavior is incorrect.
> If target return 0 io queues or return other error code, the crash will
> still happen. We should not allow this to happen.
I'm fully with you that we shouldn't crash, but at the same time a value 
of '0' for the number of I/O queues is considered valid.
So we should fix the code to handle this scenario, not disallow zero
I/O queues.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-27  9:12     ` Hannes Reinecke
@ 2021-02-27  9:30       ` Chao Leng
  2021-03-02  7:48         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-02-27  9:30 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme



On 2021/2/27 17:12, Hannes Reinecke wrote:
> On 2/24/21 6:59 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/24 7:21, Keith Busch wrote:
>>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>>>> over rdma(roce) reconnection, the reason is use the queue which is not
>>>> alloced.
>>>>
>>>> If it is not discovery and no io queues, the connection should fail.
>>>
>>> If you're getting a timeout, we need to quit initialization. Hannes
>>> attempted making that status visible for fabrics here:
>>>
>>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
>> I know the patch. It can not solve the scenario: target may be an
>> attacker or the target behavior is incorrect.
>> If target return 0 io queues or return other error code, the crash will
>> still happen. We should not allow this to happen.
> I'm fully with you that we shouldn't crash, but at the same time a value of '0' for the number of I/O queues is considered valid.
> So we should fix the code to handle this scenario, and not disallowing zero I/O queues.
'0' I/O queues doesn't make any sense for nvme over fabrics; it is
different from nvme over pci. If there is a bug in the target, we can
debug it on the target side rather than through the admin queue on the
host.
The target may be an attacker, or it may simply misbehave, so we must
avoid the crash. Another option would be to prohibit request delivery
when the I/O queues were not created.
I still think failing the connection when there are '0' I/O queues is
the better choice.
> 
> Cheers,
> 
> Hannes


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-02-27  9:30       ` Chao Leng
@ 2021-03-02  7:48         ` Hannes Reinecke
  2021-03-02  9:49           ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2021-03-02  7:48 UTC (permalink / raw)
  To: Chao Leng, linux-nvme

On 2/27/21 10:30 AM, Chao Leng wrote:
> 
> 
> On 2021/2/27 17:12, Hannes Reinecke wrote:
>> On 2/24/21 6:59 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/2/24 7:21, Keith Busch wrote:
>>>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>>>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>>>>> over rdma(roce) reconnection, the reason is use the queue which is not
>>>>> alloced.
>>>>>
>>>>> If it is not discovery and no io queues, the connection should fail.
>>>>
>>>> If you're getting a timeout, we need to quit initialization. Hannes
>>>> attempted making that status visible for fabrics here:
>>>>
>>>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
>>>>
>>> I know the patch. It can not solve the scenario: target may be an
>>> attacker or the target behavior is incorrect.
>>> If target return 0 io queues or return other error code, the crash will
>>> still happen. We should not allow this to happen.
>> I'm fully with you that we shouldn't crash, but at the same time a
>> value of '0' for the number of I/O queues is considered valid.
>> So we should fix the code to handle this scenario, and not disallowing
>> zero I/O queues.
> '0' I/O queues doesn't make any sense to nvme over fabrics, it is
> different with nvme over pci. If there is some bug with target, we can
> debug it in target instead of use admin queue in host.
> target may be an attacker or the target behavior is incorrect. So we
> should avoid crash. Another option: prohibit  request delivery if
> io queue do not created.
> I think failed connection with '0' I/O queues is a better choice.

Might be, but that's not for me to decide.
I tried that initially, but that patch got rejected as _technically_ the
controller is reachable via its admin queue.
Sagi? Christoph?
Are controllers with 0 I/O queues valid or is this an error condition?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-02  7:48         ` Hannes Reinecke
@ 2021-03-02  9:49           ` Chao Leng
  2021-03-02 18:24             ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-03-02  9:49 UTC (permalink / raw)
  To: Hannes Reinecke, linux-nvme



On 2021/3/2 15:48, Hannes Reinecke wrote:
> On 2/27/21 10:30 AM, Chao Leng wrote:
>>
>>
>> On 2021/2/27 17:12, Hannes Reinecke wrote:
>>> On 2/24/21 6:59 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/2/24 7:21, Keith Busch wrote:
>>>>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>>>>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>>>>>> over rdma(roce) reconnection, the reason is use the queue which is not
>>>>>> alloced.
>>>>>>
>>>>>> If it is not discovery and no io queues, the connection should fail.
>>>>>
>>>>> If you're getting a timeout, we need to quit initialization. Hannes
>>>>> attempted making that status visible for fabrics here:
>>>>>
>>>>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
>>>>>
>>>> I know the patch. It can not solve the scenario: target may be an
>>>> attacker or the target behavior is incorrect.
>>>> If target return 0 io queues or return other error code, the crash will
>>>> still happen. We should not allow this to happen.
>>> I'm fully with you that we shouldn't crash, but at the same time a
>>> value of '0' for the number of I/O queues is considered valid.
>>> So we should fix the code to handle this scenario, and not disallowing
>>> zero I/O queues.
>> '0' I/O queues doesn't make any sense to nvme over fabrics, it is
>> different with nvme over pci. If there is some bug with target, we can
>> debug it in target instead of use admin queue in host.
>> target may be an attacker or the target behavior is incorrect. So we
>> should avoid crash. Another option: prohibit  request delivery if
>> io queue do not created.
>> I think failed connection with '0' I/O queues is a better choice.
> 
> Might be, but that's not for me to decide.
> I tried that initially, but that patch got rejected as _technically_ the
> controller is reachable via its admin queue.
I know about your patch. That patch failed the connection for all
transports, which is not good for the pcie transport: there the
controller can still accept admin commands to gather diagnostics
(perhaps an error log page). That was Keith's point.
> Sagi? Christoph?
> Are controllers with 0 I/O queues valid or is this an error condition?
> 
> Cheers,
> 
> Hannes
> 


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-02  9:49           ` Chao Leng
@ 2021-03-02 18:24             ` Keith Busch
  2021-03-03  2:27               ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2021-03-02 18:24 UTC (permalink / raw)
  To: Chao Leng; +Cc: Hannes Reinecke, linux-nvme

On Tue, Mar 02, 2021 at 05:49:05PM +0800, Chao Leng wrote:
> 
> 
> On 2021/3/2 15:48, Hannes Reinecke wrote:
> > On 2/27/21 10:30 AM, Chao Leng wrote:
> > > 
> > > 
> > > On 2021/2/27 17:12, Hannes Reinecke wrote:
> > > > On 2/24/21 6:59 AM, Chao Leng wrote:
> > > > > 
> > > > > 
> > > > > On 2021/2/24 7:21, Keith Busch wrote:
> > > > > > On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
> > > > > > > A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
> > > > > > > over rdma(roce) reconnection, the reason is use the queue which is not
> > > > > > > alloced.
> > > > > > > 
> > > > > > > If it is not discovery and no io queues, the connection should fail.
> > > > > > 
> > > > > > If you're getting a timeout, we need to quit initialization. Hannes
> > > > > > attempted making that status visible for fabrics here:
> > > > > > 
> > > > > > http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
> > > > > > 
> > > > > I know the patch. It can not solve the scenario: target may be an
> > > > > attacker or the target behavior is incorrect.
> > > > > If target return 0 io queues or return other error code, the crash will
> > > > > still happen. We should not allow this to happen.
> > > > I'm fully with you that we shouldn't crash, but at the same time a
> > > > value of '0' for the number of I/O queues is considered valid.
> > > > So we should fix the code to handle this scenario, and not disallowing
> > > > zero I/O queues.
> > > '0' I/O queues doesn't make any sense to nvme over fabrics, it is
> > > different with nvme over pci. If there is some bug with target, we can
> > > debug it in target instead of use admin queue in host.
> > > target may be an attacker or the target behavior is incorrect. So we
> > > should avoid crash. Another option: prohibit  request delivery if
> > > io queue do not created.
> > > I think failed connection with '0' I/O queues is a better choice.
> > 
> > Might be, but that's not for me to decide.
> > I tried that initially, but that patch got rejected as _technically_ the
> > controller is reachable via its admin queue.
> I know about your patch. That patch failed connection for all transports.
> It is not good for pcie transport, the controller can accept admin
> commands to get some diagnostics (perhaps an error log page), this is
> keith's thoughts.

We can continue to administrate a controller that didn't create IO
queues, but the controller must provide a response to all commands. If
it doesn't, the controller will either be reset or abandoned. This
should be the same behavior for any transport, though; there's nothing
special about PCIe for that.


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-02 18:24             ` Keith Busch
@ 2021-03-03  2:27               ` Chao Leng
  2021-03-03  3:14                 ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-03-03  2:27 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hannes Reinecke, linux-nvme



On 2021/3/3 2:24, Keith Busch wrote:
> On Tue, Mar 02, 2021 at 05:49:05PM +0800, Chao Leng wrote:
>>
>>
>> On 2021/3/2 15:48, Hannes Reinecke wrote:
>>> On 2/27/21 10:30 AM, Chao Leng wrote:
>>>>
>>>>
>>>> On 2021/2/27 17:12, Hannes Reinecke wrote:
>>>>> On 2/24/21 6:59 AM, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2021/2/24 7:21, Keith Busch wrote:
>>>>>>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>>>>>>> A crash happens when set feature(NVME_FEAT_NUM_QUEUES) timeout in nvme
>>>>>>>> over rdma(roce) reconnection, the reason is use the queue which is not
>>>>>>>> alloced.
>>>>>>>>
>>>>>>>> If it is not discovery and no io queues, the connection should fail.
>>>>>>>
>>>>>>> If you're getting a timeout, we need to quit initialization. Hannes
>>>>>>> attempted making that status visible for fabrics here:
>>>>>>>
>>>>>>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
>>>>>>>
>>>>>> I know the patch. It can not solve the scenario: target may be an
>>>>>> attacker or the target behavior is incorrect.
>>>>>> If target return 0 io queues or return other error code, the crash will
>>>>>> still happen. We should not allow this to happen.
>>>>> I'm fully with you that we shouldn't crash, but at the same time a
>>>>> value of '0' for the number of I/O queues is considered valid.
>>>>> So we should fix the code to handle this scenario, and not disallowing
>>>>> zero I/O queues.
>>>> '0' I/O queues doesn't make any sense to nvme over fabrics, it is
>>>> different with nvme over pci. If there is some bug with target, we can
>>>> debug it in target instead of use admin queue in host.
>>>> target may be an attacker or the target behavior is incorrect. So we
>>>> should avoid crash. Another option: prohibit  request delivery if
>>>> io queue do not created.
>>>> I think failed connection with '0' I/O queues is a better choice.
>>>
>>> Might be, but that's not for me to decide.
>>> I tried that initially, but that patch got rejected as _technically_ the
>>> controller is reachable via its admin queue.
>> I know about your patch. That patch failed connection for all transports.
>> It is not good for pcie transport, the controller can accept admin
>> commands to get some diagnostics (perhaps an error log page), this is
>> keith's thoughts.
> 
> We can continue to administrate a controller that didn't create IO
> queues, but the controller must provide a response to all commands. If
> it doesn't, the controller will either be reset or abandoned. This
> should be the same behavior for any transport, though; there's nothing
> special about PCIe for that.
Though I don't see a useful scenario for this in nvme over fabrics right
now, keeping the option open for the future may be the better choice.
I will send another patch that prohibits request delivery when the queue
is not live; a rough sketch of the idea is below.
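
As a rough illustration of that idea (a sketch only, not the actual
follow-up patch; sketch_queue_rq is an illustrative name, while
nvmf_check_ready(), nvmf_fail_nonready_command() and the NVME_RDMA_Q_LIVE
flag are the existing fabrics/rdma helpers as I recall them), the gate
sits in the rdma ->queue_rq path:

static blk_status_t sketch_queue_rq(struct blk_mq_hw_ctx *hctx,
                const struct blk_mq_queue_data *bd)
{
        struct nvme_rdma_queue *queue = hctx->driver_data;
        struct request *rq = bd->rq;
        bool queue_ready = test_bit(NVME_RDMA_Q_LIVE, &queue->flags);

        /*
         * A request arriving on a queue that was never (re)allocated is
         * failed or requeued here instead of being posted to a
         * non-existent QP.
         */
        if (!nvmf_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
                return nvmf_fail_nonready_command(&queue->ctrl->ctrl, rq);

        /* ... normal submission: map the data and post the send WR ... */
        return BLK_STS_OK;
}

nvme_rdma_queue_rq() already follows roughly this pattern, so the
follow-up is mostly about how the ready check treats a controller that
came up without any I/O queues.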

> .
> 


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-03  2:27               ` Chao Leng
@ 2021-03-03  3:14                 ` Keith Busch
  2021-03-03  3:39                   ` Chao Leng
  0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2021-03-03  3:14 UTC (permalink / raw)
  To: Chao Leng; +Cc: Hannes Reinecke, linux-nvme

On Wed, Mar 03, 2021 at 10:27:01AM +0800, Chao Leng wrote:
> On 2021/3/3 2:24, Keith Busch wrote:
> > We can continue to administrate a controller that didn't create IO
> > queues, but the controller must provide a response to all commands. If
> > it doesn't, the controller will either be reset or abandoned. This
> > should be the same behavior for any transport, though; there's nothing
> > special about PCIe for that.
> Though I don't see any useful scenarios for nvme over fabric now,
> Reserved for future possibilities may be a better choice.

The admin queue may be the only way for a user to retrieve useful
information on the malfunctioning controller. The telemetry log isn't
unique to PCIe.


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-03  3:14                 ` Keith Busch
@ 2021-03-03  3:39                   ` Chao Leng
  2021-03-03  7:41                     ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Chao Leng @ 2021-03-03  3:39 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hannes Reinecke, linux-nvme



On 2021/3/3 11:14, Keith Busch wrote:
> On Wed, Mar 03, 2021 at 10:27:01AM +0800, Chao Leng wrote:
>> On 2021/3/3 2:24, Keith Busch wrote:
>>> We can continue to administrate a controller that didn't create IO
>>> queues, but the controller must provide a response to all commands. If
>>> it doesn't, the controller will either be reset or abandoned. This
>>> should be the same behavior for any transport, though; there's nothing
>>> special about PCIe for that.
>> Though I don't see any useful scenarios for nvme over fabric now,
>> Reserved for future possibilities may be a better choice.
> 
> The admin queue may be the only way for a user to retrieve useful
> information on the malfunctioning controller. The telemetry log isn't
> unique to PCIe.
For nvme over fabrics, the user can also go to the target directly to
get that information.
> .
> 


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-03  3:39                   ` Chao Leng
@ 2021-03-03  7:41                     ` Hannes Reinecke
  2021-03-03 15:08                       ` Keith Busch
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2021-03-03  7:41 UTC (permalink / raw)
  To: Chao Leng, Keith Busch; +Cc: linux-nvme

On 3/3/21 4:39 AM, Chao Leng wrote:
> 
> 
> On 2021/3/3 11:14, Keith Busch wrote:
>> On Wed, Mar 03, 2021 at 10:27:01AM +0800, Chao Leng wrote:
>>> On 2021/3/3 2:24, Keith Busch wrote:
>>>> We can continue to administrate a controller that didn't create IO
>>>> queues, but the controller must provide a response to all commands. If
>>>> it doesn't, the controller will either be reset or abandoned. This
>>>> should be the same behavior for any transport, though; there's nothing
>>>> special about PCIe for that.
>>> Though I don't see any useful scenarios for nvme over fabric now,
>>> Reserved for future possibilities may be a better choice.
>>
>> The admin queue may be the only way for a user to retrieve useful
>> information on the malfunctioning controller. The telemetry log isn't
>> unique to PCIe.
> User can also directly visit target to get these information for
> nvme over fabrics.

Not necessarily. There is no requirement that the storage admin is
identical to the host admin; in fact, in larger installations both roles
are strictly separated.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH] nvme-rdma: fix crash for no IO queues
  2021-03-03  7:41                     ` Hannes Reinecke
@ 2021-03-03 15:08                       ` Keith Busch
  0 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2021-03-03 15:08 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Chao Leng, linux-nvme

On Wed, Mar 03, 2021 at 08:41:58AM +0100, Hannes Reinecke wrote:
> On 3/3/21 4:39 AM, Chao Leng wrote:
> > 
> > 
> > On 2021/3/3 11:14, Keith Busch wrote:
> >> On Wed, Mar 03, 2021 at 10:27:01AM +0800, Chao Leng wrote:
> >>> On 2021/3/3 2:24, Keith Busch wrote:
> >>>> We can continue to administrate a controller that didn't create IO
> >>>> queues, but the controller must provide a response to all commands. If
> >>>> it doesn't, the controller will either be reset or abandoned. This
> >>>> should be the same behavior for any transport, though; there's nothing
> >>>> special about PCIe for that.
> >>> Though I don't see any useful scenarios for nvme over fabric now,
> >>> Reserved for future possibilities may be a better choice.
> >>
> >> The admin queue may be the only way for a user to retrieve useful
> >> information on the malfunctioning controller. The telemetry log isn't
> >> unique to PCIe.
> > User can also directly visit target to get these information for
> > nvme over fabrics.
> 
> Not necessarily. There is no requirement that the storage admin is
> identical to the host admin; in fact, in larger installations both roles
> are strictly separated.

There's also no requirement that nvme targets provide anything but
in-band management capabilities, so there is no justification for
relying on a vendor-specific method.


end of thread, other threads:[~2021-03-03 22:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23  7:26 [PATCH] nvme-rdma: fix crash for no IO queues Chao Leng
2021-02-23 22:03 ` Chaitanya Kulkarni
2021-02-24  5:52   ` Chao Leng
2021-02-23 23:21 ` Keith Busch
2021-02-24  5:59   ` Chao Leng
2021-02-27  9:12     ` Hannes Reinecke
2021-02-27  9:30       ` Chao Leng
2021-03-02  7:48         ` Hannes Reinecke
2021-03-02  9:49           ` Chao Leng
2021-03-02 18:24             ` Keith Busch
2021-03-03  2:27               ` Chao Leng
2021-03-03  3:14                 ` Keith Busch
2021-03-03  3:39                   ` Chao Leng
2021-03-03  7:41                     ` Hannes Reinecke
2021-03-03 15:08                       ` Keith Busch
