From: Sagi Grimberg <sagi@grimberg.me>
To: Keith Busch <kbusch@kernel.org>, Chao Leng <lengchao@huawei.com>
Cc: linux-nvme@lists.infradead.org, axboe@fb.com, hch@lst.de
Subject: Re: [PATCH] nvme-fabrics: fix crash for no IO queues
Date: Mon, 15 Mar 2021 22:08:05 -0700
Message-ID: <21bc3b62-967c-6cb2-c9f3-7da479aef554@grimberg.me>
In-Reply-To: <20210316020229.GA35099@C02WT3WMHTD6>


>>>>>> A crash happens when a Set Features command (NVME_FEAT_NUM_QUEUES)
>>>>>> times out during an nvme over rdma (RoCE) reconnection; the cause
>>>>>> is the use of a queue that was never allocated.
>>>>>>
>>>>>> If a queue is not live, we should not allow requests to be queued
>>>>>> on it.
>>>>>
>>>>> Can you describe the exact scenario here? What is the controller
>>>>> state here? LIVE? or DELETING?
>>>> If setting the feature (NVME_FEAT_NUM_QUEUES) fails due to a timeout,
>>>> or the target returns 0 I/O queues, nvme_set_queue_count() returns 0
>>>> and the reconnection continues and succeeds. The controller state is
>>>> then LIVE, requests keep being delivered via ->queue_rq(), and the
>>>> crash happens.
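For reference, this behavior comes from nvme_set_queue_count() in
drivers/nvme/host/core.c, which deliberately treats a controller that
fails the Set Features command as degraded and keeps it alive with zero
I/O queues. A rough sketch of the mainline logic (paraphrased from the
~5.12-era code, so treat the details as approximate):

int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
{
	u32 q_count = (*count - 1) | ((*count - 1) << 16);
	u32 result;
	int status, nr_io_queues;

	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count,
			NULL, 0, &result);
	if (status < 0)
		return status;	/* transport error: fail the connect */

	/*
	 * A positive NVMe status (which a timed-out command also produces
	 * once it is cancelled) is treated as a degraded controller: keep
	 * it alive with the admin queue only.
	 */
	if (status > 0) {
		dev_err(ctrl->device,
			"Could not set queue count (%d)\n", status);
		*count = 0;
	} else {
		nr_io_queues = min(result & 0xffff, result >> 16) + 1;
		*count = min(*count, nr_io_queues);
	}
	return 0;
}

So the caller sees success with *count == 0, the reconnect completes,
and the controller goes LIVE with no usable I/O queues.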
>>>
>>> Thinking about this again, we should absolutely fail the reconnection
>>> when we are unable to set any I/O queues; it is just wrong to keep
>>> this controller alive...
>> Keith thinks keeping the controller alive for diagnosis is better.
>> This is the patch that fails the connection:
>> https://lore.kernel.org/linux-nvme/20210223072602.3196-1-lengchao@huawei.com/
>>
>> So now we have two choices (sketched below):
>> 1. Fail the connection when we are unable to set any I/O queues.
>> 2. Do not allow queuing requests when the queue is not live.
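For concreteness, rough sketches of what each option could look like,
based on the existing mainline fabrics helpers (the exact placement and
error codes here are illustrative, not a finished patch):

/*
 * Option 1: fail the (re)connect when the target grants no I/O queues.
 * This would sit in the transport's I/O queue setup path, e.g.
 * nvme_rdma_configure_io_queues().
 */
	ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
	if (ret)
		return ret;
	if (nr_io_queues == 0) {
		dev_err(ctrl->ctrl.device,
			"unable to set any I/O queues\n");
		return -ENOMEM;	/* propagates into the reconnect logic */
	}

/*
 * Option 2: have ->queue_rq() bounce requests that target a queue
 * which is not live, using the existing fabrics ready checks.
 */
	bool queue_ready = test_bit(NVME_RDMA_Q_LIVE, &queue->flags);

	if (!nvmf_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
		return nvmf_fail_nonready_command(&queue->ctrl->ctrl, rq);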
> 
> Okay, so there are different views on how to handle this. I personally
> find in-band administration of a misbehaving device a good thing to
> have, but I won't nak it if the consensus from the people using this is
> to go the other way.

While I understand that this can be useful, I've seen it do more harm
than good. It is really puzzling to people when the reflected controller
state is live (and even optimized) yet no I/O makes progress for no
apparent reason. And the logs are rarely consulted in these cases.

I am also opting for failing it and rescheduling a reconnect.
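In nvme-rdma terms, an error returned from the connect/setup path lands
in nvme_rdma_reconnect_or_remove(), which either arms the delayed
reconnect work or gives up and deletes the controller. Roughly (again
paraphrased from the upstream code, details approximate):

static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
{
	/* Only reschedule if we are still trying to connect. */
	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING)
		return;

	if (nvmf_should_reconnect(&ctrl->ctrl)) {
		dev_info(ctrl->ctrl.device,
			"Reconnecting in %d seconds...\n",
			ctrl->ctrl.opts->reconnect_delay);
		queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
				ctrl->ctrl.opts->reconnect_delay * HZ);
	} else {
		nvme_delete_ctrl(&ctrl->ctrl);
	}
}

So failing the connect when no I/O queues can be set does not lose the
controller: it keeps retrying (up to max_reconnects) instead of exposing
a LIVE controller that cannot do any I/O.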


