linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Hannes Reinecke <hare@suse.de>, Christoph Hellwig <hch@lst.de>,
	Hannes Reinecke <hare@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>, linux-nvme@lists.infradead.org
Subject: Re: [PATCH 1/4] nvme: authentication error are always non-retryable
Date: Thu, 7 Mar 2024 13:37:12 +0200	[thread overview]
Message-ID: <629656d5-2967-467b-8b80-759f4fccdfeb@grimberg.me> (raw)
In-Reply-To: <20531716-b18d-4ab6-91e8-2ac0ea745343@suse.de>



On 07/03/2024 12:32, Hannes Reinecke wrote:
> On 3/7/24 09:51, Sagi Grimberg wrote:
>>
>>
>> On 01/03/2024 17:26, Hannes Reinecke wrote:
>>> On 3/1/24 14:12, Christoph Hellwig wrote:
>>>>> -    if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>>> -        return AUTHENTICATE;
>>>>> -
>>>>>       if (blk_noretry_request(req) ||
>>>>>           (nvme_req(req)->status & NVME_SC_DNR) ||
>>>>>           nvme_req(req)->retries >= nvme_max_retries)
>>>>>           return COMPLETE;
>>>>>   +    if ((nvme_req(req)->status & 0x7ff) == NVME_SC_AUTH_REQUIRED)
>>>>> +        return AUTHENTICATE;
>>>>
>>>> This part looks fine (although I'd word the commit message
>>>> differently for it).
>>>>
>>>>> @@ -467,7 +467,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl 
>>>>> *ctrl)
>>>>>           if (result & NVME_CONNECT_AUTHREQ_ASCR) {
>>>>>               dev_warn(ctrl->device,
>>>>>                    "qid 0: secure concatenation is not supported\n");
>>>>> -            ret = NVME_SC_AUTH_REQUIRED;
>>>>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>>>               goto out_free_data;
>>>>>           }
>>>>>           /* Authentication required */
>>>>> @@ -475,14 +475,16 @@ int nvmf_connect_admin_queue(struct 
>>>>> nvme_ctrl *ctrl)
>>>>>           if (ret) {
>>>>>               dev_warn(ctrl->device,
>>>>>                    "qid 0: authentication setup failed\n");
>>>>> -            ret = NVME_SC_AUTH_REQUIRED;
>>>>> +            ret = NVME_SC_AUTH_REQUIRED | NVME_SC_DNR;
>>>>
>>>>
>>>> .. but the others should never use nvme status codes as they never
>>>> go out onto the wire.
>>>>
>>> Would love to, but to my knowledge we have the convention that NVMe 
>>> status codes less than 0 should _not_ be retried.
>>
>> What nvme status code is less than zero?
>>
> Not the NVMe status, but the 'ret' value containing either a 
> (positive) NVMe status or a (negative) errno.
>
>>> NVMe transport errors,
>>> OTOH, _should_ be retried.
>> Yes.
>>> 'connect_admin_queue' and 'connect_io_queue' now straddles the 
>>> boundary: technically it's an NVMe command, but in
>>> practice it's a transport command as there are quite a lot of steps
>>> before we even can send the 'connect' command.
>>>
>>> So, how do we handle the return codes from 'connect_admin_queue' ?
>>> The NVMe core style (with not retrying negative errors) or the NVMe
>>> transport style (with always retrying until we run out of retries)?
>>
>> Not exactly sure what is the question here. I am still going over 
>> emails so there
>> may be some discussion around this, but it is unclear from the patch 
>> description what is the issue this is solving.
>
> The question is what to do if nvmf_connect_admin_queue() returns with 
> a status with the DNR status set.
> In TCP the status is carried back to the return value of 
> nvme_tcp_setup_ctrl(), which is called during reset, reconnect, and 
> initialisation.
> For the first two the status is simply ignored, and always retried
> (ie every error is treated as retryable). For the latter, we will 
> always abort the initialisation (ie every error is non-retryable).
> Either way, we're completely ignoring the DNR bit in the NVMe status.

IMO We should always fail initial connect, for whatever reason. 
userspace can take care
of it if it wants to reconnect. If we managed to ever connect to a 
controller, we will retry
once we lose the connection.

>
> Daniel has sent two patches on my behalf, trying to evaluate the DNR
> status during reconnect and reset.

Didn't get to those yet.

> With them it suddenly matters how
> the return value from nvmf_connect_admin_queue() is to be interpreted.

OK, The spec should probably be clear about it.

> With my original idea of retrying authentication errors we will keep
> on retrying, even though the authentication settings won't change,
> and consequently each retry will fail. So in effect we go through
> pointless retries, and only abort the test case once the retries
> are exhausted. By making authentication errors non-retryable we
> can terminate the testcase directly.
> The only way I could think of to signal a non-retryable error
> is to return an NVMe status with the DNR bit set; every other error
> will cause a retry. But this will require us to fabricate an NVMe
> status code, which Christoph objects to.

Can we start with defining what is a retry-able authentication error and 
what isn't?


  reply	other threads:[~2024-03-07 11:37 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-01 11:28 [PATCH 0/4] nvme: fixes for authentication errors Hannes Reinecke
2024-03-01 11:28 ` [PATCH 1/4] nvme: authentication error are always non-retryable Hannes Reinecke
2024-03-01 13:12   ` Christoph Hellwig
2024-03-01 15:26     ` Hannes Reinecke
2024-03-07  8:51       ` Sagi Grimberg
2024-03-07 10:32         ` Hannes Reinecke
2024-03-07 11:37           ` Sagi Grimberg [this message]
2024-03-01 11:28 ` [PATCH 2/4] nvmet: lock config semaphore when accessing DH-HMAC-CHAP key Hannes Reinecke
2024-03-01 13:13   ` Christoph Hellwig
2024-03-07  8:53   ` Sagi Grimberg
2024-03-01 11:28 ` [PATCH 3/4] nvmet: return DHCHAP status codes from nvmet_setup_auth() Hannes Reinecke
2024-03-01 13:13   ` Christoph Hellwig
2024-03-07  8:56   ` Sagi Grimberg
2024-03-07 11:19     ` Hannes Reinecke
2024-03-07 12:03       ` Sagi Grimberg
2024-03-01 11:28 ` [PATCH 4/4] nvmet-loop: do not call nvme_ctrl_put() after nvme_ctrl_uninit() Hannes Reinecke
2024-03-01 13:14   ` Christoph Hellwig
2024-03-07  8:58   ` Sagi Grimberg
2024-03-01 12:24 ` [PATCH 0/4] nvme: fixes for authentication errors Daniel Wagner
2024-03-03  2:58 ` Chaitanya Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=629656d5-2967-467b-8b80-759f4fccdfeb@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=hare@kernel.org \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).