linux-kernel.vger.kernel.org archive mirror
From: Sagi Grimberg <sagi@grimberg.me>
To: Hannes Reinecke <hare@suse.de>, Daniel Wagner <dwagner@suse.de>,
	James Smart <james.smart@broadcom.com>
Cc: Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries
Date: Fri, 8 Mar 2024 12:21:40 +0200
Message-ID: <2cda7c9a-a460-4bb4-95f7-ab44f8f1007c@grimberg.me>
In-Reply-To: <b23a5c7c-a877-4cde-acd4-50c21c3ef1fc@suse.de>



On 07/03/2024 14:52, Hannes Reinecke wrote:
> On 3/7/24 13:14, Sagi Grimberg wrote:
>>
>>
>> On 07/03/2024 13:45, Hannes Reinecke wrote:
>>> On 3/7/24 12:30, Sagi Grimberg wrote:
>>>>
> [ .. ]
>>>>
>>>> Where is this retried today? I don't see where a connect failure is
>>>> retried, outside of a periodic reconnect.
>>>> Maybe I'm missing what the actual failure is here.
>>>
>>> static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
>>> {
>>>         struct nvme_tcp_ctrl *tcp_ctrl =
>>>                         container_of(to_delayed_work(work),
>>>                         struct nvme_tcp_ctrl, connect_work);
>>>         struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
>>>
>>>         ++ctrl->nr_reconnects;
>>>
>>>         if (nvme_tcp_setup_ctrl(ctrl, false))
>>>                 goto requeue;
>>>
>>>         dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
>>>                         ctrl->nr_reconnects);
>>>
>>>         ctrl->nr_reconnects = 0;
>>>
>>>         return;
>>>
>>> requeue:
>>>         dev_info(ctrl->device, "Failed reconnect attempt %d\n",
>>>
>>> and nvme_tcp_setup_ctrl() returns either a negative errno or an NVMe 
>>> status code (which might include the DNR bit).
>>
>> I thought this was about the initialization. Yes, today we ignore the
>> status on reconnect, assuming that whatever happened may (or may not)
>> resolve itself. The basis for this assumption is that if we managed to
>> connect the first time, there is no reason to assume that connecting
>> again should fail persistently.
>>
> And that is another issue I'm not really comfortable with.
> While it would make sense to make the connect functionality
> one-shot, and let userspace retry if needed, the problem is that we
> don't have a means of transporting that information to userspace.
> The only thing which we can transport is an error number, which
> could be anything and mean anything.

Not necessarily. Error code semantics exist for a reason.
I just really don't think that doing reconnects on a user-driven
initialization is a good idea at all. The case where the controller
was connected and got disrupted is different: that is not user-driven,
and hence reconnecting makes sense.
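
To make the return convention quoted earlier concrete (0, a negative
errno, or a positive NVMe status that may carry the DNR bit), here is a
minimal sketch of how a caller could tell the cases apart. The names
classify_connect_result() and SKETCH_NVME_SC_DNR are made up for
illustration; the 0x4000 value is assumed to match NVME_SC_DNR in
include/linux/nvme.h, and the policy of retrying on any errno is an
assumption, not what the driver does today.

```c
/* Assumption: same bit as NVME_SC_DNR in include/linux/nvme.h. */
#define SKETCH_NVME_SC_DNR 0x4000

enum connect_verdict {
	CONNECT_OK,
	CONNECT_RETRY,		/* possibly transient: try again later */
	CONNECT_GIVE_UP,	/* DNR set: retrying cannot help */
};

/*
 * ret follows the convention of the setup path quoted above:
 * 0 on success, a negative errno, or a positive NVMe status code.
 */
static enum connect_verdict classify_connect_result(int ret)
{
	if (ret == 0)
		return CONNECT_OK;
	if (ret < 0)
		return CONNECT_RETRY;	/* errno: cause unknown, assume transient */
	if (ret & SKETCH_NVME_SC_DNR)
		return CONNECT_GIVE_UP;	/* controller said "do not retry" */
	return CONNECT_RETRY;		/* NVMe status without DNR */
}
```

With something like this, only a positive status with DNR set would
stop the retries; everything else stays on the existing path.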

> If we had a defined way of stating: 'This is a retryable error, retry
> with the same options.' vs 'This is a retryable error, retry with
> modified options.' vs 'This is a non-retryable error, don't bother.'
> I'd be fine with delegating retries to userspace.
> But currently we don't.

Well, TBH I don't know if userspace even needs it. Most likely what a
user would want is to define a number of retries and give up once they
are exhausted. Adding the intelligence to decide which connect failures
are retryable and which are not does not seem all that useful to me.
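
Roughly what I have in mind for userspace, as a sketch only: a fixed
retry budget and no classification of the error at all. The names
connect_with_budget() and fail_until_zero() are hypothetical, and
try_connect stands in for whatever a tool would use to issue the
connect. A real tool would also sleep between attempts, which is left
out here.

```c
#include <errno.h>

/* Hypothetical callback: one connect attempt, 0 on success. */
typedef int (*connect_fn)(void *opts);

/*
 * Try up to max_attempts times and return the last result.
 * The budget alone decides when to stop; the error is not inspected.
 */
static int connect_with_budget(connect_fn try_connect, void *opts,
			       unsigned int max_attempts,
			       unsigned int *attempts_used)
{
	unsigned int used = 0;
	int ret = -EINVAL;	/* returned as-is if the budget is zero */

	while (used < max_attempts) {
		used++;
		ret = try_connect(opts);
		if (ret == 0)
			break;
	}
	if (attempts_used)
		*attempts_used = used;
	return ret;
}

/* Demo callback: fails until the counter behind opts reaches zero. */
static int fail_until_zero(void *opts)
{
	int *remaining = opts;

	if (*remaining > 0) {
		(*remaining)--;
		return -ECONNREFUSED;
	}
	return 0;
}
```

The user picks max_attempts and gets the last error back when the
budget runs out, nothing more.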

>
>> If there is a consensus that we should not assume it, it's a valid
>> argument. I didn't see where this happens with respect
>> to authentication though.
>
> nvmf_connect_admin_queue():
>
>             /* Authentication required */
>             ret = nvme_auth_negotiate(ctrl, 0);
>             if (ret) {
>                     dev_warn(ctrl->device,
>                              "qid 0: authentication setup failed\n");
>                     ret = NVME_SC_AUTH_REQUIRED;
>                     goto out_free_data;
>             }
>             ret = nvme_auth_wait(ctrl, 0);
>             if (ret)
>                     dev_warn(ctrl->device,
>                              "qid 0: authentication failed\n");
>             else
>                     dev_info(ctrl->device,
>                              "qid 0: authenticated\n");
>
> The first call to 'nvme_auth_negotiate()' is just for setting up
> the negotiation context and starting the protocol. So if we get
> an error here it's pretty much non-retryable, as it's completely
> controlled by the fabrics options.
> nvme_auth_wait(), OTOH, contains the actual result from the negotiation,
> so there we might or might not retry, depending on the value of 'ret'.
>
> Cheers,
>
> Hannes
>
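
If I read the distinction above correctly, it could be encoded along
these lines. This is only a sketch of the idea, not the current code:
auth_connect_status() is a made-up helper, and the two constants are
assumed to match NVME_SC_DNR and NVME_SC_AUTH_REQUIRED in
include/linux/nvme.h.

```c
/* Assumptions: values as in include/linux/nvme.h. */
#define SKETCH_NVME_SC_DNR		0x4000
#define SKETCH_NVME_SC_AUTH_REQUIRED	0x191

/*
 * negotiate_ret: result of setting up the negotiation context
 *                (the nvme_auth_negotiate() step).
 * wait_ret:      result of the negotiation itself
 *                (the nvme_auth_wait() step).
 */
static int auth_connect_status(int negotiate_ret, int wait_ret)
{
	/*
	 * A setup failure depends only on the fabrics options, so
	 * retrying with the same options cannot succeed: mark it DNR.
	 */
	if (negotiate_ret)
		return SKETCH_NVME_SC_AUTH_REQUIRED | SKETCH_NVME_SC_DNR;

	/* Otherwise pass the negotiation result through unchanged. */
	return wait_ret;
}
```

Whether the caller retries then depends only on what the second step
returned.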

