From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Daniel Wagner <dwagner@suse.de>,
James Smart <james.smart@broadcom.com>
Cc: Keith Busch <kbusch@kernel.org>, Christoph Hellwig <hch@lst.de>,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries
Date: Thu, 7 Mar 2024 13:52:25 +0100 [thread overview]
Message-ID: <b23a5c7c-a877-4cde-acd4-50c21c3ef1fc@suse.de> (raw)
In-Reply-To: <b02588cb-6fbc-4116-86d6-173c115f50c5@grimberg.me>
On 3/7/24 13:14, Sagi Grimberg wrote:
>
>
> On 07/03/2024 13:45, Hannes Reinecke wrote:
>> On 3/7/24 12:30, Sagi Grimberg wrote:
>>>
[ .. ]
>>>
>>> Where is this retried today, I don't see where connect failure is
>>> retried, outside of a periodic reconnect.
>>> Maybe I'm missing where what is the actual failure here.
>>
>> static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
>> {
>> struct nvme_tcp_ctrl *tcp_ctrl =
>> container_of(to_delayed_work(work),
>> struct nvme_tcp_ctrl, connect_work);
>> struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;
>>
>> ++ctrl->nr_reconnects;
>>
>> if (nvme_tcp_setup_ctrl(ctrl, false))
>> goto requeue;
>>
>> dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
>> ctrl->nr_reconnects);
>>
>> ctrl->nr_reconnects = 0;
>>
>> return;
>>
>> requeue:
>> dev_info(ctrl->device, "Failed reconnect attempt %d\n",
>>
>> and nvme_tcp_setup_ctrl() returns either a negative errno or an NVMe
>> status code (which might include the DNR bit).
>
> I thought this is about the initialization. yes today we ignore the
> status in re-connection assuming that whatever
> happened, may (or may not) resolve itself. The basis for this assumption
> is that if we managed to connect the first
> time there is no reason to assume that connecting again should fail
> persistently.
>
And that is another issue where I'm not really comfortable with.
While it would make sense to have the connect functionality to be
one-shot, and let userspace retry if needed, the problem is that we
don't have a means of transporting that information to userspace.
The only thing which we can transport is an error number, which
could be anything and mean anything.
If we had a defined way stating: 'This is a retryable, retry with the
same options.' vs 'This is retryable error, retry with modified
options.' vs 'This a non-retryable error, don't bother.' I'd be
fine with delegating retries to userspace.
But currently we don't.
> If there is a consensus that we should not assume it, its a valid
> argument. I didn't see where this happens with respect
> to authentication though.
nvmf_connect_admin_queue():
/* Authentication required */
ret = nvme_auth_negotiate(ctrl, 0);
if (ret) {
dev_warn(ctrl->device,
"qid 0: authentication setup failed\n");
ret = NVME_SC_AUTH_REQUIRED;
goto out_free_data;
}
ret = nvme_auth_wait(ctrl, 0);
if (ret)
dev_warn(ctrl->device,
"qid 0: authentication failed\n");
else
dev_info(ctrl->device,
"qid 0: authenticated\n");
The first call to 'nvme_auth_negotiate()' is just for setting up
the negotiation context and start the protocol. So if we get
an error here it's pretty much non-retryable as it's completely
controlled by the fabrics options.
nvme_auth_wait(), OTOH, contains the actual result from the negotiation,
so there we might or might not retry, depending on the value of 'ret'.
Cheers,
Hannes
next prev parent reply other threads:[~2024-03-07 14:09 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-05 8:00 [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries Daniel Wagner
2024-03-05 8:00 ` [PATCH v3 1/2] nvme-tcp: short-circuit reconnect retries Daniel Wagner
2024-03-06 8:10 ` Chaitanya Kulkarni
2024-03-05 8:00 ` [PATCH v3 2/2] nvme-rdma: " Daniel Wagner
2024-03-06 8:11 ` Chaitanya Kulkarni
2024-03-06 8:17 ` Chaitanya Kulkarni
2024-03-07 8:00 ` [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries Sagi Grimberg
2024-03-07 10:37 ` Hannes Reinecke
2024-03-07 11:30 ` Sagi Grimberg
2024-03-07 11:45 ` Hannes Reinecke
2024-03-07 12:14 ` Sagi Grimberg
2024-03-07 12:52 ` Hannes Reinecke [this message]
2024-03-08 10:21 ` Sagi Grimberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b23a5c7c-a877-4cde-acd4-50c21c3ef1fc@suse.de \
--to=hare@suse.de \
--cc=dwagner@suse.de \
--cc=hch@lst.de \
--cc=james.smart@broadcom.com \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).