From: Hannes Reinecke <hare@suse.de>
To: James Smart <jsmart2021@gmail.com>, Daniel Wagner <dwagner@suse.de>
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, Chaitanya Kulkarni <kch@nvidia.com>,
	Shin'ichiro Kawasaki <shinichiro@fastmail.com>,
	Sagi Grimberg <sagi@grimberg.me>, Ewan Milne <emilne@redhat.com>
Subject: Re: [PATCH v2 4/5] nvme-fc: Make initial connect attempt synchronous
Date: Wed, 12 Jul 2023 08:50:40 +0200
Message-ID: <5f239919-d4bb-17ed-f8d1-8edb4f3659fc@suse.de>
In-Reply-To: <0605ac36-16d5-2026-d3c6-62d346db6dfb@gmail.com>

On 7/12/23 00:47, James Smart wrote:
> On 7/6/2023 5:07 AM, Daniel Wagner wrote:
>> Hi James,
>>
>> On Sat, Jul 01, 2023 at 05:11:11AM -0700, James Smart wrote:
>>> As much as you want to make this change to make transports "similar", 
>>> I am dead set against it unless you are completing a long qualification
>>> of the change on real FC hardware and FC-NVME devices. There is probably 
>>> 1.5 yrs of testing of different race conditions that drove this change.
>>> You cannot declare success from a simplistic toy tool such as fcloop for 
>>> validation.
>>>
>>> The original issues exist, probably have even morphed given the time from
>>> the original change, and this will seriously disrupt the transport and any
>>> downstream releases.  So I have a very strong NACK on this change.
>>>
>>> Yes - things such as the connect failure results are difficult to return
>>> back to nvme-cli. I have had many gripes about the nvme-cli's behavior over
>>> the years, especially on negative cases due to race conditions which
>>> required retries. It still fails this miserably.  The async reconnect path
>>> solved many of these issues for fc.
>>>
>>> For the auth failure, how do we deal with things if auth fails over time as
>>> reconnects fail due to a credential changes ?  I would think commonality of
>>> this behavior drives part of the choice.
>>
>> Alright, what do you think about the idea to introduce a new '--sync' option to
>> nvme-cli which forwards this info to the kernel that we want to wait for the
>> initial connect to succeed or fail? Obviously, this needs to handle signals too.
>>
>> From what I understood this is also what Ewan would like to have
> To me this is not sync vs non-sync option, it's a max_reconnects value 
> tested for in nvmf_should_reconnect(). Which, if set to 0 (or 1), should 
> fail if the initial connect fails.
> 
Well, this is more a technical detail while we continue to harp about 
'sync' vs 'non-sync'.
Currently all instances of ->create_ctrl() are running asynchronously,
i.e. ->create_ctrl() returns a 'ctrl' object which is still in the process
of establishing the connection.
(And there it doesn't really matter whether it's FC or TCP/RDMA; FC is
kicking off a workqueue for the 'reconnect' call, whereas TCP/RDMA is
creating the association and issuing the actual 'connect' NVMe SQE via
an I/O workqueue; the net result is identical).
And when we talk about a 'sync' connect we are planning to _wait_ until
this asynchronous operation reaches a steady state, i.e. either after a
connect attempt has succeeded or after the connect retries are exhausted.

And yes, we _are_ aware that this might take quite a long time.
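
To illustrate what "wait for a steady state" means here, a minimal
userspace sketch follows. It is not kernel code: the struct, the state
names and the 'succeed_on' test knob are all made up for illustration,
and the loop merely stands in for sleeping until the reconnect work item
has run.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller states; not the kernel's enum. */
enum ctrl_state { CTRL_CONNECTING, CTRL_LIVE, CTRL_FAILED };

struct ctrl_model {
	enum ctrl_state state;
	int nr_reconnects;	/* failed connect attempts so far */
	int max_reconnects;	/* -1 means "retry forever" */
	int succeed_on;		/* test knob: attempt that succeeds, 0 = never */
};

/* One connect attempt, as the (re)connect work item would run it. */
static void connect_work(struct ctrl_model *c)
{
	int attempt = c->nr_reconnects + 1;

	if (c->succeed_on && attempt == c->succeed_on) {
		c->state = CTRL_LIVE;
		return;
	}

	c->nr_reconnects++;
	/* Give up once the retry budget is used up. */
	if (c->max_reconnects != -1 &&
	    c->nr_reconnects >= c->max_reconnects)
		c->state = CTRL_FAILED;
}

/* Steady state: connected, or retries exhausted. */
static bool is_steady(const struct ctrl_model *c)
{
	return c->state != CTRL_CONNECTING;
}

/*
 * 'sync' connect: leaves the retry settings untouched and only waits.
 * With max_reconnects == -1 and a target that never answers this loops
 * forever, which is why a real implementation has to handle signals.
 */
static enum ctrl_state wait_for_steady_state(struct ctrl_model *c)
{
	while (!is_steady(c))
		connect_work(c);
	return c->state;
}
```

With max_reconnects = 3 and a target that never answers, the wait ends
in CTRL_FAILED after three attempts; if the second attempt succeeds it
ends in CTRL_LIVE with one failed attempt recorded.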

> Right now max_reconnects is calculated by the ctrl_loss_tmo and 
> reconnect_delay. So there's already a way via the cli to make sure 
> there's only 1 connect attempt. I wouldn't mind seeing an exact cli 
> option that sets it to 1 connection attempt w/o the user calculation and 
> 2 value specification.
> 
Again, we do _not_ propose to change any of the default settings.
The 'sync' option will not modify the reconnect settings, it will just
wait until a steady state is reached.
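
For reference, the relationship James describes between ctrl_loss_tmo,
reconnect_delay and max_reconnects can be modelled roughly as below.
This is a simplified userspace sketch based on my reading of the fabrics
option parsing and nvmf_should_reconnect(); the struct and function
names are made up, and the real code operates on the controller and its
options rather than on a bare struct.

```c
#include <stdbool.h>

/* Round-up integer division, same idea as the kernel macro of that name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the relevant connect options. */
struct opts_model {
	int reconnect_delay;	/* seconds between attempts */
	int max_reconnects;	/* derived value, -1 = unlimited */
};

/* Derive the retry budget from the two user-visible values. */
static void set_max_reconnects(struct opts_model *o, int ctrl_loss_tmo)
{
	if (ctrl_loss_tmo < 0)
		o->max_reconnects = -1;	/* never give up */
	else
		o->max_reconnects = DIV_ROUND_UP(ctrl_loss_tmo,
						 o->reconnect_delay);
}

/* Model of the check made after a failed connect attempt. */
static bool should_reconnect(const struct opts_model *o, int nr_reconnects)
{
	return o->max_reconnects == -1 ||
	       nr_reconnects < o->max_reconnects;
}
```

In this model ctrl_loss_tmo=600 with reconnect_delay=10 gives 60
attempts, and any ctrl_loss_tmo between 1 and reconnect_delay gives a
single attempt. None of this is touched by the 'sync' option.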

> I also assume that this is not something that would be set by default in 
> the auto-connect scripts or automated cli startup scripts.
> 
You assume correctly. That's why it'll be an additional option.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

