From mboxrd@z Thu Jan 1 00:00:00 1970
From: sagi@grimberg.me (Sagi Grimberg)
Date: Thu, 12 Apr 2018 15:34:12 +0300
Subject: [PATCH v1 0/5] Fix keep-alive mechanism for fabrics
In-Reply-To:
References: <1523380689-17151-1-git-send-email-maxg@mellanox.com>
 <512640d9-9e1d-5e52-d7be-5d8203381023@grimberg.me>
 <1c36f016-5b58-038f-8938-cd9deb15a205@grimberg.me>
 <3ee4b1a9-0b00-f7df-5198-a9f893e5a6af@broadcom.com>
Message-ID: <8607cd8d-a0bd-ac88-7923-2473970e587d@grimberg.me>

On 04/12/2018 11:49 AM, Max Gurtovoy wrote:
>
>
> On 4/11/2018 7:48 PM, James Smart wrote:
>> On 4/11/2018 7:40 AM, Max Gurtovoy wrote:
>>>
>>>
>>> On 4/11/2018 5:07 PM, Sagi Grimberg wrote:
>>>>
>>>>>> I don't understand what you mean by a "property of the admin
>>>>>> queue"; there is no such definition in the spec. In any event,
>>>>>> the keep-alive mechanism was defined to detect host/controller
>>>>>> health on a periodic basis.
>>>>>>
>>>>>> The spec clearly states that the host needs to send keep-alives
>>>>>> at a faster rate than the keep-alive timeout, accounting for
>>>>>> things such as transport round-trip times, transport delays,
>>>>>> keep-alive timer granularity and so on.
>>>>>>
>>>>>> To me, this is a clear case where your use case requires a larger
>>>>>> keep-alive timeout. I'm not sure how sprinkling the keep-alive
>>>>>> execution around solves or helps anything. To me this looks
>>>>>> completely redundant (sorry).
>>>>>
>>>>> I think that starting the keep-alive timer at the target side
>>>>> after admin connect, but at the host side only after all I/O
>>>>> queues connect, is wrong.
>>>>> In my solution I start the keep-alive mechanism on the host side
>>>>> after admin connect as well (exactly as the target side does).
>>
>> Well, true, but I believe it should be started after the controller
>> is enabled, not merely after the admin queue has been created. I
>> don't think even the admin queue can be serviced until the adapter
>> is enabled. Thus, I would agree with moving nvme_start_keep_alive()
>> from nvme_start_ctrl() to nvme_enable_ctrl(). We would also want the
>> "enablement" to detect the keep-alive style (today's keep-alive or
>> TBKA) supported by the time that call is made. Given that needs to
>> be known, I believe there will always be a small window where a
>> couple of things have to be accessed before the KA starts.
>
> At least I see that you agree to start the keep-alive timer before
> creating the I/O queues, so that's progress :).
> The reason I didn't put it in nvme_enable_ctrl is that the FC code
> never calls nvme_disable_ctrl (and that is probably where we should
> stop the timer).
> I'm trying to solve an issue without changing the whole design of the
> FC implementation. We can do it incrementally.

Please don't touch any of the transport drivers for any of this; they
are the wrong place to handle keep-alive.

>>
>> I strongly support Sagi's statements on trying to keep the KA
>> start/stop in common code rather than sprinkling it around. I agree
>> with the current placement of the stop code in nvme_stop_ctrl(). The
>> stop ctrl should always be called prior to the admin queue being
>> torn down in the transport (as part of kicking off the reset of the
>> controller).
>
> If we'll add the KA start to nvme_enable_ctrl then the KA stop should
> be added to nvme_disable_ctrl. We should be symmetric.

Not a must. Even after the controller is disabled we still want
keep-alives running against it, as we will need to enable it again at
some point.
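
To make the placement concrete, here is a rough sketch of what is being
discussed: start moving into nvme_enable_ctrl(), stop staying in
nvme_stop_ctrl(). This is not a patch; it is simplified from the
core.c helpers of roughly this era (nvme_wait_ready() and friends),
with the CC register programming, error paths and work flushing elided.

int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
{
	int ret;

	/* program CC.EN and wait for CSTS.RDY -- details elided */
	ret = nvme_wait_ready(ctrl, cap, true);
	if (ret)
		return ret;

	/*
	 * Proposed placement: arm keep-alive as soon as the controller
	 * is enabled, before any I/O queue is created.  Per James'
	 * point, KATO (and the keep-alive style, e.g. TBKA) must
	 * already be known here, so a small window before the first
	 * KA remains unavoidable.
	 */
	nvme_start_keep_alive(ctrl);
	return 0;
}

void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
{
	/*
	 * The stop side stays in common code, called before the
	 * transport tears down the admin queue.  It deliberately does
	 * not live in nvme_disable_ctrl(): a disabled controller still
	 * wants keep-alives until it is enabled again.
	 */
	nvme_stop_keep_alive(ctrl);
	/* flush scan/async-event work etc. -- elided */
}

Whether the FC path can be made to fit this shape without calling
nvme_disable_ctrl is exactly the open question above.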