Hi Andrey,

Thanks for clarifying that you were talking specifically about the bdev layer. I think you are right there (and Ben spoke to me today as well about this). When I was initially working on the bdev_nvme implementation of failover, we were trying to mimic what happens in the kernel when you lose contact with a target-side NVMe drive: it just continually attempts to reconnect until you disconnect the bdev. But it makes sense to expose that through the bdev hotplug functionality.

As far as hotplug insertion goes, do you have a specific idea of how to implement that in a generic enough way to be universally applicable? I can think of a few different things people might want to do:

1. Poll a specific set of subsystems on a given TRID (or set of TRIDs).
2. Attach to, and poll, all subsystems on a given set of TRIDs.

I think both of these would require changes to both the bdev_nvme_set_hotplug API and the RPC method, since they currently only allow probing of PCIe addresses. It might be best to simply allow users to call bdev_nvme_set_hotplug multiple times, specifying a single TRID each time, and then keep an internal list of all of the monitored targets.

I'm definitely in favor of both of these changes, but I would be interested to see what others in the community think as well, in case I am missing anything. The removal side could be implemented pretty readily; it's the insertion side that has the API considerations.

Thanks,

Seth

-----Original Message-----
From: Andrey Kuzmin
Sent: Tuesday, May 26, 2020 2:11 PM
To: Storage Performance Development Kit
Subject: [SPDK] Re: NVMe hotplug for RDMA and TCP transports

Thanks Seth, please find a few comments inline below.

On Tue, May 26, 2020, 23:35 Howell, Seth wrote:
> Hi Andrey,
>
> Typically when we refer to hotplug (removal)

That addition speaks for itself :). Since I'm interested in both hot removal and hot plugging, that's just wording.

> in fabrics transports, we are talking about the admin and I/O qpairs
> being suddenly disconnected by the target side of the connection. This
> definition of hotplug is already supported in the NVMe initiator. If
> your definition of hotplug is something different, please correct me
> so that I can better answer your question.
>
> In RDMA, for example, when we receive a disconnect event on the admin
> qpair for a given controller, we mark that controller as failed and
> fail up all I/O corresponding to I/O qpairs on that controller. Then
> subsequent calls to either submit I/O or process completions on any
> qpair associated with that controller return -ENXIO, indicating to the
> initiator application that the drive has been failed by the target
> side. There are a couple of reasons that could happen:
> 1. The actual drive itself has been hotplugged from the target
>    application (i.e. NVMe PCIe hotplug on the target side).
> 2. There was some network event that caused the target application to
>    disconnect (NIC failure, RDMA error, etc.).
>
> Because there are multiple reasons we could receive a "hotplug" event
> from the target application, we leave it up to the initiator
> application to decide what to do with it: either destroy the
> controller from the initiator side, try reconnecting to the controller
> from the same TRID, or attempt to connect to the controller from a
> different TRID (something like target-side port failover).
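To make the "leave it up to the initiator application" point above concrete, here is a minimal sketch of one way an application could react once the driver starts returning -ENXIO. The retry policy and MAX_RECONNECT_ATTEMPTS are assumptions for illustration, not what bdev_nvme does; the calls used (spdk_nvme_qpair_process_completions, spdk_nvme_ctrlr_reset, spdk_nvme_detach) are the public SPDK NVMe driver API.

```c
/* Sketch only: one possible initiator-side policy when a fabrics controller
 * is reported as failed. MAX_RECONNECT_ATTEMPTS is a hypothetical knob. */
#include <errno.h>
#include "spdk/nvme.h"

#define MAX_RECONNECT_ATTEMPTS 5 /* hypothetical policy, not an SPDK default */

static void
poll_io_qpair(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_qpair *qpair)
{
	/* 0 == process all available completions. */
	int32_t rc = spdk_nvme_qpair_process_completions(qpair, 0);

	if (rc != -ENXIO) {
		return; /* Progress was made, or there was simply nothing to do. */
	}

	/* The target side disconnected us ("hotplug"): pick a policy. */
	for (int i = 0; i < MAX_RECONNECT_ATTEMPTS; i++) {
		if (spdk_nvme_ctrlr_reset(ctrlr) == 0) {
			/* Reconnected to the same TRID; the application may also
			 * need to reconnect or recreate its I/O qpairs. */
			return;
		}
	}

	/* Give up and treat it as a permanent hot removal. */
	spdk_nvme_detach(ctrlr);
}
```

Destroying the controller, retrying on the same TRID, or switching to a different TRID maps directly onto the three options listed in the paragraph above.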
What I'm concerned with right now is that leaving this decision to the initiator application seems to be at odds with SPDK's own bdev layer hotremove functionality. When spdk_bdev_open is called, the caller provides a hotremove callback that is expected to be called when the associated bdev goes away. If, for instance, I model a hotplug event by killing the SPDK nvmf/tcp target while running bdevperf against the namespace bdevs it exposes, I'd expect the bdevperf hotremove callback to be fired for each active target namespace. What I'm witnessing instead is the target subsystem controller (on the initiator side) going into a failed state after a number of unsuccessful resets, with bdevperf failing due to I/O errors rather than cleanly handling the hotremove event. Is that by design, so that I'm looking for something that's actually not expected to work, or is the bdev layer hot-remove functionality a bit ahead of the nvme layer in this case?

> In terms of hotplug insertion, I assume that would mean you want the
> initiator to automatically connect to a target subsystem that can be
> presented at any point in time during the running of the application.
> There isn't a specific driver-level implementation of this feature for
> fabrics controllers, I think mostly because it would be very easy to
> implement and customize this functionality at the application layer.
> For example, one could periodically call discover on the targets they
> want to connect to and, when new controllers/subsystems appear,
> connect to them at that time.

Understood, though I'd expect such a feature to be pretty popular, similar to PCIe hotplug (which currently works), so providing it off-the-shelf rather than leaving the implementation to SPDK users would make sense to me.

Thanks,
Andrey

> I hope that this answers your question. Please let me know if I am
> talking about a different definition of hotplug than the one you are
> using.
>
> Thanks,
>
> Seth
>
> -----Original Message-----
> From: Andrey Kuzmin
> Sent: Friday, May 22, 2020 1:47 AM
> To: Storage Performance Development Kit
> Subject: [SPDK] NVMe hotplug for RDMA and TCP transports
>
> Hi team,
>
> is NVMe hotplug functionality as implemented limited to PCIe transport,
> or does it also work for other transports? If it's currently PCIe
> only, are there any plans to extend the support to RDMA/TCP?
>
> Thanks,
> Andrey

_______________________________________________
SPDK mailing list -- spdk(a)lists.01.org
To unsubscribe send an email to spdk-leave(a)lists.01.org
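For completeness, here is a minimal sketch of the application-level discovery polling described above (periodically call discover on the targets of interest and attach to anything new). The discovery TRID string (address/port) and the already_attached() bookkeeping helper are hypothetical placeholders; spdk_nvme_transport_id_parse(), spdk_nvme_probe(), and SPDK_NVMF_DISCOVERY_NQN are existing SPDK driver APIs.

```c
/* Sketch only: poll an NVMe-oF discovery service from the application and
 * attach to newly advertised subsystems ("hotplug insertion" by hand). */
#include <stdbool.h>
#include <string.h>
#include "spdk/nvme.h"
#include "spdk/nvmf_spec.h"

/* Hypothetical example target; real code would take this from configuration. */
#define DISCOVERY_TRID_STR \
	"trtype:TCP adrfam:IPv4 traddr:192.168.1.10 trsvcid:4420 " \
	"subnqn:" SPDK_NVMF_DISCOVERY_NQN

/* Application-provided bookkeeping of controllers already attached. */
extern bool already_attached(const struct spdk_nvme_transport_id *trid);

static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	 struct spdk_nvme_ctrlr_opts *opts)
{
	/* Returning true tells the driver to attach to this subsystem. */
	return !already_attached(trid);
}

static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	  struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	/* A newly "hot-inserted" subsystem: create bdevs, start I/O, etc. */
}

/* Call this from a poller, e.g. every few seconds. */
static void
poll_for_new_subsystems(void)
{
	struct spdk_nvme_transport_id trid;

	memset(&trid, 0, sizeof(trid));
	if (spdk_nvme_transport_id_parse(&trid, DISCOVERY_TRID_STR) != 0) {
		return;
	}

	/* Probing a discovery-service TRID enumerates the subsystems it
	 * advertises and invokes probe_cb/attach_cb for each of them. */
	spdk_nvme_probe(&trid, NULL, probe_cb, attach_cb, NULL);
}
```

An off-the-shelf version of this, driven by bdev_nvme_set_hotplug accepting fabric TRIDs and keeping an internal list of monitored targets, is essentially what the first part of this thread proposes.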