On May 11 15:12, Hannes Reinecke wrote: >On 5/11/21 2:22 PM, Klaus Jensen wrote: >> On May 11 09:35, Hannes Reinecke wrote: >>> Ever since commit e570768566 ("hw/block/nvme: support for shared >>> namespace in subsystem") NVMe PCI hotplug is broken, as the PCI >>> hotplug infrastructure will only work for the nvme devices (which >>> are PCI devices), but not for any attached namespaces. >>> So when re-adding the NVMe PCI device via 'device_add' the NVMe >>> controller is added, but all namespaces are missing. >>> This patch adds device hotplug hooks for NVMe namespaces, such that one >>> can call 'device_add nvme-ns' to (re-)attach the namespaces after >>> the PCI NVMe device 'device_add nvme' hotplug call. >>> >> >> Hi Hannes, >> >> Thanks for this. >> >> The real fix here is that namespaces are properly detached from other >> controllers that it may be shared on. >> >> But is this really the behavior we want? That nvme-ns devices always >> "belongs to" (in QEMU qdev terms) an nvme device is an artifact of the >> Bus/Device architecture and not really how an NVM subsystem should >> behave. Removing a controller should not cause shared namespaces to >> disappear from other controllers. >> >> I have a WIP that instead adds an NvmeBus to the nvme-subsys device and >> reparents the nvme-ns devices to that if the parent controller is linked >> to a sybsystem. This way, nvme-ns devices wont be unrealized under the >> feet of other controllers. >> >That would be the other direction I thought of; _technically_ NVMe >namespaces are objects of the subsystem, and 'controllers' are just >temporary objects providing access to the namespaces presented by the >subsystem. >So if you are going to rework it I'd rather make the namespaces children >objects of the subsystem, and have nsid maps per controller detailing >which nsids are accessible from the individual controllers. >That would probably a simple memcpy() to start with, but it would allow >us to modify that map via NVMe-MI and stuff. > >However, if you do that you'll find that subsystems can't be hotplugged, >too; but I'm sure you'll be able to fix it up :-) > >> The hotplug fix looks good - I'll post a series that tries to integrate >> both. >> >Ta. > >The more I think about it, the more I think we should be looking into >reparenting the namespaces to the subsystem. >That would have the _immediate_ benefit that 'device_del' and >'device_add' becomes symmetric (ie one doesn't have to do a separate >'device_add nvme-ns'), as the nvme namespace is not affected by the >hotplug event. > I have that working, but I'm struggling with a QEMU API technicality in that I apparently cannot simply move the NvmeBus creation to the nvme-subsys device. For some reason the bus is not available for the nvme-ns devices. That is, if one does something like this: -device nvme-subsys,... -device nvme-ns,... Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns". This is probably just me not grok'ing the qdev well enough, so I'll keep trying to fix that. What works now is to have the regular setup: -device nvme-subsys,... -device nvme,... -device nvme-ns,... And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys (which magically now IS available when nvme-ns is realized). This has the same end result, but I really would like that the namespaces could be specified as children of the subsys directly. Any help from qdev experts would be appreciated! :) >This really was a quick hack to demonstrate a shortcoming in the linux >NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you >are interested in details), so I'm sure there is room for improvement. > I follow linux-nvme, so I saw the patch ;) >And the prime reason for sending it out was to gauge interest by >qemu-devel; I have a somewhat mixed experience when sending patches to >the qemu ML ... > Your contribution is very much appreciated :)