On Sep 24 08:05, Hannes Reinecke wrote: > On 9/23/21 10:09 PM, Klaus Jensen wrote: > > On Sep 9 13:37, Hannes Reinecke wrote: > > > On 9/9/21 12:47 PM, Klaus Jensen wrote: > > > > On Sep 9 11:43, Hannes Reinecke wrote: > > > > > With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging") > > > > > namespaces get moved from the controller to the subsystem if one > > > > > is specified. > > > > > That keeps the namespaces alive after a controller hot-unplug, but > > > > > after a controller hotplug we have to reconnect the namespaces > > > > > from the subsystem to the controller. > > > > > > > > > > Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging") > > > > > Cc: Klaus Jensen > > > > > Signed-off-by: Hannes Reinecke > > > > > --- > > > > > hw/nvme/subsys.c | 8 +++++++- > > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c > > > > > index 93c35950d6..a9404f2b5e 100644 > > > > > --- a/hw/nvme/subsys.c > > > > > +++ b/hw/nvme/subsys.c > > > > > @@ -14,7 +14,7 @@ > > > > > int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) > > > > > { > > > > > NvmeSubsystem *subsys = n->subsys; > > > > > - int cntlid; > > > > > + int cntlid, nsid; > > > > > for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { > > > > > if (!subsys->ctrls[cntlid]) { > > > > > @@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) > > > > > subsys->ctrls[cntlid] = n; > > > > > + for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) { > > > > > + if (subsys->namespaces[nsid]) { > > > > > + nvme_attach_ns(n, subsys->namespaces[nsid]); > > > > > + } > > > > > > > > Thanks Hannes! I like it, keeping things simple. > > > > > > > > But we should only attach namespaces that have the shared property or > > > > have ns->attached == 0. Non-shared namespaces may already be attached to > > > > another controller in the subsystem. > > > > > > > > > > Well ... I tried to avoid that subject, but as you brought it up: > > > There is a slightly tricky issue in fabrics, in that the 'controller' is > > > independent from the 'connection'. > > > The 'shared' bit in the CMIC setting indicates that the subsystem may > > > have more than one _controller_. It doesn't talk about how many > > > _connections_ a controller may support; that then is the realm of > > > dynamic or static controllers, which we don't talk about :-). > > > Sufficient to say, linux only implements the dynamic controller model, > > > so every connection will be to a different controller. > > > But you have been warned :-) > > > > > > However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the > > > 'shared' bit in the namespace). > > > Which leads to the interesting question on how exactly should one handle > > > non-shared namespaces in subsystems for which there are multiple > > > controllers. Especially as the NSID space is per subsystem, so each > > > controller will be able to figure out if there are blanked-out namespaces. > > > So to make this a sane setup I would propose to set the 'shared' option > > > automatically whenever the controller has the 'subsys' option set. > > > And actually, I would ditch the 'shared' option completely, and make it > > > dependent on the 'subsys' setting for the controller. > > > Much like we treat the 'CMIC' setting nowadays. > > > That avoids lots of confusions, and also make the implementation _way_ > > > easier. > > > > > > > I see your point. Unfortunately, there is no easy way to ditch shared= > > now. But I think it'd be good enough to attach to shared automatically, > > but not to "shared=off". > > > > I've already ditched the shared parameter on my RFC refactor series and > > always having the namespaces shared. > > > > Okay. > > > > > However... > > > > > > > > The spec says that "The attach and detach operations are persistent > > > > across all reset events.". This means that we should track those events > > > > in the subsystem and only reattach namespaces that were attached at the > > > > time of the "reset" event. Currently, we don't have anything mapping > > > > that state. But the device already has to take some liberties with > > > > regard to stuff that is considered persistent by the spec (SMART log > > > > etc.) since we do not have any way to store stuff persistently across > > > > qemu invocations, so I think the above is an acceptable compromise. > > > > > > > Careful. 'attach' and 'detach' are MI (management interface) operations. > > > If and how many namespaces are visible to any given controllers is > > > actually independent on that; and, in fact, controllers might not even > > > implement 'attach' or 'detach'. > > > But I do agree that we don't handle the 'reset' state properly. > > > > > > > The Namespace Attachment command has nothing to do with MI? And the QEMU > > controller does support attaching and detaching namespaces. > > > > No, you got me wrong. Whether a not a namespace is attached is independent > of any 'Namespace attachment' command. > Case in point: the subsystem will be starting up with namespace already > attached, even though no-one issued any namespace attachment command. > Right, understood. > > > > A potential (as good as it gets) fix would be to keep a map/list of > > > > "persistently" attached controllers on the namespaces and re-attach > > > > according to that when we see that controller joining the subsystem > > > > again. CNTLID would be the obvious choice for the key here, but problem > > > > is that we cant really use it since we assign it sequentially from the > > > > subsystem, which now looks like a pretty bad choice. CNTLID should have > > > > been a required property of the nvme device when subsystems are > > > > involved. Maybe we can fix up the CNTLID assignment to take the serial > > > > into account (we know that is defined and *should* be unique) and not > > > > reuse CNTLIDs. This limits the subsystem to NVME_MAX_CONTROLLERS unique > > > > controllers, but I think that is fair enough. > > > > > > > > Sigh. Need to think this through. > > > > > > > Well, actually there is an easy way out. I do agree that we need to make > > > the 'cntlid' a property of the controller. And once it's set we can > > > track it properly, eg by having per-cntlid nsid lists in the subsystem. > > > But if it's not set we can claim that we'll be allocating a new > > > controller across reboots (which is actually what we're doing), making > > > us spec compliant again :-) > > > > > > > That is a decent solution, but we still can't track it across reboots. I > > think we should just with your patch (but for now, only automatically > > attach shared namespaces). > > > But that's the point. > We don't _need_ to track it. > We only need to track it when it's specified via a (yet-to-be-added) cntlid > parameter, but then it's trivial. > If it's not specified we will allocate a new one and don't need to track > stuff. That's even permitted by the NVMe spec; v2.0 section 3.1.1 has: > > While allocation of static controllers to hosts are expected to be durable > (so that hosts can expect to form associations to the same controllers > repeatedly (e.g., after each host reboot)), the NVM subsystem may remove the > host allocation of a controller that is not in use at any time for > implementation specific reasons > (e.g., controller resource reclamation, subsystem reconfiguration). > Awesome. Thanks for pointing me towards that wording. > > Would that be acceptable to you? It would require your to add shared=on > > on your single-namespace test set up since unfortunately, shared=on is > > not the default. However, we can consider changing that default in 6.2. > > > Yes, for an interim solution it's okay. > The important bit is to get hotplug up and running again for NVMe; we do > have some testcases for patches in upstream linux which I would like to > forward to our QA Team. > Cool. I've sent out a series with your patch and an additional patch that changes the default for 'shared'.