On Sep 9 13:37, Hannes Reinecke wrote: > On 9/9/21 12:47 PM, Klaus Jensen wrote: > > On Sep 9 11:43, Hannes Reinecke wrote: > >> With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging") > >> namespaces get moved from the controller to the subsystem if one > >> is specified. > >> That keeps the namespaces alive after a controller hot-unplug, but > >> after a controller hotplug we have to reconnect the namespaces > >> from the subsystem to the controller. > >> > >> Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging") > >> Cc: Klaus Jensen > >> Signed-off-by: Hannes Reinecke > >> --- > >> hw/nvme/subsys.c | 8 +++++++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c > >> index 93c35950d6..a9404f2b5e 100644 > >> --- a/hw/nvme/subsys.c > >> +++ b/hw/nvme/subsys.c > >> @@ -14,7 +14,7 @@ > >> int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) > >> { > >> NvmeSubsystem *subsys = n->subsys; > >> - int cntlid; > >> + int cntlid, nsid; > >> > >> for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) { > >> if (!subsys->ctrls[cntlid]) { > >> @@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) > >> > >> subsys->ctrls[cntlid] = n; > >> > >> + for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) { > >> + if (subsys->namespaces[nsid]) { > >> + nvme_attach_ns(n, subsys->namespaces[nsid]); > >> + } > > > > Thanks Hannes! I like it, keeping things simple. > > > > But we should only attach namespaces that have the shared property or > > have ns->attached == 0. Non-shared namespaces may already be attached to > > another controller in the subsystem. > > > > Well ... I tried to avoid that subject, but as you brought it up: > There is a slightly tricky issue in fabrics, in that the 'controller' is > independent from the 'connection'. > The 'shared' bit in the CMIC setting indicates that the subsystem may > have more than one _controller_. It doesn't talk about how many > _connections_ a controller may support; that then is the realm of > dynamic or static controllers, which we don't talk about :-). > Sufficient to say, linux only implements the dynamic controller model, > so every connection will be to a different controller. > But you have been warned :-) > > However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the > 'shared' bit in the namespace). > Which leads to the interesting question on how exactly should one handle > non-shared namespaces in subsystems for which there are multiple > controllers. Especially as the NSID space is per subsystem, so each > controller will be able to figure out if there are blanked-out namespaces. > So to make this a sane setup I would propose to set the 'shared' option > automatically whenever the controller has the 'subsys' option set. > And actually, I would ditch the 'shared' option completely, and make it > dependent on the 'subsys' setting for the controller. > Much like we treat the 'CMIC' setting nowadays. > That avoids lots of confusions, and also make the implementation _way_ > easier. > I see your point. Unfortunately, there is no easy way to ditch shared= now. But I think it'd be good enough to attach to shared automatically, but not to "shared=off". I've already ditched the shared parameter on my RFC refactor series and always having the namespaces shared. > > However... > > > > The spec says that "The attach and detach operations are persistent > > across all reset events.". This means that we should track those events > > in the subsystem and only reattach namespaces that were attached at the > > time of the "reset" event. Currently, we don't have anything mapping > > that state. But the device already has to take some liberties with > > regard to stuff that is considered persistent by the spec (SMART log > > etc.) since we do not have any way to store stuff persistently across > > qemu invocations, so I think the above is an acceptable compromise. > > > Careful. 'attach' and 'detach' are MI (management interface) operations. > If and how many namespaces are visible to any given controllers is > actually independent on that; and, in fact, controllers might not even > implement 'attach' or 'detach'. > But I do agree that we don't handle the 'reset' state properly. > The Namespace Attachment command has nothing to do with MI? And the QEMU controller does support attaching and detaching namespaces. > > A potential (as good as it gets) fix would be to keep a map/list of > > "persistently" attached controllers on the namespaces and re-attach > > according to that when we see that controller joining the subsystem > > again. CNTLID would be the obvious choice for the key here, but problem > > is that we cant really use it since we assign it sequentially from the > > subsystem, which now looks like a pretty bad choice. CNTLID should have > > been a required property of the nvme device when subsystems are > > involved. Maybe we can fix up the CNTLID assignment to take the serial > > into account (we know that is defined and *should* be unique) and not > > reuse CNTLIDs. This limits the subsystem to NVME_MAX_CONTROLLERS unique > > controllers, but I think that is fair enough. > > > > Sigh. Need to think this through. > > > Well, actually there is an easy way out. I do agree that we need to make > the 'cntlid' a property of the controller. And once it's set we can > track it properly, eg by having per-cntlid nsid lists in the subsystem. > But if it's not set we can claim that we'll be allocating a new > controller across reboots (which is actually what we're doing), making > us spec compliant again :-) > That is a decent solution, but we still can't track it across reboots. I think we should just with your patch (but for now, only automatically attach shared namespaces). Would that be acceptable to you? It would require your to add shared=on on your single-namespace test set up since unfortunately, shared=on is not the default. However, we can consider changing that default in 6.2.