All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] hw/nvme: fix controller hotplugging
@ 2021-07-06  9:33 Klaus Jensen
  2021-07-06  9:33 ` [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Klaus Jensen @ 2021-07-06  9:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Keith Busch, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
discussed a bit back and fourth and I mentioned that the core issue was
an artifact of the parent/child relationship stemming from the qdev
setup we have with namespaces attaching to controller through a qdev
bus.

The gist of this series is the fourth patch "hw/nvme: fix controller hot
unplugging" which basically causes namespaces to be reassigned to a bus
owned by the subsystem if the parent controller is linked to one. This
fixes `device_del/add nvme` in such settings.

Note, that in the case that there is no subsystem involved, nvme devices
can be removed from the system with `device_del`, but this *will* cause
the namespaces to be removed as well since there is no place (i.e. no
subsystem) for them to "linger". And since this series does not add
support for hotplugging nvme-ns devices, while an nvme device can be
readded, no namespaces can. Support for hotplugging nvme-ns devices is
present in [1], but I'd rather not add that since I think '-device
nvme-ns' is already a bad design choice.

Now, I do realize that it is not "pretty" to explicitly change the
parent bus, so I do have a an RFC patch in queue that replaces the
subsystem and namespace devices with objects, but keeps -device shims
available for backwards compatibility. This approach will solve the
problems properly and should be a better model. However, I don't believe
it will make it for 6.1 and I'd really like to at least fix the
unplugging for 6.1 and this gets the job done.

  [1]: 20210511073511.32511-1-hare@suse.de

Klaus Jensen (4):
  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  hw/nvme: mark nvme-subsys non-hotpluggable
  hw/nvme: unregister controller with subsystem at exit
  hw/nvme: fix controller hot unplugging

 hw/nvme/nvme.h   | 21 ++++++++-------
 hw/nvme/ctrl.c   | 14 +++++++---
 hw/nvme/ns.c     | 67 ++++++++++++++++++++++++++++++------------------
 hw/nvme/subsys.c | 10 ++++++++
 4 files changed, 74 insertions(+), 38 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  2021-07-06  9:33 [PATCH 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
@ 2021-07-06  9:33 ` Klaus Jensen
  2021-07-07  7:45   ` Hannes Reinecke
  2021-07-06  9:33 ` [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Klaus Jensen @ 2021-07-06  9:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Keith Busch, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

The nvme_ns_setup and nvme_ns_check_constraints should not depend on the
controller state. Refactor and remove it.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h |  2 +-
 hw/nvme/ctrl.c |  2 +-
 hw/nvme/ns.c   | 37 ++++++++++++++++++-------------------
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 56f8eceed2ad..0868359a1e86 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -246,7 +246,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 629b0d38c2a2..dd1801510032 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6498,7 +6498,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         ns = &n->namespace;
         ns->params.nsid = 1;
 
-        if (nvme_ns_setup(n, ns, errp)) {
+        if (nvme_ns_setup(ns, errp)) {
             return;
         }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 4275c3db6301..3c4f5b8c714a 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -346,8 +346,7 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
     assert(ns->nr_open_zones == 0);
 }
 
-static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
-                                     Error **errp)
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
     if (!ns->blkconf.blk) {
         error_setg(errp, "block backend not configured");
@@ -366,20 +365,6 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
         return -1;
     }
 
-    if (!n->subsys) {
-        if (ns->params.detached) {
-            error_setg(errp, "detached requires that the nvme device is "
-                       "linked to an nvme-subsys device");
-            return -1;
-        }
-
-        if (ns->params.shared) {
-            error_setg(errp, "shared requires that the nvme device is "
-                       "linked to an nvme-subsys device");
-            return -1;
-        }
-    }
-
     if (ns->params.zoned) {
         if (ns->params.max_active_zones) {
             if (ns->params.max_open_zones > ns->params.max_active_zones) {
@@ -411,9 +396,9 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
     return 0;
 }
 
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
-    if (nvme_ns_check_constraints(n, ns, errp)) {
+    if (nvme_ns_check_constraints(ns, errp)) {
         return -1;
     }
 
@@ -465,7 +450,21 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
     uint32_t nsid = ns->params.nsid;
     int i;
 
-    if (nvme_ns_setup(n, ns, errp)) {
+    if (!n->subsys) {
+        if (ns->params.detached) {
+            error_setg(errp, "detached requires that the nvme device is "
+                       "linked to an nvme-subsys device");
+            return;
+        }
+
+        if (ns->params.shared) {
+            error_setg(errp, "shared requires that the nvme device is "
+                       "linked to an nvme-subsys device");
+            return;
+        }
+    }
+
+    if (nvme_ns_setup(ns, errp)) {
         return;
     }
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable
  2021-07-06  9:33 [PATCH 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  2021-07-06  9:33 ` [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
@ 2021-07-06  9:33 ` Klaus Jensen
  2021-07-07  7:45   ` Hannes Reinecke
  2021-07-06  9:33 ` [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
  2021-07-06  9:33 ` [PATCH 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
  3 siblings, 1 reply; 16+ messages in thread
From: Klaus Jensen @ 2021-07-06  9:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Keith Busch, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

We currently lack the infrastructure to handle subsystem hotplugging, so
disable it.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/subsys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 192223d17ca1..dc7a96862f37 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -61,6 +61,7 @@ static void nvme_subsys_class_init(ObjectClass *oc, void *data)
 
     dc->realize = nvme_subsys_realize;
     dc->desc = "Virtual NVMe subsystem";
+    dc->hotpluggable = false;
 
     device_class_set_props(dc, nvme_subsystem_props);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit
  2021-07-06  9:33 [PATCH 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  2021-07-06  9:33 ` [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
  2021-07-06  9:33 ` [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
@ 2021-07-06  9:33 ` Klaus Jensen
  2021-07-07  7:46   ` Hannes Reinecke
  2021-07-06  9:33 ` [PATCH 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
  3 siblings, 1 reply; 16+ messages in thread
From: Klaus Jensen @ 2021-07-06  9:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Keith Busch, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Make sure the controller is unregistered from the subsystem when device
is removed.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h   | 1 +
 hw/nvme/ctrl.c   | 4 ++++
 hw/nvme/subsys.c | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 0868359a1e86..c4065467d877 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -50,6 +50,7 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
                                          uint32_t cntlid)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dd1801510032..90e3ee2b70ee 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6523,6 +6523,10 @@ static void nvme_exit(PCIDevice *pci_dev)
         nvme_ns_cleanup(ns);
     }
 
+    if (n->subsys) {
+        nvme_subsys_unregister_ctrl(n->subsys, n);
+    }
+
     g_free(n->cq);
     g_free(n->sq);
     g_free(n->aer_reqs);
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index dc7a96862f37..92caa604a280 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -32,6 +32,11 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
     return cntlid;
 }
 
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n)
+{
+    subsys->ctrls[n->cntlid] = NULL;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
     const char *nqn = subsys->params.nqn ?
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-06  9:33 [PATCH 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
                   ` (2 preceding siblings ...)
  2021-07-06  9:33 ` [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
@ 2021-07-06  9:33 ` Klaus Jensen
  2021-07-07  7:49   ` Hannes Reinecke
  2021-07-07 10:02   ` Klaus Jensen
  3 siblings, 2 replies; 16+ messages in thread
From: Klaus Jensen @ 2021-07-06  9:33 UTC (permalink / raw)
  To: qemu-devel
  Cc: Klaus Jensen, Keith Busch, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Prior to this patch the nvme-ns devices are always children of the
NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
unrealized when the parent device is removed. However, when subsystems
are involved, this is not what we want since the namespaces may be
attached to other controllers as well.

This patch adds an additional NvmeBus on the subsystem device. When
nvme-ns devices are realized, if the parent controller device is linked
to a subsystem, the parent bus is set to the subsystem one instead. This
makes sure that namespaces are kept alive and not unrealized.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h   | 18 ++++++++++--------
 hw/nvme/ctrl.c   |  8 +++++---
 hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
 hw/nvme/subsys.c |  4 ++++
 4 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c4065467d877..9401e212f9f7 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
 
+#define TYPE_NVME_BUS "nvme-bus"
+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
+
+typedef struct NvmeBus {
+    BusState parent_bus;
+    bool     is_subsys;
+} NvmeBus;
+
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
     OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 typedef struct NvmeSubsystem {
     DeviceState parent_obj;
+    NvmeBus     bus;
     uint8_t     subnqn[256];
 
     NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
@@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
     QTAILQ_HEAD(, NvmeRequest) req_list;
 } NvmeCQueue;
 
-#define TYPE_NVME_BUS "nvme-bus"
-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
-
-typedef struct NvmeBus {
-    BusState parent_bus;
-} NvmeBus;
-
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
@@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
 {
-    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
+    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
         return NULL;
     }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 90e3ee2b70ee..7c8fca36d9a5 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
 
     for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
         ns = nvme_ns(n, i);
-        if (!ns) {
-            continue;
+        if (ns) {
+            ns->attached--;
         }
+    }
 
-        nvme_ns_cleanup(ns);
+    if (n->subsys) {
+        nvme_subsys_unregister_ctrl(n->subsys, n);
     }
 
     if (n->subsys) {
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3c4f5b8c714a..612a2786d75d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 static void nvme_ns_realize(DeviceState *dev, Error **errp)
 {
     NvmeNamespace *ns = NVME_NS(dev);
-    BusState *s = qdev_get_parent_bus(dev);
-    NvmeCtrl *n = NVME(s->parent);
-    NvmeSubsystem *subsys = n->subsys;
+    BusState *qbus = qdev_get_parent_bus(dev);
+    NvmeBus *bus = NVME_BUS(qbus);
+    NvmeCtrl *n = NULL;
+    NvmeSubsystem *subsys = NULL;
     uint32_t nsid = ns->params.nsid;
     int i;
 
-    if (!n->subsys) {
+    if (bus->is_subsys) {
+        subsys = NVME_SUBSYS(qbus->parent);
+    } else {
+        n = NVME(qbus->parent);
+        subsys = n->subsys;
+    }
+
+    if (subsys) {
+        /*
+         * If this namespace belongs to a subsystem (through a link on the
+         * controller device), reparent the device.
+         */
+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
+            return;
+        }
+    } else {
         if (ns->params.detached) {
             error_setg(errp, "detached requires that the nvme device is "
                        "linked to an nvme-subsys device");
@@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 
     if (!nsid) {
         for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
+            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
                 continue;
             }
 
@@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
             return;
         }
     } else {
-        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
+        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
             error_setg(errp, "namespace id '%d' already allocated", nsid);
             return;
         }
@@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    nvme_attach_ns(n, ns);
+    if (n) {
+        nvme_attach_ns(n, ns);
+    }
 }
 
 static Property nvme_ns_props[] = {
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 92caa604a280..fb7c3a7c55fc 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
 {
     NvmeSubsystem *subsys = NVME_SUBSYS(dev);
 
+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
+                        dev->id);
+    subsys->bus.is_subsys = true;
+
     nvme_subsys_setup(subsys);
 }
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  2021-07-06  9:33 ` [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
@ 2021-07-07  7:45   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-07  7:45 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/6/21 11:33 AM, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The nvme_ns_setup and nvme_ns_check_constraints should not depend on the
> controller state. Refactor and remove it.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>   hw/nvme/nvme.h |  2 +-
>   hw/nvme/ctrl.c |  2 +-
>   hw/nvme/ns.c   | 37 ++++++++++++++++++-------------------
>   3 files changed, 20 insertions(+), 21 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable
  2021-07-06  9:33 ` [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
@ 2021-07-07  7:45   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-07  7:45 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/6/21 11:33 AM, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> We currently lack the infrastructure to handle subsystem hotplugging, so
> disable it.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>   hw/nvme/subsys.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> index 192223d17ca1..dc7a96862f37 100644
> --- a/hw/nvme/subsys.c
> +++ b/hw/nvme/subsys.c
> @@ -61,6 +61,7 @@ static void nvme_subsys_class_init(ObjectClass *oc, void *data)
>   
>       dc->realize = nvme_subsys_realize;
>       dc->desc = "Virtual NVMe subsystem";
> +    dc->hotpluggable = false;
>   
>       device_class_set_props(dc, nvme_subsystem_props);
>   }
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit
  2021-07-06  9:33 ` [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
@ 2021-07-07  7:46   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-07  7:46 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/6/21 11:33 AM, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Make sure the controller is unregistered from the subsystem when device
> is removed.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>   hw/nvme/nvme.h   | 1 +
>   hw/nvme/ctrl.c   | 4 ++++
>   hw/nvme/subsys.c | 5 +++++
>   3 files changed, 10 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-06  9:33 ` [PATCH 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
@ 2021-07-07  7:49   ` Hannes Reinecke
  2021-07-07  9:53     ` Klaus Jensen
  2021-07-07 10:02   ` Klaus Jensen
  1 sibling, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-07  7:49 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/6/21 11:33 AM, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Prior to this patch the nvme-ns devices are always children of the
> NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
> unrealized when the parent device is removed. However, when subsystems
> are involved, this is not what we want since the namespaces may be
> attached to other controllers as well.
> 
> This patch adds an additional NvmeBus on the subsystem device. When
> nvme-ns devices are realized, if the parent controller device is linked
> to a subsystem, the parent bus is set to the subsystem one instead. This
> makes sure that namespaces are kept alive and not unrealized.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>   hw/nvme/nvme.h   | 18 ++++++++++--------
>   hw/nvme/ctrl.c   |  8 +++++---
>   hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>   hw/nvme/subsys.c |  4 ++++
>   4 files changed, 44 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> index c4065467d877..9401e212f9f7 100644
> --- a/hw/nvme/nvme.h
> +++ b/hw/nvme/nvme.h
> @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
>   typedef struct NvmeCtrl NvmeCtrl;
>   typedef struct NvmeNamespace NvmeNamespace;
>   
> +#define TYPE_NVME_BUS "nvme-bus"
> +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
> +
> +typedef struct NvmeBus {
> +    BusState parent_bus;
> +    bool     is_subsys;
> +} NvmeBus;
> +
>   #define TYPE_NVME_SUBSYS "nvme-subsys"
>   #define NVME_SUBSYS(obj) \
>       OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>   
>   typedef struct NvmeSubsystem {
>       DeviceState parent_obj;
> +    NvmeBus     bus;
>       uint8_t     subnqn[256];
>   
>       NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
> @@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>       QTAILQ_HEAD(, NvmeRequest) req_list;
>   } NvmeCQueue;
>   
> -#define TYPE_NVME_BUS "nvme-bus"
> -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
> -
> -typedef struct NvmeBus {
> -    BusState parent_bus;
> -} NvmeBus;
> -
>   #define TYPE_NVME "nvme"
>   #define NVME(obj) \
>           OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> @@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>   
>   static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>   {
> -    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
> +    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>           return NULL;
>       }
>   
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 90e3ee2b70ee..7c8fca36d9a5 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>   
>       for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>           ns = nvme_ns(n, i);
> -        if (!ns) {
> -            continue;
> +        if (ns) {
> +            ns->attached--;
>           }
> +    }
>   
> -        nvme_ns_cleanup(ns);
> +    if (n->subsys) {
> +        nvme_subsys_unregister_ctrl(n->subsys, n);
>       }
>   
>       if (n->subsys) {
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index 3c4f5b8c714a..612a2786d75d 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>   static void nvme_ns_realize(DeviceState *dev, Error **errp)
>   {
>       NvmeNamespace *ns = NVME_NS(dev);
> -    BusState *s = qdev_get_parent_bus(dev);
> -    NvmeCtrl *n = NVME(s->parent);
> -    NvmeSubsystem *subsys = n->subsys;
> +    BusState *qbus = qdev_get_parent_bus(dev);
> +    NvmeBus *bus = NVME_BUS(qbus);
> +    NvmeCtrl *n = NULL;
> +    NvmeSubsystem *subsys = NULL;
>       uint32_t nsid = ns->params.nsid;
>       int i;
>   
> -    if (!n->subsys) {
> +    if (bus->is_subsys) {
> +        subsys = NVME_SUBSYS(qbus->parent);
> +    } else {
> +        n = NVME(qbus->parent);
> +        subsys = n->subsys;
> +    }
> +
> +    if (subsys) {
> +        /*
> +         * If this namespace belongs to a subsystem (through a link on the
> +         * controller device), reparent the device.
> +         */
> +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
> +            return;
> +        }
> +    } else {
>           if (ns->params.detached) {
>               error_setg(errp, "detached requires that the nvme device is "
>                          "linked to an nvme-subsys device");
> @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>   
>       if (!nsid) {
>           for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> -            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
> +            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>                   continue;
>               }
>   
> @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>               return;
>           }
>       } else {
> -        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
> +        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>               error_setg(errp, "namespace id '%d' already allocated", nsid);
>               return;
>           }
> @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>           }
>       }
>   
> -    nvme_attach_ns(n, ns);
> +    if (n) {
> +        nvme_attach_ns(n, ns);
> +    }
>   }
>   
>   static Property nvme_ns_props[] = {
> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> index 92caa604a280..fb7c3a7c55fc 100644
> --- a/hw/nvme/subsys.c
> +++ b/hw/nvme/subsys.c
> @@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
>   {
>       NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>   
> +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
> +                        dev->id);
> +    subsys->bus.is_subsys = true;
> +
>       nvme_subsys_setup(subsys);
>   }
>   
> 
Why not always create a subsystem, and distinguish between 'shared' and 
'non-shared' subsystems instead of the ->subsys pointer?
That way all namespaces can be children of the subsystem, we won't need 
any reparenting, and the whole thing will be more in-line with qdev, no?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07  7:49   ` Hannes Reinecke
@ 2021-07-07  9:53     ` Klaus Jensen
  2021-07-07 10:43       ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Klaus Jensen @ 2021-07-07  9:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: qemu-block, Klaus Jensen, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Keith Busch, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 7801 bytes --]

On Jul  7 09:49, Hannes Reinecke wrote:
>On 7/6/21 11:33 AM, Klaus Jensen wrote:
>>From: Klaus Jensen <k.jensen@samsung.com>
>>
>>Prior to this patch the nvme-ns devices are always children of the
>>NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>unrealized when the parent device is removed. However, when subsystems
>>are involved, this is not what we want since the namespaces may be
>>attached to other controllers as well.
>>
>>This patch adds an additional NvmeBus on the subsystem device. When
>>nvme-ns devices are realized, if the parent controller device is linked
>>to a subsystem, the parent bus is set to the subsystem one instead. This
>>makes sure that namespaces are kept alive and not unrealized.
>>
>>Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>---
>>  hw/nvme/nvme.h   | 18 ++++++++++--------
>>  hw/nvme/ctrl.c   |  8 +++++---
>>  hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>>  hw/nvme/subsys.c |  4 ++++
>>  4 files changed, 44 insertions(+), 18 deletions(-)
>>
>>diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>index c4065467d877..9401e212f9f7 100644
>>--- a/hw/nvme/nvme.h
>>+++ b/hw/nvme/nvme.h
>>@@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
>>  typedef struct NvmeCtrl NvmeCtrl;
>>  typedef struct NvmeNamespace NvmeNamespace;
>>+#define TYPE_NVME_BUS "nvme-bus"
>>+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>+
>>+typedef struct NvmeBus {
>>+    BusState parent_bus;
>>+    bool     is_subsys;
>>+} NvmeBus;
>>+
>>  #define TYPE_NVME_SUBSYS "nvme-subsys"
>>  #define NVME_SUBSYS(obj) \
>>      OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>  typedef struct NvmeSubsystem {
>>      DeviceState parent_obj;
>>+    NvmeBus     bus;
>>      uint8_t     subnqn[256];
>>      NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>@@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>>      QTAILQ_HEAD(, NvmeRequest) req_list;
>>  } NvmeCQueue;
>>-#define TYPE_NVME_BUS "nvme-bus"
>>-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>-
>>-typedef struct NvmeBus {
>>-    BusState parent_bus;
>>-} NvmeBus;
>>-
>>  #define TYPE_NVME "nvme"
>>  #define NVME(obj) \
>>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>@@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>>  static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>>  {
>>-    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>>+    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>>          return NULL;
>>      }
>>diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>index 90e3ee2b70ee..7c8fca36d9a5 100644
>>--- a/hw/nvme/ctrl.c
>>+++ b/hw/nvme/ctrl.c
>>@@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>>      for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>          ns = nvme_ns(n, i);
>>-        if (!ns) {
>>-            continue;
>>+        if (ns) {
>>+            ns->attached--;
>>          }
>>+    }
>>-        nvme_ns_cleanup(ns);
>>+    if (n->subsys) {
>>+        nvme_subsys_unregister_ctrl(n->subsys, n);
>>      }
>>      if (n->subsys) {
>>diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>index 3c4f5b8c714a..612a2786d75d 100644
>>--- a/hw/nvme/ns.c
>>+++ b/hw/nvme/ns.c
>>@@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>  static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>  {
>>      NvmeNamespace *ns = NVME_NS(dev);
>>-    BusState *s = qdev_get_parent_bus(dev);
>>-    NvmeCtrl *n = NVME(s->parent);
>>-    NvmeSubsystem *subsys = n->subsys;
>>+    BusState *qbus = qdev_get_parent_bus(dev);
>>+    NvmeBus *bus = NVME_BUS(qbus);
>>+    NvmeCtrl *n = NULL;
>>+    NvmeSubsystem *subsys = NULL;
>>      uint32_t nsid = ns->params.nsid;
>>      int i;
>>-    if (!n->subsys) {
>>+    if (bus->is_subsys) {
>>+        subsys = NVME_SUBSYS(qbus->parent);
>>+    } else {
>>+        n = NVME(qbus->parent);
>>+        subsys = n->subsys;
>>+    }
>>+
>>+    if (subsys) {
>>+        /*
>>+         * If this namespace belongs to a subsystem (through a link on the
>>+         * controller device), reparent the device.
>>+         */
>>+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>+            return;
>>+        }
>>+    } else {
>>          if (ns->params.detached) {
>>              error_setg(errp, "detached requires that the nvme device is "
>>                         "linked to an nvme-subsys device");
>>@@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>      if (!nsid) {
>>          for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>-            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>>+            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>>                  continue;
>>              }
>>@@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>              return;
>>          }
>>      } else {
>>-        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>>+        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>>              error_setg(errp, "namespace id '%d' already allocated", nsid);
>>              return;
>>          }
>>@@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>          }
>>      }
>>-    nvme_attach_ns(n, ns);
>>+    if (n) {
>>+        nvme_attach_ns(n, ns);
>>+    }
>>  }
>>  static Property nvme_ns_props[] = {
>>diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>>index 92caa604a280..fb7c3a7c55fc 100644
>>--- a/hw/nvme/subsys.c
>>+++ b/hw/nvme/subsys.c
>>@@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
>>  {
>>      NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>>+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
>>+                        dev->id);
>>+    subsys->bus.is_subsys = true;
>>+
>>      nvme_subsys_setup(subsys);
>>  }
>>
>Why not always create a subsystem, and distinguish between 'shared' 
>and 'non-shared' subsystems instead of the ->subsys pointer?
>That way all namespaces can be children of the subsystem, we won't 
>need any reparenting, and the whole thing will be more in-line with 
>qdev, no?
>

Hi Hannes,

Thanks for your reviews and comments!

So, I have actually considered what you suggest.

The problem is that the nvme-ns device currently expects to plug into a 
bus (an 'nvme-bus') and we cannot really get rid of that 'bus' parameter 
without breaking compatibility. I experimented with removing the bus 
from the nvme device and instead creating it in the nvme-subsys device. 
My idea here was to implicitly always create an nvme-subsys if not 
explicitly created by the user, but since nvme-subsys does not plug into 
any bus itself, it is not exposed in the qtree and thus not reachable 
(i.e., when realizing the nvme-ns device, it will complain that there 
are no 'nvme-bus' available to plug into). To make this work, I believe 
we'd have to have the nvme-subsys plug into the main system bus (or some 
other bus rooted at main-system-bus), and I'd prefer not to do that 
since this is already a flawed design and I think it would be more 
intrusive than what my patch does.

I raised these issues on the mailinglist some time ago[1]. We didn't 
really find a good solution, but I think the conclusion is that the 
bus/device design of subsystems and namespaces is flawed. That's why I 
am working on an "objectification" of the two devices as suggested by 
Stefan[2] as a proper fix for this mess.


   [1]: https://lore.kernel.org/qemu-devel/YJrKRsF4%2F38QheKn@apples.localdomain/T/#u
   [2]: https://lore.kernel.org/qemu-devel/YKI9zh2AqX35S8wd@stefanha-x1.localdomain/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-06  9:33 ` [PATCH 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
  2021-07-07  7:49   ` Hannes Reinecke
@ 2021-07-07 10:02   ` Klaus Jensen
  1 sibling, 0 replies; 16+ messages in thread
From: Klaus Jensen @ 2021-07-07 10:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5960 bytes --]

On Jul  6 11:33, Klaus Jensen wrote:
>From: Klaus Jensen <k.jensen@samsung.com>
>
>Prior to this patch the nvme-ns devices are always children of the
>NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>unrealized when the parent device is removed. However, when subsystems
>are involved, this is not what we want since the namespaces may be
>attached to other controllers as well.
>
>This patch adds an additional NvmeBus on the subsystem device. When
>nvme-ns devices are realized, if the parent controller device is linked
>to a subsystem, the parent bus is set to the subsystem one instead. This
>makes sure that namespaces are kept alive and not unrealized.
>
>Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>---
> hw/nvme/nvme.h   | 18 ++++++++++--------
> hw/nvme/ctrl.c   |  8 +++++---
> hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
> hw/nvme/subsys.c |  4 ++++
> 4 files changed, 44 insertions(+), 18 deletions(-)
>
>diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>index c4065467d877..9401e212f9f7 100644
>--- a/hw/nvme/nvme.h
>+++ b/hw/nvme/nvme.h
>@@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
> typedef struct NvmeCtrl NvmeCtrl;
> typedef struct NvmeNamespace NvmeNamespace;
>
>+#define TYPE_NVME_BUS "nvme-bus"
>+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>+
>+typedef struct NvmeBus {
>+    BusState parent_bus;
>+    bool     is_subsys;
>+} NvmeBus;
>+
> #define TYPE_NVME_SUBSYS "nvme-subsys"
> #define NVME_SUBSYS(obj) \
>     OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>
> typedef struct NvmeSubsystem {
>     DeviceState parent_obj;
>+    NvmeBus     bus;
>     uint8_t     subnqn[256];
>
>     NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>@@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>     QTAILQ_HEAD(, NvmeRequest) req_list;
> } NvmeCQueue;
>
>-#define TYPE_NVME_BUS "nvme-bus"
>-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>-
>-typedef struct NvmeBus {
>-    BusState parent_bus;
>-} NvmeBus;
>-
> #define TYPE_NVME "nvme"
> #define NVME(obj) \
>         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>@@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>
> static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
> {
>-    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>+    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>         return NULL;
>     }
>
>diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>index 90e3ee2b70ee..7c8fca36d9a5 100644
>--- a/hw/nvme/ctrl.c
>+++ b/hw/nvme/ctrl.c
>@@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>
>     for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>         ns = nvme_ns(n, i);
>-        if (!ns) {
>-            continue;
>+        if (ns) {
>+            ns->attached--;
>         }
>+    }
>
>-        nvme_ns_cleanup(ns);
>+    if (n->subsys) {
>+        nvme_subsys_unregister_ctrl(n->subsys, n);
>     }
>
>     if (n->subsys) {
>diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>index 3c4f5b8c714a..612a2786d75d 100644
>--- a/hw/nvme/ns.c
>+++ b/hw/nvme/ns.c
>@@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
> static void nvme_ns_realize(DeviceState *dev, Error **errp)
> {
>     NvmeNamespace *ns = NVME_NS(dev);
>-    BusState *s = qdev_get_parent_bus(dev);
>-    NvmeCtrl *n = NVME(s->parent);
>-    NvmeSubsystem *subsys = n->subsys;
>+    BusState *qbus = qdev_get_parent_bus(dev);
>+    NvmeBus *bus = NVME_BUS(qbus);
>+    NvmeCtrl *n = NULL;
>+    NvmeSubsystem *subsys = NULL;
>     uint32_t nsid = ns->params.nsid;
>     int i;
>
>-    if (!n->subsys) {
>+    if (bus->is_subsys) {
>+        subsys = NVME_SUBSYS(qbus->parent);
>+    } else {
>+        n = NVME(qbus->parent);
>+        subsys = n->subsys;
>+    }

So, I realized that this if is not needed, since the device will always 
attach to the bus from the nvme device, never the 'fake' one from the 
subsystem.

>+
>+    if (subsys) {
>+        /*
>+         * If this namespace belongs to a subsystem (through a link on the
>+         * controller device), reparent the device.
>+         */
>+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>+            return;
>+        }
>+    } else {
>         if (ns->params.detached) {
>             error_setg(errp, "detached requires that the nvme device is "
>                        "linked to an nvme-subsys device");
>@@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>
>     if (!nsid) {
>         for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>-            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>+            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>                 continue;
>             }
>
>@@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>             return;
>         }
>     } else {
>-        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>+        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>             error_setg(errp, "namespace id '%d' already allocated", nsid);
>             return;
>         }
>@@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>         }
>     }
>
>-    nvme_attach_ns(n, ns);
>+    if (n) {
>+        nvme_attach_ns(n, ns);
>+    }
> }
>
> static Property nvme_ns_props[] = {
>diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>index 92caa604a280..fb7c3a7c55fc 100644
>--- a/hw/nvme/subsys.c
>+++ b/hw/nvme/subsys.c
>@@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
> {
>     NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>
>+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
>+                        dev->id);
>+    subsys->bus.is_subsys = true;
>+
>     nvme_subsys_setup(subsys);
> }
>
>-- 
>2.32.0
>
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07  9:53     ` Klaus Jensen
@ 2021-07-07 10:43       ` Hannes Reinecke
  2021-07-07 15:41         ` Klaus Jensen
  2021-07-07 16:14         ` Stefan Hajnoczi
  0 siblings, 2 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-07 10:43 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: qemu-block, Klaus Jensen, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Keith Busch, Philippe Mathieu-Daudé

On 7/7/21 11:53 AM, Klaus Jensen wrote:
> On Jul  7 09:49, Hannes Reinecke wrote:
>> On 7/6/21 11:33 AM, Klaus Jensen wrote:
>>> From: Klaus Jensen <k.jensen@samsung.com>
>>>
>>> Prior to this patch the nvme-ns devices are always children of the
>>> NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>> unrealized when the parent device is removed. However, when subsystems
>>> are involved, this is not what we want since the namespaces may be
>>> attached to other controllers as well.
>>>
>>> This patch adds an additional NvmeBus on the subsystem device. When
>>> nvme-ns devices are realized, if the parent controller device is linked
>>> to a subsystem, the parent bus is set to the subsystem one instead. This
>>> makes sure that namespaces are kept alive and not unrealized.
>>>
>>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>> ---
>>>  hw/nvme/nvme.h   | 18 ++++++++++--------
>>>  hw/nvme/ctrl.c   |  8 +++++---
>>>  hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>>>  hw/nvme/subsys.c |  4 ++++
>>>  4 files changed, 44 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>> index c4065467d877..9401e212f9f7 100644
>>> --- a/hw/nvme/nvme.h
>>> +++ b/hw/nvme/nvme.h
>>> @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > 
>>> NVME_NSID_BROADCAST - 1);
>>>  typedef struct NvmeCtrl NvmeCtrl;
>>>  typedef struct NvmeNamespace NvmeNamespace;
>>> +#define TYPE_NVME_BUS "nvme-bus"
>>> +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>> +
>>> +typedef struct NvmeBus {
>>> +    BusState parent_bus;
>>> +    bool     is_subsys;
>>> +} NvmeBus;
>>> +
>>>  #define TYPE_NVME_SUBSYS "nvme-subsys"
>>>  #define NVME_SUBSYS(obj) \
>>>      OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>>  typedef struct NvmeSubsystem {
>>>      DeviceState parent_obj;
>>> +    NvmeBus     bus;
>>>      uint8_t     subnqn[256];
>>>      NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>> @@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>>>      QTAILQ_HEAD(, NvmeRequest) req_list;
>>>  } NvmeCQueue;
>>> -#define TYPE_NVME_BUS "nvme-bus"
>>> -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>> -
>>> -typedef struct NvmeBus {
>>> -    BusState parent_bus;
>>> -} NvmeBus;
>>> -
>>>  #define TYPE_NVME "nvme"
>>>  #define NVME(obj) \
>>>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>> @@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>>>  static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>>>  {
>>> -    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>>> +    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>>>          return NULL;
>>>      }
>>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>> index 90e3ee2b70ee..7c8fca36d9a5 100644
>>> --- a/hw/nvme/ctrl.c
>>> +++ b/hw/nvme/ctrl.c
>>> @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>>>      for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>          ns = nvme_ns(n, i);
>>> -        if (!ns) {
>>> -            continue;
>>> +        if (ns) {
>>> +            ns->attached--;
>>>          }
>>> +    }
>>> -        nvme_ns_cleanup(ns);
>>> +    if (n->subsys) {
>>> +        nvme_subsys_unregister_ctrl(n->subsys, n);
>>>      }
>>>      if (n->subsys) {
>>> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>> index 3c4f5b8c714a..612a2786d75d 100644
>>> --- a/hw/nvme/ns.c
>>> +++ b/hw/nvme/ns.c
>>> @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>>  static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>>  {
>>>      NvmeNamespace *ns = NVME_NS(dev);
>>> -    BusState *s = qdev_get_parent_bus(dev);
>>> -    NvmeCtrl *n = NVME(s->parent);
>>> -    NvmeSubsystem *subsys = n->subsys;
>>> +    BusState *qbus = qdev_get_parent_bus(dev);
>>> +    NvmeBus *bus = NVME_BUS(qbus);
>>> +    NvmeCtrl *n = NULL;
>>> +    NvmeSubsystem *subsys = NULL;
>>>      uint32_t nsid = ns->params.nsid;
>>>      int i;
>>> -    if (!n->subsys) {
>>> +    if (bus->is_subsys) {
>>> +        subsys = NVME_SUBSYS(qbus->parent);
>>> +    } else {
>>> +        n = NVME(qbus->parent);
>>> +        subsys = n->subsys;
>>> +    }
>>> +
>>> +    if (subsys) {
>>> +        /*
>>> +         * If this namespace belongs to a subsystem (through a link 
>>> on the
>>> +         * controller device), reparent the device.
>>> +         */
>>> +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>> +            return;
>>> +        }
>>> +    } else {
>>>          if (ns->params.detached) {
>>>              error_setg(errp, "detached requires that the nvme device 
>>> is "
>>>                         "linked to an nvme-subsys device");
>>> @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState *dev, 
>>> Error **errp)
>>>      if (!nsid) {
>>>          for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>> -            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>>> +            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>>>                  continue;
>>>              }
>>> @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState *dev, 
>>> Error **errp)
>>>              return;
>>>          }
>>>      } else {
>>> -        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>>> +        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>>>              error_setg(errp, "namespace id '%d' already allocated", 
>>> nsid);
>>>              return;
>>>          }
>>> @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState *dev, 
>>> Error **errp)
>>>          }
>>>      }
>>> -    nvme_attach_ns(n, ns);
>>> +    if (n) {
>>> +        nvme_attach_ns(n, ns);
>>> +    }
>>>  }
>>>  static Property nvme_ns_props[] = {
>>> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>>> index 92caa604a280..fb7c3a7c55fc 100644
>>> --- a/hw/nvme/subsys.c
>>> +++ b/hw/nvme/subsys.c
>>> @@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState *dev, 
>>> Error **errp)
>>>  {
>>>      NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>>> +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), 
>>> TYPE_NVME_BUS, dev,
>>> +                        dev->id);
>>> +    subsys->bus.is_subsys = true;
>>> +
>>>      nvme_subsys_setup(subsys);
>>>  }
>>>
>> Why not always create a subsystem, and distinguish between 'shared' 
>> and 'non-shared' subsystems instead of the ->subsys pointer?
>> That way all namespaces can be children of the subsystem, we won't 
>> need any reparenting, and the whole thing will be more in-line with 
>> qdev, no?
>>
> 
> Hi Hannes,
> 
> Thanks for your reviews and comments!
> 
> So, I have actually considered what you suggest.
> 
> The problem is that the nvme-ns device currently expects to plug into a 
> bus (an 'nvme-bus') and we cannot really get rid of that 'bus' parameter 
> without breaking compatibility. I experimented with removing the bus 
> from the nvme device and instead creating it in the nvme-subsys device. 
> My idea here was to implicitly always create an nvme-subsys if not 
> explicitly created by the user, but since nvme-subsys does not plug into 
> any bus itself, it is not exposed in the qtree and thus not reachable 
> (i.e., when realizing the nvme-ns device, it will complain that there 
> are no 'nvme-bus' available to plug into). To make this work, I believe 
> we'd have to have the nvme-subsys plug into the main system bus (or some 
> other bus rooted at main-system-bus), and I'd prefer not to do that 
> since this is already a flawed design and I think it would be more 
> intrusive than what my patch does.
> 
Sigh.
_Again_.

I seem to trip over issues for which no patch can possibly be accepted 
as first the infrastructure has to be reworked.

... there is a reason why I'm avoiding posting patches to qemu-devel :-(

> I raised these issues on the mailinglist some time ago[1]. We didn't 
> really find a good solution, but I think the conclusion is that the 
> bus/device design of subsystems and namespaces is flawed. That's why I 
> am working on an "objectification" of the two devices as suggested by 
> Stefan[2] as a proper fix for this mess.
> 
Actually, I _do_ think that it would be good to have an nvme-subsystem 
bus, providing a list of all namespaces for that subsystem.
Creating a controller would then mean that one has to create a 
controller and a namespace _separately_, as then the namespace would 
_not_ be a child of the controller.
But that's arguably a good thing, as the namespaces _do_ have a separate 
lifetime from controllers.
And the lifetime of the subsystem could be completely self-contained; 
the controller would be looking up subsystems via the subsystem NQN if 
present; I guess we'll need to create ad-hoc subsystems for PCIe.
But nothing insurmountable.
And certainly nothing which require a large-scale rework of qemu 
internals, I would think.

But _if_ you work on the rework, please ensure that not only NVMe is 
represented here. SCSI multipathing has the same issue currently; there 
is a strict host->lun->block->OS device relationship currently in SCSI, 
making it impossible to represent a multipathed SCSI device in qemu.
The only sneaky way is to allow for a shared OS device (ie specifying 
the filename more than once), but that completely sidesteps qemu 
internals, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 10:43       ` Hannes Reinecke
@ 2021-07-07 15:41         ` Klaus Jensen
  2021-07-07 16:14         ` Stefan Hajnoczi
  1 sibling, 0 replies; 16+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:41 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: qemu-block, Klaus Jensen, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Keith Busch, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 10702 bytes --]

On Jul  7 12:43, Hannes Reinecke wrote:
>On 7/7/21 11:53 AM, Klaus Jensen wrote:
>>On Jul  7 09:49, Hannes Reinecke wrote:
>>>On 7/6/21 11:33 AM, Klaus Jensen wrote:
>>>>From: Klaus Jensen <k.jensen@samsung.com>
>>>>
>>>>Prior to this patch the nvme-ns devices are always children of the
>>>>NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>>>unrealized when the parent device is removed. However, when subsystems
>>>>are involved, this is not what we want since the namespaces may be
>>>>attached to other controllers as well.
>>>>
>>>>This patch adds an additional NvmeBus on the subsystem device. When
>>>>nvme-ns devices are realized, if the parent controller device is linked
>>>>to a subsystem, the parent bus is set to the subsystem one instead. This
>>>>makes sure that namespaces are kept alive and not unrealized.
>>>>
>>>>Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>>>---
>>>> hw/nvme/nvme.h   | 18 ++++++++++--------
>>>> hw/nvme/ctrl.c   |  8 +++++---
>>>> hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>>>> hw/nvme/subsys.c |  4 ++++
>>>> 4 files changed, 44 insertions(+), 18 deletions(-)
>>>>
>>>>diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>>>index c4065467d877..9401e212f9f7 100644
>>>>--- a/hw/nvme/nvme.h
>>>>+++ b/hw/nvme/nvme.h
>>>>@@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > 
>>>>NVME_NSID_BROADCAST - 1);
>>>> typedef struct NvmeCtrl NvmeCtrl;
>>>> typedef struct NvmeNamespace NvmeNamespace;
>>>>+#define TYPE_NVME_BUS "nvme-bus"
>>>>+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>>>+
>>>>+typedef struct NvmeBus {
>>>>+    BusState parent_bus;
>>>>+    bool     is_subsys;
>>>>+} NvmeBus;
>>>>+
>>>> #define TYPE_NVME_SUBSYS "nvme-subsys"
>>>> #define NVME_SUBSYS(obj) \
>>>>     OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>>> typedef struct NvmeSubsystem {
>>>>     DeviceState parent_obj;
>>>>+    NvmeBus     bus;
>>>>     uint8_t     subnqn[256];
>>>>     NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>>>@@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>>>>     QTAILQ_HEAD(, NvmeRequest) req_list;
>>>> } NvmeCQueue;
>>>>-#define TYPE_NVME_BUS "nvme-bus"
>>>>-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>>>-
>>>>-typedef struct NvmeBus {
>>>>-    BusState parent_bus;
>>>>-} NvmeBus;
>>>>-
>>>> #define TYPE_NVME "nvme"
>>>> #define NVME(obj) \
>>>>         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>>>@@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>>>> static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>>>> {
>>>>-    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>>>>+    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>>>>         return NULL;
>>>>     }
>>>>diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>>>index 90e3ee2b70ee..7c8fca36d9a5 100644
>>>>--- a/hw/nvme/ctrl.c
>>>>+++ b/hw/nvme/ctrl.c
>>>>@@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>>>>     for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>>         ns = nvme_ns(n, i);
>>>>-        if (!ns) {
>>>>-            continue;
>>>>+        if (ns) {
>>>>+            ns->attached--;
>>>>         }
>>>>+    }
>>>>-        nvme_ns_cleanup(ns);
>>>>+    if (n->subsys) {
>>>>+        nvme_subsys_unregister_ctrl(n->subsys, n);
>>>>     }
>>>>     if (n->subsys) {
>>>>diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>>>index 3c4f5b8c714a..612a2786d75d 100644
>>>>--- a/hw/nvme/ns.c
>>>>+++ b/hw/nvme/ns.c
>>>>@@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>>> static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>>> {
>>>>     NvmeNamespace *ns = NVME_NS(dev);
>>>>-    BusState *s = qdev_get_parent_bus(dev);
>>>>-    NvmeCtrl *n = NVME(s->parent);
>>>>-    NvmeSubsystem *subsys = n->subsys;
>>>>+    BusState *qbus = qdev_get_parent_bus(dev);
>>>>+    NvmeBus *bus = NVME_BUS(qbus);
>>>>+    NvmeCtrl *n = NULL;
>>>>+    NvmeSubsystem *subsys = NULL;
>>>>     uint32_t nsid = ns->params.nsid;
>>>>     int i;
>>>>-    if (!n->subsys) {
>>>>+    if (bus->is_subsys) {
>>>>+        subsys = NVME_SUBSYS(qbus->parent);
>>>>+    } else {
>>>>+        n = NVME(qbus->parent);
>>>>+        subsys = n->subsys;
>>>>+    }
>>>>+
>>>>+    if (subsys) {
>>>>+        /*
>>>>+         * If this namespace belongs to a subsystem (through a 
>>>>link on the
>>>>+         * controller device), reparent the device.
>>>>+         */
>>>>+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>>>+            return;
>>>>+        }
>>>>+    } else {
>>>>         if (ns->params.detached) {
>>>>             error_setg(errp, "detached requires that the nvme 
>>>>device is "
>>>>                        "linked to an nvme-subsys device");
>>>>@@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState 
>>>>*dev, Error **errp)
>>>>     if (!nsid) {
>>>>         for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>>-            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>>>>+            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>>>>                 continue;
>>>>             }
>>>>@@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState 
>>>>*dev, Error **errp)
>>>>             return;
>>>>         }
>>>>     } else {
>>>>-        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>>>>+        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>>>>             error_setg(errp, "namespace id '%d' already 
>>>>allocated", nsid);
>>>>             return;
>>>>         }
>>>>@@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState 
>>>>*dev, Error **errp)
>>>>         }
>>>>     }
>>>>-    nvme_attach_ns(n, ns);
>>>>+    if (n) {
>>>>+        nvme_attach_ns(n, ns);
>>>>+    }
>>>> }
>>>> static Property nvme_ns_props[] = {
>>>>diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>>>>index 92caa604a280..fb7c3a7c55fc 100644
>>>>--- a/hw/nvme/subsys.c
>>>>+++ b/hw/nvme/subsys.c
>>>>@@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState 
>>>>*dev, Error **errp)
>>>> {
>>>>     NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>>>>+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), 
>>>>TYPE_NVME_BUS, dev,
>>>>+                        dev->id);
>>>>+    subsys->bus.is_subsys = true;
>>>>+
>>>>     nvme_subsys_setup(subsys);
>>>> }
>>>>
>>>Why not always create a subsystem, and distinguish between 
>>>'shared' and 'non-shared' subsystems instead of the ->subsys 
>>>pointer?
>>>That way all namespaces can be children of the subsystem, we won't 
>>>need any reparenting, and the whole thing will be more in-line 
>>>with qdev, no?
>>>
>>
>>Hi Hannes,
>>
>>Thanks for your reviews and comments!
>>
>>So, I have actually considered what you suggest.
>>
>>The problem is that the nvme-ns device currently expects to plug 
>>into a bus (an 'nvme-bus') and we cannot really get rid of that 
>>'bus' parameter without breaking compatibility. I experimented with 
>>removing the bus from the nvme device and instead creating it in the 
>>nvme-subsys device. My idea here was to implicitly always create an 
>>nvme-subsys if not explicitly created by the user, but since 
>>nvme-subsys does not plug into any bus itself, it is not exposed in 
>>the qtree and thus not reachable (i.e., when realizing the nvme-ns 
>>device, it will complain that there are no 'nvme-bus' available to 
>>plug into). To make this work, I believe we'd have to have the 
>>nvme-subsys plug into the main system bus (or some other bus rooted 
>>at main-system-bus), and I'd prefer not to do that since this is 
>>already a flawed design and I think it would be more intrusive than 
>>what my patch does.
>>
>Sigh.
>_Again_.
>
>I seem to trip over issues for which no patch can possibly be accepted 
>as first the infrastructure has to be reworked.
>
>... there is a reason why I'm avoiding posting patches to qemu-devel :-(
>

Actually, I'm the one to blame for making nvme-ns a -device and partly 
for why nvme-subsys is one as well... It's made sense at the time and 
thats why the infrastructure is biting me in the behind now ;)

>>I raised these issues on the mailinglist some time ago[1]. We didn't 
>>really find a good solution, but I think the conclusion is that the 
>>bus/device design of subsystems and namespaces is flawed. That's why 
>>I am working on an "objectification" of the two devices as suggested 
>>by Stefan[2] as a proper fix for this mess.
>>
>Actually, I _do_ think that it would be good to have an nvme-subsystem 
>bus, providinga list of all namespaces for that subsystem.

My goal was to *not* use a qbus I don't think the subsystem should be 
modelled as a device. The same can be accomplished with -objects and 
links.

>Creating a controller would then mean that one has to create a 
>controller and a namespace _separately_, as then the namespace would 
>_not_ be a child of the controller.
>But that's arguably a good thing, as the namespaces _do_ have a 
>separate lifetime from controllers.
>And the lifetime of the subsystem could be completely self-contained; 
>the controller would be looking up subsystems via the subsystem NQN if 
>present; I guess we'll need to create ad-hoc subsystems for PCIe.
>But nothing insurmountable.
>And certainly nothing which require a large-scale rework of qemu 
>internals, I would think.
>

Yes, exactly, totally agree.

>But _if_ you work on the rework, please ensure that not only NVMe is 
>represented here. SCSI multipathing has the same issue currently; 
>there is a strict host->lun->block->OS device relationship currently 
>in SCSI, making it impossible to represent a multipathed SCSI device 
>in qemu.
>The only sneaky way is to allow for a shared OS device (ie specifying 
>the filename more than once), but that completely sidesteps qemu 
>internals, too.
>

I'm only very superficially familiar with hw/scsi, but this sounds more 
like the scsi subsystem also has some similar design problems like 
hw/nvme, not an inherent issue with QDEV/QOM.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 10:43       ` Hannes Reinecke
  2021-07-07 15:41         ` Klaus Jensen
@ 2021-07-07 16:14         ` Stefan Hajnoczi
  2021-07-07 16:47           ` Klaus Jensen
  2021-07-08  6:15           ` Hannes Reinecke
  1 sibling, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2021-07-07 16:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: qemu-block, Klaus Jensen, Markus Armbruster, qemu-devel,
	Keith Busch, Klaus Jensen, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 10737 bytes --]

On Wed, Jul 07, 2021 at 12:43:56PM +0200, Hannes Reinecke wrote:
> On 7/7/21 11:53 AM, Klaus Jensen wrote:
> > On Jul  7 09:49, Hannes Reinecke wrote:
> > > On 7/6/21 11:33 AM, Klaus Jensen wrote:
> > > > From: Klaus Jensen <k.jensen@samsung.com>
> > > > 
> > > > Prior to this patch the nvme-ns devices are always children of the
> > > > NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
> > > > unrealized when the parent device is removed. However, when subsystems
> > > > are involved, this is not what we want since the namespaces may be
> > > > attached to other controllers as well.
> > > > 
> > > > This patch adds an additional NvmeBus on the subsystem device. When
> > > > nvme-ns devices are realized, if the parent controller device is linked
> > > > to a subsystem, the parent bus is set to the subsystem one instead. This
> > > > makes sure that namespaces are kept alive and not unrealized.
> > > > 
> > > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > > ---
> > > >  hw/nvme/nvme.h   | 18 ++++++++++--------
> > > >  hw/nvme/ctrl.c   |  8 +++++---
> > > >  hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
> > > >  hw/nvme/subsys.c |  4 ++++
> > > >  4 files changed, 44 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> > > > index c4065467d877..9401e212f9f7 100644
> > > > --- a/hw/nvme/nvme.h
> > > > +++ b/hw/nvme/nvme.h
> > > > @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES >
> > > > NVME_NSID_BROADCAST - 1);
> > > >  typedef struct NvmeCtrl NvmeCtrl;
> > > >  typedef struct NvmeNamespace NvmeNamespace;
> > > > +#define TYPE_NVME_BUS "nvme-bus"
> > > > +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
> > > > +
> > > > +typedef struct NvmeBus {
> > > > +    BusState parent_bus;
> > > > +    bool     is_subsys;
> > > > +} NvmeBus;
> > > > +
> > > >  #define TYPE_NVME_SUBSYS "nvme-subsys"
> > > >  #define NVME_SUBSYS(obj) \
> > > >      OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
> > > >  typedef struct NvmeSubsystem {
> > > >      DeviceState parent_obj;
> > > > +    NvmeBus     bus;
> > > >      uint8_t     subnqn[256];
> > > >      NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
> > > > @@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
> > > >      QTAILQ_HEAD(, NvmeRequest) req_list;
> > > >  } NvmeCQueue;
> > > > -#define TYPE_NVME_BUS "nvme-bus"
> > > > -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
> > > > -
> > > > -typedef struct NvmeBus {
> > > > -    BusState parent_bus;
> > > > -} NvmeBus;
> > > > -
> > > >  #define TYPE_NVME "nvme"
> > > >  #define NVME(obj) \
> > > >          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> > > > @@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
> > > >  static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
> > > >  {
> > > > -    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
> > > > +    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
> > > >          return NULL;
> > > >      }
> > > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > > index 90e3ee2b70ee..7c8fca36d9a5 100644
> > > > --- a/hw/nvme/ctrl.c
> > > > +++ b/hw/nvme/ctrl.c
> > > > @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
> > > >      for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > > >          ns = nvme_ns(n, i);
> > > > -        if (!ns) {
> > > > -            continue;
> > > > +        if (ns) {
> > > > +            ns->attached--;
> > > >          }
> > > > +    }
> > > > -        nvme_ns_cleanup(ns);
> > > > +    if (n->subsys) {
> > > > +        nvme_subsys_unregister_ctrl(n->subsys, n);
> > > >      }
> > > >      if (n->subsys) {
> > > > diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> > > > index 3c4f5b8c714a..612a2786d75d 100644
> > > > --- a/hw/nvme/ns.c
> > > > +++ b/hw/nvme/ns.c
> > > > @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
> > > >  static void nvme_ns_realize(DeviceState *dev, Error **errp)
> > > >  {
> > > >      NvmeNamespace *ns = NVME_NS(dev);
> > > > -    BusState *s = qdev_get_parent_bus(dev);
> > > > -    NvmeCtrl *n = NVME(s->parent);
> > > > -    NvmeSubsystem *subsys = n->subsys;
> > > > +    BusState *qbus = qdev_get_parent_bus(dev);
> > > > +    NvmeBus *bus = NVME_BUS(qbus);
> > > > +    NvmeCtrl *n = NULL;
> > > > +    NvmeSubsystem *subsys = NULL;
> > > >      uint32_t nsid = ns->params.nsid;
> > > >      int i;
> > > > -    if (!n->subsys) {
> > > > +    if (bus->is_subsys) {
> > > > +        subsys = NVME_SUBSYS(qbus->parent);
> > > > +    } else {
> > > > +        n = NVME(qbus->parent);
> > > > +        subsys = n->subsys;
> > > > +    }
> > > > +
> > > > +    if (subsys) {
> > > > +        /*
> > > > +         * If this namespace belongs to a subsystem (through a
> > > > link on the
> > > > +         * controller device), reparent the device.
> > > > +         */
> > > > +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
> > > > +            return;
> > > > +        }
> > > > +    } else {
> > > >          if (ns->params.detached) {
> > > >              error_setg(errp, "detached requires that the nvme
> > > > device is "
> > > >                         "linked to an nvme-subsys device");
> > > > @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState
> > > > *dev, Error **errp)
> > > >      if (!nsid) {
> > > >          for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> > > > -            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
> > > > +            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
> > > >                  continue;
> > > >              }
> > > > @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState
> > > > *dev, Error **errp)
> > > >              return;
> > > >          }
> > > >      } else {
> > > > -        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
> > > > +        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
> > > >              error_setg(errp, "namespace id '%d' already
> > > > allocated", nsid);
> > > >              return;
> > > >          }
> > > > @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState
> > > > *dev, Error **errp)
> > > >          }
> > > >      }
> > > > -    nvme_attach_ns(n, ns);
> > > > +    if (n) {
> > > > +        nvme_attach_ns(n, ns);
> > > > +    }
> > > >  }
> > > >  static Property nvme_ns_props[] = {
> > > > diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> > > > index 92caa604a280..fb7c3a7c55fc 100644
> > > > --- a/hw/nvme/subsys.c
> > > > +++ b/hw/nvme/subsys.c
> > > > @@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState
> > > > *dev, Error **errp)
> > > >  {
> > > >      NvmeSubsystem *subsys = NVME_SUBSYS(dev);
> > > > +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus),
> > > > TYPE_NVME_BUS, dev,
> > > > +                        dev->id);
> > > > +    subsys->bus.is_subsys = true;
> > > > +
> > > >      nvme_subsys_setup(subsys);
> > > >  }
> > > > 
> > > Why not always create a subsystem, and distinguish between 'shared'
> > > and 'non-shared' subsystems instead of the ->subsys pointer?
> > > That way all namespaces can be children of the subsystem, we won't
> > > need any reparenting, and the whole thing will be more in-line with
> > > qdev, no?
> > > 
> > 
> > Hi Hannes,
> > 
> > Thanks for your reviews and comments!
> > 
> > So, I have actually considered what you suggest.
> > 
> > The problem is that the nvme-ns device currently expects to plug into a
> > bus (an 'nvme-bus') and we cannot really get rid of that 'bus' parameter
> > without breaking compatibility. I experimented with removing the bus
> > from the nvme device and instead creating it in the nvme-subsys device.
> > My idea here was to implicitly always create an nvme-subsys if not
> > explicitly created by the user, but since nvme-subsys does not plug into
> > any bus itself, it is not exposed in the qtree and thus not reachable
> > (i.e., when realizing the nvme-ns device, it will complain that there
> > are no 'nvme-bus' available to plug into). To make this work, I believe
> > we'd have to have the nvme-subsys plug into the main system bus (or some
> > other bus rooted at main-system-bus), and I'd prefer not to do that
> > since this is already a flawed design and I think it would be more
> > intrusive than what my patch does.
> > 
> Sigh.
> _Again_.
> 
> I seem to trip over issues for which no patch can possibly be accepted as
> first the infrastructure has to be reworked.
> 
> ... there is a reason why I'm avoiding posting patches to qemu-devel :-(
> 
> > I raised these issues on the mailinglist some time ago[1]. We didn't
> > really find a good solution, but I think the conclusion is that the
> > bus/device design of subsystems and namespaces is flawed. That's why I
> > am working on an "objectification" of the two devices as suggested by
> > Stefan[2] as a proper fix for this mess.
> > 
> Actually, I _do_ think that it would be good to have an nvme-subsystem bus,
> providing a list of all namespaces for that subsystem.
> Creating a controller would then mean that one has to create a controller
> and a namespace _separately_, as then the namespace would _not_ be a child
> of the controller.
> But that's arguably a good thing, as the namespaces _do_ have a separate
> lifetime from controllers.
> And the lifetime of the subsystem could be completely self-contained; the
> controller would be looking up subsystems via the subsystem NQN if present;
> I guess we'll need to create ad-hoc subsystems for PCIe.
> But nothing insurmountable.
> And certainly nothing which require a large-scale rework of qemu internals,
> I would think.

I don't remember the reason to use --object instead of --device. Maybe
the --device approach is viable and it makes some sense because this is
a "device" (like a SCSI LUN) that exposed to the guest. It would be
necessary to have a nvme-subsys device on the system bus to act as the
bus for nvme-ns devices as Klaus described.

> But _if_ you work on the rework, please ensure that not only NVMe is
> represented here. SCSI multipathing has the same issue currently; there is a
> strict host->lun->block->OS device relationship currently in SCSI, making it
> impossible to represent a multipathed SCSI device in qemu.
> The only sneaky way is to allow for a shared OS device (ie specifying the
> filename more than once), but that completely sidesteps qemu internals, too.

Do you want one SCSIDevice to be accessible via multiple SCSIBus
instances for multipathing in QEMU?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 16:14         ` Stefan Hajnoczi
@ 2021-07-07 16:47           ` Klaus Jensen
  2021-07-08  6:15           ` Hannes Reinecke
  1 sibling, 0 replies; 16+ messages in thread
From: Klaus Jensen @ 2021-07-07 16:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, Klaus Jensen, Markus Armbruster, qemu-devel,
	Hannes Reinecke, Keith Busch, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 11858 bytes --]

On Jul  7 17:14, Stefan Hajnoczi wrote:
>On Wed, Jul 07, 2021 at 12:43:56PM +0200, Hannes Reinecke wrote:
>> On 7/7/21 11:53 AM, Klaus Jensen wrote:
>> > On Jul  7 09:49, Hannes Reinecke wrote:
>> > > On 7/6/21 11:33 AM, Klaus Jensen wrote:
>> > > > From: Klaus Jensen <k.jensen@samsung.com>
>> > > >
>> > > > Prior to this patch the nvme-ns devices are always children of the
>> > > > NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>> > > > unrealized when the parent device is removed. However, when subsystems
>> > > > are involved, this is not what we want since the namespaces may be
>> > > > attached to other controllers as well.
>> > > >
>> > > > This patch adds an additional NvmeBus on the subsystem device. When
>> > > > nvme-ns devices are realized, if the parent controller device is linked
>> > > > to a subsystem, the parent bus is set to the subsystem one instead. This
>> > > > makes sure that namespaces are kept alive and not unrealized.
>> > > >
>> > > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>> > > > ---
>> > > >  hw/nvme/nvme.h   | 18 ++++++++++--------
>> > > >  hw/nvme/ctrl.c   |  8 +++++---
>> > > >  hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>> > > >  hw/nvme/subsys.c |  4 ++++
>> > > >  4 files changed, 44 insertions(+), 18 deletions(-)
>> > > >
>> > > > diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>> > > > index c4065467d877..9401e212f9f7 100644
>> > > > --- a/hw/nvme/nvme.h
>> > > > +++ b/hw/nvme/nvme.h
>> > > > @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES >
>> > > > NVME_NSID_BROADCAST - 1);
>> > > >  typedef struct NvmeCtrl NvmeCtrl;
>> > > >  typedef struct NvmeNamespace NvmeNamespace;
>> > > > +#define TYPE_NVME_BUS "nvme-bus"
>> > > > +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>> > > > +
>> > > > +typedef struct NvmeBus {
>> > > > +    BusState parent_bus;
>> > > > +    bool     is_subsys;
>> > > > +} NvmeBus;
>> > > > +
>> > > >  #define TYPE_NVME_SUBSYS "nvme-subsys"
>> > > >  #define NVME_SUBSYS(obj) \
>> > > >      OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>> > > >  typedef struct NvmeSubsystem {
>> > > >      DeviceState parent_obj;
>> > > > +    NvmeBus     bus;
>> > > >      uint8_t     subnqn[256];
>> > > >      NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>> > > > @@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>> > > >      QTAILQ_HEAD(, NvmeRequest) req_list;
>> > > >  } NvmeCQueue;
>> > > > -#define TYPE_NVME_BUS "nvme-bus"
>> > > > -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>> > > > -
>> > > > -typedef struct NvmeBus {
>> > > > -    BusState parent_bus;
>> > > > -} NvmeBus;
>> > > > -
>> > > >  #define TYPE_NVME "nvme"
>> > > >  #define NVME(obj) \
>> > > >          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>> > > > @@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>> > > >  static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>> > > >  {
>> > > > -    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>> > > > +    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>> > > >          return NULL;
>> > > >      }
>> > > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>> > > > index 90e3ee2b70ee..7c8fca36d9a5 100644
>> > > > --- a/hw/nvme/ctrl.c
>> > > > +++ b/hw/nvme/ctrl.c
>> > > > @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>> > > >      for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>> > > >          ns = nvme_ns(n, i);
>> > > > -        if (!ns) {
>> > > > -            continue;
>> > > > +        if (ns) {
>> > > > +            ns->attached--;
>> > > >          }
>> > > > +    }
>> > > > -        nvme_ns_cleanup(ns);
>> > > > +    if (n->subsys) {
>> > > > +        nvme_subsys_unregister_ctrl(n->subsys, n);
>> > > >      }
>> > > >      if (n->subsys) {
>> > > > diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>> > > > index 3c4f5b8c714a..612a2786d75d 100644
>> > > > --- a/hw/nvme/ns.c
>> > > > +++ b/hw/nvme/ns.c
>> > > > @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>> > > >  static void nvme_ns_realize(DeviceState *dev, Error **errp)
>> > > >  {
>> > > >      NvmeNamespace *ns = NVME_NS(dev);
>> > > > -    BusState *s = qdev_get_parent_bus(dev);
>> > > > -    NvmeCtrl *n = NVME(s->parent);
>> > > > -    NvmeSubsystem *subsys = n->subsys;
>> > > > +    BusState *qbus = qdev_get_parent_bus(dev);
>> > > > +    NvmeBus *bus = NVME_BUS(qbus);
>> > > > +    NvmeCtrl *n = NULL;
>> > > > +    NvmeSubsystem *subsys = NULL;
>> > > >      uint32_t nsid = ns->params.nsid;
>> > > >      int i;
>> > > > -    if (!n->subsys) {
>> > > > +    if (bus->is_subsys) {
>> > > > +        subsys = NVME_SUBSYS(qbus->parent);
>> > > > +    } else {
>> > > > +        n = NVME(qbus->parent);
>> > > > +        subsys = n->subsys;
>> > > > +    }
>> > > > +
>> > > > +    if (subsys) {
>> > > > +        /*
>> > > > +         * If this namespace belongs to a subsystem (through a
>> > > > link on the
>> > > > +         * controller device), reparent the device.
>> > > > +         */
>> > > > +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>> > > > +            return;
>> > > > +        }
>> > > > +    } else {
>> > > >          if (ns->params.detached) {
>> > > >              error_setg(errp, "detached requires that the nvme
>> > > > device is "
>> > > >                         "linked to an nvme-subsys device");
>> > > > @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState
>> > > > *dev, Error **errp)
>> > > >      if (!nsid) {
>> > > >          for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>> > > > -            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>> > > > +            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>> > > >                  continue;
>> > > >              }
>> > > > @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState
>> > > > *dev, Error **errp)
>> > > >              return;
>> > > >          }
>> > > >      } else {
>> > > > -        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>> > > > +        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>> > > >              error_setg(errp, "namespace id '%d' already
>> > > > allocated", nsid);
>> > > >              return;
>> > > >          }
>> > > > @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState
>> > > > *dev, Error **errp)
>> > > >          }
>> > > >      }
>> > > > -    nvme_attach_ns(n, ns);
>> > > > +    if (n) {
>> > > > +        nvme_attach_ns(n, ns);
>> > > > +    }
>> > > >  }
>> > > >  static Property nvme_ns_props[] = {
>> > > > diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>> > > > index 92caa604a280..fb7c3a7c55fc 100644
>> > > > --- a/hw/nvme/subsys.c
>> > > > +++ b/hw/nvme/subsys.c
>> > > > @@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState
>> > > > *dev, Error **errp)
>> > > >  {
>> > > >      NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>> > > > +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus),
>> > > > TYPE_NVME_BUS, dev,
>> > > > +                        dev->id);
>> > > > +    subsys->bus.is_subsys = true;
>> > > > +
>> > > >      nvme_subsys_setup(subsys);
>> > > >  }
>> > > >
>> > > Why not always create a subsystem, and distinguish between 'shared'
>> > > and 'non-shared' subsystems instead of the ->subsys pointer?
>> > > That way all namespaces can be children of the subsystem, we won't
>> > > need any reparenting, and the whole thing will be more in-line with
>> > > qdev, no?
>> > >
>> >
>> > Hi Hannes,
>> >
>> > Thanks for your reviews and comments!
>> >
>> > So, I have actually considered what you suggest.
>> >
>> > The problem is that the nvme-ns device currently expects to plug into a
>> > bus (an 'nvme-bus') and we cannot really get rid of that 'bus' parameter
>> > without breaking compatibility. I experimented with removing the bus
>> > from the nvme device and instead creating it in the nvme-subsys device.
>> > My idea here was to implicitly always create an nvme-subsys if not
>> > explicitly created by the user, but since nvme-subsys does not plug into
>> > any bus itself, it is not exposed in the qtree and thus not reachable
>> > (i.e., when realizing the nvme-ns device, it will complain that there
>> > are no 'nvme-bus' available to plug into). To make this work, I believe
>> > we'd have to have the nvme-subsys plug into the main system bus (or some
>> > other bus rooted at main-system-bus), and I'd prefer not to do that
>> > since this is already a flawed design and I think it would be more
>> > intrusive than what my patch does.
>> >
>> Sigh.
>> _Again_.
>>
>> I seem to trip over issues for which no patch can possibly be accepted as
>> first the infrastructure has to be reworked.
>>
>> ... there is a reason why I'm avoiding posting patches to qemu-devel :-(
>>
>> > I raised these issues on the mailinglist some time ago[1]. We didn't
>> > really find a good solution, but I think the conclusion is that the
>> > bus/device design of subsystems and namespaces is flawed. That's why I
>> > am working on an "objectification" of the two devices as suggested by
>> > Stefan[2] as a proper fix for this mess.
>> >
>> Actually, I _do_ think that it would be good to have an nvme-subsystem bus,
>> providing a list of all namespaces for that subsystem.
>> Creating a controller would then mean that one has to create a controller
>> and a namespace _separately_, as then the namespace would _not_ be a child
>> of the controller.
>> But that's arguably a good thing, as the namespaces _do_ have a separate
>> lifetime from controllers.
>> And the lifetime of the subsystem could be completely self-contained; the
>> controller would be looking up subsystems via the subsystem NQN if present;
>> I guess we'll need to create ad-hoc subsystems for PCIe.
>> But nothing insurmountable.
>> And certainly nothing which require a large-scale rework of qemu internals,
>> I would think.
>
>I don't remember the reason to use --object instead of --device. Maybe
>the --device approach is viable and it makes some sense because this is
>a "device" (like a SCSI LUN) that exposed to the guest. It would be
>necessary to have a nvme-subsys device on the system bus to act as the
>bus for nvme-ns devices as Klaus described.
>

A namespace (-device nvme-ns) does not expose anything on its own to the 
guest. Whatever the namespace provides (basically the association to a 
-drive) is accessed through the pcie controller (-device nvme). 
Similarly, the subsystem is just a concept for allowing multiple 
controllers to access the same namespace. And this is where the model 
clash because the parent/child relationship doesnt fit. Reversing the 
relationship (controller is a child of a namespace) also doesnt work 
since a contrller may have multiple namespace attached. A many-to-many 
relationship is required which cannot be expresseed with QDEV as far as 
I understand.

Talk is cheap, but I have a WIP that shims the -device's and moves the 
core functionality of namespaces and subsystems to -objects. This makes 
the setup much more flexible and fits QDEV/QOM much better.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 16:14         ` Stefan Hajnoczi
  2021-07-07 16:47           ` Klaus Jensen
@ 2021-07-08  6:15           ` Hannes Reinecke
  1 sibling, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2021-07-08  6:15 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, Klaus Jensen, Markus Armbruster, qemu-devel,
	Keith Busch, Klaus Jensen, Philippe Mathieu-Daudé

On 7/7/21 6:14 PM, Stefan Hajnoczi wrote:
> On Wed, Jul 07, 2021 at 12:43:56PM +0200, Hannes Reinecke wrote:
>> On 7/7/21 11:53 AM, Klaus Jensen wrote:
>>> On Jul  7 09:49, Hannes Reinecke wrote:
>>>> On 7/6/21 11:33 AM, Klaus Jensen wrote:
>>>>> From: Klaus Jensen <k.jensen@samsung.com>
>>>>>
>>>>> Prior to this patch the nvme-ns devices are always children of the
>>>>> NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>>>> unrealized when the parent device is removed. However, when subsystems
>>>>> are involved, this is not what we want since the namespaces may be
>>>>> attached to other controllers as well.
>>>>>
>>>>> This patch adds an additional NvmeBus on the subsystem device. When
>>>>> nvme-ns devices are realized, if the parent controller device is linked
>>>>> to a subsystem, the parent bus is set to the subsystem one instead. This
>>>>> makes sure that namespaces are kept alive and not unrealized.
>>>>>
>>>>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>>>> ---
>>>>>   hw/nvme/nvme.h   | 18 ++++++++++--------
>>>>>   hw/nvme/ctrl.c   |  8 +++++---
>>>>>   hw/nvme/ns.c     | 32 +++++++++++++++++++++++++-------
>>>>>   hw/nvme/subsys.c |  4 ++++
>>>>>   4 files changed, 44 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>>>> index c4065467d877..9401e212f9f7 100644
>>>>> --- a/hw/nvme/nvme.h
>>>>> +++ b/hw/nvme/nvme.h
>>>>> @@ -33,12 +33,21 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES >
>>>>> NVME_NSID_BROADCAST - 1);
>>>>>   typedef struct NvmeCtrl NvmeCtrl;
>>>>>   typedef struct NvmeNamespace NvmeNamespace;
>>>>> +#define TYPE_NVME_BUS "nvme-bus"
>>>>> +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>>>> +
>>>>> +typedef struct NvmeBus {
>>>>> +    BusState parent_bus;
>>>>> +    bool     is_subsys;
>>>>> +} NvmeBus;
>>>>> +
>>>>>   #define TYPE_NVME_SUBSYS "nvme-subsys"
>>>>>   #define NVME_SUBSYS(obj) \
>>>>>       OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>>>>   typedef struct NvmeSubsystem {
>>>>>       DeviceState parent_obj;
>>>>> +    NvmeBus     bus;
>>>>>       uint8_t     subnqn[256];
>>>>>       NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>>>> @@ -365,13 +374,6 @@ typedef struct NvmeCQueue {
>>>>>       QTAILQ_HEAD(, NvmeRequest) req_list;
>>>>>   } NvmeCQueue;
>>>>> -#define TYPE_NVME_BUS "nvme-bus"
>>>>> -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>>>> -
>>>>> -typedef struct NvmeBus {
>>>>> -    BusState parent_bus;
>>>>> -} NvmeBus;
>>>>> -
>>>>>   #define TYPE_NVME "nvme"
>>>>>   #define NVME(obj) \
>>>>>           OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>>>> @@ -463,7 +465,7 @@ typedef struct NvmeCtrl {
>>>>>   static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
>>>>>   {
>>>>> -    if (!nsid || nsid > NVME_MAX_NAMESPACES) {
>>>>> +    if (!n || !nsid || nsid > NVME_MAX_NAMESPACES) {
>>>>>           return NULL;
>>>>>       }
>>>>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>>>> index 90e3ee2b70ee..7c8fca36d9a5 100644
>>>>> --- a/hw/nvme/ctrl.c
>>>>> +++ b/hw/nvme/ctrl.c
>>>>> @@ -6516,11 +6516,13 @@ static void nvme_exit(PCIDevice *pci_dev)
>>>>>       for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>>>           ns = nvme_ns(n, i);
>>>>> -        if (!ns) {
>>>>> -            continue;
>>>>> +        if (ns) {
>>>>> +            ns->attached--;
>>>>>           }
>>>>> +    }
>>>>> -        nvme_ns_cleanup(ns);
>>>>> +    if (n->subsys) {
>>>>> +        nvme_subsys_unregister_ctrl(n->subsys, n);
>>>>>       }
>>>>>       if (n->subsys) {
>>>>> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>>>> index 3c4f5b8c714a..612a2786d75d 100644
>>>>> --- a/hw/nvme/ns.c
>>>>> +++ b/hw/nvme/ns.c
>>>>> @@ -444,13 +444,29 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>>>>   static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>>>>   {
>>>>>       NvmeNamespace *ns = NVME_NS(dev);
>>>>> -    BusState *s = qdev_get_parent_bus(dev);
>>>>> -    NvmeCtrl *n = NVME(s->parent);
>>>>> -    NvmeSubsystem *subsys = n->subsys;
>>>>> +    BusState *qbus = qdev_get_parent_bus(dev);
>>>>> +    NvmeBus *bus = NVME_BUS(qbus);
>>>>> +    NvmeCtrl *n = NULL;
>>>>> +    NvmeSubsystem *subsys = NULL;
>>>>>       uint32_t nsid = ns->params.nsid;
>>>>>       int i;
>>>>> -    if (!n->subsys) {
>>>>> +    if (bus->is_subsys) {
>>>>> +        subsys = NVME_SUBSYS(qbus->parent);
>>>>> +    } else {
>>>>> +        n = NVME(qbus->parent);
>>>>> +        subsys = n->subsys;
>>>>> +    }
>>>>> +
>>>>> +    if (subsys) {
>>>>> +        /*
>>>>> +         * If this namespace belongs to a subsystem (through a
>>>>> link on the
>>>>> +         * controller device), reparent the device.
>>>>> +         */
>>>>> +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>>>> +            return;
>>>>> +        }
>>>>> +    } else {
>>>>>           if (ns->params.detached) {
>>>>>               error_setg(errp, "detached requires that the nvme
>>>>> device is "
>>>>>                          "linked to an nvme-subsys device");
>>>>> @@ -470,7 +486,7 @@ static void nvme_ns_realize(DeviceState
>>>>> *dev, Error **errp)
>>>>>       if (!nsid) {
>>>>>           for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>>> -            if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) {
>>>>> +            if (nvme_subsys_ns(subsys, i) || nvme_ns(n, i)) {
>>>>>                   continue;
>>>>>               }
>>>>> @@ -483,7 +499,7 @@ static void nvme_ns_realize(DeviceState
>>>>> *dev, Error **errp)
>>>>>               return;
>>>>>           }
>>>>>       } else {
>>>>> -        if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) {
>>>>> +        if (nvme_subsys_ns(subsys, nsid) || nvme_ns(n, nsid)) {
>>>>>               error_setg(errp, "namespace id '%d' already
>>>>> allocated", nsid);
>>>>>               return;
>>>>>           }
>>>>> @@ -509,7 +525,9 @@ static void nvme_ns_realize(DeviceState
>>>>> *dev, Error **errp)
>>>>>           }
>>>>>       }
>>>>> -    nvme_attach_ns(n, ns);
>>>>> +    if (n) {
>>>>> +        nvme_attach_ns(n, ns);
>>>>> +    }
>>>>>   }
>>>>>   static Property nvme_ns_props[] = {
>>>>> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>>>>> index 92caa604a280..fb7c3a7c55fc 100644
>>>>> --- a/hw/nvme/subsys.c
>>>>> +++ b/hw/nvme/subsys.c
>>>>> @@ -50,6 +50,10 @@ static void nvme_subsys_realize(DeviceState
>>>>> *dev, Error **errp)
>>>>>   {
>>>>>       NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>>>>> +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus),
>>>>> TYPE_NVME_BUS, dev,
>>>>> +                        dev->id);
>>>>> +    subsys->bus.is_subsys = true;
>>>>> +
>>>>>       nvme_subsys_setup(subsys);
>>>>>   }
>>>>>
>>>> Why not always create a subsystem, and distinguish between 'shared'
>>>> and 'non-shared' subsystems instead of the ->subsys pointer?
>>>> That way all namespaces can be children of the subsystem, we won't
>>>> need any reparenting, and the whole thing will be more in-line with
>>>> qdev, no?
>>>>
>>>
>>> Hi Hannes,
>>>
>>> Thanks for your reviews and comments!
>>>
>>> So, I have actually considered what you suggest.
>>>
>>> The problem is that the nvme-ns device currently expects to plug into a
>>> bus (an 'nvme-bus') and we cannot really get rid of that 'bus' parameter
>>> without breaking compatibility. I experimented with removing the bus
>>> from the nvme device and instead creating it in the nvme-subsys device.
>>> My idea here was to implicitly always create an nvme-subsys if not
>>> explicitly created by the user, but since nvme-subsys does not plug into
>>> any bus itself, it is not exposed in the qtree and thus not reachable
>>> (i.e., when realizing the nvme-ns device, it will complain that there
>>> are no 'nvme-bus' available to plug into). To make this work, I believe
>>> we'd have to have the nvme-subsys plug into the main system bus (or some
>>> other bus rooted at main-system-bus), and I'd prefer not to do that
>>> since this is already a flawed design and I think it would be more
>>> intrusive than what my patch does.
>>>
>> Sigh.
>> _Again_.
>>
>> I seem to trip over issues for which no patch can possibly be accepted as
>> first the infrastructure has to be reworked.
>>
>> ... there is a reason why I'm avoiding posting patches to qemu-devel :-(
>>
>>> I raised these issues on the mailinglist some time ago[1]. We didn't
>>> really find a good solution, but I think the conclusion is that the
>>> bus/device design of subsystems and namespaces is flawed. That's why I
>>> am working on an "objectification" of the two devices as suggested by
>>> Stefan[2] as a proper fix for this mess.
>>>
>> Actually, I _do_ think that it would be good to have an nvme-subsystem bus,
>> providing a list of all namespaces for that subsystem.
>> Creating a controller would then mean that one has to create a controller
>> and a namespace _separately_, as then the namespace would _not_ be a child
>> of the controller.
>> But that's arguably a good thing, as the namespaces _do_ have a separate
>> lifetime from controllers.
>> And the lifetime of the subsystem could be completely self-contained; the
>> controller would be looking up subsystems via the subsystem NQN if present;
>> I guess we'll need to create ad-hoc subsystems for PCIe.
>> But nothing insurmountable.
>> And certainly nothing which require a large-scale rework of qemu internals,
>> I would think.
> 
> I don't remember the reason to use --object instead of --device. Maybe
> the --device approach is viable and it makes some sense because this is
> a "device" (like a SCSI LUN) that exposed to the guest. It would be
> necessary to have a nvme-subsys device on the system bus to act as the
> bus for nvme-ns devices as Klaus described.
> 

Which is what I've had in mind.
The only issue here is that a nvme controller will also generate or 
attach to a nvme subsys, and the subsys will have a different lifetime 
than the controller.
But it should be possible to key the lifetime of the subsys to the 
attached controllers (via refcounting, say), so that once the last 
controller is detached from a subsystem the subsystem itself is also 
detached.

>> But _if_ you work on the rework, please ensure that not only NVMe is
>> represented here. SCSI multipathing has the same issue currently; there is a
>> strict host->lun->block->OS device relationship currently in SCSI, making it
>> impossible to represent a multipathed SCSI device in qemu.
>> The only sneaky way is to allow for a shared OS device (ie specifying the
>> filename more than once), but that completely sidesteps qemu internals, too.
> 
> Do you want one SCSIDevice to be accessible via multiple SCSIBus
> instances for multipathing in QEMU?
> 
Precisely. Some things like ALUA or multipath issues in general are 
tricky to reproduce, and qemu would be a neat way here. But currently
we can't as we cannot emulate multipathing.

Any hints are welcome; I've tried several times to implement it but 
inevitably failed :-(

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-08  6:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06  9:33 [PATCH 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
2021-07-06  9:33 ` [PATCH 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
2021-07-07  7:45   ` Hannes Reinecke
2021-07-06  9:33 ` [PATCH 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
2021-07-07  7:45   ` Hannes Reinecke
2021-07-06  9:33 ` [PATCH 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
2021-07-07  7:46   ` Hannes Reinecke
2021-07-06  9:33 ` [PATCH 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
2021-07-07  7:49   ` Hannes Reinecke
2021-07-07  9:53     ` Klaus Jensen
2021-07-07 10:43       ` Hannes Reinecke
2021-07-07 15:41         ` Klaus Jensen
2021-07-07 16:14         ` Stefan Hajnoczi
2021-07-07 16:47           ` Klaus Jensen
2021-07-08  6:15           ` Hannes Reinecke
2021-07-07 10:02   ` Klaus Jensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.