All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] hw/nvme: fix controller hotplugging
@ 2021-07-07 15:49 Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
discussed a bit back and fourth and I mentioned that the core issue was
an artifact of the parent/child relationship stemming from the qdev
setup we have with namespaces attaching to controller through a qdev
bus.

The gist of this series is the fourth patch "hw/nvme: fix controller hot
unplugging" which basically causes namespaces to be reassigned to a bus
owned by the subsystem if the parent controller is linked to one. This
fixes `device_del/add nvme` in such settings.

Note, that in the case that there is no subsystem involved, nvme devices
can be removed from the system with `device_del`, but this *will* cause
the namespaces to be removed as well since there is no place (i.e. no
subsystem) for them to "linger". And since this series does not add
support for hotplugging nvme-ns devices, while an nvme device can be
readded, no namespaces can. Support for hotplugging nvme-ns devices is
present in [1], but I'd rather not add that since I think '-device
nvme-ns' is already a bad design choice.

Now, I do realize that it is not "pretty" to explicitly change the
parent bus, so I do have a an RFC patch in queue that replaces the
subsystem and namespace devices with objects, but keeps -device shims
available for backwards compatibility. This approach will solve the
problems properly and should be a better model. However, I don't believe
it will make it for 6.1 and I'd really like to at least fix the
unplugging for 6.1 and this gets the job done.

  [1]: 20210511073511.32511-1-hare@suse.de

v2:
- added R-b's by Hannes for patches 1 through 3
- simplified "hw/nvme: fix controller hot unplugging"

Klaus Jensen (4):
  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  hw/nvme: mark nvme-subsys non-hotpluggable
  hw/nvme: unregister controller with subsystem at exit
  hw/nvme: fix controller hot unplugging

 hw/nvme/nvme.h   | 18 +++++++++-------
 hw/nvme/ctrl.c   | 14 ++++++------
 hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
 hw/nvme/subsys.c |  9 ++++++++
 4 files changed, 63 insertions(+), 33 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
@ 2021-07-07 15:49 ` Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

The nvme_ns_setup and nvme_ns_check_constraints should not depend on the
controller state. Refactor and remove it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h |  2 +-
 hw/nvme/ctrl.c |  2 +-
 hw/nvme/ns.c   | 37 ++++++++++++++++++-------------------
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 56f8eceed2ad..0868359a1e86 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -246,7 +246,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 629b0d38c2a2..dd1801510032 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6498,7 +6498,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         ns = &n->namespace;
         ns->params.nsid = 1;
 
-        if (nvme_ns_setup(n, ns, errp)) {
+        if (nvme_ns_setup(ns, errp)) {
             return;
         }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 4275c3db6301..3c4f5b8c714a 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -346,8 +346,7 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
     assert(ns->nr_open_zones == 0);
 }
 
-static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
-                                     Error **errp)
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
     if (!ns->blkconf.blk) {
         error_setg(errp, "block backend not configured");
@@ -366,20 +365,6 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
         return -1;
     }
 
-    if (!n->subsys) {
-        if (ns->params.detached) {
-            error_setg(errp, "detached requires that the nvme device is "
-                       "linked to an nvme-subsys device");
-            return -1;
-        }
-
-        if (ns->params.shared) {
-            error_setg(errp, "shared requires that the nvme device is "
-                       "linked to an nvme-subsys device");
-            return -1;
-        }
-    }
-
     if (ns->params.zoned) {
         if (ns->params.max_active_zones) {
             if (ns->params.max_open_zones > ns->params.max_active_zones) {
@@ -411,9 +396,9 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
     return 0;
 }
 
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
-    if (nvme_ns_check_constraints(n, ns, errp)) {
+    if (nvme_ns_check_constraints(ns, errp)) {
         return -1;
     }
 
@@ -465,7 +450,21 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
     uint32_t nsid = ns->params.nsid;
     int i;
 
-    if (nvme_ns_setup(n, ns, errp)) {
+    if (!n->subsys) {
+        if (ns->params.detached) {
+            error_setg(errp, "detached requires that the nvme device is "
+                       "linked to an nvme-subsys device");
+            return;
+        }
+
+        if (ns->params.shared) {
+            error_setg(errp, "shared requires that the nvme device is "
+                       "linked to an nvme-subsys device");
+            return;
+        }
+    }
+
+    if (nvme_ns_setup(ns, errp)) {
         return;
     }
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/4] hw/nvme: mark nvme-subsys non-hotpluggable
  2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
@ 2021-07-07 15:49 ` Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

We currently lack the infrastructure to handle subsystem hotplugging, so
disable it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/subsys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 192223d17ca1..dc7a96862f37 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -61,6 +61,7 @@ static void nvme_subsys_class_init(ObjectClass *oc, void *data)
 
     dc->realize = nvme_subsys_realize;
     dc->desc = "Virtual NVMe subsystem";
+    dc->hotpluggable = false;
 
     device_class_set_props(dc, nvme_subsystem_props);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/4] hw/nvme: unregister controller with subsystem at exit
  2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
@ 2021-07-07 15:49 ` Klaus Jensen
  2021-07-07 15:49 ` [PATCH v2 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
  2021-07-09  6:05 ` [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  4 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Make sure the controller is unregistered from the subsystem when device
is removed.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h   | 1 +
 hw/nvme/ctrl.c   | 4 ++++
 hw/nvme/subsys.c | 5 +++++
 3 files changed, 10 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 0868359a1e86..c4065467d877 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -50,6 +50,7 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
                                          uint32_t cntlid)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dd1801510032..90e3ee2b70ee 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6523,6 +6523,10 @@ static void nvme_exit(PCIDevice *pci_dev)
         nvme_ns_cleanup(ns);
     }
 
+    if (n->subsys) {
+        nvme_subsys_unregister_ctrl(n->subsys, n);
+    }
+
     g_free(n->cq);
     g_free(n->sq);
     g_free(n->aer_reqs);
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index dc7a96862f37..92caa604a280 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -32,6 +32,11 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
     return cntlid;
 }
 
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n)
+{
+    subsys->ctrls[n->cntlid] = NULL;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
     const char *nqn = subsys->params.nqn ?
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
                   ` (2 preceding siblings ...)
  2021-07-07 15:49 ` [PATCH v2 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
@ 2021-07-07 15:49 ` Klaus Jensen
  2021-07-07 15:57   ` Hannes Reinecke
  2021-07-09  6:05 ` [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
  4 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 15:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

Prior to this patch the nvme-ns devices are always children of the
NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
unrealized when the parent device is removed. However, when subsystems
are involved, this is not what we want since the namespaces may be
attached to other controllers as well.

This patch adds an additional NvmeBus on the subsystem device. When
nvme-ns devices are realized, if the parent controller device is linked
to a subsystem, the parent bus is set to the subsystem one instead. This
makes sure that namespaces are kept alive and not unrealized.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/nvme.h   | 15 ++++++++-------
 hw/nvme/ctrl.c   | 14 ++++++--------
 hw/nvme/ns.c     | 18 ++++++++++++++++++
 hw/nvme/subsys.c |  3 +++
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c4065467d877..83ffabade4cf 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -33,12 +33,20 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
 
+#define TYPE_NVME_BUS "nvme-bus"
+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
+
+typedef struct NvmeBus {
+    BusState parent_bus;
+} NvmeBus;
+
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
     OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 typedef struct NvmeSubsystem {
     DeviceState parent_obj;
+    NvmeBus     bus;
     uint8_t     subnqn[256];
 
     NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
@@ -365,13 +373,6 @@ typedef struct NvmeCQueue {
     QTAILQ_HEAD(, NvmeRequest) req_list;
 } NvmeCQueue;
 
-#define TYPE_NVME_BUS "nvme-bus"
-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
-
-typedef struct NvmeBus {
-    BusState parent_bus;
-} NvmeBus;
-
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 90e3ee2b70ee..9a3b3a27c293 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6514,16 +6514,14 @@ static void nvme_exit(PCIDevice *pci_dev)
 
     nvme_ctrl_reset(n);
 
-    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-        ns = nvme_ns(n, i);
-        if (!ns) {
-            continue;
+    if (n->subsys) {
+        for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+            ns = nvme_ns(n, i);
+            if (ns) {
+                ns->attached--;
+            }
         }
 
-        nvme_ns_cleanup(ns);
-    }
-
-    if (n->subsys) {
         nvme_subsys_unregister_ctrl(n->subsys, n);
     }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3c4f5b8c714a..b7cf1494e75b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -441,6 +441,15 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
     }
 }
 
+static void nvme_ns_unrealize(DeviceState *dev)
+{
+    NvmeNamespace *ns = NVME_NS(dev);
+
+    nvme_ns_drain(ns);
+    nvme_ns_shutdown(ns);
+    nvme_ns_cleanup(ns);
+}
+
 static void nvme_ns_realize(DeviceState *dev, Error **errp)
 {
     NvmeNamespace *ns = NVME_NS(dev);
@@ -462,6 +471,14 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
                        "linked to an nvme-subsys device");
             return;
         }
+    } else {
+        /*
+         * If this namespace belongs to a subsystem (through a link on the
+         * controller device), reparent the device.
+         */
+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
+            return;
+        }
     }
 
     if (nvme_ns_setup(ns, errp)) {
@@ -552,6 +569,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
 
     dc->bus_type = TYPE_NVME_BUS;
     dc->realize = nvme_ns_realize;
+    dc->unrealize = nvme_ns_unrealize;
     device_class_set_props(dc, nvme_ns_props);
     dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 92caa604a280..93c35950d69d 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -50,6 +50,9 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
 {
     NvmeSubsystem *subsys = NVME_SUBSYS(dev);
 
+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
+                        dev->id);
+
     nvme_subsys_setup(subsys);
 }
 
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 15:49 ` [PATCH v2 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
@ 2021-07-07 15:57   ` Hannes Reinecke
  2021-07-07 16:56     ` Klaus Jensen
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-07 15:57 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/7/21 5:49 PM, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> Prior to this patch the nvme-ns devices are always children of the
> NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
> unrealized when the parent device is removed. However, when subsystems
> are involved, this is not what we want since the namespaces may be
> attached to other controllers as well.
> 
> This patch adds an additional NvmeBus on the subsystem device. When
> nvme-ns devices are realized, if the parent controller device is linked
> to a subsystem, the parent bus is set to the subsystem one instead. This
> makes sure that namespaces are kept alive and not unrealized.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>   hw/nvme/nvme.h   | 15 ++++++++-------
>   hw/nvme/ctrl.c   | 14 ++++++--------
>   hw/nvme/ns.c     | 18 ++++++++++++++++++
>   hw/nvme/subsys.c |  3 +++
>   4 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
> index c4065467d877..83ffabade4cf 100644
> --- a/hw/nvme/nvme.h
> +++ b/hw/nvme/nvme.h
> @@ -33,12 +33,20 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
>   typedef struct NvmeCtrl NvmeCtrl;
>   typedef struct NvmeNamespace NvmeNamespace;
>   
> +#define TYPE_NVME_BUS "nvme-bus"
> +OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
> +
> +typedef struct NvmeBus {
> +    BusState parent_bus;
> +} NvmeBus;
> +
>   #define TYPE_NVME_SUBSYS "nvme-subsys"
>   #define NVME_SUBSYS(obj) \
>       OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>   
>   typedef struct NvmeSubsystem {
>       DeviceState parent_obj;
> +    NvmeBus     bus;
>       uint8_t     subnqn[256];
>   
>       NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
> @@ -365,13 +373,6 @@ typedef struct NvmeCQueue {
>       QTAILQ_HEAD(, NvmeRequest) req_list;
>   } NvmeCQueue;
>   
> -#define TYPE_NVME_BUS "nvme-bus"
> -#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
> -
> -typedef struct NvmeBus {
> -    BusState parent_bus;
> -} NvmeBus;
> -
>   #define TYPE_NVME "nvme"
>   #define NVME(obj) \
>           OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 90e3ee2b70ee..9a3b3a27c293 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -6514,16 +6514,14 @@ static void nvme_exit(PCIDevice *pci_dev)
>   
>       nvme_ctrl_reset(n);
>   
> -    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> -        ns = nvme_ns(n, i);
> -        if (!ns) {
> -            continue;
> +    if (n->subsys) {
> +        for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
> +            ns = nvme_ns(n, i);
> +            if (ns) {
> +                ns->attached--;
> +            }
>           }
>   
> -        nvme_ns_cleanup(ns);

So who is removing the namespaces, then?
I would have expected some cleanup action from the subsystem, seeing 
that we reparent to that ...

> -    }
> -
> -    if (n->subsys) {
>           nvme_subsys_unregister_ctrl(n->subsys, n);
>       }
>   
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index 3c4f5b8c714a..b7cf1494e75b 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -441,6 +441,15 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>       }
>   }
>   
> +static void nvme_ns_unrealize(DeviceState *dev)
> +{
> +    NvmeNamespace *ns = NVME_NS(dev);
> +
> +    nvme_ns_drain(ns);
> +    nvme_ns_shutdown(ns);
> +    nvme_ns_cleanup(ns);
> +}
> +
>   static void nvme_ns_realize(DeviceState *dev, Error **errp)
>   {
>       NvmeNamespace *ns = NVME_NS(dev);
> @@ -462,6 +471,14 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>                          "linked to an nvme-subsys device");
>               return;
>           }
> +    } else {
> +        /*
> +         * If this namespace belongs to a subsystem (through a link on the
> +         * controller device), reparent the device.
> +         */
> +        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
> +            return;
> +        }

What happens if that fails?
Will we abort? Not create the namespace?

>       }
>   
>       if (nvme_ns_setup(ns, errp)) {
> @@ -552,6 +569,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
>   
>       dc->bus_type = TYPE_NVME_BUS;
>       dc->realize = nvme_ns_realize;
> +    dc->unrealize = nvme_ns_unrealize;
>       device_class_set_props(dc, nvme_ns_props);
>       dc->desc = "Virtual NVMe namespace";
>   }
> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> index 92caa604a280..93c35950d69d 100644
> --- a/hw/nvme/subsys.c
> +++ b/hw/nvme/subsys.c
> @@ -50,6 +50,9 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
>   {
>       NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>   
> +    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
> +                        dev->id);
> +
>       nvme_subsys_setup(subsys);
>   }
>   
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 15:57   ` Hannes Reinecke
@ 2021-07-07 16:56     ` Klaus Jensen
  2021-07-08  5:16       ` Klaus Jensen
  0 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-07-07 16:56 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5750 bytes --]

On Jul  7 17:57, Hannes Reinecke wrote:
>On 7/7/21 5:49 PM, Klaus Jensen wrote:
>>From: Klaus Jensen <k.jensen@samsung.com>
>>
>>Prior to this patch the nvme-ns devices are always children of the
>>NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>unrealized when the parent device is removed. However, when subsystems
>>are involved, this is not what we want since the namespaces may be
>>attached to other controllers as well.
>>
>>This patch adds an additional NvmeBus on the subsystem device. When
>>nvme-ns devices are realized, if the parent controller device is linked
>>to a subsystem, the parent bus is set to the subsystem one instead. This
>>makes sure that namespaces are kept alive and not unrealized.
>>
>>Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>---
>>  hw/nvme/nvme.h   | 15 ++++++++-------
>>  hw/nvme/ctrl.c   | 14 ++++++--------
>>  hw/nvme/ns.c     | 18 ++++++++++++++++++
>>  hw/nvme/subsys.c |  3 +++
>>  4 files changed, 35 insertions(+), 15 deletions(-)
>>
>>diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>index c4065467d877..83ffabade4cf 100644
>>--- a/hw/nvme/nvme.h
>>+++ b/hw/nvme/nvme.h
>>@@ -33,12 +33,20 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
>>  typedef struct NvmeCtrl NvmeCtrl;
>>  typedef struct NvmeNamespace NvmeNamespace;
>>+#define TYPE_NVME_BUS "nvme-bus"
>>+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>+
>>+typedef struct NvmeBus {
>>+    BusState parent_bus;
>>+} NvmeBus;
>>+
>>  #define TYPE_NVME_SUBSYS "nvme-subsys"
>>  #define NVME_SUBSYS(obj) \
>>      OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>  typedef struct NvmeSubsystem {
>>      DeviceState parent_obj;
>>+    NvmeBus     bus;
>>      uint8_t     subnqn[256];
>>      NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>@@ -365,13 +373,6 @@ typedef struct NvmeCQueue {
>>      QTAILQ_HEAD(, NvmeRequest) req_list;
>>  } NvmeCQueue;
>>-#define TYPE_NVME_BUS "nvme-bus"
>>-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>-
>>-typedef struct NvmeBus {
>>-    BusState parent_bus;
>>-} NvmeBus;
>>-
>>  #define TYPE_NVME "nvme"
>>  #define NVME(obj) \
>>          OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>index 90e3ee2b70ee..9a3b3a27c293 100644
>>--- a/hw/nvme/ctrl.c
>>+++ b/hw/nvme/ctrl.c
>>@@ -6514,16 +6514,14 @@ static void nvme_exit(PCIDevice *pci_dev)
>>      nvme_ctrl_reset(n);
>>-    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>-        ns = nvme_ns(n, i);
>>-        if (!ns) {
>>-            continue;
>>+    if (n->subsys) {
>>+        for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>+            ns = nvme_ns(n, i);
>>+            if (ns) {
>>+                ns->attached--;
>>+            }
>>          }
>>-        nvme_ns_cleanup(ns);
>
>So who is removing the namespaces, then?
>I would have expected some cleanup action from the subsystem, seeing 
>that we reparent to that ...
>

Since we "move" the namespaces to the subsystem, and since the subsystem 
is non-hotpluggable, they will (and can) not be removed. In the case 
that there is no subsystem, nvme_ns_unrealize() will be called for each 
child namespace on the controller NvmeBus.

>>-    }
>>-
>>-    if (n->subsys) {
>>          nvme_subsys_unregister_ctrl(n->subsys, n);
>>      }
>>diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>index 3c4f5b8c714a..b7cf1494e75b 100644
>>--- a/hw/nvme/ns.c
>>+++ b/hw/nvme/ns.c
>>@@ -441,6 +441,15 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>      }
>>  }
>>+static void nvme_ns_unrealize(DeviceState *dev)
>>+{
>>+    NvmeNamespace *ns = NVME_NS(dev);
>>+
>>+    nvme_ns_drain(ns);
>>+    nvme_ns_shutdown(ns);
>>+    nvme_ns_cleanup(ns);
>>+}
>>+
>>  static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>  {
>>      NvmeNamespace *ns = NVME_NS(dev);
>>@@ -462,6 +471,14 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>                         "linked to an nvme-subsys device");
>>              return;
>>          }
>>+    } else {
>>+        /*
>>+         * If this namespace belongs to a subsystem (through a link on the
>>+         * controller device), reparent the device.
>>+         */
>>+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>+            return;
>>+        }
>
>What happens if that fails?
>Will we abort? Not create the namespace?
>

Good point!

It can actually only fail if the bus implements check_address(), which 
it does not, so it always succeeds, so it should assert instead.

>>      }
>>      if (nvme_ns_setup(ns, errp)) {
>>@@ -552,6 +569,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
>>      dc->bus_type = TYPE_NVME_BUS;
>>      dc->realize = nvme_ns_realize;
>>+    dc->unrealize = nvme_ns_unrealize;
>>      device_class_set_props(dc, nvme_ns_props);
>>      dc->desc = "Virtual NVMe namespace";
>>  }
>>diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
>>index 92caa604a280..93c35950d69d 100644
>>--- a/hw/nvme/subsys.c
>>+++ b/hw/nvme/subsys.c
>>@@ -50,6 +50,9 @@ static void nvme_subsys_realize(DeviceState *dev, Error **errp)
>>  {
>>      NvmeSubsystem *subsys = NVME_SUBSYS(dev);
>>+    qbus_create_inplace(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
>>+                        dev->id);
>>+
>>      nvme_subsys_setup(subsys);
>>  }
>>
>Cheers,
>
>Hannes
>-- 
>Dr. Hannes Reinecke                Kernel Storage Architect
>hare@suse.de                              +49 911 74053 688
>SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
>HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/4] hw/nvme: fix controller hot unplugging
  2021-07-07 16:56     ` Klaus Jensen
@ 2021-07-08  5:16       ` Klaus Jensen
  0 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-08  5:16 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 4985 bytes --]

On Jul  7 18:56, Klaus Jensen wrote:
>On Jul  7 17:57, Hannes Reinecke wrote:
>>On 7/7/21 5:49 PM, Klaus Jensen wrote:
>>>From: Klaus Jensen <k.jensen@samsung.com>
>>>
>>>Prior to this patch the nvme-ns devices are always children of the
>>>NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
>>>unrealized when the parent device is removed. However, when subsystems
>>>are involved, this is not what we want since the namespaces may be
>>>attached to other controllers as well.
>>>
>>>This patch adds an additional NvmeBus on the subsystem device. When
>>>nvme-ns devices are realized, if the parent controller device is linked
>>>to a subsystem, the parent bus is set to the subsystem one instead. This
>>>makes sure that namespaces are kept alive and not unrealized.
>>>
>>>Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>>>---
>>> hw/nvme/nvme.h   | 15 ++++++++-------
>>> hw/nvme/ctrl.c   | 14 ++++++--------
>>> hw/nvme/ns.c     | 18 ++++++++++++++++++
>>> hw/nvme/subsys.c |  3 +++
>>> 4 files changed, 35 insertions(+), 15 deletions(-)
>>>
>>>diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
>>>index c4065467d877..83ffabade4cf 100644
>>>--- a/hw/nvme/nvme.h
>>>+++ b/hw/nvme/nvme.h
>>>@@ -33,12 +33,20 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
>>> typedef struct NvmeCtrl NvmeCtrl;
>>> typedef struct NvmeNamespace NvmeNamespace;
>>>+#define TYPE_NVME_BUS "nvme-bus"
>>>+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
>>>+
>>>+typedef struct NvmeBus {
>>>+    BusState parent_bus;
>>>+} NvmeBus;
>>>+
>>> #define TYPE_NVME_SUBSYS "nvme-subsys"
>>> #define NVME_SUBSYS(obj) \
>>>     OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
>>> typedef struct NvmeSubsystem {
>>>     DeviceState parent_obj;
>>>+    NvmeBus     bus;
>>>     uint8_t     subnqn[256];
>>>     NvmeCtrl      *ctrls[NVME_MAX_CONTROLLERS];
>>>@@ -365,13 +373,6 @@ typedef struct NvmeCQueue {
>>>     QTAILQ_HEAD(, NvmeRequest) req_list;
>>> } NvmeCQueue;
>>>-#define TYPE_NVME_BUS "nvme-bus"
>>>-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
>>>-
>>>-typedef struct NvmeBus {
>>>-    BusState parent_bus;
>>>-} NvmeBus;
>>>-
>>> #define TYPE_NVME "nvme"
>>> #define NVME(obj) \
>>>         OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
>>>diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>>index 90e3ee2b70ee..9a3b3a27c293 100644
>>>--- a/hw/nvme/ctrl.c
>>>+++ b/hw/nvme/ctrl.c
>>>@@ -6514,16 +6514,14 @@ static void nvme_exit(PCIDevice *pci_dev)
>>>     nvme_ctrl_reset(n);
>>>-    for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>-        ns = nvme_ns(n, i);
>>>-        if (!ns) {
>>>-            continue;
>>>+    if (n->subsys) {
>>>+        for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
>>>+            ns = nvme_ns(n, i);
>>>+            if (ns) {
>>>+                ns->attached--;
>>>+            }
>>>         }
>>>-        nvme_ns_cleanup(ns);
>>
>>So who is removing the namespaces, then?
>>I would have expected some cleanup action from the subsystem, seeing 
>>that we reparent to that ...
>>
>
>Since we "move" the namespaces to the subsystem, and since the 
>subsystem is non-hotpluggable, they will (and can) not be removed. In 
>the case that there is no subsystem, nvme_ns_unrealize() will be 
>called for each child namespace on the controller NvmeBus.
>
>>>-    }
>>>-
>>>-    if (n->subsys) {
>>>         nvme_subsys_unregister_ctrl(n->subsys, n);
>>>     }
>>>diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
>>>index 3c4f5b8c714a..b7cf1494e75b 100644
>>>--- a/hw/nvme/ns.c
>>>+++ b/hw/nvme/ns.c
>>>@@ -441,6 +441,15 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
>>>     }
>>> }
>>>+static void nvme_ns_unrealize(DeviceState *dev)
>>>+{
>>>+    NvmeNamespace *ns = NVME_NS(dev);
>>>+
>>>+    nvme_ns_drain(ns);
>>>+    nvme_ns_shutdown(ns);
>>>+    nvme_ns_cleanup(ns);
>>>+}
>>>+
>>> static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>> {
>>>     NvmeNamespace *ns = NVME_NS(dev);
>>>@@ -462,6 +471,14 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
>>>                        "linked to an nvme-subsys device");
>>>             return;
>>>         }
>>>+    } else {
>>>+        /*
>>>+         * If this namespace belongs to a subsystem (through a link on the
>>>+         * controller device), reparent the device.
>>>+         */
>>>+        if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
>>>+            return;
>>>+        }
>>
>>What happens if that fails?
>>Will we abort? Not create the namespace?
>>
>
>Good point!
>
>It can actually only fail if the bus implements check_address(), which 
>it does not, so it always succeeds, so it should assert instead.
>

Nah, the 'if' is fine. If check_address() should be implemented at some 
point, errp will be set and invocation of qemu will stop with an error. 
So I think the error handling is fine as-is.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
                   ` (3 preceding siblings ...)
  2021-07-07 15:49 ` [PATCH v2 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
@ 2021-07-09  6:05 ` Klaus Jensen
  2021-07-09  6:16   ` Hannes Reinecke
  4 siblings, 1 reply; 14+ messages in thread
From: Klaus Jensen @ 2021-07-09  6:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Keith Busch, Klaus Jensen, Hannes Reinecke, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

On Jul  7 17:49, Klaus Jensen wrote:
>From: Klaus Jensen <k.jensen@samsung.com>
>
>Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
>discussed a bit back and fourth and I mentioned that the core issue was
>an artifact of the parent/child relationship stemming from the qdev
>setup we have with namespaces attaching to controller through a qdev
>bus.
>
>The gist of this series is the fourth patch "hw/nvme: fix controller hot
>unplugging" which basically causes namespaces to be reassigned to a bus
>owned by the subsystem if the parent controller is linked to one. This
>fixes `device_del/add nvme` in such settings.
>
>Note, that in the case that there is no subsystem involved, nvme devices
>can be removed from the system with `device_del`, but this *will* cause
>the namespaces to be removed as well since there is no place (i.e. no
>subsystem) for them to "linger". And since this series does not add
>support for hotplugging nvme-ns devices, while an nvme device can be
>readded, no namespaces can. Support for hotplugging nvme-ns devices is
>present in [1], but I'd rather not add that since I think '-device
>nvme-ns' is already a bad design choice.
>
>Now, I do realize that it is not "pretty" to explicitly change the
>parent bus, so I do have a an RFC patch in queue that replaces the
>subsystem and namespace devices with objects, but keeps -device shims
>available for backwards compatibility. This approach will solve the
>problems properly and should be a better model. However, I don't believe
>it will make it for 6.1 and I'd really like to at least fix the
>unplugging for 6.1 and this gets the job done.
>
>  [1]: 20210511073511.32511-1-hare@suse.de
>
>v2:
>- added R-b's by Hannes for patches 1 through 3
>- simplified "hw/nvme: fix controller hot unplugging"
>
>Klaus Jensen (4):
>  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
>  hw/nvme: mark nvme-subsys non-hotpluggable
>  hw/nvme: unregister controller with subsystem at exit
>  hw/nvme: fix controller hot unplugging
>
> hw/nvme/nvme.h   | 18 +++++++++-------
> hw/nvme/ctrl.c   | 14 ++++++------
> hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
> hw/nvme/subsys.c |  9 ++++++++
> 4 files changed, 63 insertions(+), 33 deletions(-)
>
>-- 
>2.32.0
>

Applied patches 1 through 3 to nvme-next.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-09  6:05 ` [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
@ 2021-07-09  6:16   ` Hannes Reinecke
  2021-07-09  6:55     ` Klaus Jensen
  0 siblings, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-09  6:16 UTC (permalink / raw)
  To: Klaus Jensen, qemu-devel; +Cc: Keith Busch, Klaus Jensen, qemu-block

On 7/9/21 8:05 AM, Klaus Jensen wrote:
> On Jul  7 17:49, Klaus Jensen wrote:
>> From: Klaus Jensen <k.jensen@samsung.com>
>>
>> Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
>> discussed a bit back and fourth and I mentioned that the core issue was
>> an artifact of the parent/child relationship stemming from the qdev
>> setup we have with namespaces attaching to controller through a qdev
>> bus.
>>
>> The gist of this series is the fourth patch "hw/nvme: fix controller hot
>> unplugging" which basically causes namespaces to be reassigned to a bus
>> owned by the subsystem if the parent controller is linked to one. This
>> fixes `device_del/add nvme` in such settings.
>>
>> Note, that in the case that there is no subsystem involved, nvme devices
>> can be removed from the system with `device_del`, but this *will* cause
>> the namespaces to be removed as well since there is no place (i.e. no
>> subsystem) for them to "linger". And since this series does not add
>> support for hotplugging nvme-ns devices, while an nvme device can be
>> readded, no namespaces can. Support for hotplugging nvme-ns devices is
>> present in [1], but I'd rather not add that since I think '-device
>> nvme-ns' is already a bad design choice.
>>
>> Now, I do realize that it is not "pretty" to explicitly change the
>> parent bus, so I do have a an RFC patch in queue that replaces the
>> subsystem and namespace devices with objects, but keeps -device shims
>> available for backwards compatibility. This approach will solve the
>> problems properly and should be a better model. However, I don't believe
>> it will make it for 6.1 and I'd really like to at least fix the
>> unplugging for 6.1 and this gets the job done.
>>
>>  [1]: 20210511073511.32511-1-hare@suse.de
>>
>> v2:
>> - added R-b's by Hannes for patches 1 through 3
>> - simplified "hw/nvme: fix controller hot unplugging"
>>
>> Klaus Jensen (4):
>>  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
>>  hw/nvme: mark nvme-subsys non-hotpluggable
>>  hw/nvme: unregister controller with subsystem at exit
>>  hw/nvme: fix controller hot unplugging
>>
>> hw/nvme/nvme.h   | 18 +++++++++-------
>> hw/nvme/ctrl.c   | 14 ++++++------
>> hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
>> hw/nvme/subsys.c |  9 ++++++++
>> 4 files changed, 63 insertions(+), 33 deletions(-)
>>
>> -- 
>> 2.32.0
>>
> 
> Applied patches 1 through 3 to nvme-next.

So, how do we go about with patch 4?
Without it this whole exercise is a bit pointless, seeing that it 
doesn't fix anything.

Shall we go with that patch as an interim solution?
Will you replace it with your 'object' patch?
What is the plan?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-09  6:16   ` Hannes Reinecke
@ 2021-07-09  6:55     ` Klaus Jensen
  2021-07-09  6:59       ` Klaus Jensen
  2021-07-09  8:51       ` Hannes Reinecke
  0 siblings, 2 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-09  6:55 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5671 bytes --]

On Jul  9 08:16, Hannes Reinecke wrote:
>On 7/9/21 8:05 AM, Klaus Jensen wrote:
>>On Jul  7 17:49, Klaus Jensen wrote:
>>>From: Klaus Jensen <k.jensen@samsung.com>
>>>
>>>Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
>>>discussed a bit back and fourth and I mentioned that the core issue was
>>>an artifact of the parent/child relationship stemming from the qdev
>>>setup we have with namespaces attaching to controller through a qdev
>>>bus.
>>>
>>>The gist of this series is the fourth patch "hw/nvme: fix controller hot
>>>unplugging" which basically causes namespaces to be reassigned to a bus
>>>owned by the subsystem if the parent controller is linked to one. This
>>>fixes `device_del/add nvme` in such settings.
>>>
>>>Note, that in the case that there is no subsystem involved, nvme devices
>>>can be removed from the system with `device_del`, but this *will* cause
>>>the namespaces to be removed as well since there is no place (i.e. no
>>>subsystem) for them to "linger". And since this series does not add
>>>support for hotplugging nvme-ns devices, while an nvme device can be
>>>readded, no namespaces can. Support for hotplugging nvme-ns devices is
>>>present in [1], but I'd rather not add that since I think '-device
>>>nvme-ns' is already a bad design choice.
>>>
>>>Now, I do realize that it is not "pretty" to explicitly change the
>>>parent bus, so I do have a an RFC patch in queue that replaces the
>>>subsystem and namespace devices with objects, but keeps -device shims
>>>available for backwards compatibility. This approach will solve the
>>>problems properly and should be a better model. However, I don't believe
>>>it will make it for 6.1 and I'd really like to at least fix the
>>>unplugging for 6.1 and this gets the job done.
>>>
>>> [1]: 20210511073511.32511-1-hare@suse.de
>>>
>>>v2:
>>>- added R-b's by Hannes for patches 1 through 3
>>>- simplified "hw/nvme: fix controller hot unplugging"
>>>
>>>Klaus Jensen (4):
>>> hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
>>> hw/nvme: mark nvme-subsys non-hotpluggable
>>> hw/nvme: unregister controller with subsystem at exit
>>> hw/nvme: fix controller hot unplugging
>>>
>>>hw/nvme/nvme.h   | 18 +++++++++-------
>>>hw/nvme/ctrl.c   | 14 ++++++------
>>>hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
>>>hw/nvme/subsys.c |  9 ++++++++
>>>4 files changed, 63 insertions(+), 33 deletions(-)
>>>
>>>-- 
>>>2.32.0
>>>
>>
>>Applied patches 1 through 3 to nvme-next.
>
>So, how do we go about with patch 4?
>Without it this whole exercise is a bit pointless, seeing that it 
>doesn't fix anything.
>

Patch 1-3 are fixes we need anyway, so I thought I might as well apply 
them :)

>Shall we go with that patch as an interim solution?
>Will you replace it with your 'object' patch?
>What is the plan?
>

Yes, if acceptable, I would like to use patch 4 as an interim solution. 
We have a bug we need to fix for 6.1, and I belive this does the job.

I considered changing the existing nvme-bus to be on the main system 
bus, but then we break the existing behavior that the namespaces attach 
to the most recently defined controller in the absence of the shared 
parameter or an explicit bus parameter.

Wrt. "the plan", right now, I see two solutions going forward:

1. Introduce new -object's for nvme-nvm-subsystem and nvme-ns
    This is the approach that I am taking right now and it works well. It 
    allows many-to-many relationships and separates the life times of 
    subsystems, namespaces and controllers like you mentioned.

    Conceptually, I also really like that the subsystem and namespace are 
    not "devices". One could argue that the namespace is comparable to a 
    SCSI LUN (-device scsi-hd, right?), but where the SCSI LUN actually 
    "shows up" in the host, the nvme namespace does not.

    My series handles backwards compatibility by keeping -device "shims" 
    around that just wraps the new objects but behaves like it used to. 
    The plan would be to deprecate these devices.

    The downside to this approach is that it moves the subsystem and 
    namespaces out of the "qdev tree (info qtree)" and into the pure QOM 
    "/objects" tree. Instead of qtree, we can have QMP and HMP commands 
    for introspection.

2. Make the subsystem a "system bus device"
    This way we add an "nvme-nvm-subsystems" bus as a direct child of the 
    main system bus, and we can possibly get rid of the explicit -device 
    nvme-subsys as well. We change the namespace device to plug into that 
    instead. The nvme controller device still needs to plug into the PCI 
    bus, so it cannot be a child of the subsystems bus, but can keep 
    using a link parameter to hook into the subsystem and attach to any 
    namespaces it would like.

    I'm unsure if we can do this without deprecating the existing 
    namespace device, just like option 1.

    I have not implemented this, so I need to look more into it. It seems 
    like the main thing that this gives us compared to 1) is `info qtree` 
    support and we still end up just "wiring" namespace attachment with 
    backlinks anyway.

I'm not sure what I would prefer, but I've found that implemting it as 
-object's is a breath of fresh air and as I mentioned, conceptually, I 
like option 1 because namespaces are -objects and not -devices.

And, by the way, thanks for chipping in on this Hannes, I had sort of 
crossed off option 2 before you showed up and threw some ideas in the 
air ;)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-09  6:55     ` Klaus Jensen
@ 2021-07-09  6:59       ` Klaus Jensen
  2021-07-09  8:51       ` Hannes Reinecke
  1 sibling, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-09  6:59 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5872 bytes --]

On Jul  9 08:55, Klaus Jensen wrote:
>On Jul  9 08:16, Hannes Reinecke wrote:
>>On 7/9/21 8:05 AM, Klaus Jensen wrote:
>>>On Jul  7 17:49, Klaus Jensen wrote:
>>>>From: Klaus Jensen <k.jensen@samsung.com>
>>>>
>>>>Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
>>>>discussed a bit back and fourth and I mentioned that the core issue was
>>>>an artifact of the parent/child relationship stemming from the qdev
>>>>setup we have with namespaces attaching to controller through a qdev
>>>>bus.
>>>>
>>>>The gist of this series is the fourth patch "hw/nvme: fix controller hot
>>>>unplugging" which basically causes namespaces to be reassigned to a bus
>>>>owned by the subsystem if the parent controller is linked to one. This
>>>>fixes `device_del/add nvme` in such settings.
>>>>
>>>>Note, that in the case that there is no subsystem involved, nvme devices
>>>>can be removed from the system with `device_del`, but this *will* cause
>>>>the namespaces to be removed as well since there is no place (i.e. no
>>>>subsystem) for them to "linger". And since this series does not add
>>>>support for hotplugging nvme-ns devices, while an nvme device can be
>>>>readded, no namespaces can. Support for hotplugging nvme-ns devices is
>>>>present in [1], but I'd rather not add that since I think '-device
>>>>nvme-ns' is already a bad design choice.
>>>>
>>>>Now, I do realize that it is not "pretty" to explicitly change the
>>>>parent bus, so I do have a an RFC patch in queue that replaces the
>>>>subsystem and namespace devices with objects, but keeps -device shims
>>>>available for backwards compatibility. This approach will solve the
>>>>problems properly and should be a better model. However, I don't believe
>>>>it will make it for 6.1 and I'd really like to at least fix the
>>>>unplugging for 6.1 and this gets the job done.
>>>>
>>>> [1]: 20210511073511.32511-1-hare@suse.de
>>>>
>>>>v2:
>>>>- added R-b's by Hannes for patches 1 through 3
>>>>- simplified "hw/nvme: fix controller hot unplugging"
>>>>
>>>>Klaus Jensen (4):
>>>> hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
>>>> hw/nvme: mark nvme-subsys non-hotpluggable
>>>> hw/nvme: unregister controller with subsystem at exit
>>>> hw/nvme: fix controller hot unplugging
>>>>
>>>>hw/nvme/nvme.h   | 18 +++++++++-------
>>>>hw/nvme/ctrl.c   | 14 ++++++------
>>>>hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
>>>>hw/nvme/subsys.c |  9 ++++++++
>>>>4 files changed, 63 insertions(+), 33 deletions(-)
>>>>
>>>>-- 
>>>>2.32.0
>>>>
>>>
>>>Applied patches 1 through 3 to nvme-next.
>>
>>So, how do we go about with patch 4?
>>Without it this whole exercise is a bit pointless, seeing that it 
>>doesn't fix anything.
>>
>
>Patch 1-3 are fixes we need anyway, so I thought I might as well apply 
>them :)
>
>>Shall we go with that patch as an interim solution?
>>Will you replace it with your 'object' patch?
>>What is the plan?
>>
>
>Yes, if acceptable, I would like to use patch 4 as an interim 
>solution. We have a bug we need to fix for 6.1, and I belive this does 
>the job.
>
>I considered changing the existing nvme-bus to be on the main system 
>bus, but then we break the existing behavior that the namespaces 
>attach to the most recently defined controller in the absence of the 
>shared parameter or an explicit bus parameter.
>
>Wrt. "the plan", right now, I see two solutions going forward:
>
>1. Introduce new -object's for nvme-nvm-subsystem and nvme-ns
>   This is the approach that I am taking right now and it works well. 
>It    allows many-to-many relationships and separates the life times 
>of    subsystems, namespaces and controllers like you mentioned.
>
>   Conceptually, I also really like that the subsystem and namespace 
>are    not "devices". One could argue that the namespace is comparable 
>to a    SCSI LUN (-device scsi-hd, right?), but where the SCSI LUN 
>actually    "shows up" in the host, the nvme namespace does not.
>
>   My series handles backwards compatibility by keeping -device 
>"shims"    around that just wraps the new objects but behaves like it 
>used to.    The plan would be to deprecate these devices.
>
>   The downside to this approach is that it moves the subsystem and    
>namespaces out of the "qdev tree (info qtree)" and into the pure QOM    
>"/objects" tree. Instead of qtree, we can have QMP and HMP commands    
>for introspection.
>
>2. Make the subsystem a "system bus device"
>   This way we add an "nvme-nvm-subsystems" bus as a direct child of 
>the    main system bus, and we can possibly get rid of the explicit 
>-device    nvme-subsys as well. We change the namespace device to plug 
>into that    instead. The nvme controller device still needs to plug 
>into the PCI    bus, so it cannot be a child of the subsystems bus, 
>but can keep    using a link parameter to hook into the subsystem and 
>attach to any    namespaces it would like.
>
>   I'm unsure if we can do this without deprecating the existing    
>namespace device, just like option 1.
>
>   I have not implemented this, so I need to look more into it. It 
>seems    like the main thing that this gives us compared to 1) is 
>`info qtree`    support and we still end up just "wiring" namespace 
>attachment with    backlinks anyway.
>
>I'm not sure what I would prefer, but I've found that implemting it as 
>-object's is a breath of fresh air and as I mentioned, conceptually, I 
>like option 1 because namespaces are -objects and not -devices.
>
>And, by the way, thanks for chipping in on this Hannes, I had sort of 
>crossed off option 2 before you showed up and threw some ideas in the 
>air ;)

Wow, my mailer screwed up that formatting, sorry about that.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-09  6:55     ` Klaus Jensen
  2021-07-09  6:59       ` Klaus Jensen
@ 2021-07-09  8:51       ` Hannes Reinecke
  2021-07-09 10:08         ` Klaus Jensen
  1 sibling, 1 reply; 14+ messages in thread
From: Hannes Reinecke @ 2021-07-09  8:51 UTC (permalink / raw)
  To: Klaus Jensen; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

On 7/9/21 8:55 AM, Klaus Jensen wrote:
> On Jul  9 08:16, Hannes Reinecke wrote:
>> On 7/9/21 8:05 AM, Klaus Jensen wrote:
>>> On Jul  7 17:49, Klaus Jensen wrote:
>>>> From: Klaus Jensen <k.jensen@samsung.com>
>>>>
>>>> Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
>>>> discussed a bit back and fourth and I mentioned that the core issue was
>>>> an artifact of the parent/child relationship stemming from the qdev
>>>> setup we have with namespaces attaching to controller through a qdev
>>>> bus.
>>>>
>>>> The gist of this series is the fourth patch "hw/nvme: fix controller 
>>>> hot
>>>> unplugging" which basically causes namespaces to be reassigned to a bus
>>>> owned by the subsystem if the parent controller is linked to one. This
>>>> fixes `device_del/add nvme` in such settings.
>>>>
>>>> Note, that in the case that there is no subsystem involved, nvme 
>>>> devices
>>>> can be removed from the system with `device_del`, but this *will* cause
>>>> the namespaces to be removed as well since there is no place (i.e. no
>>>> subsystem) for them to "linger". And since this series does not add
>>>> support for hotplugging nvme-ns devices, while an nvme device can be
>>>> readded, no namespaces can. Support for hotplugging nvme-ns devices is
>>>> present in [1], but I'd rather not add that since I think '-device
>>>> nvme-ns' is already a bad design choice.
>>>>
>>>> Now, I do realize that it is not "pretty" to explicitly change the
>>>> parent bus, so I do have a an RFC patch in queue that replaces the
>>>> subsystem and namespace devices with objects, but keeps -device shims
>>>> available for backwards compatibility. This approach will solve the
>>>> problems properly and should be a better model. However, I don't 
>>>> believe
>>>> it will make it for 6.1 and I'd really like to at least fix the
>>>> unplugging for 6.1 and this gets the job done.
>>>>
>>>>  [1]: 20210511073511.32511-1-hare@suse.de
>>>>
>>>> v2:
>>>> - added R-b's by Hannes for patches 1 through 3
>>>> - simplified "hw/nvme: fix controller hot unplugging"
>>>>
>>>> Klaus Jensen (4):
>>>>  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
>>>>  hw/nvme: mark nvme-subsys non-hotpluggable
>>>>  hw/nvme: unregister controller with subsystem at exit
>>>>  hw/nvme: fix controller hot unplugging
>>>>
>>>> hw/nvme/nvme.h   | 18 +++++++++-------
>>>> hw/nvme/ctrl.c   | 14 ++++++------
>>>> hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
>>>> hw/nvme/subsys.c |  9 ++++++++
>>>> 4 files changed, 63 insertions(+), 33 deletions(-)
>>>>
>>>> -- 
>>>> 2.32.0
>>>>
>>>
>>> Applied patches 1 through 3 to nvme-next.
>>
>> So, how do we go about with patch 4?
>> Without it this whole exercise is a bit pointless, seeing that it 
>> doesn't fix anything.
>>
> 
> Patch 1-3 are fixes we need anyway, so I thought I might as well apply 
> them :)
> 
>> Shall we go with that patch as an interim solution?
>> Will you replace it with your 'object' patch?
>> What is the plan?
>>
> 
> Yes, if acceptable, I would like to use patch 4 as an interim solution. 
> We have a bug we need to fix for 6.1, and I believe this does the job.
> 
Oh, yes, it does. But it's ever so slightly ugly with the reparenting 
stuff. But if that's considered an interim solution I'm fine with it.

You can add my 'Reviewed-by: Hannes Reinecke <hare@suse.de>' tag if you 
like.

> I considered changing the existing nvme-bus to be on the main system 
> bus, but then we break the existing behavior that the namespaces attach 
> to the most recently defined controller in the absence of the shared 
> parameter or an explicit bus parameter.
> 
Do we?
My idea was to always attach a namespace to a subsystem (and, if not 
present, create one). The controller would then 'link' to that 
subsystem. The subsystem would have a 'shared' attribute, which would 
determine if more than one controller can be 'linked' to it.

That way we change the relationship between the controller and the 
namespace, as then the namespace would be a child of the subsystem,
and the namespace would need to be detached separately from the controller.

But it fits neatly into the current device model, except the slightly 
awkward 'link' thingie.

> Wrt. "the plan", right now, I see two solutions going forward:
> 
> 1. Introduce new -object's for nvme-nvm-subsystem and nvme-ns
>     This is the approach that I am taking right now and it works well. 
> It allows many-to-many relationships and separates the life times of 
> subsystems, namespaces and controllers like you mentioned.
> 

Ah. Would like to see that path, then.

>     Conceptually, I also really like that the subsystem and namespace 
> are not "devices". One could argue that the namespace is comparable 
> to a SCSI LUN (-device scsi-hd, right?), but where the SCSI LUN 
> actually "shows up" in the host, the nvme namespace does not.
> 

Well, 'devices' really is an abstraction, and it really depends on what 
you declare a device is. But yes, in the QDEV sense with its strict 
inheritance the nvme topology is not a good fit, agreed.

As for SCSI: the namespace is quite comparable to a SCSI LUN; the NVMe 
controller is roughly equivalent to the 'initiator' on SCSI, and the 
subsystem would match up to the SCSI target.

The problem for NVMe is that the whole NVMe-over-Fabrics stuff was 
layered on top of the existing NVMe-PCI spec, so that the 'subsystem' 
only truly exists for NVMe-over-Fabrics; for PCI you don't actually need 
it, and indeed some NVMe PCI devices don't even fill out these values.
And it makes things tricky for qemu, as the nvme emulation is actually 
based on the pre-fabrics spec, hence subsystem concept was never 
implemented properly.

>     My series handles backwards compatibility by keeping -device "shims" 
>     around that just wraps the new objects but behaves like it used to. 
>     The plan would be to deprecate these devices.
> 

Or keeping the '-device' shims around for just nvme-pci, and require 
-object specification if one would want to use nvme-over-fabrics.

>     The downside to this approach is that it moves the subsystem and    
> namespaces out of the "qdev tree (info qtree)" and into the pure QOM    
> "/objects" tree. Instead of qtree, we can have QMP and HMP commands    
> for introspection.
> 

Serves them right for introducing tons of different abstractions.
Not a problem from my side.

> 2. Make the subsystem a "system bus device"
>     This way we add an "nvme-nvm-subsystems" bus as a direct child of 
> the main system bus, and we can possibly get rid of the explicit 
> -device nvme-subsys as well. We change the namespace device to plug 
> into that instead. The nvme controller device still needs to plug 
> into the PCI bus, so it cannot be a child of the subsystems bus, but 
> can keep using a link parameter to hook into the subsystem and attach 
> to any namespaces it would like.
> 

I don't think we can or should do away with the subsystem; that's quite 
a central structure in the nvme-oF spec, and trying to create an 
abstraction without it will just lead to lots of duplicated 
identification, not to mention the increased complexity during lookup
(As per spec, the controller connects to a subsystem, and the subsystem 
presents the namespaces. Abstracting away the subsystem would mean that 
you have to have lots of tracking in the individual namespace, with lots 
of possibilities to get it wrong.)

But from my perspective it should be perfectly feasible to have the 
subsystem a child of the main/system bus, and the controller a child of 
the PCI bus.

As mentioned above, that would break the implicit destruction of the 
namespace when detaching the controller, but one could argue that that's 
exactly the point, seeing that several controllers can have access to 
the same namespace.

>     I'm unsure if we can do this without deprecating the existing    
> namespace device, just like option 1.
> 
>     I have not implemented this, so I need to look more into it. It 
> seems like the main thing that this gives us compared to 1) is `info 
> qtree`support and we still end up just "wiring" namespace attachment 
> with backlinks anyway.
> 

Yeah, we'll have to do wiring one way or other.

> I'm not sure what I would prefer, but I've found that implementing it as 
> -object's is a breath of fresh air and as I mentioned, conceptually, I 
> like option 1 because namespaces are -objects and not -devices.
> 

Sure. I just tend leave the infrastructure questions to those actively 
working with the qemu community. I've found the qemu development process 
to be too unwieldy for me to make more than the random contribution.

> And, by the way, thanks for chipping in on this Hannes, I had sort of 
> crossed off option 2 before you showed up and threw some ideas in the 
> air ;)

Sure.
I could give it a go at option 2); patch 4 should be a good starting 
point. And shouldn't be too hard to implement, either.

Then we can compare results and make a judgement call.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/4] hw/nvme: fix controller hotplugging
  2021-07-09  8:51       ` Hannes Reinecke
@ 2021-07-09 10:08         ` Klaus Jensen
  0 siblings, 0 replies; 14+ messages in thread
From: Klaus Jensen @ 2021-07-09 10:08 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Keith Busch, Klaus Jensen, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 11133 bytes --]

On Jul  9 10:51, Hannes Reinecke wrote:
> On 7/9/21 8:55 AM, Klaus Jensen wrote:
> > On Jul  9 08:16, Hannes Reinecke wrote:
> > > On 7/9/21 8:05 AM, Klaus Jensen wrote:
> > > > On Jul  7 17:49, Klaus Jensen wrote:
> > > > > From: Klaus Jensen <k.jensen@samsung.com>
> > > > > 
> > > > > Back in May, Hannes posted a fix[1] to re-enable NVMe PCI hotplug. We
> > > > > discussed a bit back and fourth and I mentioned that the core issue was
> > > > > an artifact of the parent/child relationship stemming from the qdev
> > > > > setup we have with namespaces attaching to controller through a qdev
> > > > > bus.
> > > > > 
> > > > > The gist of this series is the fourth patch "hw/nvme: fix
> > > > > controller hot
> > > > > unplugging" which basically causes namespaces to be reassigned to a bus
> > > > > owned by the subsystem if the parent controller is linked to one. This
> > > > > fixes `device_del/add nvme` in such settings.
> > > > > 
> > > > > Note, that in the case that there is no subsystem involved,
> > > > > nvme devices
> > > > > can be removed from the system with `device_del`, but this *will* cause
> > > > > the namespaces to be removed as well since there is no place (i.e. no
> > > > > subsystem) for them to "linger". And since this series does not add
> > > > > support for hotplugging nvme-ns devices, while an nvme device can be
> > > > > readded, no namespaces can. Support for hotplugging nvme-ns devices is
> > > > > present in [1], but I'd rather not add that since I think '-device
> > > > > nvme-ns' is already a bad design choice.
> > > > > 
> > > > > Now, I do realize that it is not "pretty" to explicitly change the
> > > > > parent bus, so I do have a an RFC patch in queue that replaces the
> > > > > subsystem and namespace devices with objects, but keeps -device shims
> > > > > available for backwards compatibility. This approach will solve the
> > > > > problems properly and should be a better model. However, I
> > > > > don't believe
> > > > > it will make it for 6.1 and I'd really like to at least fix the
> > > > > unplugging for 6.1 and this gets the job done.
> > > > > 
> > > > >  [1]: 20210511073511.32511-1-hare@suse.de
> > > > > 
> > > > > v2:
> > > > > - added R-b's by Hannes for patches 1 through 3
> > > > > - simplified "hw/nvme: fix controller hot unplugging"
> > > > > 
> > > > > Klaus Jensen (4):
> > > > >  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
> > > > >  hw/nvme: mark nvme-subsys non-hotpluggable
> > > > >  hw/nvme: unregister controller with subsystem at exit
> > > > >  hw/nvme: fix controller hot unplugging
> > > > > 
> > > > > hw/nvme/nvme.h   | 18 +++++++++-------
> > > > > hw/nvme/ctrl.c   | 14 ++++++------
> > > > > hw/nvme/ns.c     | 55 +++++++++++++++++++++++++++++++-----------------
> > > > > hw/nvme/subsys.c |  9 ++++++++
> > > > > 4 files changed, 63 insertions(+), 33 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 2.32.0
> > > > > 
> > > > 
> > > > Applied patches 1 through 3 to nvme-next.
> > > 
> > > So, how do we go about with patch 4?
> > > Without it this whole exercise is a bit pointless, seeing that it
> > > doesn't fix anything.
> > > 
> > 
> > Patch 1-3 are fixes we need anyway, so I thought I might as well apply
> > them :)
> > 
> > > Shall we go with that patch as an interim solution?
> > > Will you replace it with your 'object' patch?
> > > What is the plan?
> > > 
> > 
> > Yes, if acceptable, I would like to use patch 4 as an interim solution.
> > We have a bug we need to fix for 6.1, and I believe this does the job.
> > 
> Oh, yes, it does. But it's ever so slightly ugly with the reparenting stuff.
> But if that's considered an interim solution I'm fine with it.
> 

Definitely interim.

> You can add my 'Reviewed-by: Hannes Reinecke <hare@suse.de>' tag if you
> like.
> 

Thanks mate!

> > I considered changing the existing nvme-bus to be on the main system
> > bus, but then we break the existing behavior that the namespaces attach
> > to the most recently defined controller in the absence of the shared
> > parameter or an explicit bus parameter.
> > 
> Do we?

Yes, I believe so. Right now there is an implicit and documented
behavior that an nvme-ns device attaches to the most recently defined
nvme device (due to the implicit bus attachment). If we move the bus
from the controller so that the namespace plugs into the subsystem when
realized, then we break existing setups since the namespace won't know
what controller to "attach to" without an additional parameter. Right
now, this is determined by inspecting the parent of the bus it plugged
into.

> My idea was to always attach a namespace to a subsystem (and, if not
> present, create one). The controller would then 'link' to that subsystem.
> The subsystem would have a 'shared' attribute, which would determine if more
> than one controller can be 'linked' to it.
> 

Hmm, why would we want to have a subsystem restricted to one controller?
Or do you mean namespace? Because we already have a 'shared' parameter
for that.

> That way we change the relationship between the controller and the
> namespace, as then the namespace would be a child of the subsystem,
> and the namespace would need to be detached separately from the controller.
> 
> But it fits neatly into the current device model, except the slightly
> awkward 'link' thingie.
> 
> > Wrt. "the plan", right now, I see two solutions going forward:
> > 
> > 1. Introduce new -object's for nvme-nvm-subsystem and nvme-ns
> >     This is the approach that I am taking right now and it works well.
> > It allows many-to-many relationships and separates the life times of
> > subsystems, namespaces and controllers like you mentioned.
> > 
> 
> Ah. Would like to see that path, then.
> 
> >     Conceptually, I also really like that the subsystem and namespace
> > are not "devices". One could argue that the namespace is comparable to a
> > SCSI LUN (-device scsi-hd, right?), but where the SCSI LUN actually
> > "shows up" in the host, the nvme namespace does not.
> > 
> 
> Well, 'devices' really is an abstraction, and it really depends on what you
> declare a device is. But yes, in the QDEV sense with its strict inheritance
> the nvme topology is not a good fit, agreed.
> 
> As for SCSI: the namespace is quite comparable to a SCSI LUN; the NVMe
> controller is roughly equivalent to the 'initiator' on SCSI, and the
> subsystem would match up to the SCSI target.
> 
> The problem for NVMe is that the whole NVMe-over-Fabrics stuff was layered
> on top of the existing NVMe-PCI spec, so that the 'subsystem' only truly
> exists for NVMe-over-Fabrics; for PCI you don't actually need it, and indeed
> some NVMe PCI devices don't even fill out these values.
> And it makes things tricky for qemu, as the nvme emulation is actually based
> on the pre-fabrics spec, hence subsystem concept was never implemented
> properly.
> 
> >     My series handles backwards compatibility by keeping -device "shims"
> >    around that just wraps the new objects but behaves like it used to.
> >    The plan would be to deprecate these devices.
> > 
> 
> Or keeping the '-device' shims around for just nvme-pci, and require -object
> specification if one would want to use nvme-over-fabrics.
> 

I actually hadn't considered that, but you are right that moving
functionality into the objects has the additional advantage of maybe
better supporting the implementation of non-pci devices down the road.

> >     The downside to this approach is that it moves the subsystem and
> > namespaces out of the "qdev tree (info qtree)" and into the pure QOM
> > "/objects" tree. Instead of qtree, we can have QMP and HMP commands
> > for introspection.
> > 
> 
> Serves them right for introducing tons of different abstractions.
> Not a problem from my side.
> 
> > 2. Make the subsystem a "system bus device"
> >     This way we add an "nvme-nvm-subsystems" bus as a direct child of
> > the main system bus, and we can possibly get rid of the explicit -device
> > nvme-subsys as well. We change the namespace device to plug into that
> > instead. The nvme controller device still needs to plug into the PCI
> > bus, so it cannot be a child of the subsystems bus, but can keep using a
> > link parameter to hook into the subsystem and attach to any namespaces
> > it would like.
> > 
> 
> I don't think we can or should do away with the subsystem; that's quite a
> central structure in the nvme-oF spec, and trying to create an abstraction
> without it will just lead to lots of duplicated identification, not to
> mention the increased complexity during lookup
> (As per spec, the controller connects to a subsystem, and the subsystem
> presents the namespaces. Abstracting away the subsystem would mean that you
> have to have lots of tracking in the individual namespace, with lots of
> possibilities to get it wrong.)
> 

I tend to agree, having the subsystem explicit is preferable.

> But from my perspective it should be perfectly feasible to have the
> subsystem a child of the main/system bus, and the controller a child of the
> PCI bus.
> 
> As mentioned above, that would break the implicit destruction of the
> namespace when detaching the controller, but one could argue that that's
> exactly the point, seeing that several controllers can have access to the
> same namespace.
> 
> >     I'm unsure if we can do this without deprecating the existing
> > namespace device, just like option 1.
> > 
> >     I have not implemented this, so I need to look more into it. It
> > seems like the main thing that this gives us compared to 1) is `info
> > qtree`support and we still end up just "wiring" namespace attachment
> > with backlinks anyway.
> > 
> 
> Yeah, we'll have to do wiring one way or other.
> 
> > I'm not sure what I would prefer, but I've found that implementing it as
> > -object's is a breath of fresh air and as I mentioned, conceptually, I
> > like option 1 because namespaces are -objects and not -devices.
> > 
> 
> Sure. I just tend leave the infrastructure questions to those actively
> working with the qemu community. I've found the qemu development process to
> be too unwieldy for me to make more than the random contribution.
> 
> > And, by the way, thanks for chipping in on this Hannes, I had sort of
> > crossed off option 2 before you showed up and threw some ideas in the
> > air ;)
> 
> Sure.
> I could give it a go at option 2); patch 4 should be a good starting point.
> And shouldn't be too hard to implement, either.
> 
> Then we can compare results and make a judgement call.
> 

Yeah, basically the "unattached" bus that I create explicitly in patch 4
now would now be rooted on the System bus. But I still can't see how not
to break the existing behavior of how namespaces attaches to controllers
implicitly without keeping the reparenting in place.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-07-09 10:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07 15:49 [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
2021-07-07 15:49 ` [PATCH v2 1/4] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions Klaus Jensen
2021-07-07 15:49 ` [PATCH v2 2/4] hw/nvme: mark nvme-subsys non-hotpluggable Klaus Jensen
2021-07-07 15:49 ` [PATCH v2 3/4] hw/nvme: unregister controller with subsystem at exit Klaus Jensen
2021-07-07 15:49 ` [PATCH v2 4/4] hw/nvme: fix controller hot unplugging Klaus Jensen
2021-07-07 15:57   ` Hannes Reinecke
2021-07-07 16:56     ` Klaus Jensen
2021-07-08  5:16       ` Klaus Jensen
2021-07-09  6:05 ` [PATCH v2 0/4] hw/nvme: fix controller hotplugging Klaus Jensen
2021-07-09  6:16   ` Hannes Reinecke
2021-07-09  6:55     ` Klaus Jensen
2021-07-09  6:59       ` Klaus Jensen
2021-07-09  8:51       ` Hannes Reinecke
2021-07-09 10:08         ` Klaus Jensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.