* [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
@ 2018-05-17  8:15 David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space David Hildenbrand
                   ` (14 more replies)
  0 siblings, 15 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

We can have devices that need certain other resources that are e.g.
system resources managed by the machine. We need a clean way to assign
these resources (without violating layers as brought up by Igor).

One example is virtio-mem/virtio-pmem. Both device types need to be
assigned some region in guest physical address space. This device memory
belongs to the machine and is managed by it. However, virtio devices are
hotplugged using the hotplug handler their proxy device implements. So we
could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
hotplug handler for virtio-ccw. But definitely not the machine.

Now, we can route other devices through the machine hotplug handler, to
properly assign/unassign resources - like a portion in guest physical
address space.

v3 -> v4:
- Removed the s390x bits, will send that out separately (was just a proof
  that it works just fine with s390x)
- Fixed a typo and reworded a comment

v2 -> v3:
- Added "memory-device: introduce separate config option"
- Dropped "parent_bus" check from hotplug handler lookup functions
- "Handly" -> "Handle" in patch description.

v1 -> v2:
- Use multi stage hotplug handler instead of resource handler
- MemoryDevices only compiled if necessary (CONFIG_MEM_HOTPLUG)
- Prepare PC/SPAPR machines properly for multi stage hotplug handlers
- Route SPAPR unplug code via the hotunplug handler
- Directly include s390x support. But there are no usable memory devices
  yet (well, only my virtio-mem prototype)
- Included "memory-device: drop assert related to align and start of address
  space"

David Hildenbrand (13):
  memory-device: drop assert related to align and start of address space
  memory-device: introduce separate config option
  pc: prepare for multi stage hotplug handlers
  pc: route all memory devices through the machine hotplug handler
  spapr: prepare for multi stage hotplug handlers
  spapr: route all memory devices through the machine hotplug handler
  spapr: handle pc-dimm unplug via hotplug handler chain
  spapr: handle cpu core unplug via hotplug handler chain
  memory-device: new functions to handle plug/unplug
  pc-dimm: implement new memory device functions
  memory-device: factor out pre-plug into hotplug handler
  memory-device: factor out unplug into hotplug handler
  memory-device: factor out plug into hotplug handler

Igor Mammedov (1):
  qdev: let machine hotplug handler to override bus hotplug handler

 default-configs/i386-softmmu.mak   |   3 +-
 default-configs/ppc64-softmmu.mak  |   3 +-
 default-configs/x86_64-softmmu.mak |   3 +-
 hw/Makefile.objs                   |   2 +-
 hw/core/qdev.c                     |   6 +-
 hw/i386/pc.c                       | 102 ++++++++++++++++++++++-------
 hw/mem/Makefile.objs               |   4 +-
 hw/mem/memory-device.c             | 129 +++++++++++++++++++++++--------------
 hw/mem/pc-dimm.c                   |  48 ++++++--------
 hw/mem/trace-events                |   4 +-
 hw/ppc/spapr.c                     | 129 +++++++++++++++++++++++++++++++------
 include/hw/mem/memory-device.h     |  21 ++++--
 include/hw/mem/pc-dimm.h           |   3 +-
 include/hw/qdev-core.h             |  11 ++++
 qapi/misc.json                     |   2 +-
 15 files changed, 330 insertions(+), 140 deletions(-)

-- 
2.14.3


* [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-29 13:27   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option David Hildenbrand
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

The start of the address space does not have to be aligned for the
search. Handle this case explicitly when starting the search for a new
address.
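
As a rough illustration of the change (not the code in this patch), the
sketch below rounds an unaligned start of the address space up to the
requested alignment before the search begins. align_up() mirrors the
arithmetic of QEMU_ALIGN_UP; first_candidate() and the addresses used
with it are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of m. Same arithmetic as QEMU_ALIGN_UP;
 * the function name is only used in this sketch. */
static uint64_t align_up(uint64_t n, uint64_t m)
{
    assert(m != 0);
    return (n + m - 1) / m * m;
}

/* Hypothetical helper: the first address the search would try when no
 * hint is given. An unaligned start is rounded up instead of asserted. */
static uint64_t first_candidate(uint64_t address_space_start, uint64_t align)
{
    return align_up(address_space_start, align);
}
```

An already aligned start is returned unchanged, so existing setups should
see the same first candidate address as before.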

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/memory-device.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 3e04f3954e..361d38bfc5 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -116,7 +116,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
     address_space_start = ms->device_memory->base;
     address_space_end = address_space_start +
                         memory_region_size(&ms->device_memory->mr);
-    g_assert(QEMU_ALIGN_UP(address_space_start, align) == address_space_start);
     g_assert(address_space_end >= address_space_start);
 
     memory_device_check_addable(ms, size, errp);
@@ -149,7 +148,7 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
             return 0;
         }
     } else {
-        new_addr = address_space_start;
+        new_addr = QEMU_ALIGN_UP(address_space_start, align);
     }
 
     /* find address range that will fit new memory device */
-- 
2.14.3


* [Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-30 12:58   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler David Hildenbrand
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Some architectures might support memory devices, while they don't
support DIMM/NVDIMM. So let's
- Rename CONFIG_MEM_HOTPLUG to CONFIG_MEM_DEVICE
- Introduce CONFIG_DIMM and use it similarly to CONFIG_NVDIMM

CONFIG_DIMM and CONFIG_NVDIMM require CONFIG_MEM_DEVICE.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 default-configs/i386-softmmu.mak   | 3 ++-
 default-configs/ppc64-softmmu.mak  | 3 ++-
 default-configs/x86_64-softmmu.mak | 3 ++-
 hw/Makefile.objs                   | 2 +-
 hw/mem/Makefile.objs               | 4 ++--
 qapi/misc.json                     | 2 +-
 6 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 8c7d4a0fa0..4c1637338b 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
 CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index b94af6c7c6..f550573782 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -16,4 +16,5 @@ CONFIG_VIRTIO_VGA=y
 CONFIG_XICS=$(CONFIG_PSERIES)
 CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 0390b4303c..7785351414 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
 CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
-CONFIG_MEM_HOTPLUG=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 6a0ffe0afd..127a60eca4 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -33,7 +33,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_SOFTMMU) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
+devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-$(CONFIG_SOFTMMU) += smbios/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 10be4df2a2..3e2f7c5ca2 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,3 +1,3 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
-common-obj-$(CONFIG_MEM_HOTPLUG) += memory-device.o
+common-obj-$(CONFIG_DIMM) += pc-dimm.o
+common-obj-$(CONFIG_MEM_DEVICE) += memory-device.o
 common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/qapi/misc.json b/qapi/misc.json
index f5988cc0b5..4e6265cd2e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2060,7 +2060,7 @@
 #
 # @plugged-memory: size of memory that can be hot-unplugged. This field
 #                  is omitted if target doesn't support memory hotplug
-#                  (i.e. CONFIG_MEM_HOTPLUG not defined on build time).
+#                  (i.e. CONFIG_MEM_DEVICE not defined at build time).
 #
 # Since: 2.11.0
 ##
-- 
2.14.3


* [Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-05  1:02   ` David Gibson
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers David Hildenbrand
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

From: Igor Mammedov <imammedo@redhat.com>

This allows the machine to return a hotplug handler other than the
default one for a specific bus-based device type. That is needed to
handle non-trivial plug/unplug sequences that need access to resources
configured outside of the bus the device is attached to.

The returned hotplug handler can then orchestrate wiring in arbitrary
order, chaining other hotplug handlers when needed.

PS:
It could be used for hybrid virtio-mem and virtio-pmem devices,
where it will return the machine as hotplug handler. The machine will
do the necessary wiring at machine level and then pass control down
the chain to the bus-specific hotplug handler.

Example of top level hotplug handler override and custom plug sequence:

  some_machine_get_hotplug_handler(machine, dev) {
      if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
          return HOTPLUG_HANDLER(machine);
      }
      return NULL;
  }

  some_machine_device_plug(hotplug_dev, dev) {
      if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
          /* do machine specific initialization */
          some_machine_init_special_device(dev);

          /* pass control to bus specific handler */
          hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev);
      }
  }
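
The lookup order this patch introduces can be modelled outside of QEMU
roughly as follows. This is only a sketch: the struct definitions are
stripped-down stand-ins for the real types, and the machine_claims flag
is a made-up replacement for the machine's type checks.

```c
#include <stddef.h>

/* Stand-in types; the real QEMU structures carry far more state. */
typedef struct HotplugHandler { const char *name; } HotplugHandler;
typedef struct BusState { HotplugHandler *hotplug_handler; } BusState;
typedef struct DeviceState {
    BusState *parent_bus;
    int machine_claims;   /* hypothetical: machine wants to handle this device */
} DeviceState;

static HotplugHandler machine_handler = { "machine" };

/* Model of the machine hook: returns the machine or NULL. */
static HotplugHandler *get_machine_hotplug_handler(DeviceState *dev)
{
    return dev->machine_claims ? &machine_handler : NULL;
}

/* The machine is asked first; the bus handler is only the fallback. */
static HotplugHandler *get_hotplug_handler(DeviceState *dev)
{
    HotplugHandler *h = get_machine_hotplug_handler(dev);

    if (h == NULL && dev->parent_bus) {
        h = dev->parent_bus->hotplug_handler;
    }
    return h;
}
```

With the old order, a device sitting on a bus with a hotplug handler
would never reach the machine, which is why the override has to be
checked before the bus.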

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/core/qdev.c         |  6 ++----
 include/hw/qdev-core.h | 11 +++++++++++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f6f92473b8..885286f579 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -261,12 +261,10 @@ HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev)
 
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
 {
-    HotplugHandler *hotplug_ctrl;
+    HotplugHandler *hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
 
-    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+    if (hotplug_ctrl == NULL && dev->parent_bus) {
         hotplug_ctrl = dev->parent_bus->hotplug_handler;
-    } else {
-        hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
     }
     return hotplug_ctrl;
 }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9453588160..e6a8eca558 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -286,6 +286,17 @@ void qdev_init_nofail(DeviceState *dev);
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version);
 HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev);
+/**
+ * qdev_get_hotplug_handler: Get handler responsible for device wiring
+ *
+ * Find HOTPLUG_HANDLER for @dev that provides [pre|un]plug callbacks for it.
+ *
+ * Note: in case @dev has a parent bus, it will be returned as handler unless
+ * machine handler overrides it.
+ *
+ * Returns: pointer to object that implements TYPE_HOTPLUG_HANDLER interface
+ *          or NULL if there aren't any.
+ */
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
-- 
2.14.3


* [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (2 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-30 13:08   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler David Hildenbrand
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

For multi stage hotplug handlers, we'll have to do some error handling
in some hotplug functions, so let's use a local error variable (except
for unplug requests).

Also, add code to pass control to the final stage hotplug handler at the
parent bus.
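
To illustrate why a local error variable is needed: the caller may pass
NULL as errp, so a multi stage handler cannot look at *errp to decide
whether to continue with the next stage. The following is a minimal,
self-contained model of the pattern. Error, err_set() and
err_propagate() are simplified stand-ins written for this sketch, not
QEMU's Error API, and the stage functions are made up.

```c
#include <stdlib.h>

/* Simplified stand-in for an error object. */
typedef struct Error { const char *msg; } Error;

static void err_set(Error **errp, const char *msg)
{
    if (errp) {
        *errp = malloc(sizeof(**errp));
        if (!*errp) {
            abort();
        }
        (*errp)->msg = msg;
    }
}

/* Hand a local error to the caller, or free it if the caller passed NULL
 * or already holds an error. */
static void err_propagate(Error **dst, Error *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

static int stage_two_calls;   /* counts how often the second stage ran */

static void stage_one(int fail, Error **errp)
{
    if (fail) {
        err_set(errp, "stage one failed");
    }
}

static void two_stage_plug(int fail_first, Error **errp)
{
    Error *local_err = NULL;

    stage_one(fail_first, &local_err);
    if (local_err) {          /* safe even if errp == NULL */
        goto out;
    }
    stage_two_calls++;
out:
    err_propagate(errp, local_err);
}
```

The second stage is skipped after a failure regardless of whether the
caller is interested in the error.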

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/i386/pc.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d768930d02..510076e156 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2007,19 +2007,32 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
 static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
                                           DeviceState *dev, Error **errp)
 {
+    Error *local_err = NULL;
+
+    /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-        pc_cpu_pre_plug(hotplug_dev, dev, errp);
+        pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
+                                 &local_err);
     }
+    error_propagate(errp, local_err);
 }
 
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
+    Error *local_err = NULL;
+
+    /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_plug(hotplug_dev, dev, errp);
+        pc_dimm_plug(hotplug_dev, dev, &local_err);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-        pc_cpu_plug(hotplug_dev, dev, errp);
+        pc_cpu_plug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
     }
+    error_propagate(errp, local_err);
 }
 
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
@@ -2029,7 +2042,10 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
         pc_dimm_unplug_request(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
-    } else {
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
+                                       errp);
+    } else if (!dev->parent_bus) {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
     }
@@ -2038,14 +2054,21 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
+    Error *local_err = NULL;
+
+    /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_unplug(hotplug_dev, dev, errp);
+        pc_dimm_unplug(hotplug_dev, dev, &local_err);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
-        pc_cpu_unplug_cb(hotplug_dev, dev, errp);
-    } else {
-        error_setg(errp, "acpi: device unplug for not supported device"
+        pc_cpu_unplug_cb(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
+                               &local_err);
+    } else if (!dev->parent_bus) {
+        error_setg(&local_err, "acpi: device unplug for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
     }
+    error_propagate(errp, local_err);
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
-- 
2.14.3


* [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (3 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-30 13:12   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers David Hildenbrand
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Necessary to hotplug them cleanly later. We could drop the PC_DIMM
check, as PC_DIMM are just memory devices, but this approach is cleaner.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/i386/pc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 510076e156..8bc41ef24b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -74,6 +74,7 @@
 #include "hw/nmi.h"
 #include "hw/i386/intel_iommu.h"
 #include "hw/net/ne2000-isa.h"
+#include "hw/mem/memory-device.h"
 
 /* debug PC/ISA interrupts */
 //#define DEBUG_IRQ
@@ -2075,6 +2076,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
                                              DeviceState *dev)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         return HOTPLUG_HANDLER(machine);
     }
-- 
2.14.3


* [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (4 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-17 12:43   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
                     ` (2 more replies)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler David Hildenbrand
                   ` (8 subsequent siblings)
  14 siblings, 3 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

For multi stage hotplug handlers, we'll have to do some error handling
in some hotplug functions, so let's use a local error variable (except
for unplug requests).

Also, add code to pass control to the final stage hotplug handler at the
parent bus.
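
The "pass control to the parent bus" part can be pictured with the
following stand-alone model. All types are reduced stand-ins, the
is_dimm flag replaces the object_dynamic_cast() checks, and the plugged
counters exist only to make the flow observable; none of this is the
actual spapr code.

```c
#include <stddef.h>

typedef struct DeviceState DeviceState;

/* Stand-in handler: one plug callback plus a counter for the sketch. */
typedef struct HotplugHandler {
    void (*plug)(struct HotplugHandler *h, DeviceState *dev);
    int plugged;
} HotplugHandler;

typedef struct BusState { HotplugHandler *hotplug_handler; } BusState;

struct DeviceState {
    BusState *parent_bus;
    int is_dimm;              /* hypothetical replacement for the type check */
};

/* Bus-level handler: just records that it was reached. */
static void count_plug(HotplugHandler *h, DeviceState *dev)
{
    (void)dev;
    h->plugged++;
}

/* Machine-level handler: either it is the final stage itself, or it
 * forwards to the hotplug handler of the parent bus. */
static void machine_plug(HotplugHandler *machine, DeviceState *dev)
{
    if (dev->is_dimm) {
        machine->plugged++;
    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
        HotplugHandler *bus_h = dev->parent_bus->hotplug_handler;

        bus_h->plug(bus_h, dev);
    }
}
```

A device without a parent bus that the machine does not know about is
silently ignored in this model, which matches the plug path in the diff
below.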

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ebf30dd60b..b7c5c95f7a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 {
     MachineState *ms = MACHINE(hotplug_dev);
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
+    Error *local_err = NULL;
 
+    /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         int node;
 
         if (!smc->dr_lmb_enabled) {
-            error_setg(errp, "Memory hotplug not supported for this machine");
-            return;
+            error_setg(&local_err,
+                       "Memory hotplug not supported for this machine");
+            goto out;
         }
-        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
-        if (*errp) {
-            return;
+        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
+                                        &local_err);
+        if (local_err) {
+            goto out;
         }
         if (node < 0 || node >= MAX_NODES) {
-            error_setg(errp, "Invaild node %d", node);
-            return;
+            error_setg(&local_err, "Invaild node %d", node);
+            goto out;
         }
 
-        spapr_memory_plug(hotplug_dev, dev, node, errp);
+        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
-        spapr_core_plug(hotplug_dev, dev, errp);
+        spapr_core_plug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
+    }
+out:
+    error_propagate(errp, local_err);
+}
+
+static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
+                                        DeviceState *dev, Error **errp)
+{
+    Error *local_err = NULL;
+
+    /* final stage hotplug handler */
+    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
+                               &local_err);
     }
+    error_propagate(errp, local_err);
 }
 
 static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
@@ -3618,17 +3639,27 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
             return;
         }
         spapr_core_unplug_request(hotplug_dev, dev, errp);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
+                                       errp);
     }
 }
 
 static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
                                           DeviceState *dev, Error **errp)
 {
+    Error *local_err = NULL;
+
+    /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        spapr_memory_pre_plug(hotplug_dev, dev, errp);
+        spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
-        spapr_core_pre_plug(hotplug_dev, dev, errp);
+        spapr_core_pre_plug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
+                                 &local_err);
     }
+    error_propagate(errp, local_err);
 }
 
 static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
@@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
     hc->unplug_request = spapr_machine_device_unplug_request;
+    hc->unplug = spapr_machine_device_unplug;
 
     smc->dr_lmb_enabled = true;
     mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
-- 
2.14.3


* [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (5 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-05  1:09   ` David Gibson
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain David Hildenbrand
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Necessary to hotplug them cleanly later.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/ppc/spapr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b7c5c95f7a..2f315f963b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3666,6 +3666,7 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
                                                  DeviceState *dev)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
         object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         return HOTPLUG_HANDLER(machine);
     }
-- 
2.14.3


* [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (6 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-01 10:53   ` Igor Mammedov
  2018-06-05  1:12   ` David Gibson
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core " David Hildenbrand
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Let's handle it via hotplug_handler_unplug(). This is needed, e.g., to
hotplug/unplug memory devices (which a pc-dimm is) later.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/ppc/spapr.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2f315f963b..286c38c842 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3291,7 +3291,8 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
 /* Callback to be called during DRC release. */
 void spapr_lmb_release(DeviceState *dev)
 {
-    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
+    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
     sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
 
     /* This information will get lost if a migration occurs
@@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
 
     /*
      * Now that all the LMBs have been removed by the guest, call the
-     * pc-dimm unplug handler to cleanup up the pc-dimm device.
+     * unplug handler chain. This can never fail.
      */
-    pc_dimm_memory_unplug(dev, MACHINE(spapr));
+    hotplug_ctrl = qdev_get_hotplug_handler(dev);
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
+    sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
+
+    g_assert(ds);
+    g_assert(!ds->nr_lmbs);
+    pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
     object_unparent(OBJECT(dev));
     spapr_pending_dimm_unplugs_remove(spapr, ds);
 }
@@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
     Error *local_err = NULL;
 
     /* final stage hotplug handler */
-    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        spapr_memory_unplug(hotplug_dev, dev, &local_err);
+    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
         hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
                                &local_err);
     }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (7 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-01 10:57   ` Igor Mammedov
  2018-06-05  1:13   ` David Gibson
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 10/14] memory-device: new functions to handle plug/unplug David Hildenbrand
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Let's handle it via hotplug_handler_unplug().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/ppc/spapr.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 286c38c842..13d153b5a6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
 /* Callback to be called during DRC release. */
 void spapr_core_release(DeviceState *dev)
 {
-    MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
+    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+
+    /* Call the unplug handler chain. This can never fail. */
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                              Error **errp)
+{
+    MachineState *ms = MACHINE(hotplug_dev);
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
     CPUCore *cc = CPU_CORE(dev);
     CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
@@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
     /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         spapr_memory_unplug(hotplug_dev, dev, &local_err);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
+        spapr_core_unplug(hotplug_dev, dev, &local_err);
     } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
         hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
                                &local_err);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 10/14] memory-device: new functions to handle plug/unplug
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (8 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core " David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 11/14] pc-dimm: implement new memory device functions David Hildenbrand
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

We will need a handful of new functions:
- set_addr(): To set the calculated address
- get_memory_region(): To add it to the memory region container
- get_align(): If the device has any specific alignment requirements

Using these and the existing functions, we can properly plug/unplug
memory devices.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
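Note: a minimal sketch of how a caller might treat required versus optional
class callbacks. The Toy* types and toy_* functions are hypothetical and
only mirror the shape of MemoryDeviceClass. Returning 0 when the optional
callback is missing is an assumption of this sketch, not something the
patch defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state; not the QEMU MemoryDeviceState. */
typedef struct ToyMemoryDevice {
    uint64_t addr;
} ToyMemoryDevice;

/* Mirrors the required/optional split of the class callbacks. */
typedef struct ToyMemoryDeviceClass {
    /* required: callers may invoke these unconditionally */
    uint64_t (*get_addr)(const ToyMemoryDevice *md);
    void (*set_addr)(ToyMemoryDevice *md, uint64_t addr);
    /* optional: may be NULL, callers have to check first */
    uint64_t (*get_align)(const ToyMemoryDevice *md);
} ToyMemoryDeviceClass;

static uint64_t toy_get_addr(const ToyMemoryDevice *md)
{
    return md->addr;
}

static void toy_set_addr(ToyMemoryDevice *md, uint64_t addr)
{
    md->addr = addr;
}

static uint64_t toy_get_align_2m(const ToyMemoryDevice *md)
{
    (void)md;
    return 2 * 1024 * 1024;
}

/* A DIMM-like class that leaves the optional callback unset. */
static const ToyMemoryDeviceClass toy_dimm_class = {
    toy_get_addr, toy_set_addr, NULL
};

/* A class with its own alignment requirement. */
static const ToyMemoryDeviceClass toy_aligned_class = {
    toy_get_addr, toy_set_addr, toy_get_align_2m
};

/* Returns the device-specific alignment, or 0 if none is requested. */
static uint64_t toy_device_align(const ToyMemoryDeviceClass *mdc,
                                 const ToyMemoryDevice *md)
{
    return mdc->get_align ? mdc->get_align(md) : 0;
}
```

Keeping get_align() optional means existing implementers such as pc-dimm
only have to add the two required setters/getters.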
 include/hw/mem/memory-device.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 2853b084b5..62d906be50 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -29,14 +29,24 @@ typedef struct MemoryDeviceState {
     Object parent_obj;
 } MemoryDeviceState;
 
+/*
+ * MemoryDeviceClass functions should only be called on realized
+ * MemoryDevice instances.
+ */
 typedef struct MemoryDeviceClass {
     InterfaceClass parent_class;
 
+    /* required functions that have to be implemented */
     uint64_t (*get_addr)(const MemoryDeviceState *md);
+    void (*set_addr)(MemoryDeviceState *md, uint64_t addr);
+    MemoryRegion *(*get_memory_region)(MemoryDeviceState *md);
     uint64_t (*get_plugged_size)(const MemoryDeviceState *md);
     uint64_t (*get_region_size)(const MemoryDeviceState *md);
     void (*fill_device_info)(const MemoryDeviceState *md,
                              MemoryDeviceInfo *info);
+
+    /* optional functions that can be implemented */
+    uint64_t (*get_align)(const MemoryDeviceState *md);
 } MemoryDeviceClass;
 
 MemoryDeviceInfoList *qmp_memory_device_list(void);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 11/14] pc-dimm: implement new memory device functions
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (9 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 10/14] memory-device: new functions to handle plug/unplug David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler David Hildenbrand
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Implement the new functions. We don't have to care about alignment for
these DIMMs right now, so leave get_align() unimplemented.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/mem/pc-dimm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 12da89d562..5e2e3263ab 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -244,6 +244,21 @@ static uint64_t pc_dimm_md_get_addr(const MemoryDeviceState *md)
     return dimm->addr;
 }
 
+static void pc_dimm_md_set_addr(MemoryDeviceState *md, uint64_t addr)
+{
+    PCDIMMDevice *dimm = PC_DIMM(md);
+
+    dimm->addr = addr;
+}
+
+static MemoryRegion *pc_dimm_md_get_memory_region(MemoryDeviceState *md)
+{
+    const PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(md);
+    PCDIMMDevice *dimm = PC_DIMM(md);
+
+    return ddc->get_memory_region(dimm, &error_abort);
+}
+
 static uint64_t pc_dimm_md_get_region_size(const MemoryDeviceState *md)
 {
     /* dropping const here is fine as we don't touch the memory region */
@@ -304,6 +319,8 @@ static void pc_dimm_class_init(ObjectClass *oc, void *data)
     ddc->get_vmstate_memory_region = pc_dimm_get_vmstate_memory_region;
 
     mdc->get_addr = pc_dimm_md_get_addr;
+    mdc->set_addr = pc_dimm_md_set_addr;
+    mdc->get_memory_region = pc_dimm_md_get_memory_region;
     /* for a dimm plugged_size == region_size */
     mdc->get_plugged_size = pc_dimm_md_get_region_size;
     mdc->get_region_size = pc_dimm_md_get_region_size;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (10 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 11/14] pc-dimm: implement new memory device functions David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-01 11:17   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug " David Hildenbrand
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Let's move all pre-plug checks we can do without the device being
realized into the applicable hotplug handler for pc and spapr.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
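Note: the split of checks can be illustrated with a small model. The
ToyMachine fields and toy_* functions are hypothetical stand-ins for the
machine state and for kvm_enabled()/kvm_has_free_slot()/
vhost_has_free_slot(). Returning a string instead of filling an Error is a
simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state; not the QEMU MachineState. */
typedef struct ToyMachine {
    int has_device_memory;       /* machine supports memory devices */
    uint64_t device_memory_size; /* 0 unless maxmem was specified */
    int kvm_enabled;
    int kvm_free_slots;
    int vhost_free_slots;
} ToyMachine;

/*
 * Models memory_device_pre_plug(): checks that need no realized device.
 * Returns NULL on success, otherwise a description of the failure.
 */
static const char *toy_memory_device_pre_plug(const ToyMachine *ms)
{
    if (!ms->has_device_memory) {
        return "not supported by the machine";
    }
    if (!ms->device_memory_size) {
        return "not enabled, please specify the maxmem option";
    }
    if (ms->kvm_enabled && !ms->kvm_free_slots) {
        return "hypervisor has no free memory slots left";
    }
    if (!ms->vhost_free_slots) {
        return "a used vhost backend has no free memory slots left";
    }
    return NULL;
}

/*
 * Models the check that stays in the address lookup, because it needs
 * the region size and therefore a realized device.
 */
static int toy_exceeds_hotpluggable(uint64_t used, uint64_t size,
                                    uint64_t maxram, uint64_t ram)
{
    return used + size > maxram - ram;
}
```

Only the capacity check depends on the size of the device, which is why it
is the one check that cannot move into the pre-plug stage.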
 hw/i386/pc.c                   | 11 +++++++
 hw/mem/memory-device.c         | 72 +++++++++++++++++++-----------------------
 hw/ppc/spapr.c                 | 11 +++++++
 include/hw/mem/memory-device.h |  2 ++
 4 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8bc41ef24b..61f1537e14 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
 {
     Error *local_err = NULL;
 
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
+                               &local_err);
+    }
+
+    if (local_err) {
+        goto out;
+    }
+
     /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
@@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
         hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
                                  &local_err);
     }
+out:
     error_propagate(errp, local_err);
 }
 
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 361d38bfc5..d22c91993f 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
     return 0;
 }
 
-static void memory_device_check_addable(MachineState *ms, uint64_t size,
-                                        Error **errp)
-{
-    uint64_t used_region_size = 0;
-
-    /* we will need a new memory slot for kvm and vhost */
-    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
-        error_setg(errp, "hypervisor has no free memory slots left");
-        return;
-    }
-    if (!vhost_has_free_slot()) {
-        error_setg(errp, "a used vhost backend has no free memory slots left");
-        return;
-    }
-
-    /* will we exceed the total amount of memory specified */
-    memory_device_used_region_size(OBJECT(ms), &used_region_size);
-    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
-        error_setg(errp, "not enough space, currently 0x%" PRIx64
-                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
-                   used_region_size, ms->maxram_size - ms->ram_size);
-        return;
-    }
-
-}
-
 uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
                                      uint64_t align, uint64_t size,
                                      Error **errp)
 {
     uint64_t address_space_start, address_space_end;
+    uint64_t used_region_size = 0;
     GSList *list = NULL, *item;
     uint64_t new_addr = 0;
 
-    if (!ms->device_memory) {
-        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
-                         "supported by the machine");
-        return 0;
-    }
-
-    if (!memory_region_size(&ms->device_memory->mr)) {
-        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
-                         "enabled, please specify the maxmem option");
-        return 0;
-    }
     address_space_start = ms->device_memory->base;
     address_space_end = address_space_start +
                         memory_region_size(&ms->device_memory->mr);
     g_assert(address_space_end >= address_space_start);
 
-    memory_device_check_addable(ms, size, errp);
-    if (*errp) {
+    /* will we exceed the total amount of memory specified */
+    memory_device_used_region_size(OBJECT(ms), &used_region_size);
+    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
+        error_setg(errp, "not enough space, currently 0x%" PRIx64
+                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
+                   used_region_size, ms->maxram_size - ms->ram_size);
         return 0;
     }
 
@@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
     return size;
 }
 
+void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
+                            Error **errp)
+{
+    if (!ms->device_memory) {
+        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+                         "supported by the machine");
+        return;
+    }
+
+    if (!memory_region_size(&ms->device_memory->mr)) {
+        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
+                         "enabled, please specify the maxmem option");
+        return;
+    }
+
+    /* we will need a new memory slot for kvm and vhost */
+    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
+        error_setg(errp, "hypervisor has no free memory slots left");
+        return;
+    }
+    if (!vhost_has_free_slot()) {
+        error_setg(errp, "a used vhost backend has no free memory slots left");
+        return;
+    }
+}
+
 void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
                                uint64_t addr)
 {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 13d153b5a6..562712def2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3676,6 +3676,16 @@ static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
 {
     Error *local_err = NULL;
 
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
+                               &local_err);
+    }
+
+    if (local_err) {
+        goto out;
+    }
+
     /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
@@ -3685,6 +3695,7 @@ static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
         hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
                                  &local_err);
     }
+out:
     error_propagate(errp, local_err);
 }
 
diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 62d906be50..3a4e9edc92 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -51,6 +51,8 @@ typedef struct MemoryDeviceClass {
 
 MemoryDeviceInfoList *qmp_memory_device_list(void);
 uint64_t get_plugged_memory_size(void);
+void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
+                            Error **errp);
 uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
                                      uint64_t align, uint64_t size,
                                      Error **errp);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug into hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (11 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-01 11:31   ` Igor Mammedov
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug " David Hildenbrand
  2018-05-25 12:43 ` [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Let's move the unplug logic into the applicable hotplug handler for pc and
spapr.

We'll move the plug logic next, then this will look more symmetrical in
the hotplug handlers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
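Note: a toy model of the two properties this patch relies on. The memory
cleanup is a no-op when the region was never mapped, and on unplug it only
runs after the final stage handler succeeded. All Toy*/toy_* names are
hypothetical, and the failure injection exists only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory device; not a QEMU type. */
typedef struct ToyMemDev {
    int mapped;    /* region currently mapped into device memory */
    uint64_t addr; /* assigned address, 0 when unassigned */
} ToyMemDev;

/* Models memory_device_unplug(): safe to call on an unmapped device. */
static void toy_memory_device_unplug(ToyMemDev *md)
{
    if (!md->mapped) {
        return;
    }
    md->mapped = 0;
    md->addr = 0;
}

typedef enum {
    TOY_FAIL_NONE,
    TOY_FAIL_BEFORE_MAP,
    TOY_FAIL_AFTER_MAP
} ToyFail;

/* Models the plug callback: roll back the mapping on any error. */
static int toy_machine_plug(ToyMemDev *md, uint64_t addr, ToyFail fail)
{
    int err = 0;

    if (fail == TOY_FAIL_BEFORE_MAP) {
        err = -1;
    } else {
        md->mapped = 1;
        md->addr = addr;
        if (fail == TOY_FAIL_AFTER_MAP) {
            err = -1;
        }
    }
    if (err) {
        toy_memory_device_unplug(md); /* no-op if nothing was mapped */
    }
    return err;
}

/* Models the unplug callback: final stage first, then memory cleanup. */
static int toy_machine_unplug(ToyMemDev *md, int final_stage_fails)
{
    if (final_stage_fails) {
        return -1; /* leave the mapping untouched */
    }
    toy_memory_device_unplug(md);
    return 0;
}
```

Because the cleanup checks the mapped state itself, the error path of the
plug callback does not need to know how far the final stage got.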
 hw/i386/pc.c                   | 17 ++++++++++++++++-
 hw/mem/memory-device.c         | 14 ++++++++++++--
 hw/mem/pc-dimm.c               |  2 --
 hw/mem/trace-events            |  2 ++
 hw/ppc/spapr.c                 | 16 +++++++++++++++-
 include/hw/mem/memory-device.h |  2 +-
 6 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 61f1537e14..426fb534c2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2044,6 +2044,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
     } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
         hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
     }
+
+    if (local_err) {
+        if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+            memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+        }
+    }
     error_propagate(errp, local_err);
 }
 
@@ -2080,7 +2086,16 @@ static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
         error_setg(&local_err, "acpi: device unplug for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
     }
-    error_propagate(errp, local_err);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+    }
 }
 
 static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index d22c91993f..8f10d613ea 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -17,6 +17,7 @@
 #include "qemu/range.h"
 #include "hw/virtio/vhost.h"
 #include "sysemu/kvm.h"
+#include "trace.h"
 
 static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
 {
@@ -246,12 +247,21 @@ void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
                                 addr - ms->device_memory->base, mr);
 }
 
-void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr)
+void memory_device_unplug(MachineState *ms, MemoryDeviceState *md)
 {
-    /* we expect a previous call to memory_device_get_free_addr() */
+    const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+    MemoryRegion *mr = mdc->get_memory_region(md);
+
+    /* we expect a previous call to memory_device_pre_plug */
     g_assert(ms->device_memory);
 
+    if (!memory_region_is_mapped(mr)) {
+        return;
+    }
+
     memory_region_del_subregion(&ms->device_memory->mr, mr);
+    trace_memory_device_unassign_address(mdc->get_addr(md));
+    mdc->set_addr(md, 0);
 }
 
 static const TypeInfo memory_device_info = {
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 5e2e3263ab..d487bb513b 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -94,9 +94,7 @@ void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine)
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
-    MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
 
-    memory_device_unplug_region(machine, mr);
     vmstate_unregister_ram(vmstate_mr, dev);
 }
 
diff --git a/hw/mem/trace-events b/hw/mem/trace-events
index e150dcc497..a661ee49a3 100644
--- a/hw/mem/trace-events
+++ b/hw/mem/trace-events
@@ -3,3 +3,5 @@
 # hw/mem/pc-dimm.c
 mhp_pc_dimm_assigned_slot(int slot) "%d"
 mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+# hw/mem/memory-device.c
+memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 562712def2..abdd38a6b5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3621,6 +3621,11 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
         hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
     }
 out:
+    if (local_err) {
+        if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+            memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+        }
+    }
     error_propagate(errp, local_err);
 }
 
@@ -3638,7 +3643,16 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
         hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
                                &local_err);
     }
-    error_propagate(errp, local_err);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
+    }
 }
 
 static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index 3a4e9edc92..b8365959e7 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -58,6 +58,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
                                      Error **errp);
 void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
                                uint64_t addr);
-void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr);
+void memory_device_unplug(MachineState *ms, MemoryDeviceState *md);
 
 #endif
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug into hotplug handler
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (12 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug " David Hildenbrand
@ 2018-05-17  8:15 ` David Hildenbrand
  2018-06-01 11:39   ` Igor Mammedov
  2018-05-25 12:43 ` [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
  14 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-17  8:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Michael S . Tsirkin, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino,
	David Hildenbrand

Let's move the plug logic into the applicable hotplug handler for pc and
spapr.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
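Note: the alignment selection in memory_device_plug() and the pc compat
handling can be summarized in a small model. The toy_* functions and
TOY_PAGE_SIZE are hypothetical. The value 4096 is an assumed page size for
illustration, and a device_align of 0 stands for an unimplemented
get_align().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL /* assumed value, stands in for TARGET_PAGE_SIZE */

/*
 * Models the alignment choice: an enforced alignment replaces the
 * alignment of the memory region, and a device-specific alignment can
 * only make the result stricter.
 */
static uint64_t toy_pick_align(const uint64_t *enforced_align,
                               uint64_t region_align,
                               uint64_t device_align)
{
    uint64_t align = enforced_align ? *enforced_align : region_align;

    if (device_align > align) {
        align = device_align;
    }
    return align;
}

/*
 * Models the pc plug callback: DIMMs on machine types that do not
 * enforce aligned DIMMs are forced to the page size.
 */
static uint64_t toy_pc_plug_align(int is_pc_dimm, int enforce_aligned_dimm,
                                  uint64_t region_align,
                                  uint64_t device_align)
{
    uint64_t compat = 0;

    if (is_pc_dimm && !enforce_aligned_dimm) {
        compat = TOY_PAGE_SIZE;
    }
    return toy_pick_align(compat ? &compat : NULL, region_align,
                          device_align);
}
```

Passing the enforced alignment as an optional pointer keeps the compat
special case in the machine code, while spapr can simply pass NULL.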
 hw/i386/pc.c                   | 35 ++++++++++++++++++++---------------
 hw/mem/memory-device.c         | 40 ++++++++++++++++++++++++++++++++++------
 hw/mem/pc-dimm.c               | 29 +----------------------------
 hw/mem/trace-events            |  2 +-
 hw/ppc/spapr.c                 | 15 ++++++++++++---
 include/hw/mem/memory-device.h |  7 ++-----
 include/hw/mem/pc-dimm.h       |  3 +--
 7 files changed, 71 insertions(+), 60 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 426fb534c2..f022eb042e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1682,22 +1682,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
-    MemoryRegion *mr;
-    uint64_t align = TARGET_PAGE_SIZE;
     bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
 
-    mr = ddc->get_memory_region(dimm, &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
-        align = memory_region_get_alignment(mr);
-    }
-
     /*
      * When -no-acpi is used with Q35 machine type, no ACPI is built,
      * but pcms->acpi_dev is still created. Check !acpi_enabled in
@@ -1715,7 +1701,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
+    pc_dimm_memory_plug(dev, MACHINE(pcms), &local_err);
     if (local_err) {
         goto out;
     }
@@ -2036,6 +2022,25 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 {
     Error *local_err = NULL;
 
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(hotplug_dev);
+        uint64_t align = 0;
+
+        /* compat handling: force to TARGET_PAGE_SIZE */
+        if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+            !pcmc->enforce_aligned_dimm) {
+            align = TARGET_PAGE_SIZE;
+        }
+        memory_device_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
+                           align ? &align : NULL, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         pc_dimm_plug(hotplug_dev, dev, &local_err);
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 8f10d613ea..04bdb30f22 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -69,9 +69,10 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
     return 0;
 }
 
-uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
-                                     uint64_t align, uint64_t size,
-                                     Error **errp)
+static uint64_t memory_device_get_free_addr(MachineState *ms,
+                                            const uint64_t *hint,
+                                            uint64_t align, uint64_t size,
+                                            Error **errp)
 {
     uint64_t address_space_start, address_space_end;
     uint64_t used_region_size = 0;
@@ -237,11 +238,38 @@ void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
     }
 }
 
-void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
-                               uint64_t addr)
+void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
+                        uint64_t *enforced_align, Error **errp)
 {
-    /* we expect a previous call to memory_device_get_free_addr() */
+    const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+    const uint64_t size = mdc->get_region_size(md);
+    MemoryRegion *mr = mdc->get_memory_region(md);
+    uint64_t addr = mdc->get_addr(md);
+    uint64_t align;
+
+    /* we expect a previous call to memory_device_pre_plug */
     g_assert(ms->device_memory);
+    g_assert(mr && !memory_region_is_mapped(mr));
+
+    /* compat handling, some alignment has to be enforced for DIMMs */
+    if (enforced_align) {
+        align = *enforced_align;
+    } else {
+        align = memory_region_get_alignment(mr);
+    }
+
+    /* our device might have stronger alignment requirements */
+    if (mdc->get_align) {
+        align = MAX(align, mdc->get_align(md));
+    }
+
+    addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
+                                       size, errp);
+    if (*errp) {
+        return;
+    }
+    trace_memory_device_assign_address(addr);
+    mdc->set_addr(md, addr);
 
     memory_region_add_subregion(&ms->device_memory->mr,
                                 addr - ms->device_memory->base, mr);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index d487bb513b..8b1dcb3260 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -32,39 +32,13 @@ typedef struct pc_dimms_capacity {
      Error    **errp;
 } pc_dimms_capacity;
 
-void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
-                         uint64_t align, Error **errp)
+void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine, Error **errp)
 {
     int slot;
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
     Error *local_err = NULL;
-    MemoryRegion *mr;
-    uint64_t addr;
-
-    mr = ddc->get_memory_region(dimm, &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    addr = object_property_get_uint(OBJECT(dimm),
-                                    PC_DIMM_ADDR_PROP, &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    addr = memory_device_get_free_addr(machine, !addr ? NULL : &addr, align,
-                                       memory_region_size(mr), &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    object_property_set_uint(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
-    if (local_err) {
-        goto out;
-    }
-    trace_mhp_pc_dimm_assigned_address(addr);
 
     slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
     if (local_err) {
@@ -82,7 +56,6 @@ void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
     }
     trace_mhp_pc_dimm_assigned_slot(slot);
 
-    memory_device_plug_region(machine, mr, addr);
     vmstate_register_ram(vmstate_mr, dev);
 
 out:
diff --git a/hw/mem/trace-events b/hw/mem/trace-events
index a661ee49a3..930b6aa6ea 100644
--- a/hw/mem/trace-events
+++ b/hw/mem/trace-events
@@ -2,6 +2,6 @@
 
 # hw/mem/pc-dimm.c
 mhp_pc_dimm_assigned_slot(int slot) "%d"
-mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
 # hw/mem/memory-device.c
+memory_device_assign_address(uint64_t addr) "0x%"PRIx64
 memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index abdd38a6b5..5a4dbbf31e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3144,16 +3144,15 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *mr;
-    uint64_t align, size, addr;
+    uint64_t size, addr;
 
     mr = ddc->get_memory_region(dimm, &local_err);
     if (local_err) {
         goto out;
     }
-    align = memory_region_get_alignment(mr);
     size = memory_region_size(mr);
 
-    pc_dimm_memory_plug(dev, MACHINE(ms), align, &local_err);
+    pc_dimm_memory_plug(dev, MACHINE(ms), &local_err);
     if (local_err) {
         goto out;
     }
@@ -3595,6 +3594,16 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
     Error *local_err = NULL;
 
+    /* first stage hotplug handler */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
+        memory_device_plug(ms, MEMORY_DEVICE(dev), NULL, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     /* final stage hotplug handler */
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         int node;
diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index b8365959e7..a7408597fd 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -53,11 +53,8 @@ MemoryDeviceInfoList *qmp_memory_device_list(void);
 uint64_t get_plugged_memory_size(void);
 void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
                             Error **errp);
-uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
-                                     uint64_t align, uint64_t size,
-                                     Error **errp);
-void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
-                               uint64_t addr);
+void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
+                        uint64_t *enforced_align, Error **errp);
 void memory_device_unplug(MachineState *ms, MemoryDeviceState *md);
 
 #endif
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 627c8601d9..006c80fb2e 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -78,7 +78,6 @@ typedef struct PCDIMMDeviceClass {
 
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
-void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
-                         uint64_t align, Error **errp);
+void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine, Error **errp);
 void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine);
 #endif
-- 
2.14.3


* Re: [Qemu-devel] [Qemu-ppc] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers David Hildenbrand
@ 2018-05-17 12:43   ` Greg Kurz
  2018-06-01 10:33   ` [Qemu-devel] " Igor Mammedov
  2018-06-05  1:08   ` David Gibson
  2 siblings, 0 replies; 76+ messages in thread
From: Greg Kurz @ 2018-05-17 12:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Christian Borntraeger,
	qemu-s390x, qemu-ppc, Paolo Bonzini, Marcel Apfelbaum,
	Igor Mammedov, Luiz Capitulino, David Gibson, Richard Henderson

On Thu, 17 May 2018 10:15:19 +0200
David Hildenbrand <david@redhat.com> wrote:

> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
> 
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>  {
>      MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +    Error *local_err = NULL;
>  
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          int node;
>  
>          if (!smc->dr_lmb_enabled) {
> -            error_setg(errp, "Memory hotplug not supported for this machine");
> -            return;
> +            error_setg(&local_err,
> +                       "Memory hotplug not supported for this machine");
> +            goto out;
>          }
> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
> -        if (*errp) {

Heh ! This is even a fix since errp could theoretically be NULL.

Reviewed-by: Greg Kurz <groug@kaod.org>

> -            return;
> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +                                        &local_err);
> +        if (local_err) {
> +            goto out;
>          }
>          if (node < 0 || node >= MAX_NODES) {
> -            error_setg(errp, "Invaild node %d", node);
> -            return;
> +            error_setg(&local_err, "Invaild node %d", node);
> +            goto out;
>          }
>  
> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_plug(hotplug_dev, dev, errp);
> +        spapr_core_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> +    }
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +                               &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>              return;
>          }
>          spapr_core_unplug_request(hotplug_dev, dev, errp);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +                                       errp);
>      }
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +        spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_pre_plug(hotplug_dev, dev, errp);
> +        spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> +                                 &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
> +    hc->unplug = spapr_machine_device_unplug;
>  
>      smc->dr_lmb_enabled = true;
>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
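
The remark above about `*errp` can be illustrated with a small stand-alone model. This is only a sketch: `Error`, `get_node()`, `propagate()` and `plug()` are simplified stand-ins invented for illustration, not the QEMU API (the real `error_propagate()` also frees errors that nobody wants). It shows why checking a local error variable is safe even when the caller passes `errp == NULL`, whereas `if (*errp)` would dereference a NULL pointer.

```c
#include <stddef.h>

/* Simplified stand-in for an error object (hypothetical, not QEMU's Error). */
typedef struct Error {
    const char *msg;
} Error;

static Error node_error = { "invalid node" };

/* Callee: reports failure through errp and tolerates errp == NULL. */
static int get_node(int raw, Error **errp)
{
    if (raw < 0) {
        if (errp) {
            *errp = &node_error;
        }
        return -1;
    }
    return raw;
}

/* Rough model of error propagation: pass err on only if the caller asked. */
static void propagate(Error **dst, Error *err)
{
    if (err && dst && !*dst) {
        *dst = err;
    }
}

/* Returns 0 on success, -1 on failure. Never dereferences errp directly. */
static int plug(int raw_node, Error **errp)
{
    Error *local_err = NULL;
    int node = get_node(raw_node, &local_err);

    if (local_err) {
        propagate(errp, local_err);
        return -1;
    }
    (void)node; /* a real handler would go on to use the node here */
    return 0;
}
```

Calling `plug(-1, NULL)` simply returns -1 in this model; a version testing `*errp` would crash at that point.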


* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-05-17  8:15 [Qemu-devel] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
                   ` (13 preceding siblings ...)
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug " David Hildenbrand
@ 2018-05-25 12:43 ` David Hildenbrand
  2018-05-30 14:03   ` Paolo Bonzini
  2018-06-01 12:13   ` Igor Mammedov
  14 siblings, 2 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-25 12:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Igor Mammedov, Luiz Capitulino, David Gibson,
	Richard Henderson

On 17.05.2018 10:15, David Hildenbrand wrote:
> We can have devices that need certain other resources that are e.g.
> system resources managed by the machine. We need a clean way to assign
> these resources (without violating layers as brought up by Igor).
> 
> One example is virtio-mem/virtio-pmem. Both device types need to be
> assigned some region in guest physical address space. This device memory
> belongs to the machine and is managed by it. However, virtio devices are
> hotplugged using the hotplug handler their proxy device implements. So we
> could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
> hotplug handler for virtio-ccw. But definitely not the machine.
> 
> Now, we can route other devices through the machine hotplug handler, to
> properly assign/unassign resources - like a portion in guest physical
> address space.
> 
> v3 -> v4:
> - Removed the s390x bits, will send that out separately (was just a proof
>   that it works just fine with s390x)
> - Fixed a typo and reworded a comment
> 
> v2 -> v3:
> - Added "memory-device: introduce separate config option"
> - Dropped "parent_bus" check from hotplug handler lookup functions
> - "Handly" -> "Handle" in patch description.
> 
> v1 -> v2:
> - Use multi stage hotplug handler instead of resource handler
> - MemoryDevices only compiled if necessary (CONFIG_MEM_HOTPLUG)
> - Prepare PC/SPAPR machines properly for multi stage hotplug handlers
> - Route SPAPR unplug code via the hotunplug handler
> - Directly include s390x support. But there are no usable memory devices
>   yet (well, only my virtio-mem prototype)
> - Included "memory-device: drop assert related to align and start of address
>   space"
> 
> David Hildenbrand (13):
>   memory-device: drop assert related to align and start of address space
>   memory-device: introduce separate config option
>   pc: prepare for multi stage hotplug handlers
>   pc: route all memory devices through the machine hotplug handler
>   spapr: prepare for multi stage hotplug handlers
>   spapr: route all memory devices through the machine hotplug handler
>   spapr: handle pc-dimm unplug via hotplug handler chain
>   spapr: handle cpu core unplug via hotplug handler chain
>   memory-device: new functions to handle plug/unplug
>   pc-dimm: implement new memory device functions
>   memory-device: factor out pre-plug into hotplug handler
>   memory-device: factor out unplug into hotplug handler
>   memory-device: factor out plug into hotplug handler
> 
> Igor Mammedov (1):
>   qdev: let machine hotplug handler to override bus hotplug handler
> 
>  default-configs/i386-softmmu.mak   |   3 +-
>  default-configs/ppc64-softmmu.mak  |   3 +-
>  default-configs/x86_64-softmmu.mak |   3 +-
>  hw/Makefile.objs                   |   2 +-
>  hw/core/qdev.c                     |   6 +-
>  hw/i386/pc.c                       | 102 ++++++++++++++++++++++-------
>  hw/mem/Makefile.objs               |   4 +-
>  hw/mem/memory-device.c             | 129 +++++++++++++++++++++++--------------
>  hw/mem/pc-dimm.c                   |  48 ++++++--------
>  hw/mem/trace-events                |   4 +-
>  hw/ppc/spapr.c                     | 129 +++++++++++++++++++++++++++++++------
>  include/hw/mem/memory-device.h     |  21 ++++--
>  include/hw/mem/pc-dimm.h           |   3 +-
>  include/hw/qdev-core.h             |  11 ++++
>  qapi/misc.json                     |   2 +-
>  15 files changed, 330 insertions(+), 140 deletions(-)
> 

As there was no negative feedback so far, I will go ahead and assume
that this approach is the right thing to do.

-- 

Thanks,

David / dhildenb
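
The routing described in the cover letter can be modelled roughly as follows. This is a toy sketch, not code from the series: `Device`, `pci_bus_plug()`, `machine_plug()` and the address values are all hypothetical. It only shows the idea of a first stage in the machine handler that assigns a machine-owned resource (here a guest physical address) and a final stage that hands the device to the handler of its parent bus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Device Device;
typedef void (*PlugFn)(Device *dev);

struct Device {
    bool is_memory_device; /* needs a region in guest physical address space */
    uint64_t addr;         /* assigned by the machine stage */
    bool bus_plugged;      /* set by the bus stage */
    PlugFn bus_plug;       /* hotplug handler of the parent bus, may be NULL */
};

/* Stand-in for e.g. a PCI or CCW hotplug handler. */
static void pci_bus_plug(Device *dev)
{
    dev->bus_plugged = true;
}

/* Machine-owned resource: next free address (values are illustrative). */
static uint64_t next_free_addr = 0x100000000ULL;

static void machine_plug(Device *dev)
{
    /* first stage: assign resources that belong to the machine */
    if (dev->is_memory_device) {
        dev->addr = next_free_addr;
        next_free_addr += 0x40000000ULL;
    }

    /* final stage: pass control to the bus hotplug handler, if any */
    if (dev->bus_plug) {
        dev->bus_plug(dev);
    }
}
```

A virtio-mem-like device on a PCI bus would then go through both stages with a single call to `machine_plug()`.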


* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space David Hildenbrand
@ 2018-05-29 13:27   ` Igor Mammedov
  2018-05-29 16:02     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-29 13:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:14 +0200
David Hildenbrand <david@redhat.com> wrote:

> The start of the address space does not have to be aligned for the
> search. Handle this case explicitly when starting the search for a new
> address.
That's true,
but the commit message doesn't explain why address_space_start
should be allowed to be unaligned.

At least with this assert we would notice early that a
board is allocating a misaligned address space.
I'd keep the assert unless there is a good reason to drop it.


> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/mem/memory-device.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 3e04f3954e..361d38bfc5 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -116,7 +116,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>      address_space_start = ms->device_memory->base;
>      address_space_end = address_space_start +
>                          memory_region_size(&ms->device_memory->mr);
> -    g_assert(QEMU_ALIGN_UP(address_space_start, align) == address_space_start);
>      g_assert(address_space_end >= address_space_start);
>  
>      memory_device_check_addable(ms, size, errp);
> @@ -149,7 +148,7 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>              return 0;
>          }
>      } else {
> -        new_addr = address_space_start;
> +        new_addr = QEMU_ALIGN_UP(address_space_start, align);
>      }
>  
>      /* find address range that will fit new memory device */


* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-29 13:27   ` Igor Mammedov
@ 2018-05-29 16:02     ` David Hildenbrand
  2018-05-30 12:57       ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-29 16:02 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 29.05.2018 15:27, Igor Mammedov wrote:
> On Thu, 17 May 2018 10:15:14 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> The start of the address space does not have to be aligned for the
>> search. Handle this case explicitly when starting the search for a new
>> address.
> That's true,
> but commit message doesn't explain why address_space_start
> should be allowed to be non aligned.
> 
> At least with this assert we would notice early that
> board allocating misaligned address space.
> I'd keep the assert unless there is a good reason to drop it.

That reason might be that I can easily crash QEMU:

 ./x86_64-softmmu/qemu-system-x86_64 -m 256M,maxmem=20G,slots=2 \
   -object memory-backend-file,id=mem0,size=8192M,mem-path=/dev/zero,align=8192M \
   -device pc-dimm,id=dimm1,memdev=mem0

ERROR:hw/mem/memory-device.c:146:memory_device_get_free_addr: assertion
failed: (QEMU_ALIGN_UP(address_space_start, align) == address_space_start)
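
Why the assertion trips can be sketched in isolation. The snippet below is an illustration under assumptions: `ALIGN_UP` is a local macro written to behave like `QEMU_ALIGN_UP`, and the 4 GiB base is an assumed value for the start of the device memory region, not something taken from the log above. With an 8 GiB alignment request, rounding the base up no longer yields the base itself, which is exactly the condition the removed `g_assert()` checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round n up to the next multiple of m (m must be non-zero). */
#define ALIGN_UP(n, m) ((((n) + (m) - 1) / (m)) * (m))

/* Assumed values for illustration only. */
#define EXAMPLE_BASE  (4ULL << 30) /* start of device memory: 4 GiB */
#define EXAMPLE_ALIGN (8ULL << 30) /* align=8192M from the command line */

/* The condition the removed assertion required to hold. */
static bool base_is_aligned(uint64_t base, uint64_t align)
{
    return ALIGN_UP(base, align) == base;
}

/* What the patched code uses as the first candidate address instead. */
static uint64_t first_candidate(uint64_t base, uint64_t align)
{
    return ALIGN_UP(base, align);
}
```

With these values `base_is_aligned()` is false, while `first_candidate()` yields 8 GiB, so the search can simply start there.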


> 
> 
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/mem/memory-device.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>> index 3e04f3954e..361d38bfc5 100644
>> --- a/hw/mem/memory-device.c
>> +++ b/hw/mem/memory-device.c
>> @@ -116,7 +116,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>      address_space_start = ms->device_memory->base;
>>      address_space_end = address_space_start +
>>                          memory_region_size(&ms->device_memory->mr);
>> -    g_assert(QEMU_ALIGN_UP(address_space_start, align) == address_space_start);
>>      g_assert(address_space_end >= address_space_start);
>>  
>>      memory_device_check_addable(ms, size, errp);
>> @@ -149,7 +148,7 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>              return 0;
>>          }
>>      } else {
>> -        new_addr = address_space_start;
>> +        new_addr = QEMU_ALIGN_UP(address_space_start, align);
>>      }
>>  
>>      /* find address range that will fit new memory device */
> 


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-29 16:02     ` David Hildenbrand
@ 2018-05-30 12:57       ` Igor Mammedov
  2018-05-30 14:06         ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-30 12:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Tue, 29 May 2018 18:02:06 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 29.05.2018 15:27, Igor Mammedov wrote:
> > On Thu, 17 May 2018 10:15:14 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> The start of the address space does not have to be aligned for the
> >> search. Handle this case explicitly when starting the search for a new
> >> address.  
> > That's true,
> > but commit message doesn't explain why address_space_start
> > should be allowed to be non aligned.
> > 
> > At least with this assert we would notice early that
> > board allocating misaligned address space.
> > I'd keep the assert unless there is a good reason to drop it.  
> 
> That reason might be that I can easily crash QEMU
> 
>  ./x86_64-softmmu/qemu-system-x86_64 -m 256M,maxmem=20G,slots=2 -object
> memory-backend-file,id=mem0,size=8192M,mem-path=/dev/zero,align=8192M
> -device pc-dimm,id=dimm1,memdev=mem0
> 
> ERROR:hw/mem/memory-device.c:146:memory_device_get_free_addr: assertion
> failed: (QEMU_ALIGN_UP(address_space_start, align) == address_space_start)
It looks like a different issue.
As I remember, the user-visible 'align' property was added as duct tape since
we can't figure out the alignment for a DAX device on the host,
so it was left up to upper layers to magically figure that out.

However, we probably shouldn't allow arbitrary alignment, or alignment bigger
than the max page size supported by the target machine/cpu
(i.e. the currently hardcoded address_space_start alignment),
as it creates unnecessary fragmentation and is not counted in the size
of the hotplug region (for x86 we count in an additional 1Gb per memory device).

How about turning that assert into an error check that
inhibits plugging in devices with alignment values
larger than the address_space_start alignment?
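
A minimal sketch of such an error check, under stated assumptions: `check_device_align()` is a hypothetical helper, and it reports the error through a plain character buffer rather than QEMU's `Error **errp`, only to keep the example self-contained. The test it applies is the same one the assert used, i.e. the start of the region must already be a multiple of the requested alignment.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Returns true if a device with alignment dev_align may be plugged into a
 * region starting at region_base. On failure, writes a message into err
 * instead of aborting.
 */
static bool check_device_align(uint64_t region_base, uint64_t dev_align,
                               char *err, size_t err_len)
{
    if (dev_align == 0 || region_base % dev_align != 0) {
        snprintf(err, err_len,
                 "alignment 0x%" PRIx64 " not supported by device memory "
                 "region starting at 0x%" PRIx64, dev_align, region_base);
        return false;
    }
    return true;
}
```

With a region starting at 4 GiB, a request for 8 GiB alignment would be rejected with a message, while 1 GiB alignment would pass.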

 
> >   
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/mem/memory-device.c | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> >> index 3e04f3954e..361d38bfc5 100644
> >> --- a/hw/mem/memory-device.c
> >> +++ b/hw/mem/memory-device.c
> >> @@ -116,7 +116,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
> >>      address_space_start = ms->device_memory->base;
> >>      address_space_end = address_space_start +
> >>                          memory_region_size(&ms->device_memory->mr);
> >> -    g_assert(QEMU_ALIGN_UP(address_space_start, align) == address_space_start);
> >>      g_assert(address_space_end >= address_space_start);
> >>  
> >>      memory_device_check_addable(ms, size, errp);
> >> @@ -149,7 +148,7 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
> >>              return 0;
> >>          }
> >>      } else {
> >> -        new_addr = address_space_start;
> >> +        new_addr = QEMU_ALIGN_UP(address_space_start, align);
> >>      }
> >>  
> >>      /* find address range that will fit new memory device */  
> >   
> 
> 


* Re: [Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 02/14] memory-device: introduce separate config option David Hildenbrand
@ 2018-05-30 12:58   ` Igor Mammedov
  0 siblings, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-05-30 12:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:15 +0200
David Hildenbrand <david@redhat.com> wrote:

> Some architectures might support memory devices, while they don't
> support DIMM/NVDIMM. So let's
> - Rename CONFIG_MEM_HOTPLUG to CONFIG_MEM_DEVICE
> - Introduce CONFIG_DIMM and use it similarly to CONFIG NVDIMM
> 
> CONFIG_DIMM and CONFIG_NVDIMM require CONFIG_MEM_DEVICE.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  default-configs/i386-softmmu.mak   | 3 ++-
>  default-configs/ppc64-softmmu.mak  | 3 ++-
>  default-configs/x86_64-softmmu.mak | 3 ++-
>  hw/Makefile.objs                   | 2 +-
>  hw/mem/Makefile.objs               | 4 ++--
>  qapi/misc.json                     | 2 +-
>  6 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index 8c7d4a0fa0..4c1637338b 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
>  CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
> -CONFIG_MEM_HOTPLUG=y
> +CONFIG_MEM_DEVICE=y
> +CONFIG_DIMM=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_PCIE_PORT=y
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index b94af6c7c6..f550573782 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -16,4 +16,5 @@ CONFIG_VIRTIO_VGA=y
>  CONFIG_XICS=$(CONFIG_PSERIES)
>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> -CONFIG_MEM_HOTPLUG=y
> +CONFIG_MEM_DEVICE=y
> +CONFIG_DIMM=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 0390b4303c..7785351414 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -50,7 +50,8 @@ CONFIG_PCI_Q35=y
>  CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
> -CONFIG_MEM_HOTPLUG=y
> +CONFIG_MEM_DEVICE=y
> +CONFIG_DIMM=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_PCIE_PORT=y
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 6a0ffe0afd..127a60eca4 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -33,7 +33,7 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
>  devices-dirs-$(CONFIG_SOFTMMU) += virtio/
>  devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
>  devices-dirs-$(CONFIG_SOFTMMU) += xen/
> -devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
> +devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
>  devices-dirs-$(CONFIG_SOFTMMU) += smbios/
>  devices-dirs-y += core/
>  common-obj-y += $(devices-dirs-y)
> diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
> index 10be4df2a2..3e2f7c5ca2 100644
> --- a/hw/mem/Makefile.objs
> +++ b/hw/mem/Makefile.objs
> @@ -1,3 +1,3 @@
> -common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
> -common-obj-$(CONFIG_MEM_HOTPLUG) += memory-device.o
> +common-obj-$(CONFIG_DIMM) += pc-dimm.o
> +common-obj-$(CONFIG_MEM_DEVICE) += memory-device.o
>  common-obj-$(CONFIG_NVDIMM) += nvdimm.o
> diff --git a/qapi/misc.json b/qapi/misc.json
> index f5988cc0b5..4e6265cd2e 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -2060,7 +2060,7 @@
>  #
>  # @plugged-memory: size of memory that can be hot-unplugged. This field
>  #                  is omitted if target doesn't support memory hotplug
> -#                  (i.e. CONFIG_MEM_HOTPLUG not defined on build time).
> +#                  (i.e. CONFIG_MEM_DEVICE not defined at build time).
>  #
>  # Since: 2.11.0
>  ##


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers David Hildenbrand
@ 2018-05-30 13:08   ` Igor Mammedov
  2018-05-30 14:13     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-30 13:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:17 +0200
David Hildenbrand <david@redhat.com> wrote:

> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
I'd split out introducing the local error into a separate patch
so that each patch does a single thing.

> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
But I don't agree with the generic
 "} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {"
forwarding. It's done by 3/14 for the generic case, and in the case of a
special device that needs the bus handler called from the machine one,
I'd suggest doing the forwarding explicitly for that device only,
like we do with acpi_dev.


> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/i386/pc.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d768930d02..510076e156 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2007,19 +2007,32 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> -        pc_cpu_pre_plug(hotplug_dev, dev, errp);
> +        pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> +                                 &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        pc_dimm_plug(hotplug_dev, dev, errp);
> +        pc_dimm_plug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> -        pc_cpu_plug(hotplug_dev, dev, errp);
> +        pc_cpu_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
> @@ -2029,7 +2042,10 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>          pc_dimm_unplug_request(hotplug_dev, dev, errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>          pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
> -    } else {
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +                                       errp);
> +    } else if (!dev->parent_bus) {
>          error_setg(errp, "acpi: device unplug request for not supported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
>      }
> @@ -2038,14 +2054,21 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>  static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
>                                          DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        pc_dimm_unplug(hotplug_dev, dev, errp);
> +        pc_dimm_unplug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> -        pc_cpu_unplug_cb(hotplug_dev, dev, errp);
> -    } else {
> -        error_setg(errp, "acpi: device unplug for not supported device"
> +        pc_cpu_unplug_cb(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +                               &local_err);
> +    } else if (!dev->parent_bus) {
> +        error_setg(&local_err, "acpi: device unplug for not supported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,


* Re: [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler David Hildenbrand
@ 2018-05-30 13:12   ` Igor Mammedov
  2018-05-30 14:08     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-30 13:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On Thu, 17 May 2018 10:15:18 +0200
David Hildenbrand <david@redhat.com> wrote:

> Necessary to hotplug them cleanly later. We could drop the PC_DIMM
> check, as PC_DIMM are just memory devices, but this approach is cleaner.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/i386/pc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 510076e156..8bc41ef24b 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -74,6 +74,7 @@
>  #include "hw/nmi.h"
>  #include "hw/i386/intel_iommu.h"
>  #include "hw/net/ne2000-isa.h"
> +#include "hw/mem/memory-device.h"
>  
>  /* debug PC/ISA interrupts */
>  //#define DEBUG_IRQ
> @@ -2075,6 +2076,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
>                                               DeviceState *dev)
>  {
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
You could probably drop the TYPE_PC_DIMM check above; it's redundant, since
a DIMM can be cast to TYPE_MEMORY_DEVICE.

Ditto for spapr.

>          object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>          return HOTPLUG_HANDLER(machine);
>      }
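
To illustrate why the extra check is redundant, here is a minimal standalone
sketch. It does not use QEMU's QOM API: ToyType, toy_is_a() and
toy_machine_handles() are made-up names, and the relationship between pc-dimm
and memory-device is modelled as a plain parent link purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a type: a name plus a single parent link (assumption). */
typedef struct ToyType {
    const char *name;
    const struct ToyType *parent;
} ToyType;

static const ToyType toy_device        = { "device", NULL };
static const ToyType toy_memory_device = { "memory-device", &toy_device };
static const ToyType toy_pc_dimm       = { "pc-dimm", &toy_memory_device };
static const ToyType toy_cpu           = { "cpu", &toy_device };
static const ToyType toy_pci_nic       = { "pci-nic", &toy_device };

/* Returns true if 'type' is called 'name' or has an ancestor called 'name'. */
bool toy_is_a(const ToyType *type, const char *name)
{
    assert(name != NULL);
    for (; type != NULL; type = type->parent) {
        if (strcmp(type->name, name) == 0) {
            return true;
        }
    }
    return false;
}

/* With the pc-dimm check dropped, two checks remain and still cover DIMMs. */
bool toy_machine_handles(const ToyType *type)
{
    return toy_is_a(type, "memory-device") || toy_is_a(type, "cpu");
}
```

In this toy, toy_machine_handles(&toy_pc_dimm) is still true without a
dedicated pc-dimm check, while toy_machine_handles(&toy_pci_nic) is false.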

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-05-25 12:43 ` [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
@ 2018-05-30 14:03   ` Paolo Bonzini
  2018-05-31 11:47     ` Igor Mammedov
  2018-06-01 12:13   ` Igor Mammedov
  1 sibling, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2018-05-30 14:03 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Marcel Apfelbaum,
	Igor Mammedov, Luiz Capitulino, David Gibson, Richard Henderson

On 25/05/2018 14:43, David Hildenbrand wrote:
> On 17.05.2018 10:15, David Hildenbrand wrote:
>> We can have devices that need certain other resources that are e.g.
>> system resources managed by the machine. We need a clean way to assign
>> these resources (without violating layers as brought up by Igor).
>>
>> One example is virtio-mem/virtio-pmem. Both device types need to be
>> assigned some region in guest physical address space. This device memory
>> belongs to the machine and is managed by it. However, virtio devices are
>> hotplugged using the hotplug handler their proxy device implements. So we
>> could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
>> hotplug handler for virtio-ccw. But definitely not the machine.
>>
>> Now, we can route other devices through the machine hotplug handler, to
>> properly assign/unassign resources - like a portion in guest physical
>> address space.
>>
>> v3 -> v4:
>> - Removed the s390x bits, will send that out separately (was just a proof
>>   that it works just fine with s390x)
>> - Fixed a typo and reworded a comment
>>
>> v2 -> v3:
>> - Added "memory-device: introduce separate config option"
>> - Dropped "parent_bus" check from hotplug handler lookup functions
>> - "Handly" -> "Handle" in patch description.
>>
>> v1 -> v2:
>> - Use multi stage hotplug handler instead of resource handler
>> - MemoryDevices only compiled if necessary (CONFIG_MEM_HOTPLUG)
>> - Prepare PC/SPAPR machines properly for multi stage hotplug handlers
>> - Route SPAPR unplug code via the hotunplug handler
>> - Directly include s390x support. But there are no usable memory devices
>>   yet (well, only my virtio-mem prototype)
>> - Included "memory-device: drop assert related to align and start of address
>>   space"
>>
>> David Hildenbrand (13):
>>   memory-device: drop assert related to align and start of address space
>>   memory-device: introduce separate config option
>>   pc: prepare for multi stage hotplug handlers
>>   pc: route all memory devices through the machine hotplug handler
>>   spapr: prepare for multi stage hotplug handlers
>>   spapr: route all memory devices through the machine hotplug handler
>>   spapr: handle pc-dimm unplug via hotplug handler chain
>>   spapr: handle cpu core unplug via hotplug handler chain
>>   memory-device: new functions to handle plug/unplug
>>   pc-dimm: implement new memory device functions
>>   memory-device: factor out pre-plug into hotplug handler
>>   memory-device: factor out unplug into hotplug handler
>>   memory-device: factor out plug into hotplug handler
>>
>> Igor Mammedov (1):
>>   qdev: let machine hotplug handler to override bus hotplug handler
>>
>>  default-configs/i386-softmmu.mak   |   3 +-
>>  default-configs/ppc64-softmmu.mak  |   3 +-
>>  default-configs/x86_64-softmmu.mak |   3 +-
>>  hw/Makefile.objs                   |   2 +-
>>  hw/core/qdev.c                     |   6 +-
>>  hw/i386/pc.c                       | 102 ++++++++++++++++++++++-------
>>  hw/mem/Makefile.objs               |   4 +-
>>  hw/mem/memory-device.c             | 129 +++++++++++++++++++++++--------------
>>  hw/mem/pc-dimm.c                   |  48 ++++++--------
>>  hw/mem/trace-events                |   4 +-
>>  hw/ppc/spapr.c                     | 129 +++++++++++++++++++++++++++++++------
>>  include/hw/mem/memory-device.h     |  21 ++++--
>>  include/hw/mem/pc-dimm.h           |   3 +-
>>  include/hw/qdev-core.h             |  11 ++++
>>  qapi/misc.json                     |   2 +-
>>  15 files changed, 330 insertions(+), 140 deletions(-)
>>
> 
> As there was no negative feedback so far, I will go ahead and assume
> that this approach is the right thing to do.

Ok, I'll queue this.

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-30 12:57       ` Igor Mammedov
@ 2018-05-30 14:06         ` David Hildenbrand
  2018-05-31 13:54           ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-30 14:06 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 30.05.2018 14:57, Igor Mammedov wrote:
> On Tue, 29 May 2018 18:02:06 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 29.05.2018 15:27, Igor Mammedov wrote:
>>> On Thu, 17 May 2018 10:15:14 +0200
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> The start of the address space does not have to be aligned for the
>>>> search. Handle this case explicitly when starting the search for a new
>>>> address.  
>>> That's true,
>>> but commit message doesn't explain why address_space_start
>>> should be allowed to be non aligned.
>>>
>>> At least with this assert we would notice early that
>>> board allocating misaligned address space.
>>> I'd keep the assert unless there is a good reason to drop it.  
>>
>> That reason might be that I can easily crash QEMU
>>
>>  ./x86_64-softmmu/qemu-system-x86_64 -m 256M,maxmem=20G,slots=2 -object
>> memory-backend-file,id=mem0,size=8192M,mem-path=/dev/zero,align=8192M
>> -device pc-dimm,id=dimm1,memdev=mem0
>>
>> ERROR:hw/mem/memory-device.c:146:memory_device_get_free_addr: assertion
>> failed: (QEMU_ALIGN_UP(address_space_start, align) == address_space_start)
> it looks like a different issue.
> As I remember user visible 'align' property was added as duct tape since
> we can't figure out alignment for DAX device on the host,
> so it was left up to upper layers to magically figure that out.
> 
> However we probably shouldn't allow arbitrary or bigger alignment
> than max page size supported by target machine/cpu
> (i.e. currently hardcoded address_space_start alignment),
> as it creates unnecessary fragmentation and not counted in size
> of hotplug region (for x86 we count in additional 1Gb per memory device).
> 
> How about turning that assert into error check that
> inhibits plugging in devices with alignment values
> larger than address_space_start alignment?


Let me explain a little bit why I don't like such restrictions (for
which I don't see a need yet):

virtio-mem devices will later have a certain block size (e.g. 4MB). I
want to give devices during resource assignment the possibility to
specify their alignment requirements.

For said virtio-mem device, this would be e.g. 4MB (see patches 10 and 14
for how the "get_align" call comes into play), because the addresses of
the memory blocks are all aligned to e.g. 4MB. This is what the device
specification guarantees.

For DIMMs, we might want to allow specifying the section size (e.g.
128MB on x86); otherwise Linux, for example, is not able to add all
memory. (We should not hardcode this, as it is a Linux-specific
requirement, but it would still be nice to be able to specify it.)

So in general, a memory device might have some alignment that is to be
taken care of.

I don't understand right now why an upper limit on the alignment would
make sense at all. We can easily handle it during our search. And we
have to handle it either way during the search, if we plug some device
with strange sizes (e.g. 1MB DIMM).

Of course, we might end up fragmenting guest physical memory, but that
is purely a setup issue (choosing sizes of devices + main memory
properly). I don't see a reason to error out (and of course also not to
assert out :) ).
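
A minimal sketch of how the search could cope with an unaligned start and
with arbitrary per-device alignment. This is not the code from
memory-device.c: ToyRange, toy_align_up() and toy_get_free_addr() are
hypothetical names, the used ranges are assumed to be sorted and
non-overlapping, and overflow handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MIB (1ULL << 20)
#define TOY_GIB (1ULL << 30)

/* One already-assigned range in the device memory region (hypothetical). */
typedef struct ToyRange {
    uint64_t addr;
    uint64_t size;
} ToyRange;

/* Round 'value' up to a multiple of 'align', which must be a power of two. */
uint64_t toy_align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/*
 * First-fit search. 'used' must be sorted by address and non-overlapping.
 * Returns the chosen address, or UINT64_MAX if nothing fits.
 */
uint64_t toy_get_free_addr(uint64_t region_start, uint64_t region_size,
                           const ToyRange *used, size_t n_used,
                           uint64_t size, uint64_t align)
{
    uint64_t region_end = region_start + region_size;
    /* The region start itself need not be aligned: round it up first. */
    uint64_t candidate = toy_align_up(region_start, align);
    size_t i;

    for (i = 0; i < n_used; i++) {
        if (candidate + size <= used[i].addr) {
            break; /* fits into the gap in front of this range */
        }
        if (candidate < used[i].addr + used[i].size) {
            /* overlaps this range: retry right behind it, aligned */
            candidate = toy_align_up(used[i].addr + used[i].size, align);
        }
    }
    if (candidate > region_end || size > region_end - candidate) {
        return UINT64_MAX; /* does not fit into the region */
    }
    return candidate;
}
```

With a 20GB region starting at 5GB and an 8GB size/alignment request (similar
to the command line above), the sketch returns 8GB instead of asserting. A
4MB-aligned device plugged behind a 1MB DIMM at 5GB lands at 5GB + 4MB, so
the gap is a setup issue rather than an error.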

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler
  2018-05-30 13:12   ` Igor Mammedov
@ 2018-05-30 14:08     ` David Hildenbrand
  2018-05-30 14:27       ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-30 14:08 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On 30.05.2018 15:12, Igor Mammedov wrote:
> On Thu, 17 May 2018 10:15:18 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Necessary to hotplug them cleanly later. We could drop the PC_DIMM
>> check, as PC_DIMM are just memory devices, but this approach is cleaner.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/i386/pc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 510076e156..8bc41ef24b 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -74,6 +74,7 @@
>>  #include "hw/nmi.h"
>>  #include "hw/i386/intel_iommu.h"
>>  #include "hw/net/ne2000-isa.h"
>> +#include "hw/mem/memory-device.h"
>>  
>>  /* debug PC/ISA interrupts */
>>  //#define DEBUG_IRQ
>> @@ -2075,6 +2076,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
>>                                               DeviceState *dev)
>>  {
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
> you probably could drop TYPE_PC_DIMM above, it's redundant since DIMM
> can be cast to TYPE_MEMORY_DEVICE
> 
> ditto for spapr
> 

Yes, I had the same in mind but left it this way for now (basically
because we do special handling for PC_DIMM, so that anybody reading this
code is not immediately confused).

>>          object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>          return HOTPLUG_HANDLER(machine);
>>      }
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-05-30 13:08   ` Igor Mammedov
@ 2018-05-30 14:13     ` David Hildenbrand
  2018-05-31 14:13       ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-05-30 14:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 30.05.2018 15:08, Igor Mammedov wrote:
> On Thu, 17 May 2018 10:15:17 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> For multi stage hotplug handlers, we'll have to do some error handling
>> in some hotplug functions, so let's use a local error variable (except
>> for unplug requests).
> I'd split out introducing local error into separate patch
> so patch would do a single thing.
> 
>> Also, add code to pass control to the final stage hotplug handler at the
>> parent bus.
> But I don't agree with generic
>  "} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {"
> forwarding, it's done by 3/14 for generic case and in case of
> special device that needs bus handler called from machine one,
> I'd suggest to do forwarding explicitly for that device only
> like we do with acpi_dev.

I decided to do it that way because it is generic and results in nicer
recovery handling (e.g. in case pc_dimm plug fails, we can simply
roll back all (for now MemoryDevice) previous plug operations).

IMHO, the resulting code is easier to read.

From this handling it is clear that
"if we reach the hotplug handler, and it is not some special device
plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
handler if any exists"
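
Such staging with rollback could be sketched as follows, as a standalone toy
rather than the pc.c code: ToyDev, toy_bus_plug() and toy_machine_plug() are
invented for illustration, and a plain string stands in for the Error object
that a local error variable would hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, just enough to show the two stages. */
typedef struct ToyDev {
    bool is_memory_device;  /* needs a machine-managed resource */
    bool has_bus_handler;   /* parent bus provides a hotplug handler */
    bool bus_plug_fails;    /* force the bus stage to fail */
    bool resource_assigned; /* set by the machine stage */
    bool bus_plugged;       /* set by the bus stage */
} ToyDev;

/* Final stage at the parent bus: NULL on success, else an error message. */
const char *toy_bus_plug(ToyDev *dev)
{
    assert(dev != NULL);
    if (dev->bus_plug_fails) {
        return "bus handler refused the device";
    }
    dev->bus_plugged = true;
    return NULL;
}

/* Machine stage first, then pass control on to the bus handler, if any. */
const char *toy_machine_plug(ToyDev *dev)
{
    const char *local_err = NULL;

    assert(dev != NULL);
    if (dev->is_memory_device) {
        dev->resource_assigned = true;  /* machine-level stage */
    }
    if (dev->has_bus_handler) {
        local_err = toy_bus_plug(dev);  /* final stage */
    }
    if (local_err != NULL && dev->resource_assigned) {
        dev->resource_assigned = false; /* roll back the earlier stage */
    }
    return local_err;                   /* propagate to the caller */
}
```

Because the error is kept in a local variable, the machine stage can inspect
it and undo its own work before handing the error back to the caller.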

> 
> 
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/i386/pc.c | 39 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index d768930d02..510076e156 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -2007,19 +2007,32 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>                                            DeviceState *dev, Error **errp)
>>  {
>> +    Error *local_err = NULL;
>> +
>> +    /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>> -        pc_cpu_pre_plug(hotplug_dev, dev, errp);
>> +        pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
>> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
>> +                                 &local_err);
>>      }
>> +    error_propagate(errp, local_err);
>>  }
>>  
>>  static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>                                        DeviceState *dev, Error **errp)
>>  {
>> +    Error *local_err = NULL;
>> +
>> +    /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>> -        pc_dimm_plug(hotplug_dev, dev, errp);
>> +        pc_dimm_plug(hotplug_dev, dev, &local_err);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>> -        pc_cpu_plug(hotplug_dev, dev, errp);
>> +        pc_cpu_plug(hotplug_dev, dev, &local_err);
>> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>>      }
>> +    error_propagate(errp, local_err);
>>  }
>>  
>>  static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>> @@ -2029,7 +2042,10 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>>          pc_dimm_unplug_request(hotplug_dev, dev, errp);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>          pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
>> -    } else {
>> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
>> +                                       errp);
>> +    } else if (!dev->parent_bus) {
>>          error_setg(errp, "acpi: device unplug request for not supported device"
>>                     " type: %s", object_get_typename(OBJECT(dev)));
>>      }
>> @@ -2038,14 +2054,21 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
>>  static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
>>                                          DeviceState *dev, Error **errp)
>>  {
>> +    Error *local_err = NULL;
>> +
>> +    /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>> -        pc_dimm_unplug(hotplug_dev, dev, errp);
>> +        pc_dimm_unplug(hotplug_dev, dev, &local_err);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>> -        pc_cpu_unplug_cb(hotplug_dev, dev, errp);
>> -    } else {
>> -        error_setg(errp, "acpi: device unplug for not supported device"
>> +        pc_cpu_unplug_cb(hotplug_dev, dev, &local_err);
>> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>> +                               &local_err);
>> +    } else if (!dev->parent_bus) {
>> +        error_setg(&local_err, "acpi: device unplug for not supported device"
>>                     " type: %s", object_get_typename(OBJECT(dev)));
>>      }
>> +    error_propagate(errp, local_err);
>>  }
>>  
>>  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler
  2018-05-30 14:08     ` David Hildenbrand
@ 2018-05-30 14:27       ` Paolo Bonzini
  2018-05-30 14:31         ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2018-05-30 14:27 UTC (permalink / raw)
  To: David Hildenbrand, Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Marcel Apfelbaum,
	Luiz Capitulino, David Gibson, Richard Henderson

On 30/05/2018 16:08, David Hildenbrand wrote:
>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>>> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
>> you probably could drop TYPE_PC_DIMM above, it's redundant since DIMM
>> can be cast to TYPE_MEMORY_DEVICE
>>
>> ditto for spapr
>>
> Yes, had the same in mind but left it for now this way (basically
> because we do special handling for PC_DIMM, so anybody reading this code
> is not directly confused)
> 

Hmm, I think what Igor proposes is nicer.  Can you send a v5 when he's
done with review?

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 05/14] pc: route all memory devices through the machine hotplug handler
  2018-05-30 14:27       ` Paolo Bonzini
@ 2018-05-30 14:31         ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-05-30 14:31 UTC (permalink / raw)
  To: Paolo Bonzini, Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Marcel Apfelbaum,
	Luiz Capitulino, David Gibson, Richard Henderson

On 30.05.2018 16:27, Paolo Bonzini wrote:
> On 30/05/2018 16:08, David Hildenbrand wrote:
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>>>> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
>>> you probably could drop TYPE_PC_DIMM above, it's redundant since DIMM
>>> can be cast to TYPE_MEMORY_DEVICE
>>>
>>> ditto for spapr
>>>
>> Yes, had the same in mind but left it for now this way (basically
>> because we do special handling for PC_DIMM, so anybody reading this code
>> is not directly confused)
>>
> 
> Hmm, I think what Igor proposes is nicer.  Can you send a v5 when he's
> done with review?

Sure!

> 
> Paolo
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-05-30 14:03   ` Paolo Bonzini
@ 2018-05-31 11:47     ` Igor Mammedov
  2018-05-31 11:50       ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-31 11:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Hildenbrand, qemu-devel, Pankaj Gupta, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Cornelia Huck,
	Alexander Graf, Markus Armbruster, Christian Borntraeger,
	qemu-s390x, qemu-ppc, Marcel Apfelbaum, Luiz Capitulino,
	David Gibson

On Wed, 30 May 2018 16:03:29 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 25/05/2018 14:43, David Hildenbrand wrote:
> > On 17.05.2018 10:15, David Hildenbrand wrote:  
> >> [...]
> > 
> > As there was no negative feedback so far, I will go ahead and assume
> > that this approach is the right thing to do.  
> 
> Ok, I'll queue this.
I think it's a bit premature.
The series needs a respin, and for completeness it should also include
at least one actual user (virtio-mem) to show how the new
interfaces/wrappers would be used and whether they are actually needed.

> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-05-31 11:47     ` Igor Mammedov
@ 2018-05-31 11:50       ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2018-05-31 11:50 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: David Hildenbrand, qemu-devel, Pankaj Gupta, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Cornelia Huck,
	Alexander Graf, Markus Armbruster, Christian Borntraeger,
	qemu-s390x, qemu-ppc, Marcel Apfelbaum, Luiz Capitulino,
	David Gibson

On 31/05/2018 13:47, Igor Mammedov wrote:
> On Wed, 30 May 2018 16:03:29 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 25/05/2018 14:43, David Hildenbrand wrote:
>>> On 17.05.2018 10:15, David Hildenbrand wrote:  
>>>> [...]
>>>
>>> As there was no negative feedback so far, I will go ahead and assume
>>> that this approach is the right thing to do.  
>>
>> Ok, I'll queue this.
> I think it's a bit premature.
> The series needs a respin, and for completeness it should also include
> at least one actual user (virtio-mem) to show how the new
> interfaces/wrappers would be used and whether they are actually needed.

Yeah, I noticed that a respin was needed after sending this.  Thanks,

Paolo

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-30 14:06         ` David Hildenbrand
@ 2018-05-31 13:54           ` Igor Mammedov
  2018-06-04 10:53             ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-31 13:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, 30 May 2018 16:06:59 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 30.05.2018 14:57, Igor Mammedov wrote:
> > On Tue, 29 May 2018 18:02:06 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 29.05.2018 15:27, Igor Mammedov wrote:  
> >>> On Thu, 17 May 2018 10:15:14 +0200
> >>> David Hildenbrand <david@redhat.com> wrote:
> >>>     
> >>>> The start of the address space does not have to be aligned for the
> >>>> search. Handle this case explicitly when starting the search for a new
> >>>> address.    
> >>> That's true,
> >>> but commit message doesn't explain why address_space_start
> >>> should be allowed to be non aligned.
> >>>
> >>> At least with this assert we would notice early that
> >>> board allocating misaligned address space.
> >>> I'd keep the assert unless there is a good reason to drop it.    
> >>
> >> That reason might be that I can easily crash QEMU
> >>
> >>  ./x86_64-softmmu/qemu-system-x86_64 -m 256M,maxmem=20G,slots=2 -object
> >> memory-backend-file,id=mem0,size=8192M,mem-path=/dev/zero,align=8192M
> >> -device pc-dimm,id=dimm1,memdev=mem0
> >>
> >> ERROR:hw/mem/memory-device.c:146:memory_device_get_free_addr: assertion
> >> failed: (QEMU_ALIGN_UP(address_space_start, align) == address_space_start)  
> > it looks like a different issue.
> > As I remember user visible 'align' property was added as duct tape since
> > we can't figure out alignment for DAX device on the host,
> > so it was left up to upper layers to magically figure that out.
> > 
> > However we probably shouldn't allow arbitrary or bigger alignment
> > than max page size supported by target machine/cpu
> > (i.e. currently hardcoded address_space_start alignment),
> > as it creates unnecessary fragmentation and not counted in size
> > of hotplug region (for x86 we count in additional 1Gb per memory device).
> > 
> > How about turning that assert into error check that
> > inhibits plugging in devices with alignment values
> > larger than address_space_start alignment?  
> 
> 
> Let me explain a little bit why I don't like such restrictions (for
> which I don't see a need yet):
(*) being conservative is good here because we can always relax restrictions
in future if it's needed without breaking users, but we can't take away
something that's been shipped (and if we do it, it typically
introduces a bunch of compat code to keep old machines working).
Also, besides giving us some freedom of movement in the future,
restrictions also to some degree prevent users from misconfiguration.

> virtio-mem devices will later have a certain block size (e.g. 4MB). I
> want to give devices during resource assignment the possibility to
> specify their alignment requirements.
size and alignment are 2 different things here. Alignment in our design
is dictated by backing storage page size, and for performance reasons
HVA and GPA should be aligned on the same boundary. Users are free
to pick another GPA manually as long as it has the same alignment.
But for automatic placement we use the backend's alignment to make placement
as compact as possible while still keeping GPA aligned with HVA.

> For said virtio-mem device, this would e.g. be 4MB. (see patch 10 and 14
> of how this call "get_align" comes into play), because the addresses of
> the memory blocks are all aligned by e.g. 4MB. This is what is
> guaranteed by the device specification.
where does this 4MB magic come from and why must a block be aligned
to this size?
 
> E.g. for DIMMs we might want to allow to specify the section size (e.g.
> 128MB on x86), otherwise e.g. Linux is not able to add all memory. (but
> we should not hardcode this, as this is a Linux specific requirement -
> still it would be nice to specify)
true, it's guest specific and we do not have restrictions here.
The only restriction we have here is that size must be a multiple of
the backing storage page size (i.e. alignment) so that the guest would
be able to use the tail page.

> So in general, a memory device might have some alignment that is to be
> taken care of.
Do we really need to introduce frontend specific alignment?
I'd try to reuse the backend's one and go for a frontend one only if we have to.

> I don't understand right now why an upper limit on the alignment would
> make sense at all. We can easily handle it during our search. And we
> have to handle it either way during the search, if we plug some device
> with strange sizes (e.g. 1MB DIMM).
> 
> Of course, we might end up fragmenting guest physical memory, but that
> is purely a setup issue (choosing sizes of devices + main memory
> properly). I don't see a reason to error out (and of course also not to
> assert out :) ).
Agreed about assert, but I'd still prefer to error out there so that users
won't create insane configs and then complain (see below and *).

An upper alignment value is useful for proper sizing of hotplug address space,
so that the user could plug #slots devices up to #maxmem specified on CLI.
It's still possible to misconfigure using manual placement, but the default
one just works: the user either consumes all memory #maxmem and/or #slots.

There is no misunderstanding in case of error here as it works the same as
on bare metal: one doesn't have a free place to put more memory or all
memory one asked for is already there.

So it might be that decoupling #slots from memory devices is a premature
action and we can still use it with virtio-mem/pmem.
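
To illustrate the error check suggested above, here is a rough sketch in
plain C. It is not QEMU code: align_up() and device_align_ok() are made-up
helper names, and the start address used in the test values is only an
assumed example. It shows the assert condition from
memory_device_get_free_addr() turned into a check the caller can report as
an error instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.
 * align must be a non-zero power of two; overflow near UINT64_MAX is
 * not handled in this sketch.
 */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: accept a device only if its alignment does not
 * exceed the alignment of the start of the hotplug address space, i.e.
 * the start is already aligned to what the device asks for.
 * Returns false where the original code would have hit the assert.
 */
static bool device_align_ok(uint64_t address_space_start, uint64_t align)
{
    return align_up(address_space_start, align) == address_space_start;
}
```

With an assumed start of 4 GiB, a 1 GiB alignment passes the check while
the align=8192M case from the reproducer is rejected with an error rather
than crashing.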

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-05-30 14:13     ` David Hildenbrand
@ 2018-05-31 14:13       ` Igor Mammedov
  2018-06-04 11:27         ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-05-31 14:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, 30 May 2018 16:13:32 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 30.05.2018 15:08, Igor Mammedov wrote:
> > On Thu, 17 May 2018 10:15:17 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> For multi stage hotplug handlers, we'll have to do some error handling
> >> in some hotplug functions, so let's use a local error variable (except
> >> for unplug requests).  
> > I'd split out introducing local error into separate patch
> > so patch would do a single thing.
> >   
> >> Also, add code to pass control to the final stage hotplug handler at the
> >> parent bus.  
> > But I don't agree with generic
> >  "} else if ("dev->parent_bus && dev->parent_bus->hotplug_handler) {"
> > forwarding, it's done by 3/14 for generic case and in case of
> > special device that needs bus handler called from machine one,
> > I'd suggest to do forwarding explicitly for that device only
> > like we do with acpi_dev.  
> 
> I decided to do it that way because it is generic and results in nicer
> recovery handling (e.g. in case pc_dimm plug fails, we can simply
> rollback all (for now MemoryDevice) previous plug operations).
rollback should be managed by the caller of pc_dimm plug
directly, so it's not relevant here.

> IMHO, the resulting code is easier to read.
> 
> From this handling it is clear that
> "if we reach the hotplug handler, and it is not some special device
> plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
> handler if any exists"
I strongly disagree that it's easier to deal with.
You are basically duplicating already generalized code
from qdev_get_hotplug_handler() back into boards.

If a device doesn't have to be handled by the machine handler,
then qdev_get_hotplug_handler() must return its bus handler
directly, if any. So the branch in question that you are copying
is a dead one, please drop it.
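
To make the point concrete, a simplified model of the lookup (toy types,
not the real qdev structures; 'machine_managed' is a made-up flag standing
in for whatever the machine's handler lookup decides, and the function name
is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for HotplugHandler, BusState and DeviceState. */
typedef struct Handler { const char *name; } Handler;
typedef struct Bus { Handler *hotplug_handler; } Bus;
typedef struct Device {
    Bus *parent_bus;
    bool machine_managed;   /* assumption: the machine claims this device */
} Device;

/*
 * Generic lookup: the machine handler is returned only for devices it
 * claims; every other device gets its bus handler directly (or NULL).
 */
static Handler *toy_get_hotplug_handler(Device *dev, Handler *machine)
{
    assert(dev != NULL);
    if (dev->machine_managed) {
        return machine;
    }
    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
        return dev->parent_bus->hotplug_handler;
    }
    return NULL;
}
```

In this model a device that reaches the machine handler is always one the
machine claimed, so a "forward to dev->parent_bus->hotplug_handler" branch
inside the machine handler is never taken for any other device.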


> 
> > 
> >   
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/i386/pc.c | 39 +++++++++++++++++++++++++++++++--------
> >>  1 file changed, 31 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index d768930d02..510076e156 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -2007,19 +2007,32 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> >>  static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> >>                                            DeviceState *dev, Error **errp)
> >>  {
> >> +    Error *local_err = NULL;
> >> +
> >> +    /* final stage hotplug handler */
> >>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> >> -        pc_cpu_pre_plug(hotplug_dev, dev, errp);
> >> +        pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
> >> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> >> +                                 &local_err);
> >>      }
> >> +    error_propagate(errp, local_err);
> >>  }
> >>  
> >>  static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
> >>                                        DeviceState *dev, Error **errp)
> >>  {
> >> +    Error *local_err = NULL;
> >> +
> >> +    /* final stage hotplug handler */
> >>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> >> -        pc_dimm_plug(hotplug_dev, dev, errp);
> >> +        pc_dimm_plug(hotplug_dev, dev, &local_err);
> >>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> >> -        pc_cpu_plug(hotplug_dev, dev, errp);
> >> +        pc_cpu_plug(hotplug_dev, dev, &local_err);
> >> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> >>      }
> >> +    error_propagate(errp, local_err);
> >>  }
> >>  
> >>  static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
> >> @@ -2029,7 +2042,10 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
> >>          pc_dimm_unplug_request(hotplug_dev, dev, errp);
> >>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> >>          pc_cpu_unplug_request_cb(hotplug_dev, dev, errp);
> >> -    } else {
> >> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> >> +                                       errp);
> >> +    } else if (!dev->parent_bus) {
> >>          error_setg(errp, "acpi: device unplug request for not supported device"
> >>                     " type: %s", object_get_typename(OBJECT(dev)));
> >>      }
> >> @@ -2038,14 +2054,21 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
> >>  static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
> >>                                          DeviceState *dev, Error **errp)
> >>  {
> >> +    Error *local_err = NULL;
> >> +
> >> +    /* final stage hotplug handler */
> >>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> >> -        pc_dimm_unplug(hotplug_dev, dev, errp);
> >> +        pc_dimm_unplug(hotplug_dev, dev, &local_err);
> >>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> >> -        pc_cpu_unplug_cb(hotplug_dev, dev, errp);
> >> -    } else {
> >> -        error_setg(errp, "acpi: device unplug for not supported device"
> >> +        pc_cpu_unplug_cb(hotplug_dev, dev, &local_err);
> >> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> >> +                               &local_err);
> >> +    } else if (!dev->parent_bus) {
> >> +        error_setg(&local_err, "acpi: device unplug for not supported device"
> >>                     " type: %s", object_get_typename(OBJECT(dev)));
> >>      }
> >> +    error_propagate(errp, local_err);
> >>  }
> >>  
> >>  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,  
> >   
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers David Hildenbrand
  2018-05-17 12:43   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2018-06-01 10:33   ` Igor Mammedov
  2018-06-05  1:08   ` David Gibson
  2 siblings, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 10:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:19 +0200
David Hildenbrand <david@redhat.com> wrote:

maybe subj: make hotplug handlers use local_error
> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).


> 
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
doing several unrelated things in one patch doesn't help reviewing it.
Also, as explained in 04/14, it's not needed at all.
Could you try to keep patches minimal,
we can add more complexity in later revisions if it's really necessary.

 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>  {
>      MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +    Error *local_err = NULL;
>  
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          int node;
>  
>          if (!smc->dr_lmb_enabled) {
> -            error_setg(errp, "Memory hotplug not supported for this machine");
> -            return;
> +            error_setg(&local_err,
> +                       "Memory hotplug not supported for this machine");
> +            goto out;
>          }
> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
> -        if (*errp) {
> -            return;
> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +                                        &local_err);
> +        if (local_err) {
> +            goto out;
>          }
>          if (node < 0 || node >= MAX_NODES) {
> -            error_setg(errp, "Invaild node %d", node);
> -            return;
> +            error_setg(&local_err, "Invaild node %d", node);
> +            goto out;
>          }
>  
> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_plug(hotplug_dev, dev, errp);
> +        spapr_core_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> +    }
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +                               &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>              return;
>          }
>          spapr_core_unplug_request(hotplug_dev, dev, errp);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +                                       errp);
>      }
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +        spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_pre_plug(hotplug_dev, dev, errp);
> +        spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> +                                 &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
> +    hc->unplug = spapr_machine_device_unplug;
>  
>      smc->dr_lmb_enabled = true;
>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
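
For reference, the local error variable pattern the patch introduces can be
sketched in self-contained C as below. This is a toy model, not QEMU's
Error API: the Error struct, toy_error_set(), toy_error_propagate() and the
stage functions are all invented for illustration. The point is that a
handler must not test *errp (the caller may pass NULL), so it collects the
error locally, skips later stages on failure, and propagates once at the
end.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Error { char msg[64]; } Error;

static int final_stage_calls;   /* counts how often the last stage ran */

/* Store a new error in *errp; a NULL errp means the caller ignores errors. */
static void toy_error_set(Error **errp, const char *msg)
{
    Error *e;

    if (!errp) {
        return;
    }
    assert(*errp == NULL);      /* never overwrite a pending error */
    e = calloc(1, sizeof(*e));
    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* Hand a local error to the caller, or free it if the caller has no use for it. */
static void toy_error_propagate(Error **dst, Error *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

static void first_stage(int size, Error **errp)
{
    if (size <= 0) {
        toy_error_set(errp, "invalid size");
    }
}

static void final_stage(int size, Error **errp)
{
    (void)size;
    (void)errp;
    final_stage_calls++;
}

/* Handler with two stages: local_err is always safe to inspect. */
static void toy_pre_plug(int size, Error **errp)
{
    Error *local_err = NULL;

    first_stage(size, &local_err);
    if (local_err) {
        goto out;
    }
    final_stage(size, &local_err);
out:
    toy_error_propagate(errp, local_err);
}
```

Calling toy_pre_plug(0, NULL) is safe here, whereas a handler that checks
"if (*errp)" would dereference a NULL pointer in that case.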

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain David Hildenbrand
@ 2018-06-01 10:53   ` Igor Mammedov
  2018-06-05  1:12   ` David Gibson
  1 sibling, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 10:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:21 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
> unplug memory devices (which a pc-dimm is) later.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/ppc/spapr.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2f315f963b..286c38c842 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3291,7 +3291,8 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> -    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
>      sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
>  
>      /* This information will get lost if a migration occurs
> @@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
>  
>      /*
>       * Now that all the LMBs have been removed by the guest, call the
> -     * pc-dimm unplug handler to cleanup up the pc-dimm device.
> +     * unplug handler chain. This can never fail.
>       */
> -    pc_dimm_memory_unplug(dev, MACHINE(spapr));
> +    hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));

> +    g_assert(ds);
> +    g_assert(!ds->nr_lmbs);
These 2 lines seem to be unrelated to the patch topic,
could you drop them?

if these values should be checked, it would be better to audit 'ds' use
across spapr.c and send a separate patch outside of this series.

> +    pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
>      object_unparent(OBJECT(dev));
>      spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
> @@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      Error *local_err = NULL;
>  
>      /* final stage hotplug handler */
> -    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);
>      }
otherwise, ignoring dev->parent_bus parts, patch looks reasonable

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core " David Hildenbrand
@ 2018-06-01 10:57   ` Igor Mammedov
  2018-06-05  1:13   ` David Gibson
  1 sibling, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 10:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On Thu, 17 May 2018 10:15:22 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's handle it via hotplug_handler_unplug().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/ppc/spapr.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 286c38c842..13d153b5a6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>  /* Callback to be called during DRC release. */
>  void spapr_core_release(DeviceState *dev)
>  {
> -    MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +
> +    /* Call the unplug handler chain. This can never fail. */
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                              Error **errp)
> +{
> +    MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>      CPUCore *cc = CPU_CORE(dev);
>      CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> @@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +        spapr_core_unplug(hotplug_dev, dev, &local_err);
>      } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler David Hildenbrand
@ 2018-06-01 11:17   ` Igor Mammedov
  2018-06-04 11:45     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 11:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 17 May 2018 10:15:25 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's move all pre-plug checks we can do without the device being
> realized into the applicable hotplug handler for pc and spapr.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/i386/pc.c                   | 11 +++++++
>  hw/mem/memory-device.c         | 72 +++++++++++++++++++-----------------------
>  hw/ppc/spapr.c                 | 11 +++++++
>  include/hw/mem/memory-device.h |  2 ++
>  4 files changed, 57 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 8bc41ef24b..61f1537e14 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>  {
>      Error *local_err = NULL;
>  
> +    /* first stage hotplug handler */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
> +                               &local_err);
> +    }
> +
> +    if (local_err) {
> +        goto out;
> +    }
> +
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>          pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
> @@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>          hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
>                                   &local_err);
>      }
> +out:
>      error_propagate(errp, local_err);
>  }
>  
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 361d38bfc5..d22c91993f 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>      return 0;
>  }
>  
> -static void memory_device_check_addable(MachineState *ms, uint64_t size,
> -                                        Error **errp)
> -{
> -    uint64_t used_region_size = 0;
> -
> -    /* we will need a new memory slot for kvm and vhost */
> -    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
> -        error_setg(errp, "hypervisor has no free memory slots left");
> -        return;
> -    }
> -    if (!vhost_has_free_slot()) {
> -        error_setg(errp, "a used vhost backend has no free memory slots left");
> -        return;
> -    }
> -
> -    /* will we exceed the total amount of memory specified */
> -    memory_device_used_region_size(OBJECT(ms), &used_region_size);
> -    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
> -        error_setg(errp, "not enough space, currently 0x%" PRIx64
> -                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
> -                   used_region_size, ms->maxram_size - ms->ram_size);
> -        return;
> -    }
> -
> -}
> -
>  uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>                                       uint64_t align, uint64_t size,
>                                       Error **errp)
>  {
>      uint64_t address_space_start, address_space_end;
> +    uint64_t used_region_size = 0;
>      GSList *list = NULL, *item;
>      uint64_t new_addr = 0;
>  
> -    if (!ms->device_memory) {
> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> -                         "supported by the machine");
> -        return 0;
> -    }
> -
> -    if (!memory_region_size(&ms->device_memory->mr)) {
> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> -                         "enabled, please specify the maxmem option");
> -        return 0;
> -    }
>      address_space_start = ms->device_memory->base;
>      address_space_end = address_space_start +
>                          memory_region_size(&ms->device_memory->mr);
>      g_assert(address_space_end >= address_space_start);
>  
> -    memory_device_check_addable(ms, size, errp);
> -    if (*errp) {
> +    /* will we exceed the total amount of memory specified */
> +    memory_device_used_region_size(OBJECT(ms), &used_region_size);
> +    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
> +        error_setg(errp, "not enough space, currently 0x%" PRIx64
> +                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
> +                   used_region_size, ms->maxram_size - ms->ram_size);
>          return 0;
>      }
>  
> @@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
>      return size;
>  }
>  
> +void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
> +                            Error **errp)
> +{
> +    if (!ms->device_memory) {
> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> +                         "supported by the machine");
> +        return;
> +    }
> +
> +    if (!memory_region_size(&ms->device_memory->mr)) {
> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> +                         "enabled, please specify the maxmem option");
> +        return;
> +    }
> +
> +    /* we will need a new memory slot for kvm and vhost */
> +    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
> +        error_setg(errp, "hypervisor has no free memory slots left");
> +        return;
> +    }
> +    if (!vhost_has_free_slot()) {
> +        error_setg(errp, "a used vhost backend has no free memory slots left");
> +        return;
> +    }
thanks for extracting the preparation steps into the right callback.

on top of this, the _preplug() handler should also set the properties of
the device being plugged if they weren't set by the user.
 memory_device_get_free_addr() should be here too.

the general rule for _preplug() would be to check and prepare the device
for being plugged but not touch anything besides the device (that's the
_plug handler's job)

> +}
> +
>  void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>                                 uint64_t addr)
>  {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 13d153b5a6..562712def2 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3676,6 +3676,16 @@ static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>  {
>      Error *local_err = NULL;
>  
> +    /* first stage hotplug handler */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
> +                               &local_err);
> +    }
> +
> +    if (local_err) {
> +        goto out;
> +    }
> +
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
> @@ -3685,6 +3695,7 @@ static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>          hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
>                                   &local_err);
>      }
> +out:
>      error_propagate(errp, local_err);
>  }
>  
> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
> index 62d906be50..3a4e9edc92 100644
> --- a/include/hw/mem/memory-device.h
> +++ b/include/hw/mem/memory-device.h
> @@ -51,6 +51,8 @@ typedef struct MemoryDeviceClass {
>  
>  MemoryDeviceInfoList *qmp_memory_device_list(void);
>  uint64_t get_plugged_memory_size(void);
> +void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
> +                            Error **errp);
>  uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>                                       uint64_t align, uint64_t size,
>                                       Error **errp);
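
The _preplug()/_plug split described in the comment above could look
roughly like the following sketch. It is a toy model under stated
assumptions: Machine, MemDev and both functions are invented names, and the
address assignment is a trivial bump allocation, not what
memory_device_get_free_addr() actually does. What it is meant to show is
that pre-plug only validates and fills in the device (the machine is
const), while plug is the only place that changes machine state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine: one hotplug region, filled from the bottom up. */
typedef struct Machine { uint64_t base, size, used; } Machine;
/* Toy memory device: addr may be preset by the user (addr_set). */
typedef struct MemDev { uint64_t addr, size; bool addr_set; } MemDev;

/*
 * pre_plug: check and prepare the device only. The machine is read-only
 * here, so a failure needs no rollback.
 */
static bool memdev_pre_plug(const Machine *m, MemDev *d, const char **err)
{
    assert(m->used <= m->size);
    if (m->size == 0) {
        *err = "memory devices are not enabled";
        return false;
    }
    if (d->size > m->size - m->used) {
        *err = "not enough space";
        return false;
    }
    if (!d->addr_set) {
        /* fill in the property the user did not set */
        d->addr = m->base + m->used;
        d->addr_set = true;
    }
    return true;
}

/* plug: commit the prepared device to the machine. */
static void memdev_plug(Machine *m, const MemDev *d)
{
    m->used += d->size;
}
```

Because all checks and the address assignment happen before anything
outside the device is touched, a failing pre-plug leaves the machine
exactly as it was.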

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug into hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug " David Hildenbrand
@ 2018-06-01 11:31   ` Igor Mammedov
  2018-06-04 15:54     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 11:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On Thu, 17 May 2018 10:15:26 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's move the unplug logic into the applicable hotplug handler for pc and
> spapr.
> 
> We'll move the plug logic next, then this will look more symmetrical in
> the hotplug handlers.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/i386/pc.c                   | 17 ++++++++++++++++-
>  hw/mem/memory-device.c         | 14 ++++++++++++--
>  hw/mem/pc-dimm.c               |  2 --
>  hw/mem/trace-events            |  2 ++
>  hw/ppc/spapr.c                 | 16 +++++++++++++++-
>  include/hw/mem/memory-device.h |  2 +-
>  6 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 61f1537e14..426fb534c2 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2044,6 +2044,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>      } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>      }
> +
> +    if (local_err) {
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +            memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
> +        }
> +    }
>      error_propagate(errp, local_err);
>  }
>  
> @@ -2080,7 +2086,16 @@ static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
>          error_setg(&local_err, "acpi: device unplug for not supported device"
>                     " type: %s", object_get_typename(OBJECT(dev)));
>      }
> -    error_propagate(errp, local_err);
> +
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /* first stage hotplug handler */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
> +    }
>  }
>  
>  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index d22c91993f..8f10d613ea 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -17,6 +17,7 @@
>  #include "qemu/range.h"
>  #include "hw/virtio/vhost.h"
>  #include "sysemu/kvm.h"
> +#include "trace.h"
>  
>  static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
>  {
> @@ -246,12 +247,21 @@ void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>                                  addr - ms->device_memory->base, mr);
>  }
>  
> -void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr)
> +void memory_device_unplug(MachineState *ms, MemoryDeviceState *md)
>  {
> -    /* we expect a previous call to memory_device_get_free_addr() */
> +    const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
> +    MemoryRegion *mr = mdc->get_memory_region(md);
> +
> +    /* we expect a previous call to memory_device_pre_plug */
>      g_assert(ms->device_memory);
>  
> +    if (!memory_region_is_mapped(mr)) {
> +        return;
> +    }
> +
>      memory_region_del_subregion(&ms->device_memory->mr, mr);
> +    trace_memory_device_unassign_address(mdc->get_addr(md));
> +    mdc->set_addr(md, 0);
>  }
>  
>  static const TypeInfo memory_device_info = {
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 5e2e3263ab..d487bb513b 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -94,9 +94,7 @@ void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine)
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>      MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
> -    MemoryRegion *mr = ddc->get_memory_region(dimm, &error_abort);
>  
> -    memory_device_unplug_region(machine, mr);
>      vmstate_unregister_ram(vmstate_mr, dev);
>  }
>  
> diff --git a/hw/mem/trace-events b/hw/mem/trace-events
> index e150dcc497..a661ee49a3 100644
> --- a/hw/mem/trace-events
> +++ b/hw/mem/trace-events
> @@ -3,3 +3,5 @@
>  # hw/mem/pc-dimm.c
>  mhp_pc_dimm_assigned_slot(int slot) "%d"
>  mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
> +# hw/mem/memory-device.c
> +memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
maybe split out tracing into a separate patch?

> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 562712def2..abdd38a6b5 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3621,6 +3621,11 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>          hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>      }
>  out:
> +    if (local_err) {
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +            memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
> +        }
> +    }
we shouldn't call unplug here.
I'd suggest that spapr_memory_plug() and/or memory_device_plug() do
error handling and rollback on their own, so that it doesn't complicate
the generic machines' hotplug handlers.
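
A minimal sketch of that suggestion, using toy stand-ins rather than QEMU
code: the plug helper undoes its own partial work on failure, so the machine
handler only dispatches and forwards the error. All names here (ToyDevice,
toy_memory_plug, ...) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are QEMU types or functions. */
typedef struct ToyDevice {
    bool mapped;       /* region currently mapped into device memory */
    bool fail_notify;  /* inject a failure in the last plug step */
} ToyDevice;

void toy_map(ToyDevice *d)   { d->mapped = true; }
void toy_unmap(ToyDevice *d) { d->mapped = false; }

/* Last plug step; may fail. Returns false and sets *err on failure. */
bool toy_notify_guest(ToyDevice *d, const char **err)
{
    if (d->fail_notify) {
        *err = "guest notification failed";
        return false;
    }
    return true;
}

/* The plug helper rolls back its own partial work before reporting. */
bool toy_memory_plug(ToyDevice *d, const char **err)
{
    toy_map(d);
    if (!toy_notify_guest(d, err)) {
        toy_unmap(d);   /* rollback stays local to the helper */
        return false;
    }
    return true;
}

/* The machine handler only dispatches; it carries no unplug-on-error code. */
bool toy_machine_plug(ToyDevice *d, const char **err)
{
    return toy_memory_plug(d, err);
}
```

With this shape, a failed plug leaves the device unmapped without the
machine-level handler knowing how to undo anything.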

>      error_propagate(errp, local_err);
>  }
>  
> @@ -3638,7 +3643,16 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);
>      }
> -    error_propagate(errp, local_err);
> +
> +    if (local_err) {
this check is probably not needed, error_propagate()
bails out if local_err is NULL
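
A simplified model of that behaviour, not QEMU's actual Error implementation
(ToyError and the toy_* helpers are made up for illustration): propagating a
NULL error is a no-op. Note that in the hunk above the guard also gates the
early return, so dropping it would change whether the following
memory_device_unplug() step runs on error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model; QEMU's real Error carries more state than this. */
typedef struct ToyError {
    const char *msg;
} ToyError;

ToyError *toy_error_new(const char *msg)
{
    ToyError *e = malloc(sizeof(*e));
    assert(e);
    e->msg = msg;
    return e;
}

/* Models the NULL case: nothing happens when local_err is NULL. */
void toy_error_propagate(ToyError **dst, ToyError *local_err)
{
    if (!local_err) {
        return;
    }
    if (dst && !*dst) {
        *dst = local_err;   /* hand the error over to the caller */
    } else {
        free(local_err);    /* nobody wants it, or an error is already set */
    }
}

/*
 * Shape of the unplug handler tail: returns 1 if the follow-up unplug
 * step would run, 0 if the error path returned early.
 */
int toy_unplug_tail(ToyError *local_err, ToyError **errp)
{
    if (local_err) {
        toy_error_propagate(errp, local_err);
        return 0;   /* skip the memory-device unplug step */
    }
    return 1;       /* follow-up step runs */
}
```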

> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /* first stage hotplug handler */
I'd drop this comment, it looks unnecessary and even a bit confusing to me.

> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
> +    }
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
> index 3a4e9edc92..b8365959e7 100644
> --- a/include/hw/mem/memory-device.h
> +++ b/include/hw/mem/memory-device.h
> @@ -58,6 +58,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>                                       Error **errp);
>  void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>                                 uint64_t addr);
> -void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr);
> +void memory_device_unplug(MachineState *ms, MemoryDeviceState *md);
>  
>  #endif

* Re: [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug into hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug " David Hildenbrand
@ 2018-06-01 11:39   ` Igor Mammedov
  2018-06-04 11:47     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 11:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On Thu, 17 May 2018 10:15:27 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's move the plug logic into the applicable hotplug handler for pc and
> spapr.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/i386/pc.c                   | 35 ++++++++++++++++++++---------------
>  hw/mem/memory-device.c         | 40 ++++++++++++++++++++++++++++++++++------
>  hw/mem/pc-dimm.c               | 29 +----------------------------
>  hw/mem/trace-events            |  2 +-
>  hw/ppc/spapr.c                 | 15 ++++++++++++---
>  include/hw/mem/memory-device.h |  7 ++-----
>  include/hw/mem/pc-dimm.h       |  3 +--
>  7 files changed, 71 insertions(+), 60 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 426fb534c2..f022eb042e 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1682,22 +1682,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>      HotplugHandlerClass *hhc;
>      Error *local_err = NULL;
>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> -    PCDIMMDevice *dimm = PC_DIMM(dev);
> -    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> -    MemoryRegion *mr;
> -    uint64_t align = TARGET_PAGE_SIZE;
>      bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>  
> -    mr = ddc->get_memory_region(dimm, &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
> -
> -    if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
> -        align = memory_region_get_alignment(mr);
> -    }
> -
>      /*
>       * When -no-acpi is used with Q35 machine type, no ACPI is built,
>       * but pcms->acpi_dev is still created. Check !acpi_enabled in
> @@ -1715,7 +1701,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>          goto out;
>      }
>  
> -    pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
> +    pc_dimm_memory_plug(dev, MACHINE(pcms), &local_err);
>      if (local_err) {
>          goto out;
>      }
> @@ -2036,6 +2022,25 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>  {
>      Error *local_err = NULL;
>  
> +    /* first stage hotplug handler */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(hotplug_dev);
> +        uint64_t align = 0;
> +
> +        /* compat handling: force to TARGET_PAGE_SIZE */
> +        if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
> +            !pcmc->enforce_aligned_dimm) {
> +            align = TARGET_PAGE_SIZE;
> +        }
> +        memory_device_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
> +                           align ? &align : NULL, &local_err);
> +    }
> +
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          pc_dimm_plug(hotplug_dev, dev, &local_err);
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 8f10d613ea..04bdb30f22 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -69,9 +69,10 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>      return 0;
>  }
>  
> -uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
> -                                     uint64_t align, uint64_t size,
> -                                     Error **errp)
> +static uint64_t memory_device_get_free_addr(MachineState *ms,
> +                                            const uint64_t *hint,
> +                                            uint64_t align, uint64_t size,
> +                                            Error **errp)
>  {
>      uint64_t address_space_start, address_space_end;
>      uint64_t used_region_size = 0;
> @@ -237,11 +238,38 @@ void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>      }
>  }
>  
> -void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
> -                               uint64_t addr)
> +void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
> +                        uint64_t *enforced_align, Error **errp)
enforced_align is a PC machine specific compat flag that keeps old machines
with an unaligned layout working (i.e. it avoids breaking CLI/migration);
it shouldn't go into generic code.
By default all new machines should use the aligned layout.
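
One way to picture the split, as a sketch under assumptions rather than the
real pc.c code: the machine-specific side decides which alignment applies
(including the compat case), and the generic side only consumes an alignment
value. TOY_PAGE_SIZE stands in for TARGET_PAGE_SIZE, and both function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL   /* stand-in for TARGET_PAGE_SIZE */

/*
 * Machine-specific policy: old machine types (enforce_aligned_dimm ==
 * false) keep page-size alignment, new ones use the region's alignment.
 */
uint64_t toy_pc_dimm_align(bool enforce_aligned_dimm, uint64_t region_align)
{
    if (enforce_aligned_dimm && region_align) {
        return region_align;
    }
    return TOY_PAGE_SIZE;
}

/* Generic helper: rounds up to a power-of-two alignment, no compat rules. */
uint64_t toy_align_up(uint64_t addr, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

The compat decision is made once in toy_pc_dimm_align(); the generic helper
never sees the enforce_aligned_dimm flag.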

>  {
> -    /* we expect a previous call to memory_device_get_free_addr() */
> +    const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
> +    const uint64_t size = mdc->get_region_size(md);
> +    MemoryRegion *mr = mdc->get_memory_region(md);
> +    uint64_t addr = mdc->get_addr(md);
> +    uint64_t align;
> +
> +    /* we expect a previous call to memory_device_pre_plug */
>      g_assert(ms->device_memory);
> +    g_assert(mr && !memory_region_is_mapped(mr));
> +
> +    /* compat handling, some alignment has to be enforced for DIMMs */
> +    if (enforced_align) {
> +        align = *enforced_align;
> +    } else {
> +        align = memory_region_get_alignment(mr);
> +    }
> +
> +    /* our device might have stronger alignment requirements */
> +    if (mdc->get_align) {
> +        align = MAX(align, mdc->get_align(md));
> +    }
> +
> +    addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
> +                                       size, errp);
> +    if (*errp) {
> +        return;
> +    }
> +    trace_memory_device_assign_address(addr);
> +    mdc->set_addr(md, addr);
>  
>      memory_region_add_subregion(&ms->device_memory->mr,
>                                  addr - ms->device_memory->base, mr);
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index d487bb513b..8b1dcb3260 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -32,39 +32,13 @@ typedef struct pc_dimms_capacity {
>       Error    **errp;
>  } pc_dimms_capacity;
>  
> -void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
> -                         uint64_t align, Error **errp)
> +void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine, Error **errp)
>  {
>      int slot;
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>      MemoryRegion *vmstate_mr = ddc->get_vmstate_memory_region(dimm);
>      Error *local_err = NULL;
> -    MemoryRegion *mr;
> -    uint64_t addr;
> -
> -    mr = ddc->get_memory_region(dimm, &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
> -
> -    addr = object_property_get_uint(OBJECT(dimm),
> -                                    PC_DIMM_ADDR_PROP, &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
> -
> -    addr = memory_device_get_free_addr(machine, !addr ? NULL : &addr, align,
> -                                       memory_region_size(mr), &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
> -
> -    object_property_set_uint(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
> -    if (local_err) {
> -        goto out;
> -    }
> -    trace_mhp_pc_dimm_assigned_address(addr);
>  
>      slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
>      if (local_err) {
> @@ -82,7 +56,6 @@ void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
>      }
>      trace_mhp_pc_dimm_assigned_slot(slot);
>  
> -    memory_device_plug_region(machine, mr, addr);
>      vmstate_register_ram(vmstate_mr, dev);
>  
>  out:
> diff --git a/hw/mem/trace-events b/hw/mem/trace-events
> index a661ee49a3..930b6aa6ea 100644
> --- a/hw/mem/trace-events
> +++ b/hw/mem/trace-events
> @@ -2,6 +2,6 @@
>  
>  # hw/mem/pc-dimm.c
>  mhp_pc_dimm_assigned_slot(int slot) "%d"
> -mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
>  # hw/mem/memory-device.c
> +memory_device_assign_address(uint64_t addr) "0x%"PRIx64
>  memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index abdd38a6b5..5a4dbbf31e 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3144,16 +3144,15 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>      MemoryRegion *mr;
> -    uint64_t align, size, addr;
> +    uint64_t size, addr;
>  
>      mr = ddc->get_memory_region(dimm, &local_err);
>      if (local_err) {
>          goto out;
>      }
> -    align = memory_region_get_alignment(mr);
>      size = memory_region_size(mr);
>  
> -    pc_dimm_memory_plug(dev, MACHINE(ms), align, &local_err);
> +    pc_dimm_memory_plug(dev, MACHINE(ms), &local_err);
>      if (local_err) {
>          goto out;
>      }
> @@ -3595,6 +3594,16 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>      Error *local_err = NULL;
>  
> +    /* first stage hotplug handler */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> +        memory_device_plug(ms, MEMORY_DEVICE(dev), NULL, &local_err);
> +    }
> +
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          int node;
> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
> index b8365959e7..a7408597fd 100644
> --- a/include/hw/mem/memory-device.h
> +++ b/include/hw/mem/memory-device.h
> @@ -53,11 +53,8 @@ MemoryDeviceInfoList *qmp_memory_device_list(void);
>  uint64_t get_plugged_memory_size(void);
>  void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>                              Error **errp);
> -uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
> -                                     uint64_t align, uint64_t size,
> -                                     Error **errp);
> -void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
> -                               uint64_t addr);
> +void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
> +                        uint64_t *enforced_align, Error **errp);
>  void memory_device_unplug(MachineState *ms, MemoryDeviceState *md);
>  
>  #endif
> diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
> index 627c8601d9..006c80fb2e 100644
> --- a/include/hw/mem/pc-dimm.h
> +++ b/include/hw/mem/pc-dimm.h
> @@ -78,7 +78,6 @@ typedef struct PCDIMMDeviceClass {
>  
>  int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
>  
> -void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine,
> -                         uint64_t align, Error **errp);
> +void pc_dimm_memory_plug(DeviceState *dev, MachineState *machine, Error **errp);
>  void pc_dimm_memory_unplug(DeviceState *dev, MachineState *machine);
>  #endif

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-05-25 12:43 ` [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers David Hildenbrand
  2018-05-30 14:03   ` Paolo Bonzini
@ 2018-06-01 12:13   ` Igor Mammedov
  2018-06-04 10:03     ` David Hildenbrand
  2018-06-08  9:57     ` David Hildenbrand
  1 sibling, 2 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-01 12:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Richard Henderson, Cornelia Huck, Alexander Graf,
	Markus Armbruster, Christian Borntraeger, qemu-s390x, qemu-ppc,
	Marcel Apfelbaum, Paolo Bonzini, Luiz Capitulino, David Gibson

On Fri, 25 May 2018 14:43:39 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 17.05.2018 10:15, David Hildenbrand wrote:
> > We can have devices that need certain other resources that are e.g.
> > system resources managed by the machine. We need a clean way to assign
> > these resources (without violating layers as brought up by Igor).
> > 
> > One example is virtio-mem/virtio-pmem. Both device types need to be
> > assigned some region in guest physical address space. This device memory
> > belongs to the machine and is managed by it. However, virito devices are
> > hotplugged using the hotplug handler their proxy device implements. So we
> > could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
> > hotplug handler for virtio-ccw. But definetly not the machine.
> > 
> > Now, we can route other devices through the machine hotplug handler, to
> > properly assign/unassign resources - like a portion in guest physical
> > address space.

To sum up review:
Some comments apply to several patches even though I commented only once.

I'd suggest restructuring the series and splitting it into several:
  * unplug cleanups 08/14 & co
  * generic _preplug refactoring so we won't have to go back to that question again
  * extending memory_device interface 11/14 + an actual user for whose sake
    the interface is extended (virtio-mem)

Also, please write more descriptive commit messages explaining why a change
is made; the current ones look like "Let's do something for some vague reason"
to unaware reviewers. Having virtio-mem along with the new extensions to
memory_device would be useful here, as it could cross-reference
the parts that need it.

Try to keep patches smaller, each doing one thing; we can always squash
them later if that turns out to be better.

I'm sorry if some comments were a bit too much or too insistent,
but I'm trying to keep the hotplug infrastructure simple so that later,
when someone else comes with related patches, I can easily read it
without studying it from the ground up.

PS:
(I'm not a fan of the idea of marrying a virtio device with its own bus plug
logic into bus-less machine hotplug, but I don't have a better suggestion or
time to explore alternatives, so let's do it but keep things manageable)

> > 
> > v3 -> v4:
> > - Removed the s390x bits, will send that out separately (was just a proof
> >   that it works just fine with s390x)
> > - Fixed a typo and reworded a comment
> > 
> > v2 -> v3:
> > - Added "memory-device: introduce separate config option"
> > - Dropped "parent_bus" check from hotplug handler lookup functions
> > - "Handly" -> "Handle" in patch description.
> > 
> > v1 -> v2:
> > - Use multi stage hotplug handler instead of resource handler
> > - MemoryDevices only compiled if necessary (CONFIG_MEM_HOTPLUG)
> > - Prepare PC/SPAPR machines properly for multi stage hotplug handlers
> > - Route SPAPR unplug code via the hotunplug handler
> > - Directly include s390x support. But there are no usable memory devices
> >   yet (well, only my virtio-mem prototype)
> > - Included "memory-device: drop assert related to align and start of address
> >   space"
> > 
> > David Hildenbrand (13):
> >   memory-device: drop assert related to align and start of address space
> >   memory-device: introduce separate config option
> >   pc: prepare for multi stage hotplug handlers
> >   pc: route all memory devices through the machine hotplug handler
> >   spapr: prepare for multi stage hotplug handlers
> >   spapr: route all memory devices through the machine hotplug handler
> >   spapr: handle pc-dimm unplug via hotplug handler chain
> >   spapr: handle cpu core unplug via hotplug handler chain
> >   memory-device: new functions to handle plug/unplug
> >   pc-dimm: implement new memory device functions
> >   memory-device: factor out pre-plug into hotplug handler
> >   memory-device: factor out unplug into hotplug handler
> >   memory-device: factor out plug into hotplug handler
> > 
> > Igor Mammedov (1):
> >   qdev: let machine hotplug handler to override bus hotplug handler
> > 
> >  default-configs/i386-softmmu.mak   |   3 +-
> >  default-configs/ppc64-softmmu.mak  |   3 +-
> >  default-configs/x86_64-softmmu.mak |   3 +-
> >  hw/Makefile.objs                   |   2 +-
> >  hw/core/qdev.c                     |   6 +-
> >  hw/i386/pc.c                       | 102 ++++++++++++++++++++++-------
> >  hw/mem/Makefile.objs               |   4 +-
> >  hw/mem/memory-device.c             | 129 +++++++++++++++++++++++--------------
> >  hw/mem/pc-dimm.c                   |  48 ++++++--------
> >  hw/mem/trace-events                |   4 +-
> >  hw/ppc/spapr.c                     | 129 +++++++++++++++++++++++++++++++------
> >  include/hw/mem/memory-device.h     |  21 ++++--
> >  include/hw/mem/pc-dimm.h           |   3 +-
> >  include/hw/qdev-core.h             |  11 ++++
> >  qapi/misc.json                     |   2 +-
> >  15 files changed, 330 insertions(+), 140 deletions(-)
> >   
> 
> As there was no negative feedback so far, I will go ahead and assume
> that this approach is the right thing to do.
> 

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-06-01 12:13   ` Igor Mammedov
@ 2018-06-04 10:03     ` David Hildenbrand
  2018-06-08  9:57     ` David Hildenbrand
  1 sibling, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 10:03 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Richard Henderson, Cornelia Huck, Alexander Graf,
	Markus Armbruster, Christian Borntraeger, qemu-s390x, qemu-ppc,
	Marcel Apfelbaum, Paolo Bonzini, Luiz Capitulino, David Gibson

On 01.06.2018 14:13, Igor Mammedov wrote:
> On Fri, 25 May 2018 14:43:39 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 17.05.2018 10:15, David Hildenbrand wrote:
>>> We can have devices that need certain other resources that are e.g.
>>> system resources managed by the machine. We need a clean way to assign
>>> these resources (without violating layers as brought up by Igor).
>>>
>>> One example is virtio-mem/virtio-pmem. Both device types need to be
>>> assigned some region in guest physical address space. This device memory
>>> belongs to the machine and is managed by it. However, virito devices are
>>> hotplugged using the hotplug handler their proxy device implements. So we
>>> could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
>>> hotplug handler for virtio-ccw. But definetly not the machine.
>>>
>>> Now, we can route other devices through the machine hotplug handler, to
>>> properly assign/unassign resources - like a portion in guest physical
>>> address space.
> 
> To sum up review:

Thanks for the review! I'll look into the details and comment where I
disagree (or where we need a third opinion).

> Some comments apply to several patches even though I commented only once.
> 
> I'd suggest to restructure and split series into several:
>   * unplug cleanups 08/14 & co
>   * generic _preplug refactoring so we won't have to go back to that question again
>   * extending memory_device interface 11/14 + actual user for the sake of which
>     interface is actually extended (virtio-mem)
> 
> Also more descriptive commit messages describing why change is done,
> current ones look like "Lets do something for some vague reason" to
> unaware reviewers, having virtio-mem along with new extensions to
> memory_device would be useful here as it could have cross reference
> to parts that would need it.

I'll try to be more verbose.

> 
> Try to keep patches smaller and doing one thing, we can always squash
> them later if it would be better.

I can try to split them up even further where it makes sense. Please
note that including virtio-mem in the same series won't be happening,
but I'll soon share a virtio-mem prototype where reviewers can then
see the interfaces in action. (or maybe virtio-pmem is the faster one)

(sending everything in one series will not make reviewers happy due to
the high amount of code, trust me :) )

> 
> I'm sorry if some comments were a bit too much or insisting on things
> but I'm trying to keep hotplug infrastructure simple so that later
> when someone else comes with related patches, I could easily read it
> without studying it from ground up.

Nothing wrong about that, we can talk about the things where I disagree.

> 
> PS:
> (I'm not a fan of idea to marry virtio device with its own bus plug logic
> into bus-less machine hotplug, but I don't have a better suggestion or
> time to explore alternatives, so lets do it but keep things manageable)

If it's stupid but it works, it's not stupid ;) No honestly, you
challenged if it would even be possible and I think we found a way to
make this fit in nicely.


-- 

Thanks,

David / dhildenb

* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-05-31 13:54           ` Igor Mammedov
@ 2018-06-04 10:53             ` David Hildenbrand
  2018-06-07 13:26               ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 10:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino


>> Let me explain a little bit why I don't like such restrictions (for
>> which I don't see a need yet):
> (*) being conservative is good here because we can always relax restrictions
> in future if it's needed without breaking users, but we can't take away
> something that's been shipped (and if we do it, it typically
> introduces a bunch of compat code to keep old machines working).
> Also, besides giving us some freedom of movement in the future,
> restrictions also to some degree prevent users from misconfiguration)

Right, but I consider a maximum on an alignment arbitrary (and even
difficult to implement - as you said, "max page size supported by target
machine/cpu" - there are even different huge page sizes e.g. on x86).

So I'd really vote against introducing such checks.

And another point against introducing a check for the maximum alignment
would be: There might be users with a specified alignment > max page
size. Changing the behavior will break these setups (strange but true).

> 
>> virtio-mem devices will later have a certain block size (e.g. 4MB). I
>> want to give devices during resource assignment the possibility to
>> specify their alignment requirements.
> size and alignment are 2 different things here, alignment in our design
> is dictated by backing storage page size and for performance reasons
> HVA and GPA should be aligned on the same boundary, users are free
> to pick another GPA manually as far as it has the same alignment.
> But for automatic placement we use backend's alignment to make placement
> as compact as possible but still keeping GPA aligned with HVA.
> 
>> For said virtio-mem device, this would e.g. be 4MB. (see patch 10 and 14
>> of how this call "get_align" comes into play), because the addresses of
>> the memory blocks are all aligned by e.g. 4MB. This is what is
>> guaranteed by the device specification.
> where does this 4Mb magic comes from and why block must be aligned
> on this size?

(knowing it is hard to get the idea without the current prototype in
place, but that will be fixed soon)

virtio-mem works in blocks. The block size is the size in which memory
can be plugged or unplugged by the guest. It also determines the
granularity (and therefore the bitmap) we have to use to keep track of
unplugged memory. It is configurable (e.g. for migration purposes), but
might also depend on the backend memory type. E.g. if huge pages are
used in the host, the huge page size defines the minimum block size. But
consider the last part a special case. Huge pages will not be supported
for now.

The block size also defines the alignment that has to be used by the
guest for plugging/unplugging (because that's how blocks get mapped to
the bitmap entries). So a virtio-mem device really needs a block-aligned
start address.

For now I use 4MB because it matches what guests (e.g. Linux) can
actually support and keeps the bitmap small. But as I said, it is
configurable. (-device virtio-mem,block_size=1MB, ...)
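
To illustrate why the start address has to be block-aligned, here is a rough
sketch of the block-to-bitmap mapping described above. This is not the
virtio-mem prototype; ToyBlockMap and toy_block_index are made-up names, and
the layout is an assumption based only on the description in this mail.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyBlockMap {
    uint64_t start;       /* guest physical start address of the device */
    uint64_t block_size;  /* plug/unplug granularity, power of two */
    uint64_t nr_blocks;   /* number of bitmap entries */
} ToyBlockMap;

/*
 * Returns the bitmap index for a guest physical address, or -1 if the
 * address is not block-aligned, does not fall on a block boundary of this
 * device, or lies outside the device.
 */
int64_t toy_block_index(const ToyBlockMap *m, uint64_t gpa)
{
    uint64_t mask = m->block_size - 1;
    uint64_t idx;

    assert(m->block_size && (m->block_size & mask) == 0);
    if (gpa < m->start || (gpa & mask) != 0) {
        return -1;              /* guest uses block-aligned addresses only */
    }
    if (((gpa - m->start) & mask) != 0) {
        return -1;              /* device start itself is not block-aligned */
    }
    idx = (gpa - m->start) / m->block_size;
    return idx < m->nr_blocks ? (int64_t)idx : -1;
}
```

With a 4MB block size and a start address that is off by 1MB, no
block-aligned guest address maps to a bitmap entry at all, which is the
reason for asking the machine for a block-aligned start address.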

>  
>> E.g. for DIMMs we might want to allow to specify the section size (e.g.
>> 128MB on x86), otherwise e.g. Linux is not able to add all memory. (but
>> we should not hardcode this, as this is a Linux specific requirement -
>> still it would be nice to specify)
> true, it's guest specific and we do not have restrictions here.
> The only restriction we have here is that size must be multiple of
> backing storage page size (i.e. alignment) so that guest would
> be able to use tail page.
> 
>> So in general, a memory device might have some alignment that is to be
>> taken care of.
> Do we really need introducing frontend specific alignment?
> I'd try reuse backend's one and go for frontend's only in case we have to.

I think so. But I consider this a special case for virtio-mem.
(specified indirectly via block_size). For the remaining stuff, we might
be able to live with the memory backend alignment. And I agree that this
should be the default if possible.

> 
>> I don't understand right now why an upper limit on the alignment would
>> make sense at all. We can easily handle it during our search. And we
>> have to handle it either way during the search, if we plug some device
>> with strange sizes (e.g. 1MB DIMM).
>>
>> Of course, we might end up fragmenting guest physical memory, but that
>> is purely a setup issue (choosing sizes of devices + main memory
>> properly). I don't see a reason to error out (and of course also not to
>> assert out :) ).
> Agreed about assert, but I'd still prefer error out there so that users
> won't create insane config and then complain (see below and *).
> 
> Upper alignment value is useful for proper sizing of hotplug address space,
> so that user could plug #slots devices up to #maxmem specified on CLI.
> It's still possible to misconfigure using manual placement, but default
> one just works, user either consumes all memory #maxmem and/or #slots.


Please note that it will still work in most cases. Only if people start
to define crazy alignments (like I did), crazy DIMM sizes (e.g. 1MB) or
crazy block sizes for virtio-mem, we might have a fragmented guest
physical memory. This should usually not happen, but if so, we already
have error messages for reporting this "fragmented memory".

And I consider these "strange configs" similar as "strange manual
placement". Same result: fragmented memory, error out.
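
As a rough illustration of how such a config ends up fragmented, here is
a toy placement sketch. It is deliberately simpler than the real search
(it only bumps a "next free address" and never revisits holes), and all
toy_* names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; this is not memory_device_get_free_addr(). */
typedef struct ToySpace {
    uint64_t base;    /* start of the device memory region */
    uint64_t size;    /* size of the device memory region */
    uint64_t next;    /* next free address */
    uint64_t wasted;  /* bytes skipped purely because of alignment */
} ToySpace;

static uint64_t toy_align_up(uint64_t addr, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Place a device of the given size and alignment.
 * Returns the assigned address, or UINT64_MAX if it does not fit.
 * Overflow of addr + size is ignored to keep the sketch short.
 */
static uint64_t toy_place(ToySpace *s, uint64_t align, uint64_t size)
{
    uint64_t addr = toy_align_up(s->next, align);

    if (addr + size > s->base + s->size) {
        return UINT64_MAX;
    }
    s->wasted += addr - s->next;
    s->next = addr + size;
    return addr;
}
```

In a 9MB region, a 1MB DIMM followed by a 4MB device with 4MB alignment
leaves a 3MB hole. A second 4MB device then no longer fits, although the
three sizes add up to exactly 9MB.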

Personally, I don't think we have to guarantee that automatic placement
works in all scenarios that the user is able to come up with. It should
work in sane configurations, which is still the case.


> 
> There is no misunderstanding in case of error here as it works same as
> on baremetal, one doesn't have a free place to put more memory or all
> memory one asked for is already there.
> 
> So it might be that #slots decoupling from memory device a premature
> action and we can still use it with virtio-mem/pmem.
> 

I treat #slots on x86 as #acpi_slots, that's why I don't think they
apply here. I can see how they are used (on x86 only!) to size the
address space - but I consider this a "nice to have feature" that should
not be expected to work in all possible scenarios.

E.g. powerpc already relies on sane user configs. #slots is not used to
size the guest address space. Similar things will apply on s390x.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-05-31 14:13       ` Igor Mammedov
@ 2018-06-04 11:27         ` David Hildenbrand
  2018-06-07 13:44           ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 11:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 31.05.2018 16:13, Igor Mammedov wrote:
> On Wed, 30 May 2018 16:13:32 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 30.05.2018 15:08, Igor Mammedov wrote:
>>> On Thu, 17 May 2018 10:15:17 +0200
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> For multi stage hotplug handlers, we'll have to do some error handling
>>>> in some hotplug functions, so let's use a local error variable (except
>>>> for unplug requests).  
>>> I'd split out introducing local error into separate patch
>>> so patch would do a single thing.

I can do that if it makes review easier.

>>>   
>>>> Also, add code to pass control to the final stage hotplug handler at the
>>>> parent bus.  
>>> But I don't agree with generic
>>>  "} else if ("dev->parent_bus && dev->parent_bus->hotplug_handler) {"
>>> forwarding, it's done by 3/14 for generic case and in case of
>>> special device that needs bus handler called from machine one,
>>> I'd suggest to do forwarding explicitly for that device only
>>> like we do with acpi_dev.  
>>
>> I decided to do it that way because it is generic and results in nicer
>> recovery handling (e.g. in case pc_dimm plug fails, we can simply
>> rollback all (for now MemoryDevice) previous plug operations).
> rollback should be managed by the caller of pc_dimm plug
> directly, so it's not relevant here.
> 
>> IMHO, the resulting code is easier to read.
>>
>> From this handling it is clear that
>> "if we reach the hotplug handler, and it is not some special device
>> plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
>> handler if any exists"
> I strongly disagree with that it's easier to deal with.
> You are basically duplicating already generalized code
> from qdev_get_hotplug_handler() back into boards.
> 
> If a device doesn't have to be handled by machine handler,
> then qdev_get_hotplug_handler() must return its bus handler
> if any directly. So the branch in question that you are copying
> is a dead one, pls drop it.

We forward selected (pc_get_hotpug_handler()) devices to the
right hotplug handler. Nothing wrong about that. I don't agree
that this is "basically duplicating already generalized code".
We have to forward at some place. Your idea simply places that
code at some other place.


But I think we have to get the general idea sorted out first.

What you have in mind (e.g. plug):

if (TYPE_MEMORY_DEVICE) {
	memory_device_plug();
	if (local_err) {
		goto out;
	}

	if (TYPE_PC_DIMM) {
		pc_dimm_plug(hotplug_dev, dev, &local_err);
	} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
		hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
	}
	if (local_err) {
		memory_device_unplug();
		goto out;
	}
} else if (TYPE_CPU)
...


What I have in mind (and implemented in this series):


if (TYPE_MEMORY_DEVICE) {
	memory_device_plug();
}
/* possibly other interfaces */
if (local_err) {
	error_handling();
	return;
}

if (TYPE_PC_DIMM) {
	pc_dimm_plug(hotplug_dev, dev, &local_err);
} ...
} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
	hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
}

if (local_err) {
	if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
		memory_device_unplug();
	}
	/* possibly other interfaces */
}
...


I claim that my variant is more generic because:
- it easily supports multiple interfaces (like MemoryDevice)
  per Device that need a hotplug handler call
- there is only one call to hotplug_handler_plug() in case we
  add similar handling for another device

Apart from that they do exactly the same thing.
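
To show the control flow of the second variant in something that
compiles, here is a self-contained toy model. None of the types or
functions below are the QEMU ones; the Toy*/toy_* names and the failure
injection flag exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are not the QEMU types or functions. */
typedef struct ToyDev {
    bool is_memory_device;  /* implements the "MemoryDevice" interface */
    bool is_pc_dimm;        /* final stage is handled by the machine */
    bool has_bus_handler;   /* parent bus provides a hotplug handler */
    bool bus_plug_fails;    /* inject a failure into the final stage */
    bool mem_assigned;      /* first stage resource currently held */
    bool plugged;           /* final stage completed */
} ToyDev;

static void toy_memory_device_plug(ToyDev *d)   { d->mem_assigned = true; }
static void toy_memory_device_unplug(ToyDev *d) { d->mem_assigned = false; }

/* Final stage: returns an error string, or NULL on success. */
static const char *toy_final_stage_plug(ToyDev *d)
{
    if (d->is_pc_dimm) {
        d->plugged = true;
    } else if (d->has_bus_handler) {
        if (d->bus_plug_fails) {
            return "bus hotplug handler failed";
        }
        d->plugged = true;
    }
    return NULL;
}

static const char *toy_machine_plug(ToyDev *d)
{
    const char *err;

    assert(d != NULL);

    /* First stage: one independent block per interface. */
    if (d->is_memory_device) {
        toy_memory_device_plug(d);
    }

    /* Final stage: a single dispatch point for all device types. */
    err = toy_final_stage_plug(d);

    /* Rollback: undo the first stage work, again per interface. */
    if (err && d->is_memory_device) {
        toy_memory_device_unplug(d);
    }
    return err;
}
```

A device that fails in the bus handler ends up with neither the memory
assignment nor the plugged state, without the bus handler call having to
be duplicated per interface.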

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler
  2018-06-01 11:17   ` Igor Mammedov
@ 2018-06-04 11:45     ` David Hildenbrand
  2018-06-07 15:00       ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 11:45 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 01.06.2018 13:17, Igor Mammedov wrote:
> On Thu, 17 May 2018 10:15:25 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Let's move all pre-plug checks we can do without the device being
>> realized into the applicable hotplug handler for pc and spapr.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/i386/pc.c                   | 11 +++++++
>>  hw/mem/memory-device.c         | 72 +++++++++++++++++++-----------------------
>>  hw/ppc/spapr.c                 | 11 +++++++
>>  include/hw/mem/memory-device.h |  2 ++
>>  4 files changed, 57 insertions(+), 39 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 8bc41ef24b..61f1537e14 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>  {
>>      Error *local_err = NULL;
>>  
>> +    /* first stage hotplug handler */
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>> +        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
>> +                               &local_err);
>> +    }
>> +
>> +    if (local_err) {
>> +        goto out;
>> +    }
>> +
>>      /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>          pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
>> @@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>          hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
>>                                   &local_err);
>>      }
>> +out:
>>      error_propagate(errp, local_err);
>>  }
>>  
>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>> index 361d38bfc5..d22c91993f 100644
>> --- a/hw/mem/memory-device.c
>> +++ b/hw/mem/memory-device.c
>> @@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>>      return 0;
>>  }
>>  
>> -static void memory_device_check_addable(MachineState *ms, uint64_t size,
>> -                                        Error **errp)
>> -{
>> -    uint64_t used_region_size = 0;
>> -
>> -    /* we will need a new memory slot for kvm and vhost */
>> -    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
>> -        error_setg(errp, "hypervisor has no free memory slots left");
>> -        return;
>> -    }
>> -    if (!vhost_has_free_slot()) {
>> -        error_setg(errp, "a used vhost backend has no free memory slots left");
>> -        return;
>> -    }
>> -
>> -    /* will we exceed the total amount of memory specified */
>> -    memory_device_used_region_size(OBJECT(ms), &used_region_size);
>> -    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
>> -        error_setg(errp, "not enough space, currently 0x%" PRIx64
>> -                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
>> -                   used_region_size, ms->maxram_size - ms->ram_size);
>> -        return;
>> -    }
>> -
>> -}
>> -
>>  uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>                                       uint64_t align, uint64_t size,
>>                                       Error **errp)
>>  {
>>      uint64_t address_space_start, address_space_end;
>> +    uint64_t used_region_size = 0;
>>      GSList *list = NULL, *item;
>>      uint64_t new_addr = 0;
>>  
>> -    if (!ms->device_memory) {
>> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>> -                         "supported by the machine");
>> -        return 0;
>> -    }
>> -
>> -    if (!memory_region_size(&ms->device_memory->mr)) {
>> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>> -                         "enabled, please specify the maxmem option");
>> -        return 0;
>> -    }
>>      address_space_start = ms->device_memory->base;
>>      address_space_end = address_space_start +
>>                          memory_region_size(&ms->device_memory->mr);
>>      g_assert(address_space_end >= address_space_start);
>>  
>> -    memory_device_check_addable(ms, size, errp);
>> -    if (*errp) {
>> +    /* will we exceed the total amount of memory specified */
>> +    memory_device_used_region_size(OBJECT(ms), &used_region_size);
>> +    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
>> +        error_setg(errp, "not enough space, currently 0x%" PRIx64
>> +                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
>> +                   used_region_size, ms->maxram_size - ms->ram_size);
>>          return 0;
>>      }
>>  
>> @@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
>>      return size;
>>  }
>>  
>> +void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>> +                            Error **errp)
>> +{
>> +    if (!ms->device_memory) {
>> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>> +                         "supported by the machine");
>> +        return;
>> +    }
>> +
>> +    if (!memory_region_size(&ms->device_memory->mr)) {
>> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>> +                         "enabled, please specify the maxmem option");
>> +        return;
>> +    }
>> +
>> +    /* we will need a new memory slot for kvm and vhost */
>> +    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
>> +        error_setg(errp, "hypervisor has no free memory slots left");
>> +        return;
>> +    }
>> +    if (!vhost_has_free_slot()) {
>> +        error_setg(errp, "a used vhost backend has no free memory slots left");
>> +        return;
>> +    }
> thanks for extracting preparations steps into the right callback.
> 
> on top of this _preplug() handler should also set being plugged
> device properties if they weren't set by user.
>  memory_device_get_free_addr() should be here to.
> 
> general rule for _preplug() would be to check and prepare device
> for being plugged but not touch anything beside the device (it's _plug handler job)

I disagree. Or at least I disagree as part of this patch series because
it over-complicates things :)

preplug() can do basic checks but I don't think it should be used to
change actual properties. And I give you the main reason for this:

We have to do quite some amount of unnecessary error handling (please
remember the pc_dimm plug code madness before I refactored it - 80%
consisted of error handling) if we want to work on device properties
before a device is realized. And we even have duplicate checks both in
the realize() and the preplug() code then (again, what we had in the
pc_dimm plug code - do we have a memdev already or not).

Right now, I assume, that all MemoryDevice functions can be safely
called after the device has been realized without error handling. This
is nice. It e.g. guarantees that we have a memdev assigned. Otherwise,
every time we access some MemoryDevice property (e.g. region size), we
have to handle possible uninitialized properties (e.g. memdev) -
something I don't like.

So I want to avoid this by any means. And I don't really see a benefit
for this additional error handling that will be necessary: We don't care
about the performance in case something went wrong.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 14/14] memory-device: factor out plug into hotplug handler
  2018-06-01 11:39   ` Igor Mammedov
@ 2018-06-04 11:47     ` David Hildenbrand
  2018-06-07 10:44       ` [Qemu-devel] [qemu-s390x] " David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 11:47 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson

On 01.06.2018 13:39, Igor Mammedov wrote:
> On Thu, 17 May 2018 10:15:27 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Let's move the plug logic into the applicable hotplug handler for pc and
>> spapr.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/i386/pc.c                   | 35 ++++++++++++++++++++---------------
>>  hw/mem/memory-device.c         | 40 ++++++++++++++++++++++++++++++++++------
>>  hw/mem/pc-dimm.c               | 29 +----------------------------
>>  hw/mem/trace-events            |  2 +-
>>  hw/ppc/spapr.c                 | 15 ++++++++++++---
>>  include/hw/mem/memory-device.h |  7 ++-----
>>  include/hw/mem/pc-dimm.h       |  3 +--
>>  7 files changed, 71 insertions(+), 60 deletions(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 426fb534c2..f022eb042e 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -1682,22 +1682,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>>      HotplugHandlerClass *hhc;
>>      Error *local_err = NULL;
>>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> -    PCDIMMDevice *dimm = PC_DIMM(dev);
>> -    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>> -    MemoryRegion *mr;
>> -    uint64_t align = TARGET_PAGE_SIZE;
>>      bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>  
>> -    mr = ddc->get_memory_region(dimm, &local_err);
>> -    if (local_err) {
>> -        goto out;
>> -    }
>> -
>> -    if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
>> -        align = memory_region_get_alignment(mr);
>> -    }
>> -
>>      /*
>>       * When -no-acpi is used with Q35 machine type, no ACPI is built,
>>       * but pcms->acpi_dev is still created. Check !acpi_enabled in
>> @@ -1715,7 +1701,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>>          goto out;
>>      }
>>  
>> -    pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
>> +    pc_dimm_memory_plug(dev, MACHINE(pcms), &local_err);
>>      if (local_err) {
>>          goto out;
>>      }
>> @@ -2036,6 +2022,25 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>  {
>>      Error *local_err = NULL;
>>  
>> +    /* first stage hotplug handler */
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>> +        const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(hotplug_dev);
>> +        uint64_t align = 0;
>> +
>> +        /* compat handling: force to TARGET_PAGE_SIZE */
>> +        if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>> +            !pcmc->enforce_aligned_dimm) {
>> +            align = TARGET_PAGE_SIZE;
>> +        }
>> +        memory_device_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
>> +                           align ? &align : NULL, &local_err);
>> +    }
>> +
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>>      /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>>          pc_dimm_plug(hotplug_dev, dev, &local_err);
>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>> index 8f10d613ea..04bdb30f22 100644
>> --- a/hw/mem/memory-device.c
>> +++ b/hw/mem/memory-device.c
>> @@ -69,9 +69,10 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>>      return 0;
>>  }
>>  
>> -uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>> -                                     uint64_t align, uint64_t size,
>> -                                     Error **errp)
>> +static uint64_t memory_device_get_free_addr(MachineState *ms,
>> +                                            const uint64_t *hint,
>> +                                            uint64_t align, uint64_t size,
>> +                                            Error **errp)
>>  {
>>      uint64_t address_space_start, address_space_end;
>>      uint64_t used_region_size = 0;
>> @@ -237,11 +238,38 @@ void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>>      }
>>  }
>>  
>> -void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>> -                               uint64_t addr)
>> +void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
>> +                        uint64_t *enforced_align, Error **errp)
> enforced_align is PC machine specific compat flag
> to keep old machines with unaligned layout work (i.e. don't break CLI/migration)
> it shouldn't go into a generic code.
> By default all new machines should use aligned layout. 
> 

Yes, but there has to be a way for the search to access this property.
So what do you propose in contrast to this?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 13/14] memory-device: factor out unplug into hotplug handler
  2018-06-01 11:31   ` Igor Mammedov
@ 2018-06-04 15:54     ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-04 15:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Cornelia Huck, Markus Armbruster, Alexander Graf,
	Christian Borntraeger, qemu-s390x, qemu-ppc, Paolo Bonzini,
	Marcel Apfelbaum, Luiz Capitulino, David Gibson,
	Richard Henderson


>>  # hw/mem/pc-dimm.c
>>  mhp_pc_dimm_assigned_slot(int slot) "%d"
>>  mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
>> +# hw/mem/memory-device.c
>> +memory_device_unassign_address(uint64_t addr) "0x%"PRIx64
> maybe split out tracing into a separate patch?

Can do, although I think this is overkill.

> 
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 562712def2..abdd38a6b5 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3621,6 +3621,11 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>>          hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>>      }
>>  out:
>> +    if (local_err) {
>> +        if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>> +            memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
>> +        }
>> +    }
> we shouldn't call unplug here.
> I'd suggest spapr_memory_plug() and/or memory_device_plug() to do
> error handling and rollback on it's own, it shouldn't complicate generic
> machines' hotplug handlers.

Please note that this is the generic way to handle multi stage hotplug
handlers. (see the more detailed comment in reply to patch 4 if I
remember correctly)

From my point of view, this is the right thing to do.

> 
>>      error_propagate(errp, local_err);
>>  }
>>  
>> @@ -3638,7 +3643,16 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>>                                 &local_err);
>>      }
>> -    error_propagate(errp, local_err);
>> +
>> +    if (local_err) {
> this check probably not needed, error_propagate()
> bails out of local_err is NULL

With the current code (returning), this is needed.
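
A small sketch of why: propagating a NULL error is indeed a no-op, but
the early return is what keeps the first stage cleanup from running
after a failed bus unplug. The ToyError type and toy_* functions are a
simplified stand-in written for this mail, not the real Error API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an error object; not the QEMU Error type. */
typedef struct ToyError { const char *msg; } ToyError;

static ToyError toy_bus_error = { "bus unplug failed" };

/* Like the real helper in spirit: does nothing if local is NULL. */
static void toy_error_propagate(ToyError **dst, ToyError *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    }
}

/* Returns true if the first stage resource was released. */
static bool toy_machine_unplug(bool bus_unplug_fails, ToyError **errp)
{
    ToyError *local_err = bus_unplug_fails ? &toy_bus_error : NULL;

    if (local_err) {
        toy_error_propagate(errp, local_err);
        return false;  /* device is still plugged, keep its memory */
    }

    /* Only reached on success: release the first stage resource. */
    return true;
}

static void toy_unplug_demo(void)
{
    ToyError *err = NULL;

    assert(toy_machine_unplug(false, &err) && err == NULL);
    assert(!toy_machine_unplug(true, &err) && err == &toy_bus_error);
}
```

Without the "if (local_err)" check there would be nothing to hang the
return on, and the memory would be unassigned even though the bus
handler refused the unplug.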

> 
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    /* first stage hotplug handler */
> I'd drop this comment, it looks not necessary and even a bit confusing to me.

I'll drop all the comments of this form.

> 
>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>> +        memory_device_unplug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev));
>> +    }
>>  }
>>  
>>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>> diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
>> index 3a4e9edc92..b8365959e7 100644
>> --- a/include/hw/mem/memory-device.h
>> +++ b/include/hw/mem/memory-device.h
>> @@ -58,6 +58,6 @@ uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>                                       Error **errp);
>>  void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>>                                 uint64_t addr);
>> -void memory_device_unplug_region(MachineState *ms, MemoryRegion *mr);
>> +void memory_device_unplug(MachineState *ms, MemoryDeviceState *md);
>>  
>>  #endif
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 03/14] qdev: let machine hotplug handler to override bus hotplug handler David Hildenbrand
@ 2018-06-05  1:02   ` David Gibson
  0 siblings, 0 replies; 76+ messages in thread
From: David Gibson @ 2018-06-05  1:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 4005 bytes --]

On Thu, May 17, 2018 at 10:15:16AM +0200, David Hildenbrand wrote:
> From: Igor Mammedov <imammedo@redhat.com>
> 
> it will allow to return another hotplug handler than the default
> one for a specific bus based device type. Which is needed to handle
> non trivial plug/unplug sequences that need the access to resources
> configured outside of bus where device is attached.
> 
> That will allow for returned hotplug handler to orchestrate wiring
> in arbitrary order, by chaining other hotplug handlers when
> it's needed.
> 
> PS:
> It could be used for hybrid virtio-mem and virtio-pmem devices
> where it will return machine as hotplug handler which will do
> necessary wiring at machine level and then pass control down
> the chain to bus specific hotplug handler.
> 
> Example of top level hotplug handler override and custom plug sequence:
> 
>   some_machine_get_hotplug_handler(machine){
>       if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
>           return HOTPLUG_HANDLER(machine);
>       }
>       return NULL;
>   }
> 
>   some_machine_device_plug(hotplug_dev, dev) {
>       if (object_dynamic_cast(OBJECT(dev), TYPE_SOME_BUS_DEVICE)) {
>           /* do machine specific initialization */
>           some_machine_init_special_device(dev)
> 
>           /* pass control to bus specific handler */
>           hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev)
>       }
>   }
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

I've been considering a similar change for a while; we'll also need
something like this in order to do hotplug for PCI devices under p2p
bridges on pseries (there's a PAPR specific hotplug model that we need
to use instead of SHP).

> ---
>  hw/core/qdev.c         |  6 ++----
>  include/hw/qdev-core.h | 11 +++++++++++
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index f6f92473b8..885286f579 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -261,12 +261,10 @@ HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev)
>  
>  HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev)
>  {
> -    HotplugHandler *hotplug_ctrl;
> +    HotplugHandler *hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
>  
> -    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +    if (hotplug_ctrl == NULL && dev->parent_bus) {
>          hotplug_ctrl = dev->parent_bus->hotplug_handler;
> -    } else {
> -        hotplug_ctrl = qdev_get_machine_hotplug_handler(dev);
>      }
>      return hotplug_ctrl;
>  }
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 9453588160..e6a8eca558 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -286,6 +286,17 @@ void qdev_init_nofail(DeviceState *dev);
>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
>                                   int required_for_version);
>  HotplugHandler *qdev_get_machine_hotplug_handler(DeviceState *dev);
> +/**
> + * qdev_get_hotplug_handler: Get handler responsible for device wiring
> + *
> + * Find HOTPLUG_HANDLER for @dev that provides [pre|un]plug callbacks for it.
> + *
> + * Note: in case @dev has a parent bus, it will be returned as handler unless
> + * machine handler overrides it.
> + *
> + * Returns: pointer to object that implements TYPE_HOTPLUG_HANDLER interface
> + *          or NULL if there aren't any.
> + */
>  HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
>  void qdev_unplug(DeviceState *dev, Error **errp);
>  void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers David Hildenbrand
  2018-05-17 12:43   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2018-06-01 10:33   ` [Qemu-devel] " Igor Mammedov
@ 2018-06-05  1:08   ` David Gibson
  2018-06-05  7:51     ` David Hildenbrand
  2 siblings, 1 reply; 76+ messages in thread
From: David Gibson @ 2018-06-05  1:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 5295 bytes --]

On Thu, May 17, 2018 at 10:15:19AM +0200, David Hildenbrand wrote:
> For multi stage hotplug handlers, we'll have to do some error handling
> in some hotplug functions, so let's use a local error variable (except
> for unplug requests).
> 
> Also, add code to pass control to the final stage hotplug handler at the
> parent bus.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ebf30dd60b..b7c5c95f7a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>  {
>      MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> +    Error *local_err = NULL;
>  
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          int node;
>  
>          if (!smc->dr_lmb_enabled) {
> -            error_setg(errp, "Memory hotplug not supported for this machine");
> -            return;
> +            error_setg(&local_err,
> +                       "Memory hotplug not supported for this machine");
> +            goto out;
>          }
> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
> -        if (*errp) {
> -            return;
> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> +                                        &local_err);
> +        if (local_err) {
> +            goto out;
>          }
>          if (node < 0 || node >= MAX_NODES) {
> -            error_setg(errp, "Invaild node %d", node);
> -            return;
> +            error_setg(&local_err, "Invaild node %d", node);
> +            goto out;
>          }
>  
> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_plug(hotplug_dev, dev, errp);
> +        spapr_core_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> +    }
> +out:
> +    error_propagate(errp, local_err);
> +}
> +
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +                                        DeviceState *dev, Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {

As I think Igor said on the equivalent PC patch, I don't quite get
this.  Isn't this already handled by the generic hotplug code picking
up the bus's hotplug handler if the machine doesn't supply one?

> +        hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
> +                               &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3618,17 +3639,27 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>              return;
>          }
>          spapr_core_unplug_request(hotplug_dev, dev, errp);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_unplug_request(dev->parent_bus->hotplug_handler, dev,
> +                                       errp);
>      }
>  }
>  
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
> +    /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> -        spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +        spapr_memory_pre_plug(hotplug_dev, dev, &local_err);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> -        spapr_core_pre_plug(hotplug_dev, dev, errp);
> +        spapr_core_pre_plug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +        hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> +                                 &local_err);
>      }
> +    error_propagate(errp, local_err);
>  }
>  
>  static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
> @@ -3988,6 +4019,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      mc->get_default_cpu_node_id = spapr_get_default_cpu_node_id;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
> +    hc->unplug = spapr_machine_device_unplug;
>  
>      smc->dr_lmb_enabled = true;
>      mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler David Hildenbrand
@ 2018-06-05  1:09   ` David Gibson
  2018-06-05  7:51     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: David Gibson @ 2018-06-05  1:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On Thu, May 17, 2018 at 10:15:20AM +0200, David Hildenbrand wrote:
> Necessary to hotplug them cleanly later.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

As for PC, I think it would be nicer to drop the explicit check
against PC_DIMM, since it is covered by MEMORY_DEVICE.

> ---
>  hw/ppc/spapr.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b7c5c95f7a..2f315f963b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3666,6 +3666,7 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
>                                                   DeviceState *dev)
>  {
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>          return HOTPLUG_HANDLER(machine);
>      }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 08/14] spapr: handle pc-dimm unplug via hotplug handler chain David Hildenbrand
  2018-06-01 10:53   ` Igor Mammedov
@ 2018-06-05  1:12   ` David Gibson
  1 sibling, 0 replies; 76+ messages in thread
From: David Gibson @ 2018-06-05  1:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On Thu, May 17, 2018 at 10:15:21AM +0200, David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug(). E.g. necessary to hotplug/
> unplug memory devices (which a pc-dimm is) later.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/ppc/spapr.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2f315f963b..286c38c842 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3291,7 +3291,8 @@ static sPAPRDIMMState *spapr_recover_pending_dimm_state(sPAPRMachineState *ms,
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> -    sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_ctrl);
>      sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
>  
>      /* This information will get lost if a migration occurs
> @@ -3309,9 +3310,21 @@ void spapr_lmb_release(DeviceState *dev)
>  
>      /*
>       * Now that all the LMBs have been removed by the guest, call the
> -     * pc-dimm unplug handler to cleanup up the pc-dimm device.
> +     * unplug handler chain. This can never fail.
>       */
> -    pc_dimm_memory_unplug(dev, MACHINE(spapr));
> +    hotplug_ctrl = qdev_get_hotplug_handler(dev);

You're double initializing hotplug_ctrl to the same thing here, AFAICT.

> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_memory_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)
> +{
> +    sPAPRMachineState *spapr = SPAPR_MACHINE(hotplug_dev);
> +    sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev));
> +
> +    g_assert(ds);
> +    g_assert(!ds->nr_lmbs);
> +    pc_dimm_memory_unplug(dev, MACHINE(hotplug_dev));
>      object_unparent(OBJECT(dev));
>      spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
> @@ -3608,7 +3621,9 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      Error *local_err = NULL;
>  
>      /* final stage hotplug handler */
> -    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);
>      }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core unplug via hotplug handler chain
  2018-05-17  8:15 ` [Qemu-devel] [PATCH v4 09/14] spapr: handle cpu core " David Hildenbrand
  2018-06-01 10:57   ` Igor Mammedov
@ 2018-06-05  1:13   ` David Gibson
  1 sibling, 0 replies; 76+ messages in thread
From: David Gibson @ 2018-06-05  1:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On Thu, May 17, 2018 at 10:15:22AM +0200, David Hildenbrand wrote:
> Let's handle it via hotplug_handler_unplug().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 286c38c842..13d153b5a6 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3412,7 +3412,16 @@ static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
>  /* Callback to be called during DRC release. */
>  void spapr_core_release(DeviceState *dev)
>  {
> -    MachineState *ms = MACHINE(qdev_get_hotplug_handler(dev));
> +    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
> +
> +    /* Call the unplug handler chain. This can never fail. */
> +    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> +}
> +
> +static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                              Error **errp)
> +{
> +    MachineState *ms = MACHINE(hotplug_dev);
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>      CPUCore *cc = CPU_CORE(dev);
>      CPUArchId *core_slot = spapr_find_cpu_slot(ms, cc->core_id, NULL);
> @@ -3623,6 +3632,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>      /* final stage hotplug handler */
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>          spapr_memory_unplug(hotplug_dev, dev, &local_err);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +        spapr_core_unplug(hotplug_dev, dev, &local_err);
>      } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>          hotplug_handler_unplug(dev->parent_bus->hotplug_handler, dev,
>                                 &local_err);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-06-05  1:08   ` David Gibson
@ 2018-06-05  7:51     ` David Hildenbrand
  2018-06-07 14:26       ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-05  7:51 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On 05.06.2018 03:08, David Gibson wrote:
> On Thu, May 17, 2018 at 10:15:19AM +0200, David Hildenbrand wrote:
>> For multi stage hotplug handlers, we'll have to do some error handling
>> in some hotplug functions, so let's use a local error variable (except
>> for unplug requests).
>>
>> Also, add code to pass control to the final stage hotplug handler at the
>> parent bus.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 43 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ebf30dd60b..b7c5c95f7a 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>>  {
>>      MachineState *ms = MACHINE(hotplug_dev);
>>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
>> +    Error *local_err = NULL;
>>  
>> +    /* final stage hotplug handler */
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>>          int node;
>>  
>>          if (!smc->dr_lmb_enabled) {
>> -            error_setg(errp, "Memory hotplug not supported for this machine");
>> -            return;
>> +            error_setg(&local_err,
>> +                       "Memory hotplug not supported for this machine");
>> +            goto out;
>>          }
>> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
>> -        if (*errp) {
>> -            return;
>> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
>> +                                        &local_err);
>> +        if (local_err) {
>> +            goto out;
>>          }
>>          if (node < 0 || node >= MAX_NODES) {
>> -            error_setg(errp, "Invaild node %d", node);
>> -            return;
>> +            error_setg(&local_err, "Invaild node %d", node);
>> +            goto out;
>>          }
>>  
>> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
>> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>> -        spapr_core_plug(hotplug_dev, dev, errp);
>> +        spapr_core_plug(hotplug_dev, dev, &local_err);
>> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>> +    }
>> +out:
>> +    error_propagate(errp, local_err);
>> +}
>> +
>> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
>> +                                        DeviceState *dev, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +
>> +    /* final stage hotplug handler */
>> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> 
> As I think Igor said on the equivalent PC patch, I don't quite get
> this.  Isn't this already handled by the generic hotplug code picking
> up the bus's hotplug handler if the machine doesn't supply one?

See my reply to patch nr 4.

What we do is, we install the machine hotplug handler as an
"intermediate" hotplug handler.

E.g. if we have a VIRTIO based MemoryDevice, we have to initialize the
MemoryDevice specific stuff in the machine hotplug handler, but then
pass the device onto the last stage hotplug handler (which will
eventually attach it to a bus and notify the guest).

For PC_DIMM, the machine hotplug handler is already the last stage
hotplug handler. Everything is fine. If it is not a PC_DIMM, we have to
pass it on to the right last stage hotplug handler (e.g. using the bus).

So the generic hotplug code will always pick up the machine hotplug
handler for MemoryDevices first, do the MemoryDevice specific stuff and
then pass on control to the "actual" hotplug handler.

Please note that this handling only applies to device types we "route"
through the machine hotplug handler, which we will initially do only for
MemoryDevices.
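
A minimal, self-contained sketch of the staging described above. All
types and names here (Device, PlugFn, machine_plug, bus_plug and the
counters) are stand-ins invented for illustration, not the QEMU API; the
real code uses HotplugHandler, object_dynamic_cast() and
dev->parent_bus->hotplug_handler as shown in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's DeviceState and HotplugHandler. */
typedef struct Device Device;
typedef void (*PlugFn)(Device *dev);

struct Device {
    bool is_memory_device;   /* stands in for a TYPE_MEMORY_DEVICE cast */
    bool is_pc_dimm;         /* stands in for a TYPE_PC_DIMM cast */
    PlugFn bus_handler;      /* stands in for the parent bus handler, may be NULL */
    int machine_stage_calls; /* bookkeeping for this example only */
    int final_stage_calls;
    int bus_stage_calls;
};

/* Final stage for devices that sit on a bus (e.g. a virtio proxy). */
static void bus_plug(Device *dev)
{
    dev->bus_stage_calls++;
}

/* Machine handler: always entered first for routed device types. */
static void machine_plug(Device *dev)
{
    assert(dev != NULL);

    /* Intermediate stage: MemoryDevice specific resource assignment. */
    if (dev->is_memory_device) {
        dev->machine_stage_calls++;
    }

    /* Final stage: either the machine itself or the bus handler. */
    if (dev->is_pc_dimm) {
        dev->final_stage_calls++;
    } else if (dev->bus_handler) {
        dev->bus_handler(dev);
    }
}
```

In this sketch a PC_DIMM-like device is fully handled by the machine,
while a bus-attached memory device gets its memory-device setup from the
machine and is then handed to its bus handler.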

Just like for PC, I factored out the introduction of the local error
variable and will add a better description to the patch descriptions on
how this plays together.

-- 

Thanks,

David / dhildenb

* Re: [Qemu-devel] [PATCH v4 07/14] spapr: route all memory devices through the machine hotplug handler
  2018-06-05  1:09   ` David Gibson
@ 2018-06-05  7:51     ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-05  7:51 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Igor Mammedov,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On 05.06.2018 03:09, David Gibson wrote:
> On Thu, May 17, 2018 at 10:15:20AM +0200, David Hildenbrand wrote:
>> Necessary to hotplug them cleanly later.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> As for PC, I think it would be nicer to drop the explicit check
> against PC_DIMM, since it is covered by MEMORY_DEVICE.

Indeed, already did that.

> 
>> ---
>>  hw/ppc/spapr.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index b7c5c95f7a..2f315f963b 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3666,6 +3666,7 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
>>                                                   DeviceState *dev)
>>  {
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>> +        object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE) ||
>>          object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>>          return HOTPLUG_HANDLER(machine);
>>      }
> 


-- 

Thanks,

David / dhildenb

* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 14/14] memory-device: factor out plug into hotplug handler
  2018-06-04 11:47     ` David Hildenbrand
@ 2018-06-07 10:44       ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-07 10:44 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin, qemu-s390x,
	Richard Henderson, Cornelia Huck, qemu-devel, Markus Armbruster,
	Christian Borntraeger, Alexander Graf, qemu-ppc,
	Marcel Apfelbaum, Paolo Bonzini, Luiz Capitulino, David Gibson

On 04.06.2018 13:47, David Hildenbrand wrote:
> On 01.06.2018 13:39, Igor Mammedov wrote:
>> On Thu, 17 May 2018 10:15:27 +0200
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> Let's move the plug logic into the applicable hotplug handler for pc and
>>> spapr.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  hw/i386/pc.c                   | 35 ++++++++++++++++++++---------------
>>>  hw/mem/memory-device.c         | 40 ++++++++++++++++++++++++++++++++++------
>>>  hw/mem/pc-dimm.c               | 29 +----------------------------
>>>  hw/mem/trace-events            |  2 +-
>>>  hw/ppc/spapr.c                 | 15 ++++++++++++---
>>>  include/hw/mem/memory-device.h |  7 ++-----
>>>  include/hw/mem/pc-dimm.h       |  3 +--
>>>  7 files changed, 71 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 426fb534c2..f022eb042e 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -1682,22 +1682,8 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>>>      HotplugHandlerClass *hhc;
>>>      Error *local_err = NULL;
>>>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>>> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>>> -    PCDIMMDevice *dimm = PC_DIMM(dev);
>>> -    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>>> -    MemoryRegion *mr;
>>> -    uint64_t align = TARGET_PAGE_SIZE;
>>>      bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
>>>  
>>> -    mr = ddc->get_memory_region(dimm, &local_err);
>>> -    if (local_err) {
>>> -        goto out;
>>> -    }
>>> -
>>> -    if (memory_region_get_alignment(mr) && pcmc->enforce_aligned_dimm) {
>>> -        align = memory_region_get_alignment(mr);
>>> -    }
>>> -
>>>      /*
>>>       * When -no-acpi is used with Q35 machine type, no ACPI is built,
>>>       * but pcms->acpi_dev is still created. Check !acpi_enabled in
>>> @@ -1715,7 +1701,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
>>>          goto out;
>>>      }
>>>  
>>> -    pc_dimm_memory_plug(dev, MACHINE(pcms), align, &local_err);
>>> +    pc_dimm_memory_plug(dev, MACHINE(pcms), &local_err);
>>>      if (local_err) {
>>>          goto out;
>>>      }
>>> @@ -2036,6 +2022,25 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>>  {
>>>      Error *local_err = NULL;
>>>  
>>> +    /* first stage hotplug handler */
>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>>> +        const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(hotplug_dev);
>>> +        uint64_t align = 0;
>>> +
>>> +        /* compat handling: force to TARGET_PAGE_SIZE */
>>> +        if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>>> +            !pcmc->enforce_aligned_dimm) {
>>> +            align = TARGET_PAGE_SIZE;
>>> +        }
>>> +        memory_device_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
>>> +                           align ? &align : NULL, &local_err);
>>> +    }
>>> +
>>> +    if (local_err) {
>>> +        error_propagate(errp, local_err);
>>> +        return;
>>> +    }
>>> +
>>>      /* final stage hotplug handler */
>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>>>          pc_dimm_plug(hotplug_dev, dev, &local_err);
>>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>>> index 8f10d613ea..04bdb30f22 100644
>>> --- a/hw/mem/memory-device.c
>>> +++ b/hw/mem/memory-device.c
>>> @@ -69,9 +69,10 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>>>      return 0;
>>>  }
>>>  
>>> -uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>> -                                     uint64_t align, uint64_t size,
>>> -                                     Error **errp)
>>> +static uint64_t memory_device_get_free_addr(MachineState *ms,
>>> +                                            const uint64_t *hint,
>>> +                                            uint64_t align, uint64_t size,
>>> +                                            Error **errp)
>>>  {
>>>      uint64_t address_space_start, address_space_end;
>>>      uint64_t used_region_size = 0;
>>> @@ -237,11 +238,38 @@ void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>>>      }
>>>  }
>>>  
>>> -void memory_device_plug_region(MachineState *ms, MemoryRegion *mr,
>>> -                               uint64_t addr)
>>> +void memory_device_plug(MachineState *ms, MemoryDeviceState *md,
>>> +                        uint64_t *enforced_align, Error **errp)
>> enforced_align is a PC machine specific compat flag
>> to keep old machines with unaligned layout working (i.e. don't break CLI/migration);
>> it shouldn't go into generic code.
>> By default all new machines should use aligned layout. 
>>
> 
> Yes, but there has to be a way for the search to access this property.
> So what do you propose in contrast to this?
> 

FYI, I now have a patch like this:

commit 64cf8b1c210ffc86283ce2f677d425c6569b9358
Author: David Hildenbrand <david@redhat.com>
Date:   Wed Jun 6 21:00:27 2018 +0200

    machine: introduce memory_device_align (factor out enforce_aligned_dimm)
    
    We will handle memory device alignment completely in memory-device.c,
    without passing compatibility parameters ("*align").
    
    As x86 and Power use different strategies to determine an alignment and
    we need clean support for compat handling, let's introduce an enum on
    the machine class level.
    
    The three introduced types represent what is being done on x86 and Power
    right now.
    
    The x86 part is only changed to temporarily keep working, we'll factor
    this out into common code soon.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>

...
diff --git a/include/hw/boards.h b/include/hw/boards.h
index ef7457f5dd..3f151207c1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -105,6 +105,15 @@ typedef struct {
     CPUArchId cpus[0];
 } CPUArchIdList;
 
+typedef enum MemoryDeviceAlign {
+    /* use specified memory region alignment */
+    MEMORY_DEVICE_ALIGN_REGION = 0,
+    /* use target page size as alignment */
+    MEMORY_DEVICE_ALIGN_PAGE,
+    /* use target page size if no memory region alignment has been specified */
+    MEMORY_DEVICE_ALIGN_REGION_OR_PAGE,
+} MemoryDeviceAlign;
+
 /**
  * MachineClass:
  * @max_cpus: maximum number of CPUs supported. Default: 1
@@ -156,6 +165,9 @@ typedef struct {
  *    should instead use "unimplemented-device" for all memory ranges where
  *    the guest will attempt to probe for a device that QEMU doesn't
  *    implement and a stub device is required.
+ * @memory_device_align: The alignment that will be used as default when
+ *    searching for a guest physical memory address while plugging a
+ *    memory device. This is relevant for compatibility handling.
  */
 struct MachineClass {
     /*< private >*/
@@ -202,6 +214,7 @@ struct MachineClass {
     const char **valid_cpu_types;
     strList *allowed_dynamic_sysbus_devices;
     bool auto_enable_numa_with_memhp;
+    MemoryDeviceAlign memory_device_align;
     void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);
 
...
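
A sketch of how the three enum values could be resolved into a concrete
alignment, based only on the comments in the enum above. The helper name
memory_device_pick_align() and its parameters are hypothetical and do not
appear in the patch; the actual memory-device.c code may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the enum from the diff above. */
typedef enum MemoryDeviceAlign {
    MEMORY_DEVICE_ALIGN_REGION = 0,
    MEMORY_DEVICE_ALIGN_PAGE,
    MEMORY_DEVICE_ALIGN_REGION_OR_PAGE,
} MemoryDeviceAlign;

/*
 * Hypothetical helper: pick the alignment used for the address search.
 * region_align is the memory region's alignment (0 if none was set),
 * page_size is the target page size.
 */
static uint64_t memory_device_pick_align(MemoryDeviceAlign policy,
                                         uint64_t region_align,
                                         uint64_t page_size)
{
    assert(page_size != 0);

    switch (policy) {
    case MEMORY_DEVICE_ALIGN_PAGE:
        /* compat behaviour: ignore the region alignment */
        return page_size;
    case MEMORY_DEVICE_ALIGN_REGION_OR_PAGE:
        /* fall back to the page size if the region has no alignment */
        return region_align ? region_align : page_size;
    case MEMORY_DEVICE_ALIGN_REGION:
    default:
        /* use whatever the region specifies, even if that is 0 */
        return region_align;
    }
}
```

With a 2 MiB region alignment and 4 KiB pages, ALIGN_PAGE would yield
4 KiB and the other two policies 2 MiB; without a region alignment,
ALIGN_REGION_OR_PAGE would fall back to 4 KiB.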



-- 

Thanks,

David / dhildenb

* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-06-04 10:53             ` David Hildenbrand
@ 2018-06-07 13:26               ` Igor Mammedov
  2018-06-07 14:12                 ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-07 13:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Mon, 4 Jun 2018 12:53:42 +0200
David Hildenbrand <david@redhat.com> wrote:

> >> Let me explain a little bit why I don't like such restrictions (for
> >> which I don't see a need yet):  
> > (*) being conservative is good here because we can always relax restrictions
> > in future if it's needed without breaking users, but we can't take away
> > something that's been shipped (and if we do it, it typically
> > introduces a bunch of compat code to keep old machines working).
> > Also, besides giving us some freedom of movement in the future,
> > restrictions also to some degree prevent users from misconfiguration.  
> 
> Right, but I consider a maximum on an alignment arbitrary (and even
> difficult to implement - as you said, "max page size supported by target
> machine/cpu" - there are even different huge page sizes e.g. on x86).
> 
> So I'd really vote against introducing such checks.
> 
> And another point against introducing a check for the maximum alignment
> would be: There might be users with a specified alignment > max page
> size. Changing the behavior will break these setups (strange but true).
The check is already there implicitly as an assert, and you're removing it
with this patch.
What I'm suggesting is to drop this patch or replace the assert with a
graceful error.
If you have an actual use case that requires this check to be dropped,
then the commit message should describe it properly and the patch should
be part of the series that introduces that requirement.
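
To illustrate the "graceful error" option, here is a sketch that reports
a misaligned start address instead of aborting. It assumes the implicit
check is "the start of the device memory address space is a multiple of
the requested alignment", which is a reading of the patch subject and not
a quote of the code. The function name and the plain error buffer are
hypothetical; QEMU code would report through Error **errp instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true if start is usable with align,
 * otherwise writes a message into err and returns false.
 */
static bool memory_device_check_start_align(uint64_t start, uint64_t align,
                                            char *err, size_t err_len)
{
    /* A missing buffer is a caller bug, so an assert is still fine here. */
    assert(err != NULL && err_len > 0);

    /* A bad alignment comes from user configuration: report, don't abort. */
    if (align == 0) {
        snprintf(err, err_len, "alignment must not be zero");
        return false;
    }
    if (start % align != 0) {
        snprintf(err, err_len,
                 "address space start 0x%" PRIx64
                 " is not aligned to 0x%" PRIx64, start, align);
        return false;
    }
    return true;
}
```

The caller can then fail the plug request with the message rather than
taking down a running VM.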

> >> virtio-mem devices will later have a certain block size (e.g. 4MB). I
> >> want to give devices during resource assignment the possibility to
> >> specify their alignment requirements.  
> > size and alignment are 2 different things here, alignment in our design
> > is dictated by backing storage page size and for performance reasons
> > HVA and GPA should be aligned on the same boundary, users are free
> > to pick another GPA manually as far as it has the same alignment.
> > But for automatic placement we use backend's alignment to make placement
> > as compact as possible but still keeping GPA aligned with HVA.
> >   
> >> For said virtio-mem device, this would e.g. be 4MB. (see patch 10 and 14
> >> of how this call "get_align" comes into play), because the addresses of
> >> the memory blocks are all aligned by e.g. 4MB. This is what is
> >> guaranteed by the device specification.  
> > where does this 4MB magic come from and why must a block be aligned
> > on this size?  
> 
> (knowing it is hard to get the idea without the current prototype in
> place, but that will be fixed soon)
> 
> virtio-mem works in blocks. The block size is the size in which memory
> can be plugged or unplugged by the guest. It also determines the
> granularity (and therefore the bitmap) we have to use to keep track of
> unplugged memory. It is configurable (e.g. for migration purposes), but
> might also depend on the backend memory type. E.g. if huge pages are
> used in the host, the huge page size defines the minimum block size. But
> consider the last part a special case. Huge pages will not be supported
> for now.
> 
> The block size also defines the alignment that has to be used by the
> guest for plugging/unplugging (because that's how blocks get mapped to
> the bitmap entries). So a virtio-mem device really needs a block-aligned
> start address.
> 
> For now I use 4MB because it matches what guests (e.g. Linux) can
> actually support and keeps the bitmap small. But as I said, it is
> configurable. (-device virtio-mem,block_size=1MB, ...)
> 
> >    
> >> E.g. for DIMMs we might want to allow to specify the section size (e.g.
> >> 128MB on x86), otherwise e.g. Linux is not able to add all memory. (but
> >> we should not hardcode this, as this is a Linux specific requirement -
> >> still it would be nice to specify)  
> > true, it's guest specific and we do not have restrictions here.
> > The only restriction we have here is that size must be multiple of
> > backing storage page size (i.e. alignment) so that guest would
> > be able to use tail page.
> >   
> >> So in general, a memory device might have some alignment that is to be
> >> taken care of.  
> > Do we really need introducing frontend specific alignment?
> > I'd try reuse backend's one and go for frontend's only in case we have to.  
> 
> I think so. But I consider this a special case for virtio-mem.
> (specified indirectly via block_size). For the remaining stuff, we might
> be able to live with the memory backend alignment. And I agree that this
> should be the default if possible.
So make block_size a multiple of the backing storage alignment (i.e. page size),
and we should keep the GPA mapping (picking the address to map the region) based
on the backend's alignment. virtio-mem would probably even work with huge pages,
at the cost of increased granularity.
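
A small sketch of the two constraints discussed here: block_size being a
multiple of the backend alignment, and blocks mapping to bitmap entries
relative to the start of the device region. Both helper names are
hypothetical and not taken from any virtio-mem prototype. Whether the
start address itself additionally has to be block-aligned is the open
point in this thread, and the sketch takes no position on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if block_size is a nonzero multiple of the backend alignment. */
static bool block_size_is_valid(uint64_t block_size, uint64_t backend_align)
{
    assert(backend_align != 0);
    return block_size != 0 && block_size % backend_align == 0;
}

/* Index of the bitmap entry that tracks the block containing gpa. */
static uint64_t block_index(uint64_t region_start, uint64_t block_size,
                            uint64_t gpa)
{
    assert(block_size != 0 && gpa >= region_start);
    return (gpa - region_start) / block_size;
}
```

For example, a 4 MiB block size is valid on top of a 2 MiB backend
alignment, while a 1 MiB block size is not; an address 9 MiB into the
region falls into block 2 when blocks are 4 MiB.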

  
> >> I don't understand right now why an upper limit on the alignment would
> >> make sense at all. We can easily handle it during our search. And we
> >> have to handle it either way during the search, if we plug some device
> >> with strange sizes (e.g. 1MB DIMM).
> >>
> >> Of course, we might end up fragmenting guest physical memory, but that
> >> is purely a setup issue (choosing sizes of devices + main memory
> >> properly). I don't see a reason to error out (and of course also not to
> >> assert out :) ).  
> > Agreed about assert, but I'd still prefer error out there so that users
> > won't crate insane config and then complain (see below and *).
> > 
> > Upper alignment value is useful for proper sizing of hotplug address space,
> > so that user could plug #slots devices upto #maxmem specified on CLI.
> > It's still possible to misconfigure using manual placement, but default
> > one just works, user either consumes all memory #maxmem and/or #slots.  
> 
> 
> Please not that it will still work in most cases. Only if people start
> to define crazy alignments (like I did), crazy DIMM sizes (e.g. 1MB) or
> crazy block sizes for virtio-mem, we might have a fragmented guest
> physical memory. This should  usually not happen, but if so, we already
> have error messages for reporting this "fragmented memory".
> 
> And I consider these "strange configs" similar as "strange manual
> placement". Same result: fragmented memory, error out.
> 
> Personally, I don't think we have to guarantee that automatic placement
> works in all scenarios that the user is able to come up with. It should
> work in sane configurations, which is still that case.
It works now on x86 and I'd like to keep it this way.
 
> > There is no misunderstanding in case of error here as it works same as
> > on baremetal, one doesn't have a free place to put more memory or all
> > memory one asked for is already there.
> > 
> > So it might be that #slots decoupling from memory device a premature
> > action and we can still use it with virtio-mem/pmem.
> >
> 
> I treat #slots on x86 as #acpi_slots, that's why I don't think they
> apply here. I can see how they are used (on x86 only!) to size the
> address space - but I consider this a "nice to have feature" that should
> not be supposed to work in all possible scenarios.
A slot is more of a user-facing concept, as on real hw where you stick in a plank
of memory, so it's easily understood by users.
On the ACPI side it's just an implementation detail; we can go beyond that if
needed without any adverse effects.
The restriction mostly comes from the KVM side (# of memory slots) and vhost.

> E.g. powerpc already relies on sane user configs. #slots is not used to
> size the guest address space. Similar things will apply on s390x.
Both could benefit from it the same way as x86, by providing a
properly sized area, so users won't have to scratch their heads
trying to understand why they can't plug more memory.
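
A minimal sketch of the sizing argument, assuming each plugged device may
waste up to one maximum alignment of padding under automatic placement.
The function name and the formula are illustrative assumptions, not code
taken from any machine implementation:

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical helper: size the hotplug area so that `slots` devices
 * adding up to (maxram_size - ram_size) still fit when each of them
 * needs up to `max_align` bytes of alignment padding.
 */
static uint64_t hotplug_area_size(uint64_t ram_size, uint64_t maxram_size,
                                  uint64_t slots, uint64_t max_align)
{
    assert(maxram_size >= ram_size);
    return (maxram_size - ram_size) + slots * max_align;
}
```

With 4GB of boot RAM, maxmem of 8GB, 4 slots and a 1GB upper alignment,
this reserves 8GB of address space. Without an upper bound on alignment,
no fixed amount of padding can be reserved up front.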


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-04 11:27         ` David Hildenbrand
@ 2018-06-07 13:44           ` Igor Mammedov
  2018-06-07 14:00             ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-07 13:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Mon, 4 Jun 2018 13:27:01 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 31.05.2018 16:13, Igor Mammedov wrote:
> > On Wed, 30 May 2018 16:13:32 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 30.05.2018 15:08, Igor Mammedov wrote:  
> >>> On Thu, 17 May 2018 10:15:17 +0200
> >>> David Hildenbrand <david@redhat.com> wrote:
> >>>     
> >>>> For multi stage hotplug handlers, we'll have to do some error handling
> >>>> in some hotplug functions, so let's use a local error variable (except
> >>>> for unplug requests).    
> >>> I'd split out introducing local error into separate patch
> >>> so patch would do a single thing.  
> 
> I can do that if it makes review easier.
> 
> >>>     
> >>>> Also, add code to pass control to the final stage hotplug handler at the
> >>>> parent bus.    
> >>> But I don't agree with generic
> >>>  "} else if ("dev->parent_bus && dev->parent_bus->hotplug_handler) {"
> >>> forwarding, it's done by 3/14 for generic case and in case of
> >>> special device that needs bus handler called from machine one,
> >>> I'd suggest to do forwarding explicitly for that device only
> >>> like we do with acpi_dev.    
> >>
> >> I decided to do it that way because it is generic and results in nicer
> >> recovery handling (e.g. in case pc_dimm plug fails, we can simply
> >> rollback all (for now MemoryDevice) previous plug operations).  
> > rollback should be managed by the caller of pc_dimm plug
> > directly, so it's not relevant here.
> >   
> >> IMHO, the resulting code is easier to read.
> >>
> >> From this handling it is clear that
> >> "if we reach the hotplug handler, and it is not some special device
> >> plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
> >> handler if any exists"  
> > I strongly disagree with that it's easier to deal with.
> > You are basically duplicating already generalized code
> > from qdev_get_hotplug_handler() back into boards.
> > 
> > If a device doesn't have to be handled by machine handler,
> > than qdev_get_hotplug_handler() must return its bus handler
> > if any directly. So branch in question that your are copying
> > is a dead one, pls drop it.  
> 
> We forward selected (pc_get_hotpug_handler()) devices to the
> right hotplug handler. Nothing wrong about that. I don't agree
> with "basically duplicating already generalized code" wrong.
> We have to forward at some place. Your idea simply places that
> code at some other place.
> 
> 
> But I think we have to get the general idea sorted out first.
> 
> What you have in mind (e.g. plug):
> 
> if (TYPE_MEMORY_DEVICE) {
> 	memory_device_plug();
> 	if (local_err) {
> 		goto out;
> 	}
> 
> 	if (TYPE_PC_DIMM) {
> 		pc_dimm_plug(hotplug_dev, dev, &local_err);
> 	} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> 		hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> 	}
> 	if (local_err) {
> 		memory_device_unplug()
> 		goto out;
> 	}
> } else if (TYPE_CPU)
> ...
> 
> 
> What I have in mind (and implemented in this series):
> 
> 
> if (TYPE_MEMORY_DEVICE) {
> 	memory_device_plug();
> }
> /* possibly other interfaces */
> if (local_err) {
> 	error_handling();
> 	return;
> }
> 
> if (TYPE_PC_DIMM) {
> 	pc_dimm_plug(hotplug_dev, dev, &local_err);
> } ...
> } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> 	hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> }
I don't like this implicit wiring, where the reader of this part of the code
has no idea that a TYPE_MEMORY_DEVICE might also be a bus device that needs
request forwarding.
I'd prefer [pre/un]plug handlers to be concrete and explicit spaghetti code,
something like this:

if (TYPE_PC_DIMM) {
    pc_dimm_plug()
    /* do here additional concrete machine specific things */
} else if (TYPE_VIRTIO_MEM) {
    virtio_mem_plug() <- do forwarding in there
    /* and do here additional concrete machine specific things */
} else if (TYPE_CPU) {
    cpu_plug()
    /* do here additional concrete machine specific things */
}
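
The explicit variant above can be fleshed out into a compilable sketch.
All types and helpers here are hypothetical stand-ins rather than the real
QOM/hotplug API. It only shows the shape: one branch per concrete type,
with the bus forwarding hidden inside virtio_mem_plug():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's device types; not the real API. */
typedef enum { DEV_PC_DIMM, DEV_VIRTIO_MEM, DEV_CPU } DevType;

typedef struct Device Device;
struct Device {
    DevType type;
    int (*bus_plug)(Device *dev); /* parent bus handler, NULL if none */
    int machine_done;             /* machine-level setup performed */
    int bus_done;                 /* bus-level plug performed */
};

/* Stub bus handlers: one succeeds, one fails. */
static int bus_plug_ok(Device *dev)   { dev->bus_done = 1; return 0; }
static int bus_plug_fail(Device *dev) { (void)dev; return -1; }

static int pc_dimm_plug(Device *dev) { dev->machine_done = 1; return 0; }
static int cpu_plug(Device *dev)     { dev->machine_done = 1; return 0; }

/* The forwarding lives in the concrete helper, not in the dispatcher. */
static int virtio_mem_plug(Device *dev)
{
    assert(dev->bus_plug != NULL); /* helper assumes a bus handler exists */
    dev->machine_done = 1;         /* e.g. assign guest physical memory */
    if (dev->bus_plug(dev) != 0) {
        dev->machine_done = 0;     /* undo machine-level setup on failure */
        return -1;
    }
    return 0;
}

static int machine_plug(Device *dev)
{
    assert(dev != NULL);
    switch (dev->type) {
    case DEV_PC_DIMM:    return pc_dimm_plug(dev);
    case DEV_VIRTIO_MEM: return virtio_mem_plug(dev);
    case DEV_CPU:        return cpu_plug(dev);
    }
    return -1; /* unknown type */
}
```

The dispatcher stays flat and readable; the cost is that every new
bus-attached memory device needs its own branch and helper.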

> if (local_err) {
> 	if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> 		memory_device_unplug()
> 	}
> 	/* possibly other interfaces */
> }
> ...
> 
> 
> I claim that my variant is more generic because:
> - it easily supports multiple interfaces (like MemoryDevice)
>   per Device that need a hotplug handler call
> - there is only one call to hotplug_handler_plug() in case we
>   add similar handling for another device
As someone said, "one more layer of indirection would solve the problem".
But then no one would have a clue how it works after a while (including
the author of the feature).
I don't think we have a problem here that needs generalization.

> 
> Apart from that they do exactly the same thing.
> 


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-07 13:44           ` Igor Mammedov
@ 2018-06-07 14:00             ` David Hildenbrand
  2018-06-08 12:24               ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-07 14:00 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 07.06.2018 15:44, Igor Mammedov wrote:
> On Mon, 4 Jun 2018 13:27:01 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 31.05.2018 16:13, Igor Mammedov wrote:
>>> On Wed, 30 May 2018 16:13:32 +0200
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> On 30.05.2018 15:08, Igor Mammedov wrote:  
>>>>> On Thu, 17 May 2018 10:15:17 +0200
>>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>>     
>>>>>> For multi stage hotplug handlers, we'll have to do some error handling
>>>>>> in some hotplug functions, so let's use a local error variable (except
>>>>>> for unplug requests).    
>>>>> I'd split out introducing local error into separate patch
>>>>> so patch would do a single thing.  
>>
>> I can do that if it makes review easier.
>>
>>>>>     
>>>>>> Also, add code to pass control to the final stage hotplug handler at the
>>>>>> parent bus.    
>>>>> But I don't agree with generic
>>>>>  "} else if ("dev->parent_bus && dev->parent_bus->hotplug_handler) {"
>>>>> forwarding, it's done by 3/14 for generic case and in case of
>>>>> special device that needs bus handler called from machine one,
>>>>> I'd suggest to do forwarding explicitly for that device only
>>>>> like we do with acpi_dev.    
>>>>
>>>> I decided to do it that way because it is generic and results in nicer
>>>> recovery handling (e.g. in case pc_dimm plug fails, we can simply
>>>> rollback all (for now MemoryDevice) previous plug operations).  
>>> rollback should be managed by the caller of pc_dimm plug
>>> directly, so it's not relevant here.
>>>   
>>>> IMHO, the resulting code is easier to read.
>>>>
>>>> From this handling it is clear that
>>>> "if we reach the hotplug handler, and it is not some special device
>>>> plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
>>>> handler if any exists"  
>>> I strongly disagree with that it's easier to deal with.
>>> You are basically duplicating already generalized code
>>> from qdev_get_hotplug_handler() back into boards.
>>>
>>> If a device doesn't have to be handled by machine handler,
>>> than qdev_get_hotplug_handler() must return its bus handler
>>> if any directly. So branch in question that your are copying
>>> is a dead one, pls drop it.  
>>
>> We forward selected (pc_get_hotpug_handler()) devices to the
>> right hotplug handler. Nothing wrong about that. I don't agree
>> with "basically duplicating already generalized code" wrong.
>> We have to forward at some place. Your idea simply places that
>> code at some other place.
>>
>>
>> But I think we have to get the general idea sorted out first.
>>
>> What you have in mind (e.g. plug):
>>
>> if (TYPE_MEMORY_DEVICE) {
>> 	memory_device_plug();
>> 	if (local_err) {
>> 		goto out;
>> 	}
>>
>> 	if (TYPE_PC_DIMM) {
>> 		pc_dimm_plug(hotplug_dev, dev, &local_err);
>> 	} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> 		hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>> 	}
>> 	if (local_err) {
>> 		memory_device_unplug()
>> 		goto out;
>> 	}
>> } else if (TYPE_CPU)
>> ...
>>
>>
>> What I have in mind (and implemented in this series):
>>
>>
>> if (TYPE_MEMORY_DEVICE) {
>> 	memory_device_plug();
>> }
>> /* possibly other interfaces */
>> if (local_err) {
>> 	error_handling();
>> 	return;
>> }
>>
>> if (TYPE_PC_DIMM) {
>> 	pc_dimm_plug(hotplug_dev, dev, &local_err);
>> } ...
>> } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
>> 	hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
>> }
> I don't like this implicit wiring, where reader of this part of code
> has no idea that TYPE_MEMORY_DEVICE might be also bus device that needs
> request forwarding.
> I'd prefer [pre/un]plug handlers be concrete and explicit spaghetti code,
> something like this:
> 
> if (TYPE_PC_DIMM) {
>     pc_dimm_plug()
>     /* do here additional concrete machine specific things */
> } else if (TYPE_VIRTIO_MEM) {
>     virtio_mem_plug() <- do forwarding in there
>     /* and do here additional concrete machine specific things */
> } else if (TYPE_CPU) {
>     cpu_plug()
>     /* do here additional concrete machine specific things */
> }

That will result in a lot of duplicate code - for every machine we
support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
virtio-mem and virtio-pmem could most probably share the code.
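
For comparison, a compilable sketch of the multi stage variant quoted
further up. The struct and the stub helpers are hypothetical stand-ins
(only the names memory_device_plug(), memory_device_unplug() and
pc_dimm_plug() come from the pseudocode), so treat it as an illustration
of the control flow only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a QEMU device; not the real DeviceState. */
typedef struct Dev Dev;
struct Dev {
    bool is_memory_device;     /* implements the MemoryDevice interface */
    bool is_pc_dimm;           /* plugged by the machine itself */
    int (*bus_plug)(Dev *dev); /* parent bus handler, NULL if none */
    bool mem_assigned;         /* first-stage state */
};

/* Stub handlers for the sketch. */
static int bus_plug_ok(Dev *dev)   { (void)dev; return 0; }
static int bus_plug_fail(Dev *dev) { (void)dev; return -1; }

static int memory_device_plug(Dev *dev)    { dev->mem_assigned = true; return 0; }
static void memory_device_unplug(Dev *dev) { dev->mem_assigned = false; }
static int pc_dimm_plug(Dev *dev)          { (void)dev; return 0; }

static int machine_plug(Dev *dev)
{
    int ret = 0;

    assert(dev != NULL);

    /* First stage: one block per interface, independent of device type. */
    if (dev->is_memory_device) {
        ret = memory_device_plug(dev);
    }
    if (ret != 0) {
        return ret;
    }

    /* Final stage: machine-owned devices, else the single forwarding call. */
    if (dev->is_pc_dimm) {
        ret = pc_dimm_plug(dev);
    } else if (dev->bus_plug) {
        ret = dev->bus_plug(dev);
    }

    /* Roll back first-stage work if the final stage failed. */
    if (ret != 0 && dev->is_memory_device) {
        memory_device_unplug(dev);
    }
    return ret;
}
```

Adding another bus-attached memory device needs no new branch here: it
passes through the first stage and the one forwarding call.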

> 
>> if (local_err) {
>> 	if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>> 		memory_device_unplug()
>> 	}
>> 	/* possibly other interfaces */
>> }
>> ...
>>
>>
>> I claim that my variant is more generic because:
>> - it easily supports multiple interfaces (like MemoryDevice)
>>   per Device that need a hotplug handler call
>> - there is only one call to hotplug_handler_plug() in case we
>>   add similar handling for another device
> As someone said "one more layer of indirection would solve problem".
> But then one would have a clue how it works after a while (including
> author of the feature).
> I don't think we have a problem here and need generalization.
> 
>>
>> Apart from that they do exactly the same thing.
>>
> 


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 01/14] memory-device: drop assert related to align and start of address space
  2018-06-07 13:26               ` Igor Mammedov
@ 2018-06-07 14:12                 ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-07 14:12 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino


>> And another point against introducing a check for the maimum alignment
>> would be: There might be users with a specified alignment > max page
>> size. Changing the behavior will break these setups (strange but true).
> check is already there implicitly as assert, and you're removing it
> with this patch.
> What I'm suggesting is to drop this patch or replace assert with
> graceful error.
> If you have actual usecase that requires this check being dropped,
> then commit message should ddescribe it properly and the patch should
> be a part of series that introduces that requirement.

On x86 hotplug base is aligned to 1GB
On ppc hotplug base is aligned to 1GB

So what you're suggesting is to bail out in case we have an alignment
that the hotplug base does not satisfy (i.e. turning this assert into a
check like "alignment too big")

Then we would also have to install such a base alignment on s390x.
Otherwise, if somebody specifies -m 511, adding something with an alignment
of e.g. 4MB will fail.
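
To make the -m 511 case concrete, here is a small sketch of the check,
assuming (hypothetically) that the device memory base sits directly at the
end of boot RAM. The helper names are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* The check under discussion: is the base already aligned to `align`? */
static bool base_satisfies_align(uint64_t base, uint64_t align)
{
    return align_up(base, align) == base;
}
```

A base of 511MB is not 4MB-aligned (the next 4MB boundary is 512MB), so
the check rejects the device. With the base rounded up to 1GB, as on x86
and ppc, any power-of-two alignment up to 1GB passes.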

> 
>>>> virtio-mem devices will later have a certain block size (e.g. 4MB). I
>>>> want to give devices during resource assignment the possibility to
>>>> specify their alignment requirements.  
>>> size and alignment are 2 diffrent things here, alignment in our design
>>> is dictated by backing storage page size and for performance reasons
>>> HVA and GPA should be aligned on the same boundary, users are free
>>> to pick another GPA manually as far as it has the same alignment.
>>> But for automatic placement we use backend's alignment to make placement
>>> as compact as possible but still keeping GPA aligned with HVA.
>>>   
>>>> For said virtio-mem device, this would e.g. be 4MB. (see patch 10 and 14
>>>> of how this call "get_align" comes into play), because the addresses of
>>>> the memory blocks are all aligned by e.g. 4MB. This is what is
>>>> guaranteed by the device specification.  
>>> where does this 4Mb magic comes from and why block must be aligned
>>> on this size?  
>>
>> (knowing it is hard to get the idea without the current prototype in
>> place, but that will be fixed soon)
>>
>> virtio-mem works in blocks. The block size is the size in which memory
>> can be plugged or unplugged by the guest. It also determines the
>> granularity (and therefore the bitmap) we have to use to keep track of
>> unplugged memory. It is configurable (e.g. for migration purposes), but
>> might also depend on the backend memory type. E.g. if huge pages are
>> used in the host, the huge page size defines the minimum block size. But
>> consider the last part a special case. Huge pages will not be supported
>> for now.
>>
>> The block size also defines the alignment that has to be used by the
>> guest for plugging/unplugging (because that's how blocks gets mapped to
>> the bitmap entries). So a virtio-mem device really needs a block-aligned
>> start address,
>>
>> For now I use 4MB because it matches what guests (e.g. Linux) can
>> actually support and keeps the bitmap small. But as I said, it is
>> configurable. (-device virtio-mem,block_size=1MB, ...)
>>
>>>    
>>>> E.g. for DIMMs we might want to allow to specify the section size (e.g.
>>>> 128MB on x86), otherwise e.g. Linux is not able to add all memory. (but
>>>> we should not hardcode this, as this is a Linux specific requirement -
>>>> still it would be nice to specify)  
>>> true, it's guest specific and we do not have restrictions here.
>>> The only restriction we have here is that size must be multiple of
>>> backing storage page size (i.e. alignment) so that guest would
>>> be able to use tail page.
>>>   
>>>> So in general, a memory device might have some alignment that is to be
>>>> taken care of.  
>>> Do we really need introducing frontend specific alignment?
>>> I'd try reuse backend's one and go for frontend's only in case we have to.  
>>
>> I think so. But I consider this a special case for virtio-mem.
>> (specified indirectly via block_size). For the remaining stuff, we might
>> be able to live with the memory backend aligment. And I agree that this
>> should be the default if possible.
> So if you make block_size to be multiple of backing storage alignment (i.e. page size)
> and we should keep GPA mapping (picking address to map region) based on backend's
> alignment. virtio-mem probably would even work even with huge pages at cost
> of increased granularity.
>

block_size will always be a multiple of the backing storage alignment
(well, for now huge pages will not be supported, so 4MB > page_size
always holds).

However, using the alignment of the backing storage (e.g. page size) is
not enough.

Using a block_size of 1GB for virtio-mem is very unrealistic, so I think
we can ignore this for now (meaning, getting errors if we implement
said alignment check).

>   
>>>> I don't understand right now why an upper limit on the alignment would
>>>> make sense at all. We can easily handle it during our search. And we
>>>> have to handle it either way during the search, if we plug some device
>>>> with strange sizes (e.g. 1MB DIMM).
>>>>
>>>> Of course, we might end up fragmenting guest physical memory, but that
>>>> is purely a setup issue (choosing sizes of devices + main memory
>>>> properly). I don't see a reason to error out (and of course also not to
>>>> assert out :) ).  
>>> Agreed about assert, but I'd still prefer error out there so that users
>>> won't crate insane config and then complain (see below and *).
>>>
>>> Upper alignment value is useful for proper sizing of hotplug address space,
>>> so that user could plug #slots devices upto #maxmem specified on CLI.
>>> It's still possible to misconfigure using manual placement, but default
>>> one just works, user either consumes all memory #maxmem and/or #slots.  
>>
>>
>> Please not that it will still work in most cases. Only if people start
>> to define crazy alignments (like I did), crazy DIMM sizes (e.g. 1MB) or
>> crazy block sizes for virtio-mem, we might have a fragmented guest
>> physical memory. This should  usually not happen, but if so, we already
>> have error messages for reporting this "fragmented memory".
>>
>> And I consider these "strange configs" similar as "strange manual
>> placement". Same result: fragmented memory, error out.
>>
>> Personally, I don't think we have to guarantee that automatic placement
>> works in all scenarios that the user is able to come up with. It should
>> work in sane configurations, which is still that case.
> It work now on x86 and I'd wish to keep it this way.

And it keeps working when only using DIMMs/PCDIMMs (slots), which is
totally fine from my point of view.

>  
>>> There is no misunderstanding in case of error here as it works same as
>>> on baremetal, one doesn't have a free place to put more memory or all
>>> memory one asked for is already there.
>>>
>>> So it might be that #slots decoupling from memory device a premature
>>> action and we can still use it with virtio-mem/pmem.
>>>
>>
>> I treat #slots on x86 as #acpi_slots, that's why I don't think they
>> apply here. I can see how they are used (on x86 only!) to size the
>> address space - but I consider this a "nice to have feature" that should
>> not be supposed to work in all possible scenarios.
> slot is more user space concept, as it is on real hw where you stick a plank
> with memory, so it's easily understood by users.
> On ACPI side it's just implementation detail, we can go beyond that if it's
> needed without any adverse effects.
> Restriction mostly comes from KVM side on # of memory slots and vhost.

KVM memory slots are a completely different concept from the slots
parameter. The problem is: DIMMs use this as if it were "acpi_slots".

> 
>> E.g. powerpc already relies on sane user configs. #slots is not used to
>> size the guest address space. Similar things will apply on s390x.
> They both could benefit from it the same way as x86 providing
> properly sized area, so users won't have to scratch their head
> trying to understand why they can't plug more memory.

Fragmented memory is more than enough as an error message IMHO.

-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 06/14] spapr: prepare for multi stage hotplug handlers
  2018-06-05  7:51     ` David Hildenbrand
@ 2018-06-07 14:26       ` Igor Mammedov
  0 siblings, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-07 14:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Gibson, qemu-devel, qemu-s390x, Michael S . Tsirkin,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Markus Armbruster, qemu-ppc, Pankaj Gupta,
	Alexander Graf, Cornelia Huck, Christian Borntraeger,
	Luiz Capitulino

On Tue, 5 Jun 2018 09:51:26 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 05.06.2018 03:08, David Gibson wrote:
> > On Thu, May 17, 2018 at 10:15:19AM +0200, David Hildenbrand wrote:  
> >> For multi stage hotplug handlers, we'll have to do some error handling
> >> in some hotplug functions, so let's use a local error variable (except
> >> for unplug requests).
> >>
> >> Also, add code to pass control to the final stage hotplug handler at the
> >> parent bus.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++++++++++++++++-----------
> >>  1 file changed, 43 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index ebf30dd60b..b7c5c95f7a 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -3571,27 +3571,48 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
> >>  {
> >>      MachineState *ms = MACHINE(hotplug_dev);
> >>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
> >> +    Error *local_err = NULL;
> >>  
> >> +    /* final stage hotplug handler */
> >>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> >>          int node;
> >>  
> >>          if (!smc->dr_lmb_enabled) {
> >> -            error_setg(errp, "Memory hotplug not supported for this machine");
> >> -            return;
> >> +            error_setg(&local_err,
> >> +                       "Memory hotplug not supported for this machine");
> >> +            goto out;
> >>          }
> >> -        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
> >> -        if (*errp) {
> >> -            return;
> >> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
> >> +                                        &local_err);
> >> +        if (local_err) {
> >> +            goto out;
> >>          }
> >>          if (node < 0 || node >= MAX_NODES) {
> >> -            error_setg(errp, "Invaild node %d", node);
> >> -            return;
> >> +            error_setg(&local_err, "Invaild node %d", node);
> >> +            goto out;
> >>          }
> >>  
> >> -        spapr_memory_plug(hotplug_dev, dev, node, errp);
> >> +        spapr_memory_plug(hotplug_dev, dev, node, &local_err);
> >>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> >> -        spapr_core_plug(hotplug_dev, dev, errp);
> >> +        spapr_core_plug(hotplug_dev, dev, &local_err);
> >> +    } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> +        hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> >> +    }
> >> +out:
> >> +    error_propagate(errp, local_err);
> >> +}
> >> +
> >> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> >> +                                        DeviceState *dev, Error **errp)
> >> +{
> >> +    Error *local_err = NULL;
> >> +
> >> +    /* final stage hotplug handler */
> >> +    if (dev->parent_bus && dev->parent_bus->hotplug_handler) {  
> > 
> > As I think Igor said on the equivalent PC patch, I don't quite get
> > this.  Isn't this already handled by the generic hotplug code picking
> > up the bus's hotplug handler if the machine doesn't supply one?  
> 
> See my reply to patch nr 4.
> 
> What we do is, we install the machine hotplug handler as an
> "intermediate" hotplug handler.
> 
> E.g. if we have a VIRTIO based MemoryDevice, we have to initialize the
> MemoryDevice specific stuff in the machine hotplug handler, but then
> pass the device onto the last stage hotplug handler (which will
> eventually attach it to a bus and notify the guest).
As said in v4, please don't do this implicit routing, as it's hard to
read and maintain. Do explicit routing within the concrete device helper
(virtio_mem_[un|pre]plug()), keeping the un/pre/plug handlers simple.

Then you won't need the if check either; just call the bus handler
directly.

[...]


* Re: [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler
  2018-06-04 11:45     ` David Hildenbrand
@ 2018-06-07 15:00       ` Igor Mammedov
  2018-06-07 15:10         ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-07 15:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Mon, 4 Jun 2018 13:45:38 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 01.06.2018 13:17, Igor Mammedov wrote:
> > On Thu, 17 May 2018 10:15:25 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> Let's move all pre-plug checks we can do without the device being
> >> realized into the applicable hotplug handler for pc and spapr.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/i386/pc.c                   | 11 +++++++
> >>  hw/mem/memory-device.c         | 72 +++++++++++++++++++-----------------------
> >>  hw/ppc/spapr.c                 | 11 +++++++
> >>  include/hw/mem/memory-device.h |  2 ++
> >>  4 files changed, 57 insertions(+), 39 deletions(-)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 8bc41ef24b..61f1537e14 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> >>  {
> >>      Error *local_err = NULL;
> >>  
> >> +    /* first stage hotplug handler */
> >> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> >> +        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
> >> +                               &local_err);
> >> +    }
> >> +
> >> +    if (local_err) {
> >> +        goto out;
> >> +    }
> >> +
> >>      /* final stage hotplug handler */
> >>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> >>          pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
> >> @@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
> >>          hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
> >>                                   &local_err);
> >>      }
> >> +out:
> >>      error_propagate(errp, local_err);
> >>  }
> >>  
> >> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> >> index 361d38bfc5..d22c91993f 100644
> >> --- a/hw/mem/memory-device.c
> >> +++ b/hw/mem/memory-device.c
> >> @@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
> >>      return 0;
> >>  }
> >>  
> >> -static void memory_device_check_addable(MachineState *ms, uint64_t size,
> >> -                                        Error **errp)
> >> -{
> >> -    uint64_t used_region_size = 0;
> >> -
> >> -    /* we will need a new memory slot for kvm and vhost */
> >> -    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
> >> -        error_setg(errp, "hypervisor has no free memory slots left");
> >> -        return;
> >> -    }
> >> -    if (!vhost_has_free_slot()) {
> >> -        error_setg(errp, "a used vhost backend has no free memory slots left");
> >> -        return;
> >> -    }
> >> -
> >> -    /* will we exceed the total amount of memory specified */
> >> -    memory_device_used_region_size(OBJECT(ms), &used_region_size);
> >> -    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
> >> -        error_setg(errp, "not enough space, currently 0x%" PRIx64
> >> -                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
> >> -                   used_region_size, ms->maxram_size - ms->ram_size);
> >> -        return;
> >> -    }
> >> -
> >> -}
> >> -
> >>  uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
> >>                                       uint64_t align, uint64_t size,
> >>                                       Error **errp)
> >>  {
> >>      uint64_t address_space_start, address_space_end;
> >> +    uint64_t used_region_size = 0;
> >>      GSList *list = NULL, *item;
> >>      uint64_t new_addr = 0;
> >>  
> >> -    if (!ms->device_memory) {
> >> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> >> -                         "supported by the machine");
> >> -        return 0;
> >> -    }
> >> -
> >> -    if (!memory_region_size(&ms->device_memory->mr)) {
> >> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> >> -                         "enabled, please specify the maxmem option");
> >> -        return 0;
> >> -    }
> >>      address_space_start = ms->device_memory->base;
> >>      address_space_end = address_space_start +
> >>                          memory_region_size(&ms->device_memory->mr);
> >>      g_assert(address_space_end >= address_space_start);
> >>  
> >> -    memory_device_check_addable(ms, size, errp);
> >> -    if (*errp) {
> >> +    /* will we exceed the total amount of memory specified */
> >> +    memory_device_used_region_size(OBJECT(ms), &used_region_size);
> >> +    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
> >> +        error_setg(errp, "not enough space, currently 0x%" PRIx64
> >> +                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
> >> +                   used_region_size, ms->maxram_size - ms->ram_size);
> >>          return 0;
> >>      }
> >>  
> >> @@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
> >>      return size;
> >>  }
> >>  
> >> +void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
> >> +                            Error **errp)
> >> +{
> >> +    if (!ms->device_memory) {
> >> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> >> +                         "supported by the machine");
> >> +        return;
> >> +    }
> >> +
> >> +    if (!memory_region_size(&ms->device_memory->mr)) {
> >> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
> >> +                         "enabled, please specify the maxmem option");
> >> +        return;
> >> +    }
> >> +
> >> +    /* we will need a new memory slot for kvm and vhost */
> >> +    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
> >> +        error_setg(errp, "hypervisor has no free memory slots left");
> >> +        return;
> >> +    }
> >> +    if (!vhost_has_free_slot()) {
> >> +        error_setg(errp, "a used vhost backend has no free memory slots left");
> >> +        return;
> >> +    }  
> > thanks for extracting preparations steps into the right callback.
> > 
> > on top of this _preplug() handler should also set being plugged
> > device properties if they weren't set by user.
> >  memory_device_get_free_addr() should be here to.
> > 
> > general rule for _preplug() would be to check and prepare device
> > for being plugged but not touch anything beside the device (it's _plug handler job)  
> 
> I disagree. Or at least I disagree as part of this patch series because
> it over-complicates things :)
"Welcome to rewrite QEMU first before you do your thing" club :)

That's why I've suggested splitting the series into several small ones
tackling concrete goals (1st: unplug cleanups, 2nd: preplug refactoring)
instead of intermixing loosely related patches.

It should help to review and merge the prerequisite work fast, as it isn't
related to virtio-mem. The rest of the refactoring (which is not much once
you split out the first 2 series), which is done directly for virtio-mem's
sake, should be posted as part of that series.
It would probably be a bigger series, but posting them separately doesn't
make sense either, as a reviewer would have to jump between them anyway
to do a sensible review.

> preplug() can do basic checks but I don't think it should be used to
> change actual properties. And I give you the main reason for this:
> 
> We have to do quite some amount of unnecessary error handling (please
> remember the pc_dimm plug code madness before I refactored it - 80%
> consisted of error handling) if we want to work on device properties
> before a device is realized. And we even have duplicate checks both in
> the realize() and the preplug() code then (again, what we had in the
> pc_dimm plug code - do we have a memdev already or not).
> 
> Right now, I assume, that all MemoryDevice functions can be safely
> called after the device has been realized without error handling. This
> is nice. It e.g. guarantees that we have a memdev assigned. Otherwise,
> every time we access some MemoryDevice property (e.g. region size), we
> have to handle possible uninitialized properties (e.g. memdev) -
> something I don't like.
> 
> So I want to avoid this by any means. And I don't really see a benefit
> for this additional error handling that will be necessary: We don't care
> about the performance in case something went wrong.
> 
Error checks are not really important here.
What I care about here is keeping the QOM model working as it is supposed to, i.e.
 object_new() -> set properties -> realize()
Setting properties should happen before realize() is called, and
that's the preplug callback.

Currently, setting properties is still in the plug() handler
because the preplug handler was introduced later and nobody was
rewriting that code to the extent you do.
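
A minimal sketch of the ordering argued for above, written as a toy model
and not as QEMU code. All names (ToyDevice, ToyMachine, toy_pre_plug, ...)
are hypothetical, and the bump allocator only stands in for whatever
address search the real code performs. The point is that pre_plug fills in
a missing device property before realize and leaves machine state alone,
while plug is the stage that commits to the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a memory device; not QEMU's DeviceState. */
typedef struct ToyDevice {
    bool     realized;
    bool     addr_set;   /* did the user (or pre_plug) supply an address? */
    uint64_t addr;
    uint64_t size;
} ToyDevice;

/* Toy stand-in for the machine's device memory region. */
typedef struct ToyMachine {
    uint64_t base;
    uint64_t next_free;  /* simple bump allocator: no alignment, no hints */
    uint64_t end;
} ToyMachine;

/* pre_plug: runs before realize; checks and prepares the device only. */
static bool toy_pre_plug(const ToyMachine *ms, ToyDevice *dev)
{
    assert(!dev->realized);
    if (dev->size == 0 || dev->size > ms->end - ms->next_free) {
        return false;                /* not enough space left */
    }
    if (!dev->addr_set) {
        dev->addr = ms->next_free;   /* default the property before realize */
        dev->addr_set = true;
    }
    return true;
}

/* realize: all properties are expected to be final at this point. */
static void toy_realize(ToyDevice *dev)
{
    assert(dev->addr_set);
    dev->realized = true;
}

/* plug: runs after realize; the only stage that changes machine state. */
static void toy_plug(ToyMachine *ms, ToyDevice *dev)
{
    assert(dev->realized);
    ms->next_free = dev->addr + dev->size;
}
```

Note that toy_pre_plug() takes the machine as const: in this model it may
read machine state to pick a default, but only toy_plug() modifies it.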


* Re: [Qemu-devel] [PATCH v4 12/14] memory-device: factor out pre-plug into hotplug handler
  2018-06-07 15:00       ` Igor Mammedov
@ 2018-06-07 15:10         ` David Hildenbrand
  0 siblings, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-07 15:10 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 07.06.2018 17:00, Igor Mammedov wrote:
> On Mon, 4 Jun 2018 13:45:38 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 01.06.2018 13:17, Igor Mammedov wrote:
>>> On Thu, 17 May 2018 10:15:25 +0200
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> Let's move all pre-plug checks we can do without the device being
>>>> realized into the applicable hotplug handler for pc and spapr.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  hw/i386/pc.c                   | 11 +++++++
>>>>  hw/mem/memory-device.c         | 72 +++++++++++++++++++-----------------------
>>>>  hw/ppc/spapr.c                 | 11 +++++++
>>>>  include/hw/mem/memory-device.h |  2 ++
>>>>  4 files changed, 57 insertions(+), 39 deletions(-)
>>>>
>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>> index 8bc41ef24b..61f1537e14 100644
>>>> --- a/hw/i386/pc.c
>>>> +++ b/hw/i386/pc.c
>>>> @@ -2010,6 +2010,16 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>>>  {
>>>>      Error *local_err = NULL;
>>>>  
>>>> +    /* first stage hotplug handler */
>>>> +    if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
>>>> +        memory_device_pre_plug(MACHINE(hotplug_dev), MEMORY_DEVICE(dev),
>>>> +                               &local_err);
>>>> +    }
>>>> +
>>>> +    if (local_err) {
>>>> +        goto out;
>>>> +    }
>>>> +
>>>>      /* final stage hotplug handler */
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>>>          pc_cpu_pre_plug(hotplug_dev, dev, &local_err);
>>>> @@ -2017,6 +2027,7 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>>>          hotplug_handler_pre_plug(dev->parent_bus->hotplug_handler, dev,
>>>>                                   &local_err);
>>>>      }
>>>> +out:
>>>>      error_propagate(errp, local_err);
>>>>  }
>>>>  
>>>> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
>>>> index 361d38bfc5..d22c91993f 100644
>>>> --- a/hw/mem/memory-device.c
>>>> +++ b/hw/mem/memory-device.c
>>>> @@ -68,58 +68,26 @@ static int memory_device_used_region_size(Object *obj, void *opaque)
>>>>      return 0;
>>>>  }
>>>>  
>>>> -static void memory_device_check_addable(MachineState *ms, uint64_t size,
>>>> -                                        Error **errp)
>>>> -{
>>>> -    uint64_t used_region_size = 0;
>>>> -
>>>> -    /* we will need a new memory slot for kvm and vhost */
>>>> -    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
>>>> -        error_setg(errp, "hypervisor has no free memory slots left");
>>>> -        return;
>>>> -    }
>>>> -    if (!vhost_has_free_slot()) {
>>>> -        error_setg(errp, "a used vhost backend has no free memory slots left");
>>>> -        return;
>>>> -    }
>>>> -
>>>> -    /* will we exceed the total amount of memory specified */
>>>> -    memory_device_used_region_size(OBJECT(ms), &used_region_size);
>>>> -    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
>>>> -        error_setg(errp, "not enough space, currently 0x%" PRIx64
>>>> -                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
>>>> -                   used_region_size, ms->maxram_size - ms->ram_size);
>>>> -        return;
>>>> -    }
>>>> -
>>>> -}
>>>> -
>>>>  uint64_t memory_device_get_free_addr(MachineState *ms, const uint64_t *hint,
>>>>                                       uint64_t align, uint64_t size,
>>>>                                       Error **errp)
>>>>  {
>>>>      uint64_t address_space_start, address_space_end;
>>>> +    uint64_t used_region_size = 0;
>>>>      GSList *list = NULL, *item;
>>>>      uint64_t new_addr = 0;
>>>>  
>>>> -    if (!ms->device_memory) {
>>>> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>>>> -                         "supported by the machine");
>>>> -        return 0;
>>>> -    }
>>>> -
>>>> -    if (!memory_region_size(&ms->device_memory->mr)) {
>>>> -        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>>>> -                         "enabled, please specify the maxmem option");
>>>> -        return 0;
>>>> -    }
>>>>      address_space_start = ms->device_memory->base;
>>>>      address_space_end = address_space_start +
>>>>                          memory_region_size(&ms->device_memory->mr);
>>>>      g_assert(address_space_end >= address_space_start);
>>>>  
>>>> -    memory_device_check_addable(ms, size, errp);
>>>> -    if (*errp) {
>>>> +    /* will we exceed the total amount of memory specified */
>>>> +    memory_device_used_region_size(OBJECT(ms), &used_region_size);
>>>> +    if (used_region_size + size > ms->maxram_size - ms->ram_size) {
>>>> +        error_setg(errp, "not enough space, currently 0x%" PRIx64
>>>> +                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
>>>> +                   used_region_size, ms->maxram_size - ms->ram_size);
>>>>          return 0;
>>>>      }
>>>>  
>>>> @@ -242,6 +210,32 @@ uint64_t get_plugged_memory_size(void)
>>>>      return size;
>>>>  }
>>>>  
>>>> +void memory_device_pre_plug(MachineState *ms, const MemoryDeviceState *md,
>>>> +                            Error **errp)
>>>> +{
>>>> +    if (!ms->device_memory) {
>>>> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>>>> +                         "supported by the machine");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (!memory_region_size(&ms->device_memory->mr)) {
>>>> +        error_setg(errp, "memory devices (e.g. for memory hotplug) are not "
>>>> +                         "enabled, please specify the maxmem option");
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /* we will need a new memory slot for kvm and vhost */
>>>> +    if (kvm_enabled() && !kvm_has_free_slot(ms)) {
>>>> +        error_setg(errp, "hypervisor has no free memory slots left");
>>>> +        return;
>>>> +    }
>>>> +    if (!vhost_has_free_slot()) {
>>>> +        error_setg(errp, "a used vhost backend has no free memory slots left");
>>>> +        return;
>>>> +    }  
>>> thanks for extracting preparations steps into the right callback.
>>>
>>> on top of this _preplug() handler should also set being plugged
>>> device properties if they weren't set by user.
>>>  memory_device_get_free_addr() should be here to.
>>>
>>> general rule for _preplug() would be to check and prepare device
>>> for being plugged but not touch anything beside the device (it's _plug handler job)  
>>
>> I disagree. Or at least I disagree as part of this patch series because
>> it over-complicates things :)
> "Welcome to rewrite QEMU first before you do your thing" club :)

It feels like I'm doing nothing else all the time :)

> 
> That's why I've suggested to split series in several small ones
> tackling concrete goals /1st: unplug cleanups, 2nd: preplug refactoring/
> instead of intermixing loosely related patches.

I'll send the fixes and cleanups fairly soon. That will hopefully reduce
the patch count (it has already increased a lot due to your review).

> 
> It should help to review and merge prerequisite work fast as it doesn't
> related to virtio-mem. The rest of refactoring (which is not much once
> you split out the 2 first series) that's is done directly for virtio-mem
> sake should be posted as part of that series.
> It probably would be biger series but posting them separately doesn't
> make sense either as reviewer would have to jump between them anyways
> to make sensible review.

We'll find a way. As you want to have TYPE_VIRTIO_MEM specific handling
code, this can go into the virtio-mem series.

> 
>> preplug() can do basic checks but I don't think it should be used to
>> change actual properties. And I give you the main reason for this:
>>
>> We have to do quite some amount of unnecessary error handling (please
>> remember the pc_dimm plug code madness before I refactored it - 80%
>> consisted of error handling) if we want to work on device properties
>> before a device is realized. And we even have duplicate checks both in
>> the realize() and the preplug() code then (again, what we had in the
>> pc_dimm plug code - do we have a memdev already or not).
>>
>> Right now, I assume, that all MemoryDevice functions can be safely
>> called after the device has been realized without error handling. This
>> is nice. It e.g. guarantees that we have a memdev assigned. Otherwise,
>> every time we access some MemoryDevice property (e.g. region size), we
>> have to handle possible uninitialized properties (e.g. memdev) -
>> something I don't like.
>>
>> So I want to avoid this by any means. And I don't really see a benefit
>> for this additional error handling that will be necessary: We don't care
>> about the performance in case something went wrong.
>>
> error checks are not really important here.
> Here I care about keeping QOM model work as it supposed to, i.e.
>  object_new() -> set properties -> realize()
> set properties should happen before realize() is called and
> that's preplug callback.
> 
> Currently setting properties is still in plug() handler
> because preplug handler was introduced later and nobody was
> rewriting that code to the extent you do.
> 

/me regretting he started to touch that code

I still don't think performing that change is worth it. As I said, we
will need a lot of duplicate error handling ("is memdev set or not"),
while this is currently checked at one central place: when realizing.
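
To illustrate the "one central place" argument, here is a small sketch
under stated assumptions: ToyDimm, ToyBackend and both functions are
made-up names, not the pc_dimm code. The "memdev" check lives only in
realize, so an accessor that is called after realize needs no error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memory backend. */
typedef struct ToyBackend {
    uint64_t size;
} ToyBackend;

/* Toy stand-in for a DIMM-like device. */
typedef struct ToyDimm {
    ToyBackend *memdev;   /* user-settable property, may still be NULL */
    bool        realized;
} ToyDimm;

/* The single place where "is memdev set or not" is checked. */
static bool toy_dimm_realize(ToyDimm *dimm, const char **err)
{
    if (!dimm->memdev) {
        *err = "'memdev' property is not set";
        return false;
    }
    dimm->realized = true;
    return true;
}

/* Post-realize accessor: relies on the check above, so no error handling. */
static uint64_t toy_dimm_region_size(const ToyDimm *dimm)
{
    assert(dimm->realized);
    return dimm->memdev->size;
}
```

If the accessor also had to work before realize, every caller would need to
handle the "memdev not set" case again, which is the duplication being
objected to.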

-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [qemu-s390x] [PATCH v4 00/14] MemoryDevice: use multi stage hotplug handlers
  2018-06-01 12:13   ` Igor Mammedov
  2018-06-04 10:03     ` David Hildenbrand
@ 2018-06-08  9:57     ` David Hildenbrand
  1 sibling, 0 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-08  9:57 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, Pankaj Gupta, Eduardo Habkost, Michael S . Tsirkin,
	Richard Henderson, Cornelia Huck, Alexander Graf,
	Markus Armbruster, Christian Borntraeger, qemu-s390x, qemu-ppc,
	Marcel Apfelbaum, Paolo Bonzini, Luiz Capitulino, David Gibson

On 01.06.2018 14:13, Igor Mammedov wrote:
> On Fri, 25 May 2018 14:43:39 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 17.05.2018 10:15, David Hildenbrand wrote:
>>> We can have devices that need certain other resources that are e.g.
>>> system resources managed by the machine. We need a clean way to assign
>>> these resources (without violating layers as brought up by Igor).
>>>
>>> One example is virtio-mem/virtio-pmem. Both device types need to be
>>> assigned some region in guest physical address space. This device memory
>>> belongs to the machine and is managed by it. However, virito devices are
>>> hotplugged using the hotplug handler their proxy device implements. So we
>>> could trigger e.g. a PCI hotplug handler for virtio-pci or a CSS/CCW
>>> hotplug handler for virtio-ccw. But definetly not the machine.
>>>
>>> Now, we can route other devices through the machine hotplug handler, to
>>> properly assign/unassign resources - like a portion in guest physical
>>> address space.
> 
> To sum up review:
> Some comments apply to several patches even though I commented only once.
> 
> I'd suggest to restructure and split series into several:
>   * unplug cleanups 08/14 & co
>   * generic _preplug refactoring so we won't have to go back to that question again

Looking into the details, I don't think this is possible, or should be
done, for address assignment. It can be done for the node. Here is the
reason why:

In pre_plug(), we access a device before it has been realized. This is
okay, as long as we are using "direct properties" like the "node", that
won't be touched during realize.

However, for address detection we need access to the memory region. This
is a "derived property", as it is based on the "memdev" property.

While we can easily verify the validity of "memdev" in the pc_dimm
pre_plug handler (instead of doing that in pc_dimm_realize()), we cannot
in all cases use get_memory_region() to access the memory region itself
before the device has been realized.

While this works for PC_DIMM (pc_dimm_get_memory_region() is only based
on dimm->hostmem, which we can verify as stated), we cannot do the same
thing for NVDIMM, because nvdimm_get_memory_region() relies on "realize"
already being called and setting up "nvdimm->nvdimm_mr".

In short: we should call device functions only after realize(),
therefore address assignment is not possible during pre_plug.
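
The direct vs. derived distinction can be sketched as follows. This is a
toy model with hypothetical names (ToyNvdimm and its functions), not the
nvdimm code; subtracting a label area from the backend size is only an
assumed example of a value that realize has to compute first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyNvdimm {
    int      node;          /* direct property: valid as soon as it is set */
    uint64_t backend_size;  /* stands in for the "memdev" property */
    uint64_t label_size;    /* assumed input to the derived value */
    bool     realized;
    uint64_t region_size;   /* derived: only filled in by realize */
} ToyNvdimm;

/* realize computes the derived region from the direct properties. */
static void toy_nvdimm_realize(ToyNvdimm *nv)
{
    assert(nv->backend_size > nv->label_size);
    nv->region_size = nv->backend_size - nv->label_size;
    nv->realized = true;
}

/* Fails before realize: the derived value does not exist yet. */
static bool toy_nvdimm_get_region_size(const ToyNvdimm *nv, uint64_t *size)
{
    if (!nv->realized) {
        return false;
    }
    *size = nv->region_size;
    return true;
}
```

A pre_plug handler could read nv->node safely in this model, but any
address search that needs the region size has to wait until after realize.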

-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-07 14:00             ` David Hildenbrand
@ 2018-06-08 12:24               ` Igor Mammedov
  2018-06-08 12:32                 ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-08 12:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 7 Jun 2018 16:00:54 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 07.06.2018 15:44, Igor Mammedov wrote:
> > On Mon, 4 Jun 2018 13:27:01 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 31.05.2018 16:13, Igor Mammedov wrote:  
> >>> On Wed, 30 May 2018 16:13:32 +0200
> >>> David Hildenbrand <david@redhat.com> wrote:
> >>>     
> >>>> On 30.05.2018 15:08, Igor Mammedov wrote:    
> >>>>> On Thu, 17 May 2018 10:15:17 +0200
> >>>>> David Hildenbrand <david@redhat.com> wrote:
> >>>>>       
> >>>>>> For multi stage hotplug handlers, we'll have to do some error handling
> >>>>>> in some hotplug functions, so let's use a local error variable (except
> >>>>>> for unplug requests).      
> >>>>> I'd split out introducing local error into separate patch
> >>>>> so patch would do a single thing.    
> >>
> >> I can do that if it makes review easier.
> >>  
> >>>>>       
> >>>>>> Also, add code to pass control to the final stage hotplug handler at the
> >>>>>> parent bus.      
> >>>>> But I don't agree with generic
> >>>>>  "} else if ("dev->parent_bus && dev->parent_bus->hotplug_handler) {"
> >>>>> forwarding, it's done by 3/14 for generic case and in case of
> >>>>> special device that needs bus handler called from machine one,
> >>>>> I'd suggest to do forwarding explicitly for that device only
> >>>>> like we do with acpi_dev.      
> >>>>
> >>>> I decided to do it that way because it is generic and results in nicer
> >>>> recovery handling (e.g. in case pc_dimm plug fails, we can simply
> >>>> rollback all (for now MemoryDevice) previous plug operations).    
> >>> rollback should be managed by the caller of pc_dimm plug
> >>> directly, so it's not relevant here.
> >>>     
> >>>> IMHO, the resulting code is easier to read.
> >>>>
> >>>> From this handling it is clear that
> >>>> "if we reach the hotplug handler, and it is not some special device
> >>>> plugged by the machine (CPU, PC_DIMM), pass it on to the actual hotplug
> >>>> handler if any exists"    
> >>> I strongly disagree with that it's easier to deal with.
> >>> You are basically duplicating already generalized code
> >>> from qdev_get_hotplug_handler() back into boards.
> >>>
> >>> If a device doesn't have to be handled by machine handler,
> >>> than qdev_get_hotplug_handler() must return its bus handler
> >>> if any directly. So branch in question that your are copying
> >>> is a dead one, pls drop it.    
> >>
> >> We forward selected (pc_get_hotpug_handler()) devices to the
> >> right hotplug handler. Nothing wrong about that. I don't agree
> >> with "basically duplicating already generalized code" wrong.
> >> We have to forward at some place. Your idea simply places that
> >> code at some other place.
> >>
> >>
> >> But I think we have to get the general idea sorted out first.
> >>
> >> What you have in mind (e.g. plug):
> >>
> >> if (TYPE_MEMORY_DEVICE) {
> >> 	memory_device_plug();
> >> 	if (local_err) {
> >> 		goto out;
> >> 	}
> >>
> >> 	if (TYPE_PC_DIMM) {
> >> 		pc_dimm_plug(hotplug_dev, dev, &local_err);
> >> 	} else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> 		hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> >> 	}
> >> 	if (local_err) {
> >> 		memory_device_unplug()
> >> 		goto out;
> >> 	}
> >> } else if (TYPE_CPU)
> >> ...
> >>
> >>
> >> What I have in mind (and implemented in this series):
> >>
> >>
> >> if (TYPE_MEMORY_DEVICE) {
> >> 	memory_device_plug();
> >> }
> >> /* possibly other interfaces */
> >> if (local_err) {
> >> 	error_handling();
> >> 	return;
> >> }
> >>
> >> if (TYPE_PC_DIMM) {
> >> 	pc_dimm_plug(hotplug_dev, dev, &local_err);
> >> } ...
> >> } else if (dev->parent_bus && dev->parent_bus->hotplug_handler) {
> >> 	hotplug_handler_plug(dev->parent_bus->hotplug_handler, dev, &local_err);
> >> }  
> > I don't like this implicit wiring, where reader of this part of code
> > has no idea that TYPE_MEMORY_DEVICE might be also bus device that needs
> > request forwarding.
> > I'd prefer [pre/un]plug handlers be concrete and explicit spaghetti code,
> > something like this:
> > 
> > if (TYPE_PC_DIMM) {
> >     pc_dimm_plug()
> >     /* do here additional concrete machine specific things */
> > } else if (TYPE_VIRTIO_MEM) {
> >     virtio_mem_plug() <- do forwarding in there
> >     /* and do here additional concrete machine specific things */
> > } else if (TYPE_CPU) {
> >     cpu_plug()
> >     /* do here additional concrete machine specific things */
> > }  
> 
> That will result in a lot of duplicate code - for every machine we
> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> virtio-mem and virtio-pmem could most probably share the code.
Maybe or maybe not, depending on whether pmem ends up as a memory device
or a separate controller. And even with duplication, machine code would
be easy to follow, just down one explicit call chain.
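
A compilable toy version of the explicit call chain suggested above. The
names (ToyType, ToyPlugLog, toy_*_plug) are hypothetical and the counters
only record which path was taken; nothing here is the pc machine code.

```c
#include <stdbool.h>

typedef enum {
    TOY_PC_DIMM,
    TOY_VIRTIO_MEM,
    TOY_CPU,
    TOY_OTHER
} ToyType;

typedef struct ToyPlugLog {
    int machine_calls;  /* machine-specific work done */
    int bus_calls;      /* requests forwarded to the parent bus handler */
} ToyPlugLog;

/* Stands in for the hotplug handler of the device's parent bus. */
static void toy_bus_plug(ToyPlugLog *log)
{
    log->bus_calls++;
}

static void toy_pc_dimm_plug(ToyPlugLog *log)
{
    log->machine_calls++;
}

static void toy_virtio_mem_plug(ToyPlugLog *log)
{
    log->machine_calls++;   /* e.g. assign a region of device memory */
    toy_bus_plug(log);      /* forwarding is visible in this one function */
}

static void toy_cpu_plug(ToyPlugLog *log)
{
    log->machine_calls++;
}

/* Machine plug handler: one explicit branch per device type. */
static bool toy_machine_plug(ToyType type, ToyPlugLog *log)
{
    if (type == TOY_PC_DIMM) {
        toy_pc_dimm_plug(log);
    } else if (type == TOY_VIRTIO_MEM) {
        toy_virtio_mem_plug(log);
    } else if (type == TOY_CPU) {
        toy_cpu_plug(log);
    } else {
        return false;       /* not a device the machine handles */
    }
    return true;
}
```

The trade-off is the one discussed in this thread: each machine repeats
such a chain, but a reader can follow a device type down a single branch.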

> 
> >   
> >> if (local_err) {
> >> 	if (object_dynamic_cast(OBJECT(dev), TYPE_MEMORY_DEVICE)) {
> >> 		memory_device_unplug()
> >> 	}
> >> 	/* possibly other interfaces */
> >> }
> >> ...
> >>
> >>
> >> I claim that my variant is more generic because:
> >> - it easily supports multiple interfaces (like MemoryDevice)
> >>   per Device that need a hotplug handler call
> >> - there is only one call to hotplug_handler_plug() in case we
> >>   add similar handling for another device  
> > As someone said "one more layer of indirection would solve problem".
> > But then one would have a clue how it works after a while (including
> > author of the feature).
> > I don't think we have a problem here and need generalization.
> >   
> >>
> >> Apart from that they do exactly the same thing.
> >>  
> >   
> 
> 


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-08 12:24               ` Igor Mammedov
@ 2018-06-08 12:32                 ` David Hildenbrand
  2018-06-08 12:55                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-08 12:32 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-devel, qemu-s390x, Michael S . Tsirkin, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino


>>> if (TYPE_PC_DIMM) {
>>>     pc_dimm_plug()
>>>     /* do here additional concrete machine specific things */
>>> } else if (TYPE_VIRTIO_MEM) {
>>>     virtio_mem_plug() <- do forwarding in there
>>>     /* and do here additional concrete machine specific things */
>>> } else if (TYPE_CPU) {
>>>     cpu_plug()
>>>     /* do here additional concrete machine specific things */
>>> }  
>>
>> That will result in a lot of duplicate code - for every machine we
>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
>> virtio-mem and virtio-pmem could most probably share the code.
> maybe or maybe not, depending on if pmem endups as memory device or
> separate controller. And even with duplication, machine code would
> be easy to follow just down one explicit call chain.

Not 100% convinced, but I am now going in that direction.

-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-08 12:32                 ` David Hildenbrand
@ 2018-06-08 12:55                   ` Michael S. Tsirkin
  2018-06-08 13:07                     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2018-06-08 12:55 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:
> 
> >>> if (TYPE_PC_DIMM) {
> >>>     pc_dimm_plug()
> >>>     /* do here additional concrete machine specific things */
> >>> } else if (TYPE_VIRTIO_MEM) {
> >>>     virtio_mem_plug() <- do forwarding in there
> >>>     /* and do here additional concrete machine specific things */
> >>> } else if (TYPE_CPU) {
> >>>     cpu_plug()
> >>>     /* do here additional concrete machine specific things */
> >>> }  
> >>
> >> That will result in a lot of duplicate code - for every machine we
> >> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> >> virtio-mem and virtio-pmem could most probably share the code.
> > maybe or maybe not, depending on if pmem endups as memory device or
> > separate controller. And even with duplication, machine code would
> > be easy to follow just down one explicit call chain.
> 
> Not 100% convinced but I am now going into that direction.

Can this go into DeviceClass? Failover has the same need to
allocate/free resources for vfio without a full realize/unrealize.

> -- 
> 
> Thanks,
> 
> David / dhildenb


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-08 12:55                   ` Michael S. Tsirkin
@ 2018-06-08 13:07                     ` David Hildenbrand
  2018-06-08 15:12                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-08 13:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 08.06.2018 14:55, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:
>>
>>>>> if (TYPE_PC_DIMM) {
>>>>>     pc_dimm_plug()
>>>>>     /* do here additional concrete machine specific things */
>>>>> } else if (TYPE_VIRTIO_MEM) {
>>>>>     virtio_mem_plug() <- do forwarding in there
>>>>>     /* and do here additional concrete machine specific things */
>>>>> } else if (TYPE_CPU) {
>>>>>     cpu_plug()
>>>>>     /* do here additional concrete machine specific things */
>>>>> }  
>>>>
>>>> That will result in a lot of duplicate code - for every machine we
>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
>>>> virtio-mem and virtio-pmem could most probably share the code.
>>> maybe or maybe not, depending on if pmem endups as memory device or
>>> separate controller. And even with duplication, machine code would
>>> be easy to follow just down one explicit call chain.
>>
>> Not 100% convinced but I am now going into that direction.
> 
> Can this go into DeviceClass? Failover has the same need to
> allocate/free resources for vfio without a full realize/unrealize.
> 

Conceptually, I would have called something like this here:

virtio_mem_plug() ...

Which would end up calling memory_device_plug() and triggering the
target hotplug handler.

I assume this can also be done from a device class callback.

So we would need a total of 3 callbacks for

a) pre_plug
b) plug
c) unplug

In addition, unplug requests might be necessary, so

d) unplug_request
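
Roughly, the four callbacks could be grouped like this. It is only a
sketch under assumptions: ToyHotplugOps, ToyDev and the toy_mem_*
callbacks are hypothetical names, and free_slots merely stands in for a
machine-managed resource. A missing callback is treated as success.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyDev ToyDev;
typedef bool (*toy_hotplug_fn)(ToyDev *dev);

/* The four stages discussed above, as a per-device-class table. */
typedef struct ToyHotplugOps {
    toy_hotplug_fn pre_plug;
    toy_hotplug_fn plug;
    toy_hotplug_fn unplug_request;
    toy_hotplug_fn unplug;
} ToyHotplugOps;

struct ToyDev {
    const ToyHotplugOps *ops;
    int  free_slots;   /* stands in for a machine-managed resource */
    bool plugged;
};

/* Call a stage if the class provides it; an absent stage succeeds. */
static bool toy_call(toy_hotplug_fn fn, ToyDev *dev)
{
    return fn ? fn(dev) : true;
}

static bool toy_device_add(ToyDev *dev)
{
    if (!toy_call(dev->ops->pre_plug, dev)) {
        return false;
    }
    /* realize would happen here */
    if (!toy_call(dev->ops->plug, dev)) {
        return false;
    }
    dev->plugged = true;
    return true;
}

static bool toy_device_del(ToyDev *dev)
{
    if (!toy_call(dev->ops->unplug_request, dev)) {
        return false;
    }
    (void)toy_call(dev->ops->unplug, dev);
    dev->plugged = false;
    return true;
}

/* Example class: checks in pre_plug, allocates in plug, frees in unplug. */
static bool toy_mem_pre_plug(ToyDev *dev) { return dev->free_slots > 0; }
static bool toy_mem_plug(ToyDev *dev)     { dev->free_slots--; return true; }
static bool toy_mem_unplug(ToyDev *dev)   { dev->free_slots++; return true; }

static const ToyHotplugOps toy_mem_ops = {
    .pre_plug       = toy_mem_pre_plug,
    .plug           = toy_mem_plug,
    .unplug_request = NULL,
    .unplug         = toy_mem_unplug,
};
```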

>> -- 
>>
>> Thanks,
>>
>> David / dhildenb


-- 

Thanks,

David / dhildenb


* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-08 13:07                     ` David Hildenbrand
@ 2018-06-08 15:12                       ` Michael S. Tsirkin
  2018-06-13 10:58                         ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2018-06-08 15:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:
> On 08.06.2018 14:55, Michael S. Tsirkin wrote:
> > On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:
> >>
> >>>>> if (TYPE_PC_DIMM) {
> >>>>>     pc_dimm_plug()
> >>>>>     /* do here additional concrete machine specific things */
> >>>>> } else if (TYPE_VIRTIO_MEM) {
> >>>>>     virtio_mem_plug() <- do forwarding in there
> >>>>>     /* and do here additional concrete machine specific things */
> >>>>> } else if (TYPE_CPU) {
> >>>>>     cpu_plug()
> >>>>>     /* do here additional concrete machine specific things */
> >>>>> }  
> >>>>
> >>>> That will result in a lot of duplicate code - for every machine we
> >>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> >>>> virtio-mem and virtio-pmem could most probably share the code.
> >>> maybe or maybe not, depending on if pmem endups as memory device or
> >>> separate controller. And even with duplication, machine code would
> >>> be easy to follow just down one explicit call chain.
> >>
> >> Not 100% convinced but I am now going into that direction.
> > 
> > Can this go into DeviceClass? Failover has the same need to
> > allocate/free resources for vfio without a full realize/unrealize.
> > 
> 
> Conceptually, I would have called here something like
> 
> virtio_mem_plug() ...
> 
> Which would end up calling memory_device_plug() and triggering the
> target hotplug handler.
> 
> I assume this can also be done from a device class callback.
> 
> So we would need a total of 3 callbacks for
> 
> a) pre_plug
> b) plug
> c) unplug
> 
> In addition, unplug requests might be necessary, so
> 
> d) unplug_request


Right - basically HotplugHandlerClass.
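
For illustration, the four callbacks listed above can be modelled as
one table of function pointers. This is only a minimal standalone
sketch: Device, HotplugOps and the demo_* functions are hypothetical
names made up for this example and are not QEMU's actual
HotplugHandlerClass definition.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's real types. */
typedef struct Device {
    const char *id;
    int plugged;
} Device;

typedef struct HotplugOps {
    int (*pre_plug)(Device *dev);       /* validate only, no side effects */
    int (*plug)(Device *dev);           /* assign resources, make usable */
    int (*unplug_request)(Device *dev); /* optional: ask the guest first */
    int (*unplug)(Device *dev);         /* release resources */
} HotplugOps;

static int demo_pre_plug(Device *dev) { return dev->id ? 0 : -1; }
static int demo_plug(Device *dev) { dev->plugged = 1; return 0; }
static int demo_unplug(Device *dev) { dev->plugged = 0; return 0; }

static const HotplugOps demo_ops = {
    .pre_plug = demo_pre_plug,
    .plug = demo_plug,
    .unplug_request = NULL, /* this demo handler unplugs synchronously */
    .unplug = demo_unplug,
};

/* Run pre_plug, then plug. Returns 0 on success, -1 on failure. */
int hotplug_device(const HotplugOps *ops, Device *dev)
{
    if (ops->pre_plug && ops->pre_plug(dev) != 0) {
        return -1;
    }
    return ops->plug ? ops->plug(dev) : 0;
}

/* Prefer an unplug request if the handler has one, else unplug directly. */
int hot_unplug_device(const HotplugOps *ops, Device *dev)
{
    if (ops->unplug_request) {
        return ops->unplug_request(dev);
    }
    return ops->unplug ? ops->unplug(dev) : -1;
}
```

The point of keeping pre_plug separate is that it can reject a device
before anything has been modified, so a failure there needs no cleanup.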


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-08 15:12                       ` Michael S. Tsirkin
@ 2018-06-13 10:58                         ` David Hildenbrand
  2018-06-13 15:48                           ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-13 10:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 08.06.2018 17:12, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:
>> On 08.06.2018 14:55, Michael S. Tsirkin wrote:
>>> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:
>>>>
>>>>>>> if (TYPE_PC_DIMM) {
>>>>>>>     pc_dimm_plug()
>>>>>>>     /* do here additional concrete machine specific things */
>>>>>>> } else if (TYPE_VIRTIO_MEM) {
>>>>>>>     virtio_mem_plug() <- do forwarding in there
>>>>>>>     /* and do here additional concrete machine specific things */
>>>>>>> } else if (TYPE_CPU) {
>>>>>>>     cpu_plug()
>>>>>>>     /* do here additional concrete machine specific things */
>>>>>>> }  
>>>>>>
>>>>>> That will result in a lot of duplicate code - for every machine we
>>>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
>>>>>> virtio-mem and virtio-pmem could most probably share the code.
>>>>> maybe or maybe not, depending on if pmem endups as memory device or
>>>>> separate controller. And even with duplication, machine code would
>>>>> be easy to follow just down one explicit call chain.
>>>>
>>>> Not 100% convinced but I am now going into that direction.
>>>
>>> Can this go into DeviceClass? Failover has the same need to
>>> allocate/free resources for vfio without a full realize/unrealize.
>>>
>>
>> Conceptually, I would have called here something like
>>
>> virtio_mem_plug() ...
>>
>> Which would end up calling memory_device_plug() and triggering the
>> target hotplug handler.
>>
>> I assume this can also be done from a device class callback.
>>
>> So we would need a total of 3 callbacks for
>>
>> a) pre_plug
>> b) plug
>> c) unplug
>>
>> In addition, unplug requests might be necessary, so
>>
>> d) unplug_request
> 
> 
> Right - basically HotplugHandlerClass.

Looking into this idea:

What I would have right now (conceptually)

if (TYPE_PC_DIMM) {
    pc_dimm_plug(machine);
} else if (TYPE_CPU) {
    cpu_plug(machine);
} else if (TYPE_VIRTIO_MEM) {
    virtio_mem_plug(machine);
}

Instead you want something like:

if (TYPE_PC_DIMM) {
    pc_dimm_plug(machine);
} else if (TYPE_CPU) {
    cpu_plug(machine);
// igor requested an explicit list here, we could also check for
// DEVICE_CLASS(device)->plug and make it generic
} else if (TYPE_VIRTIO_MEM) {
    DEVICE_CLASS(device)->plug(machine);
    // call bus hotplug handler if necessary, or let the previous call
    // handle it?
}

We cannot pass the machine directly (due to board.h and user-only);
instead we would have to pass it as a hotplug handler. The device class
code would then, however, have to assume that a machine is always passed.
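
To make that concern concrete, here is a minimal standalone sketch in
which the machine hands itself to a device class callback as a generic
handler, and the callback has to check that it really got a machine.
All names (Handler, Machine, handler_to_machine, the type strings, the
0x1000 step) are hypothetical and only model the idea; this is not the
QOM cast machinery.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic handler; a machine embeds it as first member. */
typedef struct Handler {
    const char *type;
} Handler;

typedef struct Machine {
    Handler parent;
    unsigned long next_addr; /* next free guest physical address */
} Machine;

typedef struct Device Device;

typedef struct DeviceClass {
    int (*plug)(Device *dev, Handler *h);
} DeviceClass;

struct Device {
    const DeviceClass *klass;
    const char *type;
    unsigned long addr;
};

/* Checked downcast: NULL if the handler is not a machine. */
static Machine *handler_to_machine(Handler *h)
{
    if (h == NULL || strcmp(h->type, "machine") != 0) {
        return NULL;
    }
    return (Machine *)h; /* valid: Handler is Machine's first member */
}

/* Device class callback that silently depends on getting a machine. */
static int virtio_mem_class_plug(Device *dev, Handler *h)
{
    Machine *m = handler_to_machine(h);

    if (m == NULL) {
        return -1; /* the "always a machine" assumption was violated */
    }
    dev->addr = m->next_addr;
    m->next_addr += 0x1000;
    return 0;
}

static const DeviceClass virtio_mem_class = {
    .plug = virtio_mem_class_plug,
};

/* Machine-side dispatch with an explicit type check, as in the sketch. */
int machine_plug(Machine *m, Device *dev)
{
    if (strcmp(dev->type, "virtio-mem") == 0 &&
        dev->klass && dev->klass->plug) {
        return dev->klass->plug(dev, &m->parent);
    }
    return 0; /* other device types elided */
}
```

With a checked cast the wrong caller at least fails cleanly, but the
callback's dependency on the machine stays implicit in its signature.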

Any opinions?





-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 10:58                         ` David Hildenbrand
@ 2018-06-13 15:48                           ` Igor Mammedov
  2018-06-13 15:51                             ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Igor Mammedov @ 2018-06-13 15:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S. Tsirkin, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, 13 Jun 2018 12:58:46 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 08.06.2018 17:12, Michael S. Tsirkin wrote:
> > On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:  
> >> On 08.06.2018 14:55, Michael S. Tsirkin wrote:  
> >>> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:  
> >>>>  
> >>>>>>> if (TYPE_PC_DIMM) {
> >>>>>>>     pc_dimm_plug()
> >>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>> } else if (TYPE_VIRTIO_MEM) {
> >>>>>>>     virtio_mem_plug() <- do forwarding in there
> >>>>>>>     /* and do here additional concrete machine specific things */
> >>>>>>> } else if (TYPE_CPU) {
> >>>>>>>     cpu_plug()
> >>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>> }    
> >>>>>>
> >>>>>> That will result in a lot of duplicate code - for every machine we
> >>>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> >>>>>> virtio-mem and virtio-pmem could most probably share the code.  
> >>>>> maybe or maybe not, depending on if pmem endups as memory device or
> >>>>> separate controller. And even with duplication, machine code would
> >>>>> be easy to follow just down one explicit call chain.  
> >>>>
> >>>> Not 100% convinced but I am now going into that direction.  
> >>>
> >>> Can this go into DeviceClass? Failover has the same need to
> >>> allocate/free resources for vfio without a full realize/unrealize.
> >>>  
> >>
> >> Conceptually, I would have called here something like
> >>
> >> virtio_mem_plug() ...
> >>
> >> Which would end up calling memory_device_plug() and triggering the
> >> target hotplug handler.
> >>
> >> I assume this can also be done from a device class callback.
> >>
> >> So we would need a total of 3 callbacks for
> >>
> >> a) pre_plug
> >> b) plug
> >> c) unplug
> >>
> >> In addition, unplug requests might be necessary, so
> >>
> >> d) unplug_request  
> > 
> > 
> > Right - basically HotplugHandlerClass.  
> 
> Looking into this idea:
> 
> What I would have right now (conceptually)
> 
> if (TYPE_PC_DIMM) {
>     pc_dimm_plug(machine);
> } else if (TYPE_CPU) {
>     cpu_plug(machine);
> } else if (TYPE_VIRTIO_MEM) {
>     virtio_mem_plug(machine);
> }
> 
> Instead you want something like:
> 
> if (TYPE_PC_DIMM) {
>     pc_dimm_plug(machine);
> } else if (TYPE_CPU) {
>     cpu_plug(machine);
> // igor requested an explicit list here, we could also check for
> // DEVICE_CLASS(device)->plug and make it generic
> } else if (TYPE_VIRTIO_MEM) {
>     DEVICE_CLASS(device)->plug(machine);
>     // call bus hotplug handler if necessary, or let the previous call
>     // handle it?
Not exactly this; I suggested the following:

      [ ... specific to machine_foo wiring ...]

      virtio_mem_plug() {
         [... some machine specific wiring ...]

         bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
         bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)

         [... some more machine specific wiring ...]
      }

      [ ... specific to machine_foo wiring ...]

i.e. the device itself doesn't participate in attaching to external
entities; those entities (the machine, or the bus controller the virtio
device is attached to) do the wiring on their own, within their own domain.
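
A minimal standalone sketch of that ordering, assuming a machine-level
plug function that forwards to the bus handler in the middle of its own
wiring. The types, the step constants and get_bus_hotplug_handler() are
hypothetical stand-ins for this example, not QEMU code.

```c
#include <stddef.h>

typedef struct Device Device;

/* Hypothetical bus-level hotplug handler (e.g. a PCI bridge). */
typedef struct BusHandler {
    int (*plug)(struct BusHandler *h, Device *dev);
} BusHandler;

struct Device {
    BusHandler *bus_handler; /* handler of the bus the device sits on */
    int trace[3];            /* order in which the steps ran */
    int ntrace;
};

enum { STEP_MACHINE_PRE = 1, STEP_BUS_PLUG = 2, STEP_MACHINE_POST = 3 };

static void record(Device *dev, int step)
{
    if (dev->ntrace < 3) {
        dev->trace[dev->ntrace++] = step;
    }
}

/* Bus-side wiring: stays entirely within the bus controller's domain. */
static int pci_bus_plug(BusHandler *h, Device *dev)
{
    (void)h;
    record(dev, STEP_BUS_PLUG);
    return 0;
}

static BusHandler pci_bus_handler = { pci_bus_plug };

/* Hypothetical lookup of the bus handler for a device. */
static BusHandler *get_bus_hotplug_handler(Device *dev)
{
    return dev->bus_handler;
}

/* Machine-side plug: own wiring, forward to the bus, own wiring again. */
int machine_virtio_mem_plug(Device *dev)
{
    BusHandler *bus = get_bus_hotplug_handler(dev);

    record(dev, STEP_MACHINE_PRE);      /* machine specific wiring */
    if (bus && bus->plug(bus, dev) != 0) {
        return -1;
    }
    record(dev, STEP_MACHINE_POST);     /* more machine specific wiring */
    return 0;
}
```

Note that the device never calls into the machine or the bus here; both
callers act on the device from the outside.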

> }
> 
> We cannot pass the machine directly (due to board.h and user-only),
> instead we would have to pass it as hotplug handler. Then, the device
> class code would however make assumptions that always a machine is passed.
> 
> Any opinions?
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 15:48                           ` Igor Mammedov
@ 2018-06-13 15:51                             ` David Hildenbrand
  2018-06-13 18:32                               ` Michael S. Tsirkin
  2018-06-14  9:20                               ` Igor Mammedov
  0 siblings, 2 replies; 76+ messages in thread
From: David Hildenbrand @ 2018-06-13 15:51 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Michael S. Tsirkin, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 13.06.2018 17:48, Igor Mammedov wrote:
> On Wed, 13 Jun 2018 12:58:46 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 08.06.2018 17:12, Michael S. Tsirkin wrote:
>>> On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:  
>>>> On 08.06.2018 14:55, Michael S. Tsirkin wrote:  
>>>>> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:  
>>>>>>  
>>>>>>>>> if (TYPE_PC_DIMM) {
>>>>>>>>>     pc_dimm_plug()
>>>>>>>>>     /* do here additional concrete machine specific things */
>>>>>>>>> } else if (TYPE_VIRTIO_MEM) {
>>>>>>>>>     virtio_mem_plug() <- do forwarding in there
>>>>>>>>>     /* and do here additional concrete machine specific things */
>>>>>>>>> } else if (TYPE_CPU) {
>>>>>>>>>     cpu_plug()
>>>>>>>>>     /* do here additional concrete machine specific things */
>>>>>>>>> }    
>>>>>>>>
>>>>>>>> That will result in a lot of duplicate code - for every machine we
>>>>>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
>>>>>>>> virtio-mem and virtio-pmem could most probably share the code.  
>>>>>>> maybe or maybe not, depending on if pmem endups as memory device or
>>>>>>> separate controller. And even with duplication, machine code would
>>>>>>> be easy to follow just down one explicit call chain.  
>>>>>>
>>>>>> Not 100% convinced but I am now going into that direction.  
>>>>>
>>>>> Can this go into DeviceClass? Failover has the same need to
>>>>> allocate/free resources for vfio without a full realize/unrealize.
>>>>>  
>>>>
>>>> Conceptually, I would have called here something like
>>>>
>>>> virtio_mem_plug() ...
>>>>
>>>> Which would end up calling memory_device_plug() and triggering the
>>>> target hotplug handler.
>>>>
>>>> I assume this can also be done from a device class callback.
>>>>
>>>> So we would need a total of 3 callbacks for
>>>>
>>>> a) pre_plug
>>>> b) plug
>>>> c) unplug
>>>>
>>>> In addition, unplug requests might be necessary, so
>>>>
>>>> d) unplug_request  
>>>
>>>
>>> Right - basically HotplugHandlerClass.  
>>
>> Looking into this idea:
>>
>> What I would have right now (conceptually)
>>
>> if (TYPE_PC_DIMM) {
>>     pc_dimm_plug(machine);
>> } else if (TYPE_CPU) {
>>     cpu_plug(machine);
>> } else if (TYPE_VIRTIO_MEM) {
>>     virtio_mem_plug(machine);
>> }
>>
>> Instead you want something like:
>>
>> if (TYPE_PC_DIMM) {
>>     pc_dimm_plug(machine);
>> } else if (TYPE_CPU) {
>>     cpu_plug(machine);
>> // igor requested an explicit list here, we could also check for
>> // DEVICE_CLASS(device)->plug and make it generic
>> } else if (TYPE_VIRTIO_MEM) {
>>     DEVICE_CLASS(device)->plug(machine);
>>     // call bus hotplug handler if necessary, or let the previous call
>>     // handle it?
> not exactly this, I suggested following:
> 
>       [ ... specific to machine_foo wiring ...]
> 
>       virtio_mem_plug() {
>          [... some machine specific wiring ...]
> 
>          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
>          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
> 
>          [... some more machine specific wiring ...]
>       }
> 
>       [ ... specific to machine_foo wiring ...]
> 
> i.e. device itself doesn't participate in attaching to external entities,
> those entities (machine or bus controller virtio device is attached to)
> do wiring on their own within their own domain.

I am fine with this, but Michael asked if this ("virtio_mem_plug()")
could go via new DeviceClass functions. Michael, would that also work
for your use case?


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 15:51                             ` David Hildenbrand
@ 2018-06-13 18:32                               ` Michael S. Tsirkin
  2018-06-13 19:37                                 ` David Hildenbrand
  2018-06-14  9:20                               ` Igor Mammedov
  1 sibling, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2018-06-13 18:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, Jun 13, 2018 at 05:51:01PM +0200, David Hildenbrand wrote:
> On 13.06.2018 17:48, Igor Mammedov wrote:
> > On Wed, 13 Jun 2018 12:58:46 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> > 
> >> On 08.06.2018 17:12, Michael S. Tsirkin wrote:
> >>> On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:  
> >>>> On 08.06.2018 14:55, Michael S. Tsirkin wrote:  
> >>>>> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:  
> >>>>>>  
> >>>>>>>>> if (TYPE_PC_DIMM) {
> >>>>>>>>>     pc_dimm_plug()
> >>>>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>>>> } else if (TYPE_VIRTIO_MEM) {
> >>>>>>>>>     virtio_mem_plug() <- do forwarding in there
> >>>>>>>>>     /* and do here additional concrete machine specific things */
> >>>>>>>>> } else if (TYPE_CPU) {
> >>>>>>>>>     cpu_plug()
> >>>>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>>>> }    
> >>>>>>>>
> >>>>>>>> That will result in a lot of duplicate code - for every machine we
> >>>>>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> >>>>>>>> virtio-mem and virtio-pmem could most probably share the code.  
> >>>>>>> maybe or maybe not, depending on if pmem endups as memory device or
> >>>>>>> separate controller. And even with duplication, machine code would
> >>>>>>> be easy to follow just down one explicit call chain.  
> >>>>>>
> >>>>>> Not 100% convinced but I am now going into that direction.  
> >>>>>
> >>>>> Can this go into DeviceClass? Failover has the same need to
> >>>>> allocate/free resources for vfio without a full realize/unrealize.
> >>>>>  
> >>>>
> >>>> Conceptually, I would have called here something like
> >>>>
> >>>> virtio_mem_plug() ...
> >>>>
> >>>> Which would end up calling memory_device_plug() and triggering the
> >>>> target hotplug handler.
> >>>>
> >>>> I assume this can also be done from a device class callback.
> >>>>
> >>>> So we would need a total of 3 callbacks for
> >>>>
> >>>> a) pre_plug
> >>>> b) plug
> >>>> c) unplug
> >>>>
> >>>> In addition, unplug requests might be necessary, so
> >>>>
> >>>> d) unplug_request  
> >>>
> >>>
> >>> Right - basically HotplugHandlerClass.  
> >>
> >> Looking into this idea:
> >>
> >> What I would have right now (conceptually)
> >>
> >> if (TYPE_PC_DIMM) {
> >>     pc_dimm_plug(machine);
> >> } else if (TYPE_CPU) {
> >>     cpu_plug(machine);
> >> } else if (TYPE_VIRTIO_MEM) {
> >>     virtio_mem_plug(machine);
> >> }
> >>
> >> Instead you want something like:
> >>
> >> if (TYPE_PC_DIMM) {
> >>     pc_dimm_plug(machine);
> >> } else if (TYPE_CPU) {
> >>     cpu_plug(machine);
> >> // igor requested an explicit list here, we could also check for
> >> // DEVICE_CLASS(device)->plug and make it generic
> >> } else if (TYPE_VIRTIO_MEM) {
> >>     DEVICE_CLASS(device)->plug(machine);
> >>     // call bus hotplug handler if necessary, or let the previous call
> >>     // handle it?
> > not exactly this, I suggested following:
> > 
> >       [ ... specific to machine_foo wiring ...]
> > 
> >       virtio_mem_plug() {
> >          [... some machine specific wiring ...]
> > 
> >          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
> >          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
> > 
> >          [... some more machine specific wiring ...]
> >       }
> > 
> >       [ ... specific to machine_foo wiring ...]
> > 
> > i.e. device itself doesn't participate in attaching to external entities,
> > those entities (machine or bus controller virtio device is attached to)
> > do wiring on their own within their own domain.
> 
> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
> could go via new DeviceClass functions. Michael, would that also work
> for your use case?

It's not virtio specifically; I'm interested in how this will work for
PCI generally.  Right now we call do_pci_register_device(), which
immediately makes the device guest visible.


> 
> -- 
> 
> Thanks,
> 
> David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 18:32                               ` Michael S. Tsirkin
@ 2018-06-13 19:37                                 ` David Hildenbrand
  2018-06-13 22:05                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-13 19:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

>>>       [ ... specific to machine_foo wiring ...]
>>>
>>>       virtio_mem_plug() {
>>>          [... some machine specific wiring ...]
>>>
>>>          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
>>>          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
>>>
>>>          [... some more machine specific wiring ...]
>>>       }
>>>
>>>       [ ... specific to machine_foo wiring ...]
>>>
>>> i.e. device itself doesn't participate in attaching to external entities,
>>> those entities (machine or bus controller virtio device is attached to)
>>> do wiring on their own within their own domain.
>>
>> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
>> could go via new DeviceClass functions. Michael, would that also work
>> for your use case?
> 
> It's not virtio specifically, I'm interested in how this will work for
> PCI generally.  Right now we call do_pci_register_device which
> immediately makes it guest visible.

So you're telling me that a PCI device exposes itself to the system in
pci_qdev_realize() instead of letting a hotplug handler handle that? My
assumption is that the PCI bridge hotplug handler handles this. What am
I missing?

I can see that e.g. for a virtio device the realize call chain is

pci_qdev_realize() -> virtio_pci_realize() -> virtio_XXX_pci_realize() ->
virtio_XXX_realize()

If any realization in pci_qdev_realize() fails, we do a
do_pci_unregister_device().

So if what you're saying is true, then we're already exposing partially
realized (and possibly unrealized again) devices via PCI. I *guess* this
is okay because we're holding the iothread mutex, so such devices are
not actually visible. And we only seem to be sending events in the PCI
bridge hotplug handlers, so my assumption is that this is fine.
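
The register/realize/unregister pattern described above can be modelled
roughly as follows. This is a simplified standalone sketch, not the
actual pci_qdev_realize() code; PciBus, PciDev, the fail_realize flag
and the helper names are assumptions made for the example.

```c
#include <stddef.h>

#define NUM_DEVFN 256

typedef struct PciDev {
    int devfn;
    int fail_realize; /* simulate a failing device specific realize */
} PciDev;

typedef struct PciBus {
    PciDev *devices[NUM_DEVFN];
} PciBus;

/* Put the device into the bus table; fails if the slot is taken. */
static int register_device(PciBus *bus, PciDev *dev)
{
    if (dev->devfn < 0 || dev->devfn >= NUM_DEVFN ||
        bus->devices[dev->devfn] != NULL) {
        return -1;
    }
    bus->devices[dev->devfn] = dev;
    return 0;
}

static void unregister_device(PciBus *bus, PciDev *dev)
{
    bus->devices[dev->devfn] = NULL;
}

static int device_specific_realize(PciDev *dev)
{
    return dev->fail_realize ? -1 : 0;
}

/* Register first, realize second, roll back on failure. */
int pci_realize(PciBus *bus, PciDev *dev)
{
    if (register_device(bus, dev) != 0) {
        return -1;
    }
    /* From here on the device is in the bus table, but not realized. */
    if (device_specific_realize(dev) != 0) {
        unregister_device(bus, dev);
        return -1;
    }
    return 0;
}
```

In this model the partially realized device is only harmless as long as
nothing can look at the bus table between the two steps.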

> 
> 
>>
>> -- 
>>
>> Thanks,
>>
>> David / dhildenb


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 19:37                                 ` David Hildenbrand
@ 2018-06-13 22:05                                   ` Michael S. Tsirkin
  2018-06-14  6:14                                     ` David Hildenbrand
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2018-06-13 22:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, Jun 13, 2018 at 09:37:54PM +0200, David Hildenbrand wrote:
> >>>       [ ... specific to machine_foo wiring ...]
> >>>
> >>>       virtio_mem_plug() {
> >>>          [... some machine specific wiring ...]
> >>>
> >>>          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
> >>>          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
> >>>
> >>>          [... some more machine specific wiring ...]
> >>>       }
> >>>
> >>>       [ ... specific to machine_foo wiring ...]
> >>>
> >>> i.e. device itself doesn't participate in attaching to external entities,
> >>> those entities (machine or bus controller virtio device is attached to)
> >>> do wiring on their own within their own domain.
> >>
> >> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
> >> could go via new DeviceClass functions. Michael, would that also work
> >> for your use case?
> > 
> > It's not virtio specifically, I'm interested in how this will work for
> > PCI generally.  Right now we call do_pci_register_device which
> > immediately makes it guest visible.
> 
> So you're telling me that a PCI device exposes itself to the system in
> pci_qdev_realize() instead of letting a hotplug handler handle that? My
> assumption is that the PCI bridge hotplug handler handles this.

Well, given how things work in QEMU, that's not exactly
the case. See below.

> What am
> I missing?
> 
> I can see that e.g. for a virtio device the realize call chain is
> 
> pci_qdev_realize() -> virtio_pci_realize() -> virtio_XXX__pci_realize ->
> virtio_XXX_realize()
> 
> If any realization in pci_qdev_realize() fails, we do a
> do_pci_unregister_device().
> 
> So if it is true what you're saying than we're already exposing
> partially realized (and possibly unrealized again) devices via PCI. I
> *guess* because we're holding the iothread mutex this is okay and
> actually not visible.

For now, yes, but we need the ability to have separate new commands for
realize and plug, so we will drop the mutex.

> And we only seem to be sending events in the PCI
> bridge hotplug handlers, so my assumption is that this is fine.

For core PCI, it's mostly just this line:

    bus->devices[devfn] = pci_dev;

which makes it accessible to pci config cycles.

But failover also cares about vfio, which seems to set up
e.g. irqfds on realize.
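
As a rough standalone model of why that one assignment matters: a
config space read looks the device up by devfn, and an empty slot reads
back as all ones. The types, config_read_id() and the ID value used in
the usage note are illustrative assumptions, not QEMU's config access
path.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_DEVFN 256

typedef struct PciDev {
    uint32_t vendor_device; /* config dword 0: device ID << 16 | vendor ID */
} PciDev;

typedef struct PciBus {
    PciDev *devices[NUM_DEVFN];
} PciBus;

/* Model of a guest config read of the ID register for one devfn. */
uint32_t config_read_id(const PciBus *bus, unsigned devfn)
{
    const PciDev *dev = devfn < NUM_DEVFN ? bus->devices[devfn] : NULL;

    /* No device behind this devfn: the read returns all ones. */
    return dev ? dev->vendor_device : UINT32_MAX;
}
```

Before the slot is filled, config_read_id(&bus, 8) yields UINT32_MAX;
right after bus.devices[8] = &dev it yields the device's ID dword (for
example an illustrative 0x10001af4), with no plug step in between.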





^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 22:05                                   ` Michael S. Tsirkin
@ 2018-06-14  6:14                                     ` David Hildenbrand
  2018-06-14  9:16                                       ` Igor Mammedov
  0 siblings, 1 reply; 76+ messages in thread
From: David Hildenbrand @ 2018-06-14  6:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On 14.06.2018 00:05, Michael S. Tsirkin wrote:
> On Wed, Jun 13, 2018 at 09:37:54PM +0200, David Hildenbrand wrote:
>>>>>       [ ... specific to machine_foo wiring ...]
>>>>>
>>>>>       virtio_mem_plug() {
>>>>>          [... some machine specific wiring ...]
>>>>>
>>>>>          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
>>>>>          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
>>>>>
>>>>>          [... some more machine specific wiring ...]
>>>>>       }
>>>>>
>>>>>       [ ... specific to machine_foo wiring ...]
>>>>>
>>>>> i.e. device itself doesn't participate in attaching to external entities,
>>>>> those entities (machine or bus controller virtio device is attached to)
>>>>> do wiring on their own within their own domain.
>>>>
>>>> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
>>>> could go via new DeviceClass functions. Michael, would that also work
>>>> for your use case?
>>>
>>> It's not virtio specifically, I'm interested in how this will work for
>>> PCI generally.  Right now we call do_pci_register_device which
>>> immediately makes it guest visible.
>>
>> So you're telling me that a PCI device exposes itself to the system in
>> pci_qdev_realize() instead of letting a hotplug handler handle that? My
>> assumption is that the PCI bridge hotplug handler handles this.
> 
> Well given how things work in qemu that's not exactly
> the case. See below.
> 
>> What am
>> I missing?
>>
>> I can see that e.g. for a virtio device the realize call chain is
>>
>> pci_qdev_realize() -> virtio_pci_realize() -> virtio_XXX__pci_realize ->
>> virtio_XXX_realize()
>>
>> If any realization in pci_qdev_realize() fails, we do a
>> do_pci_unregister_device().
>>
>> So if it is true what you're saying than we're already exposing
>> partially realized (and possibly unrealized again) devices via PCI. I
>> *guess* because we're holding the iothread mutex this is okay and
>> actually not visible.
> 
> For now but we need ability to have separate new commands for
> realize and plug, so we will drop the mutex.

If you want to actually drop the mutex, I assume that quite some rework
will be necessary (not only for this specific PCI hotplug handler case),
most probably in other parts of the code, too - all of the
hotplug/realize code seems to rely on the mutex being locked and not
being dropped temporarily.

> 
>> And we only seem to be sending events in the PCI
>> bridge hotplug handlers, so my assumption is that this is fine.
> 
> For core PCI, it's mostly just this line:
> 
>     bus->devices[devfn] = pci_dev;

This should go into the hotplug handler, if I am not wrong. From what I
learned from Igor, this is a layer violation. Resource assignment should
happen during pre_plug / plug. But then you might need a different way
to "reserve" a function (if there could then be races with the lock
temporarily dropped).
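
One conceivable shape for such a reservation, purely as a standalone
sketch: pre_plug marks the function as reserved without publishing the
device, and plug publishes it later. The reserved[] array and the
bus_pre_plug()/bus_plug() names are hypothetical; nothing like this is
claimed to exist in QEMU, and real code would also need locking around
both steps.

```c
#include <stddef.h>

#define NUM_DEVFN 256

typedef struct PciDev {
    unsigned devfn;
} PciDev;

typedef struct PciBus {
    PciDev *devices[NUM_DEVFN];        /* what config cycles can see */
    unsigned char reserved[NUM_DEVFN]; /* claimed, not yet visible */
} PciBus;

/* pre_plug: claim the function, but do not make the device visible. */
int bus_pre_plug(PciBus *bus, const PciDev *dev)
{
    if (dev->devfn >= NUM_DEVFN) {
        return -1;
    }
    if (bus->devices[dev->devfn] != NULL || bus->reserved[dev->devfn]) {
        return -1; /* already occupied or claimed by someone else */
    }
    bus->reserved[dev->devfn] = 1;
    return 0;
}

/* plug: turn the reservation into a visible device. */
int bus_plug(PciBus *bus, PciDev *dev)
{
    if (dev->devfn >= NUM_DEVFN || !bus->reserved[dev->devfn]) {
        return -1; /* plug without a prior successful pre_plug */
    }
    bus->reserved[dev->devfn] = 0;
    bus->devices[dev->devfn] = dev;
    return 0;
}
```

A second device asking for the same function is rejected in pre_plug
even though the first one is not yet visible to the guest.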

> 
> which makes it accessible to pci config cycles.
> 
> But failover also cares about vfio, which seems to set up
> e.g. irqfs on realize.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-14  6:14                                     ` David Hildenbrand
@ 2018-06-14  9:16                                       ` Igor Mammedov
  0 siblings, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-14  9:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S. Tsirkin, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Thu, 14 Jun 2018 08:14:05 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 14.06.2018 00:05, Michael S. Tsirkin wrote:
> > On Wed, Jun 13, 2018 at 09:37:54PM +0200, David Hildenbrand wrote:  
> >>>>>       [ ... specific to machine_foo wiring ...]
> >>>>>
> >>>>>       virtio_mem_plug() {
> >>>>>          [... some machine specific wiring ...]
> >>>>>
> >>>>>          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
> >>>>>          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
> >>>>>
> >>>>>          [... some more machine specific wiring ...]
> >>>>>       }
> >>>>>
> >>>>>       [ ... specific to machine_foo wiring ...]
> >>>>>
> >>>>> i.e. device itself doesn't participate in attaching to external entities,
> >>>>> those entities (machine or bus controller virtio device is attached to)
> >>>>> do wiring on their own within their own domain.  
> >>>>
> >>>> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
> >>>> could go via new DeviceClass functions. Michael, would that also work
> >>>> for your use case?  
> >>>
> >>> It's not virtio specifically, I'm interested in how this will work for
> >>> PCI generally.  Right now we call do_pci_register_device which
> >>> immediately makes it guest visible.  
> >>
> >> So you're telling me that a PCI device exposes itself to the system in
> >> pci_qdev_realize() instead of letting a hotplug handler handle that? My
> >> assumption is that the PCI bridge hotplug handler handles this.  
> > 
> > Well given how things work in qemu that's not exactly
> > the case. See below.
> >   
> >> What am
> >> I missing?
> >>
> >> I can see that e.g. for a virtio device the realize call chain is
> >>
> >> pci_qdev_realize() -> virtio_pci_realize() -> virtio_XXX__pci_realize ->
> >> virtio_XXX_realize()
> >>
> >> If any realization in pci_qdev_realize() fails, we do a
> >> do_pci_unregister_device().
> >>
> >> So if it is true what you're saying than we're already exposing
> >> partially realized (and possibly unrealized again) devices via PCI. I
> >> *guess* because we're holding the iothread mutex this is okay and
> >> actually not visible.  
> > 
> > For now but we need ability to have separate new commands for
> > realize and plug, so we will drop the mutex.  
> 
> If you want to actually drop the mutex, I assume that quite some rework
> will be necessary (not only for this specific PCI hotplug handler case),
> most probably also in other code parts (to) - all of the hotplug/realize
> code seems to rely on the mutex being locked and not being dropped
> temporarily.
Yep, all monitor actions and reactions from the guest via mmio are
currently protected by the global lock, which guarantees that no
parallel action can be executed at the same time. So it's safe for now,
and dropping the global lock would require some refactoring (probably a
lot).

> >   
> >> And we only seem to be sending events in the PCI
> >> bridge hotplug handlers, so my assumption is that this is fine.  
> > 
> > For core PCI, it's mostly just this line:
> > 
> >     bus->devices[devfn] = pci_dev;  
> 
> This should go into the hotplug handler if I am not wrong. From what I
> learned from Igor, this is a layer violation. Resource assignment should
> happen during pre_plug / plug. But then you might need a different way
> to "reserve" a function (if there could be races then with the lock
> temporarily dropped).
I agree, but it's a separate refactoring and it isn't a prerequisite for
the virtio-mem work, so it shouldn't hold it up.

> > which makes it accessible to pci config cycles.
> > 
> > But failover also cares about vfio, which seems to set up
> > e.g. irqfs on realize.  
Do we have a thread for failover somewhere on the list that discusses
ideas and requirements for it, where we can discuss it?
Otherwise this discussion will get buried in this unrelated thread.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v4 04/14] pc: prepare for multi stage hotplug handlers
  2018-06-13 15:51                             ` David Hildenbrand
  2018-06-13 18:32                               ` Michael S. Tsirkin
@ 2018-06-14  9:20                               ` Igor Mammedov
  1 sibling, 0 replies; 76+ messages in thread
From: Igor Mammedov @ 2018-06-14  9:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S. Tsirkin, qemu-devel, qemu-s390x, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost, David Gibson,
	Markus Armbruster, qemu-ppc, Pankaj Gupta, Alexander Graf,
	Cornelia Huck, Christian Borntraeger, Luiz Capitulino

On Wed, 13 Jun 2018 17:51:01 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 13.06.2018 17:48, Igor Mammedov wrote:
> > On Wed, 13 Jun 2018 12:58:46 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 08.06.2018 17:12, Michael S. Tsirkin wrote:  
> >>> On Fri, Jun 08, 2018 at 03:07:53PM +0200, David Hildenbrand wrote:    
> >>>> On 08.06.2018 14:55, Michael S. Tsirkin wrote:    
> >>>>> On Fri, Jun 08, 2018 at 02:32:09PM +0200, David Hildenbrand wrote:    
> >>>>>>    
> >>>>>>>>> if (TYPE_PC_DIMM) {
> >>>>>>>>>     pc_dimm_plug()
> >>>>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>>>> } else if (TYPE_VIRTIO_MEM) {
> >>>>>>>>>     virtio_mem_plug() <- do forwarding in there
> >>>>>>>>>     /* and do here additional concrete machine specific things */
> >>>>>>>>> } else if (TYPE_CPU) {
> >>>>>>>>>     cpu_plug()
> >>>>>>>>>     /* do here additional concrete machine specific things */
> >>>>>>>>> }      
> >>>>>>>>
> >>>>>>>> That will result in a lot of duplicate code - for every machine we
> >>>>>>>> support (dimm/virtio-mem/virtio-pmem/*add more memory devices here*) -
> >>>>>>>> virtio-mem and virtio-pmem could most probably share the code.    
> >>>>>>> maybe or maybe not, depending on whether pmem ends up as a memory device or
> >>>>>>> separate controller. And even with duplication, machine code would
> >>>>>>> be easy to follow just down one explicit call chain.    
> >>>>>>
> >>>>>> Not 100% convinced but I am now going into that direction.    
> >>>>>
> >>>>> Can this go into DeviceClass? Failover has the same need to
> >>>>> allocate/free resources for vfio without a full realize/unrealize.
> >>>>>    
> >>>>
> >>>> Conceptually, I would have called here something like
> >>>>
> >>>> virtio_mem_plug() ...
> >>>>
> >>>> Which would end up calling memory_device_plug() and triggering the
> >>>> target hotplug handler.
> >>>>
> >>>> I assume this can also be done from a device class callback.
> >>>>
> >>>> So we would need a total of 3 callbacks for
> >>>>
> >>>> a) pre_plug
> >>>> b) plug
> >>>> c) unplug
> >>>>
> >>>> In addition, unplug requests might be necessary, so
> >>>>
> >>>> d) unplug_request    
> >>>
> >>>
> >>> Right - basically HotplugHandlerClass.    
> >>
> >> Looking into this idea:
> >>
> >> What I would have right now (conceptually)
> >>
> >> if (TYPE_PC_DIMM) {
> >>     pc_dimm_plug(machine);
> >> } else if (TYPE_CPU) {
> >>     cpu_plug(machine);
> >> } else if (TYPE_VIRTIO_MEM) {
> >>     virtio_mem_plug(machine);
> >> }
> >>
> >> Instead you want something like:
> >>
> >> if (TYPE_PC_DIMM) {
> >>     pc_dimm_plug(machine);
> >> } else if (TYPE_CPU) {
> >>     cpu_plug(machine);
> >> // igor requested an explicit list here, we could also check for
> >> // DEVICE_CLASS(device)->plug and make it generic
> >> } else if (TYPE_VIRTIO_MEM) {
> >>     DEVICE_CLASS(device)->plug(machine);
> >>     // call bus hotplug handler if necessary, or let the previous call
> >>     // handle it?  
> > not exactly this, I suggested the following:
> > 
> >       [ ... specific to machine_foo wiring ...]
> > 
> >       virtio_mem_plug() {
> >          [... some machine specific wiring ...]
> > 
> >          bus_hotplug_ctrl = qdev_get_bus_hotplug_handler()
> >          bus_hotplug_ctrl->plug(bus_hotplug_ctrl, dev)
> > 
> >          [... some more machine specific wiring ...]
> >       }
> > 
> >       [ ... specific to machine_foo wiring ...]
> > 
> > i.e. the device itself doesn't participate in attaching to external
> > entities; those entities (the machine, or the bus controller the virtio
> > device is attached to) do the wiring on their own within their own domain.
> 
> I am fine with this, but Michael asked if this ("virtio_mem_plug()")
> could go via new DeviceClass functions. Michael, would that also work
> for your use case?
We can discuss DeviceClass functions when patches for failover surface,
and if they are really needed.
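
For reference, the call chain I suggested above can be sketched as
self-contained C. This is a toy model under stated assumptions, not
QEMU code: ToyHandler stands in for a hotplug handler with only a plug
callback, the bus_handler field stands in for what
qdev_get_bus_hotplug_handler() would return, and the address value is
an arbitrary placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT QEMU's HotplugHandler/DeviceState. */
typedef struct ToyDev ToyDev;
typedef struct ToyHandler ToyHandler;

struct ToyHandler {
    const char *name;
    void (*plug)(ToyHandler *h, ToyDev *dev);
};

struct ToyDev {
    ToyHandler *bus_handler; /* handler of the bus the device sits on, or NULL */
    int machine_wired;       /* set by the machine stage */
    int bus_wired;           /* set by the bus stage */
    uint64_t addr;           /* guest physical address picked by the machine */
};

/* Bus stage: the bus controller does the wiring within its own domain. */
static void toy_pci_plug(ToyHandler *h, ToyDev *dev)
{
    (void)h;
    dev->bus_wired = 1;
}

/* Machine stage for a virtio-mem-like device. */
static void toy_machine_virtio_mem_plug(ToyHandler *machine, ToyDev *dev)
{
    (void)machine;

    /* Machine specific wiring: assign a region in guest physical
     * address space (placeholder value). */
    dev->addr = UINT64_C(0x100000000);
    dev->machine_wired = 1;

    /* Forward to the bus hotplug handler, if the device has one. */
    if (dev->bus_handler && dev->bus_handler->plug) {
        dev->bus_handler->plug(dev->bus_handler, dev);
    }

    /* More machine specific wiring could follow here. */
}
```

The device never calls into the machine or the bus; the machine handler
is the single entry point and decides when the bus handler runs.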

