All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
@ 2013-09-03 12:32 Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
                   ` (41 more replies)
  0 siblings, 42 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

QOM splits the destruction of a device in two phases:

- unrealize, also known as "exit" from qdev times, should isolate
  the device from the guest.  After unrealize returns, the guest
  should not be able to issue new requests.

- instance_finalize will reclaim the memory.  This is only called
  after all requests terminate and drop the references on the
  device.

Though overlooked, this is important even now: QEMU's little secret is
that devices already do access memory out of the iothread mutex (with
address_space_map/unmap and AIO), and this can be MMIO memory too
through a bounce buffer.  This series prepares things so that, once
we'll put the memory_region_ref/unref infrastructure to complete use,
things will just work.

Of course this split will be particularly important for devices that
will be able to do unlocked MMIO.

This series changes all PCI devices (the sole to support hotplug _and_
use MemoryRegions) to do memory_region_del_subregion at unrealize time,
and memory_region_destroy at instance_finalize time.  As it is mostly
a PCI patch, it should go through mst's tree.

Paolo

Paolo Bonzini (38):
  qdev: document assumption that unrealize is followed by finalize
  pci: split exit and finalize
  ac97: use instance_finalize instead of exit
  es1370: use instance_finalize instead of exit
  hda: reclaim memory in instance_finalize instead of exit
  serial: reclaim memory in instance_finalize instead of exit
  tpci200: use instance_finalize instead of exit
  pci-assign: reclaim memory in instance_finalize instead of exit
  ahci: reclaim memory in instance_finalize instead of exit
  msix: split msix_free from msix_uninit
  cmd646: use instance_finalize instead of exit
  ide/piix: use instance_finalize instead of exit
  ide/via: use instance_finalize instead of exit
  ivshmem: reclaim memory in instance_finalize instead of exit
  pci-testdev: use instance_finalize instead of exit
  vfio: reclaim memory in instance_finalize instead of exit
  e1000: use instance_finalize instead of exit
  eepro100: use instance_finalize instead of exit
  ne2000: use instance_finalize instead of exit
  pcnet: use instance_finalize instead of exit
  rtl8139: use instance_finalize instead of exit
  vmxnet3: reclaim memory in instance_finalize instead of exit
  shpc: split shpc_free from shpc_cleanup
  pci_bridge: split pci_bridge_free from pci_bridge_exitfn
  pcie_aer: pcie_aer_exit really frees stuff
  pci_bridge: reclaim memory in instance_finalize instead of exit
  ioh4320: reclaim memory in instance_finalize instead of exit
  xio3130-downstream: reclaim memory in instance_finalize instead of
    exit
  xio3130-upstream: reclaim memory in instance_finalize instead of exit
  pcie: do not recreate mmcfg I/O region, use an alias instead
  esp: use instance_finalize instead of exit
  lsi: use instance_finalize instead of exit
  pvscsi: reclaim memory in instance_finalize instead of exit
  usb-uhci: use instance_finalize instead of exit
  virtio-pci: reclaim memory in instance_finalize instead of exit
  wdt_i6300esb: use instance_finalize instead of exit
  xen_pt: reclaim memory in instance_finalize instead of exit
  tpm: move add/del_subregion to realize/unrealize

 hw/audio/ac97.c                    |  5 ++--
 hw/audio/es1370.c                  |  5 ++--
 hw/audio/intel-hda.c               |  8 ++++++
 hw/char/serial-pci.c               | 24 ++++++++++++++++++
 hw/char/tpci200.c                  |  5 ++--
 hw/i386/kvm/pci-assign.c           |  8 ++++++
 hw/ide/ahci.c                      |  2 +-
 hw/ide/ahci.h                      |  2 +-
 hw/ide/cmd646.c                    |  6 ++---
 hw/ide/ich.c                       | 12 ++++++---
 hw/ide/piix.c                      |  9 ++++---
 hw/ide/via.c                       |  6 ++---
 hw/misc/ivshmem.c                  | 13 +++++++---
 hw/misc/pci-testdev.c              |  6 ++---
 hw/misc/vfio.c                     | 52 +++++++++++++++++++++++++++++++++++---
 hw/net/e1000.c                     |  6 ++---
 hw/net/eepro100.c                  |  5 ++--
 hw/net/ne2000.c                    |  5 ++--
 hw/net/pcnet-pci.c                 |  6 ++---
 hw/net/rtl8139.c                   |  6 ++---
 hw/net/vmxnet3.c                   | 14 ++++++++--
 hw/pci-bridge/i82801b11.c          |  1 +
 hw/pci-bridge/ioh3420.c            | 11 +++++++-
 hw/pci-bridge/pci_bridge_dev.c     | 13 +++++++++-
 hw/pci-bridge/xio3130_downstream.c | 11 +++++++-
 hw/pci-bridge/xio3130_upstream.c   | 11 +++++++-
 hw/pci/msix.c                      | 22 +++++++++++-----
 hw/pci/pci.c                       | 15 ++++++++---
 hw/pci/pci_bridge.c                |  5 ++++
 hw/pci/pcie_aer.c                  |  3 ++-
 hw/pci/pcie_host.c                 | 23 ++++++++++++-----
 hw/pci/shpc.c                      |  8 +++++-
 hw/scsi/esp-pci.c                  |  6 ++---
 hw/scsi/lsi53c895a.c               |  6 ++---
 hw/scsi/vmw_pvscsi.c               | 12 ++++++++-
 hw/tpm/tpm_tis.c                   | 17 +++++++++----
 hw/usb/hcd-uhci.c                  |  5 ++--
 hw/virtio/virtio-pci.c             | 10 +++++++-
 hw/watchdog/wdt_i6300esb.c         |  5 ++--
 hw/xen/xen_pt.c                    | 10 ++++++++
 hw/xen/xen_pt_config_init.c        |  3 ---
 hw/xen/xen_pt_msi.c                |  8 +++++-
 include/hw/pci/msix.h              |  1 +
 include/hw/pci/pci_bridge.h        |  1 +
 include/hw/pci/pcie_aer.h          |  2 +-
 include/hw/pci/pcie_host.h         |  1 +
 include/hw/pci/shpc.h              |  1 +
 include/hw/qdev-core.h             |  4 +++
 48 files changed, 329 insertions(+), 91 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-17  9:00   ` Michael S. Tsirkin
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
                   ` (40 subsequent siblings)
  41 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

This becomes important when undoing realize's initializations is split
in two places (unrealize and exit).

The way to fix this could be to split realize further into "alloc" (done
once) and "realize" (can be undone).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/hw/qdev-core.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 46972f4..d840f06 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -86,6 +86,10 @@ struct VMStateDescription;
  * object_initialize() in their own #TypeInfo.instance_init and forward the
  * realization events appropriately.
  *
+ * Note that for now it is not possible to unrealize a device and then
+ * re-realize it.  While this may change in the future, it is a valid
+ * assumption for now.
+ *
  * The @init callback is considered private to a particular bus implementation
  * (immediate abstract child types of TYPE_DEVICE). Derived leaf types set an
  * "init" callback on their parent class instead.
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 02/38] pci: split exit and finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-17  9:16   ` Michael S. Tsirkin
  2013-09-17 10:06   ` Michael S. Tsirkin
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit Paolo Bonzini
                   ` (39 subsequent siblings)
  41 siblings, 2 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

When converting devices to use out-of-BQL memory access, destruction
needs to be done in two phases.  First, the device is unrealized;
at this point, pending memory accesses can still be completed, but
no new accesses will be started.  The second part is freeing the
device, which happens only after the reference count drops to zero;
this means that all memory accesses are complete.

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci/pci.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4c004f5..bd084c7 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -787,6 +787,16 @@ static void pci_config_free(PCIDevice *pci_dev)
     g_free(pci_dev->used);
 }
 
+static void pci_device_instance_finalize(Object *obj)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
+
+    qemu_free_irqs(pci_dev->irq);
+
+    address_space_destroy(&pci_dev->bus_master_as);
+    memory_region_destroy(&pci_dev->bus_master_enable_region);
+}
+
 /* -1 for devfn means auto assign */
 static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
                                          const char *name, int devfn)
@@ -875,12 +885,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
 
 static void do_pci_unregister_device(PCIDevice *pci_dev)
 {
-    qemu_free_irqs(pci_dev->irq);
     pci_dev->bus->devices[pci_dev->devfn] = NULL;
     pci_config_free(pci_dev);
-
-    address_space_destroy(&pci_dev->bus_master_as);
-    memory_region_destroy(&pci_dev->bus_master_enable_region);
 }
 
 static void pci_unregister_io_regions(PCIDevice *pci_dev)
@@ -2252,6 +2258,7 @@ static const TypeInfo pci_device_type_info = {
     .abstract = true,
     .class_size = sizeof(PCIDeviceClass),
     .class_init = pci_device_class_init,
+    .instance_finalize = pci_device_instance_finalize,
 };
 
 static void pci_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 04/38] es1370: " Paolo Bonzini
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/audio/ac97.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/audio/ac97.c b/hw/audio/ac97.c
index 01b4dfb..04ae601 100644
--- a/hw/audio/ac97.c
+++ b/hw/audio/ac97.c
@@ -1390,8 +1390,9 @@ static int ac97_initfn (PCIDevice *dev)
     return 0;
 }
 
-static void ac97_exitfn (PCIDevice *dev)
+static void ac97_instance_finalize (Object *obj)
 {
+    PCIDevice *dev = PCI_DEVICE(obj);
     AC97LinkState *s = DO_UPCAST (AC97LinkState, dev, dev);
 
     memory_region_destroy (&s->io_nam);
@@ -1415,7 +1416,6 @@ static void ac97_class_init (ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS (klass);
 
     k->init = ac97_initfn;
-    k->exit = ac97_exitfn;
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = PCI_DEVICE_ID_INTEL_82801AA_5;
     k->revision = 0x01;
@@ -1431,6 +1431,7 @@ static const TypeInfo ac97_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof (AC97LinkState),
     .class_init    = ac97_class_init,
+    .instance_finalize = ac97_instance_finalize,
 };
 
 static void ac97_register_types (void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 04/38] es1370: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (2 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 05/38] hda: reclaim memory in " Paolo Bonzini
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/audio/es1370.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/audio/es1370.c b/hw/audio/es1370.c
index adb66ce..b68fb84 100644
--- a/hw/audio/es1370.c
+++ b/hw/audio/es1370.c
@@ -1044,8 +1044,9 @@ static int es1370_initfn (PCIDevice *dev)
     return 0;
 }
 
-static void es1370_exitfn (PCIDevice *dev)
+static void es1370_instance_finalize (Object *obj)
 {
+    PCIDevice *dev = PCI_DEVICE(obj);
     ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
 
     memory_region_destroy (&s->io);
@@ -1063,7 +1064,6 @@ static void es1370_class_init (ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS (klass);
 
     k->init = es1370_initfn;
-    k->exit = es1370_exitfn;
     k->vendor_id = PCI_VENDOR_ID_ENSONIQ;
     k->device_id = PCI_DEVICE_ID_ENSONIQ_ES1370;
     k->class_id = PCI_CLASS_MULTIMEDIA_AUDIO;
@@ -1079,6 +1079,7 @@ static const TypeInfo es1370_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof (ES1370State),
     .class_init    = es1370_class_init,
+    .instance_finalize = es1370_instance_finalize,
 };
 
 static void es1370_register_types (void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 05/38] hda: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (3 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 04/38] es1370: " Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 06/38] serial: " Paolo Bonzini
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/audio/intel-hda.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 78f9d28..cbba88e 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -1153,6 +1153,13 @@ static void intel_hda_exit(PCIDevice *pci)
     IntelHDAState *d = INTEL_HDA(pci);
 
     msi_uninit(&d->pci);
+}
+
+static void intel_hda_instance_finalize(Object *obj)
+{
+    PCIDevice *pci = PCI_DEVICE(obj);
+    IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci);
+
     memory_region_destroy(&d->mmio);
 }
 
@@ -1285,6 +1292,7 @@ static const TypeInfo intel_hda_info_ich6 = {
     .name          = "intel-hda",
     .parent        = TYPE_INTEL_HDA_GENERIC,
     .class_init    = intel_hda_class_init_ich6,
+    .instance_finalize = intel_hda_instance_finalize,
 };
 
 static const TypeInfo intel_hda_info_ich9 = {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 06/38] serial: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (4 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 05/38] hda: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 07/38] tpci200: use " Paolo Bonzini
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/char/serial-pci.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index aec6705..6dd34d3 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -131,6 +131,14 @@ static void serial_pci_exit(PCIDevice *dev)
     SerialState *s = &pci->state;
 
     serial_exit_core(s);
+}
+
+static void serial_pci_instance_finalize(Object *obj)
+{
+    PCIDevice *dev = PCI_DEVICE(obj);
+    PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev);
+    SerialState *s = &pci->state;
+
     memory_region_destroy(&s->io);
 }
 
@@ -143,9 +151,22 @@ static void multi_serial_pci_exit(PCIDevice *dev)
     for (i = 0; i < pci->ports; i++) {
         s = pci->state + i;
         serial_exit_core(s);
+    }
+}
+
+static void multi_serial_pci_instance_finalize(Object *obj)
+{
+    PCIDevice *dev = PCI_DEVICE(obj);
+    PCIMultiSerialState *pci = DO_UPCAST(PCIMultiSerialState, dev, dev);
+    SerialState *s;
+    int i;
+
+    for (i = 0; i < pci->ports; i++) {
+        s = pci->state + i;
         memory_region_destroy(&s->io);
         g_free(pci->name[i]);
     }
+
     memory_region_destroy(&pci->iobar);
     qemu_free_irqs(pci->irqs);
 }
@@ -243,6 +264,7 @@ static const TypeInfo serial_pci_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCISerialState),
     .class_init    = serial_pci_class_initfn,
+    .instance_finalize = serial_pci_instance_finalize,
 };
 
 static const TypeInfo multi_2x_serial_pci_info = {
@@ -250,6 +272,7 @@ static const TypeInfo multi_2x_serial_pci_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCIMultiSerialState),
     .class_init    = multi_2x_serial_pci_class_initfn,
+    .instance_finalize = multi_serial_pci_instance_finalize,
 };
 
 static const TypeInfo multi_4x_serial_pci_info = {
@@ -257,6 +280,7 @@ static const TypeInfo multi_4x_serial_pci_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCIMultiSerialState),
     .class_init    = multi_4x_serial_pci_class_initfn,
+    .instance_finalize = multi_serial_pci_instance_finalize,
 };
 
 static void serial_pci_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 07/38] tpci200: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (5 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 06/38] serial: " Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in " Paolo Bonzini
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/char/tpci200.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/char/tpci200.c b/hw/char/tpci200.c
index d9e17b2..948a188 100644
--- a/hw/char/tpci200.c
+++ b/hw/char/tpci200.c
@@ -613,8 +613,9 @@ static int tpci200_initfn(PCIDevice *pci_dev)
     return 0;
 }
 
-static void tpci200_exitfn(PCIDevice *pci_dev)
+static void tpci200_instance_finalize(Object *obj)
 {
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
     TPCI200State *s = TPCI200(pci_dev);
 
     memory_region_destroy(&s->mmio);
@@ -646,7 +647,6 @@ static void tpci200_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = tpci200_initfn;
-    k->exit = tpci200_exitfn;
     k->vendor_id = PCI_VENDOR_ID_TEWS;
     k->device_id = PCI_DEVICE_ID_TEWS_TPCI200;
     k->class_id = PCI_CLASS_BRIDGE_OTHER;
@@ -662,6 +662,7 @@ static const TypeInfo tpci200_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(TPCI200State),
     .class_init    = tpci200_class_init,
+    .instance_finalize = tpci200_instance_finalize,
 };
 
 static void tpci200_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (6 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 07/38] tpci200: use " Paolo Bonzini
@ 2013-09-03 12:32 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 09/38] ahci: " Paolo Bonzini
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/i386/kvm/pci-assign.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 011764f..9d0ff3f 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1829,6 +1829,13 @@ static void assigned_exitfn(struct PCIDevice *pci_dev)
     AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
     deassign_device(dev);
+}
+
+static void assigned_instance_finalize(Object *obj)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
+    AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+
     free_assigned_device(dev);
 }
 
@@ -1864,6 +1871,7 @@ static const TypeInfo assign_info = {
     .parent             = TYPE_PCI_DEVICE,
     .instance_size      = sizeof(AssignedDevice),
     .class_init         = assign_class_init,
+    .instance_finalize  = assigned_instance_finalize,
 };
 
 static void assign_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 09/38] ahci: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (7 preceding siblings ...)
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/ide/ahci.c |  2 +-
 hw/ide/ahci.h |  2 +-
 hw/ide/ich.c  | 12 ++++++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index bba150f..8f1d37b 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1185,7 +1185,7 @@ void ahci_init(AHCIState *s, DeviceState *qdev, AddressSpace *as, int ports)
     }
 }
 
-void ahci_uninit(AHCIState *s)
+void ahci_instance_finalize(AHCIState *s)
 {
     memory_region_destroy(&s->mem);
     memory_region_destroy(&s->idp);
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 20e412c..d1d483c 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -347,7 +347,7 @@ typedef struct NCQFrame {
 } QEMU_PACKED NCQFrame;
 
 void ahci_init(AHCIState *s, DeviceState *qdev, AddressSpace *as, int ports);
-void ahci_uninit(AHCIState *s);
+void ahci_instance_finalize(AHCIState *s);
 
 void ahci_reset(AHCIState *s);
 
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index bff952b..e4f6416 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -140,11 +140,14 @@ static int pci_ich9_ahci_init(PCIDevice *dev)
 
 static void pci_ich9_uninit(PCIDevice *dev)
 {
-    struct AHCIPCIState *d;
-    d = ICH_AHCI(dev);
-
     msi_uninit(dev);
-    ahci_uninit(&d->ahci);
+}
+
+static void pci_ich9_instance_finalize(Object *obj)
+{
+    struct AHCIPCIState *d = ICH_AHCI(obj);
+
+    ahci_instance_finalize(&d->ahci);
 }
 
 static void ich_ahci_class_init(ObjectClass *klass, void *data)
@@ -168,6 +171,7 @@ static const TypeInfo ich_ahci_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(AHCIPCIState),
     .class_init    = ich_ahci_class_init,
+    .instance_finalize = pci_ich9_instance_finalize,
 };
 
 static void ich_ahci_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (8 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 09/38] ahci: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-17  9:21   ` Michael S. Tsirkin
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit Paolo Bonzini
                   ` (31 subsequent siblings)
  41 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/misc/vfio.c         |  1 +
 hw/net/vmxnet3.c       |  3 +++
 hw/pci/msix.c          | 22 ++++++++++++++++------
 hw/virtio/virtio-pci.c |  1 +
 include/hw/pci/msix.h  |  1 +
 5 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index a1c08fb..1311d0e 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2144,6 +2144,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
     if (vdev->msix) {
         msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
                     &vdev->bars[vdev->msix->pba_bar].mem);
+        msix_free(&vdev->pdev);
     }
 }
 
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 49c2466..2a79e52 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1979,6 +1979,7 @@ vmxnet3_init_msix(VMXNET3State *s)
         if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
             VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
             msix_uninit(d, &s->msix_bar, &s->msix_bar);
+            msix_free(d);
             s->msix_used = false;
         } else {
             s->msix_used = true;
@@ -1996,6 +1997,8 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
         msix_vector_unuse(d, VMXNET3_MAX_INTRS);
         msix_uninit(d, &s->msix_bar, &s->msix_bar);
     }
+
+    msix_free(d);
 }
 
 #define VMXNET3_MSI_NUM_VECTORS   (1)
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 3430770..0618e3b 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -359,16 +359,26 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
     msix_free_irq_entries(dev);
     dev->msix_entries_nr = 0;
     memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
-    memory_region_destroy(&dev->msix_pba_mmio);
-    g_free(dev->msix_pba);
-    dev->msix_pba = NULL;
     memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
-    memory_region_destroy(&dev->msix_table_mmio);
-    g_free(dev->msix_table);
+    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
+}
+
+void msix_free(PCIDevice *dev)
+{
+    if (dev->msix_pba) {
+        memory_region_destroy(&dev->msix_pba_mmio);
+        g_free(dev->msix_pba);
+    }
+    dev->msix_pba = NULL;
+
+    if (dev->msix_table) {
+        memory_region_destroy(&dev->msix_table_mmio);
+        g_free(dev->msix_table);
+    }
     dev->msix_table = NULL;
+
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
-    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
 }
 
 void msix_uninit_exclusive_bar(PCIDevice *dev)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f2c489b..ff6bed5 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -999,6 +999,7 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
     virtio_pci_stop_ioeventfd(proxy);
     memory_region_destroy(&proxy->bar);
     msix_uninit_exclusive_bar(pci_dev);
+    msix_free(pci_dev);
 }
 
 static void virtio_pci_reset(DeviceState *qdev)
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 954d82b..34e3ba2 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -18,6 +18,7 @@ void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len);
 void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
                  MemoryRegion *pba_bar);
 void msix_uninit_exclusive_bar(PCIDevice *dev);
+void msix_free(PCIDevice *dev);
 
 unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (9 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 12/38] ide/piix: " Paolo Bonzini
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/ide/cmd646.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index d6ef799..161b118 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -303,9 +303,9 @@ static int pci_cmd646_ide_initfn(PCIDevice *dev)
     return 0;
 }
 
-static void pci_cmd646_ide_exitfn(PCIDevice *dev)
+static void pci_cmd646_ide_instance_finalize(Object *obj)
 {
-    PCIIDEState *d = PCI_IDE(dev);
+    PCIIDEState *d = PCI_IDE(obj);
     unsigned i;
 
     for (i = 0; i < 2; ++i) {
@@ -342,7 +342,6 @@ static void cmd646_ide_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_cmd646_ide_initfn;
-    k->exit = pci_cmd646_ide_exitfn;
     k->vendor_id = PCI_VENDOR_ID_CMD;
     k->device_id = PCI_DEVICE_ID_CMD_646;
     k->revision = 0x07;
@@ -354,6 +353,7 @@ static const TypeInfo cmd646_ide_info = {
     .name          = "cmd646-ide",
     .parent        = TYPE_PCI_IDE,
     .class_init    = cmd646_ide_class_init,
+    .instance_finalize = pci_cmd646_ide_instance_finalize,
 };
 
 static void cmd646_ide_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 12/38] ide/piix: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (10 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 13/38] ide/via: " Paolo Bonzini
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/ide/piix.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index e6e6c0b..ddd72c1 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -200,9 +200,9 @@ PCIDevice *pci_piix3_xen_ide_init(PCIBus *bus, DriveInfo **hd_table, int devfn)
     return dev;
 }
 
-static void pci_piix_ide_exitfn(PCIDevice *dev)
+static void pci_piix_ide_instance_finalize(Object *obj)
 {
-    PCIIDEState *d = PCI_IDE(dev);
+    PCIIDEState *d = PCI_IDE(obj);
     unsigned i;
 
     for (i = 0; i < 2; ++i) {
@@ -243,7 +243,6 @@ static void piix3_ide_class_init(ObjectClass *klass, void *data)
 
     k->no_hotplug = 1;
     k->init = pci_piix_ide_initfn;
-    k->exit = pci_piix_ide_exitfn;
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
     k->class_id = PCI_CLASS_STORAGE_IDE;
@@ -255,6 +254,7 @@ static const TypeInfo piix3_ide_info = {
     .name          = "piix3-ide",
     .parent        = TYPE_PCI_IDE,
     .class_init    = piix3_ide_class_init,
+    .instance_finalize = pci_piix_ide_instance_finalize,
 };
 
 static void piix3_ide_xen_class_init(ObjectClass *klass, void *data)
@@ -275,6 +275,7 @@ static const TypeInfo piix3_ide_xen_info = {
     .name          = "piix3-ide-xen",
     .parent        = TYPE_PCI_IDE,
     .class_init    = piix3_ide_xen_class_init,
+    .instance_finalize = pci_piix_ide_instance_finalize,
 };
 
 static void piix4_ide_class_init(ObjectClass *klass, void *data)
@@ -284,7 +285,6 @@ static void piix4_ide_class_init(ObjectClass *klass, void *data)
 
     k->no_hotplug = 1;
     k->init = pci_piix_ide_initfn;
-    k->exit = pci_piix_ide_exitfn;
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = PCI_DEVICE_ID_INTEL_82371AB;
     k->class_id = PCI_CLASS_STORAGE_IDE;
@@ -296,6 +296,7 @@ static const TypeInfo piix4_ide_info = {
     .name          = "piix4-ide",
     .parent        = TYPE_PCI_IDE,
     .class_init    = piix4_ide_class_init,
+    .instance_finalize = pci_piix_ide_instance_finalize,
 };
 
 static void piix_ide_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 13/38] ide/via: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (11 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 12/38] ide/piix: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in " Paolo Bonzini
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/ide/via.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index e5fb297..0b63bee 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -191,9 +191,9 @@ static int vt82c686b_ide_initfn(PCIDevice *dev)
     return 0;
 }
 
-static void vt82c686b_ide_exitfn(PCIDevice *dev)
+static void vt82c686b_ide_instance_finalize(Object *obj)
 {
-    PCIIDEState *d = PCI_IDE(dev);
+    PCIIDEState *d = PCI_IDE(obj);
     unsigned i;
 
     for (i = 0; i < 2; ++i) {
@@ -219,7 +219,6 @@ static void via_ide_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = vt82c686b_ide_initfn;
-    k->exit = vt82c686b_ide_exitfn;
     k->vendor_id = PCI_VENDOR_ID_VIA;
     k->device_id = PCI_DEVICE_ID_VIA_IDE;
     k->revision = 0x06;
@@ -232,6 +231,7 @@ static const TypeInfo via_ide_info = {
     .name          = "via-ide",
     .parent        = TYPE_PCI_IDE,
     .class_init    = via_ide_class_init,
+    .instance_finalize = vt82c686b_ide_instance_finalize,
 };
 
 static void via_ide_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (12 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 13/38] ide/via: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 15/38] pci-testdev: use " Paolo Bonzini
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/misc/ivshmem.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 2838866..46d8c27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -784,17 +784,23 @@ static void pci_ivshmem_uninit(PCIDevice *dev)
 {
     IVShmemState *s = IVSHMEM(dev);
 
+    memory_region_del_subregion(&s->bar, &s->ivshmem);
+}
+
+static void pci_ivshmem_instance_finalize(Object *obj)
+{
+    IVShmemState *s = IVSHMEM(obj);
+
     if (s->migration_blocker) {
         migrate_del_blocker(s->migration_blocker);
         error_free(s->migration_blocker);
     }
 
     memory_region_destroy(&s->ivshmem_mmio);
-    memory_region_del_subregion(&s->bar, &s->ivshmem);
-    vmstate_unregister_ram(&s->ivshmem, DEVICE(dev));
+    vmstate_unregister_ram(&s->ivshmem, DEVICE(s));
     memory_region_destroy(&s->ivshmem);
     memory_region_destroy(&s->bar);
-    unregister_savevm(DEVICE(dev), "ivshmem", s);
+    unregister_savevm(DEVICE(s), "ivshmem", s);
 }
 
 static Property ivshmem_properties[] = {
@@ -829,6 +835,7 @@ static const TypeInfo ivshmem_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(IVShmemState),
     .class_init    = ivshmem_class_init,
+    .instance_finalize = pci_ivshmem_instance_finalize,
 };
 
 static void ivshmem_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 15/38] pci-testdev: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (13 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in " Paolo Bonzini
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/misc/pci-testdev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
index ca53b3f..10bf145 100644
--- a/hw/misc/pci-testdev.c
+++ b/hw/misc/pci-testdev.c
@@ -280,9 +280,9 @@ static int pci_testdev_init(PCIDevice *pci_dev)
 }
 
 static void
-pci_testdev_uninit(PCIDevice *dev)
+pci_testdev_instance_finalize(Object *obj)
 {
-    PCITestDevState *d = PCI_TEST_DEV(dev);
+    PCITestDevState *d = PCI_TEST_DEV(obj);
     int i;
 
     pci_testdev_reset(d);
@@ -309,7 +309,6 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_testdev_init;
-    k->exit = pci_testdev_uninit;
     k->vendor_id = PCI_VENDOR_ID_REDHAT;
     k->device_id = PCI_DEVICE_ID_REDHAT_TEST;
     k->revision = 0x00;
@@ -324,6 +323,7 @@ static const TypeInfo pci_testdev_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCITestDevState),
     .class_init    = pci_testdev_class_init,
+    .instance_finalize = pci_testdev_instance_finalize,
 };
 
 static void pci_testdev_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (14 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 15/38] pci-testdev: use " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

This requires splitting vfio_teardown_msi, vfio_unmap_bar and
vfio_unmap_bars in two.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/misc/vfio.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 1311d0e..b463df5 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -2144,6 +2144,12 @@ static void vfio_teardown_msi(VFIODevice *vdev)
     if (vdev->msix) {
         msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
                     &vdev->bars[vdev->msix->pba_bar].mem);
+    }
+}
+
+static void vfio_destroy_msi(VFIODevice *vdev)
+{
+    if (vdev->msix) {
         msix_free(&vdev->pdev);
     }
 }
@@ -2180,10 +2186,23 @@ static void vfio_unmap_bar(VFIODevice *vdev, int nr)
     vfio_bar_quirk_teardown(vdev, nr);
 
     memory_region_del_subregion(&bar->mem, &bar->mmap_mem);
-    munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
 
     if (vdev->msix && vdev->msix->table_bar == nr) {
         memory_region_del_subregion(&bar->mem, &vdev->msix->mmap_mem);
+    }
+}
+
+static void vfio_destroy_bar(VFIODevice *vdev, int nr)
+{
+    VFIOBAR *bar = &vdev->bars[nr];
+
+    if (!bar->size) {
+        return;
+    }
+
+    munmap(bar->mmap, memory_region_size(&bar->mmap_mem));
+
+    if (vdev->msix && vdev->msix->table_bar == nr) {
         munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem));
     }
 
@@ -2340,6 +2359,18 @@ static void vfio_unmap_bars(VFIODevice *vdev)
     if (vdev->has_vga) {
         vfio_vga_quirk_teardown(vdev);
         pci_unregister_vga(&vdev->pdev);
+    }
+}
+
+static void vfio_destroy_bars(VFIODevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        vfio_destroy_bar(vdev, i);
+    }
+
+    if (vdev->has_vga) {
         memory_region_destroy(&vdev->vga.region[QEMU_PCI_VGA_MEM].mem);
         memory_region_destroy(&vdev->vga.region[QEMU_PCI_VGA_IO_LO].mem);
         memory_region_destroy(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem);
@@ -3194,7 +3225,9 @@ static int vfio_initfn(PCIDevice *pdev)
 out_teardown:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     vfio_teardown_msi(vdev);
+    vfio_destroy_msi(vdev);
     vfio_unmap_bars(vdev);
+    vfio_destroy_bars(vdev);
 out_put:
     g_free(vdev->emulated_config_bits);
     vfio_put_device(vdev);
@@ -3205,16 +3238,25 @@ out_put:
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
-    VFIOGroup *group = vdev->group;
 
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
     vfio_disable_interrupts(vdev);
+    vfio_teardown_msi(vdev);
+    vfio_unmap_bars(vdev);
+}
+
+static void vfio_instance_finalize(Object *obj)
+{
+    PCIDevice *pdev = PCI_DEVICE(obj);
+    VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
+    VFIOGroup *group = vdev->group;
+
     if (vdev->intx.mmap_timer) {
         timer_free(vdev->intx.mmap_timer);
     }
-    vfio_teardown_msi(vdev);
-    vfio_unmap_bars(vdev);
+    vfio_destroy_msi(vdev);
+    vfio_destroy_bars(vdev);
     g_free(vdev->emulated_config_bits);
     vfio_put_device(vdev);
     vfio_put_group(group);
@@ -3313,6 +3355,7 @@ static const TypeInfo vfio_pci_dev_info = {
     .parent = TYPE_PCI_DEVICE,
     .instance_size = sizeof(VFIODevice),
     .class_init = vfio_pci_dev_class_init,
+    .instance_finalize = vfio_instance_finalize,
 };
 
 static void register_vfio_pci_dev_type(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 17/38] e1000: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (15 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-17  9:27   ` Michael S. Tsirkin
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 18/38] eepro100: " Paolo Bonzini
                   ` (24 subsequent siblings)
  41 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/e1000.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index f5ebed4..55fb062 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1310,9 +1310,9 @@ e1000_cleanup(NetClientState *nc)
 }
 
 static void
-pci_e1000_uninit(PCIDevice *dev)
+pci_e1000_instance_finalize(Object *obj)
 {
-    E1000State *d = E1000(dev);
+    E1000State *d = E1000(obj);
 
     timer_del(d->autoneg_timer);
     timer_free(d->autoneg_timer);
@@ -1394,7 +1394,6 @@ static void e1000_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_e1000_init;
-    k->exit = pci_e1000_uninit;
     k->romfile = "efi-e1000.rom";
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = E1000_DEVID;
@@ -1412,6 +1411,7 @@ static const TypeInfo e1000_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(E1000State),
     .class_init    = e1000_class_init,
+    .instance_finalize = pci_e1000_instance_finalize,
 };
 
 static void e1000_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 18/38] eepro100: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (16 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 19/38] ne2000: " Paolo Bonzini
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/eepro100.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index ffa60d5..9d45aa4 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1840,8 +1840,9 @@ static void nic_cleanup(NetClientState *nc)
     s->nic = NULL;
 }
 
-static void pci_nic_uninit(PCIDevice *pci_dev)
+static void pci_nic_instance_finalize(Object *obj)
 {
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
     EEPRO100State *s = DO_UPCAST(EEPRO100State, dev, pci_dev);
 
     memory_region_destroy(&s->mmio_bar);
@@ -2090,7 +2091,6 @@ static void eepro100_class_init(ObjectClass *klass, void *data)
     k->class_id = PCI_CLASS_NETWORK_ETHERNET;
     k->romfile = "pxe-eepro100.rom";
     k->init = e100_nic_init;
-    k->exit = pci_nic_uninit;
     k->device_id = info->device_id;
     k->revision = info->revision;
     k->subsystem_vendor_id = info->subsystem_vendor_id;
@@ -2108,6 +2108,7 @@ static void eepro100_register_types(void)
         type_info.parent = TYPE_PCI_DEVICE;
         type_info.class_init = eepro100_class_init;
         type_info.instance_size = sizeof(EEPRO100State);
+        type_info.instance_finalize = pci_nic_instance_finalize;
         
         type_register(&type_info);
     }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 19/38] ne2000: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (17 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 18/38] eepro100: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 20/38] pcnet: " Paolo Bonzini
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/ne2000.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
index 31afd28..3a2597e 100644
--- a/hw/net/ne2000.c
+++ b/hw/net/ne2000.c
@@ -745,8 +745,9 @@ static int pci_ne2000_init(PCIDevice *pci_dev)
     return 0;
 }
 
-static void pci_ne2000_exit(PCIDevice *pci_dev)
+static void pci_ne2000_instance_finalize(Object *obj)
 {
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
     PCINE2000State *d = DO_UPCAST(PCINE2000State, dev, pci_dev);
     NE2000State *s = &d->ne2000;
 
@@ -765,7 +766,6 @@ static void ne2000_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_ne2000_init;
-    k->exit = pci_ne2000_exit;
     k->romfile = "efi-ne2k_pci.rom",
     k->vendor_id = PCI_VENDOR_ID_REALTEK;
     k->device_id = PCI_DEVICE_ID_REALTEK_8029;
@@ -780,6 +780,7 @@ static const TypeInfo ne2000_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCINE2000State),
     .class_init    = ne2000_class_init,
+    .instance_finalize = pci_ne2000_instance_finalize,
 };
 
 static void ne2000_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 20/38] pcnet: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (18 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 19/38] ne2000: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 21/38] rtl8139: " Paolo Bonzini
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/pcnet-pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/pcnet-pci.c b/hw/net/pcnet-pci.c
index a893165..12c2349 100644
--- a/hw/net/pcnet-pci.c
+++ b/hw/net/pcnet-pci.c
@@ -278,9 +278,9 @@ static void pci_pcnet_cleanup(NetClientState *nc)
     pcnet_common_cleanup(d);
 }
 
-static void pci_pcnet_uninit(PCIDevice *dev)
+static void pci_pcnet_instance_finalize(Object *obj)
 {
-    PCIPCNetState *d = PCI_PCNET(dev);
+    PCIPCNetState *d = PCI_PCNET(obj);
 
     memory_region_destroy(&d->state.mmio);
     memory_region_destroy(&d->io_bar);
@@ -357,7 +357,6 @@ static void pcnet_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_pcnet_init;
-    k->exit = pci_pcnet_uninit;
     k->romfile = "efi-pcnet.rom",
     k->vendor_id = PCI_VENDOR_ID_AMD;
     k->device_id = PCI_DEVICE_ID_AMD_LANCE;
@@ -374,6 +373,7 @@ static const TypeInfo pcnet_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCIPCNetState),
     .class_init    = pcnet_class_init,
+    .instance_finalize = pci_pcnet_instance_finalize,
 };
 
 static void pci_pcnet_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 21/38] rtl8139: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (19 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 20/38] pcnet: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in " Paolo Bonzini
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/rtl8139.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index c31199f..af76d7d 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -3456,9 +3456,9 @@ static void rtl8139_cleanup(NetClientState *nc)
     s->nic = NULL;
 }
 
-static void pci_rtl8139_uninit(PCIDevice *dev)
+static void pci_rtl8139_instance_finalize(Object *obj)
 {
-    RTL8139State *s = RTL8139(dev);
+    RTL8139State *s = RTL8139(obj);
 
     memory_region_destroy(&s->bar_io);
     memory_region_destroy(&s->bar_mem);
@@ -3554,7 +3554,6 @@ static void rtl8139_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = pci_rtl8139_init;
-    k->exit = pci_rtl8139_uninit;
     k->romfile = "efi-rtl8139.rom";
     k->vendor_id = PCI_VENDOR_ID_REALTEK;
     k->device_id = PCI_DEVICE_ID_REALTEK_8139;
@@ -3571,6 +3570,7 @@ static const TypeInfo rtl8139_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(RTL8139State),
     .class_init    = rtl8139_class_init,
+    .instance_finalize = pci_rtl8139_instance_finalize,
 };
 
 static void rtl8139_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (20 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 21/38] rtl8139: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/net/vmxnet3.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 2a79e52..e65d7bb 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1979,7 +1979,6 @@ vmxnet3_init_msix(VMXNET3State *s)
         if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
             VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
             msix_uninit(d, &s->msix_bar, &s->msix_bar);
-            msix_free(d);
             s->msix_used = false;
         } else {
             s->msix_used = true;
@@ -2124,11 +2123,18 @@ static void vmxnet3_pci_uninit(PCIDevice *pci_dev)
 
     unregister_savevm(dev, "vmxnet3-msix", s);
 
-    vmxnet3_net_uninit(s);
-
     vmxnet3_cleanup_msix(s);
 
     vmxnet3_cleanup_msi(s);
+}
+
+static void vmxnet3_pci_instance_finalize(Object *obj)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
+    VMXNET3State *s = VMXNET3(pci_dev);
+
+    vmxnet3_net_uninit(s);
+    msix_free(pci_dev);
 
     memory_region_destroy(&s->bar0);
     memory_region_destroy(&s->bar1);
@@ -2464,6 +2470,7 @@ static const TypeInfo vmxnet3_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(VMXNET3State),
     .class_init    = vmxnet3_class_init,
+    .instance_finalize = vmxnet3_pci_instance_finalize,
 };
 
 static void vmxnet3_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (21 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-17  9:24   ` Michael S. Tsirkin
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn Paolo Bonzini
                   ` (18 subsequent siblings)
  41 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/pci_bridge_dev.c | 2 ++
 hw/pci/shpc.c                  | 8 +++++++-
 include/hw/pci/shpc.h          | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index a9392c7..97dfc49 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -79,6 +79,7 @@ msi_error:
     slotid_cap_cleanup(dev);
 slotid_error:
     shpc_cleanup(dev, &bridge_dev->bar);
+    shpc_free(dev);
 shpc_error:
     memory_region_destroy(&bridge_dev->bar);
     pci_bridge_exitfn(dev);
@@ -94,6 +95,7 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
     }
     slotid_cap_cleanup(dev);
     shpc_cleanup(dev, &bridge_dev->bar);
+    shpc_free(dev);
     memory_region_destroy(&bridge_dev->bar);
     pci_bridge_exitfn(dev);
 }
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index eb092fd..cefaf69 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -630,15 +630,21 @@ int shpc_bar_size(PCIDevice *d)
 void shpc_cleanup(PCIDevice *d, MemoryRegion *bar)
 {
     SHPCDevice *shpc = d->shpc;
+    /* TODO: cleanup config space changes? */
     d->cap_present &= ~QEMU_PCI_CAP_SHPC;
     memory_region_del_subregion(bar, &shpc->mmio);
-    /* TODO: cleanup config space changes? */
+}
+
+void shpc_free(PCIDevice *d)
+{
+    SHPCDevice *shpc = d->shpc;
     g_free(shpc->config);
     g_free(shpc->cmask);
     g_free(shpc->wmask);
     g_free(shpc->w1cmask);
     memory_region_destroy(&shpc->mmio);
     g_free(shpc);
+    d->shpc = NULL;
 }
 
 void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
diff --git a/include/hw/pci/shpc.h b/include/hw/pci/shpc.h
index 467911a..5f27431 100644
--- a/include/hw/pci/shpc.h
+++ b/include/hw/pci/shpc.h
@@ -39,6 +39,7 @@ void shpc_reset(PCIDevice *d);
 int shpc_bar_size(PCIDevice *dev);
 int shpc_init(PCIDevice *dev, PCIBus *sec_bus, MemoryRegion *bar, unsigned off);
 void shpc_cleanup(PCIDevice *dev, MemoryRegion *bar);
+void shpc_free(PCIDevice *d);
 void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int len);
 
 extern VMStateInfo shpc_vmstate_info;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (22 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff Paolo Bonzini
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/i82801b11.c          | 1 +
 hw/pci-bridge/ioh3420.c            | 2 ++
 hw/pci-bridge/pci_bridge_dev.c     | 2 ++
 hw/pci-bridge/xio3130_downstream.c | 2 ++
 hw/pci-bridge/xio3130_upstream.c   | 2 ++
 hw/pci/pci_bridge.c                | 5 +++++
 include/hw/pci/pci_bridge.h        | 1 +
 7 files changed, 15 insertions(+)

diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
index 14cd7fd..a3cc64c 100644
--- a/hw/pci-bridge/i82801b11.c
+++ b/hw/pci-bridge/i82801b11.c
@@ -76,6 +76,7 @@ static int i82801b11_bridge_initfn(PCIDevice *d)
 
 err_bridge:
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
 
     return rc;
 }
diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index 0f7f209..16f0cf8 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -142,6 +142,7 @@ err_msi:
     msi_uninit(d);
 err_bridge:
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
     return rc;
 }
 
@@ -154,6 +155,7 @@ static void ioh3420_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
 }
 
 PCIESlot *ioh3420_init(PCIBus *bus, int devfn, bool multifunction,
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 97dfc49..970a5b9 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -83,6 +83,7 @@ slotid_error:
 shpc_error:
     memory_region_destroy(&bridge_dev->bar);
     pci_bridge_exitfn(dev);
+    pci_bridge_free(dev);
 bridge_error:
     return err;
 }
@@ -98,6 +99,7 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
     shpc_free(dev);
     memory_region_destroy(&bridge_dev->bar);
     pci_bridge_exitfn(dev);
+    pci_bridge_free(dev);
 }
 
 static void pci_bridge_dev_write_config(PCIDevice *d,
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 94f9781..a0ac179 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -107,6 +107,7 @@ err_msi:
     msi_uninit(d);
 err_bridge:
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
     return rc;
 }
 
@@ -119,6 +120,7 @@ static void xio3130_downstream_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
 }
 
 PCIESlot *xio3130_downstream_init(PCIBus *bus, int devfn, bool multifunction,
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 59f97f6..682a7e5 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -94,6 +94,7 @@ err_msi:
     msi_uninit(d);
 err_bridge:
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
     return rc;
 }
 
@@ -103,6 +104,7 @@ static void xio3130_upstream_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pci_bridge_free(d);
 }
 
 PCIEPort *xio3130_upstream_init(PCIBus *bus, int devfn, bool multifunction,
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index a90671d..307e076 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -387,6 +387,11 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
     assert(QLIST_EMPTY(&s->sec_bus.child));
     QLIST_REMOVE(&s->sec_bus, sibling);
     pci_bridge_region_del(s, s->windows);
+}
+
+void pci_bridge_free(PCIDevice *pci_dev)
+{
+    PCIBridge *s = PCI_BRIDGE(pci_dev);
     pci_bridge_region_cleanup(s, s->windows);
     memory_region_destroy(&s->address_space_mem);
     memory_region_destroy(&s->address_space_io);
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index 1d8f997..5be3ac9 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -46,6 +46,7 @@ void pci_bridge_reset(DeviceState *qdev);
 
 int pci_bridge_initfn(PCIDevice *pci_dev, const char *typename);
 void pci_bridge_exitfn(PCIDevice *pci_dev);
+void pci_bridge_free(PCIDevice *pci_dev);
 
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (23 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit Paolo Bonzini
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Rename it to pcie_aer_free, and move it together with other
freeing functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/ioh3420.c            | 2 +-
 hw/pci-bridge/xio3130_downstream.c | 2 +-
 hw/pci-bridge/xio3130_upstream.c   | 2 +-
 hw/pci/pcie_aer.c                  | 3 ++-
 include/hw/pci/pcie_aer.h          | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index 16f0cf8..cadf103 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -150,11 +150,11 @@ static void ioh3420_exitfn(PCIDevice *d)
 {
     PCIESlot *s = PCIE_SLOT(d);
 
-    pcie_aer_exit(d);
     pcie_chassis_del_slot(s);
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pcie_aer_free(d);
     pci_bridge_free(d);
 }
 
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index a0ac179..75522c6 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -115,11 +115,11 @@ static void xio3130_downstream_exitfn(PCIDevice *d)
 {
     PCIESlot *s = PCIE_SLOT(d);
 
-    pcie_aer_exit(d);
     pcie_chassis_del_slot(s);
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pcie_aer_free(d);
     pci_bridge_free(d);
 }
 
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 682a7e5..5cfdbc7 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -100,10 +100,10 @@ err_bridge:
 
 static void xio3130_upstream_exitfn(PCIDevice *d)
 {
-    pcie_aer_exit(d);
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+    pcie_aer_free(d);
     pci_bridge_free(d);
 }
 
diff --git a/hw/pci/pcie_aer.c b/hw/pci/pcie_aer.c
index ca762ab..509f77e 100644
--- a/hw/pci/pcie_aer.c
+++ b/hw/pci/pcie_aer.c
@@ -163,9 +163,10 @@ int pcie_aer_init(PCIDevice *dev, uint16_t offset)
     return 0;
 }
 
-void pcie_aer_exit(PCIDevice *dev)
+void pcie_aer_free(PCIDevice *dev)
 {
     g_free(dev->exp.aer_log.log);
+    dev->exp.aer_log.log = NULL;
 }
 
 static void pcie_aer_update_uncor_status(PCIDevice *dev)
diff --git a/include/hw/pci/pcie_aer.h b/include/hw/pci/pcie_aer.h
index bcac80a..af1dec3 100644
--- a/include/hw/pci/pcie_aer.h
+++ b/include/hw/pci/pcie_aer.h
@@ -88,7 +88,7 @@ struct PCIEAERErr {
 extern const VMStateDescription vmstate_pcie_aer_log;
 
 int pcie_aer_init(PCIDevice *dev, uint16_t offset);
-void pcie_aer_exit(PCIDevice *dev);
+void pcie_aer_free(PCIDevice *dev);
 void pcie_aer_write_config(PCIDevice *dev,
                            uint32_t addr, uint32_t val, int len);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (24 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 27/38] ioh4320: " Paolo Bonzini
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/pci_bridge_dev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 970a5b9..b33dc59 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -96,9 +96,15 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
     }
     slotid_cap_cleanup(dev);
     shpc_cleanup(dev, &bridge_dev->bar);
+    pci_bridge_exitfn(dev);
+}
+
+static void pci_bridge_dev_instance_finalize(Object *obj)
+{
+    PCIDevice *dev = PCI_DEVICE(obj);
+    PCIBridgeDev *bridge_dev = PCI_BRIDGE_DEV(obj);
     shpc_free(dev);
     memory_region_destroy(&bridge_dev->bar);
-    pci_bridge_exitfn(dev);
     pci_bridge_free(dev);
 }
 
@@ -159,6 +165,7 @@ static const TypeInfo pci_bridge_dev_info = {
     .parent        = TYPE_PCI_BRIDGE,
     .instance_size = sizeof(PCIBridgeDev),
     .class_init = pci_bridge_dev_class_init,
+    .instance_finalize = pci_bridge_dev_instance_finalize,
 };
 
 static void pci_bridge_dev_register(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 27/38] ioh4320: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (25 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 28/38] xio3130-downstream: " Paolo Bonzini
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/ioh3420.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index cadf103..34bb5cd 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -154,6 +154,12 @@ static void ioh3420_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+}
+
+static void ioh3420_instance_finalize(Object *obj)
+{
+    PCIDevice *d = PCI_DEVICE(obj);
+
     pcie_aer_free(d);
     pci_bridge_free(d);
 }
@@ -219,6 +225,7 @@ static const TypeInfo ioh3420_info = {
     .name          = "ioh3420",
     .parent        = TYPE_PCIE_SLOT,
     .class_init    = ioh3420_class_init,
+    .instance_finalize = ioh3420_instance_finalize,
 };
 
 static void ioh3420_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 28/38] xio3130-downstream: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (26 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 27/38] ioh4320: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 29/38] xio3130-upstream: " Paolo Bonzini
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/xio3130_downstream.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 75522c6..3dc6815 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -119,6 +119,12 @@ static void xio3130_downstream_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+}
+
+static void xio3130_downstream_instance_finalize(Object *obj)
+{
+    PCIDevice *d = PCI_DEVICE(obj);
+
     pcie_aer_free(d);
     pci_bridge_free(d);
 }
@@ -186,6 +192,7 @@ static const TypeInfo xio3130_downstream_info = {
     .name          = "xio3130-downstream",
     .parent        = TYPE_PCIE_SLOT,
     .class_init    = xio3130_downstream_class_init,
+    .instance_finalize = xio3130_downstream_instance_finalize,
 };
 
 static void xio3130_downstream_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 29/38] xio3130-upstream: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (27 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 28/38] xio3130-downstream: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead Paolo Bonzini
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-bridge/xio3130_upstream.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 5cfdbc7..5e55797 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -103,6 +103,12 @@ static void xio3130_upstream_exitfn(PCIDevice *d)
     pcie_cap_exit(d);
     msi_uninit(d);
     pci_bridge_exitfn(d);
+}
+
+static void xio3130_upstream_instance_finalize(Object *obj)
+{
+    PCIDevice *d = PCI_DEVICE(obj);
+
     pcie_aer_free(d);
     pci_bridge_free(d);
 }
@@ -165,6 +171,7 @@ static const TypeInfo xio3130_upstream_info = {
     .name          = "x3130-upstream",
     .parent        = TYPE_PCIE_PORT,
     .class_init    = xio3130_upstream_class_init,
+    .instance_finalize = xio3130_upstream_instance_finalize,
 };
 
 static void xio3130_upstream_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (28 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 29/38] xio3130-upstream: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit Paolo Bonzini
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

The MMIO regions should not be destroyed and recreated on the fly,
because they might have I/O in progress.  Instead, create an area that
is as large as possible, and then map a window of it into system memory.
This is safe because containers and aliases are never the destination
of I/O, and can be safely deleted at any time.  It is also similar to
how PAM maps windows of system RAM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci/pcie_host.c         | 23 +++++++++++++++++------
 include/hw/pci/pcie_host.h |  1 +
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/pci/pcie_host.c b/hw/pci/pcie_host.c
index b70e5ad..bb676a9 100644
--- a/hw/pci/pcie_host.c
+++ b/hw/pci/pcie_host.c
@@ -110,19 +110,26 @@ static const MemoryRegionOps pcie_mmcfg_ops = {
 int pcie_host_init(PCIExpressHost *e)
 {
     e->base_addr = PCIE_BASE_ADDR_UNMAPPED;
-
+    memory_region_init_io(&e->mmio, OBJECT(e), &pcie_mmcfg_ops, e, "pcie-mmcfg-mmio",
+                          PCIE_MMCFG_SIZE_MAX);
     return 0;
 }
 
 void pcie_host_mmcfg_unmap(PCIExpressHost *e)
 {
     if (e->base_addr != PCIE_BASE_ADDR_UNMAPPED) {
-        memory_region_del_subregion(get_system_memory(), &e->mmio);
-        memory_region_destroy(&e->mmio);
+        memory_region_del_subregion(get_system_memory(), &e->mapped_mmio);
+        memory_region_destroy(&e->mapped_mmio);
         e->base_addr = PCIE_BASE_ADDR_UNMAPPED;
     }
 }
 
+static void pcie_host_instance_finalize(Object *obj)
+{
+    PCIExpressHost *e = PCIE_HOST_BRIDGE(obj);
+    memory_region_destroy(&e->mmio);
+}
+
 void pcie_host_mmcfg_map(PCIExpressHost *e, hwaddr addr,
                          uint32_t size)
 {
@@ -130,10 +137,13 @@ void pcie_host_mmcfg_map(PCIExpressHost *e, hwaddr addr,
     assert(size >= PCIE_MMCFG_SIZE_MIN);
     assert(size <= PCIE_MMCFG_SIZE_MAX);
     e->size = size;
-    memory_region_init_io(&e->mmio, OBJECT(e), &pcie_mmcfg_ops, e,
-                          "pcie-mmcfg", e->size);
     e->base_addr = addr;
-    memory_region_add_subregion(get_system_memory(), e->base_addr, &e->mmio);
+
+    memory_region_init_alias(&e->mapped_mmio, OBJECT(e),
+                             "pcie-mmcfg", &e->mmio,
+                             0, e->size);
+
+    memory_region_add_subregion(get_system_memory(), e->base_addr, &e->mapped_mmio);
 }
 
 void pcie_host_mmcfg_update(PCIExpressHost *e,
@@ -152,6 +162,7 @@ static const TypeInfo pcie_host_type_info = {
     .parent = TYPE_PCI_HOST_BRIDGE,
     .abstract = true,
     .instance_size = sizeof(PCIExpressHost),
+    .instance_finalize = pcie_host_instance_finalize,
 };
 
 static void pcie_host_register_types(void)
diff --git a/include/hw/pci/pcie_host.h b/include/hw/pci/pcie_host.h
index 1228e36..1656bde 100644
--- a/include/hw/pci/pcie_host.h
+++ b/include/hw/pci/pcie_host.h
@@ -41,6 +41,7 @@ struct PCIExpressHost {
 
     /* MMCONFIG mmio area */
     MemoryRegion mmio;
+    MemoryRegion mapped_mmio;
 };
 
 int pcie_host_init(PCIExpressHost *e);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (29 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 32/38] lsi: " Paolo Bonzini
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/esp-pci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c
index d7ec173..127868d 100644
--- a/hw/scsi/esp-pci.c
+++ b/hw/scsi/esp-pci.c
@@ -374,9 +374,9 @@ static int esp_pci_scsi_init(PCIDevice *dev)
     return 0;
 }
 
-static void esp_pci_scsi_uninit(PCIDevice *d)
+static void esp_pci_scsi_instance_finalize(Object *obj)
 {
-    PCIESPState *pci = PCI_ESP(d);
+    PCIESPState *pci = PCI_ESP(obj);
 
     memory_region_destroy(&pci->io);
 }
@@ -387,7 +387,6 @@ static void esp_pci_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = esp_pci_scsi_init;
-    k->exit = esp_pci_scsi_uninit;
     k->vendor_id = PCI_VENDOR_ID_AMD;
     k->device_id = PCI_DEVICE_ID_AMD_SCSI;
     k->revision = 0x10;
@@ -403,6 +402,7 @@ static const TypeInfo esp_pci_info = {
     .parent = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PCIESPState),
     .class_init = esp_pci_class_init,
+    .instance_finalize = esp_pci_scsi_instance_finalize,
 };
 
 typedef struct {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 32/38] lsi: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (30 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in " Paolo Bonzini
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/lsi53c895a.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 611f2aa..ee13308 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -2072,9 +2072,9 @@ static const VMStateDescription vmstate_lsi_scsi = {
     }
 };
 
-static void lsi_scsi_uninit(PCIDevice *d)
+static void lsi_scsi_instance_finalize(Object *obj)
 {
-    LSIState *s = LSI53C895A(d);
+    LSIState *s = LSI53C895A(obj);
 
     memory_region_destroy(&s->mmio_io);
     memory_region_destroy(&s->ram_io);
@@ -2134,7 +2134,6 @@ static void lsi_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->init = lsi_scsi_init;
-    k->exit = lsi_scsi_uninit;
     k->vendor_id = PCI_VENDOR_ID_LSI_LOGIC;
     k->device_id = PCI_DEVICE_ID_LSI_53C895A;
     k->class_id = PCI_CLASS_STORAGE_SCSI;
@@ -2149,6 +2148,7 @@ static const TypeInfo lsi_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(LSIState),
     .class_init    = lsi_class_init,
+    .instance_finalize = lsi_scsi_instance_finalize,
 };
 
 static void lsi53c895a_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (31 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 32/38] lsi: " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 34/38] usb-uhci: use " Paolo Bonzini
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/scsi/vmw_pvscsi.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index d42b359..9761b8e 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -1100,9 +1100,18 @@ pvscsi_uninit(PCIDevice *pci_dev)
     PVSCSIState *s = PVSCSI(pci_dev);
 
     trace_pvscsi_state("uninit");
-    qemu_bh_delete(s->completion_worker);
 
     pvscsi_cleanup_msi(s);
+}
+
+static void 
+pvscsi_instance_finalize(Object *obj)
+{
+    PVSCSIState *s = PVSCSI(obj);
+
+    trace_pvscsi_state("finalize");
+
+    qemu_bh_delete(s->completion_worker);
 
     memory_region_destroy(&s->io_space);
 }
@@ -1206,6 +1215,7 @@ static const TypeInfo pvscsi_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(PVSCSIState),
     .class_init    = pvscsi_class_init,
+    .instance_finalize = pvscsi_instance_finalize,
 };
 
 static void
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 34/38] usb-uhci: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (32 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in " Paolo Bonzini
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/usb/hcd-uhci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 578b949..821838a 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -1292,8 +1292,9 @@ static int usb_uhci_vt82c686b_initfn(PCIDevice *dev)
     return usb_uhci_common_initfn(dev);
 }
 
-static void usb_uhci_exit(PCIDevice *dev)
+static void usb_uhci_instance_finalize(Object *obj)
 {
+    PCIDevice *dev = PCI_DEVICE(obj);
     UHCIState *s = DO_UPCAST(UHCIState, dev, dev);
 
     memory_region_destroy(&s->io_bar);
@@ -1315,7 +1316,6 @@ static void uhci_class_init(ObjectClass *klass, void *data)
     UHCIInfo *info = data;
 
     k->init = info->initfn ? info->initfn : usb_uhci_common_initfn;
-    k->exit = info->unplug ? usb_uhci_exit : NULL;
     k->vendor_id = info->vendor_id;
     k->device_id = info->device_id;
     k->revision  = info->revision;
@@ -1402,6 +1402,7 @@ static void uhci_register_types(void)
         .instance_size = sizeof(UHCIState),
         .class_size    = sizeof(UHCIPCIDeviceClass),
         .class_init    = uhci_class_init,
+        .instance_finalize = usb_uhci_instance_finalize,
     };
     int i;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (33 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 34/38] usb-uhci: use " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use " Paolo Bonzini
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/virtio/virtio-pci.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ff6bed5..8ed6d52 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -997,8 +997,14 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
 {
     VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
     virtio_pci_stop_ioeventfd(proxy);
-    memory_region_destroy(&proxy->bar);
     msix_uninit_exclusive_bar(pci_dev);
+}
+
+static void virtio_pci_instance_finalize(Object *obj)
+{
+    PCIDevice *pci_dev = PCI_DEVICE(obj);
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(obj);
+    memory_region_destroy(&proxy->bar);
     msix_free(pci_dev);
 }
 
@@ -1031,6 +1037,7 @@ static const TypeInfo virtio_pci_info = {
     .instance_size = sizeof(VirtIOPCIProxy),
     .class_init    = virtio_pci_class_init,
     .class_size    = sizeof(VirtioPCIClass),
+    .instance_finalize = virtio_pci_instance_finalize,
     .abstract      = true,
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (34 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in " Paolo Bonzini
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/watchdog/wdt_i6300esb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/watchdog/wdt_i6300esb.c b/hw/watchdog/wdt_i6300esb.c
index 36d3887..b18f694 100644
--- a/hw/watchdog/wdt_i6300esb.c
+++ b/hw/watchdog/wdt_i6300esb.c
@@ -425,8 +425,9 @@ static int i6300esb_init(PCIDevice *dev)
     return 0;
 }
 
-static void i6300esb_exit(PCIDevice *dev)
+static void i6300esb_instance_finalize(Object *obj)
 {
+    PCIDevice *dev = PCI_DEVICE(obj);
     I6300State *d = DO_UPCAST(I6300State, dev, dev);
 
     memory_region_destroy(&d->io_mem);
@@ -445,7 +446,6 @@ static void i6300esb_class_init(ObjectClass *klass, void *data)
     k->config_read = i6300esb_config_read;
     k->config_write = i6300esb_config_write;
     k->init = i6300esb_init;
-    k->exit = i6300esb_exit;
     k->vendor_id = PCI_VENDOR_ID_INTEL;
     k->device_id = PCI_DEVICE_ID_INTEL_ESB_9;
     k->class_id = PCI_CLASS_SYSTEM_OTHER;
@@ -459,6 +459,7 @@ static const TypeInfo i6300esb_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(I6300State),
     .class_init    = i6300esb_class_init,
+    .instance_finalize = i6300esb_instance_finalize,
 };
 
 static void i6300esb_register_types(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in instance_finalize instead of exit
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (35 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize Paolo Bonzini
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/xen/xen_pt.c             | 10 ++++++++++
 hw/xen/xen_pt_config_init.c |  3 ---
 hw/xen/xen_pt_msi.c         |  8 +++++++-
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index ca2d460..592c472 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -806,7 +806,16 @@ static void xen_pt_unregister_device(PCIDevice *d)
         }
     }
 
+    xen_pt_msix_delete(s);
+}
+
+static void xen_pt_instance_finalize(Object *obj)
+{
+    PCIDevice *d = PCI_DEVICE(obj);
+    XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+
     /* delete all emulated config registers */
+    xen_pt_msix_free(s);
     xen_pt_config_delete(s);
 
     xen_pt_unregister_regions(s);
@@ -840,6 +849,7 @@ static const TypeInfo xen_pci_passthrough_info = {
     .parent = TYPE_PCI_DEVICE,
     .instance_size = sizeof(XenPCIPassthroughState),
     .class_init = xen_pci_passthrough_class_init,
+    .instance_finalize = xen_pt_instance_finalize,
 };
 
 static void xen_pci_passthrough_register_types(void)
diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 8ccc2e4..25f6334 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -1861,9 +1861,6 @@ void xen_pt_config_delete(XenPCIPassthroughState *s)
     struct XenPTReg *reg, *next_reg;
 
     /* free MSI/MSI-X info table */
-    if (s->msix) {
-        xen_pt_msix_delete(s);
-    }
     if (s->msi) {
         g_free(s->msi);
     }
diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
index 6fbe0cc..5bcbbf2 100644
--- a/hw/xen/xen_pt_msi.c
+++ b/hw/xen/xen_pt_msi.c
@@ -605,6 +605,13 @@ void xen_pt_msix_delete(XenPCIPassthroughState *s)
         return;
     }
 
+    memory_region_del_subregion(&s->bar[msix->bar_index], &msix->mmio);
+}
+
+void xen_pt_msix_free(XenPCIPassthroughState *s)
+{
+    XenPTMSIX *msix = s->msix;
+
     /* unmap the MSI-X memory mapped register area */
     if (msix->phys_iomem_base) {
         XEN_PT_LOG(&s->dev, "unmapping physical MSI-X table from %p\n",
@@ -613,7 +620,6 @@ void xen_pt_msix_delete(XenPCIPassthroughState *s)
                + msix->table_offset_adjust);
     }
 
-    memory_region_del_subregion(&s->bar[msix->bar_index], &msix->mmio);
     memory_region_destroy(&msix->mmio);
 
     g_free(s->msix);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (36 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in " Paolo Bonzini
@ 2013-09-03 12:33 ` Paolo Bonzini
  2013-09-16 16:35 ` [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-03 12:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst

They are currently in instance_initialize and instance_finalize.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/tpm/tpm_tis.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 6f0a4d2..7cb2a3a 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -855,6 +855,7 @@ static Property tpm_tis_properties[] = {
 
 static void tpm_tis_realizefn(DeviceState *dev, Error **errp)
 {
+    ISADevice *d = ISA_DEVICE(dev);
     TPMState *s = TPM(dev);
     TPMTISEmuState *tis = &s->s.tis;
 
@@ -881,26 +882,31 @@ static void tpm_tis_realizefn(DeviceState *dev, Error **errp)
 
     tis->bh = qemu_bh_new(tpm_tis_receive_bh, s);
 
-    isa_init_irq(&s->busdev, &tis->irq, tis->irq_num);
+    isa_init_irq(d, &tis->irq, tis->irq_num);
+    memory_region_add_subregion(isa_address_space(d), TPM_TIS_ADDR_BASE,
+                                &s->mmio);
+}
+
+static void tpm_tis_unrealizefn(DeviceState *dev, Error **errp)
+{
+    TPMState *s = TPM(dev);
+
+    memory_region_del_subregion(get_system_memory(), &s->mmio);
 }
 
 static void tpm_tis_initfn(Object *obj)
 {
-    ISADevice *dev = ISA_DEVICE(obj);
     TPMState *s = TPM(obj);
 
     memory_region_init_io(&s->mmio, OBJECT(s), &tpm_tis_memory_ops,
                           s, "tpm-tis-mmio",
                           TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
-    memory_region_add_subregion(isa_address_space(dev), TPM_TIS_ADDR_BASE,
-                                &s->mmio);
 }
 
 static void tpm_tis_uninitfn(Object *obj)
 {
     TPMState *s = TPM(obj);
 
-    memory_region_del_subregion(get_system_memory(), &s->mmio);
     memory_region_destroy(&s->mmio);
 }
 
@@ -909,6 +915,7 @@ static void tpm_tis_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->realize = tpm_tis_realizefn;
+    dc->unrealize = tpm_tis_unrealizefn;
     dc->props = tpm_tis_properties;
     dc->reset = tpm_tis_reset;
     dc->vmsd  = &vmstate_tpm_tis;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (37 preceding siblings ...)
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize Paolo Bonzini
@ 2013-09-16 16:35 ` Paolo Bonzini
  2013-09-17  6:44 ` Wenchao Xia
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-16 16:35 UTC (permalink / raw)
  To: mst; +Cc: qemu-devel

Il 03/09/2013 14:32, Paolo Bonzini ha scritto:
> QOM splits the destruction of a device in two phases:
> 
> - unrealize, also known as "exit" from qdev times, should isolate
>   the device from the guest.  After unrealize returns, the guest
>   should not be able to issue new requests.
> 
> - instance_finalize will reclaim the memory.  This is only called
>   after all requests terminate and drop the references on the
>   device.
> 
> Though overlooked, this is important even now: QEMU's little secret is
> that devices already do access memory out of the iothread mutex (with
> address_space_map/unmap and AIO), and this can be MMIO memory too
> through a bounce buffer.  This series prepares things so that, once
> we'll put the memory_region_ref/unref infrastructure to complete use,
> things will just work.
> 
> Of course this split will be particularly important for devices that
> will be able to do unlocked MMIO.
> 
> This series changes all PCI devices (the sole to support hotplug _and_
> use MemoryRegions) to do memory_region_del_subregion at unrealize time,
> and memory_region_destroy at instance_finalize time.  As it is mostly
> a PCI patch, it should go through mst's tree.

Ping.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (38 preceding siblings ...)
  2013-09-16 16:35 ` [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
@ 2013-09-17  6:44 ` Wenchao Xia
  2013-09-17 10:01   ` Paolo Bonzini
  2013-09-17  9:31 ` Michael S. Tsirkin
  2013-09-17 12:47 ` Michael S. Tsirkin
  41 siblings, 1 reply; 78+ messages in thread
From: Wenchao Xia @ 2013-09-17  6:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

于 2013/9/3 20:32, Paolo Bonzini 写道:
> QOM splits the destruction of a device in two phases:
>
> - unrealize, also known as "exit" from qdev times, should isolate
>   the device from the guest.  After unrealize returns, the guest
>   should not be able to issue new requests.
>
> - instance_finalize will reclaim the memory.  This is only called
>   after all requests terminate and drop the references on the
>   device.
>
> Though overlooked, this is important even now: QEMU's little secret is
> that devices already do access memory out of the iothread mutex (with
> address_space_map/unmap and AIO), and this can be MMIO memory too
> through a bounce buffer.  This series prepares things so that, once
> we'll put the memory_region_ref/unref infrastructure to complete use,
> things will just work.
>
> Of course this split will be particularly important for devices that
> will be able to do unlocked MMIO.
>
> This series changes all PCI devices (the sole to support hotplug _and_
> use MemoryRegions) to do memory_region_del_subregion at unrealize time,
> and memory_region_destroy at instance_finalize time.  As it is mostly
> a PCI patch, it should go through mst's tree.
>
> Paolo
>
> Paolo Bonzini (38):
>   qdev: document assumption that unrealize is followed by finalize
>   pci: split exit and finalize
>   ac97: use instance_finalize instead of exit
>   es1370: use instance_finalize instead of exit
>   hda: reclaim memory in instance_finalize instead of exit
>   serial: reclaim memory in instance_finalize instead of exit
>   tpci200: use instance_finalize instead of exit
>   pci-assign: reclaim memory in instance_finalize instead of exit
>   ahci: reclaim memory in instance_finalize instead of exit
>   msix: split msix_free from msix_uninit
>   cmd646: use instance_finalize instead of exit
>   ide/piix: use instance_finalize instead of exit
>   ide/via: use instance_finalize instead of exit
>   ivshmem: reclaim memory in instance_finalize instead of exit
>   pci-testdev: use instance_finalize instead of exit
>   vfio: reclaim memory in instance_finalize instead of exit
>   e1000: use instance_finalize instead of exit
>   eepro100: use instance_finalize instead of exit
>   ne2000: use instance_finalize instead of exit
>   pcnet: use instance_finalize instead of exit
>   rtl8139: use instance_finalize instead of exit
>   vmxnet3: reclaim memory in instance_finalize instead of exit
>   shpc: split shpc_free from shpc_cleanup
>   pci_bridge: split pci_bridge_free from pci_bridge_exitfn
>   pcie_aer: pcie_aer_exit really frees stuff
>   pci_bridge: reclaim memory in instance_finalize instead of exit
>   ioh4320: reclaim memory in instance_finalize instead of exit
>   xio3130-downstream: reclaim memory in instance_finalize instead of
>     exit
>   xio3130-upstream: reclaim memory in instance_finalize instead of exit
>   pcie: do not recreate mmcfg I/O region, use an alias instead
>   esp: use instance_finalize instead of exit
>   lsi: use instance_finalize instead of exit
>   pvscsi: reclaim memory in instance_finalize instead of exit
>   usb-uhci: use instance_finalize instead of exit
>   virtio-pci: reclaim memory in instance_finalize instead of exit
>   wdt_i6300esb: use instance_finalize instead of exit
>   xen_pt: reclaim memory in instance_finalize instead of exit
>   tpm: move add/del_subregion to realize/unrealize
>
>  hw/audio/ac97.c                    |  5 ++--
>  hw/audio/es1370.c                  |  5 ++--
>  hw/audio/intel-hda.c               |  8 ++++++
>  hw/char/serial-pci.c               | 24 ++++++++++++++++++
>  hw/char/tpci200.c                  |  5 ++--
>  hw/i386/kvm/pci-assign.c           |  8 ++++++
>  hw/ide/ahci.c                      |  2 +-
>  hw/ide/ahci.h                      |  2 +-
>  hw/ide/cmd646.c                    |  6 ++---
>  hw/ide/ich.c                       | 12 ++++++---
>  hw/ide/piix.c                      |  9 ++++---
>  hw/ide/via.c                       |  6 ++---
>  hw/misc/ivshmem.c                  | 13 +++++++---
>  hw/misc/pci-testdev.c              |  6 ++---
>  hw/misc/vfio.c                     | 52 +++++++++++++++++++++++++++++++++++---
>  hw/net/e1000.c                     |  6 ++---
>  hw/net/eepro100.c                  |  5 ++--
>  hw/net/ne2000.c                    |  5 ++--
>  hw/net/pcnet-pci.c                 |  6 ++---
>  hw/net/rtl8139.c                   |  6 ++---
>  hw/net/vmxnet3.c                   | 14 ++++++++--
>  hw/pci-bridge/i82801b11.c          |  1 +
>  hw/pci-bridge/ioh3420.c            | 11 +++++++-
>  hw/pci-bridge/pci_bridge_dev.c     | 13 +++++++++-
>  hw/pci-bridge/xio3130_downstream.c | 11 +++++++-
>  hw/pci-bridge/xio3130_upstream.c   | 11 +++++++-
>  hw/pci/msix.c                      | 22 +++++++++++-----
>  hw/pci/pci.c                       | 15 ++++++++---
>  hw/pci/pci_bridge.c                |  5 ++++
>  hw/pci/pcie_aer.c                  |  3 ++-
>  hw/pci/pcie_host.c                 | 23 ++++++++++++-----
>  hw/pci/shpc.c                      |  8 +++++-
>  hw/scsi/esp-pci.c                  |  6 ++---
>  hw/scsi/lsi53c895a.c               |  6 ++---
>  hw/scsi/vmw_pvscsi.c               | 12 ++++++++-
>  hw/tpm/tpm_tis.c                   | 17 +++++++++----
>  hw/usb/hcd-uhci.c                  |  5 ++--
>  hw/virtio/virtio-pci.c             | 10 +++++++-
>  hw/watchdog/wdt_i6300esb.c         |  5 ++--
>  hw/xen/xen_pt.c                    | 10 ++++++++
>  hw/xen/xen_pt_config_init.c        |  3 ---
>  hw/xen/xen_pt_msi.c                |  8 +++++-
>  include/hw/pci/msix.h              |  1 +
>  include/hw/pci/pci_bridge.h        |  1 +
>  include/hw/pci/pcie_aer.h          |  2 +-
>  include/hw/pci/pcie_host.h         |  1 +
>  include/hw/pci/shpc.h              |  1 +
>  include/hw/qdev-core.h             |  4 +++
>  48 files changed, 329 insertions(+), 91 deletions(-)
>
Just one question: where is the caller of .instance_finalize(), did
I missed that patch?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
@ 2013-09-17  9:00   ` Michael S. Tsirkin
  0 siblings, 0 replies; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:32:52PM +0200, Paolo Bonzini wrote:
> This becomes important when undoing realize's initializations is split
> in two places (unrealize and exit).
> 
> The way to fix this could be to split realize further into "alloc" (done
> once) and "realize" (can be undone).

Or alloc + init ?

> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/hw/qdev-core.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 46972f4..d840f06 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -86,6 +86,10 @@ struct VMStateDescription;
>   * object_initialize() in their own #TypeInfo.instance_init and forward the
>   * realization events appropriately.
>   *
> + * Note that for now it is not possible to unrealize a device and then
> + * re-realize it.  While this may change in the future, it is a valid
> + * assumption for now.
> + *
>   * The @init callback is considered private to a particular bus implementation
>   * (immediate abstract child types of TYPE_DEVICE). Derived leaf types set an
>   * "init" callback on their parent class instead.
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 02/38] pci: split exit and finalize
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
@ 2013-09-17  9:16   ` Michael S. Tsirkin
  2013-09-17  9:56     ` Paolo Bonzini
  2013-09-17 10:06   ` Michael S. Tsirkin
  1 sibling, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:32:53PM +0200, Paolo Bonzini wrote:
> When converting devices to use out-of-BQL memory access, destruction
> needs to be done in two phases.  First, the device is unrealized;
> at this point, pending memory accesses can still be completed, but
> no new accesses will be started.  The second part is freeing the
> device, which happens only after the reference count drops to zero;
> this means that all memory accesses are complete.
> 
> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/pci/pci.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4c004f5..bd084c7 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -787,6 +787,16 @@ static void pci_config_free(PCIDevice *pci_dev)
>      g_free(pci_dev->used);
>  }
>  
> +static void pci_device_instance_finalize(Object *obj)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(obj);
> +
> +    qemu_free_irqs(pci_dev->irq);
> +
> +    address_space_destroy(&pci_dev->bus_master_as);
> +    memory_region_destroy(&pci_dev->bus_master_enable_region);
> +}
> +
>  /* -1 for devfn means auto assign */
>  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>                                           const char *name, int devfn)
> @@ -875,12 +885,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>  
>  static void do_pci_unregister_device(PCIDevice *pci_dev)
>  {
> -    qemu_free_irqs(pci_dev->irq);

I don't get this one.
Why do we want to keep irqs about?
If they manage to send an interrupt to guest *somehow*
guest will hang with no way to clear.

>      pci_dev->bus->devices[pci_dev->devfn] = NULL;
>      pci_config_free(pci_dev);
> -
> -    address_space_destroy(&pci_dev->bus_master_as);
> -    memory_region_destroy(&pci_dev->bus_master_enable_region);

Interesting.
So you are saying it's important to keep MMIO MRs around until finalize,
it's not enough that that they are not a subregion of anything?  If not,
is e.g. pcie_host_mmcfg_update buggy?


>  }
>  
>  static void pci_unregister_io_regions(PCIDevice *pci_dev)
> @@ -2252,6 +2258,7 @@ static const TypeInfo pci_device_type_info = {
>      .abstract = true,
>      .class_size = sizeof(PCIDeviceClass),
>      .class_init = pci_device_class_init,
> +    .instance_finalize = pci_device_instance_finalize,
>  };
>  
>  static void pci_register_types(void)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
@ 2013-09-17  9:21   ` Michael S. Tsirkin
  2013-09-17  9:56     ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:33:01PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/misc/vfio.c         |  1 +
>  hw/net/vmxnet3.c       |  3 +++
>  hw/pci/msix.c          | 22 ++++++++++++++++------
>  hw/virtio/virtio-pci.c |  1 +
>  include/hw/pci/msix.h  |  1 +
>  5 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
> index a1c08fb..1311d0e 100644
> --- a/hw/misc/vfio.c
> +++ b/hw/misc/vfio.c
> @@ -2144,6 +2144,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
>      if (vdev->msix) {
>          msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
>                      &vdev->bars[vdev->msix->pba_bar].mem);
> +        msix_free(&vdev->pdev);
>      }
>  }
>  
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 49c2466..2a79e52 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -1979,6 +1979,7 @@ vmxnet3_init_msix(VMXNET3State *s)
>          if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
>              VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
>              msix_uninit(d, &s->msix_bar, &s->msix_bar);
> +            msix_free(d);
>              s->msix_used = false;
>          } else {
>              s->msix_used = true;
> @@ -1996,6 +1997,8 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
>          msix_vector_unuse(d, VMXNET3_MAX_INTRS);
>          msix_uninit(d, &s->msix_bar, &s->msix_bar);
>      }
> +
> +    msix_free(d);
>  }
>  
>  #define VMXNET3_MSI_NUM_VECTORS   (1)
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 3430770..0618e3b 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -359,16 +359,26 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
>      msix_free_irq_entries(dev);
>      dev->msix_entries_nr = 0;
>      memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
> -    memory_region_destroy(&dev->msix_pba_mmio);
> -    g_free(dev->msix_pba);
> -    dev->msix_pba = NULL;
>      memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
> -    memory_region_destroy(&dev->msix_table_mmio);
> -    g_free(dev->msix_table);
> +    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
> +}
> +
> +void msix_free(PCIDevice *dev)
> +{
> +    if (dev->msix_pba) {
> +        memory_region_destroy(&dev->msix_pba_mmio);
> +        g_free(dev->msix_pba);
> +    }
> +    dev->msix_pba = NULL;
> +
> +    if (dev->msix_table) {
> +        memory_region_destroy(&dev->msix_table_mmio);
> +        g_free(dev->msix_table);
> +    }
>      dev->msix_table = NULL;
> +
>      g_free(dev->msix_entry_used);
>      dev->msix_entry_used = NULL;
> -    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>  }
>  

I dislike the if (...) here. This only works as long
as all data is pointers.
So I'd prefer QEMU_PCI_CAP_MSIX to be cleared in msix_free.
This way msix_free can just check QEMU_PCI_CAP_MSIX
and return if not there.

>  void msix_uninit_exclusive_bar(PCIDevice *dev)
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index f2c489b..ff6bed5 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -999,6 +999,7 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
>      virtio_pci_stop_ioeventfd(proxy);
>      memory_region_destroy(&proxy->bar);
>      msix_uninit_exclusive_bar(pci_dev);
> +    msix_free(pci_dev);
>  }
>  
>  static void virtio_pci_reset(DeviceState *qdev)
> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
> index 954d82b..34e3ba2 100644
> --- a/include/hw/pci/msix.h
> +++ b/include/hw/pci/msix.h
> @@ -18,6 +18,7 @@ void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int len);
>  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar,
>                   MemoryRegion *pba_bar);
>  void msix_uninit_exclusive_bar(PCIDevice *dev);
> +void msix_free(PCIDevice *dev);
>  
>  unsigned int msix_nr_vectors_allocated(const PCIDevice *dev);
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
@ 2013-09-17  9:24   ` Michael S. Tsirkin
  2013-09-17  9:58     ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:33:14PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/pci-bridge/pci_bridge_dev.c | 2 ++
>  hw/pci/shpc.c                  | 8 +++++++-
>  include/hw/pci/shpc.h          | 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
> index a9392c7..97dfc49 100644
> --- a/hw/pci-bridge/pci_bridge_dev.c
> +++ b/hw/pci-bridge/pci_bridge_dev.c
> @@ -79,6 +79,7 @@ msi_error:
>      slotid_cap_cleanup(dev);
>  slotid_error:
>      shpc_cleanup(dev, &bridge_dev->bar);
> +    shpc_free(dev);
>  shpc_error:
>      memory_region_destroy(&bridge_dev->bar);
>      pci_bridge_exitfn(dev);
> @@ -94,6 +95,7 @@ static void pci_bridge_dev_exitfn(PCIDevice *dev)
>      }
>      slotid_cap_cleanup(dev);
>      shpc_cleanup(dev, &bridge_dev->bar);
> +    shpc_free(dev);
>      memory_region_destroy(&bridge_dev->bar);
>      pci_bridge_exitfn(dev);
>  }
> diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
> index eb092fd..cefaf69 100644
> --- a/hw/pci/shpc.c
> +++ b/hw/pci/shpc.c
> @@ -630,15 +630,21 @@ int shpc_bar_size(PCIDevice *d)
>  void shpc_cleanup(PCIDevice *d, MemoryRegion *bar)
>  {
>      SHPCDevice *shpc = d->shpc;
> +    /* TODO: cleanup config space changes? */
>      d->cap_present &= ~QEMU_PCI_CAP_SHPC;
>      memory_region_del_subregion(bar, &shpc->mmio);
> -    /* TODO: cleanup config space changes? */
> +}
> +
> +void shpc_free(PCIDevice *d)
> +{
> +    SHPCDevice *shpc = d->shpc;
>      g_free(shpc->config);
>      g_free(shpc->cmask);
>      g_free(shpc->wmask);
>      g_free(shpc->w1cmask);
>      memory_region_destroy(&shpc->mmio);
>      g_free(shpc);
> +    d->shpc = NULL;

Why is this line here btw?

>  }
>  
>  void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> diff --git a/include/hw/pci/shpc.h b/include/hw/pci/shpc.h
> index 467911a..5f27431 100644
> --- a/include/hw/pci/shpc.h
> +++ b/include/hw/pci/shpc.h
> @@ -39,6 +39,7 @@ void shpc_reset(PCIDevice *d);
>  int shpc_bar_size(PCIDevice *dev);
>  int shpc_init(PCIDevice *dev, PCIBus *sec_bus, MemoryRegion *bar, unsigned off);
>  void shpc_cleanup(PCIDevice *dev, MemoryRegion *bar);
> +void shpc_free(PCIDevice *d);
>  void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int len);
>  
>  extern VMStateInfo shpc_vmstate_info;
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 17/38] e1000: use instance_finalize instead of exit
  2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
@ 2013-09-17  9:27   ` Michael S. Tsirkin
  2013-09-17 10:13     ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:33:08PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/net/e1000.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index f5ebed4..55fb062 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -1310,9 +1310,9 @@ e1000_cleanup(NetClientState *nc)
>  }
>  
>  static void
> -pci_e1000_uninit(PCIDevice *dev)
> +pci_e1000_instance_finalize(Object *obj)
>  {
> -    E1000State *d = E1000(dev);
> +    E1000State *d = E1000(obj);
>  
>      timer_del(d->autoneg_timer);
>      timer_free(d->autoneg_timer);

So this looks wrong.
This cancels timers after pci device has been destroyed,
so meanwhile timers can run and send interrupts.


> @@ -1394,7 +1394,6 @@ static void e1000_class_init(ObjectClass *klass, void *data)
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>  
>      k->init = pci_e1000_init;
> -    k->exit = pci_e1000_uninit;
>      k->romfile = "efi-e1000.rom";
>      k->vendor_id = PCI_VENDOR_ID_INTEL;
>      k->device_id = E1000_DEVID;
> @@ -1412,6 +1411,7 @@ static const TypeInfo e1000_info = {
>      .parent        = TYPE_PCI_DEVICE,
>      .instance_size = sizeof(E1000State),
>      .class_init    = e1000_class_init,
> +    .instance_finalize = pci_e1000_instance_finalize,
>  };
>  
>  static void e1000_register_types(void)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (39 preceding siblings ...)
  2013-09-17  6:44 ` Wenchao Xia
@ 2013-09-17  9:31 ` Michael S. Tsirkin
  2013-09-17 12:47 ` Michael S. Tsirkin
  41 siblings, 0 replies; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17  9:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:32:51PM +0200, Paolo Bonzini wrote:
> QOM splits the destruction of a device in two phases:
> 
> - unrealize, also known as "exit" from qdev times, should isolate
>   the device from the guest.  After unrealize returns, the guest
>   should not be able to issue new requests.
> 
> - instance_finalize will reclaim the memory.  This is only called
>   after all requests terminate and drop the references on the
>   device.
> 
> Though overlooked, this is important even now: QEMU's little secret is
> that devices already do access memory out of the iothread mutex (with
> address_space_map/unmap and AIO), and this can be MMIO memory too
> through a bounce buffer.  This series prepares things so that, once
> we'll put the memory_region_ref/unref infrastructure to complete use,
> things will just work.
> 
> Of course this split will be particularly important for devices that
> will be able to do unlocked MMIO.
> 
> This series changes all PCI devices (the sole to support hotplug _and_
> use MemoryRegions) to do memory_region_del_subregion at unrealize time,
> and memory_region_destroy at instance_finalize time.  As it is mostly
> a PCI patch, it should go through mst's tree.
> 
> Paolo

So lots of devices modify guest state until
you stop them, and ATM we stop them
through the exit function.

For this reason I think you can't just change
all devices to do all cleanup in finalize, you
need to split it up.

Sents comments on e.g. e1000 but it applies widely.

> Paolo Bonzini (38):
>   qdev: document assumption that unrealize is followed by finalize
>   pci: split exit and finalize
>   ac97: use instance_finalize instead of exit
>   es1370: use instance_finalize instead of exit
>   hda: reclaim memory in instance_finalize instead of exit
>   serial: reclaim memory in instance_finalize instead of exit
>   tpci200: use instance_finalize instead of exit
>   pci-assign: reclaim memory in instance_finalize instead of exit
>   ahci: reclaim memory in instance_finalize instead of exit
>   msix: split msix_free from msix_uninit
>   cmd646: use instance_finalize instead of exit
>   ide/piix: use instance_finalize instead of exit
>   ide/via: use instance_finalize instead of exit
>   ivshmem: reclaim memory in instance_finalize instead of exit
>   pci-testdev: use instance_finalize instead of exit
>   vfio: reclaim memory in instance_finalize instead of exit
>   e1000: use instance_finalize instead of exit
>   eepro100: use instance_finalize instead of exit
>   ne2000: use instance_finalize instead of exit
>   pcnet: use instance_finalize instead of exit
>   rtl8139: use instance_finalize instead of exit
>   vmxnet3: reclaim memory in instance_finalize instead of exit
>   shpc: split shpc_free from shpc_cleanup
>   pci_bridge: split pci_bridge_free from pci_bridge_exitfn
>   pcie_aer: pcie_aer_exit really frees stuff
>   pci_bridge: reclaim memory in instance_finalize instead of exit
>   ioh4320: reclaim memory in instance_finalize instead of exit
>   xio3130-downstream: reclaim memory in instance_finalize instead of
>     exit
>   xio3130-upstream: reclaim memory in instance_finalize instead of exit
>   pcie: do not recreate mmcfg I/O region, use an alias instead
>   esp: use instance_finalize instead of exit
>   lsi: use instance_finalize instead of exit
>   pvscsi: reclaim memory in instance_finalize instead of exit
>   usb-uhci: use instance_finalize instead of exit
>   virtio-pci: reclaim memory in instance_finalize instead of exit
>   wdt_i6300esb: use instance_finalize instead of exit
>   xen_pt: reclaim memory in instance_finalize instead of exit
>   tpm: move add/del_subregion to realize/unrealize
> 
>  hw/audio/ac97.c                    |  5 ++--
>  hw/audio/es1370.c                  |  5 ++--
>  hw/audio/intel-hda.c               |  8 ++++++
>  hw/char/serial-pci.c               | 24 ++++++++++++++++++
>  hw/char/tpci200.c                  |  5 ++--
>  hw/i386/kvm/pci-assign.c           |  8 ++++++
>  hw/ide/ahci.c                      |  2 +-
>  hw/ide/ahci.h                      |  2 +-
>  hw/ide/cmd646.c                    |  6 ++---
>  hw/ide/ich.c                       | 12 ++++++---
>  hw/ide/piix.c                      |  9 ++++---
>  hw/ide/via.c                       |  6 ++---
>  hw/misc/ivshmem.c                  | 13 +++++++---
>  hw/misc/pci-testdev.c              |  6 ++---
>  hw/misc/vfio.c                     | 52 +++++++++++++++++++++++++++++++++++---
>  hw/net/e1000.c                     |  6 ++---
>  hw/net/eepro100.c                  |  5 ++--
>  hw/net/ne2000.c                    |  5 ++--
>  hw/net/pcnet-pci.c                 |  6 ++---
>  hw/net/rtl8139.c                   |  6 ++---
>  hw/net/vmxnet3.c                   | 14 ++++++++--
>  hw/pci-bridge/i82801b11.c          |  1 +
>  hw/pci-bridge/ioh3420.c            | 11 +++++++-
>  hw/pci-bridge/pci_bridge_dev.c     | 13 +++++++++-
>  hw/pci-bridge/xio3130_downstream.c | 11 +++++++-
>  hw/pci-bridge/xio3130_upstream.c   | 11 +++++++-
>  hw/pci/msix.c                      | 22 +++++++++++-----
>  hw/pci/pci.c                       | 15 ++++++++---
>  hw/pci/pci_bridge.c                |  5 ++++
>  hw/pci/pcie_aer.c                  |  3 ++-
>  hw/pci/pcie_host.c                 | 23 ++++++++++++-----
>  hw/pci/shpc.c                      |  8 +++++-
>  hw/scsi/esp-pci.c                  |  6 ++---
>  hw/scsi/lsi53c895a.c               |  6 ++---
>  hw/scsi/vmw_pvscsi.c               | 12 ++++++++-
>  hw/tpm/tpm_tis.c                   | 17 +++++++++----
>  hw/usb/hcd-uhci.c                  |  5 ++--
>  hw/virtio/virtio-pci.c             | 10 +++++++-
>  hw/watchdog/wdt_i6300esb.c         |  5 ++--
>  hw/xen/xen_pt.c                    | 10 ++++++++
>  hw/xen/xen_pt_config_init.c        |  3 ---
>  hw/xen/xen_pt_msi.c                |  8 +++++-
>  include/hw/pci/msix.h              |  1 +
>  include/hw/pci/pci_bridge.h        |  1 +
>  include/hw/pci/pcie_aer.h          |  2 +-
>  include/hw/pci/pcie_host.h         |  1 +
>  include/hw/pci/shpc.h              |  1 +
>  include/hw/qdev-core.h             |  4 +++
>  48 files changed, 329 insertions(+), 91 deletions(-)
> 
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 02/38] pci: split exit and finalize
  2013-09-17  9:16   ` Michael S. Tsirkin
@ 2013-09-17  9:56     ` Paolo Bonzini
  2013-09-17 10:23       ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 11:16, Michael S. Tsirkin ha scritto:
> On Tue, Sep 03, 2013 at 02:32:53PM +0200, Paolo Bonzini wrote:
>> When converting devices to use out-of-BQL memory access, destruction
>> needs to be done in two phases.  First, the device is unrealized;
>> at this point, pending memory accesses can still be completed, but
>> no new accesses will be started.  The second part is freeing the
>> device, which happens only after the reference count drops to zero;
>> this means that all memory accesses are complete.
>>
>> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/pci/pci.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 4c004f5..bd084c7 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -787,6 +787,16 @@ static void pci_config_free(PCIDevice *pci_dev)
>>      g_free(pci_dev->used);
>>  }
>>  
>> +static void pci_device_instance_finalize(Object *obj)
>> +{
>> +    PCIDevice *pci_dev = PCI_DEVICE(obj);
>> +
>> +    qemu_free_irqs(pci_dev->irq);
>> +
>> +    address_space_destroy(&pci_dev->bus_master_as);
>> +    memory_region_destroy(&pci_dev->bus_master_enable_region);
>> +}
>> +
>>  /* -1 for devfn means auto assign */
>>  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>                                           const char *name, int devfn)
>> @@ -875,12 +885,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>  
>>  static void do_pci_unregister_device(PCIDevice *pci_dev)
>>  {
>> -    qemu_free_irqs(pci_dev->irq);
> 
> I don't get this one.
> Why do we want to keep irqs about?
> If they manage to send an interrupt to guest *somehow*
> guest will hang with no way to clear.

I can leave this here for now, since IRQs are always triggered under the
BQL.  But I think it's cleaner to do all freeing in instance_finalize
(actually that includes pci_config_free that I somehow missed).

>>      pci_dev->bus->devices[pci_dev->devfn] = NULL;
>>      pci_config_free(pci_dev);
>> -
>> -    address_space_destroy(&pci_dev->bus_master_as);
>> -    memory_region_destroy(&pci_dev->bus_master_enable_region);
> 
> Interesting.
> So you are saying it's important to keep MMIO MRs around until finalize,
> it's not enough that that they are not a subregion of anything?

Yes.  do_pci_unregister_device marks the point where the guest will not
be able to submit new requests to the device, but there may be previous
requests pending. because you could have something like this:

       VCPU 1                    VCPU 2
       ----------------------------------------------------
       start asynchronous I/O
        address_space_map
         address_space_translate
          memory_region_ref
           object_ref
       ** releases BQL
                                  eject device
                                   object_unparent
                                    my_device_exit
                                     memory_region_del_subregion
                                     ** cannot yet destroy!!
                                     ** address_space_unmap will use it
       ** gets BQL again
       asynchronous I/O ends
        address_space_unmap
         memory_region_unref
          object_unref
           instance_finalize
            memory_region_destroy

In RCU terms, do_pci_unregister_device is "removal", while
instance_finalize is "reclamation", but this is not yet getting
RCU-based MMIO dispatch into the picture; it's all using the BQL.  In
fact you could even have just one VCPU that kicks the IO and also ejects
the device, but it's more easily understood if you separate the two actions.

While it generally means the guest is buggy or malicious, of course it
must be handled correctly.

> If not, is e.g. pcie_host_mmcfg_update buggy?

See patch "pcie: do not recreate mmcfg I/O region, use an alias instead"

Paolo

> 
>>  }
>>  
>>  static void pci_unregister_io_regions(PCIDevice *pci_dev)
>> @@ -2252,6 +2258,7 @@ static const TypeInfo pci_device_type_info = {
>>      .abstract = true,
>>      .class_size = sizeof(PCIDeviceClass),
>>      .class_init = pci_device_class_init,
>> +    .instance_finalize = pci_device_instance_finalize,
>>  };
>>  
>>  static void pci_register_types(void)
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit
  2013-09-17  9:21   ` Michael S. Tsirkin
@ 2013-09-17  9:56     ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 11:21, Michael S. Tsirkin ha scritto:
> On Tue, Sep 03, 2013 at 02:33:01PM +0200, Paolo Bonzini wrote:
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/misc/vfio.c         |  1 +
>>  hw/net/vmxnet3.c       |  3 +++
>>  hw/pci/msix.c          | 22 ++++++++++++++++------
>>  hw/virtio/virtio-pci.c |  1 +
>>  include/hw/pci/msix.h  |  1 +
>>  5 files changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>> index a1c08fb..1311d0e 100644
>> --- a/hw/misc/vfio.c
>> +++ b/hw/misc/vfio.c
>> @@ -2144,6 +2144,7 @@ static void vfio_teardown_msi(VFIODevice *vdev)
>>      if (vdev->msix) {
>>          msix_uninit(&vdev->pdev, &vdev->bars[vdev->msix->table_bar].mem,
>>                      &vdev->bars[vdev->msix->pba_bar].mem);
>> +        msix_free(&vdev->pdev);
>>      }
>>  }
>>  
>> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
>> index 49c2466..2a79e52 100644
>> --- a/hw/net/vmxnet3.c
>> +++ b/hw/net/vmxnet3.c
>> @@ -1979,6 +1979,7 @@ vmxnet3_init_msix(VMXNET3State *s)
>>          if (!vmxnet3_use_msix_vectors(s, VMXNET3_MAX_INTRS)) {
>>              VMW_WRPRN("Failed to use MSI-X vectors, error %d", res);
>>              msix_uninit(d, &s->msix_bar, &s->msix_bar);
>> +            msix_free(d);
>>              s->msix_used = false;
>>          } else {
>>              s->msix_used = true;
>> @@ -1996,6 +1997,8 @@ vmxnet3_cleanup_msix(VMXNET3State *s)
>>          msix_vector_unuse(d, VMXNET3_MAX_INTRS);
>>          msix_uninit(d, &s->msix_bar, &s->msix_bar);
>>      }
>> +
>> +    msix_free(d);
>>  }
>>  
>>  #define VMXNET3_MSI_NUM_VECTORS   (1)
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index 3430770..0618e3b 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -359,16 +359,26 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
>>      msix_free_irq_entries(dev);
>>      dev->msix_entries_nr = 0;
>>      memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
>> -    memory_region_destroy(&dev->msix_pba_mmio);
>> -    g_free(dev->msix_pba);
>> -    dev->msix_pba = NULL;
>>      memory_region_del_subregion(table_bar, &dev->msix_table_mmio);
>> -    memory_region_destroy(&dev->msix_table_mmio);
>> -    g_free(dev->msix_table);
>> +    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>> +}
>> +
>> +void msix_free(PCIDevice *dev)
>> +{
>> +    if (dev->msix_pba) {
>> +        memory_region_destroy(&dev->msix_pba_mmio);
>> +        g_free(dev->msix_pba);
>> +    }
>> +    dev->msix_pba = NULL;
>> +
>> +    if (dev->msix_table) {
>> +        memory_region_destroy(&dev->msix_table_mmio);
>> +        g_free(dev->msix_table);
>> +    }
>>      dev->msix_table = NULL;
>> +
>>      g_free(dev->msix_entry_used);
>>      dev->msix_entry_used = NULL;
>> -    dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
>>  }
>>  
> 
> I dislike the if (...) here. This only works as long
> as all data is pointers.
> So I'd prefer QEMU_PCI_CAP_MSIX to be cleared in msix_free.
> This way msix_free can just check QEMU_PCI_CAP_MSIX
> and return if not there.

Makes sense.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup
  2013-09-17  9:24   ` Michael S. Tsirkin
@ 2013-09-17  9:58     ` Paolo Bonzini
  2013-09-17 10:03       ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17  9:58 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 11:24, Michael S. Tsirkin ha scritto:
>> @@ -630,15 +630,21 @@ int shpc_bar_size(PCIDevice *d)
>>  void shpc_cleanup(PCIDevice *d, MemoryRegion *bar)
>>  {
>>      SHPCDevice *shpc = d->shpc;
>> +    /* TODO: cleanup config space changes? */
>>      d->cap_present &= ~QEMU_PCI_CAP_SHPC;
>>      memory_region_del_subregion(bar, &shpc->mmio);
>> -    /* TODO: cleanup config space changes? */
>> +}
>> +
>> +void shpc_free(PCIDevice *d)
>> +{
>> +    SHPCDevice *shpc = d->shpc;
>>      g_free(shpc->config);
>>      g_free(shpc->cmask);
>>      g_free(shpc->wmask);
>>      g_free(shpc->w1cmask);
>>      memory_region_destroy(&shpc->mmio);
>>      g_free(shpc);
>> +    d->shpc = NULL;

I dislike having random dangling pointers.  But it's not that bad if I
move the clearing of QEMU_PCI_CAP_SHPC from shpc_cleanup to shpc_free.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17  6:44 ` Wenchao Xia
@ 2013-09-17 10:01   ` Paolo Bonzini
  2013-09-20  6:16     ` Wenchao Xia
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 10:01 UTC (permalink / raw)
  To: Wenchao Xia; +Cc: qemu-devel, mst

Il 17/09/2013 08:44, Wenchao Xia ha scritto:
> Just one question: where is the caller of .instance_finalize(), did
> I missed that patch?

It's called by qom/object.c.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup
  2013-09-17  9:58     ` Paolo Bonzini
@ 2013-09-17 10:03       ` Michael S. Tsirkin
  0 siblings, 0 replies; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 10:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 11:58:12AM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 11:24, Michael S. Tsirkin ha scritto:
> >> @@ -630,15 +630,21 @@ int shpc_bar_size(PCIDevice *d)
> >>  void shpc_cleanup(PCIDevice *d, MemoryRegion *bar)
> >>  {
> >>      SHPCDevice *shpc = d->shpc;
> >> +    /* TODO: cleanup config space changes? */
> >>      d->cap_present &= ~QEMU_PCI_CAP_SHPC;
> >>      memory_region_del_subregion(bar, &shpc->mmio);
> >> -    /* TODO: cleanup config space changes? */
> >> +}
> >> +
> >> +void shpc_free(PCIDevice *d)
> >> +{
> >> +    SHPCDevice *shpc = d->shpc;
> >>      g_free(shpc->config);
> >>      g_free(shpc->cmask);
> >>      g_free(shpc->wmask);
> >>      g_free(shpc->w1cmask);
> >>      memory_region_destroy(&shpc->mmio);
> >>      g_free(shpc);
> >> +    d->shpc = NULL;
> 
> I dislike having random dangling pointers.

I don't care much (though there's something to be
said for not initializing memory you don't expect to be
used - it's a compact way to document no one will look at it).

But if this is changed I'd like to do such changes in
a separate patch.

>  But it's not that bad if I
> move the clearing of QEMU_PCI_CAP_SHPC from shpc_cleanup to shpc_free.
> 
> Paolo

Yea.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 02/38] pci: split exit and finalize
  2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
  2013-09-17  9:16   ` Michael S. Tsirkin
@ 2013-09-17 10:06   ` Michael S. Tsirkin
  1 sibling, 0 replies; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 10:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:32:53PM +0200, Paolo Bonzini wrote:
> When converting devices to use out-of-BQL memory access, destruction
> needs to be done in two phases.  First, the device is unrealized;
> at this point, pending memory accesses can still be completed, but
> no new accesses will be started.  The second part is freeing the
> device, which happens only after the reference count drops to zero;
> this means that all memory accesses are complete.
> 
> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/pci/pci.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4c004f5..bd084c7 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -787,6 +787,16 @@ static void pci_config_free(PCIDevice *pci_dev)
>      g_free(pci_dev->used);
>  }
>  
> +static void pci_device_instance_finalize(Object *obj)
> +{
> +    PCIDevice *pci_dev = PCI_DEVICE(obj);
> +
> +    qemu_free_irqs(pci_dev->irq);
> +
> +    address_space_destroy(&pci_dev->bus_master_as);
> +    memory_region_destroy(&pci_dev->bus_master_enable_region);
> +}
> +
>  /* -1 for devfn means auto assign */
>  static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>                                           const char *name, int devfn)

Actually this isn't enough,
memory_region_destroy is called from
pci_del_option_rom too.

> @@ -875,12 +885,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>  
>  static void do_pci_unregister_device(PCIDevice *pci_dev)
>  {
> -    qemu_free_irqs(pci_dev->irq);
>      pci_dev->bus->devices[pci_dev->devfn] = NULL;
>      pci_config_free(pci_dev);
> -
> -    address_space_destroy(&pci_dev->bus_master_as);
> -    memory_region_destroy(&pci_dev->bus_master_enable_region);
>  }
>  
>  static void pci_unregister_io_regions(PCIDevice *pci_dev)
> @@ -2252,6 +2258,7 @@ static const TypeInfo pci_device_type_info = {
>      .abstract = true,
>      .class_size = sizeof(PCIDeviceClass),
>      .class_init = pci_device_class_init,
> +    .instance_finalize = pci_device_instance_finalize,
>  };
>  
>  static void pci_register_types(void)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 17/38] e1000: use instance_finalize instead of exit
  2013-09-17  9:27   ` Michael S. Tsirkin
@ 2013-09-17 10:13     ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 10:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 11:27, Michael S. Tsirkin ha scritto:
>> >  static void
>> > -pci_e1000_uninit(PCIDevice *dev)
>> > +pci_e1000_instance_finalize(Object *obj)
>> >  {
>> > -    E1000State *d = E1000(dev);
>> > +    E1000State *d = E1000(obj);
>> >  
>> >      timer_del(d->autoneg_timer);
>> >      timer_free(d->autoneg_timer);
> So this looks wrong.
> This cancels timers after pci device has been destroyed,
> so meanwhile timers can run and send interrupts.

There are definitely cases where the timer deals with pending I/O and
has to run after the device has been removed from guest access.  This is
_not_ yet the point of destruction; the connection to the host backend
still exists in particular (it is only dropped by
object_property_del_all, which is called right after instance_finalize).

It should not be a problem for a device to raise an interrupt after
pci_do_unregister_device; it should go nowhere.  If it is passed to the
guest, it's a bug that we have to fix.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 02/38] pci: split exit and finalize
  2013-09-17  9:56     ` Paolo Bonzini
@ 2013-09-17 10:23       ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 10:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael S. Tsirkin

Il 17/09/2013 11:56, Paolo Bonzini ha scritto:
> Yes.  do_pci_unregister_device marks the point where the guest will not
> be able to submit new requests to the device, but there may be previous
> requests pending. because you could have something like this:

Michael pointed out offlist that the previous example involved the
address_space_map bounce buffer.

Here is a simpler one that doesn't rely on it:

       VCPU 1                    VCPU 2
       ----------------------------------------------------
       start asynchronous I/O
        pci_dma_sglist_init
         object_ref
       ** releases BQL
                                  eject device
                                   object_unparent
                                    my_device_exit
                                     memory_region_del_subregion
                                     ** cannot yet destroy!!
                                     ** address_space_unmap will use it
       ** gets BQL again
       asynchronous I/O ends
        qemu_sglist_destroy
         object_unref
          instance_finalize

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
                   ` (40 preceding siblings ...)
  2013-09-17  9:31 ` Michael S. Tsirkin
@ 2013-09-17 12:47 ` Michael S. Tsirkin
  2013-09-17 14:41   ` Paolo Bonzini
  41 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 12:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 03, 2013 at 02:32:51PM +0200, Paolo Bonzini wrote:
> QOM splits the destruction of a device in two phases:
> 
> - unrealize, also known as "exit" from qdev times, should isolate
>   the device from the guest.  After unrealize returns, the guest
>   should not be able to issue new requests.
> 
> - instance_finalize will reclaim the memory.  This is only called
>   after all requests terminate and drop the references on the
>   device.
> 
> Though overlooked, this is important even now: QEMU's little secret is
> that devices already do access memory out of the iothread mutex (with
> address_space_map/unmap and AIO), and this can be MMIO memory too
> through a bounce buffer.  This series prepares things so that, once
> we'll put the memory_region_ref/unref infrastructure to complete use,
> things will just work.
> 
> Of course this split will be particularly important for devices that
> will be able to do unlocked MMIO.
> 
> This series changes all PCI devices (the sole to support hotplug _and_
> use MemoryRegions) to do memory_region_del_subregion at unrealize time,
> and memory_region_destroy at instance_finalize time.  As it is mostly
> a PCI patch, it should go through mst's tree.
> 
> Paolo

OK so this is the problem.
Memory region reference counting actually does not
have a reference count per MR.
Instead it takes a reference to device:

void memory_region_ref(MemoryRegion *mr)
{
    if (mr && mr->owner) {
        object_ref(mr->owner);
    }
}

void memory_region_unref(MemoryRegion *mr)
{
    if (mr && mr->owner) {
        object_unref(mr->owner);
    }
}

Now object_ref only delays finalize.

Ergo, to make sure a referenced MR does not get
destroyed, we must make sure only finalize
calls memory_region_destroy.

So I think this patchset should do exactly that,
not try to move out more stuff to finalize.

-- 
MST

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 12:47 ` Michael S. Tsirkin
@ 2013-09-17 14:41   ` Paolo Bonzini
  2013-09-17 14:45     ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 14:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 14:47, Michael S. Tsirkin ha scritto:
> Ergo, to make sure a referenced MR does not get
> destroyed, we must make sure only finalize
> calls memory_region_destroy.
> 
> So I think this patchset should do exactly that,
> not try to move out more stuff to finalize.

Yes, this is the part of the problem that applies to all devices.

Some devices may have timers or other dynamically-allocated data that
may be used between exit and finalize.  The same applies to capabilities
such as MSI and MSI-X.  All this must also be moved to finalize, but
this can be done in separate patches.

I'll separate the series as follows:

- memory_region_destroy

- unregister_savevm

- other dynamically allocated data

- del_vm_change_state_handler

- MSI-X

- SHPC, AER and other PCI/bridge stuff

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 14:41   ` Paolo Bonzini
@ 2013-09-17 14:45     ` Michael S. Tsirkin
  2013-09-17 15:41       ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 14:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 04:41:45PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 14:47, Michael S. Tsirkin ha scritto:
> > Ergo, to make sure a referenced MR does not get
> > destroyed, we must make sure only finalize
> > calls memory_region_destroy.
> > 
> > So I think this patchset should do exactly that,
> > not try to move out more stuff to finalize.
> 
> Yes, this is the part of the problem that applies to all devices.
> 
> Some devices may have timers or other dynamically-allocated data that
> may be used between exit and finalize.  The same applies to capabilities
> such as MSI and MSI-X.  All this must also be moved to finalize, but
> this can be done in separate patches.
> 
> I'll separate the series as follows:
> 
> - memory_region_destroy
> 
> - unregister_savevm

What's the problem here exactly?

> - other dynamically allocated data

And here? I don't think we should move rundom stuff out to finalize
just because we can.

> - del_vm_change_state_handler
> 
> - MSI-X
> 
> - SHPC, AER and other PCI/bridge stuff
> 
> Paolo

Let's document specific problems and fix them.
Randomly moving cleanup order has a high chance to
introduce bugs.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 14:45     ` Michael S. Tsirkin
@ 2013-09-17 15:41       ` Paolo Bonzini
  2013-09-17 15:59         ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 15:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 16:45, Michael S. Tsirkin ha scritto:
> On Tue, Sep 17, 2013 at 04:41:45PM +0200, Paolo Bonzini wrote:
>> Il 17/09/2013 14:47, Michael S. Tsirkin ha scritto:
>>> Ergo, to make sure a referenced MR does not get
>>> destroyed, we must make sure only finalize
>>> calls memory_region_destroy.
>>>
>>> So I think this patchset should do exactly that,
>>> not try to move out more stuff to finalize.
>>
>> Yes, this is the part of the problem that applies to all devices.
>>
>> Some devices may have timers or other dynamically-allocated data that
>> may be used between exit and finalize.  The same applies to capabilities
>> such as MSI and MSI-X.  All this must also be moved to finalize, but
>> this can be done in separate patches.
>>
>> I'll separate the series as follows:
>>
>> - memory_region_destroy
>>
>> - unregister_savevm
> 
> What's the problem here exactly?

I was thinking about AIO callbacks here, but we don't support device
hot(un)plug during migration so I can remove this part.

>> - other dynamically allocated data
> 
> And here? I don't think we should move rundom stuff out to finalize
> just because we can.

It's not random stuff, AIO callbacks can access dynamically allocated
data between exit and finalize.  Even if a particular device doesn't do
it, there's no hope of having best practices from contributors if we do
not do things consistently.

>> - del_vm_change_state_handler

Same here: user can stop/cont between exit and finalize, for example
because the I/O failed.

Paolo

>> - MSI-X
>>
>> - SHPC, AER and other PCI/bridge stuff
>>
>> Paolo
> 
> Let's document specific problems and fix them.
> Randomly moving cleanup order has a high chance to
> introduce bugs.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 15:41       ` Paolo Bonzini
@ 2013-09-17 15:59         ` Michael S. Tsirkin
  2013-09-17 16:13           ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 15:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 05:41:52PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 16:45, Michael S. Tsirkin ha scritto:
> > On Tue, Sep 17, 2013 at 04:41:45PM +0200, Paolo Bonzini wrote:
> >> Il 17/09/2013 14:47, Michael S. Tsirkin ha scritto:
> >>> Ergo, to make sure a referenced MR does not get
> >>> destroyed, we must make sure only finalize
> >>> calls memory_region_destroy.
> >>>
> >>> So I think this patchset should do exactly that,
> >>> not try to move out more stuff to finalize.
> >>
> >> Yes, this is the part of the problem that applies to all devices.
> >>
> >> Some devices may have timers or other dynamically-allocated data that
> >> may be used between exit and finalize.  The same applies to capabilities
> >> such as MSI and MSI-X.  All this must also be moved to finalize, but
> >> this can be done in separate patches.
> >>
> >> I'll separate the series as follows:
> >>
> >> - memory_region_destroy
> >>
> >> - unregister_savevm
> > 
> > What's the problem here exactly?
> 
> I was thinking about AIO callbacks here, but we don't support device
> hot(un)plug during migration so I can remove this part.
> 
> >> - other dynamically allocated data
> > 
> > And here? I don't think we should move rundom stuff out to finalize
> > just because we can.
> 
> It's not random stuff, AIO callbacks can access dynamically allocated
> data between exit and finalize.
> Even if a particular device doesn't do
> it, there's no hope of having best practices from contributors if we do
> not do things consistently.

Yes but just not freeing it is unlikely to be enough.
We need to make sure data structures are consistent.
So this really needs to be done carefully, device by device.

> 
> >> - del_vm_change_state_handler
> 
> Same here: user can stop/cont between exit and finalize, for example
> because the I/O failed.
> 
> Paolo

Device that is not guest visible is very unlikely to
care about whether guest is running.


> >> - MSI-X
> >>
> >> - SHPC, AER and other PCI/bridge stuff
> >>
> >> Paolo
> > 
> > Let's document specific problems and fix them.
> > Randomly moving cleanup order has a high chance to
> > introduce bugs.
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 15:59         ` Michael S. Tsirkin
@ 2013-09-17 16:13           ` Paolo Bonzini
  2013-09-17 16:29             ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 16:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 17:59, Michael S. Tsirkin ha scritto:
> Yes but just not freeing it is unlikely to be enough.
> We need to make sure data structures are consistent.
> So this really needs to be done carefully, device by device.

Of course.  I checked SCSI already and it's sane.

>>>> - del_vm_change_state_handler
>>
>> Same here: user can stop/cont between exit and finalize, for example
>> because the I/O failed.
> 
> Device that is not guest visible is very unlikely to
> care about whether guest is running.

Most devices do not care at all whether the guest is running. :)  If
they do, they usually just use vm_clock.

But retrying failed I/O uses qemu_add_vm_change_set_handler, and that
has to be done even after the device is not guest visible anymore.

BTW, qemu_del_nic is another one that I forgot to mention.  You could
have MMIO that triggers a transmit while the device is going down, for
example.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 16:13           ` Paolo Bonzini
@ 2013-09-17 16:29             ` Michael S. Tsirkin
  2013-09-17 16:58               ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 16:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 06:13:51PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 17:59, Michael S. Tsirkin ha scritto:
> > Yes but just not freeing it is unlikely to be enough.
> > We need to make sure data structures are consistent.
> > So this really needs to be done carefully, device by device.
> 
> Of course.  I checked SCSI already and it's sane.
> 
> >>>> - del_vm_change_state_handler
> >>
> >> Same here: user can stop/cont between exit and finalize, for example
> >> because the I/O failed.
> > 
> > Device that is not guest visible is very unlikely to
> > care about whether guest is running.
> 
> Most devices do not care at all whether the guest is running. :)  If
> they do, they usually just use vm_clock.
> 
> But retrying failed I/O uses qemu_add_vm_change_set_handler, and that
> has to be done even after the device is not guest visible anymore.
> 
> BTW, qemu_del_nic is another one that I forgot to mention.  You could
> have MMIO that triggers a transmit while the device is going down, for
> example.
> 
> Paolo

Wait a second.  This API simply does not make sense.
If region is not visible it's MMIO really mustn't trigger,
exit or no exit.  Disabling region and still getting op callbacks
afterwards is not what any caller of this API expects.

I'm not sure what to do about the bounce buffer thing
but it needs to be fixed some other way without
breaking API.

-- 
MST

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 16:29             ` Michael S. Tsirkin
@ 2013-09-17 16:58               ` Paolo Bonzini
  2013-09-17 17:07                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 16:58 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 18:29, Michael S. Tsirkin ha scritto:
> > BTW, qemu_del_nic is another one that I forgot to mention.  You could
> > have MMIO that triggers a transmit while the device is going down, for
> > example.
> 
> Wait a second.  This API simply does not make sense.
> If region is not visible it's MMIO really mustn't trigger,
> exit or no exit.  Disabling region and still getting op callbacks
> afterwards is not what any caller of this API expects.
> 
> I'm not sure what to do about the bounce buffer thing
> but it needs to be fixed some other way without
> breaking API.

I don't think it's breaking the API.  The very same thing can happen
with RAM.  The only difference is that MMIO calls ops.

Also, this problem is subject to race conditions from buggy or
misbehaving guests.  If you want to have any hope of breaking devices
free of the BQL and do "simple" register I/O without taking a lock,
there is simply no precise moment to stop MMIO at.

All these problems do not happen in real hardware because real hardware
has buffers between the PHY and DMA circuitries, and because bus master
transactions transfer few bytes at a time (for example in PCI even when
a device does burst transactions, the other party can halt them with
such a small granularity).  A device can be quiesced in a matter of
microseconds, and other times (the time for the OS to react to hotplug
requests, the time for the driver to shut down, the time for the human
to physically unplug the connector) can be several order of magnitudes
larger.

Instead we have the opposite scenario, because we want to have as few
buffers as possible and map large amounts of memory (even 4K used by the
bounce buffer is comparatively large) for the host OS's benefit.  When
we do so, and the host backend fails (e.g. a disk is on NFS and there is
a networking problem), memory can remain mapped for a long time.

DMA-to-MMIO may be a theoretical problems only, but if we don't cover it
we have a bogus solution to the problem, because exactly the same thing
can and will happen for memory hot-unplug.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 16:58               ` Paolo Bonzini
@ 2013-09-17 17:07                 ` Michael S. Tsirkin
  2013-09-17 17:16                   ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 17:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 06:58:37PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 18:29, Michael S. Tsirkin ha scritto:
> > > BTW, qemu_del_nic is another one that I forgot to mention.  You could
> > > have MMIO that triggers a transmit while the device is going down, for
> > > example.
> > 
> > Wait a second.  This API simply does not make sense.
> > If region is not visible it's MMIO really mustn't trigger,
> > exit or no exit.  Disabling region and still getting op callbacks
> > afterwards is not what any caller of this API expects.
> > 
> > I'm not sure what to do about the bounce buffer thing
> > but it needs to be fixed some other way without
> > breaking API.
> 
> I don't think it's breaking the API.  The very same thing can happen
> with RAM.  The only difference is that MMIO calls ops.

We can argue about RAM but getting callback after disable is
not really sane.

> Also, this problem is subject to race conditions from buggy or
> misbehaving guests.  If you want to have any hope of breaking devices
> free of the BQL and do "simple" register I/O without taking a lock,
> there is simply no precise moment to stop MMIO at.

I don't see why can't disable MR flush whatever is outstanding.


> All these problems do not happen in real hardware because real hardware
> has buffers between the PHY and DMA circuitries, and because bus master
> transactions transfer few bytes at a time (for example in PCI even when
> a device does burst transactions, the other party can halt them with
> such a small granularity).  A device can be quiesced in a matter of
> microseconds, and other times (the time for the OS to react to hotplug
> requests, the time for the driver to shut down, the time for the human
> to physically unplug the connector) can be several order of magnitudes
> larger.

They don't happen on real hardware because once you disable
memory in a PCI device, it does not accept memory
transactions.

> Instead we have the opposite scenario, because we want to have as few
> buffers as possible and map large amounts of memory (even 4K used by the
> bounce buffer is comparatively large) for the host OS's benefit.  When
> we do so, and the host backend fails (e.g. a disk is on NFS and there is
> a networking problem), memory can remain mapped for a long time.

I don't see why is this a problem.
So memory disable will take a long time.
Who cares? It's not data path.

> DMA-to-MMIO may be a theoretical problems only, but if we don't cover it
> we have a bogus solution to the problem, because exactly the same thing
> can and will happen for memory hot-unplug.
> 
> Paolo

We need to cover it without breaking APIs.

After memory_region_del_subregion returns,
it's a promise that there will not be accesses
to the region.

So I'm not even sure we really need to move destroy to finalize anymore ...

-- 
MST

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 17:07                 ` Michael S. Tsirkin
@ 2013-09-17 17:16                   ` Paolo Bonzini
  2013-09-17 17:26                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 17:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 19:07, Michael S. Tsirkin ha scritto:
> After memory_region_del_subregion returns,
> it's a promise that there will not be accesses
> to the region.

It's racy anyway.  You can have memory_region_del_subregion happen one
clock cycle after the other (physical) CPU has done checked that there
will not be accesses to the region.

A real bus has a "big PCI lock" (there can be only one transaction at a
time), which is exactly what we want to get rid of.

> So I'm not even sure we really need to move destroy to finalize anymore ...

We definitely need to move it.  Even if we added a flag that we set in
memory_region_del_subregion, we need to check it later when AIO completes.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 17:16                   ` Paolo Bonzini
@ 2013-09-17 17:26                     ` Michael S. Tsirkin
  2013-09-17 19:07                       ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 17:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 07:16:15PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 19:07, Michael S. Tsirkin ha scritto:
> > After memory_region_del_subregion returns,
> > it's a promise that there will not be accesses
> > to the region.
> 
> It's racy anyway.  You can have memory_region_del_subregion happen one
> clock cycle after the other (physical) CPU has done checked that there
> will not be accesses to the region.

Fix it then. Stick synchronize_rcu() in memory_region_del_subregion.

Are you trying to convince me there's no way to have
synchronous APIs in presence of RCU?

> A real bus has a "big PCI lock" (there can be only one transaction at a
> time), which is exactly what we want to get rid of.

Not really, in a bridged setup transactions on the
secondary bus are handled in parallel with transactions
on the primary bus.

Also, there are split transactions completed
asynchronously: guests that care about ordering
do flushes e.g. using memory reads.
The logic is in the device/memory region really.

> > So I'm not even sure we really need to move destroy to finalize anymore ...
> 
> We definitely need to move it.  Even if we added a flag that we set in
> memory_region_del_subregion, we need to check it later when AIO completes.
> 
> Paolo

Flag where? region can become subregion of another
region entirely.

The solution for RCU is to flush on changes, not add
more flags and locks.

-- 
MST

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 17:26                     ` Michael S. Tsirkin
@ 2013-09-17 19:07                       ` Paolo Bonzini
  2013-09-17 19:51                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 19:07 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 19:26, Michael S. Tsirkin ha scritto:
> On Tue, Sep 17, 2013 at 07:16:15PM +0200, Paolo Bonzini wrote:
>> Il 17/09/2013 19:07, Michael S. Tsirkin ha scritto:
>>> After memory_region_del_subregion returns,
>>> it's a promise that there will not be accesses
>>> to the region.
>>
>> It's racy anyway.  You can have memory_region_del_subregion happen one
>> clock cycle after the other (physical) CPU has done checked that there
>> will not be accesses to the region.
> 
> Fix it then. Stick synchronize_rcu() in memory_region_del_subregion.

Easier said than done since you shouldn't do synchronize_rcu() with the
BQL taken.

> Are you trying to convince me there's no way to have
> synchronous APIs in presence of RCU?

No, I'm not.  But in the presence of coarse locks it's harder to use
synchronous APIs.  Sticking to past QEMU experiences, the AIO
synchronous cancellation API is a nightmare because the monitor cannot
run while I/O is cancelled.

>> A real bus has a "big PCI lock" (there can be only one transaction at a
>> time), which is exactly what we want to get rid of.
> 
> Not really, in a bridged setup transactions on the
> secondary bus are handled in parallel with transactions
> on the primary bus.

That's not what you're suggesting though.  A device removal on the
secondary bus, triggered by configuration space access on the primary
bus, wouldn't complete the configuration space access on the primary bus
until the removal on the secondary bus is complete.  So you still have a
big lock around transactions on each bus.  Would a PCI bridge behave any
differently?

It's still too coarse for emulation purposes.

> Also, there are split transactions completed
> asynchronously: guests that care about ordering
> do flushes e.g. using memory reads.

Can you do split burst transactions?  If not, the granularity is still
measured in bytes, not pages or more.

Paolo

> The logic is in the device/memory region really.
> 
>>> So I'm not even sure we really need to move destroy to finalize anymore ...
>>
>> We definitely need to move it.  Even if we added a flag that we set in
>> memory_region_del_subregion, we need to check it later when AIO completes.
>>
>> Paolo
> 
> Flag where? region can become subregion of another
> region entirely.
> 
> The solution for RCU is to flush on changes, not add
> more flags and locks.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 19:07                       ` Paolo Bonzini
@ 2013-09-17 19:51                         ` Michael S. Tsirkin
  2013-09-17 22:02                           ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-17 19:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Tue, Sep 17, 2013 at 09:07:05PM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 19:26, Michael S. Tsirkin ha scritto:
> > On Tue, Sep 17, 2013 at 07:16:15PM +0200, Paolo Bonzini wrote:
> >> Il 17/09/2013 19:07, Michael S. Tsirkin ha scritto:
> >>> After memory_region_del_subregion returns,
> >>> it's a promise that there will not be accesses
> >>> to the region.
> >>
> >> It's racy anyway.  You can have memory_region_del_subregion happen one
> >> clock cycle after the other (physical) CPU has done checked that there
> >> will not be accesses to the region.
> > 
> > Fix it then. Stick synchronize_rcu() in memory_region_del_subregion.
> 
> Easier said than done since you shouldn't do synchronize_rcu() with the
> BQL taken.
> 
> > Are you trying to convince me there's no way to have
> > synchronous APIs in presence of RCU?
> 
> No, I'm not.  But in the presence of coarse locks it's harder to use
> synchronous APIs.  Sticking to past QEMU experiences, the AIO
> synchronous cancellation API is a nightmare because the monitor cannot
> run while I/O is cancelled.
>
> >> A real bus has a "big PCI lock" (there can be only one transaction at a
> >> time), which is exactly what we want to get rid of.
> > 
> > Not really, in a bridged setup transactions on the
> > secondary bus are handled in parallel with transactions
> > on the primary bus.
> 
> That's not what you're suggesting though.  A device removal on the
> secondary bus,
> triggered by configuration space access on the primary
> bus, wouldn't complete the configuration space access on the primary bus
> until the removal on the secondary bus is complete.  So you still have a
> big lock around transactions on each bus.


I'm not really concerned about device removal.  We can work it out.

A much more interesting case is e.g. disabling memory.
E.g.

config write (disable memory)
read (flush out outstanding writes)
write <- must now have no effect


Or disabling bus master:

config write (disable bus master)
read (flush in outstanding writes)
 <- device must now not change memory


And these rules absolutely must be obeyed,
if you don't you'll break guests.


So I think we really should find a way to make the above work correctly.
Removal will follow almost automatically, since it disables memory and
mastering by itself.

Note: I'm talking about writes done by CPU here.
When writes are initiated by device and target MMIO, it's less
practical.  The only case we emulate that I'm familiar with is MSI
writes into APIC, and this MR is never disabled resized or moved.


> Would a PCI bridge behave any differently?

A real bridge? Yes what happens on the primary bus doesn't
affect the secondary bus.

> It's still too coarse for emulation purposes.

It's possible that we'll have to implement real
transaction ordering. If you want to remove BQL for MMIOs I'd
say that's almost a given?

> > Also, there are split transactions completed
> > asynchronously: guests that care about ordering
> > do flushes e.g. using memory reads.
> 
> Can you do split burst transactions?  If not, the granularity is still
> measured in bytes, not pages or more.
> 
> Paolo

I think we can but I'll have to check.

> > The logic is in the device/memory region really.
> > 
> >>> So I'm not even sure we really need to move destroy to finalize anymore ...
> >>
> >> We definitely need to move it.  Even if we added a flag that we set in
> >> memory_region_del_subregion, we need to check it later when AIO completes.
> >>
> >> Paolo
> > 
> > Flag where? region can become subregion of another
> > region entirely.
> > 
> > The solution for RCU is to flush on changes, not add
> > more flags and locks.
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 19:51                         ` Michael S. Tsirkin
@ 2013-09-17 22:02                           ` Paolo Bonzini
  2013-09-18  5:48                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-17 22:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> A much more interesting case is e.g. disabling memory.
> E.g.
> 
> config write (disable memory)
> read (flush out outstanding writes)
> write <- must now have no effect

This works already.  memory_region_del_subregion is synchronous, and
will remain synchronous wrt the current CPU even after memory dispatch
is moved out of the BQL.

This is racy of course:

    VCPU 1                               VCPU 2
    ------------------------------------------------------------
    config write (disable memory)        write
    read

This is also racy:

    VCPU 1                               DMA to MMIO region
    ------------------------------------------------------------
    config write (disable memory)        write
    read

This is the case where a write from another device can do weird things
such as causing a packet to be transmitted on a NIC.  As you wrote (and
I agree) device removal is a superset of disabling memory and bus
master; ergo, if we handle it for disabling memory we need to handle it
for device removal too.  This is why I believe qemu_del_nic belongs in
instance_finalize.

> Or disabling bus master:
> 
> config write (disable bus master)
> read (flush in outstanding writes)
>  <- device must now not change memory

This doesn't work.  The problem is that when you disable bus master any
previous call to address_space_map remains mapped.  Whoever called
address_space_map can and will write blindly to that area.

This cannot be fixed by synchronize_rcu() no matter where you place it.
 The logic is like this

   address_space_map
    rcu_read_lock()
    mr = find memory region for address
    memory_region_ref(mr)
    rcu_read_unlock()
   do actual read or write
   address_space_unmap
    memory_region_unref(mr)

synchronize_rcu() ensures that future writes will not write to memory.
But it does not ensure anything about writes started before.  And
unfortunately a write "starts" at the moment you start an AIO operation,
and lasts until roughly when the AIO callback executes.

If you want to fix that, the right place to do it is the PCI core.  You
need a new callback of some sort in the PCI device, that will
synchronously cancel all pending I/O when the bus master bit is set to zero.

I don't think address_space_map/unmap has any equivalent PCI transaction
(or exists in any other sane bus, for that matter).  However, we do it
pervasively for obvious performance reasons.  I'm afraid that this is an
emulation tradeoff that we cannot avoid.

> And these rules absolutely must be obeyed,
> if you don't you'll break guests.

Yes (and I think it was already discussed, I have some deja vu feeling).
 I'm not sure it is _that_ important (QEMU lived with a no-op bus master
bit for years and nothing exploded), but it is certainly good to know
what works and what doesn't---and why, and whether it is fixable.

> So I think we really should find a way to make the above work correctly.
> Removal will follow almost automatically, since it disables memory and
> mastering by itself.

These patches are concerned with the "almost" part.  The difference
between config writes and removal is that, as the code is currently
written, config writes cannot cause dangling pointers.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 22:02                           ` Paolo Bonzini
@ 2013-09-18  5:48                             ` Michael S. Tsirkin
  2013-09-18  7:40                               ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-18  5:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Wed, Sep 18, 2013 at 12:02:51AM +0200, Paolo Bonzini wrote:
> Il 17/09/2013 21:51, Michael S. Tsirkin ha scritto:
> > A much more interesting case is e.g. disabling memory.
> > E.g.
> > 
> > config write (disable memory)
> > read (flush out outstanding writes)
> > write <- must now have no effect
> 
> This works already.  memory_region_del_subregion is synchronous, and
> will remain synchronous wrt the current CPU even after memory dispatch
> is moved out of the BQL.
> 
> This is racy of course:
> 
>     VCPU 1                               VCPU 2
>     ------------------------------------------------------------
>     config write (disable memory)        write
>     read
> 
> This is also racy:
> 
>     VCPU 1                               DMA to MMIO region


What about normal DMA to memory?
I don't believe DMA to MMIO is relevant ATM,
we can just make it fail (e.g. as Express spec 1.0 allowed).

>     ------------------------------------------------------------
>     config write (disable memory)        write
>     read
> 
> This is the case where a write from another device can do weird things
> such as causing a packet to be transmitted on a NIC.  As you wrote (and
> I agree) device removal is a superset of disabling memory and bus
> master; ergo, if we handle it for disabling memory we need to handle it
> for device removal too.  This is why I believe qemu_del_nic belongs in
> instance_finalize.
>
> > Or disabling bus master:
> > 
> > config write (disable bus master)
> > read (flush in outstanding writes)
> >  <- device must now not change memory
> 
> This doesn't work.  The problem is that when you disable bus master any
> previous call to address_space_map remains mapped.  Whoever called
> address_space_map can and will write blindly to that area.
> 
> This cannot be fixed by synchronize_rcu() no matter where you place it.
>  The logic is like this
> 
>    address_space_map
>     rcu_read_lock()
>     mr = find memory region for address
>     memory_region_ref(mr)
>     rcu_read_unlock()
>    do actual read or write
>    address_space_unmap
>     memory_region_unref(mr)
> 
> synchronize_rcu() ensures that future writes will not write to memory.
> But it does not ensure anything about writes started before.  And
> unfortunately a write "starts" at the moment you start an AIO operation,
> and lasts until roughly when the AIO callback executes.
> 
> If you want to fix that, the right place to do it is the PCI core.  You
> need a new callback of some sort in the PCI device, that will
> synchronously cancel all pending I/O when the bus master bit is set to zero.


It has to work even without config changes really.

So I think the fix is actually obeying ordering rules,
that is know that write is in progress
and flush on read.

> I don't think address_space_map/unmap has any equivalent PCI transaction
> (or exists in any other sane bus, for that matter).  However, we do it
> pervasively for obvious performance reasons.  I'm afraid that this is an
> emulation tradeoff that we cannot avoid.
> 
> > And these rules absolutely must be obeyed,
> > if you don't you'll break guests.
> 
> Yes (and I think it was already discussed, I have some deja vu feeling).
>  I'm not sure it is _that_ important (QEMU lived with a no-op bus master
> bit for years and nothing exploded),

Yes but you can't pass e.g. WHQL without it.
And even bus master is less important than obeying ordering rules.
We got and get away with ignoring them because of the BQL.
The moment emulated PCI devices write into memory without a BQL
we'll need to emulate ordering.

> but it is certainly good to know
> what works and what doesn't---and why, and whether it is fixable.
> 
> > So I think we really should find a way to make the above work correctly.
> > Removal will follow almost automatically, since it disables memory and
> > mastering by itself.
> 
> These patches are concerned with the "almost" part.  The difference
> between config writes and removal is that, as the code is currently
> written, config writes cannot cause dangling pointers.
> 
> Paolo

I'm not really objecting to the patches.

I think moving memory region destroy out to finalize makes sense
irrespectively, as long as destroy is made idempotent so we can simply
destroy everything without worrying whether we initialized it.

The rest of the changes will be harder, we'll have to do
them carefully on a case by case basis.

-- 
MST

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18  5:48                             ` Michael S. Tsirkin
@ 2013-09-18  7:40                               ` Paolo Bonzini
  2013-09-18  8:41                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-18  7:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 18/09/2013 07:48, Michael S. Tsirkin ha scritto:
> So I think the fix is actually obeying ordering rules,
> that is know that write is in progress
> and flush on read.

I think this can be modeled as a generic, synchronous
(*busmaster_cancel)(PCIDevice*) callback, that is called after bus
master is turned off.  You don't even really have to wait for a read.

> I think moving memory region destroy out to finalize makes sense
> irrespectively, as long as destroy is made idempotent so we can simply
> destroy everything without worrying whether we initialized it.
> 
> The rest of the changes will be harder, we'll have to do
> them carefully on a case by case basis.

Good, we are in agreement then.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18  7:40                               ` Paolo Bonzini
@ 2013-09-18  8:41                                 ` Michael S. Tsirkin
  2013-09-18 11:26                                   ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2013-09-18  8:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Wed, Sep 18, 2013 at 09:40:19AM +0200, Paolo Bonzini wrote:
> Il 18/09/2013 07:48, Michael S. Tsirkin ha scritto:
> > So I think the fix is actually obeying ordering rules,
> > that is know that write is in progress
> > and flush on read.
> 
> I think this can be modeled as a generic, synchronous
> (*busmaster_cancel)(PCIDevice*) callback, that is called after bus
> master is turned off.  You don't even really have to wait for a read.

Not really.
Bus master is just an single instance of the bigger issue.
It could be any device-specific register just as well.

PCI reads and writes must obey ordering rules.
ATM MMIO and DMA achieve this by using a single lock.
If you want to move MMIO and DMA out of a common lock you must find some
other way to force ordering.

> > I think moving memory region destroy out to finalize makes sense
> > irrespectively, as long as destroy is made idempotent so we can simply
> > destroy everything without worrying whether we initialized it.
> > 
> > The rest of the changes will be harder, we'll have to do
> > them carefully on a case by case basis.
> 
> Good, we are in agreement then.
> 
> Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18  8:41                                 ` Michael S. Tsirkin
@ 2013-09-18 11:26                                   ` Paolo Bonzini
  2013-09-18 11:56                                     ` Peter Maydell
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-18 11:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

Il 18/09/2013 10:41, Michael S. Tsirkin ha scritto:
> On Wed, Sep 18, 2013 at 09:40:19AM +0200, Paolo Bonzini wrote:
>> Il 18/09/2013 07:48, Michael S. Tsirkin ha scritto:
>>> So I think the fix is actually obeying ordering rules,
>>> that is know that write is in progress
>>> and flush on read.
>>
>> I think this can be modeled as a generic, synchronous
>> (*busmaster_cancel)(PCIDevice*) callback, that is called after bus
>> master is turned off.  You don't even really have to wait for a read.
> 
> Not really.
> Bus master is just an single instance of the bigger issue.
> It could be any device-specific register just as well.
> 
> PCI reads and writes must obey ordering rules.
> ATM MMIO and DMA achieve this by using a single lock.

There are two issues.

One is synchronization with address_space_map...unmap.  Reads and writes
from address_space_map/unmap are already handled outside the BQL
(possibly in the kernel), so they already ignore any ordering.  As far
as memory writes are concerned, it is indeed specific to bus master, see
PCIe spec 6.4:

   The ability of the driver and/or system software to block new
   Requests from the device is supported by the Bus Master Enable, SERR
   Enable, and Interrupt Disable bits in the Command register
   (Section 7.5.1.1), and other such control bits.

The second is ordering of writes and between writes and reads.  This
does not concern device->memory transaction because when a device does
DMA it can use smp_*mb() to achieve the ordering.  Similarly I think we
can ignore message requests (but we probably have hidden bugs now, for
example you probably need a smp_wmb() in msi{,x}_notify).

But even though this only concerns CPU->device transactions, the
restrictions are very strong.  They apply across distinct devices
(because the ordering is already enforced at the root complex level,
right?), so basically you'd have to wrap each and every MMIO operation
destined to BARs or configuration space with some kind of
pci_global_{read,write}_{start,end} API.  This is likely slower than
just having a "big MMIO lock" around all memory accesses(*).  Devices
can then move part of the processing to a separate thread or bottom half.

     (*) You cannot really do that for all of them; if a BAR is backed
         by RAM, accesses would still be unordered since they do not go
         through QEMU at all.

But does guest code actually care?  In many cases, I suspect that
sticking a smp_rmb() in the read side of "unlocked" register accesses,
and a smp_wmb() in the write side, will do just fine.  And add a
compatibility property to place a device back under the BQL for guests
that have problems.

Paolo

> If you want to move MMIO and DMA out of a common lock you must find some
> other way to force ordering.
> 
>>> I think moving memory region destroy out to finalize makes sense
>>> irrespectively, as long as destroy is made idempotent so we can simply
>>> destroy everything without worrying whether we initialized it.
>>>
>>> The rest of the changes will be harder, we'll have to do
>>> them carefully on a case by case basis.
>>
>> Good, we are in agreement then.
>>
>> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18 11:26                                   ` Paolo Bonzini
@ 2013-09-18 11:56                                     ` Peter Maydell
  2013-09-18 13:11                                       ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Maydell @ 2013-09-18 11:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: QEMU Developers, Michael S. Tsirkin

On 18 September 2013 20:26, Paolo Bonzini <pbonzini@redhat.com> wrote:
> But does guest code actually care?  In many cases, I suspect that
> sticking a smp_rmb() in the read side of "unlocked" register accesses,
> and a smp_wmb() in the write side, will do just fine.  And add a
> compatibility property to place a device back under the BQL for guests
> that have problems.

Yuck. This sounds like a recipe for spending the next five years
debugging subtle race conditions. We need to continue to support
the semantics that the architecture and hardware specs define
for memory access orderings to emulated devices.

-- PMM

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18 11:56                                     ` Peter Maydell
@ 2013-09-18 13:11                                       ` Paolo Bonzini
  2013-09-18 13:19                                         ` Peter Maydell
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-18 13:11 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Michael S. Tsirkin

Il 18/09/2013 13:56, Peter Maydell ha scritto:
>> > But does guest code actually care?  In many cases, I suspect that
>> > sticking a smp_rmb() in the read side of "unlocked" register accesses,
>> > and a smp_wmb() in the write side, will do just fine.  And add a
>> > compatibility property to place a device back under the BQL for guests
>> > that have problems.
> Yuck. This sounds like a recipe for spending the next five years
> debugging subtle race conditions. We need to continue to support
> the semantics that the architecture and hardware specs define
> for memory access orderings to emulated devices.

We cannot in the general case, QEMU is not a cycle-exact simulator.

You need to look at the particular case.  And if you look at particular
cases, you'll find many that are already broken now.

For example, we already have no such guarantee for RAM BARs when running
under KVM, because accesses do not go through QEMU and are not
serialized by the BQL.

Or you could have a device with an MSI vector, program it to write to
RAM, and poll the RAM location from the guest.  Such a write would
currently not be ordered with previous DMA from the device, which
contradicts the PCI spec.  (This is a bug and can be fixed).

address_space_map/unmap pretty much breaks any DMA that is concurrent
with control register access (e.g. the PCI command register).

And all these cases are already there!

Moving devices outside the BQL of course generates more of them.  But
it's not like everything is broken.  For example, ordering memory access
to one emulated device from one CPU is handled naturally (in either TCG
or KVM mode).  Ordering of accesses from a CPU with those from the QEMU
data-plane code is also handled simply with locks or memory barriers
private to the device.

With multiple VCPUs operating at the same time (e.g. the send path of a
network driver on a VCPU, with the interrupts processed on another VCPU)
the activities are likely not independent and the guest is doing its own
synchronization anyway.  It's more likely that they use a lock, but they
can even do Dekker-style synchronization using MMIO registers and it
will just work as long as MMIO read/write ops use
atomic_mb_read/atomic_mb_set (i.e. as long as the bus ordering
guarantees are implemented locally to the device).

There's nothing magic, really.  Both PV and real devices have been doing
it forever by placing some registers in RAM instead of MMIO, and
communicating synchronization points via interrupts and doorbell registers.

But above all, devices have to request BQL-free MMIO explicitly.  You do
not have to use it at all, you can just use all the infrastructure to do
unlocked bus-master DMA (which is anyway already broken from the
ordering POV).  You can limit BQL-free MMIO to PV devices, or to
extremely simple devices, or to one or two highly-optimized registers.
There is a huge gamut of choices, and no magic really.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18 13:11                                       ` Paolo Bonzini
@ 2013-09-18 13:19                                         ` Peter Maydell
  2013-09-18 13:28                                           ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Maydell @ 2013-09-18 13:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: QEMU Developers, Michael S. Tsirkin

On 18 September 2013 22:11, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 18/09/2013 13:56, Peter Maydell ha scritto:
>>> > But does guest code actually care?  In many cases, I suspect that
>>> > sticking a smp_rmb() in the read side of "unlocked" register accesses,
>>> > and a smp_wmb() in the write side, will do just fine.  And add a
>>> > compatibility property to place a device back under the BQL for guests
>>> > that have problems.
>> Yuck. This sounds like a recipe for spending the next five years
>> debugging subtle race conditions. We need to continue to support
>> the semantics that the architecture and hardware specs define
>> for memory access orderings to emulated devices.
>
> We cannot in the general case, QEMU is not a cycle-exact simulator.

This isn't a cycle-accuracy issue (unsurprisingly, since bus specs
and architecture barrier definitions are written to work with multiple
implementations of the same CPU).

> There's nothing magic, really.  Both PV and real devices have been doing
> it forever by placing some registers in RAM instead of MMIO, and
> communicating synchronization points via interrupts and doorbell registers.

Sure, but that's a hardware design choice, it's different from
ripping out the assumptions about device behaviour from
underneath an existing driver.

-- PMM

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-18 13:19                                         ` Peter Maydell
@ 2013-09-18 13:28                                           ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2013-09-18 13:28 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Michael S. Tsirkin

Il 18/09/2013 15:19, Peter Maydell ha scritto:
> > There's nothing magic, really.  Both PV and real devices have been doing
> > it forever by placing some registers in RAM instead of MMIO, and
> > communicating synchronization points via interrupts and doorbell registers.
>
> Sure, but that's a hardware design choice, it's different from
> ripping out the assumptions about device behaviour from
> underneath an existing driver.

That's why I wrote:

Devices have to request BQL-free MMIO explicitly.  You can just use all
the infrastructure to do unlocked bus-master DMA, you can limit BQL-free
MMIO to PV devices, or to extremely simple devices, or to one or two
highly-optimized registers.

Of course the choice of what's safe and what's not depends on the
hardware design and on what the device can expect from the drivers (the
PCI spec doesn't forbid putting additional requirements).

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize
  2013-09-17 10:01   ` Paolo Bonzini
@ 2013-09-20  6:16     ` Wenchao Xia
  0 siblings, 0 replies; 78+ messages in thread
From: Wenchao Xia @ 2013-09-20  6:16 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, mst

On 09/17/2013 06:01 PM, Paolo Bonzini wrote:
> Il 17/09/2013 08:44, Wenchao Xia ha scritto:
>> Just one question: where is the caller of .instance_finalize(), did
>> I missed that patch?
> It's called by qom/object.c.
>
> Paolo
>
I see, thanks for tipping.

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2013-09-20  6:16 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-03 12:32 [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 01/38] qdev: document assumption that unrealize is followed by finalize Paolo Bonzini
2013-09-17  9:00   ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 02/38] pci: split exit and finalize Paolo Bonzini
2013-09-17  9:16   ` Michael S. Tsirkin
2013-09-17  9:56     ` Paolo Bonzini
2013-09-17 10:23       ` Paolo Bonzini
2013-09-17 10:06   ` Michael S. Tsirkin
2013-09-03 12:32 ` [Qemu-devel] [PATCH 03/38] ac97: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 04/38] es1370: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 05/38] hda: reclaim memory in " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 06/38] serial: " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 07/38] tpci200: use " Paolo Bonzini
2013-09-03 12:32 ` [Qemu-devel] [PATCH 08/38] pci-assign: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 09/38] ahci: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 10/38] msix: split msix_free from msix_uninit Paolo Bonzini
2013-09-17  9:21   ` Michael S. Tsirkin
2013-09-17  9:56     ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 11/38] cmd646: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 12/38] ide/piix: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 13/38] ide/via: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 14/38] ivshmem: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 15/38] pci-testdev: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 16/38] vfio: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 17/38] e1000: use " Paolo Bonzini
2013-09-17  9:27   ` Michael S. Tsirkin
2013-09-17 10:13     ` Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 18/38] eepro100: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 19/38] ne2000: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 20/38] pcnet: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 21/38] rtl8139: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 22/38] vmxnet3: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 23/38] shpc: split shpc_free from shpc_cleanup Paolo Bonzini
2013-09-17  9:24   ` Michael S. Tsirkin
2013-09-17  9:58     ` Paolo Bonzini
2013-09-17 10:03       ` Michael S. Tsirkin
2013-09-03 12:33 ` [Qemu-devel] [PATCH 24/38] pci_bridge: split pci_bridge_free from pci_bridge_exitfn Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 25/38] pcie_aer: pcie_aer_exit really frees stuff Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 26/38] pci_bridge: reclaim memory in instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 27/38] ioh4320: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 28/38] xio3130-downstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 29/38] xio3130-upstream: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 30/38] pcie: do not recreate mmcfg I/O region, use an alias instead Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 31/38] esp: use instance_finalize instead of exit Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 32/38] lsi: " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 33/38] pvscsi: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 34/38] usb-uhci: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 35/38] virtio-pci: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 36/38] wdt_i6300esb: use " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 37/38] xen_pt: reclaim memory in " Paolo Bonzini
2013-09-03 12:33 ` [Qemu-devel] [PATCH 38/38] tpm: move add/del_subregion to realize/unrealize Paolo Bonzini
2013-09-16 16:35 ` [Qemu-devel] [PATCH v2 00/38] Delay destruction of memory regions to instance_finalize Paolo Bonzini
2013-09-17  6:44 ` Wenchao Xia
2013-09-17 10:01   ` Paolo Bonzini
2013-09-20  6:16     ` Wenchao Xia
2013-09-17  9:31 ` Michael S. Tsirkin
2013-09-17 12:47 ` Michael S. Tsirkin
2013-09-17 14:41   ` Paolo Bonzini
2013-09-17 14:45     ` Michael S. Tsirkin
2013-09-17 15:41       ` Paolo Bonzini
2013-09-17 15:59         ` Michael S. Tsirkin
2013-09-17 16:13           ` Paolo Bonzini
2013-09-17 16:29             ` Michael S. Tsirkin
2013-09-17 16:58               ` Paolo Bonzini
2013-09-17 17:07                 ` Michael S. Tsirkin
2013-09-17 17:16                   ` Paolo Bonzini
2013-09-17 17:26                     ` Michael S. Tsirkin
2013-09-17 19:07                       ` Paolo Bonzini
2013-09-17 19:51                         ` Michael S. Tsirkin
2013-09-17 22:02                           ` Paolo Bonzini
2013-09-18  5:48                             ` Michael S. Tsirkin
2013-09-18  7:40                               ` Paolo Bonzini
2013-09-18  8:41                                 ` Michael S. Tsirkin
2013-09-18 11:26                                   ` Paolo Bonzini
2013-09-18 11:56                                     ` Peter Maydell
2013-09-18 13:11                                       ` Paolo Bonzini
2013-09-18 13:19                                         ` Peter Maydell
2013-09-18 13:28                                           ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.