All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase
@ 2017-03-31  7:36 Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 1/5] QemuOpts: introduce qemu_opts_extract() Peter Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

At the very beginning, the x86 vIOMMUs are created via "-M iommu=on".
We moved one step further a year ago to have the vIOMMUs just like a
general device, so that we can init them with far more specific
parameters with "-device" interface.

However, gradually we found that problem starts to occur due to this.
The main issue is that the DMA address space of any PCI device is
postponed to be init after device realization, while some devices'
realizations would depend on this address space. That looks like a
dead lock. We have patches and solutions for different single problem,
but, maybe it's time we can consider to solve the root cause this
time, of course after 2.9 release.

This series tries to solve the root cause, and move vIOMMU inits back
to machine init for x86 platforms. Then, we'll have solid DMA address
space for each device even during realization.

Please kindly review. Thanks.

Peter Xu (5):
  QemuOpts: introduce qemu_opts_extract()
  util: export device_init_func()
  util: propagate error for device_func_init()
  q35: init vIOMMU during machine init
  pci: move dma_as init back to bus realize

 hw/pci-host/q35.c        | 45 +++++++++++++++++++++++++++++++++++++++++++++
 hw/pci/pci.c             | 21 +--------------------
 include/hw/pci/pci_bus.h |  2 --
 include/qemu-common.h    |  1 +
 include/qemu/option.h    |  2 ++
 include/sysemu/sysemu.h  |  1 -
 util/qemu-option.c       | 24 ++++++++++++++++++++++++
 vl.c                     |  9 ++-------
 8 files changed, 75 insertions(+), 30 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH for-2.10 1/5] QemuOpts: introduce qemu_opts_extract()
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
@ 2017-03-31  7:36 ` Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 2/5] util: export device_init_func() Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

This helper function is used to extract specific QemuOpts item from an
existing QemuOptsList which matches specific patterns.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c    | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index f7338db..355cee8 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -136,6 +136,8 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
 typedef int (*qemu_opts_loopfunc)(void *opaque, QemuOpts *opts, Error **errp);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
                       void *opaque, Error **errp);
+QemuOpts *qemu_opts_extract(QemuOptsList *list, qemu_opts_loopfunc func,
+                            void *opaque, Error **errp);
 void qemu_opts_print(QemuOpts *opts, const char *sep);
 void qemu_opts_print_help(QemuOptsList *list);
 void qemu_opts_free(QemuOptsList *list);
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 5ce1b5c..7c34d88 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1121,6 +1121,30 @@ int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
     return rc;
 }
 
+/*
+ * Extract specific QemuOpts from a QemuOptsList. For each QemuOpts
+ * item, if checks against func() returns zero, it'll be picked out
+ * from current QemuOptsList, then returned. If there are more than
+ * one QemuOpts that match the check, will only return the first one
+ * found.
+ */
+QemuOpts *qemu_opts_extract(QemuOptsList *list, qemu_opts_loopfunc func,
+                            void *opaque, Error **errp)
+{
+    QemuOpts *opts, *next_opts;
+
+    assert(list && func);
+
+    QTAILQ_FOREACH_SAFE(opts, &list->head, next, next_opts) {
+        if (func(opaque, opts, errp) == 0) {
+            QTAILQ_REMOVE(&list->head, opts, next);
+            return opts;
+        }
+    }
+
+    return NULL;
+}
+
 static size_t count_opts_list(QemuOptsList *list)
 {
     QemuOptDesc *desc = NULL;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH for-2.10 2/5] util: export device_init_func()
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 1/5] QemuOpts: introduce qemu_opts_extract() Peter Xu
@ 2017-03-31  7:36 ` Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 3/5] util: propagate error for device_func_init() Peter Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

This general routine is used to create most of the "-device" objects.
Export it so that other modules can use it as well.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qemu-common.h | 1 +
 vl.c                  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index d218821..0e6bb3b 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -125,6 +125,7 @@ const char *qemu_get_vm_name(void);
 #define QEMU_FILE_TYPE_BIOS   0
 #define QEMU_FILE_TYPE_KEYMAP 1
 char *qemu_find_file(int type, const char *name);
+int device_init_func(void *opaque, QemuOpts *opts, Error **errp);
 
 /* OS specific functions */
 void os_setup_early_signal_handling(void);
diff --git a/vl.c b/vl.c
index 0b4ed52..b97b32a 100644
--- a/vl.c
+++ b/vl.c
@@ -2297,7 +2297,7 @@ static int device_help_func(void *opaque, QemuOpts *opts, Error **errp)
     return qdev_device_help(opts);
 }
 
-static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
+int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     Error *err = NULL;
     DeviceState *dev;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH for-2.10 3/5] util: propagate error for device_func_init()
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 1/5] QemuOpts: introduce qemu_opts_extract() Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 2/5] util: export device_init_func() Peter Xu
@ 2017-03-31  7:36 ` Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 4/5] q35: init vIOMMU during machine init Peter Xu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

We have error_propagate(). Use it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index b97b32a..b18bde7 100644
--- a/vl.c
+++ b/vl.c
@@ -2304,7 +2304,7 @@ int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 
     dev = qdev_device_add(opts, &err);
     if (!dev) {
-        error_report_err(err);
+        error_propagate(errp, err);
         return -1;
     }
     object_unref(OBJECT(dev));
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH for-2.10 4/5] q35: init vIOMMU during machine init
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
                   ` (2 preceding siblings ...)
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 3/5] util: propagate error for device_func_init() Peter Xu
@ 2017-03-31  7:36 ` Peter Xu
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 5/5] pci: move dma_as init back to bus realize Peter Xu
  2017-03-31 16:17 ` [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Michael S. Tsirkin
  5 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

Now x86 vIOMMUs are init along with all the rest of "-devices". That may
not be sufficient since some devices' realization will depend on the
vIOMMU object. Let's move the vIOMMU init back to machine init, so
that'll be far earlier than all the rest of devices.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci-host/q35.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 344f77b..c5becea 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -28,6 +28,8 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include "qemu/option.h"
+#include "qemu/config-file.h"
 #include "hw/hw.h"
 #include "hw/pci-host/q35.h"
 #include "qapi/error.h"
@@ -460,6 +462,26 @@ static void mch_reset(DeviceState *qdev)
     mch_update(mch);
 }
 
+static int x86_iommu_detecter(void *opaque, QemuOpts *opts, Error **errp)
+{
+    const char *driver = qemu_opt_get(opts, "driver");
+
+    if (!driver) {
+        /*
+         * We don't need to set any error here. It'll be invoked later
+         * when init all the devices. Here we can just concentrate on
+         * the IOMMU device.
+         */
+        return -1;
+    }
+
+    if (!strcmp(driver, "intel-iommu") || !strcmp(driver, "amd-iommu")) {
+        return 0;
+    }
+
+    return -1;
+}
+
 static void mch_realize(PCIDevice *d, Error **errp)
 {
     int i;
@@ -518,6 +540,29 @@ static void mch_realize(PCIDevice *d, Error **errp)
                  mch->pci_address_space, &mch->pam_regions[i+1],
                  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
     }
+
+    /*
+     * Initialize vIOMMUs during machine init. Now we have vIOMMUs
+     * configured with "-device", let's try to pick them out in the
+     * device list, and init them before the rest of the devices (some
+     * device will depend on the vIOMMU during its realization).
+     *
+     * TODO: support multiple vIOMMUs. This loop prepares for that.
+     */
+    while (1) {
+        QemuOpts *iommu_opts;
+
+        iommu_opts = qemu_opts_extract(qemu_find_opts("device"),
+                                       x86_iommu_detecter, NULL, errp);
+        if (!iommu_opts) {
+            break;
+        }
+
+        /* Found one IOMMU device, init it */
+        if (device_init_func(NULL, iommu_opts, errp)) {
+            return;
+        }
+    }
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH for-2.10 5/5] pci: move dma_as init back to bus realize
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
                   ` (3 preceding siblings ...)
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 4/5] q35: init vIOMMU during machine init Peter Xu
@ 2017-03-31  7:36 ` Peter Xu
  2017-03-31 16:17 ` [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Michael S. Tsirkin
  5 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-03-31  7:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, \  Michael S . Tsirkin \ ,
	peterx, Jason Wang, Markus Armbruster

This patch partly reverted commit b86eacb ("hw/pci: delay
bus_master_enable_region initialization"). In that patch, we postponed
pci DMA address space initialization to support the new "-device"
interface for "intel-iommu" device. Now since we have vIOMMUs inited
back during machine init phase, we won't need this notifier mechanism
any more.

This brings us a benefit that all device realization will be able to
have a valid bus_master_as now.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c             | 21 +--------------------
 include/hw/pci/pci_bus.h |  2 --
 include/sysemu/sysemu.h  |  1 -
 vl.c                     |  5 -----
 4 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index bd8043c..dd32ddf 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -94,25 +94,10 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
                                 &pci_dev->bus_master_enable_region);
 }
 
-static void pcibus_machine_done(Notifier *notifier, void *data)
-{
-    PCIBus *bus = container_of(notifier, PCIBus, machine_done);
-    int i;
-
-    for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
-        if (bus->devices[i]) {
-            pci_init_bus_master(bus->devices[i]);
-        }
-    }
-}
-
 static void pci_bus_realize(BusState *qbus, Error **errp)
 {
     PCIBus *bus = PCI_BUS(qbus);
 
-    bus->machine_done.notify = pcibus_machine_done;
-    qemu_add_machine_init_done_notifier(&bus->machine_done);
-
     vmstate_register(NULL, -1, &vmstate_pcibus, bus);
 }
 
@@ -120,8 +105,6 @@ static void pci_bus_unrealize(BusState *qbus, Error **errp)
 {
     PCIBus *bus = PCI_BUS(qbus);
 
-    qemu_remove_machine_init_done_notifier(&bus->machine_done);
-
     vmstate_unregister(NULL, &vmstate_pcibus, bus);
 }
 
@@ -1004,9 +987,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
     address_space_init(&pci_dev->bus_master_as,
                        &pci_dev->bus_master_container_region, pci_dev->name);
 
-    if (qdev_hotplug) {
-        pci_init_bus_master(pci_dev);
-    }
+    pci_init_bus_master(pci_dev);
     pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
     pci_dev->irq_state = 0;
     pci_config_alloc(pci_dev);
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 5484a9b..403fec6 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -39,8 +39,6 @@ struct PCIBus {
        Keep a count of the number of devices with raised IRQs.  */
     int nirq;
     int *irq_count;
-
-    Notifier machine_done;
 };
 
 typedef struct PCIBridgeWindows PCIBridgeWindows;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 576c7ce..8b686b5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -73,7 +73,6 @@ void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
 void qemu_add_machine_init_done_notifier(Notifier *notify);
-void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
 void hmp_savevm(Monitor *mon, const QDict *qdict);
 int save_vmstate(Monitor *mon, const char *name);
diff --git a/vl.c b/vl.c
index b18bde7..3786a08 100644
--- a/vl.c
+++ b/vl.c
@@ -2660,11 +2660,6 @@ void qemu_add_machine_init_done_notifier(Notifier *notify)
     }
 }
 
-void qemu_remove_machine_init_done_notifier(Notifier *notify)
-{
-    notifier_remove(notify);
-}
-
 static void qemu_run_machine_init_done_notifiers(void)
 {
     notifier_list_notify(&machine_init_done_notifiers, NULL);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase
  2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
                   ` (4 preceding siblings ...)
  2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 5/5] pci: move dma_as init back to bus realize Peter Xu
@ 2017-03-31 16:17 ` Michael S. Tsirkin
  2017-04-01  1:29   ` Peter Xu
  5 siblings, 1 reply; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-03-31 16:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, Jason Wang, Markus Armbruster

On Fri, Mar 31, 2017 at 03:36:28PM +0800, Peter Xu wrote:
> At the very beginning, the x86 vIOMMUs are created via "-M iommu=on".
> We moved one step further a year ago to have the vIOMMUs just like a
> general device, so that we can init them with far more specific
> parameters with "-device" interface.
> 
> However, gradually we found that problem starts to occur due to this.
> The main issue is that the DMA address space of any PCI device is
> postponed to be init after device realization, while some devices'
> realizations would depend on this address space. That looks like a
> dead lock. We have patches and solutions for different single problem,
> but, maybe it's time we can consider to solve the root cause this
> time, of course after 2.9 release.
> 
> This series tries to solve the root cause, and move vIOMMU inits back
> to machine init for x86 platforms. Then, we'll have solid DMA address
> space for each device even during realization.
> 
> Please kindly review. Thanks.

I think it's a clean way to do it at a high level.
However I would like to set a tag in the class
rather than listing specific devices.
Also, init order should be consistent for all machines
not just q35.



> Peter Xu (5):
>   QemuOpts: introduce qemu_opts_extract()
>   util: export device_init_func()
>   util: propagate error for device_func_init()
>   q35: init vIOMMU during machine init
>   pci: move dma_as init back to bus realize
> 
>  hw/pci-host/q35.c        | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/pci/pci.c             | 21 +--------------------
>  include/hw/pci/pci_bus.h |  2 --
>  include/qemu-common.h    |  1 +
>  include/qemu/option.h    |  2 ++
>  include/sysemu/sysemu.h  |  1 -
>  util/qemu-option.c       | 24 ++++++++++++++++++++++++
>  vl.c                     |  9 ++-------
>  8 files changed, 75 insertions(+), 30 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase
  2017-03-31 16:17 ` [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Michael S. Tsirkin
@ 2017-04-01  1:29   ` Peter Xu
  2017-04-11  9:22     ` Markus Armbruster
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2017-04-01  1:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, tianyu.lan, Paolo Bonzini, kevin.tian, yi.l.liu,
	Marcel Apfelbaum, Jason Wang, Markus Armbruster

On Fri, Mar 31, 2017 at 07:17:34PM +0300, Michael S. Tsirkin wrote:
> On Fri, Mar 31, 2017 at 03:36:28PM +0800, Peter Xu wrote:
> > At the very beginning, the x86 vIOMMUs are created via "-M iommu=on".
> > We moved one step further a year ago to have the vIOMMUs just like a
> > general device, so that we can init them with far more specific
> > parameters with "-device" interface.
> > 
> > However, gradually we found that problem starts to occur due to this.
> > The main issue is that the DMA address space of any PCI device is
> > postponed to be init after device realization, while some devices'
> > realizations would depend on this address space. That looks like a
> > dead lock. We have patches and solutions for different single problem,
> > but, maybe it's time we can consider to solve the root cause this
> > time, of course after 2.9 release.
> > 
> > This series tries to solve the root cause, and move vIOMMU inits back
> > to machine init for x86 platforms. Then, we'll have solid DMA address
> > space for each device even during realization.
> > 
> > Please kindly review. Thanks.
> 
> I think it's a clean way to do it at a high level.
> However I would like to set a tag in the class
> rather than listing specific devices.
> Also, init order should be consistent for all machines
> not just q35.

I agree that we may finally need a tag if we want to have a general
solution for device init ordering. However I was thinking maybe for
x86 IOMMUs we should even move the init earlier than an ordered device
list, considering the integrated devices are created during machine
init, and it's before the general device init loop.

Actually, iiuc that's also following how the real hardware works -
since the IOMMU unit (now we only have the root vIOMMU) belongs to
north bridge, so imho in emulation codes we'd better follow how the
hardware works if possible (I believe in hardware IOMMUs should be
inited along with north bridge).

(btw, maybe I should mark at least the first patch as RFC...)

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase
  2017-04-01  1:29   ` Peter Xu
@ 2017-04-11  9:22     ` Markus Armbruster
  2017-04-11 10:08       ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2017-04-11  9:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, tianyu.lan, kevin.tian, yi.l.liu, Jason Wang,
	qemu-devel, Marcel Apfelbaum, Paolo Bonzini

Peter Xu <peterx@redhat.com> writes:

> On Fri, Mar 31, 2017 at 07:17:34PM +0300, Michael S. Tsirkin wrote:
>> On Fri, Mar 31, 2017 at 03:36:28PM +0800, Peter Xu wrote:
>> > At the very beginning, the x86 vIOMMUs are created via "-M iommu=on".
>> > We moved one step further a year ago to have the vIOMMUs just like a
>> > general device, so that we can init them with far more specific
>> > parameters with "-device" interface.
>> > 
>> > However, gradually we found that problem starts to occur due to this.
>> > The main issue is that the DMA address space of any PCI device is
>> > postponed to be init after device realization, while some devices'
>> > realizations would depend on this address space. That looks like a
>> > dead lock. We have patches and solutions for different single problem,
>> > but, maybe it's time we can consider to solve the root cause this
>> > time, of course after 2.9 release.
>> > 
>> > This series tries to solve the root cause, and move vIOMMU inits back
>> > to machine init for x86 platforms. Then, we'll have solid DMA address
>> > space for each device even during realization.
>> > 
>> > Please kindly review. Thanks.
>> 
>> I think it's a clean way to do it at a high level.
>> However I would like to set a tag in the class
>> rather than listing specific devices.
>> Also, init order should be consistent for all machines
>> not just q35.
>
> I agree that we may finally need a tag if we want to have a general
> solution for device init ordering. However I was thinking maybe for
> x86 IOMMUs we should even move the init earlier than an ordered device
> list, considering the integrated devices are created during machine
> init, and it's before the general device init loop.
>
> Actually, iiuc that's also following how the real hardware works -
> since the IOMMU unit (now we only have the root vIOMMU) belongs to
> north bridge, so imho in emulation codes we'd better follow how the
> hardware works if possible (I believe in hardware IOMMUs should be
> inited along with north bridge).
>
> (btw, maybe I should mark at least the first patch as RFC...)

No objection to fixing yet another initialization order problem by
making it yet another special case, but I think we should really, really
sit down and try to come up with a *generic* way to express
initialization order constraints.  This is a modelling problem: we need
to model how physical hardware resets.  As long as we don't, we're
damned to tinker with this rickety tower of special cases.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase
  2017-04-11  9:22     ` Markus Armbruster
@ 2017-04-11 10:08       ` Peter Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2017-04-11 10:08 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Michael S. Tsirkin, tianyu.lan, kevin.tian, yi.l.liu, Jason Wang,
	qemu-devel, Marcel Apfelbaum, Paolo Bonzini

On Tue, Apr 11, 2017 at 11:22:38AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Mar 31, 2017 at 07:17:34PM +0300, Michael S. Tsirkin wrote:
> >> On Fri, Mar 31, 2017 at 03:36:28PM +0800, Peter Xu wrote:
> >> > At the very beginning, the x86 vIOMMUs are created via "-M iommu=on".
> >> > We moved one step further a year ago to have the vIOMMUs just like a
> >> > general device, so that we can init them with far more specific
> >> > parameters with "-device" interface.
> >> > 
> >> > However, gradually we found that problem starts to occur due to this.
> >> > The main issue is that the DMA address space of any PCI device is
> >> > postponed to be init after device realization, while some devices'
> >> > realizations would depend on this address space. That looks like a
> >> > dead lock. We have patches and solutions for different single problem,
> >> > but, maybe it's time we can consider to solve the root cause this
> >> > time, of course after 2.9 release.
> >> > 
> >> > This series tries to solve the root cause, and move vIOMMU inits back
> >> > to machine init for x86 platforms. Then, we'll have solid DMA address
> >> > space for each device even during realization.
> >> > 
> >> > Please kindly review. Thanks.
> >> 
> >> I think it's a clean way to do it at a high level.
> >> However I would like to set a tag in the class
> >> rather than listing specific devices.
> >> Also, init order should be consistent for all machines
> >> not just q35.
> >
> > I agree that we may finally need a tag if we want to have a general
> > solution for device init ordering. However I was thinking maybe for
> > x86 IOMMUs we should even move the init earlier than an ordered device
> > list, considering the integrated devices are created during machine
> > init, and it's before the general device init loop.
> >
> > Actually, iiuc that's also following how the real hardware works -
> > since the IOMMU unit (now we only have the root vIOMMU) belongs to
> > north bridge, so imho in emulation codes we'd better follow how the
> > hardware works if possible (I believe in hardware IOMMUs should be
> > inited along with north bridge).
> >
> > (btw, maybe I should mark at least the first patch as RFC...)
> 
> No objection to fixing yet another initialization order problem by
> making it yet another special case, but I think we should really, really
> sit down and try to come up with a *generic* way to express
> initialization order constraints.  This is a modelling problem: we need
> to model how physical hardware resets.  As long as we don't, we're
> damned to tinker with this rickety tower of special cases.

Hi, Markus,

Thanks for the comment. I totally agree with you that we need a
generic way to express the init ordering constraints. However, IMO
IOMMU use case is slightly special here...

Take q35 as example. Q35 machine itself is an object for now. Maybe we
can express an ordering constraint in the future that we want to
create one device A "after" the q35 machine, or create device B
"before" device C, but it'll be hard to express that we want to create
one device "during q35 init, but before creation of integrated devices
(including default VGA, network devices)". Currently on x86 we have
only one IOMMU, and that IOMMU is just exactly part of Q35
northbridge. Imho that's the reason why IOMMU is special comparing to
other device init ordering constraints (although it's using "-device"
keyword).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-04-11 10:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-31  7:36 [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Peter Xu
2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 1/5] QemuOpts: introduce qemu_opts_extract() Peter Xu
2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 2/5] util: export device_init_func() Peter Xu
2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 3/5] util: propagate error for device_func_init() Peter Xu
2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 4/5] q35: init vIOMMU during machine init Peter Xu
2017-03-31  7:36 ` [Qemu-devel] [PATCH for-2.10 5/5] pci: move dma_as init back to bus realize Peter Xu
2017-03-31 16:17 ` [Qemu-devel] [PATCH for-2.10 0/5] x86/vIOMMU: move init back to machine init phase Michael S. Tsirkin
2017-04-01  1:29   ` Peter Xu
2017-04-11  9:22     ` Markus Armbruster
2017-04-11 10:08       ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.