qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU
@ 2021-10-28  4:31 Peter Xu
  2021-10-28  4:31 ` [PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn Peter Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

Note that patch 1-4 are cleanups for pci subsystem, and patch 5 is a fix to
fail early for mis-ordered qemu cmdline on vfio and vIOMMU.  Logically they
should be posted separately and they're not directly related, however to make
it still correlated to v1 I kept them in the same patchset.

In this version I used pre_plug() hook for q35 to detect the ordering issue as
Igor suggested, meanwhile it's done via object_resolve_path_type() rather than
scanning the pci bus as Michael suggested.

Please review, thanks.

v2 changelog:
- Picked up r-b where I can
- Merged patch 1 & 4, 2 & 3, 5 & 6
- s/pci_root_bus_args/PCIRootBusArgs/ [David, Michael]
- Replace "void* " with "void *" in pci.h [Phil]
- Dropped "pci: Add pci_for_each_device_all()"
- Dropped "x86-iommu: Fail early if vIOMMU specified after vfio-pci"
- Added "qom: object_child_foreach_recursive_type()"
- Added "pc/q35: Add pre-plug hook for x86-iommu"

v1: https://lore.kernel.org/qemu-devel/20211021104259.57754-1-peterx@redhat.com/

Peter Xu (5):
  pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn
  pci: Export pci_for_each_device_under_bus*()
  qom: object_child_foreach_recursive_type()
  pci: Add pci_for_each_root_bus()
  pc/q35: Add pre-plug hook for x86-iommu

 hw/arm/virt-acpi-build.c    | 31 ++++++++--------------
 hw/i386/acpi-build.c        | 39 +++++++---------------------
 hw/i386/pc.c                |  4 +++
 hw/i386/x86-iommu.c         | 14 ++++++++++
 hw/pci/pci.c                | 52 +++++++++++++++++++++++++------------
 hw/pci/pcie.c               |  4 +--
 hw/ppc/spapr_pci.c          | 12 ++++-----
 hw/ppc/spapr_pci_nvlink2.c  |  7 +++--
 hw/ppc/spapr_pci_vfio.c     |  4 +--
 hw/s390x/s390-pci-bus.c     |  5 ++--
 hw/xen/xen_pt.c             |  4 +--
 include/hw/i386/x86-iommu.h |  8 ++++++
 include/hw/pci/pci.h        | 26 ++++++++++++-------
 include/qom/object.h        | 20 ++++++++++++++
 qom/object.c                | 27 +++++++++++++++++++
 15 files changed, 160 insertions(+), 97 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
@ 2021-10-28  4:31 ` Peter Xu
  2021-10-28  4:31 ` [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*() Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

They're used in quite a few places of pci.[ch] and also in the rest of the code
base.  Define them so that it doesn't need to be defined all over the places.

The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
void* pointer.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         | 20 ++++++--------------
 include/hw/pci/pci.h | 19 +++++++++----------
 2 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 186758ee11..17e59cb3a3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1655,9 +1655,7 @@ static const pci_class_desc pci_class_descriptions[] =
 };
 
 static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-                                                  void (*fn)(PCIBus *b,
-                                                             PCIDevice *d,
-                                                             void *opaque),
+                                                  pci_bus_dev_fn fn,
                                                   void *opaque)
 {
     PCIDevice *d;
@@ -1672,8 +1670,7 @@ static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
 }
 
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
-                         void *opaque)
+                                 pci_bus_dev_fn fn, void *opaque)
 {
     bus = pci_find_bus_nr(bus, bus_num);
 
@@ -1683,9 +1680,7 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
 }
 
 static void pci_for_each_device_under_bus(PCIBus *bus,
-                                          void (*fn)(PCIBus *b, PCIDevice *d,
-                                                     void *opaque),
-                                          void *opaque)
+                                          pci_bus_dev_fn fn, void *opaque)
 {
     PCIDevice *d;
     int devfn;
@@ -1699,8 +1694,7 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
 }
 
 void pci_for_each_device(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
-                         void *opaque)
+                         pci_bus_dev_fn fn, void *opaque)
 {
     bus = pci_find_bus_nr(bus, bus_num);
 
@@ -2078,10 +2072,8 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num)
     return NULL;
 }
 
-void pci_for_each_bus_depth_first(PCIBus *bus,
-                                  void *(*begin)(PCIBus *bus, void *parent_state),
-                                  void (*end)(PCIBus *bus, void *state),
-                                  void *parent_state)
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+                                  pci_bus_fn end, void *parent_state)
 {
     PCIBus *sec;
     void *state;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 7fc90132cf..4a8740b76b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -401,6 +401,10 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
 OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
 #define TYPE_PCIE_BUS "PCIE"
 
+typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
+typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
+typedef void *(*pci_bus_ret_fn)(PCIBus *b, void *opaque);
+
 bool pci_bus_is_express(PCIBus *bus);
 
 void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
@@ -458,23 +462,18 @@ static inline int pci_dev_bus_num(const PCIDevice *dev)
 
 int pci_bus_numa_node(PCIBus *bus);
 void pci_for_each_device(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
+                         pci_bus_dev_fn fn,
                          void *opaque);
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
-                                 void (*fn)(PCIBus *bus, PCIDevice *d,
-                                            void *opaque),
+                                 pci_bus_dev_fn fn,
                                  void *opaque);
-void pci_for_each_bus_depth_first(PCIBus *bus,
-                                  void *(*begin)(PCIBus *bus, void *parent_state),
-                                  void (*end)(PCIBus *bus, void *state),
-                                  void *parent_state);
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+                                  pci_bus_fn end, void *parent_state);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
 static inline
-void pci_for_each_bus(PCIBus *bus,
-                      void (*fn)(PCIBus *bus, void *opaque),
-                      void *opaque)
+void pci_for_each_bus(PCIBus *bus, pci_bus_fn fn, void *opaque)
 {
     pci_for_each_bus_depth_first(bus, NULL, fn, opaque);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*()
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  2021-10-28  4:31 ` [PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn Peter Xu
@ 2021-10-28  4:31 ` Peter Xu
  2021-10-29  0:06   ` David Gibson
  2021-10-28  4:31 ` [PATCH v2 3/5] qom: object_child_foreach_recursive_type() Peter Xu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand.  After exporting we can switch a lot
of the call sites to use these two helpers.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c       |  5 ++---
 hw/pci/pci.c               | 10 +++++-----
 hw/pci/pcie.c              |  4 +---
 hw/ppc/spapr_pci.c         | 12 +++++-------
 hw/ppc/spapr_pci_nvlink2.c |  7 +++----
 hw/ppc/spapr_pci_vfio.c    |  4 ++--
 hw/s390x/s390-pci-bus.c    |  5 ++---
 hw/xen/xen_pt.c            |  4 ++--
 include/hw/pci/pci.h       |  5 +++++
 9 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 81418b7911..a76b17ed92 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
         PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
         if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
-                                scope_blob);
+            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
         }
     }
 
@@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
         PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
         if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
+            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
         }
     }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 17e59cb3a3..4a84e478ce 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1654,9 +1654,9 @@ static const pci_class_desc pci_class_descriptions[] =
     { 0, NULL}
 };
 
-static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-                                                  pci_bus_dev_fn fn,
-                                                  void *opaque)
+void pci_for_each_device_under_bus_reverse(PCIBus *bus,
+                                           pci_bus_dev_fn fn,
+                                           void *opaque)
 {
     PCIDevice *d;
     int devfn;
@@ -1679,8 +1679,8 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
     }
 }
 
-static void pci_for_each_device_under_bus(PCIBus *bus,
-                                          pci_bus_dev_fn fn, void *opaque)
+void pci_for_each_device_under_bus(PCIBus *bus,
+                                   pci_bus_dev_fn fn, void *opaque)
 {
     PCIDevice *d;
     int devfn;
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6e95d82903..914a9bf3d1 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
         (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
         (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
         PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
-        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                            pcie_unplug_device, NULL);
-
+        pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
         pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
                                      PCI_EXP_SLTSTA_PDS);
         if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7430bd6314..5bfd4aa9e5 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus *bus,
                           RESOURCE_CELLS_SIZE));
 
     assert(bus);
-    pci_for_each_device_reverse(bus, pci_bus_num(bus),
-                                spapr_dt_pci_device_cb, &cbinfo);
+    pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
     if (cbinfo.err) {
         return cbinfo.err;
     }
@@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        spapr_phb_pci_enumerate_bridge, bus_no);
+    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
+                                  bus_no);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
 }
 
@@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
     PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
     unsigned int bus_no = 0;
 
-    pci_for_each_device(bus, pci_bus_num(bus),
-                        spapr_phb_pci_enumerate_bridge,
-                        &bus_no);
+    pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
+                                  &bus_no);
 
 }
 
diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
index 8ef9b40a18..7fb0cf4d04 100644
--- a/hw/ppc/spapr_pci_nvlink2.c
+++ b/hw/ppc/spapr_pci_nvlink2.c
@@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        spapr_phb_pci_collect_nvgpu, opaque);
+    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_collect_nvgpu, opaque);
 }
 
 void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
@@ -183,8 +182,8 @@ void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
     sphb->nvgpus->nv2_atsd_current = sphb->nv2_atsd_win_addr;
 
     bus = PCI_HOST_BRIDGE(sphb)->bus;
-    pci_for_each_device(bus, pci_bus_num(bus),
-                        spapr_phb_pci_collect_nvgpu, sphb->nvgpus);
+    pci_for_each_device_under_bus(bus, spapr_phb_pci_collect_nvgpu,
+                                  sphb->nvgpus);
 
     if (sphb->nvgpus->err) {
         error_propagate(errp, sphb->nvgpus->err);
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index f3b37df8ea..2a76b4e0b5 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -164,8 +164,8 @@ static void spapr_phb_vfio_eeh_clear_dev_msix(PCIBus *bus,
 
 static void spapr_phb_vfio_eeh_clear_bus_msix(PCIBus *bus, void *opaque)
 {
-       pci_for_each_device(bus, pci_bus_num(bus),
-                           spapr_phb_vfio_eeh_clear_dev_msix, NULL);
+       pci_for_each_device_under_bus(bus, spapr_phb_vfio_eeh_clear_dev_msix,
+                                     NULL);
 }
 
 static void spapr_phb_vfio_eeh_pre_reset(SpaprPhbState *sphb)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 6fafffb029..1b51a72838 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1163,8 +1163,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
     }
 
     /* Assign numbers to all child bridges. The last is the highest number. */
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        s390_pci_enumerate_bridge, s);
+    pci_for_each_device_under_bus(sec_bus, s390_pci_enumerate_bridge, s);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 }
 
@@ -1193,7 +1192,7 @@ static void s390_pcihost_reset(DeviceState *dev)
      * on every system reset, we also have to reassign numbers.
      */
     s->bus_no = 0;
-    pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
+    pci_for_each_device_under_bus(bus, s390_pci_enumerate_bridge, s);
 }
 
 static void s390_pcihost_class_init(ObjectClass *klass, void *data)
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index ca0a98187e..027190fa44 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -615,8 +615,8 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     }
 
     args.type = d->io_regions[bar].type;
-    pci_for_each_device(pci_get_bus(d), pci_dev_bus_num(d),
-                        xen_pt_check_bar_overlap, &args);
+    pci_for_each_device_under_bus(pci_get_bus(d),
+                                  xen_pt_check_bar_overlap, &args);
     if (args.rc) {
         XEN_PT_WARN(d, "Region: %d (addr: 0x%"FMT_PCIBUS
                     ", len: 0x%"FMT_PCIBUS") is overlapped.\n",
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 4a8740b76b..5c4016b995 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -467,6 +467,11 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
                                  pci_bus_dev_fn fn,
                                  void *opaque);
+void pci_for_each_device_under_bus(PCIBus *bus,
+                                   pci_bus_dev_fn fn, void *opaque);
+void pci_for_each_device_under_bus_reverse(PCIBus *bus,
+                                           pci_bus_dev_fn fn,
+                                           void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
                                   pci_bus_fn end, void *parent_state);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/5] qom: object_child_foreach_recursive_type()
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  2021-10-28  4:31 ` [PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn Peter Xu
  2021-10-28  4:31 ` [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-28  4:31 ` Peter Xu
  2021-10-28  7:06   ` David Hildenbrand
  2021-10-28  4:31 ` [PATCH v2 4/5] pci: Add pci_for_each_root_bus() Peter Xu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

Add this sister helper besides object_child_foreach_recursive() to loop over
child objects only if the object can be casted to a specific type.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/qom/object.h | 20 ++++++++++++++++++++
 qom/object.c         | 27 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index faae0d841f..355277db40 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1926,6 +1926,26 @@ int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
 int object_child_foreach_recursive(Object *obj,
                                    int (*fn)(Object *child, void *opaque),
                                    void *opaque);
+
+/**
+ * object_child_foreach_recursive_type:
+ * @obj: the object whose children will be navigated
+ * @type: the typename string to scan
+ * @fn: the iterator function to be called
+ * @opaque: an opaque value that will be passed to the iterator
+ *
+ * This is a special version of object_child_foreach_recursive() so that we
+ * only call the fn() if the child can be casted to the @typename specified.
+ * Please refer to the comments above object_child_foreach_recursive() for
+ * more details.
+ *
+ * Returns: The last value returned by @fn, or 0 if there is no child.
+ */
+int object_child_foreach_recursive_type(Object *obj,
+                                        const char *typename,
+                                        int (*fn)(Object *child, void *opaque),
+                                        void *opaque);
+
 /**
  * container_get:
  * @root: root of the #path, e.g., object_get_root()
diff --git a/qom/object.c b/qom/object.c
index 6be710bc40..d25ca09b1d 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1134,6 +1134,33 @@ int object_child_foreach_recursive(Object *obj,
     return do_object_child_foreach(obj, fn, opaque, true);
 }
 
+typedef struct {
+    const char *typename;
+    int (*fn)(Object *child, void *opaque);
+    void *opaque;
+} ObjectTypeArgs;
+
+static int object_child_hook(Object *child, void *opaque)
+{
+    ObjectTypeArgs *args = opaque;
+
+    if (object_dynamic_cast(child, args->typename)) {
+        return args->fn(child, args->opaque);
+    }
+
+    return 0;
+}
+
+int object_child_foreach_recursive_type(Object *obj,
+                                        const char *typename,
+                                        int (*fn)(Object *child, void *opaque),
+                                        void *opaque)
+{
+    ObjectTypeArgs args = { .typename = typename, .fn = fn, .opaque = opaque };
+
+    return object_child_foreach_recursive(obj, object_child_hook, &args);
+}
+
 static void object_class_get_list_tramp(ObjectClass *klass, void *opaque)
 {
     GSList **list = opaque;
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/5] pci: Add pci_for_each_root_bus()
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (2 preceding siblings ...)
  2021-10-28  4:31 ` [PATCH v2 3/5] qom: object_child_foreach_recursive_type() Peter Xu
@ 2021-10-28  4:31 ` Peter Xu
  2021-10-28  7:09   ` David Hildenbrand
  2021-10-28  4:31 ` [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu Peter Xu
  2021-11-02  1:15 ` [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

Add a helper to loop over each root bus of the system, either the default root
bus or extended buses like pxb-pcie.

There're three places that can be rewritten with the pci_for_each_root_bus()
helper that we just introduced.  De-dup the code.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/arm/virt-acpi-build.c | 31 +++++++++++--------------------
 hw/i386/acpi-build.c     | 38 ++++++++++----------------------------
 hw/pci/pci.c             | 26 ++++++++++++++++++++++++++
 include/hw/pci/pci.h     |  2 ++
 4 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 674f902652..adba51f35a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -264,28 +264,20 @@ struct AcpiIortIdMapping {
 typedef struct AcpiIortIdMapping AcpiIortIdMapping;
 
 /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
-static int
-iort_host_bridges(Object *obj, void *opaque)
+static void
+iort_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *idmap_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            int min_bus, max_bus;
+    if (!pci_bus_bypass_iommu(bus)) {
+        int min_bus, max_bus;
 
-            pci_bus_range(bus, &min_bus, &max_bus);
+        pci_bus_range(bus, &min_bus, &max_bus);
 
-            AcpiIortIdMapping idmap = {
-                .input_base = min_bus << 8,
-                .id_count = (max_bus - min_bus + 1) << 8,
-            };
-            g_array_append_val(idmap_blob, idmap);
-        }
+        AcpiIortIdMapping idmap = {
+            .input_base = min_bus << 8,
+            .id_count = (max_bus - min_bus + 1) << 8,
+        };
+        g_array_append_val((GArray *)opaque, idmap);
     }
-
-    return 0;
 }
 
 static int iort_idmap_compare(gconstpointer a, gconstpointer b)
@@ -320,8 +312,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
         AcpiIortIdMapping next_range = {0};
 
-        object_child_foreach_recursive(object_get_root(),
-                                       iort_host_bridges, smmu_idmaps);
+        pci_for_each_root_bus(iort_host_bridges, smmu_idmaps);
 
         /* Sort the smmu idmap by input_base */
         g_array_sort(smmu_idmaps, iort_idmap_compare);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a76b17ed92..3e50acfe35 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2123,20 +2123,12 @@ insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For a given PCI host bridge, walk and insert DMAR scope */
-static int
-dmar_host_bridges(Object *obj, void *opaque)
+static void
+dmar_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *scope_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
-        }
+    if (!pci_bus_bypass_iommu(bus)) {
+        pci_for_each_device_under_bus(bus, insert_scope, opaque);
     }
-
-    return 0;
 }
 
 /*
@@ -2165,8 +2157,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
      * Insert scope for each PCI bridge and endpoint device which
      * is attached to a bus with iommu enabled.
      */
-    object_child_foreach_recursive(object_get_root(),
-                                   dmar_host_bridges, scope_blob);
+    pci_for_each_root_bus(dmar_host_bridges, scope_blob);
 
     assert(iommu);
     if (x86_iommu_ir_supported(iommu)) {
@@ -2329,20 +2320,12 @@ insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For all PCI host bridges, walk and insert IVHD entries */
-static int
-ivrs_host_bridges(Object *obj, void *opaque)
+static void
+ivrs_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *ivhd_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
-        }
+    if (!pci_bus_bypass_iommu(bus)) {
+        pci_for_each_device_under_bus(bus, insert_ivhd, opaque);
     }
-
-    return 0;
 }
 
 static void
@@ -2380,8 +2363,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
      * blob further below.  Fall back to an entry covering all devices, which
      * is sufficient when no aliases are present.
      */
-    object_child_foreach_recursive(object_get_root(),
-                                   ivrs_host_bridges, ivhd_blob);
+    pci_for_each_root_bus(ivrs_host_bridges, ivhd_blob);
 
     if (!ivhd_blob->len) {
         /*
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4a84e478ce..258290f4eb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
     }
 }
 
+typedef struct {
+    pci_bus_fn fn;
+    void *opaque;
+} PCIRootBusArgs;
+
+static int pci_find_root_bus(Object *obj, void *opaque)
+{
+    PCIRootBusArgs *args = opaque;
+    PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+    if (bus) {
+        args->fn(bus, args->opaque);
+    }
+
+    return 0;
+}
+
+void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
+{
+    PCIRootBusArgs args = { .fn = fn, .opaque = opaque };
+
+    object_child_foreach_recursive_type(object_get_root(),
+                                        TYPE_PCI_HOST_BRIDGE,
+                                        pci_find_root_bus,
+                                        &args);
+}
 
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 5c4016b995..6813f128e0 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
                                            void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
                                   pci_bus_fn end, void *parent_state);
+/* Call `fn' for each pci root bus on the system */
+void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (3 preceding siblings ...)
  2021-10-28  4:31 ` [PATCH v2 4/5] pci: Add pci_for_each_root_bus() Peter Xu
@ 2021-10-28  4:31 ` Peter Xu
  2021-10-28  7:17   ` David Hildenbrand
  2021-10-28 14:52   ` Alex Williamson
  2021-11-02  1:15 ` [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  5 siblings, 2 replies; 18+ messages in thread
From: Peter Xu @ 2021-10-28  4:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, peterx, Eric Auger, Alex Williamson,
	Igor Mammedov, Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices
before realizing the vIOMMU device.

When the guest contains both the x86 vIOMMU and vfio-pci devices, the user
needs to specify the x86 vIOMMU before the vfio-pci devices.  The reason is,
vfio_realize() calls pci_device_iommu_address_space() to fetch the correct dma
address space for the device, while that API can only work right after the
vIOMMU device initialized first.

For example, the iommu_fn() that is used in pci_device_iommu_address_space() is
only setup in realize() of the vIOMMU devices.

For a long time we have had libvirt making sure that the ordering is correct,
however from qemu side we never fail a guest from booting even if the ordering
is specified wrongly.  When the order is wrong, the guest will encounter
misterious error when operating on the vfio-pci device because in QEMU we'll
still assume the vfio-pci devices are put into the default DMA domain (which is
normally the direct GPA mapping), so e.g. the DMAs will never go right.

This patch fails the guest from booting when we detected such errornous cmdline
specified, then the guest at least won't encounter weird device behavior after
booted.  The error message will also help the user to know how to fix the issue.

Cc: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/pc.c                |  4 ++++
 hw/i386/x86-iommu.c         | 14 ++++++++++++++
 include/hw/i386/x86-iommu.h |  8 ++++++++
 3 files changed, 26 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 86223acfd3..b70a04011e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -81,6 +81,7 @@
 #include "hw/core/cpu.h"
 #include "hw/usb.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
@@ -1327,6 +1328,8 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
         pc_memory_pre_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         x86_cpu_pre_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) {
+        x86_iommu_pre_plug(X86_IOMMU_DEVICE(dev), errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
                object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
         pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
@@ -1383,6 +1386,7 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) ||
         object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
         object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
         return HOTPLUG_HANDLER(machine);
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 86ad03972e..c9ee9041a3 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -22,6 +22,7 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/qdev-properties.h"
 #include "hw/i386/pc.h"
+#include "hw/vfio/pci.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
@@ -103,6 +104,19 @@ IommuType x86_iommu_get_type(void)
     return x86_iommu_default->type;
 }
 
+void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
+{
+    bool ambiguous = false;
+    Object *object;
+
+    object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
+    if (object || ambiguous) {
+        /* There're one or more vfio-pci devices detected */
+        error_setg(errp, "Please specify all the vfio-pci devices to be after "
+                   "the vIOMMU device");
+    }
+}
+
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 9de92d33a1..e8b6c293e0 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
  * @out: Output MSI message
  */
 void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out);
+
+/**
+ * x86_iommu_pre_plug: called before plugging the iommu device
+ * @X86IOMMUState: the pointer to x86 iommu state
+ * @errp: the double pointer to Error, set if we want to fail the plug
+ */
+void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp);
+
 #endif
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/5] qom: object_child_foreach_recursive_type()
  2021-10-28  4:31 ` [PATCH v2 3/5] qom: object_child_foreach_recursive_type() Peter Xu
@ 2021-10-28  7:06   ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2021-10-28  7:06 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Eric Auger,
	Alex Williamson, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On 28.10.21 06:31, Peter Xu wrote:
> Add this sister helper besides object_child_foreach_recursive() to loop over
> child objects only if the object can be casted to a specific type.
> 
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/qom/object.h | 20 ++++++++++++++++++++
>  qom/object.c         | 27 +++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/include/qom/object.h b/include/qom/object.h
> index faae0d841f..355277db40 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -1926,6 +1926,26 @@ int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
>  int object_child_foreach_recursive(Object *obj,
>                                     int (*fn)(Object *child, void *opaque),
>                                     void *opaque);
> +
> +/**
> + * object_child_foreach_recursive_type:
> + * @obj: the object whose children will be navigated
> + * @type: the typename string to scan
> + * @fn: the iterator function to be called
> + * @opaque: an opaque value that will be passed to the iterator
> + *
> + * This is a special version of object_child_foreach_recursive() so that we
> + * only call the fn() if the child can be casted to the @typename specified.
> + * Please refer to the comments above object_child_foreach_recursive() for
> + * more details.
> + *
> + * Returns: The last value returned by @fn, or 0 if there is no child.
> + */
> +int object_child_foreach_recursive_type(Object *obj,
> +                                        const char *typename,
> +                                        int (*fn)(Object *child, void *opaque),
> +                                        void *opaque);
> +
>  /**
>   * container_get:
>   * @root: root of the #path, e.g., object_get_root()
> diff --git a/qom/object.c b/qom/object.c
> index 6be710bc40..d25ca09b1d 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -1134,6 +1134,33 @@ int object_child_foreach_recursive(Object *obj,
>      return do_object_child_foreach(obj, fn, opaque, true);
>  }
>  
> +typedef struct {
> +    const char *typename;
> +    int (*fn)(Object *child, void *opaque);
> +    void *opaque;
> +} ObjectTypeArgs;
> +
> +static int object_child_hook(Object *child, void *opaque)
> +{
> +    ObjectTypeArgs *args = opaque;
> +
> +    if (object_dynamic_cast(child, args->typename)) {
> +        return args->fn(child, args->opaque);
> +    }
> +
> +    return 0;
> +}
> +
> +int object_child_foreach_recursive_type(Object *obj,
> +                                        const char *typename,
> +                                        int (*fn)(Object *child, void *opaque),
> +                                        void *opaque)
> +{
> +    ObjectTypeArgs args = { .typename = typename, .fn = fn, .opaque = opaque };
> +
> +    return object_child_foreach_recursive(obj, object_child_hook, &args);
> +}
> +
>  static void object_class_get_list_tramp(ObjectClass *klass, void *opaque)
>  {
>      GSList **list = opaque;
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 4/5] pci: Add pci_for_each_root_bus()
  2021-10-28  4:31 ` [PATCH v2 4/5] pci: Add pci_for_each_root_bus() Peter Xu
@ 2021-10-28  7:09   ` David Hildenbrand
  0 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2021-10-28  7:09 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Eric Auger,
	Alex Williamson, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On 28.10.21 06:31, Peter Xu wrote:
> Add a helper to loop over each root bus of the system, either the default root
> bus or extended buses like pxb-pcie.
> 
> There're three places that can be rewritten with the pci_for_each_root_bus()
> helper that we just introduced.  De-dup the code.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/arm/virt-acpi-build.c | 31 +++++++++++--------------------
>  hw/i386/acpi-build.c     | 38 ++++++++++----------------------------
>  hw/pci/pci.c             | 26 ++++++++++++++++++++++++++
>  include/hw/pci/pci.h     |  2 ++
>  4 files changed, 49 insertions(+), 48 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 674f902652..adba51f35a 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -264,28 +264,20 @@ struct AcpiIortIdMapping {
>  typedef struct AcpiIortIdMapping AcpiIortIdMapping;
>  
>  /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
> -static int
> -iort_host_bridges(Object *obj, void *opaque)
> +static void
> +iort_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *idmap_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            int min_bus, max_bus;
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        int min_bus, max_bus;
>  
> -            pci_bus_range(bus, &min_bus, &max_bus);
> +        pci_bus_range(bus, &min_bus, &max_bus);
>  
> -            AcpiIortIdMapping idmap = {
> -                .input_base = min_bus << 8,
> -                .id_count = (max_bus - min_bus + 1) << 8,
> -            };
> -            g_array_append_val(idmap_blob, idmap);
> -        }
> +        AcpiIortIdMapping idmap = {
> +            .input_base = min_bus << 8,
> +            .id_count = (max_bus - min_bus + 1) << 8,
> +        };
> +        g_array_append_val((GArray *)opaque, idmap);
>      }
> -
> -    return 0;
>  }
>  
>  static int iort_idmap_compare(gconstpointer a, gconstpointer b)
> @@ -320,8 +312,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>          AcpiIortIdMapping next_range = {0};
>  
> -        object_child_foreach_recursive(object_get_root(),
> -                                       iort_host_bridges, smmu_idmaps);
> +        pci_for_each_root_bus(iort_host_bridges, smmu_idmaps);
>  
>          /* Sort the smmu idmap by input_base */
>          g_array_sort(smmu_idmaps, iort_idmap_compare);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a76b17ed92..3e50acfe35 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2123,20 +2123,12 @@ insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
>  }
>  
>  /* For a given PCI host bridge, walk and insert DMAR scope */
> -static int
> -dmar_host_bridges(Object *obj, void *opaque)
> +static void
> +dmar_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *scope_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
> -        }
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        pci_for_each_device_under_bus(bus, insert_scope, opaque);
>      }
> -
> -    return 0;
>  }
>  
>  /*
> @@ -2165,8 +2157,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>       * Insert scope for each PCI bridge and endpoint device which
>       * is attached to a bus with iommu enabled.
>       */
> -    object_child_foreach_recursive(object_get_root(),
> -                                   dmar_host_bridges, scope_blob);
> +    pci_for_each_root_bus(dmar_host_bridges, scope_blob);
>  
>      assert(iommu);
>      if (x86_iommu_ir_supported(iommu)) {
> @@ -2329,20 +2320,12 @@ insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
>  }
>  
>  /* For all PCI host bridges, walk and insert IVHD entries */
> -static int
> -ivrs_host_bridges(Object *obj, void *opaque)
> +static void
> +ivrs_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *ivhd_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
> -        }
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        pci_for_each_device_under_bus(bus, insert_ivhd, opaque);
>      }
> -
> -    return 0;
>  }
>  
>  static void
> @@ -2380,8 +2363,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>       * blob further below.  Fall back to an entry covering all devices, which
>       * is sufficient when no aliases are present.
>       */
> -    object_child_foreach_recursive(object_get_root(),
> -                                   ivrs_host_bridges, ivhd_blob);
> +    pci_for_each_root_bus(ivrs_host_bridges, ivhd_blob);
>  
>      if (!ivhd_blob->len) {
>          /*
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4a84e478ce..258290f4eb 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>      }
>  }
>  
> +typedef struct {
> +    pci_bus_fn fn;
> +    void *opaque;
> +} PCIRootBusArgs;
> +
> +static int pci_find_root_bus(Object *obj, void *opaque)
> +{
> +    PCIRootBusArgs *args = opaque;
> +    PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +    if (bus) {
> +        args->fn(bus, args->opaque);
> +    }
> +
> +    return 0;
> +}
> +
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> +{
> +    PCIRootBusArgs args = { .fn = fn, .opaque = opaque };
> +
> +    object_child_foreach_recursive_type(object_get_root(),
> +                                        TYPE_PCI_HOST_BRIDGE,
> +                                        pci_find_root_bus,
> +                                        &args);
> +}
>  
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
>  {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 5c4016b995..6813f128e0 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>                                             void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
> +/* Call `fn' for each pci root bus on the system */
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
> 

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28  4:31 ` [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu Peter Xu
@ 2021-10-28  7:17   ` David Hildenbrand
  2021-10-28  8:16     ` Peter Xu
  2021-10-28 14:52   ` Alex Williamson
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2021-10-28  7:17 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Eric Auger,
	Alex Williamson, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On 28.10.21 06:31, Peter Xu wrote:
> Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices
> before realizing the vIOMMU device.
> 
> When the guest contains both the x86 vIOMMU and vfio-pci devices, the user
> needs to specify the x86 vIOMMU before the vfio-pci devices.  The reason is,
> vfio_realize() calls pci_device_iommu_address_space() to fetch the correct dma
> address space for the device, while that API can only work right after the
> vIOMMU device initialized first.
> 
> For example, the iommu_fn() that is used in pci_device_iommu_address_space() is
> only setup in realize() of the vIOMMU devices.
> 
> For a long time we have had libvirt making sure that the ordering is correct,
> however from qemu side we never fail a guest from booting even if the ordering
> is specified wrongly.  When the order is wrong, the guest will encounter
> misterious error when operating on the vfio-pci device because in QEMU we'll
> still assume the vfio-pci devices are put into the default DMA domain (which is
> normally the direct GPA mapping), so e.g. the DMAs will never go right.
> 
> This patch fails the guest from booting when we detected such errornous cmdline
> specified, then the guest at least won't encounter weird device behavior after
> booted.  The error message will also help the user to know how to fix the issue.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

I think that's a big improvement. I ran into this issue myself and found
the documentation at https://wiki.qemu.org/Features/VT-d at one time
("Meanwhile, the intel-iommu device must be specified as the first
device in the parameter list (before all the rest of the devices). ").

So feel free to add my

Acked-by: David Hildenbrand <david@redhat.com>

> ---
>  hw/i386/pc.c                |  4 ++++
>  hw/i386/x86-iommu.c         | 14 ++++++++++++++
>  include/hw/i386/x86-iommu.h |  8 ++++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 86223acfd3..b70a04011e 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -81,6 +81,7 @@
>  #include "hw/core/cpu.h"
>  #include "hw/usb.h"
>  #include "hw/i386/intel_iommu.h"
> +#include "hw/i386/x86-iommu.h"
>  #include "hw/net/ne2000-isa.h"
>  #include "standard-headers/asm-x86/bootparam.h"
>  #include "hw/virtio/virtio-pmem-pci.h"
> @@ -1327,6 +1328,8 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>          pc_memory_pre_plug(hotplug_dev, dev, errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>          x86_cpu_pre_plug(hotplug_dev, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) {
> +        x86_iommu_pre_plug(X86_IOMMU_DEVICE(dev), errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
>                 object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>          pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
> @@ -1383,6 +1386,7 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
>  {
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>          return HOTPLUG_HANDLER(machine);
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 86ad03972e..c9ee9041a3 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -22,6 +22,7 @@
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/i386/pc.h"
> +#include "hw/vfio/pci.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> @@ -103,6 +104,19 @@ IommuType x86_iommu_get_type(void)
>      return x86_iommu_default->type;
>  }
>  
> +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
> +{
> +    bool ambiguous = false;
> +    Object *object;
> +
> +    object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
> +    if (object || ambiguous) {
> +        /* There're one or more vfio-pci devices detected */
> +        error_setg(errp, "Please specify all the vfio-pci devices to be after "
> +                   "the vIOMMU device");
> +    }
> +}
> +
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index 9de92d33a1..e8b6c293e0 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
>   * @out: Output MSI message
>   */
>  void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out);
> +
> +/**
> + * x86_iommu_pre_plug: called before plugging the iommu device
> + * @X86IOMMUState: the pointer to x86 iommu state
> + * @errp: the double pointer to Error, set if we want to fail the plug
> + */

I'd drop that documentation because it's essentially just how any other
pre_plug handlers works. But maybe it's just me that knows how the whole
hotplug machinery works, so ...

> +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp);
> +
>  #endif
> 


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28  7:17   ` David Hildenbrand
@ 2021-10-28  8:16     ` Peter Xu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Xu @ 2021-10-28  8:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, qemu-devel, Markus Armbruster,
	Eric Auger, Alex Williamson, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Thu, Oct 28, 2021 at 09:17:35AM +0200, David Hildenbrand wrote:
> On 28.10.21 06:31, Peter Xu wrote:
> > Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices
> > before realizing the vIOMMU device.
> > 
> > When the guest contains both the x86 vIOMMU and vfio-pci devices, the user
> > needs to specify the x86 vIOMMU before the vfio-pci devices.  The reason is,
> > vfio_realize() calls pci_device_iommu_address_space() to fetch the correct dma
> > address space for the device, while that API can only work right after the
> > vIOMMU device initialized first.
> > 
> > For example, the iommu_fn() that is used in pci_device_iommu_address_space() is
> > only setup in realize() of the vIOMMU devices.
> > 
> > For a long time we have had libvirt making sure that the ordering is correct,
> > however from qemu side we never fail a guest from booting even if the ordering
> > is specified wrongly.  When the order is wrong, the guest will encounter
> > misterious error when operating on the vfio-pci device because in QEMU we'll
> > still assume the vfio-pci devices are put into the default DMA domain (which is
> > normally the direct GPA mapping), so e.g. the DMAs will never go right.
> > 
> > This patch fails the guest from booting when we detected such errornous cmdline
> > specified, then the guest at least won't encounter weird device behavior after
> > booted.  The error message will also help the user to know how to fix the issue.
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Suggested-by: Igor Mammedov <imammedo@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I think that's a big improvement. I ran into this issue myself and found
> the documentation at https://wiki.qemu.org/Features/VT-d at one time
> ("Meanwhile, the intel-iommu device must be specified as the first
> device in the parameter list (before all the rest of the devices). ").
> 
> So feel free to add my
> 
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks, will do.

> > @@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
> >   * @out: Output MSI message
> >   */
> >  void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out);
> > +
> > +/**
> > + * x86_iommu_pre_plug: called before plugging the iommu device
> > + * @X86IOMMUState: the pointer to x86 iommu state
> > + * @errp: the double pointer to Error, set if we want to fail the plug
> > + */
> 
> I'd drop that documentation because it's essentially just how any other
> pre_plug handlers works. But maybe it's just me that knows how the whole
> hotplug machinery works, so ...

Yes the documentation is not very helpful because it shouldn't be called
randomly but only in the machine pre-plug hook of x86.  It's just trying to not
be the 1st one exported function in the header that does not have a comment.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28  4:31 ` [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu Peter Xu
  2021-10-28  7:17   ` David Hildenbrand
@ 2021-10-28 14:52   ` Alex Williamson
  2021-10-28 15:36     ` Peter Xu
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2021-10-28 14:52 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Thu, 28 Oct 2021 12:31:29 +0800
Peter Xu <peterx@redhat.com> wrote:

> Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices
> before realizing the vIOMMU device.
> 
> When the guest contains both the x86 vIOMMU and vfio-pci devices, the user
> needs to specify the x86 vIOMMU before the vfio-pci devices.  The reason is,
> vfio_realize() calls pci_device_iommu_address_space() to fetch the correct dma
> address space for the device, while that API can only work right after the
> vIOMMU device initialized first.
> 
> For example, the iommu_fn() that is used in pci_device_iommu_address_space() is
> only setup in realize() of the vIOMMU devices.
> 
> For a long time we have had libvirt making sure that the ordering is correct,
> however from qemu side we never fail a guest from booting even if the ordering
> is specified wrongly.  When the order is wrong, the guest will encounter
> misterious error when operating on the vfio-pci device because in QEMU we'll
> still assume the vfio-pci devices are put into the default DMA domain (which is
> normally the direct GPA mapping), so e.g. the DMAs will never go right.
> 
> This patch fails the guest from booting when we detected such errornous cmdline
> specified, then the guest at least won't encounter weird device behavior after
> booted.  The error message will also help the user to know how to fix the issue.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/pc.c                |  4 ++++
>  hw/i386/x86-iommu.c         | 14 ++++++++++++++
>  include/hw/i386/x86-iommu.h |  8 ++++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 86223acfd3..b70a04011e 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -81,6 +81,7 @@
>  #include "hw/core/cpu.h"
>  #include "hw/usb.h"
>  #include "hw/i386/intel_iommu.h"
> +#include "hw/i386/x86-iommu.h"
>  #include "hw/net/ne2000-isa.h"
>  #include "standard-headers/asm-x86/bootparam.h"
>  #include "hw/virtio/virtio-pmem-pci.h"
> @@ -1327,6 +1328,8 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>          pc_memory_pre_plug(hotplug_dev, dev, errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>          x86_cpu_pre_plug(hotplug_dev, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) {
> +        x86_iommu_pre_plug(X86_IOMMU_DEVICE(dev), errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
>                 object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>          pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
> @@ -1383,6 +1386,7 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
>  {
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
>          return HOTPLUG_HANDLER(machine);
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 86ad03972e..c9ee9041a3 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -22,6 +22,7 @@
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/i386/pc.h"
> +#include "hw/vfio/pci.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> @@ -103,6 +104,19 @@ IommuType x86_iommu_get_type(void)
>      return x86_iommu_default->type;
>  }
>  
> +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
> +{
> +    bool ambiguous = false;
> +    Object *object;
> +
> +    object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
> +    if (object || ambiguous) {
> +        /* There're one or more vfio-pci devices detected */
> +        error_setg(errp, "Please specify all the vfio-pci devices to be after "
> +                   "the vIOMMU device");
> +    }

I still really don't buy the argument that vfio-pci is the only driver
that does "this thing", therefore we can just look for vfio-pci devices
by name rather than try to generically detect devices that have this
dependency.  That seems short sighted.

I've already suggested that pci-core could record on the PCIDevice
structure if the device address space has been accessed.  We could also
do something like create a TYPE_PCI_AS_DEVICE class derived from
TYPE_PCI_DEVICE and any PCI drivers that make use of the device address
space before machine-init-done would be of this class.  That could even
be enforced by pci_device_iommu_address_space() and would allow the
same sort of object resolution as used here.  Thanks,

Alex

> +}
> +
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
> index 9de92d33a1..e8b6c293e0 100644
> --- a/include/hw/i386/x86-iommu.h
> +++ b/include/hw/i386/x86-iommu.h
> @@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global,
>   * @out: Output MSI message
>   */
>  void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out);
> +
> +/**
> + * x86_iommu_pre_plug: called before plugging the iommu device
> + * @X86IOMMUState: the pointer to x86 iommu state
> + * @errp: the double pointer to Error, set if we want to fail the plug
> + */
> +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp);
> +
>  #endif



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28 14:52   ` Alex Williamson
@ 2021-10-28 15:36     ` Peter Xu
  2021-10-28 16:11       ` Alex Williamson
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-10-28 15:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Thu, Oct 28, 2021 at 08:52:42AM -0600, Alex Williamson wrote:
> > +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
> > +{
> > +    bool ambiguous = false;
> > +    Object *object;
> > +
> > +    object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
> > +    if (object || ambiguous) {
> > +        /* There're one or more vfio-pci devices detected */
> > +        error_setg(errp, "Please specify all the vfio-pci devices to be after "
> > +                   "the vIOMMU device");
> > +    }
> 
> I still really don't buy the argument that vfio-pci is the only driver
> that does "this thing", therefore we can just look for vfio-pci devices
> by name rather than try to generically detect devices that have this
> dependency.  That seems short sighted.
> 
> I've already suggested that pci-core could record on the PCIDevice
> structure if the device address space has been accessed.  We could also
> do something like create a TYPE_PCI_AS_DEVICE class derived from
> TYPE_PCI_DEVICE and any PCI drivers that make use of the device address
> space before machine-init-done would be of this class.  That could even
> be enforced by pci_device_iommu_address_space() and would allow the
> same sort of object resolution as used here.  Thanks,

Sorry Alex, I didn't receive any follow up so I thought you were fine with it.

I was always fine with either way, though I think another parent class would be
an overkill just for this.  Would you think below acceptable?

---8<---
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5cdf1d4298..2156b5d3ed 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3266,6 +3266,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
     pdc->exit = vfio_exitfn;
     pdc->config_read = vfio_pci_read_config;
     pdc->config_write = vfio_pci_write_config;
+    pdc->require_consolidated_iommu_as = true;
 }

 static const TypeInfo vfio_pci_dev_info = {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6813f128e0..ffddc766ba 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -239,6 +239,14 @@ struct PCIDeviceClass {
      */
     bool is_bridge;

+    /*
+     * Set this to true when a pci device needs consolidated result from the
+     * pci_device_iommu_address_space() in its realize() fn.  This also means
+     * when specified in cmdline, this "-device" parameter needs to be put
+     * before the vIOMMU devices so as to make it work.
+     */
+    bool require_consolidated_iommu_as;
+
     /* rom bar */
     const char *romfile;
 };
---8<---

Then I'll need to pick the dropped patch back on pci scanning, since then I
won't be able to use object_resolve_path_type() anymore, and I'll need to check
up PCIDeviceClass instead.

Michael, Igor, others - any objections?

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28 15:36     ` Peter Xu
@ 2021-10-28 16:11       ` Alex Williamson
  2021-10-29  2:53         ` Peter Xu
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2021-10-28 16:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Thu, 28 Oct 2021 23:36:33 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Oct 28, 2021 at 08:52:42AM -0600, Alex Williamson wrote:
> > > +void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
> > > +{
> > > +    bool ambiguous = false;
> > > +    Object *object;
> > > +
> > > +    object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
> > > +    if (object || ambiguous) {
> > > +        /* There're one or more vfio-pci devices detected */
> > > +        error_setg(errp, "Please specify all the vfio-pci devices to be after "
> > > +                   "the vIOMMU device");
> > > +    }  
> > 
> > I still really don't buy the argument that vfio-pci is the only driver
> > that does "this thing", therefore we can just look for vfio-pci devices
> > by name rather than try to generically detect devices that have this
> > dependency.  That seems short sighted.
> > 
> > I've already suggested that pci-core could record on the PCIDevice
> > structure if the device address space has been accessed.  We could also
> > do something like create a TYPE_PCI_AS_DEVICE class derived from
> > TYPE_PCI_DEVICE and any PCI drivers that make use of the device address
> > space before machine-init-done would be of this class.  That could even
> > be enforced by pci_device_iommu_address_space() and would allow the
> > same sort of object resolution as used here.  Thanks,  
> 
> Sorry Alex, I didn't receive any follow up so I thought you were fine with it.
> 
> I was always fine with either way, though I think another parent class would be
> an overkill just for this.  Would you think below acceptable?
> 
> ---8<---
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5cdf1d4298..2156b5d3ed 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3266,6 +3266,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>      pdc->exit = vfio_exitfn;
>      pdc->config_read = vfio_pci_read_config;
>      pdc->config_write = vfio_pci_write_config;
> +    pdc->require_consolidated_iommu_as = true;
>  }
> 
>  static const TypeInfo vfio_pci_dev_info = {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6813f128e0..ffddc766ba 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -239,6 +239,14 @@ struct PCIDeviceClass {
>       */
>      bool is_bridge;
> 
> +    /*
> +     * Set this to true when a pci device needs consolidated result from the
> +     * pci_device_iommu_address_space() in its realize() fn.  This also means
> +     * when specified in cmdline, this "-device" parameter needs to be put
> +     * before the vIOMMU devices so as to make it work.
> +     */
> +    bool require_consolidated_iommu_as;

Maybe this is where the naming of the previous attempt along these lines
didn't work.  There's no "consolidation" requirement and the IOMMU is
only related in that it is a driver that provides new address spaces
for devices.  This is why I thought we might be able to make it
automatic if pci_device_iommu_address_space() records that the address
space for a device has been consumed.

What we're trying to describe is that the address space object for a
device must be fixed at the time the device is realized.  Along those
lines, maybe a better name is something like "required_fixed_as_obj".

> +
>      /* rom bar */
>      const char *romfile;
>  };
> ---8<---
> 
> Then I'll need to pick the dropped patch back on pci scanning, since then I
> won't be able to use object_resolve_path_type() anymore, and I'll need to check
> up PCIDeviceClass instead.

Better.  Like the class layering proposal, a downside is that the
driver needs to be aware that it's imposing this requirement to be able
to mark it in the class init function rather than some automatic means,
like an "as_object_consumed" flag set automatically on the device
structure via accessors like pci_device_iommu_address_space().  Thanks,

Alex



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*()
  2021-10-28  4:31 ` [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-29  0:06   ` David Gibson
  0 siblings, 0 replies; 18+ messages in thread
From: David Gibson @ 2021-10-29  0:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Alex Williamson, Igor Mammedov,
	Paolo Bonzini, Philippe Mathieu-Daudé

[-- Attachment #1: Type: text/plain, Size: 9982 bytes --]

On Thu, Oct 28, 2021 at 12:31:26PM +0800, Peter Xu wrote:
> They're actually more commonly used than the helper without _under_bus, because
> most callers do have the pci bus on hand.  After exporting we can switch a lot
> of the call sites to use these two helpers.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

ppc parts
Acked-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/i386/acpi-build.c       |  5 ++---
>  hw/pci/pci.c               | 10 +++++-----
>  hw/pci/pcie.c              |  4 +---
>  hw/ppc/spapr_pci.c         | 12 +++++-------
>  hw/ppc/spapr_pci_nvlink2.c |  7 +++----
>  hw/ppc/spapr_pci_vfio.c    |  4 ++--
>  hw/s390x/s390-pci-bus.c    |  5 ++---
>  hw/xen/xen_pt.c            |  4 ++--
>  include/hw/pci/pci.h       |  5 +++++
>  9 files changed, 27 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 81418b7911..a76b17ed92 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
>          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
>  
>          if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
> -                                scope_blob);
> +            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
>          }
>      }
>  
> @@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
>          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
>  
>          if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
> +            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
>          }
>      }
>  
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 17e59cb3a3..4a84e478ce 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1654,9 +1654,9 @@ static const pci_class_desc pci_class_descriptions[] =
>      { 0, NULL}
>  };
>  
> -static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> -                                                  pci_bus_dev_fn fn,
> -                                                  void *opaque)
> +void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> +                                           pci_bus_dev_fn fn,
> +                                           void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1679,8 +1679,8 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
>      }
>  }
>  
> -static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          pci_bus_dev_fn fn, void *opaque)
> +void pci_for_each_device_under_bus(PCIBus *bus,
> +                                   pci_bus_dev_fn fn, void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 6e95d82903..914a9bf3d1 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
>          (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
>          (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
>          PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
> -        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                            pcie_unplug_device, NULL);
> -
> +        pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
>          pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
>                                       PCI_EXP_SLTSTA_PDS);
>          if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7430bd6314..5bfd4aa9e5 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus *bus,
>                            RESOURCE_CELLS_SIZE));
>  
>      assert(bus);
> -    pci_for_each_device_reverse(bus, pci_bus_num(bus),
> -                                spapr_dt_pci_device_cb, &cbinfo);
> +    pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
>      if (cbinfo.err) {
>          return cbinfo.err;
>      }
> @@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
>          return;
>      }
>  
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        spapr_phb_pci_enumerate_bridge, bus_no);
> +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
> +                                  bus_no);
>      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
>  }
>  
> @@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
>      PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
>      unsigned int bus_no = 0;
>  
> -    pci_for_each_device(bus, pci_bus_num(bus),
> -                        spapr_phb_pci_enumerate_bridge,
> -                        &bus_no);
> +    pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
> +                                  &bus_no);
>  
>  }
>  
> diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
> index 8ef9b40a18..7fb0cf4d04 100644
> --- a/hw/ppc/spapr_pci_nvlink2.c
> +++ b/hw/ppc/spapr_pci_nvlink2.c
> @@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, PCIDevice *pdev,
>          return;
>      }
>  
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        spapr_phb_pci_collect_nvgpu, opaque);
> +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_collect_nvgpu, opaque);
>  }
>  
>  void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
> @@ -183,8 +182,8 @@ void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
>      sphb->nvgpus->nv2_atsd_current = sphb->nv2_atsd_win_addr;
>  
>      bus = PCI_HOST_BRIDGE(sphb)->bus;
> -    pci_for_each_device(bus, pci_bus_num(bus),
> -                        spapr_phb_pci_collect_nvgpu, sphb->nvgpus);
> +    pci_for_each_device_under_bus(bus, spapr_phb_pci_collect_nvgpu,
> +                                  sphb->nvgpus);
>  
>      if (sphb->nvgpus->err) {
>          error_propagate(errp, sphb->nvgpus->err);
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index f3b37df8ea..2a76b4e0b5 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -164,8 +164,8 @@ static void spapr_phb_vfio_eeh_clear_dev_msix(PCIBus *bus,
>  
>  static void spapr_phb_vfio_eeh_clear_bus_msix(PCIBus *bus, void *opaque)
>  {
> -       pci_for_each_device(bus, pci_bus_num(bus),
> -                           spapr_phb_vfio_eeh_clear_dev_msix, NULL);
> +       pci_for_each_device_under_bus(bus, spapr_phb_vfio_eeh_clear_dev_msix,
> +                                     NULL);
>  }
>  
>  static void spapr_phb_vfio_eeh_pre_reset(SpaprPhbState *sphb)
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 6fafffb029..1b51a72838 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1163,8 +1163,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
>      }
>  
>      /* Assign numbers to all child bridges. The last is the highest number. */
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        s390_pci_enumerate_bridge, s);
> +    pci_for_each_device_under_bus(sec_bus, s390_pci_enumerate_bridge, s);
>      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
>  }
>  
> @@ -1193,7 +1192,7 @@ static void s390_pcihost_reset(DeviceState *dev)
>       * on every system reset, we also have to reassign numbers.
>       */
>      s->bus_no = 0;
> -    pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
> +    pci_for_each_device_under_bus(bus, s390_pci_enumerate_bridge, s);
>  }
>  
>  static void s390_pcihost_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index ca0a98187e..027190fa44 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -615,8 +615,8 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
>      }
>  
>      args.type = d->io_regions[bar].type;
> -    pci_for_each_device(pci_get_bus(d), pci_dev_bus_num(d),
> -                        xen_pt_check_bar_overlap, &args);
> +    pci_for_each_device_under_bus(pci_get_bus(d),
> +                                  xen_pt_check_bar_overlap, &args);
>      if (args.rc) {
>          XEN_PT_WARN(d, "Region: %d (addr: 0x%"FMT_PCIBUS
>                      ", len: 0x%"FMT_PCIBUS") is overlapped.\n",
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 4a8740b76b..5c4016b995 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -467,6 +467,11 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
>  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
>                                   pci_bus_dev_fn fn,
>                                   void *opaque);
> +void pci_for_each_device_under_bus(PCIBus *bus,
> +                                   pci_bus_dev_fn fn, void *opaque);
> +void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> +                                           pci_bus_dev_fn fn,
> +                                           void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-28 16:11       ` Alex Williamson
@ 2021-10-29  2:53         ` Peter Xu
  2021-10-29 15:31           ` Alex Williamson
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-10-29  2:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Thu, Oct 28, 2021 at 10:11:35AM -0600, Alex Williamson wrote:
> Better.  Like the class layering proposal, a downside is that the
> driver needs to be aware that it's imposing this requirement to be able
> to mark it in the class init function rather than some automatic means,
> like an "as_object_consumed" flag set automatically on the device
> structure via accessors like pci_device_iommu_address_space().  Thanks,

Do you mean something like this?

---8<---
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 258290f4eb..969f4c85fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2729,6 +2729,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
     PCIBus *iommu_bus = bus;
     uint8_t devfn = dev->devfn;

+    if (!dev->address_space_consumed) {
+        dev->address_space_consumed = true;
+    }
+
     while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
         PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6813f128e0..704c9bdc6e 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -268,6 +268,13 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
     DeviceState qdev;
     bool partially_hotplugged;
+    /*
+     * This will be set after the 1st time the device implementation fetches
+     * its dma address space from pci_device_iommu_address_space().  It's used
+     * as a sanity check for platform devices like vIOMMU to detect incorrect
+     * ordering of device realization.
+     */
+    bool address_space_consumed;

     /* PCI config space */
     uint8_t *config;
---8<---

Then sanity check in pre-plug of vIOMMU.

The flag will be a bit more "misterious" than the previous approach, imho, as
the name of it will be even further from the problem it's going to solve.
However it looks at least clean on the changeset and it looks working too.

Thanks,

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu
  2021-10-29  2:53         ` Peter Xu
@ 2021-10-29 15:31           ` Alex Williamson
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2021-10-29 15:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Eric Auger, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Fri, 29 Oct 2021 10:53:07 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Oct 28, 2021 at 10:11:35AM -0600, Alex Williamson wrote:
> > Better.  Like the class layering proposal, a downside is that the
> > driver needs to be aware that it's imposing this requirement to be able
> > to mark it in the class init function rather than some automatic means,
> > like an "as_object_consumed" flag set automatically on the device
> > structure via accessors like pci_device_iommu_address_space().  Thanks,  
> 
> Do you mean something like this?
> 
> ---8<---
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 258290f4eb..969f4c85fd 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2729,6 +2729,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>      PCIBus *iommu_bus = bus;
>      uint8_t devfn = dev->devfn;
> 
> +    if (!dev->address_space_consumed) {
> +        dev->address_space_consumed = true;
> +    }

Could just set it unconditionally.

> +
>      while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
>          PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6813f128e0..704c9bdc6e 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -268,6 +268,13 @@ typedef struct PCIReqIDCache PCIReqIDCache;
>  struct PCIDevice {
>      DeviceState qdev;
>      bool partially_hotplugged;
> +    /*
> +     * This will be set after the 1st time the device implementation fetches
> +     * its dma address space from pci_device_iommu_address_space().  It's used
> +     * as a sanity check for platform devices like vIOMMU to detect incorrect
> +     * ordering of device realization.
> +     */
> +    bool address_space_consumed;
> 
>      /* PCI config space */
>      uint8_t *config;
> ---8<---
> 
> Then sanity check in pre-plug of vIOMMU.
> 
> The flag will be a bit more "misterious" than the previous approach, imho, as
> the name of it will be even further from the problem it's going to solve.
> However it looks at least clean on the changeset and it looks working too.

That seems like a function of how well we name and comment the
variable, right?  We are making an assumption here that if the address
space for a device is provided then that address space is no longer
interchangeable, some decision has already been made based on the
provided address space.  If we look at the callers of
pci_device_iommu_address_space(), we have:

pci_init_bus_master() - It holds true here that the purpose of
accessing the address space is to make the memory of that address space
accessible to the device, the address space cannot be transparently
swapped for another.

vfio_realize() - The case we're concerned about, potentially the
earliest use case.

virtio_pci_iommu_enabled() - AIUI, this is where virtio devices decide
how DMA flows, the address space of the device cannot be changed after
this.

kvm_arch_fixup_msi_route() - ARM KVM decides MSI routing here, the
address space is fixed after this.

Actually, maybe there's a more simple approach, could we further assume
that if the address space for *any* device relative to an IOMMU is
evaluated, then we've passed the point where an IOMMU could be added?
IOW, maybe we don't need a per device flag and a global flag would be
enough.  A function like pci_iommu_as_evaluated() could report the
state of that flag.  For convenience to the user though, tracking per
device to be able to report which devices are mis-ordered could still
be useful though.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU
  2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (4 preceding siblings ...)
  2021-10-28  4:31 ` [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu Peter Xu
@ 2021-11-02  1:15 ` Peter Xu
  2021-11-02  6:40   ` Michael S. Tsirkin
  5 siblings, 1 reply; 18+ messages in thread
From: Peter Xu @ 2021-11-02  1:15 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, Eric Auger, Alex Williamson, Igor Mammedov,
	Paolo Bonzini, Philippe Mathieu-Daudé,
	David Gibson

On Thu, Oct 28, 2021 at 12:31:24PM +0800, Peter Xu wrote:
> Note that patch 1-4 are cleanups for pci subsystem, and patch 5 is a fix to
> fail early for mis-ordered qemu cmdline on vfio and vIOMMU.  Logically they
> should be posted separately and they're not directly related, however to make
> it still correlated to v1 I kept them in the same patchset.
> 
> In this version I used pre_plug() hook for q35 to detect the ordering issue as
> Igor suggested, meanwhile it's done via object_resolve_path_type() rather than
> scanning the pci bus as Michael suggested.
> 
> Please review, thanks.

Michael,

Would you consider review/pick patches 1-4 first?  The last patch needs further
discussion, and I would like to address it separately in the future.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU
  2021-11-02  1:15 ` [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
@ 2021-11-02  6:40   ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02  6:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, qemu-devel, Markus Armbruster,
	Eric Auger, Alex Williamson, Igor Mammedov, Paolo Bonzini,
	Philippe Mathieu-Daudé,
	David Gibson

On Tue, Nov 02, 2021 at 09:15:59AM +0800, Peter Xu wrote:
> On Thu, Oct 28, 2021 at 12:31:24PM +0800, Peter Xu wrote:
> > Note that patch 1-4 are cleanups for pci subsystem, and patch 5 is a fix to
> > fail early for mis-ordered qemu cmdline on vfio and vIOMMU.  Logically they
> > should be posted separately and they're not directly related, however to make
> > it still correlated to v1 I kept them in the same patchset.
> > 
> > In this version I used pre_plug() hook for q35 to detect the ordering issue as
> > Igor suggested, meanwhile it's done via object_resolve_path_type() rather than
> > scanning the pci bus as Michael suggested.
> > 
> > Please review, thanks.
> 
> Michael,
> 
> Would you consider review/pick patches 1-4 first?  The last patch needs further
> discussion, and I would like to address it separately in the future.
> 
> Thanks,

I did just that yesterday.

> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-11-02  6:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-28  4:31 [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
2021-10-28  4:31 ` [PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn Peter Xu
2021-10-28  4:31 ` [PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*() Peter Xu
2021-10-29  0:06   ` David Gibson
2021-10-28  4:31 ` [PATCH v2 3/5] qom: object_child_foreach_recursive_type() Peter Xu
2021-10-28  7:06   ` David Hildenbrand
2021-10-28  4:31 ` [PATCH v2 4/5] pci: Add pci_for_each_root_bus() Peter Xu
2021-10-28  7:09   ` David Hildenbrand
2021-10-28  4:31 ` [PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu Peter Xu
2021-10-28  7:17   ` David Hildenbrand
2021-10-28  8:16     ` Peter Xu
2021-10-28 14:52   ` Alex Williamson
2021-10-28 15:36     ` Peter Xu
2021-10-28 16:11       ` Alex Williamson
2021-10-29  2:53         ` Peter Xu
2021-10-29 15:31           ` Alex Williamson
2021-11-02  1:15 ` [PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
2021-11-02  6:40   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).