qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU
@ 2021-10-21 10:42 Peter Xu
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

This series overrides one previous patchset:

https://lore.kernel.org/qemu-devel/20210818194217.110451-1-peterx@redhat.com/

I started from v1 because obviously it's completely different way of doing the
same thing, hence versioning upon it would be weird.

Patches 1-7 are majorly cleanups for current pci code to finally provide a
clean way to loop over all the pci devices on the system.

Patch 8 uses the last helper pci_for_each_device_all() to loop over all the
devices during x86 vIOMMU realize() function to fail early if e.g. vfio-pci
devices are detected.  Although this is not exactly what Igor suggested but it
should be mostly the same approach, so I kept the Suggested-by credit.

Please review, thanks.

Peter Xu (8):
  pci: Define pci_bus_dev_fn type
  pci: Export pci_for_each_device_under_bus*()
  pci: Use pci_for_each_device_under_bus*()
  pci: Define pci_bus_fn/pci_bus_ret_fn type
  pci: Add pci_for_each_root_bus()
  pci: Use pci_for_each_root_bus() in current code
  pci: Add pci_for_each_device_all()
  x86-iommu: Fail early if vIOMMU specified after vfio-pci

 hw/arm/virt-acpi-build.c   | 31 ++++++---------
 hw/i386/acpi-build.c       | 39 +++++--------------
 hw/i386/x86-iommu.c        | 18 +++++++++
 hw/pci/pci.c               | 77 +++++++++++++++++++++++++++++---------
 hw/pci/pcie.c              |  4 +-
 hw/ppc/spapr_pci.c         | 12 +++---
 hw/ppc/spapr_pci_nvlink2.c |  7 ++--
 hw/ppc/spapr_pci_vfio.c    |  4 +-
 hw/s390x/s390-pci-bus.c    |  5 +--
 hw/xen/xen_pt.c            |  4 +-
 include/hw/pci/pci.h       | 28 +++++++++-----
 11 files changed, 132 insertions(+), 97 deletions(-)

-- 
2.32.0




^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/8] pci: Define pci_bus_dev_fn type
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:53   ` David Hildenbrand
                     ` (2 more replies)
  2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

It's used in quite a few places of pci.c and also in the rest of the code base.
Define such a hook so that it doesn't need to be defined all over the places.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         | 14 ++++----------
 include/hw/pci/pci.h |  7 ++++---
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 186758ee11..1ab2b78321 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1655,9 +1655,7 @@ static const pci_class_desc pci_class_descriptions[] =
 };
 
 static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-                                                  void (*fn)(PCIBus *b,
-                                                             PCIDevice *d,
-                                                             void *opaque),
+                                                  pci_bus_dev_fn fn,
                                                   void *opaque)
 {
     PCIDevice *d;
@@ -1672,8 +1670,7 @@ static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
 }
 
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
-                         void *opaque)
+                                 pci_bus_dev_fn fn, void *opaque)
 {
     bus = pci_find_bus_nr(bus, bus_num);
 
@@ -1683,9 +1680,7 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
 }
 
 static void pci_for_each_device_under_bus(PCIBus *bus,
-                                          void (*fn)(PCIBus *b, PCIDevice *d,
-                                                     void *opaque),
-                                          void *opaque)
+                                          pci_bus_dev_fn fn, void *opaque)
 {
     PCIDevice *d;
     int devfn;
@@ -1699,8 +1694,7 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
 }
 
 void pci_for_each_device(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
-                         void *opaque)
+                         pci_bus_dev_fn fn, void *opaque)
 {
     bus = pci_find_bus_nr(bus, bus_num);
 
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 7fc90132cf..8e2d80860b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -401,6 +401,8 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
 OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
 #define TYPE_PCIE_BUS "PCIE"
 
+typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
+
 bool pci_bus_is_express(PCIBus *bus);
 
 void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
@@ -458,11 +460,10 @@ static inline int pci_dev_bus_num(const PCIDevice *dev)
 
 int pci_bus_numa_node(PCIBus *bus);
 void pci_for_each_device(PCIBus *bus, int bus_num,
-                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
+                         pci_bus_dev_fn fn,
                          void *opaque);
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
-                                 void (*fn)(PCIBus *bus, PCIDevice *d,
-                                            void *opaque),
+                                 pci_bus_dev_fn fn,
                                  void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus,
                                   void *(*begin)(PCIBus *bus, void *parent_state),
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/8] pci: Export pci_for_each_device_under_bus*()
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:54   ` David Hildenbrand
                     ` (2 more replies)
  2021-10-21 10:42 ` [PATCH 3/8] pci: Use pci_for_each_device_under_bus*() Peter Xu
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand.  After exporting we can switch a lot
of the call sites to use these two helpers.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         | 10 +++++-----
 include/hw/pci/pci.h |  5 +++++
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1ab2b78321..6b834cace5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1654,9 +1654,9 @@ static const pci_class_desc pci_class_descriptions[] =
     { 0, NULL}
 };
 
-static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-                                                  pci_bus_dev_fn fn,
-                                                  void *opaque)
+void pci_for_each_device_under_bus_reverse(PCIBus *bus,
+                                           pci_bus_dev_fn fn,
+                                           void *opaque)
 {
     PCIDevice *d;
     int devfn;
@@ -1679,8 +1679,8 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
     }
 }
 
-static void pci_for_each_device_under_bus(PCIBus *bus,
-                                          pci_bus_dev_fn fn, void *opaque)
+void pci_for_each_device_under_bus(PCIBus *bus,
+                                   pci_bus_dev_fn fn, void *opaque)
 {
     PCIDevice *d;
     int devfn;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 8e2d80860b..437eabe609 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -465,6 +465,11 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
                                  pci_bus_dev_fn fn,
                                  void *opaque);
+void pci_for_each_device_under_bus(PCIBus *bus,
+                                   pci_bus_dev_fn fn, void *opaque);
+void pci_for_each_device_under_bus_reverse(PCIBus *bus,
+                                           pci_bus_dev_fn fn,
+                                           void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus,
                                   void *(*begin)(PCIBus *bus, void *parent_state),
                                   void (*end)(PCIBus *bus, void *state),
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/8] pci: Use pci_for_each_device_under_bus*()
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
  2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:56   ` David Hildenbrand
  2021-10-21 11:34   ` Eric Auger
  2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

Replace all the call sites of existing pci_for_each_device*() where the bus
number is calculated from a PCIBus* already.  It should avoid the lookup of the
PCIBus again.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/acpi-build.c       |  5 ++---
 hw/pci/pcie.c              |  4 +---
 hw/ppc/spapr_pci.c         | 12 +++++-------
 hw/ppc/spapr_pci_nvlink2.c |  7 +++----
 hw/ppc/spapr_pci_vfio.c    |  4 ++--
 hw/s390x/s390-pci-bus.c    |  5 ++---
 hw/xen/xen_pt.c            |  4 ++--
 7 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 81418b7911..a76b17ed92 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
         PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
         if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
-                                scope_blob);
+            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
         }
     }
 
@@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
         PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
         if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
+            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
         }
     }
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6e95d82903..914a9bf3d1 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
         (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
         (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
         PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
-        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                            pcie_unplug_device, NULL);
-
+        pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
         pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
                                      PCI_EXP_SLTSTA_PDS);
         if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7430bd6314..5bfd4aa9e5 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus *bus,
                           RESOURCE_CELLS_SIZE));
 
     assert(bus);
-    pci_for_each_device_reverse(bus, pci_bus_num(bus),
-                                spapr_dt_pci_device_cb, &cbinfo);
+    pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
     if (cbinfo.err) {
         return cbinfo.err;
     }
@@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        spapr_phb_pci_enumerate_bridge, bus_no);
+    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
+                                  bus_no);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
 }
 
@@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
     PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
     unsigned int bus_no = 0;
 
-    pci_for_each_device(bus, pci_bus_num(bus),
-                        spapr_phb_pci_enumerate_bridge,
-                        &bus_no);
+    pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
+                                  &bus_no);
 
 }
 
diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
index 8ef9b40a18..7fb0cf4d04 100644
--- a/hw/ppc/spapr_pci_nvlink2.c
+++ b/hw/ppc/spapr_pci_nvlink2.c
@@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        spapr_phb_pci_collect_nvgpu, opaque);
+    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_collect_nvgpu, opaque);
 }
 
 void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
@@ -183,8 +182,8 @@ void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
     sphb->nvgpus->nv2_atsd_current = sphb->nv2_atsd_win_addr;
 
     bus = PCI_HOST_BRIDGE(sphb)->bus;
-    pci_for_each_device(bus, pci_bus_num(bus),
-                        spapr_phb_pci_collect_nvgpu, sphb->nvgpus);
+    pci_for_each_device_under_bus(bus, spapr_phb_pci_collect_nvgpu,
+                                  sphb->nvgpus);
 
     if (sphb->nvgpus->err) {
         error_propagate(errp, sphb->nvgpus->err);
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index f3b37df8ea..2a76b4e0b5 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -164,8 +164,8 @@ static void spapr_phb_vfio_eeh_clear_dev_msix(PCIBus *bus,
 
 static void spapr_phb_vfio_eeh_clear_bus_msix(PCIBus *bus, void *opaque)
 {
-       pci_for_each_device(bus, pci_bus_num(bus),
-                           spapr_phb_vfio_eeh_clear_dev_msix, NULL);
+       pci_for_each_device_under_bus(bus, spapr_phb_vfio_eeh_clear_dev_msix,
+                                     NULL);
 }
 
 static void spapr_phb_vfio_eeh_pre_reset(SpaprPhbState *sphb)
diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 6fafffb029..1b51a72838 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1163,8 +1163,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
     }
 
     /* Assign numbers to all child bridges. The last is the highest number. */
-    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-                        s390_pci_enumerate_bridge, s);
+    pci_for_each_device_under_bus(sec_bus, s390_pci_enumerate_bridge, s);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 }
 
@@ -1193,7 +1192,7 @@ static void s390_pcihost_reset(DeviceState *dev)
      * on every system reset, we also have to reassign numbers.
      */
     s->bus_no = 0;
-    pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
+    pci_for_each_device_under_bus(bus, s390_pci_enumerate_bridge, s);
 }
 
 static void s390_pcihost_class_init(ObjectClass *klass, void *data)
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index ca0a98187e..027190fa44 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -615,8 +615,8 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
     }
 
     args.type = d->io_regions[bar].type;
-    pci_for_each_device(pci_get_bus(d), pci_dev_bus_num(d),
-                        xen_pt_check_bar_overlap, &args);
+    pci_for_each_device_under_bus(pci_get_bus(d),
+                                  xen_pt_check_bar_overlap, &args);
     if (args.rc) {
         XEN_PT_WARN(d, "Region: %d (addr: 0x%"FMT_PCIBUS
                     ", len: 0x%"FMT_PCIBUS") is overlapped.\n",
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (2 preceding siblings ...)
  2021-10-21 10:42 ` [PATCH 3/8] pci: Use pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:57   ` David Hildenbrand
                     ` (2 more replies)
  2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
void* pointer.

Use them where proper in pci.[ch], and to be used elsewhere.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         |  6 ++----
 include/hw/pci/pci.h | 12 +++++-------
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6b834cace5..4a84e478ce 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2072,10 +2072,8 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num)
     return NULL;
 }
 
-void pci_for_each_bus_depth_first(PCIBus *bus,
-                                  void *(*begin)(PCIBus *bus, void *parent_state),
-                                  void (*end)(PCIBus *bus, void *state),
-                                  void *parent_state)
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+                                  pci_bus_fn end, void *parent_state)
 {
     PCIBus *sec;
     void *state;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 437eabe609..a7e81f04d3 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -402,6 +402,8 @@ OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
 #define TYPE_PCIE_BUS "PCIE"
 
 typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
+typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
+typedef void* (*pci_bus_ret_fn)(PCIBus *b, void *opaque);
 
 bool pci_bus_is_express(PCIBus *bus);
 
@@ -470,17 +472,13 @@ void pci_for_each_device_under_bus(PCIBus *bus,
 void pci_for_each_device_under_bus_reverse(PCIBus *bus,
                                            pci_bus_dev_fn fn,
                                            void *opaque);
-void pci_for_each_bus_depth_first(PCIBus *bus,
-                                  void *(*begin)(PCIBus *bus, void *parent_state),
-                                  void (*end)(PCIBus *bus, void *state),
-                                  void *parent_state);
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+                                  pci_bus_fn end, void *parent_state);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
 static inline
-void pci_for_each_bus(PCIBus *bus,
-                      void (*fn)(PCIBus *bus, void *opaque),
-                      void *opaque)
+void pci_for_each_bus(PCIBus *bus, pci_bus_fn fn, void *opaque)
 {
     pci_for_each_bus_depth_first(bus, NULL, fn, opaque);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/8] pci: Add pci_for_each_root_bus()
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (3 preceding siblings ...)
  2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 11:00   ` David Hildenbrand
                     ` (2 more replies)
  2021-10-21 10:42 ` [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code Peter Xu
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

Add a helper to loop over each root bus of the system, either the default root
bus or extended buses like pxb-pcie.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         | 26 ++++++++++++++++++++++++++
 include/hw/pci/pci.h |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4a84e478ce..1623bc9099 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
     }
 }
 
+typedef struct {
+    pci_bus_fn fn;
+    void *opaque;
+} pci_root_bus_args;
+
+static int pci_find_root_bus(Object *obj, void *opaque)
+{
+    pci_root_bus_args *args = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+        if (bus) {
+            args->fn(bus, args->opaque);
+        }
+    }
+
+    return 0;
+}
+
+void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
+{
+    pci_root_bus_args args = { .fn = fn, .opaque = opaque };
+
+    object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
+}
 
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index a7e81f04d3..9e490d8969 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
                                            void *opaque);
 void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
                                   pci_bus_fn end, void *parent_state);
+/* Call `fn' for each pci root bus on the system */
+void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (4 preceding siblings ...)
  2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 11:06   ` David Hildenbrand
  2021-10-21 12:28   ` Eric Auger
  2021-10-21 10:42 ` [PATCH 7/8] pci: Add pci_for_each_device_all() Peter Xu
  2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
  7 siblings, 2 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

There're three places that can be rewritten with the pci_for_each_root_bus()
helper that we just introduced.  De-dup the code.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/arm/virt-acpi-build.c | 31 +++++++++++--------------------
 hw/i386/acpi-build.c     | 38 ++++++++++----------------------------
 2 files changed, 21 insertions(+), 48 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6cec97352b..54b82aa863 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -263,28 +263,20 @@ struct AcpiIortIdMapping {
 typedef struct AcpiIortIdMapping AcpiIortIdMapping;
 
 /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
-static int
-iort_host_bridges(Object *obj, void *opaque)
+static void
+iort_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *idmap_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            int min_bus, max_bus;
+    if (!pci_bus_bypass_iommu(bus)) {
+        int min_bus, max_bus;
 
-            pci_bus_range(bus, &min_bus, &max_bus);
+        pci_bus_range(bus, &min_bus, &max_bus);
 
-            AcpiIortIdMapping idmap = {
-                .input_base = min_bus << 8,
-                .id_count = (max_bus - min_bus + 1) << 8,
-            };
-            g_array_append_val(idmap_blob, idmap);
-        }
+        AcpiIortIdMapping idmap = {
+            .input_base = min_bus << 8,
+            .id_count = (max_bus - min_bus + 1) << 8,
+        };
+        g_array_append_val((GArray *)opaque, idmap);
     }
-
-    return 0;
 }
 
 static int iort_idmap_compare(gconstpointer a, gconstpointer b)
@@ -318,8 +310,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
         AcpiIortIdMapping next_range = {0};
 
-        object_child_foreach_recursive(object_get_root(),
-                                       iort_host_bridges, smmu_idmaps);
+        pci_for_each_root_bus(iort_host_bridges, smmu_idmaps);
 
         /* Sort the smmu idmap by input_base */
         g_array_sort(smmu_idmaps, iort_idmap_compare);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a76b17ed92..3e50acfe35 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2123,20 +2123,12 @@ insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For a given PCI host bridge, walk and insert DMAR scope */
-static int
-dmar_host_bridges(Object *obj, void *opaque)
+static void
+dmar_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *scope_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
-        }
+    if (!pci_bus_bypass_iommu(bus)) {
+        pci_for_each_device_under_bus(bus, insert_scope, opaque);
     }
-
-    return 0;
 }
 
 /*
@@ -2165,8 +2157,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
      * Insert scope for each PCI bridge and endpoint device which
      * is attached to a bus with iommu enabled.
      */
-    object_child_foreach_recursive(object_get_root(),
-                                   dmar_host_bridges, scope_blob);
+    pci_for_each_root_bus(dmar_host_bridges, scope_blob);
 
     assert(iommu);
     if (x86_iommu_ir_supported(iommu)) {
@@ -2329,20 +2320,12 @@ insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For all PCI host bridges, walk and insert IVHD entries */
-static int
-ivrs_host_bridges(Object *obj, void *opaque)
+static void
+ivrs_host_bridges(PCIBus *bus, void *opaque)
 {
-    GArray *ivhd_blob = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-        if (bus && !pci_bus_bypass_iommu(bus)) {
-            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
-        }
+    if (!pci_bus_bypass_iommu(bus)) {
+        pci_for_each_device_under_bus(bus, insert_ivhd, opaque);
     }
-
-    return 0;
 }
 
 static void
@@ -2380,8 +2363,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
      * blob further below.  Fall back to an entry covering all devices, which
      * is sufficient when no aliases are present.
      */
-    object_child_foreach_recursive(object_get_root(),
-                                   ivrs_host_bridges, ivhd_blob);
+    pci_for_each_root_bus(ivrs_host_bridges, ivhd_blob);
 
     if (!ivhd_blob->len) {
         /*
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (5 preceding siblings ...)
  2021-10-21 10:42 ` [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:54   ` Michael S. Tsirkin
  2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
  7 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

With all the prepared infrastructures, we can easily add one helper now to loop
over all the existing PCI devices across the whole system.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/pci/pci.c         | 25 +++++++++++++++++++++++++
 include/hw/pci/pci.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1623bc9099..5c970f0727 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2124,6 +2124,31 @@ void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
     object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
 }
 
+typedef struct {
+    pci_bus_dev_fn fn;
+    void *opaque;
+} pci_bus_dev_args;
+
+static void pci_single_bus_hook(PCIBus *bus, void *opaque)
+{
+    pci_bus_dev_args *args = opaque;
+
+    pci_for_each_device_under_bus(bus, args->fn, args->opaque);
+}
+
+static void pci_root_bus_hook(PCIBus *bus, void *opaque)
+{
+    assert(pci_bus_is_root(bus));
+    pci_for_each_bus(bus, pci_single_bus_hook, opaque);
+}
+
+void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque)
+{
+    pci_bus_dev_args args = { .fn = fn, .opaque = opaque };
+
+    pci_for_each_root_bus(pci_root_bus_hook, &args);
+}
+
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
 {
     bus = pci_find_bus_nr(bus, bus_num);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9e490d8969..1a862d1903 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -476,6 +476,8 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
                                   pci_bus_fn end, void *parent_state);
 /* Call `fn' for each pci root bus on the system */
 void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
+/* Call 'fn' for each pci device on the system */
+void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
                   ` (6 preceding siblings ...)
  2021-10-21 10:42 ` [PATCH 7/8] pci: Add pci_for_each_device_all() Peter Xu
@ 2021-10-21 10:42 ` Peter Xu
  2021-10-21 10:49   ` Michael S. Tsirkin
                     ` (2 more replies)
  7 siblings, 3 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-21 10:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, peterx, Shannon Zhao, Alex Williamson,
	Paolo Bonzini, Igor Mammedov, Eric Auger, David Gibson

Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
is realized.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/i386/x86-iommu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 86ad03972e..58abce7edc 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -21,6 +21,7 @@
 #include "hw/sysbus.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/qdev-properties.h"
+#include "hw/vfio/pci.h"
 #include "hw/i386/pc.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
@@ -103,6 +104,16 @@ IommuType x86_iommu_get_type(void)
     return x86_iommu_default->type;
 }
 
+static void x86_iommu_pci_dev_hook(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+    Error **errp = (Error **)opaque;
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_VFIO_PCI)) {
+        error_setg(errp, "Device '%s' must be specified before vIOMMUs",
+                   TYPE_VFIO_PCI);
+    }
+}
+
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
@@ -120,6 +131,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* Make sure there's no special device plugged before vIOMMU */
+    pci_for_each_device_all(x86_iommu_pci_dev_hook, (void *)errp);
+    if (*errp) {
+        return;
+    }
+
     /* If the user didn't specify IR, choose a default value for it */
     if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
         x86_iommu->intr_supported = irq_all_kernel ?
@@ -151,6 +168,7 @@ static Property x86_iommu_properties[] = {
 static void x86_iommu_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
+
     dc->realize = x86_iommu_realize;
     device_class_set_props(dc, x86_iommu_properties);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
@ 2021-10-21 10:49   ` Michael S. Tsirkin
  2021-10-21 12:38   ` Eric Auger
  2021-10-21 22:30   ` Alex Williamson
  2 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2021-10-21 10:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Thu, Oct 21, 2021 at 06:42:59PM +0800, Peter Xu wrote:
> Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> is realized.
> 
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/x86-iommu.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 86ad03972e..58abce7edc 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/qdev-properties.h"
> +#include "hw/vfio/pci.h"
>  #include "hw/i386/pc.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> @@ -103,6 +104,16 @@ IommuType x86_iommu_get_type(void)
>      return x86_iommu_default->type;
>  }
>  
> +static void x86_iommu_pci_dev_hook(PCIBus *bus, PCIDevice *dev, void *opaque)
> +{
> +    Error **errp = (Error **)opaque;
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_VFIO_PCI)) {
> +        error_setg(errp, "Device '%s' must be specified before vIOMMUs",
> +                   TYPE_VFIO_PCI);
> +    }
> +}
> +
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> @@ -120,6 +131,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Make sure there's no special device plugged before vIOMMU */
> +    pci_for_each_device_all(x86_iommu_pci_dev_hook, (void *)errp);

cast not needed here.

> +    if (*errp) {
> +        return;
> +    }
> +
>      /* If the user didn't specify IR, choose a default value for it */
>      if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
>          x86_iommu->intr_supported = irq_all_kernel ?
> @@ -151,6 +168,7 @@ static Property x86_iommu_properties[] = {
>  static void x86_iommu_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> +
>      dc->realize = x86_iommu_realize;
>      device_class_set_props(dc, x86_iommu_properties);
>  }
> -- 
> 2.32.0



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] pci: Define pci_bus_dev_fn type
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
@ 2021-10-21 10:53   ` David Hildenbrand
  2021-10-21 11:15   ` Eric Auger
  2021-10-21 11:36   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 10:53 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> It's used in quite a few places of pci.c and also in the rest of the code base.
> Define such a hook so that it doesn't need to be defined all over the places.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] pci: Export pci_for_each_device_under_bus*()
  2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-21 10:54   ` David Hildenbrand
  2021-10-21 11:32   ` Eric Auger
  2021-10-21 11:39   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 10:54 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> They're actually more commonly used than the helper without _under_bus, because
> most callers do have the pci bus on hand.  After exporting we can switch a lot
> of the call sites to use these two helpers.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 10 +++++-----
>  include/hw/pci/pci.h |  5 +++++
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-21 10:42 ` [PATCH 7/8] pci: Add pci_for_each_device_all() Peter Xu
@ 2021-10-21 10:54   ` Michael S. Tsirkin
  2021-10-22  2:33     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2021-10-21 10:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Thu, Oct 21, 2021 at 06:42:58PM +0800, Peter Xu wrote:
> With all the prepared infrastructures, we can easily add one helper now to loop
> over all the existing PCI devices across the whole system.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 25 +++++++++++++++++++++++++
>  include/hw/pci/pci.h |  2 ++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 1623bc9099..5c970f0727 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2124,6 +2124,31 @@ void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
>      object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
>  }
>  
> +typedef struct {
> +    pci_bus_dev_fn fn;
> +    void *opaque;
> +} pci_bus_dev_args;

code style violation. CamelCase for structs pls.

> +
> +static void pci_single_bus_hook(PCIBus *bus, void *opaque)
> +{
> +    pci_bus_dev_args *args = opaque;
> +
> +    pci_for_each_device_under_bus(bus, args->fn, args->opaque);
> +}
> +
> +static void pci_root_bus_hook(PCIBus *bus, void *opaque)
> +{
> +    assert(pci_bus_is_root(bus));
> +    pci_for_each_bus(bus, pci_single_bus_hook, opaque);
> +}
> +
> +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque)
> +{
> +    pci_bus_dev_args args = { .fn = fn, .opaque = opaque };
> +
> +    pci_for_each_root_bus(pci_root_bus_hook, &args);
> +}
> +
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 9e490d8969..1a862d1903 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -476,6 +476,8 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
>  /* Call `fn' for each pci root bus on the system */
>  void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
> +/* Call 'fn' for each pci device on the system */
> +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);

Instead of hacking pci making initialization o(N^2),
can't we add a variant of object_resolve_path_type ?


>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
> -- 
> 2.32.0



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] pci: Use pci_for_each_device_under_bus*()
  2021-10-21 10:42 ` [PATCH 3/8] pci: Use pci_for_each_device_under_bus*() Peter Xu
@ 2021-10-21 10:56   ` David Hildenbrand
  2021-10-21 11:34   ` Eric Auger
  1 sibling, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 10:56 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> Replace all the call sites of existing pci_for_each_device*() where the bus
> number is calculated from a PCIBus* already.  It should avoid the lookup of the
> PCIBus again.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/acpi-build.c       |  5 ++---
>  hw/pci/pcie.c              |  4 +---
>  hw/ppc/spapr_pci.c         | 12 +++++-------
>  hw/ppc/spapr_pci_nvlink2.c |  7 +++----
>  hw/ppc/spapr_pci_vfio.c    |  4 ++--
>  hw/s390x/s390-pci-bus.c    |  5 ++---
>  hw/xen/xen_pt.c            |  4 ++--
>  7 files changed, 17 insertions(+), 24 deletions(-)
> 

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
@ 2021-10-21 10:57   ` David Hildenbrand
  2021-10-21 11:37   ` Eric Auger
  2021-10-21 11:44   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 10:57 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
> opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
> void* pointer.
> 
> Use them where proper in pci.[ch], and to be used elsewhere.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

But should we just have squashed that into patch #1?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] pci: Add pci_for_each_root_bus()
  2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
@ 2021-10-21 11:00   ` David Hildenbrand
  2021-10-21 12:22   ` Eric Auger
  2021-10-25 13:16   ` Michael S. Tsirkin
  2 siblings, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 11:00 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> Add a helper to loop over each root bus of the system, either the default root
> bus or extended buses like pxb-pcie.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 26 ++++++++++++++++++++++++++
>  include/hw/pci/pci.h |  2 ++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4a84e478ce..1623bc9099 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>      }
>  }
>  
> +typedef struct {
> +    pci_bus_fn fn;
> +    void *opaque;
> +} pci_root_bus_args;
> +

PCIRootBusArgs ?

Or maybe

PCIForEachRootBusInfo

... but it gets lengthy .. ;)

> +static int pci_find_root_bus(Object *obj, void *opaque)
> +{
> +    pci_root_bus_args *args = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +        if (bus) {
> +            args->fn(bus, args->opaque);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> +{
> +    pci_root_bus_args args = { .fn = fn, .opaque = opaque };
> +
> +    object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
> +}
>  
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
>  {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index a7e81f04d3..9e490d8969 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>                                             void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
> +/* Call `fn' for each pci root bus on the system */
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
> 

Apart from that LGTM, but not a PCI expert.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code
  2021-10-21 10:42 ` [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code Peter Xu
@ 2021-10-21 11:06   ` David Hildenbrand
  2021-10-21 12:28   ` Eric Auger
  1 sibling, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2021-10-21 11:06 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, Shannon Zhao,
	Alex Williamson, Paolo Bonzini, Igor Mammedov, Eric Auger,
	David Gibson

On 21.10.21 12:42, Peter Xu wrote:
> There're three places that can be rewritten with the pci_for_each_root_bus()
> helper that we just introduced.  De-dup the code.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/arm/virt-acpi-build.c | 31 +++++++++++--------------------
>  hw/i386/acpi-build.c     | 38 ++++++++++----------------------------
>  2 files changed, 21 insertions(+), 48 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 6cec97352b..54b82aa863 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -263,28 +263,20 @@ struct AcpiIortIdMapping {
>  typedef struct AcpiIortIdMapping AcpiIortIdMapping;
>  
>  /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
> -static int
> -iort_host_bridges(Object *obj, void *opaque)
> +static void
> +iort_host_bridges(PCIBus *bus, void *opaque)

^ I never understood the use of the line break within a function
declaration+definition, especially if everything just fits into a single
line. If it's already there, it's most probably in-line with the QEMU
coding style.

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] pci: Define pci_bus_dev_fn type
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
  2021-10-21 10:53   ` David Hildenbrand
@ 2021-10-21 11:15   ` Eric Auger
  2021-10-22  2:16     ` Peter Xu
  2021-10-21 11:36   ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 44+ messages in thread
From: Eric Auger @ 2021-10-21 11:15 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

Hi Peter,

On 10/21/21 12:42 PM, Peter Xu wrote:
> It's used in quite a few places of pci.c and also in the rest of the code base.
> Define such a hook so that it doesn't need to be defined all over the places.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 14 ++++----------
>  include/hw/pci/pci.h |  7 ++++---
>  2 files changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 186758ee11..1ab2b78321 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1655,9 +1655,7 @@ static const pci_class_desc pci_class_descriptions[] =
>  };
>  
>  static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> -                                                  void (*fn)(PCIBus *b,
> -                                                             PCIDevice *d,
> -                                                             void *opaque),
> +                                                  pci_bus_dev_fn fn,
>                                                    void *opaque)
>  {
>      PCIDevice *d;
> @@ -1672,8 +1670,7 @@ static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>  }
>  
>  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> -                         void *opaque)
> +                                 pci_bus_dev_fn fn, void *opaque)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
>  
> @@ -1683,9 +1680,7 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
>  }
>  
>  static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          void (*fn)(PCIBus *b, PCIDevice *d,
> -                                                     void *opaque),
> -                                          void *opaque)
> +                                          pci_bus_dev_fn fn, void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1699,8 +1694,7 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
>  }
>  
>  void pci_for_each_device(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> -                         void *opaque)
> +                         pci_bus_dev_fn fn, void *opaque)
>  {
>      bus = pci_find_bus_nr(bus, bus_num);
>  
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 7fc90132cf..8e2d80860b 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -401,6 +401,8 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
>  OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
>  #define TYPE_PCIE_BUS "PCIE"
>  
> +typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
> +
>  bool pci_bus_is_express(PCIBus *bus);
>  
>  void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
> @@ -458,11 +460,10 @@ static inline int pci_dev_bus_num(const PCIDevice *dev)
>  
>  int pci_bus_numa_node(PCIBus *bus);
>  void pci_for_each_device(PCIBus *bus, int bus_num,
> -                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
> +                         pci_bus_dev_fn fn,
>                           void *opaque);
>  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> -                                 void (*fn)(PCIBus *bus, PCIDevice *d,
> -                                            void *opaque),
> +                                 pci_bus_dev_fn fn,
>                                   void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus,
>                                    void *(*begin)(PCIBus *bus, void *parent_state),
There is another candidate in
hw/ppc/pegasos2.c:    void (*dtf)(PCIBus *bus, PCIDevice *d, FDTInfo *fi);
but this may be coverted later by the maintainer of this file.

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] pci: Export pci_for_each_device_under_bus*()
  2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
  2021-10-21 10:54   ` David Hildenbrand
@ 2021-10-21 11:32   ` Eric Auger
  2021-10-21 11:39   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: Eric Auger @ 2021-10-21 11:32 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson



On 10/21/21 12:42 PM, Peter Xu wrote:
> They're actually more commonly used than the helper without _under_bus, because
> most callers do have the pci bus on hand.  After exporting we can switch a lot
> of the call sites to use these two helpers.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 10 +++++-----
>  include/hw/pci/pci.h |  5 +++++
>  2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 1ab2b78321..6b834cace5 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1654,9 +1654,9 @@ static const pci_class_desc pci_class_descriptions[] =
>      { 0, NULL}
>  };
>  
> -static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> -                                                  pci_bus_dev_fn fn,
> -                                                  void *opaque)
> +void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> +                                           pci_bus_dev_fn fn,
> +                                           void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> @@ -1679,8 +1679,8 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
>      }
>  }
>  
> -static void pci_for_each_device_under_bus(PCIBus *bus,
> -                                          pci_bus_dev_fn fn, void *opaque)
> +void pci_for_each_device_under_bus(PCIBus *bus,
> +                                   pci_bus_dev_fn fn, void *opaque)
>  {
>      PCIDevice *d;
>      int devfn;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 8e2d80860b..437eabe609 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -465,6 +465,11 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
>  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
>                                   pci_bus_dev_fn fn,
>                                   void *opaque);
> +void pci_for_each_device_under_bus(PCIBus *bus,
> +                                   pci_bus_dev_fn fn, void *opaque);
> +void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> +                                           pci_bus_dev_fn fn,
> +                                           void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus,
>                                    void *(*begin)(PCIBus *bus, void *parent_state),
>                                    void (*end)(PCIBus *bus, void *state),
Reviewed-by: Eric Auger <eric.auger@redhat.com>


Eric



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] pci: Use pci_for_each_device_under_bus*()
  2021-10-21 10:42 ` [PATCH 3/8] pci: Use pci_for_each_device_under_bus*() Peter Xu
  2021-10-21 10:56   ` David Hildenbrand
@ 2021-10-21 11:34   ` Eric Auger
  2021-10-22  2:19     ` Peter Xu
  1 sibling, 1 reply; 44+ messages in thread
From: Eric Auger @ 2021-10-21 11:34 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

Hi Peter,
On 10/21/21 12:42 PM, Peter Xu wrote:
> Replace all the call sites of existing pci_for_each_device*() where the bus
> number is calculated from a PCIBus* already.  It should avoid the lookup of the
> PCIBus again.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/acpi-build.c       |  5 ++---
>  hw/pci/pcie.c              |  4 +---
>  hw/ppc/spapr_pci.c         | 12 +++++-------
>  hw/ppc/spapr_pci_nvlink2.c |  7 +++----
>  hw/ppc/spapr_pci_vfio.c    |  4 ++--
>  hw/s390x/s390-pci-bus.c    |  5 ++---
>  hw/xen/xen_pt.c            |  4 ++--
>  7 files changed, 17 insertions(+), 24 deletions(-)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 81418b7911..a76b17ed92 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
>          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
>  
>          if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
> -                                scope_blob);
> +            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
>          }
>      }
>  
> @@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
>          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
>  
>          if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
> +            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
>          }
>      }
>  
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 6e95d82903..914a9bf3d1 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
>          (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
>          (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
>          PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
> -        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                            pcie_unplug_device, NULL);
> -
> +        pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
>          pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
>                                       PCI_EXP_SLTSTA_PDS);
>          if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7430bd6314..5bfd4aa9e5 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus *bus,
>                            RESOURCE_CELLS_SIZE));
>  
>      assert(bus);
> -    pci_for_each_device_reverse(bus, pci_bus_num(bus),
> -                                spapr_dt_pci_device_cb, &cbinfo);
> +    pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
>      if (cbinfo.err) {
>          return cbinfo.err;
>      }
> @@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
>          return;
>      }
>  
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        spapr_phb_pci_enumerate_bridge, bus_no);
> +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
> +                                  bus_no);
>      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
>  }
>  
> @@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
>      PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
>      unsigned int bus_no = 0;
>  
> -    pci_for_each_device(bus, pci_bus_num(bus),
> -                        spapr_phb_pci_enumerate_bridge,
> -                        &bus_no);
> +    pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
> +                                  &bus_no);
>  
>  }
>  
> diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
> index 8ef9b40a18..7fb0cf4d04 100644
> --- a/hw/ppc/spapr_pci_nvlink2.c
> +++ b/hw/ppc/spapr_pci_nvlink2.c
> @@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, PCIDevice *pdev,
>          return;
>      }
>  
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        spapr_phb_pci_collect_nvgpu, opaque);
> +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_collect_nvgpu, opaque);
>  }
>  
>  void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
> @@ -183,8 +182,8 @@ void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
>      sphb->nvgpus->nv2_atsd_current = sphb->nv2_atsd_win_addr;
>  
>      bus = PCI_HOST_BRIDGE(sphb)->bus;
> -    pci_for_each_device(bus, pci_bus_num(bus),
> -                        spapr_phb_pci_collect_nvgpu, sphb->nvgpus);
> +    pci_for_each_device_under_bus(bus, spapr_phb_pci_collect_nvgpu,
> +                                  sphb->nvgpus);
>  
>      if (sphb->nvgpus->err) {
>          error_propagate(errp, sphb->nvgpus->err);
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index f3b37df8ea..2a76b4e0b5 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -164,8 +164,8 @@ static void spapr_phb_vfio_eeh_clear_dev_msix(PCIBus *bus,
>  
>  static void spapr_phb_vfio_eeh_clear_bus_msix(PCIBus *bus, void *opaque)
>  {
> -       pci_for_each_device(bus, pci_bus_num(bus),
> -                           spapr_phb_vfio_eeh_clear_dev_msix, NULL);
> +       pci_for_each_device_under_bus(bus, spapr_phb_vfio_eeh_clear_dev_msix,
> +                                     NULL);
>  }
>  
>  static void spapr_phb_vfio_eeh_pre_reset(SpaprPhbState *sphb)
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index 6fafffb029..1b51a72838 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -1163,8 +1163,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
>      }
>  
>      /* Assign numbers to all child bridges. The last is the highest number. */
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        s390_pci_enumerate_bridge, s);
> +    pci_for_each_device_under_bus(sec_bus, s390_pci_enumerate_bridge, s);
>      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
>  }
>  
> @@ -1193,7 +1192,7 @@ static void s390_pcihost_reset(DeviceState *dev)
>       * on every system reset, we also have to reassign numbers.
>       */
>      s->bus_no = 0;
> -    pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
> +    pci_for_each_device_under_bus(bus, s390_pci_enumerate_bridge, s);
>  }
>  
>  static void s390_pcihost_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index ca0a98187e..027190fa44 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -615,8 +615,8 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
>      }
>  
>      args.type = d->io_regions[bar].type;
> -    pci_for_each_device(pci_get_bus(d), pci_dev_bus_num(d),
> -                        xen_pt_check_bar_overlap, &args);
> +    pci_for_each_device_under_bus(pci_get_bus(d),
> +                                  xen_pt_check_bar_overlap, &args);
>      if (args.rc) {
>          XEN_PT_WARN(d, "Region: %d (addr: 0x%"FMT_PCIBUS
>                      ", len: 0x%"FMT_PCIBUS") is overlapped.\n",
Maybe squash with the previous one?

Besides
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] pci: Define pci_bus_dev_fn type
  2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
  2021-10-21 10:53   ` David Hildenbrand
  2021-10-21 11:15   ` Eric Auger
@ 2021-10-21 11:36   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-21 11:36 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Eric Auger,
	Igor Mammedov, Paolo Bonzini, David Gibson

On 10/21/21 12:42, Peter Xu wrote:
> It's used in quite a few places of pci.c and also in the rest of the code base.
> Define such a hook so that it doesn't need to be defined all over the places.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 14 ++++----------
>  include/hw/pci/pci.h |  7 ++++---
>  2 files changed, 8 insertions(+), 13 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
  2021-10-21 10:57   ` David Hildenbrand
@ 2021-10-21 11:37   ` Eric Auger
  2021-10-21 11:44   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: Eric Auger @ 2021-10-21 11:37 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson



On 10/21/21 12:42 PM, Peter Xu wrote:
> The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
> opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
> void* pointer.
>
> Use them where proper in pci.[ch], and to be used elsewhere.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/pci/pci.c         |  6 ++----
>  include/hw/pci/pci.h | 12 +++++-------
>  2 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 6b834cace5..4a84e478ce 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2072,10 +2072,8 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num)
>      return NULL;
>  }
>  
> -void pci_for_each_bus_depth_first(PCIBus *bus,
> -                                  void *(*begin)(PCIBus *bus, void *parent_state),
> -                                  void (*end)(PCIBus *bus, void *state),
> -                                  void *parent_state)
> +void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
> +                                  pci_bus_fn end, void *parent_state)
>  {
>      PCIBus *sec;
>      void *state;
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 437eabe609..a7e81f04d3 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -402,6 +402,8 @@ OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
>  #define TYPE_PCIE_BUS "PCIE"
>  
>  typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
> +typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
> +typedef void* (*pci_bus_ret_fn)(PCIBus *b, void *opaque);
>  
>  bool pci_bus_is_express(PCIBus *bus);
>  
> @@ -470,17 +472,13 @@ void pci_for_each_device_under_bus(PCIBus *bus,
>  void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>                                             pci_bus_dev_fn fn,
>                                             void *opaque);
> -void pci_for_each_bus_depth_first(PCIBus *bus,
> -                                  void *(*begin)(PCIBus *bus, void *parent_state),
> -                                  void (*end)(PCIBus *bus, void *state),
> -                                  void *parent_state);
> +void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
> +                                  pci_bus_fn end, void *parent_state);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
>  static inline
> -void pci_for_each_bus(PCIBus *bus,
> -                      void (*fn)(PCIBus *bus, void *opaque),
> -                      void *opaque)
> +void pci_for_each_bus(PCIBus *bus, pci_bus_fn fn, void *opaque)
>  {
>      pci_for_each_bus_depth_first(bus, NULL, fn, opaque);
>  }



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/8] pci: Export pci_for_each_device_under_bus*()
  2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
  2021-10-21 10:54   ` David Hildenbrand
  2021-10-21 11:32   ` Eric Auger
@ 2021-10-21 11:39   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-21 11:39 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Eric Auger,
	Igor Mammedov, Paolo Bonzini, David Gibson

On 10/21/21 12:42, Peter Xu wrote:
> They're actually more commonly used than the helper without _under_bus, because
> most callers do have the pci bus on hand.  After exporting we can switch a lot
> of the call sites to use these two helpers.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 10 +++++-----
>  include/hw/pci/pci.h |  5 +++++
>  2 files changed, 10 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
  2021-10-21 10:57   ` David Hildenbrand
  2021-10-21 11:37   ` Eric Auger
@ 2021-10-21 11:44   ` Philippe Mathieu-Daudé
  2021-10-21 12:54     ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-21 11:44 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Eric Auger,
	Igor Mammedov, Paolo Bonzini, David Gibson

On 10/21/21 12:42, Peter Xu wrote:
> The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
> opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
> void* pointer.
> 
> Use them where proper in pci.[ch], and to be used elsewhere.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         |  6 ++----
>  include/hw/pci/pci.h | 12 +++++-------
>  2 files changed, 7 insertions(+), 11 deletions(-)

> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -402,6 +402,8 @@ OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
>  #define TYPE_PCIE_BUS "PCIE"
>  
>  typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
> +typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
> +typedef void* (*pci_bus_ret_fn)(PCIBus *b, void *opaque);

$ git grep -F '* (*'|wc -l
12
$ git grep -F ' *(*'|wc -l
88

Nitpicking but I'd rather follow the project typedef style, otherwise:

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] pci: Add pci_for_each_root_bus()
  2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
  2021-10-21 11:00   ` David Hildenbrand
@ 2021-10-21 12:22   ` Eric Auger
  2021-10-25 13:16   ` Michael S. Tsirkin
  2 siblings, 0 replies; 44+ messages in thread
From: Eric Auger @ 2021-10-21 12:22 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

Hi Peter,

On 10/21/21 12:42 PM, Peter Xu wrote:
> Add a helper to loop over each root bus of the system, either the default root
> bus or extended buses like pxb-pcie.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 26 ++++++++++++++++++++++++++
>  include/hw/pci/pci.h |  2 ++
>  2 files changed, 28 insertions(+)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4a84e478ce..1623bc9099 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>      }
>  }
>  
> +typedef struct {
> +    pci_bus_fn fn;
> +    void *opaque;
> +} pci_root_bus_args;
> +
> +static int pci_find_root_bus(Object *obj, void *opaque)
> +{
> +    pci_root_bus_args *args = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +        if (bus) {
> +            args->fn(bus, args->opaque);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> +{
> +    pci_root_bus_args args = { .fn = fn, .opaque = opaque };
> +
> +    object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
> +}
>  
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
>  {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index a7e81f04d3..9e490d8969 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>                                             void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
> +/* Call `fn' for each pci root bus on the system */
nit: @fn ?
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
with the  pci_root_bus_args struct type renamed as David suggested
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code
  2021-10-21 10:42 ` [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code Peter Xu
  2021-10-21 11:06   ` David Hildenbrand
@ 2021-10-21 12:28   ` Eric Auger
  1 sibling, 0 replies; 44+ messages in thread
From: Eric Auger @ 2021-10-21 12:28 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

Hi Peter,

On 10/21/21 12:42 PM, Peter Xu wrote:
> There're three places that can be rewritten with the pci_for_each_root_bus()
> helper that we just introduced.  De-dup the code.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/arm/virt-acpi-build.c | 31 +++++++++++--------------------
>  hw/i386/acpi-build.c     | 38 ++++++++++----------------------------
>  2 files changed, 21 insertions(+), 48 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 6cec97352b..54b82aa863 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -263,28 +263,20 @@ struct AcpiIortIdMapping {
>  typedef struct AcpiIortIdMapping AcpiIortIdMapping;
>  
>  /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
> -static int
> -iort_host_bridges(Object *obj, void *opaque)
> +static void
> +iort_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *idmap_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            int min_bus, max_bus;
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        int min_bus, max_bus;
>  
> -            pci_bus_range(bus, &min_bus, &max_bus);
> +        pci_bus_range(bus, &min_bus, &max_bus);
>  
> -            AcpiIortIdMapping idmap = {
> -                .input_base = min_bus << 8,
> -                .id_count = (max_bus - min_bus + 1) << 8,
> -            };
> -            g_array_append_val(idmap_blob, idmap);
> -        }
> +        AcpiIortIdMapping idmap = {
> +            .input_base = min_bus << 8,
> +            .id_count = (max_bus - min_bus + 1) << 8,
> +        };
> +        g_array_append_val((GArray *)opaque, idmap);
>      }
> -
> -    return 0;
>  }
>  
>  static int iort_idmap_compare(gconstpointer a, gconstpointer b)
> @@ -318,8 +310,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      if (vms->iommu == VIRT_IOMMU_SMMUV3) {
>          AcpiIortIdMapping next_range = {0};
>  
> -        object_child_foreach_recursive(object_get_root(),
> -                                       iort_host_bridges, smmu_idmaps);
> +        pci_for_each_root_bus(iort_host_bridges, smmu_idmaps);
>  
>          /* Sort the smmu idmap by input_base */
>          g_array_sort(smmu_idmaps, iort_idmap_compare);
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index a76b17ed92..3e50acfe35 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2123,20 +2123,12 @@ insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
>  }
>  
>  /* For a given PCI host bridge, walk and insert DMAR scope */
> -static int
> -dmar_host_bridges(Object *obj, void *opaque)
> +static void
> +dmar_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *scope_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
> -        }
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        pci_for_each_device_under_bus(bus, insert_scope, opaque);
>      }
> -
> -    return 0;
>  }
>  
>  /*
> @@ -2165,8 +2157,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>       * Insert scope for each PCI bridge and endpoint device which
>       * is attached to a bus with iommu enabled.
>       */
> -    object_child_foreach_recursive(object_get_root(),
> -                                   dmar_host_bridges, scope_blob);
> +    pci_for_each_root_bus(dmar_host_bridges, scope_blob);
>  
>      assert(iommu);
>      if (x86_iommu_ir_supported(iommu)) {
> @@ -2329,20 +2320,12 @@ insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
>  }
>  
>  /* For all PCI host bridges, walk and insert IVHD entries */
> -static int
> -ivrs_host_bridges(Object *obj, void *opaque)
> +static void
> +ivrs_host_bridges(PCIBus *bus, void *opaque)
>  {
> -    GArray *ivhd_blob = opaque;
> -
> -    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> -        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> -
> -        if (bus && !pci_bus_bypass_iommu(bus)) {
> -            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
> -        }
> +    if (!pci_bus_bypass_iommu(bus)) {
> +        pci_for_each_device_under_bus(bus, insert_ivhd, opaque);
>      }
> -
> -    return 0;
>  }
>  
>  static void
> @@ -2380,8 +2363,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>       * blob further below.  Fall back to an entry covering all devices, which
>       * is sufficient when no aliases are present.
>       */
> -    object_child_foreach_recursive(object_get_root(),
> -                                   ivrs_host_bridges, ivhd_blob);
> +    pci_for_each_root_bus(ivrs_host_bridges, ivhd_blob);
>  
>      if (!ivhd_blob->len) {
>          /*



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
  2021-10-21 10:49   ` Michael S. Tsirkin
@ 2021-10-21 12:38   ` Eric Auger
  2021-10-22  2:37     ` Peter Xu
  2021-10-21 22:30   ` Alex Williamson
  2 siblings, 1 reply; 44+ messages in thread
From: Eric Auger @ 2021-10-21 12:38 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

Hi Peter,

On 10/21/21 12:42 PM, Peter Xu wrote:
> Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> is realized.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/x86-iommu.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 86ad03972e..58abce7edc 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/qdev-properties.h"
> +#include "hw/vfio/pci.h"
>  #include "hw/i386/pc.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> @@ -103,6 +104,16 @@ IommuType x86_iommu_get_type(void)
>      return x86_iommu_default->type;
>  }
>  
> +static void x86_iommu_pci_dev_hook(PCIBus *bus, PCIDevice *dev, void *opaque)
> +{
> +    Error **errp = (Error **)opaque;
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_VFIO_PCI)) {
> +        error_setg(errp, "Device '%s' must be specified before vIOMMUs",
> +                   TYPE_VFIO_PCI);
if there are several VFIO-PCI devices set before the IOMMU, errp may be
overriden
as we do not exit the loop as soon as there is an error I think

Eric
> +    }
> +}
> +
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> @@ -120,6 +131,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Make sure there's no special device plugged before vIOMMU */
> +    pci_for_each_device_all(x86_iommu_pci_dev_hook, (void *)errp);
> +    if (*errp) {
> +        return;
> +    }
> +
>      /* If the user didn't specify IR, choose a default value for it */
>      if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
>          x86_iommu->intr_supported = irq_all_kernel ?
> @@ -151,6 +168,7 @@ static Property x86_iommu_properties[] = {
>  static void x86_iommu_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> +
>      dc->realize = x86_iommu_realize;
>      device_class_set_props(dc, x86_iommu_properties);
>  }



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 11:44   ` Philippe Mathieu-Daudé
@ 2021-10-21 12:54     ` Philippe Mathieu-Daudé
  2021-10-22  2:24       ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-21 12:54 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	David Hildenbrand, Jason Wang, Michael S . Tsirkin,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Eric Auger,
	Igor Mammedov, Paolo Bonzini, David Gibson

On 10/21/21 13:44, Philippe Mathieu-Daudé wrote:
> On 10/21/21 12:42, Peter Xu wrote:
>> The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
>> opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
>> void* pointer.
>>
>> Use them where proper in pci.[ch], and to be used elsewhere.
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>  hw/pci/pci.c         |  6 ++----
>>  include/hw/pci/pci.h | 12 +++++-------
>>  2 files changed, 7 insertions(+), 11 deletions(-)
> 
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -402,6 +402,8 @@ OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
>>  #define TYPE_PCIE_BUS "PCIE"
>>  
>>  typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
>> +typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
>> +typedef void* (*pci_bus_ret_fn)(PCIBus *b, void *opaque);

Now looking at patch #8, I wonder if it wouldn't be cleaner to have
a single:

/*
 * pci_bus_fn:
 * @bus: PCI bus
 * @opaque: Pointer to opaque pointer,
            can be used as input *and* output
 * Return #true on success, #false on failure
 */
typedef bool (*pci_bus_fn)(PCIBus *bus, void **opaque);



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
  2021-10-21 10:49   ` Michael S. Tsirkin
  2021-10-21 12:38   ` Eric Auger
@ 2021-10-21 22:30   ` Alex Williamson
  2021-10-22  2:14     ` Peter Xu
  2 siblings, 1 reply; 44+ messages in thread
From: Alex Williamson @ 2021-10-21 22:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Paolo Bonzini, Igor Mammedov,
	Eric Auger, David Gibson

On Thu, 21 Oct 2021 18:42:59 +0800
Peter Xu <peterx@redhat.com> wrote:

> Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> is realized.

Sorry, I'm not onboard with this solution at all.

It would be really useful though if this commit log or a code comment
described exactly the incompatibility for which vfio-pci devices are
being called out here.  Otherwise I see this as a bit of magic voodoo
that gets lost in lore and copied elsewhere and we're constantly trying
to figure out specific incompatibilities when vfio-pci devices are
trying really hard to be "just another device".

I infer from the link of the previous alternate solution that this is
to do with the fact that vfio devices attach a memory listener to the
device address space.  Interestingly that previous cover letter also
discusses how vdpa devices might have a similar issue, which makes it
confusing again that we're calling out vfio-pci devices by name rather
than for a behavior.

If the behavior here is that vfio-pci devices attach a listener to the
device address space, then that provides a couple possible options.  We
could look for devices that have recorded an interest in their address
space, such as by setting a flag on PCIDevice when someone calls
pci_device_iommu_address_space(), where we could walk all devices using
the code in this series to find a device with such a flag.

Another option might be to attach owner objects to memory listeners,
walk all the listeners on the system address space (that the vIOMMU is
about to replace for some devices) and evaluate the owner objects
against TYPE_PCI_DEVICE.

Please lets not just call out arbitrary devices, especially not without
a thorough explanation of the incompatibility.  Thanks,

Alex


> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/i386/x86-iommu.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 86ad03972e..58abce7edc 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,7 @@
>  #include "hw/sysbus.h"
>  #include "hw/i386/x86-iommu.h"
>  #include "hw/qdev-properties.h"
> +#include "hw/vfio/pci.h"
>  #include "hw/i386/pc.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> @@ -103,6 +104,16 @@ IommuType x86_iommu_get_type(void)
>      return x86_iommu_default->type;
>  }
>  
> +static void x86_iommu_pci_dev_hook(PCIBus *bus, PCIDevice *dev, void *opaque)
> +{
> +    Error **errp = (Error **)opaque;
> +
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_VFIO_PCI)) {
> +        error_setg(errp, "Device '%s' must be specified before vIOMMUs",
> +                   TYPE_VFIO_PCI);
> +    }
> +}
> +
>  static void x86_iommu_realize(DeviceState *dev, Error **errp)
>  {
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> @@ -120,6 +131,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Make sure there's no special device plugged before vIOMMU */
> +    pci_for_each_device_all(x86_iommu_pci_dev_hook, (void *)errp);
> +    if (*errp) {
> +        return;
> +    }
> +
>      /* If the user didn't specify IR, choose a default value for it */
>      if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
>          x86_iommu->intr_supported = irq_all_kernel ?
> @@ -151,6 +168,7 @@ static Property x86_iommu_properties[] = {
>  static void x86_iommu_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> +
>      dc->realize = x86_iommu_realize;
>      device_class_set_props(dc, x86_iommu_properties);
>  }



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 22:30   ` Alex Williamson
@ 2021-10-22  2:14     ` Peter Xu
  2021-10-26 15:11       ` Igor Mammedov
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Paolo Bonzini, Igor Mammedov,
	Eric Auger, David Gibson

Hi, Alex,

On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> On Thu, 21 Oct 2021 18:42:59 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > is realized.
> 
> Sorry, I'm not onboard with this solution at all.
> 
> It would be really useful though if this commit log or a code comment
> described exactly the incompatibility for which vfio-pci devices are
> being called out here.  Otherwise I see this as a bit of magic voodoo
> that gets lost in lore and copied elsewhere and we're constantly trying
> to figure out specific incompatibilities when vfio-pci devices are
> trying really hard to be "just another device".

Sure, I can enrich the commit message.

> 
> I infer from the link of the previous alternate solution that this is
> to do with the fact that vfio devices attach a memory listener to the
> device address space.

IMHO it's not about the memory listeners, I think that' after vfio detected
some vIOMMU memory regions already, which must be based on an vIOMMU address
space being available.  I think the problem is that when realize() vfio-pci we
fetch the dma address space specifically for getting the vfio group, while that
could happen too early, even before vIOMMU is created.

> Interestingly that previous cover letter also discusses how vdpa devices
> might have a similar issue, which makes it confusing again that we're calling
> out vfio-pci devices by name rather than for a behavior.

Yes I'll need to see whether this approach will be accepted first.  I think
similar thing could help VDPA but it's not required there because VDPA has
already worked around using pci_device_iommu_address_space().  So potentially
the only one to "fix" is the vfio-pci device using along with vIOMMU, when the
device ordering is specified in the wrong order.  I'll leave the VDPA problem
to Jason to see whether he prefers keeping current code, or switch to a simpler
one.  That should be after this one.

> 
> If the behavior here is that vfio-pci devices attach a listener to the
> device address space, then that provides a couple possible options.  We
> could look for devices that have recorded an interest in their address
> space, such as by setting a flag on PCIDevice when someone calls
> pci_device_iommu_address_space(), where we could walk all devices using
> the code in this series to find a device with such a flag.

Right, we can set a flag for all the pci devices that needs to consolidate
pci_device_iommu_address_space() result, however then it'll be vfio-pci only so
far.  Btw, I actually proposed similar things two months ago, and I think Igor
showed concern on that flag being vague on meaning:

https://lore.kernel.org/qemu-devel/20210906104915.7dd5c934@redhat.com/

  > > Does it need to be a pre_plug hook?  I thought we might just need a flag in the
  > > pci device classes showing that it should be after vIOMMUs, then in vIOMMU
  > > realize functions we walk pci bus to make sure no such device exist?
  > > 
  > > We could have a base vIOMMU class, then that could be in the realize() of the
  > > common class.
  > 
  > We basically don't know if device needs IOMMU or not and can work
  > with/without it just fine. In this case I'd think about IOMMU as board
  > feature that morphs PCI buses (some of them) (address space, bus numers, ...).
  > So I don't perceive any iommu flag as a device property at all.
  > 
  > As for realize vs pre_plug, the later is the part of abstract realize
  > (see: device_set_realized) and is already used by some PCI infrastructure:
  >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug

I still think that flag will work, that flag should only shows "whether this
device needs to be specified earlier than vIOMMU", but I can get the point from
Igor that it's at least confusing on what does the flag mean.  Meanwhile I
don't think that flag will be required, as this is not the first time we name a
special device in the code, e.g. pc_machine_device_pre_plug_cb().
intel_iommu.c has it too upon vfio-pci already on making sure caching-mode=on
in vtd_machine_done_notify_one().

If Igor is okay with adding such a flag for PCIDevice class, I can do that in
the new version.  I don't have a strong opinion on this.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/8] pci: Define pci_bus_dev_fn type
  2021-10-21 11:15   ` Eric Auger
@ 2021-10-22  2:16     ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:16 UTC (permalink / raw)
  To: Eric Auger
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

On Thu, Oct 21, 2021 at 01:15:03PM +0200, Eric Auger wrote:
> Hi Peter,
> 
> On 10/21/21 12:42 PM, Peter Xu wrote:
> > It's used in quite a few places of pci.c and also in the rest of the code base.
> > Define such a hook so that it doesn't need to be defined all over the places.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/pci/pci.c         | 14 ++++----------
> >  include/hw/pci/pci.h |  7 ++++---
> >  2 files changed, 8 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 186758ee11..1ab2b78321 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -1655,9 +1655,7 @@ static const pci_class_desc pci_class_descriptions[] =
> >  };
> >  
> >  static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> > -                                                  void (*fn)(PCIBus *b,
> > -                                                             PCIDevice *d,
> > -                                                             void *opaque),
> > +                                                  pci_bus_dev_fn fn,
> >                                                    void *opaque)
> >  {
> >      PCIDevice *d;
> > @@ -1672,8 +1670,7 @@ static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> >  }
> >  
> >  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> > -                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> > -                         void *opaque)
> > +                                 pci_bus_dev_fn fn, void *opaque)
> >  {
> >      bus = pci_find_bus_nr(bus, bus_num);
> >  
> > @@ -1683,9 +1680,7 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> >  }
> >  
> >  static void pci_for_each_device_under_bus(PCIBus *bus,
> > -                                          void (*fn)(PCIBus *b, PCIDevice *d,
> > -                                                     void *opaque),
> > -                                          void *opaque)
> > +                                          pci_bus_dev_fn fn, void *opaque)
> >  {
> >      PCIDevice *d;
> >      int devfn;
> > @@ -1699,8 +1694,7 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
> >  }
> >  
> >  void pci_for_each_device(PCIBus *bus, int bus_num,
> > -                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> > -                         void *opaque)
> > +                         pci_bus_dev_fn fn, void *opaque)
> >  {
> >      bus = pci_find_bus_nr(bus, bus_num);
> >  
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index 7fc90132cf..8e2d80860b 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -401,6 +401,8 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
> >  OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
> >  #define TYPE_PCIE_BUS "PCIE"
> >  
> > +typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
> > +
> >  bool pci_bus_is_express(PCIBus *bus);
> >  
> >  void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
> > @@ -458,11 +460,10 @@ static inline int pci_dev_bus_num(const PCIDevice *dev)
> >  
> >  int pci_bus_numa_node(PCIBus *bus);
> >  void pci_for_each_device(PCIBus *bus, int bus_num,
> > -                         void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
> > +                         pci_bus_dev_fn fn,
> >                           void *opaque);
> >  void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> > -                                 void (*fn)(PCIBus *bus, PCIDevice *d,
> > -                                            void *opaque),
> > +                                 pci_bus_dev_fn fn,
> >                                   void *opaque);
> >  void pci_for_each_bus_depth_first(PCIBus *bus,
> >                                    void *(*begin)(PCIBus *bus, void *parent_state),
> There is another candidate in
> hw/ppc/pegasos2.c:    void (*dtf)(PCIBus *bus, PCIDevice *d, FDTInfo *fi);
> but this may be coverted later by the maintainer of this file.

That one has the last parameter a specific type, rather than "void *".

> 
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/8] pci: Use pci_for_each_device_under_bus*()
  2021-10-21 11:34   ` Eric Auger
@ 2021-10-22  2:19     ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:19 UTC (permalink / raw)
  To: Eric Auger
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

On Thu, Oct 21, 2021 at 01:34:07PM +0200, Eric Auger wrote:
> Hi Peter,
> On 10/21/21 12:42 PM, Peter Xu wrote:
> > Replace all the call sites of existing pci_for_each_device*() where the bus
> > number is calculated from a PCIBus* already.  It should avoid the lookup of the
> > PCIBus again.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/acpi-build.c       |  5 ++---
> >  hw/pci/pcie.c              |  4 +---
> >  hw/ppc/spapr_pci.c         | 12 +++++-------
> >  hw/ppc/spapr_pci_nvlink2.c |  7 +++----
> >  hw/ppc/spapr_pci_vfio.c    |  4 ++--
> >  hw/s390x/s390-pci-bus.c    |  5 ++---
> >  hw/xen/xen_pt.c            |  4 ++--
> >  7 files changed, 17 insertions(+), 24 deletions(-)
> >
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 81418b7911..a76b17ed92 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
> >          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> >  
> >          if (bus && !pci_bus_bypass_iommu(bus)) {
> > -            pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
> > -                                scope_blob);
> > +            pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
> >          }
> >      }
> >  
> > @@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
> >          PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> >  
> >          if (bus && !pci_bus_bypass_iommu(bus)) {
> > -            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
> > +            pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
> >          }
> >      }
> >  
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 6e95d82903..914a9bf3d1 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
> >          (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
> >          (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
> >          PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
> > -        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > -                            pcie_unplug_device, NULL);
> > -
> > +        pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
> >          pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
> >                                       PCI_EXP_SLTSTA_PDS);
> >          if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 7430bd6314..5bfd4aa9e5 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus *bus,
> >                            RESOURCE_CELLS_SIZE));
> >  
> >      assert(bus);
> > -    pci_for_each_device_reverse(bus, pci_bus_num(bus),
> > -                                spapr_dt_pci_device_cb, &cbinfo);
> > +    pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
> >      if (cbinfo.err) {
> >          return cbinfo.err;
> >      }
> > @@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
> >          return;
> >      }
> >  
> > -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > -                        spapr_phb_pci_enumerate_bridge, bus_no);
> > +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
> > +                                  bus_no);
> >      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
> >  }
> >  
> > @@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
> >      PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
> >      unsigned int bus_no = 0;
> >  
> > -    pci_for_each_device(bus, pci_bus_num(bus),
> > -                        spapr_phb_pci_enumerate_bridge,
> > -                        &bus_no);
> > +    pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
> > +                                  &bus_no);
> >  
> >  }
> >  
> > diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
> > index 8ef9b40a18..7fb0cf4d04 100644
> > --- a/hw/ppc/spapr_pci_nvlink2.c
> > +++ b/hw/ppc/spapr_pci_nvlink2.c
> > @@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, PCIDevice *pdev,
> >          return;
> >      }
> >  
> > -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > -                        spapr_phb_pci_collect_nvgpu, opaque);
> > +    pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_collect_nvgpu, opaque);
> >  }
> >  
> >  void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
> > @@ -183,8 +182,8 @@ void spapr_phb_nvgpu_setup(SpaprPhbState *sphb, Error **errp)
> >      sphb->nvgpus->nv2_atsd_current = sphb->nv2_atsd_win_addr;
> >  
> >      bus = PCI_HOST_BRIDGE(sphb)->bus;
> > -    pci_for_each_device(bus, pci_bus_num(bus),
> > -                        spapr_phb_pci_collect_nvgpu, sphb->nvgpus);
> > +    pci_for_each_device_under_bus(bus, spapr_phb_pci_collect_nvgpu,
> > +                                  sphb->nvgpus);
> >  
> >      if (sphb->nvgpus->err) {
> >          error_propagate(errp, sphb->nvgpus->err);
> > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > index f3b37df8ea..2a76b4e0b5 100644
> > --- a/hw/ppc/spapr_pci_vfio.c
> > +++ b/hw/ppc/spapr_pci_vfio.c
> > @@ -164,8 +164,8 @@ static void spapr_phb_vfio_eeh_clear_dev_msix(PCIBus *bus,
> >  
> >  static void spapr_phb_vfio_eeh_clear_bus_msix(PCIBus *bus, void *opaque)
> >  {
> > -       pci_for_each_device(bus, pci_bus_num(bus),
> > -                           spapr_phb_vfio_eeh_clear_dev_msix, NULL);
> > +       pci_for_each_device_under_bus(bus, spapr_phb_vfio_eeh_clear_dev_msix,
> > +                                     NULL);
> >  }
> >  
> >  static void spapr_phb_vfio_eeh_pre_reset(SpaprPhbState *sphb)
> > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > index 6fafffb029..1b51a72838 100644
> > --- a/hw/s390x/s390-pci-bus.c
> > +++ b/hw/s390x/s390-pci-bus.c
> > @@ -1163,8 +1163,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
> >      }
> >  
> >      /* Assign numbers to all child bridges. The last is the highest number. */
> > -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > -                        s390_pci_enumerate_bridge, s);
> > +    pci_for_each_device_under_bus(sec_bus, s390_pci_enumerate_bridge, s);
> >      pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
> >  }
> >  
> > @@ -1193,7 +1192,7 @@ static void s390_pcihost_reset(DeviceState *dev)
> >       * on every system reset, we also have to reassign numbers.
> >       */
> >      s->bus_no = 0;
> > -    pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
> > +    pci_for_each_device_under_bus(bus, s390_pci_enumerate_bridge, s);
> >  }
> >  
> >  static void s390_pcihost_class_init(ObjectClass *klass, void *data)
> > diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> > index ca0a98187e..027190fa44 100644
> > --- a/hw/xen/xen_pt.c
> > +++ b/hw/xen/xen_pt.c
> > @@ -615,8 +615,8 @@ static void xen_pt_region_update(XenPCIPassthroughState *s,
> >      }
> >  
> >      args.type = d->io_regions[bar].type;
> > -    pci_for_each_device(pci_get_bus(d), pci_dev_bus_num(d),
> > -                        xen_pt_check_bar_overlap, &args);
> > +    pci_for_each_device_under_bus(pci_get_bus(d),
> > +                                  xen_pt_check_bar_overlap, &args);
> >      if (args.rc) {
> >          XEN_PT_WARN(d, "Region: %d (addr: 0x%"FMT_PCIBUS
> >                      ", len: 0x%"FMT_PCIBUS") is overlapped.\n",
> Maybe squash with the previous one?

Will do.

> 
> Besides
> Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type
  2021-10-21 12:54     ` Philippe Mathieu-Daudé
@ 2021-10-22  2:24       ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Eric Auger,
	Igor Mammedov, Paolo Bonzini, David Gibson

On Thu, Oct 21, 2021 at 02:54:44PM +0200, Philippe Mathieu-Daudé wrote:
> On 10/21/21 13:44, Philippe Mathieu-Daudé wrote:
> > On 10/21/21 12:42, Peter Xu wrote:
> >> The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an
> >> opaque.  The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a
> >> void* pointer.
> >>
> >> Use them where proper in pci.[ch], and to be used elsewhere.
> >>
> >> Signed-off-by: Peter Xu <peterx@redhat.com>
> >> ---
> >>  hw/pci/pci.c         |  6 ++----
> >>  include/hw/pci/pci.h | 12 +++++-------
> >>  2 files changed, 7 insertions(+), 11 deletions(-)
> > 
> >> --- a/include/hw/pci/pci.h
> >> +++ b/include/hw/pci/pci.h
> >> @@ -402,6 +402,8 @@ OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
> >>  #define TYPE_PCIE_BUS "PCIE"
> >>  
> >>  typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
> >> +typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
> >> +typedef void* (*pci_bus_ret_fn)(PCIBus *b, void *opaque);
> 
> Now looking at patch #8, I wonder if it wouldn't be cleaner to have
> a single:
> 
> /*
>  * pci_bus_fn:
>  * @bus: PCI bus
>  * @opaque: Pointer to opaque pointer,
>             can be used as input *and* output
>  * Return #true on success, #false on failure
>  */
> typedef bool (*pci_bus_fn)(PCIBus *bus, void **opaque);

Having the errp could be better imho for patch 8 so we can allow the per-device
hook to set the exact error message.  That could be useful when we extend the
list of devices to scan (besides vfio-pci) so we can set different error
messages rather than return "false" for all.

I'll stole the rb and I'll adjust the "void* ()".  Thanks for looking, Phil.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-21 10:54   ` Michael S. Tsirkin
@ 2021-10-22  2:33     ` Peter Xu
  2021-10-22  8:43       ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

Hi, Michael,

On Thu, Oct 21, 2021 at 06:54:59AM -0400, Michael S. Tsirkin wrote:
> > +typedef struct {
> > +    pci_bus_dev_fn fn;
> > +    void *opaque;
> > +} pci_bus_dev_args;
> 
> code style violation. CamelCase for structs pls.

OK.

> > +/* Call 'fn' for each pci device on the system */
> > +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);
> 
> Instead of hacking pci making initialization o(N^2),

Why it's O(N^2)?  One vIOMMU walks O(N), and we only have one vIOMMU, or am I
wrong?

> can't we add a variant of object_resolve_path_type ?

Could you elaborate?  Here what we want to do is to make sure there're no
specific PCI devices registered, and potentially it can be more than one type
of device in the future.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-21 12:38   ` Eric Auger
@ 2021-10-22  2:37     ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-22  2:37 UTC (permalink / raw)
  To: Eric Auger
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Igor Mammedov, David Gibson

On Thu, Oct 21, 2021 at 02:38:54PM +0200, Eric Auger wrote:
> Hi Peter,
> 
> On 10/21/21 12:42 PM, Peter Xu wrote:
> > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > is realized.
> >
> > Suggested-by: Igor Mammedov <imammedo@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  hw/i386/x86-iommu.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> > index 86ad03972e..58abce7edc 100644
> > --- a/hw/i386/x86-iommu.c
> > +++ b/hw/i386/x86-iommu.c
> > @@ -21,6 +21,7 @@
> >  #include "hw/sysbus.h"
> >  #include "hw/i386/x86-iommu.h"
> >  #include "hw/qdev-properties.h"
> > +#include "hw/vfio/pci.h"
> >  #include "hw/i386/pc.h"
> >  #include "qapi/error.h"
> >  #include "qemu/error-report.h"
> > @@ -103,6 +104,16 @@ IommuType x86_iommu_get_type(void)
> >      return x86_iommu_default->type;
> >  }
> >  
> > +static void x86_iommu_pci_dev_hook(PCIBus *bus, PCIDevice *dev, void *opaque)
> > +{
> > +    Error **errp = (Error **)opaque;
> > +
> > +    if (object_dynamic_cast(OBJECT(dev), TYPE_VFIO_PCI)) {
> > +        error_setg(errp, "Device '%s' must be specified before vIOMMUs",
> > +                   TYPE_VFIO_PCI);
> if there are several VFIO-PCI devices set before the IOMMU, errp may be
> overriden
> as we do not exit the loop as soon as there is an error I think

Hmm, good point.  I won't worry too much about overriding yet as if there're
more devices violating the rule then reporting any of them should work - then
as the user tune the qemu cmdline it'll finally go right.

But I do see that error_setv() has an assertion on *errp being NULL.. I'll at
least make sure it won't trigger that assert by accident.

Thanks for spotting it!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-22  2:33     ` Peter Xu
@ 2021-10-22  8:43       ` Michael S. Tsirkin
  2021-10-25 12:57         ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  8:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Fri, Oct 22, 2021 at 10:33:15AM +0800, Peter Xu wrote:
> Hi, Michael,
> 
> On Thu, Oct 21, 2021 at 06:54:59AM -0400, Michael S. Tsirkin wrote:
> > > +typedef struct {
> > > +    pci_bus_dev_fn fn;
> > > +    void *opaque;
> > > +} pci_bus_dev_args;
> > 
> > code style violation. CamelCase for structs pls.
> 
> OK.
> 
> > > +/* Call 'fn' for each pci device on the system */
> > > +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);
> > 
> > Instead of hacking pci making initialization o(N^2),
> 
> Why it's O(N^2)?  One vIOMMU walks O(N), and we only have one vIOMMU, or am I
> wrong?

What I meant is this is O(N) and if called M times will be O(N * M)
yes your patches only call once so O(N), still we can do better.

> > can't we add a variant of object_resolve_path_type ?
> 
> Could you elaborate?  Here what we want to do is to make sure there're no
> specific PCI devices registered, and potentially it can be more than one type
> of device in the future.
> 
> Thanks,

All you seem to care about is checking there's no VFIO
(why - should really be documented in a code comment much more clearly).
Looks like object_resolve_path_type does that with O(1) complexity.
If we need a variant that checks for multiple types we can add that.

> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-22  8:43       ` Michael S. Tsirkin
@ 2021-10-25 12:57         ` Peter Xu
  2021-10-25 13:13           ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2021-10-25 12:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Fri, Oct 22, 2021 at 04:43:43AM -0400, Michael S. Tsirkin wrote:
> On Fri, Oct 22, 2021 at 10:33:15AM +0800, Peter Xu wrote:
> > Hi, Michael,
> > 
> > On Thu, Oct 21, 2021 at 06:54:59AM -0400, Michael S. Tsirkin wrote:
> > > > +typedef struct {
> > > > +    pci_bus_dev_fn fn;
> > > > +    void *opaque;
> > > > +} pci_bus_dev_args;
> > > 
> > > code style violation. CamelCase for structs pls.
> > 
> > OK.
> > 
> > > > +/* Call 'fn' for each pci device on the system */
> > > > +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);
> > > 
> > > Instead of hacking pci making initialization o(N^2),
> > 
> > Why it's O(N^2)?  One vIOMMU walks O(N), and we only have one vIOMMU, or am I
> > wrong?
> 
> What I meant is this is O(N) and if called M times will be O(N * M)
> yes your patches only call once so O(N), still we can do better.

I see.

> 
> > > can't we add a variant of object_resolve_path_type ?
> > 
> > Could you elaborate?  Here what we want to do is to make sure there're no
> > specific PCI devices registered, and potentially it can be more than one type
> > of device in the future.
> > 
> > Thanks,
> 
> All you seem to care about is checking there's no VFIO
> (why - should really be documented in a code comment much more clearly).

Right, Alex asked the same question.  I'll make sure to mention that in the
commit message in the next version.

> Looks like object_resolve_path_type does that with O(1) complexity.
> If we need a variant that checks for multiple types we can add that.

It's still O(N), or am I wrong?  I mean for example there's the loop in
object_resolve_partial_path().

But yeah I can use that too if you prefer, it's just that when we want to
detect more types of pci classes it could be slower iiuc, because we'll need to
call object_resolve_path_type() once for each type.  For pci bus scan it's
always one round because we only have at most one x86 vIOMMU for each guest.

At the meantime, IMHO patch 1-6 are cleanups that should be good even without
patch 7/8. If we prefer object_resolve_path_type() I'd still think it would be
good to propose patch 1-6 separately (with some patch properly squashed as
suggested by reviewers)?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/8] pci: Add pci_for_each_device_all()
  2021-10-25 12:57         ` Peter Xu
@ 2021-10-25 13:13           ` Michael S. Tsirkin
  0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2021-10-25 13:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Mon, Oct 25, 2021 at 08:57:36PM +0800, Peter Xu wrote:
> On Fri, Oct 22, 2021 at 04:43:43AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Oct 22, 2021 at 10:33:15AM +0800, Peter Xu wrote:
> > > Hi, Michael,
> > > 
> > > On Thu, Oct 21, 2021 at 06:54:59AM -0400, Michael S. Tsirkin wrote:
> > > > > +typedef struct {
> > > > > +    pci_bus_dev_fn fn;
> > > > > +    void *opaque;
> > > > > +} pci_bus_dev_args;
> > > > 
> > > > code style violation. CamelCase for structs pls.
> > > 
> > > OK.
> > > 
> > > > > +/* Call 'fn' for each pci device on the system */
> > > > > +void pci_for_each_device_all(pci_bus_dev_fn fn, void *opaque);
> > > > 
> > > > Instead of hacking pci making initialization o(N^2),
> > > 
> > > Why it's O(N^2)?  One vIOMMU walks O(N), and we only have one vIOMMU, or am I
> > > wrong?
> > 
> > What I meant is this is O(N) and if called M times will be O(N * M)
> > yes your patches only call once so O(N), still we can do better.
> 
> I see.
> 
> > 
> > > > can't we add a variant of object_resolve_path_type ?
> > > 
> > > Could you elaborate?  Here what we want to do is to make sure there're no
> > > specific PCI devices registered, and potentially it can be more than one type
> > > of device in the future.
> > > 
> > > Thanks,
> > 
> > All you seem to care about is checking there's no VFIO
> > (why - should really be documented in a code comment much more clearly).
> 
> Right, Alex asked the same question.  I'll make sure to mention that in the
> commit message in the next version.
> 
> > Looks like object_resolve_path_type does that with O(1) complexity.
> > If we need a variant that checks for multiple types we can add that.
> 
> It's still O(N), or am I wrong?  I mean for example there's the loop in
> object_resolve_partial_path().

Only if there's a hash collision.

> But yeah I can use that too if you prefer, it's just that when we want to
> detect more types of pci classes it could be slower iiuc, because we'll need to
> call object_resolve_path_type() once for each type.  For pci bus scan it's
> always one round because we only have at most one x86 vIOMMU for each guest.
> 
> At the meantime, IMHO patch 1-6 are cleanups that should be good even without
> patch 7/8. If we prefer object_resolve_path_type() I'd still think it would be
> good to propose patch 1-6 separately (with some patch properly squashed as
> suggested by reviewers)?
> 
> Thanks,

OK let's handle that separately.

> -- 
> Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] pci: Add pci_for_each_root_bus()
  2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
  2021-10-21 11:00   ` David Hildenbrand
  2021-10-21 12:22   ` Eric Auger
@ 2021-10-25 13:16   ` Michael S. Tsirkin
  2021-10-28  2:56     ` Peter Xu
  2 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2021-10-25 13:16 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Thu, Oct 21, 2021 at 06:42:56PM +0800, Peter Xu wrote:
> Add a helper to loop over each root bus of the system, either the default root
> bus or extended buses like pxb-pcie.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/pci/pci.c         | 26 ++++++++++++++++++++++++++
>  include/hw/pci/pci.h |  2 ++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 4a84e478ce..1623bc9099 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2097,6 +2097,32 @@ void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>      }
>  }
>  
> +typedef struct {
> +    pci_bus_fn fn;
> +    void *opaque;
> +} pci_root_bus_args;
> +

coding style violation

> +static int pci_find_root_bus(Object *obj, void *opaque)
> +{
> +    pci_root_bus_args *args = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
> +        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
> +
> +        if (bus) {
> +            args->fn(bus, args->opaque);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> +{
> +    pci_root_bus_args args = { .fn = fn, .opaque = opaque };
> +
> +    object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
> +}
>  
>  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
>  {


How about adding an API with a type filter to the qom core?
E.g.
object_child_foreach_type_recursive getting a type.


> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index a7e81f04d3..9e490d8969 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -474,6 +474,8 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
>                                             void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
>                                    pci_bus_fn end, void *parent_state);
> +/* Call `fn' for each pci root bus on the system */
> +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque);
>  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
>  
>  /* Use this wrapper when specific scan order is not required. */
> -- 
> 2.32.0



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-22  2:14     ` Peter Xu
@ 2021-10-26 15:11       ` Igor Mammedov
  2021-10-26 15:38         ` Alex Williamson
  2021-10-27  8:30         ` Peter Xu
  0 siblings, 2 replies; 44+ messages in thread
From: Igor Mammedov @ 2021-10-26 15:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Eric Auger, David Gibson

On Fri, 22 Oct 2021 10:14:29 +0800
Peter Xu <peterx@redhat.com> wrote:

> Hi, Alex,
> 
> On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> > On Thu, 21 Oct 2021 18:42:59 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > > is realized.  
> > 
> > Sorry, I'm not onboard with this solution at all.
> > 
> > It would be really useful though if this commit log or a code comment
> > described exactly the incompatibility for which vfio-pci devices are
> > being called out here.  Otherwise I see this as a bit of magic voodoo
> > that gets lost in lore and copied elsewhere and we're constantly trying
> > to figure out specific incompatibilities when vfio-pci devices are
> > trying really hard to be "just another device".  
> 
> Sure, I can enrich the commit message.
> 
> > 
> > I infer from the link of the previous alternate solution that this is
> > to do with the fact that vfio devices attach a memory listener to the
> > device address space.  
> 
> IMHO it's not about the memory listeners, I think that' after vfio detected
> some vIOMMU memory regions already, which must be based on an vIOMMU address
> space being available.  I think the problem is that when realize() vfio-pci we
> fetch the dma address space specifically for getting the vfio group, while that
> could happen too early, even before vIOMMU is created.
> 
> > Interestingly that previous cover letter also discusses how vdpa devices
> > might have a similar issue, which makes it confusing again that we're calling
> > out vfio-pci devices by name rather than for a behavior.  
> 
> Yes I'll need to see whether this approach will be accepted first.  I think
> similar thing could help VDPA but it's not required there because VDPA has
> already worked around using pci_device_iommu_address_space().  So potentially
> the only one to "fix" is the vfio-pci device using along with vIOMMU, when the
> device ordering is specified in the wrong order.  I'll leave the VDPA problem
> to Jason to see whether he prefers keeping current code, or switch to a simpler
> one.  That should be after this one.
> 
> > 
> > If the behavior here is that vfio-pci devices attach a listener to the
> > device address space, then that provides a couple possible options.  We
> > could look for devices that have recorded an interest in their address
> > space, such as by setting a flag on PCIDevice when someone calls
> > pci_device_iommu_address_space(), where we could walk all devices using
> > the code in this series to find a device with such a flag.  
> 
> Right, we can set a flag for all the pci devices that needs to consolidate
> pci_device_iommu_address_space() result, however then it'll be vfio-pci only so
> far.  Btw, I actually proposed similar things two months ago, and I think Igor
> showed concern on that flag being vague on meaning:

(1)
> https://lore.kernel.org/qemu-devel/20210906104915.7dd5c934@redhat.com/

> 
>   > > Does it need to be a pre_plug hook?  I thought we might just need a flag in the
>   > > pci device classes showing that it should be after vIOMMUs, then in vIOMMU
>   > > realize functions we walk pci bus to make sure no such device exist?
>   > > 
>   > > We could have a base vIOMMU class, then that could be in the realize() of the
>   > > common class.  
>   > 
>   > We basically don't know if device needs IOMMU or not and can work
>   > with/without it just fine. In this case I'd think about IOMMU as board
>   > feature that morphs PCI buses (some of them) (address space, bus numers, ...).
>   > So I don't perceive any iommu flag as a device property at all.
>   > 
>   > As for realize vs pre_plug, the later is the part of abstract realize
>   > (see: device_set_realized) and is already used by some PCI infrastructure:
>   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug  
> 
> I still think that flag will work, that flag should only shows "whether this
> device needs to be specified earlier than vIOMMU", but I can get the point from
> Igor that it's at least confusing on what does the flag mean.

> Meanwhile I
> don't think that flag will be required, as this is not the first time we name a
> special device in the code, e.g. pc_machine_device_pre_plug_cb().
> intel_iommu.c has it too upon vfio-pci already on making sure caching-mode=on
> in vtd_machine_done_notify_one().

I pointed to specifically to _pre_plug() handler and not as
implemented here in realize().
Reasoning behind it is that some_device_realize() should not poke
into other devices, while pc_machine_device_pre_plug_cb() is
part of machine code can/may legitimately access its child devices and verify/
configure them. (Hence I'd drop my suggested-by with current impl.)
 
> If Igor is okay with adding such a flag for PCIDevice class, I can do that in
> the new version.  I don't have a strong opinion on this.

Also, I've suggested to use pre_plug only as the last resort in case
vfio-pci can't be made independent of the order (see [1] for reset time
suggestion).
So why 'reset' approach didn't work out?
(this need to be cover letter/commit message as a reason why
we are resorting to a hack)

> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-26 15:11       ` Igor Mammedov
@ 2021-10-26 15:38         ` Alex Williamson
  2021-10-27  8:30         ` Peter Xu
  1 sibling, 0 replies; 44+ messages in thread
From: Alex Williamson @ 2021-10-26 15:38 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand,
	Markus Armbruster, Peter Xu, qemu-devel, Shannon Zhao,
	Paolo Bonzini, Eric Auger, David Gibson

On Tue, 26 Oct 2021 17:11:39 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Fri, 22 Oct 2021 10:14:29 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Hi, Alex,
> > 
> > On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:  
> > > On Thu, 21 Oct 2021 18:42:59 +0800
> > > Peter Xu <peterx@redhat.com> wrote:
> > >     
> > > > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > > > is realized.    
> > > 
> > > Sorry, I'm not onboard with this solution at all.
> > > 
> > > It would be really useful though if this commit log or a code comment
> > > described exactly the incompatibility for which vfio-pci devices are
> > > being called out here.  Otherwise I see this as a bit of magic voodoo
> > > that gets lost in lore and copied elsewhere and we're constantly trying
> > > to figure out specific incompatibilities when vfio-pci devices are
> > > trying really hard to be "just another device".    
> > 
> > Sure, I can enrich the commit message.
> >   
> > > 
> > > I infer from the link of the previous alternate solution that this is
> > > to do with the fact that vfio devices attach a memory listener to the
> > > device address space.    
> > 
> > IMHO it's not about the memory listeners, I think that' after vfio detected
> > some vIOMMU memory regions already, which must be based on an vIOMMU address
> > space being available.  I think the problem is that when realize() vfio-pci we
> > fetch the dma address space specifically for getting the vfio group, while that
> > could happen too early, even before vIOMMU is created.
> >   
> > > Interestingly that previous cover letter also discusses how vdpa devices
> > > might have a similar issue, which makes it confusing again that we're calling
> > > out vfio-pci devices by name rather than for a behavior.    
> > 
> > Yes I'll need to see whether this approach will be accepted first.  I think
> > similar thing could help VDPA but it's not required there because VDPA has
> > already worked around using pci_device_iommu_address_space().  So potentially
> > the only one to "fix" is the vfio-pci device using along with vIOMMU, when the
> > device ordering is specified in the wrong order.  I'll leave the VDPA problem
> > to Jason to see whether he prefers keeping current code, or switch to a simpler
> > one.  That should be after this one.
> >   
> > > 
> > > If the behavior here is that vfio-pci devices attach a listener to the
> > > device address space, then that provides a couple possible options.  We
> > > could look for devices that have recorded an interest in their address
> > > space, such as by setting a flag on PCIDevice when someone calls
> > > pci_device_iommu_address_space(), where we could walk all devices using
> > > the code in this series to find a device with such a flag.    
> > 
> > Right, we can set a flag for all the pci devices that needs to consolidate
> > pci_device_iommu_address_space() result, however then it'll be vfio-pci only so
> > far.  Btw, I actually proposed similar things two months ago, and I think Igor
> > showed concern on that flag being vague on meaning:  
> 
> (1)
> > https://lore.kernel.org/qemu-devel/20210906104915.7dd5c934@redhat.com/  
> 
> >   
> >   > > Does it need to be a pre_plug hook?  I thought we might just need a flag in the
> >   > > pci device classes showing that it should be after vIOMMUs, then in vIOMMU
> >   > > realize functions we walk pci bus to make sure no such device exist?
> >   > > 
> >   > > We could have a base vIOMMU class, then that could be in the realize() of the
> >   > > common class.    
> >   > 
> >   > We basically don't know if device needs IOMMU or not and can work
> >   > with/without it just fine. In this case I'd think about IOMMU as board
> >   > feature that morphs PCI buses (some of them) (address space, bus numers, ...).
> >   > So I don't perceive any iommu flag as a device property at all.
> >   > 
> >   > As for realize vs pre_plug, the later is the part of abstract realize
> >   > (see: device_set_realized) and is already used by some PCI infrastructure:
> >   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug    
> > 
> > I still think that flag will work, that flag should only shows "whether this
> > device needs to be specified earlier than vIOMMU", but I can get the point from
> > Igor that it's at least confusing on what does the flag mean.  
> 
> > Meanwhile I
> > don't think that flag will be required, as this is not the first time we name a
> > special device in the code, e.g. pc_machine_device_pre_plug_cb().
> > intel_iommu.c has it too upon vfio-pci already on making sure caching-mode=on
> > in vtd_machine_done_notify_one().  
> 
> I pointed to specifically to _pre_plug() handler and not as
> implemented here in realize().
> Reasoning behind it is that some_device_realize() should not poke
> into other devices, while pc_machine_device_pre_plug_cb() is
> part of machine code can/may legitimately access its child devices and verify/
> configure them. (Hence I'd drop my suggested-by with current impl.)
>  
> > If Igor is okay with adding such a flag for PCIDevice class, I can do that in
> > the new version.  I don't have a strong opinion on this.  
> 
> Also, I've suggested to use pre_plug only as the last resort in case
> vfio-pci can't be made independent of the order (see [1] for reset time
> suggestion).
> So why 'reset' approach didn't work out?
> (this need to be cover letter/commit message as a reason why
> we are resorting to a hack)

Attaching MemoryListeners is not a lightweight process, that's where we
get the replay of device mappings and where the IOMMU will pin and map
all of the VM address space for the device.  Minimally, we'd need to
condition this setup based on the previous device address space such
that it's only affected on changes.  That probably leans more towards a
machine-init-done notifier to setup the listeners (assuming those work
correctly for hot-plug).

However, device address space is synonymous with the vfio container
used for a device.  In order to gain access to a vfio device, the vfio
group must first be placed into an IOMMU context, aka container.
Therefore that would imply that essentially the entire realize function
of a vfio device would move to a machine-init-done handler because we
can't actually access the device until we know the address space such
that we can place the group into a container.  Creating a container per
device which is attached to an address space at machine-init-done time
is also not practical as that incurs duplicate mapping overhead (time
and space) and locked memory accounting issues that we already struggle
with in the vIOMMU case.

There might be other issues, but that alone seems pretty impractical
for what seems like it ought not be such a taboo operation during
realize, to simply care about the device address space.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-26 15:11       ` Igor Mammedov
  2021-10-26 15:38         ` Alex Williamson
@ 2021-10-27  8:30         ` Peter Xu
  2021-10-28  2:30           ` Peter Xu
  1 sibling, 1 reply; 44+ messages in thread
From: Peter Xu @ 2021-10-27  8:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Eric Auger, David Gibson

On Tue, Oct 26, 2021 at 05:11:39PM +0200, Igor Mammedov wrote:
> On Fri, 22 Oct 2021 10:14:29 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Hi, Alex,
> > 
> > On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> > > On Thu, 21 Oct 2021 18:42:59 +0800
> > > Peter Xu <peterx@redhat.com> wrote:
> > >   
> > > > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > > > is realized.  
> > > 
> > > Sorry, I'm not onboard with this solution at all.
> > > 
> > > It would be really useful though if this commit log or a code comment
> > > described exactly the incompatibility for which vfio-pci devices are
> > > being called out here.  Otherwise I see this as a bit of magic voodoo
> > > that gets lost in lore and copied elsewhere and we're constantly trying
> > > to figure out specific incompatibilities when vfio-pci devices are
> > > trying really hard to be "just another device".  
> > 
> > Sure, I can enrich the commit message.
> > 
> > > 
> > > I infer from the link of the previous alternate solution that this is
> > > to do with the fact that vfio devices attach a memory listener to the
> > > device address space.  
> > 
> > IMHO it's not about the memory listeners, I think that' after vfio detected
> > some vIOMMU memory regions already, which must be based on an vIOMMU address
> > space being available.  I think the problem is that when realize() vfio-pci we
> > fetch the dma address space specifically for getting the vfio group, while that
> > could happen too early, even before vIOMMU is created.
> > 
> > > Interestingly that previous cover letter also discusses how vdpa devices
> > > might have a similar issue, which makes it confusing again that we're calling
> > > out vfio-pci devices by name rather than for a behavior.  
> > 
> > Yes I'll need to see whether this approach will be accepted first.  I think
> > similar thing could help VDPA but it's not required there because VDPA has
> > already worked around using pci_device_iommu_address_space().  So potentially
> > the only one to "fix" is the vfio-pci device using along with vIOMMU, when the
> > device ordering is specified in the wrong order.  I'll leave the VDPA problem
> > to Jason to see whether he prefers keeping current code, or switch to a simpler
> > one.  That should be after this one.
> > 
> > > 
> > > If the behavior here is that vfio-pci devices attach a listener to the
> > > device address space, then that provides a couple possible options.  We
> > > could look for devices that have recorded an interest in their address
> > > space, such as by setting a flag on PCIDevice when someone calls
> > > pci_device_iommu_address_space(), where we could walk all devices using
> > > the code in this series to find a device with such a flag.  
> > 
> > Right, we can set a flag for all the pci devices that needs to consolidate
> > pci_device_iommu_address_space() result, however then it'll be vfio-pci only so
> > far.  Btw, I actually proposed similar things two months ago, and I think Igor
> > showed concern on that flag being vague on meaning:
> 
> (1)
> > https://lore.kernel.org/qemu-devel/20210906104915.7dd5c934@redhat.com/
> 
> > 
> >   > > Does it need to be a pre_plug hook?  I thought we might just need a flag in the
> >   > > pci device classes showing that it should be after vIOMMUs, then in vIOMMU
> >   > > realize functions we walk pci bus to make sure no such device exist?
> >   > > 
> >   > > We could have a base vIOMMU class, then that could be in the realize() of the
> >   > > common class.  
> >   > 
> >   > We basically don't know if device needs IOMMU or not and can work
> >   > with/without it just fine. In this case I'd think about IOMMU as board
> >   > feature that morphs PCI buses (some of them) (address space, bus numers, ...).
> >   > So I don't perceive any iommu flag as a device property at all.
> >   > 
> >   > As for realize vs pre_plug, the later is the part of abstract realize
> >   > (see: device_set_realized) and is already used by some PCI infrastructure:
> >   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug  
> > 
> > I still think that flag will work, that flag should only shows "whether this
> > device needs to be specified earlier than vIOMMU", but I can get the point from
> > Igor that it's at least confusing on what does the flag mean.
> 
> > Meanwhile I
> > don't think that flag will be required, as this is not the first time we name a
> > special device in the code, e.g. pc_machine_device_pre_plug_cb().
> > intel_iommu.c has it too upon vfio-pci already on making sure caching-mode=on
> > in vtd_machine_done_notify_one().
> 
> I pointed to specifically to _pre_plug() handler and not as
> implemented here in realize().
> Reasoning behind it is that some_device_realize() should not poke
> into other devices, while pc_machine_device_pre_plug_cb() is
> part of machine code can/may legitimately access its child devices and verify/
> configure them. (Hence I'd drop my suggested-by with current impl.)

I see, I didn't try that because I see that q35 even does not have pre_plug()
installed yet, so I need to add it in the same patchset, am I right?

Frankly I've also no idea why pc has the pre_plug() but not q35.. But if you
think that matters then I can try.

>  
> > If Igor is okay with adding such a flag for PCIDevice class, I can do that in
> > the new version.  I don't have a strong opinion on this.
> 
> Also, I've suggested to use pre_plug only as the last resort in case
> vfio-pci can't be made independent of the order (see [1] for reset time
> suggestion).
> So why 'reset' approach didn't work out?
> (this need to be cover letter/commit message as a reason why
> we are resorting to a hack)

As explained by Alex, I think that's more challenging and it was discussed
quite a few times before, so I didn't mention that in cover letter.

Sorry I should have mentioned some of it.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci
  2021-10-27  8:30         ` Peter Xu
@ 2021-10-28  2:30           ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-28  2:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost,
	Michael S . Tsirkin, Jason Wang, David Hildenbrand, qemu-devel,
	Markus Armbruster, Shannon Zhao, Alex Williamson, Paolo Bonzini,
	Eric Auger, David Gibson

On Wed, Oct 27, 2021 at 04:30:18PM +0800, Peter Xu wrote:
> On Tue, Oct 26, 2021 at 05:11:39PM +0200, Igor Mammedov wrote:
> > On Fri, 22 Oct 2021 10:14:29 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > Hi, Alex,
> > > 
> > > On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> > > > On Thu, 21 Oct 2021 18:42:59 +0800
> > > > Peter Xu <peterx@redhat.com> wrote:
> > > >   
> > > > > Scan the pci bus to make sure there's no vfio-pci device attached before vIOMMU
> > > > > is realized.  
> > > > 
> > > > Sorry, I'm not onboard with this solution at all.
> > > > 
> > > > It would be really useful though if this commit log or a code comment
> > > > described exactly the incompatibility for which vfio-pci devices are
> > > > being called out here.  Otherwise I see this as a bit of magic voodoo
> > > > that gets lost in lore and copied elsewhere and we're constantly trying
> > > > to figure out specific incompatibilities when vfio-pci devices are
> > > > trying really hard to be "just another device".  
> > > 
> > > Sure, I can enrich the commit message.
> > > 
> > > > 
> > > > I infer from the link of the previous alternate solution that this is
> > > > to do with the fact that vfio devices attach a memory listener to the
> > > > device address space.  
> > > 
> > > IMHO it's not about the memory listeners, I think that' after vfio detected
> > > some vIOMMU memory regions already, which must be based on an vIOMMU address
> > > space being available.  I think the problem is that when realize() vfio-pci we
> > > fetch the dma address space specifically for getting the vfio group, while that
> > > could happen too early, even before vIOMMU is created.
> > > 
> > > > Interestingly that previous cover letter also discusses how vdpa devices
> > > > might have a similar issue, which makes it confusing again that we're calling
> > > > out vfio-pci devices by name rather than for a behavior.  
> > > 
> > > Yes I'll need to see whether this approach will be accepted first.  I think
> > > similar thing could help VDPA but it's not required there because VDPA has
> > > already worked around using pci_device_iommu_address_space().  So potentially
> > > the only one to "fix" is the vfio-pci device using along with vIOMMU, when the
> > > device ordering is specified in the wrong order.  I'll leave the VDPA problem
> > > to Jason to see whether he prefers keeping current code, or switch to a simpler
> > > one.  That should be after this one.
> > > 
> > > > 
> > > > If the behavior here is that vfio-pci devices attach a listener to the
> > > > device address space, then that provides a couple possible options.  We
> > > > could look for devices that have recorded an interest in their address
> > > > space, such as by setting a flag on PCIDevice when someone calls
> > > > pci_device_iommu_address_space(), where we could walk all devices using
> > > > the code in this series to find a device with such a flag.  
> > > 
> > > Right, we can set a flag for all the pci devices that needs to consolidate
> > > pci_device_iommu_address_space() result, however then it'll be vfio-pci only so
> > > far.  Btw, I actually proposed similar things two months ago, and I think Igor
> > > showed concern on that flag being vague on meaning:
> > 
> > (1)
> > > https://lore.kernel.org/qemu-devel/20210906104915.7dd5c934@redhat.com/
> > 
> > > 
> > >   > > Does it need to be a pre_plug hook?  I thought we might just need a flag in the
> > >   > > pci device classes showing that it should be after vIOMMUs, then in vIOMMU
> > >   > > realize functions we walk pci bus to make sure no such device exist?
> > >   > > 
> > >   > > We could have a base vIOMMU class, then that could be in the realize() of the
> > >   > > common class.  
> > >   > 
> > >   > We basically don't know if device needs IOMMU or not and can work
> > >   > with/without it just fine. In this case I'd think about IOMMU as board
> > >   > feature that morphs PCI buses (some of them) (address space, bus numers, ...).
> > >   > So I don't perceive any iommu flag as a device property at all.
> > >   > 
> > >   > As for realize vs pre_plug, the later is the part of abstract realize
> > >   > (see: device_set_realized) and is already used by some PCI infrastructure:
> > >   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug  
> > > 
> > > I still think that flag will work, that flag should only shows "whether this
> > > device needs to be specified earlier than vIOMMU", but I can get the point from
> > > Igor that it's at least confusing on what does the flag mean.
> > 
> > > Meanwhile I
> > > don't think that flag will be required, as this is not the first time we name a
> > > special device in the code, e.g. pc_machine_device_pre_plug_cb().
> > > intel_iommu.c has it too upon vfio-pci already on making sure caching-mode=on
> > > in vtd_machine_done_notify_one().
> > 
> > I pointed to specifically to _pre_plug() handler and not as
> > implemented here in realize().
> > Reasoning behind it is that some_device_realize() should not poke
> > into other devices, while pc_machine_device_pre_plug_cb() is
> > part of machine code can/may legitimately access its child devices and verify/
> > configure them. (Hence I'd drop my suggested-by with current impl.)
> 
> I see, I didn't try that because I see that q35 even does not have pre_plug()
> installed yet, so I need to add it in the same patchset, am I right?
> 
> Frankly I've also no idea why pc has the pre_plug() but not q35.. But if you
> think that matters then I can try.

I think I know what I'm missing.. there's the pc_get_hotplug_handler() that
conditionally fetch the handler, so when I was trying with q35/VT-d the handler
always returned null..

I'll repost with pre_plug(), thanks.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/8] pci: Add pci_for_each_root_bus()
  2021-10-25 13:16   ` Michael S. Tsirkin
@ 2021-10-28  2:56     ` Peter Xu
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Xu @ 2021-10-28  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Daniel P . Berrange, Eduardo Habkost, Eric Auger,
	Jason Wang, David Hildenbrand, qemu-devel, Markus Armbruster,
	Shannon Zhao, Alex Williamson, Paolo Bonzini, Igor Mammedov,
	David Gibson

On Mon, Oct 25, 2021 at 09:16:53AM -0400, Michael S. Tsirkin wrote:
> > +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> > +{
> > +    pci_root_bus_args args = { .fn = fn, .opaque = opaque };
> > +
> > +    object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
> > +}
> >  
> >  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
> >  {
> 
> 
> How about adding an API with a type filter to the qom core?
> E.g.
> object_child_foreach_type_recursive getting a type.

Sounds good, will do.  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-10-28  3:43 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 10:42 [PATCH 0/8] pci/iommu: Fail early if vfio-pci detected before vIOMMU Peter Xu
2021-10-21 10:42 ` [PATCH 1/8] pci: Define pci_bus_dev_fn type Peter Xu
2021-10-21 10:53   ` David Hildenbrand
2021-10-21 11:15   ` Eric Auger
2021-10-22  2:16     ` Peter Xu
2021-10-21 11:36   ` Philippe Mathieu-Daudé
2021-10-21 10:42 ` [PATCH 2/8] pci: Export pci_for_each_device_under_bus*() Peter Xu
2021-10-21 10:54   ` David Hildenbrand
2021-10-21 11:32   ` Eric Auger
2021-10-21 11:39   ` Philippe Mathieu-Daudé
2021-10-21 10:42 ` [PATCH 3/8] pci: Use pci_for_each_device_under_bus*() Peter Xu
2021-10-21 10:56   ` David Hildenbrand
2021-10-21 11:34   ` Eric Auger
2021-10-22  2:19     ` Peter Xu
2021-10-21 10:42 ` [PATCH 4/8] pci: Define pci_bus_fn/pci_bus_ret_fn type Peter Xu
2021-10-21 10:57   ` David Hildenbrand
2021-10-21 11:37   ` Eric Auger
2021-10-21 11:44   ` Philippe Mathieu-Daudé
2021-10-21 12:54     ` Philippe Mathieu-Daudé
2021-10-22  2:24       ` Peter Xu
2021-10-21 10:42 ` [PATCH 5/8] pci: Add pci_for_each_root_bus() Peter Xu
2021-10-21 11:00   ` David Hildenbrand
2021-10-21 12:22   ` Eric Auger
2021-10-25 13:16   ` Michael S. Tsirkin
2021-10-28  2:56     ` Peter Xu
2021-10-21 10:42 ` [PATCH 6/8] pci: Use pci_for_each_root_bus() in current code Peter Xu
2021-10-21 11:06   ` David Hildenbrand
2021-10-21 12:28   ` Eric Auger
2021-10-21 10:42 ` [PATCH 7/8] pci: Add pci_for_each_device_all() Peter Xu
2021-10-21 10:54   ` Michael S. Tsirkin
2021-10-22  2:33     ` Peter Xu
2021-10-22  8:43       ` Michael S. Tsirkin
2021-10-25 12:57         ` Peter Xu
2021-10-25 13:13           ` Michael S. Tsirkin
2021-10-21 10:42 ` [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci Peter Xu
2021-10-21 10:49   ` Michael S. Tsirkin
2021-10-21 12:38   ` Eric Auger
2021-10-22  2:37     ` Peter Xu
2021-10-21 22:30   ` Alex Williamson
2021-10-22  2:14     ` Peter Xu
2021-10-26 15:11       ` Igor Mammedov
2021-10-26 15:38         ` Alex Williamson
2021-10-27  8:30         ` Peter Xu
2021-10-28  2:30           ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).