All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V7 00/11] PCI devices passthrough on Arm, part 3
@ 2022-07-19 17:42 Oleksandr Tyshchenko
  2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
                   ` (11 more replies)
  0 siblings, 12 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	Oleksandr Andrushchenko, Rahul Singh

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hi, all!

You can find previous discussion at [1].

1. This patch series is focusing on vPCI and adds support for non-identity
PCI BAR mappings which is required while passing through a PCI device to
a guest. The highlights are:

- Add relevant vpci register handlers when assigning PCI device to a domain
  and remove those when de-assigning. This allows having different
  handlers for different domains, e.g. hwdom and other guests.

- Emulate guest BAR register values based on physical BAR values.
  This allows creating a guest view of the registers and emulates
  size and properties probe as it is done during PCI device enumeration by
  the guest.

- Instead of handling a single range set, that contains all the memory
  regions of all the BARs and ROM, have them per BAR.

- Take into account guest's BAR view and program its p2m accordingly:
  gfn is guest's view of the BAR and mfn is the physical BAR value as set
  up by the host bridge in the hardware domain.
  This way hardware domain sees physical BAR values and guest sees
  emulated ones.

2. The series also adds support for virtual PCI bus topology for guests:
 - We emulate a single host bridge for the guest, so segment is always 0.
 - The implementation is limited to 32 devices which are allowed on
   a single PCI bus.
 - The virtual bus number is set to 0, so virtual devices are seen
   as embedded endpoints behind the root complex.

3. The series has been updated due to the new PCI(vPCI) locking scheme implemented
in the prereq series which is also on the review now [2].

4. For unprivileged guests vpci_{read|write} has been re-worked
to not passthrough accesses to the registers not explicitly handled
by the corresponding vPCI handlers: without that passthrough
to guests is completely unsafe as Xen allows them full access to
the registers. During development this can be reverted for debugging purposes.

!!! OT: please note, Oleksandr Andrushchenko who is the author of all this stuff
has managed to address allmost all review comments given for v6 and pushed the updated
version to the github (23.02.22). 
So after receiving his agreement I just picked it up and did the following before
pushing V7:
- rebased on the recent staging (resolving a few conflicts)
- updated according to the recent changes (added cf_check specifiers where appropriate, etc)
and performed minor adjustments
- made sure that both current and prereq series [2] didn't really break x86 by testing
PVH Dom0 (vPCI) and PV Dom0 + HVM DomU (PCI passthrough to DomU) using Qemu
- my colleague Volodymyr Babchuk (who was involved in the prereq series) rechecked that
both series worked on Arm using real HW

You can also find the series at [3].

[1] https://lore.kernel.org/xen-devel/20220204063459.680961-1-andr2000@gmail.com/
[2] https://lore.kernel.org/xen-devel/20220718211521.664729-1-volodymyr_babchuk@epam.com/
[3] https://github.com/otyshchenko1/xen/commits/vpci7

Oleksandr Andrushchenko (11):
  xen/pci: arm: add stub for is_memory_hole
  vpci: add hooks for PCI device assign/de-assign
  vpci/header: implement guest BAR register handlers
  rangeset: add RANGESETF_no_print flag
  vpci/header: handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: emulate PCI_COMMAND register for guests
  vpci/header: reset the command register when adding devices
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X

 xen/arch/arm/mm.c             |   6 +
 xen/arch/arm/vpci.c           |  31 ++-
 xen/common/rangeset.c         |   5 +-
 xen/drivers/Kconfig           |   4 +
 xen/drivers/passthrough/pci.c |  11 +
 xen/drivers/vpci/header.c     | 458 ++++++++++++++++++++++++++--------
 xen/drivers/vpci/msi.c        |   4 +
 xen/drivers/vpci/msix.c       |   4 +
 xen/drivers/vpci/vpci.c       | 130 ++++++++++
 xen/include/xen/rangeset.h    |   5 +-
 xen/include/xen/sched.h       |   8 +
 xen/include/xen/vpci.h        |  42 +++-
 12 files changed, 604 insertions(+), 104 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-29 16:28   ` Oleksandr
  2022-07-19 17:42 ` [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Tyshchenko
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add a stub for is_memory_hole which is required for PCI passthrough
on Arm.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
OT: It looks like the discussion got stuck. As I understand this
patch is not immediately needed in the context of current series
as PCI passthrough is not enabled on Arm at the moment. So the patch
could be added later on, but it is needed to allow PCI passthrough
to be built on Arm for those who want to test it.

Copy here some context provided by Julien:

Here a summary of the discussion (+ some my follow-up thoughts):

is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
"xen/pci: detect when BARs are not suitably positioned") to check
whether the BAR are positioned outside of a valid memory range. This was
introduced to work-around quirky firmware.

In theory, this could also happen on Arm. In practice, this may not
happen but it sounds better to sanity check that the BAR contains
"valid" I/O range.

On x86, this is implemented by checking the region is not described is
in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
ranges. So I think it would be possible to implement is_memory_hole() by
going through the list of hostbridges and check the ranges.

But first, I'd like to confirm my understanding with Rahul, and others.

If we were going to go this route, I would also rename the function to
be better match what it is doing (i.e. it checks the BAR is correctly
placed). As a potentially optimization/hardening for Arm, we could pass
the hostbridge so we don't have to walk all of them.
---
 xen/arch/arm/mm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 009b8cd9ef..bb34b97eb5 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1708,6 +1708,12 @@ unsigned long get_upper_mfn_bound(void)
     return max_page - 1;
 }
 
+bool is_memory_hole(mfn_t start, mfn_t end)
+{
+    /* TODO: this needs to be properly implemented. */
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
  2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-27 10:03   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 03/11] vpci/header: implement guest BAR register handlers Oleksandr Tyshchenko
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Paul Durrant, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

When a PCI device gets assigned/de-assigned some work on vPCI side needs
to be done for that device. Introduce a pair of hooks so vPCI can handle
that.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- do not pass struct domain to vpci_{assign|deassign}_device as
  pdev->domain can be used
- do not leave the device assigned (pdev->domain == new domain) in case
  vpci_assign_device fails: try to de-assign and if this also fails, then
  crash the domain
- re-work according to the new locking scheme (ASSERTs)
- OT: rebased
Since v5:
- do not split code into run_vpci_init
- do not check for is_system_domain in vpci_{de}assign_device
- do not use vpci_remove_device_handlers_locked and re-allocate
  pdev->vpci completely
- make vpci_deassign_device void
Since v4:
 - de-assign vPCI from the previous domain on device assignment
 - do not remove handlers in vpci_assign_device as those must not
   exist at that point
Since v3:
 - remove toolstack roll-back description from the commit message
   as error are to be handled with proper cleanup in Xen itself
 - remove __must_check
 - remove redundant rc check while assigning devices
 - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
 - use REGISTER_VPCI_INIT machinery to run required steps on device
   init/assign: add run_vpci_init helper
Since v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/Kconfig           |  4 ++++
 xen/drivers/passthrough/pci.c | 11 +++++++++++
 xen/drivers/vpci/vpci.c       | 31 +++++++++++++++++++++++++++++++
 xen/include/xen/vpci.h        | 15 +++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47..780490cf8e 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
 	bool
 
+config HAS_VPCI_GUEST_SUPPORT
+	bool
+	depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index f93922acc8..56af1dbd97 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1019,6 +1019,8 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
     if ( ret )
         goto out;
 
+    vpci_deassign_device(pdev);
+
     if ( pdev->domain == hardware_domain  )
         pdev->quarantine = false;
 
@@ -1558,6 +1560,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     struct pci_dev *pdev;
+    uint8_t old_devfn;
     int rc = 0;
 
     if ( !is_iommu_enabled(d) )
@@ -1577,6 +1580,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     if ( pdev->broken && d != hardware_domain && d != dom_io )
         goto done;
 
+    vpci_deassign_device(pdev);
+
     rc = pdev_msix_assign(d, pdev);
     if ( rc )
         goto done;
@@ -1594,6 +1599,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
                           pci_to_dev(pdev), flag)) )
         goto done;
 
+    old_devfn = devfn;
+
     for ( ; pdev->phantom_stride; rc = 0 )
     {
         devfn += pdev->phantom_stride;
@@ -1603,6 +1610,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
                         pci_to_dev(pdev), flag);
     }
 
+    rc = vpci_assign_device(pdev);
+    if ( rc && deassign_device(d, seg, bus, old_devfn) )
+        domain_crash(d);
+
  done:
     if ( rc )
         printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 674c9b347d..d187901422 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -92,6 +92,37 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
     return rc;
 }
+
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned to guest. */
+int vpci_assign_device(struct pci_dev *pdev)
+{
+    int rc;
+
+    ASSERT(pcidevs_write_locked());
+
+    if ( !has_vpci(pdev->domain) )
+        return 0;
+
+    rc = vpci_add_handlers(pdev);
+    if ( rc )
+        vpci_deassign_device(pdev);
+
+    return rc;
+}
+
+/* Notify vPCI that device is de-assigned from guest. */
+void vpci_deassign_device(struct pci_dev *pdev)
+{
+    ASSERT(pcidevs_write_locked());
+
+    if ( !has_vpci(pdev->domain) )
+        return;
+
+    vpci_remove_device(pdev);
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 7ab39839ff..e5501b9207 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -254,6 +254,21 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
 }
 #endif
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned/de-assigned to/from guest. */
+int vpci_assign_device(struct pci_dev *pdev);
+void vpci_deassign_device(struct pci_dev *pdev);
+#else
+static inline int vpci_assign_device(struct pci_dev *pdev)
+{
+    return 0;
+};
+
+static inline void vpci_deassign_device(struct pci_dev *pdev)
+{
+};
+#endif
+
 #endif
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 03/11] vpci/header: implement guest BAR register handlers
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
  2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
  2022-07-19 17:42 ` [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-27 10:15   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag Oleksandr Tyshchenko
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Andrushchenko, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.

All empty, IO and ROM BARs for guests are emulated by returning 0 on
reads and ignoring writes: this BARs are special with this respect as
their lower bits have special meaning, so returning default ~0 on read
may confuse guest OS.

Memory decoding is initially disabled when used by guests in order to
prevent the BAR being placed on top of a RAM region.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- unify the writing of the PCI_COMMAND register on the
  error path into a label
- do not introduce bar_ignore_access helper and open code
- s/guest_bar_ignore_read/empty_bar_read
- update error message in guest_bar_write
- only setup empty_bar_read for IO if !x86
- OT: rebased
- OT: add cf_check specifier to guest_bar_(write)read() and empty_bar_read()
Since v5:
- make sure that the guest set address has the same page offset
  as the physical address on the host
- remove guest_rom_{read|write} as those just implement the default
  behaviour of the registers not being handled
- adjusted comment for struct vpci.addr field
- add guest handlers for BARs which are not handled and will otherwise
  return ~0 on read and ignore writes. The BARs are special with this
  respect as their lower bits have special meaning, so returning ~0
  doesn't seem to be right
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code3. simplify
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 151 +++++++++++++++++++++++++++++++-------
 xen/include/xen/vpci.h    |   3 +
 2 files changed, 126 insertions(+), 28 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index e0461b1139..9fbbdc3500 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -412,6 +412,71 @@ static void cf_check bar_write(
     pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void cf_check guest_bar_write(
+    const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
+{
+    struct vpci_bar *bar = data;
+    bool hi = false;
+    uint64_t guest_reg = bar->guest_reg;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+    else
+    {
+        val &= PCI_BASE_ADDRESS_MEM_MASK;
+        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
+                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
+        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+    }
+
+    guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
+    guest_reg |= (uint64_t)val << (hi ? 32 : 0);
+
+    guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
+
+    /*
+     * Make sure that the guest set address has the same page offset
+     * as the physical address on the host or otherwise things won't work as
+     * expected.
+     */
+    if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
+         (bar->addr & ~PAGE_MASK) )
+    {
+        gprintk(XENLOG_WARNING,
+                "%pp: ignored BAR %zu write attempting to change page offset\n",
+                &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
+        return;
+    }
+
+    bar->guest_reg = guest_reg;
+}
+
+static uint32_t cf_check guest_bar_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    const struct vpci_bar *bar = data;
+    bool hi = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+
+    return bar->guest_reg >> (hi ? 32 : 0);
+}
+
+static uint32_t cf_check empty_bar_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    return 0;
+}
+
 static void cf_check rom_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -468,6 +533,7 @@ static int cf_check init_bars(struct pci_dev *pdev)
     struct vpci_header *header;
     struct vpci_bar *bars;
     int rc;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
     ASSERT(pcidevs_write_locked());
 
@@ -512,13 +578,12 @@ static int cf_check init_bars(struct pci_dev *pdev)
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
             bars[i].type = VPCI_BAR_MEM64_HI;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
+            rc = vpci_add_register(pdev->vpci,
+                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                                   is_hwdom ? bar_write : guest_bar_write,
+                                   reg, 4, &bars[i]);
             if ( rc )
-            {
-                pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-                return rc;
-            }
+                goto fail;
 
             continue;
         }
@@ -527,6 +592,17 @@ static int cf_check init_bars(struct pci_dev *pdev)
         if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
         {
             bars[i].type = VPCI_BAR_IO;
+
+#ifndef CONFIG_X86
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                                       reg, 4, &bars[i]);
+                if ( rc )
+                    goto fail;
+            }
+#endif
+
             continue;
         }
         if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
@@ -538,14 +614,20 @@ static int cf_check init_bars(struct pci_dev *pdev)
         rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
                               (i == num_bars - 1) ? PCI_BAR_LAST : 0);
         if ( rc < 0 )
-        {
-            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-            return rc;
-        }
+            goto fail;
 
         if ( size == 0 )
         {
             bars[i].type = VPCI_BAR_EMPTY;
+
+            if ( !is_hwdom )
+            {
+                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                                       reg, 4, &bars[i]);
+                if ( rc )
+                    goto fail;
+            }
+
             continue;
         }
 
@@ -553,34 +635,47 @@ static int cf_check init_bars(struct pci_dev *pdev)
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
-                               &bars[i]);
+        rc = vpci_add_register(pdev->vpci,
+                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                               is_hwdom ? bar_write : guest_bar_write,
+                               reg, 4, &bars[i]);
         if ( rc )
-        {
-            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-            return rc;
-        }
+            goto fail;
     }
 
-    /* Check expansion ROM. */
-    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
-    if ( rc > 0 && size )
+    /* Check expansion ROM: we do not handle ROM for guests. */
+    if ( is_hwdom )
     {
-        struct vpci_bar *rom = &header->bars[num_bars];
+        rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
+        if ( rc > 0 && size )
+        {
+            struct vpci_bar *rom = &header->bars[num_bars];
 
-        rom->type = VPCI_BAR_ROM;
-        rom->size = size;
-        rom->addr = addr;
-        header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
-                              PCI_ROM_ADDRESS_ENABLE;
+            rom->type = VPCI_BAR_ROM;
+            rom->size = size;
+            rom->addr = addr;
+            header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
+                                  PCI_ROM_ADDRESS_ENABLE;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
-                               4, rom);
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
+                                   rom_reg, 4, rom);
+            if ( rc )
+                rom->type = VPCI_BAR_EMPTY;
+        }
+    }
+    else
+    {
+        rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
+                               rom_reg, 4, &header->bars[num_bars]);
         if ( rc )
-            rom->type = VPCI_BAR_EMPTY;
+            goto fail;
     }
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
+
+ fail:
+    pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    return rc;
 }
 REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e5501b9207..6e1d3b93cd 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -67,7 +67,10 @@ struct vpci {
     struct vpci_header {
         /* Information about the PCI BARs of this device. */
         struct vpci_bar {
+            /* Physical (host) address. */
             uint64_t addr;
+            /* Guest view of the BAR: address and lower bits. */
+            uint64_t guest_reg;
             uint64_t size;
             enum {
                 VPCI_BAR_EMPTY,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 03/11] vpci/header: implement guest BAR register handlers Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-26 14:48   ` Rahul Singh
  2022-07-19 17:42 ` [PATCH V7 05/11] vpci/header: handle p2m range sets per BAR Oleksandr Tyshchenko
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it also simplify the definition of the flags by directly
defining those without helpers.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Since v5:
- comment indentation (Jan)
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c      | 5 ++++-
 xen/include/xen/rangeset.h | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index a6ef264046..f8b909d016 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
     INIT_LIST_HEAD(&r->range_list);
     r->nr_ranges = -1;
 
-    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
     r->flags = flags;
 
     safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
     list_for_each_entry ( r, &d->rangesets, rangeset_list )
     {
+        if ( r->flags & RANGESETF_no_print )
+            continue;
+
         printk("    ");
         rangeset_printk(r);
         printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 135f33f606..f7c69394d6 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -49,8 +49,9 @@ void rangeset_limit(
 
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+#define RANGESETF_prettyprint_hex   (1U << 0)
+ /* Do not print entries marked with this flag. */
+#define RANGESETF_no_print          (1U << 1)
 
 bool_t __must_check rangeset_is_empty(
     const struct rangeset *r);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 05/11] vpci/header: handle p2m range sets per BAR
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-19 17:42 ` [PATCH V7 06/11] vpci/header: program p2m with guest BAR view Oleksandr Tyshchenko
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Andrushchenko, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Instead of handling a single range set, that contains all the memory
regions of all the BARs and ROM, have them per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed so make them named and accounted.

Note that rangesets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1). But rangeset per BAR
was chosen for the ease of implementation and existing code re-usability.

This is in preparation of making non-identity mappings in p2m for the MMIOs.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- update according to the new locking scheme
- remove odd fail label in modify_bars
- OT: rebased
Since v5:
- fix comments
- move rangeset allocation to init_bars and only allocate
  for MAPPABLE BARs
- check for overlap with the already setup BAR ranges
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending accordingly to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 241 +++++++++++++++++++++++++++-----------
 xen/drivers/vpci/vpci.c   |   5 +
 xen/include/xen/vpci.h    |   3 +-
 3 files changed, 182 insertions(+), 67 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 9fbbdc3500..f14ff11882 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,64 +131,106 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    struct pci_dev *pdev = v->vpci.pdev;
+
+    if ( !pdev )
+        return false;
+
+    pcidevs_read_lock();
+
+    if ( v->vpci.map_pending )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
-
-        if ( rc == -ERESTART )
-            return true;
-
-        pcidevs_read_lock();
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
-        pcidevs_read_unlock();
-
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
-        if ( rc )
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
+
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
         {
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            pcidevs_write_lock();
-            vpci_remove_device(v->vpci.pdev);
-            pcidevs_write_unlock();
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+            {
+                pcidevs_read_unlock();
+                return true;
+            }
+
+            spin_lock(&pdev->vpci->lock);
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY :
+                                       v->vpci.cmd, !rc && v->vpci.rom_only);
+            spin_unlock(&pdev->vpci->lock);
+
+            if ( rc )
+            {
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this
+                 * is safe for Dom0, for DomUs the domain needs to be killed in
+                 * order to avoid leaking stale p2m mappings on failure.
+                 */
+                v->vpci.map_pending = false;
+                pcidevs_read_unlock();
+
+                if ( is_hardware_domain(v->domain) )
+                {
+                    pcidevs_write_lock();
+                    vpci_remove_device(v->vpci.pdev);
+                    pcidevs_write_unlock();
+                }
+                else
+                    domain_crash(v->domain);
+
+                return false;
+            }
         }
+
+        v->vpci.map_pending = false;
     }
 
+    pcidevs_read_unlock();
+
     return false;
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
+    ASSERT(pcidevs_write_locked());
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        /*
-         * It's safe to drop and re-acquire the lock in this context
-         * without risking pdev disappearing because devices cannot be
-         * removed until the initial domain has been started.
-         */
-        pcidevs_write_unlock();
-        process_pending_softirqs();
-        pcidevs_write_lock();
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
+
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+        {
+            /*
+             * It's safe to drop and reacquire the lock in this context
+             * without risking pdev disappearing because devices cannot be
+             * removed until the initial domain has been started.
+             */
+            pcidevs_write_unlock();
+            process_pending_softirqs();
+            pcidevs_write_lock();
+        }
     }
-    rangeset_destroy(mem);
     if ( !rc )
         modify_decoding(pdev, cmd, false);
 
@@ -196,7 +238,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only)
 {
     struct vcpu *curr = current;
 
@@ -207,7 +249,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
+    curr->vpci.map_pending = true;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
     /*
@@ -221,43 +263,61 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
 static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    bool map_pending;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
-     * and compare it against all the currently active BAR memory regions. If
-     * an overlap is found, subtract it from the region to be mapped/unmapped.
+     * Create a rangeset per BAR that represents the current device memory
+     * region and compare it against all the currently active BAR memory
+     * regions. If an overlap is found, subtract it from the region to be
+     * mapped/unmapped.
      *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        if ( !bar->mem )
+            continue;
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
         {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
             return rc;
         }
+
+        /* Check for overlap with the already setup BAR ranges. */
+        for ( j = 0; j < i; j++ )
+        {
+            struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove overlapping range [%lx, %lx]: %d\n",
+                       start, end, rc);
+                return rc;
+            }
+        }
     }
 
     /* Remove any MSIX regions if present. */
@@ -267,14 +327,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                return rc;
+            }
         }
     }
 
@@ -306,7 +373,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -315,12 +383,11 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
                 return rc;
             }
         }
@@ -339,10 +406,23 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Find out how many memory ranges has left after MSI and overlaps. */
+    map_pending = false;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        if ( !rangeset_is_empty(header->bars[i].mem) )
+        {
+            map_pending = true;
+            break;
+        }
+
+    /* If there's no mapping work write the command register now. */
+    if ( !map_pending )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only);
 
     return 0;
 }
@@ -525,6 +605,19 @@ static void cf_check rom_write(
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static int bar_add_rangeset(struct pci_dev *pdev, struct vpci_bar *bar, int i)
+{
+    char str[32];
+
+    snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
+
+    bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
+    if ( !bar->mem )
+        return -ENOMEM;
+
+    return 0;
+}
+
 static int cf_check init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -611,6 +704,13 @@ static int cf_check init_bars(struct pci_dev *pdev)
         else
             bars[i].type = VPCI_BAR_MEM32;
 
+        rc = bar_add_rangeset(pdev, &bars[i], i);
+        if ( rc )
+        {
+            bars[i].type = VPCI_BAR_EMPTY;
+            goto fail;
+        }
+
         rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
                               (i == num_bars - 1) ? PCI_BAR_LAST : 0);
         if ( rc < 0 )
@@ -661,6 +761,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
                                    rom_reg, 4, rom);
             if ( rc )
                 rom->type = VPCI_BAR_EMPTY;
+            else
+            {
+                rc = bar_add_rangeset(pdev, rom, i);
+                if ( rc )
+                {
+                    rom->type = VPCI_BAR_EMPTY;
+                    goto fail;
+                }
+            }
         }
     }
     else
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d187901422..f683346285 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -38,6 +38,8 @@ extern vpci_register_init_t *const __end_vpci_array[];
 
 void vpci_remove_device(struct pci_dev *pdev)
 {
+    unsigned int i;
+
     ASSERT(pcidevs_write_locked());
 
     if ( !has_vpci(pdev->domain) || !pdev->vpci )
@@ -54,6 +56,9 @@ void vpci_remove_device(struct pci_dev *pdev)
         xfree(r);
     }
     spin_unlock(&pdev->vpci->lock);
+
+    for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
+        rangeset_destroy(pdev->vpci->header.bars[i].mem);
     if ( pdev->vpci->msix && pdev->vpci->msix->pba )
         iounmap(pdev->vpci->msix->pba);
     xfree(pdev->vpci->msix);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e1d3b93cd..6332659c4a 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -72,6 +72,7 @@ struct vpci {
             /* Guest view of the BAR: address and lower bits. */
             uint64_t guest_reg;
             uint64_t size;
+            struct rangeset *mem;
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -146,9 +147,9 @@ struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    bool map_pending : 1;
     bool rom_only : 1;
 };
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 06/11] vpci/header: program p2m with guest BAR view
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 05/11] vpci/header: handle p2m range sets per BAR Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-27 10:19   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Andrushchenko, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Take into account guest's BAR view and program its p2m accordingly:
gfn is guest's view of the BAR and mfn is the physical BAR value as set
up by the PCI bus driver in the hardware domain.
This way hardware domain sees physical BAR values and guest sees
emulated ones.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v5:
- remove debug print in map_range callback
- remove "identity" from the debug print
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f14ff11882..4e6547a54d 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -30,6 +30,7 @@
 
 struct map_data {
     struct domain *d;
+    const struct vpci_bar *bar;
     bool map;
 };
 
@@ -41,8 +42,21 @@ static int cf_check map_range(
 
     for ( ; ; )
     {
+        /* Start address of the BAR as seen by the guest. */
+        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
+                                        ? map->bar->addr
+                                        : map->bar->guest_reg));
+        /* Physical start address of the BAR. */
+        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
         unsigned long size = e - s + 1;
 
+        /*
+         * Ranges to be mapped don't always start at the BAR start address, as
+         * there can be holes or partially consumed ranges. Account for the
+         * offset of the current address from the BAR start.
+         */
+        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
+
         /*
          * ARM TODOs:
          * - On ARM whether the memory is prefetchable or not should be passed
@@ -52,8 +66,8 @@ static int cf_check map_range(
          * - {un}map_mmio_regions doesn't support preemption.
          */
 
-        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+        rc = map->map ? map_mmio_regions(map->d, start_gfn, size, _mfn(s))
+                      : unmap_mmio_regions(map->d, start_gfn, size, _mfn(s));
         if ( rc == 0 )
         {
             *c += size;
@@ -62,8 +76,8 @@ static int cf_check map_range(
         if ( rc < 0 )
         {
             printk(XENLOG_G_WARNING
-                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+                   "Failed to %smap [%lx, %lx] for %pd: %d\n",
+                   map->map ? "" : "un", s, e, map->d, rc);
             break;
         }
         ASSERT(rc < size);
@@ -155,6 +169,7 @@ bool vpci_process_pending(struct vcpu *v)
             if ( rangeset_is_empty(bar->mem) )
                 continue;
 
+            data.bar = bar;
             rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
             if ( rc == -ERESTART )
@@ -218,6 +233,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
         if ( rangeset_is_empty(bar->mem) )
             continue;
 
+        data.bar = bar;
         while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
                                               &data)) == -ERESTART )
         {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 06/11] vpci/header: program p2m with guest BAR view Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-26 15:30   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Andrushchenko, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's view of this will want to be zero initially, the host having set
it to 1 may not easily be overwritten with 0, or else we'd effectively
imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

There are examples of emulators [1], [2] which already deal with PCI_COMMAND
register emulation and it seems that at most they care about is the only INTx
bit (besides IO/memory enable and bus master which are write through).
It could be because in order to properly emulate the PCI_COMMAND register
we need to know about the whole PCI topology, e.g. if any setting in device's
command register is aligned with the upstream port etc.
This makes me think that because of this complexity others just ignore that.
Neither I think this can easily be done in Xen case.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the guest's view
of the command register.

For now our emulation only makes sure INTx is set according to the host
requirements, i.e. depending on MSI/MSI-X enabled state.

This implementation and the decision to only emulate INTx bit for now
is based on the previous discussion at [3].

[1] https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_config_init.c#L310
[2] https://github.com/projectacrn/acrn-hypervisor/blob/master/hypervisor/hw/pci.c#L336
[3] https://patchwork.kernel.org/project/xen-devel/patch/20210903100831.177748-9-andr2000@gmail.com/

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
- OT: rebased
- OT: add cf_check specifier to cmd_read()
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 38 +++++++++++++++++++++++++++++++++++++-
 xen/drivers/vpci/msi.c    |  4 ++++
 xen/drivers/vpci/msix.c   |  4 ++++
 xen/include/xen/vpci.h    |  3 +++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 4e6547a54d..2ce69a63a2 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -443,11 +443,27 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
     return 0;
 }
 
+/* TODO: Add proper emulation for all bits of the command register. */
 static void cf_check cmd_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
 {
     uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        struct vpci_header *header = data;
+
+        header->guest_cmd = cmd;
+#ifdef CONFIG_HAS_PCI_MSI
+        if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled )
+            /*
+             * Guest wants to enable INTx, but it can't be enabled
+             * if MSI/MSI-X enabled.
+             */
+            cmd |= PCI_COMMAND_INTX_DISABLE;
+#endif
+    }
+
     /*
      * Let Dom0 play with all the bits directly except for the memory
      * decoding one.
@@ -464,6 +480,19 @@ static void cf_check cmd_write(
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static uint32_t cf_check cmd_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        struct vpci_header *header = data;
+
+        return header->guest_cmd;
+    }
+
+    return pci_conf_read16(pdev->sbdf, reg);
+}
+
 static void cf_check bar_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -665,8 +694,15 @@ static int cf_check init_bars(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
+    /*
+     * According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
+     * Device Control" the reset state of the command register is
+     * typically all 0's, so this is used as initial value for the guests.
+     */
+    ASSERT(header->guest_cmd == 0);
+
     /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
+    rc = vpci_add_register(pdev->vpci, cmd_read, cmd_write, PCI_COMMAND,
                            2, header);
     if ( rc )
         return rc;
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index d864f740cf..c8c495e2d7 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -70,6 +70,10 @@ static void cf_check control_write(
 
         if ( vpci_msi_arch_enable(msi, pdev, vectors) )
             return;
+
+        /* Make sure guest doesn't enable INTx while enabling MSI. */
+        if ( !is_hardware_domain(pdev->domain) )
+            pci_intx(pdev, false);
     }
     else
         vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 35cc9280f4..06f84b8b02 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -92,6 +92,10 @@ static void cf_check control_write(
         for ( i = 0; i < msix->max_entries; i++ )
             if ( !msix->entries[i].masked && msix->entries[i].updated )
                 update_entry(&msix->entries[i], pdev, i);
+
+        /* Make sure guest doesn't enable INTx while enabling MSI-X. */
+        if ( !is_hardware_domain(pdev->domain) )
+            pci_intx(pdev, false);
     }
     else if ( !new_enabled && msix->enabled )
     {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6332659c4a..1010f68c28 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -87,6 +87,9 @@ struct vpci {
         } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
         /* At most 6 BARS + 1 expansion ROM BAR. */
 
+        /* Guest view of the PCI_COMMAND register. */
+        uint16_t guest_cmd;
+
         /*
          * Store whether the ROM enable bit is set (doesn't imply ROM BAR
          * is mapped into guest p2m) if there's a ROM BAR on the device.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (6 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-26 15:09   ` Rahul Singh
  2022-07-26 15:23   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Oleksandr Andrushchenko, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Reset the command register when assigning a PCI device to a guest:
according to the PCI spec the PCI_COMMAND register is typically all 0's
after reset, but this might not be true for the guest as it needs
to respect host's settings.
For that reason, do not write 0 to the PCI_COMMAND register directly,
but go through the corresponding emulation layer (cmd_write), which
will take care about the actual bits written.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- use cmd_write directly without introducing emulate_cmd_reg
- update commit message with more description on all 0's in PCI_COMMAND
Since v5:
- updated commit message
Since v1:
 - do not write 0 to the command register, but respect host settings.
---
 xen/drivers/vpci/header.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 2ce69a63a2..1be9775dda 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -701,6 +701,10 @@ static int cf_check init_bars(struct pci_dev *pdev)
      */
     ASSERT(header->guest_cmd == 0);
 
+    /* Reset the command register for guests. */
+    if ( !is_hwdom )
+        cmd_write(pdev, PCI_COMMAND, 0, header);
+
     /* Setup a handler for the command register. */
     rc = vpci_add_register(pdev->vpci, cmd_read, cmd_write, PCI_COMMAND,
                            2, header);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (7 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-27 10:32   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Wei Liu

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note, that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/vpci/vpci.c | 70 ++++++++++++++++++++++++++++++++++++++++-
 xen/include/xen/sched.h |  8 +++++
 xen/include/xen/vpci.h  | 11 +++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f683346285..d4601ecf9b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -84,6 +84,9 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
     INIT_LIST_HEAD(&pdev->vpci->handlers);
     spin_lock_init(&pdev->vpci->lock);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    pdev->vpci->guest_sbdf.sbdf = ~0;
+#endif
 
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
@@ -99,6 +102,62 @@ int vpci_add_handlers(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int add_virtual_device(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    pci_sbdf_t sbdf = { 0 };
+    unsigned long new_dev_number;
+
+    if ( is_hardware_domain(d) )
+        return 0;
+
+    ASSERT(pcidevs_write_locked());
+
+    /*
+     * Each PCI bus supports 32 devices/slots at max or up to 256 when
+     * there are multi-function ones which are not yet supported.
+     */
+    if ( pdev->info.is_extfn )
+    {
+        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+                 &pdev->sbdf);
+        return -EOPNOTSUPP;
+    }
+
+    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+                                         VPCI_MAX_VIRT_DEV);
+    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
+        return -ENOSPC;
+
+    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+    /*
+     * Both segment and bus number are 0:
+     *  - we emulate a single host bridge for the guest, e.g. segment 0
+     *  - with bus 0 the virtual devices are seen as embedded
+     *    endpoints behind the root complex
+     *
+     * TODO: add support for multi-function devices.
+     */
+    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
+    pdev->vpci->guest_sbdf = sbdf;
+
+    return 0;
+
+}
+
+static void vpci_remove_virtual_device(const struct pci_dev *pdev)
+{
+    ASSERT(pcidevs_write_locked());
+
+    if ( pdev->vpci )
+    {
+        __clear_bit(pdev->vpci->guest_sbdf.dev,
+                    &pdev->domain->vpci_dev_assigned_map);
+        pdev->vpci->guest_sbdf.sbdf = ~0;
+    }
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct pci_dev *pdev)
 {
@@ -111,8 +170,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     rc = vpci_add_handlers(pdev);
     if ( rc )
-        vpci_deassign_device(pdev);
+        goto fail;
+
+    rc = add_virtual_device(pdev);
+    if ( rc )
+        goto fail;
+
+    return 0;
 
+ fail:
+    vpci_deassign_device(pdev);
     return rc;
 }
 
@@ -124,6 +191,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
     if ( !has_vpci(pdev->domain) )
         return;
 
+    vpci_remove_virtual_device(pdev);
     vpci_remove_device(pdev);
 }
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index b9515eb497..a2848a5740 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -457,6 +457,14 @@ struct domain
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * The bitmap which shows which device numbers are already used by the
+     * virtual PCI bus topology and is used to assign a unique SBDF to the
+     * next passed through virtual PCI device.
+     */
+    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
+#endif
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 1010f68c28..cc14b0086d 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
+/*
+ * Maximum number of devices supported by the virtual bus topology:
+ * each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
+
 #define REGISTER_VPCI_INIT(x, p)                \
   static vpci_register_init_t *const x##_entry  \
                __used_section(".data.vpci." p) = x
@@ -145,6 +152,10 @@ struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /* Guest SBDF of the device. */
+    pci_sbdf_t guest_sbdf;
+#endif
 #endif
 };
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (8 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-26 15:16   ` Jan Beulich
  2022-07-19 17:42 ` [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Tyshchenko
  2022-07-26 13:47 ` [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Rahul Singh
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are three  originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c     | 17 +++++++++++++++++
 xen/drivers/vpci/vpci.c | 26 ++++++++++++++++++++++++++
 xen/include/xen/vpci.h  |  7 +++++++
 3 files changed, 50 insertions(+)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index a9fc5817f9..84b2b068a0 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;
 
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+    {
+        *r = ~0ul;
+        return 1;
+    }
+
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
     {
@@ -59,6 +69,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
     struct pci_host_bridge *bridge = p;
     pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
 
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;
+
     return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
                            1U << info->dabt.size, r);
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d4601ecf9b..fc2c51dc3e 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -158,6 +158,32 @@ static void vpci_remove_virtual_device(const struct pci_dev *pdev)
     }
 }
 
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf)
+{
+    struct pci_dev *pdev;
+
+    ASSERT(!is_hardware_domain(d));
+
+    pcidevs_read_lock();
+    for_each_pdev( d, pdev )
+    {
+        if ( pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf) )
+        {
+            /* Replace guest SBDF with the physical one. */
+            *sbdf = pdev->sbdf;
+            pcidevs_read_unlock();
+            return true;
+        }
+    }
+
+    pcidevs_read_unlock();
+    return false;
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct pci_dev *pdev)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index cc14b0086d..5749d8da78 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -276,6 +276,7 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
 /* Notify vPCI that device is assigned/de-assigned to/from guest. */
 int vpci_assign_device(struct pci_dev *pdev);
 void vpci_deassign_device(struct pci_dev *pdev);
+bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf);
 #else
 static inline int vpci_assign_device(struct pci_dev *pdev)
 {
@@ -285,6 +286,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 static inline void vpci_deassign_device(struct pci_dev *pdev)
 {
 };
+
+static inline bool vpci_translate_virtual_device(struct domain *d,
+                                                 pci_sbdf_t *sbdf)
+{
+    return false;
+}
 #endif
 
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (9 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Tyshchenko
@ 2022-07-19 17:42 ` Oleksandr Tyshchenko
  2022-07-26 14:50   ` Rahul Singh
  2022-07-26 13:47 ` [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Rahul Singh
  11 siblings, 1 reply; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-19 17:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Andrushchenko, Bertrand Marquis, Volodymyr Babchuk,
	Julien Grall, Julien Grall, Stefano Stabellini

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
MSI-X registers we need to explicitly tell that we have additional IO
handlers, so those are accounted.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
This actually moved here from the part 2 of the prep work for PCI
passthrough on Arm as it seems to be the proper place for it.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 84b2b068a0..c5902cb9d3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -131,6 +131,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+    unsigned int count;
+
     if ( !has_vpci(d) )
         return 0;
 
@@ -151,7 +153,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
      * For guests each host bridge requires one region to cover the
      * configuration space. At the moment, we only expose a single host bridge.
      */
-    return 1;
+    count = 1;
+
+    /*
+     * There's a single MSI-X MMIO handler that deals with both PBA
+     * and MSI-X tables per each PCI device being passed through.
+     * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+     */
+    if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+        count += VPCI_MAX_VIRT_DEV;
+
+    return count;
 }
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 00/11] PCI devices passthrough on Arm, part 3
  2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
                   ` (10 preceding siblings ...)
  2022-07-19 17:42 ` [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Tyshchenko
@ 2022-07-26 13:47 ` Rahul Singh
  2022-07-26 15:18   ` Oleksandr Tyshchenko
  11 siblings, 1 reply; 48+ messages in thread
From: Rahul Singh @ 2022-07-26 13:47 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	Oleksandr Andrushchenko

Hi Oleksandr,

> On 19 Jul 2022, at 6:42 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Hi, all!
> 
> You can find previous discussion at [1].
> 
> 1. This patch series is focusing on vPCI and adds support for non-identity
> PCI BAR mappings which is required while passing through a PCI device to
> a guest. The highlights are:
> 
> - Add relevant vpci register handlers when assigning PCI device to a domain
>  and remove those when de-assigning. This allows having different
>  handlers for different domains, e.g. hwdom and other guests.
> 
> - Emulate guest BAR register values based on physical BAR values.
>  This allows creating a guest view of the registers and emulates
>  size and properties probe as it is done during PCI device enumeration by
>  the guest.
> 
> - Instead of handling a single range set, that contains all the memory
>  regions of all the BARs and ROM, have them per BAR.
> 
> - Take into account guest's BAR view and program its p2m accordingly:
>  gfn is guest's view of the BAR and mfn is the physical BAR value as set
>  up by the host bridge in the hardware domain.
>  This way hardware domain sees physical BAR values and guest sees
>  emulated ones.
> 
> 2. The series also adds support for virtual PCI bus topology for guests:
> - We emulate a single host bridge for the guest, so segment is always 0.
> - The implementation is limited to 32 devices which are allowed on
>   a single PCI bus.
> - The virtual bus number is set to 0, so virtual devices are seen
>   as embedded endpoints behind the root complex.
> 
> 3. The series has been updated due to the new PCI(vPCI) locking scheme implemented
> in the prereq series which is also on the review now [2].
> 
> 4. For unprivileged guests vpci_{read|write} has been re-worked
> to not passthrough accesses to the registers not explicitly handled
> by the corresponding vPCI handlers: without that passthrough
> to guests is completely unsafe as Xen allows them full access to
> the registers. During development this can be reverted for debugging purposes.
> 
> !!! OT: please note, Oleksandr Andrushchenko who is the author of all this stuff
> has managed to address allmost all review comments given for v6 and pushed the updated
> version to the github (23.02.22). 
> So after receiving his agreement I just picked it up and did the following before
> pushing V7:
> - rebased on the recent staging (resolving a few conflicts)
> - updated according to the recent changes (added cf_check specifiers where appropriate, etc)
> and performed minor adjustments
> - made sure that both current and prereq series [2] didn't really break x86 by testing
> PVH Dom0 (vPCI) and PV Dom0 + HVM DomU (PCI passthrough to DomU) using Qemu
> - my colleague Volodymyr Babchuk (who was involved in the prereq series) rechecked that
> both series worked on Arm using real HW
> 
> You can also find the series at [3].
> 
> [1] https://lore.kernel.org/xen-devel/20220204063459.680961-1-andr2000@gmail.com/
> [2] https://lore.kernel.org/xen-devel/20220718211521.664729-1-volodymyr_babchuk@epam.com/
> [3] https://github.com/otyshchenko1/xen/commits/vpci7
> 

I tested the whole series on ARM N1SDP board everything works as expected.

So for the whole series:
Tested-by: Rahul Singh <rahul.singh@arm.com>

Regards,
Rahul


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag
  2022-07-19 17:42 ` [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag Oleksandr Tyshchenko
@ 2022-07-26 14:48   ` Rahul Singh
  0 siblings, 0 replies; 48+ messages in thread
From: Rahul Singh @ 2022-07-26 14:48 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

Hi Oleksandr,

> On 19 Jul 2022, at 6:42 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are range sets which should not be printed, so introduce a flag
> which allows marking those as such. Implement relevant logic to skip
> such entries while printing.
> 
> While at it also simplify the definition of the flags by directly
> defining those without helpers.
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Rahul Singh <rahul.singh@arm.com>
 
Regards,
Rahul



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X
  2022-07-19 17:42 ` [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Tyshchenko
@ 2022-07-26 14:50   ` Rahul Singh
  0 siblings, 0 replies; 48+ messages in thread
From: Rahul Singh @ 2022-07-26 14:50 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Andrushchenko, Bertrand Marquis,
	Volodymyr Babchuk, Julien Grall, Julien Grall,
	Stefano Stabellini

Hi Oleksandr,

> On 19 Jul 2022, at 6:42 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> At the moment, we always allocate an extra 16 slots for IO handlers
> (see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
> MSI-X registers we need to explicitly tell that we have additional IO
> handlers, so those are accounted.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Acked-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Rahul Singh <rahul.singh@arm.com>

Regards,
Rahul



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
@ 2022-07-26 15:09   ` Rahul Singh
  2022-07-26 15:23   ` Jan Beulich
  1 sibling, 0 replies; 48+ messages in thread
From: Rahul Singh @ 2022-07-26 15:09 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Andrushchenko, Roger Pau Monné

HI Oleksandr,

> On 19 Jul 2022, at 6:42 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
> 
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Reset the command register when assigning a PCI device to a guest:
> according to the PCI spec the PCI_COMMAND register is typically all 0's
> after reset, but this might not be true for the guest as it needs
> to respect host's settings.
> For that reason, do not write 0 to the PCI_COMMAND register directly,
> but go through the corresponding emulation layer (cmd_write), which
> will take care about the actual bits written.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> 

Reviewed-by: Rahul Singh <rahul.singh@arm.com>

Regards,
Rahul

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-19 17:42 ` [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Tyshchenko
@ 2022-07-26 15:16   ` Jan Beulich
  2022-07-27 17:54     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-26 15:16 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>      /* data is needed to prevent a pointer cast on 32bit */
>      unsigned long data;
>  
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
> +    {
> +        *r = ~0ul;
> +        return 1;
> +    }

I'm probably simply lacking specific Arm-side knowledge, but it strikes
me as odd that the need for translation would be dependent upon "bridge".

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -158,6 +158,32 @@ static void vpci_remove_virtual_device(const struct pci_dev *pdev)
>      }
>  }
>  
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf)
> +{
> +    struct pci_dev *pdev;

const wherever possible please (i.e. likely also for the first function
parameter).

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -276,6 +276,7 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
>  /* Notify vPCI that device is assigned/de-assigned to/from guest. */
>  int vpci_assign_device(struct pci_dev *pdev);
>  void vpci_deassign_device(struct pci_dev *pdev);
> +bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf);
>  #else
>  static inline int vpci_assign_device(struct pci_dev *pdev)
>  {
> @@ -285,6 +286,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>  static inline void vpci_deassign_device(struct pci_dev *pdev)
>  {
>  };
> +
> +static inline bool vpci_translate_virtual_device(struct domain *d,
> +                                                 pci_sbdf_t *sbdf)
> +{
> +    return false;
> +}

Please don't add stubs which aren't really needed (which, afaict, is the
case for this one).

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 00/11] PCI devices passthrough on Arm, part 3
  2022-07-26 13:47 ` [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Rahul Singh
@ 2022-07-26 15:18   ` Oleksandr Tyshchenko
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr Tyshchenko @ 2022-07-26 15:18 UTC (permalink / raw)
  To: Rahul Singh
  Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Paul Durrant, Roger Pau Monné,
	Oleksandr Andrushchenko, Oleksandr Tyshchenko


On 26.07.22 16:47, Rahul Singh wrote:
> Hi Oleksandr,


Hello Rahul



>
>> On 19 Jul 2022, at 6:42 pm, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Hi, all!
>>
>> You can find previous discussion at [1].
>>
>> 1. This patch series is focusing on vPCI and adds support for non-identity
>> PCI BAR mappings which is required while passing through a PCI device to
>> a guest. The highlights are:
>>
>> - Add relevant vpci register handlers when assigning PCI device to a domain
>>   and remove those when de-assigning. This allows having different
>>   handlers for different domains, e.g. hwdom and other guests.
>>
>> - Emulate guest BAR register values based on physical BAR values.
>>   This allows creating a guest view of the registers and emulates
>>   size and properties probe as it is done during PCI device enumeration by
>>   the guest.
>>
>> - Instead of handling a single range set, that contains all the memory
>>   regions of all the BARs and ROM, have them per BAR.
>>
>> - Take into account guest's BAR view and program its p2m accordingly:
>>   gfn is guest's view of the BAR and mfn is the physical BAR value as set
>>   up by the host bridge in the hardware domain.
>>   This way hardware domain sees physical BAR values and guest sees
>>   emulated ones.
>>
>> 2. The series also adds support for virtual PCI bus topology for guests:
>> - We emulate a single host bridge for the guest, so segment is always 0.
>> - The implementation is limited to 32 devices which are allowed on
>>    a single PCI bus.
>> - The virtual bus number is set to 0, so virtual devices are seen
>>    as embedded endpoints behind the root complex.
>>
>> 3. The series has been updated due to the new PCI(vPCI) locking scheme implemented
>> in the prereq series which is also on the review now [2].
>>
>> 4. For unprivileged guests vpci_{read|write} has been re-worked
>> to not passthrough accesses to the registers not explicitly handled
>> by the corresponding vPCI handlers: without that passthrough
>> to guests is completely unsafe as Xen allows them full access to
>> the registers. During development this can be reverted for debugging purposes.
>>
>> !!! OT: please note, Oleksandr Andrushchenko who is the author of all this stuff
>> has managed to address allmost all review comments given for v6 and pushed the updated
>> version to the github (23.02.22).
>> So after receiving his agreement I just picked it up and did the following before
>> pushing V7:
>> - rebased on the recent staging (resolving a few conflicts)
>> - updated according to the recent changes (added cf_check specifiers where appropriate, etc)
>> and performed minor adjustments
>> - made sure that both current and prereq series [2] didn't really break x86 by testing
>> PVH Dom0 (vPCI) and PV Dom0 + HVM DomU (PCI passthrough to DomU) using Qemu
>> - my colleague Volodymyr Babchuk (who was involved in the prereq series) rechecked that
>> both series worked on Arm using real HW
>>
>> You can also find the series at [3].
>>
>> [1] https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20220204063459.680961-1-andr2000@gmail.com/__;!!GF_29dbcQIUBPA!1P9LeytJC7d3tnSuQCjk7YqIqfZPpGlrc6ES1l1sUAPbfGbeYg2YM477xiUy0oTU9ys7qv9MHD6GNDWCeHHG_qsr-NY$ [lore[.]kernel[.]org]
>> [2] https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20220718211521.664729-1-volodymyr_babchuk@epam.com/__;!!GF_29dbcQIUBPA!1P9LeytJC7d3tnSuQCjk7YqIqfZPpGlrc6ES1l1sUAPbfGbeYg2YM477xiUy0oTU9ys7qv9MHD6GNDWCeHHGbScTNb4$ [lore[.]kernel[.]org]
>> [3] https://urldefense.com/v3/__https://github.com/otyshchenko1/xen/commits/vpci7__;!!GF_29dbcQIUBPA!1P9LeytJC7d3tnSuQCjk7YqIqfZPpGlrc6ES1l1sUAPbfGbeYg2YM477xiUy0oTU9ys7qv9MHD6GNDWCeHHGhpAmrcM$ [github[.]com]
>>
> I tested the whole series on ARM N1SDP board everything works as expected.


Sounds great!


>
> So for the whole series:
> Tested-by: Rahul Singh <rahul.singh@arm.com>


Thank you for testing!


>
> Regards,
> Rahul

-- 
Regards,

Oleksandr Tyshchenko

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
  2022-07-26 15:09   ` Rahul Singh
@ 2022-07-26 15:23   ` Jan Beulich
  2022-07-27  8:58     ` Oleksandr
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-26 15:23 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Reset the command register when assigning a PCI device to a guest:
> according to the PCI spec the PCI_COMMAND register is typically all 0's
> after reset, but this might not be true for the guest as it needs
> to respect host's settings.
> For that reason, do not write 0 to the PCI_COMMAND register directly,
> but go through the corresponding emulation layer (cmd_write), which
> will take care about the actual bits written.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v6:
> - use cmd_write directly without introducing emulate_cmd_reg
> - update commit message with more description on all 0's in PCI_COMMAND

I agree with the change, but it's imo enough that you also need to sign
off on the patch (and this likely also applies elsewhere in the series).

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests
  2022-07-19 17:42 ` [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Tyshchenko
@ 2022-07-26 15:30   ` Jan Beulich
  2022-07-27 17:30     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-26 15:30 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -443,11 +443,27 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>      return 0;
>  }
>  
> +/* TODO: Add proper emulation for all bits of the command register. */
>  static void cf_check cmd_write(
>      const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
>  {
>      uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>  
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        struct vpci_header *header = data;
> +
> +        header->guest_cmd = cmd;
> +#ifdef CONFIG_HAS_PCI_MSI
> +        if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled )
> +            /*
> +             * Guest wants to enable INTx, but it can't be enabled
> +             * if MSI/MSI-X enabled.
> +             */
> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> +#endif
> +    }
> +
>      /*
>       * Let Dom0 play with all the bits directly except for the memory
>       * decoding one.
> @@ -464,6 +480,19 @@ static void cf_check cmd_write(
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static uint32_t cf_check cmd_read(
> +    const struct pci_dev *pdev, unsigned int reg, void *data)
> +{
> +    if ( !is_hardware_domain(pdev->domain) )
> +    {
> +        struct vpci_header *header = data;
> +
> +        return header->guest_cmd;
> +    }
> +
> +    return pci_conf_read16(pdev->sbdf, reg);
> +}

This function wants the same leading comment as cmd_write(). I also
think you better wouldn't give the guest the impression that r/o bits
can actually be written to (but getting this right may well fall
under the TODO).

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-26 15:23   ` Jan Beulich
@ 2022-07-27  8:58     ` Oleksandr
  2022-07-27  9:46       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27  8:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 26.07.22 18:23, Jan Beulich wrote:

Hello Jan

> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Reset the command register when assigning a PCI device to a guest:
>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>> after reset, but this might not be true for the guest as it needs
>> to respect host's settings.
>> For that reason, do not write 0 to the PCI_COMMAND register directly,
>> but go through the corresponding emulation layer (cmd_write), which
>> will take care about the actual bits written.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v6:
>> - use cmd_write directly without introducing emulate_cmd_reg
>> - update commit message with more description on all 0's in PCI_COMMAND
> I agree with the change,

thanks, may I please ask can this be converted to some tag?


>   but it's imo enough that you also need to sign
> off on the patch (and this likely also applies elsewhere in the series).


I got your point. Regarding the current patch, I didn't make any changes 
to it. As I mentioned in the cover letter [1] after "!!! OT: please note,"

Oleksandr Andrushchenko has addressed almost all review comments given 
for v6 by himself. For the patches which I had to touch I added "OT:" to 
the change log. For example, like here [2].

I will consider signing off these patches.



[1] 
https://lore.kernel.org/xen-devel/20220719174253.541965-1-olekstysh@gmail.com/

[2] 
https://lore.kernel.org/xen-devel/20220719174253.541965-8-olekstysh@gmail.com/


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-27  8:58     ` Oleksandr
@ 2022-07-27  9:46       ` Jan Beulich
  2022-07-27 16:53         ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27  9:46 UTC (permalink / raw)
  To: Oleksandr; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 27.07.2022 10:58, Oleksandr wrote:
> On 26.07.22 18:23, Jan Beulich wrote:
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Reset the command register when assigning a PCI device to a guest:
>>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>>> after reset, but this might not be true for the guest as it needs
>>> to respect host's settings.
>>> For that reason, do not write 0 to the PCI_COMMAND register directly,
>>> but go through the corresponding emulation layer (cmd_write), which
>>> will take care about the actual bits written.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ---
>>> Since v6:
>>> - use cmd_write directly without introducing emulate_cmd_reg
>>> - update commit message with more description on all 0's in PCI_COMMAND
>> I agree with the change,
> 
> thanks, may I please ask can this be converted to some tag?

I could offer a R-b, but you've got one from Rahul already, so mine
won't buy you anything further. What you will need is a maintainer
ack, which imo isn't a priority since this is only patch 8 in a
series which itself depends on a further series likely continuing
to be controversial (didn't get there yet).

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign
  2022-07-19 17:42 ` [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Tyshchenko
@ 2022-07-27 10:03   ` Jan Beulich
  2022-07-27 14:01     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27 10:03 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> @@ -1558,6 +1560,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>  {
>      const struct domain_iommu *hd = dom_iommu(d);
>      struct pci_dev *pdev;
> +    uint8_t old_devfn;

Why "old"? There's nothing "new" here. Perhaps "orig", considering ...

> @@ -1594,6 +1599,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>                            pci_to_dev(pdev), flag)) )
>          goto done;
>  
> +    old_devfn = devfn;
> +
>      for ( ; pdev->phantom_stride; rc = 0 )
>      {
>          devfn += pdev->phantom_stride;

... this updating of devfn is what you mean to deal with? Then again
I see no need for a separate variable in the first place. The input
(seg,bus,devfn) tuple is translated to a pdev, so you can simply ...

> @@ -1603,6 +1610,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>                          pci_to_dev(pdev), flag);
>      }
>  
> +    rc = vpci_assign_device(pdev);
> +    if ( rc && deassign_device(d, seg, bus, old_devfn) )

... use pdev->devfn here.

> +        domain_crash(d);
> +
>   done:
>      if ( rc )
>          printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",

This log message will want to appear _before_ the domain_crash()
related output, or you will want to add another log message there.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -92,6 +92,37 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct pci_dev *pdev)
> +{
> +    int rc;
> +
> +    ASSERT(pcidevs_write_locked());
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return 0;
> +
> +    rc = vpci_add_handlers(pdev);
> +    if ( rc )
> +        vpci_deassign_device(pdev);
> +
> +    return rc;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +void vpci_deassign_device(struct pci_dev *pdev)
> +{
> +    ASSERT(pcidevs_write_locked());
> +
> +    if ( !has_vpci(pdev->domain) )
> +        return;

There's no need for this check since ...

> +    vpci_remove_device(pdev);

... this function already has it. At which point the question is why
a separate function needs to exist in the first place. To match with
vpci_assign_device(), a simple #define to alias is would be enough.
(This is one of the cases where personally I think a #define is
superior to an inline wrapper.)

Unless, of course, later patches extend this function. If so, the
commit message giving this as justification for the introduction of
(apparent) redundancy would be helpful.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 03/11] vpci/header: implement guest BAR register handlers
  2022-07-19 17:42 ` [PATCH V7 03/11] vpci/header: implement guest BAR register handlers Oleksandr Tyshchenko
@ 2022-07-27 10:15   ` Jan Beulich
  2022-07-27 16:17     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27 10:15 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> @@ -527,6 +592,17 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>          {
>              bars[i].type = VPCI_BAR_IO;
> +
> +#ifndef CONFIG_X86
> +            if ( !is_hwdom )
> +            {
> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
> +                                       reg, 4, &bars[i]);
> +                if ( rc )
> +                    goto fail;
> +            }
> +#endif

Since long term this can't be correct, it wants a TODO comment put next
to it.

> @@ -553,34 +635,47 @@ static int cf_check init_bars(struct pci_dev *pdev)
>          bars[i].size = size;
>          bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> -                               &bars[i]);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                               is_hwdom ? bar_write : guest_bar_write,
> +                               reg, 4, &bars[i]);
>          if ( rc )
> -        {
> -            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> -            return rc;
> -        }
> +            goto fail;
>      }
>  
> -    /* Check expansion ROM. */
> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
> -    if ( rc > 0 && size )
> +    /* Check expansion ROM: we do not handle ROM for guests. */
> +    if ( is_hwdom )

This again can't be right long-term. Personally I'd prefer if the code
was (largely) left as is, with adjustments (with suitable TODO comments)
made on a much smaller scope only. But I'm not the maintainer of this
code - Roger may have a different view on this.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 06/11] vpci/header: program p2m with guest BAR view
  2022-07-19 17:42 ` [PATCH V7 06/11] vpci/header: program p2m with guest BAR view Oleksandr Tyshchenko
@ 2022-07-27 10:19   ` Jan Beulich
  2022-07-27 17:06     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27 10:19 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Take into account guest's BAR view and program its p2m accordingly:
> gfn is guest's view of the BAR and mfn is the physical BAR value as set
> up by the PCI bus driver in the hardware domain.
> This way hardware domain sees physical BAR values and guest sees
> emulated ones.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

If taking the previous patch as given, the patch here looks okay to me.
But since I'm still not really agreeing with what the previous patch
does, both that and this one will need to be judged on by Roger once
he's back. I'm sorry for the therefore resulting further delay.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology
  2022-07-19 17:42 ` [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology Oleksandr Tyshchenko
@ 2022-07-27 10:32   ` Jan Beulich
  2022-07-28 14:16     ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27 10:32 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Andrushchenko, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Please note, that at the moment only function 0 of a multifunction
> device can be passed through.

I've not been able to spot where this restriction is being enforced -
can you please point me at the respective code?

> @@ -99,6 +102,62 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static int add_virtual_device(struct pci_dev *pdev)
> +{
> +    struct domain *d = pdev->domain;
> +    pci_sbdf_t sbdf = { 0 };
> +    unsigned long new_dev_number;
> +
> +    if ( is_hardware_domain(d) )
> +        return 0;
> +
> +    ASSERT(pcidevs_write_locked());
> +
> +    /*
> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
> +     * there are multi-function ones which are not yet supported.
> +     */
> +    if ( pdev->info.is_extfn )
> +    {
> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> +                 &pdev->sbdf);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> +                                         VPCI_MAX_VIRT_DEV);
> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
> +        return -ENOSPC;
> +
> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
> +
> +    /*
> +     * Both segment and bus number are 0:
> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
> +     *  - with bus 0 the virtual devices are seen as embedded
> +     *    endpoints behind the root complex
> +     *
> +     * TODO: add support for multi-function devices.
> +     */
> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
> +    pdev->vpci->guest_sbdf = sbdf;
> +
> +    return 0;
> +
> +}
> +
> +static void vpci_remove_virtual_device(const struct pci_dev *pdev)
> +{
> +    ASSERT(pcidevs_write_locked());
> +
> +    if ( pdev->vpci )
> +    {
> +        __clear_bit(pdev->vpci->guest_sbdf.dev,
> +                    &pdev->domain->vpci_dev_assigned_map);
> +        pdev->vpci->guest_sbdf.sbdf = ~0;
> +    }
> +}

Feels like I did comment on this before: When ...

> @@ -111,8 +170,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>      rc = vpci_add_handlers(pdev);
>      if ( rc )
> -        vpci_deassign_device(pdev);
> +        goto fail;

... this path is taken and hence ...

> +    rc = add_virtual_device(pdev);

... this is bypassed, ...

> +    if ( rc )
> +        goto fail;
> +
> +    return 0;
>  
> + fail:
> +    vpci_deassign_device(pdev);

... the function here will see guest_sbdf still as ~0, while pdev->vpci
is non-NULL. Therefore mistakenly bit 31 of vpci_dev_assigned_map will
be cleared.

> @@ -124,6 +191,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
>      if ( !has_vpci(pdev->domain) )
>          return;
>  
> +    vpci_remove_virtual_device(pdev);
>      vpci_remove_device(pdev);
>  }

And other call sites of vpci_remove_device() do not have a need of
cleaning up guest_sbdf / vpci_dev_assigned_map? IOW I wonder if it
wouldn't be better to have vpci_remove_device() do this as well
(retaining - see my comment on the earlier patch) the simple aliasing
of vpci_deassign_device() to vpci_remove_device()).

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -457,6 +457,14 @@ struct domain
>  
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * The bitmap which shows which device numbers are already used by the
> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
> +     * next passed through virtual PCI device.
> +     */
> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
> +#endif
>  #endif

Hmm, yet another reason to keep sched.h including vpci.h, which
imo would better be dropped - sched.h already has way too many
dependencies. (Just a remark, not strictly a request to change
anything.)

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign
  2022-07-27 10:03   ` Jan Beulich
@ 2022-07-27 14:01     ` Oleksandr
  2022-07-27 14:35       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27 14:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	xen-devel


On 27.07.22 13:03, Jan Beulich wrote:


Hello Jan


> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> @@ -1558,6 +1560,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>   {
>>       const struct domain_iommu *hd = dom_iommu(d);
>>       struct pci_dev *pdev;
>> +    uint8_t old_devfn;
> Why "old"? There's nothing "new" here. Perhaps "orig", considering ...


ok


>
>> @@ -1594,6 +1599,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>                             pci_to_dev(pdev), flag)) )
>>           goto done;
>>   
>> +    old_devfn = devfn;
>> +
>>       for ( ; pdev->phantom_stride; rc = 0 )
>>       {
>>           devfn += pdev->phantom_stride;
> ... this updating of devfn is what you mean to deal with?

As I understand that was the intention of the author. At least I don't 
see other reasons.



> Then again
> I see no need for a separate variable in the first place. The input
> (seg,bus,devfn) tuple is translated to a pdev, so you can simply ...
>
>> @@ -1603,6 +1610,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>                           pci_to_dev(pdev), flag);
>>       }
>>   
>> +    rc = vpci_assign_device(pdev);
>> +    if ( rc && deassign_device(d, seg, bus, old_devfn) )
> ... use pdev->devfn here.


Thanks, good point, will drop old_devfn and use pdev->devfn. I am 
wondering whether the printk after "done:" label (and other possible 
printk-s down the code) should really use pdev->devfn instead of devfn 
in PCI_SBDF construct?



>
>> +        domain_crash(d);
>> +
>>    done:
>>       if ( rc )
>>           printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
> This log message will want to appear _before_ the domain_crash()
> related output, or you will want to add another log message there.

I will probably add another log message before domain_crash() leaving 
this one as is.

printk(XENLOG_ERR "%pd: %pp was left partially assigned\n", d, 
&PCI_SBDF(seg, bus, devfn));


>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -92,6 +92,37 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>   
>>       return rc;
>>   }
>> +
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Notify vPCI that device is assigned to guest. */
>> +int vpci_assign_device(struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +
>> +    ASSERT(pcidevs_write_locked());
>> +
>> +    if ( !has_vpci(pdev->domain) )
>> +        return 0;
>> +
>> +    rc = vpci_add_handlers(pdev);
>> +    if ( rc )
>> +        vpci_deassign_device(pdev);
>> +
>> +    return rc;
>> +}
>> +
>> +/* Notify vPCI that device is de-assigned from guest. */
>> +void vpci_deassign_device(struct pci_dev *pdev)
>> +{
>> +    ASSERT(pcidevs_write_locked());
>> +
>> +    if ( !has_vpci(pdev->domain) )
>> +        return;
> There's no need for this check since ...
>
>> +    vpci_remove_device(pdev);
> ... this function already has it. At which point the question is why
> a separate function needs to exist in the first place. To match with
> vpci_assign_device(), a simple #define to alias is would be enough.
> (This is one of the cases where personally I think a #define is
> superior to an inline wrapper.)


Agree, but ...


>
> Unless, of course, later patches extend this function. If so, the
> commit message giving this as justification for the introduction of
> (apparent) redundancy would be helpful.

... exactly, the later patches extend this function. I will update 
commit description.



>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign
  2022-07-27 14:01     ` Oleksandr
@ 2022-07-27 14:35       ` Jan Beulich
  2022-07-27 16:49         ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-27 14:35 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	xen-devel

On 27.07.2022 16:01, Oleksandr wrote:
> On 27.07.22 13:03, Jan Beulich wrote:
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> @@ -1603,6 +1610,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>                           pci_to_dev(pdev), flag);
>>>       }
>>>   
>>> +    rc = vpci_assign_device(pdev);
>>> +    if ( rc && deassign_device(d, seg, bus, old_devfn) )
>> ... use pdev->devfn here.
> 
> 
> Thanks, good point, will drop old_devfn and use pdev->devfn. I am 
> wondering whether the printk after "done:" label (and other possible 
> printk-s down the code) should really use pdev->devfn instead of devfn 
> in PCI_SBDF construct?

Yes, that's intended: If assigning a phantom function fails, this
should be distinguishable from failure to assign the real device.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 03/11] vpci/header: implement guest BAR register handlers
  2022-07-27 10:15   ` Jan Beulich
@ 2022-07-27 16:17     ` Oleksandr
  2022-07-28  7:01       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27 16:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 27.07.22 13:15, Jan Beulich wrote:

Hello Jan

> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> @@ -527,6 +592,17 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>           if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>           {
>>               bars[i].type = VPCI_BAR_IO;
>> +
>> +#ifndef CONFIG_X86
>> +            if ( !is_hwdom )
>> +            {
>> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
>> +                                       reg, 4, &bars[i]);
>> +                if ( rc )
>> +                    goto fail;
>> +            }
>> +#endif
> Since long term this can't be correct, it wants a TODO comment put next
> to it.


Looking into the previous versions of this patch (up to V3) I failed to 
find any changes in current version which hadn't been discussed (and 
agreed in some form).

Could you please clarify what exactly can't be correct the long term, 
for me to put the proper TODO here. Do you perhaps mean that TODO needs 
to explain why we have to diverge?


>
>> @@ -553,34 +635,47 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>           bars[i].size = size;
>>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>   
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>> -                               &bars[i]);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                               is_hwdom ? bar_write : guest_bar_write,
>> +                               reg, 4, &bars[i]);
>>           if ( rc )
>> -        {
>> -            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> -            return rc;
>> -        }
>> +            goto fail;
>>       }
>>   
>> -    /* Check expansion ROM. */
>> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
>> -    if ( rc > 0 && size )
>> +    /* Check expansion ROM: we do not handle ROM for guests. */
>> +    if ( is_hwdom )
> This again can't be right long-term. Personally I'd prefer if the code
> was (largely) left as is, with adjustments (with suitable TODO comments)
> made on a much smaller scope only.


I can revive a comment that Oleksandr Andrushchenko provided for earlier 
version by transforming into TODO:


ROM BAR is only handled for the hardware domain and for guest domains
there is a stub: at the moment PCI expansion ROM handling is supported
for x86 only and it might not be used by other architectures without
emulating x86. Other use-cases may include using that expansion ROM before
Xen boots, hence no emulation is needed in Xen itself. Or when a guest
wants to use the ROM code which seems to be rare.



>   But I'm not the maintainer of this
> code - Roger may have a different view on this.


Well, let's wait for Roger's input here.


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign
  2022-07-27 14:35       ` Jan Beulich
@ 2022-07-27 16:49         ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-27 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Andrew Cooper, George Dunlap,
	Julien Grall, Stefano Stabellini, Wei Liu, Paul Durrant,
	Roger Pau Monné,
	xen-devel


On 27.07.22 17:35, Jan Beulich wrote:


Hello Jan

> On 27.07.2022 16:01, Oleksandr wrote:
>> On 27.07.22 13:03, Jan Beulich wrote:
>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>> @@ -1603,6 +1610,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>>                            pci_to_dev(pdev), flag);
>>>>        }
>>>>    
>>>> +    rc = vpci_assign_device(pdev);
>>>> +    if ( rc && deassign_device(d, seg, bus, old_devfn) )
>>> ... use pdev->devfn here.
>>
>> Thanks, good point, will drop old_devfn and use pdev->devfn. I am
>> wondering whether the printk after "done:" label (and other possible
>> printk-s down the code) should really use pdev->devfn instead of devfn
>> in PCI_SBDF construct?
> Yes, that's intended: If assigning a phantom function fails, this
> should be distinguishable from failure to assign the real device.


Thank you for the clarification.

Hmm, so before this patch the assigning of phantom functions was the 
last action before the "done:" label. So I will probably need to check 
for "rc" before calling vpci_assign_device().

Something like that:


[snip]

@@ -1602,6 +1606,17 @@ static int assign_device(struct domain *d, u16 
seg, u8 bus, u8 devfn, u32 flag)
          rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
                          pci_to_dev(pdev), flag);
      }
+    if ( rc )
+        goto done;
+
+    devfn = pdev->devfn;
+    rc = vpci_assign_device(pdev);
+    if ( rc && deassign_device(d, seg, bus, devfn) )
+    {
+        printk(XENLOG_ERR "%pd: %pp was left partially assigned\n",
+               d, &PCI_SBDF(seg, bus, devfn));
+        domain_crash(d);
+    }

   done:
      if ( rc )

[snip]


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/11] vpci/header: reset the command register when adding devices
  2022-07-27  9:46       ` Jan Beulich
@ 2022-07-27 16:53         ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-27 16:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 27.07.22 12:46, Jan Beulich wrote:

Hello Jan

> On 27.07.2022 10:58, Oleksandr wrote:
>> On 26.07.22 18:23, Jan Beulich wrote:
>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Reset the command register when assigning a PCI device to a guest:
>>>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>>>> after reset, but this might not be true for the guest as it needs
>>>> to respect host's settings.
>>>> For that reason, do not write 0 to the PCI_COMMAND register directly,
>>>> but go through the corresponding emulation layer (cmd_write), which
>>>> will take care about the actual bits written.
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>> ---
>>>> Since v6:
>>>> - use cmd_write directly without introducing emulate_cmd_reg
>>>> - update commit message with more description on all 0's in PCI_COMMAND
>>> I agree with the change,
>> thanks, may I please ask can this be converted to some tag?
> I could offer a R-b, but you've got one from Rahul already, so mine
> won't buy you anything further. What you will need is a maintainer
> ack, which imo isn't a priority since this is only patch 8 in a
> series which itself depends on a further series likely continuing
> to be controversial (didn't get there yet).


ok, fair enough.


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 06/11] vpci/header: program p2m with guest BAR view
  2022-07-27 10:19   ` Jan Beulich
@ 2022-07-27 17:06     ` Oleksandr
  2022-07-28  7:04       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27 17:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 27.07.22 13:19, Jan Beulich wrote:

Hello Jan


> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Take into account guest's BAR view and program its p2m accordingly:
>> gfn is guest's view of the BAR and mfn is the physical BAR value as set
>> up by the PCI bus driver in the hardware domain.
>> This way hardware domain sees physical BAR values and guest sees
>> emulated ones.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> If taking the previous patch as given, the patch here looks okay to me.

This is a good news, thank you.


> But since I'm still not really agreeing with what the previous patch
> does,

Previous? Sorry, I didn't see any comments given for "[PATCH V7 05/11] 
vpci/header: handle p2m range sets per BAR".

Or do you perhaps mean "[PATCH V7 03/11] vpci/header: implement guest 
BAR register handlers" where you explicitly mentioned concerns?


> both that and this one will need to be judged on by Roger once
> he's back.


Let's wait for Roger's input here as well.


> I'm sorry for the therefore resulting further delay.

no problem


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests
  2022-07-26 15:30   ` Jan Beulich
@ 2022-07-27 17:30     ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-27 17:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 26.07.22 18:30, Jan Beulich wrote:

Hello Jan

> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -443,11 +443,27 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>       return 0;
>>   }
>>   
>> +/* TODO: Add proper emulation for all bits of the command register. */
>>   static void cf_check cmd_write(
>>       const struct pci_dev *pdev, unsigned int reg, uint32_t cmd, void *data)
>>   {
>>       uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>   
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        struct vpci_header *header = data;
>> +
>> +        header->guest_cmd = cmd;
>> +#ifdef CONFIG_HAS_PCI_MSI
>> +        if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled )
>> +            /*
>> +             * Guest wants to enable INTx, but it can't be enabled
>> +             * if MSI/MSI-X enabled.
>> +             */
>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>> +#endif
>> +    }
>> +
>>       /*
>>        * Let Dom0 play with all the bits directly except for the memory
>>        * decoding one.
>> @@ -464,6 +480,19 @@ static void cf_check cmd_write(
>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>   }
>>   
>> +static uint32_t cf_check cmd_read(
>> +    const struct pci_dev *pdev, unsigned int reg, void *data)
>> +{
>> +    if ( !is_hardware_domain(pdev->domain) )
>> +    {
>> +        struct vpci_header *header = data;
>> +
>> +        return header->guest_cmd;
>> +    }
>> +
>> +    return pci_conf_read16(pdev->sbdf, reg);
>> +}
> This function wants the same leading comment as cmd_write().

ok


>   I also
> think you better wouldn't give the guest the impression that r/o bits
> can actually be written to (but getting this right may well fall
> under the TODO).

ok


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-26 15:16   ` Jan Beulich
@ 2022-07-27 17:54     ` Oleksandr
  2022-07-27 19:39       ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27 17:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel


On 26.07.22 18:16, Jan Beulich wrote:

Hello Jan

> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/arm/vpci.c
>> +++ b/xen/arch/arm/vpci.c
>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>>       /* data is needed to prevent a pointer cast on 32bit */
>>       unsigned long data;
>>   
>> +    /*
>> +     * For the passed through devices we need to map their virtual SBDF
>> +     * to the physical PCI device being passed through.
>> +     */
>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>> +    {
>> +        *r = ~0ul;
>> +        return 1;
>> +    }
> I'm probably simply lacking specific Arm-side knowledge, but it strikes
> me as odd that the need for translation would be dependent upon "bridge".


I am afraid I cannot answer immediately.

I will analyze that question and provide an answer later on.



>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -158,6 +158,32 @@ static void vpci_remove_virtual_device(const struct pci_dev *pdev)
>>       }
>>   }
>>   
>> +/*
>> + * Find the physical device which is mapped to the virtual device
>> + * and translate virtual SBDF to the physical one.
>> + */
>> +bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf)
>> +{
>> +    struct pci_dev *pdev;
> const wherever possible please (i.e. likely also for the first function
> parameter).


ok, will do


>
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -276,6 +276,7 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
>>   /* Notify vPCI that device is assigned/de-assigned to/from guest. */
>>   int vpci_assign_device(struct pci_dev *pdev);
>>   void vpci_deassign_device(struct pci_dev *pdev);
>> +bool vpci_translate_virtual_device(struct domain *d, pci_sbdf_t *sbdf);
>>   #else
>>   static inline int vpci_assign_device(struct pci_dev *pdev)
>>   {
>> @@ -285,6 +286,12 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
>>   static inline void vpci_deassign_device(struct pci_dev *pdev)
>>   {
>>   };
>> +
>> +static inline bool vpci_translate_virtual_device(struct domain *d,
>> +                                                 pci_sbdf_t *sbdf)
>> +{
>> +    return false;
>> +}
> Please don't add stubs which aren't really needed (which, afaict, is the
> case for this one).


I assume, this is needed if HAS_VPCI is present, but 
HAS_VPCI_GUEST_SUPPORT is not. And the author added that stub 
specifically to drop a few "#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT" from 
Arm's code.

Or I really missed something?



>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-27 17:54     ` Oleksandr
@ 2022-07-27 19:39       ` Oleksandr
  2022-07-28  7:15         ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-27 19:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel


Hello Jan


On 27.07.22 20:54, Oleksandr wrote:
>
> On 26.07.22 18:16, Jan Beulich wrote:
>
> Hello Jan
>
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> --- a/xen/arch/arm/vpci.c
>>> +++ b/xen/arch/arm/vpci.c
>>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, 
>>> mmio_info_t *info,
>>>       /* data is needed to prevent a pointer cast on 32bit */
>>>       unsigned long data;
>>>   +    /*
>>> +     * For the passed through devices we need to map their virtual 
>>> SBDF
>>> +     * to the physical PCI device being passed through.
>>> +     */
>>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>>> +    {
>>> +        *r = ~0ul;
>>> +        return 1;
>>> +    }
>> I'm probably simply lacking specific Arm-side knowledge, but it strikes
>> me as odd that the need for translation would be dependent upon 
>> "bridge".
>
>
> I am afraid I cannot answer immediately.
>
> I will analyze that question and provide an answer later on.


Well, most likely that "valid" bridge pointer here is just used as an 
indicator of hwdom currently, so no need to perform virt->phys 
translation for sbdf.

You can see that domain_vpci_init() passes a valid value for hwdom and 
NULL for other domains when setting up vpci_mmio* callbacks.

Alternatively, I guess we could use "!is_hardware_domain(v->domain)" 
instead of "!bridge" in the first part of that check. Shall I?



-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 03/11] vpci/header: implement guest BAR register handlers
  2022-07-27 16:17     ` Oleksandr
@ 2022-07-28  7:01       ` Jan Beulich
  2022-07-28 14:56         ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-28  7:01 UTC (permalink / raw)
  To: Oleksandr; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 27.07.2022 18:17, Oleksandr wrote:
> On 27.07.22 13:15, Jan Beulich wrote:
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> @@ -527,6 +592,17 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>>           if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>           {
>>>               bars[i].type = VPCI_BAR_IO;
>>> +
>>> +#ifndef CONFIG_X86
>>> +            if ( !is_hwdom )
>>> +            {
>>> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
>>> +                                       reg, 4, &bars[i]);
>>> +                if ( rc )
>>> +                    goto fail;
>>> +            }
>>> +#endif
>> Since long term this can't be correct, it wants a TODO comment put next
>> to it.
> 
> 
> Looking into the previous versions of this patch (up to V3) I failed to 
> find any changes in current version which hadn't been discussed (and 
> agreed in some form).
> 
> Could you please clarify what exactly can't be correct the long term, 
> for me to put the proper TODO here. Do you perhaps mean that TODO needs 
> to explain why we have to diverge?

If a device has I/O port ranges, then that's typically for a reason.
Drivers (in the guest) may therefore want to use those ranges to
communicate with the device. Imagine in particular a device without
any MMIO BARs, and with only I/O port one(s).

>>> @@ -553,34 +635,47 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>>           bars[i].size = size;
>>>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>>   
>>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>>> -                               &bars[i]);
>>> +        rc = vpci_add_register(pdev->vpci,
>>> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
>>> +                               is_hwdom ? bar_write : guest_bar_write,
>>> +                               reg, 4, &bars[i]);
>>>           if ( rc )
>>> -        {
>>> -            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>> -            return rc;
>>> -        }
>>> +            goto fail;
>>>       }
>>>   
>>> -    /* Check expansion ROM. */
>>> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
>>> -    if ( rc > 0 && size )
>>> +    /* Check expansion ROM: we do not handle ROM for guests. */
>>> +    if ( is_hwdom )
>> This again can't be right long-term. Personally I'd prefer if the code
>> was (largely) left as is, with adjustments (with suitable TODO comments)
>> made on a much smaller scope only.
> 
> 
> I can revive a comment that Oleksandr Andrushchenko provided for earlier 
> version by transforming into TODO:
> 
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM handling is supported
> for x86 only and it might not be used by other architectures without
> emulating x86. Other use-cases may include using that expansion ROM before
> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
> wants to use the ROM code which seems to be rare.

ROMs can contain other than x86 code. While reportedly mostly dead, EFI
bytecode was an example of an abstraction layer supporting arbitrary
architectures. Therefore a comment along these lines would be okay, but
personally I'd prefer it to be less verbose - along the lines of the
one to be supplied for the I/O port restriction.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 06/11] vpci/header: program p2m with guest BAR view
  2022-07-27 17:06     ` Oleksandr
@ 2022-07-28  7:04       ` Jan Beulich
  0 siblings, 0 replies; 48+ messages in thread
From: Jan Beulich @ 2022-07-28  7:04 UTC (permalink / raw)
  To: Oleksandr; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel

On 27.07.2022 19:06, Oleksandr wrote:
> On 27.07.22 13:19, Jan Beulich wrote:
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Take into account guest's BAR view and program its p2m accordingly:
>>> gfn is guest's view of the BAR and mfn is the physical BAR value as set
>>> up by the PCI bus driver in the hardware domain.
>>> This way hardware domain sees physical BAR values and guest sees
>>> emulated ones.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> If taking the previous patch as given, the patch here looks okay to me.
> 
> This is a good news, thank you.
> 
> 
>> But since I'm still not really agreeing with what the previous patch
>> does,
> 
> Previous? Sorry, I didn't see any comments given for "[PATCH V7 05/11] 
> vpci/header: handle p2m range sets per BAR".
> 
> Or do you perhaps mean "[PATCH V7 03/11] vpci/header: implement guest 
> BAR register handlers" where you explicitly mentioned concerns?

No, I mean the previous patch, to which I had given comments in a much
earlier version. Roger looks to agree with the approach taken, so my
comments were (legitimately) put off. But with me not agreeing with
the approach, it's not very reasonable for me to (further) review that
change. Hence my deferral to Roger.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-27 19:39       ` Oleksandr
@ 2022-07-28  7:15         ` Jan Beulich
  2022-07-28 16:35           ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-28  7:15 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel

On 27.07.2022 21:39, Oleksandr wrote:
> On 27.07.22 20:54, Oleksandr wrote:
>> On 26.07.22 18:16, Jan Beulich wrote:
>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/arch/arm/vpci.c
>>>> +++ b/xen/arch/arm/vpci.c
>>>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v, 
>>>> mmio_info_t *info,
>>>>       /* data is needed to prevent a pointer cast on 32bit */
>>>>       unsigned long data;
>>>>   +    /*
>>>> +     * For the passed through devices we need to map their virtual 
>>>> SBDF
>>>> +     * to the physical PCI device being passed through.
>>>> +     */
>>>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>>>> +    {
>>>> +        *r = ~0ul;
>>>> +        return 1;
>>>> +    }
>>> I'm probably simply lacking specific Arm-side knowledge, but it strikes
>>> me as odd that the need for translation would be dependent upon 
>>> "bridge".
>>
>>
>> I am afraid I cannot answer immediately.
>>
>> I will analyze that question and provide an answer later on.
> 
> 
> Well, most likely that "valid" bridge pointer here is just used as an 
> indicator of hwdom currently, so no need to perform virt->phys 
> translation for sbdf.
> 
> You can see that domain_vpci_init() passes a valid value for hwdom and 
> NULL for other domains when setting up vpci_mmio* callbacks.

Oh, I see.

> Alternatively, I guess we could use "!is_hardware_domain(v->domain)" 
> instead of "!bridge" in the first part of that check. Shall I?

Maybe simply add a comment? Surely checking "bridge" is cheaper than
using is_hardware_domain(), so I can see the benefit. But the larger
arm/vpci.c grows, the less obvious the connection will be without a
comment. (Instead of a comment, an alternative may be a suitable
assertion, which then documents the connection at the same time, e.g.
ASSERT(!bridge == !is_hardware_domain(v->domain)). But that won't be
possible in e.g. vpci_sbdf_from_gpa(), where apparently a similar
assumption is being made.)

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology
  2022-07-27 10:32   ` Jan Beulich
@ 2022-07-28 14:16     ` Oleksandr
  2022-07-28 14:26       ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-28 14:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel


On 27.07.22 13:32, Jan Beulich wrote:


Hello Jan


> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
>>
>> Please note, that at the moment only function 0 of a multifunction
>> device can be passed through.
> I've not been able to spot where this restriction is being enforced -
> can you please point me at the respective code?

Nor have I found the respective code.

Could you please suggest a place where to put such enforcement (I guess, 
this should be present in the toolstack)?



>
>> @@ -99,6 +102,62 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>   }
>>   
>>   #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +static int add_virtual_device(struct pci_dev *pdev)
>> +{
>> +    struct domain *d = pdev->domain;
>> +    pci_sbdf_t sbdf = { 0 };
>> +    unsigned long new_dev_number;
>> +
>> +    if ( is_hardware_domain(d) )
>> +        return 0;
>> +
>> +    ASSERT(pcidevs_write_locked());
>> +
>> +    /*
>> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
>> +     * there are multi-function ones which are not yet supported.
>> +     */
>> +    if ( pdev->info.is_extfn )
>> +    {
>> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
>> +                 &pdev->sbdf);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
>> +                                         VPCI_MAX_VIRT_DEV);
>> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
>> +        return -ENOSPC;
>> +
>> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
>> +
>> +    /*
>> +     * Both segment and bus number are 0:
>> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
>> +     *  - with bus 0 the virtual devices are seen as embedded
>> +     *    endpoints behind the root complex
>> +     *
>> +     * TODO: add support for multi-function devices.
>> +     */
>> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
>> +    pdev->vpci->guest_sbdf = sbdf;
>> +
>> +    return 0;
>> +
>> +}
>> +
>> +static void vpci_remove_virtual_device(const struct pci_dev *pdev)
>> +{
>> +    ASSERT(pcidevs_write_locked());
>> +
>> +    if ( pdev->vpci )
>> +    {
>> +        __clear_bit(pdev->vpci->guest_sbdf.dev,
>> +                    &pdev->domain->vpci_dev_assigned_map);
>> +        pdev->vpci->guest_sbdf.sbdf = ~0;
>> +    }
>> +}
> Feels like I did comment on this before: When ...
>
>> @@ -111,8 +170,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>   
>>       rc = vpci_add_handlers(pdev);
>>       if ( rc )
>> -        vpci_deassign_device(pdev);
>> +        goto fail;
> ... this path is taken and hence ...
>
>> +    rc = add_virtual_device(pdev);
> ... this is bypassed, ...
>
>> +    if ( rc )
>> +        goto fail;
>> +
>> +    return 0;
>>   
>> + fail:
>> +    vpci_deassign_device(pdev);
> ... the function here will see guest_sbdf still as ~0, while pdev->vpci
> is non-NULL. Therefore mistakenly bit 31 of vpci_dev_assigned_map will
> be cleared.


Indeed, good catch, thanks! I assume this can be just fixed by extending 
a check in vpci_remove_virtual_device():

if ( pdev->vpci && (pdev->vpci->guest_sbdf.sbdf != ~0) )




>
>> @@ -124,6 +191,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
>>       if ( !has_vpci(pdev->domain) )
>>           return;
>>   
>> +    vpci_remove_virtual_device(pdev);
>>       vpci_remove_device(pdev);
>>   }
> And other call sites of vpci_remove_device() do not have a need of
> cleaning up guest_sbdf / vpci_dev_assigned_map?

I am not 100% sure, but it looks like they don't need. On the other 
hand, even if they don't need that, doing the cleaning won't be an issue 
at all,

there is a check before cleaning (which will be extended as I proposed 
above), so ...


> IOW I wonder if it
> wouldn't be better to have vpci_remove_device() do this as well
> (retaining - see my comment on the earlier patch) the simple aliasing
> of vpci_deassign_device() to vpci_remove_device()).


    ... maybe yes. Shall I do that change?



>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -457,6 +457,14 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * The bitmap which shows which device numbers are already used by the
>> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
>> +     * next passed through virtual PCI device.
>> +     */
>> +    DECLARE_BITMAP(vpci_dev_assigned_map, VPCI_MAX_VIRT_DEV);
>> +#endif
>>   #endif
> Hmm, yet another reason to keep sched.h including vpci.h, which
> imo would better be dropped - sched.h already has way too many
> dependencies. (Just a remark, not strictly a request to change
> anything.)


I see.


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology
  2022-07-28 14:16     ` Oleksandr
@ 2022-07-28 14:26       ` Jan Beulich
  2022-07-28 14:41         ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-28 14:26 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Andrushchenko, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 28.07.2022 16:16, Oleksandr wrote:
> On 27.07.22 13:32, Jan Beulich wrote:
>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>> root complex itself (embedded endpoints).
>>> This implementation is limited to 32 devices which are allowed on
>>> a single PCI bus.
>>>
>>> Please note, that at the moment only function 0 of a multifunction
>>> device can be passed through.
>> I've not been able to spot where this restriction is being enforced -
>> can you please point me at the respective code?
> 
> Nor have I found the respective code.
> 
> Could you please suggest a place where to put such enforcement (I guess, 
> this should be present in the toolstack)?

Such check should be in the tool stack primarily to give a sensible
error message to the user. Yet the hypervisor needs to check itself
nevertheless. You know the code you're adding much better than I do,
so I guess I'm a little puzzled by you asking me to suggest a place.
(And for the tool stack I guess asking tool stack folks would get
you better mileage.)

>>> @@ -124,6 +191,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
>>>       if ( !has_vpci(pdev->domain) )
>>>           return;
>>>   
>>> +    vpci_remove_virtual_device(pdev);
>>>       vpci_remove_device(pdev);
>>>   }
>> And other call sites of vpci_remove_device() do not have a need of
>> cleaning up guest_sbdf / vpci_dev_assigned_map?
> 
> I am not 100% sure, but it looks like they don't need. On the other 
> hand, even if they don't need that, doing the cleaning won't be an issue 
> at all,
> 
> there is a check before cleaning (which will be extended as I proposed 
> above), so ...
> 
> 
>> IOW I wonder if it
>> wouldn't be better to have vpci_remove_device() do this as well
>> (retaining - see my comment on the earlier patch) the simple aliasing
>> of vpci_deassign_device() to vpci_remove_device()).
> 
> 
>     ... maybe yes. Shall I do that change?

Well - yes please, afaic.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology
  2022-07-28 14:26       ` Jan Beulich
@ 2022-07-28 14:41         ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-28 14:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Roger Pau Monné,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel


On 28.07.22 17:26, Jan Beulich wrote:

Hello Jan

> On 28.07.2022 16:16, Oleksandr wrote:
>> On 27.07.22 13:32, Jan Beulich wrote:
>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>>> root complex itself (embedded endpoints).
>>>> This implementation is limited to 32 devices which are allowed on
>>>> a single PCI bus.
>>>>
>>>> Please note, that at the moment only function 0 of a multifunction
>>>> device can be passed through.
>>> I've not been able to spot where this restriction is being enforced -
>>> can you please point me at the respective code?
>> Nor have I found the respective code.
>>
>> Could you please suggest a place where to put such enforcement (I guess,
>> this should be present in the toolstack)?
> Such check should be in the tool stack primarily to give a sensible
> error message to the user. Yet the hypervisor needs to check itself
> nevertheless. You know the code you're adding much better than I do,
> so I guess I'm a little puzzled by you asking me to suggest a place.
> (And for the tool stack I guess asking tool stack folks would get
> you better mileage.)

Thanks for the clarification.


I am still getting used to the changes which that patch series makes (I 
didn't write that code).

Asking for suggestion I didn't mean to point an exact place in the code, 
but rather a subsystem/software layer,

sorry if I was unclear.


>
>>>> @@ -124,6 +191,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
>>>>        if ( !has_vpci(pdev->domain) )
>>>>            return;
>>>>    
>>>> +    vpci_remove_virtual_device(pdev);
>>>>        vpci_remove_device(pdev);
>>>>    }
>>> And other call sites of vpci_remove_device() do not have a need of
>>> cleaning up guest_sbdf / vpci_dev_assigned_map?
>> I am not 100% sure, but it looks like they don't need. On the other
>> hand, even if they don't need that, doing the cleaning won't be an issue
>> at all,
>>
>> there is a check before cleaning (which will be extended as I proposed
>> above), so ...
>>
>>
>>> IOW I wonder if it
>>> wouldn't be better to have vpci_remove_device() do this as well
>>> (retaining - see my comment on the earlier patch) the simple aliasing
>>> of vpci_deassign_device() to vpci_remove_device()).
>>
>>      ... maybe yes. Shall I do that change?
> Well - yes please, afaic.


ok, will do


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 03/11] vpci/header: implement guest BAR register handlers
  2022-07-28  7:01       ` Jan Beulich
@ 2022-07-28 14:56         ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-28 14:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksandr Andrushchenko, Roger Pau Monné, xen-devel


On 28.07.22 10:01, Jan Beulich wrote:

Hello Jan


> On 27.07.2022 18:17, Oleksandr wrote:
>> On 27.07.22 13:15, Jan Beulich wrote:
>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>> @@ -527,6 +592,17 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>>>            if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>>>>            {
>>>>                bars[i].type = VPCI_BAR_IO;
>>>> +
>>>> +#ifndef CONFIG_X86
>>>> +            if ( !is_hwdom )
>>>> +            {
>>>> +                rc = vpci_add_register(pdev->vpci, empty_bar_read, NULL,
>>>> +                                       reg, 4, &bars[i]);
>>>> +                if ( rc )
>>>> +                    goto fail;
>>>> +            }
>>>> +#endif
>>> Since long term this can't be correct, it wants a TODO comment put next
>>> to it.
>>
>> Looking into the previous versions of this patch (up to V3) I failed to
>> find any changes in current version which hadn't been discussed (and
>> agreed in some form).
>>
>> Could you please clarify what exactly can't be correct the long term,
>> for me to put the proper TODO here. Do you perhaps mean that TODO needs
>> to explain why we have to diverge?
> If a device has I/O port ranges, then that's typically for a reason.
> Drivers (in the guest) may therefore want to use those ranges to
> communicate with the device. Imagine in particular a device without
> any MMIO BARs, and with only I/O port one(s).
>
>>>> @@ -553,34 +635,47 @@ static int cf_check init_bars(struct pci_dev *pdev)
>>>>            bars[i].size = size;
>>>>            bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>>>    
>>>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>>>> -                               &bars[i]);
>>>> +        rc = vpci_add_register(pdev->vpci,
>>>> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
>>>> +                               is_hwdom ? bar_write : guest_bar_write,
>>>> +                               reg, 4, &bars[i]);
>>>>            if ( rc )
>>>> -        {
>>>> -            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>>> -            return rc;
>>>> -        }
>>>> +            goto fail;
>>>>        }
>>>>    
>>>> -    /* Check expansion ROM. */
>>>> -    rc = pci_size_mem_bar(pdev->sbdf, rom_reg, &addr, &size, PCI_BAR_ROM);
>>>> -    if ( rc > 0 && size )
>>>> +    /* Check expansion ROM: we do not handle ROM for guests. */
>>>> +    if ( is_hwdom )
>>> This again can't be right long-term. Personally I'd prefer if the code
>>> was (largely) left as is, with adjustments (with suitable TODO comments)
>>> made on a much smaller scope only.
>>
>> I can revive a comment that Oleksandr Andrushchenko provided for earlier
>> version by transforming into TODO:
>>
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM handling is supported
>> for x86 only and it might not be used by other architectures without
>> emulating x86. Other use-cases may include using that expansion ROM before
>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>> wants to use the ROM code which seems to be rare.
> ROMs can contain other than x86 code. While reportedly mostly dead, EFI
> bytecode was an example of an abstraction layer supporting arbitrary
> architectures. Therefore a comment along these lines would be okay, but
> personally I'd prefer it to be less verbose - along the lines of the
> one to be supplied for the I/O port restriction.


Thanks for the clarification. I will add two TODOs.



>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-28  7:15         ` Jan Beulich
@ 2022-07-28 16:35           ` Oleksandr
  2022-07-29  6:06             ` Jan Beulich
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-28 16:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel


On 28.07.22 10:15, Jan Beulich wrote:

Hello Jan

> On 27.07.2022 21:39, Oleksandr wrote:
>> On 27.07.22 20:54, Oleksandr wrote:
>>> On 26.07.22 18:16, Jan Beulich wrote:
>>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>>> --- a/xen/arch/arm/vpci.c
>>>>> +++ b/xen/arch/arm/vpci.c
>>>>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v,
>>>>> mmio_info_t *info,
>>>>>        /* data is needed to prevent a pointer cast on 32bit */
>>>>>        unsigned long data;
>>>>>    +    /*
>>>>> +     * For the passed through devices we need to map their virtual
>>>>> SBDF
>>>>> +     * to the physical PCI device being passed through.
>>>>> +     */
>>>>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>>>>> +    {
>>>>> +        *r = ~0ul;
>>>>> +        return 1;
>>>>> +    }
>>>> I'm probably simply lacking specific Arm-side knowledge, but it strikes
>>>> me as odd that the need for translation would be dependent upon
>>>> "bridge".
>>>
>>> I am afraid I cannot answer immediately.
>>>
>>> I will analyze that question and provide an answer later on.
>>
>> Well, most likely that "valid" bridge pointer here is just used as an
>> indicator of hwdom currently, so no need to perform virt->phys
>> translation for sbdf.
>>
>> You can see that domain_vpci_init() passes a valid value for hwdom and
>> NULL for other domains when setting up vpci_mmio* callbacks.
> Oh, I see.
>
>> Alternatively, I guess we could use "!is_hardware_domain(v->domain)"
>> instead of "!bridge" in the first part of that check. Shall I?
> Maybe simply add a comment? Surely checking "bridge" is cheaper than
> using is_hardware_domain(), so I can see the benefit. But the larger
> arm/vpci.c grows, the less obvious the connection will be without a
> comment.


Agree the connection is worth a comment ...



>   (Instead of a comment, an alternative may be a suitable
> assertion, which then documents the connection at the same time, e.g.
> ASSERT(!bridge == !is_hardware_domain(v->domain)). But that won't be
> possible in e.g. vpci_sbdf_from_gpa(), where apparently a similar
> assumption is being made.)


    ... or indeed to put such ASSERT _before_ vpci_sbdf_from_gpa().

This will cover assumption being made in both places.


diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index a9fc5817f9..1d4b1ef39e 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -37,10 +37,24 @@ static int vpci_mmio_read(struct vcpu *v, 
mmio_info_t *info,
                            register_t *r, void *p)
  {
      struct pci_host_bridge *bridge = p;
-    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+    pci_sbdf_t sbdf;
      /* data is needed to prevent a pointer cast on 32bit */
      unsigned long data;

+    ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+    {
+        *r = ~0ul;
+        return 1;
+    }
+
      if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                          1U << info->dabt.size, &data) )
      {
@@ -57,7 +71,18 @@ static int vpci_mmio_write(struct vcpu *v, 
mmio_info_t *info,
                             register_t r, void *p)
  {
      struct pci_host_bridge *bridge = p;
-    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+    pci_sbdf_t sbdf;
+
+    ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;

      return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
                             1U << info->dabt.size, r);
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index d4601ecf9b..fc2c51dc3e 100644


Any preference here?


Personally, I think that such ASSERT will better explain the connection 
than the comment will do.


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-28 16:35           ` Oleksandr
@ 2022-07-29  6:06             ` Jan Beulich
  2022-07-29 16:26               ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Jan Beulich @ 2022-07-29  6:06 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel

On 28.07.2022 18:35, Oleksandr wrote:
> On 28.07.22 10:15, Jan Beulich wrote:
>> On 27.07.2022 21:39, Oleksandr wrote:
>>> On 27.07.22 20:54, Oleksandr wrote:
>>>> On 26.07.22 18:16, Jan Beulich wrote:
>>>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>>>> --- a/xen/arch/arm/vpci.c
>>>>>> +++ b/xen/arch/arm/vpci.c
>>>>>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v,
>>>>>> mmio_info_t *info,
>>>>>>        /* data is needed to prevent a pointer cast on 32bit */
>>>>>>        unsigned long data;
>>>>>>    +    /*
>>>>>> +     * For the passed through devices we need to map their virtual
>>>>>> SBDF
>>>>>> +     * to the physical PCI device being passed through.
>>>>>> +     */
>>>>>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>>>>>> +    {
>>>>>> +        *r = ~0ul;
>>>>>> +        return 1;
>>>>>> +    }
>>>>> I'm probably simply lacking specific Arm-side knowledge, but it strikes
>>>>> me as odd that the need for translation would be dependent upon
>>>>> "bridge".
>>>>
>>>> I am afraid I cannot answer immediately.
>>>>
>>>> I will analyze that question and provide an answer later on.
>>>
>>> Well, most likely that "valid" bridge pointer here is just used as an
>>> indicator of hwdom currently, so no need to perform virt->phys
>>> translation for sbdf.
>>>
>>> You can see that domain_vpci_init() passes a valid value for hwdom and
>>> NULL for other domains when setting up vpci_mmio* callbacks.
>> Oh, I see.
>>
>>> Alternatively, I guess we could use "!is_hardware_domain(v->domain)"
>>> instead of "!bridge" in the first part of that check. Shall I?
>> Maybe simply add a comment? Surely checking "bridge" is cheaper than
>> using is_hardware_domain(), so I can see the benefit. But the larger
>> arm/vpci.c grows, the less obvious the connection will be without a
>> comment.
> 
> 
> Agree the connection is worth a comment ...
> 
> 
> 
>>   (Instead of a comment, an alternative may be a suitable
>> assertion, which then documents the connection at the same time, e.g.
>> ASSERT(!bridge == !is_hardware_domain(v->domain)). But that won't be
>> possible in e.g. vpci_sbdf_from_gpa(), where apparently a similar
>> assumption is being made.)
> 
> 
>     ... or indeed to put such ASSERT _before_ vpci_sbdf_from_gpa().
> 
> This will cover assumption being made in both places.
> 
> 
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index a9fc5817f9..1d4b1ef39e 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -37,10 +37,24 @@ static int vpci_mmio_read(struct vcpu *v, 
> mmio_info_t *info,
>                             register_t *r, void *p)
>   {
>       struct pci_host_bridge *bridge = p;
> -    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
> +    pci_sbdf_t sbdf;
>       /* data is needed to prevent a pointer cast on 32bit */
>       unsigned long data;
> 
> +    ASSERT(!bridge == !is_hardware_domain(v->domain));
> +
> +    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
> +
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
> +    {
> +        *r = ~0ul;
> +        return 1;
> +    }
> +
>       if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
>                           1U << info->dabt.size, &data) )
>       {
> @@ -57,7 +71,18 @@ static int vpci_mmio_write(struct vcpu *v, 
> mmio_info_t *info,
>                              register_t r, void *p)
>   {
>       struct pci_host_bridge *bridge = p;
> -    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
> +    pci_sbdf_t sbdf;
> +
> +    ASSERT(!bridge == !is_hardware_domain(v->domain));
> +
> +    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
> +
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
> +        return 1;
> 
>       return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
>                              1U << info->dabt.size, r);
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index d4601ecf9b..fc2c51dc3e 100644
> 
> 
> Any preference here?
> 
> 
> Personally, I think that such ASSERT will better explain the connection 
> than the comment will do.

Indeed I'd also prefer ASSERT()s being put there. But my opinion is
secondary here, as I'm not a maintainer of this code.

Jan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests
  2022-07-29  6:06             ` Jan Beulich
@ 2022-07-29 16:26               ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-07-29 16:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Roger Pau Monné,
	xen-devel


On 29.07.22 09:06, Jan Beulich wrote:

Hello Jan

> On 28.07.2022 18:35, Oleksandr wrote:
>> On 28.07.22 10:15, Jan Beulich wrote:
>>> On 27.07.2022 21:39, Oleksandr wrote:
>>>> On 27.07.22 20:54, Oleksandr wrote:
>>>>> On 26.07.22 18:16, Jan Beulich wrote:
>>>>>> On 19.07.2022 19:42, Oleksandr Tyshchenko wrote:
>>>>>>> --- a/xen/arch/arm/vpci.c
>>>>>>> +++ b/xen/arch/arm/vpci.c
>>>>>>> @@ -41,6 +41,16 @@ static int vpci_mmio_read(struct vcpu *v,
>>>>>>> mmio_info_t *info,
>>>>>>>         /* data is needed to prevent a pointer cast on 32bit */
>>>>>>>         unsigned long data;
>>>>>>>     +    /*
>>>>>>> +     * For the passed through devices we need to map their virtual
>>>>>>> SBDF
>>>>>>> +     * to the physical PCI device being passed through.
>>>>>>> +     */
>>>>>>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>>>>>>> +    {
>>>>>>> +        *r = ~0ul;
>>>>>>> +        return 1;
>>>>>>> +    }
>>>>>> I'm probably simply lacking specific Arm-side knowledge, but it strikes
>>>>>> me as odd that the need for translation would be dependent upon
>>>>>> "bridge".
>>>>> I am afraid I cannot answer immediately.
>>>>>
>>>>> I will analyze that question and provide an answer later on.
>>>> Well, most likely that "valid" bridge pointer here is just used as an
>>>> indicator of hwdom currently, so no need to perform virt->phys
>>>> translation for sbdf.
>>>>
>>>> You can see that domain_vpci_init() passes a valid value for hwdom and
>>>> NULL for other domains when setting up vpci_mmio* callbacks.
>>> Oh, I see.
>>>
>>>> Alternatively, I guess we could use "!is_hardware_domain(v->domain)"
>>>> instead of "!bridge" in the first part of that check. Shall I?
>>> Maybe simply add a comment? Surely checking "bridge" is cheaper than
>>> using is_hardware_domain(), so I can see the benefit. But the larger
>>> arm/vpci.c grows, the less obvious the connection will be without a
>>> comment.
>>
>> Agree the connection is worth a comment ...
>>
>>
>>
>>>    (Instead of a comment, an alternative may be a suitable
>>> assertion, which then documents the connection at the same time, e.g.
>>> ASSERT(!bridge == !is_hardware_domain(v->domain)). But that won't be
>>> possible in e.g. vpci_sbdf_from_gpa(), where apparently a similar
>>> assumption is being made.)
>>
>>      ... or indeed to put such ASSERT _before_ vpci_sbdf_from_gpa().
>>
>> This will cover assumption being made in both places.
>>
>>
>> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
>> index a9fc5817f9..1d4b1ef39e 100644
>> --- a/xen/arch/arm/vpci.c
>> +++ b/xen/arch/arm/vpci.c
>> @@ -37,10 +37,24 @@ static int vpci_mmio_read(struct vcpu *v,
>> mmio_info_t *info,
>>                              register_t *r, void *p)
>>    {
>>        struct pci_host_bridge *bridge = p;
>> -    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>> +    pci_sbdf_t sbdf;
>>        /* data is needed to prevent a pointer cast on 32bit */
>>        unsigned long data;
>>
>> +    ASSERT(!bridge == !is_hardware_domain(v->domain));
>> +
>> +    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>> +
>> +    /*
>> +     * For the passed through devices we need to map their virtual SBDF
>> +     * to the physical PCI device being passed through.
>> +     */
>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>> +    {
>> +        *r = ~0ul;
>> +        return 1;
>> +    }
>> +
>>        if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
>>                            1U << info->dabt.size, &data) )
>>        {
>> @@ -57,7 +71,18 @@ static int vpci_mmio_write(struct vcpu *v,
>> mmio_info_t *info,
>>                               register_t r, void *p)
>>    {
>>        struct pci_host_bridge *bridge = p;
>> -    pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>> +    pci_sbdf_t sbdf;
>> +
>> +    ASSERT(!bridge == !is_hardware_domain(v->domain));
>> +
>> +    sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>> +
>> +    /*
>> +     * For the passed through devices we need to map their virtual SBDF
>> +     * to the physical PCI device being passed through.
>> +     */
>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>> +        return 1;
>>
>>        return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
>>                               1U << info->dabt.size, r);
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index d4601ecf9b..fc2c51dc3e 100644
>>
>>
>> Any preference here?
>>
>>
>> Personally, I think that such ASSERT will better explain the connection
>> than the comment will do.
> Indeed I'd also prefer ASSERT()s being put there.

good


>   But my opinion is
> secondary here, as I'm not a maintainer of this code.


sure, let's see what the Arm maintainers will say


>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole
  2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
@ 2022-07-29 16:28   ` Oleksandr
  2022-08-03  9:29     ` Rahul Singh
  0 siblings, 1 reply; 48+ messages in thread
From: Oleksandr @ 2022-07-29 16:28 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, xen-devel


Hello Rahul


On 19.07.22 20:42, Oleksandr Tyshchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Add a stub for is_memory_hole which is required for PCI passthrough
> on Arm.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> OT: It looks like the discussion got stuck. As I understand this
> patch is not immediately needed in the context of current series
> as PCI passthrough is not enabled on Arm at the moment. So the patch
> could be added later on, but it is needed to allow PCI passthrough
> to be built on Arm for those who want to test it.
>
> Copy here some context provided by Julien:
>
> Here a summary of the discussion (+ some my follow-up thoughts):
>
> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
> "xen/pci: detect when BARs are not suitably positioned") to check
> whether the BAR are positioned outside of a valid memory range. This was
> introduced to work-around quirky firmware.
>
> In theory, this could also happen on Arm. In practice, this may not
> happen but it sounds better to sanity check that the BAR contains
> "valid" I/O range.
>
> On x86, this is implemented by checking the region is not described is
> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
> ranges. So I think it would be possible to implement is_memory_hole() by
> going through the list of hostbridges and check the ranges.
>
> But first, I'd like to confirm my understanding with Rahul, and others.


May I please ask about your opinion on that?


>
> If we were going to go this route, I would also rename the function to
> be better match what it is doing (i.e. it checks the BAR is correctly
> placed). As a potentially optimization/hardening for Arm, we could pass
> the hostbridge so we don't have to walk all of them.
> ---
>   xen/arch/arm/mm.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 009b8cd9ef..bb34b97eb5 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1708,6 +1708,12 @@ unsigned long get_upper_mfn_bound(void)
>       return max_page - 1;
>   }
>   
> +bool is_memory_hole(mfn_t start, mfn_t end)
> +{
> +    /* TODO: this needs to be properly implemented. */
> +    return true;
> +}
> +
>   /*
>    * Local variables:
>    * mode: C

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole
  2022-07-29 16:28   ` Oleksandr
@ 2022-08-03  9:29     ` Rahul Singh
  2022-08-03 14:18       ` Oleksandr
  0 siblings, 1 reply; 48+ messages in thread
From: Rahul Singh @ 2022-08-03  9:29 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, xen-devel

Hi Oleksandr,

> On 29 Jul 2022, at 5:28 pm, Oleksandr <olekstysh@gmail.com> wrote:
> 
> 
> Hello Rahul
> 
> 
> On 19.07.22 20:42, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> 
>> Add a stub for is_memory_hole which is required for PCI passthrough
>> on Arm.
>> 
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> OT: It looks like the discussion got stuck. As I understand this
>> patch is not immediately needed in the context of current series
>> as PCI passthrough is not enabled on Arm at the moment. So the patch
>> could be added later on, but it is needed to allow PCI passthrough
>> to be built on Arm for those who want to test it.
>> 
>> Copy here some context provided by Julien:
>> 
>> Here a summary of the discussion (+ some my follow-up thoughts):
>> 
>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>> "xen/pci: detect when BARs are not suitably positioned") to check
>> whether the BAR are positioned outside of a valid memory range. This was
>> introduced to work-around quirky firmware.
>> 
>> In theory, this could also happen on Arm. In practice, this may not
>> happen but it sounds better to sanity check that the BAR contains
>> "valid" I/O range.
>> 
>> On x86, this is implemented by checking the region is not described is
>> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>> ranges. So I think it would be possible to implement is_memory_hole() by
>> going through the list of hostbridges and check the ranges.
>> 
>> But first, I'd like to confirm my understanding with Rahul, and others.
> 
> 
> May I please ask about your opinion on that?

I agree with Julien we can implement the something similar to is_memory_hole()  for ARM
that will check that the bar is within the bridge ranges.

If you are okay you can discard this patch in next version of the series and I will push the patch
for review.

Regards,
Rahul 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole
  2022-08-03  9:29     ` Rahul Singh
@ 2022-08-03 14:18       ` Oleksandr
  0 siblings, 0 replies; 48+ messages in thread
From: Oleksandr @ 2022-08-03 14:18 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Oleksandr Andrushchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, xen-devel


On 03.08.22 12:29, Rahul Singh wrote:
> Hi Oleksandr,


Hello Rahul


>
>> On 29 Jul 2022, at 5:28 pm, Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> Hello Rahul
>>
>>
>> On 19.07.22 20:42, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Add a stub for is_memory_hole which is required for PCI passthrough
>>> on Arm.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ---
>>> OT: It looks like the discussion got stuck. As I understand this
>>> patch is not immediately needed in the context of current series
>>> as PCI passthrough is not enabled on Arm at the moment. So the patch
>>> could be added later on, but it is needed to allow PCI passthrough
>>> to be built on Arm for those who want to test it.
>>>
>>> Copy here some context provided by Julien:
>>>
>>> Here a summary of the discussion (+ some my follow-up thoughts):
>>>
>>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>>> "xen/pci: detect when BARs are not suitably positioned") to check
>>> whether the BAR are positioned outside of a valid memory range. This was
>>> introduced to work-around quirky firmware.
>>>
>>> In theory, this could also happen on Arm. In practice, this may not
>>> happen but it sounds better to sanity check that the BAR contains
>>> "valid" I/O range.
>>>
>>> On x86, this is implemented by checking the region is not described is
>>> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>>> ranges. So I think it would be possible to implement is_memory_hole() by
>>> going through the list of hostbridges and check the ranges.
>>>
>>> But first, I'd like to confirm my understanding with Rahul, and others.
>>
>> May I please ask about your opinion on that?
> I agree with Julien we can implement the something similar to is_memory_hole()  for ARM
> that will check that the bar is within the bridge ranges.
>
> If you are okay you can discard this patch in next version of the series and I will push the patch
> for review.

Perfect! Thank you, that would be much appreciated!


>
> Regards,
> Rahul

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2022-08-03 14:18 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-19 17:42 [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Tyshchenko
2022-07-19 17:42 ` [PATCH V7 01/11] xen/pci: arm: add stub for is_memory_hole Oleksandr Tyshchenko
2022-07-29 16:28   ` Oleksandr
2022-08-03  9:29     ` Rahul Singh
2022-08-03 14:18       ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 02/11] vpci: add hooks for PCI device assign/de-assign Oleksandr Tyshchenko
2022-07-27 10:03   ` Jan Beulich
2022-07-27 14:01     ` Oleksandr
2022-07-27 14:35       ` Jan Beulich
2022-07-27 16:49         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 03/11] vpci/header: implement guest BAR register handlers Oleksandr Tyshchenko
2022-07-27 10:15   ` Jan Beulich
2022-07-27 16:17     ` Oleksandr
2022-07-28  7:01       ` Jan Beulich
2022-07-28 14:56         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 04/11] rangeset: add RANGESETF_no_print flag Oleksandr Tyshchenko
2022-07-26 14:48   ` Rahul Singh
2022-07-19 17:42 ` [PATCH V7 05/11] vpci/header: handle p2m range sets per BAR Oleksandr Tyshchenko
2022-07-19 17:42 ` [PATCH V7 06/11] vpci/header: program p2m with guest BAR view Oleksandr Tyshchenko
2022-07-27 10:19   ` Jan Beulich
2022-07-27 17:06     ` Oleksandr
2022-07-28  7:04       ` Jan Beulich
2022-07-19 17:42 ` [PATCH V7 07/11] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Tyshchenko
2022-07-26 15:30   ` Jan Beulich
2022-07-27 17:30     ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 08/11] vpci/header: reset the command register when adding devices Oleksandr Tyshchenko
2022-07-26 15:09   ` Rahul Singh
2022-07-26 15:23   ` Jan Beulich
2022-07-27  8:58     ` Oleksandr
2022-07-27  9:46       ` Jan Beulich
2022-07-27 16:53         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 09/11] vpci: add initial support for virtual PCI bus topology Oleksandr Tyshchenko
2022-07-27 10:32   ` Jan Beulich
2022-07-28 14:16     ` Oleksandr
2022-07-28 14:26       ` Jan Beulich
2022-07-28 14:41         ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 10/11] xen/arm: translate virtual PCI bus topology for guests Oleksandr Tyshchenko
2022-07-26 15:16   ` Jan Beulich
2022-07-27 17:54     ` Oleksandr
2022-07-27 19:39       ` Oleksandr
2022-07-28  7:15         ` Jan Beulich
2022-07-28 16:35           ` Oleksandr
2022-07-29  6:06             ` Jan Beulich
2022-07-29 16:26               ` Oleksandr
2022-07-19 17:42 ` [PATCH V7 11/11] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Tyshchenko
2022-07-26 14:50   ` Rahul Singh
2022-07-26 13:47 ` [PATCH V7 00/11] PCI devices passthrough on Arm, part 3 Rahul Singh
2022-07-26 15:18   ` Oleksandr Tyshchenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.