* [PATCH v3 00/11] PCI devices passthrough on Arm, part 3
@ 2021-09-30  7:52 Oleksandr Andrushchenko
  2021-09-30  7:52 ` [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function Oleksandr Andrushchenko
                   ` (10 more replies)
  0 siblings, 11 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Hi, all!

This patch series focuses on vPCI and adds support for non-identity
PCI BAR mappings, which are required when passing through a PCI device to
a guest. The highlights are:

- Add relevant vpci register handlers when assigning PCI device to a domain
  and remove those when de-assigning. This allows having different
  handlers for different domains, e.g. hwdom and other guests.

- Emulate guest BAR register values based on physical BAR values.
  This allows creating a guest view of the registers and emulates
  size and properties probe as it is done during PCI device enumeration by
  the guest.

- Instead of handling a single range set that contains all the memory
  regions of all the BARs and ROM, have one range set per BAR.

- Take into account guest's BAR view and program its p2m accordingly:
  gfn is guest's view of the BAR and mfn is the physical BAR value as set
  up by the host bridge in the hardware domain.
  This way the hardware domain sees physical BAR values and the guest sees
  emulated ones.
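
As an illustration of the gfn/mfn split described in the last item, here
is a minimal sketch. It is not code from the series: struct bar_view,
struct p2m_range, bar_to_p2m() and the SKETCH_* constants are hypothetical
names, and 4 KiB pages are assumed. The guest frame is derived from the
value the guest programmed, the machine frame from the physical BAR, and
the hardware domain keeps an identity mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12        /* assumption: 4 KiB pages */
#define SKETCH_BAR_FLAGS  0x0fULL   /* low flag bits of a memory BAR */

/* Hypothetical, reduced view of a BAR; not the series' struct vpci_bar. */
struct bar_view {
    uint64_t addr;        /* physical BAR value as seen by hwdom */
    uint64_t guest_addr;  /* value programmed by the guest */
    uint64_t size;        /* BAR size in bytes */
};

struct p2m_range {
    uint64_t gfn;  /* first guest frame */
    uint64_t mfn;  /* first machine frame */
    uint64_t nr;   /* number of frames to map */
};

/* Compute the frames to map for one BAR. */
static struct p2m_range bar_to_p2m(const struct bar_view *bar, bool is_hwdom)
{
    struct p2m_range r;
    /* hwdom is mapped 1:1, a guest uses its own view of the BAR. */
    uint64_t guest = is_hwdom ? bar->addr : bar->guest_addr;

    assert(bar->size != 0);
    r.mfn = (bar->addr & ~SKETCH_BAR_FLAGS) >> SKETCH_PAGE_SHIFT;
    r.gfn = (guest & ~SKETCH_BAR_FLAGS) >> SKETCH_PAGE_SHIFT;
    r.nr = (bar->size + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
    return r;
}
```

With a 16 KiB BAR at physical 0xe0000000 that the guest placed at
0x23000000, the sketch yields gfn 0x23000, mfn 0xe0000 and 4 frames for
the guest, and gfn == mfn for the hardware domain.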

The series also adds support for virtual PCI bus topology for guests:
 - We emulate a single host bridge for the guest, so segment is always 0.
 - The implementation is limited to 32 devices which are allowed on
   a single PCI bus.
 - The virtual bus number is set to 0, so virtual devices are seen
   as embedded endpoints behind the root complex.
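
The constraints above (segment 0, bus 0, at most 32 devices) can be
modelled with a per-domain bitmap of virtual device slots. The following
is only a sketch under those assumptions: struct vbus, vbus_alloc_slot(),
vbus_free_slot() and vbus_sbdf() are hypothetical names, not the
interfaces added by this series.

```c
#include <assert.h>
#include <stdint.h>

#define VBUS_SLOTS 32u  /* devices allowed on a single PCI bus */

/* Hypothetical per-guest state: bit n set means virtual device n is used. */
struct vbus {
    uint32_t used;
};

/* Return a virtual devfn (device << 3, function 0) or -1 if the bus is full. */
static int vbus_alloc_slot(struct vbus *bus)
{
    unsigned int dev;

    for (dev = 0; dev < VBUS_SLOTS; dev++) {
        if (!(bus->used & (1u << dev))) {
            bus->used |= 1u << dev;
            return (int)(dev << 3);
        }
    }
    return -1;
}

/* Release the slot that backs the given virtual devfn. */
static void vbus_free_slot(struct vbus *bus, unsigned int devfn)
{
    bus->used &= ~(1u << ((devfn & 0xffu) >> 3));
}

/* Guest-visible SBDF: segment and bus are always 0 in this model. */
static uint32_t vbus_sbdf(unsigned int devfn)
{
    return (0u << 16) | (0u << 8) | (devfn & 0xffu);
}
```

The first free slot is handed out on assignment and returned on
de-assignment, so a guest never sees more than 32 passed-through devices
and never sees the physical bus numbers.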

The series was also tested on the following configurations and does not
break them:
 - x86 PVH Dom0
 - x86 HVM with PCI passthrough to DomU

Thank you,
Oleksandr

Oleksandr Andrushchenko (11):
  vpci: Make vpci registers removal a dedicated function
  vpci: Add hooks for PCI device assign/de-assign
  vpci/header: Move register assignments from init_bars
  vpci/header: Add and remove register handlers dynamically
  vpci/header: Implement guest BAR register handlers
  vpci/header: Handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: Emulate PCI_COMMAND register for guests
  vpci/header: Reset the command register when adding devices
  vpci: Add initial support for virtual PCI bus topology
  xen/arm: Translate virtual PCI bus topology for guests

 xen/arch/arm/domain.c         |   1 +
 xen/arch/arm/vpci.c           |  86 ++++++-
 xen/arch/arm/vpci.h           |   3 +
 xen/common/domain.c           |   3 +
 xen/drivers/Kconfig           |   4 +
 xen/drivers/passthrough/pci.c |  94 ++++++++
 xen/drivers/vpci/header.c     | 411 +++++++++++++++++++++++++++-------
 xen/drivers/vpci/vpci.c       |  42 +++-
 xen/include/asm-arm/pci.h     |   1 +
 xen/include/xen/pci.h         |  23 ++
 xen/include/xen/sched.h       |  10 +
 xen/include/xen/vpci.h        |  36 ++-
 12 files changed, 623 insertions(+), 91 deletions(-)

-- 
2.25.1




* [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-13 11:11   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

This is in preparation for dynamic assignment of the vpci register
handlers depending on the domain: hwdom or guest.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
---
Since v1:
 - constify struct pci_dev where possible
---
 xen/drivers/vpci/vpci.c | 7 ++++++-
 xen/include/xen/vpci.h  | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index cbd1bac7fc33..1666402d55b8 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,7 +35,7 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
-void vpci_remove_device(struct pci_dev *pdev)
+void vpci_remove_device_registers(const struct pci_dev *pdev)
 {
     spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
@@ -48,6 +48,11 @@ void vpci_remove_device(struct pci_dev *pdev)
         xfree(r);
     }
     spin_unlock(&pdev->vpci->lock);
+}
+
+void vpci_remove_device(struct pci_dev *pdev)
+{
+    vpci_remove_device_registers(pdev);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 9f5b5d52e159..2e910d0b1f90 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -28,6 +28,8 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+/* Remove all handlers for the device given. */
+void vpci_remove_device_registers(const struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register(struct vpci *vpci,
-- 
2.25.1




* [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  2021-09-30  7:52 ` [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-09-30  8:21   ` Jan Beulich
  2021-10-13 11:29   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 03/11] vpci/header: Move register assignments from init_bars Oleksandr Andrushchenko
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

When a PCI device gets assigned or de-assigned, some work needs to be done
on the vPCI side for that device. Introduce a pair of hooks so vPCI can
handle that.

Please note that in the current design the error path is handled by
the toolstack via XEN_DOMCTL_assign_device/XEN_DOMCTL_deassign_device.
It is therefore acceptable not to de-assign a device if vPCI's assign
fails: the rollback is handled by deassign_device when it is called by
the toolstack.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/Kconfig           |  4 ++++
 xen/drivers/passthrough/pci.c |  9 +++++++++
 xen/drivers/vpci/vpci.c       | 23 +++++++++++++++++++++++
 xen/include/xen/vpci.h        | 20 ++++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..780490cf8e39 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
 	bool
 
+config HAS_VPCI_GUEST_SUPPORT
+	bool
+	depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 9f804a50e780..805ab86ed555 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -870,6 +870,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
     if ( ret )
         goto out;
 
+    ret = vpci_deassign_device(d, pdev);
+    if ( ret )
+        goto out;
+
     if ( pdev->domain == hardware_domain  )
         pdev->quarantine = false;
 
@@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
         rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
     }
 
+    if ( rc )
+        goto done;
+
+    rc = vpci_assign_device(d, pdev);
+
  done:
     if ( rc )
         printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1666402d55b8..0fe86cb30d23 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
 
     return rc;
 }
+
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned to guest. */
+int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
+{
+    /* It only makes sense to assign for hwdom or guest domain. */
+    if ( is_system_domain(d) || !has_vpci(d) )
+        return 0;
+
+    return 0;
+}
+
+/* Notify vPCI that device is de-assigned from guest. */
+int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
+{
+    /* It only makes sense to de-assign from hwdom or guest domain. */
+    if ( is_system_domain(d) || !has_vpci(d) )
+        return 0;
+
+    return 0;
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 2e910d0b1f90..ecc08f2c0f65 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -242,6 +242,26 @@ static inline bool vpci_process_pending(struct vcpu *v)
 }
 #endif
 
+#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_HAS_VPCI_GUEST_SUPPORT)
+/* Notify vPCI that device is assigned/de-assigned to/from guest. */
+int __must_check vpci_assign_device(struct domain *d,
+                                    const struct pci_dev *dev);
+int __must_check vpci_deassign_device(struct domain *d,
+                                      const struct pci_dev *dev);
+#else
+static inline int vpci_assign_device(struct domain *d,
+                                     const struct pci_dev *dev)
+{
+    return 0;
+};
+
+static inline int vpci_deassign_device(struct domain *d,
+                                       const struct pci_dev *dev)
+{
+    return 0;
+};
+#endif
+
 #endif
 
 /*
-- 
2.25.1




* [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  2021-09-30  7:52 ` [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function Oleksandr Andrushchenko
  2021-09-30  7:52 ` [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-13 13:51   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically Oleksandr Andrushchenko
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

This is in preparation for dynamic assignment of the vPCI register
handlers depending on the domain: hwdom or guest.
Keeping all handler registration in one place makes it easy for the
subsequent patches to decide which handlers to install, e.g. hwdom or
guest handlers.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v1:
 - constify struct pci_dev where possible
 - extend patch description
---
 xen/drivers/vpci/header.c | 83 ++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 27 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f8cd55e7c024..3d571356397a 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -445,6 +445,55 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static int add_bar_handlers(const struct pci_dev *pdev)
+{
+    unsigned int i;
+    struct vpci_header *header = &pdev->vpci->header;
+    struct vpci_bar *bars = header->bars;
+    int rc;
+
+    /* Setup a handler for the command register. */
+    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
+                           2, header);
+    if ( rc )
+        return rc;
+
+    if ( pdev->ignore_bars )
+        return 0;
+
+    for ( i = 0; i < PCI_HEADER_NORMAL_NR_BARS + 1; i++ )
+    {
+        if ( (bars[i].type == VPCI_BAR_IO) || (bars[i].type == VPCI_BAR_EMPTY) )
+            continue;
+
+        if ( bars[i].type == VPCI_BAR_ROM )
+        {
+            unsigned int rom_reg;
+            uint8_t header_type = pci_conf_read8(pdev->sbdf,
+                                                 PCI_HEADER_TYPE) & 0x7f;
+            if ( header_type == PCI_HEADER_TYPE_NORMAL )
+                rom_reg = PCI_ROM_ADDRESS;
+            else
+                rom_reg = PCI_ROM_ADDRESS1;
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
+                                   rom_reg, 4, &bars[i]);
+            if ( rc )
+                return rc;
+        }
+        else
+        {
+            uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
+
+            /* This is either VPCI_BAR_MEM32 or VPCI_BAR_MEM64_{LO|HI}. */
+            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
+                                   4, &bars[i]);
+            if ( rc )
+                return rc;
+        }
+    }
+    return 0;
+}
+
 static int init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -470,14 +519,8 @@ static int init_bars(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
-    /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
-    if ( rc )
-        return rc;
-
     if ( pdev->ignore_bars )
-        return 0;
+        return add_bar_handlers(pdev);
 
     /* Disable memory decoding before sizing. */
     cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
@@ -492,14 +535,6 @@ static int init_bars(struct pci_dev *pdev)
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
             bars[i].type = VPCI_BAR_MEM64_HI;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
-            if ( rc )
-            {
-                pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-                return rc;
-            }
-
             continue;
         }
 
@@ -532,14 +567,6 @@ static int init_bars(struct pci_dev *pdev)
         bars[i].addr = addr;
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
-
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
-                               &bars[i]);
-        if ( rc )
-        {
-            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
-            return rc;
-        }
     }
 
     /* Check expansion ROM. */
@@ -553,11 +580,13 @@ static int init_bars(struct pci_dev *pdev)
         rom->addr = addr;
         header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
                               PCI_ROM_ADDRESS_ENABLE;
+    }
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
-                               4, rom);
-        if ( rc )
-            rom->type = VPCI_BAR_EMPTY;
+    rc = add_bar_handlers(pdev);
+    if ( rc )
+    {
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+        return rc;
     }
 
     return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
-- 
2.25.1




* [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (2 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 03/11] vpci/header: Move register assignments from init_bars Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-01 13:26   ` Jan Beulich
  2021-10-25 15:48   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers Oleksandr Andrushchenko
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Use stubs for guest domains for now.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
---
 xen/drivers/vpci/header.c | 72 ++++++++++++++++++++++++++++++++++-----
 xen/drivers/vpci/vpci.c   |  4 +--
 xen/include/xen/vpci.h    |  8 +++++
 3 files changed, 74 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 3d571356397a..1ce98795fcca 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -397,6 +397,17 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
     pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t val, void *data)
+{
+}
+
+static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
+                               void *data)
+{
+    return 0xffffffff;
+}
+
 static void rom_write(const struct pci_dev *pdev, unsigned int reg,
                       uint32_t val, void *data)
 {
@@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
-static int add_bar_handlers(const struct pci_dev *pdev)
+static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t val, void *data)
+{
+}
+
+static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
+                               void *data)
+{
+    return 0xffffffff;
+}
+
+static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
 {
     unsigned int i;
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
 
-    /* Setup a handler for the command register. */
+    /* Setup a handler for the command register: same for hwdom and guests. */
     rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
                            2, header);
     if ( rc )
@@ -475,8 +497,13 @@ static int add_bar_handlers(const struct pci_dev *pdev)
                 rom_reg = PCI_ROM_ADDRESS;
             else
                 rom_reg = PCI_ROM_ADDRESS1;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
-                                   rom_reg, 4, &bars[i]);
+            if ( is_hwdom )
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
+                                       rom_reg, 4, &bars[i]);
+            else
+                rc = vpci_add_register(pdev->vpci,
+                                       guest_rom_read, guest_rom_write,
+                                       rom_reg, 4, &bars[i]);
             if ( rc )
                 return rc;
         }
@@ -485,8 +512,13 @@ static int add_bar_handlers(const struct pci_dev *pdev)
             uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
 
             /* This is either VPCI_BAR_MEM32 or VPCI_BAR_MEM64_{LO|HI}. */
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
+            if ( is_hwdom )
+                rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write,
+                                       reg, 4, &bars[i]);
+            else
+                rc = vpci_add_register(pdev->vpci,
+                                       guest_bar_read, guest_bar_write,
+                                       reg, 4, &bars[i]);
             if ( rc )
                 return rc;
         }
@@ -520,7 +552,7 @@ static int init_bars(struct pci_dev *pdev)
     }
 
     if ( pdev->ignore_bars )
-        return add_bar_handlers(pdev);
+        return add_bar_handlers(pdev, true);
 
     /* Disable memory decoding before sizing. */
     cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
@@ -582,7 +614,7 @@ static int init_bars(struct pci_dev *pdev)
                               PCI_ROM_ADDRESS_ENABLE;
     }
 
-    rc = add_bar_handlers(pdev);
+    rc = add_bar_handlers(pdev, true);
     if ( rc )
     {
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
@@ -593,6 +625,30 @@ static int init_bars(struct pci_dev *pdev)
 }
 REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
+{
+    int rc;
+
+    /* Remove previously added registers. */
+    vpci_remove_device_registers(pdev);
+
+    rc = add_bar_handlers(pdev, is_hardware_domain(d));
+    if ( rc )
+        gdprintk(XENLOG_ERR,
+                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
+                 &pdev->sbdf, d, rc);
+    return rc;
+}
+
+int vpci_bar_remove_handlers(const struct domain *d, const struct pci_dev *pdev)
+{
+    /* Remove previously added registers. */
+    vpci_remove_device_registers(pdev);
+    return 0;
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 0fe86cb30d23..702f7b5d5dda 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -95,7 +95,7 @@ int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
-    return 0;
+    return vpci_bar_add_handlers(d, dev);
 }
 
 /* Notify vPCI that device is de-assigned from guest. */
@@ -105,7 +105,7 @@ int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
-    return 0;
+    return vpci_bar_remove_handlers(d, dev);
 }
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index ecc08f2c0f65..fd822c903af5 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -57,6 +57,14 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
  */
 bool __must_check vpci_process_pending(struct vcpu *v);
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Add/remove BAR handlers for a domain. */
+int vpci_bar_add_handlers(const struct domain *d,
+                          const struct pci_dev *pdev);
+int vpci_bar_remove_handlers(const struct domain *d,
+                             const struct pci_dev *pdev);
+#endif
+
 struct vpci {
     /* List of vPCI handlers for a device. */
     struct list_head handlers;
-- 
2.25.1




* [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (3 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-01 13:31   ` Jan Beulich
  2021-10-26  7:50   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR Oleksandr Andrushchenko
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.

The ROM BAR is only handled for the hardware domain; for guest domains
there is a stub. At the moment PCI expansion ROM is x86 only, so it
might not be usable on other architectures without emulating x86. Other
use-cases include using the expansion ROM before Xen boots, in which
case no emulation is needed in Xen itself, or a guest wanting to use the
ROM code, which seems to be rare.
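
To illustrate the size probe mentioned above, here is a simplified
stand-alone model of a 32-bit memory BAR. It is a sketch, not the code
in this patch: struct model_bar, model_bar_write(), model_bar_read() and
probe_bar_size() are hypothetical names, and the 64-bit high half is left
out. The write path masks the value by the BAR size, as guest_bar_write()
does, so that a guest writing all ones reads back the size mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BAR_FLAGS    0x0fu  /* low four bits: type and prefetch flags */
#define MODEL_BAR_PREFETCH 0x08u

/* Hypothetical, reduced BAR state for a 32-bit memory BAR. */
struct model_bar {
    uint64_t size;        /* power of two, taken from the physical BAR */
    uint64_t guest_addr;  /* guest view, including the flag bits */
    bool prefetchable;
};

/* Do the work on write: keep only size-aligned address bits plus flags. */
static void model_bar_write(struct model_bar *bar, uint32_t val)
{
    assert(bar->size && !(bar->size & (bar->size - 1)));

    val &= ~MODEL_BAR_FLAGS;
    if (bar->prefetchable)
        val |= MODEL_BAR_PREFETCH;

    bar->guest_addr = val;
    bar->guest_addr &= ~(bar->size - 1) | MODEL_BAR_FLAGS;
}

/* Reads just return the stored guest view. */
static uint32_t model_bar_read(const struct model_bar *bar)
{
    return (uint32_t)bar->guest_addr;
}

/* What an enumerating guest does: save, write all ones, read, restore. */
static uint64_t probe_bar_size(struct model_bar *bar)
{
    uint32_t saved = model_bar_read(bar);
    uint32_t mask;

    model_bar_write(bar, 0xffffffffu);
    mask = model_bar_read(bar) & ~MODEL_BAR_FLAGS;
    model_bar_write(bar, saved);

    return (uint64_t)(uint32_t)(~mask + 1u);
}
```

Because the masking happens on write, the read handler stays trivial,
which matches the intent of doing more work on write than on read.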

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
---
Since v1:
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment

---
 xen/drivers/vpci/header.c | 30 +++++++++++++++++++++++++++++-
 xen/include/xen/vpci.h    |  3 +++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 1ce98795fcca..ec4d215f36ff 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -400,12 +400,38 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
 static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
                             uint32_t val, void *data)
 {
+    struct vpci_bar *bar = data;
+    bool hi = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+    else
+    {
+        val &= PCI_BASE_ADDRESS_MEM_MASK;
+        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
+                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
+        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+    }
+
+    bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
+    bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
+
+    bar->guest_addr &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
 }
 
 static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
                                void *data)
 {
-    return 0xffffffff;
+    const struct vpci_bar *bar = data;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+        return bar->guest_addr >> 32;
+
+    return bar->guest_addr;
 }
 
 static void rom_write(const struct pci_dev *pdev, unsigned int reg,
@@ -522,6 +548,8 @@ static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
             if ( rc )
                 return rc;
         }
+
+        bars[i].guest_addr = 0;
     }
     return 0;
 }
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index fd822c903af5..a0320b22cb36 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -75,7 +75,10 @@ struct vpci {
     struct vpci_header {
         /* Information about the PCI BARs of this device. */
         struct vpci_bar {
+            /* Physical view of the BAR. */
             uint64_t addr;
+            /* Guest view of the BAR. */
+            uint64_t guest_addr;
             uint64_t size;
             enum {
                 VPCI_BAR_EMPTY,
-- 
2.25.1




* [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (4 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-25 11:51   ` Oleksandr Andrushchenko
  2021-10-26  9:08   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Instead of handling a single range set that contains all the memory
regions of all the BARs and ROM, have one range set per BAR.

This is in preparation for making non-identity mappings in p2m for the
MMIOs/ROM.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 xen/drivers/vpci/header.c | 172 ++++++++++++++++++++++++++------------
 xen/include/xen/vpci.h    |   3 +-
 2 files changed, 122 insertions(+), 53 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ec4d215f36ff..9c603d26d302 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,49 +131,75 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    if ( v->vpci.num_mem_ranges )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+        struct pci_dev *pdev = v->vpci.pdev;
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
 
-        if ( rc == -ERESTART )
-            return true;
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+            if ( !bar->mem )
+                continue;
 
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
-        if ( rc )
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
-             */
-            vpci_remove_device(v->vpci.pdev);
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+                return true;
+
+            spin_lock(&pdev->vpci->lock);
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(pdev,
+                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+                            !rc && v->vpci.rom_only);
+            spin_unlock(&pdev->vpci->lock);
+
+            rangeset_destroy(bar->mem);
+            bar->mem = NULL;
+            v->vpci.num_mem_ranges--;
+            if ( rc )
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this is
+                 * safe for Dom0, for DomUs the domain will likely need to be
+                 * killed in order to avoid leaking stale p2m mappings on
+                 * failure.
+                 */
+                vpci_remove_device(pdev);
+        }
     }
 
     return false;
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
-        process_pending_softirqs();
-    rangeset_destroy(mem);
+        if ( !bar->mem )
+            continue;
+
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+            process_pending_softirqs();
+        rangeset_destroy(bar->mem);
+        bar->mem = NULL;
+    }
     if ( !rc )
         modify_decoding(pdev, cmd, false);
 
@@ -181,7 +207,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only, uint8_t num_mem_ranges)
 {
     struct vcpu *curr = current;
 
@@ -192,9 +218,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
+    curr->vpci.num_mem_ranges = num_mem_ranges;
     /*
      * Raise a scheduler softirq in order to prevent the guest from resuming
      * execution with pending mapping operations, to trigger the invocation
@@ -206,42 +232,47 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
 static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    uint8_t num_mem_ranges;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
+     * Create a rangeset per BAR that represents the current device memory region
      * and compare it against all the currently active BAR memory regions. If
      * an overlap is found, subtract it from the region to be mapped/unmapped.
      *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with all the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        bar->mem = NULL;
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        bar->mem = rangeset_new(NULL, NULL, 0);
+        if ( !bar->mem )
+        {
+            rc = -ENOMEM;
+            goto fail;
+        }
+
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
         {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            goto fail;
         }
     }
 
@@ -252,14 +283,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( !bar->mem )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                goto fail;
+            }
         }
     }
 
@@ -291,7 +329,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -300,13 +339,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
-                return rc;
+                goto fail;
             }
         }
     }
@@ -324,12 +362,42 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Find out how many memory ranges are left after MSI-X and overlaps. */
+    num_mem_ranges = 0;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( !rangeset_is_empty(bar->mem) )
+            num_mem_ranges++;
+    }
+
+    /*
+     * There are cases when a PCI device, a root port for example, has neither
+     * memory space nor IO. In this case the PCI command register write is
+     * missed, leaving the underlying PCI device non-functional, so:
+     *   - if there are no regions write the command register now
+     *   - if there are regions then defer work and write later on
+     */
+    if ( !num_mem_ranges )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only, num_mem_ranges);
 
     return 0;
+
+fail:
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        rangeset_destroy(bar->mem);
+        bar->mem = NULL;
+    }
+    return rc;
 }
 
 static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index a0320b22cb36..352e02d0106d 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -80,6 +80,7 @@ struct vpci {
             /* Guest view of the BAR. */
             uint64_t guest_addr;
             uint64_t size;
+            struct rangeset *mem;
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -154,9 +155,9 @@ struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    uint8_t num_mem_ranges;
     bool rom_only : 1;
 };
 
-- 
2.25.1




* [PATCH v3 07/11] vpci/header: program p2m with guest BAR view
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (5 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-01 13:38   ` Jan Beulich
  2021-10-26 10:35   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Take the guest's BAR view into account and program its p2m accordingly:
gfn is the guest's view of the BAR and mfn is the physical BAR value as
set up by the host bridge in the hardware domain.
This way the hardware domain sees physical BAR values and the guest sees
emulated ones.
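
As a rough standalone sketch of the address arithmetic (not the Xen code
itself): a machine frame s inside the BAR maps to the guest frame that
sits at the same offset from the guest's BAR start. The struct bar_view
and bar_mfn_to_gfn names are hypothetical, and plain integers stand in
for gfn_t/mfn_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-BAR mapping state. */
struct bar_view {
    uint64_t start_gfn; /* first frame of the BAR as seen by the guest */
    uint64_t start_mfn; /* first physical frame of the BAR */
};

/* Translate machine frame s inside the BAR into the matching guest frame. */
uint64_t bar_mfn_to_gfn(const struct bar_view *bar, uint64_t s)
{
    /* s must not lie below the physical start of the BAR. */
    assert(s >= bar->start_mfn);

    return bar->start_gfn + (s - bar->start_mfn);
}
```

For the hardware domain start_gfn equals start_mfn, so the translation
degenerates to the identity mapping used before this patch.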

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments
---
 xen/drivers/vpci/header.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 9c603d26d302..f23c956cde6c 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -30,6 +30,10 @@
 
 struct map_data {
     struct domain *d;
+    /* Start address of the BAR as seen by the guest. */
+    gfn_t start_gfn;
+    /* Physical start address of the BAR. */
+    mfn_t start_mfn;
     bool map;
 };
 
@@ -37,12 +41,28 @@ static int map_range(unsigned long s, unsigned long e, void *data,
                      unsigned long *c)
 {
     const struct map_data *map = data;
+    gfn_t start_gfn;
     int rc;
 
     for ( ; ; )
     {
         unsigned long size = e - s + 1;
 
+        /*
+         * Any BAR may have holes in the memory we want to map: e.g. we
+         * don't want to map MSI-X regions which may be a part of that BAR,
+         * as happens when a single BAR is used for both MMIO and MSI-X.
+         * In this case MSI-X regions are subtracted from the mapping, but
+         * map->start_gfn still points to the very beginning of the BAR.
+         * So if there is a hole present then we need to adjust start_gfn
+         * to reflect that subtraction.
+         */
+        start_gfn = gfn_add(map->start_gfn, s - mfn_x(map->start_mfn));
+
+        printk(XENLOG_G_DEBUG
+               "%smap [%lx, %lx] -> %#"PRI_gfn" for d%d\n",
+               map->map ? "" : "un", s, e, gfn_x(start_gfn),
+               map->d->domain_id);
         /*
          * ARM TODOs:
          * - On ARM whether the memory is prefetchable or not should be passed
@@ -52,8 +72,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
          * - {un}map_mmio_regions doesn't support preemption.
          */
 
-        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+        rc = map->map ? map_mmio_regions(map->d, start_gfn,
+                                         size, _mfn(s))
+                      : unmap_mmio_regions(map->d, start_gfn,
+                                           size, _mfn(s));
         if ( rc == 0 )
         {
             *c += size;
@@ -69,6 +91,7 @@ static int map_range(unsigned long s, unsigned long e, void *data,
         ASSERT(rc < size);
         *c += rc;
         s += rc;
+        gfn_add(map->start_gfn, rc);
         if ( general_preempt_check() )
                 return -ERESTART;
     }
@@ -149,6 +172,10 @@ bool vpci_process_pending(struct vcpu *v)
             if ( !bar->mem )
                 continue;
 
+            data.start_gfn =
+                 _gfn(PFN_DOWN(is_hardware_domain(v->vpci.pdev->domain)
+                               ? bar->addr : bar->guest_addr));
+            data.start_mfn = _mfn(PFN_DOWN(bar->addr));
             rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
             if ( rc == -ERESTART )
@@ -194,6 +221,9 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
         if ( !bar->mem )
             continue;
 
+        data.start_gfn = _gfn(PFN_DOWN(is_hardware_domain(d)
+                                       ? bar->addr : bar->guest_addr));
+        data.start_mfn = _mfn(PFN_DOWN(bar->addr));
         while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
                                               &data)) == -ERESTART )
             process_pending_softirqs();
-- 
2.25.1




* [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (6 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-26 10:52   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 09/11] vpci/header: Reset the command register when adding devices Oleksandr Andrushchenko
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add basic PCI_COMMAND emulation support for guests. At the moment only
the PCI_COMMAND_INTX_DISABLE bit is emulated; the rest is not emulated
yet and is left as a TODO.
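
The intended rule can be sketched as a pure function, shown here only as
an illustration: a guest request to enable INTx is overridden when the
host has INTx disabled or when MSI/MSI-X is in use. The function name
emulate_intx, its parameters and the CMD_INTX_DISABLE macro are
hypothetical; only the bit position (bit 10 of the command register)
comes from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of the PCI command register: Interrupt Disable. */
#define CMD_INTX_DISABLE 0x0400u

static_assert(CMD_INTX_DISABLE == (1u << 10), "INTx disable is bit 10");

/*
 * Return the command value to write to hardware, given the guest's
 * requested value, the host's current value and whether MSI/MSI-X is
 * enabled for the device.
 */
uint16_t emulate_intx(uint16_t guest_cmd, uint16_t host_cmd,
                      bool msi_or_msix_enabled)
{
    /* Only a request to enable INTx (bit cleared) needs filtering. */
    if ( !(guest_cmd & CMD_INTX_DISABLE) &&
         (msi_or_msix_enabled || (host_cmd & CMD_INTX_DISABLE)) )
        guest_cmd |= CMD_INTX_DISABLE;

    return guest_cmd;
}
```

A guest write that already has the disable bit set passes through
unchanged, so the guest can always turn INTx off.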

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
---
New in v2
---
 xen/drivers/vpci/header.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f23c956cde6c..754aeb5a584f 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t cmd, void *data)
+{
+    /* TODO: Add proper emulation for all bits of the command register. */
+
+    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
+    {
+        /*
+         * Guest wants to enable INTx. It can't be enabled if:
+         *  - host has INTx disabled
+         *  - MSI/MSI-X enabled
+         */
+        if ( pdev->vpci->msi->enabled )
+            cmd |= PCI_COMMAND_INTX_DISABLE;
+        else
+        {
+            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
+
+            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
+                cmd |= PCI_COMMAND_INTX_DISABLE;
+        }
+    }
+
+    cmd_write(pdev, reg, cmd, data);
+}
+
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
                       uint32_t val, void *data)
 {
@@ -598,9 +624,12 @@ static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
     struct vpci_bar *bars = header->bars;
     int rc;
 
-    /* Setup a handler for the command register: same for hwdom and guests. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    if ( is_hwdom )
+        rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
+                               PCI_COMMAND, 2, header);
+    else
+        rc = vpci_add_register(pdev->vpci, vpci_hw_read16, guest_cmd_write,
+                               PCI_COMMAND, 2, header);
     if ( rc )
         return rc;
 
-- 
2.25.1




* [PATCH v3 09/11] vpci/header: Reset the command register when adding devices
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (7 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-10-26 11:00   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology Oleksandr Andrushchenko
  2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
  10 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Reset the command register when passing through a PCI device:
the memory decoding bits in the command register may already be set
at the time the device is passed through. In that case a guest OS
may never write to the command register to update memory decoding,
so the guest mappings (the guest's view of the BARs) are left
not updated.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
---
Since v1:
 - do not write 0 to the command register, but respect host settings.
---
 xen/drivers/vpci/header.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 754aeb5a584f..70d911b147e1 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -451,8 +451,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
-static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
-                            uint32_t cmd, void *data)
+static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
 {
     /* TODO: Add proper emulation for all bits of the command register. */
 
@@ -467,14 +466,20 @@ static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
             cmd |= PCI_COMMAND_INTX_DISABLE;
         else
         {
-            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
+            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
 
             if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
                 cmd |= PCI_COMMAND_INTX_DISABLE;
         }
     }
 
-    cmd_write(pdev, reg, cmd, data);
+    return cmd;
+}
+
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t cmd, void *data)
+{
+    cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data);
 }
 
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
@@ -793,6 +798,10 @@ int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
         gdprintk(XENLOG_ERR,
                  "%pp: failed to add BAR handlers for dom%pd: %d\n",
                  &pdev->sbdf, d, rc);
+
+    /* Reset the command register with respect to host settings. */
+    pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0));
+
     return rc;
 }
 
-- 
2.25.1




* [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (8 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 09/11] vpci/header: Reset the command register when adding devices Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-09-30  8:51   ` Jan Beulich
  2021-10-26 11:33   ` Roger Pau Monné
  2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
  10 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign an SBDF with bus 0 to each PCI device being passed through.
In the resulting topology PCIe devices reside on bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to the 32 devices which are allowed on
a single PCI bus.
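
A minimal sketch of the slot allocation described above, assuming a
simple per-domain counter and function 0 for every device. The struct
vbus and vbus_alloc_devfn names are hypothetical and only model the
idea; devfn uses the usual PCI encoding (slot in bits 7:3, function in
bits 2:0).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VBUS_MAX_SLOTS 32u /* devices allowed on a single PCI bus */

/* Hypothetical per-domain state of the virtual bus 0. */
struct vbus {
    unsigned int next_slot;
};

/* Hand out the next free slot on virtual bus 0 as a devfn, function 0. */
int vbus_alloc_devfn(struct vbus *bus, uint8_t *devfn)
{
    assert(bus != NULL && devfn != NULL);

    if ( bus->next_slot >= VBUS_MAX_SLOTS )
        return -ENOSPC;

    *devfn = (uint8_t)(bus->next_slot++ << 3);

    return 0;
}
```

Note that a counter which only grows never reuses the slot of a
de-assigned device, so the limit applies to assignments over the
lifetime of the domain rather than to devices present at one time.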

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/common/domain.c           |  3 ++
 xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c       | 14 +++++++-
 xen/include/xen/pci.h         | 22 +++++++++++++
 xen/include/xen/sched.h       |  8 +++++
 5 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 40d67ec34232..e0170087612d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
 
 #ifdef CONFIG_HAS_PCI
     INIT_LIST_HEAD(&d->pdev_list);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    INIT_LIST_HEAD(&d->vdev_list);
+#endif
 #endif
 
     /* All error paths can depend on the above setup. */
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 805ab86ed555..5b963d75d1ba 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     return ret;
 }
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
+                                                const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    list_for_each_entry ( vdev, &d->vdev_list, list )
+        if ( vdev->pdev == pdev )
+            return vdev;
+    return NULL;
+}
+
+int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    ASSERT(!pci_find_virtual_device(d, pdev));
+
+    /* Each PCI bus supports 32 devices/slots at max. */
+    if ( d->vpci_dev_next > 31 )
+        return -ENOSPC;
+
+    vdev = xzalloc(struct vpci_dev);
+    if ( !vdev )
+        return -ENOMEM;
+
+    /* We emulate a single host bridge for the guest, so segment is always 0. */
+    vdev->seg = 0;
+
+    /*
+     * The bus number is set to 0, so virtual devices are seen
+     * as embedded endpoints behind the root complex.
+     */
+    vdev->bus = 0;
+    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
+
+    vdev->pdev = pdev;
+    vdev->domain = d;
+
+    pcidevs_lock();
+    list_add_tail(&vdev->list, &d->vdev_list);
+    pcidevs_unlock();
+
+    return 0;
+}
+
+int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
+{
+    struct vpci_dev *vdev;
+
+    pcidevs_lock();
+    vdev = pci_find_virtual_device(d, pdev);
+    if ( vdev )
+        list_del(&vdev->list);
+    pcidevs_unlock();
+    xfree(vdev);
+    return 0;
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 /* Caller should hold the pcidevs_lock */
 static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
                            uint8_t devfn)
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 702f7b5d5dda..d787f13e679e 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
 {
+    int rc;
+
     /* It only makes sense to assign for hwdom or guest domain. */
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
-    return vpci_bar_add_handlers(d, dev);
+    rc = vpci_bar_add_handlers(d, dev);
+    if ( rc )
+        return rc;
+
+    return pci_add_virtual_device(d, dev);
 }
 
 /* Notify vPCI that device is de-assigned from guest. */
 int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
 {
+    int rc;
+
     /* It only makes sense to de-assign from hwdom or guest domain. */
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;
 
+    rc = pci_remove_virtual_device(d, dev);
+    if ( rc )
+        return rc;
+
     return vpci_bar_remove_handlers(d, dev);
 }
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 43b8a0817076..33033a3a8f8d 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -137,6 +137,24 @@ struct pci_dev {
     struct vpci *vpci;
 };
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+struct vpci_dev {
+    struct list_head list;
+    /* Physical PCI device this virtual device is connected to. */
+    const struct pci_dev *pdev;
+    /* Virtual SBDF of the device. */
+    union {
+        struct {
+            uint8_t devfn;
+            uint8_t bus;
+            uint16_t seg;
+        };
+        pci_sbdf_t sbdf;
+    };
+    struct domain *domain;
+};
+#endif
+
 #define for_each_pdev(domain, pdev) \
     list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)
 
@@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
                    const struct pci_dev_info *, nodeid_t node);
 int pci_remove_device(u16 seg, u8 bus, u8 devfn);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
+int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
+#endif
 int pci_ro_device(int seg, int bus, int devfn);
 int pci_hide_device(unsigned int seg, unsigned int bus, unsigned int devfn);
 struct pci_dev *pci_get_pdev(int seg, int bus, int devfn);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 28146ee404e6..ecdb04b4f7fc 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -444,6 +444,14 @@ struct domain
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    struct list_head vdev_list;
+    /*
+     * Current device number used by the virtual PCI bus topology
+     * to assign a unique SBDF to a passed through virtual PCI device.
+     */
+    int vpci_dev_next;
+#endif
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH
-- 
2.25.1




* [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (9 preceding siblings ...)
  2021-09-30  7:52 ` [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology Oleksandr Andrushchenko
@ 2021-09-30  7:52 ` Oleksandr Andrushchenko
  2021-09-30  8:53   ` Jan Beulich
                     ` (2 more replies)
  10 siblings, 3 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  7:52 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are three originators of PCI configuration space accesses:
1. The domain that owns the physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, i.e. the physical view of the registers
vs the guest's view of the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map the virtual bus topology to the physical one, i.e. pass the
configuration space access to the corresponding physical devices.
3. Emulated host PCI bridge access. The bridge doesn't exist in the
physical topology, i.e. it can't be mapped to some physical host bridge.
So, all accesses to the host bridge itself need to be trapped and
emulated.
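
Case 2 boils down to a lookup from the virtual SBDF to the physical one.
The following standalone sketch shows the idea with a flat table; the
vdev_entry type and translate_virtual_sbdf function are hypothetical
and do not reflect the signature of pci_translate_virtual_device() used
in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a guest-visible SBDF with the physical one. */
struct vdev_entry {
    uint32_t virt_sbdf;
    uint32_t phys_sbdf;
};

/*
 * Replace *sbdf with the physical SBDF if the guest-visible one is
 * known. Returns false (leaving *sbdf untouched) when no device sits
 * at that virtual address, in which case the access has to be
 * completed without touching any physical device.
 */
bool translate_virtual_sbdf(const struct vdev_entry *tbl, size_t n,
                            uint32_t *sbdf)
{
    size_t i;

    assert(sbdf != NULL && (tbl != NULL || n == 0));

    for ( i = 0; i < n; i++ )
        if ( tbl[i].virt_sbdf == *sbdf )
        {
            *sbdf = tbl[i].phys_sbdf;
            return true;
        }

    return false;
}
```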

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/domain.c         |  1 +
 xen/arch/arm/vpci.c           | 86 +++++++++++++++++++++++++++++++----
 xen/arch/arm/vpci.h           |  3 ++
 xen/drivers/passthrough/pci.c | 25 ++++++++++
 xen/include/asm-arm/pci.h     |  1 +
 xen/include/xen/pci.h         |  1 +
 xen/include/xen/sched.h       |  2 +
 7 files changed, 111 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index fa6fcc5e467c..095671742ad8 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d)
                        get_order_from_bytes(d->arch.efi_acpi_len));
 #endif
     domain_io_free(d);
+    domain_vpci_free(d);
 }
 
 void arch_domain_shutdown(struct domain *d)
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 5d6c29c8dcd9..26ec2fa7cf2d 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -17,6 +17,14 @@
 
 #define REGISTER_OFFSET(addr)  ( (addr) & 0x00000fff)
 
+struct vpci_mmio_priv {
+    /*
+     * Set to true if the MMIO handlers were set up for the emulated
+     * ECAM host PCI bridge.
+     */
+    bool is_virt_ecam;
+};
+
 /* Do some sanity checks. */
 static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len)
 {
@@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
     pci_sbdf_t sbdf;
     unsigned long data = ~0UL;
     unsigned int size = 1U << info->dabt.size;
+    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;
 
     sbdf.sbdf = MMCFG_BDF(info->gpa);
     reg = REGISTER_OFFSET(info->gpa);
@@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
     if ( !vpci_mmio_access_allowed(reg, size) )
         return 0;
 
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;
+
     data = vpci_read(sbdf, reg, min(4u, size));
     if ( size == 8 )
         data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;
@@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
     pci_sbdf_t sbdf;
     unsigned long data = r;
     unsigned int size = 1U << info->dabt.size;
+    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;
 
     sbdf.sbdf = MMCFG_BDF(info->gpa);
     reg = REGISTER_OFFSET(info->gpa);
@@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
     if ( !vpci_mmio_access_allowed(reg, size) )
         return 0;
 
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;
+
     vpci_write(sbdf, reg, min(4u, size), data);
     if ( size == 8 )
         vpci_write(sbdf, reg + 4, 4, data >> 32);
@@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = {
     .write = vpci_mmio_write,
 };
 
+/*
+ * There are three originators for the PCI configuration space access:
+ * 1. The domain that owns physical host bridge: MMIO handlers are
+ *    there so we can update vPCI register handlers with the values
+ *    written by the hardware domain, e.g. physical view of the registers/
+ *    configuration space.
+ * 2. Guest access to the passed through PCI devices: we need to properly
+ *    map virtual bus topology to the physical one, e.g. pass the configuration
+ *    space access to the corresponding physical devices.
+ * 3. Emulated host PCI bridge access. It doesn't exist in the physical
+ *    topology, e.g. it can't be mapped to some physical host bridge.
+ *    So, all access to the host bridge itself needs to be trapped and
+ *    emulated.
+ */
 static int vpci_setup_mmio_handler(struct domain *d,
                                    struct pci_host_bridge *bridge)
 {
-    struct pci_config_window *cfg = bridge->cfg;
+    struct vpci_mmio_priv *priv;
+
+    priv = xzalloc(struct vpci_mmio_priv);
+    if ( !priv )
+        return -ENOMEM;
+
+    priv->is_virt_ecam = !is_hardware_domain(d);
 
-    register_mmio_handler(d, &vpci_mmio_handler,
-                          cfg->phys_addr, cfg->size, NULL);
+    if ( is_hardware_domain(d) )
+    {
+        struct pci_config_window *cfg = bridge->cfg;
+
+        bridge->mmio_priv = priv;
+        register_mmio_handler(d, &vpci_mmio_handler,
+                              cfg->phys_addr, cfg->size,
+                              priv);
+    }
+    else
+    {
+        d->vpci_mmio_priv = priv;
+        /* Guest domains use what is programmed in their device tree. */
+        register_mmio_handler(d, &vpci_mmio_handler,
+                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
+                              priv);
+    }
     return 0;
 }
 
@@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d)
     if ( !has_vpci(d) )
         return 0;
 
+    return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);
+}
+
+static int domain_vpci_free_cb(struct domain *d,
+                               struct pci_host_bridge *bridge)
+{
     if ( is_hardware_domain(d) )
-        return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);
+        XFREE(bridge->mmio_priv);
+    else
+        XFREE(d->vpci_mmio_priv);
+    return 0;
+}
 
-    /* Guest domains use what is programmed in their device tree. */
-    register_mmio_handler(d, &vpci_mmio_handler,
-                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
+void domain_vpci_free(struct domain *d)
+{
+    if ( !has_vpci(d) )
+        return;
 
-    return 0;
+    pci_host_iterate_bridges(d, domain_vpci_free_cb);
 }
 
 int domain_vpci_get_num_mmio_handlers(struct domain *d)
diff --git a/xen/arch/arm/vpci.h b/xen/arch/arm/vpci.h
index 27a2b069abd2..38e5a28c0d95 100644
--- a/xen/arch/arm/vpci.h
+++ b/xen/arch/arm/vpci.h
@@ -18,6 +18,7 @@
 #ifdef CONFIG_HAS_VPCI
 int domain_vpci_init(struct domain *d);
 int domain_vpci_get_num_mmio_handlers(struct domain *d);
+void domain_vpci_free(struct domain *d);
 #else
 static inline int domain_vpci_init(struct domain *d)
 {
@@ -28,6 +29,8 @@ static inline int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
     return 0;
 }
+
+static inline void domain_vpci_free(struct domain *d) { }
 #endif
 
 #endif /* __ARCH_ARM_VPCI_H__ */
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 5b963d75d1ba..b7dffb769cfd 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
     xfree(vdev);
     return 0;
 }
+
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+    struct vpci_dev *vdev;
+    bool found = false;
+
+    pcidevs_lock();
+    list_for_each_entry ( vdev, &d->vdev_list, list )
+    {
+        if ( vdev->sbdf.sbdf == sbdf->sbdf )
+        {
+            /* Replace virtual SBDF with the physical one. */
+            *sbdf = vdev->pdev->sbdf;
+            found = true;
+            break;
+        }
+    }
+    pcidevs_unlock();
+
+    return found;
+}
 #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
 
 /* Caller should hold the pcidevs_lock */
diff --git a/xen/include/asm-arm/pci.h b/xen/include/asm-arm/pci.h
index 1bfba3da8f51..12b4bf467ad2 100644
--- a/xen/include/asm-arm/pci.h
+++ b/xen/include/asm-arm/pci.h
@@ -66,6 +66,7 @@ struct pci_host_bridge {
     uint16_t segment;                /* Segment number */
     struct pci_config_window* cfg;   /* Pointer to the bridge config window */
     struct pci_ops *ops;
+    void *mmio_priv;                 /* MMIO handler's private data. */
 };
 
 struct pci_ops {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 33033a3a8f8d..89cfc4853331 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -188,6 +188,7 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn);
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
 int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
 int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
+bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
 #endif
 int pci_ro_device(int seg, int bus, int devfn);
 int pci_hide_device(unsigned int seg, unsigned int bus, unsigned int devfn);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ecdb04b4f7fc..858b4133482f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -451,6 +451,8 @@ struct domain
      * to assign a unique SBDF to a passed through virtual PCI device.
      */
     int vpci_dev_next;
+    /* Virtual PCI MMIO handler's private data. */
+    void *vpci_mmio_priv;
 #endif
 #endif
 
-- 
2.25.1




* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  7:52 ` [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
@ 2021-09-30  8:21   ` Jan Beulich
  2021-09-30  8:45     ` Oleksandr Andrushchenko
  2021-10-13 11:29   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-09-30  8:21 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> When a PCI device gets assigned/de-assigned some work on vPCI side needs
> to be done for that device. Introduce a pair of hooks so vPCI can handle
> that.
> 
> Please note, that in the current design the error path is handled by
> the toolstack via XEN_DOMCTL_assign_device/XEN_DOMCTL_deassign_device,
> so this is why it is acceptable not to de-assign devices if vPCI's
> assign fails, e.g. the roll back will be handled on deassign_device when
> it is called by the toolstack.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v2:
> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>   for x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - extended the commit message
> ---
>  xen/drivers/Kconfig           |  4 ++++
>  xen/drivers/passthrough/pci.c |  9 +++++++++

The Cc list does not match these files getting modified. Please make
sure you Cc maintainers, so they have a chance of noticing that
changes are being made which they may have an opinion on.

> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>          rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>      }
>  
> +    if ( rc )
> +        goto done;

From all I can tell this is dead code.

> +    rc = vpci_assign_device(d, pdev);
> +
>   done:
>      if ( rc )
>          printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
> +{
> +    /* It only makes sense to assign for hwdom or guest domain. */

Could you clarify for me in how far this code path is indeed intended
to be taken by hwdom? Because if it is, I'd like to further understand
the interaction with setup_hwdom_pci_devices().

> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    return 0;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
> +{
> +    /* It only makes sense to de-assign from hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    return 0;
> +}
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */

At this point of the series #ifdef is the less preferable variant of
arranging for dead code to get compiled out. I expect later patches
will change that?

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -242,6 +242,26 @@ static inline bool vpci_process_pending(struct vcpu *v)
>  }
>  #endif
>  
> +#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_HAS_VPCI_GUEST_SUPPORT)
> +/* Notify vPCI that device is assigned/de-assigned to/from guest. */
> +int __must_check vpci_assign_device(struct domain *d,
> +                                    const struct pci_dev *dev);
> +int __must_check vpci_deassign_device(struct domain *d,
> +                                      const struct pci_dev *dev);
> +#else
> +static inline int vpci_assign_device(struct domain *d,
> +                                     const struct pci_dev *dev)
> +{
> +    return 0;
> +};
> +
> +static inline int vpci_deassign_device(struct domain *d,
> +                                       const struct pci_dev *dev)
> +{
> +    return 0;
> +};
> +#endif

Please put the __must_check also on the stubs, or else people only
build-testing x86 may not notice that they might be introducing
build failures on Arm. Then again I'm not sure whether in this case
the attributes are necessary in the first place.

Jan




* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  8:21   ` Jan Beulich
@ 2021-09-30  8:45     ` Oleksandr Andrushchenko
  2021-09-30  9:06       ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  8:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko, xen-devel



On 30.09.21 11:21, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> When a PCI device gets assigned/de-assigned some work on vPCI side needs
>> to be done for that device. Introduce a pair of hooks so vPCI can handle
>> that.
>>
>> Please note, that in the current design the error path is handled by
>> the toolstack via XEN_DOMCTL_assign_device/XEN_DOMCTL_deassign_device,
>> so this is why it is acceptable not to de-assign devices if vPCI's
>> assign fails, e.g. the roll back will be handled on deassign_device when
>> it is called by the toolstack.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v2:
>> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>>    for x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - extended the commit message
>> ---
>>   xen/drivers/Kconfig           |  4 ++++
>>   xen/drivers/passthrough/pci.c |  9 +++++++++
> The Cc list does not match these files getting modified. Please make
> sure you Cc maintainers, so they have a chance of noticing that
> changes are being made which they may have an opinion on.
Sure, I will go over the patches and add the required Cc: next time.
>
>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>           rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>       }
>>   
>> +    if ( rc )
>> +        goto done;
>  From all I can tell this is dead code.
Before the change rc was set in the loop, and then we fell through
to the "done" label. I do agree that, the way this code is written, the
value of rc only reflects the last assignment done in the loop,
but I didn't want to change the existing behavior with my change,
thus the "if ( rc )".
>
>> +    rc = vpci_assign_device(d, pdev);
>> +
>>    done:
>>       if ( rc )
>>           printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   
>>       return rc;
>>   }
>> +
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Notify vPCI that device is assigned to guest. */
>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>> +{
>> +    /* It only makes sense to assign for hwdom or guest domain. */
> Could you clarify for me in how far this code path is indeed intended
> to be taken by hwdom? Because if it is, I'd like to further understand
> the interaction with setup_hwdom_pci_devices().
setup_hwdom_pci_devices is not used on Arm as we do rely on
Dom0 to perform PCI host initialization and PCI device enumeration.

This is because of the fact that on Arm it is not a trivial task to
initialize a PCI host bridge in Xen, e.g. you need to properly initialize
power domains, clocks, quirks etc. for different SoCs.
All these make the task too complex and it was decided that at the
moment we do not want to bring PCI device drivers in Xen for that.
It was also decided that we expect Dom0 to take care of initialization
and enumeration.
Some day, when firmware can do PCI initialization for us and then we
can easily access ECAM, this will change. Then setup_hwdom_pci_devices
will be used on Arm as well.

Thus, we need to take care that Xen knows about the discovered
PCI devices via assign_device etc.
>
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Notify vPCI that device is de-assigned from guest. */
>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>> +{
>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    return 0;
>> +}
>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> At this point of the series #ifdef is the less preferable variant of
> arranging for dead code to get compiled out.
What is that other preferable way then?
>   I expect later patches
> will change that?
No, it is going to be this way all the time
>
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -242,6 +242,26 @@ static inline bool vpci_process_pending(struct vcpu *v)
>>   }
>>   #endif
>>   
>> +#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_HAS_VPCI_GUEST_SUPPORT)
>> +/* Notify vPCI that device is assigned/de-assigned to/from guest. */
>> +int __must_check vpci_assign_device(struct domain *d,
>> +                                    const struct pci_dev *dev);
>> +int __must_check vpci_deassign_device(struct domain *d,
>> +                                      const struct pci_dev *dev);
>> +#else
>> +static inline int vpci_assign_device(struct domain *d,
>> +                                     const struct pci_dev *dev)
>> +{
>> +    return 0;
>> +};
>> +
>> +static inline int vpci_deassign_device(struct domain *d,
>> +                                       const struct pci_dev *dev)
>> +{
>> +    return 0;
>> +};
>> +#endif
> Please put the __must_check also on the stubs, or else people only
> build-testing x86 may not notice that they might be introducing
> build failures on Arm. Then again I'm not sure whether in this case
> the attributes are necessary in the first place.
I will remove __must_check
>
> Jan
>


* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30  7:52 ` [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology Oleksandr Andrushchenko
@ 2021-09-30  8:51   ` Jan Beulich
  2021-09-30  9:34     ` Oleksandr Andrushchenko
  2021-10-26 11:33   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-09-30  8:51 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.

This reads a little odd: If bus is already known (and I think you imply
segment to also be known), it's only DF which gets assigned.

> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.

Or up to 256 when there are multi-function ones. Imo you at least want
to spell out how that case is intended to be handled (even if maybe
the code doesn't cover that case yet, in which case a respective code
comment would also want leaving).
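
To make the arithmetic concrete, here is a small sketch. The macros
follow the conventional devfn encoding (5 bits of device, 3 bits of
function); alloc_virtual_devfn() is a hypothetical allocator written for
this mail, not something the patch provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional devfn encoding: device in bits 7:3, function in bits 2:0. */
#define PCI_DEVFN(dev, fn) ((((dev) & 0x1f) << 3) | ((fn) & 0x07))
#define PCI_SLOT(devfn)    (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)    ((devfn) & 0x07)

#define SLOTS_PER_BUS  32u  /* 2^5 device numbers */
#define FUNCS_PER_SLOT 8u   /* 2^3 function numbers */

/*
 * Hypothetical allocator: hand out devfn values in slot order, always
 * using function 0. That choice is what caps a bus at 32 virtual
 * devices. Returns -1 once the bus is full.
 */
static int alloc_virtual_devfn(unsigned int *next_slot)
{
    int devfn;

    assert(next_slot != NULL);

    if ( *next_slot >= SLOTS_PER_BUS )
        return -1;

    devfn = PCI_DEVFN(*next_slot, 0);
    (*next_slot)++;

    return devfn;
}
```

With all 8 functions of every slot in use the same bus would offer
SLOTS_PER_BUS * FUNCS_PER_SLOT = 256 devfn values, which is the case the
description should at least mention.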

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      return ret;
>  }
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT

May I ask why the code enclosed by this conditional has been put here
rather than under drivers/vpci/?

> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
> +                                                const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +        if ( vdev->pdev == pdev )
> +            return vdev;
> +    return NULL;
> +}

No locking here or ...

> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    ASSERT(!pci_find_virtual_device(d, pdev));

... in this first caller that I've managed to spot? See also below
regarding the pcidevs-lock being held further up the call tree (which at
the very least you would then want to ASSERT() for here).

> +    /* Each PCI bus supports 32 devices/slots at max. */
> +    if ( d->vpci_dev_next > 31 )
> +        return -ENOSPC;

Please avoid open-coding literals when they can be suitably expressed.

> +    vdev = xzalloc(struct vpci_dev);
> +    if ( !vdev )
> +        return -ENOMEM;
> +
> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
> +    vdev->seg = 0;
> +
> +    /*
> +     * The bus number is set to 0, so virtual devices are seen
> +     * as embedded endpoints behind the root complex.
> +     */
> +    vdev->bus = 0;

Strictly speaking both of these assignments are redundant with you
using xzalloc(). I'd prefer if there was just a comment, as the compiler
has no way of recognizing this in order to eliminate these stores.

> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
> +
> +    vdev->pdev = pdev;
> +    vdev->domain = d;
> +
> +    pcidevs_lock();
> +    list_add_tail(&vdev->list, &d->vdev_list);
> +    pcidevs_unlock();

I don't support a global lock getting (ab)used for per-domain list
management.

Apart from that I don't understand why you acquire the lock here. Does
that mean the functions further up were truly left without any locking,
by you not having noticed that this lock is already being held by the
sole caller?

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to assign for hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return vpci_bar_add_handlers(d, dev);
> +    rc = vpci_bar_add_handlers(d, dev);
> +    if ( rc )
> +        return rc;
> +
> +    return pci_add_virtual_device(d, dev);
>  }

I've peeked at the earlier patch, and both there and here I'm struggling to
see how undoing partially completed steps or fully completed earlier steps
is intended to work. I'm not convinced it is legitimate to leave handlers
in place until the tool stack manages to roll back the failed device
assignment.

>  /* Notify vPCI that device is de-assigned from guest. */
>  int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to de-assign from hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> +    rc = pci_remove_virtual_device(d, dev);
> +    if ( rc )
> +        return rc;
> +
>      return vpci_bar_remove_handlers(d, dev);
>  }

So what's the ultimate effect of a partially de-assigned device, where
one of the later steps failed? In a number of places we do best-effort
full cleanup, by recording errors but nevertheless continuing with
subsequent cleanup steps. I wonder whether that's a model to use here
as well.
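
As an illustration of that model only, a standalone sketch follows. All
names are hypothetical and the steps are placeholders; it merely shows
the "remember the first error, keep going" shape.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cleanup_fn)(void *ctx);

/*
 * Run every cleanup step even if an earlier one fails, and report the
 * first error seen.
 */
static int cleanup_best_effort(const cleanup_fn *steps, size_t nr, void *ctx)
{
    int rc = 0;
    size_t i;

    assert(steps != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        int ret = steps[i](ctx);

        /* Remember the first failure, but keep going. */
        if ( ret && !rc )
            rc = ret;
    }

    return rc;
}

/* Example steps: each counts its invocation; step_fail() reports -5. */
static int step_ok(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}

static int step_fail(void *ctx)
{
    ++*(int *)ctx;
    return -5;
}
```

Running { step_ok, step_fail, step_ok } through cleanup_best_effort()
invokes all three steps and still reports the failure to the caller.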

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -137,6 +137,24 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +struct vpci_dev {
> +    struct list_head list;
> +    /* Physical PCI device this virtual device is connected to. */
> +    const struct pci_dev *pdev;
> +    /* Virtual SBDF of the device. */
> +    union {
> +        struct {
> +            uint8_t devfn;
> +            uint8_t bus;
> +            uint16_t seg;
> +        };
> +        pci_sbdf_t sbdf;

Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
providing all you need? By putting it in a union with a custom
struct you set yourself up for things going out of sync if anyone
chose to alter pci_sbdf_t's layout.

> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>  int pci_add_device(u16 seg, u8 bus, u8 devfn,
>                     const struct pci_dev_info *, nodeid_t node);
>  int pci_remove_device(u16 seg, u8 bus, u8 devfn);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
> +#endif

Like for their definitions I question the placement of these
declarations.

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -444,6 +444,14 @@ struct domain
>  
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    struct list_head vdev_list;
> +    /*
> +     * Current device number used by the virtual PCI bus topology
> +     * to assign a unique SBDF to a passed through virtual PCI device.
> +     */
> +    int vpci_dev_next;

In how far can the number stored here be negative? If it can't be,
please use unsigned int.

As to the comment - "current" is ambiguous: Is it the number that
was used last, or the next one to be used?

Jan




* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
@ 2021-09-30  8:53   ` Jan Beulich
  2021-09-30  9:35     ` Oleksandr Andrushchenko
  2021-09-30 16:57     ` Oleksandr Andrushchenko
  2021-10-18 18:32   ` Julien Grall
  2021-10-26 13:30   ` Roger Pau Monné
  2 siblings, 2 replies; 98+ messages in thread
From: Jan Beulich @ 2021-09-30  8:53 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>      xfree(vdev);
>      return 0;
>  }
> +
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
> +{
> +    struct vpci_dev *vdev;

const (afaict)

> +    bool found = false;
> +
> +    pcidevs_lock();
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +    {
> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
> +        {
> +            /* Replace virtual SBDF with the physical one. */
> +            *sbdf = vdev->pdev->sbdf;
> +            found = true;
> +            break;
> +        }
> +    }
> +    pcidevs_unlock();

As per the comments on the earlier patch, locking as well as placement
may need reconsidering.

Jan




* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  8:45     ` Oleksandr Andrushchenko
@ 2021-09-30  9:06       ` Jan Beulich
  2021-09-30  9:21         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-09-30  9:06 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 30.09.2021 10:45, Oleksandr Andrushchenko wrote:
> On 30.09.21 11:21, Jan Beulich wrote:
>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>           rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>>       }
>>>   
>>> +    if ( rc )
>>> +        goto done;
>>  From all I can tell this is dead code.
> Before the change rc was set in the loop. And then we fall through
> to the "done" label. I do agree that the way this code is done the
> value of that rc will only reflect the last assignment done in the loop,
> but with my change I didn't want to change the existing behavior,
> thus "if ( rc"

rc is always 0 upon loop exit, afaict:

    for ( ; pdev->phantom_stride; rc = 0 )

Granted this is unusual and hence possibly unexpected.
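
A standalone model of that loop shape, with hypothetical names and
assuming rc is 0 on loop entry, shows the effect: the iteration
expression resets rc after every pass, so a failure inside the body
never survives the loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: visit "count" phantom functions, where step() may
 * fail. An error from step() is visible only inside the loop body.
 */
static int walk_phantoms(unsigned int count, int (*step)(unsigned int))
{
    int rc = 0;                 /* Assumed to be 0 on loop entry. */
    unsigned int i = 0;

    assert(step != NULL);

    for ( ; i < count; rc = 0 )
    {
        rc = step(i++);
        /* A failure could be logged here, but it does not survive. */
    }

    return rc;                  /* Always 0. */
}

/* Example step that fails on every invocation. */
static int always_fails(unsigned int i)
{
    (void)i;
    return -1;
}
```

walk_phantoms(3, always_fails) returns 0 even though every step failed,
which is why a check of rc right after such a loop can never trigger.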

>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>   
>>>       return rc;
>>>   }
>>> +
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +/* Notify vPCI that device is assigned to guest. */
>>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>> +{
>>> +    /* It only makes sense to assign for hwdom or guest domain. */
>> Could you clarify for me in how far this code path is indeed intended
>> to be taken by hwdom? Because if it is, I'd like to further understand
>> the interaction with setup_hwdom_pci_devices().
> setup_hwdom_pci_devices is not used on Arm as we do rely on
> Dom0 to perform PCI host initialization and PCI device enumeration.
> 
> This is because of the fact that on Arm it is not a trivial task to
> initialize a PCI host bridge in Xen, e.g. you need to properly initialize
> power domains, clocks, quirks etc. for different SoCs.
> All these make the task too complex and it was decided that at the
> moment we do not want to bring PCI device drivers in Xen for that.
> It was also decided that we expect Dom0 to take care of initialization
> and enumeration.
> Some day, when firmware can do PCI initialization for us and then we
> can easily access ECAM, this will change. Then setup_hwdom_pci_devices
> will be used on Arm as well.
> 
> Thus, we need to take care that Xen knows about the discovered
> PCI devices via assign_device etc.

Fair enough, but since I've not spotted a patch expressing this (by
adding suitable conditionals), may I ask that you do so in yet another
patch (unless I've overlooked where this gets done)?

>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>> +        return 0;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/* Notify vPCI that device is de-assigned from guest. */
>>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>> +{
>>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>> +        return 0;
>>> +
>>> +    return 0;
>>> +}
>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> At this point of the series #ifdef is the less preferable variant of
>> arranging for dead code to get compiled out.
> What is that other preferable way then?

"if ( !IS_ENABLED() )" as I did already point out to you yesterday in
reply to v2 of patch 10 of this very series.

>>   I expect later patches
>> will change that?
> No, it is going to be this way all the time

The question wasn't whether you switch away from the #ifdef-s, but
whether later patches leave that as the only choice (avoiding build
breakage).

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  9:06       ` Jan Beulich
@ 2021-09-30  9:21         ` Oleksandr Andrushchenko
  2021-09-30 10:14           ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  9:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 30.09.21 12:06, Jan Beulich wrote:
> On 30.09.2021 10:45, Oleksandr Andrushchenko wrote:
>> On 30.09.21 11:21, Jan Beulich wrote:
>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>>            rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>>>        }
>>>>    
>>>> +    if ( rc )
>>>> +        goto done;
>>>   From all I can tell this is dead code.
>> Before the change rc was set in the loop. And then we fall through
>> to the "done" label. I do agree that the way this code is done the
>> value of that rc will only reflect the last assignment done in the loop,
>> but with my change I didn't want to change the existing behavior,
>> thus "if ( rc"
> rc is always 0 upon loop exit, afaict:
>
>      for ( ; pdev->phantom_stride; rc = 0 )
>
> Granted this is unusual and hence possibly unexpected.
I will remove that check then. Do we want a comment about rc == 0,
so it is seen why there is no check for rc?
>
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>    
>>>>        return rc;
>>>>    }
>>>> +
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +/* Notify vPCI that device is assigned to guest. */
>>>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>> +{
>>>> +    /* It only makes sense to assign for hwdom or guest domain. */
>>> Could you clarify for me in how far this code path is indeed intended
>>> to be taken by hwdom? Because if it is, I'd like to further understand
>>> the interaction with setup_hwdom_pci_devices().
>> setup_hwdom_pci_devices is not used on Arm as we do rely on
>> Dom0 to perform PCI host initialization and PCI device enumeration.
>>
>> This is because of the fact that on Arm it is not a trivial task to
>> initialize a PCI host bridge in Xen, e.g. you need to properly initialize
>> power domains, clocks, quirks etc. for different SoCs.
>> All these make the task too complex and it was decided that at the
>> moment we do not want to bring PCI device drivers in Xen for that.
>> It was also decided that we expect Dom0 to take care of initialization
>> and enumeration.
>> Some day, when firmware can do PCI initialization for us and then we
>> can easily access ECAM, this will change. Then setup_hwdom_pci_devices
>> will be used on Arm as well.
>>
>> Thus, we need to take care that Xen knows about the discovered
>> PCI devices via assign_device etc.
> Fair enough, but since I've not spotted a patch expressing this
Well, it is all in the RFC for PCI passthrough on Arm which is mentioned
in series from Arm and EPAM (part 2). I didn't mention the RFC in the
cover letter for this series though.
>   (by
> adding suitable conditionals), may I ask that you do so in yet another
> patch (unless I've overlooked where this gets done)?
Could you please elaborate more on which conditionals you are
talking about here? I'm afraid I didn't understand this part.
>
>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>> +        return 0;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +/* Notify vPCI that device is de-assigned from guest. */
>>>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>> +{
>>>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>> +        return 0;
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> At this point of the series #ifdef is the less preferable variant of
>>> arranging for dead code to get compiled out.
>> What is that other preferable way then?
> "if ( !IS_ENABLED() )" as I did already point out to you yesterday in
> reply to v2 of patch 10 of this very series.
Please see below
>
>>>    I expect later patches
>>> will change that?
>> No, it is going to be this way all the time
> The question wasn't whether you switch away from the #ifdef-s, but
> whether later patches leave that as the only choice (avoiding build
> breakage).
Yes, the code is going to always remain ifdef'ed, so we don't have
dead code for x86 (at least).
So, does the above mean that you are ok with "#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT"
and there is no need for "if ( !IS_ENABLED() )"?
>
> Jan
>
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30  8:51   ` Jan Beulich
@ 2021-09-30  9:34     ` Oleksandr Andrushchenko
  2021-09-30 10:23       ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  9:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko, xen-devel



On 30.09.21 11:51, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
> This reads a little odd: If bus is already known (and I think you imply
> segment to also be known), it's only DF which get assigned.
But at the end of the day we set all the parts of that SBDF.
Otherwise I should write "Assign DF as we know that bus and segment
are 0"
>
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
> Or up to 256 when there are multi-function ones. Imo you at least want
> to spell out how that case is intended to be handled (even if maybe
> the code doesn't cover that case yet, in which case a respective code
> comment would also want leaving).
We are not supporting multi-function yet, so I'll add a comment.
>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>       return ret;
>>   }
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> May I ask why the code enclosed by this conditional has been put here
> rather than under drivers/vpci/?
Indeed this can be moved to xen/drivers/vpci/vpci.c.
I'll move and update function names accordingly.
>
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> +                                                const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +        if ( vdev->pdev == pdev )
>> +            return vdev;
>> +    return NULL;
>> +}
> No locking here or ...
>
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    ASSERT(!pci_find_virtual_device(d, pdev));
> ... in this first caller that I've managed to spot? See also below as
> to up the call tree the pcidevs-lock being held (which at the very
> least you would then want to ASSERT() for here).
I will move the code to vpci and ensure proper locking there
>
>> +    /* Each PCI bus supports 32 devices/slots at max. */
>> +    if ( d->vpci_dev_next > 31 )
>> +        return -ENOSPC;
> Please avoid open-coding literals when they can be suitably expressed.
I failed to find a suitable constant for that. Could you please point
me to the one I can use here?
>
>> +    vdev = xzalloc(struct vpci_dev);
>> +    if ( !vdev )
>> +        return -ENOMEM;
>> +
>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>> +    vdev->seg = 0;
>> +
>> +    /*
>> +     * The bus number is set to 0, so virtual devices are seen
>> +     * as embedded endpoints behind the root complex.
>> +     */
>> +    vdev->bus = 0;
> Strictly speaking both of these assignments are redundant with you
> using xzalloc(). I'd prefer if there was just a comment, as the compiler
> has no way recognizing this in order to eliminate these stores.
Yes, I just put the assignments to be explicitly seen here as being 0.
I will remove those and put a comment.
>
>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> +
>> +    vdev->pdev = pdev;
>> +    vdev->domain = d;
>> +
>> +    pcidevs_lock();
>> +    list_add_tail(&vdev->list, &d->vdev_list);
>> +    pcidevs_unlock();
> I don't support a global lock getting (ab)used for per-domain list
> management.
>
> Apart from that I don't understand why you acquire the lock here. Does
> that mean the functions further were truly left without any locking,
> by you not having noticed that this lock is already being held by the
> sole caller?
I'll re-work locking with respect to the new location for this, i.e. vpci
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to assign for hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return vpci_bar_add_handlers(d, dev);
>> +    rc = vpci_bar_add_handlers(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    return pci_add_virtual_device(d, dev);
>>   }
> I've peeked at the earlier patch, and both there and here I'm struggling to
> see how undoing partially completed steps or fully completed earlier steps
> is intended to work. I'm not convinced it is legitimate to leave handlers
> in place until the tool stack manages to roll back the failed device
> assignment.
I'll see what and how we can roll back in case of error
>
>>   /* Notify vPCI that device is de-assigned from guest. */
>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> +    rc = pci_remove_virtual_device(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>>       return vpci_bar_remove_handlers(d, dev);
>>   }
> So what's the ultimate effect of a partially de-assigned device, where
> one of the later steps failed? In a number of places we do best-effort
> full cleanup, by recording errors but nevertheless continuing with
> subsequent cleanup steps. I wonder whether that's a model to use here
> as well.
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>>       struct vpci *vpci;
>>   };
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> +    struct list_head list;
>> +    /* Physical PCI device this virtual device is connected to. */
>> +    const struct pci_dev *pdev;
>> +    /* Virtual SBDF of the device. */
>> +    union {
>> +        struct {
>> +            uint8_t devfn;
>> +            uint8_t bus;
>> +            uint16_t seg;
>> +        };
>> +        pci_sbdf_t sbdf;
> Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
> providing all you need? By putting it in a union with a custom
> struct you set yourself up for things going out of sync if anyone
> chose to alter pci_sbdf_t's layout.
Sure, pci_sbdf_t should be enough
>
>> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>>   int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>                      const struct pci_dev_info *, nodeid_t node);
>>   int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +#endif
> Like for their definitions I question the placement of these
> declarations.
Will move to vpci.h
>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,14 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    struct list_head vdev_list;
>> +    /*
>> +     * Current device number used by the virtual PCI bus topology
>> +     * to assign a unique SBDF to a passed through virtual PCI device.
>> +     */
>> +    int vpci_dev_next;
> In how far can the number stored here be negative? If it can't be,
> please use unsigned int.
Will use unsigned
>
> As to the comment - "current" is ambiguous: Is it the number that
> was used last, or the next one to be used?
I will update the comment to remove ambiguity
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  8:53   ` Jan Beulich
@ 2021-09-30  9:35     ` Oleksandr Andrushchenko
  2021-09-30 10:25       ` Jan Beulich
  2021-09-30 16:57     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30  9:35 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko, xen-devel



On 30.09.21 11:53, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>       xfree(vdev);
>>       return 0;
>>   }
>> +
>> +/*
>> + * Find the physical device which is mapped to the virtual device
>> + * and translate virtual SBDF to the physical one.
>> + */
>> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
>> +{
>> +    struct vpci_dev *vdev;
> const (afaict)
Ok
>
>> +    bool found = false;
>> +
>> +    pcidevs_lock();
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +    {
>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>> +        {
>> +            /* Replace virtual SBDF with the physical one. */
>> +            *sbdf = vdev->pdev->sbdf;
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    pcidevs_unlock();
> As per the comments on the earlier patch, locking as well as placement
> may need reconsidering.
Other than that, do you have any other comments on this?
>
> Jan
>

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  9:21         ` Oleksandr Andrushchenko
@ 2021-09-30 10:14           ` Jan Beulich
  2021-09-30 10:30             ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-09-30 10:14 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 30.09.2021 11:21, Oleksandr Andrushchenko wrote:
> On 30.09.21 12:06, Jan Beulich wrote:
>> On 30.09.2021 10:45, Oleksandr Andrushchenko wrote:
>>> On 30.09.21 11:21, Jan Beulich wrote:
>>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>>>            rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>>>>        }
>>>>>    
>>>>> +    if ( rc )
>>>>> +        goto done;
>>>>   From all I can tell this is dead code.
>>> Before the change rc was set in the loop. And then we fall through
>>> to the "done" label. I do agree that the way this code is done the
>>> value of that rc will only reflect the last assignment done in the loop,
>>> but with my change I didn't want to change the existing behavior,
>>> thus "if ( rc"
>> rc is always 0 upon loop exit, afaict:
>>
>>      for ( ; pdev->phantom_stride; rc = 0 )
>>
>> Granted this is unusual and hence possibly unexpected.
> I will remove that check then. Do we want a comment about rc == 0,
> so it is seen why there is no check for rc?

So far we've been doing fine without such a comment, but I wouldn't
object to a well worded one getting added.
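
For readers following along, the loop behaviour discussed above can be
reproduced in isolation. This is only a minimal sketch, not the actual
assign_device() code: fake_assign() and assign_phantoms() are hypothetical
stand-ins, and the sketch assumes the primary function was already assigned
successfully (rc == 0) before the loop is entered.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for hd->platform_ops->assign_device():
 * pretend that assignment fails for every odd devfn.
 */
static int fake_assign(unsigned int devfn)
{
    return (devfn & 1) ? -1 : 0;
}

/* Mirrors the shape of the phantom function loop under discussion. */
static int assign_phantoms(unsigned int devfn, unsigned int stride)
{
    unsigned int slot = devfn >> 3;
    int rc = 0; /* assumption: the primary function was assigned fine */

    for ( ; stride; rc = 0 )
    {
        devfn += stride;
        if ( (devfn >> 3) != slot )
            break;          /* rc was reset by the previous iteration */
        rc = fake_assign(devfn);
        if ( rc )
            printf("assign of phantom function %u failed (%d)\n", devfn, rc);
    }

    /* Either the loop never ran, or "rc = 0" ran before each re-check. */
    return rc;
}
```

Whatever fake_assign() returns inside the loop, the iteration expression
resets rc before the condition is evaluated again, and the break is only
reached right after such a reset, so a check for rc after the loop can
never fire.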

>>>>> --- a/xen/drivers/vpci/vpci.c
>>>>> +++ b/xen/drivers/vpci/vpci.c
>>>>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>>    
>>>>>        return rc;
>>>>>    }
>>>>> +
>>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>>> +/* Notify vPCI that device is assigned to guest. */
>>>>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>> +{
>>>>> +    /* It only makes sense to assign for hwdom or guest domain. */
>>>> Could you clarify for me in how far this code path is indeed intended
>>>> to be taken by hwdom? Because if it is, I'd like to further understand
>>>> the interaction with setup_hwdom_pci_devices().
>>> setup_hwdom_pci_devices is not used on Arm as we do rely on
>>> Dom0 to perform PCI host initialization and PCI device enumeration.
>>>
>>> This is because of the fact that on Arm it is not a trivial task to
>>> initialize a PCI host bridge in Xen, e.g. you need to properly initialize
>>> power domains, clocks, quirks etc. for different SoCs.
>>> All these make the task too complex and it was decided that at the
>>> moment we do not want to bring PCI device drivers in Xen for that.
>>> It was also decided that we expect Dom0 to take care of initialization
>>> and enumeration.
>>> Some day, when firmware can do PCI initialization for us and then we
>>> can easily access ECAM, this will change. Then setup_hwdom_pci_devices
>>> will be used on Arm as well.
>>>
>>> Thus, we need to take care that Xen knows about the discovered
>>> PCI devices via assign_device etc.
>> Fair enough, but since I've not spotted a patch expressing this
> Well, it is all in the RFC for PCI passthrough on Arm which is mentioned
> in series from Arm and EPAM (part 2). I didn't mention the RFC in the
> cover letter for this series though.
>>   (by
>> adding suitable conditionals), may I ask that you do so in yet another
>> patch (unless I've overlooked where this gets done)?
> Could you please elaborate more on which conditionals you are
> talking about here? I'm afraid I didn't understand this part.

By putting it inside #if or adding "if ( !IS_ENABLED() )", you'd make
very obvious that the code in question isn't used, and hence no
interaction issues with vPCI exist.

>>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>>> +        return 0;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +/* Notify vPCI that device is de-assigned from guest. */
>>>>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>> +{
>>>>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>>> +        return 0;
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> At this point of the series #ifdef is the less preferable variant of
>>>> arranging for dead code to get compiled out.
>>> What is that other preferable way then?
>> "if ( !IS_ENABLED() )" as I did already point out to you yesterday in
>> reply to v2 of patch 10 of this very series.
> Please see below
>>
>>>>    I expect later patches
>>>> will change that?
>>> No, it is going to be this way all the time
>> The question wasn't whether you switch away from the #ifdef-s, but
>> whether later patches leave that as the only choice (avoiding build
>> breakage).
> Yes, the code is going to always remain ifdef'ed, so we don't have
> dead code for x86 (at least).
> So, does the above mean that you are ok with "#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT"
> and there is no need for "if ( !IS_ENABLED() )"?

I'm afraid you still didn't understand: "if ( !IS_ENABLED() )" is
also a way to make sure there's (almost) no dead code. And this model
has the advantage that the compiler would still check all that code
even in x86 builds (throwing away most of it in one of its DCE passes),
reducing the risk for someone not routinely doing Arm builds to
introduce a build issue.

But as soon as code references struct members which sit inside an
#ifdef, that code can't use this preferred approach anymore. That's
what I suspect might be happening in subsequent patches, which would
then justify your choice of #ifdef.
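
The difference between the two approaches can be sketched outside of Xen
as follows. Everything here is illustrative: CONFIG_DEMO_GUEST_SUPPORT,
CONFIG_DEMO_ALWAYS_ON, struct demo_domain and the demo_*() functions are
made-up names, and the IS_ENABLED() below is a simplified re-creation of
the usual Kconfig helper, not a copy of Xen's header.

```c
/* Simplified IS_ENABLED(): 1 if the option is defined to 1, else 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* Hypothetical options: one enabled, one left undefined (x86-like build). */
#define CONFIG_DEMO_ALWAYS_ON 1
/* #define CONFIG_DEMO_GUEST_SUPPORT 1 */

struct demo_domain {
    unsigned int id;
#ifdef CONFIG_DEMO_GUEST_SUPPORT
    unsigned int vdev_count; /* only exists when the option is enabled */
#endif
};

/*
 * Preferred style: the body is always parsed and type-checked, and the
 * compiler drops it as dead code when the option is disabled.
 */
static int demo_assign(const struct demo_domain *d)
{
    if ( !IS_ENABLED(CONFIG_DEMO_GUEST_SUPPORT) )
        return 0;

    return d->id ? 1 : -1;
}

/*
 * This one has to stay under #ifdef: it touches a member which does not
 * exist in builds without the option, so it would not even compile there.
 */
#ifdef CONFIG_DEMO_GUEST_SUPPORT
static unsigned int demo_count(const struct demo_domain *d)
{
    return d->vdev_count;
}
#endif
```

With the option left undefined, demo_assign() still gets compiled and
checked, while demo_count() is invisible to the compiler. That is the
trade-off described above.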

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30  9:34     ` Oleksandr Andrushchenko
@ 2021-09-30 10:23       ` Jan Beulich
  2021-09-30 10:26         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-09-30 10:23 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 30.09.2021 11:34, Oleksandr Andrushchenko wrote:
> On 30.09.21 11:51, Jan Beulich wrote:
>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>       return ret;
>>>   }
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> May I ask why the code enclosed by this conditional has been put here
>> rather than under drivers/vpci/?
> Indeed this can be moved to xen/drivers/vpci/vpci.c.
> I'll move and update function names accordingly.
>>
>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>> +                                                const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +        if ( vdev->pdev == pdev )
>>> +            return vdev;
>>> +    return NULL;
>>> +}
>> No locking here or ...
>>
>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>> ... in this first caller that I've managed to spot? See also below as
>> to up the call tree the pcidevs-lock being held (which at the very
>> least you would then want to ASSERT() for here).
> I will move the code to vpci and ensure proper locking there
>>
>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>> +    if ( d->vpci_dev_next > 31 )
>>> +        return -ENOSPC;
>> Please avoid open-coding literals when they can be suitably expressed.
> I failed to find a suitable constant for that. Could you please point
> me to the one I can use here?

I wasn't hinting at a constant, but at an expression. If you grep, you
will find e.g. at least one instance of PCI_FUNC(~0); I'd suggest to
use PCI_SLOT(~0) here. (My rule of thumb is: Before I write a literal
number anywhere outside of a #define, and not 0 or 1 or some such
starting a loop, I try to think hard how that number can instead be
expressed. Such expressions then often also serve as documentation for
what the number actually means, helping future readers.)
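
As a small illustration of the suggestion: the macro definitions below are
reproduced on the assumption that they match the usual devfn encoding
(5 bits of slot, 3 bits of function); next_devfn() and DEMO_ENOSPC are
hypothetical names used only for this sketch.

```c
/* devfn layout: bits 7:3 are the slot (device), bits 2:0 the function. */
#define PCI_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)  ((devfn) & 0x07)
#define PCI_DEVFN(d, f)  ((((d) & 0x1f) << 3) | ((f) & 0x07))

#define DEMO_ENOSPC 28 /* stand-in for ENOSPC, to keep the sketch standalone */

/*
 * Hand out the next free slot on the virtual bus, function 0 only.
 * PCI_SLOT(~0) evaluates to the highest valid slot number (31), so the
 * limit documents itself instead of being an unexplained literal.
 */
static int next_devfn(unsigned int *dev_next)
{
    if ( *dev_next > PCI_SLOT(~0) )
        return -DEMO_ENOSPC;

    return PCI_DEVFN((*dev_next)++, 0);
}
```

The first call with a counter of 0 yields devfn 0, slot 31 yields devfn
248 (31 << 3), and any further call reports the bus as full.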

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  9:35     ` Oleksandr Andrushchenko
@ 2021-09-30 10:25       ` Jan Beulich
  0 siblings, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-09-30 10:25 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 30.09.2021 11:35, Oleksandr Andrushchenko wrote:
> On 30.09.21 11:53, Jan Beulich wrote:
>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>       xfree(vdev);
>>>       return 0;
>>>   }
>>> +
>>> +/*
>>> + * Find the physical device which is mapped to the virtual device
>>> + * and translate virtual SBDF to the physical one.
>>> + */
>>> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
>>> +{
>>> +    struct vpci_dev *vdev;
>> const (afaict)
> Ok
>>
>>> +    bool found = false;
>>> +
>>> +    pcidevs_lock();
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +    {
>>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>>> +        {
>>> +            /* Replace virtual SBDF with the physical one. */
>>> +            *sbdf = vdev->pdev->sbdf;
>>> +            found = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    pcidevs_unlock();
>> As per the comments on the earlier patch, locking as well as placement
>> may need reconsidering.
> Other than that, do you have any other comments on this?

Iirc this was the only thing here. But I haven't got around to look
at patches 4-9 yet ...

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30 10:23       ` Jan Beulich
@ 2021-09-30 10:26         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30 10:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 30.09.21 13:23, Jan Beulich wrote:
> On 30.09.2021 11:34, Oleksandr Andrushchenko wrote:
>> On 30.09.21 11:51, Jan Beulich wrote:
>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>>        return ret;
>>>>    }
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> May I ask why the code enclosed by this conditional has been put here
>>> rather than under drivers/vpci/?
>> Indeed this can be moved to xen/drivers/vpci/vpci.c.
>> I'll move and update function names accordingly.
>>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>>> +                                                const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +        if ( vdev->pdev == pdev )
>>>> +            return vdev;
>>>> +    return NULL;
>>>> +}
>>> No locking here or ...
>>>
>>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>> ... in this first caller that I've managed to spot? See also below as
>>> to up the call tree the pcidevs-lock being held (which at the very
>>> least you would then want to ASSERT() for here).
>> I will move the code to vpci and ensure proper locking there
>>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>>> +    if ( d->vpci_dev_next > 31 )
>>>> +        return -ENOSPC;
>>> Please avoid open-coding literals when they can be suitably expressed.
>> I failed to find a suitable constant for that. Could you please point
>> me to the one I can use here?
> I wasn't hinting at a constant, but at an expression. If you grep, you
> will find e.g. at least one instance of PCI_FUNC(~0); I'd suggest to
> use PCI_SLOT(~0) here.
Great, will use this. It indeed does the job in a clear way.
Thank you!!
>   (My rule of thumb is: Before I write a literal
> number anywhere outside of a #define, and not 0 or 1 or some such
> starting a loop, I try to think hard how that number can instead be
> expressed. Such expressions then often also serve as documentation for
> what the number actually means, helping future readers.)
Sounds good
> Jan
>
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30 10:14           ` Jan Beulich
@ 2021-09-30 10:30             ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30 10:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 30.09.21 13:14, Jan Beulich wrote:
> On 30.09.2021 11:21, Oleksandr Andrushchenko wrote:
>> On 30.09.21 12:06, Jan Beulich wrote:
>>> On 30.09.2021 10:45, Oleksandr Andrushchenko wrote:
>>>> On 30.09.21 11:21, Jan Beulich wrote:
>>>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>>>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>>>>>             rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>>>>>         }
>>>>>>     
>>>>>> +    if ( rc )
>>>>>> +        goto done;
>>>>>    From all I can tell this is dead code.
>>>> Before the change rc was set in the loop. And then we fall through
>>>> to the "done" label. I do agree that the way this code is done the
>>>> value of that rc will only reflect the last assignment done in the loop,
>>>> but with my change I didn't want to change the existing behavior,
>>>> thus "if ( rc"
>>> rc is always 0 upon loop exit, afaict:
>>>
>>>       for ( ; pdev->phantom_stride; rc = 0 )
>>>
>>> Granted this is unusual and hence possibly unexpected.
>> I will remove that check then. Do we want a comment about rc == 0,
>> so it is seen why there is no check for rc?
> So far we've been doing fine without such a comment, but I wouldn't
> object to a well worded one getting added.
>
>>>>>> --- a/xen/drivers/vpci/vpci.c
>>>>>> +++ b/xen/drivers/vpci/vpci.c
>>>>>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>>>     
>>>>>>         return rc;
>>>>>>     }
>>>>>> +
>>>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>>>> +/* Notify vPCI that device is assigned to guest. */
>>>>>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>>> +{
>>>>>> +    /* It only makes sense to assign for hwdom or guest domain. */
>>>>> Could you clarify for me in how far this code path is indeed intended
>>>>> to be taken by hwdom? Because if it is, I'd like to further understand
>>>>> the interaction with setup_hwdom_pci_devices().
>>>> setup_hwdom_pci_devices is not used on Arm as we do rely on
>>>> Dom0 to perform PCI host initialization and PCI device enumeration.
>>>>
>>>> This is because of the fact that on Arm it is not a trivial task to
>>>> initialize a PCI host bridge in Xen, e.g. you need to properly initialize
>>>> power domains, clocks, quirks etc. for different SoCs.
>>>> All these make the task too complex and it was decided that at the
>>>> moment we do not want to bring PCI device drivers in Xen for that.
>>>> It was also decided that we expect Dom0 to take care of initialization
>>>> and enumeration.
>>>> Some day, when firmware can do PCI initialization for us and then we
>>>> can easily access ECAM, this will change. Then setup_hwdom_pci_devices
>>>> will be used on Arm as well.
>>>>
>>>> Thus, we need to take care that Xen knows about the discovered
>>>> PCI devices via assign_device etc.
>>> Fair enough, but since I've not spotted a patch expressing this
>> Well, it is all in the RFC for PCI passthrough on Arm which is mentioned
>> in series from Arm and EPAM (part 2). I didn't mention the RFC in the
>> cover letter for this series though.
>>>    (by
>>> adding suitable conditionals), may I ask that you do so in yet another
>>> patch (unless I've overlooked where this gets done)?
>> Could you please elaborate more on which conditionals you are
>> talking about here? I'm afraid I didn't understand this part.
> By putting it inside #if or adding "if ( !IS_ENABLED() )", you'd make
> very obvious that the code in question isn't used, and hence no
> interaction issues with vPCI exist.
>
>>>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Notify vPCI that device is de-assigned from guest. */
>>>>>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>>> +{
>>>>>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>>>>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>>>>> +        return 0;
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>>> At this point of the series #ifdef is the less preferable variant of
>>>>> arranging for dead code to get compiled out.
>>>> What is that other preferable way then?
>>> "if ( !IS_ENABLED() )" as I did already point out to you yesterday in
>>> reply to v2 of patch 10 of this very series.
>> Please see below
>>>>>     I expect later patches
>>>>> will change that?
>>>> No, it is going to be this way all the time
>>> The question wasn't whether you switch away from the #ifdef-s, but
>>> whether later patches leave that as the only choice (avoiding build
>>> breakage).
>> Yes, the code is going to always remain ifdef'ed, so we don't have
>> dead code for x86 (at least).
>> So, does the above mean that you are ok with "#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT"
>> and there is no need for "if ( !IS_ENABLED() )"?
> I'm afraid you still didn't understand: "if ( !IS_ENABLED() )" is
> also a way to make sure there's (almost) no dead code. And this model
> has the advantage that the compiler would still check all that code
> even in x86 builds (throwing away most of it in one of its DCE passes),
> reducing the risk for someone not routinely doing Arm builds to
> introduce a build issue.
>
> But as soon as code references struct members which sit inside an
> #ifdef, that code can't use this preferred approach anymore. That's
> what I suspect might be happening in subsequent patches, which would
> then justify your choice of #ifdef.
This is the key to my not understanding: indeed, there are
structure members which are ifdef'ed thus rendering the idea with
IS_ENABLED not applicable:
@@ -444,6 +444,14 @@ struct domain

+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    struct list_head vdev_list;
+    /*
+     * Current device number used by the virtual PCI bus topology
+     * to assign a unique SBDF to a passed through virtual PCI device.
+     */
+    int vpci_dev_next;
+#endif
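
To make the trade-off concrete, a minimal sketch (not Xen code).
GUEST_SUPPORT_ENABLED is a simplified, hypothetical stand-in for
IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT), struct domain is cut down to
two fields, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for IS_ENABLED(): 1 if the option is defined, else 0. */
#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
#define GUEST_SUPPORT_ENABLED 1
#else
#define GUEST_SUPPORT_ENABLED 0
#endif

struct domain {
    int domain_id;
#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
    int vpci_dev_next;             /* exists only with guest support */
#endif
};

/*
 * Touches no conditional member, so a plain "if" is enough: the compiler
 * still parses and type-checks the whole body in every build and then
 * drops the unreachable part.
 */
int assign_checked(const struct domain *d)
{
    assert(d != NULL);

    if ( !GUEST_SUPPORT_ENABLED )
        return 0;

    return d->domain_id + 1;
}

#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
/*
 * Has to stay under #ifdef: without the option there is no vpci_dev_next
 * member, so this would not compile no matter what "if" guards it.
 */
int next_vdev(struct domain *d)
{
    return d->vpci_dev_next++;
}
#endif
```

With the option undefined, assign_checked() is built and returns 0, while
next_vdev() is not built at all.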

>
> Jan
>
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  8:53   ` Jan Beulich
  2021-09-30  9:35     ` Oleksandr Andrushchenko
@ 2021-09-30 16:57     ` Oleksandr Andrushchenko
  2021-10-01  7:42       ` Jan Beulich
  1 sibling, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-09-30 16:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Oleksandr Andrushchenko,
	Bertrand Marquis, Rahul Singh, xen-devel

[snip]

>> +    bool found = false;
>> +
>> +    pcidevs_lock();
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +    {
>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>> +        {
>> +            /* Replace virtual SBDF with the physical one. */
>> +            *sbdf = vdev->pdev->sbdf;
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    pcidevs_unlock();
> As per the comments on the earlier patch, locking as well as placement
> may need reconsidering.
I was thinking about the locking happening here.
So, there are 4 sources where we need to manipulate d->vdev_list:
1. XEN_DOMCTL_assign_device
2. XEN_DOMCTL_test_assign_device
3. XEN_DOMCTL_deassign_device
4. MMIO handlers
5. Do I miss others?

The first three already use pcidevs_{lock|unlock} and there it seems
to be ok as those get called when PCI devices are discovered by Dom0
and during guest domain creation. So, this is assumed not to happen
frequently and can be accepted wrt global locking.

What is more important is the fourth case, where in order to redirect
configuration space access from virtual SBDF to physical SBDF we need
to traverse the d->vdev_list each time the guest accesses PCI configuration
space. This means that with each such access we take a BIG PCI lock...

That being said, I think that we may want having a dedicated per-domain
lock for d->vdev_list handling, e.g. d->vdev_lock.
At the same time we may also consider that even for guests it is acceptable
to use pcidevs_{lock|unlock} as this will not affect PCI memory space access
and only has influence during device setup.

I would love to hear your opinion on this
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30 16:57     ` Oleksandr Andrushchenko
@ 2021-10-01  7:42       ` Jan Beulich
  2021-10-01  7:57         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-10-01  7:42 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Oleksandr Andrushchenko,
	Bertrand Marquis, Rahul Singh, xen-devel

On 30.09.2021 18:57, Oleksandr Andrushchenko wrote:
> [snip]
> 
>>> +    bool found = false;
>>> +
>>> +    pcidevs_lock();
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +    {
>>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>>> +        {
>>> +            /* Replace virtual SBDF with the physical one. */
>>> +            *sbdf = vdev->pdev->sbdf;
>>> +            found = true;
>>> +            break;
>>> +        }
>>> +    }
>>> +    pcidevs_unlock();
>> As per the comments on the earlier patch, locking as well as placement
>> may need reconsidering.
> I was thinking about the locking happening here.
> So, there are 4 sources where we need to manipulate d->vdev_list:
> 1. XEN_DOMCTL_assign_device
> 2. XEN_DOMCTL_test_assign_device
> 3. XEN_DOMCTL_deassign_device
> 4. MMIO handlers
> 5. Do I miss others?
> 
> The first three already use pcidevs_{lock|unlock} and there it seems
> to be ok as those get called when PCI devices are discovered by Dom0
> and during guest domain creation. So, this is assumed not to happen
> frequently and can be accepted wrt global locking.
> 
> What is more important is the fourth case, where in order to redirect
> configuration space access from virtual SBDF to physical SBDF we need
> to traverse the d->vdev_list each time the guest accesses PCI configuration
> space. This means that with each such access we take a BIG PCI lock...
> 
> That being said, I think that we may want having a dedicated per-domain
> lock for d->vdev_list handling, e.g. d->vdev_lock.
> At the same time we may also consider that even for guests it is acceptable
> to use pcidevs_{lock|unlock} as this will not affect PCI memory space access
> and only has influence during device setup.
> 
> I would love to hear your opinion on this

I've voiced my opinion already: Using the global lock really is an
abuse, which would require good justification. Hence unless there's
anything speaking against a per-domain lock, that's imo the only
suitable route to go. Nesting rules with the global lock may want
explicitly spelling out.
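
A minimal user-space sketch of such a scheme, with the nesting rule
spelled out. This is an illustration under assumptions, not the patch:
pthread mutexes stand in for Xen's locks, and struct vpci_dev, vdev_add()
and vdev_translate() are hypothetical, simplified names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct vpci_dev {
    uint32_t vsbdf;                /* virtual SBDF seen by the guest */
    uint32_t psbdf;                /* SBDF of the physical device */
    struct vpci_dev *next;
};

struct domain {
    pthread_mutex_t vdev_lock;     /* per-domain, protects vdev_list */
    struct vpci_dev *vdev_list;
};

/* Stand-in for the global lock behind pcidevs_{lock|unlock}. */
static pthread_mutex_t pcidevs_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Slow path (assign): takes both locks.
 * Nesting rule: global lock first, per-domain lock second, never reversed.
 */
void vdev_add(struct domain *d, struct vpci_dev *vdev)
{
    assert(vdev != NULL);

    pthread_mutex_lock(&pcidevs_mutex);
    pthread_mutex_lock(&d->vdev_lock);
    vdev->next = d->vdev_list;
    d->vdev_list = vdev;
    pthread_mutex_unlock(&d->vdev_lock);
    pthread_mutex_unlock(&pcidevs_mutex);
}

/* Hot path (config space trap): per-domain lock only. */
bool vdev_translate(struct domain *d, uint32_t *sbdf)
{
    bool found = false;

    pthread_mutex_lock(&d->vdev_lock);
    for ( const struct vpci_dev *v = d->vdev_list; v; v = v->next )
    {
        if ( v->vsbdf == *sbdf )
        {
            /* Replace virtual SBDF with the physical one. */
            *sbdf = v->psbdf;
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&d->vdev_lock);

    return found;
}
```

Guest config space accesses then contend only with other users of the
same domain's list, and the domctl paths keep their existing global
locking with the per-domain lock nested inside.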

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-10-01  7:42       ` Jan Beulich
@ 2021-10-01  7:57         ` Oleksandr Andrushchenko
  2021-10-01  8:12           ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-01  7:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 01.10.21 10:42, Jan Beulich wrote:
> On 30.09.2021 18:57, Oleksandr Andrushchenko wrote:
>> [snip]
>>
>>>> +    bool found = false;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +    {
>>>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>>>> +        {
>>>> +            /* Replace virtual SBDF with the physical one. */
>>>> +            *sbdf = vdev->pdev->sbdf;
>>>> +            found = true;
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +    pcidevs_unlock();
>>> As per the comments on the earlier patch, locking as well as placement
>>> may need reconsidering.
>> I was thinking about the locking happening here.
>> So, there are 4 sources where we need to manipulate d->vdev_list:
>> 1. XEN_DOMCTL_assign_device
>> 2. XEN_DOMCTL_test_assign_device
>> 3. XEN_DOMCTL_deassign_device
>> 4. MMIO handlers
>> 5. Do I miss others?
>>
>> The first three already use pcidevs_{lock|unlock} and there it seems
>> to be ok as those get called when PCI devices are discovered by Dom0
>> and during guest domain creation. So, this is assumed not to happen
>> frequently and can be accepted wrt global locking.
>>
>> What is more important is the fourth case, where in order to redirect
>> configuration space access from virtual SBDF to physical SBDF we need
>> to traverse the d->vdev_list each time the guest accesses PCI configuration
>> space. This means that with each such access we take a BIG PCI lock...
>>
>> That being said, I think that we may want having a dedicated per-domain
>> lock for d->vdev_list handling, e.g. d->vdev_lock.
>> At the same time we may also consider that even for guests it is acceptable
>> to use pcidevs_{lock|unlock} as this will not affect PCI memory space access
>> and only has influence during device setup.
>>
>> I would love to hear your opinion on this
> I've voiced my opinion already: Using the global lock really is an
> abuse, which would require good justification. Hence unless there's
> anything speaking against a per-domain lock, that's imo the only
> suitable route to go. Nesting rules with the global lock may want
> explicitly spelling out.
I do understand your concern here and also support the idea that
the less we wait for locks the better. Nevertheless, even if I introduce
d->vdev_lock, which will obviously help MMIO traps, the rest will remain
under pcidevs_{lock|unlock}, e.g. XEN_DOMCTL_assign_device,
XEN_DOMCTL_test_assign_device and XEN_DOMCTL_deassign_device
and the underlying code like vpci_{assign|deassign}_device in my case

Anyways, I'll implement a per-domain d->vdev_lock
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-10-01  7:57         ` Oleksandr Andrushchenko
@ 2021-10-01  8:12           ` Jan Beulich
  0 siblings, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-10-01  8:12 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 01.10.2021 09:57, Oleksandr Andrushchenko wrote:
> 
> 
> On 01.10.21 10:42, Jan Beulich wrote:
>> On 30.09.2021 18:57, Oleksandr Andrushchenko wrote:
>>> [snip]
>>>
>>>>> +    bool found = false;
>>>>> +
>>>>> +    pcidevs_lock();
>>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>>> +    {
>>>>> +        if ( vdev->sbdf.sbdf == sbdf->sbdf )
>>>>> +        {
>>>>> +            /* Replace virtual SBDF with the physical one. */
>>>>> +            *sbdf = vdev->pdev->sbdf;
>>>>> +            found = true;
>>>>> +            break;
>>>>> +        }
>>>>> +    }
>>>>> +    pcidevs_unlock();
>>>> As per the comments on the earlier patch, locking as well as placement
>>>> may need reconsidering.
>>> I was thinking about the locking happening here.
>>> So, there are 4 sources where we need to manipulate d->vdev_list:
>>> 1. XEN_DOMCTL_assign_device
>>> 2. XEN_DOMCTL_test_assign_device
>>> 3. XEN_DOMCTL_deassign_device
>>> 4. MMIO handlers
>>> 5. Do I miss others?
>>>
>>> The first three already use pcidevs_{lock|unlock} and there it seems
>>> to be ok as those get called when PCI devices are discovered by Dom0
>>> and during guest domain creation. So, this is assumed not to happen
>>> frequently and can be accepted wrt global locking.
>>>
>>> What is more important is the fourth case, where in order to redirect
>>> configuration space access from virtual SBDF to physical SBDF we need
>>> to traverse the d->vdev_list each time the guest accesses PCI configuration
>>> space. This means that with each such access we take a BIG PCI lock...
>>>
>>> That being said, I think that we may want having a dedicated per-domain
>>> lock for d->vdev_list handling, e.g. d->vdev_lock.
>>> At the same time we may also consider that even for guests it is acceptable
>>> to use pcidevs_{lock|unlock} as this will not affect PCI memory space access
>>> and only has influence during device setup.
>>>
>>> I would love to hear your opinion on this
>> I've voiced my opinion already: Using the global lock really is an
>> abuse, which would require good justification. Hence unless there's
>> anything speaking against a per-domain lock, that's imo the only
>> suitable route to go. Nesting rules with the global lock may want
>> explicitly spelling out.
> I do understand your concern here and also support the idea that
> the less we wait for locks the better. Nevertheless, even if I introduce
> d->vdev_lock, which will obviously help MMIO traps, the rest will remain
> under pcidevs_{lock|unlock}, e.g. XEN_DOMCTL_assign_device,
> XEN_DOMCTL_test_assign_device and XEN_DOMCTL_deassign_device
> and the underlying code like vpci_{assign|deassign}_device in my case

Well, it's entirely usual that certain operations require more than one
lock.

> Anyways, I'll implement a per-domain d->vdev_lock

Thanks.

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-09-30  7:52 ` [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically Oleksandr Andrushchenko
@ 2021-10-01 13:26   ` Jan Beulich
  2021-10-04  5:58     ` Oleksandr Andrushchenko
  2021-10-25 15:48   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-10-01 13:26 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> -static int add_bar_handlers(const struct pci_dev *pdev)
> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}
> +
> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)

I remain unconvinced that this boolean is the best way to go here, but
I'll leave the decision there to Roger. Just a couple of nits:

> @@ -593,6 +625,30 @@ static int init_bars(struct pci_dev *pdev)
>  }
>  REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
> +{
> +    int rc;
> +
> +    /* Remove previously added registers. */
> +    vpci_remove_device_registers(pdev);
> +
> +    rc = add_bar_handlers(pdev, is_hardware_domain(d));
> +    if ( rc )
> +        gdprintk(XENLOG_ERR,
> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",

Only %pd please, as that already expands to d<num>.

> +                 &pdev->sbdf, d, rc);
> +    return rc;

Blank line please ahead of the main return statement of a function.

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers
  2021-09-30  7:52 ` [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers Oleksandr Andrushchenko
@ 2021-10-01 13:31   ` Jan Beulich
  2021-10-26  7:50   ` Roger Pau Monné
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-10-01 13:31 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Michal Orzel, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM is x86 only, so it
> might not be used by other architectures without emulating x86. Other
> use-cases may include using that expansion ROM before Xen boots, hence
> no emulation is needed in Xen itself. Or when a guest wants to use the
> ROM code which seems to be rare.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 07/11] vpci/header: program p2m with guest BAR view
  2021-09-30  7:52 ` [PATCH v3 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
@ 2021-10-01 13:38   ` Jan Beulich
  2021-10-04  6:26     ` Oleksandr Andrushchenko
  2021-10-26 10:35   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-10-01 13:38 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, xen-devel

On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Take into account guest's BAR view and program its p2m accordingly:
> gfn is guest's view of the BAR and mfn is the physical BAR value as set
> up by the host bridge in the hardware domain.
> This way hardware domain sees physical BAR values and guest sees
> emulated ones.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Just a couple of nits, as I remain unconvinced of the rangeset related
choice in the earlier patch.

> @@ -37,12 +41,28 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>                       unsigned long *c)
>  {
>      const struct map_data *map = data;
> +    gfn_t start_gfn;
>      int rc;
>  
>      for ( ; ; )
>      {
>          unsigned long size = e - s + 1;
>  
> +        /*
> +         * Any BAR may have holes in its memory we want to map, e.g.
> +         * we don't want to map MSI-X regions which may be a part of that BAR,
> +         * e.g. when a single BAR is used for both MMIO and MSI-X.

This second "e.g." seems, to me at least, quite redundant with the first
one.

> +         * In this case MSI-X regions are subtracted from the mapping, but
> +         * map->start_gfn still points to the very beginning of the BAR.
> +         * So if there is a hole present then we need to adjust start_gfn
> +         * to reflect the fact of that subtraction.
> +         */
> +        start_gfn = gfn_add(map->start_gfn, s - mfn_x(map->start_mfn));
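
A worked example of the adjustment in the quoted line, as a sketch: frame
numbers are plain integers here instead of gfn_t/mfn_t, struct map_data
is reduced to the two fields involved, and chunk_start_gfn() is a made-up
name.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical version of the patch's struct map_data. */
struct map_data {
    uint64_t start_gfn;            /* guest frame where the BAR starts */
    uint64_t start_mfn;            /* machine frame where the BAR starts */
};

/*
 * Guest frame for a chunk starting at machine frame s: the chunk keeps
 * the same offset from the start of the BAR in both address spaces.
 */
uint64_t chunk_start_gfn(const struct map_data *map, uint64_t s)
{
    assert(s >= map->start_mfn);

    return map->start_gfn + (s - map->start_mfn);
}
```

For a BAR at mfn 0xe0000 that the guest placed at gfn 0x23000, with one
MSI-X page at offset 2 cut out, the chunk starting at mfn 0xe0003 has to
be mapped at gfn 0x23003 rather than at 0x23000.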
> +
> +        printk(XENLOG_G_DEBUG

Do you really mean this to be active even in release builds? Might get
quite noisy ...

> +               "%smap [%lx, %lx] -> %#"PRI_gfn" for d%d\n",

%pd please in new or altered code.

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-10-01 13:26   ` Jan Beulich
@ 2021-10-04  5:58     ` Oleksandr Andrushchenko
  2021-10-07  7:22       ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-04  5:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 01.10.21 16:26, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>   }
>>   
>> -static int add_bar_handlers(const struct pci_dev *pdev)
>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +}
>> +
>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    return 0xffffffff;
>> +}
>> +
>> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
> I remain unconvinced that this boolean is the best way to go here,
I can remove "bool is_hwdom" and have the checks like:

static int add_bar_handlers(const struct pci_dev *pdev)
{
...
     if ( is_hardware_domain(pdev->domain) )
         rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
                                PCI_COMMAND, 2, header);
     else
         rc = vpci_add_register(pdev->vpci, vpci_hw_read16, guest_cmd_write,
                                PCI_COMMAND, 2, header);
Is this going to be better?
>   but
> I'll leave the decision there to Roger. Just a couple of nits:
>
>> @@ -593,6 +625,30 @@ static int init_bars(struct pci_dev *pdev)
>>   }
>>   REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +
>> +    /* Remove previously added registers. */
>> +    vpci_remove_device_registers(pdev);
>> +
>> +    rc = add_bar_handlers(pdev, is_hardware_domain(d));
>> +    if ( rc )
>> +        gdprintk(XENLOG_ERR,
>> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
> Only %pd please, as that already expands to d<num>.
Good catch, thank you!
>
>> +                 &pdev->sbdf, d, rc);
>> +    return rc;
> Blank line please ahead of the main return statement of a function.
Will add
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 07/11] vpci/header: program p2m with guest BAR view
  2021-10-01 13:38   ` Jan Beulich
@ 2021-10-04  6:26     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-04  6:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel, Oleksandr Andrushchenko



On 01.10.21 16:38, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Take into account guest's BAR view and program its p2m accordingly:
>> gfn is guest's view of the BAR and mfn is the physical BAR value as set
>> up by the host bridge in the hardware domain.
>> This way hardware domain sees physical BAR values and guest sees
>> emulated ones.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Just a couple of nits, as I remain unconvinced of the rangeset related
> choice in the earlier patch.
>
>> @@ -37,12 +41,28 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>                        unsigned long *c)
>>   {
>>       const struct map_data *map = data;
>> +    gfn_t start_gfn;
>>       int rc;
>>   
>>       for ( ; ; )
>>       {
>>           unsigned long size = e - s + 1;
>>   
>> +        /*
>> +         * Any BAR may have holes in its memory we want to map, e.g.
>> +         * we don't want to map MSI-X regions which may be a part of that BAR,
>> +         * e.g. when a single BAR is used for both MMIO and MSI-X.
> This second "e.g." seems, to me at least, quite redundant with the first
> one.
Ok
>
>> +         * In this case MSI-X regions are subtracted from the mapping, but
>> +         * map->start_gfn still points to the very beginning of the BAR.
>> +         * So if there is a hole present then we need to adjust start_gfn
>> +         * to reflect the fact of that subtraction.
>> +         */
>> +        start_gfn = gfn_add(map->start_gfn, s - mfn_x(map->start_mfn));
>> +
>> +        printk(XENLOG_G_DEBUG
> Do you really mean this to be active even in release builds? Might get
> quite noisy ...
I can change this one to "gdprintk(XENLOG_G_DEBUG"
and leave the below one as "printk(XENLOG_G_WARNING"
Or you also mean the warning to be gdprintk?
>
>> +               "%smap [%lx, %lx] -> %#"PRI_gfn" for d%d\n",
> %pd please in new or altered code.
Will change
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-10-04  5:58     ` Oleksandr Andrushchenko
@ 2021-10-07  7:22       ` Jan Beulich
  2021-10-13 15:38         ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-10-07  7:22 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, Bertrand Marquis, Rahul Singh,
	xen-devel

On 04.10.2021 07:58, Oleksandr Andrushchenko wrote:
> 
> 
> On 01.10.21 16:26, Jan Beulich wrote:
>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>>   }
>>>   
>>> -static int add_bar_handlers(const struct pci_dev *pdev)
>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>> +                            uint32_t val, void *data)
>>> +{
>>> +}
>>> +
>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>> +                               void *data)
>>> +{
>>> +    return 0xffffffff;
>>> +}
>>> +
>>> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
>> I remain unconvinced that this boolean is the best way to go here,
> I can remove "bool is_hwdom" and have the checks like:
> 
> static int add_bar_handlers(const struct pci_dev *pdev)
> {
> ...
>      if ( is_hardware_domain(pdev->domain) )
>          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
>                                 PCI_COMMAND, 2, header);
>      else
>          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, guest_cmd_write,
>                                 PCI_COMMAND, 2, header);
> Is this going to be better?

Marginally (plus you'd need to prove that pdev->domain can never be NULL
when making it here). "I remain unconvinced" was rather referring to our
prior discussion.

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function
  2021-09-30  7:52 ` [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function Oleksandr Andrushchenko
@ 2021-10-13 11:11   ` Roger Pau Monné
  2021-10-27  9:12     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-13 11:11 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

On Thu, Sep 30, 2021 at 10:52:13AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> This is in preparation for dynamic assignment of the vpci register
> handlers depending on the domain: hwdom or guest.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> ---
> Since v1:
>  - constify struct pci_dev where possible
> ---
>  xen/drivers/vpci/vpci.c | 7 ++++++-
>  xen/include/xen/vpci.h  | 2 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index cbd1bac7fc33..1666402d55b8 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -35,7 +35,7 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> -void vpci_remove_device(struct pci_dev *pdev)
> +void vpci_remove_device_registers(const struct pci_dev *pdev)

Making this const is kind of misleading, as you end up modifying
contents of the pdev; it's just that the vpci data is stored as a pointer
inside the struct, so you avoid the effects of the constification.
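
A small standalone sketch of that effect, with made-up, reduced structures
(nr_handlers and remove_handlers() are hypothetical): const on the outer
struct makes the pointer member itself read-only, but not the object it
points to.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical stand-ins for the real structures. */
struct vpci {
    int nr_handlers;
};

struct pci_dev {
    struct vpci *vpci;
};

/*
 * Compiles without a cast or warning: through a const struct pci_dev *
 * the member has type "struct vpci *const", so the pointed-to vpci data
 * stays writable.
 */
void remove_handlers(const struct pci_dev *pdev)
{
    assert(pdev != NULL && pdev->vpci != NULL);

    pdev->vpci->nr_handlers = 0;
}
```

So the prototype promises the caller less than it appears to: state
reachable from pdev is still modified.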

>  {
>      spin_lock(&pdev->vpci->lock);
>      while ( !list_empty(&pdev->vpci->handlers) )
> @@ -48,6 +48,11 @@ void vpci_remove_device(struct pci_dev *pdev)
>          xfree(r);
>      }
>      spin_unlock(&pdev->vpci->lock);
> +}
> +
> +void vpci_remove_device(struct pci_dev *pdev)
> +{
> +    vpci_remove_device_registers(pdev);
>      xfree(pdev->vpci->msix);
>      xfree(pdev->vpci->msi);
>      xfree(pdev->vpci);
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 9f5b5d52e159..2e910d0b1f90 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -28,6 +28,8 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
>  
>  /* Remove all handlers and free vpci related structures. */
>  void vpci_remove_device(struct pci_dev *pdev);
> +/* Remove all handlers for the device given. */

I would drop the 'given' from the end of the sentence...

> +void vpci_remove_device_registers(const struct pci_dev *pdev);

...and maybe name this vpci_remove_device_handlers as it's clearer
IMO.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-09-30  7:52 ` [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
  2021-09-30  8:21   ` Jan Beulich
@ 2021-10-13 11:29   ` Roger Pau Monné
  2021-10-13 12:47     ` Jan Beulich
  2021-10-27  9:53     ` Oleksandr Andrushchenko
  1 sibling, 2 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-13 11:29 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:14AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> When a PCI device gets assigned/de-assigned some work on vPCI side needs
> to be done for that device. Introduce a pair of hooks so vPCI can handle
> that.
> 
> Please note, that in the current design the error path is handled by
> the toolstack via XEN_DOMCTL_assign_device/XEN_DOMCTL_deassign_device,
> so this is why it is acceptable not to de-assign devices if vPCI's
> assign fails, e.g. the roll back will be handled on deassign_device when
> it is called by the toolstack.

It's kind of hard to see what would need to be rolled back, as the
functions are just dummies right now that don't perform any actions.

I don't think the toolstack should be the one to deal with the
fallout, as it could leave Xen in a broken state. The current commit
message doesn't provide any information about why it has been designed
this way.

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v2:
> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>   for x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - extended the commit message
> ---
>  xen/drivers/Kconfig           |  4 ++++
>  xen/drivers/passthrough/pci.c |  9 +++++++++
>  xen/drivers/vpci/vpci.c       | 23 +++++++++++++++++++++++
>  xen/include/xen/vpci.h        | 20 ++++++++++++++++++++
>  4 files changed, 56 insertions(+)
> 
> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
> index db94393f47a6..780490cf8e39 100644
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>  	bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> +	bool
> +	depends on HAS_VPCI

I would assume this is to go away once the work is finished? I don't
think it makes sense to split vPCI code between domU/dom0 on a build
time basis.

> +
>  endmenu
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 9f804a50e780..805ab86ed555 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -870,6 +870,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>      if ( ret )
>          goto out;
>  
> +    ret = vpci_deassign_device(d, pdev);
> +    if ( ret )
> +        goto out;
> +
>      if ( pdev->domain == hardware_domain  )
>          pdev->quarantine = false;
>  
> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>          rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>      }
>  
> +    if ( rc )
> +        goto done;
> +
> +    rc = vpci_assign_device(d, pdev);
> +
>   done:
>      if ( rc )
>          printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 1666402d55b8..0fe86cb30d23 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
> +{
> +    /* It only makes sense to assign for hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    return 0;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
> +{
> +    /* It only makes sense to de-assign from hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    return 0;
> +}
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> +
>  #endif /* __XEN__ */
>  
>  static int vpci_register_cmp(const struct vpci_register *r1,
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 2e910d0b1f90..ecc08f2c0f65 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -242,6 +242,26 @@ static inline bool vpci_process_pending(struct vcpu *v)
>  }
>  #endif
>  
> +#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_HAS_VPCI_GUEST_SUPPORT)

You don't need to check for CONFIG_HAS_VPCI, as
CONFIG_HAS_VPCI_GUEST_SUPPORT already depends on CONFIG_HAS_VPCI being
set.

> +/* Notify vPCI that device is assigned/de-assigned to/from guest. */
> +int __must_check vpci_assign_device(struct domain *d,
> +                                    const struct pci_dev *dev);
> +int __must_check vpci_deassign_device(struct domain *d,
> +                                      const struct pci_dev *dev);
> +#else
> +static inline int vpci_assign_device(struct domain *d,
> +                                     const struct pci_dev *dev)
> +{
> +    return 0;
> +};
> +
> +static inline int vpci_deassign_device(struct domain *d,
> +                                       const struct pci_dev *dev)
> +{
> +    return 0;
> +};

You need the __must_check attributes here also to match the prototypes
above.
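
As a rough sketch of what I mean (assuming a GCC/Clang style definition
of __must_check; Xen has its own in its compiler header, and the demo_*
name is made up for this example), the stub would carry the attribute
just like the prototype does:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the usual GCC/Clang spelling, defined here only for the demo. */
#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif

/* Opaque types are enough for the stub. */
struct domain;
struct pci_dev;

/* Stub used when guest support is compiled out; callers must still check rc. */
static inline int __must_check demo_vpci_assign_device(struct domain *d,
                                                       const struct pci_dev *dev)
{
    (void)d;
    (void)dev;
    return 0;
}
```

With that in place, a caller ignoring the return value gets the same
warning regardless of which variant is compiled in.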

Thanks, Roger.



* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-10-13 11:29   ` Roger Pau Monné
@ 2021-10-13 12:47     ` Jan Beulich
  2021-10-27  9:53     ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-10-13 12:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Oleksandr Andrushchenko

On 13.10.2021 13:29, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:14AM +0300, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/Kconfig
>> +++ b/xen/drivers/Kconfig
>> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>>  config HAS_VPCI
>>  	bool
>>  
>> +config HAS_VPCI_GUEST_SUPPORT
>> +	bool
>> +	depends on HAS_VPCI
> 
> I would assume this is to go away once the work is finished? I don't
> think it makes sense to split vPCI code between domU/dom0 on a build
> time basis.

If by that you mean x86 side work, then maybe. I did ask for this so
that x86 wouldn't carry quite a bit of presently dead code.

Jan




* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-09-30  7:52 ` [PATCH v3 03/11] vpci/header: Move register assignments from init_bars Oleksandr Andrushchenko
@ 2021-10-13 13:51   ` Roger Pau Monné
  2021-10-15  6:04     ` Jan Beulich
  2021-10-27 10:17     ` Oleksandr Andrushchenko
  0 siblings, 2 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-13 13:51 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> This is in preparation for dynamic assignment of the vPCI register
> handlers depending on the domain: hwdom or guest.
> The need for this step is that it is easier to have all related functionality
> put at one place. When the subsequent patches add decisions on which
> handlers to install, e.g. hwdom or guest handlers, then this is easily
> achievable.

Won't it be possible to select the handlers to install in init_bars
itself?

Splitting it like that means you need to iterate over the BARs twice
(once in add_bar_handlers and once in init_bars), which makes it more
likely to introduce errors or divergences.

Decoupling the filling of vpci_bar data from setting the handlers
seems slightly confusing.

> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v1:
>  - constify struct pci_dev where possible
>  - extend patch description
> ---
>  xen/drivers/vpci/header.c | 83 ++++++++++++++++++++++++++-------------
>  1 file changed, 56 insertions(+), 27 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f8cd55e7c024..3d571356397a 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -445,6 +445,55 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> +static int add_bar_handlers(const struct pci_dev *pdev)

Making this const is again misleading IMO, as you end up modifying
fields inside the pdev, you get away with it because vpci data is
stored in a pointer.

> +{
> +    unsigned int i;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    struct vpci_bar *bars = header->bars;
> +    int rc;
> +
> +    /* Setup a handler for the command register. */
> +    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
> +                           2, header);
> +    if ( rc )
> +        return rc;
> +
> +    if ( pdev->ignore_bars )
> +        return 0;

You can join both ifs above:

if ( rc || pdev->ignore_bars )
    return rc;

> +
> +    for ( i = 0; i < PCI_HEADER_NORMAL_NR_BARS + 1; i++ )

init_bars deals with both TYPE_NORMAL and TYPE_BRIDGE classes, yet you
seem to unconditionally assume PCI_HEADER_NORMAL_NR_BARS here (even
when below you take into account the different ROM BAR position).
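
Something along these lines is what I would expect, picking the BAR
count and the ROM register together from the header type. This is a
standalone sketch: the constant names follow Xen's pci_regs.h and the
values come from the PCI specification, but the demo_* struct and helper
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Names as in Xen's pci_regs.h, values per the PCI specification. */
#define PCI_HEADER_TYPE_NORMAL      0
#define PCI_HEADER_TYPE_BRIDGE      1
#define PCI_HEADER_NORMAL_NR_BARS   6
#define PCI_HEADER_BRIDGE_NR_BARS   2
#define PCI_ROM_ADDRESS             0x30
#define PCI_ROM_ADDRESS1            0x38

/* Hypothetical container for the per-header-type layout. */
struct demo_bar_layout {
    unsigned int num_bars;
    unsigned int rom_reg;
};

/* Returns 0 and fills *out, or -1 for an unsupported header type. */
static int demo_get_bar_layout(uint8_t header_type_reg,
                               struct demo_bar_layout *out)
{
    /* Bit 7 is the multi-function flag, so mask it off. */
    switch ( header_type_reg & 0x7f )
    {
    case PCI_HEADER_TYPE_NORMAL:
        out->num_bars = PCI_HEADER_NORMAL_NR_BARS;
        out->rom_reg = PCI_ROM_ADDRESS;
        return 0;

    case PCI_HEADER_TYPE_BRIDGE:
        out->num_bars = PCI_HEADER_BRIDGE_NR_BARS;
        out->rom_reg = PCI_ROM_ADDRESS1;
        return 0;

    default:
        return -1;
    }
}
```

The loop bound would then be num_bars + 1 (the extra slot being the ROM
BAR) rather than a value that is only right for TYPE_NORMAL.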

> +    {
> +        if ( (bars[i].type == VPCI_BAR_IO) || (bars[i].type == VPCI_BAR_EMPTY) )
> +            continue;
> +
> +        if ( bars[i].type == VPCI_BAR_ROM )
> +        {
> +            unsigned int rom_reg;
> +            uint8_t header_type = pci_conf_read8(pdev->sbdf,
> +                                                 PCI_HEADER_TYPE) & 0x7f;

Missing newline, and unsigned int preferably for header_type.

> +            if ( header_type == PCI_HEADER_TYPE_NORMAL )
> +                rom_reg = PCI_ROM_ADDRESS;
> +            else
> +                rom_reg = PCI_ROM_ADDRESS1;
> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
> +                                   rom_reg, 4, &bars[i]);
> +            if ( rc )
> +                return rc;
> +        }
> +        else
> +        {
> +            uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;

unsigned int please, we try to avoid using explicitly sized types
unless strictly necessary (ie: when dealing with hardware values for
example).

> +
> +            /* This is either VPCI_BAR_MEM32 or VPCI_BAR_MEM64_{LO|HI}. */
> +            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> +                                   4, &bars[i]);
> +            if ( rc )
> +                return rc;
> +        }
> +    }
> +    return 0;
> +}
> +
>  static int init_bars(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
> @@ -470,14 +519,8 @@ static int init_bars(struct pci_dev *pdev)
>          return -EOPNOTSUPP;
>      }
>  
> -    /* Setup a handler for the command register. */
> -    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
> -                           2, header);
> -    if ( rc )
> -        return rc;

I don't think you need to move the handler for the command register
inside add_bar_handlers: for one, it makes the function name not
reflect what it actually does (as it then deals with both BARs and the
command register), and keeping it out would also save you from having
to call add_bar_handlers if ignore_bars is set.

Thanks, Roger.



* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-10-07  7:22       ` Jan Beulich
@ 2021-10-13 15:38         ` Roger Pau Monné
  2021-10-15  6:09           ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-13 15:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, xen-devel

On Thu, Oct 07, 2021 at 09:22:36AM +0200, Jan Beulich wrote:
> On 04.10.2021 07:58, Oleksandr Andrushchenko wrote:
> > 
> > 
> > On 01.10.21 16:26, Jan Beulich wrote:
> >> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
> >>> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
> >>>   }
> >>>   
> >>> -static int add_bar_handlers(const struct pci_dev *pdev)
> >>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>> +                            uint32_t val, void *data)
> >>> +{
> >>> +}
> >>> +
> >>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> >>> +                               void *data)
> >>> +{
> >>> +    return 0xffffffff;
> >>> +}
> >>> +
> >>> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
> >> I remain unconvinced that this boolean is the best way to go here,
> > I can remove "bool is_hwdom" and have the checks like:
> > 
> > static int add_bar_handlers(const struct pci_dev *pdev)
> > {
> > ...
> >      if ( is_hardware_domain(pdev->domain) )
> >          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
> >                                 PCI_COMMAND, 2, header);
> >      else
> >          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, guest_cmd_write,
> >                                 PCI_COMMAND, 2, header);
> > Is this going to be better?
> 
> Marginally (plus you'd need to prove that pdev->domain can never be NULL
> when making it here).

I think it would be an anomaly to try to setup vPCI handlers for a device
without pdev->domain being set. I'm quite sure other vPCI code already
relies on pdev->domain being set.

As I said in another reply I'm not convinced though that splitting
add_bar_handlers is the right thing to do.

Thanks, Roger.



* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-13 13:51   ` Roger Pau Monné
@ 2021-10-15  6:04     ` Jan Beulich
  2021-10-25 14:28       ` Roger Pau Monné
  2021-10-27 10:17     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-10-15  6:04 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Oleksandr Andrushchenko

On 13.10.2021 15:51, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -445,6 +445,55 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>  }
>>  
>> +static int add_bar_handlers(const struct pci_dev *pdev)
> 
> Making this const is again misleading IMO, as you end up modifying
> fields inside the pdev, you get away with it because vpci data is
> stored in a pointer.

I think it was me who asked for const to be added in places like this
one. vpci data hanging off of struct pci_dev is an implementation
artifact imo, not an unavoidable connection. In principle the vpci
data corresponding to a physical device could also be looked up using
e.g. SBDF.

Here the intention really is to leave the physical device unchanged;
that's what the const documents (apart from enforcing).

Jan




* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-10-13 15:38         ` Roger Pau Monné
@ 2021-10-15  6:09           ` Jan Beulich
  0 siblings, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-10-15  6:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, xen-devel

On 13.10.2021 17:38, Roger Pau Monné wrote:
> On Thu, Oct 07, 2021 at 09:22:36AM +0200, Jan Beulich wrote:
>> On 04.10.2021 07:58, Oleksandr Andrushchenko wrote:
>>>
>>>
>>> On 01.10.21 16:26, Jan Beulich wrote:
>>>> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>>>>> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>>>>   }
>>>>>   
>>>>> -static int add_bar_handlers(const struct pci_dev *pdev)
>>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t val, void *data)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                               void *data)
>>>>> +{
>>>>> +    return 0xffffffff;
>>>>> +}
>>>>> +
>>>>> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)
>>>> I remain unconvinced that this boolean is the best way to go here,
>>> I can remove "bool is_hwdom" and have the checks like:
>>>
>>> static int add_bar_handlers(const struct pci_dev *pdev)
>>> {
>>> ...
>>>      if ( is_hardware_domain(pdev->domain) )
>>>          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write,
>>>                                 PCI_COMMAND, 2, header);
>>>      else
>>>          rc = vpci_add_register(pdev->vpci, vpci_hw_read16, guest_cmd_write,
>>>                                 PCI_COMMAND, 2, header);
>>> Is this going to be better?
>>
>> Marginally (plus you'd need to prove that pdev->domain can never be NULL
>> when making it here).
> 
> I think it would be an anomaly to try to setup vPCI handlers for a device
> without pdev->domain being set. I'm quite sure other vPCI code already
> relies on pdev->domain being set.

Quite likely, and my point wasn't to request dealing with the NULL case
by adding a check here. I really meant "prove", mainly recalling that
another patch (in another related series?) altered code around the
setting of pdev->domain in pci_add_device(). It would need to be assured
that whatever goes on there guarantees pdev->domain to have got set.

Jan




* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
  2021-09-30  8:53   ` Jan Beulich
@ 2021-10-18 18:32   ` Julien Grall
  2021-10-26 13:30   ` Roger Pau Monné
  2 siblings, 0 replies; 98+ messages in thread
From: Julien Grall @ 2021-10-18 18:32 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, xen-devel
  Cc: sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

Hi Oleksandr,

On 30/09/2021 08:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are three  originators for the PCI configuration space access:
> 1. The domain that owns physical host bridge: MMIO handlers are
> there so we can update vPCI register handlers with the values
> written by the hardware domain, e.g. physical view of the registers
> vs guest's view on the configuration space.
> 2. Guest access to the passed through PCI devices: we need to properly
> map virtual bus topology to the physical one, e.g. pass the configuration
> space access to the corresponding physical devices.
> 3. Emulated host PCI bridge access. It doesn't exist in the physical
> topology, e.g. it can't be mapped to some physical host bridge.
> So, all access to the host bridge itself needs to be trapped and
> emulated.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
>   - pass struct domain instead of struct vcpu
>   - constify arguments where possible
>   - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>   xen/arch/arm/domain.c         |  1 +
>   xen/arch/arm/vpci.c           | 86 +++++++++++++++++++++++++++++++----
>   xen/arch/arm/vpci.h           |  3 ++
>   xen/drivers/passthrough/pci.c | 25 ++++++++++
>   xen/include/asm-arm/pci.h     |  1 +
>   xen/include/xen/pci.h         |  1 +
>   xen/include/xen/sched.h       |  2 +
>   7 files changed, 111 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index fa6fcc5e467c..095671742ad8 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d)
>                          get_order_from_bytes(d->arch.efi_acpi_len));
>   #endif
>       domain_io_free(d);
> +    domain_vpci_free(d);
>   }
>   
>   void arch_domain_shutdown(struct domain *d)
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 5d6c29c8dcd9..26ec2fa7cf2d 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -17,6 +17,14 @@
>   
>   #define REGISTER_OFFSET(addr)  ( (addr) & 0x00000fff)
>   
> +struct vpci_mmio_priv {
> +    /*
> +     * Set to true if the MMIO handlers were set up for the emulated
> +     * ECAM host PCI bridge.
> +     */
> +    bool is_virt_ecam;
> +};
> +
>   /* Do some sanity checks. */
>   static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len)
>   {
> @@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>       pci_sbdf_t sbdf;
>       unsigned long data = ~0UL;
>       unsigned int size = 1U << info->dabt.size;
> +    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;

This cast is unnecessary. Same...

>   
>       sbdf.sbdf = MMCFG_BDF(info->gpa);
>       reg = REGISTER_OFFSET(info->gpa);
> @@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>       if ( !vpci_mmio_access_allowed(reg, size) )
>           return 0;
>   
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
> +            return 1;
> +
>       data = vpci_read(sbdf, reg, min(4u, size));
>       if ( size == 8 )
>           data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;
> @@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>       pci_sbdf_t sbdf;
>       unsigned long data = r;
>       unsigned int size = 1U << info->dabt.size;
> +    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;

... here. But is it meant to be modified? If not, then I think you want 
to add a const in both cases.

>   
>       sbdf.sbdf = MMCFG_BDF(info->gpa);
>       reg = REGISTER_OFFSET(info->gpa);
> @@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>       if ( !vpci_mmio_access_allowed(reg, size) )
>           return 0;
>   
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
> +            return 1;
> +
>       vpci_write(sbdf, reg, min(4u, size), data);
>       if ( size == 8 )
>           vpci_write(sbdf, reg + 4, 4, data >> 32);
> @@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = {
>       .write = vpci_mmio_write,
>   };
>   
> +/*
> + * There are three  originators for the PCI configuration space access:
> + * 1. The domain that owns physical host bridge: MMIO handlers are
> + *    there so we can update vPCI register handlers with the values
> + *    written by the hardware domain, e.g. physical view of the registers/
> + *    configuration space.
> + * 2. Guest access to the passed through PCI devices: we need to properly
> + *    map virtual bus topology to the physical one, e.g. pass the configuration
> + *    space access to the corresponding physical devices.
> + * 3. Emulated host PCI bridge access. It doesn't exist in the physical
> + *    topology, e.g. it can't be mapped to some physical host bridge.
> + *    So, all access to the host bridge itself needs to be trapped and
> + *    emulated.
> + */
>   static int vpci_setup_mmio_handler(struct domain *d,
>                                      struct pci_host_bridge *bridge)
>   {
> -    struct pci_config_window *cfg = bridge->cfg;
> +    struct vpci_mmio_priv *priv;
> +
> +    priv = xzalloc(struct vpci_mmio_priv);
> +    if ( !priv )
> +        return -ENOMEM;
> +
> +    priv->is_virt_ecam = !is_hardware_domain(d);
>   
> -    register_mmio_handler(d, &vpci_mmio_handler,
> -                          cfg->phys_addr, cfg->size, NULL);
> +    if ( is_hardware_domain(d) )
> +    {
> +        struct pci_config_window *cfg = bridge->cfg;
> +
> +        bridge->mmio_priv = priv;
> +        register_mmio_handler(d, &vpci_mmio_handler,
> +                              cfg->phys_addr, cfg->size,
> +                              priv);
> +    }
> +    else
> +    {
> +        d->vpci_mmio_priv = priv;

Something feels odd to me in this code. The if ( !is_hardware_domain(d) 
) part seems to suggest that this can be called on multiple bridges. But 
here you are directly assigning priv to d->vpci_mmio_priv.

The call...

> +        /* Guest domains use what is programmed in their device tree. */
> +        register_mmio_handler(d, &vpci_mmio_handler,
> +                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
> +                              priv);
> +    }
>       return 0;
>   }
>   
> @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d)
>       if ( !has_vpci(d) )
>           return 0;
>   
> +    return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);

... here seems to confirm that you may (in theory) have multiple 
bridges. So the 'else' would want some rework to avoid assuming a single 
bridge.

> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 5b963d75d1ba..b7dffb769cfd 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -889,6 +889,31 @@ int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>       xfree(vdev);
>       return 0;
>   }
> +
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool pci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
> +{
> +    struct vpci_dev *vdev;
> +    bool found = false;
> +
> +    pcidevs_lock();
> +    list_for_each_entry ( vdev, &d->vdev_list, list )

I haven't looked at the rest of the series yet. But I am a bit concerned 
to see code to iterate through a list accessible by the guest.
   1) What safety mechanism do we have in place to ensure that the list 
is going to be small?
   2) If there is a limit, do we have any documentation on top of this 
limit to make clear this can't be bumped without removing the list?
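
For illustration only, and not what the series does: one way to make
such a bound explicit would be a fixed-size table indexed by the virtual
device number, so a lookup is constant time whatever the guest does. The
sketch below is standalone and every demo_* name is hypothetical; it
assumes the usual SBDF layout (segment 31:16, bus 15:8, device 7:3,
function 2:0), a single virtual bus 0 on segment 0, and that the guest's
function number is carried over unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A single PCI bus holds at most 32 devices. */
#define DEMO_VPCI_MAX_VIRT_DEV 32

/* Hypothetical per-slot mapping from virtual device to physical SBDF. */
struct demo_vslot {
    bool used;
    uint32_t phys_sbdf;   /* physical SBDF with the function bits clear */
};

struct demo_vbus {
    struct demo_vslot slot[DEMO_VPCI_MAX_VIRT_DEV];
};

/* Translate a virtual SBDF in place; returns false if nothing is mapped. */
static bool demo_translate(const struct demo_vbus *vbus, uint32_t *sbdf)
{
    unsigned int seg_bus = *sbdf >> 8;        /* must be segment 0, bus 0 */
    unsigned int dev = (*sbdf >> 3) & 0x1f;   /* always below 32 */
    unsigned int fn = *sbdf & 0x7;

    if ( seg_bus != 0 || !vbus->slot[dev].used )
        return false;

    *sbdf = vbus->slot[dev].phys_sbdf | fn;
    return true;
}
```

With a table like this the limit lives in one constant, and raising it
would be an obvious, reviewable change.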

Cheers,

-- 
Julien Grall



* Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-09-30  7:52 ` [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR Oleksandr Andrushchenko
@ 2021-10-25 11:51   ` Oleksandr Andrushchenko
  2021-10-26  9:40     ` Roger Pau Monné
  2021-10-26  9:08   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-25 11:51 UTC (permalink / raw)
  To: roger.pau
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, jbeulich, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko, xen-devel

Hi, Roger!
Could you please take a look at the below?
Jan was questioning the per BAR range set approach, so it
is crucial for the maintainer (you) to answer here.

Thank you in advance,
Oleksandr

On 30.09.21 10:52, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Instead of handling a single range set, that contains all the memory
> regions of all the BARs and ROM, have them per BAR.
>
> This is in preparation of making non-identity mappings in p2m for the
> MMIOs/ROM.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   xen/drivers/vpci/header.c | 172 ++++++++++++++++++++++++++------------
>   xen/include/xen/vpci.h    |   3 +-
>   2 files changed, 122 insertions(+), 53 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ec4d215f36ff..9c603d26d302 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -131,49 +131,75 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>   
>   bool vpci_process_pending(struct vcpu *v)
>   {
> -    if ( v->vpci.mem )
> +    if ( v->vpci.num_mem_ranges )
>       {
>           struct map_data data = {
>               .d = v->domain,
>               .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>           };
> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
> +        struct pci_dev *pdev = v->vpci.pdev;
> +        struct vpci_header *header = &pdev->vpci->header;
> +        unsigned int i;
>   
> -        if ( rc == -ERESTART )
> -            return true;
> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +        {
> +            struct vpci_bar *bar = &header->bars[i];
> +            int rc;
>   
> -        spin_lock(&v->vpci.pdev->vpci->lock);
> -        /* Disable memory decoding unconditionally on failure. */
> -        modify_decoding(v->vpci.pdev,
> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> -                        !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci->lock);
> +            if ( !bar->mem )
> +                continue;
>   
> -        rangeset_destroy(v->vpci.mem);
> -        v->vpci.mem = NULL;
> -        if ( rc )
> -            /*
> -             * FIXME: in case of failure remove the device from the domain.
> -             * Note that there might still be leftover mappings. While this is
> -             * safe for Dom0, for DomUs the domain will likely need to be
> -             * killed in order to avoid leaking stale p2m mappings on
> -             * failure.
> -             */
> -            vpci_remove_device(v->vpci.pdev);
> +            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
> +
> +            if ( rc == -ERESTART )
> +                return true;
> +
> +            spin_lock(&pdev->vpci->lock);
> +            /* Disable memory decoding unconditionally on failure. */
> +            modify_decoding(pdev,
> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> +                            !rc && v->vpci.rom_only);
> +            spin_unlock(&pdev->vpci->lock);
> +
> +            rangeset_destroy(bar->mem);
> +            bar->mem = NULL;
> +            v->vpci.num_mem_ranges--;
> +            if ( rc )
> +                /*
> +                 * FIXME: in case of failure remove the device from the domain.
> +                 * Note that there might still be leftover mappings. While this is
> +                 * safe for Dom0, for DomUs the domain will likely need to be
> +                 * killed in order to avoid leaking stale p2m mappings on
> +                 * failure.
> +                 */
> +                vpci_remove_device(pdev);
> +        }
>       }
>   
>       return false;
>   }
>   
>   static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> -                            struct rangeset *mem, uint16_t cmd)
> +                            uint16_t cmd)
>   {
>       struct map_data data = { .d = d, .map = true };
> -    int rc;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    int rc = 0;
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
>   
> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
> -        process_pending_softirqs();
> -    rangeset_destroy(mem);
> +        if ( !bar->mem )
> +            continue;
> +
> +        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
> +                                              &data)) == -ERESTART )
> +            process_pending_softirqs();
> +        rangeset_destroy(bar->mem);
> +        bar->mem = NULL;
> +    }
>       if ( !rc )
>           modify_decoding(pdev, cmd, false);
>   
> @@ -181,7 +207,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>   }
>   
>   static void defer_map(struct domain *d, struct pci_dev *pdev,
> -                      struct rangeset *mem, uint16_t cmd, bool rom_only)
> +                      uint16_t cmd, bool rom_only, uint8_t num_mem_ranges)
>   {
>       struct vcpu *curr = current;
>   
> @@ -192,9 +218,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>        * started for the same device if the domain is not well-behaved.
>        */
>       curr->vpci.pdev = pdev;
> -    curr->vpci.mem = mem;
>       curr->vpci.cmd = cmd;
>       curr->vpci.rom_only = rom_only;
> +    curr->vpci.num_mem_ranges = num_mem_ranges;
>       /*
>        * Raise a scheduler softirq in order to prevent the guest from resuming
>        * execution with pending mapping operations, to trigger the invocation
> @@ -206,42 +232,47 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>   static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>   {
>       struct vpci_header *header = &pdev->vpci->header;
> -    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>       struct pci_dev *tmp, *dev = NULL;
>       const struct vpci_msix *msix = pdev->vpci->msix;
> -    unsigned int i;
> +    unsigned int i, j;
>       int rc;
> -
> -    if ( !mem )
> -        return -ENOMEM;
> +    uint8_t num_mem_ranges;
>   
>       /*
> -     * Create a rangeset that represents the current device BARs memory region
> +     * Create a rangeset per BAR that represents the current device memory region
>        * and compare it against all the currently active BAR memory regions. If
>        * an overlap is found, subtract it from the region to be mapped/unmapped.
>        *
> -     * First fill the rangeset with all the BARs of this device or with the ROM
> +     * First fill the rangesets with all the BARs of this device or with the ROM
>        * BAR only, depending on whether the guest is toggling the memory decode
>        * bit of the command register, or the enable bit of the ROM BAR register.
>        */
>       for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>       {
> -        const struct vpci_bar *bar = &header->bars[i];
> +        struct vpci_bar *bar = &header->bars[i];
>           unsigned long start = PFN_DOWN(bar->addr);
>           unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>   
> +        bar->mem = NULL;
> +
>           if ( !MAPPABLE_BAR(bar) ||
>                (rom_only ? bar->type != VPCI_BAR_ROM
>                          : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>               continue;
>   
> -        rc = rangeset_add_range(mem, start, end);
> +        bar->mem = rangeset_new(NULL, NULL, 0);
> +        if ( !bar->mem )
> +        {
> +            rc = -ENOMEM;
> +            goto fail;
> +        }
> +
> +        rc = rangeset_add_range(bar->mem, start, end);
>           if ( rc )
>           {
>               printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>                      start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            goto fail;
>           }
>       }
>   
> @@ -252,14 +283,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>           unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>                                        vmsix_table_size(pdev->vpci, i) - 1);
>   
> -        rc = rangeset_remove_range(mem, start, end);
> -        if ( rc )
> +        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>           {
> -            printk(XENLOG_G_WARNING
> -                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
> -                   start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            const struct vpci_bar *bar = &header->bars[j];
> +
> +            if ( !bar->mem )
> +                continue;
> +
> +            rc = rangeset_remove_range(bar->mem, start, end);
> +            if ( rc )
> +            {
> +                printk(XENLOG_G_WARNING
> +                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
> +                       start, end, rc);
> +                goto fail;
> +            }
>           }
>       }
>   
> @@ -291,7 +329,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>               unsigned long start = PFN_DOWN(bar->addr);
>               unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>   
> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
> +            if ( !bar->enabled ||
> +                 !rangeset_overlaps_range(bar->mem, start, end) ||
>                    /*
>                     * If only the ROM enable bit is toggled check against other
>                     * BARs in the same device for overlaps, but not against the
> @@ -300,13 +339,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>                    (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>                   continue;
>   
> -            rc = rangeset_remove_range(mem, start, end);
> +            rc = rangeset_remove_range(bar->mem, start, end);
>               if ( rc )
>               {
>                   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                          start, end, rc);
> -                rangeset_destroy(mem);
> -                return rc;
> +                goto fail;
>               }
>           }
>       }
> @@ -324,12 +362,42 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>            * will always be to establish mappings and process all the BARs.
>            */
>           ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
> -        return apply_map(pdev->domain, pdev, mem, cmd);
> +        return apply_map(pdev->domain, pdev, cmd);
>       }
>   
> -    defer_map(dev->domain, dev, mem, cmd, rom_only);
> +    /* Find out how many memory ranges has left after MSI and overlaps. */
> +    num_mem_ranges = 0;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        if ( !rangeset_is_empty(bar->mem) )
> +            num_mem_ranges++;
> +    }
> +
> +    /*
> +     * There are cases when PCI device, root port for example, has neither
> +     * memory space nor IO. In this case PCI command register write is
> +     * missed resulting in the underlying PCI device not functional, so:
> +     *   - if there are no regions write the command register now
> +     *   - if there are regions then defer work and write later on
> +     */
> +    if ( !num_mem_ranges )
> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> +    else
> +        defer_map(dev->domain, dev, cmd, rom_only, num_mem_ranges);
>   
>       return 0;
> +
> +fail:
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        rangeset_destroy(bar->mem);
> +        bar->mem = NULL;
> +    }
> +    return rc;
>   }
>   
>   static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index a0320b22cb36..352e02d0106d 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -80,6 +80,7 @@ struct vpci {
>               /* Guest view of the BAR. */
>               uint64_t guest_addr;
>               uint64_t size;
> +            struct rangeset *mem;
>               enum {
>                   VPCI_BAR_EMPTY,
>                   VPCI_BAR_IO,
> @@ -154,9 +155,9 @@ struct vpci {
>   
>   struct vpci_vcpu {
>       /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
> -    struct rangeset *mem;
>       struct pci_dev *pdev;
>       uint16_t cmd;
> +    uint8_t num_mem_ranges;
>       bool rom_only : 1;
>   };
>   


* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-15  6:04     ` Jan Beulich
@ 2021-10-25 14:28       ` Roger Pau Monné
  0 siblings, 0 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-25 14:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Oleksandr Andrushchenko

On Fri, Oct 15, 2021 at 08:04:56AM +0200, Jan Beulich wrote:
> On 13.10.2021 15:51, Roger Pau Monné wrote:
> > On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
> >> --- a/xen/drivers/vpci/header.c
> >> +++ b/xen/drivers/vpci/header.c
> >> @@ -445,6 +445,55 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
> >>  }
> >>  
> >> +static int add_bar_handlers(const struct pci_dev *pdev)
> > 
> > Making this const is again misleading IMO, as you end up modifying
> > fields inside the pdev, you get away with it because vpci data is
> > stored in a pointer.
> 
> I think it was me who asked for const to be added in places like this
> one. vpci data hanging off of struct pci_dev is an implementation
> artifact imo, not an unavoidable connection. In principle the vpci
> data corresponding to a physical device could also be looked up using
> e.g. SBDF.

I was considering the vPCI part an intrinsic part of the pci_dev, but I
can see you thinking otherwise. We similarly have other pieces of data
hanging off pci_dev, so I think it's hard to tell which ones are fine
to have as part of the struct vs as pointer references.

> Here the intention really is to leave the physical device unchanged;
> that's what the const documents (apart from enforcing).

Ack. I wouldn't have asked for those myself, but as said above I can
see your point.
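
For readers following the const discussion, a minimal sketch of the C
semantics involved. The toy_* types are made up for illustration and are
not the Xen structures: const on the device pointer protects the members
of the device itself, but not the object reached through a pointer member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real Xen definitions. */
struct toy_vpci {
    unsigned int nr_handlers;
};

struct toy_pci_dev {
    unsigned int devfn;        /* "physical" identity of the device */
    struct toy_vpci *vpci;     /* emulation state reached via a pointer */
};

/*
 * const applies to the members of *pdev, i.e. to devfn and to the
 * pointer value pdev->vpci, but not to the object vpci points at.
 */
void toy_add_handler(const struct toy_pci_dev *pdev)
{
    assert(pdev->vpci != NULL);
    pdev->vpci->nr_handlers++;   /* allowed: the pointee is not const */
    /* pdev->devfn = 0; */       /* would not compile: member is const */
}
```

Whether that reads as "leaves the physical device unchanged" or as
"modifies the pdev behind the caller's back" is exactly the difference in
viewpoint above.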

Regards, Roger.



* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-09-30  7:52 ` [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically Oleksandr Andrushchenko
  2021-10-01 13:26   ` Jan Beulich
@ 2021-10-25 15:48   ` Roger Pau Monné
  2021-11-01  9:18     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-25 15:48 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:16AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Use stubs for guest domains for now.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>    so these do not get compiled for x86
>  - removed unneeded is_system_domain check
> ---
>  xen/drivers/vpci/header.c | 72 ++++++++++++++++++++++++++++++++++-----
>  xen/drivers/vpci/vpci.c   |  4 +--
>  xen/include/xen/vpci.h    |  8 +++++
>  3 files changed, 74 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 3d571356397a..1ce98795fcca 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -397,6 +397,17 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>      pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}
> +
>  static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>                        uint32_t val, void *data)
>  {
> @@ -445,14 +456,25 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> -static int add_bar_handlers(const struct pci_dev *pdev)
> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}

FWIW, I would also be fine with introducing the code for those
handlers at the same time.

> +
> +static int add_bar_handlers(const struct pci_dev *pdev, bool is_hwdom)

I would rather use is_hardware_domain(pdev->domain) than passing a
boolean here, no need to duplicate data which is already available
from the pdev.
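
A rough sketch of what deriving the flag from the device could look like.
The toy_* types and the is_hwdom field are hypothetical stand-ins, not the
Xen definitions, and the sketch assumes the owner recorded in the device
already points at the target domain when the helper runs, which is not
shown in this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct domain / struct pci_dev. */
struct toy_domain {
    bool is_hwdom;
};

struct toy_pci_dev {
    const struct toy_domain *domain;   /* current owner */
};

enum toy_handler { TOY_HW_HANDLER, TOY_GUEST_HANDLER };

/* Pick the handler flavour from the owner recorded in the device. */
enum toy_handler toy_pick_bar_handler(const struct toy_pci_dev *pdev)
{
    assert(pdev->domain != NULL);
    return pdev->domain->is_hwdom ? TOY_HW_HANDLER : TOY_GUEST_HANDLER;
}
```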

>  {
>      unsigned int i;
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
>  
> -    /* Setup a handler for the command register. */
> +    /* Setup a handler for the command register: same for hwdom and guests. */
>      rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
>                             2, header);
>      if ( rc )
> @@ -475,8 +497,13 @@ static int add_bar_handlers(const struct pci_dev *pdev)
>                  rom_reg = PCI_ROM_ADDRESS;
>              else
>                  rom_reg = PCI_ROM_ADDRESS1;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
> -                                   rom_reg, 4, &bars[i]);
> +            if ( is_hwdom )
> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write,
> +                                       rom_reg, 4, &bars[i]);
> +            else
> +                rc = vpci_add_register(pdev->vpci,
> +                                       guest_rom_read, guest_rom_write,
> +                                       rom_reg, 4, &bars[i]);

I think you could use:

else if ( IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT) )
    rc = vpci_add_register(...
else
    ASSERT_UNREACHABLE();

And then guard the guest_ handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT.

>              if ( rc )
>                  return rc;
>          }
> @@ -485,8 +512,13 @@ static int add_bar_handlers(const struct pci_dev *pdev)
>              uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
>  
>              /* This is either VPCI_BAR_MEM32 or VPCI_BAR_MEM64_{LO|HI}. */
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            if ( is_hwdom )
> +                rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write,
> +                                       reg, 4, &bars[i]);
> +            else
> +                rc = vpci_add_register(pdev->vpci,
> +                                       guest_bar_read, guest_bar_write,
> +                                       reg, 4, &bars[i]);
>              if ( rc )
>                  return rc;
>          }
> @@ -520,7 +552,7 @@ static int init_bars(struct pci_dev *pdev)
>      }
>  
>      if ( pdev->ignore_bars )
> -        return add_bar_handlers(pdev);
> +        return add_bar_handlers(pdev, true);
>  
>      /* Disable memory decoding before sizing. */
>      cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> @@ -582,7 +614,7 @@ static int init_bars(struct pci_dev *pdev)
>                                PCI_ROM_ADDRESS_ENABLE;
>      }
>  
> -    rc = add_bar_handlers(pdev);
> +    rc = add_bar_handlers(pdev, true);
>      if ( rc )
>      {
>          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -593,6 +625,30 @@ static int init_bars(struct pci_dev *pdev)
>  }
>  REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
> +{
> +    int rc;
> +
> +    /* Remove previously added registers. */
> +    vpci_remove_device_registers(pdev);

Shouldn't this be done by vpci_assign_device as a preparation for
assigning the device?

> +
> +    rc = add_bar_handlers(pdev, is_hardware_domain(d));

Also this model seems to assume that vPCI will require the hardware
domain to have owned the device before it is assigned to a guest,
but for example when using a PV dom0 that won't be the case, and hence
we would need the vPCI fields to be filled when assigning to a guest.

Hence I wonder whether we shouldn't do a full re-initialization when
assigning to a guest instead of this partial one.

> +    if ( rc )
> +        gdprintk(XENLOG_ERR,
> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
> +                 &pdev->sbdf, d, rc);
> +    return rc;
> +}
> +
> +int vpci_bar_remove_handlers(const struct domain *d, const struct pci_dev *pdev)
> +{
> +    /* Remove previously added registers. */
> +    vpci_remove_device_registers(pdev);
> +    return 0;
> +}
> +#endif
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 0fe86cb30d23..702f7b5d5dda 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -95,7 +95,7 @@ int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return 0;
> +    return vpci_bar_add_handlers(d, dev);
>  }
>  
>  /* Notify vPCI that device is de-assigned from guest. */
> @@ -105,7 +105,7 @@ int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return 0;
> +    return vpci_bar_remove_handlers(d, dev);

I think it would be better to use something similar to
REGISTER_VPCI_INIT here, otherwise this will need to be modified every
time a new capability is handled by Xen.

Maybe we could reuse or expand REGISTER_VPCI_INIT adding another field
to be used for guest initialization?
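
To make the idea concrete, a minimal sketch of a registration table with a
second, guest-specific hook. Everything here is hypothetical (the toy_*
names, the bitmask, the plain array standing in for whatever
REGISTER_VPCI_INIT actually emits); it only illustrates that the assign
path can walk a table instead of naming each capability:

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
    int is_guest_owned;
    unsigned int handlers;     /* bitmask of installed handler sets */
};

typedef int toy_init_fn(struct toy_dev *dev);

/* One entry per emulated capability: a hwdom hook and a guest hook. */
struct toy_init_entry {
    toy_init_fn *hwdom_init;
    toy_init_fn *guest_init;   /* NULL if nothing is guest specific */
};

int toy_bars_hwdom(struct toy_dev *dev) { dev->handlers |= 1u; return 0; }
int toy_bars_guest(struct toy_dev *dev) { dev->handlers |= 2u; return 0; }
int toy_msi_hwdom(struct toy_dev *dev)  { dev->handlers |= 4u; return 0; }

static const struct toy_init_entry toy_inits[] = {
    { toy_bars_hwdom, toy_bars_guest },
    { toy_msi_hwdom,  NULL },
};

/* Generic assign path: walks the table, never names a capability. */
int toy_assign_device(struct toy_dev *dev)
{
    size_t i;

    assert(dev != NULL);
    dev->handlers = 0;         /* drop whatever was installed before */
    for ( i = 0; i < sizeof(toy_inits) / sizeof(toy_inits[0]); i++ )
    {
        toy_init_fn *fn = dev->is_guest_owned ? toy_inits[i].guest_init
                                              : toy_inits[i].hwdom_init;
        int rc = fn ? fn(dev) : 0;

        if ( rc )
            return rc;
    }
    return 0;
}
```

With such a table, adding support for a new capability means adding one
entry rather than touching the assign/deassign functions.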

>  }
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>  
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index ecc08f2c0f65..fd822c903af5 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -57,6 +57,14 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
>   */
>  bool __must_check vpci_process_pending(struct vcpu *v);
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Add/remove BAR handlers for a domain. */
> +int vpci_bar_add_handlers(const struct domain *d,
> +                          const struct pci_dev *pdev);
> +int vpci_bar_remove_handlers(const struct domain *d,
> +                             const struct pci_dev *pdev);
> +#endif

This would then go away if we implement a mechanism similar to
REGISTER_VPCI_INIT.

Thanks, Roger.



* Re: [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers
  2021-09-30  7:52 ` [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers Oleksandr Andrushchenko
  2021-10-01 13:31   ` Jan Beulich
@ 2021-10-26  7:50   ` Roger Pau Monné
  2021-10-26  8:09     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26  7:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

On Thu, Sep 30, 2021 at 10:52:17AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM is x86 only, so it
> might not be used by other architectures without emulating x86. Other
> use-cases may include using that expansion ROM before Xen boots, hence
> no emulation is needed in Xen itself. Or when a guest wants to use the
> ROM code which seems to be rare.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> ---
> Since v1:
>  - re-work guest read/write to be much simpler and do more work on write
>    than read which is expected to be called more frequently
>  - removed one too obvious comment
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  xen/drivers/vpci/header.c | 30 +++++++++++++++++++++++++++++-
>  xen/include/xen/vpci.h    |  3 +++
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 1ce98795fcca..ec4d215f36ff 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -400,12 +400,38 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>  static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>                              uint32_t val, void *data)
>  {
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
> +    bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    bar->guest_addr &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
>  }
>  
>  static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
>                                 void *data)
>  {
> -    return 0xffffffff;
> +    const struct vpci_bar *bar = data;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +        return bar->guest_addr >> 32;
> +
> +    return bar->guest_addr;

I think this is missing a check for whether the BAR is the high part
of a 64bit one? Ie:

struct vpci_bar *bar = data;
bool hi = false;

if ( bar->type == VPCI_BAR_MEM64_HI )
{
    ASSERT(reg > PCI_BASE_ADDRESS_0);
    bar--;
    hi = true;
}

return bar->guest_addr >> (hi ? 32 : 0);

Or else when accessing the high part of a 64bit BAR you will always
return 0s as it hasn't been set up by guest_bar_write.
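
A self-contained toy model of the problem, for illustration only. The
toy_* types and constants are assumptions made for this sketch and merely
mirror the shape of the patch: the 64-bit guest address is stored in the
LO entry, so a read of the HI register has to step back to that entry.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: constants and types are illustrative, not Xen's. */
enum toy_bar_type { TOY_MEM64_LO, TOY_MEM64_HI };

struct toy_bar {
    enum toy_bar_type type;
    uint64_t size;          /* power of two, kept in the LO entry */
    uint64_t guest_addr;    /* full 64-bit value, kept in the LO entry */
};

#define TOY_MEM_MASK    (~(uint64_t)0xf)   /* address bits of a memory BAR */
#define TOY_MEM_TYPE_64 0x4u               /* "64-bit" type encoding */

void toy_bar_write(struct toy_bar *bar, uint32_t val)
{
    unsigned int shift = 0;

    if ( bar->type == TOY_MEM64_HI )
    {
        bar--;              /* state lives in the preceding LO entry */
        shift = 32;
    }
    else
        val = (val & (uint32_t)TOY_MEM_MASK) | TOY_MEM_TYPE_64;

    assert(bar->size && !(bar->size & (bar->size - 1)));
    bar->guest_addr &= ~((uint64_t)0xffffffff << shift);
    bar->guest_addr |= (uint64_t)val << shift;
    /* Keep only size-aligned address bits plus the low flag bits. */
    bar->guest_addr &= ~(bar->size - 1) | ~TOY_MEM_MASK;
}

/* Read with the suggested fix: HI redirects to the LO entry. */
uint32_t toy_bar_read(const struct toy_bar *bar)
{
    unsigned int shift = 0;

    if ( bar->type == TOY_MEM64_HI )
    {
        bar--;
        shift = 32;
    }
    return (uint32_t)(bar->guest_addr >> shift);
}

/* Read as posted: HI looks at its own entry, which is never written. */
uint32_t toy_bar_read_posted(const struct toy_bar *bar)
{
    if ( bar->type == TOY_MEM64_HI )
        return (uint32_t)(bar->guest_addr >> 32);
    return (uint32_t)bar->guest_addr;
}
```

In this model, sizing a 1 MiB BAR by writing all ones to both halves reads
back 0xfff00004 / 0xffffffff with the redirect, while the posted variant
returns 0 for the high half.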

Thanks, Roger.



* Re: [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers
  2021-10-26  7:50   ` Roger Pau Monné
@ 2021-10-26  8:09     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-26  8:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Michal Orzel, Oleksandr Andrushchenko



On 26.10.21 10:50, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:17AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM is x86 only, so it
>> might not be used by other architectures without emulating x86. Other
>> use-cases may include using that expansion ROM before Xen boots, hence
>> no emulation is needed in Xen itself. Or when a guest wants to use the
>> ROM code which seems to be rare.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>> ---
>> Since v1:
>>   - re-work guest read/write to be much simpler and do more work on write
>>     than read which is expected to be called more frequently
>>   - removed one too obvious comment
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   xen/drivers/vpci/header.c | 30 +++++++++++++++++++++++++++++-
>>   xen/include/xen/vpci.h    |  3 +++
>>   2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 1ce98795fcca..ec4d215f36ff 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -400,12 +400,38 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>   static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>>                               uint32_t val, void *data)
>>   {
>> +    struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +    else
>> +    {
>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
>> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
>> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>> +    }
>> +
>> +    bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
>> +    bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
>> +
>> +    bar->guest_addr &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
>>   }
>>   
>>   static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
>>                                  void *data)
>>   {
>> -    return 0xffffffff;
>> +    const struct vpci_bar *bar = data;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +        return bar->guest_addr >> 32;
>> +
>> +    return bar->guest_addr;
> I think this is missing a check for whether the BAR is the high part
> of a 64bit one? Ie:
>
> struct vpci_bar *bar = data;
> bool hi = false;
>
> if ( bar->type == VPCI_BAR_MEM64_HI )
> {
>      ASSERT(reg > PCI_BASE_ADDRESS_0);
>      bar--;
>      hi = true;
> }
>
> return bar->guest_addr >> (hi ? 32 : 0);
>
> Or else when accessing the high part of a 64bit BAR you will always
> return 0s as it hasn't been setup by guest_bar_write.
Yes, you are right
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-09-30  7:52 ` [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR Oleksandr Andrushchenko
  2021-10-25 11:51   ` Oleksandr Andrushchenko
@ 2021-10-26  9:08   ` Roger Pau Monné
  2021-11-02 10:34     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26  9:08 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:18AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Instead of handling a single range set, that contains all the memory
> regions of all the BARs and ROM, have them per BAR.
> 
> This is in preparation of making non-identity mappings in p2m for the
> MMIOs/ROM.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  xen/drivers/vpci/header.c | 172 ++++++++++++++++++++++++++------------
>  xen/include/xen/vpci.h    |   3 +-
>  2 files changed, 122 insertions(+), 53 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ec4d215f36ff..9c603d26d302 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -131,49 +131,75 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>  
>  bool vpci_process_pending(struct vcpu *v)
>  {
> -    if ( v->vpci.mem )
> +    if ( v->vpci.num_mem_ranges )
>      {
>          struct map_data data = {
>              .d = v->domain,
>              .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>          };
> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
> +        struct pci_dev *pdev = v->vpci.pdev;
> +        struct vpci_header *header = &pdev->vpci->header;
> +        unsigned int i;
>  
> -        if ( rc == -ERESTART )
> -            return true;
> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +        {
> +            struct vpci_bar *bar = &header->bars[i];
> +            int rc;
>  
> -        spin_lock(&v->vpci.pdev->vpci->lock);
> -        /* Disable memory decoding unconditionally on failure. */
> -        modify_decoding(v->vpci.pdev,
> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> -                        !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci->lock);
> +            if ( !bar->mem )
> +                continue;
>  
> -        rangeset_destroy(v->vpci.mem);
> -        v->vpci.mem = NULL;
> -        if ( rc )
> -            /*
> -             * FIXME: in case of failure remove the device from the domain.
> -             * Note that there might still be leftover mappings. While this is
> -             * safe for Dom0, for DomUs the domain will likely need to be
> -             * killed in order to avoid leaking stale p2m mappings on
> -             * failure.
> -             */
> -            vpci_remove_device(v->vpci.pdev);
> +            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
> +
> +            if ( rc == -ERESTART )
> +                return true;
> +
> +            spin_lock(&pdev->vpci->lock);
> +            /* Disable memory decoding unconditionally on failure. */
> +            modify_decoding(pdev,
> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> +                            !rc && v->vpci.rom_only);
> +            spin_unlock(&pdev->vpci->lock);
> +
> +            rangeset_destroy(bar->mem);

Now that the rangesets are per-BAR we might have to consider
allocating them at initialization time and not destroying them when
empty. We could replace the NULL checks with rangeset_is_empty
instead. Not that you have to do this on this patch, but I think it's
worth mentioning.
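
A rough sketch of what I mean, as a standalone toy model rather than
Xen code: the toy_* names and TOY_NR_BARS are made up for illustration,
and the rangeset is embedded here only to keep the example short (in
Xen it would stay a pointer that is allocated once at init):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_BARS 7 /* illustrative: 6 BARs plus the ROM BAR */

/* Hypothetical stand-in for struct rangeset: only tracks a range count. */
struct toy_rangeset {
    unsigned int nr_ranges;
};

struct toy_bar {
    struct toy_rangeset mem; /* lives as long as the BAR, never NULL */
};

static bool toy_rangeset_is_empty(const struct toy_rangeset *r)
{
    return r->nr_ranges == 0;
}

/* Count the BARs that still have ranges waiting to be {un}mapped. */
static unsigned int toy_pending_bars(const struct toy_bar *bars, size_t nr)
{
    unsigned int pending = 0;
    size_t i;

    assert(bars != NULL);

    for ( i = 0; i < nr; i++ )
    {
        /* Emptiness check replaces the former "if ( !bar->mem )". */
        if ( toy_rangeset_is_empty(&bars[i].mem) )
            continue;

        pending++;
    }

    return pending;
}
```

With the rangeset living as long as the BAR does, the loops in
vpci_process_pending and apply_map would skip empty sets instead of
NULL ones, and nothing would need destroying on the way out.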

> +            bar->mem = NULL;
> +            v->vpci.num_mem_ranges--;
> +            if ( rc )
> +                /*
> +                 * FIXME: in case of failure remove the device from the domain.
> +                 * Note that there might still be leftover mappings. While this is
> +                 * safe for Dom0, for DomUs the domain will likely need to be
> +                 * killed in order to avoid leaking stale p2m mappings on
> +                 * failure.
> +                 */
> +                vpci_remove_device(pdev);
> +        }
>      }
>  
>      return false;
>  }
>  
>  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> -                            struct rangeset *mem, uint16_t cmd)
> +                            uint16_t cmd)
>  {
>      struct map_data data = { .d = d, .map = true };
> -    int rc;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    int rc = 0;
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
>  
> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
> -        process_pending_softirqs();
> -    rangeset_destroy(mem);
> +        if ( !bar->mem )
> +            continue;
> +
> +        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
> +                                              &data)) == -ERESTART )
> +            process_pending_softirqs();
> +        rangeset_destroy(bar->mem);
> +        bar->mem = NULL;
> +    }
>      if ( !rc )
>          modify_decoding(pdev, cmd, false);
>  
> @@ -181,7 +207,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>  }
>  
>  static void defer_map(struct domain *d, struct pci_dev *pdev,
> -                      struct rangeset *mem, uint16_t cmd, bool rom_only)
> +                      uint16_t cmd, bool rom_only, uint8_t num_mem_ranges)

As mentioned below, I don't think you need to pass the number of
BARs that need mapping changes. Iff that's strictly needed, it should
be an unsigned int.

>  {
>      struct vcpu *curr = current;
>  
> @@ -192,9 +218,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>       * started for the same device if the domain is not well-behaved.
>       */
>      curr->vpci.pdev = pdev;
> -    curr->vpci.mem = mem;
>      curr->vpci.cmd = cmd;
>      curr->vpci.rom_only = rom_only;
> +    curr->vpci.num_mem_ranges = num_mem_ranges;
>      /*
>       * Raise a scheduler softirq in order to prevent the guest from resuming
>       * execution with pending mapping operations, to trigger the invocation
> @@ -206,42 +232,47 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>  {
>      struct vpci_header *header = &pdev->vpci->header;
> -    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>      struct pci_dev *tmp, *dev = NULL;
>      const struct vpci_msix *msix = pdev->vpci->msix;
> -    unsigned int i;
> +    unsigned int i, j;
>      int rc;
> -
> -    if ( !mem )
> -        return -ENOMEM;
> +    uint8_t num_mem_ranges;
>  
>      /*
> -     * Create a rangeset that represents the current device BARs memory region
> +     * Create a rangeset per BAR that represents the current device memory region
>       * and compare it against all the currently active BAR memory regions. If
>       * an overlap is found, subtract it from the region to be mapped/unmapped.
>       *
> -     * First fill the rangeset with all the BARs of this device or with the ROM
> +     * First fill the rangesets with all the BARs of this device or with the ROM
>       * BAR only, depending on whether the guest is toggling the memory decode
>       * bit of the command register, or the enable bit of the ROM BAR register.
>       */
>      for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>      {
> -        const struct vpci_bar *bar = &header->bars[i];
> +        struct vpci_bar *bar = &header->bars[i];
>          unsigned long start = PFN_DOWN(bar->addr);
>          unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>  
> +        bar->mem = NULL;

Why do you need to set mem to NULL here? I think we should instead
assert that bar->mem == NULL here.

> +
>          if ( !MAPPABLE_BAR(bar) ||
>               (rom_only ? bar->type != VPCI_BAR_ROM
>                         : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>              continue;
>  
> -        rc = rangeset_add_range(mem, start, end);
> +        bar->mem = rangeset_new(NULL, NULL, 0);
> +        if ( !bar->mem )
> +        {
> +            rc = -ENOMEM;
> +            goto fail;
> +        }
> +
> +        rc = rangeset_add_range(bar->mem, start, end);
>          if ( rc )
>          {
>              printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>                     start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            goto fail;
>          }
>      }
>  
> @@ -252,14 +283,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>          unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>                                       vmsix_table_size(pdev->vpci, i) - 1);
>  
> -        rc = rangeset_remove_range(mem, start, end);
> -        if ( rc )
> +        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>          {
> -            printk(XENLOG_G_WARNING
> -                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
> -                   start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            const struct vpci_bar *bar = &header->bars[j];
> +
> +            if ( !bar->mem )
> +                continue;
> +
> +            rc = rangeset_remove_range(bar->mem, start, end);
> +            if ( rc )
> +            {
> +                printk(XENLOG_G_WARNING
> +                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
> +                       start, end, rc);
> +                goto fail;
> +            }
>          }
>      }
>  
> @@ -291,7 +329,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>              unsigned long start = PFN_DOWN(bar->addr);
>              unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>  
> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
> +            if ( !bar->enabled ||
> +                 !rangeset_overlaps_range(bar->mem, start, end) ||
>                   /*
>                    * If only the ROM enable bit is toggled check against other
>                    * BARs in the same device for overlaps, but not against the
> @@ -300,13 +339,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>                   (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>                  continue;
>  
> -            rc = rangeset_remove_range(mem, start, end);
> +            rc = rangeset_remove_range(bar->mem, start, end);
>              if ( rc )
>              {
>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                         start, end, rc);
> -                rangeset_destroy(mem);
> -                return rc;
> +                goto fail;
>              }
>          }
>      }
> @@ -324,12 +362,42 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>           * will always be to establish mappings and process all the BARs.
>           */
>          ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
> -        return apply_map(pdev->domain, pdev, mem, cmd);
> +        return apply_map(pdev->domain, pdev, cmd);
>      }
>  
> -    defer_map(dev->domain, dev, mem, cmd, rom_only);
> +    /* Find out how many memory ranges has left after MSI and overlaps. */
> +    num_mem_ranges = 0;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];

There's no need to declare this local variable AFAICT, just use
header->bars[i].mem. In any case this is likely to go away if you
follow my recommendation below to just call defer_map unconditionally
like it's currently done.

> +
> +        if ( !rangeset_is_empty(bar->mem) )
> +            num_mem_ranges++;
> +    }
> +
> +    /*
> +     * There are cases when PCI device, root port for example, has neither
> +     * memory space nor IO. In this case PCI command register write is
> +     * missed resulting in the underlying PCI device not functional, so:
> +     *   - if there are no regions write the command register now
> +     *   - if there are regions then defer work and write later on
> +     */
> +    if ( !num_mem_ranges )
> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);

I think this is wrong, as not calling defer_map will prevent the
rangesets (bar[i]->mem) from being destroyed, so we are effectively
leaking memory.

You need to take a path similar to the failure one in case there are
no mappings pending, or even better just call defer_map anyway and let
it do its thing; it should be capable of handling empty rangesets
just fine. That's how it's currently done.
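
To make the leak concrete, here is a toy model (not Xen code; the
toy_* helpers are invented stand-ins for rangeset_new and
rangeset_destroy, and the live-set counter exists only for the
demonstration). The deferred path is the only place that destroys the
per-BAR sets, so it has to run even when every set is empty:

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_BARS 3 /* illustrative */

struct toy_rangeset {
    unsigned int nr_ranges;
};

static unsigned int toy_live_sets; /* allocated but not yet destroyed */

static struct toy_rangeset *toy_rangeset_new(unsigned int nr_ranges)
{
    struct toy_rangeset *r = malloc(sizeof(*r));

    if ( !r )
        return NULL;

    r->nr_ranges = nr_ranges;
    toy_live_sets++;

    return r;
}

static void toy_rangeset_destroy(struct toy_rangeset *r)
{
    if ( !r )
        return;

    assert(toy_live_sets > 0);
    toy_live_sets--;
    free(r);
}

/*
 * Deferred path: consume every set (a no-op for empty ones), then
 * destroy it. Returns the number of ranges that were consumed.
 */
static unsigned int toy_process_pending(struct toy_rangeset *mem[TOY_NR_BARS])
{
    unsigned int consumed = 0, i;

    for ( i = 0; i < TOY_NR_BARS; i++ )
    {
        if ( !mem[i] )
            continue;

        consumed += mem[i]->nr_ranges;
        toy_rangeset_destroy(mem[i]);
        mem[i] = NULL;
    }

    return consumed;
}
```

Skipping toy_process_pending when nothing is left to map would leave
toy_live_sets non-zero, which is the situation I'm worried about.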

> +    else
> +        defer_map(dev->domain, dev, cmd, rom_only, num_mem_ranges);
>  
>      return 0;
> +
> +fail:

We usually ask labels to be indented with one space.

> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        rangeset_destroy(bar->mem);
> +        bar->mem = NULL;
> +    }
> +    return rc;
>  }
>  
>  static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index a0320b22cb36..352e02d0106d 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -80,6 +80,7 @@ struct vpci {
>              /* Guest view of the BAR. */
>              uint64_t guest_addr;
>              uint64_t size;
> +            struct rangeset *mem;
>              enum {
>                  VPCI_BAR_EMPTY,
>                  VPCI_BAR_IO,
> @@ -154,9 +155,9 @@ struct vpci {
>  
>  struct vpci_vcpu {
>      /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
> -    struct rangeset *mem;
>      struct pci_dev *pdev;
>      uint16_t cmd;
> +    uint8_t num_mem_ranges;

AFAICT this could be a simple bool:

bool map_pending : 1;

As there's no strict need to know how many BARs have pending mappings.

Thanks, Roger.



* Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-10-25 11:51   ` Oleksandr Andrushchenko
@ 2021-10-26  9:40     ` Roger Pau Monné
  2021-11-02 11:13       ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26  9:40 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, jbeulich, Bertrand Marquis, Rahul Singh,
	xen-devel

On Mon, Oct 25, 2021 at 11:51:57AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> Could you please take a look at the below?
> Jan was questioning the per BAR range set approach, so it
> is crucial for the maintainer (you) to answer here.

I'm open to suggestions for using something other than a rangeset
per BAR, but lacking any concrete proposal I think using rangesets is
fine.

One possible way might be to extend rangesets so that private data
could be stored for each rangeset range, but that would then make
merging operations impossible; likewise, splitting ranges would be
troublesome.

We could then store the physical BAR address in that private data and
use the rangeset addresses as guest physical address space. It's
unclear however that this approach would be any better than just using
a rangeset per BAR.
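
For illustration only, a toy model of such an extended range (nothing
like this exists in the rangeset code today; struct toy_range and both
helpers are invented for this mail). The private data is the machine
frame backing the first guest frame of the range, which is what makes
merging conditional and splitting non-trivial:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical range carrying private data. */
struct toy_range {
    unsigned long s, e; /* guest frames, inclusive */
    unsigned long mfn;  /* machine frame backing guest frame s */
};

/*
 * A plain rangeset merges adjacent ranges unconditionally. With private
 * data the ranges must also be contiguous in machine address space.
 */
static bool toy_can_merge(const struct toy_range *a,
                          const struct toy_range *b)
{
    return a->e + 1 == b->s && a->mfn + (a->e - a->s + 1) == b->mfn;
}

/*
 * Split r at guest frame `at`: r keeps [s, at - 1], hi gets [at, e].
 * The private data of the upper half has to be recomputed.
 */
static void toy_split(struct toy_range *r, unsigned long at,
                      struct toy_range *hi)
{
    assert(at > r->s && at <= r->e);

    hi->s = at;
    hi->e = r->e;
    hi->mfn = r->mfn + (at - r->s);
    r->e = at - 1;
}
```

Two BARs that happen to be adjacent in the guest but not on the host
fail toy_can_merge, so the generic merge logic could no longer be used
as is.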

Thanks, Roger.



* Re: [PATCH v3 07/11] vpci/header: program p2m with guest BAR view
  2021-09-30  7:52 ` [PATCH v3 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
  2021-10-01 13:38   ` Jan Beulich
@ 2021-10-26 10:35   ` Roger Pau Monné
  2021-11-02 10:43     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26 10:35 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:19AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Take into account guest's BAR view and program its p2m accordingly:
> gfn is guest's view of the BAR and mfn is the physical BAR value as set
> up by the host bridge in the hardware domain.
> This way hardware domain sees physical BAR values and guest sees
> emulated ones.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
> - improve readability for data.start_gfn and restructure ?: construct
> Since v1:
>  - s/MSI/MSI-X in comments
> ---
>  xen/drivers/vpci/header.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 9c603d26d302..f23c956cde6c 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -30,6 +30,10 @@
>  
>  struct map_data {
>      struct domain *d;
> +    /* Start address of the BAR as seen by the guest. */
> +    gfn_t start_gfn;
> +    /* Physical start address of the BAR. */
> +    mfn_t start_mfn;
>      bool map;
>  };
>  
> @@ -37,12 +41,28 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>                       unsigned long *c)
>  {
>      const struct map_data *map = data;
> +    gfn_t start_gfn;
>      int rc;
>  
>      for ( ; ; )
>      {
>          unsigned long size = e - s + 1;
>  
> +        /*
> +         * Any BAR may have holes in its memory we want to map, e.g.
> +         * we don't want to map MSI-X regions which may be a part of that BAR,
> +         * e.g. when a single BAR is used for both MMIO and MSI-X.

IMO there are too many 'e.g.' here.

> +         * In this case MSI-X regions are subtracted from the mapping, but
> +         * map->start_gfn still points to the very beginning of the BAR.
> +         * So if there is a hole present then we need to adjust start_gfn
> +         * to reflect the fact of that substraction.
> +         */

I would simplify the comment a bit:

/*
 * Ranges to be mapped don't always start at the BAR start address, as
 * there can be holes or partially consumed ranges. Account for the
 * offset of the current address from the BAR start.
 */

Apart from MSI-X related holes, on x86 at least we support preemption
here, which means a range could be partially mapped before yielding.
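
As a worked example of the offset calculation, using plain unsigned
long frame numbers instead of gfn_t/mfn_t (the struct and helper are
simplified stand-ins, and the addresses in the note below are made up):

```c
#include <assert.h>

/* Simplified stand-in for struct map_data: frame numbers only. */
struct toy_map_data {
    unsigned long start_gfn; /* BAR start as seen by the guest */
    unsigned long start_mfn; /* physical BAR start */
};

/* Guest frame matching physical frame s, which lies inside the BAR. */
static unsigned long toy_range_gfn(const struct toy_map_data *map,
                                   unsigned long s)
{
    assert(s >= map->start_mfn);

    return map->start_gfn + (s - map->start_mfn);
}
```

With the BAR at mfn 0xe0000 on the host and at gfn 0x23000 in the
guest, a range that starts one page in (because the first page holds
the MSI-X table) begins at mfn 0xe0001 and must be mapped at gfn
0x23001.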

> +        start_gfn = gfn_add(map->start_gfn, s - mfn_x(map->start_mfn));
> +
> +        printk(XENLOG_G_DEBUG
> +               "%smap [%lx, %lx] -> %#"PRI_gfn" for d%d\n",
> +               map->map ? "" : "un", s, e, gfn_x(start_gfn),
> +               map->d->domain_id);
>          /*
>           * ARM TODOs:
>           * - On ARM whether the memory is prefetchable or not should be passed
> @@ -52,8 +72,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>           * - {un}map_mmio_regions doesn't support preemption.
>           */
>  
> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> +        rc = map->map ? map_mmio_regions(map->d, start_gfn,
> +                                         size, _mfn(s))
> +                      : unmap_mmio_regions(map->d, start_gfn,
> +                                           size, _mfn(s));
>          if ( rc == 0 )
>          {
>              *c += size;
> @@ -69,6 +91,7 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>          ASSERT(rc < size);
>          *c += rc;
>          s += rc;
> +        gfn_add(map->start_gfn, rc);

I think increasing map->start_gfn is wrong here, as it would get out
of sync with map->start_mfn then, and the calculations done to obtain
start_gfn would then be wrong.
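
Note also that, as far as I can tell, gfn_add() returns the adjusted
value rather than modifying its argument, so the added line has no
effect as written. A toy version of the loop (invented names, plain
frame numbers, and a fixed chunk size standing in for partial progress)
shows that advancing s alone is enough:

```c
#include <assert.h>

/* Simplified stand-in for struct map_data: frame numbers only. */
struct toy_map_data {
    unsigned long start_gfn; /* BAR start as seen by the guest */
    unsigned long start_mfn; /* physical BAR start */
};

/*
 * Walk [s, e] in pieces of at most `chunk` frames, as if each mapping
 * call made only partial progress. Returns the gfn used for the last
 * piece. The map data is const: only s moves.
 */
static unsigned long toy_map_chunks(const struct toy_map_data *map,
                                    unsigned long s, unsigned long e,
                                    unsigned long chunk)
{
    unsigned long gfn = 0;

    assert(chunk > 0 && s >= map->start_mfn);

    while ( s <= e )
    {
        unsigned long size = e - s + 1;
        unsigned long done = size < chunk ? size : chunk;

        /* Derived afresh from s on every iteration. */
        gfn = map->start_gfn + (s - map->start_mfn);

        /* A real implementation would map `done` frames gfn -> s here. */
        s += done;
    }

    return gfn;
}
```

Since the gfn is recomputed from s each time round, the base pair
(start_gfn, start_mfn) has to stay fixed for the result to be right.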

>          if ( general_preempt_check() )
>                  return -ERESTART;
>      }
> @@ -149,6 +172,10 @@ bool vpci_process_pending(struct vcpu *v)
>              if ( !bar->mem )
>                  continue;
>  
> +            data.start_gfn =
> +                 _gfn(PFN_DOWN(is_hardware_domain(v->vpci.pdev->domain)

You can just use v->domain here.

> +                               ? bar->addr : bar->guest_addr));

I would place the '?' in the line above, but that's just my taste.

Thanks, Roger.



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-09-30  7:52 ` [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
@ 2021-10-26 10:52   ` Roger Pau Monné
  2021-11-02 10:48     ` Oleksandr Andrushchenko
  2021-11-02 11:19     ` Jan Beulich
  0 siblings, 2 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26 10:52 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add basic emulation support for guests. At the moment only emulate
> PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left
> as TODO.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> ---
> New in v2
> ---
>  xen/drivers/vpci/header.c | 35 ++++++++++++++++++++++++++++++++---
>  1 file changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f23c956cde6c..754aeb5a584f 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t cmd, void *data)
> +{
> +    /* TODO: Add proper emulation for all bits of the command register. */
> +
> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
> +    {
> +        /*
> +         * Guest wants to enable INTx. It can't be enabled if:
> +         *  - host has INTx disabled
> +         *  - MSI/MSI-X enabled
> +         */
> +        if ( pdev->vpci->msi->enabled )
> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> +        else
> +        {
> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
> +
> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
> +                cmd |= PCI_COMMAND_INTX_DISABLE;
> +        }

This last part should be Arm specific. On other architectures we
likely want the guest to modify INTx disable in order to select the
interrupt delivery mode for the device.

I really wonder if we should allow the guest to play with any other
bit apart from INTx disable and memory and IO decoding on the command
register.
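
Something along these lines, as a sketch only: the mask name and the
helper are hypothetical, the policy itself is still up for discussion,
and the bit values are the standard command register ones, repeated
here so the fragment stands alone:

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI command register bits. */
#define PCI_COMMAND_IO           0x0001
#define PCI_COMMAND_MEMORY       0x0002
#define PCI_COMMAND_INTX_DISABLE 0x0400

/* Hypothetical: the only bits a guest would be allowed to control. */
#define TOY_GUEST_CMD_MASK \
    (PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_INTX_DISABLE)

/*
 * Combine a guest write with the current host value: guest-controlled
 * bits come from the guest, everything else is kept from the host.
 */
static uint16_t toy_guest_cmd(uint16_t host_cmd, uint16_t guest_cmd)
{
    uint16_t cmd = (host_cmd & ~TOY_GUEST_CMD_MASK) |
                   (guest_cmd & TOY_GUEST_CMD_MASK);

    /* Bits outside the mask must be unchanged from the host value. */
    assert((cmd & ~TOY_GUEST_CMD_MASK) == (host_cmd & ~TOY_GUEST_CMD_MASK));

    return cmd;
}
```

For example, a guest write of 0x0502 on top of a host value of 0x0004
(bus mastering enabled) would yield 0x0406: the guest's SERR bit is
dropped and the host's bus master bit is preserved.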

Thanks, Roger.



* Re: [PATCH v3 09/11] vpci/header: Reset the command register when adding devices
  2021-09-30  7:52 ` [PATCH v3 09/11] vpci/header: Reset the command register when adding devices Oleksandr Andrushchenko
@ 2021-10-26 11:00   ` Roger Pau Monné
  2021-11-02 11:11     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26 11:00 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, Michal Orzel

On Thu, Sep 30, 2021 at 10:52:21AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Reset the command register when passing through a PCI device:
> it is possible that when passing through a PCI device its memory
> decoding bits in the command register are already set. Thus, a
> guest OS may not write to the command register to update memory
> decoding, so guest mappings (guest's view of the BARs) are
> left not updated.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> ---
> Since v1:
>  - do not write 0 to the command register, but respect host settings.
> ---
>  xen/drivers/vpci/header.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 754aeb5a584f..70d911b147e1 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -451,8 +451,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> -                            uint32_t cmd, void *data)
> +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
>  {
>      /* TODO: Add proper emulation for all bits of the command register. */
>  
> @@ -467,14 +466,20 @@ static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>              cmd |= PCI_COMMAND_INTX_DISABLE;
>          else
>          {
> -            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

Either we keep reg here or we drop the parameter altogether from the
function prototype. Having one caller pass 0 while the other passes
PCI_COMMAND is confusing, all the more so since the parameter is now
effectively unused.

>  
>              if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>                  cmd |= PCI_COMMAND_INTX_DISABLE;
>          }
>      }
>  
> -    cmd_write(pdev, reg, cmd, data);
> +    return cmd;
> +}
> +
> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t cmd, void *data)
> +{
> +    cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data);
>  }
>  
>  static void bar_write(const struct pci_dev *pdev, unsigned int reg,
> @@ -793,6 +798,10 @@ int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
>          gdprintk(XENLOG_ERR,
>                   "%pp: failed to add BAR handlers for dom%pd: %d\n",
>                   &pdev->sbdf, d, rc);
> +
> +    /* Reset the command register with respect to host settings. */
> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0));

I think we likely want to unset the memory and IO decoding bits from
the command register, as the guest view of the BAR address is
currently forced to 0, and not mapped into the guest p2m.
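
In other words something like the below, where toy_initial_guest_cmd is
a made-up helper and the bit values are the standard command register
ones:

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI command register bits. */
#define PCI_COMMAND_IO     0x0001
#define PCI_COMMAND_MEMORY 0x0002

/*
 * Hypothetical helper: value to write to the command register when a
 * device is assigned. Decoding is switched off, other bits are kept.
 */
static uint16_t toy_initial_guest_cmd(uint16_t host_cmd)
{
    uint16_t cmd = host_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY);

    assert(!(cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)));

    return cmd;
}
```

The guest then has to enable decoding itself, and that write is what
gets its view of the BARs mapped into the p2m.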

Thanks, Roger.



* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-09-30  7:52 ` [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology Oleksandr Andrushchenko
  2021-09-30  8:51   ` Jan Beulich
@ 2021-10-26 11:33   ` Roger Pau Monné
  2021-11-03  6:34     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26 11:33 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>     functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/common/domain.c           |  3 ++
>  xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>  xen/drivers/vpci/vpci.c       | 14 +++++++-
>  xen/include/xen/pci.h         | 22 +++++++++++++
>  xen/include/xen/sched.h       |  8 +++++
>  5 files changed, 106 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 40d67ec34232..e0170087612d 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>      INIT_LIST_HEAD(&d->pdev_list);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    INIT_LIST_HEAD(&d->vdev_list);
> +#endif
>  #endif
>  
>      /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 805ab86ed555..5b963d75d1ba 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>      return ret;
>  }
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
> +                                                const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    list_for_each_entry ( vdev, &d->vdev_list, list )
> +        if ( vdev->pdev == pdev )
> +            return vdev;
> +    return NULL;
> +}
> +
> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    ASSERT(!pci_find_virtual_device(d, pdev));
> +
> +    /* Each PCI bus supports 32 devices/slots at max. */
> +    if ( d->vpci_dev_next > 31 )
> +        return -ENOSPC;
> +
> +    vdev = xzalloc(struct vpci_dev);
> +    if ( !vdev )
> +        return -ENOMEM;
> +
> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
> +    vdev->seg = 0;
> +
> +    /*
> +     * The bus number is set to 0, so virtual devices are seen
> +     * as embedded endpoints behind the root complex.
> +     */
> +    vdev->bus = 0;
> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);

This would likely be better as a bitmap where you set the bits of
in-use slots. Then you can use find_first_bit or similar to get a free
slot.

Long term you might want to allow the caller to provide a pre-selected
slot, as it's possible for users to request the device to appear at a
specific slot on the emulated bus.

> +
> +    vdev->pdev = pdev;
> +    vdev->domain = d;
> +
> +    pcidevs_lock();
> +    list_add_tail(&vdev->list, &d->vdev_list);
> +    pcidevs_unlock();
> +
> +    return 0;
> +}
> +
> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
> +{
> +    struct vpci_dev *vdev;
> +
> +    pcidevs_lock();
> +    vdev = pci_find_virtual_device(d, pdev);
> +    if ( vdev )
> +        list_del(&vdev->list);
> +    pcidevs_unlock();
> +    xfree(vdev);
> +    return 0;
> +}
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> +
>  /* Caller should hold the pcidevs_lock */
>  static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>                             uint8_t devfn)
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 702f7b5d5dda..d787f13e679e 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to assign for hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> -    return vpci_bar_add_handlers(d, dev);
> +    rc = vpci_bar_add_handlers(d, dev);
> +    if ( rc )
> +        return rc;
> +
> +    return pci_add_virtual_device(d, dev);
>  }
>  
>  /* Notify vPCI that device is de-assigned from guest. */
>  int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>  {
> +    int rc;
> +
>      /* It only makes sense to de-assign from hwdom or guest domain. */
>      if ( is_system_domain(d) || !has_vpci(d) )
>          return 0;
>  
> +    rc = pci_remove_virtual_device(d, dev);
> +    if ( rc )
> +        return rc;
> +
>      return vpci_bar_remove_handlers(d, dev);
>  }
>  #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 43b8a0817076..33033a3a8f8d 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -137,6 +137,24 @@ struct pci_dev {
>      struct vpci *vpci;
>  };
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +struct vpci_dev {
> +    struct list_head list;
> +    /* Physical PCI device this virtual device is connected to. */
> +    const struct pci_dev *pdev;
> +    /* Virtual SBDF of the device. */
> +    union {
> +        struct {
> +            uint8_t devfn;
> +            uint8_t bus;
> +            uint16_t seg;
> +        };
> +        pci_sbdf_t sbdf;
> +    };
> +    struct domain *domain;
> +};
> +#endif

I wonder whether this is strictly needed. Won't it be enough to store
the virtual (ie: guest) sbdf inside the existing vpci struct?

It would avoid the overhead of the translation you do from pdev ->
vdev, and there doesn't seem to be anything relevant stored in
vpci_dev apart from the virtual sbdf.

Thanks, Roger.



* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
  2021-09-30  8:53   ` Jan Beulich
  2021-10-18 18:32   ` Julien Grall
@ 2021-10-26 13:30   ` Roger Pau Monné
  2021-10-26 13:57     ` Oleksandr Andrushchenko
  2 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-26 13:30 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Thu, Sep 30, 2021 at 10:52:23AM +0300, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are three originators for the PCI configuration space access:
> 1. The domain that owns physical host bridge: MMIO handlers are
> there so we can update vPCI register handlers with the values
> written by the hardware domain, e.g. physical view of the registers
> vs guest's view on the configuration space.
> 2. Guest access to the passed through PCI devices: we need to properly
> map virtual bus topology to the physical one, e.g. pass the configuration
> space access to the corresponding physical devices.
> 3. Emulated host PCI bridge access. It doesn't exist in the physical
> topology, e.g. it can't be mapped to some physical host bridge.
> So, all access to the host bridge itself needs to be trapped and
> emulated.

I'm slightly confused by the fact that you seem to allow unprivileged
guests to use vPCI in this commit, yet there's still a concerning bit
that AFAICT has not been changed by the series.

vpci_{read,write} will pass through any access not explicitly handled by
vPCI (see the usage of vpci_{read,write}_hw). This is fine for the
hardware domain, but needs inverting for unprivileged guests: any
access not explicitly handled by vPCI needs to be dropped.
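
The inverted default could be sketched roughly as below. This is only an
illustration under assumptions: sketch_read_hw stands in for the real
hardware accessor, the is_hwdom flag stands in for
is_hardware_domain(), and returning all ones for a dropped read is
the conventional "no device" value rather than something this series
defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All-ones value for a 1, 2 or 4 byte access. */
uint32_t sketch_all_ones(unsigned int size)
{
    assert(size == 1 || size == 2 || size == 4);
    return size == 4 ? 0xffffffffu : (1u << (size * 8)) - 1;
}

/* Hypothetical stand-in for a read of the physical register. */
uint32_t sketch_read_hw(unsigned int reg, unsigned int size)
{
    (void)reg;
    return 0x12345678u & sketch_all_ones(size);
}

/* Read of a register that has no vPCI handler installed. */
uint32_t sketch_read_unhandled(bool is_hwdom, unsigned int reg,
                               unsigned int size)
{
    if ( is_hwdom )
        return sketch_read_hw(reg, size); /* hwdom: pass through */

    return sketch_all_ones(size);         /* guest: drop, read as ~0 */
}

/*
 * Write to a register that has no vPCI handler installed.
 * Returns true if the write reached the (mock) hardware register.
 */
bool sketch_write_unhandled(bool is_hwdom, uint32_t *hw_reg, uint32_t data)
{
    if ( !is_hwdom )
        return false;                     /* guest: silently drop */

    *hw_reg = data;                       /* hwdom: pass through */
    return true;
}
```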

> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v2:
>  - pass struct domain instead of struct vcpu
>  - constify arguments where possible
>  - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/arch/arm/domain.c         |  1 +
>  xen/arch/arm/vpci.c           | 86 +++++++++++++++++++++++++++++++----
>  xen/arch/arm/vpci.h           |  3 ++
>  xen/drivers/passthrough/pci.c | 25 ++++++++++
>  xen/include/asm-arm/pci.h     |  1 +
>  xen/include/xen/pci.h         |  1 +
>  xen/include/xen/sched.h       |  2 +
>  7 files changed, 111 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index fa6fcc5e467c..095671742ad8 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d)
>                         get_order_from_bytes(d->arch.efi_acpi_len));
>  #endif
>      domain_io_free(d);
> +    domain_vpci_free(d);

It's a nit, but I think from a logical PoV this should be inverted?
You first free the handlers and then the IO infrastructure.

>  }
>  
>  void arch_domain_shutdown(struct domain *d)
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 5d6c29c8dcd9..26ec2fa7cf2d 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -17,6 +17,14 @@
>  
>  #define REGISTER_OFFSET(addr)  ( (addr) & 0x00000fff)
>  
> +struct vpci_mmio_priv {
> +    /*
> +     * Set to true if the MMIO handlers were set up for the emulated
> +     * ECAM host PCI bridge.
> +     */
> +    bool is_virt_ecam;
> +};

Is this strictly required? It feels a bit odd to have a structure to
store a single boolean.

I think you could replace its usage with is_hardware_domain.

> +
>  /* Do some sanity checks. */
>  static bool vpci_mmio_access_allowed(unsigned int reg, unsigned int len)
>  {
> @@ -38,6 +46,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>      pci_sbdf_t sbdf;
>      unsigned long data = ~0UL;
>      unsigned int size = 1U << info->dabt.size;
> +    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;
>  
>      sbdf.sbdf = MMCFG_BDF(info->gpa);
>      reg = REGISTER_OFFSET(info->gpa);
> @@ -45,6 +54,13 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>      if ( !vpci_mmio_access_allowed(reg, size) )
>          return 0;
>  
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
> +            return 1;
> +
>      data = vpci_read(sbdf, reg, min(4u, size));

Given my earlier recommendation to place the virtual sbdf inside
struct vpci, it might make sense to let vpci_read do the translation
itself.

>      if ( size == 8 )
>          data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;
> @@ -61,6 +77,7 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>      pci_sbdf_t sbdf;
>      unsigned long data = r;
>      unsigned int size = 1U << info->dabt.size;
> +    struct vpci_mmio_priv *priv = (struct vpci_mmio_priv *)p;
>  
>      sbdf.sbdf = MMCFG_BDF(info->gpa);
>      reg = REGISTER_OFFSET(info->gpa);
> @@ -68,6 +85,13 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>      if ( !vpci_mmio_access_allowed(reg, size) )
>          return 0;
>  
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( priv->is_virt_ecam && !pci_translate_virtual_device(v->domain, &sbdf) )
> +            return 1;
> +
>      vpci_write(sbdf, reg, min(4u, size), data);
>      if ( size == 8 )
>          vpci_write(sbdf, reg + 4, 4, data >> 32);
> @@ -80,13 +104,48 @@ static const struct mmio_handler_ops vpci_mmio_handler = {
>      .write = vpci_mmio_write,
>  };
>  
> +/*
> + * There are three  originators for the PCI configuration space access:
> + * 1. The domain that owns physical host bridge: MMIO handlers are
> + *    there so we can update vPCI register handlers with the values
> + *    written by the hardware domain, e.g. physical view of the registers/
> + *    configuration space.
> + * 2. Guest access to the passed through PCI devices: we need to properly
> + *    map virtual bus topology to the physical one, e.g. pass the configuration
> + *    space access to the corresponding physical devices.
> + * 3. Emulated host PCI bridge access. It doesn't exist in the physical
> + *    topology, e.g. it can't be mapped to some physical host bridge.
> + *    So, all access to the host bridge itself needs to be trapped and
> + *    emulated.

I'm not sure 3. is equivalent to the other points. 1. and 2. seem to
be referring to where accesses to the config space are coming from,
while point 3. is referring to a fully emulated device in Xen (one
that doesn't have a backing pci_dev).

I'm also failing to see any fully virtual PCI device being added to
the bus for guest domains so far.

> + */
>  static int vpci_setup_mmio_handler(struct domain *d,
>                                     struct pci_host_bridge *bridge)
>  {
> -    struct pci_config_window *cfg = bridge->cfg;
> +    struct vpci_mmio_priv *priv;
> +
> +    priv = xzalloc(struct vpci_mmio_priv);
> +    if ( !priv )
> +        return -ENOMEM;
> +
> +    priv->is_virt_ecam = !is_hardware_domain(d);
>  
> -    register_mmio_handler(d, &vpci_mmio_handler,
> -                          cfg->phys_addr, cfg->size, NULL);
> +    if ( is_hardware_domain(d) )
> +    {
> +        struct pci_config_window *cfg = bridge->cfg;
> +
> +        bridge->mmio_priv = priv;
> +        register_mmio_handler(d, &vpci_mmio_handler,
> +                              cfg->phys_addr, cfg->size,
> +                              priv);
> +    }
> +    else
> +    {
> +        d->vpci_mmio_priv = priv;
> +        /* Guest domains use what is programmed in their device tree. */
> +        register_mmio_handler(d, &vpci_mmio_handler,
> +                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
> +                              priv);
> +    }
>      return 0;
>  }
>  
> @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d)
>      if ( !has_vpci(d) )
>          return 0;
>  
> +    return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);

I think this is wrong for unprivileged domains: you iterate over the
host bridges but just set up a single ECAM region of
GUEST_VPCI_ECAM_SIZE bytes at GUEST_VPCI_ECAM_BASE, so you are leaking
multiple allocations of vpci_mmio_priv, and also adding a bunch of
duplicated IO handlers for the same ECAM region.

IMO you should iterate over the host bridges only for the hardware
domain case. For the unprivileged domain case there's no need to
iterate over the list of physical host bridges as you end up
exposing a fully emulated bus which bears no resemblance to the
physical setup.
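
Something along these lines is what I have in mind for the init path. It
is a self-contained mock, not Xen code: sketch_domain, sketch_bridge and
sketch_register_handler are invented for the example, and the guest
ECAM base/size values are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder guest ECAM window (illustrative values only). */
#define SKETCH_GUEST_ECAM_BASE 0x10000000UL
#define SKETCH_GUEST_ECAM_SIZE 0x10000000UL

/* Hypothetical physical host bridge: just its config window. */
struct sketch_bridge {
    unsigned long cfg_base;
    unsigned long cfg_size;
};

/* Hypothetical domain: only what the sketch needs. */
struct sketch_domain {
    bool is_hwdom;
    unsigned int nr_handlers; /* MMIO handlers registered so far */
};

/* Mock of handler registration: only counts the calls. */
void sketch_register_handler(struct sketch_domain *d,
                             unsigned long base, unsigned long size)
{
    (void)base;
    (void)size;
    d->nr_handlers++;
}

/*
 * hwdom: one handler per physical host bridge.
 * guest: exactly one handler for the emulated ECAM window, regardless
 *        of how many physical bridges exist.
 */
int sketch_vpci_init(struct sketch_domain *d,
                     const struct sketch_bridge *bridges, size_t nr)
{
    size_t i;

    assert(d != NULL);

    if ( !d->is_hwdom )
    {
        sketch_register_handler(d, SKETCH_GUEST_ECAM_BASE,
                                SKETCH_GUEST_ECAM_SIZE);
        return 0;
    }

    for ( i = 0; i < nr; i++ )
        sketch_register_handler(d, bridges[i].cfg_base,
                                bridges[i].cfg_size);

    return 0;
}
```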

> +}
> +
> +static int domain_vpci_free_cb(struct domain *d,
> +                               struct pci_host_bridge *bridge)
> +{
>      if ( is_hardware_domain(d) )
> -        return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);
> +        XFREE(bridge->mmio_priv);
> +    else
> +        XFREE(d->vpci_mmio_priv);
> +    return 0;
> +}
>  
> -    /* Guest domains use what is programmed in their device tree. */
> -    register_mmio_handler(d, &vpci_mmio_handler,
> -                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
> +void domain_vpci_free(struct domain *d)
> +{
> +    if ( !has_vpci(d) )
> +        return;
>  
> -    return 0;
> +    pci_host_iterate_bridges(d, domain_vpci_free_cb);

Why do we need to iterate the host bridges for unprivileged domains?
AFAICT it just causes duplicated calls to XFREE(d->vpci_mmio_priv). I
would expect something like:

static int bridge_free_cb(struct domain *d,
                          struct pci_host_bridge *bridge)
{
    ASSERT(is_hardware_domain(d));
    XFREE(bridge->mmio_priv);
    return 0;
}

void domain_vpci_free(struct domain *d)
{
    if ( !has_vpci(d) )
        return;

    if ( is_hardware_domain(d) )
        pci_host_iterate_bridges(d, bridge_free_cb);
    else
        XFREE(d->vpci_mmio_priv);
}

Albeit I think there's no need for vpci_mmio_priv in the first place.

Thanks, Roger.



* Re: [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests
  2021-10-26 13:30   ` Roger Pau Monné
@ 2021-10-26 13:57     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-26 13:57 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 26.10.21 16:30, Roger Pau Monné wrote:
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index fa6fcc5e467c..095671742ad8 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -797,6 +797,7 @@ void arch_domain_destroy(struct domain *d)
>>                          get_order_from_bytes(d->arch.efi_acpi_len));
>>   #endif
>>       domain_io_free(d);
>> +    domain_vpci_free(d);
> It's a nit, but I think from a logical PoV this should be inverted?
> You first free the handlers and then the IO infrastructure.
Indeed, thanks
>
>>   }
>>   
>>   void arch_domain_shutdown(struct domain *d)
>> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
>> index 5d6c29c8dcd9..26ec2fa7cf2d 100644
>> --- a/xen/arch/arm/vpci.c
>> +++ b/xen/arch/arm/vpci.c
>> @@ -17,6 +17,14 @@
>>   
>>   #define REGISTER_OFFSET(addr)  ( (addr) & 0x00000fff)
>>   
>> +struct vpci_mmio_priv {
>> +    /*
>> +     * Set to true if the MMIO handlers were set up for the emulated
>> +     * ECAM host PCI bridge.
>> +     */
>> +    bool is_virt_ecam;
>> +};
> Is this strictly required? It feels a bit odd to have a structure to
> store a single boolean.
>
> I think you could replace its usage with is_hardware_domain.
I am working on fixes for an "earlier" patch [1], which already needs some
private data to be passed to the handlers: we need to set sbdf.seg to the
proper host bridge segment instead of always setting it to 0.
I can then pass "struct pci_host_bridge *bridge" as the private data
and use is_hardware_domain(v->domain) to tell whether this is a guest or hwdom.
So, I'll remove the structure completely.
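
Roughly, the decoding would then look like the sketch below. The types
and helper names (sketch_bridge, sketch_domain, sketch_decode_sbdf) are
invented for this mail and the offset is assumed to be relative to the
start of the ECAM window; only the ECAM bit layout itself (bus in bits
27:20, device in 19:15, function in 14:12, register in 11:0) is the
standard one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host bridge: only the segment matters here. */
struct sketch_bridge {
    uint16_t segment;
};

/* Hypothetical domain: hwdom or guest. */
struct sketch_domain {
    bool is_hwdom;
};

/* ECAM offset bits 27:12 hold bus/device/function, bits 11:0 the register. */
#define SKETCH_ECAM_BDF(off) ((uint32_t)(((off) >> 12) & 0xffff))
#define SKETCH_ECAM_REG(off) ((unsigned int)((off) & 0xfff))

/*
 * Build the SBDF for an access at ecam_offset.
 * hwdom: the segment comes from the bridge passed as handler private data.
 * guest: the segment stays 0, the virtual SBDF is translated later.
 */
uint32_t sketch_decode_sbdf(const struct sketch_domain *d,
                            const struct sketch_bridge *bridge,
                            uint64_t ecam_offset)
{
    uint32_t sbdf = SKETCH_ECAM_BDF(ecam_offset);

    if ( d->is_hwdom )
    {
        assert(bridge != NULL);
        sbdf |= (uint32_t)bridge->segment << 16;
    }

    return sbdf;
}
```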

[snip]

>> + */
>>   static int vpci_setup_mmio_handler(struct domain *d,
>>                                      struct pci_host_bridge *bridge)
>>   {
>> -    struct pci_config_window *cfg = bridge->cfg;
>> +    struct vpci_mmio_priv *priv;
>> +
>> +    priv = xzalloc(struct vpci_mmio_priv);
>> +    if ( !priv )
>> +        return -ENOMEM;
>> +
>> +    priv->is_virt_ecam = !is_hardware_domain(d);
>>   
>> -    register_mmio_handler(d, &vpci_mmio_handler,
>> -                          cfg->phys_addr, cfg->size, NULL);
>> +    if ( is_hardware_domain(d) )
>> +    {
>> +        struct pci_config_window *cfg = bridge->cfg;
>> +
>> +        bridge->mmio_priv = priv;
>> +        register_mmio_handler(d, &vpci_mmio_handler,
>> +                              cfg->phys_addr, cfg->size,
>> +                              priv);
>> +    }
>> +    else
>> +    {
>> +        d->vpci_mmio_priv = priv;
>> +        /* Guest domains use what is programmed in their device tree. */
>> +        register_mmio_handler(d, &vpci_mmio_handler,
>> +                              GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE,
>> +                              priv);
>> +    }
>>       return 0;
>>   }
>>   
>> @@ -95,14 +154,25 @@ int domain_vpci_init(struct domain *d)
>>       if ( !has_vpci(d) )
>>           return 0;
>>   
>> +    return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);
> I think this is wrong for unprivileged domains: you iterate over the
> host bridges but just set up a single ECAM region of
> GUEST_VPCI_ECAM_SIZE bytes at GUEST_VPCI_ECAM_BASE, so you are leaking
> multiple allocations of vpci_mmio_priv, and also adding a bunch of
> duplicated IO handlers for the same ECAM region.
>
> IMO you should iterate over the host bridges only for the hardware
> domain case. For the unprivileged domain case there's no need to
> iterate over the list of physical host bridges as you end up
> exposing a fully emulated bus which bears no resemblance to the
> physical setup.
Yes, I am moving this code into that "earlier" patch [1] and already
spotted the leak: thus I am also re-working this code.
>
>> +}
>> +
>> +static int domain_vpci_free_cb(struct domain *d,
>> +                               struct pci_host_bridge *bridge)
>> +{
>>       if ( is_hardware_domain(d) )
>> -        return pci_host_iterate_bridges(d, vpci_setup_mmio_handler);
>> +        XFREE(bridge->mmio_priv);
>> +    else
>> +        XFREE(d->vpci_mmio_priv);
>> +    return 0;
>> +}
>>   
>> -    /* Guest domains use what is programmed in their device tree. */
>> -    register_mmio_handler(d, &vpci_mmio_handler,
>> -                          GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
>> +void domain_vpci_free(struct domain *d)
>> +{
>> +    if ( !has_vpci(d) )
>> +        return;
>>   
>> -    return 0;
>> +    pci_host_iterate_bridges(d, domain_vpci_free_cb);
> Why do we need to iterate the host bridges for unprivileged domains?
No need, I am taking care of this
> AFAICT it just causes duplicated calls to XFREE(d->vpci_mmio_priv). I
> would expect something like:
>
> static int bridge_free_cb(struct domain *d,
>                            struct pci_host_bridge *bridge)
> {
>      ASSERT(is_hardware_domain(d));
>      XFREE(bridge->mmio_priv);
>      return 0;
> }
>
> void domain_vpci_free(struct domain *d)
> {
>      if ( !has_vpci(d) )
>          return;
>
>      if ( is_hardware_domain(d) )
>          pci_host_iterate_bridges(d, bridge_free_cb);
>      else
>          XFREE(d->vpci_mmio_priv);
> }
>
> Albeit I think there's no need for vpci_mmio_priv in the first place.
>
> Thanks, Roger.
Thank you,
Oleksandr

[1] https://patchwork.kernel.org/project/xen-devel/patch/20211008055535.337436-9-andr2000@gmail.com/


* Re: [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function
  2021-10-13 11:11   ` Roger Pau Monné
@ 2021-10-27  9:12     ` Oleksandr Andrushchenko
  2021-10-27  9:24       ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27  9:12 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko, Michal Orzel

Hi, Roger!

On 13.10.21 14:11, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:13AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> This is in preparation for dynamic assignment of the vpci register
>> handlers depending on the domain: hwdom or guest.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>> ---
>> Since v1:
>>   - constify struct pci_dev where possible
>> ---
>>   xen/drivers/vpci/vpci.c | 7 ++++++-
>>   xen/include/xen/vpci.h  | 2 ++
>>   2 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index cbd1bac7fc33..1666402d55b8 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -35,7 +35,7 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>   extern vpci_register_init_t *const __end_vpci_array[];
>>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>   
>> -void vpci_remove_device(struct pci_dev *pdev)
>> +void vpci_remove_device_registers(const struct pci_dev *pdev)
> Making this const is kind of misleading, as you end up modifying
> contents of the pdev, is just that vpci data is stored as a pointer
> inside the struct so you avoid the effects of the constification.
Ok, I will remove const
>
>>   {
>>       spin_lock(&pdev->vpci->lock);
>>       while ( !list_empty(&pdev->vpci->handlers) )
>> @@ -48,6 +48,11 @@ void vpci_remove_device(struct pci_dev *pdev)
>>           xfree(r);
>>       }
>>       spin_unlock(&pdev->vpci->lock);
>> +}
>> +
>> +void vpci_remove_device(struct pci_dev *pdev)
>> +{
>> +    vpci_remove_device_registers(pdev);
>>       xfree(pdev->vpci->msix);
>>       xfree(pdev->vpci->msi);
>>       xfree(pdev->vpci);
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 9f5b5d52e159..2e910d0b1f90 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -28,6 +28,8 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
>>   
>>   /* Remove all handlers and free vpci related structures. */
>>   void vpci_remove_device(struct pci_dev *pdev);
>> +/* Remove all handlers for the device given. */
> I would drop the 'given' from the end of the sentence...
Sure
>
>> +void vpci_remove_device_registers(const struct pci_dev *pdev);
> ...and maybe name this vpci_remove_device_handlers as it's clearer
> IMO.
Ok, will rename
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function
  2021-10-27  9:12     ` Oleksandr Andrushchenko
@ 2021-10-27  9:24       ` Roger Pau Monné
  2021-10-27  9:41         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-27  9:24 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Michal Orzel

On Wed, Oct 27, 2021 at 09:12:14AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 13.10.21 14:11, Roger Pau Monné wrote:
> > On Thu, Sep 30, 2021 at 10:52:13AM +0300, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>
> >> This is in preparation for dynamic assignment of the vpci register
> >> handlers depending on the domain: hwdom or guest.
> >>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
> >> ---
> >> Since v1:
> >>   - constify struct pci_dev where possible
> >> ---
> >>   xen/drivers/vpci/vpci.c | 7 ++++++-
> >>   xen/include/xen/vpci.h  | 2 ++
> >>   2 files changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >> index cbd1bac7fc33..1666402d55b8 100644
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -35,7 +35,7 @@ extern vpci_register_init_t *const __start_vpci_array[];
> >>   extern vpci_register_init_t *const __end_vpci_array[];
> >>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> >>   
> >> -void vpci_remove_device(struct pci_dev *pdev)
> >> +void vpci_remove_device_registers(const struct pci_dev *pdev)
> > Making this const is kind of misleading, as you end up modifying
> > contents of the pdev, is just that vpci data is stored as a pointer
> > inside the struct so you avoid the effects of the constification.
> Ok, I will remove const

Jan prefers the const, so please leave it.

Thanks, Roger.



* Re: [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function
  2021-10-27  9:24       ` Roger Pau Monné
@ 2021-10-27  9:41         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27  9:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Michal Orzel, Oleksandr Andrushchenko



On 27.10.21 12:24, Roger Pau Monné wrote:
> On Wed, Oct 27, 2021 at 09:12:14AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 13.10.21 14:11, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:13AM +0300, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> This is in preparation for dynamic assignment of the vpci register
>>>> handlers depending on the domain: hwdom or guest.
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>>>> ---
>>>> Since v1:
>>>>    - constify struct pci_dev where possible
>>>> ---
>>>>    xen/drivers/vpci/vpci.c | 7 ++++++-
>>>>    xen/include/xen/vpci.h  | 2 ++
>>>>    2 files changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index cbd1bac7fc33..1666402d55b8 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -35,7 +35,7 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>>>    extern vpci_register_init_t *const __end_vpci_array[];
>>>>    #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>>>    
>>>> -void vpci_remove_device(struct pci_dev *pdev)
>>>> +void vpci_remove_device_registers(const struct pci_dev *pdev)
>>> Making this const is kind of misleading, as you end up modifying
>>> contents of the pdev, is just that vpci data is stored as a pointer
>>> inside the struct so you avoid the effects of the constification.
>> Ok, I will remove const
> Jan prefers the const, so please leave it.
Ooook )
>
> Thanks, Roger.
>


* Re: [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign
  2021-10-13 11:29   ` Roger Pau Monné
  2021-10-13 12:47     ` Jan Beulich
@ 2021-10-27  9:53     ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27  9:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 13.10.21 14:29, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:14AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> When a PCI device gets assigned/de-assigned some work on vPCI side needs
>> to be done for that device. Introduce a pair of hooks so vPCI can handle
>> that.
>>
>> Please note, that in the current design the error path is handled by
>> the toolstack via XEN_DOMCTL_assign_device/XEN_DOMCTL_deassign_device,
>> so this is why it is acceptable not to de-assign devices if vPCI's
>> assign fails, e.g. the roll back will be handled on deassign_device when
>> it is called by the toolstack.
> It's kind of hard to see what would need to be rolled back, as the
> functions are just dummies right now that don't perform any actions.
>
> I don't think the toolstack should be the one to deal with the
> fallout, as it could leave Xen in a broken state. The current commit
> message doesn't provide any information about why it has been designed
> this way.
Yes, we discussed in other patches that we should not rely on the
toolstack and should perform the cleanup ourselves. Here is the code from
a future revision to illustrate the roll-back:

int vpci_assign_device(struct domain *d, const struct pci_dev *pdev)
{
     int rc;

     /* It only makes sense to assign for hwdom or guest domain. */
     if ( is_system_domain(d) || !has_vpci(d) )
         return 0;

     rc = vpci_bar_add_handlers(d, pdev);
     if ( rc )
         goto fail;

     rc = vpci_add_virtual_device(d, pdev);
     if ( rc )
     {
         gdprintk(XENLOG_ERR,
                  "%pp: failed to add virtual device for %pd: %d\n",
                  &pdev->sbdf, d, rc);
         goto fail;
     }

     return 0;

fail:
     /*
      * We are trying to clean up as much as we can, so ignore the return
      * value of vpci_deassign_device below, so we can return the
      * error which caused the failure.
      */
     vpci_deassign_device(d, pdev);
     return rc;
}

So, I will drop the part about the toolstack and cleanup from the commit message
>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v2:
>> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>>    for x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - extended the commit message
>> ---
>>   xen/drivers/Kconfig           |  4 ++++
>>   xen/drivers/passthrough/pci.c |  9 +++++++++
>>   xen/drivers/vpci/vpci.c       | 23 +++++++++++++++++++++++
>>   xen/include/xen/vpci.h        | 20 ++++++++++++++++++++
>>   4 files changed, 56 insertions(+)
>>
>> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
>> index db94393f47a6..780490cf8e39 100644
>> --- a/xen/drivers/Kconfig
>> +++ b/xen/drivers/Kconfig
>> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>>   config HAS_VPCI
>>   	bool
>>   
>> +config HAS_VPCI_GUEST_SUPPORT
>> +	bool
>> +	depends on HAS_VPCI
> I would assume this is to go away once the work is finished? I don't
> think it makes sense to split vPCI code between domU/dom0 on a build
> time basis.
>
>> +
>>   endmenu
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 9f804a50e780..805ab86ed555 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -870,6 +870,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>       if ( ret )
>>           goto out;
>>   
>> +    ret = vpci_deassign_device(d, pdev);
>> +    if ( ret )
>> +        goto out;
>> +
>>       if ( pdev->domain == hardware_domain  )
>>           pdev->quarantine = false;
>>   
>> @@ -1429,6 +1433,11 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>           rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>       }
>>   
>> +    if ( rc )
>> +        goto done;
>> +
>> +    rc = vpci_assign_device(d, pdev);
>> +
>>    done:
>>       if ( rc )
>>           printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 1666402d55b8..0fe86cb30d23 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -86,6 +86,29 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   
>>       return rc;
>>   }
>> +
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Notify vPCI that device is assigned to guest. */
>> +int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>> +{
>> +    /* It only makes sense to assign for hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Notify vPCI that device is de-assigned from guest. */
>> +int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>> +{
>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    return 0;
>> +}
>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> +
>>   #endif /* __XEN__ */
>>   
>>   static int vpci_register_cmp(const struct vpci_register *r1,
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 2e910d0b1f90..ecc08f2c0f65 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -242,6 +242,26 @@ static inline bool vpci_process_pending(struct vcpu *v)
>>   }
>>   #endif
>>   
>> +#if defined(CONFIG_HAS_VPCI) && defined(CONFIG_HAS_VPCI_GUEST_SUPPORT)
> You don't need to check for CONFIG_HAS_VPCI, as
> CONFIG_HAS_VPCI_GUEST_SUPPORT already depends on CONFIG_HAS_VPCI being
> set.
>
Will fix
>> +/* Notify vPCI that device is assigned/de-assigned to/from guest. */
>> +int __must_check vpci_assign_device(struct domain *d,
>> +                                    const struct pci_dev *dev);
>> +int __must_check vpci_deassign_device(struct domain *d,
>> +                                      const struct pci_dev *dev);
>> +#else
>> +static inline int vpci_assign_device(struct domain *d,
>> +                                     const struct pci_dev *dev)
>> +{
>> +    return 0;
>> +};
>> +
>> +static inline int vpci_deassign_device(struct domain *d,
>> +                                       const struct pci_dev *dev)
>> +{
>> +    return 0;
>> +};
> You need the __must_check attributes here also to match the prototypes
> above.
Yes, it was already discussed and I will remove __must_check.
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-13 13:51   ` Roger Pau Monné
  2021-10-15  6:04     ` Jan Beulich
@ 2021-10-27 10:17     ` Oleksandr Andrushchenko
  2021-10-27 11:59       ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27 10:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 13.10.21 16:51, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> This is in preparation for dynamic assignment of the vPCI register
>> handlers depending on the domain: hwdom or guest.
>> The need for this step is that it is easier to have all related functionality
>> put at one place. When the subsequent patches add decisions on which
>> handlers to install, e.g. hwdom or guest handlers, then this is easily
>> achievable.
> Won't it be possible to select the handlers to install in init_bars
> itself?
It is possible
>
> Splitting it like that means you need to iterate over the numbers of
> BARs twice (one in add_bar_handlers and one in init_bars), which makes
> it more likely to introduce errors or divergences.
>
> Decoupling the filling of vpci_bar data with setting the handlers
> seems slightly confusing.
Ok, I won't introduce add_bar_handlers, which makes this patch unnecessary.
I'll drop it and re-work the upcoming patches accordingly.

Thank you,
Oleksandr


* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-27 10:17     ` Oleksandr Andrushchenko
@ 2021-10-27 11:59       ` Oleksandr Andrushchenko
  2021-10-27 13:23         ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27 11:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 27.10.21 13:17, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 13.10.21 16:51, Roger Pau Monné wrote:
>> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> This is in preparation for dynamic assignment of the vPCI register
>>> handlers depending on the domain: hwdom or guest.
>>> The need for this step is that it is easier to have all related functionality
>>> put at one place. When the subsequent patches add decisions on which
>>> handlers to install, e.g. hwdom or guest handlers, then this is easily
>>> achievable.
>> Won't it be possible to select the handlers to install in init_bars
>> itself?
> It is possible
>> Splitting it like that means you need to iterate over the numbers of
>> BARs twice (one in add_bar_handlers and one in init_bars), which makes
>> it more likely to introduce errors or divergences.
>>
>> Decoupling the filling of vpci_bar data with setting the handlers
>> seems slightly confusing.
> Ok, I won't introduce add_bar_handlers, thus rendering this patch useless.
> I'll drop it and re-work the upcoming patches with this respect
On the other hand, after thinking a bit more: what does init_bars actually do?
1. Runs once per pdev (__init?)
2. Sizes the BARs and detects their type, sets up pdev->vpci->header BAR values
3. Adds register handlers.

For DomU we only need 3), so we can set up guest handlers.
So, from this POV we need yet another add_bar_handlers or similar, at
least for the guests and for the case when pdev is assigned back to hwdom.

So this can be a reason to defend the current approach with add_bar_handlers.

Or do you have an idea how to do that some other way?
>
> Thank you,
> Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-27 11:59       ` Oleksandr Andrushchenko
@ 2021-10-27 13:23         ` Roger Pau Monné
  2021-10-27 14:06           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-27 13:23 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh

On Wed, Oct 27, 2021 at 11:59:47AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 27.10.21 13:17, Oleksandr Andrushchenko wrote:
> > Hi, Roger!
> >
> > On 13.10.21 16:51, Roger Pau Monné wrote:
> >> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
> >>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>>
> >>> This is in preparation for dynamic assignment of the vPCI register
> >>> handlers depending on the domain: hwdom or guest.
> >>> The need for this step is that it is easier to have all related functionality
> >>> put at one place. When the subsequent patches add decisions on which
> >>> handlers to install, e.g. hwdom or guest handlers, then this is easily
> >>> achievable.
> >> Won't it be possible to select the handlers to install in init_bars
> >> itself?
> > It is possible
> >> Splitting it like that means you need to iterate over the numbers of
> >> BARs twice (one in add_bar_handlers and one in init_bars), which makes
> >> it more likely to introduce errors or divergences.
> >>
> >> Decoupling the filling of vpci_bar data with setting the handlers
> >> seems slightly confusing.
> > Ok, I won't introduce add_bar_handlers, thus rendering this patch useless.
> > I'll drop it and re-work the upcoming patches with this respect
> On the other hand after thinking a bit more.
> What actually init_bars do?
> 1. Runs once per each pdev (__init?)
> 2. Sizes the BARs and detects their type, sets up pdev->vpci->header BAR values
> 3. Adds register handlers.
> 
> For DomU we only need 3), so we can setup guest handlers.

I think you assume that there will always be a hardware domain with
vPCI enabled that will get the device assigned and thus init_bars will
be executed prior to assigning to a domU.

But what about dom0less, or when using a classic PV dom0? In that case
the device won't get assigned to a hardware domain with vPCI support,
so the vpci structure won't be allocated or filled, and hence
init_bars would have to be executed when assigning to a domU.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-27 13:23         ` Roger Pau Monné
@ 2021-10-27 14:06           ` Oleksandr Andrushchenko
  2021-10-27 15:34             ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-10-27 14:06 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Oleksandr Andrushchenko, Rahul Singh



On 27.10.21 16:23, Roger Pau Monné wrote:
> On Wed, Oct 27, 2021 at 11:59:47AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 27.10.21 13:17, Oleksandr Andrushchenko wrote:
>>> Hi, Roger!
>>>
>>> On 13.10.21 16:51, Roger Pau Monné wrote:
>>>> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> This is in preparation for dynamic assignment of the vPCI register
>>>>> handlers depending on the domain: hwdom or guest.
>>>>> The need for this step is that it is easier to have all related functionality
>>>>> put at one place. When the subsequent patches add decisions on which
>>>>> handlers to install, e.g. hwdom or guest handlers, then this is easily
>>>>> achievable.
>>>> Won't it be possible to select the handlers to install in init_bars
>>>> itself?
>>> It is possible
>>>> Splitting it like that means you need to iterate over the numbers of
>>>> BARs twice (one in add_bar_handlers and one in init_bars), which makes
>>>> it more likely to introduce errors or divergences.
>>>>
>>>> Decoupling the filling of vpci_bar data with setting the handlers
>>>> seems slightly confusing.
>>> Ok, I won't introduce add_bar_handlers, thus rendering this patch useless.
>>> I'll drop it and re-work the upcoming patches with this respect
>> On the other hand after thinking a bit more.
>> What actually init_bars do?
>> 1. Runs once per each pdev (__init?)
>> 2. Sizes the BARs and detects their type, sets up pdev->vpci->header BAR values
>> 3. Adds register handlers.
>>
>> For DomU we only need 3), so we can setup guest handlers.
> I think you assume that there will always be a hardware domain with
> vPCI enabled that will get the device assigned and thus init_bars will
> be executed prior to assigning to a domU.
Yes, this is the current assumption...
>
> But what about dom0less,
it was decided to put dom0less out of scope for now
>   or when using a classic PV dom0?
I thought that vPCI is only used for PVH Dom0 and it is enough for now
(yes, this is a weak argument, but we do not want PCI passthrough on Arm
to become a never ending game... since 2015...)
>   In that case
> the device won't get assigned to a hardware domain with vPCI support,
> so the vpci structure won't be allocated or filled,
Yes, this is true. But because init_bars does three different things, it
might still need some disaggregation, e.g. BAR sizing is not needed and
might not be possible while assigning to a DomU.
So, I think that init_bars will need to be split in any case.
>   and hence
> init_bars would have to be executed when assigning to a domU.
Please see above: I am not sure init_bars can stay in its current form to
achieve that. One of the things this patch does is split init_bars into
a) register assignment
b) all the rest: initial pdev header initialization, sizing etc.

The same is true for MSI/MSI-X. When we add support for MSI/MSI-X on Arm
you will see the same: we need to split [1] (this is WIP).

So, I am still convinced that we need add_bar_handlers in some form.
> Thanks, Roger.
>
[1] https://gitlab.com/rahsingh/xen-integration/-/commit/7b898601261fc3ad834ac3d06cc4c784f33c95bb

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 03/11] vpci/header: Move register assignments from init_bars
  2021-10-27 14:06           ` Oleksandr Andrushchenko
@ 2021-10-27 15:34             ` Roger Pau Monné
  0 siblings, 0 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-10-27 15:34 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh

On Wed, Oct 27, 2021 at 02:06:40PM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 27.10.21 16:23, Roger Pau Monné wrote:
> > On Wed, Oct 27, 2021 at 11:59:47AM +0000, Oleksandr Andrushchenko wrote:
> >> Hi, Roger!
> >>
> >> On 27.10.21 13:17, Oleksandr Andrushchenko wrote:
> >>> Hi, Roger!
> >>>
> >>> On 13.10.21 16:51, Roger Pau Monné wrote:
> >>>> On Thu, Sep 30, 2021 at 10:52:15AM +0300, Oleksandr Andrushchenko wrote:
> >>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>>>>
> >>>>> This is in preparation for dynamic assignment of the vPCI register
> >>>>> handlers depending on the domain: hwdom or guest.
> >>>>> The need for this step is that it is easier to have all related functionality
> >>>>> put at one place. When the subsequent patches add decisions on which
> >>>>> handlers to install, e.g. hwdom or guest handlers, then this is easily
> >>>>> achievable.
> >>>> Won't it be possible to select the handlers to install in init_bars
> >>>> itself?
> >>> It is possible
> >>>> Splitting it like that means you need to iterate over the numbers of
> >>>> BARs twice (one in add_bar_handlers and one in init_bars), which makes
> >>>> it more likely to introduce errors or divergences.
> >>>>
> >>>> Decoupling the filling of vpci_bar data with setting the handlers
> >>>> seems slightly confusing.
> >>> Ok, I won't introduce add_bar_handlers, thus rendering this patch useless.
> >>> I'll drop it and re-work the upcoming patches with this respect
> >> On the other hand after thinking a bit more.
> >> What actually init_bars do?
> >> 1. Runs once per each pdev (__init?)
> >> 2. Sizes the BARs and detects their type, sets up pdev->vpci->header BAR values
> >> 3. Adds register handlers.
> >>
> >> For DomU we only need 3), so we can setup guest handlers.
> > I think you assume that there will always be a hardware domain with
> > vPCI enabled that will get the device assigned and thus init_bars will
> > be executed prior to assigning to a domU.
> Yes, this is the current assumption...
> >
> > But what about dom0less,
> it was decided to put dom0less out of scope for now
> >   or when using a classic PV dom0?
> I thought that vPCI is only used for PVH Dom0 and it is enough for now
> (yes, this is a weak argument, but we do not want PCI passthrough on Arm
> to become a never ending game... since 2015...)

I understand that not everything will be supported, that's perfectly
fine, but we should aim to not make supporting those use cases
harder in the future.

> >   In that case
> > the device won't get assigned to a hardware domain with vPCI support,
> > so the vpci structure won't be allocated or filled,
> Yes, this is true. But because init_bars does three different things, it
> might still need some disaggregation, e.g. BAR sizing is not needed and
> might not be possible while assigning to a DomU.
> So, I think that init_bars will need to be split in any case.

I understand that BAR sizing will not be needed if the structure is
pre-initialized, but I also cannot see why it would be impossible, at
least on x86.

> >   and hence
> > init_bars would have to be executed when assigning to a domU.
> Please see above: I am not sure init_bars can stay in its current form to
> achieve that. One of the things this patch does is split init_bars into
> a) register assignment
> b) all the rest: initial pdev header initialization, sizing etc.
> 
> The same is true for MSI/MSI-X. When we add support for MSI/MSI-X on Arm
> you will see the same: we need to split [1] (this is WIP).
> 
> So, I am still convinced that we need add_bar_handlers in some form.

I'm fine to split it if there's a hard requirement, but I'm afraid so
far I'm not convinced it's required. Maybe if you could elaborate on
why BAR sizing might not be possible when assigning to domU I could be
convinced.

Another option might be to just modify init_bars to have slightly
different paths for dom0 vs domU.
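
As a rough illustration of that option (a toy sketch, not Xen code:
toy_bar, toy_init_bars, the two read callbacks and the BAR count are
made-up names and values), a single pass over the BARs can fill in the
per-BAR data and pick the handler by domain type, so no second loop over
the BARs is needed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BAR count for this sketch only. */
#define TOY_NUM_BARS 6

/* Hypothetical read callback: maps the hardware value to what the domain sees. */
typedef uint32_t toy_bar_read_t(uint32_t hw_value);

/* Hardware domain path: the physical BAR value is passed through. */
static uint32_t toy_hwdom_read(uint32_t hw_value)
{
    return hw_value;
}

/* Guest path: placeholder for an emulated view (always 0 in this sketch). */
static uint32_t toy_guest_read(uint32_t hw_value)
{
    (void)hw_value;
    return 0;
}

struct toy_bar {
    uint32_t addr;          /* value recorded for the BAR */
    toy_bar_read_t *read;   /* handler chosen for the owning domain */
};

/* One loop: record the BAR data and select the handler in the same pass. */
static void toy_init_bars(struct toy_bar *bars, const uint32_t *hw,
                          bool is_hwdom)
{
    assert(bars != NULL && hw != NULL);

    for ( size_t i = 0; i < TOY_NUM_BARS; i++ )
    {
        bars[i].addr = hw[i];
        bars[i].read = is_hwdom ? toy_hwdom_read : toy_guest_read;
    }
}
```

Calling toy_init_bars() with is_hwdom set to false leaves the recorded
addresses untouched and only swaps the handlers, which is the "slightly
different paths" idea in its simplest form.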

Thanks, Roger.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-10-25 15:48   ` Roger Pau Monné
@ 2021-11-01  9:18     ` Oleksandr Andrushchenko
  2021-11-02 10:03       ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-01  9:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko


>> +    if ( rc )
>> +        gdprintk(XENLOG_ERR,
>> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
>> +                 &pdev->sbdf, d, rc);
>> +    return rc;
>> +}
>> +
>> +int vpci_bar_remove_handlers(const struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    /* Remove previously added registers. */
>> +    vpci_remove_device_registers(pdev);
>> +    return 0;
>> +}
>> +#endif
>> +
>>   /*
>>    * Local variables:
>>    * mode: C
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 0fe86cb30d23..702f7b5d5dda 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -95,7 +95,7 @@ int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return 0;
>> +    return vpci_bar_add_handlers(d, dev);
>>   }
>>   
>>   /* Notify vPCI that device is de-assigned from guest. */
>> @@ -105,7 +105,7 @@ int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return 0;
>> +    return vpci_bar_remove_handlers(d, dev);
> I think it would be better to use something similar to
> REGISTER_VPCI_INIT here, otherwise this will need to be modified every
> time a new capability is handled by Xen.
>
> Maybe we could reuse or expand REGISTER_VPCI_INIT adding another field
> to be used for guest initialization?
>
>>   }
>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>   
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index ecc08f2c0f65..fd822c903af5 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -57,6 +57,14 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
>>    */
>>   bool __must_check vpci_process_pending(struct vcpu *v);
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Add/remove BAR handlers for a domain. */
>> +int vpci_bar_add_handlers(const struct domain *d,
>> +                          const struct pci_dev *pdev);
>> +int vpci_bar_remove_handlers(const struct domain *d,
>> +                             const struct pci_dev *pdev);
>> +#endif
> This would then go away if we implement a mechanism similar to
> REGISTER_VPCI_INIT.
>
> Thanks, Roger.
Ok, so I can extend REGISTER_VPCI_INIT with an action parameter:

"There are a number of actions to be taken while first initializing vPCI
for a PCI device or when the device is assigned to a guest or when it
is de-assigned and so on.
Every time a new action is needed during these steps we need to call some
relevant function to handle that. Make it easier to track the required
steps by extending the REGISTER_VPCI_INIT machinery with an action parameter
which shows exactly which step/action is being performed."

So, we have

-typedef int vpci_register_init_t(struct pci_dev *dev);
+enum VPCI_INIT_ACTION {
+  VPCI_INIT_ADD,
+  VPCI_INIT_ASSIGN,
+  VPCI_INIT_DEASSIGN,
+};
+
+typedef int vpci_register_init_t(struct pci_dev *dev,
+                                 enum VPCI_INIT_ACTION action);

and, for example,

@@ -452,6 +452,9 @@ static int init_bars(struct pci_dev *pdev)
      struct vpci_bar *bars = header->bars;
      int rc;

+    if ( action != VPCI_INIT_ADD )
+        return 0;
+

I was thinking about adding dedicated machinery similar to REGISTER_VPCI_INIT,
e.g. REGISTER_VPCI_{ASSIGN|DEASSIGN} + dedicated sections in the linker scripts,
but it seems not worth it: these steps are only executed at device init/assign/deassign,
so extending the existing approach doesn't seem to hurt performance much.
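
To make the idea concrete, here is a self-contained sketch of such a
dispatch. It is only an illustration under assumptions: struct pci_dev is
a toy stand-in, guest_bar_handlers and vpci_run_inits are hypothetical
names, and a plain array replaces the list that REGISTER_VPCI_INIT builds
in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the real structure; the fields are hypothetical. */
struct pci_dev {
    unsigned int header_sized;        /* set once the BARs were sized */
    unsigned int handlers_installed;  /* set while guest handlers exist */
};

enum VPCI_INIT_ACTION {
    VPCI_INIT_ADD,
    VPCI_INIT_ASSIGN,
    VPCI_INIT_DEASSIGN,
};

typedef int vpci_register_init_t(struct pci_dev *dev,
                                 enum VPCI_INIT_ACTION action);

/* Only reacts to the initial add; every other action is a no-op. */
static int init_bars(struct pci_dev *pdev, enum VPCI_INIT_ACTION action)
{
    if ( action != VPCI_INIT_ADD )
        return 0;

    pdev->header_sized = 1;
    return 0;
}

/* Hypothetical per-guest step: reacts to assign/deassign only. */
static int guest_bar_handlers(struct pci_dev *pdev,
                              enum VPCI_INIT_ACTION action)
{
    switch ( action )
    {
    case VPCI_INIT_ASSIGN:
        pdev->handlers_installed = 1;
        break;
    case VPCI_INIT_DEASSIGN:
        pdev->handlers_installed = 0;
        break;
    default:
        break;
    }
    return 0;
}

/* Assumption: a plain array instead of a linker-collected list. */
static vpci_register_init_t *const vpci_inits[] = {
    init_bars,
    guest_bar_handlers,
};

/* Run every registered helper for the given action, stop on first error. */
static int vpci_run_inits(struct pci_dev *pdev, enum VPCI_INIT_ACTION action)
{
    assert(pdev != NULL);

    for ( size_t i = 0; i < sizeof(vpci_inits) / sizeof(vpci_inits[0]); i++ )
    {
        int rc = vpci_inits[i](pdev, action);

        if ( rc )
            return rc;
    }
    return 0;
}
```

With this shape, vpci_assign_device()/vpci_deassign_device() would not
need to change when a new capability is handled: each helper decides for
itself which actions it cares about.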

Please let me know if this is what you mean, so I can re-work the relevant code.

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-11-01  9:18     ` Oleksandr Andrushchenko
@ 2021-11-02 10:03       ` Roger Pau Monné
  2021-11-02 10:29         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-02 10:03 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh

On Mon, Nov 01, 2021 at 09:18:17AM +0000, Oleksandr Andrushchenko wrote:
> 
> >> +    if ( rc )
> >> +        gdprintk(XENLOG_ERR,
> >> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
> >> +                 &pdev->sbdf, d, rc);
> >> +    return rc;
> >> +}
> >> +
> >> +int vpci_bar_remove_handlers(const struct domain *d, const struct pci_dev *pdev)
> >> +{
> >> +    /* Remove previously added registers. */
> >> +    vpci_remove_device_registers(pdev);
> >> +    return 0;
> >> +}
> >> +#endif
> >> +
> >>   /*
> >>    * Local variables:
> >>    * mode: C
> >> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >> index 0fe86cb30d23..702f7b5d5dda 100644
> >> --- a/xen/drivers/vpci/vpci.c
> >> +++ b/xen/drivers/vpci/vpci.c
> >> @@ -95,7 +95,7 @@ int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
> >>       if ( is_system_domain(d) || !has_vpci(d) )
> >>           return 0;
> >>   
> >> -    return 0;
> >> +    return vpci_bar_add_handlers(d, dev);
> >>   }
> >>   
> >>   /* Notify vPCI that device is de-assigned from guest. */
> >> @@ -105,7 +105,7 @@ int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
> >>       if ( is_system_domain(d) || !has_vpci(d) )
> >>           return 0;
> >>   
> >> -    return 0;
> >> +    return vpci_bar_remove_handlers(d, dev);
> > I think it would be better to use something similar to
> > REGISTER_VPCI_INIT here, otherwise this will need to be modified every
> > time a new capability is handled by Xen.
> >
> > Maybe we could reuse or expand REGISTER_VPCI_INIT adding another field
> > to be used for guest initialization?
> >
> >>   }
> >>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> >>   
> >> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> >> index ecc08f2c0f65..fd822c903af5 100644
> >> --- a/xen/include/xen/vpci.h
> >> +++ b/xen/include/xen/vpci.h
> >> @@ -57,6 +57,14 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
> >>    */
> >>   bool __must_check vpci_process_pending(struct vcpu *v);
> >>   
> >> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> >> +/* Add/remove BAR handlers for a domain. */
> >> +int vpci_bar_add_handlers(const struct domain *d,
> >> +                          const struct pci_dev *pdev);
> >> +int vpci_bar_remove_handlers(const struct domain *d,
> >> +                             const struct pci_dev *pdev);
> >> +#endif
> > This would then go away if we implement a mechanism similar to
> > REGISTER_VPCI_INIT.
> >
> > Thanks, Roger.
> Ok, so I can extend REGISTER_VPCI_INIT with an action parameter:
> 
> "There are a number of actions to be taken while first initializing vPCI
> for a PCI device or when the device is assigned to a guest or when it
> is de-assigned and so on.
> Every time a new action is needed during these steps we need to call some
> relevant function to handle that. Make it easier to track the required
> steps by extending the REGISTER_VPCI_INIT machinery with an action parameter
> which shows exactly which step/action is being performed."
> 
> So, we have
> 
> -typedef int vpci_register_init_t(struct pci_dev *dev);
> +enum VPCI_INIT_ACTION {
> +  VPCI_INIT_ADD,
> +  VPCI_INIT_ASSIGN,
> +  VPCI_INIT_DEASSIGN,
> +};
> +
> +typedef int vpci_register_init_t(struct pci_dev *dev,
> +                                 enum VPCI_INIT_ACTION action);
> 
> and, for example,
> 
> @@ -452,6 +452,9 @@ static int init_bars(struct pci_dev *pdev)
>       struct vpci_bar *bars = header->bars;
>       int rc;
> 
> +    if ( action != VPCI_INIT_ADD )
> +        return 0;
> +
> 
> I was thinking about adding dedicated machinery similar to REGISTER_VPCI_INIT,
> e.g. REGISTER_VPCI_{ASSIGN|DEASSIGN} + dedicated sections in the linker scripts,
> but it seems not worth it: these steps are only executed at device init/assign/deassign,
> so extending the existing approach doesn't seem to hurt performance much.
> 
> Please let me know if this is what you mean, so I can re-work the relevant code.

I'm afraid I'm still unsure whether we need an explicit helper to
execute when assigning a device, rather than just using the current
init helpers (init_bars &c).

You said that sizing the BARs when assigning to a domU was not
possible [0], but I'm missing an explanation of why it's not possible,
as I think that won't be an issue on x86 [1].

Thanks, Roger.

[0] https://lore.kernel.org/xen-devel/368bf4b5-f9fd-76a6-294e-dbb93a18e73f@epam.com/
[1] https://lore.kernel.org/xen-devel/YXlxmdYdwptakDDK@Air-de-Roger/


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically
  2021-11-02 10:03       ` Roger Pau Monné
@ 2021-11-02 10:29         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 10:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko



On 02.11.21 12:03, Roger Pau Monné wrote:
> On Mon, Nov 01, 2021 at 09:18:17AM +0000, Oleksandr Andrushchenko wrote:
>>>> +    if ( rc )
>>>> +        gdprintk(XENLOG_ERR,
>>>> +                 "%pp: failed to add BAR handlers for dom%pd: %d\n",
>>>> +                 &pdev->sbdf, d, rc);
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +int vpci_bar_remove_handlers(const struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    /* Remove previously added registers. */
>>>> +    vpci_remove_device_registers(pdev);
>>>> +    return 0;
>>>> +}
>>>> +#endif
>>>> +
>>>>    /*
>>>>     * Local variables:
>>>>     * mode: C
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 0fe86cb30d23..702f7b5d5dda 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -95,7 +95,7 @@ int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> -    return 0;
>>>> +    return vpci_bar_add_handlers(d, dev);
>>>>    }
>>>>    
>>>>    /* Notify vPCI that device is de-assigned from guest. */
>>>> @@ -105,7 +105,7 @@ int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> -    return 0;
>>>> +    return vpci_bar_remove_handlers(d, dev);
>>> I think it would be better to use something similar to
>>> REGISTER_VPCI_INIT here, otherwise this will need to be modified every
>>> time a new capability is handled by Xen.
>>>
>>> Maybe we could reuse or expand REGISTER_VPCI_INIT adding another field
>>> to be used for guest initialization?
>>>
>>>>    }
>>>>    #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>>    
>>>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>>>> index ecc08f2c0f65..fd822c903af5 100644
>>>> --- a/xen/include/xen/vpci.h
>>>> +++ b/xen/include/xen/vpci.h
>>>> @@ -57,6 +57,14 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
>>>>     */
>>>>    bool __must_check vpci_process_pending(struct vcpu *v);
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +/* Add/remove BAR handlers for a domain. */
>>>> +int vpci_bar_add_handlers(const struct domain *d,
>>>> +                          const struct pci_dev *pdev);
>>>> +int vpci_bar_remove_handlers(const struct domain *d,
>>>> +                             const struct pci_dev *pdev);
>>>> +#endif
>>> This would then go away if we implement a mechanism similar to
>>> REGISTER_VPCI_INIT.
>>>
>>> Thanks, Roger.
>> Ok, so I can extend REGISTER_VPCI_INIT with an action parameter:
>>
>> "There are a number of actions to be taken while first initializing vPCI
>> for a PCI device or when the device is assigned to a guest or when it
>> is de-assigned and so on.
>> Every time a new action is needed during these steps we need to call some
>> relevant function to handle that. Make it easier to track the required
>> steps by extending the REGISTER_VPCI_INIT machinery with an action parameter
>> which shows exactly which step/action is being performed."
>>
>> So, we have
>>
>> -typedef int vpci_register_init_t(struct pci_dev *dev);
>> +enum VPCI_INIT_ACTION {
>> +  VPCI_INIT_ADD,
>> +  VPCI_INIT_ASSIGN,
>> +  VPCI_INIT_DEASSIGN,
>> +};
>> +
>> +typedef int vpci_register_init_t(struct pci_dev *dev,
>> +                                 enum VPCI_INIT_ACTION action);
>>
>> and, for example,
>>
>> @@ -452,6 +452,9 @@ static int init_bars(struct pci_dev *pdev)
>>        struct vpci_bar *bars = header->bars;
>>        int rc;
>>
>> +    if ( action != VPCI_INIT_ADD )
>> +        return 0;
>> +
>>
>> I was thinking about adding dedicated machinery similar to REGISTER_VPCI_INIT,
>> e.g. REGISTER_VPCI_{ASSIGN|DEASSIGN} + dedicated sections in the linker scripts,
>> but it seems not worth it: these steps are only executed at device init/assign/deassign,
>> so extending the existing approach doesn't seem to hurt performance much.
>>
>> Please let me know if this is what you mean, so I can re-work the relevant code.
> I'm afraid I'm still unsure whether we need an explicit helper to
> execute when assigning a device, rather than just using the current
> init helpers (init_bars &c).
>
> You said that sizing the BARs when assigning to a domU was not
> possible [0], but I'm missing an explanation of why it's not possible,
> as I think that won't be an issue on x86 [1].
I am in the process of re-working this and the relevant patches.
At the moment I have those helpers, but it seems I can remove them.
Once I finish the series I (most probably) will remove those.
>
> Thanks, Roger.
>
> [0] https://lore.kernel.org/xen-devel/368bf4b5-f9fd-76a6-294e-dbb93a18e73f@epam.com/
> [1] https://lore.kernel.org/xen-devel/YXlxmdYdwptakDDK@Air-de-Roger/

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-10-26  9:08   ` Roger Pau Monné
@ 2021-11-02 10:34     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 10:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 26.10.21 12:08, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:18AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Instead of handling a single range set, that contains all the memory
>> regions of all the BARs and ROM, have them per BAR.
>>
>> This is in preparation of making non-identity mappings in p2m for the
>> MMIOs/ROM.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   xen/drivers/vpci/header.c | 172 ++++++++++++++++++++++++++------------
>>   xen/include/xen/vpci.h    |   3 +-
>>   2 files changed, 122 insertions(+), 53 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ec4d215f36ff..9c603d26d302 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -131,49 +131,75 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>   
>>   bool vpci_process_pending(struct vcpu *v)
>>   {
>> -    if ( v->vpci.mem )
>> +    if ( v->vpci.num_mem_ranges )
>>       {
>>           struct map_data data = {
>>               .d = v->domain,
>>               .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>           };
>> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>> +        struct pci_dev *pdev = v->vpci.pdev;
>> +        struct vpci_header *header = &pdev->vpci->header;
>> +        unsigned int i;
>>   
>> -        if ( rc == -ERESTART )
>> -            return true;
>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +        {
>> +            struct vpci_bar *bar = &header->bars[i];
>> +            int rc;
>>   
>> -        spin_lock(&v->vpci.pdev->vpci->lock);
>> -        /* Disable memory decoding unconditionally on failure. */
>> -        modify_decoding(v->vpci.pdev,
>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> -                        !rc && v->vpci.rom_only);
>> -        spin_unlock(&v->vpci.pdev->vpci->lock);
>> +            if ( !bar->mem )
>> +                continue;
>>   
>> -        rangeset_destroy(v->vpci.mem);
>> -        v->vpci.mem = NULL;
>> -        if ( rc )
>> -            /*
>> -             * FIXME: in case of failure remove the device from the domain.
>> -             * Note that there might still be leftover mappings. While this is
>> -             * safe for Dom0, for DomUs the domain will likely need to be
>> -             * killed in order to avoid leaking stale p2m mappings on
>> -             * failure.
>> -             */
>> -            vpci_remove_device(v->vpci.pdev);
>> +            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
>> +
>> +            if ( rc == -ERESTART )
>> +                return true;
>> +
>> +            spin_lock(&pdev->vpci->lock);
>> +            /* Disable memory decoding unconditionally on failure. */
>> +            modify_decoding(pdev,
>> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> +                            !rc && v->vpci.rom_only);
>> +            spin_unlock(&pdev->vpci->lock);
>> +
>> +            rangeset_destroy(bar->mem);
> Now that the rangesets are per-BAR we might have to consider
> allocating them at initialization time and not destroying them when
> empty. We could replace the NULL checks with rangeset_is_empty
> instead. Not that you have to do this on this patch, but I think it's
> worth mentioning.
Yes, this is a good idea. I will re-work the patch to create/destroy
the rangesets once in add/remove
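A toy sketch of that idea, just to show the shape of it (made-up toy_*
names, a hypothetical BAR count, and a simple counter standing in for a
real rangeset; this is not the Xen rangeset API): the per-BAR sets are
allocated once, never freed while the device is in use, and the pending
check becomes "is it empty?" instead of a NULL check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical number of per-device sets in this sketch. */
#define TOY_NUM_BARS 7

/* Toy stand-in for a rangeset: it only counts pending ranges. */
struct toy_rangeset { unsigned int nr_ranges; };
struct toy_bar { struct toy_rangeset *mem; };
struct toy_header { struct toy_bar bars[TOY_NUM_BARS]; };

static bool toy_rangeset_is_empty(const struct toy_rangeset *r)
{
    assert(r != NULL);
    return r->nr_ranges == 0;
}

/* Called once on device removal: the only place where sets are freed. */
static void toy_header_destroy(struct toy_header *h)
{
    for ( size_t i = 0; i < TOY_NUM_BARS; i++ )
    {
        free(h->bars[i].mem);
        h->bars[i].mem = NULL;
    }
}

/* Called once on device add: allocate an (empty) set for every BAR. */
static int toy_header_init(struct toy_header *h)
{
    for ( size_t i = 0; i < TOY_NUM_BARS; i++ )
        h->bars[i].mem = NULL;

    for ( size_t i = 0; i < TOY_NUM_BARS; i++ )
    {
        h->bars[i].mem = calloc(1, sizeof(*h->bars[i].mem));
        if ( !h->bars[i].mem )
        {
            toy_header_destroy(h);
            return -1;
        }
    }
    return 0;
}

/* Consume pending ranges; empty sets are skipped rather than destroyed. */
static unsigned int toy_process_pending(struct toy_header *h)
{
    unsigned int processed = 0;

    for ( size_t i = 0; i < TOY_NUM_BARS; i++ )
    {
        struct toy_rangeset *mem = h->bars[i].mem;

        if ( toy_rangeset_is_empty(mem) )
            continue;

        mem->nr_ranges = 0;   /* stands in for consuming the ranges */
        processed++;
    }
    return processed;
}
```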
>
>> +            bar->mem = NULL;
>> +            v->vpci.num_mem_ranges--;
>> +            if ( rc )
>> +                /*
>> +                 * FIXME: in case of failure remove the device from the domain.
>> +                 * Note that there might still be leftover mappings. While this is
>> +                 * safe for Dom0, for DomUs the domain will likely need to be
>> +                 * killed in order to avoid leaking stale p2m mappings on
>> +                 * failure.
>> +                 */
>> +                vpci_remove_device(pdev);
>> +        }
>>       }
>>   
>>       return false;
>>   }
>>   
>>   static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> -                            struct rangeset *mem, uint16_t cmd)
>> +                            uint16_t cmd)
>>   {
>>       struct map_data data = { .d = d, .map = true };
>> -    int rc;
>> +    struct vpci_header *header = &pdev->vpci->header;
>> +    int rc = 0;
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>>   
>> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>> -        process_pending_softirqs();
>> -    rangeset_destroy(mem);
>> +        if ( !bar->mem )
>> +            continue;
>> +
>> +        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
>> +                                              &data)) == -ERESTART )
>> +            process_pending_softirqs();
>> +        rangeset_destroy(bar->mem);
>> +        bar->mem = NULL;
>> +    }
>>       if ( !rc )
>>           modify_decoding(pdev, cmd, false);
>>   
>> @@ -181,7 +207,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>   }
>>   
>>   static void defer_map(struct domain *d, struct pci_dev *pdev,
>> -                      struct rangeset *mem, uint16_t cmd, bool rom_only)
>> +                      uint16_t cmd, bool rom_only, uint8_t num_mem_ranges)
> Like mentioned below, I don't think you need to pass the number of
> BARs that need mapping changes. Iff that's strictly needed, it should
> be an unsigned int.
bool map_pending :1 works great
>
>>   {
>>       struct vcpu *curr = current;
>>   
>> @@ -192,9 +218,9 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>        * started for the same device if the domain is not well-behaved.
>>        */
>>       curr->vpci.pdev = pdev;
>> -    curr->vpci.mem = mem;
>>       curr->vpci.cmd = cmd;
>>       curr->vpci.rom_only = rom_only;
>> +    curr->vpci.num_mem_ranges = num_mem_ranges;
>>       /*
>>        * Raise a scheduler softirq in order to prevent the guest from resuming
>>        * execution with pending mapping operations, to trigger the invocation
>> @@ -206,42 +232,47 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>   static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>   {
>>       struct vpci_header *header = &pdev->vpci->header;
>> -    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>>       struct pci_dev *tmp, *dev = NULL;
>>       const struct vpci_msix *msix = pdev->vpci->msix;
>> -    unsigned int i;
>> +    unsigned int i, j;
>>       int rc;
>> -
>> -    if ( !mem )
>> -        return -ENOMEM;
>> +    uint8_t num_mem_ranges;
>>   
>>       /*
>> -     * Create a rangeset that represents the current device BARs memory region
>> +     * Create a rangeset per BAR that represents the current device memory region
>>        * and compare it against all the currently active BAR memory regions. If
>>        * an overlap is found, subtract it from the region to be mapped/unmapped.
>>        *
>> -     * First fill the rangeset with all the BARs of this device or with the ROM
>> +     * First fill the rangesets with all the BARs of this device or with the ROM
>>        * BAR only, depending on whether the guest is toggling the memory decode
>>        * bit of the command register, or the enable bit of the ROM BAR register.
>>        */
>>       for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>       {
>> -        const struct vpci_bar *bar = &header->bars[i];
>> +        struct vpci_bar *bar = &header->bars[i];
>>           unsigned long start = PFN_DOWN(bar->addr);
>>           unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>   
>> +        bar->mem = NULL;
> Why do you need to set mem to NULL here? I think we should instead
> assert that bar->mem == NULL here.
I will put an ASSERT here
>
>> +
>>           if ( !MAPPABLE_BAR(bar) ||
>>                (rom_only ? bar->type != VPCI_BAR_ROM
>>                          : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>>               continue;
>>   
>> -        rc = rangeset_add_range(mem, start, end);
>> +        bar->mem = rangeset_new(NULL, NULL, 0);
>> +        if ( !bar->mem )
>> +        {
>> +            rc = -ENOMEM;
>> +            goto fail;
>> +        }
>> +
>> +        rc = rangeset_add_range(bar->mem, start, end);
>>           if ( rc )
>>           {
>>               printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>>                      start, end, rc);
>> -            rangeset_destroy(mem);
>> -            return rc;
>> +            goto fail;
>>           }
>>       }
>>   
>> @@ -252,14 +283,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>           unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>>                                        vmsix_table_size(pdev->vpci, i) - 1);
>>   
>> -        rc = rangeset_remove_range(mem, start, end);
>> -        if ( rc )
>> +        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>>           {
>> -            printk(XENLOG_G_WARNING
>> -                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> -                   start, end, rc);
>> -            rangeset_destroy(mem);
>> -            return rc;
>> +            const struct vpci_bar *bar = &header->bars[j];
>> +
>> +            if ( !bar->mem )
>> +                continue;
>> +
>> +            rc = rangeset_remove_range(bar->mem, start, end);
>> +            if ( rc )
>> +            {
>> +                printk(XENLOG_G_WARNING
>> +                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> +                       start, end, rc);
>> +                goto fail;
>> +            }
>>           }
>>       }
>>   
>> @@ -291,7 +329,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>               unsigned long start = PFN_DOWN(bar->addr);
>>               unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>   
>> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
>> +            if ( !bar->enabled ||
>> +                 !rangeset_overlaps_range(bar->mem, start, end) ||
>>                    /*
>>                     * If only the ROM enable bit is toggled check against other
>>                     * BARs in the same device for overlaps, but not against the
>> @@ -300,13 +339,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>                    (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>>                   continue;
>>   
>> -            rc = rangeset_remove_range(mem, start, end);
>> +            rc = rangeset_remove_range(bar->mem, start, end);
>>               if ( rc )
>>               {
>>                   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>                          start, end, rc);
>> -                rangeset_destroy(mem);
>> -                return rc;
>> +                goto fail;
>>               }
>>           }
>>       }
>> @@ -324,12 +362,42 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>            * will always be to establish mappings and process all the BARs.
>>            */
>>           ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
>> -        return apply_map(pdev->domain, pdev, mem, cmd);
>> +        return apply_map(pdev->domain, pdev, cmd);
>>       }
>>   
>> -    defer_map(dev->domain, dev, mem, cmd, rom_only);
>> +    /* Find out how many memory ranges has left after MSI and overlaps. */
>> +    num_mem_ranges = 0;
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
> There's no need to declare this local variable AFAICT, just use
> header->bars[i].mem.
Ok
>   In any case this is likely to go away if you
> follow my recommendation below to just call defer_map unconditionally
> like it's currently done.
Please see below
>> +
>> +        if ( !rangeset_is_empty(bar->mem) )
>> +            num_mem_ranges++;
>> +    }
>> +
>> +    /*
>> +     * There are cases when PCI device, root port for example, has neither
>> +     * memory space nor IO. In this case PCI command register write is
>> +     * missed resulting in the underlying PCI device not functional, so:
>> +     *   - if there are no regions write the command register now
>> +     *   - if there are regions then defer work and write later on
>> +     */
>> +    if ( !num_mem_ranges )
>> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> I think this is wrong, as not calling defer_map will prevent the
> rangesets (bar[i]->mem) from being destroyed, so we are effectively
> leaking memory.
Not really: in the num_mem_ranges == 0 case there are no rangesets
to free, as none were allocated.
>
> You need to take a path similar to the failure one in case there are
> no mappings pending, or even better just call defer_map anyway and let
> it do its thing, it should be capable of handling empty rangesets
> just fine. That's how it's currently done.
So I think it is still valid to break early and not go through defer_map.
>
>> +    else
>> +        defer_map(dev->domain, dev, cmd, rom_only, num_mem_ranges);
>>   
>>       return 0;
>> +
>> +fail:
> We usually ask labels to be indented with one space.
Sure. I am a bit confused, though: the coding style says nothing about
that, and the sources use labels both with and without the space.
>
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>> +
>> +        rangeset_destroy(bar->mem);
>> +        bar->mem = NULL;
>> +    }
>> +    return rc;
>>   }
>>   
>>   static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index a0320b22cb36..352e02d0106d 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -80,6 +80,7 @@ struct vpci {
>>               /* Guest view of the BAR. */
>>               uint64_t guest_addr;
>>               uint64_t size;
>> +            struct rangeset *mem;
>>               enum {
>>                   VPCI_BAR_EMPTY,
>>                   VPCI_BAR_IO,
>> @@ -154,9 +155,9 @@ struct vpci {
>>   
>>   struct vpci_vcpu {
>>       /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
>> -    struct rangeset *mem;
>>       struct pci_dev *pdev;
>>       uint16_t cmd;
>> +    uint8_t num_mem_ranges;
> AFAICT This could be a simple bool:
>
> bool map_pending : 1;
>
> As there's no strict need to know how many BARs have pending mappings.
This is true
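
To make the map_pending idea concrete, here is a minimal standalone sketch.
It is only an illustration under my own assumptions: vpci_vcpu_sketch,
defer_map_sketch and take_pending are hypothetical names, and the real
vpci_process_pending() has a different return convention (it reports
whether more work remains).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_dev; /* opaque here, only a pointer is stored */

/* Hypothetical per-vCPU state: a flag instead of a count of ranges. */
struct vpci_vcpu_sketch {
    struct pci_dev *pdev;
    uint16_t cmd;
    bool rom_only    : 1;
    bool map_pending : 1;
};

/* Record the deferred operation and mark it as pending. */
static void defer_map_sketch(struct vpci_vcpu_sketch *v, struct pci_dev *pdev,
                             uint16_t cmd, bool rom_only)
{
    v->pdev = pdev;
    v->cmd = cmd;
    v->rom_only = rom_only;
    v->map_pending = true;
}

/*
 * Return true if there was pending work and clear the flag. The real
 * code would walk the per-BAR rangesets before clearing it.
 */
static bool take_pending(struct vpci_vcpu_sketch *v)
{
    if ( !v->map_pending )
        return false;

    v->map_pending = false;
    return true;
}
```

The flag only answers "is there anything to do?". Which BARs actually
need work is still found by walking the BARs themselves.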
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 07/11] vpci/header: program p2m with guest BAR view
  2021-10-26 10:35   ` Roger Pau Monné
@ 2021-11-02 10:43     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 10:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 26.10.21 13:35, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:19AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Take into account guest's BAR view and program its p2m accordingly:
>> gfn is guest's view of the BAR and mfn is the physical BAR value as set
>> up by the host bridge in the hardware domain.
>> This way hardware domain sees physical BAR values and guest sees
>> emulated ones.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Since v2:
>> - improve readability for data.start_gfn and restructure ?: construct
>> Since v1:
>>   - s/MSI/MSI-X in comments
>> ---
>>   xen/drivers/vpci/header.c | 34 ++++++++++++++++++++++++++++++++--
>>   1 file changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 9c603d26d302..f23c956cde6c 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -30,6 +30,10 @@
>>   
>>   struct map_data {
>>       struct domain *d;
>> +    /* Start address of the BAR as seen by the guest. */
>> +    gfn_t start_gfn;
>> +    /* Physical start address of the BAR. */
>> +    mfn_t start_mfn;
>>       bool map;
>>   };
>>   
>> @@ -37,12 +41,28 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>                        unsigned long *c)
>>   {
>>       const struct map_data *map = data;
>> +    gfn_t start_gfn;
>>       int rc;
>>   
>>       for ( ; ; )
>>       {
>>           unsigned long size = e - s + 1;
>>   
>> +        /*
>> +         * Any BAR may have holes in its memory we want to map, e.g.
>> +         * we don't want to map MSI-X regions which may be a part of that BAR,
>> +         * e.g. when a single BAR is used for both MMIO and MSI-X.
> IMO there are too many 'e.g.' here.
>
>> +         * In this case MSI-X regions are subtracted from the mapping, but
>> +         * map->start_gfn still points to the very beginning of the BAR.
>> +         * So if there is a hole present then we need to adjust start_gfn
>> +         * to reflect the fact of that substraction.
>> +         */
> I would simplify the comment a bit:
>
> /*
>   * Ranges to be mapped don't always start at the BAR start address, as
>   * there can be holes or partially consumed ranges. Account for the
>   * offset of the current address from the BAR start.
>   */
>
> Apart from MSI-X related holes on x86 at least we support preemption
> here, which means a range could be partially mapped before yielding.
Thank you, will use your comment which is shorter and still clear
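
For reference, the offset calculation that comment describes can be
sketched standalone as below. This is an illustration only: frame_t and
range_start_gfn are hypothetical names, and plain integers stand in for
Xen's typesafe gfn_t/mfn_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a frame number (guest or machine). */
typedef uint64_t frame_t;

/*
 * s is the first machine frame of the range being mapped. It may lie
 * past the BAR start because of a hole (e.g. an MSI-X table) or because
 * an earlier part of the range was already consumed before a preemption.
 * The guest frame is the guest BAR start plus that same offset.
 */
static frame_t range_start_gfn(frame_t bar_start_gfn, frame_t bar_start_mfn,
                               frame_t s)
{
    assert(s >= bar_start_mfn);

    return bar_start_gfn + (s - bar_start_mfn);
}
```

Both BAR start values have to stay fixed for the whole operation for
this to work; only s advances as ranges get consumed.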
>> +        start_gfn = gfn_add(map->start_gfn, s - mfn_x(map->start_mfn));
>> +
>> +        printk(XENLOG_G_DEBUG
>> +               "%smap [%lx, %lx] -> %#"PRI_gfn" for d%d\n",
>> +               map->map ? "" : "un", s, e, gfn_x(start_gfn),
>> +               map->d->domain_id);
>>           /*
>>            * ARM TODOs:
>>            * - On ARM whether the memory is prefetchable or not should be passed
>> @@ -52,8 +72,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>            * - {un}map_mmio_regions doesn't support preemption.
>>            */
>>   
>> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
>> +        rc = map->map ? map_mmio_regions(map->d, start_gfn,
>> +                                         size, _mfn(s))
>> +                      : unmap_mmio_regions(map->d, start_gfn,
>> +                                           size, _mfn(s));
>>           if ( rc == 0 )
>>           {
>>               *c += size;
>> @@ -69,6 +91,7 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>           ASSERT(rc < size);
>>           *c += rc;
>>           s += rc;
>> +        gfn_add(map->start_gfn, rc);
> I think increasing map->start_gfn is wrong here, as it would get out
> of sync with map->start_mfn then, and the calculations done to obtain
> start_gfn would then be wrong.
Indeed, will remove it
>
>>           if ( general_preempt_check() )
>>                   return -ERESTART;
>>       }
>> @@ -149,6 +172,10 @@ bool vpci_process_pending(struct vcpu *v)
>>               if ( !bar->mem )
>>                   continue;
>>   
>> +            data.start_gfn =
>> +                 _gfn(PFN_DOWN(is_hardware_domain(v->vpci.pdev->domain)
> You can just use v->domain here.
Ok
>
>> +                               ? bar->addr : bar->guest_addr));
> I would place the '?' in the line above, but that's just my taste.
Hmmm, this chunk was discussed before and this is the result of
that discussion ;) So I'd better keep it as is.
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-10-26 10:52   ` Roger Pau Monné
@ 2021-11-02 10:48     ` Oleksandr Andrushchenko
  2021-11-02 11:19     ` Jan Beulich
  1 sibling, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 10:48 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Michal Orzel, Oleksandr Andrushchenko

Hi, Roger!

On 26.10.21 13:52, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add basic emulation support for guests. At the moment only emulate
>> PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left
>> as TODO.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>> ---
>> New in v2
>> ---
>>   xen/drivers/vpci/header.c | 35 ++++++++++++++++++++++++++++++++---
>>   1 file changed, 32 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index f23c956cde6c..754aeb5a584f 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>   }
>>   
>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t cmd, void *data)
>> +{
>> +    /* TODO: Add proper emulation for all bits of the command register. */
>> +
>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>> +    {
>> +        /*
>> +         * Guest wants to enable INTx. It can't be enabled if:
>> +         *  - host has INTx disabled
>> +         *  - MSI/MSI-X enabled
>> +         */
>> +        if ( pdev->vpci->msi->enabled )
>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        else
>> +        {
>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>> +
>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        }
> This last part should be Arm specific. On other architectures we
> likely want the guest to modify INTx disable in order to select the
> interrupt delivery mode for the device.
This is not arch specific, as we just do not allow INTx to be enabled
if MSI/MSI-X has been enabled before. This was discussed previously
(Jan) and was pointed out as an acceptable approach to prevent the
guest from ending up with an inconsistent configuration.
>
> I really wonder if we should allow the guest to play with any other
> bit apart from INTx disable and memory and IO decoding on the command
> register.
This needs to be implemented one day when we understand what
this emulation should look like. This is why I have a "TODO" above.
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 09/11] vpci/header: Reset the command register when adding devices
  2021-10-26 11:00   ` Roger Pau Monné
@ 2021-11-02 11:11     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 11:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, Bertrand Marquis,
	Rahul Singh, Michal Orzel, Oleksandr Andrushchenko

Hi, Roger!

On 26.10.21 14:00, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:21AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Reset the command register when passing through a PCI device:
>> it is possible that when passing through a PCI device its memory
>> decoding bits in the command register are already set. Thus, a
>> guest OS may not write to the command register to update memory
>> decoding, so guest mappings (guest's view of the BARs) are
>> left not updated.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Reviewed-by: Michal Orzel <michal.orzel@arm.com>
>> ---
>> Since v1:
>>   - do not write 0 to the command register, but respect host settings.
>> ---
>>   xen/drivers/vpci/header.c | 17 +++++++++++++----
>>   1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 754aeb5a584f..70d911b147e1 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -451,8 +451,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>   }
>>   
>> -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> -                            uint32_t cmd, void *data)
>> +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
>>   {
>>       /* TODO: Add proper emulation for all bits of the command register. */
>>   
>> @@ -467,14 +466,20 @@ static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>               cmd |= PCI_COMMAND_INTX_DISABLE;
>>           else
>>           {
>> -            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> Either we keep reg here or we drop the parameter altogether from the
> function prototype. Having one caller pass 0 while the other passing
> PCI_COMMAND is confusing. The more that the parameter is now
> effectively unused.
This is probably because git diff isn't really helpful here in showing the change:
static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
{
     /* TODO: Add proper emulation for all bits of the command register. */

     if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
     {
         /*
          * Guest wants to enable INTx. It can't be enabled if:
          *  - host has INTx disabled
          *  - MSI/MSI-X enabled
          */
         if ( pdev->vpci->msi->enabled )
             cmd |= PCI_COMMAND_INTX_DISABLE;
         else
         {
             uint16_t current_cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

             if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
                 cmd |= PCI_COMMAND_INTX_DISABLE;
         }
     }

     return cmd;
}

So reg is not used here, and cmd is the desired value of the PCI_COMMAND
register. I see no confusion here.
>>   
>>               if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>                   cmd |= PCI_COMMAND_INTX_DISABLE;
>>           }
>>       }
>>   
>> -    cmd_write(pdev, reg, cmd, data);
>> +    return cmd;
>> +}
>> +
>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t cmd, void *data)
>> +{
>> +    cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data);
>>   }
>>   
>>   static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>> @@ -793,6 +798,10 @@ int vpci_bar_add_handlers(const struct domain *d, const struct pci_dev *pdev)
>>           gdprintk(XENLOG_ERR,
>>                    "%pp: failed to add BAR handlers for dom%pd: %d\n",
>>                    &pdev->sbdf, d, rc);
>> +
>> +    /* Reset the command register with respect to host settings. */
>> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0));
> I think we likely want to unset the memory and IO decoding bits from
> the command register, as the guest view of the BAR address is
> currently forced to 0, and not mapped into the guest p2m.
By passing 0 here as the desired value of the PCI_COMMAND register
we do that. The emulation code will take care of that.
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR
  2021-10-26  9:40     ` Roger Pau Monné
@ 2021-11-02 11:13       ` Jan Beulich
  0 siblings, 0 replies; 98+ messages in thread
From: Jan Beulich @ 2021-11-02 11:13 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

On 26.10.2021 11:40, Roger Pau Monné wrote:
> On Mon, Oct 25, 2021 at 11:51:57AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>> Could you please take a look at the below?
>> Jan was questioning the per BAR range set approach, so it
>> is crucial for the maintainer (you) to answer here.
> 
> I'm open to suggestions to using something different than a rangeset
> per BAR, but lacking any concrete proposal I think using rangesets is
> fine.

The main reason for my objection is that for the average BAR the
rangeset will hold exactly one range. That's not an efficient way
to express a single range.

> One possible way might be to extend rangesets so that private data
> could be stored for each rangeset range, but that would then make
> merging operations impossible, likewise splitting ranges would be
> troublesome.

Indeed, so I don't view this as an option.

Jan




* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-10-26 10:52   ` Roger Pau Monné
  2021-11-02 10:48     ` Oleksandr Andrushchenko
@ 2021-11-02 11:19     ` Jan Beulich
  2021-11-02 11:50       ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-02 11:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Michal Orzel, Oleksandr Andrushchenko

On 26.10.2021 12:52, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>          pci_conf_write16(pdev->sbdf, reg, cmd);
>>  }
>>  
>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t cmd, void *data)
>> +{
>> +    /* TODO: Add proper emulation for all bits of the command register. */
>> +
>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>> +    {
>> +        /*
>> +         * Guest wants to enable INTx. It can't be enabled if:
>> +         *  - host has INTx disabled
>> +         *  - MSI/MSI-X enabled
>> +         */
>> +        if ( pdev->vpci->msi->enabled )
>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        else
>> +        {
>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>> +
>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        }
> 
> This last part should be Arm specific. On other architectures we
> likely want the guest to modify INTx disable in order to select the
> interrupt delivery mode for the device.

We cannot allow a guest to clear the bit when it has MSI / MSI-X
enabled - only one of the three is supposed to be active at a time.
(IOW similarly we cannot allow a guest to enable MSI / MSI-X when
the bit is clear.)

Jan




* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-02 11:19     ` Jan Beulich
@ 2021-11-02 11:50       ` Roger Pau Monné
  2021-11-02 13:54         ` Jan Beulich
  2021-11-02 14:17         ` Julien Grall
  0 siblings, 2 replies; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-02 11:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Michal Orzel, Oleksandr Andrushchenko

On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
> On 26.10.2021 12:52, Roger Pau Monné wrote:
> > On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
> >> --- a/xen/drivers/vpci/header.c
> >> +++ b/xen/drivers/vpci/header.c
> >> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >>          pci_conf_write16(pdev->sbdf, reg, cmd);
> >>  }
> >>  
> >> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >> +                            uint32_t cmd, void *data)
> >> +{
> >> +    /* TODO: Add proper emulation for all bits of the command register. */
> >> +
> >> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
> >> +    {
> >> +        /*
> >> +         * Guest wants to enable INTx. It can't be enabled if:
> >> +         *  - host has INTx disabled
> >> +         *  - MSI/MSI-X enabled
> >> +         */
> >> +        if ( pdev->vpci->msi->enabled )
> >> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> >> +        else
> >> +        {
> >> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
> >> +
> >> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
> >> +                cmd |= PCI_COMMAND_INTX_DISABLE;
> >> +        }
> > 
> > This last part should be Arm specific. On other architectures we
> > likely want the guest to modify INTx disable in order to select the
> > interrupt delivery mode for the device.
> 
> We cannot allow a guest to clear the bit when it has MSI / MSI-X
> enabled - only one of the three is supposed to be active at a time.
> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
> the bit is clear.)

Sure, but this code is making the bit sticky, by not allowing
INTX_DISABLE to be cleared once set. We do not want that behavior on
x86, as a guest can decide to use MSI or INTx. The else branch needs
to be Arm only.
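
To illustrate the difference, here is a minimal standalone sketch of the
two behaviours. The function names are hypothetical, and hw_cmd and
msi_enabled stand in for the pci_conf_read16() result and
pdev->vpci->msi->enabled in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* bit 10 of the command register */

/*
 * Behaviour as posted: when the guest tries to clear INTX_DISABLE, the
 * bit is forced back on if MSI is enabled OR if the hardware register
 * currently has it set. The second condition is what makes it sticky.
 */
static uint16_t emulate_cmd_sticky(uint16_t guest_cmd, uint16_t hw_cmd,
                                   bool msi_enabled)
{
    if ( !(guest_cmd & PCI_COMMAND_INTX_DISABLE) &&
         (msi_enabled || (hw_cmd & PCI_COMMAND_INTX_DISABLE)) )
        guest_cmd |= PCI_COMMAND_INTX_DISABLE;

    return guest_cmd;
}

/* Without the else branch: only an enabled MSI keeps INTx disabled. */
static uint16_t emulate_cmd_msi_only(uint16_t guest_cmd, bool msi_enabled)
{
    if ( msi_enabled )
        guest_cmd |= PCI_COMMAND_INTX_DISABLE;

    return guest_cmd;
}
```

With the first variant a guest write of 0 while the hardware value has
the bit set returns the bit set again, so the guest can never go back to
INTx. With the second one the same write goes through as long as MSI is
not enabled.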

Regards, Roger.



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-02 11:50       ` Roger Pau Monné
@ 2021-11-02 13:54         ` Jan Beulich
  2021-11-02 14:10           ` Oleksandr Andrushchenko
  2021-11-02 14:17         ` Julien Grall
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-02 13:54 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Michal Orzel, Oleksandr Andrushchenko

On 02.11.2021 12:50, Roger Pau Monné wrote:
> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>          pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>  }
>>>>  
>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>> +                            uint32_t cmd, void *data)
>>>> +{
>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>> +
>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>> +    {
>>>> +        /*
>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>> +         *  - host has INTx disabled
>>>> +         *  - MSI/MSI-X enabled
>>>> +         */
>>>> +        if ( pdev->vpci->msi->enabled )
>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> +        else
>>>> +        {
>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>> +
>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> +        }
>>>
>>> This last part should be Arm specific. On other architectures we
>>> likely want the guest to modify INTx disable in order to select the
>>> interrupt delivery mode for the device.
>>
>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>> enabled - only one of the three is supposed to be active at a time.
>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>> the bit is clear.)
> 
> Sure, but this code is making the bit sticky, by not allowing
> INTX_DISABLE to be cleared once set. We do not want that behavior on
> x86, as a guest can decide to use MSI or INTx. The else branch needs
> to be Arm only.

Isn't the "else" part questionable even on Arm?

Jan




* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-02 13:54         ` Jan Beulich
@ 2021-11-02 14:10           ` Oleksandr Andrushchenko
  2021-11-03  8:53             ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-02 14:10 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Oleksandr Andrushchenko



On 02.11.21 15:54, Jan Beulich wrote:
> On 02.11.2021 12:50, Roger Pau Monné wrote:
>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>   }
>>>>>   
>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t cmd, void *data)
>>>>> +{
>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>> +
>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>> +    {
>>>>> +        /*
>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>> +         *  - host has INTx disabled
>>>>> +         *  - MSI/MSI-X enabled
>>>>> +         */
>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>> +        else
>>>>> +        {
>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>> +
>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>> +        }
>>>> This last part should be Arm specific. On other architectures we
>>>> likely want the guest to modify INTx disable in order to select the
>>>> interrupt delivery mode for the device.
>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>> enabled - only one of the three is supposed to be active at a time.
>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>> the bit is clear.)
>> Sure, but this code is making the bit sticky, by not allowing
>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>> to be Arm only.
> Isn't the "else" part questionable even on Arm?
It is. Once fixed I can't see anything Arm specific here
>
> Jan
>


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-02 11:50       ` Roger Pau Monné
  2021-11-02 13:54         ` Jan Beulich
@ 2021-11-02 14:17         ` Julien Grall
  1 sibling, 0 replies; 98+ messages in thread
From: Julien Grall @ 2021-11-02 14:17 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Michal Orzel, Oleksandr Andrushchenko

Hi Roger,

On 02/11/2021 11:50, Roger Pau Monné wrote:
> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>   }
>>>>   
>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>> +                            uint32_t cmd, void *data)
>>>> +{
>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>> +
>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>> +    {
>>>> +        /*
>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>> +         *  - host has INTx disabled
>>>> +         *  - MSI/MSI-X enabled
>>>> +         */
>>>> +        if ( pdev->vpci->msi->enabled )
>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> +        else
>>>> +        {
>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>> +
>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> +        }
>>>
>>> This last part should be Arm specific. On other architectures we
>>> likely want the guest to modify INTx disable in order to select the
>>> interrupt delivery mode for the device.
>>
>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>> enabled - only one of the three is supposed to be active at a time.
>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>> the bit is clear.)
> 
> Sure, but this code is making the bit sticky, by not allowing
> INTX_DISABLE to be cleared once set. We do not want that behavior on
> x86, as a guest can decide to use MSI or INTx.

On Arm, I am aware of some hostbridges (e.g. Thunder-X) where legacy
interrupts are not supported. Do such hostbridges exist on x86?

Cheers,

-- 
Julien Grall



* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-10-26 11:33   ` Roger Pau Monné
@ 2021-11-03  6:34     ` Oleksandr Andrushchenko
  2021-11-03  8:41       ` Jan Beulich
  2021-11-03  8:52       ` Roger Pau Monné
  0 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  6:34 UTC (permalink / raw)
  To: Roger Pau Monné, jbeulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger

On 26.10.21 14:33, Roger Pau Monné wrote:
> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Since v2:
>>   - remove casts that are (a) malformed and (b) unnecessary
>>   - add new line for better readability
>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>      functions are now completely gated with this config
>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>> New in v2
>> ---
>>   xen/common/domain.c           |  3 ++
>>   xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>   xen/drivers/vpci/vpci.c       | 14 +++++++-
>>   xen/include/xen/pci.h         | 22 +++++++++++++
>>   xen/include/xen/sched.h       |  8 +++++
>>   5 files changed, 106 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 40d67ec34232..e0170087612d 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       INIT_LIST_HEAD(&d->pdev_list);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    INIT_LIST_HEAD(&d->vdev_list);
>> +#endif
>>   #endif
>>   
>>       /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 805ab86ed555..5b963d75d1ba 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>       return ret;
>>   }
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> +                                                const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>> +        if ( vdev->pdev == pdev )
>> +            return vdev;
>> +    return NULL;
>> +}
>> +
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>> +
>> +    /* Each PCI bus supports 32 devices/slots at max. */
>> +    if ( d->vpci_dev_next > 31 )
>> +        return -ENOSPC;
>> +
>> +    vdev = xzalloc(struct vpci_dev);
>> +    if ( !vdev )
>> +        return -ENOMEM;
>> +
>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>> +    vdev->seg = 0;
>> +
>> +    /*
>> +     * The bus number is set to 0, so virtual devices are seen
>> +     * as embedded endpoints behind the root complex.
>> +     */
>> +    vdev->bus = 0;
>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
> This would likely be better as a bitmap where you set the bits of
> in-use slots. Then you can use find_first_bit or similar to get a free
> slot.
>
> Long term you might want to allow the caller to provide a pre-selected
> slot, as it's possible for users to request the device to appear at a
> specific slot on the emulated bus.
>
>> +
>> +    vdev->pdev = pdev;
>> +    vdev->domain = d;
>> +
>> +    pcidevs_lock();
>> +    list_add_tail(&vdev->list, &d->vdev_list);
>> +    pcidevs_unlock();
>> +
>> +    return 0;
>> +}
>> +
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> +    struct vpci_dev *vdev;
>> +
>> +    pcidevs_lock();
>> +    vdev = pci_find_virtual_device(d, pdev);
>> +    if ( vdev )
>> +        list_del(&vdev->list);
>> +    pcidevs_unlock();
>> +    xfree(vdev);
>> +    return 0;
>> +}
>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> +
>>   /* Caller should hold the pcidevs_lock */
>>   static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>                              uint8_t devfn)
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 702f7b5d5dda..d787f13e679e 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to assign for hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> -    return vpci_bar_add_handlers(d, dev);
>> +    rc = vpci_bar_add_handlers(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    return pci_add_virtual_device(d, dev);
>>   }
>>   
>>   /* Notify vPCI that device is de-assigned from guest. */
>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>   {
>> +    int rc;
>> +
>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>       if ( is_system_domain(d) || !has_vpci(d) )
>>           return 0;
>>   
>> +    rc = pci_remove_virtual_device(d, dev);
>> +    if ( rc )
>> +        return rc;
>> +
>>       return vpci_bar_remove_handlers(d, dev);
>>   }
>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>> index 43b8a0817076..33033a3a8f8d 100644
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>>       struct vpci *vpci;
>>   };
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> +    struct list_head list;
>> +    /* Physical PCI device this virtual device is connected to. */
>> +    const struct pci_dev *pdev;
>> +    /* Virtual SBDF of the device. */
>> +    union {
>> +        struct {
>> +            uint8_t devfn;
>> +            uint8_t bus;
>> +            uint16_t seg;
>> +        };
>> +        pci_sbdf_t sbdf;
>> +    };
>> +    struct domain *domain;
>> +};
>> +#endif
> I wonder whether this is strictly needed. Won't it be enough to store
> the virtual (ie: guest) sbdf inside the existing vpci struct?
>
> It would avoid the overhead of the translation you do from pdev ->
> vdev, and there doesn't seem to be anything relevant stored in
> vpci_dev apart from the virtual sbdf.
TL;DR: It seems it might be needed from a performance POV. If it is not
implemented, then for every MMIO trap we take the global PCI lock, i.e.
pcidevs_{lock|unlock}. Note: the pcidevs lock is a recursive lock.

There are 2 sources of access to virtual devices:
1. During initialization when we add, assign or de-assign a PCI device
2. At run-time when we trap configuration space access and need to
translate virtual SBDF into physical SBDF

Note that at least de-assign can run concurrently with MMIO handlers.

Now let's see which locks are in use while doing that.

1. No struct vpci_dev is used.
1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
1.2. To protect virtual devices we use pcidevs_{lock|unlock}
1.3. Locking happens on system level

2. struct vpci_dev is used
2.1. We have a per-domain lock vdev_lock
2.2. Locking happens on per domain level

To compare the two:

1. Without vpci_dev
pros: much simpler code
pros/cons: global lock is used during MMIO handling, but it is a recursive lock

2. With vpci_dev
pros: per-domain locking
cons: more code

I have implemented both methods, and we need to decide
which route to take.
> Thanks, Roger.
Thank you,
Oleksandr
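
For reference, the bitmap-based slot allocation suggested in the review
quoted above could look roughly like the standalone sketch below. All names
here are hypothetical, and the plain loop merely stands in for whatever
bitmap helpers the hypervisor would use; it also shows how a caller-selected
slot could be claimed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical name: number of device slots on a single PCI bus. */
#define VPCI_MAX_VIRT_DEV 32

/* Hypothetical per-domain state: bit N set means slot N is in use. */
struct vslot_map {
    uint32_t in_use;
};

/* Return the lowest free slot and mark it used, or -ENOSPC if full. */
int vslot_alloc(struct vslot_map *m)
{
    unsigned int slot;

    for ( slot = 0; slot < VPCI_MAX_VIRT_DEV; slot++ )
    {
        if ( !(m->in_use & (UINT32_C(1) << slot)) )
        {
            m->in_use |= UINT32_C(1) << slot;
            return (int)slot;
        }
    }

    return -ENOSPC;
}

/* Claim a caller-selected slot: -EINVAL if out of range, -EBUSY if taken. */
int vslot_claim(struct vslot_map *m, unsigned int slot)
{
    if ( slot >= VPCI_MAX_VIRT_DEV )
        return -EINVAL;
    if ( m->in_use & (UINT32_C(1) << slot) )
        return -EBUSY;

    m->in_use |= UINT32_C(1) << slot;
    return 0;
}

/* Release a slot on de-assign so that it can be handed out again. */
void vslot_free(struct vslot_map *m, unsigned int slot)
{
    assert(slot < VPCI_MAX_VIRT_DEV);
    m->in_use &= ~(UINT32_C(1) << slot);
}
```

Unlike a monotonically increasing counter, this lets a slot be reused after
the device occupying it has been de-assigned.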


* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-11-03  6:34     ` Oleksandr Andrushchenko
@ 2021-11-03  8:41       ` Jan Beulich
  2021-11-03  8:57         ` Oleksandr Andrushchenko
  2021-11-03  8:52       ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-03  8:41 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Roger Pau Monné

On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
> Hi, Roger
> 
> On 26.10.21 14:33, Roger Pau Monné wrote:
>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>> root complex itself (embedded endpoints).
>>> This implementation is limited to 32 devices which are allowed on
>>> a single PCI bus.
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> ---
>>> Since v2:
>>>   - remove casts that are (a) malformed and (b) unnecessary
>>>   - add new line for better readability
>>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>      functions are now completely gated with this config
>>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> New in v2
>>> ---
>>>   xen/common/domain.c           |  3 ++
>>>   xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>   xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>   xen/include/xen/pci.h         | 22 +++++++++++++
>>>   xen/include/xen/sched.h       |  8 +++++
>>>   5 files changed, 106 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 40d67ec34232..e0170087612d 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>   
>>>   #ifdef CONFIG_HAS_PCI
>>>       INIT_LIST_HEAD(&d->pdev_list);
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>> +#endif
>>>   #endif
>>>   
>>>       /* All error paths can depend on the above setup. */
>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>> index 805ab86ed555..5b963d75d1ba 100644
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>       return ret;
>>>   }
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>> +                                                const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>> +        if ( vdev->pdev == pdev )
>>> +            return vdev;
>>> +    return NULL;
>>> +}
>>> +
>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>> +
>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>> +    if ( d->vpci_dev_next > 31 )
>>> +        return -ENOSPC;
>>> +
>>> +    vdev = xzalloc(struct vpci_dev);
>>> +    if ( !vdev )
>>> +        return -ENOMEM;
>>> +
>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>> +    vdev->seg = 0;
>>> +
>>> +    /*
>>> +     * The bus number is set to 0, so virtual devices are seen
>>> +     * as embedded endpoints behind the root complex.
>>> +     */
>>> +    vdev->bus = 0;
>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> This would likely be better as a bitmap where you set the bits of
>> in-use slots. Then you can use find_first_bit or similar to get a free
>> slot.
>>
>> Long term you might want to allow the caller to provide a pre-selected
>> slot, as it's possible for users to request the device to appear at a
>> specific slot on the emulated bus.
>>
>>> +
>>> +    vdev->pdev = pdev;
>>> +    vdev->domain = d;
>>> +
>>> +    pcidevs_lock();
>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>> +    pcidevs_unlock();
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>> +{
>>> +    struct vpci_dev *vdev;
>>> +
>>> +    pcidevs_lock();
>>> +    vdev = pci_find_virtual_device(d, pdev);
>>> +    if ( vdev )
>>> +        list_del(&vdev->list);
>>> +    pcidevs_unlock();
>>> +    xfree(vdev);
>>> +    return 0;
>>> +}
>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> +
>>>   /* Caller should hold the pcidevs_lock */
>>>   static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>                              uint8_t devfn)
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 702f7b5d5dda..d787f13e679e 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>   /* Notify vPCI that device is assigned to guest. */
>>>   int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to assign for hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> -    return vpci_bar_add_handlers(d, dev);
>>> +    rc = vpci_bar_add_handlers(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    return pci_add_virtual_device(d, dev);
>>>   }
>>>   
>>>   /* Notify vPCI that device is de-assigned from guest. */
>>>   int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>   {
>>> +    int rc;
>>> +
>>>       /* It only makes sense to de-assign from hwdom or guest domain. */
>>>       if ( is_system_domain(d) || !has_vpci(d) )
>>>           return 0;
>>>   
>>> +    rc = pci_remove_virtual_device(d, dev);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>>       return vpci_bar_remove_handlers(d, dev);
>>>   }
>>>   #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>> index 43b8a0817076..33033a3a8f8d 100644
>>> --- a/xen/include/xen/pci.h
>>> +++ b/xen/include/xen/pci.h
>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>       struct vpci *vpci;
>>>   };
>>>   
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +struct vpci_dev {
>>> +    struct list_head list;
>>> +    /* Physical PCI device this virtual device is connected to. */
>>> +    const struct pci_dev *pdev;
>>> +    /* Virtual SBDF of the device. */
>>> +    union {
>>> +        struct {
>>> +            uint8_t devfn;
>>> +            uint8_t bus;
>>> +            uint16_t seg;
>>> +        };
>>> +        pci_sbdf_t sbdf;
>>> +    };
>>> +    struct domain *domain;
>>> +};
>>> +#endif
>> I wonder whether this is strictly needed. Won't it be enough to store
>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>
>> It would avoid the overhead of the translation you do from pdev ->
>> vdev, and there doesn't seem to be anything relevant stored in
>> vpci_dev apart from the virtual sbdf.
> TL;DR It seems it might be needed from performance POV. If not implemented
> for every MMIO trap we use a global PCI lock, e.g. pcidevs_{lock|unlock}.
> Note: pcidevs' lock is a recursive lock
> 
> There are 2 sources of access to virtual devices:
> 1. During initialization when we add, assign or de-assign a PCI device
> 2. At run-time when we trap configuration space access and need to
> translate virtual SBDF into physical SBDF
> 3. At least de-assign can run concurrently with MMIO handlers
> 
> Now let's see which locks are in use while doing that.
> 
> 1. No struct vpci_dev is used.
> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
> 1.3. Locking happens on system level
> 
> 2. struct vpci_dev is used
> 2.1. We have a per-domain lock vdev_lock
> 2.2. Locking happens on per domain level
> 
> To compare the two:
> 
> 1. Without vpci_dev
> pros: much simpler code
> pros/cons: global lock is used during MMIO handling, but it is a recursive lock

Could you point out to me in which way the recursive nature of the lock
is relevant here? Afaict that aspect is of no interest when considering
the performance effects of using a global lock vs one with more narrow
scope.

Jan




* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-11-03  6:34     ` Oleksandr Andrushchenko
  2021-11-03  8:41       ` Jan Beulich
@ 2021-11-03  8:52       ` Roger Pau Monné
  2021-11-03  8:59         ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-03  8:52 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: jbeulich, xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh

On Wed, Nov 03, 2021 at 06:34:16AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger
> 
> On 26.10.21 14:33, Roger Pau Monné wrote:
> > On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
> >> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> >> index 43b8a0817076..33033a3a8f8d 100644
> >> --- a/xen/include/xen/pci.h
> >> +++ b/xen/include/xen/pci.h
> >> @@ -137,6 +137,24 @@ struct pci_dev {
> >>       struct vpci *vpci;
> >>   };
> >>   
> >> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> >> +struct vpci_dev {
> >> +    struct list_head list;
> >> +    /* Physical PCI device this virtual device is connected to. */
> >> +    const struct pci_dev *pdev;
> >> +    /* Virtual SBDF of the device. */
> >> +    union {
> >> +        struct {
> >> +            uint8_t devfn;
> >> +            uint8_t bus;
> >> +            uint16_t seg;
> >> +        };
> >> +        pci_sbdf_t sbdf;
> >> +    };
> >> +    struct domain *domain;
> >> +};
> >> +#endif
> > I wonder whether this is strictly needed. Won't it be enough to store
> > the virtual (ie: guest) sbdf inside the existing vpci struct?
> >
> > It would avoid the overhead of the translation you do from pdev ->
> > vdev, and there doesn't seem to be anything relevant stored in
> > vpci_dev apart from the virtual sbdf.
> TL;DR It seems it might be needed from performance POV. If not implemented
> for every MMIO trap we use a global PCI lock, e.g. pcidevs_{lock|unlock}.
> Note: pcidevs' lock is a recursive lock
> 
> There are 2 sources of access to virtual devices:
> 1. During initialization when we add, assign or de-assign a PCI device
> 2. At run-time when we trap configuration space access and need to
> translate virtual SBDF into physical SBDF
> 3. At least de-assign can run concurrently with MMIO handlers
> 
> Now let's see which locks are in use while doing that.
> 
> 1. No struct vpci_dev is used.
> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
> 1.3. Locking happens on system level
> 
> 2. struct vpci_dev is used
> 2.1. We have a per-domain lock vdev_lock
> 2.2. Locking happens on per domain level
> 
> To compare the two:
> 
> 1. Without vpci_dev
> pros: much simpler code
> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
> 
> 2. With vpci_dev
> pros: per-domain locking
> cons: more code
> 
> I have implemented the two methods and we need to decide
> which route we go.

We could always see about converting the pcidevs lock into a rw one if
it turns out there's too much contention. PCI config space accesses
shouldn't be that common or performance critical, so having some
contention might not be noticeable.

TBH I would start with the simpler solution (add guest_sbdf and use
pci lock) and move to something more complex once issues are
identified.

Regards, Roger.
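
A rough standalone sketch of the simpler option, assuming the guest SBDF is
stored next to the per-device vPCI state. The struct and function names are
simplified stand-ins rather than the real hypervisor types, and the lock is
only mentioned in a comment because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the per-device vPCI state. */
struct vpci_model {
    uint32_t guest_sbdf;    /* virtual SBDF as seen by the guest */
};

/* Simplified stand-in for a physical PCI device. */
struct pdev_model {
    uint32_t sbdf;          /* physical SBDF */
    struct vpci_model vpci;
};

/* Pack segment/bus/device/function: seg[31:16] bus[15:8] dev[7:3] fn[2:0]. */
uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | fn;
}

/*
 * Translate a guest SBDF into the matching physical device, or NULL if the
 * guest has no device at that address. The caller is assumed to hold the
 * lock protecting the device list (the global PCI lock in this option).
 */
const struct pdev_model *translate_guest_sbdf(const struct pdev_model *devs,
                                              size_t n, uint32_t guest_sbdf)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( devs[i].vpci.guest_sbdf == guest_sbdf )
            return &devs[i];

    return NULL;
}

/* Two physical devices exposed to the guest as 0000:00:00.0 and 0000:00:01.0. */
void guest_sbdf_demo(void)
{
    const struct pdev_model devs[2] = {
        { make_sbdf(0, 3, 0, 0), { make_sbdf(0, 0, 0, 0) } },
        { make_sbdf(0, 4, 0, 1), { make_sbdf(0, 0, 1, 0) } },
    };

    assert(translate_guest_sbdf(devs, 2, make_sbdf(0, 0, 1, 0)) == &devs[1]);
    assert(translate_guest_sbdf(devs, 2, make_sbdf(0, 0, 2, 0)) == NULL);
}
```

With this layout there is no separate virtual-device list to keep in sync;
the cost is that every trapped access walks the device list under the one
lock.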



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-02 14:10           ` Oleksandr Andrushchenko
@ 2021-11-03  8:53             ` Oleksandr Andrushchenko
  2021-11-03  9:11               ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  8:53 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel



On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>
> On 02.11.21 15:54, Jan Beulich wrote:
>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>    }
>>>>>>    
>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>> +                            uint32_t cmd, void *data)
>>>>>> +{
>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>> +
>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>> +    {
>>>>>> +        /*
>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>> +         *  - host has INTx disabled
>>>>>> +         *  - MSI/MSI-X enabled
>>>>>> +         */
>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>> +        else
>>>>>> +        {
>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>> +
>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>> +        }
>>>>> This last part should be Arm specific. On other architectures we
>>>>> likely want the guest to modify INTx disable in order to select the
>>>>> interrupt delivery mode for the device.
>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>> enabled - only one of the three is supposed to be active at a time.
>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>> the bit is clear.)
>>> Sure, but this code is making the bit sticky, by not allowing
>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>> to be Arm only.
>> Isn't the "else" part questionable even on Arm?
> It is. Once fixed I can't see anything Arm specific here
Well, I have looked at the code one more time and everything seems to
be ok wrt that sticky bit: we have 2 handlers, cmd_write and
guest_cmd_write. The former is used for the hardware domain and has
*no restrictions* on writing PCI_COMMAND register contents, and the latter
is only used for guests and does have restrictions applied in the
emulate_cmd_reg function.

So, for the hardware domain, there is no "sticky" bit possible, and for the
guest domains, if the physical contents of the PCI_COMMAND register
have the PCI_COMMAND_INTX_DISABLE bit set, then the guest is forced to
keep the PCI_COMMAND_INTX_DISABLE bit set.

So, from the hardware domain's POV, this should not be a problem, but from
the guest's view it can be. Let's imagine that the hardware domain can handle
all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
domain can decide what can be used for the interrupt source (again, no
restriction here) and program PCI_COMMAND accordingly.
Guest domains need to align with this configuration, e.g. if INTx was disabled
by the hardware domain then INTx cannot be enabled for guests: yes, this doesn't
cover dom0less etc. so we do rely on some entity before the guest to set the
PCI_COMMAND correctly.
This is how it is implemented in the patch.
Please also see the discussion we had before [1].

What is not covered now is the case where there is a hardware domain and the same
PCI device is first passed to one of the guests and then assigned to another. In this case:

hwdom (or any other entity) programs PCI_COMMAND
assign domU1
deassign domU1
*assign domIO*
assign domU2

So in this scenario the host assigned value is lost after assigning to domU1
and domU2 will use the value used by domU1.
So, it seems that this is the only use-case not covered by the patch.

Jan [1]:
"In the absence of Dom0 controlling the device, I think we ought to take
Xen's view as the "host" one. Which will want the bit set at least as
long as either MSI or MSI-X is enabled for the device."

So, for the PCI_COMMAND register we might want to store a reference value
so that we can restore it when assigning the PCI device to a guest.
For the current implementation the best I can probably do is to read this value
in init_bars when it is called for the hardware domain:

if ( is_hardware_domain(d) )
    vpci->pci_command_reference = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

And when I want to reset PCI_COMMAND while assigning to a guest I will
use it instead of 0 as it is now.
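
To make the reference-value idea concrete, here is a small standalone
sketch. It is not Xen code: struct fake_dev, hwdom_init_bars(),
assign_to_guest() and guest_write_cmd() are hypothetical names, and the
"register" is just a field in memory. It only models the flow described
above: remember the value while the hardware domain owns the device and
restore it on every assignment to a guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* bit 10 of the PCI_COMMAND register */

/* Hypothetical stand-in for a device: a simulated register plus a saved copy. */
struct fake_dev {
    uint16_t hw_cmd;        /* simulated physical PCI_COMMAND contents */
    uint16_t cmd_reference; /* value seen while the hardware domain owned it */
    bool have_reference;    /* false if no hardware domain ever set it up */
};

/* Models init_bars being called for the hardware domain: save the host value. */
static void hwdom_init_bars(struct fake_dev *dev)
{
    assert(dev);
    dev->cmd_reference = dev->hw_cmd;
    dev->have_reference = true;
}

/* Models assignment to a guest: restore the reference instead of writing 0. */
static void assign_to_guest(struct fake_dev *dev)
{
    assert(dev);
    dev->hw_cmd = dev->have_reference ? dev->cmd_reference : 0;
}

/* Models a guest changing the register while it owns the device. */
static void guest_write_cmd(struct fake_dev *dev, uint16_t cmd)
{
    assert(dev);
    dev->hw_cmd = cmd;
}
```

With this model, the domU1 -> domIO -> domU2 sequence no longer leaks
domU1's value: whatever domU1 wrote via guest_write_cmd() is replaced by
the saved reference when assign_to_guest() runs for domU2.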
>> Jan
>>
[1] https://lore.kernel.org/xen-devel/20210903100831.177748-9-andr2000@gmail.com/

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-11-03  8:41       ` Jan Beulich
@ 2021-11-03  8:57         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  8:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Roger Pau Monné



On 03.11.21 10:41, Jan Beulich wrote:
> On 03.11.2021 07:34, Oleksandr Andrushchenko wrote:
>> Hi, Roger
>>
>> On 26.10.21 14:33, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Assign SBDF to the PCI devices being passed through with bus 0.
>>>> The resulting topology is where PCIe devices reside on the bus 0 of the
>>>> root complex itself (embedded endpoints).
>>>> This implementation is limited to 32 devices which are allowed on
>>>> a single PCI bus.
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> ---
>>>> Since v2:
>>>>    - remove casts that are (a) malformed and (b) unnecessary
>>>>    - add new line for better readability
>>>>    - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>>>       functions are now completely gated with this config
>>>>    - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> New in v2
>>>> ---
>>>>    xen/common/domain.c           |  3 ++
>>>>    xen/drivers/passthrough/pci.c | 60 +++++++++++++++++++++++++++++++++++
>>>>    xen/drivers/vpci/vpci.c       | 14 +++++++-
>>>>    xen/include/xen/pci.h         | 22 +++++++++++++
>>>>    xen/include/xen/sched.h       |  8 +++++
>>>>    5 files changed, 106 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>>> index 40d67ec34232..e0170087612d 100644
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -601,6 +601,9 @@ struct domain *domain_create(domid_t domid,
>>>>    
>>>>    #ifdef CONFIG_HAS_PCI
>>>>        INIT_LIST_HEAD(&d->pdev_list);
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +    INIT_LIST_HEAD(&d->vdev_list);
>>>> +#endif
>>>>    #endif
>>>>    
>>>>        /* All error paths can depend on the above setup. */
>>>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>>>> index 805ab86ed555..5b963d75d1ba 100644
>>>> --- a/xen/drivers/passthrough/pci.c
>>>> +++ b/xen/drivers/passthrough/pci.c
>>>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>>>>        return ret;
>>>>    }
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>>>> +                                                const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    list_for_each_entry ( vdev, &d->vdev_list, list )
>>>> +        if ( vdev->pdev == pdev )
>>>> +            return vdev;
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    ASSERT(!pci_find_virtual_device(d, pdev));
>>>> +
>>>> +    /* Each PCI bus supports 32 devices/slots at max. */
>>>> +    if ( d->vpci_dev_next > 31 )
>>>> +        return -ENOSPC;
>>>> +
>>>> +    vdev = xzalloc(struct vpci_dev);
>>>> +    if ( !vdev )
>>>> +        return -ENOMEM;
>>>> +
>>>> +    /* We emulate a single host bridge for the guest, so segment is always 0. */
>>>> +    vdev->seg = 0;
>>>> +
>>>> +    /*
>>>> +     * The bus number is set to 0, so virtual devices are seen
>>>> +     * as embedded endpoints behind the root complex.
>>>> +     */
>>>> +    vdev->bus = 0;
>>>> +    vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>>> This would likely be better as a bitmap where you set the bits of
>>> in-use slots. Then you can use find_first_bit or similar to get a free
>>> slot.
>>>
>>> Long term you might want to allow the caller to provide a pre-selected
>>> slot, as it's possible for users to request the device to appear at a
>>> specific slot on the emulated bus.
>>>
>>>> +
>>>> +    vdev->pdev = pdev;
>>>> +    vdev->domain = d;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    list_add_tail(&vdev->list, &d->vdev_list);
>>>> +    pcidevs_unlock();
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vpci_dev *vdev;
>>>> +
>>>> +    pcidevs_lock();
>>>> +    vdev = pci_find_virtual_device(d, pdev);
>>>> +    if ( vdev )
>>>> +        list_del(&vdev->list);
>>>> +    pcidevs_unlock();
>>>> +    xfree(vdev);
>>>> +    return 0;
>>>> +}
>>>> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> +
>>>>    /* Caller should hold the pcidevs_lock */
>>>>    static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>>>                               uint8_t devfn)
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 702f7b5d5dda..d787f13e679e 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>>>>    /* Notify vPCI that device is assigned to guest. */
>>>>    int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to assign for hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> -    return vpci_bar_add_handlers(d, dev);
>>>> +    rc = vpci_bar_add_handlers(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>> +    return pci_add_virtual_device(d, dev);
>>>>    }
>>>>    
>>>>    /* Notify vPCI that device is de-assigned from guest. */
>>>>    int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>>>>    {
>>>> +    int rc;
>>>> +
>>>>        /* It only makes sense to de-assign from hwdom or guest domain. */
>>>>        if ( is_system_domain(d) || !has_vpci(d) )
>>>>            return 0;
>>>>    
>>>> +    rc = pci_remove_virtual_device(d, dev);
>>>> +    if ( rc )
>>>> +        return rc;
>>>> +
>>>>        return vpci_bar_remove_handlers(d, dev);
>>>>    }
>>>>    #endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
>>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>>> index 43b8a0817076..33033a3a8f8d 100644
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>>        struct vpci *vpci;
>>>>    };
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +struct vpci_dev {
>>>> +    struct list_head list;
>>>> +    /* Physical PCI device this virtual device is connected to. */
>>>> +    const struct pci_dev *pdev;
>>>> +    /* Virtual SBDF of the device. */
>>>> +    union {
>>>> +        struct {
>>>> +            uint8_t devfn;
>>>> +            uint8_t bus;
>>>> +            uint16_t seg;
>>>> +        };
>>>> +        pci_sbdf_t sbdf;
>>>> +    };
>>>> +    struct domain *domain;
>>>> +};
>>>> +#endif
>>> I wonder whether this is strictly needed. Won't it be enough to store
>>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>>
>>> It would avoid the overhead of the translation you do from pdev ->
>>> vdev, and there doesn't seem to be anything relevant stored in
>>> vpci_dev apart from the virtual sbdf.
>> TL;DR It seems it might be needed from performance POV. If not implemented
>> for every MMIO trap we use a global PCI lock, e.g. pcidevs_{lock|unlock}.
>> Note: pcidevs' lock is a recursive lock
>>
>> There are 2 sources of access to virtual devices:
>> 1. During initialization when we add, assign or de-assign a PCI device
>> 2. At run-time when we trap configuration space access and need to
>> translate virtual SBDF into physical SBDF
>> 3. At least de-assign can run concurrently with MMIO handlers
>>
>> Now let's see which locks are in use while doing that.
>>
>> 1. No struct vpci_dev is used.
>> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
>> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
>> 1.3. Locking happens on system level
>>
>> 2. struct vpci_dev is used
>> 2.1. We have a per-domain lock vdev_lock
>> 2.2. Locking happens on per domain level
>>
>> To compare the two:
>>
>> 1. Without vpci_dev
>> pros: much simpler code
>> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
> Could you point out to me in which way the recursive nature of the lock
> is relevant here? Afaict that aspect is of no interest when considering
> the performance effects of using a global lock vs one with more narrow
> scope.
I just tried to find some excuses to defend the global pcidevs lock,
so even the lock's recursion could be an argument here. Weak.
Besides that, I do agree that this is still a global lock.
>
> Jan
>
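
For reference, the bitmap approach Roger suggests in the quoted review
could look roughly like the sketch below. This is a standalone
illustration, not the Xen implementation: struct vbus and the vbus_*()
helpers are hypothetical names, and a plain uint32_t stands in for
whatever bitmap type the real code would use. Unlike the vpci_dev_next
counter in the quoted patch, which only grows, a bitmap lets a slot be
reused after the device is removed and also allows a caller to request a
specific slot.

```c
#include <assert.h>
#include <stdint.h>

#define VPCI_MAX_SLOTS 32 /* devices allowed on a single PCI bus */

/* Hypothetical per-domain state: bit N set means slot N is in use. */
struct vbus {
    uint32_t used_slots;
};

/* Returns the first free slot and marks it used, or -1 if the bus is full. */
static int vbus_alloc_slot(struct vbus *bus)
{
    for (int slot = 0; slot < VPCI_MAX_SLOTS; slot++) {
        uint32_t mask = UINT32_C(1) << slot;

        if (!(bus->used_slots & mask)) {
            bus->used_slots |= mask;
            return slot;
        }
    }
    return -1;
}

/* Claims a caller-selected slot; returns 0 on success, -1 if invalid or busy. */
static int vbus_claim_slot(struct vbus *bus, int slot)
{
    if (slot < 0 || slot >= VPCI_MAX_SLOTS)
        return -1;
    if (bus->used_slots & (UINT32_C(1) << slot))
        return -1;
    bus->used_slots |= UINT32_C(1) << slot;
    return 0;
}

/* Releases a slot so a later allocation can reuse it. */
static void vbus_free_slot(struct vbus *bus, int slot)
{
    assert(slot >= 0 && slot < VPCI_MAX_SLOTS);
    bus->used_slots &= ~(UINT32_C(1) << slot);
}
```

The returned slot would then feed PCI_DEVFN(slot, 0) in place of the
post-incremented counter.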

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology
  2021-11-03  8:52       ` Roger Pau Monné
@ 2021-11-03  8:59         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  8:59 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: jbeulich, xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh



On 03.11.21 10:52, Roger Pau Monné wrote:
> On Wed, Nov 03, 2021 at 06:34:16AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger
>>
>> On 26.10.21 14:33, Roger Pau Monné wrote:
>>> On Thu, Sep 30, 2021 at 10:52:22AM +0300, Oleksandr Andrushchenko wrote:
>>>> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
>>>> index 43b8a0817076..33033a3a8f8d 100644
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -137,6 +137,24 @@ struct pci_dev {
>>>>        struct vpci *vpci;
>>>>    };
>>>>    
>>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>>> +struct vpci_dev {
>>>> +    struct list_head list;
>>>> +    /* Physical PCI device this virtual device is connected to. */
>>>> +    const struct pci_dev *pdev;
>>>> +    /* Virtual SBDF of the device. */
>>>> +    union {
>>>> +        struct {
>>>> +            uint8_t devfn;
>>>> +            uint8_t bus;
>>>> +            uint16_t seg;
>>>> +        };
>>>> +        pci_sbdf_t sbdf;
>>>> +    };
>>>> +    struct domain *domain;
>>>> +};
>>>> +#endif
>>> I wonder whether this is strictly needed. Won't it be enough to store
>>> the virtual (ie: guest) sbdf inside the existing vpci struct?
>>>
>>> It would avoid the overhead of the translation you do from pdev ->
>>> vdev, and there doesn't seem to be anything relevant stored in
>>> vpci_dev apart from the virtual sbdf.
>> TL;DR It seems it might be needed from performance POV. If not implemented
>> for every MMIO trap we use a global PCI lock, e.g. pcidevs_{lock|unlock}.
>> Note: pcidevs' lock is a recursive lock
>>
>> There are 2 sources of access to virtual devices:
>> 1. During initialization when we add, assign or de-assign a PCI device
>> 2. At run-time when we trap configuration space access and need to
>> translate virtual SBDF into physical SBDF
>> 3. At least de-assign can run concurrently with MMIO handlers
>>
>> Now let's see which locks are in use while doing that.
>>
>> 1. No struct vpci_dev is used.
>> 1.1. We remove the structure and just add pdev->vpci->guest_sbdf as you suggest
>> 1.2. To protect virtual devices we use pcidevs_{lock|unlock}
>> 1.3. Locking happens on system level
>>
>> 2. struct vpci_dev is used
>> 2.1. We have a per-domain lock vdev_lock
>> 2.2. Locking happens on per domain level
>>
>> To compare the two:
>>
>> 1. Without vpci_dev
>> pros: much simpler code
>> pros/cons: global lock is used during MMIO handling, but it is a recursive lock
>>
>> 2. With vpc_dev
>> pros: per-domain locking
>> cons: more code
>>
>> I have implemented the two methods and we need to decide
>> which route we go.
> We could always see about converting the pcidevs lock into a rw one if
> it turns out there's too much contention. PCI config space accesses
> shouldn't be that common or performance critical, so having some
> contention might not be noticeable.
>
> TBH I would start with the simpler solution (add guest_sbdf and use
> pci lock) and move to something more complex once issues are
> identified.
Ok, the code is indeed way simpler with guest_sbdf and the pci lock.
So, I'll use this approach for now.
>
> Regards, Roger.
Thank you,
Oleksandr
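
A minimal sketch of what the guest_sbdf approach amounts to, assuming the
virtual SBDF is stored next to each physical device and looked up on a
config space trap. This is a standalone model, not Xen code: struct
fake_pdev, make_sbdf() and translate_virtual_sbdf() are hypothetical
names, and the locking (the pcidevs lock in the real code) is left out
and only marked in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Packs an SBDF as seg[31:16] bus[15:8] dev[7:3] fn[2:0]. */
static uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | fn;
}

/* Hypothetical stand-in for a physical device carrying its guest view. */
struct fake_pdev {
    uint32_t sbdf;       /* physical SBDF */
    uint32_t guest_sbdf; /* SBDF the guest sees for this device */
    bool assigned;       /* true while the device is assigned to the guest */
};

/*
 * Translates a guest (virtual) SBDF into the matching physical device.
 * In the real code the walk would run with the PCI devices lock held;
 * that is omitted in this single-threaded model.
 */
static const struct fake_pdev *
translate_virtual_sbdf(const struct fake_pdev *devs, size_t count,
                       uint32_t guest_sbdf)
{
    for (size_t i = 0; i < count; i++)
        if (devs[i].assigned && devs[i].guest_sbdf == guest_sbdf)
            return &devs[i];

    return NULL; /* no such device on the virtual bus */
}
```

No separate vpci_dev list is needed here: the only extra per-device state
is the guest_sbdf field itself.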

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  8:53             ` Oleksandr Andrushchenko
@ 2021-11-03  9:11               ` Jan Beulich
  2021-11-03  9:18                 ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-03  9:11 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel

On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
> 
> 
> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>
>> On 02.11.21 15:54, Jan Beulich wrote:
>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>    }
>>>>>>>    
>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>> +{
>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>> +
>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>> +         *  - host has INTx disabled
>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>> +         */
>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>> +        else
>>>>>>> +        {
>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>> +
>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>> +        }
>>>>>> This last part should be Arm specific. On other architectures we
>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>> interrupt delivery mode for the device.
>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>> the bit is clear.)
>>>> Sure, but this code is making the bit sticky, by not allowing
>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>> to be Arm only.
>>> Isn't the "else" part questionable even on Arm?
>> It is. Once fixed I can't see anything Arm specific here
> Well, I have looked at the code one more time and everything seems to
> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
> guest_cmd_write. The former is used for the hardware domain and has
> *no restrictions* on writing PCI_COMMAND register contents and the later
> is only used for guests and which does have restrictions applied in
> emulate_cmd_reg function.
> 
> So, for the hardware domain, there is no "sticky" bit possible and for the
> guest domains if the physical contents of the PCI_COMMAND register
> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
> use PCI_COMMAND_INTX_DISABLE bit set.
> 
> So, from hardware domain POV, this should not be a problem, but from
> guests view it can. Let's imagine that the hardware domain can handle
> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
> domain can decide what can be used for the interrupt source (again, no
> restriction here) and program PCI_COMMAND accordingly.
> Guest domains need to align with this configuration, e.g. if INTx was disabled
> by the hardware domain then INTx cannot be enabled for guests

Why? It's the DomU that's in control of the device, so it ought to
be able to pick any of the three. I don't think Dom0 is involved in
handling of interrupts from the device, and hence its own "dislike"
of INTx ought to only extend to the period of time where Dom0 is
controlling the device. This would be different if Xen's view was
different, but as we seem to agree Xen's role here is solely to
prevent invalid combinations getting established in hardware.

Jan
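
One possible reading of the position above, written as a standalone
sketch rather than as a change to guest_cmd_write: the filter only
prevents invalid combinations and otherwise leaves the guest's choice
alone. filter_guest_cmd() and msi_enable_allowed() are hypothetical
helper names, and the two flags stand in for the MSI and MSI-X state
that vPCI tracks.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400 /* bit 10 of the PCI_COMMAND register */

/*
 * Returns the value to write to the physical PCI_COMMAND register.
 * INTx is forced off only while MSI or MSI-X is enabled; the current
 * physical value of the bit is deliberately not consulted, so the bit
 * is not sticky.
 */
static uint16_t filter_guest_cmd(uint16_t cmd, bool msi_enabled,
                                 bool msix_enabled)
{
    if (msi_enabled || msix_enabled)
        cmd |= PCI_COMMAND_INTX_DISABLE;

    return cmd;
}

/* The mirror rule: MSI / MSI-X may only be enabled while INTx is disabled. */
static bool msi_enable_allowed(uint16_t cmd)
{
    return (cmd & PCI_COMMAND_INTX_DISABLE) != 0;
}
```

Compared with the quoted patch, the "else" branch that reads back the
current register value is simply gone.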



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:11               ` Jan Beulich
@ 2021-11-03  9:18                 ` Oleksandr Andrushchenko
  2021-11-03  9:24                   ` Jan Beulich
  2021-11-03  9:39                   ` Roger Pau Monné
  0 siblings, 2 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  9:18 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Oleksandr Andrushchenko



On 03.11.21 11:11, Jan Beulich wrote:
> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>
>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>             pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>     }
>>>>>>>>     
>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>> +{
>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>> +
>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>> +    {
>>>>>>>> +        /*
>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>> +         *  - host has INTx disabled
>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>> +         */
>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>> +        else
>>>>>>>> +        {
>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>> +
>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>> +        }
>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>> interrupt delivery mode for the device.
>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>> the bit is clear.)
>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>> to be Arm only.
>>>> Isn't the "else" part questionable even on Arm?
>>> It is. Once fixed I can't see anything Arm specific here
>> Well, I have looked at the code one more time and everything seems to
>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>> guest_cmd_write. The former is used for the hardware domain and has
>> *no restrictions* on writing PCI_COMMAND register contents and the later
>> is only used for guests and which does have restrictions applied in
>> emulate_cmd_reg function.
>>
>> So, for the hardware domain, there is no "sticky" bit possible and for the
>> guest domains if the physical contents of the PCI_COMMAND register
>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>> use PCI_COMMAND_INTX_DISABLE bit set.
>>
>> So, from hardware domain POV, this should not be a problem, but from
>> guests view it can. Let's imagine that the hardware domain can handle
>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>> domain can decide what can be used for the interrupt source (again, no
>> restriction here) and program PCI_COMMAND accordingly.
>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>> by the hardware domain then INTx cannot be enabled for guests
> Why? It's the DomU that's in control of the device, so it ought to
> be able to pick any of the three. I don't think Dom0 is involved in
> handling of interrupts from the device, and hence its own "dislike"
> of INTx ought to only extend to the period of time where Dom0 is
> controlling the device. This would be different if Xen's view was
> different, but as we seem to agree Xen's role here is solely to
> prevent invalid combinations getting established in hardware.
On top of a PCI device there is a physical host bridge and
physical bus topology which may impose restrictions from
Dom0 POV on that particular device. So, every PCI device
being passed through to a DomU may have different INTx
settings which do depend on Dom0 in our case.
>
> Jan
>
>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:18                 ` Oleksandr Andrushchenko
@ 2021-11-03  9:24                   ` Jan Beulich
  2021-11-03  9:30                     ` Oleksandr Andrushchenko
  2021-11-03  9:39                   ` Roger Pau Monné
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-03  9:24 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Roger Pau Monné

On 03.11.2021 10:18, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.11.21 11:11, Jan Beulich wrote:
>> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>>
>>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>             pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>     }
>>>>>>>>>     
>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>> +{
>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>> +
>>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>>> +    {
>>>>>>>>> +        /*
>>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>>> +         *  - host has INTx disabled
>>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>>> +         */
>>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>> +        else
>>>>>>>>> +        {
>>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>>> +
>>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>> +        }
>>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>>> interrupt delivery mode for the device.
>>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>>> the bit is clear.)
>>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>>> to be Arm only.
>>>>> Isn't the "else" part questionable even on Arm?
>>>> It is. Once fixed I can't see anything Arm specific here
>>> Well, I have looked at the code one more time and everything seems to
>>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>>> guest_cmd_write. The former is used for the hardware domain and has
>>> *no restrictions* on writing PCI_COMMAND register contents and the later
>>> is only used for guests and which does have restrictions applied in
>>> emulate_cmd_reg function.
>>>
>>> So, for the hardware domain, there is no "sticky" bit possible and for the
>>> guest domains if the physical contents of the PCI_COMMAND register
>>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>>> use PCI_COMMAND_INTX_DISABLE bit set.
>>>
>>> So, from hardware domain POV, this should not be a problem, but from
>>> guests view it can. Let's imagine that the hardware domain can handle
>>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>>> domain can decide what can be used for the interrupt source (again, no
>>> restriction here) and program PCI_COMMAND accordingly.
>>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>>> by the hardware domain then INTx cannot be enabled for guests
>> Why? It's the DomU that's in control of the device, so it ought to
>> be able to pick any of the three. I don't think Dom0 is involved in
>> handling of interrupts from the device, and hence its own "dislike"
>> of INTx ought to only extend to the period of time where Dom0 is
>> controlling the device. This would be different if Xen's view was
>> different, but as we seem to agree Xen's role here is solely to
>> prevent invalid combinations getting established in hardware.
> On top of a PCI device there is a physical host bridge and
> physical bus topology which may impose restrictions from
> Dom0 POV on that particular device.

Well, such physical restrictions may mean INTx doesn't actually work,
but this won't mean the DomU isn't free in choosing the bit's setting.
The bit merely controls whether the device is allowed to assert its
interrupt pin. Hence ...

> So, every PCI device
> being passed through to a DomU may have different INTx
> settings which do depend on Dom0 in our case.

... I'm still unconvinced of this.

Jan



^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:24                   ` Jan Beulich
@ 2021-11-03  9:30                     ` Oleksandr Andrushchenko
  2021-11-03  9:49                       ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  9:30 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Andrushchenko,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, Michal Orzel



On 03.11.21 11:24, Jan Beulich wrote:
> On 03.11.2021 10:18, Oleksandr Andrushchenko wrote:
>>
>> On 03.11.21 11:11, Jan Beulich wrote:
>>> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>              pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>>      }
>>>>>>>>>>      
>>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>>> +
>>>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>>>> +    {
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>>>> +         *  - host has INTx disabled
>>>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>>>> +         */
>>>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>> +        else
>>>>>>>>>> +        {
>>>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>>>> +
>>>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>> +        }
>>>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>>>> interrupt delivery mode for the device.
>>>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>>>> the bit is clear.)
>>>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>>>> to be Arm only.
>>>>>> Isn't the "else" part questionable even on Arm?
>>>>> It is. Once fixed I can't see anything Arm specific here
>>>> Well, I have looked at the code one more time and everything seems to
>>>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>>>> guest_cmd_write. The former is used for the hardware domain and has
>>>> *no restrictions* on writing PCI_COMMAND register contents and the later
>>>> is only used for guests and which does have restrictions applied in
>>>> emulate_cmd_reg function.
>>>>
>>>> So, for the hardware domain, there is no "sticky" bit possible and for the
>>>> guest domains if the physical contents of the PCI_COMMAND register
>>>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>>>> use PCI_COMMAND_INTX_DISABLE bit set.
>>>>
>>>> So, from hardware domain POV, this should not be a problem, but from
>>>> guests view it can. Let's imagine that the hardware domain can handle
>>>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>>>> domain can decide what can be used for the interrupt source (again, no
>>>> restriction here) and program PCI_COMMAND accordingly.
>>>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>>>> by the hardware domain then INTx cannot be enabled for guests
>>> Why? It's the DomU that's in control of the device, so it ought to
>>> be able to pick any of the three. I don't think Dom0 is involved in
>>> handling of interrupts from the device, and hence its own "dislike"
>>> of INTx ought to only extend to the period of time where Dom0 is
>>> controlling the device. This would be different if Xen's view was
>>> different, but as we seem to agree Xen's role here is solely to
>>> prevent invalid combinations getting established in hardware.
>> On top of a PCI device there is a physical host bridge and
>> physical bus topology which may impose restrictions from
>> Dom0 POV on that particular device.
> Well, such physical restrictions may mean INTx doesn't actually work,
> but this won't mean the DomU isn't free in choosing the bit's setting.
> The bit merely controls whether the device is allowed to assert its
> interrupt pin. Hence ...
>
>> So, every PCI device
>> being passed through to a DomU may have different INTx
>> settings which do depend on Dom0 in our case.
> ... I'm still unconvinced of this.
OK, I am open to any suggestion on how to solve this. We already
have a number of no-go scenarios here, but it is still not clear
to me what an acceptable approach would be. Namely:
what do we do with the INTx bit for guests?
1. I can leave it as is in the patch
2. I can remove INTx emulation and let the guest decide and program INTx
3. What else can I do?
>
> Jan
>
>
Thank you,
Oleksandr


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:18                 ` Oleksandr Andrushchenko
  2021-11-03  9:24                   ` Jan Beulich
@ 2021-11-03  9:39                   ` Roger Pau Monné
  2021-11-03  9:50                     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-03  9:39 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, Michal Orzel

On Wed, Nov 03, 2021 at 09:18:03AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.11.21 11:11, Jan Beulich wrote:
> > On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
> >>
> >> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
> >>> On 02.11.21 15:54, Jan Beulich wrote:
> >>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
> >>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
> >>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
> >>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
> >>>>>>>> --- a/xen/drivers/vpci/header.c
> >>>>>>>> +++ b/xen/drivers/vpci/header.c
> >>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >>>>>>>>             pci_conf_write16(pdev->sbdf, reg, cmd);
> >>>>>>>>     }
> >>>>>>>>     
> >>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> >>>>>>>> +                            uint32_t cmd, void *data)
> >>>>>>>> +{
> >>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
> >>>>>>>> +
> >>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
> >>>>>>>> +    {
> >>>>>>>> +        /*
> >>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
> >>>>>>>> +         *  - host has INTx disabled
> >>>>>>>> +         *  - MSI/MSI-X enabled
> >>>>>>>> +         */
> >>>>>>>> +        if ( pdev->vpci->msi->enabled )
> >>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
> >>>>>>>> +        else
> >>>>>>>> +        {
> >>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
> >>>>>>>> +
> >>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
> >>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
> >>>>>>>> +        }
> >>>>>>> This last part should be Arm specific. On other architectures we
> >>>>>>> likely want the guest to modify INTx disable in order to select the
> >>>>>>> interrupt delivery mode for the device.
> >>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
> >>>>>> enabled - only one of the three is supposed to be active at a time.
> >>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
> >>>>>> the bit is clear.)
> >>>>> Sure, but this code is making the bit sticky, by not allowing
> >>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
> >>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
> >>>>> to be Arm only.
> >>>> Isn't the "else" part questionable even on Arm?
> >>> It is. Once fixed I can't see anything Arm specific here
> >> Well, I have looked at the code one more time and everything seems to
> >> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
> >> guest_cmd_write. The former is used for the hardware domain and has
> >> *no restrictions* on writing PCI_COMMAND register contents and the later
> >> is only used for guests and which does have restrictions applied in
> >> emulate_cmd_reg function.
> >>
> >> So, for the hardware domain, there is no "sticky" bit possible and for the
> >> guest domains if the physical contents of the PCI_COMMAND register
> >> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
> >> use PCI_COMMAND_INTX_DISABLE bit set.
> >>
> >> So, from hardware domain POV, this should not be a problem, but from
> >> guests view it can. Let's imagine that the hardware domain can handle
> >> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
> >> domain can decide what can be used for the interrupt source (again, no
> >> restriction here) and program PCI_COMMAND accordingly.
> >> Guest domains need to align with this configuration, e.g. if INTx was disabled
> >> by the hardware domain then INTx cannot be enabled for guests
> > Why? It's the DomU that's in control of the device, so it ought to
> > be able to pick any of the three. I don't think Dom0 is involved in
> > handling of interrupts from the device, and hence its own "dislike"
> > of INTx ought to only extend to the period of time where Dom0 is
> > controlling the device. This would be different if Xen's view was
> > different, but as we seem to agree Xen's role here is solely to
> > prevent invalid combinations getting established in hardware.
> On top of a PCI device there is a physical host bridge and
> physical bus topology which may impose restrictions from
> Dom0 POV on that particular device. So, every PCI device
> being passed through to a DomU may have different INTx
> settings which do depend on Dom0 in our case.

Hm, it's kind of weird. What happens if you play with this bit and the
bridge doesn't support it?

Also note that your current code would allow a domU to set the bit if
previously unset, but it then won't allow the domU to clear it, which
doesn't seem to be exactly what you are aiming for.
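
To illustrate the asymmetry, here is a minimal standalone model of the
v3 logic (not Xen code). The names fake_dev and fake_guest_cmd_write
are made up for this sketch, the command register is modelled as a
plain field, and it assumes the adjusted value is what finally gets
written to the device:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt Disable is bit 10 of the PCI command register. */
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Hypothetical stand-in for a passed-through device. */
struct fake_dev {
    uint16_t hw_cmd;    /* models what pci_conf_read16() would return */
    bool msi_enabled;   /* models pdev->vpci->msi->enabled */
};

/* Same decision logic as the v3 guest_cmd_write(), applied to the model. */
static void fake_guest_cmd_write(struct fake_dev *dev, uint16_t cmd)
{
    assert(dev != NULL);

    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
    {
        if ( dev->msi_enabled )
            cmd |= PCI_COMMAND_INTX_DISABLE;
        else if ( dev->hw_cmd & PCI_COMMAND_INTX_DISABLE )
            /* The physical bit is already set, so it can never be cleared. */
            cmd |= PCI_COMMAND_INTX_DISABLE;
    }

    /* Assumption: the adjusted value is what reaches the hardware. */
    dev->hw_cmd = cmd;
}
```

In this model a guest that sets the bit once and later writes 0 still
ends up with the bit set, even with MSI disabled.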

Thanks, Roger.



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:30                     ` Oleksandr Andrushchenko
@ 2021-11-03  9:49                       ` Jan Beulich
  2021-11-03 10:24                         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-03  9:49 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Roger Pau Monné

On 03.11.2021 10:30, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.11.21 11:24, Jan Beulich wrote:
>> On 03.11.2021 10:18, Oleksandr Andrushchenko wrote:
>>>
>>> On 03.11.21 11:11, Jan Beulich wrote:
>>>> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>>>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>>>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>>              pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>>>      }
>>>>>>>>>>>      
>>>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>>>> +{
>>>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>>>> +
>>>>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>>>>> +    {
>>>>>>>>>>> +        /*
>>>>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>>>>> +         *  - host has INTx disabled
>>>>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>>>>> +         */
>>>>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>>> +        else
>>>>>>>>>>> +        {
>>>>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>>>>> +
>>>>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>>> +        }
>>>>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>>>>> interrupt delivery mode for the device.
>>>>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>>>>> the bit is clear.)
>>>>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>>>>> to be Arm only.
>>>>>>> Isn't the "else" part questionable even on Arm?
>>>>>> It is. Once fixed I can't see anything Arm specific here
>>>>> Well, I have looked at the code one more time and everything seems to
>>>>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>>>>> guest_cmd_write. The former is used for the hardware domain and has
>>>>> *no restrictions* on writing PCI_COMMAND register contents and the later
>>>>> is only used for guests and which does have restrictions applied in
>>>>> emulate_cmd_reg function.
>>>>>
>>>>> So, for the hardware domain, there is no "sticky" bit possible and for the
>>>>> guest domains if the physical contents of the PCI_COMMAND register
>>>>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>>>>> use PCI_COMMAND_INTX_DISABLE bit set.
>>>>>
>>>>> So, from hardware domain POV, this should not be a problem, but from
>>>>> guests view it can. Let's imagine that the hardware domain can handle
>>>>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>>>>> domain can decide what can be used for the interrupt source (again, no
>>>>> restriction here) and program PCI_COMMAND accordingly.
>>>>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>>>>> by the hardware domain then INTx cannot be enabled for guests
>>>> Why? It's the DomU that's in control of the device, so it ought to
>>>> be able to pick any of the three. I don't think Dom0 is involved in
>>>> handling of interrupts from the device, and hence its own "dislike"
>>>> of INTx ought to only extend to the period of time where Dom0 is
>>>> controlling the device. This would be different if Xen's view was
>>>> different, but as we seem to agree Xen's role here is solely to
>>>> prevent invalid combinations getting established in hardware.
>>> On top of a PCI device there is a physical host bridge and
>>> physical bus topology which may impose restrictions from
>>> Dom0 POV on that particular device.
>> Well, such physical restrictions may mean INTx doesn't actually work,
>> but this won't mean the DomU isn't free in choosing the bit's setting.
>> The bit merely controls whether the device is allowed to assert its
>> interrupt pin. Hence ...
>>
>>> So, every PCI device
>>> being passed through to a DomU may have different INTx
>>> settings which do depend on Dom0 in our case.
>> ... I'm still unconvinced of this.
> Ok, so I can accept any suggestion how to solve this. It seems that
> we already have number of no go scenarios here, but still it is not
> clear to me what could be an acceptable approach here. Namely:
> what do we do with INTx bit for guests?
> 1. I can leave it as is in the patch
> 2. I can remove INTx emulation and let the guest decide and program INTx
> 3. What else can I do?

Aiui you want to prevent the guest from clearing the bit if either
MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
or MSI-X, you will want to force the bit set (which may well be in
a separate, future patch).
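
A rough standalone sketch of the two halves of that rule, for
illustration only. The struct and function names (intr_state,
model_cmd_write, model_msi_enable) are hypothetical and do not
correspond to existing vPCI code; MSI-X enabling would follow the
same pattern as the MSI helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt Disable is bit 10 of the PCI command register. */
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Hypothetical model of the per-device interrupt state. */
struct intr_state {
    uint16_t cmd;        /* value currently in the command register */
    bool msi_enabled;
    bool msix_enabled;
};

/* Guest write to PCI_COMMAND: keep INTx disabled while MSI/MSI-X is on. */
static void model_cmd_write(struct intr_state *s, uint16_t cmd)
{
    assert(s != NULL);

    if ( s->msi_enabled || s->msix_enabled )
        cmd |= PCI_COMMAND_INTX_DISABLE;

    s->cmd = cmd;
}

/* Guest enables MSI: force INTx disable so only one mode is active. */
static void model_msi_enable(struct intr_state *s)
{
    assert(s != NULL);

    s->msi_enabled = true;
    s->cmd |= PCI_COMMAND_INTX_DISABLE;
}
```

With MSI off the guest may freely toggle the bit; once MSI is enabled
the bit is forced set and a later write of 0 does not clear it.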

Jan




* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:39                   ` Roger Pau Monné
@ 2021-11-03  9:50                     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03  9:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Oleksandr Andrushchenko, Rahul Singh,
	Michal Orzel



On 03.11.21 11:39, Roger Pau Monné wrote:
> On Wed, Nov 03, 2021 at 09:18:03AM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 03.11.21 11:11, Jan Beulich wrote:
>>> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>              pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>>      }
>>>>>>>>>>      
>>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>>> +
>>>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>>>> +    {
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>>>> +         *  - host has INTx disabled
>>>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>>>> +         */
>>>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>> +        else
>>>>>>>>>> +        {
>>>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>>>> +
>>>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>> +        }
>>>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>>>> interrupt delivery mode for the device.
>>>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>>>> the bit is clear.)
>>>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>>>> to be Arm only.
>>>>>> Isn't the "else" part questionable even on Arm?
>>>>> It is. Once fixed I can't see anything Arm specific here
>>>> Well, I have looked at the code one more time and everything seems to
>>>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>>>> guest_cmd_write. The former is used for the hardware domain and has
>>>> *no restrictions* on writing PCI_COMMAND register contents and the later
>>>> is only used for guests and which does have restrictions applied in
>>>> emulate_cmd_reg function.
>>>>
>>>> So, for the hardware domain, there is no "sticky" bit possible and for the
>>>> guest domains if the physical contents of the PCI_COMMAND register
>>>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>>>> use PCI_COMMAND_INTX_DISABLE bit set.
>>>>
>>>> So, from hardware domain POV, this should not be a problem, but from
>>>> guests view it can. Let's imagine that the hardware domain can handle
>>>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>>>> domain can decide what can be used for the interrupt source (again, no
>>>> restriction here) and program PCI_COMMAND accordingly.
>>>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>>>> by the hardware domain then INTx cannot be enabled for guests
>>> Why? It's the DomU that's in control of the device, so it ought to
>>> be able to pick any of the three. I don't think Dom0 is involved in
>>> handling of interrupts from the device, and hence its own "dislike"
>>> of INTx ought to only extend to the period of time where Dom0 is
>>> controlling the device. This would be different if Xen's view was
>>> different, but as we seem to agree Xen's role here is solely to
>>> prevent invalid combinations getting established in hardware.
>> On top of a PCI device there is a physical host bridge and
>> physical bus topology which may impose restrictions from
>> Dom0 POV on that particular device. So, every PCI device
>> being passed through to a DomU may have different INTx
>> settings which do depend on Dom0 in our case.
> Hm, it's kind of weird. What happens if you play with this bit and the
> bridge doesn't support it?
For that reason I think it is enough to rely on some reference value
which shows whether INTx can be used. I suggest we depend on
Dom0 for now and read this reference PCI_COMMAND value in
init_bars when is_hardware_domain is true. It can then be used as
the initial value of PCI_COMMAND for guests.
This way Dom0 solves the problem of "what is supported for this
PCI device with respect to the bus topology and host bridge".
>
> Also note that your current code would allow a domU to set the bit if
> previously unset, but it then won't allow the domU to clear it, which
> doesn't seem to be exactly what you are aiming for.
That was noted before. We could take the reference value and use it
as the initial value of PCI_COMMAND for guests (remember that
the patch currently uses 0, which resets PCI_COMMAND for guests,
and checks the real PCI_COMMAND contents to decide on INTx).
This reference value can then be used in the checks:
         if ( pdev->vpci->msi->enabled )
             cmd |= PCI_COMMAND_INTX_DISABLE;
         else
         {
             if ( pdev->cmd_ref_value & PCI_COMMAND_INTX_DISABLE )
                  ^^^^^^^^^^^^^^
                 cmd |= PCI_COMMAND_INTX_DISABLE;
         }

init_bars:
    if ( is_hardware_domain(pdev->domain) )
        pdev->cmd_ref_value = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>
> Thanks, Roger.
>


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03  9:49                       ` Jan Beulich
@ 2021-11-03 10:24                         ` Oleksandr Andrushchenko
  2021-11-03 10:34                           ` Jan Beulich
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03 10:24 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Oleksandr Andrushchenko



On 03.11.21 11:49, Jan Beulich wrote:
> On 03.11.2021 10:30, Oleksandr Andrushchenko wrote:
>>
>> On 03.11.21 11:24, Jan Beulich wrote:
>>> On 03.11.2021 10:18, Oleksandr Andrushchenko wrote:
>>>> On 03.11.21 11:11, Jan Beulich wrote:
>>>>> On 03.11.2021 09:53, Oleksandr Andrushchenko wrote:
>>>>>> On 02.11.21 16:10, Oleksandr Andrushchenko wrote:
>>>>>>> On 02.11.21 15:54, Jan Beulich wrote:
>>>>>>>> On 02.11.2021 12:50, Roger Pau Monné wrote:
>>>>>>>>> On Tue, Nov 02, 2021 at 12:19:13PM +0100, Jan Beulich wrote:
>>>>>>>>>> On 26.10.2021 12:52, Roger Pau Monné wrote:
>>>>>>>>>>> On Thu, Sep 30, 2021 at 10:52:20AM +0300, Oleksandr Andrushchenko wrote:
>>>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>>>> @@ -451,6 +451,32 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>>>               pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>>>>       }
>>>>>>>>>>>>       
>>>>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +        /*
>>>>>>>>>>>> +         * Guest wants to enable INTx. It can't be enabled if:
>>>>>>>>>>>> +         *  - host has INTx disabled
>>>>>>>>>>>> +         *  - MSI/MSI-X enabled
>>>>>>>>>>>> +         */
>>>>>>>>>>>> +        if ( pdev->vpci->msi->enabled )
>>>>>>>>>>>> +            cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>>>> +        else
>>>>>>>>>>>> +        {
>>>>>>>>>>>> +            uint16_t current_cmd = pci_conf_read16(pdev->sbdf, reg);
>>>>>>>>>>>> +
>>>>>>>>>>>> +            if ( current_cmd & PCI_COMMAND_INTX_DISABLE )
>>>>>>>>>>>> +                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>>>>> +        }
>>>>>>>>>>> This last part should be Arm specific. On other architectures we
>>>>>>>>>>> likely want the guest to modify INTx disable in order to select the
>>>>>>>>>>> interrupt delivery mode for the device.
>>>>>>>>>> We cannot allow a guest to clear the bit when it has MSI / MSI-X
>>>>>>>>>> enabled - only one of the three is supposed to be active at a time.
>>>>>>>>>> (IOW similarly we cannot allow a guest to enable MSI / MSI-X when
>>>>>>>>>> the bit is clear.)
>>>>>>>>> Sure, but this code is making the bit sticky, by not allowing
>>>>>>>>> INTX_DISABLE to be cleared once set. We do not want that behavior on
>>>>>>>>> x86, as a guest can decide to use MSI or INTx. The else branch needs
>>>>>>>>> to be Arm only.
>>>>>>>> Isn't the "else" part questionable even on Arm?
>>>>>>> It is. Once fixed I can't see anything Arm specific here
>>>>>> Well, I have looked at the code one more time and everything seems to
>>>>>> be ok wrt that sticky bit: we have 2 handlers which are cmd_write and
>>>>>> guest_cmd_write. The former is used for the hardware domain and has
>>>>>> *no restrictions* on writing PCI_COMMAND register contents and the later
>>>>>> is only used for guests and which does have restrictions applied in
>>>>>> emulate_cmd_reg function.
>>>>>>
>>>>>> So, for the hardware domain, there is no "sticky" bit possible and for the
>>>>>> guest domains if the physical contents of the PCI_COMMAND register
>>>>>> has PCI_COMMAND_INTX_DISABLE bit set then the guest is enforced to
>>>>>> use PCI_COMMAND_INTX_DISABLE bit set.
>>>>>>
>>>>>> So, from hardware domain POV, this should not be a problem, but from
>>>>>> guests view it can. Let's imagine that the hardware domain can handle
>>>>>> all types of interrupts, e.g. INTx, MSI, MSI-X. In this case the hardware
>>>>>> domain can decide what can be used for the interrupt source (again, no
>>>>>> restriction here) and program PCI_COMMAND accordingly.
>>>>>> Guest domains need to align with this configuration, e.g. if INTx was disabled
>>>>>> by the hardware domain then INTx cannot be enabled for guests
>>>>> Why? It's the DomU that's in control of the device, so it ought to
>>>>> be able to pick any of the three. I don't think Dom0 is involved in
>>>>> handling of interrupts from the device, and hence its own "dislike"
>>>>> of INTx ought to only extend to the period of time where Dom0 is
>>>>> controlling the device. This would be different if Xen's view was
>>>>> different, but as we seem to agree Xen's role here is solely to
>>>>> prevent invalid combinations getting established in hardware.
>>>> On top of a PCI device there is a physical host bridge and
>>>> physical bus topology which may impose restrictions from
>>>> Dom0 POV on that particular device.
>>> Well, such physical restrictions may mean INTx doesn't actually work,
>>> but this won't mean the DomU isn't free in choosing the bit's setting.
>>> The bit merely controls whether the device is allowed to assert its
>>> interrupt pin. Hence ...
>>>
>>>> So, every PCI device
>>>> being passed through to a DomU may have different INTx
>>>> settings which do depend on Dom0 in our case.
>>> ... I'm still unconvinced of this.
>> Ok, so I can accept any suggestion how to solve this. It seems that
>> we already have number of no go scenarios here, but still it is not
>> clear to me what could be an acceptable approach here. Namely:
>> what do we do with INTx bit for guests?
>> 1. I can leave it as is in the patch
>> 2. I can remove INTx emulation and let the guest decide and program INTx
>> 3. What else can I do?
> Aiui you want to prevent the guest from clearing the bit if either
> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
> or MSI-X, you will want to force the bit set (which may well be in
> a separate, future patch).
static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
{
     /* TODO: Add proper emulation for all bits of the command register. */

     if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
     {
         /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
#ifdef CONFIG_HAS_PCI_MSI
         if ( pdev->vpci->msi->enabled )
             cmd |= PCI_COMMAND_INTX_DISABLE;
#endif
     }

     return cmd;
}

Is this what you mean?
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 10:24                         ` Oleksandr Andrushchenko
@ 2021-11-03 10:34                           ` Jan Beulich
  2021-11-03 10:36                             ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Jan Beulich @ 2021-11-03 10:34 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Roger Pau Monné

On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
> On 03.11.21 11:49, Jan Beulich wrote:
>> Aiui you want to prevent the guest from clearing the bit if either
>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
>> or MSI-X, you will want to force the bit set (which may well be in
>> a separate, future patch).
> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
> {
>      /* TODO: Add proper emulation for all bits of the command register. */
> 
>      if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>      {
>          /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
> #ifdef CONFIG_HAS_PCI_MSI
>          if ( pdev->vpci->msi->enabled )
>              cmd |= PCI_COMMAND_INTX_DISABLE;
> #endif
>      }
> 
>      return cmd;
> }
> 
> Is this what you mean?

Something along these lines, yes. I'd omit the outer if() for clarity /
brevity.

Jan




* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 10:34                           ` Jan Beulich
@ 2021-11-03 10:36                             ` Oleksandr Andrushchenko
  2021-11-03 11:01                               ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03 10:36 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel, Oleksandr Andrushchenko



On 03.11.21 12:34, Jan Beulich wrote:
> On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
>> On 03.11.21 11:49, Jan Beulich wrote:
>>> Aiui you want to prevent the guest from clearing the bit if either
>>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
>>> or MSI-X, you will want to force the bit set (which may well be in
>>> a separate, future patch).
>> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
>> {
>>       /* TODO: Add proper emulation for all bits of the command register. */
>>
>>       if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>       {
>>           /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>> #ifdef CONFIG_HAS_PCI_MSI
>>           if ( pdev->vpci->msi->enabled )
>>               cmd |= PCI_COMMAND_INTX_DISABLE;
>> #endif
>>       }
>>
>>       return cmd;
>> }
>>
>> Is this what you mean?
> Something along these lines, yes. I'd omit the outer if() for clarity /
> brevity.
Sure, thank you!
@Roger are you ok with this approach?
>
> Jan
>
>


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 10:36                             ` Oleksandr Andrushchenko
@ 2021-11-03 11:01                               ` Roger Pau Monné
  2021-11-03 11:02                                 ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-03 11:01 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, Michal Orzel

On Wed, Nov 03, 2021 at 10:36:36AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.11.21 12:34, Jan Beulich wrote:
> > On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
> >> On 03.11.21 11:49, Jan Beulich wrote:
> >>> Aiui you want to prevent the guest from clearing the bit if either
> >>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
> >>> or MSI-X, you will want to force the bit set (which may well be in
> >>> a separate, future patch).
> >> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
> >> {
> >>       /* TODO: Add proper emulation for all bits of the command register. */
> >>
> >>       if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
> >>       {
> >>           /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
> >> #ifdef CONFIG_HAS_PCI_MSI
> >>           if ( pdev->vpci->msi->enabled )
> >>               cmd |= PCI_COMMAND_INTX_DISABLE;
> >> #endif
> >>       }
> >>
> >>       return cmd;
> >> }
> >>
> >> Is this what you mean?
> > Something along these lines, yes. I'd omit the outer if() for clarity /
> > brevity.
> Sure, thank you!
> @Roger are you ok with this approach?

Sure, I would even do:

#ifdef CONFIG_HAS_PCI_MSI
if ( !(cmd & PCI_COMMAND_INTX_DISABLE) && pdev->vpci->msi->enabled )
{
    /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
    cmd |= PCI_COMMAND_INTX_DISABLE;
}
#endif

There's no need for the outer check if there's no support for MSI.

Thanks, Roger.



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 11:01                               ` Roger Pau Monné
@ 2021-11-03 11:02                                 ` Oleksandr Andrushchenko
  2021-11-03 11:26                                   ` Roger Pau Monné
  0 siblings, 1 reply; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03 11:02 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, Bertrand Marquis, Rahul Singh,
	Michal Orzel



On 03.11.21 13:01, Roger Pau Monné wrote:
> On Wed, Nov 03, 2021 at 10:36:36AM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 03.11.21 12:34, Jan Beulich wrote:
>>> On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
>>>> On 03.11.21 11:49, Jan Beulich wrote:
>>>>> Aiui you want to prevent the guest from clearing the bit if either
>>>>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
>>>>> or MSI-X, you will want to force the bit set (which may well be in
>>>>> a separate, future patch).
>>>> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
>>>> {
>>>>        /* TODO: Add proper emulation for all bits of the command register. */
>>>>
>>>>        if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>        {
>>>>            /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>> #ifdef CONFIG_HAS_PCI_MSI
>>>>            if ( pdev->vpci->msi->enabled )
>>>>                cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> #endif
>>>>        }
>>>>
>>>>        return cmd;
>>>> }
>>>>
>>>> Is this what you mean?
>>> Something along these lines, yes. I'd omit the outer if() for clarity /
>>> brevity.
>> Sure, thank you!
>> @Roger are you ok with this approach?
> Sure, I would even do:
>
> #ifdef CONFIG_HAS_PCI_MSI
> if ( !(cmd & PCI_COMMAND_INTX_DISABLE) && pdev->vpci->msi->enabled )
> {
>      /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>      cmd |= PCI_COMMAND_INTX_DISABLE;
> }
> #endif
>
> There's no need for the outer check if there's no support for MSI.
Ok, sounds good!
Thank you both!!
>
> Thanks, Roger.


* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 11:02                                 ` Oleksandr Andrushchenko
@ 2021-11-03 11:26                                   ` Roger Pau Monné
  2021-11-03 11:34                                     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 98+ messages in thread
From: Roger Pau Monné @ 2021-11-03 11:26 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, Michal Orzel

On Wed, Nov 03, 2021 at 11:02:37AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.11.21 13:01, Roger Pau Monné wrote:
> > On Wed, Nov 03, 2021 at 10:36:36AM +0000, Oleksandr Andrushchenko wrote:
> >>
> >> On 03.11.21 12:34, Jan Beulich wrote:
> >>> On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
> >>>> On 03.11.21 11:49, Jan Beulich wrote:
> >>>>> Aiui you want to prevent the guest from clearing the bit if either
> >>>>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
> >>>>> or MSI-X, you will want to force the bit set (which may well be in
> >>>>> a separate, future patch).
> >>>> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
> >>>> {
> >>>>        /* TODO: Add proper emulation for all bits of the command register. */
> >>>>
> >>>>        if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
> >>>>        {
> >>>>            /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
> >>>> #ifdef CONFIG_HAS_PCI_MSI
> >>>>            if ( pdev->vpci->msi->enabled )
> >>>>                cmd |= PCI_COMMAND_INTX_DISABLE;
> >>>> #endif
> >>>>        }
> >>>>
> >>>>        return cmd;
> >>>> }
> >>>>
> >>>> Is this what you mean?
> >>> Something along these lines, yes. I'd omit the outer if() for clarity /
> >>> brevity.
> >> Sure, thank you!
> >> @Roger are you ok with this approach?
> > Sure, I would even do:
> >
> > #ifdef CONFIG_HAS_PCI_MSI
> > if ( !(cmd & PCI_COMMAND_INTX_DISABLE) && pdev->vpci->msi->enabled )
> > {
> >      /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
> >      cmd |= PCI_COMMAND_INTX_DISABLE;
> > }
> > #endif
> >
> > There's no need for the outer check if there's no support for MSI.
> Ok, sounds good!
> Thank you both!!

In fact you could even remove the check for !(cmd &
PCI_COMMAND_INTX_DISABLE) and always set PCI_COMMAND_INTX_DISABLE if
MSI is enabled, which I think is what Jan was pointing to in his
previous reply.
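
A minimal sketch of what that simplification could look like, for
illustration only. The struct definitions are cut-down, hypothetical
stand-ins for the real Xen ones so that the snippet builds on its own,
CONFIG_HAS_PCI_MSI is defined locally for the same reason, and the NULL
check on the msi pointer is an assumption that was not discussed above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 10 of the PCI command register (Interrupt Disable). */
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Defined here only so the MSI path is compiled in this standalone sketch. */
#define CONFIG_HAS_PCI_MSI 1

/* Hypothetical, cut-down stand-ins for the Xen structures. */
struct vpci_msi { bool enabled; };
struct vpci { struct vpci_msi *msi; };
struct pci_dev { struct vpci *vpci; };

static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
{
    /* TODO: Add proper emulation for all bits of the command register. */

#ifdef CONFIG_HAS_PCI_MSI
    /*
     * INTx can't be enabled while MSI is enabled, so force the disable
     * bit regardless of what the guest wrote. The NULL check is an
     * assumption (device without an MSI capability).
     */
    if ( pdev->vpci->msi && pdev->vpci->msi->enabled )
        cmd |= PCI_COMMAND_INTX_DISABLE;
#endif

    return cmd;
}
```

As in the snippets above, only MSI is checked; MSI-X would need an
equivalent test, which is left out here.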

Regards, Roger.



* Re: [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests
  2021-11-03 11:26                                   ` Roger Pau Monné
@ 2021-11-03 11:34                                     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 98+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-03 11:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko,
	Michal Orzel



On 03.11.21 13:26, Roger Pau Monné wrote:
> On Wed, Nov 03, 2021 at 11:02:37AM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 03.11.21 13:01, Roger Pau Monné wrote:
>>> On Wed, Nov 03, 2021 at 10:36:36AM +0000, Oleksandr Andrushchenko wrote:
>>>> On 03.11.21 12:34, Jan Beulich wrote:
>>>>> On 03.11.2021 11:24, Oleksandr Andrushchenko wrote:
>>>>>> On 03.11.21 11:49, Jan Beulich wrote:
>>>>>>> Aiui you want to prevent the guest from clearing the bit if either
>>>>>>> MSI or MSI-X are in use. Symmetrically, when the guest enables MSI
>>>>>>> or MSI-X, you will want to force the bit set (which may well be in
>>>>>>> a separate, future patch).
>>>>>> static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
>>>>>> {
>>>>>>         /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>
>>>>>>         if ( (cmd & PCI_COMMAND_INTX_DISABLE) == 0 )
>>>>>>         {
>>>>>>             /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>> #ifdef CONFIG_HAS_PCI_MSI
>>>>>>             if ( pdev->vpci->msi->enabled )
>>>>>>                 cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>> #endif
>>>>>>         }
>>>>>>
>>>>>>         return cmd;
>>>>>> }
>>>>>>
>>>>>> Is this what you mean?
>>>>> Something along these lines, yes. I'd omit the outer if() for clarity /
>>>>> brevity.
>>>> Sure, thank you!
>>>> @Roger are you ok with this approach?
>>> Sure, I would even do:
>>>
>>> #ifdef CONFIG_HAS_PCI_MSI
>>> if ( !(cmd & PCI_COMMAND_INTX_DISABLE) && pdev->vpci->msi->enabled )
>>> {
>>>       /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>       cmd |= PCI_COMMAND_INTX_DISABLE;
>>> }
>>> #endif
>>>
>>> There's no need for the outer check if there's no support for MSI.
>> Ok, sounds good!
>> Thank you both!!
> In fact you could even remove the check for !(cmd &
> PCI_COMMAND_INTX_DISABLE) and always set PCI_COMMAND_INTX_DISABLE if
> MSI is enabled, which I think is what Jan was pointing to in his
> previous reply.
Ok, I will
>
> Regards, Roger.
>


end of thread, other threads:[~2021-11-03 11:34 UTC | newest]

Thread overview: 98+ messages
2021-09-30  7:52 [PATCH v3 00/11] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 01/11] vpci: Make vpci registers removal a dedicated function Oleksandr Andrushchenko
2021-10-13 11:11   ` Roger Pau Monné
2021-10-27  9:12     ` Oleksandr Andrushchenko
2021-10-27  9:24       ` Roger Pau Monné
2021-10-27  9:41         ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 02/11] vpci: Add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
2021-09-30  8:21   ` Jan Beulich
2021-09-30  8:45     ` Oleksandr Andrushchenko
2021-09-30  9:06       ` Jan Beulich
2021-09-30  9:21         ` Oleksandr Andrushchenko
2021-09-30 10:14           ` Jan Beulich
2021-09-30 10:30             ` Oleksandr Andrushchenko
2021-10-13 11:29   ` Roger Pau Monné
2021-10-13 12:47     ` Jan Beulich
2021-10-27  9:53     ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 03/11] vpci/header: Move register assignments from init_bars Oleksandr Andrushchenko
2021-10-13 13:51   ` Roger Pau Monné
2021-10-15  6:04     ` Jan Beulich
2021-10-25 14:28       ` Roger Pau Monné
2021-10-27 10:17     ` Oleksandr Andrushchenko
2021-10-27 11:59       ` Oleksandr Andrushchenko
2021-10-27 13:23         ` Roger Pau Monné
2021-10-27 14:06           ` Oleksandr Andrushchenko
2021-10-27 15:34             ` Roger Pau Monné
2021-09-30  7:52 ` [PATCH v3 04/11] vpci/header: Add and remove register handlers dynamically Oleksandr Andrushchenko
2021-10-01 13:26   ` Jan Beulich
2021-10-04  5:58     ` Oleksandr Andrushchenko
2021-10-07  7:22       ` Jan Beulich
2021-10-13 15:38         ` Roger Pau Monné
2021-10-15  6:09           ` Jan Beulich
2021-10-25 15:48   ` Roger Pau Monné
2021-11-01  9:18     ` Oleksandr Andrushchenko
2021-11-02 10:03       ` Roger Pau Monné
2021-11-02 10:29         ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 05/11] vpci/header: Implement guest BAR register handlers Oleksandr Andrushchenko
2021-10-01 13:31   ` Jan Beulich
2021-10-26  7:50   ` Roger Pau Monné
2021-10-26  8:09     ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 06/11] vpci/header: Handle p2m range sets per BAR Oleksandr Andrushchenko
2021-10-25 11:51   ` Oleksandr Andrushchenko
2021-10-26  9:40     ` Roger Pau Monné
2021-11-02 11:13       ` Jan Beulich
2021-10-26  9:08   ` Roger Pau Monné
2021-11-02 10:34     ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 07/11] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
2021-10-01 13:38   ` Jan Beulich
2021-10-04  6:26     ` Oleksandr Andrushchenko
2021-10-26 10:35   ` Roger Pau Monné
2021-11-02 10:43     ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 08/11] vpci/header: Emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
2021-10-26 10:52   ` Roger Pau Monné
2021-11-02 10:48     ` Oleksandr Andrushchenko
2021-11-02 11:19     ` Jan Beulich
2021-11-02 11:50       ` Roger Pau Monné
2021-11-02 13:54         ` Jan Beulich
2021-11-02 14:10           ` Oleksandr Andrushchenko
2021-11-03  8:53             ` Oleksandr Andrushchenko
2021-11-03  9:11               ` Jan Beulich
2021-11-03  9:18                 ` Oleksandr Andrushchenko
2021-11-03  9:24                   ` Jan Beulich
2021-11-03  9:30                     ` Oleksandr Andrushchenko
2021-11-03  9:49                       ` Jan Beulich
2021-11-03 10:24                         ` Oleksandr Andrushchenko
2021-11-03 10:34                           ` Jan Beulich
2021-11-03 10:36                             ` Oleksandr Andrushchenko
2021-11-03 11:01                               ` Roger Pau Monné
2021-11-03 11:02                                 ` Oleksandr Andrushchenko
2021-11-03 11:26                                   ` Roger Pau Monné
2021-11-03 11:34                                     ` Oleksandr Andrushchenko
2021-11-03  9:39                   ` Roger Pau Monné
2021-11-03  9:50                     ` Oleksandr Andrushchenko
2021-11-02 14:17         ` Julien Grall
2021-09-30  7:52 ` [PATCH v3 09/11] vpci/header: Reset the command register when adding devices Oleksandr Andrushchenko
2021-10-26 11:00   ` Roger Pau Monné
2021-11-02 11:11     ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 10/11] vpci: Add initial support for virtual PCI bus topology Oleksandr Andrushchenko
2021-09-30  8:51   ` Jan Beulich
2021-09-30  9:34     ` Oleksandr Andrushchenko
2021-09-30 10:23       ` Jan Beulich
2021-09-30 10:26         ` Oleksandr Andrushchenko
2021-10-26 11:33   ` Roger Pau Monné
2021-11-03  6:34     ` Oleksandr Andrushchenko
2021-11-03  8:41       ` Jan Beulich
2021-11-03  8:57         ` Oleksandr Andrushchenko
2021-11-03  8:52       ` Roger Pau Monné
2021-11-03  8:59         ` Oleksandr Andrushchenko
2021-09-30  7:52 ` [PATCH v3 11/11] xen/arm: Translate virtual PCI bus topology for guests Oleksandr Andrushchenko
2021-09-30  8:53   ` Jan Beulich
2021-09-30  9:35     ` Oleksandr Andrushchenko
2021-09-30 10:25       ` Jan Beulich
2021-09-30 16:57     ` Oleksandr Andrushchenko
2021-10-01  7:42       ` Jan Beulich
2021-10-01  7:57         ` Oleksandr Andrushchenko
2021-10-01  8:12           ` Jan Beulich
2021-10-18 18:32   ` Julien Grall
2021-10-26 13:30   ` Roger Pau Monné
2021-10-26 13:57     ` Oleksandr Andrushchenko
