* [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
@ 2021-11-25 11:02 Oleksandr Andrushchenko
  2021-11-25 11:02 ` [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
                   ` (14 more replies)
  0 siblings, 15 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Hi, all!

1. This patch series focuses on vPCI and adds support for non-identity
PCI BAR mappings, which are required when passing a PCI device through to
a guest. The highlights are:

- Add relevant vpci register handlers when assigning PCI device to a domain
  and remove those when de-assigning. This allows having different
  handlers for different domains, e.g. hwdom and other guests.

- Emulate guest BAR register values based on physical BAR values.
  This allows creating a guest view of the registers and emulates
  size and properties probe as it is done during PCI device enumeration by
  the guest.

- Instead of handling a single range set that contains all the memory
  regions of all the BARs and the ROM, keep one range set per BAR.

- Take into account guest's BAR view and program its p2m accordingly:
  gfn is guest's view of the BAR and mfn is the physical BAR value as set
  up by the host bridge in the hardware domain.
  This way the hardware domain sees physical BAR values and the guest sees
  emulated ones.
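
To illustrate the guest BAR emulation described above, here is a minimal
sketch of how a size probe on an emulated 32-bit memory BAR could behave.
The struct and function names (guest_bar, guest_bar_write, guest_bar_read)
are hypothetical and are not the ones used by the series; only the
general PCI sizing behaviour is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-BAR state; not the layout of Xen's struct vpci_bar. */
struct guest_bar {
    uint64_t host_addr;  /* physical BAR value (the mfn side) */
    uint64_t guest_addr; /* value the guest programmed (the gfn side) */
    uint64_t size;       /* BAR size in bytes, a power of two */
};

/* The low 4 bits of a memory BAR carry type/prefetch flags. */
#define BAR_MEM_FLAGS_MASK 0xfu

/* Guest write to an emulated 32-bit memory BAR. */
static void guest_bar_write(struct guest_bar *bar, uint32_t val)
{
    /*
     * Address bits below the BAR size are hard-wired to zero, so a
     * write of all ones reads back as the size mask.
     */
    bar->guest_addr = val & ~(uint32_t)(bar->size - 1);
}

/* Guest read: the guest's own address plus the read-only flag bits. */
static uint32_t guest_bar_read(const struct guest_bar *bar, uint32_t flags)
{
    return (uint32_t)bar->guest_addr | (flags & BAR_MEM_FLAGS_MASK);
}
```

The physical BAR value is never touched by guest writes in this sketch;
only the guest's view changes, which is what would later be used as the
gfn when programming the p2m.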

2. The series also adds support for virtual PCI bus topology for guests:
 - We emulate a single host bridge for the guest, so segment is always 0.
 - The implementation is limited to 32 devices which are allowed on
   a single PCI bus.
 - The virtual bus number is set to 0, so virtual devices are seen
   as embedded endpoints behind the root complex.
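
As a rough sketch of the topology constraints listed above (segment 0,
bus 0, at most 32 devices), virtual slot allocation could look like the
following. MAX_VIRT_SLOTS, struct virt_bus and the helpers are
hypothetical names chosen for illustration, and the sketch assumes
function 0 for every virtual device.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIRT_SLOTS 32 /* device numbers 0..31 on a single bus */

/* Hypothetical per-domain state: one bit per allocated virtual slot. */
struct virt_bus {
    uint32_t used_slots;
};

/* Returns the allocated virtual device number, or -1 if the bus is full. */
static int virt_slot_alloc(struct virt_bus *bus)
{
    for (int slot = 0; slot < MAX_VIRT_SLOTS; slot++) {
        uint32_t bit = UINT32_C(1) << slot;

        if (!(bus->used_slots & bit)) {
            bus->used_slots |= bit;
            return slot;
        }
    }
    return -1;
}

/* Compose the guest-visible SBDF: segment 0, bus 0, function 0. */
static uint32_t virt_sbdf(int slot)
{
    return (0u << 16) | (0u << 8) | ((uint32_t)slot << 3) | 0u;
}
```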

3. The series completely re-works the locking scheme (or the lack of one)
with the help of the work started by Roger [1]:
[PATCH v5 03/14] vpci: move lock outside of struct vpci

This way the lock can be used to check whether vpci is present, and
removal can be performed while holding the lock, in order to make
sure there are no accesses to the contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.
This also solves synchronization issues between all vPCI code entities
which could run in parallel.
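
The pattern can be sketched as follows. This is only an illustration
with toy stand-ins: toy_lock_t is a plain bool in the spirit of
tools/tests/vpci/emul.h rather than a real spinlock, and all toy_* names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy lock, not a real spinlock: enough to show the ordering. */
typedef bool toy_lock_t;
#define toy_lock(l)   (*(l) = true)
#define toy_unlock(l) (*(l) = false)

struct toy_vpci {
    unsigned int value;
};

struct toy_pdev {
    toy_lock_t vpci_lock;  /* lives in the device, so it outlives vpci */
    struct toy_vpci *vpci; /* may be NULL; only inspect under the lock */
};

/* Reader: take the lock first, then check whether vpci is present. */
static unsigned int toy_read(struct toy_pdev *pdev, unsigned int hw_value)
{
    unsigned int ret;

    toy_lock(&pdev->vpci_lock);
    ret = pdev->vpci ? pdev->vpci->value : hw_value;
    toy_unlock(&pdev->vpci_lock);

    return ret;
}

/* Removal: free and clear the pointer while still holding the lock. */
static void toy_remove(struct toy_pdev *pdev)
{
    toy_lock(&pdev->vpci_lock);
    free(pdev->vpci);
    pdev->vpci = NULL;
    toy_unlock(&pdev->vpci_lock);
}
```

Because the lock is no longer a member of the structure being freed, a
reader can always acquire it safely and then decide what to do based on
whether pdev->vpci is still there.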

4. There is an outstanding TODO left unimplemented by this series:
for unprivileged guests vpci_{read|write} need to be re-worked
so that accesses to registers not explicitly handled by the
corresponding vPCI handlers are not passed through to the hardware.
Without that fix, passthrough to guests is completely unsafe, as Xen
allows them full access to the registers.

Xen needs to be sure that every register a guest accesses is not
going to cause the system to malfunction, so Xen needs to keep a
list of the registers it is safe for a guest to access.

For example, we should only expose the PCI capabilities that we know
are safe for a guest to use, i.e.: MSI and MSI-X initially.
The rest of the capabilities should be blocked from guest access,
unless we audit them and declare them safe for a guest to access.

As a reference we might want to look at the approach currently used
by QEMU in order to do PCI passthrough. A very limited set of PCI
capabilities known to be safe for untrusted access are exposed to the
guest and registers need to be explicitly handled or else access is
rejected. Xen needs a fairly similar model in vPCI or else none of
this will be safe for unprivileged access.
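
A minimal sketch of such a default-deny model is shown below. The
allowlist contents and the names (reg_range, access_allowed, guest_read)
are purely illustrative assumptions; this is neither what QEMU does in
detail nor what this series implements. Rejected reads return all ones
here, which is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct reg_range {
    unsigned int offset;
    unsigned int size;
};

/* Hypothetical allowlist of config space registers a guest may access. */
static const struct reg_range allowed[] = {
    { 0x00, 4 }, /* vendor and device ID */
    { 0x04, 2 }, /* command */
    { 0x10, 4 }, /* BAR0 */
};

/* An access is allowed only if it fits entirely inside one listed range. */
static bool access_allowed(unsigned int reg, unsigned int size)
{
    for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
        if (reg >= allowed[i].offset &&
            reg + size <= allowed[i].offset + allowed[i].size)
            return true;

    return false;
}

/* Reads of unlisted registers return all ones; size must be 1, 2 or 4. */
static uint32_t guest_read(unsigned int reg, unsigned int size,
                           uint32_t hw_value)
{
    if (!access_allowed(reg, size))
        return ~(uint32_t)0 >> (32 - 8 * size);

    return hw_value;
}
```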

5. The series was also tested on:
 - x86 PVH Dom0 and doesn't break it.
 - x86 HVM with PCI passthrough to DomU and doesn't break it.

Thank you,
Oleksandr

[1] https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/

Oleksandr Andrushchenko (13):
  rangeset: add RANGESETF_no_print flag
  vpci: fix function attributes for vpci_process_pending
  vpci: cancel pending map/unmap on vpci removal
  vpci: add hooks for PCI device assign/de-assign
  vpci/header: implement guest BAR register handlers
  vpci/header: handle p2m range sets per BAR
  vpci/header: program p2m with guest BAR view
  vpci/header: emulate PCI_COMMAND register for guests
  vpci/header: reset the command register when adding devices
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X
  vpci: add TODO for the registers not explicitly handled

Roger Pau Monne (1):
  vpci: move lock outside of struct vpci

 tools/tests/vpci/emul.h       |   5 +-
 tools/tests/vpci/main.c       |   4 +-
 xen/arch/arm/vpci.c           |  33 +++-
 xen/arch/x86/hvm/vmsi.c       |   8 +-
 xen/common/rangeset.c         |   5 +-
 xen/drivers/Kconfig           |   4 +
 xen/drivers/passthrough/pci.c |  11 ++
 xen/drivers/vpci/header.c     | 352 +++++++++++++++++++++++++++-------
 xen/drivers/vpci/msi.c        |  11 +-
 xen/drivers/vpci/msix.c       |   8 +-
 xen/drivers/vpci/vpci.c       | 252 +++++++++++++++++++++---
 xen/include/xen/pci.h         |   6 +
 xen/include/xen/rangeset.h    |   7 +-
 xen/include/xen/sched.h       |   8 +
 xen/include/xen/vpci.h        |  47 ++++-
 15 files changed, 644 insertions(+), 117 deletions(-)

-- 
2.25.1




* [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2021-11-25 11:06   ` Jan Beulich
  2021-12-15  3:20   ` Volodymyr Babchuk
  2021-11-25 11:02 ` [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending Oleksandr Andrushchenko
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it also simplify the definition of the flags by directly
defining those without helpers.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c      | 5 ++++-
 xen/include/xen/rangeset.h | 7 ++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 885b6b15c229..ea27d651723b 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
     INIT_LIST_HEAD(&r->range_list);
     r->nr_ranges = -1;
 
-    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
     r->flags = flags;
 
     safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
     list_for_each_entry ( r, &d->rangesets, rangeset_list )
     {
+        if ( r->flags & RANGESETF_no_print )
+            continue;
+
         printk("    ");
         rangeset_printk(r);
         printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 135f33f6066f..045fcafa8368 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -48,9 +48,10 @@ void rangeset_limit(
     struct rangeset *r, unsigned int limit);
 
 /* Flags for passing to rangeset_new(). */
- /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+/* Pretty-print range limits in hexadecimal. */
+#define RANGESETF_prettyprint_hex   (1U << 0)
+/* Do not print entries marked with this flag. */
+#define RANGESETF_no_print          (1U << 1)
 
 bool_t __must_check rangeset_is_empty(
     const struct rangeset *r);
-- 
2.25.1




* [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  2021-11-25 11:02 ` [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2021-12-10 17:55   ` Julien Grall
  2021-11-25 11:02 ` [PATCH v5 03/14] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

vpci_process_pending is defined with different attributes: with
__must_check if CONFIG_HAS_VPCI is enabled and without it otherwise.
Fix this by marking both definitions __must_check.

Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary")

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Cc: Roger Pau Monné <roger.pau@citrix.com>

New in v4
---
 xen/include/xen/vpci.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 9ea66e033f11..3f32de9d7eb3 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -247,7 +247,7 @@ static inline void vpci_write(pci_sbdf_t sbdf, unsigned int reg,
     ASSERT_UNREACHABLE();
 }
 
-static inline bool vpci_process_pending(struct vcpu *v)
+static inline bool __must_check vpci_process_pending(struct vcpu *v)
 {
     ASSERT_UNREACHABLE();
     return false;
-- 
2.25.1




* [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  2021-11-25 11:02 ` [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
  2021-11-25 11:02 ` [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-11 15:17   ` Roger Pau Monné
  2022-01-12 14:57   ` Jan Beulich
  2021-11-25 11:02 ` [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Ian Jackson

From: Roger Pau Monne <roger.pau@citrix.com>

This way the lock can be used to check whether vpci is present, and
removal can be performed while holding the lock, in order to make
sure there are no accesses to the contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
New in v5 of this series: this is an updated version of the patch published at
https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/

Changes since v2:
 - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
Changes since v1:
 - Assert that vpci_lock is locked in vpci_remove_device_locked.
 - Remove double newline.
 - Shrink critical section in vpci_{read/write}.
---
 tools/tests/vpci/emul.h       |  5 ++-
 tools/tests/vpci/main.c       |  4 +--
 xen/arch/x86/hvm/vmsi.c       |  8 ++---
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c     | 21 +++++++----
 xen/drivers/vpci/msi.c        | 11 ++++--
 xen/drivers/vpci/msix.c       |  8 ++---
 xen/drivers/vpci/vpci.c       | 68 +++++++++++++++++++++++------------
 xen/include/xen/pci.h         |  1 +
 xen/include/xen/vpci.h        |  5 +--
 10 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 2e1d3057c9d8..d018fb5eef21 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -44,6 +44,7 @@ struct domain {
 };
 
 struct pci_dev {
+    bool vpci_lock;
     struct vpci *vpci;
 };
 
@@ -53,10 +54,8 @@ struct vcpu
 };
 
 extern const struct vcpu *current;
-extern const struct pci_dev test_pdev;
+extern struct pci_dev test_pdev;
 
-typedef bool spinlock_t;
-#define spin_lock_init(l) (*(l) = false)
 #define spin_lock(l) (*(l) = true)
 #define spin_unlock(l) (*(l) = false)
 
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006bb9..26c95b08b6b1 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -23,7 +23,8 @@ static struct vpci vpci;
 
 const static struct domain d;
 
-const struct pci_dev test_pdev = {
+struct pci_dev test_pdev = {
+    .vpci_lock = false,
     .vpci = &vpci,
 };
 
@@ -158,7 +159,6 @@ main(int argc, char **argv)
     int rc;
 
     INIT_LIST_HEAD(&vpci.handlers);
-    spin_lock_init(&vpci.lock);
 
     VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
     VPCI_READ_CHECK(0, 4, r0);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..1f7a37f78264 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
         {
             struct pci_dev *pdev = msix->pdev;
 
-            spin_unlock(&msix->pdev->vpci->lock);
+            spin_unlock(&msix->pdev->vpci_lock);
             process_pending_softirqs();
             /* NB: we assume that pdev cannot go away for an alive domain. */
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 return -EBUSY;
-            if ( pdev->vpci->msix != msix )
+            if ( !pdev->vpci || pdev->vpci->msix != msix )
             {
-                spin_unlock(&pdev->vpci->lock);
+                spin_unlock(&pdev->vpci_lock);
                 return -EAGAIN;
             }
         }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index a9d31293ac09..286808b25e65 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -328,6 +328,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
     *((u8*) &pdev->bus) = bus;
     *((u8*) &pdev->devfn) = devfn;
     pdev->domain = NULL;
+    spin_lock_init(&pdev->vpci_lock);
 
     arch_pci_init_pdev(pdev);
 
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40ff79c33f8f..bd23c0274d48 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+        spin_lock(&v->vpci.pdev->vpci_lock);
+        if ( v->vpci.pdev->vpci )
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(v->vpci.pdev,
+                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+                            !rc && v->vpci.rom_only);
+        spin_unlock(&v->vpci.pdev->vpci_lock);
 
         rangeset_destroy(v->vpci.mem);
         v->vpci.mem = NULL;
@@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                 continue;
         }
 
+        spin_lock(&tmp->vpci_lock);
+        if ( !tmp->vpci )
+        {
+            spin_unlock(&tmp->vpci_lock);
+            continue;
+        }
         for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
         {
             const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
@@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             rc = rangeset_remove_range(mem, start, end);
             if ( rc )
             {
+                spin_unlock(&tmp->vpci_lock);
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
                 rangeset_destroy(mem);
                 return rc;
             }
         }
+        spin_unlock(&tmp->vpci_lock);
     }
 
     ASSERT(dev);
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 5757a7aed20f..e3ce46869dad 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ void vpci_dump_msi(void)
     rcu_read_lock(&domlist_read_lock);
     for_each_domain ( d )
     {
-        const struct pci_dev *pdev;
+        struct pci_dev *pdev;
 
         if ( !has_vpci(d) )
             continue;
@@ -282,8 +282,13 @@ void vpci_dump_msi(void)
             const struct vpci_msi *msi;
             const struct vpci_msix *msix;
 
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 continue;
+            if ( !pdev->vpci )
+            {
+                spin_unlock(&pdev->vpci_lock);
+                continue;
+            }
 
             msi = pdev->vpci->msi;
             if ( msi && msi->enabled )
@@ -323,7 +328,7 @@ void vpci_dump_msi(void)
                 }
             }
 
-            spin_unlock(&pdev->vpci->lock);
+            spin_unlock(&pdev->vpci_lock);
             process_pending_softirqs();
         }
     }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 846f1b8d7038..5310cc3ff520 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -225,7 +225,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -254,7 +254,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
@@ -297,7 +297,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -370,7 +370,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 657697fe3406..ceaac4516ff8 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
-void vpci_remove_device(struct pci_dev *pdev)
+static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
 {
-    if ( !has_vpci(pdev->domain) )
-        return;
+    ASSERT(spin_is_locked(&pdev->vpci_lock));
 
-    spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
     {
         struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
@@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
         list_del(&r->node);
         xfree(r);
     }
-    spin_unlock(&pdev->vpci->lock);
+}
+
+void vpci_remove_device_locked(struct pci_dev *pdev)
+{
+    ASSERT(spin_is_locked(&pdev->vpci_lock));
+
+    vpci_remove_device_handlers_locked(pdev);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
     pdev->vpci = NULL;
 }
 
+void vpci_remove_device(struct pci_dev *pdev)
+{
+    if ( !has_vpci(pdev->domain) )
+        return;
+
+    spin_lock(&pdev->vpci_lock);
+    if ( pdev->vpci )
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
+}
+
 int vpci_add_handlers(struct pci_dev *pdev)
 {
+    struct vpci *vpci;
     unsigned int i;
     int rc = 0;
 
@@ -68,12 +84,13 @@ int vpci_add_handlers(struct pci_dev *pdev)
     /* We should not get here twice for the same device. */
     ASSERT(!pdev->vpci);
 
-    pdev->vpci = xzalloc(struct vpci);
-    if ( !pdev->vpci )
+    vpci = xzalloc(struct vpci);
+    if ( !vpci )
         return -ENOMEM;
 
+    spin_lock(&pdev->vpci_lock);
+    pdev->vpci = vpci;
     INIT_LIST_HEAD(&pdev->vpci->handlers);
-    spin_lock_init(&pdev->vpci->lock);
 
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
@@ -83,7 +100,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
     }
 
     if ( rc )
-        vpci_remove_device(pdev);
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
 
     return rc;
 }
@@ -152,8 +170,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     r->offset = offset;
     r->private = data;
 
-    spin_lock(&vpci->lock);
-
     /* The list of handlers must be kept sorted at all times. */
     list_for_each ( prev, &vpci->handlers )
     {
@@ -165,14 +181,12 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
             break;
         if ( cmp == 0 )
         {
-            spin_unlock(&vpci->lock);
             xfree(r);
             return -EEXIST;
         }
     }
 
     list_add_tail(&r->node, prev);
-    spin_unlock(&vpci->lock);
 
     return 0;
 }
@@ -183,7 +197,6 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
     const struct vpci_register r = { .offset = offset, .size = size };
     struct vpci_register *rm;
 
-    spin_lock(&vpci->lock);
     list_for_each_entry ( rm, &vpci->handlers, node )
     {
         int cmp = vpci_register_cmp(&r, rm);
@@ -195,14 +208,12 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
         if ( !cmp && rm->offset == offset && rm->size == size )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
             return 0;
         }
         if ( cmp <= 0 )
             break;
     }
-    spin_unlock(&vpci->lock);
 
     return -ENOENT;
 }
@@ -311,7 +322,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
     uint32_t data = ~(uint32_t)0;
@@ -327,7 +338,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
     if ( !pdev )
         return vpci_read_hw(sbdf, reg, size);
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        return vpci_read_hw(sbdf, reg, size);
+    }
 
     /* Read from the hardware or the emulated register handlers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -370,6 +386,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
     {
@@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
         data = merge_result(data, tmp_data, size - data_offset, data_offset);
     }
-    spin_unlock(&pdev->vpci->lock);
 
     return data & (0xffffffff >> (32 - 8 * size));
 }
@@ -414,7 +430,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
     const unsigned long *ro_map = pci_get_ro_map(sbdf.seg);
@@ -440,7 +456,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
         return;
     }
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        vpci_write_hw(sbdf, reg, size, data);
+        return;
+    }
+
 
     /* Write the value to the hardware or emulated registers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
         vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
                       data >> (data_offset * 8));
-
-    spin_unlock(&pdev->vpci->lock);
 }
 
 /* Helper function to check an access size and alignment on vpci space. */
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index b6d7e454f814..3f60d6c6c6dd 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -134,6 +134,7 @@ struct pci_dev {
     u64 vf_rlen[6];
 
     /* Data for vPCI. */
+    spinlock_t vpci_lock;
     struct vpci *vpci;
 };
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 3f32de9d7eb3..8b22bdef11d0 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,8 +30,9 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+void vpci_remove_device_locked(struct pci_dev *pdev);
 
-/* Add/remove a register handler. */
+/* Add/remove a register handler. Must be called holding the vpci_lock. */
 int __must_check vpci_add_register(struct vpci *vpci,
                                    vpci_read_t *read_handler,
                                    vpci_write_t *write_handler,
@@ -60,7 +61,6 @@ bool __must_check vpci_process_pending(struct vcpu *v);
 struct vpci {
     /* List of vPCI handlers for a device. */
     struct list_head handlers;
-    spinlock_t lock;
 
 #ifdef __XEN__
     /* Hide the rest of the vpci struct from the user-space test harness. */
@@ -231,6 +231,7 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
 }
 
 static inline void vpci_remove_device(struct pci_dev *pdev) { }
+static inline void vpci_remove_device_locked(struct pci_dev *pdev) { }
 
 static inline void vpci_dump_msi(void) { }
 
-- 
2.25.1




* [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (2 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 03/14] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-11 16:57   ` Roger Pau Monné
                     ` (2 more replies)
  2021-11-25 11:02 ` [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
                   ` (10 subsequent siblings)
  14 siblings, 3 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

When vPCI is removed for a PCI device, deferred map/unmap work may
still be scheduled for that device.
The following scenario illustrates the problem:

pci_physdev_op
   pci_add_device
       init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
   iommu_add_device <- FAILS
   vpci_remove_device -> xfree(pdev->vpci)

leave_hypervisor_to_guest
   vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL

For the hardware domain we continue execution as the worst that
could happen is that MMIO mappings are left in place when the
device has been deassigned.

For unprivileged domains that get a failure in the middle of a vPCI
{un}map operation we need to destroy the domain, as we don't know in
which state the p2m is. This can only happen in vpci_process_pending for
DomUs as they won't be allowed to call pci_add_device.
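
The cancellation idea can be sketched in isolation as below. All toy_*
names are hypothetical, locking is omitted for brevity, and the pending
flag merely stands in for v->vpci.mem being non-NULL; the real code
destroys a rangeset instead of clearing a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pdev {
    bool cancel_pending; /* set once removal has started */
    bool has_vpci;       /* stands in for pdev->vpci != NULL */
};

struct toy_vcpu {
    struct toy_pdev *pdev; /* device the deferred work belongs to */
    bool pending;          /* stands in for v->vpci.mem != NULL */
};

/* Drop deferred work queued for pdev on every vCPU of the domain. */
static void toy_cancel_pending(struct toy_vcpu *vcpus, size_t n,
                               struct toy_pdev *pdev)
{
    for (size_t i = 0; i < n; i++)
        if (vcpus[i].pending && vcpus[i].pdev == pdev)
            vcpus[i].pending = false;
}

/* Removal marks the device first, then cancels work, then frees vpci. */
static void toy_remove(struct toy_vcpu *vcpus, size_t n,
                       struct toy_pdev *pdev)
{
    pdev->cancel_pending = true;
    toy_cancel_pending(vcpus, n, pdev);
    pdev->has_vpci = false;
}

/* Returns true only if deferred work was actually processed. */
static bool toy_process_pending(struct toy_vcpu *v)
{
    if (!v->pdev || v->pdev->cancel_pending || !v->pending)
        return false;

    v->pending = false;
    return true;
}
```

Work queued for other devices is left untouched, which is why the loop
compares the vCPU's device pointer against the one being removed.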

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Cc: Roger Pau Monné <roger.pau@citrix.com>
---
Since v4:
 - crash guest domain if map/unmap operation didn't succeed
 - re-work vpci cancel work to cancel work on all vCPUs
 - use new locking scheme with pdev->vpci_lock
New in v4

Fixes: 86dbcf6e30cb ("vpci: cancel pending map/unmap on vpci removal")

---

 xen/drivers/vpci/header.c | 49 ++++++++++++++++++++++++++++++---------
 xen/drivers/vpci/vpci.c   |  2 ++
 xen/include/xen/pci.h     |  5 ++++
 xen/include/xen/vpci.h    |  6 +++++
 4 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index bd23c0274d48..ba333fb2f9b0 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -131,7 +131,13 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
 
 bool vpci_process_pending(struct vcpu *v)
 {
-    if ( v->vpci.mem )
+    struct pci_dev *pdev = v->vpci.pdev;
+
+    if ( !pdev )
+        return false;
+
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
     {
         struct map_data data = {
             .d = v->domain,
@@ -140,32 +146,53 @@ bool vpci_process_pending(struct vcpu *v)
         int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
 
         if ( rc == -ERESTART )
+        {
+            spin_unlock(&pdev->vpci_lock);
             return true;
+        }
 
-        spin_lock(&v->vpci.pdev->vpci_lock);
-        if ( v->vpci.pdev->vpci )
+        if ( pdev->vpci )
             /* Disable memory decoding unconditionally on failure. */
-            modify_decoding(v->vpci.pdev,
+            modify_decoding(pdev,
                             rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
                             !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci_lock);
 
-        rangeset_destroy(v->vpci.mem);
-        v->vpci.mem = NULL;
         if ( rc )
+        {
             /*
              * FIXME: in case of failure remove the device from the domain.
              * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain will likely need to be
-             * killed in order to avoid leaking stale p2m mappings on
-             * failure.
+             * safe for Dom0, for DomUs the domain needs to be killed in order
+             * to avoid leaking stale p2m mappings on failure.
              */
-            vpci_remove_device(v->vpci.pdev);
+            if ( is_hardware_domain(v->domain) )
+                vpci_remove_device_locked(pdev);
+            else
+                domain_crash(v->domain);
+        }
     }
+    spin_unlock(&pdev->vpci_lock);
 
     return false;
 }
 
+void vpci_cancel_pending_locked(struct pci_dev *pdev)
+{
+    struct vcpu *v;
+
+    ASSERT(spin_is_locked(&pdev->vpci_lock));
+
+    /* Cancel any pending work now on all vCPUs. */
+    for_each_vcpu( pdev->domain, v )
+    {
+        if ( v->vpci.mem && (v->vpci.pdev == pdev) )
+        {
+            rangeset_destroy(v->vpci.mem);
+            v->vpci.mem = NULL;
+        }
+    }
+}
+
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
                             struct rangeset *mem, uint16_t cmd)
 {
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index ceaac4516ff8..37103e207635 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -54,7 +54,9 @@ void vpci_remove_device_locked(struct pci_dev *pdev)
 {
     ASSERT(spin_is_locked(&pdev->vpci_lock));
 
+    pdev->vpci_cancel_pending = true;
     vpci_remove_device_handlers_locked(pdev);
+    vpci_cancel_pending_locked(pdev);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3f60d6c6c6dd..52d302ac5f35 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -135,6 +135,11 @@ struct pci_dev {
 
     /* Data for vPCI. */
     spinlock_t vpci_lock;
+    /*
+     * Set if PCI device is being removed now and we need to cancel any
+     * pending map/unmap operations.
+     */
+    bool vpci_cancel_pending;
     struct vpci *vpci;
 };
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 8b22bdef11d0..cfff87e5801e 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -57,6 +57,7 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
  * should not run.
  */
 bool __must_check vpci_process_pending(struct vcpu *v);
+void vpci_cancel_pending_locked(struct pci_dev *pdev);
 
 struct vpci {
     /* List of vPCI handlers for a device. */
@@ -253,6 +254,11 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
     ASSERT_UNREACHABLE();
     return false;
 }
+
+static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
+{
+    ASSERT_UNREACHABLE();
+}
 #endif
 
 #endif
-- 
2.25.1




* [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (3 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-12 12:12   ` Roger Pau Monné
  2022-01-13 11:40   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 06/14] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

When a PCI device gets assigned or de-assigned, some work needs to be done
on the vPCI side for that device. Introduce a pair of hooks so vPCI can
handle that.
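
The intended flow of the two hooks can be sketched with a toy model. This is
only an illustration under stated assumptions: struct toy_dev, toy_init_fn,
toy_assign() and toy_deassign() are hypothetical names, not Xen APIs. The
sketch only mirrors the ordering described here: handlers of the previous
owner are dropped first, the init steps then run for the new owner, and a
failing init step rolls everything back.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev plus its vPCI state. */
struct toy_dev {
    int owner;                 /* domain id that currently owns the device */
    unsigned int nr_handlers;  /* number of installed register handlers */
};

/* Hypothetical init step, playing the role of a REGISTER_VPCI_INIT entry. */
typedef int (*toy_init_fn)(struct toy_dev *dev);

static int toy_init_ok(struct toy_dev *dev)
{
    dev->nr_handlers++;
    return 0;
}

static int toy_init_fail(struct toy_dev *dev)
{
    (void)dev;
    return -1;
}

/* De-assign hook: drop every handler installed for the current owner. */
static void toy_deassign(struct toy_dev *dev)
{
    dev->nr_handlers = 0;
}

/* Assign hook: clean up after the old owner, then run all init steps. */
static int toy_assign(struct toy_dev *dev, int new_owner,
                      const toy_init_fn *init, size_t nr_init)
{
    size_t i;
    int rc = 0;

    toy_deassign(dev);

    for ( i = 0; i < nr_init; i++ )
    {
        rc = init[i](dev);
        if ( rc )
            break;
    }

    if ( rc )
        toy_deassign(dev);     /* roll back partially installed handlers */
    else
        dev->owner = new_owner;

    return rc;
}
```

With two succeeding init steps the device ends up with two handlers and the
new owner; if any step fails, no handlers are left behind and the owner is
unchanged.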

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v4:
 - de-assign vPCI from the previous domain on device assignment
 - do not remove handlers in vpci_assign_device as those must not
   exist at that point
Since v3:
 - remove toolstack roll-back description from the commit message
   as errors are to be handled with proper cleanup in Xen itself
 - remove __must_check
 - remove redundant rc check while assigning devices
 - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
 - use REGISTER_VPCI_INIT machinery to run required steps on device
   init/assign: add run_vpci_init helper
Since v2:
- define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
  for x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - extended the commit message
---
 xen/drivers/Kconfig           |  4 +++
 xen/drivers/passthrough/pci.c | 10 ++++++
 xen/drivers/vpci/vpci.c       | 61 +++++++++++++++++++++++++++++------
 xen/include/xen/vpci.h        | 16 +++++++++
 4 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..780490cf8e39 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
 	bool
 
+config HAS_VPCI_GUEST_SUPPORT
+	bool
+	depends on HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 286808b25e65..d9ef91571adf 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -874,6 +874,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
     if ( ret )
         goto out;
 
+    ret = vpci_deassign_device(d, pdev);
+    if ( ret )
+        goto out;
+
     if ( pdev->domain == hardware_domain  )
         pdev->quarantine = false;
 
@@ -1429,6 +1433,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     ASSERT(pdev && (pdev->domain == hardware_domain ||
                     pdev->domain == dom_io));
 
+    rc = vpci_deassign_device(pdev->domain, pdev);
+    if ( rc )
+        goto done;
+
     rc = pdev_msix_assign(d, pdev);
     if ( rc )
         goto done;
@@ -1446,6 +1454,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
         rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
     }
 
+    rc = vpci_assign_device(d, pdev);
+
  done:
     if ( rc )
         printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 37103e207635..a9e9e8ec438c 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -74,12 +74,26 @@ void vpci_remove_device(struct pci_dev *pdev)
     spin_unlock(&pdev->vpci_lock);
 }
 
-int vpci_add_handlers(struct pci_dev *pdev)
+static int run_vpci_init(struct pci_dev *pdev)
 {
-    struct vpci *vpci;
     unsigned int i;
     int rc = 0;
 
+    for ( i = 0; i < NUM_VPCI_INIT; i++ )
+    {
+        rc = __start_vpci_array[i](pdev);
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
+int vpci_add_handlers(struct pci_dev *pdev)
+{
+    struct vpci *vpci;
+    int rc;
+
     if ( !has_vpci(pdev->domain) )
         return 0;
 
@@ -94,19 +108,48 @@ int vpci_add_handlers(struct pci_dev *pdev)
     pdev->vpci = vpci;
     INIT_LIST_HEAD(&pdev->vpci->handlers);
 
-    for ( i = 0; i < NUM_VPCI_INIT; i++ )
-    {
-        rc = __start_vpci_array[i](pdev);
-        if ( rc )
-            break;
-    }
-
+    rc = run_vpci_init(pdev);
     if ( rc )
         vpci_remove_device_locked(pdev);
     spin_unlock(&pdev->vpci_lock);
 
     return rc;
 }
+
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned to guest. */
+int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
+{
+    int rc;
+
+    /* It only makes sense to assign for hwdom or guest domain. */
+    if ( is_system_domain(d) || !has_vpci(d) )
+        return 0;
+
+    spin_lock(&pdev->vpci_lock);
+    rc = run_vpci_init(pdev);
+    spin_unlock(&pdev->vpci_lock);
+    if ( rc )
+        vpci_deassign_device(d, pdev);
+
+    return rc;
+}
+
+/* Notify vPCI that device is de-assigned from guest. */
+int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
+{
+    /* It only makes sense to de-assign from hwdom or guest domain. */
+    if ( is_system_domain(d) || !has_vpci(d) )
+        return 0;
+
+    spin_lock(&pdev->vpci_lock);
+    vpci_remove_device_handlers_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
+
+    return 0;
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index cfff87e5801e..ed127a08a953 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -261,6 +261,22 @@ static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
 }
 #endif
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+/* Notify vPCI that device is assigned/de-assigned to/from guest. */
+int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
+int vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
+#else
+static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
+{
+    return 0;
+}
+
+static inline int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
+{
+    return 0;
+}
+#endif
+
 #endif
 
 /*
-- 
2.25.1




* [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (4 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2021-11-25 16:28   ` Bertrand Marquis
                     ` (2 more replies)
  2021-11-25 11:02 ` [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
                   ` (8 subsequent siblings)
  14 siblings, 3 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add relevant vpci register handlers when assigning PCI device to a domain
and remove those when de-assigning. This allows having different
handlers for different domains, e.g. hwdom and other guests.

Emulate guest BAR register values: this allows creating a guest view
of the registers and emulates size and properties probe as it is done
during PCI device enumeration by the guest.
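
The size probe mentioned above can be illustrated with a stand-alone sketch.
It is a simplified model rather than the code from this patch: struct
sketch_bar and the sketch_* helpers are hypothetical, and the SKETCH_*
constants are assumed to mirror the PCI_BASE_ADDRESS_MEM_* definitions. A
guest writes all ones to the BAR and reads it back; the emulated register
keeps the size-aligned address bits and the read-only type bits, so the
guest can derive the BAR size without ever touching the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror PCI_BASE_ADDRESS_MEM_{MASK,TYPE_64,PREFETCH}. */
#define SKETCH_MEM_MASK      (~0x0fULL)
#define SKETCH_MEM_TYPE_64   0x04u
#define SKETCH_MEM_PREFETCH  0x08u

/* Hypothetical, reduced version of the per-BAR state. */
struct sketch_bar {
    uint64_t guest_reg;   /* guest view: address plus lower flag bits */
    uint64_t size;        /* power of two */
    bool is_64bit;
    bool prefetchable;
};

/* Emulate a 32-bit write to the low (hi == false) or high half. */
static void sketch_bar_write(struct sketch_bar *bar, bool hi, uint32_t val)
{
    if ( !hi )
    {
        /* The flag bits are read-only for the guest. */
        val &= (uint32_t)SKETCH_MEM_MASK;
        val |= bar->is_64bit ? SKETCH_MEM_TYPE_64 : 0;
        val |= bar->prefetchable ? SKETCH_MEM_PREFETCH : 0;
    }

    bar->guest_reg &= ~(0xffffffffULL << (hi ? 32 : 0));
    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);

    /* Keep only size-aligned address bits, plus the flag bits. */
    bar->guest_reg &= ~(bar->size - 1) | ~SKETCH_MEM_MASK;
}

/* Emulate a 32-bit read of the low or high half. */
static uint32_t sketch_bar_read(const struct sketch_bar *bar, bool hi)
{
    return (uint32_t)(bar->guest_reg >> (hi ? 32 : 0));
}

/* What a guest does during enumeration: write all ones, read back. */
static uint64_t sketch_probe_size(struct sketch_bar *bar)
{
    uint64_t mask;

    sketch_bar_write(bar, false, 0xffffffffu);
    mask = sketch_bar_read(bar, false) & (uint32_t)SKETCH_MEM_MASK;

    if ( bar->is_64bit )
    {
        sketch_bar_write(bar, true, 0xffffffffu);
        mask |= (uint64_t)sketch_bar_read(bar, true) << 32;
    }
    else
        mask |= 0xffffffff00000000ULL;

    return ~mask + 1;
}
```

For a 32-bit BAR of size 0x1000 the low register reads back as 0xfffff000
after the all-ones write, and the probe yields 0x1000.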

The ROM BAR is only handled for the hardware domain; for guest domains
there is a stub. At the moment PCI expansion ROM handling is supported
for x86 only, and it might not be usable on other architectures without
emulating x86. Other use-cases include using the expansion ROM before
Xen boots, in which case no emulation is needed in Xen itself, or a guest
wanting to use the ROM code, which seems to be rare.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v4:
- updated commit message
- s/guest_addr/guest_reg
Since v3:
- squashed two patches: dynamic add/remove handlers and guest BAR
  handler implementation
- fix guest BAR read of the high part of a 64bit BAR (Roger)
- add error handling to vpci_assign_device
- s/dom%pd/%pd
- blank line before return
Since v2:
- remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
  has been eliminated from being built on x86
Since v1:
 - constify struct pci_dev where possible
 - do not open code is_system_domain()
 - simplify some code
 - use gdprintk + error code instead of gprintk
 - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
   so these do not get compiled for x86
 - removed unneeded is_system_domain check
 - re-work guest read/write to be much simpler and do more work on write
   than read which is expected to be called more frequently
 - removed one too obvious comment
---
 xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
 xen/include/xen/vpci.h    |  3 ++
 2 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index ba333fb2f9b0..8880d34ebf8e 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
     pci_conf_write32(pdev->sbdf, reg, val);
 }
 
+static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t val, void *data)
+{
+    struct vpci_bar *bar = data;
+    bool hi = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+    else
+    {
+        val &= PCI_BASE_ADDRESS_MEM_MASK;
+        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
+                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
+        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
+    }
+
+    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
+    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
+
+    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
+                               void *data)
+{
+    const struct vpci_bar *bar = data;
+    bool hi = false;
+
+    if ( bar->type == VPCI_BAR_MEM64_HI )
+    {
+        ASSERT(reg > PCI_BASE_ADDRESS_0);
+        bar--;
+        hi = true;
+    }
+
+    return bar->guest_reg >> (hi ? 32 : 0);
+}
+
 static void rom_write(const struct pci_dev *pdev, unsigned int reg,
                       uint32_t val, void *data)
 {
@@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
         rom->addr = val & PCI_ROM_ADDRESS_MASK;
 }
 
+static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t val, void *data)
+{
+}
+
+static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
+                               void *data)
+{
+    return 0xffffffff;
+}
+
 static int init_bars(struct pci_dev *pdev)
 {
     uint16_t cmd;
@@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
     struct vpci_header *header = &pdev->vpci->header;
     struct vpci_bar *bars = header->bars;
     int rc;
+    bool is_hwdom = is_hardware_domain(pdev->domain);
 
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
@@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
         {
             bars[i].type = VPCI_BAR_MEM64_HI;
-            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
-                                   4, &bars[i]);
+            rc = vpci_add_register(pdev->vpci,
+                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                                   is_hwdom ? bar_write : guest_bar_write,
+                                   reg, 4, &bars[i]);
             if ( rc )
             {
                 pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
@@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
         bars[i].size = size;
         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
-                               &bars[i]);
+        rc = vpci_add_register(pdev->vpci,
+                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
+                               is_hwdom ? bar_write : guest_bar_write,
+                               reg, 4, &bars[i]);
         if ( rc )
         {
             pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
@@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
         header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
                               PCI_ROM_ADDRESS_ENABLE;
 
-        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
-                               4, rom);
+        rc = vpci_add_register(pdev->vpci,
+                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
+                               is_hwdom ? rom_write : guest_rom_write,
+                               rom_reg, 4, rom);
         if ( rc )
             rom->type = VPCI_BAR_EMPTY;
     }
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index ed127a08a953..0a73b14a92dc 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -68,7 +68,10 @@ struct vpci {
     struct vpci_header {
         /* Information about the PCI BARs of this device. */
         struct vpci_bar {
+            /* Physical view of the BAR. */
             uint64_t addr;
+            /* Guest view of the BAR: address and lower bits. */
+            uint64_t guest_reg;
             uint64_t size;
             enum {
                 VPCI_BAR_EMPTY,
-- 
2.25.1




* [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (5 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 06/14] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-12 15:15   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 08/14] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Instead of handling a single range set that contains all the memory
regions of all the BARs and ROM, have one range set per BAR.
As the range sets are now created when a PCI device is added and destroyed
when it is removed, make them named and accounted.

Note that range sets were chosen here despite there being only up to
3 separate ranges in each set (typically just 1): a range set per BAR
was chosen for ease of implementation and re-use of existing code.
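
To illustrate how a single BAR can end up with more than one range, here is
a toy model. It is an assumption-laden sketch: struct toy_set, toy_add() and
toy_remove() are hypothetical and much simpler than Xen's rangeset code, and
the page numbers are made up. Punching two holes into one BAR range (for
example for an MSI-X table page and a PBA page living in the same BAR)
leaves three pieces.

```c
#include <stddef.h>

#define TOY_MAX_RANGES 4

/* Inclusive page-frame range [s, e]. */
struct toy_range {
    unsigned long s, e;
};

/* Hypothetical fixed-capacity stand-in for a per-BAR range set. */
struct toy_set {
    struct toy_range r[TOY_MAX_RANGES];
    size_t nr;
};

static int toy_add(struct toy_set *set, unsigned long s, unsigned long e)
{
    if ( set->nr == TOY_MAX_RANGES )
        return -1;

    set->r[set->nr].s = s;
    set->r[set->nr].e = e;
    set->nr++;

    return 0;
}

/* Remove [s, e] from every range in the set, splitting where needed. */
static int toy_remove(struct toy_set *set, unsigned long s, unsigned long e)
{
    struct toy_set out = { .nr = 0 };
    size_t i;

    for ( i = 0; i < set->nr; i++ )
    {
        const struct toy_range *cur = &set->r[i];

        if ( e < cur->s || s > cur->e )
        {
            /* No overlap: keep the range untouched. */
            if ( toy_add(&out, cur->s, cur->e) )
                return -1;
            continue;
        }

        /* Keep whatever sticks out on either side of the hole. */
        if ( s > cur->s && toy_add(&out, cur->s, s - 1) )
            return -1;
        if ( e < cur->e && toy_add(&out, e + 1, cur->e) )
            return -1;
    }

    *set = out;

    return 0;
}
```

Starting from pages [0x100, 0x10f] and removing pages 0x104 and 0x108 leaves
[0x100, 0x103], [0x105, 0x107] and [0x109, 0x10f]. A BAR without such holes
keeps its single range, which matches the "typically just 1" case.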

This is in preparation for making non-identity mappings in p2m for the
MMIOs/ROM.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Since v4:
- use named range sets for BARs (Jan)
- changes required by the new locking scheme
- updated commit message (Jan)
Since v3:
- re-work vpci_cancel_pending accordingly to the per-BAR handling
- s/num_mem_ranges/map_pending and s/uint8_t/bool
- ASSERT(bar->mem) in modify_bars
- create and destroy the rangesets on add/remove
---
 xen/drivers/vpci/header.c | 190 +++++++++++++++++++++++++++-----------
 xen/drivers/vpci/vpci.c   |  30 +++++-
 xen/include/xen/vpci.h    |   3 +-
 3 files changed, 166 insertions(+), 57 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 8880d34ebf8e..cc49aa68886f 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -137,45 +137,86 @@ bool vpci_process_pending(struct vcpu *v)
         return false;
 
     spin_lock(&pdev->vpci_lock);
-    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        return false;
+    }
+
+    if ( !pdev->vpci_cancel_pending && v->vpci.map_pending )
     {
         struct map_data data = {
             .d = v->domain,
             .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
         };
-        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
+        struct vpci_header *header = &pdev->vpci->header;
+        unsigned int i;
 
-        if ( rc == -ERESTART )
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
         {
-            spin_unlock(&pdev->vpci_lock);
-            return true;
-        }
+            struct vpci_bar *bar = &header->bars[i];
+            int rc;
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
+
+            if ( rc == -ERESTART )
+            {
+                spin_unlock(&pdev->vpci_lock);
+                return true;
+            }
 
-        if ( pdev->vpci )
             /* Disable memory decoding unconditionally on failure. */
-            modify_decoding(pdev,
-                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
                             !rc && v->vpci.rom_only);
 
-        if ( rc )
-        {
-            /*
-             * FIXME: in case of failure remove the device from the domain.
-             * Note that there might still be leftover mappings. While this is
-             * safe for Dom0, for DomUs the domain needs to be killed in order
-             * to avoid leaking stale p2m mappings on failure.
-             */
-            if ( is_hardware_domain(v->domain) )
-                vpci_remove_device_locked(pdev);
-            else
-                domain_crash(v->domain);
+            if ( rc )
+            {
+                /*
+                 * FIXME: in case of failure remove the device from the domain.
+                 * Note that there might still be leftover mappings. While this is
+                 * safe for Dom0, for DomUs the domain needs to be killed in order
+                 * to avoid leaking stale p2m mappings on failure.
+                 */
+                if ( is_hardware_domain(v->domain) )
+                    vpci_remove_device_locked(pdev);
+                else
+                    domain_crash(v->domain);
+
+                break;
+            }
         }
+
+        v->vpci.map_pending = false;
     }
     spin_unlock(&pdev->vpci_lock);
 
     return false;
 }
 
+static void vpci_bar_remove_ranges(const struct pci_dev *pdev)
+{
+    struct vpci_header *header = &pdev->vpci->header;
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
+
+        rc = rangeset_remove_range(bar->mem, 0, ~0ULL);
+        if ( rc )
+            printk(XENLOG_ERR
+                   "%pd %pp failed to remove range set for BAR: %d\n",
+                   pdev->domain, &pdev->sbdf, rc);
+    }
+}
+
 void vpci_cancel_pending_locked(struct pci_dev *pdev)
 {
     struct vcpu *v;
@@ -185,23 +226,33 @@ void vpci_cancel_pending_locked(struct pci_dev *pdev)
     /* Cancel any pending work now on all vCPUs. */
     for_each_vcpu( pdev->domain, v )
     {
-        if ( v->vpci.mem && (v->vpci.pdev == pdev) )
+        if ( v->vpci.map_pending && (v->vpci.pdev == pdev) )
         {
-            rangeset_destroy(v->vpci.mem);
-            v->vpci.mem = NULL;
+            vpci_bar_remove_ranges(pdev);
+            v->vpci.map_pending = false;
         }
     }
 }
 
 static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
-                            struct rangeset *mem, uint16_t cmd)
+                            uint16_t cmd)
 {
     struct map_data data = { .d = d, .map = true };
-    int rc;
+    struct vpci_header *header = &pdev->vpci->header;
+    int rc = 0;
+    unsigned int i;
+
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
 
-    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
-        process_pending_softirqs();
-    rangeset_destroy(mem);
+        if ( rangeset_is_empty(bar->mem) )
+            continue;
+
+        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
+                                              &data)) == -ERESTART )
+            process_pending_softirqs();
+    }
     if ( !rc )
         modify_decoding(pdev, cmd, false);
 
@@ -209,7 +260,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
 }
 
 static void defer_map(struct domain *d, struct pci_dev *pdev,
-                      struct rangeset *mem, uint16_t cmd, bool rom_only)
+                      uint16_t cmd, bool rom_only)
 {
     struct vcpu *curr = current;
 
@@ -220,7 +271,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      * started for the same device if the domain is not well-behaved.
      */
     curr->vpci.pdev = pdev;
-    curr->vpci.mem = mem;
+    curr->vpci.map_pending = true;
     curr->vpci.cmd = cmd;
     curr->vpci.rom_only = rom_only;
     /*
@@ -234,42 +285,40 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
 static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
 {
     struct vpci_header *header = &pdev->vpci->header;
-    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     struct pci_dev *tmp, *dev = NULL;
     const struct vpci_msix *msix = pdev->vpci->msix;
-    unsigned int i;
+    unsigned int i, j;
     int rc;
-
-    if ( !mem )
-        return -ENOMEM;
+    bool map_pending;
 
     /*
-     * Create a rangeset that represents the current device BARs memory region
+     * Create a rangeset per BAR that represents the current device memory region
      * and compare it against all the currently active BAR memory regions. If
      * an overlap is found, subtract it from the region to be mapped/unmapped.
      *
-     * First fill the rangeset with all the BARs of this device or with the ROM
+     * First fill the rangesets with all the BARs of this device or with the ROM
      * BAR only, depending on whether the guest is toggling the memory decode
      * bit of the command register, or the enable bit of the ROM BAR register.
      */
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
     {
-        const struct vpci_bar *bar = &header->bars[i];
+        struct vpci_bar *bar = &header->bars[i];
         unsigned long start = PFN_DOWN(bar->addr);
         unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
+        ASSERT(bar->mem);
+
         if ( !MAPPABLE_BAR(bar) ||
              (rom_only ? bar->type != VPCI_BAR_ROM
                        : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
             continue;
 
-        rc = rangeset_add_range(mem, start, end);
+        rc = rangeset_add_range(bar->mem, start, end);
         if ( rc )
         {
             printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
                    start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            goto fail;
         }
     }
 
@@ -280,14 +329,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
         unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
                                      vmsix_table_size(pdev->vpci, i) - 1);
 
-        rc = rangeset_remove_range(mem, start, end);
-        if ( rc )
+        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
         {
-            printk(XENLOG_G_WARNING
-                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
-                   start, end, rc);
-            rangeset_destroy(mem);
-            return rc;
+            const struct vpci_bar *bar = &header->bars[j];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            rc = rangeset_remove_range(bar->mem, start, end);
+            if ( rc )
+            {
+                printk(XENLOG_G_WARNING
+                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
+                       start, end, rc);
+                goto fail;
+            }
         }
     }
 
@@ -325,7 +381,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             unsigned long start = PFN_DOWN(bar->addr);
             unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
 
-            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
+            if ( !bar->enabled ||
+                 !rangeset_overlaps_range(bar->mem, start, end) ||
                  /*
                   * If only the ROM enable bit is toggled check against other
                   * BARs in the same device for overlaps, but not against the
@@ -334,14 +391,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                  (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
                 continue;
 
-            rc = rangeset_remove_range(mem, start, end);
+            rc = rangeset_remove_range(bar->mem, start, end);
             if ( rc )
             {
                 spin_unlock(&tmp->vpci_lock);
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
-                rangeset_destroy(mem);
-                return rc;
+                goto fail;
             }
         }
         spin_unlock(&tmp->vpci_lock);
@@ -360,12 +416,36 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
          * will always be to establish mappings and process all the BARs.
          */
         ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
-        return apply_map(pdev->domain, pdev, mem, cmd);
+        return apply_map(pdev->domain, pdev, cmd);
     }
 
-    defer_map(dev->domain, dev, mem, cmd, rom_only);
+    /* Check whether any memory ranges are left after MSI-X and overlaps. */
+    map_pending = false;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        if ( !rangeset_is_empty(header->bars[i].mem) )
+        {
+            map_pending = true;
+            break;
+        }
+
+    /*
+     * There are cases when a PCI device, a root port for example, has
+     * neither memory space nor IO. In this case the PCI command register
+     * write would be missed, leaving the underlying device non-functional, so:
+     *   - if there are no regions write the command register now
+     *   - if there are regions then defer work and write later on
+     */
+    if ( !map_pending )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
+    else
+        defer_map(dev->domain, dev, cmd, rom_only);
 
     return 0;
+
+ fail:
+    /* Destroy all the ranges we may have added. */
+    vpci_bar_remove_ranges(pdev);
+    return rc;
 }
 
 static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index a9e9e8ec438c..98b12a61be6f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -52,11 +52,16 @@ static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
 
 void vpci_remove_device_locked(struct pci_dev *pdev)
 {
+    struct vpci_header *header = &pdev->vpci->header;
+    unsigned int i;
+
     ASSERT(spin_is_locked(&pdev->vpci_lock));
 
     pdev->vpci_cancel_pending = true;
     vpci_remove_device_handlers_locked(pdev);
     vpci_cancel_pending_locked(pdev);
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        rangeset_destroy(header->bars[i].mem);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
@@ -92,6 +97,8 @@ static int run_vpci_init(struct pci_dev *pdev)
 int vpci_add_handlers(struct pci_dev *pdev)
 {
     struct vpci *vpci;
+    struct vpci_header *header;
+    unsigned int i;
     int rc;
 
     if ( !has_vpci(pdev->domain) )
@@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
     pdev->vpci = vpci;
     INIT_LIST_HEAD(&pdev->vpci->handlers);
 
+    header = &pdev->vpci->header;
+    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+    {
+        struct vpci_bar *bar = &header->bars[i];
+        char str[32];
+
+        snprintf(str, sizeof(str), "%pp:BAR%u", &pdev->sbdf, i);
+        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
+        if ( !bar->mem )
+        {
+            rc = -ENOMEM;
+            goto fail;
+        }
+    }
+
     rc = run_vpci_init(pdev);
     if ( rc )
-        vpci_remove_device_locked(pdev);
+        goto fail;
+
     spin_unlock(&pdev->vpci_lock);
 
+    return 0;
+
+ fail:
+    vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
     return rc;
 }
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0a73b14a92dc..18319fc329f9 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -73,6 +73,7 @@ struct vpci {
             /* Guest view of the BAR: address and lower bits. */
             uint64_t guest_reg;
             uint64_t size;
+            struct rangeset *mem;
             enum {
                 VPCI_BAR_EMPTY,
                 VPCI_BAR_IO,
@@ -147,9 +148,9 @@ struct vpci {
 
 struct vpci_vcpu {
     /* Per-vcpu structure to store state while {un}mapping of PCI BARs. */
-    struct rangeset *mem;
     struct pci_dev *pdev;
     uint16_t cmd;
+    bool map_pending : 1;
     bool rom_only : 1;
 };
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (6 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-13 10:22   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Take the guest's view of the BARs into account and program its p2m
accordingly: the gfn is the guest's view of the BAR and the mfn is the
physical BAR value as set up by the PCI bus driver in the hardware domain.
This way the hardware domain sees physical BAR values and the guest sees
emulated ones.
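
As a rough illustration of the address arithmetic described above (not
part of the patch), the guest frame for a given machine frame inside a BAR
is the guest BAR start frame plus the offset from the physical BAR start
frame. The struct, the helper name and the 4 KiB page size below are
assumptions made for this sketch only; they are not Xen definitions:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Hypothetical, simplified stand-in for the relevant BAR fields. */
struct bar_view {
    uint64_t addr;       /* physical BAR address */
    uint64_t guest_reg;  /* guest view of the BAR address */
};

/*
 * Return the guest frame number at which machine frame 'mfn' of the BAR
 * should appear: the guest BAR start frame plus the offset of 'mfn' from
 * the physical BAR start frame.
 */
static uint64_t bar_mfn_to_gfn(const struct bar_view *bar, uint64_t mfn)
{
    uint64_t start_mfn = PFN_DOWN(bar->addr);
    uint64_t start_gfn = PFN_DOWN(bar->guest_reg);

    assert(mfn >= start_mfn);   /* the frame must not precede the BAR */

    return start_gfn + (mfn - start_mfn);
}
```

With a BAR physically at 0xe0000000 that the guest sees at 0x23000000,
machine frame 0xe0002 would be mapped at guest frame 0x23002.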

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v4:
- moved start_{gfn|mfn} calculation into map_range
- pass vpci_bar in the map_data instead of start_{gfn|mfn}
- s/guest_addr/guest_reg
Since v3:
- updated comment (Roger)
- removed gfn_add(map->start_gfn, rc); which is wrong
- use v->domain instead of v->vpci.pdev->domain
- removed odd e.g. in comment
- s/d%d/%pd in altered code
- use gdprintk for map/unmap logs
Since v2:
- improve readability for data.start_gfn and restructure ?: construct
Since v1:
 - s/MSI/MSI-X in comments

---
---
 xen/drivers/vpci/header.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index cc49aa68886f..b0499d32c5d8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -30,6 +30,7 @@
 
 struct map_data {
     struct domain *d;
+    const struct vpci_bar *bar;
     bool map;
 };
 
@@ -41,8 +42,25 @@ static int map_range(unsigned long s, unsigned long e, void *data,
 
     for ( ; ; )
     {
+        /* Start address of the BAR as seen by the guest. */
+        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
+                                        ? map->bar->addr
+                                        : map->bar->guest_reg));
+        /* Physical start address of the BAR. */
+        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
         unsigned long size = e - s + 1;
 
+        /*
+         * Ranges to be mapped don't always start at the BAR start address, as
+         * there can be holes or partially consumed ranges. Account for the
+         * offset of the current address from the BAR start.
+         */
+        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
+
+        gdprintk(XENLOG_G_DEBUG,
+                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
+                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
+                 map->d);
         /*
          * ARM TODOs:
          * - On ARM whether the memory is prefetchable or not should be passed
@@ -52,8 +70,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
          * - {un}map_mmio_regions doesn't support preemption.
          */
 
-        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
-                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
+        rc = map->map ? map_mmio_regions(map->d, start_gfn,
+                                         size, _mfn(s))
+                      : unmap_mmio_regions(map->d, start_gfn,
+                                           size, _mfn(s));
         if ( rc == 0 )
         {
             *c += size;
@@ -62,8 +82,8 @@ static int map_range(unsigned long s, unsigned long e, void *data,
         if ( rc < 0 )
         {
             printk(XENLOG_G_WARNING
-                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
-                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
+                   "Failed to identity %smap [%lx, %lx] for %pd: %d\n",
+                   map->map ? "" : "un", s, e, map->d, rc);
             break;
         }
         ASSERT(rc < size);
@@ -160,6 +180,7 @@ bool vpci_process_pending(struct vcpu *v)
             if ( rangeset_is_empty(bar->mem) )
                 continue;
 
+            data.bar = bar;
             rc = rangeset_consume_ranges(bar->mem, map_range, &data);
 
             if ( rc == -ERESTART )
@@ -249,6 +270,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
         if ( rangeset_is_empty(bar->mem) )
             continue;
 
+        data.bar = bar;
         while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
                                               &data)) == -ERESTART )
             process_pending_softirqs();
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (7 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 08/14] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-13 10:50   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 10/14] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Add basic emulation of the PCI_COMMAND register for guests. At the moment
only the PCI_COMMAND_INTX_DISABLE bit is emulated; the rest is not emulated
yet and is left as a TODO.
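
A minimal standalone sketch of the rule described above (not the patch
itself): whatever the guest writes, INTx stays disabled while MSI is
enabled. The struct and function names are hypothetical; only the bit
value (bit 10 of the command register) comes from the PCI specification:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400  /* bit 10 of the command register */

/* Hypothetical per-device state; stands in for the vPCI MSI state. */
struct guest_dev {
    bool msi_enabled;
};

/*
 * Return the command register value to write to hardware for a guest
 * write of 'cmd': force INTx off while MSI is enabled, pass all other
 * bits through unchanged (they are not emulated in this sketch).
 */
static uint16_t emulate_cmd(const struct guest_dev *dev, uint16_t cmd)
{
    assert(dev != NULL);

    if ( dev->msi_enabled )
        cmd |= PCI_COMMAND_INTX_DISABLE;

    return cmd;
}
```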

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index b0499d32c5d8..2e44055946b0 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t cmd, void *data)
+{
+    /* TODO: Add proper emulation for all bits of the command register. */
+
+#ifdef CONFIG_HAS_PCI_MSI
+    if ( pdev->vpci->msi->enabled )
+    {
+        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
+        cmd |= PCI_COMMAND_INTX_DISABLE;
+    }
+#endif
+
+    cmd_write(pdev, reg, cmd, data);
+}
+
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
                       uint32_t val, void *data)
 {
@@ -663,8 +679,9 @@ static int init_bars(struct pci_dev *pdev)
     }
 
     /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    rc = vpci_add_register(pdev->vpci, vpci_hw_read16,
+                           is_hwdom ? cmd_write : guest_cmd_write,
+                           PCI_COMMAND, 2, header);
     if ( rc )
         return rc;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 10/14] vpci/header: reset the command register when adding devices
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (8 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-13 11:07   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Reset the command register when passing through a PCI device: the
memory decoding bits in the command register may already be set at
that point. In that case a guest OS may never write to the command
register to update memory decoding, so the guest mappings (the guest's
view of the BARs) are never updated.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v1:
 - do not write 0 to the command register, but respect host settings.
---
 xen/drivers/vpci/header.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 2e44055946b0..41dda3c43d56 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -491,8 +491,7 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
-static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
-                            uint32_t cmd, void *data)
+static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd)
 {
     /* TODO: Add proper emulation for all bits of the command register. */
 
@@ -504,7 +503,13 @@ static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
     }
 #endif
 
-    cmd_write(pdev, reg, cmd, data);
+    return cmd;
+}
+
+static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+                            uint32_t cmd, void *data)
+{
+    cmd_write(pdev, reg, emulate_cmd_reg(pdev, cmd), data);
 }
 
 static void bar_write(const struct pci_dev *pdev, unsigned int reg,
@@ -678,6 +683,10 @@ static int init_bars(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
+    /* Reset the command register for the guest. */
+    if ( !is_hwdom )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, emulate_cmd_reg(pdev, 0));
+
     /* Setup a handler for the command register. */
     rc = vpci_add_register(pdev->vpci, vpci_hw_read16,
                            is_hwdom ? cmd_write : guest_cmd_write,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (9 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 10/14] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-12 15:39   ` Jan Beulich
  2022-01-13 11:35   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Assign an SBDF on bus 0 to each PCI device being passed through.
In the resulting topology the PCIe devices reside on bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to the 32 devices which are allowed on
a single PCI bus.

Please note that at the moment only function 0 of a multifunction
device can be passed through.
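
The slot allocation described above can be sketched in isolation as
follows. This is an illustration only: it uses a plain 32-bit word and
hypothetical helper names rather than the Xen bitmap helpers, and it
always assigns function 0:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VIRT_DEV 32u   /* slots available on a single PCI bus */
#define DEVFN(slot, fn) ((uint8_t)((((slot) & 0x1f) << 3) | ((fn) & 0x07)))

/*
 * Claim the lowest free slot in '*map' and return its devfn (function 0),
 * or -1 when all 32 slots are already taken.
 */
static int alloc_virt_devfn(uint32_t *map)
{
    unsigned int slot;

    assert(map != NULL);

    for ( slot = 0; slot < MAX_VIRT_DEV; slot++ )
    {
        if ( !(*map & (UINT32_C(1) << slot)) )
        {
            *map |= UINT32_C(1) << slot;
            return DEVFN(slot, 0);
        }
    }

    return -1;
}

/* Release the slot encoded in 'devfn' so it can be reused. */
static void free_virt_devfn(uint32_t *map, uint8_t devfn)
{
    assert(map != NULL);

    *map &= ~(UINT32_C(1) << (devfn >> 3));
}
```

Starting from an empty map, the first two devices would get devfn 0x00
and 0x08 (slots 0 and 1); freeing slot 0 makes it the next one handed out.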

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
    functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/vpci/vpci.c | 51 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/sched.h |  8 +++++++
 xen/include/xen/vpci.h  | 11 +++++++++
 3 files changed, 70 insertions(+)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 98b12a61be6f..c2fb4d4db233 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -114,6 +114,9 @@ int vpci_add_handlers(struct pci_dev *pdev)
     spin_lock(&pdev->vpci_lock);
     pdev->vpci = vpci;
     INIT_LIST_HEAD(&pdev->vpci->handlers);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    pdev->vpci->guest_sbdf.sbdf = ~0;
+#endif
 
     header = &pdev->vpci->header;
     for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
@@ -145,6 +148,53 @@ int vpci_add_handlers(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+int vpci_add_virtual_device(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    pci_sbdf_t sbdf = { 0 };
+    unsigned long new_dev_number;
+
+    /*
+     * Each PCI bus supports 32 devices/slots at max or up to 256 when
+     * there are multi-function ones which are not yet supported.
+     */
+    if ( pdev->info.is_extfn )
+    {
+        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+                 &pdev->sbdf);
+        return -EOPNOTSUPP;
+    }
+
+    new_dev_number = find_first_zero_bit(&d->vpci_dev_assigned_map,
+                                         VPCI_MAX_VIRT_DEV);
+    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
+        return -ENOSPC;
+
+    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+    /*
+     * Both segment and bus number are 0:
+     *  - we emulate a single host bridge for the guest, i.e. segment 0
+     *  - with bus 0 the virtual devices are seen as embedded
+     *    endpoints behind the root complex
+     *
+     * TODO: add support for multi-function devices.
+     */
+    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
+    pdev->vpci->guest_sbdf = sbdf;
+
+    return 0;
+
+}
+REGISTER_VPCI_INIT(vpci_add_virtual_device, VPCI_PRIORITY_MIDDLE);
+
+static void vpci_remove_virtual_device(struct domain *d,
+                                       const struct pci_dev *pdev)
+{
+    __clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
+    pdev->vpci->guest_sbdf.sbdf = ~0;
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
@@ -171,6 +221,7 @@ int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
         return 0;
 
     spin_lock(&pdev->vpci_lock);
+    vpci_remove_virtual_device(d, pdev);
     vpci_remove_device_handlers_locked(pdev);
     spin_unlock(&pdev->vpci_lock);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 28146ee404e6..10bff103317c 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -444,6 +444,14 @@ struct domain
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * The bitmap which shows which device numbers are already used by the
+     * virtual PCI bus topology and is used to assign a unique SBDF to the
+     * next passed through virtual PCI device.
+     */
+    unsigned long vpci_dev_assigned_map;
+#endif
 #endif
 
 #ifdef CONFIG_HAS_PASSTHROUGH
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 18319fc329f9..e5258bd7ce90 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
 
 #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
 
+/*
+ * Maximum number of devices supported by the virtual bus topology:
+ * each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
+
 #define REGISTER_VPCI_INIT(x, p)                \
   static vpci_register_init_t *const x##_entry  \
                __used_section(".data.vpci." p) = x
@@ -143,6 +150,10 @@ struct vpci {
             struct vpci_arch_msix_entry arch;
         } entries[];
     } *msix;
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /* Guest SBDF of the device. */
+    pci_sbdf_t guest_sbdf;
+#endif
 #endif
 };
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (10 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-13 12:18   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Andrushchenko
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are three originators of PCI configuration space accesses:
1. The domain that owns the physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, i.e. the physical view of the registers
vs the guest's view of the configuration space.
2. Guest accesses to passed through PCI devices: we need to properly
map the virtual bus topology to the physical one, i.e. pass the
configuration space access to the corresponding physical device.
3. Emulated host PCI bridge accesses. The bridge doesn't exist in the
physical topology, i.e. it can't be mapped to some physical host bridge.
So, all accesses to the host bridge itself need to be trapped and
emulated.
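
For case 2, the translation amounts to a lookup from the guest SBDF to
the physical one. A simplified sketch, assuming a flat array of assigned
devices and plain 32-bit SBDF values (the real code walks the domain's
device list under the per-device lock; the names here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one passed through device. */
struct passthrough_dev {
    uint32_t guest_sbdf;  /* SBDF the guest uses for the device */
    uint32_t phys_sbdf;   /* SBDF of the physical device */
};

/*
 * If '*sbdf' matches the guest SBDF of an assigned device, replace it
 * with the physical SBDF and return true. Otherwise leave it untouched
 * and return false.
 */
static bool translate_sbdf(const struct passthrough_dev *devs, size_t n,
                           uint32_t *sbdf)
{
    size_t i;

    assert(sbdf != NULL);

    for ( i = 0; i < n; i++ )
    {
        if ( devs[i].guest_sbdf == *sbdf )
        {
            *sbdf = devs[i].phys_sbdf;
            return true;
        }
    }

    return false;
}
```

An access for which no device matches is not forwarded to hardware; in
the patch the MMIO handlers simply return in that case.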

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c     | 18 ++++++++++++++++++
 xen/drivers/vpci/vpci.c | 27 +++++++++++++++++++++++++++
 xen/include/xen/vpci.h  |  1 +
 3 files changed, 46 insertions(+)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 8e801f275879..3d134f42d07e 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -41,6 +41,15 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
     /* data is needed to prevent a pointer cast on 32bit */
     unsigned long data;
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;
+#endif
+
     if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
                         1U << info->dabt.size, &data) )
     {
@@ -59,6 +68,15 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
     struct pci_host_bridge *bridge = p;
     pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+    /*
+     * For the passed through devices we need to map their virtual SBDF
+     * to the physical PCI device being passed through.
+     */
+    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
+        return 1;
+#endif
+
     return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
                            1U << info->dabt.size, r);
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index c2fb4d4db233..bdc8c63f73fa 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -195,6 +195,33 @@ static void vpci_remove_virtual_device(struct domain *d,
     pdev->vpci->guest_sbdf.sbdf = ~0;
 }
 
+/*
+ * Find the physical device which is mapped to the virtual device
+ * and translate virtual SBDF to the physical one.
+ */
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
+{
+    struct pci_dev *pdev;
+
+    for_each_pdev( d, pdev )
+    {
+        bool found;
+
+        spin_lock(&pdev->vpci_lock);
+        found = pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf);
+        spin_unlock(&pdev->vpci_lock);
+
+        if ( found )
+        {
+            /* Replace guest SBDF with the physical one. */
+            *sbdf = pdev->sbdf;
+            return true;
+        }
+    }
+
+    return false;
+}
+
 /* Notify vPCI that device is assigned to guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e5258bd7ce90..21d76929391f 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -280,6 +280,7 @@ static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
 /* Notify vPCI that device is assigned/de-assigned to/from guest. */
 int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
 int vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
+bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
 #else
 static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (11 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2022-01-13 13:23   ` Roger Pau Monné
  2021-11-25 11:02 ` [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled Oleksandr Andrushchenko
  2021-12-15 11:56 ` [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So when adding IO trap handlers for the emulated
MSI-X registers we need to explicitly report the additional IO
handlers, so that they are accounted for.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
This actually moved here from the part 2 of the prep work for PCI
passthrough on Arm as it seems to be the proper place for it.

New in v5
---
 xen/arch/arm/vpci.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3d134f42d07e..902f8491e030 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -134,6 +134,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+    unsigned int count;
+
     if ( !has_vpci(d) )
         return 0;
 
@@ -145,7 +147,18 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
     }
 
     /* For a single emulated host bridge's configuration space. */
-    return 1;
+    count = 1;
+
+#ifdef CONFIG_HAS_PCI_MSI
+    /*
+     * There's a single MSI-X MMIO handler that deals with both PBA
+     * and MSI-X tables per each PCI device being passed through.
+     * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+     */
+    count += VPCI_MAX_VIRT_DEV;
+#endif
+
+    return count;
 }
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (12 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Andrushchenko
@ 2021-11-25 11:02 ` Oleksandr Andrushchenko
  2021-11-25 11:17   ` Jan Beulich
  2021-12-15 11:56 ` [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:02 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

For unprivileged guests vpci_{read|write} need to be re-worked
to not pass through accesses to the registers not explicitly handled
by the corresponding vPCI handlers: without that fix, passthrough
to guests is completely unsafe as Xen allows them full access to
the registers.

Xen needs to be sure that every register a guest accesses is not
going to cause the system to malfunction, so Xen needs to keep a
list of the registers it is safe for a guest to access.

For example, we should only expose the PCI capabilities that we know
are safe for a guest to use, i.e. MSI and MSI-X initially.
The rest of the capabilities should be blocked from guest access,
unless we audit them and declare them safe for a guest to access.

As a reference we might want to look at the approach currently used
by QEMU in order to do PCI passthrough. A very limited set of PCI
capabilities known to be safe for untrusted access are exposed to the
guest and registers need to be explicitly handled or else access is
rejected. Xen needs a fairly similar model in vPCI or else none of
this will be safe for unprivileged access.

Add the corresponding TODO comment to highlight there is a problem that
needs to be fixed.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
New in v5
---
 xen/drivers/vpci/vpci.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index bdc8c63f73fa..4fb77d08825a 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -493,6 +493,29 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
     if ( !pdev->vpci )
     {
         spin_unlock(&pdev->vpci_lock);
+        /*
+         * TODO: for unprivileged guests vpci_{read|write} need to be re-worked
+         * to not pass through accesses to the registers not explicitly handled
+         * by the corresponding vPCI handlers: without that fix, passthrough
+         * to guests is completely unsafe as Xen allows them full access to
+         * the registers.
+         *
+         * Xen needs to be sure that every register a guest accesses is not
+         * going to cause the system to malfunction, so Xen needs to keep a
+         * list of the registers it is safe for a guest to access.
+         *
+         * For example, we should only expose the PCI capabilities that we know
+         * are safe for a guest to use, i.e. MSI and MSI-X initially.
+         * The rest of the capabilities should be blocked from guest access,
+         * unless we audit them and declare them safe for a guest to access.
+         *
+         * As a reference we might want to look at the approach currently used
+         * by QEMU in order to do PCI passthrough. A very limited set of PCI
+         * capabilities known to be safe for untrusted access are exposed to the
+         * guest and registers need to be explicitly handled or else access is
+         * rejected. Xen needs a fairly similar model in vPCI or else none of
+         * this will be safe for unprivileged access.
+         */
         return vpci_read_hw(sbdf, reg, size);
     }
 
-- 
2.25.1
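
For illustration only, the allow-list model described in the TODO above could be sketched roughly as below. This is not the vPCI implementation: struct allowed_range, guest_allowed[], guest_access_allowed(), fake_hw_read() and guest_cfg_read() are hypothetical names, the listed ranges are not an audited set, and returning all ones for a rejected read is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one config-space range an unprivileged guest may access. */
struct allowed_range {
    unsigned int start; /* first byte offset */
    unsigned int len;   /* length in bytes */
};

/* Illustrative entries only, NOT an audited list of safe registers. */
static const struct allowed_range guest_allowed[] = {
    { 0x00, 4 },  /* Vendor ID / Device ID */
    { 0x04, 2 },  /* Command */
    { 0x10, 24 }, /* BAR0..BAR5 */
};

/* True only if the whole access fits inside one allowed range. */
bool guest_access_allowed(unsigned int reg, unsigned int size)
{
    for ( size_t i = 0; i < sizeof(guest_allowed) / sizeof(guest_allowed[0]); i++ )
    {
        const struct allowed_range *r = &guest_allowed[i];

        if ( reg >= r->start && reg + size <= r->start + r->len )
            return true;
    }

    return false;
}

/* Stand-in for a real hardware read: returns a fixed pattern. */
static uint32_t fake_hw_read(unsigned int reg, unsigned int size)
{
    (void)reg;

    return 0x12345678u & (~(uint32_t)0 >> (32 - 8 * size));
}

/*
 * Hardware domain: fall through to hardware as today.
 * Other guests: reject anything not explicitly allowed (assumed to read
 * back as all ones).
 */
uint32_t guest_cfg_read(bool is_hwdom, unsigned int reg, unsigned int size)
{
    assert(size == 1 || size == 2 || size == 4);

    if ( !is_hwdom && !guest_access_allowed(reg, size) )
        return ~(uint32_t)0 >> (32 - 8 * size);

    return fake_hw_read(reg, size);
}
```

The point of the sketch is the default: an access that no entry covers is rejected for a guest rather than forwarded to the device.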



^ permalink raw reply related	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag
  2021-11-25 11:02 ` [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
@ 2021-11-25 11:06   ` Jan Beulich
  2021-11-25 11:08     ` Oleksandr Andrushchenko
  2021-12-15  3:20   ` Volodymyr Babchuk
  1 sibling, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2021-11-25 11:06 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are range sets which should not be printed, so introduce a flag
> which allows marking those as such. Implement relevant logic to skip
> such entries while printing.
> 
> While at it also simplify the definition of the flags by directly
> defining those without helpers.
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit with a remark:

> --- a/xen/include/xen/rangeset.h
> +++ b/xen/include/xen/rangeset.h
> @@ -48,9 +48,10 @@ void rangeset_limit(
>      struct rangeset *r, unsigned int limit);
>  
>  /* Flags for passing to rangeset_new(). */
> - /* Pretty-print range limits in hexadecimal. */
> -#define _RANGESETF_prettyprint_hex 0
> -#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
> +/* Pretty-print range limits in hexadecimal. */

I would guess this comment was intentionally indented by a blank,
to visually separate it from the comment covering all flags. I'd
prefer if that was kept and if the new comment you add followed
suit.

Jan



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag
  2021-11-25 11:06   ` Jan Beulich
@ 2021-11-25 11:08     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko,
	xen-devel, Oleksandr Andrushchenko



On 25.11.21 13:06, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> There are range sets which should not be printed, so introduce a flag
>> which allows marking those as such. Implement relevant logic to skip
>> such entries while printing.
>>
>> While at it also simplify the definition of the flags by directly
>> defining those without helpers.
>>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit with a remark:
>
>> --- a/xen/include/xen/rangeset.h
>> +++ b/xen/include/xen/rangeset.h
>> @@ -48,9 +48,10 @@ void rangeset_limit(
>>       struct rangeset *r, unsigned int limit);
>>   
>>   /* Flags for passing to rangeset_new(). */
>> - /* Pretty-print range limits in hexadecimal. */
>> -#define _RANGESETF_prettyprint_hex 0
>> -#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
>> +/* Pretty-print range limits in hexadecimal. */
> I would guess this comment was intentionally indented by a blank,
> to visually separate it from the comment covering all flags. I'd
> prefer if that was kept and if the new comment you add followed
> suit.
Ah, ok, so I will add a space for the new flag's comment as well then
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2021-11-25 11:02 ` [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled Oleksandr Andrushchenko
@ 2021-11-25 11:17   ` Jan Beulich
  2021-11-25 11:20     ` Oleksandr Andrushchenko
  2022-01-13 13:27     ` Roger Pau Monné
  0 siblings, 2 replies; 130+ messages in thread
From: Jan Beulich @ 2021-11-25 11:17 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> For unprivileged guests vpci_{read|write} need to be re-worked
> to not passthrough accesses to the registers not explicitly handled
> by the corresponding vPCI handlers: without fixing that passthrough
> to guests is completely unsafe as Xen allows them full access to
> the registers.
> 
> Xen needs to be sure that every register a guest accesses is not
> going to cause the system to malfunction, so Xen needs to keep a
> list of the registers it is safe for a guest to access.
> 
> For example, we should only expose the PCI capabilities that we know
> are safe for a guest to use, i.e.: MSI and MSI-X initially.
> The rest of the capabilities should be blocked from guest access,
> unless we audit them and declare safe for a guest to access.
> 
> As a reference we might want to look at the approach currently used
> by QEMU in order to do PCI passthrough. A very limited set of PCI
> capabilities known to be safe for untrusted access are exposed to the
> guest and registers need to be explicitly handled or else access is
> rejected. Xen needs a fairly similar model in vPCI or else none of
> this will be safe for unprivileged access.
> 
> Add the corresponding TODO comment to highlight there is a problem that
> needs to be fixed.
> 
> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Looks okay to me in principle, but imo needs to come earlier in the
series, before things actually get exposed to DomU-s.

Jan



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2021-11-25 11:17   ` Jan Beulich
@ 2021-11-25 11:20     ` Oleksandr Andrushchenko
  2022-01-13 13:27     ` Roger Pau Monné
  1 sibling, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-25 11:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko,
	xen-devel, Oleksandr Andrushchenko



On 25.11.21 13:17, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> For unprivileged guests vpci_{read|write} need to be re-worked
>> to not passthrough accesses to the registers not explicitly handled
>> by the corresponding vPCI handlers: without fixing that passthrough
>> to guests is completely unsafe as Xen allows them full access to
>> the registers.
>>
>> Xen needs to be sure that every register a guest accesses is not
>> going to cause the system to malfunction, so Xen needs to keep a
>> list of the registers it is safe for a guest to access.
>>
>> For example, we should only expose the PCI capabilities that we know
>> are safe for a guest to use, i.e.: MSI and MSI-X initially.
>> The rest of the capabilities should be blocked from guest access,
>> unless we audit them and declare safe for a guest to access.
>>
>> As a reference we might want to look at the approach currently used
>> by QEMU in order to do PCI passthrough. A very limited set of PCI
>> capabilities known to be safe for untrusted access are exposed to the
>> guest and registers need to be explicitly handled or else access is
>> rejected. Xen needs a fairly similar model in vPCI or else none of
>> this will be safe for unprivileged access.
>>
>> Add the corresponding TODO comment to highlight there is a problem that
>> needs to be fixed.
>>
>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Looks okay to me in principle,
Thanks Roger for writing most of the text in e-mails while discussing the issue
>   but imo needs to come earlier in the
> series, before things actually get exposed to DomU-s.
I can have it after "[PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign"
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-25 11:02 ` [PATCH v5 06/14] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
@ 2021-11-25 16:28   ` Bertrand Marquis
  2021-11-26 12:19     ` Oleksandr Andrushchenko
  2022-01-12 12:35   ` Roger Pau Monné
  2022-01-12 17:34   ` Roger Pau Monné
  2 siblings, 1 reply; 130+ messages in thread
From: Bertrand Marquis @ 2021-11-25 16:28 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Xen-devel, Julien Grall, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, roger.pau, jbeulich,
	andrew.cooper3, george.dunlap, paul, Rahul Singh,
	Oleksandr Andrushchenko

Hi Oleksandr,

> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
> 
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM handling is supported
> for x86 only and it might not be used by other architectures without
> emulating x86. Other use-cases may include using that expansion ROM before
> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
> wants to use the ROM code which seems to be rare.

In the generic code, BARs for I/O ports are actually skipped (see the earlier code
in header.c: for I/O ports there is a continue) and no handler is registered for them.
As a consequence, a guest will access the hardware directly when reading those BARs.

I think we should instead make sure that we intercept all accesses to BARs and return
something empty for IOPORTS BARs.
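
A minimal sketch of that idea, assuming "empty" means the BAR reads as zero (which is how an unimplemented BAR looks) and that writes are dropped. The enum, struct and function names here are hypothetical and simplified, not the ones from header.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified BAR types for the sketch. */
enum bar_type { BAR_EMPTY, BAR_IO, BAR_MEM32 };

struct bar {
    enum bar_type type;
    uint32_t guest_reg; /* guest view of the register */
};

/* I/O and empty BARs always read as 0, i.e. as an unimplemented BAR. */
uint32_t guest_bar_read_sketch(const struct bar *bar)
{
    assert(bar);

    if ( bar->type == BAR_IO || bar->type == BAR_EMPTY )
        return 0;

    return bar->guest_reg;
}

/* Writes to I/O and empty BARs are ignored and never reach the hardware. */
void guest_bar_write_sketch(struct bar *bar, uint32_t val)
{
    assert(bar);

    if ( bar->type == BAR_IO || bar->type == BAR_EMPTY )
        return;

    bar->guest_reg = val;
}
```

With handlers like these registered for every BAR slot, a guest sizing probe on an I/O BAR would see 0 instead of the physical value.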

Regards
Bertrand

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>  handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>  has been eliminated from being built on x86
> Since v1:
> - constify struct pci_dev where possible
> - do not open code is_system_domain()
> - simplify some code
> - use gdprintk + error code instead of gprintk
> - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>   so these do not get compiled for x86
> - removed unneeded is_system_domain check
> - re-work guest read/write to be much simpler and do more work on write
>   than read which is expected to be called more frequently
> - removed one too obvious comment
> ---
> xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
> xen/include/xen/vpci.h    |  3 ++
> 2 files changed, 69 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ba333fb2f9b0..8880d34ebf8e 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>     pci_conf_write32(pdev->sbdf, reg, val);
> }
> 
> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
> +    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
> +}
> +
> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    const struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +
> +    return bar->guest_reg >> (hi ? 32 : 0);
> +}
> +
> static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>                       uint32_t val, void *data)
> {
> @@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>         rom->addr = val & PCI_ROM_ADDRESS_MASK;
> }
> 
> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}
> +
> static int init_bars(struct pci_dev *pdev)
> {
>     uint16_t cmd;
> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
>     struct vpci_header *header = &pdev->vpci->header;
>     struct vpci_bar *bars = header->bars;
>     int rc;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
> 
>     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>     {
> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
>         if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>         {
>             bars[i].type = VPCI_BAR_MEM64_HI;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            rc = vpci_add_register(pdev->vpci,
> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                                   is_hwdom ? bar_write : guest_bar_write,
> +                                   reg, 4, &bars[i]);
>             if ( rc )
>             {
>                 pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
>         bars[i].size = size;
>         bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> 
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> -                               &bars[i]);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                               is_hwdom ? bar_write : guest_bar_write,
> +                               reg, 4, &bars[i]);
>         if ( rc )
>         {
>             pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
>         header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
>                               PCI_ROM_ADDRESS_ENABLE;
> 
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
> -                               4, rom);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
> +                               is_hwdom ? rom_write : guest_rom_write,
> +                               rom_reg, 4, rom);
>         if ( rc )
>             rom->type = VPCI_BAR_EMPTY;
>     }
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index ed127a08a953..0a73b14a92dc 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -68,7 +68,10 @@ struct vpci {
>     struct vpci_header {
>         /* Information about the PCI BARs of this device. */
>         struct vpci_bar {
> +            /* Physical view of the BAR. */
>             uint64_t addr;
> +            /* Guest view of the BAR: address and lower bits. */
> +            uint64_t guest_reg;
>             uint64_t size;
>             enum {
>                 VPCI_BAR_EMPTY,
> -- 
> 2.25.1
> 
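
For reference, the size probe that the quoted guest_bar_write()/guest_bar_read() emulate can be modelled stand-alone as below. This is a reduced sketch for a 32-bit non-prefetchable memory BAR only (so the type bits are all zero); struct bar32, bar32_write(), bar32_read() and probe_size() are hypothetical names, and MEM_MASK is assumed to mirror PCI_BASE_ADDRESS_MEM_MASK.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PCI_BASE_ADDRESS_MEM_MASK: the low 4 bits are flags. */
#define MEM_MASK (~0x0fULL)

struct bar32 {
    uint64_t size;      /* BAR size in bytes, a power of two */
    uint64_t guest_reg; /* guest view: address plus low flag bits */
};

/* Same masking as the quoted patch, reduced to the 32-bit case. */
void bar32_write(struct bar32 *bar, uint32_t val)
{
    assert(bar->size && !(bar->size & (bar->size - 1)));

    val &= (uint32_t)MEM_MASK;
    bar->guest_reg = val;
    /* Address bits below the BAR size are not writable and read as 0. */
    bar->guest_reg &= ~(bar->size - 1) | ~MEM_MASK;
}

uint32_t bar32_read(const struct bar32 *bar)
{
    return (uint32_t)bar->guest_reg;
}

/* What a guest does during enumeration: write all ones, read back, decode. */
uint32_t probe_size(struct bar32 *bar)
{
    uint32_t v;

    bar32_write(bar, 0xffffffffu);
    v = bar32_read(bar) & (uint32_t)MEM_MASK;

    return ~v + 1;
}
```

For a 64 KiB BAR the all-ones write reads back as 0xffff0000, which the guest decodes to 0x10000, all without touching the physical register.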



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-25 16:28   ` Bertrand Marquis
@ 2021-11-26 12:19     ` Oleksandr Andrushchenko
  2022-02-03 12:36       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-11-26 12:19 UTC (permalink / raw)
  To: Bertrand Marquis, roger.pau
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Rahul Singh, Oleksandr Andrushchenko,
	Oleksandr Andrushchenko

Hi, Bertrand!

On 25.11.21 18:28, Bertrand Marquis wrote:
> Hi Oleksandr,
>
>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM handling is supported
>> for x86 only and it might not be used by other architectures without
>> emulating x86. Other use-cases may include using that expansion ROM before
>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>> wants to use the ROM code which seems to be rare.
> In the generic code, bars for ioports are actually skipped (check code before
> in header.c, in case of ioports there is a continue) and no handler is registered for them.
> The consequence will be that a guest will access hardware when reading those BARs.
Yes, this seems to be a valid point
>
> I think we should instead make sure that we intercept all accesses to BARs and return
> something empty for IOPORTS BARs.
I would like to hear from Roger what the initial plan for that was, so that
we stay aligned between the different architectures (Arm and x86 for now)

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2021-11-25 11:02 ` [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending Oleksandr Andrushchenko
@ 2021-12-10 17:55   ` Julien Grall
  2021-12-11  8:20     ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Julien Grall @ 2021-12-10 17:55 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, xen-devel
  Cc: sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

Hi Oleksandr,

On 25/11/2021 11:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> vpci_process_pending is defined with different attributes, e.g.
> with __must_check if CONFIG_HAS_VPCI enabled and not otherwise.
> Fix this by defining both of the definitions with __must_check.
> 
> Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary")
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Reviewed-by: Julien Grall <jgrall@amazon.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2021-12-10 17:55   ` Julien Grall
@ 2021-12-11  8:20     ` Roger Pau Monné
  2021-12-11  8:57       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2021-12-11  8:20 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Andrushchenko, xen-devel, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev, jbeulich,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Fri, Dec 10, 2021 at 05:55:03PM +0000, Julien Grall wrote:
> Hi Oleksandr,
> 
> On 25/11/2021 11:02, Oleksandr Andrushchenko wrote:
> > From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> > 
> > vpci_process_pending is defined with different attributes, e.g.
> > with __must_check if CONFIG_HAS_VPCI enabled and not otherwise.
> > Fix this by defining both of the definitions with __must_check.
> > 
> > Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary")
> > 
> > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Reviewed-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I think this can be committed independently of the rest of the
series?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2021-12-11  8:20     ` Roger Pau Monné
@ 2021-12-11  8:57       ` Oleksandr Andrushchenko
  2022-01-26  8:31         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-12-11  8:57 UTC (permalink / raw)
  To: Roger Pau Monné, Julien Grall
  Cc: xen-devel, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, jbeulich, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 11.12.21 10:20, Roger Pau Monné wrote:
> On Fri, Dec 10, 2021 at 05:55:03PM +0000, Julien Grall wrote:
>> Hi Oleksandr,
>>
>> On 25/11/2021 11:02, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> vpci_process_pending is defined with different attributes, e.g.
>>> with __must_check if CONFIG_HAS_VPCI enabled and not otherwise.
>>> Fix this by defining both of the definitions with __must_check.
>>>
>>> Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary")
>>>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Reviewed-by: Julien Grall <jgrall@amazon.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> I think this can be committed independently of the rest of the
> series?
I think so
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag
  2021-11-25 11:02 ` [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
  2021-11-25 11:06   ` Jan Beulich
@ 2021-12-15  3:20   ` Volodymyr Babchuk
  2021-12-15  5:53     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Volodymyr Babchuk @ 2021-12-15  3:20 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Artem Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko


Hi Oleksandr,

Sorry for jumping in amid v5, but...

Oleksandr Andrushchenko <andr2000@gmail.com> writes:

> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> There are range sets which should not be printed, so introduce a flag
> which allows marking those as such. Implement relevant logic to skip
> such entries while printing.
>
> While at it also simplify the definition of the flags by directly
> defining those without helpers.
>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> ---
> Since v1:
> - update BUG_ON with new flag
> - simplify the definition of the flags
> ---
>  xen/common/rangeset.c      | 5 ++++-
>  xen/include/xen/rangeset.h | 7 ++++---
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
> index 885b6b15c229..ea27d651723b 100644
> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
>      INIT_LIST_HEAD(&r->range_list);
>      r->nr_ranges = -1;
>  
> -    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
> +    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
>      r->flags = flags;
>  
>      safe_strcpy(r->name, name ?: "(no name)");
> @@ -575,6 +575,9 @@ void rangeset_domain_printk(
>  
>      list_for_each_entry ( r, &d->rangesets, rangeset_list )
>      {
> +        if ( r->flags & RANGESETF_no_print )
> +            continue;
> +
>          printk("    ");
>          rangeset_printk(r);
>          printk("\n");
> diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
> index 135f33f6066f..045fcafa8368 100644
> --- a/xen/include/xen/rangeset.h
> +++ b/xen/include/xen/rangeset.h
> @@ -48,9 +48,10 @@ void rangeset_limit(
>      struct rangeset *r, unsigned int limit);
>  
>  /* Flags for passing to rangeset_new(). */
> - /* Pretty-print range limits in hexadecimal. */
> -#define _RANGESETF_prettyprint_hex 0
> -#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
> +/* Pretty-print range limits in hexadecimal. */
> +#define RANGESETF_prettyprint_hex   (1U << 0)

If you are already touching all the flags, why not use BIT()?

> +/* Do not print entries marked with this flag. */
> +#define RANGESETF_no_print          (1U << 1)
>  
>  bool_t __must_check rangeset_is_empty(
>      const struct rangeset *r);


-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 01/14] rangeset: add RANGESETF_no_print flag
  2021-12-15  3:20   ` Volodymyr Babchuk
@ 2021-12-15  5:53     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-12-15  5:53 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Artem Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Volodymyr!

On 15.12.21 05:20, Volodymyr Babchuk wrote:
> Hi Oleksandr,
>
> Sorry for jumping in amid v5, but...
>
> Oleksandr Andrushchenko <andr2000@gmail.com> writes:
>
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> There are range sets which should not be printed, so introduce a flag
>> which allows marking those as such. Implement relevant logic to skip
>> such entries while printing.
>>
>> While at it also simplify the definition of the flags by directly
>> defining those without helpers.
>>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Since v1:
>> - update BUG_ON with new flag
>> - simplify the definition of the flags
>> ---
>>   xen/common/rangeset.c      | 5 ++++-
>>   xen/include/xen/rangeset.h | 7 ++++---
>>   2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
>> index 885b6b15c229..ea27d651723b 100644
>> --- a/xen/common/rangeset.c
>> +++ b/xen/common/rangeset.c
>> @@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
>>       INIT_LIST_HEAD(&r->range_list);
>>       r->nr_ranges = -1;
>>   
>> -    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
>> +    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
>>       r->flags = flags;
>>   
>>       safe_strcpy(r->name, name ?: "(no name)");
>> @@ -575,6 +575,9 @@ void rangeset_domain_printk(
>>   
>>       list_for_each_entry ( r, &d->rangesets, rangeset_list )
>>       {
>> +        if ( r->flags & RANGESETF_no_print )
>> +            continue;
>> +
>>           printk("    ");
>>           rangeset_printk(r);
>>           printk("\n");
>> diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
>> index 135f33f6066f..045fcafa8368 100644
>> --- a/xen/include/xen/rangeset.h
>> +++ b/xen/include/xen/rangeset.h
>> @@ -48,9 +48,10 @@ void rangeset_limit(
>>       struct rangeset *r, unsigned int limit);
>>   
>>   /* Flags for passing to rangeset_new(). */
>> - /* Pretty-print range limits in hexadecimal. */
>> -#define _RANGESETF_prettyprint_hex 0
>> -#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
>> +/* Pretty-print range limits in hexadecimal. */
>> +#define RANGESETF_prettyprint_hex   (1U << 0)
> If you are already touching all the flags, why not use BIT()?
It was discussed previously [1] and we decided not to use the BIT macro

Thank you,
Oleksandr
>
>> +/* Do not print entries marked with this flag. */
>> +#define RANGESETF_no_print          (1U << 1)
>>   
>>   bool_t __must_check rangeset_is_empty(
>>       const struct rangeset *r);
>
[1] https://patchwork.kernel.org/project/xen-devel/patch/20211122092825.2502306-1-andr2000@gmail.com/

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
  2021-11-25 11:02 [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
                   ` (13 preceding siblings ...)
  2021-11-25 11:02 ` [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled Oleksandr Andrushchenko
@ 2021-12-15 11:56 ` Oleksandr Andrushchenko
  2021-12-15 12:07   ` Jan Beulich
  14 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-12-15 11:56 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, George Dunlap, jbeulich, julien,
	sstabellini, Wei Liu
  Cc: Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	roger.pau, george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Dear "THE REST" maintainers!

Could you please review this series, which seems to have gotten stuck?

Thank you in advance,
Oleksandr

On 25.11.21 13:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Hi, all!
>
> 1. This patch series is focusing on vPCI and adds support for non-identity
> PCI BAR mappings which is required while passing through a PCI device to
> a guest. The highlights are:
>
> - Add relevant vpci register handlers when assigning PCI device to a domain
>    and remove those when de-assigning. This allows having different
>    handlers for different domains, e.g. hwdom and other guests.
>
> - Emulate guest BAR register values based on physical BAR values.
>    This allows creating a guest view of the registers and emulates
>    size and properties probe as it is done during PCI device enumeration by
>    the guest.
>
> - Instead of handling a single range set, that contains all the memory
>    regions of all the BARs and ROM, have them per BAR.
>
> - Take into account guest's BAR view and program its p2m accordingly:
>    gfn is guest's view of the BAR and mfn is the physical BAR value as set
>    up by the host bridge in the hardware domain.
>    This way hardware domain sees physical BAR values and guest sees
>    emulated ones.
>
> 2. The series also adds support for virtual PCI bus topology for guests:
>   - We emulate a single host bridge for the guest, so segment is always 0.
>   - The implementation is limited to 32 devices which are allowed on
>     a single PCI bus.
>   - The virtual bus number is set to 0, so virtual devices are seen
>     as embedded endpoints behind the root complex.
>
> 3. The series completely re-works the locking scheme, which was used
> inconsistently or absent before, building on the work started by Roger [1]:
> [PATCH v5 03/13] vpci: move lock outside of struct vpci
>
> This way the lock can be used to check whether vpci is present, and
> removal can be performed while holding the lock, in order to make
> sure there are no accesses to the contents of the vpci struct.
> Previously removal could race with vpci_read for example, since the
> lock was dropped prior to freeing pdev->vpci.
> This also solves synchronization issues between all vPCI code entities
> which could run in parallel.
>
> 4. There is an outstanding TODO left unimplemented by this series:
> for unprivileged guests vpci_{read|write} need to be re-worked
> to not passthrough accesses to the registers not explicitly handled
> by the corresponding vPCI handlers: without fixing that passthrough
> to guests is completely unsafe as Xen allows them full access to
> the registers.
>
> Xen needs to be sure that every register a guest accesses is not
> going to cause the system to malfunction, so Xen needs to keep a
> list of the registers it is safe for a guest to access.
>
> For example, we should only expose the PCI capabilities that we know
> are safe for a guest to use, i.e.: MSI and MSI-X initially.
> The rest of the capabilities should be blocked from guest access,
> unless we audit them and declare them safe for a guest to access.
>
> As a reference we might want to look at the approach currently used
> by QEMU in order to do PCI passthrough. A very limited set of PCI
> capabilities known to be safe for untrusted access are exposed to the
> guest and registers need to be explicitly handled or else access is
> rejected. Xen needs a fairly similar model in vPCI or else none of
> this will be safe for unprivileged access.
>
> 5. The series was also tested on:
>   - x86 PVH Dom0 and doesn't break it.
>   - x86 HVM with PCI passthrough to DomU and doesn't break it.
>
> Thank you,
> Oleksandr
>
> [1] https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/
>
> Oleksandr Andrushchenko (13):
>    rangeset: add RANGESETF_no_print flag
>    vpci: fix function attributes for vpci_process_pending
>    vpci: cancel pending map/unmap on vpci removal
>    vpci: add hooks for PCI device assign/de-assign
>    vpci/header: implement guest BAR register handlers
>    vpci/header: handle p2m range sets per BAR
>    vpci/header: program p2m with guest BAR view
>    vpci/header: emulate PCI_COMMAND register for guests
>    vpci/header: reset the command register when adding devices
>    vpci: add initial support for virtual PCI bus topology
>    xen/arm: translate virtual PCI bus topology for guests
>    xen/arm: account IO handlers for emulated PCI MSI-X
>    vpci: add TODO for the registers not explicitly handled
>
> Roger Pau Monne (1):
>    vpci: move lock outside of struct vpci
>
>   tools/tests/vpci/emul.h       |   5 +-
>   tools/tests/vpci/main.c       |   4 +-
>   xen/arch/arm/vpci.c           |  33 +++-
>   xen/arch/x86/hvm/vmsi.c       |   8 +-
>   xen/common/rangeset.c         |   5 +-
>   xen/drivers/Kconfig           |   4 +
>   xen/drivers/passthrough/pci.c |  11 ++
>   xen/drivers/vpci/header.c     | 352 +++++++++++++++++++++++++++-------
>   xen/drivers/vpci/msi.c        |  11 +-
>   xen/drivers/vpci/msix.c       |   8 +-
>   xen/drivers/vpci/vpci.c       | 252 +++++++++++++++++++++---
>   xen/include/xen/pci.h         |   6 +
>   xen/include/xen/rangeset.h    |   7 +-
>   xen/include/xen/sched.h       |   8 +
>   xen/include/xen/vpci.h        |  47 ++++-
>   15 files changed, 644 insertions(+), 117 deletions(-)
>
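
The "size and properties probe" mentioned in the cover letter quoted above
can be sketched as follows. This is only an illustration for a 32-bit memory
BAR, assuming the usual PCI sizing sequence (write all ones, read back, restore);
the toy_* names and the struct are hypothetical and are not taken from the
series or from Xen's vPCI code.

```c
#include <assert.h>
#include <stdint.h>

/* Low 4 bits of a memory BAR carry type/prefetch flags, not address bits. */
#define TOY_BAR_MEM_FLAGS_MASK 0xfu

/* Hypothetical guest view of one 32-bit memory BAR. */
struct toy_guest_bar {
    uint32_t size;        /* power of two, at least 16 bytes */
    uint32_t flags;       /* read-only low bits taken from the physical BAR */
    uint32_t guest_addr;  /* address the guest last programmed */
};

/* Guest write: only the size-aligned address bits are writable. */
void toy_guest_bar_write(struct toy_guest_bar *bar, uint32_t val)
{
    bar->guest_addr = val & ~(bar->size - 1);
}

/* Guest read: programmed address plus the read-only flag bits. */
uint32_t toy_guest_bar_read(const struct toy_guest_bar *bar)
{
    return bar->guest_addr | bar->flags;
}

/* What a guest does during enumeration to discover the BAR size. */
uint32_t toy_probe_size(struct toy_guest_bar *bar)
{
    uint32_t saved = toy_guest_bar_read(bar);
    uint32_t mask;

    toy_guest_bar_write(bar, 0xffffffffu);
    mask = toy_guest_bar_read(bar) & ~TOY_BAR_MEM_FLAGS_MASK;
    toy_guest_bar_write(bar, saved);   /* restore the previous address */

    return ~mask + 1;
}
```

Because the emulated register keeps its own guest_addr, the probe never needs
to touch the physical BAR, which is the point of having a separate guest view.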

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
  2021-12-15 11:56 ` [PATCH v5 00/14] PCI devices passthrough on Arm, part 3 Oleksandr Andrushchenko
@ 2021-12-15 12:07   ` Jan Beulich
  2021-12-15 12:22     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2021-12-15 12:07 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	roger.pau, paul, Bertrand Marquis, Rahul Singh, xen-devel,
	andrew.cooper3, George Dunlap, julien, sstabellini, Wei Liu

On 15.12.2021 12:56, Oleksandr Andrushchenko wrote:
> Dear rest maintainers!
> 
> Could you please review this series which seems to get stuck?

I don't seem to have any record of you having pinged Roger as the vPCI
maintainer. Also, as said on the Community Call when discussing this,
I don't think I'd view this series as in a state where an emergency
fallback to REST would be appropriate. As indicated, in particular I
wouldn't want to commit any of it without Roger's basic agreement. IOW
while REST maintainer reviews may help making progress (but as much
would reviews by anyone else), they may not put the series in a state
where it could go in.

In any event, as also said on the call, afaic this series is in my to-
be-reviewed folder, alongside a few dozen more patches. I'll get to it
if nobody else would, but I can't predict when that's going to be.
There's simply too much other stuff in need of taking care of.

Jan

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
  2021-12-15 12:07   ` Jan Beulich
@ 2021-12-15 12:22     ` Oleksandr Andrushchenko
  2021-12-15 14:51       ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-12-15 12:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	roger.pau, paul, Bertrand Marquis, Rahul Singh, xen-devel,
	andrew.cooper3, George Dunlap, julien, sstabellini, Wei Liu

Hi, Jan!

On 15.12.21 14:07, Jan Beulich wrote:
> On 15.12.2021 12:56, Oleksandr Andrushchenko wrote:
>> Dear rest maintainers!
>>
>> Could you please review this series which seems to get stuck?
> I don't seem to have any record of you having pinged Roger as the vPCI
> maintainer.
No, I didn't. Roger is on CC, so he might shed some light on when it might
happen, so we, those who work on PCI passthrough on Arm,
can also plan the relevant upcoming (re)work: we still miss MSI/MSI-X and
IOMMU series which do depend on this one
>   Also, as said on the Community Call when discussing this,
> I don't think I'd view this series as in a state where an emergency
> fallback to REST would be appropriate.
No emergency here, but a v5 without any ack/nack might ring a bell,
which is what made me write this e-mail.
>   As indicated, in particular I
> wouldn't want to commit any of it without Roger's basic agreement.
This is clear: it is up to the relevant maintainer to commit, which
I would not expect from the REST maintainers.
>   IOW
> while REST maintainer reviews may help making progress (but as much
> would reviews by anyone else),
This is my goal: to have an ack/nack at least from the REST maintainers
>   they may not put the series in a state
> where it could go in.
Fair enough
>
> In any event, as also said on the call, afaic this series is in my to-
> be-reviewed folder,
Appreciate this
>   alongside a few dozen more patches. I'll get to it
> if nobody else would, but I can't predict when that's going to be.
Thank you
> There's simply too much other stuff in need of taking care of.
Sure, our companies do want us to do something useful for them as well, but review

Thank you,
Oleksandr
>
> Jan
>

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
  2021-12-15 12:22     ` Oleksandr Andrushchenko
@ 2021-12-15 14:51       ` Roger Pau Monné
  2021-12-15 15:02         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2021-12-15 14:51 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, paul, Bertrand Marquis, Rahul Singh, xen-devel,
	andrew.cooper3, George Dunlap, julien, sstabellini, Wei Liu

On Wed, Dec 15, 2021 at 12:22:32PM +0000, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 15.12.21 14:07, Jan Beulich wrote:
> > On 15.12.2021 12:56, Oleksandr Andrushchenko wrote:
> >> Dear rest maintainers!
> >>
> >> Could you please review this series which seems to get stuck?
> > I don't seem to have any record of you having pinged Roger as the vPCI
> > maintainer.
> No, I didn't. Roger is on CC, so he might shed some light on when it might
> happen, so we, those who work on PCI passthrough on Arm,
> can also plan the relevant upcoming (re)work: we still miss MSI/MSI-X and
> IOMMU series which do depend on this one

Hello,

I'm quite overloaded with patch review and other stuff, since I've
taken over the Community Manager role while George is away.

There are series on the mailing list that have been pending for way
longer, and while I understand that this is of no help or relief for
you, it wouldn't be fair for me to review this piece of work before
other series that have been pending for longer, as other submitters
also deserve review.

Sorry, but I think it's unlikely I will get to it until after new
year.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 00/14] PCI devices passthrough on Arm, part 3
  2021-12-15 14:51       ` Roger Pau Monné
@ 2021-12-15 15:02         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2021-12-15 15:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, paul, Bertrand Marquis, Rahul Singh, xen-devel,
	andrew.cooper3, George Dunlap, julien, sstabellini, Wei Liu,
	Oleksandr Andrushchenko

Hi, Roger!

On 15.12.21 16:51, Roger Pau Monné wrote:
> On Wed, Dec 15, 2021 at 12:22:32PM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Jan!
>>
>> On 15.12.21 14:07, Jan Beulich wrote:
>>> On 15.12.2021 12:56, Oleksandr Andrushchenko wrote:
>>>> Dear rest maintainers!
>>>>
>>>> Could you please review this series which seems to get stuck?
>>> I don't seem to have any record of you having pinged Roger as the vPCI
>>> maintainer.
>> No, I didn't. Roger is on CC, so he might shed some light on when it might
>> happen, so we, those who work on PCI passthrough on Arm,
>> can also plan the relevant upcoming (re)work: we still miss MSI/MSI-X and
>> IOMMU series which do depend on this one
> Hello,
>
> I'm quite overloaded with patch review and other stuff, since I've
> taken over the Community Manager role while George is away.
>
> There are series on the mailing list that have been pending for way
> longer, and while I understand that this is of no help or relief for
> you, it wouldn't be fair for me to review this piece of work before
> other series that have been pending for longer, as other submitters
> also deserve review.
This is fair
>
> Sorry, but I think it's unlikely I will get to it until after new
> year.
Thank you in advance,
Oleksandr
>
> Thanks, Roger.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2021-11-25 11:02 ` [PATCH v5 03/14] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
@ 2022-01-11 15:17   ` Roger Pau Monné
  2022-01-12 14:42     ` Jan Beulich
  2022-01-12 14:57   ` Jan Beulich
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-11 15:17 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko, Ian Jackson

On Thu, Nov 25, 2021 at 01:02:40PM +0200, Oleksandr Andrushchenko wrote:
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 657697fe3406..ceaac4516ff8 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> -void vpci_remove_device(struct pci_dev *pdev)
> +static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>  {
> -    if ( !has_vpci(pdev->domain) )
> -        return;
> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
>  
> -    spin_lock(&pdev->vpci->lock);
>      while ( !list_empty(&pdev->vpci->handlers) )
>      {
>          struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> @@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
>          list_del(&r->node);
>          xfree(r);
>      }
> -    spin_unlock(&pdev->vpci->lock);
> +}
> +
> +void vpci_remove_device_locked(struct pci_dev *pdev)

I think this could be static instead, as it's only used by
vpci_remove_device and vpci_add_handlers which are local to the
file.
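
To illustrate the shape I have in mind, here is a minimal, self-contained
sketch of a public entry point that takes the per-device lock and delegates
to a file-local "_locked" helper. The types are simplified stand-ins, not
Xen's real struct pci_dev or spinlock API, and all toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical): not Xen's real types. */
struct toy_lock { bool held; };
struct toy_vpci { int nr_handlers; };
struct toy_pdev {
    struct toy_lock vpci_lock;  /* the lock lives outside of the vpci struct */
    struct toy_vpci *vpci;      /* NULL once vPCI has been removed */
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* File-local helper: the caller must already hold pdev->vpci_lock. */
static void toy_remove_device_locked(struct toy_pdev *pdev)
{
    assert(pdev->vpci_lock.held);
    free(pdev->vpci);
    pdev->vpci = NULL;
}

/* Public entry point: takes the lock, then delegates to the helper. */
void toy_remove_device(struct toy_pdev *pdev)
{
    toy_lock_acquire(&pdev->vpci_lock);
    toy_remove_device_locked(pdev);
    toy_lock_release(&pdev->vpci_lock);
}
```

Keeping the helper static means the "lock must be held" contract only has to
be honoured by callers inside the same file.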

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal
  2021-11-25 11:02 ` [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
@ 2022-01-11 16:57   ` Roger Pau Monné
  2022-01-12 15:27   ` Jan Beulich
  2022-01-31  7:53   ` Oleksandr Andrushchenko
  2 siblings, 0 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-11 16:57 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:41PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> When a vPCI is removed for a PCI device it is possible that we have
> scheduled a delayed work for map/unmap operations for that device.
> For example, the following scenario can illustrate the problem:
> 
> pci_physdev_op
>    pci_add_device
>        init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>    iommu_add_device <- FAILS
>    vpci_remove_device -> xfree(pdev->vpci)
> 
> leave_hypervisor_to_guest
>    vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
> 
> For the hardware domain we continue execution as the worst that
> could happen is that MMIO mappings are left in place when the
> device has been deassigned.
> 
> For unprivileged domains that get a failure in the middle of a vPCI
> {un}map operation we need to destroy them, as we don't know in which
> state the p2m is. This can only happen in vpci_process_pending for
> DomUs as they won't be allowed to call pci_add_device.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Since v4:
>  - crash guest domain if map/unmap operation didn't succeed
>  - re-work vpci cancel work to cancel work on all vCPUs
>  - use new locking scheme with pdev->vpci_lock
> New in v4
> 
> Fixes: 86dbcf6e30cb ("vpci: cancel pending map/unmap on vpci removal")
> 
> ---
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  xen/drivers/vpci/header.c | 49 ++++++++++++++++++++++++++++++---------
>  xen/drivers/vpci/vpci.c   |  2 ++
>  xen/include/xen/pci.h     |  5 ++++
>  xen/include/xen/vpci.h    |  6 +++++
>  4 files changed, 51 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index bd23c0274d48..ba333fb2f9b0 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -131,7 +131,13 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>  
>  bool vpci_process_pending(struct vcpu *v)
>  {
> -    if ( v->vpci.mem )
> +    struct pci_dev *pdev = v->vpci.pdev;
> +
> +    if ( !pdev )
> +        return false;
> +
> +    spin_lock(&pdev->vpci_lock);
> +    if ( !pdev->vpci_cancel_pending && v->vpci.mem )

Could you just check for pdev->vpci != NULL instead of having to add a
new vpci_cancel_pending field?

I also have a suggestion below which could make the code here simpler.
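
A rough sketch of the pointer check, using toy types rather than the real
Xen structures (every toy_* name is hypothetical, and the boolean lock flag
merely stands in for spin_lock()/spin_unlock()). It assumes that a pending
request is simply dropped once the device's vpci has been freed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical): not Xen's real types. */
struct toy_vpci { int dummy; };
struct toy_pdev {
    bool vpci_lock_held;     /* stands in for the per-device spinlock */
    struct toy_vpci *vpci;   /* NULL after removal */
};
struct toy_vcpu {
    struct toy_pdev *pdev;
    bool has_pending_map;    /* stands in for v->vpci.mem != NULL */
    int maps_done;           /* counts how often the mapping work ran */
};

bool toy_process_pending(struct toy_vcpu *v)
{
    struct toy_pdev *pdev = v->pdev;

    if ( !pdev )
        return false;

    pdev->vpci_lock_held = true;
    /* No extra "cancel" flag: a NULL vpci under the lock means "removed". */
    if ( pdev->vpci && v->has_pending_map )
        v->maps_done++;      /* stands in for the real map/unmap work */
    /* Assumption: the request is consumed whether or not it was acted on. */
    v->has_pending_map = false;
    pdev->vpci_lock_held = false;

    return false;
}
```

This only works because the lock is no longer part of the structure being
freed, so it can still be taken after pdev->vpci is gone.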

>      {
>          struct map_data data = {
>              .d = v->domain,
> @@ -140,32 +146,53 @@ bool vpci_process_pending(struct vcpu *v)
>          int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>  
>          if ( rc == -ERESTART )
> +        {
> +            spin_unlock(&pdev->vpci_lock);
>              return true;
> +        }
>  
> -        spin_lock(&v->vpci.pdev->vpci_lock);
> -        if ( v->vpci.pdev->vpci )
> +        if ( pdev->vpci )
>              /* Disable memory decoding unconditionally on failure. */
> -            modify_decoding(v->vpci.pdev,
> +            modify_decoding(pdev,
>                              rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>                              !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci_lock);
>  
> -        rangeset_destroy(v->vpci.mem);
> -        v->vpci.mem = NULL;
>          if ( rc )
> +        {
>              /*
>               * FIXME: in case of failure remove the device from the domain.
>               * Note that there might still be leftover mappings. While this is
> -             * safe for Dom0, for DomUs the domain will likely need to be
> -             * killed in order to avoid leaking stale p2m mappings on
> -             * failure.
> +             * safe for Dom0, for DomUs the domain needs to be killed in order
> +             * to avoid leaking stale p2m mappings on failure.
>               */
> -            vpci_remove_device(v->vpci.pdev);
> +            if ( is_hardware_domain(v->domain) )
> +                vpci_remove_device_locked(pdev);
> +            else
> +                domain_crash(v->domain);
> +        }
>      }
> +    spin_unlock(&pdev->vpci_lock);
>  
>      return false;
>  }
>  
> +void vpci_cancel_pending_locked(struct pci_dev *pdev)
> +{
> +    struct vcpu *v;
> +
> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
> +
> +    /* Cancel any pending work now on all vCPUs. */
> +    for_each_vcpu( pdev->domain, v )
> +    {
> +        if ( v->vpci.mem && (v->vpci.pdev == pdev) )

I'm unsure this is correct. You are protecting the access to
v->vpci.pdev with an expectation that v->vpci.pdev->vpci_lock is being
held.

I wonder if it would be better to just pause all the domain vCPUs and
then perform the cleaning of any pending operations. That would assure
that there are no changes to v->vpci. vpci_cancel_pending_locked
shouldn't be a frequent operation, so the overhead of pausing all
domain vCPUs here is likely fine.
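
Something along these lines, as a self-contained sketch with toy types (the
toy_* names are hypothetical; in Xen the pausing itself would presumably be
done with domain_pause()/domain_unpause(), which this sketch only imitates
with a flag):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_VCPUS 4

/* Toy stand-ins (hypothetical): not Xen's real types. */
struct toy_pdev { int id; };
struct toy_vcpu {
    bool paused;
    bool has_pending_map;          /* stands in for v->vpci.mem != NULL */
    const struct toy_pdev *pdev;   /* device the pending work belongs to */
};
struct toy_domain { struct toy_vcpu vcpu[TOY_NR_VCPUS]; };

static void toy_domain_set_paused(struct toy_domain *d, bool paused)
{
    for ( size_t i = 0; i < TOY_NR_VCPUS; i++ )
        d->vcpu[i].paused = paused;
}

/* Cancel pending work for pdev on all vCPUs; returns how many were cancelled. */
unsigned int toy_cancel_pending(struct toy_domain *d,
                                const struct toy_pdev *pdev)
{
    unsigned int cancelled = 0;

    /* With every vCPU paused nothing can modify the per-vCPU state. */
    toy_domain_set_paused(d, true);

    for ( size_t i = 0; i < TOY_NR_VCPUS; i++ )
    {
        struct toy_vcpu *v = &d->vcpu[i];

        assert(v->paused);
        if ( v->has_pending_map && v->pdev == pdev )
        {
            v->has_pending_map = false;
            v->pdev = NULL;
            cancelled++;
        }
    }

    toy_domain_set_paused(d, false);

    return cancelled;
}
```

The per-vCPU fields are then only ever touched either by the vCPU itself or
while that vCPU is paused, so no additional lock is needed for them.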

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2021-11-25 11:02 ` [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
@ 2022-01-12 12:12   ` Roger Pau Monné
  2022-01-31  8:43     ` Oleksandr Andrushchenko
  2022-01-13 11:40   ` Roger Pau Monné
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-12 12:12 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> When a PCI device gets assigned/de-assigned some work on vPCI side needs
> to be done for that device. Introduce a pair of hooks so vPCI can handle
> that.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
>  - de-assign vPCI from the previous domain on device assignment
>  - do not remove handlers in vpci_assign_device as those must not
>    exist at that point
> Since v3:
>  - remove toolstack roll-back description from the commit message
>    as errors are to be handled with proper cleanup in Xen itself
>  - remove __must_check
>  - remove redundant rc check while assigning devices
>  - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
>  - use REGISTER_VPCI_INIT machinery to run required steps on device
>    init/assign: add run_vpci_init helper
> Since v2:
> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>   for x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - extended the commit message
> ---
>  xen/drivers/Kconfig           |  4 +++
>  xen/drivers/passthrough/pci.c | 10 ++++++
>  xen/drivers/vpci/vpci.c       | 61 +++++++++++++++++++++++++++++------
>  xen/include/xen/vpci.h        | 16 +++++++++
>  4 files changed, 82 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
> index db94393f47a6..780490cf8e39 100644
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>  	bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> +	bool
> +	depends on HAS_VPCI
> +
>  endmenu
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 286808b25e65..d9ef91571adf 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -874,6 +874,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>      if ( ret )
>          goto out;
>  
> +    ret = vpci_deassign_device(d, pdev);
> +    if ( ret )
> +        goto out;

Following my comment below, this won't be allowed to fail.

> +
>      if ( pdev->domain == hardware_domain  )
>          pdev->quarantine = false;
>  
> @@ -1429,6 +1433,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>      ASSERT(pdev && (pdev->domain == hardware_domain ||
>                      pdev->domain == dom_io));
>  
> +    rc = vpci_deassign_device(pdev->domain, pdev);
> +    if ( rc )
> +        goto done;
> +
>      rc = pdev_msix_assign(d, pdev);
>      if ( rc )
>          goto done;
> @@ -1446,6 +1454,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>          rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>      }
>  
> +    rc = vpci_assign_device(d, pdev);
> +
>   done:
>      if ( rc )
>          printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 37103e207635..a9e9e8ec438c 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -74,12 +74,26 @@ void vpci_remove_device(struct pci_dev *pdev)
>      spin_unlock(&pdev->vpci_lock);
>  }
>  
> -int vpci_add_handlers(struct pci_dev *pdev)
> +static int run_vpci_init(struct pci_dev *pdev)

Just using add_handlers as function name would be clearer IMO.

>  {
> -    struct vpci *vpci;
>      unsigned int i;
>      int rc = 0;
>  
> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> +    {
> +        rc = __start_vpci_array[i](pdev);
> +        if ( rc )
> +            break;
> +    }
> +
> +    return rc;
> +}
> +
> +int vpci_add_handlers(struct pci_dev *pdev)
> +{
> +    struct vpci *vpci;
> +    int rc;
> +
>      if ( !has_vpci(pdev->domain) )
>          return 0;
>  
> @@ -94,19 +108,48 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      pdev->vpci = vpci;
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>  
> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
> -    {
> -        rc = __start_vpci_array[i](pdev);
> -        if ( rc )
> -            break;
> -    }
> -
> +    rc = run_vpci_init(pdev);
>      if ( rc )
>          vpci_remove_device_locked(pdev);
>      spin_unlock(&pdev->vpci_lock);
>  
>      return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
> +{
> +    int rc;
> +
> +    /* It only makes sense to assign for hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )

Do you really need the is_system_domain check? System domains
shouldn't have the VPCI flag set anyway, so should fail the has_vpci
test.

> +        return 0;
> +
> +    spin_lock(&pdev->vpci_lock);
> +    rc = run_vpci_init(pdev);
> +    spin_unlock(&pdev->vpci_lock);
> +    if ( rc )
> +        vpci_deassign_device(d, pdev);
> +
> +    return rc;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)

There's no need to return any value from this function AFAICT. It
should have void return type.

Thanks, Roger.



* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-25 11:02 ` [PATCH v5 06/14] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
  2021-11-25 16:28   ` Bertrand Marquis
@ 2022-01-12 12:35   ` Roger Pau Monné
  2022-01-31  9:47     ` Oleksandr Andrushchenko
  2022-01-31 15:06     ` Oleksandr Andrushchenko
  2022-01-12 17:34   ` Roger Pau Monné
  2 siblings, 2 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-12 12:35 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM handling is supported
> for x86 only and it might not be used by other architectures without
> emulating x86. Other use-cases may include using that expansion ROM before
> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
> wants to use the ROM code which seems to be rare.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>   handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>    so these do not get compiled for x86
>  - removed unneeded is_system_domain check
>  - re-work guest read/write to be much simpler and do more work on write
>    than read which is expected to be called more frequently
>  - removed one too obvious comment
> ---
>  xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
>  xen/include/xen/vpci.h    |  3 ++
>  2 files changed, 69 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ba333fb2f9b0..8880d34ebf8e 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>      pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
> +    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
> +}
> +
> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    const struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +
> +    return bar->guest_reg >> (hi ? 32 : 0);
> +}
> +
>  static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>                        uint32_t val, void *data)
>  {
> @@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}

There should be no need for those handlers. As said elsewhere: for
guests registers not explicitly handled should return ~0 for reads and
drop writes, which is what you are proposing here.
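
To illustrate the default being described, here is a minimal sketch,
assuming a toy table of register handlers. The struct and function
names are hypothetical and only model the policy (reads of unhandled
registers return all ones, writes are dropped); they are not the vPCI
dispatch code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulated register: offset, access size, stored value. */
struct reg_handler {
    unsigned int offset;
    unsigned int size;
    uint32_t value;
};

static struct reg_handler *find_handler(struct reg_handler *h, size_t n,
                                        unsigned int reg, unsigned int size)
{
    for ( size_t i = 0; i < n; i++ )
        if ( h[i].offset == reg && h[i].size == size )
            return &h[i];

    return NULL;
}

/* Guest read: unhandled registers read as all ones for the access size. */
static uint32_t guest_cfg_read(struct reg_handler *h, size_t n,
                               unsigned int reg, unsigned int size)
{
    const struct reg_handler *r;
    uint32_t mask;

    assert(size == 1 || size == 2 || size == 4);
    mask = 0xffffffffu >> (32 - 8 * size);
    r = find_handler(h, n, reg, size);

    return (r ? r->value : ~0u) & mask;
}

/* Guest write: returns false when the write was dropped. */
static bool guest_cfg_write(struct reg_handler *h, size_t n,
                            unsigned int reg, unsigned int size,
                            uint32_t val)
{
    struct reg_handler *r;
    uint32_t mask;

    assert(size == 1 || size == 2 || size == 4);
    mask = 0xffffffffu >> (32 - 8 * size);
    r = find_handler(h, n, reg, size);
    if ( !r )
        return false;

    r->value = val & mask;

    return true;
}
```

With such a default in place, a stub pair like guest_rom_{read,write}
adds nothing: leaving the ROM BAR without a handler gives the same
result.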

> +
>  static int init_bars(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>  
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
>          if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>          {
>              bars[i].type = VPCI_BAR_MEM64_HI;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            rc = vpci_add_register(pdev->vpci,
> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                                   is_hwdom ? bar_write : guest_bar_write,
> +                                   reg, 4, &bars[i]);
>              if ( rc )
>              {
>                  pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
>          bars[i].size = size;
>          bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> -                               &bars[i]);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                               is_hwdom ? bar_write : guest_bar_write,
> +                               reg, 4, &bars[i]);
>          if ( rc )
>          {
>              pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
>          header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
>                                PCI_ROM_ADDRESS_ENABLE;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
> -                               4, rom);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
> +                               is_hwdom ? rom_write : guest_rom_write,
> +                               rom_reg, 4, rom);

This whole call should be made conditional to is_hwdom, as said above
there's no need for the guest_rom handlers.

Likewise I assume you expect IO BARs to simply return ~0 and drop
writes, as there's no explicit handler added for those?

>          if ( rc )
>              rom->type = VPCI_BAR_EMPTY;
>      }
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index ed127a08a953..0a73b14a92dc 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -68,7 +68,10 @@ struct vpci {
>      struct vpci_header {
>          /* Information about the PCI BARs of this device. */
>          struct vpci_bar {
> +            /* Physical view of the BAR. */

No, that's not the physical view, it's the physical (host) address.

>              uint64_t addr;
> +            /* Guest view of the BAR: address and lower bits. */
> +            uint64_t guest_reg;

I continue to think it would be clearer if you store the guest address
here (gaddr, without the low bits) and add those in guest_bar_read
based on bar->{type,prefetchable}. Then it would be equivalent to the
existing 'addr' field.
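
A minimal sketch of that alternative, assuming a cut-down BAR
structure. The model_* names, the struct layout and the BAR_* macros
are hypothetical stand-ins (the macro values follow the memory BAR bit
layout from the PCI specification); this is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a memory BAR, per the PCI specification. */
#define BAR_MEM_TYPE_32   0x00u
#define BAR_MEM_TYPE_64   0x04u
#define BAR_MEM_PREFETCH  0x08u
#define BAR_MEM_MASK      (~0x0fULL)

enum model_bar_type { BAR_MEM32, BAR_MEM64_LO, BAR_MEM64_HI };

struct model_bar {
    enum model_bar_type type;
    bool prefetchable;
    uint64_t size;   /* power of two, at least 16 */
    uint64_t gaddr;  /* guest address only, no flag bits */
};

/* Write: store the size-aligned guest address, never the flag bits. */
static void model_guest_bar_write(struct model_bar *bar, uint32_t val)
{
    bool hi = false;
    uint64_t gaddr;

    if ( bar->type == BAR_MEM64_HI )
    {
        bar--;          /* the state lives in the low half's slot */
        hi = true;
    }

    gaddr = bar->gaddr;
    gaddr &= ~(0xffffffffULL << (hi ? 32 : 0));
    gaddr |= (uint64_t)val << (hi ? 32 : 0);

    bar->gaddr = gaddr & ~(bar->size - 1) & BAR_MEM_MASK;
}

/* Read: rebuild the register value from gaddr plus type/prefetchable. */
static uint32_t model_guest_bar_read(const struct model_bar *bar)
{
    uint32_t val;

    if ( bar->type == BAR_MEM64_HI )
        return (uint32_t)((bar - 1)->gaddr >> 32);

    val = (uint32_t)bar->gaddr;
    val |= bar->type == BAR_MEM32 ? BAR_MEM_TYPE_32 : BAR_MEM_TYPE_64;
    val |= bar->prefetchable ? BAR_MEM_PREFETCH : 0;

    return val;
}
```

The size probe still works in this layout: a guest writing all ones
reads back the size mask with the flag bits added on the read side,
and gaddr stays directly comparable to the host 'addr' field.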

I wonder whether we need to protect the added code with
CONFIG_HAS_VPCI_GUEST_SUPPORT, this would effectively be dead code
otherwise. Long term I don't think we wish to differentiate between
dom0 and domU vPCI support at build time, so I'm unsure whether it's
helpful to pollute the code with CONFIG_HAS_VPCI_GUEST_SUPPORT when
the plan is to remove those long term.

Thanks, Roger.



* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-11 15:17   ` Roger Pau Monné
@ 2022-01-12 14:42     ` Jan Beulich
  2022-01-26  8:40       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 14:42 UTC (permalink / raw)
  To: Roger Pau Monné, Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, andrew.cooper3, george.dunlap,
	paul, bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	Ian Jackson

On 11.01.2022 16:17, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:40PM +0200, Oleksandr Andrushchenko wrote:
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 657697fe3406..ceaac4516ff8 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>  extern vpci_register_init_t *const __end_vpci_array[];
>>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>  
>> -void vpci_remove_device(struct pci_dev *pdev)
>> +static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>>  {
>> -    if ( !has_vpci(pdev->domain) )
>> -        return;
>> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
>>  
>> -    spin_lock(&pdev->vpci->lock);
>>      while ( !list_empty(&pdev->vpci->handlers) )
>>      {
>>          struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
>> @@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
>>          list_del(&r->node);
>>          xfree(r);
>>      }
>> -    spin_unlock(&pdev->vpci->lock);
>> +}
>> +
>> +void vpci_remove_device_locked(struct pci_dev *pdev)
> 
> I think this could be static instead, as it's only used by
> vpci_remove_device and vpci_add_handlers which are local to the
> file.

Does the splitting out of vpci_remove_device_handlers_locked() belong in
this patch in the first place? There's no second caller being added, so
this looks to be an orthogonal adjustment.

Jan




* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2021-11-25 11:02 ` [PATCH v5 03/14] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
  2022-01-11 15:17   ` Roger Pau Monné
@ 2022-01-12 14:57   ` Jan Beulich
  2022-01-12 15:42     ` Roger Pau Monné
  2022-01-28 14:12     ` Oleksandr Andrushchenko
  1 sibling, 2 replies; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 14:57 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> @@ -68,12 +84,13 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      /* We should not get here twice for the same device. */
>      ASSERT(!pdev->vpci);
>  
> -    pdev->vpci = xzalloc(struct vpci);
> -    if ( !pdev->vpci )
> +    vpci = xzalloc(struct vpci);
> +    if ( !vpci )
>          return -ENOMEM;
>  
> +    spin_lock(&pdev->vpci_lock);
> +    pdev->vpci = vpci;
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
> -    spin_lock_init(&pdev->vpci->lock);

INIT_LIST_HEAD() can occur ahead of taking the lock, and can also act
on &vpci->handlers rather than &pdev->vpci->handlers.

>      for ( i = 0; i < NUM_VPCI_INIT; i++ )
>      {
> @@ -83,7 +100,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      }

This loop wants to live in the locked region because you need to install
vpci into pdev->vpci up front, afaict. I wonder whether that couldn't
be relaxed, but perhaps that's an improvement that can be thought about
later.

The main reason I'm looking at this closely is because from the patch
title I didn't expect new locking regions to be introduced right here;
instead I did expect strictly a mechanical conversion.

> @@ -152,8 +170,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->offset = offset;
>      r->private = data;
>  
> -    spin_lock(&vpci->lock);

From the description I can't deduce why this lock is fine to go away
now, i.e. that all call sites have the lock now acquired earlier.
Therefore I'd expect at least an assertion to be left here ...

> @@ -183,7 +197,6 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>      const struct vpci_register r = { .offset = offset, .size = size };
>      struct vpci_register *rm;
>  
> -    spin_lock(&vpci->lock);

... and here.

> @@ -370,6 +386,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>              break;
>          ASSERT(data_offset < size);
>      }
> +    spin_unlock(&pdev->vpci_lock);
>  
>      if ( data_offset < size )
>      {
> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  
>          data = merge_result(data, tmp_data, size - data_offset, data_offset);
>      }
> -    spin_unlock(&pdev->vpci->lock);
>  
>      return data & (0xffffffff >> (32 - 8 * size));
>  }

Here and ...

> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>              break;
>          ASSERT(data_offset < size);
>      }
> +    spin_unlock(&pdev->vpci_lock);
>  
>      if ( data_offset < size )
>          /* Tailing gap, write the remaining. */
>          vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>                        data >> (data_offset * 8));
> -
> -    spin_unlock(&pdev->vpci->lock);
>  }

... even more so here I'm not sure of the correctness of the moving
you do: While pdev->vpci indeed doesn't get accessed, I wonder
whether there wasn't an intention to avoid racing calls to
vpci_{read,write}_hw() this way. In any event I think such movement
would need justification in the description.

Jan




* Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2021-11-25 11:02 ` [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR Oleksandr Andrushchenko
@ 2022-01-12 15:15   ` Roger Pau Monné
  2022-01-12 15:18     ` Jan Beulich
  2022-02-02  6:44     ` Oleksandr Andrushchenko
  0 siblings, 2 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-12 15:15 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Instead of handling a single range set, that contains all the memory
> regions of all the BARs and ROM, have them per BAR.
> As the range sets are now created when a PCI device is added and destroyed
> when it is removed, make them named and accounted.
> 
> Note that rangesets were chosen here despite there being only up to
> 3 separate ranges in each set (typically just 1). But rangeset per BAR
> was chosen for the ease of implementation and existing code re-usability.
> 
> This is in preparation of making non-identity mappings in p2m for the
> MMIOs/ROM.

I think we don't want to support ROM for guests (at least initially),
so no need to mention it here.

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Since v4:
> - use named range sets for BARs (Jan)
> - changes required by the new locking scheme
> - updated commit message (Jan)
> Since v3:
> - re-work vpci_cancel_pending accordingly to the per-BAR handling
> - s/num_mem_ranges/map_pending and s/uint8_t/bool
> - ASSERT(bar->mem) in modify_bars
> - create and destroy the rangesets on add/remove
> ---
>  xen/drivers/vpci/header.c | 190 +++++++++++++++++++++++++++-----------
>  xen/drivers/vpci/vpci.c   |  30 +++++-
>  xen/include/xen/vpci.h    |   3 +-
>  3 files changed, 166 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 8880d34ebf8e..cc49aa68886f 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -137,45 +137,86 @@ bool vpci_process_pending(struct vcpu *v)
>          return false;
>  
>      spin_lock(&pdev->vpci_lock);
> -    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
> +    if ( !pdev->vpci )
> +    {
> +        spin_unlock(&pdev->vpci_lock);
> +        return false;
> +    }
> +
> +    if ( !pdev->vpci_cancel_pending && v->vpci.map_pending )
>      {
>          struct map_data data = {
>              .d = v->domain,
>              .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>          };
> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
> +        struct vpci_header *header = &pdev->vpci->header;
> +        unsigned int i;
>  
> -        if ( rc == -ERESTART )
> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>          {
> -            spin_unlock(&pdev->vpci_lock);
> -            return true;
> -        }
> +            struct vpci_bar *bar = &header->bars[i];
> +            int rc;
> +

You should check bar->mem != NULL here, there's no need to allocate a
rangeset for non-mappable BARs.

> +            if ( rangeset_is_empty(bar->mem) )
> +                continue;
> +
> +            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
> +
> +            if ( rc == -ERESTART )
> +            {
> +                spin_unlock(&pdev->vpci_lock);
> +                return true;
> +            }
>  
> -        if ( pdev->vpci )
>              /* Disable memory decoding unconditionally on failure. */
> -            modify_decoding(pdev,
> -                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> +            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,

The above seems to be an unrelated change, and also exceeds the max
line length.

>                              !rc && v->vpci.rom_only);
>  
> -        if ( rc )
> -        {
> -            /*
> -             * FIXME: in case of failure remove the device from the domain.
> -             * Note that there might still be leftover mappings. While this is
> -             * safe for Dom0, for DomUs the domain needs to be killed in order
> -             * to avoid leaking stale p2m mappings on failure.
> -             */
> -            if ( is_hardware_domain(v->domain) )
> -                vpci_remove_device_locked(pdev);
> -            else
> -                domain_crash(v->domain);
> +            if ( rc )
> +            {
> +                /*
> +                 * FIXME: in case of failure remove the device from the domain.
> +                 * Note that there might still be leftover mappings. While this is
> +                 * safe for Dom0, for DomUs the domain needs to be killed in order
> +                 * to avoid leaking stale p2m mappings on failure.
> +                 */
> +                if ( is_hardware_domain(v->domain) )
> +                    vpci_remove_device_locked(pdev);
> +                else
> +                    domain_crash(v->domain);
> +
> +                break;
> +            }
>          }
> +
> +        v->vpci.map_pending = false;
>      }
>      spin_unlock(&pdev->vpci_lock);
>  
>      return false;
>  }
>  
> +static void vpci_bar_remove_ranges(const struct pci_dev *pdev)
> +{
> +    struct vpci_header *header = &pdev->vpci->header;
> +    unsigned int i;
> +    int rc;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +
> +        if ( rangeset_is_empty(bar->mem) )
> +            continue;
> +
> +        rc = rangeset_remove_range(bar->mem, 0, ~0ULL);

Might be interesting to introduce a rangeset_reset function that
removes all ranges. That would never fail, and thus there would be no
need to check for rc.

Also I think the current rangeset_remove_range should never fail when
removing all ranges, as there's nothing to allocate. Hence you can add
an ASSERT_UNREACHABLE below.
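
A minimal sketch of why a reset operation cannot fail, assuming a toy
rangeset kept as an unsorted linked list. The model_* names are
hypothetical; the real rangeset implementation (sorted, merging,
locked) differs, and rangeset_reset itself is only the proposal made
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy range set: a linked list of inclusive [s, e] ranges. */
struct model_range {
    unsigned long s, e;
    struct model_range *next;
};

struct rangeset_model {
    struct model_range *head;
};

/* Adding needs an allocation, so it can fail with -ENOMEM. */
static int model_add_range(struct rangeset_model *r,
                           unsigned long s, unsigned long e)
{
    struct model_range *n;

    assert(s <= e);
    n = malloc(sizeof(*n));
    if ( !n )
        return -ENOMEM;

    n->s = s;
    n->e = e;
    n->next = r->head;
    r->head = n;

    return 0;
}

static bool model_is_empty(const struct rangeset_model *r)
{
    return r->head == NULL;
}

/*
 * Removing everything only frees nodes. Nothing is split and nothing
 * is allocated, so there is no error path and no return value.
 */
static void model_reset(struct rangeset_model *r)
{
    while ( r->head )
    {
        struct model_range *n = r->head;

        r->head = n->next;
        free(n);
    }
}
```

A caller such as vpci_bar_remove_ranges() would then have no rc to
check or log.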

> +        if ( !rc )
> +            printk(XENLOG_ERR
> +                   "%pd %pp failed to remove range set for BAR: %d\n",
> +                   pdev->domain, &pdev->sbdf, rc);
> +    }
> +}
> +
>  void vpci_cancel_pending_locked(struct pci_dev *pdev)
>  {
>      struct vcpu *v;
> @@ -185,23 +226,33 @@ void vpci_cancel_pending_locked(struct pci_dev *pdev)
>      /* Cancel any pending work now on all vCPUs. */
>      for_each_vcpu( pdev->domain, v )
>      {
> -        if ( v->vpci.mem && (v->vpci.pdev == pdev) )
> +        if ( v->vpci.map_pending && (v->vpci.pdev == pdev) )
>          {
> -            rangeset_destroy(v->vpci.mem);
> -            v->vpci.mem = NULL;
> +            vpci_bar_remove_ranges(pdev);
> +            v->vpci.map_pending = false;
>          }
>      }
>  }
>  
>  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
> -                            struct rangeset *mem, uint16_t cmd)
> +                            uint16_t cmd)
>  {
>      struct map_data data = { .d = d, .map = true };
> -    int rc;
> +    struct vpci_header *header = &pdev->vpci->header;
> +    int rc = 0;
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
>  
> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
> -        process_pending_softirqs();
> -    rangeset_destroy(mem);
> +        if ( rangeset_is_empty(bar->mem) )
> +            continue;
> +
> +        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
> +                                              &data)) == -ERESTART )
> +            process_pending_softirqs();
> +    }
>      if ( !rc )
>          modify_decoding(pdev, cmd, false);
>  
> @@ -209,7 +260,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>  }
>  
>  static void defer_map(struct domain *d, struct pci_dev *pdev,
> -                      struct rangeset *mem, uint16_t cmd, bool rom_only)
> +                      uint16_t cmd, bool rom_only)
>  {
>      struct vcpu *curr = current;
>  
> @@ -220,7 +271,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>       * started for the same device if the domain is not well-behaved.
>       */
>      curr->vpci.pdev = pdev;
> -    curr->vpci.mem = mem;
> +    curr->vpci.map_pending = true;
>      curr->vpci.cmd = cmd;
>      curr->vpci.rom_only = rom_only;
>      /*
> @@ -234,42 +285,40 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>  {
>      struct vpci_header *header = &pdev->vpci->header;
> -    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>      struct pci_dev *tmp, *dev = NULL;
>      const struct vpci_msix *msix = pdev->vpci->msix;
> -    unsigned int i;
> +    unsigned int i, j;
>      int rc;
> -
> -    if ( !mem )
> -        return -ENOMEM;
> +    bool map_pending;
>  
>      /*
> -     * Create a rangeset that represents the current device BARs memory region
> +     * Create a rangeset per BAR that represents the current device memory region
>       * and compare it against all the currently active BAR memory regions. If
>       * an overlap is found, subtract it from the region to be mapped/unmapped.
>       *
> -     * First fill the rangeset with all the BARs of this device or with the ROM
> +     * First fill the rangesets with all the BARs of this device or with the ROM
                                        ^ 'all' doesn't apply anymore.
>       * BAR only, depending on whether the guest is toggling the memory decode
>       * bit of the command register, or the enable bit of the ROM BAR register.
>       */
>      for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>      {
> -        const struct vpci_bar *bar = &header->bars[i];
> +        struct vpci_bar *bar = &header->bars[i];
>          unsigned long start = PFN_DOWN(bar->addr);
>          unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>  
> +        ASSERT(bar->mem);
> +
>          if ( !MAPPABLE_BAR(bar) ||
>               (rom_only ? bar->type != VPCI_BAR_ROM
>                         : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>              continue;
>  
> -        rc = rangeset_add_range(mem, start, end);
> +        rc = rangeset_add_range(bar->mem, start, end);
>          if ( rc )
>          {
>              printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>                     start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            goto fail;
>          }


I think you also need to check that BARs from the same device don't
overlap themselves. This wasn't needed before because all BARs shared
the same rangeset. It's not uncommon for BARs of the same device to
share a page.

So you would need something like the following added to the loop:

/* Check for overlap with the already setup BAR ranges. */
for ( j = 0; j < i; j++ )
    rangeset_remove_range(header->bars[j].mem, start, end);
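
A small worked model of that loop, assuming each BAR's rangeset can be
represented as a 64-bit bitmap of page numbers. The pageset_t type and
the helper names are hypothetical and exist only to make the effect of
the inner loop visible; they are not rangeset calls.

```c
#include <assert.h>
#include <stdint.h>

#define NR_BARS 3

/* Toy per-BAR "rangeset": bit n set means page n is to be mapped. */
typedef uint64_t pageset_t;

/* Bitmap covering the inclusive page range [start, end]. */
static pageset_t range_bits(unsigned int start, unsigned int end)
{
    pageset_t upto_end;

    assert(start <= end && end < 64);
    upto_end = end == 63 ? ~0ULL : ((1ULL << (end + 1)) - 1);

    return upto_end & ~((1ULL << start) - 1);
}

/*
 * Fill one set per BAR. After adding BAR i, remove its pages from the
 * sets of BARs 0..i-1, so a page shared by two BARs of the same device
 * ends up in exactly one set.
 */
static void fill_bar_sets(pageset_t mem[NR_BARS],
                          const unsigned int start[NR_BARS],
                          const unsigned int end[NR_BARS])
{
    for ( unsigned int i = 0; i < NR_BARS; i++ )
    {
        mem[i] = range_bits(start[i], end[i]);

        /* Check for overlap with the already setup BAR ranges. */
        for ( unsigned int j = 0; j < i; j++ )
            mem[j] &= ~range_bits(start[i], end[i]);
    }
}
```

For example, with BAR0 on pages 0-1 and BAR1 on pages 1-2, page 1 is
left only in BAR1's set, so it is not mapped twice.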

>      }
>  
> @@ -280,14 +329,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>          unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>                                       vmsix_table_size(pdev->vpci, i) - 1);
>  
> -        rc = rangeset_remove_range(mem, start, end);
> -        if ( rc )
> +        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>          {
> -            printk(XENLOG_G_WARNING
> -                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
> -                   start, end, rc);
> -            rangeset_destroy(mem);
> -            return rc;
> +            const struct vpci_bar *bar = &header->bars[j];
> +
> +            if ( rangeset_is_empty(bar->mem) )
> +                continue;
> +
> +            rc = rangeset_remove_range(bar->mem, start, end);
> +            if ( rc )
> +            {
> +                printk(XENLOG_G_WARNING
> +                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
> +                       start, end, rc);
> +                goto fail;
> +            }
>          }
>      }
>  
> @@ -325,7 +381,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>              unsigned long start = PFN_DOWN(bar->addr);
>              unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>  
> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
> +            if ( !bar->enabled ||
> +                 !rangeset_overlaps_range(bar->mem, start, end) ||
>                   /*
>                    * If only the ROM enable bit is toggled check against other
>                    * BARs in the same device for overlaps, but not against the
> @@ -334,14 +391,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>                   (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>                  continue;
>  
> -            rc = rangeset_remove_range(mem, start, end);
> +            rc = rangeset_remove_range(bar->mem, start, end);
>              if ( rc )
>              {
>                  spin_unlock(&tmp->vpci_lock);
>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                         start, end, rc);
> -                rangeset_destroy(mem);
> -                return rc;
> +                goto fail;
>              }
>          }
>          spin_unlock(&tmp->vpci_lock);
> @@ -360,12 +416,36 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>           * will always be to establish mappings and process all the BARs.
>           */
>          ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
> -        return apply_map(pdev->domain, pdev, mem, cmd);
> +        return apply_map(pdev->domain, pdev, cmd);
>      }
>  
> -    defer_map(dev->domain, dev, mem, cmd, rom_only);
> +    /* Find out how many memory ranges has left after MSI and overlaps. */
> +    map_pending = false;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +        if ( !rangeset_is_empty(header->bars[i].mem) )
> +        {
> +            map_pending = true;
> +            break;
> +        }
> +
> +    /*
> +     * There are cases when PCI device, root port for example, has neither
> +     * memory space nor IO. In this case PCI command register write is
> +     * missed resulting in the underlying PCI device not functional, so:
> +     *   - if there are no regions write the command register now
> +     *   - if there are regions then defer work and write later on

I would just say:

/* If there's no mapping work write the command register now. */

> +     */
> +    if ( !map_pending )
> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> +    else
> +        defer_map(dev->domain, dev, cmd, rom_only);
>  
>      return 0;
> +
> +fail:
> +    /* Destroy all the ranges we may have added. */
> +    vpci_bar_remove_ranges(pdev);
> +    return rc;
>  }
>  
>  static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index a9e9e8ec438c..98b12a61be6f 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -52,11 +52,16 @@ static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>  
>  void vpci_remove_device_locked(struct pci_dev *pdev)
>  {
> +    struct vpci_header *header = &pdev->vpci->header;
> +    unsigned int i;
> +
>      ASSERT(spin_is_locked(&pdev->vpci_lock));
>  
>      pdev->vpci_cancel_pending = true;
>      vpci_remove_device_handlers_locked(pdev);
>      vpci_cancel_pending_locked(pdev);
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +        rangeset_destroy(header->bars[i].mem);
>      xfree(pdev->vpci->msix);
>      xfree(pdev->vpci->msi);
>      xfree(pdev->vpci);
> @@ -92,6 +97,8 @@ static int run_vpci_init(struct pci_dev *pdev)
>  int vpci_add_handlers(struct pci_dev *pdev)
>  {
>      struct vpci *vpci;
> +    struct vpci_header *header;
> +    unsigned int i;
>      int rc;
>  
>      if ( !has_vpci(pdev->domain) )
> @@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      pdev->vpci = vpci;
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
>  
> +    header = &pdev->vpci->header;
> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +    {
> +        struct vpci_bar *bar = &header->bars[i];
> +        char str[32];
> +
> +        snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
> +        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
> +        if ( !bar->mem )
> +        {
> +            rc = -ENOMEM;
> +            goto fail;
> +        }
> +    }

You just need the ranges for the VPCI_BAR_MEM32, VPCI_BAR_MEM64_LO and
VPCI_BAR_ROM BAR types (see the MAPPABLE_BAR macro). Would it be
possible to only allocate the rangeset for those BAR types?

Also this should be done in init_bars rather than here, as you would
know the BAR types.
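
A rough sketch of the type filter I have in mind, as a standalone model.
The enum and helper names below are hypothetical and only mirror the BAR
types named above; the real check would use MAPPABLE_BAR on the
struct vpci_bar once init_bars has sized the BARs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local model of the BAR types, not the real vpci definitions. */
enum toy_bar_type {
    TOY_BAR_EMPTY,
    TOY_BAR_IO,
    TOY_BAR_MEM32,
    TOY_BAR_MEM64_LO,
    TOY_BAR_MEM64_HI,
    TOY_BAR_ROM,
};

/* Only BARs that can be mapped into the p2m need a rangeset. */
static bool toy_bar_needs_rangeset(enum toy_bar_type type)
{
    return type == TOY_BAR_MEM32 || type == TOY_BAR_MEM64_LO ||
           type == TOY_BAR_ROM;
}

/* Count how many rangesets would be allocated for a device. */
static unsigned int toy_count_rangesets(const enum toy_bar_type *types,
                                        unsigned int nr)
{
    unsigned int i, count = 0;

    assert(types != NULL);

    for ( i = 0; i < nr; i++ )
        if ( toy_bar_needs_rangeset(types[i]) )
            count++;

    return count;
}
```

For a device with one 64bit memory BAR (LO + HI), one IO BAR and a ROM
this allocates two rangesets instead of one per BAR slot.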

Thanks, Roger.



* Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2022-01-12 15:15   ` Roger Pau Monné
@ 2022-01-12 15:18     ` Jan Beulich
  2022-02-02  6:44     ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 15:18 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, andrew.cooper3, george.dunlap,
	paul, bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	Oleksandr Andrushchenko

On 12.01.2022 16:15, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -137,45 +137,86 @@ bool vpci_process_pending(struct vcpu *v)
>>          return false;
>>  
>>      spin_lock(&pdev->vpci_lock);
>> -    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
>> +    if ( !pdev->vpci )
>> +    {
>> +        spin_unlock(&pdev->vpci_lock);
>> +        return false;
>> +    }
>> +
>> +    if ( !pdev->vpci_cancel_pending && v->vpci.map_pending )
>>      {
>>          struct map_data data = {
>>              .d = v->domain,
>>              .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>          };
>> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>> +        struct vpci_header *header = &pdev->vpci->header;
>> +        unsigned int i;
>>  
>> -        if ( rc == -ERESTART )
>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>          {
>> -            spin_unlock(&pdev->vpci_lock);
>> -            return true;
>> -        }
>> +            struct vpci_bar *bar = &header->bars[i];
>> +            int rc;
>> +
> 
> You should check bar->mem != NULL here, there's no need to allocate a
> rangeset for non-mappable BARs.

There's a NULL check ...

>> +            if ( rangeset_is_empty(bar->mem) )
>> +                continue;

... inside rangeset_is_empty() (to help callers like this one).
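
For reference, the pattern is roughly the following. This is a toy model
with a hypothetical toy_rangeset structure, not the actual implementation;
it only shows why a caller can pass a pointer that was never allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real rangeset structure. */
struct toy_rangeset {
    unsigned int nr_ranges;
};

/* A NULL set is treated the same as a set holding no ranges. */
static bool toy_rangeset_is_empty(const struct toy_rangeset *r)
{
    return r == NULL || r->nr_ranges == 0;
}
```

So a loop over all BARs can "continue" on this check alone, whether or
not a rangeset was allocated for a given BAR.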

Jan




* Re: [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal
  2021-11-25 11:02 ` [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
  2022-01-11 16:57   ` Roger Pau Monné
@ 2022-01-12 15:27   ` Jan Beulich
  2022-01-28 12:21     ` Oleksandr Andrushchenko
  2022-01-31  7:53   ` Oleksandr Andrushchenko
  2 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 15:27 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> When a vPCI is removed for a PCI device it is possible that we have
> scheduled a delayed work for map/unmap operations for that device.
> For example, the following scenario can illustrate the problem:
> 
> pci_physdev_op
>    pci_add_device
>        init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>    iommu_add_device <- FAILS
>    vpci_remove_device -> xfree(pdev->vpci)
> 
> leave_hypervisor_to_guest
>    vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
> 
> For the hardware domain we continue execution as the worse that
> could happen is that MMIO mappings are left in place when the
> device has been deassigned.
> 
> For unprivileged domains that get a failure in the middle of a vPCI
> {un}map operation we need to destroy them, as we don't know in which
> state the p2m is. This can only happen in vpci_process_pending for
> DomUs as they won't be allowed to call pci_add_device.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ---
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Since v4:
>  - crash guest domain if map/unmap operation didn't succeed
>  - re-work vpci cancel work to cancel work on all vCPUs
>  - use new locking scheme with pdev->vpci_lock
> New in v4
> 
> Fixes: 86dbcf6e30cb ("vpci: cancel pending map/unmap on vpci removal")

What is this about?

Jan




* Re: [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology
  2021-11-25 11:02 ` [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
@ 2022-01-12 15:39   ` Jan Beulich
  2022-02-02 13:15     ` Oleksandr Andrushchenko
  2022-01-13 11:35   ` Roger Pau Monné
  1 sibling, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 15:39 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> @@ -145,6 +148,53 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int vpci_add_virtual_device(struct pci_dev *pdev)
> +{
> +    struct domain *d = pdev->domain;
> +    pci_sbdf_t sbdf = { 0 };
> +    unsigned long new_dev_number;
> +
> +    /*
> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
> +     * there are multi-function ones which are not yet supported.
> +     */
> +    if ( pdev->info.is_extfn )
> +    {
> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> +                 &pdev->sbdf);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    new_dev_number = find_first_zero_bit(&d->vpci_dev_assigned_map,
> +                                         VPCI_MAX_VIRT_DEV);
> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
> +        return -ENOSPC;
> +
> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
> +
> +    /*
> +     * Both segment and bus number are 0:
> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
> +     *  - with bus 0 the virtual devices are seen as embedded
> +     *    endpoints behind the root complex
> +     *
> +     * TODO: add support for multi-function devices.
> +     */
> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
> +    pdev->vpci->guest_sbdf = sbdf;
> +
> +    return 0;
> +
> +}
> +REGISTER_VPCI_INIT(vpci_add_virtual_device, VPCI_PRIORITY_MIDDLE);

Is this function guaranteed to always be invoked ahead of ...

> +static void vpci_remove_virtual_device(struct domain *d,
> +                                       const struct pci_dev *pdev)
> +{
> +    __clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
> +    pdev->vpci->guest_sbdf.sbdf = ~0;
> +}

... this one, even when considering error paths? Otherwise you may
wrongly clear bit 31 here afaict.
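
To spell out where bit 31 comes from, here is a small standalone sketch.
It assumes the usual SBDF layout (devfn in the low 8 bits, device number
in the upper 5 bits of devfn) and that guest_sbdf holds the all-ones
"invalid" value when the remove path runs; the toy_* names are made up
for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Device number is the upper 5 bits of the 8-bit devfn. */
#define TOY_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/* Extract the device number from a 32-bit SBDF (devfn in bits 0-7). */
static unsigned int toy_sbdf_dev(uint32_t sbdf)
{
    return TOY_PCI_SLOT(sbdf & 0xff);
}

/* Model of the remove path: clear the bit indexed by the device number. */
static uint32_t toy_remove_virtual_device(uint32_t assigned_map,
                                          uint32_t guest_sbdf)
{
    unsigned int dev = toy_sbdf_dev(guest_sbdf);

    assert(dev < 32);

    return assigned_map & ~(UINT32_C(1) << dev);
}
```

With guest_sbdf == ~0 the device number decodes to 31, so a map with
slots 0 and 31 in use would lose slot 31 even though this device never
owned it.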

Jan




* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-12 14:57   ` Jan Beulich
@ 2022-01-12 15:42     ` Roger Pau Monné
  2022-01-12 15:52       ` Jan Beulich
  2022-01-28 14:12     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-12 15:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, xen-devel

On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> > @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> >  
> >          data = merge_result(data, tmp_data, size - data_offset, data_offset);
> >      }
> > -    spin_unlock(&pdev->vpci->lock);
> >  
> >      return data & (0xffffffff >> (32 - 8 * size));
> >  }
> 
> Here and ...
> 
> > @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >              break;
> >          ASSERT(data_offset < size);
> >      }
> > +    spin_unlock(&pdev->vpci_lock);
> >  
> >      if ( data_offset < size )
> >          /* Tailing gap, write the remaining. */
> >          vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >                        data >> (data_offset * 8));
> > -
> > -    spin_unlock(&pdev->vpci->lock);
> >  }
> 
> ... even more so here I'm not sure of the correctness of the moving
> you do: While pdev->vpci indeed doesn't get accessed, I wonder
> whether there wasn't an intention to avoid racing calls to
> vpci_{read,write}_hw() this way. In any event I think such movement
> would need justification in the description.

I agree about the need for justification in the commit message, or
even better this being split into a pre-patch, as it's not related to
the lock switching done here.

I do think this is fine however, as racing calls to
vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
around pci_conf_{read,write} functions, and the required locking (in
case of using the IO ports) is already taken care of in
pci_conf_{read,write}.

Thanks, Roger.



* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-12 15:42     ` Roger Pau Monné
@ 2022-01-12 15:52       ` Jan Beulich
  2022-01-13  8:58         ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-12 15:52 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, xen-devel

On 12.01.2022 16:42, Roger Pau Monné wrote:
> On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>>> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>  
>>>          data = merge_result(data, tmp_data, size - data_offset, data_offset);
>>>      }
>>> -    spin_unlock(&pdev->vpci->lock);
>>>  
>>>      return data & (0xffffffff >> (32 - 8 * size));
>>>  }
>>
>> Here and ...
>>
>>> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>              break;
>>>          ASSERT(data_offset < size);
>>>      }
>>> +    spin_unlock(&pdev->vpci_lock);
>>>  
>>>      if ( data_offset < size )
>>>          /* Tailing gap, write the remaining. */
>>>          vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>                        data >> (data_offset * 8));
>>> -
>>> -    spin_unlock(&pdev->vpci->lock);
>>>  }
>>
>> ... even more so here I'm not sure of the correctness of the moving
>> you do: While pdev->vpci indeed doesn't get accessed, I wonder
>> whether there wasn't an intention to avoid racing calls to
>> vpci_{read,write}_hw() this way. In any event I think such movement
>> would need justification in the description.
> 
> I agree about the need for justification in the commit message, or
> even better this being split into a pre-patch, as it's not related to
> the lock switching done here.
> 
> I do think this is fine however, as racing calls to
> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
> around pci_conf_{read,write} functions, and the required locking (in
> case of using the IO ports) is already taken care in
> pci_conf_{read,write}.

IOW you consider it acceptable for a guest (really: Dom0) read racing
a write to read back only part of what was written (so far)? I would
think individual multi-byte reads and writes should appear atomic to
the guest.

Jan




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-25 11:02 ` [PATCH v5 06/14] vpci/header: implement guest BAR register handlers Oleksandr Andrushchenko
  2021-11-25 16:28   ` Bertrand Marquis
  2022-01-12 12:35   ` Roger Pau Monné
@ 2022-01-12 17:34   ` Roger Pau Monné
  2022-01-31  9:53     ` Oleksandr Andrushchenko
  2 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-12 17:34 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

A couple more comments I realized while walking the dog.

On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> ROM BAR is only handled for the hardware domain and for guest domains
> there is a stub: at the moment PCI expansion ROM handling is supported
> for x86 only and it might not be used by other architectures without
> emulating x86. Other use-cases may include using that expansion ROM before
> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
> wants to use the ROM code which seems to be rare.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>   handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code3. simplify
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>    so these do not get compiled for x86
>  - removed unneeded is_system_domain check
>  - re-work guest read/write to be much simpler and do more work on write
>    than read which is expected to be called more frequently
>  - removed one too obvious comment
> ---
>  xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
>  xen/include/xen/vpci.h    |  3 ++
>  2 files changed, 69 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index ba333fb2f9b0..8880d34ebf8e 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>      pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +    struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +    else
> +    {
> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +    }
> +
> +    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
> +    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
> +
> +    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;

You need to assert that the guest set address has the same page offset
as the physical address on the host, or otherwise things won't work as
expected. I.e.: (guest_addr & ~PAGE_MASK) == (addr & ~PAGE_MASK).
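
A standalone sketch of the check, assuming 4K pages. The TOY_* macros and
same_page_offset are hypothetical names for the example; in the tree the
existing PAGE_MASK would be used instead. Note the parentheses: == binds
tighter than & in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4K pages for the example. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* True if both addresses sit at the same offset within their page. */
static bool same_page_offset(uint64_t guest_addr, uint64_t addr)
{
    return (guest_addr & ~TOY_PAGE_MASK) == (addr & ~TOY_PAGE_MASK);
}
```

A host BAR at 0xfe000800 can be placed by the guest at 0x10000800, but
not at 0x10000000, since mappings are done with page granularity.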

> +}
> +
> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    const struct vpci_bar *bar = data;
> +    bool hi = false;
> +
> +    if ( bar->type == VPCI_BAR_MEM64_HI )
> +    {
> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
> +        bar--;
> +        hi = true;
> +    }
> +
> +    return bar->guest_reg >> (hi ? 32 : 0);
> +}
> +
>  static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>                        uint32_t val, void *data)
>  {
> @@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>          rom->addr = val & PCI_ROM_ADDRESS_MASK;
>  }
>  
> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t val, void *data)
> +{
> +}
> +
> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> +                               void *data)
> +{
> +    return 0xffffffff;
> +}
> +
>  static int init_bars(struct pci_dev *pdev)
>  {
>      uint16_t cmd;
> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
>      struct vpci_header *header = &pdev->vpci->header;
>      struct vpci_bar *bars = header->bars;
>      int rc;
> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>  
>      switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>      {
> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
>          if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>          {
>              bars[i].type = VPCI_BAR_MEM64_HI;
> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> -                                   4, &bars[i]);
> +            rc = vpci_add_register(pdev->vpci,
> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                                   is_hwdom ? bar_write : guest_bar_write,
> +                                   reg, 4, &bars[i]);
>              if ( rc )
>              {
>                  pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
>          bars[i].size = size;
>          bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> -                               &bars[i]);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> +                               is_hwdom ? bar_write : guest_bar_write,
> +                               reg, 4, &bars[i]);

You need to initialize guest_reg to the physical host value also.

>          if ( rc )
>          {
>              pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
>          header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
>                                PCI_ROM_ADDRESS_ENABLE;
>  
> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
> -                               4, rom);
> +        rc = vpci_add_register(pdev->vpci,
> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
> +                               is_hwdom ? rom_write : guest_rom_write,
> +                               rom_reg, 4, rom);
>          if ( rc )
>              rom->type = VPCI_BAR_EMPTY;

Also memory decoding needs to be initially disabled when used by
guests, in order to prevent the BAR being placed on top of a RAM
region. The guest physmap will be different from the host one, so it's
possible for BARs to end up placed on top of RAM regions initially
until the firmware or OS places them at a suitable address.

Thanks, Roger.



* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-12 15:52       ` Jan Beulich
@ 2022-01-13  8:58         ` Roger Pau Monné
  2022-01-28 14:15           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13  8:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, xen-devel

On Wed, Jan 12, 2022 at 04:52:51PM +0100, Jan Beulich wrote:
> On 12.01.2022 16:42, Roger Pau Monné wrote:
> > On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
> >> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> >>> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> >>>  
> >>>          data = merge_result(data, tmp_data, size - data_offset, data_offset);
> >>>      }
> >>> -    spin_unlock(&pdev->vpci->lock);
> >>>  
> >>>      return data & (0xffffffff >> (32 - 8 * size));
> >>>  }
> >>
> >> Here and ...
> >>
> >>> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>              break;
> >>>          ASSERT(data_offset < size);
> >>>      }
> >>> +    spin_unlock(&pdev->vpci_lock);
> >>>  
> >>>      if ( data_offset < size )
> >>>          /* Tailing gap, write the remaining. */
> >>>          vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >>>                        data >> (data_offset * 8));
> >>> -
> >>> -    spin_unlock(&pdev->vpci->lock);
> >>>  }
> >>
> >> ... even more so here I'm not sure of the correctness of the moving
> >> you do: While pdev->vpci indeed doesn't get accessed, I wonder
> >> whether there wasn't an intention to avoid racing calls to
> >> vpci_{read,write}_hw() this way. In any event I think such movement
> >> would need justification in the description.
> > 
> > I agree about the need for justification in the commit message, or
> > even better this being split into a pre-patch, as it's not related to
> > the lock switching done here.
> > 
> > I do think this is fine however, as racing calls to
> > vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
> > around pci_conf_{read,write} functions, and the required locking (in
> > case of using the IO ports) is already taken care in
> > pci_conf_{read,write}.
> 
> IOW you consider it acceptable for a guest (really: Dom0) read racing
> a write to read back only part of what was written (so far)? I would
> think individual multi-byte reads and writes should appear atomic to
> the guest.

We split 64bit writes into two 32bit ones without taking the lock for
the whole duration of the access, so it's already possible to see a
partially updated state as a result of a 64bit write.

I'm going over the PCI(e) spec but I don't seem to find anything about
whether the ECAM is allowed to split memory transactions into multiple
Configuration Requests, and whether those could then interleave with
requests from a different CPU.

Thanks, Roger.



* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2021-11-25 11:02 ` [PATCH v5 08/14] vpci/header: program p2m with guest BAR view Oleksandr Andrushchenko
@ 2022-01-13 10:22   ` Roger Pau Monné
  2022-02-02  8:23     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 10:22 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:45PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Take into account guest's BAR view and program its p2m accordingly:
> gfn is guest's view of the BAR and mfn is the physical BAR value as set
> up by the PCI bus driver in the hardware domain.
> This way hardware domain sees physical BAR values and guest sees
> emulated ones.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - moved start_{gfn|mfn} calculation into map_range
> - pass vpci_bar in the map_data instead of start_{gfn|mfn}
> - s/guest_addr/guest_reg
> Since v3:
> - updated comment (Roger)
> - removed gfn_add(map->start_gfn, rc); which is wrong
> - use v->domain instead of v->vpci.pdev->domain
> - removed odd e.g. in comment
> - s/d%d/%pd in altered code
> - use gdprintk for map/unmap logs
> Since v2:
> - improve readability for data.start_gfn and restructure ?: construct
> Since v1:
>  - s/MSI/MSI-X in comments
> 
> ---
> ---
>  xen/drivers/vpci/header.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index cc49aa68886f..b0499d32c5d8 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -30,6 +30,7 @@
>  
>  struct map_data {
>      struct domain *d;
> +    const struct vpci_bar *bar;
>      bool map;
>  };
>  
> @@ -41,8 +42,25 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>  
>      for ( ; ; )
>      {
> +        /* Start address of the BAR as seen by the guest. */
> +        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
> +                                        ? map->bar->addr
> +                                        : map->bar->guest_reg));
> +        /* Physical start address of the BAR. */
> +        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
>          unsigned long size = e - s + 1;
>  
> +        /*
> +         * Ranges to be mapped don't always start at the BAR start address, as
> +         * there can be holes or partially consumed ranges. Account for the
> +         * offset of the current address from the BAR start.
> +         */
> +        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));

When doing guests mappings the rangeset should represent the guest
physical memory space, not the host one. So that collisions in the
guest p2m can be avoided. Also a guest should be allowed to map the
same mfn into multiple gfn. For example multiple BARs could share the
same physical page on the host and the guest might like to map them at
different pages in its physmap.
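
As a sketch of what that implies for map_range: with the rangeset holding
guest frame numbers, the host frame is derived from the offset into the
BAR. The helper below is a hypothetical standalone model using plain
integers rather than gfn_t/mfn_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a guest frame inside a BAR to the matching host frame,
 * given the guest and host frame numbers of the BAR start.
 */
static uint64_t toy_gfn_to_mfn(uint64_t gfn, uint64_t bar_start_gfn,
                               uint64_t bar_start_mfn)
{
    assert(gfn >= bar_start_gfn);

    return bar_start_mfn + (gfn - bar_start_gfn);
}
```

Two guest BARs placed at different gfns can then both resolve to the same
mfn when they share a page on the host, which a host-keyed rangeset could
not express.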

> +
> +        gdprintk(XENLOG_G_DEBUG,
> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
> +                 map->d);

That's too chatty IMO. I could be fine with printing something along
these lines from modify_bars, but not here because this function can be
preempted and called multiple times.

>          /*
>           * ARM TODOs:
>           * - On ARM whether the memory is prefetchable or not should be passed
> @@ -52,8 +70,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>           * - {un}map_mmio_regions doesn't support preemption.
>           */
>  
> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
> +        rc = map->map ? map_mmio_regions(map->d, start_gfn,
> +                                         size, _mfn(s))
> +                      : unmap_mmio_regions(map->d, start_gfn,
> +                                           size, _mfn(s));
>          if ( rc == 0 )
>          {
>              *c += size;
> @@ -62,8 +82,8 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>          if ( rc < 0 )
>          {
>              printk(XENLOG_G_WARNING
> -                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
> -                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
> +                   "Failed to identity %smap [%lx, %lx] for %pd: %d\n",
> +                   map->map ? "" : "un", s, e, map->d, rc);

You need to adjust the message here, as this is no longer an identity
map for domUs.

Thanks, Roger.


* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2021-11-25 11:02 ` [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests Oleksandr Andrushchenko
@ 2022-01-13 10:50   ` Roger Pau Monné
  2022-02-02 12:49     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 10:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Add basic emulation support for guests. At the moment only emulate
> PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left
> as TODO.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v3:
> - gate more code on CONFIG_HAS_MSI
> - removed logic for the case when MSI/MSI-X not enabled
> ---
>  xen/drivers/vpci/header.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index b0499d32c5d8..2e44055946b0 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>          pci_conf_write16(pdev->sbdf, reg, cmd);
>  }
>  
> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> +                            uint32_t cmd, void *data)
> +{
> +    /* TODO: Add proper emulation for all bits of the command register. */
> +
> +#ifdef CONFIG_HAS_PCI_MSI
> +    if ( pdev->vpci->msi->enabled )

You need to check for MSI-X also, pdev->vpci->msix->enabled.

> +    {
> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
> +        cmd |= PCI_COMMAND_INTX_DISABLE;

You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
command register when attempting to enable MSI or MSIX capabilities.
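
A minimal sketch of the check covering both capabilities, assuming a
simplified state struct (hypothetical, the real state lives behind
pdev->vpci, and a real version would presumably also have to cope with
devices lacking one of the capabilities):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 10 of the PCI command register. */
#define SKETCH_PCI_COMMAND_INTX_DISABLE 0x400

/* Hypothetical, simplified interrupt state of a passed through device. */
struct intr_state {
    bool msi_enabled;
    bool msix_enabled;
};

/* Return the command value that may be written on behalf of the guest. */
static uint16_t guest_cmd_filter(const struct intr_state *st, uint16_t cmd)
{
    /* INTx must stay disabled while either MSI or MSI-X is enabled. */
    if ( st->msi_enabled || st->msix_enabled )
        cmd |= SKETCH_PCI_COMMAND_INTX_DISABLE;

    return cmd;
}
```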

Thanks, Roger.


* Re: [PATCH v5 10/14] vpci/header: reset the command register when adding devices
  2021-11-25 11:02 ` [PATCH v5 10/14] vpci/header: reset the command register when adding devices Oleksandr Andrushchenko
@ 2022-01-13 11:07   ` Roger Pau Monné
  2022-02-02 12:58     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 11:07 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:47PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Reset the command register when passing through a PCI device:
> it is possible that when passing through a PCI device its memory
> decoding bits in the command register are already set. Thus, a
> guest OS may not write to the command register to update memory
> decoding, so guest mappings (guest's view of the BARs) are
> left not updated.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v1:
>  - do not write 0 to the command register, but respect host settings.

There's not much respect of host settings here, as you are basically
writing 0 except for the INTX_DISABLE which will be set if MSI(X) is
enabled.

I wonder whether you really need this anyway. I would expect that a
device that's being assigned to a guest has just been reset globally,
so there should be no need to reset the command register explicitly.

Thanks, Roger.


* Re: [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology
  2021-11-25 11:02 ` [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology Oleksandr Andrushchenko
  2022-01-12 15:39   ` Jan Beulich
@ 2022-01-13 11:35   ` Roger Pau Monné
  2022-02-02 13:17     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 11:35 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:48PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Please note, that at the moment only function 0 of a multifunction
> device can be passed through.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - moved and re-worked guest sbdf initializers
> - s/set_bit/__set_bit
> - s/clear_bit/__clear_bit
> - minor comment fix s/Virtual/Guest/
> - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
>   later for counting the number of MMIO handlers required for a guest
>   (Julien)
> Since v3:
>  - make use of VPCI_INIT
>  - moved all new code to vpci.c which belongs to it
>  - changed open-coded 31 to PCI_SLOT(~0)
>  - added comments and code to reject multifunction devices with
>    functions other than 0
>  - updated comment about vpci_dev_next and made it unsigned int
>  - implement roll back in case of error while assigning/deassigning devices
>  - s/dom%pd/%pd
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>     functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/drivers/vpci/vpci.c | 51 +++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/sched.h |  8 +++++++
>  xen/include/xen/vpci.h  | 11 +++++++++
>  3 files changed, 70 insertions(+)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 98b12a61be6f..c2fb4d4db233 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -114,6 +114,9 @@ int vpci_add_handlers(struct pci_dev *pdev)
>      spin_lock(&pdev->vpci_lock);
>      pdev->vpci = vpci;
>      INIT_LIST_HEAD(&pdev->vpci->handlers);
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    pdev->vpci->guest_sbdf.sbdf = ~0;
> +#endif
>  
>      header = &pdev->vpci->header;
>      for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> @@ -145,6 +148,53 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  }
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +int vpci_add_virtual_device(struct pci_dev *pdev)
> +{
> +    struct domain *d = pdev->domain;
> +    pci_sbdf_t sbdf = { 0 };
> +    unsigned long new_dev_number;

I think this needs to be limited to non-hardware domains?

Or else you will report failures for the hardware domain even if it's
not using the virtual topology at all.

> +    /*
> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
> +     * there are multi-function ones which are not yet supported.
> +     */
> +    if ( pdev->info.is_extfn )
> +    {
> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> +                 &pdev->sbdf);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    new_dev_number = find_first_zero_bit(&d->vpci_dev_assigned_map,
> +                                         VPCI_MAX_VIRT_DEV);
> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
> +        return -ENOSPC;
> +
> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);

How is vpci_dev_assigned_map protected from concurrent accesses? Does
it rely on the pcidevs lock being held while accessing it?

If so it needs spelling out (and likely an assert added).

> +    /*
> +     * Both segment and bus number are 0:
> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
> +     *  - with bus 0 the virtual devices are seen as embedded
> +     *    endpoints behind the root complex
> +     *
> +     * TODO: add support for multi-function devices.
> +     */
> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
> +    pdev->vpci->guest_sbdf = sbdf;
> +
> +    return 0;
> +
> +}
> +REGISTER_VPCI_INIT(vpci_add_virtual_device, VPCI_PRIORITY_MIDDLE);

I'm unsure this is the right place to do virtual SBDF assignment; my
plan was to use REGISTER_VPCI_INIT exclusively with PCI capabilities.

I think it would be better to do the virtual SBDF assignment from
vpci_assign_device.

> +
> +static void vpci_remove_virtual_device(struct domain *d,
> +                                       const struct pci_dev *pdev)
> +{
> +    __clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
> +    pdev->vpci->guest_sbdf.sbdf = ~0;
> +}
> +
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>  {
> @@ -171,6 +221,7 @@ int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
>          return 0;
>  
>      spin_lock(&pdev->vpci_lock);
> +    vpci_remove_virtual_device(d, pdev);
>      vpci_remove_device_handlers_locked(pdev);
>      spin_unlock(&pdev->vpci_lock);
>  
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 28146ee404e6..10bff103317c 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -444,6 +444,14 @@ struct domain
>  
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * The bitmap which shows which device numbers are already used by the
> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
> +     * next passed through virtual PCI device.
> +     */
> +    unsigned long vpci_dev_assigned_map;

Please use DECLARE_BITMAP with the maximum number of supported
devices as parameter.
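
A minimal sketch of a slot bitmap sized from the number of supported
devices rather than a bare unsigned long. The macros and names below
are local stand-ins written for this example (not the Xen bitmap
helpers), and locking is deliberately left out:

```c
#include <limits.h>

#define SKETCH_MAX_VIRT_DEV 32 /* slots on a single PCI bus */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_WORDS(n) \
    (((n) + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Hypothetical per-domain state tracking which virtual slots are in use. */
struct virt_bus {
    unsigned long assigned[SKETCH_BITMAP_WORDS(SKETCH_MAX_VIRT_DEV)];
};

/* Claim the lowest free slot; returns the slot number or -1 if full. */
static int virt_bus_alloc_slot(struct virt_bus *bus)
{
    unsigned int slot;

    for ( slot = 0; slot < SKETCH_MAX_VIRT_DEV; slot++ )
    {
        unsigned long *word = &bus->assigned[slot / SKETCH_BITS_PER_WORD];
        unsigned long mask = 1UL << (slot % SKETCH_BITS_PER_WORD);

        if ( !(*word & mask) )
        {
            *word |= mask;
            return (int)slot;
        }
    }

    return -1;
}

/* Release a previously claimed slot. */
static void virt_bus_free_slot(struct virt_bus *bus, unsigned int slot)
{
    bus->assigned[slot / SKETCH_BITS_PER_WORD] &=
        ~(1UL << (slot % SKETCH_BITS_PER_WORD));
}
```

Sizing the array from the device limit keeps the code correct if the
limit ever grows past the width of one word.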

> +#endif
>  #endif
>  
>  #ifdef CONFIG_HAS_PASSTHROUGH
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 18319fc329f9..e5258bd7ce90 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>  
>  #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>  
> +/*
> + * Maximum number of devices supported by the virtual bus topology:
> + * each PCI bus supports 32 devices/slots at max or up to 256 when
> + * there are multi-function ones which are not yet supported.
> + */
> +#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
> +
>  #define REGISTER_VPCI_INIT(x, p)                \
>    static vpci_register_init_t *const x##_entry  \
>                 __used_section(".data.vpci." p) = x
> @@ -143,6 +150,10 @@ struct vpci {
>              struct vpci_arch_msix_entry arch;
>          } entries[];
>      } *msix;
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /* Guest SBDF of the device. */
> +    pci_sbdf_t guest_sbdf;
> +#endif
>  #endif
>  };
>  
> -- 
> 2.25.1
> 


* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2021-11-25 11:02 ` [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign Oleksandr Andrushchenko
  2022-01-12 12:12   ` Roger Pau Monné
@ 2022-01-13 11:40   ` Roger Pau Monné
  2022-01-31  8:45     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 11:40 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
> +{
> +    int rc;
> +
> +    /* It only makes sense to assign for hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    spin_lock(&pdev->vpci_lock);
> +    rc = run_vpci_init(pdev);

Following my comment below, this will likely need to call
vpci_add_handlers in order to allocate the pdev->vpci field.

It's not OK to carry the contents of pdev->vpci across domain
assignments, as the device should be reset, and thus the content of
pdev->vpci would be stale.

> +    spin_unlock(&pdev->vpci_lock);
> +    if ( rc )
> +        vpci_deassign_device(d, pdev);
> +
> +    return rc;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
> +{
> +    /* It only makes sense to de-assign from hwdom or guest domain. */
> +    if ( is_system_domain(d) || !has_vpci(d) )
> +        return 0;
> +
> +    spin_lock(&pdev->vpci_lock);
> +    vpci_remove_device_handlers_locked(pdev);

You need to free the pdev->vpci structure on deassign. I would expect
the device to be reset on deassign, so keeping the pdev->vpci contents
would be wrong.
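
A minimal sketch of the lifecycle this implies, assuming the per-device
state is allocated fresh on assign and freed on deassign. All names are
hypothetical stand-ins, and locking and handler setup are omitted:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the cached per-device vPCI state. */
struct vpci_state_sketch {
    unsigned int cached_cmd;
};

struct dev_sketch {
    struct vpci_state_sketch *vpci;
};

/* Drop all cached state so nothing stale survives a device reset. */
static void deassign_sketch(struct dev_sketch *dev)
{
    free(dev->vpci);
    dev->vpci = NULL;
}

/* Start every assignment from zeroed state; returns 0 or -1 on failure. */
static int assign_sketch(struct dev_sketch *dev)
{
    deassign_sketch(dev);
    dev->vpci = calloc(1, sizeof(*dev->vpci));

    return dev->vpci ? 0 : -1;
}
```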

Thanks, Roger.


* Re: [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests
  2021-11-25 11:02 ` [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests Oleksandr Andrushchenko
@ 2022-01-13 12:18   ` Roger Pau Monné
  2022-02-02 13:58     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 12:18 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:49PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> There are three  originators for the PCI configuration space access:
> 1. The domain that owns physical host bridge: MMIO handlers are
> there so we can update vPCI register handlers with the values
> written by the hardware domain, e.g. physical view of the registers
> vs guest's view on the configuration space.
> 2. Guest access to the passed through PCI devices: we need to properly
> map virtual bus topology to the physical one, e.g. pass the configuration
> space access to the corresponding physical devices.
> 3. Emulated host PCI bridge access. It doesn't exist in the physical
> topology, e.g. it can't be mapped to some physical host bridge.
> So, all access to the host bridge itself needs to be trapped and
> emulated.

I'm kind of lost in this commit message. You are just adding a
translate function in order for domUs to translate from virtual SBDF
to the physical SBDF of the device. I realize you do that based on
whether 'bridge' is set or not, so I assume this is just a way to
signal whether the domain is a hardware domain or not. I.e.:
!!bridge == is_hardware_domain(v->domain).

> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Since v4:
> - indentation fixes
> - constify struct domain
> - updated commit message
> - updates to the new locking scheme (pdev->vpci_lock)
> Since v3:
> - revisit locking
> - move code to vpci.c
> Since v2:
>  - pass struct domain instead of struct vcpu
>  - constify arguments where possible
>  - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/arch/arm/vpci.c     | 18 ++++++++++++++++++
>  xen/drivers/vpci/vpci.c | 27 +++++++++++++++++++++++++++
>  xen/include/xen/vpci.h  |  1 +
>  3 files changed, 46 insertions(+)
> 
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 8e801f275879..3d134f42d07e 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -41,6 +41,15 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>      /* data is needed to prevent a pointer cast on 32bit */
>      unsigned long data;
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
> +        return 1;

I'm unsure what returning 1 implies for Arm here, but you likely need
to set '*r = ~0ul;'.
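
A minimal sketch of the read path with that change, assuming a return
value of 1 means the access was handled. The function pointer and the
names are hypothetical and only stand in for the real translate helper
and config space read:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical translator: rewrites *sbdf and returns true on a match. */
typedef bool (*translate_fn)(uint32_t *sbdf);

/* Stand-in for a guest with no device at the requested virtual SBDF. */
static bool translate_none(uint32_t *sbdf)
{
    (void)sbdf;
    return false;
}

static int mmio_read_sketch(bool is_hwdom, translate_fn translate,
                            uint32_t sbdf, unsigned long *r)
{
    if ( !is_hwdom && !translate(&sbdf) )
    {
        /* No such device: reads from absent PCI devices return all ones. */
        *r = ~0UL;
        return 1;
    }

    /* Placeholder for the real config space read of 'sbdf'. */
    *r = 0;
    return 1;
}
```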

> +#endif
> +
>      if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
>                          1U << info->dabt.size, &data) )
>      {
> @@ -59,6 +68,15 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>      struct pci_host_bridge *bridge = p;
>      pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +    /*
> +     * For the passed through devices we need to map their virtual SBDF
> +     * to the physical PCI device being passed through.
> +     */
> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
> +        return 1;
> +#endif
> +
>      return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
>                             1U << info->dabt.size, r);
>  }
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index c2fb4d4db233..bdc8c63f73fa 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -195,6 +195,33 @@ static void vpci_remove_virtual_device(struct domain *d,
>      pdev->vpci->guest_sbdf.sbdf = ~0;
>  }
>  
> +/*
> + * Find the physical device which is mapped to the virtual device
> + * and translate virtual SBDF to the physical one.
> + */
> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
> +{
> +    struct pci_dev *pdev;
> +

I would add:

ASSERT(!is_hardware_domain(d));

To make sure this is not used for the hardware domain.

> +    for_each_pdev( d, pdev )
> +    {
> +        bool found;
> +
> +        spin_lock(&pdev->vpci_lock);
> +        found = pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf);
> +        spin_unlock(&pdev->vpci_lock);
> +
> +        if ( found )
> +        {
> +            /* Replace guest SBDF with the physical one. */
> +            *sbdf = pdev->sbdf;
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  /* Notify vPCI that device is assigned to guest. */
>  int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>  {
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index e5258bd7ce90..21d76929391f 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -280,6 +280,7 @@ static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
>  /* Notify vPCI that device is assigned/de-assigned to/from guest. */
>  int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
>  int vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
>  #else
>  static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>  {

If you add a dummy vpci_translate_virtual_device helper that returns
false unconditionally here you could drop the #ifdefs in arm/vpci.c
AFAICT.
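
A minimal sketch of the stub pattern, with made-up type and function
names so it compiles on its own. Callers use the helper unconditionally
and the header picks the real prototype or the dummy:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct domain and pci_sbdf_t. */
struct domain_sketch;
typedef struct { uint32_t sbdf; } sbdf_sketch_t;

#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
/* Real implementation lives elsewhere when guest support is built in. */
bool translate_virtual_device_sketch(const struct domain_sketch *d,
                                     sbdf_sketch_t *sbdf);
#else
/* Dummy helper: no virtual topology, so nothing ever translates. */
static inline bool translate_virtual_device_sketch(
    const struct domain_sketch *d, sbdf_sketch_t *sbdf)
{
    (void)d;
    (void)sbdf;
    return false;
}
#endif

/* Caller without any #ifdef: true if the access should be forwarded. */
static bool should_forward_sketch(const struct domain_sketch *d,
                                  bool have_bridge, sbdf_sketch_t *sbdf)
{
    return have_bridge || translate_virtual_device_sketch(d, sbdf);
}
```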

Thanks, Roger.


* Re: [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X
  2021-11-25 11:02 ` [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X Oleksandr Andrushchenko
@ 2022-01-13 13:23   ` Roger Pau Monné
  2022-02-02 14:08     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 13:23 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Thu, Nov 25, 2021 at 01:02:50PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> At the moment, we always allocate an extra 16 slots for IO handlers
> (see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
> MSI-X registers we need to explicitly tell that we have additional IO
> handlers, so those are accounted.
> 
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

LGTM, just one comment below. This will require an Ack from the Arm
guys.

> ---
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> This actually moved here from the part 2 of the prep work for PCI
> passthrough on Arm as it seems to be the proper place for it.
> 
> New in v5
> ---
>  xen/arch/arm/vpci.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 3d134f42d07e..902f8491e030 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -134,6 +134,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
>  
>  unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
>  {
> +    unsigned int count;
> +
>      if ( !has_vpci(d) )
>          return 0;
>  
> @@ -145,7 +147,18 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
>      }
>  
>      /* For a single emulated host bridge's configuration space. */
> -    return 1;
> +    count = 1;
> +
> +#ifdef CONFIG_HAS_PCI_MSI
> +    /*
> +     * There's a single MSI-X MMIO handler that deals with both PBA
> +     * and MSI-X tables per each PCI device being passed through.
> +     * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
> +     */
> +    count += VPCI_MAX_VIRT_DEV;

You could also use IS_ENABLED(CONFIG_HAS_PCI_MSI) since
VPCI_MAX_VIRT_DEV is defined unconditionally.
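
A minimal sketch of the same accounting without the preprocessor block.
The SKETCH_IS_ENABLED macro below is a local stand-in modelled on the
usual Kconfig trick so the example builds on its own, and the config
define merely pretends the option is set:

```c
/* Pretend Kconfig enabled the option for this example. */
#define CONFIG_HAS_PCI_MSI 1

/* Local stand-in for IS_ENABLED(): 1 if the option is defined to 1. */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND_ARG(ignored, val, ...) val
#define SKETCH_IS_DEFINED_3(arg1_or_junk) \
    SKETCH_TAKE_SECOND_ARG(arg1_or_junk 1, 0)
#define SKETCH_IS_DEFINED_2(val) \
    SKETCH_IS_DEFINED_3(SKETCH_ARG_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) SKETCH_IS_DEFINED_2(option)

#define SKETCH_MAX_VIRT_DEV 32 /* defined regardless of MSI support */

static unsigned int num_mmio_handlers_sketch(void)
{
    /* One handler for the emulated host bridge's configuration space. */
    unsigned int count = 1;

    /* One MSI-X handler per possible virtual device, if MSI is built in. */
    if ( SKETCH_IS_ENABLED(CONFIG_HAS_PCI_MSI) )
        count += SKETCH_MAX_VIRT_DEV;

    return count;
}
```

The condition is a compile-time constant, so the compiler can drop the
dead branch while still type-checking both sides.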

Thanks, Roger.


* Re: [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2021-11-25 11:17   ` Jan Beulich
  2021-11-25 11:20     ` Oleksandr Andrushchenko
@ 2022-01-13 13:27     ` Roger Pau Monné
  2022-01-13 13:38       ` Jan Beulich
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-13 13:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, xen-devel

On Thu, Nov 25, 2021 at 12:17:32PM +0100, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> > From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> > 
> > For unprivileged guests vpci_{read|write} need to be re-worked
> > to not passthrough accesses to the registers not explicitly handled
> > by the corresponding vPCI handlers: without fixing that passthrough
> > to guests is completely unsafe as Xen allows them full access to
> > the registers.
> > 
> > Xen needs to be sure that every register a guest accesses is not
> > going to cause the system to malfunction, so Xen needs to keep a
> > list of the registers it is safe for a guest to access.
> > 
> > For example, we should only expose the PCI capabilities that we know
> > are safe for a guest to use, i.e.: MSI and MSI-X initially.
> > The rest of the capabilities should be blocked from guest access,
> > unless we audit them and declare safe for a guest to access.
> > 
> > As a reference we might want to look at the approach currently used
> > by QEMU in order to do PCI passthrough. A very limited set of PCI
> > capabilities known to be safe for untrusted access are exposed to the
> > guest and registers need to be explicitly handled or else access is
> > rejected. Xen needs a fairly similar model in vPCI or else none of
> > this will be safe for unprivileged access.
> > 
> > Add the corresponding TODO comment to highlight there is a problem that
> > needs to be fixed.
> > 
> > Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> > Suggested-by: Jan Beulich <jbeulich@suse.com>
> > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Looks okay to me in principle, but imo needs to come earlier in the
> series, before things actually get exposed to DomU-s.

Are domUs really allowed to use this code? Maybe it's done in a
separate series, but has_vpci is hardcoded to false on Arm, and
X86_EMU_VPCI can only be set for the hardware domain on x86.

Roger.


* Re: [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2022-01-13 13:27     ` Roger Pau Monné
@ 2022-01-13 13:38       ` Jan Beulich
  2022-01-28 13:03         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-13 13:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Oleksandr Andrushchenko, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, Artem_Mygaiev,
	andrew.cooper3, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko, xen-devel

On 13.01.2022 14:27, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 12:17:32PM +0100, Jan Beulich wrote:
>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> For unprivileged guests vpci_{read|write} need to be re-worked
>>> to not passthrough accesses to the registers not explicitly handled
>>> by the corresponding vPCI handlers: without fixing that passthrough
>>> to guests is completely unsafe as Xen allows them full access to
>>> the registers.
>>>
>>> Xen needs to be sure that every register a guest accesses is not
>>> going to cause the system to malfunction, so Xen needs to keep a
>>> list of the registers it is safe for a guest to access.
>>>
>>> For example, we should only expose the PCI capabilities that we know
>>> are safe for a guest to use, i.e.: MSI and MSI-X initially.
>>> The rest of the capabilities should be blocked from guest access,
>>> unless we audit them and declare safe for a guest to access.
>>>
>>> As a reference we might want to look at the approach currently used
>>> by QEMU in order to do PCI passthrough. A very limited set of PCI
>>> capabilities known to be safe for untrusted access are exposed to the
>>> guest and registers need to be explicitly handled or else access is
>>> rejected. Xen needs a fairly similar model in vPCI or else none of
>>> this will be safe for unprivileged access.
>>>
>>> Add the corresponding TODO comment to highlight there is a problem that
>>> needs to be fixed.
>>>
>>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Looks okay to me in principle, but imo needs to come earlier in the
>> series, before things actually get exposed to DomU-s.
> 
> Are domUs really allowed to use this code? Maybe it's done in a
> separate series, but has_vpci is hardcoded to false on Arm, and
> X86_EMU_VPCI can only be set for the hardware domain on x86.

I'm not sure either. This series gives the impression of exposing things,
but I admit I didn't pay attention to has_vpci() being hardcoded on Arm.
Then again there were at least 3 series in parallel originally, with
interdependencies (iirc) not properly spelled out ...

Jan



* Re: [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2021-12-11  8:57       ` Oleksandr Andrushchenko
@ 2022-01-26  8:31         ` Oleksandr Andrushchenko
  2022-01-26 10:54           ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-26  8:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, jbeulich, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Julien Grall

Hi, Roger!

On 11.12.21 10:57, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 11.12.21 10:20, Roger Pau Monné wrote:
>> On Fri, Dec 10, 2021 at 05:55:03PM +0000, Julien Grall wrote:
>>> Hi Oleksandr,
>>>
>>> On 25/11/2021 11:02, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> vpci_process_pending is defined with different attributes, e.g.
>>>> with __must_check if CONFIG_HAS_VPCI enabled and not otherwise.
>>>> Fix this by defining both of the definitions with __must_check.
>>>>
>>>> Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary")
>>>>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> I think this can be committed independently of the rest of the
>> series?
> I think so
Could you please commit this one, so I don't have to keep it in the v6 of the series?

Thank you in advance,
Oleksandr
>> Thanks, Roger.
> Thank you,
> Oleksandr

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-12 14:42     ` Jan Beulich
@ 2022-01-26  8:40       ` Oleksandr Andrushchenko
  2022-01-26 11:13         ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-26  8:40 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Ian Jackson,
	Oleksandr Andrushchenko

Hello, Roger, Jan!

On 12.01.22 16:42, Jan Beulich wrote:
> On 11.01.2022 16:17, Roger Pau Monné wrote:
>> On Thu, Nov 25, 2021 at 01:02:40PM +0200, Oleksandr Andrushchenko wrote:
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index 657697fe3406..ceaac4516ff8 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>>   extern vpci_register_init_t *const __end_vpci_array[];
>>>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>>   
>>> -void vpci_remove_device(struct pci_dev *pdev)
>>> +static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>>>   {
>>> -    if ( !has_vpci(pdev->domain) )
>>> -        return;
>>> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
>>>   
>>> -    spin_lock(&pdev->vpci->lock);
>>>       while ( !list_empty(&pdev->vpci->handlers) )
>>>       {
>>>           struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
>>> @@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
>>>           list_del(&r->node);
>>>           xfree(r);
>>>       }
>>> -    spin_unlock(&pdev->vpci->lock);
>>> +}
>>> +
>>> +void vpci_remove_device_locked(struct pci_dev *pdev)
>> I think this could be static instead, as it's only used by
>> vpci_remove_device and vpci_add_handlers which are local to the
>> file.
This is going to be used outside this file later on, while processing pending mappings,
so I think it is not worth defining it static here and then removing the static
keyword later on: please see [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal [1]
> Does the splitting out of vpci_remove_device_handlers_locked() belong in
> this patch in the first place? There's no second caller being added, so
> this looks to be an orthogonal adjustment.
I think of it as preparation for the upcoming code: although the reason for the
change might not be immediately apparent in this patch, it is still in line with what
happens next.
So, I would prefer to keep the change as is: the whole series should probably
be committed as a single piece of work anyway, so it won't look inconsistent then

Thank you,
Oleksandr
>
> Jan
>

[1] https://patchwork.kernel.org/project/xen-devel/patch/20211125110251.2877218-5-andr2000@gmail.com/

* Re: [PATCH v5 02/14] vpci: fix function attributes for vpci_process_pending
  2022-01-26  8:31         ` Oleksandr Andrushchenko
@ 2022-01-26 10:54           ` Jan Beulich
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Beulich @ 2022-01-26 10:54 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Julien Grall, Roger Pau Monné

On 26.01.2022 09:31, Oleksandr Andrushchenko wrote:
> On 11.12.21 10:57, Oleksandr Andrushchenko wrote:
>> On 11.12.21 10:20, Roger Pau Monné wrote:
>>> I think this can be committed independently of the rest of the
>>> series?
>> I think so
> Could you please commit this one, so I don't have to keep it in the v6 of the series?

Did you actually check before asking? See commit 7dc0233f534f from Dec 14th.

Jan



* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-26  8:40       ` Oleksandr Andrushchenko
@ 2022-01-26 11:13         ` Roger Pau Monné
  2022-01-31  7:41           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-26 11:13 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Ian Jackson

On Wed, Jan 26, 2022 at 08:40:09AM +0000, Oleksandr Andrushchenko wrote:
> Hello, Roger, Jan!
> 
> On 12.01.22 16:42, Jan Beulich wrote:
> > On 11.01.2022 16:17, Roger Pau Monné wrote:
> >> On Thu, Nov 25, 2021 at 01:02:40PM +0200, Oleksandr Andrushchenko wrote:
> >>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> >>> index 657697fe3406..ceaac4516ff8 100644
> >>> --- a/xen/drivers/vpci/vpci.c
> >>> +++ b/xen/drivers/vpci/vpci.c
> >>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
> >>>   extern vpci_register_init_t *const __end_vpci_array[];
> >>>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
> >>>   
> >>> -void vpci_remove_device(struct pci_dev *pdev)
> >>> +static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
> >>>   {
> >>> -    if ( !has_vpci(pdev->domain) )
> >>> -        return;
> >>> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
> >>>   
> >>> -    spin_lock(&pdev->vpci->lock);
> >>>       while ( !list_empty(&pdev->vpci->handlers) )
> >>>       {
> >>>           struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
> >>> @@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
> >>>           list_del(&r->node);
> >>>           xfree(r);
> >>>       }
> >>> -    spin_unlock(&pdev->vpci->lock);
> >>> +}
> >>> +
> >>> +void vpci_remove_device_locked(struct pci_dev *pdev)
> >> I think this could be static instead, as it's only used by
> >> vpci_remove_device and vpci_add_handlers which are local to the
> >> file.
> This is going to be used outside later on while processing pending mappings,
> so I think it is not worth it defining it static here and then removing the static
> key word later on: please see [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal [1]

I have some comments there also, which might change the approach
you are using.

> > Does the splitting out of vpci_remove_device_handlers_locked() belong in
> > this patch in the first place? There's no second caller being added, so
> > this looks to be an orthogonal adjustment.
> I think of it as a preparation for the upcoming code: although the reason for the
> change might not be immediately seen in this patch it is still in line with what
> happens next.

Right - it's generally best if the change is done together as the new
callers are added. Otherwise it's hard to understand why certain changes
are made, and you will likely get asked the same question on next
rounds.

It's also possible that the code that requires this is changed in
further iterations so there's no longer a need for the splitting.

Thanks, Roger.


* Re: [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal
  2022-01-12 15:27   ` Jan Beulich
@ 2022-01-28 12:21     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-28 12:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko



On 12.01.22 17:27, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> When a vPCI is removed for a PCI device it is possible that we have
>> scheduled a delayed work for map/unmap operations for that device.
>> For example, the following scenario can illustrate the problem:
>>
>> pci_physdev_op
>>     pci_add_device
>>         init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>>     iommu_add_device <- FAILS
>>     vpci_remove_device -> xfree(pdev->vpci)
>>
>> leave_hypervisor_to_guest
>>     vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>
>> For the hardware domain we continue execution as the worst that
>> could happen is that MMIO mappings are left in place when the
>> device has been deassigned.
>>
>> For unprivileged domains that get a failure in the middle of a vPCI
>> {un}map operation we need to destroy them, as we don't know in which
>> state the p2m is. This can only happen in vpci_process_pending for
>> DomUs as they won't be allowed to call pci_add_device.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Cc: Roger Pau Monné <roger.pau@citrix.com>
>> ---
>> Since v4:
>>   - crash guest domain if map/unmap operation didn't succeed
>>   - re-work vpci cancel work to cancel work on all vCPUs
>>   - use new locking scheme with pdev->vpci_lock
>> New in v4
>>
>> Fixes: 86dbcf6e30cb ("vpci: cancel pending map/unmap on vpci removal")
> What is this about?
Just a leftover after squashing WIP patches, sorry
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled
  2022-01-13 13:38       ` Jan Beulich
@ 2022-01-28 13:03         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-28 13:03 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hello, Roger, Jan!

On 13.01.22 15:38, Jan Beulich wrote:
> On 13.01.2022 14:27, Roger Pau Monné wrote:
>> On Thu, Nov 25, 2021 at 12:17:32PM +0100, Jan Beulich wrote:
>>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> For unprivileged guests vpci_{read|write} need to be re-worked
>>>> to not passthrough accesses to the registers not explicitly handled
>>>> by the corresponding vPCI handlers: without fixing that passthrough
>>>> to guests is completely unsafe as Xen allows them full access to
>>>> the registers.
>>>>
>>>> Xen needs to be sure that every register a guest accesses is not
>>>> going to cause the system to malfunction, so Xen needs to keep a
>>>> list of the registers it is safe for a guest to access.
>>>>
>>>> For example, we should only expose the PCI capabilities that we know
>>>> are safe for a guest to use, i.e.: MSI and MSI-X initially.
>>>> The rest of the capabilities should be blocked from guest access,
>>>> unless we audit them and declare safe for a guest to access.
>>>>
>>>> As a reference we might want to look at the approach currently used
>>>> by QEMU in order to do PCI passthrough. A very limited set of PCI
>>>> capabilities known to be safe for untrusted access are exposed to the
>>>> guest and registers need to be explicitly handled or else access is
>>>> rejected. Xen needs a fairly similar model in vPCI or else none of
>>>> this will be safe for unprivileged access.
>>>>
>>>> Add the corresponding TODO comment to highlight there is a problem that
>>>> needs to be fixed.
>>>>
>>>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> Looks okay to me in principle, but imo needs to come earlier in the
>>> series, before things actually get exposed to DomU-s.
>> Are domUs really allowed to use this code? Maybe it's done in a
>> separate series, but has_vpci is hardcoded to false on Arm, and
>> X86_EMU_VPCI can only be set for the hardware domain on x86.
That is intentional: we do not want to have this enabled on Arm until
it can really be used...
> I'm not sure either. This series gives the impression of exposing things,
> but I admit I didn't pay attention to has_vpci() being hardcoded on Arm.
...so we enable vPCI on Arm right after we are all set
> Then again there were at least 3 series in parallel originally, with
> interdependencies (iirc) not properly spelled out ...
Sorry about that, we should have said that explicitly
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-12 14:57   ` Jan Beulich
  2022-01-12 15:42     ` Roger Pau Monné
@ 2022-01-28 14:12     ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-28 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Jan!

On 12.01.22 16:57, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>> @@ -68,12 +84,13 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       /* We should not get here twice for the same device. */
>>       ASSERT(!pdev->vpci);
>>   
>> -    pdev->vpci = xzalloc(struct vpci);
>> -    if ( !pdev->vpci )
>> +    vpci = xzalloc(struct vpci);
>> +    if ( !vpci )
>>           return -ENOMEM;
>>   
>> +    spin_lock(&pdev->vpci_lock);
>> +    pdev->vpci = vpci;
>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>> -    spin_lock_init(&pdev->vpci->lock);
> INIT_LIST_HEAD() can occur ahead of taking the lock, and can also act
> on &vpci->handlers rather than &pdev->vpci->handlers.
Yes, I will move it, good catch
>>       for ( i = 0; i < NUM_VPCI_INIT; i++ )
>>       {
>> @@ -83,7 +100,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       }
> This loop wants to live in the locked region because you need to install
> vpci into pdev->vpci up front, afaict. I wonder whether that couldn't
> be relaxed, but perhaps that's an improvement that can be thought about
> later.
Ok, so I'll leave it as is
>
> The main reason I'm looking at this closely is because from the patch
> title I didn't expect new locking regions to be introduced right here;
> instead I did expect strictly a mechanical conversion.
>
>> @@ -152,8 +170,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>       r->offset = offset;
>>       r->private = data;
>>   
>> -    spin_lock(&vpci->lock);
>  From the description I can't deduce why this lock is fine to go away
> now, i.e. that all call sites now have the lock acquired earlier.
> Therefore I'd expect at least an assertion to be left here ...
>
>> @@ -183,7 +197,6 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>>       const struct vpci_register r = { .offset = offset, .size = size };
>>       struct vpci_register *rm;
>>   
>> -    spin_lock(&vpci->lock);
> ... and here.
Previously the lock lived in struct vpci; now it lives in struct pci_dev, which
is not visible here, so:
1. we cannot take that lock here and have to expect it to be acquired by the caller
2. we cannot add an ASSERT here, as we would need
ASSERT(spin_is_locked(&pdev->vpci_lock));
and pdev is not available here
All the callers of vpci_{add|remove}_register are REGISTER_VPCI_INIT
functions, which are called with &pdev->vpci_lock held.

So, while I agree that an ASSERT here would indeed be a good check,
adding an additional argument to the respective functions just for that
might not be a good idea IMO

I will describe this lock removal in the commit message

Thank you,
Oleksandr
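
The locking convention described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: every toy_* name is
hypothetical, a plain bool stands in for the spinlock, and none of this is
the actual Xen code. It shows a callee that only sees the vpci (and so
cannot take or assert the lock) being reached solely through callers that
do hold the lock and can assert on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); these are not the real Xen types. */
struct toy_vpci {
    unsigned int nr_handlers;   /* stands in for the handlers list */
};

struct toy_pdev {
    bool vpci_locked;           /* stands in for spinlock_t vpci_lock */
    struct toy_vpci *vpci;
};

static void toy_lock(struct toy_pdev *pdev)
{
    assert(!pdev->vpci_locked);
    pdev->vpci_locked = true;
}

static void toy_unlock(struct toy_pdev *pdev)
{
    assert(pdev->vpci_locked);
    pdev->vpci_locked = false;
}

/*
 * Only sees the vpci, much like vpci_add_register(): the lock is not
 * reachable from here, so it is neither taken nor asserted. The caller
 * is required to hold it.
 */
static int toy_add_register(struct toy_vpci *vpci)
{
    vpci->nr_handlers++;
    return 0;
}

/* Plays the role of an init function: it has pdev, so it can assert. */
static int toy_init_handler(struct toy_pdev *pdev)
{
    assert(pdev->vpci_locked);
    return toy_add_register(pdev->vpci);
}

/* Plays the role of vpci_add_handlers(): owns the locked region. */
static int toy_add_handlers(struct toy_pdev *pdev, struct toy_vpci *vpci)
{
    int rc;

    vpci->nr_handlers = 0;      /* initialised before taking the lock */
    toy_lock(pdev);
    pdev->vpci = vpci;
    rc = toy_init_handler(pdev);
    toy_unlock(pdev);

    return rc;
}
```

In this sketch the assertion sits one level up, in the function that still
has access to the device, so no extra argument is needed on the
register-level helper.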

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-13  8:58         ` Roger Pau Monné
@ 2022-01-28 14:15           ` Oleksandr Andrushchenko
  2022-01-31  8:56             ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-28 14:15 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Roger, Jan!

On 13.01.22 10:58, Roger Pau Monné wrote:
> On Wed, Jan 12, 2022 at 04:52:51PM +0100, Jan Beulich wrote:
>> On 12.01.2022 16:42, Roger Pau Monné wrote:
>>> On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
>>>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>>>>> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>>   
>>>>>           data = merge_result(data, tmp_data, size - data_offset, data_offset);
>>>>>       }
>>>>> -    spin_unlock(&pdev->vpci->lock);
>>>>>   
>>>>>       return data & (0xffffffff >> (32 - 8 * size));
>>>>>   }
>>>> Here and ...
>>>>
>>>>> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>>               break;
>>>>>           ASSERT(data_offset < size);
>>>>>       }
>>>>> +    spin_unlock(&pdev->vpci_lock);
>>>>>   
>>>>>       if ( data_offset < size )
>>>>>           /* Tailing gap, write the remaining. */
>>>>>           vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>>>                         data >> (data_offset * 8));
>>>>> -
>>>>> -    spin_unlock(&pdev->vpci->lock);
>>>>>   }
>>>> ... even more so here I'm not sure of the correctness of the moving
>>>> you do: While pdev->vpci indeed doesn't get accessed, I wonder
>>>> whether there wasn't an intention to avoid racing calls to
>>>> vpci_{read,write}_hw() this way. In any event I think such movement
>>>> would need justification in the description.
>>> I agree about the need for justification in the commit message, or
>>> even better this being split into a pre-patch, as it's not related to
>>> the lock switching done here.
>>>
>>> I do think this is fine however, as racing calls to
>>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>>> around pci_conf_{read,write} functions, and the required locking (in
>>> case of using the IO ports) is already taken care in
>>> pci_conf_{read,write}.
>> IOW you consider it acceptable for a guest (really: Dom0) read racing
>> a write to read back only part of what was written (so far)? I would
>> think individual multi-byte reads and writes should appear atomic to
>> the guest.
> We split 64bit writes into two 32bit ones without taking the lock for
> the whole duration of the access, so it's already possible to see a
> partially updated state as a result of a 64bit write.
>
> I'm going over the PCI(e) spec but I don't seem to find anything about
> whether the ECAM is allowed to split memory transactions into multiple
> Configuration Requests, and whether those could then interleave with
> requests from a different CPU.
So, with the above, is it still fine for you to have the change as is, or
do you want this optimization to go into a dedicated patch before this one?
>
> Thanks, Roger.
Thank you,
Oleksandr
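
The partially updated state discussed above can be illustrated with a small
single-threaded model. This is a sketch only: the toy_* names are
hypothetical, the register is plain memory rather than PCI config space,
and the "racing reader" is simulated by reading between the two halves of
the write instead of using a second CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 64-bit register backed by two 32-bit halves (hypothetical). */
struct toy_reg64 {
    uint32_t lo;
    uint32_t hi;
};

/* One 32-bit access; 'half' selects the low (0) or high (1) dword. */
static void toy_write32(struct toy_reg64 *r, unsigned int half, uint32_t val)
{
    if ( half )
        r->hi = val;
    else
        r->lo = val;
}

static uint64_t toy_read64(const struct toy_reg64 *r)
{
    return ((uint64_t)r->hi << 32) | r->lo;
}

/*
 * A 64-bit write performed as two 32-bit writes with no lock held across
 * both. The value a reader would see in the window between the two
 * accesses is returned, to model a read racing with the write.
 */
static uint64_t toy_write64_split_observed(struct toy_reg64 *r, uint64_t val)
{
    uint64_t seen;

    toy_write32(r, 0, (uint32_t)val);
    seen = toy_read64(r);               /* racing reader lands here */
    toy_write32(r, 1, (uint32_t)(val >> 32));

    return seen;
}
```

Starting from lo = 0x11111111, hi = 0x22222222 and writing
0xAAAAAAAABBBBBBBB, the simulated reader sees the new low half combined
with the old high half, which is the kind of torn value the question above
is about.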

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-26 11:13         ` Roger Pau Monné
@ 2022-01-31  7:41           ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  7:41 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Ian Jackson

Hi, Jan, Roger!

On 26.01.22 13:13, Roger Pau Monné wrote:
> On Wed, Jan 26, 2022 at 08:40:09AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, Roger, Jan!
>>
>> On 12.01.22 16:42, Jan Beulich wrote:
>>> On 11.01.2022 16:17, Roger Pau Monné wrote:
>>>> On Thu, Nov 25, 2021 at 01:02:40PM +0200, Oleksandr Andrushchenko wrote:
>>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>>> index 657697fe3406..ceaac4516ff8 100644
>>>>> --- a/xen/drivers/vpci/vpci.c
>>>>> +++ b/xen/drivers/vpci/vpci.c
>>>>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>>>>    extern vpci_register_init_t *const __end_vpci_array[];
>>>>>    #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>>>>    
>>>>> -void vpci_remove_device(struct pci_dev *pdev)
>>>>> +static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>>>>>    {
>>>>> -    if ( !has_vpci(pdev->domain) )
>>>>> -        return;
>>>>> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
>>>>>    
>>>>> -    spin_lock(&pdev->vpci->lock);
>>>>>        while ( !list_empty(&pdev->vpci->handlers) )
>>>>>        {
>>>>>            struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
>>>>> @@ -50,15 +48,33 @@ void vpci_remove_device(struct pci_dev *pdev)
>>>>>            list_del(&r->node);
>>>>>            xfree(r);
>>>>>        }
>>>>> -    spin_unlock(&pdev->vpci->lock);
>>>>> +}
>>>>> +
>>>>> +void vpci_remove_device_locked(struct pci_dev *pdev)
>>>> I think this could be static instead, as it's only used by
>>>> vpci_remove_device and vpci_add_handlers which are local to the
>>>> file.
>> This is going to be used outside later on while processing pending mappings,
>> so I think it is not worth it defining it static here and then removing the static
>> key word later on: please see [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal [1]
> I have some comments there also, which might change the approach
> you are using.
>
>>> Does the splitting out of vpci_remove_device_handlers_locked() belong in
>>> this patch in the first place? There's no second caller being added, so
>>> this looks to be an orthogonal adjustment.
>> I think of it as a preparation for the upcoming code: although the reason for the
>> change might not be immediately seen in this patch it is still in line with what
>> happens next.
> Right - it's generally best if the change is done together as the new
> callers are added. Otherwise it's hard to understand why certain changes
> are made, and you will likely get asked the same question on next
> rounds.
>
> It's also possible that the code that requires this is changed in
> further iterations so there's no longer a need for the splitting.
Ok, sounds reasonable
I will not split the functions without an obvious need
>
> Thanks, Roger.
Thank you,
Oleksandr

* Re: [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal
  2021-11-25 11:02 ` [PATCH v5 04/14] vpci: cancel pending map/unmap on vpci removal Oleksandr Andrushchenko
  2022-01-11 16:57   ` Roger Pau Monné
  2022-01-12 15:27   ` Jan Beulich
@ 2022-01-31  7:53   ` Oleksandr Andrushchenko
  2 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  7:53 UTC (permalink / raw)
  To: xen-devel, roger.pau
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, jbeulich, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko

I am going to postpone this patch as it needs a major re-think of the
approach to take. This is possible as it fixes a really rare use-case seen
during the development phase, so postponing it shouldn't hurt at run time

Thank you,
Oleksandr

On 25.11.21 13:02, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> When a vPCI is removed for a PCI device it is possible that we have
> scheduled a delayed work for map/unmap operations for that device.
> For example, the following scenario can illustrate the problem:
>
> pci_physdev_op
>     pci_add_device
>         init_bars -> modify_bars -> defer_map -> raise_softirq(SCHEDULE_SOFTIRQ)
>     iommu_add_device <- FAILS
>     vpci_remove_device -> xfree(pdev->vpci)
>
> leave_hypervisor_to_guest
>     vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>
> For the hardware domain we continue execution as the worst that
> could happen is that MMIO mappings are left in place when the
> device has been deassigned.
>
> For unprivileged domains that get a failure in the middle of a vPCI
> {un}map operation we need to destroy them, as we don't know in which
> state the p2m is. This can only happen in vpci_process_pending for
> DomUs as they won't be allowed to call pci_add_device.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> ---
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Since v4:
>   - crash guest domain if map/unmap operation didn't succeed
>   - re-work vpci cancel work to cancel work on all vCPUs
>   - use new locking scheme with pdev->vpci_lock
> New in v4
>
> Fixes: 86dbcf6e30cb ("vpci: cancel pending map/unmap on vpci removal")
>
> ---
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   xen/drivers/vpci/header.c | 49 ++++++++++++++++++++++++++++++---------
>   xen/drivers/vpci/vpci.c   |  2 ++
>   xen/include/xen/pci.h     |  5 ++++
>   xen/include/xen/vpci.h    |  6 +++++
>   4 files changed, 51 insertions(+), 11 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index bd23c0274d48..ba333fb2f9b0 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -131,7 +131,13 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>   
>   bool vpci_process_pending(struct vcpu *v)
>   {
> -    if ( v->vpci.mem )
> +    struct pci_dev *pdev = v->vpci.pdev;
> +
> +    if ( !pdev )
> +        return false;
> +
> +    spin_lock(&pdev->vpci_lock);
> +    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
>       {
>           struct map_data data = {
>               .d = v->domain,
> @@ -140,32 +146,53 @@ bool vpci_process_pending(struct vcpu *v)
>           int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>   
>           if ( rc == -ERESTART )
> +        {
> +            spin_unlock(&pdev->vpci_lock);
>               return true;
> +        }
>   
> -        spin_lock(&v->vpci.pdev->vpci_lock);
> -        if ( v->vpci.pdev->vpci )
> +        if ( pdev->vpci )
>               /* Disable memory decoding unconditionally on failure. */
> -            modify_decoding(v->vpci.pdev,
> +            modify_decoding(pdev,
>                               rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>                               !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci_lock);
>   
> -        rangeset_destroy(v->vpci.mem);
> -        v->vpci.mem = NULL;
>           if ( rc )
> +        {
>               /*
>                * FIXME: in case of failure remove the device from the domain.
>                * Note that there might still be leftover mappings. While this is
> -             * safe for Dom0, for DomUs the domain will likely need to be
> -             * killed in order to avoid leaking stale p2m mappings on
> -             * failure.
> +             * safe for Dom0, for DomUs the domain needs to be killed in order
> +             * to avoid leaking stale p2m mappings on failure.
>                */
> -            vpci_remove_device(v->vpci.pdev);
> +            if ( is_hardware_domain(v->domain) )
> +                vpci_remove_device_locked(pdev);
> +            else
> +                domain_crash(v->domain);
> +        }
>       }
> +    spin_unlock(&pdev->vpci_lock);
>   
>       return false;
>   }
>   
> +void vpci_cancel_pending_locked(struct pci_dev *pdev)
> +{
> +    struct vcpu *v;
> +
> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
> +
> +    /* Cancel any pending work now on all vCPUs. */
> +    for_each_vcpu( pdev->domain, v )
> +    {
> +        if ( v->vpci.mem && (v->vpci.pdev == pdev) )
> +        {
> +            rangeset_destroy(v->vpci.mem);
> +            v->vpci.mem = NULL;
> +        }
> +    }
> +}
> +
>   static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>                               struct rangeset *mem, uint16_t cmd)
>   {
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index ceaac4516ff8..37103e207635 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -54,7 +54,9 @@ void vpci_remove_device_locked(struct pci_dev *pdev)
>   {
>       ASSERT(spin_is_locked(&pdev->vpci_lock));
>   
> +    pdev->vpci_cancel_pending = true;
>       vpci_remove_device_handlers_locked(pdev);
> +    vpci_cancel_pending_locked(pdev);
>       xfree(pdev->vpci->msix);
>       xfree(pdev->vpci->msi);
>       xfree(pdev->vpci);
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 3f60d6c6c6dd..52d302ac5f35 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -135,6 +135,11 @@ struct pci_dev {
>   
>       /* Data for vPCI. */
>       spinlock_t vpci_lock;
> +    /*
> +     * Set if PCI device is being removed now and we need to cancel any
> +     * pending map/unmap operations.
> +     */
> +    bool vpci_cancel_pending;
>       struct vpci *vpci;
>   };
>   
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 8b22bdef11d0..cfff87e5801e 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -57,6 +57,7 @@ uint32_t vpci_hw_read32(const struct pci_dev *pdev, unsigned int reg,
>    * should not run.
>    */
>   bool __must_check vpci_process_pending(struct vcpu *v);
> +void vpci_cancel_pending_locked(struct pci_dev *pdev);
>   
>   struct vpci {
>       /* List of vPCI handlers for a device. */
> @@ -253,6 +254,11 @@ static inline bool __must_check vpci_process_pending(struct vcpu *v)
>       ASSERT_UNREACHABLE();
>       return false;
>   }
> +
> +static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
> +{
> +    ASSERT_UNREACHABLE();
> +}
>   #endif
>   
>   #endif
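
For readers following the thread, the idea of the quoted patch can be
reduced to a toy model along these lines. This is a simplified sketch under
assumptions: all toy_* names are hypothetical, locking is only marked in
comments because the model is single-threaded, and the real code deals with
rangesets and p2m updates that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); these are not the real Xen types. */
struct toy_pdev {
    bool cancel_pending;    /* set once removal has started */
    void *vpci;             /* NULL once the vPCI state is gone */
};

struct toy_vcpu {
    struct toy_pdev *pdev;
    int *mem;               /* non-NULL while deferred work is queued */
};

static int toy_applied;     /* counts deferred operations actually run */

/* Removal path: flag the device and drop queued work on all vCPUs. */
static void toy_remove_device(struct toy_pdev *pdev,
                              struct toy_vcpu *vcpus, size_t nr)
{
    size_t i;

    /* The per-device lock would be held from here ... */
    pdev->cancel_pending = true;
    for ( i = 0; i < nr; i++ )
        if ( vcpus[i].mem && vcpus[i].pdev == pdev )
            vcpus[i].mem = NULL;
    pdev->vpci = NULL;
    /* ... to here. */
}

/* Deferred path: only touch the device if nothing cancelled the work. */
static bool toy_process_pending(struct toy_vcpu *v)
{
    struct toy_pdev *pdev = v->pdev;

    if ( !pdev )
        return false;

    /* The per-device lock would be held around this check. */
    if ( !pdev->cancel_pending && v->mem )
    {
        assert(pdev->vpci != NULL);
        toy_applied++;
        v->mem = NULL;
    }

    return false;
}
```

With the flag checked and the queued work dropped under the same lock, a
deferred operation that runs after removal finds nothing to do instead of
dereferencing freed vPCI state.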

* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2022-01-12 12:12   ` Roger Pau Monné
@ 2022-01-31  8:43     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  8:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 12.01.22 14:12, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> When a PCI device gets assigned/de-assigned some work on vPCI side needs
>> to be done for that device. Introduce a pair of hooks so vPCI can handle
>> that.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>>   - de-assign vPCI from the previous domain on device assignment
>>   - do not remove handlers in vpci_assign_device as those must not
>>     exist at that point
>> Since v3:
>>   - remove toolstack roll-back description from the commit message
>>     as errors are to be handled with proper cleanup in Xen itself
>>   - remove __must_check
>>   - remove redundant rc check while assigning devices
>>   - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
>>   - use REGISTER_VPCI_INIT machinery to run required steps on device
>>     init/assign: add run_vpci_init helper
>> Since v2:
>> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>>    for x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - extended the commit message
>> ---
>>   xen/drivers/Kconfig           |  4 +++
>>   xen/drivers/passthrough/pci.c | 10 ++++++
>>   xen/drivers/vpci/vpci.c       | 61 +++++++++++++++++++++++++++++------
>>   xen/include/xen/vpci.h        | 16 +++++++++
>>   4 files changed, 82 insertions(+), 9 deletions(-)
>>
>> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
>> index db94393f47a6..780490cf8e39 100644
>> --- a/xen/drivers/Kconfig
>> +++ b/xen/drivers/Kconfig
>> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>>   config HAS_VPCI
>>   	bool
>>   
>> +config HAS_VPCI_GUEST_SUPPORT
>> +	bool
>> +	depends on HAS_VPCI
>> +
>>   endmenu
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 286808b25e65..d9ef91571adf 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -874,6 +874,10 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>       if ( ret )
>>           goto out;
>>   
>> +    ret = vpci_deassign_device(d, pdev);
>> +    if ( ret )
>> +        goto out;
> Following my comment below, this won't be allowed to fail.
>
>> +
>>       if ( pdev->domain == hardware_domain  )
>>           pdev->quarantine = false;
>>   
>> @@ -1429,6 +1433,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>       ASSERT(pdev && (pdev->domain == hardware_domain ||
>>                       pdev->domain == dom_io));
>>   
>> +    rc = vpci_deassign_device(pdev->domain, pdev);
>> +    if ( rc )
>> +        goto done;
>> +
>>       rc = pdev_msix_assign(d, pdev);
>>       if ( rc )
>>           goto done;
>> @@ -1446,6 +1454,8 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>           rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>       }
>>   
>> +    rc = vpci_assign_device(d, pdev);
>> +
>>    done:
>>       if ( rc )
>>           printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 37103e207635..a9e9e8ec438c 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -74,12 +74,26 @@ void vpci_remove_device(struct pci_dev *pdev)
>>       spin_unlock(&pdev->vpci_lock);
>>   }
>>   
>> -int vpci_add_handlers(struct pci_dev *pdev)
>> +static int run_vpci_init(struct pci_dev *pdev)
> Just using add_handlers as function name would be clearer IMO.
Ok, will change
>
>>   {
>> -    struct vpci *vpci;
>>       unsigned int i;
>>       int rc = 0;
>>   
>> +    for ( i = 0; i < NUM_VPCI_INIT; i++ )
>> +    {
>> +        rc = __start_vpci_array[i](pdev);
>> +        if ( rc )
>> +            break;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +int vpci_add_handlers(struct pci_dev *pdev)
>> +{
>> +    struct vpci *vpci;
>> +    int rc;
>> +
>>       if ( !has_vpci(pdev->domain) )
>>           return 0;
>>   
>> @@ -94,19 +108,48 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       pdev->vpci = vpci;
>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>>   
>> -    for ( i = 0; i < NUM_VPCI_INIT; i++ )
>> -    {
>> -        rc = __start_vpci_array[i](pdev);
>> -        if ( rc )
>> -            break;
>> -    }
>> -
>> +    rc = run_vpci_init(pdev);
>>       if ( rc )
>>           vpci_remove_device_locked(pdev);
>>       spin_unlock(&pdev->vpci_lock);
>>   
>>       return rc;
>>   }
>> +
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Notify vPCI that device is assigned to guest. */
>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +
>> +    /* It only makes sense to assign for hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
> Do you really need the is_system_domain check? System domains
> shouldn't have the VPCI flag set anyway, so should fail the has_vpci
> test.
No, it seems we do not need this check: will remove
>
>> +        return 0;
>> +
>> +    spin_lock(&pdev->vpci_lock);
>> +    rc = run_vpci_init(pdev);
>> +    spin_unlock(&pdev->vpci_lock);
>> +    if ( rc )
>> +        vpci_deassign_device(d, pdev);
>> +
>> +    return rc;
>> +}
>> +
>> +/* Notify vPCI that device is de-assigned from guest. */
>> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
> There's no need to return any value from this function AFAICT. It
> should have void return type.
Makes sense, I will s/int/void
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2022-01-13 11:40   ` Roger Pau Monné
@ 2022-01-31  8:45     ` Oleksandr Andrushchenko
  2022-02-01  8:56       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  8:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 13:40, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +/* Notify vPCI that device is assigned to guest. */
>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>> +{
>> +    int rc;
>> +
>> +    /* It only makes sense to assign for hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    spin_lock(&pdev->vpci_lock);
>> +    rc = run_vpci_init(pdev);
> Following my comment below, this will likely need to call
> vpci_add_handlers in order to allocate the pdev->vpci field.
>
> It's not OK to carry the contents of pdev->vpci across domain
> assignations, as the device should be reset, and thus the content of
> pdev->vpci would be stale.
>
>> +    spin_unlock(&pdev->vpci_lock);
>> +    if ( rc )
>> +        vpci_deassign_device(d, pdev);
>> +
>> +    return rc;
>> +}
>> +
>> +/* Notify vPCI that device is de-assigned from guest. */
>> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
>> +{
>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>> +    if ( is_system_domain(d) || !has_vpci(d) )
>> +        return 0;
>> +
>> +    spin_lock(&pdev->vpci_lock);
>> +    vpci_remove_device_handlers_locked(pdev);
> You need to free the pdev->vpci structure on deassign. I would expect
> the device to be reset on deassign, so keeping the pdev->vpci contents
> would be wrong.
Sure, I will re-allocate pdev->vpci then
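To illustrate the lifecycle being agreed on here, a minimal sketch (a toy
model, not Xen code): the per-device state is freed on de-assign and
allocated from scratch on assign, so nothing stale survives a device reset.
All names (toy_vpci, toy_pdev, toy_assign, toy_deassign) are hypothetical,
and locking is left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vpci and struct pci_dev. */
struct toy_vpci {
    unsigned int nr_handlers;
    uint64_t guest_bar;         /* example of per-domain emulated state */
};

struct toy_pdev {
    struct toy_vpci *vpci;
};

/* De-assign: drop all emulated state; this cannot fail, hence void. */
static void toy_deassign(struct toy_pdev *pdev)
{
    free(pdev->vpci);
    pdev->vpci = NULL;
}

/* Assign: always start from zeroed state for the new owner. */
static int toy_assign(struct toy_pdev *pdev)
{
    assert(!pdev->vpci);        /* previous owner must have been de-assigned */

    pdev->vpci = calloc(1, sizeof(*pdev->vpci));
    if ( !pdev->vpci )
        return -ENOMEM;

    pdev->vpci->nr_handlers = 1;    /* placeholder for running the init hooks */

    return 0;
}
```

With this shape a value written by the previous owner (guest_bar above) is
never visible to the next one, which is the point of Roger's remark.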
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-28 14:15           ` Oleksandr Andrushchenko
@ 2022-01-31  8:56             ` Roger Pau Monné
  2022-01-31  9:00               ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31  8:56 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, xen-devel

On Fri, Jan 28, 2022 at 02:15:08PM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger, Jan!
> 
> On 13.01.22 10:58, Roger Pau Monné wrote:
> > On Wed, Jan 12, 2022 at 04:52:51PM +0100, Jan Beulich wrote:
> >> On 12.01.2022 16:42, Roger Pau Monné wrote:
> >>> On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
> >>>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
> >>>>> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
> >>>>>   
> >>>>>           data = merge_result(data, tmp_data, size - data_offset, data_offset);
> >>>>>       }
> >>>>> -    spin_unlock(&pdev->vpci->lock);
> >>>>>   
> >>>>>       return data & (0xffffffff >> (32 - 8 * size));
> >>>>>   }
> >>>> Here and ...
> >>>>
> >>>>> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>>>               break;
> >>>>>           ASSERT(data_offset < size);
> >>>>>       }
> >>>>> +    spin_unlock(&pdev->vpci_lock);
> >>>>>   
> >>>>>       if ( data_offset < size )
> >>>>>           /* Tailing gap, write the remaining. */
> >>>>>           vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >>>>>                         data >> (data_offset * 8));
> >>>>> -
> >>>>> -    spin_unlock(&pdev->vpci->lock);
> >>>>>   }
> >>>> ... even more so here I'm not sure of the correctness of the moving
> >>>> you do: While pdev->vpci indeed doesn't get accessed, I wonder
> >>>> whether there wasn't an intention to avoid racing calls to
> >>>> vpci_{read,write}_hw() this way. In any event I think such movement
> >>>> would need justification in the description.
> >>> I agree about the need for justification in the commit message, or
> >>> even better this being split into a pre-patch, as it's not related to
> >>> the lock switching done here.
> >>>
> >>> I do think this is fine however, as racing calls to
> >>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
> >>> around pci_conf_{read,write} functions, and the required locking (in
> >>> case of using the IO ports) is already taken care in
> >>> pci_conf_{read,write}.
> >> IOW you consider it acceptable for a guest (really: Dom0) read racing
> >> a write to read back only part of what was written (so far)? I would
> >> think individual multi-byte reads and writes should appear atomic to
> >> the guest.
> > We split 64bit writes into two 32bit ones without taking the lock for
> > the whole duration of the access, so it's already possible to see a
> > partially updated state as a result of a 64bit write.
> >
> > I'm going over the PCI(e) spec but I don't seem to find anything about
> > whether the ECAM is allowed to split memory transactions into multiple
> > Configuration Requests, and whether those could then interleave with
> > requests from a different CPU.
> So, with the above, is it still fine for you to have the change as is, or
> do you want this optimization to go into a dedicated patch before this one?

The change seems slightly controversial, so I think it would be best
if it was split into a separate patch with proper reasoning in the
commit message.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 03/14] vpci: move lock outside of struct vpci
  2022-01-31  8:56             ` Roger Pau Monné
@ 2022-01-31  9:00               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  9:00 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Roger!

On 31.01.22 10:56, Roger Pau Monné wrote:
> On Fri, Jan 28, 2022 at 02:15:08PM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger, Jan!
>>
>> On 13.01.22 10:58, Roger Pau Monné wrote:
>>> On Wed, Jan 12, 2022 at 04:52:51PM +0100, Jan Beulich wrote:
>>>> On 12.01.2022 16:42, Roger Pau Monné wrote:
>>>>> On Wed, Jan 12, 2022 at 03:57:36PM +0100, Jan Beulich wrote:
>>>>>> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>>>>>>> @@ -379,7 +396,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>>>>    
>>>>>>>            data = merge_result(data, tmp_data, size - data_offset, data_offset);
>>>>>>>        }
>>>>>>> -    spin_unlock(&pdev->vpci->lock);
>>>>>>>    
>>>>>>>        return data & (0xffffffff >> (32 - 8 * size));
>>>>>>>    }
>>>>>> Here and ...
>>>>>>
>>>>>>> @@ -475,13 +498,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>>>>                break;
>>>>>>>            ASSERT(data_offset < size);
>>>>>>>        }
>>>>>>> +    spin_unlock(&pdev->vpci_lock);
>>>>>>>    
>>>>>>>        if ( data_offset < size )
>>>>>>>            /* Tailing gap, write the remaining. */
>>>>>>>            vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>>>>>                          data >> (data_offset * 8));
>>>>>>> -
>>>>>>> -    spin_unlock(&pdev->vpci->lock);
>>>>>>>    }
>>>>>> ... even more so here I'm not sure of the correctness of the moving
>>>>>> you do: While pdev->vpci indeed doesn't get accessed, I wonder
>>>>>> whether there wasn't an intention to avoid racing calls to
>>>>>> vpci_{read,write}_hw() this way. In any event I think such movement
>>>>>> would need justification in the description.
>>>>> I agree about the need for justification in the commit message, or
>>>>> even better this being split into a pre-patch, as it's not related to
>>>>> the lock switching done here.
>>>>>
>>>>> I do think this is fine however, as racing calls to
>>>>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>>>>> around pci_conf_{read,write} functions, and the required locking (in
>>>>> case of using the IO ports) is already taken care in
>>>>> pci_conf_{read,write}.
>>>> IOW you consider it acceptable for a guest (really: Dom0) read racing
>>>> a write to read back only part of what was written (so far)? I would
>>>> think individual multi-byte reads and writes should appear atomic to
>>>> the guest.
>>> We split 64bit writes into two 32bit ones without taking the lock for
>>> the whole duration of the access, so it's already possible to see a
>>> partially updated state as a result of a 64bit write.
>>>
>>> I'm going over the PCI(e) spec but I don't seem to find anything about
>>> whether the ECAM is allowed to split memory transactions into multiple
>>> Configuration Requests, and whether those could then interleave with
>>> requests from a different CPU.
>> So, with the above, is it still fine for you to have the change as is, or
>> do you want this optimization to go into a dedicated patch before this one?
> The change seems slightly controversial, so I think it would be best
> if it was split into a separate patch with proper reasoning in the
> commit message.
Sure, will move into a dedicated patch then
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-12 12:35   ` Roger Pau Monné
@ 2022-01-31  9:47     ` Oleksandr Andrushchenko
  2022-01-31 10:40       ` Oleksandr Andrushchenko
  2022-01-31 11:04       ` Roger Pau Monné
  2022-01-31 15:06     ` Oleksandr Andrushchenko
  1 sibling, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  9:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 12.01.22 14:35, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM handling is supported
>> for x86 only and it might not be used by other architectures without
>> emulating x86. Other use-cases may include using that expansion ROM before
>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>> wants to use the ROM code which seems to be rare.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>> - updated commit message
>> - s/guest_addr/guest_reg
>> Since v3:
>> - squashed two patches: dynamic add/remove handlers and guest BAR
>>    handler implementation
>> - fix guest BAR read of the high part of a 64bit BAR (Roger)
>> - add error handling to vpci_assign_device
>> - s/dom%pd/%pd
>> - blank line before return
>> Since v2:
>> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>>    has been eliminated from being built on x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - simplify some code
>>   - use gdprintk + error code instead of gprintk
>>   - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>>     so these do not get compiled for x86
>>   - removed unneeded is_system_domain check
>>   - re-work guest read/write to be much simpler and do more work on write
>>     than read which is expected to be called more frequently
>>   - removed one too obvious comment
>> ---
>>   xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
>>   xen/include/xen/vpci.h    |  3 ++
>>   2 files changed, 69 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ba333fb2f9b0..8880d34ebf8e 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>       pci_conf_write32(pdev->sbdf, reg, val);
>>   }
>>   
>> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +    struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +    else
>> +    {
>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
>> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
>> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>> +    }
>> +
>> +    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
>> +    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
>> +
>> +    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
>> +}
>> +
>> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    const struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +
>> +    return bar->guest_reg >> (hi ? 32 : 0);
>> +}
>> +
>>   static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>                         uint32_t val, void *data)
>>   {
>> @@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>   }
>>   
>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +}
>> +
>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    return 0xffffffff;
>> +}
> There should be no need for those handlers. As said elsewhere: for
> guests registers not explicitly handled should return ~0 for reads and
> drop writes, which is what you are proposing here.
Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
handler exists (which is what I do here with guest_rom_read). But I am not that
sure about the dropped writes:

void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data)
{
     unsigned int data_offset = 0;

[snip]

     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
         vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
                       data >> (data_offset * 8));

so it looks like un-handled writes still reach the HW register.
Could you please tell me whether the code above needs improvement (e.g.
checking whether the write was handled), or whether I still need to provide
a write handler, such as guest_rom_write here?
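To make the question concrete, here is a minimal sketch (a toy model, not
Xen code) of the tail of a vpci_write()-like function. The names toy_dev,
toy_write_hw, toy_write_tail and the is_hwdom flag are hypothetical; the
assumption is that un-handled guest writes get dropped while the hardware
domain keeps the current pass-through behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 256-byte config space standing in for the real device. */
struct toy_dev {
    uint8_t hw[256];
    bool is_hwdom;      /* hypothetical: true for the hardware domain */
};

/* Stand-in for vpci_write_hw(): store 'size' bytes, little-endian. */
static void toy_write_hw(struct toy_dev *d, unsigned int reg,
                         unsigned int size, uint32_t data)
{
    assert(size <= 4 && reg + size <= sizeof(d->hw));

    for ( unsigned int i = 0; i < size; i++ )
        d->hw[reg + i] = (uint8_t)(data >> (8 * i));
}

/*
 * Tail of the write path: 'data_offset' bytes were consumed by handlers,
 * the rest is the trailing gap that no handler claimed.
 */
static void toy_write_tail(struct toy_dev *d, unsigned int reg,
                           unsigned int size, unsigned int data_offset,
                           uint32_t data)
{
    if ( data_offset >= size )
        return;         /* fully handled, nothing left */

    if ( !d->is_hwdom )
        return;         /* assumption: drop un-handled guest writes */

    toy_write_hw(d, reg + data_offset, size - data_offset,
                 data >> (data_offset * 8));
}
```

With such a check in the common path no per-register stub like
guest_rom_write would be needed for guests.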
>> +
>>   static int init_bars(struct pci_dev *pdev)
>>   {
>>       uint16_t cmd;
>> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
>>       struct vpci_header *header = &pdev->vpci->header;
>>       struct vpci_bar *bars = header->bars;
>>       int rc;
>> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>>   
>>       switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>       {
>> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>>           {
>>               bars[i].type = VPCI_BAR_MEM64_HI;
>> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
>> -                                   4, &bars[i]);
>> +            rc = vpci_add_register(pdev->vpci,
>> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                                   is_hwdom ? bar_write : guest_bar_write,
>> +                                   reg, 4, &bars[i]);
>>               if ( rc )
>>               {
>>                   pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
>>           bars[i].size = size;
>>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>   
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>> -                               &bars[i]);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                               is_hwdom ? bar_write : guest_bar_write,
>> +                               reg, 4, &bars[i]);
>>           if ( rc )
>>           {
>>               pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
>>           header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
>>                                 PCI_ROM_ADDRESS_ENABLE;
>>   
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
>> -                               4, rom);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
>> +                               is_hwdom ? rom_write : guest_rom_write,
>> +                               rom_reg, 4, rom);
> This whole call should be made conditional to is_hwdom, as said above
> there's no need for the guest_rom handlers.
Yes, if writes are indeed dropped, please see question above
>
> Likewise I assume you expect IO BARs to simply return ~0 and drop
> writes, as there's no explicit handler added for those?
Yes, but that was not my intention: I simply didn't handle IO BARs.
Now we do need that handling, either via the default behavior for
unhandled accesses (drop writes, read ~0) or by introducing dedicated
handlers. I hope we can rely on the "unhandled read/write" path and
get what we want.
>
>>           if ( rc )
>>               rom->type = VPCI_BAR_EMPTY;
>>       }
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index ed127a08a953..0a73b14a92dc 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -68,7 +68,10 @@ struct vpci {
>>       struct vpci_header {
>>           /* Information about the PCI BARs of this device. */
>>           struct vpci_bar {
>> +            /* Physical view of the BAR. */
> No, that's not the physical view, it's the physical (host) address.
Ok
>
>>               uint64_t addr;
>> +            /* Guest view of the BAR: address and lower bits. */
>> +            uint64_t guest_reg;
> I continue to think it would be clearer if you store the guest address
> here (gaddr, without the low bits) and add those in guest_bar_read
> based on bar->{type,prefetchable}. Then it would be equivalent to the
> existing 'addr' field.
Ok, I'll re-work the code with this approach in mind: s/guest_reg/gaddr +
required code to handle that
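A minimal sketch of how I read the suggestion (toy code, not the actual
patch): keep only the guest address in gaddr and compose the low flag bits
on read from the BAR type and the prefetchable property. The toy_* names are
hypothetical, the constants mirror the usual pci_regs.h values, and instead
of the bar-- trick the caller passes the low slot plus a 'hi' flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low BAR bits, same values as in pci_regs.h. */
#define BAR_MEM_TYPE_64   0x04u
#define BAR_MEM_PREFETCH  0x08u
#define BAR_MEM_MASK      (~(uint64_t)0x0f)

enum toy_bar_type { TOY_MEM32, TOY_MEM64 };

struct toy_bar {
    uint64_t gaddr;             /* guest address only, no flag bits */
    uint64_t size;              /* power of two, at least 16 */
    enum toy_bar_type type;
    bool prefetchable;
};

/* 'lo' is the slot that owns gaddr; 'hi' selects the upper 32 bits. */
static void toy_guest_bar_write(struct toy_bar *lo, bool hi, uint32_t val)
{
    unsigned int shift = hi ? 32 : 0;

    assert(lo->size >= 16 && !(lo->size & (lo->size - 1)));

    lo->gaddr &= ~((uint64_t)0xffffffff << shift);
    lo->gaddr |= (uint64_t)val << shift;
    /* Only size-aligned address bits stick, so BAR sizing keeps working. */
    lo->gaddr &= ~(lo->size - 1) & BAR_MEM_MASK;
}

static uint32_t toy_guest_bar_read(const struct toy_bar *lo, bool hi)
{
    uint32_t val;

    if ( hi )
        return (uint32_t)(lo->gaddr >> 32);

    /* Flag bits are derived on read instead of being stored. */
    val = (uint32_t)lo->gaddr;
    if ( lo->type == TOY_MEM64 )
        val |= BAR_MEM_TYPE_64;
    if ( lo->prefetchable )
        val |= BAR_MEM_PREFETCH;

    return val;
}
```

This moves a little work from write to read, but gaddr then has the same
meaning as the existing 'addr' field.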
>
> I wonder whether we need to protect the added code with
> CONFIG_HAS_VPCI_GUEST_SUPPORT, this would effectively be dead code
> otherwise. Long term I don't think we wish to differentiate between
> dom0 and domU vPCI support at build time, so I'm unsure whether it's
> helpful to pollute the code with CONFIG_HAS_VPCI_GUEST_SUPPORT when
> the plan is to remove those long term.
I would have it without CONFIG_HAS_VPCI_GUEST_SUPPORT if you
don't mind
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-12 17:34   ` Roger Pau Monné
@ 2022-01-31  9:53     ` Oleksandr Andrushchenko
  2022-01-31 10:56       ` Roger Pau Monné
  2022-02-03 12:45       ` Oleksandr Andrushchenko
  0 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31  9:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 12.01.22 19:34, Roger Pau Monné wrote:
> A couple more comments I realized while walking the dog.
>
> On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add relevant vpci register handlers when assigning PCI device to a domain
>> and remove those when de-assigning. This allows having different
>> handlers for different domains, e.g. hwdom and other guests.
>>
>> Emulate guest BAR register values: this allows creating a guest view
>> of the registers and emulates size and properties probe as it is done
>> during PCI device enumeration by the guest.
>>
>> ROM BAR is only handled for the hardware domain and for guest domains
>> there is a stub: at the moment PCI expansion ROM handling is supported
>> for x86 only and it might not be used by other architectures without
>> emulating x86. Other use-cases may include using that expansion ROM before
>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>> wants to use the ROM code which seems to be rare.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>> - updated commit message
>> - s/guest_addr/guest_reg
>> Since v3:
>> - squashed two patches: dynamic add/remove handlers and guest BAR
>>    handler implementation
>> - fix guest BAR read of the high part of a 64bit BAR (Roger)
>> - add error handling to vpci_assign_device
>> - s/dom%pd/%pd
>> - blank line before return
>> Since v2:
>> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>>    has been eliminated from being built on x86
>> Since v1:
>>   - constify struct pci_dev where possible
>>   - do not open code is_system_domain()
>>   - simplify some code
>>   - use gdprintk + error code instead of gprintk
>>   - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>>     so these do not get compiled for x86
>>   - removed unneeded is_system_domain check
>>   - re-work guest read/write to be much simpler and do more work on write
>>     than read which is expected to be called more frequently
>>   - removed one too obvious comment
>> ---
>>   xen/drivers/vpci/header.c | 72 +++++++++++++++++++++++++++++++++++----
>>   xen/include/xen/vpci.h    |  3 ++
>>   2 files changed, 69 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ba333fb2f9b0..8880d34ebf8e 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -433,6 +433,48 @@ static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>       pci_conf_write32(pdev->sbdf, reg, val);
>>   }
>>   
>> +static void guest_bar_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +    struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +    else
>> +    {
>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>> +        val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
>> +                                           : PCI_BASE_ADDRESS_MEM_TYPE_64;
>> +        val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>> +    }
>> +
>> +    bar->guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
>> +    bar->guest_reg |= (uint64_t)val << (hi ? 32 : 0);
>> +
>> +    bar->guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
> You need to assert that the guest set address has the same page offset
> as the physical address on the host, or otherwise things won't work as
> expected. Ie: guest_addr & ~PAGE_MASK == addr & ~PAGE_MASK.
Good catch, thank you
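For reference, the check could look roughly like the sketch below (toy code
with a hypothetical 4K page size and toy_* names, not the Xen macros). The
mapping is done at page granularity, so a BAR that does not start on a page
boundary on the host must sit at the same offset within the page in the
guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* True if both addresses have the same offset within their page. */
static bool toy_same_page_offset(uint64_t guest_addr, uint64_t host_addr)
{
    return (guest_addr & ~TOY_PAGE_MASK) == (host_addr & ~TOY_PAGE_MASK);
}

/* Frame numbers for the gfn -> mfn mapping, valid only if offsets agree. */
static void toy_bar_frames(uint64_t guest_addr, uint64_t host_addr,
                           uint64_t *gfn, uint64_t *mfn)
{
    assert(toy_same_page_offset(guest_addr, host_addr));

    *gfn = guest_addr >> TOY_PAGE_SHIFT;
    *mfn = host_addr >> TOY_PAGE_SHIFT;
}
```

Whether a mismatch should be an ASSERT or make the mapping fail for the
guest is a separate question; the sketch only shows the comparison itself.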
>
>> +}
>> +
>> +static uint32_t guest_bar_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    const struct vpci_bar *bar = data;
>> +    bool hi = false;
>> +
>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>> +    {
>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>> +        bar--;
>> +        hi = true;
>> +    }
>> +
>> +    return bar->guest_reg >> (hi ? 32 : 0);
>> +}
>> +
>>   static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>                         uint32_t val, void *data)
>>   {
>> @@ -481,6 +523,17 @@ static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>           rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>   }
>>   
>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t val, void *data)
>> +{
>> +}
>> +
>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>> +                               void *data)
>> +{
>> +    return 0xffffffff;
>> +}
>> +
>>   static int init_bars(struct pci_dev *pdev)
>>   {
>>       uint16_t cmd;
>> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
>>       struct vpci_header *header = &pdev->vpci->header;
>>       struct vpci_bar *bars = header->bars;
>>       int rc;
>> +    bool is_hwdom = is_hardware_domain(pdev->domain);
>>   
>>       switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>       {
>> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
>>           if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
>>           {
>>               bars[i].type = VPCI_BAR_MEM64_HI;
>> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
>> -                                   4, &bars[i]);
>> +            rc = vpci_add_register(pdev->vpci,
>> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                                   is_hwdom ? bar_write : guest_bar_write,
>> +                                   reg, 4, &bars[i]);
>>               if ( rc )
>>               {
>>                   pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
>>           bars[i].size = size;
>>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
>>   
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
>> -                               &bars[i]);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
>> +                               is_hwdom ? bar_write : guest_bar_write,
>> +                               reg, 4, &bars[i]);
> You need to initialize guest_reg to the physical host value also.
But why? There was a concern that exposing the host's value to a guest
may be a security issue. Also, couldn't a guest then decide that the
firmware has already set up the BAR, skip programming it, and end up using
a wrong value? If I'm not mistaken, I hit exactly that issue: the guest's
Linux didn't set the BAR up properly and used what was already programmed.
>
>>           if ( rc )
>>           {
>>               pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
>>           header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
>>                                 PCI_ROM_ADDRESS_ENABLE;
>>   
>> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
>> -                               4, rom);
>> +        rc = vpci_add_register(pdev->vpci,
>> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
>> +                               is_hwdom ? rom_write : guest_rom_write,
>> +                               rom_reg, 4, rom);
>>           if ( rc )
>>               rom->type = VPCI_BAR_EMPTY;
> Also memory decoding needs to be initially disabled when used by
> guests, in order to prevent the BAR being placed on top of a RAM
> region. The guest physmap will be different from the host one, so it's
> possible for BARs to end up placed on top of RAM regions initially
> until the firmware or OS places them at a suitable address.
Agreed, memory decoding must be disabled.
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31  9:47     ` Oleksandr Andrushchenko
@ 2022-01-31 10:40       ` Oleksandr Andrushchenko
  2022-01-31 10:54         ` Jan Beulich
  2022-01-31 11:10         ` Roger Pau Monné
  2022-01-31 11:04       ` Roger Pau Monné
  1 sibling, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 10:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 12.01.22 14:35, Roger Pau Monné wrote:
>>
>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>> +                            uint32_t val, void *data)
>>> +{
>>> +}
>>> +
>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>> +                               void *data)
>>> +{
>>> +    return 0xffffffff;
>>> +}
>> There should be no need for those handlers. As said elsewhere: for
>> guests registers not explicitly handled should return ~0 for reads and
>> drop writes, which is what you are proposing here.
> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
> handler exists (which is what I do here with guest_rom_read). But I am not that
> sure about the dropped writes:
>
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>                   uint32_t data)
> {
>       unsigned int data_offset = 0;
>
> [snip]
>
>       if ( data_offset < size )
>           /* Tailing gap, write the remaining. */
>           vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>                         data >> (data_offset * 8));
>
> so it looks like for the un-handled writes we still reach the HW register.
> Could you please tell if the code above needs improvement (like checking
> if the write was handled) or I still need to provide a write handler, e.g.
> guest_rom_write here?
Hm, but the same applies to reads as well... And this is no surprise:
I can see that guests access all the configuration space registers
that I don't handle. Without that, guests would be unable to properly
set up a PCI device being passed through... This is why I have a big
TODO in this series describing the unhandled registers.
So it seems that I do need to provide those handlers in order to
drop writes and return ~0 on reads.

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 10:40       ` Oleksandr Andrushchenko
@ 2022-01-31 10:54         ` Jan Beulich
  2022-01-31 11:04           ` Oleksandr Andrushchenko
  2022-01-31 11:10         ` Roger Pau Monné
  1 sibling, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-31 10:54 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 31.01.2022 11:40, Oleksandr Andrushchenko wrote:
> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 12.01.22 14:35, Roger Pau Monné wrote:
>>>
>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>> +                            uint32_t val, void *data)
>>>> +{
>>>> +}
>>>> +
>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>>> +                               void *data)
>>>> +{
>>>> +    return 0xffffffff;
>>>> +}
>>> There should be no need for those handlers. As said elsewhere: for
>>> guests registers not explicitly handled should return ~0 for reads and
>>> drop writes, which is what you are proposing here.
>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
>> handler exists (which is what I do here with guest_rom_read). But I am not that
>> sure about the dropped writes:
>>
>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>                   uint32_t data)
>> {
>>       unsigned int data_offset = 0;
>>
>> [snip]
>>
>>       if ( data_offset < size )
>>           /* Tailing gap, write the remaining. */
>>           vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>                         data >> (data_offset * 8));
>>
>> so it looks like for the un-handled writes we still reach the HW register.
>> Could you please tell if the code above needs improvement (like checking
>> if the write was handled) or I still need to provide a write handler, e.g.
>> guest_rom_write here?
> Hm, but the same applies to the reads as well... And this is no surprise,
> as for the guests I can see that it accesses all the configuration space
> registers that I don't handle. Without that I would have guests unable
> to properly setup a PCI device being passed through... And this is why
> I have a big TODO in this series describing unhandled registers.
> So, it seems that I do need to provide those handlers which I need to
> drop writes and return ~0 on reads.

It feels like we have been here before: For your initial purposes it may
be fine to do as you suggest, but any such patches should carry RFC tags
or the like to indicate they're not considered ready. Once you're aiming
for things to go in, I think there's no good way around white-listing
what guests may access. You may know that we've been bitten by starting
out with black-listing in the past, first and foremost with x86's MSRs.
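
As a rough illustration of the white-listing idea above (a sketch only,
not Xen code): an access is permitted only when it falls entirely within
a range the guest is explicitly allowed to touch. The names cfg_range,
guest_allowed and guest_access_allowed, as well as the listed ranges,
are hypothetical and chosen purely for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allow-list entry: a config space range a guest may touch. */
struct cfg_range {
    unsigned int start; /* first byte offset of the range */
    unsigned int size;  /* length of the range in bytes */
};

/* Illustrative entries only: IDs, command/status, and the six BAR slots. */
static const struct cfg_range guest_allowed[] = {
    { 0x00, 4 },   /* vendor ID / device ID */
    { 0x04, 4 },   /* command / status */
    { 0x10, 24 },  /* BAR0..BAR5 */
};

/* True only if the whole access [reg, reg + len) lies inside one entry. */
static bool guest_access_allowed(unsigned int reg, unsigned int len)
{
    size_t i;

    if ( len == 0 )
        return false;

    for ( i = 0; i < sizeof(guest_allowed) / sizeof(guest_allowed[0]); i++ )
    {
        const struct cfg_range *r = &guest_allowed[i];

        if ( reg >= r->start && len <= r->size &&
             reg - r->start <= r->size - len )
            return true;
    }

    /* Anything not listed is refused by default. */
    return false;
}
```

With a black-list the final return would be true, so every register
nobody thought about would be reachable; here such a register is simply
not accessible until someone adds it to the list.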

Jan



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31  9:53     ` Oleksandr Andrushchenko
@ 2022-01-31 10:56       ` Roger Pau Monné
  2022-02-03 12:45       ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31 10:56 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Mon, Jan 31, 2022 at 09:53:29AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 12.01.22 19:34, Roger Pau Monné wrote:
> > A couple more comments I realized while walking the dog.
> >
> > On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
> >>           bars[i].size = size;
> >>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> >>   
> >> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> >> -                               &bars[i]);
> >> +        rc = vpci_add_register(pdev->vpci,
> >> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> >> +                               is_hwdom ? bar_write : guest_bar_write,
> >> +                               reg, 4, &bars[i]);
> > You need to initialize guest_reg to the physical host value also.
> But why? There was a concern that exposing host's value to a guest
> may be a security issue. And wouldn't it be possible for a guest to decide
> that the firmware has setup the BAR and it doesn't need to care of it and
> hence use a wrong value instead of setting it up by itself? I had an issue
> with that if I'm not mistaken that guest's Linux didn't set the BAR properly
> and used what was programmed

I think I made that comment before realizing that all BARs must
start with memory decoding disabled for guests, so that the guest
firmware can position them. Using the host value as a starting point
doesn't make sense because there's no relation between the guest and
the host memory maps. You can drop this comment.
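
A minimal sketch of that starting state, assuming a hypothetical
guest_hdr_view structure and guest_hdr_init helper (neither exists in
the series): the guest's command register starts with the memory
decoding bit cleared and the guest BAR address starts unassigned rather
than mirroring the host value. PCI_COMMAND_MEMORY is the standard
command register bit; everything else is illustrative.

```c
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2 /* Memory Space Enable, bit 1 of PCI_COMMAND */

/* Hypothetical container for what a guest initially sees. */
struct guest_hdr_view {
    uint16_t cmd;       /* guest view of the command register */
    uint64_t bar_gaddr; /* guest address of a BAR, 0 meaning unassigned */
};

static struct guest_hdr_view guest_hdr_init(uint16_t host_cmd,
                                            uint64_t host_bar_addr)
{
    struct guest_hdr_view v;

    /* The host address has no meaning in the guest memory map. */
    (void)host_bar_addr;

    /* Start with memory decoding off so a BAR cannot overlap guest RAM. */
    v.cmd = host_cmd & (uint16_t)~PCI_COMMAND_MEMORY;
    v.bar_gaddr = 0;

    return v;
}
```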

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 10:54         ` Jan Beulich
@ 2022-01-31 11:04           ` Oleksandr Andrushchenko
  2022-01-31 11:27             ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 11:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko

Hi, Jan!

On 31.01.22 12:54, Jan Beulich wrote:
> On 31.01.2022 11:40, Oleksandr Andrushchenko wrote:
>> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
>>> Hi, Roger!
>>>
>>> On 12.01.22 14:35, Roger Pau Monné wrote:
>>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t val, void *data)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                               void *data)
>>>>> +{
>>>>> +    return 0xffffffff;
>>>>> +}
>>>> There should be no need for those handlers. As said elsewhere: for
>>>> guests registers not explicitly handled should return ~0 for reads and
>>>> drop writes, which is what you are proposing here.
>>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
>>> handler exists (which is what I do here with guest_rom_read). But I am not that
>>> sure about the dropped writes:
>>>
>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>                    uint32_t data)
>>> {
>>>        unsigned int data_offset = 0;
>>>
>>> [snip]
>>>
>>>        if ( data_offset < size )
>>>            /* Tailing gap, write the remaining. */
>>>            vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>                          data >> (data_offset * 8));
>>>
>>> so it looks like for the un-handled writes we still reach the HW register.
>>> Could you please tell if the code above needs improvement (like checking
>>> if the write was handled) or I still need to provide a write handler, e.g.
>>> guest_rom_write here?
>> Hm, but the same applies to the reads as well... And this is no surprise,
>> as for the guests I can see that it accesses all the configuration space
>> registers that I don't handle. Without that I would have guests unable
>> to properly setup a PCI device being passed through... And this is why
>> I have a big TODO in this series describing unhandled registers.
>> So, it seems that I do need to provide those handlers which I need to
>> drop writes and return ~0 on reads.
Replying to myself: it is still possible to let vpci_ignored_{read|write}
handle the defaults whenever the handler passed to vpci_add_register
is NULL.
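
Roughly what I mean, as a simplified standalone sketch: the handler
signatures below omit the pdev argument, and reg_entry, add_register,
ignored_read and ignored_write are stand-in names for illustration, not
the actual vPCI definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified handler types; the real vPCI handlers also take a pdev. */
typedef uint32_t (*read_fn)(unsigned int reg, void *data);
typedef void (*write_fn)(unsigned int reg, uint32_t val, void *data);

/* Hypothetical per-register bookkeeping. */
struct reg_entry {
    read_fn read;
    write_fn write;
    void *data;
};

/* Default read: behave as if nothing is there. */
static uint32_t ignored_read(unsigned int reg, void *data)
{
    (void)reg;
    (void)data;
    return ~(uint32_t)0;
}

/* Default write: silently drop the value. */
static void ignored_write(unsigned int reg, uint32_t val, void *data)
{
    (void)reg;
    (void)val;
    (void)data;
}

/* Register handlers, substituting the defaults for any NULL handler. */
static void add_register(struct reg_entry *r, read_fn rd, write_fn wr,
                         void *data)
{
    r->read = rd ? rd : ignored_read;
    r->write = wr ? wr : ignored_write;
    r->data = data;
}
```

The caller would then pass NULL instead of defining one dummy read and
one dummy write function per register.
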
> It feels like we had been there before: For your initial purposes it may
> be fine to do as you suggest, but any such patches should carry RFC tags
> or alike to indicate they're not considered ready. Once you're aiming
> for things to go in, I think there's no good way around white-listing
> what guests may access. You may know that we've been bitten by starting
> out with black-listing in the past, first and foremost with x86'es MSRs.
I already have a big TODO patch describing the issue. Do you want
it to have a list of handlers that we support as of now? What sort of
white/black list would you expect?
I do understand that we need proper handling for all the PCI registers
and capabilities long term, but this can't be done at the moment when
we have nothing working at all. Requesting proper handling now will
turn this series into a huge amount of code with an undefined time frame.
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31  9:47     ` Oleksandr Andrushchenko
  2022-01-31 10:40       ` Oleksandr Andrushchenko
@ 2022-01-31 11:04       ` Roger Pau Monné
  2022-01-31 14:51         ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31 11:04 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Mon, Jan 31, 2022 at 09:47:07AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 12.01.22 14:35, Roger Pau Monné wrote:
> > On Thu, Nov 25, 2021 at 01:02:43PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> >> +                            uint32_t val, void *data)
> >> +{
> >> +}
> >> +
> >> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> >> +                               void *data)
> >> +{
> >> +    return 0xffffffff;
> >> +}
> > There should be no need for those handlers. As said elsewhere: for
> > guests registers not explicitly handled should return ~0 for reads and
> > drop writes, which is what you are proposing here.
> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
> handler exists (which is what I do here with guest_rom_read). But I am not that
> sure about the dropped writes:
> 
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>                  uint32_t data)
> {
>      unsigned int data_offset = 0;
> 
> [snip]
> 
>      if ( data_offset < size )
>          /* Tailing gap, write the remaining. */
>          vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>                        data >> (data_offset * 8));
> 
> so it looks like for the un-handled writes we still reach the HW register.
> Could you please tell if the code above needs improvement (like checking
> if the write was handled) or I still need to provide a write handler, e.g.
> guest_rom_write here?

Right now (given the current code) unhandled reads and writes will all
end up being forwarded to the hardware. This is intended for dom0, but
this is not how it's going to work for domUs, where accesses will be
discarded based on an accept list. IOW the handlers that you are
adding here should be the default behavior for registers not
explicitly handled in the domU case, and shouldn't require explicit
handling.
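
To make the intended split concrete, here is a small standalone sketch.
It is not the vPCI code: fake_hw stands in for the device's real
configuration space, and hw_read32, hw_write32, unhandled_read32 and
unhandled_write32 are made-up names for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the device's real config space (256 bytes as dwords). */
static uint32_t fake_hw[64];

static uint32_t hw_read32(unsigned int reg)
{
    return fake_hw[(reg / 4) % 64];
}

static void hw_write32(unsigned int reg, uint32_t val)
{
    fake_hw[(reg / 4) % 64] = val;
}

/* Read of a register that has no handler registered. */
static uint32_t unhandled_read32(bool is_hwdom, unsigned int reg)
{
    /* dom0 is forwarded to hardware, a domU sees all ones. */
    return is_hwdom ? hw_read32(reg) : ~(uint32_t)0;
}

/* Write to a register that has no handler registered. */
static void unhandled_write32(bool is_hwdom, unsigned int reg, uint32_t val)
{
    /* dom0 is forwarded to hardware, a domU write is dropped. */
    if ( is_hwdom )
        hw_write32(reg, val);
}
```

With a default like this in place, a register that must stay hidden
from a domU needs no handler at all.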

> >> +
> >>   static int init_bars(struct pci_dev *pdev)
> >>   {
> >>       uint16_t cmd;
> >> @@ -489,6 +542,7 @@ static int init_bars(struct pci_dev *pdev)
> >>       struct vpci_header *header = &pdev->vpci->header;
> >>       struct vpci_bar *bars = header->bars;
> >>       int rc;
> >> +    bool is_hwdom = is_hardware_domain(pdev->domain);
> >>   
> >>       switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
> >>       {
> >> @@ -528,8 +582,10 @@ static int init_bars(struct pci_dev *pdev)
> >>           if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> >>           {
> >>               bars[i].type = VPCI_BAR_MEM64_HI;
> >> -            rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg,
> >> -                                   4, &bars[i]);
> >> +            rc = vpci_add_register(pdev->vpci,
> >> +                                   is_hwdom ? vpci_hw_read32 : guest_bar_read,
> >> +                                   is_hwdom ? bar_write : guest_bar_write,
> >> +                                   reg, 4, &bars[i]);
> >>               if ( rc )
> >>               {
> >>                   pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> >> @@ -569,8 +625,10 @@ static int init_bars(struct pci_dev *pdev)
> >>           bars[i].size = size;
> >>           bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> >>   
> >> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, bar_write, reg, 4,
> >> -                               &bars[i]);
> >> +        rc = vpci_add_register(pdev->vpci,
> >> +                               is_hwdom ? vpci_hw_read32 : guest_bar_read,
> >> +                               is_hwdom ? bar_write : guest_bar_write,
> >> +                               reg, 4, &bars[i]);
> >>           if ( rc )
> >>           {
> >>               pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
> >> @@ -590,8 +648,10 @@ static int init_bars(struct pci_dev *pdev)
> >>           header->rom_enabled = pci_conf_read32(pdev->sbdf, rom_reg) &
> >>                                 PCI_ROM_ADDRESS_ENABLE;
> >>   
> >> -        rc = vpci_add_register(pdev->vpci, vpci_hw_read32, rom_write, rom_reg,
> >> -                               4, rom);
> >> +        rc = vpci_add_register(pdev->vpci,
> >> +                               is_hwdom ? vpci_hw_read32 : guest_rom_read,
> >> +                               is_hwdom ? rom_write : guest_rom_write,
> >> +                               rom_reg, 4, rom);
> > This whole call should be made conditional to is_hwdom, as said above
> > there's no need for the guest_rom handlers.
> Yes, if writes are indeed dropped, please see question above
> >
> > Likewise I assume you expect IO BARs to simply return ~0 and drop
> > writes, as there's no explicit handler added for those?
> Yes, but that was not my intention: I simply didn't handle IO BARs
> and now we do need that handling: either with the default behavior
> for the unhandled read/write (drop writes, read ~0) or by introducing
> the handlers. I hope we can rely on the "unhandled read/write" and
> get what we want

Indeed, the default behavior should be changed for domUs to drop
writes and return ~0 for unhandled reads; then you won't need to add
dummy handlers for the registers you don't want to expose.

> >
> >>           if ( rc )
> >>               rom->type = VPCI_BAR_EMPTY;
> >>       }
> >> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> >> index ed127a08a953..0a73b14a92dc 100644
> >> --- a/xen/include/xen/vpci.h
> >> +++ b/xen/include/xen/vpci.h
> >> @@ -68,7 +68,10 @@ struct vpci {
> >>       struct vpci_header {
> >>           /* Information about the PCI BARs of this device. */
> >>           struct vpci_bar {
> >> +            /* Physical view of the BAR. */
> > No, that's not the physical view, it's the physical (host) address.
> Ok
> >
> >>               uint64_t addr;
> >> +            /* Guest view of the BAR: address and lower bits. */
> >> +            uint64_t guest_reg;
> > I continue to think it would be clearer if you store the guest address
> > here (gaddr, without the low bits) and add those in guest_bar_read
> > based on bar->{type,prefetchable}. Then it would be equivalent to the
> > existing 'addr' field.
> Ok, I'll re-work the code with this approach in mind: s/guest_reg/gaddr +
> required code to handle that
> >
> > I wonder whether we need to protect the added code with
> > CONFIG_HAS_VPCI_GUEST_SUPPORT, this would effectively be dead code
> > otherwise. Long term I don't think we wish to differentiate between
> > dom0 and domU vPCI support at build time, so I'm unsure whether it's
> > helpful to pollute the code with CONFIG_HAS_VPCI_GUEST_SUPPORT when
> > the plan is to remove those long term.
> I would have it without CONFIG_HAS_VPCI_GUEST_SUPPORT if you
> don't mind

Well, I guess if it's not too intrusive it's fine to add the defines;
removing them afterwards should be easy.
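
On the gaddr suggestion quoted above, a sketch of what the read side
could look like, assuming a simplified bar_sketch structure and a
guest_bar_read_sketch function that are not part of the series. The
PCI_BASE_ADDRESS_* values are the standard memory BAR flag bits; passing
the high half as a bool is a simplification of how the high dword of a
64-bit BAR is tracked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of a memory BAR as defined by the PCI specification. */
#define PCI_BASE_ADDRESS_MEM_TYPE_32  0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08

enum bar_kind { BAR_MEM32, BAR_MEM64 };

/* Hypothetical, trimmed-down BAR state: address only, no flag bits. */
struct bar_sketch {
    enum bar_kind kind;
    bool prefetchable;
    uint64_t gaddr; /* guest address of the BAR, low bits clear */
};

/* Compose the register value on read from gaddr plus the BAR properties. */
static uint32_t guest_bar_read_sketch(const struct bar_sketch *bar, bool hi)
{
    uint32_t val;

    /* The high dword of a 64-bit BAR carries address bits only. */
    if ( hi )
        return (uint32_t)(bar->gaddr >> 32);

    val = (uint32_t)bar->gaddr;
    val |= bar->kind == BAR_MEM64 ? PCI_BASE_ADDRESS_MEM_TYPE_64
                                  : PCI_BASE_ADDRESS_MEM_TYPE_32;
    if ( bar->prefetchable )
        val |= PCI_BASE_ADDRESS_MEM_PREFETCH;

    return val;
}
```

Stored this way, gaddr is directly comparable to the existing addr
field, and the flag bits never need to be masked off before use.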

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 10:40       ` Oleksandr Andrushchenko
  2022-01-31 10:54         ` Jan Beulich
@ 2022-01-31 11:10         ` Roger Pau Monné
  2022-01-31 11:23           ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31 11:10 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Mon, Jan 31, 2022 at 10:40:47AM +0000, Oleksandr Andrushchenko wrote:
> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
> > Hi, Roger!
> >
> > On 12.01.22 14:35, Roger Pau Monné wrote:
> >>
> >>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>> +                            uint32_t val, void *data)
> >>> +{
> >>> +}
> >>> +
> >>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> >>> +                               void *data)
> >>> +{
> >>> +    return 0xffffffff;
> >>> +}
> >> There should be no need for those handlers. As said elsewhere: for
> >> guests registers not explicitly handled should return ~0 for reads and
> >> drop writes, which is what you are proposing here.
> > Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
> > handler exists (which is what I do here with guest_rom_read). But I am not that
> > sure about the dropped writes:
> >
> > void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >                   uint32_t data)
> > {
> >       unsigned int data_offset = 0;
> >
> > [snip]
> >
> >       if ( data_offset < size )
> >           /* Tailing gap, write the remaining. */
> >           vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >                         data >> (data_offset * 8));
> >
> > so it looks like for the un-handled writes we still reach the HW register.
> > Could you please tell if the code above needs improvement (like checking
> > if the write was handled) or I still need to provide a write handler, e.g.
> > guest_rom_write here?
> Hm, but the same applies to the reads as well... And this is no surprise,
> as for the guests I can see that it accesses all the configuration space
> registers that I don't handle. Without that I would have guests unable
> to properly setup a PCI device being passed through... And this is why
> I have a big TODO in this series describing unhandled registers.
> So, it seems that I do need to provide those handlers which I need to
> drop writes and return ~0 on reads.

Right (see my previous reply to this comment). I think it would be
easier (and cleaner) if you switched the default behavior regarding
unhandled register access for domUs at the start of the series (drop
writes, reads return ~0), and then you won't need to add all those
dummy handlers to drop writes and return ~0 for reads.

It's going to be more work initially as you would need to support
passthrough of more registers, but it's the right approach that we
need implementation-wise.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:10         ` Roger Pau Monné
@ 2022-01-31 11:23           ` Oleksandr Andrushchenko
  2022-01-31 11:31             ` Roger Pau Monné
  2022-01-31 11:39             ` Jan Beulich
  0 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 11:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko



On 31.01.22 13:10, Roger Pau Monné wrote:
> On Mon, Jan 31, 2022 at 10:40:47AM +0000, Oleksandr Andrushchenko wrote:
>> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
>>> Hi, Roger!
>>>
>>> On 12.01.22 14:35, Roger Pau Monné wrote:
>>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t val, void *data)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                               void *data)
>>>>> +{
>>>>> +    return 0xffffffff;
>>>>> +}
>>>> There should be no need for those handlers. As said elsewhere: for
>>>> guests registers not explicitly handled should return ~0 for reads and
>>>> drop writes, which is what you are proposing here.
>>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
>>> handler exists (which is what I do here with guest_rom_read). But I am not that
>>> sure about the dropped writes:
>>>
>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>                    uint32_t data)
>>> {
>>>        unsigned int data_offset = 0;
>>>
>>> [snip]
>>>
>>>        if ( data_offset < size )
>>>            /* Tailing gap, write the remaining. */
>>>            vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>                          data >> (data_offset * 8));
>>>
>>> so it looks like for the un-handled writes we still reach the HW register.
>>> Could you please tell if the code above needs improvement (like checking
>>> if the write was handled) or I still need to provide a write handler, e.g.
>>> guest_rom_write here?
>> Hm, but the same applies to the reads as well... And this is no surprise,
>> as for the guests I can see that it accesses all the configuration space
>> registers that I don't handle. Without that I would have guests unable
>> to properly setup a PCI device being passed through... And this is why
>> I have a big TODO in this series describing unhandled registers.
>> So, it seems that I do need to provide those handlers which I need to
>> drop writes and return ~0 on reads.
> Right (see my previous reply to this comment). I think it would be
> easier (and cleaner) if you switched the default behavior regarding
> unhandled register access for domUs at the start of the series (drop
> writes, reads return ~0), and then you won't need to add all those
> dummy handlers to drop writes and return ~0 for reads.
>
> It's going to be more work initially as you would need to support
> passthrough of more registers, but it's the right approach that we
> need implementation wise.
While I agree in general, this effectively means that I'll need to provide
handling for all PCIe registers and capabilities from the very start.
Otherwise no guest will be able to properly initialize a PCI device.
Of course, we may want to start with stubs instead of proper emulation,
which would direct accesses to the real HW, and add proper emulation later
on. But, again, this is going to be a rather big piece of code where we
need to explicitly handle every possible capability.

At the moment we are not going to claim that vPCI provides all the means to
pass through a PCI device safely in this respect, and this is why the
feature itself won't even be a tech preview yet. For that reason I think we
can still implement only the crucial set of handlers and allow the rest to
be read/written directly without emulation.

Another question is what needs to be done for vendor-specific capabilities.
How are these going to be emulated?
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:04           ` Oleksandr Andrushchenko
@ 2022-01-31 11:27             ` Roger Pau Monné
  2022-01-31 11:30               ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31 11:27 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh

On Mon, Jan 31, 2022 at 11:04:29AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 31.01.22 12:54, Jan Beulich wrote:
> > On 31.01.2022 11:40, Oleksandr Andrushchenko wrote:
> >> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
> >>> Hi, Roger!
> >>>
> >>> On 12.01.22 14:35, Roger Pau Monné wrote:
> >>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>>>> +                            uint32_t val, void *data)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> >>>>> +                               void *data)
> >>>>> +{
> >>>>> +    return 0xffffffff;
> >>>>> +}
> >>>> There should be no need for those handlers. As said elsewhere: for
> >>>> guests registers not explicitly handled should return ~0 for reads and
> >>>> drop writes, which is what you are proposing here.
> >>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
> >>> handler exists (which is what I do here with guest_rom_read). But I am not that
> >>> sure about the dropped writes:
> >>>
> >>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>                    uint32_t data)
> >>> {
> >>>        unsigned int data_offset = 0;
> >>>
> >>> [snip]
> >>>
> >>>        if ( data_offset < size )
> >>>            /* Tailing gap, write the remaining. */
> >>>            vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >>>                          data >> (data_offset * 8));
> >>>
> >>> so it looks like for the un-handled writes we still reach the HW register.
> >>> Could you please tell if the code above needs improvement (like checking
> >>> if the write was handled) or I still need to provide a write handler, e.g.
> >>> guest_rom_write here?
> >> Hm, but the same applies to the reads as well... And this is no surprise,
> >> as for the guests I can see that it accesses all the configuration space
> >> registers that I don't handle. Without that I would have guests unable
> >> to properly setup a PCI device being passed through... And this is why
> >> I have a big TODO in this series describing unhandled registers.
> >> So, it seems that I do need to provide those handlers which I need to
> >> drop writes and return ~0 on reads.
> Replying to myself: it is still possible to have vpci_ignored_{read|write}
> to handle defaults if, when vpci_add_register is called, the handler
> provided is NULL
> > It feels like we had been there before: For your initial purposes it may
> > be fine to do as you suggest, but any such patches should carry RFC tags
> > or alike to indicate they're not considered ready. Once you're aiming
> > for things to go in, I think there's no good way around white-listing
> > what guests may access. You may know that we've been bitten by starting
> > out with black-listing in the past, first and foremost with x86'es MSRs.
> I already have a big TODO patch describing the issue. Do you want
> it to have a list of handlers that we support as of now? What sort of
> white/black list would you expect?
> I do understand that we do need proper handling for all the PCI registers
> and capabilities long term, but this can't be done at the moment when
> we have nothing working at all. Requesting proper handling now will
> turn this series into a huge amount of code and undefined time frame.

We should at least make sure the code added now doesn't need to be
changed in the future when the default is switched. If you don't
want to switch the default handling for domUs to ignore writes and
return ~0 from reads to unhandled registers right now you should keep
the patches that add the ignore handlers to the end of the series and
mark them as 'HACK' or some such in order to notice they are just
used for testing purposes.

Thanks, Roger.



* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:27             ` Roger Pau Monné
@ 2022-01-31 11:30               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 11:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko



On 31.01.22 13:27, Roger Pau Monné wrote:
> On Mon, Jan 31, 2022 at 11:04:29AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Jan!
>>
>> On 31.01.22 12:54, Jan Beulich wrote:
>>> On 31.01.2022 11:40, Oleksandr Andrushchenko wrote:
>>>> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
>>>>> Hi, Roger!
>>>>>
>>>>> On 12.01.22 14:35, Roger Pau Monné wrote:
>>>>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>> +                            uint32_t val, void *data)
>>>>>>> +{
>>>>>>> +}
>>>>>>> +
>>>>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
>>>>>>> +                               void *data)
>>>>>>> +{
>>>>>>> +    return 0xffffffff;
>>>>>>> +}
>>>>>> There should be no need for those handlers. As said elsewhere: for
>>>>>> guests registers not explicitly handled should return ~0 for reads and
>>>>>> drop writes, which is what you are proposing here.
>>>>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
>>>>> handler exists (which is what I do here with guest_rom_read). But I am not that
>>>>> sure about the dropped writes:
>>>>>
>>>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>>>                     uint32_t data)
>>>>> {
>>>>>         unsigned int data_offset = 0;
>>>>>
>>>>> [snip]
>>>>>
>>>>>         if ( data_offset < size )
>>>>>             /* Tailing gap, write the remaining. */
>>>>>             vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
>>>>>                           data >> (data_offset * 8));
>>>>>
>>>>> so it looks like for the un-handled writes we still reach the HW register.
>>>>> Could you please tell if the code above needs improvement (like checking
>>>>> if the write was handled) or I still need to provide a write handler, e.g.
>>>>> guest_rom_write here?
>>>> Hm, but the same applies to the reads as well... And this is no surprise,
>>>> as for the guests I can see that it accesses all the configuration space
>>>> registers that I don't handle. Without that I would have guests unable
>>>> to properly set up a PCI device being passed through... And this is why
>>>> I have a big TODO in this series describing unhandled registers.
>>>> So, it seems that I do need to provide those handlers which I need to
>>>> drop writes and return ~0 on reads.
>> Replying to myself: it is still possible to have vpci_ignored_{read|write}
>> to handle defaults if, when vpci_add_register is called, the handler
>> provided is NULL
>>> It feels like we had been there before: For your initial purposes it may
>>> be fine to do as you suggest, but any such patches should carry RFC tags
>>> or alike to indicate they're not considered ready. Once you're aiming
>>> for things to go in, I think there's no good way around white-listing
>>> what guests may access. You may know that we've been bitten by starting
>>> out with black-listing in the past, first and foremost with x86'es MSRs.
>> I already have a big TODO patch describing the issue. Do you want
>> it to have a list of handlers that we support as of now? What sort of
>> white/black list would you expect?
>> I do understand that we do need proper handling for all the PCI registers
>> and capabilities long term, but this can't be done at the moment when
>> we have nothing working at all. Requesting proper handling now will
>> turn this series into a huge amount of code and undefined time frame.
> We should at least make sure the code added now doesn't need to be
> changed in the future when the default is switched. If you don't
> want to switch the default handling for domUs to ignore writes and
> return ~0 from reads to unhandled registers right now you should keep
> the patches that add the ignore handlers to the end of the series and
> mark them as 'HACK' or some such in order to notice they are just
> used for testing purposes.
Or, for all the registers where I do want the writes to be rejected and
reads to return ~0, I can pass NULL when calling vpci_add_register,
so that the following works:

int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
                       vpci_write_t *write_handler, unsigned int offset,
                       unsigned int size, void *data)
{
[snip]
     r->read = read_handler ?: vpci_ignored_read;
     r->write = write_handler ?: vpci_ignored_write;
which does what we want.
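
A minimal, self-contained sketch of this NULL-handler fallback. It is not
the actual Xen code: the struct layout, the simplified handler signatures
and the add_register() helper are assumptions made for illustration; only
the vpci_ignored_{read|write} names and the defaulting idea come from the
snippet above. The binary `a ?: b` form is a GNU C extension, so the sketch
spells the conditional out in standard C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified handler types; the real vPCI ones also take a pci_dev. */
typedef uint32_t read_fn(unsigned int reg, void *data);
typedef void write_fn(unsigned int reg, uint32_t val, void *data);

struct reg_handlers {
    read_fn *read;
    write_fn *write;
    void *data;
};

/* Default read: behave as if nothing is there, i.e. return all ones. */
static uint32_t vpci_ignored_read(unsigned int reg, void *data)
{
    (void)reg;
    (void)data;
    return ~(uint32_t)0;
}

/* Default write: silently drop the value. */
static void vpci_ignored_write(unsigned int reg, uint32_t val, void *data)
{
    (void)reg;
    (void)val;
    (void)data;
}

/* Hypothetical stand-in for the handler assignment in vpci_add_register(). */
static void add_register(struct reg_handlers *r, read_fn *read_handler,
                         write_fn *write_handler, void *data)
{
    assert(r != NULL);

    /* A NULL handler selects the "ignore" default for that direction. */
    r->read = read_handler ? read_handler : vpci_ignored_read;
    r->write = write_handler ? write_handler : vpci_ignored_write;
    r->data = data;
}
```

With this, a caller registering a register it does not want to emulate
passes NULL for both handlers and gets "reads return ~0, writes dropped"
without any per-register dummy functions.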
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:23           ` Oleksandr Andrushchenko
@ 2022-01-31 11:31             ` Roger Pau Monné
  2022-01-31 11:39             ` Jan Beulich
  1 sibling, 0 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-01-31 11:31 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Mon, Jan 31, 2022 at 11:23:48AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 31.01.22 13:10, Roger Pau Monné wrote:
> > On Mon, Jan 31, 2022 at 10:40:47AM +0000, Oleksandr Andrushchenko wrote:
> >> On 31.01.22 11:47, Oleksandr Andrushchenko wrote:
> >>> Hi, Roger!
> >>>
> >>> On 12.01.22 14:35, Roger Pau Monné wrote:
> >>>>> +static void guest_rom_write(const struct pci_dev *pdev, unsigned int reg,
> >>>>> +                            uint32_t val, void *data)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +static uint32_t guest_rom_read(const struct pci_dev *pdev, unsigned int reg,
> >>>>> +                               void *data)
> >>>>> +{
> >>>>> +    return 0xffffffff;
> >>>>> +}
> >>>> There should be no need for those handlers. As said elsewhere: for
> >>>> guests registers not explicitly handled should return ~0 for reads and
> >>>> drop writes, which is what you are proposing here.
> >>> Yes, you are right: I can see in vpci_read that we end up reading ~0 if no
> >>> handler exists (which is what I do here with guest_rom_read). But I am not that
> >>> sure about the dropped writes:
> >>>
> >>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>                    uint32_t data)
> >>> {
> >>>        unsigned int data_offset = 0;
> >>>
> >>> [snip]
> >>>
> >>>        if ( data_offset < size )
> >>>            /* Tailing gap, write the remaining. */
> >>>            vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
> >>>                          data >> (data_offset * 8));
> >>>
> >>> so it looks like for the un-handled writes we still reach the HW register.
> >>> Could you please tell if the code above needs improvement (like checking
> >>> if the write was handled) or I still need to provide a write handler, e.g.
> >>> guest_rom_write here?
> >> Hm, but the same applies to the reads as well... And this is no surprise,
> >> as for the guests I can see that it accesses all the configuration space
> >> registers that I don't handle. Without that I would have guests unable
> >> to properly set up a PCI device being passed through... And this is why
> >> I have a big TODO in this series describing unhandled registers.
> >> So, it seems that I do need to provide those handlers which I need to
> >> drop writes and return ~0 on reads.
> > Right (see my previous reply to this comment). I think it would be
> > easier (and cleaner) if you switched the default behavior regarding
> > unhandled register access for domUs at the start of the series (drop
> > writes, reads return ~0), and then you won't need to add all those
> > dummy handlers to drop writes and return ~0 for reads.
> >
> > It's going to be more work initially as you would need to support
> > passthrough of more registers, but it's the right approach that we
> > need implementation wise.
> While I agree in general, this effectively means that I'll need to provide
> handling for all PCIe registers and capabilities from the very start.

Well, we can only offer handling of the header and the MSI and MSI-X
capabilities right now, because that's all vPCI currently knows about.

> Otherwise no guest will be able to properly initialize a PCI device.
> Of course, we may want to start from stubs instead of proper emulation,
> which will direct the access to real HW, and add proper emulation later on.
> But, again, this is going to be a rather big piece of code where we need
> to explicitly handle every possible capability.
> 
> At the moment we are not going to claim that vPCI provides all means to
> pass through a PCI device safely with this respect and this is why the feature
> itself won't even be a tech preview yet. For that reason I think we can still
> implement only a crucial set of handlers and still allow the rest to
> be read/written directly without emulation.

See my other reply, you can probably move the special handlers into a
separate patch at the end of the series in order to test the
functionality without adding code that will need to be removed when
the defaults for domUs are changed.

> Another question is what needs to be done for vendor specific capabilities?
> How these are going to be emulated?

I think you will need some kind of permissive mode in order to allow a
guest to access those, as they shouldn't be exposed by default.

Thanks, Roger.



* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:23           ` Oleksandr Andrushchenko
  2022-01-31 11:31             ` Roger Pau Monné
@ 2022-01-31 11:39             ` Jan Beulich
  2022-01-31 13:30               ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-31 11:39 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
> On 31.01.22 13:10, Roger Pau Monné wrote:
>> Right (see my previous reply to this comment). I think it would be
>> easier (and cleaner) if you switched the default behavior regarding
>> unhandled register access for domUs at the start of the series (drop
>> writes, reads return ~0), and then you won't need to add all those
>> dummy handlers to drop writes and return ~0 for reads.
>>
>> It's going to be more work initially as you would need to support
>> passthrough of more registers, but it's the right approach that we
>> need implementation wise.
> While I agree in general, this effectively means that I'll need to provide
> handling for all PCIe registers and capabilities from the very start.
> Otherwise no guest will be able to properly initialize a PCI device.
> Of course, we may want to start from stubs instead of proper emulation,
> which will direct the access to real HW, and add proper emulation later on.
> But, again, this is going to be a rather big piece of code where we need
> to explicitly handle every possible capability.

Since the two sub-threads are now about exactly the same topic, I'm
answering here instead of there.

No, you are not going to need to emulate all possible capabilities.
We (or really qemu) don't do this on x86 either. Certain capabilities
may be a must, but not everything. There are also device specific
registers not covered by any capability structures - what to do with
those is even more of a question.

Furthermore for some of the fields justification why access to the
raw hardware value is fine is going to be easy: r/o fields like
vendor and device ID, for example. But every bit you allow direct
access to needs to come with justification.

> At the moment we are not going to claim that vPCI provides all means to
> pass through a PCI device safely with this respect and this is why the feature
> itself won't even be a tech preview yet. For that reason I think we can still
> implement only a crucial set of handlers and still allow the rest to
> be read/written directly without emulation.

I think you need to separate what you need for development from what
goes upstream: For dev purposes you can very well invert the policy
from white- to black-listing. But if we accepted the latter into the
main tree, the risk would be there that something gets missed at the
time where the permission model gets changed around.

You could even have a non-default mode operating the way you want it
(along the lines of pciback's permissive mode), allowing you to get
away without needing to carry private patches. Things may also
initially only work in that mode. But the default should be a mode
which is secure (and which perhaps initially offers only very limited
functionality).
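
As a rough illustration of this policy (a sketch only: the enum, the
`permissive` flag, the hw_read() stub and the function names are
hypothetical and are not taken from Xen's vPCI code), the decision for a
read that no handler claims could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Who is issuing the config space access. */
enum accessor { ACCESSOR_HWDOM, ACCESSOR_DOMU };

/* Hypothetical per-device policy knob, off by default. */
struct dev_policy {
    bool permissive;   /* opt-in: let unhandled accesses reach hardware */
};

/* Stub standing in for a real config space read from the device. */
static uint32_t hw_read(unsigned int reg, unsigned int size)
{
    (void)reg;
    return 0x12345678u & (0xffffffffu >> (8 * (4 - size)));
}

/*
 * Read of a register that no handler claims.
 * Default (white-list) behaviour for a DomU: all ones, hardware untouched.
 * The hardware domain, and a DomU in the opt-in permissive mode, fall
 * through to the device.
 */
static uint32_t unhandled_read(enum accessor who, const struct dev_policy *p,
                               unsigned int reg, unsigned int size)
{
    assert(size >= 1 && size <= 4);

    if ( who == ACCESSOR_DOMU && !p->permissive )
        return 0xffffffffu >> (8 * (4 - size));

    return hw_read(reg, size);
}
```

The point of the shape is that the secure behaviour needs no configuration
at all, while anything that lets a guest reach real registers has to be
switched on explicitly.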

> Another question is what needs to be done for vendor specific capabilities?
> How these are going to be emulated?

By vendor specific code, I'm afraid. Assuming these capabilities
really need exposing in the first place.

Jan




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:39             ` Jan Beulich
@ 2022-01-31 13:30               ` Oleksandr Andrushchenko
  2022-01-31 13:36                 ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 13:30 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko



On 31.01.22 13:39, Jan Beulich wrote:
> On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
>> On 31.01.22 13:10, Roger Pau Monné wrote:
>>> Right (see my previous reply to this comment). I think it would be
>>> easier (and cleaner) if you switched the default behavior regarding
>>> unhandled register access for domUs at the start of the series (drop
>>> writes, reads return ~0), and then you won't need to add all those
>>> dummy handlers to drop writes and return ~0 for reads.
>>>
>>> It's going to be more work initially as you would need to support
>>> passthrough of more registers, but it's the right approach that we
>>> need implementation wise.
>> While I agree in general, this effectively means that I'll need to provide
>> handling for all PCIe registers and capabilities from the very start.
>> Otherwise no guest will be able to properly initialize a PCI device.
>> Of course, we may want to start from stubs instead of proper emulation,
>> which will direct the access to real HW, and add proper emulation later on.
>> But, again, this is going to be a rather big piece of code where we need
>> to explicitly handle every possible capability.
> Since the two sub-threads are now about exactly the same topic, I'm
> answering here instead of there.
>
> No, you are not going to need to emulate all possible capabilities.
> We (or really qemu) don't do this on x86 either. Certain capabilities
> may be a must, but not everything. There are also device specific
> registers not covered by any capability structures - what to do with
> those is even more of a question.
>
> Furthermore for some of the fields justification why access to the
> raw hardware value is fine is going to be easy: r/o fields like
> vendor and device ID, for example. But every bit you allow direct
> access to needs to come with justification.
>
>> At the moment we are not going to claim that vPCI provides all means to
>> pass through a PCI device safely with this respect and this is why the feature
>> itself won't even be a tech preview yet. For that reason I think we can still
>> implement only a crucial set of handlers and still allow the rest to
>> be read/written directly without emulation.
> I think you need to separate what you need for development from what
> goes upstream: For dev purposes you can very well invert the policy
> from white- to black-listing. But if we accepted the latter into the
> main tree, the risk would be there that something gets missed at the
> time where the permission model gets changed around.
>
> You could even have a non-default mode operating the way you want it
> (along the lines of pciback's permissive mode), allowing you to get
> away without needing to carry private patches. Things may also
> initially only work in that mode. But the default should be a mode
> which is secure (and which perhaps initially offers only very limited
> functionality).
Ok, so to make it clear:
1. We do not allow unhandled access for guests: for that I will create a
dedicated patch which will implement such restrictions. Something like
the below (for both vPCI read and write):

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index c5e67491c24f..9ef2a1b5af58 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -347,6 +347,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
      const struct vpci_register *r;
      unsigned int data_offset = 0;
      uint32_t data = ~(uint32_t)0;
+    bool handled = false;

      if ( !size )
      {
@@ -405,6 +406,8 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
          if ( cmp > 0 )
              continue;

+        handled = true; /* Found the handler for this access. */
+
          if ( emu.offset < r->offset )
          {
              /* Heading gap, read partial content from hardware. */
@@ -432,6 +435,10 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
      }
      spin_unlock(&pdev->vpci_lock);

+    /* All unhandled guest requests return all 1's. */
+    if ( !is_hardware_domain(d) && !handled )
+        return ~(uint32_t)0;
+
      if ( data_offset < size )
      {
          /* Tailing gap, read the remaining. */

@Roger: does the above work for you?

2. For the time being, while only a reduced set of registers is emulated,
anyone who wants to test vPCI for guests should either carry their own
patch implementing the missing handlers or work around the restriction
some other way, e.g. with a hack that removes the HW register access
protection.

>
>> Another question is what needs to be done for vendor specific capabilities?
>> How these are going to be emulated?
> By vendor specific code, I'm afraid. Assuming these capabilities
> really need exposing in the first place.
I have a feeling this is not going to happen...
>
> Jan
>

Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 13:30               ` Oleksandr Andrushchenko
@ 2022-01-31 13:36                 ` Jan Beulich
  2022-01-31 13:41                   ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-31 13:36 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 31.01.2022 14:30, Oleksandr Andrushchenko wrote:
> 
> 
> On 31.01.22 13:39, Jan Beulich wrote:
>> On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
>>> On 31.01.22 13:10, Roger Pau Monné wrote:
>>>> Right (see my previous reply to this comment). I think it would be
>>>> easier (and cleaner) if you switched the default behavior regarding
>>>> unhandled register access for domUs at the start of the series (drop
>>>> writes, reads return ~0), and then you won't need to add all those
>>>> dummy handlers to drop writes and return ~0 for reads.
>>>>
>>>> It's going to be more work initially as you would need to support
>>>> passthrough of more registers, but it's the right approach that we
>>>> need implementation wise.
>>> While I agree in general, this effectively means that I'll need to provide
>>> handling for all PCIe registers and capabilities from the very start.
>>> Otherwise no guest will be able to properly initialize a PCI device.
>>> Of course, we may want to start from stubs instead of proper emulation,
>>> which will direct the access to real HW, and add proper emulation later on.
>>> But, again, this is going to be a rather big piece of code where we need
>>> to explicitly handle every possible capability.
>> Since the two sub-threads are now about exactly the same topic, I'm
>> answering here instead of there.
>>
>> No, you are not going to need to emulate all possible capabilities.
>> We (or really qemu) don't do this on x86 either. Certain capabilities
>> may be a must, but not everything. There are also device specific
>> registers not covered by any capability structures - what to do with
>> those is even more of a question.
>>
>> Furthermore for some of the fields justification why access to the
>> raw hardware value is fine is going to be easy: r/o fields like
>> vendor and device ID, for example. But every bit you allow direct
>> access to needs to come with justification.
>>
>>> At the moment we are not going to claim that vPCI provides all means to
>>> pass through a PCI device safely with this respect and this is why the feature
>>> itself won't even be a tech preview yet. For that reason I think we can still
>>> implement only a crucial set of handlers and still allow the rest to
>>> be read/written directly without emulation.
>> I think you need to separate what you need for development from what
>> goes upstream: For dev purposes you can very well invert the policy
>> from white- to black-listing. But if we accepted the latter into the
>> main tree, the risk would be there that something gets missed at the
>> time where the permission model gets changed around.
>>
>> You could even have a non-default mode operating the way you want it
>> (along the lines of pciback's permissive mode), allowing you to get
>> away without needing to carry private patches. Things may also
>> initially only work in that mode. But the default should be a mode
>> which is secure (and which perhaps initially offers only very limited
>> functionality).
> Ok, so to make it clear:
> 1. We do not allow unhandled access for guests: for that I will create a
> dedicated patch which will implement such restrictions. Something like
> the below (for both vPCI read and write):
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index c5e67491c24f..9ef2a1b5af58 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -347,6 +347,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>       const struct vpci_register *r;
>       unsigned int data_offset = 0;
>       uint32_t data = ~(uint32_t)0;
> +    bool handled = false;
> 
>       if ( !size )
>       {
> @@ -405,6 +406,8 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>           if ( cmp > 0 )
>               continue;
> 
> +        handled = true; /* Found the handler for this access. */
> +
>           if ( emu.offset < r->offset )
>           {
>               /* Heading gap, read partial content from hardware. */
> @@ -432,6 +435,10 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>       }
>       spin_unlock(&pdev->vpci_lock);
> 
> +    /* All unhandled guest requests return all 1's. */
> +    if ( !is_hardware_domain(d) && !handled )
> +        return ~(uint32_t)0;
> +
>       if ( data_offset < size )
>       {
>           /* Tailing gap, read the remaining. */

Except that like for the "tailing gap" you also need to avoid the
"heading gap" ending up in a read of the underlying hardware
register. Effectively you want to deal properly with all
vpci_read_hw() invocations (including the one when no pdev was
found, which for a DomU may simply mean domain_crash()).
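
To make the gap handling concrete, here is a byte-wise model of a config
space read in which neither the heading nor the tailing gap ever reaches
hardware for a guest. It is a sketch under stated assumptions, not the
merge logic vpci_read() actually uses: hw_cfg[], struct emu_reg and
cfg_read() are hypothetical, only a single emulated register is modelled,
and the "no pdev found" case is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stub for the device's real config space (made-up contents). */
static const uint8_t hw_cfg[8] = { 0x11, 0x22, 0x33, 0x44,
                                   0x55, 0x66, 0x77, 0x88 };

/* One emulated register: `size` bytes at `offset`, little-endian `val`. */
struct emu_reg {
    unsigned int offset, size;
    uint32_t val;
};

/*
 * Read `size` bytes at `reg`. Bytes covered by the emulated register come
 * from it; bytes in the heading or tailing gap come from hardware for the
 * hardware domain only, and read as 0xff for any other domain.
 */
static uint32_t cfg_read(bool is_hwdom, const struct emu_reg *r,
                         unsigned int reg, unsigned int size)
{
    uint32_t data = 0;
    unsigned int i;

    assert(size >= 1 && size <= 4 && reg + size <= sizeof(hw_cfg));

    for ( i = 0; i < size; i++ )
    {
        unsigned int off = reg + i;
        uint8_t byte;

        if ( off >= r->offset && off < r->offset + r->size )
            byte = (uint8_t)(r->val >> (8 * (off - r->offset)));
        else if ( is_hwdom )
            byte = hw_cfg[off];   /* gap: only hwdom sees the device */
        else
            byte = 0xff;          /* gap: guests get all ones */

        data |= (uint32_t)byte << (8 * i);
    }

    return data;
}
```

For example, with a 2-byte emulated register at offset 2, a 4-byte guest
read at offset 0 has a heading gap and a 4-byte read at offset 2 has a
tailing gap; in both cases the gap bytes come back as 0xff instead of the
device's values.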

Jan




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 13:36                 ` Jan Beulich
@ 2022-01-31 13:41                   ` Oleksandr Andrushchenko
  2022-01-31 13:51                     ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 13:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné



On 31.01.22 15:36, Jan Beulich wrote:
> On 31.01.2022 14:30, Oleksandr Andrushchenko wrote:
>>
>> On 31.01.22 13:39, Jan Beulich wrote:
>>> On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
>>>> On 31.01.22 13:10, Roger Pau Monné wrote:
>>>>> Right (see my previous reply to this comment). I think it would be
>>>>> easier (and cleaner) if you switched the default behavior regarding
>>>>> unhandled register access for domUs at the start of the series (drop
>>>>> writes, reads return ~0), and then you won't need to add all those
>>>>> dummy handlers to drop writes and return ~0 for reads.
>>>>>
>>>>> It's going to be more work initially as you would need to support
>>>>> passthrough of more registers, but it's the right approach that we
>>>>> need implementation wise.
>>>> While I agree in general, this effectively means that I'll need to provide
>>>> handling for all PCIe registers and capabilities from the very start.
>>>> Otherwise no guest will be able to properly initialize a PCI device.
>>>> Of course, we may want to start from stubs instead of proper emulation,
>>>> which will direct the access to real HW, and add proper emulation later on.
>>>> But, again, this is going to be a rather big piece of code where we need
>>>> to explicitly handle every possible capability.
>>> Since the two sub-threads are now about exactly the same topic, I'm
>>> answering here instead of there.
>>>
>>> No, you are not going to need to emulate all possible capabilities.
>>> We (or really qemu) don't do this on x86 either. Certain capabilities
>>> may be a must, but not everything. There are also device specific
>>> registers not covered by any capability structures - what to do with
>>> those is even more of a question.
>>>
>>> Furthermore for some of the fields justification why access to the
>>> raw hardware value is fine is going to be easy: r/o fields like
>>> vendor and device ID, for example. But every bit you allow direct
>>> access to needs to come with justification.
>>>
>>>> At the moment we are not going to claim that vPCI provides all means to
>>>> pass through a PCI device safely with this respect and this is why the feature
>>>> itself won't even be a tech preview yet. For that reason I think we can still
>>>> implement only a crucial set of handlers and still allow the rest to
>>>> be read/written directly without emulation.
>>> I think you need to separate what you need for development from what
>>> goes upstream: For dev purposes you can very well invert the policy
>>> from white- to black-listing. But if we accepted the latter into the
>>> main tree, the risk would be there that something gets missed at the
>>> time where the permission model gets changed around.
>>>
>>> You could even have a non-default mode operating the way you want it
>>> (along the lines of pciback's permissive mode), allowing you to get
>>> away without needing to carry private patches. Things may also
>>> initially only work in that mode. But the default should be a mode
>>> which is secure (and which perhaps initially offers only very limited
>>> functionality).
>> Ok, so to make it clear:
>> 1. We do not allow unhandled access for guests: for that I will create a
>> dedicated patch which will implement such restrictions. Something like
>> the below (for both vPCI read and write):
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index c5e67491c24f..9ef2a1b5af58 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -347,6 +347,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>        const struct vpci_register *r;
>>        unsigned int data_offset = 0;
>>        uint32_t data = ~(uint32_t)0;
>> +    bool handled = false;
>>
>>        if ( !size )
>>        {
>> @@ -405,6 +406,8 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>            if ( cmp > 0 )
>>                continue;
>>
>> +        handled = true; /* Found the handler for this access. */
>> +
>>            if ( emu.offset < r->offset )
>>            {
>>                /* Heading gap, read partial content from hardware. */
>> @@ -432,6 +435,10 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>        }
>>        spin_unlock(&pdev->vpci_lock);
>>
>> +    /* All unhandled guest requests return all 1's. */
>> +    if ( !is_hardware_domain(d) && !handled )
>> +        return ~(uint32_t)0;
>> +
>>        if ( data_offset < size )
>>        {
>>            /* Tailing gap, read the remaining. */
> Except that like for the "tailing gap" you also need to avoid the
> "heading gap" ending up in a read of the underlying hardware
> register. Effectively you want to deal properly with all
> vpci_read_hw() invocations (including the one when no pdev was
> found, which for a DomU may simply mean domain_crash()).
Yes. So with the above patch I can now drop the "TODO patch"?
It says that we allow access to the registers, but that this is not safe.
If we disable that access, the TODO would only be about the need to
implement emulation for all the registers which are not yet handled,
which is obvious.
>
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 13:41                   ` Oleksandr Andrushchenko
@ 2022-01-31 13:51                     ` Jan Beulich
  2022-01-31 13:58                       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-31 13:51 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 31.01.2022 14:41, Oleksandr Andrushchenko wrote:
> On 31.01.22 15:36, Jan Beulich wrote:
>> On 31.01.2022 14:30, Oleksandr Andrushchenko wrote:
>>> On 31.01.22 13:39, Jan Beulich wrote:
>>>> On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
>>>>> On 31.01.22 13:10, Roger Pau Monné wrote:
>>>>>> Right (see my previous reply to this comment). I think it would be
>>>>>> easier (and cleaner) if you switched the default behavior regarding
>>>>>> unhandled register access for domUs at the start of the series (drop
>>>>>> writes, reads returns ~0), and then you won't need to add all those
>>>>>> dummy handler to drop writes and return ~0 for reads.
>>>>>>
>>>>>> It's going to be more work initially as you would need to support
>>>>>> passthrough of more registers, but it's the right approach that we
>>>>>> need implementation wise.
>>>>> While I agree in general, this effectively means that I'll need to provide
>>>>> handling for all PCIe registers and capabilities from the very start.
>>>>> Otherwise no guest be able to properly initialize a PCI device without that.
>>>>> Of course, we may want starting from stubs instead of proper emulation,
>>>>> which will direct the access to real HW and later on we add proper emulation.
>>>>> But, again, this is going to be a rather big piece of code where we need
>>>>> to explicitly handle every possible capability.
>>>> Since the two sub-threads are now about exactly the same topic, I'm
>>>> answering here instead of there.
>>>>
>>>> No, you are not going to need to emulate all possible capabilities.
>>>> We (or really qemu) don't do this on x86 either. Certain capabilities
>>>> may be a must, but not everything. There are also device specific
>>>> registers not covered by any capability structures - what to do with
>>>> those is even more of a question.
>>>>
>>>> Furthermore for some of the fields justification why access to the
>>>> raw hardware value is fine is going to be easy: r/o fields like
>>>> vendor and device ID, for example. But every bit you allow direct
>>>> access to needs to come with justification.
>>>>
>>>>> At the moment we are not going to claim that vPCI provides all means to
>>>>> pass through a PCI device safely with this respect and this is why the feature
>>>>> itself won't even be a tech preview yet. For that reason I think we can still
>>>>> have implemented only crucial set of handlers and still allow the rest to
>>>>> be read/write directly without emulation.
>>>> I think you need to separate what you need for development from what
>>>> goes upstream: For dev purposes you can very well invert the policy
>>>> from white- to black-listing. But if we accepted the latter into the
>>>> main tree, the risk would be there that something gets missed at the
>>>> time where the permission model gets changed around.
>>>>
>>>> You could even have a non-default mode operating the way you want it
>>>> (along the lines of pciback's permissive mode), allowing you to get
>>>> away without needing to carry private patches. Things may also
>>>> initially only work in that mode. But the default should be a mode
>>>> which is secure (and which perhaps initially offers only very limited
>>>> functionality).
>>> Ok, so to make it clear:
>>> 1. We do not allow unhandled access for guests: for that I will create a
>>> dedicated patch which will implement such restrictions. Something like
>>> the below (for both vPCI read and write):
>>>
>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>> index c5e67491c24f..9ef2a1b5af58 100644
>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -347,6 +347,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>        const struct vpci_register *r;
>>>        unsigned int data_offset = 0;
>>>        uint32_t data = ~(uint32_t)0;
>>> +    bool handled = false;
>>>
>>>        if ( !size )
>>>        {
>>> @@ -405,6 +406,8 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>            if ( cmp > 0 )
>>>                continue;
>>>
>>> +        handled = true; /* Found the handler for this access. */
>>> +
>>>            if ( emu.offset < r->offset )
>>>            {
>>>                /* Heading gap, read partial content from hardware. */
>>> @@ -432,6 +435,10 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>        }
>>>        spin_unlock(&pdev->vpci_lock);
>>>
>>> +    /* All unhandled guest requests return all 1's. */
>>> +    if ( !is_hardware_domain(d) && !handled )
>>> +        return ~(uint32_t)0;
>>> +
>>>        if ( data_offset < size )
>>>        {
>>>            /* Tailing gap, read the remaining. */
>> Except that like for the "tailing gap" you also need to avoid the
>> "heading gap" ending up in a read of the underlying hardware
>> register. Effectively you want to deal properly with all
>> vpci_read_hw() invocations (including the one when no pdev was
>> found, which for a DomU may simply mean domain_crash()).
> Yes. And with the above patch I can now remove the "TODO patch" then?
> Because it is saying that we allow access to the registers, but it is not safe.
> And now, if we disable that access, then TODO should be about the need to
> implement emulation for all the registers which are not yet handled which is
> obvious.

Yes, I think that other patch then should have no use anymore. (To be
honest I don't recall such a patch anyway.)

Jan
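
For reference, the default policy agreed on in this sub-thread (unhandled
guest reads return all 1's, unhandled guest writes are dropped, while the
hardware domain still falls through to the hardware) can be sketched in
isolation as below. This is a minimal standalone sketch and not the vpci.c
code: mock_cfg, hw_read(), hw_write(), read_unhandled() and
write_unhandled() are hypothetical names, the device is mocked with an
array, and masking the all 1's value to the access size is an assumption
of the sketch. Heading/tailing gaps of partially handled accesses are not
covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock config space standing in for the real device (hypothetical). */
static uint8_t mock_cfg[256];

/* Hypothetical stand-in for a raw hardware read of 1, 2 or 4 bytes. */
static uint32_t hw_read(unsigned int reg, unsigned int size)
{
    uint32_t val = 0;
    unsigned int i;

    assert(size == 1 || size == 2 || size == 4);
    assert(reg + size <= sizeof(mock_cfg));
    for ( i = 0; i < size; i++ )
        val |= (uint32_t)mock_cfg[reg + i] << (8 * i);

    return val;
}

/* Hypothetical stand-in for a raw hardware write of 1, 2 or 4 bytes. */
static void hw_write(unsigned int reg, unsigned int size, uint32_t val)
{
    unsigned int i;

    assert(size == 1 || size == 2 || size == 4);
    assert(reg + size <= sizeof(mock_cfg));
    for ( i = 0; i < size; i++ )
        mock_cfg[reg + i] = (uint8_t)(val >> (8 * i));
}

/* All 1's, limited to the width of the access. */
static uint32_t all_ones(unsigned int size)
{
    return size == 4 ? ~(uint32_t)0 : (((uint32_t)1 << (size * 8)) - 1);
}

/* Read of a register that has no handler installed. */
uint32_t read_unhandled(bool is_hwdom, unsigned int reg, unsigned int size)
{
    if ( !is_hwdom )
        return all_ones(size); /* Guests never reach the hardware. */

    return hw_read(reg, size);
}

/* Write to a register that has no handler installed. */
void write_unhandled(bool is_hwdom, unsigned int reg, unsigned int size,
                     uint32_t val)
{
    if ( !is_hwdom )
        return; /* Guest writes are silently dropped. */

    hw_write(reg, size, val);
}
```

With this shape the decision is taken in one place per direction, which
is what avoids adding per-register dummy handlers for guests.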




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 13:51                     ` Jan Beulich
@ 2022-01-31 13:58                       ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 13:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné



On 31.01.22 15:51, Jan Beulich wrote:
> On 31.01.2022 14:41, Oleksandr Andrushchenko wrote:
>> On 31.01.22 15:36, Jan Beulich wrote:
>>> On 31.01.2022 14:30, Oleksandr Andrushchenko wrote:
>>>> On 31.01.22 13:39, Jan Beulich wrote:
>>>>> On 31.01.2022 12:23, Oleksandr Andrushchenko wrote:
>>>>>> On 31.01.22 13:10, Roger Pau Monné wrote:
>>>>>>> Right (see my previous reply to this comment). I think it would be
>>>>>>> easier (and cleaner) if you switched the default behavior regarding
>>>>>>> unhandled register access for domUs at the start of the series (drop
>>>>>>> writes, reads returns ~0), and then you won't need to add all those
>>>>>>> dummy handler to drop writes and return ~0 for reads.
>>>>>>>
>>>>>>> It's going to be more work initially as you would need to support
>>>>>>> passthrough of more registers, but it's the right approach that we
>>>>>>> need implementation wise.
>>>>>> While I agree in general, this effectively means that I'll need to provide
>>>>>> handling for all PCIe registers and capabilities from the very start.
>>>>>> Otherwise no guest be able to properly initialize a PCI device without that.
>>>>>> Of course, we may want starting from stubs instead of proper emulation,
>>>>>> which will direct the access to real HW and later on we add proper emulation.
>>>>>> But, again, this is going to be a rather big piece of code where we need
>>>>>> to explicitly handle every possible capability.
>>>>> Since the two sub-threads are now about exactly the same topic, I'm
>>>>> answering here instead of there.
>>>>>
>>>>> No, you are not going to need to emulate all possible capabilities.
>>>>> We (or really qemu) don't do this on x86 either. Certain capabilities
>>>>> may be a must, but not everything. There are also device specific
>>>>> registers not covered by any capability structures - what to do with
>>>>> those is even more of a question.
>>>>>
>>>>> Furthermore for some of the fields justification why access to the
>>>>> raw hardware value is fine is going to be easy: r/o fields like
>>>>> vendor and device ID, for example. But every bit you allow direct
>>>>> access to needs to come with justification.
>>>>>
>>>>>> At the moment we are not going to claim that vPCI provides all means to
>>>>>> pass through a PCI device safely with this respect and this is why the feature
>>>>>> itself won't even be a tech preview yet. For that reason I think we can still
>>>>>> have implemented only crucial set of handlers and still allow the rest to
>>>>>> be read/write directly without emulation.
>>>>> I think you need to separate what you need for development from what
>>>>> goes upstream: For dev purposes you can very well invert the policy
>>>>> from white- to black-listing. But if we accepted the latter into the
>>>>> main tree, the risk would be there that something gets missed at the
>>>>> time where the permission model gets changed around.
>>>>>
>>>>> You could even have a non-default mode operating the way you want it
>>>>> (along the lines of pciback's permissive mode), allowing you to get
>>>>> away without needing to carry private patches. Things may also
>>>>> initially only work in that mode. But the default should be a mode
>>>>> which is secure (and which perhaps initially offers only very limited
>>>>> functionality).
>>>> Ok, so to make it clear:
>>>> 1. We do not allow unhandled access for guests: for that I will create a
>>>> dedicated patch which will implement such restrictions. Something like
>>>> the below (for both vPCI read and write):
>>>>
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index c5e67491c24f..9ef2a1b5af58 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -347,6 +347,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>         const struct vpci_register *r;
>>>>         unsigned int data_offset = 0;
>>>>         uint32_t data = ~(uint32_t)0;
>>>> +    bool handled = false;
>>>>
>>>>         if ( !size )
>>>>         {
>>>> @@ -405,6 +406,8 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>             if ( cmp > 0 )
>>>>                 continue;
>>>>
>>>> +        handled = true; /* Found the handler for this access. */
>>>> +
>>>>             if ( emu.offset < r->offset )
>>>>             {
>>>>                 /* Heading gap, read partial content from hardware. */
>>>> @@ -432,6 +435,10 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>>>         }
>>>>         spin_unlock(&pdev->vpci_lock);
>>>>
>>>> +    /* All unhandled guest requests return all 1's. */
>>>> +    if ( !is_hardware_domain(d) && !handled )
>>>> +        return ~(uint32_t)0;
>>>> +
>>>>         if ( data_offset < size )
>>>>         {
>>>>             /* Tailing gap, read the remaining. */
>>> Except that like for the "tailing gap" you also need to avoid the
>>> "heading gap" ending up in a read of the underlying hardware
>>> register. Effectively you want to deal properly with all
>>> vpci_read_hw() invocations (including the one when no pdev was
>>> found, which for a DomU may simply mean domain_crash()).
>> Yes. And with the above patch I can now remove the "TODO patch" then?
>> Because it is saying that we allow access to the registers, but it is not safe.
>> And now, if we disable that access, then TODO should be about the need to
>> implement emulation for all the registers which are not yet handled which is
>> obvious.
> Yes, I think that other patch then should have no use anymore. (To be
> honest I don't recall such a patch anyway.)
This is "[PATCH v5 14/14] vpci: add TODO for the registers not explicitly handled"
in this series
>
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 11:04       ` Roger Pau Monné
@ 2022-01-31 14:51         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 14:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko


>>> I wonder whether we need to protect the added code with
>>> CONFIG_HAS_VPCI_GUEST_SUPPORT, this would effectively be dead code
>>> otherwise. Long term I don't think we wish to differentiate between
>>> dom0 and domU vPCI support at build time, so I'm unsure whether it's
>>> helpful to pollute the code with CONFIG_HAS_VPCI_GUEST_SUPPORT when
>>> the plan is to remove those long term.
>> I would have it without CONFIG_HAS_VPCI_GUEST_SUPPORT if you
>> don't mind
> Well, I guess if it's not too intrusive it's fine to add the defines,
> removing them afterwards should be easy.
It is intrusive: adding such a define to struct vpci is easy, but then you need
ifdefery in xen/drivers/vpci/header.c to handle both the defined and the
undefined case. I can still do that if you insist.
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-12 12:35   ` Roger Pau Monné
  2022-01-31  9:47     ` Oleksandr Andrushchenko
@ 2022-01-31 15:06     ` Oleksandr Andrushchenko
  2022-01-31 15:50       ` Jan Beulich
  1 sibling, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-01-31 15:06 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!
>>               rom->type = VPCI_BAR_EMPTY;
>>       }
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index ed127a08a953..0a73b14a92dc 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -68,7 +68,10 @@ struct vpci {
>>       struct vpci_header {
>>           /* Information about the PCI BARs of this device. */
>>           struct vpci_bar {
>> +            /* Physical view of the BAR. */
> No, that's not the physical view, it's the physical (host) address.
>
>>               uint64_t addr;
>> +            /* Guest view of the BAR: address and lower bits. */
>> +            uint64_t guest_reg;
> I continue to think it would be clearer if you store the guest address
> here (gaddr, without the low bits) and add those in guest_bar_read
> based on bar->{type,prefetchable}. Then it would be equivalent to the
> existing 'addr' field.
>
I agreed first to do such a change, but then recalled our discussion with Jan [1].
And then we decided that in order for it to be efficient it is better if we setup all the
things during the write phase (rare), rather then during the write phase (more often).
If you still see it clearer I can re-work the code

Thank you,
Oleksandr

[1] https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg103431.html


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 15:06     ` Oleksandr Andrushchenko
@ 2022-01-31 15:50       ` Jan Beulich
  2022-02-01  7:31         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-01-31 15:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh

On 31.01.2022 16:06, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>>>               rom->type = VPCI_BAR_EMPTY;
>>>       }
>>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>>> index ed127a08a953..0a73b14a92dc 100644
>>> --- a/xen/include/xen/vpci.h
>>> +++ b/xen/include/xen/vpci.h
>>> @@ -68,7 +68,10 @@ struct vpci {
>>>       struct vpci_header {
>>>           /* Information about the PCI BARs of this device. */
>>>           struct vpci_bar {
>>> +            /* Physical view of the BAR. */
>> No, that's not the physical view, it's the physical (host) address.
>>
>>>               uint64_t addr;
>>> +            /* Guest view of the BAR: address and lower bits. */
>>> +            uint64_t guest_reg;
>> I continue to think it would be clearer if you store the guest address
>> here (gaddr, without the low bits) and add those in guest_bar_read
>> based on bar->{type,prefetchable}. Then it would be equivalent to the
>> existing 'addr' field.
>>
> I agreed first to do such a change, but then recalled our discussion with Jan [1].
> And then we decided that in order for it to be efficient it is better if we setup all the
> things during the write phase (rare), rather then during the write phase (more often).

Small correction: The 2nd "write" was likely meant to be "read". But
please recall that Roger is the maintainer of the code, so he gets
the final say.

Jan




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31 15:50       ` Jan Beulich
@ 2022-02-01  7:31         ` Oleksandr Andrushchenko
  2022-02-01 10:10           ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01  7:31 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko



On 31.01.22 17:50, Jan Beulich wrote:
> On 31.01.2022 16:06, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>>>                rom->type = VPCI_BAR_EMPTY;
>>>>        }
>>>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>>>> index ed127a08a953..0a73b14a92dc 100644
>>>> --- a/xen/include/xen/vpci.h
>>>> +++ b/xen/include/xen/vpci.h
>>>> @@ -68,7 +68,10 @@ struct vpci {
>>>>        struct vpci_header {
>>>>            /* Information about the PCI BARs of this device. */
>>>>            struct vpci_bar {
>>>> +            /* Physical view of the BAR. */
>>> No, that's not the physical view, it's the physical (host) address.
>>>
>>>>                uint64_t addr;
>>>> +            /* Guest view of the BAR: address and lower bits. */
>>>> +            uint64_t guest_reg;
>>> I continue to think it would be clearer if you store the guest address
>>> here (gaddr, without the low bits) and add those in guest_bar_read
>>> based on bar->{type,prefetchable}. Then it would be equivalent to the
>>> existing 'addr' field.
>>>
>> I agreed first to do such a change, but then recalled our discussion with Jan [1].
>> And then we decided that in order for it to be efficient it is better if we setup all the
>> things during the write phase (rare), rather then during the write phase (more often).
> Small correction: The 2nd "write" was likely meant to be "read".
Yes, this is correct.
>   But
> please recall that Roger is the maintainer of the code, so he gets
> the final say.
Agree, but would vote for the current approach as it still saves some
CPU cycles making the read operation really tiny
>
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2022-01-31  8:45     ` Oleksandr Andrushchenko
@ 2022-02-01  8:56       ` Oleksandr Andrushchenko
  2022-02-01 10:23         ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01  8:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 31.01.22 10:45, Oleksandr Andrushchenko wrote:
> Hi, Roger!
>
> On 13.01.22 13:40, Roger Pau Monné wrote:
>> On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>>> +/* Notify vPCI that device is assigned to guest. */
>>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>>> +{
>>> +    int rc;
>>> +
>>> +    /* It only makes sense to assign for hwdom or guest domain. */
>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>> +        return 0;
>>> +
>>> +    spin_lock(&pdev->vpci_lock);
>>> +    rc = run_vpci_init(pdev);
>> Following my comment below, this will likely need to call
>> vpci_add_handlers in order to allocate the pdev->vpci field.
>>
>> It's not OK to carry the contents of pdev->vpci across domain
>> assignations, as the device should be reset, and thus the content of
>> pdev->vpci would be stale.
>>
>>> +    spin_unlock(&pdev->vpci_lock);
>>> +    if ( rc )
>>> +        vpci_deassign_device(d, pdev);
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +/* Notify vPCI that device is de-assigned from guest. */
>>> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
>>> +{
>>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
>>> +    if ( is_system_domain(d) || !has_vpci(d) )
>>> +        return 0;
>>> +
>>> +    spin_lock(&pdev->vpci_lock);
>>> +    vpci_remove_device_handlers_locked(pdev);
>> You need to free the pdev->vpci structure on deassign. I would expect
>> the device to be reset on deassign, so keeping the pdev->vpci contents
>> would be wrong.
> Sure, I will re-allocate pdev->vpci then
After thinking a bit more about this I have realized that we cannot free
pdev->vpci on de-assign. The reason is that the vpci structure contains
vital data which is collected and managed at different stages: for example,
the BAR types are collected in init_bars while we run for the hardware
domain. This data is then used while assigning the device to construct the
guest's representation of the device. Freeing vpci would lose that data, and
the required data would not be populated into vpci again.
So, it is not possible to free the vpci structure and I am going to leave the
approach as it is.
>> Thanks, Roger.
> Thank you,
> Oleksandr
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-01  7:31         ` Oleksandr Andrushchenko
@ 2022-02-01 10:10           ` Roger Pau Monné
  2022-02-01 10:41             ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-02-01 10:10 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh

On Tue, Feb 01, 2022 at 07:31:31AM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 31.01.22 17:50, Jan Beulich wrote:
> > On 31.01.2022 16:06, Oleksandr Andrushchenko wrote:
> >> Hi, Roger!
> >>>>                rom->type = VPCI_BAR_EMPTY;
> >>>>        }
> >>>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> >>>> index ed127a08a953..0a73b14a92dc 100644
> >>>> --- a/xen/include/xen/vpci.h
> >>>> +++ b/xen/include/xen/vpci.h
> >>>> @@ -68,7 +68,10 @@ struct vpci {
> >>>>        struct vpci_header {
> >>>>            /* Information about the PCI BARs of this device. */
> >>>>            struct vpci_bar {
> >>>> +            /* Physical view of the BAR. */
> >>> No, that's not the physical view, it's the physical (host) address.
> >>>
> >>>>                uint64_t addr;
> >>>> +            /* Guest view of the BAR: address and lower bits. */
> >>>> +            uint64_t guest_reg;
> >>> I continue to think it would be clearer if you store the guest address
> >>> here (gaddr, without the low bits) and add those in guest_bar_read
> >>> based on bar->{type,prefetchable}. Then it would be equivalent to the
> >>> existing 'addr' field.
> >>>
> >> I agreed first to do such a change, but then recalled our discussion with Jan [1].
> >> And then we decided that in order for it to be efficient it is better if we setup all the
> >> things during the write phase (rare), rather then during the write phase (more often).
> > Small correction: The 2nd "write" was likely meant to be "read".
> Yes, this is correct.
> >   But
> > please recall that Roger is the maintainer of the code, so he gets
> > the final say.
> Agree, but would vote for the current approach as it still saves some
> CPU cycles making the read operation really tiny

I think you need to build the mapping rangeset(s) based on guest
addresses, not host ones, so it's likely going to be easier if you
store the address here in order to use it when building the rangeset.

Overall the cost of the vmexit will shadow the cost of doing a couple
of ORs here in order to return the guest view of the BAR.

If you think storing the guest view of the BAR register will make the
code easier to understand, then please go ahead. Otherwise I would
recommend to store the address like we do for the host position of the
BAR (ie: addr field).

Thanks, Roger.
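
To illustrate the "couple of ORs" mentioned above: if only the guest
address is stored, the guest view of the register can be composed at read
time roughly as follows. This is a minimal sketch under stated
assumptions, not code from the series: struct sketch_bar, its is_64bit and
prefetchable fields, and sketch_guest_bar_read() are hypothetical; only
the low-bit layout of a memory BAR is taken from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a memory BAR as laid out by the PCI specification. */
#define BAR_MEM_TYPE_64   0x4u /* Bits 2:1 == 10b: 64-bit decoder. */
#define BAR_MEM_PREFETCH  0x8u /* Bit 3: prefetchable. */

/* Hypothetical, trimmed-down BAR state; only what the sketch needs. */
struct sketch_bar {
    uint64_t gaddr;     /* Guest address, low 4 bits clear. */
    bool is_64bit;
    bool prefetchable;
};

/* Return the low (hi == false) or high (hi == true) half of the BAR. */
uint32_t sketch_guest_bar_read(const struct sketch_bar *bar, bool hi)
{
    uint64_t reg = bar->gaddr;

    if ( hi )
    {
        /* Only 64-bit BARs have an upper half. */
        assert(bar->is_64bit);
        return (uint32_t)(reg >> 32);
    }

    /* Add the read-only property bits on top of the address. */
    if ( bar->is_64bit )
        reg |= BAR_MEM_TYPE_64;
    if ( bar->prefetchable )
        reg |= BAR_MEM_PREFETCH;

    return (uint32_t)reg;
}
```

The stored gaddr can then be used directly when building the rangeset,
in the same way as the existing addr field.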



* Re: [PATCH v5 05/14] vpci: add hooks for PCI device assign/de-assign
  2022-02-01  8:56       ` Oleksandr Andrushchenko
@ 2022-02-01 10:23         ` Roger Pau Monné
  0 siblings, 0 replies; 130+ messages in thread
From: Roger Pau Monné @ 2022-02-01 10:23 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Tue, Feb 01, 2022 at 08:56:49AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 31.01.22 10:45, Oleksandr Andrushchenko wrote:
> > Hi, Roger!
> >
> > On 13.01.22 13:40, Roger Pau Monné wrote:
> >> On Thu, Nov 25, 2021 at 01:02:42PM +0200, Oleksandr Andrushchenko wrote:
> >>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> >>> +/* Notify vPCI that device is assigned to guest. */
> >>> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
> >>> +{
> >>> +    int rc;
> >>> +
> >>> +    /* It only makes sense to assign for hwdom or guest domain. */
> >>> +    if ( is_system_domain(d) || !has_vpci(d) )
> >>> +        return 0;
> >>> +
> >>> +    spin_lock(&pdev->vpci_lock);
> >>> +    rc = run_vpci_init(pdev);
> >> Following my comment below, this will likely need to call
> >> vpci_add_handlers in order to allocate the pdev->vpci field.
> >>
> >> It's not OK to carry the contents of pdev->vpci across domain
> >> assignations, as the device should be reset, and thus the content of
> >> pdev->vpci would be stale.
> >>
> >>> +    spin_unlock(&pdev->vpci_lock);
> >>> +    if ( rc )
> >>> +        vpci_deassign_device(d, pdev);
> >>> +
> >>> +    return rc;
> >>> +}
> >>> +
> >>> +/* Notify vPCI that device is de-assigned from guest. */
> >>> +int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
> >>> +{
> >>> +    /* It only makes sense to de-assign from hwdom or guest domain. */
> >>> +    if ( is_system_domain(d) || !has_vpci(d) )
> >>> +        return 0;
> >>> +
> >>> +    spin_lock(&pdev->vpci_lock);
> >>> +    vpci_remove_device_handlers_locked(pdev);
> >> You need to free the pdev->vpci structure on deassign. I would expect
> >> the device to be reset on deassign, so keeping the pdev->vpci contents
> >> would be wrong.
> > Sure, I will re-allocate pdev->vpci then
> After thinking a bit more about this I have realized that we cannot free
> pdev->vpci on de-assign. The reason is that the vpci structure contains
> vital data which is collected and managed at different stages: for example,
> the BAR types are collected in init_bars while we run for the hardware
> domain. This data is then used while assigning the device to construct the
> guest's representation of the device. Freeing vpci would lose that data, and
> the required data would not be populated into vpci again.
> So, it is not possible to free the vpci structure and I am going to leave the
> approach as it is.

We discussed this on IRC, and we have agreed that it's possible to
free pdev->vpci on deassign since in any case we need to call
init_bars (and other capability init functions) when the device is
assigned to setup the register traps and fetch the required
information in order to fill pdev->vpci.

Roger.
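
The lifecycle described above (allocate and populate pdev->vpci on
assign, free it on deassign so that no stale content is carried across
domains) can be sketched as below. This is a minimal standalone sketch:
struct sketch_pdev, struct sketch_vpci, sketch_init_bars(),
sketch_assign() and sketch_deassign() are hypothetical stand-ins, the
value stored by the init step is made up, and locking is left out
entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down per-device vPCI state. */
struct sketch_vpci {
    unsigned int nr_bars;
};

/* Hypothetical, trimmed-down PCI device. */
struct sketch_pdev {
    struct sketch_vpci *vpci;
};

/*
 * Hypothetical stand-in for init_bars() and the other capability init
 * functions: everything is fetched again from the device on every assign.
 */
static int sketch_init_bars(struct sketch_pdev *pdev)
{
    pdev->vpci->nr_bars = 6; /* Made-up value for the sketch. */
    return 0;
}

/* Drop all per-domain state; the device is expected to be reset. */
void sketch_deassign(struct sketch_pdev *pdev)
{
    free(pdev->vpci);
    pdev->vpci = NULL;
}

/* Allocate fresh state and populate it for the new owner. */
int sketch_assign(struct sketch_pdev *pdev)
{
    int rc;

    /* Nothing may be left over from a previous owner. */
    assert(pdev->vpci == NULL);

    pdev->vpci = calloc(1, sizeof(*pdev->vpci));
    if ( !pdev->vpci )
        return -1;

    rc = sketch_init_bars(pdev);
    if ( rc )
        sketch_deassign(pdev);

    return rc;
}
```

Because the init step runs on every assign, nothing collected for a
previous owner has to survive the deassign.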



* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-01 10:10           ` Roger Pau Monné
@ 2022-02-01 10:41             ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 10:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Roger!

On 01.02.22 12:10, Roger Pau Monné wrote:
> On Tue, Feb 01, 2022 at 07:31:31AM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 31.01.22 17:50, Jan Beulich wrote:
>>> On 31.01.2022 16:06, Oleksandr Andrushchenko wrote:
>>>> Hi, Roger!
>>>>>>                 rom->type = VPCI_BAR_EMPTY;
>>>>>>         }
>>>>>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>>>>>> index ed127a08a953..0a73b14a92dc 100644
>>>>>> --- a/xen/include/xen/vpci.h
>>>>>> +++ b/xen/include/xen/vpci.h
>>>>>> @@ -68,7 +68,10 @@ struct vpci {
>>>>>>         struct vpci_header {
>>>>>>             /* Information about the PCI BARs of this device. */
>>>>>>             struct vpci_bar {
>>>>>> +            /* Physical view of the BAR. */
>>>>> No, that's not the physical view, it's the physical (host) address.
>>>>>
>>>>>>                 uint64_t addr;
>>>>>> +            /* Guest view of the BAR: address and lower bits. */
>>>>>> +            uint64_t guest_reg;
>>>>> I continue to think it would be clearer if you store the guest address
>>>>> here (gaddr, without the low bits) and add those in guest_bar_read
>>>>> based on bar->{type,prefetchable}. Then it would be equivalent to the
>>>>> existing 'addr' field.
>>>>>
>>>> I agreed first to do such a change, but then recalled our discussion with Jan [1].
>>>> And then we decided that in order for it to be efficient it is better if we setup all the
>>>> things during the write phase (rare), rather then during the write phase (more often).
>>> Small correction: The 2nd "write" was likely meant to be "read".
>> Yes, this is correct.
>>>    But
>>> please recall that Roger is the maintainer of the code, so he gets
>>> the final say.
>> Agree, but would vote for the current approach as it still saves some
>> CPU cycles making the read operation really tiny
> I think you need to build the mapping rangeset(s) based on guest
> addresses, not host ones, so it's likely going to be easier if you
> store the address here in order to use it when building the rangeset.
>
> Overall the cost of the vmexit will shadow the cost of doing a couple
> of ORs here in order to return the guest view of the BAR.
>
> If you think storing the guest view of the BAR register will make the
> code easier to understand, then please go ahead. Otherwise I would
> recommend to store the address like we do for the host position of the
> BAR (ie: addr field).
I still think it is easier to understand: if you take a look at what we do
for a BAR write for both host and guest you'll see that we do almost the
same operations, but in the host case we end up writing bar->addr + low
bits to the HW register and in the guest case we store the complete
value in bar->guest_reg. The read operation doesn't require any processing
for the host, so it is equivalent to a direct HW read, and for a guest it
is as simple as possible and implements the equivalent by returning
part of bar->guest_reg (hi or lo). So, from this POV it is IMO easier to
understand the logic.
That being said, I do agree that the contents of bar->addr are not
equivalent to bar->guest_reg, but we have already taken care of that
by naming the guest's field guest_reg, not guest_addr.

I will keep the code as is then.
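
To make the two options concrete, here is a minimal standalone sketch, not
the actual vPCI code. struct demo_bar and all demo_* helpers are hypothetical
stand-ins, and BAR size masking is deliberately left out. It shows that
storing the full register at write time and composing it at read time return
the same value, the latter at the cost of a couple of ORs per read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a PCI memory BAR (standard layout). */
#define BAR_MEM_TYPE_64   0x4u  /* bits 2:1 == 10b: 64-bit BAR */
#define BAR_MEM_PREFETCH  0x8u  /* bit 3: prefetchable */
#define BAR_MEM_MASK      (~(uint64_t)0xf)

/* Hypothetical, simplified stand-in for struct vpci_bar. */
struct demo_bar {
    uint64_t guest_reg;   /* option A: address plus low bits */
    uint64_t guest_addr;  /* option B: address only */
    bool is_64bit;
    bool prefetchable;
};

/* Low attribute bits derived from the BAR properties. */
static uint32_t demo_low_bits(const struct demo_bar *bar)
{
    return (bar->is_64bit ? BAR_MEM_TYPE_64 : 0) |
           (bar->prefetchable ? BAR_MEM_PREFETCH : 0);
}

/* Write path (rare): precompute both representations. */
static void demo_guest_bar_write(struct demo_bar *bar, uint64_t val)
{
    assert(bar);
    bar->guest_addr = val & BAR_MEM_MASK;
    bar->guest_reg = bar->guest_addr | demo_low_bits(bar);
}

/* Option A read: return the high or low half of the stored register. */
static uint32_t demo_read_stored(const struct demo_bar *bar, bool hi)
{
    return hi ? (uint32_t)(bar->guest_reg >> 32) : (uint32_t)bar->guest_reg;
}

/* Option B read: OR the low bits into the address on every read. */
static uint32_t demo_read_composed(const struct demo_bar *bar, bool hi)
{
    uint64_t reg = bar->guest_addr | demo_low_bits(bar);

    return hi ? (uint32_t)(reg >> 32) : (uint32_t)reg;
}
```

Either variant gives the guest the same view; the difference is only in
which field the mapping code can consume directly.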
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2022-01-12 15:15   ` Roger Pau Monné
  2022-01-12 15:18     ` Jan Beulich
@ 2022-02-02  6:44     ` Oleksandr Andrushchenko
  2022-02-02  9:56       ` Roger Pau Monné
  1 sibling, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  6:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 12.01.22 17:15, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko<oleksandr_andrushchenko@epam.com>
>>
>> Instead of handling a single range set, that contains all the memory
>> regions of all the BARs and ROM, have them per BAR.
>> As the range sets are now created when a PCI device is added and destroyed
>> when it is removed, make them named and accounted.
>>
>> Note that rangesets were chosen here despite there being only up to
>> 3 separate ranges in each set (typically just 1). But rangeset per BAR
>> was chosen for the ease of implementation and existing code re-usability.
>>
>> This is in preparation of making non-identity mappings in p2m for the
>> MMIOs/ROM.
> I think we don't want to support ROM for guests (at least initially),
> so no need to mention it here.
Will add
>> Signed-off-by: Oleksandr Andrushchenko<oleksandr_andrushchenko@epam.com>
>>
>> ---
>> Since v4:
>> - use named range sets for BARs (Jan)
>> - changes required by the new locking scheme
>> - updated commit message (Jan)
>> Since v3:
>> - re-work vpci_cancel_pending accordingly to the per-BAR handling
>> - s/num_mem_ranges/map_pending and s/uint8_t/bool
>> - ASSERT(bar->mem) in modify_bars
>> - create and destroy the rangesets on add/remove
>> ---
>>   xen/drivers/vpci/header.c | 190 +++++++++++++++++++++++++++-----------
>>   xen/drivers/vpci/vpci.c   |  30 +++++-
>>   xen/include/xen/vpci.h    |   3 +-
>>   3 files changed, 166 insertions(+), 57 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 8880d34ebf8e..cc49aa68886f 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -137,45 +137,86 @@ bool vpci_process_pending(struct vcpu *v)
>>           return false;
>>   
>>       spin_lock(&pdev->vpci_lock);
>> -    if ( !pdev->vpci_cancel_pending && v->vpci.mem )
>> +    if ( !pdev->vpci )
>> +    {
>> +        spin_unlock(&pdev->vpci_lock);
>> +        return false;
>> +    }
>> +
>> +    if ( !pdev->vpci_cancel_pending && v->vpci.map_pending )
>>       {
>>           struct map_data data = {
>>               .d = v->domain,
>>               .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>           };
>> -        int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>> +        struct vpci_header *header = &pdev->vpci->header;
>> +        unsigned int i;
>>   
>> -        if ( rc == -ERESTART )
>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>           {
>> -            spin_unlock(&pdev->vpci_lock);
>> -            return true;
>> -        }
>> +            struct vpci_bar *bar = &header->bars[i];
>> +            int rc;
>> +
> You should check bar->mem != NULL here, there's no need to allocate a
> rangeset for non-mappable BARs.
Answered by Jan already: no need as rangeset_is_empty already handles
NULL pointer
>> +            if ( rangeset_is_empty(bar->mem) )
>> +                continue;
>> +
>> +            rc = rangeset_consume_ranges(bar->mem, map_range, &data);
>> +
>> +            if ( rc == -ERESTART )
>> +            {
>> +                spin_unlock(&pdev->vpci_lock);
>> +                return true;
>> +            }
>>   
>> -        if ( pdev->vpci )
>>               /* Disable memory decoding unconditionally on failure. */
>> -            modify_decoding(pdev,
>> -                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> +            modify_decoding(pdev, rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> The above seems to be an unrelated change, and also exceeds the max
> line length.
Sure, will try to fit
>>                               !rc && v->vpci.rom_only);
>>   
>> -        if ( rc )
>> -        {
>> -            /*
>> -             * FIXME: in case of failure remove the device from the domain.
>> -             * Note that there might still be leftover mappings. While this is
>> -             * safe for Dom0, for DomUs the domain needs to be killed in order
>> -             * to avoid leaking stale p2m mappings on failure.
>> -             */
>> -            if ( is_hardware_domain(v->domain) )
>> -                vpci_remove_device_locked(pdev);
>> -            else
>> -                domain_crash(v->domain);
>> +            if ( rc )
>> +            {
>> +                /*
>> +                 * FIXME: in case of failure remove the device from the domain.
>> +                 * Note that there might still be leftover mappings. While this is
>> +                 * safe for Dom0, for DomUs the domain needs to be killed in order
>> +                 * to avoid leaking stale p2m mappings on failure.
>> +                 */
>> +                if ( is_hardware_domain(v->domain) )
>> +                    vpci_remove_device_locked(pdev);
>> +                else
>> +                    domain_crash(v->domain);
>> +
>> +                break;
>> +            }
>>           }
>> +
>> +        v->vpci.map_pending = false;
>>       }
>>       spin_unlock(&pdev->vpci_lock);
>>   
>>       return false;
>>   }
>>   
>> +static void vpci_bar_remove_ranges(const struct pci_dev *pdev)
>> +{
>> +    struct vpci_header *header = &pdev->vpci->header;
>> +    unsigned int i;
>> +    int rc;
>> +
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>> +
>> +        if ( rangeset_is_empty(bar->mem) )
>> +            continue;
>> +
>> +        rc = rangeset_remove_range(bar->mem, 0, ~0ULL);
> Might be interesting to introduce a rangeset_reset function that
> removes all ranges. That would never fail, and thus there would be no
> need to check for rc.
Well, there is a single user of that as of now, so not sure it is worth it yet
And if we re-allocate pdev->vpci then there might be no need for this
at all
> Also I think the current rangeset_remove_range should never fail when
> removing all ranges, as there's nothing to allocate.
Agree
>   Hence you can add
> an ASSERT_UNREACHABLE below.

>> +        if ( rc )
>> +            printk(XENLOG_ERR
>> +                   "%pd %pp failed to remove range set for BAR: %d\n",
>> +                   pdev->domain, &pdev->sbdf, rc);
>> +    }
>> +}
>> +
>>   void vpci_cancel_pending_locked(struct pci_dev *pdev)
>>   {
>>       struct vcpu *v;
>> @@ -185,23 +226,33 @@ void vpci_cancel_pending_locked(struct pci_dev *pdev)
>>       /* Cancel any pending work now on all vCPUs. */
>>       for_each_vcpu( pdev->domain, v )
>>       {
>> -        if ( v->vpci.mem && (v->vpci.pdev == pdev) )
>> +        if ( v->vpci.map_pending && (v->vpci.pdev == pdev) )
>>           {
>> -            rangeset_destroy(v->vpci.mem);
>> -            v->vpci.mem = NULL;
>> +            vpci_bar_remove_ranges(pdev);
>> +            v->vpci.map_pending = false;
>>           }
>>       }
>>   }
>>   
>>   static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>> -                            struct rangeset *mem, uint16_t cmd)
>> +                            uint16_t cmd)
>>   {
>>       struct map_data data = { .d = d, .map = true };
>> -    int rc;
>> +    struct vpci_header *header = &pdev->vpci->header;
>> +    int rc = 0;
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>>   
>> -    while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>> -        process_pending_softirqs();
>> -    rangeset_destroy(mem);
>> +        if ( rangeset_is_empty(bar->mem) )
>> +            continue;
>> +
>> +        while ( (rc = rangeset_consume_ranges(bar->mem, map_range,
>> +                                              &data)) == -ERESTART )
>> +            process_pending_softirqs();
>> +    }
>>       if ( !rc )
>>           modify_decoding(pdev, cmd, false);
>>   
>> @@ -209,7 +260,7 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>   }
>>   
>>   static void defer_map(struct domain *d, struct pci_dev *pdev,
>> -                      struct rangeset *mem, uint16_t cmd, bool rom_only)
>> +                      uint16_t cmd, bool rom_only)
>>   {
>>       struct vcpu *curr = current;
>>   
>> @@ -220,7 +271,7 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>        * started for the same device if the domain is not well-behaved.
>>        */
>>       curr->vpci.pdev = pdev;
>> -    curr->vpci.mem = mem;
>> +    curr->vpci.map_pending = true;
>>       curr->vpci.cmd = cmd;
>>       curr->vpci.rom_only = rom_only;
>>       /*
>> @@ -234,42 +285,40 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>   static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>   {
>>       struct vpci_header *header = &pdev->vpci->header;
>> -    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>>       struct pci_dev *tmp, *dev = NULL;
>>       const struct vpci_msix *msix = pdev->vpci->msix;
>> -    unsigned int i;
>> +    unsigned int i, j;
>>       int rc;
>> -
>> -    if ( !mem )
>> -        return -ENOMEM;
>> +    bool map_pending;
>>   
>>       /*
>> -     * Create a rangeset that represents the current device BARs memory region
>> +     * Create a rangeset per BAR that represents the current device memory region
>>        * and compare it against all the currently active BAR memory regions. If
>>        * an overlap is found, subtract it from the region to be mapped/unmapped.
>>        *
>> -     * First fill the rangeset with all the BARs of this device or with the ROM
>> +     * First fill the rangesets with all the BARs of this device or with the ROM
>                                          ^ 'all' doesn't apply anymore.
Will fix
>>        * BAR only, depending on whether the guest is toggling the memory decode
>>        * bit of the command register, or the enable bit of the ROM BAR register.
>>        */
>>       for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>       {
>> -        const struct vpci_bar *bar = &header->bars[i];
>> +        struct vpci_bar *bar = &header->bars[i];
>>           unsigned long start = PFN_DOWN(bar->addr);
>>           unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>   
>> +        ASSERT(bar->mem);
>> +
>>           if ( !MAPPABLE_BAR(bar) ||
>>                (rom_only ? bar->type != VPCI_BAR_ROM
>>                          : (bar->type == VPCI_BAR_ROM && !header->rom_enabled)) )
>>               continue;
>>   
>> -        rc = rangeset_add_range(mem, start, end);
>> +        rc = rangeset_add_range(bar->mem, start, end);
>>           if ( rc )
>>           {
>>               printk(XENLOG_G_WARNING "Failed to add [%lx, %lx]: %d\n",
>>                      start, end, rc);
>> -            rangeset_destroy(mem);
>> -            return rc;
>> +            goto fail;
>>           }
> I think you also need to check that BARs from the same device don't
> overlap themselves. This wasn't needed before because all BARs shared
> the same rangeset. It's not uncommon for BARs of the same device to
> share a page.
>
> So you would need something like the following added to the loop:
>
> /* Check for overlap with the already setup BAR ranges. */
> for ( j = 0; j < i; j++ )
>      rangeset_remove_range(header->bars[j].mem, start, end);
Good point
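
A rough standalone sketch of that idea, under loud assumptions: struct
demo_set is a hypothetical 64-page bitmap standing in for a rangeset, and
the demo_* helpers are not Xen functions. Each BAR's range is removed from
the sets of the BARs already processed, so a shared page is tracked by one
set only:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_BARS 3

/* Hypothetical stand-in for a rangeset: one bit per page frame 0..63. */
struct demo_set {
    uint64_t pages;
};

/* Bit mask covering page frames [start, end], inclusive. */
static uint64_t demo_mask(unsigned int start, unsigned int end)
{
    uint64_t upto_end;

    assert(start <= end && end < 64);
    upto_end = (end == 63) ? ~0ULL : ((1ULL << (end + 1)) - 1);

    return upto_end & ~((1ULL << start) - 1);
}

static void demo_add_range(struct demo_set *s, unsigned int start,
                           unsigned int end)
{
    s->pages |= demo_mask(start, end);
}

static void demo_remove_range(struct demo_set *s, unsigned int start,
                              unsigned int end)
{
    s->pages &= ~demo_mask(start, end);
}

/* Fill the per-BAR sets, dropping overlaps from the earlier BARs. */
static void demo_fill(struct demo_set sets[], const unsigned int start[],
                      const unsigned int end[], unsigned int nr)
{
    unsigned int i, j;

    for ( i = 0; i < nr; i++ )
    {
        /* Check for overlap with the already set up BAR ranges. */
        for ( j = 0; j < i; j++ )
            demo_remove_range(&sets[j], start[i], end[i]);

        demo_add_range(&sets[i], start[i], end[i]);
    }
}
```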
>>       }
>>   
>> @@ -280,14 +329,21 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>           unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>>                                        vmsix_table_size(pdev->vpci, i) - 1);
>>   
>> -        rc = rangeset_remove_range(mem, start, end);
>> -        if ( rc )
>> +        for ( j = 0; j < ARRAY_SIZE(header->bars); j++ )
>>           {
>> -            printk(XENLOG_G_WARNING
>> -                   "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> -                   start, end, rc);
>> -            rangeset_destroy(mem);
>> -            return rc;
>> +            const struct vpci_bar *bar = &header->bars[j];
>> +
>> +            if ( rangeset_is_empty(bar->mem) )
>> +                continue;
>> +
>> +            rc = rangeset_remove_range(bar->mem, start, end);
>> +            if ( rc )
>> +            {
>> +                printk(XENLOG_G_WARNING
>> +                       "Failed to remove MSIX table [%lx, %lx]: %d\n",
>> +                       start, end, rc);
>> +                goto fail;
>> +            }
>>           }
>>       }
>>   
>> @@ -325,7 +381,8 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>               unsigned long start = PFN_DOWN(bar->addr);
>>               unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>   
>> -            if ( !bar->enabled || !rangeset_overlaps_range(mem, start, end) ||
>> +            if ( !bar->enabled ||
>> +                 !rangeset_overlaps_range(bar->mem, start, end) ||
>>                    /*
>>                     * If only the ROM enable bit is toggled check against other
>>                     * BARs in the same device for overlaps, but not against the
>> @@ -334,14 +391,13 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>                    (rom_only && tmp == pdev && bar->type == VPCI_BAR_ROM) )
>>                   continue;
>>   
>> -            rc = rangeset_remove_range(mem, start, end);
>> +            rc = rangeset_remove_range(bar->mem, start, end);
>>               if ( rc )
>>               {
>>                   spin_unlock(&tmp->vpci_lock);
>>                   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>                          start, end, rc);
>> -                rangeset_destroy(mem);
>> -                return rc;
>> +                goto fail;
>>               }
>>           }
>>           spin_unlock(&tmp->vpci_lock);
>> @@ -360,12 +416,36 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>            * will always be to establish mappings and process all the BARs.
>>            */
>>           ASSERT((cmd & PCI_COMMAND_MEMORY) && !rom_only);
>> -        return apply_map(pdev->domain, pdev, mem, cmd);
>> +        return apply_map(pdev->domain, pdev, cmd);
>>       }
>>   
>> -    defer_map(dev->domain, dev, mem, cmd, rom_only);
>> +    /* Check whether any memory ranges are left after MSI-X and overlaps. */
>> +    map_pending = false;
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +        if ( !rangeset_is_empty(header->bars[i].mem) )
>> +        {
>> +            map_pending = true;
>> +            break;
>> +        }
>> +
>> +    /*
>> +     * There are cases when PCI device, root port for example, has neither
>> +     * memory space nor IO. In this case PCI command register write is
>> +     * missed resulting in the underlying PCI device not functional, so:
>> +     *   - if there are no regions write the command register now
>> +     *   - if there are regions then defer work and write later on
> I would just say:
>
> /* If there's no mapping work write the command register now. */
Ok
>> +     */
>> +    if ( !map_pending )
>> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>> +    else
>> +        defer_map(dev->domain, dev, cmd, rom_only);
>>   
>>       return 0;
>> +
>> +fail:
>> +    /* Destroy all the ranges we may have added. */
>> +    vpci_bar_remove_ranges(pdev);
>> +    return rc;
>>   }
>>   
>>   static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index a9e9e8ec438c..98b12a61be6f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -52,11 +52,16 @@ static void vpci_remove_device_handlers_locked(struct pci_dev *pdev)
>>   
>>   void vpci_remove_device_locked(struct pci_dev *pdev)
>>   {
>> +    struct vpci_header *header = &pdev->vpci->header;
>> +    unsigned int i;
>> +
>>       ASSERT(spin_is_locked(&pdev->vpci_lock));
>>   
>>       pdev->vpci_cancel_pending = true;
>>       vpci_remove_device_handlers_locked(pdev);
>>       vpci_cancel_pending_locked(pdev);
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +        rangeset_destroy(header->bars[i].mem);
>>       xfree(pdev->vpci->msix);
>>       xfree(pdev->vpci->msi);
>>       xfree(pdev->vpci);
>> @@ -92,6 +97,8 @@ static int run_vpci_init(struct pci_dev *pdev)
>>   int vpci_add_handlers(struct pci_dev *pdev)
>>   {
>>       struct vpci *vpci;
>> +    struct vpci_header *header;
>> +    unsigned int i;
>>       int rc;
>>   
>>       if ( !has_vpci(pdev->domain) )
>> @@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       pdev->vpci = vpci;
>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>>   
>> +    header = &pdev->vpci->header;
>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +    {
>> +        struct vpci_bar *bar = &header->bars[i];
>> +        char str[32];
>> +
>> +        snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
>> +        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
>> +        if ( !bar->mem )
>> +        {
>> +            rc = -ENOMEM;
>> +            goto fail;
>> +        }
>> +    }
> You just need the ranges for the VPCI_BAR_MEM32, VPCI_BAR_MEM64_LO and
> VPCI_BAR_ROM BAR types (see the MAPPABLE_BAR macro). Would it be
> possible to only allocate the rangeset for those BAR types?
I guess so
> Also this should be done in init_bars rather than here, as you would
> know the BAR types.
So, if we allocate these in init_bars, where are they destroyed then?
I think this should be vpci_remove_device and from this POV it would
be good to keep alloc/free code close to each other, e.g.
vpci_add_handlers/vpci_remove_device in the same file

> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-01-13 10:22   ` Roger Pau Monné
@ 2022-02-02  8:23     ` Oleksandr Andrushchenko
  2022-02-02  9:46       ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  8:23 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 12:22, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:45PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Take into account guest's BAR view and program its p2m accordingly:
>> gfn is guest's view of the BAR and mfn is the physical BAR value as set
>> up by the PCI bus driver in the hardware domain.
>> This way hardware domain sees physical BAR values and guest sees
>> emulated ones.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>> - moved start_{gfn|mfn} calculation into map_range
>> - pass vpci_bar in the map_data instead of start_{gfn|mfn}
>> - s/guest_addr/guest_reg
>> Since v3:
>> - updated comment (Roger)
>> - removed gfn_add(map->start_gfn, rc); which is wrong
>> - use v->domain instead of v->vpci.pdev->domain
>> - removed odd e.g. in comment
>> - s/d%d/%pd in altered code
>> - use gdprintk for map/unmap logs
>> Since v2:
>> - improve readability for data.start_gfn and restructure ?: construct
>> Since v1:
>>   - s/MSI/MSI-X in comments
>>
>> ---
>> ---
>>   xen/drivers/vpci/header.c | 30 ++++++++++++++++++++++++++----
>>   1 file changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index cc49aa68886f..b0499d32c5d8 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -30,6 +30,7 @@
>>   
>>   struct map_data {
>>       struct domain *d;
>> +    const struct vpci_bar *bar;
>>       bool map;
>>   };
>>   
>> @@ -41,8 +42,25 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>   
>>       for ( ; ; )
>>       {
>> +        /* Start address of the BAR as seen by the guest. */
>> +        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
>> +                                        ? map->bar->addr
>> +                                        : map->bar->guest_reg));
>> +        /* Physical start address of the BAR. */
>> +        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
>>           unsigned long size = e - s + 1;
>>   
>> +        /*
>> +         * Ranges to be mapped don't always start at the BAR start address, as
>> +         * there can be holes or partially consumed ranges. Account for the
>> +         * offset of the current address from the BAR start.
>> +         */
>> +        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
> When doing guest mappings the rangeset should represent the guest
> physical memory space, not the host one.
So, it does
>   So that collisions in the
> guest p2m can be avoided. Also a guest should be allowed to map the
> same mfn into multiple gfn. For example multiple BARs could share the
> same physical page on the host and the guest might like to map them at
> different pages in its physmap.
There is no such restriction imposed
>
>> +
>> +        gdprintk(XENLOG_G_DEBUG,
>> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
>> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
>> +                 map->d);
> That's too chatty IMO, I could be fine with printing something along
> these lines from modify_bars, but not here because that function can be
> preempted and called multiple times.
Ok, will move to modify_bars as these prints are really helpful for debug
>
>>           /*
>>            * ARM TODOs:
>>            * - On ARM whether the memory is prefetchable or not should be passed
>> @@ -52,8 +70,10 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>            * - {un}map_mmio_regions doesn't support preemption.
>>            */
>>   
>> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, _mfn(s));
>> +        rc = map->map ? map_mmio_regions(map->d, start_gfn,
>> +                                         size, _mfn(s))
>> +                      : unmap_mmio_regions(map->d, start_gfn,
>> +                                           size, _mfn(s));
>>           if ( rc == 0 )
>>           {
>>               *c += size;
>> @@ -62,8 +82,8 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>           if ( rc < 0 )
>>           {
>>               printk(XENLOG_G_WARNING
>> -                   "Failed to identity %smap [%lx, %lx] for d%d: %d\n",
>> -                   map->map ? "" : "un", s, e, map->d->domain_id, rc);
>> +                   "Failed to identity %smap [%lx, %lx] for %pd: %d\n",
>> +                   map->map ? "" : "un", s, e, map->d, rc);
> You need to adjust the message here, as this is no longer an identity
> map for domUs.
Sure
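
For reference, the gfn calculation discussed above can be sketched in
isolation as follows. This is only an illustration: struct demo_bar, the
guest_addr field and demo_range_to_gfn are hypothetical names, and a 4K
page size is assumed. The range start s is a host frame number, so its
offset from the first host frame of the BAR is added to the first guest
frame:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PFN_DOWN(x) ((x) >> DEMO_PAGE_SHIFT)

/* Hypothetical, simplified stand-in for struct vpci_bar. */
struct demo_bar {
    uint64_t addr;        /* physical (host) BAR address */
    uint64_t guest_addr;  /* BAR address as seen by the guest */
};

/*
 * Translate the start of a range, given as a host frame number, into
 * the guest frame number it has to be mapped at.
 */
static uint64_t demo_range_to_gfn(const struct demo_bar *bar, uint64_t s)
{
    uint64_t start_mfn = DEMO_PFN_DOWN(bar->addr);
    uint64_t start_gfn = DEMO_PFN_DOWN(bar->guest_addr);

    /* The range must lie within the BAR. */
    assert(s >= start_mfn);

    return start_gfn + (s - start_mfn);
}
```

With the BAR at host address 0xe0000000 and guest address 0x10000000, a
range starting at host frame 0xe0002 ends up at guest frame 0x10002.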
>
> Thanks, Roger.
Thank you,
Oleksandr


* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-02-02  8:23     ` Oleksandr Andrushchenko
@ 2022-02-02  9:46       ` Oleksandr Andrushchenko
  2022-02-02 10:34         ` Roger Pau Monné
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  9:46 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko


>>> +        gdprintk(XENLOG_G_DEBUG,
>>> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
>>> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
>>> +                 map->d);
>> That's too chatty IMO, I could be fine with printing something along
>> these lines from modify_bars, but not here because that function can be
>> preempted and called multiple times.
> Ok, will move to modify_bars as these prints are really helpful for debug
I tried to implement the same, but now in modify_bars:

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 667c04cee3ae..92407e617609 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -57,10 +57,6 @@ static int map_range(unsigned long s, unsigned long e, void *data,
           */
          start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));

-        gdprintk(XENLOG_G_DEBUG,
-                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
-                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
-                 map->d);
          /*
           * ARM TODOs:
           * - On ARM whether the memory is prefetchable or not should be passed
@@ -258,6 +254,28 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
      raise_softirq(SCHEDULE_SOFTIRQ);
  }

+static int print_range(unsigned long s, unsigned long e, void *data)
+{
+    const struct map_data *map = data;
+
+    for ( ; ; )
+    {
+        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
+                                        ? map->bar->addr
+                                        : map->bar->guest_reg));
+        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
+
+        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
+
+        gdprintk(XENLOG_G_DEBUG,
+                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
+                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
+                 map->d);
+    }
+
+    return 0;
+}
+
  static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
  {
      struct vpci_header *header = &pdev->vpci->header;
@@ -423,7 +441,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
      if ( !map_pending )
          pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
      else
+    {
+        struct map_data data = {
+            .d = pdev->domain,
+            .map = cmd & PCI_COMMAND_MEMORY,
+        };
+
+        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
+        {
+            const struct vpci_bar *bar = &header->bars[i];
+
+            if ( rangeset_is_empty(bar->mem) )
+                continue;
+
+            data.bar = bar;
+            rc = rangeset_report_ranges(bar->mem, 0, ~0ul, print_range, &data);
+        }
+
          defer_map(dev->domain, dev, cmd, rom_only);
+    }

      return 0;


To me, this is a bit of an overkill just to implement a single DEBUG print.
I do understand your concern that "that function can be
preempted and called multiple times", but taking a look at the code
above I think we can accept that for DEBUG builds.

Could you please let me know if I:
1. Still need to implement (the patch above)
2. Drop DEBUG prints (those are really useful while debugging)
3. Leave the print where it was in map_range
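
As a side note on option 1: if the report helper invokes the callback once
per range, the callback body needs no loop of its own. A minimal standalone
sketch of that shape, where demo_report_ranges is a hypothetical stand-in
for the report helper and none of the demo_* names exist in Xen:

```c
#include <assert.h>
#include <stdio.h>

struct demo_range {
    unsigned long s, e;
};

typedef int (*demo_cb_t)(unsigned long s, unsigned long e, void *data);

/*
 * Hypothetical stand-in for a rangeset report helper: invokes cb once
 * per range and stops at the first non-zero return value.
 */
static int demo_report_ranges(const struct demo_range *r, unsigned int nr,
                              demo_cb_t cb, void *data)
{
    unsigned int i;
    int rc = 0;

    for ( i = 0; i < nr && !rc; i++ )
        rc = cb(r[i].s, r[i].e, data);

    return rc;
}

struct demo_print_ctx {
    const char *op;        /* "map" or "unmap" */
    unsigned int printed;  /* number of ranges reported so far */
};

/* Print a single range; called once per range, so no loop is needed. */
static int demo_print_range(unsigned long s, unsigned long e, void *data)
{
    struct demo_print_ctx *ctx = data;

    assert(ctx);
    printf("%s [%lx, %lx]\n", ctx->op, s, e);
    ctx->printed++;

    return 0;
}
```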

Thank you in advance,
Oleksandr


* Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2022-02-02  6:44     ` Oleksandr Andrushchenko
@ 2022-02-02  9:56       ` Roger Pau Monné
  2022-02-02 10:02         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-02-02  9:56 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Wed, Feb 02, 2022 at 06:44:41AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Roger!
> 
> On 12.01.22 17:15, Roger Pau Monné wrote:
> > On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
> >> @@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
> >>       pdev->vpci = vpci;
> >>       INIT_LIST_HEAD(&pdev->vpci->handlers);
> >>   
> >> +    header = &pdev->vpci->header;
> >> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> >> +    {
> >> +        struct vpci_bar *bar = &header->bars[i];
> >> +        char str[32];
> >> +
> >> +        snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
> >> +        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
> >> +        if ( !bar->mem )
> >> +        {
> >> +            rc = -ENOMEM;
> >> +            goto fail;
> >> +        }
> >> +    }
> > You just need the ranges for the VPCI_BAR_MEM32, VPCI_BAR_MEM64_LO and
> > VPCI_BAR_ROM BAR types (see the MAPPABLE_BAR macro). Would it be
> > possible to only allocate the rangeset for those BAR types?
> I guess so
> > Also this should be done in init_bars rather than here, as you would
> > know the BAR types.
> So, if we allocate these in init_bars, where are they destroyed then?
> I think this should be vpci_remove_device and from this POV it would
> be good to keep alloc/free code close to each other, e.g.
> vpci_add_handlers/vpci_remove_device in the same file

The alloc/free is asymmetric already, as vpci->{msix,msi} gets
allocated in init_msi{x} but freed at vpci_remove_device.

Thanks, Roger.



* Re: [PATCH v5 07/14] vpci/header: handle p2m range sets per BAR
  2022-02-02  9:56       ` Roger Pau Monné
@ 2022-02-02 10:02         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 10:02 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 02.02.22 11:56, Roger Pau Monné wrote:
> On Wed, Feb 02, 2022 at 06:44:41AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 12.01.22 17:15, Roger Pau Monné wrote:
>>> On Thu, Nov 25, 2021 at 01:02:44PM +0200, Oleksandr Andrushchenko wrote:
>>>> @@ -108,11 +115,32 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>        pdev->vpci = vpci;
>>>>        INIT_LIST_HEAD(&pdev->vpci->handlers);
>>>>    
>>>> +    header = &pdev->vpci->header;
>>>> +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>> +    {
>>>> +        struct vpci_bar *bar = &header->bars[i];
>>>> +        char str[32];
>>>> +
>>>> +        snprintf(str, sizeof(str), "%pp:BAR%d", &pdev->sbdf, i);
>>>> +        bar->mem = rangeset_new(pdev->domain, str, RANGESETF_no_print);
>>>> +        if ( !bar->mem )
>>>> +        {
>>>> +            rc = -ENOMEM;
>>>> +            goto fail;
>>>> +        }
>>>> +    }
>>> You just need the ranges for the VPCI_BAR_MEM32, VPCI_BAR_MEM64_LO and
>>> VPCI_BAR_ROM BAR types (see the MAPPABLE_BAR macro). Would it be
>>> possible to only allocate the rangeset for those BAR types?
>> I guess so
>>> Also this should be done in init_bars rather than here, as you would
>>> know the BAR types.
>> So, if we allocate these in init_bars, where are they destroyed then?
>> I think this should be vpci_remove_device and from this POV it would
>> be good to keep alloc/free code close to each other, e.g.
>> vpci_add_handlers/vpci_remove_device in the same file
> The alloc/free is asymmetric already, as vpci->{msix,msi} gets
> allocated in init_msi{x} but freed at vpci_remove_device.
Makes sense, I will implement as you suggest
>
> Thanks, Roger.
Thank you,
Oleksandr
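
For readers following the thread, the outcome agreed above (allocate a
rangeset only for the mappable BAR types once the types are known, and
free everything from one teardown path) can be sketched standalone. This
is a minimal illustration, not the Xen implementation: the enum, the
placeholder struct rangeset, NUM_BARS and both function names are
hypothetical stand-ins, and calloc/free replace rangeset_new and
rangeset_destroy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the types discussed in the thread. */
enum bar_type { BAR_EMPTY, BAR_IO, BAR_MEM32, BAR_MEM64_LO, BAR_MEM64_HI,
                BAR_ROM };

struct rangeset { int unused; };    /* placeholder, not Xen's rangeset */

struct bar {
    enum bar_type type;
    struct rangeset *mem;           /* NULL when no rangeset is needed */
};

#define NUM_BARS 7                  /* assumed: 6 BARs + expansion ROM */

struct header { struct bar bars[NUM_BARS]; };

/* Same idea as the MAPPABLE_BAR macro mentioned in the review. */
static bool bar_is_mappable(const struct bar *bar)
{
    return bar->type == BAR_MEM32 || bar->type == BAR_MEM64_LO ||
           bar->type == BAR_ROM;
}

/* Free side: safe on a partially allocated header, free(NULL) is a no-op. */
static void free_bar_rangesets(struct header *h)
{
    for ( size_t i = 0; i < NUM_BARS; i++ )
    {
        free(h->bars[i].mem);
        h->bars[i].mem = NULL;
    }
}

/* Alloc side: call once BAR types are known. Returns 0, or -1 on failure. */
static int alloc_bar_rangesets(struct header *h)
{
    /* Clear all pointers first so the error path can free unconditionally. */
    for ( size_t i = 0; i < NUM_BARS; i++ )
        h->bars[i].mem = NULL;

    for ( size_t i = 0; i < NUM_BARS; i++ )
    {
        struct bar *bar = &h->bars[i];

        if ( !bar_is_mappable(bar) )
            continue;

        bar->mem = calloc(1, sizeof(*bar->mem));
        if ( !bar->mem )
        {
            free_bar_rangesets(h);
            return -1;
        }
    }

    return 0;
}
```

The asymmetry Roger points out is preserved here on purpose: allocation
happens where the BAR types are known, while the free helper can be
called from a single removal path regardless of how far allocation got.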

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-02-02  9:46       ` Oleksandr Andrushchenko
@ 2022-02-02 10:34         ` Roger Pau Monné
  2022-02-02 10:44           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-02-02 10:34 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh

On Wed, Feb 02, 2022 at 09:46:21AM +0000, Oleksandr Andrushchenko wrote:
> 
> >>> +        gdprintk(XENLOG_G_DEBUG,
> >>> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
> >>> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
> >>> +                 map->d);
> >> That's too chatty IMO, I could be fine with printing something along
> >> these lines from modify_bars, but not here because that function can be
> >> preempted and called multiple times.
> > Ok, will move to modify_bars as these prints are really helpful for debug
> I tried to implement the same, but now in init_bars:
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 667c04cee3ae..92407e617609 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -57,10 +57,6 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>            */
>           start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
> 
> -        gdprintk(XENLOG_G_DEBUG,
> -                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
> -                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
> -                 map->d);
>           /*
>            * ARM TODOs:
>            * - On ARM whether the memory is prefetchable or not should be passed
> @@ -258,6 +254,28 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>       raise_softirq(SCHEDULE_SOFTIRQ);
>   }
> 
> +static int print_range(unsigned long s, unsigned long e, void *data)
> +{
> +    const struct map_data *map = data;
> +
> +    for ( ; ; )
> +    {
> +        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
> +                                        ? map->bar->addr
> +                                        : map->bar->guest_reg));
> +        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
> +
> +        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
> +
> +        gdprintk(XENLOG_G_DEBUG,
> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
> +                 map->d);
> +    }

This is an infinite loop AFAICT. Why do you need the for loop?

> +
> +    return 0;
> +}
> +
>   static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>   {
>       struct vpci_header *header = &pdev->vpci->header;
> @@ -423,7 +441,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>       if ( !map_pending )
>           pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>       else
> +    {
> +        struct map_data data = {
> +            .d = pdev->domain,
> +            .map = cmd & PCI_COMMAND_MEMORY,
> +        };
> +
> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> +        {
> +            const struct vpci_bar *bar = &header->bars[i];
> +
> +            if ( rangeset_is_empty(bar->mem) )
> +                continue;
> +
> +            data.bar = bar;
> +            rc = rangeset_report_ranges(bar->mem, 0, ~0ul, print_range, &data);

Since this is per-BAR we should also print that information and the
SBDF of the device, ie:

%pd SBDF: (ROM)BAR%u %smap [%lx, %lx] -> ...

> +        }
> +
>           defer_map(dev->domain, dev, cmd, rom_only);
> +    }
> 
>       return 0;
> 
> 
> To me, to implement a single DEBUG print, it is a bit of an overkill.
> I do understand your concerns that "that function can be
> preempted and called multiple times", but taking a look at the code
> above I think we can accept that for DEBUG builds.

It might be better if you print the per BAR positions at the top of
modify_bars, where each BAR is added to the rangeset? Or do you care
about reporting the holes also?

Thanks, Roger.
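
A standalone sketch of the per-range report discussed above may help:
no loop in the callback, the BAR index in the message, and the guest
frame computed as the guest base frame plus the range's offset from the
host base frame. Everything here is an assumption for illustration:
struct bar_view, range_start_gfn and format_range are hypothetical
names, snprintf stands in for gdprintk, and a 4K page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12                       /* assumed 4K pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Hypothetical reduced view of a BAR: host address and guest address. */
struct bar_view {
    unsigned int index;
    unsigned long addr;         /* physical BAR address set up by the host */
    unsigned long guest_addr;   /* BAR address as programmed by the guest */
};

/*
 * Translate the first frame of a host range [s, e] into the frame the
 * domain sees. For the hardware domain the mapping is identity.
 */
static unsigned long range_start_gfn(const struct bar_view *bar, bool hwdom,
                                     unsigned long s)
{
    unsigned long start_gfn = PFN_DOWN(hwdom ? bar->addr : bar->guest_addr);
    unsigned long start_mfn = PFN_DOWN(bar->addr);

    assert(s >= start_mfn);     /* the range must lie inside the BAR */

    return start_gfn + (s - start_mfn);
}

/* One message per range: called once, returns snprintf's result. */
static int format_range(char *buf, size_t len, const struct bar_view *bar,
                        bool hwdom, bool map, unsigned long s,
                        unsigned long e)
{
    return snprintf(buf, len, "BAR%u %smap [%lx, %lx] -> %#lx",
                    bar->index, map ? "" : "un", s, e,
                    range_start_gfn(bar, hwdom, s));
}
```

With a host BAR at 0xe0000000 and a guest BAR at 0x23000000, the host
range [0xe0002, 0xe0003] is reported as starting at guest frame 0x23002.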


^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-02-02 10:34         ` Roger Pau Monné
@ 2022-02-02 10:44           ` Oleksandr Andrushchenko
  2022-02-02 11:11             ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 10:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko


On 02.02.22 12:34, Roger Pau Monné wrote:
> On Wed, Feb 02, 2022 at 09:46:21AM +0000, Oleksandr Andrushchenko wrote:
>>>>> +        gdprintk(XENLOG_G_DEBUG,
>>>>> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
>>>>> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
>>>>> +                 map->d);
>>>> That's too chatty IMO, I could be fine with printing something along
>>>> these lines from modify_bars, but not here because that function can be
>>>> preempted and called multiple times.
>>> Ok, will move to modify_bars as these prints are really helpful for debug
>> I tried to implement the same, but now in init_bars:
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 667c04cee3ae..92407e617609 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -57,10 +57,6 @@ static int map_range(unsigned long s, unsigned long e, void *data,
>>             */
>>            start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
>>
>> -        gdprintk(XENLOG_G_DEBUG,
>> -                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
>> -                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
>> -                 map->d);
>>            /*
>>             * ARM TODOs:
>>             * - On ARM whether the memory is prefetchable or not should be passed
>> @@ -258,6 +254,28 @@ static void defer_map(struct domain *d, struct pci_dev *pdev,
>>        raise_softirq(SCHEDULE_SOFTIRQ);
>>    }
>>
>> +static int print_range(unsigned long s, unsigned long e, void *data)
>> +{
>> +    const struct map_data *map = data;
>> +
>> +    for ( ; ; )
>> +    {
>> +        gfn_t start_gfn = _gfn(PFN_DOWN(is_hardware_domain(map->d)
>> +                                        ? map->bar->addr
>> +                                        : map->bar->guest_reg));
>> +        mfn_t start_mfn = _mfn(PFN_DOWN(map->bar->addr));
>> +
>> +        start_gfn = gfn_add(start_gfn, s - mfn_x(start_mfn));
>> +
>> +        gdprintk(XENLOG_G_DEBUG,
>> +                 "%smap [%lx, %lx] -> %#"PRI_gfn" for %pd\n",
>> +                 map->map ? "" : "un", s, e, gfn_x(start_gfn),
>> +                 map->d);
>> +    }
> This is an infinite loop AFAICT. Why do you need the for loop?
>
>> +
>> +    return 0;
>> +}
>> +
>>    static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>    {
>>        struct vpci_header *header = &pdev->vpci->header;
>> @@ -423,7 +441,25 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>        if ( !map_pending )
>>            pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
>>        else
>> +    {
>> +        struct map_data data = {
>> +            .d = pdev->domain,
>> +            .map = cmd & PCI_COMMAND_MEMORY,
>> +        };
>> +
>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> +        {
>> +            const struct vpci_bar *bar = &header->bars[i];
>> +
>> +            if ( rangeset_is_empty(bar->mem) )
>> +                continue;
>> +
>> +            data.bar = bar;
>> +            rc = rangeset_report_ranges(bar->mem, 0, ~0ul, print_range, &data);
> Since this is per-BAR we should also print that information and the
> SBDF of the device, ie:
>
> %pd SBDF: (ROM)BAR%u %smap [%lx, %lx] -> ...
>
>> +        }
>> +
>>            defer_map(dev->domain, dev, cmd, rom_only);
>> +    }
>>
>>        return 0;
>>
>>
>> To me, to implement a single DEBUG print, it is a bit of an overkill.
>> I do understand your concerns that "that function can be
>> preempted and called multiple times", but taking a look at the code
>> above I think we can accept that for DEBUG builds.
> It might be better if you print the per BAR positions at the top of
> modify_bars, where each BAR is added to the rangeset? Or do you care
> about reporting the holes also?
First of all, I didn't run this code; it is only meant to show the
complexity and to check whether the approach itself is ok. If it is,
then I'll get it working: please do not review it literally yet.

The original print was used to show only those {un}mappings that
we actually do, no holes etc., so we need to print at the bottom of
the init_bars, e.g. when the rangesets are all ready.

Again, IMO, adding such a big piece of DEBUG code instead of
printing a single DEBUG message could be a bit expensive.
I still hear your concerns on *when* it is printed, but still think we can
allow that.
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-02-02 10:44           ` Oleksandr Andrushchenko
@ 2022-02-02 11:11             ` Jan Beulich
  2022-02-02 11:14               ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 11:11 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 11:44, Oleksandr Andrushchenko wrote:
> Again, IMO, adding such a big piece of DEBUG code instead of
> printing a single DEBUG message could be a bit expensive.
> I still hear your concerns on *when* it is printed, but still think we can
> allow that.

You do realize though that the mere act of logging a message may cause
the need for preemption, and hence logging messages in such cases is
detrimental to forward progress?

Jan



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 08/14] vpci/header: program p2m with guest BAR view
  2022-02-02 11:11             ` Jan Beulich
@ 2022-02-02 11:14               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 11:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 13:11, Jan Beulich wrote:
> On 02.02.2022 11:44, Oleksandr Andrushchenko wrote:
>> Again, IMO, adding such a big piece of DEBUG code instead of
>> printing a single DEBUG message could be a bit expensive.
>> I still hear your concerns on *when* it is printed, but still think we can
>> allow that.
> You do realize though that the mere act of logging a message may cause
> the need for preemption, and hence logging messages in such cases is
> detrimental to forward progress?
Then I will probably remove the print altogether. It is easy to add back if needed.
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-01-13 10:50   ` Roger Pau Monné
@ 2022-02-02 12:49     ` Oleksandr Andrushchenko
  2022-02-02 13:32       ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 12:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 12:50, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Add basic emulation support for guests. At the moment only emulate
>> PCI_COMMAND_INTX_DISABLE bit, the rest is not emulated yet and left
>> as TODO.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v3:
>> - gate more code on CONFIG_HAS_MSI
>> - removed logic for the case when MSI/MSI-X not enabled
>> ---
>>   xen/drivers/vpci/header.c | 21 +++++++++++++++++++--
>>   1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index b0499d32c5d8..2e44055946b0 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>   }
>>   
>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>> +                            uint32_t cmd, void *data)
>> +{
>> +    /* TODO: Add proper emulation for all bits of the command register. */
>> +
>> +#ifdef CONFIG_HAS_PCI_MSI
>> +    if ( pdev->vpci->msi->enabled )
> You need to check for MSI-X also, pdev->vpci->msix->enabled.
Indeed, thank you
>
>> +    {
>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
> command register when attempting to enable MSI or MSIX capabilities.
Isn't it enough that we just check above whether MSI/MSI-X is enabled and
then make sure INTx is disabled? I am not following you here on what else
needs to be done.
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 10/14] vpci/header: reset the command register when adding devices
  2022-01-13 11:07   ` Roger Pau Monné
@ 2022-02-02 12:58     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 12:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 13:07, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:47PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Reset the command register when passing through a PCI device:
>> it is possible that when passing through a PCI device its memory
>> decoding bits in the command register are already set. Thus, a
>> guest OS may not write to the command register to update memory
>> decoding, so guest mappings (guest's view of the BARs) are
>> left not updated.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v1:
>>   - do not write 0 to the command register, but respect host settings.
> There's not much respect of host setting here, as you are basically
> writing 0 except for the INTX_DISABLE which will be set if MSI(X) is
> enabled.
Yes, and this is because we only support INTX emulation at the
moment
>
> I wonder whether you really need this anyway. I would expect that a
> device that's being assigned to a guest has just been reset globally,
> so there should be no need to reset the command register explicitly.
From my experience, there was a real case where the device was not
reset, which caused trouble. I'll remove this patch for now and see if
I can still run without it, relying on the device reset which must
be in place while assigning a PCI device (here we rely on the
toolstack, right?).
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology
  2022-01-12 15:39   ` Jan Beulich
@ 2022-02-02 13:15     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 13:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Jan!

On 12.01.22 17:39, Jan Beulich wrote:
> On 25.11.2021 12:02, Oleksandr Andrushchenko wrote:
>> @@ -145,6 +148,53 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>   }
>>   
>>   #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int vpci_add_virtual_device(struct pci_dev *pdev)
>> +{
>> +    struct domain *d = pdev->domain;
>> +    pci_sbdf_t sbdf = { 0 };
>> +    unsigned long new_dev_number;
>> +
>> +    /*
>> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
>> +     * there are multi-function ones which are not yet supported.
>> +     */
>> +    if ( pdev->info.is_extfn )
>> +    {
>> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
>> +                 &pdev->sbdf);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    new_dev_number = find_first_zero_bit(&d->vpci_dev_assigned_map,
>> +                                         VPCI_MAX_VIRT_DEV);
>> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
>> +        return -ENOSPC;
>> +
>> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
>> +
>> +    /*
>> +     * Both segment and bus number are 0:
>> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
>> +     *  - with bus 0 the virtual devices are seen as embedded
>> +     *    endpoints behind the root complex
>> +     *
>> +     * TODO: add support for multi-function devices.
>> +     */
>> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
>> +    pdev->vpci->guest_sbdf = sbdf;
>> +
>> +    return 0;
>> +
>> +}
>> +REGISTER_VPCI_INIT(vpci_add_virtual_device, VPCI_PRIORITY_MIDDLE);
> Is this function guaranteed to always be invoked ahead of ...
>
>> +static void vpci_remove_virtual_device(struct domain *d,
>> +                                       const struct pci_dev *pdev)
>> +{
>> +    __clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
>> +    pdev->vpci->guest_sbdf.sbdf = ~0;
>> +}
> ... this one, even when considering error paths? Otherwise you may
> wrongly clear bit 31 here afaict.
According to Roger's comment I will not use REGISTER_VPCI_INIT
machinery for this.
>
> Jan
>
Thank you,
Oleksandr
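
Jan's concern can be made concrete with a small standalone sketch. If
the guest SBDF is still the "unassigned" value ~0 when the remove path
runs, its 5-bit device field reads as 31, so an unconditional clear
would release slot 31 even if another device owns it. The sketch below
guards against that. It is an illustration under assumptions: struct
vdom, struct vdev, SBDF_INVALID and both function names are
hypothetical, and a plain uint32_t stands in for the domain's bitmap.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIRT_DEV 32u                /* one slot per device on bus 0 */
#define SBDF_INVALID 0xffffffffu        /* "no virtual SBDF assigned" */

struct vdom { uint32_t assigned_map; }; /* hypothetical per-domain state */
struct vdev { uint32_t guest_sbdf; };   /* hypothetical per-device state */

/* Pick the first free slot. Returns 0 on success, -1 when the bus is full. */
static int add_virtual_device(struct vdom *d, struct vdev *v)
{
    assert(v->guest_sbdf == SBDF_INVALID);  /* must not be assigned twice */

    for ( unsigned int slot = 0; slot < MAX_VIRT_DEV; slot++ )
    {
        if ( d->assigned_map & (UINT32_C(1) << slot) )
            continue;

        d->assigned_map |= UINT32_C(1) << slot;
        /* Segment 0, bus 0, function 0: only the device field is set. */
        v->guest_sbdf = slot << 3;
        return 0;
    }

    return -1;
}

/* Release the slot, but only if this device actually holds one. */
static void remove_virtual_device(struct vdom *d, struct vdev *v)
{
    if ( v->guest_sbdf == SBDF_INVALID )
        return;                         /* never added: leave the map alone */

    d->assigned_map &= ~(UINT32_C(1) << ((v->guest_sbdf >> 3) & 0x1f));
    v->guest_sbdf = SBDF_INVALID;
}
```

Calling remove_virtual_device on a device that never went through
add_virtual_device leaves bit 31 of the map untouched, which is the
error-path case raised in the review.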

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 11/14] vpci: add initial support for virtual PCI bus topology
  2022-01-13 11:35   ` Roger Pau Monné
@ 2022-02-02 13:17     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 13:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 13:35, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:48PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
>>
>> Please note, that at the moment only function 0 of a multifunction
>> device can be passed through.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>> - moved and re-worked guest sbdf initializers
>> - s/set_bit/__set_bit
>> - s/clear_bit/__clear_bit
>> - minor comment fix s/Virtual/Guest/
>> - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
>>    later for counting the number of MMIO handlers required for a guest
>>    (Julien)
>> Since v3:
>>   - make use of VPCI_INIT
>>   - moved all new code to vpci.c which belongs to it
>>   - changed open-coded 31 to PCI_SLOT(~0)
>>   - added comments and code to reject multifunction devices with
>>     functions other than 0
>>   - updated comment about vpci_dev_next and made it unsigned int
>>   - implement roll back in case of error while assigning/deassigning devices
>>   - s/dom%pd/%pd
>> Since v2:
>>   - remove casts that are (a) malformed and (b) unnecessary
>>   - add new line for better readability
>>   - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
>>      functions are now completely gated with this config
>>   - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>> New in v2
>> ---
>>   xen/drivers/vpci/vpci.c | 51 +++++++++++++++++++++++++++++++++++++++++
>>   xen/include/xen/sched.h |  8 +++++++
>>   xen/include/xen/vpci.h  | 11 +++++++++
>>   3 files changed, 70 insertions(+)
>>
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 98b12a61be6f..c2fb4d4db233 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -114,6 +114,9 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>       spin_lock(&pdev->vpci_lock);
>>       pdev->vpci = vpci;
>>       INIT_LIST_HEAD(&pdev->vpci->handlers);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    pdev->vpci->guest_sbdf.sbdf = ~0;
>> +#endif
>>   
>>       header = &pdev->vpci->header;
>>       for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> @@ -145,6 +148,53 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>   }
>>   
>>   #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int vpci_add_virtual_device(struct pci_dev *pdev)
>> +{
>> +    struct domain *d = pdev->domain;
>> +    pci_sbdf_t sbdf = { 0 };
>> +    unsigned long new_dev_number;
> I think this needs to be limited to non-hardware domains?
>
> Or else you will report failures for the hardware domain even if it's
> not using the virtual topology at all.
Yes, this wants an is_hardware_domain check
>
>> +    /*
>> +     * Each PCI bus supports 32 devices/slots at max or up to 256 when
>> +     * there are multi-function ones which are not yet supported.
>> +     */
>> +    if ( pdev->info.is_extfn )
>> +    {
>> +        gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
>> +                 &pdev->sbdf);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    new_dev_number = find_first_zero_bit(&d->vpci_dev_assigned_map,
>> +                                         VPCI_MAX_VIRT_DEV);
>> +    if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
>> +        return -ENOSPC;
>> +
>> +    __set_bit(new_dev_number, &d->vpci_dev_assigned_map);
> How is vpci_dev_assigned_map protected from concurrent accesses? Does
> it rely on the pcidevs lock being held while accessing it?
It does rely on pcidevs lock, I'll add an assert here
>
> If so it needs spelling out (and likely an assert added).
>
>> +    /*
>> +     * Both segment and bus number are 0:
>> +     *  - we emulate a single host bridge for the guest, e.g. segment 0
>> +     *  - with bus 0 the virtual devices are seen as embedded
>> +     *    endpoints behind the root complex
>> +     *
>> +     * TODO: add support for multi-function devices.
>> +     */
>> +    sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
>> +    pdev->vpci->guest_sbdf = sbdf;
>> +
>> +    return 0;
>> +
>> +}
>> +REGISTER_VPCI_INIT(vpci_add_virtual_device, VPCI_PRIORITY_MIDDLE);
> I'm unsure this is the right place to do virtual SBDF assignment, my
> plan was to use REGISTER_VPCI_INIT exclusively with PCI capabilities.
>
> I think it would be better to do the virtual SBDF assignment from
> vpci_assign_device.
Ok, will do
>
>> +
>> +static void vpci_remove_virtual_device(struct domain *d,
>> +                                       const struct pci_dev *pdev)
>> +{
>> +    __clear_bit(pdev->vpci->guest_sbdf.dev, &d->vpci_dev_assigned_map);
>> +    pdev->vpci->guest_sbdf.sbdf = ~0;
>> +}
>> +
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>>   {
>> @@ -171,6 +221,7 @@ int vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
>>           return 0;
>>   
>>       spin_lock(&pdev->vpci_lock);
>> +    vpci_remove_virtual_device(d, pdev);
>>       vpci_remove_device_handlers_locked(pdev);
>>       spin_unlock(&pdev->vpci_lock);
>>   
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 28146ee404e6..10bff103317c 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,14 @@ struct domain
>>   
>>   #ifdef CONFIG_HAS_PCI
>>       struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * The bitmap which shows which device numbers are already used by the
>> +     * virtual PCI bus topology and is used to assign a unique SBDF to the
>> +     * next passed through virtual PCI device.
>> +     */
>> +    unsigned long vpci_dev_assigned_map;
> Please use DECLARE_BITMAP with the maximum number of supported
> devices as parameter.
Will use
>
>> +#endif
>>   #endif
>>   
>>   #ifdef CONFIG_HAS_PASSTHROUGH
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 18319fc329f9..e5258bd7ce90 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -21,6 +21,13 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
>>   
>>   #define VPCI_ECAM_BDF(addr)     (((addr) & 0x0ffff000) >> 12)
>>   
>> +/*
>> + * Maximum number of devices supported by the virtual bus topology:
>> + * each PCI bus supports 32 devices/slots at max or up to 256 when
>> + * there are multi-function ones which are not yet supported.
>> + */
>> +#define VPCI_MAX_VIRT_DEV       (PCI_SLOT(~0) + 1)
>> +
>>   #define REGISTER_VPCI_INIT(x, p)                \
>>     static vpci_register_init_t *const x##_entry  \
>>                  __used_section(".data.vpci." p) = x
>> @@ -143,6 +150,10 @@ struct vpci {
>>               struct vpci_arch_msix_entry arch;
>>           } entries[];
>>       } *msix;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /* Guest SBDF of the device. */
>> +    pci_sbdf_t guest_sbdf;
>> +#endif
>>   #endif
>>   };
>>   
>> -- 
>> 2.25.1
>>
Thank you,
Oleksandr
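
The arithmetic behind VPCI_MAX_VIRT_DEV and the guest devfn can be
checked in isolation. The macros below follow the conventional PCI
devfn layout (5-bit device, 3-bit function); MAX_VIRT_DEV and
guest_devfn are hypothetical names for this sketch, and ~0u is used
instead of ~0 to keep the shift on an unsigned value.

```c
#include <assert.h>

/* Conventional devfn encoding: bits 7..3 device (slot), bits 2..0 function. */
#define PCI_SLOT(devfn)   (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)   ((devfn) & 0x07)
#define PCI_DEVFN(d, f)   ((((d) & 0x1f) << 3) | ((f) & 0x07))

/* Highest slot number plus one: 31 + 1 = 32 devices on a single bus. */
#define MAX_VIRT_DEV      (PCI_SLOT(~0u) + 1)

/* Guest devfn for a virtual slot; function is always 0 in this series. */
static unsigned int guest_devfn(unsigned int slot)
{
    assert(slot < MAX_VIRT_DEV);

    return PCI_DEVFN(slot, 0);
}
```

So slot 5 becomes devfn 0x28, and a bitmap sized for MAX_VIRT_DEV bits
fits in a single 32-bit word.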

^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 12:49     ` Oleksandr Andrushchenko
@ 2022-02-02 13:32       ` Jan Beulich
  2022-02-02 13:47         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 13:32 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
> On 13.01.22 12:50, Roger Pau Monné wrote:
>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>           pci_conf_write16(pdev->sbdf, reg, cmd);
>>>   }
>>>   
>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>> +                            uint32_t cmd, void *data)
>>> +{
>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>> +
>>> +#ifdef CONFIG_HAS_PCI_MSI
>>> +    if ( pdev->vpci->msi->enabled )
>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
> Indeed, thank you
>>
>>> +    {
>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>> command register when attempting to enable MSI or MSIX capabilities.
> Isn't it enough that we just check above whether MSI/MSI-X is enabled and
> then make sure INTx is disabled? I am not following you here on what else
> needs to be done.

No, you need to deal with the potentially bad combination on both
paths - command register writes (here) and MSI/MSI-X control register
writes (which is what Roger points you at). I would like to suggest
to consider simply forcing INTX_DISABLE on behind the guest's back
for those other two paths.

Jan
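
A minimal standalone model of the two paths described above, for
readers of the archive. It only illustrates the invariant "INTx is
disabled whenever MSI or MSI-X is enabled" being enforced both on
command register writes and on MSI/MSI-X enable writes. struct
dev_state, sketch_cmd_write and sketch_msi_enable are hypothetical
names, hw_cmd stands in for the physical command register, and this is
not the vPCI handler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400  /* bit 10 of the command register */

/* Hypothetical device model: MSI state plus the physical command value. */
struct dev_state {
    bool msi_enabled;
    bool msix_enabled;
    uint16_t hw_cmd;
};

/* Path 1: guest writes the command register. */
static void sketch_cmd_write(struct dev_state *d, uint16_t cmd)
{
    /* The guest may try to enable INTx; refuse while MSI/MSI-X is on. */
    if ( d->msi_enabled || d->msix_enabled )
        cmd |= PCI_COMMAND_INTX_DISABLE;

    d->hw_cmd = cmd;
}

/* Path 2: guest writes the MSI or MSI-X control register. */
static void sketch_msi_enable(struct dev_state *d, bool msix, bool enable)
{
    if ( msix )
        d->msix_enabled = enable;
    else
        d->msi_enabled = enable;

    /* Force INTX_DISABLE behind the guest's back when enabling. */
    if ( enable )
        d->hw_cmd |= PCI_COMMAND_INTX_DISABLE;
}
```

Checking only in the command write path misses the case where the
guest enables MSI after it has already written a command value with
INTx enabled, which is why both paths need the fix-up.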



^ permalink raw reply	[flat|nested] 130+ messages in thread

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 13:32       ` Jan Beulich
@ 2022-02-02 13:47         ` Oleksandr Andrushchenko
  2022-02-02 14:18           ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 13:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 15:32, Jan Beulich wrote:
> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>    }
>>>>    
>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>> +                            uint32_t cmd, void *data)
>>>> +{
>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>> +
>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>> +    if ( pdev->vpci->msi->enabled )
>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>> Indeed, thank you
>>>> +    {
>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>> command register when attempting to enable MSI or MSIX capabilities.
>> Isn't it enough that we just check above whether MSI/MSI-X is enabled and
>> then make sure INTx is disabled? I am not following you here on what else
>> needs to be done.
> No, you need to deal with the potentially bad combination on both
> paths - command register writes (here) and MSI/MSI-X control register
> writes (which is what Roger points you at). I would like to suggest
> to consider simply forcing INTX_DISABLE on behind the guest's back
> for those other two paths.
Are you suggesting that we need code which writes PCI_COMMAND
whenever the MSI/MSI-X control register is written, to keep the two
consistent? I.e. the control register handler would need to write
PCI_COMMAND and go through emulation for guests?

If so, why didn't we have that before?
If it was OK before, then I guess the code I add does ensure
INTX_DISABLE is set if pdev->vpci->msi->enabled ||
pdev->vpci->msix->enabled, which is enough at least for PCI_COMMAND
writes.

Sorry if I still don't see how this should be done.
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 12/14] xen/arm: translate virtual PCI bus topology for guests
  2022-01-13 12:18   ` Roger Pau Monné
@ 2022-02-02 13:58     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 13:58 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 14:18, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:49PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> There are three originators for the PCI configuration space access:
>> 1. The domain that owns physical host bridge: MMIO handlers are
>> there so we can update vPCI register handlers with the values
>> written by the hardware domain, e.g. physical view of the registers
>> vs guest's view on the configuration space.
>> 2. Guest access to the passed through PCI devices: we need to properly
>> map virtual bus topology to the physical one, e.g. pass the configuration
>> space access to the corresponding physical devices.
>> 3. Emulated host PCI bridge access. It doesn't exist in the physical
>> topology, e.g. it can't be mapped to some physical host bridge.
>> So, all access to the host bridge itself needs to be trapped and
>> emulated.
> I'm kind of lost in this commit message. You are just adding a
> translate function in order for domUs to translate from virtual SBDF
> to the physical SBDF of the device. I realize you do that based on
> whether 'bridge' is set or not, so I assume this is just a way to
> signal whether the domain is a hardware domain or not. Ie:
> !!bridge == is_hardware_domain(v->domain).
Simply put: yes
>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Since v4:
>> - indentation fixes
>> - constify struct domain
>> - updated commit message
>> - updates to the new locking scheme (pdev->vpci_lock)
>> Since v3:
>> - revisit locking
>> - move code to vpci.c
>> Since v2:
>>   - pass struct domain instead of struct vcpu
>>   - constify arguments where possible
>>   - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
>> New in v2
>> ---
>>   xen/arch/arm/vpci.c     | 18 ++++++++++++++++++
>>   xen/drivers/vpci/vpci.c | 27 +++++++++++++++++++++++++++
>>   xen/include/xen/vpci.h  |  1 +
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
>> index 8e801f275879..3d134f42d07e 100644
>> --- a/xen/arch/arm/vpci.c
>> +++ b/xen/arch/arm/vpci.c
>> @@ -41,6 +41,15 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>>       /* data is needed to prevent a pointer cast on 32bit */
>>       unsigned long data;
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * For the passed through devices we need to map their virtual SBDF
>> +     * to the physical PCI device being passed through.
>> +     */
>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>> +        return 1;
> I'm unsure what returning 1 implies for Arm here, but you likely need
> to set '*r = ~0ul;'.
Good catch, will add
>
>> +#endif
>> +
>>       if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
>>                           1U << info->dabt.size, &data) )
>>       {
>> @@ -59,6 +68,15 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
>>       struct pci_host_bridge *bridge = p;
>>       pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
>>   
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +    /*
>> +     * For the passed through devices we need to map their virtual SBDF
>> +     * to the physical PCI device being passed through.
>> +     */
>> +    if ( !bridge && !vpci_translate_virtual_device(v->domain, &sbdf) )
>> +        return 1;
>> +#endif
>> +
>>       return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
>>                              1U << info->dabt.size, r);
>>   }
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index c2fb4d4db233..bdc8c63f73fa 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -195,6 +195,33 @@ static void vpci_remove_virtual_device(struct domain *d,
>>       pdev->vpci->guest_sbdf.sbdf = ~0;
>>   }
>>   
>> +/*
>> + * Find the physical device which is mapped to the virtual device
>> + * and translate virtual SBDF to the physical one.
>> + */
>> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
>> +{
>> +    struct pci_dev *pdev;
>> +
> I would add:
>
> ASSERT(!is_hardware_domain(d));
>
> To make sure this is not used for the hardware domain.
Will add
>
>> +    for_each_pdev( d, pdev )
>> +    {
>> +        bool found;
>> +
>> +        spin_lock(&pdev->vpci_lock);
>> +        found = pdev->vpci && (pdev->vpci->guest_sbdf.sbdf == sbdf->sbdf);
>> +        spin_unlock(&pdev->vpci_lock);
>> +
>> +        if ( found )
>> +        {
>> +            /* Replace guest SBDF with the physical one. */
>> +            *sbdf = pdev->sbdf;
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>   /* Notify vPCI that device is assigned to guest. */
>>   int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>>   {
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index e5258bd7ce90..21d76929391f 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -280,6 +280,7 @@ static inline void vpci_cancel_pending_locked(struct pci_dev *pdev)
>>   /* Notify vPCI that device is assigned/de-assigned to/from guest. */
>>   int vpci_assign_device(struct domain *d, struct pci_dev *pdev);
>>   int vpci_deassign_device(struct domain *d, struct pci_dev *pdev);
>> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf);
>>   #else
>>   static inline int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
>>   {
> If you add a dummy vpci_translate_virtual_device helper that returns
> false unconditionally here you could drop the #ifdefs in arm/vpci.c
> AFAICT.
Will try to do so
>
> Thanks, Roger.
Thank you,
Oleksandr
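
For readers following the review, the lookup discussed above can be
sketched in isolation. This is only a minimal model under stated
assumptions, not the code from the patch: sbdf_t, struct passthru_dev,
struct guest, translate_virtual_device() and mmio_read() are
hypothetical stand-ins, and locking is left out entirely. It shows the
two review points: the assertion that the helper is never used for the
hardware domain, and the read path returning all-ones when no virtual
device matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for Xen's pci_sbdf_t. */
typedef struct { uint32_t sbdf; } sbdf_t;

/* Hypothetical record of one passed-through device. */
struct passthru_dev {
    sbdf_t guest_sbdf;   /* where the guest sees the device */
    sbdf_t phys_sbdf;    /* where the device really lives */
};

/* Hypothetical, minimal view of a domain. */
struct guest {
    bool is_hardware_domain;
    const struct passthru_dev *devs;
    size_t nr_devs;
};

/* Replace a guest SBDF with the physical one; false if nothing matches. */
static bool translate_virtual_device(const struct guest *g, sbdf_t *sbdf)
{
    /* The translation only makes sense for non-hardware domains. */
    assert(!g->is_hardware_domain);

    for ( size_t i = 0; i < g->nr_devs; i++ )
    {
        if ( g->devs[i].guest_sbdf.sbdf == sbdf->sbdf )
        {
            *sbdf = g->devs[i].phys_sbdf;
            return true;
        }
    }

    return false;
}

/* Read path: always reports "handled" (1), with all-ones data on a miss. */
static int mmio_read(const struct guest *g, sbdf_t sbdf, unsigned long *r)
{
    if ( !translate_virtual_device(g, &sbdf) )
    {
        *r = ~0ul;
        return 1;
    }

    /* Placeholder: a real handler would read config space here. */
    *r = sbdf.sbdf;
    return 1;
}
```

Returning all-ones mirrors what a config space read of an absent
device looks like to the guest, which is presumably why '*r = ~0ul;'
was requested.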

* Re: [PATCH v5 13/14] xen/arm: account IO handlers for emulated PCI MSI-X
  2022-01-13 13:23   ` Roger Pau Monné
@ 2022-02-02 14:08     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 14:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 13.01.22 15:23, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 01:02:50PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> At the moment, we always allocate an extra 16 slots for IO handlers
>> (see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
>> MSI-X registers we need to explicitly tell that we have additional IO
>> handlers, so those are accounted.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> LGTM, just one comment below. This will require an Ack from the Arm
> guys.
>
>> ---
>> Cc: Julien Grall <julien@xen.org>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> This actually moved here from the part 2 of the prep work for PCI
>> passthrough on Arm as it seems to be the proper place for it.
>>
>> New in v5
>> ---
>>   xen/arch/arm/vpci.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
>> index 3d134f42d07e..902f8491e030 100644
>> --- a/xen/arch/arm/vpci.c
>> +++ b/xen/arch/arm/vpci.c
>> @@ -134,6 +134,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
>>   
>>   unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
>>   {
>> +    unsigned int count;
>> +
>>       if ( !has_vpci(d) )
>>           return 0;
>>   
>> @@ -145,7 +147,18 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
>>       }
>>   
>>       /* For a single emulated host bridge's configuration space. */
>> -    return 1;
>> +    count = 1;
>> +
>> +#ifdef CONFIG_HAS_PCI_MSI
>> +    /*
>> +     * There's a single MSI-X MMIO handler that deals with both PBA
>> +     * and MSI-X tables per each PCI device being passed through.
>> +     * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
>> +     */
>> +    count += VPCI_MAX_VIRT_DEV;
> You could also use IS_ENABLED(CONFIG_HAS_PCI_MSI) since
> VPCI_MAX_VIRT_DEV is defined unconditionally.
Yes, will use, thank you
>
> Thanks, Roger.
Thank you,
Oleksandr
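
The IS_ENABLED() suggestion above can be tried out stand-alone. The
sketch below is an assumption-laden model, not Xen's implementation:
the macro is a simplified re-creation of the well-known kconfig trick,
num_mmio_handlers() is a hypothetical stand-in for
domain_vpci_get_num_mmio_handlers(), and the value 32 for
VPCI_MAX_VIRT_DEV is assumed from the cover letter's "limited to 32
devices".

```c
#include <assert.h>

/* Assumed configuration for this sketch. */
#define CONFIG_HAS_PCI_MSI 1
#define VPCI_MAX_VIRT_DEV  32

/*
 * Simplified IS_ENABLED(): evaluates to 1 if the option is defined to 1
 * and to 0 if it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

static_assert(IS_ENABLED(CONFIG_HAS_PCI_MSI) == 1, "option is set");
static_assert(IS_ENABLED(CONFIG_NOT_SET_ANYWHERE) == 0, "option is unset");

/* Hypothetical stand-in for the handler accounting function. */
static unsigned int num_mmio_handlers(int has_vpci)
{
    unsigned int count;

    if ( !has_vpci )
        return 0;

    /* For a single emulated host bridge's configuration space. */
    count = 1;

    /* One MSI-X MMIO handler per passed-through device, no #ifdef needed. */
    if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
        count += VPCI_MAX_VIRT_DEV;

    return count;
}
```

Because the condition is a compile-time constant, the compiler can
drop the dead branch, while both branches still get parsed and type
checked in every configuration.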

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 13:47         ` Oleksandr Andrushchenko
@ 2022-02-02 14:18           ` Jan Beulich
  2022-02-02 14:26             ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 14:18 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>    }
>>>>>    
>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t cmd, void *data)
>>>>> +{
>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>> +
>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>> +    if ( pdev->vpci->msi->enabled )
>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>> Indeed, thank you
>>>>> +    {
>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>> command register when attempting to enable MSI or MSIX capabilities.
>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>> sure INTX disabled? I am not following you here on what else needs to
>>> be done.
>> No, you need to deal with the potentially bad combination on both
>> paths - command register writes (here) and MSI/MSI-X control register
>> writes (which is what Roger points you at). I would like to suggest
>> to consider simply forcing INTX_DISABLE on behind the guest's back
>> for those other two paths.
> Do you suggest that we need to have some code which will
> write PCI_COMMAND while we write MSI/MSI-X control register
> for that kind of consistency? E.g. control register handler will
> need to write to PCI_COMMAND and go through emulation for
> guests?

Either check or write, yes. Since you're setting the bit here behind
the guest's back, setting it on the other paths as well would only
look consistent to me.

> If so, why didn't we have that before?

Because we assume Dom0 to be behaving itself.

Jan



* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 14:18           ` Jan Beulich
@ 2022-02-02 14:26             ` Oleksandr Andrushchenko
  2022-02-02 14:31               ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 14:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 16:18, Jan Beulich wrote:
> On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>             pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>     }
>>>>>>     
>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>> +                            uint32_t cmd, void *data)
>>>>>> +{
>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>> +
>>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>>> +    if ( pdev->vpci->msi->enabled )
>>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>>> Indeed, thank you
>>>>>> +    {
>>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>>> command register when attempting to enable MSI or MSIX capabilities.
>>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>>> sure INTX disabled? I am not following you here on what else needs to
>>>> be done.
>>> No, you need to deal with the potentially bad combination on both
>>> paths - command register writes (here) and MSI/MSI-X control register
>>> writes (which is what Roger points you at). I would like to suggest
>>> to consider simply forcing INTX_DISABLE on behind the guest's back
>>> for those other two paths.
>> Do you suggest that we need to have some code which will
>> write PCI_COMMAND while we write MSI/MSI-X control register
>> for that kind of consistency? E.g. control register handler will
>> need to write to PCI_COMMAND and go through emulation for
>> guests?
> Either check or write, yes. Since you're setting the bit here behind
> the guest's back, setting it on the other paths as well would only
> look consistent to me.
I can't find any access to PCI_COMMAND register from vMSI/vMSI-X
code, so what's the concern? This seems to be the only place in vPCI
which touches PCI_COMMAND register.
>
>> If so, why didn't we have that before?
> Because we assume Dom0 to be behaving itself.
ok...
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 14:26             ` Oleksandr Andrushchenko
@ 2022-02-02 14:31               ` Jan Beulich
  2022-02-02 15:04                 ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 14:31 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 15:26, Oleksandr Andrushchenko wrote:
> 
> 
> On 02.02.22 16:18, Jan Beulich wrote:
>> On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>>>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>             pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>     }
>>>>>>>     
>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>> +{
>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>> +
>>>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>>>> +    if ( pdev->vpci->msi->enabled )
>>>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>>>> Indeed, thank you
>>>>>>> +    {
>>>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>>>> command register when attempting to enable MSI or MSIX capabilities.
>>>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>>>> sure INTX disabled? I am not following you here on what else needs to
>>>>> be done.
>>>> No, you need to deal with the potentially bad combination on both
>>>> paths - command register writes (here) and MSI/MSI-X control register
>>>> writes (which is what Roger points you at). I would like to suggest
>>>> to consider simply forcing INTX_DISABLE on behind the guest's back
>>>> for those other two paths.
>>> Do you suggest that we need to have some code which will
>>> write PCI_COMMAND while we write MSI/MSI-X control register
>>> for that kind of consistency? E.g. control register handler will
>>> need to write to PCI_COMMAND and go through emulation for
>>> guests?
>> Either check or write, yes. Since you're setting the bit here behind
>> the guest's back, setting it on the other paths as well would only
>> look consistent to me.
> I can't find any access to PCI_COMMAND register from vMSI/vMSI-X
> code, so what's the concern?

Again: Only one of INTX, MSI, or MSI-X may be enabled at a time.
This needs to be checked whenever any one of the three is about
to change state. Since failing config space writes isn't really
an option (there's no error code to hand back and raising an
exception is nothing real hardware would do), adjusting state to
be sane behind the back of the guest looks to be the least bad
option.

> This seems to be the only place in vPCI which touches PCI_COMMAND register.

How is this relevant?

Jan
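
The "adjust state behind the guest's back on every path" idea can be
modelled as below. This is a rough sketch under assumptions, not vPCI
code: struct dev_model and the write_*() helpers are hypothetical, and
only the INTx versus MSI/MSI-X part of the rule is covered (the case
of MSI and MSI-X both being enabled is not handled). The three bit
definitions are the standard PCI ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x0400u  /* command register, bit 10 */
#define PCI_MSI_FLAGS_ENABLE     0x0001u  /* MSI message control, bit 0 */
#define PCI_MSIX_FLAGS_ENABLE    0x8000u  /* MSI-X message control, bit 15 */

/* Hypothetical shadow of the three registers that matter here. */
struct dev_model {
    uint16_t cmd;
    uint16_t msi_ctrl;
    uint16_t msix_ctrl;
};

static bool msg_intr_enabled(const struct dev_model *d)
{
    return (d->msi_ctrl & PCI_MSI_FLAGS_ENABLE) ||
           (d->msix_ctrl & PCI_MSIX_FLAGS_ENABLE);
}

/* True unless INTx is left enabled alongside MSI or MSI-X. */
static bool state_is_sane(const struct dev_model *d)
{
    return !msg_intr_enabled(d) || (d->cmd & PCI_COMMAND_INTX_DISABLE);
}

/* Writes cannot fail, so silently force INTx off when required. */
static void fixup(struct dev_model *d)
{
    if ( msg_intr_enabled(d) )
        d->cmd |= PCI_COMMAND_INTX_DISABLE;
}

static void write_cmd(struct dev_model *d, uint16_t val)
{
    d->cmd = val;
    fixup(d);
    assert(state_is_sane(d));
}

static void write_msi_ctrl(struct dev_model *d, uint16_t val)
{
    d->msi_ctrl = val;
    fixup(d);
    assert(state_is_sane(d));
}

static void write_msix_ctrl(struct dev_model *d, uint16_t val)
{
    d->msix_ctrl = val;
    fixup(d);
    assert(state_is_sane(d));
}
```

The point of routing all three writes through the same fixup is that
the invariant holds no matter in which order the guest programs the
registers.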



* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 14:31               ` Jan Beulich
@ 2022-02-02 15:04                 ` Oleksandr Andrushchenko
  2022-02-02 15:08                   ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 15:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 16:31, Jan Beulich wrote:
> On 02.02.2022 15:26, Oleksandr Andrushchenko wrote:
>>
>> On 02.02.22 16:18, Jan Beulich wrote:
>>> On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>>>>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>>>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>              pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>      }
>>>>>>>>      
>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>> +{
>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>> +
>>>>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>>>>> +    if ( pdev->vpci->msi->enabled )
>>>>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>>>>> Indeed, thank you
>>>>>>>> +    {
>>>>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>>>>> command register when attempting to enable MSI or MSIX capabilities.
>>>>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>>>>> sure INTX disabled? I am not following you here on what else needs to
>>>>>> be done.
>>>>> No, you need to deal with the potentially bad combination on both
>>>>> paths - command register writes (here) and MSI/MSI-X control register
>>>>> writes (which is what Roger points you at). I would like to suggest
>>>>> to consider simply forcing INTX_DISABLE on behind the guest's back
>>>>> for those other two paths.
>>>> Do you suggest that we need to have some code which will
>>>> write PCI_COMMAND while we write MSI/MSI-X control register
>>>> for that kind of consistency? E.g. control register handler will
>>>> need to write to PCI_COMMAND and go through emulation for
>>>> guests?
>>> Either check or write, yes. Since you're setting the bit here behind
>>> the guest's back, setting it on the other paths as well would only
>>> look consistent to me.
>> I can't find any access to PCI_COMMAND register from vMSI/vMSI-X
>> code, so what's the concern?
> Again: Only one of INTX, MSI, or MSI-X may be enabled at a time.
This is clear and I don't question that
> This needs to be checked whenever any one of the three is about
> to change state. Since failing config space writes isn't really
> an option (there's no error code to hand back and raising an
> exception is nothing real hardware would do), adjusting state to
> be sane behind the back of the guest looks to be the least bad
> option.
Would it be enough if I read PCI_MSIX_FLAGS_ENABLE and
PCI_MSI_FLAGS_ENABLE in guest_cmd_write to make a
decision on INTX?

On the other hand msi->enabled and msix->enabled
already have this information if I understand the
MSI/MSI-X code correctly.

Or do we want some additional code in MSI/MSI-X's control_write
functions to set the INTX_DISABLE bit there as well?

I mean that in this guest_cmd_write handler we can only check
that the PCI_COMMAND value being written is consistent with the
current MSI/MSI-X state.

If we want further checks when the PCI_MSIX_FLAGS_ENABLE and/or
PCI_MSI_FLAGS_ENABLE bits are altered, then a PCI_COMMAND write
(which doesn't exist now) needs to be added there to make sure the
INTX_DISABLE bit is set.

Please help me understand which of these you would prefer.
>
>> This seems to be the only place in vPCI which touches PCI_COMMAND register.
> How is this relevant?
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 15:04                 ` Oleksandr Andrushchenko
@ 2022-02-02 15:08                   ` Jan Beulich
  2022-02-02 15:12                     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 15:08 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 16:04, Oleksandr Andrushchenko wrote:
> 
> 
> On 02.02.22 16:31, Jan Beulich wrote:
>> On 02.02.2022 15:26, Oleksandr Andrushchenko wrote:
>>>
>>> On 02.02.22 16:18, Jan Beulich wrote:
>>>> On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>>>>>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>>>>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>>>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>              pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>      }
>>>>>>>>>      
>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>> +{
>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>> +
>>>>>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>>>>>> +    if ( pdev->vpci->msi->enabled )
>>>>>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>>>>>> Indeed, thank you
>>>>>>>>> +    {
>>>>>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>>>>>> command register when attempting to enable MSI or MSIX capabilities.
>>>>>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>>>>>> sure INTX disabled? I am not following you here on what else needs to
>>>>>>> be done.
>>>>>> No, you need to deal with the potentially bad combination on both
>>>>>> paths - command register writes (here) and MSI/MSI-X control register
>>>>>> writes (which is what Roger points you at). I would like to suggest
>>>>>> to consider simply forcing INTX_DISABLE on behind the guest's back
>>>>>> for those other two paths.
>>>>> Do you suggest that we need to have some code which will
>>>>> write PCI_COMMAND while we write MSI/MSI-X control register
>>>>> for that kind of consistency? E.g. control register handler will
>>>>> need to write to PCI_COMMAND and go through emulation for
>>>>> guests?
>>>> Either check or write, yes. Since you're setting the bit here behind
>>>> the guest's back, setting it on the other paths as well would only
>>>> look consistent to me.
>>> I can't find any access to PCI_COMMAND register from vMSI/vMSI-X
>>> code, so what's the concern?
>> Again: Only one of INTX, MSI, or MSI-X may be enabled at a time.
> This is clear and I don't question that
>> This needs to be checked whenever any one of the three is about
>> to change state. Since failing config space writes isn't really
>> an option (there's no error code to hand back and raising an
>> exception is nothing real hardware would do), adjusting state to
>> be sane behind the back of the guest looks to be the least bad
>> option.
> Would it be enough if I read PCI_MSIX_FLAGS_ENABLE and
> PCI_MSI_FLAGS_ENABLE in guest_cmd_write to make a
> decision on INTX?
> 
> On the other hand msi->enabled and msix->enabled
> already have this information if I understand the
> MSI/MSI-X code correctly.
> 
> Or do we want some additional code in MSI/MSI-X's control_write
> functions to set INTX bit there as well?

Well, yes, this is what Roger and I have been asking you to add.

> I mean that in this guest_cmd_write handler we can only see
> if we write a consistent wrt MSI/MSI-X PCI_COMMAND value
> 
> If we want some more checks when we alter PCI_MSIX_FLAGS_ENABLE
> and/or PCI_MSI_FLAGS_ENABLE bits, this means we need a relevant
> PCI_COMMAND write there to be added (which doesn't exist now)
> to make sure INTX bit is set.

Exactly.

Jan



* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 15:08                   ` Jan Beulich
@ 2022-02-02 15:12                     ` Oleksandr Andrushchenko
  2022-02-02 15:31                       ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 15:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 17:08, Jan Beulich wrote:
> On 02.02.2022 16:04, Oleksandr Andrushchenko wrote:
>>
>> On 02.02.22 16:31, Jan Beulich wrote:
>>> On 02.02.2022 15:26, Oleksandr Andrushchenko wrote:
>>>> On 02.02.22 16:18, Jan Beulich wrote:
>>>>> On 02.02.2022 14:47, Oleksandr Andrushchenko wrote:
>>>>>>> On 02.02.2022 13:49, Oleksandr Andrushchenko wrote:
>>>>>>>> On 13.01.22 12:50, Roger Pau Monné wrote:
>>>>>>>>> On Thu, Nov 25, 2021 at 01:02:46PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>>>>> @@ -491,6 +491,22 @@ static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>>               pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>>>>>>       }
>>>>>>>>>>       
>>>>>>>>>> +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>> +                            uint32_t cmd, void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    /* TODO: Add proper emulation for all bits of the command register. */
>>>>>>>>>> +
>>>>>>>>>> +#ifdef CONFIG_HAS_PCI_MSI
>>>>>>>>>> +    if ( pdev->vpci->msi->enabled )
>>>>>>>>> You need to check for MSI-X also, pdev->vpci->msix->enabled.
>>>>>>>> Indeed, thank you
>>>>>>>>>> +    {
>>>>>>>>>> +        /* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X enabled. */
>>>>>>>>>> +        cmd |= PCI_COMMAND_INTX_DISABLE;
>>>>>>>>> You will also need to make sure PCI_COMMAND_INTX_DISABLE is set in the
>>>>>>>>> command register when attempting to enable MSI or MSIX capabilities.
>>>>>>>> Isn't it enough that we just check above if MSI/MSI-X enabled then make
>>>>>>>> sure INTX disabled? I am not following you here on what else needs to
>>>>>>>> be done.
>>>>>>> No, you need to deal with the potentially bad combination on both
>>>>>>> paths - command register writes (here) and MSI/MSI-X control register
>>>>>>> writes (which is what Roger points you at). I would like to suggest
>>>>>>> to consider simply forcing INTX_DISABLE on behind the guest's back
>>>>>>> for those other two paths.
>>>>>> Do you suggest that we need to have some code which will
>>>>>> write PCI_COMMAND while we write MSI/MSI-X control register
>>>>>> for that kind of consistency? E.g. control register handler will
>>>>>> need to write to PCI_COMMAND and go through emulation for
>>>>>> guests?
>>>>> Either check or write, yes. Since you're setting the bit here behind
>>>>> the guest's back, setting it on the other paths as well would only
>>>>> look consistent to me.
>>>> I can't find any access to PCI_COMMAND register from vMSI/vMSI-X
>>>> code, so what's the concern?
>>> Again: Only one of INTX, MSI, or MSI-X may be enabled at a time.
>> This is clear and I don't question that
>>> This needs to be checked whenever any one of the three is about
>>> to change state. Since failing config space writes isn't really
>>> an option (there's no error code to hand back and raising an
>>> exception is nothing real hardware would do), adjusting state to
>>> be sane behind the back of the guest looks to be the least bad
>>> option.
>> Would it be enough if I read PCI_MSIX_FLAGS_ENABLE and
>> PCI_MSI_FLAGS_ENABLE in guest_cmd_write to make a
>> decision on INTX?
>>
>> On the other hand msi->enabled and msix->enabled
>> already have this information if I understand the
>> MSI/MSI-X code correctly.
>>
>> Or do we want some additional code in MSI/MSI-X's control_write
>> functions to set INTX bit there as well?
> Well, yes, this is what Roger and I have been asking you to add.
Do we only want this for !is_hardware_domain(d) or unconditionally?
>
>> I mean that in this guest_cmd_write handler we can only see
>> if we write a consistent wrt MSI/MSI-X PCI_COMMAND value
>>
>> If we want some more checks when we alter PCI_MSIX_FLAGS_ENABLE
>> and/or PCI_MSI_FLAGS_ENABLE bits, this means we need a relevant
>> PCI_COMMAND write there to be added (which doesn't exist now)
>> to make sure INTX bit is set.
> Exactly.
Ok
>
> Jan
>
Thank you,
Oleksandr

* Re: [PATCH v5 09/14] vpci/header: emulate PCI_COMMAND register for guests
  2022-02-02 15:12                     ` Oleksandr Andrushchenko
@ 2022-02-02 15:31                       ` Jan Beulich
  0 siblings, 0 replies; 130+ messages in thread
From: Jan Beulich @ 2022-02-02 15:31 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 02.02.2022 16:12, Oleksandr Andrushchenko wrote:
> On 02.02.22 17:08, Jan Beulich wrote:
>> On 02.02.2022 16:04, Oleksandr Andrushchenko wrote:
>>> Or do we want some additional code in MSI/MSI-X's control_write
>>> functions to set INTX bit there as well?
>> Well, yes, this is what Roger and I have been asking you to add.
> Do we only want this for !is_hardware_domain(d) or unconditionally?

To keep present behavior unaltered, I'd suggest to do it only
conditionally.

Jan
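
For illustration, the rule discussed above (only one of INTx, MSI, or
MSI-X enabled at a time, adjusted behind the guest's back, and only for
non-hardware domains) could be sketched as below. This is a minimal
standalone sketch, not the vPCI code: struct intr_state and
sanitize_cmd() are hypothetical names, and only the
PCI_COMMAND_INTX_DISABLE bit value is taken from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x0400 /* bit 10 of the command register */

/* Hypothetical snapshot of a device's MSI/MSI-X enable state. */
struct intr_state {
    bool msi_enabled;
    bool msix_enabled;
};

/*
 * Return the command register value that should reach the hardware.
 * For guests (not the hardware domain), INTx is forced off whenever
 * MSI or MSI-X is enabled; the write itself never fails.
 */
static uint16_t sanitize_cmd(uint16_t cmd, const struct intr_state *s,
                             bool is_hwdom)
{
    if ( !is_hwdom && (s->msi_enabled || s->msix_enabled) )
        cmd |= PCI_COMMAND_INTX_DISABLE;

    return cmd;
}
```

The same helper could be called from both the command register write
path and the MSI/MSI-X control write paths, so that all three places
apply one rule.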




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2021-11-26 12:19     ` Oleksandr Andrushchenko
@ 2022-02-03 12:36       ` Oleksandr Andrushchenko
  2022-02-03 12:44         ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 12:36 UTC (permalink / raw)
  To: Bertrand Marquis, roger.pau
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Rahul Singh, Oleksandr Andrushchenko

Hi, Bertrand!

On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
> Hi, Bertrand!
>
> On 25.11.21 18:28, Bertrand Marquis wrote:
>> Hi Oleksandr,
>>
>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>>
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>> and remove those when de-assigning. This allows having different
>>> handlers for different domains, e.g. hwdom and other guests.
>>>
>>> Emulate guest BAR register values: this allows creating a guest view
>>> of the registers and emulates size and properties probe as it is done
>>> during PCI device enumeration by the guest.
>>>
>>> ROM BAR is only handled for the hardware domain and for guest domains
>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>> for x86 only and it might not be used by other architectures without
>>> emulating x86. Other use-cases may include using that expansion ROM before
>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>> wants to use the ROM code which seems to be rare.
>> In the generic code, bars for ioports are actually skipped (check code before
>> in header.c, in case of ioports there is a continue) and no handler is registered for them.
>> The consequence will be that a guest will access hardware when reading those BARs.
> Yes, this seems to be a valid point
So, with the approach we have developed over the last few days, we will ignore
all writes and return ~0 for reads for all unhandled accesses, i.e. those which
do not have explicit register handlers installed. Thus, this case will fall into
the unhandled clause.

Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:36       ` Oleksandr Andrushchenko
@ 2022-02-03 12:44         ` Jan Beulich
  2022-02-03 12:48           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-03 12:44 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Rahul Singh, roger.pau, Bertrand Marquis

On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
> Hi, Bertrand!
> 
> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>> Hi, Bertrand!
>>
>> On 25.11.21 18:28, Bertrand Marquis wrote:
>>> Hi Oleksandr,
>>>
>>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>>>
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>>> and remove those when de-assigning. This allows having different
>>>> handlers for different domains, e.g. hwdom and other guests.
>>>>
>>>> Emulate guest BAR register values: this allows creating a guest view
>>>> of the registers and emulates size and properties probe as it is done
>>>> during PCI device enumeration by the guest.
>>>>
>>>> ROM BAR is only handled for the hardware domain and for guest domains
>>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>>> for x86 only and it might not be used by other architectures without
>>>> emulating x86. Other use-cases may include using that expansion ROM before
>>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>>> wants to use the ROM code which seems to be rare.
>>> In the generic code, bars for ioports are actually skipped (check code before
>>> in header.c, in case of ioports there is a continue) and no handler is registered for them.
>>> The consequence will be that a guest will access hardware when reading those BARs.
>> Yes, this seems to be a valid point
> So, with the approach we have developed these days we will ignore all writes
> and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit
> register handlers employed. Thus, this case will fall into unhandled clause.

Except that I guess BARs are special in that reads may not return ~0,
or else the low bits carry a meaning we don't want to convey. Unused
BARs need to be hard-wired to 0, I think.

Jan
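
To illustrate why ~0 is a problematic read value for a BAR: a guest
that decodes the low bits of 0xffffffff the usual way would see an
implemented I/O BAR rather than an absent one. The sketch below is a
simplified standalone model of that decoding (32-bit BARs only); the
helper names are made up for the example, and only the bit layout comes
from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_BASE_ADDRESS_SPACE_IO 0x01u /* bit 0 set: I/O space BAR */

/* Does the value read from a BAR claim to be an I/O BAR? */
static bool bar_is_io(uint32_t readback)
{
    return readback & PCI_BASE_ADDRESS_SPACE_IO;
}

/*
 * Size a guest would derive from the value read back after writing ~0
 * to a BAR. 0 means "not implemented". Simplified: ignores 64-bit BARs
 * and I/O BARs that only implement the low 16 address bits.
 */
static uint32_t probed_bar_size(uint32_t readback)
{
    uint32_t mask;

    if ( readback == 0 )
        return 0;

    /* Strip the type bits: 2 for I/O BARs, 4 for memory BARs. */
    mask = bar_is_io(readback) ? (readback & ~0x3u) : (readback & ~0xfu);

    return ~mask + 1;
}
```

With this model, a register that always reads ~0 looks like a 4-byte
I/O BAR, whereas a register hard-wired to 0 is treated as unimplemented.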




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-01-31  9:53     ` Oleksandr Andrushchenko
  2022-01-31 10:56       ` Roger Pau Monné
@ 2022-02-03 12:45       ` Oleksandr Andrushchenko
  2022-02-03 12:54         ` Jan Beulich
  1 sibling, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 12:45 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!
>> Also memory decoding needs to be initially disabled when used by
>> guests, in order to prevent the BAR being placed on top of a RAM
>> region. The guest physmap will be different from the host one, so it's
>> possible for BARs to end up placed on top of RAM regions initially
>> until the firmware or OS places them at a suitable address.
> Agree, memory decoding must be disabled
Isn't it already achieved by the toolstack resetting the PCI device
while assigning it to a guest?

Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:44         ` Jan Beulich
@ 2022-02-03 12:48           ` Oleksandr Andrushchenko
  2022-02-03 12:50             ` Jan Beulich
  0 siblings, 1 reply; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 12:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Rahul Singh, roger.pau, Bertrand Marquis,
	Oleksandr Andrushchenko

Hi, Jan!

On 03.02.22 14:44, Jan Beulich wrote:
> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
>> Hi, Bertrand!
>>
>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>>> Hi, Bertrand!
>>>
>>> On 25.11.21 18:28, Bertrand Marquis wrote:
>>>> Hi Oleksandr,
>>>>
>>>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>>>>
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>>>> and remove those when de-assigning. This allows having different
>>>>> handlers for different domains, e.g. hwdom and other guests.
>>>>>
>>>>> Emulate guest BAR register values: this allows creating a guest view
>>>>> of the registers and emulates size and properties probe as it is done
>>>>> during PCI device enumeration by the guest.
>>>>>
>>>>> ROM BAR is only handled for the hardware domain and for guest domains
>>>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>>>> for x86 only and it might not be used by other architectures without
>>>>> emulating x86. Other use-cases may include using that expansion ROM before
>>>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>>>> wants to use the ROM code which seems to be rare.
>>>> In the generic code, bars for ioports are actually skipped (check code before
>>>> in header.c, in case of ioports there is a continue) and no handler is registered for them.
>>>> The consequence will be that a guest will access hardware when reading those BARs.
>>> Yes, this seems to be a valid point
>> So, with the approach we have developed these days we will ignore all writes
>> and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit
>> register handlers employed. Thus, this case will fall into unhandled clause.
> Except that I guess BARs are special in that reads may not return ~0,
> or else the low bits carry a meaning we don't want to convey. Unused
> BARs need to be hard-wired to 0, I think.
So, you mean we should have 2 sets of BAR handlers for guests:
1. normal emulation (these are implemented in this patch)
2. all other BARs (including ROM, I/O, etc.): read as 0, ignore writes

Is this what you mean?
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:48           ` Oleksandr Andrushchenko
@ 2022-02-03 12:50             ` Jan Beulich
  2022-02-03 12:53               ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-03 12:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Rahul Singh, roger.pau, Bertrand Marquis

On 03.02.2022 13:48, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 03.02.22 14:44, Jan Beulich wrote:
>> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
>>> Hi, Bertrand!
>>>
>>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>>>> Hi, Bertrand!
>>>>
>>>> On 25.11.21 18:28, Bertrand Marquis wrote:
>>>>> Hi Oleksandr,
>>>>>
>>>>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>>>>>
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>>>>> and remove those when de-assigning. This allows having different
>>>>>> handlers for different domains, e.g. hwdom and other guests.
>>>>>>
>>>>>> Emulate guest BAR register values: this allows creating a guest view
>>>>>> of the registers and emulates size and properties probe as it is done
>>>>>> during PCI device enumeration by the guest.
>>>>>>
>>>>>> ROM BAR is only handled for the hardware domain and for guest domains
>>>>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>>>>> for x86 only and it might not be used by other architectures without
>>>>>> emulating x86. Other use-cases may include using that expansion ROM before
>>>>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>>>>> wants to use the ROM code which seems to be rare.
>>>>> In the generic code, bars for ioports are actually skipped (check code before
>>>>> in header.c, in case of ioports there is a continue) and no handler is registered for them.
>>>>> The consequence will be that a guest will access hardware when reading those BARs.
>>>> Yes, this seems to be a valid point
>>> So, with the approach we have developed these days we will ignore all writes
>>> and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit
>>> register handlers employed. Thus, this case will fall into unhandled clause.
>> Except that I guess BARs are special in that reads may not return ~0,
>> or else the low bits carry a meaning we don't want to convey. Unused
>> BARs need to be hard-wired to 0, I think.
> So, you mean we should have 2 sets of BAR handlers for guests:
> 1. normal emulation (these are implemented in this patch)
> 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO etc.
> 
> Is this what you mean?

I think that's what we're going to need, yes.

Jan
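
The two sets of guest BAR handlers agreed on above could look roughly
like the following. This is a standalone sketch under stated
assumptions: struct guest_bar and both functions are hypothetical and
do not use the real vPCI handler signatures, and only 32-bit memory
BARs with no type bits set are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-BAR state for the guest view. */
struct guest_bar {
    bool emulated;      /* false: ROM, I/O, or otherwise unhandled BAR */
    uint32_t size;      /* power of two, at least 16, if emulated */
    uint32_t guest_reg; /* value the guest sees when reading the BAR */
};

/* Emulated BARs return the guest view; all others are hard-wired to 0. */
static uint32_t guest_bar_read(const struct guest_bar *bar)
{
    return bar->emulated ? bar->guest_reg : 0;
}

/* Emulated BARs keep only the address bits; all others ignore writes. */
static void guest_bar_write(struct guest_bar *bar, uint32_t val)
{
    if ( !bar->emulated )
        return;

    /* The masking below is only valid for power-of-two sizes. */
    assert(bar->size && !(bar->size & (bar->size - 1)));
    bar->guest_reg = val & ~(bar->size - 1);
}
```

A sizing probe (write ~0, read back) then yields 0 for an unhandled BAR
and the size mask for an emulated one.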




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:50             ` Jan Beulich
@ 2022-02-03 12:53               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 12:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Xen-devel, Julien Grall, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Rahul Singh, roger.pau, Bertrand Marquis,
	Oleksandr Andrushchenko



On 03.02.22 14:50, Jan Beulich wrote:
> On 03.02.2022 13:48, Oleksandr Andrushchenko wrote:
>> Hi, Jan!
>>
>> On 03.02.22 14:44, Jan Beulich wrote:
>>> On 03.02.2022 13:36, Oleksandr Andrushchenko wrote:
>>>> Hi, Bertrand!
>>>>
>>>> On 26.11.21 14:19, Oleksandr Andrushchenko wrote:
>>>>> Hi, Bertrand!
>>>>>
>>>>> On 25.11.21 18:28, Bertrand Marquis wrote:
>>>>>> Hi Oleksandr,
>>>>>>
>>>>>>> On 25 Nov 2021, at 11:02, Oleksandr Andrushchenko <andr2000@gmail.com> wrote:
>>>>>>>
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Add relevant vpci register handlers when assigning PCI device to a domain
>>>>>>> and remove those when de-assigning. This allows having different
>>>>>>> handlers for different domains, e.g. hwdom and other guests.
>>>>>>>
>>>>>>> Emulate guest BAR register values: this allows creating a guest view
>>>>>>> of the registers and emulates size and properties probe as it is done
>>>>>>> during PCI device enumeration by the guest.
>>>>>>>
>>>>>>> ROM BAR is only handled for the hardware domain and for guest domains
>>>>>>> there is a stub: at the moment PCI expansion ROM handling is supported
>>>>>>> for x86 only and it might not be used by other architectures without
>>>>>>> emulating x86. Other use-cases may include using that expansion ROM before
>>>>>>> Xen boots, hence no emulation is needed in Xen itself. Or when a guest
>>>>>>> wants to use the ROM code which seems to be rare.
>>>>>> In the generic code, bars for ioports are actually skipped (check code before
>>>>>> in header.c, in case of ioports there is a continue) and no handler is registered for them.
>>>>>> The consequence will be that a guest will access hardware when reading those BARs.
>>>>> Yes, this seems to be a valid point
>>>> So, with the approach we have developed these days we will ignore all writes
>>>> and return ~0 for reads for all unhandled ops, e.g. those which do not have explicit
>>>> register handlers employed. Thus, this case will fall into unhandled clause.
>>> Except that I guess BARs are special in that reads may not return ~0,
>>> or else the low bits carry a meaning we don't want to convey. Unused
>>> BARs need to be hard-wired to 0, I think.
>> So, you mean we should have 2 sets of BAR handlers for guests:
>> 1. normal emulation (these are implemented in this patch)
>> 2. all other BARs: read 0/ignore write for all other BARs, including ROM, IO etc.
>>
>> Is this what you mean?
> I think that's what we're going to need, yes.
OK, then I'll add that to v6 of this patch.
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:45       ` Oleksandr Andrushchenko
@ 2022-02-03 12:54         ` Jan Beulich
  2022-02-03 13:30           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-03 12:54 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>> Also memory decoding needs to be initially disabled when used by
>>> guests, in order to prevent the BAR being placed on top of a RAM
>>> region. The guest physmap will be different from the host one, so it's
>>> possible for BARs to end up placed on top of RAM regions initially
>>> until the firmware or OS places them at a suitable address.
>> Agree, memory decoding must be disabled
> Isn't it already achieved by the toolstack resetting the PCI device
> while assigning  it to a guest?

Iirc the tool stack would reset a device only after getting it back from
a DomU. When coming straight from Dom0 or DomIO, no reset would be
performed. Furthermore, (again iirc) there are cases where there's no
known (standard) way to reset a device. Assigning such to a guest when
it previously was owned by another one is risky (and hence needs an
admin knowing what they're doing), but may be acceptable in particular
when e.g. simply rebooting a guest.

IOW - I don't think you can rely on the bit being in a particular state.

Jan




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 12:54         ` Jan Beulich
@ 2022-02-03 13:30           ` Oleksandr Andrushchenko
  2022-02-03 14:04             ` Jan Beulich
  2022-02-03 14:05             ` Roger Pau Monné
  0 siblings, 2 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 13:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné



On 03.02.22 14:54, Jan Beulich wrote:
> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>>> Also memory decoding needs to be initially disabled when used by
>>>> guests, in order to prevent the BAR being placed on top of a RAM
>>>> region. The guest physmap will be different from the host one, so it's
>>>> possible for BARs to end up placed on top of RAM regions initially
>>>> until the firmware or OS places them at a suitable address.
>>> Agree, memory decoding must be disabled
>> Isn't it already achieved by the toolstack resetting the PCI device
>> while assigning  it to a guest?
> Iirc the tool stack would reset a device only after getting it back from
> a DomU. When coming straight from Dom0 or DomIO, no reset would be
> performed. Furthermore, (again iirc) there are cases where there's no
> known (standard) way to reset a device. Assigning such to a guest when
> it previously was owned by another one is risky (and hence needs an
> admin knowing what they're doing), but may be acceptable in particular
> when e.g. simply rebooting a guest.
>
> IOW - I don't think you can rely on the bit being in a particular state.
So, you mean something like:

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 7695158e6445..9ebd57472da8 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
              return rc;
      }

+    /*
+     * Memory decoding needs to be initially disabled when used by
+     * guests, in order to prevent the BAR being placed on top of a RAM
+     * region.
+     */
+    if ( !is_hwdom )
+        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
+
      return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
  }
  REGISTER_VPCI_INIT(init_bars, VPCI_PRIORITY_MIDDLE);

> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 13:30           ` Oleksandr Andrushchenko
@ 2022-02-03 14:04             ` Jan Beulich
  2022-02-03 14:19               ` Oleksandr Andrushchenko
  2022-02-03 14:05             ` Roger Pau Monné
  1 sibling, 1 reply; 130+ messages in thread
From: Jan Beulich @ 2022-02-03 14:04 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné

On 03.02.2022 14:30, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.02.22 14:54, Jan Beulich wrote:
>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>>>> Also memory decoding needs to be initially disabled when used by
>>>>> guests, in order to prevent the BAR being placed on top of a RAM
>>>>> region. The guest physmap will be different from the host one, so it's
>>>>> possible for BARs to end up placed on top of RAM regions initially
>>>>> until the firmware or OS places them at a suitable address.
>>>> Agree, memory decoding must be disabled
>>> Isn't it already achieved by the toolstack resetting the PCI device
>>> while assigning  it to a guest?
>> Iirc the tool stack would reset a device only after getting it back from
>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>> performed. Furthermore, (again iirc) there are cases where there's no
>> known (standard) way to reset a device. Assigning such to a guest when
>> it previously was owned by another one is risky (and hence needs an
>> admin knowing what they're doing), but may be acceptable in particular
>> when e.g. simply rebooting a guest.
>>
>> IOW - I don't think you can rely on the bit being in a particular state.
> So, you mean something like:

Perhaps, but then I think ...

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>               return rc;
>       }
> 
> +    /*
> +     * Memory decoding needs to be initially disabled when used by
> +     * guests, in order to prevent the BAR being placed on top of a RAM
> +     * region.
> +     */
> +    if ( !is_hwdom )
> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
> +
>       return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;

... you also want to update cmd, thus avoiding the call to modify_bars().

And btw, from an abstract pov the same is true for I/O decoding: I
realize that you mean to leave I/O port BARs aside for the moment, but I
think the command register handling could very well take care of both.

Which quickly gets us to the bus master enable bit: I think that one
should initially be off too. Making me wonder: Doesn't the PCI spec
define what the reset state of this register is? If so, that's what I
think we want to put in place for DomU-s.

Jan
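
As a rough illustration of the suggestion above, the initial command
value for a DomU could be derived by clearing I/O decoding, memory
decoding, and bus mastering, and the result then used to decide whether
any BAR mapping is needed. This is a standalone sketch: both helper
names are hypothetical, and only the bit values follow the PCI
specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_IO     0x0001 /* I/O space decoding enable */
#define PCI_COMMAND_MEMORY 0x0002 /* memory space decoding enable */
#define PCI_COMMAND_MASTER 0x0004 /* bus master enable */

/*
 * Command value to start from when a device is set up for a domain.
 * The hardware domain keeps what the device currently has; guests
 * start with decoding and bus mastering off.
 */
static uint16_t initial_cmd(uint16_t hw_cmd, bool is_hwdom)
{
    if ( is_hwdom )
        return hw_cmd;

    return hw_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
                      PCI_COMMAND_MASTER);
}

/* BARs only need mapping when memory decoding ends up enabled. */
static bool needs_bar_mapping(uint16_t cmd)
{
    return cmd & PCI_COMMAND_MEMORY;
}
```

Because the updated value is what gets tested afterwards, a guest never
takes the mapping path at initialisation, whatever state the device was
left in.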




* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 13:30           ` Oleksandr Andrushchenko
  2022-02-03 14:04             ` Jan Beulich
@ 2022-02-03 14:05             ` Roger Pau Monné
  2022-02-03 14:26               ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 130+ messages in thread
From: Roger Pau Monné @ 2022-02-03 14:05 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: Jan Beulich, xen-devel, julien, sstabellini,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Artem Mygaiev,
	andrew.cooper3, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh

On Thu, Feb 03, 2022 at 01:30:26PM +0000, Oleksandr Andrushchenko wrote:
> 
> 
> On 03.02.22 14:54, Jan Beulich wrote:
> > On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
> >>>> Also memory decoding needs to be initially disabled when used by
> >>>> guests, in order to prevent the BAR being placed on top of a RAM
> >>>> region. The guest physmap will be different from the host one, so it's
> >>>> possible for BARs to end up placed on top of RAM regions initially
> >>>> until the firmware or OS places them at a suitable address.
> >>> Agree, memory decoding must be disabled
> >> Isn't it already achieved by the toolstack resetting the PCI device
> >> while assigning  it to a guest?
> > Iirc the tool stack would reset a device only after getting it back from
> > a DomU. When coming straight from Dom0 or DomIO, no reset would be
> > performed. Furthermore, (again iirc) there are cases where there's no
> > known (standard) way to reset a device. Assigning such to a guest when
> > it previously was owned by another one is risky (and hence needs an
> > admin knowing what they're doing), but may be acceptable in particular
> > when e.g. simply rebooting a guest.
> >
> > IOW - I don't think you can rely on the bit being in a particular state.
> So, you mean something like:
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 7695158e6445..9ebd57472da8 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>               return rc;
>       }
> 
> +    /*
> +     * Memory decoding needs to be initially disabled when used by
> +     * guests, in order to prevent the BAR being placed on top of a RAM
> +     * region.
> +     */
> +    if ( !is_hwdom )
> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);

Memory decoding is already disabled here, so you just need to avoid
enabling it, for example:

    /*
     * Memory decoding needs to be initially disabled when used by
     * guests, in order to prevent the BARs being mapped at gfn 0 by
     * default.
     */
    if ( !is_hwdom )
        cmd &= ~PCI_COMMAND_MEMORY;

>       return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;

This is important here because guest_reg won't be set (ie: will be set
to 0) so if for some reason memory decoding was enabled you would end
up with BARs mappings overlapping at gfn 0.

Thanks, Roger.



* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 14:04             ` Jan Beulich
@ 2022-02-03 14:19               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 14:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Roger Pau Monné



On 03.02.22 16:04, Jan Beulich wrote:
> On 03.02.2022 14:30, Oleksandr Andrushchenko wrote:
>>
>> On 03.02.22 14:54, Jan Beulich wrote:
>>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>>>>> Also memory decoding needs to be initially disabled when used by
>>>>>> guests, in order to prevent the BAR being placed on top of a RAM
>>>>>> region. The guest physmap will be different from the host one, so it's
>>>>>> possible for BARs to end up placed on top of RAM regions initially
>>>>>> until the firmware or OS places them at a suitable address.
>>>>> Agree, memory decoding must be disabled
>>>> Isn't it already achieved by the toolstack resetting the PCI device
>>>> while assigning  it to a guest?
>>> Iirc the tool stack would reset a device only after getting it back from
>>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>>> performed. Furthermore, (again iirc) there are cases where there's no
>>> known (standard) way to reset a device. Assigning such to a guest when
>>> it previously was owned by another one is risky (and hence needs an
>>> admin knowing what they're doing), but may be acceptable in particular
>>> when e.g. simply rebooting a guest.
>>>
>>> IOW - I don't think you can rely on the bit being in a particular state.
>> So, you mean something like:
> Perhaps, but then I think ...
>
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>>                return rc;
>>        }
>>
>> +    /*
>> +     * Memory decoding needs to be initially disabled when used by
>> +     * guests, in order to prevent the BAR being placed on top of a RAM
>> +     * region.
>> +     */
>> +    if ( !is_hwdom )
>> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
>> +
>>        return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
> ... you also want to update cmd, thus avoiding the call to modify_bars().
>
> And btw, from an abstract pov the same is true for I/O decoding: I
> realize that you mean to leave I/O port BARs aside for the moment, but I
> think the command register handling could very well take care of both.
>
> Which quickly gets us to the bus master enable bit: I think that one
> should initially be off too. Making me wonder: Doesn't the PCI spec
> define what the reset state of this register is? If so, that's what I
> think we want to put in place for DomU-s.
The spec I have says that all bits are typically 0 after reset.
So, it seems reasonable to just write 0 to PCI_COMMAND.
> Jan
>
Thank you,
Oleksandr


* Re: [PATCH v5 06/14] vpci/header: implement guest BAR register handlers
  2022-02-03 14:05             ` Roger Pau Monné
@ 2022-02-03 14:26               ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 130+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-03 14:26 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, andrew.cooper3, george.dunlap,
	paul, Bertrand Marquis, Rahul Singh, Oleksandr Andrushchenko



On 03.02.22 16:05, Roger Pau Monné wrote:
> On Thu, Feb 03, 2022 at 01:30:26PM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 03.02.22 14:54, Jan Beulich wrote:
>>> On 03.02.2022 13:45, Oleksandr Andrushchenko wrote:
>>>>>> Also memory decoding needs to be initially disabled when used by
>>>>>> guests, in order to prevent the BAR being placed on top of a RAM
>>>>>> region. The guest physmap will be different from the host one, so it's
>>>>>> possible for BARs to end up placed on top of RAM regions initially
>>>>>> until the firmware or OS places them at a suitable address.
>>>>> Agree, memory decoding must be disabled
>>>> Isn't it already achieved by the toolstack resetting the PCI device
>>>> while assigning  it to a guest?
>>> Iirc the tool stack would reset a device only after getting it back from
>>> a DomU. When coming straight from Dom0 or DomIO, no reset would be
>>> performed. Furthermore, (again iirc) there are cases where there's no
>>> known (standard) way to reset a device. Assigning such to a guest when
>>> it previously was owned by another one is risky (and hence needs an
>>> admin knowing what they're doing), but may be acceptable in particular
>>> when e.g. simply rebooting a guest.
>>>
>>> IOW - I don't think you can rely on the bit being in a particular state.
>> So, you mean something like:
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 7695158e6445..9ebd57472da8 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -808,6 +808,14 @@ static int init_bars(struct pci_dev *pdev)
>>                return rc;
>>        }
>>
>> +    /*
>> +     * Memory decoding needs to be initially disabled when used by
>> +     * guests, in order to prevent the BAR being placed on top of a RAM
>> +     * region.
>> +     */
>> +    if ( !is_hwdom )
>> +        pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
> Memory decoding is already disabled here, so you just need to avoid
> enabling it, for example:
>
>      /*
>       * Memory decoding needs to be initially disabled when used by
>       * guests, in order to prevent the BARs being mapped at gfn 0 by
>       * default.
>       */
>      if ( !is_hwdom )
>          cmd &= ~PCI_COMMAND_MEMORY;
>
>>        return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
> This is important here because guest_reg won't be set (ie: will be set
> to 0) so if for some reason memory decoding was enabled you would end
> up with BARs mappings overlapping at gfn 0.
Then patch [1] already does what we need for the guest, I guess.
I am thinking of keeping it in the series, as it solves exactly the problem
we are trying to solve.
>
> Thanks, Roger.
[1] https://patchwork.kernel.org/project/xen-devel/patch/20211125110251.2877218-11-andr2000@gmail.com/
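
For illustration, the approach suggested above (mask the memory decoding bit in the local cmd value for non-hardware domains, rather than writing the register again) can be sketched as a standalone snippet. This is only a minimal sketch and not the actual Xen code: initial_cmd() and should_map_bars() are hypothetical helpers made up for this example, and only the PCI_COMMAND_MEMORY bit value (0x2) comes from the standard PCI command register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the PCI command register: memory space decoding enable. */
#define PCI_COMMAND_MEMORY 0x2u

/*
 * Hypothetical helper (not a Xen function): compute the command value
 * that the end of init_bars() would act on. For guests the memory
 * decoding bit is dropped so that BARs are not mapped before the guest
 * has programmed them; the hardware domain keeps the value unchanged.
 */
static uint16_t initial_cmd(uint16_t cmd, bool is_hwdom)
{
    if ( !is_hwdom )
        cmd &= (uint16_t)~PCI_COMMAND_MEMORY;

    return cmd;
}

/*
 * Hypothetical stand-in for the final decision in init_bars():
 * BARs are mapped only if memory decoding is still enabled in cmd.
 */
static bool should_map_bars(uint16_t cmd, bool is_hwdom)
{
    return (initial_cmd(cmd, is_hwdom) & PCI_COMMAND_MEMORY) != 0;
}
```

With this, a guest never reaches the mapping path at init time even if the device was left with decoding enabled, while the hardware domain behaves as before.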
