All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] PCI devices passthrough pre-req patches
@ 2022-02-01 16:25 Oleksandr Andrushchenko
  2022-02-01 16:25 ` [PATCH 1/4] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Hi, all!

While working on vPCI series [1] I have created number of patches that
either add some useful helpers or serve as the ground for the upcoming
vPCI changes or both.

To ease the task of reviewing the bigger vPCI series I am sending these
now with the hope they can be accepted.

I would like to thank Roger and Jan for providing valuable comments and
ideas, some of which have materialized here.

Thank you,
Oleksandr

[1] https://patchwork.kernel.org/project/xen-devel/list/?series=585781

Oleksandr Andrushchenko (3):
  rangeset: add RANGESETF_no_print flag
  rangeset: add rangeset_reset helper function
  vpci: shrink critical section in vpci_{read/write}

Roger Pau Monne (1):
  vpci: move lock outside of struct vpci

 tools/tests/vpci/emul.h       |  5 ++-
 tools/tests/vpci/main.c       |  4 +--
 xen/arch/x86/hvm/vmsi.c       |  8 ++---
 xen/common/rangeset.c         | 11 +++++-
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c     | 21 ++++++++----
 xen/drivers/vpci/msi.c        | 11 ++++--
 xen/drivers/vpci/msix.c       |  8 ++---
 xen/drivers/vpci/vpci.c       | 64 ++++++++++++++++++++++-------------
 xen/include/xen/pci.h         |  1 +
 xen/include/xen/rangeset.h    |  8 +++--
 xen/include/xen/vpci.h        |  3 +-
 12 files changed, 94 insertions(+), 51 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] rangeset: add RANGESETF_no_print flag
  2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
@ 2022-02-01 16:25 ` Oleksandr Andrushchenko
  2022-02-01 16:25 ` [PATCH 2/4] rangeset: add rangeset_reset helper function Oleksandr Andrushchenko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

There are range sets which should not be printed, so introduce a flag
which allows marking those as such. Implement relevant logic to skip
such entries while printing.

While at it also simplify the definition of the flags by directly
defining those without helpers.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Since v5:
- comment indentation (Jan)
Since v1:
- update BUG_ON with new flag
- simplify the definition of the flags
---
 xen/common/rangeset.c      | 5 ++++-
 xen/include/xen/rangeset.h | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 885b6b15c229..ea27d651723b 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -433,7 +433,7 @@ struct rangeset *rangeset_new(
     INIT_LIST_HEAD(&r->range_list);
     r->nr_ranges = -1;
 
-    BUG_ON(flags & ~RANGESETF_prettyprint_hex);
+    BUG_ON(flags & ~(RANGESETF_prettyprint_hex | RANGESETF_no_print));
     r->flags = flags;
 
     safe_strcpy(r->name, name ?: "(no name)");
@@ -575,6 +575,9 @@ void rangeset_domain_printk(
 
     list_for_each_entry ( r, &d->rangesets, rangeset_list )
     {
+        if ( r->flags & RANGESETF_no_print )
+            continue;
+
         printk("    ");
         rangeset_printk(r);
         printk("\n");
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 135f33f6066f..f7c69394d66a 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -49,8 +49,9 @@ void rangeset_limit(
 
 /* Flags for passing to rangeset_new(). */
  /* Pretty-print range limits in hexadecimal. */
-#define _RANGESETF_prettyprint_hex 0
-#define RANGESETF_prettyprint_hex  (1U << _RANGESETF_prettyprint_hex)
+#define RANGESETF_prettyprint_hex   (1U << 0)
+ /* Do not print entries marked with this flag. */
+#define RANGESETF_no_print          (1U << 1)
 
 bool_t __must_check rangeset_is_empty(
     const struct rangeset *r);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] rangeset: add rangeset_reset helper function
  2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
  2022-02-01 16:25 ` [PATCH 1/4] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
@ 2022-02-01 16:25 ` Oleksandr Andrushchenko
  2022-02-01 17:05   ` Julien Grall
  2022-02-01 16:25 ` [PATCH 3/4] vpci: shrink critical section in vpci_{read/write} Oleksandr Andrushchenko
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

This helper destroys all the ranges of the rangeset given.
Please note, that it uses rangeset_remove_range which returns an error
code on failure. This error code can be ignored as while destroying all
the ranges no memory allocation is expected, so in this case it must not
fail.

To make sure this remains valid use BUG_ON if that changes in the future.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 xen/common/rangeset.c      | 6 ++++++
 xen/include/xen/rangeset.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index ea27d651723b..9ca2b06cff22 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -525,6 +525,12 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
     write_unlock(&b->lock);
 }
 
+void rangeset_reset(struct rangeset *r)
+{
+    /* This doesn't allocate anything and must not fail. */
+    BUG_ON(rangeset_remove_range(r, 0, ~0ULL));
+}
+
 /*****************************
  * Pretty-printing functions
  */
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index f7c69394d66a..e0d70d88bdd7 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -95,6 +95,9 @@ bool_t __must_check rangeset_contains_singleton(
 /* swap contents */
 void rangeset_swap(struct rangeset *a, struct rangeset *b);
 
+/* Destroy all ranges. */
+void rangeset_reset(struct rangeset *r);
+
 /* Rangeset pretty printing. */
 void rangeset_domain_printk(
     struct domain *d);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
  2022-02-01 16:25 ` [PATCH 1/4] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
  2022-02-01 16:25 ` [PATCH 2/4] rangeset: add rangeset_reset helper function Oleksandr Andrushchenko
@ 2022-02-01 16:25 ` Oleksandr Andrushchenko
  2022-02-02  8:44   ` Roger Pau Monné
  2022-02-01 16:25 ` [PATCH 4/4] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
  2022-02-02  8:48 ` [PATCH 0/4] PCI devices passthrough pre-req patches Jan Beulich
  4 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Shrink critical section in vpci_{read/write} as racing calls to
vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
around pci_conf_{read,write} functions, and the required locking (in
case of using the IO ports) is already taken care in pci_conf_{read,write}.

Please note, that we anyways split 64bit writes into two 32bit ones
without taking the lock for the whole duration of the access, so it is
possible to see a partially updated state as a result of a 64bit write:
the PCI(e) specification don't seem to specify whether the ECAM is allowed
to split memory transactions into multiple Configuration Requests and
whether those could then interleave with requests from a different CPU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

---
New in v6
---
 xen/drivers/vpci/vpci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 657697fe3406..fb0947179b79 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -370,6 +370,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci->lock);
 
     if ( data_offset < size )
     {
@@ -379,7 +380,6 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 
         data = merge_result(data, tmp_data, size - data_offset, data_offset);
     }
-    spin_unlock(&pdev->vpci->lock);
 
     return data & (0xffffffff >> (32 - 8 * size));
 }
@@ -475,13 +475,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
             break;
         ASSERT(data_offset < size);
     }
+    spin_unlock(&pdev->vpci->lock);
 
     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
         vpci_write_hw(sbdf, reg + data_offset, size - data_offset,
                       data >> (data_offset * 8));
-
-    spin_unlock(&pdev->vpci->lock);
 }
 
 /* Helper function to check an access size and alignment on vpci space. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
                   ` (2 preceding siblings ...)
  2022-02-01 16:25 ` [PATCH 3/4] vpci: shrink critical section in vpci_{read/write} Oleksandr Andrushchenko
@ 2022-02-01 16:25 ` Oleksandr Andrushchenko
  2022-02-02  9:22   ` Jan Beulich
  2022-02-02  9:45   ` Roger Pau Monné
  2022-02-02  8:48 ` [PATCH 0/4] PCI devices passthrough pre-req patches Jan Beulich
  4 siblings, 2 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 16:25 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

From: Roger Pau Monne <roger.pau@citrix.com>

This way the lock can be used to check whether vpci is present, and
removal can be performed while holding the lock, in order to make
sure there are no accesses to the contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
---
New in v5 of this series: this is an updated version of the patch published at
https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/

Changes since v5:
 - do not split code into vpci_remove_device_handlers_locked yet
 - move INIT_LIST_HEAD outside the locked region (Jan)
 - stripped out locking optimizations for vpci_{read|write} into a
   dedicated patch
Changes since v2:
 - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
Changes since v1:
 - Assert that vpci_lock is locked in vpci_remove_device_locked.
 - Remove double newline.
 - Shrink critical section in vpci_{read/write}.
---
 tools/tests/vpci/emul.h       |  5 ++-
 tools/tests/vpci/main.c       |  4 +--
 xen/arch/x86/hvm/vmsi.c       |  8 ++---
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c     | 21 ++++++++----
 xen/drivers/vpci/msi.c        | 11 ++++--
 xen/drivers/vpci/msix.c       |  8 ++---
 xen/drivers/vpci/vpci.c       | 63 ++++++++++++++++++++++-------------
 xen/include/xen/pci.h         |  1 +
 xen/include/xen/vpci.h        |  3 +-
 10 files changed, 78 insertions(+), 47 deletions(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 2e1d3057c9d8..d018fb5eef21 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -44,6 +44,7 @@ struct domain {
 };
 
 struct pci_dev {
+    bool vpci_lock;
     struct vpci *vpci;
 };
 
@@ -53,10 +54,8 @@ struct vcpu
 };
 
 extern const struct vcpu *current;
-extern const struct pci_dev test_pdev;
+extern struct pci_dev test_pdev;
 
-typedef bool spinlock_t;
-#define spin_lock_init(l) (*(l) = false)
 #define spin_lock(l) (*(l) = true)
 #define spin_unlock(l) (*(l) = false)
 
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006bb9..26c95b08b6b1 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -23,7 +23,8 @@ static struct vpci vpci;
 
 const static struct domain d;
 
-const struct pci_dev test_pdev = {
+struct pci_dev test_pdev = {
+    .vpci_lock = false,
     .vpci = &vpci,
 };
 
@@ -158,7 +159,6 @@ main(int argc, char **argv)
     int rc;
 
     INIT_LIST_HEAD(&vpci.handlers);
-    spin_lock_init(&vpci.lock);
 
     VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
     VPCI_READ_CHECK(0, 4, r0);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..1f7a37f78264 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
         {
             struct pci_dev *pdev = msix->pdev;
 
-            spin_unlock(&msix->pdev->vpci->lock);
+            spin_unlock(&msix->pdev->vpci_lock);
             process_pending_softirqs();
             /* NB: we assume that pdev cannot go away for an alive domain. */
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 return -EBUSY;
-            if ( pdev->vpci->msix != msix )
+            if ( !pdev->vpci || pdev->vpci->msix != msix )
             {
-                spin_unlock(&pdev->vpci->lock);
+                spin_unlock(&pdev->vpci_lock);
                 return -EAGAIN;
             }
         }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 1fad80362f0e..af648c6a19b5 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -328,6 +328,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
     *((u8*) &pdev->bus) = bus;
     *((u8*) &pdev->devfn) = devfn;
     pdev->domain = NULL;
+    spin_lock_init(&pdev->vpci_lock);
 
     arch_pci_init_pdev(pdev);
 
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40ff79c33f8f..bd23c0274d48 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
         if ( rc == -ERESTART )
             return true;
 
-        spin_lock(&v->vpci.pdev->vpci->lock);
-        /* Disable memory decoding unconditionally on failure. */
-        modify_decoding(v->vpci.pdev,
-                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-                        !rc && v->vpci.rom_only);
-        spin_unlock(&v->vpci.pdev->vpci->lock);
+        spin_lock(&v->vpci.pdev->vpci_lock);
+        if ( v->vpci.pdev->vpci )
+            /* Disable memory decoding unconditionally on failure. */
+            modify_decoding(v->vpci.pdev,
+                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
+                            !rc && v->vpci.rom_only);
+        spin_unlock(&v->vpci.pdev->vpci_lock);
 
         rangeset_destroy(v->vpci.mem);
         v->vpci.mem = NULL;
@@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
                 continue;
         }
 
+        spin_lock(&tmp->vpci_lock);
+        if ( !tmp->vpci )
+        {
+            spin_unlock(&tmp->vpci_lock);
+            continue;
+        }
         for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
         {
             const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
@@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
             rc = rangeset_remove_range(mem, start, end);
             if ( rc )
             {
+                spin_unlock(&tmp->vpci_lock);
                 printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
                        start, end, rc);
                 rangeset_destroy(mem);
                 return rc;
             }
         }
+        spin_unlock(&tmp->vpci_lock);
     }
 
     ASSERT(dev);
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 5757a7aed20f..e3ce46869dad 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -270,7 +270,7 @@ void vpci_dump_msi(void)
     rcu_read_lock(&domlist_read_lock);
     for_each_domain ( d )
     {
-        const struct pci_dev *pdev;
+        struct pci_dev *pdev;
 
         if ( !has_vpci(d) )
             continue;
@@ -282,8 +282,13 @@ void vpci_dump_msi(void)
             const struct vpci_msi *msi;
             const struct vpci_msix *msix;
 
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !spin_trylock(&pdev->vpci_lock) )
                 continue;
+            if ( !pdev->vpci )
+            {
+                spin_unlock(&pdev->vpci_lock);
+                continue;
+            }
 
             msi = pdev->vpci->msi;
             if ( msi && msi->enabled )
@@ -323,7 +328,7 @@ void vpci_dump_msi(void)
                 }
             }
 
-            spin_unlock(&pdev->vpci->lock);
+            spin_unlock(&pdev->vpci_lock);
             process_pending_softirqs();
         }
     }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 846f1b8d7038..5310cc3ff520 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -225,7 +225,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -254,7 +254,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
@@ -297,7 +297,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
+    spin_lock(&msix->pdev->vpci_lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -370,7 +370,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
         ASSERT_UNREACHABLE();
         break;
     }
-    spin_unlock(&msix->pdev->vpci->lock);
+    spin_unlock(&msix->pdev->vpci_lock);
 
     return X86EMUL_OKAY;
 }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index fb0947179b79..c015a4d77540 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
-void vpci_remove_device(struct pci_dev *pdev)
+static void vpci_remove_device_locked(struct pci_dev *pdev)
 {
-    if ( !has_vpci(pdev->domain) )
-        return;
+    ASSERT(spin_is_locked(&pdev->vpci_lock));
 
-    spin_lock(&pdev->vpci->lock);
     while ( !list_empty(&pdev->vpci->handlers) )
     {
         struct vpci_register *r = list_first_entry(&pdev->vpci->handlers,
@@ -50,15 +48,26 @@ void vpci_remove_device(struct pci_dev *pdev)
         list_del(&r->node);
         xfree(r);
     }
-    spin_unlock(&pdev->vpci->lock);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
     pdev->vpci = NULL;
 }
 
+void vpci_remove_device(struct pci_dev *pdev)
+{
+    if ( !has_vpci(pdev->domain) )
+        return;
+
+    spin_lock(&pdev->vpci_lock);
+    if ( pdev->vpci )
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
+}
+
 int vpci_add_handlers(struct pci_dev *pdev)
 {
+    struct vpci *vpci;
     unsigned int i;
     int rc = 0;
 
@@ -68,12 +77,14 @@ int vpci_add_handlers(struct pci_dev *pdev)
     /* We should not get here twice for the same device. */
     ASSERT(!pdev->vpci);
 
-    pdev->vpci = xzalloc(struct vpci);
-    if ( !pdev->vpci )
+    vpci = xzalloc(struct vpci);
+    if ( !vpci )
         return -ENOMEM;
 
-    INIT_LIST_HEAD(&pdev->vpci->handlers);
-    spin_lock_init(&pdev->vpci->lock);
+    INIT_LIST_HEAD(&vpci->handlers);
+
+    spin_lock(&pdev->vpci_lock);
+    pdev->vpci = vpci;
 
     for ( i = 0; i < NUM_VPCI_INIT; i++ )
     {
@@ -83,7 +94,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
     }
 
     if ( rc )
-        vpci_remove_device(pdev);
+        vpci_remove_device_locked(pdev);
+    spin_unlock(&pdev->vpci_lock);
 
     return rc;
 }
@@ -152,8 +164,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
     r->offset = offset;
     r->private = data;
 
-    spin_lock(&vpci->lock);
-
     /* The list of handlers must be kept sorted at all times. */
     list_for_each ( prev, &vpci->handlers )
     {
@@ -165,14 +175,12 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
             break;
         if ( cmp == 0 )
         {
-            spin_unlock(&vpci->lock);
             xfree(r);
             return -EEXIST;
         }
     }
 
     list_add_tail(&r->node, prev);
-    spin_unlock(&vpci->lock);
 
     return 0;
 }
@@ -183,7 +191,6 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
     const struct vpci_register r = { .offset = offset, .size = size };
     struct vpci_register *rm;
 
-    spin_lock(&vpci->lock);
     list_for_each_entry ( rm, &vpci->handlers, node )
     {
         int cmp = vpci_register_cmp(&r, rm);
@@ -195,14 +202,12 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
         if ( !cmp && rm->offset == offset && rm->size == size )
         {
             list_del(&rm->node);
-            spin_unlock(&vpci->lock);
             xfree(rm);
             return 0;
         }
         if ( cmp <= 0 )
             break;
     }
-    spin_unlock(&vpci->lock);
 
     return -ENOENT;
 }
@@ -311,7 +316,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
 uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
     uint32_t data = ~(uint32_t)0;
@@ -327,7 +332,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
     if ( !pdev )
         return vpci_read_hw(sbdf, reg, size);
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        return vpci_read_hw(sbdf, reg, size);
+    }
 
     /* Read from the hardware or the emulated register handlers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -370,7 +380,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
             break;
         ASSERT(data_offset < size);
     }
-    spin_unlock(&pdev->vpci->lock);
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
     {
@@ -414,7 +424,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
                 uint32_t data)
 {
     const struct domain *d = current->domain;
-    const struct pci_dev *pdev;
+    struct pci_dev *pdev;
     const struct vpci_register *r;
     unsigned int data_offset = 0;
     const unsigned long *ro_map = pci_get_ro_map(sbdf.seg);
@@ -440,7 +450,14 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
         return;
     }
 
-    spin_lock(&pdev->vpci->lock);
+    spin_lock(&pdev->vpci_lock);
+    if ( !pdev->vpci )
+    {
+        spin_unlock(&pdev->vpci_lock);
+        vpci_write_hw(sbdf, reg, size, data);
+        return;
+    }
+
 
     /* Write the value to the hardware or emulated registers. */
     list_for_each_entry ( r, &pdev->vpci->handlers, node )
@@ -475,7 +492,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
             break;
         ASSERT(data_offset < size);
     }
-    spin_unlock(&pdev->vpci->lock);
+    spin_unlock(&pdev->vpci_lock);
 
     if ( data_offset < size )
         /* Tailing gap, write the remaining. */
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index b6d7e454f814..3f60d6c6c6dd 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -134,6 +134,7 @@ struct pci_dev {
     u64 vf_rlen[6];
 
     /* Data for vPCI. */
+    spinlock_t vpci_lock;
     struct vpci *vpci;
 };
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 3f32de9d7eb3..d06efc3cea46 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -31,7 +31,7 @@ int __must_check vpci_add_handlers(struct pci_dev *dev);
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
 
-/* Add/remove a register handler. */
+/* Add/remove a register handler. Must be called holding the vpci_lock. */
 int __must_check vpci_add_register(struct vpci *vpci,
                                    vpci_read_t *read_handler,
                                    vpci_write_t *write_handler,
@@ -60,7 +60,6 @@ bool __must_check vpci_process_pending(struct vcpu *v);
 struct vpci {
     /* List of vPCI handlers for a device. */
     struct list_head handlers;
-    spinlock_t lock;
 
 #ifdef __XEN__
     /* Hide the rest of the vpci struct from the user-space test harness. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] rangeset: add rangeset_reset helper function
  2022-02-01 16:25 ` [PATCH 2/4] rangeset: add rangeset_reset helper function Oleksandr Andrushchenko
@ 2022-02-01 17:05   ` Julien Grall
  2022-02-01 17:14     ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Julien Grall @ 2022-02-01 17:05 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, xen-devel
  Cc: sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

Hi,

On 01/02/2022 16:25, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> This helper destroys all the ranges of the rangeset given.
> Please note, that it uses rangeset_remove_range which returns an error
> code on failure. This error code can be ignored as while destroying all
> the ranges no memory allocation is expected, so in this case it must not
> fail.
> 
> To make sure this remains valid use BUG_ON if that changes in the future.
> 
> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   xen/common/rangeset.c      | 6 ++++++
>   xen/include/xen/rangeset.h | 3 +++
>   2 files changed, 9 insertions(+)
> 
> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
> index ea27d651723b..9ca2b06cff22 100644
> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -525,6 +525,12 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
>       write_unlock(&b->lock);
>   }
>   
> +void rangeset_reset(struct rangeset *r)
> +{
> +    /* This doesn't allocate anything and must not fail. */
> +    BUG_ON(rangeset_remove_range(r, 0, ~0ULL));

I vaguely recall some discussion in the past (not related to this 
series) that we wanted to avoid calling function have side-effect in a 
BUG_ON(). So if we decide to remove at compile-time BUG_ON(), there 
would be no issue.

But then I am not sure about the use of BUG_ON(). Can you outline why 
crashing the hypervisor is better than continuing (e.g. use WARN_ON() or 
ASSERT())?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] rangeset: add rangeset_reset helper function
  2022-02-01 17:05   ` Julien Grall
@ 2022-02-01 17:14     ` Oleksandr Andrushchenko
  2022-02-01 17:33       ` Julien Grall
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 17:14 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Julien!

On 01.02.22 19:05, Julien Grall wrote:
> Hi,
>
> On 01/02/2022 16:25, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> This helper destroys all the ranges of the rangeset given.
>> Please note, that it uses rangeset_remove_range which returns an error
>> code on failure. This error code can be ignored as while destroying all
>> the ranges no memory allocation is expected, so in this case it must not
>> fail.
>>
>> To make sure this remains valid use BUG_ON if that changes in the future.
>>
>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   xen/common/rangeset.c      | 6 ++++++
>>   xen/include/xen/rangeset.h | 3 +++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
>> index ea27d651723b..9ca2b06cff22 100644
>> --- a/xen/common/rangeset.c
>> +++ b/xen/common/rangeset.c
>> @@ -525,6 +525,12 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
>>       write_unlock(&b->lock);
>>   }
>>   +void rangeset_reset(struct rangeset *r)
>> +{
>> +    /* This doesn't allocate anything and must not fail. */
>> +    BUG_ON(rangeset_remove_range(r, 0, ~0ULL));
>
> I vaguely recall some discussion in the past (not related to this series) that we wanted to avoid calling function have side-effect in a BUG_ON(). So if we decide to remove at compile-time BUG_ON(), there would be no issue.
>
> But then I am not sure about the use of BUG_ON(). Can you outline why crashing the hypervisor is better than continuing (e.g. use WARN_ON() or ASSERT())?
Non-zero value will indicate we were not able to complete the operation
which must not fail because of the concrete use-case: we remove all the
ranges and it is not expected that this may fail.
Just to make sure this behavior does not change I use BUG_ON here which
if triggered clearly indicates that the behavior has changed and there is
a need in code change.

I can turn this into WARN_ON instead, but this may lead to memory leaks
or some other errors not handled.

>
> Cheers,
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] rangeset: add rangeset_reset helper function
  2022-02-01 17:14     ` Oleksandr Andrushchenko
@ 2022-02-01 17:33       ` Julien Grall
  2022-02-01 17:39         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Julien Grall @ 2022-02-01 17:33 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, xen-devel
  Cc: sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh



On 01/02/2022 17:14, Oleksandr Andrushchenko wrote:
> On 01.02.22 19:05, Julien Grall wrote:
>> On 01/02/2022 16:25, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> This helper destroys all the ranges of the rangeset given.
>>> Please note, that it uses rangeset_remove_range which returns an error
>>> code on failure. This error code can be ignored as while destroying all
>>> the ranges no memory allocation is expected, so in this case it must not
>>> fail.
>>>
>>> To make sure this remains valid use BUG_ON if that changes in the future.
>>>
>>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ---
>>>    xen/common/rangeset.c      | 6 ++++++
>>>    xen/include/xen/rangeset.h | 3 +++
>>>    2 files changed, 9 insertions(+)
>>>
>>> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
>>> index ea27d651723b..9ca2b06cff22 100644
>>> --- a/xen/common/rangeset.c
>>> +++ b/xen/common/rangeset.c
>>> @@ -525,6 +525,12 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
>>>        write_unlock(&b->lock);
>>>    }
>>>    +void rangeset_reset(struct rangeset *r)
>>> +{
>>> +    /* This doesn't allocate anything and must not fail. */
>>> +    BUG_ON(rangeset_remove_range(r, 0, ~0ULL));
>>
>> I vaguely recall some discussion in the past (not related to this series) that we wanted to avoid calling function have side-effect in a BUG_ON(). So if we decide to remove at compile-time BUG_ON(), there would be no issue.
>>
>> But then I am not sure about the use of BUG_ON(). Can you outline why crashing the hypervisor is better than continuing (e.g. use WARN_ON() or ASSERT())?
> Non-zero value will indicate we were not able to complete the operation
> which must not fail because of the concrete use-case: we remove all the
> ranges and it is not expected that this may fail.
> Just to make sure this behavior does not change I use BUG_ON here which
> if triggered clearly indicates that the behavior has changed and there is
> a need in code change.

Right, but that change of behavior may not be noticed during 
development. So I think we want to avoid BUG_ON() when this is possible.

> 
> I can turn this into WARN_ON instead, but this may lead to memory leaks
> or some other errors not handled.

IMHO, this is a bit better but not by much. Looking a 
rangeset_destroy(), you should be able to do it without any of the 
issues you described here. Something like:

     if ( r == NULL )
       return;

     while ( (x = first_range(r)) != NULL )
         destroy_range(r, x);

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] rangeset: add rangeset_reset helper function
  2022-02-01 17:33       ` Julien Grall
@ 2022-02-01 17:39         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-01 17:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko


On 01.02.22 19:33, Julien Grall wrote:
>
>
> On 01/02/2022 17:14, Oleksandr Andrushchenko wrote:
>> On 01.02.22 19:05, Julien Grall wrote:
>>> On 01/02/2022 16:25, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> This helper destroys all the ranges of the rangeset given.
>>>> Please note, that it uses rangeset_remove_range which returns an error
>>>> code on failure. This error code can be ignored as while destroying all
>>>> the ranges no memory allocation is expected, so in this case it must not
>>>> fail.
>>>>
>>>> To make sure this remains valid use BUG_ON if that changes in the future.
>>>>
>>>> Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>> ---
>>>>    xen/common/rangeset.c      | 6 ++++++
>>>>    xen/include/xen/rangeset.h | 3 +++
>>>>    2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
>>>> index ea27d651723b..9ca2b06cff22 100644
>>>> --- a/xen/common/rangeset.c
>>>> +++ b/xen/common/rangeset.c
>>>> @@ -525,6 +525,12 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
>>>>        write_unlock(&b->lock);
>>>>    }
>>>>    +void rangeset_reset(struct rangeset *r)
>>>> +{
>>>> +    /* This doesn't allocate anything and must not fail. */
>>>> +    BUG_ON(rangeset_remove_range(r, 0, ~0ULL));
>>>
>>> I vaguely recall some discussion in the past (not related to this series) that we wanted to avoid calling function have side-effect in a BUG_ON(). So if we decide to remove at compile-time BUG_ON(), there would be no issue.
>>>
>>> But then I am not sure about the use of BUG_ON(). Can you outline why crashing the hypervisor is better than continuing (e.g. use WARN_ON() or ASSERT())?
>> Non-zero value will indicate we were not able to complete the operation
>> which must not fail because of the concrete use-case: we remove all the
>> ranges and it is not expected that this may fail.
>> Just to make sure this behavior does not change I use BUG_ON here which
>> if triggered clearly indicates that the behavior has changed and there is
>> a need in code change.
>
> Right, but that change of behavior may not be noticed during development. So I think we want to avoid BUG_ON() when this is possible.
>
>>
>> I can turn this into WARN_ON instead, but this may lead to memory leaks
>> or some other errors not handled.
>
> IMHO, this is a bit better but not by much. Looking a rangeset_destroy(), you should be able to do it without any of the issues you described here. Something like:
>
>     if ( r == NULL )
>       return;
>
>     while ( (x = first_range(r)) != NULL )
>         destroy_range(r, x);
>
Yes, this is actually what Roger suggested to me privately on IRC.
Ok, so I'll re-work the patch as above

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-01 16:25 ` [PATCH 3/4] vpci: shrink critical section in vpci_{read/write} Oleksandr Andrushchenko
@ 2022-02-02  8:44   ` Roger Pau Monné
  2022-02-02  9:05     ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2022-02-02  8:44 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, jbeulich
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Shrink critical section in vpci_{read/write} as racing calls to
> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
> around pci_conf_{read,write} functions, and the required locking (in
> case of using the IO ports) is already taken care in pci_conf_{read,write}.
> 
> Please note, that we anyways split 64bit writes into two 32bit ones
> without taking the lock for the whole duration of the access, so it is
> possible to see a partially updated state as a result of a 64bit write:
> the PCI(e) specification don't seem to specify whether the ECAM is allowed
> to split memory transactions into multiple Configuration Requests and
> whether those could then interleave with requests from a different CPU.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Would like to make sure whether Jan still have concerns about
splitting accesses though. Also since I'm the maintainer we need a
Reviewed-by from someone else.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4] PCI devices passthrough pre-req patches
  2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
                   ` (3 preceding siblings ...)
  2022-02-01 16:25 ` [PATCH 4/4] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
@ 2022-02-02  8:48 ` Jan Beulich
  2022-02-02  9:03   ` Oleksandr Andrushchenko
  4 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2022-02-02  8:48 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
> Oleksandr Andrushchenko (3):
>   rangeset: add RANGESETF_no_print flag
>   rangeset: add rangeset_reset helper function
>   vpci: shrink critical section in vpci_{read/write}
> 
> Roger Pau Monne (1):
>   vpci: move lock outside of struct vpci

Btw, while I'll let Roger judge for the latter two, for the former
two while I appreciate you breaking this out from the larger series
I'm not convinced these will want committing without a user
appearing at least in close succession. Hence also why so far I
didn't commit patch 1, which as per its tags could have been put in
already.

Jan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4] PCI devices passthrough pre-req patches
  2022-02-02  8:48 ` [PATCH 0/4] PCI devices passthrough pre-req patches Jan Beulich
@ 2022-02-02  9:03   ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  9:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, roger.pau, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Jan!

On 02.02.22 10:48, Jan Beulich wrote:
> On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
>> Oleksandr Andrushchenko (3):
>>    rangeset: add RANGESETF_no_print flag
>>    rangeset: add rangeset_reset helper function
>>    vpci: shrink critical section in vpci_{read/write}
>>
>> Roger Pau Monne (1):
>>    vpci: move lock outside of struct vpci
> Btw, while I'll let Roger judge for the latter two, for the former
> two while I appreciate you breaking this out from the larger series
> I'm not convinced these will want committing without a user
> appearing at least in close succession. Hence also why so far I
> didn't commit patch 1, which as per its tags could have been put in
> already.
This is fair
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-02  8:44   ` Roger Pau Monné
@ 2022-02-02  9:05     ` Jan Beulich
  2022-02-02  9:38       ` Oleksandr Andrushchenko
  2022-02-02  9:49       ` Roger Pau Monné
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Beulich @ 2022-02-02  9:05 UTC (permalink / raw)
  To: Roger Pau Monné, Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, george.dunlap, paul, bertrand.marquis,
	rahul.singh, Oleksandr Andrushchenko

On 02.02.2022 09:44, Roger Pau Monné wrote:
> On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Oleksandr, can you please clarify authorship here? The rule of thumb is
that From: matches ...

>> Shrink critical section in vpci_{read/write} as racing calls to
>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>> around pci_conf_{read,write} functions, and the required locking (in
>> case of using the IO ports) is already taken care in pci_conf_{read,write}.
>>
>> Please note, that we anyways split 64bit writes into two 32bit ones
>> without taking the lock for the whole duration of the access, so it is
>> possible to see a partially updated state as a result of a 64bit write:
>> the PCI(e) specification don't seem to specify whether the ECAM is allowed
>> to split memory transactions into multiple Configuration Requests and
>> whether those could then interleave with requests from a different CPU.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

... the first S-o-b, as these are expected to be in chronological
order.

> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

I'll take your unconstrained ack to indicate that you're also fine
with this going in right away; see my reply to 0/4 as to the earlier
two patches. Please let me know (soonish) if I shouldn't make this
implication, but I shall wait with committing for clarification of
the question further up anyway.

> Would like to make sure whether Jan still have concerns about
> splitting accesses though.

I continue to be a little concerned, but as long as the decision is
taken consciously (and this is recorded in the description), which
clearly is the case now, I have no objections. In the end well
behaved OSes will suitably serialize accesses to config space anyway.

> Also since I'm the maintainer we need a Reviewed-by from someone else.

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I'm not sure this is strictly needed though: I'd generally consider
a 2nd (later) S-o-b as valid stand-in for R-b, at least as long as
the 2nd author doesn't scope-restrict their tag.

One further remark though: The resulting asymmetry of the locking
(covering the "head" hw read but not the "tail" one) looks a little
odd, but I will admit that I don't see a good way to restore symmetry.

Jan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-01 16:25 ` [PATCH 4/4] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
@ 2022-02-02  9:22   ` Jan Beulich
  2022-02-02 11:03     ` Oleksandr Andrushchenko
  2022-02-02  9:45   ` Roger Pau Monné
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2022-02-02  9:22 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Roger Pau Monné
  Cc: julien, sstabellini, oleksandr_tyshchenko, volodymyr_babchuk,
	Artem_Mygaiev, andrew.cooper3, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko,
	xen-devel

On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> This way the lock can be used to check whether vpci is present, and
> removal can be performed while holding the lock, in order to make
> sure there are no accesses to the contents of the vpci struct.
> Previously removal could race with vpci_read for example, since the
> lock was dropped prior to freeing pdev->vpci.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> New in v5 of this series: this is an updated version of the patch published at
> https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/
> 
> Changes since v5:

This is a little odd in a series implicitly tagged as v1.

> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>          if ( rc == -ERESTART )
>              return true;
>  
> -        spin_lock(&v->vpci.pdev->vpci->lock);
> -        /* Disable memory decoding unconditionally on failure. */
> -        modify_decoding(v->vpci.pdev,
> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> -                        !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci->lock);
> +        spin_lock(&v->vpci.pdev->vpci_lock);
> +        if ( v->vpci.pdev->vpci )
> +            /* Disable memory decoding unconditionally on failure. */
> +            modify_decoding(v->vpci.pdev,
> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> +                            !rc && v->vpci.rom_only);
> +        spin_unlock(&v->vpci.pdev->vpci_lock);

While I certainly see the point, the addition of this if() (and a
few more elsewhere) isn't covered by title or description.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> -void vpci_remove_device(struct pci_dev *pdev)
> +static void vpci_remove_device_locked(struct pci_dev *pdev)
>  {
> -    if ( !has_vpci(pdev->domain) )
> -        return;
> +    ASSERT(spin_is_locked(&pdev->vpci_lock));

While, unlike here, ...

> @@ -152,8 +164,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>      r->offset = offset;
>      r->private = data;
>  
> -    spin_lock(&vpci->lock);

... you did explain why you don't want to add a similar assertion
here, I think in return the function wants to have a comment added
that it's required to be called with the respective lock held. I
notice you did so for the declaration, but I think such a comment
would better be present at the definition as well. Same for
vpci_remove_register() then, obviously.

> @@ -311,7 +316,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
>  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>  {
>      const struct domain *d = current->domain;
> -    const struct pci_dev *pdev;
> +    struct pci_dev *pdev;
>      const struct vpci_register *r;
>      unsigned int data_offset = 0;
>      uint32_t data = ~(uint32_t)0;
> @@ -327,7 +332,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>      if ( !pdev )
>          return vpci_read_hw(sbdf, reg, size);
>  
> -    spin_lock(&pdev->vpci->lock);
> +    spin_lock(&pdev->vpci_lock);
> +    if ( !pdev->vpci )
> +    {
> +        spin_unlock(&pdev->vpci_lock);
> +        return vpci_read_hw(sbdf, reg, size);
> +    }

In this case as well as in its write counterpart it becomes even more
important to justify (in the description) the new behavior. It is not
obvious at all that the absence of a struct vpci should be taken as
an indication that the underlying device needs accessing instead.
This also cannot be inferred from the "!pdev" case visible in context.
In that case we have no record of a device at this SBDF, and hence the
fallback pretty clearly is a "just in case" one. Yet if we know of a
device, the absence of a struct vpci may mean various possible things.

Jan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-02  9:05     ` Jan Beulich
@ 2022-02-02  9:38       ` Oleksandr Andrushchenko
  2022-02-02  9:45         ` Jan Beulich
  2022-02-02  9:49       ` Roger Pau Monné
  1 sibling, 1 reply; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  9:38 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko

Hi, Jan!

On 02.02.22 11:05, Jan Beulich wrote:
> On 02.02.2022 09:44, Roger Pau Monné wrote:
>> On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Oleksandr, can you please clarify authorship here? The rule of thumb is
> that From: matches ...
>
>>> Shrink critical section in vpci_{read/write} as racing calls to
>>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>>> around pci_conf_{read,write} functions, and the required locking (in
>>> case of using the IO ports) is already taken care in pci_conf_{read,write}.
>>>
>>> Please note, that we anyways split 64bit writes into two 32bit ones
>>> without taking the lock for the whole duration of the access, so it is
>>> possible to see a partially updated state as a result of a 64bit write:
>>> the PCI(e) specification don't seem to specify whether the ECAM is allowed
>>> to split memory transactions into multiple Configuration Requests and
>>> whether those could then interleave with requests from a different CPU.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ... the first S-o-b, as these are expected to be in chronological
> order.
Well, I was not sure here: the idea and the original code belongs
to Roger and it was a part of a dedicated other patch. So, technically,
this patch didn't exist before and Roger hasn't created it (the patch).
So, this is why I'm in doubt here: should I change the authorship
to Roger's? I had no means to offend anyone here nor I pretend
for the authorship in any form.

I would like to apologize if anyone feels offended because of the authorship

Please help me understand what is the right approach here.
>
>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> I'll take your unconstrained ack to indicate that you're also fine
> with this going in right away; see my reply to 0/4 as to the earlier
> two patches. Please let me know (soonish) if I shouldn't make this
> implication, but I shall wait with committing for clarification of
> the question further up anyway.
I would postpone patches [0; 1] and just go with [3; 4] if your will
If not, then the whole series can be postponed until I have the
bigger one ready.
>
>> Would like to make sure whether Jan still have concerns about
>> splitting accesses though.
> I continue to be a little concerned, but as long as the decision is
> taken consciously (and this is recorded in the description), which
> clearly is the case now, I have no objections. In the end well
> behaved OSes will suitably serialize accesses to config space anyway.
>
>> Also since I'm the maintainer we need a Reviewed-by from someone else.
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> I'm not sure this is strictly needed though: I'd generally consider
> a 2nd (later) S-o-b as valid stand-in for R-b, at least as long as
> the 2nd author doesn't scope-restrict their tag.
>
> One further remark though: The resulting asymmetry of the locking
> (covering the "head" hw read but not the "tail" one) looks a little
> odd, but I will admit that I don't see a good way to restore symmetry.
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-01 16:25 ` [PATCH 4/4] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
  2022-02-02  9:22   ` Jan Beulich
@ 2022-02-02  9:45   ` Roger Pau Monné
  2022-02-02 10:15     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 23+ messages in thread
From: Roger Pau Monné @ 2022-02-02  9:45 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, oleksandr_tyshchenko,
	volodymyr_babchuk, Artem_Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, bertrand.marquis, rahul.singh,
	Oleksandr Andrushchenko

On Tue, Feb 01, 2022 at 06:25:08PM +0200, Oleksandr Andrushchenko wrote:
> From: Roger Pau Monne <roger.pau@citrix.com>
> 
> This way the lock can be used to check whether vpci is present, and
> removal can be performed while holding the lock, in order to make
> sure there are no accesses to the contents of the vpci struct.
> Previously removal could race with vpci_read for example, since the
> lock was dropped prior to freeing pdev->vpci.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien@xen.org>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> ---
> New in v5 of this series: this is an updated version of the patch published at
> https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/
> 
> Changes since v5:
>  - do not split code into vpci_remove_device_handlers_locked yet
>  - move INIT_LIST_HEAD outside the locked region (Jan)
>  - stripped out locking optimizations for vpci_{read|write} into a
>    dedicated patch
> Changes since v2:
>  - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
> Changes since v1:
>  - Assert that vpci_lock is locked in vpci_remove_device_locked.
>  - Remove double newline.
>  - Shrink critical section in vpci_{read/write}.
> ---
>  tools/tests/vpci/emul.h       |  5 ++-
>  tools/tests/vpci/main.c       |  4 +--
>  xen/arch/x86/hvm/vmsi.c       |  8 ++---
>  xen/drivers/passthrough/pci.c |  1 +
>  xen/drivers/vpci/header.c     | 21 ++++++++----
>  xen/drivers/vpci/msi.c        | 11 ++++--
>  xen/drivers/vpci/msix.c       |  8 ++---
>  xen/drivers/vpci/vpci.c       | 63 ++++++++++++++++++++++-------------
>  xen/include/xen/pci.h         |  1 +
>  xen/include/xen/vpci.h        |  3 +-
>  10 files changed, 78 insertions(+), 47 deletions(-)
> 
> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
> index 2e1d3057c9d8..d018fb5eef21 100644
> --- a/tools/tests/vpci/emul.h
> +++ b/tools/tests/vpci/emul.h
> @@ -44,6 +44,7 @@ struct domain {
>  };
>  
>  struct pci_dev {
> +    bool vpci_lock;
>      struct vpci *vpci;
>  };
>  
> @@ -53,10 +54,8 @@ struct vcpu
>  };
>  
>  extern const struct vcpu *current;
> -extern const struct pci_dev test_pdev;
> +extern struct pci_dev test_pdev;
>  
> -typedef bool spinlock_t;
> -#define spin_lock_init(l) (*(l) = false)
>  #define spin_lock(l) (*(l) = true)
>  #define spin_unlock(l) (*(l) = false)
>  
> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
> index b9a0a6006bb9..26c95b08b6b1 100644
> --- a/tools/tests/vpci/main.c
> +++ b/tools/tests/vpci/main.c
> @@ -23,7 +23,8 @@ static struct vpci vpci;
>  
>  const static struct domain d;
>  
> -const struct pci_dev test_pdev = {
> +struct pci_dev test_pdev = {
> +    .vpci_lock = false,

Nit: vpci_lock will already be initialized to false by default, so
this is redundant.

>      .vpci = &vpci,
>  };
>  
> @@ -158,7 +159,6 @@ main(int argc, char **argv)
>      int rc;
>  
>      INIT_LIST_HEAD(&vpci.handlers);
> -    spin_lock_init(&vpci.lock);
>  
>      VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
>      VPCI_READ_CHECK(0, 4, r0);
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 13e2a190b439..1f7a37f78264 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>          {
>              struct pci_dev *pdev = msix->pdev;
>  
> -            spin_unlock(&msix->pdev->vpci->lock);
> +            spin_unlock(&msix->pdev->vpci_lock);
>              process_pending_softirqs();
>              /* NB: we assume that pdev cannot go away for an alive domain. */
> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +            if ( !spin_trylock(&pdev->vpci_lock) )
>                  return -EBUSY;
> -            if ( pdev->vpci->msix != msix )
> +            if ( !pdev->vpci || pdev->vpci->msix != msix )
>              {
> -                spin_unlock(&pdev->vpci->lock);
> +                spin_unlock(&pdev->vpci_lock);
>                  return -EAGAIN;
>              }
>          }
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 1fad80362f0e..af648c6a19b5 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -328,6 +328,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>      *((u8*) &pdev->bus) = bus;
>      *((u8*) &pdev->devfn) = devfn;
>      pdev->domain = NULL;
> +    spin_lock_init(&pdev->vpci_lock);
>  
>      arch_pci_init_pdev(pdev);
>  
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 40ff79c33f8f..bd23c0274d48 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>          if ( rc == -ERESTART )
>              return true;
>  
> -        spin_lock(&v->vpci.pdev->vpci->lock);
> -        /* Disable memory decoding unconditionally on failure. */
> -        modify_decoding(v->vpci.pdev,
> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> -                        !rc && v->vpci.rom_only);
> -        spin_unlock(&v->vpci.pdev->vpci->lock);
> +        spin_lock(&v->vpci.pdev->vpci_lock);
> +        if ( v->vpci.pdev->vpci )
> +            /* Disable memory decoding unconditionally on failure. */
> +            modify_decoding(v->vpci.pdev,
> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
> +                            !rc && v->vpci.rom_only);
> +        spin_unlock(&v->vpci.pdev->vpci_lock);
>  
>          rangeset_destroy(v->vpci.mem);
>          v->vpci.mem = NULL;
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>                  continue;
>          }
>  
> +        spin_lock(&tmp->vpci_lock);
> +        if ( !tmp->vpci )
> +        {
> +            spin_unlock(&tmp->vpci_lock);
> +            continue;
> +        }
>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>          {
>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>              rc = rangeset_remove_range(mem, start, end);
>              if ( rc )
>              {
> +                spin_unlock(&tmp->vpci_lock);
>                  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>                         start, end, rc);
>                  rangeset_destroy(mem);
>                  return rc;
>              }
>          }
> +        spin_unlock(&tmp->vpci_lock);
>      }
>  
>      ASSERT(dev);
> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
> index 5757a7aed20f..e3ce46869dad 100644
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -270,7 +270,7 @@ void vpci_dump_msi(void)
>      rcu_read_lock(&domlist_read_lock);
>      for_each_domain ( d )
>      {
> -        const struct pci_dev *pdev;
> +        struct pci_dev *pdev;
>  
>          if ( !has_vpci(d) )
>              continue;
> @@ -282,8 +282,13 @@ void vpci_dump_msi(void)
>              const struct vpci_msi *msi;
>              const struct vpci_msix *msix;
>  
> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
> +            if ( !spin_trylock(&pdev->vpci_lock) )
>                  continue;
> +            if ( !pdev->vpci )
> +            {
> +                spin_unlock(&pdev->vpci_lock);
> +                continue;
> +            }
>  
>              msi = pdev->vpci->msi;
>              if ( msi && msi->enabled )
> @@ -323,7 +328,7 @@ void vpci_dump_msi(void)
>                  }
>              }
>  
> -            spin_unlock(&pdev->vpci->lock);
> +            spin_unlock(&pdev->vpci_lock);
>              process_pending_softirqs();
>          }
>      }
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 846f1b8d7038..5310cc3ff520 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -225,7 +225,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,

I think you also need to add locking to msix_find, otherwise it will
dereference pdev->vpci without holding the vpci_lock.

It might be a better approach to rename msix_find to msix_get and
return the vpci_msix struct with the vpci_lock taken, so we can assert
it's not going to disappear under our feet. Then you will also need to
add a msix_put function that releases the lock.

>          return X86EMUL_OKAY;
>      }
>  
> -    spin_lock(&msix->pdev->vpci->lock);
> +    spin_lock(&msix->pdev->vpci_lock);
>      entry = get_entry(msix, addr);
>      offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>  
> @@ -254,7 +254,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
>          ASSERT_UNREACHABLE();
>          break;
>      }
> -    spin_unlock(&msix->pdev->vpci->lock);
> +    spin_unlock(&msix->pdev->vpci_lock);
>  
>      return X86EMUL_OKAY;
>  }
> @@ -297,7 +297,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>          return X86EMUL_OKAY;
>      }
>  
> -    spin_lock(&msix->pdev->vpci->lock);
> +    spin_lock(&msix->pdev->vpci_lock);
>      entry = get_entry(msix, addr);
>      offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>  
> @@ -370,7 +370,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>          ASSERT_UNREACHABLE();
>          break;
>      }
> -    spin_unlock(&msix->pdev->vpci->lock);
> +    spin_unlock(&msix->pdev->vpci_lock);
>  
>      return X86EMUL_OKAY;
>  }
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index fb0947179b79..c015a4d77540 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> -void vpci_remove_device(struct pci_dev *pdev)
> +static void vpci_remove_device_locked(struct pci_dev *pdev)

Nit: since it's a static function you can drop the vpci_ prefix here.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-02  9:38       ` Oleksandr Andrushchenko
@ 2022-02-02  9:45         ` Jan Beulich
  2022-02-02  9:49           ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2022-02-02  9:45 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Roger Pau Monné

On 02.02.2022 10:38, Oleksandr Andrushchenko wrote:
> On 02.02.22 11:05, Jan Beulich wrote:
>> On 02.02.2022 09:44, Roger Pau Monné wrote:
>>> On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Oleksandr, can you please clarify authorship here? The rule of thumb is
>> that From: matches ...
>>
>>>> Shrink critical section in vpci_{read/write} as racing calls to
>>>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>>>> around pci_conf_{read,write} functions, and the required locking (in
>>>> case of using the IO ports) is already taken care in pci_conf_{read,write}.
>>>>
>>>> Please note, that we anyways split 64bit writes into two 32bit ones
>>>> without taking the lock for the whole duration of the access, so it is
>>>> possible to see a partially updated state as a result of a 64bit write:
>>>> the PCI(e) specification don't seem to specify whether the ECAM is allowed
>>>> to split memory transactions into multiple Configuration Requests and
>>>> whether those could then interleave with requests from a different CPU.
>>>>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ... the first S-o-b, as these are expected to be in chronological
>> order.
> Well, I was not sure here: the idea and the original code belongs
> to Roger and it was a part of a dedicated other patch. So, technically,
> this patch didn't exist before and Roger hasn't created it (the patch).
> So, this is why I'm in doubt here: should I change the authorship
> to Roger's? I had no means to offend anyone here nor I pretend
> for the authorship in any form.

My personal view on it is that if you've broken this out of a larger
patch coming from Roger, then he should be named as the author.

Jan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-02  9:45         ` Jan Beulich
@ 2022-02-02  9:49           ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02  9:49 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, george.dunlap, paul, Bertrand Marquis,
	Rahul Singh, Oleksandr Andrushchenko



On 02.02.22 11:45, Jan Beulich wrote:
> On 02.02.2022 10:38, Oleksandr Andrushchenko wrote:
>> On 02.02.22 11:05, Jan Beulich wrote:
>>> On 02.02.2022 09:44, Roger Pau Monné wrote:
>>>> On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> Oleksandr, can you please clarify authorship here? The rule of thumb is
>>> that From: matches ...
>>>
>>>>> Shrink critical section in vpci_{read/write} as racing calls to
>>>>> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
>>>>> around pci_conf_{read,write} functions, and the required locking (in
>>>>> case of using the IO ports) is already taken care in pci_conf_{read,write}.
>>>>>
>>>>> Please note, that we anyways split 64bit writes into two 32bit ones
>>>>> without taking the lock for the whole duration of the access, so it is
>>>>> possible to see a partially updated state as a result of a 64bit write:
>>>>> the PCI(e) specification don't seem to specify whether the ECAM is allowed
>>>>> to split memory transactions into multiple Configuration Requests and
>>>>> whether those could then interleave with requests from a different CPU.
>>>>>
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ... the first S-o-b, as these are expected to be in chronological
>>> order.
>> Well, I was not sure here: the idea and the original code belongs
>> to Roger and it was a part of a dedicated other patch. So, technically,
>> this patch didn't exist before and Roger hasn't created it (the patch).
>> So, this is why I'm in doubt here: should I change the authorship
>> to Roger's? I had no means to offend anyone here nor I pretend
>> for the authorship in any form.
> My personal view on it is that if you've broken this out of a larger
> patch coming from Roger, then he should be named as the author.
Agree, will change.
Roger, I am sorry I didn't do it from the start
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] vpci: shrink critical section in vpci_{read/write}
  2022-02-02  9:05     ` Jan Beulich
  2022-02-02  9:38       ` Oleksandr Andrushchenko
@ 2022-02-02  9:49       ` Roger Pau Monné
  1 sibling, 0 replies; 23+ messages in thread
From: Roger Pau Monné @ 2022-02-02  9:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Andrushchenko, xen-devel, julien, sstabellini,
	oleksandr_tyshchenko, volodymyr_babchuk, george.dunlap, paul,
	bertrand.marquis, rahul.singh, Oleksandr Andrushchenko

On Wed, Feb 02, 2022 at 10:05:55AM +0100, Jan Beulich wrote:
> On 02.02.2022 09:44, Roger Pau Monné wrote:
> > On Tue, Feb 01, 2022 at 06:25:07PM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> Oleksandr, can you please clarify authorship here? The rule of thumb is
> that From: matches ...
> 
> >> Shrink critical section in vpci_{read/write} as racing calls to
> >> vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
> >> around pci_conf_{read,write} functions, and the required locking (in
> >> case of using the IO ports) is already taken care in pci_conf_{read,write}.
> >>
> >> Please note, that we anyways split 64bit writes into two 32bit ones
> >> without taking the lock for the whole duration of the access, so it is
> >> possible to see a partially updated state as a result of a 64bit write:
> >> the PCI(e) specification don't seem to specify whether the ECAM is allowed
> >> to split memory transactions into multiple Configuration Requests and
> >> whether those could then interleave with requests from a different CPU.
> >>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> 
> ... the first S-o-b, as these are expected to be in chronological
> order.
> 
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I'll take your unconstrained ack to indicate that you're also fine
> with this going in right away; see my reply to 0/4 as to the earlier
> two patches. Please let me know (soonish) if I shouldn't make this
> implication, but I shall wait with committing for clarification of
> the question further up anyway.

I think both vPCI patches in the series could go in when ready. They
are improvements on their own.

> > Would like to make sure whether Jan still have concerns about
> > splitting accesses though.
> 
> I continue to be a little concerned, but as long as the decision is
> taken consciously (and this is recorded in the description), which
> clearly is the case now, I have no objections. In the end well
> behaved OSes will suitably serialize accesses to config space anyway.
> 
> > Also since I'm the maintainer we need a Reviewed-by from someone else.
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm not sure this is strictly needed though: I'd generally consider
> a 2nd (later) S-o-b as valid stand-in for R-b, at least as long as
> the 2nd author doesn't scope-restrict their tag.
> 
> One further remark though: The resulting asymmetry of the locking
> (covering the "head" hw read but not the "tail" one) looks a little
> odd, but I will admit that I don't see a good way to restore symmetry.

I did realize about such asymmetry also, but I don't think it can be
solved.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-02  9:45   ` Roger Pau Monné
@ 2022-02-02 10:15     ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 10:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, julien, sstabellini, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Artem Mygaiev, jbeulich, andrew.cooper3,
	george.dunlap, paul, Bertrand Marquis, Rahul Singh,
	Oleksandr Andrushchenko

Hi, Roger!

On 02.02.22 11:45, Roger Pau Monné wrote:
> On Tue, Feb 01, 2022 at 06:25:08PM +0200, Oleksandr Andrushchenko wrote:
>> From: Roger Pau Monne <roger.pau@citrix.com>
>>
>> This way the lock can be used to check whether vpci is present, and
>> removal can be performed while holding the lock, in order to make
>> sure there are no accesses to the contents of the vpci struct.
>> Previously removal could race with vpci_read for example, since the
>> lock was dropped prior to freeing pdev->vpci.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Julien Grall <julien@xen.org>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> New in v5 of this series: this is an updated version of the patch published at
>> https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/__;!!GF_29dbcQIUBPA!ilhEn2M-44dKY8rsVO7qGmUSY22vSHwNaGJNUqhwvr-V2kdb-FDYL6f39FS8pKJm3q8HnOzyuA$ [lore[.]kernel[.]org]
>>
>> Changes since v5:
>>   - do not split code into vpci_remove_device_handlers_locked yet
>>   - move INIT_LIST_HEAD outside the locked region (Jan)
>>   - stripped out locking optimizations for vpci_{read|write} into a
>>     dedicated patch
>> Changes since v2:
>>   - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
>> Changes since v1:
>>   - Assert that vpci_lock is locked in vpci_remove_device_locked.
>>   - Remove double newline.
>>   - Shrink critical section in vpci_{read/write}.
>> ---
>>   tools/tests/vpci/emul.h       |  5 ++-
>>   tools/tests/vpci/main.c       |  4 +--
>>   xen/arch/x86/hvm/vmsi.c       |  8 ++---
>>   xen/drivers/passthrough/pci.c |  1 +
>>   xen/drivers/vpci/header.c     | 21 ++++++++----
>>   xen/drivers/vpci/msi.c        | 11 ++++--
>>   xen/drivers/vpci/msix.c       |  8 ++---
>>   xen/drivers/vpci/vpci.c       | 63 ++++++++++++++++++++++-------------
>>   xen/include/xen/pci.h         |  1 +
>>   xen/include/xen/vpci.h        |  3 +-
>>   10 files changed, 78 insertions(+), 47 deletions(-)
>>
>> diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
>> index 2e1d3057c9d8..d018fb5eef21 100644
>> --- a/tools/tests/vpci/emul.h
>> +++ b/tools/tests/vpci/emul.h
>> @@ -44,6 +44,7 @@ struct domain {
>>   };
>>   
>>   struct pci_dev {
>> +    bool vpci_lock;
>>       struct vpci *vpci;
>>   };
>>   
>> @@ -53,10 +54,8 @@ struct vcpu
>>   };
>>   
>>   extern const struct vcpu *current;
>> -extern const struct pci_dev test_pdev;
>> +extern struct pci_dev test_pdev;
>>   
>> -typedef bool spinlock_t;
>> -#define spin_lock_init(l) (*(l) = false)
>>   #define spin_lock(l) (*(l) = true)
>>   #define spin_unlock(l) (*(l) = false)
>>   
>> diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
>> index b9a0a6006bb9..26c95b08b6b1 100644
>> --- a/tools/tests/vpci/main.c
>> +++ b/tools/tests/vpci/main.c
>> @@ -23,7 +23,8 @@ static struct vpci vpci;
>>   
>>   const static struct domain d;
>>   
>> -const struct pci_dev test_pdev = {
>> +struct pci_dev test_pdev = {
>> +    .vpci_lock = false,
> Nit: vpci_lock will already be initialized to false by default, so
> this is redundant.
Will remove
>
>>       .vpci = &vpci,
>>   };
>>   
>> @@ -158,7 +159,6 @@ main(int argc, char **argv)
>>       int rc;
>>   
>>       INIT_LIST_HEAD(&vpci.handlers);
>> -    spin_lock_init(&vpci.lock);
>>   
>>       VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
>>       VPCI_READ_CHECK(0, 4, r0);
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index 13e2a190b439..1f7a37f78264 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>           {
>>               struct pci_dev *pdev = msix->pdev;
>>   
>> -            spin_unlock(&msix->pdev->vpci->lock);
>> +            spin_unlock(&msix->pdev->vpci_lock);
>>               process_pending_softirqs();
>>               /* NB: we assume that pdev cannot go away for an alive domain. */
>> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>> +            if ( !spin_trylock(&pdev->vpci_lock) )
>>                   return -EBUSY;
>> -            if ( pdev->vpci->msix != msix )
>> +            if ( !pdev->vpci || pdev->vpci->msix != msix )
>>               {
>> -                spin_unlock(&pdev->vpci->lock);
>> +                spin_unlock(&pdev->vpci_lock);
>>                   return -EAGAIN;
>>               }
>>           }
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 1fad80362f0e..af648c6a19b5 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -328,6 +328,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>>       *((u8*) &pdev->bus) = bus;
>>       *((u8*) &pdev->devfn) = devfn;
>>       pdev->domain = NULL;
>> +    spin_lock_init(&pdev->vpci_lock);
>>   
>>       arch_pci_init_pdev(pdev);
>>   
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 40ff79c33f8f..bd23c0274d48 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>>           if ( rc == -ERESTART )
>>               return true;
>>   
>> -        spin_lock(&v->vpci.pdev->vpci->lock);
>> -        /* Disable memory decoding unconditionally on failure. */
>> -        modify_decoding(v->vpci.pdev,
>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> -                        !rc && v->vpci.rom_only);
>> -        spin_unlock(&v->vpci.pdev->vpci->lock);
>> +        spin_lock(&v->vpci.pdev->vpci_lock);
>> +        if ( v->vpci.pdev->vpci )
>> +            /* Disable memory decoding unconditionally on failure. */
>> +            modify_decoding(v->vpci.pdev,
>> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> +                            !rc && v->vpci.rom_only);
>> +        spin_unlock(&v->vpci.pdev->vpci_lock);
>>   
>>           rangeset_destroy(v->vpci.mem);
>>           v->vpci.mem = NULL;
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>                   continue;
>>           }
>>   
>> +        spin_lock(&tmp->vpci_lock);
>> +        if ( !tmp->vpci )
>> +        {
>> +            spin_unlock(&tmp->vpci_lock);
>> +            continue;
>> +        }
>>           for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>           {
>>               const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>               rc = rangeset_remove_range(mem, start, end);
>>               if ( rc )
>>               {
>> +                spin_unlock(&tmp->vpci_lock);
>>                   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>                          start, end, rc);
>>                   rangeset_destroy(mem);
>>                   return rc;
>>               }
>>           }
>> +        spin_unlock(&tmp->vpci_lock);
>>       }
>>   
>>       ASSERT(dev);
>> diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
>> index 5757a7aed20f..e3ce46869dad 100644
>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -270,7 +270,7 @@ void vpci_dump_msi(void)
>>       rcu_read_lock(&domlist_read_lock);
>>       for_each_domain ( d )
>>       {
>> -        const struct pci_dev *pdev;
>> +        struct pci_dev *pdev;
>>   
>>           if ( !has_vpci(d) )
>>               continue;
>> @@ -282,8 +282,13 @@ void vpci_dump_msi(void)
>>               const struct vpci_msi *msi;
>>               const struct vpci_msix *msix;
>>   
>> -            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>> +            if ( !spin_trylock(&pdev->vpci_lock) )
>>                   continue;
>> +            if ( !pdev->vpci )
>> +            {
>> +                spin_unlock(&pdev->vpci_lock);
>> +                continue;
>> +            }
>>   
>>               msi = pdev->vpci->msi;
>>               if ( msi && msi->enabled )
>> @@ -323,7 +328,7 @@ void vpci_dump_msi(void)
>>                   }
>>               }
>>   
>> -            spin_unlock(&pdev->vpci->lock);
>> +            spin_unlock(&pdev->vpci_lock);
>>               process_pending_softirqs();
>>           }
>>       }
>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> index 846f1b8d7038..5310cc3ff520 100644
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -225,7 +225,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
> I think you also need to add locking to msix_find, otherwise it will
> dereference pdev->vpci without holding the vpci_lock.
>
> It might be a better approach to rename msix_find to msix_get and
> return the vpci_msix struct with the vpci_lock taken, so we can assert
> it's not going to disappear under our feet. Then you will also need to
> add a msix_put function that releases the lock.
Ok, sounds good: so, I'll implement msix_{get|put} then
>
>>           return X86EMUL_OKAY;
>>       }
>>   
>> -    spin_lock(&msix->pdev->vpci->lock);
>> +    spin_lock(&msix->pdev->vpci_lock);
>>       entry = get_entry(msix, addr);
>>       offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>>   
>> @@ -254,7 +254,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
>>           ASSERT_UNREACHABLE();
>>           break;
>>       }
>> -    spin_unlock(&msix->pdev->vpci->lock);
>> +    spin_unlock(&msix->pdev->vpci_lock);
>>   
>>       return X86EMUL_OKAY;
>>   }
>> @@ -297,7 +297,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>>           return X86EMUL_OKAY;
>>       }
>>   
>> -    spin_lock(&msix->pdev->vpci->lock);
>> +    spin_lock(&msix->pdev->vpci_lock);
>>       entry = get_entry(msix, addr);
>>       offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>>   
>> @@ -370,7 +370,7 @@ static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
>>           ASSERT_UNREACHABLE();
>>           break;
>>       }
>> -    spin_unlock(&msix->pdev->vpci->lock);
>> +    spin_unlock(&msix->pdev->vpci_lock);
>>   
>>       return X86EMUL_OKAY;
>>   }
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index fb0947179b79..c015a4d77540 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>   extern vpci_register_init_t *const __end_vpci_array[];
>>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>   
>> -void vpci_remove_device(struct pci_dev *pdev)
>> +static void vpci_remove_device_locked(struct pci_dev *pdev)
> Nit: since it's a static function you can drop the vpci_ prefix here.
This function is going to be used outside later on, but not yet.
So, I can change the name and then change it back once it is
used by others.
What's your preference here?
>
> Thanks, Roger.
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-02  9:22   ` Jan Beulich
@ 2022-02-02 11:03     ` Oleksandr Andrushchenko
  2022-02-02 11:14       ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 11:03 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel,
	Oleksandr Andrushchenko

Hi, Jan!

On 02.02.22 11:22, Jan Beulich wrote:
> On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
>> From: Roger Pau Monne <roger.pau@citrix.com>
>>
>> This way the lock can be used to check whether vpci is present, and
>> removal can be performed while holding the lock, in order to make
>> sure there are no accesses to the contents of the vpci struct.
>> Previously removal could race with vpci_read for example, since the
>> lock was dropped prior to freeing pdev->vpci.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Julien Grall <julien@xen.org>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> ---
>> New in v5 of this series: this is an updated version of the patch published at
>> https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/__;!!GF_29dbcQIUBPA!jmmcewY6y9Ur4rgvOgqscz8gBWaod2JnQOkHvWtYKgnqeU6BoWJTqCN3UDpCw3io8Ynk-wBXhA$ [lore[.]kernel[.]org]
>>
>> Changes since v5:
> This is a little odd in a series implicitly tagged as v1.
>
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>>           if ( rc == -ERESTART )
>>               return true;
>>   
>> -        spin_lock(&v->vpci.pdev->vpci->lock);
>> -        /* Disable memory decoding unconditionally on failure. */
>> -        modify_decoding(v->vpci.pdev,
>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> -                        !rc && v->vpci.rom_only);
>> -        spin_unlock(&v->vpci.pdev->vpci->lock);
>> +        spin_lock(&v->vpci.pdev->vpci_lock);
>> +        if ( v->vpci.pdev->vpci )
>> +            /* Disable memory decoding unconditionally on failure. */
>> +            modify_decoding(v->vpci.pdev,
>> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>> +                            !rc && v->vpci.rom_only);
>> +        spin_unlock(&v->vpci.pdev->vpci_lock);
> While I certainly see the point, the addition of this if() (and a
> few more elsewhere) isn't covered by title or description.
The commit message says:
"This way the lock can be used to check whether vpci is present, and
removal can be performed while holding the lock, in order to make
sure there are no accesses to the contents of the vpci struct."
So, I think this is enough to describe the fact that after you have locked
the protected structure may have gone already and we need to
re-check it is still present.
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -35,12 +35,10 @@ extern vpci_register_init_t *const __start_vpci_array[];
>>   extern vpci_register_init_t *const __end_vpci_array[];
>>   #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>>   
>> -void vpci_remove_device(struct pci_dev *pdev)
>> +static void vpci_remove_device_locked(struct pci_dev *pdev)
>>   {
>> -    if ( !has_vpci(pdev->domain) )
>> -        return;
>> +    ASSERT(spin_is_locked(&pdev->vpci_lock));
> While, unlike here, ...
>
>> @@ -152,8 +164,6 @@ int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>       r->offset = offset;
>>       r->private = data;
>>   
>> -    spin_lock(&vpci->lock);
> ... you did explain why you don't want to add a similar assertion
> here, I think in return the function wants to have a comment added
> that it's required to be called with the respective lock held.
Will add the comments
>   I
> notice you did so for the declaration, but I think such a comment
> would better be present at the definition as well. Same for
> vpci_remove_register() then, obviously.
Ok
>
>> @@ -311,7 +316,7 @@ static uint32_t merge_result(uint32_t data, uint32_t new, unsigned int size,
>>   uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>   {
>>       const struct domain *d = current->domain;
>> -    const struct pci_dev *pdev;
>> +    struct pci_dev *pdev;
>>       const struct vpci_register *r;
>>       unsigned int data_offset = 0;
>>       uint32_t data = ~(uint32_t)0;
>> @@ -327,7 +332,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>       if ( !pdev )
>>           return vpci_read_hw(sbdf, reg, size);
>>   
>> -    spin_lock(&pdev->vpci->lock);
>> +    spin_lock(&pdev->vpci_lock);
>> +    if ( !pdev->vpci )
>> +    {
>> +        spin_unlock(&pdev->vpci_lock);
>> +        return vpci_read_hw(sbdf, reg, size);
>> +    }
> In this case as well as in its write counterpart it becomes even more
> important to justify (in the description) the new behavior. It is not
> obvious at all that the absence of a struct vpci should be taken as
> an indication that the underlying device needs accessing instead.
> This also cannot be inferred from the "!pdev" case visible in context.
> In that case we have no record of a device at this SBDF, and hence the
> fallback pretty clearly is a "just in case" one. Yet if we know of a
> device, the absence of a struct vpci may mean various possible things.
Ok, I'll describe this change
>
> Jan
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-02 11:03     ` Oleksandr Andrushchenko
@ 2022-02-02 11:14       ` Jan Beulich
  2022-02-02 11:26         ` Oleksandr Andrushchenko
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2022-02-02 11:14 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel, Roger Pau Monné

On 02.02.2022 12:03, Oleksandr Andrushchenko wrote:
> On 02.02.22 11:22, Jan Beulich wrote:
>> On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>
>>> This way the lock can be used to check whether vpci is present, and
>>> removal can be performed while holding the lock, in order to make
>>> sure there are no accesses to the contents of the vpci struct.
>>> Previously removal could race with vpci_read for example, since the
>>> lock was dropped prior to freeing pdev->vpci.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>> ---
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Julien Grall <julien@xen.org>
>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>> ---
>>> New in v5 of this series: this is an updated version of the patch published at
>>> https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/__;!!GF_29dbcQIUBPA!jmmcewY6y9Ur4rgvOgqscz8gBWaod2JnQOkHvWtYKgnqeU6BoWJTqCN3UDpCw3io8Ynk-wBXhA$ [lore[.]kernel[.]org]
>>>
>>> Changes since v5:
>> This is a little odd in a series implicitly tagged as v1.
>>
>>> --- a/xen/drivers/vpci/header.c
>>> +++ b/xen/drivers/vpci/header.c
>>> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>>>           if ( rc == -ERESTART )
>>>               return true;
>>>   
>>> -        spin_lock(&v->vpci.pdev->vpci->lock);
>>> -        /* Disable memory decoding unconditionally on failure. */
>>> -        modify_decoding(v->vpci.pdev,
>>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>>> -                        !rc && v->vpci.rom_only);
>>> -        spin_unlock(&v->vpci.pdev->vpci->lock);
>>> +        spin_lock(&v->vpci.pdev->vpci_lock);
>>> +        if ( v->vpci.pdev->vpci )
>>> +            /* Disable memory decoding unconditionally on failure. */
>>> +            modify_decoding(v->vpci.pdev,
>>> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>>> +                            !rc && v->vpci.rom_only);
>>> +        spin_unlock(&v->vpci.pdev->vpci_lock);
>> While I certainly see the point, the addition of this if() (and a
>> few more elsewhere) isn't covered by title or description.
> The commit message says:
> "This way the lock can be used to check whether vpci is present, and
> removal can be performed while holding the lock, in order to make
> sure there are no accesses to the contents of the vpci struct."
> So, I think this is enough to describe the fact that after you have locked
> the protected structure may have gone already and we need to
> re-check it is still present.

I'm afraid that to me "can be used" describes future behavior, as
opposed to e.g. "is used". If you want to point out both aspects,
maybe "... can be used (and in a few cases is used right away) ..."?

Jan



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] vpci: move lock outside of struct vpci
  2022-02-02 11:14       ` Jan Beulich
@ 2022-02-02 11:26         ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 23+ messages in thread
From: Oleksandr Andrushchenko @ 2022-02-02 11:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: julien, sstabellini, Oleksandr Tyshchenko, Volodymyr Babchuk,
	Artem Mygaiev, andrew.cooper3, george.dunlap, paul,
	Bertrand Marquis, Rahul Singh, xen-devel, Roger Pau Monné,
	Oleksandr Andrushchenko



On 02.02.22 13:14, Jan Beulich wrote:
> On 02.02.2022 12:03, Oleksandr Andrushchenko wrote:
>> On 02.02.22 11:22, Jan Beulich wrote:
>>> On 01.02.2022 17:25, Oleksandr Andrushchenko wrote:
>>>> From: Roger Pau Monne <roger.pau@citrix.com>
>>>>
>>>> This way the lock can be used to check whether vpci is present, and
>>>> removal can be performed while holding the lock, in order to make
>>>> sure there are no accesses to the contents of the vpci struct.
>>>> Previously removal could race with vpci_read for example, since the
>>>> lock was dropped prior to freeing pdev->vpci.
>>>>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>> ---
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Julien Grall <julien@xen.org>
>>>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>>>> ---
>>>> New in v5 of this series: this is an updated version of the patch published at
>>>> https://urldefense.com/v3/__https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger.pau@citrix.com/__;!!GF_29dbcQIUBPA!jmmcewY6y9Ur4rgvOgqscz8gBWaod2JnQOkHvWtYKgnqeU6BoWJTqCN3UDpCw3io8Ynk-wBXhA$ [lore[.]kernel[.]org]
>>>>
>>>> Changes since v5:
>>> This is a little odd in a series implicitly tagged as v1.
>>>
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
>>>>            if ( rc == -ERESTART )
>>>>                return true;
>>>>    
>>>> -        spin_lock(&v->vpci.pdev->vpci->lock);
>>>> -        /* Disable memory decoding unconditionally on failure. */
>>>> -        modify_decoding(v->vpci.pdev,
>>>> -                        rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>>>> -                        !rc && v->vpci.rom_only);
>>>> -        spin_unlock(&v->vpci.pdev->vpci->lock);
>>>> +        spin_lock(&v->vpci.pdev->vpci_lock);
>>>> +        if ( v->vpci.pdev->vpci )
>>>> +            /* Disable memory decoding unconditionally on failure. */
>>>> +            modify_decoding(v->vpci.pdev,
>>>> +                            rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
>>>> +                            !rc && v->vpci.rom_only);
>>>> +        spin_unlock(&v->vpci.pdev->vpci_lock);
>>> While I certainly see the point, the addition of this if() (and a
>>> few more elsewhere) isn't covered by title or description.
>> The commit message says:
>> "This way the lock can be used to check whether vpci is present, and
>> removal can be performed while holding the lock, in order to make
>> sure there are no accesses to the contents of the vpci struct."
>> So, I think this is enough to describe the fact that after you have locked
>> the protected structure may have gone already and we need to
>> re-check it is still present.
> I'm afraid that to me "can be used" describes future behavior, as
> opposed to e.g. "is used". If you want to point out both aspects,
> maybe "... can be used (and in a few cases is used right away) ..."?
This sounds good to me, thank you for suggesting that
>
> Jan
>

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2022-02-02 11:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-01 16:25 [PATCH 0/4] PCI devices passthrough pre-req patches Oleksandr Andrushchenko
2022-02-01 16:25 ` [PATCH 1/4] rangeset: add RANGESETF_no_print flag Oleksandr Andrushchenko
2022-02-01 16:25 ` [PATCH 2/4] rangeset: add rangeset_reset helper function Oleksandr Andrushchenko
2022-02-01 17:05   ` Julien Grall
2022-02-01 17:14     ` Oleksandr Andrushchenko
2022-02-01 17:33       ` Julien Grall
2022-02-01 17:39         ` Oleksandr Andrushchenko
2022-02-01 16:25 ` [PATCH 3/4] vpci: shrink critical section in vpci_{read/write} Oleksandr Andrushchenko
2022-02-02  8:44   ` Roger Pau Monné
2022-02-02  9:05     ` Jan Beulich
2022-02-02  9:38       ` Oleksandr Andrushchenko
2022-02-02  9:45         ` Jan Beulich
2022-02-02  9:49           ` Oleksandr Andrushchenko
2022-02-02  9:49       ` Roger Pau Monné
2022-02-01 16:25 ` [PATCH 4/4] vpci: move lock outside of struct vpci Oleksandr Andrushchenko
2022-02-02  9:22   ` Jan Beulich
2022-02-02 11:03     ` Oleksandr Andrushchenko
2022-02-02 11:14       ` Jan Beulich
2022-02-02 11:26         ` Oleksandr Andrushchenko
2022-02-02  9:45   ` Roger Pau Monné
2022-02-02 10:15     ` Oleksandr Andrushchenko
2022-02-02  8:48 ` [PATCH 0/4] PCI devices passthrough pre-req patches Jan Beulich
2022-02-02  9:03   ` Oleksandr Andrushchenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.